Anthropic · info · Updated 2026-04-17 · orig. 2026-03-29 · 8 min

Claude Mythos 2 Preview: What Anthropic Just Shipped for Cybersecurity

Anthropic's April 2026 preview of Claude Mythos 2 claims breakthrough autonomous vulnerability research. We dig into what it actually does, what it does not, and what it means for pentest firms, bug bounty programs, and the 0-day market.

Phillip (Tre) Bucchi · Founder, Valtik Studios. Penetration tester. Based in Connecticut, serving US mid-market.

# Claude Mythos 2 preview: what Anthropic just shipped for cybersecurity (and the March leak that forced their hand)

The history of "autonomous hacking agent" demos is mostly a history of glorified scan runners. For the last three years, AI security vendors have promised agents that find novel vulnerabilities without human guidance, and most of those demos collapse on close inspection. Anthropic's April 2026 Claude Mythos 2 preview deserves closer attention because the published benchmarks are a step change from what prior entrants delivered.

The April 14 preview went out to AWS, Cisco, CrowdStrike, Palo Alto Networks, and a short list of pentest firms in the new Cyber Verification Program. But here's the part most coverage missed: the model was leaked first. On March 26, 2026, Fortune reported that researchers Roy Paz (LayerX Security) and Alexandre Pauwels (University of Cambridge) had discovered an unsecured, publicly-searchable Anthropic CMS datastore exposing roughly 3,000 unpublished assets, including a draft blog post describing the then-unreleased model. The leaked draft called the model "Capybara" internally and described it as positioning above Opus as a new top tier. Fortune informed Anthropic, who locked the datastore, but by then the cat was out of the bag. The preview announcement three weeks later was in part a response to the leak timeline.

Then on March 31, Anthropic had a second incident: the full source code for Claude Code shipped as a 59.8 MB source map bundled into the public @anthropic-ai/claude-code v2.1.88 npm package. ~513K lines of unobfuscated TypeScript across 1,906 files, including feature flags for unreleased capabilities, got mirrored to GitHub and forked tens of thousands of times within hours. Two major SaaS misconfiguration / build-packaging incidents at the same company in under a week. Not a great look during a capability announcement cycle.

Investors with exposure to Anthropic, and Anthropic's enterprise customers, felt the leak aftermath too. The Mythos leak stirred price chatter around Anthropic-exposed holdings and triggered a wave of "is AI security maturing?" sell-side notes. Worth remembering when you're watching any vendor's big model launch: the announcement you read is rarely the first time the claims were public.

This post works from the published benchmarks, the leaked draft content, the announcement language, and comparison to prior Anthropic behavior on cybersecurity claims, and calls out what's probably real versus what's marketing.

Anthropic's claims are strong. "Autonomously chained Linux kernel vulnerabilities to achieve local privilege escalation." "Discovered thousands of novel security issues across major operating systems and browsers." This post covers what that actually means, what the limits are, and what it changes if you run a security program or a pentest firm.

What Anthropic is claiming

From the April 2026 announcement:

"Breakthrough cyber capability". Mythos 2 autonomously finds, understands, and exploits security vulnerabilities in complex codebases.
Discovered 595 previously unknown crashes across 1,000 open source repositories during an internal evaluation.
Chained multiple Linux kernel vulnerabilities into a working privilege escalation proof-of-concept in a controlled evaluation environment.
Identified a 17-year-old memory corruption bug in FreeBSD's NFS client code that was assigned CVE-2026-4747.
Capable of reading entire codebases, producing security reviews, and writing exploit code for the findings it surfaces.

Anthropic frames this as a safety-positive capability: defenders get the benefit, the model refuses to operate against unauthorized targets, and a tightly scoped partner program controls access.

What it's doing

Claude Mythos 2 is a frontier language model with an agent harness wrapped around it. The harness gives the model tool access to:

A sandboxed shell for running compiled code
A fuzzer harness (almost certainly something in the libFuzzer / AFL family)
Source code search tools (probably ripgrep + structural search)
Git, to navigate history and bisect
A patch-generation and verification loop

The model coordinates these tools to run what looks like the workflow of an unusually patient and thorough human researcher: read code, form hypotheses, test them, instrument the code, fuzz it, analyze crashes, construct minimal reproducers, attempt exploitation.
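The workflow described here can be pictured as a small tool-dispatch loop. The sketch below is a toy under loud assumptions: Anthropic has not published its harness, so the tool names, the `Step` and `Transcript` types, and the scripted planner standing in for the model are all hypothetical, and the tools return canned strings instead of touching a real sandbox or fuzzer.

```python
from dataclasses import dataclass, field
from typing import Callable, Dict, List, Optional, Tuple


@dataclass
class Step:
    """One planner decision: which tool to call and with what argument."""
    tool: str
    arg: str


@dataclass
class Transcript:
    """Accumulated (step, observation) pairs the planner can read back."""
    entries: List[Tuple[Step, str]] = field(default_factory=list)


# Hypothetical tools. A real harness would shell out to a sandbox, a fuzzer,
# a code search tool, git, and so on; these stubs only return canned strings.
def search_code(pattern: str) -> str:
    return f"match: parse_attrs() uses '{pattern}' without a length check"


def run_fuzzer(target: str) -> str:
    return f"crash in {target}: heap-buffer-overflow"


def minimize(crash: str) -> str:
    return f"minimal reproducer for [{crash}]: 12 bytes"


TOOLS: Dict[str, Callable[[str], str]] = {
    "search": search_code,
    "fuzz": run_fuzzer,
    "minimize": minimize,
}


def scripted_planner(transcript: Transcript) -> Optional[Step]:
    """Stand-in for the model: picks the next step from what it has seen."""
    seen = len(transcript.entries)
    if seen == 0:
        return Step("search", "memcpy")
    if seen == 1:
        return Step("fuzz", "parse_attrs")
    if seen == 2:
        # Feed the previous observation (the crash) into the next tool.
        return Step("minimize", transcript.entries[-1][1])
    return None  # nothing left to try


def run_agent(planner: Callable[[Transcript], Optional[Step]],
              max_steps: int = 10) -> Transcript:
    """Plan, act, observe, repeat until the planner stops or the budget ends."""
    transcript = Transcript()
    for _ in range(max_steps):
        step = planner(transcript)
        if step is None:
            break
        observation = TOOLS[step.tool](step.arg)
        transcript.entries.append((step, observation))
    return transcript


if __name__ == "__main__":
    for step, obs in run_agent(scripted_planner).entries:
        print(f"{step.tool}({step.arg!r}) -> {obs}")
```

The point of the shape, not the stubs: each observation is fed back into the next decision, and a step budget bounds how long the loop can run.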

This isn't new research in the "novel algorithm" sense. Google's OSS-Fuzz has been running continuous fuzzing against open source for years. Mayhem, Zellic, Oasis, and academic continuous-integration fuzzing projects have all claimed thousands of bugs. What changes with Mythos 2 is the quality of the *reasoning* wrapped around the fuzzing. The agent is demonstrably better at:

Knowing where to point the fuzzer
Writing targeted harnesses that reach deep code paths
Triaging crashes into reachable-from-attacker versus benign
Producing minimal reproducers and suggested patches
Explaining findings in language a developer can act on
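One classic way to produce a minimal reproducer is delta debugging: keep deleting chunks of the crashing input as long as the crash still reproduces. The sketch below is a simplified variant, included only to make the idea concrete. Nothing in Anthropic's materials says which minimization technique Mythos 2 uses, and the `crashes` predicate is a hypothetical stand-in for re-running the target on a candidate input.

```python
from typing import Callable


def ddmin(data: bytes, crashes: Callable[[bytes], bool]) -> bytes:
    """Shrink `data` while `crashes(data)` stays True (simplified delta debugging)."""
    if not crashes(data):
        raise ValueError("input does not reproduce the crash")
    n = 2  # number of chunks the input is currently split into
    while len(data) >= 2:
        chunk = max(1, len(data) // n)
        reduced = False
        for start in range(0, len(data), chunk):
            # Try the input with one chunk removed.
            candidate = data[:start] + data[start + chunk:]
            if candidate and crashes(candidate):
                data = candidate
                n = max(n - 1, 2)
                reduced = True
                break
        if not reduced:
            if chunk == 1:
                break  # no single byte can be removed; input is 1-minimal
            n = min(n * 2, len(data))  # retry with finer-grained chunks
    return data


if __name__ == "__main__":
    # Hypothetical crash condition: the parser dies when both bytes appear.
    def needs_both(buf: bytes) -> bool:
        return b"<" in buf and b">" in buf

    print(ddmin(b"aaaa<bbbb>cccc", needs_both))
```

In practice each call to the predicate is a full run of the target, so the cost of minimization is dominated by how fast the crash reproduces.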

The Linux kernel exploitation claim

The "autonomously chained Linux kernel vulnerabilities" claim got the most press. Read carefully, the Anthropic materials say:

The evaluation environment was a controlled Linux VM with a known set of installed packages.
The agent was told the goal was root on the machine.
The agent identified three separate memory-safety bugs, produced exploits for them, and chained them.

What the materials do NOT claim:

That the bugs were new. Anthropic hasn't disclosed whether these were previously known CVEs or novel discoveries.
That the chain is reliable across kernel configurations. Controlled-VM exploits often break against real-world targets.
That other agents couldn't do this. The lab setting may favor the Mythos harness specifically.

Still a meaningful capability. A model that can be handed a compiled system and produce a working local-root exploit, even a fragile one, in a controlled environment is well past what was public 18 months ago. The relevant question for defenders is: can attackers with similar compute and tooling do the same, against real systems they care about?

CVE-2026-4747: the 17-year-old FreeBSD NFS bug

The specific CVE mentioned is real and was disclosed April 10, 2026 per the FreeBSD Security Advisory FSA-2026:06.nfsclient. The bug is a stack buffer overflow in the client-side NFS_ATTRBIT_V_MASK parsing, reachable via a malicious NFS server responding with crafted attribute data. It had been sitting in the FreeBSD tree since around 2009.
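To make the bug class concrete: NFSv4 attribute bitmaps travel on the wire as an XDR variable-length array, a 4-byte big-endian word count followed by that many 32-bit words. One common shape for this kind of overflow is a client that copies as many words as the server claims into a fixed-size buffer. The Python sketch below shows the bounds check that guards against it. It is an illustration only, not the FreeBSD code or the actual patch; `MAX_WORDS` and `parse_attr_bitmap` are hypothetical names, and the cap of 3 words is an assumption chosen for the example.

```python
import struct
from typing import List

MAX_WORDS = 3  # assumed fixed capacity of the client's bitmap buffer


def parse_attr_bitmap(payload: bytes) -> List[int]:
    """Parse an XDR-style bitmap: uint32 count, then `count` uint32 words."""
    if len(payload) < 4:
        raise ValueError("truncated bitmap: missing word count")
    (count,) = struct.unpack_from(">I", payload, 0)
    # The server controls `count`; never trust it to fit the local buffer.
    if count > MAX_WORDS:
        raise ValueError(f"bitmap claims {count} words, limit is {MAX_WORDS}")
    needed = 4 + 4 * count
    if len(payload) < needed:
        raise ValueError("truncated bitmap: fewer words than claimed")
    return list(struct.unpack_from(f">{count}I", payload, 4))


if __name__ == "__main__":
    good = struct.pack(">I2I", 2, 0x18, 0x1)
    print(parse_attr_bitmap(good))

    hostile = struct.pack(">I", 1000) + b"\x00" * 16
    try:
        parse_attr_bitmap(hostile)
    except ValueError as exc:
        print("rejected:", exc)
```

A client could equally keep the first `MAX_WORDS` words and skip the rest; rejecting the reply is just the simplest safe behavior to show.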

Anthropic credits Claude Mythos 2 with identifying the issue. Whether the model "discovered" the bug or surfaced it from its training data is a distinction the Anthropic materials don't clarify. Either way, the fix shipped, the CVE was assigned, and FreeBSD users should update (FreeBSD 13.4-RELEASE-p5 or 14.2-RELEASE-p2).

What this means for pentest firms

The near-term impact on professional pentesting work is narrower than the press cycle suggests.

Mythos-class tools will commoditize parts of the work that were already commoditized.

Automated scanning and the "did you check the obvious" parts of a pentest have been trending toward automation for years. Nuclei, OWASP ZAP, commercial DAST/SAST, and newer AI-assisted entries like ProjectDiscovery's agents already cover this space. Mythos makes the automated coverage better and broader. But the ceiling on what automation catches is the same ceiling scanners have always hit: context, business logic, chained vulnerabilities across systems, and the specific authorization mistakes a particular organization makes.

The economic impact lands on low-tier bug bounty and cheap pentest shops.

Bug bounty triage already grapples with LLM-generated submissions: low-quality, plausible-sounding reports that fail reproduction. Mythos-class agents make the low end either better (agents that verify before submitting) or worse (more agents submitting unverified reports). Platforms like HackerOne and Bugcrowd are investing heavily in automated triage to handle the volume.

The cheap "run Nessus and write up the PDF" pentest market will compress hard. Buyers who only need compliance checkbox pentests can use AI-assisted tooling directly or retain a firm that wraps it in a report.

Real offensive security work still needs humans.

Business logic flaws, authorization chains specific to an application, the IDOR that requires understanding of workflow semantics, the OAuth misconfiguration that depends on knowing what the organization's partners are doing, the deployment pipeline abuse that chains across four different SaaS vendors. This work doesn't reduce to codebase fuzzing. Mythos makes a good assistant for these problems but the synthesis is the hard part.

Pentest firms that do real work will stay busy.

The regulatory curve is pushing hard in the direction of required annual penetration testing. PCI DSS 4.0 requires it. HIPAA's proposed Security Rule update would. CMMC 2.0 practically requires it. NYDFS 23 NYCRR 500 explicitly mandates it. These requirements all specify industry-accepted methodology with manual human work. Clients can't hand this to an AI and claim compliance. A firm delivering real human-led testing with AI-augmented tooling is positioned well.

What this means for security buyers

Your attack surface didn't get smaller.

Every useful defensive AI capability is available to attackers. If Mythos can find bugs, attacker-operated equivalents can find bugs. If the defensive side has a legitimate Mythos license and the 0-day broker has a cheaper bootleg agent, the asymmetry favors attackers who don't respect authorization.

The 0-day market got more liquid.

Brokers like Zerodium have historically paid $2M for full iOS zero-click chains, $500K for iMessage exploits, and $100K-$1M for browser sandbox escapes. If agents make those chains cheaper to produce, supply increases and prices drop. More supply at lower prices means more targeted attacks.

Defensive priorities should emphasize what AI is worst at.

Network segmentation that assumes compromise (lateral movement is still hard even for agents)
Identity-centric access control with continuous verification (Zero Trust proper, not VPN-with-identity)
Phishing-resistant MFA everywhere (agents write great phishing emails but FIDO2 is still unphishable)
Backup and recovery capability (ransomware is faster with agents; recovery time is your survivable window)
Assume any automated scan available to defense is in adversary hands too

Vendors who won't talk about what they run.

Ask your vendors whether they use AI agents in their security programs, and whether those agents have access to customer data. Many do. Some have disclosed it. Many haven't. This is a new vendor risk management question that didn't exist in 2023 questionnaires.

Anthropic's Cyber Verification Program

Not much public information on this yet. Based on the April 2026 preview materials and conversations with partners:

Intended to verify pentest firms and cybersecurity consultancies as authorized Mythos-class tool users.
Verified firms will get access to models tuned for offensive security work without the refusals Anthropic applies to general consumer access.
Initial cohort includes AWS, Palo Alto Networks, CrowdStrike, and other Fortune 500 security partners.
Upcoming cohort opens to pentest firms with documented track records. Exact criteria not yet published.

If you run a pentest firm, signing up for the verification program when it opens broadly is worth your time. If the program concentrates agent access among verified firms, it becomes a moat against the "AI replaced pentesting" narrative. Verified firms have the tools, unverified firms don't.

The defensive AI counterpart

The same model architecture that does offensive security research does defensive work:

Code review for security issues during PR review (GitHub Copilot Autofix, Semgrep Assistant, Snyk Code Fix)
Incident response triage and playbook execution (emerging)
Threat hunting and log analysis (Crowdsight, Chronicle)
Configuration review for infrastructure-as-code (Checkov, Trivy, Semgrep IaC)

Defenders should be running agents on their own environments continuously. The only defensive deployment that fails here is the one that assumes attackers don't have the same capability.

Where to watch this

Anthropic's Claude Mythos landing page: https://www.anthropic.com/glasswing
CVE-2026-4747 FreeBSD NFS: FreeBSD Security Advisory FSA-2026:06
Upcoming Anthropic Cyber Verification Program announcement expected Q3 2026
OpenAI's equivalent capability (different vendor, similar arc)
Google Project Zero public disclosures for agent-assisted findings

Hire Valtik Studios

We run penetration tests and compliance readiness engagements for clients who need human-led expert work augmented by modern tooling. Not vendor demos dressed up as pentests. We track what tools work, test them against our lab environments, and deploy the ones that produce real findings faster. If you need a pentest firm that's going to exploit what it finds in 2026, not run a scanner and call it security, get in touch.

Reach us at valtikstudios.com.
