CybersecurityNews reports threat actors automating zero-day discovery and exploitation at machine speed. From the offensive seat: a working pentester walks through what AI-augmented vuln research actually looks like in production, why short-term offense advantages will flip to long-term defensive economics within 5 years, and the 7-step priority list every defender should run in the next quarter. Includes specifics on Anthropic's Mythos / Glasswing program, GitHub Advanced Security with Copilot Autofix, Snyk DeepCode, Semgrep Pro, and what to do if your CI pipeline still doesn't run AI-driven SAST in 2026.
Founder of Valtik Studios. Penetration tester. Based in Connecticut, serving US mid-market.
# Zero-day discovery at machine speed: what changes when AI does the bug hunting
The question isn't whether AI agents can find zero-days anymore. CybersecurityNews reported this morning that threat actors are doing it in production, at machine speed, against real targets. The question is whether your defensive program is built for a threat model where the time from CVE-public to weaponized PoC is measured in hours, not weeks.
For most defenders, the honest answer is no.
I'll get to what actually changes for the working defender, and what to do about it. But first — let me describe what "machine-speed zero-day discovery" actually looks like from the operator's seat, because I've been running it on the offensive side for most of 2026 and the gap between the threat model and the popular discourse is enormous.
## What an AI-augmented vuln researcher does in a day
Setup, my own from this morning:
- An open-source target list, picked by checking GitHub for repos matching `language:typescript stars:>500 archived:false` with low recent commit activity. Maintenance gaps create vuln gaps. The list is 140 candidates.
- Claude Code, locally, with a CLAUDE.md instructing it to walk each repo, identify auth-relevant code paths, and flag any pattern matching the OWASP Top 10 plus a custom checklist of LLM-assisted-app footguns.
- Cursor open on the side, for fast diff review on whatever Claude Code surfaces.
- A scratch directory with each candidate's git history exported as JSON, so the AI can diff between versions and flag where security-sensitive code changed (regressions, where a security fix gets quietly undone in a later refactor, are behind something like 80% of the real bugs I find). A sketch of that export step follows this list.
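Here's a minimal sketch of that export step, assuming each candidate is already cloned under `./candidates/<name>`. The directory layout, the field choices, and the sensitive-path regex are illustrative assumptions, not my exact harness.

```typescript
import { execSync } from "node:child_process";
import { mkdirSync, readdirSync, writeFileSync } from "node:fs";
import { join } from "node:path";

interface Commit {
  hash: string;
  date: string;
  subject: string;
  files: string[];
}

// Rough heuristic for auth/security-relevant paths; tune per codebase.
const SENSITIVE = /auth|session|token|jwt|crypto|login|acl|permission/i;

function exportHistory(repoPath: string): Commit[] {
  // %x1e/%x1f are record/field separators unlikely to appear in commit messages.
  const raw = execSync(
    `git -C "${repoPath}" log --name-only --pretty=format:"%x1e%H%x1f%cI%x1f%s"`,
    { encoding: "utf8", maxBuffer: 64 * 1024 * 1024 },
  );
  return raw
    .split("\x1e")
    .filter(Boolean)
    .map((chunk) => {
      const [header, ...fileLines] = chunk.split("\n");
      const [hash, date, subject] = header.split("\x1f");
      return { hash, date, subject, files: fileLines.filter((f) => f.trim()) };
    })
    // Keep only commits that touched security-sensitive paths: those diffs
    // are what the model reviews for quietly-undone fixes.
    .filter((c) => c.files.some((f) => SENSITIVE.test(f)));
}

mkdirSync("./scratch", { recursive: true });
for (const name of readdirSync("./candidates")) {
  const commits = exportHistory(join("./candidates", name));
  writeFileSync(join("./scratch", `${name}.json`), JSON.stringify(commits, null, 2));
}
```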
Total runtime to triage 140 repos and produce a priority list of seven worth manual review: 3 hours and 14 minutes. Of those seven, four had a real flaw (authentication bypass via JWT alg confusion, SSRF in a webhook handler, prototype pollution in a query parser, NoSQL injection in a MongoDB filter passthrough). One I'd already known about from prior work. Three were genuinely new.
That's three previously-unknown vulnerabilities — let's not call them zero-days because they aren't in mass-deployed software, but they're the same class of finding — in three hours of mostly-passive supervision. Pre-AI, that workflow was a week per single repo, and most of the week was reading code that turned out to be irrelevant. The AI eats irrelevance for breakfast.
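For anyone who hasn't seen the JWT alg-confusion class up close, here's its shape. This is an illustrative sketch against the common `jsonwebtoken` package, not code from the actual finding; recent jsonwebtoken releases add guardrails against exactly this, but the pattern survives in older versions and hand-rolled verifiers.

```typescript
import jwt from "jsonwebtoken";

// The RSA public key is, by design, public (often served at a JWKS endpoint).
const RSA_PUBLIC_KEY = "-----BEGIN PUBLIC KEY-----\n...";

// VULNERABLE: no algorithm pinning. The verifier trusts the token header's
// `alg` claim, so an attacker can mint an HS256 token signed with the public
// key bytes as the HMAC secret, and verification succeeds.
function verifyTokenVulnerable(token: string) {
  return jwt.verify(token, RSA_PUBLIC_KEY);
}

// FIXED: pin the expected algorithm so HS256 tokens are rejected before any
// key material is touched.
function verifyTokenFixed(token: string) {
  return jwt.verify(token, RSA_PUBLIC_KEY, { algorithms: ["RS256"] });
}
```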
Now imagine the same workflow run by:
- A nation-state with 200 GPUs and a frontier model API budget.
- A mid-size ransomware operation that's figured out how to chain Claude or GPT against a fleet of patched-vs-unpatched binary diffs from Microsoft Patch Tuesday.
- A bug bounty hunter (ethical version of the same thing) who's willing to put 80 hours/week on a single program for 18 months.
The math changes. Vulnerability discovery used to be an artisan craft with a worldwide labor pool of maybe 5,000 capable practitioners. The AI multiplier doesn't just make those 5,000 more productive — it pulls in another 50,000 who would never have learned binary analysis or formal-methods code review but are perfectly capable of supervising an LLM that does the boring parts.
The defensive economics changed in 2026. Most defenders haven't internalized that yet.
## Where the asymmetry actually breaks
The narrative I see most often goes "AI is good for offense AND defense, so it's a wash." That's wrong, and it's wrong in a specific way that matters.
In the short term (next 12-18 months) AI heavily favors offense. Three reasons:
- The pipeline length is asymmetric. Offense is: find bug → write exploit → use it. Defense is: find bug → coordinate with vendor → vendor patches → release patch → defenders test → defenders deploy. The offense path is three steps, all of them at machine speed. The defense path is six steps, with at least three gated on humans. Even if the defense AI is just as good as the offense AI, the offense gets to skip the human bottlenecks.
- Patches are public templates. Microsoft Patch Tuesday's diffs ARE the offensive intel. Run a frontier model over each patch, get a diff explanation in plain English, ask "now write me an exploit for the unpatched version" — done. This was possible in 2024 with significant skill. In 2026 it's possible with significant patience. The AI handles the skill part.
- Detection is rate-limited by humans. AI can produce 1,000 high-confidence findings per day. Triaging 1,000 findings still takes a human SOC analyst at least 5 minutes each, and most teams have one analyst for every 10,000 alerts. The funnel constricts at human attention even when AI fills the funnel.
In the long term (2-5 years) AI favors defense. Three reasons:
- Defenders own the code. The same model the attacker uses to find your bug, you can run on your own codebase 24/7 in CI. Done right, every PR gets reviewed by an AI vuln-hunter before it lands. The asymmetry of "attacker has to find the bug, defender knows where the code is" inverts and works for defense for the first time.
- Continuous beats episodic. Manual pen-tests run quarterly or annually. AI-driven SAST runs on every commit. The frequency multiplier is 100-1000x. Bugs introduced in code get caught the day they're committed instead of the day after a breach.
- Auto-patching becomes possible. "Find the bug, write a patch, open a PR" is one prompt for a frontier coding model. The bottleneck is the human-review pipeline for the patch, but that's a problem of process, not capability. Companies that lean into AI patch-suggestion gain a multi-week reduction in dwell time. A sketch of the suggestion half of that loop follows this list.
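To make that concrete, here's a minimal sketch of the suggestion step, assuming the Anthropic SDK, an `ANTHROPIC_API_KEY` in the environment, and a finding object coming out of your SAST stage. The model name, prompt, and finding shape are illustrative.

```typescript
import Anthropic from "@anthropic-ai/sdk";
import { readFileSync } from "node:fs";

const client = new Anthropic(); // picks up ANTHROPIC_API_KEY from the environment

interface Finding {
  file: string;
  line: number;
  rule: string;
}

// Ask the model for a unified diff fixing one SAST finding. The diff goes
// into a human-reviewed PR, never straight to main: as argued above, review
// is the bottleneck, not generation.
async function suggestPatch(finding: Finding): Promise<string> {
  const source = readFileSync(finding.file, "utf8");
  const msg = await client.messages.create({
    model: "claude-sonnet-4-5", // illustrative; use whatever model you deploy
    max_tokens: 2048,
    messages: [
      {
        role: "user",
        content:
          `SAST rule ${finding.rule} flagged ${finding.file}:${finding.line}. ` +
          `Write a unified diff that fixes the flaw without changing behavior.\n\n` +
          source,
      },
    ],
  });
  return msg.content.map((b) => (b.type === "text" ? b.text : "")).join("");
}
```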
The transition from short-term offense advantage to long-term defense advantage isn't automatic. It happens for organizations that invest in continuous AI-augmented defense in the next 12 months. It does NOT happen for organizations that wait for the threat to mature before reacting. By the time you're reacting, the asymmetry has become the steady state and you're paying for an extra two-year gap.
## What's actually deployed today, on both sides
Let me name names so this isn't theoretical.
On offense, in production, today:
- Claude Mythos / Anthropic's Glasswing program — the legitimate side. Anthropic has partnered with AWS, Apple, Cisco, CrowdStrike, JPMorgan, Microsoft, Palo Alto Networks, NVIDIA, Linux Foundation, Broadcom, and SpaceX (per the announcement). The agreement is reportedly to run a frontier model with elevated cyber capabilities on partners' infrastructure to find vulnerabilities, then disclose them through Anthropic's responsible-disclosure pipeline. This is a billion-dollar program, and the published number of pre-disclosure 0-days the Mythos preview has found across major OS and browser code bases is "thousands."
- Multiple known state-sponsored operators are running fine-tuned open-weight models (Llama-3-70B, Qwen-2.5-Coder, DeepSeek-Coder-V3) for code review and exploit generation, per several public threat-intel reports. The fine-tunes lower model refusals and add domain-specific code patterns. This is sub-frontier capability but cheap to scale.
- Bug bounty platforms report a steep increase in low-effort high-volume submissions over the last year, attributed to AI-assisted hunters. HackerOne and Bugcrowd are both actively researching how to triage AI-generated reports without losing the legitimate signal.
On defense, in production, today:
- GitHub Advanced Security with Copilot Autofix rewrites vulnerable code automatically and opens a PR. Released GA in 2024, in widespread use across enterprise GitHub orgs.
- Snyk DeepCode and Semgrep Pro ship LLM-augmented SAST. Better signal than rule-based SAST, fewer false positives, comparable or higher catch rate.
- Anthropic's own internal tooling runs Claude against the codebase continuously. They've publicly described the workflow at developer events — every PR runs through a security-focused Claude before review.
- Most enterprise SOCs are still on rule-based SIEMs without LLM augmentation. This is the gap.
## What to do, in the order that matters
If you're a defender at a small-to-mid org, here's the priority list. None of these are speculative; they're all deployable in the next quarter.
1. Add AI-driven SAST to CI. Pick one: Semgrep Pro, Snyk DeepCode, GitHub Advanced Security. Wire it to fail PRs on findings above medium severity. The friction is real for the first 2 weeks while you tune false positives. After that, it's a permanent reduction in dwell time. A minimal gate sketch follows this list.
2. Make Patch Tuesday a hard SLA. If you're not deploying Microsoft + Google + Adobe + browser patches within 7 days of release, you're already in the offensive AI's sweet spot. The patch IS the exploit roadmap. 7-day SLA is achievable for most fleets if you commit. Some shops do same-day. Pick a number, write it down, hold to it.
3. Continuous AI-driven internal pentest. Either hire it (we do it; so do a half-dozen good firms) or build it (Claude Code or Cursor pointed at your repo, with a custom prompt instructing it to red-team). Run it monthly minimum, weekly for high-risk apps. The cost is small relative to what you save in incident response when the same AI catches the bug 3 weeks before an attacker would have.
4. Threat-model for AI-assisted attackers specifically. If your auth flow has any feature designed to slow down a human attacker (CAPTCHAs, rate limits tuned for human typing speed, security questions a human would have to research), that defense is now defeated by GPT-4-level capability. Re-threat-model assuming the attacker is supervising 50 parallel exploitation attempts at machine speed.
5. Invest in detection signal that survives AI scale. Behavioral baseline detection (UEBA), strong network egress controls (an AI-assisted attacker still has to exfiltrate data, and that's still detectable by network anomaly), and identity-centric authorization (zero-trust, session-aware authorization) all hold up under AI-scale pressure. Signature-based detection alone does not.
6. Train your humans on AI-assisted defensive work. Your analysts need to be using the same tools the attackers are. Not because they'll write exploits but because the cognitive lift of "what would an AI agent try here?" only develops with practice. Buy your team Claude Pro or Cursor subscriptions. Make them use it on real work. Watch them get faster at the things that matter.
7. Subscribe to the threat intel feeds that matter. Mandiant, Recorded Future, and Anthropic's own threat intelligence releases (which have been excellent, particularly on agentic-AI threat patterns) are now must-reads. The signal-to-noise ratio is high.
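The gate from step 1 is genuinely small. Here's a sketch against the semgrep CLI's JSON output; the flags and JSON fields below match semgrep's documented format as far as I know, but verify them against the version you pin.

```typescript
import { execSync } from "node:child_process";

interface SemgrepResult {
  check_id: string;
  path: string;
  start: { line: number };
  extra: { severity: string; message: string };
}

// Run the scan; semgrep exits 0 even when it has findings (without --error),
// so execSync won't throw on a dirty repo.
const raw = execSync("semgrep scan --config auto --json", {
  encoding: "utf8",
  maxBuffer: 256 * 1024 * 1024,
});
const results: SemgrepResult[] = JSON.parse(raw).results;

// Semgrep severities are INFO/WARNING/ERROR; treat ERROR as "above medium".
const blocking = results.filter((r) => r.extra.severity === "ERROR");

for (const r of blocking) {
  console.error(`${r.path}:${r.start.line} [${r.check_id}] ${r.extra.message}`);
}

if (blocking.length > 0) {
  console.error(`${blocking.length} blocking finding(s); failing the build.`);
  process.exit(1); // non-zero exit fails the PR check
}
```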
## Where Valtik fits
I run Valtik because I think the defensive side of AI-augmented security is enormously under-resourced relative to offensive capability — and because I'm a working pentester who's used Claude on every engagement for the last 14 months and wants to bring that capability to small and mid-sized clients who can't afford a $200K/year senior security engineer.
If you're an SMB and you've read this far thinking "we don't even have basic CI, let alone AI-driven SAST," that's exactly who I work with. Three-tier engagements: $500 baseline external audit, $1,500 platform-specific deep dive (Supabase, Clerk, Auth0, Vercel, AWS — pick your stack), $3,500 full-stack with continuous monitoring add-on. I do the work, I write the report, I tell you the truth about your exposure in plain English.
The Cyber Verification Program at Anthropic is something I'm pursuing because legitimate independent practitioners need adjusted access to frontier models for real defensive work. If you're another small firm in the same boat — defender side, want to use AI capabilities for real engagements, hitting the standard policy guardrails — I want to hear from you. The only way the long-term defense advantage materializes is if the defensive practitioner community ships the same caliber of AI tooling the attackers are already shipping.
## The honest bottom line
I don't think the sky is falling for defenders. I think the next 12 months are a window where the offensive side has a real edge, and the orgs that move first on continuous AI-augmented defense will have a durable advantage when the long-term defensive economics come around. The orgs that don't move will end up in a 5-year period of accelerated breach disclosures.
Choose now, not when the headline is your own incident.
If you want a 30-minute call to figure out where to start — no sales pitch, just a conversation about your stack and your real risk — email phil@valtikstudios.com with "AI threat model" in the subject. I'll respond within a day.
Move now.
