Delafosse Olivier

Posted on Mar 20 • Originally published at coreprose.com

How Claude Opus 4.6 Found 22 Firefox Vulnerabilities in 2 Weeks

#ai #machinelearning #llm #programming

Originally published on CoreProse KB-incidents

Anthropic and Mozilla ran a live‑fire experiment: could an AI model find serious, previously unknown vulnerabilities in one of the most tested browsers on earth?

In a focused two‑week sprint in January 2026, Claude Opus 4.6 uncovered 22 new Firefox vulnerabilities, all confirmed by Mozilla and assigned CVEs. Fourteen were rated high severity, seven moderate, and one low, with most fixes shipped to hundreds of millions of users in Firefox 148.[5][6][8]

💡 Why this matters: Those 14 high‑severity bugs represent almost a fifth of all high‑severity Firefox vulnerabilities remediated in 2025, concentrated into a single AI‑augmented engagement.[5][6][8] That discovery rate forces security leaders to rethink how software will be attacked—and defended—over the next few years.

1. What Anthropic and Mozilla Actually Achieved

Instead of testing Claude Opus 4.6 on synthetic benchmarks, Anthropic embedded it into a real security partnership with Mozilla, giving it targeted access to the Firefox codebase as a production‑grade testbed.[6][8]

Over two weeks in January 2026, Claude identified 22 previously unknown Firefox vulnerabilities that Mozilla triaged, confirmed, and assigned CVEs.[5][6][8]

14 high‑severity
7 moderate
1 low[5][8]

📊 Impact in context

22 new Firefox CVEs in 14 days
14 high‑severity issues—almost 20% of all high‑severity Firefox bugs patched in 2025[3][5][6][8]
Claude reported more Firefox vulnerabilities in February 2026 than all sources combined reported in any single month of 2025[6][7][9]
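The proportions behind these figures can be checked with a few lines of arithmetic. The sketch below uses only the counts reported above. The implied 2025 total is an inference from the phrase "almost 20%", not a number Mozilla published, and the variable names are illustrative.

```python
# Severity counts for the 22 CVEs, as reported in the article.
severity_counts = {"high": 14, "moderate": 7, "low": 1}

total = sum(severity_counts.values())
assert total == 22  # the three buckets account for every CVE

# Share of each severity bucket within this engagement.
shares = {level: count / total for level, count in severity_counts.items()}

# "Almost 20%" means the share is slightly below 0.20, so the 2025
# high-severity total must be somewhat above 14 / 0.20. This is a floor
# inferred from the wording, not a published figure.
implied_2025_high_floor = severity_counts["high"] / 0.20

for level, share in shares.items():
    print(f"{level:<9}{share:.1%}")
print(f"Implied 2025 high-severity total: more than {implied_2025_high_floor:.0f}")
```

Read this way, high-severity issues make up roughly two thirds of the engagement's CVEs, and the "almost 20%" comparison implies a 2025 baseline of a little over 70 high-severity fixes.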

Mozilla shipped fixes for most of these vulnerabilities in Firefox 148, with the rest scheduled for upcoming releases, showing that AI‑found bugs can move rapidly from discovery to remediation at internet scale.[5][6][8]

💼 Strategic takeaway: In a codebase that has been fuzzed, audited, and hardened for decades, one AI‑augmented two‑week sprint surfaced high‑severity bugs equal to almost a fifth of those Mozilla remediated in all of 2025.[3][5][6][9] For most enterprise software, the untouched backlog is likely much larger.

2. How Claude Opus 4.6 Actually Found the Bugs

Claude’s first major win came quickly: within 20 minutes, it identified a use‑after‑free vulnerability in Firefox’s SpiderMonkey JavaScript engine, later confirmed and patched by Mozilla.[3][5][8] That early success evolved into a systematic pipeline.

flowchart LR
A[Firefox codebase] --> B[Claude Opus 4.6]
B --> C[Crash inputs & hypotheses]
C --> D[Human validation & VMs]
D --> E[Bugzilla reports]
E --> F[Mozilla triage]
F --> G[Firefox patches]
style B fill:#22c55e,color:#fff
style G fill:#22c55e,color:#fff

Over the two‑week engagement, Claude:[3][5][8]

Scanned nearly 6,000 C++ files
Generated dozens of crashing inputs during early triage
Contributed to 112 unique bug reports
Proposed candidate patches that Mozilla engineers sometimes used as starting points[3][6][9]

💡 Quality, not just quantity

Mozilla engineers noted that, unlike most low‑quality AI bug reports, Claude’s submissions typically included:[6][8][9]

Minimized test cases
Detailed, step‑by‑step proofs of concept
Candidate fixes mapped to specific files and functions

This sharply reduced validation workload, letting the Firefox security team reproduce and assess issues far faster than with typical external reports.[6][8][9]
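A team receiving AI-assisted reports could encode those three elements as a simple completeness check before a report reaches a human triager. The following is a minimal sketch under that assumption; the `BugReport` class and its field names are hypothetical and do not reflect Mozilla's Bugzilla schema or Anthropic's tooling.

```python
from dataclasses import dataclass


@dataclass
class BugReport:
    """Hypothetical report shape; field names are illustrative only."""
    title: str
    minimized_test_case: str = ""
    proof_of_concept_steps: tuple[str, ...] = ()
    candidate_fix_location: str = ""  # e.g. "path/to/file.cpp:function_name"


def missing_elements(report: BugReport) -> list[str]:
    """Return which of the three quality elements a report lacks."""
    missing = []
    if not report.minimized_test_case.strip():
        missing.append("minimized test case")
    if not report.proof_of_concept_steps:
        missing.append("step-by-step proof of concept")
    if not report.candidate_fix_location.strip():
        missing.append("candidate fix location")
    return missing


# Example: a report with only a title is flagged on all three counts.
print(missing_elements(BugReport(title="Crash when parsing input")))
```

A check like this does not judge whether a report is correct. It only filters out submissions that would cost a triager time to reconstruct.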

Some vulnerabilities overlapped with issues reachable by existing fuzzers, while others were new classes of logic errors that fuzzing had failed to expose—even in heavily fuzzed code paths.[3][6][9]
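One reason a logic error can survive heavy fuzzing is that it may never crash, so a fuzzer that only watches for crashes has no signal to report. The toy below illustrates that general point. It is a hypothetical state machine, not a model of any Firefox bug, and real fuzzers add sanitizers and coverage guidance that this sketch omits.

```python
import random


class Session:
    """Toy state machine with a deliberate logic bug (hypothetical)."""

    def __init__(self) -> None:
        self.logged_in = False
        self.is_admin = False

    def apply(self, op: str) -> None:
        if op == "login":
            self.logged_in = True
        elif op == "elevate" and self.logged_in:
            self.is_admin = True
        elif op == "logout":
            # BUG: the admin flag is not cleared on logout.
            self.logged_in = False


def run(ops: list[str]) -> Session:
    session = Session()
    for op in ops:
        session.apply(op)
    return session


def crash_oracle(ops: list[str]) -> bool:
    """True if the input raises, which is all a crash-only fuzzer observes."""
    try:
        run(ops)
    except Exception:
        return True
    return False


def invariant_oracle(ops: list[str]) -> bool:
    """True if a security invariant breaks: admin rights without a login."""
    session = run(ops)
    return session.is_admin and not session.logged_in


rng = random.Random(0)  # fixed seed so the run is repeatable
inputs = [
    [rng.choice(["login", "elevate", "logout"]) for _ in range(6)]
    for _ in range(500)
]
crashes = sum(crash_oracle(ops) for ops in inputs)
violations = sum(invariant_oracle(ops) for ops in inputs)
print(f"crashes: {crashes}, invariant violations: {violations}")
```

The same random inputs reach the buggy state many times, yet the crash oracle stays silent. Finding the flaw requires knowing which invariant should hold, which is the kind of reasoning about intent that code review, human or AI-assisted, can supply.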

⚠️ Key implication: If Firefox, one of the most continuously fuzzed and reviewed browser codebases, still harbored this many serious issues, then typical enterprise applications—with less rigorous testing—almost certainly contain a larger, AI‑discoverable vulnerability backlog.[3][6][8][9]

3. What the Firefox Results Signal for Software Security

Anthropic positions the Firefox collaboration as evidence that modern AI models can independently identify high‑severity vulnerabilities in mature, complex software faster than traditional techniques typically do.[6][9][10]

Mozilla’s data supports this: Claude’s 22 CVEs in February 2026 exceeded the monthly vulnerability count for any month in 2025, across all human and automated sources.[6][7][9] PCMag noted that Claude effectively found more high‑severity bugs in two weeks than human teams typically uncover over much longer periods.[9][10]

📊 Beyond Firefox

Anthropic reports that, beyond this project, Claude has surfaced more than 500 zero‑day vulnerabilities in other well‑tested open‑source software, focusing on complex, security‑sensitive components.[6][8]

Mashable notes that open‑source projects are particularly well‑suited to AI analysis because models can correlate:[2][4][6]

Full source code
Rich version history
Historical CVEs and patches

That combination lets AI learn patterns of insecure coding and configuration that static tools miss.[2][4][6]

Mozilla engineers observed that Claude’s Firefox findings fell into two categories:[3][6][9]

Bugs overlapping with fuzzing‑accessible paths
Novel logic and state‑handling errors beyond fuzzer coverage

💡 Mini‑conclusion: AI analysis is emerging as a powerful complement to fuzzing, SAST, and manual review—not a replacement. Security programs that orchestrate these methods together will gain the most from AI‑accelerated discovery.[3][6][9][10]

4. Limits, Exploit Tests, and Dual‑Use Concerns

After confirming the 22 CVEs, Anthropic asked a harder question: could Claude also weaponize its own discoveries? Researchers provided Claude with details of the new Firefox bugs and asked it to craft practical exploits, including attempts to read and write local files on a target system, emulating a real attacker.[2][5][9]

flowchart TB
A[Discovered CVEs] --> B[Claude exploit attempts]
B --> C[Task verifier]
C --> D{Working exploit?}
D -->|No| B
D -->|Yes| E[Exploit sample]
style C fill:#f59e0b,color:#000
style E fill:#ef4444,color:#fff

Despite several hundred exploit‑generation trials and about $4,000 in API credits, Claude produced reliable, end‑to‑end exploits in only two cases.[5][8][9] One targeted CVE‑2026‑2796, a JIT miscompilation in the WebAssembly component of Firefox’s JavaScript engine.[5][9]
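The control flow in the diagram above is a bounded generate-and-verify loop: each attempt goes to an independent verifier, and the loop stops on the first accepted result or when the budget runs out. The sketch below shows only that control flow with benign stand-ins. The function and parameter names are hypothetical, and nothing here reflects Anthropic's actual harness.

```python
from typing import Callable, Optional


def attempt_until_verified(
    generate: Callable[[int], str],
    verify: Callable[[str], bool],
    max_trials: int,
) -> tuple[Optional[str], int]:
    """Call generate, then verify, until a candidate passes or the budget ends.

    Returns (accepted_candidate_or_None, trials_used).
    """
    for trial in range(1, max_trials + 1):
        candidate = generate(trial)
        if verify(candidate):
            return candidate, trial
    return None, max_trials


# Benign stand-ins: candidates are squared integers rendered as strings,
# and the verifier accepts exactly one of them.
result, used = attempt_until_verified(
    generate=lambda trial: str(trial * trial),
    verify=lambda candidate: candidate == "49",
    max_trials=10,
)
print(result, used)
```

Keeping the verifier separate from the generator matters in this design: success is decided by an external check, not by the generator's own claim, and the trial budget caps total cost.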

Mashable highlighted this asymmetry: Claude was highly effective at finding vulnerabilities but comparatively weak at automating full exploit chains, suggesting that—for now—AI is more beneficial for defense than for fully automated offensive operations.[2][4][5]

⚠️ But not purely defensive

InfoQ stresses the dual‑use reality: with enough steering and iteration, Anthropic did obtain working exploits in two cases.[5][8][9] The same discovery capabilities that help defenders can also accelerate attacker workflows.

Anthropic situates the Firefox work within its Frontier Red Team efforts and policy commitments, arguing that:[6][8][9]

AI‑assisted research should go through responsible disclosure channels
Partnerships with maintainers (like Mozilla) are essential to keep net impact positive
Safety controls such as task verifiers and rate limits must evolve with capability

Security commentators expect that as models improve, adversaries will adopt similar tools for vulnerability triage and exploit research, making time‑to‑patch and continuous monitoring even more critical risk metrics.[5][9][10]

5. How Security Teams Can Operationalize These Lessons

The Firefox experiment is not a stunt. Anthropic is productizing its techniques through Claude Code Security, a service that scans code for vulnerabilities and suggests targeted fixes for human review.[6][8][10]

💼 Partnership as a pattern

The Mozilla engagement showcased an operating model security teams can emulate:[6][8]

Tight collaboration between AI researchers and maintainers
Shared criteria for what counts as a reportable bug
Rapid patch deployment once issues are confirmed

This pattern can be replicated inside enterprises by pairing AI tools with in‑house AppSec engineers and clear triage rules.

flowchart LR
A[Codebase] --> B[AI security scan]
B --> C[Findings list]
C --> D[Human triage]
D --> E[Patch dev]
E --> F[CI/CD & release]
F --> G[Monitoring]
style B fill:#22c55e,color:#fff
style D fill:#0ea5e9,color:#fff

Given that Firefox is far more fuzzed and reviewed than typical enterprise applications, Anthropic’s results imply that internal codebases—especially legacy C/C++ and complex JavaScript—are prime candidates for AI‑assisted review.[3][5][8]

Practical first steps for organizations include:[6][8][9]

Targeting high‑risk components (parsers, auth flows, memory‑unsafe modules) for AI‑assisted audits
Using Claude‑style tools to generate minimized test cases and candidate patches
Integrating AI findings into CI pipelines and secure coding playbooks
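As one possible shape for the CI step, a pipeline could block a release only when a human-confirmed finding has a blocking severity. This is a minimal sketch that assumes findings arrive as a JSON list with `id`, `severity`, and `confirmed` keys. That schema is invented for illustration and is not the output format of Claude Code Security or any other tool.

```python
import json


def ci_exit_code(findings_json: str, blocking: tuple[str, ...] = ("high",)) -> int:
    """Return 1 if any confirmed finding has a blocking severity, else 0."""
    findings = json.loads(findings_json)
    blockers = [
        f for f in findings
        if f.get("confirmed") and f.get("severity") in blocking
    ]
    for finding in blockers:
        print(f"BLOCKING: {finding.get('id', '<no id>')} ({finding['severity']})")
    return 1 if blockers else 0


# Example input using the assumed schema.
sample = json.dumps([
    {"id": "F-1", "severity": "high", "confirmed": True},
    {"id": "F-2", "severity": "high", "confirmed": False},
    {"id": "F-3", "severity": "low", "confirmed": True},
])
print(ci_exit_code(sample))
```

In a CI job the return value would be passed to `sys.exit`. Gating only on confirmed findings keeps unreviewed model output from blocking releases on false positives.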

⚠️ Triage remains essential

AI‑generated bug reports can still include false positives or low‑impact issues. The Firefox case underlines the need for:[6][8][9][10]

A human triage layer staffed by experienced security engineers
Severity scoring aligned with business risk
Governance that treats AI as an accelerator, not a replacement, for secure development practices
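The first two requirements can be combined into a small prioritization step: only findings a human has confirmed enter the queue, and each is ranked by severity weighted by how critical the affected asset is. The sketch below is one hedged way to express that. The weights, the 1 to 5 criticality scale, and the `Finding` fields are all assumptions to be replaced by an organization's own risk model.

```python
from dataclasses import dataclass

# Assumed weights; a real program would derive these from its risk policy.
SEVERITY_WEIGHT = {"high": 3, "moderate": 2, "low": 1}


@dataclass(frozen=True)
class Finding:
    identifier: str
    severity: str           # "high", "moderate", or "low"
    asset_criticality: int  # assumed scale: 1 (internal tool) to 5 (internet-facing)
    confirmed: bool         # set by a human triager after reproduction


def triage_queue(findings: list[Finding]) -> list[Finding]:
    """Order confirmed findings by severity weight times asset criticality."""
    confirmed = [f for f in findings if f.confirmed]
    return sorted(
        confirmed,
        key=lambda f: SEVERITY_WEIGHT[f.severity] * f.asset_criticality,
        reverse=True,
    )


queue = triage_queue([
    Finding("A", "low", 5, True),
    Finding("B", "high", 2, True),
    Finding("C", "high", 5, False),  # unconfirmed, so excluded
    Finding("D", "moderate", 4, True),
])
print([f.identifier for f in queue])
```

Under these assumed weights, a moderate issue on a critical asset outranks a high-severity issue on a minor one, which is the practical meaning of aligning severity with business risk.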

Mini‑conclusion: Mozilla’s experience suggests that embedding AI into established security workflows can dramatically expand coverage and speed without forcing wholesale changes to governance or release processes.[6][9][10]

Conclusion: A Blueprint for AI‑Augmented Defense

The Anthropic–Mozilla experiment shows that Claude Opus 4.6 can uncover high‑severity vulnerabilities in a world‑class, heavily tested browser at speeds humans cannot match: 22 Firefox CVEs, including 14 high‑severity issues, found in two weeks and rapidly patched for hundreds of millions of users.[3][5][6][8][9]

Security leaders should treat this as a blueprint. Pilot AI‑assisted code review on your most critical applications. Embed model findings into existing triage and patch workflows. Establish strong disclosure channels with vendors and open‑source maintainers. As AI makes vulnerability discovery cheaper and faster for everyone—including adversaries—organizations that operationalize these capabilities now will be positioned to benefit before attackers do.[5][6][9][10]

Sources & References (10)

[1] Anthropic’s Claude found 22 vulnerabilities in Firefox in two weeks. Fourteen of the security vulnerabilities detected were classified as high risk....

[2] Claude AI discovered 22 Firefox flaws. Here's how many it figured out how to exploit. Claude AI discovered nearly two dozen vulnerabilities in Firefox, the Mozilla web browser. Anthropic teamed up with Mozilla to test the security of its browser, allowing its AI tool to probe for vuln...

[3] How Claude Opus 4.6 Discovered 22 CVEs in the World’s Most Tested Browser. Twenty minutes. That’s how long it took Claude Opus 4.6 to find a Use-After-Free vulnerability in Firefox’s SpiderMonkey JavaScript engine. By the time Anthropic’s researchers had validated and filed...

[4] Claude AI discovered 22 Firefox flaws. Here's how many it figured out how to exploit. Claude AI discovered nearly two dozen vulnerabilities in Firefox, the Mozilla web browser. Anthropic teamed up with Mozilla to test the security of its browser, allowing its AI tool to probe for vuln...

[5] Anthropic Finds 22 Firefox Vulnerabilities Using Claude Opus 4.6 AI Model. Anthropic on Friday said it discovered 22 new security vulnerabilities in the Firefox web browser as part of a security partnership with Mozilla. Of these, 14 have been classified as high, seven have...

[6] Partnering with Mozilla to improve Firefox’s security. Policy, Mar 6, 2026. AI models can now independently identify high-severity vulnerabilities in complex software. As we recently documented, Cla...

[7] Opus 4.6 found 22 vulnerabilities in Firefox in two weeks. Blog post: Partnering with Mozilla to improve Firefox’s security: https://www.anthropic.com/news/mozilla-firefox-security...

[8] Anthropic Claude Opus AI model discovers 22 Firefox bugs (https://securityaffairs.com/189131/ai/anthropic-claude-opus-ai-model-discovers-22-firefox-bugs.html). Pierluigi Paganini, March 09, 2026. Anthropic used Claude Opus 4.6 to identify 22 Firefox vulnerabilities, most of which were high severity, all of which were fixed in Firefox 148, released in Januar...

[9] AI Model Discovers 22 Firefox Vulnerabilities in Two Weeks. Steef-Jan Wiggers, Mar 19, 2026. Recently, Claude Opus 4.6 found 22 security vulnerabilities in Firefox in just two weeks. Fourteen earned high-severity classifications, which is almost 20% of all...

[10] Anthropic's Claude Finds More Bugs in Firefox Than Human Teams | PCMag. As more and more industries wake up to the threat of AI-based automation, new data from browser maker Mozilla shows that AI is proving proficient at identifying cybersecurity vulnerabilities in popula...
