Toni Antunovic

Posted on • Originally published at lucidshark.com
Project Glasswing Found 35 CVEs in March. Here Is the Quality Gate You Need Before AI Agents Touch Your Codebase.



In January 2026, Anthropic's Project Glasswing found 6 real CVEs in production software using AI-driven vulnerability research. In February, that number climbed to 15. In March, it hit 35.

These are not theoretical findings. They are confirmed, submitted, acknowledged vulnerabilities in codebases that millions of developers depend on. Glasswing is finding them faster than any human security team can patch them.

The implication that the AI security community has been slow to say out loud: if an AI system can find 35 zero-days per month in production software, then AI-generated code, written at scale, shipped without local quality gates, is the most attractive attack surface on the internet right now.

This post is about what you do about that on your end, before your code ships.

⚠️
The Numbers: Project Glasswing's CVE discovery rate grew 483% from January to March 2026 (6 to 35 per month). The acceleration curve is not slowing. Security researchers expect this capability to be commoditized and available to threat actors within 18 months.

What Project Glasswing Actually Does

Glasswing is Anthropic's internal AI security research system. Unlike traditional static analysis tools, it does not match patterns. It reasons about code semantics: what is the intent of this function, what assumptions does it make about its inputs, and where do those assumptions break down under adversarial conditions?

The system uses a multi-agent pipeline: one agent reads documentation and builds a threat model, a second agent explores the codebase with structured shell access (similar to how N-Day-Bench works, which appeared on Hacker News this week with 86 points), and a third agent scores and validates findings.

The reason Glasswing finds more vulnerabilities than traditional SAST tools is not raw intelligence. It is the combination of semantic reasoning with the ability to explore cross-file and cross-service data flows that rule-based tools cannot follow. A SQL injection that passes through three helper functions before reaching the database is invisible to a simple grep. Glasswing follows the taint.
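To make the cross-file point concrete, here is a hypothetical sketch (the function names are invented for illustration) of an injection that no single-function pattern match will flag — each helper looks harmless in isolation, and only following the data flow end to end reveals the flaw:

```python
# Hypothetical example: SQL injection crossing three helper functions.
# A grep for "execute(" plus string concatenation sees nothing wrong in
# any single function; only taint tracking connects source to sink.

def normalize(raw: str) -> str:
    # Helper 1: trims whitespace -- does NOT sanitize
    return raw.strip()

def build_filter(username: str) -> str:
    # Helper 2: builds a WHERE clause by string interpolation
    return f"username = '{username}'"

def build_query(raw_input: str) -> str:
    # Helper 3: assembles the final statement from the helpers above
    return "SELECT * FROM users WHERE " + build_filter(normalize(raw_input))

# Adversarial input survives all three hops intact:
query = build_query("' OR '1'='1")
# query is now: SELECT * FROM users WHERE username = '' OR '1'='1'
```

A taint rule like the one below treats `raw_input` as the source and the query sink as tainted regardless of how many intermediate functions the value passes through.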

The Attack Surface That Glasswing Reveals

Here is the uncomfortable inference. Every CVE Glasswing finds is a class of vulnerability that:

  • Existed in code written by professional developers who were trying to write secure code

  • Was not caught by existing SAST tools, peer review, or CI/CD pipelines

  • Is now discoverable by an AI system in hours

AI coding agents generate code at 10-100x the velocity of a solo developer. They make the same classes of mistakes as human developers because they were trained on human code. The difference is volume. A developer who introduces one logic flaw per 500 lines of code does not get cleaner at 100x velocity: the flaw rate per line stays constant, so the output now carries roughly 100 flaws in the time it once took to introduce one.

The quality gate that was barely sufficient for human velocity is nowhere near sufficient for agent velocity.

The Core Insight: Glasswing's capability is offense-side validation that the vulnerability classes it finds are real, discoverable, and exploitable. Your defense needs to catch those same classes before they reach production. The gap between "agent wrote it" and "Glasswing found it" is your attack window.

The Five Checks That Close the Gap

These are not theoretical. They are the checks that catch the specific vulnerability classes that appear most frequently in Glasswing's disclosed findings.

1. Semantic Taint Tracking for Injection Flaws

Glasswing finds SQL injection, command injection, and path traversal by following data flow from user input to dangerous sinks. Your SAST setup should do the same. Semgrep's taint mode handles this for most languages:

```yaml
# .semgrep/taint-injection.yml
rules:
  - id: user-input-to-sql-sink
    mode: taint
    pattern-sources:
      - pattern: request.args.get(...)
      - pattern: request.form.get(...)
      - pattern: request.json.get(...)
    pattern-sinks:
      - pattern: db.execute(...)
      - pattern: cursor.execute(...)
      - pattern: $CONN.execute(...)
    pattern-sanitizers:
      - pattern: sqlalchemy.text(...)
    message: "Unsanitized user input reaches SQL sink"
    languages: [python]
    severity: ERROR
```

Run this as a pre-commit check. Every commit from your AI coding agent gets taint analysis before it touches your branch.

2. Authentication Bypass Pattern Detection

A consistent finding class in Glasswing disclosures is authentication checks that can be bypassed through type confusion, parameter pollution, or logic inversions. The AI agent that wrote the auth check was not malicious. It was probabilistic. The check that looks right in isolation fails under adversarial input.

```python
# Common auth bypass patterns an agent generates
# Pattern: checking truthy value instead of strict equality
if user_role:             # WRONG: any non-empty role passes
    allow_access()

if user_role == "admin":  # RIGHT: explicit check
    allow_access()
```

```yaml
# Semgrep rule to catch the pattern
rules:
  - id: weak-auth-truthy-check
    patterns:
      - pattern: |
          if $VAR:
              $ALLOW(...)
      - metavariable-regex:
          metavariable: $VAR
          regex: .*(role|auth|admin|permission|access).*
    message: "Possible weak auth check: $VAR is truthy but not compared to an expected value"
    languages: [python]
    severity: WARNING
```

3. Secrets in Scope at Commit Time

AI agents frequently pull credentials into scope for convenience, then commit them. Glasswing has disclosed vulnerabilities that were directly enabled by hardcoded credentials in AI-generated scaffolding code. This is the simplest check and the one teams skip most often.

```shell
# Install once, runs forever
pip install detect-secrets
detect-secrets scan --all-files > .secrets.baseline
```

```yaml
# Add to .pre-commit-config.yaml
- repo: https://github.com/Yelp/detect-secrets
  rev: v1.4.0
  hooks:
    - id: detect-secrets
      args: ['--baseline', '.secrets.baseline']
```

The baseline file is checked in. New secrets trigger a failure. Existing (approved) patterns are ignored. Zero false positives for secrets your team has explicitly reviewed.
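The mechanism is simple enough to sketch. This is a hypothetical, heavily simplified re-implementation of the baseline idea — not detect-secrets' actual code — using entropy as the detection heuristic: hash every approved finding, then fail only on high-entropy strings whose hash is not in the baseline.

```python
# Simplified sketch of a baseline-driven secret scan (illustrative only;
# detect-secrets uses a richer plugin system than this).
import hashlib
import math
import re

def shannon_entropy(s: str) -> float:
    # Bits per character; long random tokens score high, words score low
    freq = {c: s.count(c) / len(s) for c in set(s)}
    return -sum(p * math.log2(p) for p in freq.values())

def find_candidates(text: str) -> list[str]:
    # Quoted token-like strings of 20+ chars with high entropy
    tokens = re.findall(r"['\"]([A-Za-z0-9+/=_\-]{20,})['\"]", text)
    return [t for t in tokens if shannon_entropy(t) > 4.0]

def new_secrets(text: str, baseline: set[str]) -> list[str]:
    # Flag only candidates whose hash is NOT in the approved baseline
    return [t for t in find_candidates(text)
            if hashlib.sha256(t.encode()).hexdigest() not in baseline]
```

Approving a finding means adding its hash to the baseline, so the raw secret never has to appear in the baseline file itself.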

4. Dependency Vulnerability Scanning at Install Time

Glasswing's vulnerability research often reveals that a disclosed CVE has been silently present in a popular library for months. Your AI coding agent, running npm install or pip install autonomously, does not check whether the version it is installing has known vulnerabilities.

```shell
# npm: audit on every install
echo "audit=true" >> .npmrc
echo "audit-level=moderate" >> .npmrc
```

```yaml
# Python: pip-audit as pre-commit hook
- repo: https://github.com/pypa/pip-audit
  rev: v2.7.3
  hooks:
    - id: pip-audit
      args: [--strict, --require-hashes]
```

```shell
# Or run inline before agent sessions
pip-audit -r requirements.txt --format json | python3 -c "
import json, sys
report = json.load(sys.stdin)
deps = report['dependencies'] if isinstance(report, dict) else report
vulns = [(d['name'], v['id']) for d in deps for v in d.get('vulns', [])]
for name, vid in vulns:
    print(f'VULN: {vid} in {name}')
sys.exit(1 if vulns else 0)
"
```

5. Coverage Threshold Enforcement

This one surprises people. Why is test coverage a Glasswing-relevant check?

Because Glasswing finds vulnerabilities in code paths that are never exercised by the existing test suite. An AI agent that generates code with no test coverage has created unvalidated surface area. That unvalidated code is statistically where the vulnerabilities live.

Enforcing a coverage threshold does not make code secure. It makes unvalidated code impossible to ship silently.

```shell
# pytest with coverage threshold
pytest --cov=src --cov-fail-under=80 --cov-report=term-missing
```

```toml
# In pyproject.toml
[tool.coverage.report]
fail_under = 80
show_missing = true
```

And in your MCP tool config (Claude Code / LucidShark):

```json
{
  "tools": {
    "run_tests": {
      "command": "pytest --cov=src --cov-fail-under=80",
      "on_failure": "block_commit"
    }
  }
}
```

Putting It Together: The Pre-Commit Stack

These five checks run in sequence on every commit your AI coding agent produces. Together they take under 20 seconds on a typical project. You configure them once. They run forever.

```yaml
# .pre-commit-config.yaml
repos:
  - repo: https://github.com/Yelp/detect-secrets
    rev: v1.4.0
    hooks:
      - id: detect-secrets
        args: ['--baseline', '.secrets.baseline']

  - repo: https://github.com/returntocorp/semgrep
    rev: v1.70.0
    hooks:
      - id: semgrep
        args: ['--config', '.semgrep/', '--error']

  - repo: https://github.com/pypa/pip-audit
    rev: v2.7.3
    hooks:
      - id: pip-audit

  - repo: local
    hooks:
      - id: pytest-coverage
        name: pytest with coverage
        entry: pytest --cov=src --cov-fail-under=80
        language: system
        pass_filenames: false
```

The Semgrep config directory holds your taint rules and auth bypass patterns. Everything else is off-the-shelf tooling wired together.

The Local-First Principle: Every check in this stack runs on your machine, not in a cloud service. This matters for two reasons. First, your code does not leave your environment before you have decided it is safe to share. Second, these checks run whether or not your CI/CD provider is having an outage. The April 13 Claude Code outage that generated multiple "Tell HN" posts this week is a reminder that cloud dependency is a reliability risk, not just a privacy risk.

How This Relates to What Glasswing Finds

Glasswing is finding vulnerabilities in production software written by professional developers using conventional tooling. The five checks above do not make your code Glasswing-proof. No static analysis does. But they do close the specific vulnerability classes that appear most frequently in AI-generated code:

  • Injection flaws (caught by taint tracking)

  • Auth bypass (caught by pattern detection)

  • Credential exposure (caught by secrets scanning)

  • Known-vulnerable dependencies (caught by SCA)

  • Untested surface area (bounded by coverage thresholds)

Glasswing's findings are also a calibration signal. When a new class of vulnerability appears in Glasswing disclosures, you can write a Semgrep rule for it and add it to your local config. The offense-side research becomes your defense-side ruleset.

⚠️
The Velocity Problem: AI coding agents generate code faster than human code review can process it. The math does not work in favor of manual review at agent velocity. Automated local checks are not a nice-to-have. They are the only mechanism that scales to the rate at which agents produce output.

The Broader Picture

Project Glasswing's CVE acceleration curve is the clearest evidence yet that AI-powered vulnerability research is approaching a capability threshold. The security community has known for years that the offense/defense balance was tilting toward attackers. Glasswing is the quantified proof.

The defensive response is not to stop using AI coding agents. The response is to build quality gates that match the velocity at which agents produce output. Local, automated, fast, blocking.

The code gets written by agents. The gates still need a human to design and an automated system to enforce.


Start with LucidShark: LucidShark provides the pre-commit pipeline and MCP tool integration described above, wired together and ready to run against Claude Code and other AI coding agents. It is open source under Apache 2.0 and runs entirely locally. No cloud service, no per-seat pricing, no data leaving your machine.

Install: lucidshark.com or npx lucidshark init in any project directory.
