
Toni Antunovic


The Georgia Tech CVE Data Shows AI Code Tools Have a Volume Problem

This article was originally published on LucidShark Blog.


In March 2026, Georgia Tech's Vibe Security Radar published a dataset that should be required reading for every security team whose developers are using AI coding tools. The numbers: 35 CVEs filed that month that could be credibly attributed to AI-generated code. Of those, 27 were traced back to Claude Code output specifically.

Before we dig into what the data means, a brief note on methodology. Georgia Tech's attribution approach combines code similarity analysis, commit metadata (including the AI tool signatures that modern IDEs embed in commits), and in some cases direct developer attestation. It is not perfect. The 27/35 Claude Code figure reflects Claude Code's dominant market share in the agentic coding segment as much as it reflects any particular failure mode specific to Claude. But the total count is what matters most, and 35 CVEs in a single month with credible AI-origin attribution is not a rounding error.

Warning: The 27/35 figure reflects market share as much as tool-specific risk. Claude Code currently dominates agentic coding workflows, so its outsized representation in CVE data is partially expected. What is not expected, and what demands attention, is the absolute volume acceleration.

The Volume Problem Is Different From the Quality Problem

Most discussions about AI code security focus on quality: AI-generated code contains more vulnerabilities per 1,000 lines than human-written code, AI models hallucinate APIs, AI skips edge cases. These are real concerns, and they are documented. But they miss the more operationally urgent problem.

The volume problem works like this. A developer using Claude Code ships roughly 3 to 5 times as much code per sprint as the same developer without it. If the vulnerability rate per line stays constant, the absolute number of vulnerabilities in the codebase grows by the same factor. If the vulnerability rate is even modestly higher for AI-generated code (which the evidence suggests it is), the compounding is worse.
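
To see the compounding in rough numbers, here is a small back-of-the-envelope calculation. Every figure in it is an illustrative assumption, not a number taken from the Georgia Tech dataset:

# Illustrative assumptions only, not figures from the Georgia Tech dataset
loc_per_sprint = 2_000       # lines a developer shipped per sprint before AI tooling
volume_multiplier = 4        # middle of the 3x-to-5x range cited above
vulns_per_kloc_human = 0.5   # assumed vulnerability rate for human-written code
vulns_per_kloc_ai = 0.6      # assumed modestly higher rate for AI-generated code

before = loc_per_sprint / 1000 * vulns_per_kloc_human                  # 1.0 per sprint
after = loc_per_sprint * volume_multiplier / 1000 * vulns_per_kloc_ai  # 4.8 per sprint
print(f"{before:.1f} -> {after:.1f} new vulnerabilities per sprint ({after / before:.1f}x)")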

Security teams are not staffed to handle a 3x to 5x increase in code review volume. They were not staffed adequately for the previous volume. The gap between code production rate and security review capacity was already widening before AI coding tools became mainstream. Those tools accelerated the divergence to a point where human-only review is no longer a viable primary control.

Note: This is not a criticism of AI coding tools. It is a description of a staffing and process mismatch that the tools have made impossible to ignore. The tools are faster than the security review process they were added on top of.

What the CVE Data Actually Shows

Looking at the vulnerability categories in the Georgia Tech dataset, a clear pattern emerges. The AI-attributed CVEs are not randomly distributed across vulnerability types. They cluster in three categories:

  • Authorization failures: Missing object-level access checks, broken function-level authorization, cross-tenant data exposure. These account for roughly 40% of the AI-attributed CVEs in the March dataset.

  • Injection vulnerabilities: SQL injection via string interpolation, OS command injection, LDAP injection. These account for roughly 30%.

  • Secrets and credential exposure: Hardcoded API keys, tokens committed to version control, credentials in log output. These account for roughly 20%.

The remaining 10% is a mix of insecure deserialization, path traversal, and miscellaneous logic errors.

This distribution is not surprising to anyone who has reviewed AI-generated code carefully. Authorization logic requires understanding the full data ownership model of the application. AI models generate authorization checks that work for the happy path but fail when the request comes from a different user, tenant, or role than the one the model assumed when generating the code. SQL injection via string interpolation happens because the model produces working code faster by interpolating variables directly, and the developer does not notice or does not fix it. Secrets get hardcoded because the model was shown an example with a real key and replicated the pattern.

The Grep Test: How Detectable Are These CVEs?

Here is the uncomfortable part of the Georgia Tech data. When the researchers applied basic static analysis rules to the repositories where the CVEs were found, a significant majority of the vulnerabilities were detectable before they were exploited. The authorization failures showed patterns like direct parameter use in database queries without ownership verification. The injection vulnerabilities showed string interpolation in SQL contexts. The secrets showed entropy patterns consistent with API keys.

Let's make this concrete. The most common injection pattern in the AI-attributed CVEs looks like this:

# Pattern found in AI-generated code: direct f-string interpolation in SQL
async def get_user_orders(user_id: str, status: str):
    query = f"SELECT * FROM orders WHERE user_id = '{user_id}' AND status = '{status}'"
    return await db.execute(query)


This is detectable with a simple grep rule (a sketch of one follows the corrected version below). The fix is straightforward:

# Correct: parameterized query
async def get_user_orders(user_id: str, status: str):
    query = "SELECT * FROM orders WHERE user_id = $1 AND status = $2"
    return await db.execute(query, user_id, status)
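
For reference, the kind of "simple grep rule" that would flag the vulnerable version can be approximated in a few lines of Python. The regex, the file walk, and the exit-code convention below are illustrative assumptions, not a rule taken from the Georgia Tech study or any particular SAST tool:

# Minimal grep-style sketch: flag f-strings that interpolate variables into SQL.
# Crude on purpose: it will miss obfuscated cases and produce some false positives.
import re
import sys
from pathlib import Path

FSTRING_SQL = re.compile(
    r'f["\'].*\b(SELECT|INSERT|UPDATE|DELETE)\b.*\{.*\}', re.IGNORECASE
)

def scan(root: str) -> int:
    findings = 0
    for path in Path(root).rglob("*.py"):
        for lineno, line in enumerate(path.read_text(errors="ignore").splitlines(), 1):
            if FSTRING_SQL.search(line):
                print(f"{path}:{lineno}: possible SQL built via f-string interpolation")
                findings += 1
    return findings

if __name__ == "__main__":
    sys.exit(1 if scan(sys.argv[1] if len(sys.argv) > 1 else ".") else 0)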


The authorization pattern is slightly more complex but still rule-detectable:

# AI-generated pattern: fetches resource without checking ownership
async def get_document(doc_id: str, current_user: User):
    doc = await db.documents.find_one({"_id": doc_id})
    if not doc:
        raise HTTPException(status_code=404)
    return doc  # Missing: ownership check against current_user.id

# Correct pattern:
async def get_document(doc_id: str, current_user: User):
    doc = await db.documents.find_one({"_id": doc_id, "owner_id": current_user.id})
    if not doc:
        raise HTTPException(status_code=404)
    return doc


A static analysis rule that flags "find_one with _id but without owner_id or user_id in the filter" would have caught this class of vulnerability. Not all of them. Not the ones with unusual ownership field names. But a meaningful fraction.
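
To make that concrete, here is a rough sketch of what such a rule could look like in Python. The ownership field names and the regex-based matching are assumptions made for illustration; a production rule would more likely be written as an AST-based or semgrep pattern:

# Heuristic sketch: flag find_one filters that query by _id but name no ownership field.
import re
import sys
from pathlib import Path

FIND_ONE_CALL = re.compile(r'find_one\(\s*\{(?P<filter>[^}]*)\}')
OWNERSHIP_FIELDS = ("owner_id", "user_id", "tenant_id", "account_id")  # assumed names

def missing_ownership_checks(root: str):
    for path in Path(root).rglob("*.py"):
        text = path.read_text(errors="ignore")
        for match in FIND_ONE_CALL.finditer(text):
            filter_body = match.group("filter")
            if '"_id"' in filter_body and not any(f in filter_body for f in OWNERSHIP_FIELDS):
                lineno = text.count("\n", 0, match.start()) + 1
                yield f"{path}:{lineno}: find_one filters by _id without an ownership field"

if __name__ == "__main__":
    findings = list(missing_ownership_checks(sys.argv[1] if len(sys.argv) > 1 else "."))
    print("\n".join(findings) or "no findings")

Run against the vulnerable example above, this flags the find_one call; against the corrected version, it stays silent.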

Warning: Static analysis is not a complete solution. These tools catch pattern-based vulnerabilities reliably but miss logic errors that require understanding business context. The Georgia Tech data suggests roughly 60 to 70% of the AI-attributed CVEs were pattern-detectable. That still leaves 30 to 40% that require human review or more sophisticated analysis.

Why Teams Are Not Running These Checks

If these vulnerabilities are detectable, why are they making it to production and into CVE databases? A few reasons come up repeatedly when talking to security engineers at affected organizations.

CI pipelines are misconfigured or under-scoped. Many teams have SAST tools in their CI pipeline but have tuned them aggressively to reduce false positives. The tuning that eliminated noisy alerts also eliminated some of the signal. Rules that would catch the AI-specific patterns were disabled because they generated too many false positives on the old codebase.

Pre-commit hooks are absent or optional. The fastest feedback loop is a pre-commit check that runs before code ever leaves the developer's machine. Many teams do not have pre-commit hooks configured at all, or they are optional and developers bypass them. By the time a vulnerability surfaces in CI, context-switching cost is high and there is social pressure to merge.

Volume desensitizes reviewers. When every PR is large because an AI assistant generated it, reviewers start pattern-matching at the structural level rather than reading the code. This is documented in cognitive load research on code review. The authorization checks that are missing are the kind of thing that a fatigued reviewer skips because the surrounding code looks correct.

AI-specific patterns are not in the ruleset. Most SAST configurations were written before AI coding tools were in widespread use. The rules target historical vulnerability patterns in human-written code. The patterns that AI models produce systematically, like the authorization ownership-check omission, are not in the default rulesets of most tools.

What the Appropriate Response Looks Like

The Georgia Tech data points toward three concrete changes that security-conscious teams should make.

Add AI-specific SAST rules. The authorization and injection patterns that cluster in AI-generated code are rule-encodable. Tools like semgrep support custom rule authoring. Writing rules specifically targeting the patterns that AI models produce systematically is a tractable project for a security team that has reviewed the CVE data.

Move checks left, to the local environment. Pre-commit hooks that run SAST, secrets scanning, and dependency audits on every commit are the fastest feedback loop available. The developer sees the issue before they push, before code review, before CI. Fix cost at this stage is near zero. Local tooling that integrates directly into the development workflow, rather than running remotely in CI after a push, changes the feedback latency from minutes to seconds.
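
As one concrete illustration, a git pre-commit hook can be any executable placed at .git/hooks/pre-commit. The sketch below scans staged Python files for high-entropy quoted strings, one of the secrets patterns described above; the entropy threshold and the token regex are illustrative assumptions, not values from any standard scanner:

#!/usr/bin/env python3
# Minimal sketch of a local pre-commit hook: block commits containing likely secrets.
import math
import re
import subprocess
import sys

TOKEN = re.compile(r'["\']([A-Za-z0-9+/_\-]{20,})["\']')  # long quoted strings
ENTROPY_THRESHOLD = 4.0  # bits per character; assumed cutoff, tune for your codebase

def shannon_entropy(s: str) -> float:
    probs = [s.count(c) / len(s) for c in set(s)]
    return -sum(p * math.log2(p) for p in probs)

def staged_files() -> list[str]:
    out = subprocess.run(
        ["git", "diff", "--cached", "--name-only", "--diff-filter=ACM"],
        capture_output=True, text=True, check=True,
    )
    return [f for f in out.stdout.splitlines() if f.endswith(".py")]

def main() -> int:
    findings = []
    for path in staged_files():
        try:
            text = open(path, encoding="utf-8", errors="ignore").read()
        except OSError:
            continue
        for lineno, line in enumerate(text.splitlines(), 1):
            for candidate in TOKEN.findall(line):
                if shannon_entropy(candidate) >= ENTROPY_THRESHOLD:
                    findings.append(f"{path}:{lineno}: high-entropy string, possible hardcoded secret")
    if findings:
        print("\n".join(findings))
        print("Commit blocked. Remove the secret or load it from the environment.")
        return 1
    return 0

if __name__ == "__main__":
    sys.exit(main())

The same hook could call the grep-style checks sketched earlier; the point is that the feedback arrives before the commit ever leaves the developer's machine.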

Treat AI code differently in review. This does not mean reviewing AI-generated code more slowly, which is not sustainable given volume. It means reviewing it differently: focus on authorization boundaries, data ownership checks, and anywhere the model would have needed business context it did not have. Automate the pattern-based checks so human attention is reserved for the logic questions.

Note: The Georgia Tech researchers have indicated they will publish monthly updates to the Vibe Security Radar dataset. The March 2026 data is a baseline. Whether the April numbers show improvement will depend on whether the developer tools community has started treating this as a systems problem rather than a tool quality problem.

Volume Is the Variable That Changed

The conversation about AI code quality tends to get stuck on whether AI-generated code is better or worse than human-written code at some average quality level. That framing misses the operational reality. The security risk from AI coding tools is not primarily about the per-line vulnerability rate. It is about the multiplication of production code volume against a security review function that has not scaled.

Thirty-five CVEs in one month with credible AI attribution is the number that should reframe the conversation. Not because AI tools are uniquely dangerous, but because they have made the gap between code production and security review visible and undeniable in a way that it was not before.

The response has to be automated and local-first. Remote, asynchronous security checks running in CI are too slow and too easy to work around. The analysis needs to run where the code is being written, on every save or commit, with results that are immediate and actionable.

LucidShark runs that analysis locally. It integrates directly with Claude Code via MCP, checks every file your AI assistant touches for the vulnerability patterns that show up in the Georgia Tech data, and surfaces findings inline before they leave your machine. No code is sent to a remote server. No CI pipeline required to get the first signal. Install it in 60 seconds: lucidshark.com
