DEV Community

Exact Solution


AI Writes Vulnerable Code 45% of the Time. Here's Why That's a Systems Problem.

Let me put the numbers on the table before anything else, because they are worse than most developers I talk to realise.
Veracode's 2025 research found that 45% of AI-generated code introduces known OWASP vulnerabilities. CodeRabbit's analysis showed AI-produced code carries a 2.74x higher vulnerability rate than human-written code.

CVSS 7.0+ vulnerabilities appear 2.5x more often in AI-generated code than in human-written code. By June 2025, AI-generated code was adding more than 10,000 new security findings per month across studied repositories — a 10x jump from December 2024.

Production incidents per pull request increased 23.5% between December 2025 and early 2026.

And yet: 92% of US developers now use AI coding tools daily. Gartner forecasts 60% of new code will be AI-generated by end of 2026. At Google and Microsoft, 30% of new code already is.

We have a situation where almost every developer is using tools that produce vulnerable code at a rate significantly higher than human-written code — and the adoption curve is going straight up while the security metrics are going in the other direction.

This is not a story about individual developers making bad choices. It is a story about systems failing at the architectural level.

The Incident Data Makes This Concrete

The statistics above are bad enough. The real-world incidents make them impossible to dismiss.

CVE-2025-48757 revealed that Lovable had been generating Supabase schemas without Row Level Security, exposing over 170 production applications. Moltbook leaked 1.5 million authentication tokens and 35,000 email addresses because API endpoints returned sensitive data without checking authorisation. The Tea App exposed 72,000 user images and 1.1 million private messages through missing access controls.

Security researchers scanning close to 5,600 vibe-coded applications discovered over 2,000 vulnerabilities and 400+ exposed secrets.

None of these incidents required a sophisticated attacker. They required someone to notice that the access controls were missing entirely. The AI tools generated plausible-looking code that did not implement the security primitives it appeared to implement. The code compiled. The tests passed. The PR got approved. The vulnerability shipped.

The Language Distribution Is Not Uniform

Java had a 72% security failure rate for AI-generated code. Python, C#, and JavaScript ranged from 38–45%. The 2026 update showed limited improvement: security pass rates remain stagnant at 55% industry-wide.

If your team is using AI heavily on Java backend services — authentication, payment processing, data access — you are operating with a failure rate that should concern any security review process. The language historically considered "enterprise safe" is the one where AI assistance is performing worst on security metrics.

41% of AI-generated backend code includes overly broad permission settings, increasing attack surfaces. AI tools frequently generate default admin-level access controls without role restriction.

This is the specific failure pattern that produced the incidents above. Not subtle cryptographic weaknesses. Not advanced injection techniques. Missing access controls on endpoints that should have them. Admin-level permissions where restricted permissions were required. The foundational primitives, absent, in code that otherwise looks entirely reasonable.
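
To make the permission half of that pattern concrete, here is a minimal sketch of a permissive default next to a deny-by-default check. The role names, the `PERMISSIONS` table, and both functions are hypothetical illustrations, not code from any of the incidents above.

```python
# Hypothetical role-to-permission table; names are illustrative only.
PERMISSIONS = {
    "admin": {"read", "write", "delete"},
    "member": {"read", "write"},
    "viewer": {"read"},
}


def can_access_permissive(role, action):
    """The failure pattern: unknown or missing roles fall back to admin."""
    return action in PERMISSIONS.get(role, PERMISSIONS["admin"])


def can_access_strict(role, action):
    """Deny by default: unknown or missing roles get no permissions."""
    return action in PERMISSIONS.get(role, set())
```

Both functions look reasonable in a diff. The only difference is the fallback value, which is exactly the kind of detail a correctness-focused review tends to skim past.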

What Actually Helps — And What Doesn't

What doesn't work:

Adding a line to your engineering culture doc that says "review AI-generated code carefully." Code review is optimised to catch logical errors, not to audit threat models. Reviewers read for correctness in the expected case. The security failures live in the unexpected cases the reviewer is not actively imagining.

What the research shows actually works:

1. Automated SAST on AI-generated code before it reaches review

Scan all AI-generated code with SAST and SCA tools before it reaches human review. Not after. Before. The review process assumes the code is basically safe and checks for correctness. The SAST scan assumes the code might be dangerous and checks for known vulnerability patterns. Both are necessary. Most teams currently have only one.

# GitHub Actions: SAST gate that runs before human review
name: Security Scan
on: [pull_request]
jobs:
  sast:
    runs-on: ubuntu-latest
    steps:
      - uses: actions/checkout@v4
      - name: Run Semgrep
        uses: semgrep/semgrep-action@v1
        with:
          # The folded scalar joins these rulesets into one space-separated value
          config: >-
            p/owasp-top-ten
            p/secrets
            p/default
      # Runs only when an earlier step has failed, to print a clear message
      - name: Block on findings
        if: failure()
        run: |
          echo "Security findings detected. Review before merging."
          exit 1
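
The same scan can also run locally, before a pull request exists at all. The following is a rough sketch of a wrapper that could be called from a pre-push hook. It assumes the `semgrep` CLI is installed and on `PATH`; the function names are hypothetical, and the rulesets are the ones from the workflow above.

```python
import subprocess

# Rulesets mirror the workflow above.
RULESETS = ["p/owasp-top-ten", "p/secrets", "p/default"]


def build_semgrep_command(paths, rulesets=RULESETS):
    """Build a `semgrep scan` invocation that exits non-zero on findings."""
    command = ["semgrep", "scan", "--error"]
    for ruleset in rulesets:
        command += ["--config", ruleset]
    return command + list(paths)


def run_scan(paths=(".",)):
    """Run the scan and return semgrep's exit code (assumes semgrep is installed)."""
    return subprocess.run(build_semgrep_command(paths)).returncode
```

A hook script would call `run_scan()` and abort the push when the return value is non-zero.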

2. Explicit security context in prompts

The Backslash research showed Claude 3.7 Sonnet going from 6/10 to 10/10 secure outputs with security-focused prompting. Most developers use prompts like "write a user authentication endpoint." The security-aware version:

Write a user authentication endpoint.
Security requirements:

  • Validate and sanitise all inputs server-side
  • Use parameterised queries — no string concatenation
  • Return 401 with no detail on auth failure
  • Rate limit to 5 attempts per IP per minute
  • Log failed attempts without logging credentials
  • Never return password hashes or internal IDs

The model produces significantly different output when the threat model is explicit in the prompt. This is not obvious to most developers because most AI tool UX does not surface the security dimension of the task.
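
One way to make this routine is to append the requirements in code rather than relying on each developer to remember them. This is a minimal sketch: the helper name `with_security_context` is hypothetical, and the requirements list is copied from the example prompt above.

```python
# Requirements copied from the example prompt above.
SECURITY_REQUIREMENTS = [
    "Validate and sanitise all inputs server-side",
    "Use parameterised queries, no string concatenation",
    "Return 401 with no detail on auth failure",
    "Rate limit to 5 attempts per IP per minute",
    "Log failed attempts without logging credentials",
    "Never return password hashes or internal IDs",
]


def with_security_context(task, requirements=SECURITY_REQUIREMENTS):
    """Append an explicit security requirements list to a task prompt."""
    lines = [task.strip(), "Security requirements:"]
    lines += [f"- {item}" for item in requirements]
    return "\n".join(lines)
```

Calling `with_security_context("Write a user authentication endpoint.")` returns the task followed by the six requirements, ready to send to whichever model your team uses.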

3. Restricted zones for AI assistance

Do not allow AI-generated code in high-risk areas (authentication, encryption, payment processing) unless it passes a mandatory human review that specifically audits threat model coverage, not just code correctness.

This is not about distrusting AI tools. It is about routing them to tasks where their failure modes are least catastrophic. Classification logic, data transformation, UI components, test generation — high value, lower security criticality. Authentication flows, payment handling, access control — where failure modes are catastrophic and where the current research shows the highest vulnerability rates.
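
A restricted-zone policy is easier to enforce when a script flags the files that fall inside it. The sketch below assumes a hypothetical repository layout; the path patterns and the function name are illustrative and would need adapting to your codebase.

```python
from fnmatch import fnmatchcase

# Hypothetical path patterns; adjust to your repository layout.
RESTRICTED_ZONES = {
    "authentication": ["src/auth/*"],
    "encryption": ["src/crypto/*"],
    "payments": ["src/payments/*", "src/billing/*"],
}


def restricted_zones_touched(changed_paths, zones=RESTRICTED_ZONES):
    """Map each restricted zone to the changed files that fall inside it."""
    touched = {}
    for path in changed_paths:
        for zone, patterns in zones.items():
            if any(fnmatchcase(path, pattern) for pattern in patterns):
                touched.setdefault(zone, []).append(path)
    return touched
```

A CI step could feed this the list of files changed in a pull request and require a security-focused reviewer whenever the result is non-empty.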

4. Treat access controls as a mandatory checklist item

Given that missing or overly broad access controls are the specific failure pattern producing the real-world incidents, a simple checklist applied to every AI-generated endpoint catches the most damaging class of issue:

Before any AI-generated endpoint merges:
□ Does this endpoint verify the caller is authenticated?
□ Does it verify the caller is authorised for this specific resource?
□ Does it validate that the resource belongs to the authenticated user?
□ Does it return only the minimum data required?
□ Does it use parameterised queries for all database operations?
□ Does it log the request without logging sensitive parameters?

Six questions. Applied consistently. They would have prevented the Moltbook leak, the Tea App exposure, and the Lovable RLS issue.
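
As an illustration, here is a small sketch of a handler that answers all six questions. The `documents` table, the `get_document` function, and the ownership rule are assumptions made for the example; in this simplified model, "authorised for this resource" and "belongs to the caller" collapse into a single ownership check.

```python
import logging
import sqlite3

logger = logging.getLogger("documents")


def get_document(db, session_user_id, document_id):
    """Return (status, body) for a hypothetical 'fetch one document' endpoint."""
    # 1. Authenticated? No session user means no access.
    if session_user_id is None:
        return 401, {"error": "unauthorised"}

    # 5. Parameterised query: the value is bound, never concatenated.
    row = db.execute(
        "SELECT id, owner_id, title FROM documents WHERE id = ?",
        (document_id,),
    ).fetchone()

    # 2 and 3. Authorised for this resource, and does it belong to the caller?
    # A missing row and someone else's row get the same response, so the
    # endpoint does not reveal which document IDs exist.
    if row is None or row[1] != session_user_id:
        # 6. Log the event, not the payload or any credentials.
        logger.warning("denied document read: user=%s", session_user_id)
        return 404, {"error": "not found"}

    # 4. Minimum data: only the fields the caller needs.
    return 200, {"id": row[0], "title": row[2]}
```

The handler is deliberately framework-free so the checks stay visible; in a real service they would usually live in middleware or a shared decorator.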

The Uncomfortable Projection

If the doubling trend in AI-introduced CVEs continues, AI-generated CVEs will become a dominant category in vulnerability databases by late 2026.

We are at the beginning of a period where the majority of new code being written contains a structural security gap — not because developers are careless, but because the tools they are using are not designed with adversarial conditions as a first-class concern.

The productivity gains from AI coding tools are real. Studies show up to 55% faster completion times on task-level benchmarks. But teams are also reporting 41% higher code churn and 7.2% decreased delivery stability.

The output is faster. Whether it is better depends entirely on what review infrastructure sits between the model output and production.
The developers who navigate this period well are not the ones who reject AI tools or the ones who accept AI output uncritically. They are the ones who understand specifically where the failure modes are concentrated — design-level security decisions, access control, permission scoping — and build their review process around those specific gaps rather than assuming general code review covers it.

The 45% number is not an argument against AI-assisted development. It is an argument for understanding what your review process is and is not actually catching.

If you found this useful, the six-question checklist above is free to copy, adapt, and add to your team's PR template. The research links are in the citations throughout — worth reading the primary sources directly.
Drop your team's approach to AI code security in the comments — genuinely curious what is working and what is not.

Building with AI tools daily and writing honestly about what works, what doesn't, and what the research actually says versus what the demos show. Follow for AI security, production reality checks, and the occasional uncomfortable statistic.
exactsolution.com
