I shipped a production bug last month that I would have caught in 30 seconds if I had actually read the code. But I did not read it. Claude wrote it, the tests passed, and I merged it. The feature worked perfectly. The bug was not in the feature logic. It was in an endpoint I had asked Claude to create alongside it, an endpoint with no authentication check, sitting quietly in my codebase for nine days before a security scan caught it.
I am not sharing this to be dramatic. I am sharing it because I suspect you have a similar story, or you will soon.
Here is the situation in 2026: 85% of professional developers use AI coding tools, and more than half use them daily. Vibe coding is not a fringe experiment anymore. It is how code gets written. AI tools write entire features, not just autocomplete. And the research on what that means for security is genuinely alarming.
The Numbers You Need to Know
Veracode published their Spring 2026 GenAI Code Security update after testing over 150 large language models, the most comprehensive longitudinal study of its kind. The headline finding is this: across all models and all tasks, only 55% of code generation tasks result in secure code. In 45% of cases, the model introduces a known security flaw.
Sit with that for a second. Nearly one in two AI-generated code outputs has a security problem. And this number has not improved in two years despite enormous advances in everything else these models do.
Here is the part that makes it worse. AI models now exceed 95% syntax correctness. They almost never write code that fails to run. The gap between "code that works" and "code that works securely" is not closing. It is widening. Models are getting better at producing functional code faster, which means developers are reviewing less of it, which means more vulnerable code is reaching production.
Even the best performers in Veracode's testing, the GPT-5 series with extended reasoning enabled, only hit 70 to 72% security pass rates. That is the ceiling right now. The highest-performing category in the most comprehensive study available still means roughly one in three outputs has a vulnerability.
CodeRabbit did a separate analysis of 470 pull requests that mixed AI-generated and human-written code. Their finding: AI code has 1.7x more major issues overall and 2.74x higher security vulnerability rates compared to human-written code for equivalent tasks.
And from the academic side, SWE-Agent with Claude 4 Sonnet solved 61% of tasks functionally correctly. Only 10.5% of those solutions were also secure.
These numbers are not coming from critics of AI development. They are coming from organizations that actively use and support AI coding tools. The point is not to stop using AI. The point is to understand what it is not good at yet, and security is near the top of that list.
The Most Common Vulnerabilities
The same categories of security flaws show up across every model, every language, and every study. Knowing what to look for is half the battle.
Missing Input Sanitization
This is the most common flaw identified across all languages and models in Veracode's testing. AI generates route handlers, form processors, and API endpoints that handle user input without sanitizing it. The code works. You can test it with clean inputs all day. It only breaks when someone sends it something it was not expecting, and attackers send things it was not expecting.
A typical example from an Express route:
app.post('/search', async (req, res) => {
  const { query } = req.body;
  const results = await db.raw(`SELECT * FROM posts WHERE title LIKE '%${query}%'`);
  res.json(results);
});
This is SQL injection waiting to happen. The model generated it because it technically answers the question asked. The prompt never mentioned sanitizing query, so the output never sanitizes it. The model mirrored the incomplete specification back perfectly.
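A minimal sketch of the fix, assuming a node-postgres-style driver that accepts a query config object. The helper name buildSearchQuery is hypothetical, not from the vulnerable example above:

```javascript
// Hypothetical helper: validate the input, then build a parameterized
// query object instead of interpolating user input into SQL text.
function buildSearchQuery(userInput) {
  if (typeof userInput !== 'string' || userInput.length > 200) {
    throw new Error('invalid search input');
  }
  // The SQL text stays constant; the user value travels separately as a
  // bound parameter, so the driver escapes it before it reaches the database.
  return {
    text: "SELECT * FROM posts WHERE title LIKE '%' || $1 || '%'",
    values: [userInput],
  };
}
```

Passed to a driver as db.query(q.text, q.values), even a hostile string like '; DROP TABLE posts; -- arrives as inert data rather than executable SQL.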
Hardcoded Credentials
AI systems produce example code. Example code has example values. The problem is that "example" API keys and connection strings look real enough that developers commit them, especially when they are generated in the middle of a larger feature and get lost in the noise.
// Common pattern AI generates during scaffolding
const stripe = new Stripe('sk_live_4eC39HqLyjWDarjtT1zdp7dc');
const db = new Pool({ connectionString: 'postgresql://admin:password123@localhost/myapp' });
GitHub's secret scanning catches some of these but not all, especially when developers are moving fast and not thinking about what the AI just dropped into their file.
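One low-effort defense is to refuse to start without real credentials from the environment. A sketch, with requireEnv as a hypothetical helper name:

```javascript
// Hypothetical helper: read a secret from the environment and fail fast
// at startup if it is missing, instead of committing a literal key.
function requireEnv(name) {
  const value = process.env[name];
  if (!value) {
    // Crashing at boot is better than silently running with a placeholder
    // credential that an AI scaffold left behind in the source.
    throw new Error(`Missing required environment variable: ${name}`);
  }
  return value;
}

// const stripe = new Stripe(requireEnv('STRIPE_SECRET_KEY'));
// const db = new Pool({ connectionString: requireEnv('DATABASE_URL') });
```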
Over-Permissive Defaults
AI defaults to configurations that work, not configurations that are restrictive. When you ask it to create an IAM role, an S3 bucket policy, or a database user, it gives you something that functions correctly for the task at hand. It does not think about least-privilege access.
{
  "Version": "2012-10-17",
  "Statement": [
    {
      "Effect": "Allow",
      "Action": "*",
      "Resource": "*"
    }
  ]
}
This is a real IAM policy pattern that AI generates when asked to "make a role that can access my bucket." It works. It also gives that role access to everything in your AWS account.
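A least-privilege sketch of the same request. The bucket ARN and action list here are placeholders; scope them to what the role actually does:

```
{
  "Version": "2012-10-17",
  "Statement": [
    {
      "Effect": "Allow",
      "Action": ["s3:GetObject", "s3:PutObject"],
      "Resource": "arn:aws:s3:::my-bucket/*"
    }
  ]
}
```

If the role later needs more, you widen it deliberately, one action at a time, instead of starting from everything.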
Hallucinated Dependencies
This one is particularly insidious. AI models suggest packages that do not exist. They generate import statements for libraries that sound plausible but were never published to npm, PyPI, or any registry. If a developer runs bun add or npm install on a hallucinated package name and an attacker has registered that name, the developer just installed malware.
This is a close cousin of dependency confusion and typosquatting attacks, sometimes called slopsquatting when the package name comes from an AI hallucination, and AI-generated code has made the attack surface significantly larger. You used to have to mistype a package name. Now you can import a perfectly spelled package that simply does not exist as a legitimate library yet.
Always verify every package before installing it. Check the npm page. Check the GitHub repository. Check the download count. If it has under 1,000 weekly downloads and no clear maintainer, be skeptical.
Incomplete Access Control
AI implements business logic without consistently enforcing authorization. It builds the feature you described. It does not think about who should and should not be able to use it.
app.get('/admin/users', async (req, res) => {
  const users = await db.query('SELECT * FROM users');
  res.json(users);
});
This endpoint works. It returns users. What it does not do is check whether the person calling it has admin privileges. The developer asked for an admin endpoint to list users. The model created an endpoint that lists users. The "admin" part was a naming hint, not a security enforcement.
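A sketch of what explicit enforcement could look like as Express middleware. This assumes some upstream auth step has already populated req.user; requireAdmin is a hypothetical name, not a library function:

```javascript
// Hypothetical authorization middleware: reject the request before any
// database access happens, rather than trusting the route's name.
function requireAdmin(req, res, next) {
  if (!req.user) {
    // No authenticated identity at all.
    return res.status(401).json({ error: 'unauthenticated' });
  }
  if (req.user.role !== 'admin') {
    // Authenticated, but not authorized for admin routes.
    return res.status(403).json({ error: 'forbidden' });
  }
  next();
}

// app.get('/admin/users', requireAdmin, async (req, res) => { ... });
```

The point is that "admin" becomes a check the server performs, not a word in the URL.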
SQL Injection Through String Concatenation
Despite parameterized queries being the established solution for decades, AI still generates raw SQL string interpolation with frustrating regularity, especially in dynamic query scenarios.
// AI-generated, dangerous
const getUserPosts = async (userId: string, status: string) => {
  return db.query(`SELECT * FROM posts WHERE user_id = ${userId} AND status = '${status}'`);
};

// What it should look like
const getUserPosts = async (userId: string, status: string) => {
  return db.query('SELECT * FROM posts WHERE user_id = $1 AND status = $2', [userId, status]);
};
The first version is one curl command away from a full database dump.
Real Incidents: When This Goes Wrong in Production
These are not theoretical risks. They are documented incidents from 2025 and 2026 where AI-generated code created real security breaches.
Lovable, CVE-2025-48757. A security researcher audited 1,645 apps built entirely with the Lovable AI builder. 170 of them, 10.3%, had critical row-level security flaws. These were apps that users trusted with sensitive data, built by developers who shipped whatever the AI produced without auditing the database access patterns.
CurXecute, CVE-2025-54135. A remote code execution vulnerability discovered in Cursor, the AI code editor itself. The flaw allowed attackers to execute arbitrary code on developers' machines with no user interaction required. The irony of an AI coding tool being the vector for RCE is not lost on anyone in the security community.
The Moltbook incident. Security firm Wiz identified a misconfigured database exposing 1.5 million authentication tokens, 35,000 email addresses, and private messages between users. The site in question was Moltbook, a social platform. The owner publicly stated he had not written a single line of code. The entire application was vibe coded. The misconfiguration was a default configuration that any experienced developer would have caught, but when no experienced developer ever reads the code, it ships as-is.
The CVE count tells the same story at scale. In January 2026, six new CVEs were directly traced to AI-generated code. In February, fifteen. In March, thirty-five. The curve is not flattening.
Why AI Models Are Structurally Bad at Security
Understanding why this happens makes it easier to compensate for it.
AI models are optimized for functional correctness. The training signal rewards code that works. Security is orthogonal to correctness in most cases. A vulnerable SQL query runs fine. A hardcoded credential authenticates successfully. An over-permissive IAM role grants access without errors. Nothing breaks at the code level.
Security requires adversarial thinking. It requires asking "how would someone abuse this?" Models are trained to be helpful. They produce what was asked for and assume good faith on the part of the caller. Adversarial reasoning is not in their training objective.
Security also requires context that models do not have. What threat model is this application operating under? Who are the users and how much should they be trusted? What other systems does this endpoint connect to? What is the blast radius if this role is compromised? Models cannot answer these questions because they only see the code they are generating, not the system it lives in.
The training data problem compounds everything. The corpus that these models trained on includes every insecure Stack Overflow answer, every tutorial that skips authentication for brevity, every open source project that cut corners on input validation. When a model has seen "works but insecure" patterns millions of times, it reproduces them.
Finally, benchmarks do not measure security. Models are evaluated on HumanEval, SWE-bench, and MMLU. None of these measure whether the generated code is secure. So models get optimized for the benchmarks they are evaluated on, and security is not on the scorecard.
My Actual Review Process
I changed how I review AI-generated code after the incident I opened with. Here is what I actually do now.
Read every line before merging. This sounds obvious and I did not do it consistently before. If a function is too long to read carefully, it is too long to trust. I break large AI-generated outputs into smaller pieces and review each one before asking for the next.
Run SAST on every PR. Static analysis tools like Semgrep, Snyk, and Veracode catch patterns your eyes miss. I have Semgrep running in my CI pipeline on every pull request. It catches injection patterns, hardcoded secrets, and unsafe deserialization before they reach the branch.
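As a rough sketch, a Semgrep step in a GitHub Actions workflow can be as small as this (adapt to your CI system; --config auto pulls Semgrep's community rulesets, and --error fails the job when findings are reported):

```
# Hypothetical CI step, not a drop-in config
- name: Semgrep static analysis
  run: |
    pip install semgrep
    semgrep scan --config auto --error
```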
Verify every import. Before running bun add on anything AI suggested, I check npm for the package. Does it exist? Does it have real downloads? Does it have a GitHub repository with commits? One hallucinated package can compromise your entire development environment.
Audit auth separately. I make a specific pass through every AI-generated route handler focused only on authentication and authorization. Is there a session check? Is there a role check? Does the check happen before or after the database query? This is the category where AI is most consistently wrong.
Check environment variable handling. Look for hardcoded fallbacks like process.env.API_KEY || 'default_key_here'. Look for debug endpoints that expose internal state. Look for CORS configurations that default to *.
Use AI to review AI. Once I have a working implementation, I paste it back into a model and explicitly prompt for security review: "What security vulnerabilities exist in this code? Think like an attacker." The model that generated the code is not always the best reviewer, but a second model looking specifically for problems catches things the first pass missed.
Keep a vulnerability log. Every time I find a security issue in AI-generated code, I write it down. Patterns emerge. I know that Claude tends to miss rate limiting. I know that GPT tends to produce over-permissive CORS. Knowing your tools' specific blind spots makes you a better reviewer.
The Junior Developer Problem
There is a harder conversation underneath this one.
Junior developers are the heaviest users of AI coding tools. They are also the least equipped to review code for security issues. When an AI tool gives a junior developer the output speed of a senior, it gives them the speed without the judgment.
This is not a criticism of junior developers. The tooling is failing them. AI coding tools present generated code with enormous confidence. They do not flag uncertainty about security implications. They do not prompt developers to review the authentication logic. They just produce a clean, well-formatted, syntactically correct output and move on.
The more dangerous version of this problem: organizations cutting senior engineering headcount because AI makes their junior teams "productive enough." The seniors who would have reviewed that code in a PR, who would have caught the missing auth check, who have the mental model of what can go wrong, are gone. The AI creates the productivity illusion. The security review capacity disappears quietly.
If you are a senior engineer reading this, your security judgment is one of the things AI cannot replace right now. Make sure your organization understands that.
What I Actually Want from These Tools
The security gap is not inevitable. AI coding tools could address this directly if vendors prioritized it.
What I want to see: built-in security scanning before code is presented to the developer. Not after you accept it. Before. If the model is about to generate an endpoint with no authentication check, it should flag that before I see the output.
Threat model awareness based on project context. If I have described a multi-tenant SaaS, the model should know that row-level security is critical. If I have an admin panel, it should assume elevated risk on every generated endpoint.
Automatic matching against known CVE patterns. The top 25 CWEs are well-documented. There is no technical reason AI tools cannot check generated code against these patterns before surfacing the output.
Honest uncertainty. "This code is functional, but I am not confident it handles all edge cases in your auth flow" would be more valuable than a clean output that hides the gap.
Default to restriction. Generate the most restrictive version of any configuration and let developers loosen it explicitly rather than generating permissive defaults.
Some of these are starting to appear in tools like Cursor with their Security Rules feature and in Copilot's latest enterprise integrations. But they are not standard, not consistent, and not enough.
Where This Leaves Us
AI coding tools are the most significant productivity improvement in software development since the IDE. I use them every single day and I am not going back to writing everything by hand.
But the current reality is that AI tools are excellent at writing code that functions and unreliable at writing code that is secure. That gap is the developer's responsibility to close until the tools close it themselves.
The developers who thrive in this environment are the ones who use AI for speed and bring their own security judgment to every output. They read the code. They run the scans. They think about who should not be able to call this endpoint. They treat AI-generated code the way you would treat code from a talented intern who has never thought about security before: grateful for the output, but obligated to review it carefully.
That is not a criticism of the technology. It is a description of where the technology is right now. And knowing where it is means you can use it safely.
The bug I shipped last month cost me two hours to find and fix. The bigger cost was what I learned from it: speed without review is not a productivity gain. It is a delayed liability.