AI coding tools have changed how fast we ship.
Copilot, Cursor, Claude — they write working code in seconds.
But "working" and "secure" are not the same thing.
SQL injection, hardcoded secrets, SSRF, broken object-level
authorization — these patterns show up in AI-generated code
all the time. Not because the tools are bad, but because they
optimize for correctness, not security policy.
And right now, that code is merging into your main branch.
The Patterns AI Gets Wrong
I've been running static analysis on AI-generated PRs for a while
now, and the same issues keep coming up.
1. Hardcoded Secrets
This one's almost embarrassingly common. AI tools have seen
millions of examples where inlining credentials "works" — so
they do it without hesitation.
from openai import OpenAI
client = OpenAI(api_key="sk-proj-abc123...")
Once this merges, it lives in your git history. Forever.
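The fix is a one-line change: load the key from the environment or a secrets manager instead of the source. A minimal sketch, assuming the key is exported as OPENAI_API_KEY:

import os
from openai import OpenAI

# Key comes from the environment, not from source control.
client = OpenAI(api_key=os.environ["OPENAI_API_KEY"])

The OpenAI client also reads OPENAI_API_KEY on its own, so plain OpenAI() works once the variable is set.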
2. SSRF (Server-Side Request Forgery)
Ask an AI to "fetch data from a URL the user provides" and it
writes exactly that — no validation, no allowlist.
import requests
response = requests.get(user_provided_url)
Point that at http://169.254.169.254 and you're pulling cloud
credentials out of the metadata service. Classic.
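The safer pattern is to validate before fetching. Here's a minimal sketch, assuming an allowlist of known-good hosts (ALLOWED_HOSTS and safe_get are hypothetical names, not a real API):

import ipaddress
import socket
from urllib.parse import urlparse

import requests

ALLOWED_HOSTS = {"api.example.com"}  # hosts you actually intend to call

def safe_get(url: str) -> requests.Response:
    host = urlparse(url).hostname
    if host not in ALLOWED_HOSTS:
        raise ValueError(f"host not in allowlist: {host}")
    # Belt and suspenders: resolve the host and refuse private or
    # link-local addresses like the 169.254.169.254 metadata service.
    addr = ipaddress.ip_address(socket.gethostbyname(host))
    if addr.is_private or addr.is_link_local:
        raise ValueError(f"resolved to a non-public address: {addr}")
    return requests.get(url, timeout=5)

This still doesn't cover DNS rebinding or redirects; for anything serious, route outbound requests through an egress proxy.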
3. Broken Object Level Authorization (BOLA)
This is the sneaky one. The endpoint looks totally fine at first
glance.
@app.get("/orders/{order_id}")
def get_order(order_id: int):
    # No ownership check: whoever is logged in gets whatever ID they ask for
    return db.query(Order).filter(Order.id == order_id).first()
Any authenticated user can access any order just by changing
the ID. It's #1 on the OWASP API Security Top 10 (API1: Broken
Object Level Authorization), and it's basically invisible in a
normal code review.
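The fix is to scope the query to the authenticated user. A minimal sketch, assuming a FastAPI dependency called current_user that resolves the logged-in user (that name, and the User and Order models, are stand-ins for whatever your app already has):

from fastapi import Depends, HTTPException

@app.get("/orders/{order_id}")
def get_order(order_id: int, user: User = Depends(current_user)):
    order = (
        db.query(Order)
        .filter(Order.id == order_id, Order.user_id == user.id)
        .first()
    )
    if order is None:
        # 404, not 403: don't confirm the order exists at all
        raise HTTPException(status_code=404)
    return order

Filtering on Order.user_id inside the query means a nonexistent ID and someone else's ID both return the same 404.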
4. SQL Injection via String Formatting
Even in 2026, AI still reaches for f-strings when building
queries — especially in less common ORMs or raw SQL contexts.
query = f"SELECT * FROM users WHERE username = '{username}'"
Not much to say here. We've known about this for 25 years.
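The fix has been the same for all 25 of them: bound parameters. A minimal sketch with SQLAlchemy, where conn is assumed to be an existing connection:

from sqlalchemy import text

# Bound parameter: username is sent as data, never spliced into SQL.
query = text("SELECT * FROM users WHERE username = :username")
result = conn.execute(query, {"username": username})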
The Fix: A Policy Gate at the PR Level
The standard CI pipeline checks if code works.
It doesn't check if code is safe.
Linters catch style. Tests catch regressions. Neither of them
catches "this endpoint has no ownership check."
What you actually need is a layer that runs security policy
against the PR diff — before merge, every time, automatically.
That means static analysis rules tuned to your threat model,
some AI-assisted context on top (not just pattern matching),
and a clear verdict on every PR: BLOCK, FLAG, or PASS.
Quarterly pentests and post-merge audits don't cut it anymore.
The enforcement has to happen at the pull request.
How vorsken Does It
I built vorsken to solve exactly this. It's a GitHub Action
that runs Semgrep + Claude AI on every PR diff and posts a
verdict as a PR comment.
Setup takes about two minutes:
# .github/workflows/vorsken.yml
- uses: zetide/vorsken@v0.2.6
  with:
    anthropic-api-key: ${{ secrets.ANTHROPIC_API_KEY }}
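That step is the whole integration. For context, here's a sketch of a complete workflow file around it; the pull_request trigger, the pull-requests: write permission (which posting a PR comment requires), and the checkout step are my assumptions about a typical setup, not vorsken-specific requirements:

name: vorsken
on: pull_request
permissions:
  contents: read
  pull-requests: write   # required for the Action to comment on the PR
jobs:
  gate:
    runs-on: ubuntu-latest
    steps:
      - uses: actions/checkout@v4
      - uses: zetide/vorsken@v0.2.6
        with:
          anthropic-api-key: ${{ secrets.ANTHROPIC_API_KEY }}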
You can configure what gets blocked and what gets flagged:
# .stacksecai.yml
policy:
  block_on: ["ERROR"]
  flag_on: ["WARNING"]
claude:
  model: "claude-haiku-4-5"
  severity_block: ["CRITICAL", "HIGH"]
  severity_flag: ["MEDIUM"]
On every PR, you get something like this:
🚨 vorsken Policy Gate — BLOCK
Finding: Hardcoded API key detected
Risk: Credential exposure via git history
Fix: Use environment variables or a secrets manager
Rule: OWASP API8 – Security Misconfiguration
The PR can't merge until the finding is resolved. That's the
point.
Wrapping Up
AI coding tools aren't going away — and honestly, I don't want
them to. But the volume of AI-generated PRs is only going to
increase, and most pipelines aren't ready for what that means.
A policy gate at the PR level isn't a replacement for code
review. It's the layer that catches what humans miss when
they're moving fast.
If you're already shipping AI-generated code (and you probably
are), it's worth five minutes to see what's making it through.