vorsken

I Built a Gate That Blocks Vulnerable AI-Generated Code Before It Merges

A PR came in last week. All checks passed. Looked fine.

Hardcoded API key on line 11. SSRF vector in the request
handler. Command injection from a Copilot suggestion.

Nothing stopped it from merging. So I built something that does.

vorsken is a GitHub Action that runs Semgrep + Claude on every
PR and posts a BLOCK verdict before bad code reaches main.


The Patterns AI Gets Wrong

I've been running static analysis on AI-generated PRs for a while
now, and the same issues keep coming up.

1. Hardcoded Secrets

This one's almost embarrassingly common. AI tools have seen
millions of examples where inlining credentials "works", so
they do it without hesitation.

client = OpenAI(api_key="sk-proj-abc123...")

Once this merges, it lives in your git history. Forever.
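
A safer pattern, as a minimal sketch: read the key from the environment and refuse to start when it is missing. The helper name `load_api_key` and the variable name `OPENAI_API_KEY` are assumptions for illustration, not something vorsken requires.

```python
import os


def load_api_key(env=None, name="OPENAI_API_KEY"):
    """Return the API key from the environment, or raise if it is missing.

    Both `name` and this helper are illustrative assumptions.
    """
    env = os.environ if env is None else env
    key = env.get(name, "").strip()
    if not key:
        raise RuntimeError(f"{name} is not set; refusing to start without it")
    return key


# Usage (assumes the openai package is installed):
# client = OpenAI(api_key=load_api_key())
```

Failing at startup beats silently falling back to a default, because a missing secret shows up in the first deploy rather than in production traffic.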

2. SSRF (Server-Side Request Forgery)

Ask an AI to "fetch data from a URL the user provides" and it
writes exactly that: no validation, no allowlist.

response = requests.get(user_provided_url)

Point that at http://169.254.169.254 and you're pulling cloud
credentials out of the metadata service. Classic.
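
Here is a rough sketch of the validation that's missing: allow only http(s), only hosts on an allowlist, and only hosts that resolve to public addresses. The `ALLOWED_HOSTS` set and the `is_safe_url` helper are hypothetical names, and the `resolve` parameter exists only so the check can be exercised without real DNS.

```python
import ipaddress
import socket
from urllib.parse import urlparse

ALLOWED_HOSTS = {"api.example.com"}  # hypothetical allowlist


def is_safe_url(url, allowed_hosts=ALLOWED_HOSTS, resolve=socket.getaddrinfo):
    """Return True only for http(s) URLs on the allowlist that resolve to public IPs."""
    parsed = urlparse(url)
    if parsed.scheme not in ("http", "https"):
        return False
    host = parsed.hostname
    if host is None or host not in allowed_hosts:
        return False
    try:
        infos = resolve(host, None)
    except OSError:
        return False  # unresolvable host
    for info in infos:
        try:
            addr = ipaddress.ip_address(info[4][0])
        except ValueError:
            return False
        # Rejects loopback, private, and link-local ranges (incl. 169.254.0.0/16).
        if not addr.is_global:
            return False
    return bool(infos)


# Usage (assumes the requests package is installed):
# if is_safe_url(user_provided_url):
#     response = requests.get(user_provided_url, allow_redirects=False, timeout=5)
```

This is not a complete defense: the HTTP client resolves the hostname again when it connects, so DNS rebinding can still slip through unless the connection is pinned to the address that was checked.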

3. Broken Object Level Authorization (BOLA)

This is the sneaky one. The endpoint looks totally fine at first
glance.

@app.get("/orders/{order_id}")
def get_order(order_id: int):
    return db.query(Order).filter(Order.id == order_id).first()

Nothing ties the order to the caller, so any authenticated user
can read any order just by changing the ID. It's #1 on the OWASP
API Security Top 10, and it's basically invisible in a normal
code review.
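
The missing piece is an ownership check. Below is a framework-free sketch of the idea; the `owner_id` field, the `NotFound` exception, and `get_order_for_user` are all hypothetical. In the endpoint above, the equivalent would be an extra filter comparing the order's owner to the authenticated user.

```python
from dataclasses import dataclass


@dataclass
class Order:
    id: int
    owner_id: int  # hypothetical column linking an order to its owner
    total: float


class NotFound(Exception):
    """Raised when the order is missing or belongs to someone else."""


def get_order_for_user(orders, order_id, current_user_id):
    """Look up an order and enforce that the caller owns it."""
    order = orders.get(order_id)
    # Same error for "missing" and "not yours", so IDs can't be enumerated.
    if order is None or order.owner_id != current_user_id:
        raise NotFound(f"order {order_id} not found")
    return order
```

Returning the same error for both cases is a design choice: a distinct "forbidden" response would confirm to an attacker which IDs exist.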

4. SQL Injection via String Formatting

Even in 2026, AI still reaches for f-strings when building
queries, especially in less common ORMs or raw SQL contexts.

query = f"SELECT * FROM users WHERE username = '{username}'"

Not much to say here. We've known about this for more than 25 years.
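
For contrast, a small sketch of the parameterized version using Python's built-in sqlite3 module. The table layout and the `find_user` helper are made up for the example; other drivers use a different placeholder style (for example `%s`), but the idea is the same.

```python
import sqlite3


def find_user(conn, username):
    """Fetch a user row with a bound parameter instead of string formatting."""
    cur = conn.execute(
        "SELECT id, username FROM users WHERE username = ?",
        (username,),  # the driver passes this as data, never as SQL
    )
    return cur.fetchone()


# Throwaway in-memory database for the demo.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE users (id INTEGER PRIMARY KEY, username TEXT)")
conn.execute("INSERT INTO users (username) VALUES (?)", ("alice",))

print(find_user(conn, "alice"))        # (1, 'alice')
print(find_user(conn, "' OR '1'='1"))  # None: the payload is just a odd username
```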


The Fix: A Policy Gate at the PR Level

The standard CI pipeline checks if code works.
It doesn't check if code is safe.

Linters catch style. Tests catch regressions. Neither of them
catches "this endpoint has no ownership check."

What you actually need is a layer that runs security policy
against the PR diff: before merge, every time, automatically.
That means static analysis rules tuned to your threat model,
some AI-assisted context on top (not just pattern matching),
and a clear verdict on every PR: BLOCK, FLAG, or PASS.

Quarterly pentests and post-merge audits don't cut it anymore.
The enforcement has to happen at the pull request.


How vorsken Does It

I built vorsken to solve exactly this. It's a GitHub Action
that runs Semgrep + Claude AI on every PR diff and posts a
verdict as a PR comment.

Setup takes about two minutes. Add this step to a workflow that
runs on pull requests:

# .github/workflows/vorsken.yml
- uses: zetide/vorsken@v0.2.6
  with:
    anthropic-api-key: ${{ secrets.ANTHROPIC_API_KEY }}

You can configure what gets blocked and what gets flagged:

# .stacksecai.yml
policy:
  block_on: ["ERROR"]
  flag_on: ["WARNING"]
claude:
  model: "claude-haiku-4-5"
  severity_block: ["CRITICAL", "HIGH"]
  severity_flag: ["MEDIUM"]
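
To make the policy semantics concrete, here is a minimal sketch of how a config like this could reduce a set of findings to one verdict. This is not vorsken's actual implementation; the shape of a finding (a dict with a `severity` key) and the `verdict` function are assumptions.

```python
BLOCK, FLAG, PASS = "BLOCK", "FLAG", "PASS"


def verdict(findings, block_on=("ERROR",), flag_on=("WARNING",)):
    """Reduce a list of findings to a single verdict, worst outcome first."""
    severities = {f["severity"] for f in findings}
    if severities & set(block_on):
        return BLOCK  # any blocking severity wins outright
    if severities & set(flag_on):
        return FLAG
    return PASS


print(verdict([{"severity": "WARNING"}, {"severity": "ERROR"}]))  # BLOCK
```

Checking the blocking set first means one ERROR outweighs any number of WARNINGs, which matches the "worst finding decides" behavior you'd expect from a gate.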

On every PR, you get something like this:

🚨 vorsken Policy Gate: BLOCK

Finding: Hardcoded API key detected
Risk: Credential exposure via git history
Fix: Use environment variables or a secrets manager
Rule: OWASP API8 – Security Misconfiguration

The PR can't merge until the finding is resolved. That's the
point.


Wrapping Up

AI coding tools aren't going away, and honestly, I don't want
them to. But the volume of AI-generated PRs is only going to
increase, and most pipelines aren't ready for what that means.

A policy gate at the PR level isn't a replacement for code
review. It's the layer that catches what humans miss when
they're moving fast.

If you're already shipping AI-generated code (and you probably
are), it's worth five minutes to see what's making it through.

→ vorsken on GitHub
→ GitHub Marketplace
→ vorsken.dev

Top comments (3)

Jill Mercer

i'm all in on vibe coding (it's the only way i ship) but the merge and pray method is a recipe for a bad weekend. learned that the hard way after my last platform went dark and i had to rebuild from scratch. now i treat every ai diff as a suggestion rather than a fact. still figuring it out in cursor, but owning the final check is the only way to keep the main branch clean.

vorsken

Ha, "merge and pray" is exactly the phrase that kept coming back
to me while building this.

The "treat every AI diff as a suggestion" mindset is the right one,
but it's exhausting to do manually on every PR. That's the gap vorsken
is trying to fill: make the final check automatic so you don't have
to rely on discipline alone.

Curious: in Cursor, are you doing any kind of review step before
committing, or is it mostly post-merge that you catch things?
