
문세환


I built a security scanner for AI-generated code — here's what it found

Vibe coding is everywhere. You prompt Claude or ChatGPT, paste the output, ship it. Fast. But here's the problem nobody talks about: AI models consistently produce the same security mistakes, over and over.

I spent the last few months building a scanner specifically for this pattern. Here's what I found.


The Problem With AI-Generated Code

When an LLM writes code, it optimizes for working code, not secure code. And it tends to make the same class of mistakes:

# AI loves this pattern: looks clean, is dangerous
def get_user(user_id):
    query = f"SELECT * FROM users WHERE id = {user_id}"  # SQL injection
    return db.execute(query)

# AI generates this constantly for file handling
def read_file(filename):
    path = os.path.join(BASE_DIR, filename)  # path traversal if filename = "../../etc/passwd"
    return open(path).read()

# The "vibe coding" stub: looks implemented, does nothing
def save_user_data(data):
    # TODO: implement database saving
    return {"status": "saved"}  # MISSING_WRITE: no actual DB write

These aren't obscure edge cases. They're patterns that appear in AI-generated code constantly because the models learned from code that had these issues.
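
For contrast, here's a rough sketch of what safer versions of the first two snippets could look like. This is not VibeGuard output. I'm assuming `sqlite3` as the database driver, and `conn`, the `users` table, `BASE_DIR`, and `resolve_safe_path` are placeholder names made up for illustration.

```python
import sqlite3
from pathlib import Path

BASE_DIR = Path("/srv/app/uploads")  # hypothetical upload root, an assumption


def get_user(conn: sqlite3.Connection, user_id):
    # Placeholder binding: the driver sends user_id as data, never as SQL text.
    cur = conn.execute("SELECT * FROM users WHERE id = ?", (user_id,))
    return cur.fetchone()


def resolve_safe_path(filename: str, base_dir: Path = BASE_DIR) -> Path:
    # Resolve ".." segments and symlinks, then require the result
    # to stay inside base_dir (Path.is_relative_to needs Python 3.9+).
    base = base_dir.resolve()
    target = (base / filename).resolve()
    if not target.is_relative_to(base):
        raise ValueError(f"path escapes base directory: {filename!r}")
    return target
```

With the parameterized query, an input like `"1 OR 1=1"` is compared as a value and matches nothing. With the path check, `"../../etc/passwd"` raises instead of being opened.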


What I Built: VibeGuard

VibeGuard is an AST-based scanner with 48 detection patterns specifically tuned for AI-generated code:

  • 33 security patterns: SQL injection, command injection, path traversal, XSS, SSRF, hardcoded secrets, eval/exec, weak crypto...
  • 15 vibe-coding patterns: stub skeletons, missing DB writes, fake async, dead call results, hardcoded lookup tables...
  • 9 languages: Python, JavaScript, TypeScript, Go, Ruby, Java, PHP, Kotlin, C/C++

The key difference from tools like Bandit or Semgrep: VibeGuard knows what AI-generated code looks like. It doesn't just find security bugs — it finds the specific anti-patterns that emerge when LLMs write code.
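
To give a feel for what "AST-based" means in practice, here's a minimal sketch of one such check using Python's standard `ast` module. It is not VibeGuard's actual implementation. The function name `find_fstring_sql` and the keyword list are my own assumptions, and a real detector would need far more context than this.

```python
import ast

# Assumed heuristic: an f-string whose literal text starts with a SQL verb.
SQL_KEYWORDS = ("select ", "insert ", "update ", "delete ")


def find_fstring_sql(source: str) -> list[int]:
    """Return line numbers of f-strings that look like SQL and interpolate values."""
    hits = []
    for node in ast.walk(ast.parse(source)):
        if not isinstance(node, ast.JoinedStr):  # JoinedStr is the AST node for an f-string
            continue
        # Join the literal text pieces of the f-string, skipping the {...} parts.
        literal = "".join(
            part.value
            for part in node.values
            if isinstance(part, ast.Constant) and isinstance(part.value, str)
        ).lower()
        has_interpolation = any(
            isinstance(part, ast.FormattedValue) for part in node.values
        )
        if has_interpolation and literal.lstrip().startswith(SQL_KEYWORDS):
            hits.append(node.lineno)
    return hits
```

Because it walks the parsed tree instead of matching text, the check ignores plain strings and comments, and the same input always produces the same result.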


Try It Right Now (30 seconds)

No install needed. Just curl:

curl -X POST https://pleasing-transformation-production-90c2.up.railway.app/v1/scan \
  -H "X-API-Key: vg_free_test" \
  -F "file=@your_file.py"

Response:

{
  "filename": "app.py",
  "blocks": 2,
  "warns": 5,
  "issues": [
    {
      "kind": "SQL_INJECTION_RISK",
      "severity": "BLOCK",
      "line": 23,
      "detail": "f-string interpolation in SQL — use parameterized queries"
    }
  ]
}

Python:

import requests

with open("app.py", "rb") as f:
    r = requests.post(
        "https://pleasing-transformation-production-90c2.up.railway.app/v1/scan",
        headers={"X-API-Key": "vg_free_test"},
        files={"file": f}
    )
print(r.json())
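
If you want a script to act on the result instead of just printing it, something like the sketch below could work. It assumes the response always has the shape shown in the sample above (`filename`, `blocks`, `issues` with `kind`, `severity`, `line`, `detail`). `summarize_scan` is a hypothetical helper of mine, not part of any VibeGuard client.

```python
def summarize_scan(report: dict) -> tuple[int, list[str]]:
    """Turn a scan response into (exit_code, printable lines)."""
    filename = report.get("filename", "?")
    lines = []
    for issue in report.get("issues", []):
        severity = issue.get("severity", "?")
        kind = issue.get("kind", "?")
        line_no = issue.get("line", "?")
        detail = issue.get("detail", "")
        lines.append(f"{severity:5}  {kind}  {filename}:{line_no}  {detail}")
    # Assumption: any BLOCK-level finding should fail the run.
    exit_code = 1 if report.get("blocks", 0) > 0 else 0
    return exit_code, lines
```

Usage would be along the lines of `code, lines = summarize_scan(r.json())`, then printing `lines` and calling `sys.exit(code)`.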

Add It to GitHub CI (2 minutes)

# .github/workflows/vibeguard.yml
name: VibeGuard Security Scan
on: [pull_request]
jobs:
  scan:
    runs-on: ubuntu-latest
    steps:
      - uses: actions/checkout@v4
      - uses: Moonsehwan/aina-vibeguard-action@v1
        with:
          api-key: ${{ secrets.VIBEGUARD_KEY }}
          fail-on-block: 'true'

Every PR gets scanned. AI-generated SQL injection or stub skeleton → merge blocked.


The 15 Vibe-Coding Patterns

This is what makes VibeGuard different. These patterns don't exist in traditional scanners. Here are seven of the 15:

| Pattern | What It Looks Like |
| --- | --- |
| STUB_SKELETON | def process(data): return {} (LLM left a placeholder) |
| MISSING_WRITE | def save_user(data): return {"status": "saved"} (no INSERT) |
| FAKE_ASYNC | async def fetch(): return data (async without await) |
| DEAD_CALL_RESULT | Calls 3 modules, ignores all return values |
| HARDCODED_TABLE | Replaces DB lookup with giant hardcoded dict |
| INPUT_OUTPUT_DISCONNECTED | Parameters don't affect return value |
| MOCK_PATTERN | unittest.mock in production code |

If you've used Claude Code, Cursor, or Copilot heavily, I promise you have at least one of these.
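
As an illustration of how a pattern like FAKE_ASYNC can be spotted from the syntax tree, here's a small sketch with Python's `ast` module. It is my simplified take, not VibeGuard's rule: `find_fake_async` is a hypothetical name, and the check would misreport legitimate cases such as async generators or functions that only await inside a comprehension.

```python
import ast

# Node types that show a function actually uses the event loop.
ASYNC_NODES = (ast.Await, ast.AsyncFor, ast.AsyncWith)


def find_fake_async(source: str) -> list[str]:
    """Return names of async functions whose body never awaits anything."""
    fake = []
    for node in ast.walk(ast.parse(source)):
        if not isinstance(node, ast.AsyncFunctionDef):
            continue
        uses_async = any(
            isinstance(child, ASYNC_NODES)
            for stmt in node.body
            for child in ast.walk(stmt)
        )
        if not uses_async:
            fake.append(node.name)
    return fake
```

Running it on `async def fetch(): return data` reports `fetch`, while a function containing `await load()` is left alone.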


Real Finding

I scanned a popular open-source AI coding assistant (25K+ stars):

BLOCK  COMMAND_INJECTION  agent.py:1222
       subprocess.Popen(cmd, shell=True)
       any malicious config file can execute arbitrary commands

Found in 3 seconds. Bandit missed it. Semgrep missed it.
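
For reference, a common way to avoid this class of bug is to skip the shell entirely and check the program against an allowlist. The sketch below is a generic pattern under my own assumptions, not the fix that project used. `run_command` and the `allowed` parameter are hypothetical names.

```python
import shlex
import subprocess


def run_command(cmd: str, allowed: frozenset[str]) -> subprocess.CompletedProcess:
    # shlex.split tokenizes like a POSIX shell, but nothing gets interpreted:
    # ";", "&&" and "$(...)" end up as plain argument text.
    argv = shlex.split(cmd)
    if not argv or argv[0] not in allowed:
        raise PermissionError(f"command not allowed: {cmd!r}")
    # shell=False (the default) executes argv[0] directly, with no shell in between.
    return subprocess.run(
        argv, shell=False, capture_output=True, text=True, check=False
    )
```

Dropping `shell=True` stops command chaining, but a config file could still name any program, which is why the allowlist check matters here.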


vs Bandit / Semgrep

| | VibeGuard | Bandit | Semgrep |
| --- | --- | --- | --- |
| AI code patterns | 15 specific | none | none |
| Languages | 9 | Python only | 30+ |
| GitHub Action | yes | yes | yes |
| Free tier | 50 files/day | unlimited | limited |

The gap is the AI-specific patterns. Bandit and Semgrep are great — they just weren't designed for LLM-generated code.


Try It

Scan your AI-generated code before it ships. 30 seconds, you'll be surprised what you find.


AST-based, deterministic. Same input always gives same output. No LLM in the scan pipeline.
