Moon sehwan

Posted on Jun 23

I gave AI-generated code a score from 0–100. Most repos scored below 30.

#security #ai #programming #webdev

What if your code got a grade — like a school report card — but brutally honest?

I built exactly that. A scanner that reads your code and returns:

A score from 0 to 100
A grade from S to F-
A message like: "Your data is everyone's data now"
A per-vulnerability roast that hits different

Here's what happened when I ran it on real AI-generated repos.

The Grade System

Grade	Score	Character	Vibe
S	95–100	🦄	"Mythical. Frame this on your wall."
A+	88–94	👑	"Written by an actual human?"
A	78–87	🚀	"Pretty good. Room for polish."
B+	65–77	😎	"Not bad. PR might get approved."
B	50–64	🤔	"50/50. Could go either way."
C	35–49	😅	"Deploy with prayer."
D	20–34	😱	"AI-generated, not reviewed."
F	8–19	🤖	"AI slop, shipped raw."
F-	0–7	💣	"Your data is everyone's data now."

The score isn't random. Every vulnerability has a weighted deduction:

SQL_INJECTION_RISK    → -28 points
COMMAND_INJECTION     → -28 points
HARDCODED_SECRET      → -22 points
EVAL_EXEC_RISK        → -18 points
MISSING_WRITE         → -10 points  ← vibe coding special
STUB_SKELETON         →  -8 points  ← vibe coding special
FAKE_ASYNC            →  -6 points  ← vibe coding special

Repeat the same bug type? Each repeat costs less (60%, then 40%, then 20% of the base deduction), so one bad pattern can't unfairly bury your whole score.
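Here's a rough sketch of how that arithmetic could work. It is not AINAScan's actual code: I'm assuming the first hit costs the full deduction, that repeats beyond the fourth stay at 20%, and that the score floors at 0. The names score, grade, REPEAT_FACTORS and GRADE_FLOORS are made up for illustration.

```python
from collections import Counter

# Base deductions copied from the list above.
DEDUCTIONS = {
    "SQL_INJECTION_RISK": 28,
    "COMMAND_INJECTION": 28,
    "HARDCODED_SECRET": 22,
    "EVAL_EXEC_RISK": 18,
    "MISSING_WRITE": 10,
    "STUB_SKELETON": 8,
    "FAKE_ASYNC": 6,
}

# Assumption: first hit costs 100%, repeats cost 60%, 40%, then 20%.
REPEAT_FACTORS = [1.0, 0.6, 0.4, 0.2]

# Lower bound of each grade band, taken from the grade table.
GRADE_FLOORS = [(95, "S"), (88, "A+"), (78, "A"), (65, "B+"),
                (50, "B"), (35, "C"), (20, "D"), (8, "F"), (0, "F-")]


def score(findings):
    """Return a 0-100 score for a list of finding type names."""
    seen = Counter()
    total = 100.0
    for kind in findings:
        # Pick the factor for this occurrence; later repeats reuse the last one.
        factor = REPEAT_FACTORS[min(seen[kind], len(REPEAT_FACTORS) - 1)]
        total -= DEDUCTIONS[kind] * factor
        seen[kind] += 1
    return max(0, round(total))


def grade(points):
    """Map a score to its letter grade."""
    for floor, letter in GRADE_FLOORS:
        if points >= floor:
            return letter
    return "F-"
```

Under these assumptions, one finding each of HARDCODED_SECRET, SQL_INJECTION_RISK, COMMAND_INJECTION, MISSING_WRITE and FAKE_ASYNC leaves 100 - 22 - 28 - 28 - 10 - 6 = 6 points, which grade() maps to F-.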

The Roasts Hit Different Per Vulnerability

This is the part people screenshot and share.

💉 SQL_INJECTION_RISK

"SELECT * FROM users WHERE hacker=1 — already queued"
"Free DB access for the world, courtesy of you"
"Your DB is readable by everyone. congrats"

🔑 HARDCODED_SECRET

"Bots harvest GitHub secrets in under 5 seconds"
"It's in your code. Not a secret anymore."
"Someone may already be using it"

💾 MISSING_WRITE (the vibe coding classic)

"save() without INSERT. Peak AI slop"
"save() that saves nothing — plot twist"
"AI forgot to implement the implementation"

⏳ FAKE_ASYNC

"async with no await. Event loop is crying"
"async keyword as decoration, not function"
"Synchronous code in async clothing"

🏗️ STUB_SKELETON

"return {} — AI gave up mid-implementation"
"All skeleton, no muscle. Decorative code"
"This function is an elaborate nothing"

🖥️ COMMAND_INJECTION

"Free server root access — thanks to you"
"rm -rf / is one payload away"
"More dangerous than handing out your SSH key"

Real Test: What Score Does This Get?

import sqlite3, subprocess

API_KEY = "sk-prod-abc123"  # whoops

def get_user(user_id):
    db = sqlite3.connect("users.db")
    query = f"SELECT * FROM users WHERE id = '{user_id}'"
    return db.execute(query).fetchall()

def run_backup(path):
    subprocess.run(f"tar -czf backup.tar.gz {path}", shell=True)

async def notify(user_id):
    import requests
    return requests.get(f"http://service/{user_id}").json()

def save_profile(data):
    return {"status": "saved"}  # ← saves nothing

I ran this through AINAScan. Here's the breakdown:

BLOCK: HARDCODED_SECRET     L3   → -22 pts
BLOCK: SQL_INJECTION_RISK   L7   → -28 pts
BLOCK: COMMAND_INJECTION    L11  → -28 pts
BLOCK: MISSING_WRITE        L15  → -10 pts
WARN:  FAKE_ASYNC           L13  → -6 pts

Final score: 6 / 100
Grade: F- 💣
Message: "Your data is everyone's data now"

6 out of 100. One typo away from a security incident.

The brutal irony? This exact pattern shows up in real vibe-coded repos. Someone asked ChatGPT to "write a user profile API." It generated this — SQL injection, hardcoded keys, and a save_profile() that saves nothing. The AI was cosplaying a backend.
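For contrast, here's a minimal sketch of what the three BLOCK-level findings could look like once fixed. These are standard Python remediations, not scanner output. The API_KEY environment variable name and the choice to pass an open connection into get_user are my assumptions.

```python
import os
import sqlite3
import subprocess


def load_api_key():
    # Read the secret from the environment instead of committing it.
    # API_KEY is a hypothetical variable name.
    return os.environ["API_KEY"]


def get_user(db, user_id):
    # Placeholder binding: user_id travels as data, never spliced into the SQL.
    return db.execute("SELECT * FROM users WHERE id = ?", (user_id,)).fetchall()


def run_backup(path):
    # Argument list without shell=True, so path is never parsed by a shell.
    # "--" keeps tar from reading a path that starts with "-" as an option.
    subprocess.run(["tar", "-czf", "backup.tar.gz", "--", path], check=True)
```

With the placeholder in place, an input like 1' OR '1'='1 is compared against the id column as a literal string instead of rewriting the query.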

Why a Score Works Better Than a Bug List

Most security scanners dump a wall of issues. Nobody reads them.

A single number changes that:

"Our codebase is a 73" → actual team conversation
"We dropped from 81 to 64 after last sprint" → real accountability
"This PR dropped the score 12 points" → concrete code review anchor

Plus: same file always gets the same score (SHA-256 dedup). Fix a bug, rescan, watch the number move. That feedback loop is addictive in a good way.
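The post doesn't show how the dedup works, so treat this as one plausible reading: hash the file contents and key the result by digest. cached_scan and _cache are hypothetical names, and the scan argument stands in for whatever actually computes the score.

```python
import hashlib

# Hypothetical in-memory store: SHA-256 hex digest -> score.
_cache = {}


def cached_scan(source: bytes, scan):
    """Return the score for source, scanning only if this content is new."""
    digest = hashlib.sha256(source).hexdigest()
    if digest not in _cache:
        _cache[digest] = scan(source)
    return _cache[digest]
```

Only the digest and the score are kept in this sketch, never the code itself. Change a single byte and the digest changes, so the file gets rescanned.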

The 3 Vibe-Coding Bugs That Destroy Scores Most

1. FAKE_ASYNC (-6 pts each)

# ChatGPT's idea of "async"
async def process_items(items):
    results = []
    for item in items:
        results.append(expensive_sync_operation(item))  # blocks everything
    return results

The async keyword alone doesn't make anything concurrent. With no await inside, the coroutine never yields, so every call to expensive_sync_operation blocks the event loop. This is the #1 pattern AI generates when asked to "make it async."

Fix: Either add await asyncio.to_thread(expensive_sync_operation, item) or remove async.
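A minimal sketch of the first option, assuming Python 3.9+ (where asyncio.to_thread was added). The body of expensive_sync_operation is a stand-in I made up, since the original isn't shown.

```python
import asyncio
import time


def expensive_sync_operation(item):
    # Stand-in for blocking work (hypothetical body).
    time.sleep(0.01)
    return item * 2


async def process_items(items):
    # Each call runs in a worker thread, so the event loop stays responsive.
    tasks = [asyncio.to_thread(expensive_sync_operation, item) for item in items]
    # gather() awaits all of them and returns results in input order.
    return await asyncio.gather(*tasks)
```

Run it with asyncio.run(process_items([1, 2, 3])). If the work is CPU-bound rather than I/O-bound, threads won't buy much under the GIL, and dropping async may be the more honest fix.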

2. MISSING_WRITE (-10 pts each)

# AI's idea of "saving"
def save_order(order_data: dict) -> dict:
    order_id = generate_id()
    return {"status": "saved", "order_id": order_id}
    # ↑ WHERE IS THE INSERT

No SQL. No file write. No cache set. The function exists, has a name that implies persistence, and does nothing persistent. This happens when AI generates the contract before the implementation.

Fix: Actually write to something. db.execute("INSERT INTO orders ...", (...)) would be a start.
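Here's roughly what that could look like with sqlite3. The orders table, its id and payload columns, and the uuid-based generate_id are all assumptions for the sake of the example.

```python
import json
import sqlite3
import uuid


def generate_id() -> str:
    # Hypothetical stand-in; the original helper is not shown.
    return uuid.uuid4().hex


def save_order(db: sqlite3.Connection, order_data: dict) -> dict:
    order_id = generate_id()
    # The connection as a context manager commits on success
    # and rolls back if the INSERT raises.
    with db:
        db.execute(
            "INSERT INTO orders (id, payload) VALUES (?, ?)",
            (order_id, json.dumps(order_data)),
        )
    return {"status": "saved", "order_id": order_id}
```

Now the "saved" status is only returned after a row actually lands in the table.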

3. STUB_SKELETON (-8 pts each)

# AI's "implementation" of complex logic
def calculate_risk_score(user: dict, portfolio: list) -> float:
    # Calculate user risk score based on portfolio
    # TODO: implement
    return {}

Returns a dict from a function typed to return float. AI generated the signature and the comments, then gave up on the actual logic. Ships anyway.
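I can't write the real risk logic for you, but one option (my suggestion, not something the scanner prescribes) is to make the stub fail loudly until someone implements it, instead of returning a value that looks valid:

```python
def calculate_risk_score(user: dict, portfolio: list) -> float:
    """Calculate a user's risk score from their portfolio."""
    # Unfinished on purpose: raising beats returning a wrong-typed placeholder.
    raise NotImplementedError("calculate_risk_score is not implemented yet")
```

A caller now crashes at the first call in testing rather than passing an empty dict downstream as a "score".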

How It Actually Works (The Technical Part)

Three layers, none of them regex-based:

Layer 1 — Taint tracking across 9 languages
Builds a set of tainted variables from sources (request.args, sys.argv, form inputs). Tracks them through assignments. Checks if any reach dangerous sinks (execute(), subprocess.run(), eval()).

Layer 2 — AST structural analysis
Detects vibe-coding patterns: functions named save* with no write operations, async def with no await, functions where parameters never influence the return value (using def-use graphs).

Layer 3 — Causal impact scoring
Cross-references findings with a knowledge graph (133K+ causal chains) to estimate real-world impact. SQL injection → data exfiltration has a 0.94 probability in the graph. That's why it costs 28 points.

Supports: Python, JavaScript, TypeScript, Go, Ruby, Java, Kotlin, PHP, C/C++

Try It

👉 Paste your code at AINAScan — get your score in seconds

Or curl it directly (free test key, no signup):

curl -X POST https://pleasing-transformation-production-90c2.up.railway.app/v1/scan \
  -H 'X-API-Key: vg_free_test' \
  -F 'file=@your_code.py'

Nothing is stored. Code runs in memory and is discarded after the scan.

What's Your Score?

Drop it in the comments.

I want to see the distribution across Dev.to readers. My guess is most of us are in the B-C range — not catastrophically broken, but not clean either. The S-tier folks are rare. The F- club is larger than anyone wants to admit.

Are you a 🦄 or a 💣?

Try it and find out.

48 patterns, 9 languages, open source core at github.com/moonsehwan/aina-scan
