DEV Community

wd400

I code-reviewed 3 "vibe-coded" PRs last week. Every one had hardcoded API keys. So I built a grader.

"Vibe coding" is everywhere. You prompt an AI, it writes your whole project, you ship it.

Last week I reviewed 3 PRs from vibe-coded projects. All three had hardcoded API keys in the source. Two had no tests. One had a raw eval() on user input.

So I built vibescore.

What it does

pip install vibescore
vibescore .

One command. Letter grade from A+ to F. Four dimensions:

Category       What it checks
Security       Hardcoded secrets, SQL injection, eval/exec, insecure defaults
Code Quality   Function length, complexity, nesting depth, type hint coverage
Dependencies   Pinning, lock files, deprecated packages, known CVEs
Testing        Test count vs LOC ratio, coverage setup, CI configuration
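
To make the Security row concrete, here is a minimal sketch of how an AST-based check for eval/exec calls and hardcoded secrets could look in Python. It is not vibescore's actual implementation: the function name find_security_issues and the SECRET_HINTS list are hypothetical, and a real scanner would need far more patterns than this.

```python
import ast

# Hypothetical substrings that make a variable name look secret-like
SECRET_HINTS = ("key", "secret", "token", "password")

def find_security_issues(source: str) -> list[tuple[int, str]]:
    """Return (line, message) pairs for eval/exec calls and secret-like string assignments."""
    issues = []
    for node in ast.walk(ast.parse(source)):
        # Direct calls such as eval(x) or exec(x)
        if (isinstance(node, ast.Call)
                and isinstance(node.func, ast.Name)
                and node.func.id in {"eval", "exec"}):
            issues.append((node.lineno, f"{node.func.id}() call"))
        # Assignments such as API_KEY = "..." where the value is a non-empty string literal
        elif isinstance(node, ast.Assign):
            value = node.value
            if isinstance(value, ast.Constant) and isinstance(value.value, str) and value.value:
                for target in node.targets:
                    if (isinstance(target, ast.Name)
                            and any(hint in target.id.lower() for hint in SECRET_HINTS)):
                        issues.append((node.lineno, f"possible hardcoded secret in {target.id}"))
    return sorted(issues)

sample = '''
import os
API_KEY = "sk-live-1234567890"
token = os.environ["TOKEN"]
result = eval(input())
'''

for line, message in find_security_issues(sample):
    print(f"line {line}: {message}")
```

The environment lookup on the token line is not flagged because its value is not a string literal, which is the distinction a name-plus-literal heuristic like this relies on.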

Example output

vibescore v0.4.0 — Project Report
━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━
  Security     B+  (no hardcoded secrets, 2 eval() calls found)
  Code Quality C   (4 functions >50 lines, low type hint coverage)
  Dependencies A-  (all pinned, lock file present)
  Testing      D   (3 tests for 2,400 LOC)
━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━
  OVERALL      C+

Supported languages

  • Python (AST-based analysis)
  • JavaScript/TypeScript (regex-based)
  • Rust (VC221-VC227: unwrap density, unsafe blocks, doc comments, clone detection)
  • Go (VC231-VC237: unchecked errors, goroutine leaks, naked returns, panic in library code)
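
The difference between AST-based and regex-based analysis matters in practice. As an illustration only, here is a small Python sketch of a line-by-line regex scan over JavaScript source. The patterns and the scan_js function are hypothetical and are not taken from vibescore.

```python
import re

# Hypothetical patterns, chosen for illustration only
PATTERNS = {
    "eval() call": re.compile(r"\beval\s*\("),
    "possible hardcoded secret": re.compile(
        r"""\b\w*(?:key|secret|token|password)\w*\s*[:=]\s*["'][^"']{8,}["']""",
        re.IGNORECASE,
    ),
}

def scan_js(source: str) -> list[tuple[int, str]]:
    """Return (line, label) pairs for every pattern that matches a line."""
    findings = []
    for lineno, line in enumerate(source.splitlines(), start=1):
        for label, pattern in PATTERNS.items():
            if pattern.search(line):
                findings.append((lineno, label))
    return findings

sample = """const apiKey = "sk-live-1234567890";
const token = process.env.TOKEN;
// eval(userInput) is dangerous
const out = eval(userInput);
"""

for lineno, label in scan_js(sample):
    print(f"line {lineno}: {label}")
```

Note that the commented-out eval on line 3 is reported alongside the real call on line 4. A regex has no notion of comments or string boundaries, which is the usual cost of skipping a parser.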

Extra features

  • vibescore --init-ci — generates a GitHub Actions workflow
  • vibescore --watch — re-scans on file changes in real-time
  • vibescore --dashboard — historical grade tracking (Streamlit web UI)
  • vibescore --save-history — save scan results for trend analysis
  • Zero dependencies. 201 tests.

Comparison

  • SonarQube: requires a Java server, complex setup, enterprise pricing
  • Codacy/CodeClimate: SaaS, requires account, sends code to servers
  • pylint/ruff: lint rules only, no security/testing/dependency analysis, no single grade
  • vibescore: one pip install, one command, local-only, zero deps, covers 4 dimensions with a letter grade

GitHub: github.com/stef41/vibescore
PyPI: pypi.org/project/vibescore

Feedback welcome — especially ideas for new check categories or language support.

Top comments (3)

Twisted-Code'r

This hits close to home: I built ShadowAudit for exactly this reason. Hardcoded API keys in prompts sent to LLMs is the same problem one layer deeper.

vibescore catches keys before they're committed. ShadowAudit catches them before they reach an AI API at runtime. Different layers, same problem.

Would love to explore if these could complement each other: a vibescore check that also flags prompts likely to contain runtime secrets would be interesting.

github.com/Jeffrin-dev/ShadowAudit if you want to look at the approach.

Laura Ashaley

This is honestly becoming a real pattern with “vibe coding” workflows—speed goes up, but basic security discipline disappears. Hardcoded API keys in PRs is one of those issues that should never make it past local development, yet it keeps slipping through when AI-generated or rapid prototype code bypasses proper review habits. Building a grader is a smart move because this isn’t just a linting problem—it’s a workflow enforcement problem. The real win here is shifting left: catching secrets, insecure configs, and unsafe patterns before code even reaches review.

Mario Gutierrez

Ohhh you should go to GitHub, it's the wild west in there...