DEV Community

ayame0328

I Built a Security Scanner Because AI Code Scared Me

Three weeks ago, I was reviewing a pull request that Claude had generated for me. Authentication system, looks clean, tests pass. Ship it.

Then I looked closer.

The JWT secret was hardcoded. The password comparison used == instead of a timing-safe function. There was no rate limiting on the login endpoint. And the session token was stored in localStorage — XSS paradise.
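Two of those four have small standard-library fixes. Here is a rough Python sketch, not the code from that pull request: the `JWT_SECRET` variable name and both helper functions are assumptions made up for illustration.

```python
import hmac
import os


def load_jwt_secret(env=os.environ) -> str:
    """Read the signing secret from the environment instead of hardcoding it."""
    secret = env.get("JWT_SECRET")  # hypothetical variable name
    if not secret:
        raise RuntimeError("JWT_SECRET is not set")
    return secret


def digests_match(stored_digest: bytes, candidate_digest: bytes) -> bool:
    """Compare two password digests in constant time rather than with ==."""
    return hmac.compare_digest(stored_digest, candidate_digest)
```

`hmac.compare_digest` takes the same time whether the inputs differ at the first byte or the last, which is the property a plain `==` does not guarantee.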

Four security vulnerabilities in 40 lines of "working" code. And I almost merged it.

That's when I decided to build CodeHeal.

The Problem Nobody Talks About

We all use AI code assistants now. GitHub Copilot, ChatGPT, Claude — they're incredible productivity tools. I build 10x faster with them.

But here's what nobody wants to admit: we're shipping AI-generated code without proper security review.

Not because we're lazy. Because it's so much code. When AI generates 500 lines in 30 seconds, you can't manually review every line for security issues. You scan for obvious problems, check that it works, and move on.

I did a quick audit of my own projects. In the last month, AI assistants had generated roughly 3,000 lines of code for me. I found 14 security issues I had missed during review. Fourteen.

Why Existing Tools Don't Cut It

My first instinct was "just use Snyk" or "just use SonarQube." I tried. Here's what happened:

Snyk: Caught 2 of the 14 issues. Great for dependency vulnerabilities, but it's not designed to catch application-level patterns like hardcoded secrets or silent error handling in AI-generated code.

SonarQube: Caught 4 of the 14. Better, but it flagged 20+ style issues I didn't care about, burying the security findings in noise.

ESLint security plugins: Caught 3 of the 14. Close to the code, but limited rule set.

None of them caught the AI-specific patterns: the "example" API keys that look like placeholders but were actually used in production, the TODO comments masking missing security features, the overprivileged network calls that AI adds "just in case."
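To make two of those patterns concrete, here is a minimal Python sketch of how a line-based check might flag credential-looking literals and security-related TODOs. The regexes, labels, and function name are illustrative assumptions, not CodeHeal's actual rules.

```python
import re

# Hypothetical patterns for illustration only.
# A credential-like name assigned a quoted literal of 8+ characters.
CREDENTIAL_ASSIGNMENT = re.compile(
    r"(?i)\w*(?:api[_-]?key|secret|token)\w*\s*[:=]\s*[\"']([^\"']{8,})[\"']"
)
# A TODO/FIXME comment that mentions a security feature.
SECURITY_TODO = re.compile(
    r"(?i)(?:#|//).*\b(?:TODO|FIXME)\b.*\b(?:auth|validat|sanitiz|rate.?limit|encrypt)"
)


def ai_specific_findings(source: str) -> list[tuple[int, str]]:
    """Return (line number, label) pairs for lines matching either pattern."""
    findings = []
    for lineno, line in enumerate(source.splitlines(), start=1):
        if CREDENTIAL_ASSIGNMENT.search(line):
            findings.append((lineno, "credential-like literal"))
        if SECURITY_TODO.search(line):
            findings.append((lineno, "security TODO"))
    return findings
```

A check like this can't tell a real key from a placeholder. It only surfaces the line so a human looks at it.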

Building the Scanner

Week 1: The Wrong Way

I started by feeding code into an LLM and asking it to find vulnerabilities. Classic 2026 approach.

It worked about 80% of the time. The other 20%? It either missed things entirely or hallucinated vulnerabilities that didn't exist. And running the same scan twice gave different results.

I wrote about this in detail in my previous article — the non-determinism killed it for me. A security tool you can't trust is worse than no tool.

Week 2: Back to Basics

I scrapped the LLM approach and went with static analysis. Pattern matching, regex, rule-based detection. Boring? Yes. Reliable? Absolutely.
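The general shape of a rule-based scanner is simple enough to sketch in a few lines of Python. This is an assumed structure, not the real engine: the `Rule` and `Finding` types, the rule IDs, and the two example patterns are all hypothetical.

```python
import re
from dataclasses import dataclass


@dataclass(frozen=True)
class Rule:
    rule_id: str
    category: str
    severity: str
    pattern: re.Pattern


@dataclass(frozen=True)
class Finding:
    rule_id: str
    severity: str
    line: int


# Two illustrative rules; the real rule set is not shown in this post.
RULES = [
    Rule("CMD-001", "command-injection", "critical",
         re.compile(r"\beval\s*\(\s*input\s*\(")),
    Rule("CMD-002", "command-injection", "high",
         re.compile(r"shell\s*=\s*True")),
]


def scan(source: str, rules=RULES) -> list[Finding]:
    """Check every line against every rule and collect the matches."""
    findings = []
    for lineno, line in enumerate(source.splitlines(), start=1):
        for rule in rules:
            if rule.pattern.search(line):
                findings.append(Finding(rule.rule_id, rule.severity, lineno))
    return findings
```

Because the rules are plain data, adding a detection means appending a `Rule`, and the same input always walks the same list in the same order.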

I started with the vulnerabilities I'd actually found in my own AI-generated code and worked outward:

The first 5 categories came from my own mistakes:

  1. Command injection (that eval(input()) I almost shipped)
  2. Hardcoded secrets (JWT keys, API tokens)
  3. Missing input validation (SQL injection vectors)
  4. Silent error handling (empty catch blocks everywhere)
  5. Overprivileged operations (unnecessary file system access)
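Some of these are awkward to catch with regex alone. For silent error handling (item 4), a syntax-tree check is one alternative. The sketch below uses Python's `ast` module on Python source, which is a swapped-in technique for illustration rather than the regex approach described above, and the function names are hypothetical.

```python
import ast


def _is_noop(stmt: ast.stmt) -> bool:
    """True for `pass` or a bare `...` statement."""
    if isinstance(stmt, ast.Pass):
        return True
    return (
        isinstance(stmt, ast.Expr)
        and isinstance(stmt.value, ast.Constant)
        and stmt.value.value is Ellipsis
    )


def silent_handlers(source: str) -> list[int]:
    """Return line numbers of except blocks whose body does nothing."""
    lines = []
    for node in ast.walk(ast.parse(source)):
        if isinstance(node, ast.ExceptHandler) and all(
            _is_noop(stmt) for stmt in node.body
        ):
            lines.append(node.lineno)
    return sorted(lines)
```

The trade-off is that this only works on code that parses, while a regex rule can still run on a half-finished snippet.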

The next 9 categories came from research:

  1. Obfuscation and encoding tricks
  2. Package dependency risks
  3. Persistence mechanisms
  4. Cryptographic issues
  5. Destructive operations
  6. Privilege escalation
  7. Typosquatting attacks
  8. Consent gap (silent network calls)
  9. Code quality signals (TODO markers, debug leftovers)
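As an example from this second group, typosquatting (item 7) can be approximated by comparing a dependency name against a list of well-known packages. This is a minimal sketch using Python's `difflib`. The package list, the 0.8 cutoff, and the function name are assumptions, and a real check would need a far larger list.

```python
import difflib

# Hypothetical allowlist for illustration.
KNOWN_PACKAGES = ["requests", "numpy", "pandas", "django", "flask"]


def possible_typosquat(name: str, known=KNOWN_PACKAGES, cutoff: float = 0.8):
    """Return the known package that `name` closely resembles, or None."""
    name = name.lower()
    if name in known:
        return None  # exact match: it is the real package
    matches = difflib.get_close_matches(name, known, n=1, cutoff=cutoff)
    return matches[0] if matches else None
```

For example, `possible_typosquat("reqeusts")` points back at `requests`, while an exact or unrelated name returns `None`.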

By the end of week 2, I had 93 detection rules across 14 categories.

Week 3: Making It Real

A scanner that only I can use isn't a product. So I built the SaaS layer:

  • Next.js for the frontend and API
  • Stripe for subscriptions
  • GitHub OAuth for authentication
  • Supabase for scan history
  • Vercel for hosting

The scanner itself runs entirely server-side. You paste code, it scans against all 93 rules, returns findings with severity scores and a composite risk assessment. No code is stored unless you're logged in and want scan history.
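One simple way to turn per-finding severities into a composite risk assessment is a weighted sum with thresholds. The weights, thresholds, and labels below are invented for this sketch, and the actual scoring formula isn't described in this post.

```python
# Hypothetical weights and thresholds, chosen only for illustration.
SEVERITY_WEIGHTS = {"critical": 10, "high": 5, "medium": 2, "low": 1}


def composite_risk(severities: list[str]) -> tuple[int, str]:
    """Sum the severity weights and map the total to a coarse label."""
    score = sum(SEVERITY_WEIGHTS[s] for s in severities)
    if score >= 10:
        label = "critical"
    elif score >= 5:
        label = "high"
    elif score > 0:
        label = "moderate"
    else:
        label = "clean"
    return score, label
```

With these made-up weights, a single critical finding is enough to push a snippet into the top band, no matter how clean the rest of it is.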

The whole thing — scanner engine, web app, payment system, deployment — took about 3 weeks of focused building. AI code assistants helped with the boilerplate (ironic, I know). I scanned my own AI-generated code with my own scanner throughout development. Found 6 issues in my scanner code. Fixed them all.

What I Learned

1. AI Code Has Predictable Failure Modes

After scanning hundreds of samples, I can tell you: AI doesn't make random mistakes. It makes the same mistakes, over and over. Hardcoded secrets, missing validation, silent errors — these aren't edge cases. They're the default.

This means a pattern-based approach works surprisingly well. You don't need AI to catch AI mistakes. You need a checklist — a really good, really thorough checklist.

2. Speed Matters More Than I Expected

My scanner runs in 15-50 milliseconds per scan. That sounds like a meaningless optimization until you realize: if scanning takes more than a few seconds, people skip it.

The goal is to make security scanning feel like syntax highlighting — it just happens, no friction, no waiting.
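If you want to keep an eye on latency yourself, wrapping the scan in a timer is enough. This sketch uses a single stand-in regex rule and a hypothetical `timed_scan` helper. It isn't how the numbers in this post were measured.

```python
import re
import time

PATTERN = re.compile(r"\beval\s*\(")  # stand-in rule for the example


def timed_scan(source: str) -> tuple[int, float]:
    """Return (number of matching lines, elapsed time in milliseconds)."""
    start = time.perf_counter()
    hits = sum(1 for line in source.splitlines() if PATTERN.search(line))
    elapsed_ms = (time.perf_counter() - start) * 1000
    return hits, elapsed_ms
```

`time.perf_counter()` is a monotonic, high-resolution clock, which makes it a better fit for short measurements than `time.time()`.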

3. Determinism Is Non-Negotiable

Same code in, same results out. Every time. This isn't just a technical requirement — it's a trust requirement. If developers see different results on different days, they stop trusting the tool.
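One way to check this property is to serialize the findings in a canonical order and compare hashes across runs. The sketch below assumes a toy one-rule scanner, and `scan`, `fingerprint`, and the rule name are hypothetical.

```python
import hashlib
import json
import re

PATTERN = re.compile(r"\beval\s*\(")  # stand-in rule for the example


def scan(source: str) -> list[dict]:
    findings = [
        {"rule": "eval-call", "line": lineno}
        for lineno, line in enumerate(source.splitlines(), start=1)
        if PATTERN.search(line)
    ]
    # Sort so output order never depends on rule or iteration order.
    return sorted(findings, key=lambda f: (f["line"], f["rule"]))


def fingerprint(findings: list[dict]) -> str:
    """Hash a canonical JSON encoding of the findings."""
    canonical = json.dumps(findings, sort_keys=True, separators=(",", ":"))
    return hashlib.sha256(canonical.encode("utf-8")).hexdigest()
```

If `fingerprint(scan(code))` ever differs between two runs on the same `code`, something non-deterministic has crept in.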

4. The Market Gap Is Real

I built this for myself. But every developer I've shown it to has the same reaction: "I need this." We're all shipping AI-generated code, and we all know we're not reviewing it carefully enough.

The Results

Running CodeHeal against a sample of 50 AI-generated code snippets:

Metric                              Result
Average findings per snippet        2.3
Critical vulnerabilities found      18
High severity findings              31
Scan time (total, all 50)           1.2 seconds
Consistency across repeated runs    100%

The most common finding? Hardcoded secrets. In 68% of samples.

The most dangerous finding? A combination of eval() with user input piped through subprocess — found in an AI-generated CLI tool that was already deployed to npm.

What's Next

CodeHeal is free to use right now. The free tier gets you 5 scans per day, enough to check your AI-generated code before you commit it.

The roadmap:

  • API access for CI/CD integration
  • VS Code extension for real-time scanning
  • Team dashboards for shared vulnerability tracking
  • Custom rule sets for organization-specific patterns

But the core is already there: paste your code, see what's wrong, fix it before it ships.


Try It

If you're using AI code assistants (and in 2026, who isn't?), you owe it to yourself to scan the output before shipping it.

CodeHeal catches what AI misses. 14 categories, 93 rules, zero API costs, instant results.

Scan your code free →
