DEV Community

ayame0328
I Built a Security Scanner Because AI Code Scared Me

Two months ago, I was selling Claude Code skills on Qiita. I had 75,000 page views. Zero paid purchases.

Today, I have a working SaaS that scans AI-generated code for security vulnerabilities. I built the entire MVP in one day.

This is the story of how a failed product led me to a real one.


The Pivot: From Skills to SaaS

I spent a month creating and selling Claude Code skills — reusable prompt templates and workflows. The results were brutal:

  • 75,000+ page views on Qiita (Japanese dev platform)
  • 49 technical articles published
  • 0 paid purchases

The market analysis told the story: the Claude Code Skills paid marketplace had accumulated only $1,400 in total sales across all sellers. The paid market simply didn't exist yet.

But I had something valuable: a security scanner skill with 14 detection categories and 95+ vulnerability check items. It was the most comprehensive piece I'd built. And people kept reading the articles about it.

That's when it clicked: don't sell the skill as a file. Sell it as a tool.


The Problem I Couldn't Ignore

While building the scanner skill, I'd scanned hundreds of AI-generated code samples. The patterns were alarming:

Every AI assistant — ChatGPT, Copilot, Claude — routinely generates code with:

  • Hardcoded API keys directly in source files
  • Shell injection vectors via unsanitized string interpolation
  • Disabled security features ("just set verify=False!")
  • Empty error handlers that silently swallow failures
  • Persistence mechanisms that look like legitimate config
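To make that list concrete, here is a contrived TypeScript sketch of a few of these anti-patterns. The names and values are invented for illustration, not taken from any real scan:

```typescript
// Contrived examples of common AI-assistant anti-patterns (illustrative only).

// 1. Hardcoded API key directly in source
const API_KEY = "sk-live-EXAMPLE-NOT-A-REAL-KEY";

// 2. Shell injection vector: unsanitized string interpolation.
//    Whatever the caller passes lands inside the shell command verbatim.
export function buildCleanupCommand(username: string): string {
  return `rm -rf /home/${username}/tmp`;
}

// 3. Empty error handler that silently swallows failures
export function parseConfig(raw: string): unknown {
  try {
    return JSON.parse(raw);
  } catch {
    return null; // the failure disappears -- the caller never learns why
  }
}
```

A hostile username like `alice; rm -rf /` turns that interpolated cleanup command into two shell commands, which is exactly the kind of match a pattern scanner can flag.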

And the existing security tools? Snyk finds dependency CVEs. SonarQube catches language anti-patterns. Semgrep matches custom rules.

None of them specifically targets the patterns AI code assistants produce.

That gap was my product.


Why I Ditched the LLM Approach

My first instinct was obvious: use an LLM to analyze code. Feed it source, ask for vulnerabilities. I'd seen other tools do this.

I tried it. It was terrible.

I ran the same code through an LLM scanner 5 times and got 5 different severity scores. The API calls took 3-15 seconds each. At $0.03-0.10 per scan, the economics didn't work for a $29/month SaaS. And occasionally, the LLM hallucinated vulnerabilities that didn't exist.

So I went back to basics: regex pattern matching and static analysis.

It's not glamorous. But it's:

  • 100% reproducible — same code, same result, every time
  • Instant — under 50ms per scan
  • Free to run — zero API costs
  • CI/CD friendly — deterministic output means reliable automation

I converted my 95+ detection items into regex patterns organized across 14 categories. Added a scoring system with severity weights and confidence coefficients. Built composite risk detection that flags dangerous pattern combinations.

The final engine: 93 rules, 14 categories, zero LLM dependency.
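To make the shape of that engine concrete, here is a minimal sketch of a deterministic regex-rule scanner. The two rules below are generic illustrations written for this article; the product's real 93 patterns and weights aren't published:

```typescript
// Minimal sketch of a deterministic regex-rule engine. Rules and findings
// carry severity and confidence so a scorer can weigh them later.

type Severity = "critical" | "high" | "medium" | "low" | "info";
type Confidence = "high" | "medium" | "low";

interface Rule {
  id: string;
  category: string;
  severity: Severity;
  confidence: Confidence;
  pattern: RegExp;
}

interface Finding {
  ruleId: string;
  category: string;
  severity: Severity;
  confidence: Confidence;
  line: number;    // 1-based line number of the match
  matched: string; // the matched content
}

// Illustrative rules only -- not the product's actual patterns.
const RULES: Rule[] = [
  {
    id: "secret-001",
    category: "Secret Leakage",
    severity: "critical",
    confidence: "high",
    pattern: /(?:api[_-]?key|secret|token)\s*[:=]\s*["'][A-Za-z0-9_-]{16,}["']/i,
  },
  {
    id: "cmd-001",
    category: "Command Injection",
    severity: "high",
    confidence: "medium",
    pattern: /\bexec\s*\(\s*`[^`]*\$\{/, // exec() with template interpolation
  },
];

export function scan(source: string): Finding[] {
  const findings: Finding[] = [];
  source.split("\n").forEach((lineText, i) => {
    for (const rule of RULES) {
      const m = lineText.match(rule.pattern);
      if (m) {
        findings.push({
          ruleId: rule.id,
          category: rule.category,
          severity: rule.severity,
          confidence: rule.confidence,
          line: i + 1,
          matched: m[0],
        });
      }
    }
  });
  return findings;
}
```

Same input, same output, every time; no tokens, no latency, no hallucinated findings.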


Building the MVP in One Day

Here's where it gets interesting. With the scanner engine design already proven from the skill version, I used Claude Code to build the full SaaS MVP:

Morning: Foundation

  • Next.js 16 + TypeScript + Tailwind CSS 4
  • Scanner engine ported from skill → TypeScript modules
  • POST /api/scan endpoint
  • 5 initial detection categories, 40 rules
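The scan endpoint itself can be tiny. Here's a hedged sketch of what a POST /api/scan handler might look like using the standard Request/Response API (in the App Router this would live in app/api/scan/route.ts); the `code` field and the response shape are my assumptions for illustration, not the product's documented API:

```typescript
// Hypothetical POST /api/scan handler. The inlined scan() is a one-rule
// stand-in for the real 93-rule engine described in this article.

function scan(source: string): Array<{ ruleId: string; line: number }> {
  const findings: Array<{ ruleId: string; line: number }> = [];
  source.split("\n").forEach((lineText, i) => {
    // Example rule: TLS verification disabled ("just set verify=False!")
    if (/verify\s*=\s*False/.test(lineText)) {
      findings.push({ ruleId: "disabled-tls-check", line: i + 1 });
    }
  });
  return findings;
}

export async function POST(request: Request): Promise<Response> {
  const body = await request.json().catch(() => null);
  if (body === null || typeof body.code !== "string") {
    return Response.json(
      { error: "expected a JSON body with a 'code' string" },
      { status: 400 },
    );
  }
  const findings = scan(body.code);
  return Response.json({ findings, count: findings.length });
}
```

Because the engine is synchronous and local, the handler has no outbound API calls to await, which is what keeps scans in the tens-of-milliseconds range.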

Afternoon: Features

  • NextAuth.js v5 with GitHub OAuth
  • Stripe subscription integration (Free / Pro $29 / Enterprise $99)
  • All 14 categories, 93 rules implemented
  • Landing page, pricing page, dashboard
  • Scan history with localStorage

Evening: Deploy

  • Vercel deployment
  • Environment variables configured
  • Production build verified
  • Live at scanner-saas.vercel.app

Was it polished? No. Was it a working product with real security scanning capability, authentication, and payment infrastructure? Yes.

The key accelerator: I wasn't starting from zero. The scanner skill had already validated the detection logic, the severity scoring, and the category structure. Converting that knowledge into a TypeScript SaaS was the fast part.


What It Detects (Without Giving Away the Secret Sauce)

I'm not going to share the specific regex patterns or scoring algorithms — that's the product's core value. But here's what the 14 categories cover:

| Category | What It Catches |
| --- | --- |
| Command Injection | Shell execution, eval, pipe-to-shell |
| Obfuscation | Base64, hex encoding, unicode smuggling |
| Prompt Injection | Instruction override, fake system messages |
| Secret Leakage | API keys, tokens, hardcoded credentials |
| External Communication | Data exfiltration, reverse shells, tunneling |
| Filesystem Operations | Destructive deletes, sensitive file access |
| Package Operations | Suspicious installs, postinstall hooks |
| Persistence | Crontab, systemd, SSH key injection |
| Cryptocurrency | Mining pools, wallet addresses, resource hijacking |
| Ransomware | Encryption loops, ransom notes, shadow deletion |
| Privilege Escalation | Sudo abuse, setuid, container escape |
| Typosquatting | Known fake package names |
| Consent Gap | Silent network calls, clipboard/camera access |
| Metadata & Quality | Debug leftovers, error swallowing, disabled security |
Each finding includes severity level, confidence rating, line number, and matched content. The composite risk system flags dangerous combinations across categories.


The Scoring System

Every detection has two dimensions:

  • Severity: How bad is this if it's real? (Critical → High → Medium → Low → Info)
  • Confidence: How sure are we this is actually malicious? (High → Medium → Low)

The final score multiplies severity points by confidence coefficients. This means a high-severity match with low confidence scores less than a medium-severity match with high confidence.

Plus, composite risk bonuses when multiple suspicious patterns appear together:

  • Secret leakage + external communication = probable data exfiltration (+15 points)
  • Obfuscation + command injection = likely malicious payload (+10 points)
  • Persistence + external connection = potential backdoor (+10 points)

The result is a risk rank: SAFE, CAUTION, DANGEROUS, or CRITICAL.
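Here's a sketch of that two-dimensional scoring. The point values, coefficients, and thresholds below are invented for illustration; the product's real weights are part of the secret sauce:

```typescript
// Severity x confidence scoring with composite-risk bonuses.
// All numeric weights here are made up for illustration.

type Severity = "critical" | "high" | "medium" | "low" | "info";
type Confidence = "high" | "medium" | "low";
type RiskRank = "SAFE" | "CAUTION" | "DANGEROUS" | "CRITICAL";

const SEVERITY_POINTS: Record<Severity, number> = {
  critical: 25, high: 15, medium: 8, low: 3, info: 1,
};
const CONFIDENCE_COEFF: Record<Confidence, number> = {
  high: 1.0, medium: 0.6, low: 0.3,
};

interface ScoredFinding {
  severity: Severity;
  confidence: Confidence;
  category: string;
}

// Pairs of categories that together suggest a worse problem.
const COMPOSITE_BONUSES: Array<[string, string, number]> = [
  ["Secret Leakage", "External Communication", 15],
  ["Obfuscation", "Command Injection", 10],
  ["Persistence", "External Communication", 10],
];

export function riskScore(findings: ScoredFinding[]): number {
  let score = findings.reduce(
    (sum, f) => sum + SEVERITY_POINTS[f.severity] * CONFIDENCE_COEFF[f.confidence],
    0,
  );
  const categories = new Set(findings.map((f) => f.category));
  for (const [a, b, bonus] of COMPOSITE_BONUSES) {
    if (categories.has(a) && categories.has(b)) score += bonus;
  }
  return score;
}

export function riskRank(score: number): RiskRank {
  if (score >= 50) return "CRITICAL";
  if (score >= 25) return "DANGEROUS";
  if (score >= 10) return "CAUTION";
  return "SAFE";
}
```

Even with these invented weights, the ordering described above holds: a high-severity finding at low confidence (15 × 0.3 = 4.5) scores below a medium-severity finding at high confidence (8 × 1.0 = 8).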


What I Learned

1. Failed products aren't wasted effort

My skill-selling project "failed" — but the scanner skill became the foundation for a real SaaS. The 75K page views taught me content marketing. The Qiita articles became templates for Dev.to.

2. The boring solution wins

Regex over LLM. Static analysis over AI magic. The most reliable, cheapest, fastest approach was the one with zero hype.

3. Speed matters more than perfection

A working MVP deployed in one day beats a perfect product deployed never. I can iterate from here.

4. Sell the tool, not the file

Skills as downloadable files? $0 revenue. Skills as a running service? Real business potential.


Try CodeHeal

CodeHeal scans your AI-generated code for 93 vulnerability patterns across 14 categories.

  • Free tier: 5 scans/day, no account required
  • Pro ($29/month): 100 scans/day, scan history
  • Enterprise ($99/month): Unlimited scans, API access, team features

No LLM. No API costs. Deterministic results every time.

Scan your code for free →

