I Built a Security Scanner That Uses AI to Review Its Own Findings

#ai #security #python #opensource

Every AI coding tool ships code fast. None of them check if it's safe.

I built Critik — an open-source security scanner that catches what your AI writes and your review misses. Regex and AST find the candidates. An LLM reviews each one with full file context, confirms the real problems, kills the false positives, and explains why in plain English.

pip install critik and you're scanning in 30 seconds.

The Numbers Are Ugly

53% of teams that shipped AI-generated code later found security issues that passed review. Georgia Tech's Vibe Security Radar tracked 74 CVEs from AI coding tools in Q1 2026 alone — 6 in January, 15 in February, 35 in March. Accelerating.

Here's what I keep finding when I scan AI-built projects:

Hardcoded API keys — Cursor generates a Supabase client and pastes the service_role key right in the file
SQL injection via f-strings — Copilot autocompletes db.execute(f"SELECT * FROM users WHERE id = {user_id}") without blinking
Firebase rules wide open — Bolt scaffolds read: true, write: true and nobody touches it
NEXT_PUBLIC_ prefix on secrets — the env var that hands your database URL to every browser

Not edge cases. Default patterns.

The Tools That Exist Don't Fix This

Snyk charges $25-98/dev/mo. Built for enterprises with procurement budgets.

Semgrep is powerful. Also requires writing custom rules in a DSL. Steep curve. Recently relicensed behind commercial terms.

npm audit is, in Dan Abramov's words, "broken by design" — flags devDependency issues that can't touch production.

Bandit and ESLint security plugins catch patterns but have zero context. They flag eval() in a test fixture the same way they flag eval(user_input) in a request handler.

That last one is the real problem. Static scanners are noisy. Developers learn to ignore them. Which means they ignore the real findings too.

Two Passes. One Scanner.

Pass 1 — Static (fast, offline, free)

Regex patterns and Python AST parsing. Hardcoded secrets (16 patterns — AWS, Stripe, OpenAI, Anthropic), SQL injection, command injection, eval/exec, XSS, framework misconfigs. Runs in milliseconds. No API key needed.

Pass 2 — AI Review (optional, the whole point)

Add --ai and each finding goes to Groq's Llama 3.3 70B with the full file as context. The model acts as a security analyst:

Verdict: confirmed, false_positive, or needs_review
Confidence: 0-100%
Why: "This eval() parses trusted JSON config from a local file" vs "This eval() takes unsanitized user input from req.query"
Fix: actual code, not generic advice

The AI doesn't replace the scanner. It reviews the scanner's work.

What It Looks Like

$ critik scan . --ai

  Critik v0.4.0 — scanned 7 files  [AI]

  CRITICAL  nextjs-public-secret
  app/config.ts:18  Secret exposed to browser via NEXT_PUBLIC_ prefix
  | 18 | const key = process.env.NEXT_PUBLIC_SECRET_API_KEY
  CONFIRMED (95%)
  > The NEXT_PUBLIC_ prefix exposes this API key to every browser.
  Fix: Rename to SECRET_API_KEY and access server-side only

  CRITICAL  aws-access-key              (dimmed — false positive)
  tests/fixtures/bad_secrets.py:2  AWS access key detected
  FALSE POS (100%)
  > This file is in tests/fixtures — fake credentials for testing.

  HIGH      sql-fstring
  app/db.py:6  SQL injection via f-string in execute()
  | 6 | db.execute(f"SELECT * FROM users WHERE id = {user_id}")
  CONFIRMED (95%)
  > This f-string takes unsanitized input, allowing SQL injection.
  Fix: db.execute('SELECT * FROM users WHERE id = ?', (user_id,))

  ─────────────────────────────────────────────
  7 files scanned in 2714ms — 23 findings: 10 critical, 8 high, 5 medium
  AI analysis: 7 confirmed, 16 false positives

Without AI: 23 findings. Developer overwhelmed. Ignores all of them.

With AI: 7 real issues. 16 dismissed with reasons. Developer fixes what matters.

Under the Hood

Each API call sends the full file content (up to 8K chars) plus all findings for that file. One call per file — keeps tokens low, context high.

The model sees imports, function signatures, data flow. It knows test fixtures are probably safe, .env files are expected to have secrets, and NEXT_PUBLIC_ vars are intentionally client-exposed.

Temperature 0.2. Structured JSON back. If the API is down, Critik falls back to regex-only. No crash, no hang.

What It Catches

Category	Patterns	Examples
Secrets	16	AWS, GitHub, OpenAI, Anthropic, Stripe, Slack, DB URLs, JWTs, private keys
Injection	6	SQL via f-string/concat, eval(), exec(), os.system(), subprocess shell=True, XSS
Frameworks	8	Supabase RLS, Firebase rules, NEXT_PUBLIC_ secrets, NextAuth, Prisma raw, Stripe webhooks
Config	6	NODE_ENV in source, insecure cookies, open CORS, debug mode
Auth	2	Missing auth on routes, open endpoints
Dotenv	3	Exposed .env, sensitive vars unencrypted

Free. All of It.

pip install critik
critik scan .                # regex/AST only — offline, instant
critik scan . --ai           # + AI review (needs GROQ_API_KEY)
critik scan . --format fix   # copy-paste fix prompts for Cursor/Claude

Pre-commit hook: critik hook install. SARIF output for CI/CD. GitHub Action included. MIT license.

Why I Built This

I hunt bugs on HackerOne and 0din. The vulnerabilities I find in production are the same patterns AI tools ship by default. Hardcoded keys. Missing auth. SQL injection. Open configs.

The irony: AI coding tools are the biggest source of new vulnerabilities and the best tool for catching them. A regex finds eval(). Only an LLM can tell you if it's dangerous.

Critik is the scanner I wanted. Now it exists.

Website: critik.dev
GitHub: AlexlaGuardia/Critik
PyPI: pip install critik

Run critik scan . on your project. You might not like what you find.