SystAgProject

Posted on Apr 19 • Originally published at systag.gumroad.com

I ran a security audit on my own Python codebase with an LLM for $0.90. Here is what it found.


Last week I shipped a small product called VibeScan: a $49 PDF security audit for apps built with Lovable / Bolt / Cursor / Replit / v0. Before I asked anyone to pay for it, I ran it on my own codebase as a smoke test.

124 scannable Python files, 4 LLM batches, 22 seconds total wall time. Audit cost: $0.90 of Opus 4.7 with prompt caching. Output: 0 critical findings, 1 high, 2 medium. One of the findings was a real bug I fixed the same hour. The other two were legitimate risk flags I had not thought about.

Here is the full report, with context on each finding.

[HIGH] Subprocess stdout/stderr written to the ledger without size cap

Location: — the function that spawns every scheduled job.

A runaway script that prints megabytes of logs (for example a scraper dumping HTML) will push all of that into your SQLite ledger, potentially bloating the database and causing memory issues during capture. A single bad run could write hundreds of MB.

Fix: In , truncate stdout/stderr to the last ~10KB before returning (e.g., ) so oversized output cannot blow up the ledger or memory.

Why this matters

I was using Python's subprocess.run with . That flag tells Python to hold the subprocess's full stdout and stderr in memory until the child exits. That is fine when a cron job prints 50 lines and exits. But if one of those jobs is a web scraper that dumps the HTML of every page it visits, or an ETL that prints a row per record processed on a million-row table, every byte of that output sits in the scheduler process's RAM before being written to the SQLite ledger.

I had never thought about it. The scheduler had run for weeks without hitting this because all the current jobs are well-behaved. But the next job anyone adds could be the one that dumps 500 MB on a bad day.

The fix took four minutes: cap each stream at 50 KB before returning. If a security auditor had flagged this I would have paid $200. VibeScan cost $0.90 for the whole repo.
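For illustration, a per-stream cap could look roughly like the sketch below. This is not the actual scheduler code: `run_job`, `tail_bytes`, and `MAX_STREAM_BYTES` are hypothetical names, and `capture_output=True` is assumed here as one common way to capture output with `subprocess.run`.

```python
import subprocess
import sys

MAX_STREAM_BYTES = 50 * 1024  # 50 KB per stream (the cap mentioned above)


def tail_bytes(data: bytes, limit: int = MAX_STREAM_BYTES) -> bytes:
    """Keep only the last `limit` bytes of a captured stream."""
    if limit <= 0:
        return b""
    # Slicing from the end returns the whole value when it is shorter than limit.
    return data[-limit:]


def run_job(cmd: list[str], timeout: float = 60.0) -> dict:
    """Hypothetical job runner: run `cmd` and return size-capped output."""
    proc = subprocess.run(cmd, capture_output=True, timeout=timeout)
    return {
        "returncode": proc.returncode,
        # Truncate before decoding so the stored text is bounded in size;
        # errors="replace" tolerates a multi-byte character cut in half.
        "stdout": tail_bytes(proc.stdout).decode("utf-8", errors="replace"),
        "stderr": tail_bytes(proc.stderr).decode("utf-8", errors="replace"),
    }
```

Keeping the tail rather than the head preserves the final lines of a run, which is usually where a traceback ends up. Note that truncating after `subprocess.run` returns only protects what gets written to the ledger: the full output is still held in memory while the child runs. Bounding memory as well would require reading the pipes incrementally instead.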

[MEDIUM] Gmail OAuth refresh token stored in plaintext

Location: line 30 — the function that loads Google OAuth credentials.

The Gmail refresh token is saved as a plain JSON file on disk. Anyone who can read that file (backup, stolen laptop, server compromise) gains indefinite access to send and read email as the account owner — refresh tokens do not expire.

Fix: At minimum, ensure credentials/ is in .gitignore and file permissions are 0600; for stronger protection, encrypt the token at rest (for example via OS keyring or an encrypted env var) and document revocation via Google Account -> Security -> Third-party apps.

Classic defense-in-depth issue. No immediate exploitation, but the kind of thing you kick yourself over if it ever leaks.
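The "at minimum" part of that fix, owner-only file permissions, might be sketched like this on a POSIX system. The function names and the token layout are assumptions for illustration, not the project's real credential loader, and this does not cover the stronger option of encrypting the token at rest.

```python
import json
import os
import stat


def save_token(path: str, token: dict) -> None:
    """Write token JSON so that only the file owner can read or write it."""
    # Creating the file with mode 0o600 avoids a window in which a freshly
    # created token file is readable by other users.
    fd = os.open(path, os.O_WRONLY | os.O_CREAT | os.O_TRUNC, 0o600)
    with os.fdopen(fd, "w") as fh:
        json.dump(token, fh)
    # The mode passed to os.open only applies to new files, so tighten
    # permissions explicitly in case the file already existed.
    os.chmod(path, 0o600)


def is_owner_only(path: str) -> bool:
    """Return True if group and others have no permissions on `path`."""
    mode = stat.S_IMODE(os.stat(path).st_mode)
    return mode & (stat.S_IRWXG | stat.S_IRWXO) == 0
```

A startup check such as `is_owner_only("credentials/token.json")` (path hypothetical) could refuse to load a token file that is readable by anyone else.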

[MEDIUM] HTML email bodies stripped with naive regex

Location: line 215 — the function that normalizes inbound email.

extract_plain_body uses a regex to strip HTML tags from inbound mail, which can leave script/style contents, encoded entities, or malformed markup in the plain text the classifier sees. If that text is later fed into an LLM prompt or surfaced to a user, attacker-crafted emails can smuggle content that was not visible as HTML.

The inbound email pipeline feeds the plain-text version into an LLM classifier (to route support emails). Because the downstream consumer is an LLM, this is a prompt injection surface. A sophisticated spammer who knows we route email via an LLM can craft HTML with hidden content in style tags or HTML comments that is invisible to a human recipient but becomes visible instructions in the LLM input.

Not exploited today. But it is the exact class of bug that becomes headline news in 18 months, and the fix is a 20-line swap to BeautifulSoup.
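To show the idea of parser-based extraction without adding a dependency, here is a minimal sketch that uses the standard library's `html.parser` in place of BeautifulSoup. `html_to_text` and the skip list are assumptions for illustration, and this is not the project's `extract_plain_body`.

```python
from html.parser import HTMLParser


class _TextExtractor(HTMLParser):
    """Collect visible text, skipping containers that never render as text."""

    SKIP_TAGS = {"script", "style", "template", "noscript"}

    def __init__(self) -> None:
        super().__init__(convert_charrefs=True)  # decode entities like &amp;
        self._skip_depth = 0
        self._chunks: list[str] = []

    def handle_starttag(self, tag, attrs):
        if tag in self.SKIP_TAGS:
            self._skip_depth += 1

    def handle_endtag(self, tag):
        if tag in self.SKIP_TAGS and self._skip_depth > 0:
            self._skip_depth -= 1

    def handle_data(self, data):
        # Text inside a skipped container is dropped. HTML comments never
        # reach this method: they go to handle_comment, which does nothing.
        if self._skip_depth == 0:
            self._chunks.append(data)

    def text(self) -> str:
        # Join chunks and collapse runs of whitespace.
        return " ".join(" ".join(self._chunks).split())


def html_to_text(html: str) -> str:
    """Return the visible text of an HTML fragment."""
    parser = _TextExtractor()
    parser.feed(html)
    parser.close()  # flush any text still buffered by the parser
    return parser.text()
```

Joining chunks with a space keeps adjacent paragraphs from running together, at the cost of occasionally splitting a word that is broken up by inline tags. This only removes content the parser recognizes as non-visible. Text hidden with CSS (for example a `display:none` attribute) would still pass through, so the output should still be treated as untrusted input to the classifier.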

What this scan cost and what it missed

Input tokens: 176,364 with prompt caching across 4 batches
Output tokens: 779
Wall time: 22 seconds
Direct infrastructure cost: $0.90

The consultant equivalent would be 3-5 hours on a 124-file repo, billed at $600-$1,500, producing a report that an engineer has to translate before it is actionable. The VibeScan report is written in the language the buyer speaks and includes the exact line to change.

What the scan missed (honest limitations)

Business logic flaws like a checkout that trusts client-side prices.
Concurrency issues in state updates (requires runtime tracing, not static read).
Dependency vulnerabilities — we do not cross-reference package.json against CVE databases. Snyk does that better.
Production infra — we scan the code, not deployed infrastructure.

For a solo founder running an AI-coded app, the findings VibeScan catches are where the actual failures come from. For enterprise eng teams with dedicated security engineers, Snyk plus manual review plus threat modeling is the better playbook.

Try it on your own repo

If you shipped something with Lovable / Bolt / Cursor / Replit / v0 and you are about to take real money from real users — get a second set of eyes on the code first.

$49, one-time, PDF in ~10 minutes: systag.gumroad.com/l/vibescan

First 10 readers of this post get it for free — DM me the repo URL and I will send the PDF back within the day.
