
Oleksandr Voievodin


I built 14,000 lines of code before talking to a single user. Here's what I learned.

10 weeks ago I was convinced I was building something nobody else had.

The idea: a GitHub App that reads every changed Python function in a PR, infers what it should always do — "the total should never be negative", "the result should never be None" — then attacks it with adversarial inputs in a hardened Docker sandbox. Only posts a comment when it has concrete proof something broke. No guessing. No noise. Just:

```
LogoMesh found 1 issue
Negative quantity bypasses checkout validation

Property: Order total should always be ≥ 0
I called: checkout(item_id=1, qty=-5)
Got: Order created with total -$49.95
```
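
If you're wondering what "infer a property, then attack it" looks like in practice, the core mechanism is close to property-based testing. Here's a minimal sketch of the idea using Hypothesis (the `checkout` function, the inferred property, and every value here are hypothetical stand-ins, not LogoMesh's actual code):

```python
# A minimal, hypothetical sketch of the adversarial-probing idea using
# Hypothesis. The checkout() function and the invariant are illustrative.
from hypothesis import given, strategies as st

def checkout(item_id: int, qty: int, price: float = 9.99) -> dict:
    # Toy stand-in for the code under test; note the missing qty check.
    return {"item_id": item_id, "total": qty * price}

@given(item_id=st.integers(min_value=1), qty=st.integers(-100, 100))
def test_total_never_negative(item_id, qty):
    # Inferred property: the order total should always be >= 0.
    order = checkout(item_id=item_id, qty=qty)
    assert order["total"] >= 0  # fails as soon as Hypothesis tries qty=-5
```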

207 unit tests passing. Docker sandbox with nobody user, --network=none, memory caps, randomized filenames. Time-Travel Trace that captures exact variable state at the crash frame. 14,000 lines of code.
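For the curious, that hardening boils down to a handful of Docker flags plus a randomized filename. A rough sketch of the invocation from Python follows; the image name, resource limits, and paths are illustrative assumptions, not the exact configuration:

```python
# Rough sketch of the sandbox invocation. The image name, limits, and
# paths are illustrative assumptions, not LogoMesh's exact configuration.
import subprocess
import uuid

def run_in_sandbox(test_source: str, timeout: int = 60) -> subprocess.CompletedProcess:
    # Randomized filename so a generated test can't guess its own path.
    name = f"t_{uuid.uuid4().hex}.py"
    return subprocess.run(
        [
            "docker", "run", "--rm", "-i",
            "--network=none",     # no outbound network
            "--memory=256m",      # hard memory cap
            "--pids-limit=64",    # bound the process count
            "--user", "nobody",   # drop to an unprivileged user
            "sandbox-image",      # hypothetical image with pytest preinstalled
            "sh", "-c", f"cat > /tmp/{name} && python -m pytest /tmp/{name}",
        ],
        input=test_source.encode(),
        capture_output=True,
        timeout=timeout,
    )
```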

Zero customer conversations.

Then I looked at the real numbers. A 90% silence rate on real PRs. Of the findings it did post, most were sandbox artifacts: the tool generating tests that hit its own scaffolding, not real bugs. A validator that drops 86% of raw findings just to keep the noise down. p95 latency of 394 seconds against a 60-second target.

Classic mistake. Built first, validated never.

So I stopped and started talking to people. What I kept hearing wasn't "I need adversarial testing on my PRs." It was something simpler:

"When a prod bug hits I spend 30-45 minutes just writing the repro test before I can even start fixing."

Read the Sentry trace. Reconstruct the state. Write a test. Run it. Wrong inputs. Fix the test. Run it again. By the time it reproduces, you've lost the whole debugging flow.

That's the actual pain. Not catching future bugs — dealing with the ones already in production.

So now I'm building something much simpler. Paste a Sentry URL, get a failing pytest that reproduces the exact crash, runs against your current branch, and tells you "still reproduces" or "your branch fixed it." One command. Under a minute.
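
To make that concrete, here's the rough shape of the test I want the tool to emit. Everything in it (the crashing function, the Sentry issue ID, the captured frame locals) is a hypothetical illustration:

```python
# Hypothetical shape of a generated repro test. The function, the Sentry
# issue ID, and the captured frame locals are all illustrative stand-ins.

def apply_discount(total, discount):
    # Toy stand-in for the production code that crashed.
    return total - total * discount / 100

def test_repro_sentry_12345():
    # Frame locals injected from the (hypothetical) Sentry event.
    total = 49.95
    discount = None  # the value that triggered the production crash

    # Replays the exact crashing call: fails with the same TypeError
    # while the bug is present ("still reproduces"), passes once your
    # branch handles that input ("your branch fixed it").
    apply_discount(total, discount)
```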

The engine I spent four weeks on isn't wasted. The Docker sandbox, the frame-locals injection, the binary-verdict reliability: those are exactly what make this work. The positioning just changed completely.

The question I'm still trying to answer:

Is the 30-45 minute manual repro step a universal pain or just my workflow being slow? Do you write a reproducing test before fixing a prod bug, or do you just read the trace, push the fix, and monitor?

Genuinely trying to figure out if I'm solving a real problem this time before writing another 14,000 lines nobody asked for.
