Claude Code can build most of an app in a day. The hard part isn't getting something that runs. It's getting something you'd confidently put in front of real users. The gap between those two is where nearly every project I've shipped with it started to unravel.
The fix isn't a cleverer prompt. It's process. Here's what actually moved my builds from demo to something I'd deploy.
Write the thing down before you write any code. Not a sentence in a prompt, an actual page: what it is, who it's for, what data it touches, the threat model, and what's explicitly out of scope. This feels like procrastinating. It isn't. A clear brief is the only reason the agent makes consistent choices across a few hundred files instead of contradicting itself by file 40. If you can't write it down in a page, the agent can't build it coherently.
Decide the rules that must always hold, up front. "Every query is scoped to the tenant." "Money is integer cents, never a float." "No endpoint returns another user's row." Write those down before the build, because then they're things you can check instead of things you hoped it remembered. Pin them in a file the agent reads, or it'll quietly invent three different patterns for the same thing across three sessions.
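To make "things you can check" concrete, here is a minimal sketch of two of those rules turned into code that fails loudly. `Money` and `rows_for_tenant` are hypothetical names I made up for illustration, and the in-memory list stands in for whatever your real data layer is.

```python
from __future__ import annotations

from dataclasses import dataclass


@dataclass(frozen=True)
class Money:
    """An amount in integer cents. Floats are rejected at construction."""

    cents: int

    def __post_init__(self) -> None:
        # bool is a subclass of int in Python, so rule it out explicitly.
        if isinstance(self.cents, bool) or not isinstance(self.cents, int):
            raise TypeError(
                f"cents must be an int, got {type(self.cents).__name__}"
            )


def rows_for_tenant(rows: list[dict], tenant_id: str) -> list[dict]:
    """Return only the rows owned by tenant_id (stand-in for a scoped query)."""
    if not tenant_id:
        # An unscoped query is a bug, not a default.
        raise ValueError("tenant_id is required for every query")
    return [row for row in rows if row["tenant_id"] == tenant_id]
```

Once a rule lives in a type or a single helper like this, a test can assert it, and a reviewer can search for any code path that goes around it.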
Build in a loop, not one shot. Implement a piece, test it, run it through real checks: typecheck, the test suite, a security pass, a smoke test that actually clicks through the feature. The part that matters is what happens on a failure: it goes back and gets fixed, and then everything re-runs. A build earns "done" by passing after the last change, not by passing once at the start.
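The loop itself fits in a few lines. This is a sketch under assumptions, not a real tool: `run_until_green` is a hypothetical name, each check is any callable that returns `True` on success (in practice a wrapper around your own typecheck, test, and smoke-test commands), and `fix` is whatever hands the failure back to the agent or to you.

```python
from __future__ import annotations

from typing import Callable

Check = Callable[[], bool]


def run_until_green(
    checks: dict[str, Check],
    fix: Callable[[str], None],
    max_rounds: int = 5,
) -> bool:
    """Return True only when every check passes in one uninterrupted round."""
    for _ in range(max_rounds):
        # Run checks in order and stop at the first one that fails.
        failed = next(
            (name for name, check in checks.items() if not check()),
            None,
        )
        if failed is None:
            # A full clean pass, and nothing has changed since it started.
            return True
        # Fixing changes the code, so the next round starts from the top.
        fix(failed)
    return False
```

The design choice that matters is that a fix never resumes from the check that failed. Every round restarts at the first check, so "done" always means a clean pass after the last change.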
And do not trust what it tells you. This is the step everyone skips and the one that bites. An agent on a long task will report things as finished that aren't there. It'll mark a control "done" in its notes while the code behind it was never written. So you need something independent, ideally a fresh context with no memory of the plan, to grep the actual code for every control it claims. I've watched that check catch missing encryption: three documents swore it was applied, and it did not exist.
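A crude version of that independent check is a script that knows nothing about the plan and only reads source files. The sketch below is an assumption-heavy illustration: `missing_controls` is a hypothetical helper, and the control names and regex patterns are ones you would write yourself from the brief.

```python
from __future__ import annotations

import re
from pathlib import Path


def missing_controls(
    repo_root: str,
    claimed: dict[str, str],
    suffixes: tuple[str, ...] = (".py",),
) -> list[str]:
    """Return the claimed controls whose pattern appears in no source file."""
    # Read source files only. Notes and docs are skipped on purpose,
    # because they are where the unverified claims live.
    sources = [
        path.read_text(encoding="utf-8", errors="ignore")
        for path in Path(repo_root).rglob("*")
        if path.is_file() and path.suffix in suffixes
    ]
    return [
        name
        for name, pattern in claimed.items()
        if not any(re.search(pattern, text) for text in sources)
    ]
```

A match only proves the text is present, not that the control works, so treat an empty result as the start of a review rather than the end of one. A non-empty result is the useful case: something was claimed and never written.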
The upside nobody mentions: do it this way and you don't get a black box. You get a real repo. Real stack, real tests, an architecture you can read. You own it, extend it, deploy it wherever you want. That's the whole case over a no-code generator, which is great right up until you need it to do the one thing it wasn't built for.
None of this is push-button. You're still driving, still deciding, still on the hook. The model writes the code. The process is what makes the code trustworthy.
(I build MDLC, which packages this loop so you don't reassemble it every project. The workflow works without it too.)