Shipping full-stack apps from a prompt: what AI gets right, and where you still think

#webdev #fullstack #ai #programming

AI app builders crossed a line in the last year. They went from generating throwaway demos to scaffolding apps people actually deploy and charge money for. I've been shipping full-stack apps this way, and the honest engineering picture is more interesting than either the hype or the backlash. Here's the breakdown.

What the model reliably gets right

The boring part, roughly 80% of the typing in my experience, is largely solved:

Scaffolding and routing. A coherent project structure, sane file layout, a working router. No more staring at an empty directory.
CRUD and boilerplate. Endpoints, form handling, validation stubs, an auth flow that works on the first run.
Schema-to-UI. Give it a data shape and it wires up the list/detail/edit screens that would have eaten your afternoon.
Deploy config. Build scripts, env wiring, the stuff everyone copies from a previous project anyway.

If your reaction is "that's just the easy part" - that's exactly the point. The easy part is most of the typing.

Where you still have to think

The model produces a plausible answer, which is not always the correct one for your domain. There are four places where it consistently needs a human:

Data modeling. It will pick a reasonable-looking schema, but your invariants, relationships, and what-must-never-happen rules are domain knowledge it doesn't have. Review the data model before you look at a single screen.
Edge cases and error states. The happy path is free. Empty states, partial failures, concurrent edits, the 3am "what if this is null" cases - those are still yours.
Security boundaries. Authorization (not just authentication), input validation at trust boundaries, and "can user A read user B's row" are where generated code is most confidently wrong. Read every access check.
Behavior under real load. A query that's fine with 10 rows can melt at 100k, typically because of a missing index or an N+1 access pattern. Profiling against realistic data volumes is still human work.
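
To make the "can user A read user B's row" check concrete, here is a minimal sketch of what I look for when reading generated access code. Everything in it (the `User` and `Invoice` shapes, `canRead`, `readInvoice`, the in-memory `Map` standing in for a database) is a hypothetical name chosen for illustration, not output from any particular builder or framework.

```typescript
// Hypothetical types for illustration; not taken from any specific framework.
interface User {
  id: string;
  role: "member" | "admin";
}

interface Invoice {
  id: string;
  ownerId: string;
  totalCents: number;
}

type ReadResult =
  | { ok: true; invoice: Invoice }
  | { ok: false; status: 404 };

// Authorization: the caller is already authenticated. The question here is
// whether this particular user may see this particular row.
function canRead(user: User, invoice: Invoice): boolean {
  return user.role === "admin" || invoice.ownerId === user.id;
}

// Look the row up by id, then check ownership before returning it.
// A missing row and a forbidden row produce the same 404, so the response
// does not reveal whether another user's invoice exists.
function readInvoice(
  user: User,
  invoices: Map<string, Invoice>,
  id: string,
): ReadResult {
  const invoice = invoices.get(id);
  if (invoice === undefined || !canRead(user, invoice)) {
    return { ok: false, status: 404 };
  }
  return { ok: true, invoice };
}
```

The failure mode worth hunting for is a handler that fetches by id and returns the row as soon as a session exists. That is authentication without authorization, and it passes every happy-path test.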

The part people underestimate: running it

A generated app isn't done when it renders locally. It's done when it has a real backend, a database, HTTPS, a custom domain, and a way to undo a bad deploy. That last one matters most: the first time an AI-assisted change breaks prod, one-click rollback is the difference between a shrug and an outage. This is the half of the problem the "watch it build a landing page in 30 seconds" demos skip - and it's the half that decides whether you can actually trust the output. A platform like Playcode Cloud runs the generated app with the database, hosting, previews, and snapshots already wired, so "ship it" and "roll it back" are buttons, not a weekend.
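
As a rough mental model of "rollback as a button", a deploy history can be treated as a list of snapshots where rolling back means making the previous one live again. The sketch below is a hypothetical in-memory model (`Snapshot`, `DeployHistory`, and their methods are names I made up); it is not any platform's actual API, and it ignores the harder part, which is rolling the database back along with the code.

```typescript
// Hypothetical in-memory model. A real platform would persist this and would
// also have to snapshot data, which this sketch does not attempt.
interface Snapshot {
  id: string;
  createdAt: number; // epoch milliseconds
}

class DeployHistory {
  private readonly snapshots: Snapshot[] = [];

  // Record a new deploy; it becomes the live snapshot.
  deploy(id: string, now: number = Date.now()): Snapshot {
    const snapshot: Snapshot = { id, createdAt: now };
    this.snapshots.push(snapshot);
    return snapshot;
  }

  // The snapshot currently serving traffic, if anything has been deployed.
  get live(): Snapshot | undefined {
    return this.snapshots[this.snapshots.length - 1];
  }

  // Discard the live snapshot and return the one that becomes live.
  // Refuses to roll back when there is no earlier snapshot.
  rollback(): Snapshot {
    const previous = this.snapshots[this.snapshots.length - 2];
    if (previous === undefined) {
      throw new Error("no earlier snapshot to roll back to");
    }
    this.snapshots.pop();
    return previous;
  }
}
```

The point of the model is that rollback has a defined result before you need it: either the previous snapshot is live, or the call fails loudly.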

A workflow that holds up

What's worked for me:

Describe the outcome, not the stack. "An appointment booking app for a salon with SMS reminders," not "a Next.js app with a Postgres table."
Review the data model first, UI second. If the schema is wrong, every screen is wrong.
Deliberately test the error paths before the happy path. That's where generated code is thin.
Ship behind previews, keep snapshots, and treat rollback as a normal operation, not an emergency.
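
Here is roughly what "data model first, error paths first" looks like for the salon example. This is a sketch under my own assumptions: the `Appointment` shape, the `book` function, the error names, and the two invariants (positive duration, no double-booked stylist) are all hypothetical, and the SMS reminders are left out. The checks at the bottom run the failure cases before the one success case.

```typescript
// Hypothetical shapes for the salon booking example; all names are illustrative.
interface Appointment {
  stylistId: string;
  startMs: number; // epoch milliseconds, inclusive
  endMs: number; // epoch milliseconds, exclusive
}

type BookingError = "missing_fields" | "non_positive_duration" | "overlap";

type BookingResult =
  | { ok: true; appointment: Appointment }
  | { ok: false; error: BookingError };

// Validate a booking request against the existing schedule.
// Assumed invariants: an appointment has positive duration, and a stylist
// is never double-booked.
function book(
  existing: readonly Appointment[],
  request: Partial<Appointment> | null,
): BookingResult {
  const stylistId = request?.stylistId;
  const startMs = request?.startMs;
  const endMs = request?.endMs;
  if (!stylistId || typeof startMs !== "number" || typeof endMs !== "number") {
    return { ok: false, error: "missing_fields" };
  }
  if (endMs <= startMs) {
    return { ok: false, error: "non_positive_duration" };
  }
  // Half-open intervals: [start, end) overlaps [a.start, a.end) when each
  // one starts before the other ends.
  const clash = existing.some(
    (a) => a.stylistId === stylistId && startMs < a.endMs && a.startMs < endMs,
  );
  if (clash) {
    return { ok: false, error: "overlap" };
  }
  return { ok: true, appointment: { stylistId, startMs, endMs } };
}

// Tiny assertion helper so the checks below need no test framework.
function expectError(result: BookingResult, expected: BookingError): void {
  if (result.ok || result.error !== expected) {
    throw new Error(`expected error ${expected}`);
  }
}

const schedule: Appointment[] = [{ stylistId: "sam", startMs: 1000, endMs: 2000 }];

// Error paths first.
expectError(book(schedule, null), "missing_fields");
expectError(book(schedule, { stylistId: "sam" }), "missing_fields");
expectError(book(schedule, { stylistId: "sam", startMs: 500, endMs: 500 }), "non_positive_duration");
expectError(book(schedule, { stylistId: "sam", startMs: 1500, endMs: 2500 }), "overlap");

// Happy path last: a slot that starts exactly when the existing one ends.
const accepted = book(schedule, { stylistId: "sam", startMs: 2000, endMs: 3000 });
if (!accepted.ok) {
  throw new Error("expected back-to-back booking to succeed");
}
```

An in-process check like this does not protect against two requests racing each other. In a real app the no-overlap rule would also need to be enforced where the data lives, for example with a database constraint or a transaction.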

The takeaway

AI doesn't remove engineering judgment - it relocates it. You spend less time typing boilerplate and more time reviewing decisions: is this schema right, is this access check correct, what happens when this fails. That's a better use of the hours, but it's still engineering.

If you want to try the describe-it-and-it-builds-it loop on a real full-stack app, with the backend and hosting handled for you, that's what Playcode is built for. Either way: let the model write the boilerplate, and keep your judgment on the parts that decide whether the app actually works.