DEV Community

Velobase

AI can write the code. It can't tell you when the product is done.

I built an AI slide-deck generator on top of Velobase Harness — type a topic, get a presentation. The AI had a demo running almost immediately: it wrote an outline, generated the slides, exported a file. On screen, it looked finished.


It wasn't. All it had proven was that one user could make one deck one time.

A real product is a different animal. It has to handle a hundred people generating at once, bill each of them correctly, recover when a step dies halfway, review its own output and redraw the bad parts, and export a PPTX that actually opens in PowerPoint. None of that showed up in the demo — and the AI didn't think to add it, because I hadn't told it to.

That turned out to be the whole lesson of the project: when you build with AI, the hard part isn't describing the feature. It's defining what "done" means.

Harness gave me the boring foundation — auth, payments, credits, admin, database, queues, object storage, deployment. That stuff shouldn't be rebuilt for every new idea, so I didn't. It let me point the AI at the only part that was actually mine: the PPT generation itself.

And that's where "looks done" and "is done" kept pulling apart. Four places made it obvious.

Concurrency. AI treats "it worked once on my machine" as "the feature is complete." But 100 users generating decks of 10–20 slides each is not one big task you run serially in a single worker. The instruction that actually works: split the pipeline into plan → slide → finalize queues, generate each slide as its own job, let workers scale out, and don't let third-party image polling hog a worker slot. Then write a test that proves concurrent generation works — not that one deck succeeds.
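
To make the split concrete, here is a minimal sketch of the plan → slide → finalize shape using in-process `asyncio` queues. It is not how Harness queues work; the names `Deck`, `render_slide`, and `generate_all` are hypothetical, and the model call is replaced by a stub. The point is only the structure: one job per slide, a shared worker pool, and a finalize step that fires once a deck's last slide lands.

```python
import asyncio
from dataclasses import dataclass, field


@dataclass
class Deck:
    deck_id: int
    slide_count: int
    slides: dict = field(default_factory=dict)  # slide index -> rendered content
    finalized: bool = False


async def render_slide(deck: Deck, index: int) -> str:
    # Hypothetical stand-in for a model or image call; sleep(0) yields to other jobs.
    await asyncio.sleep(0)
    return f"deck {deck.deck_id} / slide {index}"


async def slide_worker(slide_q: asyncio.Queue, finalize_q: asyncio.Queue) -> None:
    while True:
        deck, index = await slide_q.get()
        try:
            deck.slides[index] = await render_slide(deck, index)
            # Whichever job completes the deck hands it to the finalize stage.
            if len(deck.slides) == deck.slide_count:
                finalize_q.put_nowait(deck)
        finally:
            slide_q.task_done()


async def finalize_worker(finalize_q: asyncio.Queue) -> None:
    while True:
        deck = await finalize_q.get()
        deck.finalized = True  # assembly and export would happen here
        finalize_q.task_done()


async def generate_all(decks: list, workers: int = 8) -> list:
    slide_q: asyncio.Queue = asyncio.Queue()
    finalize_q: asyncio.Queue = asyncio.Queue()
    tasks = [asyncio.create_task(slide_worker(slide_q, finalize_q)) for _ in range(workers)]
    tasks.append(asyncio.create_task(finalize_worker(finalize_q)))

    # Plan stage: fan every deck out into one job per slide.
    for deck in decks:
        for index in range(deck.slide_count):
            slide_q.put_nowait((deck, index))

    await slide_q.join()     # every slide job has finished
    await finalize_q.join()  # every completed deck has been finalized
    for task in tasks:
        task.cancel()
    await asyncio.gather(*tasks, return_exceptions=True)
    return decks
```

A test in the spirit of the paragraph above would run something like `asyncio.run(generate_all([Deck(i, 10) for i in range(100)]))` and assert that every deck is finalized with the right slide count, rather than checking a single deck.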

Billing. Tell the AI "deduct credits when generating," and it'll deduct once, up front, and call it done. A real product needs a state machine: reserve credits first, settle against what was actually spent (model tokens, image generation, redraws, export), refund on failure, pause when the balance runs out, resume after top-up. That's not a button handler.
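
A rough sketch of that state machine, under stated assumptions: credits are plain integers, `Account` and `BillingJob` are hypothetical names rather than Harness APIs, and a failed job refunds the whole hold (a real product might keep what was already spent).

```python
from dataclasses import dataclass
from enum import Enum


class JobState(Enum):
    PAUSED = "paused"      # waiting for credits
    RESERVED = "reserved"  # credits held, work may proceed
    SETTLED = "settled"
    REFUNDED = "refunded"


@dataclass
class Account:
    balance: int


@dataclass
class BillingJob:
    account: Account
    reserved: int = 0
    spent: int = 0
    state: JobState = JobState.PAUSED

    def reserve(self, amount: int) -> bool:
        """Hold credits up front; stay paused if the balance cannot cover them."""
        if self.account.balance < amount:
            self.state = JobState.PAUSED
            return False
        self.account.balance -= amount
        self.reserved += amount
        self.state = JobState.RESERVED
        return True

    def charge(self, cost: int) -> bool:
        """Record actual spend against the hold; pause once the hold is used up."""
        if self.state is not JobState.RESERVED:
            raise RuntimeError("charge() requires an active reservation")
        if self.spent + cost > self.reserved:
            self.state = JobState.PAUSED
            return False
        self.spent += cost
        return True

    def settle(self) -> None:
        """On success, return whatever part of the hold was not spent."""
        self.account.balance += self.reserved - self.spent
        self.reserved = self.spent
        self.state = JobState.SETTLED

    def refund(self) -> None:
        """On failure, return the entire hold (an assumption of this sketch)."""
        self.account.balance += self.reserved
        self.reserved = self.spent = 0
        self.state = JobState.REFUNDED
```

Resuming after a top-up is just another `reserve()` call: it succeeds once the balance covers the amount and moves the job back to `RESERVED`.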

Self-review. If retries happen silently in the backend, the user just stares at a spinner. To make self-review a feature, the AI has to persist intermediate results and surface the states — generating, checking, redrawing — along with scores and attempts. Now the wait reads as "the system is improving the slide," not "it's stuck."
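
One way to picture it, as a sketch only: every step of the review loop writes an event that the UI could read. `SlideProgress`, the `accepted`, `rejected`, and `needs_attention` state names, the 0.8 threshold, and the three-attempt cap are illustrative assumptions; `draw` and `review` stand in for the real generation and scoring calls.

```python
from dataclasses import dataclass, field
from typing import Callable, Optional


@dataclass
class SlideProgress:
    slide_id: str
    events: list = field(default_factory=list)  # (state, attempt, score) tuples

    def record(self, state: str, attempt: int, score: Optional[float] = None) -> None:
        # A real system would persist this row and push it to the client.
        self.events.append((state, attempt, score))


def generate_with_review(
    progress: SlideProgress,
    draw: Callable[[int], str],
    review: Callable[[str], float],
    threshold: float = 0.8,
    max_attempts: int = 3,
) -> str:
    content = ""
    for attempt in range(1, max_attempts + 1):
        progress.record("generating" if attempt == 1 else "redrawing", attempt)
        content = draw(attempt)
        progress.record("checking", attempt)
        score = review(content)
        if score >= threshold:
            progress.record("accepted", attempt, score)
            return content
        progress.record("rejected", attempt, score)
    # Out of attempts: keep the last draft and flag it instead of failing silently.
    progress.record("needs_attention", max_attempts)
    return content
```

Because each transition is recorded with its attempt number and score, the front end has something specific to show while the user waits.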

Export. A slide that looks perfect in the browser can come out broken as PPTX — layers, fonts, aspect ratio, cropping all drift. So the requirement can't be "generate a PPTX file." It has to be "the exported PPTX must visually match the web preview."
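
A requirement like that only helps if a test can check it. The sketch below assumes both the web preview and the exported slide have already been rendered to same-sized grayscale bitmaps (nested lists of 0–255 values); how you render a PPTX to an image is outside this sketch. The function names, the per-pixel tolerance, and the 1% mismatch budget are illustrative choices, not anything the project prescribes.

```python
Bitmap = list[list[int]]  # rows of grayscale pixel values, 0-255


def mismatch_ratio(preview: Bitmap, exported: Bitmap, tolerance: int = 8) -> float:
    """Fraction of pixels whose brightness differs by more than `tolerance`."""
    same_shape = len(preview) == len(exported) and all(
        len(a) == len(b) for a, b in zip(preview, exported)
    )
    if not same_shape:
        # A size or aspect-ratio drift is itself an export failure.
        raise ValueError("preview and export must have identical dimensions")
    total = sum(len(row) for row in preview)
    if total == 0:
        raise ValueError("bitmaps must not be empty")
    differing = sum(
        1
        for row_a, row_b in zip(preview, exported)
        for a, b in zip(row_a, row_b)
        if abs(a - b) > tolerance
    )
    return differing / total


def export_matches_preview(preview: Bitmap, exported: Bitmap, max_mismatch: float = 0.01) -> bool:
    return mismatch_ratio(preview, exported) <= max_mismatch
```

Small anti-aliasing differences fall under the tolerance, while a shifted layer or a cropped image changes enough pixels to fail the check.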

Once I saw the pattern, the brief I gave the AI stopped being a feature list and started being acceptance criteria:

Do not build only a working demo.
Build this as a production-ready SaaS product.

This product is based on Velobase Harness.
Harness already provides auth, payments, credits, queues, object storage, and deployment.
Implement PPT generation as an independent business module.

Requirements:
1. Support at least 100 users generating presentations concurrently.
2. Split generation into plan, slide, and finalize queues.
3. Generate each slide independently with concurrency, retries, and failure recovery.
4. Connect all model calls, image generation, editing, and export to credit billing.
5. Pause generation when credits are insufficient and resume after top-up.
6. Run AI review and deterministic layout checks after each slide.
7. Persist intermediate states and show generation, self-check, and redraw progress in real time.
8. Upload generated images to object storage instead of relying on temporary provider URLs.
9. Prioritize visual consistency for PPTX export.
10. Tests must cover concurrency, billing, failure recovery, and export fidelity — not just one successful run.

Here's what the project made unambiguous: AI is already good at writing code. It just doesn't know what makes something shippable. It will treat one successful run as completion, a local demo as a system, and a page that renders in the browser as a correct export. The part only a human can supply isn't the feature description. It's the acceptance criteria, the engineering boundaries, the failure cases, the business rules.

That's the real split. Harness covers the common SaaS foundation so it's already there. AI fills in the business logic fast — once the boundaries are clear. Faster shipping doesn't come from thinking less. It comes from turning your thinking into sharper input.

If you've shipped something AI-built past the demo stage: what was the gap between "it ran" and "I'd let customers near it" that bit you hardest?
