DEV Community

Harjot Singh

How we built a 14-agent pipeline that ships a deployed app + launch assets in ~7 minutes

Most AI app builders stop at "deployed." You prompt, you get a repo, maybe a preview URL, and then the actual work starts: wiring a domain, writing the landing copy, cutting screenshots, drafting the launch thread. We wanted the pipeline to stop at "launched" instead, so we built one. This is how it works under the hood, including the parts that broke.

The product is Moonshift. One prompt triggers 14 specialized agents across 10 phases. Average run is ~7 minutes and ~$3 in API spend, with a hard $5 ceiling that aborts the run. Everything ships to your Vercel, your GitHub, your database. This post is the engineering, not the pitch.

The core problem: parallel agents drift

The naive version of "many agents build an app" falls apart fast. If a backend agent and a frontend agent both work from a vague English spec, they invent incompatible contracts. The backend returns { user_id }, the frontend reads userId, and you find out at runtime in production.

Our fix is a planner that emits a JSON contract first. Before any code is written, one agent produces a typed contract: routes, request/response shapes, table schemas, env vars, page list. That contract is the single source of truth. Backend, frontend, database, and test agents all build against it in parallel instead of against prose.
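
To make "typed contract" concrete, here is a minimal sketch of what such an artifact could look like in TypeScript. The post does not show the real schema, so the type names, the field names (routes, tables, envVars, pages), and the example route are all assumptions for illustration.

```typescript
// Hypothetical contract shape; every name here is an illustrative assumption,
// not Moonshift's actual schema.
type FieldType = "string" | "number" | "boolean";

interface RouteContract {
  method: "GET" | "POST" | "PUT" | "DELETE";
  path: string;
  request: Record<string, FieldType>;
  response: Record<string, FieldType>;
}

interface TableContract {
  name: string;
  columns: Record<string, FieldType>;
}

interface AppContract {
  routes: RouteContract[];
  tables: TableContract[];
  envVars: string[];
  pages: string[];
}

// The planner emits one of these before any code is written;
// every other agent only reads it.
const contract: AppContract = {
  routes: [
    {
      method: "GET",
      path: "/api/users/:id",
      request: {},
      response: { userId: "string", email: "string" },
    },
  ],
  tables: [{ name: "users", columns: { id: "string", email: "string" } }],
  envVars: ["DATABASE_URL"],
  pages: ["/", "/users/[id]"],
};

// Look up the agreed response shape for a route, or undefined if the
// route is not part of the contract.
function responseShape(
  c: AppContract,
  method: string,
  path: string
): Record<string, FieldType> | undefined {
  const route = c.routes.find((r) => r.method === method && r.path === path);
  return route ? route.response : undefined;
}
```

With a shape like this, the backend and frontend agents both call something like responseShape for the same route, so there is exactly one place where the field is spelled userId.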

Then a contract-validator agent runs after the parallel build and diffs the actual code against the contract. When the frontend's fetch shape doesn't match the backend's handler, the validator doesn't just flag it. It patches the mismatch. This one agent removed the largest single class of "looks done, 500s on click" failures we had.
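
The reconcile step can be pictured as a diff over field names. The sketch below is a deliberate simplification: it assumes each side has already been reduced to a flat list of field names, whereas the real validator works on generated code. The function findDrift, the Drift type, and the snake_case/camelCase matching heuristic are all hypothetical.

```typescript
// Sketch: compare the fields a handler returns and the fields a page reads
// against the contract. All names are hypothetical.
interface Drift {
  field: string; // the field name the contract expects
  side: "backend" | "frontend"; // which side is missing the exact name
  found: string | null; // a near-miss variant found on that side, if any
}

// Treat user_id, userId, and UserID as the same key for fuzzy matching.
function normalize(name: string): string {
  return name.replace(/_/g, "").toLowerCase();
}

function findDrift(
  contractFields: string[],
  backendFields: string[],
  frontendFields: string[]
): Drift[] {
  const drift: Drift[] = [];
  const sides: Array<["backend" | "frontend", string[]]> = [
    ["backend", backendFields],
    ["frontend", frontendFields],
  ];
  for (const field of contractFields) {
    for (const [side, fields] of sides) {
      if (fields.indexOf(field) !== -1) continue; // exact match: no drift
      const variant = fields.find((f) => normalize(f) === normalize(field));
      drift.push({ field, side, found: variant === undefined ? null : variant });
    }
  }
  return drift;
}

// The contract says userId, the handler returns user_id, the page reads userId.
const report = findDrift(["userId", "email"], ["user_id", "email"], ["userId", "email"]);
```

Here report contains a single entry pointing at the backend, with found set to "user_id". A patching step could then rename the near-miss to the contract's spelling, which is the cheap kind of fix the paragraph above describes.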

The 10 phases

  1. Plan - generate the JSON contract.
  2. Scaffold - lay down the framework skeleton (Next.js, config, deny-globs that protect files agents shouldn't touch).
  3. Backend - API routes and server logic against the contract.
  4. Frontend - pages and components against the same contract.
  5. Database - schema + migrations (Drizzle + Turso).
  6. Validate - contract-validator reconciles the output of phases 3-5 (which run in parallel) against the contract and auto-fixes drift.
  7. Test + fix - generated tests run; a fixer loop addresses failures.
  8. Deploy - ships to your Vercel via your token.
  9. Audit - security and a11y passes on the live deployment.
  10. Market + publish - a marketer agent drafts X and LinkedIn launch posts in your voice, image-gen produces hero images, and a publisher gates everything behind your one-tap approval.

The interesting phases are 6, 7, and 10. Everyone has a code-gen step. Almost nobody has a reconcile step, a real fixer loop, or a phase whose only job is launch assets.
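
As a rough picture of the ordering, the skeleton below runs the three build phases concurrently and everything else in sequence. It is only a sketch: the Agent signature, the stub agents, and runPipeline are assumptions, and the real orchestrator does far more than append names to a log.

```typescript
// Hypothetical orchestrator skeleton. Phase names follow the list above;
// the Agent type and the stub implementation are assumptions.
type Agent = (log: string[]) => Promise<void>;

// A stub agent that only records that it ran.
function stubAgent(name: string): Agent {
  return async (log: string[]) => {
    log.push(name);
  };
}

async function runPipeline(log: string[]): Promise<string[]> {
  await stubAgent("plan")(log);
  await stubAgent("scaffold")(log);

  // Backend, frontend, and database build concurrently against the contract.
  await Promise.all([
    stubAgent("backend")(log),
    stubAgent("frontend")(log),
    stubAgent("database")(log),
  ]);

  // The remaining phases each depend on the one before, so they run in order.
  const tail = ["validate", "test-fix", "deploy", "audit", "market-publish"];
  for (const phase of tail) {
    await stubAgent(phase)(log);
  }
  return log;
}
```

The point of the shape is that validate can only start once all three parallel builders have finished, which is what lets it diff their combined output against the contract.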

Reliability: not every failure is equal

Long multi-agent runs fail in boring ways: a rate limit, a flaky deploy, a model that returns prose where you asked for JSON. If you retry all of them the same way, you either give up too early on transient errors or burn money death-looping on deterministic ones.

We run a failure classifier that buckets every failure into transient, deterministic, or permanent:

  • transient (429s, network blips, stream idle) - retry with backoff.
  • deterministic (a test that fails the same way every time) - hand to a fixer agent, don't blindly retry the same call.
  • permanent (bad auth, missing token) - stop and surface it. No point spending more.
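
The three buckets above can be sketched as a small classifier plus a lookup from bucket to action. The bucket names come from the list; the Failure shape, the specific status codes treated as "bad auth", the default bucket, and the backoff constants are assumptions.

```typescript
// Sketch only: the Failure shape, status codes, and default bucket are
// assumptions, not the classifier the post describes in detail.
type FailureClass = "transient" | "deterministic" | "permanent";
type Action = "retry_with_backoff" | "hand_to_fixer" | "stop_and_surface";

interface Failure {
  status?: number; // HTTP status, if the failure came from an API call
  kind?: "network" | "stream_idle" | "test_failure" | "missing_token";
}

function classify(f: Failure): FailureClass {
  if (f.status === 401 || f.status === 403 || f.kind === "missing_token") {
    return "permanent";
  }
  if (f.status === 429 || f.kind === "network" || f.kind === "stream_idle") {
    return "transient";
  }
  // Assumed default: treat anything else as reproducible so it goes to the
  // fixer instead of being retried blindly.
  return "deterministic";
}

const actionFor: Record<FailureClass, Action> = {
  transient: "retry_with_backoff",
  deterministic: "hand_to_fixer",
  permanent: "stop_and_surface",
};

// Exponential backoff with a cap; the base and cap values are arbitrary.
function backoffMs(attempt: number, baseMs: number = 500, capMs: number = 30000): number {
  return Math.min(capMs, baseMs * Math.pow(2, attempt));
}
```

For example, actionFor[classify({ status: 429 })] selects the backoff path, while a repeated test failure is routed to the fixer without re-issuing the same call.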

Retries are capped on three axes at once: per-phase, per-agent, and a global per-run ceiling. The global cap is what keeps a single bad run from quietly turning into a $40 bill. Combined with the hard $5 abort, the worst case is bounded and visible instead of a surprise invoice.
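
One way to enforce three caps at once is a single counter object that every retry has to pass through. This is a sketch under assumptions: the class name RetryBudget, the method tryConsume, and the limit values in the usage note are hypothetical, since the post does not state the actual numbers.

```typescript
// Hypothetical retry budget; the real limits are not given in the post.
interface RetryLimits {
  perPhase: number;
  perAgent: number;
  perRun: number;
}

class RetryBudget {
  private readonly limits: RetryLimits;
  private readonly byPhase = new Map<string, number>();
  private readonly byAgent = new Map<string, number>();
  private total = 0;

  constructor(limits: RetryLimits) {
    this.limits = limits;
  }

  // Returns true and records the retry only if all three caps allow it.
  tryConsume(phase: string, agent: string): boolean {
    const phaseCount = this.byPhase.get(phase) || 0;
    const agentCount = this.byAgent.get(agent) || 0;
    if (
      phaseCount >= this.limits.perPhase ||
      agentCount >= this.limits.perAgent ||
      this.total >= this.limits.perRun
    ) {
      return false;
    }
    this.byPhase.set(phase, phaseCount + 1);
    this.byAgent.set(agent, agentCount + 1);
    this.total += 1;
    return true;
  }
}
```

A caller would construct one budget per run, for instance new RetryBudget({ perPhase: 2, perAgent: 3, perRun: 4 }), and treat a false return as "stop retrying and surface the failure". Because the global counter is checked on every call, no combination of phases and agents can exceed the per-run cap.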

A subtle one we hit: a long LLM stream can go idle without erroring (the upstream connection gets severed but the socket never closes). A naive loop waits forever. We added an idle watchdog in the agent loop so a silent stall is treated as a transient failure and retried, instead of hanging the whole run.
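
A minimal sketch of such a watchdog, assuming the LLM stream is exposed as an async iterable of chunks: each wait for the next chunk races against a timer, and a timeout surfaces as an error the classifier can bucket as transient. The names withIdleWatchdog, nextOrTimeout, and StreamIdleError are hypothetical, and this version does not attempt to close the stalled upstream connection.

```typescript
// Hypothetical error type for a silent stall.
class StreamIdleError extends Error {
  constructor(idleMs: number) {
    super("stream idle for " + idleMs + "ms");
    this.name = "StreamIdleError";
  }
}

// Wait for the next chunk, but reject if nothing arrives within idleMs.
function nextOrTimeout<T>(it: AsyncIterator<T>, idleMs: number): Promise<IteratorResult<T>> {
  return new Promise<IteratorResult<T>>((resolve, reject) => {
    const timer = setTimeout(() => reject(new StreamIdleError(idleMs)), idleMs);
    it.next().then(
      (result) => {
        clearTimeout(timer);
        resolve(result);
      },
      (err) => {
        clearTimeout(timer);
        reject(err);
      }
    );
  });
}

// Wrap any async iterable so a gap longer than idleMs between chunks throws
// instead of hanging forever. The timer restarts for every chunk.
async function* withIdleWatchdog<T>(source: AsyncIterable<T>, idleMs: number): AsyncGenerator<T> {
  const it = source[Symbol.asyncIterator]();
  while (true) {
    const result = await nextOrTimeout(it, idleMs);
    if (result.done) return;
    yield result.value;
  }
}
```

The agent loop would then iterate over withIdleWatchdog(stream, idleMs) rather than the raw stream and catch StreamIdleError alongside other transient failures. Note the timeout is per chunk, not per response, so a long but steadily streaming completion is never cut off.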

Design constraints that shaped everything

  • Your infra, zero lock-in. Code lands in your GitHub, the app deploys to your Vercel, the database is yours. Cancel the subscription and you keep a working product. This forced the deployer to operate purely through user-supplied tokens, which is more work than deploying to our own infra but is the entire point.
  • The publisher physically cannot post without you. Social publishing is gated per post, per platform, behind an explicit human tap. Autonomy ends at the point where it would speak as you in public. Generation is automatic; publishing is not.
  • Hard cost ceiling. $5/run, enforced mid-run, not reconciled after. Agents check remaining budget before expensive calls.
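
The "check remaining budget before expensive calls" rule can be sketched as a small guard object shared by all agents in a run. Everything here is an assumption for illustration: the class name RunBudget, the method names, tracking spend in integer cents, and the example amounts. Only the $5 ceiling comes from the post.

```typescript
// Hypothetical per-run budget guard. Amounts are integer cents to avoid
// floating-point drift when many small costs are summed.
class RunBudget {
  private readonly ceilingCents: number;
  private spentCents = 0;

  constructor(ceilingCents: number) {
    this.ceilingCents = ceilingCents;
  }

  remainingCents(): number {
    return this.ceilingCents - this.spentCents;
  }

  // Call before an expensive request with a conservative cost estimate.
  // Throws so the orchestrator can abort the run mid-flight.
  assertCanSpend(estimatedCents: number): void {
    if (estimatedCents > this.remainingCents()) {
      throw new Error(
        "budget exceeded: need " + estimatedCents + "c, have " + this.remainingCents() + "c"
      );
    }
  }

  // Call after the request with the cost actually incurred.
  record(actualCents: number): void {
    this.spentCents += actualCents;
  }
}

const budget = new RunBudget(500); // the $5 ceiling
budget.assertCanSpend(120); // estimate before the call
budget.record(110); // actual cost after the call
```

Checking the estimate before the call, rather than summing costs afterwards, is what makes the ceiling an in-run abort instead of an after-the-fact reconciliation.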

Stack

Next.js for web, Drizzle + Turso (libSQL) for data, Playwright for browser automation in the marketing/audit phases. The orchestrator is a separate runtime from the web app, spawned per run from source so a fix ships without a full rebuild.

Honest lessons

  • A typed contract beats a smarter prompt. We spent weeks trying to make agents "just agree." Making them agree on a machine-checkable artifact was the actual fix.
  • A reconcile phase is worth more than a better code-gen model. Catching drift after the fact, cheaply, beat every attempt to prevent it perfectly up front.
  • Classify failures before you retry them. Uniform retry is how multi-agent systems burn money and still fail.
  • Bound the blast radius in money, not just time. A per-run dollar cap is the single most important guardrail in an autonomous pipeline that calls paid APIs in a loop.

If you want to see the output, the first run is free with your own API key at moonshift.io. Happy to answer architecture questions in the comments.

Top comments (1)

Nimesh Kulkarni

That is solid high level stuff.
Your project is awesome, but I looked at some of the preview websites listed on the landing page... bro, it is still slop. The architecture is great, but real clients are more interested in ONE SHOT UI. For that you can use Skills; there are some skills like:

  1. Ui ux pro
  2. Design.md (not skill but design inspiration)

So the LLM can do interesting generation.

What say?