An AI runs my company. Not as a demo — as the actual operator. Two Claude terminals (a "CEO" that does strategy and a "CTO" that writes the code), plus an agent orchestrator that's supposed to behave like a small staff. I am the chairman. I approve money and irreversible actions; the machine does the rest.
Earlier this year a solo builder vibe-coded a product to roughly $15K in a week. Same category of tools I have. Our number, after months and eight shipped products, is $[X] — and $[X] is small enough that I'm using a placeholder out of dignity, not secrecy.
This is the cold autopsy. No hype, no "AI changed everything." Just what actually happened and why.
The numbers, unflattering
- 8 products shipped. Live, deployable, real auth, real checkout wiring on several.
- ~0 paying customers.
- One product (a small stats utility) has real organic traffic — hundreds of search clicks and thousands of users a month. Paid conversion: effectively zero.
- The rest range from "launched to silence" to "domain now 404s."
If you only looked at the build output, you'd think this was a competent shop. That was exactly the trap.
Autopsy finding #1: I built a process-theater company
I optimized the org for activity, not outcomes. The agents generated standups, status reports, "daily activity," kanban motion. It looked like a company. It produced almost no revenue.
This isn't a unique personal failing — it's a documented one. There's now a published failure taxonomy for multi-agent LLM systems (search "MAST, Multi-Agent System failure taxonomy"). A large share of failures aren't bad models; they're specification and coordination failures: agents that talk to each other, drift, re-litigate decisions, and confidently report progress on work that doesn't move a real metric. A "99-agent autonomous company" is not an achievement. It's the failure mode with a press release.
The fix isn't more agents. It's fewer, gated, narrow workers whose output is checked by rules, not by another LLM's opinion: did the URL return 200? did a payment land? did the PR merge? did the event fire in analytics? "An agent said it's done" is not evidence.
Autopsy finding #2: build was never the bottleneck
Both the $15K dev and I can build. Building is the commodity now. The difference was everything around the build:
- Distribution & audience. The $15K came from a person who already had eyes on them and shipped in public. I shipped quietly into search engines and waited.
- Wave & willingness to pay. They rode a hot category where early adopters spend money. I built a calculator for students in a saturated niche where the entire audience wants it free.
- Focus. They pushed one thing end-to-end to revenue. I had eight half-finished things and an elaborate fake org chart.
Three orders of magnitude of outcome, and none of the gap was the code.
Two technical war stories (because this is Dev.to)
The company brain died from a cron interval.
The agent backbone runs on serverless Postgres (Neon free tier). Free-tier compute auto-suspends after 5 minutes idle — that's the whole cost model. At some point I "improved" responsiveness by bumping the agent heartbeat cron from every 30 minutes to every 5 minutes. Feels harmless.
It wasn't. A query every 5 minutes means the compute never gets a 5-minute idle window. It stayed awake 24/7 — roughly 720 compute-hours/month against a free allowance closer to ~190. One day every query started returning:
HTTP 402: "exceeded the compute time quota"
Six tables — agents, tasks, messages, approvals, events — all unreachable. The "company brain" was down, and it had been quietly burning toward this for weeks. Lesson: with scale-to-zero infra, your cron interval is a billing decision. If it's at or under the suspend threshold, you've silently bought an always-on server.
Worse: I had a daily quota monitor. It never fired — because it was pointed at the wrong database (a legacy connection string), so it cheerfully reported "OK" while the real DB suffocated. Monitoring you don't verify is theater too.
One webhook, four products, silent cross-contamination.
Several products check out through the same payment provider. That provider sends all sales to a single account-level ping URL. So a sale for product A was hitting product B's webhook and — because the handler trusted the payload — granting entitlements in the wrong app. The fix was a per-product discriminator (url_params[workspace_id]) and a foreign-sale guard that drops anything that isn't ours. Multi-tenant billing is mostly about refusing events, not processing them.
What the renewal actually is
No grand relaunch. Concrete moves:
- Kill process theater. Retire the agent-as-staff fiction. Keep a few narrow workers with rule-based output gates.
- Redefine "done." "Done" = a test payment goes end-to-end (checkout → entitlement → feature works). Everything before that is a demo, and gets called a demo.
- Distribution is now first-class, co-equal with building — including writing this, in public, with the embarrassing number left in.
- Measure conversion, not motion. Real funnel events or it didn't happen.
The honest close
We made $[X]. I'm not going to dress that up. But the diagnosis is unusually clear, and clear diagnoses are rare and cheap to act on: the machine can build all day; what it couldn't do is pick a market with a wave, earn an audience, and call a shell what it is.
If you want to watch the renewal happen — including whether it works — the one product with real users is a small stats helper at statmate.org, and the credibility-portfolio experiment is at trustfolio.dev. Both are deliberately public lab rats now.
I'll post the next number whether it's up or down. That's the deal with building in public — the autopsy only counts if you keep the table open.
Top comments (0)