I built a cross-platform AI SaaS solo in 12 weeks — 2,258 commits, and one set $50 on fire

#ai #webdev #showdev #tutorial

Twelve weeks ago: an empty repo. This morning: 2,258 commits, a desktop app shipping on macOS and Windows, a billing system, an admin dashboard, a 26-language marketing site — all live in production. Solo. No co-founder, no team, no funding.

I'm a full-stack dev, and yes, an AI coding agent did a big share of the keystrokes. But "#showdev: AI wrote my app" is the lazy headline and it's wrong. The interesting part is the engineering discipline that makes a solo + agent setup actually ship something that doesn't fall over — plus the handful of times it absolutely fell over.

Here's the stack, the dead ends, real config, and the lesson that cost me fifty dollars.

The stack (for the curious)

Backend: Cloudflare Workers (Hono + TypeScript), D1 (SQLite) for data, KV for sessions/cache, R2 for uploads, Vectorize for embeddings.
Clients: SwiftUI on macOS, Tauri 2 (React + Rust) on Windows. Real-time audio capture (system + mic), streamed to STT, answers rendered in a floating overlay.
Payments: Paddle as Merchant of Record.
The product: SubcueAI (subcue.ai) — a real-time AI interview assistant. It transcribes both sides of a call live and suggests talking points in an overlay; there's also a mock-interview practice mode.

Now the parts that hurt.

Lesson 1 — I threw away v1 on day two

The first version was a native macOS app built the "proper" way. It lasted ~36 hours. The second commit in the whole history is literally "migrate to backend (Cloudflare Workers) + macOS rewrite, remove legacy project."

Not a bold strategic call — I just realized within a day the architecture would make everything downstream (auth, sync, a website, an admin panel) three times harder. So I deleted it.

Your sunk-cost reflex is the most expensive bug you'll ship. Deleting two days of work in week one is cheap. Deleting it in month three is a funeral.

Lesson 2 — Five STT providers, and a bug you can't unit-test

Real-time transcription is where this app lives or dies. The git history is a graveyard of providers: Apple's built-in speech → whisper.cpp locally → Deepgram → ElevenLabs → OpenAI realtime transcribe. Five. Each looked great in a 90-second demo, then fell apart on latency, accents, dropped websockets, or cost.

The bug that taught me something: one provider had a silent 60-minute hard cap per session. Demos are 5 minutes, so I never hit it. Then someone does a real 70-minute interview and transcription dies at minute 60, cascading the whole pipeline into a dead state.

You don't catch that in a test. I caught it because I'd wired up client telemetry and saw failures clustering at the one-hour mark. The fix is a watchdog that proactively rotates the session before the limit:

// Rotate at 55 min — 5 minutes early, on purpose, so the handoff is invisible.
const SESSION_LIMIT_MS = 60 * 60 * 1000
const ROTATE_BEFORE_MS  = 5  * 60 * 1000

if (now - sessionStartedAt >= SESSION_LIMIT_MS - ROTATE_BEFORE_MS) {
  await rotateSttSession({ countsTowardReconnectQuota: false }) // smooth, not a "reconnect"
}

"Works in the demo" and "works in a 50-minute session on flaky hotel wifi" are different products. Instrument prod or you're flying blind.

Lesson 3 — I rebuilt the Windows UI four times (and lost a day to a rounded corner)

I wanted the Windows client to match the macOS one: a translucent, blurred, rounded-corner floating window. Sounds trivial. It is not.

I went WinUI 3 → Qt → Avalonia → Tauri 2. WinUI couldn't do the layered transparency. Avalonia, after real effort, couldn't hit macOS-grade rounded + transparent + blur on the Windows desktop. I have an entire afternoon of commits that are just me fighting window chrome: DWM constants typed as the wrong integer width, Mica vs. Acrylic, vibrancy, CSS backdrop-filter, a 1px white border that would not die. One commit message: "drop CSS chrome border — stacked with DWM edge looked 2-3px thick." A whole day. For a corner.

The fix wasn't willpower. I moved to a webview shell (Tauri) where the rounding is one line:

.window-chrome {
  border-radius: 12px;
  background: rgba(16, 16, 26, 0.72);
  backdrop-filter: blur(22px);
}

If you've spent a day fighting the framework for something cosmetic, the framework is the bug. Change the tool, not your willpower.

Lesson 4 — Payments is the real mountain (and the most boring one)

Everyone wants to talk about the AI. Nobody wants to talk about the thing that decides whether you have a business: getting paid.

I started on Stripe, then moved to Paddle as a Merchant of Record. For a solo dev that distinction is huge — the MoR becomes the legal seller and handles global sales tax / VAT. I'm not going to become a tax expert in 40 jurisdictions; I'll trade a few points of margin for not getting a letter from a tax authority I've never heard of.

Then the lifecycle, which I waded through commit by commit: daily-quota → credits balance; upgrades that prorate immediately; downgrades that have to defer to period end (the provider has no native "change my price later," so you store a pending_plan and a sweeper applies it at the boundary); cancellations that end at period close with a captured reason; resume; and webhooks reconciling all of it so the source of truth never drifts.

The unsexy 80% — billing, auth, i18n, infra — is where the real work and the real moat live. It's not the dessert, it's the meal.

Lesson 5 — A cron job set $50 on fire while I slept

My favorite disaster. I'd built an automated content pipeline. A scheduled job kicked off a publish step on a background task that got cancelled; the platform read the cancellation as failure and retried ~34 times; each retry fired ~25 paid LLM calls. I woke up to a very calm dashboard and a not-calm bill.

The damage was small. The shape of it was the lesson: automation + an AI agent moves fast in every direction, including "on fire," silently, overnight. The fix is a set of money-safety invariants I now treat as sacred:

# wrangler.toml — the money-safety contract
[[queues.consumers]]
queue       = "content-publish"
max_retries = 0            # a failed paid job must NOT auto-retry into a loop
# NO dead_letter_queue — a DLQ feeding back in is how you bill yourself 34x
max_batch_size = 1

// Consumer: idempotent, always ack, never re-throw.
export async function queue(batch, env) {
  for (const msg of batch.messages) {
    try {
      await runPublish(env, msg.body.draftId) // guarded: no-op if already published
    } catch (err) {
      logError(err)        // swallow — re-throwing tells Queues to redeliver = pay again
    } finally {
      msg.ack()            // ALWAYS
    }
  }
}

Give your robots a blast radius. Before you automate anything that costs money, ask "what's the worst case if this loops?" — and cap it.

How I actually drive the AI agent

"I used AI" tells you nothing. Here's the mechanics that made the volume sustainable:

1. A living instructions file is the whole game. The highest-leverage artifact in the repo isn't code — it's a long, dense instructions doc (mine is a CLAUDE.md) encoding invariants the agent reads every session: how auth works and must never change, the billing identity, the money-safety rules above, naming conventions. It's an external brain that survives the agent's per-session amnesia. An entry looks like:

- change-plan endpoints may ONLY PATCH an existing provider_subscription_id
  or write pending_* columns — NEVER create a second subscription.
  (This was the double-charge root cause. Do not "helpfully" refactor it.)

2. Write down the NOs, not just the YESes. Half the value is negative space — explicit vetoes like "do not add this schema type, it was removed on purpose." Without them, a well-meaning agent re-introduces the exact thing you deleted, every time.

3. Let it own whole subsystems — once the rails exist. I let the agent build the entire subscription lifecycle and the content pipeline mostly end-to-end. That only works because the invariants were written first. Rails before speed. Speed without rails is Lesson 5.

4. Persist learnings across sessions. Every "oh, that's why" gets saved — gotchas, decisions, the difference between a real bug and expected weirdness. Future-me and the agent stop relearning the same things.

5. The mental model: it's a brilliant junior who forgets everything overnight and never pushes back. Your job shifts from typing to writing the spec, drawing boundaries, reviewing every diff, and owning the judgment calls it can't make. You become the senior doing only the senior parts.

And the catch, so this doesn't read like an ad: speed without taste is dangerous. It will confidently do the wrong thing, fast (see: $50, on fire). The guardrails, architecture, and judgment are still entirely on you. The agent deletes the typing; it does not delete the engineering.

TL;DR for solo builders

Commit count isn't the flex. Knowing what to delete is.
The boring 80% (billing, auth, i18n, infra) is the moat. Anyone can wire up an LLM call.
Positioning compounds faster than features.
An AI agent removes the typing and amplifies the judgment — for better and worse. Bring the judgment.

The product this was all for is SubcueAI → subcue.ai. I'm @imaaroncao on GitHub and @real_aaron_cao on X. Questions welcome in the comments — including the dumb ones, I clearly have plenty.