I'm a solo developer running 15 web products in production. SaaS tools, programmatic SEO sites, an npm package, a brand kit generator. No team, no cofounder, no agency. Just me, a standardized stack, and Claude Code as a copilot embedded in every step of the pipeline.
This is not a "look how smart I am" post. Half of these products make zero revenue. But the system that lets me ship them fast, maintain them, and kill the losers early — that part works. Here's how.
The factory mindset
I call my system "L'Usine" (French for "the factory"). It's a monorepo of markdown files, shell scripts, checklists, and prompt templates that wraps around Claude Code. The idea is simple: standardize everything so the AI can be useful.
When every project uses the same stack (Next.js, TypeScript, Tailwind, Prisma, Stripe), the same file structure, and the same conventions, the LLM doesn't have to guess. It knows where the DB singleton lives. It knows `"use client"` goes on leaf components only. It knows to use `select`, not `include`, on listing pages.
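As one concrete example of a standardized convention, here's a sketch of the usual Next.js-style Prisma singleton (caching the client on `globalThis` so dev hot reloads reuse it). The client class is stubbed so the snippet stands alone, and the exact pattern is my assumption about what "the DB singleton" looks like, not a quote from the factory:

```typescript
// Sketch of a standardized DB singleton, assuming the common Next.js
// pattern: cache the client on globalThis so dev hot reloads reuse it.
// PrismaClient is stubbed here so the snippet runs standalone.
class PrismaClient {}

const globalForPrisma = globalThis as unknown as { prisma?: PrismaClient };

const prisma = globalForPrisma.prisma ?? new PrismaClient();

// In dev, pin the instance globally; in production each process gets its own.
if (process.env.NODE_ENV !== "production") {
  globalForPrisma.prisma = prisma;
}
```

Because every project uses the same pattern in the same place, the LLM never invents a second client.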
The whole pipeline is one command: /build. It walks through phases — Discovery, Spec, Design, Scaffold, Code, QA, Ship — with a gate at each step. The AI proposes, I decide.
What a real session looks like
Here's an actual example. I needed LMNP Facile, a one-shot tool that generates French tax declarations for rental property owners. 49 EUR, no subscription, peak season in April-May.
Prompt I gave Claude Code:
```
/build new

SaaS one-shot 49 EUR. Generates LMNP tax declarations (cerfa 2031/2033) for French micro-BIC landlords. User inputs property info + rental income, app generates pre-filled PDFs ready to submit to tax authorities.

Target: non-accountant landlords who don't want to pay 500 EUR/year for an expert-comptable. Season: April-May.
```
Claude walked me through Oracle scoring (PAIN 22/25, MONEY 20/25), generated the brief, built the Prisma schema, scaffolded routes with the standard template. The whole spec-to-scaffold phase took about 90 minutes of interactive work.
Then for the code phase, Claude worked inside the project directory with its own AI-CONTEXT.md — a machine-readable spec file that keeps the LLM on track across sessions. No "remind me what this project does." The context is always there.
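The post doesn't show AI-CONTEXT.md itself, so the following is only a guess at the kind of fields such a machine-readable spec might carry, typed out for illustration. Every field name below is hypothetical, not the author's actual format:

```typescript
// Hypothetical shape for a machine-readable spec like AI-CONTEXT.md.
// Every field name here is a guess for illustration, not the real format.
interface ProjectContext {
  name: string;
  oneLiner: string; // what the product does, in one sentence
  pricing: { model: "one-shot" | "subscription"; amountEur: number };
  stack: string[]; // the standardized stack
  conventions: string[]; // rules the LLM must follow across sessions
  openTasks: string[]; // keeps the next session on track
}

const lmnpFacile: ProjectContext = {
  name: "LMNP Facile",
  oneLiner:
    "Generates pre-filled French LMNP tax declarations (cerfa 2031/2033)",
  pricing: { model: "one-shot", amountEur: 49 },
  stack: ["Next.js", "TypeScript", "Tailwind", "Prisma", "Stripe"],
  conventions: [
    "'use client' on leaf components only",
    "select, not include, on listing pages",
  ],
  openTasks: ["QA pass against the blocking checklist"],
};
```

The point is less the schema than the habit: the spec lives next to the code, so a fresh session reads it instead of asking.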
The multi-agent harness
For more ambitious builds, I use a multi-agent architecture: three separate Claude processes chained via a bash script.
```
Planner (1M context) → plan.md → Generator (1M context) → code → Evaluator (1M context)
                                     ↑                                     │
                                     └──────── feedback if FAIL ───────────┘
```
Each agent gets a fresh 1M-token context window. No pollution between them. The planner generates an ambitious feature list. The generator builds it feature by feature. The evaluator runs npm run build, reads the code, and scores it across functional completeness, infrastructure (security headers, Prisma indexes, SEO setup), and design quality.
If the score is below threshold (default 7/10), the evaluator's feedback loops back to the generator for another round. If the score stagnates after two rounds, the system kills the loop — because a fresh context will do better than patching a degraded one.
This is not autonomous AGI magic. It's structured prompt engineering with file-based communication. But it lets me prototype a full SaaS in hours, not days.
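The loop above can be sketched as a small orchestrator. The agents are plain functions here; in the real system each is a separate Claude process communicating through files (plan.md, code, feedback). Threshold and stagnation rule follow the description above; everything else is a sketch, not the author's script:

```typescript
// Sketch of the three-agent loop: plan once, then generate/evaluate
// until the score clears the threshold, killing the loop when scores
// stagnate (a fresh context beats patching a degraded one).
type Agent = (input: string) => string;
type Evaluation = { score: number; feedback: string };

function runHarness(
  plan: Agent,
  generate: Agent,
  evaluate: (code: string) => Evaluation,
  brief: string,
  threshold = 7,
  maxStagnantRounds = 2,
): { code: string; score: number } {
  const planMd = plan(brief);
  let feedback = "";
  let best = { code: "", score: Number.NEGATIVE_INFINITY };
  let stagnant = 0;

  while (true) {
    const code = generate(planMd + "\n" + feedback);
    const { score, feedback: notes } = evaluate(code);

    if (score >= threshold) return { code, score };

    // A round with no improvement counts toward stagnation.
    if (score <= best.score) {
      stagnant += 1;
      if (stagnant >= maxStagnantRounds) return best;
    } else {
      best = { code, score };
      stagnant = 0;
    }
    feedback = notes;
  }
}
```

The file-based hand-off is what makes this debuggable: every round leaves a plan, a diff, and a scorecard on disk.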
The 682-line QA checklist
Here's the thing nobody tells you about AI-generated code: it compiles but it doesn't ship.
The LLM will forget `security.txt`. It will use `<img>` instead of `next/image`. It will slap `shadow-2xl` on every card and `bg-gradient-to-r from-purple-500 to-pink-500` on every heading. It will skip `@@index` on your Prisma `userId` fields. It will put Stripe secret keys in `NEXT_PUBLIC_` variables.
So I built a 682-line QA checklist organized in three tiers:
- T1 BLOQUANT ("blocking", 24 items) — no deploy without these. Legal pages, security headers, `robots.ts` blocking AI bots, `poweredByHeader: false`, correct French accents in all data.
- T2 CRITICAL (first week) — structured data, OG images, Prisma `select` on listings, font `display: "swap"`, loading skeletons.
- T3 GROWTH (month 1-3) — CRO, content scaling, domain authority.
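A sketch of how those tiers can be represented as data an evaluator grades against, using a handful of the items quoted above. The IDs and the shape are my invention, and the real checklist is 682 lines, not six:

```typescript
// The three tiers as data. Items are a subset quoted in the post;
// IDs and structure are illustrative, not the actual checklist file.
type Tier = "T1_BLOCKING" | "T2_CRITICAL" | "T3_GROWTH";

interface CheckItem {
  tier: Tier;
  id: string;
  check: string; // what the auditor verifies
}

const checklist: CheckItem[] = [
  { tier: "T1_BLOCKING", id: "legal-pages", check: "Legal pages exist" },
  { tier: "T1_BLOCKING", id: "robots-ai", check: "robots.ts blocks AI bots" },
  { tier: "T1_BLOCKING", id: "powered-by", check: "poweredByHeader: false" },
  { tier: "T2_CRITICAL", id: "prisma-select", check: "select, not include, on listings" },
  { tier: "T2_CRITICAL", id: "font-swap", check: 'font display: "swap"' },
  { tier: "T3_GROWTH", id: "cro", check: "CRO experiments running" },
];

// T1 semantics: any failing blocking item vetoes the deploy.
function deployBlocked(failingIds: Set<string>): boolean {
  return checklist.some(
    (item) => item.tier === "T1_BLOCKING" && failingIds.has(item.id),
  );
}
```

Encoding the tiers as data is what lets both the evaluator agent and the manual audit consume the same source of truth.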
The evaluator agent in the harness grades against this checklist automatically. In my manual workflow, I run /qc which loads the checklist and audits the project. It catches things like: "you have 437 occurrences of missing French accents in your data files" (true story from one of my programmatic SEO sites).
I also maintain a design knowledge base extracted from auditing Linear, Vercel, and Stripe. It defines exact shadow hierarchies (shadow-sm for cards, shadow-xl for modals only), radius rules, transition durations (150ms for color, 200ms for transforms), and a list of things elite products never do. This is what keeps the output from looking like a free template.
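A minimal sketch of that knowledge base reduced to enforceable constants, using only the values quoted above; the object shape is my assumption:

```typescript
// Design KB as constants an auditor can enforce. Values are the ones
// quoted in the post; the structure is illustrative.
const designKB = {
  shadows: { card: "shadow-sm", modal: "shadow-xl" }, // shadow-xl: modals only
  transitionMs: { color: 150, transform: 200 },
  neverDo: ["shadow-2xl on every card", "gradient text on headings"],
} as const;

type Surface = keyof typeof designKB.shadows;

// Look up the only shadow class a given surface is allowed to use.
function allowedShadow(surface: Surface): string {
  return designKB.shadows[surface];
}
```

Hard values beat taste adjectives: "150ms for color" is checkable; "snappy transitions" is not.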
The /sample command: parallel exploration
When I need creative variation — landing pages, UI components, pricing sections — I use /sample. It launches N parallel Claude agents, each in an isolated git worktree, each with a different creative brief.
Example: /sample 3 "LP minimaliste SaaS facturation" generates three candidates:
- A: Minimal radical (inspired by Resend, near-zero color)
- B: Warm and accessible (inspired by Notion, serif headings)
- C: Data-driven and dense (inspired by Linear, dark option)
After generation, deterministic checks run on each: build pass/fail, TypeScript errors, a grep for AI-slop patterns (`shadow-xl`, `hover:scale-105`, `animate-bounce`). Then I compare and pick, or cherry-pick elements from multiple candidates.
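The anti-slop pass can be sketched as a pattern count over a candidate's source. File walking is omitted and the real check is a grep in a shell script; the patterns are the ones listed above:

```typescript
// Deterministic anti-slop check: count occurrences of known slop
// patterns in a candidate's source string. A sketch of the grep step,
// not the actual script.
const slopPatterns = [/shadow-xl/g, /hover:scale-105/g, /animate-bounce/g];

function slopHits(source: string): number {
  return slopPatterns.reduce(
    (total, pattern) => total + (source.match(pattern)?.length ?? 0),
    0,
  );
}
```

Candidates that fail the build are dropped outright; among the survivors, fewer hits is one more point in the comparison.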
I used this heavily for LeCapybara, which is now my gold standard — every new project clones its patterns.
What actually works and what doesn't
Works great:
- Scaffolding. Going from idea to a running `npm run dev` with auth, DB, Stripe, SEO setup in under 2 hours.
- Repetitive infrastructure. Security headers, sitemap generation, structured data — the LLM nails this when given a checklist.
- Programmatic SEO at scale. I run 10 comparison sites covering French B2B software niches. Same template, different data. AI generates the initial content, Indxel (my own npm package / SEO crawler) validates quality.
- Brand generation. OneMinuteBranding started as an internal tool to generate brand kits for my own projects. Then someone paid 49 EUR for it.
Doesn't work (yet):
- Revenue. 147 EUR total across 15 products. The factory ships fast but distribution is the bottleneck.
- Design taste. The LLM's default aesthetic is "startup template from 2022." Without the design KB and the anti-slop patterns, every output looks the same.
- Complex business logic. Tax calculation rules, regulatory compliance, financial formulas — these need manual verification every time.
The uncomfortable truth
I built an elaborate system to ship products fast. But "shipping fast" is the easy part. Finding paying customers is the hard part, and no amount of AI assistance changes that.
The system does give me one advantage: I can afford to experiment. When you can go from idea to production in a weekend, you can test 15 ideas and see what sticks instead of betting everything on one. Two products have actual sales. Most have growing search traffic. The factory is young.
If you want to try something similar, start with the boring parts: standardize your stack, write down your conventions in a machine-readable format, build checklists for the things you keep forgetting. The AI gets dramatically better when it has guardrails.
You can find my work at yann-lephay.com. The factory keeps shipping.