Yonatan Naor

Posted on Mar 28 • Originally published at thicket.sh

How We Built 23 Websites in 48 Hours with AI Agents

#ai #webdev #programming #productivity

This is not a thought experiment. In March 2026, a single developer and a team of eight Claude Code agents built and deployed 23 production utility websites in 48 hours.

No templates-as-a-service. No landing page clones. Real tools — a paycheck calculator that handles all 50 US states, fitness calculators using validated medical formulas, PDF tools that process files entirely in the browser. Here's exactly how we did it.

The Problem

The internet is full of utility websites that are either plastered with ads, painfully slow, or both. Try to calculate your paycheck withholding? You get a page that takes 8 seconds to load, shows you 4 interstitial ads, and gives you an answer you're not sure is correct.

We wanted to build fast, clean utility sites at scale. One at a time was too slow. The question: could AI agents do the heavy lifting?

The Architecture: Karpathy's Autoresearch, Applied

We took Andrej Karpathy's autoresearch pattern — AI systems that propose, execute, measure, and iterate — and applied it to an entire web business.

The key insight is the ratchet mechanism: changes are kept if metrics improve and reverted if they don't. The system can only move forward.
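
A minimal sketch of the ratchet, assuming each change can be applied to a copy of the current state and scored by a single number. The `Change` type, `runRatchet`, and the toy load-time metric are hypothetical names for illustration, not the project's actual code.

```typescript
// Hypothetical types: the post does not say how changes or metrics are represented.
interface Change<S> {
  description: string;
  apply: (state: S) => S; // returns a new candidate state and never mutates
}

interface RatchetResult<S> {
  state: S;
  score: number;
  kept: string[];
  reverted: string[];
}

// Keep a change only if the measured score strictly improves. Otherwise the
// candidate is discarded, which plays the role of the "revert" step.
function runRatchet<S>(
  initial: S,
  changes: Change<S>[],
  measure: (state: S) => number,
): RatchetResult<S> {
  let state = initial;
  let score = measure(initial);
  const kept: string[] = [];
  const reverted: string[] = [];

  for (const change of changes) {
    const candidate = change.apply(state);
    const candidateScore = measure(candidate);
    if (candidateScore > score) {
      state = candidate;
      score = candidateScore;
      kept.push(change.description);
    } else {
      reverted.push(change.description);
    }
  }
  return { state, score, kept, reverted };
}

// Toy usage: faster pages score higher, so the metric is negative load time.
const ratchetDemo = runRatchet(
  { loadTimeMs: 800 },
  [
    { description: "compress images", apply: (s) => ({ loadTimeMs: s.loadTimeMs - 200 }) },
    { description: "add heavy widget", apply: (s) => ({ loadTimeMs: s.loadTimeMs + 500 }) },
  ],
  (s) => -s.loadTimeMs,
);
```

In this toy run the first change is kept and the second is reverted, so the score never moves backward.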

Here's the structure:

traffic-empire/
├── agents/           # 8 specialized agents, each with its own CLAUDE.md
│   ├── ceo/          # Orchestrates weekly cycles
│   ├── research/     # Finds high-value niches
│   ├── designer/     # Creates brand identity
│   ├── builder/      # Implements and deploys
│   ├── editor/       # Commissions and approves articles
│   ├── content/      # Publishes to site repos
│   ├── seo-geo/      # Search + LLM optimization
│   └── auditor/      # Reviews everything, improves instructions
├── registry/
│   ├── registry.json # Source of truth for all sites
│   └── eval.md       # Immutable evaluation contract
├── packages/
│   └── base-site/    # Shared components, analytics, GEO handlers
└── sites/            # 23 site repos as git submodules

The Agent Org Chart

This isn't one AI doing everything. It's a team with clear roles:

Agent	Job
CEO	Reads data, makes build/improve/deprecate decisions
Research	Scores niches on search volume, competition, monetization
Designer	Creates brand identity — colors, fonts, component patterns
Builder	Scaffolds, codes, deploys, verifies with curl checks
Editor	Runs a virtual newsroom with 5 writer personas
Content	Publishes approved articles to site repos
SEO/GEO	Optimizes for Google + LLM discovery (llms.txt, JSON-LD)
Auditor	Grades every agent A-D, rewrites underperformers' instructions

The auditor is the most important agent. When something goes wrong, it doesn't just fix the output — it fixes the agent's instructions so the same mistake doesn't happen again. This is the self-improvement loop.
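
As a rough illustration of the grading step, here is a sketch that maps a pass rate to an A-D grade and lists the agents whose instructions would be queued for a rewrite. The thresholds, the `AgentReview` shape, and the rule that C and D count as underperforming are all assumptions; the post does not say how grades are computed.

```typescript
type Grade = "A" | "B" | "C" | "D";

// Hypothetical review record: how many audit checks an agent's output passed.
interface AgentReview {
  agent: string;
  passedChecks: number;
  totalChecks: number;
}

// Assumed thresholds, chosen only for illustration.
function gradeAgent(review: AgentReview): Grade {
  const ratio = review.totalChecks === 0 ? 0 : review.passedChecks / review.totalChecks;
  if (ratio >= 0.9) return "A";
  if (ratio >= 0.75) return "B";
  if (ratio >= 0.5) return "C";
  return "D";
}

// Agents graded C or D are treated as underperformers whose instructions need a rewrite.
function needsInstructionRewrite(reviews: AgentReview[]): string[] {
  return reviews
    .filter((r) => {
      const grade = gradeAgent(r);
      return grade === "C" || grade === "D";
    })
    .map((r) => r.agent);
}
```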

The 48-Hour Timeline

Hours 0-4: Foundation

The human (one developer) wrote the master CLAUDE.md — the project constitution. He defined agent roles, the cycle protocol, the ratchet mechanism, and the evaluation contract.

Then he built the base-site package: shared Next.js components for layouts, analytics (GA4), cookie consent, JSON-LD, and GEO endpoints. Every site inherits this through a git submodule.

Hours 4-12: Research & Design

The Research agent scored niches on three dimensions:

Search volume: How many people need this tool?
Competition: How good are existing solutions?
Monetization: Can this generate revenue?
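
One way to combine the three dimensions into a single ranking is a weighted sum. The sketch below assumes each dimension has already been normalized to the range 0 to 1; the weights and the `NicheSignals` shape are hypothetical, not taken from the Research agent's instructions.

```typescript
// Hypothetical shape: the caller normalizes every dimension to 0..1.
interface NicheSignals {
  name: string;
  searchVolume: number; // 0 = nobody searches, 1 = very high demand
  competition: number;  // 0 = weak existing tools, 1 = excellent existing tools
  monetization: number; // 0 = no revenue path, 1 = strong revenue path
}

// Assumed weights, for illustration only.
const NICHE_WEIGHTS = { searchVolume: 0.4, competition: 0.3, monetization: 0.3 };

function scoreNiche(n: NicheSignals): number {
  // Strong competition counts against a niche, so that dimension is inverted.
  return (
    NICHE_WEIGHTS.searchVolume * n.searchVolume +
    NICHE_WEIGHTS.competition * (1 - n.competition) +
    NICHE_WEIGHTS.monetization * n.monetization
  );
}

// Returns a new array sorted from highest to lowest score.
function rankNiches(niches: NicheSignals[]): NicheSignals[] {
  return [...niches].sort((a, b) => scoreNiche(b) - scoreNiche(a));
}
```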

First batch approved: fitness calculators, paycheck calculators, AI directory, image tools, text utilities, color tools, VPN comparison.

For each niche, the Designer agent created a complete brand identity in a JSON spec: name, tagline, color system, typography, icon direction, component patterns. CalcFit got clinical blues. PayScale Pro got trust-signaling navy. Pixelry got creative magentas.
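
The exact JSON schema is not shown here, but a spec with those fields could be typed and sanity-checked roughly like this before the Builder consumes it. The property names (`colors.primary`, `typography.heading`, and so on) and the validation rules are assumptions.

```typescript
// Hypothetical typing of the design spec fields listed above.
interface BrandSpec {
  name: string;
  tagline: string;
  colors: { primary: string; accent: string; background: string };
  typography: { heading: string; body: string };
  iconDirection: string;
  componentPatterns: string[];
}

const HEX_COLOR = /^#[0-9a-fA-F]{6}$/;

// Returns a list of human-readable problems; an empty list means the spec passed.
function validateBrandSpec(spec: BrandSpec): string[] {
  const problems: string[] = [];
  if (spec.name.trim() === "") problems.push("name is empty");
  if (spec.tagline.trim() === "") problems.push("tagline is empty");

  const colorFields: Array<keyof BrandSpec["colors"]> = ["primary", "accent", "background"];
  for (const field of colorFields) {
    if (!HEX_COLOR.test(spec.colors[field])) {
      problems.push(`colors.${field} is not a 6-digit hex color`);
    }
  }
  if (spec.componentPatterns.length === 0) problems.push("componentPatterns is empty");
  return problems;
}
```

Rejecting a malformed spec at this boundary keeps a bad color or a missing tagline from propagating into a deployed site.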

Hours 12-36: Building

The Builder consumed each design spec and produced a production-ready Next.js site:

Scaffold from the nextjs-base template
Apply brand config (colors, fonts, patterns)
Build all tools with real, functional logic
Wire up base-site submodule
Configure Netlify deployment
Set up DNS via Cloudflare (*.thicket.sh subdomains)
Verify with curl checks
Register in registry.json
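
The last two steps are where a gate belongs: a site should only enter the registry after its checks pass. The sketch below assumes the curl results have already been collected as path and HTTP status pairs, and that a registry entry holds a slug, a URL, and a status. Those shapes and the function names are hypothetical; only the *.thicket.sh subdomain pattern comes from the steps above.

```typescript
// Hypothetical record of one curl check: the path requested and the HTTP status returned.
interface CurlCheck {
  path: string;
  status: number;
}

// Hypothetical registry shapes; the real registry.json schema is not shown in this post.
interface RegistrySite {
  slug: string;
  url: string;
  status: "live";
}
interface SiteRegistry {
  sites: RegistrySite[];
}

function failedChecks(checks: CurlCheck[]): CurlCheck[] {
  return checks.filter((c) => c.status !== 200);
}

// Returns a new registry with the site added, or throws if verification failed.
function registerIfVerified(registry: SiteRegistry, slug: string, checks: CurlCheck[]): SiteRegistry {
  const failures = failedChecks(checks);
  if (checks.length === 0 || failures.length > 0) {
    const detail = failures.map((f) => `${f.path} -> ${f.status}`).join(", ");
    throw new Error(`verification failed for ${slug}: ${detail || "no checks were run"}`);
  }
  if (registry.sites.some((s) => s.slug === slug)) {
    throw new Error(`${slug} is already registered`);
  }
  return {
    sites: [...registry.sites, { slug, url: `https://${slug}.thicket.sh`, status: "live" }],
  };
}
```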

These aren't thin wrappers. The paycheck calculator handles all 50 US states with different tax rules. The fitness calculators implement validated medical formulas (Mifflin-St Jeor, Katch-McArdle). The PDF tools process files entirely in the browser — no server uploads.
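
For reference, the two formulas named above are short enough to show in full. This sketch uses their standard published forms with metric inputs; the function names are ours for illustration, and deriving lean body mass from a body fat percentage is an assumption about how a calculator would collect that input.

```typescript
type Sex = "male" | "female";

// Mifflin-St Jeor basal metabolic rate, in kcal/day:
//   BMR = 10 * weight(kg) + 6.25 * height(cm) - 5 * age(years) + s
//   where s = +5 for males and -161 for females.
function mifflinStJeor(weightKg: number, heightCm: number, ageYears: number, sex: Sex): number {
  const base = 10 * weightKg + 6.25 * heightCm - 5 * ageYears;
  return sex === "male" ? base + 5 : base - 161;
}

// Katch-McArdle basal metabolic rate, in kcal/day:
//   BMR = 370 + 21.6 * lean body mass(kg)
// Lean body mass is derived here from total weight and body fat percentage.
function katchMcArdle(weightKg: number, bodyFatPercent: number): number {
  const leanMassKg = weightKg * (1 - bodyFatPercent / 100);
  return 370 + 21.6 * leanMassKg;
}
```

For a 30-year-old male at 70 kg and 175 cm, Mifflin-St Jeor gives 700 + 1093.75 - 150 + 5 = 1648.75 kcal/day.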

Hours 36-44: Content & SEO

Every site got:

/llms.txt and /llms-full.txt — structured endpoints for AI crawlers
Schema.org JSON-LD on every page
Optimized meta titles targeting long-tail queries
XML sitemaps, Open Graph tags, Twitter Cards
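
To make the first two items concrete, here is a sketch that renders a small llms.txt body and a JSON-LD string from one site description. The layout follows the llms.txt proposal (an H1 title, a blockquote summary, then sections of links), and `WebApplication` is a real Schema.org type, but the `SiteSummary` shape, the "Tools" section name, and the choice of type are assumptions rather than what base-site actually emits.

```typescript
// Hypothetical input shapes for one site.
interface ToolLink {
  title: string;
  url: string;
  description: string;
}
interface SiteSummary {
  name: string;
  url: string;
  summary: string;
  tools: ToolLink[];
}

// Renders Markdown in the llms.txt layout: title, blockquote summary, link section.
function renderLlmsTxt(site: SiteSummary): string {
  const lines = [`# ${site.name}`, "", `> ${site.summary}`, "", "## Tools", ""];
  for (const tool of site.tools) {
    lines.push(`- [${tool.title}](${tool.url}): ${tool.description}`);
  }
  return lines.join("\n") + "\n";
}

// Renders a JSON-LD string suitable for a <script type="application/ld+json"> tag.
function renderJsonLd(site: SiteSummary): string {
  return JSON.stringify({
    "@context": "https://schema.org",
    "@type": "WebApplication",
    name: site.name,
    url: site.url,
    description: site.summary,
  });
}
```

Generating both outputs from the same object keeps what crawlers read in llms.txt consistent with the structured data on the page.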

The Content agent produced deep research articles — not 300-word keyword-stuffed pieces, but 1,500+ word articles with citations and genuine analysis.

Hours 44-48: Audit & Launch

The Auditor reviewed everything: build errors, broken links, missing meta, GEO compliance. It graded every agent and wrote recommendations for the next cycle.

Final tally: 23 live sites.

What We Built

Calculators (8): CalcFit (fitness), PayScale Pro (paycheck), MoneyLens (finance), NestCalc (pregnancy), QuickPercent (percentages), TimeSnap (age/date), LoanWise (loans), KeyRate (mortgage)

Utilities (7): TextKit (text tools), Pixelry (image tools), DocForge (PDF tools), KeyForge (password tools), QRForge (QR codes), CaptionSnag (YouTube transcripts), KeyRush (typing test)

Finance (2): StackSats (crypto), FundDuel (ETF comparison)

Directories & Content (5): ToolPilot (AI directory), Chromatic (design/color), Quizzly (quizzes), TrendWatch (trending topics), ShieldVPN (VPN comparison)

All server-rendered Next.js. All with analytics. All with GEO endpoints. All on *.thicket.sh subdomains.

5 Things We Learned

1. The template pattern is everything. Without the shared base-site package, each site would be a snowflake. The template gave us consistency at scale — same analytics, same cookie consent, same GEO endpoints, same build pipeline. The designer's brand config is just a thin layer on top.

2. Agent specialization > agent count. Eight agents with clear roles outperformed one agent doing everything. The Research agent doesn't write code. The Builder doesn't write content. Focused instruction sets prevent the "jack of all trades" problem.

3. The ratchet prevents regression. Without the rule that the portfolio score must never decrease, the system could accept changes that help one site but hurt the portfolio as a whole. It is a simple constraint, and it prevents a huge category of mistakes.

4. Git is underrated as AI memory. When an agent starts work, it reads git log. It sees what was tried before and what worked. This is institutional memory without a separate database.
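
As a sketch of how an agent might turn history into usable memory, the parser below reads the output of `git log --pretty=format:"%h|%s"` (abbreviated hash, then subject) and pulls out experiments that were rolled back. The convention that reverted experiments have subjects starting with `revert:` is hypothetical; the post does not describe the commit message format.

```typescript
interface LogEntry {
  hash: string;
  subject: string;
}

// Parses text produced by: git log --pretty=format:"%h|%s"
// Lines without a "|" separator are skipped.
function parseGitLog(raw: string): LogEntry[] {
  return raw
    .split("\n")
    .map((line) => line.trim())
    .filter((line) => line.indexOf("|") > 0)
    .map((line) => {
      const sep = line.indexOf("|");
      return { hash: line.slice(0, sep), subject: line.slice(sep + 1) };
    });
}

// Hypothetical convention: a reverted experiment has a subject starting with "revert:".
function revertedExperiments(entries: LogEntry[]): string[] {
  const prefix = "revert:";
  return entries
    .filter((e) => e.subject.indexOf(prefix) === 0)
    .map((e) => e.subject.slice(prefix.length).trim());
}
```

An agent could feed the resulting list into its prompt so it does not retry an idea that already failed.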

5. The auditor is the most important agent. Bad outcomes don't just get fixed; they improve the system's ability to avoid similar outcomes in the future. Self-improvement made the system antifragile.

What's Next

The 48-hour sprint was the beginning. The system runs continuous weekly improvement cycles:

Analytics checks what's working
Research finds new opportunities
CEO decides what to build, improve, or deprecate
Specialists execute
Auditor ensures quality
Repeat

We're documenting everything transparently — wins, failures, metrics, decisions. Not a polished corporate narrative. A real-time log of AI agents with real autonomy and real accountability.
