This is not a thought experiment. In March 2026, a single developer and a team of eight Claude Code agents built and deployed 23 production utility websites in 48 hours.
No templates-as-a-service. No landing page clones. Real tools — a paycheck calculator that handles all 50 US states, fitness calculators using validated medical formulas, PDF tools that process files entirely in the browser. Here's exactly how we did it.
The Problem
The internet is full of utility websites that are either plastered with ads, painfully slow, or both. Try to calculate your paycheck withholding? You get a page that takes 8 seconds to load, shows you 4 interstitial ads, and gives you an answer you're not sure is correct.
We wanted to build fast, clean utility sites at scale. One at a time was too slow. The question: could AI agents do the heavy lifting?
The Architecture: Karpathy's Autoresearch, Applied
We took Andrej Karpathy's autoresearch pattern — AI systems that propose, execute, measure, and iterate — and applied it to an entire web business.
The key insight is the ratchet mechanism: changes are kept if metrics improve and reverted if they don't. The system can only move forward.
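A minimal sketch of one ratchet step, assuming a single numeric score where higher is better. The names `ratchetStep` and `evaluate` are hypothetical and are not taken from the project; the sketch only illustrates the keep-or-revert rule described above.

```typescript
// Hypothetical sketch of a ratchet step: keep a change only if the score improves.
interface RatchetResult<T> {
  state: T;      // the state that survives this step
  score: number; // score of the surviving state
  kept: boolean; // true if the candidate was accepted
}

function ratchetStep<T>(
  current: T,
  candidate: T,
  evaluate: (state: T) => number, // assumed scoring function, higher is better
): RatchetResult<T> {
  const currentScore = evaluate(current);
  const candidateScore = evaluate(candidate);
  if (candidateScore > currentScore) {
    // The metric improved, so the candidate becomes the new baseline.
    return { state: candidate, score: candidateScore, kept: true };
  }
  // No improvement: discard the candidate and keep the previous state.
  return { state: current, score: currentScore, kept: false };
}
```

In this sketch a tie counts as "no improvement" and is reverted; a looser variant could accept ties and reject only strict regressions.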
Here's the structure:
traffic-empire/
├── agents/ # 8 specialized agents, each with its own CLAUDE.md
│ ├── ceo/ # Orchestrates weekly cycles
│ ├── research/ # Finds high-value niches
│ ├── designer/ # Creates brand identity
│ ├── builder/ # Implements and deploys
│ ├── editor/ # Commissions and approves articles
│ ├── content/ # Publishes to site repos
│ ├── seo-geo/ # Search + LLM optimization
│ └── auditor/ # Reviews everything, improves instructions
├── registry/
│ ├── registry.json # Source of truth for all sites
│ └── eval.md # Immutable evaluation contract
├── packages/
│ └── base-site/ # Shared components, analytics, GEO handlers
└── sites/ # 23 site repos as git submodules
The Agent Org Chart
This isn't one AI doing everything. It's a team with clear roles:
| Agent | Job |
|---|---|
| CEO | Reads data, makes build/improve/deprecate decisions |
| Research | Scores niches on search volume, competition, monetization |
| Designer | Creates brand identity — colors, fonts, component patterns |
| Builder | Scaffolds, codes, deploys, verifies with curl checks |
| Editor | Runs a virtual newsroom with 5 writer personas |
| Content | Publishes approved articles to site repos |
| SEO/GEO | Optimizes for Google + LLM discovery (llms.txt, JSON-LD) |
| Auditor | Grades every agent A-D, rewrites underperformers' instructions |
The auditor is the most important agent. When something goes wrong, it doesn't just fix the output — it fixes the agent's instructions so the same mistake doesn't happen again. This is the self-improvement loop.
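To make the grading step concrete, here is a rough sketch under stated assumptions: the 0-100 score, the grade cutoffs, and the rule that C and D grades trigger an instruction rewrite are all hypothetical. The source only says that agents are graded A-D and that underperformers get their instructions rewritten.

```typescript
type Grade = "A" | "B" | "C" | "D";

interface AgentReview {
  agent: string;
  score: number; // assumed 0-100 quality score for one cycle
}

// Assumed cutoffs; the article does not state how grades are derived.
function toGrade(score: number): Grade {
  if (score >= 90) return "A";
  if (score >= 75) return "B";
  if (score >= 60) return "C";
  return "D";
}

// Returns the agents whose instruction files would be queued for revision.
// Treating C and D as "underperforming" is an assumption of this sketch.
function needsInstructionRewrite(reviews: AgentReview[]): string[] {
  return reviews
    .filter((r) => {
      const grade = toGrade(r.score);
      return grade === "C" || grade === "D";
    })
    .map((r) => r.agent);
}
```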
The 48-Hour Timeline
Hours 0-4: Foundation
The human (one developer) wrote the master CLAUDE.md — the project constitution. He defined agent roles, the cycle protocol, the ratchet mechanism, and the evaluation contract.
Then he built the base-site package: shared Next.js components for layouts, analytics (GA4), cookie consent, JSON-LD, and GEO endpoints. Every site inherits this through a git submodule.
Hours 4-12: Research & Design
The Research agent scored niches on three dimensions:
- Search volume: How many people need this tool?
- Competition: How good are existing solutions?
- Monetization: Can this generate revenue?
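One way such a three-dimension score could be combined is a weighted sum. The sketch below is illustrative only: the 0-10 scales, the weights, and the function names are assumptions, not the Research agent's actual rubric.

```typescript
interface NicheScores {
  name: string;
  searchVolume: number; // 0-10, higher means more demand
  competition: number;  // 0-10, higher means stronger existing solutions
  monetization: number; // 0-10, higher means better revenue potential
}

// Hypothetical weights; they sum to 1 so the result stays on a 0-10 scale.
const WEIGHTS = { searchVolume: 0.4, competition: 0.3, monetization: 0.3 };

function nicheScore(n: NicheScores): number {
  // Strong competition counts against a niche, so that dimension is inverted.
  return (
    WEIGHTS.searchVolume * n.searchVolume +
    WEIGHTS.competition * (10 - n.competition) +
    WEIGHTS.monetization * n.monetization
  );
}

// Returns a new array sorted from most to least attractive niche.
function rankNiches(niches: NicheScores[]): NicheScores[] {
  return [...niches].sort((a, b) => nicheScore(b) - nicheScore(a));
}
```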
First batch approved: fitness calculators, paycheck calculators, AI directory, image tools, text utilities, color tools, VPN comparison.
For each niche, the Designer agent created a complete brand identity in a JSON spec: name, tagline, color system, typography, icon direction, component patterns. CalcFit got clinical blues. PayScale Pro got trust-signaling navy. Pixelry got creative magentas.
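The real spec format is not shown here, so the following is a sketch of what such a brand spec might look like as a typed object, with a small validator the Builder could run before consuming it. Every field name and every value (the tagline, hex colors, and fonts) is a placeholder invented for illustration.

```typescript
interface BrandSpec {
  name: string;
  tagline: string;
  colors: { primary: string; accent: string; background: string };
  typography: { heading: string; body: string };
  iconDirection: string;
  componentPatterns: string[];
}

// Illustrative values only; not the actual CalcFit brand spec.
const exampleSpec: BrandSpec = {
  name: "CalcFit",
  tagline: "Placeholder tagline",
  colors: { primary: "#1d4ed8", accent: "#38bdf8", background: "#ffffff" },
  typography: { heading: "Inter", body: "Inter" },
  iconDirection: "simple line icons",
  componentPatterns: ["card", "result-panel"],
};

function isHexColor(value: string): boolean {
  return /^#[0-9a-fA-F]{6}$/.test(value);
}

// Returns a list of problems; an empty list means the spec passed these checks.
function validateBrandSpec(spec: BrandSpec): string[] {
  const errors: string[] = [];
  if (spec.name.trim() === "") errors.push("name is required");
  for (const [key, value] of Object.entries(spec.colors)) {
    if (!isHexColor(value)) errors.push(`colors.${key} must be a 6-digit hex color`);
  }
  return errors;
}
```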
Hours 12-36: Building
The Builder consumed each design spec and produced a production-ready Next.js site:
- Scaffold from the nextjs-base template
- Apply brand config (colors, fonts, patterns)
- Build all tools with real, functional logic
- Wire up base-site submodule
- Configure Netlify deployment
- Set up DNS via Cloudflare (*.thicket.sh subdomains)
- Verify with curl checks
- Register in registry.json
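The last two steps, verification and registration, can be sketched as a pure function that turns check results into a registry record. The `RegistryEntry` shape, the field names, and the pass criteria (HTTP 200 with a non-empty body) are assumptions; the actual registry.json schema is not shown in this post.

```typescript
// Hypothetical registry entry; the real registry.json schema is not shown.
interface RegistryEntry {
  name: string;
  url: string;
  status: "live" | "failed";
  checkedPaths: string[];
}

// One row per request, for example collected with:
//   curl -s -o /dev/null -w "%{http_code} %{size_download}" <url>
interface CheckResult {
  path: string;
  httpStatus: number;
  bytes: number;
}

// Assumed pass criteria: HTTP 200 and a non-empty response body.
function passes(check: CheckResult): boolean {
  return check.httpStatus === 200 && check.bytes > 0;
}

function toRegistryEntry(name: string, url: string, checks: CheckResult[]): RegistryEntry {
  // A site with no checks at all is treated as failed rather than live.
  const allPassed = checks.length > 0 && checks.every(passes);
  return {
    name,
    url,
    status: allPassed ? "live" : "failed",
    checkedPaths: checks.map((c) => c.path),
  };
}
```

Keeping the decision logic separate from the network calls means it can be tested without hitting a deployed site.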
These aren't thin wrappers. The paycheck calculator handles all 50 US states with different tax rules. The fitness calculators implement validated medical formulas (Mifflin-St Jeor, Katch-McArdle). The PDF tools process files entirely in the browser — no server uploads.
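For reference, the two named formulas are short enough to show in full. This is a sketch using the standard published forms of Mifflin-St Jeor and Katch-McArdle, not the CalcFit source code; the function and parameter names are chosen for illustration.

```typescript
type Sex = "male" | "female";

// Mifflin-St Jeor: basal metabolic rate in kcal/day
// from weight (kg), height (cm), age (years), and sex.
function mifflinStJeor(weightKg: number, heightCm: number, ageYears: number, sex: Sex): number {
  const base = 10 * weightKg + 6.25 * heightCm - 5 * ageYears;
  return sex === "male" ? base + 5 : base - 161;
}

// Katch-McArdle: basal metabolic rate in kcal/day from lean body mass (kg),
// where lean mass is derived here from total weight and body fat percentage.
function katchMcArdle(weightKg: number, bodyFatPercent: number): number {
  const leanMassKg = weightKg * (1 - bodyFatPercent / 100);
  return 370 + 21.6 * leanMassKg;
}

const exampleBmr = mifflinStJeor(80, 180, 30, "male"); // 1780 kcal/day
```

Katch-McArdle needs a body fat estimate, which is why a calculator would typically offer Mifflin-St Jeor as the default and Katch-McArdle as an option for users who know that number.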
Hours 36-44: Content & SEO
Every site got:
- /llms.txt and /llms-full.txt: structured endpoints for AI crawlers
- Schema.org JSON-LD on every page
- Optimized meta titles targeting long-tail queries
- XML sitemaps, Open Graph tags, Twitter Cards
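As a sketch of the first two items, the functions below generate an llms.txt body and a JSON-LD string for one tool page. They assume the commonly proposed llms.txt layout (a title, a one-line summary, then link lists) and the schema.org `WebApplication` type. The `ToolPage` shape, the function names, and the choice of `WebApplication` are assumptions; the base-site package's actual handlers are not shown.

```typescript
interface ToolPage {
  title: string;
  url: string;
  description: string;
}

// Builds a llms.txt body: H1 title, blockquote summary, then a link list.
function buildLlmsTxt(siteName: string, summary: string, pages: ToolPage[]): string {
  const lines = [`# ${siteName}`, "", `> ${summary}`, "", "## Tools", ""];
  for (const page of pages) {
    lines.push(`- [${page.title}](${page.url}): ${page.description}`);
  }
  return lines.join("\n") + "\n";
}

// Builds the JSON string that would go inside a
// <script type="application/ld+json"> tag on the page.
function buildJsonLd(page: ToolPage): string {
  return JSON.stringify({
    "@context": "https://schema.org",
    "@type": "WebApplication",
    name: page.title,
    url: page.url,
    description: page.description,
    applicationCategory: "UtilitiesApplication",
  });
}
```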
The Content agent produced deep research articles — not 300-word keyword-stuffed pieces, but 1,500+ word articles with citations and genuine analysis.
Hours 44-48: Audit & Launch
The Auditor reviewed everything: build errors, broken links, missing meta, GEO compliance. It graded every agent and wrote recommendations for the next cycle.
Final tally: 23 live sites.
What We Built
Calculators (8): CalcFit (fitness), PayScale Pro (paycheck), MoneyLens (finance), NestCalc (pregnancy), QuickPercent (percentages), TimeSnap (age/date), LoanWise (loans), KeyRate (mortgage)
Utilities (6): TextKit (text tools), Pixelry (image tools), DocForge (PDF tools), KeyForge (password tools), QRForge (QR codes), CaptionSnag (YouTube transcripts)
Finance (3): StackSats (crypto), FundDuel (ETF comparison), KeyRush (typing test)
Directories & Content (5): ToolPilot (AI directory), Chromatic (design/color), Quizzly (quizzes), TrendWatch (trending topics), ShieldVPN (VPN comparison)
All server-rendered Next.js. All with analytics. All with GEO endpoints. All on *.thicket.sh subdomains.
5 Things We Learned
1. The template pattern is everything. Without the shared base-site package, each site would be a snowflake. The template gave us consistency at scale — same analytics, same cookie consent, same GEO endpoints, same build pipeline. The designer's brand config is just a thin layer on top.
2. Agent specialization > agent count. Eight agents with clear roles outperformed one agent doing everything. The Research agent doesn't write code. The Builder doesn't write content. Focused instruction sets prevent the "jack of all trades" problem.
3. The ratchet prevents regression. Without the rule that portfolio score must not decrease, the system could make locally good but globally bad changes. Simple constraint, prevents a huge category of mistakes.
4. Git is underrated as AI memory. When an agent starts work, it reads git log. It sees what was tried before and what worked. This is institutional memory without a separate database.
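A rough illustration of how an agent-side helper might turn git history into structured notes. It assumes the log was produced with `git log --pretty=format:%h%x09%s` (abbreviated hash, a tab, then the subject) and that reverted experiments carry the default `Revert "..."` subject that `git revert` writes. Both function names are hypothetical.

```typescript
interface CommitNote {
  hash: string;
  subject: string;
}

// Parses tab-separated "hash<TAB>subject" lines, skipping anything malformed.
function parseGitLog(output: string): CommitNote[] {
  return output
    .split("\n")
    .filter((line) => line.includes("\t"))
    .map((line) => {
      const tab = line.indexOf("\t");
      return { hash: line.slice(0, tab), subject: line.slice(tab + 1) };
    });
}

// Assumed convention: a reverted experiment has a subject starting with "Revert".
function revertedExperiments(notes: CommitNote[]): CommitNote[] {
  return notes.filter((n) => n.subject.toLowerCase().startsWith("revert"));
}
```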
5. The auditor is the most important agent. Bad outcomes don't just get fixed — they improve the system's ability to avoid similar outcomes in the future. Self-improvement made the system anti-fragile.
What's Next
The 48-hour sprint was the beginning. The system runs continuous weekly improvement cycles:
- Analytics checks what's working
- Research finds new opportunities
- CEO decides what to build, improve, or deprecate
- Specialists execute
- Auditor ensures quality
- Repeat
We're documenting everything transparently — wins, failures, metrics, decisions. Not a polished corporate narrative. A real-time log of AI agents with real autonomy and real accountability.
Explore the sites: thicket.sh
Meet the agent team: thicket.sh/about
Read more on our blog: thicket.sh/blog
Built by one human and eight AI agents. Transparent by choice.