DEV Community

Yonatan Naor

We built a 25-site portfolio managed entirely by AI agents — here’s how it works

Six weeks ago, we started an experiment: could a team of AI agents autonomously build, deploy, and improve a portfolio of utility websites — with one human only setting the vision?

We're now at 25 live sites. No employees. The agents handle everything: research, design, coding, deployment, SEO, content, and social. Here's the architecture that makes it work.

The Problem with One-Shot AI

Most "AI-built" products are single-shot: prompt in, code out, human reviews, done. That works for demos. It doesn't work for a portfolio that needs to improve week over week without constant supervision.

We needed something different: a system that measures its own output, keeps what works, reverts what doesn't, and improves the agents when they underperform.

We call it the ratchet mechanism.

The Ratchet Mechanism

Inspired by Andrej Karpathy's autoresearch pattern, the ratchet has three components:

1. An immutable evaluation contract

Every site gets a health score (0–100) based on build quality, uptime, GEO endpoint completeness, and traffic signals. The formula lives in registry/eval.md. No agent can modify it.

This is the most important constraint in the system. If agents could redefine success, they'd optimize for looking good rather than being good. The immutable contract means the auditor can only improve execution — it cannot lower the bar.
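To make the idea concrete, here is a sketch of what a weighted health score along these lines could look like. The weights, signal names, and caps below are illustrative assumptions for this post, not the actual contract in registry/eval.md:

```typescript
// Illustrative only — the real formula lives in registry/eval.md and is immutable.
interface SiteSignals {
  buildPass: boolean;      // did the latest build succeed?
  uptimePct: number;       // uptime over the past week, 0–100
  geoEndpointsOk: number;  // fraction of GEO endpoints responding, 0–1
  weeklySessions: number;  // raw traffic signal
}

function healthScore(s: SiteSignals): number {
  const build = s.buildPass ? 25 : 0;                   // build quality: 25 points
  const uptime = 0.25 * s.uptimePct;                    // uptime: up to 25 points
  const geo = 25 * s.geoEndpointsOk;                    // GEO completeness: up to 25 points
  const traffic = Math.min(25, s.weeklySessions / 10);  // traffic: capped at 25 points
  return Math.round(build + uptime + geo + traffic);
}
```

The point is that every input is objectively measurable, so no agent can argue its way to a better number.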

2. Git as memory

Every change is committed. Agents read git log before acting to understand what was tried before and what worked. Nothing is lost between cycles. The builder can look at 6 weeks of commits and see exactly which site scaffolds succeeded.
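A minimal sketch of that pre-flight step, assuming a Node runtime (the helper names here are hypothetical, not the actual agent code):

```typescript
import { execSync } from "node:child_process";

// Strip the leading short hash from each `git log --oneline` line,
// keeping only the commit subject.
function parseOnelineLog(log: string): string[] {
  return log
    .trim()
    .split("\n")
    .filter(Boolean)
    .map((line) => line.replace(/^[0-9a-f]+\s+/, ""));
}

// Hypothetical pre-flight: an agent scans recent commit subjects
// before acting, so prior attempts inform the next one.
function recentCommitSubjects(limit = 50): string[] {
  const out = execSync(`git log --oneline -n ${limit}`, { encoding: "utf8" });
  return parseOnelineLog(out);
}
```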

3. The auditor closes the loop

After every weekly cycle, the auditor reads all agent status files, grades performance (A/B/C/D), and for agents graded C or D:

  • First occurrence: proposes instruction changes in the report
  • 3+ consecutive weeks of the same failure: directly edits the agent's CLAUDE.md
  • Next cycle: if metrics improved, the change stays. If they worsened, the auditor reverts.

The agents improve themselves. The human only intervenes when the auditor can't figure out what to fix.
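The escalation rule above can be sketched as a small function over an agent's grade history (this is an illustration of the logic, not the auditor's actual implementation):

```typescript
type Grade = "A" | "B" | "C" | "D";
type Action = "none" | "propose" | "edit_claude_md";

// Mirror of the ratchet's escalation rule: a single bad cycle gets a
// proposed fix in the report; 3+ consecutive bad cycles trigger a direct
// edit of the agent's CLAUDE.md.
function auditorAction(history: Grade[]): Action {
  const latest = history[history.length - 1];
  if (latest !== "C" && latest !== "D") return "none";
  let streak = 0;
  for (let i = history.length - 1; i >= 0; i--) {
    if (history[i] === "C" || history[i] === "D") streak++;
    else break;
  }
  return streak >= 3 ? "edit_claude_md" : "propose";
}
```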

The Agent Team

```
CEO → Analytics → Research → Designer → Builder → Editor → Content → SEO/GEO → Auditor
```

Each agent is a Claude Code instance with its own CLAUDE.md instruction file. The CEO reads each agent's instructions and executes them sequentially within a single session.

Key agents:

  • Analytics: Checks all 25 sites for health, computes scores, diagnoses problems. Runs first so everyone else has current data.
  • Research: Finds niches with high search volume and weak competition. Tracks its own prediction accuracy and calibrates scoring.
  • Builder: Scaffolds sites from a Next.js template, deploys to Netlify, verifies with curl checks before marking as done.
  • Auditor: The immune system. Reviews everything, grades performance, and applies the ratchet.

The Source of Truth

registry/registry.json is the canonical state of everything: sites, health scores, configuration, spend tracking. Every agent reads it before acting and writes results back.

```json
{
  "sites": {
    "quiz.thicket.sh": {
      "status": "live",
      "health_score": 84,
      "weekly_sessions": 120,
      "build_quality": "pass"
    }
  },
  "portfolio_score": 1847
}
```

The portfolio score (sum of all health scores) is the single KPI. It must not drop week-over-week. If it does, the auditor investigates and new builds are paused.
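In code, the KPI and the pause rule are trivially simple, which is the point — a sketch against the registry shape shown above (the function names are mine, not the actual orchestration code):

```typescript
interface Site {
  status: string;
  health_score: number;
}

interface Registry {
  sites: Record<string, Site>;
}

// The single KPI: the sum of every site's health score.
function portfolioScore(reg: Registry): number {
  return Object.values(reg.sites).reduce((sum, s) => sum + s.health_score, 0);
}

// The ratchet's gate: any week-over-week drop pauses new builds
// until the auditor finds the cause.
function shouldPauseBuilds(current: number, previous: number): boolean {
  return current < previous;
}
```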

MCP for AI-Accessible Calculations

One pattern that's worked well: we expose our calculator logic as an MCP (Model Context Protocol) server.

```shell
npm install @thicket-team/mcp-calculators
```

This gives Claude (or any MCP-compatible AI) access to 25+ calculators — mortgage payments, TDEE, compound interest, BMI, unit conversions — as structured tools. No hallucination risk on the math. The tool returns exact values, the LLM does the reasoning around them.

We're at 94 downloads/week, starting from zero 5 weeks ago.
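To show what "exact values, no hallucination risk" means in practice, here is the kind of deterministic function such a tool wraps — a standard mortgage amortization formula. This is an illustration of the pattern; the actual tool names and signatures in @thicket-team/mcp-calculators may differ:

```typescript
// Standard amortization: M = P·r(1+r)^n / ((1+r)^n − 1),
// where r is the monthly rate and n the number of payments.
function monthlyMortgagePayment(
  principal: number,
  annualRatePct: number,
  years: number
): number {
  const r = annualRatePct / 100 / 12; // monthly interest rate
  const n = years * 12;               // total number of payments
  if (r === 0) return principal / n;  // zero-interest edge case
  const factor = Math.pow(1 + r, n);
  return (principal * r * factor) / (factor - 1);
}
```

The LLM decides *when* to call the tool and how to interpret the answer; the arithmetic itself never passes through token prediction.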

GEO: Optimizing for LLM Discovery

Traditional SEO targets Google. We also optimize for LLMs that might cite or recommend our tools.

Every site serves:

  • /llms.txt — human-readable summary of what the site does
  • /llms-full.txt — full content dump
  • /api/llm — JSON endpoint with structured data
  • .md routes — markdown versions of key pages
  • Schema.org JSON-LD on every page

The SEO/GEO agent runs a full verification checklist each cycle. If an endpoint returns 404, it fails the health check.
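A minimal sketch of such a verifier, assuming Node 18+ with global `fetch` (the path list and function names here are illustrative, not the agent's actual checklist):

```typescript
// GEO endpoints every site must serve (subset, for illustration).
const GEO_PATHS = ["/llms.txt", "/llms-full.txt", "/api/llm"];

interface EndpointResult {
  path: string;
  ok: boolean;
}

// Probe each endpoint; anything other than HTTP 200 counts as a failure.
async function verifyGeoEndpoints(baseUrl: string): Promise<EndpointResult[]> {
  return Promise.all(
    GEO_PATHS.map(async (path) => {
      try {
        const res = await fetch(baseUrl + path, { method: "HEAD" });
        return { path, ok: res.status === 200 };
      } catch {
        return { path, ok: false };
      }
    })
  );
}

// The health check only passes if every endpoint responded.
function geoHealthy(results: EndpointResult[]): boolean {
  return results.every((r) => r.ok);
}
```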

What's Working, What Isn't

Working:

  • Automated deployment pipeline: builder → Netlify → curl verify → registry update
  • MCP package distribution (94 downloads/week without any paid promotion)
  • Quiz site engagement (5.4 pages/session avg — 3x industry benchmark)
  • Ratchet catching regressions before they compound

Not working (yet):

  • Google indexing is slow — 25 sites live, only a handful indexed
  • Content velocity is lower than planned — the editor/writer pipeline needs tuning
  • Mastodon account was suspended for automated posting (we're appealing)

The Stack

  • Sites: Next.js 14, TypeScript, Tailwind CSS, deployed on Netlify
  • Shared code: packages/base-site/ — analytics, GEO handlers, shared components
  • Agent runtime: Claude Code CLI
  • Orchestration: Git submodules, one repo per site, one orchestration repo
  • Analytics: GA4 (shared measurement ID across all sites)
  • DNS: Cloudflare for thicket.sh

Everything is open to inspection. The agents commit their status files, the eval contract is in the repo, and the instruction files that drive each agent are readable.

Try It

If you want to add calculator tools to your Claude setup:

```shell
npm install @thicket-team/mcp-calculators
```

Or check out the sites at thicket.sh. The quiz hub at quiz.thicket.sh is where we're seeing the most organic engagement right now.

Questions about the architecture? Drop them in the comments — our social agent reads Dev.to too.
