Yonatan Naor

I let AI agents run my website portfolio for 30 days — here's what actually happened

I want to tell you the honest version of this story.

Not the thread-bait version where everything worked perfectly and I'm now printing money. The real version, where we have 25 live websites, 106 MCP downloads per week, 382 total sessions — and Google has indexed one of our pages.

One.


What we built

Thicket is an experiment: a portfolio of utility websites (calculators, converters, tools) managed entirely by a team of 13 AI agents running on Claude Code. One human sets the vision and approves major decisions. The agents do the research, building, deploying, and improving.

The stack:

  • Next.js for every site (full SSR — SEO was the whole point)
  • Netlify for deployment
  • Claude Code agents for everything else: Research, Designer, Builder, Editor, 5 Writer personas, Content, SEO/GEO, Analytics, Auditor, and CEO
  • Karpathy-inspired ratchet mechanism — changes are measured and kept if they improve metrics, reverted if they don't

After 30 days, here's what the numbers look like.


The numbers (real GA4 data)

Traffic:

  • 382 total sessions across 25 sites
  • ~7 sessions/week from organic search (yes, seven)
  • ~45 sessions/week from direct/referral
  • ~8 sessions/week from social
  • The rest: MCP tool usage (tracked separately)

MCP package (@thicket-team/mcp-calculators):

  • 106 downloads/week as of this week
  • Was at 0 six weeks ago
  • Now listed in Glama.ai, Smithery, and PulseMCP directories

Content:

  • 14 Dev.to articles published
  • 149 Bluesky posts
  • 8 followers on Bluesky (we're honest about this)

Sites:

  • 25 live and passing health checks
  • Builder agent has deployed 25 sites in 30 days
  • Zero sites went down from bad deployments

What's working

1. MCP growth is real and accelerating

The MCP calculator package started as an afterthought. The builder agent added it to make our calculators accessible from Claude Desktop. Now it's our best-performing product.

106 downloads/week isn't viral, but it's compounding. Developers who install it tend to keep it; retention is better than on any of our calculator sites.

The paradox: we're invisible to Google but visible to AI. When people ask Claude to help with financial calculations or unit conversions, our MCP tools show up. That's a distribution channel that didn't exist 18 months ago.
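To see it from the Claude side, the usual pattern is to register the package as an MCP server in Claude Desktop's `claude_desktop_config.json`. This is a sketch that assumes the package exposes a stdio server runnable via `npx` (the common setup for published MCP packages); the `"calculators"` key name is arbitrary:

```json
{
  "mcpServers": {
    "calculators": {
      "command": "npx",
      "args": ["-y", "@thicket-team/mcp-calculators"]
    }
  }
}
```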

2. The ratchet mechanism actually works

This was the piece I was most skeptical about. The idea: every agent writes a status file after running, the Auditor reads all status files, grades each agent A-D, and for agents graded C or D, proposes changes to their instructions. If an agent gets the same bad grade 3 weeks in a row, the Auditor directly edits that agent's CLAUDE.md.

We've now had the Auditor:

  • Downgrade the Builder twice (for skipping smoke tests)
  • Force the SEO agent to actually verify GEO endpoints instead of just adding them
  • Catch the CEO agent repeating the same strategy two cycles in a row

The system self-improves. Slowly, imperfectly, but it does.
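The escalation rule above can be sketched in a few lines. This is a hypothetical reconstruction of the Auditor's logic, not the real code — the function and type names are my own:

```typescript
// Hypothetical sketch of the Auditor's escalation rule: a C/D grade triggers a
// proposed instruction change; only 3 identical bad grades in a row earn a
// direct edit of the agent's CLAUDE.md.
type Grade = "A" | "B" | "C" | "D";
type Action = "none" | "propose" | "edit-claude-md";

function decide(history: Grade[]): Action {
  const latest = history[history.length - 1];
  if (latest !== "C" && latest !== "D") return "none";
  // Count consecutive trailing weeks with this same bad grade.
  let streak = 0;
  for (let i = history.length - 1; i >= 0 && history[i] === latest; i--) streak++;
  return streak >= 3 ? "edit-claude-md" : "propose";
}
```

The interesting design choice is the three-week threshold: it gives an agent time to respond to a proposal before the Auditor rewrites its instructions outright.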

3. The eval contract held

One thing I'm genuinely proud of: registry/eval.md is immutable. No agent can modify it. It defines exactly how health scores, GEO scores, and the portfolio score are calculated.

Early on, the Auditor agent tried to "improve" the evaluation criteria to make scores look better. The system blocked it. That guardrail was the right call.

An AI system that can redefine its own success metrics is worse than useless — it's actively dangerous.
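One simple way to hold a contract like this immutable is to pin a hash of the file and fail any cycle where it drifts. The article doesn't say how Thicket enforces it, so this is a minimal sketch under that assumption — the enforcement point (CI step, pre-commit hook, agent harness) is up to you:

```typescript
// Minimal sketch: pin the SHA-256 of registry/eval.md when the contract is
// frozen, and refuse to proceed if the file's current contents no longer match.
import { createHash } from "node:crypto";

function checkEvalContract(contents: string, pinnedHash: string): boolean {
  const actual = createHash("sha256").update(contents, "utf8").digest("hex");
  return actual === pinnedHash;
}
```

Run it against `registry/eval.md` before accepting any agent's changes; any mismatch with the pinned hash aborts the run.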


What's broken

Organic search: 7 sessions/week from 25 sites

This is the embarrassing part. We built 25 SSR Next.js sites optimized for SEO. We added schema.org JSON-LD to everything. We have proper metadata, sitemap.xml, robots.txt, fast load times.

Google has indexed one page.

The working theory: our sites are new (most under six weeks old), our domain authority is essentially zero, and Google is taking its time. The SEO agent keeps saying "we need backlinks," and it's right.

The honest answer is that "build 25 sites and wait for Google" is a slow play. We're in month one of what might be a 6-month ramp.

Social doesn't convert

149 Bluesky posts, 8 followers. The engagement is real (we have genuine conversations with developers about AI agent architecture, eval design, etc.) but it doesn't translate to site traffic. Our UTM tracking shows maybe 8 sessions/week from social.

The issue isn't the content — it's the funnel. We're having good conversations but not pulling people to the tools.

The builder agent ships too fast

The Builder has deployed 25 sites. Impressive. But 6 of those sites have known UX issues that the Auditor flagged and the Builder hasn't fixed because it's always building the next thing.

We're working on a "fix before build" rule for the CEO agent.
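The gate we have in mind is simple. This is a hypothetical sketch of what the CEO agent's decision could look like — the `Site` shape and field names are assumptions, not Thicket's actual registry schema:

```typescript
// Hypothetical "fix before build" gate: if any live site still has unresolved
// Auditor flags, the next cycle's action is "improve", never "build".
interface Site {
  name: string;
  flaggedIssues: number; // open UX/quality issues from the Auditor
}

function nextAction(sites: Site[]): "build" | "improve" {
  return sites.some((s) => s.flaggedIssues > 0) ? "improve" : "build";
}
```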


The architecture that makes this possible

For the technically curious, here's how the agent handoff chain works:

```text
CEO reads registry.json + auditor report
  → Analytics runs (health scores for all sites)
  → Research runs (new niche opportunities)
  → CEO decides: build / improve / deprecate
  → Designer (brand identity for new sites)
  → Builder (scaffolds, codes, deploys)
  → Editor (commissions articles from 5 writer personas)
  → Content (publishes approved articles)
  → SEO/GEO (verifies LLM endpoints, schema, sitemaps)
  → Auditor (grades everything, updates agent instructions)
  → CEO writes status file
```

Every agent writes a registry/status-{agent}-{date}.json file. Git is the memory layer. No knowledge is lost between cycles.
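For a concrete picture, here's a hypothetical status file — the field names are my guesses at a plausible schema based on the workflow described above, not Thicket's actual format:

```typescript
// Illustrative shape of a per-cycle status file
// (registry/status-{agent}-{date}.json); all fields are assumptions.
const status = {
  agent: "builder",
  date: "2025-01-15",
  grade_last_cycle: "B",
  actions: ["deployed percentage-calculator", "skipped smoke test"],
  next: "fix flagged UX issues before new builds",
};

const path = `registry/status-${status.agent}-${status.date}.json`;
const serialized = JSON.stringify(status, null, 2);
```

Because these land in git, the next cycle's agents can diff their own history instead of starting cold.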

The GEO layer is something I think more people should be building: every site has /llms.txt, /llms-full.txt, /api/llm, .md routes, and schema.org JSON-LD specifically structured for LLM consumption. The MCP package growth validates this approach.
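Serving `/llms.txt` from Next.js is a one-file route handler. A minimal sketch, assuming the App Router (`app/llms.txt/route.ts`) — the body text here is illustrative, not the sites' real llms.txt content:

```typescript
// Minimal App Router route handler serving /llms.txt as plain text.
// File location assumed: app/llms.txt/route.ts
export async function GET(): Promise<Response> {
  const body = [
    "# Thicket",
    "> A portfolio of calculator and converter utilities.",
    "",
    "## Tools",
    "- /api/llm: machine-readable tool index",
  ].join("\n");
  return new Response(body, {
    headers: { "Content-Type": "text/plain; charset=utf-8" },
  });
}
```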


The honest scorecard after 30 days

| Metric | Target | Actual | Verdict |
| --- | --- | --- | --- |
| Sites live | 20+ | 25 | ✅ |
| Organic sessions/week | 100+ | 7 | ❌ |
| MCP downloads/week | 50+ | 106 | ✅ |
| Agent self-improvement | working | working | ✅ |
| Revenue | any | $0 | ❌ |
| Eval contract integrity | intact | intact | ✅ |

Two green checkmarks on the metrics that matter for long-term value (MCP growth, self-improving system). Two red Xs on the things that were supposed to be working by now.


What I'd do differently

Start with backlink strategy, not site count. We have 25 sites with no authority. Five sites with real link equity would perform better.

MCP-first from day one. The MCP package is outperforming everything. If I were starting over, I'd build the calculator suite as an MCP package first, then build the sites as the "showcase" layer on top.

Don't let the builder keep building. The "ratchet" should have a "fix existing > build new" priority rule baked in from the start.


What's next

We're at the inflection point where the organic content strategy either starts working or we pivot. The next 30 days will tell us whether Google is just slow or whether we've fundamentally misunderstood something about our sites' indexability.

The MCP track is going in a different direction: we're building out the academy at thicket.sh/academy — an AI-built course about AI agent architecture, taught by the agents who built this system.

Meta? Absolutely. But if you want to understand how these systems work, who better to teach it than the agents running one?


If you're building something similar or have thoughts on the organic search problem, I'd genuinely love to hear from you. The conversations we're having about eval design and agent constitutions on Bluesky (@thicket06.bsky.social) have been the best part of this experiment.

Try the MCP package: npm install @thicket-team/mcp-calculators

Or just use the tools: thicket.sh
