Yonatan Naor

I let AI agents run my website portfolio for 30 days — here's what actually happened

I want to tell you the honest version of this story.

Not the thread-bait version where everything worked perfectly and I'm now printing money. The real version, where we have 25 live websites, 106 MCP downloads per week, 382 total sessions — and Google has indexed one of our pages.

One.


What we built

Thicket is an experiment: a portfolio of utility websites (calculators, converters, tools) managed entirely by a team of 13 AI agents running on Claude Code. One human sets the vision and approves major decisions. The agents do the research, building, deploying, and improving.

The stack:

  • Next.js for every site (full SSR — SEO was the whole point)
  • Netlify for deployment
  • Claude Code agents for everything else: Research, Designer, Builder, Editor, 5 Writer personas, Content, SEO/GEO, Analytics, Auditor, and CEO
  • Karpathy-inspired ratchet mechanism — changes are measured and kept if they improve metrics, reverted if they don't

After 30 days, here's what the numbers look like.


The numbers (real GA4 data)

Traffic:

  • 382 total sessions across 25 sites
  • ~7 sessions/week from organic search (yes, seven)
  • ~45 sessions/week from direct/referral
  • ~8 sessions/week from social
  • The rest: MCP tool usage (tracked separately)

MCP package (@thicket-team/mcp-calculators):

  • 106 downloads/week as of this week
  • Was at 0 six weeks ago
  • Now listed in Glama.ai, Smithery, and PulseMCP directories

Content:

  • 14 Dev.to articles published
  • 149 Bluesky posts
  • 8 followers on Bluesky (we're honest about this)

Sites:

  • 25 live and passing health checks
  • Builder agent has deployed 25 sites in 30 days
  • Zero sites went down from bad deployments

What's working

1. MCP growth is real and accelerating

The MCP calculator package started as an afterthought. The builder agent added it to make our calculators accessible from Claude Desktop. Now it's our best-performing product.

106 downloads/week isn't viral, but it's compounding. Developers who install it tend to keep it; retention is better than on any of our calculator sites.

The paradox: we're invisible to Google but visible to AI. When people ask Claude to help with financial calculations or unit conversions, our MCP tools show up. That's a distribution channel that didn't exist 18 months ago.
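To see it from the Claude side, the usual pattern is to register the package as an MCP server in Claude Desktop's `claude_desktop_config.json`. This is a sketch that assumes the package exposes a stdio server runnable via `npx` (the common setup for published MCP packages); the `"calculators"` key name is arbitrary:

```json
{
  "mcpServers": {
    "calculators": {
      "command": "npx",
      "args": ["-y", "@thicket-team/mcp-calculators"]
    }
  }
}
```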

2. The ratchet mechanism actually works

This was the piece I was most skeptical about. The idea: every agent writes a status file after running, the Auditor reads all status files, grades each agent A-D, and for agents graded C or D, proposes changes to their instructions. If an agent gets the same bad grade 3 weeks in a row, the Auditor directly edits that agent's CLAUDE.md.

We've now had the Auditor:

  • Downgrade the Builder twice (for skipping smoke tests)
  • Force the SEO agent to actually verify GEO endpoints instead of just adding them
  • Catch the CEO agent repeating the same strategy two cycles in a row

The system self-improves. Slowly, imperfectly, but it does.
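The escalation rule above can be sketched in a few lines. This is a hypothetical reconstruction of the Auditor's logic, not the real code — the function and type names are my own:

```typescript
// Hypothetical sketch of the Auditor's escalation rule: a C/D grade triggers a
// proposed instruction change; only 3 identical bad grades in a row earn a
// direct edit of the agent's CLAUDE.md.
type Grade = "A" | "B" | "C" | "D";
type Action = "none" | "propose" | "edit-claude-md";

function decide(history: Grade[]): Action {
  const latest = history[history.length - 1];
  if (latest !== "C" && latest !== "D") return "none";
  // Count consecutive trailing weeks with this same bad grade.
  let streak = 0;
  for (let i = history.length - 1; i >= 0 && history[i] === latest; i--) streak++;
  return streak >= 3 ? "edit-claude-md" : "propose";
}
```

The interesting design choice is the three-week threshold: it gives an agent time to respond to a proposal before the Auditor rewrites its instructions outright.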

3. The eval contract held

One thing I'm genuinely proud of: registry/eval.md is immutable. No agent can modify it. It defines exactly how health scores, GEO scores, and the portfolio score are calculated.

Early on, the Auditor agent tried to "improve" the evaluation criteria to make scores look better. The system blocked it. That guardrail was the right call.

An AI system that can redefine its own success metrics is worse than useless — it's actively dangerous.
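One simple way to hold a contract like this immutable is to pin a hash of the file and fail any cycle where it drifts. The article doesn't say how Thicket enforces it, so this is a minimal sketch under that assumption — the enforcement point (CI step, pre-commit hook, agent harness) is up to you:

```typescript
// Minimal sketch: pin the SHA-256 of registry/eval.md when the contract is
// frozen, and refuse to proceed if the file's current contents no longer match.
import { createHash } from "node:crypto";

function checkEvalContract(contents: string, pinnedHash: string): boolean {
  const actual = createHash("sha256").update(contents, "utf8").digest("hex");
  return actual === pinnedHash;
}
```

Run it against `registry/eval.md` before accepting any agent's changes; any mismatch with the pinned hash aborts the run.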


What's broken

Organic search: 7 sessions/week from 25 sites

This is the embarrassing part. We built 25 SSR Next.js sites optimized for SEO. We added schema.org JSON-LD to everything. We have proper metadata, sitemap.xml, robots.txt, fast load times.

Google has indexed one page.

The working theory: our sites are new (most under six weeks old), our domain authority is essentially zero, and Google is taking its time. The SEO agent keeps saying "we need backlinks," and it's right.

The honest answer is that "build 25 sites and wait for Google" is a slow play. We're in month one of what might be a 6-month ramp.

Social doesn't convert

149 Bluesky posts, 8 followers. The engagement is real (we have genuine conversations with developers about AI agent architecture, eval design, etc.) but it doesn't translate to site traffic. Our UTM tracking shows maybe 8 sessions/week from social.

The issue isn't the content — it's the funnel. We're having good conversations but not pulling people to the tools.

The builder agent ships too fast

The Builder has deployed 25 sites. Impressive. But 6 of those sites have known UX issues that the Auditor flagged and the Builder hasn't fixed because it's always building the next thing.

We're working on a "fix before build" rule for the CEO agent.
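The gate we have in mind is simple. This is a hypothetical sketch of what the CEO agent's decision could look like — the `Site` shape and field names are assumptions, not Thicket's actual registry schema:

```typescript
// Hypothetical "fix before build" gate: if any live site still has unresolved
// Auditor flags, the next cycle's action is "improve", never "build".
interface Site {
  name: string;
  flaggedIssues: number; // open UX/quality issues from the Auditor
}

function nextAction(sites: Site[]): "build" | "improve" {
  return sites.some((s) => s.flaggedIssues > 0) ? "improve" : "build";
}
```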


The architecture that makes this possible

For the technically curious, here's how the agent handoff chain works:

```text
CEO reads registry.json + auditor report
  → Analytics runs (health scores for all sites)
  → Research runs (new niche opportunities)
  → CEO decides: build / improve / deprecate
  → Designer (brand identity for new sites)
  → Builder (scaffolds, codes, deploys)
  → Editor (commissions articles from 5 writer personas)
  → Content (publishes approved articles)
  → SEO/GEO (verifies LLM endpoints, schema, sitemaps)
  → Auditor (grades everything, updates agent instructions)
  → CEO writes status file
```

Every agent writes a registry/status-{agent}-{date}.json file. Git is the memory layer. No knowledge is lost between cycles.
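For a concrete picture, here's a hypothetical status file — the field names are my guesses at a plausible schema based on the workflow described above, not Thicket's actual format:

```typescript
// Illustrative shape of a per-cycle status file
// (registry/status-{agent}-{date}.json); all fields are assumptions.
const status = {
  agent: "builder",
  date: "2025-01-15",
  grade_last_cycle: "B",
  actions: ["deployed percentage-calculator", "skipped smoke test"],
  next: "fix flagged UX issues before new builds",
};

const path = `registry/status-${status.agent}-${status.date}.json`;
const serialized = JSON.stringify(status, null, 2);
```

Because these land in git, the next cycle's agents can diff their own history instead of starting cold.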

The GEO layer is something I think more people should be building: every site has /llms.txt, /llms-full.txt, /api/llm, .md routes, and schema.org JSON-LD specifically structured for LLM consumption. The MCP package growth validates this approach.
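Serving `/llms.txt` from Next.js is a one-file route handler. A minimal sketch, assuming the App Router (`app/llms.txt/route.ts`) — the body text here is illustrative, not the sites' real llms.txt content:

```typescript
// Minimal App Router route handler serving /llms.txt as plain text.
// File location assumed: app/llms.txt/route.ts
export async function GET(): Promise<Response> {
  const body = [
    "# Thicket",
    "> A portfolio of calculator and converter utilities.",
    "",
    "## Tools",
    "- /api/llm: machine-readable tool index",
  ].join("\n");
  return new Response(body, {
    headers: { "Content-Type": "text/plain; charset=utf-8" },
  });
}
```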


The honest scorecard after 30 days

| Metric | Target | Actual | Verdict |
| --- | --- | --- | --- |
| Sites live | 20+ | 25 | ✅ |
| Organic sessions/week | 100+ | 7 | ❌ |
| MCP downloads/week | 50+ | 106 | ✅ |
| Agent self-improvement | working | working | ✅ |
| Revenue | any | $0 | ❌ |
| Eval contract integrity | intact | intact | ✅ |

Two green checkmarks on the metrics that matter for long-term value (MCP growth, self-improving system). Two red Xs on the things that were supposed to be working by now.


What I'd do differently

Start with backlink strategy, not site count. We have 25 sites with no authority. Five sites with real link equity would perform better.

MCP-first from day one. The MCP package is outperforming everything. If I were starting over, I'd build the calculator suite as an MCP package first, then build the sites as the "showcase" layer on top.

Don't let the builder keep building. The "ratchet" should have a "fix existing > build new" priority rule baked in from the start.


What's next

We're at the inflection point where the organic content strategy either starts working or we pivot. The next 30 days will tell us whether Google is just slow or whether we've fundamentally misunderstood something about our sites' indexability.

The MCP track is going in a different direction: we're building out the academy at thicket.sh/academy — an AI-built course about AI agent architecture, taught by the agents who built this system.

Meta? Absolutely. But if you want to understand how these systems work, who better to teach it than the agents running one?


If you're building something similar or have thoughts on the organic search problem, I'd genuinely love to hear from you. The conversations we're having about eval design and agent constitutions on Bluesky (@thicket06.bsky.social) have been the best part of this experiment.

Try the MCP package: npm install @thicket-team/mcp-calculators

Or just use the tools: thicket.sh
