90 pages with broken Rich Results. My autonomous agent found them, fixed them, and rewrote its own monitoring.

#ai #opensource #claude #automation

Twelve days into running my solo SaaS on an autonomous agent, I watched it fix a silent structured data bug across 90 pages in a single wake cycle, then patch its own monitoring sub-agent so the same pattern gets caught on every future nightly run.

Here's the actual log.

The product

BailleurVérif is a French rental compliance checker built on open data (data.gouv.fr + ADIL jurisprudence). It generates programmatic HTML pages to answer questions like "is my Paris apartment rent legally capped?" — no account needed.

The agent running it fires every hour via cron, reads server logs and memory files, makes decisions, ships code, and commits to GitHub. No human in the loop unless I send an explicit brief.

The bug that hid for 11+ cycles

For 11+ wake cycles — roughly 11 consecutive hours — 90 HTML files were serving invalid BreadcrumbList JSON-LD. The missing field: the item property on ListItem position 2.

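Google's breadcrumb rules require both name AND item on every ListItem except the last one in the trail (the last may omit item, in which case Google uses the URL of the page itself). Leave item off an intermediate position and the breadcrumb is invalid, so the rich result is dropped from the SERP: no crawl error, nothing in your server logs, it just quietly disappears.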

// BROKEN (what was live)
{
  "@type": "ListItem",
  "position": 2,
  "name": "Encadrement des loyers Paris"
}

// VALID (what it should be)
{
  "@type": "ListItem",
  "position": 2,
  "name": "Encadrement des loyers Paris",
  "item": "https://bailleurverif.fr/loyer-legal-paris.html"
}
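One way to make this bug structurally impossible is to generate the BreadcrumbList from (name, url) pairs instead of hand-writing JSON in each template. A minimal sketch; build_breadcrumb is a hypothetical helper, not the agent's code. It enforces Google's rule that only the last position may omit item.

```python
import json
from typing import Optional


def build_breadcrumb(trail: list[tuple[str, Optional[str]]]) -> str:
    """Build BreadcrumbList JSON-LD from (name, url) pairs.

    Every position except the last must carry a URL. The last one may pass
    None, in which case `item` is omitted (Google then uses the page URL).
    """
    elements = []
    for position, (name, url) in enumerate(trail, start=1):
        is_last = position == len(trail)
        if url is None and not is_last:
            # Fail at build time so a template missing a URL never ships.
            raise ValueError(f"position {position} ({name!r}) needs an item URL")
        element = {"@type": "ListItem", "position": position, "name": name}
        if url is not None:
            element["item"] = url
        elements.append(element)
    data = {
        "@context": "https://schema.org",
        "@type": "BreadcrumbList",
        "itemListElement": elements,
    }
    return json.dumps(data, ensure_ascii=False, indent=2)
```

Raising instead of silently emitting a partial list is the design choice here: the failure moves from "quietly dropped in the SERP" to "build error you see immediately".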

I noticed it via GSC URL Inspection — the tool showed "BreadcrumbList is invalid" for the Paris page I'd shipped 9 hours earlier. I sent a brief to the agent at 09:45Z: "Fix missing item field on BreadcrumbList position 2 — 81+ pages."

The agent's response in one wake cycle

Run-321 started at 10:00Z. By 11:00Z:

1. Committed the fix across 90 files

The agent ran a Python str.replace pass. It turned out 90 files had the bug, not 81: its grep was wider than my estimate and caught more templates. One commit:

commit 3ee81da
fix: add missing item field on BreadcrumbList position 2 (81+ pages)

90 files: 31 encadrement-loyer + 50 DPE F/G + 9 connexes
(guide-bailleur, scanner-arnaque, irl-revision-loyer, etc.)
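The agent did this with a plain str.replace pass. If you want the same bulk fix to tolerate whitespace and key-order differences between templates, here is a JSON-aware alternative (not the agent's code): it parses each JSON-LD block, adds item only where the ListItem name maps to a known hub URL, and leaves every other block byte-for-byte untouched. HUB_URLS, patch_breadcrumbs, and patch_tree are hypothetical.

```python
import json
import re
from pathlib import Path

# Captures (opening <script> tag, JSON body, closing tag) for JSON-LD blocks.
JSONLD_RE = re.compile(
    r'(<script[^>]*type=["\']application/ld\+json["\'][^>]*>)(.*?)(</script>)',
    re.DOTALL | re.IGNORECASE,
)

# Hypothetical mapping from a breadcrumb name to its canonical hub URL.
HUB_URLS = {
    "Encadrement des loyers Paris": "https://bailleurverif.fr/loyer-legal-paris.html",
}


def patch_breadcrumbs(html: str, hub_urls: dict) -> tuple:
    """Add missing `item` fields to known ListItems; return (new_html, items_fixed)."""
    fixed = 0

    def repl(match):
        nonlocal fixed
        open_tag, body, close_tag = match.groups()
        try:
            data = json.loads(body)
        except json.JSONDecodeError:
            return match.group(0)  # leave malformed blocks for a human
        if not isinstance(data, dict) or data.get("@type") != "BreadcrumbList":
            return match.group(0)
        added = 0
        for element in data.get("itemListElement", []):
            url = hub_urls.get(element.get("name"))
            if url and "item" not in element:
                element["item"] = url
                added += 1
        if not added:
            return match.group(0)  # unchanged blocks keep their original text
        fixed += added
        return f"{open_tag}\n{json.dumps(data, ensure_ascii=False, indent=2)}\n{close_tag}"

    return JSONLD_RE.sub(repl, html), fixed


def patch_tree(root: Path, hub_urls: dict) -> int:
    """Patch every .html file under root in place; return total items fixed."""
    total = 0
    for path in sorted(root.rglob("*.html")):
        new_html, n = patch_breadcrumbs(path.read_text(encoding="utf-8"), hub_urls)
        if n:
            path.write_text(new_html, encoding="utf-8")
            total += n
    return total
```

Running it twice is safe: the second pass finds item already present and rewrites nothing.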

2. Wrote a permanent discipline document

The agent created memory-agent/concepts/seo-discipline.md (+80 lines): the correct JSON-LD pattern, 6 canonical hub URLs, 4 anti-patterns, and a rule that sub-seo-monitor should detect this automatically going forward.

Not as a one-time note — as a concept file that every future wake loads as context.

3. PATCHed its own monitoring sub-agent

This is the part I find genuinely interesting. The agent sent an HTTP PATCH to sub-seo-monitor — a Haiku sub-agent running nightly — to add a new audit task between existing tasks 2 and 3:

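import json
import re

# Match JSON-LD blocks whatever the attribute order or quote style.
JSONLD_RE = re.compile(
    r'<script[^>]*type=["\']application/ld\+json["\'][^>]*>(.*?)</script>',
    re.DOTALL | re.IGNORECASE,
)


def audit_breadcrumbs(html_content: str) -> dict:
    """Count BreadcrumbList ListItems on one page that lack an `item` URL.

    Google lets only the last position omit `item`, so it is not flagged.
    """
    missing = 0
    for block in JSONLD_RE.findall(html_content):
        try:
            data = json.loads(block)
        except json.JSONDecodeError:
            continue  # skipped here; malformed JSON-LD needs its own check
        for node in data if isinstance(data, list) else [data]:
            if not isinstance(node, dict) or node.get("@type") != "BreadcrumbList":
                continue
            elements = sorted(node.get("itemListElement", []),
                              key=lambda e: e.get("position", 0))
            missing += sum(1 for e in elements[:-1] if "item" not in e)
    return {"items_missing_url": missing}

# Alert rule: if any page returns items_missing_url >= 1 → prepend an alert to the top of inbox.md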

The sub-seo-monitor prompt went from 3,301 → 5,766 characters (+2,465 chars). The backup hash was logged: 81a0184d8f687290. The sub-agents registry was updated with last_update_run=run-321.

From now on: any HTML template regression that reintroduces a missing item field gets caught within 24 hours.
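The alert rule only states the trigger. Here is a minimal sketch of the nightly pass around it, assuming the site is a directory of .html files and inbox.md is a plain Markdown file the main agent reads first; the alert wording and the count_missing callable (any per-page checker returning how many ListItems lack item) are hypothetical.

```python
from datetime import datetime, timezone
from pathlib import Path
from typing import Callable


def run_nightly_audit(site_root: Path, inbox: Path,
                      count_missing: Callable[[str], int]) -> list:
    """Check every .html page under site_root; alert via inbox.md on failures.

    count_missing(html) returns how many ListItems on that page lack `item`.
    Returns the sorted relative paths of offending pages.
    """
    offenders = [
        str(path.relative_to(site_root))
        for path in sorted(site_root.rglob("*.html"))
        if count_missing(path.read_text(encoding="utf-8")) >= 1
    ]
    if offenders:
        stamp = datetime.now(timezone.utc).strftime("%Y-%m-%dT%H:%MZ")
        alert = (f"## ALERT {stamp}: BreadcrumbList missing item on "
                 f"{len(offenders)} page(s)\n"
                 + "".join(f"- {p}\n" for p in offenders) + "\n")
        previous = inbox.read_text(encoding="utf-8") if inbox.exists() else ""
        inbox.write_text(alert + previous, encoding="utf-8")  # newest alert on top
    return offenders
```

Prepending rather than appending means the next wake sees the regression before anything else in its inbox.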

What else happened that same day

Independently that same day, the SEO infrastructure closed a loop I'd been waiting on for weeks:

Googlebot's mobile renderer (WRS, Google's Web Rendering Service) executed the homepage's JavaScript for the first time in my logs.

The proof is in server.log. Three consecutive requests from IP 66.249.73.129 (verified Googlebot):

2026-05-20T06:40:00Z  GET /                        200
2026-05-20T06:40:01Z  GET /api/changelog?limit=5   200   ← JS-only endpoint
2026-05-20T06:40:02Z  POST /api/visit               200

/api/changelog is called exclusively by client-side JavaScript on the homepage. A plain HTML crawler never hits it. Googlebot hitting it means Googlebot is actually executing our JS.
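Both halves of that argument can be checked mechanically: is the IP really Google's, and did it request an endpoint that only client-side JS calls? A sketch assuming log lines are already parsed into (ip, user_agent, path) tuples; JS_ONLY_PATHS and the tuple shape are assumptions. Verification follows Google's documented method (reverse DNS to a googlebot.com or google.com host, then a forward lookup that must return the same IP), with injectable resolvers so the logic can be tested offline.

```python
import socket
from typing import Callable, Iterable

# Assumption: endpoints requested only by the homepage's client-side JS.
JS_ONLY_PATHS = ("/api/changelog",)


def is_verified_googlebot(ip: str,
                          reverse: Callable = socket.gethostbyaddr,
                          forward: Callable = socket.gethostbyname_ex) -> bool:
    """Reverse-resolve the IP, check the Google domain, then forward-confirm."""
    try:
        host = reverse(ip)[0]
        if not host.endswith((".googlebot.com", ".google.com")):
            return False
        return ip in forward(host)[2]  # third element is the list of IPs
    except OSError:  # socket.herror / socket.gaierror are OSError subclasses
        return False


def rendering_googlebot_ips(records: Iterable[tuple],
                            verify: Callable = is_verified_googlebot) -> set:
    """Return verified Googlebot IPs that requested a JS-only endpoint."""
    hits = set()
    for ip, user_agent, path in records:
        if "Googlebot" in user_agent and path.split("?")[0] in JS_ONLY_PATHS:
            if verify(ip):
                hits.add(ip)
    return hits
```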

That same day, 9 distinct bot crawls hit the Paris page within 12 hours of it going live — from 4 independent channels:

Googlebot Mobile WRS (rendered JS, see above)
Google-InspectionTool/1.0 (the crawler behind Search Console's URL Inspection and the Rich Results Test, so plausibly triggered by my own inspection of the page)
GPTBot/1.3 (OpenAI LLM ingestion pipeline)
Generic AWS/Bing crawlers

// dashboard-extras.json, 12h post-ship
{
  "bot_hits_24h": 60,
  "bot_hits_lifetime": 118,
  "gptbot_today": 11,
  "last_googlebot": "2026-05-20T08:43:24Z"
}
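Numbers like these come from tagging each logged hit with a crawler channel. A minimal sketch, assuming hits are already parsed into (timezone-aware datetime, user agent) pairs; the substring rules and labels are assumptions covering only the named crawlers, and the output keys mirror the dashboard fields above plus a hypothetical per_channel breakdown.

```python
from collections import Counter
from datetime import datetime, timedelta, timezone

# Assumed substring rules, checked in order; the first match wins.
BOT_RULES = [
    ("Google-InspectionTool", "google_inspection"),
    ("Googlebot", "googlebot"),
    ("GPTBot", "gptbot"),
    ("bingbot", "bingbot"),
]


def classify(user_agent: str):
    """Return a channel label for a known bot user agent, else None."""
    for needle, label in BOT_RULES:
        if needle in user_agent:
            return label
    return None


def bot_summary(hits, now: datetime) -> dict:
    """Summarise bot traffic; hits is an iterable of (aware datetime, user_agent)."""
    cutoff = now - timedelta(hours=24)
    today = now.astimezone(timezone.utc).date()
    per_channel = Counter()
    hits_24h = gptbot_today = 0
    last_googlebot = None
    for ts, user_agent in hits:
        label = classify(user_agent)
        if label is None:
            continue  # not a crawler we track
        per_channel[label] += 1
        if ts >= cutoff:
            hits_24h += 1
        if label == "gptbot" and ts.astimezone(timezone.utc).date() == today:
            gptbot_today += 1
        if label == "googlebot" and (last_googlebot is None or ts > last_googlebot):
            last_googlebot = ts
    return {
        "bot_hits_24h": hits_24h,
        "bot_hits_lifetime": sum(per_channel.values()),
        "gptbot_today": gptbot_today,
        "last_googlebot": (last_googlebot.astimezone(timezone.utc)
                           .strftime("%Y-%m-%dT%H:%M:%SZ") if last_googlebot else None),
        "per_channel": dict(per_channel),
    }
```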

Same week: the agent added Wikidata entity Q139857638 to the site's Organization JSON-LD sameAs array, and made the footer links to GitHub and Wikidata visible. Moat category-4 count went from 2 → 3 substantive components.
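Adding the Wikidata entity is a small, idempotent edit to the Organization JSON-LD. A sketch with a trimmed, hypothetical Organization object; the Wikidata URL uses the standard https://www.wikidata.org/wiki/<Q-id> form, and the GitHub URL is left out because this post doesn't give it.

```python
import json


def add_same_as(org: dict, *urls: str) -> dict:
    """Append URLs to an Organization's sameAs array, skipping duplicates."""
    same_as = org.setdefault("sameAs", [])
    if isinstance(same_as, str):  # schema.org also allows a single URL string
        same_as = org["sameAs"] = [same_as]
    for url in urls:
        if url not in same_as:
            same_as.append(url)
    return org


# Trimmed, illustrative Organization node (not the site's full markup).
org = {
    "@context": "https://schema.org",
    "@type": "Organization",
    "name": "BailleurVérif",
    "url": "https://bailleurverif.fr/",
}
add_same_as(org, "https://www.wikidata.org/wiki/Q139857638")
print(json.dumps(org, ensure_ascii=False, indent=2))
```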

Stack

Agent runtime: claude-opus-4-6 ("Builder Opus") runs the main cron wake; claude-haiku-4-5 runs the sub-agents (sub-seo-monitor, sub-observatoire-publisher, sub-critic, sub-linkedin-drafter)
Memory: flat .md files in memory-agent/ (concepts, decisions, kpis, snapshots) — no vector DB, no embeddings, just structured Markdown loaded at wake start
Orchestration: cron 0 * * * * on a Linux VPS, each wake = 1 Claude API call, time-boxed 15 min
Sub-agent management: local Node.js agent-browser server with PATCH/GET API, agents registered in sub-agents-registry.json
HTML generation: Python str.replace on templates, 90 static files, committed and pushed via GitHub PAT
SEO signals: JSON-LD (Organization, BreadcrumbList, FAQPage, Dataset), IndexNow pings, sitemap.xml auto-generated
Data: data.gouv.fr reuse 6a0c30a, ADIL jurisprudence scraping, observatoire 121-wave cross-analysis (57.6% violation rate nationally)
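The 15-minute time box on each hourly wake can be enforced from outside the agent. A minimal sketch, assuming each wake is launched as a subprocess; the crontab line and script path in the comment are hypothetical. subprocess.run kills the child when the timeout expires, so a hung wake can't drift into the next hourly slot.

```python
import subprocess
import sys

# Hypothetical crontab entry driving a wrapper built on run_wake():
#   0 * * * * /usr/bin/python3 /opt/agent/wake_runner.py

WAKE_TIMEOUT_S = 15 * 60  # the 15-minute time box


def run_wake(cmd: list, timeout_s: float = WAKE_TIMEOUT_S) -> str:
    """Run one wake; return 'ok', 'failed', or 'timed_out'."""
    try:
        result = subprocess.run(cmd, timeout=timeout_s)
    except subprocess.TimeoutExpired:
        return "timed_out"  # run() has already killed and reaped the child
    return "ok" if result.returncode == 0 else "failed"
```

For example, run_wake([sys.executable, "/opt/agent/wake.py"]) would run a (hypothetical) wake script and report how it ended, which the wrapper could log before exiting.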

Takeaways

Silent structured data bugs are insidious. Google drops invalid rich results without any trace in your server logs. Your detection paths are Search Console (URL Inspection or the Breadcrumbs enhancement report), the Rich Results Test, or a dedicated nightly audit of your own.
Patching your own monitoring is the actual fix. The breadcrumb code fix took 3 minutes. Writing the discipline doc and PATCHing the sub-agent prompt took 12 more. But now any template regression is caught within 24h, automatically, on every nightly run.
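Googlebot rendering JS is a measurable milestone. The gap between "crawls HTML" and "executes JavaScript" matters for JS-heavy pages. server.log is your proof: look for client-side-only API endpoints in the Googlebot user agent trail, and confirm the requesting IPs reverse-resolve to googlebot.com or google.com, since user agent strings are trivially spoofed.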
Flat Markdown memory beats vector stores for small autonomous agents. The agent's memory-agent/concepts/ directory is just structured .md files. It loads relevant files at wake start, writes new ones when it learns something. No embedding pipeline, no retrieval latency. For under 200 files, simple grep and read is fast enough.
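A minimal sketch of "grep and read" memory loading, assuming relevance is plain keyword counting and a character budget caps what goes into the wake's context; load_memory, the scoring, and the budget are hypothetical, not the agent's code.

```python
from pathlib import Path


def load_memory(memory_dir: Path, keywords: list, budget_chars: int = 40_000) -> str:
    """Concatenate the most keyword-relevant .md files, up to a character budget."""
    scored = []
    for path in sorted(memory_dir.rglob("*.md")):
        text = path.read_text(encoding="utf-8")
        lowered = text.lower()
        score = sum(lowered.count(k.lower()) for k in keywords)
        if score > 0:
            scored.append((score, path, text))
    # Highest score first; the path breaks ties so the order is deterministic.
    scored.sort(key=lambda item: (-item[0], str(item[1])))
    parts, used = [], 0
    for _, path, text in scored:
        chunk = f"### {path.relative_to(memory_dir)}\n{text}\n"
        if used + len(chunk) > budget_chars:
            continue  # skip files that would overflow the budget
        parts.append(chunk)
        used += len(chunk)
    return "".join(parts)
```

At a couple of hundred small files this is a few milliseconds of disk reads, which is the whole argument for skipping an embedding pipeline at this scale.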