Pascal CESCATO read my SEO audit agent piece and left this in the comments:
"You don't need an LLM for this. Everything you're sending to Claude can be done directly in Python — zero cost, fully deterministic, no hallucination risk."
He was right. And wrong. And the conversation that followed is the reason I rebuilt the entire thing.
What Pascal Actually Said
The audit agent I published checks title length, meta description length, H1 count, and canonical tags. Pascal's point: those are character counts and presence checks. A regex does that. You don't pay $0.006 per URL for a regex.
I pushed back. The flags array requires judgment — "title reads like a navigation label rather than a page description" isn't a character count. Pascal conceded, then reframed:
"Two-pass makes more sense. Deterministic Python for binary checks, model call only on pages that pass the mechanical audit but need a second look. You pay per genuinely ambiguous case, not per URL."
That's a better architecture than what I shipped. I said so publicly.
Then Julian Oczkowski extended it:
"Deterministic rules first, lightweight models for triage, larger models reserved for genuinely ambiguous edge cases. Keeps latency low, costs predictable, reduces unnecessary LLM dependency."
Three people in a comment thread had just designed something I hadn't thought to name. Pascal called it two-pass. Julian called it tiered. I called it the cost curve — a sliding scale from free to expensive, routed by what the task actually requires.
The Cost Curve
Tier 1 — Deterministic Python. Cost: $0.
Title >60 characters? FAIL. Description missing? FAIL. H1 count == 0? FAIL. These are not judgment calls. A model that can reason about Shakespeare does not need to be invoked to count to 60.
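The Tier 1 layer is just presence checks and character counts. A minimal sketch of what those checks might look like, assuming a snapshot dict with `title`, `description`, `h1_count`, and `canonical` keys (field names are illustrative, not the repo's actual schema):

```python
def tier1_checks(page: dict) -> list[str]:
    """Deterministic pass/fail checks. No model call, zero cost.

    `page` is an assumed snapshot dict; keys here are hypothetical.
    """
    failures = []
    title = page.get("title") or ""
    description = page.get("description") or ""

    if len(title) > 60:
        failures.append("title_too_long")
    if not description:
        failures.append("description_missing")
    if page.get("h1_count", 0) == 0:
        failures.append("h1_missing")
    if not page.get("canonical"):
        failures.append("canonical_missing")
    return failures
```

Every check returns the same answer on every run, which is exactly Pascal's point: this layer is free, deterministic, and cannot hallucinate.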
Tier 2 — Haiku. Cost: ~$0.0001 per URL.
Title present but 4 characters long. Description present but 30 characters. Status code is a redirect. These pass the mechanical audit but something is off. Haiku is cheap enough that calling it for ambiguous cases costs less than the time you'd spend debugging why the deterministic check missed something.
Tier 3 — Sonnet. Cost: ~$0.006 per URL.
Pages Haiku flags as needing semantic judgment. "This title passes length but reads like a navigation label." "This description duplicates the title verbatim." Sonnet earns its cost here. Not everywhere.
The insight is routing. Most pages on a typical agency site have mechanical issues — missing descriptions, long titles, no canonical. Those never need a model. The interesting cases — pages that pass every binary check but still feel wrong — are where the model earns its place.
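The routing itself can be sketched in a few lines. This is a hypothetical shape, not the repo's implementation: `looks_ambiguous` stands in for the cheap "passes mechanically but something is off" heuristics, and the Haiku call is injected as a callable so the control flow is testable without an API key:

```python
def looks_ambiguous(page: dict) -> bool:
    """Cheap heuristics for pages that pass Tier 1 but still feel off.

    Thresholds and keys are illustrative assumptions.
    """
    title = page.get("title") or ""
    desc = page.get("description") or ""
    return (
        0 < len(title) <= 10          # present but suspiciously short
        or 0 < len(desc) <= 40        # present but thin
        or page.get("status", 200) in (301, 302)  # redirect
    )

def route(page: dict, tier1_failures: list[str], haiku_escalates) -> str:
    """Route a page to the cheapest tier that resolves it.

    `haiku_escalates` is a callable standing in for the Haiku call;
    it returns True when Haiku flags semantic ambiguity for Sonnet.
    """
    if tier1_failures:
        return "tier1"   # mechanical failure: no model needed
    if not looks_ambiguous(page):
        return "pass"    # clean pass: also no model needed
    return "tier3" if haiku_escalates(page) else "tier2"
```

Most URLs exit at `"tier1"` or `"pass"` without ever touching a model, which is where the cost drop comes from.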
On my last run of 50 URLs, 8 reached Sonnet. The rest resolved at Tier 1. Total cost dropped from ~$0.30 to ~$0.05. The 8 that hit Sonnet were the ones worth paying for.
What I Actually Built
I restructured the entire repo around this architecture.
core/ stays flat and MIT licensed. The original seven modules untouched. Anyone who cloned v1 still runs python core/index.py and gets identical behavior.
premium/ adds four modules.

cost_curve.py handles the tier routing — audit_url(snapshot, tiered=True) runs Tier 1 first, escalates to Haiku if something fails, and escalates to Sonnet only if Haiku flags semantic ambiguity.

multi_client.py manages project folders — --project acme reads and writes from projects/acme/ with isolated state, input, and reports.

enhanced_reporter.py generates WeasyPrint PDFs with per-URL screenshots, issues sorted by severity, and suggested fixes.

rewrite_agent.py is the one Pascal didn't anticipate. After the audit, --rewrite generates improvement suggestions using the same cost curve: Tier 1 truncates titles deterministically, Haiku writes meta description suggestions, Sonnet rewrites opening paragraphs. Pass --voice-sample ./my-writing.txt and the prompt includes a sample of your writing. The suggestions sound like you, not like Claude.
main.py is the unified entry point. Free users run python main.py and get v1 behavior. Pro users add --pro, set SEO_AGENT_LICENSE, unlock the premium layer.
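The flag names and the SEO_AGENT_LICENSE variable come from the article; everything else below is an assumed sketch of how the entry point might gate the premium layer:

```python
import argparse
import os
import sys

def main(argv=None):
    """Unified entry-point sketch. Flags match the article's CLI;
    the dispatch itself is a hypothetical simplification."""
    parser = argparse.ArgumentParser(prog="main.py")
    parser.add_argument("--pro", action="store_true")
    parser.add_argument("--project", default=None)
    parser.add_argument("--tiered", action="store_true")
    parser.add_argument("--rewrite", action="store_true")
    args = parser.parse_args(argv)

    # Premium features require a license key in the environment.
    if args.pro and not os.environ.get("SEO_AGENT_LICENSE"):
        sys.exit("--pro requires SEO_AGENT_LICENSE to be set")

    # Free users get v1 behavior; --pro unlocks the premium layer.
    layer = "premium" if args.pro else "core"
    return layer, args
```

The important property is that the free path never even imports the premium code, so v1 behavior stays byte-identical.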
A full pro run looks like this:
python main.py --project client-x --pro --tiered --rewrite
That single command: reads from projects/client-x/input.csv, routes every URL through the cost curve, generates rewrite suggestions for failing pages, writes a PDF report with screenshots and severity levels, and appends a run record to the audit history.
The Architecture Decision That Matters
core/ imports nothing from premium/. Ever.
This isn't just clean code. It's a trust contract with anyone who forks the repo. The MIT-licensed core is the public good — auditable, forkable, accepts PRs. The premium layer is proprietary and closed, but it builds on a foundation anyone can inspect.
Mads Hansen in the comments named the right question: how do you prevent the auditor from developing blind spots on recurring patterns? The answer I didn't have then: run history in state.json. Each completed run appends a record — timestamps, pass counts, fail counts, report path. Over time, "this page has failed description length for six consecutive runs" is a different signal than "this page failed today." The audit becomes a monitor. Not just a snapshot.
What the Comment Thread Cost
Nothing. And produced a better architecture than I would have built alone.
Pascal's pushback separated the deterministic from the semantic. Julian's production framing gave me the three-tier structure. Apex Stack's 89K-page site showed me where the orphan detection problem lives. Mads Hansen named the blind spot question I hadn't asked.
None of that was in the original article. All of it is in the repo now.
The public comment thread is the architecture review I didn't schedule. That's what happens when you publish the honest version — the demo that fails on your own content — instead of the staged one.
The staged demo would have passed. The honest one compounded.
Full repo: dannwaneri/seo-agent. Core is MIT. Premium requires a license key. The freeCodeCamp tutorial covers the v1 build in detail: How to Build a Local SEO Audit Agent with Browser Use and Claude API.
Top comments (4)
Good piece. The cost curve holds, and "8 out of 50 reached Sonnet" is exactly the data that makes the argument. The voice-sample flag is new scope — but that's your pattern, so I'm not surprised.
The 8/50 number is the one I was most uncertain about including — specific enough to be checkable, general enough that it might not hold on other sites. Figured the honest version was worth the risk of someone running it and getting 12/50.
The voice-sample scope was yours too, indirectly. "Cheapest model that solves the problem" applied to rewrites means Sonnet only when the output actually needs to sound like a specific person. That's where the cost justifies itself.
The 8/50 being checkable is exactly why it belongs in the piece. A staged number would have been smoother and less useful. And yes — voice-sample is the cost curve applied correctly: Sonnet earns it when the output needs to sound like a specific person, not before.
The part I'm still testing: whether the 8/50 ratio holds across site types or whether it's a property of the agency portfolio I was running. Editorial sites might route more to Sonnet. E-commerce with templated descriptions might route fewer. The routing logic stays the same but the cost ceiling moves.