DEV Community


I Was Paying $0.006 Per URL for SEO Audits Until I Realized Most Needed $0

Daniel Nwaneri on April 03, 2026

Pascal CESCATO read my SEO audit agent piece and left this in the comments: "You don't need an LLM for this. Everything you're sending to Claude ...
Pascal CESCATO

Good piece. The cost curve holds, and "8 out of 50 reached Sonnet" is exactly the data that makes the argument. The voice-sample flag is new scope — but that's your pattern, so I'm not surprised.

Daniel Nwaneri

The 8/50 number is the one I was most uncertain about including — specific enough to be checkable, general enough that it might not hold on other sites. Figured the honest version was worth the risk of someone running it and getting 12/50.
The voice-sample scope was yours too, indirectly. "Cheapest model that solves the problem" applied to rewrites means Sonnet only when the output actually needs to sound like a specific person. That's where the cost justifies itself.

Pascal CESCATO

The 8/50 being checkable is exactly why it belongs in the piece. A staged number would have been smoother and less useful. And yes — voice-sample is the cost curve applied correctly: Sonnet earns it when the output needs to sound like a specific person, not before.

Daniel Nwaneri

The part I'm still testing: whether the 8/50 ratio holds across site types or if it's a property of the agency portfolio I was running. Editorial sites might route more to Sonnet. E-commerce with templated descriptions might route fewer. The routing logic stays the same but the cost ceiling moves.

Andrew Rozumny

This is a great breakdown of where LLMs actually make sense vs where they don’t

feels like a lot of people jump straight to AI even for things that are fully deterministic

the “cost curve” idea is really solid — routing based on complexity instead of defaulting to a model for everything

Daniel Nwaneri

The default-to-model instinct is expensive to unlearn. Routing by what the task actually requires is the whole argument...

Andrew Rozumny

that’s a really good way to put it

feels like a lot of systems default to “just call a model” without asking if the task even needs one

Jonathan Murray

"He was right" is a good way to open a post that could easily have been defensive. The pattern of recognizing when an LLM is solving a problem that deterministic code handles better is genuinely underappreciated — people reach for AI for things that are structurally rule-based and then wonder why results are inconsistent.

The useful generalization here: LLMs add value when the task requires judgment under ambiguity or synthesis across unstructured inputs. When the input is structured and the transformation rules are knowable, deterministic code is faster, cheaper, and more reliable. For SEO audits specifically, most of the high-value checks (missing meta, broken links, heading structure, page speed signals) are fully deterministic — the places where an LLM adds something real are interpretation and prioritization of findings, not the audit itself.
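A minimal sketch of that deterministic tier (illustrative only; the regexes and check names here are invented, not the article's actual implementation):

```python
import re

def deterministic_checks(html: str) -> list[str]:
    """Tier 1: pure-Python SEO checks, zero model calls. Sketch, not production HTML parsing."""
    findings = []
    # Missing or empty meta description
    if not re.search(r'<meta\s+name=["\']description["\'][^>]*content=["\'][^"\']+', html, re.I):
        findings.append("missing meta description")
    # Missing or empty <title>
    title = re.search(r"<title>(.*?)</title>", html, re.I | re.S)
    if not title or not title.group(1).strip():
        findings.append("missing title")
    # Heading structure: expect exactly one H1
    h1_count = len(re.findall(r"<h1[\s>]", html, re.I))
    if h1_count != 1:
        findings.append(f"expected 1 <h1>, found {h1_count}")
    return findings
```

A real implementation would use an HTML parser rather than regexes, but the point stands: every one of these is a boolean a unit test can pin down.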

What ended up being the actual breakdown — what percentage of your audit logic stayed as LLM calls vs moved to Python?

Daniel Nwaneri

The breakdown on the last run: 8 of 50 URLs reached Sonnet. The rest resolved at Tier 1 — pure Python, zero model calls. So roughly 84% of the audit logic moved to deterministic code, 16% genuinely needed judgment. That ratio will shift by site type — programmatic SEO with inconsistent templates skews higher, clean agency portfolios skew lower but the direction holds.

Your framing is exactly right: interpretation and prioritization of findings is where the model earns it, not the audit itself. "This page has a missing description" is a Python job. "This page passes every check but the title reads like a navigation label for an audience that came from a transactional query" is a model job. The hard part was resisting the urge to use the model for both because the demo looked cleaner that way.
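In miniature, the routing shape looks something like this (a sketch; the escalation triggers and function names are invented for illustration):

```python
def needs_judgment(findings: list[str]) -> bool:
    # Escalation predicate: signals a regex can't adjudicate.
    # This trigger list is made up for illustration.
    ambiguous = {"title reads like navigation label", "intent mismatch suspected"}
    return any(f in ambiguous for f in findings)

def route(url: str, findings: list[str], call_model=None) -> dict:
    """Tier 1 resolves for free; only ambiguous pages escalate to the expensive model."""
    if not needs_judgment(findings):
        return {"url": url, "tier": 1, "findings": findings, "cost": 0.0}
    verdict = call_model(url, findings) if call_model else "escalated"
    return {"url": url, "tier": 3, "findings": findings, "verdict": verdict}
```

The architecture is the `if`, not the model call: 42 of 50 URLs never reach the second branch.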

Archit Mittal

This is a healthy pattern — use deterministic code for deterministic checks, reserve the LLM for the genuinely fuzzy stuff (is the meta description compelling? does the H1 match search intent?). I've seen the same mistake in invoice-parsing agents: people throw Claude at a fixed PDF template when pdfplumber + regex would nail it in 20ms for free. The rule I've settled on: if a unit test could catch the error, don't use an LLM for it. Good rebuild decision.
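The rule in one unit test's worth of code (the invoice format here is invented):

```python
import re

INVOICE = "Invoice #4821\nDate: 2026-03-14\nTotal: $1,250.00"

def parse_invoice(text: str) -> dict:
    # Fixed-template extraction: a unit test catches any drift, so no LLM needed.
    number = re.search(r"Invoice #(\d+)", text).group(1)
    total = re.search(r"Total: \$([\d,]+\.\d{2})", text).group(1)
    return {"number": number, "total": float(total.replace(",", ""))}
```

If the template changes, the test fails loudly; a model would instead degrade quietly.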

Apex Stack

This cost curve framing is the cleanest way I've seen anyone articulate tiered AI processing. The core insight — that routing is the architecture, not the model — is something most people building with LLMs miss entirely.

I'm the "89K-page site" mentioned in the article, and the ratio question Daniel raises at the end is exactly where I'm stuck. My site is multilingual (12 languages), so the Tier 1 → Tier 3 distribution would look very different from an agency portfolio. Things like English content accidentally rendering on a Japanese page, or hreflang tags pointing to wrong canonicals — those pass every deterministic check but are fundamentally broken for their audience. I'd estimate 30-40% of my pages would escalate to Sonnet on any given run.

The --voice-sample flag is an underrated addition. For programmatic SEO sites where you're generating content at scale, keeping a consistent voice across thousands of pages is a real challenge. Having the rewrite agent match a voice sample instead of defaulting to generic Claude output solves a problem I've been thinking about for weeks.

The open-core licensing model (MIT core, proprietary premium) is also smart business design. Builds trust with the community while protecting the value-add.

Daniel Nwaneri

The hreflang-to-wrong-canonical case is the one that breaks my Tier 1 assumptions cleanest. Deterministic check says canonical is present — PASS. The model looking at rendered context catches that the canonical points to the English version while the hreflang declares Japanese. That's not a missing tag, it's a logic error between two valid tags. No regex finds that. And at 30-40% Sonnet escalation across 89K pages, the cost curve math looks very different from my 8/50 agency run.

The voice-sample use case you named — programmatic SEO at scale, consistent voice across thousands of generated pages — is one I hadn't written up explicitly. The flag exists because "cheapest model that sounds like a specific person" is Sonnet, but the problem you're describing is upstream: how do you maintain voice consistency when generation is happening at volume, not post-hoc? Is the sample you'd pass a single reference document, or are you thinking per-language samples?
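Back-of-envelope on how that escalation ratio moves the ceiling, using the $0.006/URL figure from the title (treating Tier 1 as effectively free and taking 35% as the midpoint of your estimate — both assumptions):

```python
pages = 89_000
cost_per_sonnet_url = 0.006   # per-URL figure from the post title

agency_ratio = 8 / 50         # 16% escalation on the agency run
multilingual_ratio = 0.35     # midpoint of the 30-40% estimate above

print(f"agency-style escalation:  ${pages * agency_ratio * cost_per_sonnet_url:,.2f}")
print(f"multilingual escalation:  ${pages * multilingual_ratio * cost_per_sonnet_url:,.2f}")
```

Same routing logic, roughly double the ceiling — which is why the clustering idea at the end of this thread matters more at your scale than at mine.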

Ken W Alger

It is indeed interesting how much work folks are pushing to LLMs, and then paying for, that could reasonably be done in code or even with a local SLM implementation. While not a solution for every situation, I think there is a shift happening from "send everything to Sonnet" to a more nuanced approach of "does it need to go there?"

Daniel Nwaneri

The shift you're describing is the one that actually changes cost structures. "Does it need to go there?" sounds obvious until you're three months into a system where everything went to Sonnet by default and the bill is the first signal that something's wrong. Local SLM for the middle tier is the piece I haven't tested yet — Haiku is cheap enough that I haven't needed to, but at serious scale that changes.

Joske Vermeulen

That's perfect! Most of us get so used to the ease of using AI for everything that we sometimes forget how we could do it all for free.

Daniel Nwaneri

The "free first" instinct is the one worth building back. Regex doesn't hallucinate either.

Joske Vermeulen

Win win win

Jenna

The cost curve framing is really useful. I do AI visibility audits for small businesses and hit the same realization - most of the checks that actually matter (NAP consistency across directories, GBP completeness, schema markup presence) are binary. Either it's there or it isn't.

The expensive part isn't the audit. It's explaining what the results mean to a business owner who doesn't know what a canonical tag is. That's where the model earns its cost - translating "your H1 is missing" into "when someone asks ChatGPT for a plumber in your area, it can't figure out what page to recommend because your homepage doesn't clearly say what you do."

Pascal's two-pass insight applies beyond SEO tooling. Any time you're running an LLM on structured data that has deterministic answers, you're paying for confidence theater.

Daniel Nwaneri

"Translating H1 missing into what ChatGPT can't do for your business" is the use case that justifies Sonnet on the output layer even when the audit itself is deterministic. The check is binary. The explanation isn't. That's a clean separation I hadn't written up explicitly — the cost curve applies to both the audit and the reporting, and they route differently. A missing canonical is a Tier 1 find and a Tier 3 explanation.
"Confidence theater" is the sharpest way I've seen the problem named. You're paying the model to sound certain about something a regex already knew. The cost isn't just dollars — it's latency and the hallucination surface on data that had a deterministic answer. What does your output layer look like for the business owner translation — templated prompts per issue type, or does the model generate the explanation from the raw audit result each time?

Admin Chainmail

The tiered cost curve concept is brilliant and applies way beyond SEO audits. Pascal's two-pass reframe is the kind of feedback that makes public building worthwhile.

We applied essentially the same pattern to bootstrapping a product on $0 budget: Cloudflare free tier for hosting, Workers, and CDN. Resend free tier for transactional email. Google Gemini free tier for automation. Only escalate to paid when the free tier hits a real wall, not a hypothetical one.

The trap is the same as your SEO agent: it's tempting to throw the expensive tool at everything because it can handle it. But 'can' and 'should' diverge fast when you're paying per request. Character counts don't need Claude. Most bootstrapping tasks don't need paid infrastructure.

Daniel Nwaneri

The "only escalate when the free tier hits a real wall, not a hypothetical one" is the discipline that's hardest to maintain. The temptation is always to provision for the ceiling case before you've hit it which means you pay for scale you haven't earned yet. Your stack is the cost curve applied to infrastructure: same logic, different layer. The failure mode in both cases is the same: optimizing for capability instead of routing by what the task actually requires. Curious where Gemini free tier actually hit a wall for you — that's usually where the architecture gets interesting...

Mykola Kondratiuk

the regex vs llm split you landed on is where most people get stuck - not because they can't see the difference, but because they scaffolded the whole thing around llm first and retrofitting feels wasteful. rebuilding was the right call.

Sonia

This “cost curve” framing is really smart. Most people jump straight to LLMs for everything, but routing by ambiguity makes way more sense.
I’ve seen the same pattern: 70–80% of SEO issues are deterministic (missing meta, duplicate titles, broken canonicals), and the model is only useful for the “this technically passes but feels wrong” cases.
Also interesting that your biggest savings came from routing, not prompt optimization; that's a good reminder that architecture usually beats micro-optimizations.

Have you tried pushing the deterministic layer even further? For example clustering similar pages first and auditing templates instead of URLs one by one.

Daniel Nwaneri

The template clustering question is the right next step and someone in another thread handed me the framing for it: if a bug lives in a template, it affects every page using that template — fix one, fix all. So the audit unit stops being a URL and becomes a failure mode. Sample one page per template per locale, catch the systemic issues at a fraction of the full-site cost, then only run individual URL audits on pages that deviate from their template's expected pattern.

I haven't shipped this yet but it's on the roadmap. The missing piece is the classification layer — you need something that groups pages by template reliably before you can sample from them. For a CMS with clean URL patterns that's straightforward. For programmatic SEO sites with 89K pages across 12 languages it gets more complex fast. What does your page inventory look like — consistent enough that URL pattern matching would cluster templates cleanly?