Discussion on: I Built a Local AI Agent That Audits My Own Articles. It Flagged Every Single One.

Apex Stack

The framing of "brittleness moved from selectors to prompts" really resonates. I run a daily automated audit agent across ~90K pages on a multilingual Astro site, and the hardest bugs to catch programmatically are exactly the semantic ones — like a cookie consent script injecting a hidden H1 that passes every HTML validator but tanks your SEO because Google sees two competing H1s. A regex would never catch that. The LLM spots it because it understands what an H1 is for, not just what it looks like in markup.
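
For anyone who hasn't hit this, here's a minimal repro of the shape (markup and titles invented, not our actual pages). Both h1 elements are valid HTML5, since the spec allows more than one, so a validator stays green:

```python
from html.parser import HTMLParser  # stdlib, no dependencies

# Hypothetical rendered DOM after a consent script runs. Both h1
# elements are valid HTML5, so a validator reports no errors.
RENDERED = """
<main>
  <h1>How We Cut Build Times by 80%</h1>
  <div id="cc-banner">
    <h1 style="position:absolute;left:-9999px">We value your privacy</h1>
  </div>
</main>
"""

class H1Collector(HTMLParser):
    """Collect the text of every h1 in document order."""
    def __init__(self):
        super().__init__()
        self.in_h1 = False
        self.h1s = []

    def handle_starttag(self, tag, attrs):
        if tag == "h1":
            self.in_h1 = True
            self.h1s.append("")

    def handle_endtag(self, tag):
        if tag == "h1":
            self.in_h1 = False

    def handle_data(self, data):
        if self.in_h1:
            self.h1s[-1] += data.strip()

collector = H1Collector()
collector.feed(RENDERED)
print(collector.h1s)
# ['How We Cut Build Times by 80%', 'We value your privacy']
# Structural parsing gets you this far: two syntactically fine h1s.
# Deciding which one is the page's real title is the semantic step.
```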

Curious about your --auto mode and the needs_human[] pattern. At scale, how do you handle the backlog of URLs that need human review? Do you batch them into a report or is there a triage workflow?

Daniel Nwaneri

The cookie consent H1 is the best example of this I've seen in the comments. A validator passes it. Regex passes it. The LLM catches it because it's reading intent, not markup. That's the whole argument for semantic extraction in a single example.

The needs_human[] backlog is the weakest part of the current build, and I'll say that plainly. Right now it's a flat list in the summary. At 90K pages that's not a workflow, it's a graveyard. What I haven't shipped yet: severity tiers in the backlog. A 404 is different from a login wall is different from a redirect chain, and triaging a flat list at scale just means nothing gets reviewed.
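
Rough sketch of the tiering I have in mind (nothing here is shipped; the reason codes and field names are placeholders):

```python
from collections import defaultdict

# Placeholder reason codes and tiers -- the real cut lines would come
# from measured SEO impact, not my gut.
SEVERITY = {
    "http_404": "P1",        # page is gone: fix or redirect now
    "redirect_chain": "P2",  # crawlable, but burning link equity
    "login_wall": "P3",      # often intentional: verify, don't panic
}

def triage(needs_human):
    """Bucket the flat needs_human[] list into severity tiers."""
    buckets = defaultdict(list)
    for item in needs_human:
        buckets[SEVERITY.get(item["reason"], "P3")].append(item)
    return dict(buckets)

backlog = [
    {"url": "/pricing", "reason": "http_404"},
    {"url": "/docs/install", "reason": "redirect_chain"},
]
print(triage(backlog))
# {'P1': [...], 'P2': [...]} -- review P1 daily, batch P2 weekly,
# sample P3 when there's slack.
```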

Apex Stack

The cookie consent H1 example is chef's kiss — that's exactly the kind of issue where regex and validators give you a false clean bill of health because they're checking syntax, not semantics. An LLM catches it because it understands that a cookie banner shouldn't be an H1.
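
The semantic step itself is small once extraction is done. Here's roughly how I'd phrase it, assuming a local model behind Ollama's default /api/generate endpoint (the model name and prompt wording are illustrative, not from your article's agent):

```python
import json
import urllib.request

OLLAMA = "http://localhost:11434/api/generate"  # default local endpoint

def is_real_title(h1_text, page_context):
    """Ask a local model whether an h1 is the page's genuine title."""
    prompt = (
        "You are auditing a web page for SEO. Answer YES if this <h1> "
        "is the page's genuine title, or NO if it is injected UI text "
        "such as a cookie banner or modal.\n\n"
        f"Page context:\n{page_context}\n\n<h1>: {h1_text}\nAnswer:"
    )
    body = json.dumps(
        {"model": "llama3.1", "prompt": prompt, "stream": False}
    ).encode()
    req = urllib.request.Request(
        OLLAMA, data=body, headers={"Content-Type": "application/json"}
    )
    with urllib.request.urlopen(req) as resp:
        answer = json.loads(resp.read())["response"]
    return answer.strip().upper().startswith("YES")

# e.g. is_real_title("We value your privacy", page_text) -> False
```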

Your point about severity tiers in the backlog is spot on too. I'm dealing with this at ~90K pages myself — a flat issue list becomes completely unactionable. Right now I bucket things manually (P1 = broken pages, P2 = content quality, P3 = nice-to-have), but automating that triage based on SEO impact would be the real unlock. A redirect chain on a high-traffic page is fundamentally different from a missing alt tag on a deep page nobody visits.
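
Automating that bucket logic might be as simple as a severity weight scaled by log-damped traffic. A toy sketch with invented weights (none of these numbers come from my actual analytics):

```python
import math

# Illustrative weights -- tune against your own data.
ISSUE_WEIGHT = {
    "broken_page": 10.0,     # my P1
    "redirect_chain": 5.0,   # my P2
    "missing_alt": 1.0,      # my P3
}

def impact_score(issue_type, monthly_visits):
    """Severity weight scaled by log-damped page traffic."""
    return ISSUE_WEIGHT.get(issue_type, 1.0) * math.log1p(monthly_visits)

issues = [
    {"url": "/pricing", "type": "redirect_chain", "visits": 40_000},
    {"url": "/blog/old-post", "type": "missing_alt", "visits": 12},
]
for i in sorted(issues, key=lambda x: -impact_score(x["type"], x["visits"])):
    print(round(impact_score(i["type"], i["visits"]), 1), i["url"])
# 53.0 /pricing, then 2.6 /blog/old-post: the redirect chain on the busy
# page outranks the alt tag on the page nobody visits, which matches the
# manual P1/P2/P3 intuition.
```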