DEV Community

Discussion on: I Built a Local AI Agent That Audits My Own Articles. It Flagged Every Single One.

Mykola Kondratiuk

Love this. There's something genuinely useful about using the same tool to audit the work it helped you produce - catches the patterns you've normalized. Curious what the most common flag was. Was it style/tone or more structural things like missing context or weak conclusions?

Daniel Nwaneri

Mostly structural — missing meta descriptions, titles over 60 characters, H1 count issues. Nothing about style or tone because the agent isn't reading for quality, it's checking against standards. The interesting case was freeCodeCamp's own template truncating descriptions on article listing pages — the agent flagged it as a FAIL and it technically is, even though it's platform-level and outside my control. Auditing your own work with your own tool finds the things you'd rationalized as acceptable.
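
For concreteness, the checks are roughly this shape. A minimal sketch, assuming BeautifulSoup for parsing and a one-H1-per-page rule; it's not the agent's actual code:

```python
# Minimal sketch of the structural checks described above.
# Assumptions: BeautifulSoup for parsing, a 60-character title limit,
# exactly one H1 expected per page. Illustrative only.
from bs4 import BeautifulSoup


def audit_page(html: str) -> dict:
    soup = BeautifulSoup(html, "html.parser")
    results = {}

    # Meta description: must exist and be non-empty.
    meta = soup.find("meta", attrs={"name": "description"})
    has_description = meta is not None and (meta.get("content") or "").strip()
    results["meta_description"] = "PASS" if has_description else "FAIL"

    # Title length: flag anything over 60 characters.
    title = soup.title.string.strip() if soup.title and soup.title.string else ""
    results["title_length"] = "PASS" if 0 < len(title) <= 60 else "FAIL"

    # H1 count: exactly one expected.
    results["h1_count"] = "PASS" if len(soup.find_all("h1")) == 1 else "FAIL"

    return results
```

Each check is a binary pass/fail against a standard, which is why style and tone never show up in the results.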

Mykola Kondratiuk

The structural stuff makes sense - those are measurable so the agent can actually flag them. But that freeCodeCamp case is the interesting one. Platform-imposed truncation showing up as a personal FAIL is exactly the kind of thing you'd normally just rationalize away. The agent doesn't know context, so it flags it anyway. Weirdly that's the most honest kind of audit.

Daniel Nwaneri

Context-free is the feature, not the limitation. A human auditor would see "freeCodeCamp template" and mark it acceptable. The agent sees a missing meta description and marks it FAIL. Both are correct; they're answering different questions.
The agent answers: does this page meet the standard? The human answers: is this worth fixing given the constraints? You need both. The agent's job is to surface everything. Your job is to triage what actually matters.

The platform-imposed FAIL is useful precisely because it forces the triage decision to be explicit rather than assumed. You either fix it, escalate it, or document why it's acceptable. Any of those is better than normalizing it silently.
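
Concretely, that triage can live right next to the audit output. A hypothetical sketch; the keys and decision labels here are made up:

```python
# Hypothetical triage record kept alongside the audit results, so an
# accepted FAIL is an explicit, documented decision rather than a silent one.
triage = {
    "meta_description_truncated": {
        "decision": "accepted",  # fix | escalate | accepted
        "reason": "platform-level template behavior, outside my control",
    },
    "title_over_60_chars": {
        "decision": "fix",
        "reason": "within my control; shorten the title",
    },
}
```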

Mykola Kondratiuk

The agent/human split you're describing is exactly right. The agent answers "does this meet the standard" - the human answers "is this worth fixing given the context". Those are genuinely different questions and both useful. The platform-imposed failures are actually good signal - they're showing you the gap between your setup and the standard, even if you consciously chose that gap.

Daniel Nwaneri • Edited

"Consciously chose that gap" is the useful distinction. There's a difference between a FAIL you didn't know about and a FAIL you accepted. The agent can't tell which is which, but surfacing both forces you to be explicit about which category each one falls into. The ones you assumed were acceptable without ever deciding they were are where the audit earns its cost.

Mykola Kondratiuk

Yeah exactly - the ones you assumed were fine without deciding they were are the honest gap. Surfacing it is most of the value.

Daniel Nwaneri

Surfacing it is most of the work. The deciding is faster once you can see it clearly.

Mykola Kondratiuk

Exactly. The decision is quick once you stop rationalizing.