Aria13

Posted on • Originally published at forge.closerhub.app

AI Content Filter: The Practitioner's Playbook for Killing Low-Quality LLM Slop at Scale

If you run a forum, community, or any platform that accepts user-generated content, you've already felt it: the flood. Posts that technically answer the question but say nothing. Replies that hover at 400 words of confident-sounding noise. Comments that begin with "Great question!" and end with a bulleted list of things you could have googled in 30 seconds.

AI-generated content isn't going away. What you can do is build a practical filter stack that catches the bad stuff before it degrades your community's signal-to-noise ratio.

Here's what actually works.


The Signals That Betray Low-Quality AI Output

Before you build anything, understand what you're hunting. Low-quality AI content has consistent tells — not because it's "AI," but because it's lazy AI: prompt-in, dump-out, zero editing.

Structural tells:

  • Suspiciously consistent paragraph length (4–6 sentences, every single time)
  • Bulleted lists as the default response structure, even when prose would be more appropriate
  • Transitions like "Furthermore," "In conclusion," and "It's worth noting that" — phrases no human under 60 types spontaneously
  • Hedging language density: "may," "can," "often," "generally," "it depends" clustered together

Content tells:

  • Zero specificity. Real practitioners name versions, mention edge cases, share failure stories. AI fills the same space with generalities.
  • Missing pronouns of ownership. "One can," "users might" — no "I tried this on a 50k-row dataset and it exploded."
  • No cultural or temporal grounding. Real people reference what happened last week. AI output is curiously timeless.
  • Repetition of the original question, reworded, as the first paragraph.

Build a scoring rubric from these signals. Even a simple heuristic checklist gives you a 60–70% catch rate before you write a single line of code.


NLP Heuristics You Can Ship in a Weekend

You don't need a fine-tuned transformer to start filtering. These lightweight heuristics run fast and catch the bulk of slop:

Perplexity scoring. LLMs generate low-perplexity text — it's statistically predictable because that's the optimization target. Run submissions through a small local model (GPT-2 works fine for this) and flag anything with perplexity below your threshold. Tune it on a labeled sample of 200–300 posts from your platform.
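In production you'd pull token log-probabilities from GPT-2 or a llama.cpp model, but the mechanics are easy to sketch with a toy bigram model. Everything below (the add-one smoothing, the reference corpus, the tokenizer) is illustrative only, not the real scoring pipeline:

```python
import math
from collections import Counter

def bigram_perplexity(text: str, reference: str) -> float:
    """Toy pseudo-perplexity from a bigram model fit on a reference
    corpus. A real deployment would use an LM's token log-probs."""
    def tokens(s: str) -> list:
        return s.lower().split()

    ref = tokens(reference)
    unigrams = Counter(ref)
    bigrams = Counter(zip(ref, ref[1:]))
    vocab = len(unigrams) or 1

    log_prob, n = 0.0, 0
    toks = tokens(text)
    for prev, cur in zip(toks, toks[1:]):
        # Add-one smoothing so unseen bigrams don't zero the product.
        p = (bigrams[(prev, cur)] + 1) / (unigrams[prev] + vocab)
        log_prob += math.log(p)
        n += 1
    # Lower perplexity = more predictable relative to the reference.
    return math.exp(-log_prob / max(n, 1))
```

Text that closely mirrors the reference corpus scores lower than text that doesn't — the same property you exploit when flagging statistically predictable LLM output against a real LM.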

Type-token ratio (TTR). Calculate unique words divided by total words. AI dumps tend to have lower TTR — they repeat sentence structures and vocabulary more than humans do. A TTR below 0.55 on a 300+ word post is a yellow flag.
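The TTR check is a few lines. One caveat worth baking in: TTR naturally falls as documents get longer, so only compare posts of similar length (hence the 300+ word qualifier above):

```python
import re

def ttr(text: str) -> float:
    """Type-token ratio: unique words / total words, case-folded.
    Compare only across posts of similar length."""
    words = re.findall(r"[a-z']+", text.lower())
    if not words:
        return 0.0
    return len(set(words)) / len(words)
```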

Sentence length variance. Compute standard deviation of sentence lengths. Human writing is jagged — we mix one-word punches with long rambling clauses. AI output is smoother. Low variance (< 8 words std dev) correlates with generated content.
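A minimal version with a naive regex sentence splitter (a real pipeline might use a proper tokenizer, since abbreviations like "e.g." will fool this one):

```python
import re
import statistics

def sentence_variance(text: str) -> float:
    """Std dev of sentence lengths in words. Low values suggest the
    uniform rhythm typical of unedited LLM output."""
    sentences = [s for s in re.split(r"[.!?]+", text) if s.strip()]
    lengths = [len(s.split()) for s in sentences]
    if len(lengths) < 2:
        return 0.0
    return statistics.stdev(lengths)
```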

Transition phrase density. Build a list of 40–50 AI-characteristic transitions and count occurrences per 100 words. Anything above 3 hits per 100 words is suspect.
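A sketch of the counter, with a deliberately small phrase list — the handful of entries here is a placeholder for the 40–50 you'd curate from your own flagged posts:

```python
def transition_score(text: str, phrases=None) -> float:
    """AI-characteristic transition phrases per 100 words.
    The default phrase list is a small illustrative subset."""
    phrases = phrases or [
        "furthermore", "in conclusion", "it's worth noting",
        "it is worth noting", "moreover", "additionally",
        "delve into", "in today's fast-paced world",
    ]
    lowered = text.lower()
    words = len(lowered.split()) or 1
    hits = sum(lowered.count(p) for p in phrases)
    return hits * 100 / words
```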

Stack these four into a composite score. In Python:

def score_post(text: str) -> float:
    """Composite slop score: fraction of heuristics that fire, 0.0-1.0.
    Each helper implements one of the four checks described above."""
    scores = {
        "low_perplexity": check_perplexity(text),          # bool: perplexity below threshold
        "low_ttr": ttr(text) < 0.55,                       # bool: low lexical diversity
        "smooth_sentences": sentence_variance(text) < 8,   # bool: uniform sentence lengths
        "transition_density": transition_score(text) > 3,  # bool: heavy AI-style transitions
    }
    return sum(scores.values()) / len(scores)

A score above 0.6 triggers human review; above 0.8 goes to auto-hold. Treat both cutoffs as starting points and tune them against your labeled sample.


Tools Worth Integrating (and Which Ones to Skip)

Worth your time:

  • Originality.ai API — best precision for long-form content, reasonable pricing at scale. Good for content sites, not real-time forum replies.
  • GPTZero API — solid for educational contexts, decent batch processing. More false positives on technical writing.
  • Local perplexity scoring with llama.cpp — free, runs on a $20/month VPS, tunable. Requires calibration per domain.
  • Perspective API (Google) — not for AI detection specifically, but excellent for low-effort/toxic combo posts. AI slop often co-occurs with low-quality engagement patterns it catches.

Skip for now:

  • Any browser extension marketed to teachers. Not designed for API integration, not calibrated for developer communities.
  • Watermarking schemes (SynthID, etc.) — only catches content generated by participating providers, easily bypassed by paraphrasing.

Building the Moderation Workflow

Detection is only half the problem. You need a workflow that doesn't burn out your mods or create false-positive drama.

Three-tier queue system:

  1. Auto-pass (score < 0.4): Post goes live immediately.
  2. Soft-hold (0.4–0.75): Post is visible to the author, pending mod review within 4 hours. No public flag.
  3. Hard-hold (> 0.75): Post held, author gets a generic "under review" notice. Mod reviews within 1 hour.
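The routing itself is trivial once the thresholds are fixed — this sketch uses the tier boundaries from the list above, which you'd tune per platform:

```python
def route(score: float) -> str:
    """Map a composite slop score to a moderation queue tier.
    Thresholds follow the three-tier scheme above."""
    if score < 0.4:
        return "auto_pass"   # goes live immediately
    if score <= 0.75:
        return "soft_hold"   # visible to author, mod review within 4h
    return "hard_hold"       # held, generic notice, mod review within 1h
```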

Never auto-delete. Your model will be wrong sometimes, and public auto-deletion creates the worst kind of community drama.

Feedback loop: When mods override your filter (pass a flagged post or hold a passing one), log it. Retrain your heuristics monthly. After 60 days of feedback loops, most teams get detection accuracy above 85% on their specific platform.
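The logging side of that loop can be as simple as an append-only CSV you retrain from monthly. The schema here is a minimal assumption; adapt it to whatever store your mod tooling already uses:

```python
import csv
import datetime

def log_override(post_id: str, filter_score: float,
                 filter_verdict: str, mod_verdict: str,
                 path: str = "overrides.csv") -> None:
    """Append one mod override (filter said X, mod said Y) to a CSV
    that feeds the monthly heuristic retrain."""
    with open(path, "a", newline="") as f:
        csv.writer(f).writerow([
            datetime.datetime.now(datetime.timezone.utc).isoformat(),
            post_id, filter_score, filter_verdict, mod_verdict,
        ])
```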

Account-level signals: A user who has 3 posts flagged in 30 days gets soft-rate-limited regardless of per-post scores. Behavioral signals compound the per-post analysis significantly.
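A sliding-window check covers the rule above; the window and threshold are parameters you'd tune, and the timestamp list stands in for whatever your database actually stores:

```python
from datetime import datetime, timedelta

def should_rate_limit(flag_timestamps: list,
                      now: datetime,
                      window_days: int = 30,
                      max_flags: int = 3) -> bool:
    """True if the account hit the flag threshold within the window,
    regardless of any individual post's score."""
    cutoff = now - timedelta(days=window_days)
    recent = [t for t in flag_timestamps if t >= cutoff]
    return len(recent) >= max_flags
```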


Handling Edge Cases Without Breaking Your Community

The hardest part isn't detection — it's handling legitimate users who write in AI-adjacent styles (non-native English speakers, people with certain cognitive styles, technical writers trained on formal prose).

Do:

  • Weight account history heavily. A 2-year member with 400 posts gets benefit of the doubt; a 3-day account doesn't.
  • Create an appeal flow. One click, 48-hour turnaround, human review. Communicated clearly.
  • Publish your content quality standards explicitly. "We value specificity, personal experience, and concrete examples" gives users something to aim for.

Don't:

  • Show users their score. You'll just train the bad actors to optimize for your metrics.
  • Apply the same thresholds to all content types. A job listing has different norms than a technical tutorial.
  • Treat this as a solved problem. The models get better; your filter needs to evolve with them.

Putting It Together

The practical reality: no single tool catches everything, and chasing 100% accuracy will cost you more in false positives than you gain in slop removal. Aim for 80% catch rate with a low false-positive rate, build human review into the loop, and iterate.

Start with the heuristics, add one commercial API for calibration, build the three-tier queue, and create the feedback loop. You'll have something useful in production within two weeks.

I compiled everything into a practical guide: AI Content Filter: The Practitioner's Playbook
