When a spam title outranks your best documentary
Every night a cron job pulls a few thousand fresh video records from the YouTube Data API into DailyWatch, the free video discovery platform I run. Each record is the usual shape: title, description, channel name, tags, a thumbnail URL. On paper it's clean JSON. In practice, three to five percent of it is garbage — engagement-bait titles in ALL CAPS, descriptions stuffed with forty hashtags and three Telegram links, reupload channels impersonating real creators, and the occasional clip whose title is perfectly safe but whose description very much is not.
SQLite's FTS5 index does not care about any of that. It will happily tokenize 🔥🔥 FREE $$$ CLAIM NOW 🔥🔥 and rank it right next to a legitimate documentary. Discovery is the entire product here, so a poisoned index is an existential problem, not a cosmetic one. This post is how I built the moderation stage that sits between ingestion and indexing: a small Python worker that asks the Claude API for a structured verdict on each record, and the PHP query gate that enforces those verdicts at read time.
Why a keyword blocklist wasn't enough
I started where everyone starts: a regex blocklist. It fails in the same two directions for everyone who tries it.
False negatives, because spammers rotate obfuscations faster than you can enumerate them. free becomes f r e e, then fr€e, then the same word with a zero-width joiner spliced into the middle. Maintaining that pattern file is a second unpaid job, and you are always one step behind.
False positives, because substring matching has no idea what a phrase means. A documentary called The Free Speech Wars gets nuked by a naive match on a payment scam keyword. A cooking channel literally named The Naked Chef trips the adult filter. Context is the whole game, and a blocklist has exactly none of it.
A language model reads The Naked Chef as a cooking brand and send $500 to this Telegram to claim your prize as a scam without me writing a single rule for either. That is the trade I wanted: stop maintaining an arms race of patterns, and start describing a policy in plain English once.
Shape of the pipeline
The design goal was that the two halves never talk to each other directly — they're decoupled through the database:
-
Ingest. The fetch cron writes raw YouTube metadata into a
videostable withmoderation_statusleftNULL. -
Moderate. A Python worker pulls
NULLrows in batches, sends each to Claude, and writes back a verdict. -
Index. Only rows with
moderation_status = 'allow'get copied into the FTS5 virtual table. - Serve. The PHP front end joins every search hit against the verdict column, so blocked content stays invisible even if it slipped into the index during an earlier build.
The important property of that layout is failure isolation. Python never calls PHP. If the moderation worker is down, ingestion keeps running and rows simply pile up with NULL status — they just don't get indexed yet. Nothing user-facing breaks, because discovery only ever reads vetted rows. An outage costs you freshness, never correctness.
Here's the read side of the worker. It grabs a batch of unmoderated rows and hydrates them into a dataclass:
import sqlite3
from dataclasses import dataclass
DB_PATH = '/var/www/dailywatch/data/videos.db'
@dataclass
class VideoRecord:
id: str
title: str
description: str
channel_title: str
tags: str
def fetch_unmoderated(limit: int = 200) -> list[VideoRecord]:
conn = sqlite3.connect(DB_PATH)
conn.row_factory = sqlite3.Row
rows = conn.execute(
'''
SELECT id, title, description, channel_title, tags
FROM videos
WHERE moderation_status IS NULL
ORDER BY fetched_at DESC
LIMIT ?
''',
(limit,),
).fetchall()
conn.close()
return [VideoRecord(**dict(r)) for r in rows]
The WHERE moderation_status IS NULL clause is doing quiet double duty: it selects work, and it's also what makes the whole worker idempotent, which matters later.
Getting structured verdicts out of the model
The naive way to call an LLM for classification is to ask it to "reply with JSON" and then parse the reply. Don't. You end up writing defensive code to strip markdown fences, handle a stray Sure, here's the JSON: preamble, and cope with a hallucinated field. The Claude API has a cleaner mechanism: define a tool with a JSON schema and force the model to call it. The response comes back already shaped, already type-constrained, no parsing.
import os
from anthropic import Anthropic
client = Anthropic(api_key=os.environ['ANTHROPIC_API_KEY'])
MODEL = 'claude-haiku-4-5-20251001'
MODERATION_TOOL = {
'name': 'record_moderation',
'description': "Record the moderation verdict for one video's metadata.",
'input_schema': {
'type': 'object',
'properties': {
'verdict': {
'type': 'string',
'enum': ['allow', 'review', 'block'],
'description': 'allow = safe to index; review = hold for a human; block = never surface.',
},
'categories': {
'type': 'array',
'items': {
'type': 'string',
'enum': ['spam', 'scam', 'adult', 'violence', 'hate', 'impersonation', 'misleading'],
},
},
'confidence': {'type': 'number'},
'reason': {'type': 'string'},
},
'required': ['verdict', 'categories', 'confidence', 'reason'],
},
}
SYSTEM = (
'You are a content moderation classifier for a general-audience video '
'discovery site. Judge only the supplied metadata (title, description, '
'tags), not the video itself. Be strict about scams and adult content, '
'lenient about strong language. Always call record_moderation exactly once.'
)
def moderate(video) -> dict:
prompt = (
f'Title: {video.title}\n'
f'Channel: {video.channel_title}\n'
f'Tags: {video.tags}\n'
f'Description:\n{video.description[:2000]}'
)
msg = client.messages.create(
model=MODEL,
max_tokens=512,
system=SYSTEM,
tools=[MODERATION_TOOL],
tool_choice={'type': 'tool', 'name': 'record_moderation'},
messages=[{'role': 'user', 'content': prompt}],
)
for block in msg.content:
if block.type == 'tool_use' and block.name == 'record_moderation':
return block.input
raise RuntimeError(f'No verdict for {video.id}')
Two choices in there are worth calling out. First, tool_choice={'type': 'tool', 'name': 'record_moderation'} forces the call — the model cannot reply with prose, so block.input is guaranteed to match the schema. Second, the three-way verdict enum. A binary allow/block is a trap, because the interesting cases are the borderline ones, and auto-blocking them means you never see them. The review bucket routes ambiguous records to a human queue, and that queue is where I actually learn how to tune the rubric.
I deliberately use Haiku here. Moderation against a tight rubric is a classification task, not open-ended reasoning, so the cheapest tier handles it at the throughput I need. I keep the system prompt and the tool schema byte-for-byte stable across the run, which lets prompt caching kick in — every request after the first reuses that cached prefix at a fraction of the input token rate, and since the system block and tool definition dominate my input, most of the batch is cache hits.
Running it as a batch, safely
This job runs at 03:00 and nobody is waiting on it, so it's a batch, not a service. The bottleneck is API round-trip latency rather than CPU, which means a modest thread pool gets me most of the throughput without any async ceremony. Results go back in a single executemany transaction:
import concurrent.futures as cf
import sqlite3
def persist(results: list[tuple[str, dict]]) -> None:
conn = sqlite3.connect(DB_PATH)
conn.executemany(
'''
UPDATE videos
SET moderation_status = ?,
moderation_categories = ?,
moderation_confidence = ?,
moderated_at = strftime('%s', 'now')
WHERE id = ?
''',
[
(r['verdict'], ','.join(r['categories']), r['confidence'], vid)
for vid, r in results
],
)
conn.commit()
conn.close()
def run_batch(limit: int = 200) -> None:
videos = fetch_unmoderated(limit)
results: list[tuple[str, dict]] = []
with cf.ThreadPoolExecutor(max_workers=8) as pool:
futures = {pool.submit(moderate, v): v for v in videos}
for fut in cf.as_completed(futures):
v = futures[fut]
try:
results.append((v.id, fut.result()))
except Exception as exc:
print(f'skip {v.id}: {exc}')
persist(results)
print(f'moderated {len(results)}/{len(videos)}')
if __name__ == '__main__':
run_batch()
Idempotency falls out of the schema for free. A crashed run leaves the un-processed rows with moderation_status still NULL, so the next invocation picks them up. An individual API failure is caught, logged, and skipped — same outcome, that one row retries tomorrow. At a few thousand records a night I don't need a dead-letter queue or a retry table; the NULL sentinel is the queue.
Because the nightly run isn't latency-bound, I actually push the bulk of it through the Message Batches API rather than these synchronous calls — identical prompts, results back within the hour, at half the per-token price. The threaded version above is what I run for small intraday catch-up batches when a fetch lands during the day. Both paths call the same persist(), so there's one write path regardless of how the verdicts were produced.
Wiring the verdict into search
The front end is PHP 8.4 talking to SQLite over PDO, with an FTS5 virtual table for full-text search. The moderation gate is a single join condition — everything else is ordinary bm25-ranked search:
<?php
declare(strict_types=1);
final class VideoSearch
{
public function __construct(private readonly PDO $db) {}
/** @return array<int,array<string,mixed>> */
public function search(string $query, int $limit = 30): array
{
$stmt = $this->db->prepare(<<<SQL
SELECT v.id, v.title, v.channel_title, v.thumbnail_url
FROM videos_fts f
JOIN videos v ON v.id = f.rowid
WHERE f.videos_fts MATCH :q
AND v.moderation_status = 'allow'
ORDER BY bm25(f.videos_fts)
LIMIT :limit
SQL);
$stmt->bindValue(':q', $this->sanitize($query));
$stmt->bindValue(':limit', $limit, PDO::PARAM_INT);
$stmt->execute();
return $stmt->fetchAll(PDO::FETCH_ASSOC);
}
private function sanitize(string $raw): string
{
// Quote each token so FTS5 treats punctuation literally.
$tokens = preg_split('/\s+/', trim($raw)) ?: [];
$safe = array_map(
static fn(string $t): string => '"' . str_replace('"', '', $t) . '"',
array_filter($tokens),
);
return implode(' ', $safe);
}
}
The load-bearing line is AND v.moderation_status = 'allow'. Even though the indexer only ever copies allow rows into videos_fts, re-checking the live status column at query time is deliberate belt-and-suspenders. If I flip a row to block from the admin panel — say a human working the review queue catches something — it disappears from results on the very next request, no reindex required. The verdict is enforced where it's read, not just where it's written.
One small index keeps the moderation lookups cheap. A partial index only covers rows that have actually been judged, which is most of the table over time but skips the fresh NULL backlog:
CREATE INDEX IF NOT EXISTS idx_videos_moderation
ON videos (moderation_status)
WHERE moderation_status IS NOT NULL;
There's a happy side effect from the rest of my stack. Because a blocked row never renders into a page, it never enters the LiteSpeed page cache and never gets pushed to Cloudflare's edge. The moderation decision propagates through every caching layer I already run without a single explicit purge — the cheapest cache invalidation is the page that was never built.
What I'd tell my past self
- Describe policy, don't enumerate patterns. A regex blocklist is technical debt from the first commit, and the maintenance never ends.
-
Force structured output with
tool_choice. Never parse prose out of an LLM when the API can hand you a schema-validated object instead. -
Decouple through the database. A moderation outage should stall indexing, never ingestion. A
NULLsentinel column is a perfectly good work queue at this scale. - Keep the prompt byte-stable and lean on caching. A frozen system prompt plus tool schema turns almost every request into a cache hit, and the Batch API halves the bill for anything that isn't latency-bound.
- Re-check the verdict at query time, not just at index time. Policy changes shouldn't require a reindex to take effect.
-
Route borderline cases to a human, don't auto-block them. The
reviewqueue is where you discover what your rubric is actually missing.
The whole moderation layer is maybe 150 lines of Python and one extra WHERE clause of PHP, and it replaced a regex file I was quietly afraid to touch. Metadata moderation is not a solve-once problem — spammers adapt, and so must the rubric — but when the policy lives in a paragraph of a system prompt instead of a thousand-line pattern list, adapting means editing a sentence, not chasing Unicode homoglyphs at midnight.
Top comments (0)