ahmet gedik

Posted on Jun 22

Building a Video Metadata Moderation Pipeline with Python and Claude API

#python #moderation #claude #backend

Ingesting tens of thousands of video records a day means you never get to assume the metadata is clean. Titles arrive in a dozen languages. Some are emoji walls engineered for clicks, some are spam redirects pointing at sketchy off-platform links, and some describe content that has no business on a family-friendly discovery homepage. At DailyWatch we pull public video metadata, store it in SQLite with an FTS5 index, and serve it through a PHP 8.4 stack behind LiteSpeed and Cloudflare. Fetching the data was never the hard part. The hard part was deciding — at volume, and without lighting money on fire — which records are safe to surface.

We tried the obvious thing first: keyword blocklists and regex. It failed in both directions, and that failure is what pushed us toward a layered pipeline with the Claude API doing the judgment calls. Here is the whole thing, including the parts that broke.

Why rule-based filtering falls apart

A blocklist is seductive because it is cheap and deterministic. It is also wrong constantly:

False positives nuke legitimate content. A blocklist that catches a slur substring also kills "Scunthorpe," cooking videos containing "breast," and medical explainers. We lost an entire category of legitimate health content to a 40-line regex before anyone noticed.
False negatives are trivial to slip past. Spammers do not write banned words plainly. They use leetspeak, zero-width joiners, homoglyphs, and language switching mid-title. Regex chases this forever and never wins.
Context is invisible to patterns. "How to field dress a deer" and an actual gore video share vocabulary. A title alone cannot separate them; you need to weigh title, description, channel, and category together.

What we actually needed was a system that reads metadata the way a human reviewer would — holistically, with context — but runs on every record automatically. That is a language-model job. The trick is doing it without firing a paid API request at all 30k daily records.

Pipeline shape

The pipeline has three stages, ordered cheapest-first so the expensive model only ever sees what the cheap layers cannot resolve:

Deterministic pre-filter (free, Python). Heuristics that catch the obvious garbage and, crucially, auto-approve the obviously fine. Roughly 70% of records never reach the API.
LLM adjudication (Claude API, batched). The ambiguous remainder goes to Claude with a structured tool-call schema, so we get machine-parseable verdicts instead of prose.
Persistence and serving (PHP/SQLite, with a Go fast path). Verdicts land in SQLite alongside the FTS5 index, and the serving layer filters on a single indexed column.

Each stage can short-circuit. The goal is to spend tokens only on genuine ambiguity.

Stage 1 — the free pre-filter

Before any token is spent, a Python pass classifies records into approve, reject, or review. Only review continues. This stage is dumb on purpose — it just needs high precision on the extremes.

import re
import unicodedata
from dataclasses import dataclass
from enum import Enum

class Verdict(str, Enum):
    APPROVE = "approve"
    REJECT = "reject"
    REVIEW = "review"

@dataclass
class Video:
    video_id: str
    title: str
    description: str
    channel: str
    category: str

# Hard reject: known spam-link patterns in the description.
SPAM_LINK = re.compile(
    r"(bit\.ly|tinyurl|t\.me/|join now|free v[i1]p|whatsapp\s*\+?\d)",
    re.IGNORECASE,
)
# Emoji / symbol density is a strong clickbait signal.
EMOJI = re.compile(
    "[\U0001F000-\U0001FAFF\U00002600-\U000027BF\U0001F1E6-\U0001F1FF]"
)

def normalize(text: str) -> str:
    # Strip zero-width format chars, then fold homoglyphs toward ASCII.
    text = "".join(c for c in text if unicodedata.category(c) != "Cf")
    return unicodedata.normalize("NFKC", text)

def emoji_ratio(text: str) -> float:
    if not text:
        return 0.0
    return len(EMOJI.findall(text)) / max(len(text), 1)

def pre_filter(v: Video) -> Verdict:
    title = normalize(v.title)
    desc = normalize(v.description)

    if SPAM_LINK.search(desc):
        return Verdict.REJECT
    if emoji_ratio(title) > 0.25:
        return Verdict.REJECT
    if not title.strip() or len(title) < 3:
        return Verdict.REJECT

    # Obviously safe: clean title, known-good category, no red flags.
    safe_cats = {"Music", "Sports", "Education", "Science & Technology"}
    if v.category in safe_cats and emoji_ratio(title) < 0.05 and len(desc) < 5000:
        return Verdict.APPROVE

    return Verdict.REVIEW

The normalize step matters more than the regex does. Folding Unicode with NFKC and dropping format characters (category Cf) defeats most homoglyph and zero-width tricks before any pattern even runs. Everything that survives as REVIEW is genuinely ambiguous, and that ambiguous slice is the only thing we pay for.

Stage 2 — adjudicating with the Claude API

For the ambiguous set, we want a verdict we can trust and parse. Two decisions made this reliable.

Force a tool call instead of free text. A schema-constrained tool returns a typed object every time — no regex-parsing the model's prose, no "Sure, here is my analysis" preamble.
Cache the system prompt. The policy definition is long and identical on every request. With prompt caching we write it once and read it from cache on every subsequent call, which on a moderation workload is exactly where the savings live.

import os
from anthropic import Anthropic

client = Anthropic(api_key=os.environ["ANTHROPIC_API_KEY"])
MODEL = "claude-haiku-4-5"

POLICY = """You are a content moderation classifier for a family-friendly
video discovery platform. Judge each video using its title, description,
channel, and category TOGETHER, never in isolation.

Reject when content is: sexual, graphic violence or gore, hate or
harassment, self-harm promotion, illegal-goods sales, or scam/spam.
Approve mainstream entertainment, music, gaming, education, news, sports.
When genuinely uncertain, prefer 'review' over guessing.
"""

VERDICT_TOOL = {
    "name": "record_verdict",
    "description": "Record the moderation verdict for one video.",
    "input_schema": {
        "type": "object",
        "properties": {
            "verdict": {"type": "string", "enum": ["approve", "reject", "review"]},
            "category": {
                "type": "string",
                "enum": ["safe", "sexual", "violence", "hate",
                         "self_harm", "illegal", "spam", "uncertain"],
            },
            "confidence": {"type": "number", "minimum": 0, "maximum": 1},
            "reason": {"type": "string", "maxLength": 200},
        },
        "required": ["verdict", "category", "confidence", "reason"],
    },
}

def adjudicate(v: dict) -> dict:
    prompt = (
        f"Title: {v['title']}\n"
        f"Channel: {v['channel']}\n"
        f"Category: {v['category']}\n"
        f"Description: {v['description'][:1500]}"
    )
    msg = client.messages.create(
        model=MODEL,
        max_tokens=300,
        system=[{
            "type": "text",
            "text": POLICY,
            "cache_control": {"type": "ephemeral"},  # cache the long policy
        }],
        tools=[VERDICT_TOOL],
        tool_choice={"type": "tool", "name": "record_verdict"},
        messages=[{"role": "user", "content": prompt}],
    )
    block = next(b for b in msg.content if b.type == "tool_use")
    return block.input

tool_choice forces the model to call record_verdict, so msg.content always contains a tool_use block with a validated shape. The cache_control marker on the policy block is what turns this from an expensive experiment into a sane line item — the policy text is served from cache instead of being reprocessed on every record.

One more thing that earns its keep: a confidence threshold. We do not auto-act on low-confidence verdicts. Anything under roughly 0.8 is routed to a human queue. The model is explicitly allowed to say "I am not sure," and we listen when it does.

def decide(v: dict) -> str:
    result = adjudicate(v)
    if result["confidence"] < 0.8 or result["verdict"] == "review":
        return "human_queue"
    return result["verdict"]

Stage 3 — persisting verdicts in SQLite

Verdicts live in the same SQLite database that powers search, right next to the FTS5 index. The serving layer never recomputes anything; it reads one indexed column.

<?php
declare(strict_types=1);

final class ModerationStore
{
    public function __construct(private readonly PDO $db) {}

    public function migrate(): void
    {
        $this->db->exec("ALTER TABLE videos ADD COLUMN mod_verdict TEXT DEFAULT 'review'");
        $this->db->exec("ALTER TABLE videos ADD COLUMN mod_category TEXT");
        $this->db->exec("ALTER TABLE videos ADD COLUMN mod_confidence REAL");
        $this->db->exec("ALTER TABLE videos ADD COLUMN mod_checked_at INTEGER");

        // Partial index keeps the hot "approved only" lookup tiny.
        $this->db->exec(
            "CREATE INDEX IF NOT EXISTS idx_mod_approved
             ON videos (mod_verdict) WHERE mod_verdict = 'approve'"
        );
    }

    public function saveVerdict(string $videoId, array $v): void
    {
        $stmt = $this->db->prepare(
            "UPDATE videos
                SET mod_verdict    = :verdict,
                    mod_category   = :category,
                    mod_confidence = :confidence,
                    mod_checked_at = :ts
              WHERE video_id = :id"
        );
        $stmt->execute([
            ':verdict'    => $v['verdict'],
            ':category'   => $v['category'],
            ':confidence' => $v['confidence'],
            ':ts'         => time(),
            ':id'         => $videoId,
        ]);
    }
}

The partial index is the detail that makes serving cheap. Because the homepage only ever queries WHERE mod_verdict = 'approve', an index restricted to that single value stays small and hot even as the table grows into the millions of rows. The FTS5 search query joins against the same column, so moderation collapses into one extra AND mod_verdict = 'approve' clause — no separate filtering pass, no second table.

The hot path — a Go fallback guard

LiteSpeed serves cached pages, but cache misses still hit PHP, and during a cron-triggered ingest we occasionally serve freshly fetched records before the async pipeline has tagged them. For that window we keep a tiny Go sidecar that re-applies the Stage-1 heuristics in-process, so an un-moderated record is never served raw.

package moderation

import (
    "regexp"
    "strings"
)

var spamLink = regexp.MustCompile(`(?i)(bit\.ly|tinyurl|t\.me/|free v[i1]p)`)

// SafeToServe is a last-resort guard for records that reached the
// serving layer before the async pipeline tagged them.
func SafeToServe(title, desc, verdict string) bool {
    switch verdict {
    case "approve":
        return true
    case "reject", "human_queue":
        return false
    }

    // verdict == "review" or empty: apply conservative heuristics.
    if spamLink.MatchString(desc) {
        return false
    }
    if emojiRatio(title) > 0.25 || len(strings.TrimSpace(title)) < 3 {
        return false
    }
    return false // fail closed: an unresolved verdict is never auto-served
}

func emojiRatio(s string) float64 {
    runes := []rune(s)
    if len(runes) == 0 {
        return 0
    }
    count := 0
    for _, r := range runes {
        if r >= 0x1F000 && r <= 0x1FAFF {
            count++
        }
    }
    return float64(count) / float64(len(runes))
}

The important line is the final return false. The guard fails closed: anything it is not certain about is withheld, never shown. Withholding a borderline-fine video costs us almost nothing; serving a genuinely bad one costs us trust with users and ad partners alike. When in doubt, the safe default is invisible.

Running it as an async batch worker

The three stages do not run inline with ingestion, and that separation is deliberate. Our cron job fetches video metadata, writes raw rows into SQLite with a default mod_verdict of review, and returns immediately. A separate worker process then drains the review backlog at its own pace. Decoupling ingestion from moderation means a slow or rate-limited API call never blocks the fetch loop, and a backlog spike just means verdicts land a few minutes later rather than the ingest grinding to a halt.

Throughput came down to two knobs. The first is concurrency: the adjudication calls are I/O-bound, so we run a bounded pool of workers rather than firing requests serially. A semaphore caps in-flight requests so we stay under the API rate limit while keeping the pipe full. The second is ordering — we adjudicate newest records first, because those are the ones most likely to be requested while still warm, and we let the long tail of older review rows trickle through behind them.

A few practical guards live in the worker loop itself:

Retry with backoff on transient errors. Rate-limit and overload responses are retried with exponential backoff; a record that fails repeatedly is parked in the human queue rather than dropped, so nothing silently vanishes.
Bounded description length. We truncate descriptions to 1500 characters before sending them. Past that point the signal for a moderation decision is gone, and the extra tokens are pure waste on every single call.
Structured logging of every verdict. Each decision is logged with the video_id, verdict, category, and confidence. When someone asks "why was this video hidden," the answer is a single indexed log lookup, not a re-run of the model.

The worker is genuinely boring, and that is the point. All the interesting judgment lives in the policy prompt and the schema; the loop around it just needs to be reliable, observable, and easy to pause when we are tuning the policy.

Cost, batching, and the things that bit us

A few hard-won operational notes from running this in production:

The pre-filter ratio is the whole ballgame. Pushing the deterministic stage from "catches obvious spam" to "also auto-approves obvious safe content" cut our API volume by about 70%. Tune the cheap layer before you optimize the expensive one.
Prompt caching is not optional at this scale. The policy block is the largest part of each request and never changes between calls. Caching it is the difference between a workable budget and a comical one.
Confidence thresholds prevent silent disasters. Auto-acting only above 0.8 and queuing the rest for humans meant a bad policy edit degraded gracefully into a longer review queue instead of a wave of wrongly-rejected videos.
Re-moderate on policy change, not on every fetch. mod_checked_at lets us re-run only records older than the latest policy version, so tweaking the policy does not mean re-paying for the entire catalog.
Idempotency keyed on video_id. Re-ingesting the same video must not re-spend a token when its verdict is still fresh. The pipeline checks mod_checked_at before queuing anything for the API at all.

Conclusion

The pipeline that actually works is not "send everything to a language model." It is a funnel: deterministic heuristics dispatch the obvious cases for free, the Claude API adjudicates only the genuine ambiguity with a schema-constrained tool call and a cached policy, and the serving layer reads a single indexed verdict column that fails closed on anything uncertain. That layering is what makes language-model-grade judgment affordable on a free product. If you run anything that ingests untrusted metadata at volume, resist the urge to either over-trust regex or over-spend on the model — put the cheap layer first, force structured output, cache the constant parts, and always fail closed. That combination has kept our homepage clean without a human watching every record.

DEV Community