Content Moderation API: Screen User Uploads Beyond Nudity

#api #python #security #webdev

The moment your product lets users upload images (profile photos, listing pictures, chat attachments, review photos), you inherit a Trust and Safety problem. Someone will upload something that breaks your policy, and if it reaches other users before you catch it, you have a brand incident and sometimes a legal one. Manual review does not scale past a trickle of uploads. A content moderation API screens every image automatically and leaves only the genuinely ambiguous ones for a human.

This post builds that flow: one call to score an image across the risk categories, a per-category policy engine that turns scores into an allow / review / block decision, and a concurrent batch pattern for real platform volume. All code is Python and runs against a live API.

Want to test it on your own uploads? Try the content moderation API on a sample image.

Moderation is more than a nudity filter

A lot of "NSFW detection" content treats moderation as one yes/no question about nudity. Real platform policy is broader. Weapons and gore, drug paraphernalia, hate symbols, and gambling promotions all break the rules on most platforms, and none of them are nudity. The API returns labels across a broad set of categories (nudity, suggestive content, violence, visually disturbing imagery, drugs and tobacco, alcohol, gambling, hate symbols, rude gestures), each with a confidence score.

That breadth lets your policy distinguish a weapon from a wine bottle instead of lumping everything into "bad."

Step 1: Score an image

Send an image (file or public URL) and get back the labels the model is confident about. A clean image returns an empty list:

import requests

HEADERS = {
    "x-rapidapi-key": "YOUR_API_KEY",
    "x-rapidapi-host": "nsfw-detect3.p.rapidapi.com",
}

def moderate(image_path):
    """Return the moderation labels for an image (empty list means clean)."""
    with open(image_path, "rb") as f:
        r = requests.post(
            "https://nsfw-detect3.p.rapidapi.com/nsfw-detect",
            headers=HEADERS,
            files={"image": f},
            timeout=30,          # without a timeout, a stalled request blocks forever
        )
    r.raise_for_status()         # surface HTTP errors instead of parsing an error body
    return r.json()["body"]["ModerationLabels"]

Each label has a Name, a Confidence (0 to 100), and a ParentName. Labels are hierarchical and can be a few levels deep (a specific drug label sits under a broader Drugs & Tobacco category). Top-level categories are the ones with an empty ParentName. The reliable way to build a policy is to first see which top-level categories the API actually returns on your content:

labels = moderate("sample_upload.jpg")
categories = [label["Name"] for label in labels if not label["ParentName"]]
print(categories)
# ['Gambling']          on a poker photo
# ['Drugs & Tobacco']   on a pills photo
# []                    on a clean photo
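
To see how nested labels relate to their top-level category, the sketch below walks the ParentName links upward. It is a minimal illustration, not part of the API: the helper names `top_level_of` and `group_by_category` are hypothetical, the sample labels are hand-written (the nested names "Products" and "Pills" are assumptions, not confirmed API output), and it assumes parent labels appear in the same response as their children.

```python
def top_level_of(label, by_name):
    """Follow ParentName links upward and return the highest ancestor's name."""
    name, parent = label["Name"], label.get("ParentName")
    seen = {name}
    while parent and parent not in seen:   # 'seen' guards against a cyclic response
        seen.add(parent)
        name = parent
        # If the parent is missing from the response, the walk stops here.
        parent = by_name.get(parent, {}).get("ParentName")
    return name


def group_by_category(labels):
    """Group label names under the top-level category they belong to."""
    by_name = {label["Name"]: label for label in labels}
    groups = {}
    for label in labels:
        groups.setdefault(top_level_of(label, by_name), []).append(label["Name"])
    return groups


# Hand-written labels for illustration only, not real API output.
sample_labels = [
    {"Name": "Drugs & Tobacco", "Confidence": 97.1, "ParentName": ""},
    {"Name": "Products", "Confidence": 97.1, "ParentName": "Drugs & Tobacco"},
    {"Name": "Pills", "Confidence": 97.1, "ParentName": "Products"},
]
print(group_by_category(sample_labels))
# {'Drugs & Tobacco': ['Drugs & Tobacco', 'Products', 'Pills']}
```

Printing the groups for a handful of your own uploads is a quick way to check which nested labels sit under each category before you write policy rules for them.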

Step 2: Turn scores into a policy decision

The labels are not a decision. A wine marketplace allows alcohol; a kids' app blocks it. Your policy lives in a small table that maps each top-level category to an action and a threshold:

# Use the exact Names the API returns for your content (see Step 1).
# A category absent from the table is allowed.
POLICY = {
    "Explicit":                                          {"action": "block",  "threshold": 60},
    "Non-Explicit Nudity of Intimate parts and Kissing": {"action": "block",  "threshold": 60},
    "Violence":                                          {"action": "block",  "threshold": 70},
    "Hate Symbols":                                      {"action": "block",  "threshold": 50},
    "Drugs & Tobacco":                                   {"action": "block",  "threshold": 70},
    "Swimwear or Underwear":                             {"action": "review", "threshold": 60},
    "Gambling":                                          {"action": "review", "threshold": 80},
    # Alcohol is not listed, so it is allowed (e.g. a wine marketplace)
}

def decide(labels):
    """Map moderation labels to a verdict: allow, review, or block."""
    verdict, reasons = "allow", []
    for label in labels:
        if label["ParentName"]:          # match on top-level categories only
            continue
        rule = POLICY.get(label["Name"])
        if not rule or label["Confidence"] < rule["threshold"]:
            continue
        reasons.append(f"{label['Name']} ({label['Confidence']:.0f}%)")
        if rule["action"] == "block":
            verdict = "block"
        elif verdict != "block":
            verdict = "review"
    return verdict, reasons


print(decide(moderate("clean_photo.jpg")))    # ('allow', [])
print(decide(moderate("wine_listing.jpg")))   # ('allow', [])  Alcohol is not in the policy
print(decide(moderate("poker_table.jpg")))    # ('review', ['Gambling (99%)'])

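Three ideas make this work at scale. Match on the top-level category (empty ParentName) so one rule covers every label nested beneath it; this assumes the API returns the top-level label alongside its nested labels. Set thresholds per category, because the cost of a miss is not the same for hate symbols as for gambling. And use three outcomes, not two: block clear violations, allow clean images, and route the uncertain middle to a human.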
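
The same labels can produce different verdicts under different policies. The sketch below makes that concrete with a variant of `decide` that takes the policy as an argument. The name `decide_with`, the two policy tables, and the labels are all hypothetical and hand-written for illustration; nothing here calls the API.

```python
def decide_with(policy, labels):
    """Same logic as decide(), but the policy table is passed in."""
    verdict, reasons = "allow", []
    for label in labels:
        if label.get("ParentName"):        # top-level categories only
            continue
        rule = policy.get(label["Name"])
        if not rule or label["Confidence"] < rule["threshold"]:
            continue
        reasons.append(f"{label['Name']} ({label['Confidence']:.0f}%)")
        if rule["action"] == "block":
            verdict = "block"
        elif verdict != "block":
            verdict = "review"
    return verdict, reasons


# Hypothetical policies: thresholds are illustrative, not recommendations.
WINE_MARKETPLACE = {"Gambling": {"action": "review", "threshold": 80}}
KIDS_APP = {
    "Alcohol": {"action": "block", "threshold": 50},
    "Gambling": {"action": "block", "threshold": 50},
}

# Hand-written labels, not real API output.
labels = [
    {"Name": "Alcohol", "Confidence": 92.0, "ParentName": ""},
    {"Name": "Gambling", "Confidence": 75.0, "ParentName": ""},
]

print(decide_with(WINE_MARKETPLACE, labels))  # ('allow', [])
print(decide_with(KIDS_APP, labels))          # ('block', ['Alcohol (92%)', 'Gambling (75%)'])
```

Passing the policy in, rather than reading a module-level table, also makes it straightforward to unit test each policy against a fixed set of labels.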

Curious which categories fire on your content? Run the API on a sample and print the top-level labels.

Step 3: Screen uploads at scale

A live platform does not moderate one image at a time. Screen a batch concurrently and act on each verdict as it lands:

from concurrent.futures import ThreadPoolExecutor, as_completed

def screen(upload):
    """upload: dict with id and local path. Returns the moderation verdict."""
    try:
        verdict, reasons = decide(moderate(upload["path"]))
    except Exception as e:
        # On error, fail safe to review rather than letting content through
        verdict, reasons = "review", [f"moderation error: {e}"]
    return {"id": upload["id"], "verdict": verdict, "reasons": reasons}

def screen_batch(uploads, max_workers=10):
    """Screen uploads concurrently and group the results by verdict."""
    buckets = {"block": [], "review": [], "allow": []}
    with ThreadPoolExecutor(max_workers=max_workers) as pool:
        futures = [pool.submit(screen, u) for u in uploads]
        for fut in as_completed(futures):      # results arrive in completion order
            result = fut.result()
            buckets[result["verdict"]].append(result)
    return buckets["block"], buckets["review"], buckets["allow"]

Publish the allowed bucket, hold the blocked bucket, and push the review bucket to your moderation queue. The fail-safe to review on error matters: when the API call fails, you want the image held for a human, not published unchecked.
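
The fail-safe is easy to check offline. The sketch below swaps the real API call for a stub that fails on one file, so you can confirm the failing upload lands in review rather than allow. Everything here is hypothetical: `fake_moderate` and `screen_with` are illustrative stand-ins, and the verdict logic is deliberately simplified instead of using the full policy table.

```python
from concurrent.futures import ThreadPoolExecutor


def fake_moderate(path):
    """Stand-in for the API call: fails for one path, returns no labels otherwise."""
    if path == "broken.jpg":
        raise TimeoutError("simulated timeout")
    return []


def screen_with(moderate_fn, upload):
    """Like screen(), but the moderation function is injected so it can be stubbed."""
    try:
        labels = moderate_fn(upload["path"])
        # Simplified stand-in for decide(): any label at all goes to review.
        verdict, reasons = ("review", ["labels present"]) if labels else ("allow", [])
    except Exception as e:
        # Same fail-safe as above: an error never results in "allow".
        verdict, reasons = "review", [f"moderation error: {e}"]
    return {"id": upload["id"], "verdict": verdict, "reasons": reasons}


uploads = [{"id": 1, "path": "ok.jpg"}, {"id": 2, "path": "broken.jpg"}]
with ThreadPoolExecutor(max_workers=2) as pool:
    results = {r["id"]: r for r in pool.map(lambda u: screen_with(fake_moderate, u), uploads)}

print(results[1]["verdict"], results[2]["verdict"], results[2]["reasons"])
# allow review ['moderation error: simulated timeout']
```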

Honest limits

A moderation API is a strong first line, not the whole system. It scores pixels, not intent, so the same image can be a violation or legitimate news depending on context the model cannot see (that is what the review queue is for). Thresholds are a product decision: stricter thresholds catch more violations but annoy legitimate users with false positives. Policy also varies by region, and images are only one surface: text, video, and audio need their own moderation.
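
One way to ground the threshold decision is to sweep candidate thresholds over a small sample that humans have already judged. The sketch below counts false positives and missed violations at each threshold for a single category. The sample data is made up for illustration, and `sweep` is a hypothetical helper; in practice you would use confidences from your own uploads paired with your reviewers' verdicts.

```python
# (confidence for one category, human judgement: True = real violation)
# Made-up numbers for illustration only.
SAMPLE = [(95, True), (82, True), (74, False), (66, True), (58, False), (41, False)]


def sweep(sample, thresholds):
    """Return (threshold, false_positives, missed_violations) for each threshold."""
    rows = []
    for t in thresholds:
        # Flagged but judged clean by a human.
        false_pos = sum(1 for conf, violation in sample if conf >= t and not violation)
        # Real violations that scored below the threshold.
        missed = sum(1 for conf, violation in sample if conf < t and violation)
        rows.append((t, false_pos, missed))
    return rows


for threshold, false_pos, missed in sweep(SAMPLE, [50, 70, 90]):
    print(f"threshold {threshold}: {false_pos} false positives, {missed} missed")
# threshold 50: 2 false positives, 0 missed
# threshold 70: 1 false positives, 1 missed
# threshold 90: 0 false positives, 2 missed
```

A lower threshold flags more images, which here means more false positives and fewer misses; which end of that trade-off you pick depends on the category and the platform.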