Roger Rosset
Adding Content Moderation to a SvelteKit App with OpenAI's Moderation API

When you build a platform where users can submit free-text content, it's only a matter of time before someone tries to post something nasty. My project, AutoFeedback, a European car review platform built with SvelteKit and deployed on Cloudflare Pages, was no exception. I needed a way to catch harmful content before it ever reached the database, without over-blocking legitimate (even strongly-worded) car criticism.

Here's how I did it in under 70 lines of server-side code.

[Screenshot: the review form]

Why Not Just Block Bad Words?

The first instinct most developers have, myself included, is to build a keyword blocklist. Maintain a list of slurs, flag anything that matches. Simple, right?

In practice, it falls apart fast. Consider hate speech: it's not just a list of banned words. Hate speech relies on context, phrasing, and intent. Someone might write "people like that shouldn't be allowed to drive": no slurs, no flagged keywords, yet clearly hateful in context. Or they might use coded language, deliberate misspellings, or Unicode tricks to bypass your list. You'd be playing an endless game of whack-a-mole, maintaining an ever-growing blocklist across multiple languages (AutoFeedback supports six), and still missing things.

And then there's the flip side: false positives. A car review that says "this car is a killer deal" or "the acceleration is insane, it absolutely murders the competition" is perfectly legitimate, but a naive keyword filter would flag both. On a car review site, people use strong, emotional language all the time. That's the whole point.
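To make both failure modes concrete, here's roughly what a naive blocklist filter looks like. The word list and helper name are illustrative, not from AutoFeedback:

```typescript
// A naive keyword filter, shown only to illustrate why this approach fails.
const BLOCKLIST = ["kill", "murder"]; // illustrative entries

function naiveFilter(text: string): boolean {
  const lower = text.toLowerCase();
  return BLOCKLIST.some((word) => lower.includes(word));
}

// False positive: a perfectly normal car review gets blocked.
naiveFilter("This car is a killer deal"); // → true (wrongly flagged)

// Trivial evasion: a Unicode look-alike slips straight through,
// because "\u0456" is a Cyrillic letter, not a Latin "i".
naiveFilter("k\u0456ll yourself"); // → false (wrongly allowed)
```

Both results are wrong in opposite directions, which is exactly the trap: tightening the list makes the false positives worse, and loosening it makes the evasion worse.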

Here's a quick comparison:

| | Keyword Blocklist | AI Moderation |
| --- | --- | --- |
| Hate speech detection | Catches known slurs only | Understands context, coded language, intent |
| Multi-language support | Needs separate lists per language | Works across languages out of the box |
| Maintenance | Constant manual updates | Model improves automatically |
| False positives | High ("killer deal", "this car is a beast") | Very low, understands product context |
| Evasion | Trivial (misspellings, Unicode, spacing) | Much harder to circumvent |
| Setup time | Hours of curating word lists | ~70 lines of code, one API call |

So I looked at proper moderation APIs:

  • Perspective API (Google/Jigsaw): solid and battle-tested, but it's another Google dependency, the setup is more involved (API key provisioning), and it returns raw per-attribute scores you have to interpret yourself
  • OpenAI Moderation: free to use (even without a paid OpenAI plan), purpose-built for exactly this use case, and exposed through their latest omni-moderation-latest model with a dead-simple API

I went with OpenAI. The endpoint checks text against categories like hate speech, harassment, self-harm, sexual content, and violence. It returns both a boolean flagged result and per-category scores. Critically for a car review site, it won't flag someone saying "this engine is terrible" or "worst purchase of my life"; it understands the difference between a frustrated car owner and actual harmful content.
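For reference, the endpoint's JSON response looks roughly like this. The field names follow OpenAI's API documentation; the sample values below are made up:

```typescript
// Approximate shape of a /v1/moderations response (per OpenAI's API docs).
interface ModerationResponse {
  id: string;
  model: string;
  results: Array<{
    flagged: boolean; // overall verdict
    categories: Record<string, boolean>; // e.g. "hate", "harassment", "violence"
    category_scores: Record<string, number>; // confidence per category, 0..1
  }>;
}

// A made-up sample, just to show how you'd read it:
const sample: ModerationResponse = {
  id: "modr-123",
  model: "omni-moderation-latest",
  results: [
    {
      flagged: true,
      categories: { hate: true, violence: false },
      category_scores: { hate: 0.91, violence: 0.02 },
    },
  ],
};

const isFlagged = sample.results[0].flagged; // → true
```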

[Screenshot: a harmful review]

The Architecture

My stack:

  • SvelteKit for the full-stack framework
  • Cloudflare Pages (Workers) for hosting
  • Cloudflare D1 for the database
  • Server-side form actions for review submission

The moderation check sits between form validation and database insertion: a simple gate that rejects flagged content with a 400 error.

```
User submits review
    → Turnstile CAPTCHA verification
    → Zod schema validation
    → OpenAI moderation check
    → Insert into D1 database
    → Redirect to model page
```

The Moderation Utility

I created a single utility file at src/lib/server/moderation.ts:

```typescript
interface ModerationResult {
  flagged: boolean;
  categories?: Record<string, boolean>;
}

export async function moderateReview(
  text: string,
  apiKey: string,
): Promise<ModerationResult> {
  // Fail open: if there's no API key or no text, allow the review
  if (!apiKey || !text.trim()) {
    return { flagged: false };
  }

  try {
    const response = await fetch("https://api.openai.com/v1/moderations", {
      method: "POST",
      headers: {
        "Content-Type": "application/json",
        Authorization: `Bearer ${apiKey}`,
      },
      body: JSON.stringify({
        model: "omni-moderation-latest",
        input: text,
      }),
    });

    if (!response.ok) {
      console.error(
        `[Moderation] API error: ${response.status} ${response.statusText}`,
      );
      return { flagged: false }; // Fail open
    }

    const data = await response.json();
    const result = data.results?.[0];

    if (!result) {
      console.error("[Moderation] No results in response");
      return { flagged: false };
    }

    if (result.flagged) {
      const triggered = Object.entries(result.categories)
        .filter(([, v]) => v)
        .map(([k]) => k);
      console.warn(
        `[Moderation] Content flagged. Categories: ${triggered.join(", ")}`,
      );
    }

    return {
      flagged: result.flagged,
      categories: result.categories,
    };
  } catch (err) {
    console.error("[Moderation] Failed to call API:", err);
    return { flagged: false }; // Fail open
  }
}
```

Key Design Decisions

1. Fail open, not closed. If the OpenAI API is down, slow, or the API key isn't configured, reviews go through. I'd rather have one bad review slip past than block every legitimate user because of a third-party outage.

2. Log flagged categories. When content is rejected, I log exactly which categories were triggered. This helps me understand what's being caught and tune if needed.

3. Single concatenated string. Rather than making separate API calls for each field, I concatenate all text fields into one string. One API call, one latency hit.
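Decision 1 can be pushed a bit further: besides failing open on errors, you can bound how long you're willing to wait for the moderation call. Here's a sketch with a hypothetical `withFailOpen` helper, which is not in the original utility:

```typescript
// Hypothetical refinement (not in the post's code): race the moderation
// call against a timer and fall back to a default if it's too slow.
async function withFailOpen<T>(
  work: Promise<T>,
  fallback: T,
  ms: number,
): Promise<T> {
  let timer: ReturnType<typeof setTimeout> | undefined;
  const timeout = new Promise<T>((resolve) => {
    timer = setTimeout(() => resolve(fallback), ms);
  });
  // Whichever settles first wins; the timer is cleaned up either way.
  return Promise.race([work, timeout]).finally(() => clearTimeout(timer));
}

// Usage sketch inside the form action:
// const modResult = await withFailOpen(
//   moderateReview(textToModerate, apiKey),
//   { flagged: false }, // fail open
//   2000,               // don't wait longer than 2s
// );
```

This keeps the fail-open guarantee even when the API is merely slow rather than down.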

[Screenshot: server log with flagged categories]

Integrating Into the Form Action

In my SvelteKit form action (+page.server.ts for the review page), the integration is minimal:

```typescript
import { moderateReview } from "$lib/server/moderation";

// ... inside the form action, after Zod validation passes:

// Content moderation: check text fields before storing
const apiKey = platform?.env?.OPENAI_API_KEY;
if (apiKey) {
  const { recommendation, pros, cons, summary_line } = parseResult.data;
  const textToModerate = [recommendation, pros, cons, summary_line]
    .filter(Boolean)
    .join("\n");
  const modResult = await moderateReview(textToModerate, apiKey);
  if (modResult.flagged) {
    return fail(400, {
      error:
        "Your review contains content that violates our guidelines. Please revise and try again.",
      values: data,
    });
  }
}

// If we get here, the content is clean; proceed with createReview()
```

That's it. The user sees the same error format as a validation failure, their form values are preserved, and they can edit and resubmit.

[Screenshot: the user-facing error message]

Setting Up the API Key on Cloudflare

The OPENAI_API_KEY is stored as a Cloudflare Pages secret, never in code, never in wrangler.toml:

```bash
npx wrangler pages secret put OPENAI_API_KEY
```
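For local development with `wrangler pages dev`, the same variable can go in a `.dev.vars` file at the project root, which Wrangler loads automatically. Keep it out of version control; the value below is a placeholder:

```
# .dev.vars (gitignored, never committed)
OPENAI_API_KEY="sk-..."
```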

Then in app.d.ts, I declare the type so TypeScript knows about it:

```typescript
// src/app.d.ts — SvelteKit expects the Platform interface inside the
// global App namespace
declare global {
  namespace App {
    interface Platform {
      env?: {
        DB: D1Database;
        OPENAI_API_KEY: string;
        // ... other bindings
      };
    }
  }
}

export {};
```

What It Catches (and Doesn't)

Blocked:

  • Hate speech, including subtle, context-dependent phrasing that no keyword list would ever catch
  • Slurs and derogatory language targeting protected groups
  • Threats and incitement to violence
  • Sexually explicit content
  • Graphic violence descriptions
  • Self-harm content

Allowed (correctly):

  • "This car is absolute garbage"
  • "Worst money I ever spent, the dealer was useless"
  • "The engine sounds like it's dying"
  • "This thing is a death trap on wheels"
  • "It kills every competitor in its class"
  • Strong but legitimate criticism with mild profanity

This is where the AI approach really shines compared to a blocklist. A keyword filter would choke on half of those "allowed" examples. Meanwhile, the omni-moderation-latest model understands that someone ranting about their unreliable Renault is not the same as someone posting genuinely harmful content, even if both use aggressive language. In my testing across multiple languages, I haven't hit a single false positive on legitimate car reviews.

Performance Impact

The OpenAI moderation endpoint is fast, typically 50-150ms from Cloudflare Workers. Since this only runs on form submission (not page loads), users barely notice. The total review submission flow goes from ~200ms to ~300ms. Completely acceptable.

What I'd Do Differently

  1. Rate limiting per user. Currently, a determined user could keep tweaking phrasing to bypass moderation. Adding per-user rate limits on review submissions would help.

  2. Category-specific thresholds. The API returns confidence scores per category. I could allow borderline content in some categories (like harassment/threatening with low confidence) while being stricter in others (like sexual/minors).

  3. Moderation queue. Instead of rejecting outright, I could put flagged reviews in a moderation queue for manual review. But for a small project, the binary accept/reject is simpler.
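Idea 2 above could be sketched like this, using the `category_scores` the API already returns. The threshold values and helper name are hypothetical and untuned:

```typescript
// Hypothetical per-category thresholds: stricter where the stakes are higher.
const THRESHOLDS: Record<string, number> = {
  "sexual/minors": 0.01, // near-zero tolerance
  hate: 0.3,
  "harassment/threatening": 0.7, // allow borderline rants
};
const DEFAULT_THRESHOLD = 0.5;

// Returns the categories whose confidence score crosses its threshold.
function exceedsThreshold(scores: Record<string, number>): string[] {
  return Object.entries(scores)
    .filter(([cat, score]) => score >= (THRESHOLDS[cat] ?? DEFAULT_THRESHOLD))
    .map(([cat]) => cat);
}

// A borderline threat (0.6 < 0.7) passes, but mild hate (0.4 >= 0.3) does not:
exceedsThreshold({ "harassment/threatening": 0.6, hate: 0.4 });
// → ["hate"]
```

The form action would then reject only when `exceedsThreshold(...)` returns a non-empty array, instead of relying on the API's single `flagged` boolean.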

Wrapping Up

Adding content moderation to AutoFeedback took about an hour of work:

  • ~70 lines for the moderation utility
  • ~10 lines to integrate it into the form action
  • A few lines of translation strings per language
  • One wrangler secret command for the API key

The OpenAI moderation endpoint is free, fast, and remarkably accurate for UGC platforms. If you're building anything where users submit text, reviews, comments, forums, this is one of the easiest safety nets you can add.
