When you build a platform where users can submit free-text content, it's only a matter of time before someone tries to post something nasty. My project, AutoFeedback, a European car review platform built with SvelteKit and deployed on Cloudflare Pages, was no exception. I needed a way to catch harmful content before it ever reached the database, without over-blocking legitimate (even strongly-worded) car criticism.
Here's how I did it in under 70 lines of server-side code.
Why Not Just Block Bad Words?
The first instinct most developers have, myself included, is to build a keyword blocklist. Maintain a list of slurs, flag anything that matches. Simple, right?
In practice, it falls apart fast. Consider hate speech: it's not just a list of banned words. Hate speech relies on context, phrasing, and intent. Someone might write "people like that shouldn't be allowed to drive", no slurs, no flagged keywords, but clearly hateful depending on context. Or they might use coded language, deliberate misspellings, or Unicode tricks to bypass your list. You'd be playing an endless game of whack-a-mole, maintaining an ever-growing blocklist across multiple languages (AutoFeedback supports six), and still missing things.
And then there's the flip side: false positives. A car review that says "this car is a killer deal" or "the acceleration is insane, it absolutely murders the competition", perfectly legitimate, but a naive keyword filter would flag them. On a car review site, people use strong, emotional language all the time. That's the whole point.
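To make that failure mode concrete, here's a minimal sketch of a naive substring blocklist. The word list and phrases are illustrative, not anything AutoFeedback actually shipped:

```typescript
// A deliberately naive blocklist filter: flag any review whose
// lowercased text contains a blocked substring.
const blocklist = ["kill", "murder", "die"];

function naiveFilter(text: string): boolean {
  const lower = text.toLowerCase();
  return blocklist.some((word) => lower.includes(word));
}

// Both legitimate reviews get flagged, because "killer" contains
// "kill" and "murders" contains "murder":
naiveFilter("This car is a killer deal"); // true (false positive)
naiveFilter("It absolutely murders the competition"); // true (false positive)

// Meanwhile, context-dependent hostility contains no blocked word at all:
naiveFilter("people like that shouldn't be allowed to drive"); // false
```

The filter manages to be wrong in both directions at once, which is exactly the problem.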
Here's a quick comparison:
| | Keyword Blocklist | AI Moderation |
|---|---|---|
| Hate speech detection | Catches known slurs only | Understands context, coded language, intent |
| Multi-language support | Need separate lists per language | Works across languages out of the box |
| Maintenance | Constant manual updates | Model improves automatically |
| False positives | High ("killer deal", "this car is a beast") | Very low, understands product context |
| Evasion | Trivial (misspellings, Unicode, spacing) | Much harder to circumvent |
| Setup time | Hours of curating word lists | ~70 lines of code, one API call |
So I looked at proper moderation APIs:
- Perspective API (Google/Jigsaw), solid and battle-tested, but another Google dependency, more complex setup with API key provisioning, and a separate scoring model you need to interpret yourself
- OpenAI Moderation, free to use (even without a paid OpenAI plan), purpose-built for exactly this use case, and backed by their latest omni-moderation-latest model with a dead-simple API
I went with OpenAI. The endpoint checks text against categories like hate speech, harassment, self-harm, sexual content, and violence. It returns both a boolean flagged result and per-category scores. Critically for a car review site, it won't flag someone saying "this engine is terrible" or "worst purchase of my life", it understands the difference between a frustrated car owner and actual harmful content.
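For reference, a moderation response looks roughly like this. This is a trimmed sketch based on the documented shape; real responses include many more categories, and the exact score values here are made up:

```typescript
// A trimmed, illustrative moderation response. Real responses carry the
// full category list alongside per-category confidence scores.
const sampleResponse = {
  id: "modr-123",
  model: "omni-moderation-latest",
  results: [
    {
      flagged: true,
      categories: { hate: true, harassment: false, violence: false },
      category_scores: { hate: 0.91, harassment: 0.02, violence: 0.01 },
    },
  ],
};

// Collecting the triggered category names is a one-liner:
const triggered = Object.entries(sampleResponse.results[0].categories)
  .filter(([, v]) => v)
  .map(([k]) => k);
// triggered is ["hate"]
```

The boolean flagged field is enough for a binary accept/reject gate; the category_scores open the door to finer-grained policies later.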
The Architecture
My stack:
- SvelteKit for the full-stack framework
- Cloudflare Pages (Workers) for hosting
- Cloudflare D1 for the database
- Server-side form actions for review submission
The moderation check sits between form validation and database insertion, a simple gate that rejects flagged content with a 400 error.
User submits review
→ Turnstile CAPTCHA verification
→ Zod schema validation
→ OpenAI moderation check
→ Insert into D1 database
→ Redirect to model page
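The gate ordering above can be sketched with stand-in helpers. The names and logic here are illustrative stubs, not AutoFeedback's actual implementation:

```typescript
// Stand-ins for the three gates: Turnstile verification, Zod
// validation, and the moderation check.
const verifyTurnstile = (token: string) => token.length > 0;
const validateSchema = (text: string) => text.trim().length > 0;
const moderationFlagged = (text: string) => text.includes("<harmful>");

function submitReview(token: string, text: string): string {
  if (!verifyTurnstile(token)) return "400: captcha failed";
  if (!validateSchema(text)) return "400: validation failed";
  if (moderationFlagged(text)) return "400: content violates guidelines";
  // insert into D1, then redirect
  return "303: redirect to model page";
}
```

Each gate short-circuits the pipeline, so the moderation API is only called for submissions that already passed the cheaper checks.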
The Moderation Utility
I created a single utility file at src/lib/server/moderation.ts:
interface ModerationResult {
flagged: boolean;
categories?: Record<string, boolean>;
}
export async function moderateReview(
text: string,
apiKey: string,
): Promise<ModerationResult> {
// Fail open: with no API key (or no text to check), let the review through
if (!apiKey || !text.trim()) {
return { flagged: false };
}
try {
const response = await fetch("https://api.openai.com/v1/moderations", {
method: "POST",
headers: {
"Content-Type": "application/json",
Authorization: `Bearer ${apiKey}`,
},
body: JSON.stringify({
model: "omni-moderation-latest",
input: text,
}),
});
if (!response.ok) {
console.error(
`[Moderation] API error: ${response.status} ${response.statusText}`,
);
return { flagged: false }; // Fail open
}
const data = await response.json();
const result = data.results?.[0];
if (!result) {
console.error("[Moderation] No results in response");
return { flagged: false };
}
if (result.flagged) {
const triggered = Object.entries(result.categories)
.filter(([, v]) => v)
.map(([k]) => k);
console.warn(
`[Moderation] Content flagged. Categories: ${triggered.join(", ")}`,
);
}
return {
flagged: result.flagged,
categories: result.categories,
};
} catch (err) {
console.error("[Moderation] Failed to call API:", err);
return { flagged: false }; // Fail open
}
}
Key Design Decisions
1. Fail open, not closed. If the OpenAI API is down, slow, or the API key isn't configured, reviews go through. I'd rather have one bad review slip past than block every legitimate user because of a third-party outage.
2. Log flagged categories. When content is rejected, I log exactly which categories were triggered. This helps me understand what's being caught and tune if needed.
3. Single concatenated string. Rather than making separate API calls for each field, I concatenate all text fields into one string. One API call, one latency hit.
Integrating Into the Form Action
In my SvelteKit form action (+page.server.ts for the review page), the integration is minimal:
import { moderateReview } from "$lib/server/moderation";
// ... inside the form action, after Zod validation passes:
// Content moderation, check text fields before storing
const apiKey = platform?.env?.OPENAI_API_KEY;
if (apiKey) {
const { recommendation, pros, cons, summary_line } = parseResult.data;
const textToModerate = [recommendation, pros, cons, summary_line]
.filter(Boolean)
.join("\n");
const modResult = await moderateReview(textToModerate, apiKey);
if (modResult.flagged) {
return fail(400, {
error:
"Your review contains content that violates our guidelines. Please revise and try again.",
values: data,
});
}
}
// If we get here, content is clean, proceed with createReview()
That's it. The user sees the same error format as a validation failure, their form values are preserved, and they can edit and resubmit.
Setting Up the API Key on Cloudflare
The OPENAI_API_KEY is stored as a Cloudflare Pages secret, never in code, never in wrangler.toml:
npx wrangler pages secret put OPENAI_API_KEY
Then in app.d.ts, I declare the type so TypeScript knows about it:
interface Platform {
env?: {
DB: D1Database;
OPENAI_API_KEY: string;
// ... other bindings
};
}
What It Catches (and Doesn't)
Blocked:
- Hate speech, including subtle, context-dependent phrasing that no keyword list would ever catch
- Slurs and derogatory language targeting protected groups
- Threats and incitement to violence
- Sexually explicit content
- Graphic violence descriptions
- Self-harm content
Allowed (correctly):
- "This car is absolute garbage"
- "Worst money I ever spent, the dealer was useless"
- "The engine sounds like it's dying"
- "This thing is a death trap on wheels"
- "It kills every competitor in its class"
- Strong but legitimate criticism with mild profanity
This is where the AI approach really shines compared to a blocklist. A keyword filter would choke on half of those "allowed" examples. Meanwhile, the omni-moderation-latest model understands that someone ranting about their unreliable Renault is not the same as someone posting genuinely harmful content, even if both use aggressive language. In my testing across multiple languages, I haven't hit a single false positive on legitimate car reviews.
Performance Impact
The OpenAI moderation endpoint is fast, typically 50-150ms from Cloudflare Workers. Since this only runs on form submission (not page loads), users barely notice. The total review submission flow goes from ~200ms to ~300ms. Completely acceptable.
What I'd Do Differently
- Rate limiting per user. Currently, a determined user could keep tweaking phrasing to bypass moderation. Adding per-user rate limits on review submissions would help.
- Category-specific thresholds. The API returns confidence scores per category. I could allow borderline content in some categories (like harassment/threatening with low confidence) while being stricter on others (like sexual/minors).
- Moderation queue. Instead of outright rejecting, I could put flagged reviews in a moderation queue for manual review. But for a small project, the binary accept/reject is simpler.
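A minimal sketch of what category-specific thresholds could look like, built on the category_scores field. The threshold values here are illustrative, not tuned:

```typescript
// Per-category score thresholds: anything at or above its threshold
// counts as flagged. Unlisted categories fall back to a default.
const thresholds: Record<string, number> = {
  "sexual/minors": 0.1, // very strict
  hate: 0.4,
  "harassment/threatening": 0.8, // lenient: heated car rants are common
};
const DEFAULT_THRESHOLD = 0.5;

function isFlagged(scores: Record<string, number>): boolean {
  return Object.entries(scores).some(
    ([category, score]) => score >= (thresholds[category] ?? DEFAULT_THRESHOLD),
  );
}

isFlagged({ "harassment/threatening": 0.6, hate: 0.05 }); // false: both below
isFlagged({ "sexual/minors": 0.2 }); // true: above the strict threshold
```

This trades the API's calibrated flagged boolean for policy you have to tune yourself, which is why I'd only reach for it if the defaults started misfiring.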
Wrapping Up
Adding content moderation to AutoFeedback took about an hour of work:
- ~70 lines for the moderation utility
- ~10 lines to integrate it into the form action
- A few lines of translation strings per language
- One wrangler pages secret command for the API key
The OpenAI moderation endpoint is free, fast, and remarkably accurate for UGC platforms. If you're building anything where users submit text, reviews, comments, forums, this is one of the easiest safety nets you can add.