The founder called me at 11pm on a Saturday.
His app had a public gallery feature. Users uploaded images, images appeared in the gallery. Simple. Except someone had uploaded explicit content, it had been visible for six hours, and now he had a PR crisis and possibly legal exposure.
"We need moderation," he said. "Yesterday."
Here's the thing: content moderation isn't a nice-to-have when you accept user uploads. It's a requirement. One piece of illegal content can get your app pulled from stores, cost you your payment processor, or land you in actual legal trouble.
You don't have to hire an army of human moderators, but you do need to automate the obvious stuff.
What Needs Moderation
If users can submit it, you need to check it:
Images:
- NSFW/explicit content
- Violence/gore
- Illegal content
- Spam/ads
Text:
- Profanity
- Hate speech
- Spam/promotional content
- Personal information (for privacy)
Both:
- Context-inappropriate content
- Policy violations specific to your platform
Automation catches 90%+ of problems instantly. The edge cases go to human review.
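To make the "automation first, humans for the rest" split concrete, the overall flow looks roughly like this. This is a sketch with hypothetical helpers (runAutomatedChecks, publish, reject, queueForHumanReview), not a finished implementation:

// Rough shape of the pipeline: automation settles the clear cases,
// anything uncertain goes into a human review queue
async function handleSubmission(type, content, userId) {
  const result = await runAutomatedChecks(type, content); // NSFW / profanity / sentiment APIs
  if (result.action === 'approve') return publish(content, userId);
  if (result.action === 'reject') return reject(content, userId, result);
  return queueForHumanReview(content, userId, result);    // the uncertain minority
}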
Image Moderation: The Critical One
Images are the highest risk. An inappropriate image visible for even an hour can cause serious damage.
async function moderateImage(imageUrl) {
  const res = await fetch('https://api.apiverve.com/v1/nsfwimagedetector', {
    method: 'POST',
    headers: {
      'Content-Type': 'application/json',
      'x-api-key': process.env.APIVERVE_KEY
    },
    body: JSON.stringify({ url: imageUrl })
  });
  const { data } = await res.json();

  // NSFW detector returns categories with confidence scores
  const isNSFW = data.isNSFW || data.nsfwScore > 0.8;
  const needsReview = data.nsfwScore > 0.4 && data.nsfwScore <= 0.8;

  return {
    safe: !isNSFW,
    needsReview,
    scores: {
      nsfw: data.nsfwScore,
      suggestive: data.suggestiveScore,
      safe: data.safeScore
    },
    action: isNSFW ? 'reject' : needsReview ? 'review' : 'approve'
  };
}
The key insight: don't use binary approve/reject. Use three buckets:
- Approve (clearly safe)
- Review (uncertain)
- Reject (clearly NSFW)
Anything in the uncertain zone goes to human moderators. You're not making them review everything—just the edge cases.
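A minimal sketch of that bucketing, assuming every detector's output is normalized to a 0-1 risk score (the thresholds here are illustrative and worth tuning against your own data):

function bucketByScore(score, { rejectAt = 0.8, reviewAt = 0.4 } = {}) {
  if (score > rejectAt) return 'reject';  // clearly unsafe
  if (score > reviewAt) return 'review';  // uncertain: send to a human
  return 'approve';                       // clearly safe
}

// bucketByScore(0.92) -> 'reject'
// bucketByScore(0.55) -> 'review'
// bucketByScore(0.05) -> 'approve'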
The Pre-Publication Pattern
Never publish first, moderate later. That's how you get six-hour-exposure incidents.
app.post('/upload', upload.single('image'), async (req, res) => {
  const file = req.file;

  // Save to temporary storage with pending status
  const tempUrl = await uploadToTempStorage(file);
  const moderation = await moderateImage(tempUrl);

  if (moderation.action === 'reject') {
    await deleteFromTempStorage(tempUrl);
    return res.status(400).json({
      error: 'Image rejected',
      reason: 'Content policy violation'
    });
  }

  if (moderation.action === 'review') {
    // Keep in temp storage, flag for human review
    await db.insert('pending_reviews', {
      tempUrl,
      userId: req.user.id,
      type: 'image',
      moderationScores: moderation.scores,
      submittedAt: new Date()
    });
    return res.json({
      status: 'pending',
      message: 'Your upload is being reviewed and will be visible shortly.'
    });
  }

  // Approved - move to permanent storage and publish
  const permanentUrl = await moveToPermanentStorage(tempUrl);
  await db.insert('gallery', {
    imageUrl: permanentUrl,
    userId: req.user.id,
    status: 'published',
    publishedAt: new Date()
  });

  return res.json({
    status: 'published',
    url: permanentUrl
  });
});
Users experience at most a brief delay. Your platform never shows unmoderated content.
Text Moderation: Profanity and Beyond
Text seems easier than images, but it's more nuanced. A word that's profane in one context is fine in another.
async function moderateText(text) {
  const [profanity, sentiment] = await Promise.all([
    // Check for profanity
    fetch(`https://api.apiverve.com/v1/profanitydetector?text=${encodeURIComponent(text)}`, {
      headers: { 'x-api-key': process.env.APIVERVE_KEY }
    }).then(r => r.json()).then(r => r.data),

    // Check sentiment (hate speech often correlates with very negative sentiment)
    fetch('https://api.apiverve.com/v1/sentimentanalysis', {
      method: 'POST',
      headers: {
        'Content-Type': 'application/json',
        'x-api-key': process.env.APIVERVE_KEY
      },
      body: JSON.stringify({ text })
    }).then(r => r.json()).then(r => r.data)
  ]);

  let severity = 'low';
  const flags = [];

  if (profanity.hasProfanity) {
    flags.push('profanity');
    severity = profanity.severity || 'medium';
  }

  // Very negative + high magnitude often indicates harassment/hate
  if (sentiment.sentiment === 'negative' && sentiment.magnitude > 0.8) {
    flags.push('hostile_tone');
    severity = severity === 'low' ? 'medium' : severity;
  }

  return {
    text,
    flags,
    severity,
    profanityCount: profanity.profanityCount || 0,
    sentiment: sentiment.sentiment,
    action: getTextAction(severity, flags)
  };
}

function getTextAction(severity, flags) {
  if (severity === 'high') return 'reject';
  if (severity === 'medium' || flags.includes('hostile_tone')) return 'review';
  if (flags.includes('profanity')) return 'censor'; // Replace with asterisks
  return 'approve';
}
Censoring vs Rejecting
For text content, you have options beyond approve/reject:
Censor and publish:
function censorText(text, profanityWords) {
  let censored = text;
  profanityWords.forEach(word => {
    // Match whole words only, so e.g. "class" isn't censored for "ass"
    const regex = new RegExp(`\\b${word}\\b`, 'gi');
    const replacement = word[0] + '*'.repeat(word.length - 2) + word[word.length - 1];
    censored = censored.replace(regex, replacement);
  });
  return censored;
}
// "This is bullshit" -> "This is b******t"
Contextual moderation:
A gaming platform might allow mild profanity. A kids' education app allows none. Make your policies configurable:
const policies = {
  strict: {
    profanityAction: 'reject',
    hostileAction: 'reject',
    allowedSeverity: 'none'
  },
  moderate: {
    profanityAction: 'censor',
    hostileAction: 'review',
    allowedSeverity: 'mild'
  },
  relaxed: {
    profanityAction: 'allow',
    hostileAction: 'review',
    allowedSeverity: 'medium'
  }
};

function applyPolicy(moderationResult, policy) {
  // ... apply policy rules to moderation result
}
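One way applyPolicy could look. This is a sketch, assuming the result shape returned by moderateText above (flags and severity) and the policy objects just defined; the severity ranking is an assumption you'd adapt to your own rules:

// Hypothetical severity ordering; moderateText uses 'low' where policies say 'mild'
const SEVERITY_RANK = { none: 0, low: 1, mild: 1, medium: 2, high: 3 };

function applyPolicy(moderationResult, policy) {
  const { flags, severity } = moderationResult;

  if (flags.includes('hostile_tone') && policy.hostileAction !== 'allow') {
    return policy.hostileAction; // 'reject' or 'review'
  }

  if (flags.includes('profanity')) {
    // Within the allowed severity, let it through untouched
    if (SEVERITY_RANK[severity] <= SEVERITY_RANK[policy.allowedSeverity]) {
      return 'approve';
    }
    return policy.profanityAction; // 'reject', 'censor', or 'allow'
  }

  return 'approve';
}

// applyPolicy({ flags: ['profanity'], severity: 'low' }, policies.moderate) -> 'approve'
// applyPolicy({ flags: ['profanity'], severity: 'high' }, policies.strict)  -> 'reject'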
Building a Moderation Queue
Automated checks catch the obvious stuff. Edge cases need human eyes. Build a queue:
class ModerationQueue {
  async addToQueue(item) {
    await db.insert('moderation_queue', {
      ...item,
      status: 'pending',
      createdAt: new Date(),
      priority: this.calculatePriority(item)
    });
  }

  calculatePriority(item) {
    // Higher scores = more uncertain = higher priority
    if (item.type === 'image' && item.scores.nsfw > 0.6) return 'high';
    if (item.type === 'text' && item.flags.includes('hostile_tone')) return 'high';
    return 'normal';
  }

  async getNextItem(moderatorId) {
    // Get highest priority pending item and claim it
    const item = await db.query(`
      SELECT * FROM moderation_queue
      WHERE status = 'pending'
      ORDER BY
        CASE priority WHEN 'high' THEN 1 WHEN 'normal' THEN 2 END,
        createdAt ASC
      LIMIT 1
      FOR UPDATE
    `);
    if (!item) return null;

    await db.update('moderation_queue', item.id, {
      status: 'in_review',
      moderatorId,
      reviewStartedAt: new Date()
    });
    return item;
  }

  async resolveItem(itemId, decision, moderatorId, notes = '') {
    const item = await db.get('moderation_queue', itemId);

    await db.update('moderation_queue', itemId, {
      status: 'resolved',
      decision, // 'approve', 'reject', 'escalate'
      moderatorId,
      notes,
      resolvedAt: new Date()
    });

    // Take action based on decision
    if (decision === 'approve') {
      await this.publishContent(item);
    } else if (decision === 'reject') {
      await this.rejectContent(item);
    } else if (decision === 'escalate') {
      await this.escalateToSenior(item);
    }
  }
}
Moderators see a queue sorted by priority. High-uncertainty items get reviewed first. Clear approvals/rejections are handled automatically.
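How moderators actually consume that queue is up to you. Here's a hypothetical pair of Express routes wired to the class above; the paths and the requireModerator auth middleware are assumptions:

const queue = new ModerationQueue();

// Moderator pulls the next item to review and claims it
app.get('/moderation/next', requireModerator, async (req, res) => {
  const item = await queue.getNextItem(req.user.id);
  if (!item) return res.status(204).end(); // queue is empty
  res.json(item);
});

// Moderator submits a decision for the item they claimed
app.post('/moderation/:id/resolve', requireModerator, async (req, res) => {
  const { decision, notes } = req.body; // 'approve' | 'reject' | 'escalate'
  await queue.resolveItem(req.params.id, decision, req.user.id, notes);
  res.json({ status: 'resolved' });
});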
Handling User Appeals
Sometimes automated moderation is wrong. Let users appeal:
app.post('/appeal', async (req, res) => {
  const { contentId, reason } = req.body;
  const content = await db.get('rejected_content', contentId);

  if (!content || content.userId !== req.user.id) {
    return res.status(404).json({ error: 'Content not found' });
  }

  // Check if already appealed
  if (content.appealed) {
    return res.status(400).json({ error: 'Already appealed' });
  }

  // Create appeal for human review
  await db.insert('moderation_queue', {
    type: content.type,
    contentId: content.id,
    originalContent: content.url || content.text,
    appealReason: reason,
    status: 'pending',
    priority: 'normal',
    isAppeal: true
  });

  await db.update('rejected_content', contentId, {
    appealed: true,
    appealedAt: new Date()
  });

  res.json({
    message: 'Appeal submitted. You will be notified of the decision.'
  });
});
Appeals get human review. If automation was wrong, reinstate the content and use the case to improve your thresholds.
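Closing that loop can be as simple as recording the automated scores of every overturned rejection and reviewing them periodically. A rough sketch, with a hypothetical false_positives table; the percentile heuristic is an assumption, not a rule:

// When a human overturns an automated rejection, keep the score that caused it
async function recordOverturnedRejection(item) {
  await db.insert('false_positives', {
    type: item.type,
    nsfwScore: item.moderationScores?.nsfw,
    overturnedAt: new Date()
  });
}

// If many overturned images scored just above your reject threshold,
// the threshold is probably too aggressive
async function suggestRejectThreshold(currentThreshold = 0.8) {
  const rows = await db.query(`SELECT nsfwScore FROM false_positives WHERE type = 'image'`);
  const scores = rows.map(r => r.nsfwScore).sort((a, b) => a - b);
  if (scores.length < 50) return currentThreshold; // not enough signal yet
  const p90 = scores[Math.floor(scores.length * 0.9)]; // 90th percentile of false positives
  return Math.max(currentThreshold, p90 + 0.02);
}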
Rate Limiting Uploads
Moderation costs resources. Protect against abuse:
const rateLimit = require('express-rate-limit');

const uploadLimiter = rateLimit({
  windowMs: 60 * 60 * 1000, // 1 hour
  max: 20, // 20 uploads per hour per user
  // Key by user id so the limit is per user, not per IP (falls back to IP for anonymous requests)
  keyGenerator: (req) => req.user?.id ?? req.ip,
  message: {
    error: 'Too many uploads. Please try again later.'
  }
});

app.post('/upload', uploadLimiter, upload.single('image'), async (req, res) => {
  // ... moderation logic
});
20 uploads per hour is generous for legitimate users. Abusers hit the limit before they can cause damage.
The Complete Pipeline
Here's everything together:
class ContentModerator {
  async moderateContent(type, content, userId) {
    const startTime = Date.now();

    let result;
    if (type === 'image') {
      result = await this.moderateImage(content);
    } else if (type === 'text') {
      result = await this.moderateText(content);
    }

    // Log for analytics and model improvement
    await this.logModeration({
      type,
      userId,
      result,
      duration: Date.now() - startTime
    });

    return result;
  }

  async moderateImage(imageUrl) {
    const nsfw = await fetch('https://api.apiverve.com/v1/nsfwimagedetector', {
      method: 'POST',
      headers: {
        'Content-Type': 'application/json',
        'x-api-key': process.env.APIVERVE_KEY
      },
      body: JSON.stringify({ url: imageUrl })
    }).then(r => r.json()).then(r => r.data);

    return {
      type: 'image',
      safe: nsfw.safeScore > 0.8,
      needsReview: nsfw.nsfwScore > 0.3 && nsfw.nsfwScore <= 0.8,
      rejected: nsfw.nsfwScore > 0.8,
      scores: nsfw,
      action: this.determineImageAction(nsfw)
    };
  }

  async moderateText(text) {
    const [profanity, sentiment] = await Promise.all([
      fetch(`https://api.apiverve.com/v1/profanitydetector?text=${encodeURIComponent(text)}`, {
        headers: { 'x-api-key': process.env.APIVERVE_KEY }
      }).then(r => r.json()).then(r => r.data),
      fetch('https://api.apiverve.com/v1/sentimentanalysis', {
        method: 'POST',
        headers: {
          'Content-Type': 'application/json',
          'x-api-key': process.env.APIVERVE_KEY
        },
        body: JSON.stringify({ text })
      }).then(r => r.json()).then(r => r.data)
    ]);

    return {
      type: 'text',
      profanity,
      sentiment,
      action: this.determineTextAction(profanity, sentiment)
    };
  }

  determineImageAction(nsfw) {
    if (nsfw.nsfwScore > 0.8) return 'reject';
    if (nsfw.nsfwScore > 0.3) return 'review';
    return 'approve';
  }

  determineTextAction(profanity, sentiment) {
    if (profanity.severity === 'high') return 'reject';
    if (profanity.hasProfanity && sentiment.magnitude > 0.7) return 'review';
    if (profanity.hasProfanity) return 'censor';
    return 'approve';
  }
}
The Numbers
Costs per moderation:
- NSFW detection: 1 credit
- Profanity detection: 1 credit
- Sentiment analysis: 1 credit
Image + text moderation: 3 credits per submission.
On Starter ({{plan.starter.price}}/month, {{plan.starter.calls}} credits), that covers thousands of content submissions.
Compare to:
- One PR crisis from unmoderated content: $10,000+ in damage control
- Getting your app removed from the App Store: Potentially fatal
- Legal exposure from hosting illegal content: Unquantifiable
Moderation isn't optional. The only question is whether you automate it.
User-generated content is a feature and a liability. The liability wins if you don't moderate.
Automated moderation catches the obvious stuff instantly. It's not perfect—nothing is—but it reduces your human review burden by 90%+ and ensures nothing appears publicly without being checked.
The NSFW Detector, Profanity Detector, and Sentiment Analysis APIs work together to give you a complete moderation pipeline.
Get your API key and protect your platform before you need to.
Originally published at APIVerve Blog