The founder called me at 11pm on a Saturday.
His app had a public gallery feature. Users uploaded images, images appeared in the gallery. Simple. Except someone had uploaded explicit content, it had been visible for six hours, and now he had a PR crisis and possibly legal exposure.
"We need moderation," he said. "Yesterday."
Here's the thing: content moderation isn't a nice-to-have when you accept user uploads. It's a requirement. One piece of illegal content can get your app pulled from stores, cost you your payment processor, or land you in actual legal trouble.
You don't have to hire an army of human moderators, but you do need to automate the obvious stuff.
What Needs Moderation
If users can submit it, you need to check it:
Images:
- NSFW/explicit content
- Violence/gore
- Illegal content
- Spam/ads
Text:
- Profanity
- Hate speech
- Spam/promotional content
- Personal information (for privacy)
Both:
- Context-inappropriate content
- Policy violations specific to your platform
Automation catches 90%+ of problems instantly. The edge cases go to human review.
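To make the "automation first, humans for the rest" split concrete, the overall flow looks roughly like this. This is a sketch with hypothetical helpers (runAutomatedChecks, publish, reject, queueForHumanReview), not a finished implementation:

// Rough shape of the pipeline: automation settles the clear cases,
// anything uncertain goes into a human review queue
async function handleSubmission(type, content, userId) {
  const result = await runAutomatedChecks(type, content); // NSFW / profanity / sentiment APIs
  if (result.action === 'approve') return publish(content, userId);
  if (result.action === 'reject') return reject(content, userId, result);
  return queueForHumanReview(content, userId, result);    // the uncertain minority
}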
Image Moderation: The Critical One
Images are the highest risk. An inappropriate image visible for even an hour can cause serious damage.
async function moderateImage(imageUrl) {
  const res = await fetch('https://api.apiverve.com/v1/nsfwimagedetector', {
    method: 'POST',
    headers: {
      'Content-Type': 'application/json',
      'x-api-key': process.env.APIVERVE_KEY
    },
    body: JSON.stringify({ url: imageUrl })
  });
  const { data } = await res.json();

  // NSFW detector returns categories with confidence scores
  const isNSFW = data.isNSFW || data.nsfwScore > 0.8;
  const needsReview = data.nsfwScore > 0.4 && data.nsfwScore <= 0.8;

  return {
    safe: !isNSFW,
    needsReview,
    scores: {
      nsfw: data.nsfwScore,
      suggestive: data.suggestiveScore,
      safe: data.safeScore
    },
    action: isNSFW ? 'reject' : needsReview ? 'review' : 'approve'
  };
}
The key insight: don't use binary approve/reject. Use three buckets:
- Approve (clearly safe)
- Review (uncertain)
- Reject (clearly NSFW)
Anything in the uncertain zone goes to human moderators. You're not making them review everything—just the edge cases.
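A minimal sketch of that bucketing, assuming every detector's output is normalized to a 0-1 risk score (the thresholds here are illustrative and worth tuning against your own data):

function bucketByScore(score, { rejectAt = 0.8, reviewAt = 0.4 } = {}) {
  if (score > rejectAt) return 'reject';  // clearly unsafe
  if (score > reviewAt) return 'review';  // uncertain: send to a human
  return 'approve';                       // clearly safe
}

// bucketByScore(0.92) -> 'reject'
// bucketByScore(0.55) -> 'review'
// bucketByScore(0.05) -> 'approve'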
The Pre-Publication Pattern
Never publish first, moderate later. That's how you get six-hour-exposure incidents.
app.post('/upload', upload.single('image'), async (req, res) => {
  const file = req.file;

  // Save to temporary storage with pending status
  const tempUrl = await uploadToTempStorage(file);
  const moderation = await moderateImage(tempUrl);

  if (moderation.action === 'reject') {
    await deleteFromTempStorage(tempUrl);
    return res.status(400).json({
      error: 'Image rejected',
      reason: 'Content policy violation'
    });
  }

  if (moderation.action === 'review') {
    // Keep in temp storage, flag for human review
    await db.insert('pending_reviews', {
      tempUrl,
      userId: req.user.id,
      type: 'image',
      moderationScores: moderation.scores,
      submittedAt: new Date()
    });
    return res.json({
      status: 'pending',
      message: 'Your upload is being reviewed and will be visible shortly.'
    });
  }

  // Approved - move to permanent storage and publish
  const permanentUrl = await moveToPermanentStorage(tempUrl);
  await db.insert('gallery', {
    imageUrl: permanentUrl,
    userId: req.user.id,
    status: 'published',
    publishedAt: new Date()
  });

  return res.json({
    status: 'published',
    url: permanentUrl
  });
});
Users experience at most a brief delay. Your platform never shows unmoderated content.
Text Moderation: Profanity and Beyond
Text seems easier than images, but it's more nuanced. A word that's profane in one context is fine in another.
async function moderateText(text) {
  const [profanity, sentiment] = await Promise.all([
    // Check for profanity
    fetch(`https://api.apiverve.com/v1/profanitydetector?text=${encodeURIComponent(text)}`, {
      headers: { 'x-api-key': process.env.APIVERVE_KEY }
    }).then(r => r.json()).then(r => r.data),

    // Check sentiment (hate speech often correlates with very negative sentiment)
    fetch('https://api.apiverve.com/v1/sentimentanalysis', {
      method: 'POST',
      headers: {
        'Content-Type': 'application/json',
        'x-api-key': process.env.APIVERVE_KEY
      },
      body: JSON.stringify({ text })
    }).then(r => r.json()).then(r => r.data)
  ]);

  let severity = 'low';
  const flags = [];

  if (profanity.hasProfanity) {
    flags.push('profanity');
    severity = profanity.severity || 'medium';
  }

  // Very negative + high magnitude often indicates harassment/hate
  if (sentiment.sentiment === 'negative' && sentiment.magnitude > 0.8) {
    flags.push('hostile_tone');
    severity = severity === 'low' ? 'medium' : severity;
  }

  return {
    text,
    flags,
    severity,
    profanityCount: profanity.profanityCount || 0,
    sentiment: sentiment.sentiment,
    action: getTextAction(severity, flags)
  };
}

function getTextAction(severity, flags) {
  if (severity === 'high') return 'reject';
  if (severity === 'medium' || flags.includes('hostile_tone')) return 'review';
  if (flags.includes('profanity')) return 'censor'; // Replace with asterisks
  return 'approve';
}
Censoring vs Rejecting
For text content, you have options beyond approve/reject:
Censor and publish:
function censorText(text, profanityWords) {
  let censored = text;
  profanityWords.forEach(word => {
    // Match whole words only, so e.g. "class" isn't censored for "ass"
    const regex = new RegExp(`\\b${word}\\b`, 'gi');
    const replacement = word[0] + '*'.repeat(word.length - 2) + word[word.length - 1];
    censored = censored.replace(regex, replacement);
  });
  return censored;
}
// "This is bullshit" -> "This is b******t"
Contextual moderation:
A gaming platform might allow mild profanity. A kids' education app allows none. Make your policies configurable:
const policies = {
  strict: {
    profanityAction: 'reject',
    hostileAction: 'reject',
    allowedSeverity: 'none'
  },
  moderate: {
    profanityAction: 'censor',
    hostileAction: 'review',
    allowedSeverity: 'mild'
  },
  relaxed: {
    profanityAction: 'allow',
    hostileAction: 'review',
    allowedSeverity: 'medium'
  }
};

function applyPolicy(moderationResult, policy) {
  // ... apply policy rules to moderation result
}
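One way applyPolicy could look. This is a sketch, assuming the result shape returned by moderateText above (flags and severity) and the policy objects just defined; the severity ranking is an assumption you'd adapt to your own rules:

// Hypothetical severity ordering; moderateText uses 'low' where policies say 'mild'
const SEVERITY_RANK = { none: 0, low: 1, mild: 1, medium: 2, high: 3 };

function applyPolicy(moderationResult, policy) {
  const { flags, severity } = moderationResult;

  if (flags.includes('hostile_tone') && policy.hostileAction !== 'allow') {
    return policy.hostileAction; // 'reject' or 'review'
  }

  if (flags.includes('profanity')) {
    // Within the allowed severity, let it through untouched
    if (SEVERITY_RANK[severity] <= SEVERITY_RANK[policy.allowedSeverity]) {
      return 'approve';
    }
    return policy.profanityAction; // 'reject', 'censor', or 'allow'
  }

  return 'approve';
}

// applyPolicy({ flags: ['profanity'], severity: 'low' }, policies.moderate) -> 'approve'
// applyPolicy({ flags: ['profanity'], severity: 'high' }, policies.strict)  -> 'reject'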
Building a Moderation Queue
Automated checks catch the obvious stuff. Edge cases need human eyes. Build a queue:
class ModerationQueue {
  async addToQueue(item) {
    await db.insert('moderation_queue', {
      ...item,
      status: 'pending',
      createdAt: new Date(),
      priority: this.calculatePriority(item)
    });
  }

  calculatePriority(item) {
    // Higher scores = more uncertain = higher priority
    if (item.type === 'image' && item.scores.nsfw > 0.6) return 'high';
    if (item.type === 'text' && item.flags.includes('hostile_tone')) return 'high';
    return 'normal';
  }

  async getNextItem(moderatorId) {
    // Get highest priority pending item and claim it
    const item = await db.query(`
      SELECT * FROM moderation_queue
      WHERE status = 'pending'
      ORDER BY
        CASE priority WHEN 'high' THEN 1 WHEN 'normal' THEN 2 END,
        createdAt ASC
      LIMIT 1
      FOR UPDATE
    `);
    if (!item) return null;

    await db.update('moderation_queue', item.id, {
      status: 'in_review',
      moderatorId,
      reviewStartedAt: new Date()
    });
    return item;
  }

  async resolveItem(itemId, decision, moderatorId, notes = '') {
    const item = await db.get('moderation_queue', itemId);

    await db.update('moderation_queue', itemId, {
      status: 'resolved',
      decision, // 'approve', 'reject', 'escalate'
      moderatorId,
      notes,
      resolvedAt: new Date()
    });

    // Take action based on decision
    if (decision === 'approve') {
      await this.publishContent(item);
    } else if (decision === 'reject') {
      await this.rejectContent(item);
    } else if (decision === 'escalate') {
      await this.escalateToSenior(item);
    }
  }
}
Moderators see a queue sorted by priority. High-uncertainty items get reviewed first. Clear approvals/rejections are handled automatically.
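How moderators actually consume that queue is up to you. Here's a hypothetical pair of Express routes wired to the class above; the paths and the requireModerator auth middleware are assumptions:

const queue = new ModerationQueue();

// Moderator pulls the next item to review and claims it
app.get('/moderation/next', requireModerator, async (req, res) => {
  const item = await queue.getNextItem(req.user.id);
  if (!item) return res.status(204).end(); // queue is empty
  res.json(item);
});

// Moderator submits a decision for the item they claimed
app.post('/moderation/:id/resolve', requireModerator, async (req, res) => {
  const { decision, notes } = req.body; // 'approve' | 'reject' | 'escalate'
  await queue.resolveItem(req.params.id, decision, req.user.id, notes);
  res.json({ status: 'resolved' });
});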
Handling User Appeals
Sometimes automated moderation is wrong. Let users appeal:
app.post('/appeal', async (req, res) => {
  const { contentId, reason } = req.body;
  const content = await db.get('rejected_content', contentId);

  if (!content || content.userId !== req.user.id) {
    return res.status(404).json({ error: 'Content not found' });
  }

  // Check if already appealed
  if (content.appealed) {
    return res.status(400).json({ error: 'Already appealed' });
  }

  // Create appeal for human review
  await db.insert('moderation_queue', {
    type: content.type,
    contentId: content.id,
    originalContent: content.url || content.text,
    appealReason: reason,
    status: 'pending',
    priority: 'normal',
    isAppeal: true
  });

  await db.update('rejected_content', contentId, {
    appealed: true,
    appealedAt: new Date()
  });

  res.json({
    message: 'Appeal submitted. You will be notified of the decision.'
  });
});
Appeals get human review. If automation was wrong, reinstate the content and use the case to improve your thresholds.
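Closing that loop can be as simple as recording the automated scores of every overturned rejection and reviewing them periodically. A rough sketch, with a hypothetical false_positives table; the percentile heuristic is an assumption, not a rule:

// When a human overturns an automated rejection, keep the score that caused it
async function recordOverturnedRejection(item) {
  await db.insert('false_positives', {
    type: item.type,
    nsfwScore: item.moderationScores?.nsfw,
    overturnedAt: new Date()
  });
}

// If many overturned images scored just above your reject threshold,
// the threshold is probably too aggressive
async function suggestRejectThreshold(currentThreshold = 0.8) {
  const rows = await db.query(`SELECT nsfwScore FROM false_positives WHERE type = 'image'`);
  const scores = rows.map(r => r.nsfwScore).sort((a, b) => a - b);
  if (scores.length < 50) return currentThreshold; // not enough signal yet
  const p90 = scores[Math.floor(scores.length * 0.9)]; // 90th percentile of false positives
  return Math.max(currentThreshold, p90 + 0.02);
}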
Rate Limiting Uploads
Moderation costs resources. Protect against abuse:
const rateLimit = require('express-rate-limit');

const uploadLimiter = rateLimit({
  windowMs: 60 * 60 * 1000, // 1 hour
  max: 20, // 20 uploads per hour per user
  // Key by user id so the limit is per user, not per IP (falls back to IP for anonymous requests)
  keyGenerator: (req) => req.user?.id ?? req.ip,
  message: {
    error: 'Too many uploads. Please try again later.'
  }
});

app.post('/upload', uploadLimiter, upload.single('image'), async (req, res) => {
  // ... moderation logic
});
20 uploads per hour is generous for legitimate users. Abusers hit the limit before they can cause damage.
The Complete Pipeline
Here's everything together:
class ContentModerator {
  async moderateContent(type, content, userId) {
    const startTime = Date.now();

    let result;
    if (type === 'image') {
      result = await this.moderateImage(content);
    } else if (type === 'text') {
      result = await this.moderateText(content);
    }

    // Log for analytics and model improvement
    await this.logModeration({
      type,
      userId,
      result,
      duration: Date.now() - startTime
    });

    return result;
  }

  async moderateImage(imageUrl) {
    const nsfw = await fetch('https://api.apiverve.com/v1/nsfwimagedetector', {
      method: 'POST',
      headers: {
        'Content-Type': 'application/json',
        'x-api-key': process.env.APIVERVE_KEY
      },
      body: JSON.stringify({ url: imageUrl })
    }).then(r => r.json()).then(r => r.data);

    return {
      type: 'image',
      safe: nsfw.safeScore > 0.8,
      needsReview: nsfw.nsfwScore > 0.3 && nsfw.nsfwScore <= 0.8,
      rejected: nsfw.nsfwScore > 0.8,
      scores: nsfw,
      action: this.determineImageAction(nsfw)
    };
  }

  async moderateText(text) {
    const [profanity, sentiment] = await Promise.all([
      fetch(`https://api.apiverve.com/v1/profanitydetector?text=${encodeURIComponent(text)}`, {
        headers: { 'x-api-key': process.env.APIVERVE_KEY }
      }).then(r => r.json()).then(r => r.data),
      fetch('https://api.apiverve.com/v1/sentimentanalysis', {
        method: 'POST',
        headers: {
          'Content-Type': 'application/json',
          'x-api-key': process.env.APIVERVE_KEY
        },
        body: JSON.stringify({ text })
      }).then(r => r.json()).then(r => r.data)
    ]);

    return {
      type: 'text',
      profanity,
      sentiment,
      action: this.determineTextAction(profanity, sentiment)
    };
  }

  determineImageAction(nsfw) {
    if (nsfw.nsfwScore > 0.8) return 'reject';
    if (nsfw.nsfwScore > 0.3) return 'review';
    return 'approve';
  }

  determineTextAction(profanity, sentiment) {
    if (profanity.severity === 'high') return 'reject';
    if (profanity.hasProfanity && sentiment.magnitude > 0.7) return 'review';
    if (profanity.hasProfanity) return 'censor';
    return 'approve';
  }
}
The Numbers
Costs per moderation:
- NSFW detection: 1 credit
- Profanity detection: 1 credit
- Sentiment analysis: 1 credit
Image + text moderation: 3 credits per submission.
On Starter ({{plan.starter.price}}/month, {{plan.starter.calls}} credits), that covers thousands of content submissions.
Compare to:
- One PR crisis from unmoderated content: $10,000+ in damage control
- Getting your app removed from the App Store: Potentially fatal
- Legal exposure from hosting illegal content: Unquantifiable
Moderation isn't optional. The only question is whether you automate it.
User-generated content is a feature and a liability. The liability wins if you don't moderate.
Automated moderation catches the obvious stuff instantly. It's not perfect—nothing is—but it reduces your human review burden by 90%+ and ensures nothing appears publicly without being checked.
The NSFW Detector, Profanity Detector, and Sentiment Analysis APIs work together to give you a complete moderation pipeline.
Get your API key and protect your platform before you need to.
Originally published at APIVerve Blog