The marketing team is excited. They just added sentiment analysis to the customer feedback pipeline. Every review, survey response, and support ticket now gets a sentiment score.
Red is negative. Green is positive. A dashboard shows the company's overall sentiment trending up and to the right.
Three months later, the product team is confused. Sentiment scores are great, but customer churn is increasing. How can customers be so positive while they're leaving?
The problem isn't the sentiment analysis. The problem is how it's being used.
## The Sarcasm Problem
"Oh great, another app that crashes every time I try to use it."
What's the sentiment here? The word "great" is positive. An unsophisticated sentiment analyzer will score this positively.
A human instantly recognizes the sarcasm. This customer is furious.
Sarcasm inverts sentiment. It uses positive words to convey negative meaning. And it's everywhere — in reviews, in support tickets, in social media. The more frustrated people are, the more likely they are to use sarcasm.
The result: your most upset customers might be scoring as positive.
Sentiment analysis algorithms have gotten better at detecting sarcasm, but it's still an unsolved problem. Context matters enormously. "This is exactly what I needed" could be sincere or dripping with sarcasm depending on what preceded it.
What to do about it: Don't use sentiment scores as the sole signal for customer satisfaction. Cross-reference with behavior data. A customer with positive sentiment who then cancels their subscription was probably being sarcastic.
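A minimal sketch of that cross-reference, assuming each feedback item carries a comparative sentiment `score` and a `churned` flag pulled from billing data (both field names are illustrative):

```python
def flag_suspect_positives(feedback_items):
    """Return positive-scoring items that conflict with behavior.

    Each item is a dict with a comparative sentiment `score` and a
    `churned` flag from billing data; the field names are illustrative.
    """
    return [
        item for item in feedback_items
        # Positive words followed by a cancellation is a sarcasm red flag
        if item["score"] > 0.1 and item["churned"]
    ]

items = [
    {"text": "Oh great, it crashed again", "score": 0.4, "churned": True},
    {"text": "Love the new dashboard", "score": 0.5, "churned": False},
]
print(flag_suspect_positives(items))  # only the churned "positive" review
```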
## The Mixed Sentiment Problem
"The food was amazing but the service was terrible and we waited an hour for our check."
What's the sentiment? Positive? Negative? Both?
Most sentiment analyzers return a single score that averages everything together. Amazing food (+4) plus terrible service (-4) equals... neutral? That seems wrong.
The customer has strong opinions. They loved one thing and hated another. A neutral score erases this valuable information.
What to do about it: For longer text, consider analyzing at the sentence level rather than document level:
```javascript
async function analyzeBySection(text) {
  // Split into sentences
  const sentences = text.match(/[^.!?]+[.!?]+/g) || [text];
  const results = [];

  for (const sentence of sentences) {
    const response = await fetch('https://api.apiverve.com/v1/sentimentanalysis', {
      method: 'POST',
      headers: {
        'x-api-key': 'YOUR_API_KEY',
        'Content-Type': 'application/json'
      },
      body: JSON.stringify({ text: sentence.trim() })
    });

    const { data } = await response.json();
    results.push({
      text: sentence.trim(),
      sentiment: data.sentimentText,
      score: data.comparative
    });
  }

  // Find conflicting sentiments
  const hasPositive = results.some(r => r.score > 0.1);
  const hasNegative = results.some(r => r.score < -0.1);
  const isMixed = hasPositive && hasNegative;

  return {
    sentences: results,
    isMixed,
    summary: isMixed ? 'mixed' : results[0]?.sentiment
  };
}
```
Mixed sentiment is often more actionable than pure positive or negative. It tells you specifically what to fix.
## The Neutral Trap
A sentiment score of zero doesn't mean the customer has no opinion. It means one of several things:
- The text is genuinely neutral ("The package arrived on Tuesday.")
- Positive and negative elements canceled out
- The text contains opinions the analyzer didn't recognize
- The text uses vocabulary outside the training data
Most feedback isn't truly neutral. When you see a lot of neutral scores, investigate why.
```python
import requests

def analyze_with_confidence(texts):
    """
    Analyze sentiment with awareness of neutral ambiguity.
    """
    results = {
        'strong_positive': [],  # comparative > 0.3
        'mild_positive': [],    # 0.1 < comparative <= 0.3
        'neutral': [],          # -0.1 <= comparative <= 0.1
        'mild_negative': [],    # -0.3 <= comparative < -0.1
        'strong_negative': []   # comparative < -0.3
    }

    for text in texts:
        response = requests.post(
            'https://api.apiverve.com/v1/sentimentanalysis',
            headers={'x-api-key': 'YOUR_API_KEY'},
            json={'text': text}
        )
        data = response.json()['data']
        score = data['comparative']

        if score > 0.3:
            results['strong_positive'].append(text)
        elif score > 0.1:
            results['mild_positive'].append(text)
        elif score < -0.3:
            results['strong_negative'].append(text)
        elif score < -0.1:
            results['mild_negative'].append(text)
        else:
            results['neutral'].append(text)

    # Flag high neutral rate for review
    neutral_rate = len(results['neutral']) / len(texts) if texts else 0
    if neutral_rate > 0.5:
        print(f"Warning: {neutral_rate:.0%} of texts scored neutral. "
              "Consider manual review for missed sentiment.")

    return results
```
## The Context Problem
"The battery life is short."
Is this negative sentiment? It sounds like it.
Unless this is a review of a product specifically marketed as compact and lightweight, where short battery life is an expected tradeoff. Or unless "short" means "shorter than the previous model," whose battery life was considered excessive anyway.
Sentiment analyzers don't understand product context, industry norms, or customer expectations. They analyze words in isolation.
What to do about it: Combine sentiment analysis with domain-specific knowledge. If you're analyzing reviews for a budget product, calibrate expectations accordingly. A 3-star review with neutral sentiment might be excellent for a $10 product but concerning for a $500 one.
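A sketch of that calibration, with illustrative price tiers and expectation thresholds you would tune to your own catalog:

```python
def calibrate_rating(stars, price_usd):
    """Interpret a star rating relative to price-tier expectations.

    The tiers and thresholds are illustrative; tune them to your catalog.
    """
    expected = 3.0 if price_usd < 50 else 4.2  # pricier items face a higher bar
    if stars >= expected:
        return "meets expectations"
    if stars >= expected - 1:
        return "worth watching"
    return "concerning"

print(calibrate_rating(3, 10))   # fine for a $10 product
print(calibrate_rating(3, 500))  # same stars, premium price: concerning
```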
## The Language Problem
"That's sick!"
Positive or negative?
In 1990, probably negative. In 2025, among certain demographics, definitely positive. Slang evolves. Sentiment lexicons struggle to keep up.
Industry jargon has the same problem. In finance, "aggressive" often means positive (aggressive growth, aggressive returns). In customer service, "aggressive" is negative (aggressive sales tactics, aggressive behavior).
What to do about it: Be aware of your audience. If you're analyzing feedback from teenagers, sentiment scores based on traditional lexicons will be less accurate. If you're in a specialized industry, expect some domain-specific language to be misclassified.
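One pragmatic mitigation is a small table of domain overrides layered on top of a generic score. The terms and weights below are illustrative, not calibrated:

```python
# Domain-specific sentiment overrides applied after a generic pass.
DOMAIN_OVERRIDES = {
    "finance": {"aggressive": +2},   # "aggressive growth" reads positive
    "support": {"aggressive": -2},   # "aggressive behavior" reads negative
    "youth":   {"sick": +3},         # slang inversion
}

def adjust_score(base_score, text, domain):
    """Nudge a generic sentiment score using domain-specific terms."""
    words = [w.strip(".,!?'\"") for w in text.lower().split()]
    adjustment = sum(
        delta
        for term, delta in DOMAIN_OVERRIDES.get(domain, {}).items()
        if term in words
    )
    return base_score + adjustment

print(adjust_score(-1.0, "That's sick!", "youth"))  # 2.0: slang flips the call
```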
## When Sentiment Analysis Works Well
Despite these limitations, sentiment analysis is genuinely useful in specific contexts:
**Trend analysis over time.** Individual scores may be noisy, but aggregates reveal patterns. If your average sentiment drops 20% after a product update, that's signal even if individual scores are imprecise.
```javascript
function trackSentimentTrend(dataPoints) {
  // Calculate rolling 7-day average
  const windowSize = 7;
  const trend = [];

  for (let i = windowSize - 1; i < dataPoints.length; i++) {
    const window = dataPoints.slice(i - windowSize + 1, i + 1);
    const avg = window.reduce((sum, d) => sum + d.score, 0) / window.length;
    trend.push({
      date: dataPoints[i].date,
      averageSentiment: avg,
      dataPoints: window.length
    });
  }

  // Detect significant changes
  for (let i = 1; i < trend.length; i++) {
    const change = trend[i].averageSentiment - trend[i - 1].averageSentiment;
    if (Math.abs(change) > 0.2) {
      console.log(`Significant sentiment shift on ${trend[i].date}: ${change > 0 ? '+' : ''}${change.toFixed(2)}`);
    }
  }

  return trend;
}
```
**Prioritization and routing.** You don't need perfect sentiment scores to route urgent issues. Anything scoring strongly negative deserves faster attention, even if some scores are imprecise.
```python
import requests

def route_support_ticket(ticket_text):
    """
    Route ticket based on sentiment urgency.
    """
    response = requests.post(
        'https://api.apiverve.com/v1/sentimentanalysis',
        headers={'x-api-key': 'YOUR_API_KEY'},
        json={'text': ticket_text}
    )
    data = response.json()['data']

    if data['comparative'] < -0.3:
        return {
            'queue': 'urgent',
            'priority': 'high',
            'reason': 'Strongly negative sentiment detected'
        }
    elif data['comparative'] < -0.1:
        return {
            'queue': 'standard',
            'priority': 'medium',
            'reason': 'Negative sentiment detected'
        }
    else:
        return {
            'queue': 'standard',
            'priority': 'normal',
            'reason': 'Neutral or positive sentiment'
        }
```
**Large-scale categorization.** When you have 10,000 reviews and need to understand the general sentiment distribution, automated analysis is the only practical approach. Perfect accuracy isn't required; directional accuracy is.
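A distribution summary like that can be a few lines, using the same comparative-score thresholds this post uses elsewhere:

```python
from collections import Counter

def sentiment_distribution(scores):
    """Summarize comparative scores into a categorical distribution."""
    def bucket(s):
        if s > 0.1:
            return "positive"
        if s < -0.1:
            return "negative"
        return "neutral"

    counts = Counter(bucket(s) for s in scores)
    total = len(scores)
    # Return each category's share of the total, rounded for display
    return {k: round(v / total, 2) for k, v in counts.items()}

print(sentiment_distribution([0.4, 0.2, -0.3, 0.0, 0.05, -0.5]))
```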
**Early warning systems.** A sudden spike in negative sentiment — even if some individual scores are wrong — is worth investigating. You're looking for anomalies, not precision.
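A sketch of such an early-warning check: compare each day's share of negative feedback against its trailing average. The window and threshold here are illustrative starting points:

```python
def detect_spikes(daily_negative_rates, window=7, threshold=2.0):
    """Flag days whose negative-feedback share jumps well above the
    trailing average. Window and threshold are illustrative defaults."""
    alerts = []
    for i in range(window, len(daily_negative_rates)):
        baseline = sum(daily_negative_rates[i - window:i]) / window
        if baseline > 0 and daily_negative_rates[i] / baseline >= threshold:
            alerts.append(i)  # day index worth investigating
    return alerts

rates = [0.10, 0.12, 0.09, 0.11, 0.10, 0.08, 0.10, 0.30]
print(detect_spikes(rates))  # [7]: the last day triples its trailing average
```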
## Building Better Sentiment Features
Given the limitations, here's how to build sentiment features that actually work:
**Don't show raw scores to users.** A score of 0.23 is meaningless to most people. Translate to categories: positive, negative, needs attention.

**Use sentiment as one signal among many.** Combine with behavior (did they renew?), explicit ratings (what stars did they give?), and keywords (did they mention specific issues?).

**Set appropriate thresholds.** For alerting on negative sentiment, use stricter thresholds to reduce false positives. For categorization, looser thresholds are acceptable.

**Build feedback loops.** Let users correct misclassified items. Use these corrections to understand where sentiment analysis is failing for your specific use case.
```javascript
async function enrichedSentimentAnalysis(text, additionalSignals = {}) {
  const response = await fetch('https://api.apiverve.com/v1/sentimentanalysis', {
    method: 'POST',
    headers: {
      'x-api-key': 'YOUR_API_KEY',
      'Content-Type': 'application/json'
    },
    body: JSON.stringify({ text })
  });
  const { data } = await response.json();

  // Combine sentiment with additional signals
  const analysis = {
    rawSentiment: data.sentimentText,
    sentimentScore: data.comparative,
    confidence: Math.abs(data.comparative) > 0.3 ? 'high' : 'medium'
  };

  // Override with explicit signals if available
  if (additionalSignals.starRating <= 2 && data.comparative > 0) {
    analysis.warning = 'Sentiment/rating mismatch - possible sarcasm';
    analysis.effectiveSentiment = 'negative';
  }

  if (additionalSignals.customerChurned && data.comparative > 0) {
    analysis.warning = 'Positive sentiment but customer churned';
    analysis.effectiveSentiment = 'negative';
  }

  // Check for urgency keywords regardless of sentiment
  const urgencyWords = ['immediately', 'urgent', 'asap', 'critical', 'emergency'];
  if (urgencyWords.some(word => text.toLowerCase().includes(word))) {
    analysis.urgency = 'high';
  }

  return analysis;
}
```
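The feedback-loop point is worth sketching concretely too: tally human corrections against predicted labels to see which misclassifications recur. The item IDs and labels here are illustrative:

```python
def record_corrections(predictions, corrections):
    """Tally human corrections by error type to reveal systematic
    misclassifications. IDs and labels are illustrative."""
    errors = {}
    for item_id, predicted in predictions.items():
        actual = corrections.get(item_id)
        if actual and actual != predicted:
            key = f"{predicted} -> {actual}"
            errors[key] = errors.get(key, 0) + 1
    return errors

predictions = {"r1": "positive", "r2": "negative", "r3": "positive"}
corrections = {"r1": "negative", "r3": "positive"}  # humans reviewed r1 and r3
print(record_corrections(predictions, corrections))  # {'positive -> negative': 1}
```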
## The Human-in-the-Loop
The best sentiment analysis systems include human review.
Not for every piece of feedback — that would defeat the purpose of automation. But for:
- Items flagged as strongly negative (validation before escalation)
- Items with conflicting signals (low rating but positive sentiment)
- Random samples (calibration and quality assurance)
- Edge cases (unusual vocabulary, very short text)
Sentiment analysis should reduce the human workload, not eliminate it. Use automation to prioritize and categorize; use humans to validate and act.
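Sketched as code, those review criteria might look like this; the field names, thresholds, and 5% sample rate are all illustrative:

```python
import random

def select_for_review(items, sample_rate=0.05, seed=42):
    """Pick which analyzed items a human should review.

    Field names, thresholds, and the sample rate are illustrative.
    """
    rng = random.Random(seed)  # seeded for reproducible sampling
    queue = []
    for item in items:
        if item["score"] < -0.3:                                  # strongly negative
            queue.append(item)
        elif item.get("stars", 3) <= 2 and item["score"] > 0.1:   # conflicting signals
            queue.append(item)
        elif len(item["text"].split()) < 4:                       # very short text
            queue.append(item)
        elif rng.random() < sample_rate:                          # random calibration
            queue.append(item)
    return queue

items = [
    {"text": "Absolutely awful experience from start to finish", "score": -0.6},
    {"text": "Great, just great, another broken release", "score": 0.5, "stars": 1},
    {"text": "The onboarding flow works fine for our whole team", "score": 0.2, "stars": 5},
]
print(len(select_for_review(items)))  # 2 items need human eyes
```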
## What the Numbers Actually Mean
Let's be concrete about the scores from our Sentiment Analysis API:
**Sentiment score (-5 to +5):** The raw emotional intensity. Higher magnitude means stronger sentiment.

**Comparative score (score / word count):** Normalized for text length. Allows fair comparison between a tweet and a paragraph. Typically ranges from -1 to +1.

**Sentiment text:** The categorical label (positive, negative, neutral).

For most applications, use the comparative score for comparisons and thresholding:
| Comparative Score | Interpretation |
|---|---|
| > 0.3 | Strongly positive |
| 0.1 to 0.3 | Mildly positive |
| -0.1 to 0.1 | Neutral or mixed |
| -0.3 to -0.1 | Mildly negative |
| < -0.3 | Strongly negative |
These thresholds aren't magic. Adjust based on your data and needs.
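To see why the normalization matters, here is the comparative calculation sketched in a few lines; this mirrors the definition above, not the API's exact internals:

```python
def comparative(raw_score, text):
    """Normalize a raw sentiment score by word count."""
    words = text.split()
    return raw_score / len(words) if words else 0.0

# The same raw score reads very differently at different lengths
print(comparative(-3, "Terrible app."))  # -1.5: short text, intense signal
print(comparative(-3, "The app is mostly fine, though the last update was terrible."))  # about -0.27: diluted
```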
## The Honest Conclusion
Sentiment analysis won't tell you exactly what customers think. It won't catch every nuance, detect every sarcastic remark, or understand every context.
What it will do:
- Process thousands of feedback items in seconds
- Identify patterns in aggregate sentiment
- Flag potential issues for human review
- Prioritize responses based on emotional intensity
- Track sentiment trends over time
That's valuable. But only if you understand its limitations.
Use sentiment analysis as a tool for scale and prioritization, not as ground truth. Combine it with other signals. Validate important decisions with human review.
Your customers are communicating with you. Sentiment analysis helps you listen at scale — but you still need to actually listen.
Ready to understand customer sentiment? The Sentiment Analysis API processes text in milliseconds, returning sentiment scores and classifications. Combine with behavioral data and human review for insights that actually drive decisions.
Originally published at APIVerve Blog