The Sentiment Analysis Bot That Thought Every Customer Was Angry

I built a sentiment analysis system for customer support tickets. It categorized every incoming message as Positive, Neutral, or Negative.
Within 24 hours, 89% of tickets were flagged as Negative and marked urgent.
The support team went into panic mode. They thought they were in a customer crisis.
Nobody was angry. The bot was just broken.

The Setup
A SaaS company handled customer support through email and chat with around 300 to 400 tickets per day.
They wanted automatic sentiment detection to prioritize responses. Negative sentiment would be high priority, neutral would be standard, and positive would be low priority.
I built the system quickly using GPT with a straightforward prompt and deployed it on a Monday morning.
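For context, that first version was little more than a thin wrapper around the API. Here is a reconstruction, not the production code; the model name, function names, and the exact priority mapping are placeholders I'm filling in for illustration:

```python
# Rough sketch of the original pipeline (a reconstruction, not the
# exact production code). Assumes the OpenAI Python SDK and an
# OPENAI_API_KEY in the environment.
from openai import OpenAI

client = OpenAI()

# Sentiment label -> response priority, per the triage rules above.
PRIORITY = {"Negative": "high", "Neutral": "standard", "Positive": "low"}

def classify_ticket(ticket_text: str) -> str:
    """Ask the model for a one-word sentiment label."""
    resp = client.chat.completions.create(
        model="gpt-4o-mini",  # placeholder; the original model isn't specified
        messages=[
            {
                "role": "system",
                "content": "Classify the customer message as Positive, "
                           "Neutral, or Negative. Reply with one word.",
            },
            {"role": "user", "content": ticket_text},
        ],
        temperature=0,
    )
    return resp.choices[0].message.content.strip()

def route_ticket(ticket_text: str) -> str:
    sentiment = classify_ticket(ticket_text)
    # Unknown or malformed labels fall back to standard priority.
    return PRIORITY.get(sentiment, "standard")
```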

The Prompt That Broke Everything
The prompt included a single instruction that sounded responsible but was actually destructive.
It told the model to prioritize customer safety and to mark anything that hinted at dissatisfaction as negative so it would get a fast response.
That one line taught the model to treat almost any support request as an urgent complaint.
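I no longer have the exact wording, but the system prompt was close in spirit to this paraphrase. The last instruction is the one that did the damage:

```python
# Paraphrased reconstruction of the original system prompt.
BROKEN_SYSTEM_PROMPT = """
You are a sentiment classifier for customer support tickets.
Classify each message as Positive, Neutral, or Negative.

Prioritize customer safety: if a message hints at any
dissatisfaction, mark it Negative so it gets a fast response.
"""  # <- the "safety" line teaches the model to hunt for problems
```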

The False Alarm Flood
By the end of the day the numbers were absurd.
Out of 347 tickets, 309 were flagged as negative.
Neutral tickets almost disappeared and positive tickets were nearly nonexistent.
The system effectively became an emergency siren instead of a classifier.

The Angry Customers Who Weren’t
Normal messages were getting labeled as negative.
A user saying they couldn’t figure out how to export a report was treated as frustration.
A quick question about the Pro plan was treated as dissatisfaction.
A technical issue like a payment button not working was treated as anger.
A polite follow-up asking whether a previous email had been received was treated as the user being upset.
A request to clarify refund policy was treated as a complaint even when it was a simple question.
None of those customers were angry. They were doing standard support behavior.

Why This Happened
The model followed the instruction literally.
If the message wasn’t clearly positive, it searched for any sign of a problem.
Any problem became dissatisfaction.
Dissatisfaction became negative.
Negative became urgent.
The prompt made the model allergic to neutrality.

The Failed Fix
My first attempt was to tell the model to mark a message negative only when the customer was explicitly angry or frustrated.
That reduced the false positives but created a new failure.
Angry customers don’t always say “I am angry.”
They signal it with pressure, threats, urgency, or sarcasm, and the model started missing some of those.
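Paraphrased, the failed fix looked something like this. It swings the bias too far the other way:

```python
# Paraphrased reconstruction of the first attempted fix.
# Requiring *explicit* anger cut false positives but started
# missing implicit anger: pressure, threats, urgency, sarcasm.
FAILED_FIX_PROMPT = """
You are a sentiment classifier for customer support tickets.
Classify each message as Positive, Neutral, or Negative.

Only mark a message Negative if the customer is explicitly
angry or frustrated.
"""
```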

The Real Solution
I rebuilt the classification around specific indicators, not vague caution.
Negative became reserved for clear emotional intensity, threats, repeated unresolved issues, or urgency combined with frustration.
Neutral became the default for ordinary questions, feature requests, technical problems stated calmly, and follow-ups.
Positive became thanks, praise, or success confirmations.
The most important change was the default rule.
If the model is unsure, it must choose neutral, not negative.
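Paraphrased again, the rebuilt prompt looked roughly like this. The indicator lists are from memory, but the structure and the default rule are the point:

```python
# Paraphrased reconstruction of the final rubric. The key change
# is the last rule: ambiguity defaults to Neutral, never Negative.
FINAL_SYSTEM_PROMPT = """
You are a sentiment classifier for customer support tickets.
Classify each message as Positive, Neutral, or Negative.

Negative - reserve for clear indicators:
- explicit emotional intensity (anger, frustration)
- threats to cancel, escalate, or dispute charges
- repeated unresolved issues
- urgency combined with frustration

Neutral - the default for ordinary support traffic:
- questions, feature requests, follow-ups
- technical problems described calmly

Positive - thanks, praise, success confirmations.

If you are unsure, choose Neutral. Never choose Negative
unless a Negative indicator is clearly present.
"""
```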

The Transformation
The same everyday messages moved back into neutral where they belonged.
The truly angry message with explicit escalation and threats stayed negative as intended.
The bot stopped acting like a panic button and started acting like a triage tool.

The Results
After the fix, negative tickets dropped to a realistic level that the team could actually treat as urgent.
Neutral became the majority, as it should be for routine support volume.
Positive became a small but healthy portion.
The support team immediately felt the difference because urgency became meaningful again.

What I Learned
Safety language in prompts can create paranoia in classification tasks.
Keywords are not sentiment. Emotion is sentiment.
Problems can be neutral. A question can be neutral. A bug report can be neutral.
If you default to negative, your system collapses into false alarms.
If you default to neutral, urgency becomes a useful signal again.

Your Turn
Have you dealt with oversensitive classification systems in production?
How do you define your sentiment buckets so the model doesn’t panic?
What do you do when a message is ambiguous?

Written by FARHAN HABIB FARAZ, Senior Prompt Engineer and Team Lead at PowerInAI
Building AI automation that adapts to humans.

Tags: sentimentanalysis, promptengineering, automation, customerservice, classification
