The Medical Chatbot That Diagnosed Everyone With Cancer

I built a healthcare chatbot that turned into a hypochondriac’s nightmare generator.
A telehealth startup wanted a symptom checker: users describe their symptoms, the AI provides information and suggests whether to see a doctor. Standard stuff. Every health app does this.
I researched medical chatbots. Studied symptom-checking algorithms. Read clinical decision-making guides. Built what I thought was a responsible, cautious system.
Then we deployed it.

Day Four: Panic Mode
Customer support was drowning.
Users were furious. Some were terrified.
People with headaches were told they might have brain tumors. Parents of kids with colds were seeing leukemia in the responses. People with coughs were being warned about lung cancer.
I checked the logs.
And immediately understood what I had done.

What Was Actually Happening
A user said they’d had a headache for three days.
The AI responded with a long list of possible causes. Tension headaches. Migraines. Dehydration. Eye strain. Sinus infections.
Then it added rare but serious possibilities. Brain tumors. Meningitis. Aneurysms.
It ended with a warning to consult a doctor immediately.
It did this for everything.
Fatigue included leukemia. Coughs included lung cancer. Fever included life-threatening conditions.
The AI wasn’t technically wrong. Those things can happen.
But it was functionally dangerous.

Why This Happened
My prompt told the AI to be thorough and cautious.
It said to provide comprehensive information. To always include serious conditions, even if rare. To recommend seeing a doctor when in doubt.
The AI followed those instructions perfectly.
It listed everything. It emphasized seriousness. And it escalated every case.
For every symptom.
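
To make the failure concrete, here's a rough reconstruction of the kind of system prompt I had written. This is an illustrative sketch, not the production prompt; the variable name and exact wording are mine, made up for this post.

```python
# Hypothetical reconstruction of the original over-cautious system prompt.
# Wording and names are illustrative, not the production code.
SYMPTOM_CHECKER_PROMPT = """
You are a medical information assistant.
- Be thorough and cautious.
- Provide comprehensive information about all possible causes of the user's symptoms.
- Always include serious conditions, even if they are rare.
- When in doubt, recommend that the user see a doctor immediately.
"""
```

Every one of those lines sounds responsible on its own. Together, they guarantee that a three-day headache gets a brain tumor mention and an urgent warning.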

The Problem I Created
I built medical student syndrome at scale.
Medical students learn about diseases and suddenly believe they have all of them. I turned that experience into a chatbot for everyday users.
The AI presented worst-case scenarios alongside common explanations with equal weight. Users fixated on the scariest possibility.

Real User Impact
A woman with occasional headaches saw brain tumor mentioned. She didn’t sleep for three days and went to the ER. Twelve hundred dollars later, the diagnosis was stress.
A parent saw leukemia listed for a child’s fever. They rushed to the ER at two in the morning. Eight hundred dollars. Viral infection.
A former smoker with a cough saw lung cancer. He spiraled. Emergency visit. Seasonal allergies.
Reviews called the app irresponsible and dangerous.

The Legal Scare
In week two, a lawyer contacted us.
A user had gone to the ER three times in one week after chatbot interactions. Each time, the AI suggested serious cardiac conditions. Each diagnosis was anxiety and panic attacks.
The user claimed the chatbot caused the anxiety that sent them to the ER.
That’s when I knew this wasn’t just bad UX. It was a liability.

My First Failed Fix
I tried restricting serious conditions to severe symptoms.
That failed immediately.
The AI couldn’t consistently interpret what “severe” meant. Some mild symptoms were escalated. Some genuinely concerning cases were underplayed.
We were blamed either way.

The Real Solution
I rebuilt the system around likelihood, context, and framing.
The AI stopped acting like a medical encyclopedia and started acting like a guide.
Common conditions were listed first. Rare conditions were only mentioned when the symptom pattern, duration, or severity justified it.
Single mild symptoms no longer triggered cancer mentions.
Short durations didn’t trigger emergency language.
And every response clearly explained when and why a doctor visit made sense.
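
Here's a simplified sketch of the rewritten prompt, with the likelihood-first framing and the gating rules spelled out. Again, this is illustrative wording under my own naming, not the exact production prompt.

```python
# Hypothetical sketch of the rebuilt prompt. The structure mirrors the rules
# described above; the exact production wording differed.
GUIDED_SYMPTOM_PROMPT = """
You are a calm, practical health guide, not a medical encyclopedia.

When a user describes symptoms:
1. Start with the most common, most likely explanations.
2. Mention rare or serious conditions ONLY if the symptom pattern,
   duration, or severity justifies it.
3. Never mention cancer or other life-threatening diagnoses for a
   single mild symptom of short duration.
4. Do not use emergency language unless red-flag symptoms are present.
5. End every response by explaining when, and why, a doctor visit
   would make sense.
"""
```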

How Responses Changed
A three-day headache now produced reassurance, practical self-care advice, and clear red flags for when to worry.
Fatigue produced lifestyle explanations first, with guidance on when to seek medical evaluation.
Urgent escalation only happened for true emergencies like chest pain with shortness of breath or stroke symptoms.
The AI stopped shouting. It started explaining.

Handling the Hard Cases
When symptoms were truly concerning, the AI acted decisively. No hedging. No lists. Just clear instructions to seek emergency care.
When symptoms persisted for weeks, it escalated calmly and appropriately.
When users explicitly expressed fear, the AI addressed the fear directly. It explained probabilities. It reassured without dismissing. It acknowledged health anxiety as real.
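
One way to enforce those different modes is a small routing step before the model call, so the prompt the model sees already knows which tone is appropriate. The sketch below is the idea, not our exact implementation; the red-flag list and the 14-day threshold are assumptions I'm using for illustration.

```python
# Illustrative routing logic, not the production implementation.
# RED_FLAGS and the duration threshold are assumptions for this sketch.
RED_FLAGS = {
    "chest pain with shortness of breath",
    "stroke symptoms",
}

def pick_response_mode(symptoms: set[str], duration_days: int,
                       user_expresses_fear: bool) -> str:
    """Decide which tone and instructions the prompt should use."""
    if symptoms & RED_FLAGS:
        return "emergency"        # decisive: seek emergency care now
    if duration_days >= 14:
        return "calm_escalation"  # persistent symptoms: see a doctor soon
    if user_expresses_fear:
        return "reassure"         # address the anxiety, explain probabilities
    return "routine"              # common causes + self-care + red flags to watch
```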

The Results
Before the fix, most users reported feeling anxious or scared after using the chatbot. ER visits spiked. Reviews were brutal. Legal risk was real.
After the fix, panic dropped dramatically. ER visits became rare and appropriate. Reviews turned positive. Legal threats disappeared.
Users described the chatbot as calming, helpful, and reassuring.

What I Learned
Completeness is not the same as helpfulness.
Order matters. Rare conditions listed alongside common ones feel equally likely.
Medical information without probability context is irresponsible.
Healthcare AI needs different safety rules than most other domains.
And testing only with clinicians is a mistake. Real users don’t think in probabilities.

The Principle
Inform without alarming. Guide without frightening.
Healthcare AI should reduce anxiety, not create it. The goal isn’t to show everything that could go wrong. It’s to help people make reasonable decisions without panic.

Your Turn
Have you ever built something that was technically correct but practically harmful? How do you balance thoroughness with responsibility in sensitive domains?

Written by FARHAN HABIB FARAZ
Senior Prompt Engineer and Team Lead at PowerInAI
Building health AI that informs, not terrifies

Tags: healthtech, medicalai, symptomchecker, responsibleai, promptengineering, healthcare
