The FAQ Bot That Made Up Answers When It Couldn’t Find Real Ones

I built an FAQ bot for a SaaS company. It answered customer questions using their internal knowledge base.
For three weeks, it worked beautifully. Customers were happy. Support tickets went down. Everything looked stable.
Then someone asked a simple question.
Do you offer a student discount?
The bot replied yes, students get 40 percent off with a valid student ID and a code called STUDENT40.
There was no student discount. There was no STUDENT40 code. The bot invented everything.
By the time we noticed, 83 students had already tried to use the fake code.

The Setup
This was a SaaS company with a massive knowledge base: around 200 pages of product documentation, pricing FAQs, and troubleshooting guides.
They wanted a chatbot that could answer questions from the knowledge base, handle support 24/7, and reduce ticket volume.
The system followed a standard RAG flow. A customer asks a question. The bot searches the knowledge base. The bot answers using retrieved content.
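Stripped down, the flow looked something like the sketch below. The toy in-memory knowledge base, the keyword-overlap retriever, and the call_llm stub are placeholders for illustration, not the production stack.

```python
from dataclasses import dataclass

@dataclass
class Doc:
    title: str
    text: str

# Toy in-memory stand-in for the ~200 pages of documentation.
KNOWLEDGE_BASE = [
    Doc("Pricing", "Overview of the monthly and annual plans and what each includes."),
    Doc("Password reset", "Click 'Forgot password' on the login page to receive a reset link."),
]

def retrieve(question: str, k: int = 3) -> list[Doc]:
    """Naive keyword-overlap retrieval; a real system would typically use embeddings."""
    terms = set(question.lower().split())
    scored = [(len(terms & set((d.title + " " + d.text).lower().split())), d)
              for d in KNOWLEDGE_BASE]
    scored.sort(key=lambda pair: pair[0], reverse=True)
    return [d for score, d in scored[:k] if score > 0]

def call_llm(prompt: str) -> str:
    # Stand-in for whatever chat-completion API the bot calls; echoes the prompt so the sketch runs.
    return f"[model response based on]\n{prompt}"

def answer(question: str) -> str:
    docs = retrieve(question)
    context = "\n\n".join(f"{d.title}:\n{d.text}" for d in docs)
    prompt = (
        "You are a helpful support assistant.\n"
        f"Knowledge base excerpts:\n{context}\n\n"
        f"Customer question: {question}\n"
        "Provide a complete, helpful answer."
    )
    return call_llm(prompt)
```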
I tested it with 50 known questions. Every answer was accurate.
We deployed.

The Invisible Problem
For the first three weeks, nothing looked wrong.
Customer satisfaction was high. Resolution rate was strong. There were no complaints.
Then support agents started seeing strange tickets.
One customer said the bot told them to use STUDENT40 but the code did not work. Another said the bot claimed there was an iOS app that did not exist. Another tried to integrate with Salesforce because the bot mentioned it.
None of these things were real.

The Pattern
The bot was answering questions that were not covered anywhere in the knowledge base.
And instead of saying it didn’t know, it confidently made things up.
I tested it deliberately.
I asked if there was a lifetime deal. The bot invented a one-time $999 plan.
I asked if data could be exported to Excel. The bot gave step-by-step instructions for a feature that did not exist and added fake limits.
I asked about enterprise refunds. The bot created a 60-day refund policy that was never documented.
The answers were detailed, specific, and completely wrong.

Why This Happened
The problem was in my prompt.
I told the bot to be helpful and provide complete answers. Then I added one dangerous instruction.
"If you don't have exact information, use your best judgment to provide a helpful response."
To a human, that sounds reasonable. To an LLM, it means guess.
The internal logic was simple. The question had no answer in the knowledge base. The prompt said to be helpful anyway. The model relied on general SaaS patterns and invented something plausible.
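To make that concrete, the offending prompt shape looked roughly like this. The wording below is illustrative, reconstructed from the instructions described above rather than copied from the production prompt.

```python
# Illustrative system prompt in the spirit of the original; not the verbatim production text.
SYSTEM_PROMPT = """You are a friendly support assistant for our product.
Be helpful and provide complete answers.
Use the knowledge base excerpts below to answer the customer's question.
If you don't have exact information, use your best judgment to provide a helpful response."""

# The last line is the trap: when retrieval comes back empty, "best judgment"
# reads as permission to fill the gap from general SaaS patterns in the training data.
```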
That is how hallucinations happen.

The Confidence Trap
The worst part was how confident the lies sounded.
Real answers were careful. Fabricated answers were assertive.
"Yes, students get 40 percent off." "Go to Settings and click Export." "Enterprise plans have a 60-day guarantee."
The bot sounded more sure when it was wrong than when it was right.

The Failed Fix
My first attempt was to tell the bot to only answer if the information was in the knowledge base.
That broke legitimate questions.
If the wording did not match exactly, the bot refused to answer even common questions like password resets, despite the information being clearly documented under a different heading.
It became too strict.
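In prompt terms, the over-correction looked something like this (again, illustrative wording rather than the exact text):

```python
# Too strict: the model reads "explicitly stated" as "worded exactly like the question".
STRICT_RULE = """Only answer if the information is explicitly stated in the knowledge base excerpts.
If it is not explicitly stated, refuse to answer."""

# Result: a routine password-reset question got refused whenever the documentation
# covered it under a differently worded heading, even though the answer was right there.
```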

The Real Solution
I introduced an explicit honesty protocol.
Every answer now depended on confidence level.
If the bot found an exact match, it answered directly and referenced the documentation.
If it found related information, it shared what it knew and clearly stated the limitation.
If it found nothing relevant, it said so immediately and escalated to a human.
Most importantly, I explicitly forbade fabrication. No invented prices. No fake features. No imaginary integrations. No guessed policies.
If the bot did not know, it had to say it did not know.
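Here is one way to express that protocol, reusing the Doc, retrieve, and call_llm stubs from the earlier sketch. The rule wording and the escalate_to_human helper are illustrative, not the exact production prompt or hand-off.

```python
# One way to encode the honesty protocol as a prompt plus a retrieval gate.
HONESTY_RULES = """Answer policy, in this order:
1. The excerpts answer the question exactly: answer directly and reference the documentation.
2. The excerpts are related but incomplete: share what they say and state the limitation clearly.
3. The excerpts do not cover the question: say so and offer to connect the customer with support.

Never invent prices, features, integrations, discount codes, or policies.
If you do not know, say you do not know."""

def escalate_to_human(question: str) -> str:
    # Hypothetical hand-off; in production this routes the conversation to a human agent.
    return ("I couldn't find that in our documentation, so I'm connecting you "
            "with our support team to make sure you get an accurate answer.")

def answer_honestly(question: str) -> str:
    docs = retrieve(question)
    if not docs:  # nothing relevant retrieved: escalate instead of letting the model improvise
        return escalate_to_human(question)
    context = "\n\n".join(f"{d.title}:\n{d.text}" for d in docs)
    prompt = f"{HONESTY_RULES}\n\nKnowledge base excerpts:\n{context}\n\nCustomer question: {question}"
    return call_llm(prompt)
```

With the empty-retrieval gate in front of the model, a question the documentation never covers never reaches the LLM with an invitation to improvise.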

The Transformation
When asked about student discounts, the bot now said it could not find information and offered to connect the user with sales.
When asked about Excel exports, it acknowledged export functionality but avoided claiming unsupported formats.
When asked about an iOS app, it confirmed Android availability and escalated for iOS clarification.
Customers stopped chasing non-existent features.

The Escalation Benefit
Before, the bot created frustration by confidently lying.
After, it created trust by being honest and escalating appropriately.
Customers explicitly said they preferred a bot that admitted uncertainty over one that sent them on a wild goose chase.

The Results
Before the fix, nearly a quarter of responses contained fabricated information. Support teams spent time correcting the bot. Brand trust suffered. Legal risk increased.
After the fix, fabricated answers dropped to zero. About 18 percent of conversations escalated to humans, which was exactly what should happen. Human resolution rates were high and customer trust recovered.

The Audit Shock
When we audited three weeks of conversations, the numbers were brutal.
Out of 2,847 conversations, 641 contained false information.
Most hallucinations were around discounts, features, integrations, policies, and pricing.
Almost one in four helpful-sounding answers was a lie.

What I Learned
Telling an AI to be helpful without constraints creates confident liars.
Fabricated answers sound better than real ones.
"I don't know" is not a failure. It is a feature.
LLMs are trained to complete responses, not to stop. You must teach them when stopping is correct.

The Bottom Line
One instruction to use best judgment caused hundreds of fake answers in three weeks.
The fix was not a better model. It was teaching the system that honesty beats completion.
Now the bot knows when to answer and when to step aside.
That is what real reliability looks like.

Written by FARHAN HABIB FARAZ, Senior Prompt Engineer and Team Lead at PowerInAI
Building AI automation that adapts to humans.

Tags: hallucination, rag, knowledgebase, promptengineering, aiaccuracy, honesty
