I built a liar. A very convincing one.
Three months into deployment, everything looked perfect. Customer satisfaction was 4.6 out of 5. Support tickets were down forty percent. The client’s CFO even sent champagne to our office.
Then someone posted on Reddit warning people not to trust the chatbot because it was giving completely wrong information about returns. The thread blew up with comments from customers saying the bot told them one thing, but human support told them something else.
My stomach dropped.
I spent the next seventy-two hours digging through logs. What I found was worse than I expected. The bot wasn’t occasionally wrong. It was systematically confident about information that didn’t exist.
What Was Actually Happening
We built a standard RAG system. Customer asks a question. The bot searches the knowledge base. The bot answers using the retrieved information. That’s the textbook approach, and it should have been safe.
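For context, here is a minimal sketch of what that pipeline looked like. The function names are placeholder stubs rather than our real internal API; the important detail is that nothing stops generation when retrieval comes back empty.

```python
# Minimal sketch of the original pipeline. search_knowledge_base and call_llm
# are placeholder stubs standing in for our vector store and model client,
# not the real internal API. The bug: generation runs regardless of what
# retrieval returns, so an empty result list still yields a confident answer.

def search_knowledge_base(question: str, top_k: int = 5) -> list[dict]:
    """Placeholder retrieval call. Returns [] when nothing relevant exists."""
    return []  # imagine a vector-store query here

def call_llm(prompt: str) -> str:
    """Placeholder for the model API call."""
    return "(model output)"

def answer_question(question: str) -> str:
    docs = search_knowledge_base(question)            # may be empty
    context = "\n\n".join(d["text"] for d in docs)    # empty string if no docs
    prompt = (
        "You are a confident, helpful customer support assistant.\n"
        f"Context:\n{context}\n\n"
        f"Customer question: {question}\nAnswer:"
    )
    # Nothing here checks whether context is empty, so the model fills
    # the gap with plausible-sounding text instead of refusing.
    return call_llm(prompt)
```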
Except we had a silent failure mode.
Customers asked about an electronics return policy. The knowledge base search returned zero relevant documents. The bot still replied with a fully formed policy, including timelines and restocking fees, as if it were quoting official documentation.
It was completely fabricated.
When I checked the knowledge base, there was no electronics-specific policy at all. The real general return policy was different. The bot had hallucinated a plausible-sounding policy, and thousands of customers received variations of that fiction.
The Lies Were Not Random
Once I started looking, I found the same pattern everywhere.
Customers asked about a warranty for a model that wasn’t documented. The retrieval returned nothing. The bot confidently invented a two-year warranty with coverage details that were wrong.
Customers asked whether the company shipped to Hawaii. The retrieval returned a general shipping doc but no Hawaii details. The bot confidently said yes and even added delivery timelines. In reality, the company hadn’t shipped to Hawaii for months.
Customers asked whether the company price matched competitors. The retrieval returned nothing. The bot confidently claimed a price match program that never existed.
This wasn’t an occasional miss. It was structural. The bot had learned a dangerous default behavior. If it couldn’t find the answer, it would guess something that sounded reasonable for that industry.
Why This Happened
The cause was painfully simple. My system prompt told the bot to be confident and helpful.
Those words sound harmless until you see how a language model interprets them. “Be confident” becomes “never admit uncertainty.” “Be helpful” becomes “always provide an answer.” Combine that with a knowledge base gap and the model does what it’s designed to do. It generates plausible text.
It wasn’t trying to deceive. It was doing its job too well.
The Moment I Realized the Scale
I wrote a script to audit every conversation where the knowledge base returned zero results or weak matches, but the bot still produced definitive factual claims.
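A simplified version of that audit is below, assuming the conversation logs were exported as JSON lines with retrieval scores attached. The field names, the score threshold, and the regex standing in for “definitive factual claims” are all illustrative, not the real schema.

```python
# Simplified audit sketch: flag conversations where retrieval was empty or
# weak but the reply still states facts. Field names, threshold, and the
# claim-detection regex are illustrative, not the production schema.
import json
import re

WEAK_MATCH_THRESHOLD = 0.75   # assumed cutoff; depends on the retriever
# Crude signal that a reply makes a definitive factual claim.
CLAIM_PATTERN = re.compile(
    r"\b(policy|warranty|we ship|price match|\d+\s*(days?|months?|years?))\b",
    re.IGNORECASE,
)

def flag_suspect_conversations(log_path: str) -> list[dict]:
    """Find replies that state facts despite empty or weak retrieval."""
    suspects = []
    with open(log_path) as f:
        for line in f:
            record = json.loads(line)
            docs = record.get("retrieved_docs", [])
            top_score = max((d["score"] for d in docs), default=0.0)
            weak_retrieval = not docs or top_score < WEAK_MATCH_THRESHOLD
            definitive_claim = bool(CLAIM_PATTERN.search(record["bot_reply"]))
            if weak_retrieval and definitive_claim:
                suspects.append(record)
    return suspects
```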
The numbers were brutal. Tens of thousands of affected conversations. Dozens of policies that didn’t exist. Large batches of fabricated product specifications. Plenty of false shipping and availability claims.
The worst part was that customer satisfaction was high for many of those interactions.
It was high because the bot was telling people what they wanted to hear. Yes we ship there. Yes we price match. Yes you can return it for thirty days.
Customers only realized the lie when they tried to act on it.
The First Fixes Failed
My first attempt was telling the bot to only answer when it was certain. That made it useless. Customers asked normal questions and got vague refusals that felt broken.
My second attempt was telling it to check the knowledge base and say it didn’t know if nothing was found. It still hallucinated. It would claim it found information when it hadn’t, because it couldn’t reliably separate retrieval from generation.
My third attempt was telling it never to make up information. That just made the hallucinations more polite. It started saying things like “based on industry standards” or “it’s likely,” which still misled customers.
I wasn’t fixing the problem. I was changing the style of the lie.
The Fix That Actually Worked
I had to change both architecture and prompt behavior.
The first change was adding a verification layer. Before the bot could make any factual claim, it had to pass a strict check: did the knowledge base return verified content with strong confidence? If not, it had to stop and trigger an unknown-response flow.
The second change was teaching the bot how to say “I don’t know” without sounding useless. If the knowledge base did not contain confirmed information, the bot had to say so clearly and route the customer to a human channel that could provide the correct answer.
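Sketched in code, those first two changes looked roughly like this. The score threshold, field names, and helper functions are assumptions for illustration; the important part is that generation is unreachable unless retrieval clears the bar, and the unknown path always ends in a human handoff.

```python
# Sketch of the verification layer and the unknown-response flow.
# Thresholds, field names, and the retrieval/LLM helpers are illustrative.

MIN_SCORE = 0.8   # assumed relevance cutoff for "verified" content
MIN_DOCS = 1

def verified_context(docs: list[dict]) -> str | None:
    """Return joined context only when retrieval is strong enough, else None."""
    strong = [d for d in docs if d["score"] >= MIN_SCORE]
    if len(strong) < MIN_DOCS:
        return None
    return "\n\n".join(d["text"] for d in strong)

def unknown_response(question: str) -> str:
    """Honest stop: admit the gap and route to a channel that can answer."""
    return (
        "I don't have confirmed information about that in our documentation. "
        "I can connect you with our support team for an exact answer."
    )

def grounded_prompt(question: str, context: str) -> str:
    return (
        "Answer using only the context below. If the context does not contain "
        "the answer, say you do not have confirmed information.\n\n"
        f"Context:\n{context}\n\nCustomer question: {question}\nAnswer:"
    )

def answer_question(question: str) -> str:
    docs = search_knowledge_base(question)   # placeholder retriever from the earlier sketch
    context = verified_context(docs)
    if context is None:
        return unknown_response(question)    # the model is never called
    return call_llm(grounded_prompt(question, context))
```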
The third change was creating an honesty hierarchy. Verified facts could be answered confidently. Partial matches could be shared with a clear limitation and a handoff. No information required an honest stop. Ambiguous questions required clarification. The rule was simple. Never pretend you are in the verified category when you are actually in the unknown category.
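One way to make that hierarchy explicit is to classify every question before any generation happens. The categories below mirror the hierarchy; the thresholds and the ambiguity check are placeholders.

```python
# The honesty hierarchy as an explicit routing decision made before generation.
# Thresholds and the ambiguity check are placeholders for illustration.
from enum import Enum, auto

class AnswerMode(Enum):
    VERIFIED = auto()    # strong match: answer confidently, only from the source
    PARTIAL = auto()     # weak match: share it, state the limitation, hand off
    UNKNOWN = auto()     # no match: honest stop plus a human channel
    AMBIGUOUS = auto()   # unclear question: ask for clarification first

STRONG_SCORE = 0.8   # assumed cutoffs
WEAK_SCORE = 0.5

def classify(docs: list[dict], is_ambiguous: bool) -> AnswerMode:
    if is_ambiguous:
        return AnswerMode.AMBIGUOUS
    top = max((d["score"] for d in docs), default=0.0)
    if top >= STRONG_SCORE:
        return AnswerMode.VERIFIED
    if top >= WEAK_SCORE:
        return AnswerMode.PARTIAL
    return AnswerMode.UNKNOWN
```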
The Prompt That Fixed It
The new primary rule became accuracy over helpfulness.
The bot had to query the knowledge base before answering any factual question. If it found a verifiable source, it had to answer using only that source. If it did not find a verifiable source, it had to admit it did not have confirmed information and offer the fastest path to a real answer.
We explicitly banned phrases that trigger guessing. We removed the language that pressures the bot to always produce an answer. We required traceability for claims. If you can’t cite it, you don’t say it.
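I won’t reproduce the production wording here, but a prompt along these lines captures the rules described above. This is an illustration, not the actual text.

```python
# Paraphrased shape of the new system prompt, not the production wording.
SYSTEM_PROMPT = """You are a customer support assistant. Your primary rule is
accuracy over helpfulness.

Before answering any factual question, consult the retrieved knowledge base
content provided in the context.

- If the context contains a verifiable source, answer using only that source.
- If it does not, say clearly that you do not have confirmed information and
  offer the fastest path to a real answer (human support, returns team, order page).
- Never guess, estimate, or fill gaps with what sounds reasonable for the
  industry. Avoid hedges like "based on industry standards" or "it's likely".
- If you cannot trace a claim to the provided context, do not make the claim.
"""
```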
Before and After in Real Conversations
Customers asking about shipping to Alaska stopped receiving invented timelines. They received an honest statement that the system did not contain confirmed Alaska shipping details, plus a direct contact path to the shipping team.
Customers asking about refurbished warranties stopped receiving a confident ninety-day policy that might not exist. They received a clear note that warranty terms were not found in documentation and instructions on how to get an exact answer using their model number.
Customers asking about special return cases stopped receiving plausible-sounding restrictions. They received an honest handoff to the returns team with a clear reason why.
The bot stopped sounding impressive. It started sounding trustworthy.
The Painful Part
Customer satisfaction dropped immediately.
For the first week, satisfaction fell from 4.6 to 3.8 because people hate hearing “I don’t know.” They complained that the bot was useless and asked why it existed if it had to route questions to humans.
I almost reverted.
But then something important happened. Trust started building.
By week four, satisfaction climbed back. By week eight, it exceeded the old score. Not because the bot suddenly knew more. Because customers learned a new rule. When the bot answers, it’s right.
And when it doesn’t know, it doesn’t lie.
What I Learned About RAG Systems
Hallucination is not a bug in a language model. It is a core behavior. The model’s job is to generate plausible text. If you want truthfulness, you have to fight that nature with strict constraints.
“Be helpful” is dangerous language. It encourages the model to fill gaps with guesses. The helpful thing is often to admit you don’t have confirmed information.
Confidence scores can mislead you. A high-confidence retrieval doesn’t mean the right answer exists. It can mean the system found something vaguely related. You must validate match quality, not just numeric confidence.
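One cheap guard along these lines is to require not just a score threshold but some overlap between the question’s key terms and the retrieved text. A rough sketch, with a made-up stopword list and thresholds:

```python
# Rough sketch of a match-quality check that goes beyond the raw score:
# the retrieved text must actually share key terms with the question.
# The stopword list, overlap heuristic, and thresholds are illustrative.
import string

STOPWORDS = {"the", "a", "an", "do", "does", "you", "to", "for", "is", "it",
             "what", "how", "can", "i", "my", "of", "on", "in", "we", "your"}

def key_terms(text: str) -> set[str]:
    words = (w.strip(string.punctuation) for w in text.lower().split())
    return {w for w in words if w and w not in STOPWORDS}

def is_relevant(question: str, doc_text: str, score: float,
                min_score: float = 0.8, min_overlap: float = 0.5) -> bool:
    terms = key_terms(question)
    if not terms:
        return score >= min_score
    overlap = len(terms & key_terms(doc_text)) / len(terms)
    return score >= min_score and overlap >= min_overlap
```

With a check like this, a question about shipping to Hawaii fails against a general shipping document that never mentions Hawaii, even if the embedding score looks strong.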
Knowledge base gaps are landmines. You can’t assume the bot will only answer what it knows. You must force an explicit unknown path.
The most important test is not what is in your knowledge base. It’s what is not in your knowledge base. The real question is whether the bot admits ignorance or invents an answer.
The Hardest Lesson
For three months, I thought we built a great system. High satisfaction, low tickets, happy client.
What we actually built was a confident liar.
Customers were happy because they were being told what they wanted to hear, until reality hit. Real success came when we stopped measuring perceived helpfulness and started measuring accuracy.
The honest bot had lower satisfaction for a few weeks, then higher satisfaction for the long term.
Because trust beats convenience.
Your Turn
Have you ever caught your AI hallucinating in production? How do you handle knowledge gaps in your RAG systems? What is your strategy for “I don’t know” responses?
Written by FARHAN HABIB FARAZ
Senior Prompt Engineer and Team Lead at PowerInAI
Building AI that says “I don’t know” when it doesn’t know
Tags: rag, hallucination, knowledgebase, truthfulness, promptengineering, llm