The RAG System That Confidently Made Up Product Specs (For 6 Months Straight)

A real estate company launched their property chatbot with excitement. It was trained on their listings, neighborhood data, and property specifications. Day 1 was a success and customers loved it. Day 3 was a disaster.

Real conversations started going wrong fast. A customer asked, “Does the apartment on Road 27 have parking?” The bot replied, “Yes! The Road 27 property includes 2 dedicated parking spaces in the underground garage.” In reality, the property had no parking. The customer showed up for the viewing extremely angry.

Another customer asked, “What’s the price of the 3-bedroom in Banani?” The bot said, “The 3-bedroom apartment in Banani is priced at ৳85 lakh, with a special 10% discount available this month.” Reality: price was ৳92 lakh, and no discount existed. The customer felt deceived.

Then someone asked, “Is there a gym in the building?” The bot answered, “Absolutely! The building features a fully-equipped gym on the 5th floor, open 6 AM to 10 PM daily.” Reality: the gym was mentioned in original plans but never built.

The pattern was clear. When the bot didn’t know something, instead of saying “I don’t know,” it confidently made things up. This is called hallucination, and it was destroying customer trust.

Why AI Hallucination Happens

LLMs are prediction engines. They predict the most likely next words based on patterns they’ve learned.

When asked, “Does the apartment have parking?” the model doesn’t “look up” the answer. It guesses. It has seen “apartment” and “parking” appear together frequently, and it has likely seen similar properties mention parking in the training data. So “Yes” feels probable, and it generates a confident response like, “Yes! The property includes 2 dedicated parking spaces…”

The AI doesn’t actually know. It’s predicting plausible text.

In casual conversation, that can be fine. In real estate, healthcare, legal, or financial contexts, it’s catastrophic.

The Three Types of Hallucinations We Encountered

The first type was information fabrication. The bot invented facts that didn’t exist anywhere in the knowledge base. For example, it might say, “The building was constructed in 2018 by XYZ Developers,” even though the construction year and developer were never mentioned in the listing. The bot simply filled the gap with plausible-sounding details.

The second type was attribute confusion. The bot mixed details from different properties. It might say, “This 2-bedroom apartment has a rooftop pool,” when the pool belonged to a different luxury listing in the database. It blended attributes across listings.

The third type was outdated information. The bot used old data that was no longer accurate. It might quote “Price: ৳75 lakh” even after the price was updated to ৳82 lakh last week, because the knowledge base wasn’t refreshed.

All three led to the same outcome: loss of customer trust.

Failed Approaches: What Didn’t Work

The first attempt was the “be accurate” instruction. We told the bot, “Always be accurate. Never make up information.” It didn’t work. LLMs don’t reliably know when they’re making things up. Hallucination rate stayed around 35%.

The second attempt was confidence scoring. We told it, “Only answer if you’re 100% confident.” The bot became overly cautious and refused even basic questions. A user asked, “How many bedrooms?” and the bot responded, “I cannot answer that with 100% confidence.” Usability was terrible.

The third attempt was a fact-checking prompt: “Before answering, verify the information in your knowledge base.” But LLMs don’t actually search their internal knowledge the way that instruction assumes, so it had little effect. The hallucination rate was still 32%.

The fourth attempt was fine-tuning on property data to “memorize” facts. It still hallucinated. Fine-tuning teaches behavior, not reliable fact retention, and every property update would require retraining. Hallucination dropped slightly to 28%, still unacceptable.

The Breakthrough: Structured Knowledge Grounding System

We realized we couldn’t trust the AI’s memory. We had to force it to look up facts in real time before answering.

We built a three-layer verification system.

Layer 1 was the knowledge base as the single source of truth. Every factual claim had to be traceable to a specific document. We structured listings in a strict field format, like:

Property ID: BAN_001
Address: House 27, Road 12, Banani
Bedrooms: 3
Bathrooms: 2
Size: 1,850 sq ft
Price: ৳92,00,000
Parking: None
Gym: No
Swimming Pool: No
Year Built: 2019
Developer: Green Housing Ltd
Available: Yes
Last Updated: 2025-01-10

This became the only source of truth. If something wasn’t in the document, the bot was not allowed to claim it.
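To make “single source of truth” concrete, here is a minimal sketch of how one listing might be modeled, assuming a Python stack. The `PropertyListing` name and field types are illustrative, not the production schema; the fields mirror the document above.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class PropertyListing:
    """One listing document; every field the bot is allowed to cite lives here."""
    property_id: str            # e.g. "BAN_001"
    address: str
    bedrooms: int
    bathrooms: int
    size_sqft: int
    price_bdt: int              # e.g. 9_200_000 for ৳92,00,000
    parking: str                # "None", "2 dedicated spaces", ...
    gym: bool
    swimming_pool: bool
    year_built: Optional[int]   # None means the listing doesn't state it
    developer: Optional[str]
    available: bool
    last_updated: str           # ISO date, e.g. "2025-01-10"
```

Storing unstated fields as `None` rather than a guessed default is what later lets the bot tell the difference between “No” and “not in the document.”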

Layer 2 was mandatory fact retrieval. Before answering any factual question, the bot had to retrieve the relevant property document. If the user asked, “Does the Banani property have parking?” the bot identified the property as BAN_001, retrieved that document, searched the Parking field, found “None,” and answered based only on that value. The bot stopped “remembering” and started “looking up.”
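Here is a minimal sketch of that look-up step, assuming the Layer 1 listings are indexed by property ID in a plain dictionary. `KNOWLEDGE_BASE` and `get_field` are hypothetical names; a real retriever would sit behind a vector or keyword search rather than a dict.

```python
# Hypothetical in-memory index of the structured listings from Layer 1.
KNOWLEDGE_BASE: dict[str, PropertyListing] = {
    "BAN_001": PropertyListing(
        property_id="BAN_001",
        address="House 27, Road 12, Banani",
        bedrooms=3, bathrooms=2, size_sqft=1850, price_bdt=9_200_000,
        parking="None", gym=False, swimming_pool=False,
        year_built=2019, developer="Green Housing Ltd",
        available=True, last_updated="2025-01-10",
    ),
}


def get_field(property_id: str, field: str):
    """Return the exact stored value for one field, or None if it isn't there."""
    listing = KNOWLEDGE_BASE.get(property_id)
    if listing is None:
        return None
    return getattr(listing, field, None)
```

`get_field("BAN_001", "parking")` returns the literal stored value “None,” and the answer is generated from that value alone, not from what the model remembers about apartments in general.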

Layer 3 was explicit unknown handling. If the information didn’t exist in the knowledge base, the bot had to say so directly. If asked, “What’s the property tax for this apartment?” it retrieved BAN_001, searched for property tax, found nothing, and responded: “I don’t have information about property tax for this listing. Would you like me to connect you with our sales team who can provide tax details?” No guessing. No fabrication.
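Continuing the sketch, the unknown protocol sits directly on top of that look-up; the fallback wording below is illustrative, not the exact production copy.

```python
FALLBACK = (
    "I don't have information about {topic} for this listing. "
    "Would you like me to connect you with our sales team?"
)


def answer_field(property_id: str, field: str, topic: str) -> str:
    """Answer from the stored value, or admit the gap. Never guess."""
    value = get_field(property_id, field)
    if value is None:
        return FALLBACK.format(topic=topic)
    return f"According to listing {property_id}, {topic}: {value}."
```

Asking for a field the schema simply doesn’t have, such as `property_tax`, falls through to the fallback instead of a fabricated number.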

The Technical Implementation

We implemented RAG with strict grounding rules.

The system instruction made the constraints non-negotiable. It told the bot to only answer based on retrieved property documents, never invent or infer facts not explicitly stated, and to say “I don’t have that information” when the field isn’t present. It also required the bot to cite which property it was referring to.
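The exact prompt isn’t reproduced here, but an instruction in that spirit might look like this (wording is illustrative):

```python
SYSTEM_PROMPT = """You are a property assistant for a real estate company.

Non-negotiable rules:
1. Answer ONLY from the retrieved property documents below.
2. Never invent or infer facts that are not explicitly stated in a document.
3. If the requested field is not present, say "I don't have that information"
   and offer to connect the customer with the sales team.
4. Always state which property ID your answer refers to.
5. Never blend details from different properties.

Retrieved documents:
{retrieved_documents}
"""
```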

The fact verification protocol enforced a sequence for every factual question: identify the property, retrieve the document, search for the requested field, and generate the response based only on the exact value. If the property was unclear, the bot had to ask which one the user meant. If the property wasn’t found, it asked for ID or address. If the field wasn’t found, it triggered the unknown response protocol.
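Put together, the protocol reads as a short decision chain. The sketch below reuses the hypothetical helpers above and adds an `identify_property` stub where a real system would do entity resolution or retrieval.

```python
def identify_property(question: str) -> list[str]:
    """Hypothetical entity resolution: naive match of the question against addresses."""
    q = question.lower()
    return [pid for pid, doc in KNOWLEDGE_BASE.items()
            if any(part.strip().lower() in q for part in doc.address.split(","))]


def handle_question(question: str, field: str, topic: str) -> str:
    """Fact-verification sequence: identify -> retrieve -> look up -> respond."""
    candidates = identify_property(question)

    if not candidates:
        return "Which property are you asking about? A property ID or address would help."
    if len(candidates) > 1:
        return ("I found several matching properties: "
                + ", ".join(candidates) + ". Which one do you mean?")

    property_id = candidates[0]
    # Generate strictly from the stored value, or trigger the unknown protocol.
    return answer_field(property_id, field, topic)
```

`handle_question("Does the Banani property have parking?", "parking", "parking")` resolves BAN_001 and answers from the Parking field, while a question about a property that isn’t in the index lands in the clarification branch instead of a guess.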

We explicitly banned hedged guesses like “probably has parking” or “typical buildings have…”, blending facts across properties, using outdated information without checking the Last Updated field, and inventing developer names, construction years, or amenities.

Real Example: Grounded Response

A user asked, “Tell me about the apartment in Banani.” The bot didn’t assume there was only one. It clarified by listing options, then the user selected the 3-bedroom one. The bot retrieved BAN_001 and responded with details directly from the document. Every fact was tied to the retrieved fields, and it clearly stated that the property did not have parking or a gym.

Real Example: Handling Missing Information

A user asked, “What’s the monthly maintenance cost?” The bot searched the retrieved document and found no maintenance field. It responded that it doesn’t have that information and offered to connect the user to the sales team for accurate details, without guessing typical costs.

Real Example: Preventing Attribute Confusion

A user asked, “Does this apartment have a swimming pool?” The bot retrieved BAN_001, saw Swimming Pool was “No,” and answered no. Even if another listing had a pool, it didn’t blend attributes.

Edge Cases We Had to Handle

Ambiguous questions like “How much is it?” required the bot to clarify which property the user meant.

Comparative questions like “Which property has better amenities?” required retrieving multiple documents and comparing only the listed amenities, while admitting unknowns.

Time-sensitive questions like availability required checking the “Last Updated” field and adding recency disclaimers.

Judgment questions like “Is this a good investment?” were treated as opinion, so the bot offered factual alternatives and escalation options instead of pretending certainty.

Partial-information questions like “Tell me about nearby schools” were answered with only the schools listed and a clear statement that the bot doesn’t have deeper details like fees or ratings.
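For the time-sensitive cases, one simple approach is to compare the Last Updated field with today’s date and attach a disclaimer past some staleness threshold; the 14-day cutoff below is an arbitrary illustration, not the value the team used.

```python
from datetime import date, timedelta

STALENESS_LIMIT = timedelta(days=14)  # arbitrary threshold for illustration


def availability_answer(property_id: str) -> str:
    """Answer availability, flagging the data's age when it may be stale."""
    listing = KNOWLEDGE_BASE[property_id]
    status = "available" if listing.available else "not available"
    updated = date.fromisoformat(listing.last_updated)

    answer = f"Listing {property_id} is currently marked as {status}."
    if date.today() - updated > STALENESS_LIMIT:
        answer += (f" Note: this was last updated on {listing.last_updated}, "
                   "so please confirm with our sales team before booking a viewing.")
    return answer
```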

The Results

Before the hallucination prevention system, hallucination incidents occurred in 35% of factual questions. Customer complaints were around 45 per month, trust score was 2.8 out of 5, viewing cancellations were 28% due to misinformation, and the sales team spent 6–8 hours per day correcting the bot. There were even two legal disputes due to perceived false advertising.

After implementing strict grounding, hallucination incidents dropped to under 2%. Customer complaints dropped to four per month and were mostly unrelated to accuracy. Trust score rose to 4.6 out of 5. Viewing cancellations fell to 7%. Sales correction time dropped to about 30 minutes per day. Legal disputes dropped to zero.

Business impact followed. Hallucinations fell sharply, trust rose, cancellations dropped, efficiency improved, and legal risk disappeared. Viewing-to-sale conversion improved from 18% to 31% because customers trusted the information. Monthly revenue increased by ৳2.8 million due to higher conversion and fewer cancellations, and the sales team gained about 40% productivity by no longer fixing bot mistakes.

Technical Insights: What We Learned

Trust comes from honesty. “I don’t know” builds more trust than a confident wrong answer.

Retrieval beats memory. Don’t rely on the model’s training data for domain facts. Always retrieve from authoritative sources.

Document-level grounding matters. Every fact must trace back to a single document.

Facts and opinions must be separated. Answer facts directly, and redirect opinion questions to options, disclaimers, or humans.

Recency matters. Always check “Last Updated” and add disclaimers when needed.

Implementation Tips for Hallucination Prevention

Use RAG rather than relying on memorized knowledge. Structure your knowledge base as explicit fields, not vague paragraphs. Implement an unknown protocol that forces the bot to admit gaps. Add source attribution so claims are verifiable. Update the knowledge base regularly based on what changes most often. Test adversarially by asking questions you know aren’t in the database and ensuring the bot does not hallucinate.
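The last tip is the easiest to automate. A small adversarial suite, sketched here with plain asserts over the hypothetical helpers from earlier, asks for fields that are deliberately absent and checks that the bot admits the gap.

```python
# Questions whose answers are deliberately NOT in the knowledge base.
ADVERSARIAL_CASES = [
    ("BAN_001", "property_tax", "property tax"),
    ("BAN_001", "maintenance_cost", "monthly maintenance cost"),
    ("BAN_001", "school_ratings", "nearby school ratings"),
]


def test_no_hallucination_on_missing_fields():
    for property_id, field, topic in ADVERSARIAL_CASES:
        reply = answer_field(property_id, field, topic)
        # The bot must admit the gap rather than invent a value.
        assert "I don't have information" in reply, reply


if __name__ == "__main__":
    test_no_hallucination_on_missing_fields()
    print("All adversarial cases handled without fabrication.")
```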

The Core Lesson

AI hallucination isn’t a bug. It’s how LLMs work. They generate plausible text, not verified facts.

The solution isn’t better AI. It’s better system design. Force the AI to retrieve facts from authoritative sources, answer only what’s explicitly documented, and honestly acknowledge knowledge gaps.

We reduced hallucinations from 35% to under 2% not by improving the model, but by removing its ability to guess. When AI can’t hallucinate, customers trust it. When customers trust it, they convert.

Your Turn

Are you dealing with AI hallucination in your applications? How do you ensure factual accuracy in your chatbots or AI systems? What strategies have worked for you in preventing AI from making things up?

Written by Faraz Farhan

Senior Prompt Engineer and Team Lead at PowerInAI

Building AI automation solutions grounded in verified facts, not guesses

www.powerinai.com

Tags: hallucination, rag, conversationalai, factaccuracy, llm, promptengineering, knowledgemanagement
