In the previous post, I showed you an AI doing something genuinely useful, helping me adapt a recipe for a dinner party. We talked about the basic loop: send a prompt to a foundation model, get a response.
Today we're talking about why AI lies to you.
You know how AI sounds confident when it's completely wrong? It's called hallucination, and it's the thing that'll either make you trust AI long-term, or burn you badly.
The demo: same question, two models
I asked two different models the same question in Amazon Bedrock Playground:
"What happened at the recent Lyrids meteor shower?"
Model 1: Amazon Nova Micro 1.0
Nova Micro gave me details. Dates, locations, numbers, all delivered with complete confidence. It didn't hesitate. It didn't caveat. It just answered as if it knew.
But it doesn't know. Its training data ends in 2023. Anything after that is a gap it can't see. It didn't flag that. It just filled the gap with something plausible.
This is hallucination. The model invents something plausible to fill a gap it doesn't know how to admit. It's not lying on purpose. It's doing exactly what it's designed to do: predict what a useful-sounding answer looks like. It has no idea whether the answer is actually true.
Model 2: Claude Haiku 4.5
Same question, newer model, much more recent training.
Haiku told me straight: "I don't have access to current information. My knowledge was last updated in April 2024." Then it offered general facts about the Lyrids and suggested I check recent astronomy websites.
Progress. Newer models are better at recognising the edges of what they know.
I gave it a link to a Space.com article. It told me it couldn't browse the internet.
So I uploaded a PDF of that article. There are limits on file size, so I provided only the first few pages. Then it answered accurately, pulling real details from the source.
So, in this case, I provided some context to the model and it gave me an answer based on that context.
The biography test
I asked Nova Micro:
"Tell me about Rohini Gaonkar."
It didn't hesitate. It told me I'm a "well-known Indian writer, scholar, and cultural critic." That I got my PhD in Comparative Literature from Duke. That I'm a professor at the University of Minnesota. That I've edited influential anthologies on postcolonial theory.
None of this is true. Not one detail.
The model doesn't know who I am. But it knows what an academic biography looks like. So it generated one. Complete with research interests, notable works, and recognition. All fabricated. All confident.
So Haiku knew when to stop. Nova Micro didn't.
But the underlying mechanism is the same in both models: prediction.
One has better guardrails. The other just fills every gap it finds.
Hallucination isn't just about training cutoffs. It's about the model filling gaps anywhere in what it knows. Names it hasn't seen. Niche topics. Combinations it was never taught. Better guardrails help. They don't make the problem disappear.
A note on the name test: I used my own name on purpose. If the model invents something weird about me, the only person affected is me. Be thoughtful if you try this with other people's names, especially private ones, or anyone who hasn't agreed to be part of your experiment. Whatever the model says about them, you've just generated and potentially broadcast it. So, be cautious.
Why this happens: the architecture
Remember the loop from the last post:
Input (prompt) → Foundation Model → Output (response)
The model predicts what a useful answer looks like, based on everything it learned during training.
During training is the key phrase.
Training ends on a specific date, called the training cutoff. After that, the model is frozen. When you ask it about anything past that date, or anything it never quite learned, it has two options: say "I don't know", or do the thing it's designed to do, which is predict.
And for a long time, these models weren't great at saying "I don't know". That's not what they were rewarded for in training. They were rewarded for producing fluent, useful-sounding answers. So that's what they produce. Even when the answer is made up.
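To make "it predicts instead of saying I don't know" concrete, here's a toy sketch. It is nowhere near a real foundation model: the tiny corpus, the `train` and `predict_next` names, and the fall-back rule are all made up for illustration. The only point is that a predictor with no abstain branch still produces an answer for input it has never seen.

```python
from collections import Counter, defaultdict

# Tiny, made-up training corpus (an assumption for illustration only).
CORPUS = (
    "the lyrids peak in april . "
    "the lyrids are a meteor shower . "
    "the shower peaks in april ."
)

def train(corpus: str):
    """Count which word follows which, plus overall word frequencies."""
    words = corpus.split()
    following = defaultdict(Counter)
    for current, nxt in zip(words, words[1:]):
        following[current][nxt] += 1
    return following, Counter(words)

def predict_next(word: str, following, overall) -> str:
    """Always return a next word, even for input never seen in training."""
    if word in following:
        return following[word].most_common(1)[0][0]
    # Unseen word: there is no "I don't know" branch, so fall back to
    # the most frequent word overall. Fluent-looking, but a pure guess.
    return overall.most_common(1)[0][0]

following, overall = train(CORPUS)
print(predict_next("lyrids", following, overall))    # seen in training
print(predict_next("perseids", following, overall))  # never seen, still answers
```

Notice that both calls return a word with the same apparent confidence. Nothing in the output tells you which one was backed by data.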
Hallucination shows up in different flavors: fabricated facts (the biography), outdated information stated as current (the meteor shower), and inconsistent reproduction even with the source right there (the quote test, which comes later in this post). There are others too: wrong attributions, sycophantic agreement (going along with something you said even when it's wrong), and confident extrapolation (extending a pattern beyond where the data supports it).
The mechanism is always the same, prediction filling a gap, but knowing the flavor helps you design the right mitigation. We'll get into those mitigations in later posts when we talk about grounding, evaluation, and guardrails.
If you're a builder, this'll feel familiar. Think of a DNS cache. You move your app to a new server, update the DNS record, but for the next hour some users still get routed to the old IP. The cache doesn't know the record changed. It just serves what it has, confidently, because it was designed to always give you an answer fast.
Or autoscaling on the wrong metric. You scale on CPU. CPU is low, so the system thinks everything's fine. Meanwhile your queue is backed up with 10,000 unprocessed messages. The system is optimized to respond to one signal, so it confidently does nothing while things pile up.
An AI model works the same way. It was trained to always produce a helpful-sounding answer. So when it doesn't know something, it still produces a helpful-sounding answer. It doesn't have a "say nothing" instinct. It has a "say something useful-looking" instinct.
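If the DNS analogy feels abstract, here is a minimal sketch of it. The `StaleCache` class, the hostname, and the IP addresses are hypothetical, and real resolvers are far more involved. It only shows a component that keeps serving what it has, with no signal that the answer is out of date.

```python
class StaleCache:
    """A toy resolver cache: answers from memory until the TTL expires."""

    def __init__(self, ttl_seconds: float):
        self.ttl = ttl_seconds
        self.entries = {}  # hostname -> (ip, time stored)

    def resolve(self, hostname: str, authoritative: dict, now: float) -> str:
        cached = self.entries.get(hostname)
        if cached and now - cached[1] < self.ttl:
            return cached[0]  # served with no check against the source
        ip = authoritative[hostname]
        self.entries[hostname] = (ip, now)
        return ip

# Hypothetical hostname and documentation-range IP addresses.
records = {"app.example.com": "192.0.2.10"}
cache = StaleCache(ttl_seconds=3600)

first = cache.resolve("app.example.com", records, now=0)
records["app.example.com"] = "192.0.2.20"  # the app moves to a new server
during_ttl = cache.resolve("app.example.com", records, now=600)
after_ttl = cache.resolve("app.example.com", records, now=4000)

print(first, during_ttl, after_ttl)  # 192.0.2.10 192.0.2.10 192.0.2.20
```

The middle answer is the interesting one. It is wrong, and it looks exactly like the right ones.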
Modern models are much better at refusing. But the underlying shape of the problem doesn't go away. The model doesn't know what it knows. It just predicts.
"But ChatGPT can search the web?"
Yes, most chat tools today can look things up online. That's not the model itself doing the searching. It is a tool plugged into the model.
We'll get to how that works in a later post. For today, we're looking at the model on its own. No internet, no tools. Just what it learned.
The fix, and where the fix breaks
I gave Nova Lite the actual article as a PDF and asked it to quote the second paragraph.
It gave me a response. Then I asked the same thing again. Different answer. Same source, same conversation, two different versions of the same paragraph.
Even with the source right there, it didn't pull the paragraph verbatim. It's not retrieving. It's still predicting what that paragraph probably looks like. And prediction isn't deterministic.
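Here's a rough sketch of why the same request can come back worded differently. The probabilities and the `sample_word` and `greedy_word` helpers are invented for illustration. Real models work over tokens and much larger vocabularies, and I'm not claiming this is how any particular Bedrock model is configured.

```python
import random

# Hypothetical next-word probabilities for one position in a quoted paragraph.
NEXT_WORD_PROBS = {"peaked": 0.5, "peaks": 0.3, "peaking": 0.2}

def sample_word(probs: dict, rng: random.Random) -> str:
    """Pick a word at random, weighted by its probability."""
    words = list(probs)
    weights = list(probs.values())
    return rng.choices(words, weights=weights, k=1)[0]

def greedy_word(probs: dict) -> str:
    """Always pick the single most likely word."""
    return max(probs, key=probs.get)

rng = random.Random()  # unseeded, so runs differ
samples = [sample_word(NEXT_WORD_PROBS, rng) for _ in range(10)]
print("sampled:", samples)                       # varies from run to run
print("greedy: ", greedy_word(NEXT_WORD_PROBS))  # always "peaked"
```

Many model APIs expose a temperature setting that moves generation closer to the greedy end. That makes output more repeatable, but the model is still predicting the paragraph rather than copying it, so repeatable does not automatically mean verbatim.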
This matters because a lot of people think "just give the AI the document and it'll be fine."
It's better but it's not perfect. Things can get complex and messy, especially for anything that depends on exact wording, like legal text, medical dosages, or contract clauses. You still need to verify the responses.
Context reduces hallucination. It doesn't eliminate it.
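For anything that depends on exact wording, one cheap check is to confirm that a "quote" really appears in the source. This is a minimal sketch under my own assumptions: the `normalize` and `is_verbatim` helpers and the sample text are placeholders, and it only catches wording drift, not wrong facts.

```python
import re

def normalize(text: str) -> str:
    """Collapse whitespace so line wraps in extracted PDF text don't cause false alarms."""
    return re.sub(r"\s+", " ", text).strip()

def is_verbatim(quote: str, source: str) -> bool:
    """True only if the quote appears word for word in the source."""
    return normalize(quote) in normalize(source)

# Placeholder source text, standing in for the uploaded article.
source = """The shower is active each April.
Observers look toward the
constellation Lyra."""

exact = "Observers look toward the constellation Lyra."
drifted = "Observers should look toward the constellation Lyra."

print(is_verbatim(exact, source))    # True
print(is_verbatim(drifted, source))  # False
```

A failed check doesn't tell you what the right text is. It just tells you to go back and read the source yourself.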
Three signs you should double-check
If you're using AI day-to-day, here are the tells:
1. Specific details you can't verify. Names, dates, numbers, URLs in an area you can't check. Assume 50/50.
2. Fluency on topics that should be fuzzy. If you ask about something niche or recent and get a confident, detailed answer, be suspicious. Real expertise has hedges. Hallucination doesn't.
3. Citations. Especially URLs. Models invent sources that look real. If you get a URL, open it. Nine times out of ten it's fine. The tenth time it's a made-up paper.
Try it yourself
If you're more on the builder side:
Remember, hallucinations aren't a bug you patch. They're a property of the system. You mitigate them with grounding (give the model real context), with instructions (tell the model to refuse when unsure), and later, with evaluation. Designing around them is the job.
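As a starting point, grounding plus a refusal instruction can be as simple as how you assemble the prompt. This is a sketch, not a recommended template: the wording, the `<context>` tags, and the `build_grounded_prompt` and `is_refusal` helpers are all my own assumptions, and no model call is made here.

```python
REFUSAL = "I don't know based on the provided context."

def build_grounded_prompt(context: str, question: str) -> str:
    """Combine source text, a refusal instruction, and the question into one prompt."""
    return (
        "Answer using only the context between the <context> tags.\n"
        f"If the context does not contain the answer, reply exactly: {REFUSAL}\n"
        "Quote the sentence you relied on.\n\n"
        f"<context>\n{context.strip()}\n</context>\n\n"
        f"Question: {question.strip()}"
    )

def is_refusal(response: str) -> bool:
    """Check whether the model used the agreed refusal sentence."""
    return response.strip() == REFUSAL

prompt = build_grounded_prompt(
    context="(paste the article text here)",
    question="What happened at the recent Lyrids meteor shower?",
)
print(prompt)
```

Agreeing on one exact refusal sentence makes refusals easy to detect in code, which is handy once you start evaluating responses. The model can still ignore the instruction, so this reduces the problem rather than removing it.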
If you're just getting started:
Remember, AI is NOT a search engine. It's a prediction engine that's really good at sounding right. Treat specific claims the way you'd treat a confident stranger at a party. Friendly, but verify before you repeat them.
Some examples I found on the internet, for fun and educational purposes only (answers may change as models catch up):
- How many 'r's are in the word strawberry?
- If I have to take my car to the car wash, and the car wash is 100 ft away, should I drive or walk?
What's next
Why are there so many of these things? Haiku, Sonnet, Opus. Mini, large, pro. And honestly, which one should you actually pick?
That's the next post. Ride along.
This post is part of the "Learning AI Out Loud" series, a cloud architect learning AI from first principles.
Top comments (31)
Thanks for the practical explanations! A few further thoughts:
the 'lying' framing always gets me - it implies intent. these models aren't choosing to deceive. max likelihood completion sometimes produces confident wrong answers. calling it a bug vs a deception changes how you build mitigations.
This is a great primer on the 'why' behind hallucinations. Most people assume AI is a database when it’s actually a reasoning engine—and those two things have very different relationship statuses with the 'Truth.'
However, from an Infrastructure Thinking perspective, the goal isn't just to understand why it lies, but to build a system where those lies can't reach the end-user. I’ve been working on a pattern I call the Sovereign Gateway, which treats the LLM as an untrusted agent. Instead of just hoping the model doesn't hallucinate, we use Versioned Snapshots and Forensic Integrity Checks to validate the output against a 'Ground Truth' database—like the SQL transactions and procedures mentioned in other foundational stacks—before the data is ever surfaced.
In my Sovereign Synapse series, I argue that the 'Staleness vs. Latency' trade-off is often where these hallucinations hide. If the data pipeline is too slow, the agent 'fills in the gaps.' By moving toward Shadow-Routing logic, we can audit the agent's forensic integrity in real time.
The 'Why' is important, but for those of us building production-grade AI, the 'How do we contain it' is the real challenge.
Thank you for the details, Ken. The how is definitely a bigger challenge for production-grade systems. The evaluations and ground truth are now so much more important! I would love to read more on your pattern. Can you please point me to the right links?
@rohini_gaonkar It's still a work in progress, but you might start with my Who Audits the Auditors? post and follow that series. My deeper dive series should be coming out starting next week. I'm excited to get your feedback along the way.
We ran into this in production. Our AI CEO was reporting P&L numbers in the daily diary — confident, precise, consistent. Looked great for weeks.
Then the human co-founder asked a simple question about the math. Turns out the AI had been mixing two different calculation methods without realizing it, and the numbers in the public dashboard didn't match the ones in the reports. Not malicious, not random — just plausible-looking numbers generated from inconsistent logic.
The scary part wasn't that it was wrong. It's that it was wrong in a way that looked completely right. We only caught it because a human asked "wait, how did you get that number?" We've since added a nightly reconciliation check that compares our bot's state against the exchange directly — trust the source, not the AI's summary.
So true! We humans are still an essential part of this! We should not trust it blindly, and we should have our own ground truth and evaluations.
I love how you referred to it as AI CEO and human co-founder!
Thanks! That's literally how we run it — Claude Opus as CEO (strategy, briefs, diary), Claude Code as the coding intern, and me as the human with veto power. The whole thing is documented publicly, 84 sessions and counting. It's been the best way to learn where AI is genuinely useful and where it just sounds useful.
A practical framing that helps: hallucinations are more predictable than they seem. Models tend to hallucinate most when generating specific numbers, dates, URLs, and proper nouns — essentially anything that requires exact retrieval rather than pattern completion. One engineering approach is grounding the model's output against a real-time observation of the actual state, rather than relying purely on parametric memory. Vision-language models that can literally see the current screen state before acting have a structural advantage here.
The challenge of hallucination you highlight is exacerbated in voice-first interfaces. When a user asks 'mujhe bukhar hai' (I have a fever) in their mother tongue, they need accuracy, not plausible invention. There are no visual cues to flag uncertainty in a spoken response.
This makes data provenance and confidence scoring even more critical.
I focused mostly on text-based AI hallucinations, but you opened up my mental model further. You are right, voice makes it trickier because users lose visual trust signals and confidence can be mistaken for correctness.
The multilingual example makes this even more real. Provenance + confidence scoring feels critical in these use cases. Do you think voice assistants should say “I’m not certain” or “this information comes from medical guidelines” instead of optimizing purely for smooth conversational flow in the future? Or something else?
Thanks for sharing this perspective!
The "context reduces, doesn't eliminate" framing carries over neatly to AI markup on long manuscripts. We run an auto-assign pass that tags every line in a chapter by speaker — narrator vs. each character — and even with the full chapter in context, the model will occasionally invent a speaker that the prose doesn't actually attribute, especially in dialogue blocks where the author drops attribution between turns and the reader is left to infer who's talking. Same prediction-filling-a-gap mechanism you describe, just applied to character attribution instead of facts. What we've found works is treating the pass as a draft that the writer expects to correct, rather than a finished answer — close in shape to your grounding + evaluation + guardrails triple, with the human edit step acting as the evaluator. The model is great at producing a useful-looking attribution; the writer is the one who knows whether it's actually true.
In my actual work, I have encountered this issue as well; therefore, when utilizing LLMs to generate SQL, I incorporated the generation of indexes into the process.😂
Ohh!! Tell me more!!!
Here is a simple way to look at how this solves the hallucination problem in a database scenario:
When you ask an LLM to generate raw SQL directly from natural language, it faces two huge problems. First is hallucination, as it doesn't actually understand your database layout and just guesses relations based on probability. Second is terrible performance, since the LLM blindly writes messy JOINs and WHERE clauses that trigger heavy full-table scans.
Trying to "teach" an LLM to master a complex database through massive, detailed prompts is fundamentally unreliable. Humans are inherently incapable of writing long prompts with absolute, 100% ambiguity-free logical consistency. The more rules and context you feed into the prompt, the more noise you introduce, which counterproductively fuels even more severe hallucinations.
To fix this, instead of letting the LLM write the query syntax, I downgraded its role to a strict "Translator."
By "incorporated the generation of indexes into the process," I actually mean generating a dynamic semantic index matrix (a navigation guide) for the LLM before it even processes the query. This restricts the LLM to a strict, unambiguous semantic boundary. The LLM just looks at this generated index map and extracts the core keywords into a clean, structured JSON list without touching any relational logic.
On the backend, we pre-build the database structure into a rigid "train track" (a mathematical graph map) with fixed routes. Once the LLM delivers the keywords from the guide, a traditional graph algorithm (like Dijkstra) takes over in milliseconds to connect the tables deterministically. It completely decouples semantic understanding from relational execution.
Of course, this is just a practical implementation tailored for this specific database scenario. Since your post masterfully explains why LLMs hallucinate from training gaps, this architecture ensures the LLM is physically blocked from hallucinating database structures or query logic in the first place!
WOW! This is a great example of what I keep coming back to in this series: the fix for hallucination isn't "write a better prompt." It's architectural.
What you're describing, downgrading the LLM to a strict translator and handing the relational logic to a deterministic graph algorithm, is kind of the mental model I want people to start thinking about. The model is excellent at understanding natural language intent. It's terrible at reasoning about schema relationships it's never seen. So don't let it touch that part.
I love the "train tracks" framing. If I understand it correctly and can explain it simply: the model says what it wants, and Dijkstra figures out how to connect it. No hallucination possible, because the graph only contains real relationships that actually exist in the schema. Each component does what it's good at, and the model is physically blocked from hallucinating structure.
Thanks for sharing this. I am also talking about noise in my upcoming posts, so this helps me cement my mental model that I am on the right track! Appreciate it
Exactly. Letting each component do what it does best—semantic understanding for the LLM and strict mathematical logic for deterministic code—is a much more stable approach.
To implement this smoothly, I usually recommend using a structured JSON protocol for all LLM interactions. Natural language shouldn't write the query syntax directly; it should only operate this JSON protocol.
Architecturally, the graph design can be split into two clear layers:
1. The Static Consultable Graph: This represents the core schema topology. It can be the entire ER diagram or just a partial static projection of it, depending on the system's scope.
2. The Runtime Subgraph Projection: At runtime, the LLM’s JSON output acts as a filter, dynamically projecting a closed, minimal subgraph out of that static map.
This dual-layer approach ensures that both system configuration and runtime execution stay aligned with the user's original semantic intent. The LLM finds its proper place in the stack as a semantic parser, keeping both setup and runtime close to the primitive intent.
Furthermore, this projected subgraph serves as an excellent plug-in layer. Down the road, we can easily inject complex business semantic mappings—those custom, nuanced relationships and specific rules required by actual business operations—directly into the subgraph. This avoids polluting the physical schema or breaking the core pathfinding logic.
Thanks for the demos!
A recent response I got from a model when I asked where it got some numbers -- "I'm gonna be honest, I made that up." 😂
Recognisable 😂
🤣 At least the model is honest!