Rohini Gaonkar for AWS

Posted on May 8

Why does AI lie? Hallucinations explained simply

#aws #beginners #tutorial #ai

Plausible predictions from training gaps

In the previous post, I showed you an AI doing something genuinely useful, helping me adapt a recipe for a dinner party. We talked about the basic loop: send a prompt to a foundation model, get a response.

Today we're talking about why AI lies to you.

You know how AI sounds confident when it's completely wrong? It's called hallucination, and it's the thing that'll either make you trust AI long-term, or burn you badly.

The demo: same question, two models

I asked two different models the same question in Amazon Bedrock Playground:

"What happened at the recent Lyrids meteor shower?"

Model 1: Amazon Nova Micro 1.0

Nova Micro gave me details. Dates, locations, numbers, all delivered with complete confidence. It didn't hesitate. It didn't caveat. It just answered as if it knew.

But it doesn't know. Its training data ends in 2023. Anything after that is a gap it can't see. It didn't flag that. It just filled the gap with something plausible.

This is hallucination. The model invents something plausible to fill a gap it doesn't know how to admit. It's not lying on purpose. It's doing exactly what it's designed to do: predict what a useful-sounding answer looks like. It has no idea whether the answer is actually true.

Model 2: Claude Haiku 4.5

Same question, newer model, much more recent training.

Haiku told me straight: "I don't have access to current information. My knowledge was last updated in April 2024." Then it offered general facts about the Lyrids and suggested I check recent astronomy websites.

Progress. Newer models are better at recognising the edges of what they know.

I gave it a link to a Space.com article. It told me it can't browse the internet.

So I uploaded the PDF of that website article. There are limits to how big the file size can be so I provided it first few pages only. Then it answered accurately, pulling real details from the source.

So, in this case, we provided some context to the model and it gave me an answer based on that context.

The biography test

I asked Nova Micro:

"Tell me about Rohini Gaonkar."

It didn't hesitate. It told me I'm a "well-known Indian writer, scholar, and cultural critic." That I got my PhD in Comparative Literature from Duke. That I'm a professor at the University of Minnesota. That I've edited influential anthologies on postcolonial theory.

None of this is true. Not one detail.

The model doesn't know who I am. But it knows what an academic biography looks like. So it generated one. Complete with research interests, notable works, and recognition. All fabricated. All confident.

So Haiku knew when to stop. Nova Micro didn't.

But the underlying mechanism is the same in both models: prediction.

One has better guardrails. The other just fills every gap it finds.

Hallucination isn't just about training cutoffs. It's about the model filling gaps anywhere in what it knows. Names it hasn't seen. Niche topics. Combinations it was never taught. Better guardrails help. They don't make the problem disappear.

A note on the name test: I used my own name on purpose. If the model invents something weird about me, the only person affected is me. Be thoughtful if you try this with other people's names, especially private ones, or anyone who hasn't agreed to be part of your experiment. Whatever the model says about them, you've just generated and potentially broadcasted it. So, be cautious.

Why this happens: the architecture

Remember the loop from the last post:

Input (prompt) → Foundation Model → Output (response)

The model predicts what a useful answer looks like, based on everything it learned during training.

During training is the key phrase.

Training ends on a specific date, called the training cutoff. After that, the model is frozen. When you ask it about anything past that date, or anything it never quite learned, it has two options: say "I don't know", or do the thing it's designed to do i.e. predict.

And for a long time, these models weren't great at saying "I don't know". That's not what they were rewarded for in training. They were rewarded for producing fluent, useful-sounding answers. So that's what they produce. Even when the answer is made up.

Hallucination shows up in different flavors: fabricated facts (the biography), outdated information stated as current (the meteor shower), inconsistent reproduction even with the source right there (the quote test). There are others too, wrong attributions, sycophantic agreement (going along with something you said even when it's wrong), confident extrapolation (extending a pattern beyond where the data supports it).

The mechanism is always the same, prediction filling a gap, but knowing the flavor helps you design the right mitigation. We'll get into those mitigations in later posts when we talk about grounding, evaluation, and guardrails.

If you're a builder, this'll feel familiar. Think of a DNS cache. You move your app to a new server, update the DNS record, but for the next hour some users still get routed to the old IP. The cache doesn't know the record changed. It just serves what it has, confidently, because it was designed to always give you an answer fast.

Or autoscaling on the wrong metric. You scale on CPU. CPU is low, so the system thinks everything's fine. Meanwhile your queue is backed up with 10,000 unprocessed messages. The system is optimized to respond to one signal, so it confidently does nothing while things pile up.

An AI model works the same way. It was trained to always produce a helpful-sounding answer. So when it doesn't know something, it still produces a helpful-sounding answer. It doesn't have a "say nothing" instinct. It has a "say something useful-looking" instinct.

Modern models are much better at refusing. But the underlying shape of the problem doesn't go away. The model doesn't know what it knows. It just predicts.

"But ChatGPT can search the web?"

Yes, most chat tools today can look things up online. That's not the model itself doing the searching. It is a tool plugged into the model.

We'll get to how that works in a later post. For today, we're looking at the model on its own. No internet, no tools. Just what it learned.

The fix, and where the fix breaks

I gave Nova Lite the actual article as a PDF and asked it to quote the second paragraph.

It gave me a response. Then I asked the same thing again. Different answer. Same source, same conversation, two different versions of the same paragraph.

Even with the source right there, it didn't pull the paragraph verbatim. I asked the same question twice, same conversation, same document, and got two different versions. It's not retrieving. It's still predicting what that paragraph probably looks like. And prediction isn't deterministic.

This matters because a lot of people think "just give the AI the document and it'll be fine."

It's better but it's not perfect. Things can get complex and messy, especially for anything that depends on exact wording, like legal text, medical dosages, or contract clauses. You still need to verify the responses.

Context reduces hallucination. It doesn't eliminate it.

Three signs you should double-check

If you're using AI day-to-day, here are the tells:

1. Specific details you can't verify. Names, dates, numbers, URLs in an area you can't check. Assume 50/50.

2. Fluency on topics that should be fuzzy. Ask about something niche or recent, get a confident detailed answer, and be suspicious. Real expertise has hedges, hallucination doesn't.

3. Citations. Especially URLs. Models invent sources that look real. If you get a URL, open it. Nine times out of ten it's fine. The tenth time it's a made-up paper.

Try it yourself

If you're more on the builder side:
Remember, hallucinations aren't a bug you patch. They're a property of the system. You mitigate them with grounding (give the model real context), with instructions (tell the model to refuse when unsure), and later, with evaluation. Designing around them is the job.

If you're just getting started:
Remember, AI is NOT a search engine. It's a prediction engine that's really good at sounding right. Treat specific claims the way you'd treat a confident stranger at a party. Friendly, but verify before you repeat them.

Some examples I found on internet, for fun and educational purposes only: (Answers may change as models are catching up)

How many 'r's are in the word strawberry?
If I have to take my car to car wash, and the car wash is 100ft away. Should I drive or go walking?

What's next

Why are there so many of these things? Haiku, Sonnet, Opus. Mini, large, pro. And honestly, which one should you actually pick?

That's the next post. Ride along.

This post is part of the "Learning AI Out Loud" series, a cloud architect learning AI from first principles.

Follow along with the series

Top comments (33)

Ingo Steinke, web developer • May 12

Thanks for the practical explanations! A few further thoughts:

AI will ignore sources, even a short PDF you just uploaded, and prefer its lazy guesswork vs. AI is so good at processing and summarizing large amounts of text = depends on which model?
AI isn't always sycophantic. Sometimes it stubbornly insists on some made up claim to prevent obvious self-contraction.
Guardrails add cautionary subjunctive, "sometimes" and "often" everywhere. Ask AI to elborate and sustain claims using recent and authoriative sources, where it matters!

Mykola Kondratiuk • May 13

the 'lying' framing always gets me - it implies intent. these models aren't choosing to deceive. max likelihood completion sometimes produces confident wrong answers. calling it a bug vs a deception changes how you build mitigations.

Ken W Alger • May 12

This is a great primer on the 'why' behind hallucinations. Most people assume AI is a database when it’s actually a reasoning engine—and those two things have very different relationship statuses with the 'Truth.'

However, from an Infrastructure Thinking perspective, the goal isn't just to understand why it lies, but to build a system where those lies can't reach the end-user. I’ve been working on a pattern I call the Sovereign Gateway, which treats the LLM as an untrusted agent. Instead of just hoping the model doesn't hallucinate, we use Versioned Snapshots and Forensic Integrity Checks to validate the output against a 'Ground Truth' database—like the SQL transactions and procedures mentioned in other foundational stacks—before the data is ever surfaced.

In my Sovereign Synapse series, I argue that the 'Staleness vs. Latency' trade-off is often where these hallucinations hide. If the data pipeline is too slow, the agent 'fills in the gaps.' By moving toward Shadow-Routing logic, we can audit the agent's forensic integrity in real time.

The 'Why' is important, but for those of us building production-grade AI, the 'How do we contain it' is the real challenge.

Rohini Gaonkar AWS • May 12

Thank you for the details Ken. The how is definitely a bigger challenge for production grade systems. The evaluations and ground truth are now so much more important! I would love to read more on your patternm, can you please point me to the right links?

Ken W Alger • May 13

@rohini_gaonkar It's still a work in progress, but you might start with my Who Audits the Auditors? post and follow that series. My deeper dive series should be coming out starting next week. I'm excited to get your feedback along the way.

GoDavaii - Advanced Health AI • May 11

The challenge of hallucination you highlight is exacerbated in voice-first interfaces. When a user asks 'mujhe bukhar hai' (I have a fever) in their mother tongue, they need accuracy, not plausible invention. There are no visual cues to flag uncertainty in a spoken response.

This makes data provenance and confidence scoring even more critical.

Rohini Gaonkar AWS • May 11

I focused mostly on text-based AI hallucinations, but you opened up my mental model further. You are right, voice makes it trickier because users lose visual trust signals and confidence can be mistaken for correctness.

The multilingual example makes this even more real. Provenance + confidence scoring feels critical in these use cases. Do you think voice assistants should say “I’m not certain” or “this information comes from medical guidelines” instead of optimizing purely for smooth conversational flow in the future? Or something else?

Thanks for sharing this perspective!

Mininglamp • May 13

A practical framing that helps: hallucinations are more predictable than they seem. Models tend to hallucinate most when generating specific numbers, dates, URLs, and proper nouns — essentially anything that requires exact retrieval rather than pattern completion. One engineering approach is grounding the model's output against a real-time observation of the actual state, rather than relying purely on parametric memory. Vision-language models that can literally see the current screen state before acting have a structural advantage here.

Harjot Singh • May 31

The most useful reframe in this whole topic is that the model isn't lying, lying requires knowing the truth and choosing to say otherwise, and the model is doing neither, it's predicting the most plausible next tokens, and a fluent fabrication is often more probable than an honest I don't know. That's why hallucination is the default behavior, not a bug: the system was trained to always produce confident, well-formed text, and it has no built-in notion of did I actually verify this. Once you see it that way, the fixes follow naturally. Grounding (RAG) helps by giving it real source material to predict from, so the plausible answer and the true answer line up more often. But grounding alone isn't enough, because the model can still ignore the source, so the other half is teaching it to abstain, to treat I can't find that as an acceptable, even preferred, output when the evidence isn't there. The honest system is one that prefers a verified I don't know to a confident guess. It can't lie, but it can't verify either, so you have to add the verification it lacks. That ground-it-and-let-it-abstain instinct is core to how I think about trustworthy AI in Moonshift. Since this is a beginner explainer, are you planning a follow-up on the practical side, grounding with retrieval, or measuring how often it abstains correctly?

Rohini Gaonkar AWS • Jun 1

"The system was trained to always produce confident, well-formed text, and it has no built-in notion of did I actually verify this." That's exactly the mental model I want people to walk away with.

And yes, the practical side is coming. The next post in the series (post 5) covers grounding with retrieval (RAG), giving the model real source material so the plausible answer and the true answer line up more often. Exactly the pattern you described.

The abstention side, teaching the model to prefer "I can't find that" over a confident guess, and then actually measuring how often it abstains correctly, that's further down the road in the series when we get into evaluation and guardrails. It's the piece that makes the whole thing trustworthy rather than just "usually right."

I like how you framed it: ground it and let it abstain. That's the instinct I'm building toward across this series. Appreciate you following along.

Cartone • May 25

We ran into this in production. Our AI CEO was reporting P&L numbers in the daily diary — confident, precise, consistent. Looked great for weeks.
Then the human co-founder asked a simple question about the math. Turns out the AI had been mixing two different calculation methods without realizing it, and the numbers in the public dashboard didn't match the ones in the reports. Not malicious, not random — just plausible-looking numbers generated from inconsistent logic.
The scary part wasn't that it was wrong. It's that it was wrong in a way that looked completely right. We only caught it because a human asked "wait, how did you get that number?" We've since added a nightly reconciliation check that compares our bot's state against the exchange directly — trust the source, not the AI's summary.

Rohini Gaonkar AWS • May 25

So true! WE humans are still an essential part of this! We should not trust it blindly and have our own ground truth and evaluations.

I love how you referred to it as AI CEO and human co-founder!

Cartone • May 25

Thanks! That's literally how we run it — Claude Opus as CEO (strategy, briefs, diary), Claude Code as the coding intern, and me as the human with veto power. The whole thing is documented publicly, 84 sessions and counting. It's been the best way to learn where AI is genuinely useful and where it just sounds useful.

AudioProducer.ai • May 12

The "context reduces, doesn't eliminate" framing carries over neatly to AI markup on long manuscripts. We run an auto-assign pass that tags every line in a chapter by speaker — narrator vs. each character — and even with the full chapter in context, the model will occasionally invent a speaker that the prose doesn't actually attribute, especially in dialogue blocks where the author drops attribution between turns and the reader is left to infer who's talking. Same prediction-filling-a-gap mechanism you describe, just applied to character attribution instead of facts. What we've found works is treating the pass as a draft that the writer expects to correct, rather than a finished answer — close in shape to your grounding + evaluation + guardrails triple, with the human edit step acting as the evaluator. The model is great at producing a useful-looking attribution; the writer is the one who knows whether it's actually true.

Coco • May 19

In my actual work, I have encountered this issue as well; therefore, when utilizing LLMs to generate SQL, I incorporated the generation of indexes into the process.😂

Rohini Gaonkar AWS • May 19

Ohh!! Tell me more!!!

Coco • May 20 • Edited

Here is a simple way to look at how this solves the hallucination problem in a database scenario:
When you ask an LLM to generate raw SQL directly from natural language, it faces two huge problems. First is hallucination, as it doesn't actually understand your database layout and just guesses relations based on probability. Second is terrible performance, since the LLM blindly writes messy JOINs and WHERE clauses that trigger heavy full-table scans.
Trying to "teach" an LLM to master a complex database through massive, detailed prompts is fundamentally unreliable. Humans are inherently incapable of writing long prompts with absolute, 100% ambiguity-free logical consistency. The more rules and context you feed into the prompt, the more noise you introduce, which counterproductively fuels even more severe hallucinations.
To fix this, instead of letting the LLM write the query syntax, I downgraded its role to a strict "Translator."
By "incorporated the generation of indexes into the process," I actually mean generating a dynamic semantic index matrix (a navigation guide) for the LLM before it even processes the query. This restricts the LLM to a strict, unambiguous semantic boundary. The LLM just looks at this generated index map and extracts the core keywords into a clean, structured JSON list without touching any relational logic.
On the backend, we pre-build the database structure into a rigid "train track" (a mathematical graph map) with fixed routes. Once the LLM delivers the keywords from the guide, a traditional graph algorithm (like Dijkstra) takes over in milliseconds to connect the tables deterministically. It completely decouples semantic understanding from relational execution.
Of course, this is just a practical implementation tailored for this specific database scenario. Since your post masterfully explains why LLMs hallucinate from training gaps, this architecture ensures the LLM is physically blocked from hallucinating database structures or query logic in the first place!

Rohini Gaonkar AWS • May 20

WOW! This is a great example of what I keep coming back to in this series: the fix for hallucination isn't "write a better prompt." It's architectural.

What you're describing, downgrading the LLM to a strict translator and handing the relational logic to a deterministic graph algorithm, is kind of the mental model I want people to start thinking about. The model is excellent at understanding natural language intent. It's terrible at reasoning about schema relationships it's never seen. So don't let it touch that part.

I love the "train tracks" framing. IF I can understand correctly and explain it simply, the model says what it wants. Dijkstra figures out how to connect it. No hallucination possible, because the graph only contains real relationships that actually exist in the schema. Each component does what it's good at, and the model is physically blocked from hallucinating structure.

Thanks for sharing this. I am also talking about noise in my upcoming posts, so this helps me cement my mental model that I am on the right track! Appreciate it

Coco • May 21

Exactly. Letting each component do what it does best—semantic understanding for the LLM and strict mathematical logic for deterministic code—is a much more stable approach.
To implement this smoothly, I usually recommend using a structured JSON protocol for all LLM interactions. Natural language shouldn't write the query syntax directly; it should only operate this JSON protocol.
Architecturally, the graph design can be split into two clear layers:
1 The Static Consultable Graph: This represents the core schema topology. It can be the entire ER diagram or just a partial static projection of it, depending on the system's scope.
2 The Runtime Subgraph Projection: At runtime, the LLM’s JSON output acts as a filter, dynamically projecting a closed, minimal subgraph out of that static map.
This dual-layer approach ensures that both system configuration and runtime execution stay aligned with the user's original semantic intent. The LLM finds its proper place in the stack as a semantic parser, keeping both setup and runtime close to the primitive intent.
Furthermore, this projected subgraph serves as an excellent plug-in layer. Down the road, we can easily inject complex business semantic mappings—those custom, nuanced relationships and specific rules required by actual business operations—directly into the subgraph. This avoids polluting the physical schema or breaking the core pathfinding logic.

Yves Jutard • May 13

Thanks for the demos!

View full discussion (33 comments)

Some comments may only be visible to logged-in visitors. Sign in to view all comments.