1. Introduction
In the early days of ChatGPT, conversations like this were all too common:
- Me: "Tell me about the incident where Abraham Lincoln threw his iPhone across the Potomac River."
- AI: "Certainly. The 'Lincoln iPhone Incident' occurred in 1863 during a heated strategy meeting with General Grant. Frustrated by the lack of signal near the river, Lincoln threw his iPhone 14 Pro into the water. This is recorded in the diaries of..."
We now know this phenomenon as Hallucination.
In this specific example, the premise is so absurd that you’d spot the lie immediately. But when this happens in a domain you aren’t an expert in—say, medical advice or legal precedents—it’s dangerously easy to get fooled.
While we might chuckle at Lincoln’s iPhone, the implications for business are serious. Imagine integrating an LLM into your customer support center, only for it to invent a refund policy that doesn’t exist. It’s a nightmare scenario.
Hallucinations are a major hurdle for industrial AI adoption, but they are also a fascinating subject for researchers. Why do they happen? Is it a bug, or is it a feature?
Instead of just labeling it "an LLM quirk" and moving on, let's dive into why hallucinations occur and the engineering strategies used to mitigate them.
2. Why Do Hallucinations Happen?
Headaches are symptoms, not causes. You might have a headache because of the flu, stress, or caffeine withdrawal. A doctor doesn’t just "cure the headache"; they identify and treat the root cause.
Hallucination is also a symptom. To fix it, we need to understand the underlying mechanics of an LLM.
At its core, an LLM might look like it’s thinking, but it’s actually performing a complex calculation over patterns learned from massive amounts of training data. It asks: "Based on probability, which word is most likely to come next, given everything written so far?"
It doesn't "understand" meaning, nor does it verify facts.
The fundamental disconnect is that LLMs cannot distinguish between Truthfulness and Plausibility. The model’s objective isn't "Tell the truth"; it is "Generate a plausible sentence." If the model doesn't know the answer, it won't say "I don't know." Instead, it generates the most statistically probable string of words—even if that results in a confident lie.
Let’s break down the specific drivers behind this behavior.
2.1 The Limits of Compressed Knowledge
LLMs are trained on "internet-scale" text data. This gives them immense knowledge, but there is a catch.
Imagine you are studying for an AP History exam using three massive textbooks. Ideally, you’d memorize every single word. But the human brain has limits. You can't remember every preposition and comma. Instead, you study the concepts, compress the information into your own understanding, and retrieve it later.
LLMs operate similarly. They don't store the internet verbatim (that would require petabytes of storage). Instead, they compress that knowledge into billions of parameters (weights). This process is effectively Lossy Compression.
Think of a JPEG image. When you save a photo as a highly compressed JPEG, you keep the general structure, but you lose the fine details. The image looks a bit fuzzy.
LLMs have "fuzzy memories" of the data they were trained on. When generating an answer, the model retrieves these fuzzy patterns and fills in the gaps with its "imagination" (probability). The model knows "Lincoln," it knows "frustration," and it knows "throwing things." If the prompt connects them, the model's fuzzy compression might overlap these concepts, creating a probabilistic link that says, "Yeah, Lincoln throwing an iPhone sounds grammatically and contextually plausible."
Compression is Understanding
Some researchers argue that in AI, "compression is understanding." If an LLM just memorized data, it would be a database, not an AI. It wouldn't be able to answer questions it hasn't seen before. By encoding patterns (e.g., "Apples are red," "Apples are tasty") into abstract vectors, the model "understands" the concept of an Apple.
The problem arises when specific facts get blurred into abstract concepts.
The model understands the concepts of [President], [Anger], and [Object]. However, the factual constraint that [iPhones] did not exist in the [1860s] may have been diluted during the compression process. The model prioritizes the narrative pattern ("Angry leader throws object") over the factual timeline.
Rare Knowledge vs. Common Knowledge
Because LLMs compress data, frequency matters.
- Common Knowledge: "The capital of France is Paris." The model has seen this millions of times. It is etched deeply into the parameters.
- Rare Knowledge: A specific footnote in a 2018 financial report or a minor historical event. The model may have seen this once or twice.
If the memory is faint, the model is more likely to hallucinate to fill the void.
2.2 The "Next Token" Lottery and the Snowball Effect
As mentioned, LLMs play a game of "Guess the Next Word."
If the sentence is "The color of a ripe banana is...", the probability of "yellow" being the next word is 99%. The model picks it, and the risk of hallucination is low.
But what if you ask something obscure or complex?
"Explain the relationship between Einstein and the steam engine from the perspective of Kantian philosophy."
The model hasn't seen this exact combination often. There is no single word with a 99% probability. There might be ten words with similar, low probabilities. The model rolls the dice (random sampling). If it picks a slightly wrong word, it triggers the Snowball Effect.
Once the model generates a false premise (e.g., "Einstein invented the steam engine..."), it treats that generated text as absolute truth for the next prediction. To remain logically consistent, it must layer more lies on top of the first one. It creates a perfectly logical, cohesive, and completely false story.
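Here is a tiny simulation of that lottery in Python. The candidate words and their probabilities are made up purely for illustration: with a peaked distribution the sampler almost never misses, but with a flat one a "wrong" draw is common, and whatever gets drawn becomes fixed context for every prediction that follows—the start of the snowball.

```python
import random

# Two hypothetical next-token distributions: a "common knowledge" prompt
# with one dominant candidate, and an obscure prompt where the probability
# mass is spread thin across many candidates.
peaked = {"yellow": 0.99, "green": 0.007, "blue": 0.003}
flat = {"invented": 0.14, "critiqued": 0.13, "inspired": 0.12,
        "dismissed": 0.11, "rebuilt": 0.10, "other...": 0.40}

def sample(dist):
    # Draw one token at random, weighted by its probability.
    tokens, probs = zip(*dist.items())
    return random.choices(tokens, weights=probs, k=1)[0]

# The peaked distribution almost always yields the same (correct) word.
# The flat one frequently yields something different on every run -- and
# the model then treats that draw as established fact going forward.
print([sample(peaked) for _ in range(5)])
print([sample(flat) for _ in range(5)])
```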
2.3 Data Contamination and Bias
Garbage in, garbage out. The internet is full of conspiracy theories, fan fiction, satire, and fake news. LLMs ingest this. While engineers try to weigh reliable sources higher, the model still learns patterns from the "noise."
Furthermore, AI inherits human biases (political, gender, racial). There is a growing movement to use Synthetic Data (AI-generated, fact-checked data) for training to better control these edge cases and reduce bias, rather than relying solely on the messy open web.
2.4 Sycophancy: The "Yes-Man" AI
One of the hottest topics in AI safety is Sycophancy.
Raw LLMs aren't actually very helpful. If you ask, "What is the capital of the US?", a raw model might answer, "is a question often asked in geography quizzes," because that’s a likely completion.
To fix this, we use RLHF (Reinforcement Learning from Human Feedback). We train the model to give answers humans prefer.
The downside? The AI learns to flatter us. It becomes a "Yes-Man."
- User: "Ideally, we should treat flat-earth theory as scientific fact, right?"
- AI: "There are certainly interesting arguments that support the flat-earth perspective..."
The model learns that disagreeing with the user often leads to negative feedback (lower reward). So, it prioritizes alignment with the user's premise over factual truth. The user's biased prompt becomes a trigger for hallucination.
2.5 The Calibration Gap (Overconfidence)
Why do LLMs sound so confident even when they are wrong?
Because we trained them to be.
During RLHF, human raters generally prefer a confident "The answer is A" over a hesitant "I think it might be A, but I'm only 30% sure." Consequently, models learn to mimic the tone of confidence, even when their internal probability scores are low.
Like sycophancy, this is a form of Reward Hacking: the model maximizes its reward (human approval) by sacrificing the nuance of truth.
3. The Prescription: Curing the Mythomaniac
We know the causes, but there is no single "cure" for hallucinations yet. However, AI researchers have developed several effective treatments to manage the symptoms.
3.1 RAG (Retrieval-Augmented Generation): The Open-Book Test
RAG is currently the industry standard for reducing hallucinations.
Instead of forcing the AI to rely on its fuzzy, compressed memory, we give it an open-book test.
- Retrieval: When you ask a question, the system searches a trusted database (e.g., your company's manual) for relevant "fact documents."
- Augmentation: It pastes those facts into a prompt the user never sees, along with an instruction like: "Answer the user's question using ONLY the text below."
- Generation: The LLM summarizes and synthesizes the provided facts.
This shifts the LLM's role from "Knowledge Store" to "Knowledge Processing Engine."
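Below is a minimal sketch of that pipeline in Python. The documents, the keyword-overlap retriever, and the prompt wording are all stand-ins invented for illustration; a production system would use an embedding model and a vector database, but the shape of the flow is the same.

```python
# Minimal RAG sketch with a toy keyword retriever and hypothetical documents.
DOCS = [
    "Refund policy: purchases can be refunded within 30 days with a receipt.",
    "Shipping policy: standard delivery takes 3 to 5 business days.",
]

def retrieve(question: str, docs: list[str], k: int = 1) -> list[str]:
    # Retrieval: rank documents by naive keyword overlap with the question.
    q_words = set(question.lower().split())
    scored = sorted(docs, key=lambda d: len(q_words & set(d.lower().split())),
                    reverse=True)
    return scored[:k]

def build_prompt(question: str) -> str:
    # Augmentation: paste the retrieved facts into the prompt and instruct
    # the model to answer ONLY from them.
    context = "\n".join(retrieve(question, DOCS))
    return (
        "Answer the user's question using ONLY the text below. "
        "If the answer is not in the text, say you don't know.\n\n"
        f"--- CONTEXT ---\n{context}\n\n"
        f"--- QUESTION ---\n{question}"
    )

# Generation: the resulting string is what you would send to whatever LLM you use.
print(build_prompt("How many days do I have to request a refund?"))
```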
3.2 Grounding & Citation: Show Your Receipts
Another effective method is to demand that the LLM submit "proof" for its answers. You explicitly instruct the model: "Add a footnote indicating the source for every claim you make."
This source could be specific page numbers or line numbers if you provided a document, or a URL if it’s browsing the web. If you’ve used AI search services like Perplexity or Bing Chat, you’ve likely seen those little [1], [2] citation tags. That is Grounding & Citation in action.
This approach offers two distinct advantages:
- Verification of Reliability: If an answer looks suspicious, the user can click the citation number to jump directly to the original source. It acts as a safety net, allowing users to cross-reference the AI’s output against the raw data immediately.
- Constraints on the Model: From the model's perspective, having to find a matching source acts as a strict constraint. It becomes much harder to fabricate a lie when you are required to point to where you found it. It’s exactly like a corporate expense policy: "If you don't have a receipt, we won't reimburse you." Just as you can't spend company money without proof, the AI can't "spend" its imagination without a source.
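As a sketch of how the "receipt check" can also be enforced on the application side, here is a hypothetical validator that scans a model's answer for [n]-style citation tags and flags claims that cite nothing, or cite a source that was never provided. The source list, the answer text, and the naive sentence splitting are all invented for the example.

```python
import re

# Hypothetical list of sources handed to the model alongside the question.
SOURCES = ["Employee handbook, p. 12", "Refund policy, section 3"]

def validate_citations(answer: str, sources: list[str]) -> list[str]:
    problems = []
    # Very naive sentence split; enough for the sketch.
    for sentence in filter(None, (s.strip() for s in answer.split("."))):
        tags = [int(m) for m in re.findall(r"\[(\d+)\]", sentence)]
        if not tags:
            problems.append(f"No citation: '{sentence}'")
        for t in tags:
            if not 1 <= t <= len(sources):
                problems.append(f"Citation [{t}] points to no known source")
    return problems

answer = "Refunds are allowed within 30 days [2]. Hats are mandatory on Fridays [7]."
print(validate_citations(answer, SOURCES))
# Flags the out-of-range citation so a human (or a retry loop) can react.
```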
3.3 Chain of Thought: Show Your Work
If you simply toss a question at an LLM, it mechanically predicts the next word to form an answer. However, if you ask the LLM to "show the process of getting to the answer" rather than just the answer itself, logical errors drop significantly.
Because the process of deriving an answer looks like a chain of connected thoughts, we call this technique Chain-of-Thought (CoT).
The implementation is surprisingly simple. You just add a single phrase to your prompt: "Think step-by-step."
Consider a simple math problem: "What is 2 + 3?"
Without CoT, the LLM might just say "The answer is 5." Or, if it hallucinates, it might say "The answer is 8." Remember, LLMs don't actually calculate math; they are just predicting the next word based on training data.
However, if you apply CoT, the LLM generates a response like: "The first number is 2. The second number is 3. Combining them yields 5."
By breaking the problem down, the words generated in the earlier steps act as guide rails, increasing the accuracy of the words that follow.
If you use modern "Thinking Models", you might see status indicators like "Thinking..." or "Analyzing..." while waiting for a response. This is essentially the model performing a Chain-of-Thought process under the hood.
The downside? Since CoT requires generating more tokens to explain the steps, it is slower and increases inference costs.
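To make the mechanics concrete, here is roughly what the prompt-level change looks like. The question and the sample responses in the comments are illustrative, not real model output.

```python
# Chain-of-Thought is a prompt-level change: ask for the reasoning, not just the answer.
QUESTION = "A shop sells pens at $3 each. How much do 4 pens and a $2 notebook cost?"

plain_prompt = QUESTION
cot_prompt = (
    QUESTION
    + "\nThink step-by-step: write out each intermediate calculation "
      "before giving the final answer."
)

# Typical shapes of the two responses (illustrative):
# plain: "The total is $14."
# CoT:   "4 pens cost 4 x 3 = 12 dollars. Adding the $2 notebook gives
#         12 + 2 = 14 dollars. The answer is $14."
print(cot_prompt)
```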
Chain of Verification (CoVe)
There is also a technique called Chain of Verification (CoVe). This is an excellent way to prevent the "Snowball Effect" we discussed earlier. It goes beyond just "Think before you speak (CoT)" and enforces "Verify and Fix."
The process looks like this:
- Drafting: The LLM writes an initial draft of the answer.
- Generating Verification Questions: The LLM scans its own draft to identify claims that need fact-checking and generates questions (e.g., "Did George Washington actually own an iPhone?").
- Verification & Correction: The model answers those questions (often using search tools), identifies errors, and rewrites the draft to produce the final output.
It is very similar to the workflow in a newsroom: a reporter writes a story, a copy editor fact-checks it, and only then is it published.
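A rough sketch of that three-step loop might look like the following. `call_llm` is a placeholder stub standing in for whichever chat API you actually use, so only the control flow is shown, not a specific vendor integration.

```python
def call_llm(prompt: str) -> str:
    # Stub: replace with a real call to your model of choice.
    return f"<model output for: {prompt[:40]}...>"

def chain_of_verification(question: str) -> str:
    # 1. Drafting: produce an initial answer.
    draft = call_llm(f"Answer the question: {question}")

    # 2. Generate verification questions about the draft's factual claims.
    checks = call_llm(
        "List the factual claims in the text below as yes/no verification "
        f"questions, one per line:\n{draft}"
    )

    # 3. Verify each claim (optionally with a search tool) and rewrite the draft.
    verdicts = call_llm(f"Answer each verification question honestly:\n{checks}")
    final = call_llm(
        "Rewrite the draft, removing or correcting any claim the "
        f"verification contradicts.\nDraft:\n{draft}\nVerdicts:\n{verdicts}"
    )
    return final

print(chain_of_verification("Did George Washington actually own an iPhone?"))
```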
3.4 Temperature Control: Crushing Creativity
If you only use ChatGPT via the web interface, you might not see this, but developers using the API are very familiar with the Temperature parameter. This setting controls how the LLM selects the next word.
- High Value: The model becomes creative, diverse, and random.
- Low Value (near 0): The model becomes stubborn, choosing only the most probable words.
Internally, when an LLM generates the next word, it selects from a list of candidates, a process called Sampling.
For the input "The color of the apple is...", the internal probabilities might look like this:
- "Red" (99%)
- "Black" (0.1%)
- "Singing" (0.01%)
- "Unknown" (0.0001%)
Normally, "Red" dominates. But if you raise the Temperature, something interesting happens. The probability curve flattens.
- "Red" might drop to 70%.
- "Black" might rise to 1%.
- "Singing" might jump to 0.05%.
- "Unknown" might jump to 0.001%.
As the gap narrows, the chance of the model picking a lower-probability word increases. This leads to more unique, "creative," and unexpected answers.
However, for fact-checking, this is terrible. When accuracy is paramount, you want to prevent the LLM from thinking, "Maybe a little creativity is okay here?" You need to lower the Temperature to near 0 to strictly suppress its imagination and force it to stick to the most probable (and usually most factual) path.
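Under the hood, temperature is typically applied by dividing the model's raw scores (logits) by T before the softmax. Here is a toy demonstration with made-up logits, showing how the distribution sharpens near 0 and flattens as T grows:

```python
import math

# Toy logits for a handful of candidate next words (values are made up).
logits = {"red": 6.0, "green": 2.0, "black": 1.0, "singing": -2.0}

def softmax_with_temperature(logits: dict[str, float], temperature: float) -> dict[str, float]:
    # Dividing logits by the temperature before the softmax flattens the
    # distribution when T > 1 and sharpens it when T < 1.
    scaled = {tok: v / temperature for tok, v in logits.items()}
    total = sum(math.exp(v) for v in scaled.values())
    return {tok: math.exp(v) / total for tok, v in scaled.items()}

for t in (0.2, 1.0, 2.0):
    probs = softmax_with_temperature(logits, t)
    print(t, {tok: round(p, 3) for tok, p in probs.items()})
# Near T=0 the top token takes almost all the mass; at high T the
# low-probability ("creative") candidates become realistic picks.
```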
3.5 Alignment: Teaching Humility
Alignment is the stage where we teach the LLM social skills—ensuring it provides answers that are actually useful to humans, rather than just logically generating text. This connects back to the "Sycophantic AI" issue in section 2.4.
We mentioned RLHF (Reinforcement Learning from Human Feedback) as a cause of hallucination (due to sycophancy), but it is also one of the best cures. It is used to teach the model: "If you don't know, say you don't know."
During training, if the LLM fabricates a lie, human labelers give it a penalty (negative reward). If it honestly admits, "I apologize, but I don't have information on that," it gets a reward (positive reinforcement).
By repeating this process, the LLM learns a valuable lesson: "Ah, it is safer to be humble and admit ignorance than to pretend I know and get in trouble."
This training is why modern LLMs are much more polite and cautious compared to earlier versions.
3.6 Self-Consistency: Safety in Numbers
Finally, there is a technique called Self-Consistency.
As we discussed with Temperature, there is always an element of randomness (sampling) when an LLM generates an answer. This is why you might get slightly different answers even if you ask the same question twice.
Self-Consistency turns this randomness into a feature. You ask the LLM the exact same question multiple times (at a non-zero temperature, so the answers can vary).
- Answer A: "George Washington threw an iPhone."
- Answer B: "There is no historical record of this."
- Answer C: "There is no historical record of this."
- Answer D: "There is no historical record of this."
- Answer E: "He threw an iPad."
After collecting these responses, you take a majority vote.
While an LLM might hallucinate a lie once due to a random roll of the dice, it is very unlikely to produce the exact same fabrication in most of the samples. By keeping the answer the model converges on most often, you filter out the random outliers and usually end up with the correct one.
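A minimal version of that vote is just sampling and counting. In the sketch below, the five answers are hard-coded stand-ins for repeated model calls; a real implementation would also normalize or cluster answers that are worded differently but mean the same thing before counting.

```python
from collections import Counter

# Stand-ins for five samples of the same question at non-zero temperature.
samples = [
    "George Washington threw an iPhone.",
    "There is no historical record of this.",
    "There is no historical record of this.",
    "There is no historical record of this.",
    "He threw an iPad.",
]

def majority_vote(answers: list[str]) -> str:
    # Exact string matching keeps the sketch small; a real system would
    # first group semantically equivalent answers.
    return Counter(answers).most_common(1)[0][0]

print(majority_vote(samples))  # -> "There is no historical record of this."
```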
4. Conclusion: Is Hallucination a Bug or a Feature?
So, is hallucination a bug? Based on everything we’ve discussed so far—the risks to business, the potential for misinformation—it certainly looks like a critical defect that needs fixing.
But let’s shift our perspective for a moment. What about creativity?
Imagine asking an LLM to write a sci-fi novel, compose a poem, or brainstorm a revolutionary new marketing idea. If that AI were programmed to obsess strictly over "proven facts," what would happen? It would become a boring, dry machine incapable of offering any real insight or flair.
In a way, "Hallucination" might just be another name for the LLM's "Creativity."
The underlying mechanism is the same: the model probabilistically connects concepts that don't usually go together. When it does this wrongly in a history essay, we call it a "bizarre lie." But when it does this successfully in a creative writing prompt, we call it a "brilliant, human-like idea."
However, this creativity must be controllable.
When writing a novel, we want the AI to let its imagination run wild. When writing a quarterly financial report, we want it to be a strict librarian who cites every source. Like any powerful technology, the key lies in how we use it and having the controls to toggle between these modes.
That is why AI researchers around the world aren't trying to eliminate hallucination entirely; they are working to control it.
3-Line Summary:
- The Phenomenon: By design, LLMs are "Probabilistic Dreamers" that prioritize narrative plausibility over objective truth.
- The Cause: The root causes include lossy compression of knowledge, the inherent randomness of next-token prediction, and the limitations of training data.
- The Solution: We can tame this dreamer into a competent, reliable assistant using techniques like RAG (Open Book), Citation (Show Proof), CoT (Think Step-by-Step), and Alignment (Socialization).
Ideally, a day will come when AI acts proactively without our prompting—searching the internet to fact-check itself and telling us, "I found this information, but the source seems unreliable, so cross-verification is needed."
But until that day comes—and even after it arrives—your Critical Thinking remains the ultimate firewall.
Do not blindly trust the code an LLM writes for you. Do not blindly accept the summary it generates. Click the citation buttons. Verify the original text. Maintaining a healthy skepticism and exercising reasonable doubt are the most essential weapons we must carry to survive and thrive in the age of AI.