Why LLMs Hallucinate, and How to Reduce It

#ai #beginners #llm #machinelearning

"Why does ChatGPT make things up?" Because it predicts a plausible next word, not a true one, and it rarely says "I don't know" unless you prompt it to. Here's the cause in plain words, with an interactive demo.

🌫️ Try it: https://dev48v.infy.uk/ai/days/day7-hallucinate.html

It predicts likely, not true

An LLM generates a LIKELY continuation of everything so far, one token (roughly a word piece) at a time. "Likely" is learned from patterns in its training text; it is not a lookup of verified facts. When the likely-sounding continuation is false, that's a hallucination. The machinery is the same whether the output is right or wrong.
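
To make "likely, not true" concrete, here is a toy sketch of picking the next word. The prompt, the words, and the probabilities are all invented for illustration (a real model learns a distribution over a huge vocabulary), so treat it as a picture of the idea rather than how any particular model works.

```python
import random

# Hypothetical, hand-written probabilities for the next word after the prompt
# "The capital of Australia is". These numbers are made up for illustration.
NEXT_WORD_PROBS = {
    "Sydney": 0.55,     # shows up next to "Australia" a lot in text, but is wrong
    "Canberra": 0.40,   # the true answer
    "Melbourne": 0.05,
}

def most_likely(probs):
    """Return the single highest-probability word (greedy decoding)."""
    return max(probs, key=probs.get)

def sample(probs, rng):
    """Draw one word at random, weighted by its probability."""
    words = list(probs)
    weights = [probs[w] for w in words]
    return rng.choices(words, weights=weights, k=1)[0]

rng = random.Random(0)  # fixed seed so repeated runs match
print(most_likely(NEXT_WORD_PROBS))  # prints "Sydney": likely, but false
print([sample(NEXT_WORD_PROBS, rng) for _ in range(5)])
```

Nothing in either function checks whether the chosen word is correct. The only input is the probability table.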

It has no "I don't know" by default

In its training data, questions are mostly followed by answers, not by "I'm unsure". So when you ask about something it has no information on, the most probable continuation is still a confident-looking answer. It fills the gap with something that FITS the pattern.

Confidence ≠ correctness

A made-up answer is generated by the exact same fluent process as a true one, so it SOUNDS identical. There's no internal "truth meter". Hallucinations are worst on obscure topics, recent events (after the training cutoff), exact numbers and citations, and ANYTHING about your private data.

Reduce it

Ground it (RAG): give it the real docs, answer ONLY from them, cite sources.
Allow an out: "If the context doesn't contain the answer, say you don't know."
Lower temperature for factual tasks.
Verify names, numbers, citations — treat confident output as a draft.
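
The first two items above (ground it, allow an out) can be sketched as a small prompt builder. The function name `build_grounded_prompt`, the prompt wording, and the sample documents are assumptions made up for this example; how you retrieve the documents and which model API you send the prompt to are left out on purpose.

```python
def build_grounded_prompt(question, documents):
    """Build a prompt that restricts the model to the supplied documents."""
    # Number each document so the model can cite it as [1], [2], ...
    context = "\n".join(
        f"[{i}] {doc}" for i, doc in enumerate(documents, start=1)
    )
    return (
        "Answer the question using ONLY the context below.\n"
        "Cite the number of each source you use, like [1].\n"
        "If the context doesn't contain the answer, say you don't know.\n\n"
        f"Context:\n{context}\n\n"
        f"Question: {question}\n"
        "Answer:"
    )

# Hypothetical documents, standing in for whatever your retrieval step returns.
docs = [
    "Refunds are accepted within 30 days of purchase.",
    "Support is available Monday to Friday.",
]
prompt = build_grounded_prompt("What is the refund window?", docs)
print(prompt)
```

You would send `prompt` to your model of choice with a low temperature setting. The instructions make an honest "I don't know" much more likely, but they don't guarantee it, so the last item on the list still applies: verify the output.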

You can't fully erase it: it's intrinsic to how the model generates text. In the demo, toggle grounding and watch the unanswerable questions turn into honest "I don't know"s.
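
Temperature is a good example of a fix that helps without erasing the problem. The sketch below applies the standard softmax-with-temperature formula to three invented scores (the words and numbers are assumptions, not taken from any real model). Lowering the temperature concentrates probability on the top choice, which makes output more consistent, but if the top choice is wrong it stays wrong.

```python
import math

def softmax_with_temperature(logits, temperature):
    """Turn raw scores into probabilities; lower temperature sharpens them."""
    if temperature <= 0:
        raise ValueError("temperature must be positive")
    scaled = [score / temperature for score in logits.values()]
    peak = max(scaled)  # subtract the max for numerical stability
    exps = [math.exp(s - peak) for s in scaled]
    total = sum(exps)
    return {word: e / total for word, e in zip(logits, exps)}

# Hypothetical raw scores for the next word after "The capital of Australia is".
# The wrong answer is deliberately given the highest score.
LOGITS = {"Sydney": 2.0, "Canberra": 1.7, "Melbourne": 0.0}

for t in (1.0, 0.2):
    probs = softmax_with_temperature(LOGITS, t)
    print(t, {word: round(p, 3) for word, p in probs.items()})
```

At the lower temperature the top-scoring word takes a larger share of the probability. That cuts down on random slips, which is why it helps on factual tasks, but it can't supply a fact the model doesn't have.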