
Gerus Lab

LLM Hallucinations Are Compression Artifacts — And That Changes Everything About How We Build AI Products

At Gerus-lab, we've shipped over 14 AI-powered products. And we kept hitting the same wall: hallucinations. We tried prompting tricks, RAG, fine-tuning, system prompt gymnastics — until we finally understood the real reason LLMs lie with confidence. It's not a bug. It's information theory.


The Frame That Changes Everything

Here's a mental model that completely reframed how we build AI at Gerus-lab:

A large language model is a lossy compression algorithm.

Not metaphorically. Mathematically.

Claude Shannon proved in 1948 that prediction and compression are the same thing. If you can predict the next symbol well, you can compress data well. LLMs are trained to predict the next token — which means their weights are, fundamentally, a compressed representation of the training corpus.
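Shannon's equivalence is easy to demonstrate: an ideal (arithmetic) coder spends −log₂ p(symbol) bits per symbol, so a better predictor directly means a smaller file. A toy sketch (the "trained" predictor here is a made-up example that has learned exactly one pattern):

```python
import math

def code_length_bits(text, predict):
    """Total bits an ideal arithmetic coder would need when driven by a
    next-symbol predictor: the sum of -log2 p(symbol) over the text."""
    return sum(-math.log2(predict(text[:i], ch)) for i, ch in enumerate(text))

# A uniform predictor over 27 symbols (a-z + space): knows nothing.
uniform = lambda ctx, ch: 1 / 27

# A toy "trained" predictor that has learned one pattern:
# 'q' is almost always followed by 'u'.
def trained(ctx, ch):
    if ctx.endswith("q"):
        return 0.9 if ch == "u" else 0.1 / 26
    return 1 / 27

text = "the quick quokka quietly quit"
# The better predictor encodes the same text in fewer bits.
print(code_length_bits(text, uniform))
print(code_length_bits(text, trained))
```

Better prediction, fewer bits — that's the whole theorem in miniature.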

GPT-4? That's 10+ terabytes of text squished into ~70GB of weights. Same energy as converting a lossless TIFF into a heavily compressed JPEG.

And if you know anything about lossy compression, you already know what happens next.


JPEG for Knowledge

Every developer has seen over-compressed JPEG artifacts: blocky edges, phantom colors, smeared text in the background. The key insight is this:

| JPEG | LLM |
| --- | --- |
| Large, high-contrast objects | Common knowledge, frequent patterns |
| Fine details | Rare facts, exact dates, specific numbers |
| Compression artifacts | Hallucinations |
| Quality setting (1–100%) | Model size (7B → 70B → 405B) |
| Original file | Training data |

A hallucination is a compression artifact. The model "remembers" that something of a certain type should be here — a citation, a number, a URL — but the exact bits are gone. So it reconstructs a plausible fragment. Exactly like JPEG fills in pixels that weren't there.

This isn't the model "lying." It's the fundamental behavior of any lossy codec: it doesn't know where it lost information, because the information about those losses was also lost.


Why This Explains Every LLM Quirk You've Seen

Once you have this frame, everything clicks:

Why is GPT good at code?
Code is one of the most compressible forms of text. Strict syntax, repeating patterns, limited vocabulary. `for i in range(n)` appears millions of times. The codec "memorized" these patterns with minimal loss — like a large, high-contrast shape in a JPEG.
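You can see this compressibility gap yourself, using gzip as a stand-in codec:

```python
import gzip
import random
import string

random.seed(0)  # reproducible "random" text

# Code-like text: a few patterns repeated many times.
code_like = "for i in range(n):\n    total += data[i]\n" * 200

# Random text of the same length: no patterns to exploit.
random_text = "".join(
    random.choice(string.printable) for _ in range(len(code_like))
)

# Compressed size as a fraction of original size.
ratio = lambda s: len(gzip.compress(s.encode())) / len(s.encode())

print(f"code-like: {ratio(code_like):.2f}")   # tiny — highly compressible
print(f"random:    {ratio(random_text):.2f}") # close to 1.0 — barely shrinks
```

The repetitive, code-like input collapses to a few percent of its size; the patternless input barely compresses at all. LLM weights face the same asymmetry.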

Why is it bad at math?
Exact numbers are the "fine details" that lossy compression destroys first. `1847 × 9283` is just arbitrary digits with no pattern to compress. The model can't "compute" — it reconstructs "something numeric that looks plausible."
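The practical fix is to route exact arithmetic to a deterministic tool instead of model recall. A minimal sketch of such a calculator tool — the safe-`ast` evaluator below is illustrative, not any particular framework's API:

```python
import ast
import operator

# A minimal, safe arithmetic evaluator — the kind of "tool" you hand an
# LLM so exact math is computed, not reconstructed from lossy weights.
OPS = {ast.Add: operator.add, ast.Sub: operator.sub,
       ast.Mult: operator.mul, ast.Div: operator.truediv}

def calc(expr):
    """Evaluate a simple arithmetic expression deterministically."""
    def ev(node):
        if isinstance(node, ast.BinOp) and type(node.op) in OPS:
            return OPS[type(node.op)](ev(node.left), ev(node.right))
        if isinstance(node, ast.Constant) and isinstance(node.value, (int, float)):
            return node.value
        raise ValueError("unsupported expression")
    return ev(ast.parse(expr, mode="eval").body)

print(calc("1847 * 9283"))  # 17145701 — exact, every time
```

The point isn't this particular evaluator; it's that any value where exactness matters should come from lossless computation, not from the codec's reconstruction.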

Why does a bigger model help?
Because it's literally a higher bitrate. JPEG at 30% quality → 60% → 90%. More parameters = more bits available = fewer artifacts. That's why the parameter count race is really a bitrate race.

Why does the model confidently fabricate?
Because JPEG confidently renders non-existent pixels. Artifacts don't come labeled "WARNING: ARTIFACT." They look like real data. The codec doesn't know where it lost information.


What We Actually Do at Gerus-lab Because of This

Understanding LLMs as lossy codecs completely changed our architecture decisions. Here's what we changed:

1. We stopped fighting hallucinations with prompts alone

Prompt engineering is essentially telling the decoder "seek to this region of the compressed file." Useful, but it doesn't add bits back. You can tell a JPEG to be sharper — you can't recover pixels that were never stored.

2. We default to RAG for anything factual

RAG (Retrieval-Augmented Generation) is inserting lossless data into a lossy stream. Instead of trusting how the codec "remembered" a fact, you hand it the original. It's like embedding a PNG fragment inside a JPEG. It costs tokens (context window), but there are no artifacts.

In our AI product builds, we now treat "should this be in the context or in model weights?" as a fundamental architecture question — not an afterthought.
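A minimal sketch of the RAG pattern — the keyword retriever, toy document store, and prompt template below are stand-ins for a real embedding-based pipeline, not a production design:

```python
# Toy "lossless" document store. In production this would be a vector
# database queried by embeddings; here, keyword match keeps it readable.
DOCS = {
    "pricing": "Enterprise plan: $499/mo, billed annually.",
    "refunds": "Refunds within 30 days of purchase, no questions asked.",
}

def retrieve(query):
    """Return the original (lossless) text for matching documents."""
    hits = [text for key, text in DOCS.items() if key in query.lower()]
    return "\n".join(hits) or "(no matching documents)"

def build_prompt(question):
    """Insert lossless source text into the lossy codec's context."""
    context = retrieve(question)
    return (
        "Answer using ONLY the context below. "
        "If the answer is not in the context, say you don't know.\n\n"
        f"Context:\n{context}\n\nQuestion: {question}"
    )

print(build_prompt("What is the refunds policy?"))
```

The instruction to answer *only* from context is what converts "trust the codec's memory" into "read from the original file."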

3. Temperature is a quality slider (and we treat it that way)

`temperature = 0` means argmax at every step — you get sharper output, but artifacts get harsher. `temperature = 1.0+` adds noise — artifacts blur, "creativity" increases. What we call "creativity" in LLMs is actually sampling from less-probable reconstructions in latent space.

For factual tasks: low temperature, RAG. For creative tasks: higher temperature, broader prompt space. Simple, but most teams get this backwards.
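Under the hood, temperature is just a divisor on the logits before softmax. A self-contained sketch of the sampling step (toy logits, standard temperature scaling):

```python
import math
import random

def sample(logits, temperature=1.0, seed=None):
    """Temperature scaling: divide logits by T before softmax.
    T -> 0 approaches argmax; T > 1 flattens the distribution."""
    rng = random.Random(seed)
    if temperature == 0:
        return max(range(len(logits)), key=lambda i: logits[i])  # pure argmax
    scaled = [l / temperature for l in logits]
    m = max(scaled)                          # subtract max for stability
    exps = [math.exp(l - m) for l in scaled]
    total = sum(exps)
    r, acc = rng.random(), 0.0
    for i, e in enumerate(exps):
        acc += e / total
        if r < acc:
            return i
    return len(logits) - 1

logits = [2.0, 1.0, 0.1]
print(sample(logits, temperature=0))             # always the top token
print(sample(logits, temperature=1.5, seed=42))  # may pick a less-probable token
```

At `T = 0` you always get the codec's single best reconstruction; raising `T` samples from the blur around it.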

4. Fine-tuning is codec reprogramming

When a client at Gerus-lab asks us to fine-tune a model for their domain, we reframe it: we're reallocating the bitrate budget. "I don't care about 19th-century poetry — compress legal documents better." Fine-tuning redistributes the model's limited capacity toward the patterns that matter.

This framing helps clients set realistic expectations. You're not teaching the model to "know more" — you're teaching it to compress certain domains with higher fidelity.


The Uncomfortable Truth: Hallucinations Cannot Be Eliminated

If hallucinations are compression artifacts, then the math is brutal:

You cannot compress 10TB into 70GB without loss. You cannot squeeze data below its entropy without degrading it.
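That entropy floor is measurable. Shannon entropy of the byte distribution gives a per-byte lower bound (under an i.i.d. model) that no lossless codec can beat:

```python
import math
from collections import Counter

def entropy_bits_per_byte(data):
    """Shannon entropy of the byte distribution: a lower bound, under an
    i.i.d. model, on bits per byte for any lossless encoding."""
    counts = Counter(data)
    n = len(data)
    return -sum((c / n) * math.log2(c / n) for c in counts.values())

print(entropy_bits_per_byte(b"aaaaaaab"))            # low: highly compressible
print(entropy_bits_per_byte(bytes(range(256)) * 16)) # 8.0: incompressible
```

Push below the bound and you're no longer compressing — you're discarding. That's the regime LLM weights live in.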

Anyone who says "we solved hallucinations" without specifying "via a much larger model" or "via external lossless memory" is either confused or marketing.

The honest engineering position:

  • Larger model = higher bitrate = fewer artifacts
  • RAG = lossless inserts = no artifacts for retrieved content
  • Better architecture = better codec = fewer artifacts per parameter
  • But lossy compression always loses something

This doesn't mean AI is useless. A JPEG at 80% quality is still incredibly useful. You just don't use it for archival storage or OCR-critical text. Same with LLMs — you don't use raw model recall for anything where exactness is critical.


The Humbling Parallel: We're Lossy Codecs Too

Here's the part that usually makes our team pause:

Do you remember what you had for lunch last Thursday? The exact text on slide 14 of yesterday's presentation?

Human memory is also a lossy codec. We compress the stream of experience into neural patterns, lose details, and reconstruct plausible-but-wrong memories. Psychologists call this "confabulation" — the brain fills memory gaps with invented but plausible details.

Literally hallucinations.

We do this for the same reason: input data volume is incompatible with storage capacity. Your retina sends ~10Mbps, but you don't remember the face of someone you spoke to an hour ago because your biological codec decided those bits weren't worth keeping.

The difference is we've had millions of years to calibrate when to compress hard and when to keep the original. LLMs are 5 years old and trained on human outputs, not cognition. Give them time.


Practical Architecture Checklist

Based on what we've learned at Gerus-lab building AI systems:

Use model memory (weights) for:

  • Common patterns and general reasoning
  • Code generation (high-compressibility domain)
  • Language tasks (translation, summarization)
  • Creative generation where "plausible" is acceptable

Use RAG / lossless context for:

  • Specific facts, dates, names, numbers
  • Company-specific or domain-specific knowledge
  • Any content that changes over time
  • Legal, medical, or financial accuracy requirements

Use fine-tuning for:

  • Reallocating bitrate to your domain
  • Style/format consistency
  • Reducing latency by baking common patterns in

Use larger models for:

  • Tasks where hallucination cost is high
  • Complex reasoning chains
  • When budget allows

The Bottom Line

We've stopped thinking "how do we prevent the AI from lying?" and started thinking "how do we architect around lossy compression?"

It's a small reframe that leads to completely different technical decisions. Decisions that actually work.

If you're building a product on top of an LLM — and increasingly, everyone is — you need to internalize this. Not because it's academically interesting (it is), but because it will change every architecture decision you make.

At Gerus-lab, this understanding is now baked into how we spec, design, and deliver every AI system. It's the difference between fighting the model and working with its actual properties.

Ready to build an AI system that's honest about what it knows? Talk to us at Gerus-lab — we've been down this road 14+ times and know where the compression artifacts hide.


Tags: ai, machinelearning, programming, webdev, tutorial
