Everyone's been treating LLM hallucinations like a bug. A flaw in reasoning. Something to patch with better prompts, more RLHF, or some future breakthrough in "trustworthy AI."
We disagree. And once you see it the way we see it, you can't unsee it.
Hallucinations are compression artifacts. Always were. Always will be (to some degree).
Here's the argument — and why it completely changes how we approach AI systems at Gerus-lab.
## The Shannon Insight Nobody Talks About Enough
Claude Shannon proved in 1948 that predicting the next symbol and compressing data are mathematically identical operations. Not similar. Not analogous. Identical.
If you can accurately predict the next character in a sequence, you can compress data. If you can compress data well, you can predict. Arithmetic coding literally turns a good predictor into a good compressor.
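A toy illustration of that equivalence (a sketch, not a real compressor): an ideal arithmetic coder spends `-log2 p(symbol)` bits per symbol, so the total cost of encoding a string under a model is just the model's summed surprise. A better predictor directly means a smaller file. Here a smoothed character-bigram model stands in for the predictor; all names are illustrative.

```python
import math
from collections import Counter, defaultdict

def bits_to_encode(text: str, probs) -> float:
    """Ideal code length: an arithmetic coder spends -log2 p(c) bits per symbol."""
    return sum(-math.log2(probs(prev, c)) for prev, c in zip(text, text[1:]))

def train_bigram(corpus: str):
    """Build a character-bigram predictor with add-one smoothing."""
    counts = defaultdict(Counter)
    for prev, c in zip(corpus, corpus[1:]):
        counts[prev][c] += 1
    vocab = sorted(set(corpus))
    def probs(prev, c):
        # add-one smoothing so unseen pairs still get nonzero probability
        total = sum(counts[prev].values()) + len(vocab)
        return (counts[prev][c] + 1) / total
    return probs

corpus = "the cat sat on the mat. the cat sat on the hat. " * 20
model = train_bigram(corpus)
uniform = lambda prev, c: 1 / len(set(corpus))  # "no prediction" baseline

sample = "the cat sat on the mat."
print(bits_to_encode(sample, model))    # far fewer bits than...
print(bits_to_encode(sample, uniform))  # ...the uniform baseline
```

The predictor never "compresses" anything explicitly; the bit count falls out of its probabilities alone, which is Shannon's point.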
Now: what does an LLM do at the most fundamental level?
```python
def predict_next_token(context: str) -> Distribution:
    """This is simultaneously prediction AND decompression."""
    pass
```
GPT, Claude, Gemini — they're all trained to predict the next token. Which means they're all, at their core, compression algorithms. The model weights are the compressed file. The training data is the original.
You are talking to a ZIP file that learned to unzip itself.
## JPEG for Text
Everybody knows what happens when you over-compress a JPEG:
- Large, high-contrast objects survive well (faces, sky, obvious shapes)
- Fine details are the first to go (text on signs, license plates, eyelashes)
- Artifacts appear at boundaries — blocks, halos, colors that weren't there
- Those artifacts look plausible. A non-expert might not notice.
Replace "pixels" with "knowledge":
| JPEG | LLM |
|---|---|
| Large, high-contrast objects | Common patterns, general knowledge |
| Fine details | Rare facts, exact figures, specific dates |
| Boundary artifacts | Hallucinations |
| Quality slider (1–100%) | Model size (7B → 70B → 405B) |
| Original file | Training data |
A hallucination is a compression artifact. The model "remembers" that something of a certain type should go here (a citation, a number, a date), but the exact bits were lost. So it reconstructs something plausible. Exactly like JPEG reconstructing pixels that weren't there.
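As a deliberately crude toy (not how a real model stores anything), you can picture the failure mode like this: the "memory" keeps the shape of a fact but sometimes loses its exact bits, and recall fills the gap with something that fits the pattern.

```python
import random

# Toy illustration only: a lossy "memory" that keeps the *shape* of a fact.
# Each entry is (pattern, exact_value); exact_value is None if the bits were lost.
COMPRESSED_MEMORY = {
    "pi_digits": ("3.14...", "3.14159"),
    "obscure_paper_year": ("19XX", None),  # the exact year didn't survive compression
}

def recall(key: str) -> str:
    pattern, exact = COMPRESSED_MEMORY[key]
    if exact is not None:
        return exact  # the detail survived: faithful reconstruction
    # detail lost: fill the gap with something plausible that fits the pattern
    return pattern.replace("XX", f"{random.randint(10, 99)}")

print(recall("pi_digits"))           # faithful: the bits were retained
print(recall("obscure_paper_year"))  # plausible but possibly wrong: an artifact
```

The second call always returns a well-formed year. It just isn't guaranteed to be the right one, which is exactly what a confident hallucination looks like.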
## This Explains Everything. Seriously.
At Gerus-lab, we've built AI-integrated systems for Web3 clients, SaaS platforms, and GameFi projects. We've been burned by hallucinations. We've also learned to predict exactly when they'll happen — because the compression frame makes it obvious.
### Why are LLMs great at code?
Code is one of the most compressible text formats. Strict syntax, repetitive patterns, limited vocabulary. `for i in range(n)` appears millions of times. The codec "remembers" code patterns almost losslessly. It's like compressing a geometric shape in a JPEG — clean edges survive.
### Why are LLMs terrible at math?
Exact numbers are exactly those "fine details" that get lost first. 23 × 47 = 1081 — to a language model, that's just three arbitrary tokens with no pattern. You can't "compress" a multiplication table. You can only memorize it in full or compute it algorithmically. LLMs do neither — they reconstruct "something numeric that looks right."
The rarer the example, the more artifacts. Just like JPEG: faces are fine, license plates in the background turn to noise.
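If exact arithmetic can't be compressed, the fix is to compute it instead of reconstructing it. A minimal sketch of that routing idea (the function names are ours, not any particular framework's): parse the expression and evaluate it algorithmically, so the answer is exact by construction.

```python
import ast
import operator

# Sketch of a routing layer: arithmetic goes to a real evaluator
# instead of being "reconstructed" from lossy memory.
OPS = {ast.Add: operator.add, ast.Sub: operator.sub,
       ast.Mult: operator.mul, ast.Div: operator.truediv}

def safe_eval(expr: str) -> float:
    """Evaluate +, -, *, / on number literals only: compute, don't reconstruct."""
    def walk(node):
        if isinstance(node, ast.Expression):
            return walk(node.body)
        if isinstance(node, ast.BinOp) and type(node.op) in OPS:
            return OPS[type(node.op)](walk(node.left), walk(node.right))
        if isinstance(node, ast.Constant) and isinstance(node.value, (int, float)):
            return node.value
        raise ValueError(f"unsupported expression: {expr!r}")
    return walk(ast.parse(expr, mode="eval"))

print(safe_eval("23 * 47"))  # 1081 -- exact, because it was computed
```

This is the same logic behind giving models calculator or code-execution tools: move the task out of the codec entirely.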
### Why does model size help?
Because it's literally higher bitrate. JPEG at 30% quality → 60% → 90%. More bits available = fewer losses. This is why the parameter race is a bitrate race.
## Temperature Is the Quality Slider
When you set temperature = 0, you're telling the decoder: "take the most probable option at every step." It's like applying sharpening to an over-compressed JPEG: the image gets crisper, but so do the artifacts.
When you set temperature = 1.5+, you're saying "add noise." Like dithering — artifacts blur, but sharpness drops. "Creativity" appears, which is really just sampling from less likely reconstructions.
```python
# temperature = 0.0 -> argmax, crisp artifacts
# temperature = 0.7 -> soft sampling, balanced
# temperature = 1.5 -> noisy, "creative"
# temperature -> inf -> random noise
```
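Mechanically, temperature is just a divisor on the logits before the softmax. A small self-contained sketch of that standard formula:

```python
import math

def apply_temperature(logits, temperature):
    """Scale logits by 1/T, then softmax: low T sharpens, high T flattens."""
    if temperature <= 0:
        # T -> 0 degenerates to argmax: all probability mass on the top logit
        probs = [0.0] * len(logits)
        probs[max(range(len(logits)), key=lambda i: logits[i])] = 1.0
        return probs
    scaled = [l / temperature for l in logits]
    m = max(scaled)  # subtract the max for numerical stability
    exps = [math.exp(s - m) for s in scaled]
    total = sum(exps)
    return [e / total for e in exps]

logits = [2.0, 1.0, 0.1]
print(apply_temperature(logits, 0))    # argmax: crisp, artifacts and all
print(apply_temperature(logits, 0.7))  # balanced sampling distribution
print(apply_temperature(logits, 5.0))  # near-uniform: noisy, "creative"
```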
LLM "creativity" isn't thinking. It's interpolation between reconstructions in latent space.
## RAG, Fine-tuning, and Prompt Engineering — Reframed
RAG — you're injecting lossless data into the context. Instead of relying on the codec's "memory" of a fact, you hand it the original. Like inserting a PNG fragment into a JPEG stream.
Fine-tuning — you're re-encoding the file with different priorities. "Compress legal texts better, I don't care about poetry." Redistributing the bit budget.
Prompt engineering — you're telling the decoder which region of the compressed file to reconstruct from. "You're a Kubernetes expert" = seek in the DevOps pattern block.
| Technique | In compression terms |
|---|---|
| RAG | Lossless injection into lossy stream |
| Fine-tuning | Re-encoding with a new profile |
| Prompt engineering | Seek + decoder hint |
| RLHF | Rebuilding codec for subjective quality |
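The RAG row is worth spelling out in code. A minimal sketch, with deliberately naive stand-ins (keyword matching instead of a real vector search; `DOCUMENT_STORE` and both function names are ours): the point is that the retrieved text enters the context verbatim, lossless, rather than being recalled from the codec's memory.

```python
# Minimal RAG sketch (hypothetical names): inject the lossless original into
# the context instead of trusting the codec's lossy "memory" of it.
DOCUMENT_STORE = {
    "shannon-1948": "Shannon, 'A Mathematical Theory of Communication', 1948.",
}

def retrieve(query: str) -> list[str]:
    # stand-in for a real vector search: naive keyword match
    words = query.lower().split()
    return [doc for doc in DOCUMENT_STORE.values()
            if any(word in doc.lower() for word in words)]

def build_prompt(question: str) -> str:
    context = "\n".join(retrieve(question))  # lossless fragment: PNG in a JPEG
    return (f"Answer using ONLY the sources below.\n"
            f"Sources:\n{context}\n\nQuestion: {question}")

print(build_prompt("When was Shannon's communication paper published?"))
```

The model still decodes the answer, but the exact bits it needs are sitting in the context window, so nothing has to be reconstructed from lossy weights.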
## Can You Fix Hallucinations?
If hallucinations are compression artifacts, information theory gives a strict answer: no. Not completely.
You can increase bitrate (larger model). You can add lossless data (RAG). You can improve the codec (better architecture). All of this reduces artifacts.
But as long as you're compressing 10 TB into 70 GB, there will be losses. You cannot compress data below its entropy without loss.
Anyone saying "we've solved hallucinations" without specifying how is either lying or doesn't understand information theory.
## How We Actually Build With This at Gerus-lab
This isn't academic. It directly shapes every AI system we ship.
For our Web3 and blockchain clients: Smart contract logic? Never trust the LLM for exact values. We use LLMs for scaffolding and pattern generation. Exact addresses, amounts, function signatures get validated against lossless sources every time.
For our AI-integrated SaaS products: We treat LLM outputs as "high-confidence approximations." Our pipelines always include a verification layer — RAG for factual accuracy, structured output parsers that fail loudly when something is implausible.
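What "fail loudly" looks like in practice, as a hedged sketch (the schema and function name are illustrative, not our production code): treat the model's structured output as a lossy reconstruction and reject anything malformed or implausible instead of passing it downstream.

```python
import json

# Sketch of a fail-loud validation layer (hypothetical schema): the model's
# JSON output is a reconstruction, so verify it before trusting it.
def parse_payment(raw: str) -> dict:
    data = json.loads(raw)  # malformed JSON raises immediately
    if set(data) != {"amount", "currency"}:
        raise ValueError(f"unexpected fields: {sorted(data)}")
    if not isinstance(data["amount"], (int, float)) or data["amount"] <= 0:
        raise ValueError(f"implausible amount: {data['amount']!r}")
    if data["currency"] not in {"USD", "EUR", "ETH"}:
        raise ValueError(f"unknown currency: {data['currency']!r}")
    return data

print(parse_payment('{"amount": 49.9, "currency": "USD"}'))
try:
    parse_payment('{"amount": -5, "currency": "USD"}')  # artifact caught here
except ValueError as e:
    print("rejected:", e)
```

A loud failure at this layer is cheap; a plausible-looking artifact that reaches a smart contract or an invoice is not.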
For GameFi projects: We embrace creative reconstruction where it serves the product. A character that "hallucinates" lore-consistent details is often better UX. Here, the artifact is a feature.
The model is a lossy codec. Design around that reality.
## The Uncomfortable Mirror
Here's the twist: so are we.
Human memory is also a lossy codec. We compress experience into neural patterns, lose details, reconstruct plausible content. Psychologists call it confabulation — the brain fills memory gaps with invented but believable details.
Literally hallucinations.
The difference is that we sometimes have contextual awareness of our uncertainty. LLMs currently don't know what they've lost. That gap is the actual unsolved problem.
## What This Means for You
- Stop being surprised by hallucinations on rare facts. That's JPEG on max compression.
- Use RAG for anything needing exact fidelity. It's the only lossless option.
- Match task type to model capability. Code and patterns = great. Exact numbers and obscure facts = verify externally.
- Bigger model ≠ no hallucinations. Fewer, but same failure mode.
- Temperature is a tool. High for creative tasks, low for factual reconstruction.
## Build Smarter AI Systems
At Gerus-lab, we design AI architectures that work with the nature of LLMs, not against it. We've shipped AI systems for fintech, Web3, SaaS, and gaming.
Explore what we build → gerus-lab.com
Hallucinations aren't a bug waiting to be patched. They're physics. Build accordingly.