At Gerus-lab, we've integrated AI into production systems across Web3, SaaS, and GameFi projects. And the one conversation that comes up every single time with a new client is some version of: "But what about hallucinations? Can we trust this thing?"
Short answer: yes, if you understand what you're actually dealing with.
Long answer: you've been thinking about LLMs fundamentally wrong, and that's why hallucinations keep surprising you.
LLMs Are Lossy Compression — Not Magic, Not Madness
Here's the mental model that changed everything for our engineering team.
Imagine you take 10 terabytes of text — every book, article, forum thread, and Stack Overflow answer ever written — and you compress it into a 70 GB file. Not lossless compression. Lossy compression. Like JPEG.
Now, when you ask a question, you're not "querying a database." You're asking a compressed file to reconstruct information that may or may not have survived the compression.
This isn't just a metaphor. Claude Shannon's information theory, laid out in 1948, makes prediction and compression two sides of the same coin: the better a model predicts the next symbol, the fewer bits an entropy coder needs to store it. A system that predicts the next token is a compression algorithm. The model weights are the compressed file. Every LLM you've ever used is a sophisticated JPEG for human knowledge.
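You can see the prediction-compression link for yourself with a general-purpose compressor standing in for the "codec". This is a toy sketch using Python's stdlib `zlib` (nothing like how LLMs are trained), but the relationship it demonstrates is the same one the model exploits: predictable input needs few bits, unpredictable input needs many.

```python
import math
import random
import zlib
from collections import Counter

def entropy_bits_per_char(text: str) -> float:
    """Order-0 empirical entropy: a crude lower bound on how many
    bits per character a lossless encoder needs for this text."""
    counts = Counter(text)
    n = len(text)
    return -sum(c / n * math.log2(c / n) for c in counts.values())

random.seed(0)
# Highly predictable: the same code pattern repeated over and over.
predictable = "for i in range(n):\n    total += i\n" * 100
# Unpredictable: uniformly random characters of the same length.
alphabet = "abcdefghijklmnopqrstuvwxyz0123456789"
unpredictable = "".join(random.choice(alphabet) for _ in range(len(predictable)))

for name, text in [("predictable", predictable), ("unpredictable", unpredictable)]:
    raw = text.encode()
    packed = zlib.compress(raw, 9)
    print(f"{name}: {len(raw)} -> {len(packed)} bytes, "
          f"{entropy_bits_per_char(text):.2f} bits/char")
```

The repeated code collapses to a tiny fraction of its size; the random string barely shrinks. A model that predicts text well is doing the same thing with its weight budget.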
And if you've ever tried to read a street sign in an aggressively compressed JPEG — you already know exactly how hallucinations work.
The JPEG Analogy Explains Everything
When you compress an image to JPEG at quality 20, a few things happen:
- Large, high-contrast objects survive well — the face is recognizable, the sky stays blue
- Fine details get destroyed first — the text on a sign, the license plate, individual eyelashes
- Artifacts appear at boundaries — blocky regions, phantom colors, edges that don't exist in the original
- The artifacts look plausible — a non-expert won't notice them
Now translate that to LLMs:
| JPEG | LLM |
|---|---|
| Large, high-contrast objects | Common knowledge, frequent patterns |
| Fine details | Rare facts, exact numbers, specific dates |
| Boundary artifacts | Hallucinations |
| Quality setting (1–100) | Model size (7B → 70B → 405B) |
A hallucination is a compression artifact. The model "remembers" that something of a certain type belongs in this position (a citation, a number, a name), but the exact bits were lost in compression. So it reconstructs something plausible. Exactly like JPEG reconstructs pixels that weren't in the original.
This is why LLMs confidently hallucinate. JPEG doesn't mark its artifacts as artifacts. The codec doesn't know where it lost information — that metadata was lost too.
Why This Pattern Shows Up in Our Work
We've shipped AI integrations for clients ranging from Telegram-based GameFi platforms to enterprise SaaS automation tools. Here's what we see consistently:
LLMs are great at code because code is highly compressible. Strict syntax, repeating patterns, limited vocabulary. `for i in range(n)` appears millions of times in training data. The codec captures that pattern nearly losslessly. Ask GPT to write a REST endpoint — it'll be fine. Ask it to recall the exact API signature of an obscure library from 2019 — you're asking for a blurry license plate.
LLMs struggle with math because exact numbers are fine details — the first things lossy compression throws away. 23 × 47 = 1081 is just three random numbers with no pattern to compress. The model can't "compress" a multiplication table; it can only reconstruct "something numerical that looks right."
Larger models hallucinate less because they have more bits. Going from 7B to 70B parameters is literally increasing the bitrate. More storage space → fewer forced lossy tradeoffs → finer details preserved. This is why the race for parameters is a race for bitrate.
Temperature is your quality slider — but backwards. Temperature 0 = argmax decoding, crisp output, but artifacts are sharper. Temperature 1.5+ = you're adding noise, blurring artifacts at the cost of coherence. "Creativity" in an LLM is interpolation between possible reconstructions in latent space — the same way an over-compressed JPEG invents colors between blocks.
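The temperature knob demystifies in a few lines. Below is a minimal softmax-with-temperature sketch over hypothetical next-token logits (the scores are invented for illustration, not taken from any real model):

```python
import math

def softmax_with_temperature(logits: list[float], temperature: float) -> list[float]:
    """Turn raw logits into sampling probabilities.
    Low temperature sharpens the distribution toward the argmax;
    high temperature flattens it toward uniform noise."""
    if temperature <= 0:
        # Greedy decoding: all probability mass on the top-scoring token.
        top = max(range(len(logits)), key=lambda i: logits[i])
        return [1.0 if i == top else 0.0 for i in range(len(logits))]
    scaled = [l / temperature for l in logits]
    m = max(scaled)                      # subtract max for numerical stability
    exps = [math.exp(s - m) for s in scaled]
    z = sum(exps)
    return [e / z for e in exps]

# Hypothetical scores for four candidate next tokens.
logits = [4.0, 3.2, 2.9, 1.0]

for t in (0.0, 0.7, 1.5):
    probs = softmax_with_temperature(logits, t)
    print(f"T={t}: " + ", ".join(f"{p:.2f}" for p in probs))
```

At T=0 the top token gets probability 1.0 (crisp, but any artifact is locked in); as T rises, probability leaks into the alternatives, which is exactly the blur-versus-coherence tradeoff described above.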
The Fix: Don't Ask JPEG for Lossless Data
Once you accept the compression frame, every modern AI technique clicks into place:
RAG Is Inserting a PNG Fragment Into Your JPEG
Retrieval-Augmented Generation works because you're injecting lossless data directly into a lossy system. Instead of trusting the model's compressed memory of a fact, you hand it the original. It's more expensive (context windows aren't infinite), but the artifact rate drops dramatically.
In our AI-powered SaaS projects at Gerus-lab, RAG is non-negotiable for anything that requires factual precision — pricing data, legal clauses, client-specific configurations. We treat the LLM as a reasoning engine, not a fact store.
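Here's a minimal sketch of the idea, with a toy word-overlap retriever standing in for a real embedding index. The documents, scoring function, and prompt template are all illustrative, not a production pipeline:

```python
def score(query: str, doc: str) -> int:
    """Toy relevance score: count shared words. Real systems use
    embedding similarity, but the principle is identical."""
    return len(set(query.lower().split()) & set(doc.lower().split()))

def build_rag_prompt(query: str, knowledge_base: list[str], k: int = 1) -> str:
    """Retrieve the k most relevant documents and inject them verbatim:
    lossless anchors the model can quote instead of reconstructing
    from compressed memory."""
    ranked = sorted(knowledge_base, key=lambda d: score(query, d), reverse=True)
    context = "\n".join(ranked[:k])
    return (f"Answer using ONLY the context below.\n"
            f"Context:\n{context}\n\n"
            f"Question: {query}")

# Hypothetical knowledge base entries.
kb = [
    "Enterprise plan pricing: $499 per month, billed annually.",
    "Kubernetes pods restart automatically on failure.",
    "The refund window for all plans is 30 days.",
]

prompt = build_rag_prompt("What is the enterprise plan pricing per month?", kb)
print(prompt)
```

The fact the model must get right ($499) now travels inside the prompt as original bits, so the answer no longer depends on whether that detail survived compression into the weights.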
Fine-Tuning Is Re-Encoding With New Priorities
Fine-tuning says: "I don't care if you compress Victorian poetry poorly. Spend those bits on financial documents instead." You're redistributing the bit budget.
We do this for domain-specific clients: a fine-tuned model that's seen thousands of examples from a particular industry will have compressed those patterns better than a general model. Less hallucination in domain, more in everything else — a conscious tradeoff.
Prompt Engineering Is a Seek Command
"You are an expert in Kubernetes" tells the decoder: search in the region of the compressed file where DevOps patterns are stored. You're not making the model smarter — you're navigating to a denser part of the archive.
"Think step by step" stretches the reconstruction across more tokens, so the decoder conditions on its own intermediate output instead of jumping straight to the answer. Chain-of-thought prompting works for the same reason progressive decoding sharpens an image: more passes, more context, fewer artifacts.
Can We Eliminate Hallucinations?
Mathematically: no. Not completely. Not with this architecture.
As long as you're compressing 10 TB into 70 GB, you will have lossy reconstruction. You can:
- Increase bitrate (larger model)
- Insert lossless anchors (RAG)
- Re-encode with domain priorities (fine-tuning)
- Guide the decoder (prompting)
But you cannot compress data below its entropy without loss. Anyone claiming to "solve hallucinations" without specifying "by massively scaling the model" or "by adding external memory" is either selling you something or doesn't understand information theory.
At Gerus-lab, our answer to hallucination risk is system design, not model trust. We build AI systems where:
- High-stakes facts come from verified sources (RAG), not model memory
- Model outputs for critical paths are validated against external data
- Temperature and sampling parameters are tuned per use-case, not set to defaults
- We distinguish between tasks where lossy reconstruction is fine (drafting, brainstorming) and tasks where it isn't (data extraction, compliance)
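As a sketch of what "validated against external data" can mean in practice, a critical-path guard can be as blunt as refusing any response whose numbers don't match the verified source. The function name, regex, and example values below are illustrative:

```python
import re

def validate_quoted_price(model_output: str, source_price: float) -> bool:
    """Critical-path guard: extract every dollar amount the model
    produced and reject the output if any differs from the verified
    source value. A hallucinated number is a compression artifact
    we refuse to ship."""
    quoted = [float(m) for m in re.findall(r"\$([0-9]+(?:\.[0-9]+)?)", model_output)]
    return bool(quoted) and all(q == source_price for q in quoted)

source = 499.0  # verified pricing record, e.g. from the billing database
good = "The enterprise plan costs $499 per month."
bad = "The enterprise plan costs $479 per month."  # plausible artifact

print(validate_quoted_price(good, source))  # True
print(validate_quoted_price(bad, source))   # False
```

Note that the guard also rejects outputs that quote no price at all: on a critical path, "no verifiable answer" should fail closed, not pass silently.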
The Uncomfortable Mirror
Here's the twist that genuinely changed how I think about this: we're lossy codecs too.
Do you remember what you had for lunch last Thursday? Can you recall slide 14 from yesterday's presentation?
Human memory is lossy compression. We compress streams of experience into neural patterns, lose details in the process, and reconstruct plausible narratives when queried. Psychologists call this confabulation — the brain filling gaps with invented-but-plausible details.
Literally hallucinations.
We do it for the same reason LLMs do: the volume of input data is incompatible with the storage capacity. Your retina pushes ~10 Mbit/s, and your brain decided the face of the barista you saw this morning wasn't worth storing.
The difference: we've had a million years of codec tuning. LLMs have had four years.
What This Means for Your Next AI Project
The compression frame kills two bad takes at once:
- The hype take: "AI will replace all knowledge workers because it knows everything" — no, it knows a lossy approximation of what was in the training data, and it's wrong at predictable places.
- The fear take: "AI is useless because it makes things up" — so does your own memory. You still use it. The question is whether you've designed your workflow to compensate for its failure modes.
A JPEG is still an incredibly useful format. You just don't use it to store medical imaging data.
LLMs are the same. Phenomenally useful. With specific, predictable failure modes that engineering can compensate for — if you understand what you're actually working with.
At Gerus-lab, we build AI systems that treat hallucination as an engineering problem, not a reason to avoid AI. If you're integrating LLMs into your product and want to do it in a way that's actually production-safe — let's talk.
We've shipped AI features across Web3, SaaS, and GameFi. We know where the artifacts are.