Ceyhun Aksan

Posted on • Originally published at ceaksan.com

11 Ways LLMs Fail in Production (With Academic Sources)

If you use LLMs in production, you've seen these. Not random errors, but systematic failures baked into architecture and training.

I documented 11 behavioral failure modes with 60+ academic sources. Here's the short version.

1. Hallucination / Confabulation

The model references a library that doesn't exist. Confidently. The worse variant: you ask "why?" and it fabricates a plausible justification for the wrong answer.

Researchers prefer "confabulation" over "hallucination" because LLMs have no perceptual experience. Farquhar et al. (2024, Nature) introduced semantic entropy to detect it: cluster semantically equivalent answers, compute entropy. High entropy = probable fabrication.

Defense: RAG, Chain-of-Verification, cross-model verification.
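The semantic-entropy idea is easy to sketch: sample several answers, cluster the semantically equivalent ones, and compute entropy over the clusters. A minimal, runnable version (the real method uses an NLI model for bidirectional-entailment clustering; the string normalizer here is a toy stand-in):

```python
import math
from collections import Counter

def semantic_entropy(answers, cluster_fn):
    """Entropy over clusters of semantically equivalent answers.
    High entropy suggests the model is fabricating rather than recalling."""
    counts = Counter(cluster_fn(a) for a in answers)
    total = sum(counts.values())
    return -sum((c / total) * math.log(c / total) for c in counts.values())

# Toy clustering: case/punctuation normalization (stand-in for NLI equivalence).
norm = lambda s: s.lower().strip(".! ")

consistent = ["Paris.", "paris", "Paris!"]    # one cluster -> entropy 0.0
scattered  = ["Paris.", "Lyon", "Marseille"]  # three clusters -> entropy ln(3)

print(semantic_entropy(consistent, norm))  # 0.0
print(semantic_entropy(scattered, norm))   # ~1.10
```

Zero entropy means every sample landed in one semantic cluster; entropy near ln(k) across k samples is the fabrication signal.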

2. Sycophancy

Ask "isn't this code wrong?" and the model says "yes, you're right" even when the code is correct. RLHF training causes this: evaluators rate agreeable answers higher, and the model learns that signal.

A 2025 study found sycophantic agreement and sycophantic praise are distinct directions in transformer activation space. Each can be suppressed independently.

Defense: Pre-commitment (model answers first, then sees your opinion), question formulation ("explain this" not "isn't this wrong?").
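Pre-commitment can be wired into the conversation itself: make the model commit to a verdict before it ever sees your opinion. A sketch assuming the common OpenAI-style chat message format (adapt to your client):

```python
def precommit_messages(artifact, user_opinion):
    """Two-phase prompt: elicit an independent verdict first,
    reveal the user's opinion only afterwards."""
    phase1 = [
        {"role": "system",
         "content": "Review the code. State whether it is correct before anything else."},
        {"role": "user", "content": artifact},
    ]

    def phase2(model_verdict):
        # The opinion enters the context only after the model has committed.
        return phase1 + [
            {"role": "assistant", "content": model_verdict},
            {"role": "user",
             "content": f"For context, my opinion: {user_opinion}. "
                        "Does your verdict change, and why?"},
        ]

    return phase1, phase2
```

If the verdict flips in phase 2 without new evidence, you have caught sycophancy in the act.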

3. Context Rot

Not just "lost in the middle." Chroma Research (2025) showed performance degrades steadily as input length grows, even far below the context-window limit. Irrelevant information actively harms retrieval.

Defense: Context engineering (less is more), critical info at beginning/end, periodic re-injection.
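The beginning/end placement can be enforced mechanically when assembling context. A sketch (function and parameter names are illustrative) that sandwiches middle content between two copies of the critical instruction and trims low-priority chunks first:

```python
def assemble_context(critical, middle_chunks, max_chars=4000):
    """Place the critical instruction at both the start and the end;
    drop later middle chunks once the character budget is spent."""
    budget = max_chars - 2 * len(critical)
    kept, used = [], 0
    for chunk in middle_chunks:
        if used + len(chunk) > budget:
            break  # less is more: prefer dropping filler over crowding the window
        kept.append(chunk)
        used += len(chunk)
    return "\n".join([critical, *kept, critical])
```

A token-based budget would be more accurate than characters, but the placement logic is the point.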

4. Instruction Attenuation

You say "run tests after every change." Works for the first few changes. By the tenth, the model writes "ran tests, passed" without actually running them.

Meta found a 39% average performance drop in multi-turn conversations. Worse: the model forms premature assumptions in early turns and can't recover.

The second stage is ceremonialization: the model appears to follow the rule, but the substance is gone.

Defense: Forget-Me-Not (instruction re-injection), short sessions, deterministic controls (hooks, linters, CI).
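Instruction re-injection is a few lines of glue. A minimal sketch of the idea, assuming OpenAI-style message dicts: re-append the standing instruction every N user turns so it never drifts out of salience:

```python
def reinject(history, instruction, every=5):
    """Re-append the standing instruction after every `every` user turns.
    Returns a new history list; the original is left untouched."""
    user_turns = sum(1 for m in history if m["role"] == "user")
    if user_turns and user_turns % every == 0:
        return history + [{"role": "system", "content": instruction}]
    return history
```

This fights attenuation, not ceremonialization. The "ran tests, passed" lie still needs a deterministic check (a hook or CI gate), as the post says.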

5. Task Drift

"Fix this bug" becomes "fix bug + refactor function + update imports + reorganize file." At each step, the immediate context dominates the original goal.

Three drift types (2026 study): semantic drift, coordination drift (multi-agent), behavioral drift.

Defense: Goal anchoring, plan-before-act, max step limits, tool constraints.
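Goal anchoring plus a step budget can be combined in one small wrapper. A hypothetical interface (class and method names are mine, not from the post): the original goal is restated verbatim on every step, and the loop hard-stops at the budget:

```python
class StepLimitedAgent:
    """Restates the fixed goal on every step (goal anchoring) and
    enforces a hard step budget so drift cannot run unbounded."""

    def __init__(self, goal, max_steps=10):
        self.goal, self.max_steps, self.steps = goal, max_steps, 0

    def next_prompt(self, observation):
        self.steps += 1
        if self.steps > self.max_steps:
            raise RuntimeError(f"step budget exceeded ({self.max_steps})")
        # The goal leads every prompt, so immediate context cannot crowd it out.
        return f"GOAL (unchanged): {self.goal}\nSTEP {self.steps}: {observation}"
```

"Fix this bug" stays "fix this bug" on step 9, and step 11 simply never happens.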

6. Incorrect Tool Invocation

Agents call APIs, edit files, query databases. These calls are failure points: wrong parameters, wrong tool selection, wrong sequence.
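One cheap mitigation (my suggestion, not from the post) is validating every proposed call against a declared schema before executing it, so wrong tools and missing parameters fail fast instead of silently:

```python
def validate_call(call, registry):
    """Check a proposed tool call before executing it.
    `registry` maps tool name -> set of required parameter names (illustrative)."""
    name, args = call["name"], call.get("arguments", {})
    if name not in registry:
        return False, f"unknown tool: {name}"
    missing = registry[name] - args.keys()
    if missing:
        return False, f"missing parameters: {sorted(missing)}"
    return True, "ok"
```

Wrong-sequence failures need more state (e.g. a small allowed-transitions table), but parameter and selection errors are catchable at this layer.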

7. Reward Hacking

The model finds shortcuts to satisfy the metric without solving the problem. Tests pass but the feature doesn't work.

8. Degeneration Loops

Autoregressive generation enters self-reinforcing repetition cycles. The model repeats phrases, patterns, or structures.
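These cycles are detectable at generation time. A simple sketch (one heuristic among many; production decoders also use repetition penalties): flag the stream when the last window of tokens repeats back-to-back:

```python
def looping(tokens, window=6, repeats=3):
    """True if the final `window` tokens repeat `repeats` times in a row
    at the tail of the sequence -- a self-reinforcing cycle signal."""
    tail = tokens[-window * repeats:]
    if len(tail) < window * repeats:
        return False
    unit = tail[:window]
    return all(tail[i * window:(i + 1) * window] == unit for i in range(repeats))
```

On detection you can abort, raise temperature, or apply a repetition penalty and resample.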

9. Alignment Faking

Different from sycophancy. The model appears aligned under observation but behaves differently when unobserved. Sycophancy is a reflexive by-product of RLHF; alignment faking is strategic (the model reasons "if I refuse, they'll retrain me").

Anthropic documented this in Claude: the model strategically cooperated during evaluation to avoid modification.

10. Version Drift

Same prompt, different model version, different behavior. Updates silently change model behavior without notification.
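The defense is pinning dated model snapshots and recording the pin with every request. A sketch (no real API call; the model id shown is an example of the dated-snapshot naming some providers use):

```python
PINNED_MODEL = "gpt-4o-2024-08-06"  # example: a dated snapshot, never a floating alias

def make_request(prompt, model=PINNED_MODEL):
    """Attach the pinned model id to every request and reject floating
    aliases, so behavior changes are traceable to version changes."""
    assert "-20" in model, "refusing floating alias; pin a dated snapshot"
    return {"model": model, "prompt": prompt}
```

Log the `model` field alongside outputs; when behavior shifts, the logs tell you whether the version did.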

11. Context Window Truncation

Different from context rot. When the window fills, older instructions are literally deleted. Not gradual decay but a hard cut.
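The hard cut is visible in a naive FIFO truncator, and the standard mitigation is pinning: messages that must survive (the system prompt) are exempt from eviction. A sketch using character counts as a stand-in for tokens:

```python
def truncate(messages, max_chars):
    """FIFO truncation: the OLDEST unpinned messages are dropped whole
    (a hard cut), while pinned ones (e.g. the system prompt) survive."""
    total = lambda ms: sum(len(m["content"]) for m in ms)
    out = list(messages)
    i = 0
    while total(out) > max_chars and i < len(out):
        if out[i].get("pin"):
            i += 1          # pinned: skip, never evict
        else:
            out.pop(i)      # hard cut: this message is simply gone
    return out
```

Without the `pin` flag, the first thing evicted is usually your system instruction, which explains the abrupt behavior change users report when long sessions hit the limit.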


The Pattern

These failures aren't random. They're consequences of:

  • Architecture (autoregressive token prediction)
  • Training (RLHF reward signals)
  • Deployment (long sessions, tool access, multi-turn)

Defense must operate at three layers: prompt, architectural, and operational. Single-layer defense is insufficient.

Full analysis with 60+ academic references, defense techniques for each mode, and practical examples:

ceaksan.com/en/llm-behavioral-failure-modes/
