I once spent half a day debugging code that looked completely correct.
The problem wasn't the logic. The problem was that the functions the LLM had written didn't exist.
Not deprecated. Not renamed. Never existed.
Here's what had happened: I caught the model using an outdated API parameter and corrected it. Instead of fixing the issue, it started compensating: hallucinating function names, inventing method signatures, generating plausible-looking code that had no basis in reality. The more I pushed back, the deeper into fiction it went.
That afternoon is why I started doing RAG before I knew it had a name.
At the time, I was building a Kubernetes Operator using free-tier LLMs (ChatGPT and DeepSeek). No agentic tooling. No memory. No orchestration frameworks. Just a chat window and whatever I could fit into the context.
I had two problems:
Problem 1:
The model didn't know current APIs. Kubernetes controller-runtime, Operator SDK, and Delphix APIs move fast. The model's training data was already stale. Left to its own devices, it would confidently generate code against API versions that no longer existed. When corrected, it would sometimes make things worse.
Problem 2:
The context window ran out. Long sessions degraded. The model would start contradicting earlier decisions, losing track of architecture choices, rehashing solved problems. On a free tier, hitting the limit meant starting over and losing everything.
Here's what I built to solve both:
For the API problem, manual retrieval and injection. Before writing any implementation code for a new component, I would research the relevant documentation myself. Then I'd summarize it (sometimes by hand, sometimes by feeding the raw docs into a separate chat session just for summarization) and inject only the relevant fragments into the working session. Confirmed, current, scoped to exactly what the model needed.
The model wasn't searching. I was the retrieval layer.
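The injection step can be sketched in a few lines of Python. This is a minimal illustration, not the code I used at the time (there was none, only copy and paste). The names `DocFragment` and `build_grounded_prompt`, the instruction wording, and the character budget are all assumptions made for the example.

```python
from dataclasses import dataclass


@dataclass
class DocFragment:
    """One hand-checked excerpt. All fields are filled in by the human, not the model."""
    source: str    # where the excerpt came from, e.g. a docs page title
    verified: str  # when the excerpt was last checked by hand
    text: str      # the summarized or quoted excerpt


def build_grounded_prompt(task: str, fragments: list[DocFragment],
                          max_chars: int = 6000) -> str:
    """Prepend hand-picked documentation fragments to a task description.

    max_chars is an arbitrary budget (an assumption) standing in for
    "whatever still fits in the context window".
    """
    header = (
        "Use ONLY the API details in the reference notes below. "
        "If something is not covered, say so instead of guessing.\n"
    )
    parts = [header]
    used = len(header)
    for frag in fragments:
        block = (
            f"\n[Source: {frag.source} | verified {frag.verified}]\n"
            f"{frag.text.strip()}\n"
        )
        if used + len(block) > max_chars:
            break  # stop adding fragments once the budget is exhausted
        parts.append(block)
        used += len(block)
    parts.append(f"\nTask:\n{task.strip()}\n")
    return "".join(parts)
```

Ordering matters here: fragments are added in the order given, so the most important excerpt should come first in case the budget cuts the list short.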
For the context problem, session state documents. When a session was getting too long, I'd ask the model to generate a structured Markdown file: current architecture decisions, what had been built, what was left, key constraints and open questions. Then I'd start a fresh session, paste the MD file as context, and continue exactly where I'd left off.
The model wasn't remembering. I was the memory layer.
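A rough Python sketch of that handoff, assuming the five sections listed above. The prompt wording and the helper names (`handoff_request`, `missing_sections`, `resume_prompt`) are illustrative assumptions, not a record of the exact prompts I typed.

```python
# Section headings taken from the state document described above.
HANDOFF_SECTIONS = [
    "Architecture decisions",
    "What has been built",
    "What is left",
    "Key constraints",
    "Open questions",
]


def handoff_request() -> str:
    """Prompt to send near the end of a long session."""
    headings = "\n".join(f"## {name}" for name in HANDOFF_SECTIONS)
    return (
        "Summarize this session as a Markdown file with exactly these "
        "headings, using short bullet points under each:\n\n" + headings
    )


def missing_sections(state_md: str) -> list[str]:
    """Return expected headings the model left out of the state document."""
    present = {
        line[3:].strip()
        for line in state_md.splitlines()
        if line.startswith("## ")
    }
    return [name for name in HANDOFF_SECTIONS if name not in present]


def resume_prompt(state_md: str, next_task: str) -> str:
    """First message of the fresh session: state document, then the next task."""
    return (
        "Below is the project state carried over from an earlier session. "
        "Treat it as settled unless I say otherwise.\n\n"
        f"{state_md.strip()}\n\nNext task: {next_task.strip()}"
    )
```

Checking `missing_sections` before closing the old session is the useful part: once the old chat is gone, a section the model skipped cannot be recovered.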
What I was doing, without knowing it:
Retrieval-Augmented Generation: surfacing accurate, current information and injecting it as context to ground model outputs.
Session state management: the manual precursor to what agent memory systems now handle automatically.
Multi-session LLM chaining: using one model to process and compress information for another, before orchestration frameworks made this trivial.
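The chaining pattern, reduced to its shape, looks something like the Python sketch below. `Chat` is a hypothetical stand-in for "one chat session": any function that takes a prompt and returns the reply. In my case both sessions were browser tabs and the function call was me pasting text, so treat this as an illustration of the data flow rather than a real client for any provider.

```python
from typing import Callable

# Hypothetical type: sends one prompt to a chat session, returns the reply text.
Chat = Callable[[str], str]


def summarize_then_inject(raw_docs: str, task: str,
                          summarizer: Chat, worker: Chat) -> str:
    """Compress raw docs in one session, then ground a second session with the result."""
    summary = summarizer(
        "Summarize the documentation below for a developer. Keep exact names "
        "and signatures verbatim, and do not add anything that is not in the "
        "text.\n\n" + raw_docs
    )
    # Only the compressed notes reach the working session, never the raw docs.
    return worker(f"Reference notes:\n{summary}\n\nTask:\n{task}")
```

Keeping the summarizer in a separate session means the raw documentation never consumes the working session's context, which is the whole point on a small free-tier window.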
I didn't invent these patterns. I arrived at them by necessity, the hard way, after a hallucination loop cost me half a day.
That's usually how best practices emerge.
The tools have improved dramatically since then. But the underlying problem (models that hallucinate on fast-moving APIs, context that degrades over long sessions, outputs that need grounding in verified information) hasn't gone away. It's just more visible now, at scale, in enterprise deployments.
The engineers and companies figuring this out today are rediscovering the same lessons. Usually also the hard way.
Have you hit a hallucination loop that cost you real time? What was your fix?