How I Built an AI Agent That Fixes Production Errors Using Memory — And Why Memory Changes Everything

#agents #ai #rag #sre

Production is down. Slack is on fire. Your phone is ringing. You've seen this exact error before — ConnectionResetError: [Errno 104] cascading through your FastAPI worker pool — but you can't remember exactly which Redis configuration tweak fixed it last time, who applied it, or how long the incident lasted. You're starting from zero again. Twenty minutes of context-building before you even touch a fix.
I got tired of that feeling. So I built an AI agent that never forgets.

The Problem With Generic AI in Production
When production breaks, most engineers reach for their LLM of choice and paste in the stack trace. And the response is almost always the same: a competent, thoughtful, completely useless answer. The model has no idea that your team already tried increasing max_connections six weeks ago and it made things worse. It doesn't know that your infrastructure runs on a specific internal Kubernetes setup that changes how standard fixes apply. It gives you textbook advice for textbook problems, and your problems are never textbook.
This is what I started calling the Round 1 problem.
Round 1 — generic response:
Error: ConnectionResetError: [Errno 104] Connection reset by peer
Stack: redis.exceptions.ConnectionError in worker pool
The agent responds with something like: "This typically indicates your Redis connection pool is exhausted. Try increasing max_connections in your Redis client config, add retry logic with exponential backoff, and check network stability between your app and Redis instance."
Technically correct. Practically useless if you've already tried all three. The agent is reasoning from general knowledge, not from your specific production history. It has no memory of your past incidents. Every error feels like the first error.

What I Built: Code Memory's Incident Agent
Code Memory is a developer workspace I built in Next.js with a three-pane interface — a file explorer, a code viewer with syntax highlighting, and a real-time AI fix panel. But the core innovation isn't the UI. It's what happens when the AI agent gets access to Hindsight memory.
The agent stores every incident that passes through it:

Error type and stack trace — the exact fingerprint of the failure
Root cause — what actually caused it, determined after investigation
Fix applied — the exact code change, config update, or command that resolved it
Time to resolve — how long the incident lasted from first alert to fix
Who fixed it — which engineer closed the incident and applied the patch
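As a rough illustration, the five fields above could be modelled as a small Python record. This is a minimal sketch: the class name, field names, and the JSON shape are my own assumptions for illustration, not Hindsight's actual schema.

```python
from dataclasses import dataclass, asdict
from datetime import timedelta
import json


@dataclass(frozen=True)
class IncidentRecord:
    """One stored incident. Field names are illustrative, not Hindsight's schema."""
    error_type: str             # the error class and message
    stack_trace: str            # the fingerprint of the failure
    root_cause: str             # determined after investigation
    fix_applied: str            # code change, config update, or command
    time_to_resolve: timedelta  # first alert to fix
    resolved_by: str            # engineer who closed the incident

    def to_json(self) -> str:
        """Serialize to JSON, storing the duration as whole minutes."""
        data = asdict(self)
        data["time_to_resolve"] = int(self.time_to_resolve.total_seconds() // 60)
        return json.dumps(data)


record = IncidentRecord(
    error_type="ConnectionResetError: [Errno 104] Connection reset by peer",
    stack_trace="redis.exceptions.ConnectionError in worker pool",
    root_cause="Celery workers not releasing connections on task completion",
    fix_applied="Explicit connection release in the task_postrun signal handler",
    time_to_resolve=timedelta(minutes=34),
    resolved_by="Priya S.",
)
print(record.to_json())
```

Keeping the record immutable is a deliberate choice in this sketch: an incident is a historical fact, so later corrections would be stored as new records rather than edits.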

Over time, this builds up a searchable, structured memory of your team's entire production history. Not documentation that someone had to write. Not a runbook that gets outdated. Live memory, automatically recorded as incidents happen.
The memory layer is powered by Hindsight, an open-source agent memory framework built by Vectorize. You can try it at hindsight.vectorize.io. Hindsight handles the hard parts: semantic search over past incidents, relevance ranking, and structured retrieval that fits inside an LLM's context window cleanly.
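To make the retrieval idea concrete, here is a toy sketch of ranking past incidents against a new error. It deliberately does not use Hindsight: plain token overlap (Jaccard similarity) stands in for real semantic search, and the function names, the `min_score` threshold, and the second sample incident are all hypothetical.

```python
import re


def tokens(text: str) -> set[str]:
    """Lowercase word tokens; a crude stand-in for an embedding."""
    return set(re.findall(r"[a-z0-9_]+", text.lower()))


def jaccard(a: set[str], b: set[str]) -> float:
    """Overlap between two token sets, from 0.0 to 1.0."""
    if not a or not b:
        return 0.0
    return len(a & b) / len(a | b)


def search_incidents(query: str, incidents: list[dict],
                     top_k: int = 3, min_score: float = 0.2) -> list[dict]:
    """Return up to top_k past incidents, most similar to the query first."""
    q = tokens(query)
    scored = [(jaccard(q, tokens(inc["error"])), inc) for inc in incidents]
    ranked = sorted((pair for pair in scored if pair[0] >= min_score),
                    key=lambda pair: pair[0], reverse=True)
    return [inc for _, inc in ranked[:top_k]]


past_incidents = [
    {"error": "ConnectionResetError [Errno 104] redis worker pool",
     "fix": "Release connections in the task_postrun signal handler"},
    # Hypothetical unrelated incident, included only to show filtering.
    {"error": "TimeoutError postgres statement_timeout exceeded",
     "fix": "Add an index on a hot column"},
]

query = "ConnectionResetError: [Errno 104] Connection reset by peer redis worker pool"
hits = search_incidents(query, past_incidents)
print(hits)
```

A real memory layer would match on meaning rather than shared words, but the shape is the same: score every stored incident against the incoming error, drop weak matches, and hand only the top few to the model so the prompt stays small.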

Round 2 — What Memory-Powered Response Actually Looks Like
Same error. Same stack trace. But now the agent queries Hindsight before it responds.
Error: ConnectionResetError: [Errno 104] Connection reset by peer
Stack: redis.exceptions.ConnectionError in worker pool
The agent searches past memory and surfaces this:
[INCIDENT — 6 weeks ago]
Error: ConnectionResetError [Errno 104] — redis worker pool
Root cause: celery workers were not releasing connections on task completion
due to a missing connection.close() in the task teardown hook
Fix applied: Added explicit connection release in @task_postrun signal handler;
set CELERY_REDIS_MAX_CONNECTIONS=20 (NOT increasing pool size; that made it worse)
Resolved by: Priya S.
Time to resolve: 34 minutes
Now the agent's response is completely different:
"Your team hit this exact error 6 weeks ago. The root cause was Celery workers not releasing Redis connections on task completion, not pool exhaustion. Increasing max_connections was tried and made the situation worse. The fix that worked: add an explicit connection release in your @task_postrun signal handler and cap CELERY_REDIS_MAX_CONNECTIONS at 20. Priya fixed it last time; she may have additional context."
That's the difference between a Round 1 and a Round 2 agent. Round 1 costs you 45 minutes. Round 2 costs you 3.
The memory doesn't just return the fix. It returns the negative space, what not to try, which is often more valuable than the fix itself.
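One way to carry that negative space into the model is to render failed attempts as explicit lines in the context that gets prepended to the prompt. The sketch below is only an assumption about how this could be wired: the dictionary keys, the `failed_attempts` field, and the "Do NOT suggest" wording are hypothetical, not what Hindsight emits.

```python
def build_memory_context(incident: dict) -> str:
    """Render one past incident as a text block to prepend to the LLM prompt."""
    lines = [
        f"[INCIDENT: {incident['when']}]",
        f"Error: {incident['error']}",
        f"Root cause: {incident['root_cause']}",
        f"Fix applied: {incident['fix']}",
    ]
    # The negative space: approaches that were tried and did not work.
    for attempt in incident.get("failed_attempts", []):
        lines.append(f"Do NOT suggest: {attempt}")
    lines.append(f"Resolved by: {incident['resolved_by']}")
    lines.append(f"Time to resolve: {incident['minutes_to_resolve']} minutes")
    return "\n".join(lines)


incident = {
    "when": "6 weeks ago",
    "error": "ConnectionResetError [Errno 104], redis worker pool",
    "root_cause": "Celery workers not releasing connections on task completion",
    "fix": "Explicit connection release in the task_postrun signal handler",
    "failed_attempts": ["Increasing max_connections (made it worse)"],
    "resolved_by": "Priya S.",
    "minutes_to_resolve": 34,
}

context = build_memory_context(incident)
print(context)
```

Listing failed attempts as their own lines, rather than burying them in the fix description, makes it harder for the model to skim past them and recommend the textbook answer anyway.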

The Frontend: What Engineers Actually See
The workspace I built reflects how engineers actually think during incidents, not how product managers imagine they do.
The left panel is a file explorer with a full project tree — expandable folders, language-coloured file icons for Python, JavaScript, JSX, and JSON files, and a drag-and-drop upload zone at the bottom. You can navigate your entire codebase without leaving the incident view.
The main panel renders your code with a minimal but precise syntax highlighting layer — keywords, string literals, JSX tags, and hook names each get distinct colours, but nothing garish. Line numbers sit in a fixed column to the left. A status bar at the bottom shows the current branch, save state, and language mode. It feels like an editor, not a chatbot wrapper.
The right panel is what I call the Hindsight Memory Log — a vertical timeline of every past AI interaction with the codebase. Each entry shows whether the suggested fix was accepted or rejected, which file it touched, the diff summary with + and − line counts, and how long ago it happened. Engineers can filter by accepted or rejected fixes. This alone changes how teams review AI suggestions — instead of treating each one in isolation, you see the full arc of what the agent has suggested and what your team actually shipped.
The AI Fix Report panel is where the Hindsight retrieval surfaces. Each identified bug renders as a card with the file name, line number, severity badge (high-severity bugs get a subtle red border, visible without being alarming), a natural language description, and a two-panel diff showing the before and after. Three action buttons sit at the bottom of every card: Accept, Reject, and Modify. Accept applies the fix directly. Reject logs the fix as rejected in memory so the agent learns not to suggest the same approach again. Modify opens an inline editor pre-filled with the suggested fix so engineers can adapt it before accepting.
Every action — accept, reject, modify — feeds back into Hindsight memory. The agent gets smarter with every incident, not just by accumulating more data but by learning what your specific team accepts and rejects.
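The feedback loop can be sketched in a few lines of Python. This is an in-memory stand-in rather than the real memory store, and every name here (`Verdict`, `FeedbackLog`, the `error_key` fingerprint) is an assumption made for illustration.

```python
from dataclasses import dataclass
from enum import Enum


class Verdict(Enum):
    ACCEPTED = "accepted"
    REJECTED = "rejected"
    MODIFIED = "modified"


@dataclass(frozen=True)
class Feedback:
    error_key: str   # hypothetical fingerprint of the error
    suggestion: str  # the fix the agent proposed
    verdict: Verdict
    final_fix: str   # what was actually applied; empty if rejected


class FeedbackLog:
    """In-memory stand-in for the memory store (an assumption for this sketch)."""

    def __init__(self) -> None:
        self._entries: list[Feedback] = []

    def record(self, error_key: str, suggestion: str,
               verdict: Verdict, final_fix: str = "") -> None:
        self._entries.append(Feedback(error_key, suggestion, verdict, final_fix))

    def by_verdict(self, verdict: Verdict) -> list[Feedback]:
        """Timeline filter: only entries with the given verdict."""
        return [e for e in self._entries if e.verdict is verdict]

    def prune(self, error_key: str, candidates: list[str]) -> list[str]:
        """Drop candidate fixes the team already rejected for this error."""
        rejected = {e.suggestion for e in self._entries
                    if e.error_key == error_key and e.verdict is Verdict.REJECTED}
        return [c for c in candidates if c not in rejected]


log = FeedbackLog()
log.record("redis-errno-104", "Increase max_connections", Verdict.REJECTED)
log.record("redis-errno-104", "Release connections in task_postrun",
           Verdict.ACCEPTED, final_fix="Release connections in task_postrun")

candidates = ["Increase max_connections", "Release connections in task_postrun"]
remaining = log.prune("redis-errno-104", candidates)
print(remaining)  # ['Release connections in task_postrun']
```

Exact string matching on suggestions is the weakest part of this sketch; in practice a rejected approach would need to be matched by meaning, which is the kind of job the semantic search layer is there for.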

Why Agent Memory Is the Real Unlock
Most discussions about AI agents focus on tool use — can the agent call APIs, run code, search the web? Tool use matters, but it's table stakes. The real unlock for production-grade agents is memory.
I'd recommend reading Vectorize's breakdown of what agent memory actually means — it distinguishes between in-context memory (what's in the current prompt), external memory (a database the agent can query), and episodic memory (structured records of past interactions). Hindsight implements episodic memory specifically, which is the hardest to build but the most valuable in production settings.
Episodic memory is what makes the difference between an agent that gives good generic advice and an agent that gives your team's advice back to you — distilled from months of incidents, filtered by what actually worked.
The agent I built isn't smarter than a senior DevOps engineer. But with enough Hindsight memory loaded, it starts to approximate the institutional knowledge that senior engineer carries — the fixes that worked, the fixes that backfired, the edge cases specific to your stack.

What's Next
Right now the memory layer stores incidents locally, keyed per project. The next step is connecting it to a real-time alerting pipeline so incidents are captured automatically when they hit the monitoring layer, rather than requiring manual input after the fact. I'm also working on cross-project memory — when two projects share infrastructure components, incidents from one should surface as relevant context for the other.
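Cross-project memory is not built yet, so the following is only a sketch of one possible lookup rule: an incident is relevant to a project if it came from that project or if it involved an infrastructure component the project also uses. The project names, component names, and incident summaries are all hypothetical.

```python
def relevant_incidents(project: str,
                       components: dict[str, set[str]],
                       incidents: list[dict]) -> list[dict]:
    """Incidents from this project, plus any that touched a component it uses."""
    own_components = components.get(project, set())
    results = []
    for incident in incidents:
        same_project = incident["project"] == project
        shared_component = incident["component"] in own_components
        if same_project or shared_component:
            results.append(incident)
    return results


# Hypothetical projects and the infrastructure each one depends on.
components = {
    "checkout-api": {"redis", "postgres"},
    "email-worker": {"redis", "smtp"},
    "docs-site": {"cdn"},
}

# Hypothetical incident history across all projects.
incidents = [
    {"project": "email-worker", "component": "redis",
     "summary": "Connections not released after task completion"},
    {"project": "email-worker", "component": "smtp",
     "summary": "Relay throttled outbound mail"},
    {"project": "docs-site", "component": "cdn",
     "summary": "Stale cache after deploy"},
]

for incident in relevant_incidents("checkout-api", components, incidents):
    print(incident["project"], "/", incident["component"], ":", incident["summary"])
```

Here `checkout-api` has no incidents of its own, yet it still inherits the Redis incident from `email-worker` because both depend on Redis, while the SMTP and CDN incidents stay out of its context.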
The frontend is built in Next.js with Tailwind CSS and a FastAPI backend. The memory layer uses Hindsight. Everything else — the fix cards, the timeline, the diff viewer — is wiring those two things together into something engineers actually want to use at 2 AM when production is down.
The goal was never to replace the engineer. It was to make sure they never have to start from zero again.

Code Memory is actively in development. The Hindsight memory framework is open source at github.com/vectorize-io/hindsight.

DEV Community
