Atharv Joshi
How I Stopped Debugging the Same Production Errors Twice Using Hindsight Agent Memory

Every engineering team has this experience. A production error lands. It looks vaguely familiar. Someone says, "I think we saw this before." Nobody can find where.
You spend four hours debugging something your senior engineer fixed eight months ago, documented in a postmortem nobody re-read.
That loop is not a tooling failure. It's a memory failure. And I built Deja.dev to fix it.
The Problem Nobody Has Solved
Sentry captures errors. Datadog graphs them. PagerDuty routes alerts. But none of them remember. There's no cross-incident learning, no way to semantically match a new error to one you've already resolved, no system that surfaces "we've seen this before" in an actionable form.
According to Atlassian's State of Incidents report, roughly 35% of production incidents are repeat failures—the same root cause, rediscovered from scratch each time. The knowledge to fix them exists. It just doesn't persist anywhere useful.
Runbooks help, but they're static. They don't update from outcomes. They don't match new errors to relevant past cases. And they definitely don't recall that the last two payment-service timeouts were caused by a vendor API rate limit on the third Tuesday of the month.
That kind of pattern recognition requires memory. Specifically, agent memory.
What Deja.dev Does
Deja.dev is an AI agent that gives your codebase long-term memory for production errors. When a new error arrives, it doesn't just log it—it queries a persistent memory of every error your team has ever resolved.
Not by exact string match. Semantically. A NullPointerException in UserService today gets matched to a NullPointerException in ProfileService from six months ago if the underlying context is similar enough.
The output isn't a list of links to past tickets. It's a ranked action plan:

"Based on 3 similar past incidents, the most likely cause is connection pool exhaustion. Recommended fix: raise pool_size from 10 to 25. Average resolution time for this pattern: 22 minutes. Confidence: 91%."

That's institutional memory—the kind that normally walks out the door when a senior engineer leaves.
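The diagnosis text above is produced by the LLM, but the aggregation underneath is straightforward. Here's a minimal sketch of how retrieved memory matches could be rolled up into one ranked recommendation — the `ActionPlan` shape and field names are my own illustration, not Deja.dev's actual data model:

```python
from dataclasses import dataclass

@dataclass
class ActionPlan:
    likely_cause: str
    recommended_fix: str
    avg_resolution_minutes: float
    confidence: float          # best match score, as a percentage
    supporting_incidents: int

def build_action_plan(matches):
    """Aggregate retrieved matches (dicts with root_cause, fix,
    ttf_minutes, score) into a single ranked recommendation."""
    if not matches:
        return None
    best = max(matches, key=lambda m: m["score"])
    avg_ttf = sum(m["ttf_minutes"] for m in matches) / len(matches)
    return ActionPlan(
        likely_cause=best["root_cause"],
        recommended_fix=best["fix"],
        avg_resolution_minutes=round(avg_ttf, 1),
        confidence=round(best["score"] * 100, 1),
        supporting_incidents=len(matches),
    )
```

The confidence number shown to the user is just the top match's similarity score; the average resolution time comes from the matched incidents' metadata.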
The Architecture
The stack is intentionally lean:
| Layer | Technology |
| --- | --- |
| Memory | Hindsight by Vectorize |
| LLM | Groq (qwen/qwen3-32b) |
| Backend | Python + FastAPI |
| Frontend | React + Tailwind |
| Deployment | Render.com + Vercel |
The memory layer is the entire product. Hindsight handles semantic search over past errors and persists context across sessions. Without it, you'd need to build a vector store, manage embeddings, and handle session persistence yourself. With it, the core pipeline has three functions.
Storing a resolved error:
```python
from hindsight import HindsightClient

client = HindsightClient()  # reads HINDSIGHT_API_KEY from env

def store_error_resolution(error_id, error_msg, stack_trace,
                           root_cause, fix, service, ttf_minutes):
    client.memory.create(
        content=f"{error_msg}\n\n{stack_trace}\n\nFix: {fix}",
        metadata={
            "error_id": error_id,
            "service": service,
            "root_cause": root_cause,
            "fix": fix,
            "ttf_minutes": ttf_minutes,
        },
    )
```
Finding semantically similar past errors:
```python
def find_similar_errors(error_msg, stack_trace, service, top_k=5):
    query = f"service:{service}\n{error_msg}\n{stack_trace[:500]}"
    results = client.memory.search(query=query, top_k=top_k)
    return [
        {
            "root_cause": r.metadata.get("root_cause"),
            "fix": r.metadata.get("fix"),
            "ttf": r.metadata.get("ttf_minutes"),
            "score": r.score,
        }
        for r in results.results
    ]
```
Generating a diagnosis from memory context:
```python
def diagnose(error_msg, stack_trace, service):
    past = find_similar_errors(error_msg, stack_trace, service)
    context = "\n---\n".join(
        f"Root cause: {p['root_cause']}\nFix: {p['fix']}\nScore: {p['score']:.2f}"
        for p in past
    )
    resp = groq.chat.completions.create(
        model="qwen/qwen3-32b",
        messages=[
            {"role": "system", "content": (
                "You are Deja.dev, an error resolution agent "
                "with memory of past incidents.\n\n"
                f"Past similar incidents:\n{context}"
            )},
            {"role": "user", "content": f"New error in {service}:\n{error_msg}\n\n{stack_trace}"},
        ],
    )
    return resp.choices[0].message.content
```
The full pipeline runs in under 3 seconds.
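Hindsight does the heavy lifting in that pipeline. For intuition about what it replaces — a vector store, embeddings, and similarity ranking — here is a deliberately toy in-memory version. It uses bag-of-words cosine similarity as a stand-in; real semantic search uses learned embeddings, which is exactly why "NullPointerException in ProfileService" can match "NullPointerException in UserService" on context rather than exact strings:

```python
import math
from collections import Counter

def embed(text):
    """Toy bag-of-words 'embedding'. Real systems use learned dense vectors."""
    return Counter(text.lower().split())

def cosine(a, b):
    dot = sum(a[t] * b[t] for t in a)
    na = math.sqrt(sum(v * v for v in a.values()))
    nb = math.sqrt(sum(v * v for v in b.values()))
    return dot / (na * nb) if na and nb else 0.0

class TinyMemory:
    """A sketch of what a memory layer provides: store resolved
    errors with metadata, then search them by similarity."""
    def __init__(self):
        self.items = []

    def create(self, content, metadata):
        self.items.append((embed(content), content, metadata))

    def search(self, query, top_k=5):
        q = embed(query)
        scored = [(cosine(q, vec), content, meta)
                  for vec, content, meta in self.items]
        return sorted(scored, key=lambda s: -s[0])[:top_k]
```

Building, scaling, and persisting this properly is the work Hindsight takes off your plate.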
The UI That Makes Memory Visible
One of the most important design decisions: don't hide the memory layer.
The frontend is a split panel. Left side: error input and AI diagnosis. Right side: live memory matches as cards showing the original error, root cause, fix applied, confidence score, and date resolved.
Engineers — and hackathon judges — can watch exactly what the agent is retrieving in real time. Each match shows a confidence badge (89%, 74%, 61%). The AI's certainty is part of the interface, not a black box.
A hidden memory layer is unconvincing. If you can't show what the agent knows, it doesn't feel real.
The Inflection Point
The most interesting moment in the system isn't the first interaction. It's the fifth.
With zero memories, the agent says, "No similar errors found. This may be a new failure mode." Accurate, not useful.
After seeding realistic error histories for three services—api-gateway, payment-service, and auth-service—the advice becomes specific, confident, and backed by named precedents from your own system's history.
Watching the agent recall that the last two payment-service timeouts were both caused by a vendor API rate limit and that both were resolved within 18 minutes by adjusting retry configuration—that's what agent memory actually feels like in practice. It's not autocomplete. It's an agent that has been paying attention.
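Seeding that history was its own small task. A sketch of the kind of generator I mean — the service names, failure modes, and timings here are illustrative demo data, not real incidents:

```python
import random

SERVICES = ["api-gateway", "payment-service", "auth-service"]

# One plausible recurring failure mode per fictional service:
# (error message, root cause, fix applied)
FAILURE_MODES = {
    "payment-service": ("TimeoutError: vendor API did not respond",
                        "vendor API rate limit", "raise retry backoff to 2s"),
    "api-gateway": ("502 Bad Gateway from upstream",
                    "connection pool exhaustion", "raise pool_size from 10 to 25"),
    "auth-service": ("JWT signature verification failed",
                     "stale signing key after rotation", "reload JWKS cache"),
}

def make_seed_incidents(n_per_service=5, seed=42):
    """Generate resolved-incident records for demo seeding."""
    rng = random.Random(seed)
    incidents = []
    for service in SERVICES:
        error_msg, root_cause, fix = FAILURE_MODES[service]
        for i in range(n_per_service):
            incidents.append({
                "error_id": f"{service}-{i:03d}",
                "service": service,
                "error_msg": error_msg,
                "root_cause": root_cause,
                "fix": fix,
                "ttf_minutes": rng.randint(12, 35),  # plausible resolution times
            })
    return incidents
```

Each record then goes through the same `store_error_resolution` path as a real incident, so the demo memory is structurally identical to production memory.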
Three Things I'd Tell Anyone Building This
Make the memory visible. The split-panel design is not cosmetic. It's what makes the value tangible. If users can't see what the agent is retrieving, the intelligence feels arbitrary.
Synthetic data needs to be realistic. I generated errors for three fictional services with real-sounding stack traces, actual root causes, and plausible resolution timeframes. The moment your demo data looks fake, the demo feels fake. Spend time on this.
Deploy it before you demo it. A live URL is worth ten screenshots. Anyone who can actually interact with your agent—versus watch a recorded video—will remember it. Render.com + Vercel makes this free and takes under an hour.
What This Is Actually Solving
Institutional knowledge decay is one of the most expensive invisible costs in software engineering. McKinsey estimates engineers spend 40–60% of their time debugging rather than building. Google's SRE research suggests that access to institutional knowledge can reduce mean time to resolution by up to 70%.
Every team experiences this. Nobody has built a good solution for it because the right memory layer didn't exist at the right abstraction level. Hindsight changed that.
If you want to try it yourself, start at hindsight.vectorize.io — use promo code MEMHACK409 for $50 free credits. The GitHub repo for Deja.dev is linked below.
Your codebase has seen this before. Now it remembers.



GitHub: https://github.com/atharv348/DejaDev-MS-Office-Hackathon | Live demo: https://youtu.be/CYEAIYdpzPg?si=NsBqHlyfhRzo-JqT
