Pritiviraj Murthy Pechetty

Posted on Jun 27

I Built an AI Agent That Remembers Every Incident — And It Caught a Pattern I Missed

#productivity #python #api #llm

The Redis Bug That Kept Coming Back — How I Fixed It With an Agent That Actually Remembers

Three weeks. Four incidents. Same root cause. Different engineers each time.

That's the problem I set out to solve. Not with a better dashboard. With memory.

The Problem Nobody Talks About

Every time a critical alert fires, engineers start from scratch. They dig through Slack threads, search runbooks, ping the one person who was on-call six months ago when "this exact thing happened."

Institutional knowledge lives in human memory and post-mortem documents nobody reads.

I wanted to build an agent that remembers every resolved incident and uses that history to diagnose new ones — getting smarter with every outage it sees.

What I Built

An incident response agent with two core layers:

Hindsight — persistent agent memory that stores every resolved incident and recalls relevant ones when a new alert fires
cascadeflow — runtime intelligence that routes P1 incidents to a powerful model and P2/P3 to a fast cheap model automatically

The result: an agent that doesn't just answer from training data — it answers from your actual incident history.

How the Memory Layer Works

Using the Hindsight Python SDK, storing a resolved incident is one function call:

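import os

from hindsight_client import Hindsight  # import path assumed from the hindsight-client package

memory = Hindsight(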
    base_url="https://api.hindsight.vectorize.io",
    api_key=os.getenv("HINDSIGHT_API_KEY")
)

memory.retain(
    bank_id="Incident-memory",
    content="Service: payments-api | Alert: Latency spike >2000ms on /checkout | Root cause: Redis connection pool exhausted | Resolution: Increased pool size from 10 to 50, restarted service"
)
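The content string above follows a simple "Label: value | Label: value" layout. A minimal sketch of helpers that build and read that layout is shown here, so every stored incident has the same shape. The `Incident` dataclass and both function names are hypothetical and are not part of the Hindsight SDK.

```python
from dataclasses import dataclass


@dataclass
class Incident:
    """Hypothetical record type; fields mirror the labels in the stored string."""
    service: str
    alert: str
    root_cause: str
    resolution: str


def format_incident(incident: Incident) -> str:
    """Render an incident in the pipe-delimited layout used for memory content."""
    return (
        f"Service: {incident.service} | Alert: {incident.alert} | "
        f"Root cause: {incident.root_cause} | Resolution: {incident.resolution}"
    )


def parse_incident(content: str) -> dict:
    """Split a stored string back into a {label: value} dictionary."""
    fields = {}
    for part in content.split(" | "):
        label, _, value = part.partition(": ")
        fields[label] = value
    return fields
```

The output of `format_incident(...)` could then be passed as the `content` argument to `memory.retain`. This assumes field values never contain the " | " separator themselves.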

When a new incident fires, the agent recalls semantically similar past incidents:

results = memory.recall(
    bank_id="Incident-memory",
    query="payments-api latency spike checkout endpoint"
)

Those recalled memories become context for the LLM prompt. The agent isn't guessing — it's reasoning from your team's actual history.

How cascadeflow Routing Works

Not every incident deserves the same model. A P1 payment failure needs the best answer fast. A P3 stale cache issue doesn't.

cascadeflow handles this with two model tiers:

from cascadeflow import CascadeAgent, ModelConfig

models = [
    ModelConfig(name="llama-3.1-8b-instant", provider="groq", cost_per_token=0.0000001),
    ModelConfig(name="llama-3.3-70b-versatile", provider="groq", cost_per_token=0.0000008),
]
cascade = CascadeAgent(models=models, verbose=True)

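# Severity-based selection is a plain string check; the cascade object above
# is configured but not called inside this function.
def route_model(severity):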
    if severity == "P1":
        return "llama-3.3-70b-versatile"  # powerful model for critical incidents
    else:
        return "llama-3.1-8b-instant"     # fast cheap model for low severity
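One design question with the function above is what happens to a severity label it does not recognize, such as "p1" in lowercase or a typo: it falls through to the cheap model. A possible variant, sketched here as an assumption rather than what the project does, normalizes the label and sends anything unknown to the stronger model. The name `route_model_strict` and the lookup table are hypothetical.

```python
# Model names are the ones used above; the table and fallback policy are assumptions.
FAST_MODEL = "llama-3.1-8b-instant"
STRONG_MODEL = "llama-3.3-70b-versatile"

SEVERITY_TO_MODEL = {
    "P1": STRONG_MODEL,
    "P2": FAST_MODEL,
    "P3": FAST_MODEL,
}


def route_model_strict(severity: str) -> str:
    """Pick a model by severity; unrecognized labels go to the strong model."""
    key = severity.strip().upper()
    return SEVERITY_TO_MODEL.get(key, STRONG_MODEL)
```

Failing toward the expensive model costs a little more on bad input, but it avoids quietly downgrading a critical incident because of a formatting slip.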

The numbers: P1 incidents cost $0.000271 per query. P3 incidents cost $0.000038. That's roughly a 7x cost difference (0.000271 / 0.000038 ≈ 7.1), with no meaningful quality difference on low-severity alerts.
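To make the arithmetic concrete, here is a small sketch that computes the ratio between the two per-query figures and compares a routed workload against sending everything to the large model. The per-query costs are the ones quoted above; the 100 / 900 split between P1 and lower-severity queries is a made-up example mix, not measured data.

```python
P1_COST_PER_QUERY = 0.000271  # USD, figure quoted above
P3_COST_PER_QUERY = 0.000038  # USD, figure quoted above

# Ratio between the two tiers: about 7.1
ratio = P1_COST_PER_QUERY / P3_COST_PER_QUERY


def routed_cost(p1_queries: int, low_severity_queries: int) -> float:
    """Total USD when P1 goes to the large model and the rest to the small one."""
    return (
        p1_queries * P1_COST_PER_QUERY
        + low_severity_queries * P3_COST_PER_QUERY
    )


def all_large_cost(total_queries: int) -> float:
    """Total USD when every query goes to the large model."""
    return total_queries * P1_COST_PER_QUERY


# Hypothetical mix: 100 P1 queries and 900 P2/P3 queries
routed = routed_cost(100, 900)
unrouted = all_large_cost(1000)
```

The overall saving depends on the severity mix: the more of the traffic is P2/P3, the closer the routed total gets to the full per-query ratio.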

The Full Agent Loop

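import os

from groq import Groq

# Groq client used in Step 3; assumes GROQ_API_KEY is set in the environment
groq_client = Groq(api_key=os.getenv("GROQ_API_KEY"))

def analyze_incident(service, alert, severity="P1"):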
    # Step 1: Recall relevant past incidents from Hindsight
    recalled = memory.recall(bank_id="Incident-memory", query=f"{service} {alert}")
    memory_text = "\n".join([str(r) for r in recalled])[:600]

    # Step 2: cascadeflow routes to the right model based on severity
    model = route_model(severity)

    # Step 3: LLM reasons over recalled memories + new incident
    prompt = f"""DevOps agent. New incident: {service} — {alert}

Past incident memory:
{memory_text}

Give: 1) Root cause 2) Immediate fix 3) Long term fix."""

    response = groq_client.chat.completions.create(
        model=model,
        messages=[{"role": "user", "content": prompt}],
        max_tokens=200
    )
    return response.choices[0].message.content
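One detail in Step 1 is that slicing the joined text at 600 characters can cut the last recalled incident in half. A minimal sketch of an alternative is shown here: it keeps only whole records that fit in the budget. The helper name `build_memory_context` is hypothetical, and it assumes each recalled item has a useful `str()` form, as the loop above already does.

```python
def build_memory_context(records, max_chars=600):
    """Join records with newlines, keeping only whole records within max_chars."""
    kept = []
    used = 0
    for record in records:
        text = str(record)
        # +1 accounts for the newline separator before every record but the first
        cost = len(text) + (1 if kept else 0)
        if used + cost > max_chars:
            break
        kept.append(text)
        used += cost
    return "\n".join(kept)
```

In the loop above, `memory_text = build_memory_context(recalled)` would replace the join-and-slice line. If the first record alone exceeds the budget the result is an empty string, so the budget should be sized to fit at least one incident.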

The Moment That Surprised Me

After loading 10 past incidents into Hindsight memory, I triggered a new payments-api alert — the same latency spike that had appeared four times before.

The agent recalled all four past incidents and responded:

"Root Cause: Redis connection pool exhaustion — fourth occurrence on this service. Long Term Fix: Migrate to Redis Cluster. Static pool size increases have failed repeatedly under traffic surges."

It didn't just diagnose the incident. It recognized the pattern across time and escalated its recommendation from "increase pool size" to "stop patching, migrate the architecture."

That's what memory enables. An agent that gets smarter with every incident it sees.
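In this project the model spotted the repetition on its own from the recalled text. As a complement, the recurrence could also be counted deterministically before the prompt is built, so the "fourth occurrence" figure does not depend on the model counting correctly. The sketch below assumes the recalled content uses the "Root cause: ..." label from the stored strings; both function names are hypothetical.

```python
from collections import Counter
from typing import Optional


def extract_root_cause(content: str) -> Optional[str]:
    """Return the lowercased 'Root cause' value from a pipe-delimited record."""
    for part in content.split(" | "):
        label, sep, value = part.partition(": ")
        if sep and label.strip().lower() == "root cause":
            return value.strip().lower()
    return None


def recurring_root_causes(contents, min_count=2):
    """Map each root cause seen at least min_count times to its count."""
    counts = Counter(
        cause
        for cause in map(extract_root_cause, contents)
        if cause is not None
    )
    return {cause: n for cause, n in counts.items() if n >= min_count}
```

The resulting counts could be appended to the prompt as one extra line, for example "Redis connection pool exhausted: seen 4 times". Exact string matching is a simplification here: differently worded root causes for the same problem would be counted separately.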

Before and After Memory

Without Hindsight memory:
New alert fires. Agent gives generic DevOps advice — check CPU, check memory, restart the service. No context. No history. Engineer spends 20 minutes investigating something that was solved three weeks ago.

With Hindsight memory:
New alert fires. Agent recalls 4 past incidents, identifies Redis connection pool exhaustion as the pattern, recommends Redis Cluster migration because the same patch has failed three times. Engineer knows exactly where to start.

What I Learned

Memory changes the quality of answers, not just the speed. Without Hindsight, the agent gave generic advice. With it, the agent gave specific, history-aware recommendations tied to real past incidents.

Route by severity or you're burning money. Running every query through the most powerful model is wasteful. cascadeflow's automatic routing cut costs roughly 7x on low-severity incidents ($0.000038 versus $0.000271 per query) with no meaningful quality difference.

Your incident history is training data you're not using. Every post-mortem, every root cause, every resolution — it's all institutional knowledge sitting unused. Hindsight memory is how that knowledge becomes queryable.

Ship the simplest thing that shows the value. The most impressive moment in this project is a single agent response that says "fourth occurrence." Everything else supports that moment.
