Amudhan M

Posted on Apr 13

How Hindsight caught a vendor bug logs missed

#ai #programming #microsoft #devops

I built Kairo, then realized memory was the hard part

The first version of Kairo worked on day one.

It could send messages, call tools, search the web, even control
Spotify. From the outside, it looked like a complete system.

By day three, it was already breaking in ways that were hard to explain.

Not crashing. Not throwing errors. Just... getting worse over time.

What Kairo actually is

Kairo is a Telegram-based agent that connects to real tools.

It's not just chat. It can:

- Read and send emails
- Control Spotify
- Query Notion
- Search the web
- Set reminders

All of that is wired through tool modules, so the agent can call
functions instead of just generating text.

The structure is pretty clean:

    src/
    ├── index.ts
    ├── conversation/
    ├── gmail/
    ├── spotify/
    ├── notion/
    ├── productivity/

Each module exposes capabilities. The agent decides what to call.
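To make "modules expose capabilities, the agent picks one" concrete, here is a minimal sketch of a tool registry. It is not Kairo's actual code: the `Tool` interface, `registerTool`, `callTool`, and the `spotify.play` stub are hypothetical names chosen for illustration, and the calls are kept synchronous for brevity where real tool calls would return promises.

```typescript
// Hypothetical tool shape; Kairo's real module interface is not shown in this post.
type ToolArgs = Record<string, string>;

interface Tool {
  name: string;        // e.g. "spotify.play"
  description: string; // what the agent reads when choosing a tool
  run: (args: ToolArgs) => string;
}

// Registry keyed by tool name, so a model-chosen name can be dispatched safely.
const registry: Record<string, Tool> = {};

function registerTool(tool: Tool): void {
  if (Object.prototype.hasOwnProperty.call(registry, tool.name)) {
    throw new Error("Duplicate tool name: " + tool.name);
  }
  registry[tool.name] = tool;
}

function callTool(name: string, args: ToolArgs): string {
  // Unknown names are reported back instead of being guessed at.
  if (!Object.prototype.hasOwnProperty.call(registry, name)) {
    return 'error: unknown tool "' + name + '"';
  }
  return registry[name].run(args);
}

// Stub standing in for a real module such as spotify/.
registerTool({
  name: "spotify.play",
  description: "Play a track by title",
  run: (args) => "playing " + (args.title ?? "nothing"),
});
```

Returning an explicit error string for an unknown tool gives the agent something to react to, which matters once it starts calling the wrong tool.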

On paper, it's straightforward.

In practice, everything depends on how you handle context.

The problem I didn't expect

The agent didn't fail loudly.

It failed subtly.

- It forgot what the user asked 5 minutes ago
- It repeated actions
- It called the wrong tool
- It lost track of conversations

Nothing "broke." It just stopped being reliable.

At first, I thought this was a prompting issue.

It wasn't.

The real issue was simple:

The system had no real memory.

Why transcripts don't scale

Kairo stores conversation history with summarization.

That sounds fine until you run it long enough.

Summarization introduces drift.

After a while:

- Important details disappear
- Context gets distorted
- The agent starts guessing

You end up with something that looks like memory, but behaves like
compression.
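A toy sketch of why a rolling summary drifts. The `summarize` function here is a deliberately crude stand-in (it keeps only the most recent characters), not the summarizer Kairo uses, and the messages and email address are made up. A real LLM summarizer is smarter, but it works under a size budget too, so something eventually has to go.

```typescript
// Stand-in for an LLM summarizer: keeps only the most recent `budget` characters.
function summarize(text: string, budget: number): string {
  return text.length <= budget ? text : text.slice(text.length - budget);
}

// Rolling summary: each turn folds the new message into the previous summary.
function rollingSummary(messages: string[], budget: number): string {
  return messages.reduce(
    (summary, msg) => summarize(summary + " " + msg, budget),
    ""
  );
}

// Made-up conversation turns.
const turns = [
  "Send the weekly update to priya@example.com.",
  "Play something calm on Spotify.",
  "Remind me to call the bank at 5pm.",
  "Search the web for Telegram bot rate limits.",
];

const summary = rollingSummary(turns, 80);

// The recipient from the first turn no longer fits in the summary.
const recipientSurvived = summary.indexOf("priya@example.com") !== -1;
```

After four turns `recipientSurvived` is `false`: the summary still reads like context, but the one detail needed to repeat the email is gone.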

Bringing in Hindsight

I needed a way to give my agent memory that survives beyond a single
conversation.

I used Hindsight to store structured events instead of raw chat.

What I changed

Instead of storing messages, I started storing events.

Example:

    intent: send_email
    tool_used: gmail.send
    result: success

Now the system remembers behavior, not just text.
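One way to type such an event, as a sketch. Only `intent`, `tool_used`, and `result` come from the example above; the `timestamp` and `args` fields and the `makeEvent` helper are assumptions added for illustration, and none of this is Hindsight's actual schema.

```typescript
type EventResult = "success" | "failure";

interface AgentEvent {
  intent: string;               // what the user wanted, e.g. "send_email"
  tool_used: string;            // which tool ran, e.g. "gmail.send"
  result: EventResult;
  timestamp: string;            // assumption: ISO 8601, so events can be ordered
  args: Record<string, string>; // assumption: what the tool was called with
}

// Hypothetical helper that stamps an event at creation time.
function makeEvent(
  intent: string,
  toolUsed: string,
  result: EventResult,
  args: Record<string, string> = {},
  now: Date = new Date()
): AgentEvent {
  return {
    intent: intent,
    tool_used: toolUsed,
    result: result,
    timestamp: now.toISOString(),
    args: args,
  };
}

// Illustrative values only.
const example = makeEvent(
  "send_email",
  "gmail.send",
  "success",
  { subject: "Weekly update" },
  new Date("2024-04-12T09:00:00Z")
);
```

Keeping the arguments alongside the outcome is what lets a later request reuse the action rather than just know that it happened.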

How this changed Kairo

When a new request comes in:

- Process request
- Search memory for similar actions
- Adapt behavior

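This made responses more consistent, because the agent could lean on what had already worked instead of starting from scratch each time.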
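The search-then-adapt loop can be sketched with a plain in-memory array. This does not use Hindsight's API: word-overlap (Jaccard) similarity stands in for whatever retrieval Hindsight performs, and `StoredEvent`, `searchMemory`, `pickTool`, the `request` field, and the `web.search` fallback name are all hypothetical.

```typescript
interface StoredEvent {
  intent: string;
  tool_used: string;
  result: "success" | "failure";
  request: string; // assumption: the user text that triggered the event
}

// Lowercase word tokens with punctuation dropped and duplicates removed.
function tokenize(text: string): string[] {
  const words = text.toLowerCase().split(/[^a-z0-9]+/).filter((t) => t.length > 0);
  return words.filter((w, i) => words.indexOf(w) === i);
}

// Jaccard similarity: shared unique tokens divided by all unique tokens.
function similarity(a: string, b: string): number {
  const ta = tokenize(a);
  const tb = tokenize(b);
  const shared = ta.filter((t) => tb.indexOf(t) !== -1).length;
  const total = ta.length + tb.length - shared;
  return total === 0 ? 0 : shared / total;
}

// Search memory: past events at least `minScore` similar, best match first.
function searchMemory(memory: StoredEvent[], request: string, minScore: number): StoredEvent[] {
  return memory
    .map((event) => ({ event: event, score: similarity(event.request, request) }))
    .filter((scored) => scored.score >= minScore)
    .sort((x, y) => y.score - x.score)
    .map((scored) => scored.event);
}

// Adapt behavior: prefer the tool that succeeded for the most similar past request.
function pickTool(memory: StoredEvent[], request: string, fallback: string): string {
  const hits = searchMemory(memory, request, 0.3).filter((e) => e.result === "success");
  return hits.length > 0 ? hits[0].tool_used : fallback;
}

// Made-up memory contents.
const memory: StoredEvent[] = [
  { intent: "send_email", tool_used: "gmail.send", result: "success", request: "email the weekly update to the team" },
  { intent: "play_music", tool_used: "spotify.play", result: "success", request: "play some jazz" },
];
```

With this memory, `pickTool(memory, "send the weekly update email", "web.search")` picks `gmail.send`, while a request with no close match falls back to the default. The 0.3 threshold is arbitrary and only there to keep a single shared word like "the" from counting as a match.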

A concrete example

Before:

User: "Send the same update I sent yesterday"

Agent: doesn't remember, asks again.

After:

Agent: finds the past event and reuses it.
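A sketch of how "the update I sent yesterday" could resolve against stored events. The event shape, the `findFromYesterday` helper, and the sample log are assumptions for illustration, and "yesterday" is computed in UTC here, where a real bot would need the user's timezone.

```typescript
interface EmailEvent {
  intent: string;
  tool_used: string;
  result: "success" | "failure";
  timestamp: string; // assumption: ISO 8601
  args: { to: string; subject: string; body: string }; // assumption
}

const DAY_MS = 24 * 60 * 60 * 1000;

// Latest successful event with this intent from the UTC calendar day before `now`.
function findFromYesterday(events: EmailEvent[], intent: string, now: Date): EmailEvent | undefined {
  const startOfToday = Date.UTC(now.getUTCFullYear(), now.getUTCMonth(), now.getUTCDate());
  const startOfYesterday = startOfToday - DAY_MS;
  const matches = events
    .filter((e) => e.intent === intent && e.result === "success")
    .filter((e) => {
      const t = Date.parse(e.timestamp);
      return t >= startOfYesterday && t < startOfToday;
    })
    .sort((a, b) => Date.parse(b.timestamp) - Date.parse(a.timestamp));
  return matches.length > 0 ? matches[0] : undefined;
}

// Made-up event log.
const log: EmailEvent[] = [
  { intent: "send_email", tool_used: "gmail.send", result: "success", timestamp: "2024-04-11T08:00:00Z",
    args: { to: "team@example.com", subject: "Old update", body: "..." } },
  { intent: "send_email", tool_used: "gmail.send", result: "success", timestamp: "2024-04-12T09:00:00Z",
    args: { to: "team@example.com", subject: "Weekly update", body: "..." } },
  { intent: "send_email", tool_used: "gmail.send", result: "failure", timestamp: "2024-04-12T15:00:00Z",
    args: { to: "team@example.com", subject: "Retry", body: "..." } },
];

const reused = findFromYesterday(log, "send_email", new Date("2024-04-13T10:00:00Z"));
```

Here `reused` is the "Weekly update" event: the older email falls outside the window and the failed attempt is skipped. In practice the agent would probably confirm the recipient and subject with the user before resending.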

Lessons learned

- More context doesn't fix bad memory
- Summarization is lossy
- Tools increase the need for memory
- Consistency is harder than intelligence
- Memory > prompt engineering

Closing

The hard part wasn't building the agent.

It was making it remember.

Once I fixed that, everything else became simpler.
