I built Kairo, then realized memory was the hard part
The first version of Kairo worked on day one.
It could send messages, call tools, search the web, even control
Spotify. From the outside, it looked like a complete system.
By day three, it was already breaking in ways that were hard to explain.
Not crashing. Not throwing errors. Just... getting worse over time.
What Kairo actually is
Kairo is a Telegram-based agent that connects to real tools.
It's not just chat. It can:
- Read and send emails
- Control Spotify
- Query Notion
- Search the web
- Set reminders
All of that is wired through tool modules, so the agent can call
functions instead of just generating text.
The structure is pretty clean:
src/
├── index.ts
├── conversation/
├── gmail/
├── spotify/
├── notion/
├── productivity/
Each module exposes capabilities. The agent decides what to call.
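
To make "modules expose capabilities, the agent decides what to call" concrete, here is a minimal sketch of a tool registry. Everything in it (`Tool`, `ToolRegistry`, the stubbed `run` functions) is hypothetical and only illustrates the pattern; Kairo's actual module interfaces aren't shown in this post, and real tools would be async and call external APIs.

```typescript
// Hypothetical shapes for illustration; not Kairo's real interfaces.
type ToolArgs = Record<string, string>;

interface Tool {
  name: string;        // e.g. "gmail.send"
  description: string; // what the agent reads when choosing a tool
  run: (args: ToolArgs) => string; // synchronous stub for brevity
}

class ToolRegistry {
  private tools = new Map<string, Tool>();

  register(tool: Tool): void {
    if (this.tools.has(tool.name)) {
      throw new Error(`Tool already registered: ${tool.name}`);
    }
    this.tools.set(tool.name, tool);
  }

  // The catalogue the agent sees when deciding which tool to call.
  list(): { name: string; description: string }[] {
    return Array.from(this.tools.values()).map(({ name, description }) => ({
      name,
      description,
    }));
  }

  // Dispatch a call by name; unknown names fail loudly.
  call(name: string, args: ToolArgs): string {
    const tool = this.tools.get(name);
    if (!tool) throw new Error(`Unknown tool: ${name}`);
    return tool.run(args);
  }
}

const registry = new ToolRegistry();
registry.register({
  name: "gmail.send",
  description: "Send an email",
  run: (args) => `sent to ${args.to}`, // stub: a real module would call Gmail
});
registry.register({
  name: "spotify.play",
  description: "Play a track",
  run: (args) => `playing ${args.track}`, // stub: a real module would call Spotify
});
```

The registry itself holds no state about past calls, which is exactly why the context handling discussed next matters so much.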
On paper, it's straightforward.
In practice, everything depends on how you handle context.
The problem I didn't expect
The agent didn't fail loudly.
It failed subtly.
- It forgot what the user asked 5 minutes ago
- It repeated actions
- It called the wrong tool
- It lost track of conversations
Nothing "broke." It just stopped being reliable.
At first, I thought this was a prompting issue.
It wasn't.
The real issue was simple:
The system had no real memory.
Why transcripts don't scale
Kairo stores conversation history with summarization.
That sounds fine until you run it long enough.
Summarization introduces drift.
After a while:
- Important details disappear
- Context gets distorted
- The agent starts guessing
You end up with something that looks like memory, but behaves like
compression.
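
The drift is easy to reproduce with a toy model. In the sketch below, `summarize` is a deliberately crude stand-in (an assumption, not what Kairo runs): it keeps only the most recent characters that fit a fixed budget. A real LLM summarizer is smarter about what it drops, but it works under the same constraint, and re-summarizing a summary compounds the loss.

```typescript
// Stand-in summarizer (assumption): keep only the last `maxChars` characters.
function summarize(text: string, maxChars: number): string {
  return text.length <= maxChars ? text : text.slice(text.length - maxChars);
}

// Rolling summary: fold each new turn into the previous summary,
// then compress the result again.
function rollingSummary(turns: string[], maxChars: number): string {
  let summary = "";
  for (const turn of turns) {
    summary = summarize(`${summary} ${turn}`.trim(), maxChars);
  }
  return summary;
}

const turns = [
  "User wants the weekly update sent to alice@example.com.",
  "User asked to play a focus playlist.",
  "User searched for flight prices.",
  "User set a reminder for 9am.",
];

const summary = rollingSummary(turns, 80);

// The recipient from the first turn no longer survives in the summary.
const remembersRecipient = summary.includes("alice@example.com");
console.log(remembersRecipient); // false
```

Four turns are enough to lose the one detail a later "send it again" request would depend on.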
Bringing in Hindsight
I needed a way to give my agent memory that survives beyond a single
conversation.
I used Hindsight to store structured events instead of raw chat.
What I changed
Instead of storing messages, I started storing events.
Example:
intent: send_email
tool_used: gmail.send
result: success
Now the system remembers behavior, not just text.
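
As a rough sketch, an event like the one above could be typed and recorded as follows. The `params` and `timestamp` fields are my additions for illustration, and `EventLog` is a plain in-memory stand-in: it is not Hindsight's API, which this post doesn't show.

```typescript
// Hypothetical event shape based on the three fields in the example;
// `params` and `timestamp` are assumptions added for illustration.
interface AgentEvent {
  intent: string;                 // e.g. "send_email"
  toolUsed: string;               // e.g. "gmail.send"
  result: "success" | "failure";
  params: Record<string, string>; // arguments the tool was called with
  timestamp: number;              // milliseconds since the Unix epoch
}

// In-memory stand-in for a memory backend (not Hindsight's API).
class EventLog {
  private events: AgentEvent[] = [];

  record(event: AgentEvent): void {
    this.events.push(event);
  }

  // All recorded events for one intent, in insertion order.
  byIntent(intent: string): AgentEvent[] {
    return this.events.filter((e) => e.intent === intent);
  }
}

const log = new EventLog();
log.record({
  intent: "send_email",
  toolUsed: "gmail.send",
  result: "success",
  params: { to: "team@example.com", subject: "Weekly update" },
  timestamp: Date.UTC(2024, 0, 15, 9, 0),
});
```

Unlike a summary, each event keeps its fields intact no matter how long the agent runs.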
How this changed Kairo
When a new request comes in:
- Process request
- Search memory for similar actions
- Adapt behavior
This made responses more consistent.
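
The three steps above can be sketched roughly like this. The word-overlap score is a crude stand-in for whatever semantic search a real memory layer performs, and `PastAction`, `recallSimilar`, and `chooseTool` are hypothetical names, not part of Kairo or Hindsight.

```typescript
interface PastAction {
  request: string; // the original user text
  toolUsed: string;
  result: "success" | "failure";
}

function tokenize(text: string): Set<string> {
  return new Set(text.toLowerCase().split(/\W+/).filter((w) => w.length > 0));
}

// Jaccard overlap between word sets (assumption: stands in for semantic search).
function similarity(a: string, b: string): number {
  const ta = tokenize(a);
  const tb = tokenize(b);
  let shared = 0;
  ta.forEach((w) => {
    if (tb.has(w)) shared++;
  });
  const union = ta.size + tb.size - shared;
  return union === 0 ? 0 : shared / union;
}

// Step 2: search memory for the most similar successful action.
function recallSimilar(
  request: string,
  memory: PastAction[],
  minScore = 0.2,
): PastAction | undefined {
  let best: PastAction | undefined;
  let bestScore = minScore;
  for (const action of memory) {
    if (action.result !== "success") continue; // only reuse what worked
    const score = similarity(request, action.request);
    if (score >= bestScore) {
      best = action;
      bestScore = score;
    }
  }
  return best;
}

// Step 3: adapt behavior by preferring the tool that worked before.
function chooseTool(request: string, memory: PastAction[], fallback: string): string {
  const match = recallSimilar(request, memory);
  return match ? match.toolUsed : fallback;
}

const memory: PastAction[] = [
  { request: "email the weekly update to the team", toolUsed: "gmail.send", result: "success" },
  { request: "play my focus playlist", toolUsed: "spotify.play", result: "success" },
];

console.log(chooseTool("send the weekly update email", memory, "web.search")); // gmail.send
console.log(chooseTool("what is the weather", memory, "web.search")); // web.search
```

The `minScore` threshold matters: without it, a weak match would still override the fallback, which reproduces the "wrong tool" failure from earlier.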
A concrete example
Before:
User: "Send the same update I sent yesterday"
Agent: doesn't remember, asks again.
After:
Agent: finds the past event and reuses it.
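
A minimal sketch of that lookup, under a few assumptions: events carry a timestamp and the original tool parameters, "yesterday" means the previous UTC calendar day, and `findYesterdaysEmail` is a hypothetical helper rather than anything from Kairo or Hindsight.

```typescript
interface EmailEvent {
  intent: "send_email";
  toolUsed: "gmail.send";
  result: "success" | "failure";
  params: { to: string; subject: string; body: string };
  timestamp: number; // milliseconds since the Unix epoch
}

const DAY_MS = 24 * 60 * 60 * 1000;

// Most recent successful send from the previous UTC calendar day, if any.
function findYesterdaysEmail(events: EmailEvent[], now: number): EmailEvent | undefined {
  const startOfToday = Math.floor(now / DAY_MS) * DAY_MS;
  const startOfYesterday = startOfToday - DAY_MS;
  return events
    .filter(
      (e) =>
        e.result === "success" &&
        e.timestamp >= startOfYesterday &&
        e.timestamp < startOfToday,
    )
    .sort((a, b) => b.timestamp - a.timestamp)[0];
}

const now = Date.UTC(2024, 0, 16, 10, 0);
const history: EmailEvent[] = [
  {
    intent: "send_email",
    toolUsed: "gmail.send",
    result: "success",
    params: { to: "team@example.com", subject: "Weekly update", body: "Shipped the memory layer." },
    timestamp: Date.UTC(2024, 0, 15, 9, 0),
  },
  {
    intent: "send_email",
    toolUsed: "gmail.send",
    result: "success",
    params: { to: "boss@example.com", subject: "Older note", body: "Last week's numbers." },
    timestamp: Date.UTC(2024, 0, 10, 9, 0),
  },
];

const previous = findYesterdaysEmail(history, now);

// Reuse the stored parameters instead of asking the user again.
const resend = previous ? { tool: previous.toolUsed, args: previous.params } : undefined;
```

If nothing matches, `resend` is `undefined` and the agent falls back to asking, which is the old behavior.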
Lessons learned
- More context doesn't fix bad memory
- Summarization is lossy
- Tools increase the need for memory
- Consistency is harder than intelligence
- Memory > prompt engineering
Closing
The hard part wasn't building the agent.
It was making it remember.
Once I fixed that, everything else became simpler.