I use Claude Code, Cursor, and Codex daily. And every single one of them forgets who I am between sessions.
I'd tell Claude Code I prefer Python for backend work. Three sessions later, it suggests TypeScript. I'd set up a project structure in Cursor, switch to Codex for a quick fix — and it has no idea what I'm working on. Each tool has its own isolated memory, and none of them talk to each other.
I tried the usual fixes. Dumped context into a vector store. Built a RAG pipeline. It worked — until the store had hundreds of entries and a two-year-old preference outranked something I said yesterday, just because the phrasing matched better. The retrieval had no sense of time.
That's when I started reading about Hermann Ebbinghaus.
## A 140-year-old experiment that changes everything
In 1885, a German psychologist named Hermann Ebbinghaus spent years memorizing nonsense syllables — things like "DAX," "BUP," "ZOL" — and testing how quickly he forgot them. His results produced one of the most replicated findings in all of psychology: the forgetting curve.
The core insight: memory retention decays exponentially. You don't gradually forget things in a linear way — you lose most of the information quickly, then the remainder fades slowly. But here's the part that got me: every time you recall something, the decay rate slows down. Memories you access frequently become durable. Memories you never revisit fade to nothing.
This mapped perfectly to what I needed. A preference mentioned once three months ago should carry less weight than something reinforced yesterday. Frequently accessed context should be strong. Old, unreinforced trivia should quietly disappear.
## The math behind it
Ebbinghaus's forgetting curve:
```
R = e^(-t / S)
```
Where:
- R = retention (0 to 1)
- t = time elapsed since the memory was formed
- S = memory strength (higher = slower decay)
This is the same math behind spaced repetition systems like Anki. I realized I could apply it to AI agent memory.
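The curve translates directly to code. A minimal Python sketch (the time unit is arbitrary — days are used here purely for illustration):

```python
import math

def retention(t: float, strength: float) -> float:
    """Ebbinghaus retention: R = e^(-t / S).

    t: time elapsed since the memory was formed (days, here)
    strength: S -- higher means slower decay
    """
    return math.exp(-t / strength)

# After a week, a weak memory is mostly gone; a strong one persists
weak = retention(7, strength=2)     # ~0.03
strong = retention(7, strength=30)  # ~0.79
```

The whole trick is in how you set `S`: anything that should make a memory durable (importance, repeated access) raises `S` and flattens the curve.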
## What I built
I built Smara — a memory API that combines semantic vector search with Ebbinghaus decay scoring. Every stored memory gets an importance score between 0 and 1. At query time, importance scales the memory strength, so high-importance memories decay slowly while trivial ones fade fast.
The retrieval score blends semantic relevance with temporal decay. Semantic search stays dominant — you still get the most relevant memories — but recency breaks ties. A moderately relevant memory from yesterday can outrank a highly relevant one from three months ago.
I also track access patterns. Every time a memory is retrieved, it gets reinforced — frequently accessed memories stay strong. Memories nobody asks about quietly fade. The specific weights took a while to tune, but the principle is simple: relevance × recency × reinforcement.
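The principle can be sketched in a few lines. The weights, the strength formula, and the reinforcement rule below are illustrative placeholders, not Smara's actual tuned values:

```python
import math

def blended_score(similarity: float, age_days: float,
                  importance: float, access_count: int,
                  w_sim: float = 0.7, w_decay: float = 0.3) -> float:
    """Relevance x recency x reinforcement, with similarity dominant.

    importance scales base memory strength; every retrieval bumps
    access_count, so frequently used facts decay more slowly.
    (All constants here are placeholders for illustration.)
    """
    strength = 10 * importance * (1 + math.log1p(access_count))
    decay = math.exp(-age_days / strength)
    # Semantic relevance stays dominant; decay breaks ties
    return w_sim * similarity + w_decay * decay

# A moderately relevant memory from yesterday can outrank a
# highly relevant one from three months ago:
recent = blended_score(similarity=0.75, age_days=1, importance=0.8, access_count=3)
old = blended_score(similarity=0.90, age_days=90, importance=0.8, access_count=0)
```

Keeping the similarity weight dominant matters: decay should reorder near-ties, not let a fresh but irrelevant memory beat a genuinely relevant one.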
The entire API is three calls:
Store a memory:
```bash
curl -X POST https://api.smara.io/v1/memories \
  -H "Authorization: Bearer YOUR_API_KEY" \
  -H "Content-Type: application/json" \
  -d '{
    "user_id": "user_abc",
    "fact": "Prefers Python over TypeScript for backend work",
    "importance": 0.8
  }'
```
Search with decay-aware ranking:
```bash
curl "https://api.smara.io/v1/memories/search?\
user_id=user_abc&q=what+language+for+backend&limit=5" \
  -H "Authorization: Bearer YOUR_API_KEY"
```
The response gives you similarity, decay_score, and the blended score — you can see exactly why a memory was ranked where it was.
Get full user context for your LLM prompt:
```bash
curl "https://api.smara.io/v1/users/user_abc/context" \
  -H "Authorization: Bearer YOUR_API_KEY"
```
Drop the context string into your system prompt and your agent knows who it's talking to.
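Wiring that into a prompt takes a few lines. This sketch assumes the endpoint returns JSON with a `context` string field (the field name is my assumption) and uses only the Python standard library:

```python
import json
import urllib.request

SMARA_BASE = "https://api.smara.io/v1"

def fetch_context(user_id: str, api_key: str) -> str:
    """Fetch the user's memory context (response field name assumed)."""
    req = urllib.request.Request(
        f"{SMARA_BASE}/users/{user_id}/context",
        headers={"Authorization": f"Bearer {api_key}"},
    )
    with urllib.request.urlopen(req) as resp:
        return json.load(resp)["context"]

def build_system_prompt(context: str) -> str:
    """Prepend the memory context to a base system prompt."""
    return (
        "You are a coding assistant.\n\n"
        "What you know about this user:\n"
        f"{context}"
    )
```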
## The cross-platform problem nobody's solving
Building the API was the easy part. The real insight came from dogfooding it.
I had Smara wired into Claude Code. It worked great — my sessions had persistent memory. But the moment I switched to Cursor, I was back to square one. And when I opened Codex, same thing. Three tools, three blank slates.
So I made Smara platform-agnostic. Every memory is tagged with its source — which tool stored it — but all memories live in one pool:
```json
{
  "fact": "Prefers Python over TypeScript for backend work",
  "source": "claude-code",
  "namespace": "default",
  "decay_score": 0.97
}
```
A preference stored via Claude Code is instantly available in Cursor, Codex, or anything else connected to the same account.
For MCP-compatible tools (Claude Code, Cursor, Windsurf), I built an MCP server that handles everything automatically. Add this to your MCP config and restart:
```json
{
  "smara": {
    "command": "npx",
    "args": ["-y", "@smara/mcp-server"],
    "env": { "SMARA_API_KEY": "your-key" }
  }
}
```
That's it. No manual tool calls. The MCP server instructs the LLM to:
- At conversation start: Automatically load stored context
- During conversation: Silently store new facts as they come up
- On explicit request: Handle "remember this" and "forget that"
You don't configure rules or triggers. The LLM decides what's worth remembering. The Ebbinghaus decay does the rest.
For OpenAI-compatible tools (Codex, ChatGPT, custom GPTs), there are function definitions and a proxy endpoint. Same memory pool, different protocol.
The result: I tell Claude Code I prefer Python. Switch to Cursor — it already knows. Open Codex — same context. My memory follows me.
## How this compares to what's out there
RAG / vanilla vector search. This is where most teams start. Embed everything, retrieve by cosine similarity. Works until your store grows and old entries outrank recent ones because the phrasing happened to match better. No sense of time.
Graph memory (Mem0, etc.). Knowledge graphs capture entity relationships, which is powerful for certain use cases. But the setup cost is high — entity extraction, relationship mapping, graph traversal. For most agent memory needs (preferences, decisions, project context), it's over-engineered.
Key-value stores (Redis, DynamoDB). Fast and simple, but no semantic search. You can only retrieve by exact key, which means your agent needs to know exactly what it's looking for.
What I built: Semantic search combined with Ebbinghaus decay. Fuzzy matching that respects time, plus automatic contradiction detection — if a preference changes, the old memory is replaced, not stacked. Three REST endpoints, no SDK to learn. Decay runs at query time, no batch jobs.
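A toy version of that replace-not-stack rule, with the similarity function injected as a parameter (in practice it would be cosine similarity over embeddings; the threshold and logic here are illustrative, not Smara's implementation):

```python
def upsert_memory(memories: list[dict], new_fact: str,
                  similarity, threshold: float = 0.85) -> list[dict]:
    """Replace a contradicted memory instead of stacking a new one.

    `similarity` is any (str, str) -> float scorer. If the new fact
    is close enough to an existing one, it overwrites it in place;
    otherwise it is appended as a fresh memory.
    """
    for mem in memories:
        if similarity(mem["fact"], new_fact) >= threshold:
            mem["fact"] = new_fact  # same topic: overwrite, don't stack
            return memories
    memories.append({"fact": new_fact})
    return memories
```

Without something like this, "prefers Python" and "prefers TypeScript" coexist in the store and the agent flip-flops depending on which one retrieval surfaces.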
## What I learned
The biggest surprise was how much a simple decay term changes the feel of agent conversations. With flat retrieval, agents feel like they're reading from a database. With decay-aware retrieval, they feel like they actually know you. Recent interactions carry more weight. Repeated topics build stronger memories. Old noise fades naturally.
The second surprise was that the cross-platform piece matters more than the memory science. Developers don't just use one AI tool — they use three or four. The siloed memory problem is what actually hurts day to day.
If you're building agents that talk to users more than once, or if you're tired of your AI tools forgetting everything between sessions — Smara has a free tier (10,000 memories, no credit card). I'm building this in public and would love feedback.
What memory solutions are you using for your agents? I'd genuinely like to know what's working for people.