This isn't a product pitch. I just want to share some real problems I ran into while building persistent memory for an AI agent, and the approach I ended up with. The code is open source — my approach might not be the best one, and I'd love to hear how others are tackling the same problems.
The Problem
When building memory for an agent, the most immediate question is: once you've stored hundreds of memories, how do you make sure the most relevant ones surface during retrieval?
Information has a shelf life. Yesterday's debug log, last week's temporary workaround, last month's architecture decision — they all have very different levels of importance. If every memory is treated equally, retrieval results get flooded with stale noise, and the actually valuable stuff gets buried.
My approach was to give memories a lifecycle — new information starts in an observation period, valuable stuff gets promoted upward, and outdated entries naturally sink to the bottom.
Three-Layer Design: Buffer → Working → Core
I settled on a three-layer structure:

- Buffer — the observation period where new memories land
- Working — entries that have proven useful and been promoted
- Core — long-term knowledge that is practically permanent
Most ephemeral information ("just ran a test", "build passed") stays in Buffer and naturally sinks. The genuinely valuable stuff floats up over time. No manual curation needed — the system filters on its own.
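As a rough sketch of the lifecycle (the names and transitions here are my own illustration, not necessarily how the repo models it), the layers can be expressed as an enum with a one-way promotion path:

```rust
// Hypothetical model of the three-layer lifecycle; the actual
// implementation in the repo may differ.
#[derive(Debug, Clone, Copy, PartialEq)]
enum Layer {
    Buffer,  // observation period for new memories
    Working, // promoted, actively useful
    Core,    // long-term, practically permanent
}

impl Layer {
    /// Move one layer up when a memory proves valuable.
    fn promote(self) -> Layer {
        match self {
            Layer::Buffer => Layer::Working,
            Layer::Working | Layer::Core => Layer::Core,
        }
    }
}

fn main() {
    let l = Layer::Buffer;
    assert_eq!(l.promote(), Layer::Working);
    assert_eq!(l.promote().promote(), Layer::Core);
    println!("ok");
}
```

Demotion isn't a state change in this design: "sinking" happens through decay at retrieval time, which is what the next section covers.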
Decay: Letting Priority Shift Over Time
Decay doesn't delete data. It adjusts retrieval ranking to reflect recency. The longer a memory goes unused, the lower it ranks in search results.
decay_score = importance × e^(−decay_rate × idle_hours / 168)
- Buffer: decay_rate = 5.0 → sinks within days of inactivity
- Working: decay_rate = 1.0 → takes weeks to noticeably drop
- Core: decay_rate = 0.01 → practically permanent
There's one special case — procedural knowledge (deployment steps, coding standards, etc.). These get a decay rate of 0.01 regardless of layer, because process knowledge shouldn't lose priority over time. It doesn't matter if you haven't looked up "how to deploy" in a month — it needs to be there when you need it.
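The formula and the procedural-knowledge override can be sketched directly (the function names are mine; the constants are the ones from the post):

```rust
// Sketch of the decay formula from the post:
// decay_score = importance * e^(-decay_rate * idle_hours / 168)
fn decay_score(importance: f64, decay_rate: f64, idle_hours: f64) -> f64 {
    // 168 hours = one week, so idle time is normalized to weeks.
    importance * (-decay_rate * idle_hours / 168.0).exp()
}

fn effective_rate(layer_rate: f64, is_procedural: bool) -> f64 {
    // Procedural knowledge keeps the near-permanent rate
    // regardless of which layer it lives in.
    if is_procedural { 0.01 } else { layer_rate }
}

fn main() {
    // A Buffer memory (rate 5.0) idle for 3 days sinks fast...
    let buffer = decay_score(1.0, 5.0, 72.0);
    // ...while a procedural memory idle for a month barely moves.
    let procedural = decay_score(1.0, effective_rate(5.0, true), 720.0);
    assert!(buffer < 0.2);
    assert!(procedural > 0.95);
    println!("buffer={buffer:.3} procedural={procedural:.3}");
}
```

Running the numbers makes the special case concrete: three idle days drop a Buffer entry to roughly 12% of its importance, while a procedural entry untouched for a month stays above 95%.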
An early mistake I made: applying uniform decay to all memories. The result was that the agent kept losing track of deployment procedures and had to ask again every time. Once I differentiated by memory type, the problem went away.
Repetition = Reinforcement
Human memory has a well-known property: repeated exposure strengthens retention. I mimicked this in the system:
The more often the same knowledge is mentioned, the more durable it becomes: its importance rises, so it still ranks well even after decay. This wasn't part of the original design; it was added after noticing in practice that the agent kept failing to recall things I'd told it multiple times.
Retrieval: Semantic + Keyword Hybrid
Storing memories is only half the problem — you also need to find them. Retrieval uses a hybrid strategy: semantic similarity over embeddings, combined with keyword matching via full-text search, with the two scores blended into the final ranking.
One real-world gotcha I ran into: short CJK queries produce unreliable embeddings. For example, searching "部署" (deploy) — the embedding model returns nearly identical similarity scores for all Chinese-language memories, making discrimination impossible. The fix was a special case: for short CJK queries, reduce the weight of semantic search and lean harder on keyword matching.
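The weight adjustment can be sketched like this (the weights, the three-character threshold, and the function names are illustrative assumptions; only the "short CJK query → keyword-heavy" rule comes from the post):

```rust
// Sketch of hybrid ranking with the short-CJK-query special case.
// Weights and thresholds are illustrative, not the repo's actual values.
fn is_cjk(c: char) -> bool {
    // CJK Unified Ideographs, basic block only (a simplification).
    ('\u{4E00}'..='\u{9FFF}').contains(&c)
}

/// Returns (semantic_weight, keyword_weight) for a query.
fn hybrid_weights(query: &str) -> (f64, f64) {
    let chars: Vec<char> = query.chars().collect();
    let short_cjk =
        !chars.is_empty() && chars.len() <= 3 && chars.iter().all(|&c| is_cjk(c));
    if short_cjk {
        (0.2, 0.8) // embeddings are unreliable here; lean on keywords
    } else {
        (0.7, 0.3) // default: semantic similarity dominates
    }
}

fn score(query: &str, semantic: f64, keyword: f64) -> f64 {
    let (ws, wk) = hybrid_weights(query);
    ws * semantic + wk * keyword
}

fn main() {
    let (ws, wk) = hybrid_weights("部署"); // short CJK query
    assert!(wk > ws);
    let (ws2, wk2) = hybrid_weights("how to deploy the service");
    assert!(ws2 > wk2);
    println!("{:.2}", score("部署", 0.5, 0.9));
}
```

The key design point is that the heuristic is query-side only: stored memories and their embeddings are untouched, so the special case costs nothing at write time.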
Why SQLite
This might be the most controversial choice, but I think it fits the use case well.
My scenario is single-agent use with hundreds to a few thousand memories. At this scale, SQLite's read/write performance is more than sufficient, and it comes with built-in SQL queries and FTS5 full-text search — no extra dependencies needed.
The end result: the entire system compiles to a single binary, runs directly on any machine, and all data lives in one .db file. Backup is cp. Migration is scp.
Of course, for scenarios with many agents writing concurrently or significantly larger data volumes, the storage choice would need to be reconsidered.
Results So Far
It's been running for a few days now, with 80+ memories distributed across the three layers. Ephemeral information in Buffer typically sinks within hours to a day, while valuable entries gradually promote to Working and Core.
One interesting case: the agent genuinely stops repeating past mistakes — because lesson-type memories are tagged with triggers, and those triggers fire automatically before related operations.
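A minimal sketch of that trigger mechanism (the struct and field names are hypothetical; the post only says lesson memories carry triggers that fire before related operations):

```rust
// Hypothetical sketch of trigger firing: lesson-type memories carry
// trigger keywords that are checked before a related operation runs.
struct Lesson {
    text: &'static str,
    triggers: &'static [&'static str],
}

/// Return the lessons whose triggers match the pending operation.
fn fired<'a>(lessons: &'a [Lesson], operation: &str) -> Vec<&'a str> {
    lessons
        .iter()
        .filter(|l| l.triggers.iter().any(|t| operation.contains(t)))
        .map(|l| l.text)
        .collect()
}

fn main() {
    let lessons = [Lesson {
        text: "run migrations before deploy",
        triggers: &["deploy"],
    }];
    let hits = fired(&lessons, "deploy to production");
    assert_eq!(hits, vec!["run migrations before deploy"]);
    println!("{hits:?}");
}
```

Because the check runs before the operation rather than during retrieval, the lesson surfaces even when the agent doesn't think to search for it.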
Open Questions
- How to organize memories at scale? 80 entries is manageable; what about 800? I've since built a self-organizing topic tree (k-means clustering), but that's a separate discussion.
- Cross-agent memory sharing — the system supports multiple agents on a single instance via namespace isolation, but how agents could safely share subsets of memory is still an open question.
- Evaluation metrics — how do you quantify "memory quality"? Right now I'm eyeballing logs, which isn't exactly scientific.
Code is on GitHub. Written in Rust, MIT licensed.
If you're working on agent memory too, I'd love to hear from you — especially around how you handle memory lifecycle management.