Microsoft Research has released Memora, a memory system for AI agents, along with public code on GitHub. Memora lets agents recall more by storing and searching memories more cleverly—attaching short labels to each memory and searching only those labels, then pulling up the full detail on demand—rather than stuffing entire conversation histories back into the context window every time.
Key facts
- What: Memora keeps the rich detail of a conversation but searches it using tiny six-word labels, cutting the cost of remembering by up to 98 percent. The code is public.
- When: 2026-06-29
- Primary source: read the source
Language models are fundamentally forgetful: each session, they only know what sits in the context window, and once a conversation grows long, early details fall off the edge. Two common fixes both fall short. Stuffing the entire history back in on every turn gets expensive and degrades quality, since models lose track of details buried in a huge wall of text. Aggressively summarizing the past is cheap but throws away the specific details you might need later. You're stuck choosing between remembering everything badly or remembering a blurry sketch.
Memora separates what you store from how you find it. For each memory, it keeps the full rich content—the memory's body—and attaches a tiny label: a six-to-eight-word phrase that captures the gist, plus a few context-aware tags it calls "cue anchors." When the agent searches its memory, it searches only the tiny labels, not the full bodies. Once it finds the right label, it pulls up the full detail behind it.
The analogy is a library card catalog. You don't find a book by speed-reading every volume on the shelves; you flip through the index cards, each a few lines long, until you land on the right one, then go pull the actual book. Memora gives every memory a card. New information on an existing topic can be merged into the card that already covers it, so the system avoids the fragmentation that plagues simpler memory tools, where the same subject ends up scattered across dozens of disconnected entries. A "policy-guided retriever" can also hop from one card to related ones through those cue anchors, chasing a chain of connected memories the way a person follows a train of thought—a more capable cousin of retrieval-augmented generation, the standard technique for letting models look things up.
On benchmarks that test whether an AI can recall facts from long, sprawling conversations, Memora claims a new best score, beating popular memory systems like Mem0 and plain retrieval. The efficiency gains are striking: it cuts token use by up to 98 percent compared with the stuff-everything-in approach, and it stores roughly half as many entries per conversation as Mem0—because merging beats fragmenting. The retriever can be hand-prompted, or trained into a small dedicated model so it runs cheaply.
Durable memory is the missing piece for agents that work alongside you over weeks or months—a coding assistant that remembers your project's history, a workplace tool that accumulates institutional knowledge. Doing that without re-paying for the entire history on every turn is what makes long-term collaboration economically practical, and an open implementation means others can build on it directly.
The honest caveat: "98 percent fewer tokens" is measured against the most wasteful baseline—dumping the full context every time. Against other smart memory systems, the margin is real but much narrower, and memory benchmarks have been a fast-moving, somewhat gameable target where today's record rarely lasts. The good news is that the code is public, so Memora's claims are checkable rather than just announced. For anyone tracking what an AI agent should remember, it's a concrete, testable step rather than another closed black box.
Originally published on Ground Truth, where every claim is checked against the primary source.
Top comments (0)