Today we are entering the era of context engineering, and this will probably become the most important discipline in AI-powered software.
This nails it. I've been building AI agent systems and the "retrieval is not memory" framing is exactly right. RAG gives you search, not cognition.
The layered memory model you describe is what actually works in practice - we use ephemeral conversation buffers, persistent user/domain stores, and compressed episodic summaries that decay over time. The forgetting problem is genuinely harder than remembering.
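The three layers described above can be sketched in a few lines. This is a minimal illustration, not the commenter's actual system; the class and method names (`LayeredMemory`, `episode_weight`, the exponential half-life decay) are assumptions chosen to make the idea concrete:

```python
import time

class LayeredMemory:
    """Illustrative three-layer memory: an ephemeral conversation buffer,
    a persistent user/domain store, and episodic summaries whose
    relevance decays over time (exponential half-life)."""

    def __init__(self, buffer_size=10, half_life_s=3600.0):
        self.buffer = []               # ephemeral: recent conversation turns
        self.buffer_size = buffer_size
        self.store = {}                # persistent: user/domain facts
        self.episodes = []             # episodic: (timestamp, summary) pairs
        self.half_life_s = half_life_s

    def add_turn(self, turn):
        self.buffer.append(turn)
        if len(self.buffer) > self.buffer_size:
            self.buffer.pop(0)         # oldest turns fall out first

    def remember(self, key, fact):
        self.store[key] = fact         # persists until overwritten

    def summarize_episode(self, summary, now=None):
        self.episodes.append((now if now is not None else time.time(), summary))

    def episode_weight(self, ts, now=None):
        # "forgetting": weight halves every half_life_s seconds
        age = (now if now is not None else time.time()) - ts
        return 0.5 ** (age / self.half_life_s)

    def recall_episodes(self, k=3, now=None):
        # inject only the k summaries that still carry weight
        ranked = sorted(self.episodes,
                        key=lambda e: self.episode_weight(e[0], now),
                        reverse=True)
        return [summary for _, summary in ranked[:k]]
```

The decay function is where the hard part hides: a fixed half-life is a stand-in for the real problem of scoring relevance over time, which recency alone does not solve.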
One thing I'd add: the economic moat is real. Two teams using the same base model can deliver wildly different products based purely on context architecture. The model is commoditized - the orchestration is the product.
Great framing of context engineering as backend engineering reborn. That's exactly how it feels building these systems day to day.
Thanks Chovy. Completely agree: once base models converge, context orchestration becomes the real differentiator. Same model, radically different cognition depending on memory layering and decay design. And you're right that forgetting is the unsolved frontier: relevance over time is harder than recall.
In many ways we are just building better minds around smarter models!
First of all thanks for this great article. I really enjoyed reading this.
My pleasure Sanjay 🙏🏻
"Retrieval is not memory" is the most important sentence in this post.
I've been attacking this from the data layer. The layered memory model you describe — short-term buffers, persistent domain stores, compressed episodic summaries — is exactly right, but it assumes the underlying data is already structured well enough to layer. Most of it isn't.
What I built: Compressed Knowledge Graph files in `.md` format. Not RAG chunks. Not documents. Actual graph-structured files built from public sources — SEC EDGAR, USPTO, GDELT, federal spending data — where entities, relationships, and temporal context are already encoded before they ever touch a model.

The result is that the "decide what to inject and when" problem gets simpler because the file itself carries the hierarchy. An agent reading it doesn't need a supervisor pruning noise — the noise isn't there.
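To make the idea concrete, here is one hypothetical shape such a file could take and a parser for it. The line format, sample entities, and `parse_edges` helper below are all invented for illustration; the commenter's actual format is not shown in the thread:

```python
import re

# Hypothetical edge syntax for a graph-structured .md file (illustrative,
# not the commenter's real format):
#   - (Subject) -[relation, year]-> (Object)
SAMPLE = """\
# AcmeCorp
- (AcmeCorp) -[filed_patent, 2021]-> (US1234567)
- (AcmeCorp) -[acquired, 2023]-> (WidgetInc)
"""

EDGE = re.compile(r"- \((.+?)\) -\[(.+?), (\d{4})\]-> \((.+?)\)")

def parse_edges(text):
    """Return (subject, relation, year, object) tuples. The structure is
    already in the file, so nothing needs chunking or pruning at read time."""
    return [(s, r, int(y), o) for s, r, y, o in EDGE.findall(text)]
```

The point of the sketch is the contrast with RAG chunks: the entity, the relation, and the temporal context arrive pre-encoded, so the "what to inject" decision reduces to selecting subgraphs rather than filtering free text.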
Tested on 1,000 databases. 10x fewer tokens, 170x reasoning density. Happy to share examples if useful — I'm specifically looking for teams working on the memory architecture problem in legal, pharma, or finance where the public data is rich and the signal extraction problem is hardest.