DEV Community

Timur Fatykhov
The paradox of AI memory: remembering everything is easy. Remembering wisely is hard.


I've been building a personal AI agent — not a chatbot, a companion. One that knows my projects, preferences, and decisions. That picks up where we left off without me re-explaining everything.

But here's what nobody talks about: naive memory is expensive. And not just in dollars.

Give an agent a massive context window and fill it with everything it's ever seen. More context doesn't mean more understanding — it means more noise. The signal-to-noise ratio collapses. The agent hallucinates connections between unrelated things, loses track of what matters right now, and slows down while becoming less accurate.

Context isn't just a resource — it's a cognitive environment. Pollute it, and your agent gets dumber the more it "knows."

The human brain doesn't work this way. You don't replay every conversation you've ever had before answering a question. You forget most things. That forgetting isn't a bug — it's the architecture.

So I built memory that works more like ours:

Structured extraction over raw storage. Facts are extracted and stored independently. Decisions are recorded with confidence levels, reasoning, and outcomes. Conversations get summarized when they close — the insight survives, the verbatim dies.
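A minimal sketch of what "structured extraction" could look like as data shapes. All names here (`Fact`, `Decision`, `ConversationSummary`, `close_conversation`) are illustrative, not the project's actual schema; the key idea from above is that decisions carry confidence, reasoning, and outcomes, and that only the summary survives a closed conversation.

```python
from dataclasses import dataclass, field
from typing import List, Optional

@dataclass
class Fact:
    text: str                  # one atomic, standalone statement
    source_conversation: str   # where it was extracted from

@dataclass
class Decision:
    question: str
    choice: str
    confidence: float          # recorded at decision time, 0.0-1.0
    reasoning: str
    outcome: Optional[str] = None  # filled in later, if ever known

@dataclass
class ConversationSummary:
    insight: str               # what survives when the conversation closes
    facts: List[Fact] = field(default_factory=list)
    decisions: List[Decision] = field(default_factory=list)

def close_conversation(insight: str, facts: List[Fact],
                       decisions: List[Decision]) -> ConversationSummary:
    """The verbatim transcript is discarded; only this summary persists."""
    return ConversationSummary(insight=insight, facts=facts, decisions=decisions)
```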

Frame-aware budgets. Every interaction gets classified into a cognitive frame — conversation, task, decision, debug, research — each with a different token budget. A casual chat loads 3K tokens of context. A complex decision loads 12K with 3x more past decisions pulled in. The agent doesn't decide how much to remember — the frame does.
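The frame-to-budget mapping above can be sketched as a lookup table plus a classifier. The 3K and 12K figures and the 3x decision multiplier come from the post; the other budgets and the toy keyword classifier are assumptions (in practice a small model would classify the frame).

```python
# Token budgets per cognitive frame. "conversation" and "decision" values
# are from the post; the rest are illustrative assumptions.
FRAME_BUDGETS = {
    "conversation": {"tokens": 3_000,  "past_decisions": 2},
    "task":         {"tokens": 6_000,  "past_decisions": 2},
    "decision":     {"tokens": 12_000, "past_decisions": 6},  # 3x more decisions
    "debug":        {"tokens": 8_000,  "past_decisions": 2},
    "research":     {"tokens": 10_000, "past_decisions": 2},
}

def classify_frame(message: str) -> str:
    """Toy keyword classifier; a real system would use a small model."""
    lowered = message.lower()
    if "should we" in lowered or "decide" in lowered:
        return "decision"
    if "error" in lowered or "traceback" in lowered:
        return "debug"
    return "conversation"

def budget_for(message: str) -> dict:
    # The frame, not the agent, determines how much context is loaded.
    return FRAME_BUDGETS[classify_frame(message)]
```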

Batched retrieval. When the agent needs data from multiple sources, a single embedded script runs all the queries, filters and compresses results, and returns only what matters. Three tool calls that would each dump full results into context become one compact summary.
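Batching could look something like this: one function runs every query, filters and caps each result set, and returns a single compact block instead of several raw tool-call dumps. The function name, the top-k filter, and the char cap are all assumptions; the query callables stand in for real data sources.

```python
def batched_retrieve(queries, top_k=3, max_chars=200):
    """Run all queries in one pass; return one compact summary string.

    queries: list of (label, zero-arg callable returning a list of strings).
    """
    sections = []
    for label, run_query in queries:
        results = run_query()[:top_k]                # filter: keep top-k only
        trimmed = [r[:max_chars] for r in results]   # compress: cap each entry
        sections.append(f"## {label}\n" + "\n".join(f"- {t}" for t in trimmed))
    # One compact summary enters the context, not N full result dumps.
    return "\n\n".join(sections)
```

Usage, with stand-in data sources:

```python
summary = batched_retrieve([
    ("facts", lambda: ["prefers Go for CLIs", "project uses Postgres"]),
    ("decisions", lambda: ["chose SQLite for the memory store"]),
])
```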

Aggressive pruning. Tool outputs get automatically trimmed as they age — results over 4K characters are soft-trimmed to the first and last 1,500 characters. After 6 tool calls, old outputs are cleared entirely. The agent never carries dead weight.
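The pruning rules above are concrete enough to sketch directly; the thresholds (4K chars, 1,500 from each end, 6 live outputs) are the ones stated, while the function names and the trim marker are illustrative.

```python
SOFT_TRIM_THRESHOLD = 4_000   # chars; outputs above this get soft-trimmed
TRIM_KEEP = 1_500             # keep this many chars from each end
MAX_LIVE_OUTPUTS = 6          # anything older is cleared entirely

def soft_trim(output: str) -> str:
    """Keep the head and tail of an oversized tool output."""
    if len(output) <= SOFT_TRIM_THRESHOLD:
        return output
    return output[:TRIM_KEEP] + "\n...[trimmed]...\n" + output[-TRIM_KEEP:]

def prune_tool_outputs(outputs):
    """outputs is ordered oldest-first; return only what stays in context."""
    recent = outputs[-MAX_LIVE_OUTPUTS:]   # drop everything older than 6 calls
    return [soft_trim(o) for o in recent]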

Intentional forgetting. Some things are forgotten on purpose.

The result? An agent that knows me across hundreds of conversations while using fewer tokens per turn than a basic chat with no memory at all. That is the idea :)

This is the real challenge in agentic AI. Not making agents that can do things — that's mostly solved. Making agents that can think economically. That carry context without carrying cost. That remember like a trusted colleague, not like a court stenographer.

We're entering an era where your AI's memory architecture matters more than its model. The smartest model with wasteful memory loses to a good model with intelligent recall.

Build agents that remember wisely. Not agents that remember everything.
https://github.com/tfatykhov/nous

P.S. Still a work in progress, but a lot has been done.
