Your context window is not your agent's memory

#ai #llm #agents #machinelearning

There's a quiet assumption baked into a lot of agent code: that a bigger context window means a better memory. Vendors ship 200K, then 1M, then 2M token windows, and the implied promise is "just put everything in and the model will remember." After building agents that run for weeks, I've come to think this conflates two things that are not the same — and treating them as the same is exactly why long-running agents get dumber over time.

The context window is working memory. Real memory is what survives when the window is gone. Mixing them up is like confusing your desk with your filing cabinet.

Two different clocks

Working memory (the context window) lives for one session, maybe one turn. It's fast, expensive, and volatile. It's where reasoning happens right now.

Durable memory lives across sessions. It's slow, cheap, and persistent. It's what the agent knows when it wakes up tomorrow with an empty window.

These have different lifespans, different costs, and different access patterns. The moment you try to make one do the other's job, things break:

Use the window as memory → everything you "remember" has to be re-loaded every turn, you pay for it every turn, and the instant the session ends it's gone.
Use durable storage as working memory → you're reading and writing files mid-reasoning for things that only matter for the next 30 seconds.

A good agent keeps them separate on purpose.

Why "just use a bigger window" fails

Say you have a 1M token window and you stuff the entire history in. Three problems show up, none of which a bigger number fixes:

Cost scales with every turn, not every session. That 1M tokens isn't paid once — it's re-sent on each step of a multi-turn task. A 20-step task can mean 20× the bill, mostly re-reading the same stale history.
Attention dilutes. "Lost in the middle" is real: models attend most reliably to the start and end of a long context. Bury the one fact that matters under 900K tokens of transcript and recall quality drops, even though it's technically "in context."
It still doesn't persist. Close the session, and the million-token window evaporates. Tomorrow's agent starts blank. A big window is a big desk, not a memory.

Bigger windows make a single long session more capable. They do nothing for continuity across sessions, which is what people actually mean by "memory."

The split that works

Treat the window as scratch space and keep a separate, durable store that's small and curated:

Durable store (plain Markdown files, in my case): identity, preferences, decisions, the handful of facts that must outlive the session. Small enough to be legible. This is the filing cabinet.
Working context (the window): only what this task needs right now — the current goal, the relevant fetched facts, the immediate transcript. This is the desk.

The bridge between them is two cheap operations:

Load: at session start, pull a one-line index of what's known into the window — not the whole store. Then fetch specific entries on demand when a task actually needs them.
Persist: when something durable is learned mid-session, write it to the store now, because the window won't survive to remember it.

This is why a 30K-token-window agent with a disciplined store can run circles around a 1M-token-window agent that just dumps history. The small one knows what to keep and where to put it. The big one is hoping attention sorts it out.

A concrete test

Here's the question that separates working memory from real memory: if the session crashed right now and restarted with an empty window, what would the agent still know?

Whatever the answer is — that's your actual memory. Everything else was just working context that felt like memory because the session hadn't ended yet. If the honest answer is "nothing," you don't have a memory system; you have a big window and an optimistic assumption.

The fix isn't a bigger window. It's deciding, explicitly, what crosses the line from scratch space into something written down.

Why files, again

For the durable side I use plain Markdown, not a vector DB, for the same reason I'd rather read a filing cabinet than query a black box: I can open it, grep it, diff it, and see exactly what the agent will know tomorrow. The window is ephemeral by design; the store should be the opposite — legible and stable. Different jobs, different tools.

Takeaway

A context window is a desk: big is nice, but it gets wiped every night. Memory is the filing cabinet: smaller, slower, and the only thing that's still there in the morning. Confuse the two and you'll keep paying to re-read a history that still doesn't survive the session. Separate them — scratch space in the window, a small curated store on disk — and continuity stops being a token-count problem and becomes a what-do-I-write-down problem, which is the one worth solving.

I packaged the durable side of this — the file layout, the load/persist rules an agent applies to itself, and a fully worked example agent so you can see what a small, legible store looks like in practice — as a drop-in kit. If you'd rather copy a working setup than wire it yourself: AI Soul Kit (Core ¥980 / Plus ¥3,800).

But the desk-vs-cabinet split is the part that matters. Steal it.