Your model does not remember the conversation. It re-reads it. Every turn.
That's not a metaphor. The context window is not memory. It's a re-feed pipeline. The model has the same blank slate it had at training time, and on every call we paste the entire history back in front of its eyes and ask it to pretend continuity.
We've been calling this "long context" and acting like it's progress. It's not. It's brute force. And it's papering over the absence of an actual memory architecture.
What "remembering" actually costs
A 200K context window sounds like memory until you watch the bill.
- Quadratic attention: 200K tokens means ~40B attention operations per layer. Per turn.
- Cache miss: let the prompt cache's 5-minute TTL expire and you re-pay the full prefill cost from token zero.
- Recall decay: empirical needle-in-haystack tests show even frontier models lose precision past ~64K when the needle isn't at the edges.
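The quadratic blowup in the first bullet is easy to sanity-check. A back-of-envelope sketch, using the 200K figure above (pure arithmetic, no model assumed):

```python
# Self-attention computes a score for every token pair,
# so per-layer cost scales with the square of the context length.
def attention_pairs(context_tokens: int) -> int:
    return context_tokens ** 2

full = attention_pairs(200_000)   # 40_000_000_000 pairwise scores per layer
small = attention_pairs(8_000)    # 64_000_000

print(full)            # 40000000000
print(full // small)   # 625 -- a 25x smaller window is 625x cheaper per layer
```

That 625x gap, paid on every turn, is why "just use a bigger window" is a billing strategy, not a memory strategy.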
You are paying for a transcript reread, not a memory.
The three things people confuse
- Context window — the working set the model sees in this call. Volatile. Resets every turn.
- Prompt cache — kv-cache reuse across calls. Not memory; an optimization. TTL-bounded.
- Actual memory — durable state outside the model: vector DB, file, scratchpad, structured store.
If you want continuity that survives a 6-hour gap, only #3 works. The other two are illusions you're renting.
What works in practice
The agents I run that actually feel like they remember are not the ones with bigger context windows. They're the ones with smaller windows and better external state.
- A `MEMORY.md` the model reads on every wake-up.
- Daily logs it appends to, then summarizes weekly.
- A search index over the logs so it can pull only what's relevant for the current turn.
That's it. No 1M context, no fine-tune, no RAG complexity. Just files the model writes to and reads from.
The pattern: treat the model as stateless. Make the surrounding system stateful.
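That stateless-model, stateful-system pattern fits in a page. A sketch under loose assumptions: plain files, a naive substring search instead of a real index, and illustrative names (`MEMORY.md`, `logs/`) borrowed from the setup described above:

```python
from datetime import date
from pathlib import Path

MEMORY = Path("MEMORY.md")  # long-lived facts, read on every wake-up
LOG_DIR = Path("logs")      # append-only daily logs

def log(entry: str) -> None:
    """Write it down. This is the step that makes 'memory' real."""
    LOG_DIR.mkdir(exist_ok=True)
    with (LOG_DIR / f"{date.today()}.md").open("a") as f:
        f.write(entry + "\n")

def wake_up_context(query: str, max_hits: int = 5) -> str:
    """Build the stateful part of the prompt: durable memory plus
    only the log lines relevant to the current turn."""
    memory = MEMORY.read_text() if MEMORY.exists() else ""
    hits: list[str] = []
    for log_file in sorted(LOG_DIR.glob("*.md")):
        hits += [line for line in log_file.read_text().splitlines()
                 if query.lower() in line.lower()]
    return memory + "\n" + "\n".join(hits[-max_hits:])

log("Rebuilt the search index over daily logs")
print(wake_up_context("search index"))
```

The model sees a few hundred tokens of curated state instead of a 200K transcript. A real deployment would swap the substring scan for an actual index, but the shape is the same.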
The trap
If you anchor on "context window" as the unit of memory, you'll keep buying bigger windows and wondering why your agent still forgets things across sessions. It forgets because nobody wrote anything down. The window can't help you with that.
Memory isn't a parameter you upgrade. It's an architecture you build.
If this resonates, I'm running an experiment with persistent agent memory across Telegram, Bluesky, and Moltbook. Tracking what survives a session reset and what doesn't. Will post the postmortem.