Most developers treat the context window like a database. It is not.
It is a working desk.
Whatever is on the desk when your agent starts a task is the totality of what it knows for that task. Previous runs, learned behaviors, prior decisions — none of that exists unless you deliberately put it on the desk before the agent starts.
This distinction changes how you debug agents.
The wrong mental model
When an agent does something unexpected, the reflex is to blame the model: "Claude hallucinated," or "the LLM made something up." Sometimes that is true.
But most of the time, the agent did exactly what a reasonable system would do given the information available to it at task start. The context was incomplete, stale, or missing the right anchors — and the agent made do.
That is not a model failure. It is a desk-layout failure.
What goes on the desk
For a production agent, the context loaded at task start typically includes:
- Identity layer — who is this agent, what is its mission, what does it never do (SOUL.md pattern)
- Current task — what specifically is being asked right now (current-task.json)
- Relevant state — the last known good state, not the full history (last-run-summary.md)
- Constraints — explicit guardrails, escalation rules, output format requirements
- Tools — only the tools relevant to this task, not everything available
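The layers above can be sketched as one load step at task start. This is a minimal illustration, not a prescribed implementation: the filenames `SOUL.md`, `current-task.json`, and `last-run-summary.md` come from the article, while `constraints.md`, the directory layout, and the function name are assumptions.

```python
import json
from pathlib import Path

def load_desk(agent_dir: str, task_tools: list[str]) -> dict:
    """Assemble the task-start context — everything the agent will know.

    Paths and the constraints filename are illustrative assumptions;
    only the three files named in the article are taken from it.
    """
    root = Path(agent_dir)
    return {
        "identity": (root / "SOUL.md").read_text(),            # who the agent is, what it never does
        "task": json.loads((root / "current-task.json").read_text()),
        "state": (root / "last-run-summary.md").read_text(),   # last known good state, not full history
        "constraints": (root / "constraints.md").read_text(),  # guardrails, escalation rules (assumed file)
        "tools": task_tools,                                   # only the tools this task needs
    }
```

Because the function returns a plain dict, the entire desk can be printed or diffed between runs, which makes the audit later in the article a one-liner.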
What you leave off matters as much as what you put on.
The common mistakes
Too much context: Dumping everything you have into the context window. The agent tries to reconcile contradictory information and picks whichever thread seems most coherent. Result: unpredictable behavior that is hard to reproduce.
Stale context: Loading state files that were last updated three days ago. The agent acts on outdated information confidently because it has no way to know the information is stale.
Missing identity anchors: Starting the task context with the task and nothing else. The agent has no behavioral constitution to reference when it hits an edge case. It improvises.
Tool sprawl: Giving the agent access to 15 tools when the task needs 3. More tools mean more surface area for the agent to wander into unintended territory.
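Tool sprawl can be curbed with a per-task allowlist: the agent receives only the tools mapped to its task type. The task-type names and tool names below are illustrative assumptions.

```python
# Assumed mapping: task type → allowed tool names (all names illustrative)
TOOL_ALLOWLISTS = {
    "research": ["web_search", "read_file", "summarize"],
    "deploy":   ["run_tests", "git_push", "notify"],
}

def tools_for(task_type: str, available: dict) -> dict:
    """Hand the agent only the tools on this task type's allowlist.

    Unknown task types get an empty toolset — failing closed beats
    handing over all 15 tools by default.
    """
    allowed = TOOL_ALLOWLISTS.get(task_type, [])
    return {name: fn for name, fn in available.items() if name in allowed}
```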
The desk audit
For any agent that is behaving unexpectedly, run this audit:
1. Print the exact context being loaded at task start.
2. Ask: if I handed this to a competent contractor with no other context, would they do what I expect?
3. If no: fix the desk layout, not the model.
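Step 1 of the audit is worth automating: render the assembled context exactly as the agent will see it, one labeled layer at a time. A small sketch; the function name and layer-header format are assumptions.

```python
import json

def print_desk(context: dict) -> str:
    """Render the exact task-start context for the contractor test.

    Reading this dump as if you were a contractor with no other
    background is the whole audit.
    """
    rendered = "\n\n".join(
        f"=== {layer.upper()} ===\n"
        + (value if isinstance(value, str) else json.dumps(value, indent=2))
        for layer, value in context.items()
    )
    print(rendered)
    return rendered
```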
Ninety percent of agent debugging resolves at step 3.
Where to go deeper
The Library at askpatrick.co includes production context templates for agents across different task types — the exact desk layouts that have held up under real workloads.
The pattern is learnable. Start with the desk.