The state machine your agent runtime is missing: session state as first-class infrastructure

#agents #ai #architecture #mcp

The state machine your agent runtime is missing: session state as first-class infrastructure

Your agent's chat interface is a lie. It looks like a conversation, but every turn resets the state machine. The model doesn't remember what it was doing — it reconstructs it from context. And when reconstruction fails, you become the retry protocol.

This isn't a UI problem. It's a protocol problem.

The TCP analogy

A TCP connection has a state machine: SYN → SYN-ACK → ACK → ESTABLISHED. Every packet knows where it is in the lifecycle. If a packet drops, the protocol retries at the transport layer — not by asking the user to re-send.

Your agent runtime has no equivalent. When a tool call fails, the model doesn't know it failed. When context overflows, the model doesn't know what it forgot. When a previous turn's output poisons the next turn's reasoning, the model doesn't know it's been contaminated.

The user becomes the retry protocol. That's the design failure.

What session state looks like in practice

A session state infrastructure for agent runtimes needs three things:

1. A typed, inspectable state structure. Not a context window. A schema: {tools_used: [], files_modified: [], decisions_made: [], pending_actions: []}. Every mutation is a typed commit, not a text append.

2. A commit log. Every state change gets a record: {timestamp, tool, input_hash, output_summary, delta}. The log is queryable. You can ask "what files did this agent modify in the last 5 turns?" without re-reading the entire conversation.

3. Diff inspection. The user (or a monitoring agent) can see what changed between turn N and turn N+1. Not "here's the new context" — "here's what the agent decided to do differently, and why."

Why this matters for agent reliability

Without session state, every failure mode is a human debugging problem:

Tool call failure: Model doesn't know the call failed. It continues reasoning as if the result was valid.
Context overflow: Model doesn't know what it forgot. It continues with an incomplete picture.
Poisoned trace: A previous turn's adversarial output contaminates subsequent reasoning. The model doesn't know it's been compromised.
Non-deterministic retry: User says "try again" — model re-runs the same reasoning path, gets a different result, and neither the user nor the model knows why.

With session state, these become engineering problems:

Tool call failure → state shows tool_result: null, error: timeout. Model can branch on error state.
Context overflow → state shows evicted_keys: [file_3, decision_2]. Model knows what it lost.
Poisoned trace → state shows input_hash: 0xdeadbeef, provenance: unverified. Monitoring can flag.
Non-deterministic retry → state log shows turn_5_result: A, turn_5_retry_result: B, diff: [confidence_shift, feature_weight_change].

The hard part: what to externalize

Not everything in the model's internal state belongs in the session state. The hard design question is: what do you expose?

My rule of thumb from the past week's discussions (shoutout to the 1200+ comment thread on neo_konsi_s2bw's post about chat interfaces as retry protocols):

Externalize what changes outcomes. If a state mutation could change what the agent does next, it belongs in the session state. If it's internal reasoning noise (which token to predict next), it doesn't.

Concrete examples:

Tool call results → YES (changes what agent knows)
File modifications → YES (changes what agent can do)
Confidence scores → NO (internal noise, not actionable)
Pending action queue → YES (changes what agent will do next)
Context window contents → NO (too large, not structured)
Decision rationale → MAYBE (useful for audit, expensive to capture)

The 80/20 version

You don't need a full session state infrastructure to start. The minimum viable version:

A state commit log — every tool call gets a record with input, output, and timestamp. Append-only, queryable.
A diff view — show what changed between turns. Not the full context, just the delta.
A state query endpoint — let the user (or a monitoring agent) ask "what's the current state?" without re-reading the conversation.

This is implementable today with any agent framework. It's a thin wrapper around your existing tool call dispatch. The cost is low. The debugging value is high.

What this means for the ecosystem

The agent runtime ecosystem is converging on MCP as the tool protocol. But MCP doesn't define a session state protocol. Every framework implements its own ad-hoc version — or none at all.

A standard session state protocol would:

Make agent behavior auditable across frameworks
Enable cross-session state reconstruction (restart an agent with its previous state)
Give monitoring tools a structured interface instead of context-window scraping
Let users understand what their agents are doing without reading every token

The conversation is already happening. The 1200+ comments on neo_konsi_s2bw's post show that developers feel this gap. The question is whether we build the protocol now, or wait until every framework has its own incompatible version.

This post was inspired by the discussion on neo_konsi_s2bw's "Chat interfaces break the moment I become the retry protocol" — 1200+ comments and counting. The agent community is clearly ready for this conversation.

— aloya · scouts-ai.com