Mukunda Rao Katta

Posted on May 25

agent-message-window: a sliding context window that never breaks a tool pair

#hermeschallenge #ai #python #agents

I was trimming a long conversation history to stay under the context window limit. Simple enough. Slice off the oldest messages, keep the tail, send the request. I had done it dozens of times.

The API returned a 400.

Error: tool_result references unknown tool_use_id "toolu_01Xz..."

I stared at it. The messages looked fine in my head. I printed them out. There was a tool_result at index 3, referencing a tool_use that I had just sliced off at index 2. The trim happened to land exactly on the boundary between a tool call and its result.

Half an hour later I understood what had happened and why naive truncation breaks. The API error message was technically correct: the tool result referenced an ID that no longer existed in the list I sent. I had made that ID disappear with a one-line slice.

I went looking for a library that handled this. I did not find one that specifically enforced the pairing invariant. So I wrote agent-message-window.

The shape of the fix

The invariant is simple: a tool_use message and its corresponding tool_result message are a pair. You can keep both or drop both. Keeping one without the other is invalid for Anthropic's API and a confusing ghost for the LLM anyway.
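The invariant is easy to express as a check. Below is a minimal sketch, not part of agent-message-window, that assumes Anthropic-style messages where `content` is either a string or a list of typed blocks. The names `find_orphans` and `_blocks` are hypothetical. The check is looser than the API: it only verifies that every tool_result refers to an earlier tool_use and that every tool_use is answered somewhere later. Anthropic additionally expects the result in the very next message.

```python
def _blocks(message):
    """Content blocks of a message; plain-string content has none."""
    content = message.get("content")
    return content if isinstance(content, list) else []


def find_orphans(messages):
    """Return (tool_use ids never answered, tool_result ids with no earlier tool_use)."""
    seen_uses, answered, orphan_results = set(), set(), []
    for message in messages:
        for block in _blocks(message):
            if block.get("type") == "tool_use":
                seen_uses.add(block["id"])
            elif block.get("type") == "tool_result":
                ref = block["tool_use_id"]
                if ref in seen_uses:
                    answered.add(ref)
                else:
                    orphan_results.append(ref)
    return sorted(seen_uses - answered), orphan_results


messages = [
    {"role": "user", "content": "What files are in /tmp?"},
    {"role": "assistant", "content": [{"type": "tool_use", "id": "toolu_01", "name": "list_files", "input": {}}]},
    {"role": "user", "content": [{"type": "tool_result", "tool_use_id": "toolu_01", "content": "[\"a.txt\", \"b.txt\"]"}]},
    {"role": "assistant", "content": "There are two files: a.txt and b.txt."},
]

print(find_orphans(messages))      # ([], [])
print(find_orphans(messages[2:]))  # ([], ['toolu_01'])  naive slice: orphaned tool_result
print(find_orphans(messages[:2]))  # (['toolu_01'], [])  tool_use never answered
```

The second call reproduces the failure from the story: slicing off the tool_use leaves a tool_result that points at nothing.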

MessageWindow wraps a list of messages and enforces this on every trim.

from agent_message_window import MessageWindow

window = MessageWindow(messages)

# trim to a fixed count
trimmed = window.trim_to(20)

# trim to a token budget (len_in_tokens is your own counter; the lib ships no tokenizer)
trimmed = window.trim_to_tokens(4096, counter_fn=len_in_tokens)

It always removes from the oldest end first. When it encounters a tool_use, it looks forward for the matching tool_result and removes both together. A half-pair is never left behind.

Here is what that looks like with a concrete example:

from agent_message_window import MessageWindow, WindowTooSmall

messages = [
    {"role": "user", "content": "What files are in /tmp?"},
    {"role": "assistant", "content": [{"type": "tool_use", "id": "toolu_01", "name": "list_files", "input": {}}]},
    {"role": "user", "content": [{"type": "tool_result", "tool_use_id": "toolu_01", "content": "[\"a.txt\", \"b.txt\"]"}]},
    {"role": "assistant", "content": "There are two files: a.txt and b.txt."},
    {"role": "user", "content": "Which is larger?"},
]

window = MessageWindow(messages)
trimmed = window.trim_to(3)
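# oldest-first: the first user message goes, then the tool_use and its tool_result
# go together. Keeping exactly 3 would have started the list with an orphaned
# tool_result, so 2 messages remain.
print(len(trimmed))  # 2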

If the window is too small to fit even one complete pair, it raises instead of silently truncating:

try:
    trimmed = window.trim_to(1)  # one message cannot hold a pair
except WindowTooSmall as e:
    print(e)
    # "Window size 1 is too small to hold the smallest tool pair (2 messages)"
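To make the oldest-first, drop-both rule concrete, here is a minimal standalone sketch of count-based trimming. It is not the library's code: `trim_oldest` and `_ids` are hypothetical names, and it assumes Anthropic-style content blocks. Unlike the library, it never raises WindowTooSmall; when a pair has to go, it simply undershoots the target.

```python
def _ids(message, block_type, key):
    """Collect `key` from every content block of the given type."""
    content = message.get("content")
    if not isinstance(content, list):
        return set()
    return {block[key] for block in content if block.get("type") == block_type}


def trim_oldest(messages, max_messages):
    """Drop from the oldest end until at most max_messages remain,
    removing each tool_use message together with its tool_result message."""
    kept = list(messages)  # work on a copy; the caller's list is untouched
    target = max(max_messages, 0)  # treat a negative target as "drop everything"
    while len(kept) > target:
        head = kept.pop(0)
        use_ids = _ids(head, "tool_use", "id")
        if not use_ids:
            continue  # plain message: nothing to pair
        # Look forward for the message that answers this tool_use and drop it too.
        for i, message in enumerate(kept):
            if _ids(message, "tool_result", "tool_use_id") & use_ids:
                del kept[i]
                break
    return kept


messages = [
    {"role": "user", "content": "What files are in /tmp?"},
    {"role": "assistant", "content": [{"type": "tool_use", "id": "toolu_01", "name": "list_files", "input": {}}]},
    {"role": "user", "content": [{"type": "tool_result", "tool_use_id": "toolu_01", "content": "[\"a.txt\", \"b.txt\"]"}]},
    {"role": "assistant", "content": "There are two files: a.txt and b.txt."},
    {"role": "user", "content": "Which is larger?"},
]

print(len(trim_oldest(messages, 4)))  # 4: only the first user message goes, the pair survives
print(len(trim_oldest(messages, 3)))  # 2: the pair goes as a unit, undershooting the target
```

Undershooting is the price of the invariant: the only way to hit exactly 3 here would be to keep a tool_result whose tool_use was dropped.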

What it does NOT do

No tokenization. trim_to_tokens accepts your counter function. Bring your own tokenizer.
No message merging or summarization. It removes messages; it does not condense them.
No awareness of system prompts or special roles beyond tool_use/tool_result. You manage those yourself.
No async API. None is needed: it is pure in-memory list manipulation.
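Because the counter is yours, token-budget trimming reduces to grouping messages into units (a tool pair or a single message) and dropping whole units from the front. The sketch below is a hypothetical illustration, not the library's trim_to_tokens: it assumes the counter takes one message and returns an int, which may not match the library's counter_fn signature, and `approx_tokens` is a rough 4-characters-per-token heuristic standing in for a real tokenizer. Where the library would raise, this sketch just returns whatever fits, possibly nothing.

```python
import json


def _ids(message, block_type, key):
    """Collect `key` from every content block of the given type."""
    content = message.get("content")
    if not isinstance(content, list):
        return set()
    return {block[key] for block in content if block.get("type") == block_type}


def group_units(messages):
    """Split messages into trimming units: a tool_use message plus the
    tool_result message right after it, or any other single message."""
    units, i = [], 0
    while i < len(messages):
        uses = _ids(messages[i], "tool_use", "id")
        if uses and i + 1 < len(messages) and uses & _ids(messages[i + 1], "tool_result", "tool_use_id"):
            units.append(messages[i:i + 2])
            i += 2
        else:
            units.append([messages[i]])
            i += 1
    return units


def trim_to_token_budget(messages, budget, counter_fn):
    """Drop whole units from the oldest end until the total fits the budget."""
    units = group_units(messages)
    costs = [sum(counter_fn(m) for m in unit) for unit in units]
    total, start = sum(costs), 0
    while total > budget and start < len(units):
        total -= costs[start]
        start += 1
    return [m for unit in units[start:] for m in unit]


def approx_tokens(message):
    # Rough heuristic (about 4 characters per token); swap in a real tokenizer.
    return len(json.dumps(message["content"])) // 4 + 1


messages = [
    {"role": "user", "content": "What files are in /tmp?"},
    {"role": "assistant", "content": [{"type": "tool_use", "id": "toolu_01", "name": "list_files", "input": {}}]},
    {"role": "user", "content": [{"type": "tool_result", "tool_use_id": "toolu_01", "content": "[\"a.txt\", \"b.txt\"]"}]},
    {"role": "assistant", "content": "There are two files: a.txt and b.txt."},
    {"role": "user", "content": "Which is larger?"},
]

# With a flat 10 tokens per message the units cost 10, 20, 10, 10.
print(len(trim_to_token_budget(messages, 40, lambda m: 10)))  # 4
print(len(trim_to_token_budget(messages, 30, lambda m: 10)))  # 2
trimmed = trim_to_token_budget(messages, 4096, approx_tokens)  # any per-message counter plugs in
```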

Inside the lib: WindowTooSmall raises instead of silently truncating

This was the most deliberate design decision.

A truncated tool exchange is worse than no tool exchange. If you silently drop a tool_use because the window was too tight, the model sees a tool_result with no context for why it was called. The model may hallucinate an explanation, or worse, build on corrupted reasoning.

So WindowTooSmall is not an edge case you work around. It is a signal that your window budget is configured too small for your tool call patterns. Better to surface that loudly than pass a subtly broken message list to the API.

In practice, this means you handle it at the call site:

try:
    trimmed = window.trim_to_tokens(2048, counter_fn=count_tokens)  # count_tokens: your own counter
except WindowTooSmall:
    # widen the budget, summarize older messages, or skip the trimming step
    trimmed = messages  # fallback: send the untrimmed list; the API rejects it if it exceeds the context limit

The exception tells you the minimum window size required for the smallest pair in the list. You can use that to set a floor on your budget.
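If you would rather compute that floor up front than catch the exception, a sketch could look like this. It assumes Anthropic-style pairs in adjacent messages and a per-message counter; `smallest_pair_cost` and `budget_with_floor` are hypothetical helpers, not library functions.

```python
def _ids(message, block_type, key):
    """Collect `key` from every content block of the given type."""
    content = message.get("content")
    if not isinstance(content, list):
        return set()
    return {block[key] for block in content if block.get("type") == block_type}


def smallest_pair_cost(messages, counter_fn):
    """Token cost of the cheapest adjacent tool_use/tool_result pair, or None if there is none."""
    costs = [
        counter_fn(a) + counter_fn(b)
        for a, b in zip(messages, messages[1:])
        if _ids(a, "tool_use", "id") & _ids(b, "tool_result", "tool_use_id")
    ]
    return min(costs) if costs else None


def budget_with_floor(requested, messages, counter_fn):
    """Raise the requested budget to at least the smallest pair's cost."""
    floor = smallest_pair_cost(messages, counter_fn)
    return requested if floor is None else max(requested, floor)


def flat_counter(message):
    return 10  # hypothetical counter: every message costs 10 tokens


messages = [
    {"role": "user", "content": "List /tmp"},
    {"role": "assistant", "content": [{"type": "tool_use", "id": "toolu_01", "name": "list_files", "input": {}}]},
    {"role": "user", "content": [{"type": "tool_result", "tool_use_id": "toolu_01", "content": "[]"}]},
]

print(budget_with_floor(5, messages, flat_counter))      # 20: raised to the pair's cost
print(budget_with_floor(100, messages, flat_counter))    # 100: already above the floor
print(budget_with_floor(5, messages[:1], flat_counter))  # 5: no pairs, no floor
```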

When this is useful

Any agent loop that trims context before sending to Anthropic (or another API with the same pairing rule).
Tool-heavy agents where a single turn can involve 5 to 10 tool calls. The longer the tool chain, the higher the chance a naive slice lands inside a pair.
Agents with a fixed token budget where you want to be exact without spending a lot of code on the pairing logic.
Tested edge cases. The library's 18 tests cover boundary conditions that are annoying to write by hand: pairs at the start, pairs at the end, back-to-back pairs, pairs inside a longer conversation, and the WindowTooSmall raise path.
Pipelines where you compose multiple context-management tools. agent-message-window does one thing and composes cleanly with token counters, stop conditions, and content builders.
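The claim that longer tool chains raise the risk can be checked with a quick count. This sketch builds a hypothetical conversation (one user prompt, k sequential tool calls, one final answer) and counts how many naive `messages[i:]` cut points start the slice with an orphaned tool_result. It assumes one tool call per assistant message and treats every cut point as equally likely, which real trimming does not; `build_tool_chain` and `unsafe_cut_share` are illustrative names.

```python
from fractions import Fraction


def build_tool_chain(k):
    """Hypothetical conversation: one user prompt, k tool rounds, one final answer."""
    msgs = [{"role": "user", "content": "Investigate the repo."}]
    for j in range(k):
        tid = f"toolu_{j:02d}"
        msgs.append({"role": "assistant", "content": [{"type": "tool_use", "id": tid, "name": "run", "input": {}}]})
        msgs.append({"role": "user", "content": [{"type": "tool_result", "tool_use_id": tid, "content": "ok"}]})
    msgs.append({"role": "assistant", "content": "Done."})
    return msgs


def starts_with_tool_result(msgs):
    """True if the first message carries a tool_result, i.e. its tool_use was cut off."""
    content = msgs[0].get("content") if msgs else None
    return isinstance(content, list) and any(b.get("type") == "tool_result" for b in content)


def unsafe_cut_share(msgs):
    """Fraction of non-trivial cut points i for which msgs[i:] orphans a tool_result."""
    cuts = range(1, len(msgs))
    unsafe = sum(1 for i in cuts if starts_with_tool_result(msgs[i:]))
    return Fraction(unsafe, len(cuts))


for k in (1, 5, 10):
    print(k, unsafe_cut_share(build_tool_chain(k)))
# 1 1/3
# 5 5/11
# 10 10/21
```

The share climbs from 1/3 toward 1/2 because every extra tool call adds two messages and exactly one unsafe cut point.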

When NOT to use it

If your agent never uses tools. Plain text conversations have no pairing constraint, so this adds nothing.
If you want to summarize or compress old context. This lib only removes messages. Summarization requires an LLM call, which is a separate concern, so a summarizing compressor needs a different approach.
If you already have a library doing this. agentfit (see siblings below) has a more opinionated multi-strategy trimmer that may cover this use case already.
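If your target API does not enforce the pairing rule. Check your provider's documentation first: OpenAI's Chat Completions API, for example, also rejects a tool message that does not follow an assistant message with the matching tool_calls. If your API really is permissive, you can still use the lib, but it will not prevent an error that would not have happened anyway.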

Install

pip install agent-message-window

Zero dependencies. Python 3.9+.

GitHub: MukundaKatta/agent-message-window

Siblings

These libraries fit naturally alongside agent-message-window in a message-pipeline stack.

Lib	Scope	Repo
agentfit	Token-aware message fitting with multiple trim strategies, more opinionated	MukundaKatta/agentfit
prompt-token-counter	Count tokens in a message list before you trim	MukundaKatta/prompt-token-counter
llm-content-blocks	Build the content block dicts that go inside each message	MukundaKatta/llm-content-blocks
llm-stop-conditions	Decide when to stop the agent loop before the window fills up	MukundaKatta/llm-stop-conditions

A common pipeline: use llm-stop-conditions to detect when you are approaching a limit, call MessageWindow.trim_to_tokens with a count from prompt-token-counter, and build the next message with llm-content-blocks.

What is next

A few things would make this more useful:

A trim_to_pairs(n) method that keeps the last n complete tool exchanges, not just messages. Useful when you want to preserve context by exchange count.
A dry-run mode that returns what would be trimmed without mutating or creating a new list. Helps with logging and debugging.
Support for parallel tool calls, where one assistant turn contains several tool_use blocks and the results come back as several tool_result blocks (or, in OpenAI's format, several tool messages). Both Anthropic's and OpenAI's APIs allow this.
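The proposed trim_to_pairs(n) does not exist yet, so here is a hypothetical standalone sketch of one way it might work: find each tool_use message followed by its matching tool_result, then cut just before the n-th most recent pair. The name `keep_last_exchanges` and the rule that nothing is dropped when there are fewer than n exchanges are assumptions, not the planned API. Because `_ids` collects sets, an assistant turn with several parallel tool_use blocks still counts as one exchange.

```python
def _ids(message, block_type, key):
    """Collect `key` from every content block of the given type."""
    content = message.get("content")
    if not isinstance(content, list):
        return set()
    return {block[key] for block in content if block.get("type") == block_type}


def keep_last_exchanges(messages, n):
    """Keep the n most recent tool exchanges and everything after them.
    If there are fewer than n exchanges, nothing is dropped."""
    if n < 1:
        raise ValueError("n must be at least 1")
    # Index of each tool_use message whose next message carries a matching tool_result.
    starts = [
        i for i in range(len(messages) - 1)
        if _ids(messages[i], "tool_use", "id") & _ids(messages[i + 1], "tool_result", "tool_use_id")
    ]
    if len(starts) < n:
        return list(messages)
    return list(messages[starts[-n]:])


def exchange(tid):
    """Hypothetical helper: one tool_use message plus its tool_result message."""
    return [
        {"role": "assistant", "content": [{"type": "tool_use", "id": tid, "name": "run", "input": {}}]},
        {"role": "user", "content": [{"type": "tool_result", "tool_use_id": tid, "content": "ok"}]},
    ]


messages = (
    [{"role": "user", "content": "Check the build."}]
    + exchange("toolu_01") + exchange("toolu_02") + exchange("toolu_03")
    + [{"role": "assistant", "content": "Done."}]
)

last_two = keep_last_exchanges(messages, 2)
print(len(last_two))                     # 5
print(last_two[0]["content"][0]["id"])   # toolu_02
print(len(keep_last_exchanges(messages, 5)))  # 8: fewer than 5 exchanges, nothing dropped
```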

For now the core invariant is covered. Never drop half a pair.

This is entry 30 in the Hermes Agent Challenge series, one library per day covering the unglamorous parts of building reliable agents.

DEV Community