
Mukunda Rao Katta


My agent kept hitting context limits. This one function fixed it.

Hermes Agent Challenge Submission: Write About Hermes Agent

This is a submission for the Hermes Agent Challenge.

My Hermes research agent was failing after about 40 turns. The cause: the conversation history had grown past the context window. The fix everyone reaches for is "just drop old messages", but if you drop a tool_use without its matching tool_result, Anthropic's API rejects the whole request.

I needed something smarter. That's agent-message-trim.
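To make the failure mode concrete, here is a minimal sketch of a check for unpaired tool blocks. The helper `find_orphans` is a hypothetical name for illustration, not part of agent-message-trim, and it only checks that every id has a partner somewhere in the list; the real API is stricter about where the result must appear.

```python
from typing import Any

Message = dict[str, Any]


def _blocks(message: Message, block_type: str) -> list[dict[str, Any]]:
    """Return content blocks of the given type, or [] for plain-string content."""
    content = message.get("content")
    if not isinstance(content, list):
        return []
    return [b for b in content if isinstance(b, dict) and b.get("type") == block_type]


def find_orphans(messages: list[Message]) -> tuple[set[str], set[str]]:
    """Return (tool_use ids with no result, tool_result ids with no tool_use)."""
    used = {b["id"] for m in messages for b in _blocks(m, "tool_use")}
    answered = {b["tool_use_id"] for m in messages for b in _blocks(m, "tool_result")}
    return used - answered, answered - used


# Illustrative history; the tool name and payloads are made up for this example.
history = [
    {"role": "user", "content": "search for X"},
    {"role": "assistant", "content": [
        {"type": "tool_use", "id": "call_001", "name": "search", "input": {"q": "X"}},
    ]},
    {"role": "user", "content": [
        {"type": "tool_result", "tool_use_id": "call_001", "content": "3 hits"},
    ]},
    {"role": "assistant", "content": "Here is what I found."},
]

print(find_orphans(history))      # (set(), set())
print(find_orphans(history[2:]))  # (set(), {'call_001'})
```

The second call mimics a naive "drop the two oldest messages" trim: the tool_use is gone, the tool_result is left dangling, and that is the shape of request that gets rejected.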

One call

from agent_message_trim import trim_messages

result = trim_messages(messages, max_tokens=4000)

# Send result.messages to the model; tool pairs are kept intact, so it's safe.
response = client.messages.create(
    model="claude-sonnet-4-5",
    messages=result.messages,
    # ... plus max_tokens and any other arguments you need
)

print(f"Dropped {result.dropped_count} messages to fit")

Tool pair safety

This is the part that matters. If your history looks like this:

[
    {"role": "user", "content": "search for X"},
    {"role": "assistant", "content": [{"type": "tool_use", "id": "call_001", ...}]},
    {"role": "user", "content": [{"type": "tool_result", "tool_use_id": "call_001", ...}]},
    {"role": "assistant", "content": "Here is what I found."},
]

trim_messages never drops the tool_use without also dropping its tool_result. They move as a unit. The conversation you get back is always API-valid.
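One way to picture "they move as a unit" is the sketch below. It is not the library's actual implementation: `group_units`, `drop_oldest_units`, and `estimate` are hypothetical names, and the sketch assumes the tool_result sits in the message immediately after its tool_use, as in the history above. It also ignores other validity rules such as role ordering.

```python
from typing import Any, Callable

Message = dict[str, Any]


def has_block(message: Message, block_type: str) -> bool:
    """True if the message content is a block list containing the given type."""
    content = message.get("content")
    return isinstance(content, list) and any(
        isinstance(b, dict) and b.get("type") == block_type for b in content
    )


def group_units(messages: list[Message]) -> list[list[Message]]:
    """Group each tool_use message with the tool_result message that follows it."""
    units: list[list[Message]] = []
    i = 0
    while i < len(messages):
        unit = [messages[i]]
        has_next = i + 1 < len(messages)
        if has_block(messages[i], "tool_use") and has_next and has_block(messages[i + 1], "tool_result"):
            unit.append(messages[i + 1])
        units.append(unit)
        i += len(unit)
    return units


def estimate(message: Message) -> int:
    """Rough size: about four characters of repr per token (illustrative heuristic)."""
    return max(1, (len(repr(message.get("content"))) + 3) // 4)


def drop_oldest_units(
    messages: list[Message],
    max_tokens: int,
    count: Callable[[Message], int] = estimate,
) -> list[Message]:
    """Drop whole units from the front until the estimate fits (always keeps one unit)."""
    units = group_units(messages)
    total = sum(count(m) for u in units for m in u)
    while len(units) > 1 and total > max_tokens:
        removed = units.pop(0)
        total -= sum(count(m) for m in removed)
    return [m for u in units for m in u]


history = [
    {"role": "user", "content": "search for X"},
    {"role": "assistant", "content": [{"type": "tool_use", "id": "call_001"}]},
    {"role": "user", "content": [{"type": "tool_result", "tool_use_id": "call_001"}]},
    {"role": "assistant", "content": "Here is what I found."},
]

# With one "token" per message and a budget of 2, the pair is dropped together.
print(len(drop_oldest_units(history, 2, count=lambda m: 1)))  # 1
```

Because the loop removes units rather than single messages, the output can never contain one half of a pair.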

Keep your system prompt

result = trim_messages(messages, max_tokens=4000, keep_system=True)
# system-role messages are pinned: never dropped and never considered as drop candidates

Two strategies

# Default: drop from the front (oldest messages go first)
result = trim_messages(messages, max_tokens=4000, strategy="drop_oldest")

# Keep first + last, remove from the middle
result = trim_messages(messages, max_tokens=4000, strategy="drop_middle")

drop_middle is useful when you want to keep the original task context AND the most recent exchange, but can sacrifice the middle of a long conversation.
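As a rough illustration of the idea (not the library's code), a middle-out trim can be sketched over any list of items with a size function. The name `drop_middle` here is a standalone hypothetical helper; to stay pair-safe, the items would need to be whole units (a tool_use together with its tool_result) rather than individual messages.

```python
from typing import Callable, Sequence, TypeVar

T = TypeVar("T")


def drop_middle(items: Sequence[T], budget: int, size: Callable[[T], int]) -> list[T]:
    """Keep the first and last items; remove items near the centre until within budget."""
    kept = list(items)
    total = sum(size(x) for x in kept)
    # Stop at two items so the first and last always survive.
    while len(kept) > 2 and total > budget:
        mid = len(kept) // 2
        total -= size(kept.pop(mid))
    return kept


turns = ["task", "step 1", "step 2", "step 3", "latest"]
print(drop_middle(turns, budget=3, size=lambda _: 1))  # ['task', 'step 1', 'latest']
```

The original task and the latest exchange stay put, and only the material between them is sacrificed.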

Custom token counter

The built-in estimator is max(1, (len(text)+3)//4). Plug in your own:

import tiktoken
enc = tiktoken.encoding_for_model("gpt-4o")

result = trim_messages(
    messages,
    max_tokens=4000,
    count_tokens=lambda text: len(enc.encode(text)),
)
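For reference, the built-in estimator formula quoted above is ceiling division by four with a floor of one. Written out as a standalone function (the name `estimate_tokens` is mine, not the library's), it behaves like this:

```python
def estimate_tokens(text: str) -> int:
    """Ceiling of len(text) / 4, with a floor of 1 so an empty string still counts."""
    return max(1, (len(text) + 3) // 4)


for sample in ["", "hi", "abcd", "abcde", "search for X"]:
    print(repr(sample), estimate_tokens(sample))
# '' 1
# 'hi' 1
# 'abcd' 1
# 'abcde' 2
# 'search for X' 3
```

It is a character-count heuristic, so a real tokenizer like the one plugged in above will give different numbers on code, non-English text, or long identifiers.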

TrimResult tells you what happened

result = trim_messages(messages, max_tokens=4000)
result.messages        # trimmed list
result.token_count     # estimated tokens used
result.original_count  # how many messages came in
result.dropped_count   # how many were removed
result.ok              # True if nothing was dropped
result.kept_count      # len(result.messages)

Just want the list?

from agent_message_trim import trim_to_fit

trimmed = trim_to_fit(messages, max_tokens=4000)
# returns the list directly

Zero dependencies

Standard library only: json, dataclasses. Nothing else.

pip install agent-message-trim

Repo: https://github.com/MukundaKatta/agent-message-trim
