DEV Community: Mukunda Rao Katta

Context window exceeded at turn 23. Here's how I track token usage without a tokenizer.

Mukunda Rao Katta — Mon, 25 May 2026 21:21:14 +0000

This is a submission for the Hermes Agent Challenge.

My Hermes agent's context window fills up gradually. Each turn adds messages. Tool call results add more. After 23 turns, the API returned a context length error — with no warning, no graceful handling, just an exception in the middle of a synthesis step.

My Hermes agent spent $3 before I noticed. Now it can't.

Mukunda Rao Katta — Mon, 25 May 2026 21:21:13 +0000

This is a submission for the Hermes Agent Challenge.

I ran a Hermes research agent across 50 literature review tasks and forgot to check the bill until the next morning. Three dollars gone. The agent had retried a bunch of failed web searches and each retry cost money.

The fix is obvious in hindsight: track cost as you go and stop when you hit the limit. That's agent-cost-guard.

Raise when the limit hits

from agent_cost_guard import CostGuard

guard = CostGuard(limit_usd=1.00)

# Inside your agent loop:
response = client.messages.create(model="claude-sonnet-4-5", ...)
cost = calculate_cost(response.usage)
guard.add(cost, label="research_turn")  # raises CostLimitExceeded if over $1
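
The snippet above calls calculate_cost without defining it. Here is a minimal sketch of what such a helper might look like, assuming the response's usage object exposes input_tokens and output_tokens. The Usage class and both price constants are hypothetical placeholders, not real pricing and not part of agent-cost-guard.

```python
from dataclasses import dataclass

# Hypothetical per-million-token prices in USD. Replace them with your
# provider's published rates; these numbers are placeholders only.
INPUT_USD_PER_MTOK = 3.00
OUTPUT_USD_PER_MTOK = 15.00


@dataclass
class Usage:
    """Stand-in for the usage object attached to an API response."""
    input_tokens: int
    output_tokens: int


def calculate_cost(usage: Usage) -> float:
    """Convert token counts into dollars using the placeholder rates above."""
    input_cost = usage.input_tokens * INPUT_USD_PER_MTOK / 1_000_000
    output_cost = usage.output_tokens * OUTPUT_USD_PER_MTOK / 1_000_000
    return input_cost + output_cost


cost = calculate_cost(Usage(input_tokens=2_000, output_tokens=500))
print(f"${cost:.4f}")
```

Keeping the rates in one place means a price change is a one-line edit, and the guard only ever sees dollars.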

Warn before it hits

def on_warn(w):
    log.warning(f"Cost at {w.pct_used:.0%} — ${w.total_usd:.4f} of ${w.limit_usd:.4f}")

guard = CostGuard(
    limit_usd=1.00,
    warn_at=[0.5, 0.8],
    on_warn=on_warn,
)

The callback fires once per threshold and never again unless you call guard.reset().
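
To make the fire-once behavior concrete, here is a rough sketch of how it could be implemented. MiniGuard is a hypothetical illustration, not the library's source, and its callback signature is simplified to (threshold, total).

```python
from typing import Callable


class MiniGuard:
    """Illustrative only: tracks a running total and fires each threshold once."""

    def __init__(self, limit_usd: float, warn_at: list[float],
                 on_warn: Callable[[float, float], None]) -> None:
        self.limit_usd = limit_usd
        self.total_usd = 0.0
        self._pending = sorted(warn_at)   # thresholds that have not fired yet
        self._on_warn = on_warn

    def add(self, cost: float) -> None:
        self.total_usd += cost
        pct_used = self.total_usd / self.limit_usd
        # Fire every threshold crossed by this add, then forget it.
        while self._pending and pct_used >= self._pending[0]:
            threshold = self._pending.pop(0)
            self._on_warn(threshold, self.total_usd)


fired: list[float] = []
guard = MiniGuard(limit_usd=1.00, warn_at=[0.5, 0.8],
                  on_warn=lambda threshold, total: fired.append(threshold))
guard.add(0.30)   # 30% used: nothing fires
guard.add(0.30)   # 60% used: the 0.5 threshold fires
guard.add(0.05)   # 65% used: 0.5 does not fire again
guard.add(0.20)   # 85% used: the 0.8 threshold fires
print(fired)
```

Popping a threshold as soon as it fires is what prevents repeat warnings on later adds.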

Track by label

guard.add(0.05, label="web_search")
guard.add(0.12, label="llm_synthesis")
guard.add(0.03, label="web_search")

s = guard.summary()
print(s.by_label)
# {"web_search": 0.08, "llm_synthesis": 0.12}

Now you know where the money went.

Keep going past the limit

guard = CostGuard(limit_usd=0.50, stop_on_limit=False)
guard.add(1.00)  # no exception
print(guard.ok)         # False
print(guard.remaining_usd)  # -0.5

Useful for logging-only mode when you want to measure but not block.

Check manually

guard.check()  # raises CostLimitExceeded if total > limit

Call it at checkpoints rather than after every single add.

Summary report

s = guard.summary()
print(str(s))
# Cost: $0.20 / $1.00 (20.0% used)
# Entries: 3
# Breakdown:
#   llm_synthesis: $0.12
#   web_search: $0.08

Factory with sensible defaults

from agent_cost_guard import make_cost_guard

guard = make_cost_guard(limit_usd=1.00, on_warn=on_warn)
# warn_at defaults to [0.5, 0.8]

Zero dependencies

Standard library only: dataclasses, time. Nothing else.

pip install agent-cost-guard

Repo: https://github.com/MukundaKatta/agent-cost-guard

My Hermes agent's stop condition was a 40-line if/elif chain. I replaced it with 3 lines.

Mukunda Rao Katta — Mon, 25 May 2026 21:21:12 +0000

This is a submission for the Hermes Agent Challenge.

My Hermes research agent's stop logic had grown into a 40-line if/elif block. Stop after 20 turns. Stop if cost exceeds $2. Stop if the response contains "FINAL ANSWER". Stop if the last tool called was "write_summary". Each condition was written out longhand, tested independently, and hard to reuse across different agents.

I extracted the pattern into agent-loop-stop.

Three lines replace forty

from agent_loop_stop import any_of, after_n_turns, cost_exceeds, response_contains

stopper = any_of(
    after_n_turns(20),
    cost_exceeds(2.00),
    response_contains("FINAL ANSWER"),
)

for turn in range(1, 100):
    response = call_llm(messages)
    state = {"turn": turn, "cost_usd": running_cost, "response": response.text}
    if stopper.check(state):
        break

That's it. stopper.check(state) returns True when any condition fires. The state dict can have whatever you want in it — built-in conditions read well-known keys.

All built-in conditions

after_n_turns(20)                           # state["turn"] >= 20
cost_exceeds(2.00)                          # state["cost_usd"] > 2.00
response_contains("FINAL ANSWER")          # case-insensitive substring
last_tool_was("write_summary")             # state["last_tool"] == name
custom(lambda s: s.get("retries", 0) > 3) # any callable; default avoids None > 3
always()                                   # always True (testing)
never()                                    # always False (placeholder)

Compose with operators

# Stop when either fires
c = after_n_turns(20) | cost_exceeds(1.00)

# Stop only when both fire
c = after_n_turns(10) & cost_exceeds(0.50)

# Invert
c = ~response_contains("continue")

Or use the function form:

any_of(after_n_turns(20), cost_exceeds(2.00), response_contains("done"))
all_of(after_n_turns(5), cost_exceeds(0.25))
negate(response_contains("error"))

Both styles work identically. The operator form is more concise; the function form is more explicit about what's happening.
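
For readers curious how the operator form can work, here is a minimal sketch built on Python's __or__, __and__, and __invert__ hooks. The Condition class and the helper bodies are my own illustration and reuse the article's names only for readability; the library's actual internals may differ.

```python
from typing import Any, Callable

State = dict[str, Any]


class Condition:
    """Wraps a predicate over the state dict and supports |, & and ~."""

    def __init__(self, predicate: Callable[[State], bool]) -> None:
        self._predicate = predicate

    def check(self, state: State) -> bool:
        return bool(self._predicate(state))

    def __or__(self, other: "Condition") -> "Condition":
        return Condition(lambda s: self.check(s) or other.check(s))

    def __and__(self, other: "Condition") -> "Condition":
        return Condition(lambda s: self.check(s) and other.check(s))

    def __invert__(self) -> "Condition":
        return Condition(lambda s: not self.check(s))


def after_n_turns(n: int) -> Condition:
    return Condition(lambda s: s.get("turn", 0) >= n)


def cost_exceeds(limit: float) -> Condition:
    return Condition(lambda s: s.get("cost_usd", 0.0) > limit)


def any_of(*conditions: Condition) -> Condition:
    return Condition(lambda s: any(c.check(s) for c in conditions))


either = after_n_turns(20) | cost_exceeds(1.00)
both = after_n_turns(10) & cost_exceeds(0.50)
state = {"turn": 12, "cost_usd": 0.75}
print(either.check(state), both.check(state), (~both).check(state))
```

Because every operator returns a new Condition, composed conditions can themselves be composed again.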

Diagnostic: which condition fired?

from agent_loop_stop import check_all

result = check_all(
    state,
    {
        "turn_limit": after_n_turns(20),
        "cost_limit": cost_exceeds(2.00),
        "done_signal": response_contains("FINAL ANSWER"),
    },
)

if result.stopped:
    log.info(f"Agent stopped. Reason(s): {result.triggered}")
    # ["turn_limit"] or ["done_signal"] or ["cost_limit", "done_signal"]

check_all checks every named condition individually and returns a StopResult with which ones triggered. This is what I log in my Hermes agent — if I see "turn_limit" fired instead of "done_signal", that means the agent ran out of turns without finishing.

Custom conditions

c = custom(lambda s: len(s.get("tool_calls_this_turn", [])) > 5)

Or subclass for reusable predicates:

from agent_loop_stop import StopCondition

class TokenBudgetStop(StopCondition):
    def __init__(self, limit: int):
        self._limit = limit

    def check(self, state):
        return state.get("tokens_used", 0) > self._limit

stopper = any_of(after_n_turns(20), TokenBudgetStop(4000))

Per-agent stop configs

Different Hermes agents have different stop requirements:

SUPERVISOR_STOP = any_of(
    after_n_turns(50),
    cost_exceeds(5.00),
    response_contains("SYNTHESIS COMPLETE"),
)

WORKER_STOP = any_of(
    after_n_turns(15),
    cost_exceeds(0.50),
    last_tool_was("submit_findings"),
)

Named, reusable, composable. Each agent gets its own stop config that says exactly what it means.

Zero dependencies

Standard library only: dataclasses, typing. No third-party packages.

pip install agent-loop-stop

Repo: https://github.com/MukundaKatta/agent-loop-stop

My agent kept hitting context limits. This one function fixed it.

Mukunda Rao Katta — Mon, 25 May 2026 21:21:11 +0000

This is a submission for the Hermes Agent Challenge.

My Hermes research agent was failing after about 40 turns. The cause: conversation history growing past the context window. The fix everyone reaches for is "just drop old messages" — but if you drop a tool_use without its matching tool_result, Anthropic's API rejects the whole request.

I needed something smarter. That's agent-message-trim.

One call

from agent_message_trim import trim_messages

result = trim_messages(messages, max_tokens=4000)

# Send result.messages to the model — it's safe.
response = client.messages.create(
    model="claude-sonnet-4-5",
    messages=result.messages,
    ...
)

print(f"Dropped {result.dropped_count} messages to fit")

Tool pair safety

This is the part that matters. If your history looks like this:

[
    {"role": "user", "content": "search for X"},
    {"role": "assistant", "content": [{"type": "tool_use", "id": "call_001", ...}]},
    {"role": "user", "content": [{"type": "tool_result", "tool_use_id": "call_001", ...}]},
    {"role": "assistant", "content": "Here is what I found."},
]

trim_messages never drops the tool_use without also dropping its tool_result. They move as a unit. The conversation you get back is always API-valid.
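
One way to get that "move as a unit" guarantee is to group messages into units before deciding what to drop. The sketch below assumes each tool_result message immediately follows its tool_use message. The function names are hypothetical and this is not the library's implementation.

```python
from typing import Any

Message = dict[str, Any]


def _block_types(message: Message) -> set[str]:
    """Return the content-block types in a message (empty for plain strings)."""
    content = message.get("content")
    if isinstance(content, list):
        return {block.get("type", "") for block in content if isinstance(block, dict)}
    return set()


def group_into_units(messages: list[Message]) -> list[list[Message]]:
    """Group each tool_use message with the tool_result message that follows it."""
    units: list[list[Message]] = []
    i = 0
    while i < len(messages):
        current = messages[i]
        has_next = i + 1 < len(messages)
        if ("tool_use" in _block_types(current) and has_next
                and "tool_result" in _block_types(messages[i + 1])):
            units.append([current, messages[i + 1]])   # the pair moves together
            i += 2
        else:
            units.append([current])
            i += 1
    return units


def drop_oldest_units(messages: list[Message], keep_units: int) -> list[Message]:
    """Keep only the newest keep_units units, never splitting a pair."""
    units = group_into_units(messages)
    kept = units[-keep_units:] if keep_units > 0 else []
    return [message for unit in kept for message in unit]


history = [
    {"role": "user", "content": "search for X"},
    {"role": "assistant", "content": [{"type": "tool_use", "id": "call_001"}]},
    {"role": "user", "content": [{"type": "tool_result", "tool_use_id": "call_001"}]},
    {"role": "assistant", "content": "Here is what I found."},
]
trimmed = drop_oldest_units(history, keep_units=2)
print(len(trimmed))
```

Dropping whole units rather than single messages is what keeps every surviving tool_use next to its tool_result.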

Keep your system prompt

result = trim_messages(messages, max_tokens=4000, keep_system=True)
# system-role messages are pinned — never dropped, not counted toward drop candidates

Two strategies

# Default: drop from the front (oldest messages go first)
result = trim_messages(messages, max_tokens=4000, strategy="drop_oldest")

# Keep first + last, remove from the middle
result = trim_messages(messages, max_tokens=4000, strategy="drop_middle")

drop_middle is useful when you want to keep the original task context AND the most recent exchange, but can sacrifice the middle of a long conversation.

Custom token counter

The built-in estimator is max(1, (len(text)+3)//4). Plug in your own:

import tiktoken
enc = tiktoken.encoding_for_model("gpt-4o")

result = trim_messages(
    messages,
    max_tokens=4000,
    count_tokens=lambda text: len(enc.encode(text)),
)
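
To see what the built-in estimate amounts to, the formula quoted above can be run on a few strings. It is a ceiling division by four with a floor of one token. The function name estimate_tokens is mine, chosen for illustration.

```python
def estimate_tokens(text: str) -> int:
    """Roughly one token per four characters, rounded up, never below 1."""
    return max(1, (len(text) + 3) // 4)


for sample in ["", "abcd", "abcde", "x" * 400]:
    print(len(sample), estimate_tokens(sample))
```

The estimate depends only on character count, so it will drift from a real tokenizer on code, non-English text, or long numbers. That drift is the reason to plug in a custom counter when accuracy matters.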

TrimResult tells you what happened

result = trim_messages(messages, max_tokens=4000)
result.messages        # trimmed list
result.token_count     # estimated tokens used
result.original_count  # how many messages came in
result.dropped_count   # how many were removed
result.ok              # True if nothing was dropped
result.kept_count      # len(result.messages)

Just want the list?

from agent_message_trim import trim_to_fit

trimmed = trim_to_fit(messages, max_tokens=4000)
# returns the list directly

Zero dependencies

Standard library only: json, dataclasses. Nothing else.

pip install agent-message-trim

Repo: https://github.com/MukundaKatta/agent-message-trim

Your Hermes agent's audit log is leaking customer emails. Here's a 100-line lib that fixes that.

Mukunda Rao Katta — Mon, 25 May 2026 21:21:10 +0000

This is a submission for the Hermes Agent Challenge.

I built a Hermes agent last week that takes a customer support email, decides whether it needs a refund, and either issues one or escalates to a human. Standard stuff. The agent worked. The problem started the moment I turned on audit logging.

Every run wrote a JSONL row to disk. Every row contained the full inbound message, the tool calls, the tool outputs, and the final reply. Within an hour the log had:

41 customer email addresses
7 partial credit card numbers (people paste them into support tickets, then apologize)
1 JWT from a webhook payload my agent decoded
1 leaked Stripe test key from a vendor reply
12 phone numbers
3 internal ticket IDs that should not have left the system

I was about to ship that log to S3 for run-history search. The log was also being mirrored to Sentry on error, and to a Slack channel on escalation. Three places to leak from. Zero scrubbing.

I went looking for a small lib that would clean a string before I wrote it. The options were either a paid API, a heavy NER-based PII detector, or a hand-rolled regex I would have to maintain myself. None of those fit a 200-line agent script.

So I built one. It is called agent-redact. The whole thing is around 130 lines, zero runtime dependencies, and pip-installs as agent-redact. Repo: MukundaKatta/agent-redact.

What it looks like

from agent_redact import redact

audit_line = (
    "tool=charge_customer args={'email': 'jane.doe@acme.com', "
    "'card': '4111 1111 1111 1111'} key=sk-" + "Z" * 40
)
print(redact(audit_line))

Output:

tool=charge_customer args={'email': '<email>', 'card': '<credit-card>'} key=<openai-key>

That is the default mode. One function call, one line of output. The pattern set covers email, US SSN, phone numbers, credit cards (with optional Luhn), and the common provider keys (OpenAI, Anthropic, AWS, GitHub, Google, Stripe, Slack), plus JWTs. No model call, no network, no config file.

Hash mode, when you need to keep rows distinguishable

The bigger pain with naive redaction is that you lose all join keys. If jane.doe@acme.com shows up in 30 audit rows, replacing every one with <email> means you can no longer ask "how many runs did this user trigger today" without going back to raw logs.

agent-redact ships a hash mode for exactly that case:

redact("user jane.doe@acme.com retried 3 times", mode="hash", salt="rotate-monthly")
# -> "user <email:7c3a91> retried 3 times"

Same email, same salt, same six-char tag every time. Different user, different tag. You can group, count, and filter on those tags without ever seeing the underlying address. Rotate the salt monthly and the tags rotate too.
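
A stable tag like that can be derived from a salted hash. The sketch below uses SHA-256 truncated to six hex characters. The choice of hash, the salt-joining format, and the hash_tag name are all assumptions for illustration, since the post does not say how the library computes its tags.

```python
import hashlib


def hash_tag(kind: str, value: str, salt: str, length: int = 6) -> str:
    """Return a placeholder like <email:ab12cd> derived from salt + value."""
    digest = hashlib.sha256(f"{salt}:{value}".encode("utf-8")).hexdigest()
    return f"<{kind}:{digest[:length]}>"


a = hash_tag("email", "jane.doe@acme.com", salt="rotate-monthly")
b = hash_tag("email", "jane.doe@acme.com", salt="rotate-monthly")
c = hash_tag("email", "john.roe@acme.com", salt="rotate-monthly")
d = hash_tag("email", "jane.doe@acme.com", salt="next-month")
print(a == b, a != c, a != d)
```

Six hex characters carry 24 bits, so tags are fine for grouping within a log but collisions become likely once there are a few thousand distinct values. A longer tag trades readability for fewer collisions.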

Where this fits in the rest of the stack

This is the seventh small Python lib I have shipped in the same "boring middleware for agents" family. The others compose with it directly:

agenttrace writes per-run JSONL with token counts and latency. Pipe that through agent-redact before storage. There is a 30-line example in examples/integrate_with_agenttrace.py that walks rows recursively and scrubs every string node.
agentleash writes an audit log proving an agent stayed under a USD cap. Same scrubber, same hash mode, and now the proof you keep around does not double as a PII spill.
birddog is a scraping middleware. If the scraped page is going to a downstream LLM, run redact() on the page body first so the model never sees the raw payload.

Three integration points, one function, no extra deps.

Design notes worth calling out

Two things are worth flagging if you read the source.

First, overlap resolution. The Anthropic sk-ant- prefix starts with the OpenAI sk- prefix, so a pattern written for OpenAI keys can also match an Anthropic key. Naive iteration over patterns would wrap a key twice, or wrap it with the wrong label. The fix: collect all matches across all patterns, sort by (earliest start, longest length), then walk forward and skip anything that starts before the previous match ended. Provider keys are listed before generic shapes so the tie-break goes the right way.

Second, the phone pattern needed a separator. The first version matched any 13-digit run, which meant a 16-digit credit card got partially eaten by the phone rule before the card rule ran. Requiring at least one space, dash, or + prefix in the phone pattern fixed that without losing real phone hits.
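
The overlap-resolution steps from the first note can be sketched in a few lines. The three regexes here are simplified stand-ins that I made up for the example, and redact_sketch is a hypothetical name; the library's real patterns and code may differ.

```python
import re

# Illustrative patterns only; the library's real regexes may differ.
# Specific provider keys are listed before the generic shape.
PATTERNS = [
    ("anthropic-key", re.compile(r"sk-ant-[A-Za-z0-9_-]{20,}")),
    ("openai-key", re.compile(r"sk-[A-Za-z0-9_-]{20,}")),
    ("email", re.compile(r"[\w.+-]+@[\w-]+\.[\w.-]+")),
]


def redact_sketch(text: str) -> str:
    # 1. Collect every match from every pattern.
    matches = []
    for priority, (label, pattern) in enumerate(PATTERNS):
        for m in pattern.finditer(text):
            matches.append((m.start(), -(m.end() - m.start()), priority, m.end(), label))
    # 2. Sort by earliest start, then longest length, then pattern order.
    matches.sort()
    # 3. Walk forward, skipping anything that starts inside the previous match.
    pieces, cursor = [], 0
    for start, _neg_len, _priority, end, label in matches:
        if start < cursor:
            continue
        pieces.append(text[cursor:start])
        pieces.append(f"<{label}>")
        cursor = end
    pieces.append(text[cursor:])
    return "".join(pieces)


print(redact_sketch("key=sk-ant-" + "A" * 30 + " from jane@example.com"))
# key=<anthropic-key> from <email>
```

Both key patterns match the same span here, so the sort falls through to pattern order, which is why the more specific pattern has to come first in the list.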

Try it

pip install agent-redact

from agent_redact import redact
print(redact("contact me at jane@example.com"))

Repo with all the patterns and tests: github.com/MukundaKatta/agent-redact.

If you are building a Hermes agent that touches user input, take 10 minutes this weekend and wrap your audit writer. Future-you, the one explaining the S3 bucket to your security team, will thank you.

My agent kept forgetting what it was doing. A scratchpad fixed it.

Mukunda Rao Katta — Mon, 25 May 2026 21:21:09 +0000

This is a submission for the Hermes Agent Challenge.

My Hermes research agent was asking the same questions twice. It would identify a paper, start analyzing it, then two turns later ask if anyone had studied the same topic. The context window had the answer but the agent wasn't tracking its own progress.

The fix isn't more context — it's structured working memory. That's agent-scratchpad.

The idea

A scratchpad is just a keyed dict with helpers for list building and counting. The useful part is to_text() — it renders the current state as plain text you can inject into any system prompt.

from agent_scratchpad import Scratchpad

pad = Scratchpad()
pad.set("topic", "quantum error correction")
pad.append("papers_found", "Shor 1995")
pad.append("papers_found", "Steane 1996")
pad.increment("search_count")
pad.append("hypotheses", "Surface codes may be more practical than Steane codes")

print(pad.to_text(title="Research progress"))
# Research progress:
# hypotheses:
#   - Surface codes may be more practical than Steane codes
# papers_found:
#   - Shor 1995
#   - Steane 1996
# search_count: 1
# topic: quantum error correction
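
The rendering shown above is simple enough to sketch: keys in sorted order, lists as indented bullets, scalars inline. This to_text function is my own approximation of that output format, not the library's code.

```python
from typing import Any


def to_text(data: dict[str, Any], title: str = "Scratchpad") -> str:
    """Render keys in sorted order; lists become indented bullet lines."""
    lines = [f"{title}:"]
    for key in sorted(data):
        value = data[key]
        if isinstance(value, list):
            lines.append(f"{key}:")
            lines.extend(f"  - {item}" for item in value)
        else:
            lines.append(f"{key}: {value}")
    return "\n".join(lines)


state = {
    "topic": "quantum error correction",
    "papers_found": ["Shor 1995", "Steane 1996"],
    "search_count": 1,
}
print(to_text(state, title="Research progress"))
```

Sorting the keys keeps the rendered text stable from turn to turn, so the prompt only changes where the state actually changed.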

Inject into prompts

context = pad.to_text(title="What I know so far")

response = client.messages.create(
    model="claude-sonnet-4-5",
    system=f"You are a research assistant.\n\n{context}",
    messages=messages,
)

The scratchpad goes in the system prompt. The agent can read what it's already found and not repeat itself.

All the operations

pad.set("key", value)          # set scalar
pad.get("key", default=None)   # returns a deep copy of the stored value
pad.delete("key")
pad.has("key")

pad.append("papers", "Shor 1995")   # build lists
pad.prepend("queue", "urgent item") # front-of-list
pad.extend_list("papers", [...])    # bulk append

pad.increment("search_count")   # counter (init to 0)
pad.decrement("errors")
pad.increment("cost_cents", 5)

pad.update({"a": 1, "b": 2})  # set multiple
pad.clear()

JSONL log

pad = Scratchpad("logs/scratchpad.jsonl")
pad.set("topic", "ML")
# appends {"ts": ..., "op": "set", "key": "topic", "value": "ML"}

Replay the scratchpad log to see every decision the agent made.

Save and restore

pad.save("state.json")

# Next run
pad = Scratchpad.load("state.json")

Full JSON snapshot for resuming long-running agents.

Zero dependencies

Standard library only: json, copy, time, pathlib. Nothing else.

pip install agent-scratchpad

Repo: https://github.com/MukundaKatta/agent-scratchpad

I replaced 200 lines of ad-hoc state management in my Hermes agent with one object.

Mukunda Rao Katta — Mon, 25 May 2026 21:21:08 +0000

This is a submission for the Hermes Agent Challenge.

My Hermes research agent was tracking state across 20+ variables. Turn counter. Running cost. Message history. Sub-tasks done. Sub-tasks pending. Errors per tool. Each was a standalone variable at the top of the loop, updated individually, saved separately, and restored manually after a crash.

By turn 47, the state management was 200 lines of ad-hoc code spread across the loop. I replaced it with one object.

One object for all agent state

from agent_state_bag import StateBag

state = StateBag({
    "turn": 0,
    "cost_usd": 0.0,
    "messages": [],
    "sub_tasks_done": [],
    "errors": 0,
})

for turn in range(1, 50):
    state["turn"] = turn
    state.increment("cost_usd", 0.05)
    if tool_failed:
        state.increment("errors")

    response = call_llm(state["messages"])
    state["messages"].append({"role": "assistant", "content": response.text})

StateBag is a dict wrapper with extra features. It passes through all the dict methods you expect, plus the things a long-running agent actually needs.

Snapshot before each turn

state.push_turn()   # saves current state to history

# If the turn fails badly, restore to before this turn
state.reset_to(state.last_snapshot())

At the start of each turn, push a snapshot. If something goes catastrophically wrong mid-turn, you can roll back to the clean state before that turn started.

Diff what changed

snap = state.snapshot()

# ... agent does stuff ...

changes = state.diff(snap)
# {"cost_usd": (0.15, 0.20), "messages": (old_list, new_list), "sub_tasks_done": ([], ["task1"])}

I log the diff to my trace file at the end of each turn. When reviewing a run, I can see exactly what each turn changed — without comparing full state snapshots manually.
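
A diff of that shape can be computed with a plain dictionary comparison. In the sketch below, the diff function is a hypothetical stand-alone version, and reporting a missing key as None is my simplification, not documented StateBag behavior.

```python
import copy
from typing import Any

_MISSING = object()   # sentinel for keys present on only one side


def diff(old: dict[str, Any], new: dict[str, Any]) -> dict[str, tuple[Any, Any]]:
    """Map each changed key to an (old_value, new_value) pair."""
    changes = {}
    for key in old.keys() | new.keys():
        before = old.get(key, _MISSING)
        after = new.get(key, _MISSING)
        if before != after:
            changes[key] = (
                None if before is _MISSING else before,
                None if after is _MISSING else after,
            )
    return changes


state = {"turn": 3, "cost_usd": 0.15, "sub_tasks_done": []}
snap = copy.deepcopy(state)          # deep copy so later list edits don't leak in
state["cost_usd"] = 0.20
state["sub_tasks_done"].append("task1")
print(diff(snap, state))
```

The deep copy matters: with a shallow copy, appending to sub_tasks_done would mutate the snapshot too and the change would never show up.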

Numeric helpers

state.increment("turn")               # += 1
state.increment("cost_usd", 0.05)     # += 0.05
state.decrement("retries_left")       # -= 1

Returns the new value. Initializes to 0 if the key is missing.

Persist to disk

state.save("state.json")

state = StateBag.load("state.json")

Plain JSON. Grep-able, inspectable, resumable. I save state after every successful turn. On crash, StateBag.load restores exactly where I was.

Turn history

state.push_turn()    # checkpoint this turn
# ...10 more turns...
state.history        # list of all saved snapshots
state.turn_count     # 11
state.last_snapshot() # most recent snapshot

The history grows across the run. After the run, I can replay it to see how state evolved turn by turn.

Merge from another source

# Worker agent's output state
state.merge(worker_output)   # other wins on conflict; deep-copied

In my multi-agent setup, workers return their results as dicts. The supervisor merges them into its own state.

Zero dependencies

Standard library only: json, copy, dataclasses, typing. No third-party packages.

pip install agent-state-bag

Repo: https://github.com/MukundaKatta/agent-state-bag

The two-line Hermes agent logger I wish existed a month ago

Mukunda Rao Katta — Mon, 25 May 2026 21:21:07 +0000

This is a submission for the Hermes Agent Challenge.

A month ago my Hermes agents were completely unobservable. When a run failed, I had a Python traceback and nothing else. No record of which steps had completed, how long each one had taken, what the model had said at step 4. If the process died at step 47 of 60, I had to restart from step 0.

I needed a step logger. I built a few in-line — a list of dicts that I json.dumps at the end. That works until the process dies. Then I built a file-backed version that writes on each step exit. That's the one that stuck. I packaged it as agent-step-log.

Two lines to add observability to any Hermes loop

from agent_step_log import StepLogger

log = StepLogger("runs/2026-05-24.jsonl")

# In your agent loop:
for task in tasks:
    with log.step("process_task") as step:
        step.input = task["query"]
        step.model = "hermes-3"
        result = call_hermes(task["query"])
        step.output = result
        step.cost_usd = 0.0012

When the with block exits, one JSON line is written to the file:

{
  "name": "process_task",
  "started_at": 1779638601.262,
  "duration_ms": 843,
  "run_id": "a3f7c2",
  "input": "What is the capital of France?",
  "model": "hermes-3",
  "output": "Paris",
  "cost_usd": 0.0012
}

The file is written line by line as steps complete, so tail -f works while the agent is still running.

Crash safety

The reason I went file-backed instead of in-memory is crash safety. If your Hermes agent calls an external API at step 47 and that call hangs, the Python process might eventually be killed by a timeout or OOM. With in-memory logging you lose everything. With agent-step-log, every completed step is already on disk.

log = StepLogger("runs/run.jsonl", fsync=True)

fsync=True adds a flush + fsync after each write. Slower, but the file on disk reflects every committed step even if the process is killed between steps.
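
For reference, the flush-then-fsync pattern looks roughly like this with the standard library. append_jsonl is a hypothetical helper written for this example, not the library's writer.

```python
import json
import os
import tempfile


def append_jsonl(path: str, record: dict, fsync: bool = False) -> None:
    """Append one JSON line; optionally force it to disk before returning."""
    with open(path, "a", encoding="utf-8") as f:
        f.write(json.dumps(record) + "\n")
        if fsync:
            f.flush()              # push Python's buffer to the OS
            os.fsync(f.fileno())   # ask the OS to write its cache to the device


log_path = os.path.join(tempfile.mkdtemp(), "run.jsonl")
append_jsonl(log_path, {"name": "step_1", "duration_ms": 12}, fsync=True)
append_jsonl(log_path, {"name": "step_2", "duration_ms": 30}, fsync=True)
with open(log_path, encoding="utf-8") as f:
    records = [json.loads(line) for line in f]
print([r["name"] for r in records])
```

flush alone only hands the bytes to the operating system. The fsync call is what protects against losing them if the machine, rather than just the process, goes down.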

Exceptions are captured and re-raised

If an exception fires inside a step block, the library writes the step record with an error field before re-raising:

with log.step("call_api") as step:
    step.tool_name = "fetch_prices"
    result = fetch_prices("AAPL")   # TimeoutError at step 47

# Written to disk:
# {"name": "call_api", "started_at": ..., "duration_ms": 12401, "tool_name": "fetch_prices", "error": "TimeoutError: upstream timed out"}
# Then TimeoutError propagates out of the with block normally.

This is the part that saved me the most debugging time. The step record on disk tells me exactly which step timed out, how long it waited, and which tool was being called (plus the arguments, if you set them on the step). The traceback tells me the line number. Together they close the loop without me having to add any extra try/except instrumentation.
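
The capture-and-re-raise behavior maps naturally onto contextlib.contextmanager. Below is a minimal sketch under my own assumptions: records go to an in-memory list instead of a file, and the step function and field handling are illustrative rather than the library's code.

```python
import json
import time
from contextlib import contextmanager
from types import SimpleNamespace

records: list[str] = []   # stands in for the JSONL file


@contextmanager
def step(name: str):
    fields = SimpleNamespace()          # caller attaches arbitrary attributes
    started_at = time.time()            # wall clock, for the timestamp
    t0 = time.monotonic()               # monotonic clock, for the duration
    error = None
    try:
        yield fields
    except Exception as exc:
        error = f"{type(exc).__name__}: {exc}"
        raise                           # propagate after the record is written
    finally:
        record = {
            "name": name,
            "started_at": started_at,
            "duration_ms": round((time.monotonic() - t0) * 1000),
            **vars(fields),
        }
        if error is not None:
            record["error"] = error
        records.append(json.dumps(record))


try:
    with step("call_api") as s:
        s.tool_name = "fetch_prices"
        raise TimeoutError("upstream timed out")
except TimeoutError:
    pass

print(records[-1])
```

Writing the record in the finally clause guarantees it lands whether the block succeeds or fails, while the bare raise keeps the original traceback intact.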

Read back and summarize

After a run completes (or is interrupted):

from agent_step_log import read_log, summarize_log

steps = read_log("runs/2026-05-24.jsonl")
for step in steps:
    print(f"{step.name}: {step.duration_ms}ms, ${step.cost_usd:.4f}")

summary = summarize_log("runs/2026-05-24.jsonl")
print(f"Total: {summary.step_count} steps, ${summary.total_cost_usd:.4f}")

Works with the rest of the trace toolchain

The JSONL format that agent-step-log writes is the same format that the rest of my tools read:

trace-merge — merge logs from N agents into one chronological stream
trace-filter — filter events by lane, kind, time, or any field
trace-tree — render any JSONL as an ASCII call tree
tool-call-diff — compare two runs

The chain is: instrument your Hermes agent with agent-step-log, run it, then pass the output file through whichever analysis tool you need. No schema agreement required — just JSONL with a timestamp field.

Technical notes

30 tests. Zero runtime dependencies. Python 3.10+. The step context manager uses time.monotonic() for duration and time.time() for the wall clock started_at so duration is accurate even if the system clock adjusts mid-run. The run_id is a 6-character hex snippet from os.urandom(3) — short enough to type, long enough to distinguish runs in a directory of logs.

Repo: https://github.com/MukundaKatta/agent-step-log

pip install agent-step-log

My Hermes agent called exec_shell. It shouldn't have been able to.

Mukunda Rao Katta — Mon, 25 May 2026 21:21:06 +0000

This is a submission for the Hermes Agent Challenge.

I gave my Hermes research agent a few tools: web_search, read_file, write_file. The model was only supposed to use those. Then in one run it called exec_shell. I had included that tool in the schema earlier during testing and forgot to remove it.

The call went through. The command ran. Nothing bad happened that time, but it could have.

The fix is explicit enforcement. agent-tool-whitelist blocks any tool not on the list, before the call is dispatched.

Check before dispatching

from agent_tool_whitelist import ToolWhitelist, ToolNotAllowedError

whitelist = ToolWhitelist(["web_search", "read_file", "write_file"])

for block in response.content:
    if block.type == "tool_use":
        whitelist.check(block.name)      # raises ToolNotAllowedError if blocked
        result = dispatch(block.name, block.input)

One line. If the model tries to call something you didn't approve, you get an exception before the call happens.

Filter the whole tool_use list

If you'd rather drop blocked calls silently (log them, move on):

safe_calls = whitelist.filter_calls(response.content)
# safe_calls contains only allowed tool_use blocks

for call in safe_calls:
    result = dispatch(call["name"], call.get("input", {}))

# See what was dropped
print(whitelist.denied_names)  # ["exec_shell"]

filter_calls handles both Anthropic content blocks ({"name": "..."}) and OpenAI function call format ({"function": {"name": "..."}}).

Decorator pattern

from typing import Any

from agent_tool_whitelist import tool_guard

@tool_guard(whitelist)
def dispatch(name: str, args: dict) -> Any:
    return tool_registry[name](**args)

dispatch("web_search", {"query": "..."})  # ok
dispatch("exec_shell", {"cmd": "rm -rf /"})  # ToolNotAllowedError before dispatch

The decorator wraps any dispatch function. The real dispatch code never sees the blocked name.
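
Here is a rough sketch of how a guard decorator like this can be built with functools.wraps. The guard function and NotAllowed exception are local stand-ins that I wrote for illustration, assuming the tool name is the first positional argument; they are not the library's implementation.

```python
import functools
from typing import Any, Callable


class NotAllowed(Exception):
    """Local stand-in for the library's ToolNotAllowedError."""


def guard(allowed: set[str]) -> Callable:
    """Decorator factory: reject any call whose first argument is not allowed."""
    def decorate(dispatch: Callable[..., Any]) -> Callable[..., Any]:
        @functools.wraps(dispatch)
        def wrapper(name: str, *args: Any, **kwargs: Any) -> Any:
            if name not in allowed:
                raise NotAllowed(f"Tool '{name}' is not in the allowed list")
            return dispatch(name, *args, **kwargs)
        return wrapper
    return decorate


calls: list[str] = []


@guard({"web_search", "read_file"})
def dispatch(name: str, args: dict) -> str:
    calls.append(name)              # only reached for allowed tools
    return f"ran {name}"


print(dispatch("web_search", {"query": "agents"}))
try:
    dispatch("exec_shell", {"cmd": "ls"})
except NotAllowed as exc:
    print(exc)
print(calls)
```

The check runs before the wrapped function is entered, so a blocked name never reaches the tool registry lookup.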

raise_on_deny=False for non-exception paths

whitelist = ToolWhitelist(["search"], raise_on_deny=False)
if whitelist.check("exec_shell"):
    dispatch("exec_shell", {})
else:
    logging.warning("Blocked: exec_shell")

What the error looks like

ToolNotAllowedError: Tool 'exec_shell' is not in the allowed list

Configurable:

whitelist = ToolWhitelist(
    ["web_search"],
    deny_message="Agent tried to call '{name}', which is not approved for this run",
)

Audit

blocked = whitelist.denied_names   # all blocked calls across the run
whitelist.reset_denied()           # clear

Useful for logging in a long run — at the end you can see if the model tried to escape the allowed tool set.

Dynamic whitelist

Add or remove tools at runtime:

whitelist.add("calculator")       # now allowed
whitelist.remove("write_file")    # removed mid-run

What I actually whitelist in my Hermes agent

READ_ONLY_WHITELIST = ToolWhitelist([
    "web_search",
    "read_file",
    "arxiv_search",
    "semantic_scholar_search",
])

WRITE_WHITELIST = ToolWhitelist([
    "web_search",
    "read_file",
    "write_file",
    "arxiv_search",
    "semantic_scholar_search",
])

The supervisor gets WRITE_WHITELIST. Workers get READ_ONLY_WHITELIST. Workers can search and read but can't write output files — only the supervisor can.

Zero dependencies

Standard library only: dataclasses, typing. No third-party packages.

pip install agent-tool-whitelist

Repo: https://github.com/MukundaKatta/agent-tool-whitelist

My Hermes agent ran 500 turns and cost $40 before I noticed. Now it can't.

Mukunda Rao Katta — Mon, 25 May 2026 21:21:05 +0000

This is a submission for the Hermes Agent Challenge.

My Hermes agent had a bug in its stop condition. The exit check evaluated to False every time, so the agent kept going. The tool calls chained. The loop ran. I came back to 500 turns and a $40 bill.

The fix is a hard cap with a warning before the cap hits. That's agent-turn-limit.

One line to add a hard limit

from agent_turn_limit import make_turn_counter, TurnLimitExceeded

counter = make_turn_counter(hard_limit=20)

while True:
    counter.tick()   # raises TurnLimitExceeded at turn 21
    response = client.messages.create(...)
    if is_done(response):
        break

That's the whole integration. One make_turn_counter, one tick() call per iteration. If the loop runs 21 turns, you get an explicit exception instead of a runaway process.

Warnings before the limit

The default warning thresholds are 50% and 80% of hard_limit. At those turns, your callback fires:

import logging
from agent_turn_limit import make_turn_counter, TurnWarning

def warn(w: TurnWarning) -> None:
    logging.warning(w.message)
    # "agent: turn 10/20 (10 remaining)"

counter = make_turn_counter(hard_limit=20, on_warn=warn)

You can set absolute turn numbers or fractions:

# Integer thresholds
counter = make_turn_counter(hard_limit=20, warn_at=[10, 16], on_warn=warn)

# Fraction of hard_limit (same result)
counter = make_turn_counter(hard_limit=20, warn_at=[0.5, 0.8], on_warn=warn)

This is useful in multi-agent systems where "turn 16/20" in a sub-agent should trigger a summary or escalation before the hard stop.

stop_on_limit=False for graceful handling

If you'd rather check a return value than catch an exception:

from agent_turn_limit import TurnCounter

counter = TurnCounter(hard_limit=20, stop_on_limit=False)

while True:
    if not counter.tick():
        summary = summarize_so_far()
        break
    ...

tick() returns False when the limit is hit. No exception. You handle it however you want.

Context manager form

from agent_turn_limit import TurnCounter, TurnLimitExceeded

try:
    with TurnCounter(hard_limit=10, label="summarizer") as c:
        while True:
            c.tick()
            ...
except TurnLimitExceeded as e:
    print(e)  # "summarizer: hard turn limit of 10 exceeded (turn 11)"

The label parameter makes the error message identify which agent hit the limit when you have multiple running concurrently.

TurnWarning has the data you need

@dataclass
class TurnWarning:
    turn: int          # current turn number
    hard_limit: int    # the cap you set
    remaining: int     # turns left
    message: str       # human-readable summary

Log it, emit a metric, attach it to a span — it has everything.

What this solves in a Hermes agent

My Hermes research agent runs a supervisor + worker loop. The supervisor dispatches sub-tasks to workers. Each worker has its own turn counter:

worker_counter = make_turn_counter(
    hard_limit=15,
    warn_at=[0.6, 0.85],
    on_warn=lambda w: supervisor.log(f"worker {worker_id}: {w.message}"),
    label=f"worker-{worker_id}",
)

At 60% (turn 9) the supervisor knows the worker is taking longer than expected. At 85% (turn 12) it pre-emptively summarizes what the worker has found so far. At turn 16 the worker is stopped with an exception.

The supervisor handles TurnLimitExceeded, uses the partial summary, and marks the sub-task as incomplete rather than letting it drain the full token budget on a dead end.
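
The turn numbers quoted above (60% of 15 is turn 9, 85% of 15 is turn 12) are consistent with fractions being floored to whole turns. The sketch below shows that conversion under that assumption. resolve_thresholds is a hypothetical helper, and the library may round differently.

```python
import math


def resolve_thresholds(hard_limit: int, warn_at: list[float]) -> list[int]:
    """Turn a mix of fractions and absolute turn numbers into turn numbers."""
    turns = set()
    for value in warn_at:
        if isinstance(value, float) and 0 < value < 1:
            # Fraction of the limit, floored; the epsilon absorbs float error.
            turns.add(math.floor(value * hard_limit + 1e-9))
        else:
            turns.add(int(value))
    return sorted(turns)


print(resolve_thresholds(20, [0.5, 0.8]))    # [10, 16]
print(resolve_thresholds(20, [10, 16]))      # [10, 16]
print(resolve_thresholds(15, [0.6, 0.85]))   # [9, 12]
```

Resolving thresholds to integers once, up front, lets each tick compare the current turn against a small set instead of redoing float math.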

Zero dependencies

Standard library only: dataclasses, typing. No third-party packages.

pip install agent-turn-limit

Repo: https://github.com/MukundaKatta/agent-turn-limit

My Hermes agent loop blew the context window at turn 47. llm-context-trim fixed it.

Mukunda Rao Katta — Mon, 25 May 2026 21:21:04 +0000

This is a submission for the Hermes Agent Challenge.

My Hermes research agent ran in a loop. The supervisor asked a question, a worker searched, the supervisor synthesized, repeat. After 47 turns, the API returned a context length error.

I knew this would happen eventually. I needed to trim the message list before each call, but with two hard rules: never drop the system prompt, and never drop the last two turns (the current question and the previous answer). Everything else was fair game.

That's llm-context-trim.

The problem with rolling windows

The obvious fix is a rolling window: keep the last N messages. But what should N be? If I keep 20 messages and the system prompt is 800 tokens, I still have to count tokens to know whether they fit. If a few messages carry unusually long tool call results, 20 messages can still overflow. A fixed N doesn't adapt to the actual content of the messages.

What I wanted was: keep as many middle messages as fit in the remaining budget, newest first, always guarantee system + tail.
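
To make the rolling-window problem concrete, here is a small sketch with made-up data: 30 short messages plus one long tool result. The rough_tokens helper is an assumed 4-characters-per-token estimate, not a real tokenizer, and the 4096 budget is just an example figure.

```python
def rough_tokens(text: str) -> int:
    # Rough planning heuristic (about 4 characters per token); an assumption.
    return len(text) // 4


# 30 short messages of 75 characters each
history = [{"role": "user", "content": "short question " * 5} for _ in range(30)]
# One unusually long tool result in the recent part of the conversation
history[25] = {"role": "user", "content": "x" * 40_000}

window = history[-20:]  # fixed rolling window with N = 20
window_tokens = sum(rough_tokens(m["content"]) for m in window)
budget = 4096

print(len(window), window_tokens, window_tokens > budget)
```

The window holds exactly 20 messages, yet the single long message pushes the estimate far past the budget. Counting messages says nothing about whether the call will fit.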

One function

from llm_context_trim import trim_messages

result = trim_messages(
    messages,        # the full conversation history
    max_tokens=4096, # my budget for the messages portion of the call
    keep_last=2,     # always keep the last 2 messages
)

print(f"Was {result.original_count} messages, {result.trimmed_count} removed")
print(f"~{result.estimated_tokens} tokens")

# Pass to the next LLM call
response = client.messages.create(
    model="claude-sonnet-4-6",
    messages=result.messages,
    max_tokens=1024,
)

What it keeps

In priority order:

1. System message: always kept if it's the first message with role="system". Disable with keep_system=False.
2. Last keep_last messages: always kept. Default is 2 (the current user turn and the previous assistant turn).
3. Middle messages: added newest-first until the budget runs out. Older middle messages are dropped first.
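
The priority order above can be sketched in a few lines. This is an illustrative re-implementation under stated assumptions, not the library's code: estimate uses an assumed chars / 4 + 4 heuristic, trim_sketch is a hypothetical name, and the choice to stop at the first middle message that does not fit (rather than skipping it and trying older ones) is mine.

```python
def estimate(message: dict) -> int:
    # Assumed heuristic: characters / 4 plus a small per-message overhead.
    return len(str(message["content"])) // 4 + 4


def trim_sketch(messages: list, max_tokens: int, keep_last: int = 2) -> list:
    """Keep system + tail unconditionally, then fill with middle messages newest-first."""
    has_system = bool(messages) and messages[0]["role"] == "system"
    head = messages[:1] if has_system else []
    body = messages[1:] if has_system else list(messages)
    tail = body[-keep_last:] if keep_last > 0 else []
    middle = body[: len(body) - len(tail)]

    used = sum(estimate(m) for m in head + tail)
    if used > max_tokens:
        raise ValueError(f"mandatory messages need ~{used} tokens, budget is {max_tokens}")

    kept_middle = []
    for msg in reversed(middle):      # walk the middle newest-first
        cost = estimate(msg)
        if used + cost > max_tokens:
            break                     # stop at the first message that does not fit
        kept_middle.append(msg)
        used += cost
    kept_middle.reverse()             # restore chronological order
    return head + kept_middle + tail


msgs = [{"role": "system", "content": "s" * 40}] + [
    {"role": "user" if i % 2 == 0 else "assistant", "content": f"m{i} " + "x" * 36}
    for i in range(10)
]
trimmed = trim_sketch(msgs, max_tokens=60, keep_last=2)
print([m["content"][:2] for m in trimmed])
```

With a budget of 60, the system message and the last two messages use 40 of the estimated tokens, one more middle message fits, and everything older is dropped.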

Integration in an agent loop

from llm_context_trim import trim_messages, ContextTrimError

def run_loop(system_prompt, history, new_user_msg, max_context_tokens=6000):
    history.append({"role": "user", "content": new_user_msg})

    try:
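        # The system prompt is passed separately below, so history holds no system message
        trimmed = trim_messages(history, max_tokens=max_context_tokens, keep_last=3, keep_system=False)
    except ContextTrimError as e:
        # The last 3 messages alone are over budget: lower keep_last or raise the budget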
        raise RuntimeError(f"Context too tight: {e}") from e

    response = client.messages.create(
        model="claude-sonnet-4-6",
        system=system_prompt,
        messages=trimmed.messages,
        max_tokens=1024,
    )

    history.append({"role": "assistant", "content": response.content[0].text})
    return response.content[0].text

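Anthropic's API takes the system prompt as a separate system parameter, so in this loop I never add a system message to my history list, and keep_system=False is the matching setting. If your API expects the system prompt as the first message, leave it in the list and keep the default. Either pattern works.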

Token estimation

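No tokenizer dependency. The estimate uses chars / 4 + 4 per message, based on the widely quoted rule of thumb of roughly four characters per token for English text. It is meant to err on the high side so trimming doesn't cut too close to the edge, but it is still an estimate: code, non-English text, and dense tool output can produce more tokens than chars / 4 suggests, so leave some headroom in max_tokens.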

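If you need exact token counts, measure with your model's tokenizer and choose max_tokens with that number in mind. The heuristic is also exposed as estimate_tokens, so you can compare the rough figure against the exact one: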

from llm_context_trim import estimate_tokens

rough_estimate = sum(estimate_tokens(m["content"]) for m in messages)
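
As a worked example of the chars / 4 + 4 heuristic described above, here is one possible reading of it with concrete numbers. estimate_message_tokens is a hypothetical helper written for this example, and the use of integer division is an assumption; the library's estimate_tokens may round differently.

```python
def estimate_message_tokens(content: str) -> int:
    """Assumed reading of the heuristic: floor(chars / 4) plus 4 tokens per message."""
    return len(content) // 4 + 4


conversation = [
    {"role": "user", "content": "a" * 2000},      # 2000 // 4 + 4 = 504
    {"role": "assistant", "content": "b" * 399},  # 399 // 4 + 4 = 103
]
total = sum(estimate_message_tokens(m["content"]) for m in conversation)
print(total)  # 607
```

The fixed +4 matters more than it looks: in a history made of many short messages, the per-message overhead can be a large share of the total.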

Error handling

If the system message + last keep_last messages alone already exceed max_tokens, the function raises ContextTrimError instead of returning a list that's already over budget. You get an explicit failure rather than a silent overflow:

ContextTrimError: System + last 2 messages already use ~4800 tokens
which exceeds max_tokens=4096. Increase max_tokens or reduce keep_last.
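
One way to act on the "reduce keep_last" advice is to retry with a smaller guaranteed tail. The sketch below is a hypothetical wrapper, not part of the library: trim_with_fallback takes the trim function as an argument, and fake_trim plus the local ContextTrimError class are stand-ins so the example runs without the package.

```python
class ContextTrimError(Exception):
    """Stand-in so the sketch runs without the package installed."""


def trim_with_fallback(trim, messages, max_tokens, keep_last=3, min_keep_last=1):
    """Retry with a smaller guaranteed tail until the mandatory part fits."""
    if keep_last < min_keep_last:
        raise ValueError("keep_last must be >= min_keep_last")
    last_error = None
    for k in range(keep_last, min_keep_last - 1, -1):
        try:
            return trim(messages, max_tokens=max_tokens, keep_last=k)
        except ContextTrimError as exc:
            last_error = exc  # remember the failure and try a shorter tail
    raise last_error


def fake_trim(messages, max_tokens, keep_last):
    # Toy rule for the demo: every message "costs" 100 tokens.
    if keep_last * 100 > max_tokens:
        raise ContextTrimError(f"last {keep_last} messages exceed max_tokens={max_tokens}")
    return messages[-keep_last:]


kept = trim_with_fallback(fake_trim, ["a", "b", "c", "d"], max_tokens=250, keep_last=3)
print(kept)  # ['c', 'd']
```

Whether shrinking the tail is acceptable depends on the agent. If the last three messages are all needed for the next step, failing loudly is the better outcome.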

Technical notes

19 tests. Zero runtime dependencies. Python 3.10+. The test suite covers the basic no-trim case, zero/negative budget errors, the mandatory-exceeds-budget error path, system message preservation, keep_system=False, keep_last edge cases (zero, all), order preservation after trimming, Anthropic content blocks, and TrimResult metadata correctness.

Repo: https://github.com/MukundaKatta/llm-context-trim

pip install llm-context-trim

Building LLM message lists by hand is error-prone. There's a better way.

Mukunda Rao Katta — Mon, 25 May 2026 21:21:03 +0000

This is a submission for the Hermes Agent Challenge.

Every Hermes agent I wrote had a different helper function for building the messages list. Some used append, some built dicts inline, some forgot to deep copy before passing to the API. After accidentally sharing state between two conversation forks I decided to make this its own library.

That's llm-message-builder.

Basic conversation

from llm_message_builder import MessageBuilder

messages = (
    MessageBuilder()
    .system("You are a research assistant.")
    .user("What is the boiling point of water?")
    .assistant("100°C at standard pressure.")
    .user("What about at altitude?")
    .build()
)

response = client.messages.create(
    model="claude-sonnet-4-5",
    messages=messages,
    max_tokens=1024,  # required by Anthropic's Messages API
)

Tool use (the hard part)

Building tool_use/tool_result pairs by hand is where bugs hide. The builder does it right:

messages = (
    MessageBuilder()
    .system("You are helpful.")
    .user("Search for the latest Python release.")
    .assistant_tool_use("call_1", "web_search", {"q": "Python latest release"})
    .user_tool_result("call_1", "Python 3.14 was released in 2025.")
    .assistant("The latest Python release is 3.14.")
    .build()
)

The tool_use_id in the result block always matches the id in the tool_use block. No manual string tracking.
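
For comparison, this is the kind of bug that hand-built tool messages invite. The block shapes below follow Anthropic's tool-use message format; unmatched_tool_results is a hypothetical checker written for this example, not something the builder exposes.

```python
def unmatched_tool_results(messages: list) -> list:
    """Return tool_result ids that have no earlier tool_use block with the same id."""
    seen = set()
    missing = []
    for msg in messages:
        content = msg.get("content")
        if not isinstance(content, list):
            continue  # plain string content carries no tool blocks
        for block in content:
            if block.get("type") == "tool_use":
                seen.add(block["id"])
            elif block.get("type") == "tool_result" and block["tool_use_id"] not in seen:
                missing.append(block["tool_use_id"])
    return missing


hand_built = [
    {"role": "user", "content": "Search for the latest Python release."},
    {"role": "assistant", "content": [
        {"type": "tool_use", "id": "call_1", "name": "web_search",
         "input": {"q": "Python latest release"}},
    ]},
    {"role": "user", "content": [
        # Typo: hyphen instead of underscore, so the ids no longer match
        {"type": "tool_result", "tool_use_id": "call-1",
         "content": "Python 3.14 was released in 2025."},
    ]},
]
mismatches = unmatched_tool_results(hand_built)
print(mismatches)  # ['call-1']
```

A check like this is cheap to run before every API call, whether the messages come from a builder or from hand-written dicts.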

Optional text before tool_use

builder.assistant_tool_use(
    "call_1", "web_search", {"q": "query"},
    text="Let me look that up for you.",
)
# content: [{"type": "text", "text": "..."}, {"type": "tool_use", ...}]

Tool result as error

builder.user_tool_result("call_1", "connection timed out", is_error=True)

Fork conversations without sharing state

base = MessageBuilder().system("You are a research assistant.")

# Fork — each gets its own deep copy
thread_a = base.copy().user("Question A").assistant("Answer A")
thread_b = base.copy().user("Question B").assistant("Answer B")

# base is unchanged
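
The shared-state bug that motivated copy() is easy to reproduce with plain lists. The sketch below uses only the standard library and assumes nothing about the builder's internals; it just contrasts a shallow copy with copy.deepcopy.

```python
import copy

base = [{"role": "user", "content": [{"type": "text", "text": "shared context"}]}]

shallow_fork = list(base)        # new list, but the same message dicts inside
deep_fork = copy.deepcopy(base)  # fully independent copy, nested blocks included

# Editing the shallow fork reaches through to the original message.
shallow_fork[0]["content"][0]["text"] = "edited in fork"

print(base[0]["content"][0]["text"])       # edited in fork (state leaked into base)
print(deep_fork[0]["content"][0]["text"])  # shared context
```

The leak only shows up when a message is mutated in place, which is why it tends to surface late, for example when a fork rewrites a tool result.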

Inspect

builder.count()   # 3
builder.roles()   # ["system", "user", "assistant"]
builder.last()    # {"role": "assistant", "content": "..."}
len(builder)      # 3

Extend from existing messages

# Add pre-built message dicts from another source
builder.extend(prior_conversation_messages)

Zero dependencies

Standard library only: copy, dataclasses. Nothing else.

pip install llm-message-builder

Repo: https://github.com/MukundaKatta/llm-message-builder