This is a submission for the Hermes Agent Challenge.
My Hermes research agent was misbehaving — bad summaries, wrong citations. I suspected a tool call was returning unexpected data. But I had no log of what the tools actually returned. I was debugging from model outputs alone, which are one layer of indirection away from the real problem.
After I added tool-call-log, the next time it happened I opened the log file and saw the answer in 30 seconds.
One logger, two patterns
Pattern 1 — inline, after the call:
from tool_call_log import ToolCallLogger
logger = ToolCallLogger("logs/tools.jsonl", meta={"run_id": "run-001"})
result = web_search(query)
logger.log("web_search", {"query": query}, result=result)
Pattern 2 — context manager, auto-times the call:
with logger.record("web_search", {"query": "climate policy 2026"}) as r:
    r.result = web_search(r.args["query"])
# Duration measured automatically. Logged on exit.
Both write the same JSONL format. Both capture name, args, result, duration, call_id, error, and any metadata you attach.
The log file
Each call is one JSON line:
{"name":"web_search","args":{"query":"climate policy 2026"},"result":[{"title":"...","url":"..."}],"started_at":1716566401.2,"ended_at":1716566402.8,"call_id":"toolu_abc","error":"","meta":{"run_id":"run-001"},"duration_ms":1600.0}
Plain text. Grep-able. You can jq it, tail it during a run, diff two runs, or replay it.
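Diffing two runs needs nothing beyond the standard library, since each line parses on its own. Here is a minimal sketch, assuming the field names from the sample record above (name, args, result) and that both runs made their calls in the same order. The helpers load_jsonl and changed_results are hypothetical names for illustration, not part of tool-call-log:

```python
import json


def load_jsonl(path):
    """Parse a JSONL file into a list of dicts, skipping blank lines."""
    with open(path, encoding="utf-8") as fh:
        return [json.loads(line) for line in fh if line.strip()]


def changed_results(path_a, path_b):
    """Pair up calls by position and report those whose name and args match
    but whose result differs between the two runs."""
    changed = []
    for a, b in zip(load_jsonl(path_a), load_jsonl(path_b)):
        same_call = a["name"] == b["name"] and a["args"] == b["args"]
        if same_call and a["result"] != b["result"]:
            changed.append((a["name"], a["args"], a["result"], b["result"]))
    return changed
```

Pairing by position is the simplest choice. If your agent reorders calls between runs, keying on (name, args) instead would be more robust.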
Errors are logged too
If the tool call raises an exception, the context manager logs the error and re-raises:
with logger.record("flaky_api", {"id": 42}) as r:
    r.result = call_flaky_api(42)  # raises TimeoutError
# Logged with a non-empty error and ok=False; the exception still propagates
This is what I needed. I wasn't sure if the tool was failing silently or returning bad data. Now I can tell: ok=False means it raised; a bad result with ok=True means the tool returned garbage.
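To make the log-then-re-raise behavior concrete, here is a rough sketch of how a recorder like this could be built with contextlib. It is not the library's actual implementation: the function record, the SimpleNamespace holder, and the "TypeName: message" error format are all assumptions made for illustration.

```python
import json
import time
from contextlib import contextmanager
from types import SimpleNamespace


@contextmanager
def record(path, name, args):
    """Time a tool call, append one JSON line, and re-raise any exception."""
    r = SimpleNamespace(args=args, result=None)
    started = time.time()
    error = ""
    try:
        yield r
    except Exception as exc:
        # Assumed format: keep the exception type and message for the log.
        error = f"{type(exc).__name__}: {exc}"
        raise
    finally:
        # Runs on success and on failure, so every call leaves a record.
        ended = time.time()
        entry = {
            "name": name,
            "args": args,
            "result": r.result,
            "started_at": started,
            "ended_at": ended,
            "error": error,
            "duration_ms": (ended - started) * 1000.0,
        }
        with open(path, "a", encoding="utf-8") as fh:
            fh.write(json.dumps(entry, default=str) + "\n")
```

Writing in the finally block is the key design choice: the record is appended whether or not the body raised, and the bare raise keeps the original traceback intact for the caller.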
Read it back
from tool_call_log import load_tool_log
records = load_tool_log("logs/tools.jsonl")
failed = [r for r in records if not r.ok]
slow = sorted(records, key=lambda r: r.duration_ms or 0, reverse=True)[:5]
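Once the records are loaded, a per-tool summary is a few lines of plain Python. The sketch below works on dicts parsed straight from the JSONL lines rather than on the objects load_tool_log returns, and it assumes a call failed when its error field is non-empty. The function summarize is a hypothetical helper, not part of the library:

```python
from collections import defaultdict


def summarize(records):
    """Group records by tool name: call count, failures, mean duration."""
    stats = defaultdict(lambda: {"calls": 0, "failed": 0, "total_ms": 0.0})
    for rec in records:
        s = stats[rec["name"]]
        s["calls"] += 1
        # Assumption: a non-empty error string marks a failed call.
        s["failed"] += 1 if rec.get("error") else 0
        # Treat a missing or null duration as zero.
        s["total_ms"] += rec.get("duration_ms") or 0.0
    return {
        name: {
            "calls": s["calls"],
            "failed": s["failed"],
            "mean_ms": s["total_ms"] / s["calls"],
        }
        for name, s in stats.items()
    }
```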
Meta propagates to every record
logger = ToolCallLogger(
    "logs/tools.jsonl",
    meta={"run_id": "run-001", "agent": "research-worker"},
)
Every record gets those fields in meta. Per-call meta is merged on top, and its keys win on conflict:
logger.log("search", {"q": "foo"}, meta={"turn": 7})
# meta = {"run_id": "run-001", "agent": "research-worker", "turn": 7}
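The merge shown in that comment behaves like a shallow dict update. A minimal sketch of the idea, assuming per-call keys are layered over the logger-level ones (merge_meta is a hypothetical helper for illustration, not the library's code):

```python
def merge_meta(base, per_call=None):
    """Return a new dict: logger-level meta with per-call keys layered on top."""
    merged = dict(base)  # copy so the logger-level meta is never mutated
    merged.update(per_call or {})
    return merged


base = {"run_id": "run-001", "agent": "research-worker"}
with_turn = merge_meta(base, {"turn": 7})
overridden = merge_meta(base, {"agent": "summarizer"})
```

Here with_turn carries all three keys, while overridden keeps run_id and replaces agent. The base dict is left unchanged in both cases.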
What I log in my Hermes agent
with logger.record("search_papers", {"query": sub_task}) as r:
    papers = search_semantic_scholar(r.args["query"])
    r.result = [{"id": p.id, "title": p.title, "abstract": p.abstract} for p in papers]
Six months from now I can trace any agent output back to the exact tool call that produced it. The JSONL files are small enough to keep around — 1000 tool calls is about 500KB.
Zero dependencies
Standard library only: json, pathlib, time, dataclasses. No third-party packages.
pip install tool-call-log