This is a submission for the Hermes Agent Challenge.
Last week I wrote about trace-merge, which takes N agent JSONL logs and stitches them into one chronological stream. It works. But the merged file from a three-agent run with a few hundred steps has somewhere around 5000 events in it. That file is not debuggable by eye.
What I actually wanted was: show me only the events from worker2, between t=1.2s and t=3.8s, where the event had an error field. That's a query. I needed a filter.
The problem with grep + jq pipelines
My first attempt was a shell one-liner:
cat merged.jsonl | jq 'select(.lane == "worker2" and .error != null)'
This works once. The second time I need a slightly different query I write a slightly different one-liner. Three runs in, I have three different one-liners and no memory of which one was for which debugging session. The fourth time, I want to combine lane + time range + kind, and the jq expression is getting long enough that I start making typos.
The pattern I wanted was composable predicates that I could mix and match, with a CLI for the common cases. That is trace-filter.
Basic usage
from trace_filter import filter_trace, load_jsonl
events = load_jsonl("merged.jsonl")
# Simple keyword filters (all ANDed)
result = filter_trace(
    events,
    lane="worker2",
    kind="tool_call",
    after=1779638601.2,
    before=1779638603.8,
)
print(f"{len(result)} matching events")
CLI version of the same query:
python3 -m trace_filter merged.jsonl \
    --lane worker2 \
    --kind tool_call \
    --after 1779638601.2 \
    --before 1779638603.8
Composable predicates
For anything more complex, you build predicates and combine them:
from trace_filter import (
    filter_trace,
    all_of,
    any_of,
    lane_is,
    kind_is,
    has_error,
    negate,
    field_contains,
)
# worker2 events that are tool calls or successful tool results, with no error set
result = filter_trace(events, predicate=all_of(
    lane_is("worker2"),
    any_of(kind_is("tool_call"), kind_is("tool_ok")),
    negate(has_error()),
))
Every predicate is a Callable[[dict], bool]. They compose cleanly because the types are trivial. all_of() is just all(p(event) for p in preds). No magic, no DSL.
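To make the "no magic" claim concrete, here is a minimal sketch of how combinators over Callable[[dict], bool] can be written. These are simplified stand-ins, not the library's actual source: kind_is here only checks the kind field, and lane_is uses .get() so that events without a lane simply do not match.

```python
from typing import Callable

Predicate = Callable[[dict], bool]


def lane_is(name: str) -> Predicate:
    # Stand-in leaf predicate: match on the "lane" field.
    return lambda event: event.get("lane") == name


def kind_is(name: str) -> Predicate:
    # Simplified stand-in: only looks at "kind", not event_type or type.
    return lambda event: event.get("kind") == name


def all_of(*preds: Predicate) -> Predicate:
    # AND: every predicate must accept the event.
    return lambda event: all(p(event) for p in preds)


def any_of(*preds: Predicate) -> Predicate:
    # OR: at least one predicate must accept the event.
    return lambda event: any(p(event) for p in preds)


def negate(pred: Predicate) -> Predicate:
    # NOT: invert a single predicate.
    return lambda event: not pred(event)


events = [
    {"lane": "worker1", "kind": "tool_call"},
    {"lane": "worker2", "kind": "tool_call"},
    {"lane": "worker2", "kind": "session_start"},
]

# Combinators return predicates, so they nest to any depth.
query = all_of(lane_is("worker2"), negate(kind_is("session_start")))
matches = [e for e in events if query(e)]
```

Because each combinator returns another predicate, a saved query is just a value that can be named, reused, and nested inside a larger one.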
The predicate table
| Predicate | What it does |
|---|---|
| lane_is(name) | event["lane"] == name |
| kind_is(name) | matches the kind, event_type, or type field |
| field_equals(key, value) | equality on any field, supports dot-notation (meta.tool) |
| field_contains(key, substr) | substring match on any string field |
| after_ts(ts) | event timestamp >= ts (seconds since epoch) |
| before_ts(ts) | event timestamp < ts |
| has_error() | truthy error field |
| all_of(*preds) | AND |
| any_of(*preds) | OR |
| negate(pred) | NOT |
The dot-notation in field_equals was a small thing I added after needing it once: my Hermes agents log structured metadata like {"meta": {"tool": "search_web"}} and I wanted to filter on meta.tool without flattening the whole event first.
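A dotted lookup like this can be sketched in a few lines. The helper name get_path and the _MISSING sentinel are hypothetical, introduced only for illustration, and field_equals below is a stand-in rather than the library's implementation. The sketch assumes a missing or non-dict parent should mean "no match" rather than an exception.

```python
from typing import Any, Callable

# Sentinel so that a stored None can be told apart from an absent field.
_MISSING = object()


def get_path(event: dict, dotted_key: str) -> Any:
    """Walk a dotted key such as 'meta.tool'; return _MISSING if any step fails."""
    current: Any = event
    for part in dotted_key.split("."):
        # Stop if the parent is not a dict (e.g. a string) or lacks the key.
        if not isinstance(current, dict) or part not in current:
            return _MISSING
        current = current[part]
    return current


def field_equals(key: str, value: Any) -> Callable[[dict], bool]:
    # Stand-in predicate: equality on a possibly nested field.
    def pred(event: dict) -> bool:
        found = get_path(event, key)
        return found is not _MISSING and found == value
    return pred


is_search = field_equals("meta.tool", "search_web")
nested = {"meta": {"tool": "search_web"}}
flat_meta = {"meta": "search_web"}  # parent exists but is a string, not a dict
```

With this shape, is_search(nested) matches, while flat_meta and an empty event are rejected without raising.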
Timestamp handling
trace-filter recognizes the same timestamp shapes as trace-merge: float seconds, int seconds, int milliseconds (heuristic: anything above 1e12), and ISO 8601 strings. The key is auto-detected from ts, timestamp, or time on each event.
This matters because three of the libraries I use for agent logging do not agree on timestamp format. trace-filter normalizes them all so --after and --before work across mixed-format files without any preprocessing step.
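The normalization described above could look roughly like the sketch below. The function names to_epoch_seconds and event_ts are hypothetical, and two choices are assumptions of mine rather than documented behavior: the millisecond heuristic is applied to ints only, and ISO strings without an offset are treated as UTC.

```python
from datetime import datetime, timezone
from typing import Optional

TS_KEYS = ("ts", "timestamp", "time")  # keys checked, in this order
MS_THRESHOLD = 1e12  # ints above this are treated as milliseconds


def to_epoch_seconds(value) -> Optional[float]:
    """Convert a supported timestamp shape to float seconds since the epoch."""
    if isinstance(value, bool):  # bool is a subclass of int; reject it
        return None
    if isinstance(value, int):
        return value / 1000.0 if value > MS_THRESHOLD else float(value)
    if isinstance(value, float):
        return value
    if isinstance(value, str):
        try:
            # fromisoformat() before Python 3.11 rejects a trailing "Z"
            # and only handles a subset of ISO 8601.
            parsed = datetime.fromisoformat(value.replace("Z", "+00:00"))
        except ValueError:
            return None
        if parsed.tzinfo is None:
            parsed = parsed.replace(tzinfo=timezone.utc)  # assumption: naive means UTC
        return parsed.timestamp()
    return None


def event_ts(event: dict) -> Optional[float]:
    """Return the normalized timestamp from the first recognized key, if any."""
    for key in TS_KEYS:
        if key in event:
            return to_epoch_seconds(event[key])
    return None
```

Once every event resolves to float seconds, a range check needs only one comparison path, whichever logger produced the line.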
Writing filtered results
If I want to keep the filtered slice for later:
from trace_filter import write_jsonl
n = write_jsonl(result, "worker2_errors.jsonl")
Or from the CLI:
python3 -m trace_filter merged.jsonl --errors -o errors.jsonl
Then I can feed errors.jsonl into trace-tree to see the call tree for just those events, or into tool-call-diff to compare the error pattern across two runs.
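Handing slices between tools works because JSONL is just one JSON object per line. The sketch below shows a round trip with stand-in versions of load_jsonl and write_jsonl. It assumes write_jsonl returns the number of events written, which the n in the snippet above suggests but which I have not confirmed against the library.

```python
import json
import os
import tempfile
from typing import Iterable


def load_jsonl(path: str) -> list[dict]:
    # Stand-in reader: one JSON object per non-blank line.
    events = []
    with open(path, encoding="utf-8") as fh:
        for line in fh:
            line = line.strip()
            if line:
                events.append(json.loads(line))
    return events


def write_jsonl(events: Iterable[dict], path: str) -> int:
    # Stand-in writer: returns how many events were written (assumed behavior).
    count = 0
    with open(path, "w", encoding="utf-8") as fh:
        for event in events:
            fh.write(json.dumps(event) + "\n")
            count += 1
    return count


sample = [{"lane": "worker2", "error": "timeout"}, {"lane": "worker1"}]

# Write a slice to a temporary file and read it back unchanged.
with tempfile.TemporaryDirectory() as tmp:
    path = os.path.join(tmp, "slice.jsonl")
    written = write_jsonl(sample, path)
    round_tripped = load_jsonl(path)
```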
Where it fits in the chain
I have a small toolchain now that I use across my Hermes multi-agent runs:
- Agents write JSONL logs (agenttrace, agentleash, or my own scripts)
- trace-merge stitches them into one chronological stream
- trace-filter drills down to the events I care about
- trace-tree renders the filtered slice as a readable tree
- tool-call-diff compares filtered slices across runs
Each tool reads and writes plain JSONL, so I can pipe them. The filter step is the piece that keeps the later steps from being overwhelmed by noise.
Technical notes
58 tests. Zero runtime dependencies. Python 3.10+. About 200 lines of library code (predicates + filter engine), another 80 lines of CLI wrapper.
The test file for predicates goes through every leaf predicate and every combinator, including the edge cases that caught me during development: has_error() with a falsy error field (empty string vs. None vs. missing), before_ts being strictly less-than rather than less-than-or-equal (which matters when two events share the same timestamp), and the dot-notation case where the parent field exists but is a string rather than a dict.
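Two of those edge cases are easy to show with stand-in predicates. This sketch is not the library's code or its test file, and it assumes timestamps are already float seconds under ts. The helper window is hypothetical. The point of the strict before_ts is that adjacent windows form half-open intervals, so events sharing a boundary timestamp land in exactly one window.

```python
from typing import Callable

Predicate = Callable[[dict], bool]


def has_error() -> Predicate:
    # Truthy check: "", None, and a missing field all count as "no error".
    return lambda event: bool(event.get("error"))


def after_ts(ts: float) -> Predicate:
    return lambda event: event["ts"] >= ts  # inclusive lower bound


def before_ts(ts: float) -> Predicate:
    return lambda event: event["ts"] < ts  # strict upper bound


def window(start: float, end: float) -> Predicate:
    # Hypothetical helper: the half-open interval [start, end).
    return lambda event: after_ts(start)(event) and before_ts(end)(event)


events = [
    {"ts": 1.0, "error": ""},         # empty string: falsy
    {"ts": 2.0, "error": None},       # None: falsy
    {"ts": 2.0, "error": "timeout"},  # same timestamp as the previous event
    {"ts": 3.0},                      # field missing entirely
]

with_errors = [e for e in events if has_error()(e)]
first_window = [e for e in events if window(1.0, 2.0)(e)]
second_window = [e for e in events if window(2.0, 3.0)(e)]
```

Both events at ts=2.0 fall into the second window only. With a less-than-or-equal upper bound they would be counted in both.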
Repo: https://github.com/MukundaKatta/trace-filter
pip install trace-filter