5000 events, one worker, one bug: trace-filter for agent JSONL traces

#devchallenge #hermesagentchallenge #agents #observability

Hermes Agent Challenge Submission: Write About Hermes Agent

This is a submission for the Hermes Agent Challenge.

Last week I wrote about trace-merge, which takes N agent JSONL logs and stitches them into one chronological stream. It works. But the merged file from a three-agent run with a few hundred steps has somewhere around 5000 events in it. That file is not debuggable by eye.

What I actually wanted was: show me only the events from worker2, between t=1.2s and t=3.8s, where the event had an error field. That's a query. I needed a filter.

The problem with grep + jq pipelines

My first attempt was a shell one-liner:

cat merged.jsonl | jq 'select(.lane == "worker2" and .error != null)'

This works once. The second time I need a slightly different query I write a slightly different one-liner. Three runs in, I have three different one-liners and no memory of which one was for which debugging session. The fourth time, I want to combine lane + time range + kind, and the jq expression is getting long enough that I start making typos.

The pattern I wanted was composable predicates that I could mix and match, with a CLI for the common cases. That is trace-filter.

Basic usage

from trace_filter import filter_trace, load_jsonl

events = load_jsonl("merged.jsonl")

# Simple keyword filters (all ANDed)
result = filter_trace(events,
    lane="worker2",
    kind="tool_call",
    after=1779638601.2,
    before=1779638603.8,
)
print(f"{len(result)} matching events")

CLI version of the same query:

python3 -m trace_filter merged.jsonl \
  --lane worker2 \
  --kind tool_call \
  --after 1779638601.2 \
  --before 1779638603.8

Composable predicates

For anything more complex, you build predicates and combine them:

from trace_filter import (
    filter_trace,
    all_of,
    any_of,
    lane_is,
    kind_is,
    has_error,
    negate,
    field_contains,
)

# Worker2 events that are tool calls OR successful tool results, excluding anything with an error
result = filter_trace(events, predicate=all_of(
    lane_is("worker2"),
    any_of(kind_is("tool_call"), kind_is("tool_ok")),
    negate(has_error()),
))

Every predicate is a Callable[[dict], bool]. They compose cleanly because the types are trivial. all_of() is just all(p(event) for p in preds). No magic, no DSL.
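To make the "no magic" claim concrete, here is a minimal sketch of how combinators of this shape can be written. It is not copied from the repo: the `Predicate` alias is a name introduced for illustration, and the `.get()` lookups are an assumption so that events missing a field are rejected rather than raising `KeyError`.

```python
from typing import Callable

# Illustrative alias; the post only states the Callable[[dict], bool] shape.
Predicate = Callable[[dict], bool]


def all_of(*preds: Predicate) -> Predicate:
    """AND: accept the event only if every predicate accepts it."""
    return lambda event: all(p(event) for p in preds)


def any_of(*preds: Predicate) -> Predicate:
    """OR: accept the event if at least one predicate accepts it."""
    return lambda event: any(p(event) for p in preds)


def negate(pred: Predicate) -> Predicate:
    """NOT: invert a single predicate."""
    return lambda event: not pred(event)


def lane_is(name: str) -> Predicate:
    # Assumption: a missing "lane" field means "no match", not an exception.
    return lambda event: event.get("lane") == name


def has_error() -> Predicate:
    # Truthy "error" field, as described in the predicate table.
    return lambda event: bool(event.get("error"))


events = [
    {"lane": "worker2", "kind": "tool_call"},
    {"lane": "worker2", "kind": "tool_call", "error": "timeout"},
    {"lane": "worker1", "kind": "tool_call"},
]

clean_worker2 = all_of(lane_is("worker2"), negate(has_error()))
matches = [e for e in events if clean_worker2(e)]
```

Because every combinator returns another plain function, a composed predicate can be passed anywhere a leaf predicate is accepted, including into another `all_of` or `any_of`.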

The predicate table

Predicate	What it does
`lane_is(name)`	`event["lane"] == name`
`kind_is(name)`	matches `kind`, `event_type`, or `type` field
`field_equals(key, value)`	any field, supports dot-notation (`meta.tool`)
`field_contains(key, substr)`	substring match on any string field
`after_ts(ts)`	event timestamp >= `ts` (seconds since epoch)
`before_ts(ts)`	event timestamp < `ts` (strictly less than)
`has_error()`	truthy `error` field
`all_of(*preds)`	AND
`any_of(*preds)`	OR
`negate(pred)`	NOT

The dot-notation in field_equals was a small thing I added after needing it once: my Hermes agents log structured metadata like {"meta": {"tool": "search_web"}} and I wanted to filter on meta.tool without flattening the whole event first.
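A dotted lookup like this can be sketched in a few lines. The helper name `get_path` and the `_MISSING` sentinel are hypothetical, and the real implementation may differ. The one behavior worth copying is that a non-dict parent (for example `"meta": "n/a"`) counts as "field absent" instead of raising.

```python
from typing import Any, Callable

# Sentinel so that a missing field is distinguishable from a stored None.
_MISSING = object()


def get_path(event: dict, key: str) -> Any:
    """Walk a dotted key such as 'meta.tool' through nested dicts."""
    current: Any = event
    for part in key.split("."):
        # Stop as soon as the parent is not a dict or lacks the next key.
        if not isinstance(current, dict) or part not in current:
            return _MISSING
        current = current[part]
    return current


def field_equals(key: str, value: Any) -> Callable[[dict], bool]:
    """Predicate: the (possibly nested) field exists and equals value."""
    return lambda event: get_path(event, key) == value


is_search = field_equals("meta.tool", "search_web")

nested = {"lane": "worker2", "meta": {"tool": "search_web"}}
flat_parent = {"lane": "worker2", "meta": "n/a"}  # parent is a string
no_meta = {"lane": "worker2"}

results = [is_search(nested), is_search(flat_parent), is_search(no_meta)]
```

A plain key without dots goes through the same loop with a single part, so top-level fields need no special case.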

Timestamp handling

trace-filter recognizes the same timestamp shapes as trace-merge: float seconds, int seconds, int milliseconds (heuristic: anything above 1e12 is treated as milliseconds), and ISO 8601 strings. The timestamp key is auto-detected on each event from ts, timestamp, or time.

This matters because three of the libraries I use for agent logging do not agree on timestamp format. trace-filter normalizes them all so --after and --before work across mixed-format files without any preprocessing step.
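The normalization step can be sketched roughly as follows. The function names, the key precedence (`ts`, then `timestamp`, then `time`), applying the 1e12 heuristic to floats as well as ints, and treating timezone-naive ISO strings as UTC are all assumptions made for this example, not documented behavior of trace-filter.

```python
from datetime import datetime, timezone
from typing import Any, Optional

TS_KEYS = ("ts", "timestamp", "time")  # assumed lookup order
MS_THRESHOLD = 1e12  # values above this are treated as milliseconds


def normalize_ts(value: Any) -> Optional[float]:
    """Convert a supported timestamp shape to float seconds since epoch."""
    if isinstance(value, bool):  # bool is a subclass of int; never a timestamp
        return None
    if isinstance(value, (int, float)):
        return value / 1000.0 if value > MS_THRESHOLD else float(value)
    if isinstance(value, str):
        text = value.strip()
        # datetime.fromisoformat on Python 3.10 does not accept a trailing "Z".
        if text.endswith("Z"):
            text = text[:-1] + "+00:00"
        try:
            parsed = datetime.fromisoformat(text)
        except ValueError:
            return None
        if parsed.tzinfo is None:
            # Assumption: naive strings are UTC.
            parsed = parsed.replace(tzinfo=timezone.utc)
        return parsed.timestamp()
    return None


def event_ts(event: dict) -> Optional[float]:
    """Return the normalized timestamp from the first recognized key."""
    for key in TS_KEYS:
        if key in event:
            return normalize_ts(event[key])
    return None


mixed = [
    {"ts": 1779638601.2},
    {"timestamp": 1779638601200},
    {"time": "1970-01-01T00:00:10Z"},
    {"kind": "no_timestamp"},
]
normalized = [event_ts(e) for e in mixed]
```

With every event reduced to float seconds, a time-range predicate only has to compare two numbers, whatever the source library wrote.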

Writing filtered results

If I want to keep the filtered slice for later:

from trace_filter import write_jsonl
n = write_jsonl(result, "worker2_errors.jsonl")

Or from the CLI:

python3 -m trace_filter merged.jsonl --errors -o errors.jsonl

Then I can feed errors.jsonl into trace-tree to see the call tree for just those events, or into tool-call-diff to compare the error pattern across two runs.

Where it fits in the chain

I have a small toolchain now that I use across my Hermes multi-agent runs:

Agents write JSONL logs (agenttrace, agentleash, or my own scripts)
trace-merge stitches them into one chronological stream
trace-filter drills down to the events I care about
trace-tree renders the filtered slice as a readable tree
tool-call-diff compares filtered slices across runs

Each tool reads and writes plain JSONL, so I can pipe them. The filter step is the piece that keeps the later steps from being overwhelmed by noise.
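The JSONL contract between the tools is small enough to sketch. The helpers below (`read_events`, `write_events`) are hypothetical names that work on text streams. They are not the library's `load_jsonl` and `write_jsonl`, which take file paths in the examples above. Skipping blank lines is an assumption.

```python
import io
import json
from typing import Iterable, Iterator, TextIO


def read_events(stream: TextIO) -> Iterator[dict]:
    """Yield one event per non-blank line of a JSONL text stream."""
    for line in stream:
        line = line.strip()
        if line:  # assumption: blank lines are skipped, not errors
            yield json.loads(line)


def write_events(events: Iterable[dict], stream: TextIO) -> int:
    """Write events as JSONL and return how many were written."""
    count = 0
    for event in events:
        stream.write(json.dumps(event) + "\n")
        count += 1
    return count


# In a real pipe these would be sys.stdin and sys.stdout.
upstream = io.StringIO(
    '{"lane": "worker1", "kind": "tool_call"}\n'
    "\n"
    '{"lane": "worker2", "kind": "tool_call", "error": "timeout"}\n'
)
downstream = io.StringIO()

written = write_events(
    (e for e in read_events(upstream) if e.get("error")),
    downstream,
)
```

Because both helpers are lazy over the input, a filter stage built this way never needs the whole merged file in memory.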

Technical notes

58 tests. Zero runtime dependencies. Python 3.10+. About 200 lines of library code (predicates + filter engine), another 80 lines of CLI wrapper.

The test file for predicates goes through every leaf predicate and every combinator, including the edge cases that caught me during development: has_error() with a falsy error field (empty string vs. None vs. missing), before_ts being strictly less-than rather than less-than-or-equal (which matters when two events share the same timestamp), and the dot-notation case where the parent field exists but is a string rather than a dict.
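Two of those edge cases are easy to show in isolation. This sketch reimplements `has_error`, `after_ts`, and `before_ts` in simplified form, assuming a float `ts` key and the semantics from the predicate table. It is not the repo's test file.

```python
from typing import Callable

Predicate = Callable[[dict], bool]


def has_error() -> Predicate:
    # Truthy check: "", None, and a missing key all count as "no error".
    return lambda event: bool(event.get("error"))


def after_ts(limit: float) -> Predicate:
    # Inclusive lower bound.
    return lambda event: event.get("ts") is not None and event["ts"] >= limit


def before_ts(limit: float) -> Predicate:
    # Exclusive upper bound.
    return lambda event: event.get("ts") is not None and event["ts"] < limit


def window(events: list, start: float, end: float) -> list:
    """Events in the half-open interval [start, end)."""
    return [e for e in events if after_ts(start)(e) and before_ts(end)(e)]


# Edge case 1: three flavors of "no error", plus one real error.
no_error_events = [{"error": ""}, {"error": None}, {}]
error_flags = [has_error()(e) for e in no_error_events]
real_error = has_error()({"error": "timeout"})

# Edge case 2: two events share the boundary timestamp 2.0.
timeline = [{"ts": 1.0}, {"ts": 2.0}, {"ts": 2.0}, {"ts": 3.0}]
first = window(timeline, 1.0, 2.0)
second = window(timeline, 2.0, 3.0)
```

With an inclusive lower bound and an exclusive upper bound, adjacent windows never count a boundary event twice: both `ts == 2.0` events land in the second window only.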

Repo: https://github.com/MukundaKatta/trace-filter