Mukunda Rao Katta

Posted on May 25

I ran three Hermes agents in parallel. trace-merge stitched them into one timeline I could read.

#devchallenge #hermesagentchallenge #agents #observability

Hermes Agent Challenge Submission: Build With Hermes Agent

This is a submission for the Hermes Agent Challenge.

Last week I ran a small Hermes orchestra: a supervisor agent plus two worker agents, fanning out a research task. Each one wrote its own JSONL audit log. When the run finished I had three files open in three terminal panes.

Reading them side by side did not work. My eyes kept drifting to whichever pane was tallest. I lost track of what each worker was doing at the moment the supervisor did something. I scrolled back and forth for a few minutes, gave up, and wrote down what I actually wanted: one file, every event from every agent, sorted by time, with a label per row that said which agent it came from.

That is trace-merge.

The three files

Here is roughly what each log looked like. Truncated to four or five lines each so this post stays readable.

supervisor.jsonl:

{"ts":1779638601.000,"kind":"session_open","msg":"supervisor starting"}
{"ts":1779638601.050,"kind":"tool_ok","tool":"spawn_worker","msg":"spawned worker1"}
{"ts":1779638601.100,"kind":"tool_ok","tool":"spawn_worker","msg":"spawned worker2"}
{"ts":1779638605.200,"kind":"session_close","msg":"all workers reported"}

worker1.jsonl:

{"ts":1779638601.300,"kind":"session_open","msg":"worker1 starting"}
{"ts":1779638601.450,"kind":"tool_ok","tool":"search","msg":"fetched 10 results"}
{"ts":1779638602.100,"kind":"tool_ok","tool":"summarize","msg":"summary ready"}
{"ts":1779638602.800,"kind":"session_close","msg":"worker1 done"}

worker2.jsonl:

{"ts":1779638601.400,"kind":"session_open","msg":"worker2 starting"}
{"ts":1779638601.700,"kind":"tool_ok","tool":"search","msg":"fetched 25 results"}
{"ts":1779638603.500,"kind":"tool_denied","tool":"fetch_url","error":"host not in allowlist"}
{"ts":1779638604.100,"kind":"tool_ok","tool":"summarize","msg":"summary ready"}
{"ts":1779638604.900,"kind":"session_close","msg":"worker2 done"}

Each file is fine on its own. Together they are a small headache. Worker2 hit an egress block somewhere in the middle of the run, but neither the supervisor's log nor worker1's log mentions it. To see when the block happened relative to what the supervisor and worker1 were doing, I needed all three on one ruler.

What trace-merge does

It takes a list of JSONL files and gives you one merged stream:

from trace_merge import merge_traces

merged = merge_traces([
    "runs/supervisor.jsonl",
    "runs/worker1.jsonl",
    "runs/worker2.jsonl",
])

merged.write_jsonl("runs/all.jsonl")

Every event in all.jsonl carries a new lane field. The lane is the filename stem by default, so worker1.jsonl becomes lane worker1. You can override with lane_names=["sup", "w1", "w2"] when the filenames are not friendly.
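To make the lane rule concrete, here is a minimal stdlib sketch of how a lane label could be derived and attached to each event. This is not trace-merge's actual code: `read_with_lane` is a hypothetical helper, and only the `lane` field name and the filename-stem default come from the description above.

```python
import json
from pathlib import Path


def read_with_lane(path, lane=None):
    """Read one JSONL file and tag every event with a lane label.

    Hypothetical helper for illustration; not part of trace-merge.
    """
    # Default lane is the filename stem: "runs/worker1.jsonl" -> "worker1".
    lane = lane if lane is not None else Path(path).stem
    events = []
    with open(path, encoding="utf-8") as fh:
        for line in fh:
            line = line.strip()
            if not line:
                continue  # skip blank lines
            event = json.loads(line)
            event["lane"] = lane
            events.append(event)
    return events
```

Calling `read_with_lane("runs/worker1.jsonl")` would label every event `worker1`, while passing `lane="w1"` plays the role that `lane_names` plays in the real API.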

Here is the merged view of the three logs above, printed one event per row with the lanes shortened to sup, w1, and w2:

t=1779638601.000  [sup]   session_open   supervisor starting
t=1779638601.050  [sup]   tool_ok        spawned worker1
t=1779638601.100  [sup]   tool_ok        spawned worker2
t=1779638601.300  [w1]    session_open   worker1 starting
t=1779638601.400  [w2]    session_open   worker2 starting
t=1779638601.450  [w1]    tool_ok        fetched 10 results
t=1779638601.700  [w2]    tool_ok        fetched 25 results
t=1779638602.100  [w1]    tool_ok        summary ready
t=1779638602.800  [w1]    session_close  worker1 done
t=1779638603.500  [w2]    tool_denied    host not in allowlist
t=1779638604.100  [w2]    tool_ok        summary ready
t=1779638604.900  [w2]    session_close  worker2 done
t=1779638605.200  [sup]   session_close  all workers reported

Now I can read it. The supervisor spawned both workers within 100 ms. Worker1 got back ten results, summarized them, and was done in 1.5 seconds. Worker2 fetched twenty-five results but then tried to call out to a URL that was not in its egress allowlist, lost about 1.8 seconds to that failure path, summarized what it had, and finished a couple of seconds after worker1. The supervisor sat idle for most of the run waiting for both workers to finish.
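The durations in that reading are plain subtraction on the merged rows. Here is a short sketch that recomputes them from the timestamps shown in the merged view. The helper names `lane_span` and `gap_before` are made up for this example, and rounding to milliseconds is only there to hide float noise at epoch-sized values.

```python
# (ts, lane, kind) rows copied from the merged view above.
ROWS = [
    (1779638601.000, "sup", "session_open"),
    (1779638601.050, "sup", "tool_ok"),
    (1779638601.100, "sup", "tool_ok"),
    (1779638601.300, "w1", "session_open"),
    (1779638601.400, "w2", "session_open"),
    (1779638601.450, "w1", "tool_ok"),
    (1779638601.700, "w2", "tool_ok"),
    (1779638602.100, "w1", "tool_ok"),
    (1779638602.800, "w1", "session_close"),
    (1779638603.500, "w2", "tool_denied"),
    (1779638604.100, "w2", "tool_ok"),
    (1779638604.900, "w2", "session_close"),
    (1779638605.200, "sup", "session_close"),
]


def lane_span(rows, lane):
    """Seconds between a lane's first and last event, rounded to ms."""
    stamps = [ts for ts, row_lane, _ in rows if row_lane == lane]
    return round(max(stamps) - min(stamps), 3)


def gap_before(rows, lane, kind):
    """Seconds between the first `kind` event in a lane and the
    event that precedes it in the same lane, or None if absent."""
    lane_rows = [(ts, k) for ts, row_lane, k in rows if row_lane == lane]
    for (prev_ts, _), (ts, k) in zip(lane_rows, lane_rows[1:]):
        if k == kind:
            return round(ts - prev_ts, 3)
    return None


for lane in ("sup", "w1", "w2"):
    print(lane, lane_span(ROWS, lane))
print("w2 gap before tool_denied:", gap_before(ROWS, "w2", "tool_denied"))
```

The same two helpers work on any merged stream once each event has been reduced to a timestamp, a lane, and a kind.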

That story was sitting in three files the whole time. Reading the three of them in three panes, I would have caught the egress block eventually, but the timing context relative to the rest of the run would have been a guess.

The gantt view

Sometimes I do not even want the merged events. I just want to know which agent was busy when. trace-merge has a --gantt mode for exactly that:

python3 -m trace_merge --gantt sup.jsonl w1.jsonl w2.jsonl

sup  |=========================================================>|
w1   |    ====================>                                 |
w2   |     ================================================>    |

events: sup=4, w1=4, w2=5
total wall clock: 4.2s

Each row is a lane. The bar runs from the lane's first event to its last event. Reading top to bottom you get the shape of the run in one glance: supervisor was alive for the full window, worker1 finished early, worker2 ran almost to the end. If a worker had taken thirty seconds to start, the gap on the left of its bar would show that. A worker that finishes early gets a bar that stops short, as worker1's does here.
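The bar logic is simple enough to sketch in a few lines. The function below is an illustration of the idea (first event to last event, scaled to a fixed width), not the renderer trace-merge ships; `render_gantt`, its `width` parameter, and the exact bar characters are assumptions.

```python
def render_gantt(lanes, width=40):
    """Render one text bar per lane from (first_ts, last_ts) pairs.

    lanes: dict mapping lane name -> (first_ts, last_ts).
    Illustrative only; trace-merge's real renderer may differ.
    """
    start = min(first for first, _ in lanes.values())
    end = max(last for _, last in lanes.values())
    total = (end - start) or 1.0  # avoid dividing by zero on a single instant
    label_width = max(len(name) for name in lanes)
    rows = []
    for name, (first, last) in lanes.items():
        # Map timestamps to column indexes in [0, width - 1].
        left = int((first - start) / total * (width - 1))
        right = int((last - start) / total * (width - 1))
        bar = " " * left + "=" * (right - left) + ">"
        rows.append(f"{name:<{label_width}} |{bar:<{width}}|")
    return "\n".join(rows)


print(render_gantt({
    "sup": (1779638601.0, 1779638605.2),
    "w1": (1779638601.3, 1779638602.8),
    "w2": (1779638601.4, 1779638604.9),
}))
```

Because every lane is scaled against the same global start and end, a late start shows up as leading blank space and an early finish as trailing blank space.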

How the timestamps line up

The three sources I pull audit logs from do not agree on timestamp format. agenttrace writes float seconds. agentleash writes int milliseconds for one field and float seconds for another. Some scratch scripts I wrote use ISO strings. trace-merge normalizes all of them to float seconds before sorting, with three rules:

float or int below 1e12 is treated as seconds
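int above 1e12 is treated as milliseconds (the heuristic split is safe in practice because 1e12 seconds is the year 33658 and 1e12 milliseconds is September 2001, so a present-day timestamp cannot land on the wrong side)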
ISO 8601 strings are parsed with datetime.fromisoformat, including the trailing Z shorthand
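The three rules above can be sketched as a single function. This is a minimal illustration, not the library's source. Three things here are assumptions: any number at or above 1e12, float or int, is treated as milliseconds; naive ISO strings are treated as UTC; and a trailing Z is rewritten by hand, because `datetime.fromisoformat` only accepts it directly on Python 3.11 and later.

```python
from datetime import datetime, timezone

MS_THRESHOLD = 1e12  # assumption: values at or above this are milliseconds


def to_seconds(value):
    """Normalize a timestamp to float seconds since the Unix epoch."""
    if isinstance(value, bool):
        raise TypeError("bool is not a timestamp")
    if isinstance(value, (int, float)):
        # Numbers: milliseconds if large, otherwise already seconds.
        return value / 1000.0 if value >= MS_THRESHOLD else float(value)
    if isinstance(value, str):
        # Rewrite the trailing "Z" shorthand as an explicit UTC offset
        # so older Python versions can parse it too.
        text = value[:-1] + "+00:00" if value.endswith("Z") else value
        parsed = datetime.fromisoformat(text)
        if parsed.tzinfo is None:
            # Assumption: naive timestamps are UTC.
            parsed = parsed.replace(tzinfo=timezone.utc)
        return parsed.timestamp()
    raise TypeError(f"unsupported timestamp type: {type(value).__name__}")
```

With this in place, `1779638601.05`, `1779638601050`, and the equivalent ISO string all end up as the same float, which is what makes a single sort across mixed logs possible.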

The timestamp key is auto-detected per file. It tries ts, timestamp, time against the first event of each file. You can pin it explicitly with ts_key="when" if your logs use something else.
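A sketch of that detection step, assuming the candidates are tried in the order listed and that a missing key is reported as a `KeyError`. The function name and the error type are illustrative choices, not taken from trace-merge.

```python
CANDIDATE_KEYS = ("ts", "timestamp", "time")


def detect_ts_key(first_event, ts_key=None):
    """Pick the timestamp key for a file from its first event.

    Hypothetical helper; the exception type is an assumption.
    """
    if ts_key is not None:
        # An explicitly pinned key wins, but it must actually be present.
        if ts_key not in first_event:
            raise KeyError(f"pinned ts key {ts_key!r} not found in first event")
        return ts_key
    for key in CANDIDATE_KEYS:
        if key in first_event:
            return key
    raise KeyError(f"no timestamp key found; tried {CANDIDATE_KEYS}")
```

Checking only the first event keeps detection cheap, at the cost of assuming every later event in the file uses the same key.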

Ties on timestamp are broken by the file order you passed in. So if you list the supervisor first, "supervisor before worker" survives even when the clock readings are identical down to the microsecond. That detail mattered for me because two of my agents share a clock source and routinely write events with the exact same ts.
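One way to get that tie-breaking behavior is to sort on a compound key: timestamp first, then the index of the file in the argument list, then the line's position within its file. The sketch below shows the idea with a hypothetical `merge_sorted` function; whether trace-merge uses a compound key, a stable sort, or a heap merge internally is not something this post states.

```python
def merge_sorted(streams):
    """Merge per-file event lists into one list ordered by timestamp.

    streams: list of event lists, in the order the files were passed.
    Each event is a dict with a float "ts" that is already normalized.
    """
    keyed = []
    for file_index, events in enumerate(streams):
        for line_index, event in enumerate(events):
            # Sort key: timestamp, then file order, then line order.
            keyed.append((event["ts"], file_index, line_index, event))
    # Sort on the first three fields only, so dicts are never compared.
    keyed.sort(key=lambda item: item[:3])
    return [event for *_, event in keyed]


supervisor = [{"ts": 1.0, "lane": "sup", "kind": "tool_ok"}]
worker = [{"ts": 1.0, "lane": "w1", "kind": "session_open"}]
print([e["lane"] for e in merge_sorted([supervisor, worker])])
```

Listing the supervisor's stream first puts its event ahead of the worker's even though both carry the same `ts`.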

Where it fits

I have a small chain of these tools now. agenttrace and agentleash write the logs. trace-merge stitches multiple logs into one stream. trace-tree reads any of those streams and prints a tree. tool-call-diff compares a baseline merged run against a candidate when I am trying to figure out which prompt change moved which tool call.

trace-merge is the piece in the middle for the multi-agent case. With one agent you do not need it. With three agents and a fan-out, it is the difference between debugging in three panes and debugging in one file.

What it is not

It is not a tracing system. It does not own the write side. It does not tail live processes. It reads N files start to finish and produces one file. It has zero runtime dependencies and is about 150 lines of stdlib Python. Its 33 tests cover the timestamp normalizer, the tie-breaking rule, the gantt renderer, the per-lane summary, and the error paths when a ts key is missing.

Try it

pip install trace-merge
python3 -m trace_merge sup.jsonl w1.jsonl w2.jsonl > merged.jsonl
python3 -m trace_merge --gantt sup.jsonl w1.jsonl w2.jsonl

Repo: https://github.com/MukundaKatta/trace-merge

DEV Community