Mukunda Rao Katta

Posted on May 25

I had 800 lines of Hermes agent audit log. trace-tree turned it into a tree I could read.

#devchallenge #hermesagentchallenge #observability #agents

Hermes Agent Challenge Submission: Build With Hermes Agent

This is a submission for the Hermes Agent Challenge.

I ran a Hermes agent overnight. In the morning I opened the audit log. There were 800 lines of JSONL. I scrolled for a minute, sighed, and closed the file.

The problem was not that the agent had failed. The problem was that I could not tell what it had done.

This is what the raw log looked like. Five lines, picked at random:

{"ts":1779638601.262,"session_id":"abc12","kind":"session_open","tool":null,"args_hash":null,"url":null,"usd":0.0,"error":null,"extra":{}}
{"ts":1779638601.265,"session_id":"abc12","kind":"tool_ok","tool":"locus.payments.charge","args_hash":"aeff9a9e","url":null,"usd":4.99,"error":null,"extra":{"latency_ms":12}}
{"ts":1779638601.266,"session_id":"abc12","kind":"budget_denied","tool":"locus.payments.charge","args_hash":"5715bbc0","url":null,"usd":7.0,"error":"budget exceeded: 11.99 > 10","extra":{"latency_ms":1}}
{"ts":1779638601.266,"session_id":"abc12","kind":"tool_denied","tool":"locus.payments.charge","args_hash":"84f50ae9","url":null,"usd":0.0,"error":"args invalid: -1.0 < min 0.5","extra":{}}
{"ts":1779638601.267,"session_id":"abc12","kind":"egress_denied","tool":null,"args_hash":null,"url":"https://evil.attacker.example/exfil","usd":0.0,"error":"host not in allowlist","extra":{}}

You can read one of those, sure. Maybe two. By line forty your brain has tuned out. There is no shape to it. Every row looks the same. Every row carries the same nine fields, most of them null or irrelevant, and the one you care about sits in a different column each time.

What I actually wanted was a tree. A session is a root. Tool calls are children. Denied calls are children with an error attached. Stuff the agent tried to do that got blocked should look blocked. That is how I read traces in any other system, and there was no reason this log could not be drawn the same way.

So I wrote trace-tree.

What trace-tree does

It reads a JSONL audit log and prints a tree. That is the whole library.

pip install trace-tree
trace-tree runs/audit.jsonl

The same session renders like this:

session-abc12 [4.99 USD, 1 call, 850 ms]
├─ session_open [ts=1779638601.262143]
├─ tool_ok locus.payments.charge [4.99 USD, 12 ms]
│  └─ args_hash=aeff9a9ed25b8e06
├─ budget_denied locus.payments.charge [7.00 USD attempted, 1 ms]
│  ├─ args_hash=5715bbc0d738a5a0
│  └─ error="budget exceeded: 11.99 > 10"
├─ tool_denied locus.payments.charge
│  ├─ args_hash=84f50ae9b21ff1d0
│  └─ error="args invalid: -1.0 < min 0.5"
├─ egress_denied
│  ├─ url=https://evil.attacker.example/exfil
│  └─ error="host not in allowlist"
└─ session_close [4.99 USD, 837 ms]

I can read that. I can see that the agent charged a customer 4.99 USD and the charge went through. Then it tried to charge 7.00 USD, and the budget guard stopped it: 4.99 already spent plus 7.00 would have been 11.99, over the limit of 10. Then it tried to charge a negative amount, and argument validation stopped it. Then it tried to talk to an attacker URL, and the egress guard stopped it. The session ended with 4.99 USD actually spent, and the whole run took 850 ms.

The session root carries an aggregate. Only tool_ok rows count toward spend, so attempted-and-denied charges are visible but they do not pollute the total. That single rule made the difference between a tree I could trust and a tree that lied about my bill.
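The spend rule is small enough to sketch. This is not trace-tree's source. It is a minimal version that assumes rows shaped like the agentleash lines above and uses only `session_id`, `kind`, and `usd`. `session_spend` and `cost_tag` are hypothetical names.

```python
import json

# Trimmed copies of the five agentleash rows shown earlier (only the fields used here).
RAW = """\
{"session_id": "abc12", "kind": "session_open", "usd": 0.0}
{"session_id": "abc12", "kind": "tool_ok", "usd": 4.99}
{"session_id": "abc12", "kind": "budget_denied", "usd": 7.0}
{"session_id": "abc12", "kind": "tool_denied", "usd": 0.0}
{"session_id": "abc12", "kind": "egress_denied", "usd": 0.0}
"""


def session_spend(rows):
    """Return {session_id: (usd_spent, completed_calls)}, counting only tool_ok rows."""
    totals = {}
    for row in rows:
        sid = row.get("session_id") or "unknown"
        usd, calls = totals.get(sid, (0.0, 0))
        if row.get("kind") == "tool_ok":
            # Denied attempts stay visible in the tree but never reach the total.
            usd += float(row.get("usd") or 0.0)
            calls += 1
        totals[sid] = (usd, calls)
    return totals


def cost_tag(row):
    """Format a row's cost; any non-tool_ok row with a cost is labeled as attempted."""
    usd = float(row.get("usd") or 0.0)
    if usd == 0.0:
        return ""
    suffix = "" if row.get("kind") == "tool_ok" else " attempted"
    return f"[{usd:.2f} USD{suffix}]"


# JSONL: one JSON object per non-blank line.
rows = [json.loads(line) for line in RAW.splitlines() if line.strip()]
print(session_spend(rows))  # {'abc12': (4.99, 1)}
print([cost_tag(r) for r in rows if cost_tag(r)])  # ['[4.99 USD]', '[7.00 USD attempted]']
```

Denied rows stay in the tree but out of the sum. That is how the root can report 4.99 while the 7.00 attempt is still visible.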

Why a tree

Most agent audit tooling I see falls into two camps. Either you ship the log to a hosted observability vendor and view it in their UI, or you grep it. Both are too much for the case where you just want to know what the agent did in the last run.

The tree fits in the terminal. No login. No upload. No vendor lock-in. You open the file, you read the tree, you close the file. If the run was good you move on. If the run was bad you have a clear pointer to the step that went wrong.

Reading several shapes

I have a bunch of small libraries that write audit logs in slightly different shapes. agenttrace writes parent_span_id and latency_ms. agentleash writes the shape you saw above, with args_hash and a top-level error field. agentsnap writes a single object per run with a steps list nested inside. agent-step-log writes per-step rows with step instead of kind.

I did not want a separate reader for each one. So the parser normalizes all of them into one Event shape. It accepts a few common aliases per field:

| concept | accepted keys |
|---|---|
| event kind | `kind`, `event`, `type`, `step`, `name` |
| tool name | `tool`, `tool_name`, `function`, `name` |
| parent id | `parent_span_id`, `parent_id`, `parent` |
| cost | `usd`, `cost_usd`, `cost`, `price_usd` |
| latency | `latency_ms`, `duration_ms`, `elapsed_ms` (top level or under `extra`) |
| error | `error`, `err`, `message` (top level or under `extra`) |

If a row uses field names outside these lists, the tree still draws it. The row just collapses to its kind, with no tool, cost, or error attached. You lose nothing for trying.
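Here is a rough sketch of how alias normalization like this could work. The alias lists come from the table. The lookup order, the `extra` fallback, the `"unknown"` default, and the agentsnap-like `steps` flattening (including its `session_id` key) are assumptions, not trace-tree's actual code. Note that `name` appears in both the kind and tool lists, and this sketch simply lets both fields read it.

```python
# Sketch only: alias lists copied from the table; lookup order and defaults are assumptions.
ALIASES = {
    "kind": ("kind", "event", "type", "step", "name"),
    "tool": ("tool", "tool_name", "function", "name"),
    "parent": ("parent_span_id", "parent_id", "parent"),
    "usd": ("usd", "cost_usd", "cost", "price_usd"),
    "latency_ms": ("latency_ms", "duration_ms", "elapsed_ms"),
    "error": ("error", "err", "message"),
}
# Fields that may also live under the row's "extra" object.
ALSO_UNDER_EXTRA = {"latency_ms", "error"}


def pick(row, field):
    """Return the first non-null alias for `field`, top level first, then extra."""
    places = [row]
    extra = row.get("extra")
    if field in ALSO_UNDER_EXTRA and isinstance(extra, dict):
        places.append(extra)
    for place in places:
        for key in ALIASES[field]:
            if place.get(key) is not None:
                return place[key]
    return None


def normalize(row):
    """Map one raw row onto a single event shape; unrecognized rows keep a bare kind."""
    event = {field: pick(row, field) for field in ALIASES}
    event["session_id"] = row.get("session_id")
    if event["kind"] is None:
        event["kind"] = "unknown"  # hypothetical fallback label
    return event


def iter_rows(obj):
    """Flatten a run object carrying a nested "steps" list (assumed agentsnap-like shape)."""
    steps = obj.get("steps")
    if isinstance(steps, list):
        for step in steps:
            yield {"session_id": obj.get("session_id"), **step}
    else:
        yield obj


run = {"session_id": "r1", "steps": [
    {"event": "tool_call", "tool_name": "search", "cost_usd": 0.02, "extra": {"duration_ms": 40}},
    {"type": "tool_error", "function": "fetch", "err": "timeout"},
]}
for row in iter_rows(run):
    print(normalize(row))
```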

Two tree modes

By default trace-tree groups events by session_id. Every session becomes a root. Every event in that session is a direct child. This matches what agentleash and agent-step-log actually emit, since those formats do not track per-call parent ids.
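A minimal sketch of the default session grouping follows. It assumes already-normalized event dicts with `session_id`, `kind`, `tool`, `usd`, `url`, and `error` keys. `render_sessions` is a hypothetical name, not trace-tree's API, and the sketch leaves out latency and args_hash to stay short.

```python
def event_label(ev):
    """The event's kind, plus its tool name when there is one."""
    return " ".join(part for part in (ev.get("kind"), ev.get("tool")) if part)


def event_details(ev):
    """Leaf lines hung under an event (url and error only, to keep the sketch short)."""
    details = []
    if ev.get("url"):
        details.append(f"url={ev['url']}")
    if ev.get("error"):
        details.append(f'error="{ev["error"]}"')
    return details


def render_sessions(events):
    """Group events by session_id: each session is a root, each event a direct child."""
    sessions = {}  # dicts keep insertion order, so sessions print in first-seen order
    for ev in events:
        sessions.setdefault(ev.get("session_id") or "unknown", []).append(ev)

    lines = []
    for sid, evs in sessions.items():
        spend = sum(ev.get("usd") or 0.0 for ev in evs if ev.get("kind") == "tool_ok")
        lines.append(f"session-{sid} [{spend:.2f} USD]")
        for i, ev in enumerate(evs):
            last_event = i == len(evs) - 1
            lines.append(("└─ " if last_event else "├─ ") + event_label(ev))
            # Keep the parent's vertical bar going unless this was the last event.
            stem = "   " if last_event else "│  "
            details = event_details(ev)
            for j, detail in enumerate(details):
                lines.append(stem + ("└─ " if j == len(details) - 1 else "├─ ") + detail)
    return "\n".join(lines)


events = [
    {"session_id": "abc12", "kind": "tool_ok", "tool": "locus.payments.charge", "usd": 4.99},
    {"session_id": "abc12", "kind": "egress_denied",
     "url": "https://evil.attacker.example/exfil", "error": "host not in allowlist"},
]
print(render_sessions(events))
```

The only state the box-drawing needs is whether the current item is the last of its siblings. That one bit decides between `├─` and `└─`, and whether the vertical bar continues below.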

If your log does carry parent ids (as agenttrace does), pass --parent-key parent_span_id and trace-tree builds a real nested tree. A tool call that triggers a sub-call shows up with the sub-call as its child. Same library, two modes, picked at the CLI or through the parent_key argument in Python:

from trace_tree import render_file, Tree

# Flat session view
print(render_file("runs/audit.jsonl"))

# Real parent-child nesting
tree = Tree.from_jsonl("runs/audit.jsonl", parent_key="parent_span_id")
print(tree.render(max_depth=10, show_timing=True))
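The nested mode needs two things: an id for each event and a pointer to its parent. The sketch below assumes each event carries a `span_id` alongside `parent_span_id`. The `span_id` key and the `render_nested` name are assumptions, not something trace-tree documents. Any event whose parent is missing from the file is treated as a root.

```python
def render_nested(events, parent_key="parent_span_id", id_key="span_id"):
    """Draw events as a tree using parent pointers; orphans become roots."""
    known_ids = {ev.get(id_key) for ev in events}
    children = {}
    roots = []
    for ev in events:
        parent = ev.get(parent_key)
        if parent is not None and parent in known_ids:
            children.setdefault(parent, []).append(ev)
        else:
            roots.append(ev)  # no parent, or the parent is not in this file

    lines = []
    seen = set()  # guards against loops when a malformed log repeats ids

    def walk(ev, prefix, is_last, is_root):
        if id(ev) in seen:
            return
        seen.add(id(ev))
        label = ev.get("name") or "event"
        if is_root:
            lines.append(label)
            child_prefix = ""
        else:
            lines.append(prefix + ("└─ " if is_last else "├─ ") + label)
            # Continue the vertical bar only while siblings remain below.
            child_prefix = prefix + ("   " if is_last else "│  ")
        kids = children.get(ev.get(id_key), [])
        for i, kid in enumerate(kids):
            walk(kid, child_prefix, i == len(kids) - 1, False)

    for root in roots:
        walk(root, "", True, True)
    return "\n".join(lines)


spans = [
    {"span_id": "s1", "name": "plan"},
    {"span_id": "s2", "parent_span_id": "s1", "name": "tool_ok search"},
    {"span_id": "s3", "parent_span_id": "s2", "name": "tool_ok fetch"},
    {"span_id": "s4", "parent_span_id": "s1", "name": "tool_ok summarize"},
]
print(render_nested(spans))
```

The flat session view is the special case where every event's parent is the session itself. That is why a single reader can serve both modes.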

What it is not

trace-tree does not try to be a tracing system. It does not own the write side. It does not send anything anywhere. It does not parse non-JSONL files. It does not do colors (yet).

It reads a file and prints a tree. That is the whole thing. About 150 lines of stdlib Python.

Where it fits with my other Hermes libraries

trace-tree sits at the end of a small chain. agenttrace writes the log. agentleash writes a stricter log with budget and egress guards. agentsnap snapshots tool-call traces. trace-tree reads any of those and gives you something you can actually look at.

If you run a Hermes agent for any real workload, sooner or later you will end up staring at a JSONL file and wondering what happened. This is a tiny tool for that moment.

Try it

pip install trace-tree
trace-tree your-log.jsonl

Repo: https://github.com/MukundaKatta/trace-tree

DEV Community