Mukunda Rao Katta

Posted on May 25

My Hermes agent spent how much? trace-cost for JSONL audit logs

#devchallenge #hermesagentchallenge #agents #observability

Hermes Agent Challenge Submission: Build With Hermes Agent

This is a submission for the Hermes Agent Challenge.

I ran a Hermes multi-agent research run last Thursday. Three agents, a supervisor and two workers, 47 steps total. The run finished successfully. Then I looked at my API dashboard and asked: how much did that cost?

The dashboard shows monthly totals. It doesn't tell me which run used what. My JSONL logs carried a cost_usd field on each step; I had put it there on purpose. But summing 47 lines of JSON by hand is not something I'm willing to do, so I wrote a one-liner with jq. Then I wanted the total broken down by model. Then by agent (lane). Then by step type. The jq one-liner got long.

That's trace-cost.

One command

python3 -m trace_cost merged.jsonl

total:     $0.034200
events:    47 (12 without cost)
tokens in: 48,200
tokens out:8,400

python3 -m trace_cost merged.jsonl --by model

by model:
  hermes-3                       $0.028000  (81.9%)
  gemini-flash                   $0.006200  (18.1%)

python3 -m trace_cost merged.jsonl --by lane

by lane:
  supervisor                     $0.008000  (23.4%)
  worker1                        $0.014000  (40.9%)
  worker2                        $0.012200  (35.7%)

That last one told me something I didn't expect: worker1 was doing more LLM work than worker2. I had assumed they were symmetric. They weren't. I had given them different system prompt lengths. The cost breakdown surfaced that in one command.
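For readers who want to see what a breakdown like this amounts to, here is a minimal sketch of the grouping logic. It is not trace-cost's actual code. It assumes each event stores its cost under cost_usd and its agent under a lane key (the key name is a guess based on the --by lane flag), and the sample events are made up so that they add up to the totals shown above.

```python
import json

def breakdown(lines, key):
    """Sum cost_usd per value of `key`; return (group, usd, percent) rows."""
    totals = {}  # plain dict keeps groups in first-seen order
    for line in lines:
        line = line.strip()
        if not line:
            continue
        event = json.loads(line)
        cost = event.get("cost_usd")
        if cost is None:
            continue  # events without a cost field contribute nothing
        group = event.get(key, "(unknown)")
        totals[group] = totals.get(group, 0.0) + float(cost)
    grand = sum(totals.values())
    return [
        (group, usd, 100.0 * usd / grand if grand else 0.0)
        for group, usd in totals.items()
    ]

# Made-up sample events, one JSON object per line as in a JSONL file.
sample = [
    '{"lane": "supervisor", "model": "hermes-3", "cost_usd": 0.008}',
    '{"lane": "worker1", "model": "hermes-3", "cost_usd": 0.014}',
    '{"lane": "worker2", "model": "gemini-flash", "cost_usd": 0.0062}',
    '{"lane": "worker2", "model": "hermes-3", "cost_usd": 0.006}',
    '{"lane": "worker1", "kind": "tool_call"}',
]

for group, usd, pct in breakdown(sample, "lane"):
    print(f"  {group:<30} ${usd:.6f}  ({pct:.1f}%)")
```

Passing "model" instead of "lane" groups the same events by model, which is all the --by switch needs to change in a sketch like this.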

Python API

from trace_cost import cost_report, load_jsonl

events = load_jsonl("merged.jsonl")
report = cost_report(events)

print(f"Total: ${report.total_usd:.4f}")
print(f"By model: {report.by_model}")
print(f"By step: {report.by_step}")

CostReport is a dataclass with total_usd, tokens_in, tokens_out, by_model, by_lane, by_step, event_count, and zero_cost_events. You get a structured result you can inspect, serialize, or compare between runs.
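To make "serialize or compare between runs" concrete, here is a hedged sketch using a stand-in dataclass. RunCost and lane_deltas are hypothetical names, not part of trace-cost, and the sketch assumes by_lane is a plain dict of lane name to dollars. The rerun figures are invented for illustration. Since CostReport is described as a dataclass, dataclasses.asdict should apply to it in the same way, but that is untested here.

```python
import json
from dataclasses import asdict, dataclass, field

@dataclass
class RunCost:
    """Hypothetical stand-in with two of the fields CostReport is said to have."""
    total_usd: float = 0.0
    by_lane: dict = field(default_factory=dict)

def lane_deltas(before, after):
    """Per-lane cost change between two runs (after minus before)."""
    lanes = list(dict.fromkeys([*before.by_lane, *after.by_lane]))
    return {
        lane: after.by_lane.get(lane, 0.0) - before.by_lane.get(lane, 0.0)
        for lane in lanes
    }

thursday = RunCost(0.0342, {"supervisor": 0.008, "worker1": 0.014, "worker2": 0.0122})
# Invented numbers for a second run with a shorter worker1 prompt.
rerun = RunCost(0.0322, {"supervisor": 0.008, "worker1": 0.012, "worker2": 0.0122})

# Serialize: a dataclass converts to a plain dict, which json can dump.
print(json.dumps(asdict(thursday), indent=2))

# Compare: which lanes got cheaper or more expensive?
for lane, delta in lane_deltas(thursday, rerun).items():
    print(f"{lane:<12} {delta:+.6f}")
```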

Works with whatever field names you already use

The most useful thing I did was make trace-cost try multiple common field names without requiring config:

What	Keys tried
Cost	`cost_usd`, `cost`, `price_usd`, `usd`
Input tokens	`tokens_in`, `input_tokens`, `prompt_tokens`
Output tokens	`tokens_out`, `output_tokens`, `completion_tokens`
Model	`model`, `model_id`, `model_name`
Step name	`name`, `step`, `kind`, `event_type`, `type`

So it works on agenttrace output, agentleash output, agent-step-log output, and whatever scratch logs you wrote yourself, without a config file.
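The lookup itself can be as simple as "first key that is present wins". The following is a sketch of that idea, not the library's implementation. The key lists are copied from the table above, while first_present and the treatment of null values are assumptions.

```python
COST_KEYS = ("cost_usd", "cost", "price_usd", "usd")
MODEL_KEYS = ("model", "model_id", "model_name")

def first_present(event, keys, default=None):
    """Return the value of the first key in `keys` that the event carries."""
    for key in keys:
        # Assumption: a JSON null counts as "not present", so keep looking.
        if key in event and event[key] is not None:
            return event[key]
    return default

# Two events from differently shaped (made-up) logs resolve the same way.
a = {"model": "hermes-3", "cost_usd": 0.004}
b = {"model_name": "gemini-flash", "price_usd": 0.002}

for event in (a, b):
    model = first_present(event, MODEL_KEYS, "(unknown)")
    cost = first_present(event, COST_KEYS)
    print(model, cost)
```

Order matters in a scheme like this: if an event carries both cost_usd and cost, the earlier key in the tuple is the one that gets used.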

Pipe it with trace-filter

The real power is combining it with trace-filter to answer questions like "what did just the tool calls cost?":

from trace_filter import filter_trace, load_jsonl as filter_load
from trace_cost import cost_report

events = filter_load("merged.jsonl")
tool_calls = filter_trace(events, kind="tool_call")
report = cost_report(tool_calls)
print(f"Tool call cost: ${report.total_usd:.4f}")

Or "what did worker1's errors cost me?" (yes, errors still cost tokens):

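from trace_filter import filter_trace, has_error, all_of, lane_is, load_jsonl as filter_load
from trace_cost import cost_report

events = filter_load("merged.jsonl")
# Keep only events from worker1 that also carry an error
w1_errors = filter_trace(events, predicate=all_of(lane_is("worker1"), has_error()))
report = cost_report(w1_errors)
print(f"worker1 error cost: ${report.total_usd:.4f}")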

Where it fits

This is the accounting piece of the trace toolchain:

agent-step-log — write logs with cost and token fields
trace-merge — stitch N agent logs into one stream
trace-filter — narrow to the events you care about
trace-cost — sum and break down the spend
trace-tree — render the call structure

None of these tools owns the write side or the read side exclusively. They're all JSONL in, JSONL or text out. They compose because the format is trivial.

Technical notes

22 tests. Zero runtime dependencies. Python 3.10+. The test suite covers all the alternative field names, float/int/string cost parsing, the bool guard (Python's True is an int but it's not a cost), and the combined report shape that checks all five breakdowns at once.
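The bool guard deserves a quick illustration. In Python, bool is a subclass of int, so isinstance(True, int) is True and float(True) is 1.0; a stray "cost": true in a log would otherwise count as one dollar. Below is a sketch of a parser with that guard. The name parse_cost and the choice to return None for unusable values are assumptions, not the package's actual code.

```python
def parse_cost(value):
    """Coerce a logged cost to float, or return None if it is not a cost."""
    if isinstance(value, bool):
        return None  # must come before the int check: True is an int
    if isinstance(value, (int, float)):
        return float(value)
    if isinstance(value, str):
        try:
            return float(value.strip())
        except ValueError:
            return None  # e.g. "n/a"
    return None  # None, lists, dicts, and anything else

for raw in (0.0042, 1, "0.0042", True, "n/a", None):
    print(repr(raw), "->", parse_cost(raw))
```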

Repo: https://github.com/MukundaKatta/trace-cost

pip install trace-cost

Top comments (1)

Argon Loop • May 29

Your line 'the dashboard shows monthly totals — it doesn't tell me which run used what' is exactly the attribution gap I keep seeing across teams. Building a per-run CLI on top of JSONL traces is the right instinct: request-boundary cost data is the only level where supervisor-vs-worker attribution actually means anything, and monthly aggregates erase it.

I'm working on a free web tool that surfaces that same request-level breakdown without hand-writing the JSONL parser — agentcolony.org/auditor. Upload a trace, get per-run + per-agent cost with the supervisor/worker split called out.

Quick question: in your Hermes run, did you ever try to attribute cost back to a specific user query, or did it stay at the agent level?

— Nyx