Mukunda Rao Katta

My Hermes agent spent how much? trace-cost for JSONL audit logs

Hermes Agent Challenge Submission: Build With Hermes Agent

This is a submission for the Hermes Agent Challenge.

I ran a Hermes multi-agent research run last Thursday. Three agents, a supervisor and two workers, 47 steps total. The run finished successfully. Then I looked at my API dashboard and asked: how much did that cost?

The dashboard shows monthly totals. It doesn't break spend down by run. My JSONL logs had cost_usd on each step, which I had put there on purpose, but summing 47 lines of JSON by hand is not something I'm willing to do. I wrote a one-liner with jq. Then I wanted it broken down by model. Then by agent (lane). Then by step type. The jq one-liner got long.

That's trace-cost.

One command

python3 -m trace_cost run.jsonl

total:     $0.034200
events:    47 (12 without cost)
tokens in: 48,200
tokens out:8,400

python3 -m trace_cost merged.jsonl --by model

by model:
  hermes-3                       $0.028000  (81.9%)
  gemini-flash                   $0.006200  (18.1%)

python3 -m trace_cost merged.jsonl --by lane

by lane:
  supervisor                     $0.008000  (23.4%)
  worker1                        $0.014000  (40.9%)
  worker2                        $0.012200  (35.7%)

That last one told me something I didn't expect: worker1 was doing more LLM work than worker2. I had assumed they were symmetric. They weren't. I had given them different system prompt lengths. The cost breakdown surfaced that in one command.
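For intuition, here is a minimal sketch of how a per-lane breakdown with percentages could be computed from raw events. It is not trace-cost's actual code: the `breakdown` helper and the `lane` field name are assumptions made for illustration, and the sample events just reuse the lane totals shown above.

```python
from collections import defaultdict


def breakdown(events, key):
    """Sum cost_usd per value of `key` and attach each group's share of the total.

    Hypothetical helper for illustration; not part of trace-cost.
    """
    totals = defaultdict(float)
    for event in events:
        cost = event.get("cost_usd")
        if cost is None:
            continue  # events without a recorded cost contribute nothing
        totals[event.get(key, "(unknown)")] += cost
    grand_total = sum(totals.values())
    return {
        name: (usd, 100.0 * usd / grand_total if grand_total else 0.0)
        for name, usd in totals.items()
    }


# Sample events; the "lane" field name is an assumption.
events = [
    {"lane": "supervisor", "cost_usd": 0.008},
    {"lane": "worker1", "cost_usd": 0.014},
    {"lane": "worker2", "cost_usd": 0.0122},
    {"lane": "worker1"},  # a step with no cost recorded
]

result = breakdown(events, "lane")
for name, (usd, pct) in result.items():
    print(f"  {name:<30} ${usd:.6f}  ({pct:.1f}%)")
```

Grouping on a different key, such as the model field, uses the same loop with a different `key` argument.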

Python API

from trace_cost import cost_report, load_jsonl

events = load_jsonl("merged.jsonl")
report = cost_report(events)

print(f"Total: ${report.total_usd:.4f}")
print(f"By model: {report.by_model}")
print(f"By step: {report.by_step}")

CostReport is a dataclass with total_usd, tokens_in, tokens_out, by_model, by_lane, by_step, event_count, and zero_cost_events. You get a structured result you can inspect, serialize, or compare between runs.
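To make "serialize or compare between runs" concrete, here is a rough sketch built on a stand-in dataclass. `CostReportSketch` only mirrors the field names listed above and is not the library's class; the dict-of-floats shape for the breakdowns and the `lane_delta` helper are assumptions.

```python
import json
from dataclasses import asdict, dataclass, field


@dataclass
class CostReportSketch:
    """Hypothetical stand-in using the field names listed above."""
    total_usd: float = 0.0
    tokens_in: int = 0
    tokens_out: int = 0
    by_model: dict = field(default_factory=dict)
    by_lane: dict = field(default_factory=dict)
    by_step: dict = field(default_factory=dict)
    event_count: int = 0
    zero_cost_events: int = 0


def lane_delta(before, after):
    """Per-lane cost change between two reports (hypothetical helper)."""
    lanes = sorted(set(before.by_lane) | set(after.by_lane))
    return {
        lane: round(after.by_lane.get(lane, 0.0) - before.by_lane.get(lane, 0.0), 6)
        for lane in lanes
    }


run_a = CostReportSketch(total_usd=0.0342, by_lane={"worker1": 0.014, "worker2": 0.0122})
run_b = CostReportSketch(total_usd=0.0302, by_lane={"worker1": 0.010, "worker2": 0.0122})

# dataclasses.asdict gives a plain dict, which json can serialize directly.
serialized = json.dumps(asdict(run_a), indent=2)
print(serialized)

# A negative value means the lane got cheaper in the second run.
print(lane_delta(run_a, run_b))
```

Because the report is plain data, storing one JSON file per run and diffing them later needs nothing beyond the standard library.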

Works with whatever field names you already use

The most useful thing I did was make trace-cost try multiple common field names without requiring config:

| What | Keys tried |
| --- | --- |
| Cost | cost_usd, cost, price_usd, usd |
| Input tokens | tokens_in, input_tokens, prompt_tokens |
| Output tokens | tokens_out, output_tokens, completion_tokens |
| Model | model, model_id, model_name |
| Step name | name, step, kind, event_type, type |

So it works on agenttrace output, agentleash output, agent-step-log output, and whatever scratch logs you wrote yourself, without a config file.
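The lookup idea can be sketched in a few lines. This is an illustration under assumptions, not the library's implementation: `first_present` is a hypothetical helper, and the sketch assumes keys are tried in the order the table lists them.

```python
# Alias lists copied from the table above.
COST_KEYS = ("cost_usd", "cost", "price_usd", "usd")
MODEL_KEYS = ("model", "model_id", "model_name")


def first_present(event, keys, default=None):
    """Return the value of the first key that exists and is not None.

    Hypothetical helper; the try-in-table-order behavior is an assumption.
    """
    for key in keys:
        if event.get(key) is not None:
            return event[key]
    return default


events = [
    {"model": "hermes-3", "cost_usd": 0.002},
    {"model_id": "gemini-flash", "price_usd": 0.001},
    {"name": "tool_call"},  # neither a model nor a cost
]

for event in events:
    model = first_present(event, MODEL_KEYS, "(unknown)")
    cost = first_present(event, COST_KEYS, 0.0)
    print(model, cost)
```

An ordered tuple of aliases keeps the precedence explicit: if a log happens to carry both cost_usd and cost, the first listed name wins.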

Pipe it with trace-filter

The real power is combining it with trace-filter to answer questions like "what did just the tool calls cost?":

from trace_filter import filter_trace, load_jsonl as filter_load
from trace_cost import cost_report

events = filter_load("merged.jsonl")
tool_calls = filter_trace(events, kind="tool_call")
report = cost_report(tool_calls)
print(f"Tool call cost: ${report.total_usd:.4f}")

Or "what did worker1's errors cost me?" (yes, errors still cost tokens):

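from trace_filter import filter_trace, has_error, all_of, lane_is
from trace_filter import load_jsonl as filter_load
from trace_cost import cost_report

events = filter_load("merged.jsonl")
# Keep only events that are both in worker1's lane and carry an error
w1_errors = filter_trace(events, predicate=all_of(lane_is("worker1"), has_error()))
report = cost_report(w1_errors)
print(f"worker1 error cost: ${report.total_usd:.4f}")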
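If the predicate style is unfamiliar, this is roughly how such combinators compose. The functions below are stand-ins written for this sketch and are not trace-filter's implementation; the `lane` and `error` field names and the `filter_events` helper are assumptions.

```python
def lane_is(name):
    """Predicate factory: true when the event's lane matches (stand-in)."""
    return lambda event: event.get("lane") == name


def has_error():
    """Predicate factory: true when the event has a non-empty error field (stand-in)."""
    return lambda event: bool(event.get("error"))


def all_of(*predicates):
    """Combine predicates with logical AND (stand-in)."""
    return lambda event: all(p(event) for p in predicates)


def filter_events(events, predicate):
    """Keep the events for which the predicate holds (hypothetical helper)."""
    return [event for event in events if predicate(event)]


events = [
    {"lane": "worker1", "error": "timeout", "cost_usd": 0.002},
    {"lane": "worker1", "cost_usd": 0.003},
    {"lane": "worker2", "error": "timeout", "cost_usd": 0.001},
]

w1_errors = filter_events(events, all_of(lane_is("worker1"), has_error()))
error_cost = sum(event["cost_usd"] for event in w1_errors)
print(f"worker1 error cost: ${error_cost:.4f}")
```

Each factory returns a plain function of one event, so adding a new condition means writing one more small function rather than extending a query language.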

Where it fits

This is the accounting piece of the trace toolchain:

  1. agent-step-log — write logs with cost and token fields
  2. trace-merge — stitch N agent logs into one stream
  3. trace-filter — narrow to the events you care about
  4. trace-cost — sum and break down the spend
  5. trace-tree — render the call structure

None of these tools owns the write side or the read side exclusively. They're all JSONL in, JSONL or text out. They compose by design because the format is trivial: one JSON object per line.
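A minimal sketch of that JSONL-in, JSONL-out contract, using only the standard library. The `read_jsonl` and `write_jsonl` helpers are hypothetical and not taken from any of the tools above, and the `lane` field name is an assumption.

```python
import io
import json


def read_jsonl(stream):
    """Yield one dict per non-blank line of a JSONL stream (hypothetical helper)."""
    for line in stream:
        line = line.strip()
        if line:
            yield json.loads(line)


def write_jsonl(events, stream):
    """Write each event as one JSON object per line (hypothetical helper)."""
    for event in events:
        stream.write(json.dumps(event) + "\n")


# In-memory streams stand in for files or stdin/stdout.
source = io.StringIO(
    '{"lane": "worker1", "cost_usd": 0.014}\n'
    "\n"
    '{"lane": "worker2", "cost_usd": 0.0122}\n'
)
sink = io.StringIO()

# A tiny pipeline stage: keep only worker1's events and pass them along.
write_jsonl((e for e in read_jsonl(source) if e["lane"] == "worker1"), sink)
print(sink.getvalue(), end="")
```

Swapping the in-memory streams for sys.stdin and sys.stdout turns the same two functions into a shell pipeline stage.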

Technical notes

22 tests. Zero runtime dependencies. Python 3.10+. The test suite covers all the alternative field names, float/int/string cost parsing, the bool guard (Python's True is an int but it's not a cost), and the combined report shape that checks all five breakdowns at once.
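The bool guard is worth a quick illustration. This is a sketch of the idea only, not the code in the repo: `parse_cost` is a hypothetical name, and returning None for unusable values is an assumption.

```python
def parse_cost(value):
    """Coerce a logged cost to float, or return None if it isn't usable.

    Hypothetical helper; the None-on-failure convention is an assumption.
    """
    # bool is a subclass of int in Python, so this check must come first;
    # otherwise True would be counted as a cost of 1.0.
    if isinstance(value, bool):
        return None
    if isinstance(value, (int, float)):
        return float(value)
    if isinstance(value, str):
        try:
            return float(value.strip())
        except ValueError:
            return None
    return None


for raw in (0.002, 1, "0.002", " 3 ", True, "n/a", None):
    print(repr(raw), "->", parse_cost(raw))
```

The order of the isinstance checks is what matters here: testing for bool before int is what keeps a stray `true` in a log from being billed as a dollar.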

Repo: https://github.com/MukundaKatta/trace-cost

pip install trace-cost