This is a submission for the Hermes Agent Challenge.
I ran a Hermes multi-agent research run last Thursday. Three agents, a supervisor and two workers, 47 steps total. The run finished successfully. Then I looked at my API dashboard and asked: how much did that cost?
The dashboard shows monthly totals. It doesn't tell me which run used what. My JSONL logs had cost_usd on each step — I had put it there on purpose. But summing 47 lines of JSON by hand is not something I'm willing to do. I wrote a one-liner with jq. Then I wanted it broken down by model. Then by agent (lane). Then by step type. The jq one-liner got long.
That ever-growing one-liner became trace-cost.
One command
python3 -m trace_cost run.jsonl
total: $0.034200
events: 47 (12 without cost)
tokens in: 48,200
tokens out: 8,400
python3 -m trace_cost merged.jsonl --by model
by model:
hermes-3 $0.028000 (81.9%)
gemini-flash $0.006200 (18.1%)
python3 -m trace_cost merged.jsonl --by lane
by lane:
supervisor $0.008000 (23.4%)
worker1 $0.014000 (40.9%)
worker2 $0.012200 (35.7%)
That last one told me something I didn't expect: worker1 was doing more LLM work than worker2. I had assumed they were symmetric. They weren't. I had given them different system prompt lengths. The cost breakdown surfaced that in one command.
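The per-lane breakdown is, at heart, a grouped sum with each group's share of the total. Here is a minimal sketch of that idea, not trace-cost's actual implementation: the `breakdown` helper is hypothetical, it reads only a `cost_usd` field, and the sample events are made up to mirror the numbers above.

```python
from collections import defaultdict


def breakdown(events, key, cost_key="cost_usd"):
    """Sum cost per value of `key` and attach each group's share of the total."""
    totals = defaultdict(float)
    for event in events:
        # Events missing the grouping key are collected under "unknown".
        totals[event.get(key, "unknown")] += float(event.get(cost_key, 0.0))
    grand_total = sum(totals.values())
    return {
        name: (cost, 100.0 * cost / grand_total if grand_total else 0.0)
        for name, cost in totals.items()
    }


# Illustrative events only; a real trace has one JSON object per line.
events = [
    {"lane": "supervisor", "cost_usd": 0.008},
    {"lane": "worker1", "cost_usd": 0.010},
    {"lane": "worker1", "cost_usd": 0.004},
    {"lane": "worker2", "cost_usd": 0.0122},
]

for lane, (cost, share) in breakdown(events, "lane").items():
    print(f"{lane:<12} ${cost:.6f} ({share:.1f}%)")
```

Passing `"model"` instead of `"lane"` as the key gives the per-model view from the same loop.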
Python API
from trace_cost import cost_report, load_jsonl
events = load_jsonl("merged.jsonl")
report = cost_report(events)
print(f"Total: ${report.total_usd:.4f}")
print(f"By model: {report.by_model}")
print(f"By step: {report.by_step}")
CostReport is a dataclass with total_usd, tokens_in, tokens_out, by_model, by_lane, by_step, event_count, and zero_cost_events. You get a structured result you can inspect, serialize, or compare between runs.
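To make "inspect, serialize, or compare" concrete, here is a stand-in dataclass with the same field names. The types, defaults, and the `cost_delta` helper are assumptions for illustration; the real `CostReport` may differ.

```python
import json
from dataclasses import asdict, dataclass, field


@dataclass
class CostReportSketch:
    # Field names follow the article; types and defaults are assumptions.
    total_usd: float = 0.0
    tokens_in: int = 0
    tokens_out: int = 0
    by_model: dict = field(default_factory=dict)
    by_lane: dict = field(default_factory=dict)
    by_step: dict = field(default_factory=dict)
    event_count: int = 0
    zero_cost_events: int = 0


def cost_delta(before, after):
    """Difference in total spend between two runs, in USD (hypothetical helper)."""
    return after.total_usd - before.total_usd


baseline = CostReportSketch(
    total_usd=0.0342,
    tokens_in=48_200,
    tokens_out=8_400,
    by_model={"hermes-3": 0.028, "gemini-flash": 0.0062},
    event_count=47,
    zero_cost_events=12,
)
candidate = CostReportSketch(total_usd=0.0300, event_count=47)

# Any dataclass converts to a plain dict, which json can serialize directly.
print(json.dumps(asdict(baseline), indent=2))
print(f"delta: ${cost_delta(baseline, candidate):+.4f}")
```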
Works with whatever field names you already use
The most useful thing I did was make trace-cost try multiple common field names without requiring config:
| What | Keys tried |
|---|---|
| Cost | cost_usd, cost, price_usd, usd |
| Input tokens | tokens_in, input_tokens, prompt_tokens |
| Output tokens | tokens_out, output_tokens, completion_tokens |
| Model | model, model_id, model_name |
| Step name | name, step, kind, event_type, type |
So it works on agenttrace output, agentleash output, agent-step-log output, and whatever scratch logs you wrote yourself, without a config file.
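The fallback in the table above can be sketched as a first-match lookup over a tuple of candidate keys. The key lists are copied from the table; the `first_present` helper, trying the keys in table order, and treating `None` as missing are assumptions, not a description of trace-cost's internals.

```python
COST_KEYS = ("cost_usd", "cost", "price_usd", "usd")
MODEL_KEYS = ("model", "model_id", "model_name")


def first_present(event, keys, default=None):
    """Return the value of the first key in `keys` that the event carries."""
    for key in keys:
        # Assumption: a key holding None counts as absent.
        if key in event and event[key] is not None:
            return event[key]
    return default


# Two events written by different (made-up) loggers with different field names.
a = {"model": "hermes-3", "cost_usd": 0.002}
b = {"model_name": "gemini-flash", "price_usd": 0.0004}

for event in (a, b):
    print(first_present(event, MODEL_KEYS), first_present(event, COST_KEYS, 0.0))
```

Keeping the candidates in ordered tuples means supporting a new log format is a one-line change.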
Pipe it with trace-filter
The real power is combining it with trace-filter to answer questions like "what did just the tool calls cost?":
from trace_filter import filter_trace, load_jsonl as filter_load
from trace_cost import cost_report
events = filter_load("merged.jsonl")
tool_calls = filter_trace(events, kind="tool_call")
report = cost_report(tool_calls)
print(f"Tool call cost: ${report.total_usd:.4f}")
Or "what did worker1's errors cost me?" (yes, errors still cost tokens):
from trace_filter import filter_trace, has_error, all_of, lane_is, load_jsonl as filter_load
from trace_cost import cost_report
events = filter_load("merged.jsonl")
w1_errors = filter_trace(events, predicate=all_of(lane_is("worker1"), has_error()))
report = cost_report(w1_errors)
Where it fits
This is the accounting piece of the trace toolchain:
- agent-step-log — write logs with cost and token fields
- trace-merge — stitch N agent logs into one stream
- trace-filter — narrow to the events you care about
- trace-cost — sum and break down the spend
- trace-tree — render the call structure
None of these tools owns the write side or the read side exclusively. They're all JSONL in, JSONL or text out, and composable by design because the format is trivial.
Technical notes
22 tests. Zero runtime dependencies. Python 3.10+. The test suite covers all the alternative field names, float/int/string cost parsing, the bool guard (Python's True is an int but it's not a cost), and the combined report shape that checks all five breakdowns at once.
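The bool guard deserves a closer look, because `isinstance(True, int)` is `True` in Python and a naive numeric check would count a logged `true` as a cost of 1. Here is a minimal sketch of that kind of parsing; `parse_cost` is a hypothetical name, and returning `None` for unusable values is an assumption rather than trace-cost's documented behavior.

```python
def parse_cost(value):
    """Coerce a logged cost to float, or return None if it is not a usable number."""
    # bool is a subclass of int, so it has to be rejected before the numeric check.
    if isinstance(value, bool):
        return None
    if isinstance(value, (int, float)):
        return float(value)
    if isinstance(value, str):
        try:
            return float(value.strip())
        except ValueError:
            return None
    return None


for raw in (0.0021, 3, "0.0021", True, "n/a", None):
    print(repr(raw), "->", parse_cost(raw))
```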
Repo: https://github.com/MukundaKatta/trace-cost
pip install trace-cost