I ran a small agent. Three steps. One web search, one summarize, one cite-check. I had budgeted maybe 12 cents.
The bill at the end of the run was $4.20.
I knew something was off but the per-call invoice line items were not telling me anything useful. They were just a list of messages.create calls. I needed to group them into the run that produced them and look at the cost shape.
That is the gap agenttrace-rs fills. It is a Rust crate that aggregates LLM calls into runs and gives you cost, latency, and a by-model breakdown.
The breakdown that surfaced the bug
use agenttrace::{Trace, Run};
let mut trace = Trace::new();
let run = trace.start_run("cite-check-agent");
run.record_call(claude_cost::estimate(&req1, &resp1));
run.record_call(claude_cost::estimate(&req2, &resp2));
run.record_call(claude_cost::estimate(&req3, &resp3));
// ... and so on for every tool result/follow-up step
let summary = run.finish();
println!("{}", summary.report());
The report it printed for the $4.20 run:
run: cite-check-agent  duration: 38.4s  total_cost_usd: 4.2031
calls: 11
p50_latency_ms: 2710
p95_latency_ms: 4920
by-model:
  claude-opus-4-7: 9 calls  $4.1880  avg_input_tok: 18,420  avg_output_tok: 540
  claude-haiku-4:  2 calls  $0.0151  avg_input_tok: 1,200   avg_output_tok: 180
by-step:
  step_1_search:     1 call   $0.0184  1,800 in  220 out
  step_2_summarize:  1 call   $0.0312  3,100 in  280 out
  step_3_cite_check: 9 calls  $4.1535  avg 22,400 in  avg 510 out
Step 3 was supposed to be one call. It was nine. And the average input tokens were 22,400. That is the smoking gun.
What was actually happening
The cite-check step had a tool the model could call to fetch a source URL. When the model called the tool, I appended the tool result to the messages list and re-called messages.create. Standard pattern.
What I missed: every iteration was re-attaching the full prior history, including the search results from step 1 and the summary from step 2. So call 4 had everything from calls 1-3 in its input. Call 5 had everything from calls 1-4. And so on. Input tokens grew linearly per call, so total input tokens grew quadratically over the step.
The model kept calling the tool again because the prompt was structured ambiguously. So I had an unbounded loop hidden behind a 9-iteration tool dance. O(n²) input tokens for n iterations.
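To make the quadratic growth concrete, here is a minimal sketch of the arithmetic. The base prompt size and the per-turn tool result size are made-up round numbers for illustration, not measurements from the run above, and the function names are hypothetical.

```rust
/// Input tokens sent on each call when every call re-attaches the whole
/// history: a fixed base prompt plus every earlier tool result.
/// `base` and `per_turn` are illustrative values, not measured ones.
fn full_history_inputs(base: u64, per_turn: u64, calls: u64) -> Vec<u64> {
    (0..calls).map(|i| base + i * per_turn).collect()
}

/// Same loop, but only the most recent `window` tool results are kept,
/// so the per-call input stops growing once the window is full.
fn windowed_inputs(base: u64, per_turn: u64, calls: u64, window: u64) -> Vec<u64> {
    (0..calls).map(|i| base + i.min(window) * per_turn).collect()
}

/// Total input tokens billed across the whole step.
fn total(inputs: &[u64]) -> u64 {
    inputs.iter().sum()
}
```

With a 5,000-token base and 4,000 tokens per tool result, nine calls send 189,000 input tokens with full history and 105,000 with a window of two. The full-history total is n * base + per_turn * n(n-1)/2, which is where the n² comes from.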
The fix was small. I stopped re-attaching the full history on each tool turn and used a sliding window. Re-ran the same run cold:
run: cite-check-agent  duration: 11.2s  total_cost_usd: 0.1432
calls: 5
p50_latency_ms: 2200
p95_latency_ms: 3050
by-model:
  claude-opus-4-7: 3 calls  $0.1290
  claude-haiku-4:  2 calls  $0.0142
by-step:
  step_1_search:     1 call   $0.0181
  step_2_summarize:  1 call   $0.0308
  step_3_cite_check: 3 calls  $0.0943
14 cents. About 30x cheaper. I would not have found the bug without the by-step grouping.
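For reference, a sliding window over the message history can be sketched roughly like this. This is not the exact code from my agent: the Message type and the sliding_window function are hypothetical names for illustration, and the sketch assumes the first message is the task prompt you always want to keep.

```rust
/// Hypothetical message type for illustration; not part of agenttrace.
#[derive(Clone, Debug, PartialEq)]
struct Message {
    role: &'static str,
    content: String,
}

/// Keep the first message (assumed to be the task prompt) plus the
/// `keep_last` most recent messages, dropping everything in between.
fn sliding_window(history: &[Message], keep_last: usize) -> Vec<Message> {
    // Nothing to trim if the history already fits in the window.
    if history.len() <= keep_last + 1 {
        return history.to_vec();
    }
    let mut out = Vec::with_capacity(keep_last + 1);
    out.push(history[0].clone());
    out.extend_from_slice(&history[history.len() - keep_last..]);
    out
}
```

One caveat: if your API requires each tool result to follow its matching tool call, choose the window size so those pairs are never split.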
What agenttrace actually does
use agenttrace::{Trace, Tag};

let mut trace = Trace::new();
let run = trace.start_run("my-agent");
run.tag("user_id", "u_8821");
run.tag("step", "search");

// for each LLM call
run.record(agenttrace::CallRecord {
    model: "claude-opus-4-7".into(),
    input_tokens: 1800,
    output_tokens: 220,
    cache_read_tokens: 0,
    cache_write_tokens: 0,
    latency_ms: 2710,
    cost_usd: 0.0184,
    tags: vec![Tag::step("search")],
});

let summary = run.finish();
trace.append(summary);

// serialize all runs
let json = serde_json::to_string(&trace.runs())?;
It is a thin aggregator. It does not call the API. It does not make pricing decisions. You feed it call records (typically computed from claude-cost or your own pricing function) and it composes them into a run with cost, p50/p95, and per-tag breakdowns.
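To show what "thin aggregator" means in practice, here is a rough sketch of a per-step rollup. It is not the crate's implementation: the Call struct is a pared-down, hypothetical stand-in for CallRecord, by_step is a made-up name, and the costs in any usage are illustrative.

```rust
use std::collections::BTreeMap;

/// Hypothetical, pared-down call record; the real CallRecord has more fields.
struct Call {
    step: &'static str,
    cost_usd: f64,
}

/// Roll calls up into (call count, total cost in USD) per step name.
/// A BTreeMap keeps the steps in a stable, sorted order for reporting.
fn by_step(calls: &[Call]) -> BTreeMap<&'static str, (usize, f64)> {
    let mut out = BTreeMap::new();
    for c in calls {
        let entry = out.entry(c.step).or_insert((0usize, 0.0f64));
        entry.0 += 1;
        entry.1 += c.cost_usd;
    }
    out
}
```

A step that shows nine calls where you expected one stands out immediately in this shape, which is the whole point of grouping by tag.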
Why p95 matters more than mean
avg_latency_ms lies. A five-call run with one slow call (the model thought for 12 seconds, the other four returned in 2) shows a mean of about 4 seconds, which matches none of the calls. The p95 shows the actual tail. For agents this is the number that tells you whether your user-facing experience is going to feel snappy or laggy. agenttrace exposes p50, p95, and p99 by default.
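Here is a small sketch of the mean-versus-percentile difference, using the nearest-rank definition of a percentile. I am not claiming this is the method agenttrace uses internally; the function names are hypothetical and the latencies are the illustrative case of one 12-second call among four 2-second calls.

```rust
/// Nearest-rank percentile over latencies in milliseconds.
/// Returns None for an empty slice. `p` is expected in (0, 100].
fn percentile(latencies_ms: &[u64], p: f64) -> Option<u64> {
    if latencies_ms.is_empty() {
        return None;
    }
    let mut sorted = latencies_ms.to_vec();
    sorted.sort_unstable();
    // Nearest-rank: the smallest value with at least p% of samples at or below it.
    let rank = ((p / 100.0) * sorted.len() as f64).ceil() as usize;
    Some(sorted[rank.clamp(1, sorted.len()) - 1])
}

/// Arithmetic mean in milliseconds, or None for an empty slice.
fn mean(latencies_ms: &[u64]) -> Option<f64> {
    if latencies_ms.is_empty() {
        return None;
    }
    Some(latencies_ms.iter().sum::<u64>() as f64 / latencies_ms.len() as f64)
}
```

For latencies of 2000, 2000, 12000, 2000, 2000 ms, the mean is 4000 ms, the p50 is 2000 ms, and the p95 is 12000 ms. Only the p95 tells you a user waited 12 seconds.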
Composing with other crates
- claude-cost for the per-call cost estimate (cache-aware).
- cachebench to see the cache hit ratio across the run.
- llm-circuit-breaker to short-circuit a run when an upstream is degraded so you do not pay $4.20 to discover that.
A typical pipeline in our service looks like: cachebench records hit/miss → claude-cost computes cost given hits → agenttrace aggregates into a run summary.
What this does not solve
- It does not store traces durably. Trace is in-memory. You serialize to disk or to a remote sink yourself. I do that with a one-line serde_json::to_writer to a sqlite blob.
- It does not visualize. There is no UI. You get JSON or text reports. If you want a flamegraph, pipe to your own viewer.
- It does not capture the request bodies. Pair with agenttap for that. agenttrace is the cost/latency layer, not the wire layer.
- The tagging system is flat. There is no nested-span model. If you need that, OpenTelemetry is the right tool and otel-genai-bridge-rs can translate between conventions.
The crate is about 600 lines of pure Rust. No async lock-in.
Repo: https://github.com/MukundaKatta/agenttrace-rs
crates.io: agenttrace = { package = "agenttrace-rs", version = "0.1" }
Part of a small Rust stack I publish for AI agent plumbing: cost, retry, breakers, repair, trace. Built piece by piece from real incidents.