This is a submission for the Hermes Agent Challenge.
I was profiling a Hermes research agent last week. The run had 47 steps. My aggregate stats from trace-stats showed a p95 latency of 1.68 seconds — not terrible, but higher than I expected. The mean was 900ms. Something was pulling the tail up.
I could have sorted the steps by duration_ms and looked at the top of the list. That works. But I wanted something that would tell me which events were statistically anomalous: not just "the slowest ones" but "the ones that are outliers relative to the distribution." The distinction matters when most events cluster around 900ms and one event takes 4.8 seconds.
That's trace-anomaly.
One command
python3 -m trace_anomaly run.jsonl duration_ms
field: duration_ms
events: 47
iqr: 250.0 (q1=650.0, q3=900.0)
fences: [275.0, 1275.0]
anomalies: 2
# event index value dir score name/kind
1 34 4800.0 high 14.10 tool_call
2 21 1890.0 high 2.46 tool_call
Event #34 took 4.8 seconds. It's 14 IQR units above the upper fence. Event #21 took 1.89 seconds — less extreme but still an outlier. Both were tool_call steps. I looked at the event payloads and found that both were calling an external search API that has rate limiting. The 4.8s step was a retry after a 429.
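As a sanity check, the fences and scores in that output can be reproduced by hand from the printed quartiles. This small sketch assumes the score is the distance past the upper fence divided by the IQR; that definition is my reading of the output, not something taken from the tool's source.

```python
# Quartiles copied from the CLI output above.
q1, q3 = 650.0, 900.0
k = 1.5  # multiplier that reproduces the printed fences

iqr = q3 - q1                # spread of the middle 50% of durations
lower_fence = q1 - k * iqr
upper_fence = q3 + k * iqr

# Assumed score definition: distance past the fence, measured in IQR units.
scores = {v: round((v - upper_fence) / iqr, 2) for v in (4800.0, 1890.0)}

print(f"iqr={iqr} fences=[{lower_fence}, {upper_fence}]")
print(scores)
```

The values this prints line up with the iqr, fences, and score columns in the report.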
How IQR detection works
No ML, no trained model. Just statistics:
- Compute Q1 (25th percentile) and Q3 (75th percentile) of the distribution
- IQR = Q3 - Q1
- Lower fence = Q1 - 1.5 × IQR
- Upper fence = Q3 + 1.5 × IQR
- Flag events outside [lower_fence, upper_fence]
The 1.5 multiplier is Tukey's inner fence — the standard choice for "mild outliers." Use k=3.0 for "extreme outliers only":
report = detect_anomalies(events, "duration_ms", k=3.0)
The score tells you how many IQR units the anomalous value is from the fence. Score of 14 is very extreme. Score of 2.46 is mild. You can sort or filter by score:
for a in report.anomalies:
    if a.score > 5.0:
        print(f"Severe anomaly at event #{a.index}: {a.value:.0f}ms")
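The steps in this section can be sketched with nothing but the standard library. This is an illustration of the technique, not trace-anomaly's actual code: `iqr_outliers` and `Outlier` are hypothetical names, and the "inclusive" quartile method is an assumption, since the post doesn't say which percentile interpolation the tool uses (a different method shifts Q1 and Q3 slightly on small samples).

```python
from dataclasses import dataclass
from statistics import quantiles


@dataclass
class Outlier:
    index: int      # position in the input list
    value: float
    direction: str  # "high" or "low"
    score: float    # distance past the fence, in IQR units


def iqr_outliers(values, k=1.5):
    """Return (q1, q3, lower_fence, upper_fence, outliers) for a list of numbers."""
    # Assumption: "inclusive" quartiles; trace-anomaly may interpolate differently.
    q1, _, q3 = quantiles(values, n=4, method="inclusive")
    iqr = q3 - q1
    lower, upper = q1 - k * iqr, q3 + k * iqr
    outliers = []
    if iqr > 0:  # with zero spread there is no scale to score against
        for i, v in enumerate(values):
            if v > upper:
                outliers.append(Outlier(i, v, "high", (v - upper) / iqr))
            elif v < lower:
                outliers.append(Outlier(i, v, "low", (lower - v) / iqr))
    # Most extreme first.
    outliers.sort(key=lambda o: o.score, reverse=True)
    return q1, q3, lower, upper, outliers


durations = [700, 650, 800, 4800, 900, 850, 600, 950, 750]  # made-up sample, in ms
q1, q3, lower, upper, outliers = iqr_outliers(durations)
print(f"fences: [{lower}, {upper}]")
for o in outliers:
    print(o.index, o.value, o.direction, round(o.score, 2))
```

On this made-up sample the fences come out as [400.0, 1200.0] and the only flagged value is the 4800 at index 3. Passing k=3.0 widens the upper fence to 1500.0, which is the "extreme outliers only" behavior described above.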
Python API
from trace_anomaly import detect_anomalies, load_jsonl
events = load_jsonl("run.jsonl")
report = detect_anomalies(events, "duration_ms")
print(f"IQR: {report.iqr:.0f}ms")
print(f"Fences: [{report.lower_fence:.0f}, {report.upper_fence:.0f}]")
print(f"Anomalies: {len(report.anomalies)}")
for a in report.anomalies:
    # a.event is the full original event dict
    name = a.event.get("name") or a.event.get("kind", "")
    print(f" {name}: {a.value:.0f}ms (score={a.score:.2f})")
Works with any numeric field
The algorithm is field-agnostic. Apply it to cost if you want to find the unexpectedly expensive steps:
python3 -m trace_anomaly run.jsonl cost_usd
Or tokens if you want to find the steps that consumed far more context than expected:
python3 -m trace_anomaly run.jsonl tokens_in
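Being field-agnostic means the detector first has to pull a clean list of numbers out of whatever field you name. Here is a minimal sketch of that step, assuming events that lack the field or hold a non-numeric value are simply skipped. How trace-anomaly itself treats those cases isn't described in this post, so `numeric_field` is a hypothetical helper and the sample events are made up.

```python
import json
from numbers import Real


def numeric_field(events, field):
    """Yield (index, value) for events whose `field` holds a real number."""
    for i, event in enumerate(events):
        value = event.get(field)
        # bool is a subclass of int in Python, so exclude it explicitly.
        if isinstance(value, Real) and not isinstance(value, bool):
            yield i, float(value)


# Hypothetical events; only the field names come from the examples above.
lines = [
    '{"kind": "tool_call", "cost_usd": 0.002, "tokens_in": 512}',
    '{"kind": "session_open"}',
    '{"kind": "tool_call", "cost_usd": "n/a", "tokens_in": 2048}',
    '{"kind": "tool_call", "cost_usd": true, "tokens_in": 640}',
]
events = [json.loads(line) for line in lines]

for field in ("cost_usd", "tokens_in"):
    print(field, list(numeric_field(events, field)))
```

Keeping the original index alongside each value is what lets a report point back to the right event even after some events were skipped.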
Combine with trace-filter to scope the analysis
Find anomalous tool calls only, not session lifecycle events:
from trace_filter import filter_trace, load_jsonl as filter_load, kind_is
from trace_anomaly import detect_anomalies
events = filter_load("merged.jsonl")
tool_calls = filter_trace(events, predicate=kind_is("tool_call"))
report = detect_anomalies(tool_calls, "duration_ms")
This gives you a tighter IQR because you're comparing tool calls against other tool calls, not against fast session_open events that would pull the distribution down.
What it does not do
It doesn't predict which events will be anomalous in the future. It doesn't train a model. It has no configuration beyond k. On a 47-step trace the analysis finishes in under a second, with zero setup.
If every event has the same value, the IQR is 0 and the report comes back with an empty anomaly list. Check report.iqr == 0 if you want to handle that case explicitly.
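One way to make that explicit check is sketched below. `describe` is a hypothetical helper, and the `SimpleNamespace` objects are stand-ins for real reports that carry only the `iqr` and `anomalies` attributes used earlier in this post, so the snippet runs without the library installed.

```python
from types import SimpleNamespace


def describe(report):
    """One-line summary that separates 'no spread' from 'nothing unusual'."""
    if report.iqr == 0:
        # Q1 == Q3: the fences collapse, so there is no scale to judge outliers by.
        return "no spread in this field; IQR detection is not informative"
    if not report.anomalies:
        return "no anomalies"
    return f"{len(report.anomalies)} anomalies"


# Stand-ins for real reports (hypothetical values).
flat = SimpleNamespace(iqr=0.0, anomalies=[])
clean = SimpleNamespace(iqr=250.0, anomalies=[])

print(describe(flat))
print(describe(clean))
```

Separating the two cases matters in dashboards or CI checks, where "no anomalies" on a constant field would otherwise look like a clean bill of health.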
Technical notes
19 tests. Zero runtime dependencies. Python 3.10+. The test suite covers the IQR computation on a known dataset, the effect of the k parameter on the fences, non-detection when all values are identical, high and low outliers, anomaly score ordering, index correctness, and handling of non-numeric values (booleans, non-numeric strings, missing fields).
Repo: https://github.com/MukundaKatta/trace-anomaly
pip install trace-anomaly