Mukunda Rao Katta

Posted on May 25

Pipe your Hermes Agent audit log into Datadog in 5 lines (no OTel SDK required)

#devchallenge #hermesagentchallenge #agents #observability

Hermes Agent Challenge Submission: Write About Hermes Agent

This is a submission for the Hermes Agent Challenge.

I have a folder of JSONL files. Each one is the audit log from a different Hermes agent run. Tool calls, costs, errors, blocked egress, the works. They look like this:

{"ts":1779638601.262,"session_id":"abc12","kind":"session_open"}
{"ts":1779638601.265,"session_id":"abc12","kind":"tool_ok","tool":"locus.payments.charge","usd":4.99,"extra":{"latency_ms":12}}
{"ts":1779638601.266,"session_id":"abc12","kind":"budget_denied","tool":"locus.payments.charge","usd":7.0,"error":"budget exceeded: 11.99 > 10"}

I read these in the terminal with trace-tree. That works fine for one run. But once I had thirty runs I wanted them in a real tracing UI. Something with a search bar. Something where I could group by tool and see the slowest five from last week.

The standard answer is OpenTelemetry. Drop the OTel SDK into your agent, set up a tracer provider, configure a span processor, configure a batch exporter, point it at a collector. Then your live runs flow into Datadog or Grafana or Jaeger.

That is the right answer for new runs. It is the wrong answer for a folder of old runs.

I do not want to bolt the OTel SDK into a script I wrote last week. I do not want to re-execute a run that already happened just so I can capture spans. I have the data. It is sitting in a JSONL file. I just want to push it into the collector.

So I wrote trace-to-otel.

pip install trace-to-otel

What it does

It reads a JSONL audit log and produces an OTLP/JSON payload. That payload is the exact wire format an OpenTelemetry collector accepts on POST /v1/traces. You can write it to a file, you can pipe it through curl, or you can let the library POST it for you.

from trace_to_otel import jsonl_to_otlp

jsonl_to_otlp(
    src="runs/audit.jsonl",
    dst="runs/spans.otlp.json",
    service_name="my-hermes-agent",
    semconv="otel-genai",
)

That is the whole API for the common case. Five lines if you count the import.

If you want to send it straight to a collector instead of dumping a file:

from trace_to_otel import JsonlSource, OtlpExporter

src = JsonlSource("runs/audit.jsonl")
exporter = OtlpExporter(service_name="my-hermes-agent")
payload = exporter.spans_from(src)
exporter.post_to("http://localhost:4318/v1/traces", payload)

No SDK. No tracer provider. No span processor. Just a dict and a POST.
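To make "a dict and a POST" concrete, here is a minimal standard-library sketch of the request side. It is not the library's code: `build_otlp_request` and `post_otlp` are hypothetical helper names, and the sketch assumes a collector listening for OTLP/HTTP JSON on localhost:4318.

```python
import json
import urllib.request


def build_otlp_request(url: str, payload: dict) -> urllib.request.Request:
    """Build (but do not send) an OTLP/HTTP JSON request for a traces payload."""
    body = json.dumps(payload).encode("utf-8")
    return urllib.request.Request(
        url,
        data=body,
        headers={"Content-Type": "application/json"},
        method="POST",
    )


def post_otlp(url: str, payload: dict, timeout: float = 10.0) -> int:
    """Send the payload to a collector and return the HTTP status code."""
    request = build_otlp_request(url, payload)
    with urllib.request.urlopen(request, timeout=timeout) as response:
        return response.status


# An empty payload keeps the sketch short; a real one would carry spans.
payload = {"resourceSpans": []}
request = build_otlp_request("http://localhost:4318/v1/traces", payload)
print(request.get_method(), request.full_url)
```

Calling `post_otlp(...)` with the same arguments would actually send it, which needs a running collector, so the sketch stops at building the request.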

Before and after

The second of the three JSONL lines at the top of this post, the successful tool call, turns into this span (trimmed for the screen):

{
  "traceId": "a8b1f7c4d3e6a2b5c8d1f4e7a0b3c6d9",
  "spanId": "b9c2a3d4e5f60718",
  "name": "tool_ok.locus.payments.charge",
  "startTimeUnixNano": "1779638601265000000",
  "endTimeUnixNano": "1779638601277000000",
  "kind": 1,
  "status": {"code": 1},
  "attributes": [
    {"key": "agent.event.kind", "value": {"stringValue": "tool_ok"}},
    {"key": "tool.name", "value": {"stringValue": "locus.payments.charge"}},
    {"key": "session.id", "value": {"stringValue": "abc12"}},
    {"key": "gen_ai.usage.cost_usd", "value": {"doubleValue": 4.99}}
  ]
}

POST the full payload to localhost:4318/v1/traces. Open Jaeger at localhost:16686. Search for service my-hermes-agent. The run shows up as a trace. The denied tool call shows up as a red span with status.code = 2 and an error.message attribute attached. The cost shows up as a queryable number. You can group by tool, sort by latency, alert on cost per run, all the things you already use a tracing UI for.
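A span on its own is not yet a full request body: OTLP/JSON nests spans inside a resourceSpans and scopeSpans envelope, and the service name travels as a service.name resource attribute. The sketch below shows that wrapping for the denied call. The `wrap_spans` helper, the scope name, the span name, and the spanId are illustrative assumptions that follow the naming pattern above; they are not output captured from the library.

```python
import json


def wrap_spans(spans: list, service_name: str) -> dict:
    """Wrap span dicts in the OTLP/JSON envelope a collector expects."""
    return {
        "resourceSpans": [
            {
                "resource": {
                    "attributes": [
                        {"key": "service.name", "value": {"stringValue": service_name}}
                    ]
                },
                "scopeSpans": [
                    # The scope name is a made-up label for this example.
                    {"scope": {"name": "hand-rolled-example"}, "spans": spans}
                ],
            }
        ]
    }


# The budget_denied row from the top of the post, written out by hand.
denied_span = {
    "traceId": "a8b1f7c4d3e6a2b5c8d1f4e7a0b3c6d9",
    "spanId": "c0d1e2f3a4b5c6d7",  # made-up ID for illustration
    "name": "budget_denied.locus.payments.charge",  # assumed naming pattern
    "startTimeUnixNano": "1779638601266000000",
    "endTimeUnixNano": "1779638601266000000",  # no latency recorded on this row
    "kind": 1,
    "status": {"code": 2},  # 2 is the error status
    "attributes": [
        {"key": "error.message", "value": {"stringValue": "budget exceeded: 11.99 > 10"}},
    ],
}

payload = wrap_spans([denied_span], "my-hermes-agent")
print(json.dumps(payload, indent=2))
```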

In Datadog it looks like a normal trace under APM. In Grafana Tempo it looks like a normal trace you can query with TraceQL. The point is that any backend that speaks OTLP, which is most of them now, accepts it as input.

Why the OTel SDK is wrong for replay

The OTel SDK assumes you are instrumenting a live program. The tracer is global. The span context flows through the call stack. A span begins when the code enters a block and ends when it exits. You cannot easily say "I have a JSON record from yesterday and I want it to be a span that started at this nanosecond and ended at that nanosecond."

You can do it with low-level SDK APIs. You have to wire up a non-recording context, set a custom start time, set a custom end time, and set attributes one by one. By the time you finish, you have written more code than the entire OTLP/JSON encoding takes, and you have made your replay tool depend on the SDK version your agent was instrumented with.

OTLP/JSON is small. The schema is a few dozen lines of public protobuf. Encoding a span by hand is twenty lines of Python. The library is six small files, no dependencies, and it ships a CLI.
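For a sense of scale, here is roughly what encoding a span by hand can look like. This is a sketch under stated assumptions, not the library's implementation: the helper names are hypothetical, the IDs are derived by hashing the session id (the post does not say how the library derives them), and timestamps are assumed to carry millisecond precision.

```python
import hashlib


def hex_id(seed: str, n_bytes: int) -> str:
    """Derive a deterministic hex ID from a seed (an assumed scheme)."""
    return hashlib.sha256(seed.encode("utf-8")).hexdigest()[: n_bytes * 2]


def attr(key: str, value) -> dict:
    """Encode one attribute; floats become doubleValue, the rest stringValue."""
    if isinstance(value, float):
        return {"key": key, "value": {"doubleValue": value}}
    return {"key": key, "value": {"stringValue": str(value)}}


def event_to_span(event: dict, index: int) -> dict:
    """Turn one audit-log row into an OTLP/JSON span dict."""
    # Round to whole milliseconds first so float noise cannot leak into nanos.
    start_ns = round(event["ts"] * 1000) * 1_000_000
    latency_ms = event.get("extra", {}).get("latency_ms", 0)
    end_ns = start_ns + int(latency_ms * 1_000_000)

    kind, tool, session = event["kind"], event.get("tool"), event["session_id"]
    attributes = [attr("agent.event.kind", kind), attr("session.id", session)]
    if tool is not None:
        attributes.append(attr("tool.name", tool))
    if "usd" in event:
        attributes.append(attr("gen_ai.usage.cost_usd", float(event["usd"])))
    if "error" in event:
        attributes.append(attr("error.message", event["error"]))

    return {
        "traceId": hex_id(session, 16),              # 16 bytes, 32 hex chars
        "spanId": hex_id(f"{session}:{index}", 8),   # 8 bytes, 16 hex chars
        "name": f"{kind}.{tool}" if tool else kind,
        "startTimeUnixNano": str(start_ns),          # 64-bit ints travel as strings
        "endTimeUnixNano": str(end_ns),
        "kind": 1,                                   # internal span
        "status": {"code": 2 if "error" in event else 1},
        "attributes": attributes,
    }


row = {"ts": 1779638601.265, "session_id": "abc12", "kind": "tool_ok",
       "tool": "locus.payments.charge", "usd": 4.99, "extra": {"latency_ms": 12}}
span = event_to_span(row, index=1)
print(span["name"], span["startTimeUnixNano"], span["endTimeUnixNano"])
```

Hashing the session id keeps every row of a run on the same trace without storing any state, which is convenient for replay; a real tool might prefer IDs already present in the log.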

One library, many input shapes

The reason a CLI like this can exist at all is that several of the small agent libraries I write end up emitting similar JSONL. agenttrace writes one shape. agentleash writes another. agentsnap writes a third. They all carry roughly the same concepts: a session id, a tool name, a cost, a latency, an optional error.

trace-to-otel accepts any of them. The parser tries a handful of aliases per field (kind or event or step, tool or tool_name, usd or cost_usd, etc.) and normalizes them into one Event shape. The exporter walks the events and emits spans. If your log shape is none of those, the row still flows through as a generic event that keeps whatever fields it had.
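The alias lookup can be sketched in a few lines. This is an illustration of the idea rather than the library's parser: the `Event` fields, the `ALIASES` table, and `normalize` are assumed names, and only the aliases listed above are included.

```python
from dataclasses import dataclass, field
from typing import Optional

# Canonical field name -> aliases to try, in priority order.
ALIASES = {
    "kind": ("kind", "event", "step"),
    "tool": ("tool", "tool_name"),
    "usd": ("usd", "cost_usd"),
}


@dataclass
class Event:
    kind: str = "unknown"
    tool: Optional[str] = None
    usd: Optional[float] = None
    extra: dict = field(default_factory=dict)  # everything not recognized


def normalize(row: dict) -> Event:
    """Map one raw JSONL row onto the common Event shape."""
    values = {}
    consumed = set()
    for canonical, names in ALIASES.items():
        for name in names:
            if name in row:
                values[canonical] = row[name]
                consumed.add(name)
                break  # first matching alias wins
    # Unrecognized fields are kept rather than dropped.
    extra = {key: value for key, value in row.items() if key not in consumed}
    return Event(extra=extra, **values)


print(normalize({"event": "tool_ok", "tool_name": "search", "cost_usd": 0.02}))
print(normalize({"message": "something else entirely"}))
```

The second call shows the fallback: a row with no known aliases still becomes an Event, with its original fields preserved in `extra`.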

Semantic conventions, two options

OpenTelemetry has a GenAI semantic convention. Arize has OpenInference. They are mostly the same idea with different attribute names: gen_ai.usage.cost_usd versus openinference.llm.token.cost, tool.name versus openinference.tool.name. Pick one at the CLI:

trace-to-otel runs/audit.jsonl out.otlp.json --semconv otel-genai
trace-to-otel runs/audit.jsonl out.otlp.json --semconv openinference
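Conceptually, the switch is a lookup table from a logical field to an attribute key. The sketch below uses only the four attribute names mentioned above; the table layout and the `attributes_for` helper are assumptions for illustration, not the library's internals.

```python
# Logical field -> attribute key, per convention.
SEMCONV_KEYS = {
    "otel-genai": {
        "cost": "gen_ai.usage.cost_usd",
        "tool": "tool.name",
    },
    "openinference": {
        "cost": "openinference.llm.token.cost",
        "tool": "openinference.tool.name",
    },
}


def attributes_for(semconv: str, tool: str, usd: float) -> list:
    """Build OTLP/JSON attributes for one tool call under the chosen convention."""
    try:
        keys = SEMCONV_KEYS[semconv]
    except KeyError:
        raise ValueError(f"unknown semconv: {semconv!r}") from None
    return [
        {"key": keys["tool"], "value": {"stringValue": tool}},
        {"key": keys["cost"], "value": {"doubleValue": float(usd)}},
    ]


for name in SEMCONV_KEYS:
    print(name, [a["key"] for a in attributes_for(name, "locus.payments.charge", 4.99)])
```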

If you need to translate between the two on the Rust side after ingest, I have a sibling crate (otel-genai-bridge-rs) that does the same mapping for telemetry attributes flowing the other direction.

What this slots into

I write a lot of small agent libraries. They each do one thing. agenttrace records cost and latency. agentleash enforces budget and egress caps. agentsnap snapshots a run for regression tests. trace-tree prints the audit log as a terminal tree. trace-to-otel is the one that sends the same log to a real observability backend.

The whole stack assumes the audit log is the source of truth. Once you commit to that shape, you can render it, snapshot it, replay it, ship it to a collector, or pipe it through whatever else you want. The agent code does not change.

Try it

pip install trace-to-otel
python -m trace_to_otel examples/sample_audit.jsonl spans.otlp.json

Then point it at a collector:

trace-to-otel runs/audit.jsonl --post http://localhost:4318/v1/traces

Repo: github.com/MukundaKatta/trace-to-otel

Sibling libraries:

trace-tree for the terminal view
agenttrace for the audit log writer
otel-genai-bridge-rs for the semconv translation in Rust

It is MIT, zero deps, and small enough to read in one sitting.
