The two-line Hermes agent logger I wish existed a month ago

#devchallenge #hermesagentchallenge #agents #observability

Hermes Agent Challenge Submission: Write About Hermes Agent

This is a submission for the Hermes Agent Challenge.

A month ago my Hermes agents were completely unobservable. When a run failed, I had a Python traceback and nothing else. There was no record of which steps had completed, how long each one had taken, or what the model had said at step 4. If the process died at step 47 of 60, I had to restart from step 0.

I needed a step logger. My first attempts were inline: a list of dicts that I passed to json.dumps at the end of the run. That works until the process dies, because nothing reaches disk before the end. Then I built a file-backed version that writes each step's record when the step exits. That's the one that stuck. I packaged it as agent-step-log.

Two lines to add observability to any Hermes loop

from agent_step_log import StepLogger

log = StepLogger("runs/2026-05-24.jsonl")

# In your agent loop (tasks and call_hermes are your own code):
for task in tasks:
    with log.step("process_task") as step:
        step.input = task["query"]
        step.model = "hermes-3"
        result = call_hermes(task["query"])
        step.output = result
        step.cost_usd = 0.0012  # example value; record the real cost of the call

When the with block exits, one JSON line is written to the file (pretty-printed here for readability; on disk it is a single line):

{
  "name": "process_task",
  "started_at": 1779638601.262,
  "duration_ms": 843,
  "run_id": "a3f7c2",
  "input": "What is the capital of France?",
  "model": "hermes-3",
  "output": "Paris",
  "cost_usd": 0.0012
}

The file is written line by line as steps complete, so tail -f works while the agent is still running.
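To make the write-on-exit idea concrete, here is a rough sketch of a file-backed step context manager built only on the standard library. It is not the agent-step-log source; the function name step, the SimpleNamespace holder, and the open-append-close strategy are my assumptions for illustration.

```python
import json
import os
import tempfile
import time
from contextlib import contextmanager
from types import SimpleNamespace


@contextmanager
def step(path, name):
    """Append one JSON line to `path` when the with block exits normally."""
    fields = SimpleNamespace()   # the caller attaches input, output, etc. here
    started_at = time.time()     # wall clock, stored in the record
    t0 = time.monotonic()        # monotonic clock, used only for the duration
    yield fields
    record = {
        "name": name,
        "started_at": started_at,
        "duration_ms": round((time.monotonic() - t0) * 1000),
        **vars(fields),
    }
    # Open, append, close on every step so the record is handed to the OS
    # as soon as the step finishes (hypothetical design choice).
    with open(path, "a", encoding="utf-8") as f:
        f.write(json.dumps(record) + "\n")


# Usage: two steps produce two lines in the file.
demo_path = os.path.join(tempfile.mkdtemp(), "run.jsonl")
for query in ["What is the capital of France?", "What is 2 + 2?"]:
    with step(demo_path, "process_task") as s:
        s.input = query
        s.output = "stub answer"  # a real loop would call the model here

with open(demo_path, encoding="utf-8") as f:
    lines = f.read().splitlines()
```

Because json.dumps escapes newlines inside strings, each record stays on one physical line, which is what keeps tail -f and line-oriented tools happy.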

Crash safety

The reason I went file-backed instead of in-memory is crash safety. If your Hermes agent calls an external API at step 47 and that call hangs, the Python process might eventually be killed by a timeout or OOM. With in-memory logging you lose everything. With agent-step-log, every completed step is already on disk.

log = StepLogger("runs/run.jsonl", fsync=True)

Passing fsync=True adds a flush and an fsync after each write. It is slower, but the file on disk reflects every committed step even if the process is killed between steps.
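For readers who have not used it, this is roughly what "flush + fsync" means in plain Python. The helper name append_durable is hypothetical, and I am only assuming the library does something along these lines; the stdlib calls themselves (file.flush and os.fsync) are the standard pattern.

```python
import json
import os
import tempfile


def append_durable(path, record):
    """Append one JSON line and ask the OS to commit it to storage."""
    with open(path, "a", encoding="utf-8") as f:
        f.write(json.dumps(record) + "\n")
        f.flush()              # move Python's userspace buffer into the OS
        os.fsync(f.fileno())   # ask the OS to push its cache to the device


# Usage: the record is readable immediately after the call returns.
demo_path = os.path.join(tempfile.mkdtemp(), "run.jsonl")
append_durable(demo_path, {"name": "process_task", "duration_ms": 843})

with open(demo_path, encoding="utf-8") as f:
    stored = json.loads(f.readline())
```

The cost is one extra system call and a wait on the storage device per step, which is usually small next to a model call.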

Exceptions are captured and re-raised

If an exception fires inside a step block, the library writes the step record with an error field before re-raising:

with log.step("call_api") as step:
    step.tool_name = "fetch_prices"
    result = fetch_prices("AAPL")   # TimeoutError at step 47

# Written to disk:
# {"name": "call_api", "started_at": ..., "duration_ms": 12401, "tool_name": "fetch_prices", "error": "TimeoutError: upstream timed out"}
# Then TimeoutError propagates out of the with block normally.

This is the part that saved me the most debugging time. The step record on disk tells me exactly which step timed out, how long it waited, and whatever I attached to the step before the failure, such as the tool name or its arguments. The traceback tells me the line number. Together they close the loop without me having to add any extra try/except instrumentation.
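The capture-then-re-raise behavior maps directly onto Python's context manager protocol. Below is a minimal sketch, assuming a class I am calling Step (not the library's actual class): __exit__ receives the exception, writes the record with an error field, and returns False so the exception keeps propagating.

```python
import json
import os
import tempfile
import time


class Step:
    """Hypothetical step recorder: logs the error, never swallows it."""

    def __init__(self, path, name):
        self._path = path
        self._name = name

    def __enter__(self):
        self._started_at = time.time()
        self._t0 = time.monotonic()
        return self

    def __exit__(self, exc_type, exc, tb):
        record = {
            "name": self._name,
            "started_at": self._started_at,
            "duration_ms": round((time.monotonic() - self._t0) * 1000),
        }
        # Copy the public attributes the caller set on the step.
        record.update(
            {k: v for k, v in vars(self).items() if not k.startswith("_")}
        )
        if exc_type is not None:
            record["error"] = f"{exc_type.__name__}: {exc}"
        with open(self._path, "a", encoding="utf-8") as f:
            f.write(json.dumps(record) + "\n")
        return False  # falsy return value means the exception is re-raised


# Usage: the record is on disk before the exception reaches the caller.
demo_path = os.path.join(tempfile.mkdtemp(), "run.jsonl")
propagated = False
try:
    with Step(demo_path, "call_api") as s:
        s.tool_name = "fetch_prices"
        raise TimeoutError("upstream timed out")
except TimeoutError:
    propagated = True

with open(demo_path, encoding="utf-8") as f:
    failed_step = json.loads(f.readline())
```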

Read back and summarize

After a run completes (or is interrupted):

from agent_step_log import read_log, summarize_log

steps = read_log("runs/2026-05-24.jsonl")
for step in steps:
    print(f"{step.name}: {step.duration_ms}ms, ${step.cost_usd:.4f}")

summary = summarize_log("runs/2026-05-24.jsonl")
print(f"Total: {summary.step_count} steps, ${summary.total_cost_usd:.4f}")
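Since the file is plain JSONL, it can also be read without the library. Here is a small stdlib-only sketch; read_jsonl and summarize are hypothetical names, not the package's read_log and summarize_log. It skips a final line that was cut off mid-write, which is one way an interrupted run can leave the file.

```python
import json
import os
import tempfile


def read_jsonl(path):
    """Return parsed records, skipping blank or truncated lines."""
    records = []
    with open(path, encoding="utf-8") as f:
        for line in f:
            line = line.strip()
            if not line:
                continue
            try:
                records.append(json.loads(line))
            except json.JSONDecodeError:
                continue  # e.g. a last line cut off by a crash mid-write
    return records


def summarize(records):
    """Aggregate counts and totals, treating missing fields as zero."""
    return {
        "step_count": len(records),
        "failed": sum(1 for r in records if "error" in r),
        "total_cost_usd": sum(r.get("cost_usd") or 0.0 for r in records),
        "total_duration_ms": sum(r.get("duration_ms") or 0 for r in records),
    }


# Usage: two complete records, one failed, plus a truncated third line.
demo_path = os.path.join(tempfile.mkdtemp(), "run.jsonl")
with open(demo_path, "w", encoding="utf-8") as f:
    f.write('{"name": "process_task", "duration_ms": 843, "cost_usd": 0.0012}\n')
    f.write('{"name": "call_api", "duration_ms": 12401, "error": "TimeoutError"}\n')
    f.write('{"name": "process_ta')  # simulated partial write

summary = summarize(read_jsonl(demo_path))
```

Using r.get(...) or 0 means steps that never set a cost, such as the failed one above, do not break the totals.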

Works with the rest of the trace toolchain

The JSONL format that agent-step-log writes is the same format that the rest of my tools read:

trace-merge — merge logs from N agents into one chronological stream
trace-filter — filter events by lane, kind, time, or any field
trace-tree — render any JSONL as an ASCII call tree
tool-call-diff — compare two runs

The workflow: instrument your Hermes agent with agent-step-log, run it, then pass the output file through whichever analysis tool you need. No schema agreement is required beyond JSONL with a timestamp field.
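To show why a timestamp field is enough for this kind of tooling, here is a sketch of a chronological merge using heapq.merge from the standard library. This is not how trace-merge is implemented; merge_streams is a hypothetical helper, and it assumes the timestamp field is started_at and that each input stream is already sorted by it (true for a sequential, non-nested loop; otherwise sort each stream first).

```python
import heapq


def merge_streams(*streams, key="started_at"):
    """Lazily merge already-sorted record streams into one ordered stream."""
    return heapq.merge(*streams, key=lambda record: record[key])


# Usage: records from two agents interleave by start time.
agent_a = [
    {"name": "plan", "started_at": 1.0},
    {"name": "act", "started_at": 3.0},
]
agent_b = [{"name": "fetch", "started_at": 2.0}]

merged = [r["name"] for r in merge_streams(agent_a, agent_b)]
```

heapq.merge yields records one at a time, so it also works on generators that read large log files line by line.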

Technical notes

30 tests, zero runtime dependencies, Python 3.10+. The step context manager uses time.monotonic() for duration_ms and time.time() for the wall-clock started_at, so durations stay accurate even if the system clock is adjusted mid-run. The run_id is a 6-character hex string from os.urandom(3): short enough to type, long enough to distinguish runs in a directory of logs.
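The two details above are easy to reproduce in isolation. This snippet is only an illustration of the stdlib calls named in the paragraph, not an excerpt from the package; the variable names are mine.

```python
import os
import time

# 3 random bytes render as 6 hex characters: 16**6 = 16,777,216 possible ids.
run_id = os.urandom(3).hex()

started_at = time.time()   # wall clock: a real timestamp, but it can jump
t0 = time.monotonic()      # monotonic clock: never goes backwards
time.sleep(0.01)           # stand-in for the work done inside a step
duration_ms = round((time.monotonic() - t0) * 1000)
```

Subtracting two time.time() readings instead would give a wrong, possibly negative, duration if NTP or a manual change moved the clock during the step.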

Repo: https://github.com/MukundaKatta/agent-step-log