Radoslav Tsvetkov

Replay AI coding sessions deterministically: the divergence detector for your repo

There is a moment in every AI coding workflow where you wish you could roll the tape back. The agent did something on Tuesday. By Thursday the model has shifted. The tool surface has shifted. You cannot reproduce the issue, and you cannot prove the change was sound.

Replay is the answer. Akmon's akmon replay command takes a recorded session and re-executes it against the providers and tools that were recorded with it. The output is a pass or fail. In strict mode, it is a CI gate. In default mode, it is a divergence report.

This article walks through the actual command, the modes, the exit codes, and the failure patterns I have caught with it. Everything here is real Akmon v2.0.0 surface.

Why replay matters more than logs

A log file tells you what happened. Replay tells you whether the same session would happen again. The two are not the same.

If a session passes replay, the artifact is meaningful, even months later. If it diverges, the diff is the bug report. Either way you know more than you did from the log.

Three concrete jobs replay does well.

  1. Detect regressions in your tool surface (a tool change that breaks an old session you already shipped).
  2. Detect regressions in your provider behavior (a model upgrade that changes the agent's plan).
  3. Validate that a redacted bundle is still a valid session.

In a regulated codebase, the first two are the difference between trust and a quarterly fight.

The command

The replay command is small.

akmon replay <session-id> [OPTIONS]

Common options:

akmon replay <session-id> \
  --journal <path> \
  --mode <default|strict> \
  --persist --persist-to <path> \
  --format <human|json>

A first run, in default mode:

$ akmon replay 550e8400-e29b-41d4-a716-446655440000
replay completed: passed=true (events: 47)

In strict mode for tighter mismatch handling:

$ akmon replay 550e8400-e29b-41d4-a716-446655440000 --mode strict
replay completed: passed=true (strict)

In CI with JSON:

$ akmon replay 550e8400-e29b-41d4-a716-446655440000 --format json | jq '.passed'
true

The exit code map

Code  Meaning
0     Replay completed with no divergences (passed: true)
1     Replay completed with divergences (passed: false)
2     Usage error (invalid arguments or invalid flag combinations)
3     I/O or environment error (missing source session, malformed source, unwritable persist target)

For CI gating, 0 is success, 1 is a hard failure, 2 and 3 are operator errors that should not be ignored.
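
The exit-code map above can be wired into a small gate. This is a sketch, not Akmon code: the akmon function below is a stub standing in for the real binary so the control flow is visible and testable anywhere.

```shell
# Sketch of a CI gate over the replay exit codes. The akmon function is a
# stub standing in for the real binary; swap it out for the installed CLI.
akmon() { return "${AKMON_EXIT:-0}"; }   # stub: exit code comes from $AKMON_EXIT

gate() {
  akmon replay "$1" --mode strict
  case $? in
    0) echo "pass: no divergences" ;;
    1) echo "fail: divergences found" ;;
    2) echo "operator error: bad usage" ;;
    3) echo "operator error: I/O or environment" ;;
  esac
}

AKMON_EXIT=1 gate 550e8400-e29b-41d4-a716-446655440000   # prints the failure branch
```

In a real pipeline, drop the stub and fail the job on any nonzero exit, not just code 1.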

Persisting a replay

If you want to keep the replay output for later review (good practice when investigating an incident), persist it to a target journal.

akmon replay 550e8400-e29b-41d4-a716-446655440000 \
  --persist \
  --persist-to ./replay-journal

The persisted output is itself a journal. It carries its own evidence and is itself replayable. That sounds recursive. It is. It is also exactly what you want for a forensic record.

Patterns that catch real bugs

A few patterns from my own work and from testers.

Pattern 1. Run replay nightly against last week's sessions

If your team runs Akmon in CI for a week, you have a sample of real sessions. A nightly job that replays them in strict mode is the cheapest regression detector you can build.

for s in $(ls .akmon/audit/ | tail -n 50); do
  sid=$(basename "$s" .jsonl)
  akmon replay "$sid" --mode strict --format json | jq -r --arg sid "$sid" '.passed | tostring + " " + $sid'
done

If a tool change broke an old session, you know.
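
To make the nightly job fail when something diverged, tally the false lines. A minimal sketch; the results variable stands in for the loop's captured output, with hypothetical session ids.

```shell
# Count diverged sessions in the nightly loop's output and fail on any.
# The results variable stands in for results=$(for ...; done) above.
results='true 550e8400
true 7c9e6679
true 16fd2b12'

failed=$(printf '%s\n' "$results" | grep -c '^false')
echo "$failed session(s) diverged"
[ "$failed" -eq 0 ]   # a nonzero count exits nonzero and fails the job
```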

Pattern 2. Run replay on PRs that touch tools

Tools are how AI agents reach the world. A PR that touches a tool wrapper, an MCP server, or a CI runner is the most common place to break replay. Wire a job that replays a small set of canonical sessions whenever those paths change.
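
One way to wire this in GitHub Actions is a paths filter on the pull request trigger. The paths below are hypothetical; substitute the directories where your tool wrappers, MCP servers, and runner config actually live.

```yaml
# Hypothetical paths; adjust to your repo layout.
on:
  pull_request:
    paths:
      - "tools/**"
      - "mcp/**"
      - ".github/workflows/**"
```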

Pattern 3. Replay a redacted bundle

After running akmon redact to produce a sanitized derivative bundle, check that the bundle is still a valid session.

akmon redact <session-id> \
  --output sanitized.akmon \
  --object <object-hash> \
  --reason "PII removal"

akmon bundle import sanitized.akmon --verify-only

Verify-only checks integrity. For a fuller round trip, replay the redacted session inside a sandbox.

Pattern 4. Replay across providers

If you switch model providers (say, from a cloud provider to Ollama for sensitive work), replay tells you whether the new provider produces an equivalent session for the same prompt and tools. This is not a guarantee of behavior, but it is the closest thing in practice.

What replay cannot do

Three honest disclaimers.

First, replay does not reach across the network for fresh model calls by default. The point is to compare against recorded behavior. If you re-execute against a live model, that is a different test, and it should be labeled differently in your CI.

Second, divergence in default mode is informational. The session might still be useful even if a tool returned a slightly different output. Strict mode is the right setting when the question is "is this session reproducible?"

Third, replay does not inspect the meaning of the output. It checks structural and field-level equivalence. Semantics are still your job.

Reading a divergence report

When replay fails, the JSON output is the place to start.

$ akmon replay <session-id> --mode strict --format json | jq
{
  "session_id": "550e8400-...",
  "passed": false,
  "mode": "strict",
  "divergences": [
    {
      "event_index": 14,
      "event_kind": "ToolCall",
      "tool_id": "mcp.orders/lookup_order",
      "field": "output_hash",
      "expected": "9c1f...",
      "actual": "a72b...",
      "note": "side effects hash differs"
    }
  ]
}

The structure points you straight at the change. From there, the workflow is the usual one: inspect the source session, inspect the live behavior, decide whether the change was intentional, and either fix the bug or update the recorded baseline.
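
For quick triage, a jq one-liner flattens the report into one line per divergence. Here report.json is a trimmed sample in the shape shown above, standing in for real replay output.

```shell
# One line per divergence: event index, tool id, and the field that changed.
# report.json is a sample in the shape of `akmon replay --format json`.
cat > report.json <<'EOF'
{"passed": false, "divergences": [
  {"event_index": 14, "event_kind": "ToolCall",
   "tool_id": "mcp.orders/lookup_order", "field": "output_hash"}
]}
EOF

jq -r '.divergences[] | "event \(.event_index): \(.tool_id) -> \(.field)"' report.json
```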

Putting replay in CI

A small GitHub Actions snippet:

- name: Replay session in strict mode
  run: |
    akmon replay ${{ inputs.session_id }} --mode strict --format json | tee replay.json
    test "$(jq -r '.passed' replay.json)" = "true"

Pair this with the trust pipeline (audit verify, evidence verify, slo verify) and you have a very small set of commands that gate every AI-assisted change in your repo.

Storage and retention

Sessions are content-addressed. Identical content hashes once. A typical session lands in a few hundred kilobytes of audit JSONL plus a small evidence JSON. Bundles are larger because they carry the resolved objects, but compression and content addressing keep them under a megabyte for most engineering tasks.

For retention, follow the longest of (your contractual minimum, your regulatory minimum, your incident review window). Keep the audit JSONL with the same lifecycle as your CI artifacts.
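
The "longest of" rule is just a max over three windows. A sketch with placeholder day counts; substitute your own numbers.

```shell
# Retention is the longest of three windows (day counts are placeholders).
contractual=365
regulatory=730
incident_review=90

retention=$contractual
[ "$regulatory" -gt "$retention" ] && retention=$regulatory
[ "$incident_review" -gt "$retention" ] && retention=$incident_review
echo "retain audit JSONL for $retention days"
```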

Where replay fits in the bigger story

The trust pipeline gives you crisp signals from a single session. Replay gives you crisp signals across time. Together they turn an AI coding agent into a system you can defend in front of a reviewer.

The repo is at github.com/radotsvetkov/akmon. The format spec is at github.com/radotsvetkov/agef. The site is at radotsvetkov.github.io/akmon.

The next article in this series is the redaction workflow, the part of the kit you reach for the day before an external review.
