Radoslav Tsvetkov
The trust pipeline: three commands to run before merging an AI-assisted change

The most common failure mode I see when teams adopt AI coding agents is not a bad diff. It is a good diff that no one can defend. The agent ran. The session closed. Three days later, somebody asks how the change came to be, and there is nothing to point at.

This article is about closing that gap with three commands. Akmon's trust pipeline is three subcommands: audit verify, evidence verify, and slo verify. Each one is fast, deterministic, and gates cleanly in CI. With them, "the agent did it" stops being a hand wave and becomes an artifact.

The code in this post uses real commands from Akmon v2.0.0.

What "trust pipeline" means

Every Akmon session writes two files when it ends.

  • .akmon/audit/<session-id>.jsonl: a tamper-evident audit chain of every prompt, model response, tool call, and policy decision.
  • .akmon/evidence/<session-id>.json: a structured evidence summary, with replay metadata and a hash that links back to the audit chain.

Three commands take those files and produce signals you can trust.

# 1. Audit chain integrity.
akmon audit verify .akmon/audit/<session-id>.jsonl

# 2. Evidence schema and linkage to the audit chain.
akmon evidence verify .akmon/evidence/<session-id>.json

# 3. Reliability metrics against thresholds.
akmon slo verify .akmon/evidence/<session-id>.json --strict

Each command exits 0 for pass, 1 for failure, and (for SLO) 2 for invalid input or config. Three exit codes, three crisp signals.
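In a wrapper script, those exit codes are all you need to branch on. Here is a small illustrative sketch of gating on them; the `run_verifier` helper and the outcome labels are my own naming, not part of Akmon, and the sketch assumes the akmon binary is on PATH:

```python
import subprocess

# Map the documented exit codes to decision labels.
# 0 = pass, 1 = failure, 2 = invalid input or config (SLO commands only).
OUTCOMES = {0: "pass", 1: "fail", 2: "invalid-input"}

def classify(exit_code: int) -> str:
    """Translate a verifier exit code into a CI decision label."""
    return OUTCOMES.get(exit_code, "unknown")

def run_verifier(args: list[str]) -> str:
    """Run an akmon subcommand and classify its exit code.

    Hypothetical helper: in CI you would fail the job on anything
    other than "pass".
    """
    proc = subprocess.run(["akmon", *args])
    return classify(proc.returncode)

print(classify(0))  # pass
print(classify(2))  # invalid-input
```

The point of the three-way split is that "your input was malformed" (2) never masquerades as "your thresholds were breached" (1).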

Step 1. Audit chain verification

akmon audit verify walks the JSONL chain and checks the cryptographic linkage between events. If a single byte was edited or a record was dropped, it fails.

$ akmon audit verify .akmon/audit/2025-05-06_abcd.jsonl
audit chain valid (events: 47, head: 5c1f...)
$ echo $?
0

For CI:

$ akmon --output json audit verify .akmon/audit/2025-05-06_abcd.jsonl | jq '.valid'
true

Why this matters: the audit chain is the substrate. If the chain does not verify, nothing downstream is meaningful. Pin this command at the start of any review workflow.
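To build intuition for what a check like this does, here is a conceptual sketch of hash-chain verification. This is not Akmon's actual implementation or record format; it only illustrates the property that each event carries the hash of its predecessor, so editing or dropping any record breaks every link after it:

```python
import hashlib
import json

def event_hash(record: dict) -> str:
    """Hash a record's canonical JSON form."""
    canonical = json.dumps(record, sort_keys=True).encode()
    return hashlib.sha256(canonical).hexdigest()

def verify_chain(events: list[dict]) -> bool:
    """Check that every event links to the hash of its predecessor."""
    prev = None
    for event in events:
        if event.get("prev_hash") != prev:
            return False
        prev = event_hash(event)
    return True

# Build a tiny two-event chain, then tamper with it.
e1 = {"type": "prompt", "data": "fix the bug", "prev_hash": None}
e2 = {"type": "tool_call", "data": "pytest", "prev_hash": event_hash(e1)}
print(verify_chain([e1, e2]))   # True
e1["data"] = "edited"           # a single changed byte...
print(verify_chain([e1, e2]))   # False: e2's link no longer matches
```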

Step 2. Evidence verification

akmon evidence verify checks the evidence summary's schema, the replay metadata shape, and the linkage to the audit chain. Schema means the file matches the documented evidence schema for v2.0.0. Linkage means the recorded audit hash matches the actual head of the audit chain.

$ akmon evidence verify .akmon/evidence/2025-05-06_abcd.json
evidence valid (linked audit head: 5c1f...)

This step catches a class of failure most teams underestimate: an evidence file that was generated correctly but later got out of sync with its audit chain (for example, because of a partial copy). Evidence verify is cheap. Run it on every artifact you intend to share.
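The linkage check itself is conceptually simple. This sketch assumes a hypothetical evidence field named audit_head and a naive line-based hash; Akmon's real schema and hashing will differ, but the shape of the check is the same: recompute the audit head from the file you actually have and compare it to the hash the evidence recorded.

```python
import hashlib

def audit_head(jsonl_text: str) -> str:
    """Recompute a head hash over the raw audit lines (illustrative)."""
    head = hashlib.sha256()
    for line in jsonl_text.splitlines():
        head.update(line.encode())
    return head.hexdigest()

def evidence_linked(evidence: dict, jsonl_text: str) -> bool:
    """Does the evidence's recorded head match the actual audit file?"""
    return evidence.get("audit_head") == audit_head(jsonl_text)

audit = '{"type": "prompt"}\n{"type": "tool_call"}'
evidence = {"session_id": "abcd", "audit_head": audit_head(audit)}
print(evidence_linked(evidence, audit))        # True
print(evidence_linked(evidence, audit + "x"))  # False: out of sync
```

The partial-copy failure described above is exactly the second case: the evidence file is internally fine, but the audit file next to it is not the one it was generated against.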

Step 3. SLO verification

akmon slo verify evaluates run reliability metrics against thresholds. Examples include tool success rate, replay determinism, attempt counts, and policy gate denials. The CLI accepts thresholds inline or from a TOML file:

$ akmon slo verify .akmon/evidence/2025-05-06_abcd.json --strict
all SLO thresholds met
$ akmon slo verify run.json --thresholds .akmon/slo.toml
threshold breached: tool_success_rate (0.93 < 0.95)
$ echo $?
1

You can also pass a single threshold inline:

$ akmon --output json slo verify run.json --min-tool-success-rate 0.95 | jq '.passed'
false

Strict mode treats skipped checks as failures. That is the right setting for CI gating.
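The evaluation logic behind a check like this can be sketched in a few lines. The metric and threshold names below mirror the TOML keys used later in this post; the code is illustrative, not Akmon's source, and the min_/max_ naming convention is an assumption:

```python
def evaluate_slo(metrics: dict, thresholds: dict, strict: bool = False) -> list[str]:
    """Return a list of breach messages; empty means all thresholds met."""
    breaches = []
    for key, limit in thresholds.items():
        direction, metric = key.split("_", 1)  # "min_..." or "max_..."
        value = metrics.get(metric)
        if value is None:
            # Strict mode: a metric the run never produced is a failure,
            # not a silent skip.
            if strict:
                breaches.append(f"skipped check treated as failure: {metric}")
            continue
        if direction == "min" and value < limit:
            breaches.append(f"threshold breached: {metric} ({value} < {limit})")
        if direction == "max" and value > limit:
            breaches.append(f"threshold breached: {metric} ({value} > {limit})")
    return breaches

metrics = {"tool_success_rate": 0.93}
thresholds = {"min_tool_success_rate": 0.95, "max_permission_denials": 0}
print(evaluate_slo(metrics, thresholds, strict=True))
```

Note how strict mode turns the absent permission_denials metric into a second failure instead of quietly passing it.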

Bonus step. SLO trend

Single runs are noisy. Trend mode compares the current run against a baseline window, so you catch regressions you would miss in a one-shot check.

$ akmon slo trend .akmon/evidence/2025-05-06_abcd.json \
  --baseline-dir .akmon/evidence/history \
  --window 20 \
  --strict
regression: median tool latency increased from 142ms to 318ms over baseline

With JSON output, this is a clean alert you can post to Slack or PagerDuty. Plug it in once and forget it.
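The core of a trend check is just a comparison against a baseline window. In this sketch the regression rule (flag anything more than 1.5x the baseline median) is my own assumption for illustration, not Akmon's policy:

```python
from statistics import median

def latency_regressed(current_ms: float, baseline_ms: list[float],
                      window: int = 20, factor: float = 1.5) -> bool:
    """Compare the current run against the median of the last N runs."""
    base = median(baseline_ms[-window:])
    return current_ms > base * factor

history = [140, 145, 139, 142, 144]     # per-run median tool latency, ms
print(latency_regressed(318, history))  # True: 318ms vs ~142ms baseline
print(latency_regressed(150, history))  # False: within normal noise
```

A single 150ms run would never trip a one-shot threshold, and neither should it; the window is what separates noise from drift.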

Putting the pipeline in CI

A practical GitHub Actions snippet:

- name: Run Akmon task headlessly
  id: run
  run: |
    akmon --yes --output json --task "$AKMON_TASK" | tee run.json
    echo "session_id=$(jq -r '.session_id' run.json)" >> $GITHUB_OUTPUT

- name: Verify audit chain
  run: akmon audit verify .akmon/audit/${{ steps.run.outputs.session_id }}.jsonl

- name: Verify evidence
  run: akmon evidence verify .akmon/evidence/${{ steps.run.outputs.session_id }}.json

- name: Enforce SLO thresholds
  run: |
    akmon slo verify .akmon/evidence/${{ steps.run.outputs.session_id }}.json \
      --thresholds .akmon/slo.toml --strict

- name: Trend against baseline
  run: |
    akmon slo trend .akmon/evidence/${{ steps.run.outputs.session_id }}.json \
      --baseline-dir .akmon/evidence/history \
      --window 20 \
      --strict

If any step fails, the merge stops. If all four pass, the change has the artifacts behind it.

Picking the right SLO thresholds

Two patterns work in practice.

First, conservative thresholds for production policy profiles:

# .akmon/slo.toml
min_tool_success_rate = 0.97
max_replay_divergence_ratio = 0.0
max_provider_retry_attempts = 3
max_permission_denials = 0

Second, looser thresholds for exploratory work:

min_tool_success_rate = 0.85
max_replay_divergence_ratio = 0.05
max_provider_retry_attempts = 5

Keep the file in version control. When a threshold changes, the change is reviewable.

What this catches in real life

A quick list of incidents the pipeline has caught for me or for early testers:

  • A flaky tool that was failing once in twenty calls. SLO trend caught it in a week.
  • A model that started retrying on transient 5xx errors after a vendor change. Audit chain still valid, evidence valid, but SLO max_provider_retry_attempts surfaced the new pattern.
  • A reviewer accidentally edited the JSONL file (added a newline). akmon audit verify failed loudly.
  • A copy-paste of the evidence file from one machine to another, where the audit file did not come along. Evidence verify caught the missing linkage.

None of these are exotic. Each one is the kind of failure that quietly degrades trust in AI-assisted work over a quarter.

Why this is different from a dashboard

Three reasons.

First, the pipeline runs in CI. Dashboards do not gate merges. Exit codes do.

Second, the artifacts are portable. The verifier on your machine is the verifier on the auditor's machine.

Third, the schema is fixed. Evidence verifies against a documented schema. SLO thresholds live in a TOML file. There is no dashboard to misconfigure between two clicks.

If you want both (the visual story and the gating), keep your existing observability. Akmon's trust pipeline runs alongside it.

Where this fits in the bigger picture

The trust pipeline is the first half of the answer. The second half is replay, which I cover in the next post in this series. Replay turns "the artifact verifies" into "we can re-execute the session and see if anything diverges". For now, the trust pipeline is enough to gate any AI-assisted change with confidence.

The repo is at github.com/radotsvetkov/akmon. The format spec is at github.com/radotsvetkov/agef. The site is at radotsvetkov.github.io/akmon.

If your team wants the AI productivity but cannot give up review discipline, three commands are a fair price.
