<?xml version="1.0" encoding="UTF-8"?>
<rss version="2.0" xmlns:atom="http://www.w3.org/2005/Atom" xmlns:dc="http://purl.org/dc/elements/1.1/">
  <channel>
    <title>DEV Community: Radoslav Tsvetkov</title>
    <description>The latest articles on DEV Community by Radoslav Tsvetkov (@radotsvetkov).</description>
    <link>https://dev.to/radotsvetkov</link>
    <image>
      <url>https://media2.dev.to/dynamic/image/width=90,height=90,fit=cover,gravity=auto,format=auto/https:%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Fuser%2Fprofile_image%2F3873179%2Ffec4dcd5-6606-4a6b-a397-76c98c39d6b0.png</url>
      <title>DEV Community: Radoslav Tsvetkov</title>
      <link>https://dev.to/radotsvetkov</link>
    </image>
    <atom:link rel="self" type="application/rss+xml" href="https://dev.to/feed/radotsvetkov"/>
    <language>en</language>
    <item>
      <title>AGEF explained: a portable evidence format for AI agent sessions</title>
      <dc:creator>Radoslav Tsvetkov</dc:creator>
      <pubDate>Thu, 14 May 2026 08:19:44 +0000</pubDate>
      <link>https://dev.to/radotsvetkov/agef-explained-a-portable-evidence-format-for-ai-agent-sessions-40fn</link>
      <guid>https://dev.to/radotsvetkov/agef-explained-a-portable-evidence-format-for-ai-agent-sessions-40fn</guid>
      <description>&lt;p&gt;If you ship AI-assisted code in a regulated codebase and somebody asks "show me what the agent did", you have about a week before that question turns into a finding. The data exists somewhere. It is not in a single shape. It is rarely portable. It is almost never tamper-evident.&lt;/p&gt;

&lt;p&gt;AGEF is the spec I wrote to fix that. It stands for Agent Governance Evidence Format. The current text is &lt;code&gt;v0.1.1&lt;/code&gt; (pre-stable), with wire format &lt;code&gt;agef_version: "0.1"&lt;/code&gt;. The repo is at &lt;code&gt;github.com/radotsvetkov/agef&lt;/code&gt;. Spec text is CC BY 4.0. Code is Apache-2.0. Akmon is the reference implementation; &lt;code&gt;akmon-journal&lt;/code&gt; is a Substrate Profile, and Akmon Phase 4 brings full Bundle Profile support.&lt;/p&gt;

&lt;p&gt;This article walks the spec end to end, with code, and with the design choices I made.&lt;/p&gt;

&lt;h2&gt;
  
  
  What AGEF actually is
&lt;/h2&gt;

&lt;p&gt;AGEF defines how one AI agent session can be represented as a portable, tamper-evident bundle. A session is a logical run from &lt;code&gt;SessionStart&lt;/code&gt; to &lt;code&gt;SessionEnd&lt;/code&gt;. The bundle captures every event in order, with cryptographic linkage and content-addressed payloads.&lt;/p&gt;

&lt;p&gt;The bundle is a &lt;code&gt;tar.zst&lt;/code&gt; archive with three top-level paths:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;
&lt;code&gt;manifest.json&lt;/code&gt;, a small UTF-8 JSON file with sorted keys and LF newlines.&lt;/li&gt;
&lt;li&gt;
&lt;code&gt;events.bin&lt;/code&gt;, an ordered stream of length-delimited canonical CBOR event records.&lt;/li&gt;
&lt;li&gt;
&lt;code&gt;objects/&amp;lt;hex&amp;gt;&lt;/code&gt;, a directory of content-addressed object files (one per hash).&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;That is the whole shape. Verifiers ignore unknown non-normative files unless explicitly configured to reject them.&lt;/p&gt;

&lt;h2&gt;
  
  
  The manifest
&lt;/h2&gt;



&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight json"&gt;&lt;code&gt;&lt;span class="p"&gt;{&lt;/span&gt;&lt;span class="w"&gt;
  &lt;/span&gt;&lt;span class="nl"&gt;"agef_version"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="s2"&gt;"0.1"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;&lt;span class="w"&gt;
  &lt;/span&gt;&lt;span class="nl"&gt;"producer"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="p"&gt;{&lt;/span&gt;&lt;span class="nl"&gt;"name"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="s2"&gt;"akmon"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="nl"&gt;"version"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="s2"&gt;"2.0.0"&lt;/span&gt;&lt;span class="p"&gt;},&lt;/span&gt;&lt;span class="w"&gt;
  &lt;/span&gt;&lt;span class="nl"&gt;"session"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="p"&gt;{&lt;/span&gt;&lt;span class="w"&gt;
    &lt;/span&gt;&lt;span class="nl"&gt;"id"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="s2"&gt;"550e8400-e29b-41d4-a716-446655440000"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;&lt;span class="w"&gt;
    &lt;/span&gt;&lt;span class="nl"&gt;"head"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="s2"&gt;"5c1f8b2a3d4e7f6a8b1d3e2c4f6a8b9d2c3a4b5c6d7e8f9a0b1c2d3e4f5a6b7c"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;&lt;span class="w"&gt;
    &lt;/span&gt;&lt;span class="nl"&gt;"created_at"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="s2"&gt;"2026-05-06T09:14:02Z"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;&lt;span class="w"&gt;
    &lt;/span&gt;&lt;span class="nl"&gt;"ended_at"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="s2"&gt;"2026-05-06T09:14:18Z"&lt;/span&gt;&lt;span class="w"&gt;
  &lt;/span&gt;&lt;span class="p"&gt;},&lt;/span&gt;&lt;span class="w"&gt;
  &lt;/span&gt;&lt;span class="nl"&gt;"hash_algorithm"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="s2"&gt;"sha256"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;&lt;span class="w"&gt;
  &lt;/span&gt;&lt;span class="nl"&gt;"object_count"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="mi"&gt;12&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;&lt;span class="w"&gt;
  &lt;/span&gt;&lt;span class="nl"&gt;"event_count"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="mi"&gt;9&lt;/span&gt;&lt;span class="w"&gt;
&lt;/span&gt;&lt;span class="p"&gt;}&lt;/span&gt;&lt;span class="w"&gt;
&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;A reader who has never seen the producer can answer four things from this object: which version of the format, who wrote it, when, and how big it is. Writers must keep counts honest. Readers must reject malformed or incomplete required fields.&lt;/p&gt;
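
&lt;p&gt;A minimal Python sketch of both sides of that contract, writer and reader. It is not taken from the spec or from Akmon: the required-field lists are an assumption read off the example manifest, the two-space indent is an arbitrary choice, and &lt;code&gt;dump_manifest&lt;/code&gt; and &lt;code&gt;load_manifest&lt;/code&gt; are hypothetical names.&lt;/p&gt;

```python
import json

# Assumption: these are the required fields, inferred from the example manifest.
REQUIRED_TOP = ("agef_version", "producer", "session", "hash_algorithm",
                "object_count", "event_count")
REQUIRED_SESSION = ("id", "head", "created_at", "ended_at")


def dump_manifest(manifest):
    """Serialize a manifest as UTF-8 JSON with sorted keys and LF newlines."""
    text = json.dumps(manifest, sort_keys=True, indent=2, ensure_ascii=False)
    return (text + "\n").encode("utf-8")


def load_manifest(raw):
    """Parse manifest bytes; raise ValueError on missing or malformed fields."""
    manifest = json.loads(raw.decode("utf-8"))
    if not isinstance(manifest, dict):
        raise ValueError("manifest must be a JSON object")
    for key in REQUIRED_TOP:
        if key not in manifest:
            raise ValueError("missing required field: " + key)
    session = manifest["session"]
    if not isinstance(session, dict):
        raise ValueError("session must be a JSON object")
    for key in REQUIRED_SESSION:
        if key not in session:
            raise ValueError("missing required field: session." + key)
    for key in ("object_count", "event_count"):
        value = manifest[key]
        # Counts must be plain non-negative integers (bool is rejected too).
        if type(value) is not int or value != abs(value):
            raise ValueError(key + " must be a non-negative integer")
    return manifest
```

&lt;p&gt;Invalid JSON surfaces as a &lt;code&gt;ValueError&lt;/code&gt; as well, because &lt;code&gt;json.JSONDecodeError&lt;/code&gt; is a subclass of it, so a caller can treat every rejection the same way.&lt;/p&gt;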

&lt;p&gt;Default hash algorithm is &lt;code&gt;sha256&lt;/code&gt;. Readers must support &lt;code&gt;sha256&lt;/code&gt; and may support &lt;code&gt;blake3&lt;/code&gt;. Bundles that declare unsupported algorithms must be rejected. Hex appears only in the manifest (&lt;code&gt;session.head&lt;/code&gt;) and in object filenames. Hashes inside CBOR-encoded events are 32-byte byte strings.&lt;/p&gt;

&lt;h2&gt;
  
  
  The events stream
&lt;/h2&gt;

&lt;p&gt;&lt;code&gt;events.bin&lt;/code&gt; is the substance. It is a sequence of records, each framed by a 4-byte unsigned big-endian length prefix followed by exactly that many bytes of canonical CBOR encoding one event.&lt;/p&gt;

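&lt;p&gt;Length-delimited framing was a deliberate trade. It supports partial recovery from truncation and lets a verifier scan deterministically. Producers commit to canonical CBOR, which RFC 8949 specifies as deterministic encoding (Section 4.2), so the same event always serializes to the same bytes and hashes to the same value.&lt;/p&gt;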
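
&lt;p&gt;The framing is simple enough to sketch with the Python standard library alone. This is an illustration under the rules stated above, not code from the reference implementation; &lt;code&gt;frame&lt;/code&gt; and &lt;code&gt;read_records&lt;/code&gt; are hypothetical names, and the record bodies are treated as opaque bytes.&lt;/p&gt;

```python
import io


def frame(record):
    """Prefix one encoded event with its 4-byte unsigned big-endian length."""
    return len(record).to_bytes(4, "big") + record


def read_records(stream):
    """Yield the raw bytes of each record from a binary events.bin stream.

    Complete records are yielded before a truncation is reported, which is
    what makes partial recovery possible.
    """
    while True:
        prefix = stream.read(4)
        if prefix == b"":
            return  # clean end of stream
        if len(prefix) != 4:
            raise ValueError("truncated length prefix")
        length = int.from_bytes(prefix, "big")
        body = stream.read(length)
        if len(body) != length:
            raise ValueError("truncated record body")
        yield body
```

&lt;p&gt;A caller that wants to salvage a damaged file can collect records in a loop and keep whatever was yielded before the &lt;code&gt;ValueError&lt;/code&gt;.&lt;/p&gt;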

&lt;p&gt;Each event encodes:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;
&lt;code&gt;parents&lt;/code&gt;, an array of zero or more parent event hashes.&lt;/li&gt;
&lt;li&gt;
&lt;code&gt;kind&lt;/code&gt;, one of the closed event kinds in v0.1.&lt;/li&gt;
&lt;li&gt;
&lt;code&gt;emitted_at&lt;/code&gt;, a CBOR tag 1 timestamp (integer epoch seconds, or floating point for sub-second precision).&lt;/li&gt;
&lt;li&gt;
&lt;code&gt;sequence&lt;/code&gt;, monotonic per session, starting at 0.&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;Linkage rules are strict in v0.1:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;
&lt;code&gt;sequence&lt;/code&gt; starts at 0 and increases by exactly 1 per event.&lt;/li&gt;
&lt;li&gt;Every event except &lt;code&gt;SessionStart&lt;/code&gt; has at least one parent.&lt;/li&gt;
&lt;li&gt;
&lt;code&gt;SessionStart&lt;/code&gt; has exactly zero parents.&lt;/li&gt;
&lt;li&gt;Every non-&lt;code&gt;SessionStart&lt;/code&gt; event in v0.1 has exactly one parent, and that parent is the hash of the immediately preceding event by sequence. Multi-parent events are reserved for future versions.&lt;/li&gt;
&lt;li&gt;Event hashes are computed over canonical CBOR bytes of the full event envelope.&lt;/li&gt;
&lt;li&gt;Event ordering in &lt;code&gt;events.bin&lt;/code&gt; matches &lt;code&gt;sequence&lt;/code&gt;.&lt;/li&gt;
&lt;/ul&gt;
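
&lt;p&gt;The linkage rules above can be sketched as a single pass over the stream. Assumptions in this sketch: CBOR decoding happens elsewhere and hands over each record as a pair of raw bytes and a decoded mapping with string keys &lt;code&gt;kind&lt;/code&gt;, &lt;code&gt;sequence&lt;/code&gt;, and &lt;code&gt;parents&lt;/code&gt;; the raw record bytes are the event envelope; the hash algorithm is &lt;code&gt;sha256&lt;/code&gt;; and a second &lt;code&gt;SessionStart&lt;/code&gt; is treated as an error. &lt;code&gt;check_linkage&lt;/code&gt; is a hypothetical name.&lt;/p&gt;

```python
import hashlib


def check_linkage(records):
    """Check v0.1 linkage over (raw_bytes, event) pairs in stream order.

    Returns the hash of the last event, which should equal session.head.
    """
    previous_hash = None
    for index, (raw, event) in enumerate(records):
        sequence = event["sequence"]
        if sequence != index:
            raise ValueError(f"event {index}: unexpected sequence {sequence}")
        parents = list(event["parents"])
        if index == 0:
            # The first event opens the session and has no parents.
            if event["kind"] != "SessionStart" or parents != []:
                raise ValueError("first event must be SessionStart with no parents")
        else:
            if event["kind"] == "SessionStart":
                raise ValueError(f"event {index}: unexpected second SessionStart")
            # Exactly one parent: the hash of the immediately preceding event.
            if parents != [previous_hash]:
                raise ValueError(f"event {index}: parent is not the preceding event")
        previous_hash = hashlib.sha256(raw).digest()
    if previous_hash is None:
        raise ValueError("stream contains no events")
    return previous_hash
```

&lt;p&gt;Because each event names the hash of its predecessor, editing or dropping any record changes a hash somewhere downstream and the comparison fails at that point.&lt;/p&gt;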

&lt;h2&gt;
  
  
  Event kinds in v0.1 (closed set)
&lt;/h2&gt;

&lt;p&gt;Readers must recognize exactly these kinds in v0.1. Unknown kinds are rejected.&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;
&lt;code&gt;SessionStart&lt;/code&gt;: opens the session, fields &lt;code&gt;cwd_hash&lt;/code&gt; and &lt;code&gt;config_hash&lt;/code&gt;.&lt;/li&gt;
&lt;li&gt;
&lt;code&gt;UserTurn&lt;/code&gt;: a user prompt, field &lt;code&gt;prompt_hash&lt;/code&gt;.&lt;/li&gt;
&lt;li&gt;
&lt;code&gt;ProviderCall&lt;/code&gt;: a model provider call, field &lt;code&gt;provider_id&lt;/code&gt;, an &lt;code&gt;attempts[]&lt;/code&gt; array of &lt;code&gt;AttemptRecord&lt;/code&gt;, optional &lt;code&gt;stream_hash&lt;/code&gt;.&lt;/li&gt;
&lt;li&gt;
&lt;code&gt;ToolCall&lt;/code&gt;: a tool execution, fields &lt;code&gt;tool_id&lt;/code&gt;, &lt;code&gt;input_hash&lt;/code&gt;, &lt;code&gt;output_hash&lt;/code&gt;, optional &lt;code&gt;side_effects_hash&lt;/code&gt;.&lt;/li&gt;
&lt;li&gt;
&lt;code&gt;RetrievalCall&lt;/code&gt;: a retrieval, fields &lt;code&gt;index_id&lt;/code&gt;, &lt;code&gt;query_hash&lt;/code&gt;, &lt;code&gt;results_hash&lt;/code&gt;.&lt;/li&gt;
&lt;li&gt;
&lt;code&gt;PermissionGate&lt;/code&gt;: a policy decision, fields &lt;code&gt;policy_id&lt;/code&gt;, &lt;code&gt;decision&lt;/code&gt; (string, recommended lowercase verbs like &lt;code&gt;allowed&lt;/code&gt;, &lt;code&gt;denied&lt;/code&gt;, &lt;code&gt;deferred&lt;/code&gt;), &lt;code&gt;context_hash&lt;/code&gt;.&lt;/li&gt;
&lt;li&gt;
&lt;code&gt;AssistantTurn&lt;/code&gt;: an assistant message, fields &lt;code&gt;message_hash&lt;/code&gt;, optional &lt;code&gt;tool_calls_hash&lt;/code&gt;.&lt;/li&gt;
&lt;li&gt;
&lt;code&gt;SessionEnd&lt;/code&gt;: closes the session, optional &lt;code&gt;summary_hash&lt;/code&gt;.&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;&lt;code&gt;ProviderCall.attempts[]&lt;/code&gt; preserves chronological order. Each &lt;code&gt;AttemptRecord&lt;/code&gt; has &lt;code&gt;attempt_number&lt;/code&gt; (1-indexed), &lt;code&gt;started_at&lt;/code&gt;, &lt;code&gt;ended_at&lt;/code&gt;, a closed &lt;code&gt;AttemptStatus&lt;/code&gt;, &lt;code&gt;request_hash&lt;/code&gt;, optional &lt;code&gt;response_hash&lt;/code&gt;, optional &lt;code&gt;stream_hash&lt;/code&gt;, optional &lt;code&gt;error_message&lt;/code&gt;. &lt;code&gt;AttemptStatus&lt;/code&gt; is one of &lt;code&gt;Success&lt;/code&gt;, &lt;code&gt;RateLimited&lt;/code&gt;, &lt;code&gt;NetworkError&lt;/code&gt;, &lt;code&gt;ServerError&lt;/code&gt;, &lt;code&gt;ClientError&lt;/code&gt;, &lt;code&gt;Cancelled&lt;/code&gt;, or &lt;code&gt;Other(string)&lt;/code&gt;. v0.1 readers reject unknown variants.&lt;/p&gt;

&lt;p&gt;Two design notes worth calling out.&lt;/p&gt;

&lt;p&gt;First, &lt;code&gt;PermissionGate.decision&lt;/code&gt; is open in v0.1. The spec recommends lowercase verbs but does not close the set. Closing it is a likely v1.0 change. If you are a producer, follow the recommendation now to make the v1 transition free.&lt;/p&gt;

&lt;p&gt;Second, &lt;code&gt;AttemptStatus&lt;/code&gt; and &lt;code&gt;EventKind&lt;/code&gt; are intentionally closed. The cost is that adding a new variant is a normative spec change. The benefit is that two implementations cannot legitimately disagree on the meaning of a record.&lt;/p&gt;
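
&lt;p&gt;In code, a closed set maps naturally onto an enum whose constructor rejects anything else. A small Python sketch, assuming the kind arrives as a string after decoding (the wire representation is not shown in this article); &lt;code&gt;parse_kind&lt;/code&gt; is a hypothetical helper.&lt;/p&gt;

```python
from enum import Enum


class EventKind(Enum):
    """The eight event kinds of v0.1; nothing else is accepted."""
    SESSION_START = "SessionStart"
    USER_TURN = "UserTurn"
    PROVIDER_CALL = "ProviderCall"
    TOOL_CALL = "ToolCall"
    RETRIEVAL_CALL = "RetrievalCall"
    PERMISSION_GATE = "PermissionGate"
    ASSISTANT_TURN = "AssistantTurn"
    SESSION_END = "SessionEnd"


def parse_kind(name):
    """Return the matching EventKind or raise ValueError for unknown kinds."""
    try:
        return EventKind(name)
    except ValueError:
        raise ValueError("unknown event kind: " + repr(name)) from None
```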

&lt;h2&gt;
  
  
  The objects directory
&lt;/h2&gt;

&lt;p&gt;Every object referenced by an event hash field exists as a file at &lt;code&gt;objects/&amp;lt;hex&amp;gt;&lt;/code&gt; where &lt;code&gt;&amp;lt;hex&amp;gt;&lt;/code&gt; is the lowercase hex digest for the active hash algorithm. Object bytes hash to their filename digest. Objects are opaque bytes; AGEF v0.1 does not require MIME metadata.&lt;/p&gt;

&lt;p&gt;A typical bundle has many small objects (prompts, tool inputs, tool outputs, file diffs) and a few large ones (full file contents). Storage is efficient because identical content is stored once, under its hash.&lt;/p&gt;
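
&lt;p&gt;Content addressing is easy to demonstrate on an extracted &lt;code&gt;objects/&lt;/code&gt; directory. The sketch below assumes &lt;code&gt;sha256&lt;/code&gt; as the active algorithm and uses two hypothetical helpers, &lt;code&gt;store_object&lt;/code&gt; and &lt;code&gt;verify_objects&lt;/code&gt;; it illustrates the rule and is not the reference implementation.&lt;/p&gt;

```python
import hashlib
from pathlib import Path


def store_object(objects_dir, data):
    """Write data under its lowercase hex digest and return that digest.

    Storing the same bytes twice lands on the same filename, so identical
    content occupies one file.
    """
    name = hashlib.sha256(data).hexdigest()
    (Path(objects_dir) / name).write_bytes(data)
    return name


def verify_objects(objects_dir):
    """Check that every object hashes to its filename; return the count."""
    checked = 0
    for path in sorted(Path(objects_dir).iterdir()):
        digest = hashlib.sha256(path.read_bytes()).hexdigest()
        if digest != path.name:
            raise ValueError("object does not hash to its filename: " + path.name)
        checked += 1
    return checked
```

&lt;p&gt;The count returned by &lt;code&gt;verify_objects&lt;/code&gt; is the number a verifier would compare against &lt;code&gt;object_count&lt;/code&gt; in the manifest.&lt;/p&gt;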

&lt;h2&gt;
  
  
  Hashing rules
&lt;/h2&gt;



&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight yaml"&gt;&lt;code&gt;&lt;span class="na"&gt;default&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s"&gt;sha256&lt;/span&gt;
&lt;span class="na"&gt;optional&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s"&gt;blake3&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;Readers must support &lt;code&gt;sha256&lt;/code&gt;. They may support &lt;code&gt;blake3&lt;/code&gt;. Within CBOR-encoded events, hashes must be encoded as CBOR byte strings (major type 2) of length 32 for both algorithms. Hex string representation is used only in &lt;code&gt;manifest.json&lt;/code&gt; (&lt;code&gt;session.head&lt;/code&gt;) and in object filenames. Once you internalize that, the rest of the spec falls into place.&lt;/p&gt;
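
&lt;p&gt;To make the two representations concrete, here is a short Python sketch of the same digest in both forms. The CBOR header bytes follow RFC 8949: a byte string of length 32 starts with &lt;code&gt;0x58 0x20&lt;/code&gt;. The function names are hypothetical, and a real producer would let its CBOR library do this encoding.&lt;/p&gt;

```python
import hashlib


def hash_to_cbor(digest):
    """Encode a 32-byte digest as a CBOR byte string (major type 2)."""
    if len(digest) != 32:
        raise ValueError("expected a 32-byte digest")
    # 0x58 is major type 2 with a one-byte length following; 0x20 is 32.
    return bytes([0x58, 0x20]) + digest


def hash_to_hex(digest):
    """Lowercase hex form, used in session.head and object filenames."""
    return digest.hex()


digest = hashlib.sha256(b"example payload").digest()
in_event = hash_to_cbor(digest)   # 34 bytes inside events.bin
in_manifest = hash_to_hex(digest)  # 64 hex characters
```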

&lt;h2&gt;
  
  
  Serialization rules in one paragraph
&lt;/h2&gt;

&lt;p&gt;Event payloads in &lt;code&gt;events.bin&lt;/code&gt; use canonical CBOR (RFC 8949). &lt;code&gt;manifest.json&lt;/code&gt; is UTF-8 JSON with sorted keys and LF endings. Timestamps in CBOR-encoded events use CBOR tag 1. Producers may emit either integer epoch seconds or floating point; readers must accept both. The reference implementation &lt;code&gt;akmon-journal&lt;/code&gt; v0.1 emits integer epoch seconds. Timestamps in &lt;code&gt;manifest.json&lt;/code&gt; use RFC3339 strings. Implementations may use any internal storage format; AGEF rules apply only to the bytes emitted in &lt;code&gt;events.bin&lt;/code&gt; and to the bytes used for event hashing.&lt;/p&gt;
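
&lt;p&gt;The two timestamp forms can be compared side by side. This Python sketch hand-encodes the integer variant of CBOR tag 1 and formats the same instant as an RFC3339 string. It is for illustration only, with hypothetical function names; a real producer would rely on its CBOR library rather than a hand-rolled encoder.&lt;/p&gt;

```python
from datetime import datetime, timezone


def encode_uint(n):
    """Shortest-form CBOR unsigned integer (major type 0)."""
    if type(n) is not int or n != abs(n) or n > 0xFFFFFFFFFFFFFFFF:
        raise ValueError("expected an unsigned 64-bit integer")
    if n > 0xFFFFFFFF:
        return b"\x1b" + n.to_bytes(8, "big")
    if n > 0xFFFF:
        return b"\x1a" + n.to_bytes(4, "big")
    if n > 0xFF:
        return b"\x19" + n.to_bytes(2, "big")
    if n > 23:
        return b"\x18" + bytes([n])
    return bytes([n])


def encode_epoch_tag1(seconds):
    """CBOR tag 1 (0xC1) wrapping integer epoch seconds, for events.bin."""
    return b"\xc1" + encode_uint(seconds)


def to_rfc3339(seconds):
    """The same instant as an RFC3339 UTC string, for manifest.json."""
    moment = datetime.fromtimestamp(seconds, tz=timezone.utc)
    return moment.strftime("%Y-%m-%dT%H:%M:%SZ")
```

&lt;p&gt;For the epoch value 1363896240, which RFC 8949 uses in its examples, &lt;code&gt;encode_epoch_tag1&lt;/code&gt; yields the bytes &lt;code&gt;c1 1a 51 4b 67 b0&lt;/code&gt; and &lt;code&gt;to_rfc3339&lt;/code&gt; yields &lt;code&gt;2013-03-21T20:04:00Z&lt;/code&gt;.&lt;/p&gt;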

&lt;h2&gt;
  
  
  Verification procedure
&lt;/h2&gt;

&lt;p&gt;A conforming verifier runs this sequence:&lt;/p&gt;

&lt;ol&gt;
&lt;li&gt;Extract the archive.&lt;/li&gt;
&lt;li&gt;Parse &lt;code&gt;manifest.json&lt;/code&gt;. Reject on schema or version failure.&lt;/li&gt;
&lt;li&gt;Read &lt;code&gt;events.bin&lt;/code&gt; using the 4-byte length-delimited framing.&lt;/li&gt;
&lt;li&gt;For each event: decode canonical CBOR, recompute the event hash, verify that &lt;code&gt;sequence&lt;/code&gt; starts at 0 and increases by exactly 1, verify that all &lt;code&gt;parents&lt;/code&gt; resolve to previously seen events, and verify that referenced content hashes resolve to files in &lt;code&gt;objects/&lt;/code&gt;.&lt;/li&gt;
&lt;li&gt;For each referenced object: read bytes, hash with &lt;code&gt;manifest.hash_algorithm&lt;/code&gt;, compare to filename digest.&lt;/li&gt;
&lt;li&gt;Confirm &lt;code&gt;manifest.event_count&lt;/code&gt; equals decoded event count, &lt;code&gt;manifest.object_count&lt;/code&gt; equals object file count, &lt;code&gt;manifest.session.head&lt;/code&gt; equals terminal event hash, and &lt;code&gt;SessionStart&lt;/code&gt; is a reachable ancestor of head.&lt;/li&gt;
&lt;/ol&gt;

&lt;p&gt;Default operation fails on the first invariant violation. Implementations may offer a "report-all" mode for diagnostics. They must not claim successful verification unless all required checks pass.&lt;/p&gt;
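
&lt;p&gt;The difference between fail-fast and "report-all" is a few lines of control flow. The Python sketch below is one way to structure it, with the final manifest comparison from step 6 as an example check. &lt;code&gt;run_checks&lt;/code&gt; and &lt;code&gt;check_totals&lt;/code&gt; are hypothetical names, and the observed counts and head hash are assumed to come from the earlier steps.&lt;/p&gt;

```python
def check_totals(manifest, event_count, object_count, head_hex):
    """Step 6: compare manifest claims with what the verifier observed."""
    if manifest["event_count"] != event_count:
        raise ValueError("event_count does not match decoded events")
    if manifest["object_count"] != object_count:
        raise ValueError("object_count does not match object files")
    if manifest["session"]["head"] != head_hex:
        raise ValueError("session.head does not match terminal event hash")


def run_checks(checks, report_all=False):
    """Run (name, callable) checks in order.

    Each callable raises ValueError on an invariant violation. By default
    the run stops at the first failure; report_all keeps going. An empty
    result is the only outcome that counts as verified.
    """
    failures = []
    for name, check in checks:
        try:
            check()
        except ValueError as error:
            failures.append((name, str(error)))
            if not report_all:
                break
    return failures
```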

&lt;h2&gt;
  
  
  Conformance profiles
&lt;/h2&gt;

&lt;p&gt;AGEF v0.1 defines two profiles:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Bundle Profile, an implementation that produces or consumes AGEF bundles.&lt;/li&gt;
&lt;li&gt;Substrate Profile, an implementation that maintains an AGEF-compatible content-addressed event journal but does not necessarily emit bundles directly.&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;A Substrate Profile must be able to produce Bundle Profile output through an export pathway when required.&lt;/p&gt;

&lt;p&gt;&lt;code&gt;akmon-journal&lt;/code&gt; is currently a Substrate Profile. Akmon Phase 4 introduces Bundle Profile capability.&lt;/p&gt;

&lt;p&gt;If you are an implementer, knowing which profile you are claiming is the most important conformance question.&lt;/p&gt;

&lt;h2&gt;
  
  
  Producing AGEF in practice
&lt;/h2&gt;

&lt;p&gt;If you use Akmon, you already produce AGEF. Sessions land in &lt;code&gt;.akmon/audit/&amp;lt;session-id&amp;gt;.jsonl&lt;/code&gt; and &lt;code&gt;.akmon/evidence/&amp;lt;session-id&amp;gt;.json&lt;/code&gt;. With Phase 4 export, the same session is portable as a &lt;code&gt;.akmon&lt;/code&gt; bundle.&lt;/p&gt;

&lt;p&gt;If you build your own producer, the spec lists the libraries known to produce canonical CBOR with the right configuration: &lt;code&gt;ciborium&lt;/code&gt; for Rust, &lt;code&gt;fxamacker/cbor&lt;/code&gt; for Go, &lt;code&gt;cbor2&lt;/code&gt; for Python, &lt;code&gt;cbor-x&lt;/code&gt; for JavaScript and TypeScript. Validate canonical encoding behavior with test vectors before claiming conformance.&lt;/p&gt;

&lt;h2&gt;
  
  
  Reading and verifying AGEF
&lt;/h2&gt;

&lt;p&gt;For Akmon-produced bundles, the verification commands are part of the trust pipeline:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight shell"&gt;&lt;code&gt;&lt;span class="c"&gt;# Verify audit chain integrity for the live session journal&lt;/span&gt;
akmon audit verify .akmon/audit/&amp;lt;session-id&amp;gt;.jsonl

&lt;span class="c"&gt;# Verify the evidence summary plus its linkage to the audit chain&lt;/span&gt;
akmon evidence verify .akmon/evidence/&amp;lt;session-id&amp;gt;.json

&lt;span class="c"&gt;# Once Phase 4 export is on hand, verify a bundle on import&lt;/span&gt;
akmon bundle import evidence.akmon &lt;span class="nt"&gt;--verify-only&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;Independent verifier implementations are welcome. The spec is small enough to implement in a weekend if you start with a CBOR library that supports canonical encoding.&lt;/p&gt;

&lt;h2&gt;
  
  
  Security considerations
&lt;/h2&gt;

&lt;p&gt;AGEF provides tamper evidence and portability. It does not provide identity attribution by itself. If producer trust matters in your environment, layer external signing on top of the bundle. Bundles may contain sensitive content; storage and sharing controls are the operator's job. Verification proves integrity, not semantic correctness.&lt;/p&gt;

&lt;p&gt;For sharing, the redaction flow is the right answer. With Akmon:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight shell"&gt;&lt;code&gt;akmon redact &amp;lt;session-id&amp;gt; &lt;span class="se"&gt;\&lt;/span&gt;
  &lt;span class="nt"&gt;--output&lt;/span&gt; sanitized.akmon &lt;span class="se"&gt;\&lt;/span&gt;
  &lt;span class="nt"&gt;--object&lt;/span&gt; &amp;lt;object-hash&amp;gt; &lt;span class="se"&gt;\&lt;/span&gt;
  &lt;span class="nt"&gt;--reason&lt;/span&gt; &lt;span class="s2"&gt;"Removed customer name before audit handoff"&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;This produces a derivative bundle in which the targeted objects are replaced by canonical CBOR sentinels. The sentinel payload records the original hash, original size, the supplied &lt;code&gt;reason&lt;/code&gt;, and a &lt;code&gt;redacted_at&lt;/code&gt; timestamp. The chain still verifies because the sentinel hashes correctly, and the audit trail makes the redaction explicit.&lt;/p&gt;

&lt;h2&gt;
  
  
  Compatibility and evolution
&lt;/h2&gt;

&lt;p&gt;v0.x is pre-stable. Breaking changes are permitted. Future versions should preserve forward migration guidance. v1.0 is the first stable major. New event kinds in future majors must be version-gated. v0.1 readers must not silently ignore unknown required semantics.&lt;/p&gt;

&lt;p&gt;If you build against v0.1, plan for one migration to v1.0. If you are a careful producer, the migration is small.&lt;/p&gt;

&lt;h2&gt;
  
  
  Why a new format
&lt;/h2&gt;

&lt;p&gt;Three reasons.&lt;/p&gt;

&lt;p&gt;First, no shared shape exists today. Every framework, every vendor, every gateway writes its own. A reviewer cannot move between them. The cost of a new format is small compared to the cost of having no shared format.&lt;/p&gt;

&lt;p&gt;Second, evidence has a different audience than observability. Observability is for engineers in the moment. Evidence is for reviewers, regulators, customers, and the version of you reading the file in a year. Different shape, different guarantees, different expectations.&lt;/p&gt;

&lt;p&gt;Third, regulated engineering needs an answer that does not depend on a vendor dashboard. The portability requirement is real. AGEF is built for it.&lt;/p&gt;

&lt;h2&gt;
  
  
  What you can do this week
&lt;/h2&gt;

&lt;p&gt;If you build AI tooling, three small steps:&lt;/p&gt;

&lt;ol&gt;
&lt;li&gt;Read &lt;code&gt;SPEC.md&lt;/code&gt; end to end. It is short on purpose.&lt;/li&gt;
&lt;li&gt;Implement a Bundle Profile verifier in your language of choice. Use the test vectors that ship in the &lt;code&gt;examples/minimal-bundle&lt;/code&gt; folder.&lt;/li&gt;
&lt;li&gt;Open an issue on the AGEF repo. Tell us what your runtime emits today and where the spec helps or gets in the way.&lt;/li&gt;
&lt;/ol&gt;

&lt;p&gt;If you ship a coding agent, the smallest first step is one binary and the trust pipeline. The repo is at &lt;a href="https://github.com/radotsvetkov/akmon" rel="noopener noreferrer"&gt;github.com/radotsvetkov/akmon&lt;/a&gt;. The format spec is at &lt;a href="https://github.com/radotsvetkov/agef" rel="noopener noreferrer"&gt;github.com/radotsvetkov/agef&lt;/a&gt;. The site is at &lt;a href="https://radotsvetkov.github.io/akmon/" rel="noopener noreferrer"&gt;radotsvetkov.github.io/akmon&lt;/a&gt;.&lt;/p&gt;

&lt;p&gt;Standards become standards by being used. Implementations are welcome.&lt;/p&gt;

</description>
      <category>ai</category>
      <category>opensource</category>
      <category>rust</category>
      <category>security</category>
    </item>
    <item>
      <title>The trust pipeline: three commands to run before merging an AI-assisted change</title>
      <dc:creator>Radoslav Tsvetkov</dc:creator>
      <pubDate>Thu, 14 May 2026 08:19:27 +0000</pubDate>
      <link>https://dev.to/radotsvetkov/the-trust-pipeline-three-commands-to-run-before-merging-an-ai-assisted-change-1ple</link>
      <guid>https://dev.to/radotsvetkov/the-trust-pipeline-three-commands-to-run-before-merging-an-ai-assisted-change-1ple</guid>
      <description>&lt;p&gt;The most common failure mode I see when teams adopt AI coding agents is not a bad diff. It is a good diff that no one can defend. The agent ran. The session closed. Three days later, somebody asks how the change came to be, and there is nothing to point at.&lt;/p&gt;

&lt;p&gt;This article is about closing that gap with three commands. Akmon's trust pipeline is &lt;code&gt;audit verify&lt;/code&gt;, &lt;code&gt;evidence verify&lt;/code&gt;, and &lt;code&gt;slo verify&lt;/code&gt;. Each one is fast, deterministic, and gates cleanly in CI. With them, "the agent did it" stops being a hand wave and becomes an artifact.&lt;/p&gt;

&lt;p&gt;The code in this post uses real commands from Akmon v2.0.0.&lt;/p&gt;

&lt;h2&gt;
  
  
  What "trust pipeline" means
&lt;/h2&gt;

&lt;p&gt;Every Akmon session writes two files when it ends.&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;
&lt;code&gt;.akmon/audit/&amp;lt;session-id&amp;gt;.jsonl&lt;/code&gt;: a tamper-evident audit chain of every prompt, model response, tool call, and policy decision.&lt;/li&gt;
&lt;li&gt;
&lt;code&gt;.akmon/evidence/&amp;lt;session-id&amp;gt;.json&lt;/code&gt;: a structured evidence summary, with replay metadata and a hash that links back to the audit chain.&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;Three commands take those files and produce signals you can trust.&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight shell"&gt;&lt;code&gt;&lt;span class="c"&gt;# 1. Audit chain integrity.&lt;/span&gt;
akmon audit verify .akmon/audit/&amp;lt;session-id&amp;gt;.jsonl

&lt;span class="c"&gt;# 2. Evidence schema and linkage to the audit chain.&lt;/span&gt;
akmon evidence verify .akmon/evidence/&amp;lt;session-id&amp;gt;.json

&lt;span class="c"&gt;# 3. Reliability metrics against thresholds.&lt;/span&gt;
akmon slo verify .akmon/evidence/&amp;lt;session-id&amp;gt;.json &lt;span class="nt"&gt;--strict&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;Each command exits &lt;code&gt;0&lt;/code&gt; for pass, &lt;code&gt;1&lt;/code&gt; for failure, and (for SLO) &lt;code&gt;2&lt;/code&gt; for invalid input or config. Three exit codes, three crisp signals.&lt;/p&gt;
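
&lt;p&gt;If you wrap the commands in a script rather than calling them straight from CI, the exit-code contract translates into a small lookup. A Python sketch, assuming only the codes described above; &lt;code&gt;classify_exit&lt;/code&gt; and &lt;code&gt;run_gate&lt;/code&gt; are hypothetical helpers, and any other code is reported as unexpected rather than guessed at.&lt;/p&gt;

```python
import subprocess


def classify_exit(subcommand, returncode):
    """Map an exit code to a signal using the contract described above."""
    if returncode == 0:
        return "pass"
    if returncode == 1:
        return "fail"
    if returncode == 2 and subcommand == "slo":
        return "invalid-input"
    return "unexpected"


def run_gate(subcommand, argv):
    """Run one pipeline command and classify how it exited."""
    completed = subprocess.run(argv, check=False)
    return classify_exit(subcommand, completed.returncode)
```

&lt;p&gt;For example, &lt;code&gt;run_gate("audit", ["akmon", "audit", "verify", path])&lt;/code&gt; returns &lt;code&gt;"pass"&lt;/code&gt; or &lt;code&gt;"fail"&lt;/code&gt; for a given journal path.&lt;/p&gt;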

&lt;h2&gt;
  
  
  Step 1. Audit chain verification
&lt;/h2&gt;

&lt;p&gt;&lt;code&gt;akmon audit verify&lt;/code&gt; walks the JSONL chain and checks the cryptographic linkage between events. If a single byte was edited or a record was dropped, it fails.&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight shell"&gt;&lt;code&gt;&lt;span class="nv"&gt;$ &lt;/span&gt;akmon audit verify .akmon/audit/2025-05-06_abcd.jsonl
audit chain valid &lt;span class="o"&gt;(&lt;/span&gt;events: 47, &lt;span class="nb"&gt;head&lt;/span&gt;: 5c1f...&lt;span class="o"&gt;)&lt;/span&gt;
&lt;span class="nv"&gt;$ &lt;/span&gt;&lt;span class="nb"&gt;echo&lt;/span&gt; &lt;span class="nv"&gt;$?&lt;/span&gt;
0
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;For CI:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight shell"&gt;&lt;code&gt;&lt;span class="nv"&gt;$ &lt;/span&gt;akmon &lt;span class="nt"&gt;--output&lt;/span&gt; json audit verify .akmon/audit/2025-05-06_abcd.jsonl | jq &lt;span class="s1"&gt;'.valid'&lt;/span&gt;
&lt;span class="nb"&gt;true&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;Why this matters: the audit chain is the substrate. If the chain does not verify, nothing downstream is meaningful. Pin this command at the start of any review workflow.&lt;/p&gt;

&lt;h2&gt;
  
  
  Step 2. Evidence verification
&lt;/h2&gt;

&lt;p&gt;&lt;code&gt;akmon evidence verify&lt;/code&gt; checks the evidence summary's schema, the replay metadata shape, and the linkage to the audit chain. Schema means the file matches the documented evidence schema for v2.0.0. Linkage means the recorded audit hash matches the actual head of the audit chain.&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight shell"&gt;&lt;code&gt;&lt;span class="nv"&gt;$ &lt;/span&gt;akmon evidence verify .akmon/evidence/2025-05-06_abcd.json
evidence valid &lt;span class="o"&gt;(&lt;/span&gt;linked audit &lt;span class="nb"&gt;head&lt;/span&gt;: 5c1f...&lt;span class="o"&gt;)&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;This step catches a class of failure most teams underestimate: an evidence file that was generated correctly but later got out of sync with its audit chain (for example, because of a partial copy). Evidence verify is cheap. Run it on every artifact you intend to share.&lt;/p&gt;

&lt;h2&gt;
  
  
  Step 3. SLO verification
&lt;/h2&gt;

&lt;p&gt;&lt;code&gt;akmon slo verify&lt;/code&gt; evaluates run reliability metrics against thresholds. Examples include tool success rate, replay determinism, attempt counts, and policy gate denials. The CLI accepts thresholds inline or from a TOML file:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight shell"&gt;&lt;code&gt;&lt;span class="nv"&gt;$ &lt;/span&gt;akmon slo verify .akmon/evidence/2025-05-06_abcd.json &lt;span class="nt"&gt;--strict&lt;/span&gt;
all SLO thresholds met
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;





&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight shell"&gt;&lt;code&gt;&lt;span class="nv"&gt;$ &lt;/span&gt;akmon slo verify run.json &lt;span class="nt"&gt;--thresholds&lt;/span&gt; .akmon/slo.toml
threshold breached: tool_success_rate &lt;span class="o"&gt;(&lt;/span&gt;0.93 &amp;lt; 0.95&lt;span class="o"&gt;)&lt;/span&gt;
&lt;span class="nv"&gt;$ &lt;/span&gt;&lt;span class="nb"&gt;echo&lt;/span&gt; &lt;span class="nv"&gt;$?&lt;/span&gt;
1
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;You can also pass a single threshold inline:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight shell"&gt;&lt;code&gt;&lt;span class="nv"&gt;$ &lt;/span&gt;akmon &lt;span class="nt"&gt;--output&lt;/span&gt; json slo verify run.json &lt;span class="nt"&gt;--min-tool-success-rate&lt;/span&gt; 0.95 | jq &lt;span class="s1"&gt;'.passed'&lt;/span&gt;
&lt;span class="nb"&gt;false&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;Strict mode treats skipped checks as failures. That is the right setting for CI gating.&lt;/p&gt;
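
&lt;p&gt;To see why strict mode matters, here is a Python sketch of threshold evaluation. It is not Akmon's implementation. It assumes threshold keys of the form &lt;code&gt;min_...&lt;/code&gt; and &lt;code&gt;max_...&lt;/code&gt;, as in the &lt;code&gt;.akmon/slo.toml&lt;/code&gt; example later in this article, and represents a skipped check as a metric that is missing or &lt;code&gt;None&lt;/code&gt;. &lt;code&gt;evaluate&lt;/code&gt; is a hypothetical name.&lt;/p&gt;

```python
def evaluate(metrics, thresholds, strict=False):
    """Return (passed, failures) for a set of min_/max_ thresholds.

    A metric that was not measured is skipped silently by default and
    counted as a failure in strict mode.
    """
    failures = []
    for key, limit in thresholds.items():
        bound, _, metric = key.partition("_")
        if bound not in ("min", "max") or metric == "":
            raise ValueError("unrecognized threshold key: " + key)
        value = metrics.get(metric)
        if value is None:
            if strict:
                failures.append(metric + ": skipped, treated as failure")
            continue
        if bound == "min" and limit > value:
            failures.append(f"{metric}: {value} is below minimum {limit}")
        elif bound == "max" and value > limit:
            failures.append(f"{metric}: {value} is above maximum {limit}")
    return (not failures, failures)
```

&lt;p&gt;Without strict mode, a run that never measured a metric passes that threshold by omission, which is exactly the gap a CI gate should not leave open.&lt;/p&gt;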

&lt;h2&gt;
  
  
  Bonus step. SLO trend
&lt;/h2&gt;

&lt;p&gt;Single runs are noisy. Trend mode compares the current run against a baseline window, so you catch regressions you would miss in a one-shot check.&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight shell"&gt;&lt;code&gt;&lt;span class="nv"&gt;$ &lt;/span&gt;akmon slo trend .akmon/evidence/2025-05-06_abcd.json &lt;span class="se"&gt;\&lt;/span&gt;
  &lt;span class="nt"&gt;--baseline-dir&lt;/span&gt; .akmon/evidence/history &lt;span class="se"&gt;\&lt;/span&gt;
  &lt;span class="nt"&gt;--window&lt;/span&gt; 20 &lt;span class="se"&gt;\&lt;/span&gt;
  &lt;span class="nt"&gt;--strict&lt;/span&gt;
regression: median tool latency increased from 142ms to 318ms over baseline
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;With JSON output, this is a clean alert you can post to Slack or PagerDuty. Plug it in once and forget it.&lt;/p&gt;
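
&lt;p&gt;The idea behind a baseline window fits in a few lines. This Python sketch is not how &lt;code&gt;akmon slo trend&lt;/code&gt; is implemented; it assumes a metric where higher is worse (such as latency), compares against the median of the most recent runs, and introduces a hypothetical &lt;code&gt;tolerance&lt;/code&gt; parameter that is not an Akmon flag.&lt;/p&gt;

```python
from statistics import median


def regressed(current, history, window=20, tolerance=0.10):
    """True when current exceeds the baseline median by more than tolerance.

    history is ordered oldest to newest; only the last `window` runs count.
    """
    baseline = list(history)[-window:]
    if not baseline:
        raise ValueError("no baseline runs to compare against")
    return current > median(baseline) * (1 + tolerance)
```

&lt;p&gt;Using the median rather than the mean keeps one slow outlier in the history from masking a real regression.&lt;/p&gt;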

&lt;h2&gt;
  
  
  Putting the pipeline in CI
&lt;/h2&gt;

&lt;p&gt;A practical GitHub Actions snippet:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight yaml"&gt;&lt;code&gt;&lt;span class="pi"&gt;-&lt;/span&gt; &lt;span class="na"&gt;name&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s"&gt;Run Akmon task headlessly&lt;/span&gt;
  &lt;span class="na"&gt;run&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="pi"&gt;|&lt;/span&gt;
    &lt;span class="s"&gt;akmon --yes --output json --task "$AKMON_TASK" | tee run.json&lt;/span&gt;
    &lt;span class="s"&gt;echo "session_id=$(jq -r '.session_id' run.json)" &amp;gt;&amp;gt; $GITHUB_OUTPUT&lt;/span&gt;

&lt;span class="pi"&gt;-&lt;/span&gt; &lt;span class="na"&gt;name&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s"&gt;Verify audit chain&lt;/span&gt;
  &lt;span class="na"&gt;run&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s"&gt;akmon audit verify .akmon/audit/${{ steps.run.outputs.session_id }}.jsonl&lt;/span&gt;

&lt;span class="pi"&gt;-&lt;/span&gt; &lt;span class="na"&gt;name&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s"&gt;Verify evidence&lt;/span&gt;
  &lt;span class="na"&gt;run&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s"&gt;akmon evidence verify .akmon/evidence/${{ steps.run.outputs.session_id }}.json&lt;/span&gt;

&lt;span class="pi"&gt;-&lt;/span&gt; &lt;span class="na"&gt;name&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s"&gt;Enforce SLO thresholds&lt;/span&gt;
  &lt;span class="na"&gt;run&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="pi"&gt;|&lt;/span&gt;
    &lt;span class="s"&gt;akmon slo verify .akmon/evidence/${{ steps.run.outputs.session_id }}.json \&lt;/span&gt;
      &lt;span class="s"&gt;--thresholds .akmon/slo.toml --strict&lt;/span&gt;

&lt;span class="pi"&gt;-&lt;/span&gt; &lt;span class="na"&gt;name&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s"&gt;Trend against baseline&lt;/span&gt;
  &lt;span class="na"&gt;run&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="pi"&gt;|&lt;/span&gt;
    &lt;span class="s"&gt;akmon slo trend .akmon/evidence/${{ steps.run.outputs.session_id }}.json \&lt;/span&gt;
      &lt;span class="s"&gt;--baseline-dir .akmon/evidence/history \&lt;/span&gt;
      &lt;span class="s"&gt;--window 20 \&lt;/span&gt;
      &lt;span class="s"&gt;--strict&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;If any step fails, the merge stops. If all four checks pass, the change has the artifacts behind it.&lt;/p&gt;

&lt;h2&gt;
  
  
  Picking the right SLO thresholds
&lt;/h2&gt;

&lt;p&gt;Two patterns work in practice.&lt;/p&gt;

&lt;p&gt;First, conservative thresholds for production policy profiles:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight toml"&gt;&lt;code&gt;&lt;span class="c"&gt;# .akmon/slo.toml&lt;/span&gt;
&lt;span class="py"&gt;min_tool_success_rate&lt;/span&gt; &lt;span class="p"&gt;=&lt;/span&gt; &lt;span class="mf"&gt;0.97&lt;/span&gt;
&lt;span class="py"&gt;max_replay_divergence_ratio&lt;/span&gt; &lt;span class="p"&gt;=&lt;/span&gt; &lt;span class="mf"&gt;0.0&lt;/span&gt;
&lt;span class="py"&gt;max_provider_retry_attempts&lt;/span&gt; &lt;span class="p"&gt;=&lt;/span&gt; &lt;span class="mi"&gt;3&lt;/span&gt;
&lt;span class="py"&gt;max_permission_denials&lt;/span&gt; &lt;span class="p"&gt;=&lt;/span&gt; &lt;span class="mi"&gt;0&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;Second, looser thresholds for exploratory work:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight toml"&gt;&lt;code&gt;&lt;span class="py"&gt;min_tool_success_rate&lt;/span&gt; &lt;span class="p"&gt;=&lt;/span&gt; &lt;span class="mf"&gt;0.85&lt;/span&gt;
&lt;span class="py"&gt;max_replay_divergence_ratio&lt;/span&gt; &lt;span class="p"&gt;=&lt;/span&gt; &lt;span class="mf"&gt;0.05&lt;/span&gt;
&lt;span class="py"&gt;max_provider_retry_attempts&lt;/span&gt; &lt;span class="p"&gt;=&lt;/span&gt; &lt;span class="mi"&gt;5&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;Keep the file in version control. When a threshold changes, the change is reviewable.&lt;/p&gt;
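
&lt;p&gt;If it helps to see what a threshold gate amounts to, here is a small Python sketch. It assumes the metric names in the evidence match the threshold names minus the &lt;code&gt;min_&lt;/code&gt; or &lt;code&gt;max_&lt;/code&gt; prefix, which is my convention for the example and not something the evidence schema promises. The &lt;code&gt;check_slo&lt;/code&gt; function and the sample metric values are hypothetical.&lt;/p&gt;

```python
def check_slo(metrics, thresholds):
    """Return a list of human-readable violations; empty means the gate passes.

    Keys prefixed min_ are lower bounds, keys prefixed max_ are upper bounds.
    Metric names are assumed to equal the threshold name minus that prefix.
    """
    violations = []
    for key, limit in thresholds.items():
        bound, _, name = key.partition("_")
        value = metrics[name]
        if bound == "min" and limit > value:
            violations.append(f"{name}={value} is below {key}={limit}")
        elif bound == "max" and value > limit:
            violations.append(f"{name}={value} is above {key}={limit}")
    return violations

thresholds = {  # same values as the production slo.toml above
    "min_tool_success_rate": 0.97,
    "max_replay_divergence_ratio": 0.0,
    "max_provider_retry_attempts": 3,
    "max_permission_denials": 0,
}
metrics = {  # hypothetical values extracted from one evidence file
    "tool_success_rate": 0.95,
    "replay_divergence_ratio": 0.0,
    "provider_retry_attempts": 4,
    "permission_denials": 0,
}
for line in check_slo(metrics, thresholds):
    print(line)
```

&lt;p&gt;In a CI job, a non-empty list would translate to a non-zero exit code, which is the behavior &lt;code&gt;--strict&lt;/code&gt; is there to give you.&lt;/p&gt;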

&lt;h2&gt;
  
  
  What this catches in real life
&lt;/h2&gt;

&lt;p&gt;A quick list of incidents the pipeline has caught for me or my testers:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;A flaky tool that was failing once in twenty calls. SLO trend caught it in a week.&lt;/li&gt;
&lt;li&gt;A model that started retrying on transient 5xx errors after a vendor change. Audit chain still valid, evidence valid, but SLO &lt;code&gt;max_provider_retry_attempts&lt;/code&gt; surfaced the new pattern.&lt;/li&gt;
&lt;li&gt;A reviewer accidentally edited the JSONL file (added a newline). &lt;code&gt;akmon audit verify&lt;/code&gt; failed loudly.&lt;/li&gt;
&lt;li&gt;A copy-paste of the evidence file from one machine to another, where the audit file did not come along. Evidence verify caught the missing linkage.&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;None of these are exotic. Each one is the kind of failure that quietly degrades trust in AI-assisted work over a quarter.&lt;/p&gt;
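
&lt;p&gt;The newline case is worth a closer look, because it shows why a hash chain is unforgiving. The sketch below is not Akmon's on-disk format. It assumes SHA-256 and a chain in which each record's hash covers the previous hash plus the raw bytes of the line; &lt;code&gt;GENESIS&lt;/code&gt;, &lt;code&gt;chain_hashes&lt;/code&gt;, and &lt;code&gt;verify&lt;/code&gt; are names I made up for the example.&lt;/p&gt;

```python
import hashlib

GENESIS = "0" * 64  # hypothetical starting value for the chain

def chain_hashes(lines):
    """Hash each raw line together with the previous hash."""
    prev, out = GENESIS, []
    for line in lines:
        prev = hashlib.sha256(prev.encode() + line).hexdigest()
        out.append(prev)
    return out

def verify(lines, expected):
    """Return the index of the first mismatching record, or None if intact."""
    actual = chain_hashes(lines)
    for i, (a, e) in enumerate(zip(actual, expected)):
        if a != e:
            return i
    if len(actual) != len(expected):
        return min(len(actual), len(expected))
    return None

original = [b'{"event":"SessionStart"}\n', b'{"event":"ToolCall"}\n']
recorded = chain_hashes(original)

tampered = [original[0], b'{"event":"ToolCall"}\n\n']  # one extra newline
print(verify(original, recorded))   # intact chain
print(verify(tampered, recorded))   # index of the first broken record
```

&lt;p&gt;Because every hash feeds the next one, a one-byte change invalidates that record and everything after it. That is what "failed loudly" looks like from the inside.&lt;/p&gt;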

&lt;h2&gt;
  
  
  Why this is different from a dashboard
&lt;/h2&gt;

&lt;p&gt;Three reasons.&lt;/p&gt;

&lt;p&gt;First, the pipeline runs in CI. Dashboards do not gate merges. Exit codes do.&lt;/p&gt;

&lt;p&gt;Second, the artifacts are portable. The verifier on your machine is the verifier on the auditor's machine.&lt;/p&gt;

&lt;p&gt;Third, the schema is fixed. Evidence verifies against a documented schema. SLO thresholds live in a TOML file. There is no dashboard to misconfigure between two clicks.&lt;/p&gt;

&lt;p&gt;If you want both (the visual story and the gating), keep your existing observability. Akmon's trust pipeline runs alongside it.&lt;/p&gt;

&lt;h2&gt;
  
  
  Where this fits in the bigger picture
&lt;/h2&gt;

&lt;p&gt;The trust pipeline is the first half of the answer. The second half is replay, which I cover in the next post in this series. Replay turns "the artifact verifies" into "we can re-execute the session and see if anything diverges". For now, the trust pipeline is enough to gate any AI-assisted change with confidence.&lt;/p&gt;

&lt;p&gt;The repo is at &lt;a href="https://github.com/radotsvetkov/akmon" rel="noopener noreferrer"&gt;github.com/radotsvetkov/akmon&lt;/a&gt;. The format spec is at &lt;a href="https://github.com/radotsvetkov/agef" rel="noopener noreferrer"&gt;github.com/radotsvetkov/agef&lt;/a&gt;. The site is at &lt;a href="https://radotsvetkov.github.io/akmon/" rel="noopener noreferrer"&gt;radotsvetkov.github.io/akmon&lt;/a&gt;.&lt;/p&gt;

&lt;p&gt;If your team wants the AI productivity but cannot give up review discipline, three commands are a fair price.&lt;/p&gt;

</description>
      <category>ai</category>
      <category>rust</category>
      <category>devops</category>
      <category>security</category>
    </item>
    <item>
      <title>MCP governance for an AI coding agent without breaking the audit chain</title>
      <dc:creator>Radoslav Tsvetkov</dc:creator>
      <pubDate>Thu, 14 May 2026 08:19:15 +0000</pubDate>
      <link>https://dev.to/radotsvetkov/mcp-governance-for-an-ai-coding-agent-without-breaking-the-audit-chain-okp</link>
      <guid>https://dev.to/radotsvetkov/mcp-governance-for-an-ai-coding-agent-without-breaking-the-audit-chain-okp</guid>
      <description>&lt;p&gt;The Model Context Protocol gave AI agents a clean way to reach into systems. In a year it has become the default tool surface for serious agents. That is mostly good news. The mostly is the operative word.&lt;/p&gt;

&lt;p&gt;Without care, MCP servers fragment the audit story. Tool calls land in three places: in the agent's runtime, in the gateway, and in the MCP server's own logs. None of them line up. By the time a regulator asks what happened, you have three formats and zero answers.&lt;/p&gt;

&lt;p&gt;This post walks through how I wire MCP into Akmon. The result is a single audit chain that captures every MCP tool call as a &lt;code&gt;ToolCall&lt;/code&gt; event, with deterministic linkage in the journal, and a reviewable AGEF bundle at the end.&lt;/p&gt;

&lt;p&gt;The commands and flags here are real Akmon v2.0.0 surface. I will note where Phase 4 features apply.&lt;/p&gt;

&lt;h2&gt;
  
  
  Why MCP audit logging is harder than it looks
&lt;/h2&gt;

&lt;p&gt;Three reasons.&lt;/p&gt;

&lt;p&gt;First, MCP is a fan-out. A single agent session can call three MCP servers, each with its own tools, its own version, and its own logging story. If you log per server, you have three formats. If you log only at the agent level, you lose tool-level detail.&lt;/p&gt;

&lt;p&gt;Second, tool inputs are user data. Tool arguments are not metadata; they are the conversation. Logs leak content unless you redact. AGEF makes content addressing explicit, so the bundle can carry the inputs in a way that survives review.&lt;/p&gt;

&lt;p&gt;Third, the chain matters. A flat list of tool calls does not let you say "this file change was caused by this tool call which was caused by that model response". You need the chain. Akmon's audit chain and AGEF events are designed for that.&lt;/p&gt;

&lt;p&gt;Solve those three and your audit is fast. Skip them and your audit is a forensic project.&lt;/p&gt;

&lt;h2&gt;
  
  
  Wiring MCP into Akmon
&lt;/h2&gt;

&lt;p&gt;Akmon registers MCP servers with the &lt;code&gt;--mcp-server&lt;/code&gt; flag, repeatable.&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight shell"&gt;&lt;code&gt;akmon &lt;span class="nt"&gt;--mcp-server&lt;/span&gt; https://mcp.tools.internal/orders &lt;span class="se"&gt;\&lt;/span&gt;
      &lt;span class="nt"&gt;--mcp-server&lt;/span&gt; https://mcp.tools.internal/calendar &lt;span class="se"&gt;\&lt;/span&gt;
      &lt;span class="nt"&gt;--task&lt;/span&gt; &lt;span class="s2"&gt;"Open a ticket for the failing tests and link the related order"&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;Every MCP tool the agent calls becomes a &lt;code&gt;ToolCall&lt;/code&gt; event in the audit chain. The event records &lt;code&gt;tool_id&lt;/code&gt;, &lt;code&gt;input_hash&lt;/code&gt;, &lt;code&gt;output_hash&lt;/code&gt;, and an optional &lt;code&gt;side_effects_hash&lt;/code&gt;. The hashes resolve to objects in the AGEF bundle, so the inputs and outputs are recoverable, content-addressed, and verifiable.&lt;/p&gt;

&lt;p&gt;For interactive runs, the same flag works:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight shell"&gt;&lt;code&gt;akmon chat &lt;span class="nt"&gt;--mcp-server&lt;/span&gt; https://mcp.tools.internal/orders
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;For headless CI, combine MCP servers with a budget cap and JSON output:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight shell"&gt;&lt;code&gt;akmon &lt;span class="nt"&gt;--yes&lt;/span&gt; &lt;span class="nt"&gt;--output&lt;/span&gt; json &lt;span class="se"&gt;\&lt;/span&gt;
      &lt;span class="nt"&gt;--mcp-server&lt;/span&gt; https://mcp.tools.internal/orders &lt;span class="se"&gt;\&lt;/span&gt;
      &lt;span class="nt"&gt;--mcp-server&lt;/span&gt; https://mcp.tools.internal/calendar &lt;span class="se"&gt;\&lt;/span&gt;
      &lt;span class="nt"&gt;--max-budget-usd&lt;/span&gt; 2.50 &lt;span class="se"&gt;\&lt;/span&gt;
      &lt;span class="nt"&gt;--task&lt;/span&gt; &lt;span class="s2"&gt;"summarize failing tests and create a ticket"&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;If your organization prefers a TOML manifest for MCP servers, encode them in a policy pack and load them from there. The pack belongs to the team. The merge order keeps it deterministic.&lt;/p&gt;

&lt;h2&gt;
  
  
  What gets recorded for each MCP tool call
&lt;/h2&gt;

&lt;p&gt;A &lt;code&gt;ToolCall&lt;/code&gt; event in AGEF v0.1 carries:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;
&lt;code&gt;tool_id&lt;/code&gt;, a stable identifier for the tool.&lt;/li&gt;
&lt;li&gt;
&lt;code&gt;input_hash&lt;/code&gt;, hash of the canonical CBOR encoding of the input.&lt;/li&gt;
&lt;li&gt;
&lt;code&gt;output_hash&lt;/code&gt;, hash of the output.&lt;/li&gt;
&lt;li&gt;
&lt;code&gt;side_effects_hash&lt;/code&gt;, optional, when the tool changed the world.&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;The full input and output bytes live in &lt;code&gt;objects/&amp;lt;hex&amp;gt;&lt;/code&gt; files. The hash links the event to the bytes. A reviewer can recover the exact input and output by walking the hash to the file.&lt;/p&gt;
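
&lt;p&gt;Walking a hash to a file can be sketched in a few lines of Python. Treat this as an illustration of content addressing, not of the AGEF encoding: it uses sorted-key JSON as a stand-in for canonical CBOR and assumes SHA-256 as the hash. The helper names &lt;code&gt;put_object&lt;/code&gt; and &lt;code&gt;resolve&lt;/code&gt; and the sample &lt;code&gt;order_id&lt;/code&gt; are hypothetical.&lt;/p&gt;

```python
import hashlib
import json
import tempfile
from pathlib import Path

def canonical_bytes(value):
    # Stand-in for canonical CBOR: sorted keys, no insignificant whitespace.
    return json.dumps(value, sort_keys=True, separators=(",", ":")).encode()

def put_object(root, value):
    """Store value under objects/HEX and return the hex digest."""
    data = canonical_bytes(value)
    digest = hashlib.sha256(data).hexdigest()
    path = Path(root) / "objects" / digest
    path.parent.mkdir(parents=True, exist_ok=True)
    path.write_bytes(data)
    return digest

def resolve(root, digest):
    """Load an object and refuse it if the bytes do not match the hash."""
    data = (Path(root) / "objects" / digest).read_bytes()
    if hashlib.sha256(data).hexdigest() != digest:
        raise ValueError(f"object {digest} failed its integrity check")
    return json.loads(data)

with tempfile.TemporaryDirectory() as root:
    tool_input = {"order_id": "A-1001"}   # hypothetical tool arguments
    event = {
        "tool_id": "mcp.orders/lookup_order",
        "input_hash": put_object(root, tool_input),
    }
    print(resolve(root, event["input_hash"]))
```

&lt;p&gt;The event stays small and fixed in shape, while the bytes live next to it. The reviewer does not have to trust the file name, because &lt;code&gt;resolve&lt;/code&gt; re-hashes the content before returning it.&lt;/p&gt;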

&lt;p&gt;A small example, viewed through &lt;code&gt;akmon inspect&lt;/code&gt;:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight shell"&gt;&lt;code&gt;&lt;span class="nv"&gt;$ &lt;/span&gt;akmon inspect &amp;lt;session-id&amp;gt; &lt;span class="nt"&gt;--resolve&lt;/span&gt;
SessionStart  &lt;span class="nv"&gt;cwd&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="nb"&gt;hash&lt;/span&gt;:8a91...
UserTurn      &lt;span class="nv"&gt;prompt&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="nb"&gt;hash&lt;/span&gt;:7a91...
ProviderCall  &lt;span class="nv"&gt;provider&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;anthropic &lt;span class="nv"&gt;attempts&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;1 &lt;span class="nv"&gt;status&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;Success
AssistantTurn &lt;span class="nv"&gt;message&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="nb"&gt;hash&lt;/span&gt;:b3c1...
ToolCall      &lt;span class="nv"&gt;tool&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;mcp.orders/lookup_order &lt;span class="nv"&gt;input&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="nb"&gt;hash&lt;/span&gt;:9c1f... &lt;span class="nv"&gt;output&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="nb"&gt;hash&lt;/span&gt;:c2a8...
PermissionGate &lt;span class="nv"&gt;policy&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;write_safe &lt;span class="nv"&gt;decision&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;allowed
ToolCall      &lt;span class="nv"&gt;tool&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;mcp.calendar/create_event &lt;span class="nv"&gt;input&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="nb"&gt;hash&lt;/span&gt;:e7d4... &lt;span class="nv"&gt;output&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="nb"&gt;hash&lt;/span&gt;:f1a3...
AssistantTurn &lt;span class="nv"&gt;message&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="nb"&gt;hash&lt;/span&gt;:5b62...
SessionEnd    &lt;span class="nv"&gt;summary&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="nb"&gt;hash&lt;/span&gt;:0a13...
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;This is what an MCP-rich session looks like in evidence form. Every event has a hash linkage. Every input and output resolves to a file in &lt;code&gt;objects/&lt;/code&gt;.&lt;/p&gt;

&lt;h2&gt;
  
  
  Policy at the MCP boundary
&lt;/h2&gt;

&lt;p&gt;Policy in Akmon is a deterministic merge of profile, packs, project policy, and CLI override. For MCP, that means three useful levers.&lt;/p&gt;

&lt;p&gt;First, restrict the tool surface in the profile. &lt;code&gt;prod&lt;/code&gt; already has explicit-deny posture for side effects. You can layer named MCP tools on top.&lt;/p&gt;

&lt;p&gt;Second, write a pack that allows specific MCP servers and named tools.&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight toml"&gt;&lt;code&gt;&lt;span class="c"&gt;# .akmon/policy-packs/org-mcp.toml&lt;/span&gt;
&lt;span class="nn"&gt;[mcp]&lt;/span&gt;
&lt;span class="py"&gt;allowed_servers&lt;/span&gt; &lt;span class="p"&gt;=&lt;/span&gt; &lt;span class="p"&gt;[&lt;/span&gt;
  &lt;span class="s"&gt;"https://mcp.tools.internal/orders"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
  &lt;span class="s"&gt;"https://mcp.tools.internal/calendar"&lt;/span&gt;
&lt;span class="p"&gt;]&lt;/span&gt;

&lt;span class="nn"&gt;[mcp.tools]&lt;/span&gt;
&lt;span class="py"&gt;"mcp.orders/lookup_order"&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt; &lt;span class="py"&gt;allowed&lt;/span&gt; &lt;span class="p"&gt;=&lt;/span&gt; &lt;span class="kc"&gt;true&lt;/span&gt; &lt;span class="p"&gt;}&lt;/span&gt;
&lt;span class="py"&gt;"mcp.orders/refund"&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt; &lt;span class="py"&gt;allowed&lt;/span&gt; &lt;span class="p"&gt;=&lt;/span&gt; &lt;span class="kc"&gt;true&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="py"&gt;requires_approval&lt;/span&gt; &lt;span class="p"&gt;=&lt;/span&gt; &lt;span class="kc"&gt;true&lt;/span&gt; &lt;span class="p"&gt;}&lt;/span&gt;
&lt;span class="py"&gt;"mcp.calendar/create_event"&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt; &lt;span class="py"&gt;allowed&lt;/span&gt; &lt;span class="p"&gt;=&lt;/span&gt; &lt;span class="kc"&gt;true&lt;/span&gt; &lt;span class="p"&gt;}&lt;/span&gt;
&lt;span class="py"&gt;"mcp.calendar/delete_event"&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt; &lt;span class="py"&gt;allowed&lt;/span&gt; &lt;span class="p"&gt;=&lt;/span&gt; &lt;span class="kc"&gt;false&lt;/span&gt; &lt;span class="p"&gt;}&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;Third, see what is actually applied:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight shell"&gt;&lt;code&gt;akmon policy show-effective &lt;span class="se"&gt;\&lt;/span&gt;
  &lt;span class="nt"&gt;--profile&lt;/span&gt; prod &lt;span class="se"&gt;\&lt;/span&gt;
  &lt;span class="nt"&gt;--policy-pack&lt;/span&gt; .akmon/policy-packs/org-mcp.toml
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;Effective policy is printable. That ends the "I think the rule is" conversation.&lt;/p&gt;
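
&lt;p&gt;For intuition about what a deterministic merge means, here is a minimal Python sketch. It assumes the layers fold in the order listed earlier (profile, packs, project policy, CLI override), that later layers win, and that nested tables merge key by key while lists are replaced whole. Those rules are my assumptions for the example; &lt;code&gt;akmon policy show-effective&lt;/code&gt; is the authority on what Akmon actually does.&lt;/p&gt;

```python
def merge(base, override):
    """Recursively merge two policy tables; values in override win."""
    result = dict(base)
    for key, value in override.items():
        if isinstance(value, dict) and isinstance(result.get(key), dict):
            result[key] = merge(result[key], value)
        else:
            result[key] = value
    return result

def effective_policy(*layers):
    """Fold layers left to right: profile, packs, project policy, CLI override."""
    policy = {}
    for layer in layers:
        policy = merge(policy, layer)
    return policy

# Hypothetical layers, shaped like the pack shown above.
profile = {"mcp": {"tools": {"mcp.orders/refund": {"allowed": False}}}}
pack = {"mcp": {
    "allowed_servers": ["https://mcp.tools.internal/orders"],
    "tools": {"mcp.orders/refund": {"allowed": True, "requires_approval": True}},
}}
cli = {"mcp": {"tools": {"mcp.calendar/delete_event": {"allowed": False}}}}

print(effective_policy(profile, pack, cli)["mcp"]["tools"])
```

&lt;p&gt;The same inputs in the same order always produce the same table, which is the property that makes the printed policy worth reviewing.&lt;/p&gt;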

&lt;h2&gt;
  
  
  Redacting MCP-related content before sharing
&lt;/h2&gt;

&lt;p&gt;Some MCP tool inputs or outputs cannot leave the building. Akmon's &lt;code&gt;redact&lt;/code&gt; command produces a derivative bundle in which selected objects are replaced by canonical CBOR sentinels.&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight shell"&gt;&lt;code&gt;akmon redact &amp;lt;session-id&amp;gt; &lt;span class="se"&gt;\&lt;/span&gt;
  &lt;span class="nt"&gt;--output&lt;/span&gt; sanitized.akmon &lt;span class="se"&gt;\&lt;/span&gt;
  &lt;span class="nt"&gt;--object&lt;/span&gt; &amp;lt;input-hash&amp;gt; &lt;span class="se"&gt;\&lt;/span&gt;
  &lt;span class="nt"&gt;--object&lt;/span&gt; &amp;lt;output-hash&amp;gt; &lt;span class="se"&gt;\&lt;/span&gt;
  &lt;span class="nt"&gt;--reason&lt;/span&gt; &lt;span class="s2"&gt;"Removed customer name before audit handoff"&lt;/span&gt; &lt;span class="se"&gt;\&lt;/span&gt;
  &lt;span class="nt"&gt;--format&lt;/span&gt; json
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;The chain still verifies because the sentinel hashes correctly. The redaction reason is recorded. The bundle can be shared with auditors without exposing the redacted content.&lt;/p&gt;

&lt;p&gt;Verify before sharing:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight shell"&gt;&lt;code&gt;akmon bundle import sanitized.akmon &lt;span class="nt"&gt;--verify-only&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;h2&gt;
  
  
  Writing an MCP server that plays well with Akmon
&lt;/h2&gt;

&lt;p&gt;If you maintain an MCP server, three small choices make audit life easier downstream.&lt;/p&gt;

&lt;p&gt;First, make tool inputs and outputs canonicalizable. Avoid embedding wall-clock times in fields the agent does not control. Avoid random IDs in the response when the response could be deterministic.&lt;/p&gt;

&lt;p&gt;Second, expose a stable tool ID and a stable version. The &lt;code&gt;tool_id&lt;/code&gt; in AGEF is whatever the runtime emits. Namespacing your tools (&lt;code&gt;mcp.orders/lookup_order&lt;/code&gt;) keeps the bundle readable a year from now.&lt;/p&gt;

&lt;p&gt;Third, surface side effects explicitly. If a tool has a side effect, return enough metadata for &lt;code&gt;side_effects_hash&lt;/code&gt; to make sense (for example, the new resource ID, the affected entity, the change type).&lt;/p&gt;

&lt;p&gt;These are small DX choices that compound. They do not require any spec change.&lt;/p&gt;
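
&lt;p&gt;The first choice is the easiest to demonstrate. The Python sketch below hashes a tool response after a canonical encoding. It uses sorted-key JSON as a stand-in for canonical CBOR and assumes SHA-256; the &lt;code&gt;content_hash&lt;/code&gt; helper and the response fields, including &lt;code&gt;generated_at&lt;/code&gt;, are hypothetical.&lt;/p&gt;

```python
import hashlib
import json

def content_hash(payload):
    # Sorted-key JSON stands in for canonical CBOR in this sketch.
    data = json.dumps(payload, sort_keys=True, separators=(",", ":")).encode()
    return hashlib.sha256(data).hexdigest()

stable_a = {"order_id": "A-1001", "status": "shipped"}
stable_b = {"status": "shipped", "order_id": "A-1001"}   # same fields, new order
noisy_1 = {**stable_a, "generated_at": "2026-05-14T08:19:15Z"}
noisy_2 = {**stable_a, "generated_at": "2026-05-14T08:19:16Z"}

print(content_hash(stable_a) == content_hash(stable_b))   # key order does not matter
print(content_hash(noisy_1) == content_hash(noisy_2))     # the timestamp does
```

&lt;p&gt;Canonical encoding absorbs differences in key order. It cannot absorb a field whose value changes on every call, so a server-side timestamp makes two otherwise identical responses hash differently.&lt;/p&gt;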

&lt;h2&gt;
  
  
  Replay across MCP servers
&lt;/h2&gt;

&lt;p&gt;&lt;code&gt;akmon replay&lt;/code&gt; runs a session against the recorded providers and tools. For MCP-heavy sessions, replay surfaces divergence when an MCP server changed behavior in a way you did not intend.&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight shell"&gt;&lt;code&gt;akmon replay &amp;lt;session-id&amp;gt; &lt;span class="nt"&gt;--mode&lt;/span&gt; strict &lt;span class="nt"&gt;--format&lt;/span&gt; json
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;In strict mode, replay treats mismatches more aggressively. That is the right mode for CI gating after a deployment that touched an MCP server. If the replay diverges, you know about it before the next session.&lt;/p&gt;

&lt;h2&gt;
  
  
  A practical migration plan
&lt;/h2&gt;

&lt;p&gt;If you are adding MCP to an existing Akmon setup:&lt;/p&gt;

&lt;ol&gt;
&lt;li&gt;Start with one MCP server, registered via &lt;code&gt;--mcp-server&lt;/code&gt;. Run a few sessions interactively. Inspect the journal.&lt;/li&gt;
&lt;li&gt;Lock the allowed list in a pack. Run &lt;code&gt;akmon policy show-effective&lt;/code&gt; to confirm.&lt;/li&gt;
&lt;li&gt;Add a CI job that runs the trust pipeline (&lt;code&gt;audit verify&lt;/code&gt;, &lt;code&gt;evidence verify&lt;/code&gt;, &lt;code&gt;slo verify&lt;/code&gt;) on every session.&lt;/li&gt;
&lt;li&gt;Add &lt;code&gt;slo trend&lt;/code&gt; against a baseline window of recent sessions. Alert on regression.&lt;/li&gt;
&lt;li&gt;When you are ready to share with an external reviewer, redact and export with &lt;code&gt;bundle export&lt;/code&gt; (Phase 4).&lt;/li&gt;
&lt;/ol&gt;

&lt;p&gt;A short progression: weeks, not months.&lt;/p&gt;

&lt;h2&gt;
  
  
  Where this leaves you
&lt;/h2&gt;

&lt;p&gt;After the migration, three things are true.&lt;/p&gt;

&lt;p&gt;First, every MCP tool call is in the audit chain. The chain verifies. The evidence verifies.&lt;/p&gt;

&lt;p&gt;Second, the policy is explicit and inspectable. Adding a new MCP server is a pack change reviewed in version control.&lt;/p&gt;

&lt;p&gt;Third, you can hand any session to a reviewer with a single bundle. The reviewer needs the AGEF format, not your stack.&lt;/p&gt;

&lt;p&gt;The repo is at &lt;a href="https://github.com/radotsvetkov/akmon" rel="noopener noreferrer"&gt;github.com/radotsvetkov/akmon&lt;/a&gt;. The format spec is at &lt;a href="https://github.com/radotsvetkov/agef" rel="noopener noreferrer"&gt;github.com/radotsvetkov/agef&lt;/a&gt;. The site is at &lt;a href="https://radotsvetkov.github.io/akmon/" rel="noopener noreferrer"&gt;radotsvetkov.github.io/akmon&lt;/a&gt;. The next article in the series goes deep on replay and divergence detection, which is where MCP-heavy sessions earn their keep most often.&lt;/p&gt;

</description>
      <category>mcp</category>
      <category>ai</category>
      <category>rust</category>
      <category>security</category>
    </item>
    <item>
      <title>Replay AI coding sessions deterministically: the divergence detector for your repo</title>
      <dc:creator>Radoslav Tsvetkov</dc:creator>
      <pubDate>Thu, 14 May 2026 08:18:57 +0000</pubDate>
      <link>https://dev.to/radotsvetkov/replay-ai-coding-sessions-deterministically-the-divergence-detector-for-your-repo-19lf</link>
      <guid>https://dev.to/radotsvetkov/replay-ai-coding-sessions-deterministically-the-divergence-detector-for-your-repo-19lf</guid>
      <description>&lt;p&gt;There is a moment in every AI coding workflow where you wish you could roll the tape back. The agent did something on Tuesday. By Thursday the model has shifted. The tool surface has shifted. You cannot reproduce the issue, and you cannot prove the change was sound.&lt;/p&gt;

&lt;p&gt;Replay is the answer. Akmon's &lt;code&gt;akmon replay&lt;/code&gt; command takes a recorded session and re-executes it against the providers and tools that were recorded with it. The output is a pass or fail. In strict mode, it is a CI gate. In default mode, it is a divergence report.&lt;/p&gt;

&lt;p&gt;This article walks through the actual command, the modes, the exit codes, and the failure patterns I have caught with it. Everything here is real Akmon v2.0.0 surface.&lt;/p&gt;

&lt;h2&gt;
  
  
  Why replay matters more than logs
&lt;/h2&gt;

&lt;p&gt;A log file tells you what happened. Replay tells you whether the same session would happen again. The two are not the same.&lt;/p&gt;

&lt;p&gt;If a session passes replay, the artifact is meaningful, even months later. If it diverges, the diff is the bug report. Either way you know more than you did from the log.&lt;/p&gt;

&lt;p&gt;Three concrete jobs replay does well.&lt;/p&gt;

&lt;ol&gt;
&lt;li&gt;Detect regressions in your tool surface (a tool change that breaks an old session you already shipped).&lt;/li&gt;
&lt;li&gt;Detect regressions in your provider behavior (a model upgrade that changes the agent's plan).&lt;/li&gt;
&lt;li&gt;Validate that a redacted bundle is still a valid session.&lt;/li&gt;
&lt;/ol&gt;

&lt;p&gt;In a regulated codebase, the first two are the difference between trust and a quarterly fight.&lt;/p&gt;

&lt;h2&gt;
  
  
  The command
&lt;/h2&gt;

&lt;p&gt;The replay command is small.&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight shell"&gt;&lt;code&gt;akmon replay &amp;lt;session-id&amp;gt; &lt;span class="o"&gt;[&lt;/span&gt;OPTIONS]
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;Common options:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight shell"&gt;&lt;code&gt;akmon replay &amp;lt;session-id&amp;gt; &lt;span class="se"&gt;\&lt;/span&gt;
  &lt;span class="nt"&gt;--journal&lt;/span&gt; &amp;lt;path&amp;gt; &lt;span class="se"&gt;\&lt;/span&gt;
  &lt;span class="nt"&gt;--mode&lt;/span&gt; &amp;lt;default|strict&amp;gt; &lt;span class="se"&gt;\&lt;/span&gt;
  &lt;span class="nt"&gt;--persist&lt;/span&gt; &lt;span class="nt"&gt;--persist-to&lt;/span&gt; &amp;lt;path&amp;gt; &lt;span class="se"&gt;\&lt;/span&gt;
  &lt;span class="nt"&gt;--format&lt;/span&gt; &amp;lt;human|json&amp;gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;A first run, in default mode:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight shell"&gt;&lt;code&gt;&lt;span class="nv"&gt;$ &lt;/span&gt;akmon replay 550e8400-e29b-41d4-a716-446655440000
replay completed: &lt;span class="nv"&gt;passed&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="nb"&gt;true&lt;/span&gt; &lt;span class="o"&gt;(&lt;/span&gt;events: 47&lt;span class="o"&gt;)&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;In strict mode for tighter mismatch handling:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight shell"&gt;&lt;code&gt;&lt;span class="nv"&gt;$ &lt;/span&gt;akmon replay 550e8400-e29b-41d4-a716-446655440000 &lt;span class="nt"&gt;--mode&lt;/span&gt; strict
replay completed: &lt;span class="nv"&gt;passed&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="nb"&gt;true&lt;/span&gt; &lt;span class="o"&gt;(&lt;/span&gt;strict&lt;span class="o"&gt;)&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;In CI with JSON:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight shell"&gt;&lt;code&gt;&lt;span class="nv"&gt;$ &lt;/span&gt;akmon replay 550e8400-e29b-41d4-a716-446655440000 &lt;span class="nt"&gt;--format&lt;/span&gt; json | jq &lt;span class="s1"&gt;'.passed'&lt;/span&gt;
&lt;span class="nb"&gt;true&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;h2&gt;
  
  
  The exit code map
&lt;/h2&gt;

&lt;div class="table-wrapper-paragraph"&gt;&lt;table&gt;
&lt;thead&gt;
&lt;tr&gt;
&lt;th&gt;Code&lt;/th&gt;
&lt;th&gt;Meaning&lt;/th&gt;
&lt;/tr&gt;
&lt;/thead&gt;
&lt;tbody&gt;
&lt;tr&gt;
&lt;td&gt;&lt;code&gt;0&lt;/code&gt;&lt;/td&gt;
&lt;td&gt;Replay completed with no divergences (&lt;code&gt;passed: true&lt;/code&gt;)&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;&lt;code&gt;1&lt;/code&gt;&lt;/td&gt;
&lt;td&gt;Replay completed with divergences (&lt;code&gt;passed: false&lt;/code&gt;)&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;&lt;code&gt;2&lt;/code&gt;&lt;/td&gt;
&lt;td&gt;Usage error (invalid arguments or invalid flag combinations)&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;&lt;code&gt;3&lt;/code&gt;&lt;/td&gt;
&lt;td&gt;I/O or environment error (missing source session, malformed source, unwritable persist target)&lt;/td&gt;
&lt;/tr&gt;
&lt;/tbody&gt;
&lt;/table&gt;&lt;/div&gt;

&lt;p&gt;For CI gating, &lt;code&gt;0&lt;/code&gt; is success, &lt;code&gt;1&lt;/code&gt; is a hard failure, and &lt;code&gt;2&lt;/code&gt; and &lt;code&gt;3&lt;/code&gt; are operator errors that should not be ignored.&lt;/p&gt;
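
&lt;p&gt;If your CI wrapper is written in Python, the table maps to a small lookup. This is a sketch: the outcome labels and the &lt;code&gt;classify&lt;/code&gt; and &lt;code&gt;replay_gate&lt;/code&gt; helpers are my own names, and only the exit codes and the &lt;code&gt;--mode strict --format json&lt;/code&gt; flags come from the command above.&lt;/p&gt;

```python
import subprocess

# Exit codes from the table above, mapped to hypothetical CI outcome labels.
OUTCOMES = {
    0: "pass",
    1: "divergence",
    2: "usage-error",
    3: "environment-error",
}

def classify(exit_code):
    """Translate an akmon replay exit code into a CI outcome label."""
    return OUTCOMES.get(exit_code, "unknown")

def replay_gate(session_id):
    """Run a strict replay and return the outcome label (needs akmon on PATH)."""
    proc = subprocess.run(
        ["akmon", "replay", session_id, "--mode", "strict", "--format", "json"],
        capture_output=True,
        text=True,
    )
    return classify(proc.returncode)

for code in (0, 1, 2, 3, 42):
    print(code, classify(code))
```

&lt;p&gt;Keeping &lt;code&gt;divergence&lt;/code&gt; separate from the two error labels lets the job report "the session changed" differently from "the job is misconfigured".&lt;/p&gt;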

&lt;h2&gt;
  
  
  Persisting a replay
&lt;/h2&gt;

&lt;p&gt;If you want to keep the replay output for later review (good practice when investigating an incident), persist it to a target journal.&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight shell"&gt;&lt;code&gt;akmon replay 550e8400-e29b-41d4-a716-446655440000 &lt;span class="se"&gt;\&lt;/span&gt;
  &lt;span class="nt"&gt;--persist&lt;/span&gt; &lt;span class="se"&gt;\&lt;/span&gt;
  &lt;span class="nt"&gt;--persist-to&lt;/span&gt; ./replay-journal
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;The persisted output is itself a journal. It carries its own evidence and is itself replayable. That sounds recursive. It is. It is also exactly what you want for a forensic record.&lt;/p&gt;

&lt;h2&gt;
  
  
  Patterns that catch real bugs
&lt;/h2&gt;

&lt;p&gt;A few patterns from my own work and from testers.&lt;/p&gt;

&lt;h3&gt;
  
  
  Pattern 1. Run replay nightly against last week's sessions
&lt;/h3&gt;

&lt;p&gt;If your team runs Akmon in CI for a week, you have a sample of real sessions. A nightly job that replays them in strict mode is the cheapest regression detector you can build.&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight shell"&gt;&lt;code&gt;&lt;span class="k"&gt;for &lt;/span&gt;s &lt;span class="k"&gt;in&lt;/span&gt; &lt;span class="si"&gt;$(&lt;/span&gt;&lt;span class="nb"&gt;ls&lt;/span&gt; .akmon/audit/ | &lt;span class="nb"&gt;tail&lt;/span&gt; &lt;span class="nt"&gt;-n&lt;/span&gt; 50&lt;span class="si"&gt;)&lt;/span&gt;&lt;span class="p"&gt;;&lt;/span&gt; &lt;span class="k"&gt;do
  &lt;/span&gt;&lt;span class="nv"&gt;sid&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="si"&gt;$(&lt;/span&gt;&lt;span class="nb"&gt;basename&lt;/span&gt; &lt;span class="s2"&gt;"&lt;/span&gt;&lt;span class="nv"&gt;$s&lt;/span&gt;&lt;span class="s2"&gt;"&lt;/span&gt; .jsonl&lt;span class="si"&gt;)&lt;/span&gt;
  akmon replay &lt;span class="s2"&gt;"&lt;/span&gt;&lt;span class="nv"&gt;$sid&lt;/span&gt;&lt;span class="s2"&gt;"&lt;/span&gt; &lt;span class="nt"&gt;--mode&lt;/span&gt; strict &lt;span class="nt"&gt;--format&lt;/span&gt; json | jq &lt;span class="nt"&gt;--arg&lt;/span&gt; sid &lt;span class="s2"&gt;"&lt;/span&gt;&lt;span class="nv"&gt;$sid&lt;/span&gt;&lt;span class="s2"&gt;"&lt;/span&gt; &lt;span class="s1"&gt;'.passed | tostring + " " + $sid'&lt;/span&gt;
&lt;span class="k"&gt;done&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;If a tool change broke an old session, you know.&lt;/p&gt;

&lt;h3&gt;
  
  
  Pattern 2. Run replay on PRs that touch tools
&lt;/h3&gt;

&lt;p&gt;Tools are how AI agents reach the world. A PR that touches a tool wrapper, an MCP server, or a CI runner is the most common place to break replay. Wire a job that replays a small set of canonical sessions whenever those paths change.&lt;/p&gt;

&lt;h3&gt;
  
  
  Pattern 3. Replay a redacted bundle
&lt;/h3&gt;

&lt;p&gt;After running &lt;code&gt;akmon redact&lt;/code&gt; to produce a sanitized derivative bundle, replay confirms the bundle is still a valid session.&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight shell"&gt;&lt;code&gt;akmon redact &amp;lt;session-id&amp;gt; &lt;span class="se"&gt;\&lt;/span&gt;
  &lt;span class="nt"&gt;--output&lt;/span&gt; sanitized.akmon &lt;span class="se"&gt;\&lt;/span&gt;
  &lt;span class="nt"&gt;--object&lt;/span&gt; &amp;lt;object-hash&amp;gt; &lt;span class="se"&gt;\&lt;/span&gt;
  &lt;span class="nt"&gt;--reason&lt;/span&gt; &lt;span class="s2"&gt;"PII removal"&lt;/span&gt;

akmon bundle import sanitized.akmon &lt;span class="nt"&gt;--verify-only&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;Verify-only checks integrity. For a fuller round trip, replay the redacted session inside a sandbox.&lt;/p&gt;

&lt;h3&gt;
  
  
  Pattern 4. Replay across providers
&lt;/h3&gt;

&lt;p&gt;If you switch model providers (say, from a cloud provider to Ollama for sensitive work), replay tells you whether the new provider produces an equivalent session for the same prompt and tools. This is not a guarantee of behavior, but it is the closest thing in practice.&lt;/p&gt;

&lt;h2&gt;
  
  
  What replay cannot do
&lt;/h2&gt;

&lt;p&gt;Three honest disclaimers.&lt;/p&gt;

&lt;p&gt;First, replay does not reach across the network for fresh model calls by default. The point is to compare against recorded behavior. If you re-execute against a live model, that is a different test, and it should be labeled differently in your CI.&lt;/p&gt;

&lt;p&gt;Second, divergence in default mode is informational. The session might still be useful even if a tool returned a slightly different output. Strict mode is the right setting when the test is "is this session reproducible".&lt;/p&gt;

&lt;p&gt;Third, replay does not inspect the meaning of the output. It checks structural and field-level equivalence. Semantics are still your job.&lt;/p&gt;

&lt;h2&gt;
  
  
  Reading a divergence report
&lt;/h2&gt;

&lt;p&gt;When replay fails, the JSON output is the place to start.&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight shell"&gt;&lt;code&gt;&lt;span class="nv"&gt;$ &lt;/span&gt;akmon replay &amp;lt;session-id&amp;gt; &lt;span class="nt"&gt;--mode&lt;/span&gt; strict &lt;span class="nt"&gt;--format&lt;/span&gt; json | jq .
&lt;span class="o"&gt;{&lt;/span&gt;
  &lt;span class="s2"&gt;"session_id"&lt;/span&gt;: &lt;span class="s2"&gt;"550e8400-..."&lt;/span&gt;,
  &lt;span class="s2"&gt;"passed"&lt;/span&gt;: &lt;span class="nb"&gt;false&lt;/span&gt;,
  &lt;span class="s2"&gt;"mode"&lt;/span&gt;: &lt;span class="s2"&gt;"strict"&lt;/span&gt;,
  &lt;span class="s2"&gt;"divergences"&lt;/span&gt;: &lt;span class="o"&gt;[&lt;/span&gt;
    &lt;span class="o"&gt;{&lt;/span&gt;
      &lt;span class="s2"&gt;"event_index"&lt;/span&gt;: 14,
      &lt;span class="s2"&gt;"event_kind"&lt;/span&gt;: &lt;span class="s2"&gt;"ToolCall"&lt;/span&gt;,
      &lt;span class="s2"&gt;"tool_id"&lt;/span&gt;: &lt;span class="s2"&gt;"mcp.orders/lookup_order"&lt;/span&gt;,
      &lt;span class="s2"&gt;"field"&lt;/span&gt;: &lt;span class="s2"&gt;"output_hash"&lt;/span&gt;,
      &lt;span class="s2"&gt;"expected"&lt;/span&gt;: &lt;span class="s2"&gt;"9c1f..."&lt;/span&gt;,
      &lt;span class="s2"&gt;"actual"&lt;/span&gt;: &lt;span class="s2"&gt;"a72b..."&lt;/span&gt;,
      &lt;span class="s2"&gt;"note"&lt;/span&gt;: &lt;span class="s2"&gt;"side effects hash differs"&lt;/span&gt;
    &lt;span class="o"&gt;}&lt;/span&gt;
  &lt;span class="o"&gt;]&lt;/span&gt;
&lt;span class="o"&gt;}&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;The structure points you straight at the change. From there, the workflow is the usual one: inspect the source session, inspect the live behavior, decide whether the change was intentional, and either fix the bug or update the recorded baseline.&lt;/p&gt;
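
&lt;p&gt;For longer reports it can help to flatten the divergences into one line each before reading them. The sketch below assumes only the report shape shown above (&lt;code&gt;divergences&lt;/code&gt; entries with &lt;code&gt;event_index&lt;/code&gt;, &lt;code&gt;event_kind&lt;/code&gt;, &lt;code&gt;tool_id&lt;/code&gt;, &lt;code&gt;field&lt;/code&gt;, &lt;code&gt;expected&lt;/code&gt;, and &lt;code&gt;actual&lt;/code&gt;). The &lt;code&gt;summarize&lt;/code&gt; helper is a hypothetical name, not part of the Akmon CLI.&lt;/p&gt;

```python
import json

# Sample report copied from the shape shown above; values are illustrative.
SAMPLE_REPORT = """
{
  "session_id": "550e8400-...",
  "passed": false,
  "mode": "strict",
  "divergences": [
    {
      "event_index": 14,
      "event_kind": "ToolCall",
      "tool_id": "mcp.orders/lookup_order",
      "field": "output_hash",
      "expected": "9c1f...",
      "actual": "a72b...",
      "note": "side effects hash differs"
    }
  ]
}
"""

LINE_TEMPLATE = (
    "event {event_index} ({event_kind}) {tool_id}: "
    "{field} expected {expected}, got {actual}"
)


def summarize(report):
    """Return one readable line per divergence, ordered by event index."""
    rows = sorted(report.get("divergences", []), key=lambda d: d["event_index"])
    # Extra keys such as "note" are ignored by str.format.
    return [LINE_TEMPLATE.format(**row) for row in rows]


if __name__ == "__main__":
    for line in summarize(json.loads(SAMPLE_REPORT)):
        print(line)
```

&lt;p&gt;Sorting by &lt;code&gt;event_index&lt;/code&gt; puts the earliest divergence first, which is usually the one worth investigating before any that follow it.&lt;/p&gt;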

&lt;h2&gt;
  
  
  Putting replay in CI
&lt;/h2&gt;

&lt;p&gt;A small GitHub Actions snippet:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight yaml"&gt;&lt;code&gt;&lt;span class="pi"&gt;-&lt;/span&gt; &lt;span class="na"&gt;name&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s"&gt;Replay session in strict mode&lt;/span&gt;
  &lt;span class="na"&gt;run&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="pi"&gt;|&lt;/span&gt;
    &lt;span class="s"&gt;akmon replay ${{ inputs.session_id }} --mode strict --format json | tee replay.json&lt;/span&gt;
    &lt;span class="s"&gt;test "$(jq -r '.passed' replay.json)" = "true"&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;Pair this with the trust pipeline (&lt;code&gt;audit verify&lt;/code&gt;, &lt;code&gt;evidence verify&lt;/code&gt;, &lt;code&gt;slo verify&lt;/code&gt;) and you have a very small set of commands that gate every AI-assisted change in your repo.&lt;/p&gt;

&lt;h2&gt;
  
  
  Storage and retention
&lt;/h2&gt;

&lt;p&gt;Sessions are content-addressed. Identical content hashes to the same address, so it is stored only once. A typical session lands in a few hundred kilobytes of audit JSONL plus a small evidence JSON. Bundles are larger because they carry the resolved objects, but compression and content addressing keep them under a megabyte for most engineering tasks.&lt;/p&gt;
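
&lt;p&gt;To make the deduplication concrete, here is a toy content-addressed store in Python. It is a sketch of the general technique, not Akmon's storage layer: the &lt;code&gt;ObjectStore&lt;/code&gt; class is hypothetical, and SHA-256 is an assumption chosen for illustration.&lt;/p&gt;

```python
import hashlib


class ObjectStore:
    """Toy in-memory content-addressed store: the key is the hash of the bytes."""

    def __init__(self):
        self._objects = {}

    def put(self, data):
        # SHA-256 is an assumption for illustration only.
        key = hashlib.sha256(data).hexdigest()
        # setdefault keeps the first copy; a repeat write is a no-op.
        self._objects.setdefault(key, data)
        return key

    def get(self, key):
        return self._objects[key]

    def __len__(self):
        return len(self._objects)


if __name__ == "__main__":
    store = ObjectStore()
    first = store.put(b"tool output")
    second = store.put(b"tool output")
    # Same bytes give the same key, and only one object is kept.
    print(first == second, len(store))
```

&lt;p&gt;Because the key is derived from the bytes, any change to the stored content changes the key, which is also what makes this layout useful for integrity checks.&lt;/p&gt;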

&lt;p&gt;For retention, follow the longest of (your contractual minimum, your regulatory minimum, your incident review window). Keep the audit JSONL with the same lifecycle as your CI artifacts.&lt;/p&gt;

&lt;h2&gt;
  
  
  Where replay fits in the bigger story
&lt;/h2&gt;

&lt;p&gt;The trust pipeline gives you crisp signals from a single session. Replay gives you crisp signals across time. Together they turn an AI coding agent into a system you can defend in front of a reviewer.&lt;/p&gt;

&lt;p&gt;The repo is at &lt;a href="https://github.com/radotsvetkov/akmon" rel="noopener noreferrer"&gt;github.com/radotsvetkov/akmon&lt;/a&gt;. The format spec is at &lt;a href="https://github.com/radotsvetkov/agef" rel="noopener noreferrer"&gt;github.com/radotsvetkov/agef&lt;/a&gt;. The site is at &lt;a href="https://radotsvetkov.github.io/akmon/" rel="noopener noreferrer"&gt;radotsvetkov.github.io/akmon&lt;/a&gt;.&lt;/p&gt;

&lt;p&gt;The next article in this series is the redaction workflow, the part of the kit you reach for the day before an external review.&lt;/p&gt;

</description>
      <category>ai</category>
      <category>rust</category>
      <category>devops</category>
      <category>debugging</category>
    </item>
    <item>
      <title>A pragmatic threat model for AI coding agents, with controls you can ship today</title>
      <dc:creator>Radoslav Tsvetkov</dc:creator>
      <pubDate>Thu, 14 May 2026 08:18:44 +0000</pubDate>
      <link>https://dev.to/radotsvetkov/a-pragmatic-threat-model-for-ai-coding-agents-with-controls-you-can-ship-today-3f58</link>
      <guid>https://dev.to/radotsvetkov/a-pragmatic-threat-model-for-ai-coding-agents-with-controls-you-can-ship-today-3f58</guid>
      <description>&lt;p&gt;There is a moment in every AI coding rollout where the question shifts from "can we make this work" to "what is the worst thing this can do". If you have not had that moment yet, this article will save you a quarter.&lt;/p&gt;

&lt;p&gt;The OWASP Top 10 for Agentic Applications, published in late 2025, is the cleanest shared vocabulary we have for the failure modes. It is short, opinionated, and useful. This post takes each item, names the failure pattern in plain language, and pairs it with a control you can ship around an AI coding agent today.&lt;/p&gt;

&lt;p&gt;The configuration shown uses Akmon's policy profiles, packs, and CLI flags. The pattern is general; if you use a different tool, the lessons translate.&lt;/p&gt;

&lt;h2&gt;
  
  
  How to read each section
&lt;/h2&gt;

&lt;p&gt;For each item:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;
&lt;strong&gt;What it is&lt;/strong&gt;, in one paragraph.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;The failure story&lt;/strong&gt;, the kind of incident this prevents.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;The control&lt;/strong&gt;, the actual lever, with code or commands.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;The trade off&lt;/strong&gt;, the thing the control costs you.&lt;/li&gt;
&lt;/ul&gt;

&lt;h2&gt;
  
  
  1. Prompt injection in tool inputs
&lt;/h2&gt;

&lt;p&gt;&lt;strong&gt;What it is.&lt;/strong&gt; A tool returns text. The text contains a hidden instruction. The agent reads the text and the next decision is reshaped.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;The failure story.&lt;/strong&gt; A scout dossier reads a third-party README. The README has a hidden instruction. The agent later writes a config that the README told it to write.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;The control.&lt;/strong&gt; Use the &lt;code&gt;prod&lt;/code&gt; profile in production paths. Restrict the tool surface in a pack. Constrain &lt;code&gt;web_fetch&lt;/code&gt; to allowed hosts only.&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight toml"&gt;&lt;code&gt;&lt;span class="c"&gt;# .akmon/policy-packs/web.toml&lt;/span&gt;
&lt;span class="nn"&gt;[network]&lt;/span&gt;
&lt;span class="py"&gt;web_fetch_allowed_hosts&lt;/span&gt; &lt;span class="p"&gt;=&lt;/span&gt; &lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="s"&gt;"docs.example.com"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="s"&gt;"api.internal"&lt;/span&gt;&lt;span class="p"&gt;]&lt;/span&gt;
&lt;span class="py"&gt;web_fetch_require_https&lt;/span&gt; &lt;span class="p"&gt;=&lt;/span&gt; &lt;span class="kc"&gt;true&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;





&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight shell"&gt;&lt;code&gt;akmon &lt;span class="nt"&gt;--policy-profile&lt;/span&gt; prod &lt;span class="nt"&gt;--policy-pack&lt;/span&gt; .akmon/policy-packs/web.toml &lt;span class="se"&gt;\&lt;/span&gt;
      &lt;span class="nt"&gt;--task&lt;/span&gt; &lt;span class="s2"&gt;"summarize the API spec at docs.example.com/spec.html"&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;&lt;strong&gt;The trade off.&lt;/strong&gt; Some legitimate fetches will fail until you add the host. That is a feature.&lt;/p&gt;
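
&lt;p&gt;To show what a host allowlist with an HTTPS requirement has to check, here is a minimal sketch in Python. The host values mirror the pack above; the &lt;code&gt;fetch_allowed&lt;/code&gt; function is a hypothetical illustration of the rule, not Akmon's enforcement code.&lt;/p&gt;

```python
from urllib.parse import urlsplit

# Values mirror the pack above; the checking logic itself is an illustration.
ALLOWED_HOSTS = {"docs.example.com", "api.internal"}
REQUIRE_HTTPS = True


def fetch_allowed(url, allowed_hosts=ALLOWED_HOSTS, require_https=REQUIRE_HTTPS):
    """Return True only if the URL's scheme and exact hostname pass the policy."""
    parts = urlsplit(url)
    if require_https and parts.scheme != "https":
        return False
    # Exact match on the parsed hostname, so lookalike hosts such as
    # "docs.example.com.evil.test" are rejected.
    return parts.hostname in allowed_hosts


if __name__ == "__main__":
    print(fetch_allowed("https://docs.example.com/spec.html"))
    print(fetch_allowed("http://docs.example.com/spec.html"))
    print(fetch_allowed("https://docs.example.com.evil.test/"))
```

&lt;p&gt;The design choice worth copying is to compare the parsed hostname exactly rather than searching the URL string, since substring checks are easy to bypass with userinfo or lookalike domains.&lt;/p&gt;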

&lt;h2&gt;
  
  
  2. Excessive agency
&lt;/h2&gt;

&lt;p&gt;&lt;strong&gt;What it is.&lt;/strong&gt; The agent has access to tools it does not need for the task. Breadth becomes surface area.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;The failure story.&lt;/strong&gt; A documentation task has access to a shell tool that runs migrations. The model invents a migration command on a misread.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;The control.&lt;/strong&gt; Profile-driven tool surface. Use &lt;code&gt;--plan&lt;/code&gt; for read-only scoping before a real run. Add &lt;code&gt;--add-dir&lt;/code&gt; to lock the sandbox.&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight shell"&gt;&lt;code&gt;akmon &lt;span class="nt"&gt;--policy-profile&lt;/span&gt; prod &lt;span class="nt"&gt;--plan&lt;/span&gt; &lt;span class="se"&gt;\&lt;/span&gt;
      &lt;span class="nt"&gt;--task&lt;/span&gt; &lt;span class="s2"&gt;"list outdated dependencies and propose updates"&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;&lt;strong&gt;The trade off.&lt;/strong&gt; Two-step workflow. Plan first, implement second. Worth it.&lt;/p&gt;

&lt;h2&gt;
  
  
  3. Sensitive information disclosure
&lt;/h2&gt;

&lt;p&gt;&lt;strong&gt;What it is.&lt;/strong&gt; Sensitive data ends up in the model context, the logs, or the agent output.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;The failure story.&lt;/strong&gt; A test fixture has a real customer record. The agent surfaces the record in a comment in a generated PR.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;The control.&lt;/strong&gt; Redact specific objects from the session before sharing. Use Ollama for sensitive paths so the prompt never leaves the machine.&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight shell"&gt;&lt;code&gt;akmon redact &amp;lt;session-id&amp;gt; &lt;span class="se"&gt;\&lt;/span&gt;
  &lt;span class="nt"&gt;--output&lt;/span&gt; sanitized.akmon &lt;span class="se"&gt;\&lt;/span&gt;
  &lt;span class="nt"&gt;--object&lt;/span&gt; &amp;lt;object-hash&amp;gt; &lt;span class="se"&gt;\&lt;/span&gt;
  &lt;span class="nt"&gt;--reason&lt;/span&gt; &lt;span class="s2"&gt;"Customer record removed before audit handoff"&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;&lt;strong&gt;The trade off.&lt;/strong&gt; Redaction adds friction at handoff. The friction is the point.&lt;/p&gt;

&lt;h2&gt;
  
  
  4. Improper output handling
&lt;/h2&gt;

&lt;p&gt;&lt;strong&gt;What it is.&lt;/strong&gt; The agent's output is rendered or executed somewhere it should not be.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;The failure story.&lt;/strong&gt; The agent writes a Markdown reply that includes a fake confirmation block. A downstream automation parses the block as a structured action.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;The control.&lt;/strong&gt; Force structured output where it matters. Use &lt;code&gt;--output json&lt;/code&gt; in headless flows so the response is machine-parseable, and validate against your own schema downstream.&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight shell"&gt;&lt;code&gt;akmon &lt;span class="nt"&gt;--yes&lt;/span&gt; &lt;span class="nt"&gt;--output&lt;/span&gt; json &lt;span class="nt"&gt;--task&lt;/span&gt; &lt;span class="s2"&gt;"&lt;/span&gt;&lt;span class="nv"&gt;$task&lt;/span&gt;&lt;span class="s2"&gt;"&lt;/span&gt; | jq &lt;span class="s1"&gt;'.summary'&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;&lt;strong&gt;The trade off.&lt;/strong&gt; Free-form prose has its place. For action-triggering paths, structured output is non-negotiable.&lt;/p&gt;
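
&lt;p&gt;The downstream validation step can be small. The sketch below assumes only that the JSON output is an object with a &lt;code&gt;summary&lt;/code&gt; string, as the &lt;code&gt;jq&lt;/code&gt; filter above implies. The &lt;code&gt;parse_agent_output&lt;/code&gt; helper is hypothetical, and a real schema would likely check more fields.&lt;/p&gt;

```python
import json


def parse_agent_output(raw):
    """Parse agent JSON output and reject anything outside the expected shape."""
    # json.loads raises json.JSONDecodeError (a ValueError subclass) on non-JSON text.
    data = json.loads(raw)
    if not isinstance(data, dict):
        raise ValueError("expected a JSON object")
    summary = data.get("summary")
    if not isinstance(summary, str) or not summary.strip():
        raise ValueError("missing or empty 'summary' field")
    return data


if __name__ == "__main__":
    print(parse_agent_output('{"summary": "updated two dependencies"}')["summary"])
    try:
        parse_agent_output("CONFIRMED: deploy approved")
    except ValueError as err:
        print("rejected:", type(err).__name__)
```

&lt;p&gt;Failing closed on anything that does not parse is the point: a prose reply that merely looks like a confirmation never reaches the automation.&lt;/p&gt;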

&lt;h2&gt;
  
  
  5. Supply chain weaknesses
&lt;/h2&gt;

&lt;p&gt;&lt;strong&gt;What it is.&lt;/strong&gt; A dependency the agent uses changes in a way that affects behavior.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;The failure story.&lt;/strong&gt; An MCP server you use upgraded a tool. The output shape changed. The agent silently misroutes.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;The control.&lt;/strong&gt; Pin model and tool versions in &lt;code&gt;AKMON.md&lt;/code&gt; and in policy packs. Run &lt;code&gt;akmon replay&lt;/code&gt; in strict mode for a small set of canonical sessions on every PR that touches a tool wrapper.&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight shell"&gt;&lt;code&gt;akmon replay &amp;lt;baseline-session-id&amp;gt; &lt;span class="nt"&gt;--mode&lt;/span&gt; strict &lt;span class="nt"&gt;--format&lt;/span&gt; json | jq &lt;span class="s1"&gt;'.passed'&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;&lt;strong&gt;The trade off.&lt;/strong&gt; A small set of canonical sessions has to be maintained. Treat them as part of your test suite.&lt;/p&gt;

&lt;h2&gt;
  
  
  6. Insecure plugin or tool design
&lt;/h2&gt;

&lt;p&gt;&lt;strong&gt;What it is.&lt;/strong&gt; A tool was designed without least privilege in mind.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;The failure story.&lt;/strong&gt; A generic &lt;code&gt;http.fetch&lt;/code&gt; tool can hit any URL, including internal addresses.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;The control.&lt;/strong&gt; Restrict &lt;code&gt;web_fetch&lt;/code&gt; to an allowlist of public hosts and require HTTPS. Use &lt;code&gt;--add-dir&lt;/code&gt; to lock filesystem reads to the project root. Avoid generic shell tools in production profiles.&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight shell"&gt;&lt;code&gt;akmon &lt;span class="nt"&gt;--policy-profile&lt;/span&gt; prod &lt;span class="se"&gt;\&lt;/span&gt;
      &lt;span class="nt"&gt;--add-dir&lt;/span&gt; ./src &lt;span class="nt"&gt;--add-dir&lt;/span&gt; ./docs &lt;span class="se"&gt;\&lt;/span&gt;
      &lt;span class="nt"&gt;--task&lt;/span&gt; &lt;span class="s2"&gt;"patch the parser to accept ISO 8601 with offset"&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;&lt;strong&gt;The trade off.&lt;/strong&gt; Some legitimate workflows need broader access. Use &lt;code&gt;staging&lt;/code&gt; for those, not &lt;code&gt;prod&lt;/code&gt;.&lt;/p&gt;

&lt;h2&gt;
  
  
  7. Excessive resource consumption
&lt;/h2&gt;

&lt;p&gt;&lt;strong&gt;What it is.&lt;/strong&gt; The agent loops, retries, or expands recursively. Tokens, dollars, and tool calls climb without a ceiling.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;The failure story.&lt;/strong&gt; A planning prompt recurses. Over a long evening it racks up provider charges.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;The control.&lt;/strong&gt; Use &lt;code&gt;--max-budget-usd&lt;/code&gt; for headless runs. Use &lt;code&gt;--fallback-model&lt;/code&gt; for graceful degradation. Use &lt;code&gt;slo verify&lt;/code&gt; to alarm on retry attempts.&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight shell"&gt;&lt;code&gt;akmon &lt;span class="nt"&gt;--yes&lt;/span&gt; &lt;span class="nt"&gt;--max-budget-usd&lt;/span&gt; 2.50 &lt;span class="nt"&gt;--fallback-model&lt;/span&gt; &lt;span class="s2"&gt;"ollama:qwen-coder-7b"&lt;/span&gt; &lt;span class="se"&gt;\&lt;/span&gt;
      &lt;span class="nt"&gt;--task&lt;/span&gt; &lt;span class="s2"&gt;"..."&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;





&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight shell"&gt;&lt;code&gt;akmon slo verify .akmon/evidence/&amp;lt;session&amp;gt;.json &lt;span class="nt"&gt;--strict&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;&lt;strong&gt;The trade off.&lt;/strong&gt; Some legitimate runs will hit the budget. Calibrate per task class. Alarm and review.&lt;/p&gt;
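
&lt;p&gt;Conceptually, a hard budget is a running total that refuses the next charge once the ceiling would be crossed. The sketch below illustrates that idea in Python. &lt;code&gt;BudgetGuard&lt;/code&gt; and &lt;code&gt;BudgetExceeded&lt;/code&gt; are hypothetical names, and this is not how Akmon implements &lt;code&gt;--max-budget-usd&lt;/code&gt;.&lt;/p&gt;

```python
class BudgetExceeded(RuntimeError):
    """Raised when a charge would push spending past the ceiling."""


class BudgetGuard:
    """Hypothetical running-total guard for per-run spending."""

    def __init__(self, max_usd):
        self.max_usd = max_usd
        self.spent_usd = 0.0

    def charge(self, cost_usd):
        # Check before recording, so a rejected call is never counted.
        if self.spent_usd + cost_usd > self.max_usd:
            raise BudgetExceeded(
                "charge of {:.2f} would exceed budget of {:.2f}".format(
                    cost_usd, self.max_usd
                )
            )
        self.spent_usd += cost_usd


if __name__ == "__main__":
    guard = BudgetGuard(2.50)
    try:
        for _ in range(10):  # simulate a loop that keeps calling the model
            guard.charge(1.0)
    except BudgetExceeded as err:
        print("stopped:", err)
    print("spent:", guard.spent_usd)
```

&lt;p&gt;Checking before the call, rather than after, is what turns the budget into a ceiling instead of a report.&lt;/p&gt;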

&lt;h2&gt;
  
  
  8. Vector and embedding weaknesses
&lt;/h2&gt;

&lt;p&gt;&lt;strong&gt;What it is.&lt;/strong&gt; Retrieval introduces content from an index. If the index can be poisoned, your prompts are poisoned.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;The failure story.&lt;/strong&gt; A staging dataset got merged into the production index. An old test record contained a prompt injection. The production agent surfaced it on a real query.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;The control.&lt;/strong&gt; Provenance on every index entry. Use the spec workflow (&lt;code&gt;akmon spec&lt;/code&gt;) to gate retrieval-heavy changes through a planning step. Treat &lt;code&gt;RetrievalCall&lt;/code&gt; events as the audit trail.&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight shell"&gt;&lt;code&gt;akmon &lt;span class="nt"&gt;--index&lt;/span&gt; &lt;span class="nt"&gt;--policy-profile&lt;/span&gt; prod &lt;span class="se"&gt;\&lt;/span&gt;
      &lt;span class="nt"&gt;--task&lt;/span&gt; &lt;span class="s2"&gt;"ground the implementation in the design doc"&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;&lt;strong&gt;The trade off.&lt;/strong&gt; Provenance metadata is harder to retrofit than to design in. Start with the most sensitive index.&lt;/p&gt;

&lt;h2&gt;
  
  
  9. Misinformation and overreliance
&lt;/h2&gt;

&lt;p&gt;&lt;strong&gt;What it is.&lt;/strong&gt; The agent claims things confidently that are not true. The user trusts the agent.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;The failure story.&lt;/strong&gt; The agent invents a function in a library. The reviewer trusts it. CI catches it; the team's calibration drops.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;The control.&lt;/strong&gt; Require structured outputs for fact-bearing tasks. Use &lt;code&gt;--architect&lt;/code&gt; for two-phase plan plus implementation, where the planner uses a stronger model. Layer human review for any change that touches public APIs.&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight shell"&gt;&lt;code&gt;akmon &lt;span class="nt"&gt;--architect&lt;/span&gt; &lt;span class="nt"&gt;--planner-model&lt;/span&gt; &lt;span class="s2"&gt;"anthropic:claude-sonnet-4-6"&lt;/span&gt; &lt;span class="se"&gt;\&lt;/span&gt;
      &lt;span class="nt"&gt;--task&lt;/span&gt; &lt;span class="s2"&gt;"design and implement the JWT rotation flow"&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;&lt;strong&gt;The trade off.&lt;/strong&gt; Two-phase costs more tokens. The reviewer can read the plan first, which is its own win.&lt;/p&gt;

&lt;h2&gt;
  
  
  10. Unbounded consumption of context
&lt;/h2&gt;

&lt;p&gt;&lt;strong&gt;What it is.&lt;/strong&gt; Context grows over a long session. Old, irrelevant content shapes new decisions.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;The failure story.&lt;/strong&gt; A multi-hour session keeps adding context until the model truncates from the middle. Behavior shifts in ways nobody can explain.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;The control.&lt;/strong&gt; Use the spec workflow to break work into discrete sessions. Use &lt;code&gt;--continue&lt;/code&gt; and &lt;code&gt;--session&lt;/code&gt; deliberately, not by habit. Inspect the session journal periodically.&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight shell"&gt;&lt;code&gt;akmon spec parser-iso8601-offset &lt;span class="s2"&gt;"Accept ISO 8601 timestamps with timezone offsets"&lt;/span&gt;
akmon spec parser-iso8601-offset design
akmon spec parser-iso8601-offset tasks
akmon spec parser-iso8601-offset implement
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;&lt;strong&gt;The trade off.&lt;/strong&gt; Less ambient context. The win is reproducibility.&lt;/p&gt;

&lt;h2&gt;
  
  
  Putting it together
&lt;/h2&gt;

&lt;p&gt;If you implemented all ten controls, you would have a system with:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;A small, well-known tool surface, profile-driven.&lt;/li&gt;
&lt;li&gt;Structured output where it matters.&lt;/li&gt;
&lt;li&gt;Redaction available before any external handoff.&lt;/li&gt;
&lt;li&gt;Hard caps on resource use.&lt;/li&gt;
&lt;li&gt;Provenance on retrieval, with &lt;code&gt;RetrievalCall&lt;/code&gt; events as evidence.&lt;/li&gt;
&lt;li&gt;Replay-based regression detection in CI.&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;Most teams will not need all ten on day one. Pick the three that match your top risks. Get them in production. Watch them work. Add the rest as they earn their place.&lt;/p&gt;

&lt;h2&gt;
  
  
  Configuration to put in a pack
&lt;/h2&gt;

&lt;p&gt;A small pack to start with, ready to drop in &lt;code&gt;.akmon/policy-packs/&lt;/code&gt;:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight toml"&gt;&lt;code&gt;&lt;span class="c"&gt;# .akmon/policy-packs/baseline.toml&lt;/span&gt;
&lt;span class="nn"&gt;[network]&lt;/span&gt;
&lt;span class="py"&gt;web_fetch_allowed_hosts&lt;/span&gt; &lt;span class="p"&gt;=&lt;/span&gt; &lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="s"&gt;"docs.example.com"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="s"&gt;"api.internal"&lt;/span&gt;&lt;span class="p"&gt;]&lt;/span&gt;
&lt;span class="py"&gt;web_fetch_require_https&lt;/span&gt; &lt;span class="p"&gt;=&lt;/span&gt; &lt;span class="kc"&gt;true&lt;/span&gt;

&lt;span class="nn"&gt;[shell]&lt;/span&gt;
&lt;span class="py"&gt;allowed_commands&lt;/span&gt; &lt;span class="p"&gt;=&lt;/span&gt; &lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="s"&gt;"cargo"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="s"&gt;"npm"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="s"&gt;"go"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="s"&gt;"make"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="s"&gt;"git"&lt;/span&gt;&lt;span class="p"&gt;]&lt;/span&gt;
&lt;span class="py"&gt;deny_commands&lt;/span&gt; &lt;span class="p"&gt;=&lt;/span&gt; &lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="s"&gt;"rm -rf"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="s"&gt;"sudo"&lt;/span&gt;&lt;span class="p"&gt;]&lt;/span&gt;

&lt;span class="nn"&gt;[mcp]&lt;/span&gt;
&lt;span class="py"&gt;allowed_servers&lt;/span&gt; &lt;span class="p"&gt;=&lt;/span&gt; &lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="s"&gt;"https://mcp.tools.internal/orders"&lt;/span&gt;&lt;span class="p"&gt;]&lt;/span&gt;

&lt;span class="nn"&gt;[tools]&lt;/span&gt;
&lt;span class="py"&gt;web_fetch_default&lt;/span&gt; &lt;span class="p"&gt;=&lt;/span&gt; &lt;span class="s"&gt;"deny"&lt;/span&gt;
&lt;span class="py"&gt;shell_default&lt;/span&gt; &lt;span class="p"&gt;=&lt;/span&gt; &lt;span class="s"&gt;"ask"&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;Then run:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight shell"&gt;&lt;code&gt;akmon &lt;span class="nt"&gt;--policy-profile&lt;/span&gt; prod &lt;span class="se"&gt;\&lt;/span&gt;
      &lt;span class="nt"&gt;--policy-pack&lt;/span&gt; .akmon/policy-packs/baseline.toml &lt;span class="se"&gt;\&lt;/span&gt;
      &lt;span class="nt"&gt;--task&lt;/span&gt; &lt;span class="s2"&gt;"..."&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;Inspect the merged effective policy:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight shell"&gt;&lt;code&gt;akmon policy show-effective &lt;span class="se"&gt;\&lt;/span&gt;
  &lt;span class="nt"&gt;--profile&lt;/span&gt; prod &lt;span class="se"&gt;\&lt;/span&gt;
  &lt;span class="nt"&gt;--policy-pack&lt;/span&gt; .akmon/policy-packs/baseline.toml
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;h2&gt;
  
  
  The honest part
&lt;/h2&gt;

&lt;p&gt;Controls do not make AI coding agents safe. They make AI coding agents survivable. There will still be model failures, tool bugs, and edge cases nobody saw. The job of the policy and evidence layer is to keep the consequences small and the explanations available.&lt;/p&gt;

&lt;p&gt;If you want to dig deeper into the evidence side of the loop, the next post in this series breaks down the redaction workflow. The repo is at &lt;a href="https://github.com/radotsvetkov/akmon" rel="noopener noreferrer"&gt;github.com/radotsvetkov/akmon&lt;/a&gt;. The format is at &lt;a href="https://github.com/radotsvetkov/agef" rel="noopener noreferrer"&gt;github.com/radotsvetkov/agef&lt;/a&gt;. The site is at &lt;a href="https://radotsvetkov.github.io/akmon/" rel="noopener noreferrer"&gt;radotsvetkov.github.io/akmon&lt;/a&gt;.&lt;/p&gt;

</description>
      <category>ai</category>
      <category>rust</category>
      <category>security</category>
      <category>owasp</category>
    </item>
    <item>
      <title>AI coding compliance for 2026: a working checklist for ISO 42001, the EU AI Act, SOC 2, and tool qualification</title>
      <dc:creator>Radoslav Tsvetkov</dc:creator>
      <pubDate>Thu, 14 May 2026 08:18:30 +0000</pubDate>
      <link>https://dev.to/radotsvetkov/ai-coding-compliance-for-2026-a-working-checklist-for-iso-42001-the-eu-ai-act-soc-2-and-tool-2i3</link>
      <guid>https://dev.to/radotsvetkov/ai-coding-compliance-for-2026-a-working-checklist-for-iso-42001-the-eu-ai-act-soc-2-and-tool-2i3</guid>
      <description>&lt;p&gt;If you ship AI-assisted code in 2026, three things have changed under your feet.&lt;/p&gt;

&lt;p&gt;In December 2025, OWASP published the Top 10 for Agentic Applications. In April 2026, Microsoft released the Agent Governance Toolkit. In August 2026, the EU AI Act high-risk obligations take effect. ISO 42001 has become the AI management system standard auditors expect. NIST AI RMF is the framework most US agencies and primes will reference. The Colorado AI Act starts enforcement in June 2026. Tool qualification frameworks (DO-178C and DO-330 for avionics, IEC 62304 for medical devices, ISO 26262 for automotive, CMMC for defense) treat AI tooling with the same scrutiny they applied to legacy code generators.&lt;/p&gt;

&lt;p&gt;That is a lot of paper. The good news is that most of it points at the same operational pattern. You need to know what your AI did, you need to enforce policy at the tool surface, you need evidence you can hand to a third party, and you need a retention story.&lt;/p&gt;

&lt;p&gt;This post is a working checklist that maps each of those frameworks to actual Akmon commands. Map to your own controls as needed.&lt;/p&gt;

&lt;h2&gt;
  
  
  The shape of the work
&lt;/h2&gt;

&lt;p&gt;Almost every AI compliance program asks for five things, in different language.&lt;/p&gt;

&lt;ol&gt;
&lt;li&gt;A documented inventory of AI systems, with risk classifications and owners.&lt;/li&gt;
&lt;li&gt;A policy framework, ideally enforced at runtime, not only documented.&lt;/li&gt;
&lt;li&gt;An audit trail of agent activity, with the integrity to be admissible.&lt;/li&gt;
&lt;li&gt;A retention story for that audit trail.&lt;/li&gt;
&lt;li&gt;A way to extract evidence on demand, in a format a non-engineer can read.&lt;/li&gt;
&lt;/ol&gt;

&lt;p&gt;If you build those five, you are most of the way to compliance with most frameworks. The frameworks layer on specific controls that fit your risk profile.&lt;/p&gt;

&lt;h2&gt;
  
  
  ISO 42001, the management system standard
&lt;/h2&gt;

&lt;p&gt;ISO 42001 is a management system standard, like ISO 27001 but for AI. It does not tell you how to build the agent. It tells you how to govern the work.&lt;/p&gt;

&lt;p&gt;The relevant controls for AI coding agents:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;
&lt;strong&gt;A.5&lt;/strong&gt; AI policies. Document who owns what.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;A.6&lt;/strong&gt; Internal organization. Roles and responsibilities for AI.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;A.7&lt;/strong&gt; Operational planning and control. Including evidence of operations.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;A.8&lt;/strong&gt; Performance evaluation. Continuous monitoring and review.&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;The translation to engineering work, with the Akmon command that produces the evidence:&lt;/p&gt;

&lt;div class="table-wrapper-paragraph"&gt;&lt;table&gt;
&lt;thead&gt;
&lt;tr&gt;
&lt;th&gt;Control&lt;/th&gt;
&lt;th&gt;What you actually do&lt;/th&gt;
&lt;th&gt;Akmon command&lt;/th&gt;
&lt;/tr&gt;
&lt;/thead&gt;
&lt;tbody&gt;
&lt;tr&gt;
&lt;td&gt;A.5&lt;/td&gt;
&lt;td&gt;Maintain an AI inventory with risk classification&lt;/td&gt;
&lt;td&gt;An internal doc, ideally rendered from the agents that emit AGEF&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;A.6&lt;/td&gt;
&lt;td&gt;Assign owners for each agent&lt;/td&gt;
&lt;td&gt;A field in the project's &lt;code&gt;AKMON.md&lt;/code&gt; plus your CMDB&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;A.7&lt;/td&gt;
&lt;td&gt;Run policy at the tool boundary, log every call&lt;/td&gt;
&lt;td&gt;
&lt;code&gt;akmon audit verify&lt;/code&gt;, &lt;code&gt;akmon evidence verify&lt;/code&gt;
&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;A.8&lt;/td&gt;
&lt;td&gt;Review evidence regularly, close the loop&lt;/td&gt;
&lt;td&gt;
&lt;code&gt;akmon slo verify&lt;/code&gt;, &lt;code&gt;akmon slo trend&lt;/code&gt;
&lt;/td&gt;
&lt;/tr&gt;
&lt;/tbody&gt;
&lt;/table&gt;&lt;/div&gt;

&lt;p&gt;The trust pipeline already does most of A.7 and A.8. The rest is documentation.&lt;/p&gt;

&lt;h2&gt;
  
  
  EU AI Act Article 12, the recording requirement
&lt;/h2&gt;

&lt;p&gt;For high-risk AI systems, the EU AI Act requires automatic recording of events that are relevant to the operation of the system. This is not a logging best practice. It is an obligation.&lt;/p&gt;

&lt;p&gt;The recording must:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Capture events automatically.&lt;/li&gt;
&lt;li&gt;Cover the operating life of the system.&lt;/li&gt;
&lt;li&gt;Be of sufficient detail to investigate incidents.&lt;/li&gt;
&lt;li&gt;Be retained for a period appropriate to the system's purpose.&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;The translation to engineering work:&lt;/p&gt;

&lt;div class="table-wrapper-paragraph"&gt;&lt;table&gt;
&lt;thead&gt;
&lt;tr&gt;
&lt;th&gt;Requirement&lt;/th&gt;
&lt;th&gt;What you actually do&lt;/th&gt;
&lt;th&gt;Akmon evidence&lt;/th&gt;
&lt;/tr&gt;
&lt;/thead&gt;
&lt;tbody&gt;
&lt;tr&gt;
&lt;td&gt;Automatic recording&lt;/td&gt;
&lt;td&gt;Record on every tool call, every model call, every policy decision&lt;/td&gt;
&lt;td&gt;&lt;code&gt;.akmon/audit/&amp;lt;session&amp;gt;.jsonl&lt;/code&gt;&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Coverage&lt;/td&gt;
&lt;td&gt;Make sure every session produces a record&lt;/td&gt;
&lt;td&gt;Index by session ID, alarm on gaps&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Sufficient detail&lt;/td&gt;
&lt;td&gt;Inputs, outputs, decisions, parent and child IDs&lt;/td&gt;
&lt;td&gt;AGEF event kinds, content-addressed objects&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Retention&lt;/td&gt;
&lt;td&gt;Store records for the contractual or legal minimum&lt;/td&gt;
&lt;td&gt;Lifecycle policy on your storage&lt;/td&gt;
&lt;/tr&gt;
&lt;/tbody&gt;
&lt;/table&gt;&lt;/div&gt;

&lt;p&gt;Article 12 does not say "use AGEF". It says automatic, sufficient, retained. AGEF covers the first two structurally; retention is the lifecycle policy you set on your storage.&lt;/p&gt;
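
&lt;p&gt;The "alarm on gaps" row can be sketched in a few lines. This is a minimal illustration, not part of Akmon: it assumes one &lt;code&gt;.jsonl&lt;/code&gt; file per session under the audit directory, named after the session ID, and it assumes you already have a list of expected session IDs from somewhere else (a CI log or a scheduler, for example). The function name &lt;code&gt;find_missing_sessions&lt;/code&gt; is hypothetical.&lt;/p&gt;

```python
import tempfile
from pathlib import Path


def find_missing_sessions(audit_dir, expected_session_ids):
    """Return the expected session IDs that have no audit file, sorted."""
    audit_dir = Path(audit_dir)
    # Assumption: one JSONL file per session, named after the session ID.
    recorded = {path.stem for path in audit_dir.glob("*.jsonl")}
    return sorted(set(expected_session_ids) - recorded)


# Demo with a throwaway directory standing in for the audit directory.
with tempfile.TemporaryDirectory() as tmp:
    for session_id in ("s-001", "s-003"):
        (Path(tmp) / (session_id + ".jsonl")).write_text("{}\n")
    missing = find_missing_sessions(tmp, ["s-001", "s-002", "s-003"])
    print(missing)  # a non-empty list is the alarm condition
```

&lt;p&gt;A non-empty result means a session ran without leaving a record, which is exactly the coverage gap the table warns about.&lt;/p&gt;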

&lt;h2&gt;
  
  
  NIST AI RMF, the risk management framework
&lt;/h2&gt;

&lt;p&gt;NIST AI RMF is voluntary, US flavored, and well respected. The four functions are Govern, Map, Measure, Manage.&lt;/p&gt;

&lt;p&gt;For AI coding agents:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;
&lt;strong&gt;Govern&lt;/strong&gt;: assign accountability for AI tools used in development.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Map&lt;/strong&gt;: classify the risk of each agent and document tools it can use.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Measure&lt;/strong&gt;: monitor for misbehavior, track metrics.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Manage&lt;/strong&gt;: respond to incidents, retire systems that fail.&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;If you have ISO 42001 covered, NIST AI RMF is mostly a vocabulary translation. Same data, different headers in the report.&lt;/p&gt;

&lt;h2&gt;
  
  
  SOC 2 for AI engineering controls
&lt;/h2&gt;

&lt;p&gt;For SOC 2 audits, AI coding agents tend to fall under CC4 (monitoring) and CC7 (system operations). Some auditors are starting to ask about CC2.2 (internal communication of responsibilities) for AI-specific roles.&lt;/p&gt;

&lt;p&gt;What auditors want to see:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;A documented set of AI controls, with owners.&lt;/li&gt;
&lt;li&gt;Evidence that the controls run, not just that they exist.&lt;/li&gt;
&lt;li&gt;Logs that demonstrate operations and exceptions.&lt;/li&gt;
&lt;li&gt;Incident history with root cause and remediation.&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;The Akmon trust pipeline maps cleanly. SLO verify gives you the "controls run". Evidence verify gives you the "logs that demonstrate operations". Replay gives you the "incident history" framing.&lt;/p&gt;

&lt;h2&gt;
  
  
  Tool qualification frameworks
&lt;/h2&gt;

&lt;p&gt;This is where Akmon is most differentiated. Generic AI agents do not address tool qualification at all. Akmon does.&lt;/p&gt;

&lt;h3&gt;
  
  
  DO-178C and DO-330 (aerospace and avionics)
&lt;/h3&gt;

&lt;p&gt;DO-330 covers tool qualification. AI tools used in development have to either be qualified or be shown not to affect the certified output. Akmon's evidence chain is the artifact you need to make the case. The tool qualification kit (TQK) typically wants a deterministic procedure and recorded artifacts. Replay against recorded providers and tools is the closest thing in the AI space to a deterministic procedure.&lt;/p&gt;

&lt;h3&gt;
  
  
  IEC 62304 (medical device software)
&lt;/h3&gt;

&lt;p&gt;IEC 62304 cares about the software life cycle. AI assistance in development is part of that life cycle. The evidence Akmon produces fits the V&amp;amp;V records expected at most safety classifications. The redaction flow is critical for protected health information.&lt;/p&gt;

&lt;h3&gt;
  
  
  ISO 26262 (automotive)
&lt;/h3&gt;

&lt;p&gt;For ASIL-rated software, traceability is mandatory. AGEF's content-addressed events plus replay give you a defensible answer when an auditor asks where a particular line came from. The spec workflow (&lt;code&gt;akmon spec&lt;/code&gt;) is the right entry point for high-ASIL changes.&lt;/p&gt;

&lt;h3&gt;
  
  
  CMMC (defense)
&lt;/h3&gt;

&lt;p&gt;CMMC Level 2 and above care about access control and audit. Akmon's policy profiles, the explicit deny posture in &lt;code&gt;prod&lt;/code&gt;, and the audit chain map to several practices in the AC, AU, and CM domains. Local-first execution (Ollama or your hosted endpoint inside a controlled environment) keeps controlled unclassified information inside the boundary.&lt;/p&gt;

&lt;h2&gt;
  
  
  OWASP Top 10 for Agentic Applications
&lt;/h2&gt;

&lt;p&gt;Published December 2025. The list is technical, not regulatory, but it has become the shared vocabulary for failure modes. If you have not mapped your agent's risks to the list, do it.&lt;/p&gt;

&lt;p&gt;The most relevant items for an AI coding agent, labeled here with the identifiers from the companion OWASP Top 10 for LLM Applications (2025):&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;LLM01 prompt injection. Tool inputs that contain hidden instructions. Mitigated by policy at the tool boundary, the &lt;code&gt;prod&lt;/code&gt; profile, and constrained &lt;code&gt;web_fetch&lt;/code&gt; allow lists.&lt;/li&gt;
&lt;li&gt;LLM02 sensitive information disclosure. Mitigated by &lt;code&gt;akmon redact&lt;/code&gt; for outputs and by careful provider choice.&lt;/li&gt;
&lt;li&gt;LLM06 excessive agency. Mitigated by the &lt;code&gt;prod&lt;/code&gt; policy profile and by team-specific packs that lock the tool surface.&lt;/li&gt;
&lt;li&gt;LLM08 vector and embedding weaknesses. Where your agent uses retrieval, the events emit &lt;code&gt;RetrievalCall&lt;/code&gt; records, so you can audit what was retrieved.&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;A control map document lives in the docs. The mapping is concrete: each control points at a command or a configuration knob.&lt;/p&gt;

&lt;h2&gt;
  
  
  A short, copyable checklist
&lt;/h2&gt;

&lt;p&gt;If you have one afternoon:&lt;/p&gt;

&lt;ol&gt;
&lt;li&gt;Make an inventory. List your agents, owners, and risk levels in a one page document.&lt;/li&gt;
&lt;li&gt;Stand up Akmon in your repo. Choose &lt;code&gt;staging&lt;/code&gt; profile. Add one organization pack.&lt;/li&gt;
&lt;li&gt;Run a session, then run the trust pipeline. Confirm the three exit codes are &lt;code&gt;0&lt;/code&gt;.&lt;/li&gt;
&lt;li&gt;Map three controls each to ISO 42001 A.7, EU AI Act Article 12, and SOC 2 CC7. Use the AGEF event kinds as evidence.&lt;/li&gt;
&lt;li&gt;Set a retention policy. Pick a number, document it, automate the lifecycle.&lt;/li&gt;
&lt;li&gt;Schedule a weekly evidence review. One person, one hour, one summary.&lt;/li&gt;
&lt;/ol&gt;

&lt;p&gt;If you have a quarter:&lt;/p&gt;

&lt;ol&gt;
&lt;li&gt;Cover all repos with the same Akmon binary and policy framework.&lt;/li&gt;
&lt;li&gt;Build a control map that ties each AGEF event kind and each command to a control across frameworks.&lt;/li&gt;
&lt;li&gt;Add &lt;code&gt;slo trend&lt;/code&gt; to detect regressions across recent sessions.&lt;/li&gt;
&lt;li&gt;Add a customer-facing surface (a small page in your help center) that explains what your AI coding agent records.&lt;/li&gt;
&lt;li&gt;Walk your auditor through a sample session. The first one will tell you what is missing.&lt;/li&gt;
&lt;/ol&gt;

&lt;h2&gt;
  
  
  What this gets you
&lt;/h2&gt;

&lt;p&gt;What you get: an audit trail that survives a third-party review, a faster path through customer security questionnaires, and a much shorter incident loop.&lt;/p&gt;

&lt;p&gt;What you do not get: a guarantee that the model behaves. The model is the model. The job of governance is to make the consequences of misbehavior bounded, observable, and provable.&lt;/p&gt;

&lt;p&gt;If you want a place to start, install Akmon and run the trust pipeline on one session. The repo is at &lt;a href="https://github.com/radotsvetkov/akmon" rel="noopener noreferrer"&gt;github.com/radotsvetkov/akmon&lt;/a&gt;. The format is at &lt;a href="https://github.com/radotsvetkov/agef" rel="noopener noreferrer"&gt;github.com/radotsvetkov/agef&lt;/a&gt;. The site is at &lt;a href="https://radotsvetkov.github.io/akmon/" rel="noopener noreferrer"&gt;radotsvetkov.github.io/akmon&lt;/a&gt;.&lt;/p&gt;

&lt;p&gt;The next article in this series goes deep on the redaction workflow, which is the part of the kit you reach for the day before an external review.&lt;/p&gt;

</description>
      <category>compliance</category>
      <category>ai</category>
      <category>security</category>
      <category>devops</category>
    </item>
    <item>
      <title>Sanitizing AI coding sessions before external review: the redaction workflow that ships</title>
      <dc:creator>Radoslav Tsvetkov</dc:creator>
      <pubDate>Thu, 14 May 2026 08:18:15 +0000</pubDate>
      <link>https://dev.to/radotsvetkov/sanitizing-ai-coding-sessions-before-external-review-the-redaction-workflow-that-ships-48dk</link>
      <guid>https://dev.to/radotsvetkov/sanitizing-ai-coding-sessions-before-external-review-the-redaction-workflow-that-ships-48dk</guid>
      <description>&lt;p&gt;The day before an external review is the day you discover what is actually in your AI coding sessions. A real customer name buried in a tool input. A snippet of restricted code in a model response. A path that exposes more about your infrastructure than you wanted to share.&lt;/p&gt;

&lt;p&gt;The wrong answer is to clean it up by hand. By-hand redaction breaks integrity. The right answer is a redaction workflow that produces a sanitized derivative bundle with sentinels in place of the redacted content, a recorded reason on every change, and a verification step before sharing.&lt;/p&gt;

&lt;p&gt;This article walks through Akmon's &lt;code&gt;redact&lt;/code&gt; command end to end. The commands and the sentinel format are real Akmon v2.0.0 surface.&lt;/p&gt;

&lt;h2&gt;
  
  
  Why "I will edit it by hand" does not survive
&lt;/h2&gt;

&lt;p&gt;Three reasons.&lt;/p&gt;

&lt;p&gt;First, the audit chain is cryptographic. If you change a byte without producing a new chain, the chain stops verifying. Your auditor opens the file, runs &lt;code&gt;audit verify&lt;/code&gt;, gets a failure, and the conversation is over.&lt;/p&gt;

&lt;p&gt;Second, redaction without a reason is not auditable. The reviewer needs to know that this object was redacted on purpose, and why. A sentinel with a &lt;code&gt;reason&lt;/code&gt; field is the record.&lt;/p&gt;

&lt;p&gt;Third, ad hoc edits do not round-trip. If you produce a clean derivative bundle that re-imports cleanly and verifies offline, you have something portable. If you produce a one-off edit, you have something that might or might not survive transport.&lt;/p&gt;

&lt;p&gt;The workflow below avoids all three failure modes.&lt;/p&gt;
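
&lt;p&gt;To make the first reason concrete, here is a toy hash chain. It is a sketch of the general technique only: the record layout, the genesis value, and the use of SHA-256 over sorted JSON are assumptions for illustration, not Akmon's actual chain construction. The helper names &lt;code&gt;chain_hashes&lt;/code&gt; and &lt;code&gt;verify_chain&lt;/code&gt; are hypothetical.&lt;/p&gt;

```python
import hashlib
import json


def chain_hashes(records):
    """Hash each record together with the previous hash, forming a chain."""
    previous = "0" * 64  # placeholder genesis value for this sketch
    hashes = []
    for record in records:
        payload = json.dumps(record, sort_keys=True).encode("utf-8")
        previous = hashlib.sha256(previous.encode("ascii") + payload).hexdigest()
        hashes.append(previous)
    return hashes


def verify_chain(records, recorded_hashes):
    """True only if recomputing the chain reproduces every recorded hash."""
    return chain_hashes(records) == recorded_hashes


records = [
    {"kind": "SessionStart"},
    {"kind": "UserTurn", "text": "Fix the build for Acme Corp"},
    {"kind": "SessionEnd"},
]
recorded = chain_hashes(records)
print(verify_chain(records, recorded))  # True

# A by-hand edit: scrub the customer name but leave the recorded hashes alone.
records[1]["text"] = "Fix the build for [customer]"
print(verify_chain(records, recorded))  # False
```

&lt;p&gt;Because every hash folds in the one before it, the edit to the second record invalidates that entry and everything after it. A proper redaction has to produce a new, internally consistent chain rather than patch the old one.&lt;/p&gt;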

&lt;h2&gt;
  
  
  The command
&lt;/h2&gt;



&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight shell"&gt;&lt;code&gt;akmon redact &amp;lt;session-id&amp;gt; &lt;span class="o"&gt;[&lt;/span&gt;OPTIONS]
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;A typical invocation:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight shell"&gt;&lt;code&gt;akmon redact &amp;lt;session-id&amp;gt; &lt;span class="se"&gt;\&lt;/span&gt;
  &lt;span class="nt"&gt;--output&lt;/span&gt; sanitized.akmon &lt;span class="se"&gt;\&lt;/span&gt;
  &lt;span class="nt"&gt;--object&lt;/span&gt; &amp;lt;object-hash&amp;gt; &lt;span class="se"&gt;\&lt;/span&gt;
  &lt;span class="nt"&gt;--reason&lt;/span&gt; &lt;span class="s2"&gt;"PII removal"&lt;/span&gt; &lt;span class="se"&gt;\&lt;/span&gt;
  &lt;span class="nt"&gt;--format&lt;/span&gt; json
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;For multiple objects, repeat &lt;code&gt;--object&lt;/code&gt;:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight shell"&gt;&lt;code&gt;akmon redact &amp;lt;session-id&amp;gt; &lt;span class="se"&gt;\&lt;/span&gt;
  &lt;span class="nt"&gt;--output&lt;/span&gt; sanitized.akmon &lt;span class="se"&gt;\&lt;/span&gt;
  &lt;span class="nt"&gt;--object&lt;/span&gt; &amp;lt;hash1&amp;gt; &lt;span class="se"&gt;\&lt;/span&gt;
  &lt;span class="nt"&gt;--object&lt;/span&gt; &amp;lt;hash2&amp;gt; &lt;span class="se"&gt;\&lt;/span&gt;
  &lt;span class="nt"&gt;--object&lt;/span&gt; &amp;lt;hash3&amp;gt; &lt;span class="se"&gt;\&lt;/span&gt;
  &lt;span class="nt"&gt;--reason&lt;/span&gt; &lt;span class="s2"&gt;"Removed customer names and one credential reference"&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;Find the right hashes with &lt;code&gt;akmon inspect --resolve&lt;/code&gt;:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight shell"&gt;&lt;code&gt;akmon inspect &amp;lt;session-id&amp;gt; &lt;span class="nt"&gt;--resolve&lt;/span&gt; | less
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;That is the loop: inspect, decide, redact.&lt;/p&gt;

&lt;h2&gt;
  
  
  The sentinel format
&lt;/h2&gt;

&lt;p&gt;Every redacted object is replaced by a canonical CBOR sentinel. The payload looks like this in JSON form:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight json"&gt;&lt;code&gt;&lt;span class="p"&gt;{&lt;/span&gt;&lt;span class="w"&gt;
  &lt;/span&gt;&lt;span class="nl"&gt;"akmon_redacted"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="kc"&gt;true&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;&lt;span class="w"&gt;
  &lt;/span&gt;&lt;span class="nl"&gt;"original_hash"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="s2"&gt;"&amp;lt;hex of original&amp;gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;&lt;span class="w"&gt;
  &lt;/span&gt;&lt;span class="nl"&gt;"original_size"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="mi"&gt;1024&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;&lt;span class="w"&gt;
  &lt;/span&gt;&lt;span class="nl"&gt;"reason"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="s2"&gt;"&amp;lt;text from --reason&amp;gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;&lt;span class="w"&gt;
  &lt;/span&gt;&lt;span class="nl"&gt;"redacted_at"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="s2"&gt;"&amp;lt;RFC3339 timestamp&amp;gt;"&lt;/span&gt;&lt;span class="w"&gt;
&lt;/span&gt;&lt;span class="p"&gt;}&lt;/span&gt;&lt;span class="w"&gt;
&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;A reviewer reading the bundle can see that an object was redacted on purpose, can see why, and can confirm that the original size is what was expected. The chain still verifies because the sentinel hashes correctly.&lt;/p&gt;

&lt;p&gt;This is a small but important design choice. The sentinel is not a placeholder. It is a record. The redaction is itself part of the evidence.&lt;/p&gt;
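
&lt;p&gt;A minimal sketch of what building such a record involves, using the five field names shown above. Two things here are assumptions: SHA-256 as the hash algorithm (the real one is whatever the bundle manifest declares), and a plain dictionary in place of the canonical CBOR encoding. The functions &lt;code&gt;make_sentinel&lt;/code&gt; and &lt;code&gt;matches_original&lt;/code&gt; are hypothetical helpers, not Akmon APIs.&lt;/p&gt;

```python
import hashlib
from datetime import datetime, timezone


def make_sentinel(original_bytes, reason):
    """Build a sentinel record with the five fields from the format above."""
    return {
        "akmon_redacted": True,
        # Assumption: SHA-256; a real producer uses the manifest's algorithm.
        "original_hash": hashlib.sha256(original_bytes).hexdigest(),
        "original_size": len(original_bytes),
        "reason": reason,
        "redacted_at": datetime.now(timezone.utc).isoformat(),
    }


def matches_original(sentinel, candidate_bytes):
    """Check whether candidate bytes are the content the sentinel replaced."""
    same_hash = hashlib.sha256(candidate_bytes).hexdigest() == sentinel["original_hash"]
    same_size = len(candidate_bytes) == sentinel["original_size"]
    return same_hash and same_size


sentinel = make_sentinel(b"patient-123", "PII removal")
print(sentinel["original_size"])
print(matches_original(sentinel, b"patient-123"))
```

&lt;p&gt;The second helper shows why the original hash is worth keeping: whoever still holds the unredacted object can later demonstrate that it is the one the sentinel stands in for, without the reviewer ever seeing the content.&lt;/p&gt;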

&lt;h2&gt;
  
  
  Verifying the sanitized bundle before sharing
&lt;/h2&gt;

&lt;p&gt;Always verify before sending. The cheapest possible smoke test:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight shell"&gt;&lt;code&gt;akmon bundle import sanitized.akmon &lt;span class="nt"&gt;--verify-only&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;This re-runs the AGEF verification on the derivative bundle. If anything is wrong (a missing object, a hash mismatch, a malformed manifest), the verifier fails loudly. Treat this as the gate before any handoff.&lt;/p&gt;

&lt;p&gt;For deeper confidence, replay the sanitized session in a sandbox:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight shell"&gt;&lt;code&gt;akmon replay &amp;lt;derivative-session-id&amp;gt; &lt;span class="nt"&gt;--mode&lt;/span&gt; strict &lt;span class="nt"&gt;--format&lt;/span&gt; json
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;A clean replay confirms that the bundle still tells a coherent story.&lt;/p&gt;

&lt;h2&gt;
  
  
  What to redact, and what not to
&lt;/h2&gt;

&lt;p&gt;A practical rule of thumb. Redact what cannot leave the building. Keep what makes the session reviewable.&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Redact: customer names, account numbers, secrets that leaked into a tool input, snippets of code that fall outside the disclosure scope.&lt;/li&gt;
&lt;li&gt;Keep: tool IDs, policy decisions, structural events (&lt;code&gt;SessionStart&lt;/code&gt;, &lt;code&gt;UserTurn&lt;/code&gt;, &lt;code&gt;ProviderCall&lt;/code&gt;, &lt;code&gt;AssistantTurn&lt;/code&gt;, &lt;code&gt;SessionEnd&lt;/code&gt;), provider identifiers, model versions, attempt statuses.&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;The structure of the session is what makes it auditable. The content is what makes it sensitive. Redaction is the line between them.&lt;/p&gt;

&lt;h2&gt;
  
  
  The exit code map
&lt;/h2&gt;

&lt;div class="table-wrapper-paragraph"&gt;&lt;table&gt;
&lt;thead&gt;
&lt;tr&gt;
&lt;th&gt;Code&lt;/th&gt;
&lt;th&gt;Meaning&lt;/th&gt;
&lt;/tr&gt;
&lt;/thead&gt;
&lt;tbody&gt;
&lt;tr&gt;
&lt;td&gt;&lt;code&gt;0&lt;/code&gt;&lt;/td&gt;
&lt;td&gt;Derivative bundle written successfully&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;&lt;code&gt;2&lt;/code&gt;&lt;/td&gt;
&lt;td&gt;Usage error (output exists, invalid hash format, object not in session, missing required flag)&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;&lt;code&gt;3&lt;/code&gt;&lt;/td&gt;
&lt;td&gt;I/O or environment error (journal or session not found, write failure, unreadable referenced object)&lt;/td&gt;
&lt;/tr&gt;
&lt;/tbody&gt;
&lt;/table&gt;&lt;/div&gt;

&lt;p&gt;For automation, treat anything other than &lt;code&gt;0&lt;/code&gt; as a hard stop.&lt;/p&gt;
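
&lt;p&gt;One way to encode that rule in a wrapper script is sketched below. The code meanings come from the table above; the function names &lt;code&gt;describe_exit&lt;/code&gt; and &lt;code&gt;gate&lt;/code&gt; are hypothetical, and the sketch deliberately takes the exit code as a plain integer so it does not assume anything about how you invoke the CLI.&lt;/p&gt;

```python
# Meanings copied from the exit code table for the redact command.
REDACT_EXIT_CODES = {
    0: "derivative bundle written",
    2: "usage error",
    3: "I/O or environment error",
}


def describe_exit(code):
    """Map an exit code to a short description, tolerating unknown codes."""
    return REDACT_EXIT_CODES.get(code, "unknown exit code")


def gate(code):
    """Raise unless the redact step exited 0, so the pipeline stops hard."""
    if code != 0:
        message = "redact failed: " + describe_exit(code) + " (exit " + str(code) + ")"
        raise RuntimeError(message)
    return True


print(gate(0))
try:
    gate(2)
except RuntimeError as error:
    print(error)
```

&lt;p&gt;In a real script you would pass in the &lt;code&gt;returncode&lt;/code&gt; attribute of the result from &lt;code&gt;subprocess.run&lt;/code&gt;. Unknown codes are also treated as failures, which matches the "anything other than 0" rule.&lt;/p&gt;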

&lt;h2&gt;
  
  
  A small story from a tester
&lt;/h2&gt;

&lt;p&gt;A friend who runs an embedded team for a medical device manufacturer ran me through their first audit using Akmon last quarter. The auditor asked for a representative session that touched the firmware build path. The team chose three sessions, ran &lt;code&gt;inspect --resolve&lt;/code&gt; on each, and listed the objects to redact: two contained patient IDs from a test dataset, one contained an internal vendor name.&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight shell"&gt;&lt;code&gt;akmon redact &lt;span class="nv"&gt;$sid&lt;/span&gt; &lt;span class="se"&gt;\&lt;/span&gt;
  &lt;span class="nt"&gt;--output&lt;/span&gt; &lt;span class="s2"&gt;"audit-pack/&lt;/span&gt;&lt;span class="nv"&gt;$sid&lt;/span&gt;&lt;span class="s2"&gt;.akmon"&lt;/span&gt; &lt;span class="se"&gt;\&lt;/span&gt;
  &lt;span class="nt"&gt;--object&lt;/span&gt; &lt;span class="nv"&gt;$hash_patient_a&lt;/span&gt; &lt;span class="se"&gt;\&lt;/span&gt;
  &lt;span class="nt"&gt;--object&lt;/span&gt; &lt;span class="nv"&gt;$hash_patient_b&lt;/span&gt; &lt;span class="se"&gt;\&lt;/span&gt;
  &lt;span class="nt"&gt;--object&lt;/span&gt; &lt;span class="nv"&gt;$hash_vendor&lt;/span&gt; &lt;span class="se"&gt;\&lt;/span&gt;
  &lt;span class="nt"&gt;--reason&lt;/span&gt; &lt;span class="s2"&gt;"PHI removal and vendor identification removal for IEC 62304 audit"&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;They verified each derivative bundle:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight shell"&gt;&lt;code&gt;akmon bundle import &lt;span class="s2"&gt;"audit-pack/&lt;/span&gt;&lt;span class="nv"&gt;$sid&lt;/span&gt;&lt;span class="s2"&gt;.akmon"&lt;/span&gt; &lt;span class="nt"&gt;--verify-only&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;And handed the auditor the three bundles plus an executable verifier. The auditor verified each one offline and proceeded. The conversation took an hour, not a week. The team now treats the redaction step as a normal part of audit prep.&lt;/p&gt;

&lt;h2&gt;
  
  
  A common mistake to avoid
&lt;/h2&gt;

&lt;p&gt;Do not redact more than you need to. Every redaction shrinks the auditor's ability to follow the session. If you redact &lt;code&gt;AssistantTurn&lt;/code&gt; messages and &lt;code&gt;ToolCall&lt;/code&gt; outputs aggressively, the bundle stops telling a story.&lt;/p&gt;

&lt;p&gt;The right mental model: redact the smallest set of objects that removes the sensitive content. Keep everything else.&lt;/p&gt;

&lt;h2&gt;
  
  
  Stronger guarantees on top of redaction
&lt;/h2&gt;

&lt;p&gt;Redaction provides selective content removal with documented reasons and verifiable structure. It does not provide identity attribution by itself. If your environment requires producer trust, layer external signing on top of the bundle. The AGEF spec is explicit that signing is expected in later versions; for now, signing is your job.&lt;/p&gt;

&lt;p&gt;A practical approach: sign the bundle with your team's key, distribute the signature alongside, and document the verification procedure. The auditor verifies the signature first, then runs &lt;code&gt;bundle import --verify-only&lt;/code&gt;.&lt;/p&gt;

&lt;h2&gt;
  
  
  Where this leaves you
&lt;/h2&gt;

&lt;p&gt;After the redaction workflow is in place, three things are true.&lt;/p&gt;

&lt;p&gt;First, you can hand any session to an external reviewer with a single bundle.&lt;/p&gt;

&lt;p&gt;Second, the redaction is itself part of the evidence. The reviewer can see what was redacted and why.&lt;/p&gt;

&lt;p&gt;Third, the integrity guarantees survive transport. The bundle verifies offline.&lt;/p&gt;

&lt;p&gt;If you want to dig deeper into the policy side of the loop, the next post in this series goes deep on policy profiles and packs, the deterministic merge that controls what the agent can do.&lt;/p&gt;

&lt;p&gt;The repo is at &lt;a href="https://github.com/radotsvetkov/akmon" rel="noopener noreferrer"&gt;github.com/radotsvetkov/akmon&lt;/a&gt;. The format spec is at &lt;a href="https://github.com/radotsvetkov/agef" rel="noopener noreferrer"&gt;github.com/radotsvetkov/agef&lt;/a&gt;. The site is at &lt;a href="https://radotsvetkov.github.io/akmon/" rel="noopener noreferrer"&gt;radotsvetkov.github.io/akmon&lt;/a&gt;.&lt;/p&gt;

</description>
      <category>ai</category>
      <category>security</category>
      <category>devops</category>
      <category>compliance</category>
    </item>
    <item>
      <title>Observability and evidence in AI coding workflows: two log streams, two masters</title>
      <dc:creator>Radoslav Tsvetkov</dc:creator>
      <pubDate>Thu, 14 May 2026 08:17:59 +0000</pubDate>
      <link>https://dev.to/radotsvetkov/observability-and-evidence-in-ai-coding-workflows-two-log-streams-two-masters-4j09</link>
      <guid>https://dev.to/radotsvetkov/observability-and-evidence-in-ai-coding-workflows-two-log-streams-two-masters-4j09</guid>
      <description>&lt;p&gt;A few months back I watched an external reviewer ask one question I could not answer.&lt;/p&gt;

&lt;blockquote&gt;
&lt;p&gt;For the AI session that touched this medical device firmware on Tuesday, can you show me the inputs, the policy decisions, the outputs, and a signature that says nobody changed the record?&lt;/p&gt;
&lt;/blockquote&gt;

&lt;p&gt;We had beautiful dashboards. Token usage, latency, top tools, error rates. We did not have what she was asking for.&lt;/p&gt;

&lt;p&gt;That conversation forced a vocabulary change. We started saying observability and evidence as two separate words for two separate things. This article is the version of that conversation I wish someone had given me a year earlier.&lt;/p&gt;

&lt;h2&gt;
  
  
  The short version
&lt;/h2&gt;

&lt;p&gt;Observability is consumed by an engineer trying to understand what the system is doing. Rich, sometimes lossy, meant for a dashboard. Sampling is fine. Cardinality limits are fine.&lt;/p&gt;

&lt;p&gt;Evidence is consumed by someone who was not there. A reviewer. A regulator. A future-you reading the file three months after the session. Evidence has to be complete for the session, normalized so the reader does not have to learn your stack, and tamper evident so they can trust no one rewrote history. Sampling is not fine. Format drift is not fine.&lt;/p&gt;

&lt;p&gt;If you build only observability, you can debug yesterday. If you build evidence, you can defend yesterday.&lt;/p&gt;

&lt;h2&gt;
  
  
  A side by side
&lt;/h2&gt;

&lt;div class="table-wrapper-paragraph"&gt;&lt;table&gt;
&lt;thead&gt;
&lt;tr&gt;
&lt;th&gt;Aspect&lt;/th&gt;
&lt;th&gt;Observability&lt;/th&gt;
&lt;th&gt;Evidence&lt;/th&gt;
&lt;/tr&gt;
&lt;/thead&gt;
&lt;tbody&gt;
&lt;tr&gt;
&lt;td&gt;Audience&lt;/td&gt;
&lt;td&gt;Internal engineers&lt;/td&gt;
&lt;td&gt;External reviewers and future you&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Time horizon&lt;/td&gt;
&lt;td&gt;Hours to days&lt;/td&gt;
&lt;td&gt;Months to years&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Sampling&lt;/td&gt;
&lt;td&gt;Often, by tenant or rate&lt;/td&gt;
&lt;td&gt;Never&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Schema&lt;/td&gt;
&lt;td&gt;Vendor specific or OTLP&lt;/td&gt;
&lt;td&gt;Normalized, vendor neutral (AGEF)&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Tamper evidence&lt;/td&gt;
&lt;td&gt;Not required&lt;/td&gt;
&lt;td&gt;Required&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Storage&lt;/td&gt;
&lt;td&gt;Hot, indexed&lt;/td&gt;
&lt;td&gt;Warm, durable, lifecycle managed&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Cost driver&lt;/td&gt;
&lt;td&gt;Cardinality&lt;/td&gt;
&lt;td&gt;Volume per session&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Tooling&lt;/td&gt;
&lt;td&gt;Datadog, Langfuse, Helicone&lt;/td&gt;
&lt;td&gt;Akmon trust pipeline plus AGEF bundles&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Failure mode&lt;/td&gt;
&lt;td&gt;Slow dashboards&lt;/td&gt;
&lt;td&gt;Failed audit&lt;/td&gt;
&lt;/tr&gt;
&lt;/tbody&gt;
&lt;/table&gt;&lt;/div&gt;

&lt;p&gt;Both belong in your stack. Neither replaces the other.&lt;/p&gt;

&lt;h2&gt;
  
  
  Why teams keep conflating them
&lt;/h2&gt;

&lt;p&gt;Three reasons, all reasonable.&lt;/p&gt;

&lt;ol&gt;
&lt;li&gt;The data starts in the same place. Both come from the agent runtime.&lt;/li&gt;
&lt;li&gt;The first audience is engineering. Observability comes first because it solves immediate pain. Evidence is for a later audience.&lt;/li&gt;
&lt;li&gt;Vendors blur the line. Many tools sell observability as "audit ready". Look at the schema, not the marketing.&lt;/li&gt;
&lt;/ol&gt;

&lt;p&gt;You can use the same source events. You should not use the same destination format.&lt;/p&gt;

&lt;h2&gt;
  
  
  The data shape each one needs
&lt;/h2&gt;

&lt;p&gt;Observability wants to slice and dice. The data is dimensional. Tag everything. Sample when you have to. A typical span has attributes for tenant, user, model, tool name, version, with counters for tokens in, tokens out, cost. High cardinality fields can be dropped or hashed.&lt;/p&gt;

&lt;p&gt;Evidence wants to be complete and verifiable. The data is hierarchical. Each session is a chain of typed events with a known closed kind list (&lt;code&gt;SessionStart&lt;/code&gt;, &lt;code&gt;UserTurn&lt;/code&gt;, &lt;code&gt;ProviderCall&lt;/code&gt;, &lt;code&gt;ToolCall&lt;/code&gt;, &lt;code&gt;RetrievalCall&lt;/code&gt;, &lt;code&gt;PermissionGate&lt;/code&gt;, &lt;code&gt;AssistantTurn&lt;/code&gt;, &lt;code&gt;SessionEnd&lt;/code&gt;). Inputs and outputs are content-addressed. Hashes link events to objects. The chain is verifiable today; a signed manifest is planned for a later version.&lt;/p&gt;

&lt;p&gt;The two shapes are related but not the same.&lt;/p&gt;
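
&lt;p&gt;A closed kind list is what lets a reader reject anything unexpected. The sketch below checks two structural properties of an event list: every &lt;code&gt;kind&lt;/code&gt; is one of the eight named above, and every entry in &lt;code&gt;parents&lt;/code&gt; refers to an event that came earlier. The &lt;code&gt;id&lt;/code&gt; field and the plain-dictionary layout are assumptions for illustration; real AGEF events are canonical CBOR and are identified by hash. The function &lt;code&gt;check_events&lt;/code&gt; is hypothetical.&lt;/p&gt;

```python
CLOSED_KINDS = frozenset({
    "SessionStart", "UserTurn", "ProviderCall", "ToolCall",
    "RetrievalCall", "PermissionGate", "AssistantTurn", "SessionEnd",
})


def check_events(events):
    """Return a list of problems; an empty list means the structure holds."""
    problems = []
    seen_ids = set()
    for event in events:  # assumes events arrive in sequence order
        if event["kind"] not in CLOSED_KINDS:
            problems.append("unknown kind: " + event["kind"])
        for parent in event["parents"]:
            if parent not in seen_ids:
                problems.append("dangling parent: " + parent)
        seen_ids.add(event["id"])  # "id" is a stand-in for the event hash
    return problems


good = [
    {"id": "e1", "kind": "SessionStart", "parents": []},
    {"id": "e2", "kind": "UserTurn", "parents": ["e1"]},
    {"id": "e3", "kind": "SessionEnd", "parents": ["e2"]},
]
bad = [
    {"id": "e1", "kind": "SessionStart", "parents": []},
    {"id": "e2", "kind": "DebugNote", "parents": ["e9"]},
]
print(check_events(good))  # []
print(check_events(bad))
```

&lt;p&gt;An observability span has no equivalent check, because its attribute set is open by design. That difference is the practical meaning of "related but not the same".&lt;/p&gt;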

&lt;h2&gt;
  
  
  What gets recorded in each
&lt;/h2&gt;

&lt;p&gt;A short tour of fields, with the question they answer.&lt;/p&gt;

&lt;h3&gt;
  
  
  Observability
&lt;/h3&gt;

&lt;ul&gt;
&lt;li&gt;
&lt;code&gt;latency_ms&lt;/code&gt;: how slow was that step?&lt;/li&gt;
&lt;li&gt;
&lt;code&gt;tokens_input&lt;/code&gt;, &lt;code&gt;tokens_output&lt;/code&gt;: what is this costing?&lt;/li&gt;
&lt;li&gt;
&lt;code&gt;model_name&lt;/code&gt;, &lt;code&gt;model_version&lt;/code&gt;: did a recent change cause the regression?&lt;/li&gt;
&lt;li&gt;
&lt;code&gt;error_class&lt;/code&gt;, &lt;code&gt;error_message&lt;/code&gt;: where are breakages clustered?&lt;/li&gt;
&lt;li&gt;
&lt;code&gt;user_id_hash&lt;/code&gt;, &lt;code&gt;tenant_id&lt;/code&gt;: who is affected?&lt;/li&gt;
&lt;/ul&gt;

&lt;h3&gt;
  
  
  Evidence (AGEF v0.1)
&lt;/h3&gt;

&lt;ul&gt;
&lt;li&gt;Manifest: &lt;code&gt;agef_version&lt;/code&gt;, &lt;code&gt;producer&lt;/code&gt;, &lt;code&gt;session.id&lt;/code&gt;, &lt;code&gt;session.head&lt;/code&gt;, timestamps, &lt;code&gt;hash_algorithm&lt;/code&gt;, counts.&lt;/li&gt;
&lt;li&gt;Events: &lt;code&gt;parents&lt;/code&gt;, &lt;code&gt;kind&lt;/code&gt;, &lt;code&gt;emitted_at&lt;/code&gt;, &lt;code&gt;sequence&lt;/code&gt;, plus kind-specific fields.&lt;/li&gt;
&lt;li&gt;Objects: every input, output, prompt, message, side effect.&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;A reader of an observability stream cannot answer "show me the input that produced this denial". A reader of an AGEF bundle cannot easily answer "what is the p99 latency for this tool". Neither gap is a defect; each format answers the question it was built for.&lt;/p&gt;

&lt;h2&gt;
  
  
  A pipeline that produces both
&lt;/h2&gt;

&lt;p&gt;You do not need two duplicate pipelines. You need one source and two sinks.&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;agent runtime  -&amp;gt;  Akmon (single Rust binary)
                          |
                          +--&amp;gt; OTLP / metrics --&amp;gt; observability backend
                          |
                          +--&amp;gt; .akmon/audit/&amp;lt;session&amp;gt;.jsonl  --&amp;gt; trust pipeline
                                .akmon/evidence/&amp;lt;session&amp;gt;.json
                                AGEF bundle (Phase 4)
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;Akmon writes the audit chain and the evidence summary. Phase 4 export turns those into a portable AGEF bundle. Observability data continues to flow to your dashboard.&lt;/p&gt;
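
&lt;p&gt;The "one source, two sinks" shape can be sketched independently of any tool. In this illustration the observability sink keeps counters and only every Nth event, while the evidence sink appends every event as one JSON line. All class and function names are hypothetical, and the in-memory lists stand in for an OTLP exporter and an audit file.&lt;/p&gt;

```python
import json


class ObservabilitySink:
    """Lossy by design: counts everything, keeps detail for every Nth event."""

    def __init__(self, keep_every):
        self.keep_every = keep_every
        self.seen = 0
        self.counts = {}
        self.sampled = []

    def write(self, event):
        kind = event["kind"]
        self.counts[kind] = self.counts.get(kind, 0) + 1
        if self.seen % self.keep_every == 0:
            self.sampled.append(event)
        self.seen += 1


class EvidenceSink:
    """Complete by design: appends every event as one JSON line, no sampling."""

    def __init__(self):
        self.lines = []

    def write(self, event):
        self.lines.append(json.dumps(event, sort_keys=True))


def emit(event, sinks):
    """Fan a single source event out to every sink."""
    for sink in sinks:
        sink.write(event)


observability = ObservabilitySink(keep_every=5)
evidence = EvidenceSink()
for sequence in range(10):
    emit({"kind": "ToolCall", "sequence": sequence}, [observability, evidence])

print(len(observability.sampled), len(evidence.lines))  # 2 10
```

&lt;p&gt;The source event is the same object in both cases. Only the destination decides whether dropping detail is acceptable.&lt;/p&gt;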

&lt;h2&gt;
  
  
  How to roll this out without breaking the existing stack
&lt;/h2&gt;

&lt;p&gt;If your team already has observability, do not rip it out. Add evidence next to it.&lt;/p&gt;

&lt;p&gt;A migration that works in three steps:&lt;/p&gt;

&lt;ol&gt;
&lt;li&gt;Two weeks: install Akmon. Run the trust pipeline on a small set of sessions. Confirm the three exit codes are &lt;code&gt;0&lt;/code&gt;.&lt;/li&gt;
&lt;li&gt;Two weeks: add policy packs. Pin the tool surface. Run replay against a small canonical set in CI.&lt;/li&gt;
&lt;li&gt;Ongoing: expand to more repos. Map AGEF event kinds to control statements. Hand a sample to your reviewer and ask what is missing.&lt;/li&gt;
&lt;/ol&gt;

&lt;p&gt;By the end of the third step, the conversation with the reviewer changes. Instead of "we have logs", the answer is "here is the bundle, here is the verifier, here is the policy that fired".&lt;/p&gt;

&lt;h2&gt;
  
  
  Five things teams get wrong
&lt;/h2&gt;

&lt;ol&gt;
&lt;li&gt;&lt;p&gt;&lt;strong&gt;Sampling for evidence.&lt;/strong&gt; If a session is missing, the answer to "what did the agent do at 9:14" is "we do not know". The worst possible answer.&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;&lt;strong&gt;Storing raw inputs without redaction in shared bundles.&lt;/strong&gt; Sensitive data does not belong in an evidence bundle that is going to a third party. Redact at the boundary; keep the structural events.&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;&lt;strong&gt;Same retention as observability.&lt;/strong&gt; Observability is hot. Evidence is warm. Use storage classes that match the access pattern.&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;&lt;strong&gt;No verification step.&lt;/strong&gt; A bundle that is not verified is a bundle that might fail when it counts. Make &lt;code&gt;bundle import --verify-only&lt;/code&gt; a required step before any handoff.&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;&lt;strong&gt;Treating the format as an internal detail.&lt;/strong&gt; Evidence is consumed by people outside your team. Pick an open format that survives them, not just one that fits today. AGEF is meant for that.&lt;/p&gt;&lt;/li&gt;
&lt;/ol&gt;

&lt;h2&gt;
  
  
  Why I picked AGEF for the evidence side
&lt;/h2&gt;

&lt;p&gt;I wrote AGEF because I needed it. Observability formats were not built for the audience that consumes evidence. Vendor-specific traces leave the reader stuck in a dashboard. Custom JSON drifts in a quarter and rots in a year.&lt;/p&gt;

&lt;p&gt;AGEF is small, opinionated, and portable. The full spec fits in a short document. The bundle is a &lt;code&gt;tar.zst&lt;/code&gt; archive with a &lt;code&gt;manifest.json&lt;/code&gt;, &lt;code&gt;events.bin&lt;/code&gt; (length-delimited canonical CBOR), and &lt;code&gt;objects/&amp;lt;hex&amp;gt;&lt;/code&gt; files. Bundle signing is planned. The format works for any runtime that can emit it. It is the part of the stack I want to outlive any one tool, including mine.&lt;/p&gt;

&lt;h2&gt;
  
  
  Where to start
&lt;/h2&gt;

&lt;p&gt;If you have observability already, you have most of the source data. The work is to project it into a normalized record per session, redact at the object level when you share, and verify the bundle on import. Akmon and AGEF give you that path without rewriting your dashboard.&lt;/p&gt;

&lt;p&gt;If you do not have observability yet, start there. Get OpenTelemetry traces flowing for any agent runtime you care about. Then put Akmon in the middle and start writing evidence.&lt;/p&gt;

&lt;p&gt;The repo is at &lt;a href="https://github.com/radotsvetkov/akmon" rel="noopener noreferrer"&gt;github.com/radotsvetkov/akmon&lt;/a&gt;. The format is at &lt;a href="https://github.com/radotsvetkov/agef" rel="noopener noreferrer"&gt;github.com/radotsvetkov/agef&lt;/a&gt;. The site is at &lt;a href="https://radotsvetkov.github.io/akmon/" rel="noopener noreferrer"&gt;radotsvetkov.github.io/akmon&lt;/a&gt;.&lt;/p&gt;

&lt;p&gt;When the next reviewer asks for the inputs, the policy decisions, the outputs, and a signature, you will have one answer instead of three.&lt;/p&gt;

</description>
      <category>ai</category>
      <category>rust</category>
      <category>observability</category>
      <category>security</category>
    </item>
    <item>
      <title>Why I built Akmon, the AI coding agent for regulated engineering</title>
      <dc:creator>Radoslav Tsvetkov</dc:creator>
      <pubDate>Thu, 14 May 2026 08:17:35 +0000</pubDate>
      <link>https://dev.to/radotsvetkov/why-i-built-akmon-the-ai-coding-agent-for-regulated-engineering-2fki</link>
      <guid>https://dev.to/radotsvetkov/why-i-built-akmon-the-ai-coding-agent-for-regulated-engineering-2fki</guid>
      <description>&lt;p&gt;For the last two years I have watched the same conversation happen in every regulated engineering team I work with. Someone tries the new AI coding agent. It writes a real diff. The team is impressed. Then somebody asks the question that ends the experiment.&lt;/p&gt;

&lt;blockquote&gt;
&lt;p&gt;If we merge this, what do we tell the auditor?&lt;/p&gt;
&lt;/blockquote&gt;

&lt;p&gt;The agent did the work. The session was a TUI window that closed. The diff is in git. The reasoning is gone. Whatever policy the agent used, whatever model version it called, whatever tool it ran, none of that survived. In a hobby project that is fine. In avionics, in a medical device, in a SOC 2 release process, in a CMMC-bound codebase, that is a hard stop.&lt;/p&gt;

&lt;p&gt;This post is the short story of why I started Akmon, what it actually is, and the small set of commands you can try this afternoon.&lt;/p&gt;

&lt;h2&gt;
  
  
  The problem in one sentence
&lt;/h2&gt;

&lt;p&gt;AI coding agents produce code. They do not produce evidence. In regulated engineering, code without evidence is a liability.&lt;/p&gt;

&lt;p&gt;By "evidence" I mean an artifact that lets a reviewer answer five questions without your help.&lt;/p&gt;

&lt;ol&gt;
&lt;li&gt;What did the agent read?&lt;/li&gt;
&lt;li&gt;What tools did it call, with what inputs?&lt;/li&gt;
&lt;li&gt;What changed on disk?&lt;/li&gt;
&lt;li&gt;Which policy decision allowed each side effect?&lt;/li&gt;
&lt;li&gt;Can we replay the session and confirm the artifact is intact?&lt;/li&gt;
&lt;/ol&gt;

&lt;p&gt;These are not exotic asks. They are the same questions a careful human reviewer asks of a colleague's branch. The difference is that an agent does ten times the moves in one tenth the time, and the rest of the system has not caught up.&lt;/p&gt;

&lt;h2&gt;
  
  
  What I tried first
&lt;/h2&gt;

&lt;p&gt;The first instinct, the same one I had, is to slap a logger on the agent. That helps for a week.&lt;/p&gt;

&lt;p&gt;Then you discover three things.&lt;/p&gt;

&lt;p&gt;First, logs are not evidence. Logs are a developer's artifact. Evidence is a reviewer's artifact. The shape is different, the audience is different, and the integrity guarantees are different.&lt;/p&gt;

&lt;p&gt;Second, the chain is the point. A flat log file does not let you say "this file change was caused by that tool call which was caused by that model response". You need a chain of events, ordered, with parent linkage.&lt;/p&gt;

&lt;p&gt;Third, the export is the work. The audit will not happen on your laptop. The bundle has to travel. It has to verify offline. It has to be portable across machines that have never seen your repo.&lt;/p&gt;

&lt;p&gt;By the time I had stitched all of that together with shell scripts, I had a small, ugly piece of infrastructure. I deleted the scripts and started Akmon.&lt;/p&gt;

&lt;h2&gt;
  
  
  What Akmon is, in one paragraph
&lt;/h2&gt;

&lt;p&gt;Akmon is a single Rust binary. It is an AI coding agent (interactive TUI plus a headless &lt;code&gt;--task&lt;/code&gt; mode), built for environments where the agent's reasoning, tool calls, and file changes have to be reviewable later. Every session writes a tamper-evident, content-addressed audit chain to &lt;code&gt;.akmon/audit/&amp;lt;session-id&amp;gt;.jsonl&lt;/code&gt; and a structured evidence summary to &lt;code&gt;.akmon/evidence/&amp;lt;session-id&amp;gt;.json&lt;/code&gt;. Sessions can be replayed deterministically, compared with diff, redacted before external review, and exported as portable AGEF bundles.&lt;/p&gt;

&lt;p&gt;Akmon supports Anthropic, OpenAI, OpenRouter, Groq, Azure OpenAI, Bedrock, OpenAI-compatible endpoints, and Ollama. Model choice is operator controlled. There is no hosted runtime to log into. You run it on your laptop, your CI runner, or your hardened SSH host.&lt;/p&gt;

&lt;p&gt;License is Apache-2.0. The repo is at &lt;code&gt;github.com/radotsvetkov/akmon&lt;/code&gt;. v2.0.0 is the current line.&lt;/p&gt;

&lt;h2&gt;
  
  
  Five commands you can try this afternoon
&lt;/h2&gt;

&lt;p&gt;Install the binary, then walk this short pipeline.&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight shell"&gt;&lt;code&gt;&lt;span class="c"&gt;# Run a small headless task in your project.&lt;/span&gt;
&lt;span class="nb"&gt;cd &lt;/span&gt;your-project
akmon &lt;span class="nt"&gt;--yes&lt;/span&gt; &lt;span class="nt"&gt;--output&lt;/span&gt; json &lt;span class="nt"&gt;--task&lt;/span&gt; &lt;span class="s2"&gt;"summarize failing tests and propose minimal fixes"&lt;/span&gt; | &lt;span class="nb"&gt;tee &lt;/span&gt;run.json

&lt;span class="c"&gt;# Verify the tamper-evident audit chain for the session.&lt;/span&gt;
akmon audit verify .akmon/audit/&amp;lt;session-id&amp;gt;.jsonl

&lt;span class="c"&gt;# Verify the evidence schema and the linkage to the audit chain.&lt;/span&gt;
akmon evidence verify .akmon/evidence/&amp;lt;session-id&amp;gt;.json

&lt;span class="c"&gt;# Enforce a single-run SLO check.&lt;/span&gt;
akmon slo verify .akmon/evidence/&amp;lt;session-id&amp;gt;.json &lt;span class="nt"&gt;--strict&lt;/span&gt;

&lt;span class="c"&gt;# Detect regressions vs a historical baseline.&lt;/span&gt;
akmon slo trend .akmon/evidence/&amp;lt;session-id&amp;gt;.json &lt;span class="se"&gt;\&lt;/span&gt;
  &lt;span class="nt"&gt;--baseline-dir&lt;/span&gt; .akmon/evidence/history &lt;span class="se"&gt;\&lt;/span&gt;
  &lt;span class="nt"&gt;--window&lt;/span&gt; 20 &lt;span class="se"&gt;\&lt;/span&gt;
  &lt;span class="nt"&gt;--strict&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;Five commands, five exit codes, five crisp signals. None of them depend on a dashboard. All of them gate cleanly in CI.&lt;/p&gt;
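
&lt;p&gt;To make the CI gating concrete, here is a minimal sketch that wraps the four verification commands above in one shell function and stops at the first non-zero exit code. The function name &lt;code&gt;akmon_gate&lt;/code&gt; and the way the session id is passed in are my assumptions for illustration; only the &lt;code&gt;akmon&lt;/code&gt; invocations themselves come from the pipeline above.&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight shell"&gt;&lt;code&gt;# Hypothetical helper, not part of Akmon: gate one session in CI.
# Usage: akmon_gate SESSION_ID
akmon_gate() {
  session_id="$1"
  evidence=".akmon/evidence/${session_id}.json"

  # Each step returns early, so the first failing check decides the result.
  akmon audit verify ".akmon/audit/${session_id}.jsonl" || return 1
  akmon evidence verify "$evidence" || return 1
  akmon slo verify "$evidence" --strict || return 1
  akmon slo trend "$evidence" \
    --baseline-dir .akmon/evidence/history \
    --window 20 \
    --strict || return 1
}
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;

&lt;p&gt;A CI job would call &lt;code&gt;akmon_gate&lt;/code&gt; with the session id and fail the build on a non-zero return. Keeping the four checks as separate commands means the job log shows which one tripped.&lt;/p&gt;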

&lt;h2&gt;
  
  
  The one design decision that matters most
&lt;/h2&gt;

&lt;p&gt;Akmon ships as a single Rust binary on purpose. That choice does work most teams underestimate.&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Runtime state is explicit. There is no plugin runtime to drift.&lt;/li&gt;
&lt;li&gt;Behavior is reproducible across laptops, CI runners, SSH hosts, and air-gapped environments.&lt;/li&gt;
&lt;li&gt;Troubleshooting tends to focus on policy, providers, repository state, or model behavior. Not host runtime mismatch.&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;If two machines run the same Akmon version, you can treat them as equivalent. That sounds boring. It is the difference between a tool you can support and one you cannot.&lt;/p&gt;

&lt;h2&gt;
  
  
  Policy is a deterministic merge
&lt;/h2&gt;

&lt;p&gt;Policy in Akmon comes from four layers, in a fixed order:&lt;/p&gt;

&lt;ol&gt;
&lt;li&gt;Built-in profile (&lt;code&gt;dev&lt;/code&gt;, &lt;code&gt;staging&lt;/code&gt;, &lt;code&gt;prod&lt;/code&gt;).&lt;/li&gt;
&lt;li&gt;Policy packs (TOML or JSON files in &lt;code&gt;.akmon/policy-packs/&lt;/code&gt;).&lt;/li&gt;
&lt;li&gt;Project-local policy (&lt;code&gt;.akmon/policy.toml&lt;/code&gt; or &lt;code&gt;.akmon/policy.json&lt;/code&gt;).&lt;/li&gt;
&lt;li&gt;CLI override (&lt;code&gt;--policy-override&lt;/code&gt;).&lt;/li&gt;
&lt;/ol&gt;

&lt;p&gt;As the layers merge, list fields append and then deduplicate, keeping the last occurrence of each entry, so a rule repeated by a higher-precedence layer takes its later position in the order. The effective policy is inspectable.&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight shell"&gt;&lt;code&gt;akmon policy show-effective &lt;span class="nt"&gt;--profile&lt;/span&gt; prod &lt;span class="se"&gt;\&lt;/span&gt;
  &lt;span class="nt"&gt;--policy-pack&lt;/span&gt; .akmon/policy-packs/org.toml &lt;span class="se"&gt;\&lt;/span&gt;
  &lt;span class="nt"&gt;--policy-pack&lt;/span&gt; .akmon/policy-packs/team.toml
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;Tickets that start with "I think the policy was" become tickets that end with "the policy was, here is the merged TOML".&lt;/p&gt;
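
&lt;p&gt;The list rule (append, then deduplicate while keeping the last occurrence) is easy to misread, so here is a small sketch of it in shell. It is an illustration of the rule only, not Akmon's code: the function name &lt;code&gt;merge_list_field&lt;/code&gt; and the entry names in the usage note are made up, and the input is assumed to be one list entry per line, lowest-precedence layer first.&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight shell"&gt;&lt;code&gt;# Hypothetical illustration: read list entries on stdin, one per line,
# in layer order, and print them deduplicated with the LAST occurrence kept.
merge_list_field() {
  awk '
    { last[$0] = NR; item[NR] = $0 }   # remember where each entry appears last
    END {
      i = 1
      while (i != NR + 1) {
        # Print an entry only at the position of its final occurrence.
        if (last[item[i]] == i) print item[i]
        i++
      }
    }
  '
}
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;

&lt;p&gt;For example, &lt;code&gt;printf '%s\n' read_file run_tests run_tests write_file read_file | merge_list_field&lt;/code&gt; prints &lt;code&gt;run_tests&lt;/code&gt;, &lt;code&gt;write_file&lt;/code&gt;, &lt;code&gt;read_file&lt;/code&gt;: the repeated &lt;code&gt;read_file&lt;/code&gt; moves to the position of its last appearance.&lt;/p&gt;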

&lt;h2&gt;
  
  
  Why an open evidence format
&lt;/h2&gt;

&lt;p&gt;After Akmon's audit chain was working, the next gap was portability. A reviewer who needs to confirm a session should not need Akmon installed. They should need a verifier and a bundle.&lt;/p&gt;

&lt;p&gt;That is what AGEF is. AGEF is the Agent Governance Evidence Format. It is an open spec, governed in its own repo at &lt;code&gt;github.com/radotsvetkov/agef&lt;/code&gt;. A bundle is a &lt;code&gt;tar.zst&lt;/code&gt; archive with &lt;code&gt;manifest.json&lt;/code&gt;, an &lt;code&gt;events.bin&lt;/code&gt; stream of length-delimited canonical CBOR records, and a directory of content-addressed &lt;code&gt;objects/&amp;lt;hex&amp;gt;&lt;/code&gt; files. The current spec text is &lt;code&gt;v0.1.1&lt;/code&gt;; the wire format version is &lt;code&gt;0.1&lt;/code&gt;. Spec text is CC BY 4.0. Code is Apache-2.0.&lt;/p&gt;

&lt;p&gt;Akmon is the reference implementation. The journal substrate (&lt;code&gt;akmon-journal&lt;/code&gt;) is a Substrate Profile under AGEF v0.1. Bundle Profile (full export and import) is part of Akmon Phase 4.&lt;/p&gt;

&lt;p&gt;I will go deep on AGEF in the next post. The point for this one is that the format is real, written, and meant to outlive any one tool, including mine.&lt;/p&gt;

&lt;h2&gt;
  
  
  Who Akmon is for
&lt;/h2&gt;

&lt;p&gt;Three concrete audiences, in order.&lt;/p&gt;

&lt;p&gt;First, senior engineers in regulated codebases. Avionics (DO-178C, DO-330 tool qualification), medical devices (IEC 62304), automotive (ISO 26262, ASPICE), industrial control (IEC 61508), defense (CMMC). If your build process treats unreviewable AI output as inadmissible, Akmon is built for you.&lt;/p&gt;

&lt;p&gt;Second, DevSecOps and platform engineers in finance. SOC 2 evidence, internal audit on AI use, vendor risk on model providers. The trust pipeline maps cleanly to these controls.&lt;/p&gt;

&lt;p&gt;Third, AI tooling leads at companies that have rolled out AI coding tools and now have to answer the next layer of questions about provenance, regression detection, and review fatigue.&lt;/p&gt;

&lt;p&gt;If none of that fits, you might still find the format interesting. AGEF is meant to be useful beyond Akmon.&lt;/p&gt;

&lt;h2&gt;
  
  
  What Akmon is not
&lt;/h2&gt;

&lt;p&gt;A few things to keep the conversation honest.&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Akmon is not the fastest autocomplete. It is the agent that records what it did. If your workflow does not need that, the trade may not be worth it.&lt;/li&gt;
&lt;li&gt;Akmon is not a hosted SaaS runtime. You run the binary. Your data stays where you put it.&lt;/li&gt;
&lt;li&gt;Akmon is not a guarantee that the model behaves. The model is the model. Akmon makes the consequences of misbehavior bounded, observable, and provable.&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;That last point matters. Governance is not the absence of bad behavior. It is the ability to prove what happened.&lt;/p&gt;

&lt;h2&gt;
  
  
  Where to start
&lt;/h2&gt;

&lt;p&gt;If anything in this post hit close to home, the smallest first step is one minute long.&lt;/p&gt;

&lt;ol&gt;
&lt;li&gt;Install the binary.&lt;/li&gt;
&lt;li&gt;Run &lt;code&gt;akmon --yes --task "..."&lt;/code&gt; on a sandbox project.&lt;/li&gt;
&lt;li&gt;Run the five-command trust pipeline above.&lt;/li&gt;
&lt;li&gt;Open the AGEF section of the docs and read what a &lt;code&gt;manifest.json&lt;/code&gt; and an &lt;code&gt;events.bin&lt;/code&gt; look like. Pretend you are the reviewer.&lt;/li&gt;
&lt;/ol&gt;

&lt;p&gt;If that bundle answers the five questions in the second section without your help, you have closed the gap I described at the start of this post.&lt;/p&gt;

&lt;p&gt;The repo is at &lt;a href="https://github.com/radotsvetkov/akmon" rel="noopener noreferrer"&gt;github.com/radotsvetkov/akmon&lt;/a&gt;. The format spec is at &lt;a href="https://github.com/radotsvetkov/agef" rel="noopener noreferrer"&gt;github.com/radotsvetkov/agef&lt;/a&gt;. The site lives at &lt;a href="https://radotsvetkov.github.io/akmon/" rel="noopener noreferrer"&gt;radotsvetkov.github.io/akmon&lt;/a&gt;. The next post in this series goes deep on AGEF.&lt;/p&gt;

</description>
      <category>ai</category>
      <category>rust</category>
      <category>devops</category>
      <category>security</category>
    </item>
    <item>
      <title>Why I built Akmon, the AI coding agent for regulated engineering</title>
      <dc:creator>Radoslav Tsvetkov</dc:creator>
      <pubDate>Wed, 06 May 2026 21:03:46 +0000</pubDate>
      <link>https://dev.to/radotsvetkov/why-i-built-akmon-the-ai-coding-agent-for-regulated-engineering-5elp</link>
      <guid>https://dev.to/radotsvetkov/why-i-built-akmon-the-ai-coding-agent-for-regulated-engineering-5elp</guid>
      <description>&lt;p&gt;For the last two years I have watched the same conversation happen in every regulated engineering team I work with. Someone tries the new AI coding agent. It writes a real diff. The team is impressed. Then somebody asks the question that ends the experiment.&lt;/p&gt;

&lt;blockquote&gt;
&lt;p&gt;If we merge this, what do we tell the auditor?&lt;/p&gt;
&lt;/blockquote&gt;

&lt;p&gt;The agent did the work. The session was a TUI window that closed. The diff is in git. The reasoning is gone. Whatever policy the agent used, whatever model version it called, whatever tool it ran, none of that survived. In a hobby project that is fine. In avionics, in a medical device, in a SOC 2 release process, in a CMMC-bound codebase, that is a hard stop.&lt;/p&gt;

&lt;p&gt;This post is the short story of why I started Akmon, what it actually is, and the small set of commands you can try this afternoon.&lt;/p&gt;

&lt;h2&gt;
  
  
  The problem in one sentence
&lt;/h2&gt;

&lt;p&gt;AI coding agents produce code. They do not produce evidence. In regulated engineering, code without evidence is a liability.&lt;/p&gt;

&lt;p&gt;By "evidence" I mean an artifact that lets a reviewer answer five questions without your help.&lt;/p&gt;

&lt;ol&gt;
&lt;li&gt;What did the agent read?&lt;/li&gt;
&lt;li&gt;What tools did it call, with what inputs?&lt;/li&gt;
&lt;li&gt;What changed on disk?&lt;/li&gt;
&lt;li&gt;Which policy decision allowed each side effect?&lt;/li&gt;
&lt;li&gt;Can we replay the session and confirm the artifact is intact?&lt;/li&gt;
&lt;/ol&gt;

&lt;p&gt;These are not exotic asks. They are the same questions a careful human reviewer asks of a colleague's branch. The difference is that an agent does ten times the moves in one tenth the time, and the rest of the system has not caught up.&lt;/p&gt;

&lt;h2&gt;
  
  
  What I tried first
&lt;/h2&gt;

&lt;p&gt;The first instinct, the same one I had, is to slap a logger on the agent. That helps for a week.&lt;/p&gt;

&lt;p&gt;Then you discover three things.&lt;/p&gt;

&lt;p&gt;First, logs are not evidence. Logs are a developer's artifact. Evidence is a reviewer's artifact. The shape is different, the audience is different, and the integrity guarantees are different.&lt;/p&gt;

&lt;p&gt;Second, the chain is the point. A flat log file does not let you say "this file change was caused by that tool call which was caused by that model response". You need a chain of events, ordered, with parent linkage.&lt;/p&gt;

&lt;p&gt;Third, the export is the work. The audit will not happen on your laptop. The bundle has to travel. It has to verify offline. It has to be portable across machines that have never seen your repo.&lt;/p&gt;

&lt;p&gt;By the time I had stitched all of that together with shell scripts, I had a small, ugly piece of infrastructure. I deleted the scripts and started Akmon.&lt;/p&gt;

&lt;h2&gt;
  
  
  What Akmon is, in one paragraph
&lt;/h2&gt;

&lt;p&gt;Akmon is a single Rust binary. It is an AI coding agent (interactive TUI plus a headless &lt;code&gt;--task&lt;/code&gt; mode), built for environments where the agent's reasoning, tool calls, and file changes have to be reviewable later. Every session writes a tamper-evident, content-addressed audit chain to &lt;code&gt;.akmon/audit/&amp;lt;session-id&amp;gt;.jsonl&lt;/code&gt; and a structured evidence summary to &lt;code&gt;.akmon/evidence/&amp;lt;session-id&amp;gt;.json&lt;/code&gt;. Sessions can be replayed deterministically, compared with diff, redacted before external review, and exported as portable AGEF bundles.&lt;/p&gt;

&lt;p&gt;Akmon supports Anthropic, OpenAI, OpenRouter, Groq, Azure OpenAI, Bedrock, OpenAI-compatible endpoints, and Ollama. Model choice is operator controlled. There is no hosted runtime to log into. You run it on your laptop, your CI runner, or your hardened SSH host.&lt;/p&gt;

&lt;p&gt;License is Apache-2.0. The repo is at &lt;code&gt;github.com/radotsvetkov/akmon&lt;/code&gt;. v2.0.0 is the current line.&lt;/p&gt;

&lt;h2&gt;
  
  
  Five commands you can try this afternoon
&lt;/h2&gt;

&lt;p&gt;Install the binary, then walk this short pipeline.&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight shell"&gt;&lt;code&gt;&lt;span class="c"&gt;# Run a small headless task in your project.&lt;/span&gt;
&lt;span class="nb"&gt;cd &lt;/span&gt;your-project
akmon &lt;span class="nt"&gt;--yes&lt;/span&gt; &lt;span class="nt"&gt;--output&lt;/span&gt; json &lt;span class="nt"&gt;--task&lt;/span&gt; &lt;span class="s2"&gt;"summarize failing tests and propose minimal fixes"&lt;/span&gt; | &lt;span class="nb"&gt;tee &lt;/span&gt;run.json

&lt;span class="c"&gt;# Verify the tamper-evident audit chain for the session.&lt;/span&gt;
akmon audit verify .akmon/audit/&amp;lt;session-id&amp;gt;.jsonl

&lt;span class="c"&gt;# Verify the evidence schema and the linkage to the audit chain.&lt;/span&gt;
akmon evidence verify .akmon/evidence/&amp;lt;session-id&amp;gt;.json

&lt;span class="c"&gt;# Enforce a single-run SLO check.&lt;/span&gt;
akmon slo verify .akmon/evidence/&amp;lt;session-id&amp;gt;.json &lt;span class="nt"&gt;--strict&lt;/span&gt;

&lt;span class="c"&gt;# Detect regressions vs a historical baseline.&lt;/span&gt;
akmon slo trend .akmon/evidence/&amp;lt;session-id&amp;gt;.json &lt;span class="se"&gt;\&lt;/span&gt;
  &lt;span class="nt"&gt;--baseline-dir&lt;/span&gt; .akmon/evidence/history &lt;span class="se"&gt;\&lt;/span&gt;
  &lt;span class="nt"&gt;--window&lt;/span&gt; 20 &lt;span class="se"&gt;\&lt;/span&gt;
  &lt;span class="nt"&gt;--strict&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;Five commands, five exit codes, five crisp signals. None of them depend on a dashboard. All of them gate cleanly in CI.&lt;/p&gt;

&lt;h2&gt;
  
  
  The one design decision that matters most
&lt;/h2&gt;

&lt;p&gt;Akmon ships as a single Rust binary on purpose. That choice does work most teams underestimate.&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Runtime state is explicit. There is no plugin runtime to drift.&lt;/li&gt;
&lt;li&gt;Behavior is reproducible across laptops, CI runners, SSH hosts, and air-gapped environments.&lt;/li&gt;
&lt;li&gt;Troubleshooting tends to focus on policy, providers, repository state, or model behavior. Not host runtime mismatch.&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;If two machines run the same Akmon version, you can treat them as equivalent. That sounds boring. It is the difference between a tool you can support and one you cannot.&lt;/p&gt;

&lt;h2&gt;
  
  
  Policy is a deterministic merge
&lt;/h2&gt;

&lt;p&gt;Policy in Akmon comes from four layers, in a fixed order:&lt;/p&gt;

&lt;ol&gt;
&lt;li&gt;Built-in profile (&lt;code&gt;dev&lt;/code&gt;, &lt;code&gt;staging&lt;/code&gt;, &lt;code&gt;prod&lt;/code&gt;).&lt;/li&gt;
&lt;li&gt;Policy packs (TOML or JSON files in &lt;code&gt;.akmon/policy-packs/&lt;/code&gt;).&lt;/li&gt;
&lt;li&gt;Project-local policy (&lt;code&gt;.akmon/policy.toml&lt;/code&gt; or &lt;code&gt;.akmon/policy.json&lt;/code&gt;).&lt;/li&gt;
&lt;li&gt;CLI override (&lt;code&gt;--policy-override&lt;/code&gt;).&lt;/li&gt;
&lt;/ol&gt;

&lt;p&gt;As the layers merge, list fields append and then deduplicate, keeping the last occurrence of each entry, so a rule repeated by a higher-precedence layer takes its later position in the order. The effective policy is inspectable.&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight shell"&gt;&lt;code&gt;akmon policy show-effective &lt;span class="nt"&gt;--profile&lt;/span&gt; prod &lt;span class="se"&gt;\&lt;/span&gt;
  &lt;span class="nt"&gt;--policy-pack&lt;/span&gt; .akmon/policy-packs/org.toml &lt;span class="se"&gt;\&lt;/span&gt;
  &lt;span class="nt"&gt;--policy-pack&lt;/span&gt; .akmon/policy-packs/team.toml
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;Tickets that start with "I think the policy was" become tickets that end with "the policy was, here is the merged TOML".&lt;/p&gt;

&lt;h2&gt;
  
  
  Why an open evidence format
&lt;/h2&gt;

&lt;p&gt;After Akmon's audit chain was working, the next gap was portability. A reviewer who needs to confirm a session should not need Akmon installed. They should need a verifier and a bundle.&lt;/p&gt;

&lt;p&gt;That is what AGEF is. AGEF is the Agent Governance Evidence Format. It is an open spec, governed in its own repo at &lt;code&gt;github.com/radotsvetkov/agef&lt;/code&gt;. A bundle is a &lt;code&gt;tar.zst&lt;/code&gt; archive with &lt;code&gt;manifest.json&lt;/code&gt;, an &lt;code&gt;events.bin&lt;/code&gt; stream of length-delimited canonical CBOR records, and a directory of content-addressed &lt;code&gt;objects/&amp;lt;hex&amp;gt;&lt;/code&gt; files. The current spec text is &lt;code&gt;v0.1.1&lt;/code&gt;; the wire format version is &lt;code&gt;0.1&lt;/code&gt;. Spec text is CC BY 4.0. Code is Apache-2.0.&lt;/p&gt;

&lt;p&gt;Akmon is the reference implementation. The journal substrate (&lt;code&gt;akmon-journal&lt;/code&gt;) is a Substrate Profile under AGEF v0.1. Bundle Profile (full export and import) is part of Akmon Phase 4.&lt;/p&gt;

&lt;p&gt;I will go deep on AGEF in the next post. The point for this one is that the format is real, written, and meant to outlive any one tool, including mine.&lt;/p&gt;

&lt;h2&gt;
  
  
  Who Akmon is for
&lt;/h2&gt;

&lt;p&gt;Three concrete audiences, in order.&lt;/p&gt;

&lt;p&gt;First, senior engineers in regulated codebases. Avionics (DO-178C, DO-330 tool qualification), medical devices (IEC 62304), automotive (ISO 26262, ASPICE), industrial control (IEC 61508), defense (CMMC). If your build process treats unreviewable AI output as inadmissible, Akmon is built for you.&lt;/p&gt;

&lt;p&gt;Second, DevSecOps and platform engineers in finance. SOC 2 evidence, internal audit on AI use, vendor risk on model providers. The trust pipeline maps cleanly to these controls.&lt;/p&gt;

&lt;p&gt;Third, AI tooling leads at companies that have rolled out AI coding tools and now have to answer the next layer of questions about provenance, regression detection, and review fatigue.&lt;/p&gt;

&lt;p&gt;If none of that fits, you might still find the format interesting. AGEF is meant to be useful beyond Akmon.&lt;/p&gt;

&lt;h2&gt;
  
  
  What Akmon is not
&lt;/h2&gt;

&lt;p&gt;A few things to keep the conversation honest.&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Akmon is not the fastest autocomplete. It is the agent that records what it did. If your workflow does not need that, the trade may not be worth it.&lt;/li&gt;
&lt;li&gt;Akmon is not a hosted SaaS runtime. You run the binary. Your data stays where you put it.&lt;/li&gt;
&lt;li&gt;Akmon is not a guarantee that the model behaves. The model is the model. Akmon makes the consequences of misbehavior bounded, observable, and provable.&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;That last point matters. Governance is not the absence of bad behavior. It is the ability to prove what happened.&lt;/p&gt;

&lt;h2&gt;
  
  
  Where to start
&lt;/h2&gt;

&lt;p&gt;If anything in this post hit close to home, the smallest first step is one minute long.&lt;/p&gt;

&lt;ol&gt;
&lt;li&gt;Install the binary.&lt;/li&gt;
&lt;li&gt;Run &lt;code&gt;akmon --yes --task "..."&lt;/code&gt; on a sandbox project.&lt;/li&gt;
&lt;li&gt;Run the five-command trust pipeline above.&lt;/li&gt;
&lt;li&gt;Open the AGEF section of the docs and read what a &lt;code&gt;manifest.json&lt;/code&gt; and an &lt;code&gt;events.bin&lt;/code&gt; look like. Pretend you are the reviewer.&lt;/li&gt;
&lt;/ol&gt;

&lt;p&gt;If that bundle answers the five questions in the second section without your help, you have closed the gap I described at the start of this post.&lt;/p&gt;

&lt;p&gt;The repo is at &lt;a href="https://github.com/radotsvetkov/akmon" rel="noopener noreferrer"&gt;github.com/radotsvetkov/akmon&lt;/a&gt;. The format spec is at &lt;a href="https://github.com/radotsvetkov/agef" rel="noopener noreferrer"&gt;github.com/radotsvetkov/agef&lt;/a&gt;. The site lives at &lt;a href="https://radotsvetkov.github.io/akmon/" rel="noopener noreferrer"&gt;radotsvetkov.github.io/akmon&lt;/a&gt;.&lt;/p&gt;

</description>
      <category>ai</category>
      <category>rust</category>
      <category>devops</category>
      <category>security</category>
    </item>
    <item>
      <title>AGEF explained: a portable evidence format for AI agent sessions</title>
      <dc:creator>Radoslav Tsvetkov</dc:creator>
      <pubDate>Wed, 06 May 2026 20:57:19 +0000</pubDate>
      <link>https://dev.to/radotsvetkov/agef-explained-a-portable-evidence-format-for-ai-agent-sessions-2no5</link>
      <guid>https://dev.to/radotsvetkov/agef-explained-a-portable-evidence-format-for-ai-agent-sessions-2no5</guid>
      <description>&lt;p&gt;If you ship AI-assisted code in a regulated codebase and somebody asks "show me what the agent did", you have about a week before that question turns into a finding. The data exists somewhere. It is not in a single shape. It is rarely portable. It is almost never tamper-evident.&lt;/p&gt;

&lt;p&gt;AGEF is the spec I wrote to fix that. It stands for Agent Governance Evidence Format. The current text is &lt;code&gt;v0.1.1&lt;/code&gt; (pre-stable), with wire format &lt;code&gt;agef_version: "0.1"&lt;/code&gt;. The repo is at &lt;code&gt;github.com/radotsvetkov/agef&lt;/code&gt;. Spec text is CC BY 4.0. Code is Apache-2.0. Akmon is the reference implementation; &lt;code&gt;akmon-journal&lt;/code&gt; is a Substrate Profile, and Akmon Phase 4 brings full Bundle Profile support.&lt;/p&gt;

&lt;p&gt;This article walks the spec end to end, with code, and with the design choices I made.&lt;/p&gt;

&lt;h2&gt;
  
  
  What AGEF actually is
&lt;/h2&gt;

&lt;p&gt;AGEF defines how one AI agent session can be represented as a portable, tamper-evident bundle. A session is a logical run from &lt;code&gt;SessionStart&lt;/code&gt; to &lt;code&gt;SessionEnd&lt;/code&gt;. The bundle captures every event in order, with cryptographic linkage and content-addressed payloads.&lt;/p&gt;

&lt;p&gt;The bundle is a &lt;code&gt;tar.zst&lt;/code&gt; archive with three top-level paths:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;
&lt;code&gt;manifest.json&lt;/code&gt;, a small UTF-8 JSON file with sorted keys and LF newlines.&lt;/li&gt;
&lt;li&gt;
&lt;code&gt;events.bin&lt;/code&gt;, an ordered stream of length-delimited canonical CBOR event records.&lt;/li&gt;
&lt;li&gt;
&lt;code&gt;objects/&amp;lt;hex&amp;gt;&lt;/code&gt;, a directory of content-addressed object files (one per hash).&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;That is the whole shape. Verifiers ignore unknown non-normative files unless explicitly configured to reject them.&lt;/p&gt;

&lt;h2&gt;
  
  
  The manifest
&lt;/h2&gt;



&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight json"&gt;&lt;code&gt;&lt;span class="p"&gt;{&lt;/span&gt;&lt;span class="w"&gt;
  &lt;/span&gt;&lt;span class="nl"&gt;"agef_version"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="s2"&gt;"0.1"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;&lt;span class="w"&gt;
  &lt;/span&gt;&lt;span class="nl"&gt;"producer"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="p"&gt;{&lt;/span&gt;&lt;span class="nl"&gt;"name"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="s2"&gt;"akmon"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="nl"&gt;"version"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="s2"&gt;"2.0.0"&lt;/span&gt;&lt;span class="p"&gt;},&lt;/span&gt;&lt;span class="w"&gt;
  &lt;/span&gt;&lt;span class="nl"&gt;"session"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="p"&gt;{&lt;/span&gt;&lt;span class="w"&gt;
    &lt;/span&gt;&lt;span class="nl"&gt;"id"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="s2"&gt;"550e8400-e29b-41d4-a716-446655440000"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;&lt;span class="w"&gt;
    &lt;/span&gt;&lt;span class="nl"&gt;"head"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="s2"&gt;"5c1f8b2a3d4e7f6a8b1d3e2c4f6a8b9d2c3a4b5c6d7e8f9a0b1c2d3e4f5a6b7c"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;&lt;span class="w"&gt;
    &lt;/span&gt;&lt;span class="nl"&gt;"created_at"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="s2"&gt;"2026-05-06T09:14:02Z"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;&lt;span class="w"&gt;
    &lt;/span&gt;&lt;span class="nl"&gt;"ended_at"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="s2"&gt;"2026-05-06T09:14:18Z"&lt;/span&gt;&lt;span class="w"&gt;
  &lt;/span&gt;&lt;span class="p"&gt;},&lt;/span&gt;&lt;span class="w"&gt;
  &lt;/span&gt;&lt;span class="nl"&gt;"hash_algorithm"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="s2"&gt;"sha256"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;&lt;span class="w"&gt;
  &lt;/span&gt;&lt;span class="nl"&gt;"object_count"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="mi"&gt;12&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;&lt;span class="w"&gt;
  &lt;/span&gt;&lt;span class="nl"&gt;"event_count"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="mi"&gt;9&lt;/span&gt;&lt;span class="w"&gt;
&lt;/span&gt;&lt;span class="p"&gt;}&lt;/span&gt;&lt;span class="w"&gt;
&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;A reader who has never seen the producer can answer four things from this object: which version of the format, who wrote it, when, and how big it is. Writers must keep counts honest. Readers must reject malformed or incomplete required fields.&lt;/p&gt;

&lt;p&gt;Default hash algorithm is &lt;code&gt;sha256&lt;/code&gt;. Readers must support &lt;code&gt;sha256&lt;/code&gt; and may support &lt;code&gt;blake3&lt;/code&gt;. Bundles that declare unsupported algorithms must be rejected. Hex appears only in the manifest (&lt;code&gt;session.head&lt;/code&gt;) and in object filenames. Hashes inside CBOR-encoded events are 32-byte byte strings.&lt;/p&gt;
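
&lt;p&gt;A minimal sketch of the reader-side manifest check, assuming every field in the example above is required (the spec is the authority on which ones actually are). The name &lt;code&gt;check_manifest&lt;/code&gt; is made up for illustration and is not part of Akmon or the spec.&lt;/p&gt;

```python
import json

SUPPORTED_ALGORITHMS = {"sha256"}  # blake3 is optional and left out of this sketch
TOP_LEVEL_FIELDS = ("agef_version", "producer", "session",
                    "hash_algorithm", "object_count", "event_count")
SESSION_FIELDS = ("id", "head", "created_at", "ended_at")


def check_manifest(text: str) -> tuple[dict, bytes]:
    """Parse manifest.json text; return the manifest and session.head as raw bytes."""
    manifest = json.loads(text)
    # Assumption: every field shown in the example manifest is required.
    for key in TOP_LEVEL_FIELDS:
        if key not in manifest:
            raise ValueError(f"missing required field: {key}")
    for key in SESSION_FIELDS:
        if key not in manifest["session"]:
            raise ValueError(f"missing required field: session.{key}")
    if manifest["agef_version"] != "0.1":
        raise ValueError("unsupported agef_version")
    if manifest["hash_algorithm"] not in SUPPORTED_ALGORITHMS:
        raise ValueError("unsupported hash_algorithm")
    # Hex in the manifest, 32 raw bytes inside CBOR-encoded events.
    head = bytes.fromhex(manifest["session"]["head"])
    if len(head) != 32:
        raise ValueError("session.head must decode to 32 bytes")
    return manifest, head
```

&lt;p&gt;The returned bytes are what a verifier would compare against the hash of the terminal event, since hashes inside events are byte strings and not hex.&lt;/p&gt;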

&lt;h2&gt;
  
  
  The events stream
&lt;/h2&gt;

&lt;p&gt;&lt;code&gt;events.bin&lt;/code&gt; is the substance. It is a sequence of records, each framed by a 4-byte unsigned big-endian length prefix followed by exactly that many bytes of canonical CBOR encoding one event.&lt;/p&gt;

&lt;p&gt;Length-delimited framing was a deliberate trade. It supports partial recovery from truncation and lets a verifier scan deterministically. Producers commit to canonical CBOR per RFC 8949.&lt;/p&gt;
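
&lt;p&gt;The framing is simple enough to sketch without a CBOR library. The helper names &lt;code&gt;read_frames&lt;/code&gt; and &lt;code&gt;write_frame&lt;/code&gt; below are hypothetical; the payloads are treated as opaque bytes, and decoding them is a separate step.&lt;/p&gt;

```python
import struct


def write_frame(payload: bytes) -> bytes:
    """Prefix one encoded event with its 4-byte unsigned big-endian length."""
    return struct.pack(">I", len(payload)) + payload


def read_frames(data: bytes) -> tuple[list[bytes], bool]:
    """Split the bytes of events.bin into event payloads.

    Returns (frames, truncated). On truncation, the complete frames read so
    far are kept, which is the partial-recovery property of this framing.
    """
    frames = []
    offset = 0
    while offset != len(data):
        header = data[offset:offset + 4]
        if len(header) != 4:
            return frames, True  # cut off inside a length prefix
        (length,) = struct.unpack(">I", header)
        payload = data[offset + 4:offset + 4 + length]
        if len(payload) != length:
            return frames, True  # cut off inside a payload
        frames.append(payload)
        offset += 4 + length
    return frames, False
```

&lt;p&gt;A verifier would treat &lt;code&gt;truncated == True&lt;/code&gt; as a failure; a recovery tool could still inspect the frames that came back.&lt;/p&gt;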

&lt;p&gt;Each event encodes:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;
&lt;code&gt;parents&lt;/code&gt;, an array of zero or more parent event hashes.&lt;/li&gt;
&lt;li&gt;
&lt;code&gt;kind&lt;/code&gt;, one of the closed event kinds in v0.1.&lt;/li&gt;
&lt;li&gt;
&lt;code&gt;emitted_at&lt;/code&gt;, a CBOR tag 1 timestamp (integer epoch seconds, or floating point for sub-second precision).&lt;/li&gt;
&lt;li&gt;
&lt;code&gt;sequence&lt;/code&gt;, monotonic per session, starting at 0.&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;Linkage rules are strict in v0.1:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;
&lt;code&gt;sequence&lt;/code&gt; starts at 0 and increases by exactly 1 per event.&lt;/li&gt;
&lt;li&gt;
&lt;code&gt;SessionStart&lt;/code&gt; has exactly zero parents.&lt;/li&gt;
&lt;li&gt;Every other event has exactly one parent in v0.1: the hash of the immediately preceding event by sequence. Multi-parent events are reserved for future versions.&lt;/li&gt;
&lt;li&gt;Event hashes are computed over canonical CBOR bytes of the full event envelope.&lt;/li&gt;
&lt;li&gt;Event ordering in &lt;code&gt;events.bin&lt;/code&gt; matches &lt;code&gt;sequence&lt;/code&gt;.&lt;/li&gt;
&lt;/ul&gt;
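
&lt;p&gt;The linkage rules can be sketched as a single pass over the stream. This assumes each event has already been decoded into a dict by a CBOR library, that &lt;code&gt;kind&lt;/code&gt; decodes to a string, and that the raw frame bytes are kept for hashing. &lt;code&gt;verify_chain&lt;/code&gt; is an illustrative name, and the sketch covers only &lt;code&gt;sha256&lt;/code&gt;.&lt;/p&gt;

```python
import hashlib


def verify_chain(records: list[tuple[bytes, dict]]) -> bytes:
    """Check v0.1 linkage and return the hash of the last event (the head).

    Each record is (raw, event): the canonical CBOR bytes as read from
    events.bin, and the same event already decoded into a dict.
    """
    if not records:
        raise ValueError("empty event stream")
    previous_hash = None
    for index, (raw, event) in enumerate(records):
        # Stream order must match sequence: 0, 1, 2, ...
        if event["sequence"] != index:
            raise ValueError(f"event {index}: sequence must be {index}")
        parents = event["parents"]
        if index == 0:
            if event["kind"] != "SessionStart" or parents != []:
                raise ValueError("first event must be SessionStart with no parents")
        else:
            if event["kind"] == "SessionStart":
                raise ValueError(f"event {index}: unexpected SessionStart")
            if parents != [previous_hash]:
                raise ValueError(f"event {index}: parent must be the previous event hash")
        # The event hash covers the full encoded envelope, not individual fields.
        previous_hash = hashlib.sha256(raw).digest()
    return previous_hash
```

&lt;p&gt;The returned digest is the value a verifier would compare, in hex, against &lt;code&gt;session.head&lt;/code&gt; in the manifest.&lt;/p&gt;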

&lt;h2&gt;
  
  
  Event kinds in v0.1 (closed set)
&lt;/h2&gt;

&lt;p&gt;Readers must recognize exactly these kinds in v0.1. Unknown kinds are rejected.&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;
&lt;code&gt;SessionStart&lt;/code&gt;: opens the session, fields &lt;code&gt;cwd_hash&lt;/code&gt; and &lt;code&gt;config_hash&lt;/code&gt;.&lt;/li&gt;
&lt;li&gt;
&lt;code&gt;UserTurn&lt;/code&gt;: a user prompt, field &lt;code&gt;prompt_hash&lt;/code&gt;.&lt;/li&gt;
&lt;li&gt;
&lt;code&gt;ProviderCall&lt;/code&gt;: a model provider call, fields &lt;code&gt;provider_id&lt;/code&gt;, an &lt;code&gt;attempts[]&lt;/code&gt; array of &lt;code&gt;AttemptRecord&lt;/code&gt;, optional &lt;code&gt;stream_hash&lt;/code&gt;.&lt;/li&gt;
&lt;li&gt;
&lt;code&gt;ToolCall&lt;/code&gt;: a tool execution, fields &lt;code&gt;tool_id&lt;/code&gt;, &lt;code&gt;input_hash&lt;/code&gt;, &lt;code&gt;output_hash&lt;/code&gt;, optional &lt;code&gt;side_effects_hash&lt;/code&gt;.&lt;/li&gt;
&lt;li&gt;
&lt;code&gt;RetrievalCall&lt;/code&gt;: a retrieval, fields &lt;code&gt;index_id&lt;/code&gt;, &lt;code&gt;query_hash&lt;/code&gt;, &lt;code&gt;results_hash&lt;/code&gt;.&lt;/li&gt;
&lt;li&gt;
&lt;code&gt;PermissionGate&lt;/code&gt;: a policy decision, fields &lt;code&gt;policy_id&lt;/code&gt;, &lt;code&gt;decision&lt;/code&gt; (string, recommended lowercase verbs like &lt;code&gt;allowed&lt;/code&gt;, &lt;code&gt;denied&lt;/code&gt;, &lt;code&gt;deferred&lt;/code&gt;), &lt;code&gt;context_hash&lt;/code&gt;.&lt;/li&gt;
&lt;li&gt;
&lt;code&gt;AssistantTurn&lt;/code&gt;: an assistant message, fields &lt;code&gt;message_hash&lt;/code&gt;, optional &lt;code&gt;tool_calls_hash&lt;/code&gt;.&lt;/li&gt;
&lt;li&gt;
&lt;code&gt;SessionEnd&lt;/code&gt;: closes the session, optional &lt;code&gt;summary_hash&lt;/code&gt;.&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;&lt;code&gt;ProviderCall.attempts[]&lt;/code&gt; preserves chronological order. Each &lt;code&gt;AttemptRecord&lt;/code&gt; has &lt;code&gt;attempt_number&lt;/code&gt; (1-indexed), &lt;code&gt;started_at&lt;/code&gt;, &lt;code&gt;ended_at&lt;/code&gt;, a closed &lt;code&gt;AttemptStatus&lt;/code&gt;, &lt;code&gt;request_hash&lt;/code&gt;, optional &lt;code&gt;response_hash&lt;/code&gt;, optional &lt;code&gt;stream_hash&lt;/code&gt;, optional &lt;code&gt;error_message&lt;/code&gt;. &lt;code&gt;AttemptStatus&lt;/code&gt; is one of &lt;code&gt;Success&lt;/code&gt;, &lt;code&gt;RateLimited&lt;/code&gt;, &lt;code&gt;NetworkError&lt;/code&gt;, &lt;code&gt;ServerError&lt;/code&gt;, &lt;code&gt;ClientError&lt;/code&gt;, &lt;code&gt;Cancelled&lt;/code&gt;, or &lt;code&gt;Other(string)&lt;/code&gt;. v0.1 readers reject unknown variants.&lt;/p&gt;

&lt;p&gt;Two design notes worth calling out.&lt;/p&gt;

&lt;p&gt;First, &lt;code&gt;PermissionGate.decision&lt;/code&gt; is open in v0.1. The spec recommends lowercase verbs but does not close the set. Closing it is a likely v1.0 change. If you are a producer, follow the recommendation now so that the move to v1.0 costs you nothing.&lt;/p&gt;

&lt;p&gt;Second, &lt;code&gt;AttemptStatus&lt;/code&gt; and &lt;code&gt;EventKind&lt;/code&gt; are intentionally closed. The cost is that adding a new variant is a normative spec change. The benefit is that two implementations cannot legitimately disagree on the meaning of a record.&lt;/p&gt;

&lt;h2&gt;
  
  
  The objects directory
&lt;/h2&gt;

&lt;p&gt;Every object referenced by an event hash field exists as a file at &lt;code&gt;objects/&amp;lt;hex&amp;gt;&lt;/code&gt; where &lt;code&gt;&amp;lt;hex&amp;gt;&lt;/code&gt; is the lowercase hex digest for the active hash algorithm. Object bytes hash to their filename digest. Objects are opaque bytes; AGEF v0.1 does not require MIME metadata.&lt;/p&gt;

&lt;p&gt;A typical bundle has many small objects (prompts, tool inputs, tool outputs, file diffs) and a few large ones (full file contents). Storage is efficient because identical content is stored only once, under its hash.&lt;/p&gt;
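
&lt;p&gt;A short sketch of how content addressing plays out on disk, assuming &lt;code&gt;sha256&lt;/code&gt; and a flat &lt;code&gt;objects/&lt;/code&gt; directory. &lt;code&gt;put_object&lt;/code&gt; and &lt;code&gt;verify_objects&lt;/code&gt; are hypothetical helper names.&lt;/p&gt;

```python
import hashlib
from pathlib import Path


def put_object(objects_dir: Path, content: bytes) -> str:
    """Store content under its digest; identical content maps to the same file."""
    name = hashlib.sha256(content).hexdigest()  # lowercase hex
    (objects_dir / name).write_bytes(content)
    return name


def verify_objects(objects_dir: Path) -> int:
    """Check that every file under objects/ hashes to its own filename.

    Returns the number of objects checked, for comparison with
    manifest.object_count.
    """
    count = 0
    for path in sorted(objects_dir.iterdir()):
        digest = hashlib.sha256(path.read_bytes()).hexdigest()
        if digest != path.name:
            raise ValueError(f"object {path.name} does not match its content hash")
        count += 1
    return count
```

&lt;p&gt;Writing the same prompt twice yields one file, which is where the deduplication comes from.&lt;/p&gt;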

&lt;h2&gt;
  
  
  Hashing rules
&lt;/h2&gt;



&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight yaml"&gt;&lt;code&gt;&lt;span class="na"&gt;default&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s"&gt;sha256&lt;/span&gt;
&lt;span class="na"&gt;optional&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s"&gt;blake3&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;Readers must support &lt;code&gt;sha256&lt;/code&gt;. They may support &lt;code&gt;blake3&lt;/code&gt;. Within CBOR-encoded events, hashes must be encoded as CBOR byte strings (major type 2) of length 32 for both algorithms. Hex string representation is used only in &lt;code&gt;manifest.json&lt;/code&gt; (&lt;code&gt;session.head&lt;/code&gt;) and in object filenames. Once you internalize that, the rest of the spec falls into place.&lt;/p&gt;

&lt;h2&gt;
  
  
  Serialization rules in one paragraph
&lt;/h2&gt;

&lt;p&gt;Event payloads in &lt;code&gt;events.bin&lt;/code&gt; use canonical CBOR (RFC 8949). &lt;code&gt;manifest.json&lt;/code&gt; is UTF-8 JSON with sorted keys and LF line endings. Timestamps in CBOR-encoded events use CBOR tag 1. Producers may emit either integer epoch seconds or floating-point values; readers must accept both. The reference implementation &lt;code&gt;akmon-journal&lt;/code&gt; v0.1 emits integer epoch seconds. Timestamps in &lt;code&gt;manifest.json&lt;/code&gt; use RFC 3339 strings. Implementations may use any internal storage format; AGEF rules apply only to the bytes emitted in &lt;code&gt;events.bin&lt;/code&gt; and to the bytes used for event hashing.&lt;/p&gt;
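
&lt;p&gt;For the JSON side, a sketch of a writer that satisfies the stated rules (sorted keys, LF, UTF-8, RFC 3339 timestamps). The two-space indent and the trailing newline are my assumptions here, not requirements quoted from the spec, and the function names are illustrative.&lt;/p&gt;

```python
import json
from datetime import datetime, timezone


def rfc3339(moment: datetime) -> str:
    """Format a timezone-aware datetime as an RFC 3339 UTC string with a Z suffix."""
    return moment.astimezone(timezone.utc).strftime("%Y-%m-%dT%H:%M:%SZ")


def manifest_bytes(manifest: dict) -> bytes:
    """Serialize with sorted keys, LF line endings, and UTF-8 encoding."""
    # json.dumps only ever emits "\n" between lines, so no CRLF can appear.
    text = json.dumps(manifest, sort_keys=True, indent=2, ensure_ascii=False)
    return (text + "\n").encode("utf-8")
```

&lt;p&gt;Sorted keys make the output independent of the order in which the producer built the dict, which keeps diffs between manifests readable.&lt;/p&gt;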

&lt;h2&gt;
  
  
  Verification procedure
&lt;/h2&gt;

&lt;p&gt;A conforming verifier runs this sequence:&lt;/p&gt;

&lt;ol&gt;
&lt;li&gt;Extract the archive.&lt;/li&gt;
&lt;li&gt;Parse &lt;code&gt;manifest.json&lt;/code&gt;. Reject on schema or version failure.&lt;/li&gt;
&lt;li&gt;Read &lt;code&gt;events.bin&lt;/code&gt; using the 4-byte length-delimited framing.&lt;/li&gt;
&lt;li&gt;For each event: decode canonical CBOR, recompute event hash, verify &lt;code&gt;sequence&lt;/code&gt; monotonicity, verify all &lt;code&gt;parents&lt;/code&gt; resolve to previously seen events, verify referenced content hashes resolve to files in &lt;code&gt;objects/&lt;/code&gt;.&lt;/li&gt;
&lt;li&gt;For each referenced object: read bytes, hash with &lt;code&gt;manifest.hash_algorithm&lt;/code&gt;, compare to filename digest.&lt;/li&gt;
&lt;li&gt;Confirm &lt;code&gt;manifest.event_count&lt;/code&gt; equals decoded event count, &lt;code&gt;manifest.object_count&lt;/code&gt; equals object file count, &lt;code&gt;manifest.session.head&lt;/code&gt; equals terminal event hash, and &lt;code&gt;SessionStart&lt;/code&gt; is a reachable ancestor of head.&lt;/li&gt;
&lt;/ol&gt;

&lt;p&gt;Default operation fails on the first invariant violation. Implementations may offer a "report-all" mode for diagnostics. They must not claim successful verification unless all required checks pass.&lt;/p&gt;
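
&lt;p&gt;The difference between the default mode and a "report-all" mode can be sketched with a lazy generator over the closing checks (counts and head). The names below are hypothetical, hashes are passed as hex strings for brevity, and the ancestor check is left out because the linear v0.1 chain check already covers it.&lt;/p&gt;

```python
def closing_violations(manifest: dict, event_hashes: list[str], object_names: list[str]):
    """Yield one message per failed closing check, evaluated lazily."""
    if manifest["event_count"] != len(event_hashes):
        yield "event_count does not match decoded events"
    if manifest["object_count"] != len(object_names):
        yield "object_count does not match object files"
    terminal = event_hashes[-1] if event_hashes else None
    if manifest["session"]["head"] != terminal:
        yield "session.head is not the terminal event hash"


def verify(manifest: dict, event_hashes: list[str], object_names: list[str],
           report_all: bool = False) -> list[str]:
    """Return the list of violations; an empty list is the only success."""
    violations = closing_violations(manifest, event_hashes, object_names)
    if report_all:
        return list(violations)  # diagnostics: run every check
    first = next(violations, None)  # default: stop at the first failure
    if first is not None:
        raise ValueError(first)
    return []
```

&lt;p&gt;Either way, the caller may only claim success when nothing was raised and the returned list is empty.&lt;/p&gt;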

&lt;h2&gt;
  
  
  Conformance profiles
&lt;/h2&gt;

&lt;p&gt;AGEF v0.1 defines two profiles:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Bundle Profile, an implementation that produces or consumes AGEF bundles.&lt;/li&gt;
&lt;li&gt;Substrate Profile, an implementation that maintains an AGEF-compatible content-addressed event journal but does not necessarily emit bundles directly.&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;A Substrate Profile must be able to produce Bundle Profile output through an export pathway when required.&lt;/p&gt;

&lt;p&gt;&lt;code&gt;akmon-journal&lt;/code&gt; is currently a Substrate Profile. Akmon Phase 4 introduces Bundle Profile capability.&lt;/p&gt;

&lt;p&gt;If you are an implementer, knowing which profile you are claiming is the most important conformance question.&lt;/p&gt;

&lt;h2&gt;
  
  
  Producing AGEF in practice
&lt;/h2&gt;

&lt;p&gt;If you use Akmon, you already produce AGEF. Sessions land in &lt;code&gt;.akmon/audit/&amp;lt;session-id&amp;gt;.jsonl&lt;/code&gt; and &lt;code&gt;.akmon/evidence/&amp;lt;session-id&amp;gt;.json&lt;/code&gt;. With Phase 4 export, the same session is portable as a &lt;code&gt;.akmon&lt;/code&gt; bundle.&lt;/p&gt;

&lt;p&gt;If you build your own producer, the spec lists the libraries known to produce canonical CBOR with the right configuration: &lt;code&gt;ciborium&lt;/code&gt; for Rust, &lt;code&gt;fxamacker/cbor&lt;/code&gt; for Go, &lt;code&gt;cbor2&lt;/code&gt; for Python, &lt;code&gt;cbor-x&lt;/code&gt; for JavaScript and TypeScript. Validate canonical encoding behavior with test vectors before claiming conformance.&lt;/p&gt;

&lt;h2&gt;
  
  
  Reading and verifying AGEF
&lt;/h2&gt;

&lt;p&gt;For Akmon-produced bundles, the verification commands are part of the trust pipeline:&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight shell"&gt;&lt;code&gt;&lt;span class="c"&gt;# Verify audit chain integrity for the live session journal&lt;/span&gt;
akmon audit verify .akmon/audit/&amp;lt;session-id&amp;gt;.jsonl

&lt;span class="c"&gt;# Verify the evidence summary plus its linkage to the audit chain&lt;/span&gt;
akmon evidence verify .akmon/evidence/&amp;lt;session-id&amp;gt;.json

&lt;span class="c"&gt;# Once Phase 4 export is available, verify a bundle on import&lt;/span&gt;
akmon bundle import evidence.akmon &lt;span class="nt"&gt;--verify-only&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;Independent verifier implementations are welcome. The spec is small enough to implement in a weekend if you start with a CBOR library that supports canonical encoding.&lt;/p&gt;

&lt;h2&gt;
  
  
  Security considerations
&lt;/h2&gt;

&lt;p&gt;AGEF provides tamper evidence and portability. It does not provide identity attribution by itself. If producer trust matters in your environment, layer external signing on top of the bundle. Bundles may contain sensitive content; storage and sharing controls are the operator's job. Verification proves integrity, not semantic correctness.&lt;/p&gt;

&lt;p&gt;For sharing, the redaction flow is the right answer. With Akmon:&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight shell"&gt;&lt;code&gt;akmon redact &amp;lt;session-id&amp;gt; &lt;span class="se"&gt;\&lt;/span&gt;
  &lt;span class="nt"&gt;--output&lt;/span&gt; sanitized.akmon &lt;span class="se"&gt;\&lt;/span&gt;
  &lt;span class="nt"&gt;--object&lt;/span&gt; &amp;lt;object-hash&amp;gt; &lt;span class="se"&gt;\&lt;/span&gt;
  &lt;span class="nt"&gt;--reason&lt;/span&gt; &lt;span class="s2"&gt;"Removed customer name before audit handoff"&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;This produces a derivative bundle in which the targeted objects are replaced by canonical CBOR sentinels. The sentinel payload records the original hash, original size, the supplied &lt;code&gt;reason&lt;/code&gt;, and a &lt;code&gt;redacted_at&lt;/code&gt; timestamp. The chain still verifies because the sentinel hashes correctly, and the audit trail makes the redaction explicit.&lt;/p&gt;

&lt;h2&gt;
  
  
  Compatibility and evolution
&lt;/h2&gt;

&lt;p&gt;v0.x is pre-stable. Breaking changes are permitted. Future versions should preserve forward migration guidance. v1.0 is the first stable major. New event kinds in future majors must be version-gated. v0.1 readers must not silently ignore unknown required semantics.&lt;/p&gt;

&lt;p&gt;If you build against v0.1, plan for one migration to v1.0. If you are a careful producer, the migration is small.&lt;/p&gt;

&lt;h2&gt;
  
  
  Why a new format
&lt;/h2&gt;

&lt;p&gt;Three reasons.&lt;/p&gt;

&lt;p&gt;First, no shared shape exists today. Every framework, every vendor, every gateway writes its own. A reviewer cannot move between them. The cost of a new format is small compared to the cost of having no shared format.&lt;/p&gt;

&lt;p&gt;Second, evidence has a different audience than observability. Observability is for engineers in the moment. Evidence is for reviewers, regulators, customers, and the version of you reading the file in a year. Different shape, different guarantees, different expectations.&lt;/p&gt;

&lt;p&gt;Third, regulated engineering needs an answer that does not depend on a vendor dashboard. The portability requirement is real. AGEF is built for it.&lt;/p&gt;

&lt;h2&gt;
  
  
  What you can do this week
&lt;/h2&gt;

&lt;p&gt;If you build AI tooling, three small steps:&lt;/p&gt;

&lt;ol&gt;
&lt;li&gt;Read &lt;code&gt;SPEC.md&lt;/code&gt; end to end. It is short on purpose.&lt;/li&gt;
&lt;li&gt;Implement a Bundle Profile verifier in your language of choice. Use the test vectors that ship in the &lt;code&gt;examples/minimal-bundle&lt;/code&gt; folder.&lt;/li&gt;
&lt;li&gt;Open an issue on the AGEF repo. Tell me what your runtime emits today and where the spec helps or gets in the way.&lt;/li&gt;
&lt;/ol&gt;

&lt;p&gt;If you ship a coding agent, the smallest first step is Akmon itself: one binary and the trust pipeline. The repo is at &lt;a href="https://github.com/radotsvetkov/akmon" rel="noopener noreferrer"&gt;github.com/radotsvetkov/akmon&lt;/a&gt;.&lt;/p&gt;

&lt;p&gt;The format spec is at &lt;a href="https://github.com/radotsvetkov/agef" rel="noopener noreferrer"&gt;github.com/radotsvetkov/agef&lt;/a&gt;. &lt;br&gt;
The site is at &lt;a href="https://radotsvetkov.github.io/akmon/" rel="noopener noreferrer"&gt;radotsvetkov.github.io/akmon&lt;/a&gt;.&lt;/p&gt;

&lt;p&gt;Standards become standards by being used. Implementations are welcome.&lt;/p&gt;

</description>
      <category>ai</category>
      <category>opensource</category>
      <category>rust</category>
      <category>security</category>
    </item>
    <item>
      <title>I built a memory layer for AI assistants that refuses to fake citations</title>
      <dc:creator>Radoslav Tsvetkov</dc:creator>
      <pubDate>Tue, 05 May 2026 11:43:36 +0000</pubDate>
      <link>https://dev.to/radotsvetkov/i-built-a-memory-layer-for-ai-assistants-that-refuses-to-fake-citations-228a</link>
      <guid>https://dev.to/radotsvetkov/i-built-a-memory-layer-for-ai-assistants-that-refuses-to-fake-citations-228a</guid>
      <description>&lt;p&gt;A few months ago I was using my AI assistant to dig through my Obsidian vault. I asked it about a decision I had made on a side project, and it answered with confidence. The answer cited two notes. One of them existed but did not say what the model claimed. The other did not exist at all.&lt;/p&gt;

&lt;p&gt;I kept staring at the response for a minute, because it sounded exactly right. It matched what I half-remembered. If I had not gone to check the source, I would have used that answer in a real conversation with someone.&lt;/p&gt;

&lt;p&gt;That is the problem I want to tell you about, and the one I have spent the last few months trying to fix.&lt;/p&gt;

&lt;h3&gt;
  
  
  Why "chat with your notes" tools quietly lie to you
&lt;/h3&gt;

&lt;p&gt;The standard recipe for personal RAG looks like this:&lt;/p&gt;

&lt;ol&gt;
&lt;li&gt;Take your markdown vault.&lt;/li&gt;
&lt;li&gt;Split it into chunks.&lt;/li&gt;
&lt;li&gt;Index those chunks with keyword search and embeddings.&lt;/li&gt;
&lt;li&gt;At query time, retrieve the top few chunks.&lt;/li&gt;
&lt;li&gt;Hand them to a language model with an instruction like "answer using only the context, and cite your sources."&lt;/li&gt;
&lt;li&gt;Hope.&lt;/li&gt;
&lt;/ol&gt;

&lt;p&gt;Step six is where things go sideways. The model is not actually obligated to cite anything correctly. It is just being asked nicely. When it gets confused, it picks a note title that sounds related and uses it as a citation. When it knows there is a related concept in the vault but cannot find a clean quote, it paraphrases and cites the paraphrase as if it were a quote, even though that text exists nowhere in the vault.&lt;/p&gt;

&lt;p&gt;For chatting with Wikipedia, this is fine. For your own decisions, meeting notes, and life events, it is not. The whole point of writing things down was to have a single source of truth. If your AI layer can invent quotes from notes that do not exist, you have lost the property that made the notes useful in the first place.&lt;/p&gt;

&lt;h3&gt;
  
  
  The shift: claims, not chunks
&lt;/h3&gt;

&lt;p&gt;Memora's central idea is small but, once you accept it, kind of irreversible.&lt;/p&gt;

&lt;p&gt;Stop treating the chunk of text as the unit of memory. Treat the claim as the unit of memory.&lt;/p&gt;

&lt;p&gt;A claim is a small structured fact. It has a subject, a predicate, and an object. It is extracted from a specific note, and it carries the exact byte range of the source text it came from, plus a blake3 hash of that source. It also carries a validity window (when the claim started being true and, optionally, when it stopped) and a privacy band.&lt;/p&gt;

&lt;p&gt;When you index your vault, Memora calls a model once per note to extract these claims. They go into a SQLite database with edges between them, like "supersedes", "contradicts", and "derives_from".&lt;/p&gt;

&lt;p&gt;When you query, you do not get raw chunks back. You get claims. Each claim has an ID. The answering model is asked to cite specific claim IDs in its response.&lt;/p&gt;

&lt;p&gt;Then comes the part that, in my opinion, makes the whole thing actually useful.&lt;/p&gt;

&lt;h3&gt;
  
  
  Validation before you ever see the answer
&lt;/h3&gt;

&lt;p&gt;Before any answer reaches the user, Memora does the following:&lt;/p&gt;

&lt;p&gt;For each cited claim ID, it looks up the byte range and the source note, re-reads the exact span from your markdown on disk, and recomputes the blake3 hash. If the hash matches the one stored at extraction time, the citation stands. If it does not, the citation is stripped. If the model invented an ID that does not exist in the database, that one is stripped too.&lt;/p&gt;

&lt;p&gt;If the answer ends up with no valid citations left, the model is re-prompted, this time with only the claims that survived as context. The system enforces the citation contract through Rust types and span hashes, not through prompt obedience.&lt;/p&gt;

&lt;p&gt;This is a different trust model from "ask the model to be careful". It does not assume good behavior. It checks.&lt;/p&gt;
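
&lt;p&gt;A rough Python sketch of that check, to make the mechanics concrete. Memora itself is written in Rust and hashes with blake3; here &lt;code&gt;sha256&lt;/code&gt; stands in so the example needs only the standard library, and the &lt;code&gt;Claim&lt;/code&gt; fields and function names are illustrative guesses, not Memora's actual schema.&lt;/p&gt;

```python
import hashlib
from dataclasses import dataclass
from pathlib import Path


@dataclass(frozen=True)
class Claim:
    claim_id: str
    note_path: Path
    start: int       # byte offset where the source span begins
    end: int         # byte offset just past the source span
    span_hash: str   # hex digest recorded at extraction time


def span_digest(data: bytes) -> str:
    # Stand-in: Memora uses blake3; sha256 keeps this sketch stdlib-only.
    return hashlib.sha256(data).hexdigest()


def valid_citations(cited_ids: list[str], claims: dict[str, Claim]) -> list[str]:
    """Keep only cited IDs whose source span still hashes to the stored value."""
    kept = []
    for claim_id in cited_ids:
        claim = claims.get(claim_id)
        if claim is None:
            continue  # the model invented this ID
        try:
            span = claim.note_path.read_bytes()[claim.start:claim.end]
        except OSError:
            continue  # the note is gone or unreadable
        if span_digest(span) == claim.span_hash:
            kept.append(claim_id)
    return kept
```

&lt;p&gt;An empty result is the signal to re-prompt; nothing in this path depends on the model having behaved.&lt;/p&gt;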

&lt;h3&gt;
  
  
  The boring engineering decisions
&lt;/h3&gt;

&lt;p&gt;I want to talk about a few choices that look small but pay off every single day.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Rust as the language.&lt;/strong&gt; I wanted a single binary you could drop on a machine and forget. No Python environment, no node_modules, no Docker required. Cargo install or download a release. The type system also makes the citation contract enforceable at compile time, which matters when you are trying to make a guarantee like "no unverified citation will ever leave this function".&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;SQLite + HNSW for storage.&lt;/strong&gt; Personal vaults are not at internet scale. A few thousand notes is the realistic upper bound for a long time. SQLite handles claims and edges fine. HNSW handles vector search fine. There is no Kafka, no vector DB service, no infrastructure for you to run.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Obsidian as the substrate.&lt;/strong&gt; I did not want to invent another note-taking app. The vault stays in plain markdown. You can edit in Obsidian, in vim, in TextEdit. Memora watches for changes and re-extracts claims from changed notes. The notes are still yours, in a format that will outlive the project.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;MCP for integration.&lt;/strong&gt; Instead of building a chat UI, Memora exposes its tools over the Model Context Protocol. That means it works inside Claude Desktop, Cursor, and anything else that speaks MCP. You bring the chat interface you already use.&lt;/p&gt;

&lt;h3&gt;
  
  
  Things that took longer than I expected
&lt;/h3&gt;

&lt;p&gt;Two surprises worth mentioning.&lt;/p&gt;

&lt;p&gt;The first one is that "atomic claims" is harder than it sounds. Early on I had the extractor pulling things like "the project was good", which is technically a claim but completely useless. The current extraction prompt has been through many revisions and is paired with deduplication, normalization, and a gate that filters single-claim noise out of the active challenger output. There is still room to improve. If you have ideas, I would love to hear them.&lt;/p&gt;

&lt;p&gt;The second one is that local LLMs are not yet good enough at extraction. Qwen 14B hallucinates relationships. Qwen 32B is acceptable but misses cross-region patterns. Llama 70B can match Claude Haiku quality but at significant memory cost. So the recommended setup right now is Claude Haiku for extraction (about $0.30 to index a 100-note vault) with a local model for embeddings. The fully local path works, but I want to be honest that it is not at production quality yet.&lt;/p&gt;

&lt;h3&gt;
  
  
  Where the project is now
&lt;/h3&gt;

&lt;p&gt;It is at v0.1.27. It is open source under Apache 2.0. It indexes 100-note vaults in 5 to 10 minutes with Claude Haiku. The active challenger surfaces decisions, contradictions, stale dependencies, and open questions in a generated atlas page in your vault, so you can keep an eye on the state of your own knowledge over time.&lt;/p&gt;

&lt;p&gt;If you live in your notes, if you have ever asked an AI tool a question about your own writing and gotten back something that was not actually there, please try it.&lt;/p&gt;

&lt;p&gt;Repo: &lt;a href="https://github.com/radotsvetkov/memora" rel="noopener noreferrer"&gt;https://github.com/radotsvetkov/memora&lt;/a&gt;&lt;br&gt;
Architecture demo: &lt;a href="https://radotsvetkov.github.io/memora" rel="noopener noreferrer"&gt;https://radotsvetkov.github.io/memora&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;I am especially interested in feedback from people with messy vaults, weird folder layouts, and strong opinions about retrieval. Issues, edge cases, and design discussions are welcome on GitHub.&lt;/p&gt;

&lt;p&gt;If you read this far, thank you. I would much rather hear that I am wrong about something specific than that this looks neat.&lt;/p&gt;

</description>
      <category>ai</category>
      <category>opensource</category>
      <category>obsidian</category>
      <category>rust</category>
    </item>
  </channel>
</rss>
