<?xml version="1.0" encoding="UTF-8"?>
<rss version="2.0" xmlns:atom="http://www.w3.org/2005/Atom" xmlns:dc="http://purl.org/dc/elements/1.1/">
  <channel>
    <title>DEV Community: liuhaotian2024-prog</title>
    <description>The latest articles on DEV Community by liuhaotian2024-prog (@liuhaotian2024prog).</description>
    <link>https://dev.to/liuhaotian2024prog</link>
    <image>
      <url>https://media2.dev.to/dynamic/image/width=90,height=90,fit=cover,gravity=auto,format=auto/https:%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Fuser%2Fprofile_image%2F3817544%2F1cbdca63-ad45-48d6-b5b7-41b4b38c914c.png</url>
      <title>DEV Community: liuhaotian2024-prog</title>
      <link>https://dev.to/liuhaotian2024prog</link>
    </image>
    <atom:link rel="self" type="application/rss+xml" href="https://dev.to/feed/liuhaotian2024prog"/>
    <language>en</language>
    <item>
      <title>The CIEU Five-Tuple: Why I Modeled AI Agent Logs as Causal Units</title>
      <dc:creator>liuhaotian2024-prog</dc:creator>
      <pubDate>Wed, 11 Mar 2026 14:03:56 +0000</pubDate>
      <link>https://dev.to/liuhaotian2024prog/the-cieu-five-tuple-why-i-modeled-ai-agent-logs-as-causal-units-k86</link>
      <guid>https://dev.to/liuhaotian2024prog/the-cieu-five-tuple-why-i-modeled-ai-agent-logs-as-causal-units-k86</guid>
      <description>&lt;h1&gt;
  
  
  The CIEU Five-Tuple: Why I Modeled AI Agent Logs as Causal Units
&lt;/h1&gt;

&lt;p&gt;&lt;em&gt;This is a follow-up to &lt;a href="https://dev.to/liuhaotian2024prog/why-auditing-ai-agents-requires-causal-ai-not-another-llm-269c"&gt;Why Auditing AI Agents Requires Causal AI, Not Another LLM&lt;/a&gt;. That post explained the "why." This one explains the "what" and "how."&lt;/em&gt;&lt;/p&gt;




&lt;p&gt;When I was debugging the incident that led me to build K9 Audit, I had logs. Plenty of them. Timestamps, tool call names, outputs, token counts. Everything a standard observability tool would give you.&lt;/p&gt;

&lt;p&gt;None of it told me &lt;em&gt;what went wrong&lt;/em&gt;.&lt;/p&gt;

&lt;p&gt;The agent had been corrupting my staging environment for 41 minutes. The logs showed every action it took. What they didn't show was the moment the agent's intent diverged from its actual execution — the causal break that turned a routine deploy task into a data corruption event.&lt;/p&gt;

&lt;p&gt;That gap is exactly what the &lt;strong&gt;CIEU (Causal Intent-Execution Unit)&lt;/strong&gt; is designed to capture.&lt;/p&gt;




&lt;h2&gt;
  
  
  What's Wrong With Event Logs
&lt;/h2&gt;

&lt;p&gt;Standard agent logs record &lt;em&gt;events&lt;/em&gt;: tool X was called, output was Y, latency was Z ms.&lt;/p&gt;

&lt;p&gt;This is useful for performance monitoring. It's nearly useless for behavioral auditing.&lt;/p&gt;

&lt;p&gt;Here's why: an agent can execute every tool call successfully, produce outputs that look valid in isolation, and still be pursuing the wrong goal — quietly, for as long as you let it run. Event logs will show green across the board.&lt;/p&gt;

&lt;p&gt;The question you actually need to answer during a post-mortem isn't "did the tool call succeed?" It's: &lt;strong&gt;"at this step, did the agent do what it said it was going to do?"&lt;/strong&gt;&lt;/p&gt;

&lt;p&gt;To answer that, you need to have captured what the agent said it was going to do &lt;em&gt;before&lt;/em&gt; it acted.&lt;/p&gt;




&lt;h2&gt;
  
  
  The Five-Tuple
&lt;/h2&gt;

&lt;p&gt;Each CIEU is a record of one atomic agent step, structured as:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;CIEU = (X_t, U_t, Y*_t, Y_t+1, R_t+1)
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;Let me walk through each component.&lt;/p&gt;

&lt;h3&gt;
  
  
  X_t — Context at time t
&lt;/h3&gt;

&lt;p&gt;The observable state the agent had access to when it formed its intent. This typically includes:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;The current task description&lt;/li&gt;
&lt;li&gt;Any tool outputs from the previous step&lt;/li&gt;
&lt;li&gt;Relevant memory or retrieved context&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;&lt;strong&gt;Why log this?&lt;/strong&gt; Because the same intent expressed in different contexts means different things. You need X_t to evaluate whether U_t was a reasonable response to the situation.&lt;/p&gt;

&lt;h3&gt;
  
  
  U_t — Intent at time t
&lt;/h3&gt;

&lt;p&gt;The agent's stated goal or plan for the current step, &lt;em&gt;before&lt;/em&gt; it executes anything.&lt;/p&gt;

&lt;p&gt;In practice, this is the reasoning trace — what the agent says it's about to do and why. With chain-of-thought models, this is often surfaced explicitly. With tool-use models, you can extract it from the pre-action scratchpad.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Why log this?&lt;/strong&gt; This is the baseline against which execution gets evaluated. Without it, you have no reference point for detecting drift.&lt;/p&gt;

&lt;h3&gt;
  
  
  Y*_t — Expected output at time t
&lt;/h3&gt;

&lt;p&gt;The output the agent predicted or described expecting, given its intent.&lt;/p&gt;

&lt;p&gt;Sometimes this is explicit ("I will write the following SQL query..."). Sometimes it's implicit and has to be inferred from U_t. K9 Audit handles both cases — if Y*_t is explicit in the trace, it's captured directly; if not, it's reconstructed from U_t.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Why log this?&lt;/strong&gt; Y*_t creates a testable prediction. If Y_t+1 diverges from Y*_t significantly, something went wrong between intent and execution.&lt;/p&gt;

&lt;h3&gt;
  
  
  Y_t+1 — Actual output at time t+1
&lt;/h3&gt;

&lt;p&gt;What the agent actually produced or executed. This is what standard logs already capture.&lt;/p&gt;

&lt;p&gt;The difference is that in CIEU, Y_t+1 only has meaning &lt;em&gt;in relation to&lt;/em&gt; Y*_t and U_t. Logging it in isolation tells you nothing about whether behavior was correct.&lt;/p&gt;

&lt;h3&gt;
  
  
  R_t+1 — Deviation score at time t+1
&lt;/h3&gt;

&lt;p&gt;A scalar measure of how much Y_t+1 diverged from Y*_t, given X_t and U_t.&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;R_t+1 = divergence(Y_t+1, Y*_t | X_t, U_t)
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;R_t+1 is computed automatically at logging time. It doesn't require a human reviewer or an LLM judge. It's a deterministic function over the logged data.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;This is the key signal for real-time alerting.&lt;/strong&gt; When R_t+1 crosses a threshold, something deviated from plan. That's when you want to be paged.&lt;/p&gt;
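
&lt;p&gt;As a rough illustration of what a deterministic divergence could look like, here is a minimal sketch. It is not K9 Audit's actual scoring function, which this post does not spell out. The name &lt;code&gt;field_divergence&lt;/code&gt;, the key-by-key comparison, and the &lt;code&gt;ALERT_THRESHOLD&lt;/code&gt; value are all assumptions made for the example, and the sketch ignores X_t and U_t for simplicity.&lt;/p&gt;

```python
# Hypothetical sketch: NOT the scoring K9 Audit uses.
# Score = fraction of expected fields whose actual value differs.

ALERT_THRESHOLD = 0.5  # assumed value, tune per deployment


def field_divergence(expected: dict, actual: dict) -> float:
    """Return a score in [0, 1]; 0.0 means every expected field matched."""
    if not expected:
        return 0.0
    mismatches = sum(1 for key, value in expected.items() if actual.get(key) != value)
    return mismatches / len(expected)


def should_alert(score: float, threshold: float = ALERT_THRESHOLD) -> bool:
    """Deterministic: the same inputs always yield the same decision."""
    return score >= threshold


expected = {"env": "staging-02", "action": "write"}
actual = {"env": "production-01", "action": "write"}
score = field_divergence(expected, actual)
print(score, should_alert(score))  # 0.5 True
```

&lt;p&gt;A real scorer would likely weight fields differently (a wrong target environment matters more than a wrong log level), but the property that matters here is the same: no model call, and the same record always gives the same score.&lt;/p&gt;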




&lt;h2&gt;
  
  
  What It Looks Like in Practice
&lt;/h2&gt;

&lt;p&gt;Here's a minimal example using K9 Audit:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight python"&gt;&lt;code&gt;&lt;span class="kn"&gt;from&lt;/span&gt; &lt;span class="n"&gt;k9log&lt;/span&gt; &lt;span class="kn"&gt;import&lt;/span&gt; &lt;span class="n"&gt;k9&lt;/span&gt;

&lt;span class="c1"&gt;# Wrap any agent step with @k9
&lt;/span&gt;&lt;span class="nd"&gt;@k9&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;skill&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;deploy_to_staging&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;constraint_file&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;~/.k9log/config/deploy.json&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
&lt;span class="k"&gt;def&lt;/span&gt; &lt;span class="nf"&gt;deploy_artifact&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;artifact_path&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="nb"&gt;str&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;target_env&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="nb"&gt;str&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt; &lt;span class="o"&gt;-&amp;gt;&lt;/span&gt; &lt;span class="nb"&gt;dict&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;
    &lt;span class="c1"&gt;# agent logic here
&lt;/span&gt;    &lt;span class="n"&gt;result&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="nf"&gt;run_deploy&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;artifact_path&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;target_env&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
    &lt;span class="k"&gt;return&lt;/span&gt; &lt;span class="n"&gt;result&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;Each time &lt;code&gt;deploy_artifact&lt;/code&gt; is called, K9 captures a full CIEU:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight json"&gt;&lt;code&gt;&lt;span class="p"&gt;{&lt;/span&gt;&lt;span class="w"&gt;
  &lt;/span&gt;&lt;span class="nl"&gt;"cieu_id"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="s2"&gt;"cieu_20260311_143022_a3f1"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;&lt;span class="w"&gt;
  &lt;/span&gt;&lt;span class="nl"&gt;"X_t"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="p"&gt;{&lt;/span&gt;&lt;span class="w"&gt;
    &lt;/span&gt;&lt;span class="nl"&gt;"task"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="s2"&gt;"deploy build artifact to staging-02"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;&lt;span class="w"&gt;
    &lt;/span&gt;&lt;span class="nl"&gt;"previous_output"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="p"&gt;{&lt;/span&gt;&lt;span class="nl"&gt;"status"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="s2"&gt;"build_passed"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="nl"&gt;"artifact"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="s2"&gt;"app-v2.3.1.tar.gz"&lt;/span&gt;&lt;span class="p"&gt;},&lt;/span&gt;&lt;span class="w"&gt;
    &lt;/span&gt;&lt;span class="nl"&gt;"context_snapshot"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="s2"&gt;"..."&lt;/span&gt;&lt;span class="w"&gt;
  &lt;/span&gt;&lt;span class="p"&gt;},&lt;/span&gt;&lt;span class="w"&gt;
  &lt;/span&gt;&lt;span class="nl"&gt;"U_t"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="s2"&gt;"Deploy app-v2.3.1.tar.gz to staging-02. Target environment verified as non-production."&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;&lt;span class="w"&gt;
  &lt;/span&gt;&lt;span class="nl"&gt;"Y_star_t"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="p"&gt;{&lt;/span&gt;&lt;span class="nl"&gt;"expected_env"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="s2"&gt;"staging-02"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="nl"&gt;"expected_action"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="s2"&gt;"write"&lt;/span&gt;&lt;span class="p"&gt;},&lt;/span&gt;&lt;span class="w"&gt;
  &lt;/span&gt;&lt;span class="nl"&gt;"Y_t1"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="p"&gt;{&lt;/span&gt;&lt;span class="nl"&gt;"actual_env"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="s2"&gt;"production-01"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="nl"&gt;"actual_action"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="s2"&gt;"write"&lt;/span&gt;&lt;span class="p"&gt;},&lt;/span&gt;&lt;span class="w"&gt;
  &lt;/span&gt;&lt;span class="nl"&gt;"R_t1"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="mf"&gt;0.94&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;&lt;span class="w"&gt;
  &lt;/span&gt;&lt;span class="nl"&gt;"timestamp"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="s2"&gt;"2026-03-11T14:30:22Z"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;&lt;span class="w"&gt;
  &lt;/span&gt;&lt;span class="nl"&gt;"hash"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="s2"&gt;"sha256:8f3a..."&lt;/span&gt;&lt;span class="w"&gt;
&lt;/span&gt;&lt;span class="p"&gt;}&lt;/span&gt;&lt;span class="w"&gt;
&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;R_t+1 of 0.94 means near-total divergence from stated intent. In my incident, this kind of record would have fired an alert after the &lt;em&gt;first&lt;/em&gt; wrong action, not 41 minutes later.&lt;/p&gt;
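
&lt;p&gt;To make the shape of the record concrete, here is one way it could be modeled in Python. This is only a sketch: &lt;code&gt;CIEURecord&lt;/code&gt; and &lt;code&gt;to_jsonl_line&lt;/code&gt; are hypothetical names, not part of the k9log API. The field names simply mirror the JSON keys shown above, and the hash field is left out.&lt;/p&gt;

```python
# Hypothetical sketch: CIEURecord is an illustrative type, not part of k9log.
import json
from dataclasses import asdict, dataclass


@dataclass(frozen=True)
class CIEURecord:
    cieu_id: str
    X_t: dict        # context the agent saw
    U_t: str         # stated intent
    Y_star_t: dict   # expected output
    Y_t1: dict       # actual output
    R_t1: float      # deviation score
    timestamp: str

    def to_jsonl_line(self) -> str:
        """Serialize to a single line, suitable for an append-only .jsonl file."""
        return json.dumps(asdict(self), sort_keys=True)


record = CIEURecord(
    cieu_id="cieu_20260311_143022_a3f1",
    X_t={"task": "deploy build artifact to staging-02"},
    U_t="Deploy app-v2.3.1.tar.gz to staging-02.",
    Y_star_t={"expected_env": "staging-02", "expected_action": "write"},
    Y_t1={"actual_env": "production-01", "actual_action": "write"},
    R_t1=0.94,
    timestamp="2026-03-11T14:30:22Z",
)
line = record.to_jsonl_line()
print(json.loads(line)["R_t1"])  # 0.94
```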




&lt;h2&gt;
  
  
  Reading the Audit Trail
&lt;/h2&gt;

&lt;p&gt;The CLI gives you the causal view:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight shell"&gt;&lt;code&gt;k9log causal &lt;span class="nt"&gt;--last&lt;/span&gt; 10
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;Output:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;Step  Intent                          Expected      Actual          R_t+1
---   ------                          --------      ------          -----
t-9   deploy artifact to staging-02   staging-02    staging-02      0.02   ✓
t-8   run smoke tests                 pass          pass            0.01   ✓
t-7   tag release candidate           staging       staging         0.03   ✓
t-6   deploy artifact to staging-02   staging-02    production-01   0.94   ⚠️  ← HERE
t-5   verify deployment               staging       production      0.91   ⚠️
...
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;The deviation started at t-6. Everything before it was clean. This is the kind of signal that would have stopped the incident at the first wrong action instead of 41 minutes in.&lt;/p&gt;
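
&lt;p&gt;Finding that break point is a simple scan once the scores exist. The sketch below assumes the trail is available as &lt;code&gt;(step, score)&lt;/code&gt; pairs in order and uses an assumed threshold of 0.5. Both the layout and the function name &lt;code&gt;first_deviation&lt;/code&gt; are illustrative and do not describe how the &lt;code&gt;k9log&lt;/code&gt; CLI works internally.&lt;/p&gt;

```python
# Hypothetical sketch: the threshold and the (step, score) layout are assumptions.

def first_deviation(steps, threshold=0.5):
    """Return the label of the first step whose score reaches the threshold, or None."""
    for label, score in steps:
        if score >= threshold:
            return label
    return None


trail = [("t-9", 0.02), ("t-8", 0.01), ("t-7", 0.03), ("t-6", 0.94), ("t-5", 0.91)]
print(first_deviation(trail))  # t-6
```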




&lt;h2&gt;
  
  
  Why Not Use an LLM to Judge Deviation?
&lt;/h2&gt;

&lt;p&gt;I get this question a lot.&lt;/p&gt;

&lt;p&gt;Using an LLM to evaluate another LLM's behavior introduces a second failure surface. The auditor shares the same failure modes as the agent: it can be manipulated through its prompt, it can hallucinate, and its evaluations aren't reproducible. You'd need to audit the auditor.&lt;/p&gt;

&lt;p&gt;R_t+1 is a deterministic function. Given the same CIEU record, it always produces the same score. It's computable offline, without API calls, with no latency cost at audit time. And it can be verified independently, which matters for EU AI Act Article 12 (record-keeping for high-risk AI systems), where you need to be able to show a regulator that your logging system actually captures what it claims to capture.&lt;/p&gt;




&lt;h2&gt;
  
  
  The Ledger
&lt;/h2&gt;

&lt;p&gt;All CIEUs are appended to a tamper-evident ledger at:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight shell"&gt;&lt;code&gt;~/.k9log/logs/k9log.cieu.jsonl
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;Each entry is hash-chained to the previous one. You can verify integrity at any time:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight shell"&gt;&lt;code&gt;k9log verify-log
&lt;span class="c"&gt;# ✓ Chain intact: 847 records verified&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;If any record has been modified or deleted, the chain breaks and &lt;code&gt;verify-log&lt;/code&gt; will tell you exactly where.&lt;/p&gt;
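
&lt;p&gt;For readers who haven't worked with hash chains, the idea can be sketched in a few lines with Python's standard &lt;code&gt;hashlib&lt;/code&gt;. This is a generic illustration, not k9log's code: the entry layout, the &lt;code&gt;GENESIS&lt;/code&gt; placeholder, and the helper names are assumptions, and the real ledger format may differ.&lt;/p&gt;

```python
# Hypothetical sketch of hash chaining; k9log's real record layout may differ.
import hashlib
import json

GENESIS = "0" * 64  # assumed placeholder "previous hash" for the first entry


def entry_hash(payload: dict, prev_hash: str) -> str:
    """Hash the payload together with the previous entry's hash."""
    body = json.dumps(payload, sort_keys=True) + prev_hash
    return hashlib.sha256(body.encode("utf-8")).hexdigest()


def append(ledger: list, payload: dict) -> None:
    prev_hash = ledger[-1]["hash"] if ledger else GENESIS
    ledger.append({"payload": payload, "prev": prev_hash, "hash": entry_hash(payload, prev_hash)})


def first_broken_index(ledger: list):
    """Return the index of the first entry that fails verification, or None."""
    prev_hash = GENESIS
    for index, entry in enumerate(ledger):
        if entry["prev"] != prev_hash or entry["hash"] != entry_hash(entry["payload"], prev_hash):
            return index
        prev_hash = entry["hash"]
    return None


ledger = []
for step in ("deploy", "smoke_test", "tag"):
    append(ledger, {"step": step})
print(first_broken_index(ledger))  # None

ledger[1]["payload"]["step"] = "edited"  # simulate tampering
print(first_broken_index(ledger))  # 1
```

&lt;p&gt;In this sketch, editing a record or removing one from the middle breaks the link at that position. Detecting records trimmed from the very end would additionally require keeping the latest hash somewhere outside the file.&lt;/p&gt;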




&lt;h2&gt;
  
  
  What CIEU Is Not
&lt;/h2&gt;

&lt;p&gt;To be clear about scope:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;
&lt;strong&gt;It does not prevent&lt;/strong&gt; the agent from taking wrong actions. It detects and records them.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;It does not replace&lt;/strong&gt; access controls, sandboxing, or human oversight for high-risk operations.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;It does not work&lt;/strong&gt; without instrumentation — you have to wrap your agent functions with &lt;code&gt;@k9&lt;/code&gt; or use one of the integration entry points.&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;The constraint validation layer (via &lt;code&gt;constraint_file&lt;/code&gt;) is a separate feature that &lt;em&gt;does&lt;/em&gt; block out-of-bounds actions before they execute. But that's a topic for a separate post.&lt;/p&gt;




&lt;h2&gt;
  
  
  Get Started
&lt;/h2&gt;



&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight shell"&gt;&lt;code&gt;pip &lt;span class="nb"&gt;install &lt;/span&gt;k9audit-hook
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;





&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight python"&gt;&lt;code&gt;&lt;span class="kn"&gt;from&lt;/span&gt; &lt;span class="n"&gt;k9log&lt;/span&gt; &lt;span class="kn"&gt;import&lt;/span&gt; &lt;span class="n"&gt;k9&lt;/span&gt;

&lt;span class="nd"&gt;@k9&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;skill&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;my_agent_step&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
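&lt;span class="k"&gt;def&lt;/span&gt; &lt;span class="nf"&gt;my_function&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;input_data&lt;/span&gt;&lt;span class="p"&gt;):&lt;/span&gt;
    &lt;span class="c1"&gt;# your agent logic goes here; this placeholder just echoes the input
&lt;/span&gt;    &lt;span class="n"&gt;result&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;input_data&lt;/span&gt;
    &lt;span class="k"&gt;return&lt;/span&gt; &lt;span class="n"&gt;result&lt;/span&gt;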
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;The CIEU ledger starts building immediately. Run &lt;code&gt;k9log stats&lt;/code&gt; to see what's been captured.&lt;/p&gt;

&lt;p&gt;GitHub: &lt;a href="https://github.com/liuhaotian2024-prog/K9Audit" rel="noopener noreferrer"&gt;https://github.com/liuhaotian2024-prog/K9Audit&lt;/a&gt;&lt;/p&gt;




&lt;p&gt;&lt;em&gt;Questions about the design, or something you'd want CIEU to capture that it currently doesn't? Drop a comment — I read everything.&lt;/em&gt;&lt;/p&gt;




&lt;p&gt;&lt;strong&gt;Tags:&lt;/strong&gt; &lt;code&gt;aiagents&lt;/code&gt; &lt;code&gt;python&lt;/code&gt; &lt;code&gt;opensource&lt;/code&gt; &lt;code&gt;devtools&lt;/code&gt;&lt;/p&gt;

</description>
      <category>agents</category>
      <category>ai</category>
      <category>architecture</category>
      <category>monitoring</category>
    </item>
    <item>
      <title>Why auditing AI agents requires causal AI, not another LLM</title>
      <dc:creator>liuhaotian2024-prog</dc:creator>
      <pubDate>Tue, 10 Mar 2026 22:21:13 +0000</pubDate>
      <link>https://dev.to/liuhaotian2024prog/why-auditing-ai-agents-requires-causal-ai-not-another-llm-269c</link>
      <guid>https://dev.to/liuhaotian2024prog/why-auditing-ai-agents-requires-causal-ai-not-another-llm-269c</guid>
      <description>&lt;h1&gt;
  
  
  Your Logs Tell You What Happened. They Don't Tell You What Should Have Happened.
&lt;/h1&gt;

&lt;p&gt;&lt;em&gt;Haotian Liu · March 2026&lt;/em&gt;&lt;/p&gt;




&lt;h2&gt;
  
  
  The gap nobody talks about
&lt;/h2&gt;

&lt;p&gt;Your AI agent ran overnight. The result is wrong. You open the terminal — and you see a wall of log lines telling you exactly what the agent did, step by step.&lt;/p&gt;

&lt;p&gt;But none of those lines tell you &lt;strong&gt;what it was supposed to do&lt;/strong&gt;. And none of them tell you where it started going off the rails.&lt;/p&gt;

&lt;p&gt;This is not a logging problem. This is a &lt;strong&gt;structural gap&lt;/strong&gt; in how we think about agent observability.&lt;/p&gt;

&lt;p&gt;Logs record the &lt;em&gt;execution&lt;/em&gt;. They do not record the &lt;em&gt;intent&lt;/em&gt;. Without intent, you cannot measure deviation. Without measuring deviation, you are not auditing — you are just collecting noise.&lt;/p&gt;




&lt;h2&gt;
  
  
  Why existing tools don't solve this
&lt;/h2&gt;

&lt;p&gt;Tools like LangSmith, Langfuse, and Arize are genuinely useful for what they do: tracing execution, tracking latency and cost, visualizing call chains. If you need to know how long your agent took or how many tokens it consumed, these tools are excellent.&lt;/p&gt;

&lt;p&gt;But they are built on a flat timeline model. They record &lt;em&gt;what happened&lt;/em&gt;. They do not record &lt;em&gt;what the system intended to happen&lt;/em&gt;. And crucially, when they evaluate output quality, they typically rely on another LLM as a judge.&lt;/p&gt;

&lt;p&gt;This is the paradox: &lt;strong&gt;a probabilistic system cannot render certain judgment about another probabilistic system&lt;/strong&gt;. An LLM evaluator is itself uncertain. Its output varies between runs. Using it to audit an agent is like asking one suspect to verify another suspect's alibi.&lt;/p&gt;

&lt;p&gt;You cannot build forensic-grade evidence on probabilistic foundations.&lt;/p&gt;




&lt;h2&gt;
  
  
  What causal auditing actually means
&lt;/h2&gt;

&lt;p&gt;The alternative is to separate the question into two parts that can be answered deterministically:&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;1. What was the agent supposed to do?&lt;/strong&gt; This is defined explicitly, before runtime, as a set of constraints: no staging URLs in production config, trade amount capped at 500, file writes only within the project directory.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;2. What did the agent actually do, and how far did it deviate?&lt;/strong&gt; This is recorded at runtime by comparing every action against the pre-defined constraints.&lt;/p&gt;

&lt;p&gt;Neither question requires an LLM to answer. Both can be answered by deterministic, mathematical comparison.&lt;/p&gt;

&lt;p&gt;This is the CIEU model — &lt;strong&gt;Causal Intent-Execution Unit&lt;/strong&gt;. Every monitored action produces a five-tuple:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;X_t   — who acted, and under what conditions
U_t   — what the agent actually did
Y*_t  — what the agent was supposed to do (the intent contract)
Y_t+1 — what actually resulted
R_t+1 — how far the outcome diverged from intent, and why
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;These five fields are written into a local ledger as a hash-chained record. Each record's SHA256 hash is embedded into the next record. Nothing can be silently modified after the fact. The chain is cryptographically verifiable.&lt;/p&gt;

&lt;p&gt;This is not a new log format. It is a different category of infrastructure: &lt;strong&gt;tamper-evident causal evidence&lt;/strong&gt;.&lt;/p&gt;




&lt;h2&gt;
  
  
  A real example: three silent writes
&lt;/h2&gt;

&lt;p&gt;On March 4, 2026, during a routine quant backtesting session, Claude Code attempted three times — 41 minutes apart — to write a staging environment URL into a production config file:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight json"&gt;&lt;code&gt;&lt;span class="p"&gt;{&lt;/span&gt;&lt;span class="nl"&gt;"endpoint"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="s2"&gt;"https://api.market-data.staging.internal/v2/ohlcv"&lt;/span&gt;&lt;span class="p"&gt;}&lt;/span&gt;&lt;span class="w"&gt;
&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;The syntax was valid. No exception was thrown. A conventional logger would have recorded three "file write" events and moved on — quietly corrupting every subsequent backtest result.&lt;/p&gt;

&lt;p&gt;Because the function was instrumented with a CIEU constraint:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight python"&gt;&lt;code&gt;&lt;span class="nd"&gt;@k9&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;deny_content&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;staging.internal&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;],&lt;/span&gt; &lt;span class="n"&gt;allowed_paths&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;./project/**&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;])&lt;/span&gt;
&lt;span class="k"&gt;def&lt;/span&gt; &lt;span class="nf"&gt;write_config&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;path&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="nb"&gt;str&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;content&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="nb"&gt;dict&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt; &lt;span class="o"&gt;-&amp;gt;&lt;/span&gt; &lt;span class="nb"&gt;bool&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="bp"&gt;...&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;...all three attempts were flagged immediately, written to the ledger with severity 0.9, and made permanently traceable. The root cause was identified in under a second:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight console"&gt;&lt;code&gt;k9log trace --last
→ seq=451  VIOLATION  _write_file
   finding: content contains forbidden pattern 'staging.internal'
   causal_proof: root cause traced to step #449, chain intact
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;Three attempts. 41 minutes apart. All recorded. All verifiable.&lt;/p&gt;
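
&lt;p&gt;The content check in that example needs no model at all. A minimal sketch of the idea follows, assuming a plain substring match over the serialized payload. The helper &lt;code&gt;find_denied&lt;/code&gt; is hypothetical and is not how k9log necessarily implements &lt;code&gt;deny_content&lt;/code&gt;.&lt;/p&gt;

```python
# Hypothetical sketch: not k9log's implementation of deny_content.
import json


def find_denied(content: dict, deny_content: list) -> list:
    """Return every forbidden pattern that appears in the serialized content."""
    serialized = json.dumps(content, sort_keys=True)
    return [pattern for pattern in deny_content if pattern in serialized]


config = {"endpoint": "https://api.market-data.staging.internal/v2/ohlcv"}
print(find_denied(config, ["staging.internal"]))  # ['staging.internal']
print(find_denied({"endpoint": "https://api.example.com"}, ["staging.internal"]))  # []
```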




&lt;h2&gt;
  
  
  The three moments when this matters
&lt;/h2&gt;

&lt;p&gt;&lt;strong&gt;When something goes wrong at 3am.&lt;/strong&gt; You don't want to read 10,000 log lines. You want to run one command and see: which step deviated, from which constraint, by how much. That is what &lt;code&gt;k9log trace --last&lt;/code&gt; gives you — in under a second.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;When you need to show proof.&lt;/strong&gt; Your agent caused a problem in production. Your client asks what happened. You pull up a terminal screenshot. It could have been edited. Nobody trusts it. A SHA256 hash-chained ledger, verified with &lt;code&gt;k9log verify-log&lt;/code&gt;, is cryptographic proof that the record has not been tampered with since it was written. That is evidence a screenshot cannot provide.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;When you need approval to deploy.&lt;/strong&gt; Your manager asks: what happens if the agent goes out of bounds? Without a concrete answer, the project dies in the approval meeting. With CIEU constraints defined and a verifiable ledger in place, the answer is: &lt;em&gt;every action is measured against explicit rules, deviations are flagged immediately, and the record cannot be altered retroactively.&lt;/em&gt; That answer gets projects approved.&lt;/p&gt;




&lt;h2&gt;
  
  
  What this looks like in practice
&lt;/h2&gt;

&lt;p&gt;For a Python developer, instrumentation is one decorator:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight python"&gt;&lt;code&gt;&lt;span class="nd"&gt;@k9&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;deny_content&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="s"&gt;staging.internal&lt;/span&gt;&lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="p"&gt;],&lt;/span&gt; &lt;span class="n"&gt;amount&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="p"&gt;{&lt;/span&gt;&lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="s"&gt;max&lt;/span&gt;&lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="mi"&gt;500&lt;/span&gt;&lt;span class="p"&gt;})&lt;/span&gt;
&lt;span class="k"&gt;def&lt;/span&gt; &lt;span class="nf"&gt;execute_trade&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;symbol&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="nb"&gt;str&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;amount&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="nb"&gt;float&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;endpoint&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="nb"&gt;str&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt; &lt;span class="o"&gt;-&amp;gt;&lt;/span&gt; &lt;span class="nb"&gt;dict&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;
    &lt;span class="bp"&gt;...&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;
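
&lt;p&gt;To make the decorator's job concrete, here is a minimal sketch of the kind of check it could perform. This is not the K9 Audit implementation: &lt;code&gt;constrained&lt;/code&gt;, &lt;code&gt;AUDIT_LOG&lt;/code&gt; and the record fields are hypothetical names I picked for illustration, and the sketch assumes that a violation is recorded rather than blocked.&lt;/p&gt;

```python
import functools
import inspect

AUDIT_LOG = []  # hypothetical in-memory stand-in for the ledger


def constrained(deny_content=(), **numeric_limits):
    """Hypothetical constraint decorator; not the k9log API."""

    def decorator(func):
        signature = inspect.signature(func)

        @functools.wraps(func)
        def wrapper(*args, **kwargs):
            # Map positional and keyword arguments onto parameter names.
            bound = signature.bind(*args, **kwargs)
            bound.apply_defaults()
            violations = []
            for name, value in bound.arguments.items():
                # Content rule: no string argument may contain a denied substring.
                if isinstance(value, str):
                    for needle in deny_content:
                        if needle in value:
                            violations.append(f"{name} contains {needle!r}")
                # Numeric rule: the argument must not exceed its declared maximum.
                limit = numeric_limits.get(name, {}).get("max")
                if limit is not None and isinstance(value, (int, float)) and value > limit:
                    violations.append(f"{name}={value} exceeds max {limit}")
            # Record the outcome, then run the function: this sketch audits, it does not block.
            AUDIT_LOG.append({
                "function": func.__name__,
                "params": dict(bound.arguments),
                "violations": violations,
                "passed": not violations,
            })
            return func(*args, **kwargs)

        return wrapper

    return decorator


@constrained(deny_content=["staging.internal"], amount={"max": 500})
def execute_trade(symbol: str, amount: float, endpoint: str) -> dict:
    return {"symbol": symbol, "amount": amount, "endpoint": endpoint}
```

&lt;p&gt;Calling &lt;code&gt;execute_trade("AAPL", 900.0, "https://staging.internal/trade")&lt;/code&gt; with this sketch appends a record with two violations, one numeric and one content-based. Both checks are plain comparisons, so the same call always produces the same verdict.&lt;/p&gt;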



&lt;p&gt;For a Claude Code user, it is one JSON file, &lt;code&gt;.claude/settings.json&lt;/code&gt; in the project, with no code changes required:&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight json"&gt;&lt;code&gt;&lt;span class="p"&gt;{&lt;/span&gt;&lt;span class="nl"&gt;"hooks"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="p"&gt;{&lt;/span&gt;&lt;span class="w"&gt;
  &lt;/span&gt;&lt;span class="nl"&gt;"PreToolUse"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt;  &lt;/span&gt;&lt;span class="p"&gt;[{&lt;/span&gt;&lt;span class="nl"&gt;"matcher"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="s2"&gt;"*"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="nl"&gt;"hooks"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="p"&gt;[{&lt;/span&gt;&lt;span class="nl"&gt;"type"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="s2"&gt;"command"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="nl"&gt;"command"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="s2"&gt;"python -m k9log.hook"&lt;/span&gt;&lt;span class="p"&gt;}]}],&lt;/span&gt;&lt;span class="w"&gt;
  &lt;/span&gt;&lt;span class="nl"&gt;"PostToolUse"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="p"&gt;[{&lt;/span&gt;&lt;span class="nl"&gt;"matcher"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="s2"&gt;"*"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="nl"&gt;"hooks"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="p"&gt;[{&lt;/span&gt;&lt;span class="nl"&gt;"type"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="s2"&gt;"command"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="nl"&gt;"command"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="s2"&gt;"python -m k9log.hook_post"&lt;/span&gt;&lt;span class="p"&gt;}]}]&lt;/span&gt;&lt;span class="w"&gt;
&lt;/span&gt;&lt;span class="p"&gt;}}&lt;/span&gt;&lt;span class="w"&gt;
&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;
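
&lt;p&gt;For readers who have not written a Claude Code hook before, the sketch below shows roughly what a hook command has to do. Claude Code passes the event to the command as JSON on standard input, including &lt;code&gt;tool_name&lt;/code&gt; and &lt;code&gt;tool_input&lt;/code&gt;, and treats exit code 2 from a &lt;code&gt;PreToolUse&lt;/code&gt; hook as a request to block the call. Everything else here is an assumption for illustration: &lt;code&gt;DENY_CONTENT&lt;/code&gt;, &lt;code&gt;find_violations&lt;/code&gt;, &lt;code&gt;run&lt;/code&gt; and the record fields are hypothetical, and this is not the code behind &lt;code&gt;k9log.hook&lt;/code&gt;.&lt;/p&gt;

```python
import json

DENY_CONTENT = ["staging.internal"]  # hypothetical rule, mirroring the decorator example


def find_violations(event: dict, deny_content=DENY_CONTENT) -> list:
    """Return the denied substrings found anywhere in the tool input."""
    # Serialise the tool input so nested values are searched too.
    haystack = json.dumps(event.get("tool_input", {}))
    return [needle for needle in deny_content if needle in haystack]


def run(raw_event: str) -> int:
    """Process one hook event and return the exit code for the hook process."""
    event = json.loads(raw_event)
    violations = find_violations(event)
    record = {
        "tool": event.get("tool_name"),
        "violations": violations,
        "passed": not violations,
    }
    print(json.dumps(record))
    # Exit code 0 lets the call proceed; 2 asks Claude Code to block it.
    return 2 if violations else 0


# As a hook script, the final line would be: sys.exit(run(sys.stdin.read()))
```

&lt;p&gt;An audit-only hook would always return 0 and simply write the record. Returning 2 is shown only to illustrate where enforcement could plug in.&lt;/p&gt;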



&lt;p&gt;The ledger is stored locally at &lt;code&gt;~/.k9log/logs/k9log.cieu.jsonl&lt;/code&gt;. No data leaves the machine. No tokens are consumed. No per-event billing.&lt;/p&gt;
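
&lt;p&gt;A local JSONL file is only useful as evidence if edits to it can be detected. One common way to get that property is a hash chain, sketched below. I am assuming SHA-256 and an entry layout of &lt;code&gt;body&lt;/code&gt;, &lt;code&gt;prev_hash&lt;/code&gt; and &lt;code&gt;hash&lt;/code&gt; purely for illustration; the actual K9 Audit record format may differ, and the ledger is modelled as a list of lines so the example runs without touching disk.&lt;/p&gt;

```python
import hashlib
import json

GENESIS = "0" * 64  # placeholder previous-hash for the first record


def _digest(body: dict, prev_hash: str) -> str:
    # Canonical JSON (sorted keys) keeps the hash stable across runs.
    payload = json.dumps(body, sort_keys=True) + prev_hash
    return hashlib.sha256(payload.encode("utf-8")).hexdigest()


def append_record(lines: list, body: dict) -> None:
    """Append one JSONL line whose hash covers the body and the previous hash."""
    prev_hash = json.loads(lines[-1])["hash"] if lines else GENESIS
    entry = {"body": body, "prev_hash": prev_hash, "hash": _digest(body, prev_hash)}
    lines.append(json.dumps(entry, sort_keys=True))


def verify(lines: list) -> bool:
    """Recompute every hash in order and check each link to the previous record."""
    prev_hash = GENESIS
    for line in lines:
        entry = json.loads(line)
        if entry["prev_hash"] != prev_hash:
            return False
        if entry["hash"] != _digest(entry["body"], prev_hash):
            return False
        prev_hash = entry["hash"]
    return True
```

&lt;p&gt;Changing any earlier record, or deleting one from the start or middle, makes &lt;code&gt;verify&lt;/code&gt; return &lt;code&gt;False&lt;/code&gt;. Detecting a truncated tail or a fully rewritten file additionally requires keeping the latest hash somewhere the writer cannot change.&lt;/p&gt;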




&lt;h2&gt;
  
  
  The boundary worth stating clearly
&lt;/h2&gt;

&lt;p&gt;CIEU auditing answers one question: &lt;strong&gt;did the agent violate the constraints you defined?&lt;/strong&gt;&lt;/p&gt;

&lt;p&gt;It does not answer: did the agent accomplish the goal you gave it? That question requires evaluation of task completion, which is a different problem — and one that legitimately benefits from LLM evaluation. The two approaches are not in competition. CIEU auditing provides the deterministic foundation; higher-level evaluation can be built on top.&lt;/p&gt;

&lt;p&gt;The mistake is trying to use a probabilistic evaluator as a substitute for a deterministic record. These are not interchangeable.&lt;/p&gt;




&lt;h2&gt;
  
  
  Who this is for
&lt;/h2&gt;

&lt;div class="table-wrapper-paragraph"&gt;&lt;table&gt;
&lt;thead&gt;
&lt;tr&gt;
&lt;th&gt;Scenario&lt;/th&gt;
&lt;th&gt;Entry point&lt;/th&gt;
&lt;th&gt;Key commands&lt;/th&gt;
&lt;th&gt;Notes&lt;/th&gt;
&lt;/tr&gt;
&lt;/thead&gt;
&lt;tbody&gt;
&lt;tr&gt;
&lt;td&gt;⭐ Claude Code user&lt;/td&gt;
&lt;td&gt;One &lt;code&gt;.claude/settings.json&lt;/code&gt;
&lt;/td&gt;
&lt;td&gt;
&lt;code&gt;k9log trace&lt;/code&gt; / &lt;code&gt;k9log stats&lt;/code&gt;
&lt;/td&gt;
&lt;td&gt;Zero Python required. Every tool call auto-recorded. &lt;strong&gt;Unique differentiator vs all competitors.&lt;/strong&gt;
&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;✅ Python developer&lt;/td&gt;
&lt;td&gt;
&lt;code&gt;@k9&lt;/code&gt; decorator&lt;/td&gt;
&lt;td&gt;
&lt;code&gt;k9log trace --last&lt;/code&gt; / &lt;code&gt;k9log report&lt;/code&gt;
&lt;/td&gt;
&lt;td&gt;One decorator per function. Sync and async both supported.&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;✅ LangChain agent&lt;/td&gt;
&lt;td&gt;
&lt;code&gt;K9CallbackHandler&lt;/code&gt; (3 lines)&lt;/td&gt;
&lt;td&gt;
&lt;code&gt;k9log trace&lt;/code&gt; / &lt;code&gt;k9log verify-log&lt;/code&gt;
&lt;/td&gt;
&lt;td&gt;Native callback hook. Full CIEU records per tool call.&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;✅ High-risk business ops&lt;/td&gt;
&lt;td&gt;
&lt;code&gt;@k9&lt;/code&gt; + JSON config&lt;/td&gt;
&lt;td&gt;
&lt;code&gt;k9log alerts&lt;/code&gt; / &lt;code&gt;k9log causal&lt;/code&gt;
&lt;/td&gt;
&lt;td&gt;Finance, config writes, DB ops. Numeric + content constraints.&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;✅ DevOps / CI pipeline&lt;/td&gt;
&lt;td&gt;
&lt;code&gt;@k9&lt;/code&gt; + &lt;code&gt;ci_check.py&lt;/code&gt;
&lt;/td&gt;
&lt;td&gt;
&lt;code&gt;ci_check.py&lt;/code&gt; / &lt;code&gt;k9log verify-log&lt;/code&gt;
&lt;/td&gt;
&lt;td&gt;Pipeline halts on violation. Exit code non-zero. No manual review.&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;✅ Small team debugging&lt;/td&gt;
&lt;td&gt;Any entry point&lt;/td&gt;
&lt;td&gt;
&lt;code&gt;k9log trace --last&lt;/code&gt; / &lt;code&gt;k9log stats&lt;/code&gt;
&lt;/td&gt;
&lt;td&gt;Root cause in under a second. No log archaeology required.&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;✅ Data security&lt;/td&gt;
&lt;td&gt;
&lt;code&gt;@k9&lt;/code&gt; deny_content&lt;/td&gt;
&lt;td&gt;
&lt;code&gt;k9log verify-log&lt;/code&gt; / &lt;code&gt;k9log report&lt;/code&gt;
&lt;/td&gt;
&lt;td&gt;File access control. Cryptographic proof of what was touched.&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;✅ Teaching / tutorials&lt;/td&gt;
&lt;td&gt;Any entry point&lt;/td&gt;
&lt;td&gt;
&lt;code&gt;k9log report&lt;/code&gt; / &lt;code&gt;k9log causal&lt;/code&gt;
&lt;/td&gt;
&lt;td&gt;Easiest audience to reach today. HTML report is shareable. Demo violations visually.&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;🔲 CrewAI / AutoGen&lt;/td&gt;
&lt;td&gt;Wrapper pattern&lt;/td&gt;
&lt;td&gt;&lt;code&gt;k9log trace&lt;/code&gt;&lt;/td&gt;
&lt;td&gt;Works via &lt;code&gt;@k9&lt;/code&gt; on tool functions. Native adapters on roadmap.&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;🔲 Enterprise compliance&lt;/td&gt;
&lt;td&gt;Full audit chain&lt;/td&gt;
&lt;td&gt;
&lt;code&gt;k9log verify-log&lt;/code&gt; / &lt;code&gt;k9log report&lt;/code&gt;
&lt;/td&gt;
&lt;td&gt;Future use case. Needs organisational trust-building first.&lt;/td&gt;
&lt;/tr&gt;
&lt;/tbody&gt;
&lt;/table&gt;&lt;/div&gt;

&lt;p&gt;&lt;em&gt;⭐ = unique differentiator    ✅ = works today    🔲 = roadmap&lt;/em&gt;&lt;/p&gt;
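
&lt;p&gt;The DevOps / CI pipeline row above describes a pipeline that halts on violation with a non-zero exit code. As a rough sketch of that gating logic, and not the contents of the real &lt;code&gt;ci_check.py&lt;/code&gt;, the snippet below scans JSONL text for failed records. The &lt;code&gt;passed&lt;/code&gt; field and both function names are hypothetical.&lt;/p&gt;

```python
import json


def count_violations(jsonl_text: str) -> int:
    """Count ledger records whose hypothetical `passed` field is false."""
    failed = 0
    for line in jsonl_text.splitlines():
        if not line.strip():
            continue  # tolerate blank lines
        record = json.loads(line)
        if record.get("passed") is False:
            failed += 1
    return failed


def exit_code(jsonl_text: str) -> int:
    """Return 0 for a clean ledger and 1 otherwise, as a CI step would."""
    return 1 if count_violations(jsonl_text) else 0
```

&lt;p&gt;A pipeline step built on this would read the ledger file and pass the result of &lt;code&gt;exit_code&lt;/code&gt; to &lt;code&gt;sys.exit&lt;/code&gt;, so a single failed record is enough to stop the job.&lt;/p&gt;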




&lt;h2&gt;
  
  
  Try it
&lt;/h2&gt;

&lt;p&gt;K9 Audit is open source under AGPL-3.0. The CIEU architecture is covered by U.S. Provisional Patent Application No. 63/981,777.&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;
&lt;strong&gt;GitHub:&lt;/strong&gt; &lt;a href="https://github.com/liuhaotian2024-prog/K9Audit" rel="noopener noreferrer"&gt;github.com/liuhaotian2024-prog/K9Audit&lt;/a&gt;
&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Install:&lt;/strong&gt; &lt;code&gt;pip install k9audit-hook&lt;/code&gt;
&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Contact:&lt;/strong&gt; &lt;a href="mailto:liuhaotian2024@gmail.com"&gt;liuhaotian2024@gmail.com&lt;/a&gt;
&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;&lt;em&gt;If this resonates with a problem you have hit — or if you think the approach is wrong — I want to hear from you.&lt;/em&gt;&lt;/p&gt;

</description>
      <category>ai</category>
      <category>python</category>
      <category>agents</category>
      <category>opensource</category>
    </item>
  </channel>
</rss>
