<?xml version="1.0" encoding="UTF-8"?>
<rss version="2.0" xmlns:atom="http://www.w3.org/2005/Atom" xmlns:dc="http://purl.org/dc/elements/1.1/">
  <channel>
    <title>DEV Community: MartinLyu</title>
    <description>The latest articles on DEV Community by MartinLyu (@dislovelhl).</description>
    <link>https://dev.to/dislovelhl</link>
    <image>
      <url>https://media2.dev.to/dynamic/image/width=90,height=90,fit=cover,gravity=auto,format=auto/https:%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Fuser%2Fprofile_image%2F3972999%2Fb9c2ff52-8420-42b2-9a08-c7e142f56791.png</url>
      <title>DEV Community: MartinLyu</title>
      <link>https://dev.to/dislovelhl</link>
    </image>
    <atom:link rel="self" type="application/rss+xml" href="https://dev.to/feed/dislovelhl"/>
    <language>en</language>
    <item>
      <title>Guardrails decide what an AI agent says. Receipts decide what it did.</title>
      <dc:creator>MartinLyu</dc:creator>
      <pubDate>Sun, 07 Jun 2026 21:00:00 +0000</pubDate>
      <link>https://dev.to/dislovelhl/guardrails-decide-what-an-ai-agent-says-receipts-decide-what-it-did-27jk</link>
      <guid>https://dev.to/dislovelhl/guardrails-decide-what-an-ai-agent-says-receipts-decide-what-it-did-27jk</guid>
      <description>&lt;p&gt;In the last few months the AI-agent safety conversation has moved. It used to be
about the model: prompts, refusals, classifiers. Now the industry is naming a
different layer out loud. The Cloud Security Alliance is writing about going
"from guardrails to governance" and the need for a &lt;em&gt;control layer&lt;/em&gt;. Microsoft
shipped an open-source Agent Governance Toolkit for runtime policy enforcement.
Galileo announced an open-source "control plane for AI agents." Gartner is
warning that ~40% of enterprises will pull autonomous agents back, and Deloitte
puts mature agentic governance at ~21% of organizations. And there is a clock on
it: the EU AI Act's high-risk obligations apply from &lt;strong&gt;August 2, 2026&lt;/strong&gt;, with
Article 12 requiring automatic event logging over a system's lifetime.&lt;/p&gt;

&lt;p&gt;That is the market &lt;a href="https://github.com/dislovelhl/ACGS" rel="noopener noreferrer"&gt;&lt;code&gt;gove-zone&lt;/code&gt;&lt;/a&gt; was built
for. But "governance" is now a crowded word, and most of the new entrants govern
the &lt;em&gt;perimeter&lt;/em&gt;: what an agent is allowed to attempt. &lt;code&gt;gove-zone&lt;/code&gt; governs
something narrower and harder: &lt;strong&gt;whether a specific side effect was legitimate,
with evidence you can verify afterward.&lt;/strong&gt;&lt;/p&gt;

&lt;p&gt;Two distinctions make the difference concrete.&lt;/p&gt;

&lt;h2&gt;Distinction 1: Guardrails moderate the message. Receipts gate the action.&lt;/h2&gt;

&lt;p&gt;A guardrail sits on the model's input and output. It shapes, filters, or blocks &lt;em&gt;text&lt;/em&gt;:
the prompt, the structured response, the tool &lt;em&gt;request&lt;/em&gt;. That is genuinely
useful, and it is the right tool for "don't say that." But it lives on the wrong
side of the line for "don't &lt;em&gt;do&lt;/em&gt; that." By the time a &lt;code&gt;tools/call&lt;/code&gt; leaves the
model, the interesting question is no longer what the model intended. It is
whether &lt;em&gt;this exact actor&lt;/em&gt; may run &lt;em&gt;this exact action&lt;/em&gt; with &lt;em&gt;these exact
arguments&lt;/em&gt; under &lt;em&gt;this exact policy evidence&lt;/em&gt;.&lt;/p&gt;

&lt;p&gt;&lt;code&gt;gove-zone&lt;/code&gt; answers that question at the executor boundary, not the prompt.
In the ACGS repository's own framing: &lt;strong&gt;guardrails moderate content; ACGS enforces execution
legitimacy.&lt;/strong&gt; The governed executor fails closed without a valid receipt, and a
receipt binds the actor, the action, and the exact arguments that the executor checks.&lt;/p&gt;

&lt;p&gt;A worked example from the
&lt;a href="https://github.com/dislovelhl/ACGS/blob/master/docs/DECISION_RECEIPT_SPEC.md" rel="noopener noreferrer"&gt;Decision Receipt spec&lt;/a&gt;:
a receipt issued for &lt;code&gt;{"path":"/tmp/safe.txt","content":"ok"}&lt;/code&gt; will not
authorize &lt;code&gt;{"path":"/etc/shadow","content":"pwned"}&lt;/code&gt;. Same action name,
different arguments: the gate catches it as an argument mismatch &lt;em&gt;before&lt;/em&gt; any
side effect. A guardrail watching the model's text has no equivalent move,
because the substitution can happen anywhere between the request and the syscall.&lt;/p&gt;
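
&lt;p&gt;The argument-mismatch check can be sketched in a few lines of Python. This is a
minimal illustration under stated assumptions, not &lt;code&gt;gove-zone&lt;/code&gt;'s actual API:
the &lt;code&gt;Receipt&lt;/code&gt; fields, &lt;code&gt;issue_receipt&lt;/code&gt;, &lt;code&gt;governed_execute&lt;/code&gt;, and the
SHA-256 digest over canonical JSON are all hypothetical names and choices made for
the example.&lt;/p&gt;

```python
import hashlib
import json
from dataclasses import dataclass
from typing import Any, Optional


def args_digest(arguments: dict[str, Any]) -> str:
    """Hash a canonical JSON encoding so key order does not change the digest."""
    canonical = json.dumps(arguments, sort_keys=True, separators=(",", ":"))
    return hashlib.sha256(canonical.encode("utf-8")).hexdigest()


@dataclass(frozen=True)
class Receipt:
    # Hypothetical shape; a real Decision Receipt carries more fields.
    actor: str
    action: str
    args_sha256: str


class ReceiptDenied(Exception):
    """Raised when the gate refuses to run an action."""


def issue_receipt(actor: str, action: str, arguments: dict[str, Any]) -> Receipt:
    """Bind the actor, the action, and a digest of the exact arguments."""
    return Receipt(actor, action, args_digest(arguments))


def governed_execute(receipt: Optional[Receipt], actor: str, action: str,
                     arguments: dict[str, Any], effects: list[str]) -> None:
    """Run the side effect only if the receipt matches this exact call."""
    if receipt is None:
        raise ReceiptDenied("missing receipt")  # fail closed
    if receipt.actor != actor or receipt.action != action:
        raise ReceiptDenied("actor or action mismatch")
    if receipt.args_sha256 != args_digest(arguments):
        raise ReceiptDenied("argument mismatch")
    # Stand-in for the real side effect (for example, writing the file).
    effects.append(f"{action}:{arguments['path']}")
```

&lt;p&gt;With a receipt issued for the &lt;code&gt;/tmp/safe.txt&lt;/code&gt; arguments, calling
&lt;code&gt;governed_execute&lt;/code&gt; with the &lt;code&gt;/etc/shadow&lt;/code&gt; arguments raises
&lt;code&gt;ReceiptDenied&lt;/code&gt; before anything is appended to &lt;code&gt;effects&lt;/code&gt;.&lt;/p&gt;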

&lt;p&gt;The two are complementary, not rivals. A real stack runs both: guardrails for
what the agent says, receipts for what it is permitted to do.&lt;/p&gt;

&lt;h2&gt;Distinction 2: An audit log is a narrative. A Decision Receipt is a gate.&lt;/h2&gt;

&lt;p&gt;The usual answer to agent accountability is logging: let the action run, write
a line, reconstruct later. The EU AI Act's Article 12 even mandates it. But a
log is a &lt;em&gt;story told after the fact&lt;/em&gt;. It cannot stop anything, and if it is
mutable it cannot even prove what happened. The recurring industry phrase right
now, "you need an audit trail before August 2, and the part most teams haven't
built is the &lt;em&gt;verifiable&lt;/em&gt; part," is pointing at exactly this gap.&lt;/p&gt;

&lt;p&gt;A Decision Receipt closes it by collapsing two systems into one object:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;
&lt;strong&gt;The receipt is the gate&lt;/strong&gt;, evaluated &lt;em&gt;before&lt;/em&gt; the side effect — so the audit
artifact and the enforcement decision are the same thing, not two systems that
drift apart.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;The audit chain is tamper-evident&lt;/strong&gt;: local audit events are hash-chained,
and corrupting an entry breaks verification.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Decisions are replayable&lt;/strong&gt; where the raw call context is retained, so "what
was allowed, and on what evidence?" is verifiable.&lt;/li&gt;
&lt;li&gt;For higher-assurance contexts, &lt;strong&gt;opt-in Ed25519 signing&lt;/strong&gt; makes authority
cryptographically attributable rather than merely recorded.&lt;/li&gt;
&lt;/ul&gt;
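
&lt;p&gt;The tamper-evidence property in the list above can be illustrated with a small
hash chain. This is a sketch under assumptions, not &lt;code&gt;gove-zone&lt;/code&gt;'s audit format:
the entry layout, the &lt;code&gt;GENESIS&lt;/code&gt; placeholder, and the helper names are
hypothetical.&lt;/p&gt;

```python
import hashlib
import json
from typing import Any

GENESIS = "0" * 64  # assumed placeholder for the first entry's predecessor


def entry_hash(prev_hash: str, event: dict[str, Any]) -> str:
    """Hash the previous entry's hash together with this event's canonical JSON."""
    body = json.dumps(event, sort_keys=True, separators=(",", ":"))
    return hashlib.sha256((prev_hash + body).encode("utf-8")).hexdigest()


def append_event(chain: list[dict[str, Any]], event: dict[str, Any]) -> None:
    """Append an event linked to the hash of the entry before it."""
    prev_hash = chain[-1]["hash"] if chain else GENESIS
    chain.append({"prev": prev_hash, "event": event,
                  "hash": entry_hash(prev_hash, event)})


def verify_chain(chain: list[dict[str, Any]]) -> bool:
    """Recompute every link; an edited or reordered entry breaks one."""
    prev_hash = GENESIS
    for entry in chain:
        if entry["prev"] != prev_hash:
            return False
        if entry["hash"] != entry_hash(prev_hash, entry["event"]):
            return False
        prev_hash = entry["hash"]
    return True
```

&lt;p&gt;Changing any recorded event after the fact makes &lt;code&gt;verify_chain&lt;/code&gt; return
&lt;code&gt;False&lt;/code&gt;. A chain like this does not by itself detect truncation of the newest
entries; that needs the latest hash anchored somewhere the writer cannot alter.&lt;/p&gt;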

&lt;p&gt;The slogan version: logs observe; receipts gate and audit. You do not get to ask
a log to refuse an action. You can ask a receipt.&lt;/p&gt;

&lt;h2&gt;Where this sits next to the new "governance" tools&lt;/h2&gt;

&lt;p&gt;The honest framing is &lt;em&gt;combine, don't replace&lt;/em&gt;. Perimeter policy engines,
MCP-transport wrappers, IAM, sandboxing, content guardrails, and SIEM/WORM
retention each own a real job. &lt;code&gt;gove-zone&lt;/code&gt; is the &lt;strong&gt;execution-legitimacy layer&lt;/strong&gt;
underneath them: it binds the actor, action, arguments, policy, validator,
authority, receipt, and audit evidence to one decision, and it fails closed. It
does not authenticate principals (that is IAM), it does not contain execution
(that is sandboxing), and it does not moderate model text (that is guardrails).
It proves a specific side-effect decision.&lt;/p&gt;

&lt;p&gt;So when a team adopts Microsoft's toolkit or a control-plane product for
perimeter policy, the open question that remains is: &lt;em&gt;when the action actually
runs, is there a verifiable receipt binding it to the authority that allowed
it?&lt;/em&gt; That is the slot &lt;code&gt;gove-zone&lt;/code&gt; fills.&lt;/p&gt;

&lt;h2&gt;The honest boundary&lt;/h2&gt;

&lt;p&gt;Honesty is part of the design, so this is not a footnote. &lt;code&gt;gove-zone&lt;/code&gt; is
&lt;strong&gt;alpha&lt;/strong&gt; (&lt;code&gt;0.1.0.dev0&lt;/code&gt;). Everything above is real, locally reproducible
engineering evidence, not a maturity claim. Per the repository's own
&lt;a href="https://github.com/dislovelhl/ACGS/blob/master/docs/CLAIMS.md" rel="noopener noreferrer"&gt;claim ledger&lt;/a&gt;,
this project is &lt;strong&gt;not&lt;/strong&gt; production-certified, &lt;strong&gt;not&lt;/strong&gt; compliance-certified, and
&lt;strong&gt;not&lt;/strong&gt; regulator-approved. It is a local kernel, not a managed production
service. Signing mode is opt-in and assumes integrator-owned identity and key
management. The August 2 deadline is a reason to &lt;em&gt;build the proof layer now&lt;/em&gt;; it is
not a certificate &lt;code&gt;gove-zone&lt;/code&gt; can issue.&lt;/p&gt;

&lt;h2&gt;See it gate something&lt;/h2&gt;



&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight shell"&gt;&lt;code&gt;&lt;span class="nv"&gt;tmp&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="si"&gt;$(&lt;/span&gt;&lt;span class="nb"&gt;mktemp&lt;/span&gt; &lt;span class="nt"&gt;-d&lt;/span&gt;&lt;span class="si"&gt;)&lt;/span&gt; &lt;span class="o"&gt;&amp;amp;&amp;amp;&lt;/span&gt; uv run &lt;span class="nt"&gt;--package&lt;/span&gt; gove-zone gove-zone smoke &lt;span class="nt"&gt;--audit&lt;/span&gt; &lt;span class="s2"&gt;"&lt;/span&gt;&lt;span class="nv"&gt;$tmp&lt;/span&gt;&lt;span class="s2"&gt;/audit.jsonl"&lt;/span&gt;
uv run &lt;span class="nt"&gt;--package&lt;/span&gt; gove-zone python packages/gove-zone/examples/receipt-gated-execution/demo.py
uv run &lt;span class="nt"&gt;--package&lt;/span&gt; gove-zone python examples/tamper_demo/demo.py
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;A safe &lt;code&gt;write_file&lt;/code&gt; is allowed; an &lt;code&gt;id_rsa&lt;/code&gt; write is denied &lt;em&gt;before any side
effect&lt;/em&gt;; both decisions verify as a hash-linked chain, and tampering with the
evidence makes verification fail.&lt;/p&gt;




&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fhxnoxkia8s1ri0oa3shg.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fhxnoxkia8s1ri0oa3shg.png" alt=" " width="800" height="800"&gt;&lt;/a&gt;&lt;br&gt;
&lt;em&gt;The category is forming around "control layer" and "verifiable audit trail."
&lt;code&gt;gove-zone&lt;/code&gt;'s bet is that the unit of control should be a receipt you can check,
expire, sign, and replay, not a policy you hope held and a log you hope is
true. Clone it, run the proof path, and try to make it fail open:
&lt;a href="https://github.com/dislovelhl/ACGS" rel="noopener noreferrer"&gt;github.com/dislovelhl/ACGS&lt;/a&gt;.&lt;/em&gt;&lt;/p&gt;

</description>
      <category>ai</category>
      <category>security</category>
      <category>opensource</category>
      <category>governance</category>
    </item>
  </channel>
</rss>
