<?xml version="1.0" encoding="UTF-8"?>
<rss version="2.0" xmlns:atom="http://www.w3.org/2005/Atom" xmlns:dc="http://purl.org/dc/elements/1.1/">
  <channel>
    <title>DEV Community: Glendel Joubert Fyne Acosta</title>
    <description>The latest articles on DEV Community by Glendel Joubert Fyne Acosta (@glendel).</description>
    <link>https://dev.to/glendel</link>
    <image>
      <url>https://media2.dev.to/dynamic/image/width=90,height=90,fit=cover,gravity=auto,format=auto/https:%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Fuser%2Fprofile_image%2F3918728%2F08510cdb-8e5a-4538-882e-6d927d1f09e5.png</url>
      <title>DEV Community: Glendel Joubert Fyne Acosta</title>
      <link>https://dev.to/glendel</link>
    </image>
    <atom:link rel="self" type="application/rss+xml" href="https://dev.to/feed/glendel"/>
    <language>en</language>
    <item>
      <title>Evidence Beats Claims: Why AI Agents Need Runtime Proof</title>
      <dc:creator>Glendel Joubert Fyne Acosta</dc:creator>
      <pubDate>Tue, 26 May 2026 01:49:17 +0000</pubDate>
      <link>https://dev.to/glendel/evidence-beats-claims-why-ai-agents-need-runtime-proof-36ep</link>
      <guid>https://dev.to/glendel/evidence-beats-claims-why-ai-agents-need-runtime-proof-36ep</guid>
      <description>&lt;p&gt;An AI agent saying &lt;em&gt;"I did it"&lt;/em&gt; is not proof that anything happened.&lt;/p&gt;

&lt;p&gt;"I sent the email."&lt;/p&gt;

&lt;p&gt;"I updated the database."&lt;/p&gt;

&lt;p&gt;"I escalated the issue."&lt;/p&gt;

&lt;p&gt;"I published the post."&lt;/p&gt;

&lt;p&gt;Those are claims.&lt;/p&gt;

&lt;p&gt;In a real production system, claims are not enough.&lt;/p&gt;

&lt;p&gt;If an AI agent performs work that affects users, data, money, operations, or another system, the runtime must be able to prove what actually happened.&lt;/p&gt;

&lt;h2&gt;
  
  
  &lt;strong&gt;The Problem&lt;/strong&gt;
&lt;/h2&gt;

&lt;p&gt;Language models are very good at producing confident completion statements.&lt;/p&gt;

&lt;p&gt;That confidence can be useful in conversation, but dangerous in infrastructure.&lt;/p&gt;

&lt;p&gt;A model may say:&lt;/p&gt;

&lt;blockquote&gt;
&lt;p&gt;"Done, I sent the email."&lt;/p&gt;
&lt;/blockquote&gt;

&lt;p&gt;But what actually happened?&lt;/p&gt;

&lt;p&gt;Maybe the email tool succeeded.&lt;/p&gt;

&lt;p&gt;Maybe the permission check failed.&lt;/p&gt;

&lt;p&gt;Maybe the API timed out.&lt;/p&gt;

&lt;p&gt;Maybe the retry limit was reached.&lt;/p&gt;

&lt;p&gt;Maybe the tool was never called.&lt;/p&gt;

&lt;p&gt;Maybe the model only assumed the action happened because that was the most natural response in the conversation.&lt;/p&gt;

&lt;p&gt;This is one of the most important differences between a demo and a production AI system.&lt;/p&gt;

&lt;p&gt;In a demo, the agent saying &lt;em&gt;"done"&lt;/em&gt; feels impressive.&lt;/p&gt;

&lt;p&gt;In production, &lt;em&gt;"done"&lt;/em&gt; needs evidence.&lt;/p&gt;

&lt;h2&gt;
  
  
  &lt;strong&gt;Model Claims vs Runtime Evidence&lt;/strong&gt;
&lt;/h2&gt;

&lt;p&gt;A model claim is what the AI says happened.&lt;/p&gt;

&lt;p&gt;Runtime evidence is what the system can prove happened.&lt;/p&gt;

&lt;p&gt;Those are not the same thing.&lt;/p&gt;

&lt;p&gt;A serious AI Agent system should separate them clearly.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;For example:&lt;/strong&gt;&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight javascript"&gt;&lt;code&gt;&lt;span class="kd"&gt;const&lt;/span&gt; &lt;span class="nx"&gt;response&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="k"&gt;await&lt;/span&gt; &lt;span class="nx"&gt;agent&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;run&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="dl"&gt;"&lt;/span&gt;&lt;span class="s2"&gt;Send the customer follow-up email&lt;/span&gt;&lt;span class="dl"&gt;"&lt;/span&gt;&lt;span class="p"&gt;);&lt;/span&gt;

&lt;span class="c1"&gt;// This is only a model-generated claim&lt;/span&gt;
&lt;span class="nx"&gt;console&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;log&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="nx"&gt;response&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nx"&gt;message&lt;/span&gt;&lt;span class="p"&gt;);&lt;/span&gt;
&lt;span class="c1"&gt;// "Done, I sent the email."&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;That message is not enough.&lt;/p&gt;

&lt;p&gt;A production system should also have a runtime record:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight javascript"&gt;&lt;code&gt;&lt;span class="p"&gt;{&lt;/span&gt;
  &lt;span class="dl"&gt;"&lt;/span&gt;&lt;span class="s2"&gt;actor&lt;/span&gt;&lt;span class="dl"&gt;"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="dl"&gt;"&lt;/span&gt;&lt;span class="s2"&gt;support-agent-01&lt;/span&gt;&lt;span class="dl"&gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
  &lt;span class="dl"&gt;"&lt;/span&gt;&lt;span class="s2"&gt;tool&lt;/span&gt;&lt;span class="dl"&gt;"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="dl"&gt;"&lt;/span&gt;&lt;span class="s2"&gt;send_email&lt;/span&gt;&lt;span class="dl"&gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
  &lt;span class="dl"&gt;"&lt;/span&gt;&lt;span class="s2"&gt;permission&lt;/span&gt;&lt;span class="dl"&gt;"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="dl"&gt;"&lt;/span&gt;&lt;span class="s2"&gt;granted&lt;/span&gt;&lt;span class="dl"&gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
  &lt;span class="dl"&gt;"&lt;/span&gt;&lt;span class="s2"&gt;input&lt;/span&gt;&lt;span class="dl"&gt;"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt;
    &lt;span class="dl"&gt;"&lt;/span&gt;&lt;span class="s2"&gt;to&lt;/span&gt;&lt;span class="dl"&gt;"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="dl"&gt;"&lt;/span&gt;&lt;span class="s2"&gt;customer@example.com&lt;/span&gt;&lt;span class="dl"&gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
    &lt;span class="dl"&gt;"&lt;/span&gt;&lt;span class="s2"&gt;template&lt;/span&gt;&lt;span class="dl"&gt;"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="dl"&gt;"&lt;/span&gt;&lt;span class="s2"&gt;follow_up&lt;/span&gt;&lt;span class="dl"&gt;"&lt;/span&gt;
  &lt;span class="p"&gt;},&lt;/span&gt;
  &lt;span class="dl"&gt;"&lt;/span&gt;&lt;span class="s2"&gt;status&lt;/span&gt;&lt;span class="dl"&gt;"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="dl"&gt;"&lt;/span&gt;&lt;span class="s2"&gt;success&lt;/span&gt;&lt;span class="dl"&gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
  &lt;span class="dl"&gt;"&lt;/span&gt;&lt;span class="s2"&gt;providerMessageId&lt;/span&gt;&lt;span class="dl"&gt;"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="dl"&gt;"&lt;/span&gt;&lt;span class="s2"&gt;msg_abc123&lt;/span&gt;&lt;span class="dl"&gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
  &lt;span class="dl"&gt;"&lt;/span&gt;&lt;span class="s2"&gt;timestamp&lt;/span&gt;&lt;span class="dl"&gt;"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="dl"&gt;"&lt;/span&gt;&lt;span class="s2"&gt;2026-05-25T14:32:10Z&lt;/span&gt;&lt;span class="dl"&gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
  &lt;span class="dl"&gt;"&lt;/span&gt;&lt;span class="s2"&gt;auditId&lt;/span&gt;&lt;span class="dl"&gt;"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="dl"&gt;"&lt;/span&gt;&lt;span class="s2"&gt;audit_789&lt;/span&gt;&lt;span class="dl"&gt;"&lt;/span&gt;
&lt;span class="p"&gt;}&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;Now the system can answer:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;which actor performed the action&lt;/li&gt;
&lt;li&gt;which tool executed&lt;/li&gt;
&lt;li&gt;whether permission was granted&lt;/li&gt;
&lt;li&gt;what input was used&lt;/li&gt;
&lt;li&gt;what result came back&lt;/li&gt;
&lt;li&gt;when it happened&lt;/li&gt;
&lt;li&gt;what audit record proves it&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;That is the difference between trusting text and trusting infrastructure.&lt;/p&gt;

&lt;h2&gt;
  
  
  &lt;strong&gt;Why This Matters&lt;/strong&gt;
&lt;/h2&gt;

&lt;p&gt;AI agents are moving from chat interfaces into real workflows.&lt;/p&gt;

&lt;p&gt;They are not just answering questions anymore.&lt;/p&gt;

&lt;p&gt;They are:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;sending messages&lt;/li&gt;
&lt;li&gt;creating tickets&lt;/li&gt;
&lt;li&gt;updating records&lt;/li&gt;
&lt;li&gt;calling APIs&lt;/li&gt;
&lt;li&gt;reading customer data&lt;/li&gt;
&lt;li&gt;triggering workflows&lt;/li&gt;
&lt;li&gt;escalating incidents&lt;/li&gt;
&lt;li&gt;generating reports&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;Once agents do real work, organizations need more than fluent responses.&lt;/p&gt;

&lt;p&gt;They need accountability.&lt;/p&gt;

&lt;p&gt;If an agent says it updated a record, the system must prove the record was updated.&lt;/p&gt;

&lt;p&gt;If an agent says it escalated a complaint, the system must prove the escalation happened.&lt;/p&gt;

&lt;p&gt;If an agent says it sent a message, the system must prove the message was sent.&lt;/p&gt;

&lt;p&gt;Otherwise, the organization is not operating on evidence.&lt;/p&gt;

&lt;p&gt;It is operating on model confidence.&lt;/p&gt;

&lt;h2&gt;
  
  
  &lt;strong&gt;The Dangerous Failure Mode&lt;/strong&gt;
&lt;/h2&gt;

&lt;p&gt;The dangerous failure mode is not always a loud crash.&lt;/p&gt;

&lt;p&gt;Sometimes the agent simply says:&lt;/p&gt;

&lt;blockquote&gt;
&lt;p&gt;"Done."&lt;/p&gt;
&lt;/blockquote&gt;

&lt;p&gt;And everyone believes it.&lt;/p&gt;

&lt;p&gt;But behind the scenes:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;the tool failed&lt;/li&gt;
&lt;li&gt;the permission was denied&lt;/li&gt;
&lt;li&gt;the payload was invalid&lt;/li&gt;
&lt;li&gt;the API returned an error&lt;/li&gt;
&lt;li&gt;the action was never executed&lt;/li&gt;
&lt;li&gt;the workflow stopped halfway&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;This creates a false sense of completion.&lt;/p&gt;

&lt;p&gt;The user thinks the task is finished.&lt;/p&gt;

&lt;p&gt;The agent thinks the task is finished.&lt;/p&gt;

&lt;p&gt;The organization acts as if the task is finished.&lt;/p&gt;

&lt;p&gt;But the runtime has no proof that the task ever happened.&lt;/p&gt;

&lt;p&gt;That is a serious reliability problem.&lt;/p&gt;

&lt;h2&gt;
  
  
  &lt;strong&gt;Multi-Agent Systems Make This Worse&lt;/strong&gt;
&lt;/h2&gt;

&lt;p&gt;This problem becomes even more dangerous in Multi-Agent Systems (MAS).&lt;/p&gt;

&lt;p&gt;Imagine this flow:&lt;/p&gt;

&lt;ol&gt;
&lt;li&gt;Agent A says it collected the customer data.&lt;/li&gt;
&lt;li&gt;Agent B uses that claim to draft a response.&lt;/li&gt;
&lt;li&gt;Agent C sends the response.&lt;/li&gt;
&lt;li&gt;Agent D summarizes the case as resolved.&lt;/li&gt;
&lt;/ol&gt;

&lt;p&gt;If Agent A's claim was unsupported, the entire chain becomes unreliable.&lt;/p&gt;

&lt;p&gt;One unsupported claim becomes another agent's input.&lt;/p&gt;

&lt;p&gt;The error propagates across the system.&lt;/p&gt;

&lt;p&gt;By the end, the final result may look coherent, but the foundation is wrong.&lt;/p&gt;

&lt;p&gt;This is why Multi-Agent Systems need runtime evidence at every important boundary.&lt;/p&gt;

&lt;p&gt;Agents should not pass around unsupported claims as if they were facts.&lt;/p&gt;

&lt;p&gt;They should pass around claims connected to evidence.&lt;/p&gt;
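
&lt;p&gt;One way to picture a handoff that carries evidence is the minimal sketch below. It assumes a hypothetical in-memory &lt;code&gt;evidenceStore&lt;/code&gt; and two illustrative helpers, &lt;code&gt;recordEvidence&lt;/code&gt; and &lt;code&gt;acceptHandoff&lt;/code&gt;; none of these names come from a real framework, and a production system would use durable storage.&lt;/p&gt;

```javascript
// Hypothetical in-memory evidence store (illustration only).
const evidenceStore = new Map();

// Store a record and return the id that later claims can cite.
function recordEvidence(record) {
  const id = "ev_" + (evidenceStore.size + 1);
  evidenceStore.set(id, Object.freeze({ id: id, ...record }));
  return id;
}

// The receiving agent's runtime accepts a handoff only when the claim
// cites evidence that exists and that reports a successful action.
function acceptHandoff(handoff) {
  const evidence = evidenceStore.get(handoff.evidenceId);
  if (evidence === undefined) {
    return { accepted: false, reason: "no evidence for claim" };
  }
  if (evidence.status !== "success") {
    return { accepted: false, reason: "evidence shows status " + evidence.status };
  }
  return { accepted: true, evidence: evidence };
}

// Agent A's runtime records the tool call, then hands off with the id.
const evidenceId = recordEvidence({
  actor: "agent-a",
  tool: "fetch_customer",
  status: "success"
});

const supported = acceptHandoff({
  claim: "Collected the customer data",
  evidenceId: evidenceId
});

// The same claim without an evidence id is rejected at the boundary.
const unsupported = acceptHandoff({ claim: "Collected the customer data" });
```

&lt;p&gt;In this sketch the check lives in the runtime, not in Agent B's prompt, so an unsupported claim is stopped before it becomes another agent's input.&lt;/p&gt;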

&lt;h2&gt;
  
  
  &lt;strong&gt;The Architecture Pattern&lt;/strong&gt;
&lt;/h2&gt;

&lt;p&gt;A better architecture separates three things:&lt;/p&gt;

&lt;ol&gt;
&lt;li&gt;Reasoning&lt;/li&gt;
&lt;li&gt;Execution&lt;/li&gt;
&lt;li&gt;Evidence&lt;/li&gt;
&lt;/ol&gt;

&lt;p&gt;The AI agent reasons about what should happen.&lt;/p&gt;

&lt;p&gt;The runtime executes the action if it is allowed.&lt;/p&gt;

&lt;p&gt;The system records evidence of what actually happened.&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight javascript"&gt;&lt;code&gt;&lt;span class="kd"&gt;const&lt;/span&gt; &lt;span class="nx"&gt;request&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="k"&gt;await&lt;/span&gt; &lt;span class="nx"&gt;agent&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;decideNextAction&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="nx"&gt;context&lt;/span&gt;&lt;span class="p"&gt;);&lt;/span&gt;

&lt;span class="kd"&gt;const&lt;/span&gt; &lt;span class="nx"&gt;permission&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="nx"&gt;runtime&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nx"&gt;permissions&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;verify&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="nx"&gt;request&lt;/span&gt;&lt;span class="p"&gt;);&lt;/span&gt;

&lt;span class="k"&gt;if &lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="o"&gt;!&lt;/span&gt;&lt;span class="nx"&gt;permission&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nx"&gt;allowed&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt;
  &lt;span class="k"&gt;return&lt;/span&gt; &lt;span class="nx"&gt;runtime&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;recordDeniedAction&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="nx"&gt;request&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="nx"&gt;permission&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nx"&gt;reason&lt;/span&gt;&lt;span class="p"&gt;);&lt;/span&gt;
&lt;span class="p"&gt;}&lt;/span&gt;

&lt;span class="kd"&gt;const&lt;/span&gt; &lt;span class="nx"&gt;result&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="k"&gt;await&lt;/span&gt; &lt;span class="nx"&gt;runtime&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;execute&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="nx"&gt;request&lt;/span&gt;&lt;span class="p"&gt;);&lt;/span&gt;

&lt;span class="kd"&gt;const&lt;/span&gt; &lt;span class="nx"&gt;evidence&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="k"&gt;await&lt;/span&gt; &lt;span class="nx"&gt;runtime&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;recordEvidence&lt;/span&gt;&lt;span class="p"&gt;({&lt;/span&gt;
  &lt;span class="nx"&gt;request&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
  &lt;span class="nx"&gt;permission&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
  &lt;span class="nx"&gt;result&lt;/span&gt;
&lt;span class="p"&gt;});&lt;/span&gt;

&lt;span class="k"&gt;return&lt;/span&gt; &lt;span class="nx"&gt;agent&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;summarizeResult&lt;/span&gt;&lt;span class="p"&gt;({&lt;/span&gt;
  &lt;span class="nx"&gt;result&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
  &lt;span class="na"&gt;evidenceId&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="nx"&gt;evidence&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nx"&gt;id&lt;/span&gt;
&lt;span class="p"&gt;});&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;The model can still explain the result to the user.&lt;/p&gt;

&lt;p&gt;But the explanation is now grounded in runtime evidence.&lt;/p&gt;

&lt;p&gt;The agent is no longer saying:&lt;/p&gt;

&lt;blockquote&gt;
&lt;p&gt;"Trust me."&lt;/p&gt;
&lt;/blockquote&gt;

&lt;p&gt;It is saying:&lt;/p&gt;

&lt;blockquote&gt;
&lt;p&gt;"Here is what happened, and here is the evidence."&lt;/p&gt;
&lt;/blockquote&gt;

&lt;h2&gt;
  
  
  &lt;strong&gt;What Runtime Evidence Should Include&lt;/strong&gt;
&lt;/h2&gt;

&lt;p&gt;At minimum, an evidence record should capture:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;actor identity&lt;/li&gt;
&lt;li&gt;requested action&lt;/li&gt;
&lt;li&gt;permission result&lt;/li&gt;
&lt;li&gt;tool or workflow used&lt;/li&gt;
&lt;li&gt;input payload&lt;/li&gt;
&lt;li&gt;execution result&lt;/li&gt;
&lt;li&gt;timestamps&lt;/li&gt;
&lt;li&gt;failure reason, if any&lt;/li&gt;
&lt;li&gt;retry attempts&lt;/li&gt;
&lt;li&gt;audit/reference ID&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;For sensitive systems, it may also include:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;approval record&lt;/li&gt;
&lt;li&gt;policy version&lt;/li&gt;
&lt;li&gt;resource identifier&lt;/li&gt;
&lt;li&gt;provider response metadata&lt;/li&gt;
&lt;li&gt;verification result&lt;/li&gt;
&lt;li&gt;human review state&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;The goal is not to create bureaucracy.&lt;/p&gt;

&lt;p&gt;The goal is to make AI work inspectable, debuggable, and trustworthy.&lt;/p&gt;
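
&lt;p&gt;As a rough illustration, the minimum fields above could be checked before a record is accepted as evidence. The field names below are assumptions that loosely follow the sample record earlier in this post, and &lt;code&gt;missingEvidenceFields&lt;/code&gt; is a hypothetical helper, not part of any library.&lt;/p&gt;

```javascript
// Assumed field names for the minimum evidence record (illustrative).
const REQUIRED_FIELDS = [
  "actor", "action", "permission", "tool", "input",
  "status", "timestamp", "attempts", "auditId"
];

// Return the names of required fields that are absent from the record.
function missingEvidenceFields(record) {
  const missing = REQUIRED_FIELDS.filter(function (field) {
    return record[field] === undefined || record[field] === null;
  });
  // A failed action must also say why it failed.
  if (record.status === "failed") {
    if (!record.failureReason) {
      missing.push("failureReason");
    }
  }
  return missing;
}

const completeRecord = {
  actor: "support-agent-01",
  action: "send_follow_up",
  permission: "granted",
  tool: "send_email",
  input: { to: "customer@example.com", template: "follow_up" },
  status: "success",
  timestamp: "2026-05-25T14:32:10Z",
  attempts: 1,
  auditId: "audit_789"
};

// Same record, but marked failed without a reason.
const failedWithoutReason = { ...completeRecord, status: "failed" };

const missingForComplete = missingEvidenceFields(completeRecord);
const missingForFailed = missingEvidenceFields(failedWithoutReason);
```

&lt;p&gt;Here &lt;code&gt;missingForComplete&lt;/code&gt; is empty, while &lt;code&gt;missingForFailed&lt;/code&gt; names the absent failure reason. A runtime could refuse to store, or could flag, any record with a non-empty result.&lt;/p&gt;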

&lt;h2&gt;
  
  
  &lt;strong&gt;The Rule&lt;/strong&gt;
&lt;/h2&gt;

&lt;p&gt;A simple rule for production AI systems:&lt;/p&gt;

&lt;blockquote&gt;
&lt;p&gt;If the agent claims an external action happened, the runtime should have evidence.&lt;/p&gt;
&lt;/blockquote&gt;

&lt;p&gt;No evidence means the claim is unsupported.&lt;/p&gt;

&lt;p&gt;Not necessarily false.&lt;/p&gt;

&lt;p&gt;But unsupported.&lt;/p&gt;

&lt;p&gt;That distinction matters.&lt;/p&gt;

&lt;p&gt;An unsupported claim should not be treated as completed work.&lt;/p&gt;

&lt;p&gt;It should trigger one of three outcomes:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;retry&lt;/li&gt;
&lt;li&gt;verify&lt;/li&gt;
&lt;li&gt;escalate&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;That is how AI Systems become operationally reliable.&lt;/p&gt;
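
&lt;p&gt;The rule can be sketched as a small decision function. The mapping from situation to outcome below is an assumption made for illustration (this post does not prescribe when to retry, verify, or escalate), and &lt;code&gt;decideOutcome&lt;/code&gt; along with the &lt;code&gt;retryable&lt;/code&gt; and &lt;code&gt;attempts&lt;/code&gt; fields are hypothetical names.&lt;/p&gt;

```javascript
// Assumed policy (not from the article):
//   no evidence at all        : "verify" against the external system
//   evidence of success       : "accept" the claim
//   retryable failure, budget : "retry"
//   anything else             : "escalate" to a human
function decideOutcome(evidence, maxAttempts) {
  if (evidence === undefined || evidence === null) {
    // The runtime never saw the action, so check before re-running it.
    return "verify";
  }
  if (evidence.status === "success") {
    return "accept";
  }
  const attempts = evidence.attempts || 0;
  const retriesLeft = Math.max(0, maxAttempts - attempts);
  if (evidence.retryable === true) {
    if (retriesLeft !== 0) {
      return "retry";
    }
  }
  return "escalate";
}

const noEvidence = decideOutcome(undefined, 3);
const succeeded = decideOutcome({ status: "success", attempts: 1 }, 3);
const timedOut = decideOutcome({ status: "failed", retryable: true, attempts: 1 }, 3);
const exhausted = decideOutcome({ status: "failed", retryable: true, attempts: 3 }, 3);
const denied = decideOutcome({ status: "failed", retryable: false, attempts: 1 }, 3);
```

&lt;p&gt;Whatever policy a team chooses, the point of the sketch is that the decision is made by code reading evidence, not by the model's summary of its own work.&lt;/p&gt;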

&lt;h2&gt;
  
  
  &lt;strong&gt;From Chatbots To Organizational AI Systems&lt;/strong&gt;
&lt;/h2&gt;

&lt;p&gt;Chatbots can get away with claims.&lt;/p&gt;

&lt;p&gt;Organizational AI Systems cannot.&lt;/p&gt;

&lt;p&gt;When AI agents operate inside real organizations, they need:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;permissions&lt;/li&gt;
&lt;li&gt;execution boundaries&lt;/li&gt;
&lt;li&gt;audit trails&lt;/li&gt;
&lt;li&gt;verification gates&lt;/li&gt;
&lt;li&gt;runtime evidence&lt;/li&gt;
&lt;li&gt;human escalation paths&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;The more responsibility we give agents, the more important evidence becomes.&lt;/p&gt;

&lt;p&gt;A confident answer is not enough.&lt;/p&gt;

&lt;p&gt;A fluent summary is not enough.&lt;/p&gt;

&lt;p&gt;A completed-looking workflow is not enough.&lt;/p&gt;

&lt;p&gt;The system must be able to prove what happened.&lt;/p&gt;

&lt;h2&gt;
  
  
  &lt;strong&gt;Conclusion&lt;/strong&gt;
&lt;/h2&gt;

&lt;p&gt;AI Agents should reason.&lt;/p&gt;

&lt;p&gt;Runtimes should execute.&lt;/p&gt;

&lt;p&gt;Evidence should prove.&lt;/p&gt;

&lt;p&gt;That separation is what turns agent behavior from conversation into infrastructure.&lt;/p&gt;

&lt;p&gt;If we want AI Agents to operate inside real organizations, we need to stop treating model-generated claims as proof of completed work.&lt;/p&gt;

&lt;p&gt;Evidence beats claims.&lt;/p&gt;

</description>
      <category>ai</category>
      <category>multiagent</category>
      <category>architecture</category>
      <category>opensource</category>
    </item>
    <item>
      <title>AI Agents Don't Have Permissions — Runtimes Do</title>
      <dc:creator>Glendel Joubert Fyne Acosta</dc:creator>
      <pubDate>Thu, 21 May 2026 00:45:50 +0000</pubDate>
      <link>https://dev.to/glendel/ai-agents-dont-have-permissions-runtimes-do-16ag</link>
      <guid>https://dev.to/glendel/ai-agents-dont-have-permissions-runtimes-do-16ag</guid>
      <description>&lt;p&gt;Right now, many Multi-Agent Systems are implementing permissions inside prompts.&lt;/p&gt;

&lt;blockquote&gt;
&lt;p&gt;"You may access the CRM."&lt;/p&gt;

&lt;p&gt;"You are allowed to send emails."&lt;/p&gt;

&lt;p&gt;"Do not modify billing records."&lt;/p&gt;
&lt;/blockquote&gt;

&lt;p&gt;This is becoming one of the biggest architectural mistakes in modern AI systems.&lt;/p&gt;

&lt;p&gt;A prompt is not a security boundary.&lt;/p&gt;

&lt;p&gt;Language models are probabilistic reasoning engines. They are excellent at planning, summarizing, reasoning, and interpreting context. But they are not deterministic authorization systems.&lt;/p&gt;

&lt;p&gt;If your application's security model depends on the LLM consistently obeying natural-language instructions, your system does not actually have runtime governance.&lt;/p&gt;

&lt;p&gt;It has probabilistic behavior shaping.&lt;/p&gt;

&lt;h2&gt;
  
  
  The Problem
&lt;/h2&gt;

&lt;p&gt;I keep seeing architectures where the agent itself is expected to decide whether an action is allowed:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight javascript"&gt;&lt;code&gt;&lt;span class="kd"&gt;const&lt;/span&gt; &lt;span class="nx"&gt;prompt&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="s2"&gt;`
You are an AI Agent.

The user wants to delete a customer record.
The user's permissions are: &lt;/span&gt;&lt;span class="p"&gt;${&lt;/span&gt;&lt;span class="nx"&gt;permissions&lt;/span&gt;&lt;span class="p"&gt;}&lt;/span&gt;&lt;span class="s2"&gt;.

Should you allow this action?
`&lt;/span&gt;&lt;span class="p"&gt;;&lt;/span&gt;

&lt;span class="kd"&gt;const&lt;/span&gt; &lt;span class="nx"&gt;decision&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="k"&gt;await&lt;/span&gt; &lt;span class="nx"&gt;llm&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;generate&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="nx"&gt;prompt&lt;/span&gt;&lt;span class="p"&gt;);&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;This looks flexible.&lt;/p&gt;

&lt;p&gt;It also creates several major problems immediately:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;prompts can conflict&lt;/li&gt;
&lt;li&gt;context windows drift&lt;/li&gt;
&lt;li&gt;instructions can be overridden&lt;/li&gt;
&lt;li&gt;reasoning can hallucinate&lt;/li&gt;
&lt;li&gt;behavior changes across models&lt;/li&gt;
&lt;li&gt;authorization becomes non-auditable&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;And once you move into multi-agent systems, the situation becomes even worse.&lt;/p&gt;

&lt;p&gt;One agent may interpret permissions differently from another. Handoffs may lose constraints. Context summarization may remove critical security instructions entirely.&lt;/p&gt;

&lt;p&gt;Now your governance model depends on whether probabilistic agents correctly preserve natural-language policy across multiple reasoning steps.&lt;/p&gt;

&lt;p&gt;That is not enterprise architecture.&lt;/p&gt;

&lt;h2&gt;
  
  
  The Runtime Must Enforce Boundaries
&lt;/h2&gt;

&lt;p&gt;The AI should reason about &lt;em&gt;what&lt;/em&gt; needs to happen.&lt;/p&gt;

&lt;p&gt;The runtime should determine &lt;em&gt;whether it is allowed to happen&lt;/em&gt;.&lt;/p&gt;

&lt;p&gt;This distinction is critical.&lt;/p&gt;

&lt;p&gt;A governed architecture should look more like this:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight javascript"&gt;&lt;code&gt;&lt;span class="k"&gt;if &lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="o"&gt;!&lt;/span&gt;&lt;span class="nx"&gt;runtime&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nx"&gt;permissions&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;verify&lt;/span&gt;&lt;span class="p"&gt;({&lt;/span&gt;
  &lt;span class="na"&gt;agent&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="nx"&gt;agentId&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
  &lt;span class="na"&gt;action&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="dl"&gt;"&lt;/span&gt;&lt;span class="s2"&gt;delete_customer&lt;/span&gt;&lt;span class="dl"&gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
  &lt;span class="na"&gt;resource&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="nx"&gt;customerId&lt;/span&gt;
&lt;span class="p"&gt;}))&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt;
  &lt;span class="k"&gt;throw&lt;/span&gt; &lt;span class="k"&gt;new&lt;/span&gt; &lt;span class="nc"&gt;UnauthorizedError&lt;/span&gt;&lt;span class="p"&gt;();&lt;/span&gt;
&lt;span class="p"&gt;}&lt;/span&gt;

&lt;span class="kd"&gt;const&lt;/span&gt; &lt;span class="nx"&gt;result&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="k"&gt;await&lt;/span&gt; &lt;span class="nx"&gt;executor&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;deleteCustomer&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="nx"&gt;customerId&lt;/span&gt;&lt;span class="p"&gt;);&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;The LLM may request the action.&lt;/p&gt;

&lt;p&gt;The deterministic runtime decides whether execution is permitted.&lt;/p&gt;

&lt;p&gt;That is a real security boundary.&lt;/p&gt;
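
&lt;p&gt;For illustration, a deterministic &lt;code&gt;verify&lt;/code&gt; could be as small as a deny-by-default lookup that also logs every decision. The policy table, agent ids, and &lt;code&gt;auditLog&lt;/code&gt; below are hypothetical; a real deployment would likely load policy from configuration or a policy engine.&lt;/p&gt;

```javascript
// Hypothetical policy table: agent id mapped to the actions it may perform.
const POLICY = new Map([
  ["support-agent-01", new Set(["read_customer", "send_email"])],
  ["billing-agent-01", new Set(["read_customer", "delete_customer"])]
]);

// Every decision is appended here, allowed or not.
const auditLog = [];

// Deny by default: unknown agents and unlisted actions are refused.
function verify(request) {
  const grants = POLICY.get(request.agent);
  const allowed = grants === undefined ? false : grants.has(request.action);
  auditLog.push({
    agent: request.agent,
    action: request.action,
    resource: request.resource,
    allowed: allowed
  });
  return allowed;
}

const supportDelete = verify({
  agent: "support-agent-01",
  action: "delete_customer",
  resource: "cust_1"
});
const billingDelete = verify({
  agent: "billing-agent-01",
  action: "delete_customer",
  resource: "cust_1"
});
const unknownAgent = verify({
  agent: "intern-bot",
  action: "read_customer",
  resource: "cust_1"
});
```

&lt;p&gt;The same input always yields the same answer, and each answer leaves a log entry, which is what prompt-based permission checks cannot offer.&lt;/p&gt;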

&lt;h2&gt;
  
  
  The Cognitive Layer vs The Deterministic Layer
&lt;/h2&gt;

&lt;p&gt;I think a lot of confusion in the current AI ecosystem comes from mixing these two responsibilities together.&lt;/p&gt;

&lt;p&gt;The Cognitive Layer:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;reasoning&lt;/li&gt;
&lt;li&gt;planning&lt;/li&gt;
&lt;li&gt;interpretation&lt;/li&gt;
&lt;li&gt;summarization&lt;/li&gt;
&lt;li&gt;decision support&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;The Deterministic Layer:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;permissions&lt;/li&gt;
&lt;li&gt;schema validation&lt;/li&gt;
&lt;li&gt;execution&lt;/li&gt;
&lt;li&gt;workflows&lt;/li&gt;
&lt;li&gt;retries&lt;/li&gt;
&lt;li&gt;state transitions&lt;/li&gt;
&lt;li&gt;audit logs&lt;/li&gt;
&lt;li&gt;policy enforcement&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;The AI should not govern itself.&lt;/p&gt;

&lt;p&gt;The framework must govern the AI.&lt;/p&gt;

&lt;h2&gt;
  
  
  Why This Matters More In Multi-Agent Systems
&lt;/h2&gt;

&lt;p&gt;Single-agent systems are already difficult to debug.&lt;/p&gt;

&lt;p&gt;Multi-agent systems amplify the problem dramatically:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;context drift compounds&lt;/li&gt;
&lt;li&gt;handoff failures appear&lt;/li&gt;
&lt;li&gt;responsibilities blur&lt;/li&gt;
&lt;li&gt;state becomes harder to trace&lt;/li&gt;
&lt;li&gt;authorization assumptions leak between agents&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;Without deterministic runtime enforcement, governance becomes almost impossible to reason about operationally.&lt;/p&gt;

&lt;p&gt;And when systems fail, the incident report becomes:&lt;/p&gt;

&lt;blockquote&gt;
&lt;p&gt;"The model ignored the instruction."&lt;/p&gt;
&lt;/blockquote&gt;

&lt;p&gt;No serious infrastructure team will accept that as a security architecture.&lt;/p&gt;

&lt;h2&gt;
  
  
  Organizational AI Systems Need Runtime Authority
&lt;/h2&gt;

&lt;p&gt;As AI systems move into real organizations, governance stops being optional.&lt;/p&gt;

&lt;p&gt;Enterprises need:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;auditability&lt;/li&gt;
&lt;li&gt;traceability&lt;/li&gt;
&lt;li&gt;deterministic enforcement&lt;/li&gt;
&lt;li&gt;runtime evidence&lt;/li&gt;
&lt;li&gt;policy validation&lt;/li&gt;
&lt;li&gt;observability&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;Natural-language instructions alone cannot provide these guarantees.&lt;/p&gt;

&lt;p&gt;The future of Organizational AI Systems will depend on separating probabilistic reasoning from deterministic governance.&lt;/p&gt;

&lt;p&gt;AI Agents should reason.&lt;/p&gt;

&lt;p&gt;Runtimes should govern.&lt;/p&gt;

</description>
      <category>ai</category>
      <category>multiagent</category>
      <category>architecture</category>
      <category>opensource</category>
    </item>
    <item>
      <title>The AI FOMO Trap: Why your Multi-Agent System is brittle (and how to fix it)</title>
      <dc:creator>Glendel Joubert Fyne Acosta</dc:creator>
      <pubDate>Thu, 14 May 2026 00:45:30 +0000</pubDate>
      <link>https://dev.to/glendel/the-ai-fomo-trap-why-your-multi-agent-system-is-brittle-and-how-to-fix-it-20o7</link>
      <guid>https://dev.to/glendel/the-ai-fomo-trap-why-your-multi-agent-system-is-brittle-and-how-to-fix-it-20o7</guid>
      <description>&lt;p&gt;A developer on Reddit recently told me: "&lt;em&gt;Companies right now are risking the LLM-led parts of their architecture due to FOMO. We'll see how far they get&lt;/em&gt;".&lt;/p&gt;

&lt;p&gt;He is absolutely right. Fear Of Missing Out is driving engineering teams to ship "Autonomous Agents" at breakneck speed. But in the rush to production, we are abandoning 20 years of established software engineering principles.&lt;/p&gt;

&lt;p&gt;We are letting probabilistic models control deterministic runtimes.&lt;/p&gt;

&lt;p&gt;If you are routing network traffic, validating data schemas, or checking user permissions using an LLM prompt, you are not building a resilient system. You are building a fragile prompt-chain wrapped in hope. When it fails (and it will), it will be slow, expensive, and completely un-auditable. InfoSec won't accept "the model hallucinated the auth check" as a valid incident report.&lt;/p&gt;

&lt;h4&gt;
  
  
  &lt;strong&gt;The Cure: The Manager-Executor Pattern&lt;/strong&gt;
&lt;/h4&gt;

&lt;p&gt;To build enterprise-grade Multi-Agent Systems, we must separate the &lt;em&gt;Cognitive&lt;/em&gt; from the &lt;em&gt;Deterministic&lt;/em&gt;.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;1. The Manager (Probabilistic).&lt;/strong&gt; This is the LLM. Its only job is to reason, plan, and analyze context. It decides &lt;em&gt;what needs to be done&lt;/em&gt;. It does not execute code. It does not manage its own memory. It requests actions via strict JSON schemas.&lt;br&gt;
&lt;strong&gt;2. The Executor (Deterministic).&lt;/strong&gt; This is your runtime framework. It acts as the boundary. When the Manager requests an action, the Executor:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Verifies the agent's permissions.&lt;/li&gt;
&lt;li&gt;Validates the payload against a strict schema.&lt;/li&gt;
&lt;li&gt;Checks the token/cost budget.&lt;/li&gt;
&lt;li&gt;Executes the code (API call, DB write).&lt;/li&gt;
&lt;li&gt;Returns the exact result to the Manager.&lt;/li&gt;
&lt;/ul&gt;
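
&lt;p&gt;The Executor steps above can be sketched as a single function. Everything named here (&lt;code&gt;policies&lt;/code&gt;, &lt;code&gt;tools&lt;/code&gt;, &lt;code&gt;executeRequest&lt;/code&gt;, the budget shape, the required-fields check standing in for a full JSON schema) is a hypothetical simplification, not an existing framework API.&lt;/p&gt;

```javascript
// Hypothetical policy: which actions each agent may request.
const policies = new Map([["support-agent", ["create_ticket"]]]);

// Hypothetical tool registry with a minimal "schema" and a cost.
const tools = new Map([
  ["create_ticket", {
    requiredFields: ["customerId", "subject"],
    cost: 1,
    run: function (payload) {
      return { ticketId: "T-" + payload.customerId };
    }
  }]
]);

function executeRequest(request, budget) {
  // 1. Verify the agent's permissions.
  const allowed = policies.get(request.agent) || [];
  if (!allowed.includes(request.action)) {
    return { ok: false, stage: "permission", error: "action not permitted" };
  }
  const tool = tools.get(request.action);
  if (tool === undefined) {
    return { ok: false, stage: "lookup", error: "unknown tool" };
  }
  // 2. Validate the payload (required fields only, in this sketch).
  const payload = request.payload || {};
  const missing = tool.requiredFields.filter(function (field) {
    return payload[field] === undefined;
  });
  if (missing.length !== 0) {
    return { ok: false, stage: "schema", error: "missing: " + missing.join(", ") };
  }
  // 3. Check the budget: any shortfall blocks execution.
  const shortfall = Math.max(0, tool.cost - budget.remaining);
  if (shortfall !== 0) {
    return { ok: false, stage: "budget", error: "budget exhausted" };
  }
  budget.remaining -= tool.cost;
  // 4. Execute, and 5. return the exact result (or the exact error).
  try {
    return { ok: true, stage: "execute", result: tool.run(payload) };
  } catch (err) {
    return { ok: false, stage: "execute", error: String(err.message) };
  }
}

const budget = { remaining: 1 };
const first = executeRequest({
  agent: "support-agent",
  action: "create_ticket",
  payload: { customerId: 42, subject: "Refund" }
}, budget);
// The budget is now spent, so an otherwise valid second request is refused.
const second = executeRequest({
  agent: "support-agent",
  action: "create_ticket",
  payload: { customerId: 43, subject: "Login" }
}, budget);
```

&lt;p&gt;Each refusal reports the stage that stopped it, so the Manager receives a precise result to reason about instead of guessing what happened.&lt;/p&gt;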

&lt;h4&gt;
  
  
  &lt;strong&gt;The Framework Controls the AI&lt;/strong&gt;
&lt;/h4&gt;

&lt;p&gt;The fundamental shift required in MAS architecture is understanding that &lt;strong&gt;the framework must control the LLM; the LLM must never control the framework&lt;/strong&gt;.&lt;/p&gt;

&lt;p&gt;Right now, developers have to build these custom state machines and validation layers from scratch because popular frameworks default to LLM-routing. It's time to standardize this. We need "&lt;strong&gt;A Real Framework&lt;/strong&gt;" for Multi-Agent Systems: a framework that enforces the Manager-Executor pattern by default.&lt;/p&gt;

&lt;p&gt;Stop relying on vibes-based engineering. Let's get back to rigorous software architecture.&lt;/p&gt;

</description>
      <category>ai</category>
      <category>architecture</category>
      <category>softwareengineering</category>
      <category>programming</category>
    </item>
    <item>
      <title>The Token Waste Problem: Why your AI Agents shouldn't evaluate permissions</title>
      <dc:creator>Glendel Joubert Fyne Acosta</dc:creator>
      <pubDate>Sat, 09 May 2026 00:47:02 +0000</pubDate>
      <link>https://dev.to/glendel/the-token-waste-problem-why-your-ai-agents-shouldnt-evaluate-permissions-2a2c</link>
      <guid>https://dev.to/glendel/the-token-waste-problem-why-your-ai-agents-shouldnt-evaluate-permissions-2a2c</guid>
      <description>&lt;p&gt;We are burning millions of API tokens on problems that &lt;code&gt;if&lt;/code&gt; statements solved 20 years ago.&lt;/p&gt;

&lt;p&gt;I speak with developers building Multi-Agent Systems (MAS) every day, and I keep seeing the same massive architectural anti-pattern: &lt;strong&gt;Routing everything through the AI model.&lt;/strong&gt;&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;  Need to check an agent's permissions? "Ask the LLM."&lt;/li&gt;
&lt;li&gt;  Need to route a message? "Ask the LLM."&lt;/li&gt;
&lt;li&gt;  Need to validate a data schema? "Ask the LLM."&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;Language models are extraordinary reasoning engines. But they are also expensive, probabilistic, and relatively slow. If a problem has a deterministic, correct answer (like checking an access policy), it should be evaluated by runtime code, not guessed by a neural network.&lt;/p&gt;

&lt;h3&gt;
  
  
  The Anti-Pattern
&lt;/h3&gt;

&lt;p&gt;Instead of doing this (Probabilistic):&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight javascript"&gt;&lt;code&gt;&lt;span class="c1"&gt;// BAD: Asking the LLM to check permissions&lt;/span&gt;
&lt;span class="kd"&gt;const&lt;/span&gt; &lt;span class="nx"&gt;prompt&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="s2"&gt;`You are an agent. The user wants to delete a file. 
Here are their permissions: &lt;/span&gt;&lt;span class="p"&gt;${&lt;/span&gt;&lt;span class="nx"&gt;user&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nx"&gt;permissions&lt;/span&gt;&lt;span class="p"&gt;}&lt;/span&gt;&lt;span class="s2"&gt;. 
Should you allow it?`&lt;/span&gt;&lt;span class="p"&gt;;&lt;/span&gt;

&lt;span class="kd"&gt;const&lt;/span&gt; &lt;span class="nx"&gt;decision&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="k"&gt;await&lt;/span&gt; &lt;span class="nx"&gt;llm&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;generate&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="nx"&gt;prompt&lt;/span&gt;&lt;span class="p"&gt;);&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;h3&gt;
  
  
  The Solution
&lt;/h3&gt;

&lt;p&gt;We need to get back to doing this (Deterministic):&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight javascript"&gt;&lt;code&gt;&lt;span class="c1"&gt;// GOOD: Let code handle policy, let AI handle reasoning&lt;/span&gt;
&lt;span class="k"&gt;if &lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="o"&gt;!&lt;/span&gt;&lt;span class="nx"&gt;user&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;hasPermission&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="dl"&gt;'&lt;/span&gt;&lt;span class="s1"&gt;delete_file&lt;/span&gt;&lt;span class="dl"&gt;'&lt;/span&gt;&lt;span class="p"&gt;))&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt;
  &lt;span class="k"&gt;throw&lt;/span&gt; &lt;span class="k"&gt;new&lt;/span&gt; &lt;span class="nc"&gt;Error&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="dl"&gt;"&lt;/span&gt;&lt;span class="s2"&gt;Unauthorized&lt;/span&gt;&lt;span class="dl"&gt;"&lt;/span&gt;&lt;span class="p"&gt;);&lt;/span&gt; 
&lt;span class="p"&gt;}&lt;/span&gt;

&lt;span class="c1"&gt;// Only call the LLM for actual cognitive tasks&lt;/span&gt;
&lt;span class="kd"&gt;const&lt;/span&gt; &lt;span class="nx"&gt;plan&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="k"&gt;await&lt;/span&gt; &lt;span class="nx"&gt;agent&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;reasonAboutFile&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="nx"&gt;file&lt;/span&gt;&lt;span class="p"&gt;);&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;AI should decide &lt;em&gt;what&lt;/em&gt; to do. Deterministic code should execute it and enforce the boundaries.&lt;/p&gt;
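
&lt;p&gt;Put together, that split might look like the sketch below. It is an illustration under stated assumptions: the &lt;code&gt;POLICY&lt;/code&gt; table, the role names, and the &lt;code&gt;proposeAction&lt;/code&gt; and &lt;code&gt;executors&lt;/code&gt; parameters are all hypothetical stand-ins, with &lt;code&gt;proposeAction&lt;/code&gt; representing the only step that would call a model.&lt;/p&gt;

```javascript
// Hypothetical policy table: action name -> roles allowed to perform it.
const POLICY = Object.freeze({
  delete_file: ['admin', 'owner'],
  read_file: ['admin', 'owner', 'viewer'],
});

// Deterministic check: a plain lookup, no model call, no tokens spent.
function isAllowed(role, action) {
  const roles = POLICY[action];
  return Array.isArray(roles) ? roles.includes(role) : false; // unknown actions are denied
}

// The model proposes an action; the runtime decides whether it may run.
async function runProposedAction(user, proposeAction, executors) {
  const proposal = await proposeAction(); // the only step that would call an LLM

  const run = executors[proposal.action];
  if (!isAllowed(user.role, proposal.action) || typeof run !== 'function') {
    return { status: 'denied', action: proposal.action };
  }

  // Deterministic code performs the side effect and reports the real result.
  const result = await run(proposal.args);
  return { status: 'executed', action: proposal.action, result };
}
```

&lt;p&gt;The design choice here is that the permission check runs after the proposal and before any side effect, so even a confidently wrong proposal cannot bypass the policy.&lt;/p&gt;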

&lt;p&gt;Are we forgetting basic software engineering principles just because AI is exciting? The MAS space doesn't need more wrappers; it needs standardized frameworks that enforce these boundaries. Let's get back to building solid infrastructure.&lt;/p&gt;

</description>
      <category>ai</category>
      <category>architecture</category>
      <category>softwaredevelopment</category>
      <category>webdev</category>
    </item>
  </channel>
</rss>
