<?xml version="1.0" encoding="UTF-8"?>
<rss version="2.0" xmlns:atom="http://www.w3.org/2005/Atom" xmlns:dc="http://purl.org/dc/elements/1.1/">
  <channel>
    <title>DEV Community: Anshuman Kumar</title>
    <description>The latest articles on DEV Community by Anshuman Kumar (@anshumankumar14).</description>
    <link>https://dev.to/anshumankumar14</link>
    <image>
      <url>https://media2.dev.to/dynamic/image/width=90,height=90,fit=cover,gravity=auto,format=auto/https:%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Fuser%2Fprofile_image%2F1509723%2Fedc48cfc-9061-445b-b8ad-8af42f34ebd9.jpg</url>
      <title>DEV Community: Anshuman Kumar</title>
      <link>https://dev.to/anshumankumar14</link>
    </image>
    <atom:link rel="self" type="application/rss+xml" href="https://dev.to/feed/anshumankumar14"/>
    <language>en</language>
    <item>
      <title>I Built a Runtime Governance Tool for AI Agents — Here's Why Your Agents Need It</title>
      <dc:creator>Anshuman Kumar</dc:creator>
      <pubDate>Thu, 07 May 2026 22:11:27 +0000</pubDate>
      <link>https://dev.to/anshumankumar14/i-built-a-runtime-governance-tool-for-ai-agents-heres-why-your-agents-need-it-3e5o</link>
      <guid>https://dev.to/anshumankumar14/i-built-a-runtime-governance-tool-for-ai-agents-heres-why-your-agents-need-it-3e5o</guid>
      <description>&lt;p&gt;&lt;strong&gt;Your LangChain agent just ran &lt;code&gt;rm -rf /&lt;/code&gt;. It was supposed to list files.&lt;/strong&gt;&lt;/p&gt;

&lt;p&gt;This isn't a far-fetched scenario. AI agents call tools: shell commands, database queries, payment APIs, file operations. Every tool call is a potential security incident, and many agents ship with no runtime enforcement at all.&lt;/p&gt;

&lt;p&gt;I built &lt;strong&gt;ShadowAudit&lt;/strong&gt; to fix this. It's a deterministic, offline-first governance layer that sits between your agent and its tools. If a call exceeds your risk threshold, it's blocked. No LLM calls. No cloud dependencies. No API keys.&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight shell"&gt;&lt;code&gt;pip &lt;span class="nb"&gt;install &lt;/span&gt;shadowaudit
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;h2&gt;
  
  
  The Problem: Agents Are Unguarded
&lt;/h2&gt;

&lt;p&gt;When you build an AI agent, you give it tools. A shell tool. A database tool. A payment API tool. The agent decides which tool to call and with what parameters. That's the whole point — autonomy.&lt;/p&gt;

&lt;p&gt;But autonomy without guardrails is negligence.&lt;/p&gt;

&lt;div class="table-wrapper-paragraph"&gt;&lt;table&gt;
&lt;thead&gt;
&lt;tr&gt;
&lt;th&gt;What the agent should do&lt;/th&gt;
&lt;th&gt;What the agent might do&lt;/th&gt;
&lt;/tr&gt;
&lt;/thead&gt;
&lt;tbody&gt;
&lt;tr&gt;
&lt;td&gt;&lt;code&gt;ls -la /var/log&lt;/code&gt;&lt;/td&gt;
&lt;td&gt;&lt;code&gt;rm -rf /var/log&lt;/code&gt;&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;&lt;code&gt;SELECT * FROM users WHERE id=123&lt;/code&gt;&lt;/td&gt;
&lt;td&gt;&lt;code&gt;DROP TABLE users&lt;/code&gt;&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;&lt;code&gt;transfer $10 to vendor&lt;/code&gt;&lt;/td&gt;
&lt;td&gt;&lt;code&gt;transfer $10,000 to unknown_account&lt;/code&gt;&lt;/td&gt;
&lt;/tr&gt;
&lt;/tbody&gt;
&lt;/table&gt;&lt;/div&gt;

&lt;p&gt;Current solutions fall short:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;
&lt;strong&gt;Prompt engineering&lt;/strong&gt;: "Please don't do anything dangerous." A prompt is a request, not a control, and an agent can ignore it.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;LLM-based guardrails&lt;/strong&gt;: Probabilistic, slow, and expensive, and they require API calls.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Human-in-the-loop&lt;/strong&gt;: Doesn't scale. You can't review 10,000 agent decisions per hour.&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;What you need is &lt;strong&gt;deterministic, runtime enforcement&lt;/strong&gt; that works offline and blocks dangerous calls before they execute.&lt;/p&gt;

&lt;h2&gt;
  
  
  What ShadowAudit Does
&lt;/h2&gt;



&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;Agent → ShadowAudit Gate → Tool (allowed)
                         → Blocked (AgentActionBlocked raised)
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;ShadowAudit evaluates every tool call against a risk taxonomy. If the risk score exceeds the threshold, the call is blocked. The decision is logged. The agent's behavioral state is updated.&lt;/p&gt;

&lt;h3&gt;
  
  
  5 Lines of Code
&lt;/h3&gt;



&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight python"&gt;&lt;code&gt;&lt;span class="kn"&gt;from&lt;/span&gt; &lt;span class="n"&gt;langchain.tools&lt;/span&gt; &lt;span class="kn"&gt;import&lt;/span&gt; &lt;span class="n"&gt;ShellTool&lt;/span&gt;
&lt;span class="kn"&gt;from&lt;/span&gt; &lt;span class="n"&gt;shadowaudit.framework.langchain&lt;/span&gt; &lt;span class="kn"&gt;import&lt;/span&gt; &lt;span class="n"&gt;ShadowAuditTool&lt;/span&gt;

&lt;span class="n"&gt;safe_shell&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="nc"&gt;ShadowAuditTool&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;
    &lt;span class="n"&gt;tool&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="nc"&gt;ShellTool&lt;/span&gt;&lt;span class="p"&gt;(),&lt;/span&gt;
    &lt;span class="n"&gt;agent_id&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;ops-agent-1&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
    &lt;span class="n"&gt;risk_category&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;command_execution&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
&lt;span class="p"&gt;)&lt;/span&gt;

&lt;span class="n"&gt;safe_shell&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;run&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;ls -la&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;     &lt;span class="c1"&gt;# ✅ Allowed
&lt;/span&gt;&lt;span class="n"&gt;safe_shell&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;run&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;rm -rf /&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;   &lt;span class="c1"&gt;# ❌ AgentActionBlocked raised
&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;Same interface as the original tool. Drop-in replacement. Zero behavior change for safe calls.&lt;/p&gt;

&lt;h3&gt;
  
  
  CLI for CI/CD
&lt;/h3&gt;



&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight shell"&gt;&lt;code&gt;&lt;span class="c"&gt;# Scan your codebase for ungated agent tools&lt;/span&gt;
shadowaudit check ./src

&lt;span class="c"&gt;# Block deployments if high-risk tools are ungated&lt;/span&gt;
shadowaudit check ./src &lt;span class="nt"&gt;--fail-on-ungated&lt;/span&gt;

&lt;span class="c"&gt;# Generate a professional HTML assessment report&lt;/span&gt;
shadowaudit assess ./src &lt;span class="nt"&gt;--taxonomy&lt;/span&gt; financial &lt;span class="nt"&gt;--compliance&lt;/span&gt;

&lt;span class="c"&gt;# Replay agent traces through the safety gate&lt;/span&gt;
shadowaudit simulate &lt;span class="nt"&gt;--trace-file&lt;/span&gt; agent_trace.jsonl &lt;span class="nt"&gt;--compare&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;Drop &lt;code&gt;shadowaudit check ./src --fail-on-ungated&lt;/code&gt; into your CI pipeline. If someone commits an ungated shell tool, the build fails.&lt;/p&gt;

&lt;h2&gt;
  
  
  Architecture: Deterministic, Not Probabilistic
&lt;/h2&gt;

&lt;p&gt;Many AI safety tools today use an LLM to evaluate risk. That's slow, expensive, and non-deterministic: the same input can produce different outputs.&lt;/p&gt;

&lt;p&gt;ShadowAudit uses &lt;strong&gt;keyword-based scoring with pluggable strategies&lt;/strong&gt;:&lt;/p&gt;

&lt;ol&gt;
&lt;li&gt;
&lt;strong&gt;Taxonomy lookup&lt;/strong&gt; — finds risk category config (keywords, threshold delta, severity)&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Scoring&lt;/strong&gt; — pluggable scorer computes risk score from payload content&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Threshold comparison&lt;/strong&gt; — score vs. taxonomy delta determines pass/fail&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;FSM transition&lt;/strong&gt; — fail-closed state machine: anything not an explicit pass is a block&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Audit log&lt;/strong&gt; — decision recorded with timestamp, agent ID, payload hash, and reason&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;State update&lt;/strong&gt; — K (trust) and V (velocity) metrics updated for adaptive scoring&lt;/li&gt;
&lt;/ol&gt;

&lt;p&gt;This is auditable. Reproducible. Explainable. The kind of thing compliance auditors actually accept.&lt;/p&gt;
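
&lt;p&gt;To make the steps above concrete, here is a minimal sketch of a keyword-scored, fail-closed gate. It is not ShadowAudit's actual source: the &lt;code&gt;Gate&lt;/code&gt; class, the &lt;code&gt;TAXONOMY&lt;/code&gt; dict, the scoring formula, and the threshold value are all assumptions made for illustration, and &lt;code&gt;AgentActionBlocked&lt;/code&gt; is a local stand-in rather than the library's exception. The state update step is left out.&lt;/p&gt;

```python
import hashlib
import time
from dataclasses import dataclass, field


class AgentActionBlocked(Exception):
    """Local stand-in for the exception named in the article."""


@dataclass(frozen=True)
class RiskCategory:
    keywords: tuple
    threshold: float  # a score above this value is blocked


# Hypothetical taxonomy entry; keywords and threshold are illustrative.
TAXONOMY = {
    "command_execution": RiskCategory(
        keywords=("rm", "chmod", "curl", "wget"), threshold=0.0
    ),
}


def keyword_score(payload: str, category: RiskCategory) -> float:
    """Fraction of the category's keywords that appear as whole tokens."""
    tokens = set(payload.lower().split())
    hits = sum(1 for keyword in category.keywords if keyword in tokens)
    return hits / len(category.keywords)


@dataclass
class Gate:
    audit_log: list = field(default_factory=list)

    def check(self, agent_id: str, category_name: str, payload: str) -> None:
        # Fail closed: start from "block"; only an explicit pass changes it.
        decision, reason = "block", "unknown risk category"
        category = TAXONOMY.get(category_name)  # taxonomy lookup
        if category is not None:
            score = keyword_score(payload, category)  # scoring
            if score > category.threshold:  # threshold comparison
                reason = f"score {score:.2f} exceeds threshold"
            else:
                decision, reason = "pass", f"score {score:.2f} within threshold"
        # Audit log: record a hash of the payload, not the payload itself.
        self.audit_log.append({
            "timestamp": time.time(),
            "agent_id": agent_id,
            "payload_sha256": hashlib.sha256(payload.encode("utf-8")).hexdigest(),
            "decision": decision,
            "reason": reason,
        })
        if decision != "pass":
            raise AgentActionBlocked(reason)
```

&lt;p&gt;In this sketch, &lt;code&gt;Gate().check("ops-agent-1", "command_execution", "ls -la")&lt;/code&gt; returns normally, while the same call with &lt;code&gt;rm -rf /&lt;/code&gt; or with an unknown category raises &lt;code&gt;AgentActionBlocked&lt;/code&gt;. Because the score depends only on the payload and the taxonomy, the same input always gives the same decision.&lt;/p&gt;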

&lt;h2&gt;
  
  
  Why Offline-First Matters
&lt;/h2&gt;

&lt;p&gt;ShadowAudit works fully offline. SQLite-backed state. No Redis. No cloud. No API keys.&lt;/p&gt;

&lt;p&gt;This matters because:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;
&lt;strong&gt;Banks&lt;/strong&gt; often run agents inside isolated, locked-down VPCs that can't call external APIs.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Healthcare&lt;/strong&gt; has HIPAA constraints. Agent data can't leave the network.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Defense&lt;/strong&gt; contractors work in classified environments. Zero external connectivity.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Legal teams&lt;/strong&gt; block any tool that sends data to third parties.&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;If your governance tool requires an internet connection, you've already lost these customers.&lt;/p&gt;
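
&lt;p&gt;As a rough picture of what SQLite-backed agent state can look like, the sketch below uses only Python's standard &lt;code&gt;sqlite3&lt;/code&gt; module. The table name, the columns, and the update rule are assumptions for illustration; ShadowAudit's real schema and its K and V metrics may be defined differently.&lt;/p&gt;

```python
import sqlite3


def open_state(path: str = ":memory:") -> sqlite3.Connection:
    """Open (or create) a local state database; no network is involved."""
    conn = sqlite3.connect(path)
    conn.execute(
        "CREATE TABLE IF NOT EXISTS agent_state ("
        "agent_id TEXT PRIMARY KEY, "
        "trust REAL NOT NULL, "
        "calls INTEGER NOT NULL)"
    )
    return conn


def record_decision(conn: sqlite3.Connection, agent_id: str, allowed: bool) -> None:
    """Hypothetical rule: trust moves by 0.1 per decision, clamped to [0, 1]."""
    delta = 0.1 if allowed else -0.1
    with conn:  # commit both statements together, or roll back on error
        conn.execute(
            "INSERT OR IGNORE INTO agent_state (agent_id, trust, calls) "
            "VALUES (?, 0.5, 0)",
            (agent_id,),
        )
        conn.execute(
            "UPDATE agent_state "
            "SET trust = MIN(1.0, MAX(0.0, trust + ?)), calls = calls + 1 "
            "WHERE agent_id = ?",
            (delta, agent_id),
        )


def read_state(conn: sqlite3.Connection, agent_id: str):
    """Return (trust, calls) for an agent, or None if it has no record yet."""
    return conn.execute(
        "SELECT trust, calls FROM agent_state WHERE agent_id = ?", (agent_id,)
    ).fetchone()
```

&lt;p&gt;Passing a file path instead of &lt;code&gt;":memory:"&lt;/code&gt; keeps the state on local disk across runs, which is the property that matters in a network-restricted environment.&lt;/p&gt;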

&lt;h2&gt;
  
  
  Pre-Built Taxonomies
&lt;/h2&gt;

&lt;p&gt;ShadowAudit ships with three starter taxonomies:&lt;/p&gt;

&lt;div class="table-wrapper-paragraph"&gt;&lt;table&gt;
&lt;thead&gt;
&lt;tr&gt;
&lt;th&gt;Taxonomy&lt;/th&gt;
&lt;th&gt;Risk Categories&lt;/th&gt;
&lt;th&gt;Example Keywords&lt;/th&gt;
&lt;/tr&gt;
&lt;/thead&gt;
&lt;tbody&gt;
&lt;tr&gt;
&lt;td&gt;&lt;strong&gt;General&lt;/strong&gt;&lt;/td&gt;
&lt;td&gt;shell execution, file operations, network calls&lt;/td&gt;
&lt;td&gt;
&lt;code&gt;rm&lt;/code&gt;, &lt;code&gt;curl&lt;/code&gt;, &lt;code&gt;chmod&lt;/code&gt;, &lt;code&gt;wget&lt;/code&gt;
&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;&lt;strong&gt;Financial&lt;/strong&gt;&lt;/td&gt;
&lt;td&gt;payments, withdrawals, PII access, account modifications&lt;/td&gt;
&lt;td&gt;
&lt;code&gt;transfer&lt;/code&gt;, &lt;code&gt;withdraw&lt;/code&gt;, &lt;code&gt;ssn&lt;/code&gt;, &lt;code&gt;account_number&lt;/code&gt;
&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;&lt;strong&gt;Legal&lt;/strong&gt;&lt;/td&gt;
&lt;td&gt;privilege waiver, regulatory filings, client data access&lt;/td&gt;
&lt;td&gt;
&lt;code&gt;waive&lt;/code&gt;, &lt;code&gt;settle&lt;/code&gt;, &lt;code&gt;attorney_client&lt;/code&gt;, &lt;code&gt;file_motion&lt;/code&gt;
&lt;/td&gt;
&lt;/tr&gt;
&lt;/tbody&gt;
&lt;/table&gt;&lt;/div&gt;

&lt;p&gt;Each taxonomy has tuned thresholds. You can build custom ones interactively:&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight shell"&gt;&lt;code&gt;shadowaudit build-taxonomy
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;h2&gt;
  
  
  Framework Support
&lt;/h2&gt;

&lt;div class="table-wrapper-paragraph"&gt;&lt;table&gt;
&lt;thead&gt;
&lt;tr&gt;
&lt;th&gt;Framework&lt;/th&gt;
&lt;th&gt;Status&lt;/th&gt;
&lt;/tr&gt;
&lt;/thead&gt;
&lt;tbody&gt;
&lt;tr&gt;
&lt;td&gt;&lt;strong&gt;LangChain&lt;/strong&gt;&lt;/td&gt;
&lt;td&gt;✅ First-class adapter&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;&lt;strong&gt;CrewAI&lt;/strong&gt;&lt;/td&gt;
&lt;td&gt;✅ First-class adapter&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;&lt;strong&gt;AutoGen&lt;/strong&gt;&lt;/td&gt;
&lt;td&gt;🔜 Next&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;&lt;strong&gt;OpenAI Agents SDK&lt;/strong&gt;&lt;/td&gt;
&lt;td&gt;🔜 Planned&lt;/td&gt;
&lt;/tr&gt;
&lt;/tbody&gt;
&lt;/table&gt;&lt;/div&gt;

&lt;p&gt;Both adapters use duck typing — they work with any tool that has &lt;code&gt;name&lt;/code&gt;, &lt;code&gt;description&lt;/code&gt;, and &lt;code&gt;run()&lt;/code&gt;. You don't need the framework installed for the adapter to work.&lt;/p&gt;
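
&lt;p&gt;The duck-typing idea can be sketched with &lt;code&gt;typing.Protocol&lt;/code&gt;. This is an illustration of the pattern, not the adapter's real code: &lt;code&gt;ToolLike&lt;/code&gt;, &lt;code&gt;GatedTool&lt;/code&gt;, &lt;code&gt;EchoTool&lt;/code&gt;, and the &lt;code&gt;is_allowed&lt;/code&gt; callback are hypothetical names, and the wrapper raises the built-in &lt;code&gt;PermissionError&lt;/code&gt; rather than the library's exception.&lt;/p&gt;

```python
from typing import Any, Callable, Protocol, runtime_checkable


@runtime_checkable
class ToolLike(Protocol):
    """Anything with a name, a description, and a run() method."""

    name: str
    description: str

    def run(self, tool_input: Any) -> Any: ...


class EchoTool:
    """A plain class with no framework base class."""

    name = "echo"
    description = "Returns its input unchanged."

    def run(self, tool_input: Any) -> Any:
        return tool_input


class GatedTool:
    """Wraps a tool-shaped object and checks each call before running it."""

    def __init__(self, tool: ToolLike, is_allowed: Callable[[Any], bool]):
        # Structural check only: no import of LangChain or CrewAI is needed.
        if not isinstance(tool, ToolLike):
            raise TypeError("tool must expose name, description, and run()")
        self._tool = tool
        self._is_allowed = is_allowed
        self.name = tool.name
        self.description = tool.description

    def run(self, tool_input: Any) -> Any:
        if not self._is_allowed(tool_input):
            raise PermissionError(f"blocked call to {self.name}")
        return self._tool.run(tool_input)
```

&lt;p&gt;Here &lt;code&gt;GatedTool(EchoTool(), is_allowed=...)&lt;/code&gt; works even though &lt;code&gt;EchoTool&lt;/code&gt; inherits from nothing, which is why a wrapper built this way doesn't need the framework installed.&lt;/p&gt;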

&lt;h2&gt;
  
  
  The Numbers
&lt;/h2&gt;

&lt;ul&gt;
&lt;li&gt;
&lt;strong&gt;133 tests&lt;/strong&gt;, 100% pass rate&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Zero flaky tests&lt;/strong&gt; — deterministic by design&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;ruff + mypy clean&lt;/strong&gt; — strict linting from day one&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;MIT licensed&lt;/strong&gt; — use it, modify it, build on it&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Python 3.10+&lt;/strong&gt; — modern Python with no legacy baggage&lt;/li&gt;
&lt;/ul&gt;

&lt;h2&gt;
  
  
  What's Next
&lt;/h2&gt;

&lt;p&gt;ShadowAudit is in alpha (v0.3.2). The core gate, CLI, framework adapters, and assessment tools are functional and tested. Here's the roadmap:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;🔜 &lt;strong&gt;AutoGen adapter&lt;/strong&gt;
&lt;/li&gt;
&lt;li&gt;🔜 &lt;strong&gt;Behavioral anomaly detection&lt;/strong&gt; — pattern detection across sessions&lt;/li&gt;
&lt;li&gt;🔜 &lt;strong&gt;Pro dashboard&lt;/strong&gt; — team-level visibility, compliance reports, alerting&lt;/li&gt;
&lt;li&gt;🔜 &lt;strong&gt;More taxonomies&lt;/strong&gt; — healthcare, defense, e-commerce&lt;/li&gt;
&lt;/ul&gt;

&lt;h2&gt;
  
  
  Try It
&lt;/h2&gt;



&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight shell"&gt;&lt;code&gt;pip &lt;span class="nb"&gt;install &lt;/span&gt;shadowaudit
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;ul&gt;
&lt;li&gt;📖 &lt;a href="https://github.com/AnshumanKumar14/shadowaudit-python" rel="noopener noreferrer"&gt;GitHub&lt;/a&gt;
&lt;/li&gt;
&lt;li&gt;📦 &lt;a href="https://pypi.org/project/shadowaudit/" rel="noopener noreferrer"&gt;PyPI&lt;/a&gt;
&lt;/li&gt;
&lt;li&gt;📁 &lt;a href="https://github.com/AnshumanKumar14/shadowaudit-python/tree/main/examples" rel="noopener noreferrer"&gt;Examples&lt;/a&gt;
&lt;/li&gt;
&lt;/ul&gt;




&lt;p&gt;&lt;strong&gt;AI agents are the next attack surface. Don't wait for an incident to start governing them.&lt;/strong&gt;&lt;/p&gt;

&lt;p&gt;&lt;em&gt;Built by &lt;a href="https://github.com/AnshumanKumar14" rel="noopener noreferrer"&gt;Anshuman Kumar&lt;/a&gt;. MIT licensed. Works offline.&lt;/em&gt;&lt;/p&gt;

</description>
      <category>ai</category>
      <category>agents</category>
      <category>python</category>
      <category>opensource</category>
    </item>
  </channel>
</rss>
