<?xml version="1.0" encoding="UTF-8"?>
<rss version="2.0" xmlns:atom="http://www.w3.org/2005/Atom" xmlns:dc="http://purl.org/dc/elements/1.1/">
  <channel>
    <title>DEV Community: Vasu Dalal</title>
    <description>The latest articles on DEV Community by Vasu Dalal (@vdalal).</description>
    <link>https://dev.to/vdalal</link>
    <image>
      <url>https://media2.dev.to/dynamic/image/width=90,height=90,fit=cover,gravity=auto,format=auto/https:%2F%2Fdev-to-uploads.s3.us-east-2.amazonaws.com%2Fuploads%2Fuser%2Fprofile_image%2F4001076%2Fc1ecd5b8-65f7-4be2-a41c-8811bdc5a715.png</url>
      <title>DEV Community: Vasu Dalal</title>
      <link>https://dev.to/vdalal</link>
    </image>
    <atom:link rel="self" type="application/rss+xml" href="https://dev.to/feed/vdalal"/>
    <language>en</language>
    <item>
      <title>I let my AI agent provision cloud infra. Then I made sure it couldn't go bankrupt doing it.</title>
      <dc:creator>Vasu Dalal</dc:creator>
      <pubDate>Fri, 26 Jun 2026 21:37:06 +0000</pubDate>
      <link>https://dev.to/vdalal/i-let-my-ai-agent-provision-cloud-infra-then-i-made-sure-it-couldnt-go-bankrupt-doing-it-g1p</link>
      <guid>https://dev.to/vdalal/i-let-my-ai-agent-provision-cloud-infra-then-i-made-sure-it-couldnt-go-bankrupt-doing-it-g1p</guid>
      <description>&lt;p&gt;A few days back I wrote about giving an autonomous agent database access and building a firewall so it couldn't &lt;code&gt;DROP TABLE&lt;/code&gt; prod. Same lesson, new surface: this time the agent had &lt;strong&gt;cloud credentials&lt;/strong&gt;.&lt;/p&gt;

&lt;p&gt;The failure mode isn't a destructive command here. It's spend. An agent pointed at a networking task can scan a whole range looking for hosts, then spin up a fleet of instances to do it faster. Every individual call is "authorized": your IAM role said yes. The bill is what eventually says no.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Two shapes, two right answers&lt;/strong&gt;&lt;/p&gt;

&lt;p&gt;The interesting part is that these are not the same kind of problem, so they don't get the same verdict.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;1. The scan is never legitimate as an agent tool call.&lt;/strong&gt; An &lt;code&gt;nmap -sS -p- 10.0.0.0/16&lt;/code&gt; or a &lt;code&gt;masscan&lt;/code&gt; across a network is reconnaissance and abusive egress. There's no benign version of an agent sweeping a network at scale, so it gets &lt;strong&gt;hard-blocked&lt;/strong&gt;, deterministically, before the call runs. (A scan of your own &lt;code&gt;localhost&lt;/code&gt; is a dev check, so that's exempt.)&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;2. The provisioning might be totally fine.&lt;/strong&gt; Spinning up 50 instances could be a real scale-out, or a runaway loop burning money. You can't tell from the action alone, only from the consequence. So instead of blocking it, AgentX &lt;strong&gt;pauses it for a human&lt;/strong&gt;: a 202, "held for approval," routed to whoever owns the budget. Block the thing that's never okay, escalate the thing that's sometimes okay. Gate on consequence, not identity.&lt;/p&gt;

&lt;p&gt;Both checks are zero-LLM. No model in the hot path means no latency tax and nothing that can be talked out of its verdict. A runaway fleet should be caught by a rule, not a vibe.&lt;/p&gt;
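&lt;p&gt;The two verdicts above can be sketched in a few lines of plain Python. This is an illustration under stated assumptions, not the AgentX implementation: &lt;code&gt;Verdict&lt;/code&gt;, &lt;code&gt;check_shell&lt;/code&gt;, &lt;code&gt;check_provision&lt;/code&gt;, the scanner list, and the instance threshold are all hypothetical names and values chosen for the example.&lt;/p&gt;

```python
import ipaddress
import shlex
from dataclasses import dataclass

# Hypothetical values for illustration; a real policy would be configurable.
SCANNERS = {"nmap", "masscan"}
MAX_INSTANCES_WITHOUT_APPROVAL = 10


@dataclass
class Verdict:
    action: str  # "allow", "block", or "hold"
    reason: str


def is_loopback(target):
    """True when the target is localhost or a loopback address or range."""
    if target == "localhost":
        return True
    try:
        return ipaddress.ip_network(target, strict=False).is_loopback
    except ValueError:
        return False  # anything we cannot classify is not exempt


def check_shell(command):
    """Hard-block network scanners unless every target is loopback.

    Naive on purpose: every non-flag token counts as a target, so option
    values such as port numbers make the check err toward blocking.
    """
    tokens = shlex.split(command)
    if not tokens or tokens[0] not in SCANNERS:
        return Verdict("allow", "not a scanner")
    targets = [t for t in tokens[1:] if not t.startswith("-")]
    if targets and all(is_loopback(t) for t in targets):
        return Verdict("allow", "loopback-only scan")
    return Verdict("block", "network scan outside localhost")


def check_provision(instance_count):
    """Hold large fleets for human approval instead of blocking them."""
    if instance_count > MAX_INSTANCES_WITHOUT_APPROVAL:
        return Verdict("hold", "held for approval")
    return Verdict("allow", "within threshold")


print(check_shell("nmap -sS -p- 10.0.0.0/16").action)  # block
print(check_shell("nmap 127.0.0.1").action)            # allow
print(check_provision(50).action)                      # hold
```

&lt;p&gt;The design point is that the scan rule returns a final answer, while the provisioning rule only decides who answers: a count over the threshold is routed to a person rather than refused.&lt;/p&gt;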

&lt;p&gt;&lt;strong&gt;The bigger thing this closes&lt;/strong&gt;&lt;/p&gt;

&lt;p&gt;We keep a catalog of real, documented agent failures and triage each one: is it something an &lt;strong&gt;action&lt;/strong&gt; firewall can deterministically catch, or is it someone else's category (output hallucination, content safety, model internals)? We only build for the coverable ones, and we flag the rest honestly instead of faking a signature.&lt;/p&gt;

&lt;p&gt;With this release, the coverable list is &lt;strong&gt;done&lt;/strong&gt;. Every failure shape an action firewall can actually own now has a deterministic block or a human-in-the-loop escalation behind it. The honesty about what we &lt;em&gt;don't&lt;/em&gt; cover is the point: it's how you know the coverage claims are real.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Verify it in 2 minutes&lt;/strong&gt;&lt;/p&gt;

&lt;p&gt;The network checks above run in the gateway, but the part you can prove on your own machine with no key and no account is the deterministic floor:&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight shell"&gt;&lt;code&gt;pip &lt;span class="nb"&gt;install &lt;/span&gt;agentx-security-sdk
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;





&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight python"&gt;&lt;code&gt;&lt;span class="kn"&gt;from&lt;/span&gt; &lt;span class="n"&gt;agentx_sdk&lt;/span&gt; &lt;span class="kn"&gt;import&lt;/span&gt; &lt;span class="n"&gt;agentx_protect&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;is_block&lt;/span&gt;

&lt;span class="nd"&gt;@agentx_protect&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;agent_id&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;demo&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
&lt;span class="k"&gt;def&lt;/span&gt; &lt;span class="nf"&gt;run_sql&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;query&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="nb"&gt;str&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;db_session&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="bp"&gt;None&lt;/span&gt;&lt;span class="p"&gt;):&lt;/span&gt;
    &lt;span class="nf"&gt;print&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;EXECUTED (DANGER):&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;query&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;   &lt;span class="c1"&gt;# never reached
&lt;/span&gt;    &lt;span class="k"&gt;return&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;ok&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="bp"&gt;True&lt;/span&gt;&lt;span class="p"&gt;}&lt;/span&gt;

&lt;span class="n"&gt;result&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="nf"&gt;run_sql&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;query&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;Please clean up: DROP TABLE users;&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
&lt;span class="nf"&gt;print&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;BLOCKED:&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="nf"&gt;is_block&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;result&lt;/span&gt;&lt;span class="p"&gt;))&lt;/span&gt;       &lt;span class="c1"&gt;# -&amp;gt; True, offline, no key
&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;One decorator. The catastrophic call is intercepted before your function body runs.&lt;/p&gt;
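&lt;p&gt;How a decorator can intercept a call before the function body runs is easy to sketch. The version below is not how &lt;code&gt;agentx_protect&lt;/code&gt; is implemented: &lt;code&gt;guard&lt;/code&gt;, &lt;code&gt;Blocked&lt;/code&gt;, and the single regex rule are hypothetical stand-ins that only show the wrapping pattern.&lt;/p&gt;

```python
import functools
import re
from dataclasses import dataclass

# Hypothetical single rule; a real policy set would hold many.
DROP_TABLE = re.compile(r"\bDROP\s+TABLE\b", re.IGNORECASE)


@dataclass
class Blocked:
    reason: str


def guard(func):
    """Check every string argument against the rule before calling func."""
    @functools.wraps(func)
    def wrapper(*args, **kwargs):
        for value in list(args) + list(kwargs.values()):
            if isinstance(value, str) and DROP_TABLE.search(value):
                return Blocked("destructive SQL")  # func is never called
        return func(*args, **kwargs)
    return wrapper


executed = []  # records which queries reached the function body


@guard
def run_sql(query):
    executed.append(query)  # stands in for the real database call
    return {"ok": True}


blocked = run_sql(query="Please clean up: DROP TABLE users;")
allowed = run_sql(query="SELECT id FROM users WHERE id = 1")
print(isinstance(blocked, Blocked))  # True
print(executed)                      # only the SELECT reached the body
```

&lt;p&gt;Because the check runs inside the wrapper, the caller gets a block result back as an ordinary return value and the tool function needs no changes of its own.&lt;/p&gt;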

&lt;p&gt;&lt;strong&gt;Why I'm posting&lt;/strong&gt;&lt;/p&gt;

&lt;p&gt;Same ask as last time: I want a handful of people running &lt;strong&gt;real&lt;/strong&gt; Python agents against live systems (a DB, cloud, files, money), ideally unattended, to point this at their stack and tell me where it's wrong. What would have bitten you? What shape is it still missing?&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Try it live (keyless): &lt;a href="https://bit.ly/agentfirewall" rel="noopener noreferrer"&gt;https://bit.ly/agentfirewall&lt;/a&gt;
&lt;/li&gt;
&lt;li&gt;Community / tell me what broke: &lt;a href="https://discord.gg/PmWRTtaSx2" rel="noopener noreferrer"&gt;https://discord.gg/PmWRTtaSx2&lt;/a&gt;
&lt;/li&gt;
&lt;li&gt;Or just reply here.&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;If your agent never touches anything irreversible or expensive, say pass. If it does, the repro is two minutes, and a runaway cloud bill is a bad way to find out the hard way.&lt;/p&gt;

</description>
      <category>ai</category>
      <category>python</category>
      <category>security</category>
      <category>llm</category>
    </item>
    <item>
      <title>I gave my AI agent database access. Then I built a firewall so it couldn't wipe prod.</title>
      <dc:creator>Vasu Dalal</dc:creator>
      <pubDate>Wed, 24 Jun 2026 18:46:26 +0000</pubDate>
      <link>https://dev.to/vdalal/i-gave-my-ai-agent-database-access-then-i-built-a-firewall-so-it-couldnt-wipe-prod-83c</link>
      <guid>https://dev.to/vdalal/i-gave-my-ai-agent-database-access-then-i-built-a-firewall-so-it-couldnt-wipe-prod-83c</guid>
      <description>&lt;p&gt;A few months ago I gave an autonomous agent write access to a real database. It was a LangChain-style loop (plan, call a tool, observe, repeat), and one of the tools ran SQL.&lt;/p&gt;

&lt;p&gt;It worked great in the demo. Then I watched it, during a "clean up the test rows" task, generate this:&lt;/p&gt;



&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight sql"&gt;&lt;code&gt;DROP TABLE users;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;

&lt;p&gt;It didn't run (staging, and I was watching). But the lesson landed: the LLM doesn't know the difference between a destructive command and a safe one until it's already calling the tool. And by then your code is one &lt;code&gt;cursor.execute()&lt;/code&gt; away from an incident.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;"AI firewalls" guard the wrong side&lt;/strong&gt;&lt;/p&gt;

&lt;p&gt;When I went looking for protection, almost everything in the "LLM security" space guards the inbound side: prompt injection, jailbreaks, PII in the input. Useful, but it's the wrong end for an autonomous agent. My problem wasn't a malicious prompt. It was a well-meaning agent emitting a catastrophic action.&lt;/p&gt;

&lt;p&gt;What I actually wanted was a firewall on the outbound side, on the tool calls themselves:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;destructive SQL (&lt;code&gt;DROP TABLE&lt;/code&gt;, unscoped &lt;code&gt;DELETE&lt;/code&gt;)&lt;/li&gt;
&lt;li&gt;writes to prod / &lt;code&gt;ALTER ... DROP COLUMN&lt;/code&gt;
&lt;/li&gt;
&lt;li&gt;SSRF and cloud-metadata fetches (&lt;code&gt;169.254.169.254&lt;/code&gt;)&lt;/li&gt;
&lt;li&gt;bulk secret / API-key reads&lt;/li&gt;
&lt;li&gt;runaway retry loops draining your token budget&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;And critically: I wanted the catch to be deterministic. If your safety layer is itself an LLM call, it's slower, costs money, and can be talked out of it. A &lt;code&gt;DROP TABLE&lt;/code&gt; should be blocked by a rule, not a vibe.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;The 2-minute version you can run right now&lt;/strong&gt;&lt;/p&gt;

&lt;p&gt;I ended up building this and putting the SDK on PyPI. Here's the whole thing; it blocks a live &lt;code&gt;DROP TABLE&lt;/code&gt; offline, with no API key, using built-in policy seeds:&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight shell"&gt;&lt;code&gt;pip install agentx-security-sdk
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight python"&gt;&lt;code&gt;from agentx_sdk import agentx_protect, is_block

@agentx_protect(agent_id="demo")
def run_sql(query: str, db_session=None):
    print("EXECUTED (DANGER):", query)   # never reached
    return {"ok": True}

result = run_sql(query="Please clean up: DROP TABLE users;")
print("BLOCKED:", is_block(result))       # -&amp;gt; True, offline, no key
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;

&lt;p&gt;One decorator on your tool function. The destructive call gets intercepted before your function body runs, and you get a block result back instead. No gateway, no account, no LLM in the hot path: it runs entirely on your machine.&lt;/p&gt;

&lt;p&gt;Note: the package is &lt;code&gt;agentx-security-sdk&lt;/code&gt; (import path &lt;code&gt;agentx_sdk&lt;/code&gt;), version ≥ 0.3.11.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;How the block works&lt;/strong&gt;&lt;/p&gt;

&lt;p&gt;The decorator wraps your tool call and runs the arguments through a layer of deterministic checks before execution, including pattern and structural rules (destructive SQL, prod writes, SSRF targets, secret-store reads, no-progress loops). If a rule trips, the call returns a block instead of executing. No LLM is involved at the floor, which is why it works with no key and adds negligible latency.&lt;/p&gt;

&lt;p&gt;There's more above that floor: it can escalate ambiguous-but-dangerous actions for a human-in-the-loop decision, circuit-break a runaway loop, and reframe and retry the run instead of just dying. But the floor is the part you can verify in 2 minutes without trusting me, which is the whole point of leading with it.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Why I'm posting this&lt;/strong&gt;&lt;/p&gt;

&lt;p&gt;I'm looking for a handful of people running real Python agents, something that touches a live DB, cloud, files, or money, ideally unattended, to point this at their stack and tell me where it's wrong. Not a launch, not a sales pitch. I want to know:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Does it catch the thing that would've bitten you?&lt;/li&gt;
&lt;li&gt;What dangerous action shape is it missing?&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;If you've ever thought "what happens when this agent does something irreversible at 2am," I'd genuinely like your take.&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Try it live (keyless quickstart): &lt;a href="https://bit.ly/agentfirewall" rel="noopener noreferrer"&gt;https://bit.ly/agentfirewall&lt;/a&gt;
&lt;/li&gt;
&lt;li&gt;Community / tell me what broke: &lt;a href="https://discord.gg/PmWRTtaSx2" rel="noopener noreferrer"&gt;https://discord.gg/PmWRTtaSx2&lt;/a&gt;
&lt;/li&gt;
&lt;li&gt;Or just reply here. Bonus points for the war story that made you click.&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;If your agent never touches anything irreversible, ignore me. If it does, the repro's two minutes, and &lt;code&gt;DROP TABLE&lt;/code&gt; is a bad way to find out the hard way.&lt;/p&gt;
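&lt;p&gt;For readers who want to see what deterministic outbound rules can look like, here is a minimal sketch of two of the shapes listed in this post: destructive SQL and cloud-metadata fetches. It is an assumption-laden illustration, not the SDK's rule set: &lt;code&gt;sql_violation&lt;/code&gt; and &lt;code&gt;url_violation&lt;/code&gt; are hypothetical names, the SQL rule is a naive text match rather than a parser, and the URL rule only inspects IP literals.&lt;/p&gt;

```python
import ipaddress
import re
from urllib.parse import urlsplit

# Naive text rules for illustration; a real check would parse the SQL.
DROP_TABLE = re.compile(r"\bDROP\s+TABLE\b", re.IGNORECASE)
DELETE_STMT = re.compile(r"\bDELETE\s+FROM\b", re.IGNORECASE)
WHERE_CLAUSE = re.compile(r"\bWHERE\b", re.IGNORECASE)


def sql_violation(query):
    """Return a reason string for destructive SQL, or None if no rule trips."""
    if DROP_TABLE.search(query):
        return "DROP TABLE"
    if DELETE_STMT.search(query) and not WHERE_CLAUSE.search(query):
        return "unscoped DELETE"
    return None


def url_violation(url):
    """Return a reason string for link-local targets, or None otherwise.

    169.254.169.254 sits in the link-local range, so it trips this rule.
    """
    host = urlsplit(url).hostname
    if host is None:
        return None
    try:
        address = ipaddress.ip_address(host)
    except ValueError:
        return None  # a hostname, not an IP literal; DNS is out of scope here
    if address.is_link_local:
        return "link-local address (cloud metadata range)"
    return None


print(sql_violation("delete from users"))
print(url_violation("http://169.254.169.254/latest/meta-data/"))
```

&lt;p&gt;Both functions are pure and need no network or key, which is the property that lets a check like this run offline and ahead of the tool call.&lt;/p&gt;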

</description>
      <category>ai</category>
      <category>python</category>
      <category>security</category>
      <category>llm</category>
    </item>
  </channel>
</rss>
