<?xml version="1.0" encoding="UTF-8"?>
<rss version="2.0" xmlns:atom="http://www.w3.org/2005/Atom" xmlns:dc="http://purl.org/dc/elements/1.1/">
  <channel>
    <title>DEV Community: Fran</title>
    <description>The latest articles on DEV Community by Fran (@wraithvector0).</description>
    <link>https://dev.to/wraithvector0</link>
    <image>
      <url>https://media2.dev.to/dynamic/image/width=90,height=90,fit=cover,gravity=auto,format=auto/https:%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Fuser%2Fprofile_image%2F3874511%2F473a86d4-0938-41ef-a8cd-6ff5c5834763.jpeg</url>
      <title>DEV Community: Fran</title>
      <link>https://dev.to/wraithvector0</link>
    </image>
    <atom:link rel="self" type="application/rss+xml" href="https://dev.to/feed/wraithvector0"/>
    <language>en</language>
    <item>
      <title>What I learned securing AI agents with tool access</title>
      <dc:creator>Fran</dc:creator>
      <pubDate>Sun, 12 Apr 2026 07:26:21 +0000</pubDate>
      <link>https://dev.to/wraithvector0/what-i-learned-securing-ai-agents-with-tool-access-274j</link>
      <guid>https://dev.to/wraithvector0/what-i-learned-securing-ai-agents-with-tool-access-274j</guid>
      <description>&lt;p&gt;I’ve been experimenting with AI agents that can call tools: shell commands, APIs, databases and file systems.&lt;/p&gt;

&lt;p&gt;Recently I did a small integration with OpenClaw. It’s still very early, but I’d really value feedback from people running agents in real environments.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Background&lt;/strong&gt;&lt;/p&gt;

&lt;p&gt;At first everything looked great.&lt;/p&gt;

&lt;p&gt;The agent could reason, choose tools, and automate tasks.&lt;/p&gt;

&lt;p&gt;Then I realized something uncomfortable:&lt;/p&gt;

&lt;p&gt;If the model decides to run something dangerous, nothing really stops it.&lt;/p&gt;

&lt;p&gt;One test made it obvious.&lt;/p&gt;

&lt;p&gt;The agent attempted:&lt;/p&gt;

&lt;p&gt;&lt;code&gt;exec("cat /etc/passwd")&lt;/code&gt;&lt;/p&gt;

&lt;p&gt;Not because it was malicious, but because the prompt context allowed it.&lt;/p&gt;

&lt;p&gt;That’s when it clicked.&lt;/p&gt;

&lt;p&gt;Most agent setups today trust the model too much.&lt;/p&gt;

&lt;p&gt;So I started applying very boring security ideas from classic web development.&lt;/p&gt;

&lt;ol&gt;
&lt;li&gt;Treat tool inputs like user inputs&lt;/li&gt;
&lt;/ol&gt;

&lt;p&gt;Just because an LLM produced an argument doesn’t mean it’s safe.&lt;/p&gt;

&lt;p&gt;Tool arguments need validation and sanitization.&lt;/p&gt;

&lt;p&gt;Examples:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;file paths&lt;/li&gt;
&lt;li&gt;SQL queries&lt;/li&gt;
&lt;li&gt;shell commands&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;If something looks suspicious, reject it.&lt;/p&gt;

&lt;ol start="2"&gt;
&lt;li&gt;Least privilege for tools&lt;/li&gt;
&lt;/ol&gt;

&lt;p&gt;Originally the agent had access to everything.&lt;/p&gt;

&lt;p&gt;Bad idea.&lt;/p&gt;

&lt;p&gt;Now every tool has minimal permissions.&lt;/p&gt;

&lt;p&gt;Examples:&lt;/p&gt;

&lt;p&gt;Database tool&lt;br&gt;&lt;br&gt;
→ read-only tables&lt;/p&gt;

&lt;p&gt;Filesystem tool&lt;br&gt;&lt;br&gt;
→ restricted directories&lt;/p&gt;

&lt;p&gt;API tool&lt;br&gt;&lt;br&gt;
→ scoped endpoints&lt;/p&gt;

&lt;ol start="3"&gt;
&lt;li&gt;Log the full chain of actions&lt;/li&gt;
&lt;/ol&gt;

&lt;p&gt;Initially I only logged prompts and responses.&lt;/p&gt;

&lt;p&gt;But when something went wrong, I had no idea what the agent had actually done.&lt;/p&gt;

&lt;p&gt;Recording the full chain made debugging much easier:&lt;/p&gt;

&lt;p&gt;agent reasoning&lt;br&gt;&lt;br&gt;
→ tool selection&lt;br&gt;&lt;br&gt;
→ parameters&lt;br&gt;&lt;br&gt;
→ execution result&lt;/p&gt;
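&lt;p&gt;A toy version of that chain log might look like this (the step names mirror the chain above; the example entries are invented):&lt;/p&gt;

```javascript
// Minimal action-chain logger: every step gets a timestamped entry
// so a failed run can be reconstructed afterwards.
const chainLog = [];

function logStep(step, detail) {
  chainLog.push({ at: new Date().toISOString(), step, detail });
}

// Example chain for one tool call:
logStep("reasoning", "user asked for a summary of notes.txt");
logStep("tool-selection", "filesystem.read");
logStep("parameters", { path: "notes.txt" });
logStep("execution-result", { ok: true, bytes: 1024 });
```

&lt;p&gt;In a real system you would ship these entries somewhere append-only, but even an in-memory array is enough to answer "what did the agent actually do?" during debugging.&lt;/p&gt;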

&lt;ol start="4"&gt;
&lt;li&gt;Validate tool calls before execution&lt;/li&gt;
&lt;/ol&gt;

&lt;p&gt;Instead of letting the agent execute tools directly, I started intercepting tool calls.&lt;/p&gt;

&lt;p&gt;Conceptually:&lt;/p&gt;

&lt;p&gt;agent → tool request&lt;br&gt;&lt;br&gt;
policy check → allow / block&lt;br&gt;&lt;br&gt;
tool execution&lt;/p&gt;

&lt;p&gt;If a call violates policy, it never runs.&lt;/p&gt;
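&lt;p&gt;The flow above can be sketched as a policy gate that sits between the agent and its tools (the single &lt;code&gt;/etc&lt;/code&gt; rule is just one illustrative policy, not a complete set):&lt;/p&gt;

```javascript
// Hypothetical policy gate: the agent never calls a tool directly;
// every request passes through checkPolicy first.
function checkPolicy(request) {
  if (request.tool === "shell") {
    // Illustrative rule: deny shell commands touching /etc
    if (/\/etc\//.test(request.args.command)) {
      return { allow: false, reason: "sensitive system path" };
    }
  }
  return { allow: true };
}

function executeTool(request, tools) {
  const verdict = checkPolicy(request);
  if (!verdict.allow) {
    // The blocked call never reaches the tool at all.
    return { ran: false, reason: verdict.reason };
  }
  return { ran: true, result: tools[request.tool](request.args) };
}
```

&lt;p&gt;This is the same shape as middleware in a web framework: the check runs before the handler, and a rejected request short-circuits without side effects.&lt;/p&gt;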

&lt;ol start="5"&gt;
&lt;li&gt;Always have a kill switch&lt;/li&gt;
&lt;/ol&gt;

&lt;p&gt;At one point an agent got stuck in a loop repeatedly calling an API.&lt;/p&gt;

&lt;p&gt;A simple kill switch that stops tool execution saved the system.&lt;/p&gt;
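&lt;p&gt;A kill switch can be as small as a shared flag plus a per-run call budget, which also catches the API-loop failure mode automatically (the budget of 50 is an arbitrary example):&lt;/p&gt;

```javascript
// Minimal kill switch: flipping the flag, or exhausting the budget,
// stops all further tool execution for this run.
const killSwitch = { tripped: false, callsLeft: 50 };

function guardedCall(fn, args) {
  if (killSwitch.tripped) throw new Error("kill switch engaged");
  if (killSwitch.callsLeft === 0) {
    // A runaway loop exhausts the budget and trips the switch itself.
    killSwitch.tripped = true;
    throw new Error("call budget exhausted");
  }
  killSwitch.callsLeft -= 1;
  return fn(args);
}
```

&lt;p&gt;Once tripped, the switch stays tripped until a human resets it, so a looping agent cannot talk itself back into executing tools.&lt;/p&gt;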

&lt;p&gt;None of these ideas are new.&lt;/p&gt;

&lt;p&gt;They’re basically classic security principles applied to a new context.&lt;/p&gt;

&lt;p&gt;But as agents get more powerful, these guardrails feel increasingly necessary.&lt;/p&gt;

&lt;p&gt;I'm still experimenting with runtime guardrails for tool calls.&lt;/p&gt;

&lt;p&gt;If anyone here is running agents in production, I'm curious:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Are you validating tool inputs?&lt;/li&gt;
&lt;li&gt;Do you intercept tool calls before execution?&lt;/li&gt;
&lt;li&gt;Or do you rely mostly on prompt guardrails?&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;&lt;strong&gt;The experiment&lt;/strong&gt;&lt;/p&gt;

&lt;p&gt;If anyone wants to take a look or give feedback, the repo is here:&lt;/p&gt;

&lt;p&gt;&lt;a href="https://github.com/wraithvector0/wraithvector-openclaw" rel="noopener noreferrer"&gt;https://github.com/wraithvector0/wraithvector-openclaw&lt;/a&gt;&lt;/p&gt;

</description>
      <category>ai</category>
      <category>javascript</category>
      <category>security</category>
      <category>programming</category>
    </item>
  </channel>
</rss>
