Fran

What I learned securing AI agents with tool access

I’ve been experimenting with AI agents that can call tools: shell commands, APIs, databases, and file systems.

Recently I did a small integration with OpenClaw. It’s still very early, but I’d really value feedback from people running agents in real environments.

History

At first everything looked great.

The agent could reason, choose tools, and automate tasks.

Then I realized something uncomfortable:

If the model decides to run something dangerous, nothing really stops it.

One test made it obvious.

The agent attempted:

exec("cat /etc/passwd")

Not because it was malicious, but because the prompt context allowed it.

That’s when it clicked.

Most agent setups today trust the model too much.

So I started applying very boring security ideas from classic web development.

  1. Treat tool inputs like user inputs

Just because an LLM produced an argument doesn’t mean it’s safe.

Tool arguments need validation and sanitization.

Examples:

  • file paths
  • SQL queries
  • shell commands

If something looks suspicious, reject it.
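As a minimal Python sketch of this idea (the sandbox root and the metacharacter list are assumptions, not rules from any specific framework), tool arguments can be checked before they ever reach a shell or the filesystem:

```python
import re
from pathlib import Path

# Hypothetical sandbox root the agent is allowed to touch.
ALLOWED_ROOT = Path("/srv/agent-workspace")

# Shell metacharacters we refuse outright (illustrative, not exhaustive).
SHELL_METACHARS = re.compile(r"[;&|`$<>]")


def validate_path(raw: str) -> Path:
    """Resolve a path and reject anything that escapes the sandbox root."""
    resolved = (ALLOWED_ROOT / raw).resolve()
    if resolved != ALLOWED_ROOT and ALLOWED_ROOT not in resolved.parents:
        raise ValueError(f"path escapes sandbox: {raw}")
    return resolved


def validate_shell_arg(raw: str) -> str:
    """Reject arguments containing shell metacharacters."""
    if SHELL_METACHARS.search(raw):
        raise ValueError(f"suspicious shell argument: {raw}")
    return raw
```

The key point is that the check happens regardless of how plausible the LLM's output looks: `validate_path("../../etc/passwd")` raises even though the string came from the model, not a user.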

  2. Least privilege for tools

Originally the agent had access to everything.

Bad idea.

Now every tool has minimal permissions.

Examples:

Database tool

→ read-only tables

Filesystem tool

→ restricted directories

API tool

→ scoped endpoints
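One way to sketch this in Python is to attach an explicit allow-list to every tool; the tool names, statements, and endpoints below are made up for illustration:

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class ToolPermissions:
    """Minimal permission set attached to one tool."""
    name: str
    allowed: frozenset  # the only actions/resources this tool may use


# Illustrative registrations mirroring the examples above.
DB_TOOL = ToolPermissions("database", frozenset({"SELECT"}))
FS_TOOL = ToolPermissions("filesystem", frozenset({"/srv/agent-workspace"}))
API_TOOL = ToolPermissions("api", frozenset({"/v1/search", "/v1/status"}))


def is_allowed(tool: ToolPermissions, action: str) -> bool:
    """Default-deny: anything not explicitly listed is refused."""
    return action in tool.allowed
```

Because the default is deny, adding a new capability means consciously widening a tool's `allowed` set rather than remembering to restrict it later.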

  3. Log the full chain of actions

Initially I only logged prompts and responses.

But when something went wrong I had no idea what the agent actually did.

Recording the full chain made debugging much easier:

agent reasoning

→ tool selection

→ parameters

→ execution result
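A simple structured trace is enough to capture that chain; here is one possible shape (the stage names match the steps above, everything else is illustrative):

```python
import json
import time


def log_step(trace: list, stage: str, payload) -> None:
    """Append one timestamped step of the agent's action chain."""
    trace.append({"ts": time.time(), "stage": stage, "payload": payload})


trace = []
log_step(trace, "reasoning", "user asked for disk usage of the workspace")
log_step(trace, "tool_selection", "filesystem")
log_step(trace, "parameters", {"path": "/srv/agent-workspace"})
log_step(trace, "result", {"bytes": 1_048_576})

# A JSON dump of the trace is what you grep when something goes wrong.
print(json.dumps(trace, indent=2))
```

Per-request traces like this also make it possible to answer "what did the agent actually execute?" after the fact, rather than reconstructing it from prompts.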

  4. Validate tool calls before execution

Instead of letting the agent execute tools directly, I started intercepting tool calls.

Conceptually:

agent → tool request

policy check → allow / block

tool execution

If a call violates policy, it never runs.
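The interception step can be sketched as a function that runs every policy before dispatching to the real tool; the policy and the `tools` registry here are hypothetical stand-ins:

```python
from typing import Callable, Dict


def execute_tool_call(request: dict,
                      policies: list,
                      tools: Dict[str, Callable]):
    """Run every policy; execute the tool only if all of them allow the call."""
    for policy in policies:
        if not policy(request):
            raise PermissionError(f"blocked by policy: {request['tool']}")
    return tools[request["tool"]](**request.get("args", {}))


# Illustrative policy: the shell tool may never read under /etc/.
def no_shell_reads_of_etc(request: dict) -> bool:
    if request["tool"] != "shell":
        return True
    return "/etc/" not in request.get("args", {}).get("command", "")


# Stand-in tool implementation for the sketch.
tools = {"shell": lambda command: f"ran: {command}"}
```

With this shape, the `cat /etc/passwd` attempt from earlier never executes: the policy raises before the tool function is ever invoked.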

  5. Always have a kill switch

At one point an agent got stuck in a loop, repeatedly calling the same API.

A simple kill switch that stops tool execution saved the system.
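One minimal sketch of such a switch combines an operator-triggered stop flag with a per-run call budget (the budget of 50 is an arbitrary illustration):

```python
import threading


class KillSwitch:
    """Global stop flag plus a per-run tool-call budget."""

    def __init__(self, max_calls: int = 50):
        self._stop = threading.Event()  # can be tripped from another thread
        self._calls = 0
        self._max = max_calls

    def trip(self) -> None:
        """Manually halt all further tool execution."""
        self._stop.set()

    def check(self) -> None:
        """Call before every tool execution; raises once tripped or over budget."""
        self._calls += 1
        if self._stop.is_set() or self._calls > self._max:
            self._stop.set()
            raise RuntimeError("kill switch engaged: tool execution halted")
```

The budget catches runaway loops automatically, and `trip()` gives a human a way to stop the agent without killing the whole process.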

None of these ideas are new.

They’re basically classic security principles applied to a new context.

But as agents get more powerful, these guardrails feel increasingly necessary.

I'm still experimenting with runtime guardrails for tool calls.

If anyone here is running agents in production, I'm curious:

• Are you validating tool inputs?

• Do you intercept tool calls before execution?

• Or do you rely mostly on prompt guardrails?

Experiment

If anyone wants to take a look or give feedback, the repo is here:

https://github.com/wraithvector0/wraithvector-openclaw
