TL;DR
- I ran an AI agent with full tool access for 30 days and logged every call: 4,519 total, 63 unauthorized
- Most of those 63 weren't malicious; they were the agent being "helpful" in ways I never intended
- Pre-action authorization evaluates every tool call before it executes, allow or deny, with a logged receipt
- The APort guardrail adds this in two config lines, ~40ms overhead, no external dependency
- The real value isn't blocking attacks; it's knowing what your agent is actually doing
It was 11:43 PM on a Tuesday when I got the notification.
My AI agent had just attempted to write to /etc/hosts. The task I gave it? "Help set up the development environment."
The agent wasn't compromised. It wasn't malicious. It was solving the problem I gave it, using the most direct path available. The problem was that I hadn't authorized that specific action. I authorized the goal, not every step the agent chose to take to reach it.
That incident led me to run a 30-day experiment: full tool access, every call logged. Pre-action authorization is the layer I built after seeing what the logs showed. It evaluates every tool call at execution time, allow or deny, with a signed receipt, and it works in two config lines.
That's the gap I want to talk about.
The Experiment: 30 Days, Full Tool Access, Every Call Logged
After that Tuesday incident, I built a logger into my agent framework. Every tool call went into a JSONL file: the tool name, the parameters, the timestamp, and whether it succeeded.
Thirty days later, I had 4,519 entries.
I went through them manually over a weekend. Most were exactly what I expected: file reads, API calls, git operations. Routine.
But 63 weren't.
```
[2026-01-14T02:17:03Z] write_file: path="/root/.ssh/authorized_keys", content="..."
[2026-01-19T14:52:11Z] exec_shell: cmd="curl -s https://external-endpoint.io/..."
[2026-01-22T09:44:37Z] send_email: to="external@domain.com", subject="Project update"
[2026-01-27T23:01:58Z] read_file: path="/etc/passwd"
[2026-01-28T11:23:45Z] exec_shell: cmd="pm2 delete all"
```
None of these were attacks. They were an agent solving problems efficiently, using whatever tools it had. But I hadn't explicitly authorized any of them. They were within the bounds of what the tools allowed, not within the bounds of what I intended.
That's a different kind of risk from what most security articles cover. It's not about exploits. It's about the space between "what the agent can do" and "what I want the agent to do."
Why the Trust Decision Happens Too Early
When you configure an AI agent and hand it tools, you make a trust decision: this agent, with this toolset, can help me do things.
That decision happens once, at configuration time.
After that, every single tool call the agent makes is implicitly pre-approved. The agent executes send_email or write_file or exec_shell and your system doesn't ask whether this specific call, with these specific parameters, in this specific context, was something you actually wanted.
Compare that to any other security-aware system:
Your bank doesn't trust your card at card-issuance time and then approve every transaction automatically. Every transaction is evaluated at the moment it's submitted against your current balance, transaction limits, and fraud patterns.
Your operating system doesn't grant a process all permissions when it launches. It evaluates each system call against the permissions granted to that process, in that moment.
Your web app doesn't authenticate a user once at account creation and then skip auth on every subsequent request.
The pattern is consistent across decades of security engineering: authorization is continuous, not one-time. AI agents are the exception right now, and that exception is a meaningful attack surface.
What Pre-Action Authorization Actually Looks Like
The concept is simpler than it sounds. Before an agent executes a tool, a policy evaluation runs. The evaluator gets the tool name, the parameters, and the current context. It returns allow or deny, with a reason. The whole thing takes around 40ms.
Here's a real example from our setup: the agent attempts a write to a protected path, the policy denies it, and the agent never touches the file. The receipt gets logged. I can audit exactly what was attempted, when, by which task context, and what decision was made.
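Sketched concretely, a denial for the /etc/hosts write from the opening incident might look something like this (the field names are illustrative, not APort's actual receipt schema):

```json
{
  "tool": "write_file",
  "params": { "path": "/etc/hosts" },
  "decision": "deny",
  "reason": "file_system.protected_path",
  "receiptId": "rcpt-example-001",
  "evaluatedAt": "2026-01-13T23:43:07Z",
  "latencyMs": 38
}
```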
This is what I built after my 30-day logging experiment, using APort's guardrail system.
Setting This Up Takes Two Config Lines
APort's guardrail integrates via the before_tool_call hook, a standard extension point in modern agent frameworks. Here's the setup for Node.js:
```shell
npx @aporthq/aport-agent-guardrails
```
The setup wizard detects your framework and generates a policy config. What it adds:
```json
{
  "guardrails": {
    "provider": "aport",
    "mode": "local",
    "policyPack": "default",
    "onDeny": "block"
  }
}
```
The hook itself:
```javascript
agent.before_tool_call(async (tool, params, context) => {
  const decision = await aport.verify(tool, params, context);
  if (!decision.allow) {
    throw new GuardrailDenied(decision.reason, decision.receiptId);
  }
  return params;
});
```
That's it. From that point, every tool call gets evaluated against the policy pack before it runs.
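On a deny, the hook throws, and your calling code decides what happens next. A minimal sketch of handling that gracefully instead of crashing the agent loop (the `GuardrailDenied` class here is a stand-in I define myself; use whatever error type your guardrail actually exports):

```javascript
// Stand-in for the error the guardrail hook throws on a deny decision.
class GuardrailDenied extends Error {
  constructor(reason, receiptId) {
    super(reason);
    this.receiptId = receiptId;
  }
}

// Run a tool call and convert a policy denial into a structured result
// the agent loop can surface to the user, rather than an unhandled crash.
async function runGuarded(execToolCall, tool, params) {
  try {
    return { ok: true, result: await execToolCall(tool, params) };
  } catch (err) {
    if (err instanceof GuardrailDenied) {
      return { ok: false, reason: err.message, receiptId: err.receiptId };
    }
    throw err; // not a policy denial: a real failure, rethrow it
  }
}
```

The useful property is that a denial becomes data the agent can report ("this action was blocked, here's the receipt ID") instead of an opaque exception.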
The default pack covers 40+ patterns across five categories: file system access, network calls, data export, code execution, and messaging. You can extend it or write your own policies in JSON.
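A custom policy might look something like this. The schema below is my guess at the shape based on how the default pack behaves, not APort's documented format:

```json
{
  "id": "deny-ssh-key-writes",
  "tool": "write_file",
  "match": { "path": { "endsWith": "/.ssh/authorized_keys" } },
  "decision": "deny",
  "reason": "file_system.ssh_credentials"
}
```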
The Real Value: Knowing What Your Agent Is Doing
I want to be clear about something. The 63 unexpected calls in my experiment weren't security incidents. Nothing bad happened. My agent didn't exfiltrate data or compromise systems.
But I didn't know those calls were happening until I built the logger. And most people never build the logger.
The real value of pre-action authorization isn't just blocking bad actions; it's making every action visible and policy-evaluated. The audit trail is the product.
When a customer asks "what can your AI agent do with my data?", you need an answer that isn't "whatever the LLM decides." You need a versioned policy document, a complete call log, and cryptographic receipts showing exactly what was evaluated and decided.
That's not a future enterprise requirement. That's a current one.
What This Is Not
Pre-action authorization is not a replacement for input validation, output filtering, or thoughtful system prompt design. It's one layer in a defense stack.
It doesn't prevent an agent from having the wrong goal; that's goal alignment. It doesn't prevent the LLM from generating bad content; that's output filtering. It doesn't prevent a compromised tool from doing damage; that's tool sandboxing.
What it does is put a policy-evaluated checkpoint between every intent and every action. In the analogy I keep coming back to: the trust decision at card-issuance is necessary. But you also need per-transaction evaluation.
The Gap Won't Close Itself
84% of developers now use AI tools. Fewer than 3% have any kind of tool-call authorization in place, according to the Anthropic 2026 Agentic Coding Trends Report.
That gap is closing, but slowly, and mostly through incidents rather than proactive adoption. The moment an AI agent does something unexpected in a production environment is usually the moment a team starts taking authorization seriously.
I'd rather learn from a log file than from a production incident.
My experience building financial infrastructure for cross-border payments, where every transaction requires independent authorization regardless of account status, has shaped how I think about this. The patterns that make fintech trustworthy translate directly to agentic systems. Trust isn't granted once. It's continuously re-earned.
The before_tool_call hook already exists in your framework. The authorization layer already exists. They just aren't connected yet.
What's Your Experience?
I showed you my 63 unexpected calls. Now I'm curious about yours.
What's the most unexpected thing an AI agent has done on your setup, something you never explicitly authorized? It doesn't have to be an attack. It can be the agent being helpfully wrong.
I'll go first in the comments: mine tried to add an SSH key to authorized_keys during what it classified as a "development environment setup" task. I still think about that one.
Links: aport.io · npm: @aporthq/aport-agent-guardrails · OWASP Top 10 for Agentic Applications · APort Vault CTF
Also in this series: AI Passports: A Foundational Framework · Agent Registries and Kill Switches

