Autonomous AI agents are moving from research labs into production environments at speed. Unlike chatbots that respond to single prompts, agents plan, reason, execute multi-step tasks, call external tools, and delegate sub-tasks to child agents. With each of these capabilities comes a new attack surface — and the stakes are higher because agents act, not just talk.
The Three-Tier Agent Threat Model
Most production agent systems can be described with three security tiers. Understanding this model is the first step to securing your deployment.
Tier 1 — The Agent Brain. The LLM that plans and reasons. Vulnerable to prompt injection, goal misgeneralisation, and system prompt leakage. An attacker who injects a malicious instruction can redirect the agent's entire execution chain.
Tier 2 — Tool, Delegation, and Data Access. The agent's connection to the outside world. Tool execution (code, file I/O, API calls), sub-agent spawning, and access to internal data stores each introduce their own risks.
Tier 3 — Defense Boundaries. Permission controls, guardrails, audit logging, and human-in-the-loop checks that contain the blast radius when things go wrong.
The Prompt Injection Amplifier
In a chatbot, prompt injection is dangerous — the model might leak a system prompt or generate harmful content. In an agent, prompt injection is catastrophic. A single injected instruction can cause the agent to read internal databases, execute system commands, exfiltrate data via API calls, and spawn sub-agents that repeat the attack at greater scale.
Tool Permission Boundaries
The most critical security control for agent systems is strict tool permission boundaries. Apply the principle of least privilege to every tool the agent can call:
- Code execution tools should run in sandboxed environments with no network access unless explicitly required
- API tools should have scoped tokens with minimal permissions
- Database tools should use read-only connections by default, with write access requiring explicit human approval
Sub-Agent Delegation Risks
When an agent can spawn child agents, the security problem compounds. Each sub-agent inherits — or must be explicitly granted — the tools and permissions of its parent. Without careful design, a single compromised parent agent can produce a cascade of malicious children.
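One way to limit this cascade is to make delegation attenuating: a child can receive only a subset of its parent's tools, never more. The sketch below assumes a hypothetical `Agent` record and `spawn_child` helper. The `MAX_DEPTH` cap is an added assumption, not something the threat model above prescribes.

```python
from dataclasses import dataclass

MAX_DEPTH = 2  # assumed limit on how deep delegation chains may go


@dataclass(frozen=True)
class Agent:
    name: str
    tools: frozenset
    depth: int = 0


def spawn_child(parent: Agent, name: str, requested_tools: set) -> Agent:
    """Create a child whose tools are the intersection of what it asks for
    and what the parent already holds."""
    if parent.depth >= MAX_DEPTH:
        raise PermissionError(f"{parent.name} may not delegate any further")
    granted = parent.tools & frozenset(requested_tools)
    return Agent(name=name, tools=granted, depth=parent.depth + 1)


# Hypothetical agents and tool names.
planner = Agent("planner", frozenset({"search", "read_db"}))
researcher = spawn_child(planner, "researcher", {"search", "shell"})
print(sorted(researcher.tools))  # ['search']
```

Because the grant is an intersection, a compromised parent cannot hand a child a tool it does not hold itself, and the depth cap bounds how far a malicious chain can spread.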
Human-in-the-Loop for High-Risk Actions
Classify actions into three categories:
- Automatic: read-only queries, no approval needed
- Confirm: write operations and transactions, which require explicit human confirmation
- Blocked: actions outside the authorised scope
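The three categories can be sketched as an allow-list lookup where anything not listed is blocked. The action names, `POLICY` table, and `ask_human` callback are hypothetical placeholders. In practice the confirmation step would be a ticket, chat prompt, or approval UI.

```python
from enum import Enum
from typing import Callable


class Risk(Enum):
    AUTOMATIC = "automatic"  # read-only, no approval needed
    CONFIRM = "confirm"      # needs explicit human confirmation
    BLOCKED = "blocked"      # outside the authorised scope


# Hypothetical action names, for illustration only.
POLICY = {
    "query_orders": Risk.AUTOMATIC,
    "update_order": Risk.CONFIRM,
    "issue_refund": Risk.CONFIRM,
}


def classify(action: str) -> Risk:
    """Unknown actions default to BLOCKED rather than AUTOMATIC."""
    return POLICY.get(action, Risk.BLOCKED)


def execute(action: str, run: Callable[[], object],
            ask_human: Callable[[str], bool]):
    """Run the action only if its risk category permits it."""
    risk = classify(action)
    if risk is Risk.BLOCKED:
        raise PermissionError(f"{action!r} is outside the authorised scope")
    if risk is Risk.CONFIRM and not ask_human(action):
        return None  # human declined, so nothing is executed
    return run()


print(classify("query_orders"))  # Risk.AUTOMATIC
print(classify("drop_table"))    # Risk.BLOCKED
```

Defaulting to `Risk.BLOCKED` means that a newly added tool action stays unavailable until someone deliberately assigns it a category.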
Originally published at aisecurities.uk