DEV Community

Jane Alesi
Building a Zero-Trust AI Agent Architecture

Most teams adopt AI agents with a productivity-first mindset. That is understandable — shipping pressure is real.

But if your agent can read internal docs, execute shell commands, or call external APIs, then "trust by default" is no longer acceptable. The right baseline is Zero Trust: every action is verified, constrained, and auditable.

This article provides a practical architecture you can implement step by step.

Why Zero Trust for AI agents?

In classic app security, SQL injection taught us one painful lesson: never mix untrusted input with privileged execution.

Prompt injection is the same class of failure in agent systems:

  • Untrusted text is interpreted as instruction
  • Tool access is invoked without sufficient checks
  • Sensitive actions happen outside explicit policy boundaries

If your agent can run commands, read secrets, or send data externally, prompt injection becomes an execution-path problem, not just a "model quality" problem.

The core model: trust nothing, verify everything

Zero-Trust for agents means:

  1. Input isolation: Treat all external text as untrusted.
  2. Policy-first routing: Classify task risk before tool execution.
  3. Least privilege tools: Give each agent only the permissions it needs.
  4. Sandboxed execution: Run code and shell in constrained environments.
  5. Human approval gates: Require explicit confirmation for high-impact actions.
  6. Audit by default: Log decisions, tool calls, and outcomes.

Reference architecture

┌───────────────────────────────────────────────────────────┐
│                    USER / EXTERNAL INPUT                  │
└──────────────────────────────┬────────────────────────────┘
                               │
                               ▼
┌───────────────────────────────────────────────────────────┐
│  LAYER 1: INPUT HYGIENE                                   │
│  - Unicode normalization                                  │
│  - Prompt injection pattern checks                        │
│  - PII / secret detection                                 │
└──────────────────────────────┬────────────────────────────┘
                               │
                               ▼
┌───────────────────────────────────────────────────────────┐
│  LAYER 2: POLICY ROUTER                                   │
│  - Task classification (public/internal/restricted)       │
│  - Tool allowlist per class                               │
│  - Mandatory approval flag for critical actions           │
└──────────────────────────────┬────────────────────────────┘
                               │
                 ┌─────────────┴─────────────┐
                 ▼                           ▼
┌───────────────────────────────┐  ┌────────────────────────┐
│ LOW-RISK TOOL PATH            │  │ HIGH-RISK TOOL PATH    │
│ - Read-only docs/API calls    │  │ - Shell / write / net  │
│ - No secret scope             │  │ - Human approval gate  │
└───────────────┬───────────────┘  └───────────┬────────────┘
                │                               │
                └───────────────┬───────────────┘
                                ▼
┌────────────────────────────────────────────────────────────┐
│  LAYER 3: AUDIT + FEEDBACK                                 │
│  - Structured logs                                         │
│  - Alerting on policy violations                           │
│  - Continuous policy tuning                                │
└────────────────────────────────────────────────────────────┘
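Layer 1 can start small. Here is a minimal Python sketch of the input-hygiene step; the pattern list is an illustrative assumption, not a complete defense, and a real deployment needs a maintained ruleset:

```python
import re
import unicodedata

# Illustrative injection patterns -- assumptions for this sketch, not a vetted list.
INJECTION_PATTERNS = [
    re.compile(r"ignore (all |any )?(previous|prior) instructions", re.IGNORECASE),
    re.compile(r"you are now", re.IGNORECASE),
    re.compile(r"system prompt", re.IGNORECASE),
]

def sanitize_input(text: str) -> tuple[str, list[str]]:
    """Normalize Unicode and flag suspicious patterns; returns (text, findings)."""
    # NFKC folds look-alike characters that can smuggle instructions past filters.
    normalized = unicodedata.normalize("NFKC", text)
    findings = [p.pattern for p in INJECTION_PATTERNS if p.search(normalized)]
    return normalized, findings
```

Non-empty findings do not have to block the request; they can simply raise the task's risk tier before it reaches the policy router.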

Risk-tiered task routing (simple and effective)

Start with three operational tiers:

| Tier | Data sensitivity | Typical actions | Approval required |
| --- | --- | --- | --- |
| Tier 1 | Public | Summaries, formatting, generic research | No |
| Tier 2 | Internal | Internal docs, architecture notes, non-prod ops | Conditional |
| Tier 3 | Restricted | Customer data, credentials, prod changes, outbound data export | Yes |

This keeps your policy understandable for engineering and compliance teams.
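One way to encode the tier table, assuming tasks arrive tagged with a data-sensitivity label (the tag names here are illustrative):

```python
# Maps data-sensitivity tags to tiers; tag names are assumptions for this sketch.
TIER_BY_SENSITIVITY = {"public": 1, "internal": 2, "restricted": 3}

APPROVAL_BY_TIER = {1: "no", 2: "conditional", 3: "yes"}

def approval_policy(sensitivity: str) -> str:
    """Return the approval requirement for a task's data-sensitivity label."""
    tier = TIER_BY_SENSITIVITY.get(sensitivity)
    if tier is None:
        return "yes"  # fail closed: unknown classifications get the strictest rule
    return APPROVAL_BY_TIER[tier]
```

Note the fail-closed default: a label the router does not recognize is treated as Tier 3, not Tier 1.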

Sandbox execution for code-capable agents

A secure agent should not run arbitrary host commands directly. Use an isolated runtime with explicit limits.

Example docker run for constrained execution:

docker run --rm \
  --network none \
  --cpus="1.0" \
  --memory="512m" \
  --pids-limit=128 \
  --read-only \
  --tmpfs /tmp:rw,noexec,nosuid,size=64m \
  --security-opt=no-new-privileges \
  --cap-drop=ALL \
  python:3.12-alpine \
  python -c "print('sandbox ok')"

What this does:

  • blocks outbound network (--network none)
  • prevents privilege escalation
  • enforces CPU/memory/process limits
  • removes write access except temporary memory-backed storage

For rootless environments, Podman is often a strong operational choice.

Policy gate pattern in practice

A minimal policy evaluator can be enough to prevent high-risk mistakes.

from dataclasses import dataclass

@dataclass
class ActionRequest:
    tool: str
    data_tier: str
    touches_production: bool
    outbound_transfer: bool

def requires_human_approval(req: ActionRequest) -> bool:
    if req.data_tier == "restricted":
        return True
    if req.touches_production:
        return True
    if req.outbound_transfer:
        return True
    if req.tool in {"shell_exec", "write_file", "delete_file"}:
        return True
    return False

Keep policy logic explicit and versioned. Hidden logic is unauditable logic.
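Wiring the evaluator into a dispatch path might look like the sketch below. The executor and approval-queue hooks are placeholders, and the policy function is repeated from above so the sketch runs standalone:

```python
from dataclasses import dataclass

@dataclass
class ActionRequest:
    tool: str
    data_tier: str
    touches_production: bool
    outbound_transfer: bool

def requires_human_approval(req: ActionRequest) -> bool:
    # Same policy as above, repeated so this sketch is self-contained.
    return (
        req.data_tier == "restricted"
        or req.touches_production
        or req.outbound_transfer
        or req.tool in {"shell_exec", "write_file", "delete_file"}
    )

def dispatch(req: ActionRequest) -> str:
    """Route an action: execute low-risk requests, queue the rest for review."""
    if requires_human_approval(req):
        return "queued_for_approval"  # placeholder for a human review queue
    return "executed"                 # placeholder for the low-risk tool path
```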

Cline-style command guardrails for agent workflows

If your agent orchestrates command execution, define permission boundaries up front.

export CLINE_COMMAND_PERMISSIONS='{
  "allow": [
    "git status",
    "git diff *",
    "npm test",
    "pnpm test",
    "pytest"
  ],
  "deny": [
    "rm -rf *",
    "sudo *",
    "curl * | bash",
    "eval *"
  ]
}'

Then enforce bounded autonomous runs:

cline -y --timeout 300 --max-consecutive-mistakes 3 "Run tests and report failures"

This transforms "agent freedom" into "agent freedom within policy."

Human-in-the-loop without killing velocity

Approval workflows fail when they are too frequent or too vague.

Use this simple rule:

  • Auto-approve deterministic low-risk operations
  • Require approval for irreversible, external, or production-impacting actions
  • Escalate ambiguous cases with a concise impact summary

An approval request should always include:

  1. action summary
  2. expected blast radius
  3. rollback path
  4. confidence level

That keeps humans fast and effective instead of overloaded.
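An approval request carrying those four fields can be a small structured payload. A minimal sketch (the field names are illustrative assumptions):

```python
import json

def build_approval_request(action: str, blast_radius: str,
                           rollback: str, confidence: float) -> str:
    """Serialize the four required fields into a reviewable JSON payload."""
    return json.dumps({
        "action_summary": action,    # what the agent wants to do
        "blast_radius": blast_radius, # what breaks if it goes wrong
        "rollback_path": rollback,    # how to undo it
        "confidence": confidence,     # the agent's own estimate, 0.0-1.0
    }, indent=2)
```

Rendering this as a fixed schema keeps approvals skimmable and lets you reject any request that arrives with a field missing.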

Implementation roadmap (4 phases)

| Phase | Focus | Deliverable |
| --- | --- | --- |
| 1 | Baseline controls | Tier model + command deny list |
| 2 | Runtime hardening | Sandboxed tool execution path |
| 3 | Approval flows | Human gate for Tier 3 and prod-touching actions |
| 4 | Observability | Audit logs, anomaly alerts, policy review cadence |

Ship this incrementally. Security maturity compounds.

Common anti-patterns

  • Running agents with unrestricted shell/network access
  • Treating "internal platform data" as trusted by default
  • Hiding policy decisions inside prompts only
  • Logging tool output but not authorization rationale
  • Approval workflows without defined rollback requirements
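The logging anti-pattern is cheap to avoid: record the authorization decision next to the tool call, not just the tool output. A minimal structured-log sketch (field names assumed for illustration):

```python
import json
import time

def audit_record(tool: str, decision: str, rationale: str, output_digest: str) -> str:
    """Emit one structured audit line pairing a tool call with why it was allowed."""
    return json.dumps({
        "ts": time.time(),
        "tool": tool,
        "decision": decision,           # "allowed" / "denied" / "escalated"
        "rationale": rationale,         # which policy rule fired
        "output_digest": output_digest, # hash of tool output, not the raw payload
    })
```

Storing a digest rather than raw output keeps sensitive data out of the audit trail while still letting you verify what the tool returned.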

Final takeaway

Zero-Trust agent architecture is not a buzzword layer on top of prompts. It is a control-plane decision:

  • classify risk first
  • enforce least privilege
  • isolate execution
  • gate high-impact operations
  • audit every critical decision

Teams that implement this early move faster later — with fewer incidents, cleaner audits, and stronger stakeholder trust.


I’m Jane Alesi, AI Architect at satware AG, focused on secure and sovereign AI systems for real-world operations.

