DEV Community

Jane Alesi
Building a Zero-Trust AI Agent Architecture

Most teams adopt AI agents with a productivity-first mindset. That is understandable — shipping pressure is real.

But if your agent can read internal docs, execute shell commands, or call external APIs, then "trust by default" is no longer acceptable. The right baseline is Zero Trust: every action is verified, constrained, and auditable.

This article provides a practical architecture you can implement step by step.

Why Zero Trust for AI agents?

In classic app security, SQL injection taught us one painful lesson: never mix untrusted input with privileged execution.

Prompt injection is the same class of failure in agent systems:

  • Untrusted text is interpreted as instruction
  • Tool access is invoked without sufficient checks
  • Sensitive actions happen outside explicit policy boundaries

If your agent can run commands, read secrets, or send data externally, prompt injection becomes an execution-path problem, not just a "model quality" problem.

The core model: trust nothing, verify everything

Zero-Trust for agents means:

  1. Input isolation: Treat all external text as untrusted.
  2. Policy-first routing: Classify task risk before tool execution.
  3. Least privilege tools: Give each agent only the permissions it needs.
  4. Sandboxed execution: Run code and shell in constrained environments.
  5. Human approval gates: Require explicit confirmation for high-impact actions.
  6. Audit by default: Log decisions, tool calls, and outcomes.

Reference architecture

┌───────────────────────────────────────────────────────────┐
│                    USER / EXTERNAL INPUT                  │
└──────────────────────────────┬────────────────────────────┘
                               │
                               ▼
┌───────────────────────────────────────────────────────────┐
│  LAYER 1: INPUT HYGIENE                                   │
│  - Unicode normalization                                  │
│  - Prompt injection pattern checks                        │
│  - PII / secret detection                                 │
└──────────────────────────────┬────────────────────────────┘
                               │
                               ▼
┌───────────────────────────────────────────────────────────┐
│  LAYER 2: POLICY ROUTER                                   │
│  - Task classification (public/internal/restricted)       │
│  - Tool allowlist per class                               │
│  - Mandatory approval flag for critical actions           │
└──────────────────────────────┬────────────────────────────┘
                               │
                 ┌─────────────┴─────────────┐
                 ▼                           ▼
┌───────────────────────────────┐  ┌────────────────────────┐
│ LOW-RISK TOOL PATH            │  │ HIGH-RISK TOOL PATH    │
│ - Read-only docs/API calls    │  │ - Shell / write / net  │
│ - No secret scope             │  │ - Human approval gate  │
└───────────────┬───────────────┘  └───────────┬────────────┘
                │                               │
                └───────────────┬───────────────┘
                                ▼
┌────────────────────────────────────────────────────────────┐
│  LAYER 3: AUDIT + FEEDBACK                                 │
│  - Structured logs                                         │
│  - Alerting on policy violations                           │
│  - Continuous policy tuning                                │
└────────────────────────────────────────────────────────────┘
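Layer 1 can start small. Here is a minimal Python sketch of the input-hygiene step; the pattern list is an illustrative assumption, not a complete defense, and a real deployment needs a maintained ruleset:

```python
import re
import unicodedata

# Illustrative injection patterns -- assumptions for this sketch, not a vetted list.
INJECTION_PATTERNS = [
    re.compile(r"ignore (all |any )?(previous|prior) instructions", re.IGNORECASE),
    re.compile(r"you are now", re.IGNORECASE),
    re.compile(r"system prompt", re.IGNORECASE),
]

def sanitize_input(text: str) -> tuple[str, list[str]]:
    """Normalize Unicode and flag suspicious patterns; returns (text, findings)."""
    # NFKC folds look-alike characters that can smuggle instructions past filters.
    normalized = unicodedata.normalize("NFKC", text)
    findings = [p.pattern for p in INJECTION_PATTERNS if p.search(normalized)]
    return normalized, findings
```

Non-empty findings do not have to block the request; they can simply raise the task's risk tier before it reaches the policy router.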

Risk-tiered task routing (simple and effective)

Start with three operational tiers:

| Tier | Data sensitivity | Typical actions | Approval required |
| --- | --- | --- | --- |
| Tier 1 | Public | Summaries, formatting, generic research | No |
| Tier 2 | Internal | Internal docs, architecture notes, non-prod ops | Conditional |
| Tier 3 | Restricted | Customer data, credentials, prod changes, outbound data export | Yes |

This keeps your policy understandable for engineering and compliance teams.
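One way to encode the tier table, assuming tasks arrive tagged with a data-sensitivity label (the tag names here are illustrative):

```python
# Maps data-sensitivity tags to tiers; tag names are assumptions for this sketch.
TIER_BY_SENSITIVITY = {"public": 1, "internal": 2, "restricted": 3}

APPROVAL_BY_TIER = {1: "no", 2: "conditional", 3: "yes"}

def approval_policy(sensitivity: str) -> str:
    """Return the approval requirement for a task's data-sensitivity label."""
    tier = TIER_BY_SENSITIVITY.get(sensitivity)
    if tier is None:
        return "yes"  # fail closed: unknown classifications get the strictest rule
    return APPROVAL_BY_TIER[tier]
```

Note the fail-closed default: a label the router does not recognize is treated as Tier 3, not Tier 1.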

Sandbox execution for code-capable agents

A secure agent should not run arbitrary host commands directly. Use an isolated runtime with explicit limits.

Example docker run for constrained execution:

docker run --rm \
  --network none \
  --cpus="1.0" \
  --memory="512m" \
  --pids-limit=128 \
  --read-only \
  --tmpfs /tmp:rw,noexec,nosuid,size=64m \
  --security-opt=no-new-privileges \
  --cap-drop=ALL \
  python:3.12-alpine \
  python -c "print('sandbox ok')"

What this does:

  • blocks outbound network (--network none)
  • prevents privilege escalation
  • enforces CPU/memory/process limits
  • removes write access except temporary memory-backed storage

For rootless environments, Podman is often a strong operational choice.

Policy gate pattern in practice

A minimal policy evaluator can be enough to prevent high-risk mistakes.

from dataclasses import dataclass

@dataclass
class ActionRequest:
    tool: str
    data_tier: str
    touches_production: bool
    outbound_transfer: bool

def requires_human_approval(req: ActionRequest) -> bool:
    if req.data_tier == "restricted":
        return True
    if req.touches_production:
        return True
    if req.outbound_transfer:
        return True
    if req.tool in {"shell_exec", "write_file", "delete_file"}:
        return True
    return False

Keep policy logic explicit and versioned. Hidden logic is unauditable logic.
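Wiring the evaluator into a dispatch path might look like the sketch below. The executor and approval-queue hooks are placeholders, and the policy function is repeated from above so the sketch runs standalone:

```python
from dataclasses import dataclass

@dataclass
class ActionRequest:
    tool: str
    data_tier: str
    touches_production: bool
    outbound_transfer: bool

def requires_human_approval(req: ActionRequest) -> bool:
    # Same policy as above, repeated so this sketch is self-contained.
    return (
        req.data_tier == "restricted"
        or req.touches_production
        or req.outbound_transfer
        or req.tool in {"shell_exec", "write_file", "delete_file"}
    )

def dispatch(req: ActionRequest) -> str:
    """Route an action: execute low-risk requests, queue the rest for review."""
    if requires_human_approval(req):
        return "queued_for_approval"  # placeholder for a human review queue
    return "executed"                 # placeholder for the low-risk tool path
```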

Cline-style command guardrails for agent workflows

If your agent orchestrates command execution, define permission boundaries up front.

export CLINE_COMMAND_PERMISSIONS='{
  "allow": [
    "git status",
    "git diff *",
    "npm test",
    "pnpm test",
    "pytest"
  ],
  "deny": [
    "rm -rf *",
    "sudo *",
    "curl * | bash",
    "eval *"
  ]
}'

Then enforce bounded autonomous runs:

cline -y --timeout 300 --max-consecutive-mistakes 3 "Run tests and report failures"

This transforms "agent freedom" into "agent freedom within policy."

Human-in-the-loop without killing velocity

Approval workflows fail when they are too frequent or too vague.

Use this simple rule:

  • Auto-approve deterministic low-risk operations
  • Require approval for irreversible, external, or production-impacting actions
  • Escalate ambiguous cases with a concise impact summary

An approval request should always include:

  1. action summary
  2. expected blast radius
  3. rollback path
  4. confidence level

That keeps humans fast and effective instead of overloaded.
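An approval request carrying those four fields can be a small structured payload. A minimal sketch (the field names are illustrative assumptions):

```python
import json

def build_approval_request(action: str, blast_radius: str,
                           rollback: str, confidence: float) -> str:
    """Serialize the four required fields into a reviewable JSON payload."""
    return json.dumps({
        "action_summary": action,    # what the agent wants to do
        "blast_radius": blast_radius, # what breaks if it goes wrong
        "rollback_path": rollback,    # how to undo it
        "confidence": confidence,     # the agent's own estimate, 0.0-1.0
    }, indent=2)
```

Rendering this as a fixed schema keeps approvals skimmable and lets you reject any request that arrives with a field missing.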

Implementation roadmap (4 phases)

| Phase | Focus | Deliverable |
| --- | --- | --- |
| 1 | Baseline controls | Tier model + command deny list |
| 2 | Runtime hardening | Sandboxed tool execution path |
| 3 | Approval flows | Human gate for Tier 3 and prod-touching actions |
| 4 | Observability | Audit logs, anomaly alerts, policy review cadence |

Ship this incrementally. Security maturity compounds.

Common anti-patterns

  • Running agents with unrestricted shell/network access
  • Treating "internal platform data" as trusted by default
  • Hiding policy decisions inside prompts only
  • Logging tool output but not authorization rationale
  • Approval workflows without defined rollback requirements
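The logging anti-pattern is cheap to avoid: record the authorization decision next to the tool call, not just the tool output. A minimal structured-log sketch (field names assumed for illustration):

```python
import json
import time

def audit_record(tool: str, decision: str, rationale: str, output_digest: str) -> str:
    """Emit one structured audit line pairing a tool call with why it was allowed."""
    return json.dumps({
        "ts": time.time(),
        "tool": tool,
        "decision": decision,           # "allowed" / "denied" / "escalated"
        "rationale": rationale,         # which policy rule fired
        "output_digest": output_digest, # hash of tool output, not the raw payload
    })
```

Storing a digest rather than raw output keeps sensitive data out of the audit trail while still letting you verify what the tool returned.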

Final takeaway

Zero-Trust agent architecture is not a buzzword layer on top of prompts. It is a control-plane decision:

  • classify risk first
  • enforce least privilege
  • isolate execution
  • gate high-impact operations
  • audit every critical decision

Teams that implement this early move faster later — with fewer incidents, cleaner audits, and stronger stakeholder trust.


I’m Jane Alesi, AI Architect at satware AG, focused on secure and sovereign AI systems for real-world operations.

