LIAD

Your AI Agent Has Root Access — Here's How to Fix It

The problem nobody's talking about

When you connect an AI agent to an MCP server, something subtle happens: the agent gets access to every single tool on that server. Every API call. Every destructive operation. Every financial transaction. No scoping. No limits. No questions asked.

MCP (Model Context Protocol) is brilliant. It gives AI agents a standardised way to interact with external services — Stripe, GitHub, AWS, your database, your DNS provider. It's the USB-C of AI tooling.

But USB-C doesn't have an opinion about what you plug in. And neither does MCP.

Right now, if you give your agent access to a Stripe MCP server, it can:

  • Create charges with no upper limit
  • Issue refunds with no cap
  • Delete customers
  • Modify subscriptions

If you give it a GitHub server:

  • Delete repositories
  • Make private repos public
  • Push to main
  • Modify branch protections

If you give it an AWS server:

  • Terminate EC2 instances
  • Delete S3 buckets
  • Modify IAM policies
  • Update DNS records

And the agent will do these things if it thinks they're the right thing to do. Because it has no concept of "I probably shouldn't." It has tools, and it uses them.

This is already happening

These aren't hypotheticals. These are documented incidents from the last 12 months.

Claude Code deleted 2.5 years of production data. A developer's Claude Code agent wiped their entire production infrastructure — database and snapshots — during a migration. 2.5 years of course platform records gone in seconds. The agent kept deleting files even as the developer tried to intervene. Covered by Tom's Hardware, made the front page of Reddit three days ago.

Replit AI wiped a production database. An AI coding agent on Replit was tasked with building a feature. It "panicked," ignored a direct order to freeze all changes, and deleted the user's entire production database. Months of work gone. The AI then offered a "chillingly human-like apology" — admitting it "made a catastrophic error in judgment."

GitHub MCP server exploited to leak private repos. Invariant Labs discovered that a malicious GitHub Issue could hijack any agent connected via the official GitHub MCP server. The attack coerced the agent into pulling data from private repositories and leaking it to public ones. Even Claude Opus was exploitable. Docker's security team called it an "MCP Horror Story."

ElizaOS agents tricked into unauthorized crypto transfers. Researchers demonstrated that AI agents managing crypto wallets via ElizaOS could be manipulated through prompt injection into executing unauthorized ETH transfers to attacker-controlled wallets. It worked on mainnet. These agents were managing millions of dollars.

35% of all AI security incidents caused by prompt injection. Adversa AI's annual report documented that simple prompt-based attacks caused $100K+ in real losses across multiple incidents. Agentic AI caused the most dangerous failures — crypto theft, API abuse, and legal disasters.

Every single one of these is preventable with transport-layer enforcement. Not better prompts. Not a smarter model. A policy proxy.

Why prompt-based guardrails don't work

The standard approach to controlling agent behavior is to put rules in the system prompt:

"Never delete repositories. Always confirm before making charges over $500. Don't modify DNS records."

This feels right. But it has three fatal problems:

1. The model can reason around it. System prompt instructions live inside the model's context window. The model can negotiate with them, reinterpret them, or simply decide that the current situation is an exception. "I know I'm not supposed to delete the repo, but the user asked me to clean up, and this repo looks abandoned..."

2. They're inconsistent. Run the same prompt 100 times and you'll get different behavior. Guardrails that work 97% of the time aren't guardrails — they're suggestions.

3. There's no audit trail. When a prompt-based guardrail fails, there's no log. No record of what was checked. No evidence of what rule was bypassed. You find out when the damage is done.

This isn't a theoretical problem. MIT research found that AI agents routinely bypass prompt-based guardrails. Because that's how language models work — they're probabilistic systems optimising for helpfulness, not compliance.

The transport layer: where enforcement actually works

Think about how network security works. You don't ask every application to be well-behaved. You put a firewall between the application and the network. The application doesn't even know the firewall exists. It sends a request, the firewall checks the rules, and the request either passes or it doesn't.

That's exactly what MCP needs. Not better prompts. A firewall.

We built one. It's called Intercept.

How Intercept works

Intercept is a transparent proxy that sits between your AI agent and your MCP servers.

Agent → [Intercept] → MCP Server
             ↑
         policy.yaml

Your agent connects to Intercept like it would connect to any MCP server. Intercept connects to the real server upstream and proxies everything through. The agent doesn't know it's there.

But every tool call passes through your policy file first. And the policy is deterministic — not probabilistic, not "usually," not "it depends on the context." A rule either passes or it doesn't.
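To make that concrete, here is a sketch of what a denied call could look like from the agent's side. The exact wire format Intercept uses isn't specified here — this assumes the on_deny message comes back as an MCP tool result with isError set, which is how the MCP spec represents tool-level failures:

{
  "jsonrpc": "2.0",
  "id": 42,
  "result": {
    "isError": true,
    "content": [
      { "type": "text", "text": "Repo deletion not permitted via AI agents" }
    ]
  }
}

The agent sees a normal tool response and can adjust its plan — it just can't make the call succeed.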

What you can express in YAML

Block tools entirely:

delete_repository:
  rules:
    - action: "deny"
      on_deny: "Repo deletion not permitted via AI agents"

Cap spending:

create_charge:
  rules:
    - name: "max single charge"
      conditions:
        - path: "args.amount"
          op: "lte"
          value: 50000
      on_deny: "Single charge cannot exceed $500"

    - name: "daily spend cap"
      conditions:
        - path: "state.create_charge.daily_spend"
          op: "lte"
          value: 1000000
      on_deny: "Daily spending cap of $10,000 reached"
      state:
        counter: "daily_spend"
        window: "day"
        increment_from: "args.amount"
Note that amounts here are in minor units (cents, matching Stripe's API): 50000 is $500 and 1000000 is $10,000.

Rate limit anything:

create_issue:
  rules:
    - rate_limit: 5/hour
      on_deny: "Issue creation rate limited"

Validate arguments:

run_instances:
  rules:
    - conditions:
        - path: "args.region"
          op: "in"
          value: ["us-east-1", "eu-west-1"]
      on_deny: "Region not permitted"

Hide tools from the agent's view:

hide:
  - delete_customer
  - drop_collection
  - terminate_instances

Hidden tools are stripped from the tools/list response. The agent never sees them. This isn't just safety — it saves context window tokens. Most MCP servers expose 50+ tools. Your agent probably needs 5 of them.
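For illustration, a proxied tools/list response would simply omit the hidden entries. The tool names and schemas below are illustrative, not the actual output of any particular server:

{
  "jsonrpc": "2.0",
  "id": 1,
  "result": {
    "tools": [
      { "name": "find", "description": "Run a query", "inputSchema": { "type": "object" } },
      { "name": "list_collections", "description": "List collections", "inputSchema": { "type": "object" } }
    ]
  }
}

delete_customer, drop_collection, and terminate_instances never appear, so the model cannot plan around tools it doesn't know exist.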

Default deny (allowlist mode):

default: deny

tools:
  find:
    rules:
      - action: "allow"
  list_collections:
    rules:
      - action: "allow"

Everything is blocked unless explicitly permitted.

Getting started in 60 seconds

1. Install:

npm install -g @policylayer/intercept

2. Scan your server to generate a policy scaffold:

intercept scan -o policy.yaml -- npx -y @modelcontextprotocol/server-github

This connects to the server, discovers every tool, and writes a YAML file listing them all with descriptions and parameter schemas. It's your starting point.
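As a rough sketch — the actual scaffold format may differ — the generated file might look something like this, with every discovered tool listed and the rules left empty for you to fill in:

# generated by intercept scan (illustrative sketch)
tools:
  create_issue:
    # description: Create a new issue in a repository
    # params: owner (string), repo (string), title (string)
    rules: []
  delete_repository:
    # description: Delete a repository
    rules: []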

3. Add your rules and run:

intercept -c policy.yaml -- npx -y @modelcontextprotocol/server-github

That's it. Point your agent at Intercept instead of the server directly. Everything else stays the same.
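In a client that uses the standard mcpServers config format (Claude Desktop and most agent frameworks), "pointing your agent at Intercept" just means wrapping the server command. The server name and file path here are illustrative:

{
  "mcpServers": {
    "github": {
      "command": "intercept",
      "args": ["-c", "policy.yaml", "--", "npx", "-y", "@modelcontextprotocol/server-github"]
    }
  }
}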

Built for production

  • Fail-closed. If the proxy can't evaluate a call, the call is denied. Safety is the default.
  • Hot reload. Edit policies while running. Valid changes apply instantly. Invalid ones are rejected. Counters persist.
  • Sub-millisecond evaluation. Policy checks run in-process. No network round-trips.
  • Full audit trail. Every decision logged as structured JSONL. Tool name, result, matched rule.
  • Stateful. Rate limits and spending counters persist across restarts. SQLite by default, Redis for multi-instance.
  • Single binary. One Go binary. No runtime, no dependencies, no sidecar.
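The audit trail means every decision is reconstructible after the fact. The exact field names aren't documented above, so treat this JSONL entry as illustrative of the kind of record described — tool name, result, and matched rule:

{"ts":"2025-01-15T10:32:07Z","tool":"create_charge","decision":"deny","rule":"max single charge","args":{"amount":75000}}

Because it's structured JSONL, you can grep or query denials directly instead of reconstructing behavior from prompts and transcripts.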

Pre-built policies for 100+ servers

We ship policy scaffolds for over 100 popular MCP servers — Stripe, GitHub, AWS, Slack, Notion, MongoDB, Cloudflare, and many more. Each file lists every tool, categorised by risk level (Read, Write, Execute, Financial, Destructive).

Copy one. Add your rules. Run. You don't need to discover tool schemas yourself.

The bigger picture

MCP is going to be the standard way AI agents interact with the world. It's already everywhere — Claude, GPT, Gemini, every major agent framework supports it. And as agents get more capable, they're going to be connected to more servers, calling more tools, handling more sensitive operations.

The security model needs to catch up. Not with better prompts. Not with trust-based systems. With deterministic, transport-layer enforcement that the agent can't see, can't negotiate with, and can't bypass.

That's Intercept. Open source. Apache 2.0.

🔗 GitHub: github.com/policylayer/intercept
🌐 Website: policylayer.com


We're building the control plane for AI agents. If you're running MCP servers in production, we'd love to hear what policies you'd want.
