DEV Community

Valerii Vainkop


The AI Agent Gateway Pattern: How to Give Agents Infrastructure Access Without Losing Control


There's a pattern I've seen in almost every team that starts running AI agents against real infrastructure.

The agent works well in the demo. It calls the right APIs, does the right thing, and everyone is impressed. So the team gives it more access — a Kubernetes API here, a cloud provider credential there. It's fast to set up. It works.

And then, somewhere between month one and month three, something goes wrong. An agent loops. A tool call hits the wrong environment. A permission that was supposed to be narrow turns out to be wide. Nobody can tell exactly what the agent did because there's no trace of it.

This is not a problem with AI agents specifically. It's the same problem we solved with service meshes — and then forgot we'd solved it.

The Parallel That Should Make You Nervous

Think back to how microservice architectures evolved before service meshes existed.

Services called each other directly. No policy enforcement at the network layer. No distributed tracing. No mutual TLS between services. Each service team was responsible for their own security and observability, which in practice meant it was inconsistent, incomplete, or absent.

The failures were predictable: cascading retries, credential exposure, services with much wider blast radii than intended, debugging sessions that took hours because nobody had a complete picture of what called what.

Service meshes — Istio, Linkerd, Cilium — addressed this by treating inter-service communication as an infrastructure concern, not an application one. Policy enforcement, traffic observability, and mTLS moved into the data plane. Application developers stopped worrying about it. Operations teams got a consistent control surface.

AI agents are currently at the "services calling each other directly" stage.

Most agent-to-infrastructure connections I've seen have no policy layer, minimal observability, and no isolation model. The agent has credentials. It uses them. That's the entire security model.

What the Gateway Pattern Actually Is

InfoQ published a detailed architecture piece this week covering an emerging pattern: the AI Agent Gateway. The core idea is straightforward — treat every AI agent tool call as an API call that must pass through a control plane before it reaches the target infrastructure.

The control plane does three things:

1. Policy authorization via Open Policy Agent (OPA)

Before any tool call executes, OPA evaluates it against a policy set. The agent declares its intent — what resource, what action, what context — and the policy either permits or denies it.

OPA is the right choice here because its policy language (Rego) can express nuanced conditions: "this agent can read pod logs in namespace staging but not production", "this agent can scale a deployment but only within this replica range", "this agent can call this API only during business hours."

The key property is that policy lives outside the agent code. You can tighten or loosen it without touching the agent, test it independently, and audit it separately from the rest of your infrastructure.

Here's a minimal OPA policy for an infrastructure agent:

package agent.authz

import future.keywords.if
import future.keywords.in

default allow := false

# Final decision: the call must match a permit rule and no deny rule.
# (A bare `deny` rule does not override `allow` on its own in Rego —
# the decision rule has to reference it explicitly.)
allow if {
    permitted
    not deny
}

# Allow read-only operations in the staging namespace
permitted if {
    input.action in {"get", "list", "watch"}
    input.namespace == "staging"
    input.agent_id in data.authorized_agents
}

# Allow scale operations, but cap max replicas and keep production off-limits
permitted if {
    input.action == "scale"
    input.resource == "deployment"
    input.desired_replicas <= 10
    input.namespace != "production"
    input.agent_id in data.authorized_agents
}

# Deny anything touching secrets, regardless of what other rules say
deny if {
    input.resource == "secret"
}
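The "test it independently" point is concrete: OPA ships a test runner, so the policy above can be covered like any other code. A sketch (the file name agent_authz_test.rego is illustrative; put it next to the policy and run `opa test .`):

```rego
package agent.authz

import future.keywords.if

# Reads in staging by an authorized agent should pass
test_read_in_staging_allowed if {
    allow with input as {"action": "get", "namespace": "staging", "agent_id": "deploy-bot"}
          with data.authorized_agents as {"deploy-bot"}
}

# Scaling anything in production should be refused
test_scale_in_production_denied if {
    not allow with input as {"action": "scale", "resource": "deployment", "desired_replicas": 3, "namespace": "production", "agent_id": "deploy-bot"}
              with data.authorized_agents as {"deploy-bot"}
}
```

Because the policy is plain data plus rules, these tests run in CI with no cluster and no agent involved.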

The input object is constructed by the gateway from the agent's tool call. The agent never touches OPA directly — it just makes a request to the gateway and gets back a decision.
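As a sketch of the gateway side, assuming OPA runs as a local sidecar exposing its standard REST API on port 8181, and reusing the field names from the policy above (`build_opa_input` and `authorize` are illustrative helpers, not part of any framework):

```python
import json
import urllib.request

# Assumed sidecar address; /v1/data/<package path>/<rule> is OPA's data API
OPA_URL = "http://localhost:8181/v1/data/agent/authz/allow"

def build_opa_input(tool_call: dict, agent_id: str) -> dict:
    """Map an agent's raw tool call onto the input document the policy expects."""
    return {
        "input": {
            "agent_id": agent_id,
            "action": tool_call.get("action"),
            "resource": tool_call.get("resource"),
            "namespace": tool_call.get("namespace"),
            "desired_replicas": tool_call.get("desired_replicas"),
        }
    }

def authorize(tool_call: dict, agent_id: str, opa_url: str = OPA_URL) -> bool:
    """Ask OPA for a decision. OPA returns {} when the rule is undefined,
    so anything other than an explicit true is treated as a deny."""
    req = urllib.request.Request(
        opa_url,
        data=json.dumps(build_opa_input(tool_call, agent_id)).encode(),
        headers={"Content-Type": "application/json"},
    )
    with urllib.request.urlopen(req) as resp:
        return json.load(resp).get("result") is True
```

Note the fail-closed default in `authorize`: a missing rule, a typo in the package path, or an OPA outage all read as a deny, never as an allow.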

2. Full observability via OpenTelemetry

Every tool call that passes through the gateway gets a trace span. Not just "did it succeed" — full structured data:

  • What the agent requested
  • What OPA decided
  • What the target infrastructure returned
  • How long each step took
  • Which parent span (agent session, task ID) it belongs to

People tend to underestimate how much this matters until the day they need it. When something goes wrong — and it will — "the agent did something" is not enough information. You need to know exactly what it did, when, with what parameters, and what came back.

OTel collector configuration for the gateway:

receivers:
  otlp:
    protocols:
      grpc:
        endpoint: 0.0.0.0:4317
      http:
        endpoint: 0.0.0.0:4318

processors:
  batch:
    timeout: 5s
  resource:
    attributes:
      - key: service.name
        value: agent-gateway
        action: upsert
      - key: deployment.environment
        value: production
        action: upsert

exporters:
  otlp/tempo:
    endpoint: tempo:4317
    tls:
      insecure: false
  prometheus:
    endpoint: 0.0.0.0:8889

service:
  pipelines:
    traces:
      receivers: [otlp]
      processors: [batch, resource]
      exporters: [otlp/tempo]
    metrics:
      receivers: [otlp]
      processors: [batch]
      exporters: [prometheus]

Add to this a Prometheus counter for agent_tool_calls_total labeled by agent_id, action, resource, result (allowed/denied/error) and you have the basis for both alerting and audit.
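A sketch of that counter with the prometheus_client Python library (metric and label names follow the description above; `record_tool_call` is an illustrative helper the gateway would invoke after each decision):

```python
from prometheus_client import Counter, generate_latest

# One counter, labeled as described above; `result` is one of
# allowed / denied / error so you can alert on denial and error rates.
TOOL_CALLS = Counter(
    "agent_tool_calls",  # exposed as agent_tool_calls_total
    "Tool calls processed by the agent gateway",
    ["agent_id", "action", "resource", "result"],
)

def record_tool_call(agent_id: str, action: str, resource: str, result: str) -> None:
    TOOL_CALLS.labels(
        agent_id=agent_id, action=action, resource=resource, result=result
    ).inc()
```

A useful starter alert: rate of `result="denied"` suddenly rising usually means either a policy got tightened or an agent started trying things it shouldn't.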

3. Ephemeral execution environments

The tool call doesn't execute in the gateway process. It executes in a short-lived isolated container that spins up, runs the specific operation, and terminates. The container gets only the credentials it needs for that specific call — nothing broader, nothing persistent.

This is the blast radius control. If the tool call goes wrong — infinite loop, unexpected API behavior, compromised logic — the damage is bounded to what that ephemeral container can reach during that single execution window.

In Kubernetes terms:

apiVersion: batch/v1
kind: Job
metadata:
  name: agent-tool-call-{{ .CallID }}
  namespace: agent-execution
spec:
  ttlSecondsAfterFinished: 30
  template:
    spec:
      serviceAccountName: agent-tool-executor
      automountServiceAccountToken: true
      restartPolicy: Never
      containers:
      - name: executor
        image: your-registry/agent-tool-executor:v1.2.0
        env:
        - name: TOOL_NAME
          value: {{ .ToolName }}
        - name: TOOL_PARAMS
          value: {{ .ParamsJSON }}
        - name: OTEL_EXPORTER_OTLP_ENDPOINT
          value: http://otel-collector:4317
        - name: CALL_ID
          value: {{ .CallID }}
        resources:
          limits:
            cpu: "500m"
            memory: "256Mi"
        securityContext:
          runAsNonRoot: true
          readOnlyRootFilesystem: true
          allowPrivilegeEscalation: false

The agent-tool-executor service account has only the RBAC permissions required for the specific tool it executes. Nothing more. Workload identity or External Secrets handles credential injection at runtime — no static credentials in the container spec.
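A sketch of what "only the RBAC permissions required" can look like for the read path (names match the Job spec above; the split between the agent-execution namespace and the target staging namespace is an assumption of this setup):

```yaml
apiVersion: rbac.authorization.k8s.io/v1
kind: Role
metadata:
  name: agent-tool-executor-read
  namespace: staging
rules:
- apiGroups: ["apps"]
  resources: ["deployments"]
  verbs: ["get", "list", "watch"]
- apiGroups: [""]
  resources: ["pods", "pods/log"]
  verbs: ["get", "list"]
---
apiVersion: rbac.authorization.k8s.io/v1
kind: RoleBinding
metadata:
  name: agent-tool-executor-read
  namespace: staging
roleRef:
  apiGroup: rbac.authorization.k8s.io
  kind: Role
  name: agent-tool-executor-read
subjects:
- kind: ServiceAccount
  name: agent-tool-executor
  namespace: agent-execution
```

A Role rather than a ClusterRole keeps the grant namespace-scoped; write-capable tools would get their own, separately bound service account.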

The Architecture As a Whole

Here's how the pieces connect:

graph LR
    A[AI Agent] -->|tool call request| B[Agent Gateway]
    B -->|policy check| C[OPA]
    C -->|allow/deny| B
    B -->|spawn| D[Ephemeral Container]
    B -->|emit span| E[OTel Collector]
    D -->|execute| F[Infrastructure API]
    F -->|result| D
    D -->|result + trace| B
    B -->|response| A
    E -->|traces| G[Tempo]
    E -->|metrics| H[Prometheus]

    style B fill:#1a365d,color:#63b3ed
    style C fill:#1c4532,color:#9ae6b4
    style D fill:#2d1b69,color:#b794f4
    style E fill:#1a365d,color:#63b3ed

The agent sees none of this. It makes a tool call and gets back a result (or an authorization error). Everything between the agent and the infrastructure API is controlled at the infrastructure layer.

What This Looks Like in Practice

An agent trying to check the status of a deployment in production:

  1. Agent sends: {"tool": "k8s_get_deployment", "namespace": "production", "name": "api-server"}
  2. Gateway receives the request, constructs the OPA input object
  3. OPA evaluates the input (agent identity, action "get", namespace "production") against the policy set
  4. If the policy allows reads in production for this agent: the gateway spawns an ephemeral container with a minimal service account
  5. Container calls the Kubernetes API, retrieves the deployment status
  6. Result returned to gateway, emitted as a trace span, forwarded to agent
  7. Container terminates within 30 seconds of completion

Total execution: 2-4 seconds including container startup. For automation workflows, that's acceptable. For interactive debugging, it's slightly awkward — worth noting as the main UX tradeoff.

When This Is Overkill (And When It Isn't)

I want to be honest: this pattern has overhead. Container startup time, OPA latency (typically 1-5ms for simple policies), OTel export — none of it is free.

For a personal automation script or a development environment, direct API access with a narrow service account is fine. The gateway pattern is not the answer to every agent use case.

But if you're running agents in production against shared infrastructure, giving agents access to multiple environments, or letting anyone other than yourself rely on the agent, the overhead is justified. The alternative is discovering the blast radius of a misbehaving agent in the worst possible way.

The production threshold I use: if the agent can affect something that takes more than 30 minutes to recover from, it needs a control plane between it and that resource.

Tools Worth Knowing

katanemo/plano — an AI-native proxy built in Rust specifically for agentic apps. Offloads routing, auth, and observability from agent code. 5,600+ stars. Worth watching if you don't want to build the gateway yourself.

Open Policy Agent — battle-tested, widely deployed, good Kubernetes integration. If you're already running OPA for cluster admission control, extending it to agent authorization is a natural step.

OpenTelemetry Collector — if you have OTel in your stack already, the agent gateway just becomes another telemetry source. No new infrastructure required.

Ephemeral containers in Kubernetes — GA since 1.25, though for this pattern you're more likely to use short-TTL Jobs than ephemeral debug containers. The Job approach is simpler to reason about and easier to scope with RBAC.

The Mental Shift That Actually Matters

The teams that struggle most with this pattern are the ones that treat it as an application engineering problem. "We'll add some checks in the agent code." "We'll be careful with what credentials we give it."

That's the wrong frame.

Agent-to-infrastructure communication is an infrastructure concern. The same way you don't secure service-to-service communication with "be careful in the application code" — you secure it at the network layer, with consistent policy, with enforced observability.

Agents that touch real infrastructure need a data plane. That data plane needs to be operated, not just written.

The good news is that the building blocks — OPA, OTel, containers — are already in most production stacks. The work is integration and adoption, not net-new tooling.

What's your current blast radius model for agents that have infrastructure access? I'm curious how teams are handling this in practice. Reach out if you're working through this.

