Every MCP gateway guide stops at routing and auth. Here's what comes after — and why it determines whether your agents stay under budget or burn through it.
What Is an MCP Gateway?
The Model Context Protocol (MCP) changed how AI agents interact with tools. Instead of every agent team building custom integrations for Slack, GitHub, databases, and APIs, MCP provides a standard interface: agents speak MCP, tools expose MCP servers, and everyone connects.
Then reality set in. One agent connecting to one MCP server is a demo. Fifty agents connecting to twenty MCP servers across five teams is production. And production needs a gateway.
An MCP gateway sits between AI agents and MCP servers. Instead of each agent maintaining direct connections to every tool server, agents connect to the gateway, and the gateway manages upstream connections.
# Without a gateway:
Agent A → MCP Server (GitHub)
Agent A → MCP Server (Slack)
Agent A → MCP Server (Database)
Agent B → MCP Server (GitHub)
Agent B → MCP Server (Slack)
Agent B → MCP Server (Database)
# 6 connections, each configured separately
# With a gateway:
Agent A → MCP Gateway
Agent B → MCP Gateway
MCP Gateway → MCP Server (GitHub)
MCP Gateway → MCP Server (Slack)
MCP Gateway → MCP Server (Database)
# 2 agent connections; the gateway manages the 3 upstream connections
This centralization solves three immediate problems:
- Configuration sprawl. Without a gateway, each agent needs credentials and connection details for every tool. With a gateway, agents authenticate once.
- Auth translation. MCP servers often need specific credentials (OAuth tokens, API keys, service accounts). The gateway handles credential management so agents don't carry sensitive tokens.
- Tool discovery. The gateway aggregates tool definitions from all upstream servers, presenting agents with a unified catalog of available capabilities.
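The configuration-sprawl point comes down to simple counting. The sketch below assumes the gateway holds one connection per upstream server (a real gateway might pool or multiplex); the function names are illustrative and not part of any MCP library.

```python
def direct_connections(agents: int, servers: int) -> int:
    """Without a gateway, every agent configures a connection to every server."""
    return agents * servers


def gateway_connections(agents: int, servers: int) -> int:
    """With a gateway, each agent connects once and the gateway connects
    once to each upstream server (assumption: no pooling)."""
    return agents + servers


if __name__ == "__main__":
    # The two-agent, three-server diagram, then the fifty-agent production case.
    for agents, servers in [(2, 3), (50, 20)]:
        print(agents, servers,
              direct_connections(agents, servers),
              gateway_connections(agents, servers))
```

For the fifty agents and twenty servers mentioned earlier, that is 1,000 separately configured connections without a gateway versus 70 with one.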
MCP Gateway Architecture: The Standard Stack
Most MCP gateway implementations share a common architecture with four layers:
Layer 1: Transport
MCP supports multiple transports: stdio (local processes), Streamable HTTP, and the older HTTP+SSE transport (Server-Sent Events), which the specification has deprecated in favor of Streamable HTTP. A gateway typically accepts client connections over Streamable HTTP (or SSE for older clients) and connects to upstream servers using whatever transport they support.
Layer 2: Authentication & Authorization
The gateway becomes your authentication boundary. Agents authenticate to the gateway; the gateway authenticates to upstream servers. Standard auth answers one question: is this agent allowed to connect? Binary. Yes or no.
Layer 3: Tool Aggregation & Filtering
When an agent connects, it calls tools/list to discover available tools. The gateway aggregates tool definitions from all upstream servers, optionally filtering based on the agent's role or permissions.
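A minimal sketch of aggregation and role-based filtering, assuming the gateway namespaces each tool as `server.tool` to avoid collisions. The `Tool` class, the function names, and the role-to-tools mapping are hypothetical and do not describe any particular gateway's API.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Tool:
    server: str  # upstream MCP server that exposes the tool
    name: str    # tool name as reported by that server

    @property
    def qualified_name(self) -> str:
        # Prefix with the server name so tools from different servers cannot collide.
        return f"{self.server}.{self.name}"


def aggregate_tools(upstreams: dict[str, list[str]]) -> list[Tool]:
    """Merge every upstream server's tool names into a single catalog."""
    return [Tool(server, name)
            for server, names in upstreams.items()
            for name in names]


def tools_for_role(catalog: list[Tool], role: str,
                   allowed: dict[str, set[str]]) -> list[str]:
    """Return the qualified tool names a role may see; unknown roles see nothing."""
    permitted = allowed.get(role, set())
    return [t.qualified_name for t in catalog if t.qualified_name in permitted]
```

Defaulting unknown roles to an empty catalog keeps the filter fail-closed: a misconfigured agent sees no tools rather than all of them.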
Layer 4: Observability
The gateway instruments MCP traffic. Every tool call passes through it, so you get a complete audit log without modifying agents or servers.
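As an illustration of why the gateway is a natural audit point, the sketch below wraps a forwarding function and appends one JSON line per tool call. The `audited_call` helper, the `forward` callback, and the log field names are assumptions made for this example.

```python
import json
import time
from typing import Any, Callable


def audited_call(log: list[str], agent_id: str, tool: str,
                 arguments: dict[str, Any],
                 forward: Callable[[str, dict[str, Any]], Any]) -> Any:
    """Forward a tool call upstream and record one JSON log line for it."""
    started = time.monotonic()
    entry: dict[str, Any] = {"agent": agent_id, "tool": tool, "arguments": arguments}
    try:
        result = forward(tool, arguments)
        entry["status"] = "ok"
        return result
    except Exception as exc:
        entry["status"] = "error"
        entry["error"] = str(exc)
        raise
    finally:
        # Runs on both success and failure, so every call is logged exactly once.
        entry["duration_ms"] = round((time.monotonic() - started) * 1000, 3)
        log.append(json.dumps(entry))
```

Note that the log entry is written after the upstream call has already run, which is the limitation the next section turns on.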
The Gap: What Standard MCP Gateways Miss
If you follow Docker's MCP gateway guide, or Traefik's, or Composio's, you'll end up with a working gateway that routes traffic, handles auth, aggregates tools, and logs everything. That's genuinely useful.
It's also incomplete in a way that won't be obvious until the first cost incident.
Here's the scenario: A research agent connects to your MCP gateway. It has access to a code search tool (fast, cheap) and a code analysis tool (slow, expensive — it invokes an LLM under the hood). The agent calls the analysis tool 800 times in two hours.
Your gateway logged every call. Your metrics show a spike. Your alert fires. But the damage is done — $2,400 in compute costs, triggered by a single agent.
The standard gateway stack, even with conventional rate limiting added, had four opportunities to prevent this. It used none of them:
- Authentication confirmed the agent was valid. It didn't check whether the agent could afford 800 expensive tool calls.
- Authorization confirmed the agent was allowed to use the tool. It didn't limit how much the agent could spend on it.
- Observability recorded every call. It didn't stop any of them.
- Rate limiting counted requests per window. It didn't know that some requests cost $0.01 and others cost $3.00.
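The arithmetic behind the incident can be sketched in a few lines. The call count, duration, and $3.00 per-call cost come from the scenario above; the limit of 10 requests per minute is a hypothetical value, and the calls are assumed to be spread evenly across the two hours.

```python
CALLS = 800              # analysis-tool calls in the scenario
WINDOW_MINUTES = 120     # two hours
COST_CENTS = 300         # $3.00 per call
LIMIT_PER_MINUTE = 10    # hypothetical request-count rate limit


def calls_per_minute(calls: int, minutes: int) -> float:
    """Average request rate, assuming calls are spread evenly."""
    return calls / minutes


def rate_limit_allows(calls: int, minutes: int, limit_per_minute: int) -> bool:
    """A request-count limiter only compares rate to limit; cost never enters."""
    return calls_per_minute(calls, minutes) <= limit_per_minute


def total_cost_dollars(calls: int, cost_cents: int) -> float:
    """Total spend for a number of calls at a fixed per-call cost."""
    return calls * cost_cents / 100
```

At roughly 6.7 calls per minute, the agent stays under the hypothetical limit for the whole two hours while accumulating $2,400 in spend.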
Layer 5: Economic Governance
Economic governance adds three capabilities your MCP gateway needs:
1. Per-Tool Cost Modeling
Every tool in your MCP catalog has an economic weight. A search_code call that hits a local index costs virtually nothing. A generate_analysis call that invokes Claude costs real money. The gateway needs to know the difference.
tools:
  github.search_code:
    cost: 1 credit      # ~$0.01
  analysis.review_code:
    cost: 50 credits    # ~$0.50 (invokes LLM)
  analysis.generate_report:
    cost: 200 credits   # ~$2.00 (long-form generation)
With cost modeling, rate limiting becomes budget limiting. An agent with 500 credits can make 500 searches, or 10 code reviews, or 2 report generations.
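The budget arithmetic above can be checked with integer division. The credit costs mirror the example catalog; the `TOOL_COSTS` dictionary and `affordable_calls` function are illustrative names, not a real configuration API.

```python
# Credit cost per tool, mirroring the example catalog above.
TOOL_COSTS = {
    "github.search_code": 1,
    "analysis.review_code": 50,
    "analysis.generate_report": 200,
}


def affordable_calls(budget_credits: int, tool: str) -> int:
    """How many whole calls to `tool` a budget covers (partial calls don't count)."""
    return budget_credits // TOOL_COSTS[tool]


if __name__ == "__main__":
    for tool in TOOL_COSTS:
        print(tool, affordable_calls(500, tool))
```

Floor division is why 500 credits buys two reports rather than two and a half: a call either fits in the remaining budget or it does not.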
2. Budget-Aware Tokens
Standard bearer tokens say "this agent is authenticated." Budget-aware tokens say "this agent is authenticated and has 1,000 credits remaining."
SatGate implements this with macaroon tokens — a cryptographic credential format designed at Google that supports embedded caveats. A macaroon can encode total budget, expiration time, allowed tools, and delegation chains.
The critical property: macaroons support attenuation. A parent token can mint child tokens with fewer permissions, never more. An orchestrator with 10,000 credits can delegate 2,000 to a research sub-agent. That sub-agent can delegate 500 to a search specialist. Authority flows downward and diminishes — exactly the pattern multi-agent architectures need.
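To make attenuation concrete, here is a minimal macaroon-style sketch built on chained HMACs. It is not SatGate's token format or a full macaroon implementation: the `budget<=N` caveat syntax and all function names are assumptions, and a real gateway would also track spent credits server-side.

```python
import hashlib
import hmac
from typing import Optional


def _chain(signature: bytes, caveat: str) -> bytes:
    # Each caveat is folded into the signature, keyed by the previous signature.
    return hmac.new(signature, caveat.encode(), hashlib.sha256).digest()


def mint(root_key: bytes, identifier: str) -> dict:
    """Create a root token; only the holder of root_key can do this."""
    sig = hmac.new(root_key, identifier.encode(), hashlib.sha256).digest()
    return {"id": identifier, "caveats": [], "sig": sig}


def attenuate(token: dict, caveat: str) -> dict:
    """Derive a child token with one more caveat; no root key is needed."""
    return {"id": token["id"],
            "caveats": token["caveats"] + [caveat],
            "sig": _chain(token["sig"], caveat)}


def verify(root_key: bytes, token: dict) -> bool:
    """Recompute the HMAC chain from the root key and compare signatures."""
    sig = hmac.new(root_key, token["id"].encode(), hashlib.sha256).digest()
    for caveat in token["caveats"]:
        sig = _chain(sig, caveat)
    return hmac.compare_digest(sig, token["sig"])


def effective_budget(token: dict) -> Optional[int]:
    """All caveats must hold, so the smallest budget caveat wins."""
    budgets = [int(c.split("<=", 1)[1])
               for c in token["caveats"] if c.startswith("budget<=")]
    return min(budgets) if budgets else None
```

Because each signature is keyed by the previous one, a token holder can append caveats but cannot remove them. And since every caveat must be satisfied, a sub-agent that appends a larger budget caveat gains nothing: the effective budget is still the minimum in the chain.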
3. Pre-Call Enforcement
This is the distinction between observability and governance. Observability logs a tool call after it happens. Governance decides whether the call happens at all.
# Gateway decision flow:
1. Agent calls tools/call with macaroon token
2. Gateway validates macaroon signature ✓
3. Gateway checks: is this tool allowed? ✓
4. Gateway looks up tool cost: 50 credits
5. Gateway checks remaining budget: 30 credits
6. 30 < 50 → DENY with structured 402 response
The denial is structured. The agent gets machine-readable context: how much it has, how much it needs, and what cheaper alternatives exist. Compare this to a rate-limit 429, which says only "try again later" and tends to trigger a retry loop.
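The decision flow above might look like the following sketch. It skips signature validation (step 2) and assumes the token's allowed tools and remaining budget have already been extracted. The response field names and the 403 for a disallowed tool are assumptions for illustration, not SatGate's actual wire format.

```python
from typing import Any

# Credit cost per tool, mirroring the example catalog (illustrative values).
TOOL_COSTS = {
    "github.search_code": 1,
    "analysis.review_code": 50,
    "analysis.generate_report": 200,
}


def authorize_call(tool: str, allowed_tools: set[str],
                   remaining: int) -> dict[str, Any]:
    """Decide before forwarding whether a tool call may run."""
    # Step 3: is this tool allowed (and known to the cost catalog)?
    if tool not in allowed_tools or tool not in TOOL_COSTS:
        return {"allowed": False, "status": 403,
                "error": "tool_not_allowed", "tool": tool}

    # Steps 4-5: look up the cost and compare it to the remaining budget.
    cost = TOOL_COSTS[tool]
    if remaining < cost:
        # Step 6: deny, and tell the agent which allowed tools it can still afford.
        alternatives = sorted(name for name, c in TOOL_COSTS.items()
                              if name in allowed_tools and c <= remaining)
        return {"allowed": False, "status": 402,
                "error": "insufficient_budget", "tool": tool,
                "required": cost, "remaining": remaining,
                "affordable_alternatives": alternatives}

    # Approve and report the budget that would remain after the charge.
    return {"allowed": True, "status": 200, "tool": tool,
            "charged": cost, "remaining": remaining - cost}
```

With 30 credits left, a request for `analysis.review_code` returns a 402 that names `github.search_code` as an affordable alternative, which gives the agent something to act on instead of retrying.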
The MCP Gateway Maturity Model
Think of MCP gateway deployment as a progression:
- Level 0: Direct connections. Each agent connects to each server. Works for prototypes.
- Level 1: Routing gateway. Centralized connections, auth translation, tool aggregation. This is where most guides end.
- Level 2: Observable gateway. Add structured logging, metrics, and alerting. You know what happened. You can't prevent it.
- Level 3: Governed gateway. Add cost modeling, budget enforcement, and hierarchical delegation. You control what happens, in real time.
Most teams in early 2026 are at Level 1 or 2. The cost incidents that push them to Level 3 are predictable and preventable.
Getting Started
Economic governance isn't about distrust — it's about enabling autonomy safely. Agents with clear budget boundaries can operate more independently, because the organization knows the blast radius is contained. The gateway doesn't slow agents down. It lets you give them a longer leash.
SatGate adds economic governance to your MCP gateway. Open source:
go install github.com/satgate-io/satgate/cmd/satgate-mcp@latest