How We Built Budget Enforcement for MCP Tool Calls
Published: February 13, 2026
MCP (Model Context Protocol) gives AI agents structured access to tools. What it doesn't give them is a credit card limit.
We shipped an open-source MCP proxy that intercepts tools/call JSON-RPC messages and enforces per-tool budgets with cryptographic delegation. Here's how we built it and what we learned.
The Architecture
The proxy sits between any MCP client and any MCP server. It speaks the standard MCP protocol, so agents don't know it's there.
Agent → [stdio/SSE] → SatGate MCP Proxy → [stdio] → Upstream MCP Server
                              │
                              ├─ Budget Enforcement
                              ├─ Per-tool Cost Attribution
                              └─ Token Delegation
Two transport modes:
- stdio: Local sidecar. One agent, one process. Zero network overhead.
- SSE/HTTP: Remote server. Multiple agents connect over HTTP. Each gets an independent SSE event stream.
Per-Tool Cost Resolution
Not all tool calls cost the same. A web_search is cheap. A dalle_generate is expensive. Our cost resolver supports exact match and wildcard prefixes:
tools:
defaultCost: 5
costs:
web_search: 5
database_query: 5
gpt4_summarize: 25
gpt4_*: 25 # wildcard: gpt4_analyze, gpt4_translate...
dalle_generate: 50
code_execute: 15
Resolution order: exact match → longest wildcard prefix → catch-all * → default. Same pattern as the enterprise cost attribution engine, but running locally.
Budget Enforcement
The BudgetEnforcer interface is the split point between OSS and Enterprise:
type BudgetEnforcer interface {
    Check(ctx context.Context, tokenID string, cost int64) (*BudgetResult, error)
    Spend(ctx context.Context, tokenID, toolName string, cost int64, requestID string) (*BudgetResult, error)
    Remaining(ctx context.Context, tokenID string) (int64, error)
    Initialize(ctx context.Context, tokenID string, credits int64) error
}
OSS provides InMemoryBudgetEnforcer — a mutex-protected map. Simple, fast, not durable across restarts. Good enough for local development and single-session agents.
Enterprise provides RedisBudgetEnforcer — atomic Lua scripts, idempotent spend tracking, Postgres audit trail. The dashboard reads the same spend ledger in real-time.
When the budget hits zero, the proxy returns a structured JSON-RPC error:
{"jsonrpc":"2.0","id":42,"error":{
"code":-32000,
"message":"Budget exhausted",
"data":{
"error":"budget_exhausted",
"tool":"dalle_generate",
"cost_credits":50,
"remaining_credits":0,
"token_id":"abc123"
}
}}
The agent gets a structured error it can handle gracefully — not a crashed process or an infinite retry.
Delegation: The Hard Part
The interesting engineering is in delegation. When an orchestrator agent spawns sub-agents, each needs its own budget. The parent carves credits from its own allocation:
Parent: 1000 credits
├── satgate/delegate(300, "research-agent") → child token
├── satgate/delegate(200, "content-agent") → child token
└── 500 credits remaining
This is implemented as a SatGate extension to MCP — satgate/delegate and satgate/budget methods, namespaced to avoid conflicts with standard MCP.
The key design decision: token identity = hash(identifier + signature). Macaroon delegation produces child tokens with the same identifier as the parent (that's how HMAC chaining works — you chain from the current signature, not a new identifier). But each delegation adds caveats that change the signature, making the hash unique. This gives us a stable budget key per token without requiring a separate identity system.
Budget isolation is enforced at the spend level. When research-agent exhausts its 300 credits, content-agent and the parent are unaffected. We verified this with a Go integration test that delegates to three children, exhausts one, and confirms the others still operate.
What We Learned
Macaroons are underrated for agent auth. Verification is a pure HMAC recomputation from the root key, with no database lookup. Delegation is just appending caveats. Permissions can only narrow, never widen. This is exactly what you want for agent-to-agent delegation.
stdio transport is simpler than you'd think. Newline-delimited JSON over stdin/stdout. No HTTP overhead, no connection management. The upstream manager spawns the MCP server as a subprocess, pipes stdin/stdout, and correlates request/response IDs. The tricky part is the read loop — responses arrive asynchronously, so you need a sync.Map of pending channels keyed by request ID.
SSE needs keepalive. Connections through load balancers and proxies will drop after 30-60 seconds of silence. A periodic SSE comment line (: keepalive\n\n) prevents this. Also: make your message handler async. If tool calls block the HTTP response goroutine while waiting for upstream, you get head-of-line blocking across sessions.
Fail-mode matters. When the budget backend (Redis) is unreachable, do you deny all calls (closed) or allow and log (open)? We default to closed — it's the secure choice. But some deployments prefer open (the agent's work is more valuable than the budget risk). Making this a config option was worth it.
Numbers
- 18 source files, 2,164 lines of code
- 10 test files, 1,365 lines of tests
- 28 tests: budget, auth, delegation, config, JSON-RPC, SSE, integration
- Built in one evening
The code is at github.com/SatGate-io/satgate/pkg/mcpserver.
SatGate is an open-source Economic Firewall for AI agents. The MCP proxy is part of the OSS gateway. Enterprise features (Redis budgets, dashboard, multi-tenant) are available separately.