Claude Code enables developers to delegate coding tasks to AI agents directly from their terminal. However, running AI coding agents in enterprise environments without governance creates significant risks: uncontrolled API costs, no visibility into what agents do, and inability to enforce budgets or rate limits.
This guide shows how to add enterprise control and visibility to Claude Code using Bifrost.
The Claude Code Governance Problem
Claude Code Without Governance:
- No budget controls (costs can spiral)
- Zero visibility into agent actions
- No rate limiting (can hit provider limits)
- No audit trails
- No team-level cost attribution
Enterprise Requirements:
- Per-user or per-team budgets
- Real-time cost tracking
- Rate limit enforcement
- Complete audit logs
- Usage visibility
Solution: Bifrost as Claude Code Gateway
Architecture:
Claude Code CLI
↓ (via Bifrost proxy)
Bifrost Gateway (governance + observability)
↓
Anthropic Claude API
What Bifrost Adds:
- Hierarchical budgets (team/user/project)
- Real-time rate limiting
- Complete request/response logging
- Cost attribution and tracking
- Prometheus metrics + dashboards
Setup: Claude Code with Bifrost
Step 1: Install and Configure Bifrost
# Install Bifrost
npx -y @maximhq/bifrost
# Bifrost runs at http://localhost:8080
Step 2: Configure Anthropic Provider
Add Anthropic API Key (Web UI at http://localhost:8080):
- Go to "Providers" → "Add Provider"
- Select "Anthropic"
- Add your API key
- Save
Or via API:
curl -X POST http://localhost:8080/api/providers \
-H "Content-Type: application/json" \
-d '{
"provider": "anthropic",
"keys": [
{
"name": "anthropic-key-1",
"value": "env.ANTHROPIC_API_KEY",
"weight": 1.0
}
]
}'
Step 3: Create Virtual Keys with Budgets
Per-Team Budget (Engineering team: $500/month):
# Create customer
curl -X POST http://localhost:8080/api/governance/customers \
-H "Content-Type: application/json" \
-d '{
"name": "Acme Corp",
"budget": {
"max_limit": 5000.00,
"reset_duration": "1M"
}
}'
# Create team
curl -X POST http://localhost:8080/api/governance/teams \
-H "Content-Type: application/json" \
-d '{
"name": "Engineering Team",
"customer_id": "customer-acme",
"budget": {
"max_limit": 500.00,
"reset_duration": "1M"
}
}'
# Create virtual key for team
curl -X PUT http://localhost:8080/api/governance/virtual-keys/vk-eng-team \
-H "Content-Type: application/json" \
-d '{
"team_id": "team-engineering",
"budget": {
"max_limit": 500.00,
"reset_duration": "1M"
},
"rate_limit": {
"request_max_limit": 1000,
"request_reset_duration": "1h",
"token_max_limit": 500000,
"token_reset_duration": "1h"
}
}'
Per-User Budget (Developer: $50/month):
curl -X PUT http://localhost:8080/api/governance/virtual-keys/vk-dev-alice \
-H "Content-Type: application/json" \
-d '{
"team_id": "team-engineering",
"budget": {
"max_limit": 50.00,
"reset_duration": "1M"
},
"rate_limit": {
"request_max_limit": 100,
"request_reset_duration": "1h"
}
}'
Step 4: Configure Claude Code to Use Bifrost
Set Environment Variables:
export ANTHROPIC_API_KEY="vk-dev-alice" # Virtual key, not direct API key
export ANTHROPIC_BASE_URL="http://localhost:8080"
Or create .env file:
ANTHROPIC_API_KEY=vk-dev-alice
ANTHROPIC_BASE_URL=http://localhost:8080
Step 5: Use Claude Code Normally
# Claude Code now routes through Bifrost
claude -p "create a REST API for user management"
# All requests are governed by the virtual key's rules:
# - Budgets checked
# - Rate limits enforced
# - Full audit logging
# - Cost tracking
Governance Features
Hierarchical Budget Enforcement
Budget Hierarchy (all checked for every request):
Customer: Acme Corp ($5,000/month)
↓
Team: Engineering ($500/month)
↓
User: Alice ($50/month)
Budget Checking Flow:
- ✅ Check user budget ($48 of $50 used)
- ✅ Check team budget ($450 of $500 used)
- ✅ Check customer budget ($4,800 of $5,000 used)
- Request proceeds (all budgets pass)
- After request ($2 cost):
- User: $50 of $50 used
- Team: $452 of $500
- Customer: $4,802 of $5,000
Next Request: Blocked (user budget exceeded)
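The checking flow above can be sketched as a chain of budget nodes, each with its own limit, where a request must pass every level and its cost rolls up to every level. This is assumed logic for illustration, not Bifrost's actual implementation:

```python
# Hierarchical budget check sketch: every level (user -> team -> customer)
# must have headroom before a request proceeds; spend is recorded at all levels.

class BudgetNode:
    def __init__(self, name, limit, parent=None):
        self.name = name
        self.limit = limit
        self.used = 0.0
        self.parent = parent

    def chain(self):
        node = self
        while node:
            yield node
            node = node.parent

    def allow(self):
        # Request proceeds only if every level still has headroom.
        return all(n.used < n.limit for n in self.chain())

    def record(self, cost):
        # Cost is attributed at the user, team, and customer levels.
        for n in self.chain():
            n.used += cost

customer = BudgetNode("Acme Corp", 5000.00)
team = BudgetNode("Engineering", 500.00, parent=customer)
alice = BudgetNode("Alice", 50.00, parent=team)

# State from the example above
alice.used, team.used, customer.used = 48.00, 450.00, 4800.00

assert alice.allow()      # $48 of $50 used: request proceeds
alice.record(2.00)        # $2 request cost rolls up to all levels
assert not alice.allow()  # $50 of $50 used: next request blocked
```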
Error response:
{
"error": {
"type": "budget_exceeded",
"message": "Budget exceeded: VK budget exceeded: 50.00 > 50.00 dollars"
}
}
Rate Limiting
Per-User Rate Limits:
curl -X PUT http://localhost:8080/api/governance/virtual-keys/vk-dev-alice \
-H "Content-Type: application/json" \
-d '{
"rate_limit": {
"request_max_limit": 100,
"request_reset_duration": "1h",
"token_max_limit": 50000,
"token_reset_duration": "1h"
}
}'
Behavior:
- Max 100 requests per hour
- Max 50,000 tokens per hour
- Exceeding either triggers 429 error
Error Response:
{
"error": {
"type": "rate_limited",
"message": "Rate limits exceeded: [request limit exceeded (101/100, resets every 1h)]"
}
}
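A minimal fixed-window limiter captures the behavior described above (100 requests per hour, then 429s until the window resets). This is an illustrative sketch; Bifrost's internal rate-limiting algorithm may differ:

```python
import time

# Fixed-window rate limiter sketch: a counter per window; exceeding the
# limit within the window rejects the request (surfaced as a 429 upstream).

class FixedWindowLimiter:
    def __init__(self, max_requests, window_seconds):
        self.max_requests = max_requests
        self.window = window_seconds
        self.count = 0
        self.window_start = time.monotonic()

    def allow(self):
        now = time.monotonic()
        if now - self.window_start >= self.window:
            # Window elapsed: counter resets.
            self.count = 0
            self.window_start = now
        if self.count >= self.max_requests:
            return False  # would surface as a 429 error
        self.count += 1
        return True

limiter = FixedWindowLimiter(max_requests=100, window_seconds=3600)
results = [limiter.allow() for _ in range(101)]
assert all(results[:100]) and not results[100]  # request 101 is rejected
```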
Model Access Control
Restrict to Specific Models:
curl -X PUT http://localhost:8080/api/governance/virtual-keys/vk-dev-alice \
-H "Content-Type: application/json" \
-d '{
"provider_configs": [
{
"provider": "anthropic",
"allowed_models": ["claude-3-5-haiku-20241022"]
}
]
}'
Behavior: Requests for any model outside the allowlist (e.g., claude-3-opus-20240229) are blocked with a 403 error
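The allowlist check implied by this config can be sketched as a simple lookup (assumed logic, not Bifrost's source): the requested model must appear in the virtual key's allowed_models for its provider.

```python
# Model access control sketch: deny unless the (provider, model) pair
# matches an allowed_models entry in the virtual key's provider_configs.

provider_configs = [
    {"provider": "anthropic", "allowed_models": ["claude-3-5-haiku-20241022"]},
]

def is_model_allowed(provider, model, configs):
    for cfg in configs:
        if cfg["provider"] == provider:
            return model in cfg["allowed_models"]
    return False  # unknown provider: deny by default

assert is_model_allowed("anthropic", "claude-3-5-haiku-20241022", provider_configs)
# A model outside the allowlist would be rejected (403 upstream):
assert not is_model_allowed("anthropic", "claude-3-opus-20240229", provider_configs)
```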
Observability and Visibility
Built-in Dashboard
Access: http://localhost:8080
Real-Time Visibility:
- Request logs (prompt, response, tokens, cost)
- Cost tracking per user/team/customer
- Rate limit utilization
- Token usage trends
- Latency distribution
Prometheus Metrics
Metrics Endpoint: http://localhost:8080/metrics
Key Metrics:
# Total cost by virtual key
sum by (vk) (bifrost_cost_total)
# Budget utilization
sum by (vk) (budget_usage) / sum by (vk) (budget_limit)
# Requests per user
sum by (vk) (rate(bifrost_requests_total[5m]))
# Token usage
sum by (vk, token_type) (bifrost_tokens_total)
Alerting (Prometheus):
groups:
- name: claude_code_budgets
rules:
- alert: UserBudgetNearLimit
expr: (budget_usage{vk="vk-dev-alice"} / budget_limit{vk="vk-dev-alice"}) > 0.8
labels:
severity: warning
annotations:
summary: "Alice approaching budget limit (>80%)"
- alert: TeamBudgetCritical
expr: (team_budget_usage / team_budget_limit) > 0.9
labels:
severity: critical
annotations:
summary: "Engineering team 90% budget consumed"
Complete Audit Trails
Request Logging:
Every Claude Code request logged with:
- Virtual key used
- User ID (via x-bf-user-id header)
- Model requested
- Token usage (input + output)
- Cost calculated
- Timestamp
- Latency
Query Logs (via dashboard or API):
# Get all requests for user Alice
curl http://localhost:8080/api/logs?vk=vk-dev-alice
# Filter by date range
curl "http://localhost:8080/api/logs?vk=vk-dev-alice&start=2026-02-01&end=2026-02-28"
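To make those per-user log entries useful, each request needs the x-bf-user-id header set. A client-side sketch of building such a request (endpoint and header names follow this guide; run it against a live gateway to see the entry appear under that user):

```python
import json
import urllib.request

# Build an OpenAI-compatible chat request tagged for audit attribution:
# the virtual key goes in Authorization, the user ID in x-bf-user-id.

def build_chat_request(virtual_key, user_id, prompt,
                       base_url="http://localhost:8080"):
    payload = {
        "model": "anthropic/claude-3-5-haiku-20241022",
        "messages": [{"role": "user", "content": prompt}],
    }
    return urllib.request.Request(
        f"{base_url}/v1/chat/completions",
        data=json.dumps(payload).encode(),
        headers={
            "Content-Type": "application/json",
            "Authorization": f"Bearer {virtual_key}",  # virtual key, not a raw provider key
            "x-bf-user-id": user_id,                   # attributed in the audit log
        },
        method="POST",
    )

req = build_chat_request("vk-dev-alice", "alice", "Hello, Bifrost!")
assert req.get_header("X-bf-user-id") == "alice"  # urllib capitalizes the stored key
```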
Multi-Team Configuration
Scenario: Engineering + Data Science teams with separate budgets.
Configuration:
# Engineering team: $500/month, Claude 3.5 Haiku
curl -X PUT http://localhost:8080/api/governance/virtual-keys/vk-eng-team \
-H "Content-Type: application/json" \
-d '{
"team_id": "team-engineering",
"budget": {"max_limit": 500.00, "reset_duration": "1M"},
"provider_configs": [
{
"provider": "anthropic",
"allowed_models": ["claude-3-5-haiku-20241022"]
}
]
}'
# Data Science team: $1,000/month, Claude Opus 4
curl -X PUT http://localhost:8080/api/governance/virtual-keys/vk-ds-team \
-H "Content-Type: application/json" \
-d '{
"team_id": "team-data-science",
"budget": {"max_limit": 1000.00, "reset_duration": "1M"},
"provider_configs": [
{
"provider": "anthropic",
"allowed_models": ["claude-opus-4-20250514"]
}
]
}'
Usage:
# Engineering developer
export ANTHROPIC_API_KEY="vk-eng-team"
claude -p "refactor this function"
# Data Science researcher
export ANTHROPIC_API_KEY="vk-ds-team"
claude -p "analyze this dataset"
Cost Optimization
Semantic Caching
Enable Caching (40-60% cost reduction):
# Via Web UI: Features → Semantic Caching → Enable
How It Works:
- Similar prompts return cached responses
- Example: "fix this bug" vs "debug this code"
- Cache hit = no provider cost
- Sub-millisecond response time
Impact:
- 40-60% cost reduction for repetitive coding tasks
- Faster responses (cached results)
- Proportional budget savings
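A toy sketch of the cache-lookup idea (illustration only: real semantic caching uses embedding similarity, which also catches paraphrases like the "fix this bug" / "debug this code" example above; here word overlap stands in for it):

```python
# Semantic cache sketch: prompts whose similarity to a cached prompt
# exceeds a threshold reuse the cached response instead of calling
# the provider -- a cache hit costs nothing and returns immediately.

def similarity(a, b):
    # Jaccard word overlap as a stand-in for embedding cosine similarity.
    wa, wb = set(a.lower().split()), set(b.lower().split())
    return len(wa & wb) / len(wa | wb)

class SemanticCache:
    def __init__(self, threshold=0.5):
        self.threshold = threshold
        self.entries = []  # (prompt, response) pairs

    def get(self, prompt):
        for cached_prompt, response in self.entries:
            if similarity(prompt, cached_prompt) >= self.threshold:
                return response  # cache hit: no provider cost
        return None

    def put(self, prompt, response):
        self.entries.append((prompt, response))

cache = SemanticCache()
cache.put("fix this bug in my code", "Here is the fix ...")
assert cache.get("fix this bug in my python code") is not None  # near-duplicate hits
assert cache.get("write a haiku about autumn") is None          # unrelated prompt misses
```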
Multi-Provider Failover
Configuration:
curl -X PUT http://localhost:8080/api/governance/virtual-keys/vk-cost-optimized \
-H "Content-Type: application/json" \
-d '{
"provider_configs": [
{
"provider": "anthropic",
"weight": 0.8,
"allowed_models": ["claude-3-5-haiku-20241022"]
},
{
"provider": "openai",
"weight": 0.2,
"allowed_models": ["gpt-4o-mini"]
}
]
}'
Behavior: 80% Anthropic, 20% OpenAI (cost optimization)
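The weighted split can be sketched as weighted random selection (an assumed mechanism for illustration; Bifrost's load balancer is adaptive and may route differently under failures):

```python
import random

# Weighted provider selection sketch matching the 80/20 config above.

providers = [("anthropic", 0.8), ("openai", 0.2)]

def pick_provider(providers, rng=random):
    names, weights = zip(*providers)
    return rng.choices(names, weights=weights, k=1)[0]

rng = random.Random(42)  # seeded for a reproducible tally
tally = {"anthropic": 0, "openai": 0}
for _ in range(10_000):
    tally[pick_provider(providers, rng)] += 1

# Over many requests the split converges to roughly 80/20.
assert 0.75 < tally["anthropic"] / 10_000 < 0.85
```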
Get Started
Install Bifrost:
npx -y @maximhq/bifrost
Configure Claude Code:
export ANTHROPIC_API_KEY="your-virtual-key"
export ANTHROPIC_BASE_URL="http://localhost:8080"
Docs: https://getmax.im/bifrostdocs
GitHub: https://git.new/bifrost
Key Takeaway: Claude Code lacks enterprise governance (no budgets, rate limits, or visibility). Bifrost adds hierarchical budget controls (team/user/project levels), real-time rate limiting, complete audit trails, and unified observability—enabling safe Claude Code deployments in enterprise environments with per-user budgets, cost tracking, and Prometheus metrics.

Top comments (2)
The governance layer is the piece most teams skip until they get a surprise bill or a security incident.
Budget enforcement and rate limits are table stakes, but the audit trail is where the real value is. When an AI agent makes a change that breaks something three days later, you need to trace back: what was the prompt, what files were read, what was the full context. Without that, debugging AI-assisted code is archaeology.
One gap I see in most gateway approaches: they control the API layer but not the execution layer. An AI agent that has shell access can do damage that never touches the API. The Copilot CLI exploit this week (malware execution via an allowlisted env command) is a perfect example — the API-level controls were fine, the command execution sandbox was missing.
This hits a gap that most AI coding discussions ignore: what happens when it's not just you and your terminal, but a team of 20 developers all running agents with different budgets, permissions, and context?
The governance angle matters because AI agents amplify not just individual productivity, but individual mistakes. One developer's misconfigured agent burning through API credits is annoying. An agent with production database access making a "helpful" schema change is a different category of problem.
I think the budget and rate limit controls are table stakes, but the visibility piece is what actually changes behavior. When developers can see how their agent spends tokens and what it actually does (not just the final output), they start designing better prompts and better project structures. It's the same dynamic as CI — making failures visible and fast is what drives improvement.
The attention cost of managing AI tools is real and underestimated. Tools like this help by shifting that cost from "manually watch what the agent does" to "set guardrails and review dashboards."