Claude Code enables developers to delegate coding tasks to AI agents directly from their terminal. However, running AI coding agents in enterprise environments without governance creates significant risks: uncontrolled API costs, no visibility into what agents do, and inability to enforce budgets or rate limits.
This guide shows how to add enterprise control and visibility to Claude Code using Bifrost.
The Claude Code Governance Problem
Claude Code Without Governance:
- No budget controls (costs can spiral)
- Zero visibility into agent actions
- No rate limiting (can hit provider limits)
- No audit trails
- No team-level cost attribution
Enterprise Requirements:
- Per-user or per-team budgets
- Real-time cost tracking
- Rate limit enforcement
- Complete audit logs
- Usage visibility
Solution: Bifrost as Claude Code Gateway
Architecture:
Claude Code CLI
↓ (via Bifrost proxy)
Bifrost Gateway (governance + observability)
↓
Anthropic Claude API
What Bifrost Adds:
- Hierarchical budgets (team/user/project)
- Real-time rate limiting
- Complete request/response logging
- Cost attribution and tracking
- Prometheus metrics + dashboards
Setup: Claude Code with Bifrost
Step 1: Install and Configure Bifrost
# Install Bifrost
npx -y @maximhq/bifrost
# Bifrost runs at http://localhost:8080
Step 2: Configure Anthropic Provider
Add Anthropic API Key (Web UI at http://localhost:8080):
- Go to "Providers" → "Add Provider"
- Select "Anthropic"
- Add your API key
- Save
Or via API:
curl -X POST http://localhost:8080/api/providers \
-H "Content-Type: application/json" \
-d '{
"provider": "anthropic",
"keys": [
{
"name": "anthropic-key-1",
"value": "env.ANTHROPIC_API_KEY",
"weight": 1.0
}
]
}'
Step 3: Create Virtual Keys with Budgets
Per-Team Budget (Engineering team: $500/month):
# Create customer
curl -X POST http://localhost:8080/api/governance/customers \
-H "Content-Type: application/json" \
-d '{
"name": "Acme Corp",
"budget": {
"max_limit": 5000.00,
"reset_duration": "1M"
}
}'
# Create team
curl -X POST http://localhost:8080/api/governance/teams \
-H "Content-Type: application/json" \
-d '{
"name": "Engineering Team",
"customer_id": "customer-acme",
"budget": {
"max_limit": 500.00,
"reset_duration": "1M"
}
}'
# Create virtual key for team
curl -X PUT http://localhost:8080/api/governance/virtual-keys/vk-eng-team \
-H "Content-Type: application/json" \
-d '{
"team_id": "team-engineering",
"budget": {
"max_limit": 500.00,
"reset_duration": "1M"
},
"rate_limit": {
"request_max_limit": 1000,
"request_reset_duration": "1h",
"token_max_limit": 500000,
"token_reset_duration": "1h"
}
}'
Per-User Budget (Developer: $50/month):
curl -X PUT http://localhost:8080/api/governance/virtual-keys/vk-dev-alice \
-H "Content-Type: application/json" \
-d '{
"team_id": "team-engineering",
"budget": {
"max_limit": 50.00,
"reset_duration": "1M"
},
"rate_limit": {
"request_max_limit": 100,
"request_reset_duration": "1h"
}
}'
Step 4: Configure Claude Code to Use Bifrost
Set Environment Variables:
export ANTHROPIC_API_KEY="vk-dev-alice" # Virtual key, not direct API key
export ANTHROPIC_BASE_URL="http://localhost:8080"
Or create .env file:
ANTHROPIC_API_KEY=vk-dev-alice
ANTHROPIC_BASE_URL=http://localhost:8080
Step 5: Use Claude Code Normally
# Claude Code now routes through Bifrost
claude -p "create a REST API for user management"
# All requests are governed by the virtual key's rules:
# - Budgets checked
# - Rate limits enforced
# - Full audit logging
# - Cost tracking
Governance Features
Hierarchical Budget Enforcement
Budget Hierarchy (all checked for every request):
Customer: Acme Corp ($5,000/month)
↓
Team: Engineering ($500/month)
↓
User: Alice ($50/month)
Budget Checking Flow:
- ✅ Check user budget ($48 of $50 used)
- ✅ Check team budget ($450 of $500 used)
- ✅ Check customer budget ($4,800 of $5,000 used)
- Request proceeds (all budgets pass)
- After request ($2 cost):
- User: $50 of $50 used
- Team: $452 of $500
- Customer: $4,802 of $5,000
Next Request: Blocked (user budget exceeded)
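The checking flow above can be sketched as a chain of budget nodes, each with its own limit, where a request must pass every level and its cost rolls up to every level. This is assumed logic for illustration, not Bifrost's actual implementation:

```python
# Hierarchical budget check sketch: every level (user -> team -> customer)
# must have headroom before a request proceeds; spend is recorded at all levels.

class BudgetNode:
    def __init__(self, name, limit, parent=None):
        self.name = name
        self.limit = limit
        self.used = 0.0
        self.parent = parent

    def chain(self):
        node = self
        while node:
            yield node
            node = node.parent

    def allow(self):
        # Request proceeds only if every level still has headroom.
        return all(n.used < n.limit for n in self.chain())

    def record(self, cost):
        # Cost is attributed at the user, team, and customer levels.
        for n in self.chain():
            n.used += cost

customer = BudgetNode("Acme Corp", 5000.00)
team = BudgetNode("Engineering", 500.00, parent=customer)
alice = BudgetNode("Alice", 50.00, parent=team)

# State from the example above
alice.used, team.used, customer.used = 48.00, 450.00, 4800.00

assert alice.allow()      # $48 of $50 used: request proceeds
alice.record(2.00)        # $2 request cost rolls up to all levels
assert not alice.allow()  # $50 of $50 used: next request blocked
```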
Error response:
{
"error": {
"type": "budget_exceeded",
"message": "Budget exceeded: VK budget exceeded: 50.00 > 50.00 dollars"
}
}
Rate Limiting
Per-User Rate Limits:
curl -X PUT http://localhost:8080/api/governance/virtual-keys/vk-dev-alice \
-H "Content-Type: application/json" \
-d '{
"rate_limit": {
"request_max_limit": 100,
"request_reset_duration": "1h",
"token_max_limit": 50000,
"token_reset_duration": "1h"
}
}'
Behavior:
- Max 100 requests per hour
- Max 50,000 tokens per hour
- Exceeding either triggers 429 error
Error Response:
{
"error": {
"type": "rate_limited",
"message": "Rate limits exceeded: [request limit exceeded (101/100, resets every 1h)]"
}
}
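A minimal fixed-window limiter captures the behavior described above (100 requests per hour, then 429s until the window resets). This is an illustrative sketch; Bifrost's internal rate-limiting algorithm may differ:

```python
import time

# Fixed-window rate limiter sketch: a counter per window; exceeding the
# limit within the window rejects the request (surfaced as a 429 upstream).

class FixedWindowLimiter:
    def __init__(self, max_requests, window_seconds):
        self.max_requests = max_requests
        self.window = window_seconds
        self.count = 0
        self.window_start = time.monotonic()

    def allow(self):
        now = time.monotonic()
        if now - self.window_start >= self.window:
            # Window elapsed: counter resets.
            self.count = 0
            self.window_start = now
        if self.count >= self.max_requests:
            return False  # would surface as a 429 error
        self.count += 1
        return True

limiter = FixedWindowLimiter(max_requests=100, window_seconds=3600)
results = [limiter.allow() for _ in range(101)]
assert all(results[:100]) and not results[100]  # request 101 is rejected
```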
Model Access Control
Restrict to Specific Models:
curl -X PUT http://localhost:8080/api/governance/virtual-keys/vk-dev-alice \
-H "Content-Type: application/json" \
-d '{
"provider_configs": [
{
"provider": "anthropic",
"allowed_models": ["claude-3-5-haiku-20241022"]
}
]
}'
Behavior: Requests for any model outside the allowlist (e.g., claude-3-opus-20240229) are blocked with a 403 error
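The allowlist check implied by this config can be sketched as a simple lookup (assumed logic, not Bifrost's source): the requested model must appear in the virtual key's allowed_models for its provider.

```python
# Model access control sketch: deny unless the (provider, model) pair
# matches an allowed_models entry in the virtual key's provider_configs.

provider_configs = [
    {"provider": "anthropic", "allowed_models": ["claude-3-5-haiku-20241022"]},
]

def is_model_allowed(provider, model, configs):
    for cfg in configs:
        if cfg["provider"] == provider:
            return model in cfg["allowed_models"]
    return False  # unknown provider: deny by default

assert is_model_allowed("anthropic", "claude-3-5-haiku-20241022", provider_configs)
# A model outside the allowlist would be rejected (403 upstream):
assert not is_model_allowed("anthropic", "claude-3-opus-20240229", provider_configs)
```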
Observability and Visibility
Built-in Dashboard
Access: http://localhost:8080
Real-Time Visibility:
- Request logs (prompt, response, tokens, cost)
- Cost tracking per user/team/customer
- Rate limit utilization
- Token usage trends
- Latency distribution
Prometheus Metrics
Metrics Endpoint: http://localhost:8080/metrics
Key Metrics:
# Total cost by virtual key
sum by (vk) (bifrost_cost_total)
# Budget utilization
sum by (vk) (budget_usage) / sum by (vk) (budget_limit)
# Requests per user
sum by (vk) (rate(bifrost_requests_total[5m]))
# Token usage
sum by (vk, token_type) (bifrost_tokens_total)
Alerting (Prometheus):
groups:
- name: claude_code_budgets
rules:
- alert: UserBudgetNearLimit
expr: (budget_usage{vk="vk-dev-alice"} / budget_limit{vk="vk-dev-alice"}) > 0.8
labels:
severity: warning
annotations:
summary: "Alice approaching budget limit (>80%)"
- alert: TeamBudgetCritical
expr: (team_budget_usage / team_budget_limit) > 0.9
labels:
severity: critical
annotations:
summary: "Engineering team 90% budget consumed"
Complete Audit Trails
Request Logging:
Every Claude Code request logged with:
- Virtual key used
- User ID (via x-bf-user-id header)
- Model requested
- Token usage (input + output)
- Cost calculated
- Timestamp
- Latency
Query Logs (via dashboard or API):
# Get all requests for user Alice
curl http://localhost:8080/api/logs?vk=vk-dev-alice
# Filter by date range
curl "http://localhost:8080/api/logs?vk=vk-dev-alice&start=2026-02-01&end=2026-02-28"
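To make those per-user log entries useful, each request needs the x-bf-user-id header set. A client-side sketch of building such a request (endpoint and header names follow this guide; run it against a live gateway to see the entry appear under that user):

```python
import json
import urllib.request

# Build an OpenAI-compatible chat request tagged for audit attribution:
# the virtual key goes in Authorization, the user ID in x-bf-user-id.

def build_chat_request(virtual_key, user_id, prompt,
                       base_url="http://localhost:8080"):
    payload = {
        "model": "anthropic/claude-3-5-haiku-20241022",
        "messages": [{"role": "user", "content": prompt}],
    }
    return urllib.request.Request(
        f"{base_url}/v1/chat/completions",
        data=json.dumps(payload).encode(),
        headers={
            "Content-Type": "application/json",
            "Authorization": f"Bearer {virtual_key}",  # virtual key, not a raw provider key
            "x-bf-user-id": user_id,                   # attributed in the audit log
        },
        method="POST",
    )

req = build_chat_request("vk-dev-alice", "alice", "Hello, Bifrost!")
assert req.get_header("X-bf-user-id") == "alice"  # urllib capitalizes the stored key
```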
Multi-Team Configuration
Scenario: Engineering + Data Science teams with separate budgets.
Configuration:
# Engineering team: $500/month, Claude 3.5 Haiku
curl -X PUT http://localhost:8080/api/governance/virtual-keys/vk-eng-team \
-H "Content-Type: application/json" \
-d '{
"team_id": "team-engineering",
"budget": {"max_limit": 500.00, "reset_duration": "1M"},
"provider_configs": [
{
"provider": "anthropic",
"allowed_models": ["claude-3-5-haiku-20241022"]
}
]
}'
# Data Science team: $1,000/month, Claude Opus 4
curl -X PUT http://localhost:8080/api/governance/virtual-keys/vk-ds-team \
-H "Content-Type: application/json" \
-d '{
"team_id": "team-data-science",
"budget": {"max_limit": 1000.00, "reset_duration": "1M"},
"provider_configs": [
{
"provider": "anthropic",
"allowed_models": ["claude-opus-4-20250514"]
}
]
}'
Usage:
# Engineering developer
export ANTHROPIC_API_KEY="vk-eng-team"
claude -p "refactor this function"
# Data Science researcher
export ANTHROPIC_API_KEY="vk-ds-team"
claude -p "analyze this dataset"
Cost Optimization
Semantic Caching
Enable Caching (40-60% cost reduction):
# Via Web UI: Features → Semantic Caching → Enable
How It Works:
- Similar prompts return cached responses
- Example: "fix this bug" vs "debug this code"
- Cache hit = no provider cost
- Sub-millisecond response time
Impact:
- 40-60% cost reduction for repetitive coding tasks
- Faster responses (cached results)
- Proportional budget savings
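A toy sketch of the cache-lookup idea (illustration only: real semantic caching uses embedding similarity, which also catches paraphrases like the "fix this bug" / "debug this code" example above; here word overlap stands in for it):

```python
# Semantic cache sketch: prompts whose similarity to a cached prompt
# exceeds a threshold reuse the cached response instead of calling
# the provider -- a cache hit costs nothing and returns immediately.

def similarity(a, b):
    # Jaccard word overlap as a stand-in for embedding cosine similarity.
    wa, wb = set(a.lower().split()), set(b.lower().split())
    return len(wa & wb) / len(wa | wb)

class SemanticCache:
    def __init__(self, threshold=0.5):
        self.threshold = threshold
        self.entries = []  # (prompt, response) pairs

    def get(self, prompt):
        for cached_prompt, response in self.entries:
            if similarity(prompt, cached_prompt) >= self.threshold:
                return response  # cache hit: no provider cost
        return None

    def put(self, prompt, response):
        self.entries.append((prompt, response))

cache = SemanticCache()
cache.put("fix this bug in my code", "Here is the fix ...")
assert cache.get("fix this bug in my python code") is not None  # near-duplicate hits
assert cache.get("write a haiku about autumn") is None          # unrelated prompt misses
```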
Multi-Provider Failover
Configuration:
curl -X PUT http://localhost:8080/api/governance/virtual-keys/vk-cost-optimized \
-H "Content-Type: application/json" \
-d '{
"provider_configs": [
{
"provider": "anthropic",
"weight": 0.8,
"allowed_models": ["claude-3-5-haiku-20241022"]
},
{
"provider": "openai",
"weight": 0.2,
"allowed_models": ["gpt-4o-mini"]
}
]
}'
Behavior: 80% Anthropic, 20% OpenAI (cost optimization)
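The weighted split can be sketched as weighted random selection (an assumed mechanism for illustration; Bifrost's load balancer is adaptive and may route differently under failures):

```python
import random

# Weighted provider selection sketch matching the 80/20 config above.

providers = [("anthropic", 0.8), ("openai", 0.2)]

def pick_provider(providers, rng=random):
    names, weights = zip(*providers)
    return rng.choices(names, weights=weights, k=1)[0]

rng = random.Random(42)  # seeded for a reproducible tally
tally = {"anthropic": 0, "openai": 0}
for _ in range(10_000):
    tally[pick_provider(providers, rng)] += 1

# Over many requests the split converges to roughly 80/20.
assert 0.75 < tally["anthropic"] / 10_000 < 0.85
```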
Get Started
Install Bifrost:
npx -y @maximhq/bifrost
Configure Claude Code:
export ANTHROPIC_API_KEY="your-virtual-key"
export ANTHROPIC_BASE_URL="http://localhost:8080"
Docs: https://getmax.im/bifrostdocs
GitHub: https://git.new/bifrost
Key Takeaway: Claude Code lacks enterprise governance (no budgets, rate limits, or visibility). Bifrost adds hierarchical budget controls (team/user/project levels), real-time rate limiting, complete audit trails, and unified observability—enabling safe Claude Code deployments in enterprise environments with per-user budgets, cost tracking, and Prometheus metrics.

Top comments (2)
The governance layer is the piece most teams skip until they get a surprise bill or a security incident.
Budget enforcement and rate limits are table stakes, but the audit trail is where the real value is. When an AI agent makes a change that breaks something three days later, you need to trace back: what was the prompt, what files were read, what was the full context. Without that, debugging AI-assisted code is archaeology.
One gap I see in most gateway approaches: they control the API layer but not the execution layer. An AI agent that has shell access can do damage that never touches the API. The Copilot CLI exploit this week (malware execution via an allowlisted env command) is a perfect example — the API-level controls were fine, the command execution sandbox was missing.
This hits a gap that most AI coding discussions ignore: what happens when it's not just you and your terminal, but a team of 20 developers all running agents with different budgets, permissions, and context?
The governance angle matters because AI agents amplify not just individual productivity, but individual mistakes. One developer's misconfigured agent burning through API credits is annoying. An agent with production database access making a "helpful" schema change is a different category of problem.
I think the budget and rate limit controls are table stakes, but the visibility piece is what actually changes behavior. When developers can see how their agent spends tokens and what it actually does (not just the final output), they start designing better prompts and better project structures. It's the same dynamic as CI — making failures visible and fast is what drives improvement.
The attention cost of managing AI tools is real and underestimated. Tools like this help by shifting that cost from "manually watch what the agent does" to "set guardrails and review dashboards."