
Pranay Batta

Control and Visibility for Claude Code: Enterprise Governance for AI Coding Agents

Claude Code lets developers delegate coding tasks to AI agents directly from their terminal. However, running AI coding agents in enterprise environments without governance creates significant risks: uncontrolled API costs, no visibility into what agents do, and no way to enforce budgets or rate limits.

maximhq / bifrost

Fastest enterprise AI gateway (50x faster than LiteLLM) with adaptive load balancer, cluster mode, guardrails, 1000+ models support & <100 µs overhead at 5k RPS.

Bifrost AI Gateway


The fastest way to build AI applications that never go down

Bifrost is a high-performance AI gateway that unifies access to 15+ providers (OpenAI, Anthropic, AWS Bedrock, Google Vertex, and more) through a single OpenAI-compatible API. Deploy in seconds with zero configuration and get automatic failover, load balancing, semantic caching, and enterprise-grade features.

Quick Start


Go from zero to production-ready AI gateway in under a minute.

Step 1: Start Bifrost Gateway

# Install and run locally
npx -y @maximhq/bifrost

# Or use Docker
docker run -p 8080:8080 maximhq/bifrost

Step 2: Configure via Web UI

# Open the built-in web interface
open http://localhost:8080

Step 3: Make your first API call

curl -X POST http://localhost:8080/v1/chat/completions \
  -H "Content-Type: application/json" \
  -d '{
    "model": "openai/gpt-4o-mini",
    "messages": [{"role": "user", "content": "Hello, Bifrost!"}]
  }'

That's it! Your AI gateway is running with a web interface for visual configuration…

This guide shows how to add enterprise control and visibility to Claude Code using Bifrost.


The Claude Code Governance Problem

Claude Code Without Governance:

  • No budget controls (costs can spiral)
  • Zero visibility into agent actions
  • No rate limiting (can hit provider limits)
  • No audit trails
  • No team-level cost attribution

Enterprise Requirements:

  • Per-user or per-team budgets
  • Real-time cost tracking
  • Rate limit enforcement
  • Complete audit logs
  • Usage visibility

Solution: Bifrost as Claude Code Gateway

Tools, Editors & CLI Agents - Bifrost

Use Bifrost with tools like LibreChat, Claude Code, Codex CLI, Gemini CLI and Qwen Code by just changing the base URL and unlock advanced features.

docs.getbifrost.ai

Architecture:

Claude Code CLI
    ↓ (via Bifrost proxy)
Bifrost Gateway (governance + observability)
    ↓
Anthropic Claude API

What Bifrost Adds:

  • Hierarchical budgets (team/user/project)
  • Real-time rate limiting
  • Complete request/response logging
  • Cost attribution and tracking
  • Prometheus metrics + dashboards

Setup: Claude Code with Bifrost

Step 1: Install and Configure Bifrost

# Install Bifrost
npx -y @maximhq/bifrost

# Bifrost runs at http://localhost:8080

Step 2: Configure Anthropic Provider

Add Anthropic API Key (Web UI at http://localhost:8080):

  1. Go to "Providers" → "Add Provider"
  2. Select "Anthropic"
  3. Add your API key
  4. Save

Or via API:

curl -X POST http://localhost:8080/api/providers \
  -H "Content-Type: application/json" \
  -d '{
    "provider": "anthropic",
    "keys": [
      {
        "name": "anthropic-key-1",
        "value": "env.ANTHROPIC_API_KEY",
        "weight": 1.0
      }
    ]
  }'

Step 3: Create Virtual Keys with Budgets

Per-Team Budget (Engineering team: $500/month):

# Create customer
curl -X POST http://localhost:8080/api/governance/customers \
  -H "Content-Type: application/json" \
  -d '{
    "name": "Acme Corp",
    "budget": {
      "max_limit": 5000.00,
      "reset_duration": "1M"
    }
  }'

# Create team
curl -X POST http://localhost:8080/api/governance/teams \
  -H "Content-Type: application/json" \
  -d '{
    "name": "Engineering Team",
    "customer_id": "customer-acme",
    "budget": {
      "max_limit": 500.00,
      "reset_duration": "1M"
    }
  }'

# Create virtual key for team
curl -X PUT http://localhost:8080/api/governance/virtual-keys/vk-eng-team \
  -H "Content-Type: application/json" \
  -d '{
    "team_id": "team-engineering",
    "budget": {
      "max_limit": 500.00,
      "reset_duration": "1M"
    },
    "rate_limit": {
      "request_max_limit": 1000,
      "request_reset_duration": "1h",
      "token_max_limit": 500000,
      "token_reset_duration": "1h"
    }
  }'

Per-User Budget (Developer: $50/month):

curl -X PUT http://localhost:8080/api/governance/virtual-keys/vk-dev-alice \
  -H "Content-Type: application/json" \
  -d '{
    "team_id": "team-engineering",
    "budget": {
      "max_limit": 50.00,
      "reset_duration": "1M"
    },
    "rate_limit": {
      "request_max_limit": 100,
      "request_reset_duration": "1h"
    }
  }'
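When several developers need the same limits, the per-user request body can be generated rather than retyped. The following is a minimal sketch, assuming the field names shown in the requests above; the `vk_payload` helper and the developer names are hypothetical and not part of Bifrost.

```shell
# Print the JSON body for a per-user virtual key.
# Field names are copied from the requests above; vk_payload itself is a
# hypothetical helper, not a Bifrost command.
vk_payload() {
  local team_id="$1" budget_usd="$2" requests_per_hour="$3"
  printf '{"team_id": "%s", "budget": {"max_limit": %s, "reset_duration": "1M"}, ' \
    "$team_id" "$budget_usd"
  printf '"rate_limit": {"request_max_limit": %s, "request_reset_duration": "1h"}}\n' \
    "$requests_per_hour"
}

# Print one payload per developer (names are placeholders).
for dev in alice bob; do
  echo "vk-dev-$dev: $(vk_payload team-engineering 50.00 100)"
done
```

To apply a payload against a running gateway, pass the helper's output as the `-d` argument of the same `curl -X PUT` call used above, with the virtual key name in the URL.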

Step 4: Configure Claude Code to Use Bifrost

Set Environment Variables:

export ANTHROPIC_API_KEY="vk-dev-alice"  # Virtual key, not direct API key
export ANTHROPIC_BASE_URL="http://localhost:8080"

Or create .env file:

ANTHROPIC_API_KEY=vk-dev-alice
ANTHROPIC_BASE_URL=http://localhost:8080
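A common mistake at this step is leaving a direct provider key in the environment, which would bypass the virtual key's limits. The sketch below is a hypothetical preflight check, not something Bifrost or Claude Code ships. It assumes virtual keys follow this guide's `vk-` naming convention and relies on the fact that direct Anthropic keys begin with `sk-ant-`.

```shell
# Preflight check before launching Claude Code through the gateway.
# check_gateway_env is a hypothetical helper; the "vk-" prefix is only the
# naming convention used in this guide.
check_gateway_env() {
  if [ -z "${ANTHROPIC_BASE_URL:-}" ]; then
    echo "ANTHROPIC_BASE_URL is not set" >&2
    return 1
  fi
  case "${ANTHROPIC_API_KEY:-}" in
    "")
      echo "ANTHROPIC_API_KEY is not set" >&2
      return 1 ;;
    sk-ant-*)
      echo "ANTHROPIC_API_KEY looks like a direct provider key" >&2
      return 1 ;;
    vk-*)
      ;;  # looks like a virtual key
    *)
      echo "warning: key does not use the vk- prefix" >&2 ;;
  esac
  echo "ok: requests will go to $ANTHROPIC_BASE_URL"
}
```

It can be chained in front of the CLI, for example `check_gateway_env && claude`, so a misconfigured shell fails before any request is sent.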

Step 5: Use Claude Code Normally

# Claude Code now routes through Bifrost
claude "create a REST API for user management"

# All requests governed by virtual key rules:
# - Budget checked
# - Rate limits enforced
# - Full audit logging
# - Cost tracking

Governance Features

Hierarchical Budget Enforcement

Budget Hierarchy (all checked for every request):

Customer: Acme Corp ($5,000/month)
    ↓
Team: Engineering ($500/month)
    ↓
User: Alice ($50/month)

Budget Checking Flow:

  1. ✅ Check user budget ($48 of $50 used)
  2. ✅ Check team budget ($450 of $500 used)
  3. ✅ Check customer budget ($4,800 of $5,000 used)
  4. Request proceeds (all budgets pass)
  5. After request ($2 cost), usage becomes:
    • User: $50/$50
    • Team: $452/$500
    • Customer: $4,802/$5,000

Next Request: Blocked (user budget exceeded)

Error response:

{
  "error": {
    "type": "budget_exceeded",
    "message": "Budget exceeded: VK budget exceeded: 50.00 > 50.00 dollars"
  }
}
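The checking flow above can be modeled in a few lines. This is a toy sketch of the logic only, not Bifrost's implementation: it assumes a request is blocked as soon as any level's usage has reached its limit, and it tracks amounts in whole cents to avoid floating point in the shell. The `budget_allows` name is hypothetical.

```shell
# Toy model of the hierarchical check: a request passes only if every level
# still has budget left. Each argument is "used:limit" in whole cents.
budget_allows() {
  local level used limit
  for level in "$@"; do
    used="${level%%:*}"
    limit="${level##*:}"
    if [ "$used" -ge "$limit" ]; then
      return 1   # this level is exhausted, so the request is blocked
    fi
  done
  return 0
}

# Arguments: user, team, customer (values from the walkthrough above).
budget_allows 4800:5000 45000:50000 480000:500000 && echo "request proceeds"
budget_allows 5000:5000 45200:50000 480200:500000 || echo "blocked: user budget exceeded"
```

Because every level is checked, a developer with personal budget left is still blocked once the team or customer budget runs out.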

Rate Limiting

Per-User Rate Limits:

curl -X PUT http://localhost:8080/api/governance/virtual-keys/vk-dev-alice \
  -H "Content-Type: application/json" \
  -d '{
    "rate_limit": {
      "request_max_limit": 100,
      "request_reset_duration": "1h",
      "token_max_limit": 50000,
      "token_reset_duration": "1h"
    }
  }'

Behavior:

  • Max 100 requests per hour
  • Max 50,000 tokens per hour
  • Exceeding either limit returns a 429 error

Error Response:

{
  "error": {
    "type": "rate_limited",
    "message": "Rate limits exceeded: [request limit exceeded (101/100, resets every 1h)]"
  }
}
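The "resets every 1h" wording in the error suggests a counter that is cleared at fixed intervals. The sketch below models that behavior as a fixed-window limiter. It is an illustration under that assumption, not Bifrost's actual algorithm, and the variable and function names are hypothetical.

```shell
# Toy fixed-window limiter: counts requests in the current window and
# rejects once the limit has been reached. State lives in shell variables.
RL_LIMIT=100        # request_max_limit
RL_WINDOW=3600      # request_reset_duration (1h) in seconds
rl_count=0
rl_window_start=0

rl_allow() {
  local now="$1"    # current time in epoch seconds, passed in for testability
  if [ $((now - rl_window_start)) -ge "$RL_WINDOW" ]; then
    rl_window_start="$now"   # a new window begins, so the counter resets
    rl_count=0
  fi
  if [ "$rl_count" -ge "$RL_LIMIT" ]; then
    return 1                 # the gateway would answer 429 here
  fi
  rl_count=$((rl_count + 1))
  return 0
}

# 101 requests inside one window: only the last one is rejected.
i=1
while [ "$i" -le 101 ]; do
  rl_allow 1000000 || echo "request $i rejected"
  i=$((i + 1))
done

# One hour later the window has reset.
rl_allow 1003600 && echo "request after reset accepted"
```

A token limit would follow the same pattern, adding the request's token count to the counter instead of 1.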

Model Access Control

Restrict to Specific Models:

curl -X PUT http://localhost:8080/api/governance/virtual-keys/vk-dev-alice \
  -H "Content-Type: application/json" \
  -d '{
    "provider_configs": [
      {
        "provider": "anthropic",
        "allowed_models": ["claude-3-5-haiku-20241022"]
      }
    ]
  }'

Behavior: Requests for any model outside the allowlist (for example, claude-3-opus-20240229) are blocked with a 403 error
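The allowlist rule can be pictured as a simple membership test. This sketch assumes exact string matching on the model ID, which is an assumption about the gateway's behavior rather than something the documentation quoted here confirms; `model_allowed` is a hypothetical name.

```shell
# Toy allowlist check mirroring "allowed_models": a model passes only if it
# matches one of the listed IDs exactly.
# Usage: model_allowed REQUESTED_MODEL ALLOWED_MODEL...
model_allowed() {
  local requested="$1"
  shift
  local allowed
  for allowed in "$@"; do
    if [ "$requested" = "$allowed" ]; then
      return 0
    fi
  done
  return 1   # the gateway would answer 403 here
}

model_allowed claude-3-5-haiku-20241022 claude-3-5-haiku-20241022 && echo "allowed"
model_allowed claude-3-opus-20240229 claude-3-5-haiku-20241022 || echo "blocked (403)"
```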


Observability and Visibility

Built-in Dashboard

Access: http://localhost:8080

Real-Time Visibility:

  • Request logs (prompt, response, tokens, cost)
  • Cost tracking per user/team/customer
  • Rate limit utilization
  • Token usage trends
  • Latency distribution

Prometheus Metrics

Metrics Endpoint: http://localhost:8080/metrics

Key Metrics:

# Total cost by virtual key
sum(bifrost_cost_total) by (vk)

# Budget utilization
sum(budget_usage) by (vk) / sum(budget_limit) by (vk)

# Requests per user
sum(rate(bifrost_requests_total[5m])) by (vk)

# Token usage
sum(bifrost_tokens_total) by (vk, token_type)
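Without a Prometheus server, the same per-key totals can be read straight from the metrics endpoint's text output. The following is a rough sketch: the metric and label names follow the queries above, the sample lines are invented for illustration, and `sum_by_vk` is a hypothetical helper. It assumes each sample line ends with the value (no trailing timestamp).

```shell
# Sum a counter per virtual key from Prometheus text-format output on stdin.
sum_by_vk() {
  local metric="$1"
  awk -v metric="$metric" '
    index($0, metric "{") == 1 {
      # Pull the value of the vk label out of the label set.
      if (match($0, /vk="[^"]*"/)) {
        vk = substr($0, RSTART + 4, RLENGTH - 5)
        total[vk] += $NF   # last field is the sample value
      }
    }
    END { for (vk in total) printf "%s %.2f\n", vk, total[vk] }
  ' | sort
}

# Made-up sample in the Prometheus exposition format.
sample='# TYPE bifrost_cost_total counter
bifrost_cost_total{vk="vk-dev-alice",model="claude-3-5-haiku-20241022"} 1.25
bifrost_cost_total{vk="vk-dev-alice",model="claude-3-opus-20240229"} 0.75
bifrost_cost_total{vk="vk-eng-team",model="claude-3-5-haiku-20241022"} 3.50'

printf '%s\n' "$sample" | sum_by_vk bifrost_cost_total
```

Against a live gateway, the sample would be replaced by the output of `curl -s http://localhost:8080/metrics`.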

Alerting (Prometheus):

groups:
  - name: claude_code_budgets
    rules:
      - alert: UserBudgetNearLimit
        expr: (budget_usage{vk="vk-dev-alice"} / budget_limit{vk="vk-dev-alice"}) > 0.8
        labels:
          severity: warning
        annotations:
          summary: "Alice approaching budget limit (>80%)"

      - alert: TeamBudgetCritical
        expr: (team_budget_usage / team_budget_limit) > 0.9
        labels:
          severity: critical
        annotations:
          summary: "Engineering team 90% budget consumed"

Complete Audit Trails

Request Logging:

Every Claude Code request is logged with:

  • Virtual key used
  • User ID (via x-bf-user-id header)
  • Model requested
  • Token usage (input + output)
  • Cost calculated
  • Timestamp
  • Latency

Query Logs (via dashboard or API):

# Get all requests for user Alice
curl "http://localhost:8080/api/logs?vk=vk-dev-alice"

# Filter by date range
curl "http://localhost:8080/api/logs?vk=vk-dev-alice&start=2026-02-01&end=2026-02-28"
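For repeated audits it can help to build these URLs in one place, which also keeps the `?` and `&` characters safely quoted. A small sketch, assuming only the `vk`, `start`, and `end` parameters used in the requests above; `logs_url` is a hypothetical helper.

```shell
# Build a log query URL from a virtual key and an optional date range.
logs_url() {
  local vk="$1" start="${2:-}" end="${3:-}"
  local url="http://localhost:8080/api/logs?vk=$vk"
  if [ -n "$start" ]; then url="$url&start=$start"; fi
  if [ -n "$end" ]; then url="$url&end=$end"; fi
  echo "$url"
}

logs_url vk-dev-alice
logs_url vk-dev-alice 2026-02-01 2026-02-28
```

The result can be passed directly to curl, for example `curl -s "$(logs_url vk-dev-alice 2026-02-01 2026-02-28)"`.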

Multi-Team Configuration

Scenario: Engineering + Data Science teams with separate budgets.

Configuration:

# Engineering team: $500/month, Claude 3.5 Haiku
curl -X PUT http://localhost:8080/api/governance/virtual-keys/vk-eng-team \
  -H "Content-Type: application/json" \
  -d '{
    "team_id": "team-engineering",
    "budget": {"max_limit": 500.00, "reset_duration": "1M"},
    "provider_configs": [
      {
        "provider": "anthropic",
        "allowed_models": ["claude-3-5-haiku-20241022"]
      }
    ]
  }'

# Data Science team: $1,000/month, Claude Opus 4
curl -X PUT http://localhost:8080/api/governance/virtual-keys/vk-ds-team \
  -H "Content-Type: application/json" \
  -d '{
    "team_id": "team-data-science",
    "budget": {"max_limit": 1000.00, "reset_duration": "1M"},
    "provider_configs": [
      {
        "provider": "anthropic",
        "allowed_models": ["claude-opus-4-20250514"]
      }
    ]
  }'

Usage:

# Engineering developer
export ANTHROPIC_API_KEY="vk-eng-team"
claude "refactor this function"

# Data Science researcher
export ANTHROPIC_API_KEY="vk-ds-team"
claude "analyze this dataset"

Cost Optimization

Semantic Caching

Enable Caching (40-60% cost reduction):

# Via Web UI: Features → Semantic Caching → Enable

How It Works:

  • Similar prompts return cached responses
  • Example: "fix this bug" vs "debug this code"
  • Cache hit = no provider cost
  • Sub-millisecond response time

Impact:

  • 40-60% cost reduction for repetitive coding tasks
  • Faster responses (cached results)
  • Proportional budget savings
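To see what "proportional budget savings" means in dollars, here is a back-of-the-envelope calculation. It assumes cache hits cost nothing and misses cost the same as before, so the hit rate equals the cost reduction; real savings depend on how repetitive the workload is. `effective_spend` is a hypothetical helper.

```shell
# Provider spend after caching = baseline spend x (1 - cache hit rate).
effective_spend() {
  local baseline_usd="$1" hit_rate="$2"
  awk -v base="$baseline_usd" -v hit="$hit_rate" \
    'BEGIN { printf "%.2f\n", base * (1 - hit) }'
}

effective_spend 500 0.40   # $500/month team at a 40% hit rate
effective_spend 500 0.60   # same team at a 60% hit rate
```

Under these assumptions the $500/month Engineering budget would see between $200 and $300 of actual provider spend for the same workload.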

Multi-Provider Failover

Configuration:

curl -X PUT http://localhost:8080/api/governance/virtual-keys/vk-cost-optimized \
  -H "Content-Type: application/json" \
  -d '{
    "provider_configs": [
      {
        "provider": "anthropic",
        "weight": 0.8,
        "allowed_models": ["claude-3-5-haiku-20241022"]
      },
      {
        "provider": "openai",
        "weight": 0.2,
        "allowed_models": ["gpt-4o-mini"]
      }
    ]
  }'

Behavior: Roughly 80% of requests are routed to Anthropic and 20% to OpenAI, in proportion to the configured weights (cost optimization)
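Weighted routing of this kind is commonly implemented by mapping a random number onto cumulative weights. The sketch below illustrates that general technique; it is not taken from Bifrost's source, and `pick_provider` is a hypothetical name. The random roll is passed in as an argument so the result is reproducible.

```shell
# Toy weighted pick: map a roll in [0, 1) onto cumulative weights.
# Usage: pick_provider ROLL PROVIDER:WEIGHT...
pick_provider() {
  local roll="$1"
  shift
  printf '%s\n' "$@" | awk -F: -v roll="$roll" '
    { total += $2; names[NR] = $1; upto[NR] = total }
    END {
      for (i = 1; i <= NR; i++)
        if (roll * total < upto[i]) { print names[i]; exit }
      print names[NR]   # fallback if rounding leaves the roll uncovered
    }'
}

pick_provider 0.50 anthropic:0.8 openai:0.2   # falls in the first 80%
pick_provider 0.90 anthropic:0.8 openai:0.2   # falls in the last 20%
```

For a live pick, the roll could come from `awk 'BEGIN { srand(); print rand() }'`; over many requests the split approaches the configured 80/20.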


Get Started

Install Bifrost:

npx -y @maximhq/bifrost

Configure Claude Code:

export ANTHROPIC_API_KEY="your-virtual-key"
export ANTHROPIC_BASE_URL="http://localhost:8080"

Docs: https://getmax.im/bifrostdocs

GitHub: https://git.new/bifrost


Key Takeaway: Claude Code on its own has no budgets, rate limits, or usage visibility. Routing it through Bifrost adds hierarchical budget controls (team/user/project levels), real-time rate limiting, complete audit trails, and Prometheus-based observability, which makes it practical to deploy Claude Code in enterprise environments with per-user budgets and cost tracking.

Top comments (2)

Matthew Hou

The governance layer is the piece most teams skip until they get a surprise bill or a security incident.

Budget enforcement and rate limits are table stakes, but the audit trail is where the real value is. When an AI agent makes a change that breaks something three days later, you need to trace back: what was the prompt, what files were read, what was the full context. Without that, debugging AI-assisted code is archaeology.

One gap I see in most gateway approaches: they control the API layer but not the execution layer. An AI agent that has shell access can do damage that never touches the API. The Copilot CLI exploit this week (malware execution via allowlisted env command) is a perfect example — the API-level controls were fine, the command execution sandbox was missing.

Matthew Hou

This hits a gap that most AI coding discussions ignore: what happens when it's not just you and your terminal, but a team of 20 developers all running agents with different budgets, permissions, and context?

The governance angle matters because AI agents amplify not just individual productivity, but individual mistakes. One developer's misconfigured agent burning through API credits is annoying. An agent with production database access making a "helpful" schema change is a different category of problem.

I think the budget and rate limit controls are table stakes, but the visibility piece is what actually changes behavior. When developers can see how their agent spends tokens and what it actually does (not just the final output), they start designing better prompts and better project structures. It's the same dynamic as CI — making failures visible and fast is what drives improvement.

The attention cost of managing AI tools is real and underestimated. Tools like this help by shifting that cost from "manually watch what the agent does" to "set guardrails and review dashboards."