Pranay Batta

Posted on Apr 20

How to Govern Claude Code Usage Across Engineering Teams

#claude #ai #devops #llm

Claude Code is powerful; maybe too powerful to run without guardrails.

I came across a case where a mid-sized startup had three engineering teams adopt it independently. Within two weeks, their bill hit $4,200. There was no breakdown of who spent what, no audit trail, and no rate limits. Just usage piling up and a growing invoice.

If your org is adopting Claude Code, you need centralized governance. I tested Bifrost as an AI gateway layer to solve exactly this. Here is how I set it up.

The Problem: Ungoverned Claude Code

Claude Code runs locally on each developer's machine. Every developer has their own API key. This means:

No visibility into per-developer or per-team spend
No rate limiting, so one runaway agent loop can burn through your budget
No audit trail of which tools were called or what code was generated
No control over which MCP tools Claude Code can access
No way to enforce org-wide policies

You need a proxy layer between Claude Code and the LLM provider. That is what an AI gateway does.

Setting Up Bifrost as Your Claude Code Gateway

Bifrost is a Go-based AI gateway that adds roughly 11 microseconds of latency overhead per request, according to the project's published benchmarks. Deploy it with a single command:

npx -y @maximhq/bifrost

Or via Docker:

docker run -p 8080:8080 ghcr.io/maximhq/bifrost:latest

Point Claude Code at it by setting the base URL and key in Claude Code's settings file (~/.claude/settings.json), using its env block:

{
  "env": {
    "ANTHROPIC_BASE_URL": "http://localhost:8080/anthropic",
    "ANTHROPIC_API_KEY": "vk_team_frontend_abc123"
  }
}

That ANTHROPIC_API_KEY value is not an Anthropic key. It is a Bifrost virtual key. This is where governance starts.

Virtual Keys: Per-Developer Access Control

Virtual keys let you issue scoped credentials to each developer or team. Each virtual key maps to an underlying provider key but adds access controls on top.

Create a virtual key per team:

# bifrost.yaml
virtual_keys:
  - id: "vk_team_frontend"
    name: "Frontend Team"
    provider_config: "anthropic_prod"
    allowed_models:
      - "claude-sonnet-4-20250514"
    rate_limit:
      requests_per_minute: 60
      tokens_per_minute: 100000

  - id: "vk_team_backend"
    name: "Backend Team"
    provider_config: "anthropic_prod"
    allowed_models:
      - "claude-sonnet-4-20250514"
      - "claude-opus-4-20250514"
    rate_limit:
      requests_per_minute: 120
      tokens_per_minute: 200000

  - id: "vk_dev_rahul"
    name: "Rahul - Backend"
    provider_config: "anthropic_prod"
    allowed_models:
      - "claude-sonnet-4-20250514"
    rate_limit:
      requests_per_minute: 30
      tokens_per_minute: 50000

Each developer gets their own virtual key. They never see the actual Anthropic API key. You revoke access by deleting the virtual key. No key rotation needed on the provider side.
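To make the mapping concrete, here is a minimal Python sketch of what a gateway does with a virtual key: look it up, check the requested model against allowed_models, and only then substitute the real provider key. The class and method names are hypothetical and this is not Bifrost's internal code; the key IDs and model names mirror the YAML above.

```python
from dataclasses import dataclass, field


@dataclass
class VirtualKey:
    key_id: str
    provider_config: str
    allowed_models: set[str] = field(default_factory=set)


class KeyRegistry:
    """Hypothetical in-memory registry mapping virtual keys to provider keys."""

    def __init__(self, provider_keys: dict[str, str]):
        # provider_config name -> real provider API key (never handed to developers)
        self._provider_keys = provider_keys
        self._virtual_keys: dict[str, VirtualKey] = {}

    def add(self, vk: VirtualKey) -> None:
        self._virtual_keys[vk.key_id] = vk

    def revoke(self, key_id: str) -> None:
        # Revocation only deletes the virtual key; the provider key is untouched.
        self._virtual_keys.pop(key_id, None)

    def resolve(self, key_id: str, model: str) -> str:
        """Return the provider key to forward with, or raise PermissionError."""
        vk = self._virtual_keys.get(key_id)
        if vk is None:
            raise PermissionError(f"unknown or revoked virtual key: {key_id}")
        if model not in vk.allowed_models:
            raise PermissionError(f"{key_id} may not use {model}")
        return self._provider_keys[vk.provider_config]
```

Revoking vk_dev_rahul makes resolve() fail immediately for that developer, while the anthropic_prod provider key keeps working for every other virtual key.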

Check the virtual keys documentation for tool-level scoping options.

Budget Hierarchy: Cap Spend at Every Level

Bifrost supports a four-tier budget hierarchy: Customer, Team, Virtual Key, and Provider Config. This maps cleanly to engineering org structures.

budgets:
  org_level:
    monthly_limit_usd: 10000

  teams:
    - name: "frontend"
      monthly_limit_usd: 2000
      alert_threshold: 0.8

    - name: "backend"
      monthly_limit_usd: 4000
      alert_threshold: 0.8

    - name: "ml_platform"
      monthly_limit_usd: 3000
      alert_threshold: 0.8

  virtual_key_overrides:
    - id: "vk_dev_rahul"
      daily_limit_usd: 50
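One way to read this hierarchy: a request goes through only if every level that contains it still has headroom. The sketch below assumes spend is tracked per level in USD and that the request's cost is known up front, and it treats vk_dev_rahul's $50 daily cap as the innermost level for a single day. The Budget class and allow_request function are hypothetical, not Bifrost's API.

```python
from dataclasses import dataclass


@dataclass
class Budget:
    name: str
    limit_usd: float
    spent_usd: float = 0.0

    def has_room(self, cost_usd: float) -> bool:
        return self.spent_usd + cost_usd <= self.limit_usd


def allow_request(chain: list[Budget], cost_usd: float) -> bool:
    """Allow only if every level (org -> team -> virtual key) can absorb the cost."""
    if all(level.has_room(cost_usd) for level in chain):
        for level in chain:
            level.spent_usd += cost_usd  # charge every level once the request is allowed
        return True
    return False  # nothing is charged when any level would overflow
```

With org = Budget("org", 10000), backend = Budget("backend", 4000), and rahul = Budget("vk_dev_rahul", 50), a $30 request passes and a second $30 request is refused by the innermost level, even though the team and org still have plenty left.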

When a team hits 80% of its budget, you get an alert. When it hits 100%, requests are blocked. No more surprise bills.
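The alert-then-block behavior is a simple function of the spend ratio. A sketch using the alert_threshold of 0.8 from the config above; the BudgetState names are illustrative, not Bifrost's.

```python
from enum import Enum


class BudgetState(Enum):
    OK = "ok"
    ALERT = "alert"      # notify budget owners, keep serving requests
    BLOCKED = "blocked"  # reject requests until the budget period resets


def budget_state(spent_usd: float, limit_usd: float, alert_threshold: float = 0.8) -> BudgetState:
    ratio = spent_usd / limit_usd
    if ratio >= 1.0:
        return BudgetState.BLOCKED
    if ratio >= alert_threshold:
        return BudgetState.ALERT
    return BudgetState.OK
```

For the frontend team's $2,000 limit, the alert fires at $1,600 and blocking starts at $2,000.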

The daily limit on individual virtual keys is useful for catching runaway Claude Code agent loops. If a developer accidentally triggers an infinite tool-call cycle, it burns through $50 and stops. Not $500.
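A daily cap resets every day, which is what makes it a good circuit breaker for loops. Here is a minimal sketch that tracks spend per virtual key per calendar date; the DailyCap class is hypothetical, and the $0.50 per-call cost in the usage note is an assumed figure, not real Claude pricing.

```python
from collections import defaultdict
from datetime import date


class DailyCap:
    """Hypothetical per-key daily spend cap that resets when the date changes."""

    def __init__(self, limit_usd: float):
        self.limit_usd = limit_usd
        self._spent: defaultdict[tuple[str, date], float] = defaultdict(float)

    def charge(self, key_id: str, day: date, cost_usd: float) -> bool:
        """Record the cost and return True, or return False if it would exceed the cap."""
        if self._spent[(key_id, day)] + cost_usd > self.limit_usd:
            return False
        self._spent[(key_id, day)] += cost_usd
        return True


def calls_before_cutoff(cap: DailyCap, key_id: str, day: date,
                        cost_per_call: float, max_calls: int) -> int:
    """Simulate a runaway loop and count how many calls succeed before the cap stops it."""
    allowed = 0
    for _ in range(max_calls):
        if not cap.charge(key_id, day, cost_per_call):
            break
        allowed += 1
    return allowed
```

At an assumed $0.50 per call, a loop on vk_dev_rahul is cut off after 100 calls ($50); the same key can spend again the next day.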

Audit Logging: Track Every Tool Call

This is the part that convinced me. Bifrost logs every request with granular detail. For MCP tool calls specifically, you get:

Tool name
Server name
Arguments passed
Results returned
Latency per call
Virtual key ID (so you know which developer triggered it)

Check per-tool audit logging docs for the full schema.

Query logs to answer questions like:

Which developer made the most LLM calls this week?
What tools is the frontend team using in Claude Code?
How much did code generation cost per team last month?
Are any developers hitting rate limits frequently?
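Once logs are exported, each of those questions reduces to a simple aggregation. A sketch over in-memory records, assuming each record is a dict with virtual_key_id, team, tool_name, cost_usd, and status fields (these names are assumptions, not Bifrost's export format); filter by timestamp first to get "this week" or "last month".

```python
from collections import Counter, defaultdict
from typing import Iterable


def calls_per_key(logs: Iterable[dict]) -> Counter:
    """Which developer (virtual key) made the most calls?"""
    return Counter(rec["virtual_key_id"] for rec in logs)


def tools_by_team(logs: Iterable[dict], team: str) -> set[str]:
    """Which tools has a given team used? Plain LLM calls have no tool_name."""
    return {rec["tool_name"] for rec in logs if rec["team"] == team and rec.get("tool_name")}


def cost_per_team(logs: Iterable[dict]) -> dict[str, float]:
    """Total spend per team."""
    totals: defaultdict[str, float] = defaultdict(float)
    for rec in logs:
        totals[rec["team"]] += rec["cost_usd"]
    return dict(totals)


def rate_limited_keys(logs: Iterable[dict], min_hits: int = 5) -> list[str]:
    """Keys that received at least min_hits HTTP 429 responses."""
    hits = Counter(rec["virtual_key_id"] for rec in logs if rec["status"] == 429)
    return sorted(key for key, n in hits.items() if n >= min_hits)
```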

This is the kind of audit trail that SOC 2 audits and internal cost attribution depend on.

Rate Limiting: Prevent Runaway Usage

I already showed rate limits in the virtual key config. But let me explain why this matters specifically for Claude Code.

Claude Code in agent mode can make dozens of LLM calls per task. A single "refactor this module" command might trigger 15-20 API calls. Without rate limits, one developer running complex refactors back-to-back can consume your entire daily budget in an hour.

Set conservative limits per developer:

rate_limit:
  requests_per_minute: 30
  tokens_per_minute: 50000
  concurrent_requests: 3
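How a gateway enforces these three numbers is an implementation detail. A common approach is a sliding one-minute window for requests and tokens plus a counter of in-flight requests; the sketch below implements that approach for a single virtual key. It is an assumption about the general technique, not Bifrost's actual algorithm, and it takes timestamps as arguments so it is easy to test.

```python
from collections import deque


class KeyRateLimiter:
    """Illustrative sliding-window limiter for one virtual key (single-threaded)."""

    WINDOW_S = 60.0

    def __init__(self, requests_per_minute: int, tokens_per_minute: int, concurrent_requests: int):
        self.rpm = requests_per_minute
        self.tpm = tokens_per_minute
        self.max_concurrent = concurrent_requests
        self._events: deque[tuple[float, int]] = deque()  # (timestamp, tokens) in the window
        self._in_flight = 0

    def _evict(self, now: float) -> None:
        # Drop events that have fallen out of the one-minute window
        while self._events and now - self._events[0][0] >= self.WINDOW_S:
            self._events.popleft()

    def try_acquire(self, now: float, tokens: int) -> bool:
        """Return True if the request may start; False means respond with HTTP 429."""
        self._evict(now)
        used_tokens = sum(t for _, t in self._events)
        if (
            len(self._events) >= self.rpm
            or used_tokens + tokens > self.tpm
            or self._in_flight >= self.max_concurrent
        ):
            return False
        self._events.append((now, tokens))
        self._in_flight += 1
        return True

    def release(self) -> None:
        """Call when a request finishes so another concurrent slot opens."""
        self._in_flight = max(0, self._in_flight - 1)
```

With the values above, a fourth simultaneous request from the same key is rejected even when the per-minute request and token budgets still have room.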

This still allows normal Claude Code usage. But it prevents the scenario where someone kicks off a massive agent task and walks away.

Bifrost handles rate limiting at the gateway level with sub-millisecond overhead. When a developer exceeds a limit, the gateway returns a clear HTTP 429 (Too Many Requests), and Claude Code's built-in retry logic backs off and tries again.
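On the client side, the standard pattern for handling 429s is retrying with exponential backoff and jitter, honoring a Retry-After value when the server sends one. The sketch below shows that generic pattern; it is not Claude Code's actual retry code, and send_request is a hypothetical callable standing in for the HTTP call.

```python
import random
import time
from typing import Callable, Optional


def call_with_backoff(
    send_request: Callable[[], tuple[int, Optional[float]]],
    max_retries: int = 5,
    base_delay_s: float = 1.0,
    max_delay_s: float = 30.0,
    sleep: Callable[[float], None] = time.sleep,
) -> int:
    """Retry on HTTP 429 with exponential backoff and full jitter.

    send_request (hypothetical) returns (status_code, retry_after_seconds or None).
    Returns the final status code.
    """
    attempt = 0
    while True:
        status, retry_after = send_request()
        if status != 429 or attempt >= max_retries:
            return status
        if retry_after is not None:
            delay = retry_after  # the server said how long to wait
        else:
            # Full jitter: uniform in [0, min(cap, base * 2**attempt)]
            delay = random.uniform(0, min(max_delay_s, base_delay_s * 2 ** attempt))
        sleep(delay)
        attempt += 1
```

Injecting sleep keeps the function testable; in production the default time.sleep is used.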

MCP Gateway: Control Which Tools Claude Code Can Access

This is the governance layer that most teams miss. Claude Code can connect to MCP servers that expose file system access, database queries, and deployment tools. You need to control which tools each team can use.

Bifrost acts as an MCP gateway. You expose a single /mcp endpoint and control tool access per virtual key.

mcp:
  servers:
    - name: "filesystem"
      url: "http://localhost:9001"
      allowed_tools:
        - "read_file"
        - "write_file"
        - "list_directory"

    - name: "database"
      url: "http://localhost:9002"
      allowed_tools:
        - "query"
        - "describe_table"

    - name: "deployment"
      url: "http://localhost:9003"
      allowed_tools:
        - "deploy_staging"
        - "rollback"

  virtual_key_permissions:
    "vk_team_frontend":
      - "filesystem"
    "vk_team_backend":
      - "filesystem"
      - "database"
    "vk_team_platform":
      - "filesystem"
      - "database"
      - "deployment"

Frontend developers get filesystem tools only, so they cannot query databases or trigger deployments through Claude Code. Backend developers add database access but still cannot deploy. Only the platform team gets all three servers.
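The permission rule is effectively a two-step lookup: the virtual key must be granted the server, and the server must expose the tool. A sketch using the same names as the YAML above; the data structures and functions are illustrative, not Bifrost code.

```python
# Tools each MCP server exposes through the gateway (from the mcp.servers config)
ALLOWED_TOOLS: dict[str, set[str]] = {
    "filesystem": {"read_file", "write_file", "list_directory"},
    "database": {"query", "describe_table"},
    "deployment": {"deploy_staging", "rollback"},
}

# Servers each virtual key may reach (from mcp.virtual_key_permissions)
KEY_SERVERS: dict[str, set[str]] = {
    "vk_team_frontend": {"filesystem"},
    "vk_team_backend": {"filesystem", "database"},
    "vk_team_platform": {"filesystem", "database", "deployment"},
}


def can_call(virtual_key: str, server: str, tool: str) -> bool:
    """Deny by default: unknown keys, servers, or tools are all rejected."""
    return (
        server in KEY_SERVERS.get(virtual_key, set())
        and tool in ALLOWED_TOOLS.get(server, set())
    )


def visible_tools(virtual_key: str) -> set[tuple[str, str]]:
    """(server, tool) pairs a key would see when listing tools through the gateway."""
    return {
        (server, tool)
        for server in KEY_SERVERS.get(virtual_key, set())
        for tool in ALLOWED_TOOLS.get(server, set())
    }
```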

Bifrost's MCP support also includes Code Mode, which the project reports cuts token usage by 50% or more, with sub-3ms latency. So you get governance without a noticeable performance penalty.

Putting It All Together

Here is the minimal setup to govern Claude Code across a 20-person engineering team:

Deploy Bifrost (single binary, zero config)
Create virtual keys per developer
Set budget limits per team and per developer
Configure rate limits
Route MCP tools through the gateway with per-team permissions
Point each developer's Claude Code config at the gateway
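Because these pieces reference each other by ID, a small pre-deploy check can catch mismatches. The sketch below assumes the configuration has already been loaded into plain Python dicts (for example with a YAML parser); the structure mirrors the snippets in this post, but the lint_config function and its checks are hypothetical. Run against those snippets, it flags that vk_team_platform is granted MCP servers without a matching virtual key definition.

```python
def lint_config(cfg: dict) -> list[str]:
    """Return human-readable problems found in a gateway config dict."""
    problems: list[str] = []
    defined_keys = {vk["id"] for vk in cfg.get("virtual_keys", [])}
    budgets = cfg.get("budgets", {})
    mcp = cfg.get("mcp", {})

    # Team budgets should fit inside the org-level budget
    org_limit = budgets.get("org_level", {}).get("monthly_limit_usd")
    team_total = sum(t["monthly_limit_usd"] for t in budgets.get("teams", []))
    if org_limit is not None and team_total > org_limit:
        problems.append(f"team budgets (${team_total}) exceed org limit (${org_limit})")

    # Overrides and permissions must point at keys that actually exist
    for override in budgets.get("virtual_key_overrides", []):
        if override["id"] not in defined_keys:
            problems.append(f"budget override for undefined key {override['id']}")

    servers = {s["name"] for s in mcp.get("servers", [])}
    for key_id, granted in mcp.get("virtual_key_permissions", {}).items():
        if key_id not in defined_keys:
            problems.append(f"MCP permissions for undefined key {key_id}")
        for server in granted:
            if server not in servers:
                problems.append(f"{key_id} granted unknown MCP server {server}")
    return problems
```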

Total setup time when I did this: about 45 minutes. Most of that was deciding on budget allocations.

The Bifrost docs cover each of these in detail. The GitHub repo has example configs for common setups.

What I Would Do Differently

After running this for a week, a few notes:

Start with generous rate limits and tighten based on actual usage data. Too strict and developers complain.
Set daily limits, not just monthly. Monthly limits let someone blow the budget on day 1.
Review audit logs weekly. You will find patterns. Some developers are 10x more efficient with Claude Code than others. Share what works.
Use separate virtual keys for Claude Code vs other AI tools. Makes cost attribution cleaner.
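For the first point, one simple way to turn usage data into a limit is to take a high percentile of a developer's observed per-minute request counts and add headroom. A sketch, where the 95th percentile, 1.5x headroom, and floor of 10 are assumed starting values, not recommendations from Bifrost:

```python
import math


def suggest_rpm(per_minute_counts: list[int], percentile: float = 0.95,
                headroom: float = 1.5, floor: int = 10) -> int:
    """Suggest a requests_per_minute limit from observed per-minute request counts."""
    if not per_minute_counts:
        return floor
    ordered = sorted(per_minute_counts)
    # Nearest-rank percentile: the value at rank ceil(p * n) in sorted order
    rank = max(1, math.ceil(percentile * len(ordered)))
    observed = ordered[rank - 1]
    return max(floor, math.ceil(observed * headroom))
```

A developer whose busiest minutes peak at 12 requests would get a suggested limit of 18; very light users stay at the floor.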

Bottom Line

Claude Code without governance is a liability. With a gateway layer, it becomes a controlled, auditable, budget-safe tool. Bifrost does this with about 11 microseconds of added latency, so your developers do not notice the proxy.

The alternative is waiting for the bill shock. I have seen it happen. Set up governance before you scale Claude Code to your full team.

Bifrost GitHub | Documentation | Website