Enterprise customers started asking the same questions: "How do we prove compliance with SOC 2?" "Can we prevent PII from leaking to AI providers?" "How do we track who accessed which models?"
Traditional API gateways weren't built for these questions. They log HTTP requests, but AI governance requires understanding prompts, tracking sensitive data, and enforcing content policies.
Bifrost is a high-performance AI gateway that unifies access to 15+ providers (OpenAI, Anthropic, AWS Bedrock, Google Vertex, and more) through a single OpenAI-compatible API. Deploy in seconds with zero configuration and get automatic failover, load balancing, semantic caching, and enterprise-grade features.
Quick Start
Go from zero to production-ready AI gateway in under a minute.
Step 1: Start Bifrost Gateway
# Install and run locally
npx -y @maximhq/bifrost
# Or use Docker
docker run -p 8080:8080 maximhq/bifrost
Step 2: Configure via Web UI
# Open the built-in web interface
open http://localhost:8080
Step 3: Make your first API call
curl -X POST http://localhost:8080/v1/chat/completions \
-H "Content-Type: application/json" \
-d '{
"model": "openai/gpt-4o-mini",
"messages": [{"role": "user", "content": "Hello, Bifrost!"}]
}'
That's it! Your AI gateway is running with a web interface for visual configuration, real-time monitoring…
We needed to build governance into Bifrost's core, not bolt it on later.
The Three Pillars of AI Governance
1. Access Control (Who Can Do What)
Virtual Keys as Identity
Every request authenticates with a virtual key (x-bf-vk header). Virtual keys aren't just API keys. They're governance primitives that define:
- Which models can be accessed (GPT-4 vs GPT-3.5)
- Which providers are allowed (OpenAI only vs multi-provider)
- Budget and rate limits
- Team/customer attachment for attribution
Example configuration:
{
"name": "engineering-prod",
"provider_configs": [{
"provider": "openai",
"allowed_models": ["gpt-4o"],
"weight": 0.7
}, {
"provider": "anthropic",
"allowed_models": ["claude-3-5-sonnet-20241022"],
"weight": 0.3
}],
"team_id": "team-eng-001",
"budget": { "max_limit": 1000.00, "reset_duration": "1M" }
}
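The weight fields above (0.7 / 0.3) drive traffic splitting across the configured providers. A minimal sketch of weighted selection, assuming simple weighted random routing (illustrative only, not Bifrost's actual load balancer):

```python
import random

def pick_provider(provider_configs):
    """Weighted random choice: with the weights above, roughly 70% of
    requests route to OpenAI and 30% to Anthropic."""
    weights = [pc["weight"] for pc in provider_configs]
    return random.choices(provider_configs, weights=weights, k=1)[0]

configs = [
    {"provider": "openai", "weight": 0.7},
    {"provider": "anthropic", "weight": 0.3},
]
```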
Model and Provider Filtering
Virtual keys restrict access at the model level: the marketing team gets GPT-3.5 while engineering gets GPT-4, and one team can't consume another's quota.
If a key tries to access a blocked model, the request fails immediately with HTTP 403.
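The allowlist check behaves roughly like this sketch (our simplification, not Bifrost's actual code), where a miss maps to the HTTP 403 response:

```python
def authorize_model(virtual_key, provider, model):
    """Return (status, reason): 403 unless some provider_config on the
    virtual key allows this provider/model pair."""
    for pc in virtual_key["provider_configs"]:
        if pc["provider"] == provider and model in pc["allowed_models"]:
            return 200, "ok"
    return 403, f"model {model!r} not allowed for key {virtual_key['name']!r}"

vk = {
    "name": "engineering-prod",
    "provider_configs": [{"provider": "openai", "allowed_models": ["gpt-4o"]}],
}
```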
2. Audit Logging (Complete Visibility)
Enterprise compliance (SOC 2, GDPR, HIPAA, ISO 27001) requires immutable audit trails. We log everything security-relevant:
Authentication Events
- Login attempts (successful/failed)
- MFA verification
- Session creation/expiration
- Account lockouts
Authorization Events
- Model access attempts
- Virtual key usage
- Budget limit checks
- Permission denials
Configuration Changes
- Virtual key creation/modification/deletion
- Team/customer updates
- Budget adjustments
- Provider key rotations
Security Events
- Prompt injection attempts
- Jailbreak attempts
- API key abuse
- Suspicious access patterns
- Guardrail violations
Data Access Events
- PII detection and handling
- Sensitive configuration access
- Log exports
Every log entry carries a cryptographic hash, making the trail tamper-evident. Retention policies are configurable (we default to 365 days with 90-day archiving).
Configuration:
{
"enterprise": {
"audit_logs": {
"enabled": true,
"retention": {
"duration": "365d",
"archive_after": "90d"
},
"immutability": {
"enabled": true,
"verification_method": "cryptographic_hash"
}
}
}
}
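One common way to implement the cryptographic_hash verification above is a hash chain, where each entry commits to its predecessor so any in-place edit breaks every later hash. A sketch under that assumption (Bifrost's exact scheme may differ):

```python
import hashlib
import json

GENESIS = "0" * 64

def append_entry(log, event):
    """Append an audit event whose hash covers the previous entry's hash."""
    prev = log[-1]["hash"] if log else GENESIS
    payload = json.dumps(event, sort_keys=True)
    digest = hashlib.sha256((prev + payload).encode()).hexdigest()
    log.append({"event": event, "prev": prev, "hash": digest})

def verify(log):
    """Recompute the chain; any modified or reordered entry fails."""
    prev = GENESIS
    for entry in log:
        payload = json.dumps(entry["event"], sort_keys=True)
        digest = hashlib.sha256((prev + payload).encode()).hexdigest()
        if entry["prev"] != prev or entry["hash"] != digest:
            return False
        prev = entry["hash"]
    return True
```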
Logs export to SIEM systems (Splunk, Datadog, Elastic) for centralized security monitoring.
3. Content Safety (Guardrails)
Governance isn't just about who accesses what. It's about what content flows through the system.
Guardrail Integration
Bifrost integrates with AWS Bedrock Guardrails, Azure Content Safety, and Patronus AI for real-time content validation.
What Guardrails Detect:
- Hate speech, violence, sexual content
- Prompt injection and jailbreak attempts
- PII (50+ types: SSN, credit cards, health records, emails, IPs)
- Denied topics and custom word filters
- Factual grounding verification
Dual-Stage Validation
- Input validation: Check prompts before sending to AI providers
- Output validation: Check responses before returning to applications
When violations are detected, requests can be blocked, content redacted, or flagged for review.
Example AWS Bedrock integration:
{
"guardrails": {
"provider": "bedrock",
"guardrail_id": "your-guardrail-id",
"validate_input": true,
"validate_output": true,
"pii_detection": {
"enabled": true,
"action": "redact"
}
}
}
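Dual-stage validation wraps the provider call: the gateway checks the prompt on the way in and the response on the way out. A sketch with a toy regex-based PII redactor standing in for the real guardrail provider (Bedrock, Azure, or Patronus do the actual detection):

```python
import re

# Toy detector for the sketch; real guardrails cover 50+ PII types.
SSN = re.compile(r"\b\d{3}-\d{2}-\d{4}\b")

def redact(text):
    return SSN.sub("[REDACTED]", text)

def governed_call(prompt, call_provider):
    clean_prompt = redact(prompt)    # input validation: provider never sees raw PII
    response = call_provider(clean_prompt)
    return redact(response)          # output validation: app never sees leaked PII
```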
Integration with Enterprise Auth
SSO Support
Bifrost supports Google and GitHub SSO out of the box. Users authenticate through existing identity providers rather than managing separate credentials.
HashiCorp Vault Integration
Provider API keys are stored in Vault, not configuration files. Bifrost fetches keys at runtime, rotating them without downtime.
Self-Hosted Deployment
For compliance requiring data residency, Bifrost deploys entirely in your VPC. No data leaves your infrastructure.
MCP Governance (Tool Access Control)
AI agents use Model Context Protocol (MCP) to access tools. Without governance, agents can execute any tool.
Bifrost provides centralized MCP governance:
- Filter which tools are available per virtual key
- Track tool usage in audit logs
- Enforce permissions per tool
- Monitor agent behavior across tool interactions
Example MCP filtering:
{
"mcp_configs": [{
"mcp_client_name": "filesystem",
"tools_to_execute": ["read_file", "list_directory"]
}]
}
This virtual key can read files but not write/delete.
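The tool filtering above amounts to intersecting the MCP server's advertised tools with the key's allowlist before the agent ever sees them; a sketch (illustrative, not Bifrost's implementation):

```python
def filter_tools(available_tools, mcp_configs):
    """Return only the tools this virtual key's mcp_configs permit."""
    allowed = set()
    for cfg in mcp_configs:
        allowed.update(cfg["tools_to_execute"])
    return [tool for tool in available_tools if tool in allowed]

mcp_configs = [{
    "mcp_client_name": "filesystem",
    "tools_to_execute": ["read_file", "list_directory"],
}]
```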
What Governance Looks Like in Practice
Scenario: Healthcare AI Application
Requirements:
- HIPAA compliance
- PII must be redacted
- Only specific teams access patient data
- Complete audit trail
Implementation:
- Enable audit logging with 365-day retention
- Configure Bedrock Guardrails for PII detection/redaction
- Create virtual keys per team with model restrictions
- Deploy in VPC for data residency
- Export logs to SIEM for security monitoring
Result: Full compliance with minimal application changes.
Performance Impact
Governance adds overhead. We optimized to keep it minimal:
- Audit logging: Asynchronous writes, no request blocking
- Virtual key lookup: In-memory cache with sub-microsecond access
- Guardrail validation: Parallel execution, configurable timeouts
- Total overhead: 11 microseconds at 5,000 RPS (includes all governance checks)
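The asynchronous audit writes above can be sketched as a queue drained by a background worker, so the request path only pays for an enqueue (our simplified model, not Bifrost's Go internals):

```python
import queue
import threading

audit_queue = queue.Queue()
written = []  # stands in for durable storage / SIEM export

def audit_worker():
    while True:
        event = audit_queue.get()
        if event is None:          # shutdown sentinel
            break
        written.append(event)      # the slow, durable write happens off-path
        audit_queue.task_done()

threading.Thread(target=audit_worker, daemon=True).start()

def log_event(event):
    """Called from the request path: non-blocking, just enqueue."""
    audit_queue.put(event)
```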
Key Design Principles
1. Governance at the Gateway Layer
Don't push governance to applications. Enforce at infrastructure level where it can't be bypassed.
2. Immutable Audit Trails
Logs must be tamper-proof for compliance. Cryptographic verification prevents modification.
3. Real-Time Enforcement
Block violations immediately, don't just log them. Prevention is better than detection.
4. Zero Trust Architecture
Every request proves identity and authorization. No implicit trust.
Try It
# Deploy Bifrost
npx -y @maximhq/bifrost
# Enable governance
curl -X PUT http://localhost:8080/api/config \
-H "Content-Type: application/json" \
-d '{"client_config": {"enable_governance": true}}'
# Create governed virtual key
curl -X POST http://localhost:8080/api/governance/virtual-keys \
-H "Content-Type: application/json" \
-d '{
"name": "prod-api",
"provider_configs": [{
"provider": "openai",
"allowed_models": ["gpt-4o"]
}],
"budget": {"max_limit": 100.00, "reset_duration": "1M"}
}'
