Pranay Batta
How to Set Up an OpenAI API Proxy with Bifrost in 30 Seconds

Setting up an OpenAI API proxy typically requires configuring NGINX, managing SSL certificates, implementing retry logic, and setting up monitoring infrastructure. This tutorial shows how to deploy a production-ready OpenAI proxy in 30 seconds using Bifrost.

maximhq / bifrost

Fastest enterprise AI gateway (50x faster than LiteLLM) with adaptive load balancing, cluster mode, guardrails, 1000+ model support, and <100 µs overhead at 5k RPS.

Bifrost AI Gateway


The fastest way to build AI applications that never go down

Bifrost is a high-performance AI gateway that unifies access to 15+ providers (OpenAI, Anthropic, AWS Bedrock, Google Vertex, and more) through a single OpenAI-compatible API. Deploy in seconds with zero configuration and get automatic failover, load balancing, semantic caching, and enterprise-grade features.

Quick Start

Get started

Go from zero to production-ready AI gateway in under a minute.

Step 1: Start Bifrost Gateway

# Install and run locally
npx -y @maximhq/bifrost

# Or use Docker
docker run -p 8080:8080 maximhq/bifrost

Step 2: Configure via Web UI

# Open the built-in web interface
open http://localhost:8080

Step 3: Make your first API call

curl -X POST http://localhost:8080/v1/chat/completions \
  -H "Content-Type: application/json" \
  -d '{
    "model": "openai/gpt-4o-mini",
    "messages": [{"role": "user", "content": "Hello, Bifrost!"}]
  }'

That's it! Your AI gateway is running, with a built-in web interface for visual configuration.


Why Use an OpenAI API Proxy?

  • Cost optimization: Semantic caching reduces redundant API calls by 40-60%
  • Reliability: Automatic failover to backup providers when OpenAI experiences outages
  • Observability: Complete request/response logging, token usage tracking, and cost attribution
  • Governance: Budget limits, rate limiting, and team-based access control
  • Multi-provider: Route to Azure OpenAI, Anthropic, or other providers without code changes


Prerequisites

  • Node.js 16+ (for NPX method) OR Docker (for container method)
  • OpenAI API key

Method 1: NPX (Fastest)

Setting Up - Bifrost

Get Bifrost running as an HTTP API gateway in 30 seconds with zero configuration. Perfect for any programming language.

docs.getbifrost.ai

Step 1: Install and Run Bifrost

npx -y @maximhq/bifrost

That's it. Bifrost is now running at http://localhost:8080.

Step 2: Open Web UI

Navigate to http://localhost:8080 in your browser.

Step 3: Add OpenAI API Key

  1. Click "Providers" in the sidebar
  2. Find "OpenAI" section
  3. Click "Add Key"
  4. Enter your OpenAI API key
  5. Click "Save"

Step 4: Test the Proxy

curl -X POST http://localhost:8080/v1/chat/completions \
  -H "Content-Type: application/json" \
  -d '{
    "model": "gpt-4o-mini",
    "messages": [{"role": "user", "content": "Hello!"}]
  }'

Response:

{
  "id": "chatcmpl-...",
  "object": "chat.completion",
  "model": "gpt-4o-mini",
  "choices": [{
    "message": {
      "role": "assistant",
      "content": "Hello! How can I help you today?"
    }
  }]
}

Method 2: Docker

Step 1: Run Container

docker pull maximhq/bifrost
docker run -p 8080:8080 maximhq/bifrost

Step 2: Configure via Web UI

Same as Method 1 steps 2-4.

For Configuration Persistence:

docker run -p 8080:8080 -v $(pwd)/data:/app/data maximhq/bifrost

This mounts a local directory so configuration persists across container restarts.


Integrating with Your Application

Python (OpenAI SDK):

from openai import OpenAI

# Before: Direct OpenAI
# client = OpenAI(api_key="sk-...")

# After: Through Bifrost proxy
client = OpenAI(
    base_url="http://localhost:8080/v1",
    api_key="your-openai-key"  # Or any placeholder if using Web UI config
)

response = client.chat.completions.create(
    model="gpt-4o-mini",
    messages=[{"role": "user", "content": "Hello!"}]
)

print(response.choices[0].message.content)

Node.js (OpenAI SDK):

import OpenAI from 'openai';

// Before: Direct OpenAI
// const client = new OpenAI({ apiKey: 'sk-...' });

// After: Through Bifrost proxy
const client = new OpenAI({
  baseURL: 'http://localhost:8080/v1',
  apiKey: 'your-openai-key'
});

const response = await client.chat.completions.create({
  model: 'gpt-4o-mini',
  messages: [{ role: 'user', content: 'Hello!' }]
});

console.log(response.choices[0].message.content);

cURL:

curl -X POST http://localhost:8080/v1/chat/completions \
  -H "Content-Type: application/json" \
  -d '{
    "model": "gpt-4o-mini",
    "messages": [{"role": "user", "content": "Hello!"}]
  }'

Advanced Configuration

Multiple OpenAI API Keys (Load Balancing)

Distribute requests across multiple API keys to prevent rate limiting.

Via Web UI:

  1. Go to "Providers" → "OpenAI"
  2. Click "Add Key" multiple times
  3. Set weights for each key (e.g., 0.5, 0.5 for equal distribution)

Via API:

curl -X POST http://localhost:8080/api/providers \
  -H "Content-Type: application/json" \
  -d '{
    "provider": "openai",
    "keys": [
      {
        "name": "openai-key-1",
        "value": "sk-key1...",
        "weight": 0.5
      },
      {
        "name": "openai-key-2",
        "value": "sk-key2...",
        "weight": 0.5
      }
    ]
  }'
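Conceptually, weighted key selection is a weighted random choice: over many requests, each key receives traffic in proportion to its weight. The sketch below illustrates the idea only (it is not Bifrost's actual implementation), using hypothetical key names:

```python
import random

# Hypothetical keys and weights mirroring the config above
keys = [("openai-key-1", 0.5), ("openai-key-2", 0.5)]

def pick_key(keys):
    """Choose a key with probability proportional to its weight."""
    names = [name for name, _ in keys]
    weights = [weight for _, weight in keys]
    return random.choices(names, weights=weights, k=1)[0]

# Over many requests, traffic splits roughly according to the weights
random.seed(0)
counts = {"openai-key-1": 0, "openai-key-2": 0}
for _ in range(10_000):
    counts[pick_key(keys)] += 1
```

With equal weights, each key handles roughly half the traffic, keeping every individual key below its per-key rate limit.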

Custom Base URL (OpenAI-Compatible Endpoints)

Route to Azure OpenAI, self-hosted models, or other OpenAI-compatible endpoints.

Via Web UI:

  1. Go to "Providers" → "OpenAI" → "Advanced"
  2. Set "Base URL": https://your-deployment.openai.azure.com
  3. Save

Via API:

curl -X POST http://localhost:8080/api/providers \
  -H "Content-Type: application/json" \
  -d '{
    "provider": "openai",
    "keys": [
      {
        "name": "azure-openai",
        "value": "your-azure-key",
        "weight": 1.0
      }
    ],
    "network_config": {
      "base_url": "https://your-deployment.openai.azure.com"
    }
  }'

Retry Configuration

Configure exponential backoff for transient failures.

Via Web UI:

  1. Go to "Providers" → "OpenAI" → "Advanced"
  2. Set "Max Retries": 5
  3. Set "Initial Backoff": 1ms
  4. Set "Max Backoff": 10000ms
  5. Save

Via API:

curl -X POST http://localhost:8080/api/providers \
  -H "Content-Type: application/json" \
  -d '{
    "provider": "openai",
    "keys": [
      {
        "name": "openai-key-1",
        "value": "sk-...",
        "weight": 1.0
      }
    ],
    "network_config": {
      "max_retries": 5,
      "retry_backoff_initial_ms": 1,
      "retry_backoff_max_ms": 10000
    }
  }'
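The settings above describe capped exponential backoff. A minimal sketch of the resulting delay schedule, assuming each retry doubles the previous delay until it hits the cap:

```python
def backoff_schedule(max_retries, initial_ms, max_ms):
    """Delay (in ms) before each retry: doubles per attempt, capped at max_ms."""
    return [min(initial_ms * 2 ** attempt, max_ms) for attempt in range(max_retries)]

# With the values from the config above: max_retries=5, initial=1ms, cap=10000ms
delays = backoff_schedule(5, 1, 10_000)
print(delays)  # [1, 2, 4, 8, 16]
```

Production retry loops typically also add random jitter to each delay so that many clients retrying at once don't hammer the upstream in lockstep.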

Request Timeout

Set custom timeouts for long-running requests.

Via Web UI:

  1. Go to "Providers" → "OpenAI" → "Advanced"
  2. Set "Timeout": 30 seconds
  3. Save

Via API:

curl -X POST http://localhost:8080/api/providers \
  -H "Content-Type: application/json" \
  -d '{
    "provider": "openai",
    "keys": [
      {
        "name": "openai-key-1",
        "value": "sk-...",
        "weight": 1.0
      }
    ],
    "network_config": {
      "default_request_timeout_in_seconds": 30
    }
  }'

Custom Headers

Pass custom headers to upstream providers.

Via API:

curl -X POST http://localhost:8080/api/providers \
  -H "Content-Type: application/json" \
  -d '{
    "provider": "openai",
    "keys": [
      {
        "name": "openai-key-1",
        "value": "sk-...",
        "weight": 1.0
      }
    ],
    "network_config": {
      "extra_headers": {
        "x-user-id": "123",
        "x-tenant-id": "acme-corp"
      }
    }
  }'
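The effect of extra_headers is simple: the gateway merges the configured extras into the headers of every upstream request. A sketch of that merge (illustration of the pattern only, not Bifrost's internals):

```python
def merge_headers(base_headers, extra_headers):
    """Upstream headers = base request headers plus configured extras (extras win on conflict)."""
    merged = dict(base_headers)
    merged.update(extra_headers)
    return merged

upstream = merge_headers(
    {"Authorization": "Bearer sk-...", "Content-Type": "application/json"},
    {"x-user-id": "123", "x-tenant-id": "acme-corp"},
)
```

This is handy for tenant attribution or routing hints that downstream proxies and audit logs can pick up without any client-side changes.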

HTTP Proxy Configuration

Route requests through corporate proxies.

Via Web UI:

  1. Go to "Providers" → "OpenAI" → "Proxy"
  2. Select "Proxy Type": HTTP or SOCKS5
  3. Set "Proxy URL": http://proxy.company.com:8080
  4. Add credentials if needed
  5. Save

Via API:

curl -X POST http://localhost:8080/api/providers \
  -H "Content-Type: application/json" \
  -d '{
    "provider": "openai",
    "keys": [
      {
        "name": "openai-key-1",
        "value": "sk-...",
        "weight": 1.0
      }
    ],
    "proxy_config": {
      "type": "http",
      "url": "http://proxy.company.com:8080",
      "username": "user",
      "password": "pass"
    }
  }'

Production Features

Semantic Caching (40-60% Cost Reduction)

Enable semantic caching to reduce redundant API calls.

Via Web UI:

  1. Go to "Features" → "Semantic Caching"
  2. Toggle "Enable Semantic Caching"
  3. Set "Similarity Threshold": 0.85 (0.8-0.95 recommended)
  4. Set "TTL": 300s (5 minutes)
  5. Save

How It Works:

# First request - hits OpenAI
response1 = client.chat.completions.create(
    model="gpt-4o-mini",
    messages=[{"role": "user", "content": "What are your business hours?"}]
)

# Second request (similar) - returns cached response
response2 = client.chat.completions.create(
    model="gpt-4o-mini",
    messages=[{"role": "user", "content": "When are you open?"}]
)
# Returns cached response in <1ms, no API call to OpenAI
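The similarity threshold compares the embedding of an incoming prompt against embeddings of cached prompts. A toy illustration of the cosine-similarity check (real systems use high-dimensional embeddings from a model; these 3-dimensional vectors are made up):

```python
import math

def cosine_similarity(a, b):
    """Cosine of the angle between two vectors: 1.0 means identical direction."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(x * x for x in b))
    return dot / (norm_a * norm_b)

# Made-up embeddings for two phrasings of the same question
v_hours = [0.82, 0.41, 0.15]  # "What are your business hours?"
v_open = [0.80, 0.45, 0.12]   # "When are you open?"

sim = cosine_similarity(v_hours, v_open)
threshold = 0.85
cache_hit = sim >= threshold  # similar enough -> serve the cached response
```

Raising the threshold toward 0.95 makes the cache stricter (fewer false hits, fewer savings); lowering it toward 0.8 saves more calls but risks returning a cached answer to a genuinely different question.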

Virtual Keys (Team-Based Access Control)

Create separate API keys for different teams with custom budgets and rate limits.

Via Web UI:

  1. Go to "Virtual Keys"
  2. Click "Create Virtual Key"
  3. Set name: "team-frontend"
  4. Set budget: $100/month
  5. Set rate limit: 1000 requests/hour
  6. Save

Usage:

client = OpenAI(
    base_url="http://localhost:8080/v1",
    api_key="vk-team-frontend"  # Use virtual key instead of provider key
)

Automatic Failover

Add backup providers for resilience.

Via Web UI:

  1. Go to "Providers"
  2. Add multiple providers (OpenAI, Azure OpenAI, Anthropic)
  3. Bifrost automatically creates fallback chains

Example:

curl -X POST http://localhost:8080/v1/chat/completions \
  -H "Content-Type: application/json" \
  -d '{
    "model": "gpt-4o-mini",
    "messages": [{"role": "user", "content": "Hello!"}]
  }'
# If OpenAI fails, automatically retries with Azure OpenAI
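Conceptually, a fallback chain walks an ordered list of providers until one succeeds. A simplified sketch of the pattern (illustration only; the provider callables are simulated):

```python
def call_with_fallback(providers, request):
    """Try each (name, call) pair in order; return the first successful response."""
    errors = []
    for name, call in providers:
        try:
            return name, call(request)
        except Exception as exc:  # e.g. timeout, 5xx, rate limit
            errors.append((name, exc))
    raise RuntimeError(f"all providers failed: {errors}")

# Simulated providers: primary is down, backup responds
def openai_down(request):
    raise ConnectionError("OpenAI outage")

def azure_ok(request):
    return {"content": "Hello from backup!"}

provider_used, response = call_with_fallback(
    [("openai", openai_down), ("azure", azure_ok)], {"prompt": "Hello!"}
)
```

From the client's perspective nothing changes: the request succeeds, and only the gateway's logs show which provider actually served it.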

Monitoring and Observability

Built-in Dashboard

Navigate to http://localhost:8080:

  • Real-time request logs
  • Token usage per model
  • Cost tracking per provider
  • Latency visualization
  • Error rates

Prometheus Metrics

Bifrost exposes metrics at http://localhost:8080/metrics:

curl http://localhost:8080/metrics

Key Metrics:

  • bifrost_requests_total: Total requests by provider/model
  • bifrost_request_duration_seconds: Latency percentiles
  • bifrost_tokens_total: Token usage (prompt/completion)
  • bifrost_cost_total: Cost in USD

Example Prometheus Query:

# Request rate by model
sum(rate(bifrost_requests_total[5m])) by (model)

# Average latency by provider (histogram sum/count)
  sum(rate(bifrost_request_duration_seconds_sum[5m])) by (provider)
/ sum(rate(bifrost_request_duration_seconds_count[5m])) by (provider)

# Total cost last hour
sum(increase(bifrost_cost_total[1h]))
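The /metrics endpoint returns the Prometheus text exposition format, which is straightforward to parse programmatically. A sketch against a made-up sample payload (the metric name comes from the list above; the label sets and values are hypothetical):

```python
import re

SAMPLE = """\
# HELP bifrost_requests_total Total requests by provider/model
# TYPE bifrost_requests_total counter
bifrost_requests_total{provider="openai",model="gpt-4o-mini"} 1042
bifrost_requests_total{provider="azure",model="gpt-4o-mini"} 17
"""

def parse_counter(text, name):
    """Map each label set of a counter family to its numeric value."""
    pattern = re.compile(re.escape(name) + r"\{(?P<labels>[^}]*)\}\s+(?P<value>\S+)")
    return {m["labels"]: float(m["value"]) for m in pattern.finditer(text)}

counters = parse_counter(SAMPLE, "bifrost_requests_total")
total = sum(counters.values())
```

In practice you would point Prometheus at the endpoint rather than parse it by hand, but this is useful for quick smoke tests and ad-hoc scripts.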

OpenTelemetry Tracing

Bifrost supports OpenTelemetry for distributed tracing.


Configuration Options

Command-Line Flags

# Custom port
npx -y @maximhq/bifrost -port 3000

# Custom host
npx -y @maximhq/bifrost -host 0.0.0.0

# Debug logging
npx -y @maximhq/bifrost -log-level debug

# Pretty logs (not JSON)
npx -y @maximhq/bifrost -log-style pretty

# Custom data directory
npx -y @maximhq/bifrost -app-dir ./my-bifrost-data

Docker Environment Variables

docker run -p 8080:8080 \
  -e APP_PORT=8080 \
  -e APP_HOST=0.0.0.0 \
  -e LOG_LEVEL=info \
  -e LOG_STYLE=json \
  maximhq/bifrost

Configuration File (config.json)

For GitOps workflows, create config.json:

{
  "providers": {
    "openai": {
      "keys": [
        {
          "name": "openai-key-1",
          "value": "env.OPENAI_API_KEY",
          "weight": 1.0
        }
      ],
      "network_config": {
        "max_retries": 5,
        "retry_backoff_initial_ms": 1,
        "retry_backoff_max_ms": 10000
      }
    }
  }
}

Run with:

npx -y @maximhq/bifrost -app-dir ./my-config
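The "env.OPENAI_API_KEY" value in config.json references an environment variable instead of storing the secret in the file, which keeps config.json safe to commit. A sketch of how such a convention can be resolved (illustration of the pattern, not Bifrost's actual loader):

```python
import os

def resolve_value(value, env=os.environ):
    """Expand an "env.VAR" reference to the value of environment variable VAR."""
    if isinstance(value, str) and value.startswith("env."):
        var = value[len("env."):]
        if var not in env:
            raise KeyError(f"environment variable {var} is not set")
        return env[var]
    return value  # literal values pass through unchanged

# Example with a fake environment, so the sketch runs anywhere
fake_env = {"OPENAI_API_KEY": "sk-test-123"}
resolved = resolve_value("env.OPENAI_API_KEY", env=fake_env)
```

Failing fast on a missing variable (rather than silently passing the literal string upstream) makes misconfigured deployments obvious at startup.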

Common Use Cases

1. Cost Optimization

Setup: Enable semantic caching

Result: 40-60% reduction in API costs

2. High Availability

Setup: Configure OpenAI + Azure OpenAI with automatic failover

Result: 99.99% uptime through multi-provider redundancy

3. Multi-Team Governance

Setup: Create virtual keys per team with budgets

Result: Prevent cost overruns, track spend by team

4. Development vs Production

Setup: Separate virtual keys for dev (rate limited) and prod (high limits)

Result: Environment isolation enforced at infrastructure level

5. Compliance & Auditing

Setup: Self-hosted deployment with complete request logging

Result: Full audit trail, data never leaves your infrastructure


Troubleshooting

Issue: "Connection refused"

Solution: Ensure Bifrost is running at http://localhost:8080

Issue: "Invalid API key"

Solution: Check API key in Web UI → Providers → OpenAI

Issue: "Rate limited"

Solution: Add multiple API keys for load balancing

Issue: "Timeout errors"

Solution: Increase timeout in Advanced settings

Issue: "Cannot access Web UI"

Solution: Check firewall, ensure port 8080 is open


Next Steps

  • Enable semantic caching: Reduce costs by 40-60%
  • Add backup providers: Configure automatic failover to Azure/Anthropic
  • Set up virtual keys: Team-based budgets and access control
  • Integrate monitoring: Connect Prometheus/Grafana for metrics
  • Deploy to production: Kubernetes/Docker Compose for high availability


Resources

Documentation: https://getmax.im/bifrostdocs

GitHub: https://git.new/bifrost

Summary: Setting up an OpenAI API proxy with Bifrost takes 30 seconds (npx -y @maximhq/bifrost) and provides 40-60% cost reduction through semantic caching, automatic failover for 99.99% uptime, complete observability with Prometheus metrics, and zero vendor lock-in through self-hosted deployment.
