Setting up an OpenAI API proxy typically requires configuring NGINX, managing SSL certificates, implementing retry logic, and setting up monitoring infrastructure. This tutorial shows how to deploy a production-ready OpenAI proxy in 30 seconds using Bifrost.
Bifrost AI Gateway
The fastest way to build AI applications that never go down
Bifrost is a high-performance AI gateway that unifies access to 15+ providers (OpenAI, Anthropic, AWS Bedrock, Google Vertex, and more) through a single OpenAI-compatible API. Deploy in seconds with zero configuration and get automatic failover, load balancing, semantic caching, and enterprise-grade features.
Quick Start
Go from zero to production-ready AI gateway in under a minute.
Step 1: Start Bifrost Gateway
# Install and run locally
npx -y @maximhq/bifrost
# Or use Docker
docker run -p 8080:8080 maximhq/bifrost
Step 2: Configure via Web UI
# Open the built-in web interface
open http://localhost:8080
Step 3: Make your first API call
curl -X POST http://localhost:8080/v1/chat/completions \
-H "Content-Type: application/json" \
-d '{
"model": "openai/gpt-4o-mini",
"messages": [{"role": "user", "content": "Hello, Bifrost!"}]
}'
That's it! Your AI gateway is now running, with a web interface for visual configuration.
Why Use an OpenAI API Proxy?
Cost optimization: Semantic caching reduces redundant API calls by 40-60%
Reliability: Automatic failover to backup providers when OpenAI experiences outages
Observability: Complete request/response logging, token usage tracking, cost attribution
Governance: Budget limits, rate limiting, team-based access control
Multi-provider: Route to Azure OpenAI, Anthropic, or other providers without code changes
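The quick-start request above uses Bifrost's `provider/model` naming convention (`"openai/gpt-4o-mini"`), so switching providers is just a different model string. A small helper makes that explicit in application code; `bifrost_model` is our illustrative name, not part of Bifrost:

```python
# Build Bifrost model strings using the "provider/model" convention
# from the quick-start example (e.g. "openai/gpt-4o-mini").
def bifrost_model(provider: str, model: str) -> str:
    """Return a Bifrost-style model identifier."""
    return f"{provider}/{model}"

# Switching providers is only a different model string; no code changes.
print(bifrost_model("openai", "gpt-4o-mini"))        # openai/gpt-4o-mini
print(bifrost_model("anthropic", "claude-3-haiku"))  # anthropic/claude-3-haiku
```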
Prerequisites
- Node.js 16+ (for NPX method) OR Docker (for container method)
- OpenAI API key
Method 1: NPX (Fastest)
Step 1: Install and Run Bifrost
npx -y @maximhq/bifrost
That's it. Bifrost is now running at http://localhost:8080.
Step 2: Open Web UI
Navigate to http://localhost:8080 in your browser.
Step 3: Add OpenAI API Key
- Click "Providers" in the sidebar
- Find "OpenAI" section
- Click "Add Key"
- Enter your OpenAI API key
- Click "Save"
Step 4: Test the Proxy
curl -X POST http://localhost:8080/v1/chat/completions \
-H "Content-Type: application/json" \
-d '{
"model": "gpt-4o-mini",
"messages": [{"role": "user", "content": "Hello!"}]
}'
Response:
{
"id": "chatcmpl-...",
"object": "chat.completion",
"model": "gpt-4o-mini",
"choices": [{
"message": {
"role": "assistant",
"content": "Hello! How can I help you today?"
}
}]
}
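Because the response follows the standard OpenAI chat-completion schema, existing parsing code works unchanged. A minimal sketch extracting the reply text from a response shaped like the one above (the `id` value here is an illustrative placeholder):

```python
import json

# A response in the OpenAI chat-completion schema, as returned above.
raw = '''{
  "id": "chatcmpl-abc123",
  "object": "chat.completion",
  "model": "gpt-4o-mini",
  "choices": [{
    "message": {"role": "assistant",
                "content": "Hello! How can I help you today?"}
  }]
}'''

data = json.loads(raw)
reply = data["choices"][0]["message"]["content"]
print(reply)  # Hello! How can I help you today?
```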
Method 2: Docker
Step 1: Run Container
docker pull maximhq/bifrost
docker run -p 8080:8080 maximhq/bifrost
Step 2: Configure via Web UI
Same as Method 1 steps 2-4.
For Configuration Persistence:
docker run -p 8080:8080 -v $(pwd)/data:/app/data maximhq/bifrost
This mounts a local directory so configuration persists across container restarts.
Integrating with Your Application
Python (OpenAI SDK):
from openai import OpenAI
# Before: Direct OpenAI
# client = OpenAI(api_key="sk-...")
# After: Through Bifrost proxy
client = OpenAI(
base_url="http://localhost:8080/v1",
api_key="your-openai-key" # Or any placeholder if using Web UI config
)
response = client.chat.completions.create(
model="gpt-4o-mini",
messages=[{"role": "user", "content": "Hello!"}]
)
print(response.choices[0].message.content)
Node.js (OpenAI SDK):
import OpenAI from 'openai';
// Before: Direct OpenAI
// const client = new OpenAI({ apiKey: 'sk-...' });
// After: Through Bifrost proxy
const client = new OpenAI({
baseURL: 'http://localhost:8080/v1',
apiKey: 'your-openai-key'
});
const response = await client.chat.completions.create({
model: 'gpt-4o-mini',
messages: [{ role: 'user', content: 'Hello!' }]
});
console.log(response.choices[0].message.content);
cURL:
curl -X POST http://localhost:8080/v1/chat/completions \
-H "Content-Type: application/json" \
-d '{
"model": "gpt-4o-mini",
"messages": [{"role": "user", "content": "Hello!"}]
}'
Advanced Configuration
Multiple OpenAI API Keys (Load Balancing)
Distribute requests across multiple API keys to prevent rate limiting.
Via Web UI:
- Go to "Providers" → "OpenAI"
- Click "Add Key" multiple times
- Set weights for each key (e.g., 0.5, 0.5 for equal distribution)
Via API:
curl -X POST http://localhost:8080/api/providers \
-H "Content-Type: application/json" \
-d '{
"provider": "openai",
"keys": [
{
"name": "openai-key-1",
"value": "sk-key1...",
"weight": 0.5
},
{
"name": "openai-key-2",
"value": "sk-key2...",
"weight": 0.5
}
]
}'
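The same payload can be built programmatically, which is handy for scripted setups. A sketch (key values are placeholders) that constructs the config and sanity-checks the weights before you POST it to `/api/providers`:

```python
import json

# Two weighted keys; Bifrost distributes requests proportionally.
provider_config = {
    "provider": "openai",
    "keys": [
        {"name": "openai-key-1", "value": "sk-key1-placeholder", "weight": 0.5},
        {"name": "openai-key-2", "value": "sk-key2-placeholder", "weight": 0.5},
    ],
}

# Sanity-check: weights should sum to 1.0 for a full distribution.
total = sum(k["weight"] for k in provider_config["keys"])
assert abs(total - 1.0) < 1e-9

payload = json.dumps(provider_config)
# POST `payload` to http://localhost:8080/api/providers as in the curl example.
```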
Custom Base URL (OpenAI-Compatible Endpoints)
Route to Azure OpenAI, self-hosted models, or other OpenAI-compatible endpoints.
Via Web UI:
- Go to "Providers" → "OpenAI" → "Advanced"
- Set "Base URL": https://your-deployment.openai.azure.com
- Save
Via API:
curl -X POST http://localhost:8080/api/providers \
-H "Content-Type: application/json" \
-d '{
"provider": "openai",
"keys": [
{
"name": "azure-openai",
"value": "your-azure-key",
"weight": 1.0
}
],
"network_config": {
"base_url": "https://your-deployment.openai.azure.com"
}
}'
Retry Configuration
Configure exponential backoff for transient failures.
Via Web UI:
- Go to "Providers" → "OpenAI" → "Advanced"
- Set "Max Retries": 5
- Set "Initial Backoff": 1ms
- Set "Max Backoff": 10000ms
- Save
Via API:
curl -X POST http://localhost:8080/api/providers \
-H "Content-Type: application/json" \
-d '{
"provider": "openai",
"keys": [
{
"name": "openai-key-1",
"value": "sk-...",
"weight": 1.0
}
],
"network_config": {
"max_retries": 5,
"retry_backoff_initial_ms": 1,
"retry_backoff_max_ms": 10000
}
}'
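With these settings, exponential backoff roughly doubles the wait between attempts, capped at the maximum. Bifrost's exact formula is internal to the gateway; this sketch only illustrates how the configured bounds shape the retry schedule:

```python
def backoff_schedule(max_retries: int, initial_ms: int, max_ms: int) -> list[int]:
    """Exponential backoff: double the wait each attempt, capped at max_ms."""
    return [min(initial_ms * 2 ** attempt, max_ms) for attempt in range(max_retries)]

# The configuration above: 5 retries, 1 ms initial backoff, 10000 ms cap.
print(backoff_schedule(5, 1, 10000))  # [1, 2, 4, 8, 16]

# A larger initial backoff hits the cap quickly.
print(backoff_schedule(4, 1000, 4000))  # [1000, 2000, 4000, 4000]
```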
Request Timeout
Set custom timeouts for long-running requests.
Via Web UI:
- Go to "Providers" → "OpenAI" → "Advanced"
- Set "Timeout": 30 seconds
- Save
Via API:
curl -X POST http://localhost:8080/api/providers \
-H "Content-Type: application/json" \
-d '{
"provider": "openai",
"keys": [
{
"name": "openai-key-1",
"value": "sk-...",
"weight": 1.0
}
],
"network_config": {
"default_request_timeout_in_seconds": 30
}
}'
Custom Headers
Pass custom headers to upstream providers.
Via API:
curl -X POST http://localhost:8080/api/providers \
-H "Content-Type: application/json" \
-d '{
"provider": "openai",
"keys": [
{
"name": "openai-key-1",
"value": "sk-...",
"weight": 1.0
}
],
"network_config": {
"extra_headers": {
"x-user-id": "123",
"x-tenant-id": "acme-corp"
}
}
}'
HTTP Proxy Configuration
Route requests through corporate proxies.
Via Web UI:
- Go to "Providers" → "OpenAI" → "Proxy"
- Select "Proxy Type": HTTP or SOCKS5
- Set "Proxy URL": http://proxy.company.com:8080
- Add credentials if needed
- Save
Via API:
curl -X POST http://localhost:8080/api/providers \
-H "Content-Type: application/json" \
-d '{
"provider": "openai",
"keys": [
{
"name": "openai-key-1",
"value": "sk-...",
"weight": 1.0
}
],
"proxy_config": {
"type": "http",
"url": "http://proxy.company.com:8080",
"username": "user",
"password": "pass"
}
}'
Production Features
Semantic Caching (40-60% Cost Reduction)
Enable semantic caching to reduce redundant API calls.
Via Web UI:
- Go to "Features" → "Semantic Caching"
- Toggle "Enable Semantic Caching"
- Set "Similarity Threshold": 0.85 (0.8-0.95 recommended)
- Set "TTL": 300s (5 minutes)
- Save
How It Works:
# First request - hits OpenAI
response1 = client.chat.completions.create(
model="gpt-4o-mini",
messages=[{"role": "user", "content": "What are your business hours?"}]
)
# Second request (similar) - returns cached response
response2 = client.chat.completions.create(
model="gpt-4o-mini",
messages=[{"role": "user", "content": "When are you open?"}]
)
# Returns cached response in <1ms, no API call to OpenAI
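Under the hood, semantic caching embeds each prompt and serves a cached response when a new prompt's embedding is close enough to a stored one. A minimal sketch of that threshold check using cosine similarity (toy vectors, not real embeddings, and not Bifrost's implementation):

```python
import math

def cosine_similarity(a: list[float], b: list[float]) -> float:
    """Cosine similarity between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(x * x for x in b))
    return dot / norm

def cache_hit(query_vec, cached_vec, threshold=0.85) -> bool:
    """Serve from cache when similarity meets the configured threshold."""
    return cosine_similarity(query_vec, cached_vec) >= threshold

# Toy embeddings: near-duplicate prompts have high similarity...
print(cache_hit([1.0, 0.1, 0.0], [1.0, 0.12, 0.01]))  # True
# ...while unrelated prompts do not.
print(cache_hit([1.0, 0.0, 0.0], [0.0, 1.0, 0.0]))    # False
```

This is why the threshold matters: lower values (toward 0.8) cache more aggressively; higher values (toward 0.95) only match near-identical prompts.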
Virtual Keys (Team-Based Access Control)
Create separate API keys for different teams with custom budgets and rate limits.
Via Web UI:
- Go to "Virtual Keys"
- Click "Create Virtual Key"
- Set name: "team-frontend"
- Set budget: $100/month
- Set rate limit: 1000 requests/hour
- Save
Usage:
client = OpenAI(
base_url="http://localhost:8080/v1",
api_key="vk-team-frontend" # Use virtual key instead of provider key
)
Automatic Failover
Add backup providers for resilience.
Via Web UI:
- Go to "Providers"
- Add multiple providers (OpenAI, Azure OpenAI, Anthropic)
- Bifrost automatically creates fallback chains
Example:
curl -X POST http://localhost:8080/v1/chat/completions \
-H "Content-Type: application/json" \
-d '{
"model": "gpt-4o-mini",
"messages": [{"role": "user", "content": "Hello!"}]
}'
# If OpenAI fails, automatically retries with Azure OpenAI
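Conceptually, the failover Bifrost performs is a fallback chain: try each provider in order and move on when one raises. A stub-based sketch of that behavior (illustration only, not Bifrost code):

```python
def call_with_fallback(providers, request):
    """Try each (name, callable) provider in order; return the first success."""
    last_error = None
    for name, call in providers:
        try:
            return name, call(request)
        except Exception as exc:  # provider outage, rate limit, timeout...
            last_error = exc
    raise RuntimeError("all providers failed") from last_error

# Stub providers: the primary fails, the backup answers.
def openai_stub(req):
    raise ConnectionError("OpenAI outage")

def azure_stub(req):
    return "Hello from the backup!"

provider, result = call_with_fallback(
    [("openai", openai_stub), ("azure", azure_stub)], {"prompt": "Hello!"}
)
print(provider, result)  # azure Hello from the backup!
```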
Monitoring and Observability
Built-in Dashboard
Navigate to http://localhost:8080:
- Real-time request logs
- Token usage per model
- Cost tracking per provider
- Latency visualization
- Error rates
Prometheus Metrics
Bifrost exposes metrics at http://localhost:8080/metrics:
curl http://localhost:8080/metrics
Key Metrics:
- bifrost_requests_total: Total requests by provider/model
- bifrost_request_duration_seconds: Latency percentiles
- bifrost_tokens_total: Token usage (prompt/completion)
- bifrost_cost_total: Cost in USD
Example Prometheus Query:
# Request rate by model
sum(rate(bifrost_requests_total[5m])) by (model)
# Average latency by provider
sum(rate(bifrost_request_duration_seconds_sum[5m])) by (provider) / sum(rate(bifrost_request_duration_seconds_count[5m])) by (provider)
# Total cost last hour
sum(increase(bifrost_cost_total[1h]))
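The /metrics endpoint returns the standard Prometheus text exposition format, so you can also consume it without a Prometheus server. A sketch that parses counter samples from an example scrape (metric names and labels here are illustrative, following the list above):

```python
def parse_metrics(text: str) -> dict[str, float]:
    """Parse 'name{labels} value' sample lines from Prometheus text format."""
    samples = {}
    for line in text.splitlines():
        line = line.strip()
        if not line or line.startswith("#"):
            continue  # skip blank lines and HELP/TYPE comments
        name_and_labels, value = line.rsplit(" ", 1)
        samples[name_and_labels] = float(value)
    return samples

example = """
# TYPE bifrost_requests_total counter
bifrost_requests_total{provider="openai",model="gpt-4o-mini"} 1042
bifrost_cost_total{provider="openai"} 3.17
"""

metrics = parse_metrics(example)
print(metrics['bifrost_requests_total{provider="openai",model="gpt-4o-mini"}'])
```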
OpenTelemetry Tracing
Bifrost supports OpenTelemetry for distributed tracing.
Configuration Options
Command-Line Flags
# Custom port
npx -y @maximhq/bifrost -port 3000
# Custom host
npx -y @maximhq/bifrost -host 0.0.0.0
# Debug logging
npx -y @maximhq/bifrost -log-level debug
# Pretty logs (not JSON)
npx -y @maximhq/bifrost -log-style pretty
# Custom data directory
npx -y @maximhq/bifrost -app-dir ./my-bifrost-data
Docker Environment Variables
docker run -p 8080:8080 \
-e APP_PORT=8080 \
-e APP_HOST=0.0.0.0 \
-e LOG_LEVEL=info \
-e LOG_STYLE=json \
maximhq/bifrost
Configuration File (config.json)
For GitOps workflows, create config.json:
{
"providers": {
"openai": {
"keys": [
{
"name": "openai-key-1",
"value": "env.OPENAI_API_KEY",
"weight": 1.0
}
],
"network_config": {
"max_retries": 5,
"retry_backoff_initial_ms": 1,
"retry_backoff_max_ms": 10000
}
}
}
}
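The `"env.OPENAI_API_KEY"` value tells Bifrost to read the key from the environment rather than storing it in the file. A sketch of that resolution rule as we read the convention (not Bifrost source):

```python
import os

def resolve_value(value: str) -> str:
    """Resolve 'env.<NAME>' references to environment variables."""
    if value.startswith("env."):
        return os.environ[value[len("env."):]]
    return value

# With OPENAI_API_KEY set in the environment, the reference resolves;
# plain strings pass through unchanged.
os.environ.setdefault("OPENAI_API_KEY", "sk-placeholder")
print(resolve_value("env.OPENAI_API_KEY"))
print(resolve_value("literal-value"))  # literal-value
```

This keeps secrets out of version control, which is the point of the GitOps workflow.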
Run with:
npx -y @maximhq/bifrost -app-dir ./my-config
Common Use Cases
1. Cost Optimization
Setup: Enable semantic caching
Result: 40-60% reduction in API costs
2. High Availability
Setup: Configure OpenAI + Azure OpenAI with automatic failover
Result: 99.99% uptime through multi-provider redundancy
3. Multi-Team Governance
Setup: Create virtual keys per team with budgets
Result: Prevent cost overruns, track spend by team
4. Development vs Production
Setup: Separate virtual keys for dev (rate limited) and prod (high limits)
Result: Environment isolation enforced at infrastructure level
5. Compliance & Auditing
Setup: Self-hosted deployment with complete request logging
Result: Full audit trail, data never leaves your infrastructure
Troubleshooting
Issue: "Connection refused"
Solution: Ensure Bifrost is running at http://localhost:8080
Issue: "Invalid API key"
Solution: Check API key in Web UI → Providers → OpenAI
Issue: "Rate limited"
Solution: Add multiple API keys for load balancing
Issue: "Timeout errors"
Solution: Increase timeout in Advanced settings
Issue: "Cannot access Web UI"
Solution: Check firewall, ensure port 8080 is open
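Most "Connection refused" and "Cannot access Web UI" reports come down to nothing listening on the port. A quick stdlib check, using the default host and port from this tutorial (`port_open` is our helper name):

```python
import socket

def port_open(host: str = "localhost", port: int = 8080, timeout: float = 2.0) -> bool:
    """Return True if a TCP connection to host:port succeeds."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        return False

if port_open():
    print("Bifrost is reachable on port 8080")
else:
    print("Nothing listening on port 8080; is Bifrost running?")
```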
Next Steps
Enable semantic caching: Reduce costs by 40-60%
Add backup providers: Configure automatic failover to Azure/Anthropic
Set up virtual keys: Team-based budgets and access control
Integrate monitoring: Connect Prometheus/Grafana for metrics
Deploy to production: Kubernetes/Docker Compose for high availability
Resources
Documentation: https://getmax.im/bifrostdocs
GitHub: https://git.new/bifrost
Quick Links:
- Provider configuration: https://getmax.im/bifrostdocs (search "providers")
- Semantic caching: https://getmax.im/bifrostdocs (search "caching")
- Virtual keys: https://getmax.im/bifrostdocs (search "virtual keys")
Summary: Setting up an OpenAI API proxy with Bifrost takes about 30 seconds (npx -y @maximhq/bifrost). You get 40-60% cost reduction through semantic caching, automatic failover for high availability, complete observability with Prometheus metrics, and zero vendor lock-in through self-hosted deployment.