Setting up an OpenAI API proxy typically requires configuring NGINX, managing SSL certificates, implementing retry logic, and setting up monitoring infrastructure. This tutorial shows how to deploy a production-ready OpenAI proxy in 30 seconds using Bifrost.
Bifrost AI Gateway
The fastest way to build AI applications that never go down
Bifrost is a high-performance AI gateway that unifies access to 15+ providers (OpenAI, Anthropic, AWS Bedrock, Google Vertex, and more) through a single OpenAI-compatible API. Deploy in seconds with zero configuration and get automatic failover, load balancing, semantic caching, and enterprise-grade features.
Quick Start
Go from zero to production-ready AI gateway in under a minute.
Step 1: Start Bifrost Gateway
# Install and run locally
npx -y @maximhq/bifrost
# Or use Docker
docker run -p 8080:8080 maximhq/bifrost
Step 2: Configure via Web UI
# Open the built-in web interface
open http://localhost:8080
Step 3: Make your first API call
curl -X POST http://localhost:8080/v1/chat/completions \
-H "Content-Type: application/json" \
-d '{
"model": "openai/gpt-4o-mini",
"messages": [{"role": "user", "content": "Hello, Bifrost!"}]
}'
That's it! Your AI gateway is now running, with a web interface for visual configuration.
Why Use an OpenAI API Proxy?
Cost optimization: Semantic caching reduces redundant API calls by 40-60%
Reliability: Automatic failover to backup providers when OpenAI experiences outages
Observability: Complete request/response logging, token usage tracking, cost attribution
Governance: Budget limits, rate limiting, team-based access control
Multi-provider: Route to Azure OpenAI, Anthropic, or other providers without code changes
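The quick-start request above uses Bifrost's `provider/model` naming convention (`"openai/gpt-4o-mini"`), so switching providers is just a different model string. A small helper makes that explicit in application code; `bifrost_model` is our illustrative name, not part of Bifrost:

```python
# Build Bifrost model strings using the "provider/model" convention
# from the quick-start example (e.g. "openai/gpt-4o-mini").
def bifrost_model(provider: str, model: str) -> str:
    """Return a Bifrost-style model identifier."""
    return f"{provider}/{model}"

# Switching providers is only a different model string; no code changes.
print(bifrost_model("openai", "gpt-4o-mini"))        # openai/gpt-4o-mini
print(bifrost_model("anthropic", "claude-3-haiku"))  # anthropic/claude-3-haiku
```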
Prerequisites
- Node.js 16+ (for NPX method) OR Docker (for container method)
- OpenAI API key
Method 1: NPX (Fastest)
Step 1: Install and Run Bifrost
npx -y @maximhq/bifrost
That's it. Bifrost is now running at http://localhost:8080.
Step 2: Open Web UI
Navigate to http://localhost:8080 in your browser.
Step 3: Add OpenAI API Key
- Click "Providers" in the sidebar
- Find "OpenAI" section
- Click "Add Key"
- Enter your OpenAI API key
- Click "Save"
Step 4: Test the Proxy
curl -X POST http://localhost:8080/v1/chat/completions \
-H "Content-Type: application/json" \
-d '{
"model": "gpt-4o-mini",
"messages": [{"role": "user", "content": "Hello!"}]
}'
Response:
{
"id": "chatcmpl-...",
"object": "chat.completion",
"model": "gpt-4o-mini",
"choices": [{
"message": {
"role": "assistant",
"content": "Hello! How can I help you today?"
}
}]
}
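Because the response follows the standard OpenAI chat-completion schema, existing parsing code works unchanged. A minimal sketch extracting the reply text from a response shaped like the one above (the `id` value here is an illustrative placeholder):

```python
import json

# A response in the OpenAI chat-completion schema, as returned above.
raw = '''{
  "id": "chatcmpl-abc123",
  "object": "chat.completion",
  "model": "gpt-4o-mini",
  "choices": [{
    "message": {"role": "assistant",
                "content": "Hello! How can I help you today?"}
  }]
}'''

data = json.loads(raw)
reply = data["choices"][0]["message"]["content"]
print(reply)  # Hello! How can I help you today?
```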
Method 2: Docker
Step 1: Run Container
docker pull maximhq/bifrost
docker run -p 8080:8080 maximhq/bifrost
Step 2: Configure via Web UI
Same as Method 1 steps 2-4.
For Configuration Persistence:
docker run -p 8080:8080 -v $(pwd)/data:/app/data maximhq/bifrost
This mounts a local directory so configuration persists across container restarts.
Integrating with Your Application
Python (OpenAI SDK):
from openai import OpenAI
# Before: Direct OpenAI
# client = OpenAI(api_key="sk-...")
# After: Through Bifrost proxy
client = OpenAI(
base_url="http://localhost:8080/v1",
api_key="your-openai-key" # Or any placeholder if using Web UI config
)
response = client.chat.completions.create(
model="gpt-4o-mini",
messages=[{"role": "user", "content": "Hello!"}]
)
print(response.choices[0].message.content)
Node.js (OpenAI SDK):
import OpenAI from 'openai';
// Before: Direct OpenAI
// const client = new OpenAI({ apiKey: 'sk-...' });
// After: Through Bifrost proxy
const client = new OpenAI({
baseURL: 'http://localhost:8080/v1',
apiKey: 'your-openai-key'
});
const response = await client.chat.completions.create({
model: 'gpt-4o-mini',
messages: [{ role: 'user', content: 'Hello!' }]
});
console.log(response.choices[0].message.content);
cURL:
curl -X POST http://localhost:8080/v1/chat/completions \
-H "Content-Type: application/json" \
-d '{
"model": "gpt-4o-mini",
"messages": [{"role": "user", "content": "Hello!"}]
}'
Advanced Configuration
Multiple OpenAI API Keys (Load Balancing)
Distribute requests across multiple API keys to prevent rate limiting.
Via Web UI:
- Go to "Providers" → "OpenAI"
- Click "Add Key" multiple times
- Set weights for each key (e.g., 0.5, 0.5 for equal distribution)
Via API:
curl -X POST http://localhost:8080/api/providers \
-H "Content-Type: application/json" \
-d '{
"provider": "openai",
"keys": [
{
"name": "openai-key-1",
"value": "sk-key1...",
"weight": 0.5
},
{
"name": "openai-key-2",
"value": "sk-key2...",
"weight": 0.5
}
]
}'
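The same payload can be built programmatically, which is handy for scripted setups. A sketch (key values are placeholders) that constructs the config and sanity-checks the weights before you POST it to `/api/providers`:

```python
import json

# Two weighted keys; Bifrost distributes requests proportionally.
provider_config = {
    "provider": "openai",
    "keys": [
        {"name": "openai-key-1", "value": "sk-key1-placeholder", "weight": 0.5},
        {"name": "openai-key-2", "value": "sk-key2-placeholder", "weight": 0.5},
    ],
}

# Sanity-check: weights should sum to 1.0 for a full distribution.
total = sum(k["weight"] for k in provider_config["keys"])
assert abs(total - 1.0) < 1e-9

payload = json.dumps(provider_config)
# POST `payload` to http://localhost:8080/api/providers as in the curl example.
```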
Custom Base URL (OpenAI-Compatible Endpoints)
Route to Azure OpenAI, self-hosted models, or other OpenAI-compatible endpoints.
Via Web UI:
- Go to "Providers" → "OpenAI" → "Advanced"
- Set "Base URL": https://your-deployment.openai.azure.com
- Save
Via API:
curl -X POST http://localhost:8080/api/providers \
-H "Content-Type: application/json" \
-d '{
"provider": "openai",
"keys": [
{
"name": "azure-openai",
"value": "your-azure-key",
"weight": 1.0
}
],
"network_config": {
"base_url": "https://your-deployment.openai.azure.com"
}
}'
Retry Configuration
Configure exponential backoff for transient failures.
Via Web UI:
- Go to "Providers" → "OpenAI" → "Advanced"
- Set "Max Retries": 5
- Set "Initial Backoff": 1ms
- Set "Max Backoff": 10000ms
- Save
Via API:
curl -X POST http://localhost:8080/api/providers \
-H "Content-Type: application/json" \
-d '{
"provider": "openai",
"keys": [
{
"name": "openai-key-1",
"value": "sk-...",
"weight": 1.0
}
],
"network_config": {
"max_retries": 5,
"retry_backoff_initial_ms": 1,
"retry_backoff_max_ms": 10000
}
}'
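With these settings, exponential backoff roughly doubles the wait between attempts, capped at the maximum. Bifrost's exact formula is internal to the gateway; this sketch only illustrates how the configured bounds shape the retry schedule:

```python
def backoff_schedule(max_retries: int, initial_ms: int, max_ms: int) -> list[int]:
    """Exponential backoff: double the wait each attempt, capped at max_ms."""
    return [min(initial_ms * 2 ** attempt, max_ms) for attempt in range(max_retries)]

# The configuration above: 5 retries, 1 ms initial backoff, 10000 ms cap.
print(backoff_schedule(5, 1, 10000))  # [1, 2, 4, 8, 16]

# A larger initial backoff hits the cap quickly.
print(backoff_schedule(4, 1000, 4000))  # [1000, 2000, 4000, 4000]
```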
Request Timeout
Set custom timeouts for long-running requests.
Via Web UI:
- Go to "Providers" → "OpenAI" → "Advanced"
- Set "Timeout": 30 seconds
- Save
Via API:
curl -X POST http://localhost:8080/api/providers \
-H "Content-Type: application/json" \
-d '{
"provider": "openai",
"keys": [
{
"name": "openai-key-1",
"value": "sk-...",
"weight": 1.0
}
],
"network_config": {
"default_request_timeout_in_seconds": 30
}
}'
Custom Headers
Pass custom headers to upstream providers.
Via API:
curl -X POST http://localhost:8080/api/providers \
-H "Content-Type: application/json" \
-d '{
"provider": "openai",
"keys": [
{
"name": "openai-key-1",
"value": "sk-...",
"weight": 1.0
}
],
"network_config": {
"extra_headers": {
"x-user-id": "123",
"x-tenant-id": "acme-corp"
}
}
}'
HTTP Proxy Configuration
Route requests through corporate proxies.
Via Web UI:
- Go to "Providers" → "OpenAI" → "Proxy"
- Select "Proxy Type": HTTP or SOCKS5
- Set "Proxy URL": http://proxy.company.com:8080
- Add credentials if needed
- Save
Via API:
curl -X POST http://localhost:8080/api/providers \
-H "Content-Type: application/json" \
-d '{
"provider": "openai",
"keys": [
{
"name": "openai-key-1",
"value": "sk-...",
"weight": 1.0
}
],
"proxy_config": {
"type": "http",
"url": "http://proxy.company.com:8080",
"username": "user",
"password": "pass"
}
}'
Production Features
Semantic Caching (40-60% Cost Reduction)
Enable semantic caching to reduce redundant API calls.
Via Web UI:
- Go to "Features" → "Semantic Caching"
- Toggle "Enable Semantic Caching"
- Set "Similarity Threshold": 0.85 (0.8-0.95 recommended)
- Set "TTL": 300s (5 minutes)
- Save
How It Works:
# First request - hits OpenAI
response1 = client.chat.completions.create(
model="gpt-4o-mini",
messages=[{"role": "user", "content": "What are your business hours?"}]
)
# Second request (similar) - returns cached response
response2 = client.chat.completions.create(
model="gpt-4o-mini",
messages=[{"role": "user", "content": "When are you open?"}]
)
# Returns cached response in <1ms, no API call to OpenAI
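Under the hood, semantic caching embeds each prompt and serves a cached response when a new prompt's embedding is close enough to a stored one. A minimal sketch of that threshold check using cosine similarity (toy vectors, not real embeddings, and not Bifrost's implementation):

```python
import math

def cosine_similarity(a: list[float], b: list[float]) -> float:
    """Cosine similarity between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(x * x for x in b))
    return dot / norm

def cache_hit(query_vec, cached_vec, threshold=0.85) -> bool:
    """Serve from cache when similarity meets the configured threshold."""
    return cosine_similarity(query_vec, cached_vec) >= threshold

# Toy embeddings: near-duplicate prompts have high similarity...
print(cache_hit([1.0, 0.1, 0.0], [1.0, 0.12, 0.01]))  # True
# ...while unrelated prompts do not.
print(cache_hit([1.0, 0.0, 0.0], [0.0, 1.0, 0.0]))    # False
```

This is why the threshold matters: lower values (toward 0.8) cache more aggressively; higher values (toward 0.95) only match near-identical prompts.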
Virtual Keys (Team-Based Access Control)
Create separate API keys for different teams with custom budgets and rate limits.
Via Web UI:
- Go to "Virtual Keys"
- Click "Create Virtual Key"
- Set name: "team-frontend"
- Set budget: $100/month
- Set rate limit: 1000 requests/hour
- Save
Usage:
client = OpenAI(
base_url="http://localhost:8080/v1",
api_key="vk-team-frontend" # Use virtual key instead of provider key
)
Automatic Failover
Add backup providers for resilience.
Via Web UI:
- Go to "Providers"
- Add multiple providers (OpenAI, Azure OpenAI, Anthropic)
- Bifrost automatically creates fallback chains
Example:
curl -X POST http://localhost:8080/v1/chat/completions \
-H "Content-Type: application/json" \
-d '{
"model": "gpt-4o-mini",
"messages": [{"role": "user", "content": "Hello!"}]
}'
# If OpenAI fails, automatically retries with Azure OpenAI
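Conceptually, the failover Bifrost performs is a fallback chain: try each provider in order and move on when one raises. A stub-based sketch of that behavior (illustration only, not Bifrost code):

```python
def call_with_fallback(providers, request):
    """Try each (name, callable) provider in order; return the first success."""
    last_error = None
    for name, call in providers:
        try:
            return name, call(request)
        except Exception as exc:  # provider outage, rate limit, timeout...
            last_error = exc
    raise RuntimeError("all providers failed") from last_error

# Stub providers: the primary fails, the backup answers.
def openai_stub(req):
    raise ConnectionError("OpenAI outage")

def azure_stub(req):
    return "Hello from the backup!"

provider, result = call_with_fallback(
    [("openai", openai_stub), ("azure", azure_stub)], {"prompt": "Hello!"}
)
print(provider, result)  # azure Hello from the backup!
```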
Monitoring and Observability
Built-in Dashboard
Navigate to http://localhost:8080:
- Real-time request logs
- Token usage per model
- Cost tracking per provider
- Latency visualization
- Error rates
Prometheus Metrics
Bifrost exposes metrics at http://localhost:8080/metrics:
curl http://localhost:8080/metrics
Key Metrics:
- bifrost_requests_total: Total requests by provider/model
- bifrost_request_duration_seconds: Latency percentiles
- bifrost_tokens_total: Token usage (prompt/completion)
- bifrost_cost_total: Cost in USD
Example Prometheus Query:
# Request rate by model
sum(rate(bifrost_requests_total[5m])) by (model)
# Average latency by provider
sum(rate(bifrost_request_duration_seconds_sum[5m])) by (provider) / sum(rate(bifrost_request_duration_seconds_count[5m])) by (provider)
# Total cost last hour
sum(increase(bifrost_cost_total[1h]))
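The /metrics endpoint returns the standard Prometheus text exposition format, so you can also consume it without a Prometheus server. A sketch that parses counter samples from an example scrape (metric names and labels here are illustrative, following the list above):

```python
def parse_metrics(text: str) -> dict[str, float]:
    """Parse 'name{labels} value' sample lines from Prometheus text format."""
    samples = {}
    for line in text.splitlines():
        line = line.strip()
        if not line or line.startswith("#"):
            continue  # skip blank lines and HELP/TYPE comments
        name_and_labels, value = line.rsplit(" ", 1)
        samples[name_and_labels] = float(value)
    return samples

example = """
# TYPE bifrost_requests_total counter
bifrost_requests_total{provider="openai",model="gpt-4o-mini"} 1042
bifrost_cost_total{provider="openai"} 3.17
"""

metrics = parse_metrics(example)
print(metrics['bifrost_requests_total{provider="openai",model="gpt-4o-mini"}'])
```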
OpenTelemetry Tracing
Bifrost supports OpenTelemetry for distributed tracing.
Configuration Options
Command-Line Flags
# Custom port
npx -y @maximhq/bifrost -port 3000
# Custom host
npx -y @maximhq/bifrost -host 0.0.0.0
# Debug logging
npx -y @maximhq/bifrost -log-level debug
# Pretty logs (not JSON)
npx -y @maximhq/bifrost -log-style pretty
# Custom data directory
npx -y @maximhq/bifrost -app-dir ./my-bifrost-data
Docker Environment Variables
docker run -p 8080:8080 \
-e APP_PORT=8080 \
-e APP_HOST=0.0.0.0 \
-e LOG_LEVEL=info \
-e LOG_STYLE=json \
maximhq/bifrost
Configuration File (config.json)
For GitOps workflows, create config.json:
{
"providers": {
"openai": {
"keys": [
{
"name": "openai-key-1",
"value": "env.OPENAI_API_KEY",
"weight": 1.0
}
],
"network_config": {
"max_retries": 5,
"retry_backoff_initial_ms": 1,
"retry_backoff_max_ms": 10000
}
}
}
}
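The `"env.OPENAI_API_KEY"` value tells Bifrost to read the key from the environment rather than storing it in the file. A sketch of that resolution rule as we read the convention (not Bifrost source):

```python
import os

def resolve_value(value: str) -> str:
    """Resolve 'env.<NAME>' references to environment variables."""
    if value.startswith("env."):
        return os.environ[value[len("env."):]]
    return value

# With OPENAI_API_KEY set in the environment, the reference resolves;
# plain strings pass through unchanged.
os.environ.setdefault("OPENAI_API_KEY", "sk-placeholder")
print(resolve_value("env.OPENAI_API_KEY"))
print(resolve_value("literal-value"))  # literal-value
```

This keeps secrets out of version control, which is the point of the GitOps workflow.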
Run with:
npx -y @maximhq/bifrost -app-dir ./my-config
Common Use Cases
1. Cost Optimization
Setup: Enable semantic caching
Result: 40-60% reduction in API costs
2. High Availability
Setup: Configure OpenAI + Azure OpenAI with automatic failover
Result: 99.99% uptime through multi-provider redundancy
3. Multi-Team Governance
Setup: Create virtual keys per team with budgets
Result: Prevent cost overruns, track spend by team
4. Development vs Production
Setup: Separate virtual keys for dev (rate limited) and prod (high limits)
Result: Environment isolation enforced at infrastructure level
5. Compliance & Auditing
Setup: Self-hosted deployment with complete request logging
Result: Full audit trail, data never leaves your infrastructure
Troubleshooting
Issue: "Connection refused"
Solution: Ensure Bifrost is running at http://localhost:8080
Issue: "Invalid API key"
Solution: Check API key in Web UI → Providers → OpenAI
Issue: "Rate limited"
Solution: Add multiple API keys for load balancing
Issue: "Timeout errors"
Solution: Increase timeout in Advanced settings
Issue: "Cannot access Web UI"
Solution: Check firewall, ensure port 8080 is open
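Most "Connection refused" and "Cannot access Web UI" reports come down to nothing listening on the port. A quick stdlib check, using the default host and port from this tutorial (`port_open` is our helper name):

```python
import socket

def port_open(host: str = "localhost", port: int = 8080, timeout: float = 2.0) -> bool:
    """Return True if a TCP connection to host:port succeeds."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        return False

if port_open():
    print("Bifrost is reachable on port 8080")
else:
    print("Nothing listening on port 8080; is Bifrost running?")
```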
Next Steps
Enable semantic caching: Reduce costs by 40-60%
Add backup providers: Configure automatic failover to Azure/Anthropic
Set up virtual keys: Team-based budgets and access control
Integrate monitoring: Connect Prometheus/Grafana for metrics
Deploy to production: Kubernetes/Docker Compose for high availability
Resources
Documentation: https://getmax.im/bifrostdocs
GitHub: https://git.new/bifrost
Quick Links:
- Provider configuration: https://getmax.im/bifrostdocs (search "providers")
- Semantic caching: https://getmax.im/bifrostdocs (search "caching")
- Virtual keys: https://getmax.im/bifrostdocs (search "virtual keys")
Summary: Setting up an OpenAI API proxy with Bifrost takes about 30 seconds (npx -y @maximhq/bifrost). You get 40-60% cost reduction through semantic caching, automatic failover for high availability, complete observability with Prometheus metrics, and zero vendor lock-in through self-hosted deployment.