# AI API Gateway
Stop hard-coding provider-specific API calls throughout your codebase. This gateway gives you a single unified interface to OpenAI, Anthropic, Google, Mistral, and local models — with automatic fallback routing, response caching, rate limiting, and real-time usage analytics. Switch providers, manage costs, and add resilience without changing a single line of application code.
## Key Features
- Unified API Interface — One consistent request/response format across OpenAI, Anthropic, Google Gemini, Mistral, and Ollama
- Automatic Fallback Routing — Define provider priority chains; if the primary provider fails or hits rate limits, requests route to the next provider seamlessly
- Response Caching — Cache identical prompts with configurable TTL to slash costs on repeated queries (Redis or in-memory)
- Rate Limiting — Per-user, per-model, and global rate limits with token bucket algorithm
- Usage Analytics Dashboard — Track tokens, latency, cost, and error rates per provider/model/user in real time
- Request/Response Middleware — Plug in custom transforms (PII scrubbing, logging, prompt injection detection) as middleware
- Streaming Support — Full SSE streaming passthrough with provider-agnostic event format
- API Key Rotation — Rotate provider API keys without downtime via hot-reload configuration
## Quick Start

```python
from ai_gateway import Gateway, Provider

# 1. Configure providers
gateway = Gateway(
    providers=[
        Provider(
            name="openai",
            api_key="YOUR_OPENAI_KEY_HERE",
            models=["gpt-4o", "gpt-4o-mini"],
            priority=1,
        ),
        Provider(
            name="anthropic",
            api_key="YOUR_ANTHROPIC_KEY_HERE",
            models=["claude-sonnet-4-20250514"],
            priority=2,  # Fallback when OpenAI is unavailable
        ),
    ],
    cache_backend="redis",
    cache_ttl=3600,
)

# 2. Make requests — same interface regardless of provider
response = gateway.chat(
    model="gpt-4o",
    messages=[{"role": "user", "content": "Explain quantum computing in one paragraph."}],
    max_tokens=200,
)

print(response.content)
print(f"Provider: {response.provider}, Cost: ${response.cost:.4f}")
```
## Architecture

```
Client Request
      │
      ▼
┌──────────────┐
│ Rate Limiter │──── Reject (429) if over limit
└──────┬───────┘
       ▼
┌──────────────┐
│ Cache Check  │──── Return cached response if hit
└──────┬───────┘
       ▼
┌──────────────┐
│  Middleware  │──── Pre-process (PII scrub, logging, validation)
│   Pipeline   │
└──────┬───────┘
       ▼
┌──────────────┐      ┌──────────┐
│    Router    │────▶│Provider A│──── Success ──▶ Response
│              │      └──────────┘
│              │──── Failure ────▶┌──────────┐
│              │                  │Provider B│──── Fallback
│              │                  └──────────┘
└──────┬───────┘
       ▼
┌──────────────┐
│  Analytics   │──── Log tokens, latency, cost, errors
└──────────────┘
```
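To make the flow above concrete, here is a toy sketch of the same pipeline with in-memory stand-ins for each stage. The class and function names (`SimpleLimiter`, `MemoryCache`, `handle`) are illustrative only, not the gateway's real API.

```python
class SimpleLimiter:
    """Counts requests against a fixed limit; a real gateway would use a token bucket."""
    def __init__(self, limit):
        self.limit = limit
        self.used = 0

    def allow(self):
        self.used += 1
        return self.used <= self.limit

class MemoryCache:
    """Dict-backed cache keyed on the request's sorted fields."""
    def __init__(self):
        self.store = {}

    def key(self, request):
        return repr(sorted(request.items()))

    def get(self, key):
        return self.store.get(key)

    def set(self, key, value):
        self.store[key] = value

def handle(request, limiter, cache, middlewares, providers):
    if not limiter.allow():                 # 1. Rate Limiter
        return {"error": 429}
    key = cache.key(request)
    cached = cache.get(key)
    if cached is not None:                  # 2. Cache Check
        return cached
    for mw in middlewares:                  # 3. Middleware Pipeline
        request = mw(request)
    for provider in providers:              # 4. Router with fallback
        try:
            response = provider(request)
        except Exception:
            continue                        # provider failed, try the next one
        cache.set(key, response)            # 5. Analytics logging would go here
        return response
    return {"error": "all providers failed"}
```

Note that the cache key is computed before middleware runs, so a scrubbed and an unscrubbed copy of the same prompt still hit the same cache entry.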
## Usage Examples
### Fallback Routing with Cost Controls

```python
from ai_gateway import Gateway, RoutingPolicy

gateway = Gateway(
    routing=RoutingPolicy(
        primary="openai/gpt-4o",
        fallbacks=["anthropic/claude-sonnet-4-20250514", "mistral/mistral-large"],
        fallback_on=["rate_limit", "timeout", "server_error"],
        max_cost_per_request=0.05,  # Skip expensive models if budget exceeded
    )
)
```
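One plausible way a `max_cost_per_request` cap could interact with the fallback chain is to drop any model whose worst-case output cost exceeds the budget before routing. This is a hedged sketch, and the per-token prices below are made-up examples, not real rates:

```python
# Illustrative $/1K output tokens — NOT actual provider pricing
PRICE_PER_1K_OUTPUT = {
    "openai/gpt-4o": 0.010,
    "anthropic/claude-sonnet-4-20250514": 0.015,
    "mistral/mistral-large": 0.006,
}

def eligible_models(chain, max_tokens, max_cost):
    """Keep only models whose worst-case output cost fits the budget."""
    kept = []
    for model in chain:
        worst_case = PRICE_PER_1K_OUTPUT[model] * max_tokens / 1000
        if worst_case <= max_cost:
            kept.append(model)
    return kept
```

With `max_tokens=4000` and a `0.05` budget, the mid-priced model drops out of the chain while the cheaper ones remain eligible.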
### Streaming Responses

```python
for chunk in gateway.chat_stream(
    model="gpt-4o",
    messages=[{"role": "user", "content": "Write a haiku about Python."}],
):
    print(chunk.delta, end="", flush=True)
```
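The provider-agnostic `chunk.delta` implies a normalization layer that maps each provider's streaming event shape onto one field. A minimal sketch, assuming simplified OpenAI-style (`choices[0].delta.content`) and Anthropic-style (`content_block_delta`) chunk shapes:

```python
def normalize_chunk(provider, event):
    """Extract the text delta from a provider-specific streaming event."""
    if provider == "openai":
        # OpenAI-style chunks carry text in choices[0].delta.content
        return event["choices"][0]["delta"].get("content", "")
    if provider == "anthropic":
        # Anthropic-style streams interleave event types; only
        # content_block_delta events carry text
        if event.get("type") == "content_block_delta":
            return event["delta"].get("text", "")
        return ""
    raise ValueError(f"unknown provider: {provider}")
```

Downstream code then sees a plain string delta regardless of which provider served the request.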
### Custom Middleware

```python
from ai_gateway.middleware import Middleware

class PIIScrubber(Middleware):
    """Remove PII from prompts before sending to providers."""

    def pre_request(self, request):
        for msg in request.messages:
            msg["content"] = self.redact_emails(msg["content"])
            msg["content"] = self.redact_phones(msg["content"])
        return request

gateway.add_middleware(PIIScrubber())
```
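The `redact_emails` and `redact_phones` helpers are left undefined above. Possible regex-based implementations might look like the following; these patterns are deliberately simple, and production PII detection needs considerably more care:

```python
import re

# Simplified patterns — assumptions for illustration, not robust PII detection
EMAIL_RE = re.compile(r"[\w.+-]+@[\w-]+\.[\w.-]+")
PHONE_RE = re.compile(r"\+?\d[\d\s().-]{7,}\d")

def redact_emails(text):
    """Replace email-like substrings with a placeholder."""
    return EMAIL_RE.sub("[EMAIL]", text)

def redact_phones(text):
    """Replace phone-number-like substrings with a placeholder."""
    return PHONE_RE.sub("[PHONE]", text)
```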
### Usage Analytics

```python
stats = gateway.analytics.summary(period="24h")

print(f"Total requests: {stats.total_requests}")
print(f"Total cost: ${stats.total_cost:.2f}")
print(f"Avg latency: {stats.avg_latency_ms:.0f}ms")
print(f"Cache hit rate: {stats.cache_hit_rate:.1%}")

for provider in stats.by_provider:
    print(f"  {provider.name}: {provider.requests} reqs, {provider.error_rate:.1%} errors")
## Configuration

```yaml
# gateway_config.yaml
providers:
  openai:
    api_key: "${OPENAI_API_KEY}"
    base_url: "https://api.example.com/v1/"
    models: ["gpt-4o", "gpt-4o-mini"]
    timeout_seconds: 30
    max_retries: 2
    priority: 1
  anthropic:
    api_key: "${ANTHROPIC_API_KEY}"
    models: ["claude-sonnet-4-20250514"]
    timeout_seconds: 60
    priority: 2

rate_limiting:
  global_rpm: 1000       # Requests per minute across all users
  per_user_rpm: 60
  per_model_rpm: 500
  algorithm: "token_bucket"

cache:
  backend: "redis"       # redis | memory | disabled
  redis_url: "redis://localhost:6379/0"
  ttl_seconds: 3600
  max_cache_size_mb: 512
  hash_strategy: "content"   # content | content+model | full_request

analytics:
  enabled: true
  storage: "sqlite"      # sqlite | postgres
  retention_days: 90
  dashboard_port: 8080

middleware:
  - pii_scrubber
  - request_logger
  - prompt_injection_detector
```
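The `algorithm: "token_bucket"` setting refers to the classic rate-limiting scheme: a bucket of capacity `rpm` refills at `rpm / 60` tokens per second, and each request spends one token. A minimal sketch (the class name and constructor are assumptions, not the gateway's internals):

```python
import time

class TokenBucket:
    """Allow up to `rpm` requests per minute, refilling continuously."""

    def __init__(self, rpm):
        self.capacity = rpm
        self.tokens = float(rpm)
        self.rate = rpm / 60.0          # tokens added per second
        self.last = time.monotonic()

    def allow(self):
        now = time.monotonic()
        # Refill proportionally to elapsed time, capped at capacity
        self.tokens = min(self.capacity, self.tokens + (now - self.last) * self.rate)
        self.last = now
        if self.tokens >= 1:
            self.tokens -= 1
            return True
        return False
```

Unlike a fixed-window counter, a token bucket tolerates short bursts up to the bucket capacity while enforcing the average rate over time.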
## Best Practices

- Set per-user rate limits — Prevent a single user from exhausting your entire API quota.
- Cache aggressively for deterministic queries — With `temperature: 0`, the same prompt always yields the same result. Cache it.
- Use the cheapest model that works — Route simple tasks to `gpt-4o-mini` and reserve `gpt-4o` for complex reasoning.
- Monitor error rates per provider — A sudden spike in 500s from one provider means your fallback chain is earning its keep.
- Rotate API keys on a schedule — Use hot-reload config to rotate keys monthly without gateway restarts.
- Test fallback paths — Intentionally disable your primary provider in staging to verify fallback routing works end-to-end.
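Aggressive caching only pays off if equivalent requests actually hash to the same key, which is what the `hash_strategy` options in the configuration control. A sketch of what each strategy might fold into the key (the field names `model` and `messages` mirror the chat call above; the hashing scheme itself is an assumption):

```python
import hashlib
import json

def cache_key(request, strategy="content"):
    """Derive a cache key from a request dict under a given hash strategy."""
    if strategy == "content":
        # Only the message text — volatile metadata is ignored
        material = [m["content"] for m in request["messages"]]
    elif strategy == "content+model":
        # Same content on a different model gets a different key
        material = [request["model"]] + [m["content"] for m in request["messages"]]
    else:  # "full_request": every field, including request IDs and timestamps
        material = request
    blob = json.dumps(material, sort_keys=True)
    return hashlib.sha256(blob.encode()).hexdigest()
```

This is also why the Troubleshooting table recommends `hash_strategy: "content"` when repeated prompts never hit: under `full_request`, a fresh request ID changes the key every time.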
## Troubleshooting

| Problem | Cause | Fix |
|---|---|---|
| All requests return `429 Too Many Requests` | Global rate limit too low for traffic volume | Increase `global_rpm` in config or add more provider API keys |
| Cache never hits despite repeated prompts | Message metadata (timestamps, request IDs) differs between calls | Set `hash_strategy: "content"` to hash only message content |
| Fallback provider returns format errors | Response schemas differ between providers | Ensure `response_format` normalization is enabled in middleware |
| Analytics dashboard shows $0 cost | Cost calculation requires a model pricing table | Update `pricing.yaml` with current per-token rates for each model |
This is 1 of 11 resources in the AI Builder Pro toolkit. Get the complete [AI API Gateway] with all files, templates, and documentation for $39.
Or grab the entire AI Builder Pro bundle (11 products) for $169 — save 30%.