## The Problem
Like many indie developers, I've been building small AI-powered projects over the past year. And like many of you, I kept running into the same frustrating issues:
- **Rate limiting** — `429 Too Many Requests` became a daily sight
- **Multiple API keys** — one for GPT, one for Claude, one for Gemini... managing them all was a mess
- **Regional restrictions** — certain models simply weren't available from my location
- **Unpredictable costs** — hard to track spending across different providers
Every time I hit one of these walls, I'd spend hours debugging infrastructure instead of building actual features. That's when I decided to solve this once and for all.
## The Solution
I built **ourhubapi.com** — a unified API gateway that acts as a smart relay between your application and multiple LLM providers.
Here's the core idea:
```
[Your App] --> [Single API Endpoint] --> [Smart Router] --> [GPT/Claude/Gemini/...]
                                               |
                                               --> [Auto-failover when rate-limited]
```
Instead of calling each provider directly, your app talks to **one endpoint**. The gateway handles everything else behind the scenes.
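The "Smart Router" box has to decide which provider family should receive each request. Here is a minimal sketch of that first step, assuming the decision is made from the `model` field by name prefix. The prefix table, the provider labels, and `pick_provider` / `UnknownModelError` are hypothetical illustrations, not the gateway's actual configuration.

```python
# Hypothetical prefix table mapping model names to provider families.
# Illustrative only; a real gateway would load this from configuration.
PROVIDER_PREFIXES = {
    "gpt-": "openai",
    "claude-": "anthropic",
    "gemini-": "google",
}


class UnknownModelError(ValueError):
    """Raised when no provider is registered for the requested model."""


def pick_provider(model: str) -> str:
    """Return the provider family for a model name such as 'gpt-4o'."""
    for prefix, provider in PROVIDER_PREFIXES.items():
        if model.startswith(prefix):
            return provider
    raise UnknownModelError(f"no provider registered for model {model!r}")


print(pick_provider("gpt-4o"))  # openai
```

Once the provider family is known, the failover logic described under Smart Load Balancing is what chooses a concrete account within it.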
## Key Technical Decisions
### 1. Smart Load Balancing
The most critical feature is automatic failover. When one upstream account hits a rate limit, the router immediately switches to another available account. As long as at least one account still has capacity, your app never sees a `429` error.
Here's a simplified version of the routing logic:
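```python
def route_request(model, messages):
    # Accounts that serve this model and aren't currently rate-limited
    upstreams = get_available_upstreams(model)
    for upstream in upstreams:
        try:
            response = upstream.call(messages)
            return response
        except RateLimitError:
            # Take this account out of rotation, then try the next one
            mark_rate_limited(upstream)
            continue
    # Every candidate account was rate-limited
    raise AllUpstreamsBusy()
```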
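The routing function above leaves `get_available_upstreams` and `mark_rate_limited` undefined. One way to fill them in, sketched here under the assumption that a rate-limited account is simply benched for a fixed cooldown, is to store a "blocked until" timestamp on each upstream. The `Upstream` and `Router` classes, the `cooldown_s` value, and the fake `call` method are hypothetical stand-ins for real provider clients.

```python
import time
from dataclasses import dataclass


class RateLimitError(Exception):
    """Raised by an upstream when the provider answers 429."""


class AllUpstreamsBusy(Exception):
    """Raised when every upstream for a model is rate-limited or cooling down."""


@dataclass
class Upstream:
    name: str
    models: set
    fail_with_429: bool = False  # hypothetical hook simulating a provider 429
    blocked_until: float = 0.0   # clock value until which this account is benched

    def call(self, messages):
        if self.fail_with_429:
            raise RateLimitError(self.name)
        return f"{self.name} handled {len(messages)} message(s)"


class Router:
    def __init__(self, upstreams, cooldown_s=30.0, clock=time.monotonic):
        self.upstreams = upstreams
        self.cooldown_s = cooldown_s
        self.clock = clock

    def get_available_upstreams(self, model):
        # Only accounts that serve the model and whose cooldown has expired
        now = self.clock()
        return [u for u in self.upstreams
                if model in u.models and u.blocked_until <= now]

    def mark_rate_limited(self, upstream):
        # Bench the account so later requests skip it until the cooldown ends
        upstream.blocked_until = self.clock() + self.cooldown_s

    def route_request(self, model, messages):
        for upstream in self.get_available_upstreams(model):
            try:
                return upstream.call(messages)
            except RateLimitError:
                self.mark_rate_limited(upstream)
        raise AllUpstreamsBusy(model)


router = Router([
    Upstream("account-a", {"gpt-4o"}, fail_with_429=True),
    Upstream("account-b", {"gpt-4o"}),
])
print(router.route_request("gpt-4o", [{"role": "user", "content": "Hello!"}]))
# account-b handled 1 message(s)
```

A fixed cooldown is the simplest policy to reason about; honoring a provider's `Retry-After` header, when one is sent, would be a natural refinement.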
### 2. Drop-in OpenAI SDK Compatibility
The API is fully compatible with the OpenAI SDK format, so switching only means changing how the client is constructed: a different key and a `base_url`.
```python
from openai import OpenAI

# Before: calling OpenAI directly
client = OpenAI(api_key="sk-...")

# After: routing through the gateway
client = OpenAI(
    api_key="your-ourhubapi-key",
    base_url="https://api.ourhubapi.com/v1",
)

# Everything else stays the same
response = client.chat.completions.create(
    model="gpt-4o",
    messages=[{"role": "user", "content": "Hello!"}],
)
```
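Because the gateway speaks the OpenAI format, the SDK is a convenience rather than a requirement. The sketch below uses only Python's standard library to *build* (not send) roughly the request the SDK call above turns into: a `POST` to `{base_url}/chat/completions` with a bearer token and a JSON body. It assumes the gateway mirrors OpenAI's public Chat Completions wire format, which is what drop-in compatibility implies; `build_chat_request` is a hypothetical helper and the key is a placeholder.

```python
import json
import urllib.request

BASE_URL = "https://api.ourhubapi.com/v1"  # from the SDK example above
API_KEY = "your-ourhubapi-key"             # placeholder key


def build_chat_request(model, messages, base_url=BASE_URL, api_key=API_KEY):
    """Build (but do not send) an OpenAI-style Chat Completions request."""
    body = json.dumps({"model": model, "messages": messages}).encode("utf-8")
    return urllib.request.Request(
        url=f"{base_url.rstrip('/')}/chat/completions",
        data=body,
        headers={
            "Authorization": f"Bearer {api_key}",
            "Content-Type": "application/json",
        },
        method="POST",
    )


req = build_chat_request("gpt-4o", [{"role": "user", "content": "Hello!"}])
print(req.get_method(), req.full_url)
# POST https://api.ourhubapi.com/v1/chat/completions

# Sending it would need a valid key and network access:
# with urllib.request.urlopen(req) as resp:
#     reply = json.load(resp)
#     print(reply["choices"][0]["message"]["content"])
```

This is also a quick way to check compatibility from languages or environments where the OpenAI SDK isn't available.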
### 3. Usage Quotas per API Key
For small teams, cost control is essential. Each API key can have:
- Spending caps (daily / monthly)
- Rate limits (requests per minute)
- Model access control (enable only what the team needs)
This way, you can give keys to team members without worrying about surprise bills.
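Here is a minimal sketch of how the three per-key limits could be checked before a request is forwarded upstream. The `KeyPolicy` / `KeyUsage` structures, the 60-second sliding window for requests per minute, and the idea that the caller supplies an estimated cost in USD are all assumptions for illustration; the post doesn't say how the gateway actually meters spend or resets its counters.

```python
import time
from collections import deque
from dataclasses import dataclass, field


class QuotaExceeded(Exception):
    """Raised when a request would break one of the key's limits."""


@dataclass
class KeyPolicy:
    daily_cap_usd: float
    monthly_cap_usd: float
    requests_per_minute: int
    allowed_models: frozenset


@dataclass
class KeyUsage:
    spent_today_usd: float = 0.0
    spent_month_usd: float = 0.0
    recent: deque = field(default_factory=deque)  # timestamps of recent requests


def admit(policy, usage, model, est_cost_usd, now=None):
    """Check every limit, then record the request. Raises QuotaExceeded."""
    now = time.monotonic() if now is None else now
    # Model access control
    if model not in policy.allowed_models:
        raise QuotaExceeded(f"model {model!r} is not enabled for this key")
    # Requests per minute: drop timestamps older than the 60 s window
    while usage.recent and now - usage.recent[0] >= 60:
        usage.recent.popleft()
    if len(usage.recent) >= policy.requests_per_minute:
        raise QuotaExceeded("too many requests this minute")
    # Spending caps
    if usage.spent_today_usd + est_cost_usd > policy.daily_cap_usd:
        raise QuotaExceeded("daily spending cap reached")
    if usage.spent_month_usd + est_cost_usd > policy.monthly_cap_usd:
        raise QuotaExceeded("monthly spending cap reached")
    # All checks passed: record the request
    usage.recent.append(now)
    usage.spent_today_usd += est_cost_usd
    usage.spent_month_usd += est_cost_usd


team_policy = KeyPolicy(daily_cap_usd=5.0, monthly_cap_usd=50.0,
                        requests_per_minute=30,
                        allowed_models=frozenset({"gpt-4o"}))
member_usage = KeyUsage()
admit(team_policy, member_usage, "gpt-4o", est_cost_usd=0.002)
```

Resetting the daily and monthly counters is left out for brevity, and the counters live in process memory; with several gateway instances they would have to move to a shared store.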
## Why Not Just Use the Official APIs?
A fair question. If you're using a single model with low traffic, the official API might work fine. But once you:
- Need multiple models in one project
- Hit rate limits during development
- Want predictable costs across a team
Having a middleware layer becomes genuinely useful. It's the same reason we use load balancers for web servers — redundancy and simplicity.
## What I Learned
Building this taught me a lot about:
- Handling distributed rate limits gracefully
- Designing APIs that developers actually want to use
- The importance of "it just works" over feature overload
## Try It Out
The service is live at ourhubapi.com. I'd love to hear your feedback: what features would make this useful for your own projects?
This is very much a v1, built by a developer for developers. If you have thoughts, criticisms, or feature requests, drop a comment below. I'm reading every single one.