Originally published at Building Maester — Enable Multi-provider LLM APIs.
We Locked Ourselves Into GCP
Most infrastructure mistakes don’t start as mistakes. They start as reasonable decisions. This one started with a discount.
It Worked Beautifully
In the beginning, the decision felt obvious. We had a large GCP startup credit, so our entire stack ran there.
Compute.
Storage.
Data pipelines.
Model training.
...
Everything.
And honestly, it worked beautifully! Monitoring was already integrated.
Identity management was built in. IAM policies were easy to manage.
Even LDAP integration was already available.
One of my teammates said something that sounded perfectly reasonable:
“Don’t reinvent the wheel.”
And he was right. Why build infrastructure when the cloud already solved it?
We were a small team. Most of our compute was tied to token usage, so costs looked predictable. Everything felt lightweight.
So we did what most startups do. We committed.
Where Did the Cost Come From?
"Don't spend like a billionaire with the company's money !"
What? We were all so much confused with the bill complaints at a Monday morning standup meeting months later. The bill arrived and nobody could clearly explain it. It was the cloud bill. And it started eating into margins.
Where did the cost come from?
Storage?
Network egress?
Pipelines?
Inference traffic?
Someone suggested hiring a cloud optimization engineer. Another suggested redesigning the entire data pipeline.
But we were still a startup. Every time we opened the roadmap we saw something else staring at us:
- Customer requests.
- Feature releases.
- Revenue milestones.
Infrastructure work always lost that fight. So the bills kept climbing. We weren't bankrupt. But we were trapped.
We Split the Stack
Eventually we did something radical. We split the stack. The architecture finally looked like this:
Azure → Identity / Compliance
AWS → Applications / Storage
GCP → Data Pipelines / Training
And the cost?
Still expensive. But predictable. Even without our original startup discount, the system became easier to control.
Vendor lock-in is invisible when things work. It becomes obvious only when you try to leave.
We Are Not Going to Lock into OpenAI
So when we started building the AI APIs, I began seeing the same pattern again.
It was just:
response = openai.responses.create(...)
And honestly, that works.
But I kept remembering the GCP moment. The moment when switching vendors became impossible. We were about to repeat the same mistake.
Except this time the vendor was not a cloud. It was a model provider. So I made a decision. We are not going to lock into OpenAI.
Approach 1 — Let the client choose the model
The simplest idea was letting the client select the model.
POST /generate
{
  "model": "gpt-4.1-mini"
}
This allowed switching between providers.
- OpenAI
- Anthropic
- Others later
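Stripped of the web framework, the idea can be sketched in a few lines of plain Python (the model-to-provider mapping and the handler name are illustrative, not Maester's actual code):

```python
# Hypothetical sketch: the client picks the model, and we map each
# model name to the vendor that serves it. Entries are illustrative.
PROVIDER_BY_MODEL = {
    "gpt-4.1-mini": "openai",
    "claude-sonnet-4": "anthropic",
}

def handle_generate(payload: dict) -> dict:
    """Dispatch a /generate request based on the client-chosen model."""
    model = payload.get("model")
    provider = PROVIDER_BY_MODEL.get(model)
    if provider is None:
        return {"error": f"unknown model: {model}"}
    # Real code would call the chosen provider's SDK here.
    return {"provider": provider, "model": model}
```

The dispatch itself is trivial; the problem is that the burden of choosing lands on the client.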
Technically it worked. But users quickly complained.
“I don't want to choose the model. I just want the best answer.”
The user is always right. They just wanted results.
Approach 2 — Introduce a Model Gateway
So we moved the decision out of the client. Instead of clients choosing providers, we introduced a Model Gateway.
Application
↓
Model Gateway
↓
Provider Router
↓
Provider Adapter
(OpenAI / Anthropic / others)
This gateway would manage:
- provider routing
- fallback logic
- cost tracking
- observability
- evaluation
The application now simply asks for a response. And the infrastructure decides how to produce it.
The Real Code
The implementation lives inside a small reference project I’ve been building called Maester.
The goal of the project is not to build a full AI platform, but to demonstrate a reliable AI API architecture.
The gateway sits inside the system like this:
apps/
  api/
    routes/
      reliable_completion.py
packages/
  model_gateway/
    base.py
    provider_openai.py
    provider_anthropic.py
    router.py
    client.py
The Provider Contract
The first step was defining a provider interface. This follows the Adapter Pattern, allowing different model vendors to conform to a shared interface.
class ModelProvider(Protocol):
    def supports(self, model: str) -> bool:
        ...

    def generate(self, request: GenerationRequest) -> GenerationResponse:
        ...
Each provider adapter simply implements this contract.
For example:
OpenAIProvider
AnthropicProvider
Both produce the same normalized response.
GenerationResponse
├─ provider
├─ model
├─ content
└─ usage
This means the rest of the system never deals with vendor-specific formats.
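In Python, those normalized shapes can be sketched as two small dataclasses. The field names follow the diagram above, but this is a sketch, not Maester's actual definitions:

```python
from dataclasses import dataclass

# Illustrative sketch of the normalized request/response types.
@dataclass
class GenerationRequest:
    model: str
    prompt: str
    max_tokens: int

@dataclass
class GenerationResponse:
    provider: str   # e.g. "openai" or "anthropic"
    model: str
    content: str
    usage: dict     # token counts, with vendor-agnostic keys
```

Because every adapter returns a `GenerationResponse`, downstream code (logging, metering, evaluation) only ever sees one shape.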
The Router
Next comes the router. The router decides which provider handles a request.
class ModelRouter:
    def route(self, model: str) -> ModelProvider:
        for provider in self.providers:
            if provider.supports(model):
                return provider
        return self.fallback_provider
In production systems this layer can later evolve into:
- cost-aware routing
- latency-aware routing
- capability routing
- traffic shaping
But the interface stays the same.
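As one illustration of that evolution, a cost-aware router can keep the same `route()` signature while picking the cheapest capable provider. This is a hypothetical sketch; the stub provider and the prices are made up:

```python
class StubProvider:
    """Minimal stand-in for a ModelProvider (illustration only)."""
    def __init__(self, name, models):
        self.name = name
        self.models = set(models)

    def supports(self, model: str) -> bool:
        return model in self.models

class CostAwareRouter:
    """Hypothetical variant: the cheapest capable provider wins."""
    def __init__(self, providers, cost_per_1k_tokens, fallback_provider):
        self.providers = providers
        self.costs = cost_per_1k_tokens        # provider name -> price
        self.fallback_provider = fallback_provider

    def route(self, model: str):
        capable = [p for p in self.providers if p.supports(model)]
        if not capable:
            return self.fallback_provider
        return min(capable, key=lambda p: self.costs[p.name])
```

Swapping `ModelRouter` for `CostAwareRouter` changes routing policy without touching the gateway or the application, which is exactly what the stable interface buys you.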
The Gateway Client
Finally the application talks to the gateway through a simple client.
class ModelGateway:
    def generate(self, model: str, prompt: str, max_tokens: int):
        request = GenerationRequest(
            model=model,
            prompt=prompt,
            max_tokens=max_tokens,
        )
        provider = self.router.route(model)
        return provider.generate(request)
The API layer doesn't know which provider was selected. It just receives a normalized response.
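To see the whole flow end to end, here is a runnable sketch that wires the gateway to two stub providers. The stubs stand in for the real OpenAI and Anthropic adapters; everything here is illustrative:

```python
from dataclasses import dataclass

@dataclass
class GenerationRequest:
    model: str
    prompt: str
    max_tokens: int

@dataclass
class GenerationResponse:
    provider: str
    model: str
    content: str
    usage: dict

class StubProvider:
    """Stands in for a real vendor adapter (OpenAI, Anthropic, ...)."""
    def __init__(self, name, models):
        self.name = name
        self.models = set(models)

    def supports(self, model):
        return model in self.models

    def generate(self, request):
        # A real adapter would call the vendor SDK and normalize its reply.
        return GenerationResponse(
            provider=self.name,
            model=request.model,
            content=f"[{self.name}] echo: {request.prompt}",
            usage={"prompt_tokens": len(request.prompt.split())},
        )

class ModelRouter:
    def __init__(self, providers, fallback_provider):
        self.providers = providers
        self.fallback_provider = fallback_provider

    def route(self, model):
        for provider in self.providers:
            if provider.supports(model):
                return provider
        return self.fallback_provider

class ModelGateway:
    def __init__(self, router):
        self.router = router

    def generate(self, model, prompt, max_tokens):
        request = GenerationRequest(model=model, prompt=prompt, max_tokens=max_tokens)
        return self.router.route(model).generate(request)

openai_stub = StubProvider("openai", {"gpt-4.1-mini"})
anthropic_stub = StubProvider("anthropic", {"claude-sonnet-4"})
gateway = ModelGateway(ModelRouter([openai_stub, anthropic_stub], openai_stub))
```

Whichever provider handles the request, the caller always gets back the same `GenerationResponse` shape, with unknown models falling through to the configured fallback.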
The API Layer
The FastAPI route becomes extremely simple.
model_response = model_gateway.generate(
    model=requested_model,
    prompt=payload.prompt,
    max_tokens=payload.max_tokens,
)
After generation, the system runs the reliability pipeline:
- Cost metering
- Evaluation
- Structured logging
Example log:
model_routed
requested_model: gpt-4.1-mini
selected_provider: openai
fallback_used: false
This gives operators visibility without leaking provider logic into application code.
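One way to emit that event is a single JSON line through the standard `logging` module. The field names mirror the example above; the helper itself is hypothetical:

```python
import json
import logging

logger = logging.getLogger("model_gateway")

def log_model_routed(requested_model: str, selected_provider: str,
                     fallback_used: bool) -> str:
    """Serialize a model_routed event as one JSON line and log it."""
    event = {
        "event": "model_routed",
        "requested_model": requested_model,
        "selected_provider": selected_provider,
        "fallback_used": fallback_used,
    }
    line = json.dumps(event)
    logger.info(line)
    return line
```

One JSON object per routing decision makes the logs trivially queryable, which matters once you start asking "how often did we fall back last week?"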
Why This Architecture Matters
This design combines several classic software engineering principles:
- Dependency Inversion: application code depends on abstractions, not providers.
- Adapter Pattern: each vendor SDK is wrapped behind a provider adapter.
- Strategy Pattern: routing policies are interchangeable strategies.
- Separation of Concerns: the API layer handles orchestration; the gateway handles provider logic.
What This Enables Later
Once this boundary exists, the system becomes far easier to evolve.
For example:
- multi-provider fallback
- provider benchmarking
- cost-aware routing
- latency optimization
- evaluation-based routing
All of those changes can happen inside the gateway. The application API never changes. That is the real value of the design.
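Multi-provider fallback, for instance, can live entirely inside the gateway. Here is a hedged sketch; the gateway variant and the test doubles are illustrative, not Maester's code:

```python
class FallbackGateway:
    """Hypothetical gateway variant: try providers in order until one succeeds."""
    def __init__(self, providers):
        self.providers = providers

    def generate(self, request):
        last_error = None
        for provider in self.providers:
            try:
                return provider.generate(request)
            except Exception as exc:   # real code would catch specific vendor errors
                last_error = exc
        raise RuntimeError("all providers failed") from last_error

class FlakyProvider:
    """Test double that always raises, simulating an outage."""
    def generate(self, request):
        raise ConnectionError("provider down")

class EchoProvider:
    """Test double that always answers."""
    def generate(self, request):
        return {"provider": "echo", "content": request}
```

The application still makes one `generate()` call; whether one provider answered or three were tried is invisible above the gateway boundary.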
The Lesson
Vendor lock-in rarely feels dangerous at the beginning. Everything works. Costs look reasonable. The roadmap is full of features.
Then one day something changes. Prices rise. Performance shifts. A better provider appears. And suddenly the architecture makes switching painful.
The lesson I learned from our cloud migration was simple: Always design one layer where you can change your mind later.
For our AI systems, that layer became the Model Gateway.
The application talks to the gateway.
The gateway talks to providers.
And the providers can change.
Because eventually they always do.
Note: This article was originally published on my engineering blog, where I document the design of Maester, an AI SaaS infrastructure system built in public.