The Problem
Many developers already rely on the OpenAI Chat Completions API. But what if your platform needs vendor independence and cost optimisation by running inference on AWS Bedrock (or GCP Vertex AI) — while still giving developers a drop-in OpenAI-compatible API?
That was the challenge I faced when building OnglX deploy.
Context & Architectural Decisions
I wanted developers to adopt OnglX deploy with zero code changes. That meant:
- Support the OpenAI API format as-is.
- Run inference on Bedrock, not OpenAI.
- Keep things extensible for multi-cloud (AWS + GCP).
- Secure it with API keys and serverless IAM best practices.
The architectural bet: a protocol translation layer that accepts OpenAI-style requests and translates them to provider-specific formats — then back again.
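In sketch form, that layer is just an adapter per provider that converts requests on the way in and responses on the way out (class and method names below are illustrative, not the actual OnglX deploy code):

class ProviderAdapter:
    """Base adapter: OpenAI-shaped dicts in, OpenAI-shaped dicts out."""

    def to_provider_request(self, openai_request: dict) -> dict:
        # Convert an OpenAI chat completion request into the provider's native payload.
        raise NotImplementedError

    def to_openai_response(self, provider_response: dict) -> dict:
        # Convert the provider's native response back into an OpenAI-style response.
        raise NotImplementedError

# The API layer picks an adapter based on the requested model id and otherwise
# behaves exactly like the OpenAI endpoint it imitates.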
The Specific Challenge
Models on Bedrock (and other providers) don’t speak the same “language”:
- Claude requires separate handling of system prompts.
- Llama expects a continuation-style prompt.
- Titan only supports inputText.
Even token usage reporting and error messages differ.
So: how do you unify all that while still looking and behaving like OpenAI?
Solution Approaches I Considered
- Provider-Specific Endpoints
  - ✅ Cleaner per model family
  - ❌ Breaks drop-in compatibility
- Unified Translation Layer (Chosen)
  - ✅ Works with OpenAI clients directly
  - ✅ Extensible for multi-cloud
  - ❌ More translation logic & testing required
Trade-Offs
- Compatibility vs Performance → We favoured compatibility. Cold starts and translation overhead are acceptable since inference latency dominates anyway.
- Multi-cloud support vs Simplicity → Added complexity, but ensures portability and avoids lock-in.
- Serverless vs Dedicated Infra → Serverless chosen for cost efficiency and scaling, at the expense of slightly higher cold start latency.
Implementation Details
Request Translation Example
System messages can appear anywhere in an OpenAI request, but Claude needs them pulled out into a separate field:
system_messages = [m for m in messages if m['role'] == 'system']
conversation_messages = [m for m in messages if m['role'] != 'system']

payload = {
    "messages": conversation_messages,
    "system": "\n".join(m["content"] for m in system_messages)
}
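For example, a request like the one below ends up with its system prompt lifted out of the message list (values are purely illustrative):

messages = [
    {"role": "system", "content": "You are a helpful assistant."},
    {"role": "user", "content": "Hello"},
]

# After the split above, the Claude payload becomes:
# {
#   "messages": [{"role": "user", "content": "Hello"}],
#   "system": "You are a helpful assistant."
# }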
Model-specific handling:
if self.model_id.startswith("anthropic."):
    ...  # Claude messages/system format
elif self.model_id.startswith("meta.llama"):
    ...  # Llama continuation-prompt format
elif self.model_id.startswith("amazon.titan"):
    ...  # Titan inputText format
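The Llama branch, for instance, has to flatten the whole conversation into a single continuation-style prompt. A minimal sketch, assuming a simple role-prefixed template (the exact prompt template depends on the Llama version you target):

def to_llama_prompt(messages: list[dict]) -> str:
    # Flatten OpenAI-style chat messages into one string the model continues from.
    parts = [f"{m['role'].capitalize()}: {m['content']}" for m in messages]
    parts.append("Assistant:")  # leave the last turn open for the model
    return "\n".join(parts)

# Bedrock's Llama models take a single "prompt" field rather than a message list.
payload = {"prompt": to_llama_prompt(messages), "max_gen_len": 512}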
Response Translation Example
Unifying token usage:
if self.model_id.startswith("anthropic."):
    usage = {
        "prompt_tokens": response["usage"]["input_tokens"],
        "completion_tokens": response["usage"]["output_tokens"],
        "total_tokens": response["usage"]["input_tokens"] + response["usage"]["output_tokens"]
    }
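That usage block then gets wrapped in the rest of the OpenAI chat completion envelope so existing client libraries can parse it unchanged. Roughly (field values are illustrative, and completion_text stands in for the translated model output):

import time
import uuid

openai_response = {
    "id": f"chatcmpl-{uuid.uuid4().hex}",
    "object": "chat.completion",
    "created": int(time.time()),
    "model": self.model_id,
    "choices": [{
        "index": 0,
        "message": {"role": "assistant", "content": completion_text},
        "finish_reason": "stop",
    }],
    "usage": usage,
}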
Error mapping to OpenAI style:
except ClientError as e:  # botocore.exceptions.ClientError
    if e.response["Error"]["Code"] == "ValidationException":
        raise ValueError("Invalid request or model")
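At the HTTP layer those exceptions are serialised into OpenAI’s error envelope so that client libraries raise their usual typed errors. A hedged sketch of that mapping (the status codes and error types here mirror OpenAI’s conventions rather than being copied from the OnglX source):

ERROR_MAP = {
    # Bedrock error code -> (HTTP status, OpenAI-style error type)
    "ValidationException": (400, "invalid_request_error"),
    "ThrottlingException": (429, "rate_limit_error"),
    "AccessDeniedException": (403, "permission_error"),
}

def to_openai_error(code: str, message: str) -> tuple[int, dict]:
    status, error_type = ERROR_MAP.get(code, (500, "api_error"))
    return status, {"error": {"message": message, "type": error_type, "code": code}}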
Infrastructure Setup
- API Gateway v2 (CORS + auth)
- DynamoDB (API key management with TTL; see the sketch after this list)
- AWS Lambda (Python 3.12, 512MB, 30s timeout)
- Amazon Bedrock (managed foundation model inference)
- IAM (least-privilege Bedrock + DynamoDB access)
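For the API key piece, each key is a DynamoDB item with a TTL attribute, and the Lambda does one lookup per request. A minimal sketch, assuming a table and attribute layout like this (the real schema may differ):

import time
import boto3

table = boto3.resource("dynamodb").Table("api_keys")  # table name is an assumption

def is_valid_key(api_key: str) -> bool:
    item = table.get_item(Key={"api_key": api_key}).get("Item")
    # DynamoDB TTL deletion is eventual, so check the expiry ourselves as well.
    return bool(item) and item.get("expires_at", 0) > int(time.time())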
Terraform example:
resource "aws_lambda_function" "api" {
runtime = "python3.12"
timeout = 30
memory_size = 512
}
Results & Lessons Learned
- Cold starts aren’t a big issue since inference latency dominates.
- 512MB Lambda was the sweet spot: less memory caused failures, more memory just wasted cost.
- Model prefix detection was a clean way to handle provider quirks.
- Error translation was critical for dev UX: users expect OpenAI-like errors.
Most importantly, developers could point their OpenAI clients to our endpoint without code changes.
When to Apply This Approach
- You need drop-in OpenAI API compatibility but want to use another provider (AWS/GCP/self-hosted).
- You’re building multi-cloud AI infra and need consistent APIs.
- You prioritise developer adoption and compatibility over raw performance.
Call to Action
If you’re building AI infra and want OpenAI compatibility without vendor lock-in, this translation-layer pattern is worth considering.
👉 Explore OnglX Deploy’s OpenAI-compatible API Docs and try deploying your own OpenAI-compatible API on AWS in minutes.