Njabulo Majozi

Building OpenAI API Compatibility with AWS Bedrock

The Problem

Many developers already rely on the OpenAI Chat Completions API. But what if your platform needs vendor independence and cost optimisation by running inference on AWS Bedrock (or GCP Vertex AI) — while still giving developers a drop-in OpenAI-compatible API?

That was the challenge I faced when building OnglX deploy.


Context & Architectural Decisions

I wanted developers to adopt OnglX deploy with zero code changes. That meant:

  • Support the OpenAI API format as-is.
  • Run inference on Bedrock, not OpenAI.
  • Keep things extensible for multi-cloud (AWS + GCP).
  • Secure it with API keys and serverless IAM best practices.

The architectural bet: a protocol translation layer that accepts OpenAI-style requests and translates them to provider-specific formats — then back again.
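
To make that concrete, here is a minimal sketch of the shape such a layer can take. It is illustrative only: the adapter interface and function names below are my own, not the actual OnglX deploy code.

from abc import ABC, abstractmethod
from typing import Callable

class ProviderAdapter(ABC):
    """Translates between the OpenAI chat format and one provider's native format."""

    @abstractmethod
    def to_provider_request(self, openai_request: dict) -> dict: ...

    @abstractmethod
    def to_openai_response(self, provider_response: dict) -> dict: ...

def handle_chat_completion(
    openai_request: dict,
    adapter: ProviderAdapter,
    invoke: Callable[[dict], dict],
) -> dict:
    # Translate in, call the provider (e.g. Bedrock InvokeModel), translate out.
    provider_request = adapter.to_provider_request(openai_request)
    provider_response = invoke(provider_request)
    return adapter.to_openai_response(provider_response)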


The Specific Challenge

Models on Bedrock (and on other providers) don’t speak the same “language.”

  • Claude requires separate handling of system prompts.
  • Llama expects a continuation-style prompt.
  • Titan only supports inputText.

Even token usage reporting and error messages differ.
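
To give a sense of how far apart they are, the native request bodies look roughly like this (exact field names vary by model version, so treat the shapes below as illustrative rather than authoritative):

# Anthropic Claude (Messages API): system prompt lives in a separate top-level field
claude_request = {
    "anthropic_version": "bedrock-2023-05-31",
    "max_tokens": 512,
    "system": "You are a helpful assistant.",
    "messages": [{"role": "user", "content": "Hello"}],
}

# Meta Llama: a single flattened prompt string (simplified; real chat templates use special tokens)
llama_request = {
    "prompt": "System: You are a helpful assistant.\nUser: Hello\nAssistant:",
    "max_gen_len": 512,
}

# Amazon Titan: plain inputText plus a generation config block
titan_request = {
    "inputText": "You are a helpful assistant.\nUser: Hello",
    "textGenerationConfig": {"maxTokenCount": 512},
}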

So: how do you unify all that while still looking and behaving like OpenAI?


Solution Approaches I Considered

  1. Provider-Specific Endpoints
    • ✅ Cleaner per model family
    • ❌ Breaks drop-in compatibility
  2. Unified Translation Layer (Chosen)
    • ✅ Works with OpenAI clients directly
    • ✅ Extensible for multi-cloud
    • ❌ More translation logic & testing required

Trade-Offs

  • Compatibility vs Performance → We favoured compatibility. Cold starts and translation overhead are acceptable since inference latency dominates anyway.
  • Multi-cloud support vs Simplicity → Added complexity, but ensures portability and avoids lock-in.
  • Serverless vs Dedicated Infra → Serverless chosen for cost efficiency and scaling, at the expense of slightly higher cold start latency.

Implementation Details

Request Translation Example

System messages in OpenAI can appear anywhere, but Claude needs them to be separate:

system_messages = [m for m in messages if m['role'] == 'system']
conversation_messages = [m for m in messages if m['role'] != 'system']

payload = {
    "messages": conversation_messages,
    "system": "\\n".join(m["content"] for m in system_messages)
}
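
For completeness, this payload is what eventually goes to Bedrock. A rough boto3 sketch of the round trip, assuming the Claude Messages API (which also expects anthropic_version and max_tokens; the model ID and token limit here are placeholders):

import json
import boto3

bedrock = boto3.client("bedrock-runtime")

payload.update({
    "anthropic_version": "bedrock-2023-05-31",
    "max_tokens": 1024,
})

raw = bedrock.invoke_model(
    modelId="anthropic.claude-3-haiku-20240307-v1:0",
    body=json.dumps(payload),
    contentType="application/json",
)
response = json.loads(raw["body"].read())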

Model-specific handling:

if self.model_id.startswith("anthropic."):
    # Claude format
elif self.model_id.startswith("meta.llama"):
    # Llama continuation format
elif self.model_id.startswith("amazon.titan"):
    # Titan inputText format
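
As one example of what those branches do, the Llama path might flatten the OpenAI message list into a single continuation-style prompt. A simplified sketch (a real deployment would use the chat template for the specific Llama version, with its special tokens):

def to_llama_prompt(messages: list[dict]) -> str:
    # Flatten OpenAI-style messages into one continuation-style prompt string.
    lines = [f"{m['role'].capitalize()}: {m['content']}" for m in messages]
    lines.append("Assistant:")  # leave the prompt open for the model to continue
    return "\n".join(lines)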

Response Translation Example

Unifying token usage:

if self.model_id.startswith("anthropic."):
    usage = {
        "prompt_tokens": response["usage"]["input_tokens"],
        "completion_tokens": response["usage"]["output_tokens"],
        "total_tokens": response["usage"]["input_tokens"] + response["usage"]["output_tokens"]
    }
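
Llama and Titan report usage under different field names, so the same normalisation has to branch per family. A sketch under my reading of the current Bedrock response shapes (worth verifying against the docs for the models you enable):

def normalize_usage(model_id: str, response: dict) -> dict:
    if model_id.startswith("meta.llama"):
        prompt = response["prompt_token_count"]
        completion = response["generation_token_count"]
    elif model_id.startswith("amazon.titan"):
        prompt = response["inputTextTokenCount"]
        completion = response["results"][0]["tokenCount"]
    else:
        raise ValueError(f"unknown model family: {model_id}")
    return {
        "prompt_tokens": prompt,
        "completion_tokens": completion,
        "total_tokens": prompt + completion,
    }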

Error mapping to OpenAI style:

except ClientError as e:
    if e.response["Error"]["Code"] == "ValidationException":
        raise ValueError("Invalid request or model")
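
ValidationException is only one of several cases. A sketch of how the mapping can extend (the error classes here are hypothetical stand-ins for OpenAI-style errors, and the mapping choices are mine):

from botocore.exceptions import ClientError

class RateLimitError(Exception): ...
class AuthenticationError(Exception): ...

def translate_bedrock_error(e: ClientError) -> Exception:
    code = e.response["Error"]["Code"]
    if code == "ValidationException":
        return ValueError("Invalid request or model")
    if code == "ThrottlingException":
        return RateLimitError("Rate limit exceeded, please retry")
    if code == "AccessDeniedException":
        return AuthenticationError("Not authorised to invoke this model")
    return e  # anything else: surface the original error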

Infrastructure Setup

AWS Infra

  • API Gateway v2 (CORS + auth)
  • DynamoDB (API key management with TTL; see the lookup sketch after this list)
  • AWS Lambda (Python 3.12, 512MB, 30s timeout)
  • Amazon Bedrock (managed foundation model inference)
  • IAM (least-privilege Bedrock + DynamoDB access)
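
For the API key piece, one way to wire it up is to have the Lambda look keys up in DynamoDB and lean on the table's TTL attribute for expiry. A boto3 sketch (table and attribute names here are hypothetical):

import time
import boto3

table = boto3.resource("dynamodb").Table("api-keys")  # hypothetical table name

def is_valid_api_key(key: str) -> bool:
    item = table.get_item(Key={"api_key": key}).get("Item")
    if item is None:
        return False
    # TTL deletion in DynamoDB is eventual, so also check expiry explicitly.
    return int(item.get("expires_at", 0)) > int(time.time())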

Terraform example:

resource "aws_lambda_function" "api" {
  # function_name, role, handler, and deployment package omitted for brevity
  runtime     = "python3.12"
  timeout     = 30
  memory_size = 512
}

Results & Lessons Learned

  • Cold starts aren’t a big issue since inference latency dominates.
  • 512MB Lambda was the sweet spot: smaller allocations caused failures, larger ones only added cost.
  • Model prefix detection was a clean way to handle provider quirks.
  • Error translation was critical for dev UX: users expect OpenAI-like errors.

Most importantly, developers could point their OpenAI clients to our endpoint without code changes.


When to Apply This Approach

  • You need drop-in OpenAI API compatibility but want to use another provider (AWS/GCP/self-hosted).
  • You’re building multi-cloud AI infra and need consistent APIs.
  • You prioritise developer adoption and compatibility over raw performance.

Call to Action

If you’re building AI infra and want OpenAI compatibility without vendor lock-in, this translation-layer pattern is worth considering.

👉 Explore OnglX Deploy’s OpenAI-compatible API Docs and try deploying your own OpenAI-compatible API on AWS in minutes.
