Gabriel Koo for AWS Community Builders

Posted on Jan 2 • Edited on Aug 28

Use Amazon Bedrock Models with OpenAI SDKs with a Serverless Proxy Endpoint - Without Fixed Cost!

#aws #openai #serverless #bedrock

UPDATE (2025-08-05): AWS silently launched an official OpenAI compatible API endpoint - but so far it only supports the gpt-oss reasoning models and doesn't support tool calls (refer to my new article). So this solution still rocks until AWS gradually adds more support.

Why `bedrock-access-gateway-function-url`

This article is for GenAI builders who cares for all of these:

I want to use OpenAI SDK/compatible API
No fixed cost, pay as you go priced
Serverless LLM, no self hosting
Multiple models in one codebase
Lightweight solution without bloatware

If you want to stick to AWS, and wants to keep the simplicity without the burden of maintain extra configuration, continue reading.

Why not XXX as solution instead?

Solution	Pros & Cons
LiteLLM (SDK)	(-) full list of unnecessary dependencies potentially bloating your Python environment/application, e.g., `gunicorn`, `fastapi`, `google-cloud-kms`, etc. (-) Python only
LiteLLM (Proxy)	(-) Huge infra cost (Worker + Database + Redis) (-) Good luck with maintaining `docker-compose` / K8S
bedrock-access-gateway	(-) >US$16/month (-) Extra Load Balancer needed + Fargate/Lambda pricing
aisuite	(+) No bloatware issue with usage of `extra` Python dependencies (+) No extra infra cost (-) Python Only
This Solution	(+) Only minimal pay-as-you-go Lambda exec costs

A Typical GenAI Builder's Struggle

You are a builder specialized on AWS, maybe with a lot of AWS Credits like me.

You want to build GenAI applications when you found that most starters/examples are based on OpenAI's official Python/NodeJS SDKs, e.g.:

from openai import OpenAI

client = OpenAI()

completion = client.chat.completions.create(
    model="gpt-3.5-turbo",
    messages=[
        {"role": "system", "content": "You are a helpful assistant."},
        {"role": "user", "content": "Hello!"}
    ]
)

print(completion.choices[0].message)
# > Hello! What can I help you?

Or separately, you have a bunch of AWS Credits that would be useful for you to utilize them with e.g. a coding CLI agent, but Amazon Q Developer CLI ($19/month) is too big of a spending for you - you might want to use OpenAI Codex CLI instead - https://dev.to/aws-builders/use-openai-codex-cli-with-amazon-bedrock-models-pay-as-you-go-48eb.

If you did read through Amazon Bedrock docs, you would realize that the data schema of the Bedrock Runtime Converse API for chat completions is very different from OpenAI's. If you need to allow model/provider switching in your GenAI application, this is particularly a burden because you might need to write very different implementations for each provider.

There are also other provider specific implementations: VertexAI, Gemini API, LangChain, etc. It takes effort to rewrite your code to cater for more models. If you are working on multiple projects, you might be maintaining the same set of code within different projects.

A New Hope - But It Comes with a Fixed Cost

To fix this issue, AWS has provided the great project aws-samples/bedrock-access-gateway - It allows you to deploy an Application Load Balancer + Lambda/Fargate pair so that you can use OpenAI's official SDKs with the OpenAI-API compatibile Rest API endpoint via the environment variables OPENAI_API_BASE and OPENAI_API_KEY.

It achieves goals #2 and #3 in the first section. You can work on projects utilizing OpenAI SDKs with ease.

The Fixed Cost Strikes Back

Yes it's absolutely great, but it's also costly if you are building your GenAI project particularly with your own money/limited budget:

Application Load Balancer is running 24/7 once deployed, it comes with a fixed cost per hour:
- $0.0225 per Application Load Balancer-hour; or
- $16.2 / month FIXED cost regardless of usage
- In addition, there is also the variable cost: No. LCUs used * $0.008 per LCU-hour
Fargate (the alternative deployment option) is also running 24/7, so it also comes with an additional fixed cost on top of ALB:
- $0.04048 / vCPU hour
- $0.004445 / GB hour
- $35.5 / month FIXED cost under the default 1vCPU+2GB RAM setup

It's a cost nightmare especially for those who don't require 24/7 uptime and usage for the OpenAI compatible API endpoint.

Also if a fixed cost is unavoidable, why don’t we just start a cloud VM and put everything inside it instead?

Why Bedrock in the First Place?

Something feels wrong to me. I used Amazon Bedrock with the 1st reason being it's serverless nature and pay as you go capability - Why bother to pay a gigantic fixed monthly cost to host your own open sourced LLM with a VM paired with expensive GPU when you can just pick the serverless option?

The 2nd reason of picking Bedrock is on the ease of switching models.

With Bedrock, not only you can use proprietary models like Amazon Nova, but also it's immediate compatibility with other open source models like LLaMA 3.3 (While VertexAI is still offering LLaMA 3.2 at most) or Mistral by just changing the model field in your code - without extra “endpoint deployments” - this is what other major Cloud AI providers can't provide at the moment.

For example for Azure AI, every non-OpenAI model needs to be deployed into separate inference endpoints:

import os
from azure.ai.inference import ChatCompletionsClient
from azure.core.credentials import AzureKeyCredential

model_a = ChatCompletionsClient(
    endpoint=os.environ["AZUREAI_ENDPOINT_URL_A"],
    credential=AzureKeyCredential(os.environ["AZUREAI_ENDPOINT_KEY_A"]),
)
model_b = ChatCompletionsClient(
    endpoint=os.environ["AZUREAI_ENDPOINT_URL_B"],
    credential=AzureKeyCredential(os.environ["AZUREAI_ENDPOINT_KEY_B"]),
)

For VertexAI, while you can use non-Gemini models with the same API endpoint as well as credential, the OpenAI API compatible endpoint by Google Cloud is still in beta as of time of writing (2025 Jan) - as well as multiple users are still reporting issues with tool calling.

Again, I want to stick with Bedrock with my OpenAI SDKs, but I am not willing to pay a fixed recurring cost for my GenAI application that might not generate 24/7 traffic.

When in Doubt, Read the Docs First

The maintainers of bedrock-access-gateway suggested, namely for performance improvements that:

Also, you can use Lambda Web Adapter + Function URL (see example) to replace ALB or AWS Fargate to replace Lambda to get better performance on streaming response.

We could have used a Lambda Function URL to replace ALB via Lambda Web Adapter.

This sample app provided by AWS was based on a Lambda function with a Docker runtime, and as it's name suggests, it is a sample app not used for general purposes: Serverless Bedtime Storyteller. With this example in place, I can build a serverless "Fixed Cost lessness" version of the bedrock access gateway.

Building the `bedrock-access-gateway-function-url` project

I made a few tweaks from the original bedrock-access-gateway project:

Ditched Magnum for Lambda Web Adapter
Switched from a Docker Container runtime back to Python runtime with layers - as Lambda Docker runtimes are famous for it's cold start times
Enabled the option (--no-embedding) to exclude embedding related dependencies which could drastically increase the build size - tiktoken and numpy
Wrapped the Python handler with a custom entry point run.sh

  #!/bin/bash
  PATH=$PATH:$LAMBDA_TASK_ROOT/bin \
    PYTHONPATH=$LAMBDA_TASK_ROOT:$PYTHONPATH:/opt/python \
    exec python3 api/app.py

This is necessary since using the Lambda Web Adapter resets some Python Path settings which would cause your Layered dependencies to be un-importable.

Lastly, the crux of my project is the very prepare_source.sh file - it fetches the latest Python source of bedrock-access-gateway with git so that the latest efforts from the aws-examples contributors are included. The scripts clones from the latest main branch of the project, and copies the Python FastAPI implementation of the access gateway.

It also conducts an optional dependency reduction if you do not need to call the embeddings endpoint, as large PyPI dependencies like numpy or tiktoken could have been avoided.

Deployment

Straightforward. I personally recommend using the AWS CloudShell as you can even do so with your mobile AWS Console, and you can save some time by skipping the need of a Docker build:

sudo yum update -y
sudo yum install -y python3.12 python3.12-pip

git clone --depth=1 https://github.com/gabrielkoo/bedrock-access-gateway-function-url
cd bedrock-access-gateway-function-url

./prepare_source.sh
sam build
sam deploy --guided

After within a minute, grab the value of FunctionUrl as well as recall the value of ApiKey value you supplied earlier in sam deploy:

Outputs                                                                                                                                                                                                                                       

Key                 Function                                                                                                                                                                                                                  
Description         FastAPI Lambda Function ARN                                                                                                                                                                                               
Value               arn:aws:lambda:us-east-1:123456789012:function:sam-app-BedrockAccessGatewayFunction-yLLzetPaKSq5                                                                                                                          

Key                 FunctionUrl                                                                                                                                                                                                               
Description         Function URL for FastAPI function                                                                                                                                                                                         
Value               https://lukeskywalker.lambda-url.us-east-1.on.aws/                                                                                                                                                     

Successfully created/updated stack - sam-app in us-east-1

Now, test your own dedicated pay-as-you-go serverless infrastructure OpenAI-compatible API endpoint in your GenAI application!

curl "${FUNCTION_URL}api/v1/models" \
     -H "Authorization: Bearer $API_KEY"
# {
#   "object": "list",
#   "data": [
#     {
#       "id": "amazon.titan-tg1-large",
#       "created": 1735826872,
#       "object": "model",
#       "owned_by": "bedrock"
#     },
#     ...
#   ]
# }

Alternatively, I have built a minimal UI based on the deep-chat project so that you can test it without access to any local shell environment: https://chat.gab.hk/.

No worries about security - it’s an open sourced static website, no backend and tracking scripts. Just bring your own endpoint and key.

Return of Cost Effectiveness

With the new true serverless option, here are the costs incurred:

Amazon Bedrock costs: Pay-as-you-go according to token usage
Lambda Invocation costs: Per GB-second + Per Requests

So here is the final repository containing the entire setup:

gabrielkoo / bedrock-access-gateway-function-url

OpenAI-Compatible RESTful APIs for Amazon Bedrock, modified from the original "bedrock-access-gateway" project for not using ALB, so that one could deploy and use it under a pay as you go model WITH NO FIXED COSTS.

bedrock-access-gateway-function-url

Why not XXX?

Solution	Pros & Cons
LiteLLM (Python SDK)	(-) full list of unnecessary dependencies potentially bloating your Python environment/application, e.g., `gunicorn`, `fastapi`, `google-cloud-kms`, etc. (-) Python Only
LiteLLM (Proxy)	(-) Huge Infra Cost (Worker + Database + Redis) (-) Good luck with maintaining `docker-compose` / K8S
bedrock-access-gateway	(-) >US$16/month (-) Extra Load Balancer needed + Fargate/Lambda pricing
aisuite	(+) No bloatware issue with usage of `extra` Python dependencies (+) No extra infra cost (-) Python Only
This Solution	(+) Only minimal pay-as-you-go Lambda exec costs

Intro

This repo is combining the great works of the original implementations of bedrock-access-gateway…

View on GitHub

Feel free to fork it and create your own!

Next Steps

In order to further productionize it, here a list of to-dos that could have been done:

Wrap the Function URL with Amazon CloudFront and adopt OAC - Reference Article - Secure your Lambda function URLs using Amazon CloudFront origin access control
Experiment for the optimal memory size and timeout for the Lambda handler to achieve better cost efficiency
Use provisioned throughput to further avoid Lambda cold starts
Support multiple API keys by updating api.auth.api_key_auth logic
Support non-text/image content, such as DocumentContent or VideoContent which are well supported by Amazon Bedrock Converse API.