DEV Community

Gabriel Koo for AWS Community Builders

Originally published at gabrielkoo.com


Use Amazon Bedrock Models with OpenAI SDKs with a Serverless Proxy Endpoint - Without Fixed Cost!

Why bedrock-access-gateway-function-url

This article is for GenAI builders who care about all of the following:

  1. No fixed cost, pay-as-you-go pricing
  2. Serverless LLMs, no self-hosting
  3. Multiple models in one codebase

A Typical GenAI Builder's Struggle

You are a builder who specializes in AWS, maybe with a lot of AWS credits, like me.

You want to build GenAI applications, but you find that most starters/examples are based on OpenAI's official Python/Node.js SDKs, e.g.:

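from openai import OpenAI

# Reads OPENAI_API_KEY (and an optional base URL override) from the environment
client = OpenAI()

completion = client.chat.completions.create(
    model="gpt-3.5-turbo",
    messages=[
        {"role": "system", "content": "You are a helpful assistant."},
        {"role": "user", "content": "Hello!"}
    ]
)

# .message is a message object; .content holds the reply text
print(completion.choices[0].message.content)
# > Hello! How can I help you?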

If you read through the Amazon Bedrock docs, you will realize that the data schema of the Bedrock Runtime Converse API for chat completions is very different from OpenAI's. This is a particular burden if you need to allow model/provider switching in your GenAI application, because you might need to write very different implementations for each provider.
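To make the difference concrete, here is a minimal sketch of what translating an OpenAI-style request into the Converse shape involves. It assumes plain-text messages only, so tool calls and images are ignored. `openai_to_converse` and `converse_text` are hypothetical helpers, not part of any SDK. The dictionary keys (`modelId`, `messages`, `system`, `inferenceConfig.maxTokens`, `output.message.content`) follow the Converse API request and response shapes.

```python
from typing import Optional


def openai_to_converse(model: str, messages: list, max_tokens: Optional[int] = None) -> dict:
    """Build kwargs for bedrock-runtime's `converse` call from OpenAI-style messages.

    Assumes every message has plain-string `content`; tools/images are not handled.
    """
    system = []
    converse_messages = []
    for msg in messages:
        if msg["role"] == "system":
            # Converse takes system prompts as a separate top-level list.
            system.append({"text": msg["content"]})
        else:
            # Converse message content is always a list of content blocks.
            converse_messages.append(
                {"role": msg["role"], "content": [{"text": msg["content"]}]}
            )
    request = {"modelId": model, "messages": converse_messages}
    if system:
        request["system"] = system
    if max_tokens is not None:
        # OpenAI's max_tokens roughly corresponds to inferenceConfig.maxTokens.
        request["inferenceConfig"] = {"maxTokens": max_tokens}
    return request


def converse_text(response: dict) -> str:
    """Pull the reply text out of a Converse response.

    The OpenAI equivalent is simply completion.choices[0].message.content.
    """
    blocks = response["output"]["message"]["content"]
    return "".join(block.get("text", "") for block in blocks)


if __name__ == "__main__":
    request = openai_to_converse(
        "amazon.nova-lite-v1:0",  # illustrative model ID
        [
            {"role": "system", "content": "You are a helpful assistant."},
            {"role": "user", "content": "Hello!"},
        ],
        max_tokens=256,
    )
    print(request)
    # With boto3 and AWS credentials, this would be sent as:
    #   boto3.client("bedrock-runtime").converse(**request)
```

Every provider you add needs another translation layer like this, and that duplication is exactly what an OpenAI-compatible gateway removes.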

There are also other provider-specific implementations: VertexAI, Gemini API, LangChain, etc. It takes effort to rewrite your code to support more models, and if you are working on multiple projects, you might end up maintaining the same set of code in each of them.

A New Hope - But It Comes with a Fixed Cost

To fix this issue, AWS has provided the great project aws-samples/bedrock-access-gateway. It lets you deploy an Application Load Balancer + Lambda/Fargate pair so that you can use OpenAI's official SDKs with an OpenAI-API-compatible REST API endpoint via the environment variables OPENAI_API_BASE and OPENAI_API_KEY.
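Under the hood, the SDKs only need a base URL and an API key, so any HTTP client can talk to such an endpoint. Below is a standard-library sketch of the request the SDK sends. It assumes the gateway's base URL ends in `/api/v1`, as in the curl example further down, and that the gateway follows OpenAI's `/chat/completions` path. `gateway_settings` and `build_chat_request` are hypothetical helper names. Note that v1.x of the OpenAI Python SDK reads `OPENAI_BASE_URL`, while `OPENAI_API_BASE` is the older name, so the sketch accepts either.

```python
import json
import os
import urllib.request


def gateway_settings(env=os.environ):
    """Return (base_url, api_key) from the environment.

    The v1.x OpenAI Python SDK reads OPENAI_BASE_URL; OPENAI_API_BASE is the older name.
    """
    base_url = env.get("OPENAI_BASE_URL") or env.get("OPENAI_API_BASE")
    return base_url, env.get("OPENAI_API_KEY")


def build_chat_request(base_url: str, api_key: str, model: str, messages: list) -> urllib.request.Request:
    """Build (but do not send) an OpenAI-style chat completions request."""
    body = json.dumps({"model": model, "messages": messages}).encode("utf-8")
    return urllib.request.Request(
        base_url.rstrip("/") + "/chat/completions",
        data=body,
        headers={
            # The gateway expects the API key as a bearer token.
            "Authorization": f"Bearer {api_key}",
            "Content-Type": "application/json",
        },
        method="POST",
    )
```

Sending it is then a matter of `urllib.request.urlopen(request)`. The OpenAI SDK does the equivalent for you once the two environment variables point at the gateway.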

It achieves goals #2 and #3 in the first section. You can work on projects utilizing OpenAI SDKs with ease.

The Fixed Cost Strikes Back

Yes, it's absolutely great, but it's also costly, particularly if you are building your GenAI project with your own money or a limited budget:

  1. The Application Load Balancer runs 24/7 once deployed, so it comes with a fixed hourly cost:

    • $0.0225 per Application Load Balancer-hour, or
    • $16.2 / month FIXED cost (over a 720-hour month) regardless of usage
    • In addition, there is a variable cost: number of LCUs used × $0.008 per LCU-hour
  2. Fargate (the alternative deployment option) also runs 24/7, so it adds another fixed cost on top of the ALB:

    • $0.04048 / vCPU-hour
    • $0.004445 / GB-hour
    • $35.5 / month FIXED cost under the default 1 vCPU + 2 GB RAM setup
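The fixed monthly figures above can be reproduced with a few lines of arithmetic. The sketch assumes a 720-hour (30-day) month and the default 1 vCPU + 2 GB Fargate task. LCU charges are left out because they depend on traffic.

```python
HOURS_PER_MONTH = 720  # 30 days x 24 hours; this reproduces the figures above

ALB_PER_HOUR = 0.0225            # USD per Application Load Balancer-hour
FARGATE_PER_VCPU_HOUR = 0.04048  # USD per vCPU-hour
FARGATE_PER_GB_HOUR = 0.004445   # USD per GB-hour


def alb_fixed_monthly(hours: float = HOURS_PER_MONTH) -> float:
    """ALB hourly charge only; LCU usage is billed on top."""
    return ALB_PER_HOUR * hours


def fargate_fixed_monthly(vcpu: float = 1, memory_gb: float = 2,
                          hours: float = HOURS_PER_MONTH) -> float:
    """Always-on Fargate task with the given size."""
    return (vcpu * FARGATE_PER_VCPU_HOUR + memory_gb * FARGATE_PER_GB_HOUR) * hours


if __name__ == "__main__":
    alb = alb_fixed_monthly()
    fargate = fargate_fixed_monthly()
    print(f"ALB:           ${alb:.2f}/month")
    print(f"Fargate:       ${fargate:.2f}/month")
    print(f"ALB + Fargate: ${alb + fargate:.2f}/month before any traffic")
```

This prints $16.20, $35.55 and $51.75. In other words, the Fargate option costs roughly $52 a month before a single request is served.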

It's a cost nightmare, especially for those who don't need their OpenAI-compatible API endpoint to be up and in use 24/7.

Also, if a fixed cost is unavoidable, why not just start a cloud VM and put everything inside it instead?

Why Bedrock in the First Place?

Something feels wrong to me. The first reason I chose Amazon Bedrock was its serverless nature and pay-as-you-go pricing. Why bother paying a gigantic fixed monthly cost to host your own open-source LLM on a VM paired with an expensive GPU when you can just pick the serverless option?

The second reason for picking Bedrock is how easy it is to switch models.

With Bedrock, not only can you use proprietary models like Amazon Nova, but you also get immediate access to open models like LLaMA 3.3 (while VertexAI still offers LLaMA 3.2 at most) or Mistral, just by changing the model field in your code and without extra “endpoint deployments”. This is something other major cloud AI providers can't offer at the moment.

For example, on Azure AI every non-OpenAI model needs to be deployed to its own inference endpoint:

import os
from azure.ai.inference import ChatCompletionsClient
from azure.core.credentials import AzureKeyCredential

model_a = ChatCompletionsClient(
    endpoint=os.environ["AZUREAI_ENDPOINT_URL_A"],
    credential=AzureKeyCredential(os.environ["AZUREAI_ENDPOINT_KEY_A"]),
)
model_b = ChatCompletionsClient(
    endpoint=os.environ["AZUREAI_ENDPOINT_URL_B"],
    credential=AzureKeyCredential(os.environ["AZUREAI_ENDPOINT_KEY_B"]),
)
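By contrast, with Bedrock behind a single OpenAI-compatible endpoint, switching models only changes the `model` field of an otherwise identical request. The minimal sketch below uses model IDs that are illustrative assumptions. Check which models your account and region actually expose through the gateway's `/models` endpoint, because some models may have to be called through a cross-region inference profile ID instead.

```python
import json

# Illustrative Bedrock model IDs (assumptions): check which ones your account and
# region expose through the gateway's /models endpoint before relying on them.
MODELS = [
    "amazon.nova-lite-v1:0",
    "meta.llama3-3-70b-instruct-v1:0",
    "mistral.mistral-large-2407-v1:0",
]

MESSAGES = [{"role": "user", "content": "Hello!"}]


def chat_payload(model: str) -> dict:
    """Same endpoint, same API key, same schema: only `model` changes."""
    return {"model": model, "messages": MESSAGES}


payloads = [chat_payload(model) for model in MODELS]

if __name__ == "__main__":
    for payload in payloads:
        print(json.dumps(payload))
```

With the OpenAI SDK this becomes the same `client.chat.completions.create(model=..., messages=...)` call in a loop, using one client instead of one client per deployed model.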

For VertexAI, while you can use non-Gemini models with the same API endpoint and credentials, Google Cloud's OpenAI-compatible endpoint is still in beta at the time of writing (January 2025), and multiple users are still reporting issues with tool calling.

Again, I want to stick with Bedrock and my OpenAI SDKs, but I am not willing to pay a fixed recurring cost for a GenAI application that might not receive traffic 24/7.

When in Doubt, Read the Docs First

The maintainers of bedrock-access-gateway suggest the following, mainly for performance improvements:

Also, you can use Lambda Web Adapter + Function URL (see example) to replace ALB or AWS Fargate to replace Lambda to get better performance on streaming response.

In other words, we could replace the ALB with a Lambda Function URL by using the Lambda Web Adapter.

The sample app that AWS provides there, Serverless Bedtime Storyteller, is based on a Lambda function with a Docker runtime. As its name suggests, it is a demo rather than a general-purpose app. With this example as a reference, I can build a serverless, fixed-cost-free version of the bedrock-access-gateway.

Building the bedrock-access-gateway-function-url project

I made a few tweaks from the original bedrock-access-gateway project:

  1. Ditched Mangum for the Lambda Web Adapter
  2. Switched from a Docker container image back to the Python runtime with layers, as Lambda container images are notorious for their cold start times
  3. Added an option (--no-embedding) to exclude the embedding-related dependencies (tiktoken and numpy), which can drastically increase the build size
  4. Wrapped the Python handler with a custom entry point, run.sh:
  #!/bin/bash
  PATH=$PATH:$LAMBDA_TASK_ROOT/bin \
    PYTHONPATH=$LAMBDA_TASK_ROOT:$PYTHONPATH:/opt/python \
    exec python3 api/app.py

This is necessary because the Lambda Web Adapter resets some Python path settings, which would otherwise make your layer dependencies un-importable.
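If layer packages still fail to import, a quick diagnostic run from inside the function can tell you whether `/opt/python` (the layer directory that run.sh adds to PYTHONPATH) actually made it onto `sys.path`. This is a hypothetical helper, not part of the project, and it only checks top-level module names.

```python
import importlib.util
import sys

LAYER_DIR = "/opt/python"  # the layer directory run.sh appends to PYTHONPATH


def check_layer_imports(modules: list) -> tuple:
    """Return (layer_dir_on_sys_path, missing_top_level_modules)."""
    on_path = LAYER_DIR in sys.path
    # find_spec returns None when a top-level module cannot be located.
    missing = [name for name in modules if importlib.util.find_spec(name) is None]
    return on_path, missing


if __name__ == "__main__":
    # e.g. log this at cold start; "fastapi" and "boto3" are example names.
    print(check_layer_imports(["fastapi", "boto3"]))
```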

Lastly, the crux of my project is the prepare_source.sh file. It fetches the latest Python source of bedrock-access-gateway with git so that the latest work from the aws-samples contributors is included. The script clones the latest main branch of the project and copies over the Python FastAPI implementation of the access gateway.

It also performs an optional dependency reduction if you do not need the embeddings endpoint, so that large PyPI dependencies like numpy and tiktoken can be left out.
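`prepare_source.sh` itself is a shell script. The Python function below is only a hypothetical illustration of the idea behind the `--no-embedding` reduction: drop the embedding-only packages from a requirements file before building. The package set mirrors the two named above. The parsing is deliberately simple and assumes one requirement per line.

```python
import re

# Packages only needed by the embeddings endpoint, per the --no-embedding option.
EMBEDDING_ONLY_PACKAGES = {"tiktoken", "numpy"}


def strip_embedding_requirements(requirements: str) -> str:
    """Drop embedding-only packages from a requirements.txt-style string.

    Deliberately simple: one requirement per line, with the package name before
    any version specifier, extras bracket, environment marker or whitespace.
    """
    kept = []
    for line in requirements.splitlines():
        name = re.split(r"[<>=!~\[;\s]", line.strip(), maxsplit=1)[0].lower()
        if name not in EMBEDDING_ONLY_PACKAGES:
            kept.append(line)
    return "\n".join(kept) + ("\n" if kept else "")
```

For example, `strip_embedding_requirements("fastapi\ntiktoken\nnumpy\nboto3\n")` returns `"fastapi\nboto3\n"`.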

Deployment

Deployment is straightforward. I personally recommend using AWS CloudShell: you can even run it from the AWS Console on your phone, and you save some time by skipping the Docker build:

sudo yum update -y
sudo yum install -y python3.12 python3.12-pip
(
    cd /tmp && \
    curl -L https://github.com/aws/aws-sam-cli/releases/latest/download/aws-sam-cli-linux-x86_64.zip -o aws-sam-cli-linux-x86_64.zip && \
    unzip -q aws-sam-cli-linux-x86_64.zip -d sam-installation && \
    sudo ./sam-installation/install
)

git clone --depth=1 https://github.com/gabrielkoo/bedrock-access-gateway-function-url
cd bedrock-access-gateway-function-url

./prepare_source.sh
sam build
sam deploy --guided

Within a minute, grab the FunctionUrl value from the stack outputs, and recall the ApiKey value you supplied earlier during sam deploy:

Outputs

Key                 Function
Description         FastAPI Lambda Function ARN
Value               arn:aws:lambda:us-east-1:123456789012:function:sam-app-BedrockAccessGatewayFunction-yLLzetPaKSq5

Key                 FunctionUrl
Description         Function URL for FastAPI function
Value               https://lukeskywalker.lambda-url.us-east-1.on.aws/

Successfully created/updated stack - sam-app in us-east-1

Now test your own dedicated, pay-as-you-go, serverless OpenAI-compatible API endpoint in your GenAI application!

curl "${FUNCTION_URL}api/v1/models" \
     -H "Authorization: Bearer $API_KEY"
# {
#   "object": "list",
#   "data": [
#     {
#       "id": "amazon.titan-tg1-large",
#       "created": 1735826872,
#       "object": "model",
#       "owned_by": "bedrock"
#     },
#     ...
#   ]
# }
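Here is the same call from Python, using only the standard library. The sketch assumes the `FUNCTION_URL` and `API_KEY` environment variables from the curl example. `models_request` and `model_ids` are hypothetical helpers. The `rstrip("/")` call makes it work whether or not the URL keeps the trailing slash that the `FunctionUrl` output has.

```python
import json
import os
import urllib.request


def models_request(function_url: str, api_key: str) -> urllib.request.Request:
    """GET <FunctionUrl>api/v1/models with the API key as a bearer token."""
    # The FunctionUrl output ends with "/"; rstrip makes either form work.
    return urllib.request.Request(
        function_url.rstrip("/") + "/api/v1/models",
        headers={"Authorization": f"Bearer {api_key}"},
    )


def model_ids(payload: dict) -> list:
    """Extract model IDs from an OpenAI-style model list response."""
    return [model["id"] for model in payload.get("data", [])]


if __name__ == "__main__" and "FUNCTION_URL" in os.environ:
    request = models_request(os.environ["FUNCTION_URL"], os.environ["API_KEY"])
    with urllib.request.urlopen(request, timeout=30) as response:
        print(model_ids(json.load(response)))
```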

Streaming with the Access Gateway

Alternatively, I have built a minimal UI based on the deep-chat project so that you can test it without access to any local shell environment: https://chat.gab.hk/.

No worries about security: it's an open-source static website with no backend and no tracking scripts. Just bring your own endpoint and key.
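For reference, streamed chat completions in the OpenAI format arrive as server-sent events. Each `data:` line carries a JSON chunk whose `choices[].delta.content` holds the next piece of text, and the stream ends with `data: [DONE]`. Below is a minimal sketch of how a client reassembles the text, assuming the gateway follows that convention. `collect_stream_text` is a hypothetical helper.

```python
import json
from typing import Iterable


def collect_stream_text(lines: Iterable[str]) -> str:
    """Concatenate delta.content from OpenAI-style server-sent event lines."""
    parts = []
    for raw in lines:
        line = raw.strip()
        if not line.startswith("data:"):
            continue  # skip blank separator lines and other SSE fields
        data = line[len("data:"):].strip()
        if data == "[DONE]":
            break  # end-of-stream sentinel
        chunk = json.loads(data)
        for choice in chunk.get("choices", []):
            # The first and last chunks may carry no content (role / finish_reason).
            content = choice.get("delta", {}).get("content")
            if content:
                parts.append(content)
    return "".join(parts)
```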

Return of Cost Effectiveness

With the new, truly serverless option, here are the costs incurred:

  • Amazon Bedrock costs: Pay-as-you-go according to token usage
  • Lambda invocation costs: per GB-second of compute plus per request
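To get a feel for the variable cost, here is a rough Lambda estimate. The prices are assumed to be the on-demand us-east-1 x86 rates, so verify them for your region and architecture. The free tier is ignored, and the traffic numbers in the usage line are hypothetical. Bedrock token costs come on top, and they are the same whichever architecture fronts Bedrock.

```python
# Assumed on-demand Lambda prices (us-east-1, x86); check the pricing page
# for your region/architecture. The free tier is ignored.
PRICE_PER_GB_SECOND = 0.0000166667
PRICE_PER_REQUEST = 0.20 / 1_000_000


def lambda_monthly_cost(requests: int, avg_duration_s: float, memory_mb: int) -> float:
    """Compute (GB-seconds) plus request charges for one month, in USD."""
    gb_seconds = requests * avg_duration_s * (memory_mb / 1024)
    return gb_seconds * PRICE_PER_GB_SECOND + requests * PRICE_PER_REQUEST


if __name__ == "__main__":
    # Hypothetical traffic: 10,000 chat requests averaging 5 s at 512 MB.
    print(f"${lambda_monthly_cost(10_000, 5, 512):.2f}/month")
```

With those hypothetical numbers the estimate is about $0.42 a month, compared with the ALB's $16.2 fixed cost alone. The function waits on Bedrock while a response is generated, so `avg_duration_s` should include the model's generation time.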

So here is the final repository containing the entire setup:
https://github.com/gabrielkoo/bedrock-access-gateway-function-url

Feel free to fork it and create your own!

Next Steps

To further productionize it, here is a list of to-dos that could be done:

  1. Wrap the Function URL with Amazon CloudFront and adopt OAC (reference article: Secure your Lambda function URLs using Amazon CloudFront origin access control)
  2. Experiment with the memory size and timeout of the Lambda handler to find the most cost-efficient configuration
  3. Use provisioned concurrency to further avoid Lambda cold starts
  4. Support multiple API keys by updating the api.auth.api_key_auth logic
  5. Support content beyond text and images, such as DocumentContent or VideoContent, which are well supported by the Amazon Bedrock Converse API
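For to-do #4, here is one possible shape for the check, sketched framework-agnostically. The project's `api.auth.api_key_auth` lives inside its FastAPI app, so this is only the core comparison. The comma-separated key list and both helper names are hypothetical. The check uses `hmac.compare_digest` so that key contents are not leaked through timing differences.

```python
import hmac
from typing import Optional


def parse_api_keys(raw: str) -> list:
    """Parse a comma-separated key list, e.g. from a hypothetical API_KEYS setting."""
    return [key.strip() for key in raw.split(",") if key.strip()]


def is_authorized(authorization_header: Optional[str], allowed_keys: list) -> bool:
    """Accept `Authorization: Bearer <key>` if <key> matches any allowed key."""
    if not authorization_header:
        return False
    scheme, _, token = authorization_header.partition(" ")
    if scheme.lower() != "bearer" or not token:
        return False
    matched = False
    for key in allowed_keys:
        # Check every key (no early return) using a timing-safe comparison.
        if hmac.compare_digest(token.encode(), key.encode()):
            matched = True
    return matched
```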

Credits

Special thanks to the contributors of the following two projects; without their efforts, this cost-effective gateway wouldn't even exist:
