The world of generative AI is expanding at an incredible pace. Developers now have access to a powerful array of Large Language Models (LLMs) from providers like OpenAI, Google (Gemini), Anthropic (Claude), and a vast collection available through services like AWS Bedrock and Hugging Face. While this choice is empowering, it introduces a significant challenge for engineering teams: each model comes with its own unique API, SDK, and authentication mechanism.
Managing this complexity can lead to a fragmented codebase, vendor lock-in, and operational headaches. What if you could interact with all of these models through a single, consistent interface?
Enter LiteLLM, the open-source library designed to be the Swiss Army knife for GenAI deployment. It provides a universal translation layer, allowing you to call over 100 different LLMs using the exact same code format. Let's explore how you can leverage LiteLLM to streamline your development and deployment workflows.
The Challenge: A Multi-API World
Before tools like LiteLLM existed, interacting with different models meant writing provider-specific code.
For example, a call to OpenAI might look like this:
# Requires 'openai' library
from openai import OpenAI
client = OpenAI(api_key="sk-...")
response = client.chat.completions.create(
    model="gpt-4o",
    messages=[{"role": "user", "content": "Hello, world!"}]
)
Now, if you wanted to switch to Anthropic's Claude on AWS Bedrock, you'd need a completely different setup:
# Requires 'boto3' library
import boto3
import json
bedrock_runtime = boto3.client(service_name='bedrock-runtime', region_name='us-east-1')
body = json.dumps({
    "anthropic_version": "bedrock-2023-05-31",
    "max_tokens": 1024,
    "messages": [{"role": "user", "content": "Hello, world!"}]
})
response = bedrock_runtime.invoke_model(
    body=body,
    modelId='anthropic.claude-3-sonnet-20240229-v1:0'
)
This approach is not scalable. It complicates A/B testing, prevents easy failover to a backup provider, and bloats your application with multiple SDKs and conditional logic.
LiteLLM to the Rescue: A Unified Interface
LiteLLM elegantly solves this problem by providing a single function, litellm.completion(), that acts as a universal entry point.
Getting Started
- Installation: Getting started is as simple as a pip install.
pip install litellm
- Configuration: Set your API keys as environment variables. LiteLLM automatically detects them based on the model you are calling.
export OPENAI_API_KEY="sk-..."
export ANTHROPIC_API_KEY="sk-..."
export AWS_ACCESS_KEY_ID="your-aws-key-id"
export AWS_SECRET_ACCESS_KEY="your-aws-secret-key"
export GOOGLE_API_KEY="your-google-api-key"
- Unified Code: Now, you can call any supported model by simply changing the model parameter string.
import litellm

# Call OpenAI's GPT-4o
response = litellm.completion(
    model="gpt-4o",
    messages=[{"role": "user", "content": "Write a tagline for a coffee shop."}]
)
print(response.choices[0].message.content)

# Switch to Claude 3 Sonnet on Bedrock
response = litellm.completion(
    model="bedrock/anthropic.claude-3-sonnet-20240229-v1:0",
    messages=[{"role": "user", "content": "Write a tagline for a coffee shop."}]
)
print(response.choices[0].message.content)

# Switch to Google's Gemini Pro
response = litellm.completion(
    model="gemini/gemini-pro",
    messages=[{"role": "user", "content": "Write a tagline for a coffee shop."}]
)
print(response.choices[0].message.content)
As you can see, the application logic remains identical. The only thing that changes is the model identifier. This dramatically simplifies development and makes your application incredibly flexible.
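To make that concrete, here is a minimal sketch that fans the same prompt out to several providers in a loop. The model list is purely illustrative; swap in whichever models you have credentials for.
import litellm

# Example shortlist of models to compare; any LiteLLM-supported identifier works here.
candidate_models = [
    "gpt-4o",
    "bedrock/anthropic.claude-3-sonnet-20240229-v1:0",
    "gemini/gemini-pro",
]

prompt = [{"role": "user", "content": "Write a tagline for a coffee shop."}]

for model_name in candidate_models:
    # The call is identical for every provider; only the model string changes.
    response = litellm.completion(model=model_name, messages=prompt)
    print(f"{model_name}: {response.choices[0].message.content}")
This pattern is all it takes to A/B test providers or benchmark outputs side by side.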
Deploying for Production: The LiteLLM Proxy
For production environments, LiteLLM offers a powerful proxy server. This standalone service acts as a centralized gateway for all LLM requests within your organization. It exposes an OpenAI-compatible API, meaning any tool or application built to work with OpenAI can immediately work with any model you configure in LiteLLM.
Why use the Proxy?
- Centralized Key Management: Your applications don't need to store sensitive API keys. All keys are managed securely within the proxy's configuration.
- Load Balancing & Failover: Distribute requests across multiple API keys or even different models. If one model provider has an outage, the proxy can automatically route traffic to a configured backup (see the sketch after this list).
- Standardized Endpoint: All your internal services point to a single, consistent API endpoint, abstracting away the underlying model providers.
- Cost Control & Observability: The proxy provides detailed logging, usage tracking, and allows you to set budgets and rate limits per key or model.
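The proxy reads its routing behavior from the config file, but it can help to see the failover idea in code first. LiteLLM's Python SDK ships a Router class that does the same job in-process; the sketch below assumes the fallbacks parameter accepts a mapping from a primary model name to its backups (verify against the version you install), and the model names are illustrative.
import litellm
from litellm import Router

# A minimal router: "claude-3-sonnet" serves as the fallback for "gpt-4o".
# API keys are read from the environment variables configured earlier.
router = Router(
    model_list=[
        {"model_name": "gpt-4o", "litellm_params": {"model": "gpt-4o"}},
        {
            "model_name": "claude-3-sonnet",
            "litellm_params": {"model": "bedrock/anthropic.claude-3-sonnet-20240229-v1:0"},
        },
    ],
    fallbacks=[{"gpt-4o": ["claude-3-sonnet"]}],
)

# Ask for "gpt-4o"; if the call fails, the router retries with the fallback model.
response = router.completion(
    model="gpt-4o",
    messages=[{"role": "user", "content": "What is the capital of France?"}],
)
print(response.choices[0].message.content)
The proxy exposes the equivalent behavior declaratively, so your applications never need to carry this logic themselves.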
How to Deploy the Proxy
- Create a Configuration File: Create a config.yaml to define your models and API keys.
model_list:
  - model_name: gpt-4-turbo
    litellm_params:
      model: gpt-4-turbo-preview
      api_key: os.environ/OPENAI_API_KEY
  - model_name: claude-3-sonnet
    litellm_params:
      model: bedrock/anthropic.claude-3-sonnet-20240229-v1:0
      aws_access_key_id: os.environ/AWS_ACCESS_KEY_ID
      aws_secret_access_key: os.environ/AWS_SECRET_ACCESS_KEY
      aws_region_name: us-east-1
  - model_name: gemini-pro-router
    litellm_params:
      model: gemini/gemini-pro
      api_key: os.environ/GOOGLE_API_KEY

litellm_settings:
  # Sets the proxy to be non-blocking
  # For production, you would run this with a process manager like gunicorn
  background_tasks: True
- Run the Proxy: Start the proxy using the LiteLLM CLI.
litellm --config /path/to/your/config.yaml
- Make a Request: You can now make a standard OpenAI-compatible request to your local proxy endpoint.
curl -X POST http://0.0.0.0:4000/chat/completions \
  -H "Content-Type: application/json" \
  -d '{
    "model": "claude-3-sonnet",
    "messages": [
      {"role": "user", "content": "What is the capital of France?"}
    ]
  }'
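Because the endpoint is OpenAI-compatible, the official openai Python client can talk to the proxy without modification. A quick sketch follows; the api_key value is a placeholder unless you have enabled virtual keys on the proxy, in which case you would pass a proxy-issued key instead.
from openai import OpenAI

# Point the standard OpenAI client at the LiteLLM proxy instead of api.openai.com.
client = OpenAI(api_key="anything", base_url="http://0.0.0.0:4000")

response = client.chat.completions.create(
    model="claude-3-sonnet",  # a model_name from config.yaml, not a provider model ID
    messages=[{"role": "user", "content": "What is the capital of France?"}],
)
print(response.choices[0].message.content)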
From here, you can easily containerize the proxy using Docker and deploy it to any environment, such as Kubernetes, providing a robust, scalable, and manageable gateway for your entire organization's GenAI needs.
Conclusion
LiteLLM is more than just a convenience library; it's a strategic tool for any team building with generative AI. By providing a unified abstraction layer, it decouples your application from specific model providers, giving you the freedom to choose the best tool for the job without rewriting your code.
Whether you're a developer looking to simplify your workflow or a DevOps engineer building a resilient, multi-provider AI infrastructure, LiteLLM provides the features you need to succeed. It transforms the complex, fragmented LLM landscape into a simple, manageable, and unified resource.