LiteLLM is a unified API gateway that lets you call 100+ LLM providers using the same OpenAI-compatible format. Switch between OpenAI, Anthropic, Bedrock, Vertex AI, Ollama, and more — without changing your code.
Free, open source, Python-native. Used by thousands of companies for LLM routing.
## Why Use LiteLLM?
- One interface, 100+ providers — OpenAI, Anthropic, AWS Bedrock, Google Vertex, Azure, Cohere, Replicate, Ollama, and more
- OpenAI-compatible proxy — deploy as a server, use with any OpenAI SDK
- Cost tracking — track spend per model, per user, per team
- Load balancing — route between multiple API keys/deployments
- Fallbacks — automatic retry with different providers
- Rate limiting — per-user and per-model rate limits
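The fallback behavior above can be sketched in plain Python. This is a minimal illustration of the retry-with-fallback pattern that LiteLLM automates, not LiteLLM internals; `call_model` is a hypothetical stand-in for a provider call.

```python
def call_with_fallbacks(prompt, models, call_model):
    """Try each model in order; return the first successful response."""
    last_error = None
    for model in models:
        try:
            return call_model(model, prompt)
        except Exception as exc:  # a real router would only retry transient errors
            last_error = exc
    raise RuntimeError(f"All models failed: {last_error}")


# Hypothetical provider call: the primary model is rate limited, the fallback works.
def flaky(model, prompt):
    if model == "gpt-4o":
        raise TimeoutError("rate limited")
    return f"{model}: ok"


print(call_with_fallbacks("Hi", ["gpt-4o", "claude"], flaky))
```

LiteLLM does the equivalent for you when you configure `fallbacks` in the router settings (see the proxy config below).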
## Quick Setup

### 1. Install

```bash
# Quotes protect the extra from shells like zsh that expand [...]
pip install 'litellm[proxy]'

# Start the proxy server
litellm --model gpt-4o --port 4000
```
### 2. Use as a Python Library

```python
import os

from litellm import completion

os.environ["OPENAI_API_KEY"] = "sk-..."
os.environ["ANTHROPIC_API_KEY"] = "sk-ant-..."

# Same function, different providers
response = completion(
    model="gpt-4o",
    messages=[{"role": "user", "content": "What is web scraping?"}],
)
print(response.choices[0].message.content)

# Switch to Claude — same code!
response = completion(
    model="claude-3-5-sonnet-20241022",
    messages=[{"role": "user", "content": "What is web scraping?"}],
)
print(response.choices[0].message.content)

# Use Ollama (local)
response = completion(
    model="ollama/llama3.1",
    messages=[{"role": "user", "content": "What is web scraping?"}],
    api_base="http://localhost:11434",
)
print(response.choices[0].message.content)
```
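Note the `ollama/` prefix: LiteLLM routes on the provider prefix in the model string. A minimal sketch of that convention, assuming an `openai` default when no prefix is given (LiteLLM's actual provider inference is more involved):

```python
def split_model_string(model: str) -> tuple[str, str]:
    """Split 'provider/model' into (provider, model).

    Assumes bare model names belong to OpenAI, which is only a
    simplification for this sketch.
    """
    if "/" in model:
        provider, _, name = model.partition("/")
        return provider, name
    return "openai", model


print(split_model_string("ollama/llama3.1"))
print(split_model_string("gpt-4o"))
```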
### 3. Proxy Server with Config

```yaml
# config.yaml
model_list:
  - model_name: gpt-4
    litellm_params:
      model: gpt-4o
      api_key: sk-...
  - model_name: gpt-4
    litellm_params:
      model: azure/gpt-4
      api_base: https://my-azure.openai.azure.com
      api_key: ...
  - model_name: claude
    litellm_params:
      model: claude-3-5-sonnet-20241022
      api_key: sk-ant-...
  - model_name: local
    litellm_params:
      model: ollama/llama3.1
      api_base: http://localhost:11434

router_settings:
  routing_strategy: least-busy
  num_retries: 3
  fallbacks: [{"gpt-4": ["claude"]}]
```

Note that two entries share the `model_name` `gpt-4` — the router load-balances requests for that alias across both deployments.

```bash
litellm --config config.yaml --port 4000
```
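The `least-busy` strategy sends each request to the deployment with the fewest in-flight requests. A minimal sketch of that selection logic, with request counts held in a plain dict (this is an illustration, not LiteLLM's internal data structure):

```python
def pick_least_busy(in_flight: dict[str, int]) -> str:
    """Return the deployment name with the fewest in-flight requests."""
    return min(in_flight, key=in_flight.get)


# Two deployments behind the 'gpt-4' alias, as in the config above
counts = {"openai/gpt-4o": 4, "azure/gpt-4": 1}
print(pick_least_busy(counts))
```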
### 4. Query the Proxy

```bash
# Uses the OpenAI request format
curl -s http://localhost:4000/v1/chat/completions \
  -H "Content-Type: application/json" \
  -H "Authorization: Bearer sk-litellm-master-key" \
  -d '{
    "model": "gpt-4",
    "messages": [{"role": "user", "content": "Compare Playwright vs Puppeteer"}]
  }' | jq '.choices[0].message.content'

# Switch to Claude — same endpoint
curl -s http://localhost:4000/v1/chat/completions \
  -H "Content-Type: application/json" \
  -H "Authorization: Bearer sk-litellm-master-key" \
  -d '{"model": "claude", "messages": [{"role": "user", "content": "Hello"}]}' \
  | jq '.choices[0].message.content'
```
### 5. Cost Tracking

```python
from litellm import completion, completion_cost

response = completion(model="gpt-4o", messages=[{"role": "user", "content": "Hi"}])
print(f"Cost: ${completion_cost(completion_response=response):.6f}")
print(f"Tokens: {response.usage.total_tokens}")
```
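Under the hood, cost tracking is per-token price arithmetic: input and output tokens are billed at different rates. A sketch with hypothetical prices (real per-model prices vary by provider and change over time; LiteLLM ships its own maintained price table):

```python
# Hypothetical USD prices per million tokens — illustration only
PRICES = {"gpt-4o": {"input": 2.50, "output": 10.00}}


def estimate_cost(model: str, input_tokens: int, output_tokens: int) -> float:
    """Estimate request cost in USD from token counts and per-million prices."""
    p = PRICES[model]
    return (input_tokens * p["input"] + output_tokens * p["output"]) / 1_000_000


print(f"${estimate_cost('gpt-4o', 1000, 500):.6f}")
```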
## Key Proxy Endpoints
| Endpoint | Description |
|---|---|
| /v1/chat/completions | Chat completions |
| /v1/completions | Text completion |
| /v1/embeddings | Embeddings |
| /v1/models | List available models |
| /v1/images/generations | Image generation |
| /spend/logs | Spending logs |
| /model/info | Model configuration |
| /health | Health check |
## Supported Providers (100+)
| Provider | Model Format |
|---|---|
| OpenAI | gpt-4o, gpt-4o-mini |
| Anthropic | claude-3-5-sonnet-... |
| AWS Bedrock | bedrock/anthropic.claude-v2 |
| Google Vertex | vertex_ai/gemini-pro |
| Azure | azure/gpt-4 |
| Ollama | ollama/llama3.1 |
| Replicate | replicate/model-name |
| Cohere | command-r-plus |