DEV Community

EveryLocalAI

Build a Unified AI Gateway with LiteLLM and Ollama

Unify all your AI models - local and cloud - behind a single OpenAI-compatible API with LiteLLM and Ollama.

LiteLLM is a proxy server that exposes 100+ LLM providers through one endpoint. Connect it to Ollama for local inference, and you get load balancing, cost tracking, rate limits, and automatic fallback routing.

What You Need

  • Python 3.9+
  • Ollama installed and running
  • About 20 minutes

Setup

1. Install LiteLLM

pip install 'litellm[proxy]'

2. Create config.yaml

model_list:
  - model_name: qwen3-local
    litellm_params:
      model: ollama/qwen3:14b
      api_base: http://localhost:11434
      rpm: 30
  - model_name: gpt-4o-mini
    litellm_params:
      model: openai/gpt-4o-mini
      api_key: os.environ/OPENAI_API_KEY

general_settings:
  master_key: sk-your-key

3. Start the Proxy

export OPENAI_API_KEY="your-openai-key"  # read by config.yaml via os.environ/OPENAI_API_KEY
litellm --config config.yaml --port 4000
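
Once the proxy is running, a quick way to confirm it picked up both entries from config.yaml is to list the models it serves. The sketch below uses only the Python standard library and assumes the proxy answers the OpenAI-style `/v1/models` route on port 4000 with the `sk-your-key` master key from the config; the helper names `build_models_request` and `list_model_names` are made up for this example.

```python
import json
import urllib.request

PROXY_URL = "http://localhost:4000"  # port passed to the litellm command
MASTER_KEY = "sk-your-key"           # master_key from config.yaml


def build_models_request(base_url: str, api_key: str) -> urllib.request.Request:
    """Build a GET request for the OpenAI-style model listing route."""
    return urllib.request.Request(
        f"{base_url.rstrip('/')}/v1/models",
        headers={"Authorization": f"Bearer {api_key}"},
        method="GET",
    )


def list_model_names(base_url: str, api_key: str) -> list[str]:
    """Return the model aliases the proxy reports (needs a running proxy)."""
    request = build_models_request(base_url, api_key)
    with urllib.request.urlopen(request, timeout=10) as resp:
        payload = json.load(resp)
    # OpenAI-style listing: {"object": "list", "data": [{"id": ...}, ...]}
    return [entry["id"] for entry in payload["data"]]


# With the proxy up, uncomment to see the configured aliases:
# print(list_model_names(PROXY_URL, MASTER_KEY))
```

If the proxy loaded the config above, the list should include `qwen3-local` and `gpt-4o-mini`.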

4. Use It

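from openai import OpenAI

# Point the standard OpenAI client at the LiteLLM proxy instead of api.openai.com.
client = OpenAI(
    api_key="sk-your-key",  # the master_key from config.yaml
    base_url="http://localhost:4000/v1",
)
response = client.chat.completions.create(
    model="qwen3-local",  # the model_name alias from config.yaml
    messages=[{"role": "user", "content": "Hello!"}],
)
print(response.choices[0].message.content)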
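
To see what "one endpoint" means on the wire, here is a rough standard-library sketch of the request the client sends. It assumes the same proxy URL and master key as above; `build_chat_request` is a hypothetical helper, not part of LiteLLM or the OpenAI SDK. The point is that switching from the local model to the cloud model changes only the `model` field in the JSON body.

```python
import json
import urllib.request


def build_chat_request(
    model: str,
    prompt: str,
    base_url: str = "http://localhost:4000/v1",
    api_key: str = "sk-your-key",
) -> urllib.request.Request:
    """Build an OpenAI-style chat completion request aimed at the proxy."""
    body = {"model": model, "messages": [{"role": "user", "content": prompt}]}
    return urllib.request.Request(
        f"{base_url}/chat/completions",
        data=json.dumps(body).encode("utf-8"),
        headers={
            "Authorization": f"Bearer {api_key}",
            "Content-Type": "application/json",
        },
        method="POST",
    )


local_req = build_chat_request("qwen3-local", "Hello!")
cloud_req = build_chat_request("gpt-4o-mini", "Hello!")
# Same URL and headers for both; only the "model" value in the body differs.
# With the proxy running, urllib.request.urlopen(local_req) sends the request.
```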

Key Features

  • Smart fallback - if the local model fails, requests auto-route to a cloud model
  • Load balancing - distribute across multiple GPU instances
  • Cost tracking - per-model spend dashboard
  • Rate limiting - control requests per user/key
  • One API - use any tool that supports OpenAI format
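
To make the fallback idea concrete, here is a small client-side sketch of the same pattern: try the local alias first, then move to the cloud alias if the call raises. This only illustrates the routing logic and is not how LiteLLM implements it (the proxy handles fallback on the server side). `complete_with_fallback`, `send`, and `fake_send` are hypothetical names for this example.

```python
from typing import Callable, Sequence


def complete_with_fallback(
    prompt: str,
    models: Sequence[str],
    send: Callable[[str, str], str],
) -> tuple[str, str]:
    """Try each model alias in order; return (alias_used, reply).

    `send(model, prompt)` is a hypothetical callable that performs the
    request and raises an exception on failure.
    """
    errors = []
    for model in models:
        try:
            return model, send(model, prompt)
        except Exception as exc:  # broad on purpose: any failure triggers fallback
            errors.append(f"{model}: {exc}")
    raise RuntimeError("all models failed: " + "; ".join(errors))


def fake_send(model: str, prompt: str) -> str:
    """Stand-in transport: the local alias is 'down', the cloud alias answers."""
    if model == "qwen3-local":
        raise ConnectionError("Ollama not reachable")
    return f"[{model}] reply to {prompt!r}"


used, reply = complete_with_fallback(
    "Hello!", ["qwen3-local", "gpt-4o-mini"], fake_send
)
print(used)  # gpt-4o-mini
```

In a real client, `send` would wrap the `client.chat.completions.create` call from step 4. With the gateway in place, the application does not need this loop at all, which is the main appeal of letting the proxy handle failover.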

Cost vs Cloud

| | LiteLLM + Ollama | Direct Cloud APIs |
|---|---|---|
| Gateway | Free, self-hosted | Free |
| Local inference | $0 | N/A |
| Model switching | One endpoint | Multiple SDKs |
| Failover | Automatic | Manual |

Full guide with advanced config examples: https://everylocalai.com/stack/litellm-ollama-gateway
