Khalifa Muyideen

Posted on May 21 • Originally published at hackernoon.com

How to Build a Self-Hosted AI Gateway With LiteLLM and Open WebUI

#docker #selfhosted #ai #devops

If you've ever self-hosted AI tools, you know how quickly things get messy.

One app talks to OpenAI. Another uses Anthropic. You spin up Ollama locally and now there's a third endpoint to manage. Authentication is different everywhere. Switching models means rewriting integration code. And before long, you're spending more time maintaining glue code than actually building anything.

I ran into this exact problem — so I built a cleaner setup.

The idea is simple: put a single gateway in front of every provider, so the rest of your stack only ever talks to one API.

I open-sourced the full working implementation here:

🔗 github.com/dixon400/myllm

Clone it. Run docker compose up. You'll have a working AI gateway in under 30 minutes.

What This Stack Does

One API → OpenAI, Anthropic, Groq, and Ollama all behind a single OpenAI-compatible endpoint
One frontend → Open WebUI as the unified chat interface
Secure remote access → Cloudflare Tunnel, no exposed ports, no open firewall rules
Easy to maintain → providers can change underneath without touching your apps

The full setup takes roughly 20–45 minutes depending on Docker image downloads and whether you already have local Ollama models installed.

The Stack

Nothing exotic here — just well-composed tools:

Component	Role
LiteLLM	Gateway / routing layer
Open WebUI	Chat frontend
PostgreSQL	State + metadata
Docker Compose	Orchestration
Cloudflare Tunnel	Secure remote exposure

Architecture

User
  ↓
Open WebUI
  ↓
LiteLLM (gateway)
  ↓
OpenAI / Anthropic / Groq / Ollama

The key insight: your apps and frontend only talk to LiteLLM. Providers become interchangeable underneath. Add a new model, swap a provider, change routing — nothing else needs to know.

Who This Is For

Developers experimenting with local AI infrastructure
Teams consolidating multiple providers behind one API layer
Engineers building internal AI tooling
Anyone tired of maintaining separate provider integrations

If you've ever thought "there has to be a simpler way to manage all these AI endpoints" — this is that simpler way.

Why This Stack Exists

Most self-hosted AI environments become hard to manage surprisingly fast. One application talks directly to OpenAI. Another uses Anthropic separately. Local Ollama models need their own endpoints. Authentication is inconsistent, and model switching slowly turns into infrastructure sprawl.

By placing LiteLLM in front of every provider, the rest of your system only needs to understand one interface. Providers can change, local models can be added, routing logic can evolve — without rewriting frontend or application logic every time.

Prerequisites

Before starting containers, make sure you have:

Docker Desktop
Docker Compose
curl
cloudflared
Ollama (optional, for local models)

Quick verification:

docker --version
docker compose version
cloudflared --version
curl --version

If using local Ollama models:

ollama list

If installed models appear, local inference is ready.

Repository Structure

The repo is intentionally lightweight:

🔗 github.com/dixon400/myllm

├── Docker-compose.yml
├── litellm-config.yml
└── .env

Each file has a distinct job: Docker Compose orchestrates services, LiteLLM config handles routing and model aliases, .env stores secrets and runtime configuration.

Setting Up Environment Variables

Your .env file is where provider credentials live. Create or update it in the project root:

LITELLM_MASTER_KEY=sk-very-strong-key
OPENAI_API_KEY=...
ANTHROPIC_API_KEY=...
GROQ_API_KEY=...
OLLAMA_CLOUD_API_BASE=https://<host>/v1
OLLAMA_CLOUD_API_KEY=...

A few things that matter more than people expect:

LITELLM_MASTER_KEY becomes the authentication layer between Open WebUI and LiteLLM
These values should never be committed into Git
Weak keys become a real security problem once remote access exists

Even if this starts as a personal setup, treat the environment config like production from day one.

Wiring Docker Compose

The goal here is making sure the containers can talk to each other. Most deployment failures come from small config mismatches rather than Docker itself.

Your Docker-compose.yml should point Open WebUI directly at LiteLLM:

OPENAI_API_BASE_URL=http://litellm:4000/v1
OPENAI_API_KEY=${LITELLM_MASTER_KEY}

This tells Open WebUI where the gateway lives and which key to use.

LiteLLM should mount the configuration file correctly:

./litellm-config.yml:/app/config.yaml

⚠️ That filename matters more than it looks. A typo here can cause Docker to create a directory instead of mounting the file — which leads to confusing startup errors later.

Configuring LiteLLM

Inside litellm-config.yml, LiteLLM defines model aliases, provider routing, gateway behavior, and authentication settings.

The most important section is the master key:

general_settings:
  master_key: os.environ/LITELLM_MASTER_KEY

This keeps secrets outside the YAML file and makes credential rotation easier later.

Inside model_list, make sure provider model IDs are current. Model names change more frequently than most people expect — especially across Groq and newer OpenAI releases.

Starting the Stack

Once your config looks right, start everything:

docker compose up -d --force-recreate

The initial startup may take a minute while Docker pulls images, initializes PostgreSQL, and creates container state.

Verify container health:

docker ps --format 'table {{.Names}}\t{{.Status}}\t{{.Ports}}'

You should see open-webui, litellm, and litellm-db all running.

If a container exits immediately, check its logs before moving forward:

docker logs <container-name>

Validating the Gateway

Before touching Open WebUI, validate LiteLLM first. The /v1/models endpoint confirms authentication works, providers loaded correctly, and model routing initialized.

set -a && source .env && \
curl -s http://localhost:4000/v1/models \
  -H "Authorization: Bearer $LITELLM_MASTER_KEY"

For readable output:

set -a && source .env && \
curl -s http://localhost:4000/v1/models \
  -H "Authorization: Bearer $LITELLM_MASTER_KEY" \
  | python3 -m json.tool | head -n 80

If this endpoint fails, Open WebUI will almost certainly fail too — so resolve gateway issues first.

Verifying Open WebUI

Once LiteLLM responds correctly, open the interface:

http://localhost:3000

You should be able to create chats, select models, and send prompts normally.

If the model dropdown is empty, LiteLLM authentication is usually the cause — mismatched master keys, stale model IDs, or invalid provider credentials.

Keeping Provider Models Current

This catches a lot of people off guard: provider model identifiers change more often than you'd think. A deployment that worked perfectly a few months ago can break because a provider deprecated a model name.

Checking Local Ollama Models

ollama list

Checking Groq Models

set -a && source .env && \
curl -s https://api.groq.com/openai/v1/models \
  -H "Authorization: Bearer $GROQ_API_KEY" \
  -H "Content-Type: application/json"

After updating model IDs in litellm-config.yml, recreate the stack:

docker compose up -d --force-recreate

Secure Remote Access with Cloudflare Tunnel

At this point, the stack only exists locally. The next step is exposing Open WebUI to the internet — without opening inbound ports, exposing your home IP, or managing reverse proxies manually.

Cloudflare Tunnel creates an outbound encrypted connection from your machine to Cloudflare's edge. You get:

Automatic HTTPS
Hidden origin infrastructure
Cloudflare proxy protection
Simple DNS management

Move DNS to Cloudflare

Add your domain to Cloudflare
Update nameservers at your registrar
Wait for propagation

Authenticate cloudflared

cloudflared tunnel login

This opens a browser window for authorization.

Create the Tunnel

cloudflared tunnel create openwebui

Cloudflare generates a tunnel UUID and a credentials JSON file.

Route a Subdomain

Assuming your domain is chat.yourdomain.tech:

cloudflared tunnel route dns openwebui chat.yourdomain.tech

Create the Tunnel Configuration

Create ~/.cloudflared/config.yml:

tunnel: openwebui
credentials-file: ~/.cloudflared/<tunnel-uuid>.json

ingress:
  - hostname: chat.yourdomain.tech
    service: http://localhost:3000
  - service: http_status:404

Replace <tunnel-uuid> with the generated filename.

Run the Tunnel

cloudflared tunnel run openwebui

For persistent startup on macOS:

cloudflared service install
cloudflared service start

Running the tunnel as a background service is significantly more reliable than keeping it in a terminal session.

Verifying Remote Access

Quick DNS and connectivity check:

dig +short chat.yourdomain.tech

curl -I https://chat.yourdomain.tech

Always use HTTPS for remote access. Cloudflare Tunnel is designed around secure proxied traffic.

Operational Health Checks

Once the stack is stable, this single command gives you a quick overview of everything:

set -a && source .env && \
echo "== Docker services ==" && \
docker ps --format 'table {{.Names}}\t{{.Status}}\t{{.Ports}}' && \
echo "\n== Local Ollama models ==" && \
ollama list && \
echo "\n== Groq model count ==" && \
curl -s https://api.groq.com/openai/v1/models \
  -H "Authorization: Bearer $GROQ_API_KEY" \
  -H "Content-Type: application/json" \
  | python3 -c 'import sys,json; d=json.load(sys.stdin); print(len(d.get("data", [])))' && \
echo "\n== LiteLLM models ==" && \
curl -s http://localhost:4000/v1/models \
  -H "Authorization: Bearer $LITELLM_MASTER_KEY"

Even for personal deployments, having this kind of visibility saves a lot of debugging time.

Troubleshooting

Stable deployments drift. Provider APIs change, Docker mounts break, credentials expire, model IDs get deprecated. Here are the most common failure patterns.

Open WebUI Loads but Models Are Missing

Empty dropdowns, missing providers, or authentication errors in LiteLLM logs.

docker logs --tail 200 litellm

Verify model visibility directly:

set -a && source .env && \
curl -s http://localhost:4000/v1/models \
  -H "Authorization: Bearer $LITELLM_MASTER_KEY"

Typical fixes:

Verify OPENAI_API_KEY=${LITELLM_MASTER_KEY}
Confirm master_key uses the environment variable
Recreate containers:

docker compose up -d --force-recreate
docker compose restart open-webui

LiteLLM Fails with IsADirectoryError

Docker accidentally created a directory instead of mounting the YAML file.

ls -la ./litellm-config.yml ./litellm-config.yaml
grep -n "litellm-config" Docker-compose.yml

Correct mount:

./litellm-config.yml:/app/config.yaml

Then recreate:

docker compose up -d --force-recreate litellm

Works Locally but Not Through Cloudflare

If local access works but the public hostname fails, focus on the tunnel:

cloudflared tunnel list
cloudflared tunnel info openwebui
cat ~/.cloudflared/config.yml
dig +short chat.yourdomain.tech
curl -I https://chat.yourdomain.tech

Most remote-access failures come from inactive tunnel connectors, incorrect ingress targets, missing proxied DNS records, or running cloudflared in a temporary terminal session.

Models Appear but Generation Fails

If /v1/models works but prompts fail — provider credentials may be invalid, quotas exhausted, or model IDs no longer exist.

set -a && source .env && \
env | grep -E '^(OPENAI_API_KEY|GROQ_API_KEY|ANTHROPIC_API_KEY|LITELLM_MASTER_KEY)=' \
  | sed 's/=.*/=<set>/'

Then inspect LiteLLM logs:

docker logs --tail 300 litellm

Refreshing provider model IDs solves this surprisingly often.

Security Recommendations

Once remote access exists, basic hardening matters:

Use a strong LITELLM_MASTER_KEY
Don't expose LiteLLM directly to the internet
Rotate provider API keys periodically
Keep CORS rules restrictive

For private or team usage, Cloudflare Access adds identity-aware access control in front of Open WebUI — worth enabling.

Capture a Known-Good Baseline

Once things are stable, save:

docker ps output
/v1/models output
Active model aliases
Tunnel status:

cloudflared tunnel info openwebui

When something breaks later, comparing against a working snapshot is almost always faster than debugging from scratch.

Try It Yourself

The full working implementation — Docker Compose, LiteLLM config, environment setup — is all here:

🔗 github.com/dixon400/myllm

Clone it, add your API keys, run docker compose up, and you'll have a working AI gateway.

If you found this useful, ⭐ the repo — it helps more people find it.

Got questions or improvements? Open an issue or drop a comment below. I'm actively maintaining this.

Originally published on HackerNoon.

DEV Community

How to Build a Self-Hosted AI Gateway With LiteLLM and Open WebUI

What This Stack Does

The Stack

Architecture

Who This Is For

Why This Stack Exists

Prerequisites

Repository Structure

Setting Up Environment Variables

Wiring Docker Compose

Configuring LiteLLM

Starting the Stack

Validating the Gateway

Verifying Open WebUI

Keeping Provider Models Current

Checking Local Ollama Models

Checking Groq Models

Secure Remote Access with Cloudflare Tunnel

Move DNS to Cloudflare

Authenticate cloudflared

Create the Tunnel

Route a Subdomain

Create the Tunnel Configuration

Run the Tunnel

Verifying Remote Access

Operational Health Checks

Troubleshooting

Open WebUI Loads but Models Are Missing

LiteLLM Fails with IsADirectoryError

Works Locally but Not Through Cloudflare

Models Appear but Generation Fails

Security Recommendations

Capture a Known-Good Baseline

Try It Yourself

Top comments (0)