<?xml version="1.0" encoding="UTF-8"?>
<rss version="2.0" xmlns:atom="http://www.w3.org/2005/Atom" xmlns:dc="http://purl.org/dc/elements/1.1/">
  <channel>
    <title>DEV Community: Khalifa Muyideen</title>
    <description>The latest articles on DEV Community by Khalifa Muyideen (@khalifornia).</description>
    <link>https://dev.to/khalifornia</link>
    <image>
      <url>https://media2.dev.to/dynamic/image/width=90,height=90,fit=cover,gravity=auto,format=auto/https:%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Fuser%2Fprofile_image%2F3933466%2F83d377c4-a18a-491e-98b6-6ba15760522d.jpg</url>
      <title>DEV Community: Khalifa Muyideen</title>
      <link>https://dev.to/khalifornia</link>
    </image>
    <atom:link rel="self" type="application/rss+xml" href="https://dev.to/feed/khalifornia"/>
    <language>en</language>
    <item>
      <title>How to Build a Self-Hosted AI Gateway With LiteLLM and Open WebUI</title>
      <dc:creator>Khalifa Muyideen</dc:creator>
      <pubDate>Thu, 21 May 2026 15:19:09 +0000</pubDate>
      <link>https://dev.to/khalifornia/how-to-build-a-self-hosted-ai-gateway-with-litellm-and-open-webui-fn3</link>
      <guid>https://dev.to/khalifornia/how-to-build-a-self-hosted-ai-gateway-with-litellm-and-open-webui-fn3</guid>
      <description>&lt;p&gt;If you've ever self-hosted AI tools, you know how quickly things get messy.&lt;/p&gt;

&lt;p&gt;One app talks to OpenAI. Another uses Anthropic. You spin up Ollama locally and now there's a third endpoint to manage. Authentication is different everywhere. Switching models means rewriting integration code. And before long, you're spending more time maintaining glue code than actually building anything.&lt;/p&gt;

&lt;p&gt;I ran into this exact problem — so I built a cleaner setup.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;The idea is simple:&lt;/strong&gt; put a single gateway in front of every provider, so the rest of your stack only ever talks to one API.&lt;/p&gt;

&lt;p&gt;I open-sourced the full working implementation here:&lt;/p&gt;

&lt;blockquote&gt;
&lt;p&gt;🔗 &lt;strong&gt;&lt;a href="https://github.com/dixon400/myllm" rel="noopener noreferrer"&gt;github.com/dixon400/myllm&lt;/a&gt;&lt;/strong&gt;&lt;/p&gt;
&lt;/blockquote&gt;

&lt;p&gt;Clone it, add your API keys, and run &lt;code&gt;docker compose up&lt;/code&gt;. You'll have a working AI gateway in well under an hour.&lt;/p&gt;




&lt;h2&gt;
  
  
  What This Stack Does
&lt;/h2&gt;

&lt;ul&gt;
&lt;li&gt;
&lt;strong&gt;One API&lt;/strong&gt; → OpenAI, Anthropic, Groq, and Ollama all behind a single OpenAI-compatible endpoint&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;One frontend&lt;/strong&gt; → Open WebUI as the unified chat interface&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Secure remote access&lt;/strong&gt; → Cloudflare Tunnel, no exposed ports, no open firewall rules&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Easy to maintain&lt;/strong&gt; → providers can change underneath without touching your apps&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;The full setup takes roughly &lt;strong&gt;20–45 minutes&lt;/strong&gt; depending on Docker image downloads and whether you already have local Ollama models installed.&lt;/p&gt;




&lt;h2&gt;
  
  
  The Stack
&lt;/h2&gt;

&lt;p&gt;Nothing exotic here — just well-composed tools:&lt;/p&gt;

&lt;div class="table-wrapper-paragraph"&gt;&lt;table&gt;
&lt;thead&gt;
&lt;tr&gt;
&lt;th&gt;Component&lt;/th&gt;
&lt;th&gt;Role&lt;/th&gt;
&lt;/tr&gt;
&lt;/thead&gt;
&lt;tbody&gt;
&lt;tr&gt;
&lt;td&gt;&lt;strong&gt;LiteLLM&lt;/strong&gt;&lt;/td&gt;
&lt;td&gt;Gateway / routing layer&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;&lt;strong&gt;Open WebUI&lt;/strong&gt;&lt;/td&gt;
&lt;td&gt;Chat frontend&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;&lt;strong&gt;PostgreSQL&lt;/strong&gt;&lt;/td&gt;
&lt;td&gt;State + metadata&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;&lt;strong&gt;Docker Compose&lt;/strong&gt;&lt;/td&gt;
&lt;td&gt;Orchestration&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;&lt;strong&gt;Cloudflare Tunnel&lt;/strong&gt;&lt;/td&gt;
&lt;td&gt;Secure remote exposure&lt;/td&gt;
&lt;/tr&gt;
&lt;/tbody&gt;
&lt;/table&gt;&lt;/div&gt;




&lt;h2&gt;
  
  
  Architecture
&lt;/h2&gt;



&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;User
  ↓
Open WebUI
  ↓
LiteLLM (gateway)
  ↓
OpenAI / Anthropic / Groq / Ollama
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;The key insight: your apps and frontend only talk to LiteLLM. Providers become interchangeable underneath. Add a new model, swap a provider, change routing — nothing else needs to know.&lt;/p&gt;




&lt;h2&gt;
  
  
  Who This Is For
&lt;/h2&gt;

&lt;ul&gt;
&lt;li&gt;Developers experimenting with local AI infrastructure&lt;/li&gt;
&lt;li&gt;Teams consolidating multiple providers behind one API layer&lt;/li&gt;
&lt;li&gt;Engineers building internal AI tooling&lt;/li&gt;
&lt;li&gt;Anyone tired of maintaining separate provider integrations&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;If you've ever thought "there has to be a simpler way to manage all these AI endpoints" — this is that simpler way.&lt;/p&gt;




&lt;h2&gt;
  
  
  Why This Stack Exists
&lt;/h2&gt;

&lt;p&gt;Most self-hosted AI environments become hard to manage surprisingly fast. One application talks directly to OpenAI. Another uses Anthropic separately. Local Ollama models need their own endpoints. Authentication is inconsistent, and model switching slowly turns into infrastructure sprawl.&lt;/p&gt;

&lt;p&gt;By placing LiteLLM in front of every provider, the rest of your system only needs to understand one interface. Providers can change, local models can be added, routing logic can evolve — without rewriting frontend or application logic every time.&lt;/p&gt;




&lt;h2&gt;
  
  
  Prerequisites
&lt;/h2&gt;

&lt;p&gt;Before starting containers, make sure you have:&lt;/p&gt;

&lt;ul&gt;
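&lt;li&gt;Docker Desktop (or Docker Engine)&lt;/li&gt;
&lt;li&gt;Docker Compose v2, the &lt;code&gt;docker compose&lt;/code&gt; plugin (bundled with Docker Desktop)&lt;/li&gt;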
&lt;li&gt;&lt;code&gt;curl&lt;/code&gt;&lt;/li&gt;
&lt;li&gt;&lt;code&gt;cloudflared&lt;/code&gt;&lt;/li&gt;
&lt;li&gt;Ollama (optional, for local models)&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;Quick verification:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight shell"&gt;&lt;code&gt;docker &lt;span class="nt"&gt;--version&lt;/span&gt;
docker compose version
cloudflared &lt;span class="nt"&gt;--version&lt;/span&gt;
curl &lt;span class="nt"&gt;--version&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;If using local Ollama models:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight shell"&gt;&lt;code&gt;ollama list
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;If installed models appear, local inference is ready.&lt;/p&gt;




&lt;h2&gt;
  
  
  Repository Structure
&lt;/h2&gt;

&lt;p&gt;The repo is intentionally lightweight:&lt;/p&gt;

&lt;blockquote&gt;
&lt;p&gt;🔗 &lt;strong&gt;&lt;a href="https://github.com/dixon400/myllm" rel="noopener noreferrer"&gt;github.com/dixon400/myllm&lt;/a&gt;&lt;/strong&gt;&lt;br&gt;
&lt;/p&gt;
&lt;/blockquote&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;├── Docker-compose.yml
├── litellm-config.yml
└── .env
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



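&lt;p&gt;Each file has a distinct job: the Compose file orchestrates services, the LiteLLM config handles routing and model aliases, and &lt;code&gt;.env&lt;/code&gt; stores secrets and runtime configuration. Compose auto-detects only lowercase file names such as &lt;code&gt;docker-compose.yml&lt;/code&gt;, so on a case-sensitive filesystem (most Linux hosts) you may need to pass &lt;code&gt;-f Docker-compose.yml&lt;/code&gt; or rename the file.&lt;/p&gt;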




&lt;h2&gt;
  
  
  Setting Up Environment Variables
&lt;/h2&gt;

&lt;p&gt;Your &lt;code&gt;.env&lt;/code&gt; file is where provider credentials live. Create or update it in the project root:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;LITELLM_MASTER_KEY=sk-very-strong-key
OPENAI_API_KEY=...
ANTHROPIC_API_KEY=...
GROQ_API_KEY=...
OLLAMA_CLOUD_API_BASE=https://&amp;lt;host&amp;gt;/v1
OLLAMA_CLOUD_API_KEY=...
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;A few things that matter more than people expect:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;
&lt;code&gt;LITELLM_MASTER_KEY&lt;/code&gt; is the credential Open WebUI presents to LiteLLM on every request&lt;/li&gt;
&lt;li&gt;These values should &lt;strong&gt;never&lt;/strong&gt; be committed into Git&lt;/li&gt;
&lt;li&gt;Weak keys become a real security problem once remote access exists&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;Even if this starts as a personal setup, treat the environment config like production from day one.&lt;/p&gt;




&lt;h2&gt;
  
  
  Wiring Docker Compose
&lt;/h2&gt;

&lt;p&gt;The goal here is making sure the containers can talk to each other. Most deployment failures come from small config mismatches rather than Docker itself.&lt;/p&gt;

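&lt;p&gt;In your &lt;code&gt;Docker-compose.yml&lt;/code&gt;, the Open WebUI service's &lt;code&gt;environment&lt;/code&gt; block should point directly at LiteLLM:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight yaml"&gt;&lt;code&gt;&lt;span class="na"&gt;environment&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt;
  &lt;span class="pi"&gt;-&lt;/span&gt; &lt;span class="s"&gt;OPENAI_API_BASE_URL=http://litellm:4000/v1&lt;/span&gt;
  &lt;span class="pi"&gt;-&lt;/span&gt; &lt;span class="s"&gt;OPENAI_API_KEY=${LITELLM_MASTER_KEY}&lt;/span&gt;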
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



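&lt;p&gt;This tells Open WebUI where the gateway lives and which key to use. The hostname &lt;code&gt;litellm&lt;/code&gt; is resolved on the Compose network, so it has to match the name of the LiteLLM service.&lt;/p&gt;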

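&lt;p&gt;The LiteLLM service should bind-mount the configuration file under &lt;code&gt;volumes&lt;/code&gt;:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight yaml"&gt;&lt;code&gt;&lt;span class="na"&gt;volumes&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt;
  &lt;span class="pi"&gt;-&lt;/span&gt; &lt;span class="s"&gt;./litellm-config.yml:/app/config.yaml&lt;/span&gt;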
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;blockquote&gt;
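&lt;p&gt;⚠️ The host-side filename matters more than it looks. If the path on the left of the colon does not exist (for example, &lt;code&gt;.yaml&lt;/code&gt; typed instead of &lt;code&gt;.yml&lt;/code&gt;), Docker creates an empty directory with that name and mounts it. LiteLLM then fails at startup because &lt;code&gt;/app/config.yaml&lt;/code&gt; is a directory rather than a file.&lt;/p&gt;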
&lt;/blockquote&gt;
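
&lt;p&gt;Put together, the two fragments above might sit in a compose file shaped roughly like the sketch below. This is an illustration, not the repo's actual file: the image tags, the Postgres credentials, the &lt;code&gt;POSTGRES_PASSWORD&lt;/code&gt; and &lt;code&gt;DATABASE_URL&lt;/code&gt; entries, and the published ports are all assumptions, while the container names follow the ones used later in this guide. Check the repo for the real values.&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight yaml"&gt;&lt;code&gt;# Sketch only: images, credentials, and ports are assumptions.
services:
  litellm-db:
    image: postgres:16
    container_name: litellm-db
    environment:
      - POSTGRES_USER=litellm
      - POSTGRES_PASSWORD=${POSTGRES_PASSWORD}   # hypothetical .env entry
      - POSTGRES_DB=litellm
    volumes:
      - litellm-db-data:/var/lib/postgresql/data

  litellm:
    image: ghcr.io/berriai/litellm:main-stable
    container_name: litellm
    command: ["--config", "/app/config.yaml"]
    env_file: .env                      # provider keys and LITELLM_MASTER_KEY
    environment:
      - DATABASE_URL=postgresql://litellm:${POSTGRES_PASSWORD}@litellm-db:5432/litellm
    volumes:
      - ./litellm-config.yml:/app/config.yaml
    ports:
      - "127.0.0.1:4000:4000"           # reachable from this host only
    depends_on:
      - litellm-db

  open-webui:
    image: ghcr.io/open-webui/open-webui:main
    container_name: open-webui
    environment:
      - OPENAI_API_BASE_URL=http://litellm:4000/v1
      - OPENAI_API_KEY=${LITELLM_MASTER_KEY}
    ports:
      - "127.0.0.1:3000:8080"           # Open WebUI listens on 8080 in the container
    depends_on:
      - litellm

volumes:
  litellm-db-data:
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;

&lt;p&gt;Binding the published ports to &lt;code&gt;127.0.0.1&lt;/code&gt; is a design choice in this sketch: the local &lt;code&gt;curl&lt;/code&gt; checks and a tunnel running on the same machine still reach the services, but other machines on the network cannot.&lt;/p&gt;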




&lt;h2&gt;
  
  
  Configuring LiteLLM
&lt;/h2&gt;

&lt;p&gt;Inside &lt;code&gt;litellm-config.yml&lt;/code&gt;, you define model aliases, provider routing, gateway behavior, and authentication settings.&lt;/p&gt;

&lt;p&gt;The most important section is the master key:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight yaml"&gt;&lt;code&gt;&lt;span class="na"&gt;general_settings&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt;
  &lt;span class="na"&gt;master_key&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s"&gt;os.environ/LITELLM_MASTER_KEY&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;This keeps secrets outside the YAML file and makes credential rotation easier later.&lt;/p&gt;

&lt;p&gt;Inside &lt;code&gt;model_list&lt;/code&gt;, make sure provider model IDs are current. Model names change more frequently than most people expect — especially across Groq and newer OpenAI releases.&lt;/p&gt;
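
&lt;p&gt;For orientation, a &lt;code&gt;model_list&lt;/code&gt; covering the four providers could look like the sketch below. The alias names are made up for illustration, and the angle-bracket placeholders stand in for whatever model IDs your providers currently list. The Ollama &lt;code&gt;api_base&lt;/code&gt; assumes Ollama runs on the Docker Desktop host at its default port.&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight yaml"&gt;&lt;code&gt;# Sketch only: aliases are hypothetical; fill in current model IDs.
model_list:
  - model_name: gpt-default             # alias that Open WebUI and other clients see
    litellm_params:
      model: openai/&amp;lt;current-openai-model-id&amp;gt;
      api_key: os.environ/OPENAI_API_KEY
  - model_name: claude-default
    litellm_params:
      model: anthropic/&amp;lt;current-anthropic-model-id&amp;gt;
      api_key: os.environ/ANTHROPIC_API_KEY
  - model_name: groq-fast
    litellm_params:
      model: groq/&amp;lt;current-groq-model-id&amp;gt;
      api_key: os.environ/GROQ_API_KEY
  - model_name: local-ollama
    litellm_params:
      model: ollama/&amp;lt;model-from-ollama-list&amp;gt;
      api_base: http://host.docker.internal:11434
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;

&lt;p&gt;Clients request a model by its &lt;code&gt;model_name&lt;/code&gt; alias, so when a provider retires an ID you only change the &lt;code&gt;model&lt;/code&gt; line and the alias your apps use stays the same.&lt;/p&gt;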




&lt;h2&gt;
  
  
  Starting the Stack
&lt;/h2&gt;

&lt;p&gt;Once your config looks right, start everything:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight shell"&gt;&lt;code&gt;docker compose up &lt;span class="nt"&gt;-d&lt;/span&gt; &lt;span class="nt"&gt;--force-recreate&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;The initial startup may take a few minutes while Docker pulls images, initializes PostgreSQL, and creates container state.&lt;/p&gt;

&lt;p&gt;Verify container health:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight shell"&gt;&lt;code&gt;docker ps &lt;span class="nt"&gt;--format&lt;/span&gt; &lt;span class="s1"&gt;'table {{.Names}}\t{{.Status}}\t{{.Ports}}'&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;You should see &lt;code&gt;open-webui&lt;/code&gt;, &lt;code&gt;litellm&lt;/code&gt;, and &lt;code&gt;litellm-db&lt;/code&gt; all running.&lt;/p&gt;

&lt;p&gt;If a container exits immediately, check its logs before moving forward:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight shell"&gt;&lt;code&gt;docker logs &amp;lt;container-name&amp;gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;
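
&lt;p&gt;If the logs show LiteLLM failing to reach PostgreSQL on first boot, one possible mitigation is to make it wait until the database reports healthy. The fragment below is a sketch to merge into the existing service definitions. It assumes the services are named &lt;code&gt;litellm&lt;/code&gt; and &lt;code&gt;litellm-db&lt;/code&gt; and that the Postgres user and database are both called &lt;code&gt;litellm&lt;/code&gt;, none of which is confirmed by the repo.&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight yaml"&gt;&lt;code&gt;# Sketch only: service, user, and database names are assumptions.
services:
  litellm-db:
    healthcheck:
      test: ["CMD-SHELL", "pg_isready -U litellm -d litellm"]
      interval: 5s
      timeout: 5s
      retries: 10

  litellm:
    depends_on:
      litellm-db:
        condition: service_healthy      # start only after the check passes
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;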






&lt;h2&gt;
  
  
  Validating the Gateway
&lt;/h2&gt;

&lt;p&gt;Before touching Open WebUI, validate LiteLLM first. The &lt;code&gt;/v1/models&lt;/code&gt; endpoint confirms that authentication works and that the model list from your config loaded. It does not prove that provider credentials are valid; that only shows up once you send a prompt.&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight shell"&gt;&lt;code&gt;&lt;span class="nb"&gt;set&lt;/span&gt; &lt;span class="nt"&gt;-a&lt;/span&gt; &lt;span class="o"&gt;&amp;amp;&amp;amp;&lt;/span&gt; &lt;span class="nb"&gt;source&lt;/span&gt; .env &lt;span class="o"&gt;&amp;amp;&amp;amp;&lt;/span&gt; &lt;span class="se"&gt;\&lt;/span&gt;
curl &lt;span class="nt"&gt;-s&lt;/span&gt; http://localhost:4000/v1/models &lt;span class="se"&gt;\&lt;/span&gt;
  &lt;span class="nt"&gt;-H&lt;/span&gt; &lt;span class="s2"&gt;"Authorization: Bearer &lt;/span&gt;&lt;span class="nv"&gt;$LITELLM_MASTER_KEY&lt;/span&gt;&lt;span class="s2"&gt;"&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;For readable output:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight shell"&gt;&lt;code&gt;&lt;span class="nb"&gt;set&lt;/span&gt; &lt;span class="nt"&gt;-a&lt;/span&gt; &lt;span class="o"&gt;&amp;amp;&amp;amp;&lt;/span&gt; &lt;span class="nb"&gt;source&lt;/span&gt; .env &lt;span class="o"&gt;&amp;amp;&amp;amp;&lt;/span&gt; &lt;span class="se"&gt;\&lt;/span&gt;
curl &lt;span class="nt"&gt;-s&lt;/span&gt; http://localhost:4000/v1/models &lt;span class="se"&gt;\&lt;/span&gt;
  &lt;span class="nt"&gt;-H&lt;/span&gt; &lt;span class="s2"&gt;"Authorization: Bearer &lt;/span&gt;&lt;span class="nv"&gt;$LITELLM_MASTER_KEY&lt;/span&gt;&lt;span class="s2"&gt;"&lt;/span&gt; &lt;span class="se"&gt;\&lt;/span&gt;
  | python3 &lt;span class="nt"&gt;-m&lt;/span&gt; json.tool | &lt;span class="nb"&gt;head&lt;/span&gt; &lt;span class="nt"&gt;-n&lt;/span&gt; 80
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;If this endpoint fails, Open WebUI will almost certainly fail too — so resolve gateway issues first.&lt;/p&gt;




&lt;h2&gt;
  
  
  Verifying Open WebUI
&lt;/h2&gt;

&lt;p&gt;Once LiteLLM responds correctly, open the interface:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;http://localhost:3000
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;You should be able to create chats, select models, and send prompts normally.&lt;/p&gt;

&lt;p&gt;If the model dropdown is empty, the link between Open WebUI and LiteLLM is usually the cause, most often a master key mismatch. Stale model IDs or invalid provider credentials typically surface later, as failed generations rather than a missing list.&lt;/p&gt;




&lt;h2&gt;
  
  
  Keeping Provider Models Current
&lt;/h2&gt;

&lt;p&gt;This catches a lot of people off guard: provider model identifiers change more often than you'd think. A deployment that worked perfectly a few months ago can break because a provider deprecated a model name.&lt;/p&gt;

&lt;h3&gt;
  
  
  Checking Local Ollama Models
&lt;/h3&gt;



&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight shell"&gt;&lt;code&gt;ollama list
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;h3&gt;
  
  
  Checking Groq Models
&lt;/h3&gt;



&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight shell"&gt;&lt;code&gt;&lt;span class="nb"&gt;set&lt;/span&gt; &lt;span class="nt"&gt;-a&lt;/span&gt; &lt;span class="o"&gt;&amp;amp;&amp;amp;&lt;/span&gt; &lt;span class="nb"&gt;source&lt;/span&gt; .env &lt;span class="o"&gt;&amp;amp;&amp;amp;&lt;/span&gt; &lt;span class="se"&gt;\&lt;/span&gt;
curl &lt;span class="nt"&gt;-s&lt;/span&gt; https://api.groq.com/openai/v1/models &lt;span class="se"&gt;\&lt;/span&gt;
  &lt;span class="nt"&gt;-H&lt;/span&gt; &lt;span class="s2"&gt;"Authorization: Bearer &lt;/span&gt;&lt;span class="nv"&gt;$GROQ_API_KEY&lt;/span&gt;&lt;span class="s2"&gt;"&lt;/span&gt; &lt;span class="se"&gt;\&lt;/span&gt;
  &lt;span class="nt"&gt;-H&lt;/span&gt; &lt;span class="s2"&gt;"Content-Type: application/json"&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;After updating model IDs in &lt;code&gt;litellm-config.yml&lt;/code&gt;, recreate the stack:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight shell"&gt;&lt;code&gt;docker compose up &lt;span class="nt"&gt;-d&lt;/span&gt; &lt;span class="nt"&gt;--force-recreate&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;






&lt;h2&gt;
  
  
  Secure Remote Access with Cloudflare Tunnel
&lt;/h2&gt;

&lt;p&gt;At this point, the stack only exists locally. The next step is exposing Open WebUI to the internet — without opening inbound ports, exposing your home IP, or managing reverse proxies manually.&lt;/p&gt;

&lt;p&gt;Cloudflare Tunnel creates an outbound encrypted connection from your machine to Cloudflare's edge. You get:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Automatic HTTPS&lt;/li&gt;
&lt;li&gt;Hidden origin infrastructure&lt;/li&gt;
&lt;li&gt;Cloudflare proxy protection&lt;/li&gt;
&lt;li&gt;Simple DNS management&lt;/li&gt;
&lt;/ul&gt;

&lt;h3&gt;
  
  
  Move DNS to Cloudflare
&lt;/h3&gt;

&lt;ol&gt;
&lt;li&gt;Add your domain to Cloudflare&lt;/li&gt;
&lt;li&gt;Update nameservers at your registrar&lt;/li&gt;
&lt;li&gt;Wait for propagation&lt;/li&gt;
&lt;/ol&gt;

&lt;h3&gt;
  
  
  Authenticate cloudflared
&lt;/h3&gt;



&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight shell"&gt;&lt;code&gt;cloudflared tunnel login
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;This opens a browser window for authorization.&lt;/p&gt;

&lt;h3&gt;
  
  
  Create the Tunnel
&lt;/h3&gt;



&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight shell"&gt;&lt;code&gt;cloudflared tunnel create openwebui
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;Cloudflare generates a tunnel UUID and a credentials JSON file.&lt;/p&gt;

&lt;h3&gt;
  
  
  Route a Subdomain
&lt;/h3&gt;

&lt;p&gt;Assuming the hostname you want is &lt;code&gt;chat.yourdomain.tech&lt;/code&gt;:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight shell"&gt;&lt;code&gt;cloudflared tunnel route dns openwebui chat.yourdomain.tech
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;h3&gt;
  
  
  Create the Tunnel Configuration
&lt;/h3&gt;

&lt;p&gt;Create &lt;code&gt;~/.cloudflared/config.yml&lt;/code&gt;:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight yaml"&gt;&lt;code&gt;&lt;span class="na"&gt;tunnel&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s"&gt;openwebui&lt;/span&gt;
&lt;span class="na"&gt;credentials-file&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s"&gt;~/.cloudflared/&amp;lt;tunnel-uuid&amp;gt;.json&lt;/span&gt;

&lt;span class="na"&gt;ingress&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt;
  &lt;span class="pi"&gt;-&lt;/span&gt; &lt;span class="na"&gt;hostname&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s"&gt;chat.yourdomain.tech&lt;/span&gt;
    &lt;span class="na"&gt;service&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s"&gt;http://localhost:3000&lt;/span&gt;
  &lt;span class="pi"&gt;-&lt;/span&gt; &lt;span class="na"&gt;service&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s"&gt;http_status:404&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;Replace &lt;code&gt;&amp;lt;tunnel-uuid&amp;gt;&lt;/code&gt; with the UUID printed by &lt;code&gt;cloudflared tunnel create&lt;/code&gt;; the credentials file is named after it.&lt;/p&gt;

&lt;h3&gt;
  
  
  Run the Tunnel
&lt;/h3&gt;



&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight shell"&gt;&lt;code&gt;cloudflared tunnel run openwebui
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;For persistent startup on macOS:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight shell"&gt;&lt;code&gt;cloudflared service &lt;span class="nb"&gt;install&lt;/span&gt;
launchctl start com.cloudflare.cloudflared
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;Running the tunnel as a background service is significantly more reliable than keeping it in a terminal session.&lt;/p&gt;




&lt;h2&gt;
  
  
  Verifying Remote Access
&lt;/h2&gt;

&lt;p&gt;Quick DNS and connectivity check:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight shell"&gt;&lt;code&gt;dig +short chat.yourdomain.tech
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;





&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight shell"&gt;&lt;code&gt;curl &lt;span class="nt"&gt;-I&lt;/span&gt; https://chat.yourdomain.tech
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;blockquote&gt;
&lt;p&gt;Always use HTTPS for remote access. Cloudflare Tunnel is designed around secure proxied traffic.&lt;/p&gt;
&lt;/blockquote&gt;




&lt;h2&gt;
  
  
  Operational Health Checks
&lt;/h2&gt;

&lt;p&gt;Once the stack is stable, this single command gives you a quick overview of everything:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight shell"&gt;&lt;code&gt;&lt;span class="nb"&gt;set&lt;/span&gt; &lt;span class="nt"&gt;-a&lt;/span&gt; &lt;span class="o"&gt;&amp;amp;&amp;amp;&lt;/span&gt; &lt;span class="nb"&gt;source&lt;/span&gt; .env &lt;span class="o"&gt;&amp;amp;&amp;amp;&lt;/span&gt; &lt;span class="se"&gt;\&lt;/span&gt;
&lt;span class="nb"&gt;echo&lt;/span&gt; &lt;span class="s2"&gt;"== Docker services =="&lt;/span&gt; &lt;span class="o"&gt;&amp;amp;&amp;amp;&lt;/span&gt; &lt;span class="se"&gt;\&lt;/span&gt;
docker ps &lt;span class="nt"&gt;--format&lt;/span&gt; &lt;span class="s1"&gt;'table {{.Names}}\t{{.Status}}\t{{.Ports}}'&lt;/span&gt; &lt;span class="o"&gt;&amp;amp;&amp;amp;&lt;/span&gt; &lt;span class="se"&gt;\&lt;/span&gt;
&lt;span class="nb"&gt;printf&lt;/span&gt; &lt;span class="s1"&gt;'\n== Local Ollama models ==\n'&lt;/span&gt; &lt;span class="o"&gt;&amp;amp;&amp;amp;&lt;/span&gt; &lt;span class="se"&gt;\&lt;/span&gt;
ollama list &lt;span class="o"&gt;&amp;amp;&amp;amp;&lt;/span&gt; &lt;span class="se"&gt;\&lt;/span&gt;
&lt;span class="nb"&gt;printf&lt;/span&gt; &lt;span class="s1"&gt;'\n== Groq model count ==\n'&lt;/span&gt; &lt;span class="o"&gt;&amp;amp;&amp;amp;&lt;/span&gt; &lt;span class="se"&gt;\&lt;/span&gt;
curl &lt;span class="nt"&gt;-s&lt;/span&gt; https://api.groq.com/openai/v1/models &lt;span class="se"&gt;\&lt;/span&gt;
  &lt;span class="nt"&gt;-H&lt;/span&gt; &lt;span class="s2"&gt;"Authorization: Bearer &lt;/span&gt;&lt;span class="nv"&gt;$GROQ_API_KEY&lt;/span&gt;&lt;span class="s2"&gt;"&lt;/span&gt; &lt;span class="se"&gt;\&lt;/span&gt;
  &lt;span class="nt"&gt;-H&lt;/span&gt; &lt;span class="s2"&gt;"Content-Type: application/json"&lt;/span&gt; &lt;span class="se"&gt;\&lt;/span&gt;
  | python3 &lt;span class="nt"&gt;-c&lt;/span&gt; &lt;span class="s1"&gt;'import sys,json; d=json.load(sys.stdin); print(len(d.get("data", [])))'&lt;/span&gt; &lt;span class="o"&gt;&amp;amp;&amp;amp;&lt;/span&gt; &lt;span class="se"&gt;\&lt;/span&gt;
&lt;span class="nb"&gt;printf&lt;/span&gt; &lt;span class="s1"&gt;'\n== LiteLLM models ==\n'&lt;/span&gt; &lt;span class="o"&gt;&amp;amp;&amp;amp;&lt;/span&gt; &lt;span class="se"&gt;\&lt;/span&gt;
curl &lt;span class="nt"&gt;-s&lt;/span&gt; http://localhost:4000/v1/models &lt;span class="se"&gt;\&lt;/span&gt;
  &lt;span class="nt"&gt;-H&lt;/span&gt; &lt;span class="s2"&gt;"Authorization: Bearer &lt;/span&gt;&lt;span class="nv"&gt;$LITELLM_MASTER_KEY&lt;/span&gt;&lt;span class="s2"&gt;"&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;Even for personal deployments, having this kind of visibility saves a lot of debugging time.&lt;/p&gt;




&lt;h2&gt;
  
  
  Troubleshooting
&lt;/h2&gt;

&lt;p&gt;Stable deployments drift. Provider APIs change, Docker mounts break, credentials expire, model IDs get deprecated. Here are the most common failure patterns.&lt;/p&gt;

&lt;h3&gt;
  
  
  Open WebUI Loads but Models Are Missing
&lt;/h3&gt;

&lt;p&gt;Symptoms: empty dropdowns, missing providers, or authentication errors in the LiteLLM logs. Start with the logs:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight shell"&gt;&lt;code&gt;docker logs &lt;span class="nt"&gt;--tail&lt;/span&gt; 200 litellm
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;Verify model visibility directly:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight shell"&gt;&lt;code&gt;&lt;span class="nb"&gt;set&lt;/span&gt; &lt;span class="nt"&gt;-a&lt;/span&gt; &lt;span class="o"&gt;&amp;amp;&amp;amp;&lt;/span&gt; &lt;span class="nb"&gt;source&lt;/span&gt; .env &lt;span class="o"&gt;&amp;amp;&amp;amp;&lt;/span&gt; &lt;span class="se"&gt;\&lt;/span&gt;
curl &lt;span class="nt"&gt;-s&lt;/span&gt; http://localhost:4000/v1/models &lt;span class="se"&gt;\&lt;/span&gt;
  &lt;span class="nt"&gt;-H&lt;/span&gt; &lt;span class="s2"&gt;"Authorization: Bearer &lt;/span&gt;&lt;span class="nv"&gt;$LITELLM_MASTER_KEY&lt;/span&gt;&lt;span class="s2"&gt;"&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;Typical fixes:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Verify &lt;code&gt;OPENAI_API_KEY=${LITELLM_MASTER_KEY}&lt;/code&gt;
&lt;/li&gt;
&lt;li&gt;Confirm &lt;code&gt;master_key&lt;/code&gt; uses the environment variable&lt;/li&gt;
&lt;li&gt;Recreate containers:
&lt;/li&gt;
&lt;/ul&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight shell"&gt;&lt;code&gt;docker compose up &lt;span class="nt"&gt;-d&lt;/span&gt; &lt;span class="nt"&gt;--force-recreate&lt;/span&gt;
docker compose restart open-webui
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;h3&gt;
  
  
  LiteLLM Fails with IsADirectoryError
&lt;/h3&gt;

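&lt;p&gt;Docker created a directory in place of the YAML file, usually because the host path in the volume mapping did not exist when the container first started. Check which file actually exists and what the compose file references:&lt;br&gt;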
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight shell"&gt;&lt;code&gt;&lt;span class="nb"&gt;ls&lt;/span&gt; &lt;span class="nt"&gt;-la&lt;/span&gt; ./litellm-config.yml ./litellm-config.yaml
&lt;span class="nb"&gt;grep&lt;/span&gt; &lt;span class="nt"&gt;-n&lt;/span&gt; &lt;span class="s2"&gt;"litellm-config"&lt;/span&gt; Docker-compose.yml
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;Correct mount:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight yaml"&gt;&lt;code&gt;&lt;span class="s"&gt;./litellm-config.yml:/app/config.yaml&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;Remove any empty directory Docker created under the wrong name, then recreate:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight shell"&gt;&lt;code&gt;docker compose up &lt;span class="nt"&gt;-d&lt;/span&gt; &lt;span class="nt"&gt;--force-recreate&lt;/span&gt; litellm
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;h3&gt;
  
  
  Works Locally but Not Through Cloudflare
&lt;/h3&gt;

&lt;p&gt;If local access works but the public hostname fails, focus on the tunnel:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight shell"&gt;&lt;code&gt;cloudflared tunnel list
cloudflared tunnel info openwebui
&lt;span class="nb"&gt;cat&lt;/span&gt; ~/.cloudflared/config.yml
dig +short chat.yourdomain.tech
curl &lt;span class="nt"&gt;-I&lt;/span&gt; https://chat.yourdomain.tech
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;Most remote-access failures come from inactive tunnel connectors, incorrect ingress targets, missing proxied DNS records, or running &lt;code&gt;cloudflared&lt;/code&gt; in a temporary terminal session.&lt;/p&gt;

&lt;h3&gt;
  
  
  Models Appear but Generation Fails
&lt;/h3&gt;

&lt;p&gt;If &lt;code&gt;/v1/models&lt;/code&gt; works but prompts fail, the provider credentials may be invalid, quotas may be exhausted, or the configured model IDs may no longer exist. First confirm the keys are set, without printing their values:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight shell"&gt;&lt;code&gt;&lt;span class="nb"&gt;set&lt;/span&gt; &lt;span class="nt"&gt;-a&lt;/span&gt; &lt;span class="o"&gt;&amp;amp;&amp;amp;&lt;/span&gt; &lt;span class="nb"&gt;source&lt;/span&gt; .env &lt;span class="o"&gt;&amp;amp;&amp;amp;&lt;/span&gt; &lt;span class="se"&gt;\&lt;/span&gt;
&lt;span class="nb"&gt;env&lt;/span&gt; | &lt;span class="nb"&gt;grep&lt;/span&gt; &lt;span class="nt"&gt;-E&lt;/span&gt; &lt;span class="s1"&gt;'^(OPENAI_API_KEY|GROQ_API_KEY|ANTHROPIC_API_KEY|LITELLM_MASTER_KEY)='&lt;/span&gt; &lt;span class="se"&gt;\&lt;/span&gt;
  | &lt;span class="nb"&gt;sed&lt;/span&gt; &lt;span class="s1"&gt;'s/=.*/=&amp;lt;set&amp;gt;/'&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;Then inspect LiteLLM logs:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight shell"&gt;&lt;code&gt;docker logs &lt;span class="nt"&gt;--tail&lt;/span&gt; 300 litellm
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;Refreshing provider model IDs solves this surprisingly often.&lt;/p&gt;




&lt;h2&gt;
  
  
  Security Recommendations
&lt;/h2&gt;

&lt;p&gt;Once remote access exists, basic hardening matters:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Use a strong &lt;code&gt;LITELLM_MASTER_KEY&lt;/code&gt;
&lt;/li&gt;
&lt;li&gt;Don't expose LiteLLM directly to the internet&lt;/li&gt;
&lt;li&gt;Rotate provider API keys periodically&lt;/li&gt;
&lt;li&gt;Keep CORS rules restrictive&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;For private or team usage, &lt;strong&gt;Cloudflare Access&lt;/strong&gt; adds identity-aware access control in front of Open WebUI — worth enabling.&lt;/p&gt;




&lt;h2&gt;
  
  
  Capture a Known-Good Baseline
&lt;/h2&gt;

&lt;p&gt;Once things are stable, save:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;
&lt;code&gt;docker ps&lt;/code&gt; output&lt;/li&gt;
&lt;li&gt;
&lt;code&gt;/v1/models&lt;/code&gt; output&lt;/li&gt;
&lt;li&gt;Active model aliases&lt;/li&gt;
&lt;li&gt;Tunnel status:
&lt;/li&gt;
&lt;/ul&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight shell"&gt;&lt;code&gt;cloudflared tunnel info openwebui
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;When something breaks later, comparing against a working snapshot is almost always faster than debugging from scratch.&lt;/p&gt;




&lt;h2&gt;
  
  
  Try It Yourself
&lt;/h2&gt;

&lt;p&gt;The full working implementation — Docker Compose, LiteLLM config, environment setup — is all here:&lt;/p&gt;

&lt;blockquote&gt;
&lt;p&gt;🔗 &lt;strong&gt;&lt;a href="https://github.com/dixon400/myllm" rel="noopener noreferrer"&gt;github.com/dixon400/myllm&lt;/a&gt;&lt;/strong&gt;&lt;/p&gt;
&lt;/blockquote&gt;

&lt;p&gt;Clone it, add your API keys, run &lt;code&gt;docker compose up&lt;/code&gt;, and you'll have a working AI gateway.&lt;/p&gt;

&lt;p&gt;If you found this useful, ⭐ the repo — it helps more people find it.&lt;/p&gt;

&lt;p&gt;Got questions or improvements? Open an issue or drop a comment below. I'm actively maintaining this.&lt;/p&gt;




&lt;p&gt;&lt;em&gt;Originally published on &lt;a href="https://hackernoon.com/how-to-build-a-self-hosted-ai-gateway-with-litellm-and-open-webui" rel="noopener noreferrer"&gt;HackerNoon&lt;/a&gt;.&lt;/em&gt;&lt;/p&gt;

</description>
      <category>ai</category>
      <category>docker</category>
      <category>devops</category>
      <category>selfhosted</category>
    </item>
  </channel>
</rss>
