Open-WebUI + Ollama Guide: Run LLMs Locally with Docker

1️⃣ Introduction

Welcome to the ultimate Open-WebUI guide. If you've ever wanted the power and sleek interface of ChatGPT but with the privacy of a local server, you are in the right place.

Ollama is a lightweight inference engine that makes running large language models (LLMs) dead simple, while Open-WebUI (formerly Ollama WebUI) provides a beautiful, feature-rich, and extensible front-end. By combining them, you can build your own private AI assistant.
Why a self-hosted FOSS version matters:

- Absolute Privacy: Your chats, code snippets, and intellectual property never leave your machine.
- Zero Subscription Costs: Run powerful open-source models for free.
- Offline Access: Work seamlessly even without an internet connection.

TL;DR - What you will accomplish today:

- Install Docker & Docker Compose.
- Deploy a unified Ollama and Open-WebUI stack using a single file.
- Prevent data loss with persistent volumes.
- Download and run Llama 3.1 locally.

2️⃣ Prerequisites

Before we spin up our Ollama local LLM stack, ensure your system meets these baseline requirements:
Hardware:

- RAM: 8 GB minimum (16 GB highly recommended to run 7B-8B parameter models).
- CPU: A modern multi-core processor.
- GPU (optional but recommended): An NVIDIA GPU with at least 6 GB of VRAM will drastically improve token generation speed.

Software: Docker and Docker Compose installed on your system. (If you haven't done this yet, check out our Beginner's Guide to Docker.)

Network: Ports 8080 (WebUI) and 11434 (Ollama API) available. You can verify the software and port requirements with the quick checks below.
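A quick sanity check before moving on. This sketch assumes a Linux host with the iproute2 tools installed; adapt as needed:

```bash
# Confirm Docker and the Compose plugin are installed
docker --version
docker compose version

# Check that nothing else is listening on our ports
# (no output means the port is free)
ss -ltn | grep -E ':(8080|11434)\b'
```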

3️⃣ Quick-Start Installation

The biggest mistake beginners make is running Open-WebUI and Ollama in separate, disjointed Docker commands, leading to localhost connection errors. We will solve this by deploying them together in a single docker-compose.yml file.

Create a new directory and create your compose file:

```bash
mkdir open-webui-stack && cd open-webui-stack
nano docker-compose.yml
```

Paste the following configuration:

```yaml
version: '3.8'

services:
  ollama:
    image: ollama/ollama:latest
    container_name: ollama
    volumes:
      - ollama_data:/root/.ollama
    ports:
      - "11434:11434"
    restart: unless-stopped
    # Uncomment the following lines if you have an NVIDIA GPU
    # and the NVIDIA Container Toolkit installed
    # deploy:
    #   resources:
    #     reservations:
    #       devices:
    #         - driver: nvidia
    #           count: 1
    #           capabilities: [gpu]

  open-webui:
    image: ghcr.io/open-webui/open-webui:main
    container_name: open-webui
    volumes:
      - open-webui_data:/app/backend/data
    ports:
      - "8080:8080"
    environment:
      - OLLAMA_BASE_URL=http://ollama:11434
      - WEBUI_AUTH=True
    depends_on:
      - ollama
    restart: unless-stopped

volumes:
  ollama_data:
  open-webui_data:
```

Save the file and run:
```bash
docker compose up -d
```

First-run verification: Wait about 60 seconds for the containers to initialize, then open your browser and navigate to http://localhost:8080. You should be greeted by the Open-WebUI login screen!
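You can also verify from the terminal instead of refreshing the browser. Assuming you are still in the open-webui-stack directory:

```bash
# Both services should report a running/Up status
docker compose ps

# Ollama answers on its API port once it is ready
curl http://localhost:11434/api/version

# Follow the WebUI logs if the page does not load
docker compose logs -f open-webui
```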

4️⃣ Detailed Configuration

Let's break down why this configuration solves the most common self-hosting headaches:

- Persistent Storage (Volumes): Notice the ollama_data and open-webui_data volumes? Without them, every container update or recreation would wipe your downloaded models and chat history. This setup ensures your data survives upgrades and restarts; you can confirm it yourself with the quick check below.
- Internal Network Routing: By setting OLLAMA_BASE_URL=http://ollama:11434, we tell the WebUI to talk directly to the Ollama container via Docker's internal DNS. This completely bypasses the usual localhost and 127.0.0.1 routing conflicts.
- Authentication (WEBUI_AUTH=True): This forces users to create an account before accessing the AI, securing your server from unauthorized use.
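To see the persistence for yourself, list and inspect the named volumes. Note that Compose prefixes volume names with the project directory, so open-webui-stack_ollama_data below is an assumption based on the directory we created earlier:

```bash
# List the named volumes created by the stack
docker volume ls | grep -E 'ollama_data|open-webui_data'

# Show where a volume actually lives on the host
docker volume inspect open-webui-stack_ollama_data
```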

Pro-Tip: If you want to access this outside your home network, we highly recommend putting Open-WebUI behind Nginx Proxy Manager or Traefik with an SSL certificate.

5️⃣ Common Use-Cases & Mini-Projects

Downloading Your First Model
Once logged into Open-WebUI, click on the Settings gear, navigate to Models, and type llama3 or llama3.1 into the pull model field. Click download.
Alternatively, you can pull a model directly from your terminal:

```bash
docker exec -it ollama ollama pull llama3.1
```
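Either way, you can confirm which models are installed (and how much disk space they occupy) straight from the Ollama container:

```bash
# List every model Ollama has downloaded so far
docker exec -it ollama ollama list
```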

API Access for Developers
Because we exposed port 11434, you can call your new local LLM server from any script or application. Test it with this simple curl request against Ollama's native API:

```bash
curl -X POST http://localhost:11434/api/generate -d '{
  "model": "llama3.1",
  "prompt": "Why is the sky blue? Explain in one sentence.",
  "stream": false
}'
```
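Recent Ollama releases also ship an OpenAI-compatible endpoint, so existing OpenAI SDK clients can point at your local server just by swapping the base URL. A hedged sketch; if the route returns a 404, check the API compatibility notes for your Ollama version:

```bash
curl http://localhost:11434/v1/chat/completions \
  -H "Content-Type: application/json" \
  -d '{
    "model": "llama3.1",
    "messages": [
      {"role": "user", "content": "Why is the sky blue? One sentence."}
    ]
  }'
```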

6️⃣ Troubleshooting & FAQ

Q: My GPU isn't being detected by Ollama. Tokens are generating very slowly!

A: If you are on Linux with an NVIDIA card, you must install the NVIDIA Container Toolkit (the successor to nvidia-docker2). Once installed, uncomment the deploy block in the docker-compose.yml file and restart the stack (docker compose up -d --force-recreate).
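On Debian/Ubuntu-style systems, the install and Docker wiring typically look like the sketch below. Package names and repo setup vary by distro, so treat this as a starting point and consult NVIDIA's install guide:

```bash
# Install the toolkit (assumes NVIDIA's apt repository is already configured)
sudo apt-get install -y nvidia-container-toolkit

# Register the NVIDIA runtime with Docker and restart the daemon
sudo nvidia-ctk runtime configure --runtime=docker
sudo systemctl restart docker

# After recreating the stack, the GPU should be visible inside the container
docker exec -it ollama nvidia-smi
```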

Q: I keep getting "Out of Memory" errors on my 8GB RAM machine.

A: Standard 7B or 8B models might be too heavy for your system. Switch to a smaller, highly efficient model. Try pulling gemma:2b or Microsoft's phi3 inside the Open-WebUI interface.
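Pulling one of these from the terminal works the same as before. The model tags below are assumptions based on the current Ollama library, so double-check the names there:

```bash
# Smaller models that fit comfortably in 8 GB of RAM
docker exec -it ollama ollama pull gemma:2b
docker exec -it ollama ollama pull phi3
```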

Q: Open-WebUI says "Ollama connection failed."

A: Double-check that your OLLAMA_BASE_URL is set to http://ollama:11434 (not localhost) and that the ollama container is running without restart loops (docker ps).
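A quick triage sequence from the host, assuming the stack was started with the compose file above:

```bash
# Is the ollama container up, or stuck in a restart loop?
docker ps --filter name=ollama

# Does the Ollama API answer from the host?
curl http://localhost:11434/api/version

# Inspect recent logs from both services for errors
docker compose logs --tail 50 ollama open-webui
```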

7️⃣ Security & Production Hardening

If you plan to expose this setup to the internet, you must harden it:
- Disable Open Signups: Once you have created your admin account, go to the WebUI Admin Panel -> Settings -> General, and turn off "Enable New User Signups".
- Backup Strategy: Regularly back up your Docker volumes. You can tarball the volumes stored under /var/lib/docker/volumes/ to keep your chat history safe; see the sketch after this list.
- Reverse Proxy: Never expose port 8080 directly to the web. Route it through a proxy manager with a Let's Encrypt SSL certificate.
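Here is one way to tarball a volume without poking around /var/lib/docker directly: mount it read-only into a throwaway container and archive it into the current directory. A minimal sketch; the open-webui-stack_ prefix assumes the project directory name from earlier:

```bash
# Archive the chat-history volume into ./open-webui_data.tar.gz
docker run --rm \
  -v open-webui-stack_open-webui_data:/data:ro \
  -v "$(pwd)":/backup \
  alpine tar czf /backup/open-webui_data.tar.gz -C /data .
```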

8️⃣ Extending the Stack

Your private AI assistant doesn't have to exist in a vacuum. You can seamlessly integrate this setup with other FOSS homelab tools:

- Give it internet access: Connect Open-WebUI to SearXNG (a self-hosted metasearch engine) so your LLM can pull in live web results; a hedged compose sketch follows this list.
- Safe Code Execution: Integrate Open-Terminal to give your AI agents a sandboxed, browser-based shell to write and test code safely.
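As a taste of the SearXNG integration, Open-WebUI's web search is driven by environment variables on the open-webui service. The variable names below match recent Open-WebUI documentation but have changed between releases, so verify them against the docs for your image tag; the searxng hostname assumes a SearXNG container on the same Docker network:

```yaml
# Extra entries for the open-webui service's environment: block (verify names for your version)
environment:
  - ENABLE_RAG_WEB_SEARCH=True
  - RAG_WEB_SEARCH_ENGINE=searxng
  - SEARXNG_QUERY_URL=http://searxng:8080/search?q=<query>
```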

9️⃣ Conclusion & Next Steps

You now have a fully functional, highly secure, and persistent Ollama local LLM server with a gorgeous user interface. You've eliminated third-party privacy risks and unlocked the world of open-weight AI models.
