Jovan Chan

Originally published at aifoss.dev

ollama-open-webui-linux-setup

---
title: 'Ollama + Open WebUI on Linux: 15-Minute Setup Guide'
description: 'Step-by-step guide to installing Ollama and Open WebUI on Linux. Covers Docker and pip install, GPU setup, model management, and first-run configuration.'
pubDate: 'May 21 2026'
tags: ["ollama", "ai", "selfhosted", "llm", "opensource"]
---

By the end of this guide you'll have a local LLM running under a full browser-based chat interface — no internet required after setup, no API keys, nothing leaving your machine.

The stack: Ollama handles model downloading, management, and inference via a local REST API. Open WebUI sits on top as the chat frontend. Together they replicate the core ChatGPT experience on your own hardware.

Versions used: Ollama v0.24.0, Open WebUI v0.9.5 (May 2026 — check their GitHub pages for newer releases before starting).


What you need

Hardware:

| Setup | Minimum RAM | GPU | What you can run |
|---|---|---|---|
| CPU-only | 8 GB | Not required | 3B models at 3–6 tok/s |
| CPU-only | 16 GB | Not required | 7B models at 3–5 tok/s |
| GPU (8 GB VRAM) | 16 GB | CUDA / ROCm / Vulkan | 7B–13B at 20–40 tok/s |
| GPU (16 GB VRAM) | 32 GB | CUDA / ROCm | 30B–34B at 15–25 tok/s |

CPU inference is functional for experiments. For daily interactive use, a GPU makes the difference: per the table, roughly 20–40 tok/s on a 7B model instead of 3–5.
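As a rough rule of thumb behind the table above, the weights of a quantized model take about parameters × bits per weight ÷ 8 bytes. The sketch below wraps that arithmetic in a hypothetical helper, `weights_gb`. It assumes 4-bit quantization unless told otherwise (many Ollama library tags use a 4-bit variant by default, but check the tag you pull) and ignores the context cache and runtime overhead, so treat the result as a floor rather than a requirement.

```shell
# Rough floor for model weight size in GB:
#   billions of parameters * bits per weight / 8
# weights_gb is a hypothetical helper, not part of Ollama.
weights_gb() {
  params_b=$1      # parameters in billions (e.g. 7 for a 7B model)
  bits=${2:-4}     # bits per weight; 4 is an assumed default
  awk -v p="$params_b" -v b="$bits" 'BEGIN { printf "%.1f\n", p * b / 8 }'
}

weights_gb 7       # prints 3.5
weights_gb 7 16    # prints 14.0
```

The gap between the two results is why a 7B model fits in 8 GB of VRAM when quantized but not at full 16-bit precision.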

Software prerequisites:

  • 64-bit Linux (Ubuntu 22.04+ or Debian 12 recommended)
  • curl for the Ollama installer
  • Docker (preferred) or Python 3.11+ for Open WebUI

Step 1: Install Ollama

The official installer handles the binary, the ollama system user, and a systemd service:

curl -fsSL https://ollama.com/install.sh | sh

After it completes, Ollama starts automatically and listens on http://localhost:11434.

Verify the install:

ollama --version
# ollama version is 0.24.0

curl http://localhost:11434
# Ollama is running

NVIDIA GPU: The installer auto-detects CUDA. Nothing extra is required as long as the NVIDIA driver is already installed (nvidia-smi should list your card).

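AMD GPU: The installer attempts ROCm. If GPU layers don't engage on a card ROCm doesn't officially support, override the reported architecture with the closest supported one. For example, 10.3.0 (gfx1030) is the value commonly used for RDNA2-generation cards:

export HSA_OVERRIDE_GFX_VERSION=10.3.0

An export in your shell only affects an ollama serve that you start by hand from that shell. To make it apply to the systemd service, add it as an Environment= line in the service override (see the model storage section below for how to edit the service environment).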

CPU-only: Nothing extra — Ollama falls back to CPU automatically.

Change the model storage location

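With the systemd service created by the installer, models are stored under the ollama user's home directory, /usr/share/ollama/.ollama/models. (~/.ollama/models applies only when you run ollama serve as your own user.) To use a different drive: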

sudo systemctl edit ollama

In the override file that opens:

[Service]
Environment="OLLAMA_MODELS=/mnt/storage/ollama"

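Save, make sure the ollama user owns the new directory (sudo chown -R ollama:ollama /mnt/storage/ollama), then reload: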

sudo systemctl daemon-reload && sudo systemctl restart ollama
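If you would rather script this than edit interactively, the same override can be written as a drop-in file. This is a minimal sketch, assuming the standard drop-in location `/etc/systemd/system/ollama.service.d/override.conf` (the file `systemctl edit` creates) and a hypothetical helper named `render_override`. The storage path is the example one from above.

```shell
# Print a systemd drop-in that sets environment variables for the Ollama service.
# Each argument is one KEY=value pair. render_override is a hypothetical helper.
render_override() {
  echo "[Service]"
  for pair in "$@"; do
    printf 'Environment="%s"\n' "$pair"
  done
}

# Preview the file contents on stdout.
render_override OLLAMA_MODELS=/mnt/storage/ollama

# To install it (requires root), something like:
#   sudo mkdir -p /etc/systemd/system/ollama.service.d
#   render_override OLLAMA_MODELS=/mnt/storage/ollama |
#     sudo tee /etc/systemd/system/ollama.service.d/override.conf
#   sudo systemctl daemon-reload && sudo systemctl restart ollama
```

Passing several pairs, for example adding `HSA_OVERRIDE_GFX_VERSION=10.3.0` as a second argument, keeps all service environment settings in one file.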

Service management

sudo systemctl status ollama       # check it's running
sudo systemctl restart ollama      # restart
sudo journalctl -u ollama -f       # live logs

Step 2: Pull a model and verify inference

Before touching Open WebUI, confirm Ollama itself works end-to-end.

ollama pull llama3.2:3b    # 2.0 GB — fast to download for testing
ollama run llama3.2:3b

Type a prompt and press Enter. If you get a coherent response, the inference stack is good. Exit with /bye.
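The same check can be made against the REST API, which is the path Open WebUI will use. The sketch below builds a request body for Ollama's `/api/generate` endpoint. `generate_payload` and `ask_ollama` are hypothetical helper names, and because the JSON is assembled with `printf`, it assumes the model name and prompt contain no double quotes or backslashes.

```shell
# Build a non-streaming /api/generate request body.
# Assumes model and prompt need no JSON escaping.
generate_payload() {
  printf '{"model":"%s","prompt":"%s","stream":false}' "$1" "$2"
}

# Send a prompt to the local Ollama server and print the raw JSON reply.
ask_ollama() {
  curl -s http://localhost:11434/api/generate -d "$(generate_payload "$1" "$2")"
}

# Show the body that would be sent.
generate_payload llama3.2:3b "Why is the sky blue?"
echo
```

Running `ask_ollama llama3.2:3b "Why is the sky blue?"` should return a JSON object whose `response` field holds the generated text. If that works, any connection problem later on is on the Open WebUI side.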

For 7B models with 16 GB RAM or a GPU:

ollama pull gemma:7b        # 5.0 GB (gemma3 is published in other sizes, e.g. 4b and 12b, not 7b)
ollama pull mistral:7b      # 4.1 GB

Check what's downloaded:

ollama list

Step 3: Install Open WebUI

Two paths — Docker is simpler to maintain; pip avoids the Docker dependency.

Option A: Docker (recommended)

Install Docker if you don't have it:

curl -fsSL https://get.docker.com | sh
sudo usermod -aG docker $USER
# Log out and back in for the group change

Run Open WebUI with Ollama as the backend. On Linux, host.docker.internal doesn't always resolve — use your LAN IP:

# Find your LAN IP
ip addr show | grep 'inet ' | grep -v 127.0.0.1
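On a machine with Docker installed, that listing also includes bridge addresses such as 172.17.0.1, which are not what you want here. As a sketch, the hypothetical filter below reads the one-line-per-address output of `ip -4 -o addr show` and prints the first address that is not on loopback or a Docker-created interface. The interface name prefixes it skips (`docker`, `br-`, `veth`) are assumptions about a typical Docker host.

```shell
# Read `ip -4 -o addr show` output on stdin and print the first IPv4 address
# that is not on loopback or a Docker-created interface.
# first_lan_ip is a hypothetical helper; field 2 is the interface name and
# field 4 is the address in CIDR form.
first_lan_ip() {
  awk '$2 != "lo" && $2 !~ /^(docker|br-|veth)/ { split($4, a, "/"); print a[1]; exit }'
}

# Usage:
#   ip -4 -o addr show | first_lan_ip
```

If the machine has several physical interfaces, check that the address printed is the one other devices on your network actually reach.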

Then start the container:

docker run -d \
  --name open-webui \
  --restart always \
  -p 3000:8080 \
  -v open-webui:/app/backend/data \
  -e OLLAMA_BASE_URL=http://192.168.1.X:11434 \
  ghcr.io/open-webui/open-webui:main

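Replace 192.168.1.X with your actual IP. Ollama binds to 127.0.0.1 by default, so the container can only reach it at the LAN IP once the service listens on all interfaces: add Environment="OLLAMA_HOST=0.0.0.0" to the systemd override (sudo systemctl edit ollama) and restart Ollama. Note that this also exposes the unauthenticated Ollama API to the rest of your network.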

Simpler alternative — if Ollama and Open WebUI are on the same host, use host networking and skip the IP lookup:

docker run -d \
  --name open-webui \
  --restart always \
  --network host \
  -v open-webui:/app/backend/data \
  -e OLLAMA_BASE_URL=http://localhost:11434 \
  ghcr.io/open-webui/open-webui:main

With --network host, Open WebUI runs on port 8080, so open http://localhost:8080.

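NVIDIA GPU passthrough (requires the NVIDIA Container Toolkit on the host). This gives the Open WebUI container GPU access for its own local features; Ollama runs on the host and uses the GPU either way: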

docker run -d \
  --name open-webui \
  --gpus all \
  --network host \
  -v open-webui:/app/backend/data \
  -e OLLAMA_BASE_URL=http://localhost:11434 \
  ghcr.io/open-webui/open-webui:cuda

Option B: pip

pip install open-webui
open-webui serve

First startup takes a minute — it installs frontend dependencies on initial run. Open WebUI starts on port 8080.
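On recent Debian and Ubuntu releases, a bare `pip install` outside a virtual environment is refused with an "externally-managed-environment" error, and Ubuntu 22.04's system Python is 3.10, below the 3.11 this guide lists as a prerequisite. The sketch below checks the interpreter version first and installs into a virtual environment. The helper names and the default venv path (`$HOME/open-webui-venv`) are assumptions for illustration.

```shell
# Succeed when a "major.minor" Python version string is 3.11 or newer.
python_new_enough() {
  major=${1%%.*}
  rest=${1#*.}
  minor=${rest%%.*}
  [ "$major" -gt 3 ] || { [ "$major" -eq 3 ] && [ "$minor" -ge 11 ]; }
}

# Create a virtual environment and install Open WebUI into it.
# The default venv path is an assumption; any writable directory works.
install_open_webui() {
  venv_dir=${1:-"$HOME/open-webui-venv"}
  ver=$(python3 -c 'import sys; print("%d.%d" % sys.version_info[:2])')
  if ! python_new_enough "$ver"; then
    echo "Python $ver found; this guide assumes 3.11 or newer" >&2
    return 1
  fi
  python3 -m venv "$venv_dir" &&
    "$venv_dir/bin/pip" install open-webui &&
    echo "Start it with: $venv_dir/bin/open-webui serve"
}

# Usage:
#   install_open_webui            # uses the default path
#   install_open_webui /opt/owui  # or pick your own
```

Keeping the install in its own environment also makes upgrades and removal a matter of touching one directory.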


Step 4: First login

Navigate to http://localhost:3000 (Docker with -p 3000:8080) or http://localhost:8080 (pip or --network host).

  1. The signup screen appears — create an admin account. This is local-only, no email verification.
  2. Open WebUI auto-detects your Ollama models. Click the model selector at the top of the chat interface.
  3. Select a model and start chatting.

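If no models appear, go to Settings → Connections and confirm the Ollama URL matches the OLLAMA_BASE_URL you started with (http://localhost:11434 for pip or --network host, your LAN IP for the bridged Docker setup). You can also pull models directly through the UI at Settings → Models: type a model name and click Pull.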
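To tell a connection problem from a genuinely empty model list, it can help to ask Ollama directly which models it has. Its `/api/tags` endpoint returns the installed models as JSON. The hypothetical filter below pulls out just the names using `grep` and `cut` rather than a JSON parser, so it assumes model names contain no escaped quotes.

```shell
# Extract model names from an /api/tags JSON response read on stdin.
# model_names is a hypothetical helper; it matches every "name" key in the input.
model_names() {
  grep -o '"name": *"[^"]*"' | cut -d'"' -f4
}

# Usage (from the host running Ollama):
#   curl -s http://localhost:11434/api/tags | model_names
```

If this prints your models but the Open WebUI selector stays empty, the problem is the URL or network path between Open WebUI and Ollama, not the models themselves.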


Step 5: Configuration worth knowing

System prompts per model:

Settings → Models → select model → System Prompt. Set a persistent persona, coding language preference, or output format rule that applies to every conversation with that model.

Multi-user access:

The first account becomes admin. Subsequent signups are standard users. Open WebUI has full user management — conversations are scoped per user, admin can see usage stats. Good for a shared home server.

Built-in RAG:

Open WebUI includes document upload with RAG. For simple use — upload a PDF, ask questions about it — it works. For a large corpus or persistent document collections, AnythingLLM handles that workflow better.

LAN access from other devices:

Open WebUI binds to 0.0.0.0 by default, so with --network host it is already listening on your LAN address. Open the firewall port:

sudo ufw allow 3000/tcp    # or 8080 depending on your setup

Then access from any device on the same network at http://192.168.1.X:3000 (or :8080, matching the port you opened).


A note on Open WebUI's license

Ollama is MIT-licensed — no restrictions.

Open WebUI has changed licenses several times. As of mid-2026, it uses a custom BSD-3-clause variant with a CLA requirement. For personal and small-team use (under 50 users) it's free to self-host and rebrand. Larger or commercial deployments should read the official license page before deploying. It's not OSI-certified open source, which matters if that distinction is important to your context.


When NOT to use this setup

You need concurrent users or high throughput. Ollama serves only a limited number of requests in parallel per loaded model (the OLLAMA_NUM_PARALLEL setting controls this) and queues the rest. That's fine for a household or a few developers sharing a machine, but falls apart under real load. For multi-user production serving, vLLM is the right tool. If you don't want to manage inference hardware at all, cloud GPU on RunPod runs an A40 for around $0.20/hr, which is cheaper than buying a GPU if inference is occasional.
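One way to see how your own machine behaves under concurrent load is to time a single request against several sent at once. The sketch below is a hypothetical helper, `parallel_probe`, that fires N identical prompts at `/api/generate` in the background and reports the elapsed wall-clock time. The prompt text and the one-second timer resolution are arbitrary choices for illustration.

```shell
# Send N identical prompts at once and report elapsed wall-clock seconds.
# parallel_probe is a hypothetical helper; arguments are count, model, and
# an optional base URL (defaults to the local Ollama server).
parallel_probe() {
  n=$1
  model=$2
  url=${3:-http://localhost:11434}
  body=$(printf '{"model":"%s","prompt":"Say hi.","stream":false}' "$model")
  start=$(date +%s)
  i=0
  while [ "$i" -lt "$n" ]; do
    curl -s -o /dev/null "$url/api/generate" -d "$body" &
    i=$((i + 1))
  done
  wait    # block until every background request has returned
  end=$(date +%s)
  echo "$n requests finished in $((end - start))s"
}

# Usage:
#   parallel_probe 1 llama3.2:3b
#   parallel_probe 4 llama3.2:3b
```

If four requests take roughly four times as long as one, they are being served one after another. A much smaller ratio suggests they are running in parallel.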

You're on Windows or macOS and want a desktop app. LM Studio installs in two clicks, has a native UI, and doesn't require Docker or a browser tab. Open WebUI's browser-based approach is a feature on a Linux server; on a personal laptop it's an extra step.

**You
