Jovan Chan

Posted on Jun 2 • Originally published at aifoss.dev

ollama-open-webui-linux-setup

#opensource #ai #selfhosted #linux


---
title: 'Ollama + Open WebUI on Linux: 15-Minute Setup Guide'
description: 'Step-by-step guide to installing Ollama and Open WebUI on Linux. Covers Docker and pip install, GPU setup, model management, and first-run configuration.'
pubDate: 'May 21 2026'

tags: ["ollama", "ai", "selfhosted", "llm", "opensource"]
---

By the end of this guide you'll have a local LLM running under a full browser-based chat interface — no internet required after setup, no API keys, nothing leaving your machine.

The stack: Ollama handles model downloading, management, and inference via a local REST API. Open WebUI sits on top as the chat frontend. Together they replicate the core ChatGPT experience on your own hardware.

Versions used: Ollama v0.24.0, Open WebUI v0.9.5 (May 2026 — check their GitHub pages for newer releases before starting).

What you need

Hardware:

Setup	Minimum RAM	GPU	What you can run
CPU-only	8 GB	Not required	3B models at 3–6 tok/s
CPU-only	16 GB	Not required	7B models at 3–5 tok/s
GPU (8 GB VRAM)	16 GB	CUDA / ROCm / Vulkan	7B–13B at 20–40 tok/s
GPU (16 GB VRAM)	32 GB	CUDA / ROCm	30B–34B at 15–25 tok/s

CPU inference is functional for experiments. For daily interactive use, a GPU makes the difference.
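
To see which CPU-only row of the table applies to a machine, the check can be sketched in shell. The function names are made up for this example, and the thresholds simply mirror the table, so treat the result as rough guidance rather than a hard limit.

```shell
#!/bin/sh
# total_ram_gb [FILE]: read MemTotal (reported in kB) and round up to whole GB.
# Rounding up compensates for the memory the kernel reserves, so a "16 GB"
# machine that reports about 15.5 GB still counts as 16.
total_ram_gb() {
  awk '/^MemTotal:/ {
    gb = $2 / 1048576
    printf "%d\n", (gb == int(gb)) ? gb : int(gb) + 1
  }' "${1:-/proc/meminfo}"
}

# suggest_cpu_tier GB: map RAM to the CPU-only rows of the table above.
suggest_cpu_tier() {
  if [ "$1" -ge 16 ]; then
    echo "7B models"
  elif [ "$1" -ge 8 ]; then
    echo "3B models"
  else
    echo "below the 8 GB minimum"
  fi
}
```

Usage: suggest_cpu_tier "$(total_ram_gb)" prints the tier for the current machine.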

Software prerequisites:

64-bit Linux (Ubuntu 22.04+ or Debian 12 recommended)
curl for the Ollama installer
Docker (preferred) or Python 3.11+ for Open WebUI
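
A quick preflight for the prerequisites above might look like the following sketch. The function names and messages are illustrative, and version_ge relies on GNU sort's -V option, which is assumed to be available (it ships with coreutils on the distributions listed).

```shell
#!/bin/sh
# version_ge A B: succeed when version A is greater than or equal to B.
version_ge() {
  [ "$(printf '%s\n%s\n' "$1" "$2" | sort -V | head -n 1)" = "$2" ]
}

# preflight: report which Open WebUI install path looks viable here.
preflight() {
  command -v curl >/dev/null 2>&1 ||
    echo "missing: curl (needed for the Ollama installer)"
  if command -v docker >/dev/null 2>&1; then
    echo "ok: docker found"
  elif command -v python3 >/dev/null 2>&1; then
    py=$(python3 -c 'import sys; print("%d.%d" % sys.version_info[:2])')
    if version_ge "$py" 3.11; then
      echo "ok: python $py found"
    else
      echo "python $py is older than 3.11"
    fi
  else
    echo "missing: docker or python3"
  fi
}
```

Run preflight before Step 1. It only reads the environment and changes nothing.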

Step 1: Install Ollama

The official installer handles the binary, the ollama system user, and a systemd service:

curl -fsSL https://ollama.com/install.sh | sh

After it completes, Ollama starts automatically and listens on http://localhost:11434.

Verify the install:

ollama --version
# ollama version is 0.24.0

curl http://localhost:11434
# Ollama is running
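
In a provisioning script, the same check can be turned into a small wait loop. This is a sketch: the function name, the default of 10 attempts, and the one-second delay are arbitrary choices, not something Ollama prescribes.

```shell
#!/bin/sh
# wait_for_ollama [URL] [ATTEMPTS] [DELAY_SECONDS]
# Poll the root endpoint until it returns the "Ollama is running" banner.
wait_for_ollama() {
  url=${1:-http://localhost:11434}
  attempts=${2:-10}
  delay=${3:-1}
  i=1
  while [ "$i" -le "$attempts" ]; do
    # -f: treat HTTP errors as failure, -s: quiet, --max-time: never hang
    if [ "$(curl -fs --max-time 2 "$url")" = "Ollama is running" ]; then
      return 0
    fi
    i=$((i + 1))
    sleep "$delay"
  done
  return 1
}
```

Usage: wait_for_ollama || echo "Ollama did not come up; check journalctl -u ollama".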

NVIDIA GPU: The installer auto-detects CUDA-capable cards. Nothing extra is required as long as the NVIDIA driver is already installed and nvidia-smi works.

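AMD GPU: The installer attempts ROCm. If GPU layers don't engage on a card that ROCm doesn't officially support, override the reported architecture. The right value depends on your GPU; 10.3.0 is the usual choice for RDNA2-class cards:

export HSA_OVERRIDE_GFX_VERSION=10.3.0

An export only affects an ollama serve started from that same shell; the systemd service never sees it. To apply the override to the service, add it to the Ollama service environment (see the model storage section below for how to edit service env).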

CPU-only: Nothing extra — Ollama falls back to CPU automatically.

Change the model storage location

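With the systemd service created by the installer, models are stored under the ollama user's home at /usr/share/ollama/.ollama/models (~/.ollama/models applies only when you run ollama serve as your own user). For a different drive: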

sudo systemctl edit ollama

In the override file that opens:

[Service]
Environment="OLLAMA_MODELS=/mnt/storage/ollama"

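Save, make sure the ollama user can read and write the new directory (for example with sudo chown -R ollama:ollama /mnt/storage/ollama), then reload: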

sudo systemctl daemon-reload && sudo systemctl restart ollama
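
For scripted setups where an interactive editor is inconvenient, the override file can be generated instead. The helper below is a hypothetical sketch that only prints the drop-in text; writing it to disk is left to the caller.

```shell
#!/bin/sh
# render_override KEY=VALUE...
# Print a systemd drop-in that sets each pair as a service environment variable.
render_override() {
  echo "[Service]"
  for pair in "$@"; do
    printf 'Environment="%s"\n' "$pair"
  done
}
```

Usage: sudo mkdir -p /etc/systemd/system/ollama.service.d, then render_override OLLAMA_MODELS=/mnt/storage/ollama | sudo tee /etc/systemd/system/ollama.service.d/override.conf, followed by the reload and restart shown above. That path is the file systemctl edit normally creates; values containing double quotes would need extra escaping that this sketch does not do.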

Service management

sudo systemctl status ollama       # check it's running
sudo systemctl restart ollama      # restart
sudo journalctl -u ollama -f       # live logs

Step 2: Pull a model and verify inference

Before touching Open WebUI, confirm Ollama itself works end-to-end.

ollama pull llama3.2:3b    # 2.0 GB — fast to download for testing
ollama run llama3.2:3b

Type a prompt and press Enter. If you get a coherent response, the inference stack is good. Exit with /bye.
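
Since Open WebUI talks to Ollama over the REST API rather than the CLI, it can be worth exercising that path too. The sketch below posts to Ollama's /api/generate endpoint with streaming disabled; the helper names are invented for this example, and the escaping only covers backslashes and double quotes, so keep test prompts to a single line.

```shell
#!/bin/sh
# json_escape STRING: escape backslashes and double quotes for use inside
# a JSON string. Newlines and control characters are not handled.
json_escape() {
  printf '%s' "$1" | sed 's/\\/\\\\/g; s/"/\\"/g'
}

# build_generate_payload MODEL PROMPT: request body for POST /api/generate.
# "stream": false asks for one JSON object instead of a token stream.
build_generate_payload() {
  printf '{"model":"%s","prompt":"%s","stream":false}' \
    "$(json_escape "$1")" "$(json_escape "$2")"
}

# ollama_generate MODEL PROMPT: send the request and print the raw JSON reply.
ollama_generate() {
  curl -fsS http://localhost:11434/api/generate \
    -d "$(build_generate_payload "$1" "$2")"
}
```

Usage: ollama_generate llama3.2:3b "Why is the sky blue?" prints a JSON object whose response field holds the generated text.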

For 7B models with 16 GB RAM or a GPU:

ollama pull gemma:7b        # 5.0 GB (Gemma 3 has no 7B tag; its sizes are 1b, 4b, 12b, 27b)
ollama pull mistral:7b      # 4.1 GB

Check what's downloaded:

ollama list
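
The list output is also handy for making pulls idempotent in a setup script. A minimal sketch, assuming the first column of ollama list holds the full name:tag (the function names are illustrative):

```shell
#!/bin/sh
# has_model NAME: succeed if NAME matches the first column of `ollama list`
# exactly. The first output line is a header, so it is skipped.
has_model() {
  ollama list | awk 'NR > 1 { print $1 }' | grep -Fxq -- "$1"
}

# ensure_model NAME: pull only when the model is not already on disk.
ensure_model() {
  if has_model "$1"; then
    echo "$1 already present"
  else
    ollama pull "$1"
  fi
}
```

Usage: ensure_model llama3.2:3b. Pass the tag explicitly, since a model pulled without one is listed as name:latest.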

Step 3: Install Open WebUI

Two paths — Docker is simpler to maintain; pip avoids the Docker dependency.

Option A: Docker (recommended)

Install Docker if you don't have it:

curl -fsSL https://get.docker.com | sh
sudo usermod -aG docker $USER
# Log out and back in for the group change

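Run Open WebUI with Ollama as the backend. On Linux, host.docker.internal doesn't always resolve, so use your LAN IP. Ollama listens only on 127.0.0.1 by default, so a container on Docker's bridge network can't reach it until you add Environment="OLLAMA_HOST=0.0.0.0" to the service override (the same sudo systemctl edit ollama procedure as for model storage) and restart Ollama. That exposes the unauthenticated Ollama API to your network, so keep port 11434 closed to anything you don't trust.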

# Find your LAN IP
ip addr show | grep 'inet ' | grep -v 127.0.0.1

Then start the container:

docker run -d \
  --name open-webui \
  --restart always \
  -p 3000:8080 \
  -v open-webui:/app/backend/data \
  -e OLLAMA_BASE_URL=http://192.168.1.X:11434 \
  ghcr.io/open-webui/open-webui:main

Replace 192.168.1.X with your actual IP.
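
On hosts with several interfaces, the grep above can return more than one address. One way to pick the address of the default route is to ask the kernel which source IP it would use, sketched below. The function names are made up for this example, and 1.1.1.1 serves only as a routing target; no packet is sent to it.

```shell
#!/bin/sh
# parse_src_ip: read `ip route get` output on stdin and print the address
# that follows the "src" keyword.
parse_src_ip() {
  awk '{ for (i = 1; i < NF; i++) if ($i == "src") { print $(i + 1); exit } }'
}

# ollama_base_url: compose a value for OLLAMA_BASE_URL from that address.
ollama_base_url() {
  printf 'http://%s:11434\n' "$(ip route get 1.1.1.1 | parse_src_ip)"
}
```

Usage: pass -e OLLAMA_BASE_URL="$(ollama_base_url)" to docker run in place of the hand-edited IP.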

Simpler alternative — if Ollama and Open WebUI are on the same host, use host networking and skip the IP lookup:

docker run -d \
  --name open-webui \
  --restart always \
  --network host \
  -v open-webui:/app/backend/data \
  -e OLLAMA_BASE_URL=http://localhost:11434 \
  ghcr.io/open-webui/open-webui:main

With --network host, Open WebUI runs on port 8080, so open http://localhost:8080.

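NVIDIA GPU passthrough (needs the NVIDIA Container Toolkit on the host; the cuda image accelerates Open WebUI's own local features such as embeddings, while Ollama on the host uses the GPU independently of this flag):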

docker run -d \
  --name open-webui \
  --gpus all \
  --network host \
  -v open-webui:/app/backend/data \
  -e OLLAMA_BASE_URL=http://localhost:11434 \
  ghcr.io/open-webui/open-webui:cuda

Option B: pip

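python3 -m venv ~/.venvs/open-webui     # isolated environment; needs Python 3.11+
. ~/.venvs/open-webui/bin/activate      # avoids the "externally managed" pip error on Debian 12
pip install open-webui
open-webui serve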

First startup takes a minute — it installs frontend dependencies on initial run. Open WebUI starts on port 8080.

Step 4: First login

Navigate to http://localhost:3000 (Docker with -p 3000:8080) or http://localhost:8080 (pip or --network host).

The signup screen appears — create an admin account. This is local-only, no email verification.
Open WebUI auto-detects your Ollama models. Click the model selector at the top of the chat interface.
Select a model and start chatting.

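If no models appear, go to Settings → Connections and confirm the Ollama URL matches how you started Open WebUI: http://localhost:11434 for pip or --network host, or the LAN address you passed as OLLAMA_BASE_URL for the -p 3000:8080 container (inside that container, localhost is the container itself). You can also pull models directly through the UI at Settings → Models: type a model name and click Pull.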
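
When the selector stays empty, it helps to confirm what Ollama itself reports over HTTP. The sketch below queries Ollama's /api/tags endpoint and extracts the model names with grep and sed to avoid a jq dependency. The function names are invented for this example, and the parsing assumes model names contain no escaped quotes.

```shell
#!/bin/sh
# extract_model_names: read an /api/tags JSON reply on stdin and print one
# model name per line.
extract_model_names() {
  grep -oE '"name": ?"[^"]*"' | sed -E 's/^"name": ?"//; s/"$//'
}

# list_ollama_models [BASE_URL]: fetch the model list from a running Ollama.
list_ollama_models() {
  curl -fsS "${1:-http://localhost:11434}/api/tags" | extract_model_names
}
```

Usage: list_ollama_models http://192.168.1.X:11434, using the same base URL you gave Open WebUI. If this fails from the host, the problem lies with Ollama or its bind address rather than with Open WebUI.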

Step 5: Configuration worth knowing

System prompts per model:

Settings → Models → select model → System Prompt. Set a persistent persona, coding language preference, or output format rule that applies to every conversation with that model.

Multi-user access:

The first account becomes admin. Subsequent signups are standard users. Open WebUI has full user management: conversations are scoped per user, and the admin can see usage stats. Good for a shared home server.

Built-in RAG:

Open WebUI includes document upload with RAG. For simple use — upload a PDF, ask questions about it — it works. For a large corpus or persistent document collections, AnythingLLM handles that workflow better.

LAN access from other devices:

Open WebUI binds to 0.0.0.0 by default when using --network host, and -p 3000:8080 publishes the port on all interfaces as well. Open your firewall port:

sudo ufw allow 3000/tcp    # or 8080 depending on your setup

Then access from any device on the same network at http://192.168.1.X:3000 (or :8080, matching the port you opened).
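
If you would rather not open the port to every source address, ufw can limit the rule to your LAN subnet. The helpers below are a sketch with made-up names. They only print the command so you can review it first, and 192.168.1.0/24 is an assumed subnet that you should replace with your own.

```shell
#!/bin/sh
# webui_port MODE: the host port Open WebUI ends up on for each setup above.
webui_port() {
  case "$1" in
    bridge) echo 3000 ;;       # docker run -p 3000:8080
    host|pip) echo 8080 ;;     # --network host, or open-webui serve
    *) echo "unknown mode: $1" >&2; return 1 ;;
  esac
}

# lan_allow_rule MODE SUBNET: print a ufw command that opens the port to one
# subnet only.
lan_allow_rule() {
  port=$(webui_port "$1") || return 1
  echo "sudo ufw allow from $2 to any port $port proto tcp"
}
```

Usage: lan_allow_rule host 192.168.1.0/24 prints the rule for the host-network setup. Ports published with -p are typically handled by Docker's own iptables rules, so ufw rules mainly matter for the host-network and pip setups.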

A note on Open WebUI's license

Ollama is MIT-licensed: permissive, with no usage restrictions beyond keeping the copyright and license notice.

Open WebUI has changed licenses several times. As of mid-2026, it uses a custom BSD-3-clause variant with a CLA requirement. For personal and small-team use (under 50 users) it's free to self-host and rebrand. Larger or commercial deployments should read the official license page before deploying. It's not OSI-certified open source, which matters if that distinction is important to your context.

When NOT to use this setup

You need many concurrent users or high throughput. Ollama can serve a handful of parallel requests per model (the OLLAMA_NUM_PARALLEL setting controls how many), which is fine for a household or a few developers sharing a machine, but it is not built for heavy batched serving and falls apart under real load. For multi-user production serving, vLLM is the right tool. If you don't want to manage inference hardware at all, cloud GPU on RunPod runs an A40 for around $0.20/hr, which is cheaper than buying a GPU if inference is occasional.

You're on Windows or macOS and want a desktop app. LM Studio installs in two clicks, has a native UI, and doesn't require Docker or a browser tab. Open WebUI's browser-based approach is a feature on a Linux server; on a personal laptop it's an extra step.

**You
