Want a ChatGPT-like experience that runs entirely on your own GPU? No monthly fees, no data leaving your machine, and it works offline. Here's how to set it up in 15 minutes.
What You'll Build
- A full ChatGPT-style web UI running locally
- Your choice of open-source LLM (Qwen3 14B or Llama 3.1 8B)
- Multiple user accounts for your LAN
- 100% private - nothing leaves your network
Prerequisites
- A GPU with 12GB+ VRAM (RTX 3060 12GB works great)
- Docker + Docker Compose installed
- NVIDIA Container Toolkit for GPU passthrough (Linux), or Docker Desktop with the WSL2 backend (Windows)
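If you're wondering why 12GB is the cutoff, here's a rough back-of-envelope sketch. It is not an official sizing formula. The 4.5 bits per weight for a Q4 quantization and the 2 GB allowance for context cache and runtime overhead are both assumptions picked for illustration, and the function names are hypothetical.

```python
# Rough VRAM sanity check. All constants are illustrative assumptions.
ASSUMED_Q4_BITS_PER_WEIGHT = 4.5   # assumption: Q4 quantization averages a bit over 4 bits
ASSUMED_OVERHEAD_GB = 2.0          # assumption: context cache + runtime overhead


def estimate_weight_gb(params_billion: float, bits_per_weight: float) -> float:
    """Approximate size of the model weights alone, in GB (10^9 bytes)."""
    return params_billion * bits_per_weight / 8


def fits(params_billion: float, bits_per_weight: float, vram_gb: float,
         overhead_gb: float = ASSUMED_OVERHEAD_GB) -> bool:
    """True if weights plus the assumed overhead fit in the given VRAM."""
    return estimate_weight_gb(params_billion, bits_per_weight) + overhead_gb <= vram_gb


if __name__ == "__main__":
    for name, params, vram in [("Qwen3 14B", 14, 12), ("Qwen3 14B", 14, 8), ("Llama 3.1 8B", 8, 8)]:
        weights = estimate_weight_gb(params, ASSUMED_Q4_BITS_PER_WEIGHT)
        verdict = "fits" if fits(params, ASSUMED_Q4_BITS_PER_WEIGHT, vram) else "does not fit"
        print(f"{name} on a {vram}GB card: ~{weights:.1f} GB of weights, {verdict}")
```

Treat the result as a first guess only. Real usage depends on the exact quantization and how long your conversations get.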
Setup
Create a docker-compose.yml file:
services:
  ollama:
    image: ollama/ollama:latest
    container_name: ollama
    volumes:
      - ollama:/root/.ollama
    ports:
      - "11434:11434"
    deploy:
      resources:
        reservations:
          devices:
            - driver: nvidia
              count: all
              capabilities: [gpu]
  open-webui:
    image: ghcr.io/open-webui/open-webui:main
    container_name: open-webui
    depends_on:
      - ollama
    environment:
      - OLLAMA_BASE_URL=http://ollama:11434
    volumes:
      - open-webui:/app/backend/data
    ports:
      - "3000:8080"
    restart: unless-stopped
volumes:
  ollama:
  open-webui:
Run It
docker compose up -d
docker exec ollama ollama pull qwen3:14b
Open http://localhost:3000, create your admin account, pick qwen3:14b from the dropdown, and start chatting.
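If the model doesn't show up in the dropdown, a quick check against Ollama's HTTP API can tell you whether the pull actually finished. This is a minimal sketch that assumes the port mapping from the compose file above (11434 on localhost) and uses Ollama's /api/tags endpoint, which lists installed models. The helper names are made up for this example.

```python
import json
import urllib.request

OLLAMA_URL = "http://localhost:11434"  # host port published in docker-compose.yml


def installed_models(tags_response: dict) -> list[str]:
    """Extract model names from the JSON returned by GET /api/tags."""
    return [m["name"] for m in tags_response.get("models", [])]


def fetch_tags(base_url: str = OLLAMA_URL) -> dict:
    """Ask the Ollama server which models it has on disk."""
    with urllib.request.urlopen(f"{base_url}/api/tags", timeout=10) as resp:
        return json.load(resp)


if __name__ == "__main__":
    try:
        names = installed_models(fetch_tags())
    except OSError as exc:  # connection refused, timeout, HTTP error
        print(f"Could not reach Ollama at {OLLAMA_URL}: {exc}")
    else:
        print("Installed models:", names)
        if "qwen3:14b" not in names:
            print("qwen3:14b not found; run: docker exec ollama ollama pull qwen3:14b")
```

Run it on the Docker host with plain python3. It uses only the standard library.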
What Makes It Great
- $0/month vs $20/month for ChatGPT Plus
- Full privacy - conversations stay on your machine
- Works offline - no internet connection needed after setup
- Multi-user - share with family or your team on the same LAN (they open port 3000 on your machine's IP address)
- Model switching - swap between different models mid-conversation
Performance
On an RTX 3060 12GB with Qwen3 14B (Q4), expect roughly 20-25 tok/s, which is smooth for chat. On 8GB cards, use Llama 3.1 8B instead (docker exec ollama ollama pull llama3.1:8b).
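To see what your own card manages, you can compute tok/s from the timing fields Ollama returns with a non-streaming /api/generate call: eval_count is the number of generated tokens and eval_duration is the generation time in nanoseconds. The sketch below assumes the same localhost:11434 port mapping as the compose file; the prompt and function names are just examples.

```python
import json
import urllib.request

OLLAMA_URL = "http://localhost:11434"  # host port published in docker-compose.yml


def tokens_per_second(result: dict) -> float:
    """Generated tokens divided by generation time (eval_duration is in nanoseconds)."""
    return result["eval_count"] / (result["eval_duration"] / 1e9)


def generate(model: str, prompt: str, base_url: str = OLLAMA_URL) -> dict:
    """Send one non-streaming prompt to Ollama and return the parsed JSON reply."""
    payload = json.dumps({"model": model, "prompt": prompt, "stream": False}).encode()
    req = urllib.request.Request(
        f"{base_url}/api/generate",
        data=payload,
        headers={"Content-Type": "application/json"},
    )
    with urllib.request.urlopen(req, timeout=300) as resp:
        return json.load(resp)


if __name__ == "__main__":
    try:
        result = generate("qwen3:14b", "Explain what a GPU does in two sentences.")
    except OSError as exc:  # connection refused, timeout, HTTP error
        print(f"Could not reach Ollama at {OLLAMA_URL}: {exc}")
    else:
        print(f"{tokens_per_second(result):.1f} tok/s")
```

The first request after a restart also includes model load time, so run it twice and trust the second number.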
Originally published on everylocalai.com