Run Your Own ChatGPT Offline: Open WebUI + Ollama + Local Knowledge Base

“The future of AI will be hybrid: local intelligence with cloud augmentation.” - Satya Nadella

When it comes to running AI assistants privately and offline, you have two main paths: use cloud-hosted LLM APIs like OpenAI or Anthropic, or run models locally on your own machine. Both approaches are valid, but in this article, we’ll focus on building a fully local AI assistant using Open WebUI and Ollama.

Why Local AI?

Because it gives you full privacy, offline access, zero API cost per token, and control over your models and data - especially important for sensitive documents or internal knowledge bases.

As a practical example, we’ll build a local assistant that can answer questions from your own documents (PDFs, notes, markdown files) using a local RAG (Retrieval-Augmented Generation) pipeline.

What Is Open WebUI?

Open WebUI is a self-hosted web interface for local LLMs.
It provides a ChatGPT-like experience in your browser while running models entirely on your own hardware via Ollama.

Key capabilities:

  • Chat with local LLMs
  • Upload and query documents
  • Multi-model switching
  • Local RAG knowledge base
  • User accounts & roles
  • Tool and plugin support

Think of it as:

ChatGPT UI + Local Models + Private Knowledge

Ollama vs Cloud LLM APIs

Cloud APIs (OpenAI, Claude, etc.)

  • Require internet access
  • Pay per token
  • Data leaves your environment
  • Generally higher model quality
  • No hardware requirements

Ollama Local Models

  • Run fully offline
  • No per-token cost
  • Private data stays local
  • Hardware dependent
  • Slightly lower model quality

In this article, we’ll use Ollama for fully local inference.

“Privacy is not an option, and it shouldn’t be the price we accept for just getting on the Internet.” - Gary Kovacs

Index

  1. Set Up Ollama
  2. Install Open WebUI
  3. Download a Local LLM
  4. Create a Local Knowledge Base
  5. Upload Documents
  6. Ask Questions Over Your Data
  7. How Local RAG Works
  8. Why Local AI Makes Sense
  9. Next Steps You Can Take
  10. Watch Out For
  11. FAQ
  12. Conclusion

Building Your Offline ChatGPT: Open WebUI + Ollama

1. Set Up Ollama

Install Ollama:
Mac / Linux:

curl -fsSL https://ollama.com/install.sh | sh

Windows:
Download installer from https://ollama.com
Verify installation:

ollama --version

Start Ollama service:

ollama serve

Ollama runs a local model server at:

http://localhost:11434
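Anything that talks to Ollama, including Open WebUI, goes through this HTTP API. As a minimal sketch, here is a Python client for Ollama's /api/generate endpoint (the model name and prompt are just examples; this assumes the server from the previous step is running):

```python
import json
import urllib.request

# Request body for Ollama's /api/generate endpoint.
# "stream": False asks for one complete JSON response instead of chunks.
payload = {
    "model": "llama3",
    "prompt": "Why is the sky blue?",
    "stream": False,
}

def ask_ollama(payload, url="http://localhost:11434/api/generate"):
    """Send a prompt to the local Ollama server and return the response text."""
    req = urllib.request.Request(
        url,
        data=json.dumps(payload).encode("utf-8"),
        headers={"Content-Type": "application/json"},
    )
    with urllib.request.urlopen(req) as resp:
        return json.loads(resp.read())["response"]

if __name__ == "__main__":
    print(ask_ollama(payload))
```

Because the whole stack is local, there is no API key anywhere in this request.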

2. Install Open WebUI

The easiest method is Docker.

docker run -d \
  -p 3000:8080 \
  -v open-webui:/app/backend/data \
  --name open-webui \
  ghcr.io/open-webui/open-webui:main

Open your browser:

http://localhost:3000

Create your admin account on first launch.

3. Download a Local LLM

Pull a model via Ollama:

ollama pull llama3

Other good options:

  • mistral
  • qwen2.5
  • phi3
  • codellama

Check installed models:
ollama list

Open WebUI will automatically detect Ollama models.

4. Create a Local Knowledge Base (RAG)

Open WebUI includes built-in Retrieval-Augmented Generation.
Steps:

  1. Go to Workspace → Knowledge
  2. Create new knowledge base
  3. Upload documents:
    • PDFs
    • TXT
    • Markdown
    • DOCX
    • HTML

The system automatically:

  • Splits text into chunks
  • Generates embeddings
  • Stores vectors locally
  • Links to your LLM

No external vector database is required.
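To make the chunking step concrete, here is a minimal sketch of overlapping fixed-size splitting in Python (the chunk size and overlap values are illustrative; Open WebUI's actual splitter is configurable):

```python
def chunk_text(text, chunk_size=500, overlap=50):
    """Split text into overlapping fixed-size chunks.

    Overlap keeps a sentence that straddles a boundary retrievable
    from at least one of the two neighboring chunks.
    """
    chunks = []
    start = 0
    while start < len(text):
        end = start + chunk_size
        chunks.append(text[start:end])
        if end >= len(text):
            break
        start = end - overlap
    return chunks
```

Each chunk is then embedded and stored in the local vector index, so retrieval can later match a question against document pieces rather than whole files.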

5. Upload Documents

Example:

  • Company docs
  • Personal notes
  • Research papers
  • Codebase
  • Product manuals

Once uploaded, documents become searchable context for the model.

“Data gravity will pull AI to where the data lives.” - Dave McCrory

6. Ask Questions Over Your Data

Now chat normally:
“Summarize our API documentation”
“What does the onboarding process require?”
“Explain section 4 of the PDF”
Open WebUI retrieves relevant chunks and sends them to the model.

This is local RAG in action.
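The "retrieve and send" step boils down to prompt assembly. A hedged sketch of the idea (Open WebUI's real template is configurable and differs in detail; this just shows the shape):

```python
def build_rag_prompt(question, chunks):
    """Assemble a context-augmented prompt from retrieved chunks.

    Illustrative template only: retrieved chunks are numbered and
    prepended so the model answers from them, not from memory alone.
    """
    context = "\n\n".join(f"[{i + 1}] {chunk}" for i, chunk in enumerate(chunks))
    return (
        "Answer the question using only the context below.\n\n"
        f"Context:\n{context}\n\n"
        f"Question: {question}"
    )

prompt = build_rag_prompt(
    "What does the onboarding process require?",
    ["Onboarding requires a signed NDA.", "New hires get a laptop on day one."],
)
```

The assembled prompt is what actually reaches the local model, which is why answers stay grounded in your uploaded documents.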

7. How Local RAG Works

Pipeline:
User question
→ Embed query
→ Search local vectors
→ Retrieve relevant chunks
→ Send to LLM with context
→ Generate answer
Everything runs locally.
No cloud.
No API.
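The embed → search → retrieve steps can be sketched end to end. This toy version uses word-count vectors instead of a neural embedding model (the neural model is what makes real semantic search work), but the cosine-similarity ranking is the same idea:

```python
import math
from collections import Counter

def embed(text):
    """Toy embedding: a bag-of-words term-frequency vector.

    Real pipelines use a neural embedding model; the retrieval
    math below is unchanged either way.
    """
    return Counter(text.lower().split())

def cosine(a, b):
    """Cosine similarity between two sparse count vectors."""
    dot = sum(a[t] * b[t] for t in a)
    na = math.sqrt(sum(v * v for v in a.values()))
    nb = math.sqrt(sum(v * v for v in b.values()))
    return dot / (na * nb) if na and nb else 0.0

def retrieve(query, chunks, k=2):
    """Return the k chunks most similar to the query."""
    q = embed(query)
    return sorted(chunks, key=lambda c: cosine(q, embed(c)), reverse=True)[:k]

docs = [
    "Onboarding requires a signed NDA and a laptop request.",
    "The api uses token-based authentication.",
    "Lunch is served at noon on Fridays.",
]
top = retrieve("What does onboarding require?", docs, k=1)
```

Swap the toy `embed` for a real embedding model and `docs` for your chunked documents, and this is the retrieval half of the pipeline above.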

8. Why Local AI Makes Sense

  • Full privacy - data never leaves your machine
  • Zero API costs after setup
  • Offline availability
  • Custom models & prompts
  • Internal knowledge assistants
  • Regulatory compliance friendly

Perfect for:

  • Companies
  • Developers
  • Researchers
  • Students
  • Privacy-focused users

“Open models accelerate innovation by removing access barriers.” - Yann LeCun

9. Next Steps You Can Take

  • Connect multiple models (coding + chat)
  • Use larger models with GPU
  • Share internal AI assistant in LAN
  • Index full code repositories
  • Build private ChatGPT for your team
  • Add tools and function calling

10. Watch Out For

  • Hardware limits: large models need RAM/GPU
  • Embedding size: big docs consume storage
  • Model quality: smaller local models less capable
  • Chunking errors: bad splits reduce accuracy
  • Context limits: local models have smaller windows

11. FAQ

  1. Do I need a GPU to run Open WebUI + Ollama?
    No. Small models run on CPU. GPU improves speed and allows larger models.

  2. Is everything really offline?
    Yes. Models, embeddings, and documents stay local unless you enable external APIs.

  3. What models work best locally?
    Mistral, Llama 3, Qwen, and Phi are strong general-purpose local models.

  4. Can I use my own documents?
    Yes. Upload PDFs, text, markdown, or code to the knowledge base.

  5. Is this secure for company data?
    Yes. Nothing leaves your infrastructure if hosted locally.

  6. How big can my knowledge base be?
    Limited by disk space and embeddings storage. Many GB is fine.

  7. Can multiple users access it?
    Yes. Open WebUI supports accounts and roles.

12. Conclusion

Running your own offline ChatGPT-style assistant is now practical with Open WebUI and Ollama. You get privacy, control, and zero per-token cost while still enabling powerful AI search over your own knowledge.

For individuals, it’s a personal AI brain.
For teams, it’s a private knowledge assistant.
For companies, it’s compliant AI infrastructure.

Local AI isn’t replacing cloud models - it’s becoming the private layer that sits beside them.

About the Author: Ankit is a full-stack developer at AddWebSolution and AI enthusiast who crafts intelligent web solutions with PHP, Laravel, and modern frontend tools.
