Running large language models locally has never been easier thanks to Ollama. It allows you to download, run, and manage LLMs on your own machine with minimal configuration. In this guide, you’ll learn how to install Ollama on both Linux and Windows, configure it properly, and run your first model.
🚀 What is Ollama?
Ollama is a lightweight runtime that lets you run open-source LLMs locally (like Llama, Mistral, Gemma, and others). It handles model downloading, optimization, and inference through a simple CLI and API.
Key benefits:
Runs models locally (privacy-first)
Simple CLI interface
Supports multiple LLMs
Works on Linux, Windows (WSL2), and macOS
Built-in model management
🐧 How to Install Ollama on Linux
1. System Requirements
Before installing, make sure you have:
Linux distro (Ubuntu recommended)
64-bit system
At least 8GB RAM (16GB+ recommended for larger models)
GPU optional (NVIDIA improves performance)
2. Install via official script
Open terminal and run:
curl -fsSL https://ollama.com/install.sh | sh
This script will:
Download Ollama
Install binaries
Set up system service (if supported)
3. Verify installation
ollama --version
If installed correctly, you should see version output.
4. Run your first model
ollama run llama3
This will:
Download the model (first run only)
Start interactive chat session
5. Useful Linux commands
ollama list # show installed models
ollama pull mistral # download model
ollama rm llama3 # remove model
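The same model inventory is also available over Ollama's local HTTP API. Here is a minimal Python sketch that queries the /api/tags endpoint (the endpoint and the "models" response field come from Ollama's public API; the helper names are illustrative):

```python
import json
import urllib.request

def model_names(tags_json: dict) -> list[str]:
    # Pull the model names out of the /api/tags response payload.
    return [m["name"] for m in tags_json.get("models", [])]

def list_installed_models(host: str = "http://localhost:11434") -> list[str]:
    # GET /api/tags returns the locally installed models as JSON.
    with urllib.request.urlopen(f"{host}/api/tags") as resp:
        return model_names(json.load(resp))

# Usage (with the Ollama server running):
# print(list_installed_models())  # e.g. ['llama3:latest', 'mistral:latest']
```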
🪟 How to Install Ollama on Windows
Windows installation is slightly different: you can use either the native app or WSL2.
Option 1: Native Windows Installation (Recommended)
1. Download installer
Go to:
👉 https://ollama.com/download
Download the Windows installer (.exe).
2. Install
Run the installer
Follow setup wizard
Ollama will install system services automatically
3. Verify installation
Open PowerShell:
ollama --version
4. Run a model
ollama run llama3
Option 2: Install via WSL2 (Advanced users)
If you want Linux-like performance:
1. Enable WSL2
wsl --install
Restart your system.
2. Install Ubuntu
From Microsoft Store, install Ubuntu.
3. Install Ollama inside WSL
curl -fsSL https://ollama.com/install.sh | sh
4. Run model
ollama run mistral
⚡ Running Models with Ollama
Once installed, you can run different models:
ollama run llama3
ollama run mistral
ollama run gemma
You can also pass prompts directly:
ollama run llama3 "Explain quantum computing in simple terms"
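One-shot prompts like this make Ollama easy to script from other programs. A minimal Python sketch using subprocess (it assumes ollama is on your PATH; the function names are illustrative):

```python
import subprocess

def build_run_cmd(model: str, prompt: str) -> list[str]:
    # Assemble the one-shot CLI invocation shown above.
    return ["ollama", "run", model, prompt]

def ask(model: str, prompt: str) -> str:
    # Run the model once and capture its reply from stdout.
    result = subprocess.run(
        build_run_cmd(model, prompt),
        capture_output=True, text=True, check=True,
    )
    return result.stdout.strip()

# Usage (with Ollama installed and the model pulled):
# print(ask("llama3", "Explain quantum computing in simple terms"))
```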
🔌 Using Ollama API (Local AI Server)
Ollama runs a local API server automatically.
Example request:
curl http://localhost:11434/api/generate -d '{
"model": "llama3",
"prompt": "Write a blog about AI"
}'
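By default, /api/generate streams its answer as one JSON object per line, each carrying a fragment of the reply in a "response" field. A minimal Python client sketch (endpoint and field names per Ollama's API; error handling omitted):

```python
import json
import urllib.request

def join_stream(lines) -> str:
    # Concatenate the "response" fragments from streamed JSON lines.
    return "".join(json.loads(line)["response"]
                   for line in lines if line.strip())

def generate(model: str, prompt: str,
             host: str = "http://localhost:11434") -> str:
    # POST the prompt and reassemble the streamed reply.
    payload = json.dumps({"model": model, "prompt": prompt}).encode()
    req = urllib.request.Request(
        f"{host}/api/generate", data=payload,
        headers={"Content-Type": "application/json"},
    )
    with urllib.request.urlopen(req) as resp:
        return join_stream(resp.read().decode().splitlines())

# Usage (with the Ollama server running):
# print(generate("llama3", "Write a blog about AI"))
```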
This makes Ollama usable for:
apps
bots
automation
coding assistants
🧠 Performance Tips
Use smaller models (3B–8B) for CPU-only machines
Enable GPU acceleration on NVIDIA systems
Close heavy apps to free RAM
Use quantized models for better speed
🧩 Common Issues & Fixes
❌ Command not found
Restart terminal
Check PATH variables
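A quick Python check for the PATH problem (assuming the binary is named ollama):

```python
import shutil

def ollama_on_path() -> bool:
    # shutil.which searches PATH the same way your shell does.
    return shutil.which("ollama") is not None

print("ollama found:", ollama_on_path())
```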
❌ Slow performance
Use smaller model
Ensure GPU drivers are installed
❌ Model download stuck
ollama rm <model>       # remove the stalled download
ollama pull <model>     # pull it again
🔥 Final Thoughts
Ollama is one of the easiest ways to run local AI models on Linux and Windows in 2026. Whether you're a developer, researcher, or AI enthusiast, it provides a simple yet powerful way to bring LLMs directly to your machine without relying on cloud APIs.
If you're building AI tools or experimenting with local inference, Ollama is the fastest way to get started.
Source: https://inferencerig.com/setup/how-to-install-ollama-on-linux-and-windows-complete-setup-guide/