KALPESH

Running Local GGUF Models with Ollama (GPU Enabled)

1. Install & Start Ollama

curl -fsSL https://ollama.com/install.sh | sh
systemctl start ollama
ollama --version

2. Verify GPU Detection

NVIDIA

nvidia-smi

AMD

rocm-smi

3. Set Up Model Directory

mkdir -p ~/Documents/LLM
cd ~/Documents/LLM
# Copy your .gguf file here

4. Create a Modelfile

vim Modelfile

Vim quick reference:

  • i — enter insert mode (start typing)
  • Esc — exit insert mode
  • :wq — save and quit
  • :q! — quit without saving

Modelfile contents:

FROM ./Phi-4-mini-instruct-Q4_K_M.gguf

SYSTEM """
You are a helpful AI assistant.
"""

TEMPLATE """<|user|>
{{ .Prompt }}<|end|>
<|assistant|>
"""

PARAMETER stop "<|user|>"
PARAMETER stop "<|assistant|>"
PARAMETER stop "<|end|>"
PARAMETER temperature 0.7
PARAMETER num_ctx 8192

Note: Always include a TEMPLATE when importing a custom GGUF; without one, the model never sees the chat tags it was trained with and may emit raw special tokens or give incoherent answers. Use instruct/chat variants, not base models.
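The TEMPLATE must match the chat format the GGUF was trained with. The example above uses Phi's <|user|>/<|end|> tags; for a ChatML-style GGUF the Modelfile would look roughly like this (the filename is a placeholder, and the exact tokens are model-specific, so check the model card):

```
FROM ./your-chatml-model.gguf

TEMPLATE """<|im_start|>user
{{ .Prompt }}<|im_end|>
<|im_start|>assistant
"""

PARAMETER stop "<|im_start|>"
PARAMETER stop "<|im_end|>"
```

The stop sequences should always mirror the tags used in the template, so generation halts before the model starts writing the next turn itself.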


5. Create & Run the Model

ollama create mymodel -f Modelfile
ollama run mymodel

6. Verify GPU Usage

Open a second terminal and monitor VRAM — an increase confirms GPU acceleration.

# NVIDIA
watch -n 1 nvidia-smi

# AMD
watch -n 1 rocm-smi

To confirm via logs:

journalctl -u ollama -f
# Look for: "using CUDA" or "offloading layers to GPU"

7. Ollama Command Reference

Model Management

Task                     Command
Pull a model             ollama pull <model>
Create from Modelfile    ollama create <name> -f Modelfile
List installed models    ollama list
Show model details       ollama show <model>
Copy a model             ollama cp <source> <dest>
Remove a model           ollama rm <model>
Push model to registry   ollama push <model>

Running Models

Task                      Command
Run model (interactive)   ollama run <model>
Run with single prompt    ollama run <model> "your prompt"
Run with stdin input      echo "prompt" | ollama run <model>
Show running models       ollama ps
Stop a running model      ollama stop <model>

In-Chat Commands

Command                      Action
/clear                       Clear chat history
/bye                         Exit chat
/set parameter <key> <val>   Change a parameter on the fly
/show info                   Show model info
/show modelfile              Show current Modelfile
/show parameters             Show active parameters
/help                        List all in-chat commands

API (REST)

Ollama runs a local server at http://localhost:11434.

# Generate (single turn)
curl http://localhost:11434/api/generate -d '{
  "model": "mymodel",
  "prompt": "Explain Docker in simple terms",
  "stream": false
}'

# Chat (multi-turn)
curl http://localhost:11434/api/chat -d '{
  "model": "mymodel",
  "messages": [
    { "role": "user", "content": "Hello!" }
  ]
}'

# List models via API
curl http://localhost:11434/api/tags

# Check running models
curl http://localhost:11434/api/ps
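Beyond curl, the same endpoints can be scripted. A minimal Python sketch using only the standard library (mymodel is the model created in step 5; the helper names here are my own, not part of Ollama):

```python
import json
import urllib.request

OLLAMA_URL = "http://localhost:11434"


def build_generate_payload(model: str, prompt: str, stream: bool = False) -> dict:
    """Assemble the JSON body expected by POST /api/generate."""
    return {"model": model, "prompt": prompt, "stream": stream}


def generate(model: str, prompt: str) -> str:
    """Send a single-turn prompt to the local Ollama server and return the response text."""
    body = json.dumps(build_generate_payload(model, prompt)).encode("utf-8")
    req = urllib.request.Request(
        f"{OLLAMA_URL}/api/generate",
        data=body,
        headers={"Content-Type": "application/json"},
    )
    with urllib.request.urlopen(req) as resp:
        return json.loads(resp.read())["response"]
```

With "stream": false the call blocks until the full response is ready; set stream to true and read the line-delimited JSON chunks if you want tokens as they are generated.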

8. Manage Ollama Service (systemctl)

Start / Stop / Restart

# Start Ollama service
systemctl start ollama

# Stop Ollama service
systemctl stop ollama

# Restart Ollama service
systemctl restart ollama

Status & Logs

# Check service status
systemctl status ollama

# View live logs
journalctl -u ollama -f

# View last 50 log lines
journalctl -u ollama -n 50

Enable / Disable on Boot

# Enable Ollama to start on boot
systemctl enable ollama

# Disable autostart
systemctl disable ollama

# Check if enabled
systemctl is-enabled ollama
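If you need the service to listen on another address or store models elsewhere, systemd can pass environment variables to Ollama via a drop-in override. OLLAMA_HOST and OLLAMA_MODELS are documented Ollama variables; the values below are placeholders:

```
# Create a drop-in override for the service
systemctl edit ollama

# In the editor that opens, add for example:
#   [Service]
#   Environment="OLLAMA_HOST=0.0.0.0:11434"
#   Environment="OLLAMA_MODELS=/path/to/models"

# Reload and restart to apply
systemctl daemon-reload
systemctl restart ollama
```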

9. Gollama — Chat TUI for Ollama

Gollama is a terminal chat interface for Ollama with conversation history saved via SQLite.

Install Go (Fedora)

sudo dnf install golang -y
go version

Install Gollama

go install github.com/gaurav-gosain/gollama@latest

# Add Go binaries to PATH
echo 'export PATH=$PATH:~/go/bin' >> ~/.bashrc
source ~/.bashrc

Launch

gollama

Keyboard Shortcuts

Key        Action
↑ / k      Navigate up
↓ / j      Navigate down
Ctrl+N     New chat
/          Fuzzy search chats
d          Delete chat
Ctrl+C     Quit
