## 1. Install & Start Ollama

```bash
curl -fsSL https://ollama.com/install.sh | sh
systemctl start ollama
ollama --version
```
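To confirm the server is actually up, you can hit its root endpoint (Ollama listens on port 11434 by default):

```bash
# Should print "Ollama is running"
curl http://localhost:11434
```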
## 2. Verify GPU Detection

**NVIDIA**

```bash
nvidia-smi
```

**AMD**

```bash
rocm-smi
```
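Seeing the card in `nvidia-smi` / `rocm-smi` only proves the driver works. To check whether Ollama itself detected the GPU, grep the service logs (the exact wording varies between Ollama versions):

```bash
# Look for lines mentioning the GPU, CUDA, or ROCm
journalctl -u ollama | grep -iE 'gpu|cuda|rocm'
```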
## 3. Set Up Model Directory

```bash
mkdir -p ~/Documents/LLM
cd ~/Documents/LLM
# Copy your .gguf file here
```
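If you don't have a GGUF yet, quantized builds of most instruct models are published on Hugging Face. As a sketch (the repository path below is a placeholder, not a real URL), downloading one looks like:

```bash
# Placeholder URL -- browse huggingface.co for the actual GGUF repo and file
wget https://huggingface.co/<user>/Phi-4-mini-instruct-GGUF/resolve/main/Phi-4-mini-instruct-Q4_K_M.gguf
```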
## 4. Create a Modelfile

```bash
vim Modelfile
```

Vim quick reference:

- `i`: enter insert mode (start typing)
- `Esc`: exit insert mode
- `:wq`: save and quit
- `:q!`: quit without saving
```
FROM ./Phi-4-mini-instruct-Q4_K_M.gguf

SYSTEM """
You are a helpful AI assistant.
"""

# The {{ if .System }} block injects the SYSTEM prompt above;
# without it, Ollama silently drops the system message.
TEMPLATE """{{ if .System }}<|system|>
{{ .System }}<|end|>
{{ end }}<|user|>
{{ .Prompt }}<|end|>
<|assistant|>
"""

PARAMETER stop "<|user|>"
PARAMETER stop "<|assistant|>"
PARAMETER stop "<|end|>"
PARAMETER temperature 0.7
PARAMETER num_ctx 8192
```
**Note:** Always include a `TEMPLATE` for custom GGUFs. Use instruct/chat variants, not base models.
## 5. Create & Run the Model

```bash
ollama create mymodel -f Modelfile
ollama run mymodel
```
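To confirm the template and parameters were actually baked in, inspect the stored Modelfile:

```bash
# Prints the Modelfile Ollama generated for the model
ollama show mymodel --modelfile
```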
## 6. Verify GPU Usage

Open a second terminal and monitor VRAM; an increase confirms GPU acceleration.

```bash
# NVIDIA
watch -n 1 nvidia-smi

# AMD
watch -n 1 rocm-smi
```

To confirm via logs:

```bash
journalctl -u ollama -f
# Look for: "using CUDA" or "offloading layers to GPU"
```
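`ollama ps` also reports where a loaded model is running: its PROCESSOR column shows `100% GPU`, `100% CPU`, or a split between the two.

```bash
# Run this while the model is loaded (e.g. during a chat session)
ollama ps
```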
## 7. Ollama Command Reference

### Model Management

| Task | Command |
|---|---|
| Pull a model | `ollama pull <model>` |
| Create from Modelfile | `ollama create <name> -f Modelfile` |
| List installed models | `ollama list` |
| Show model details | `ollama show <model>` |
| Copy a model | `ollama cp <source> <dest>` |
| Remove a model | `ollama rm <model>` |
| Push model to registry | `ollama push <model>` |
### Running Models

| Task | Command |
|---|---|
| Run model (interactive) | `ollama run <model>` |
| Run with single prompt | `ollama run <model> "your prompt"` |
| Run with stdin input | `echo "prompt" \| ollama run <model>` |
| Show running models | `ollama ps` |
| Stop a running model | `ollama stop <model>` |
### In-Chat Commands

| Command | Action |
|---|---|
| `/clear` | Clear chat history |
| `/bye` | Exit chat |
| `/set parameter <key> <val>` | Change param on the fly |
| `/show info` | Show model info |
| `/show modelfile` | Show current Modelfile |
| `/show parameters` | Show active parameters |
| `/help` | List all in-chat commands |
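For example, to make replies more deterministic mid-session without recreating the model:

```
>>> /set parameter temperature 0.2
```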
### API (REST)

Ollama runs a local server at `http://localhost:11434`.

```bash
# Generate (single turn)
curl http://localhost:11434/api/generate -d '{
  "model": "mymodel",
  "prompt": "Explain Docker in simple terms",
  "stream": false
}'

# Chat (multi-turn)
curl http://localhost:11434/api/chat -d '{
  "model": "mymodel",
  "messages": [
    { "role": "user", "content": "Hello!" }
  ]
}'

# List models via API
curl http://localhost:11434/api/tags

# Check running models
curl http://localhost:11434/api/ps
```
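With `"stream": false` the generate endpoint returns a single JSON object, so piping it through `jq` pulls out just the reply (assuming `jq` is installed):

```bash
# Extract only the generated text from the response
curl -s http://localhost:11434/api/generate -d '{
  "model": "mymodel",
  "prompt": "Explain Docker in simple terms",
  "stream": false
}' | jq -r '.response'
```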
## 8. Manage Ollama Service (systemctl)

### Start / Stop / Restart

```bash
# Start Ollama service
systemctl start ollama

# Stop Ollama service
systemctl stop ollama

# Restart Ollama service
systemctl restart ollama
```

### Status & Logs

```bash
# Check service status
systemctl status ollama

# View live logs
journalctl -u ollama -f

# View last 50 log lines
journalctl -u ollama -n 50
```

### Enable / Disable on Boot

```bash
# Enable Ollama to start on boot
systemctl enable ollama

# Disable autostart
systemctl disable ollama

# Check if enabled
systemctl is-enabled ollama
```
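Service-level settings such as the listen address or model storage path are controlled through environment variables. Ollama's documented approach is a systemd drop-in; `OLLAMA_HOST` and `OLLAMA_MODELS` below are just example variables:

```bash
# Opens an override file; add under [Service]:
#   Environment="OLLAMA_HOST=0.0.0.0"
#   Environment="OLLAMA_MODELS=/path/to/models"
systemctl edit ollama

# Apply the change
systemctl restart ollama
```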
## 9. Gollama: Chat TUI for Ollama

Gollama is a terminal chat interface for Ollama with conversation history saved via SQLite.

### Install Go (Fedora)

```bash
sudo dnf install golang -y
go version
```

### Install Gollama

```bash
go install github.com/gaurav-gosain/gollama@latest

# Add Go binaries to PATH
echo 'export PATH=$PATH:~/go/bin' >> ~/.bashrc
source ~/.bashrc
```

### Launch

```bash
gollama
```
### Keyboard Shortcuts

| Key | Action |
|---|---|
| `↑` / `k` | Navigate up |
| `↓` / `j` | Navigate down |
| `Ctrl+N` | New chat |
| `/` | Fuzzy search chats |
| `d` | Delete chat |
| `Ctrl+C` | Quit |