## 1. Install & Start Ollama

```bash
curl -fsSL https://ollama.com/install.sh | sh
systemctl start ollama
ollama --version
```
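To confirm the server is actually up, you can hit its root endpoint (Ollama listens on port 11434 by default):

```bash
# Should print "Ollama is running"
curl http://localhost:11434
```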
## 2. Verify GPU Detection

**NVIDIA**

```bash
nvidia-smi
```

**AMD**

```bash
rocm-smi
```
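Seeing the card in `nvidia-smi` / `rocm-smi` only proves the driver works. To check whether Ollama itself detected the GPU, grep the service logs (the exact wording varies between Ollama versions):

```bash
# Look for lines mentioning the GPU, CUDA, or ROCm
journalctl -u ollama | grep -iE 'gpu|cuda|rocm'
```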
## 3. Set Up Model Directory

```bash
mkdir -p ~/Documents/LLM
cd ~/Documents/LLM
# Copy your .gguf file here
```
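If you don't have a GGUF yet, quantized builds of most instruct models are published on Hugging Face. As a sketch (the repository path below is a placeholder, not a real URL), downloading one looks like:

```bash
# Placeholder URL -- browse huggingface.co for the actual GGUF repo and file
wget https://huggingface.co/<user>/Phi-4-mini-instruct-GGUF/resolve/main/Phi-4-mini-instruct-Q4_K_M.gguf
```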
## 4. Create a Modelfile

```bash
vim Modelfile
```

Vim quick reference:

- `i`: enter insert mode (start typing)
- `Esc`: exit insert mode
- `:wq`: save and quit
- `:q!`: quit without saving
```
FROM ./Phi-4-mini-instruct-Q4_K_M.gguf

SYSTEM """
You are a helpful AI assistant.
"""

# The {{ if .System }} block injects the SYSTEM prompt above;
# without it, Ollama silently drops the system message.
TEMPLATE """{{ if .System }}<|system|>
{{ .System }}<|end|>
{{ end }}<|user|>
{{ .Prompt }}<|end|>
<|assistant|>
"""

PARAMETER stop "<|user|>"
PARAMETER stop "<|assistant|>"
PARAMETER stop "<|end|>"
PARAMETER temperature 0.7
PARAMETER num_ctx 8192
```
**Note:** Always include a `TEMPLATE` for custom GGUFs. Use instruct/chat variants, not base models.
## 5. Create & Run the Model

```bash
ollama create mymodel -f Modelfile
ollama run mymodel
```
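To confirm the template and parameters were actually baked in, inspect the stored Modelfile:

```bash
# Prints the Modelfile Ollama generated for the model
ollama show mymodel --modelfile
```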
## 6. Verify GPU Usage

Open a second terminal and monitor VRAM; an increase confirms GPU acceleration.

```bash
# NVIDIA
watch -n 1 nvidia-smi

# AMD
watch -n 1 rocm-smi
```

To confirm via logs:

```bash
journalctl -u ollama -f
# Look for: "using CUDA" or "offloading layers to GPU"
```
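`ollama ps` also reports where a loaded model is running: its PROCESSOR column shows `100% GPU`, `100% CPU`, or a split between the two.

```bash
# Run this while the model is loaded (e.g. during a chat session)
ollama ps
```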
## 7. Ollama Command Reference

### Model Management

| Task | Command |
|---|---|
| Pull a model | `ollama pull <model>` |
| Create from Modelfile | `ollama create <name> -f Modelfile` |
| List installed models | `ollama list` |
| Show model details | `ollama show <model>` |
| Copy a model | `ollama cp <source> <dest>` |
| Remove a model | `ollama rm <model>` |
| Push model to registry | `ollama push <model>` |
### Running Models

| Task | Command |
|---|---|
| Run model (interactive) | `ollama run <model>` |
| Run with single prompt | `ollama run <model> "your prompt"` |
| Run with stdin input | `echo "prompt" \| ollama run <model>` |
| Show running models | `ollama ps` |
| Stop a running model | `ollama stop <model>` |
### In-Chat Commands

| Command | Action |
|---|---|
| `/clear` | Clear chat history |
| `/bye` | Exit chat |
| `/set parameter <key> <val>` | Change param on the fly |
| `/show info` | Show model info |
| `/show modelfile` | Show current Modelfile |
| `/show parameters` | Show active parameters |
| `/help` | List all in-chat commands |
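For example, to make replies more deterministic mid-session without recreating the model:

```
>>> /set parameter temperature 0.2
```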
### API (REST)

Ollama runs a local server at `http://localhost:11434`.

```bash
# Generate (single turn)
curl http://localhost:11434/api/generate -d '{
  "model": "mymodel",
  "prompt": "Explain Docker in simple terms",
  "stream": false
}'

# Chat (multi-turn)
curl http://localhost:11434/api/chat -d '{
  "model": "mymodel",
  "messages": [
    { "role": "user", "content": "Hello!" }
  ]
}'

# List models via API
curl http://localhost:11434/api/tags

# Check running models
curl http://localhost:11434/api/ps
```
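With `"stream": false` the generate endpoint returns a single JSON object, so piping it through `jq` pulls out just the reply (assuming `jq` is installed):

```bash
# Extract only the generated text from the response
curl -s http://localhost:11434/api/generate -d '{
  "model": "mymodel",
  "prompt": "Explain Docker in simple terms",
  "stream": false
}' | jq -r '.response'
```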
## 8. Manage Ollama Service (systemctl)

### Start / Stop / Restart

```bash
# Start Ollama service
systemctl start ollama

# Stop Ollama service
systemctl stop ollama

# Restart Ollama service
systemctl restart ollama
```

### Status & Logs

```bash
# Check service status
systemctl status ollama

# View live logs
journalctl -u ollama -f

# View last 50 log lines
journalctl -u ollama -n 50
```

### Enable / Disable on Boot

```bash
# Enable Ollama to start on boot
systemctl enable ollama

# Disable autostart
systemctl disable ollama

# Check if enabled
systemctl is-enabled ollama
```
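Service-level settings such as the listen address or model storage path are controlled through environment variables. Ollama's documented approach is a systemd drop-in; `OLLAMA_HOST` and `OLLAMA_MODELS` below are just example variables:

```bash
# Opens an override file; add under [Service]:
#   Environment="OLLAMA_HOST=0.0.0.0"
#   Environment="OLLAMA_MODELS=/path/to/models"
systemctl edit ollama

# Apply the change
systemctl restart ollama
```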
## 9. Gollama: Chat TUI for Ollama

Gollama is a terminal chat interface for Ollama with conversation history saved via SQLite.

### Install Go (Fedora)

```bash
sudo dnf install golang -y
go version
```

### Install Gollama

```bash
go install github.com/gaurav-gosain/gollama@latest

# Add Go binaries to PATH
echo 'export PATH=$PATH:~/go/bin' >> ~/.bashrc
source ~/.bashrc
```

### Launch

```bash
gollama
```
### Keyboard Shortcuts

| Key | Action |
|---|---|
| `↑` / `k` | Navigate up |
| `↓` / `j` | Navigate down |
| `Ctrl+N` | New chat |
| `/` | Fuzzy search chats |
| `d` | Delete chat |
| `Ctrl+C` | Quit |