If you're writing an integration with Ollama, or just want to change how it behaves on your device, there are several environment variables you can set to configure how Ollama operates.
Source (2025-01-29): https://github.com/ollama/ollama/blob/main/envconfig/config.go
OLLAMA_HOST - the scheme and host Ollama listens on (and that clients connect to). Can be http or https. Defaults to http://127.0.0.1:11434. A client-side resolution sketch follows this group of settings.
OLLAMA_ORIGINS - comma-separated list of allowed request origins. http(s)://localhost, 127.0.0.1, 0.0.0.0, app://, file://, tauri://, and vscode-webview:// are always appended to whatever you set.
OLLAMA_MODELS - path to the models directory. Defaults to $HOME/.ollama/models.
OLLAMA_KEEP_ALIVE - how long models stay loaded in memory after a request. Negative values are treated as infinite, zero as no keep alive. Defaults to 5 minutes.
OLLAMA_LOAD_TIMEOUT - stall-detection timeout while a model is loading. Zero or negative values are treated as infinite. Defaults to 5 minutes.
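As a concrete example, here is a minimal Go sketch of how a client integration might resolve the server address the same way: use OLLAMA_HOST if it is set, otherwise fall back to the default above. ollamaBaseURL is just an illustrative helper name, not part of any Ollama SDK.

```go
package main

import (
	"fmt"
	"net/http"
	"os"
	"strings"
)

// ollamaBaseURL resolves the server address: OLLAMA_HOST if set,
// otherwise the documented default.
func ollamaBaseURL() string {
	host := os.Getenv("OLLAMA_HOST")
	if host == "" {
		return "http://127.0.0.1:11434"
	}
	// Tolerate bare host:port values by assuming plain http.
	if !strings.HasPrefix(host, "http://") && !strings.HasPrefix(host, "https://") {
		host = "http://" + host
	}
	return strings.TrimRight(host, "/")
}

func main() {
	// Hit the version endpoint to confirm the resolved address is reachable.
	resp, err := http.Get(ollamaBaseURL() + "/api/version")
	if err != nil {
		fmt.Println("cannot reach Ollama:", err)
		return
	}
	defer resp.Body.Close()
	fmt.Println("Ollama responded:", resp.Status)
}
```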
Boolean flags. These accept 1, t, T, TRUE, true, True, 0, f, F, FALSE, false, False; any other value is an error, because they are parsed with Go's strconv.ParseBool (https://pkg.go.dev/strconv#ParseBool). A parsing sketch follows the list below.
OLLAMA_DEBUG - enables additional debug output.
OLLAMA_FLASH_ATTENTION - enables the experimental flash attention feature.
OLLAMA_KV_CACHE_TYPE - quantization type for the K/V cache (takes a string value rather than a boolean).
OLLAMA_NOHISTORY - disables readline history.
OLLAMA_NOPRUNE - disables pruning of model blobs on startup.
OLLAMA_SCHED_SPREAD - allows scheduling models across all GPUs.
OLLAMA_INTEL_GPU - enables experimental Intel GPU detection.
OLLAMA_MULTIUSER_CACHE - optimizes prompt caching for multi-user scenarios.
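Here is a small Go sketch of how one of these flags can be checked with the same parsing rules; OLLAMA_DEBUG is used only as an example, and the empty-means-false fallback is an assumption rather than a guarantee of Ollama's exact behaviour.

```go
package main

import (
	"fmt"
	"os"
	"strconv"
)

func main() {
	// strconv.ParseBool accepts exactly the values listed above.
	raw := os.Getenv("OLLAMA_DEBUG")
	if raw == "" {
		fmt.Println("OLLAMA_DEBUG not set, treating as false")
		return
	}
	debug, err := strconv.ParseBool(raw)
	if err != nil {
		fmt.Printf("invalid boolean %q: %v\n", raw, err)
		return
	}
	fmt.Println("debug enabled:", debug)
}
```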
Library Settings
OLLAMA_LLM_LIBRARY - Set LLM library to bypass autodetection
CUDA_VISIBLE_DEVICES - Set which NVIDIA devices are visible
HIP_VISIBLE_DEVICES - Set which AMD devices are visible by numeric ID
ROCR_VISIBLE_DEVICES - Set which AMD devices are visible by UUID or numeric ID
GPU_DEVICE_ORDINAL - Set which AMD devices are visible by numeric ID
HSA_OVERRIDE_GFX_VERSION - Override the gfx used for all detected AMD GPUs
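These are ordinary process environment variables, so one way to apply them is when launching the server. The sketch below starts ollama serve restricted to the first NVIDIA GPU and with an AMD gfx override; the specific values are placeholders for whatever matches your hardware.

```go
package main

import (
	"log"
	"os"
	"os/exec"
)

func main() {
	cmd := exec.Command("ollama", "serve")
	// Placeholder values: pick the device IDs / gfx version for your system.
	cmd.Env = append(os.Environ(),
		"CUDA_VISIBLE_DEVICES=0",
		"HSA_OVERRIDE_GFX_VERSION=10.3.0",
	)
	cmd.Stdout = os.Stdout
	cmd.Stderr = os.Stderr
	if err := cmd.Run(); err != nil {
		log.Fatal(err)
	}
}
```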
Performance Settings
OLLAMA_NUM_PARALLEL - Maximum number of parallel requests. Defaults to unlimited.
OLLAMA_MAX_LOADED_MODELS - Maximum number of loaded models per GPU. Defaults to unlimited.
OLLAMA_MAX_QUEUE - Maximum number of queued requests. Defaults to 512.
OLLAMA_MAX_VRAM - Maximum amount of VRAM that can be consumed per GPU. Defaults to unlimited.
OLLAMA_GPU_OVERHEAD - VRAM to reserve per GPU. Defaults to 0.
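The numeric settings follow the usual pattern of falling back to a default when the variable is unset or invalid. A rough Go illustration of that pattern, using OLLAMA_MAX_QUEUE and its documented default of 512 (uintFromEnv is a hypothetical helper, not Ollama's actual code):

```go
package main

import (
	"fmt"
	"os"
	"strconv"
)

// uintFromEnv reads an unsigned integer from the environment,
// returning fallback when the variable is unset or not a valid number.
func uintFromEnv(key string, fallback uint64) uint64 {
	raw := os.Getenv(key)
	if raw == "" {
		return fallback
	}
	v, err := strconv.ParseUint(raw, 10, 64)
	if err != nil {
		return fallback
	}
	return v
}

func main() {
	fmt.Println("max queue:", uintFromEnv("OLLAMA_MAX_QUEUE", 512))
}
```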
Proxy Settings
HTTP_PROXY - HTTP proxy
HTTPS_PROXY - HTTPS proxy
NO_PROXY - No proxy
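Ollama is a Go program, and Go's standard HTTP transport reads these proxy variables via http.ProxyFromEnvironment. The sketch below prints which proxy, if any, would be used for an outbound request; the registry URL is just an illustrative target.

```go
package main

import (
	"fmt"
	"net/http"
)

func main() {
	req, _ := http.NewRequest("GET", "https://registry.ollama.ai", nil)
	// ProxyFromEnvironment consults HTTP_PROXY, HTTPS_PROXY and NO_PROXY.
	proxyURL, err := http.ProxyFromEnvironment(req)
	if err != nil {
		fmt.Println("proxy configuration error:", err)
		return
	}
	if proxyURL == nil {
		fmt.Println("no proxy will be used")
		return
	}
	fmt.Println("requests would go through:", proxyURL)
}
```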