Jovan Chan

Posted on Jun 11 • Originally published at aifoss.dev

Ollama Security 2026: Lock Down Your Exposed LLM Server

#ollama #security #selfhosted #linux


TL;DR: Researchers found 175,000 Ollama servers publicly accessible with no authentication, no TLS, and no rate limiting. CVE-2026-7482 ("Bleeding Llama," CVSS 9.1) lets an unauthenticated attacker drain your entire process memory — API keys, system prompts, user conversations — in three API calls. This guide covers the exact steps to close every gap, tested against Ollama v0.30.6.

What you'll have running after this guide:

Ollama bound to localhost only, unreachable from the public internet
An nginx reverse proxy with API key authentication and TLS termination
UFW firewall rules that block direct port 11434 access from any external address

Honest take: 90% of exposures come from one line — OLLAMA_HOST=0.0.0.0 — added once and never revisited. Fix the binding first. Everything else is defense in depth.

How 175,000 Servers Ended Up on the Open Internet

Ollama's default behavior on Linux is actually safe: it binds to 127.0.0.1:11434, which means only processes on the same machine can reach the API. The problem starts when developers need to use Ollama from another machine on their network — or from a remote IDE plugin like Continue.dev — and reach for the fastest solution:

# The line that's behind most of those 175K exposures
export OLLAMA_HOST=0.0.0.0

That one environment variable flips Ollama from "only this machine" to "anyone on any network who can reach this IP." Add a cloud VM with a permissive security group and port 11434 open, and you're in the dataset.
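Because the variable is usually set once and then forgotten, it can help to search the places it tends to live. The following is a minimal sketch, not an official tool: the helper name `find_ollama_host` and the list of candidate paths are assumptions, so adjust them for your shell and distro.

```shell
#!/usr/bin/env bash
# Hypothetical helper: print every line that sets OLLAMA_HOST in the given
# files or directories, as path:line-number:content. Missing paths are skipped.
find_ollama_host() {
    local path
    for path in "$@"; do
        [ -e "$path" ] || continue
        grep -rnHE '(^|[[:space:]"])OLLAMA_HOST=' "$path" 2>/dev/null
    done
    return 0
}

# Common locations (assumed, not exhaustive)
find_ollama_host \
    "$HOME/.bashrc" "$HOME/.profile" "$HOME/.zshrc" \
    /etc/environment \
    /etc/systemd/system/ollama.service \
    /etc/systemd/system/ollama.service.d
```

Any hit that shows 0.0.0.0 is worth a second look, especially on a machine with a public IP.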

In January 2026, SentinelOne and Censys published a scan covering 130 countries and found 175,000 unique hosts with port 11434 publicly reachable and the Ollama API responding unauthenticated. 48% of those hosts advertised tool-calling capabilities, meaning attackers could not only pull models and run completions but also trigger function calls that touch external services. Between October 2025 and January 2026, documented attack sessions against these hosts totaled over 91,000. Some hosts were racking up $46,000 to $100,000 per day in GPU inference costs from workloads run by unauthorized third parties.

The root cause isn't a bug in Ollama. It's a design that prioritizes developer ergonomics (no auth out of the box, easy LAN access) over deployment safety. That design is fine for a local dev box. It is not fine for anything internet-reachable.

CVE-2026-7482: What Bleeding Llama Actually Does

Beyond misconfiguration, Cyera Research disclosed a code-level vulnerability in early 2026 that affects any exposed Ollama server still running an unpatched version.

The flaw: a heap out-of-bounds read in Ollama's GGUF model loader. The GGUF format stores tensor metadata (offsets, sizes) before the actual weight data. Ollama's parser trusted those values without bounds checking. A malicious GGUF file with inflated tensor offsets could coerce the loader to read memory far outside the file buffer — directly from the process heap.

The attack: three API calls, no authentication required:

POST /api/create: upload the crafted GGUF file, trigger the OOB read
POST /api/show: retrieve the model artifact, which now contains leaked memory
POST /api/push: optionally exfiltrate to an attacker-controlled registry

What leaks: environment variables (including OPENAI_API_KEY, ANTHROPIC_API_KEY, any secrets passed at startup), system prompts from currently loaded models, in-flight conversation data from other users on shared servers, and internal memory state that can help chain additional attacks.

CVSS score: 9.1 (Critical). Estimated affected servers at disclosure: 300,000+.

The patch shipped in Ollama v0.17.1 on February 25, 2026. Check your version:

ollama --version
# ollama version is 0.30.6

If you're running anything below 0.17.1, update immediately:

curl -fsSL https://ollama.com/install.sh | sh

On the current version (0.30.6 as of this writing), the GGUF parser bounds checks are in place. But patching the binary is only one layer — the sections below cover the rest.
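If you manage more than one host, the version comparison can be scripted. This is a rough sketch that assumes GNU `sort -V` is available; the function names `version_lt` and `needs_patch` are made up for illustration, and the script only assumes that a dotted x.y.z number appears somewhere in the `ollama --version` output.

```shell
#!/usr/bin/env bash
# version_lt A B: succeeds when version A sorts strictly before version B.
# Relies on GNU sort's -V (version sort) option.
version_lt() {
    [ "$1" != "$2" ] &&
        [ "$(printf '%s\n%s\n' "$1" "$2" | sort -V | head -n 1)" = "$1" ]
}

# needs_patch VERSION: succeeds when VERSION predates the fixed release 0.17.1.
needs_patch() {
    version_lt "$1" "0.17.1"
}

# Only runs when the ollama binary is on PATH.
if command -v ollama >/dev/null 2>&1; then
    installed=$(ollama --version 2>/dev/null | grep -oE '[0-9]+\.[0-9]+\.[0-9]+' | head -n 1)
    if [ -z "$installed" ]; then
        echo "Could not parse a version from 'ollama --version'"
    elif needs_patch "$installed"; then
        echo "Ollama $installed predates 0.17.1: update now"
    else
        echo "Ollama $installed is at or above 0.17.1"
    fi
fi
```

Version sort matters here: a plain string comparison would rank 0.9.0 above 0.17.1.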

Step 1: Lock the Binding

Verify what Ollama is actually listening on before doing anything else:

ss -tlnp | grep 11434
# Exposed:  0.0.0.0 is NOT what you want
# LISTEN    0.0.0.0:11434   ← exposed to all interfaces
# Safe:
# LISTEN    127.0.0.1:11434 ← localhost only

If you see 0.0.0.0:11434, fix it. Edit the systemd service override:

sudo systemctl edit ollama

Add:

[Service]
Environment="OLLAMA_HOST=127.0.0.1"

Then reload:

sudo systemctl daemon-reload
sudo systemctl restart ollama
ss -tlnp | grep 11434
# LISTEN   127.0.0.1:11434  ← correct
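To turn that eyeball check into something you can run from cron or a deploy script, one option is to filter the `ss` output for non-loopback listeners. The sketch below assumes the usual `ss -tln` column layout (local address in the fourth column); `exposed_listeners` is a hypothetical helper name.

```shell
#!/usr/bin/env bash
# exposed_listeners PORT: read `ss -tln` output on stdin and print every
# local address bound to PORT that is not a loopback address.
exposed_listeners() {
    awk -v port="$1" '
        $1 == "LISTEN" {
            addr = $4                                  # Local Address:Port
            n = split(addr, parts, ":")
            if (parts[n] != port) next                 # different port
            host = substr(addr, 1, length(addr) - length(parts[n]) - 1)
            if (host ~ /^127\./ || host == "[::1]") next   # loopback is fine
            print addr
        }'
}

# Live check, only when ss is installed. No output means nothing is exposed.
if command -v ss >/dev/null 2>&1; then
    ss -tln | exposed_listeners 11434
fi
```

An empty result is the goal. Lines such as 0.0.0.0:11434, [::]:11434 or *:11434 mean the binding still needs fixing.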

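If you're running Ollama via Docker, OLLAMA_HOST inside the container is the wrong knob: a container bound to its own 127.0.0.1 is unreachable even from a reverse proxy on the host. Leave the in-container binding alone and publish the port on the host's loopback interface only, so nothing but a local proxy can reach it. Published ports also bypass UFW, because Docker writes its own iptables rules, so the loopback prefix matters:

services:
  ollama:
    image: ollama/ollama
    ports:
      # Host loopback only. A bare "11434:11434" publishes on all interfaces.
      - "127.0.0.1:11434:11434"
    volumes:
      - ollama_data:/root/.ollama

volumes:
  ollama_data: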

Step 2: Firewall Rules

Even with the binding fixed, add a firewall layer so any future misconfiguration doesn't immediately create a public endpoint.

UFW (Ubuntu/Debian):

# Allow SSH and web traffic
sudo ufw allow 22/tcp
sudo ufw allow 80/tcp
sudo ufw allow 443/tcp

# Block direct access to Ollama port from everywhere
sudo ufw deny 11434

# Enable if not already on
sudo ufw enable
sudo ufw status

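If you need LAN-only access without a reverse proxy (internal network only), allow the subnet instead. UFW applies the first matching rule, so the allow has to sit ahead of the deny added above; inserting it at position 1 does that. Ollama also has to listen on the LAN interface for this to work, not on 127.0.0.1:

sudo ufw insert 1 allow from 192.168.1.0/24 to any port 11434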

iptables equivalent (if you're not using UFW):

sudo iptables -A INPUT -p tcp --dport 11434 -s 127.0.0.1 -j ACCEPT
sudo iptables -A INPUT -p tcp --dport 11434 -j DROP
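Firewall rules are only worth trusting once they have been tested from outside. A minimal sketch of such a probe follows, assuming bash (for its /dev/tcp pseudo-device) and the coreutils `timeout` command; `port_state` is a hypothetical helper name. Run it from a machine that is not on the server's network.

```shell
#!/usr/bin/env bash
# port_state HOST PORT: print "open" if a TCP connection succeeds within
# 3 seconds, otherwise "closed" (refused, filtered, or timed out).
# Note: also prints "closed" if the timeout command itself is missing.
port_state() {
    if timeout 3 bash -c 'exec 3<>"/dev/tcp/$0/$1"' "$1" "$2" 2>/dev/null; then
        echo open
    else
        echo closed
    fi
}

# Example usage from an external machine (hostname is a placeholder):
#   port_state your-server.example.com 11434   # want: closed
#   port_state your-server.example.com 443     # want: open once nginx is up
```

A "closed" result for 11434 from the outside, combined with the local `ss` check, covers both the binding and the firewall layer.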

Step 3: Nginx Reverse Proxy with API Key Authentication

This is the most important step if you need remote access. Ollama has no built-in authentication as of v0.30.6 — the official docs note this explicitly. All auth must happen at the proxy layer.

Install nginx:

sudo apt install nginx -y

Generate an API key (a 32-byte random token works fine):

openssl rand -hex 32
# e.g.: a3f1c8d2e9b0456f7a2c1d8e3b4f5a6d7e8c9b0a1f2d3e4c5b6a7f8e9d0c1b2

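Save the key to a file that is not world-readable. The nginx config below embeds the key directly, so this file serves as your record of issued keys: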

sudo mkdir -p /etc/nginx/conf.d
echo "a3f1c8d2e9b0456f7a2c1d8e3b4f5a6d7e8c9b0a1f2d3e4c5b6a7f8e9d0c1b2" | sudo tee /etc/nginx/ollama-keys.txt
sudo chmod 640 /etc/nginx/ollama-keys.txt

Create the nginx site config at /etc/nginx/sites-available/ollama:

map $http_authorization $auth_valid {
    default 0;
    "Bearer a3f1c8d2e9b0456f7a2c1d8e3b4f5a6d7e8c9b0a1f2d3e4c5b6a7f8e9d0c1b2" 1;
}

server {
    listen 443 ssl;
    server_name your-server.example.com;

    ssl_certificate     /etc/letsencrypt/live/your-server.example.com/fullchain.pem;
    ssl_certificate_key /etc/letsencrypt/live/your-server.example.com/privkey.pem;

    location / {
        if ($auth_valid = 0) {
            return 401 "Unauthorized";
        }

        proxy_pass         http://127.0.0.1:11434;
        proxy_set_header   Host $host;

        # Required for token streaming — without these, you get the full
        # response only after generation completes
        proxy_buffering    off;
        proxy_http_version 1.1;
        chunked_transfer_encoding on;
        proxy_read_timeout 300s;
        proxy_send_timeout 300s;
    }
}

server {
    listen 80;
    server_name your-server.example.com;
    return 301 https://$host$request_uri;
}
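Hardcoding one key in the map block works, but it gets awkward once you issue more than one. One possible approach, sketched here under assumptions, is to generate the map block from the key file. `render_auth_map` is a hypothetical helper, and it assumes keys are plain hex strings, one per line.

```shell
#!/usr/bin/env bash
# render_auth_map: read API keys on stdin (one per line; blank lines and
# lines starting with # are skipped) and print an nginx map block.
render_auth_map() {
    printf 'map $http_authorization $auth_valid {\n'
    printf '    default 0;\n'
    while IFS= read -r key || [ -n "$key" ]; do
        case "$key" in
            ''|'#'*) continue ;;
        esac
        # Accept only hex tokens so nothing can break out of the quotes
        case "$key" in
            *[!0-9a-fA-F]*) echo "skipping malformed key" >&2; continue ;;
        esac
        printf '    "Bearer %s" 1;\n' "$key"
    done
    printf '}\n'
}

# Example usage (the output path is an assumption, not from the article):
#   sudo cat /etc/nginx/ollama-keys.txt | render_auth_map \
#       | sudo tee /etc/nginx/conf.d/ollama-auth-map.conf
```

If you go this route, the generated block takes the place of the hand-written map at the top of the site config, and rotating a key becomes an edit to the key file, a regenerate, and an nginx reload.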

Enable and test:

sudo ln -s /etc/nginx/sites-available/ollama /etc/nginx/sites-enabled/
sudo nginx -t
# nginx: configuration file /etc/nginx/nginx.conf test is successful
sudo systemctl reload nginx

Test the auth:


# Should be rejected
curl -s https://your-server.example.com/api/tags
# Unauthorized

# Should work
curl -s -H
