Gowsiya Syednoor Shek

Run LLMs Locally with Docker Model Runner – A Real-World Developer Guide (Part 2)

In Part 1 of this series, I explored how Docker Compose Watch helped accelerate development when working with Python-based AI apps. In this second part, I dive into Docker Model Runner – a powerful new feature that lets developers run large language models (LLMs) locally using OpenAI-compatible APIs, all powered by Docker.

But getting it to work wasn't as plug-and-play as I had hoped. So this tutorial is both a how-to and a real-world troubleshooting log for anyone trying to follow the same path.

What Is Docker Model Runner?

Docker Model Runner is a beta feature (from Docker Desktop 4.40+) that allows you to:

  • Pull open-source LLMs from Docker Hub or Hugging Face
  • Run them locally on your machine
  • Interact with them via OpenAI-compatible endpoints

It removes the need to manually wire up separate tools like Ollama or Hugging Face Transformers. And best of all: it plugs straight into Docker Desktop.
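The basic CLI workflow is short. Here's a quick sketch (subcommand names as I used them in the beta; they may evolve):

docker model pull ai/mistral   # download a model from Docker Hub's ai/ namespace
docker model list              # show which models are available locally
docker model run ai/mistral    # run the model with an OpenAI-compatible endpoint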

What I Tried to Build

I wanted a simple Python app that:

  • Sends a prompt to a local LLM using the /chat/completions API
  • Receives a response, all without touching OpenAI's cloud

Setup Troubles I Faced

Here are the real-world challenges I encountered:

1. Docker Desktop Version Was Too Old

At first, I was running Docker Desktop 4.39. This version showed experimental features like "Docker AI" and "Wasm," but didn't expose Docker Model Runner in the UI.

🔧 Fix: I upgraded to Docker Desktop 4.41, which finally showed "Enable Docker Model Runner" under the Beta Features tab.
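A quick way to confirm the feature actually landed (assuming the beta ships the docker model CLI plugin, as it did for me) is to ask for its help text:

docker model --help

If the command is unrecognized, Model Runner isn't available in your Docker Desktop version.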

2. Pulling the Model Fails Unless You Use the Right Name

Running this:

docker model pull mistral

resulted in:

401 Unauthorized

🔧 Fix: I checked Docker’s official documentation and found that model names are namespaced. I changed it to:

docker model pull ai/mistral

✅ This worked instantly.
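If you're unsure what to pull, the curated models live under the ai/ namespace on Docker Hub (hub.docker.com/u/ai), and you can verify what's already downloaded locally:

docker model list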

3. TCP Port Refused Until I Enabled Host-Side Access

My Python script kept failing with:

ConnectionRefusedError: [WinError 10061] No connection could be made...

This happened even though the model was running in interactive mode.

🔧 Fix: I opened Docker Desktop → Features in Development → Enable Docker Model Runner, then scrolled down to the checkbox Enable host-side TCP support.
(Screenshot: Docker Desktop settings)

Once enabled, the model ran with its HTTP API exposed at http://localhost:12434.
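A quick smoke test confirms host-side access is working. A minimal sketch, assuming the runner also serves the OpenAI-compatible model listing under the same /engines/v1 prefix as the chat endpoint used below:

import requests

# A 200 response here means the host-side TCP endpoint is reachable.
resp = requests.get("http://localhost:12434/engines/v1/models", timeout=5)
resp.raise_for_status()
print(resp.json())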

Final Working Setup

  1. Pull and Run the Model

docker model pull ai/mistral
docker model run ai/mistral

This launches the model and exposes an OpenAI-compatible endpoint.

  2. Test via Python

import requests

# Docker Model Runner exposes an OpenAI-compatible API on the host.
API_URL = "http://localhost:12434/engines/v1/chat/completions"

payload = {
    "model": "ai/mistral",
    "messages": [
        {"role": "system", "content": "You are a helpful assistant."},
        {"role": "user", "content": "Why is containerization important for deploying AI models?"}
    ]
}
headers = {"Content-Type": "application/json"}

response = requests.post(API_URL, headers=headers, json=payload)
response.raise_for_status()  # fail fast if the endpoint returns an error

# The reply follows OpenAI's chat completion schema.
print(response.json()["choices"][0]["message"]["content"])

Output: You’ll receive a well-formed response from the model — all running locally on your machine.
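Because the endpoint is OpenAI-compatible, you can also point the official openai Python client at it instead of hand-rolling requests. A minimal sketch, assuming the openai v1.x package is installed; the api_key is a throwaway placeholder since the local runner doesn't authenticate:

from openai import OpenAI

# Point the standard OpenAI client at the local Model Runner endpoint.
client = OpenAI(base_url="http://localhost:12434/engines/v1", api_key="not-needed")

completion = client.chat.completions.create(
    model="ai/mistral",
    messages=[
        {"role": "system", "content": "You are a helpful assistant."},
        {"role": "user", "content": "Why is containerization important for deploying AI models?"}
    ]
)
print(completion.choices[0].message.content)

The nice part of this approach: the same code can target a local model or a hosted one by changing only base_url and model.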

Final Thoughts

Docker Model Runner is incredibly promising — especially for developers who want to:

  • Build apps using LLMs without relying on cloud APIs
  • Save costs and protect sensitive data
  • Learn and prototype GenAI apps offline

But like many beta features, it takes some experimentation. If you're running into issues, check:

  • Docker Desktop version (must be 4.40+)
  • Correct model name (use the ai/ namespace)
  • TCP support enabled in the Model Runner settings
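If you want to script those checks, here's a small hypothetical helper (the port matches the setup above; adjust if yours differs):

import socket
import subprocess

# Is the Docker CLI installed and on PATH?
subprocess.run(["docker", "--version"], check=True)

# Is the Model Runner's host-side TCP port reachable?
with socket.socket() as s:
    s.settimeout(2)
    port_open = s.connect_ex(("localhost", 12434)) == 0
print("Model Runner port 12434 open:", port_open)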

In the next article, I’ll explore integrating Docker Model Runner with a front-end GenAI UI.

👉 Stay tuned for Part 3
