In Part 1 of this series, I explored how Docker Compose Watch helped accelerate development when working with Python-based AI apps. In this second part, I dive into Docker Model Runner – a powerful new feature that lets developers run large language models (LLMs) locally using OpenAI-compatible APIs, all powered by Docker.
But getting it to work wasn't as plug-and-play as I had hoped. So this tutorial is both a how-to and a real-world troubleshooting log for anyone trying to follow the same path.
What Is Docker Model Runner?
Docker Model Runner is a beta feature (from Docker Desktop 4.40+) that allows you to:
- Pull open-source LLMs from Docker Hub or Hugging Face
- Run them locally on your machine
- Interact with them via OpenAI-compatible endpoints
It removes the need to manually juggle separate tools like Ollama or Hugging Face Transformers. And best of all: it plugs straight into Docker Desktop.
What I Tried to Build
I wanted a simple Python app that:
- Sends a prompt to a local LLM using the /chat/completions API
- Receives a response, all without touching OpenAI's cloud
Setup Troubles I Faced
Here are the real-world challenges I encountered:
1. Docker Desktop Version Was Too Old
At first, I was running Docker Desktop 4.39. This version showed experimental features like "Docker AI" and "Wasm," but didn't expose Docker Model Runner in the UI.
🔧 Fix: I upgraded to Docker Desktop 4.41, which finally showed "Enable Docker Model Runner" under the Beta Features tab.
2. Pulling the Model Failed Until I Used the Right Name
Running this:
docker model pull mistral
resulted in:
401 Unauthorized
🔧 Fix: I checked Docker’s official documentation and found that model names are namespaced. I changed it to:
docker model pull ai/mistral
✅ This worked instantly.
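If you want to confirm from a Python workflow which models are already pulled, one option is to shell out to the Model Runner CLI. This is just a convenience sketch; it assumes the `docker model list` subcommand is available in your Docker Desktop build (it is the listing command in current Model Runner releases, but adjust if your version differs).

import subprocess

# List the models Docker Model Runner has pulled locally.
# Assumes the `docker model list` subcommand exists in your build.
result = subprocess.run(
    ["docker", "model", "list"],
    capture_output=True,
    text=True,
    check=True,
)
print(result.stdout)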
3. TCP Connection Refused Until I Enabled Host-Side Access
My Python script kept failing with:
ConnectionRefusedError: [WinError 10061] No connection could be made...
This happened even though the model was running in interactive mode.
🔧 Fix: I opened Docker Desktop → Features in Development → Enable Docker Model Runner
Then I scrolled down to find the checkbox: Enable host-side TCP support
Once enabled, the model ran with the HTTP API exposed on http://localhost:12434
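If you hit the same error, a quick way to confirm that the port is actually listening is a plain socket check before re-running the full script. This is a minimal sketch using only the standard library; the port number (12434) comes from the Model Runner settings above.

import socket

# Check whether the Model Runner's TCP port (12434) accepts connections.
with socket.socket(socket.AF_INET, socket.SOCK_STREAM) as sock:
    sock.settimeout(3)
    result = sock.connect_ex(("localhost", 12434))

if result == 0:
    print("Port 12434 is open - the Model Runner API should be reachable.")
else:
    print("Port 12434 refused the connection - re-check the host-side TCP checkbox.")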
Final Working Setup
- Pull and Run the Model
docker model pull ai/mistral
docker model run ai/mistral
This launches the model and exposes an OpenAI-compatible endpoint.
- Test via Python
import requests

API_URL = "http://localhost:12434/engines/v1/chat/completions"

payload = {
    "model": "ai/mistral",
    "messages": [
        {"role": "system", "content": "You are a helpful assistant."},
        {"role": "user", "content": "Why is containerization important for deploying AI models?"}
    ]
}

headers = {"Content-Type": "application/json"}

response = requests.post(API_URL, headers=headers, json=payload)
print(response.json()["choices"][0]["message"]["content"])
Output: You’ll receive a well-formed response from the model — all running locally on your machine.
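Since the endpoint speaks the OpenAI API, you can also point the official openai Python client at it instead of hand-rolling requests. Here is a sketch, under the assumption that the local engine accepts any placeholder API key (it issues none of its own):

from openai import OpenAI

# Point the OpenAI client at the local Model Runner endpoint.
# The api_key is a placeholder; the assumption here is that the
# local engine does not validate it.
client = OpenAI(base_url="http://localhost:12434/engines/v1", api_key="not-needed")

completion = client.chat.completions.create(
    model="ai/mistral",
    messages=[
        {"role": "system", "content": "You are a helpful assistant."},
        {"role": "user", "content": "Why is containerization important for deploying AI models?"},
    ],
)
print(completion.choices[0].message.content)

The nice side effect: any code you prototype locally this way can later be pointed at a hosted OpenAI-compatible endpoint just by changing the base_url.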
Final Thoughts
Docker Model Runner is incredibly promising — especially for developers who want to:
- Build apps using LLMs without relying on cloud APIs
- Save costs and protect sensitive data
- Learn and prototype GenAI apps offline
But like many beta features, it takes some experimentation. If you're running into issues, check:
- Docker version (must be 4.40+)
- Correct model name (use the ai/ namespace)
- TCP support enabled in the Model Runner settings
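To run through the last two items of that checklist in one go, a small probe can help. This is only a sketch: it assumes the engine also serves an OpenAI-style /models route alongside /chat/completions, which the snippet above doesn't show, so treat the URL as a guess to verify against the docs.

import requests

BASE_URL = "http://localhost:12434/engines/v1"

try:
    # Assumed route: an OpenAI-style /models listing next to /chat/completions.
    data = requests.get(BASE_URL + "/models", timeout=5).json()
    model_ids = [m.get("id", "") for m in data.get("data", [])]
    print("API reachable. Models:", model_ids)
    if not any(model_id.startswith("ai/") for model_id in model_ids):
        print("No ai/-namespaced model found - re-pull with the ai/ prefix.")
except requests.exceptions.ConnectionError:
    print("Connection refused - verify Docker Desktop 4.40+ and that")
    print("host-side TCP support is enabled for Model Runner.")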
In the next article, I’ll explore integrating Docker Model Runner with a front-end GenAI UI.
👉 Stay tuned for Part 3