Gowsiya Syednoor Shek

Run LLMs Locally with Docker Model Runner – A Real-World Developer Guide (Part 2)

In Part 1 of this series, I explored how Docker Compose Watch helped accelerate development when working with Python-based AI apps. In this second part, I dive into Docker Model Runner – a powerful new feature that lets developers run large language models (LLMs) locally using OpenAI-compatible APIs, all powered by Docker.

But getting it to work wasn't as plug-and-play as I had hoped. So this tutorial is both a how-to and a real-world troubleshooting log for anyone trying to follow the same path.

What Is Docker Model Runner?

Docker Model Runner is a beta feature (from Docker Desktop 4.40+) that allows you to:

  • Pull open-source LLMs from Docker Hub or Hugging Face
  • Run them locally on your machine
  • Interact with them via OpenAI-compatible endpoints

It removes the need to manually wire up separate tools like Ollama or Hugging Face Transformers. And best of all: it plugs straight into Docker Desktop.
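The basic CLI workflow is short. Here's a quick sketch (subcommand names as I used them in the beta; they may evolve):

docker model pull ai/mistral   # download a model from Docker Hub's ai/ namespace
docker model list              # show which models are available locally
docker model run ai/mistral    # run the model with an OpenAI-compatible endpoint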

What I Tried to Build

I wanted a simple Python app that:

  • Sends a prompt to a local LLM using the /chat/completions API
  • Receives a response, all without touching OpenAI's cloud

Setup Troubles I Faced

Here are the real-world challenges I encountered:

1. Docker Desktop Version Was Too Old

At first, I was running Docker Desktop 4.39. This version showed experimental features like "Docker AI" and "Wasm," but didn't expose Docker Model Runner in the UI.

🔧 Fix: I upgraded to Docker Desktop 4.41, which finally showed "Enable Docker Model Runner" under the Beta Features tab.
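A quick way to confirm the feature actually landed (assuming the beta ships the docker model CLI plugin, as it did for me) is to ask for its help text:

docker model --help

If the command is unrecognized, Model Runner isn't available in your Docker Desktop version.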

2. Pulling the Model Fails Unless You Use the Right Name

Running this:

docker model pull mistral

resulted in:

401 Unauthorized

🔧 Fix: I checked Docker’s official documentation and found that model names are namespaced. I changed it to:

docker model pull ai/mistral

✅ This worked instantly.
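If you're unsure what to pull, the curated models live under the ai/ namespace on Docker Hub (hub.docker.com/u/ai), and you can verify what's already downloaded locally:

docker model list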

3. TCP Port Refused Until I Enabled Host-Side Access

My Python script kept failing with:

ConnectionRefusedError: [WinError 10061] No connection could be made...

This happened even though the model was running in interactive mode.

🔧 Fix: I opened Docker Desktop → Features in Development → Enable Docker Model Runner, then scrolled down to the checkbox Enable host-side TCP support.
(Screenshot: Docker Desktop settings)

Once enabled, the model ran with its HTTP API exposed at http://localhost:12434.
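A quick smoke test confirms host-side access is working. A minimal sketch, assuming the runner also serves the OpenAI-compatible model listing under the same /engines/v1 prefix as the chat endpoint used below:

import requests

# A 200 response here means the host-side TCP endpoint is reachable.
resp = requests.get("http://localhost:12434/engines/v1/models", timeout=5)
resp.raise_for_status()
print(resp.json())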

Final Working Setup

  1. Pull and Run the Model

docker model pull ai/mistral
docker model run ai/mistral

This launches the model and exposes an OpenAI-compatible endpoint.

  2. Test via Python

import requests

# Docker Model Runner exposes an OpenAI-compatible API on the host.
API_URL = "http://localhost:12434/engines/v1/chat/completions"

payload = {
    "model": "ai/mistral",
    "messages": [
        {"role": "system", "content": "You are a helpful assistant."},
        {"role": "user", "content": "Why is containerization important for deploying AI models?"}
    ]
}
headers = {"Content-Type": "application/json"}

response = requests.post(API_URL, headers=headers, json=payload)
response.raise_for_status()  # fail fast if the endpoint returns an error

# The reply follows OpenAI's chat completion schema.
print(response.json()["choices"][0]["message"]["content"])

Output: You’ll receive a well-formed response from the model — all running locally on your machine.
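Because the endpoint is OpenAI-compatible, you can also point the official openai Python client at it instead of hand-rolling requests. A minimal sketch, assuming the openai v1.x package is installed; the api_key is a throwaway placeholder since the local runner doesn't authenticate:

from openai import OpenAI

# Point the standard OpenAI client at the local Model Runner endpoint.
client = OpenAI(base_url="http://localhost:12434/engines/v1", api_key="not-needed")

completion = client.chat.completions.create(
    model="ai/mistral",
    messages=[
        {"role": "system", "content": "You are a helpful assistant."},
        {"role": "user", "content": "Why is containerization important for deploying AI models?"}
    ]
)
print(completion.choices[0].message.content)

The nice part of this approach: the same code can target a local model or a hosted one by changing only base_url and model.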

Final Thoughts

Docker Model Runner is incredibly promising — especially for developers who want to:

  • Build apps using LLMs without relying on cloud APIs
  • Save costs and protect sensitive data
  • Learn and prototype GenAI apps offline

But like many beta features, it takes some experimentation. If you're running into issues, check:

  • Docker Desktop version (must be 4.40+)
  • Correct model name (use the ai/ namespace)
  • TCP support enabled in the Model Runner settings
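If you want to script those checks, here's a small hypothetical helper (the port matches the setup above; adjust if yours differs):

import socket
import subprocess

# Is the Docker CLI installed and on PATH?
subprocess.run(["docker", "--version"], check=True)

# Is the Model Runner's host-side TCP port reachable?
with socket.socket() as s:
    s.settimeout(2)
    port_open = s.connect_ex(("localhost", 12434)) == 0
print("Model Runner port 12434 open:", port_open)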

In the next article, I’ll explore integrating Docker Model Runner with a front-end GenAI UI.

👉 Stay tuned for Part 3
