Introduction
If you've spent any time in software development, cloud engineering, or microservices architecture, the name Docker needs no introduction. But for those newer to the ecosystem, here's the short version.
Docker is an open platform for developing, shipping, and running applications. Its core idea is elegant: separate your application from the underlying infrastructure so you can build fast, test consistently, and deploy confidently. By standardizing how code is packaged and delivered, Docker dramatically shrinks the gap between "it works on my machine" and "it works in production."
What is Docker Desktop?
Docker Desktop takes everything Docker offers and wraps it into a single, batteries-included application for macOS, Windows, and Linux. It bundles the Docker Engine, CLI, Docker Compose, Kubernetes, and a visual dashboard, giving developers a complete container workflow without ever touching low-level OS configuration.
Over the years, Docker Desktop has become the de facto local development environment for millions of engineers worldwide. Version 4.x doubled down on AI workloads, and the latest releases ship with Docker Model Runner as a first-class, built-in feature accessible directly from the Docker Dashboard or the CLI you already use every day.
What is Docker Model Runner?
Docker Model Runner (DMR) is an inference engine embedded directly into Docker Desktop. It lets you pull, run, and interact with open-source large language models using the same familiar docker CLI: no new tools, no configuration headaches, no surprises.
Under the hood, DMR uses llama.cpp as its runtime backend, delivering high-performance inference on both CPU and GPU (Metal on Apple Silicon, CUDA on Linux and Windows) out of the box.
Models are distributed as OCI-compliant artifacts through Docker Hub's ai/ namespace. That means model versioning, access control, and distribution are all handled by the same battle-tested infrastructure already powering your container images.
"What Docker did for application packaging, Model Runner does for AI inference: one pull command, consistent behavior everywhere."
When to Use Docker Model Runner
Reach for DMR whenever you want to develop against a local LLM with the tooling you already have: prototyping a RAG pipeline, a coding assistant, or a document summarizer; testing OpenAI-compatible code without API costs; or keeping data on your own machine during development.
How It Works Under the Hood
When you run a model through DMR, Docker Desktop spins up a local HTTP server exposing an OpenAI-compatible REST API, including /v1/chat/completions, /v1/completions, and /v1/models. Any application or SDK already speaking the OpenAI protocol works against DMR with zero code changes, making it a drop-in local alternative for AI-powered development.
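Because the API is plain HTTP, any client can talk to it. Here is a minimal standard-library sketch; the URL and model name are illustrative, and it assumes host-side TCP is enabled on DMR's default port 12434:

```python
import json
from urllib import request

# Assumed endpoint: host TCP enabled on DMR's default port 12434.
DMR_URL = "http://localhost:12434/engines/v1/chat/completions"

def build_chat_request(model: str, prompt: str) -> request.Request:
    """Build an OpenAI-style chat-completions request aimed at DMR."""
    payload = {
        "model": model,
        "messages": [{"role": "user", "content": prompt}],
    }
    return request.Request(
        DMR_URL,
        data=json.dumps(payload).encode("utf-8"),
        headers={"Content-Type": "application/json"},
    )

# Actually sending the request needs a running model, so it is left commented out:
# with request.urlopen(build_chat_request("ai/gpt-oss", "Hello!")) as resp:
#     print(json.load(resp)["choices"][0]["message"]["content"])
```

Since the request shape is exactly the OpenAI chat-completions schema, the same code works against any OpenAI-compatible backend by changing only the URL.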
- Install the latest Docker Desktop for your OS
- Start Docker Desktop
- Click the Settings icon in the top-right corner
- Select AI, then enable Docker Model Runner and host-side TCP support as shown below.
Note: the default TCP port is 12434; you can change it to any free port on your machine. I set mine to 5018.
Next, click Models in the left sidebar as shown below.
Now, click Pull to download a model and run it.
The screenshot below shows the two open-source models I pulled.
Test the model within Docker Desktop itself
Testing GPT-OSS
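The CLI equivalent looks like this (a sketch, assuming the model was pulled from Docker Hub's ai/ namespace as ai/gpt-oss):

```shell
# Run a one-shot prompt against the locally pulled model
docker model run ai/gpt-oss "Explain Docker in one sentence."

# Or omit the prompt to start an interactive chat session
docker model run ai/gpt-oss
```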
The docker model subcommand is your primary interface. Let's walk through pulling and running qwen3.5 step by step.
1. Pull a model from Docker Hub
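For example (the exact tag for the Qwen model under the ai/ namespace is an assumption; check Docker Hub for the current name):

```shell
# Download a model from Docker Hub's ai/ namespace
docker model pull ai/qwen3
```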
2. List models downloaded locally
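This shows every model already present on your machine, along with its size:

```shell
# Show models available locally
docker model list
```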
Quick reference cheat sheet
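A minimal cheat sheet of the docker model subcommands used in this walkthrough (verify against docker model --help on your version; the ai/qwen3 tag is illustrative):

```shell
docker model status          # check that Model Runner is up
docker model pull ai/qwen3   # download a model from Docker Hub
docker model list            # list locally downloaded models
docker model run ai/qwen3    # start an interactive chat with a model
docker model rm ai/qwen3     # remove a model and free disk space
```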
Why Docker Model Runner matters
Models become versioned, distributable artifacts in the registry workflow you already use, and inference becomes a local HTTP endpoint any OpenAI-compatible client can hit: no separate runtime to install, no API costs, no cloud dependency during development.
Using DMR in your applications
Python with the OpenAI SDK
Since DMR speaks the OpenAI protocol, swap the base URL and you're done; no model-specific library is needed:
```python
from openai import OpenAI

# Point the standard OpenAI client at the local DMR endpoint.
client = OpenAI(
    base_url="http://localhost:5018/engines/v1",  # your configured host TCP port
    api_key="not-needed",  # DMR ignores the key, but the SDK requires one
)

# Simple REPL-style chat loop against the local model.
while True:
    user_input = input("You: ")
    if user_input.lower() in ["exit", "quit", "bye"]:
        break
    response = client.chat.completions.create(
        model="gpt-oss",
        messages=[{"role": "user", "content": user_input}],
    )
    print(response.choices[0].message.content)
```
Testing the above code.
Docker Model Runner closes the gap between containerized application development and AI-powered application development. By treating models as OCI (Open Container Initiative) artifacts and exposing a standard OpenAI-compatible API, DMR lets you build with local LLMs using the same mental model, the same toolchain, and the same workflows you already use for everything else.
The combination of zero-setup inference, hardware acceleration, and Compose integration makes DMR the most practical way to add local AI capabilities to any project, whether you're building a RAG pipeline, a coding assistant, or a document summarizer.
Thanks
Sreeni Ramadorai