Introduction
If you've spent any time in software development, cloud engineering, or microservices architecture, the name Docker needs no introduction. But for those newer to the ecosystem, here's the short version.
Docker is an open platform for developing, shipping, and running applications. Its core idea is elegant: separate your application from the underlying infrastructure so you can build fast, test consistently, and deploy confidently. By standardizing how code is packaged and delivered, Docker dramatically shrinks the gap between "it works on my machine" and "it works in production."
What is Docker Desktop?
Docker Desktop takes everything Docker offers and wraps it into a single, batteries-included application for macOS, Windows, and Linux. It bundles the Docker Engine, CLI, Docker Compose, Kubernetes, and a visual dashboard, giving developers a complete container workflow without ever touching low-level OS configuration.
Over the years, Docker Desktop has become the de facto local development environment for millions of engineers worldwide. Version 4.x doubled down on AI workloads, and the latest releases ship with Docker Model Runner as a first-class, built-in feature accessible directly from the Docker Dashboard or the CLI you already use every day.
What is Docker Model Runner?
Docker Model Runner (DMR) is an inference engine embedded directly into Docker Desktop. It lets you pull, run, and interact with open-source large language models using the same familiar docker CLI: no new tools, no configuration headaches, no surprises.
Under the hood, DMR uses llama.cpp as its runtime backend, delivering high-performance inference on both CPU and GPU (Metal on Apple Silicon, CUDA on Linux and Windows) out of the box.
Models are distributed as OCI-compliant artifacts through Docker Hub's ai/ namespace. That means model versioning, access control, and distribution are all handled by the same battle-tested infrastructure already powering your container images.
"What Docker did for application packaging, Model Runner does for AI inference: one pull command, consistent behavior everywhere."
When to Use Docker Model Runner
Reach for DMR whenever you want to develop against a local LLM with the tooling you already have: prototyping a RAG pipeline, a coding assistant, or a document summarizer; testing OpenAI-compatible code without API costs; or keeping data on your own machine during development.
How It Works Under the Hood
When you run a model through DMR, Docker Desktop spins up a local HTTP server exposing an OpenAI-compatible REST API, including /v1/chat/completions, /v1/completions, and /v1/models. Any application or SDK already speaking the OpenAI protocol works against DMR with zero code changes, making it a drop-in local alternative for AI-powered development.
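Because the API is plain HTTP, any client can talk to it. Here is a minimal standard-library sketch; the URL and model name are illustrative, and it assumes host-side TCP is enabled on DMR's default port 12434:

```python
import json
from urllib import request

# Assumed endpoint: host TCP enabled on DMR's default port 12434.
DMR_URL = "http://localhost:12434/engines/v1/chat/completions"

def build_chat_request(model: str, prompt: str) -> request.Request:
    """Build an OpenAI-style chat-completions request aimed at DMR."""
    payload = {
        "model": model,
        "messages": [{"role": "user", "content": prompt}],
    }
    return request.Request(
        DMR_URL,
        data=json.dumps(payload).encode("utf-8"),
        headers={"Content-Type": "application/json"},
    )

# Actually sending the request needs a running model, so it is left commented out:
# with request.urlopen(build_chat_request("ai/gpt-oss", "Hello!")) as resp:
#     print(json.load(resp)["choices"][0]["message"]["content"])
```

Since the request shape is exactly the OpenAI chat-completions schema, the same code works against any OpenAI-compatible backend by changing only the URL.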
- Install the latest Docker Desktop for your OS
- Start Docker Desktop
- Click the Settings icon in the top-right corner
- Select AI, then enable Docker Model Runner and host-side TCP support as shown below.
Note: the default TCP port is 12434; you can change it to any free port on your machine. I set mine to 5018.
Next, click Models in the left sidebar as shown below.
Now, click Pull to download a model and run it.
The screenshot below shows the two open-source models I pulled.
Test the model within Docker Desktop itself
Testing GPT-OSS
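The CLI equivalent looks like this (a sketch, assuming the model was pulled from Docker Hub's ai/ namespace as ai/gpt-oss):

```shell
# Run a one-shot prompt against the locally pulled model
docker model run ai/gpt-oss "Explain Docker in one sentence."

# Or omit the prompt to start an interactive chat session
docker model run ai/gpt-oss
```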
The docker model subcommand is your primary interface. Let's walk through pulling and running qwen3.5 step by step.
1. Pull a model from Docker Hub
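For example (the exact tag for the Qwen model under the ai/ namespace is an assumption; check Docker Hub for the current name):

```shell
# Download a model from Docker Hub's ai/ namespace
docker model pull ai/qwen3
```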
2. List models downloaded locally
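This shows every model already present on your machine, along with its size:

```shell
# Show models available locally
docker model list
```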
Quick reference cheat sheet
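A minimal cheat sheet of the docker model subcommands used in this walkthrough (verify against docker model --help on your version; the ai/qwen3 tag is illustrative):

```shell
docker model status          # check that Model Runner is up
docker model pull ai/qwen3   # download a model from Docker Hub
docker model list            # list locally downloaded models
docker model run ai/qwen3    # start an interactive chat with a model
docker model rm ai/qwen3     # remove a model and free disk space
```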
Why Docker Model Runner matters
Models become versioned, distributable artifacts in the registry workflow you already use, and inference becomes a local HTTP endpoint any OpenAI-compatible client can hit: no separate runtime to install, no API costs, no cloud dependency during development.
Using DMR in your applications
Python with the OpenAI SDK
Since DMR speaks the OpenAI protocol, swap the base URL and you're done; no model-specific library is needed:
```python
from openai import OpenAI

# Point the standard OpenAI client at the local DMR endpoint.
client = OpenAI(
    base_url="http://localhost:5018/engines/v1",  # your configured host TCP port
    api_key="not-needed",  # DMR ignores the key, but the SDK requires one
)

# Simple REPL-style chat loop against the local model.
while True:
    user_input = input("You: ")
    if user_input.lower() in ["exit", "quit", "bye"]:
        break
    response = client.chat.completions.create(
        model="gpt-oss",
        messages=[{"role": "user", "content": user_input}],
    )
    print(response.choices[0].message.content)
```
Testing the above code.
Docker Model Runner closes the gap between containerized application development and AI-powered application development. By treating models as OCI (Open Container Initiative) artifacts and exposing a standard OpenAI-compatible API, DMR lets you build with local LLMs using the same mental model, the same toolchain, and the same workflows you already use for everything else.
The combination of zero-setup inference, hardware acceleration, and Compose integration makes DMR the most practical way to add local AI capabilities to any project, whether you're building a RAG pipeline, a coding assistant, or a document summarizer.
Thanks
Sreeni Ramadorai