Ali Ibrahim

Posted on • Originally published at blog.agentailor.com

Run Open-Source AI Models Locally with Docker Model Runner

Introduction

AI, and generative AI in particular, is undeniably changing how we build and use software. While most leading generative AI models and labs remain closed source, open-source alternatives are rapidly catching up.

Closed-source models often come with everything needed to build and deploy via hosted APIs. Open-source models, on the other hand, can offer more control at lower cost, but they require more setup, especially during development.

To simplify this, new tools have emerged to help run open source models locally. In this article, we’ll explore one such tool: Docker Model Runner (DMR). We'll cover what it is, how it works, and demonstrate how to run a model like Gemma using OpenAI’s TypeScript SDK.

What is Docker Model Runner?

Docker Model Runner (DMR) is a new extension to the Docker ecosystem that lets users pull, run, and manage AI models similarly to how Docker handles images. Key features include:

  • Pull and push models to and from Docker Hub
  • Package GGUF (GPT-Generated Unified Format) files as OCI Artifacts
  • Run and interact with models via CLI or Docker Desktop GUI
  • Manage local model cache and view logs

How It Works

Models are pulled from Docker Hub the first time they are used and cached locally. At runtime, models are loaded into memory only when needed, and unloaded when idle. This approach mimics Docker's behavior for container images.

Why It Matters

DMR is a game-changer because it leverages Docker’s familiarity and ecosystem. Developers can use AI models with minimal configuration or infrastructure changes, which is especially helpful for local prototyping or agent development.

Enjoying content like this? Sign up for Agent Briefings - insights on building and scaling AI agents.

How to Use DMR

To use Docker Model Runner, you need one of:

  • Docker Desktop 4.41+ (Windows) or 4.40+ (macOS)
  • Docker Engine (Linux)

This article uses Docker Desktop for Windows, but the process is similar on other platforms.

Installation

If Docker is not installed yet, download it from:

https://www.docker.com/products/docker-desktop/

Configuration

After installing Docker:

Docker Desktop

  1. Open Settings > Beta features
  2. Enable Docker Model Runner

Enable Model Runner

Docker Engine (Linux)

sudo apt-get update
sudo apt-get install docker-model-plugin

Test the Installation

# Check version
docker model version

# Run a model
# If the model isn’t cached locally, it will be pulled
docker model run ai/gemma3:latest

Using Docker Desktop Chat UI

You can also use the Docker Desktop Chat UI to interact with models:

Docker Chat UI

API Endpoints

Models can be exposed via a local TCP port. Enable this in Docker Desktop settings:

Enable TCP Port

  • On the host machine: use http://localhost:<port> (12434 by default)
  • From inside a container: use http://model-runner.docker.internal/
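Put together, a tiny helper can resolve the right base URL for either context. This is a minimal sketch, assuming the default TCP port 12434 and that your own code decides (for example via a config flag) whether it is running inside a container:

```typescript
// Resolve the Docker Model Runner base URL for the current context.
// The default port 12434 and the inContainer flag are illustrative assumptions.
function dmrBaseUrl(inContainer: boolean, port = 12434): string {
  return inContainer
    ? 'http://model-runner.docker.internal/engines/llama.cpp/v1'
    : `http://localhost:${port}/engines/llama.cpp/v1`;
}

console.log(dmrBaseUrl(false)); // → http://localhost:12434/engines/llama.cpp/v1
console.log(dmrBaseUrl(true));  // → http://model-runner.docker.internal/engines/llama.cpp/v1
```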

Supported Endpoints (Llama.cpp style)

GET /engines/llama.cpp/v1/models
GET /engines/llama.cpp/v1/models/{namespace}/{name}
POST /engines/llama.cpp/v1/chat/completions
POST /engines/llama.cpp/v1/completions
POST /engines/llama.cpp/v1/embeddings

Best part? It’s OpenAI-compatible, so you can use existing OpenAI SDKs to interact with local models!
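For instance, the embeddings endpoint accepts the same request body as OpenAI’s /v1/embeddings. The sketch below only builds such a request (no call is actually sent) and includes the cosine-similarity step you would typically apply to the returned vectors; the embedding model name is an assumption for illustration:

```typescript
// Request body for POST /engines/llama.cpp/v1/embeddings (OpenAI-compatible).
// The model name is an assumed example; no request is sent here.
const embeddingRequest = {
  model: 'ai/mxbai-embed-large',
  input: ['Docker Model Runner', 'Run AI models locally'],
};

// Cosine similarity: the usual way to compare two returned embedding vectors.
function cosineSimilarity(a: number[], b: number[]): number {
  let dot = 0, normA = 0, normB = 0;
  for (let i = 0; i < a.length; i++) {
    dot += a[i] * b[i];
    normA += a[i] * a[i];
    normB += b[i] * b[i];
  }
  return dot / (Math.sqrt(normA) * Math.sqrt(normB));
}

console.log(JSON.stringify(embeddingRequest));
console.log(cosineSimilarity([1, 0], [1, 0])); // → 1 (identical directions)
```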

Docker Compose Integration

DMR works with Docker Compose, making it easy to integrate AI models into multi-container applications.

Note: Requires Docker Compose v2.35+

Example docker-compose.yml:

services:
  chat:
    image: my-chat-app
    depends_on:
      - ai_runner

  ai_runner:
    provider:
      type: model
      options:
        model: ai/gemma3
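In the dependent chat service, Compose’s model provider exposes the runner’s connection details as environment variables derived from the provider service name; with a service named ai_runner these are typically AI_RUNNER_URL and AI_RUNNER_MODEL, though you should verify the exact names against your Compose version. A hedged sketch of reading them, with fallbacks for running outside Compose:

```typescript
// Read the endpoint and model that Compose's model provider injects into
// dependent services. The variable names below are derived from the
// `ai_runner` service name and are an assumption; check the Compose docs
// for your version. Fallbacks allow running outside Compose.
const baseURL = process.env.AI_RUNNER_URL ?? 'http://localhost:12434/engines/llama.cpp/v1';
const model = process.env.AI_RUNNER_MODEL ?? 'ai/gemma3';

console.log(`Using model ${model} at ${baseURL}`);
```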

Demo: Gemma 3 + OpenAI SDK (TypeScript)

Setup

# Create project directory
mkdir docker-runner-demo
cd docker-runner-demo

# Initialize project
pnpm init
pnpm install openai tsx

package.json

{
  "name": "docker-model-runner",
  "version": "1.0.0",
  "type": "module",
  "scripts": {
    "start": "tsx src/main.ts"
  },
  "dependencies": {
    "openai": "^5.6.0",
    "tsx": "^4.20.3"
  }
}

tsconfig.json

{
  "compilerOptions": {
    "target": "ES2022",
    "module": "ESNext",
    "moduleResolution": "node",
    "esModuleInterop": true,
    "strict": true,
    "skipLibCheck": true,
    "outDir": "dist"
  },
  "include": ["src/**/*"],
  "exclude": ["node_modules"]
}

Create Entry File

# On Linux or macOS
mkdir src && touch src/main.ts
# On Windows (PowerShell)
mkdir src ; New-Item -Path src -ItemType File -Name main.ts

src/main.ts

import OpenAI from 'openai'

const client = new OpenAI({
  baseURL: 'http://localhost:12434/engines/llama.cpp/v1',
  apiKey: 'not-needed', // DMR ignores the key, but the SDK requires a value
})

async function main() {
  const completion = await client.chat.completions.create({
    messages: [{ role: 'user', content: 'Write a one-sentence bedtime story about a unicorn' }],
    model: 'ai/gemma3:latest',
  })

  console.log(completion.choices[0].message.content)
}

main().catch(console.error)

Run

pnpm start

Expected Output

As the moonbeams danced, a gentle unicorn drifted off to sleep, dreaming of fields of shimmering wildflowers and the quiet magic of the stars.

Complete Project

Want a more complete app with a UI? Check out the GitHub starter:

Local Agent Using Docker Model Runner

Conclusion

This was a hands-on intro to Docker Model Runner, an easy, Docker-native way to run and integrate local AI models. It’s especially powerful when paired with OpenAI-compatible SDKs.

In the next article, I’ll dive deeper into Docker’s MCP catalog and AI Toolkit.

Meanwhile, if you’re building AI agents or want to explore how to integrate AI into your software, check out Agentailor. It’s where I share tools and experiments for agent developers.

