Ajeet Singh Raina
Docker Model Runner Cheatsheet 2025

πŸ“‘ Table of Contents

  1. What is Docker Model Runner?
  2. πŸš€ Quick Setup Guide
  3. πŸ“‹ Essential Commands
  4. πŸ”— API Integration
  5. 🐳 Docker Compose Integration
  6. 🐳 Docker Model Management Endpoints

What is Docker Model Runner?

Docker Model Runner is a feature built into Docker Desktop 4.40+ (and available as a plugin for Docker Engine on Linux) that lets developers run AI models locally with minimal setup. It brings LLM (Large Language Model) inference directly into your GenAI development workflow.

Key Benefits

  • βœ… No extra infrastructure - Runs natively on your machine
  • βœ… OpenAI-compatible API - Drop-in replacement for OpenAI calls
  • βœ… GPU acceleration - Optimized for Apple Silicon and NVIDIA GPUs
  • βœ… OCI artifacts - Models distributed as OCI artifacts
  • βœ… Host-based execution - Maximum performance, no VM overhead

πŸš€ Quick Setup Guide

Prerequisites

  • Docker Desktop 4.40+ (4.41+ for Windows GPU support)
  • macOS: Apple Silicon (M1/M2/M3) for optimal performance
  • Windows: NVIDIA GPU (for GPU acceleration)
  • Linux: Docker Engine with the Model Runner plugin (docker-model-plugin)

Enable Docker Model Runner

Docker Desktop (GUI)

  1. Open Docker Desktop Settings
  2. Navigate to Features in development β†’ Beta
  3. Enable "Docker Model Runner"
  4. Apply & Restart

Docker Desktop (CLI)

# Enable Model Runner
docker desktop enable model-runner

# Enable with TCP support (for host access)
docker desktop enable model-runner --tcp 12434

# Check status
docker desktop status
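With TCP enabled, a quick smoke test from the host confirms the runner is reachable (assuming port 12434 as configured above):

# Should return a JSON list of the models available locally
curl http://localhost:12434/engines/llama.cpp/v1/models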

Docker Engine (Linux)

# Requires Docker's apt repository to be configured (see docs.docker.com/engine/install)
sudo apt-get update
sudo apt-get install docker-model-plugin
docker model version   # verify the installation

πŸ“‹ Essential Commands

Model Management

Pull Models

# Pull latest version
docker model pull ai/smollm2

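Because models are distributed as OCI artifacts, you can pin a specific size or quantization via tags. The tag below is illustrative; check the model's page on Docker Hub for the tags that actually exist. Recent versions can also pull GGUF models straight from Hugging Face using an hf.co/ prefix.

# Pull a specific tag (verify available tags on Docker Hub)
docker model pull ai/smollm2:360M-Q4_K_M

# Pull a GGUF model from Hugging Face (repository name is illustrative)
docker model pull hf.co/bartowski/Llama-3.2-1B-Instruct-GGUF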

List Models

# List all local models
docker model ls

Remove Models

# Remove specific model
docker model rm ai/smollm2

Running Models

Interactive Mode

# Quick inference
docker model run ai/smollm2 "Explain Docker in one sentence"
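Omitting the prompt starts an interactive chat session in your terminal:

# Start an interactive chat (type /bye to exit)
docker model run ai/smollm2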

Model Information

# Inspect model details
docker model inspect ai/smollm2
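A few other subcommands are useful for day-to-day housekeeping (availability can vary slightly across Docker Desktop versions):

# Check whether Model Runner is running
docker model status

# Show the Model Runner version
docker model version

# Fetch Model Runner logs for troubleshooting
docker model logs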

πŸ”— API Integration

OpenAI-Compatible Endpoints

From Containers

# Base URL for container access
http://model-runner.docker.internal/engines/llama.cpp/v1/

From Host (with TCP enabled)

# Base URL for host access
http://localhost:12434/engines/llama.cpp/v1/

Chat Completions API

cURL Example

curl http://localhost:12434/engines/llama.cpp/v1/chat/completions \
  -H "Content-Type: application/json" \
  -d '{
    "model": "ai/smollm2",
    "messages": [
      {
        "role": "system",
        "content": "You are a helpful coding assistant."
      },
      {
        "role": "user", 
        "content": "Write a Docker Compose file for a web app"
      }
    ],
    "temperature": 0.7,
    "max_tokens": 500
  }'
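Streaming Example

The endpoint follows the OpenAI request shape, so streaming should work the same way: set "stream": true and use curl -N to print server-sent events as tokens arrive (a minimal sketch, assuming the llama.cpp engine streams like the upstream llama.cpp server):

curl -N http://localhost:12434/engines/llama.cpp/v1/chat/completions \
  -H "Content-Type: application/json" \
  -d '{
    "model": "ai/smollm2",
    "messages": [{"role": "user", "content": "Explain Docker in one sentence"}],
    "stream": true
  }'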

Python Example

import openai

# Configure the client for Model Runner.
# model-runner.docker.internal resolves only inside containers; from the
# host (with TCP enabled) use http://localhost:12434/engines/llama.cpp/v1
client = openai.OpenAI(
    base_url="http://model-runner.docker.internal/engines/llama.cpp/v1",
    api_key="not-needed"  # Local inference doesn't need an API key
)

# Chat completion
response = client.chat.completions.create(
    model="ai/smollm2",
    messages=[
        {"role": "system", "content": "You are a helpful assistant."},
        {"role": "user", "content": "Explain containerization benefits"}
    ],
    temperature=0.7,
    max_tokens=200
)

print(response.choices[0].message.content)

Node.js Example

import OpenAI from 'openai';

const openai = new OpenAI({
  baseURL: 'http://model-runner.docker.internal/engines/llama.cpp/v1',
  apiKey: 'not-needed'
});

async function chatWithModel() {
  const completion = await openai.chat.completions.create({
    model: 'ai/smollm2',
    messages: [
      { role: 'system', content: 'You are a DevOps expert.' },
      { role: 'user', content: 'Best practices for Docker in production?' }
    ],
    temperature: 0.8,
    max_tokens: 300
  });

  console.log(completion.choices[0].message.content);
}

chatWithModel();

🐳 Docker Compose Integration

services:
  chat:
    image: my-chat-app
    depends_on:
      - ai_runner

  ai_runner:
    provider:
      type: model
      options:
        model: ai/smollm2
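Compose wires the services together by injecting the model's connection details into dependent services as environment variables derived from the provider's service name. With the ai_runner service above, the chat container should see variables along these lines (verify the exact names against your Compose version's documentation):

# Inside the chat container
echo "$AI_RUNNER_URL"    # OpenAI-compatible base URL for the model
echo "$AI_RUNNER_MODEL"  # ai/smollm2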

🐳 Docker Model Management Endpoints

POST /models/create
GET /models
GET /models/{namespace}/{name}
DELETE /models/{namespace}/{name}
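With TCP enabled, the management endpoints are reachable from the host, for example:

# List local models via the management API
curl http://localhost:12434/models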

OpenAI Endpoints:

GET /engines/llama.cpp/v1/models
GET /engines/llama.cpp/v1/models/{namespace}/{name}
POST /engines/llama.cpp/v1/chat/completions
POST /engines/llama.cpp/v1/completions
POST /engines/llama.cpp/v1/embeddings
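The embeddings endpoint uses the OpenAI request shape. A minimal sketch, assuming an embedding-capable model has been pulled (the ai/mxbai-embed-large name here is illustrative; check the ai catalog on Docker Hub):

curl http://localhost:12434/engines/llama.cpp/v1/embeddings \
  -H "Content-Type: application/json" \
  -d '{
    "model": "ai/mxbai-embed-large",
    "input": "Docker makes local LLM inference easy"
  }'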
