Docker Model Runner is designed to make running AI and ML models locally as easy as running any Docker service. It lets you package trained models as containers with consistent REST APIs—no custom server code required. In this guide, we’ll cover everything you need to know to use Docker Model Runner in real-world development workflows, including how to run models locally, configure Docker Desktop, connect from Node.js apps, use Docker Compose for orchestration, and follow best practices.
What Is Docker Model Runner?
Docker Model Runner lets you package your trained model with metadata that tells Docker how to serve it. When you run the resulting image, you get a standardized REST API automatically, with endpoints like /predict and /health. This eliminates the need to write and maintain your own serving code.
Why Use It?
Traditionally, serving ML models required custom web servers, complex dependency management, and inconsistent APIs across teams. Docker Model Runner solves this by:
- Providing consistent APIs across all models.
- Simplifying local development.
- Making models portable across machines and environments.
- Reducing maintenance by removing custom server code.
Supported Frameworks
Docker Model Runner supports a wide range of frameworks:
- PyTorch
- TensorFlow
- Hugging Face Transformers
- scikit-learn
- XGBoost
- LightGBM
- spaCy
- ONNX
This means you can use the same approach for a huge variety of ML workloads.
How It Works in Practice
Step 1: Train your model.
Step 2: Write a model-runner.yaml describing the framework and the location of your model (a sketch follows these steps).
Step 3: Build your Docker image with this metadata and your model files.
Step 4: Run the container and get a consistent REST API without writing server code.
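To make Step 2 concrete, here is a minimal sketch of what such a file could contain. The exact schema depends on your Model Runner version, so treat the field names below as illustrative assumptions rather than a definitive reference:

# Illustrative sketch only: these field names are assumptions, not the official schema
framework: scikit-learn          # one of the supported frameworks
model:
  path: ./model/model.pkl        # location of the trained model inside the image
  version: "1.0.0"
serve:
  port: 80                       # port the generated REST API listens on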
Running Models Locally with Docker Model Runner
Below is a real example that runs the ai/smollm2:latest model locally. It shows how easy it is to list the available models and start an interactive chat session.
Pull the model:
docker model pull ai/smollm2:latest
View the models available:
docker model list
Run the model:
docker model run ai/smollm2:latest
You’ll get an interactive chat session where you can type questions directly to the model.
Give the prompt:
What is Docker?
Docker Desktop Settings for Local Model Running
You may need to update a few settings in Docker Desktop. To let other applications reach Model Runner over TCP, allow host connections in the Docker Desktop settings, or use the CLI options if you need more advanced networking control.
Docker Desktop makes it even easier to manage these models:
- Navigate to the Models tab in Docker Desktop.
- Browse and manage available local models.
- Launch interactive chat interfaces directly from the UI.
- Monitor container resource usage and logs.
You can adjust resource allocation in Settings → Resources, making sure your local environment has enough CPU and memory to handle larger models.
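Once TCP access is enabled, you can smoke-test the model from the host. The sketch below assumes the container's API is published on port 5000 with the /predict endpoint used in the Node.js example in the next section; adjust the port and path to match your setup:

curl -X POST http://localhost:5000/predict \
  -H "Content-Type: application/json" \
  -d '{"prompt": "What is Docker?"}'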
Using Docker Models in a Node.js App
You can use Docker Model Runner locally with any language. Here’s how you’d connect to your local model from a simple Node.js app.
Example Express.js Route:
// Requires Node 18+ for the built-in fetch API
const express = require('express');
const app = express();
app.use(express.json());

app.post('/generate', async (req, res) => {
  const prompt = req.body.prompt;

  // Forward the prompt to the locally running model's REST endpoint
  const response = await fetch('http://localhost:5000/predict', {
    method: 'POST',
    headers: { 'Content-Type': 'application/json' },
    body: JSON.stringify({ prompt })
  });

  const data = await response.json();
  res.send(data);
});

app.listen(3000);
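Assuming the app listens on port 3000 as in the snippet above, you can exercise the route with a quick request:

curl -X POST http://localhost:3000/generate \
  -H "Content-Type: application/json" \
  -d '{"prompt": "Explain containers in one sentence"}'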
Why this is powerful:
- Your app code never changes if you swap models.
- You can test locally and later deploy the same model container in production.
- Changing models is as simple as changing the running container.
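One way to realize this in practice is to keep the model endpoint out of your code entirely. In the hypothetical sketch below, MODEL_URL is an environment variable you would set per environment; the name is illustrative, not a convention defined by Docker Model Runner:

// MODEL_URL is a hypothetical environment variable; it falls back to the local default
const MODEL_URL = process.env.MODEL_URL || 'http://localhost:5000/predict';

The route above would then call fetch(MODEL_URL, ...) instead of a hard-coded address, so swapping the running model container requires no code change.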
Using Docker Compose for Multi-Model Pipelines
You can chain multiple Model Runner services using Docker Compose to build advanced workflows.
Example Use Case: Content Moderation Pipeline
- Toxicity Detection MCP → Check user input.
- Language Detection MCP → Identify language.
- Translation MCP → Normalize to English.
- Summarization MCP → Condense for storage.
docker-compose.yml Example:
version: "3.8"
services:
toxicity-detector:
image: myorg/toxicity-mcp
ports:
- "5001:80"
language-detector:
image: myorg/langdetect-mcp
ports:
- "5002:80"
translator:
image: myorg/translator-mcp
ports:
- "5003:80"
summarizer:
image: myorg/summarizer-mcp
ports:
- "5004:80"
Run:
docker-compose up
Now your app can call each service in sequence for a complete moderation and summarization pipeline.
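For illustration, here is one way the calling application might chain these services. The /predict paths, ports, and response fields (toxic, language, text) are assumptions carried over from the examples above rather than a fixed contract, so adapt them to your actual services:

// Sketch of calling the pipeline services in order (field names are illustrative)
async function moderateAndSummarize(text) {
  const call = async (port, body) => {
    const res = await fetch(`http://localhost:${port}/predict`, {
      method: 'POST',
      headers: { 'Content-Type': 'application/json' },
      body: JSON.stringify(body)
    });
    return res.json();
  };

  const toxicity = await call(5001, { text });         // toxicity-detector
  if (toxicity.toxic) return { rejected: true };

  const { language } = await call(5002, { text });     // language-detector
  const input = language === 'en'
    ? { text }
    : await call(5003, { text, target: 'en' });        // translator

  return call(5004, { text: input.text });             // summarizer
}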
CI/CD Integration
- Define your model-runner.yaml in the repo.
- Build Docker images in CI pipelines.
- Tag images with version numbers or commit SHAs.
- Run Docker Scout or other scanners for CVEs.
- Push images to internal or external registries.
- Deploy using Compose, Swarm, or Kubernetes.
- Include automated health checks against /health.
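For example, the /health check can be wired into Compose so a service is only considered ready once the model responds. This is a sketch that assumes curl is available inside the image; adjust the interval and port to your setup:

services:
  toxicity-detector:
    image: myorg/toxicity-mcp
    ports:
      - "5001:80"
    healthcheck:
      test: ["CMD", "curl", "-f", "http://localhost:80/health"]
      interval: 30s
      timeout: 5s
      retries: 3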
This ensures your model deployment is as maintainable and secure as any other microservice.
Best Practices
- Always define clear input/output contracts.
- Use private registries for internal or proprietary models.
- Tag images with semantic versions.
- Scan images regularly for vulnerabilities.
- Keep model-runner.yaml under version control.
- Automate builds and deployments via CI/CD.
- Use resource limits in Docker Desktop settings to avoid overloading local environments.
- Document how to call /predict and interpret results for consuming teams.
Conclusion
Docker Model Runner isn't just a convenience tool—it's a shift in how teams can think about model serving. Instead of building and maintaining custom servers for every model, you get standardization, portability, and repeatability. Whether you’re running a small LLM locally for testing, deploying to production Kubernetes clusters, or sharing images across teams, Docker Model Runner makes model serving a first-class, manageable part of your software architecture.