Earlier this year, Docker released Docker Model Runner, a component integrated into Docker Desktop that lets you run Large Language Models (LLMs) locally on your machine. Unlike typical container-based execution, Docker Model Runner can tap directly into your GPU hardware, so inference runs at full speed. Initially available on macOS and Windows through Docker Desktop, Docker Model Runner is now also part of Docker Community Edition (CE). This means you can integrate it seamlessly into your Linux-based continuous integration (CI) pipelines, or even use it directly in production.
In this article, we'll explore how to install Docker Model Runner on a Linux VM that already has Docker available. We'll pull an LLM, run it, and clarify which URLs to use when connecting to the model from your applications.
Getting a Linux VM
First, we need a Linux VM. To keep things simple, we'll use Google Cloud Shell, which provides a convenient Linux VM environment right in your browser without needing to provision custom resources.
The VM provided through Cloud Shell isn't particularly powerful (and has no GPU), but it comes with Docker pre-installed, making it ideal for our demonstration.
To launch it:
- Go to your Google Cloud Platform Console.
- Click the Activate Cloud Shell icon in the top-right corner.
- Authorize the browser if prompted.
Verify Docker installation by running:
$ docker --version
Docker version 28.2.2, build e6534b4
Installing Docker Model Runner
On Linux, Docker Model Runner uses standard Docker primitives, such as containers and volumes, to manage GPU passthrough and the lifecycle of your LLMs.
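Once the plugin is installed (next step), you can peek under the hood with ordinary Docker commands. The exact container and volume names are implementation details and may vary between releases:

# The inference engine runs as a regular container...
docker ps
# ...and pulled model weights are stored in a Docker volume
docker volume ls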
First, install the required plugin package:
sudo apt install docker-model-plugin
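If apt can't find the package, you may need to configure Docker's apt repository first, as described in Docker's standard installation documentation. Once installed, recent versions of the plugin let you check what you got:

docker model version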
After installation, you can confirm everything is set up correctly by running:
docker model list
The first time you run this command, it pulls the necessary infrastructure components. Once complete, it displays the models available locally; since we haven't downloaded any yet, the list is empty.
Pulling and Running an LLM
Next, let's pull a small, resource-efficient model suited to the modest Cloud Shell VM. You can choose a model from Docker's AI catalog on Docker Hub at hub.docker.com/u/ai. For this demonstration, we'll use a small Qwen model:
docker model pull ai/qwen3:0.6B-Q4_K_M
Once pulled, verify it by running the model interactively:
docker model run ai/qwen3:0.6B-Q4_K_M
This drops you into an interactive chat where you can ask questions and get the typical LLM answers (type /bye to exit).
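You can also pass a prompt as an argument to get a one-shot answer instead of an interactive session, which is handy in scripts:

docker model run ai/qwen3:0.6B-Q4_K_M "Summarize the fall of Rome in one sentence."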
Connecting to the Model
Docker Model Runner hosts an inference server that you can reach over plain HTTP. From inside Docker containers, the server is accessible through the default bridge gateway at:
http://172.17.0.1:12434/engines/v1
From the host itself (your Cloud Shell terminal, in this case), use:
http://localhost:12434/engines/v1
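A quick way to sanity-check both endpoints is to list the models the server exposes through the OpenAI-compatible /models route. From the host:

curl http://localhost:12434/engines/v1/models

And from inside a container (assuming the default bridge network; the gateway IP can differ in custom network setups):

docker run --rm curlimages/curl http://172.17.0.1:12434/engines/v1/models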
For example, to query your model via an OpenAI-compatible API:
curl http://localhost:12434/engines/v1/chat/completions \
  -H "Content-Type: application/json" \
  -d '{
    "model": "ai/qwen3:0.6B-Q4_K_M",
    "messages": [
      {"role": "system", "content": "You are a helpful assistant."},
      {"role": "user", "content": "Please write 100 words about the fall of Rome."}
    ]
  }'
The response will be JSON-formatted and include the completion text provided by the model, something like:
{
  "choices": [
    {
      "finish_reason": "stop",
      "index": 0,
      "message": {
        "role": "assistant",
        "content": "The fall of Rome marked the end of the Roman Empire, which had long been a dominant power in the Mediterranean. The decline was driven by internal struggles, including political instability and weakened central authority, alongside external pressures and shifting alliances. The collapse of the Empire had profound effects on Europe, shaping the course of medieval civilization. As Rome faded, its legacy endured through art, religion, and the enduring influence of its legacy on Western culture."
      }
    }
  ],
  "created": 1750411314,
  "model": "ai/qwen3:0.6B-Q4_K_M",
  "usage": {"completion_tokens": 252, "prompt_tokens": 32, "total_tokens": 284}
}
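In scripts you'll usually want just the completion text. With jq (pre-installed in Cloud Shell), you can extract it directly, for example:

curl -s http://localhost:12434/engines/v1/chat/completions \
  -H "Content-Type: application/json" \
  -d '{
    "model": "ai/qwen3:0.6B-Q4_K_M",
    "messages": [{"role": "user", "content": "Give me a one-line fact about ancient Rome."}]
  }' | jq -r '.choices[0].message.content'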
Conclusion
With Docker Model Runner, you can now easily run powerful LLMs locally on Windows, macOS, and Linux, significantly simplifying their integration into your development workflows. Whether you're using it for local experimentation or in CI environments, Docker Model Runner provides a straightforward way to add AI to your applications without breaking a sweat.