Using Docker to run Large Language Models (LLMs) locally? Yes, you heard that right. Docker is now much more than just running a container image. W...
The OCI artifact approach for model storage is the underrated gem here - treating models as first-class artifacts rather than bundling them in images is the right call. Version-pinned models, separate from app code, proper layer caching. This is how it should work.
One thing worth watching: Docker Model Runner is currently Mac-only (Apple Silicon). For teams that want the same workflow on Linux or ARM edge hardware, the OpenAI-compatible endpoint pattern generalizes well - llama.cpp exposes the same `/v1/chat/completions` interface natively, so swapping the base URL is all you need.

We built ClawBox (openclawhardware.dev) on exactly this pattern - Jetson Orin Nano running llama.cpp with the same OpenAI-compatible API, so any code you write against Docker Model Runner today works unchanged on our hardware. The "everything local, zero cloud dependency" thesis holds regardless of whether you are on Apple Silicon or NVIDIA CUDA.
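To make the "swap the base URL" point concrete, here is a minimal sketch using only the Python standard library. The base URLs and the model name are assumptions for illustration (check your server's docs for the real values); the point is that the request shape is identical against any OpenAI-compatible backend.

```python
import json
import urllib.request

# Example base URLs - assumptions, not official endpoints; adjust to your setup.
LLAMA_CPP_BASE = "http://localhost:8080/v1"


def chat_completions_url(base: str) -> str:
    """Build the shared OpenAI-style endpoint from any base URL."""
    return base.rstrip("/") + "/chat/completions"


def ask(base: str, model: str, prompt: str) -> str:
    """POST a chat request; the payload is the same regardless of backend."""
    payload = json.dumps({
        "model": model,
        "messages": [{"role": "user", "content": prompt}],
    }).encode()
    req = urllib.request.Request(
        chat_completions_url(base),
        data=payload,
        headers={"Content-Type": "application/json"},
    )
    with urllib.request.urlopen(req) as resp:
        body = json.load(resp)
    return body["choices"][0]["message"]["content"]
```

Moving from one backend to another is then just a different `base` argument; the application code never changes.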
The Docker Compose integration you mentioned in the follow-up article is going to be huge for multi-service local AI stacks. LLM service + app service + vector DB, all in one compose file. That is the workflow developers have been waiting for.
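For the curious, that stack could look roughly like this. This is a hypothetical sketch: the service names, images, and environment variables are assumptions, and the exact Model Runner integration syntax may differ, so check the Docker docs before copying it.

```yaml
# Hypothetical local AI stack in one compose file (names/images are examples).
services:
  app:
    build: .
    environment:
      LLM_BASE_URL: http://llm:8080/v1     # OpenAI-compatible endpoint
      VECTOR_DB_URL: http://qdrant:6333
    depends_on: [llm, qdrant]
  llm:
    image: ghcr.io/ggml-org/llama.cpp:server   # one possible local LLM server
    command: ["-m", "/models/model.gguf", "--port", "8080", "--host", "0.0.0.0"]
    volumes:
      - ./models:/models
  qdrant:
    image: qdrant/qdrant                   # example vector DB
```

One `docker compose up` and the whole local AI stack is running, with no cloud dependency.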
Thank you!
How is it better than using LLM Studio?
Performance-wise, they both wrap llama.cpp (so we should expect similar performance), but LM Studio is more mature, with support for multiple engines and more operating systems so far. Model Runner will get there, but it will take some time.
Integration with the rest of the Docker tooling (Compose, but also the Docker Engine for pipelines) is coming soon, and it will give Model Runner an advantage for developers: being well integrated into their development lifecycle.
🚀
Let's go!
Back in the day, Docker did the same thing with Kubernetes. You could just run K8s right there in Docker Desktop - it worked. Very handy for beginners.
I guess it's the same in this case too. 😄
100%. Thank you.
I would say this is true for a lightly managed machine, or a box you will just use to expose LLMs, where installing Ollama or LM Studio and keeping them up to date is not possible.
I'm not sure Ollama will give better perf since both Ollama and Docker Model Runner are running natively on the hardware.
Performance is very similar (of course). One of our Captains compared both in a blog post; you can check it out: connect.hyland.com/t5/alfresco-blo....
Thanks for the comment!