AI development is moving fast — but one thing that still slows developers down is relying on cloud APIs.
What if you could run powerful LLMs (Large Language Models) locally, just like spinning up a Docker container?
In this blog, I’ll show you how to:
- Run LLMs locally using Docker’s Model Runner
- Pull and run the Qwen 3 model
- Connect it to a .NET console app with Semantic Kernel
Let’s get started! 🚀
For a more in-depth walk-through, be sure to watch my YouTube video, where I demonstrate everything step by step!
🔹 Why Run LLMs Inside Docker?
Docker is already the standard for packaging and running applications. Now, with Docker Model Runner, you can:
- Pull AI models just like Docker images
- Run models locally without cloud APIs
- Expose OpenAI-compatible APIs to integrate seamlessly into apps
This means you can test, prototype, and even deploy AI workloads locally — cost-free and private.
🔹 Step 1: Enable Docker Model Runner
Open Docker Desktop → Settings → AI Features and enable:
- ✅ Docker Model Runner
- ✅ Host-side TCP support (on port 12434)
This allows models to expose REST APIs you can call from your apps.
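Before moving on, you can verify the runner is active. A quick check, assuming a recent Docker Desktop with the Model Runner plugin installed:
# Check that Docker Model Runner is running
docker model status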
🔹 Step 2: Pull and Run Qwen 3 Model
Open your terminal and run the following (Model Runner pulls models from Docker Hub's ai/ namespace):
# Pull the model
docker model pull ai/qwen3
# Run the model interactively
docker model run ai/qwen3
You’ll see an interactive chat session where you can ask questions.
Example:
> Hello
< Hi! How can I assist you today?
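You can also confirm the download from the CLI, using the same docker model plugin as above:
# List models available locally
docker model list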
🔹 Step 3: Use the REST API
Docker Model Runner exposes an OpenAI-compatible API at:
http://localhost:12434/engines/v1
For example, to list available models:
curl http://localhost:12434/engines/v1/models
You’ll see details of Qwen 3 and any other models you’ve pulled.
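Since the API is OpenAI-compatible, you can also call the chat completions endpoint directly with curl. A minimal sketch, assuming the model name from Step 2 (the exact path prefix may differ slightly across Docker versions):
curl http://localhost:12434/engines/v1/chat/completions \
  -H "Content-Type: application/json" \
  -d '{
        "model": "ai/qwen3",
        "messages": [
          { "role": "user", "content": "Explain Docker in one sentence." }
        ]
      }'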
🔹 Step 4: Connect with Semantic Kernel (.NET)
The best part: you don't need to rewrite your app.
Semantic Kernel already speaks the OpenAI API, and since Docker Model Runner exposes the same OpenAI-compatible endpoints, all you change is the base URL.
Install the Semantic Kernel package:
dotnet add package Microsoft.SemanticKernel
Create a Console App
#pragma warning disable SKEXP0010 // the custom-endpoint overload may be marked experimental in your SK version

using Microsoft.SemanticKernel;

var builder = Kernel.CreateBuilder();

// Point the OpenAI connector at the local Docker Model Runner instead of the cloud
builder.AddOpenAIChatCompletion(
    modelId: "ai/qwen3",
    apiKey: "dummy", // not validated locally, any value works
    endpoint: new Uri("http://localhost:12434/engines/v1")); // Docker endpoint

var kernel = builder.Build();

var result = await kernel.InvokePromptAsync("Explain Docker in simple terms.");
Console.WriteLine(result);
That’s it! Your .NET app is now talking to Qwen 3 inside Docker, through Semantic Kernel.
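If you'd rather have a back-and-forth chat than a one-shot prompt, you can resolve the chat service from the kernel and stream tokens as they arrive. A minimal sketch building on the kernel created above:
using Microsoft.SemanticKernel.ChatCompletion;

var chat = kernel.GetRequiredService<IChatCompletionService>();
var history = new ChatHistory();

while (true)
{
    Console.Write("You: ");
    var input = Console.ReadLine();
    if (string.IsNullOrWhiteSpace(input)) break;
    history.AddUserMessage(input);

    Console.Write("Qwen: ");
    var reply = "";
    // Stream the response token by token from the local model
    await foreach (var chunk in chat.GetStreamingChatMessageContentsAsync(history))
    {
        Console.Write(chunk.Content);
        reply += chunk.Content;
    }
    Console.WriteLine();
    history.AddAssistantMessage(reply);
}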
🔹 Why This Matters
- 🖥️ Local-first AI → Run LLMs without internet or billing
- 🔒 Privacy → Your data never leaves your machine
- ⚡ Developer-friendly → Same SDKs, same APIs, just a different base URL
- 🔗 Drop-in replacement → Move between local and cloud seamlessly (see the sketch below)
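For example, a single environment variable can flip the same app between the local runner and the cloud. A hedged sketch (LOCAL_LLM and the cloud model id are my own example choices, not part of any SDK; same pragma note as in Step 4):
using Microsoft.SemanticKernel;

// LOCAL_LLM is a hypothetical switch for this example
var useLocal = Environment.GetEnvironmentVariable("LOCAL_LLM") == "1";

var builder = Kernel.CreateBuilder();
if (useLocal)
    builder.AddOpenAIChatCompletion(
        modelId: "ai/qwen3",
        apiKey: "dummy",
        endpoint: new Uri("http://localhost:12434/engines/v1")); // local Docker endpoint
else
    builder.AddOpenAIChatCompletion(
        modelId: "gpt-4o-mini", // example cloud model
        apiKey: Environment.GetEnvironmentVariable("OPENAI_API_KEY")!);

var kernel = builder.Build();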
✅ Final Thoughts
Running AI models is now as easy as running containers.
With Docker Model Runner, you can:
- Pull Qwen 3
- Run it locally
- Connect it to .NET apps with Semantic Kernel
This opens up endless possibilities for building intelligent apps without cloud dependencies.
👉 What model are you planning to run first with Docker?
Drop a comment — I’d love to know!