DEV Community: Pradumna Saraf

Run AI Coding Agents Safely with Docker Sandboxes

Pradumna Saraf — Wed, 03 Jun 2026 13:15:02 +0000

AI agents can run commands, modify files, and download files from untrusted sources directly on a developer machine, which creates a major security risk. We need a way to run agents safely and isolate how they interact with the network, files, host system, etc.

Docker Sandboxes solves this problem by creating isolated microVM environments where AI agents run safely, with guardrails in place, without affecting the host system. Docker Sandboxes supports Claude Code, Codex, Cursor, etc. A complete list of agents can be found here.

Prerequisites

macOS Sonoma (version 14) or later
Apple silicon
Prior experience with AI agents

Getting Started

Installing the Sandboxes CLI

Sandboxes have their own CLI. To install the sbx CLI on the system, execute the following command. We are using Homebrew, as we are on Mac. For other OSes, look at the documentation.

 brew install docker/tap/sbx

Once you have installed the CLI, execute the login command:

 sbx login

It will open a browser for the Docker OAuth. It's a one-time thing.

Setting the network policy

Since the sandboxes are network-isolated from the host, we can set network policy controls on what a sandbox can access over the network. This is one of the key reasons we are using them.

To set the network policy, we have to execute the following command:

sbx policy reset

You will be prompted to select a default network policy. Choose one based on how open or strict you want your agents' network access to be.

I will select Balanced, as it is a good starting point that we can modify later. By default, Balanced allows AI provider APIs, package managers, code hosts, container registries, and common cloud services. We can extend it with a command, which we will see in a later section.

If we choose Open, all traffic is allowed without any restriction. Locked Down blocks all outgoing traffic, and we need to explicitly allow everything we need. If we want to be really restrictive, Locked Down is the way.

To list which policies are in effect, we can run the command below:

sbx policy ls

We get the output of all the domains that are allowed.

Authenticating the agent

Before we use any agent, it needs credentials for its model provider. Most agents work with an API key. For agents like Claude Code, if you have a Claude subscription, you can sign in with OAuth by running /login. It is much more convenient: no passing API keys around and no upfront setup.

We will use /login, as we have a Claude subscription. For providers that don't have that facility, or when an API key is more convenient, we can use the secret set sub-command.

For example, for OpenAI, it will look like this:

sbx secret set -g openai # Global
sbx secret set my-sandbox openai # Project level

You can set secrets at the global level or the project level. A global secret is available to all projects, while a project-level secret is available only to that particular project.

We haven't discussed projects inside the sandbox yet; that is the next step, because we need to set up authentication first. For now, we can set the secret globally, and later remove or change it depending on our needs.

Once you execute the above command, it will prompt you to enter the secret. Enter it, and it will be saved.

Now, to list all the credentials and their scope, execute the command below:

 sbx secret ls

And to remove a credential:

 sbx secret rm -g openai

As part of the security model, the real credential stays on the host; the sandbox sees only a sentinel value. You can learn more about how credential injection works and how custom secrets work here.
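
To picture the idea of a sentinel value, here is a minimal TypeScript sketch of one way such a scheme can work. It is not the Docker Sandboxes implementation: the sentinel format, the hostSecrets map, and the injectCredential function are all hypothetical names made up for illustration.

```typescript
// Hypothetical sketch: none of these names or formats come from Docker Sandboxes.
const SENTINEL = "sbx-sentinel-openai";

// Secrets live on the host only, keyed by the sentinel handed to the sandbox.
const hostSecrets = new Map<string, string>();
hostSecrets.set(SENTINEL, "real-key-kept-on-host");

// Host-side step: swap a known sentinel in an Authorization header for the
// real credential; leave anything else untouched.
function injectCredential(authHeader: string): string {
  const token = authHeader.replace(/^Bearer /, "");
  const real = hostSecrets.get(token);
  return real === undefined ? authHeader : "Bearer " + real;
}

// Inside the sandbox, the agent only ever holds the sentinel.
const sandboxHeader = "Bearer " + SENTINEL;
const outgoingHeader = injectCredential(sandboxHeader);
console.log(sandboxHeader, "->", outgoingHeader);
```

The point of the design is that even if the agent prints or leaks its environment, only the sentinel is exposed, never the real key.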

Creating a project and running the sandbox

Whenever you start a sandbox, it will create a project. In simple words, projects keep things separate when we are using multiple agents, whether from different providers or the same one.

Now we are all set to create our first project. First, we need to create a directory. Let's do that by executing the command:

mkdir my-project && cd my-project

Then let's finally run a sandbox by executing the command below. As I am using Claude, I will provide Claude as a provider. Depending on your provider, you just need to change the provider name.

sbx run claude

When you run it, it will start pulling the agent image, which might take a little longer on the first run. Subsequent runs reuse the cached image and start in seconds.

Now we can give some prompts and see if it's working or not.

And it's working!

To test whether it respects the network policy, let's prompt it to fetch information from one domain that is blocked by default and one that is on the allow list.

You can see in the above image that it respected the policy. When I asked it to fetch info from my own website domain, pradumnasaraf.dev, it got a 403 Forbidden error, while it was able to fetch from github.com because that domain is in the default allow list.

So, it's working as expected!

To see all the sandboxes that are running, execute the following ls command:

sbx ls

Managing the network policy

As we set above, Balanced is the default network policy. We can allow additional domains as we need them, so the sandbox can reach only the domains we want it to access.

To allow a domain, we use policy allow like this:

 sbx policy allow network -g pradumnasaraf.dev

And a Policy ID will get printed. The ID can be used for removing the policy completely if we ever want to!

Now that it's allowed, let's try again to fetch details from our domain.

And this time it worked. You can see in the above image that it got a 200 and fetched all the details. You can verify that the domain is in the allow list by running sbx policy ls.

Above, we allowed our domain at the global level, but just like with secrets, you can also do it at the project level!
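
To make the 403-then-200 behaviour concrete, here is a minimal TypeScript sketch of an allow-list check. It is only a mental model, not how sbx evaluates policies: the AllowRule type, the checkRequest function, the sandbox name my-sandbox, and the subdomain matching are all assumptions made for illustration.

```typescript
// Illustrative model only; these names do not come from the sbx CLI.
const GLOBAL_SCOPE = "global";

interface AllowRule {
  domain: string; // e.g. "github.com"
  scope: string; // GLOBAL_SCOPE or a sandbox name
}

// Assumption: a rule matches the exact host or any subdomain of it.
function matchesDomain(host: string, domain: string): boolean {
  return host === domain || host.endsWith("." + domain);
}

// Returns the HTTP-style status the agent would observe for a request.
function checkRequest(host: string, sandbox: string, rules: AllowRule[]): number {
  const allowed = rules.some(
    (rule) =>
      (rule.scope === GLOBAL_SCOPE || rule.scope === sandbox) &&
      matchesDomain(host, rule.domain)
  );
  return allowed ? 200 : 403;
}

// Start with one default entry, then add our own domain globally.
const policyRules: AllowRule[] = [{ domain: "github.com", scope: GLOBAL_SCOPE }];
const statusBefore = checkRequest("pradumnasaraf.dev", "my-sandbox", policyRules);
policyRules.push({ domain: "pradumnasaraf.dev", scope: GLOBAL_SCOPE });
const statusAfter = checkRequest("pradumnasaraf.dev", "my-sandbox", policyRules);
console.log(statusBefore, statusAfter);
```

A project-level rule would carry the sandbox name as its scope instead of the global marker, so it would only match requests coming from that sandbox.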

Interactive mode

One of my favourite things is that we can also run Sandbox in interactive mode and do similar things there, like managing projects, attaching the agent, opening the shell, and managing the network policy.

That was it. That's how you can run your AI coding agents safely with Docker Sandboxes.

As always, I'm glad you made it to the end. Thank you for your support and reading. I regularly share tips on Twitter. You can connect with me there.

Run Claude Code Locally for Free with Docker Model Runner

Pradumna Saraf — Tue, 12 May 2026 03:56:26 +0000

We know that Claude Code is phenomenal for development and coding. But we can easily run out of tokens, and it quickly becomes expensive as the project becomes more complex. What if we could keep all the good parts of Claude Code, but use local models instead of the cloud ones from Anthropic?

Another reason to use local models is that we may have something proprietary or private that we don't want to expose to cloud models, or we may be on a flight with no internet connection.

This is where Docker Model Runner is really useful: it helps us run LLMs locally on our machine very easily, and then we will do some configuration to make it work with Claude Code.

Prerequisites

Before we begin, make sure you have:

Docker Desktop or Docker Engine installed.
Docker Model Runner enabled.
Claude Code installed and ready to go.

If you're on Docker Desktop, head over to Settings > AI and enable TCP access for Model Runner.

Or, if you prefer the terminal:

docker desktop enable model-runner --tcp 12434

Getting Started

1. Choosing and pulling a local model

There are loads of LLMs to choose from. I'll go with ai/phi4:14B-Q4_K_M, but you can pick whatever your machine can handle.

You can find all the models here in the Docker Hub AI catalogue. Whenever you choose a model, make sure it is good at coding.

To pull the model, execute the command below. Pull time depends on the size of the model.

docker model pull ai/phi4:14B-Q4_K_M

2. Checking the connection

Using docker model sub-commands, we can check various things like the status and the model we have pulled. It's very similar to how we work with Docker images and containers.

Run the commands below to check the model status and the list of models we have:

docker model status
docker model ls

3. Testing the endpoint

Before we jump into using Claude Code, we should confirm that the API is actually responding. We can send a request to the /v1/messages endpoint with curl to check that.

This is what the curl command looks like. Let's execute it in the terminal.

curl http://localhost:12434/v1/messages \
  -H "Content-Type: application/json" \
  -d '{
    "model": "ai/phi4:14B-Q4_K_M",
    "max_tokens": 100,
    "messages": [{"role": "user", "content": "Hello!"}]
  }'

We will get a JSON response from the model. I am using jq to format the output.
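
For scripted checks, the same request can be sketched in TypeScript. This is a minimal sketch, assuming a runtime with a global fetch (for example, a recent Node.js) and Model Runner listening on localhost:12434; the MessagesRequest type and the buildMessagesRequest and sendMessage helpers are names I made up, not part of any SDK.

```typescript
// Request body shape, mirroring the curl example (helper names are hypothetical).
interface MessagesRequest {
  model: string;
  max_tokens: number;
  messages: { role: "user" | "assistant"; content: string }[];
}

// Builds the same JSON payload that the curl command sends.
function buildMessagesRequest(model: string, prompt: string): MessagesRequest {
  return { model, max_tokens: 100, messages: [{ role: "user", content: prompt }] };
}

// Posts the payload to /v1/messages and fails loudly on a non-2xx status.
async function sendMessage(baseUrl: string, body: MessagesRequest): Promise<unknown> {
  const response = await fetch(baseUrl + "/v1/messages", {
    method: "POST",
    headers: { "Content-Type": "application/json" },
    body: JSON.stringify(body),
  });
  if (!response.ok) {
    throw new Error("Model Runner returned HTTP " + response.status);
  }
  return response.json();
}

const helloRequest = buildMessagesRequest("ai/phi4:14B-Q4_K_M", "Hello!");
console.log(JSON.stringify(helloRequest, null, 2));
// To actually send it (needs the model pulled and TCP access enabled):
// sendMessage("http://localhost:12434", helloRequest).then(console.log);
```

Keeping the payload builder separate from the network call makes it easy to reuse the same body for other models by changing only the model name.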

4. Pointing Claude Code at the local endpoint

It is very simple. We just need to tell Claude Code to use the local API instead of Anthropic's. We can do it by setting up a variable and a model name.

Set the ANTHROPIC_BASE_URL environment variable to the Docker Model Runner endpoint and pass the name of the model we pulled with --model. Execute the command below:

ANTHROPIC_BASE_URL=http://localhost:12434 claude --model ai/phi4:14B-Q4_K_M

That's about it. Claude Code is now pointed at, and running against, your local model. You can also see which model Claude Code is using.

5. Adding the Shell config

As we know, the environment variable ANTHROPIC_BASE_URL is not persistent, and only lives for that session of the terminal. Setting the env variable every time is annoying.

To make it permanent, so that every time you open a new terminal you have the same setup, add the line below to your shell config (~/.zshrc, ~/.bashrc, etc.):

export ANTHROPIC_BASE_URL=http://localhost:12434

After you've done this, restart your terminal. Claude Code will now always use your local endpoint when you pass --model.

6. Using Claude Code

Now, everything is set. Let's run Claude Code. To run with the local model, pass the same model flag and name, like this:

claude --model ai/phi4:14B-Q4_K_M

And we can give some simple tasks to check it's working:

7. Watching the requests flow

If you are a bit nerdy like me and want to see what is happening under the hood, you can actually watch every request Claude Code makes to your local model.

To do that, execute the command below:

docker model requests --model ai/phi4:14B-Q4_K_M

Again, we used jq for better formatting.

8. What next?

The default context size on most models is fine for small tasks, but Claude Code reads a lot of files. For big project work, you'll want more headroom and a bigger context.

For example, to package gpt-oss with a 32k context window:

docker model pull ai/gpt-oss
docker model package --from ai/gpt-oss --context-size 32000 gpt-oss:32k

Then run Claude Code with the new variant:

claude --model gpt-oss:32k

And this is the game: we keep trying and experimenting with different models and context sizes until we find a perfect model for a task.

That was it. That's how you can run Claude Code completely locally with Docker Model Runner.

Give it a try, and let me know what model works best for you.


Using Profiles with Docker Compose

Pradumna Saraf — Thu, 29 Jan 2026 04:19:39 +0000

Most applications don’t need every Docker Compose service running all the time alongside the core application. Development tools for monitoring and debugging are a good example. In a full-stack application, we want the backend, database, and maybe a frontend running by default, and keep monitoring and debugging tools turned off until we need them.

Docker Compose profiles make this easy. They save us from creating and managing multiple Compose files with different configurations. With this approach, we can keep a single file as the single source of truth.

Monitoring with Compose profiles

Let's look at a real-world example to understand how we can use profiles with Docker Compose. Let's say we have a full-stack application with a backend, database, and a frontend. We want to run the backend, database, and frontend by default, and keep monitoring and debugging tools turned off until we need them.

services:
  backend:
    image: node:20-alpine
    command: npm run start

  frontend:
    image: node:20-alpine
    command: npm run dev

  db:
    image: mysql:8

  prometheus:
    image: prom/prometheus
    profiles: [monitoring]

  grafana:
    image: grafana/grafana
    profiles: [monitoring]

  phpmyadmin:
    image: phpmyadmin
    depends_on: [db]
    profiles: [debug]

In this setup, the backend, frontend, and database form the core of the application and are started by default because we didn't assign any profiles to them.

Prometheus and Grafana are grouped under the monitoring profile, as we only need them if we want to look at the metrics or performance. phpMyAdmin is grouped under the debug profile, as it’s only necessary when we need to debug database issues.

So, by default, only the core services start, and the monitoring and debugging tools are turned off.

docker compose up

When we need monitoring and debugging, we can start the services by using the --profile flag.

docker compose --profile monitoring up # to start monitoring tools
docker compose --profile debug up # to start debugging tools

We can also combine the profiles to start multiple services at once:

docker compose --profile monitoring --profile debug up
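
The selection rule behind these commands can be sketched in a few lines of TypeScript: a service with no profiles always starts, and a service with profiles starts only when at least one of them is active. This is a simplified model of the behaviour described above, not Compose's own code; the ComposeService type and the servicesToStart function are names I made up.

```typescript
// Simplified model of profile selection; names here are illustrative.
interface ComposeService {
  serviceName: string;
  profiles: string[]; // empty when the service has no "profiles" key
}

// Mirrors the example Compose file from this post.
const composeServices: ComposeService[] = [
  { serviceName: "backend", profiles: [] },
  { serviceName: "frontend", profiles: [] },
  { serviceName: "db", profiles: [] },
  { serviceName: "prometheus", profiles: ["monitoring"] },
  { serviceName: "grafana", profiles: ["monitoring"] },
  { serviceName: "phpmyadmin", profiles: ["debug"] },
];

// A service starts if it has no profiles or shares one with the active set.
function servicesToStart(all: ComposeService[], active: string[]): string[] {
  return all
    .filter(
      (svc) =>
        svc.profiles.length === 0 ||
        svc.profiles.some((profile) => active.indexOf(profile) !== -1)
    )
    .map((svc) => svc.serviceName);
}

console.log(servicesToStart(composeServices, []).join(","));
// backend,frontend,db
console.log(servicesToStart(composeServices, ["monitoring"]).join(","));
// backend,frontend,db,prometheus,grafana
```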

That’s it. This approach keeps your Compose file clean and avoids running unnecessary services.


Improving Container Security with Docker Hardened Images

Pradumna Saraf — Mon, 22 Dec 2025 05:30:21 +0000

Container security remains a significant concern. Base images are often bloated and contain unnecessary tools and packages. Because of this, container images like Node, Ubuntu, etc., have a large attack surface in production. More packages = more CVEs, and it’s hard to track what goes into an image and which tool is affected by vulnerabilities. And we have recently heard about a lot of attacks on various companies and tools.

Yes, security scanners do their job and report these issues, but they don’t reduce the attack surface by themselves. That still depends on the image you choose. We need to STOP the habit of “scan and fix later”, reduce the risk, and make things secure at the image level. To deal with that, Docker made Docker Hardened Images (DHI) FREE for everyone (here is a blog release for this). When DHI was first released, it was behind a paywall, and the Docker team thought security should be available to everyone so that everyone can make their applications secure.

In this blog, we will look at what DHI is and the problem it solves. Then I will walk you through a demo application to show how to use DHI with your current Dockerfile and workflow. Finally, we will compare a standard image with DHI to get a clear picture of its potential and necessity.

What is a Docker Hardened Image (DHI)?

Docker Hardened Image or DHI (we will be calling it “DHI” throughout the blog) is a base image that is open source, ultra-minimal, with near-zero CVEs and full transparency (SBOMs and provenance), and built on top of distros like Alpine and Debian with SLSA Build L3. So, using DHI reduces the attack surface, makes production secure by default, and means less noise on the scanning side and fewer things to manage.

To make discovery, usability, and transparency simpler, Docker built a dedicated DHI catalogue. There are thousands of hardened images, with various versions for each tool or language. You can visit dhi.io (yes, they went ahead and got this domain ^^, how cool is that).

One of my favourite features in the whole DHI catalogue is the “Tool Included” section on the website. In many images, you will find a dedicated column on the right with a list of tools included in that image. This brings a lot of transparency and ease.

There are a lot of Hardened images in the market, and calling an image “Hardened” actually does not make it hardened. Here is a really nice comparison of DHI vs Others:

Source: Docker.com

Using the Docker Hardened Image

Using the DHI is the same as using the Docker official images. There is no change in how we used to specify the base image in the Dockerfile, and the command to build and run images. It’s the same workflow that we use every day. The only thing that has changed is the base image naming, which now points to a new dhi.io registry.

We will see all in detail in this demo. For that, I have created a simple Node-Express demo repo. You can clone and keep it handy if you want to follow along and test it out. Once you are done with that, first, we need to sign in to the DHI registry.

To do that, execute the command below, and you will be prompted to enter your Docker Hub username and password. Use your personal access token as your password.

docker login dhi.io

Once you have successfully logged in, we can pull the DHI for Node.js, as our project uses Node. We will be pulling the Node 22 DHI, and the image for that is dhi.io/node:22. The 22 here refers to the latest 22.x version of Node. If you are using a different version of Node, you can check the catalogue here and change it accordingly.

To pull the image, we use the same docker pull command:

 docker pull dhi.io/node:22

Now, let’s modify our Dockerfile to make use of DHI instead of the standard official image. You will find the Dockerfile in the cloned repo as well.

We have to make a couple of changes to make the application work with DHI and be secure. The current Dockerfile looks like the one below; it’s a simple, typical Dockerfile:

# Use the official Node.js image
FROM node:22

WORKDIR /app

# Copy package files and install dependencies
COPY package*.json ./
RUN npm install --production

# Copy application code
COPY . .

EXPOSE 3000

ENV PORT=3000
ENV NODE_ENV=production

CMD ["npm", "start"]

The modifications are changing the base image in the FROM statement and using a multi-stage Docker build (multi-stage builds are not mandatory with DHI in general, but this particular case needs one; we will see why).

After modification, the Dockerfile will have the structure below. Let’s understand the modifications in more detail.

# Build stage - use regular node image for building
FROM node:22 AS builder

WORKDIR /app

COPY package*.json ./
RUN npm install --production

COPY . .

# Production stage - use DHI
FROM dhi.io/node:22

WORKDIR /app

# Copy node_modules and app files from builder
COPY --from=builder /app/node_modules ./node_modules
COPY --from=builder /app/package*.json ./
COPY --from=builder /app/server.js ./

EXPOSE 3000

ENV PORT=3000
ENV NODE_ENV=production

CMD ["node", "server.js"]

First, why do we use a multi-stage build? Yes, it makes the image minimal and more secure, and that’s its biggest advantage. But in this case, it becomes mandatory: for security, the Node DHI does not ship a shell by default, so a RUN instruction will not work when we try to install the npm dependencies.

That is why, in the builder stage, we used the node:22 image and RUN to install all the dependencies, and then in the production stage, we used the Node DHI for version 22.x as the base image (FROM). Docker made it simple to use DHI: just prefix the base image name with dhi.io/, and it will use DHI instead of the Docker official image (make sure you first check availability).

Then, finally, we simply copied the artefacts like node_modules, package.json, and server.js from the builder to production and exposed the port and ran the server. The gist is that nothing has changed. Only the naming convention for the base image has changed.

Comparing and scanning the images

Of course, we can’t end the blog without comparing the images and getting those numbers. We developers love numbers :) So, I built two images. The first image, pradumnasaraf/node-without-dhi, is built with the first Dockerfile above and uses node:22. The second image, pradumnasaraf/node-with-dhi, is built with the other Dockerfile, which uses the dhi.io/node:22 DHI.

Then I used Docker Scout and ran docker scout quickview for both images to check how vulnerable each image is, and the result is expected, but still shocking. In the screenshot below, the number of High and Medium vulnerabilities in the first image is massive. The DHI-based image has just 8 Low, which is a huge leap in overall security!

Note: This is not the end of the security/vulnerability optimisation :) This was just to demo what and how to use DHI. The Dockerfile can be improved further by introducing best practices, such as running containers as a non-root user, tightening permissions, etc.

That was it. That’s how you use Docker Hardened Images to make your container secure. Remember, most of the container security issues start with the base image, and fixing them later is like never fixing them. Docker is here again, with their Hardened image, saving the developers and keeping the experience and the workflow simpler. Again, thanks, team Docker, for making this available to everyone :).


Running AI Models with Docker Compose

Pradumna Saraf — Tue, 19 Aug 2025 04:39:59 +0000

Docker Compose has completely changed the game in how we run and connect a multi-service application. Just execute a single line of command, and everything is up and running, and all the services are well interconnected.

When Docker introduced the Docker Model Runner (or DMR, as we call it internally at Docker), there was a missing piece (at least for me). To use an AI model with a Compose application, we needed to run the model separately with DMR and then connect our Compose application service by passing the config of that running model.

But Docker knew this and sorted it out by adding the capability to describe an AI model in YAML, in compose.yml, to run and destroy the AI model on demand. Just as we write the configuration for services, networks, and volumes, we can do the same for AI models with models.

Prerequisite

Docker and Docker Compose are installed
Understanding of AI and LLMs

Getting Started

Let’s get started. To have a better understanding of the concept and working, I have created a GitHub project: Pradumnasaraf/Saraf-AI (Yes, it’s my last name “Saraf” and I added “AI” to it :)). It’s a Next.js chat application that communicates with the Docker AI Model with the help of the OpenAI framework. You can clone it down and keep it ready; we will be referencing that many times.

Docker Compose AI models component

First, let’s have a look at the compose.yml. Alongside the familiar services, volumes, etc., we have defined models as a top-level element. This is the new element for defining AI models.

So what we have done is define a service named saraf-ai that uses the model llm, and a top-level models element with a definition for llm that references the ai/smollm2 model image.

The complete config can be found in compose.yml in the root of the repo.

# compose.yml
services:
  saraf-ai:
    build:
      context: .
      dockerfile: Dockerfile
    ports:
      - 3000:3000
    # Models to run
    models:
      - llm

models:
  # Model Name
  llm:
    # Model Image
    model: ai/smollm2

Now we understand how the config looks, but how can our app connect and communicate with this AI model? How are we setting up environment variables like model name, URL and API key, as we will be using the OpenAI specification?

This is where Docker shines!

As we add config to use a model in a service, Docker will auto-generate and inject two environment variables into our service application based on the model name (in our case, llm). So the two variables will be:

LLM_MODEL: Contains the model name.
LLM_URL: Contains the model endpoint to communicate with.

Now we can reference these in our application and use them. If that sounds confusing, you can read more about them here.
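
As a quick illustration of the naming, here is a small TypeScript sketch that derives the default variable names from a model key. The llm case matches the names listed above; the handling of hyphens (replacing them with underscores) is my assumption for keys like embedding-model, and the defaultModelEnvVars function is a hypothetical helper, not something Compose exposes.

```typescript
// Hypothetical helper; it only models the naming pattern described above.
interface ModelEnvVars {
  endpointVar: string;
  modelVar: string;
}

// Assumption: the key is upper-cased and hyphens become underscores,
// then _URL and _MODEL are appended.
function defaultModelEnvVars(modelKey: string): ModelEnvVars {
  const prefix = modelKey.toUpperCase().replace(/-/g, "_");
  return { endpointVar: prefix + "_URL", modelVar: prefix + "_MODEL" };
}

const llmVars = defaultModelEnvVars("llm");
console.log(llmVars.endpointVar, llmVars.modelVar);
// LLM_URL LLM_MODEL
```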

Also, if we are using multiple AI models, we may want to explicitly define how the variables are named. For example, we are defining two models below.

services:
  app:
    image: my-app
    models:
      llm:
        endpoint_var: AI_MODEL_URL
        model_var: AI_MODEL_NAME
      embedding-model:
        endpoint_var: EMBEDDING_URL
        model_var: EMBEDDING_NAME

models:
  llm:
    model: ai/smollm2
  embedding-model:
    model: ai/all-minilm

Now, instead of the default LLM_URL and LLM_MODEL, the application will be injected with AI_MODEL_URL and AI_MODEL_NAME. And for embedding-model, it will inject EMBEDDING_URL and EMBEDDING_NAME.

Now, let’s look at our Next.js application.

Application config

We have created a Next.js application and are using the OpenAI framework (which is standard in the industry) to communicate with the Docker AI model. And it will automatically pick up those environment variables that Docker injected into the application.

We don’t need a real apiKey, as the model runs locally and is not a cloud LLM with quotas; the code passes a placeholder value.

Below is the complete code. You will also find it in the src/app/api/chat/route.ts file.

import OpenAI from 'openai';
import { NextResponse } from 'next/server';

const openai = new OpenAI({
  baseURL: process.env.LLM_URL || '',
  apiKey: "key-not-needed"
});

const model = process.env.LLM_MODEL || '';

export async function POST(req: Request) {
  try {
    const { message, messages } = await req.json();

    // Validate input
    if (!message || typeof message !== 'string') {
      return NextResponse.json(
        { error: 'Message is required and must be a string' },
        { status: 400 }
      );
    }

    if (!Array.isArray(messages)) {
      return NextResponse.json(
        { error: 'Messages must be an array' },
        { status: 400 }
      );
    }

    const stream = await openai.chat.completions.create({
      messages: [...messages, { role: 'user', content: message }],
      model,
      stream: true,
      temperature: 0.7,
      max_tokens: 2000,
    });

    return new Response(
      new ReadableStream({
        async start(controller) {
          try {
            for await (const chunk of stream) {
              const text = chunk.choices[0]?.delta?.content || '';
              if (text) {
                controller.enqueue(new TextEncoder().encode(text));
              }
            }
            // Close only after a clean finish; calling close() on a
            // controller that has already errored would throw
            controller.close();
          } catch (streamError) {
            console.error('Streaming error:', streamError);
            controller.error(streamError);
          }
        },
      }),
      {
        headers: {
          'Content-Type': 'text/plain; charset=utf-8',
          'Cache-Control': 'no-cache',
          'Connection': 'keep-alive',
        },
      }
    );
  } catch (error: unknown) {
    console.error('OpenAI API error:', error);

    const errorMessage = error instanceof Error ? error.message : 'Unknown error';
    const errorStatus = (error as { status?: number })?.status || 500;

    return NextResponse.json(
      { 
        error: 'Failed to get response from AI',
        details: errorMessage,
      },
      { status: errorStatus }
    );
  }
}
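
The route above falls back to empty strings when LLM_URL or LLM_MODEL is missing, which only surfaces as an error on the first request. If you prefer to fail early, one option is a small validation helper like the sketch below. The readModelConfig function and the ModelConfig type are hypothetical names, not part of the repo, and the URL in the usage line is a placeholder rather than the real injected endpoint.

```typescript
// Hypothetical helper, not part of the Saraf-AI repo.
interface ModelConfig {
  baseURL: string;
  model: string;
}

// Reads the two injected variables and throws if either is missing or empty.
function readModelConfig(env: { [key: string]: string | undefined }): ModelConfig {
  const baseURL = env["LLM_URL"];
  const model = env["LLM_MODEL"];
  if (!baseURL || !model) {
    throw new Error("LLM_URL and LLM_MODEL must be set; start the app with docker compose up");
  }
  return { baseURL, model };
}

// Usage with placeholder values; in the app you would pass process.env.
const exampleConfig = readModelConfig({
  LLM_URL: "http://example.invalid/v1",
  LLM_MODEL: "ai/smollm2",
});
console.log(exampleConfig.model);
```

Taking the environment as a parameter, instead of reading process.env directly, keeps the helper easy to test.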

Dockerizing the application

Now, let’s Dockerize our application. For that, we will create a Dockerfile.

You will find the Dockerfile in the root of the project.

# Build stage
FROM node:24-alpine AS builder

WORKDIR /app

# Copy package files
COPY package*.json ./

# Install dependencies
RUN npm ci --only=production

# Copy source code
COPY . .

# Build the application
RUN npm run build

# Production stage
FROM node:24-alpine AS runner

WORKDIR /app

# Create a non-root user
RUN addgroup --system --gid 1001 nodejs
RUN adduser --system --uid 1001 nextjs

# Copy built application from builder stage
COPY --from=builder /app/public ./public
COPY --from=builder /app/.next/standalone ./
COPY --from=builder /app/.next/static ./.next/static

# Set ownership to nextjs user
RUN chown -R nextjs:nodejs /app

USER nextjs

EXPOSE 3000

ENV PORT=3000
ENV HOSTNAME="0.0.0.0"

CMD ["node", "server.js"]

We have implemented a couple of best practices, such as multi-stage builds and a non-root user, to make the container image smaller, faster, and more secure.

Once we are done with that, let’s run the Compose application by executing the docker compose up command in the terminal. You will see output similar to the screenshot.

Now, we can head over to localhost:3000 in our browser and test out the application. You will have a chat window like ChatGPT. Type your prompt and ask questions.

Here is a short demo:

That was it. That’s how you can run AI models with Docker Compose.


Run MCP Servers In Seconds With Docker

Pradumna Saraf — Mon, 23 Jun 2025 07:55:10 +0000

Model Context Protocol (MCP) has taken the AI world by storm. It has become the de facto standard for how an AI agent connects with tools, services, and data. Even as the space evolves rapidly, setting up and working with different MCP servers is still not an easy task, and it comes with a learning curve. Docker has a track record of making developers’ lives easier so they can build and ship things faster, and it is now stepping into the MCP space, bringing that same clarity, trust, and scalability. That’s exactly what Docker is doing with the introduction of Docker MCP Catalog and Docker MCP Toolkit, following Docker Model Runner (if you haven’t checked it out, here is the link).

In this blog, we will first understand what Docker MCP Catalog and MCP Toolkit are. Then we will see, step by step, how to use Docker MCP Toolkit from Docker Desktop to interact with various tools using MCP clients offered by Claude, Cursor, etc.

What is Docker MCP Catalog?

Docker MCP Catalog is a trusted collection of MCP servers. It currently has verified tools from 100 verified publishers (and the number keeps growing as I write this) like Stripe, Elastic, Grafana, etc. The tools are packaged just like container images, which means we can pull and use them with the traditional pull mechanism (or use MCP Toolkit for UI perks, more on that later) without the hassle of finding and configuring them manually.

What is Docker MCP Toolkit?

With Docker MCP Toolkit, with a single click of a button from Docker Desktop, we can spin up MCP servers in seconds and connect them to our favourite clients like Claude, Cursor, Windsurf, Docker AI Agent, etc. The way it works is that a Gateway MCP Server is created and dynamically exposes enabled tools to compatible clients. This makes it so easy to manage all the tools in one place.

Using Docker MCP Toolkit

Let’s now test Docker MCP Toolkit. Make sure you have the latest version of Docker Desktop. My current version of Docker Desktop (Mac) is 4.43.0 (196668). Once you open it, you will see the MCP Toolkit button on the sidebar. Initially, it was shipped as an extension; now it’s baked into Docker Desktop itself.

Now let’s install/turn on some MCP servers like curl and Wikipedia. You can search and add it. It’s that simple. It’s really handy to add and remove when needed. No copying and pasting of manual config, and managing them.

Now, let’s connect the Dockerized MCP servers to our MCP clients. I will be using Claude; you can use any client you prefer. We simply need to click on the Connect button, and it will automatically add the Docker configuration to Claude Desktop's MCP server config file, claude_desktop_config.json. The same goes for other MCP clients.

Let’s open Claude and take a look. You will find it under Settings > Developer > MCP_DOCKER. As you can see, it’s running, which means everything is correctly configured. If we click on the Edit Config button, we can see the config and how it works.

Config:

{
   "mcpServers":{
      "MCP_DOCKER":{
         "command":"docker",
         "args":[
            "mcp",
            "gateway",
            "run"
         ]
      }
   }
}
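To make the shape of this config concrete, here is a minimal Go sketch that parses it and prints the command line a client would launch for the MCP_DOCKER entry. The type and function names (clientConfig, launchCommand) are made up for illustration, and the sketch only assumes the fields visible in the config above; it is not how Claude Desktop itself is implemented.

```go
package main

import (
	"encoding/json"
	"fmt"
	"strings"
)

// clientConfig mirrors only the fields shown in the config above.
type clientConfig struct {
	MCPServers map[string]struct {
		Command string   `json:"command"`
		Args    []string `json:"args"`
	} `json:"mcpServers"`
}

// sampleConfig is the config from the article, compacted onto one line.
const sampleConfig = `{"mcpServers":{"MCP_DOCKER":{"command":"docker","args":["mcp","gateway","run"]}}}`

// launchCommand returns the command line for the named server entry.
func launchCommand(raw []byte, name string) (string, error) {
	var cfg clientConfig
	if err := json.Unmarshal(raw, &cfg); err != nil {
		return "", err
	}
	srv, ok := cfg.MCPServers[name]
	if !ok {
		return "", fmt.Errorf("no server named %q", name)
	}
	return strings.Join(append([]string{srv.Command}, srv.Args...), " "), nil
}

func main() {
	cmd, err := launchCommand([]byte(sampleConfig), "MCP_DOCKER")
	if err != nil {
		fmt.Println("error:", err)
		return
	}
	fmt.Println(cmd) // prints: docker mcp gateway run
}
```

In other words, the client does not talk to each tool separately; it starts one gateway process, and the gateway exposes whatever servers are enabled in the Toolkit.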

Let’s close the config and open the chat screen in Claude. Now, click on the Search and Tools option to see all the MCP servers. For now, there is just one, MCP_DOCKER, with 10 tools. If you are not seeing it, quit Claude completely and re-open it, and it will show up.

We can click on the arrow next to 10 to see all the available tools.

Now, let’s test it out.

To test curl, I will ask whether a website is up or not. When you enter the prompt, you might get a pop-up saying “Claude would like to use an external integration”; it simply asks whether you want to allow the MCP tool to be used. You can choose either always allow or allow once, depending on your preference.

Let’s now search for some history so that it uses the Wikipedia tool. As you can see, it called both search_wikipedia and get_wikipedia and gave the result.

That was it. That’s how you can use MCP Servers with less hassle and focus more on development and solving problems instead of worrying about managing them.

As always, I'm glad you made it to the end. Thank you so much for your support. I regularly share tips on Twitter (It will always be Twitter ;)). You can connect with me there.

Monitoring Go Applications Using Prometheus, Grafana, and Docker

Pradumna Saraf — Mon, 21 Apr 2025 09:28:08 +0000

Monitoring is important for any application. It helps us ensure that our application is running smoothly and allows us to detect issues before they become critical. In real-world scenarios, we run multiple services, and it's hard to test each one by hand to check that it's working. That is why we set up monitoring to make our lives easier.

In this blog, we will create a Golang application that will be monitored using Prometheus and Grafana. We will be using the Prometheus Go client library (prometheus/client_golang) to expose metrics from our Golang application. Then we will visualise the metrics using Grafana. We will be using Docker and Docker Compose to run our application and the monitoring stack, and connect them.

Prerequisites

A good understanding of Golang
A good understanding of Docker and Docker Compose
A good knowledge of Prometheus and Grafana

Getting Started

For better understanding, we will be breaking the blog into multiple sections.

Create a Golang Application

Let's first create a Golang server. Create a new directory and initialise a new Golang project with go mod init <project-name>. Then create a new file main.go and add the following code to it. The complete code comes first so that it is clear how the different pieces connect; after that, we will break it down and understand each part.

package main

import (
    "strconv"

    "github.com/gin-gonic/gin"
    "github.com/prometheus/client_golang/prometheus"
    "github.com/prometheus/client_golang/prometheus/promhttp"
)

// Define metrics
var (
    HttpRequestTotal = prometheus.NewCounterVec(prometheus.CounterOpts{
        Name: "api_http_request_total",
        Help: "Total number of requests processed by the API",
    }, []string{"path", "status"})

    HttpRequestErrorTotal = prometheus.NewCounterVec(prometheus.CounterOpts{
        Name: "api_http_request_error_total",
        Help: "Total number of errors returned by the API",
    }, []string{"path", "status"})
)

// Custom registry (without default Go metrics)
var customRegistry = prometheus.NewRegistry()

// Register metrics with custom registry
func init() {
    customRegistry.MustRegister(HttpRequestTotal, HttpRequestErrorTotal)
}

func main() {
    router := gin.Default()

    // Register /metrics before middleware
    router.GET("/metrics", PrometheusHandler())

    router.Use(RequestMetricsMiddleware())
    router.GET("/health", func(c *gin.Context) {
        c.JSON(200, gin.H{
            "message": "Up and running!",
        })
    })
    router.GET("/v1/users", func(c *gin.Context) {
        c.JSON(200, gin.H{
            "message": "Hello from /v1/users",
        })
    })

    router.Run(":8000")
}

// Custom metrics handler with custom registry
func PrometheusHandler() gin.HandlerFunc {
    h := promhttp.HandlerFor(customRegistry, promhttp.HandlerOpts{})
    return func(c *gin.Context) {
        h.ServeHTTP(c.Writer, c.Request)
    }
}

// Middleware to record incoming requests metrics
func RequestMetricsMiddleware() gin.HandlerFunc {
    return func(c *gin.Context) {
        path := c.Request.URL.Path
        c.Next()
        status := c.Writer.Status()
        if status < 400 {
            HttpRequestTotal.WithLabelValues(path, strconv.Itoa(status)).Inc()
        } else {
            HttpRequestErrorTotal.WithLabelValues(path, strconv.Itoa(status)).Inc()
        }
    }
}

Now, execute go mod tidy in the terminal to download all the dependencies we imported. Let's walk through the code, starting with the imports and the metric definitions:

package main

import (
    "strconv"

    "github.com/gin-gonic/gin"
    "github.com/prometheus/client_golang/prometheus"
    "github.com/prometheus/client_golang/prometheus/promhttp"
)

// Define metrics
var (
    HttpRequestTotal = prometheus.NewCounterVec(prometheus.CounterOpts{
        Name: "api_http_request_total",
        Help: "Total number of requests processed by the API",
    }, []string{"path", "status"})

    HttpRequestErrorTotal = prometheus.NewCounterVec(prometheus.CounterOpts{
        Name: "api_http_request_error_total",
        Help: "Total number of errors returned by the API",
    }, []string{"path", "status"})
)

In the above code, we have imported the required packages. For creating the server, we use gin-gonic, and for exposing the metrics, prometheus/client_golang. After that, we have created two variables to define the metrics. The first one is HttpRequestTotal, which counts the requests processed by the API. The second one is HttpRequestErrorTotal, which counts the errors returned by the API. Both of them are of the type CounterVec, a counter that is partitioned by label values, so every distinct combination of labels gets its own count. We have defined two labels for both metrics: path and status. The path label contains the path of the request, and the status label contains the status code of the response.
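If the idea of a counter with labels feels abstract, the following standard-library-only sketch models it as a map keyed by the label values. This is a toy stand-in written for illustration (labelledCounter and its methods are hypothetical names), not how client_golang implements CounterVec.

```go
package main

import (
	"fmt"
	"sync"
)

// labelledCounter is a toy model of a counter with two labels (path, status).
type labelledCounter struct {
	mu     sync.Mutex
	series map[[2]string]float64 // one entry per {path, status} combination
}

func newLabelledCounter() *labelledCounter {
	return &labelledCounter{series: make(map[[2]string]float64)}
}

// inc adds 1 to the series identified by the given label values.
func (c *labelledCounter) inc(path, status string) {
	c.mu.Lock()
	defer c.mu.Unlock()
	c.series[[2]string{path, status}]++
}

// value returns the current count for one label combination (0 if never seen).
func (c *labelledCounter) value(path, status string) float64 {
	c.mu.Lock()
	defer c.mu.Unlock()
	return c.series[[2]string{path, status}]
}

func main() {
	requests := newLabelledCounter()
	requests.inc("/health", "200")
	requests.inc("/health", "200")
	requests.inc("/v1/users", "200")

	fmt.Println(requests.value("/health", "200"))   // 2
	fmt.Println(requests.value("/v1/users", "200")) // 1
	fmt.Println(requests.value("/v1/users", "500")) // 0
}
```

Each distinct pair of label values becomes its own series, which is why it is worth keeping label values to a small, bounded set.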

// Custom registry (without default Go metrics)
var customRegistry = prometheus.NewRegistry()

// Register metrics with custom registry
func init() {
    customRegistry.MustRegister(HttpRequestTotal, HttpRequestErrorTotal)
}

func main() {
    router := gin.Default()

    // Register /metrics before middleware
    router.GET("/metrics", PrometheusHandler())

    router.Use(RequestMetricsMiddleware())
    router.GET("/health", func(c *gin.Context) {
        c.JSON(200, gin.H{
            "message": "Up and running!",
        })
    })
    router.GET("/v1/users", func(c *gin.Context) {
        c.JSON(200, gin.H{
            "message": "Hello from /v1/users",
        })
    })

    router.Run(":8000")
}

In this section of code, we have created a custom registry, held in the variable customRegistry, and registered our two metrics with it. The reason for the custom registry is to avoid exposing the default Golang metrics. The default registry comes with Go runtime and process collectors already registered, and the stock promhttp.Handler() serves that registry. By creating our own registry and serving it with promhttp.HandlerFor, only the metrics we register show up on /metrics.

We created a new gin router and registered the /metrics endpoint before attaching the middleware. In Gin, middleware added with router.Use applies only to the routes registered after it, so registering /metrics first keeps Prometheus scrapes out of our own request counters. After that, we have created two endpoints: /health and /v1/users. The endpoint /health will return a JSON response with the message "Up and running!" and the endpoint /v1/users will return a JSON response with the message "Hello from /v1/users". Finally, we have started the server on port 8000.
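Why the order of registration matters can be shown with a tiny model of a router in which each route takes a snapshot of the middleware registered so far. This is a simplified sketch using only the standard library; miniRouter and its methods are hypothetical and are not Gin's actual code.

```go
package main

import "fmt"

// handler appends its name to a trace so we can see which handlers ran.
type handler func(trace *[]string)

// miniRouter is a simplified model of a router, not Gin's implementation.
type miniRouter struct {
	middleware []handler
	routes     map[string][]handler
}

func newMiniRouter() *miniRouter {
	return &miniRouter{routes: make(map[string][]handler)}
}

// use registers middleware for routes added from now on.
func (r *miniRouter) use(m handler) { r.middleware = append(r.middleware, m) }

// get snapshots the middleware registered so far and appends the route handler.
func (r *miniRouter) get(path string, h handler) {
	chain := append([]handler{}, r.middleware...)
	r.routes[path] = append(chain, h)
}

// serve runs the chain for a path and returns the names of the handlers that ran.
func (r *miniRouter) serve(path string) []string {
	var trace []string
	for _, h := range r.routes[path] {
		h(&trace)
	}
	return trace
}

func named(name string) handler {
	return func(trace *[]string) { *trace = append(*trace, name) }
}

// buildRouter registers routes in the same order as main.go does.
func buildRouter() *miniRouter {
	r := newMiniRouter()
	r.get("/metrics", named("metrics")) // registered before the middleware
	r.use(named("requestMetrics"))
	r.get("/health", named("health")) // registered after the middleware
	return r
}

func main() {
	r := buildRouter()
	fmt.Println(r.serve("/metrics")) // [metrics]
	fmt.Println(r.serve("/health"))  // [requestMetrics health]
}
```

The same effect is visible in Gin's debug log when the real application starts: /metrics is listed with one handler fewer than /health and /v1/users.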

// Custom metrics handler with custom registry
func PrometheusHandler() gin.HandlerFunc {
    h := promhttp.HandlerFor(customRegistry, promhttp.HandlerOpts{})
    return func(c *gin.Context) {
        h.ServeHTTP(c.Writer, c.Request)
    }
}

// Middleware to record incoming requests metrics
func RequestMetricsMiddleware() gin.HandlerFunc {
    return func(c *gin.Context) {
        path := c.Request.URL.Path
        c.Next()
        status := c.Writer.Status()
        if status < 400 {
            HttpRequestTotal.WithLabelValues(path, strconv.Itoa(status)).Inc()
        } else {
            HttpRequestErrorTotal.WithLabelValues(path, strconv.Itoa(status)).Inc()
        }
    }
}

Lastly, we have created a custom metrics handler with the custom registry. The PrometheusHandler function returns a gin.HandlerFunc value that is used to serve the metrics. The RequestMetricsMiddleware function is a middleware that records the incoming requests’ metrics. It gets the path of the request and the status code of the response and increments the corresponding metric.

The function c.Next() runs the remaining handlers in the chain, including the route handler, so the response status is known once it returns. After that, we get the status code of the response and check if it is less than 400. If it is, we increment the HttpRequestTotal metric. If it is greater than or equal to 400, we increment the HttpRequestErrorTotal metric. The WithLabelValues function is used to set the label values for the metric. The Inc() function is used to increment the metric by 1.
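For readers who are more used to net/http than Gin, the same "run the handler first, then count by status code" pattern can be sketched with the standard library alone. Everything here (statusRecorder, counts, simulate) is a hypothetical stand-in, and plain maps replace the two Prometheus counters.

```go
package main

import (
	"fmt"
	"net/http"
	"net/http/httptest"
	"strconv"
	"sync"
)

// statusRecorder remembers the status code written by the wrapped handler.
type statusRecorder struct {
	http.ResponseWriter
	status int
}

func (r *statusRecorder) WriteHeader(code int) {
	r.status = code
	r.ResponseWriter.WriteHeader(code)
}

// counts is a toy replacement for the two counters; keys look like "path status".
type counts struct {
	mu      sync.Mutex
	success map[string]int
	failure map[string]int
}

// middleware runs the next handler first, then records the outcome by status code.
func (c *counts) middleware(next http.Handler) http.Handler {
	return http.HandlerFunc(func(w http.ResponseWriter, req *http.Request) {
		rec := &statusRecorder{ResponseWriter: w, status: http.StatusOK}
		next.ServeHTTP(rec, req) // plays the role of c.Next() in Gin
		key := req.URL.Path + " " + strconv.Itoa(rec.status)
		c.mu.Lock()
		defer c.mu.Unlock()
		if rec.status < 400 {
			c.success[key]++
		} else {
			c.failure[key]++
		}
	})
}

// simulate sends one GET per path through the middleware and returns the counts.
func simulate(paths ...string) *counts {
	c := &counts{success: map[string]int{}, failure: map[string]int{}}
	mux := http.NewServeMux()
	mux.HandleFunc("/health", func(w http.ResponseWriter, _ *http.Request) {
		fmt.Fprint(w, "ok")
	})
	h := c.middleware(mux)
	for _, p := range paths {
		h.ServeHTTP(httptest.NewRecorder(), httptest.NewRequest(http.MethodGet, p, nil))
	}
	return c
}

func main() {
	c := simulate("/health", "/health", "/missing")
	fmt.Println(c.success["/health 200"])  // 2
	fmt.Println(c.failure["/missing 404"]) // 1
}
```

Note that the status is read only after the inner handler has finished, which is exactly why the Gin middleware calls c.Next() before c.Writer.Status().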

Run the Application

Now that we have created the application, let's run it to check that it registers the metrics correctly. Make sure you are in the root directory of the project and run the following command:

go run main.go

This command will start the server on port 8000. You can check if the server is running by opening your browser and going to http://localhost:8000/health. You should see a JSON response with the message "Up and running!". If you can see the message, then the server is running fine. You can also check the /v1/users endpoint by going to http://localhost:8000/v1/users. You should see a JSON response with the message "Hello from /v1/users".

Now, let's check if the metrics are registered correctly. You can do that by going to http://localhost:8000/metrics. You will see output similar to this:

# HELP api_http_request_error_total Total number of errors returned by the API
# TYPE api_http_request_error_total counter
api_http_request_error_total{path="/",status="404"} 1
api_http_request_error_total{path="//v1/users",status="404"} 1
api_http_request_error_total{path="/favicon.ico",status="404"} 1
# HELP api_http_request_total Total number of requests processed by the API
# TYPE api_http_request_total counter
api_http_request_total{path="/health",status="200"} 2
api_http_request_total{path="/v1/users",status="200"} 1

You will see the metrics that we have defined in the code. The api_http_request_total metric shows the number of requests the API handled with a status code below 400, and the api_http_request_error_total metric shows the number of requests that ended with a status code of 400 or above. You can also see the labels for both metrics: path and status. The path label contains the path of the request, and the status label contains the status code of the response.
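To get a feel for how these lines are structured, here is a small sketch that sums the sample values per metric name from the output shown above. It assumes the simple layout seen here (a name, optional labels in braces, then a value, with no trailing timestamp); sumByMetric is a hypothetical helper and not a full parser for the exposition format.

```go
package main

import (
	"bufio"
	"fmt"
	"strconv"
	"strings"
)

// sampleOutput is the /metrics output shown above.
const sampleOutput = `# HELP api_http_request_error_total Total number of errors returned by the API
# TYPE api_http_request_error_total counter
api_http_request_error_total{path="/",status="404"} 1
api_http_request_error_total{path="//v1/users",status="404"} 1
api_http_request_error_total{path="/favicon.ico",status="404"} 1
# HELP api_http_request_total Total number of requests processed by the API
# TYPE api_http_request_total counter
api_http_request_total{path="/health",status="200"} 2
api_http_request_total{path="/v1/users",status="200"} 1
`

// sumByMetric adds up sample values per metric name, ignoring labels and comments.
func sumByMetric(exposition string) (map[string]float64, error) {
	totals := make(map[string]float64)
	sc := bufio.NewScanner(strings.NewReader(exposition))
	for sc.Scan() {
		line := strings.TrimSpace(sc.Text())
		if line == "" || strings.HasPrefix(line, "#") {
			continue // skip blank lines and HELP/TYPE comments
		}
		// Assumption: the value is the last space-separated field on the line.
		i := strings.LastIndex(line, " ")
		if i < 0 {
			return nil, fmt.Errorf("malformed line: %q", line)
		}
		v, err := strconv.ParseFloat(line[i+1:], 64)
		if err != nil {
			return nil, err
		}
		name := line[:i]
		if j := strings.Index(name, "{"); j >= 0 {
			name = name[:j] // drop the label set
		}
		totals[name] += v
	}
	return totals, sc.Err()
}

func main() {
	totals, err := sumByMetric(sampleOutput)
	if err != nil {
		fmt.Println("error:", err)
		return
	}
	ok := totals["api_http_request_total"]
	failed := totals["api_http_request_error_total"]
	fmt.Printf("ok=%v failed=%v error share=%.0f%%\n", ok, failed, 100*failed/(ok+failed))
	// prints: ok=3 failed=3 error share=50%
}
```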

This validates that our application is working fine and the metrics are registered correctly. Now we will be creating a Dockerfile to run the application in a Docker container. Later we will also be using Docker Compose to run the application and the monitoring stack together.

Dockerize the Application

In the root directory of the project, create a new file called Dockerfile, and add the following code to it:

FROM golang:1.24-alpine AS builder
# Set environment variables
ENV CGO_ENABLED=0 \
    GOOS=linux \
    GOARCH=amd64
# Set working directory inside the container
WORKDIR /build
# Copy go.mod and go.sum files for dependency installation
COPY go.mod go.sum ./
# Download dependencies
RUN go mod download
# Copy the entire application source
COPY . .
# Build the Go binary
RUN go build -o /app .

# Final lightweight stage
FROM alpine:3.17 AS final
# Copy the compiled binary from the builder stage
COPY --from=builder /app /bin/app
# Expose the application's port
EXPOSE 8000
# Run the application
CMD ["/bin/app"]

Understanding the Dockerfile

Let's understand the Dockerfile. We will be using a multi-stage build to create a lightweight and secure Docker image. The multi-stage build allows us to separate the build environment from the runtime environment, which results in a smaller final image size. This is especially useful for Go applications, as we can build a static binary and then copy it to a minimal base image.

Build stage:

FROM golang:1.24-alpine AS builder
# Set environment variables
ENV CGO_ENABLED=0 \
    GOOS=linux \
    GOARCH=amd64
# Set working directory inside the container
WORKDIR /build
# Copy go.mod and go.sum files for dependency installation
COPY go.mod go.sum ./
# Download dependencies
RUN go mod download
# Copy the entire application source
COPY . .
# Build the Go binary
RUN go build -o /app .

This stage uses the official Golang Alpine image as the base and sets the necessary environment variables. It also sets the working directory inside the container, copies the go.mod and go.sum files for dependency installation, downloads the dependencies, copies the entire application source, and builds the Go binary.

We use the golang:1.24-alpine image as the base image for the build stage. The CGO_ENABLED=0 environment variable disables CGO, which is useful for building static binaries. We also set the GOOS and GOARCH environment variables to linux and amd64, respectively, to build the binary for the Linux platform.

Final stage:

# Final lightweight stage
FROM alpine:3.17 AS final
# Copy the compiled binary from the builder stage
COPY --from=builder /app /bin/app
# Expose the application's port
EXPOSE 8000
# Run the application
CMD ["/bin/app"]

This stage uses the official Alpine image as the base and copies the compiled binary from the build stage. It also exposes the application's port and runs the application.

We use the alpine:3.17 image as the base image for the final stage. We copy the compiled binary from the build stage to the final image. We expose the application's port using the EXPOSE instruction and run the application using the CMD instruction.

Apart from the multi-stage build, the Dockerfile also follows best practices such as using the official images, setting the working directory, and copying only the necessary files to the final image. We can optimise the Dockerfile further by applying other best practices.

Build the Docker Image

Let's build and run the Docker image. In the root directory of the project, run the following command:

docker build -t go-prom-monitor .

Now that the image is built, we can run the Docker container. Run the following command to run the Docker container:

docker run -d -p 8000:8000 --name go-prom-monitor go-prom-monitor

Now, like we did before, you can check if the server is running by opening your browser and going to /health and /v1/users. You should see the same JSON response as before. You can also check the /metrics endpoint by going to http://localhost:8000/metrics. You should see the same metrics as before.

If you can see the same metrics, then our application inside the Docker container is running as expected. And we are good to go with the next step. Now we will be creating a Docker Compose file to run the application and the monitoring stack together.

Connecting the Application with Prometheus and Grafana

Before jumping into the Docker Compose file, why are we even bothering to use Docker Compose? We could run Prometheus and Grafana separately and connect them to the application, but that is all manual and error-prone. With Docker Compose, we can describe all the services in one file, start them with a single command, and treat the setup more like infrastructure as code. This will help us in the future to scale the application and add more services to it.

Let's get into it.

In the root directory of the project, create a new file called compose.yml (Yes, the new convention is compose.yml. You are welcome.) and add the following code to it:

services:
  api:
    container_name: go-api
    build:
      context: .
      dockerfile: Dockerfile
    image: go-api:latest
    ports:
      - 8000:8000
    networks:
      - go-network
    healthcheck:
      test: ["CMD", "curl", "-f", "http://localhost:8000/health"]
      interval: 30s
      timeout: 10s
      retries: 5
    develop:
      watch:
        - path: .
          action: rebuild

  prometheus:
    container_name: prometheus
    image: prom/prometheus:v2.55.0
    volumes:
      - ./Docker/prometheus.yml:/etc/prometheus/prometheus.yml
    ports:
      - 9090:9090
    networks:
      - go-network

  grafana:
    container_name: grafana
    image: grafana/grafana:11.3.0
    volumes:
      - ./Docker/grafana.yml:/etc/grafana/provisioning/datasources/datasource.yaml
      - grafana-data:/var/lib/grafana
    ports:
      - 3000:3000
    networks:
      - go-network
    environment:
      - GF_SECURITY_ADMIN_USER=admin
      - GF_SECURITY_ADMIN_PASSWORD=password

volumes:
  grafana-data:

networks:
  go-network:
    driver: bridge

Also, create a new directory Docker in the root directory of the project. Inside the Docker directory, create two new files called prometheus.yml and grafana.yml.

Add the following code to the prometheus.yml file:

global:
  scrape_interval: 10s
  evaluation_interval: 10s
scrape_configs:
  - job_name: myapp
    static_configs:
      - targets: ["api:8000"]

And add the following code to the grafana.yml file:

apiVersion: 1
datasources:
- name: Prometheus (Main)
  type: prometheus
  url: http://prometheus:9090
  isDefault: true

We will understand why we have created the Docker directory and the prometheus.yml and grafana.yml files in the next section. For clarity, the directory structure of the project should look like this:

├── Docker
│   ├── grafana.yml
│   └── prometheus.yml
├── Dockerfile
├── compose.yml
├── go.mod
├── go.sum
└── main.go

Understanding the Docker Compose File

The Docker Compose file consists of three services:

Golang application service: This service builds the Golang application using the Dockerfile and runs it in a container. It exposes the application's port 8000 and connects to the go-network network. It also defines a healthcheck that uses the curl command to call the /health endpoint of the application every 30 seconds, with a 10-second timeout, and marks the container as unhealthy after 5 consecutive failures. Note that curl has to be available inside the final image for this check to pass, and the minimal alpine base image does not include it by default. Apart from the health check, we have also added a develop section to watch the changes in the application's source code and rebuild the application using the Docker Compose Watch feature.
Prometheus service: This service runs the Prometheus server in a container. It uses the official Prometheus image prom/prometheus:v2.55.0. It exposes the Prometheus server on port 9090 and connects to the go-network network. We have also mounted the prometheus.yml file from the Docker directory that is present in the root directory of our project. The prometheus.yml file contains the Prometheus configuration to scrape the metrics from the Golang application. This is how we connect the Prometheus server to the Golang application.

global:
  scrape_interval: 10s
  evaluation_interval: 10s

scrape_configs:
  - job_name: myapp
    static_configs:
      - targets: ["api:8000"]

In the prometheus.yml file, we have defined a job myapp to scrape the metrics from the Golang application. The targets field specifies the target to scrape the metrics from. In this case, the target is the Golang application running on port 8000. The api is the service name of the Golang application in the Docker Compose file. The Prometheus server will scrape the metrics from the Golang application every 10 seconds.

Grafana service: This service runs the Grafana server in a container. It uses the official Grafana image grafana/grafana:11.3.0. It exposes the Grafana server on port 3000 and connects to the go-network network. We have also mounted the grafana.yml file from the Docker directory that is present in the root directory of our project. The grafana.yml file contains the Grafana configuration to add the Prometheus data source. This is how we connect the Grafana server to the Prometheus server. In the environment variables, we have set the Grafana admin user and password, which will be used to log in to the Grafana dashboard.

apiVersion: 1
datasources:
- name: Prometheus (Main)
  type: prometheus
  url: http://prometheus:9090
  isDefault: true

In the grafana.yml file, we have defined a Prometheus data source named Prometheus (Main). The type field specifies the type of the data source, which is prometheus. The url field specifies the URL of the Prometheus server to fetch the metrics from. In this case, the URL is http://prometheus:9090. prometheus is the service name of the Prometheus server in the Docker Compose file. The isDefault field specifies whether the data source is the default data source in Grafana.

Apart from the services, the Docker Compose file also defines a volume grafana-data to persist the Grafana data and a custom network go-network to connect the services. The driver: bridge field specifies the network driver to use for the network.
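One more note on the healthcheck of the api service: the retries setting can be read as a small state machine in which a container starts out as "starting", one passing probe makes it "healthy", and a run of consecutive failures equal to retries makes it "unhealthy". The sketch below is a simplified model of that rule (healthStatus is a hypothetical function, and it ignores timing details such as interval, timeout, and start period).

```go
package main

import "fmt"

// healthStatus replays a sequence of probe results (true = probe passed) and
// returns the resulting status under a simplified model of the retries rule.
func healthStatus(probes []bool, retries int) string {
	status := "starting"
	failures := 0
	for _, ok := range probes {
		if ok {
			failures = 0 // a passing probe resets the failure streak
			status = "healthy"
			continue
		}
		failures++
		if failures >= retries {
			status = "unhealthy"
		}
	}
	return status
}

func main() {
	// With retries: 5, four failures in a row are tolerated, five are not.
	fmt.Println(healthStatus([]bool{true, false, false, false, false}, 5))        // healthy
	fmt.Println(healthStatus([]bool{true, false, false, false, false, false}, 5)) // unhealthy
}
```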

Running the services with Docker Compose

Now that we have created the Docker Compose file, we can run the services using Docker Compose. In the root directory of the project, run the following command:

docker compose up

We will see a similar output in the terminal:

 ✔ Network go-prometheus-monitoring_go-network  Created                                                           0.0s 
 ✔ Container grafana                            Created                                                           0.3s 
 ✔ Container go-api                             Created                                                           0.2s 
 ✔ Container prometheus                         Created                                                           0.3s 
Attaching to go-api, grafana, prometheus
go-api      | [GIN-debug] [WARNING] Creating an Engine instance with the Logger and Recovery middleware already attached.
go-api      | 
go-api      | [GIN-debug] [WARNING] Running in "debug" mode. Switch to "release" mode in production.
go-api      |  - using env:     export GIN_MODE=release
go-api      |  - using code:    gin.SetMode(gin.ReleaseMode)
go-api      | 
go-api      | [GIN-debug] GET    /metrics                  --> main.PrometheusHandler.func1 (3 handlers)
go-api      | [GIN-debug] GET    /health                   --> main.main.func1 (4 handlers)
go-api      | [GIN-debug] GET    /v1/users                 --> main.main.func2 (4 handlers)
go-api      | [GIN-debug] [WARNING] You trusted all proxies, this is NOT safe. We recommend you to set a value.
go-api      | Please check https://pkg.go.dev/github.com/gin-gonic/gin#readme-don-t-trust-all-proxies for details.
go-api      | [GIN-debug] Listening and serving HTTP on :8000
prometheus  | ts=2025-03-15T05:57:06.676Z caller=main.go:627 level=info msg="No time or size retention was set so using the default time retention" duration=15d
prometheus  | ts=2025-03-15T05:57:06.678Z caller=main.go:671 level=info msg="Starting Prometheus Server" mode=server version="(version=2.55.0, branch=HEAD, revision=91d80252c3e528728b0f88d254dd720f6be07cb8)"
grafana     | logger=settings t=2025-03-15T05:57:06.865335506Z level=info msg="Config overridden from command line" arg="default.log.mode=console"
grafana     | logger=settings t=2025-03-15T05:57:06.865337131Z level=info msg="Config overridden from Environment variable" var="GF_PATHS_DATA=/var/lib/grafana"
grafana     | logger=ngalert.state.manager t=2025-03-15T05:57:07.088956839Z level=info msg="State
.
.
grafana     | logger=plugin.angulardetectorsprovider.dynamic t=2025-03-15T05:57:07.530317298Z level=info msg="Patterns update finished" duration=440.489125ms

The services will start running, and we can access the Golang application at http://localhost:8000/health, Prometheus at http://localhost:9090, and Grafana at http://localhost:3000. We should see the three services running: go-api, prometheus, and grafana.

We can also check the services logs using the docker compose logs command. This will show us the logs of all the services running in the Docker Compose file. We can also check the logs of a specific service by using the docker compose logs <service-name> command. For example, to check the logs of the Golang application, we can run the following command:

docker compose logs api

That was it for running the services using Docker Compose. Next, we will be looking at how we can develop the application using Docker Compose.

Developing the Application using Docker Compose

Now, if we make any changes to our Golang application locally, it needs to reflect in the container, right? To do that, one approach is to use the --build flag in Docker Compose after making changes in the code. This will rebuild all the services that have the build instruction in the compose.yml file, in our case, the api service (Golang application).

docker compose up --build

But this is not the best approach. It is inefficient: every time we change the code, we need to rebuild manually, which is not a good flow for development.

The better approach is to use Docker Compose Watch, a feature Docker added almost a year back. It watches the application's source code for changes and rebuilds or restarts the application through Docker Compose, much like hot reload. And if you look closely, we have added a develop section in the Docker Compose file.

services:
  api:
    container_name: go-api
    build:
      context: .
      dockerfile: Dockerfile
    image: go-api:latest
    ports:
      - 8000:8000
    networks:
      - go-network
    healthcheck:
      test: ["CMD", "curl", "-f", "http://localhost:8000/health"]
      interval: 30s
      timeout: 10s
      retries: 5
    develop: # This is the develop section
      watch:
        - path: .
          action: rebuild

Now, if we start the stack with docker compose up --watch (or run docker compose watch) and then modify main.go or any other file in the project, the api service will be rebuilt automatically. We will see the following output in the terminal:

Rebuilding service(s) ["api"] after changes were detected...
[+] Building 8.1s (15/15) FINISHED                                                                                                        docker:desktop-linux
 => [api internal] load build definition from Dockerfile                                                                                                  0.0s
 => => transferring dockerfile: 704B                                                                                                                      0.0s
 => [api internal] load metadata for docker.io/library/alpine:3.17                                                                                        1.1s
  .                             
 => => exporting manifest list sha256:89ebc86fd51e27c1da440dc20858ff55fe42211a1930c2d51bbdce09f430c7f1                                                    0.0s
 => => naming to docker.io/library/go-api:latest                                                                                                          0.0s
 => => unpacking to docker.io/library/go-api:latest                                                                                                       0.0s
 => [api] resolving provenance for metadata file                                                                                                          0.0s
service(s) ["api"] successfully built

That's it for the development flow. Next, we will be looking at how to access the Grafana dashboard and visualise the metrics that we are registering in the Golang application.

Accessing the Grafana Dashboard

Now that we have our application running, head over to the Grafana dashboard to visualise the metrics we are registering. Open your browser and navigate to http://localhost:3000. We will be greeted with the Grafana login page. The login credentials are the ones provided in the Compose file.

Once we are logged in, we can create a new dashboard. While creating a dashboard, you will notice that the default data source is Prometheus. This is because we have already configured the data source in the grafana.yml file.

We can use different panels to visualise the metrics. This guide doesn't go into the details of Grafana; refer to the Grafana documentation for more information. Here, we used a Bar Gauge panel to visualise the total number of requests from different endpoints, with the api_http_request_total and api_http_request_error_total metrics as the data.

We created this panel to visualise the total number of requests from different endpoints to compare the successful and failed requests. For all the good requests, the bar will be green, and for all the failed requests, the bar will be red. Plus, it will also show from which endpoint the request is coming, whether it's a successful request or a failed request. If you want to get the dashboard JSON, you can visit this repo here. You will also find the complete code for the Golang application, Dockerfile and Docker Compose file we created in this blog.

That's it! You have successfully created a Golang application that is monitored using Prometheus and Grafana. You have also learned how to Dockerize the application and run it using Docker Compose. You can now use this setup to monitor your Golang applications in production.

That’s it for this blog. As always, I'm glad you made it to the end. Thank you so much for your support. I regularly share tips on Twitter. You can connect with me there.

Docker Can Run LLMs Locally. Wait, What!?

Pradumna Saraf — Mon, 07 Apr 2025 05:35:16 +0000

Using Docker to run Large Language Models (LLMs) locally? Yes, you heard that right. Docker is now much more than just running a container image. With Docker Model Runner, you can run and interact with LLMs locally.

It’s a no-brainer that we’ve seen a huge shift in development towards AI and GenAI. And it’s not easy to develop a GenAI-powered application, considering all the hassle—from cost to setup. As always, Docker steps in and does what it’s known for: making GenAI development easier so developers can build and ship products and projects faster. We can run AI models on our machines natively! Yes, it runs models outside of containers. Right now, Docker Model Runner is in Beta and available for Docker Desktop for Mac with Apple Silicon, requiring Docker Desktop version 4.40 or later.

In this blog, we will explore the benefits of the Docker Model Runner and how to use it in various forms. Let’s get straight in!

Benefits of Docker Model Runner

Developer Flow: As developers, one thing we don’t like is context switching and juggling 100 different tools. Docker, already used by almost every developer, keeps things simple and reduces the learning curve.
GPU Acceleration: Docker Desktop runs llama.cpp directly on your host machine. The inference server can access Apple's Metal API, which allows direct access to hardware GPU acceleration on Apple Silicon.
OCI Artifacts: AI models are stored as OCI artifacts instead of Docker images. This saves disk space and avoids extracting everything. It also improves compatibility and adaptability, as it’s an industry-standard format.
Everything Local: You don’t need to deal with the hassles of cloud LLMs, such as API keys, rate limiting, latency, and expensive bills, while building products locally. On top of that, keeping data local is a big win for privacy and security. Models are dynamically loaded into memory by llama.cpp when needed.

In Action

Make sure you have Docker Desktop v4.40 or above installed on your system. Once you have that, make sure you have turned on Enable Docker Model Runner by going to Settings > Features in development. You can also check Enable host-side TCP support to communicate from your localhost (we will see a demo below for that).

Once you are done, click on Apply & restart, and we are all set. To test that it’s working, open any terminal and type docker model. You will see a list of all the available commands, which verifies everything is working as expected.

So, to interact with the LLMs, we have two methods (as of now, stay tuned): the CLI or the API (OpenAI-compatible). The CLI is pretty straightforward. On the API front, we can interact with the API either from inside a running container or from the localhost. Let’s look at these in much more detail.

From the CLI

If you have used the Docker CLI (which almost every developer who has ever worked with containers has) and commands like docker pull, docker run, etc., docker model follows the same pattern. The only addition is the model subcommand. So, to pull a model we run docker model pull <model name>, and to run a pulled model we run docker model run <model name>. It makes things so much easier because we don’t need to learn a whole new vocabulary for a new tool.

Here are all the commands that are currently supported. Some more are coming soon (some are my favourites, too). Stay tuned!

Now, to run a model, we first need to pull it. For example, we will run llama3.2. You will find all available models on the Docker Hub’s GenAI Catalog. So, open the terminal and run docker model pull ai/llama3.2. It will take some time to pull, depending on the model size and your internet bandwidth. Once you have pulled it, run docker model run ai/llama3.2, and it will start an interactive chat, just like a normal chatbot or ChatGPT. Once you are done, you can use /bye to exit the interactive chat mode. Here is a screenshot:

From the API (OpenAI)

One of the fantastic things about Model Runner is that it implements OpenAI-compatible endpoints. We can interact with the API in many ways, like inside a running container or from the host machine using TCP or Unix Sockets.

We will see examples of different ways, but before that, here are the available endpoints. The endpoints will remain the same whether we interact with the API from inside a container or from the host. Only the host will change.

# OpenAI endpoints
    GET /engines/llama.cpp/v1/models
    GET /engines/llama.cpp/v1/models/{namespace}/{name}
    POST /engines/llama.cpp/v1/chat/completions
    POST /engines/llama.cpp/v1/completions
    POST /engines/llama.cpp/v1/embeddings
    Note: You can also omit llama.cpp.
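Since only the host part changes between the two ways of calling the API, the idea can be sketched in Go as a tiny URL helper. This is a minimal sketch: endpointURL is a hypothetical helper written for this post, not part of any Docker SDK, and the two base URLs are the ones used in the sections that follow.

```go
package main

import (
	"fmt"
	"strings"
)

// Base URLs used later in this post: one for callers inside a container,
// one for callers on the host (needs host-side TCP support enabled).
const (
	containerBaseURL = "http://model-runner.docker.internal"
	hostBaseURL      = "http://localhost:12434"
)

// endpointURL joins a base URL and an endpoint path with exactly one slash.
// Hypothetical helper for illustration only.
func endpointURL(baseURL, path string) string {
	return strings.TrimRight(baseURL, "/") + "/" + strings.TrimLeft(path, "/")
}

func main() {
	const chatPath = "/engines/llama.cpp/v1/chat/completions"

	// Same path, different host.
	fmt.Println(endpointURL(containerBaseURL, chatPath))
	fmt.Println(endpointURL(hostBaseURL, chatPath))
}
```

Keeping the path in one constant makes it easy to switch between the container and host setups without touching the rest of the client code.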

From Inside the Container

From inside the container, we will use http://model-runner.docker.internal as the base URL, and we can hit any endpoint mentioned above. For example, we will hit the /engines/llama.cpp/v1/chat/completions endpoint to do a chat.

We will be using curl. You can see it uses the same schema structure as the OpenAI API. Make sure you have already pulled the model that you are trying to use.

    curl http://model-runner.docker.internal/engines/llama.cpp/v1/chat/completions \
    -H "Content-Type: application/json" \
    -d '{
        "model": "ai/llama3.2",
        "messages": [
            {
                "role": "system",
                "content": "You are a helpful assistant."
            },
            {
                "role": "user",
                "content": "Please write 100 words about the docker compose."
            }
        ]
    }'

So, to test that it works from inside a running container, I ran the jonlabelle/network-tools image in interactive mode and then used the above curl command to talk to the API. And it worked.

As you can see, below is the response I got. The response is in JSON format, including the generated message, token usage, model details, and response timing. Just like the standard.
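If you want to read that response from code rather than by eye, here is a rough Go sketch that decodes it. The field names (choices, message, usage, and so on) are an assumption based on the standard OpenAI chat completions schema, and the sample payload is hand-written for illustration; it is not real model output.

```go
package main

import (
	"encoding/json"
	"fmt"
)

// chatResponse models only the fields this sketch reads. The names follow
// the OpenAI chat completions schema, which is an assumption here.
type chatResponse struct {
	Model   string `json:"model"`
	Choices []struct {
		Message struct {
			Role    string `json:"role"`
			Content string `json:"content"`
		} `json:"message"`
	} `json:"choices"`
	Usage struct {
		PromptTokens     int `json:"prompt_tokens"`
		CompletionTokens int `json:"completion_tokens"`
		TotalTokens      int `json:"total_tokens"`
	} `json:"usage"`
}

// parseChatResponse decodes a response body and returns the first reply.
func parseChatResponse(body []byte) (chatResponse, string, error) {
	var resp chatResponse
	if err := json.Unmarshal(body, &resp); err != nil {
		return resp, "", fmt.Errorf("decode response: %w", err)
	}
	if len(resp.Choices) == 0 {
		return resp, "", fmt.Errorf("response contained no choices")
	}
	return resp, resp.Choices[0].Message.Content, nil
}

func main() {
	// Hand-written sample payload, for illustration only.
	sample := []byte(`{
		"model": "ai/llama3.2",
		"choices": [{"message": {"role": "assistant", "content": "Hello!"}}],
		"usage": {"prompt_tokens": 5, "completion_tokens": 2, "total_tokens": 7}
	}`)

	resp, reply, err := parseChatResponse(sample)
	if err != nil {
		fmt.Println("error:", err)
		return
	}
	fmt.Printf("%s replied %q using %d tokens\n", resp.Model, reply, resp.Usage.TotalTokens)
}
```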

From the Host

As I mentioned previously, to interact with the API from the host, you must be sure you have enabled host-side TCP support. You can verify it’s working by visiting localhost:12434. You will see a message saying Docker Model Runner. The service is running.

Here, we will use http://localhost:12434 as the base URL, and the same endpoints apply. The same goes for the curl command; we will just replace the base URL, and everything else will remain the same.

    curl http://localhost:12434/engines/llama.cpp/v1/chat/completions \
    -H "Content-Type: application/json" \
    -d '{
        "model": "ai/llama3.2",
        "messages": [
            {
                "role": "system",
                "content": "You are a helpful assistant."
            },
            {
                "role": "user",
                "content": "Please write 100 words about the docker compose."
            }
        ]
    }'

Let’s try it out by running it in our terminal:

It will return the same JSON format response as the other one, including the generated message, token usage, model details, and response timing.
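The same call can also be made from application code. Below is a minimal Go sketch that mirrors the curl command above using only the standard library. The helper name newChatRequest and the two-minute timeout are my own choices for illustration, and it assumes Model Runner is reachable on localhost:12434 with the ai/llama3.2 model already pulled.

```go
package main

import (
	"bytes"
	"encoding/json"
	"fmt"
	"io"
	"net/http"
	"time"
)

type message struct {
	Role    string `json:"role"`
	Content string `json:"content"`
}

type chatRequest struct {
	Model    string    `json:"model"`
	Messages []message `json:"messages"`
}

// newChatRequest builds the same POST request the curl command sends.
func newChatRequest(baseURL, model, prompt string) (*http.Request, error) {
	payload, err := json.Marshal(chatRequest{
		Model: model,
		Messages: []message{
			{Role: "system", Content: "You are a helpful assistant."},
			{Role: "user", Content: prompt},
		},
	})
	if err != nil {
		return nil, err
	}
	url := baseURL + "/engines/llama.cpp/v1/chat/completions"
	req, err := http.NewRequest(http.MethodPost, url, bytes.NewReader(payload))
	if err != nil {
		return nil, err
	}
	req.Header.Set("Content-Type", "application/json")
	return req, nil
}

func main() {
	req, err := newChatRequest("http://localhost:12434", "ai/llama3.2",
		"Please write 100 words about the docker compose.")
	if err != nil {
		fmt.Println("build request:", err)
		return
	}

	// Local inference can be slow, so the timeout is generous (an assumption).
	client := &http.Client{Timeout: 2 * time.Minute}
	resp, err := client.Do(req)
	if err != nil {
		fmt.Println("request failed (is Model Runner running with TCP enabled?):", err)
		return
	}
	defer resp.Body.Close()

	body, err := io.ReadAll(resp.Body)
	if err != nil {
		fmt.Println("read response:", err)
		return
	}
	fmt.Println(resp.Status)
	fmt.Println(string(body))
}
```

To call the API from inside a container instead, only the base URL passed to newChatRequest would need to change to http://model-runner.docker.internal.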

With this TCP support, we are not limited to talking to the API from applications running inside a container; we can reach it from anywhere on the host.

That’s it about the Blog. You can learn more about the Docker Model Runner from the official docs here. And keep an eye out for the Docker announcements; there will be a lot more coming. As always, I'm glad you made it to the end—thank you so much for your support. I regularly share tips on Twitter (It will always be Twitter ;)). You can connect with me there.

Using ARM-based GitHub Actions Runners for Workflows

Pradumna Saraf — Wed, 29 Jan 2025 04:50:32 +0000

As we have observed a significant transition toward ARM-based CPUs, such as Apple’s M series and Snapdragon's X, it's essential to build, test, and deploy the product and software in a multi-architecture environment to replicate the exact behaviour experienced by an end user.

GitHub recently announced that Linux ARM-based (arm64) GitHub Actions runners are now available as hosted runners for free in public repositories. You can read the official announcement here. Previously, developers had to rely on virtualization for Actions runs, which was cumbersome. To use them, we have to set the value of runs-on: to ubuntu-24.04-arm or ubuntu-22.04-arm, depending on which version of Ubuntu we need.

Let's see in Action

We will create a basic workflow to print "Hello World". First, create a GitHub repo and make sure you are at the root. Then create a directory named .github, inside that create a directory called workflows, and inside that create a YAML file with any name. We will name it hello.yaml. The complete file path will look like this: .github/workflows/hello.yaml. Now paste the configuration below.

name: Hello World
on:
  push:
    branches:
      - main

jobs:
  hello:
    runs-on: ubuntu-24.04-arm

    steps:
      - name: Print Hello World
        run: 'echo "Hello World"'

Now, let's commit the changes and head to the Actions tab to check the progress.

As we can see, our action ran successfully without any issues. For more real-world workflows, we can simply switch the runs-on label, and the jobs will run on ARM-based runners.
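If you want to double-check that a job really landed on an arm64 machine, one option is to have a step run a tiny Go program that prints the platform it was built for. This is just a sketch: the describe helper is hypothetical, and it assumes Go is available on the runner (for example via the actions/setup-go action).

```go
package main

import (
	"fmt"
	"runtime"
)

// describe turns a GOOS/GOARCH pair into a short human-readable line.
// Hypothetical helper for illustration only.
func describe(goos, goarch string) string {
	if goarch == "arm64" {
		return fmt.Sprintf("running on %s/%s (ARM-based runner)", goos, goarch)
	}
	return fmt.Sprintf("running on %s/%s (not an arm64 runner)", goos, goarch)
}

func main() {
	// runtime.GOOS and runtime.GOARCH describe the platform the binary
	// was compiled for, which matches the runner when using go run.
	fmt.Println(describe(runtime.GOOS, runtime.GOARCH))
}
```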

On a personal note, I find ARM-based runners much faster, though it may vary depending on the task and the computation power it needs. That brings us to the end of this blog. As usual, glad you made it to the end. Thank you so much for your support. I regularly share tips on Twitter. You can connect with me there.

Publishing Multi-Arch Docker images to GHCR using Buildx and GitHub Actions

Pradumna Saraf — Fri, 20 Dec 2024 07:00:51 +0000

The industry has seen a huge shift from x86 machines towards ARM-based CPUs, from Apple Silicon to Snapdragon X. It has become essential to build images that support multiple architectures, so that containers run on an image that matches the machine's architecture without facing any bottlenecks.

Using Docker Buildx, we can very easily build multi-platform container images. All builds executed via buildx run with the Moby Buildkit builder engine. You can read more in detail here.

In the blog, we will learn how to automate the process of building a Multi-Arch image and pushing it to GitHub Container Registry (GHCR) using a GitHub workflow/Actions when there is a change in the repo. Also, I recently published a similar blog for publishing the image to DockerHub. You can read it here:

Publishing Multi-Arch Docker image to DockerHub using Buildx and GitHub Actions

Pradumna Saraf ・ Oct 23 '24

#docker #devops #development #github

Prerequisite

A good understanding of Docker
A decent understanding of GitHub Actions

Getting started

Before starting, I assume you have already Dockerized your project, created a Dockerfile, and pushed it to GitHub. In case you haven't done that yet and still want to try the process out, you can create a GitHub repo with a minimal Dockerfile in the root that prints "Hello World!" by running an echo command using Alpine as the base image. Here is the Dockerfile for it:

FROM alpine:3.20
CMD ["echo", "Hello World!"]

Once done, we are all set to write a workflow. Make sure you are at the root of the project. Create a directory named .github, inside that create a directory called workflows, and inside that create a YAML file with any name. We will name it ghcr.yaml. The complete file path will look like this: .github/workflows/ghcr.yaml. Now paste the configuration below. Don't commit it yet. First, we will break down and understand the configuration.

name: Build and Push Image to GHCR

on:
  release:
    types: [published]

env:
  REGISTRY: ghcr.io
  IMAGE_NAME: ${{ github.repository }}

jobs:
  build-and-push-image:
    runs-on: ubuntu-latest
    permissions:
      contents: read
      packages: write

    steps:
      - name: Checkout repository
        uses: actions/checkout@v4

      - name: Log in to the Container registry
        uses: docker/login-action@v3
        with:
          registry: ${{ env.REGISTRY }}
          username: ${{ github.actor }}
          password: ${{ secrets.GITHUB_TOKEN }}

      - name: Set up Docker Buildx
        uses: docker/setup-buildx-action@v3

      - name: Build and push Docker image
        uses: docker/build-push-action@v6
        with:
          context: .
          file: ./Dockerfile
          push: true
          tags: | 
            ghcr.io/pradumnasaraf/devops:latest
          platforms: linux/amd64,linux/arm64,linux/arm/v7

In this section, we are triggering the workflow when a release is created. We can modify the on: trigger according to our release flow, like triggering the workflow when a tag is pushed, etc. Then we created some environment variables REGISTRY and IMAGE_NAME for reusability in the workflow.

Then we are using Ubuntu as the runner. We give the job contents: read permission to read the content of the repo and packages: write permission to publish a package. (On GitHub, this is called Packages. It's a registry for hosting and managing packages, including containers and other dependencies.)

on:
  release:
    types: [published]

env:
  REGISTRY: ghcr.io
  IMAGE_NAME: ${{ github.repository }}

jobs:
  build-and-push-image:
    runs-on: ubuntu-latest
    permissions:
      contents: read
      packages: write

In this part, we are checking out the repo code and then logging in to GHCR so that the workflow has the necessary rights and permissions to push the image to the registry. Then we set up Docker Buildx. Buildx is the real deal that will help us build the Multi-Arch images from the same Dockerfile.

    steps:
      - name: Checkout repository
        uses: actions/checkout@v4

      - name: Log in to the Container registry
        uses: docker/login-action@v3
        with:
          registry: ${{ env.REGISTRY }}
          username: ${{ github.actor }}
          password: ${{ secrets.GITHUB_TOKEN }}

      - name: Set up Docker Buildx
        uses: docker/setup-buildx-action@v3

This is the final step of the workflow. Here we are building the image and pushing it to GHCR. We can set a context and a file; here the Dockerfile is in the root with the name Dockerfile.

We can set multiple tags apart from the latest one. For example, we can automate unique image versioning by using the Git tag that was pushed to trigger this workflow. So, if we push a Git tag with 1.2.3, the image would be something like ghcr.io/pradumnasaraf/devops:1.2.3.

Lastly, we are specifying which platforms we need to build for. We can give the values for platforms as a comma-separated list. Here we are building for linux/amd64, linux/arm64 and linux/arm/v7.

      - name: Build and push Docker image
        uses: docker/build-push-action@v6
        with:
          context: .
          file: ./Dockerfile
          push: true
          tags: | 
            ghcr.io/pradumnasaraf/devops:latest
          platforms: linux/amd64,linux/arm64,linux/arm/v7

That's it, that was the whole explanation of the workflow. Now commit the changes. Based on the type of trigger you set, this workflow will run and push the images to GHCR.

I have created a release on my DevOps repo with v2.3.3. Now, it will push an image tagged latest as well as 2.3.3. My workflow gets the version number from package.json using an action that extracts it. You can do this kind of workaround to make it more seamless and powerful.
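To make the tagging idea more concrete, here is a small Go sketch of the Git-tag approach mentioned earlier (not the package.json approach my workflow uses). The function imageRefs is hypothetical; it only shows how a pushed ref like refs/tags/v2.3.3 could be turned into image references, assuming the leading v is dropped.

```go
package main

import (
	"fmt"
	"strings"
)

// imageRefs returns the image references to push for a given Git ref.
// A tag ref such as refs/tags/v1.2.3 yields <registry>/<repo>:1.2.3 plus
// :latest; any other ref yields only :latest. Hypothetical helper.
func imageRefs(registry, repository, gitRef string) []string {
	// Image names must be lowercase, while GitHub repo names may not be.
	base := registry + "/" + strings.ToLower(repository)

	refs := []string{}
	const tagPrefix = "refs/tags/"
	if strings.HasPrefix(gitRef, tagPrefix) {
		// Drop the ref prefix and an optional leading "v".
		version := strings.TrimPrefix(strings.TrimPrefix(gitRef, tagPrefix), "v")
		if version != "" {
			refs = append(refs, base+":"+version)
		}
	}
	return append(refs, base+":latest")
}

func main() {
	for _, ref := range imageRefs("ghcr.io", "pradumnasaraf/devops", "refs/tags/v2.3.3") {
		fmt.Println(ref)
	}
}
```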

Now, go back to your repo and under the Packages section you will see your package (image) got published with the name you provided, with a little container icon.

If you are not seeing the Packages section, turn it on from the About settings of the repo. If it's turned on and the workflow ran successfully but you get a message saying No packages published, head over to your GitHub profile page, click on the Packages tab, and then click on the package name. It will ask you to link it with a repo; link it with the repo you used to create the workflow. Sometimes, due to a mismatch between the repo and image names, the package doesn't show up automatically on the repo.

Once you get the package linked to your repo, click on the package name. Now you will see the image with all the architectures we provided.

Now the great part is that if someone pulls an image, for example with docker pull ghcr.io/pradumnasaraf/devops:2.3.3, Docker will pull the image for their architecture only. We don't need to mention it explicitly.

That brings us to the end of this blog. As usual, glad you made it to the end. Thank you so much for your support. I regularly share Docker tips on Twitter. You can connect with me there.

Rate Limiting a Golang API using Redis

Pradumna Saraf — Tue, 12 Nov 2024 07:31:02 +0000

To put Rate Limiting in simpler words, it is a technique in which we limit the number of requests a user or client can make to an API within a given time frame. You might have encountered a "rate limit exceeded" message in the past when you tried to access a weather or a joke API. There are a lot of arguments around why to rate limit an API, but some important ones are to ensure fair use, make it secure, safeguard resources from overload, etc.

In this blog, we will create an HTTP server with Golang using the Gin framework and apply rate limiting to an endpoint using Redis, which stores the total count of requests made by an IP to the server within a time frame. If it exceeds the limit we set, we will return an error message.

In case you have no idea what Gin and Redis are: Gin is a web framework written in Golang. It helps to create a simple and fast server without writing a lot of code. Redis is an in-memory key-value data store that can be used as a database or for caching.

Prerequisite

Familiarity with Golang, Gin and Redis
A Redis instance (We can use Docker or a remote machine)

Getting Started

To initialize the project, run go mod init <github path>, for example, go mod init github.com/Pradumnasaraf/go-redis.

Then let's create a simple HTTP server with the Gin framework, and after that we will apply the rate limiting logic to it. You can copy the code below. It's very basic. The server will reply with a message when we hit the /message endpoint.

After you copy the below code, run go mod tidy to automatically install the packages we have imported.

package main

import (
    "github.com/gin-gonic/gin"
)

func main() {
    r := gin.Default()
    r.GET("/message", func(c *gin.Context) {
        c.JSON(200, gin.H{
            "message": "You can make more requests",
        })
    })
    r.Run(":8081") //listen and serve on localhost:8081
}

We can run the server by executing go run main.go, and we will see this message in the terminal.

To test it, we can go to localhost:8081/message, and we will see this message in the browser.

Now that our server is running, let's set up rate limiting for the /message route. We will use the go-redis/redis_rate package. Thanks to the creator of this package, we don't need to write the logic for handling and checking the limit from scratch. It will do all the heavy lifting for us.

Below is the complete code after implementing the rate-limiting functionality. We will go through each bit of it. I am giving the complete code up front to avoid any confusion and to show how the different pieces work together.

Once you copy the code, run go mod tidy to install all the imported packages. Let's now jump in and understand the code (below the code snippet).

package main

import (
    "context"
    "errors"
    "net/http"
    "github.com/gin-gonic/gin"
    "github.com/go-redis/redis_rate/v10"
    "github.com/redis/go-redis/v9"
)

var (
    rdb     *redis.Client
    limiter *redis_rate.Limiter
)

func initRedis() {
    rdb = redis.NewClient(&redis.Options{
        Addr: "localhost:6379",
    })
    limiter = redis_rate.NewLimiter(rdb)
}

func main() {
    // Initialize Redis client and rate limiter once
    initRedis()
    defer rdb.Close()

    r := gin.Default()
    r.GET("/message", func(c *gin.Context) {
        err := rateLimiter(c.ClientIP())
        if err != nil {
            c.JSON(http.StatusTooManyRequests, gin.H{
                "message": "you have hit the limit",
            })
            return
        }
        c.JSON(http.StatusOK, gin.H{
            "message": "You can make more requests",
        })
    })
    r.Run(":8081")
}

func rateLimiter(clientIP string) error {
    ctx := context.Background()

    res, err := limiter.Allow(ctx, clientIP, redis_rate.PerMinute(10))
    if err != nil {
        return err
    }
    if res.Remaining == 0 {
        return errors.New("Rate limit exceeded")
    }

    return nil
}

Let's first jump directly to the initRedis() function. It creates an instance of the Redis client and the rate limiter once when the application starts, so we don't need to create a new instance on every request. We created two global variables: rdb to store the Redis client instance and limiter to store the limiter instance. Addr: takes the address (host and port) of the Redis instance. I am using Docker to run it locally. You can use anything, just make sure you replace the address accordingly.

Now let's understand the rateLimiter() function. It takes one argument, the request's IP address, which we obtain via c.ClientIP() in the main function. It returns an error if the limit is hit and nil otherwise. Most of the code is boilerplate we took from the official GitHub repo. The key piece to look closer at here is the limiter.Allow() function.

res, err := limiter.Allow(ctx, clientIP, redis_rate.PerMinute(10))

It takes three arguments: the first is ctx, the second is the key under which the state is stored in the Redis database, and the third is the limit. So, the function uses the clientIP address as the key and tracks how much of the limit is left for it, reducing it each time a request is made. The reason for this structure is that Redis needs a unique key for every key-value pair it stores, and every client IP address is unique, which is why we are using IP addresses instead of usernames, etc. The third argument, redis_rate.PerMinute(10), can be modified as per our needs. We can set the limit with PerSecond, PerHour, etc., and set the value inside the parentheses to how many requests can be made per second/minute/hour. In our case, it's 10 per minute. Yes, it's that simple to set.

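At last, we check whether there is any quota remaining via res.Remaining. If it's zero, we return an error with a message; otherwise, we return nil. Keep in mind that res.Remaining is the quota left after the current request has been counted, so it already reads zero on the request that uses up the last slot; redis_rate also exposes res.Allowed, which is zero only when the current request itself was rejected. You can also read other fields, for example res.Limit.Rate to check the limit rate. You can play around and dig deeper into that. One thing to note here is that this is just an example of how to bring these two pieces together. As we have a single route, we are not using any middleware, but what if we have 10s or 100s of routes? In that case, a middleware would be the better fit.

Now coming to the main() function: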

func main() {
    // Initialize Redis client and rate limiter once
    initRedis()
    defer rdb.Close()

    r := gin.Default()
    r.GET("/message", func(c *gin.Context) {
        err := rateLimiter(c.ClientIP())
        if err != nil {
            c.JSON(http.StatusTooManyRequests, gin.H{
                "message": "you have hit the limit",
            })
            return
        }
        c.JSON(http.StatusOK, gin.H{
            "message": "You can make more requests",
        })
    })
    r.Run(":8081")
}

Everything is almost the same in the main() function. We call the initRedis() function to initialize the Redis client and rate limiter, and then close the Redis client using defer once the application exits. In the /message route, every time the route gets hit, we call the rateLimiter() function, pass it the client IP address, and store the returned error in the err variable. If there is an error, we return a 429, that is, http.StatusTooManyRequests, with the message "message": "you have hit the limit". If the client still has quota left and rateLimiter() returns no error, the route works normally, as it did earlier, and serves the request.

That was all the explanation. Let's now test the working. Re-run the server by executing the same command. For the 1st time, we will see the same message we got earlier. Now refresh your browser 10 times (As we set a limit of 10 per minute), and you will see the error message in the browser.

We can also verify this by seeing the logs in the terminal. Gin offers great logging out of the box. After a minute it will restore our limit quota.

That brings us to the end of this blog. I hope you enjoyed reading it as much as I enjoyed writing it. I am glad you made it to the end. Thank you so much for your support. I also talk regularly about Golang and other stuff like Open Source and Docker on X (Twitter). You can connect with me over there.

My Hacktoberfest 2024 Experience

Pradumna Saraf — Fri, 01 Nov 2024 07:53:27 +0000

Hacktoberfest has a special place in my tech journey because my tech and open source journey started during Hacktoberfest 2021 (it's been three years in tech and 4th year participating in Hacktoberfest, WOW!). This year, I was late to the party and registered for Hacktoberfest on 18 October, as I was stuck with some life problems.

So, like every other year, I participated as a Maintainer and a Contributor. To make this article more fun, I will share my combined experience for both and break down this article into two sections As a Contributor and As a Maintainer (Sorry, Dev team, I jammed in both things in one).

As a Contributor

Like last year, I was looking for less popular projects, offering no swag or anything in return, that need help, real help. I was so specific about the area of contribution because I wanted to upskill and test my knowledge in those things. I think this is what I like most about open source: there are so many pieces tied together to form a project, and we can choose which one we want to pick based on our interests. I tweeted the same as well.


So, I found a couple of repos by looking on Twitter and the Hacktoberfest Discord that fit what I was looking for and helped them with the GitHub Actions and CI stuff, which improved their workflows for maintaining code quality. Below is a snapshot of the Pull Requests I made last week.

The two best takeaways as a contributor I got from this Hacktoberfest are:

There are a lot of projects that need help but don't have the money, popularity and stars that would bring them into the limelight for contributors. We as contributors should try to find them and help them with the knowledge we have. There are many solopreneurs out there in the open source community, working hard to make an impact, who need support.
You can have a monumental impact on small and lesser-known projects compared to big ones with the most stars. There is nothing wrong with contributing to big projects, but with time, they become mature, and the sheer volume of contributors makes it difficult to cope. And, if you are new, you can have a tough time getting your Pull Request through. I think a balance of both is great. I personally feel you learn a lot more in new and evolving projects, as they tend to iterate and are open to new ideas. So, they welcome contributions.

As a Maintainer

This year, I was unable to make my repo "ready" for Hacktoberfest, but I got some PRs on the non-technical front on the DevOps repo. In case you didn't know about this DevOps repo, it is one of the most popular open source repos in the world to learn DevOps with a whopping 2.8k+ GitHub stars.

As a maintainer this year, my takeaway was that even though we think the project is "complete" and "picture perfect" there is so much room to improve, and make it better when you learn from contributors' knowledge, value and the perspective they bring to the table.

2024 was fun. Looking forward to contributing even more next year. Happy Open Source!