<?xml version="1.0" encoding="UTF-8"?>
<rss version="2.0" xmlns:atom="http://www.w3.org/2005/Atom" xmlns:dc="http://purl.org/dc/elements/1.1/">
  <channel>
    <title>DEV Community: Amol Kavitkar</title>
    <description>The latest articles on DEV Community by Amol Kavitkar (@amol_kavitkar).</description>
    <link>https://dev.to/amol_kavitkar</link>
    <image>
      <url>https://media2.dev.to/dynamic/image/width=90,height=90,fit=cover,gravity=auto,format=auto/https:%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Fuser%2Fprofile_image%2F3138759%2Fc0609e85-2bc2-4fb9-97b2-36f1254cfe5f.png</url>
      <title>DEV Community: Amol Kavitkar</title>
      <link>https://dev.to/amol_kavitkar</link>
    </image>
    <atom:link rel="self" type="application/rss+xml" href="https://dev.to/feed/amol_kavitkar"/>
    <language>en</language>
    <item>
      <title>Building an Advanced AI Agent: A Step-by-Step Guide to Integrating MCP Servers with LangGraph</title>
      <dc:creator>Amol Kavitkar</dc:creator>
      <pubDate>Tue, 19 Aug 2025 12:57:03 +0000</pubDate>
      <link>https://dev.to/amol_kavitkar/building-an-advanced-ai-agent-a-step-by-step-guide-to-integrating-mcp-servers-with-langgraph-4ocf</link>
      <guid>https://dev.to/amol_kavitkar/building-an-advanced-ai-agent-a-step-by-step-guide-to-integrating-mcp-servers-with-langgraph-4ocf</guid>
<description>&lt;p&gt;In this tutorial, we’ll walk through the process of building a sophisticated, tool-using AI agent. We’ll leverage the power of &lt;code&gt;LangGraph&lt;/code&gt; to orchestrate the agent’s logic and integrate external services via the Model Context Protocol (MCP). This approach allows you to create agents that can interact with a wide range of external systems, from simple APIs to complex, stateful services.&lt;/p&gt;

&lt;blockquote&gt;
&lt;p&gt;Note: This article presents a project’s core backbone. The code snippets are reference working models that need some polishing and enhancement to fit your requirements, but using this as the core structure and feeding it into Copilot/Windsurf/Cursor can get your project built quickly.&lt;br&gt;
    The last section also covers considerations for taking this to production scale.&lt;/p&gt;
&lt;/blockquote&gt;

&lt;h3&gt;
  
  
  Core Concepts
&lt;/h3&gt;

&lt;p&gt;Before we dive in, let’s clarify the key technologies we’ll be using:&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;LangGraph&lt;/strong&gt;: A library for building stateful, multi-actor applications with LLMs. It allows us to define our agent’s behavior as a graph, making it easy to manage complex control flows.&lt;br&gt;
&lt;strong&gt;MCP&lt;/strong&gt;: A standardized protocol for communication between an AI agent and external tools or services. MCP servers expose their functionality as a set of tools that the agent can discover and use.&lt;/p&gt;
&lt;h2&gt;
  
  
  Step 1: Project Structure
&lt;/h2&gt;

&lt;p&gt;A well-organized project structure is key to a maintainable application. Here’s a recommended layout:&lt;/p&gt;
&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;/your-project
|-- streamlit_app.py   # Main application entry point (UI)
|-- chatbot.py         # Core LangGraph agent logic
|-- mcp_tools.py       # Logic for loading tools from MCP servers
|-- custom_tools.py    # Definitions for simple, local tools
|-- tools_config.json  # Configuration for all tools and MCP servers
|-- .env               # Environment variables (API keys, etc.)
|-- … (other files)
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;
&lt;h2&gt;
  
  
  Step 2: Configuring Tools with &lt;code&gt;tools_config.json&lt;/code&gt;
&lt;/h2&gt;

&lt;p&gt;This file is the central hub for defining your agent’s capabilities. It specifies which tools are available, how to connect to MCP servers, and any necessary configurations.&lt;/p&gt;

&lt;p&gt;Here’s an example &lt;code&gt;tools_config.json&lt;/code&gt;:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;{
 "tools": [
   {
     "name": "tavily_search",
     "description": "Search the web using Tavily API",
     "type": "tavily",
     "enabled": true
   },
   {
     "name": "calculator",
     "description": "Performs basic mathematical operations",
     "type": "custom",
     "function": "calculator",
     "enabled": true
   }
 ],
 "mcp_servers": {
     "grafana-mcp": {
        "command": "docker",
        "args": [
           "run",
           "-i",
           "--rm",
           "-e", "GRAFANA_URL=https://grafana.xxxxxx.com/",
           "-e", "GRAFANA_API_KEY=&amp;lt;auth_key&amp;gt;",
           "mcp/grafana",
           "--transport=stdio"
         ],
         "type": "stdio"
      },
      "some-remote-service": {
         "url": "https://example.com/mcp-sse",
         "type": "sse"
      }
   }
} 
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;In this configuration:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;&lt;p&gt;We define a &lt;code&gt;tavily_search&lt;/code&gt; tool for web searches and a &lt;code&gt;calculator&lt;/code&gt; tool, which will be a custom Python function.&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;
&lt;p&gt;We specify two MCP servers:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;
&lt;code&gt;grafana-mcp&lt;/code&gt; runs the Grafana MCP server in Docker and communicates via &lt;code&gt;stdio&lt;/code&gt; to fetch metrics information.&lt;/li&gt;
&lt;li&gt;
&lt;code&gt;some-remote-service&lt;/code&gt; is a remote service accessed via Server-Sent Events (&lt;code&gt;SSE&lt;/code&gt;).&lt;/li&gt;
&lt;/ul&gt;


&lt;/li&gt;

&lt;/ul&gt;

&lt;blockquote&gt;
&lt;p&gt;In practice we run two more MCP servers developed internally for our product use case. You can add as many MCP servers as needed, but keep the number small so the model isn’t loaded with too many tools and the context stays bounded.&lt;br&gt;
    It’s all about giving the model the right context, both in the prompts and in the MCP tool definitions.&lt;/p&gt;
&lt;/blockquote&gt;
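&lt;p&gt;A quick sanity check on the config file can catch typos before the agent starts. Here is a minimal validation sketch (standard library only; the required keys follow the example above — adjust to your own schema):&lt;/p&gt;

```python
def validate_tools_config(config: dict) -> list:
    """Return a list of problems found in a tools_config-style dict (sketch)."""
    problems = []
    for i, tool in enumerate(config.get("tools", [])):
        # Every tool entry should carry these keys
        for key in ("name", "description", "type", "enabled"):
            if key not in tool:
                problems.append(f"tools[{i}] missing '{key}'")
        # Custom tools must point at a Python function
        if tool.get("type") == "custom" and "function" not in tool:
            problems.append(f"tools[{i}] is custom but has no 'function'")
    for name, server in config.get("mcp_servers", {}).items():
        stype = server.get("type", "stdio")
        if stype == "stdio" and "command" not in server:
            problems.append(f"mcp_servers['{name}'] (stdio) missing 'command'")
        if stype == "sse" and "url" not in server:
            problems.append(f"mcp_servers['{name}'] (sse) missing 'url'")
    return problems
```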

&lt;h2&gt;
  
  
  Step 3: Loading MCP Tools
&lt;/h2&gt;

&lt;p&gt;Now, let’s write the Python code to load the tools from our MCP servers. We’ll use the &lt;code&gt;langchain_mcp_adapters&lt;/code&gt; library for this, in &lt;code&gt;mcp_tools.py&lt;/code&gt;:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;# mcp_tools.py
import json
from langchain_mcp_adapters.tools import load_mcp_tools
from langchain_mcp_adapters.sessions import StdioConnection, SSEConnection

async def load_mcp_servers_from_config(config_path: str):
    with open(config_path, 'r') as f:
        config = json.load(f)
    all_tools = []
    mcp_servers = config.get("mcp_servers", {})
    for name, server_config in mcp_servers.items():
        server_type = server_config.get("type", "stdio")
        connection = None
        if server_type == "stdio":
            connection = StdioConnection(
                command=server_config.get("command"),
                args=server_config.get("args", []),
            )
        elif server_type == "sse":
            connection = SSEConnection(url=server_config.get("url"))
        if connection:
             tools = await load_mcp_tools(session=None, connection=connection)
             all_tools.extend(tools)
    return all_tools
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;This code reads the &lt;code&gt;mcp_servers&lt;/code&gt; section of our config, creates the appropriate connection (&lt;code&gt;StdioConnection&lt;/code&gt; or &lt;code&gt;SSEConnection&lt;/code&gt;), and then uses &lt;code&gt;load_mcp_tools&lt;/code&gt; to fetch the tools from each server.&lt;/p&gt;

&lt;h3&gt;
  
  
  Handling Sync and Async MCP Tools
&lt;/h3&gt;

&lt;p&gt;Tools loaded from MCP servers might be asynchronous, but &lt;code&gt;LangGraph&lt;/code&gt; may need to invoke them in a synchronous context. To handle this, we wrap each MCP tool in a &lt;code&gt;StructuredTool&lt;/code&gt; that provides both synchronous (&lt;code&gt;func&lt;/code&gt;) and asynchronous (&lt;code&gt;coroutine&lt;/code&gt;) entry points.&lt;/p&gt;

&lt;p&gt;First, we need a synchronous function to load the tools, as the chatbot’s initialization is synchronous. We create a simple wrapper using &lt;code&gt;asyncio.run()&lt;/code&gt;:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;# mcp_tools.py
import asyncio
# … (async load_mcp_servers_from_config function)
def load_mcp_servers_sync(config_path: str):
   """Synchronous wrapper for loading MCP servers."""
   return asyncio.run(load_mcp_servers_from_config(config_path))
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;
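&lt;p&gt;One caveat with &lt;code&gt;asyncio.run()&lt;/code&gt;: it raises a &lt;code&gt;RuntimeError&lt;/code&gt; if an event loop is already running in the current thread, which can happen inside some UI frameworks. A defensive variant (an assumption on our part, not part of the original code) runs the coroutine in a fresh loop on a worker thread in that case:&lt;/p&gt;

```python
import asyncio
import threading

def run_async_safely(coro):
    """Run a coroutine from synchronous code, even if a loop is already running."""
    try:
        asyncio.get_running_loop()
    except RuntimeError:
        # No loop in this thread: the normal case
        return asyncio.run(coro)
    # A loop is already running: execute in a fresh loop on a worker thread
    result, error = {}, {}

    def _runner():
        try:
            result["value"] = asyncio.run(coro)
        except Exception as exc:  # surface worker-thread failures to the caller
            error["value"] = exc

    thread = threading.Thread(target=_runner)
    thread.start()
    thread.join()
    if "value" in error:
        raise error["value"]
    return result["value"]
```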



&lt;h2&gt;
  
  
  Step 4: Building the LangGraph Agent
&lt;/h2&gt;

&lt;p&gt;This is where we define the core logic of our agent. In &lt;code&gt;chatbot.py&lt;/code&gt;, we’ll create a &lt;code&gt;LangGraphChatbot&lt;/code&gt; class that builds and runs the graph.&lt;/p&gt;

&lt;p&gt;First, define the state of our graph:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;# chatbot.py
from typing import Annotated, TypedDict
from langgraph.graph.message import add_messages

class State(TypedDict):
    messages: Annotated[list, add_messages]
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;Next, create the &lt;code&gt;LangGraphChatbot&lt;/code&gt; class. The &lt;code&gt;__init__&lt;/code&gt; method will set up the LLM and load all the tools (both custom and MCP).  &lt;/p&gt;

&lt;blockquote&gt;
&lt;p&gt;We showcase Ollama as the local LLM here, but any chat model can be used. Further down we multiplex in AWS Bedrock, which could just as easily be swapped for OpenAI.&lt;br&gt;
&lt;/p&gt;
&lt;/blockquote&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;# chatbot.py
from langchain_ollama import ChatOllama
from langgraph.prebuilt import ToolNode
from custom_tools import CUSTOM_FUNCTIONS # Assuming you have this map
from mcp_tools import load_mcp_servers_from_config

class LangGraphChatbot:
    def __init__(self, tools_config_path: str):
        self.llm = ChatOllama(model="llama3") # Or any other LLM

        # load tools will load tools from standard tools defined such as tavily search 
        # also it will load all mcp tools from mcp servers
        self.tools = self._load_all_tools(tools_config_path)

        self.llm_with_tools = self.llm.bind_tools(self.tools)
        self.graph = self._build_graph()

    def _load_all_tools(self, config_path: str):
        # … (logic to load custom tools and call load_mcp_servers_from_config)
        pass
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;
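&lt;p&gt;One possible shape for &lt;code&gt;_load_all_tools&lt;/code&gt;, written here as a free function with the two loaders injected so the merge logic stays testable on its own (the parameter names are illustrative, not from the original project):&lt;/p&gt;

```python
import json

def load_all_tools(config_path, custom_functions, load_mcp):
    """Sketch: merge enabled custom tools with tools loaded from MCP servers.

    custom_functions maps a function name to a tool object/callable;
    load_mcp is e.g. load_mcp_servers_sync from mcp_tools.py.
    """
    with open(config_path) as f:
        config = json.load(f)
    tools = []
    for entry in config.get("tools", []):
        if not entry.get("enabled", False):
            continue  # skip disabled tools
        if entry.get("type") == "custom":
            fn = custom_functions.get(entry.get("function"))
            if fn is not None:
                tools.append(fn)
        # other built-in types (e.g. "tavily") would be handled here
    tools.extend(load_mcp(config_path))
    return tools
```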



&lt;p&gt;Next, in &lt;code&gt;chatbot.py&lt;/code&gt;, we load the tools and wrap them:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;# chatbot.py
from langchain_core.tools import StructuredTool
class LangGraphChatbot:
 # …
    def _load_mcp_tools(self, config_path: str):
       mcp_tools = load_mcp_servers_sync(config_path)
       return [self._create_wrapped_tool(tool) for tool in mcp_tools]

    def _create_wrapped_tool(self, tool):
       """Create a tool that works in both sync and async contexts."""
       return StructuredTool(
           name=tool.name,
           description=tool.description,
           args_schema=tool.args_schema,
           func=self._make_sync_wrapper(tool), # For sync calls
           coroutine=self._make_async_wrapper(tool) # For async calls
       )

    def _make_sync_wrapper(self, tool):
       def sync_wrapper(**kwargs):
           # If the tool is async-only, run its async method in a new event loop
           if not hasattr(tool, "invoke"):
               return asyncio.run(tool.ainvoke(kwargs))
           return tool.invoke(kwargs)
       return sync_wrapper

    def _make_async_wrapper(self, tool):
       async def async_wrapper(**kwargs):
           # Directly call the async method
           return await tool.ainvoke(kwargs)
       return async_wrapper
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;This setup ensures that no matter how &lt;code&gt;LangGraph&lt;/code&gt; decides to call a tool, our wrapper will execute it correctly, either by calling its native method or by bridging the sync/async gap.&lt;/p&gt;

&lt;p&gt;Now, let’s build the graph itself. We’ll define a more advanced node for calling the LLM, which injects a system prompt to guide the model’s behavior. We will also integrate conversation memory.&lt;/p&gt;

&lt;h3&gt;
  
  
  The LLM-Calling Node with a System Prompt
&lt;/h3&gt;

&lt;p&gt;Instead of a simple LLM call, we’ll create a dedicated function that dynamically adds a &lt;strong&gt;system message&lt;/strong&gt; to the conversation. This message instructs the LLM on how to behave, especially regarding tool usage, ensuring it uses your specialized MCP tools when needed.&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;# chatbot.py

class LangGraphChatbot:
    # ... (previous code)

    def _tool_calling_llm(self, state: State):
        """Node function for calling LLM with a system prompt for tools."""
        messages = state["messages"]

        # Create a list of available tool names for the system message
        tool_names = [tool.name for tool in self.tools]

        # Add system message to explicitly instruct tool usage
        system_msg = (
            f"You are an AI assistant with access to specialized tools.\n"
            f"When asked about specific resources, you MUST use the appropriate tool.\n"
            f"Available tools: {', '.join(tool_names)}"
        )

        # Combine system message with conversation history
        messages_with_prompt = [("system", system_msg)] + messages

        # Invoke the LLM with the modified messages
        response = self.llm_with_tools.invoke(messages_with_prompt)
        return {"messages": [response]}
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;h3&gt;
  
  
  Building the Graph with Memory
&lt;/h3&gt;

&lt;p&gt;With our new LLM-calling node, we can build the graph. The structure remains similar, but we’ll use our new &lt;code&gt;_tool_calling_llm&lt;/code&gt; function for the &lt;code&gt;llm&lt;/code&gt; node. Crucially, we’ll also add a &lt;code&gt;checkpointer&lt;/code&gt; during compilation to enable conversation memory.&lt;/p&gt;

&lt;p&gt;Now, let’s build the graph itself.&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;# chatbot.py

from langgraph.graph import StateGraph, START, END
from langgraph.prebuilt import ToolNode, tools_condition
from langgraph.checkpoint.memory import MemorySaver

class LangGraphChatbot:
    # ... (previous code)

    def _build_graph(self):
        graph_builder = StateGraph(State)

        # Add our custom LLM node and the standard tool node
        graph_builder.add_node("llm", self._tool_calling_llm)
        graph_builder.add_node("tools", ToolNode(self.tools))

        graph_builder.add_edge(START, "llm")
        graph_builder.add_conditional_edges(
            "llm",
            tools_condition,
            {
                "tools": "tools",
                "__end__": END
            }
        )
        graph_builder.add_edge("tools", "llm")

        # Compile the graph with a memory checkpointer
        self.memory = MemorySaver()
        return graph_builder.compile(checkpointer=self.memory)
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;This updated graph now intelligently guides the LLM with a system prompt and preserves conversation history across turns, making the agent much more robust and stateful.&lt;/p&gt;
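&lt;p&gt;With a checkpointer in place, every invocation must carry a &lt;code&gt;thread_id&lt;/code&gt; so the graph knows which conversation’s history to load and update. A small sketch of how a caller would use it (the thread id value is arbitrary):&lt;/p&gt;

```python
def memory_config(thread_id: str) -> dict:
    """Build the per-conversation config that LangGraph's checkpointer expects."""
    return {"configurable": {"thread_id": thread_id}}

# Usage (sketch): each user/session gets its own thread_id, so histories
# stay isolated while sharing one compiled graph:
# result = chatbot.graph.invoke({"messages": [("user", "hi")]},
#                               config=memory_config("user-123"))
```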

&lt;h2&gt;
  
  
  Step 5: Integrating Multiple LLM Providers (AWS Bedrock and Ollama)
&lt;/h2&gt;

&lt;p&gt;A powerful feature of this architecture is the ability to easily switch between different LLM providers. This allows you to use powerful cloud-based models from services like AWS Bedrock for production scenarios, while leveraging local models via Ollama for development, testing, and offline use. This flexibility is achieved by reading an environment variable to decide which LLM to initialize.&lt;/p&gt;

&lt;p&gt;Let’s add this logic to our &lt;code&gt;LangGraphChatbot&lt;/code&gt;’s &lt;code&gt;__init__&lt;/code&gt; method:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;# chatbot.py
import os
from dotenv import load_dotenv
from langchain_aws import ChatBedrockConverse
from langchain_ollama import ChatOllama

class LangGraphChatbot:
    def __init__(self, tools_config_path: str):
        load_dotenv()
        # Determine which LLM to use from environment variables
        llm_provider = os.getenv("LLM_PROVIDER", "ollama").lower()
        if llm_provider == "bedrock":
             # Initialize AWS Bedrock LLM
             model_id = os.getenv("BEDROCK_MODEL_ID", "anthropic.claude-3-sonnet-20240229-v1:0")
             self.llm = ChatBedrockConverse(
                 model_id=model_id,
                 region_name=os.getenv("AWS_REGION"),
                 credentials_profile_name=os.getenv("AWS_CREDENTIALS_PROFILE")
             )
             print(f"🤖 Initialized AWS Bedrock with model: {model_id}")
        elif llm_provider == "ollama":
             # Initialize Ollama LLM for local models
             ollama_model = os.getenv("OLLAMA_MODEL", "llama3")
             self.llm = ChatOllama(
                 model=ollama_model,
                 base_url=os.getenv("OLLAMA_BASE_URL", "http://localhost:11434")
             )
             print(f"🤖 Initialized Ollama with model: {ollama_model}")
        else:
             raise ValueError(f"Unsupported LLM provider: {llm_provider}")
        self.tools = self._load_all_tools(tools_config_path)
        self.llm_with_tools = self.llm.bind_tools(self.tools)
        self.graph = self._build_graph()

     # … rest of the class
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;With this setup, you can switch from a local &lt;code&gt;llama3&lt;/code&gt; model to a powerful Claude 3 Sonnet model on Bedrock simply by changing the &lt;code&gt;LLM_PROVIDER&lt;/code&gt; environment variable from &lt;code&gt;ollama&lt;/code&gt; to &lt;code&gt;bedrock&lt;/code&gt;.&lt;/p&gt;
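&lt;p&gt;Switching providers is then just an environment change. A sample &lt;code&gt;.env&lt;/code&gt; pulling together the variables used above (all values are placeholders):&lt;/p&gt;

```
# .env — placeholder values
LLM_PROVIDER=ollama          # or "bedrock"
OLLAMA_MODEL=llama3
OLLAMA_BASE_URL=http://localhost:11434

# Used when LLM_PROVIDER=bedrock
BEDROCK_MODEL_ID=anthropic.claude-3-sonnet-20240229-v1:0
AWS_REGION=us-east-1
AWS_CREDENTIALS_PROFILE=default

# Used by the Grafana MCP server
GRAFANA_API_KEY=your-key-here
```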

&lt;h2&gt;
  
  
  Step 6: Creating the UI with Streamlit
&lt;/h2&gt;

&lt;p&gt;Finally, let’s create a simple UI in &lt;code&gt;streamlit_app.py&lt;/code&gt; to interact with our agent.&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;# streamlit_app.py

import streamlit as st
from chatbot import LangGraphChatbot

st.title("My AI Agent")

# Initialize the chatbot once and cache it across Streamlit reruns,
# so tools are not reloaded and conversation memory is not lost
@st.cache_resource
def get_chatbot():
    return LangGraphChatbot(tools_config_path="tools_config.json")

chatbot = get_chatbot()

if "messages" not in st.session_state:
    st.session_state.messages = []

for message in st.session_state.messages:
    with st.chat_message(message["role"]):
        st.markdown(message["content"])

if prompt := st.chat_input("What would you like to do?"):
    st.session_state.messages.append({"role": "user", "content": prompt})
    with st.chat_message("user"):
        st.markdown(prompt)

    with st.chat_message("assistant"):
        # The MemorySaver checkpointer needs a thread_id to track this conversation
        config = {"configurable": {"thread_id": "streamlit-session"}}
        response = chatbot.graph.invoke({"messages": [("user", prompt)]}, config=config)
        assistant_response = response["messages"][-1].content
        st.markdown(assistant_response)

    st.session_state.messages.append({"role": "assistant", "content": assistant_response})
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;This code creates a basic chat interface. When the user enters a prompt, it’s sent to our &lt;code&gt;LangGraph&lt;/code&gt; agent, and the response is displayed on the screen.&lt;/p&gt;

&lt;h2&gt;
  
  
  Step 7: Deploying with Docker and a Start Script
&lt;/h2&gt;

&lt;p&gt;To streamline deployment, we’ll use Docker Compose to manage our application’s services, including the Streamlit UI, the LLM server, and multiple MCP servers. A powerful &lt;code&gt;start.sh&lt;/code&gt; script will serve as the main entry point for managing the entire stack.&lt;/p&gt;

&lt;h3&gt;
  
  
  A Comprehensive &lt;code&gt;docker-compose.yml&lt;/code&gt;
&lt;/h3&gt;

&lt;p&gt;This &lt;code&gt;docker-compose.yml&lt;/code&gt; defines the Streamlit app and multiple MCP servers, each configured to communicate via &lt;code&gt;stdio&lt;/code&gt;. We use Docker Compose &lt;code&gt;profiles&lt;/code&gt; to selectively run only the services we need.&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;version: '3.8'

services:
  # --- Application UI and Local LLM --- 
  streamlit_app:
    build: .
    container_name: ai_agent_ui
    ports: ["8501:8501"]
    environment:
      - LLM_PROVIDER=ollama
      - OLLAMA_BASE_URL=http://ollama:11434
    depends_on: [ollama]
    volumes: [.:/app]
    networks: [mcp-network]
    profiles: ["app"]

  ollama:
    image: ollama/ollama:latest
    container_name: ollama
    ports: ["11434:11434"]
    volumes: [./ollama_data:/root/.ollama]
    networks: [mcp-network]
    profiles: ["app"]

  # --- Example MCP Tool Server (stdio communication) ---
  grafana-mcp:
    image: mcp/grafana:latest
    container_name: grafana-mcp
    environment:
      - GRAFANA_URL=https://grafana.example.com/
      - GRAFANA_API_KEY=${GRAFANA_API_KEY} # Loaded from .env file
    stdin_open: true  # Required for stdio MCP
    tty: true         # Required for stdio MCP
    profiles: ["mcp"]
    networks: [mcp-network]

networks:
  mcp-network:
    driver: bridge
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;h3&gt;
  
  
  The &lt;code&gt;start.sh&lt;/code&gt; Management Script
&lt;/h3&gt;

&lt;p&gt;Instead of running &lt;code&gt;docker-compose&lt;/code&gt; commands manually, we use a &lt;code&gt;start.sh&lt;/code&gt; script that provides a simple, high-level interface for managing the application. It handles environment checks, configuration validation, and service management.&lt;/p&gt;

&lt;p&gt;Make the script executable with &lt;code&gt;chmod +x start.sh&lt;/code&gt;.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Key Commands:&lt;/strong&gt;&lt;/p&gt;


&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;&lt;code&gt;./start.sh web&lt;/code&gt;: The primary command. It checks your environment, validates configs, and starts the Streamlit web UI.

&lt;ul&gt;
&lt;li&gt;
&lt;code&gt;./start.sh chat&lt;/code&gt;: Starts the interactive terminal-based chat client.&lt;/li&gt;
&lt;li&gt;
&lt;code&gt;./start.sh mcp-start&lt;/code&gt;: Starts all MCP server containers defined in the &lt;code&gt;docker-compose.yml&lt;/code&gt; under the &lt;code&gt;mcp&lt;/code&gt; profile.&lt;/li&gt;
&lt;li&gt;
&lt;code&gt;./start.sh mcp-stop&lt;/code&gt;: Stops all running MCP server containers.&lt;/li&gt;
&lt;li&gt;
&lt;code&gt;./start.sh mcp-status&lt;/code&gt;: Checks the status and health of the MCP servers.&lt;/li&gt;
&lt;li&gt;
&lt;code&gt;./start.sh mcp-test&lt;/code&gt;: Runs communication tests against the MCP servers to ensure they are responsive.&lt;/li&gt;
&lt;li&gt;
&lt;code&gt;./start.sh check&lt;/code&gt;: Checks system requirements, such as the availability of your configured LLM provider (Ollama or AWS Bedrock).&lt;/li&gt;
&lt;li&gt;
&lt;code&gt;./start.sh validate&lt;/code&gt;: Runs a script to validate your &lt;code&gt;tools_config.json&lt;/code&gt; and other configuration files.
&lt;/li&gt;
&lt;/ul&gt;&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;
&lt;h2&gt;
  
  
  Scale Considerations
&lt;/h2&gt;


&lt;p&gt;Taking a LangGraph agent from a prototype to a production system requires careful consideration of scalability, reliability, and performance. Here are key areas to focus on:   &lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;
&lt;strong&gt;Memory and State Management&lt;/strong&gt;: For production, the default in-memory &lt;code&gt;MemorySaver&lt;/code&gt; is insufficient. You should switch to a persistent, scalable checkpointer backend like &lt;code&gt;langgraph.checkpoint.sqlite.SqliteSaver&lt;/code&gt; or a custom implementation using Redis or a PostgreSQL database. This ensures that conversation state is not lost between application restarts and can be shared across multiple instances.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Context Window Optimization&lt;/strong&gt;: LLMs have finite context windows. As conversations grow, you must implement strategies to prune the context sent to the model. This can be done by summarizing older messages, using a sliding window approach, or implementing more advanced techniques like vector-based retrieval of relevant past messages to keep the context concise and effective.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Concurrency and Asynchronous Execution&lt;/strong&gt;: To handle multiple users simultaneously, your application must be asynchronous. LangGraph is designed for this, but you need to deploy it on an ASGI server like Uvicorn, potentially managed by Gunicorn with multiple worker processes. Ensure all your custom tools and I/O operations are non-blocking to maximize throughput. &lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Deployment Model&lt;/strong&gt;: While Streamlit is excellent for demos, production applications should typically run the LangGraph agent as a standalone web service (e.g., using FastAPI). This service exposes an API that any frontend client can interact with, allowing you to scale the agent logic independently of the user interface and place it behind a load balancer.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Observability and Tracing&lt;/strong&gt;: Production systems require robust monitoring. Integrating a tool like LangSmith is highly recommended for tracing the execution of your graphs, debugging issues, and monitoring performance metrics like latency, cost, and token usage. Structured logging is also essential for tracking the agent’s behavior and diagnosing errors.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Robust Tool Handling&lt;/strong&gt;: Production-grade tools need to be resilient. Implement proper error handling, timeouts, and retry mechanisms (e.g., with exponential backoff) for any external API calls. This prevents a single failing tool from causing the entire agent to fail.&lt;/li&gt;
&lt;/ul&gt;
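&lt;p&gt;As one concrete example of the last point, a flaky tool call can be wrapped with retries and exponential backoff using only the standard library (a sketch; tune the attempts and delays, and narrow the caught exception types, for your environment):&lt;/p&gt;

```python
import time

def call_with_retries(fn, attempts=3, base_delay=0.5):
    """Call fn(), retrying on failure with exponential backoff (sketch)."""
    for attempt in range(attempts):
        try:
            return fn()
        except Exception:
            if attempt == attempts - 1:
                raise  # out of retries: let the agent handle/report the error
            time.sleep(base_delay * (2 ** attempt))  # 0.5s, 1s, 2s, ...
```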

&lt;h2&gt;
  
  
  Conclusion
&lt;/h2&gt;

&lt;p&gt;By following the core structure, you can build a powerful and extensible AI agent. The combination of &lt;code&gt;LangGraph&lt;/code&gt; for orchestration and MCP for tool integration provides a robust framework for creating agents that can interact with a wide variety of external systems. From here, you can add more tools, integrate different LLMs, and build more complex agent behaviors.&lt;/p&gt;

</description>
      <category>mcp</category>
      <category>langgraph</category>
      <category>ai</category>
    </item>
    <item>
      <title>Life of a Developer in the World of AI</title>
      <dc:creator>Amol Kavitkar</dc:creator>
      <pubDate>Sat, 14 Jun 2025 20:20:40 +0000</pubDate>
      <link>https://dev.to/amol_kavitkar/life-of-a-developer-in-the-world-of-ai-3i38</link>
      <guid>https://dev.to/amol_kavitkar/life-of-a-developer-in-the-world-of-ai-3i38</guid>
      <description>&lt;p&gt;The world of technology is evolving at an unprecedented pace, driven largely by advancements in Artificial Intelligence (AI). As this transformation continues, the role of developers is undergoing a fundamental shift. It's no longer just about writing code—it's about designing solutions, thinking holistically, and piecing together complex systems in a world where execution has become faster than ever. Let's explore how developers navigate this new reality.&lt;/p&gt;

&lt;h2&gt;
  
  
  Design First, Execute Faster
&lt;/h2&gt;

&lt;p&gt;In the AI-driven world, execution has become exponentially faster due to automation, prebuilt libraries, and powerful tools. However, the real challenge lies in having a larger vision—one that considers scalability, long-term impact, and the big picture. The ability to design first and execute faster is the key to success, but it demands a nuanced approach.&lt;br&gt;
While smaller components of a product can be designed and executed quickly using a &lt;strong&gt;fail-fast approach&lt;/strong&gt;, this method works effectively only when it is part of a broader, well-defined strategy. Developers must focus on designing products with a broader vision and scale, breaking them down into smaller, actionable steps. Each step can then be executed rapidly and iteratively, using fail-fast principles to refine and improve.&lt;br&gt;
This mindset ensures that while developers move quickly on smaller pieces, they don't lose sight of the larger goal. A well-designed roadmap allows for agility in execution without compromising the overall vision. It's not just about moving fast—it's about moving smart and ensuring that every small win contributes to the bigger picture.&lt;/p&gt;

&lt;h2&gt;
  
  
  Designing Solutions: The Developer's Primary Skill
&lt;/h2&gt;

&lt;p&gt;In the AI era, solution design has emerged as the most critical skill for developers. It's not just about writing code anymore; it's about understanding the bigger picture. Developers must focus on:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;
&lt;strong&gt;What to Design:&lt;/strong&gt; Identifying the core problem and defining clear, scalable solutions.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;How to Design:&lt;/strong&gt; Choosing the right models, frameworks, and architectural patterns that align with the problem.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Connecting the Dots:&lt;/strong&gt; Piecing together disparate components—such as AI models, cloud services, and APIs—to create a cohesive and functional system.&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;This shift to solution-oriented thinking requires developers to possess not only technical expertise but also the ability to collaborate with stakeholders, understand business needs, and anticipate future challenges.&lt;/p&gt;

&lt;h2&gt;
  
  
  The Ever-Relevant Foundations: Data Structures and Algorithms
&lt;/h2&gt;

&lt;p&gt;While the tools and technologies around AI are constantly evolving, some fundamentals never go out of style. Data structures and algorithms remain as relevant as ever. They form the bedrock of problem-solving and efficiency, enabling developers to optimize systems and handle large-scale data processing—an essential skill in the AI-driven world.&lt;br&gt;
Even as AI automates certain tasks, understanding these core concepts allows developers to build better, faster, and more innovative solutions.&lt;/p&gt;

&lt;h2&gt;
  
  
  The Role of Coding in the AI Era
&lt;/h2&gt;

&lt;p&gt;Surprisingly, coding itself is no longer the centerpiece of a developer's skillset. With advancements in low-code and no-code platforms, pre-trained AI models, and comprehensive libraries, the need to write extensive code has diminished. However, knowing programming languages is still crucial. Why? Because understanding code enables developers to:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Debug and fine-tune automated systems.&lt;/li&gt;
&lt;li&gt;Customize AI models and frameworks to fit specific requirements.&lt;/li&gt;
&lt;li&gt;Communicate effectively with tools and systems.&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;Thus, while coding is not the primary focus, being fluent in programming languages remains an essential skill for developers in the AI world.&lt;/p&gt;

&lt;h2&gt;
  
  
  The Myth of "Everything is MCP"
&lt;/h2&gt;

&lt;p&gt;The rise of AI and automation has brought a new protocol into focus—MCP (Model Context Protocol). While MCPs are powerful tools for abstraction and standardization, there's a growing misconception that everything can (or should) be converted into MCP. This is not the solution.&lt;br&gt;
Blindly migrating everything from microservices to MCP is not a sustainable approach. While MCPs work well for certain use cases, they are not a one-size-fits-all solution. &lt;strong&gt;Developers must adopt a hybrid understanding, recognizing the need for balance between microservices, APIs, and MCPs.&lt;/strong&gt;&lt;br&gt;
Instead of running away from one model to fully embrace another, the focus should be on understanding the specific requirements of a system and tailoring the architecture accordingly. A hybrid approach ensures flexibility, scalability, and efficiency, without forcing a rigid structure onto every problem.&lt;/p&gt;

&lt;h2&gt;
  
  
  Evolving as a Developer
&lt;/h2&gt;

&lt;p&gt;The life of a developer in the AI-driven world is about adaptability and evolution. It's about stepping back, thinking critically, and designing solutions that leverage the full potential of AI and other technologies.&lt;br&gt;
Execution is no longer the bottleneck; instead, having a larger vision, breaking it into actionable steps, and executing those steps with speed and agility is where the real value lies. At the same time, developers must avoid the trap of oversimplifying systems by forcing everything into a single model like MCP. Balance and understanding are key.&lt;br&gt;
In this fast-paced world, developers who master solution design, embrace the ever-relevant fundamentals, and approach architecture with flexibility and foresight will thrive. AI isn't replacing developers—it's empowering them to think bigger, innovate smarter, and build the future.&lt;/p&gt;

</description>
      <category>mcp</category>
      <category>developers</category>
      <category>ai</category>
      <category>programming</category>
    </item>
    <item>
      <title>How to Build and Run K3s on macOS with Multipass and k3d</title>
      <dc:creator>Amol Kavitkar</dc:creator>
      <pubDate>Thu, 08 May 2025 18:43:04 +0000</pubDate>
      <link>https://dev.to/amol_kavitkar/how-to-build-and-run-k3s-on-macos-with-multipass-and-k3d-334c</link>
      <guid>https://dev.to/amol_kavitkar/how-to-build-and-run-k3s-on-macos-with-multipass-and-k3d-334c</guid>
      <description>&lt;p&gt;&lt;strong&gt;Step by step guide for Developer’s using Multipass and k3d&lt;/strong&gt;&lt;/p&gt;

&lt;p&gt;K3s is a lightweight Kubernetes distribution designed for resource-constrained environments, making it ideal for local development or edge computing. However, building K3s from source on macOS isn’t straightforward, since K3s requires a Linux environment to compile. In this guide, we’ll walk through how to build K3s from source using Canonical’s Multipass, transfer the resulting image to your Mac, and then run it using &lt;code&gt;k3d&lt;/code&gt;.&lt;/p&gt;

&lt;blockquote&gt;
&lt;p&gt;Prerequisites:&lt;/p&gt;
&lt;/blockquote&gt;

&lt;p&gt;Before we begin, ensure that Homebrew is installed on your macOS system. You can install all required dependencies using Homebrew.&lt;/p&gt;

&lt;h2&gt;
  
  
  🔧 Install Multipass
&lt;/h2&gt;

&lt;p&gt;Multipass is a fast and easy way to spin up lightweight Linux virtual machines (VMs) on macOS. We’ll use it to create a Linux environment where K3s can be built. Install it via Homebrew:&lt;/p&gt;

&lt;p&gt;&lt;code&gt;brew install multipass&lt;/code&gt;&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Verify Installation:&lt;/strong&gt; After the installation is complete, you can verify it by checking the version:&lt;/p&gt;

&lt;p&gt;&lt;code&gt;multipass version&lt;/code&gt;&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Learn More :&lt;/strong&gt; For additional details about Multipass and alternative installation methods, visit the Multipass Installation Guide.&lt;/p&gt;

&lt;h2&gt;
  
  
  🔧 Install K3d
&lt;/h2&gt;

&lt;p&gt;Install k3d, a lightweight wrapper for running K3s clusters in Docker, using Homebrew:&lt;/p&gt;

&lt;p&gt;&lt;code&gt;brew install k3d&lt;/code&gt;&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Verify Installation:&lt;/strong&gt; After the installation is complete, you can verify it by checking the version:&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;~ k3d version
k3d version v5.8.3
k3s version v1.31.5-k3s1 (default)
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;
&lt;h2&gt;
  
  
  🚀 Launch Multipass and Set Up the Build Environment
&lt;/h2&gt;

&lt;p&gt;Now, let’s create a Linux VM using Multipass to serve as the build environment for K3s.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Launch a Multipass Instance:&lt;/strong&gt; Run the following command to create a Multipass VM named k3sServer with 2 CPUs, 3GB of memory, and 20GB of disk space.&lt;/p&gt;

&lt;p&gt;&lt;code&gt;multipass launch --name k3sServer --cpus 2 --memory 3G --disk 20G&lt;/code&gt;&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Login to the Multipass Instance:&lt;/strong&gt; Once the VM is up, log into it using:&lt;/p&gt;

&lt;p&gt;&lt;code&gt;multipass shell k3sServer&lt;/code&gt;&lt;/p&gt;

&lt;p&gt;You’ll now be inside the Multipass shell, greeted with the prompt:&lt;/p&gt;

&lt;p&gt;&lt;code&gt;ubuntu@k3sServer:~$&lt;/code&gt;&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Install Required Tools Inside VM:&lt;/strong&gt;&lt;/p&gt;

&lt;p&gt;To build K3s, you’ll need Docker and make installed inside the Multipass VM.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Install Docker:&lt;/strong&gt; Run the following commands to install Docker:&lt;/p&gt;

&lt;p&gt;&lt;code&gt;sudo apt update&lt;br&gt;
sudo apt install docker.io&lt;/code&gt;&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Docker Configuration (Optional):&lt;/strong&gt; If you encounter issues with &lt;code&gt;docker build&lt;/code&gt;, ensure the Docker daemon is properly configured. You may need to add a DNS configuration to &lt;code&gt;/etc/docker/daemon.json&lt;/code&gt;, for example:&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;~ cat /etc/docker/daemon.json
{
  "dns": ["172.17.0.1", "8.8.8.8"]
}
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;
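
&lt;p&gt;After changing &lt;code&gt;daemon.json&lt;/code&gt;, restart the Docker daemon so the new settings take effect (a typical sequence on an Ubuntu VM; adjust if your setup differs):&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;# Restart Docker to pick up the daemon.json changes
sudo systemctl restart docker
# Confirm the daemon is running again
sudo systemctl status docker --no-pager
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;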

&lt;p&gt;&lt;strong&gt;Install make:&lt;/strong&gt; Install the make utility:&lt;/p&gt;

&lt;p&gt;&lt;code&gt;sudo apt install make&lt;/code&gt;&lt;/p&gt;
&lt;h2&gt;
  
  
  🧪 Clone and Build K3s
&lt;/h2&gt;

&lt;p&gt;Now that the environment is ready, let’s proceed with building K3s.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Clone the K3s Repository:&lt;/strong&gt; Clone the official K3s GitHub repository:&lt;/p&gt;

&lt;p&gt;&lt;code&gt;git clone --depth 1 https://github.com/k3s-io/k3s.git&lt;/code&gt;&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Navigate to the Repository:&lt;/strong&gt; Go into the K3s directory:&lt;/p&gt;

&lt;p&gt;&lt;code&gt;cd k3s&lt;/code&gt;&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Prepare the Build Environment:&lt;/strong&gt; Run the following commands to download dependencies and generate the required files. Run them one at a time if you want to see what each step does:&lt;/p&gt;

&lt;p&gt;&lt;code&gt;sudo mkdir -p build/data &amp;amp;&amp;amp; make download &amp;amp;&amp;amp; make generate&lt;/code&gt;&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Build K3s:&lt;/strong&gt; Finally, build K3s with the following command (the &lt;code&gt;SKIP_VALIDATE=true&lt;/code&gt; flag skips some validation steps, making the build faster):&lt;/p&gt;

&lt;p&gt;&lt;code&gt;sudo SKIP_VALIDATE=true make&lt;/code&gt;&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Build Documentation:&lt;/strong&gt; Build instructions are available at the official repo:&lt;br&gt;
👉 &lt;a href="https://github.com/k3s-io/k3s/blob/master/BUILDING.md" rel="noopener noreferrer"&gt;https://github.com/k3s-io/k3s/blob/master/BUILDING.md&lt;/a&gt;&lt;/p&gt;
&lt;h2&gt;
  
  
  📦 Verify Docker Images
&lt;/h2&gt;

&lt;p&gt;Once the build is complete, you can verify the generated Docker images.&lt;br&gt;
&lt;strong&gt;List Docker Images:&lt;/strong&gt; Take note of the &lt;code&gt;rancher/k3s&lt;/code&gt; image you just built:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;sudo docker images
REPOSITORY                         TAG                          IMAGE ID       SIZE
rancher/k3s                        v1.33.0-k3s-c2efae3e-arm64   9ae40bc58195   227MB
k3s                                master                       622b36d40a23   1.25GB
...
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;Identify the K3s image (e.g., rancher/k3s:v1.33.0-k3s-c2efae3e-arm64) for the next step.&lt;/p&gt;

&lt;h2&gt;
  
  
  🔁 Transfer Image to macOS Host
&lt;/h2&gt;

&lt;p&gt;To use the built K3s image on your macOS host, transfer the Docker image from the Multipass instance to your Mac.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Save the Docker Image:&lt;/strong&gt; Inside the Multipass shell, save the K3s Docker image as a compressed &lt;code&gt;.tar.gz&lt;/code&gt; file:&lt;/p&gt;

&lt;p&gt;&lt;code&gt;sudo docker save rancher/k3s:v1.33.0-k3s-c2efae3e-arm64 | gzip &amp;gt; rk3s.tar.gz&lt;/code&gt;&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Exit the Multipass Shell:&lt;/strong&gt; Exit the Multipass instance with &lt;code&gt;exit&lt;/code&gt;, or use a separate terminal on your Mac for the next step.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Transfer the Image to macOS:&lt;/strong&gt; Use the &lt;code&gt;multipass transfer&lt;/code&gt; command to copy the tar file to your macOS host:&lt;/p&gt;

&lt;p&gt;&lt;code&gt;multipass transfer k3sServer:rk3s.tar.gz ~/Downloads&lt;/code&gt;&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Load the Image into Docker:&lt;/strong&gt; On your macOS host, load the image into Docker:&lt;/p&gt;

&lt;p&gt;&lt;code&gt;docker load -i ~/Downloads/rk3s.tar.gz&lt;/code&gt;&lt;/p&gt;

&lt;h2&gt;
  
  
  🌐 Create a K3s Cluster with k3d
&lt;/h2&gt;

&lt;p&gt;&lt;strong&gt;Create Cluster:&lt;/strong&gt; Now create a Kubernetes cluster using the built image you just loaded:&lt;/p&gt;

&lt;p&gt;&lt;code&gt;k3d cluster create --image rancher/k3s:v1.33.0-k3s-c2efae3e-arm64&lt;/code&gt;&lt;/p&gt;

&lt;p&gt;If you run into any issues, use the &lt;code&gt;--verbose&lt;/code&gt; or &lt;code&gt;--trace&lt;/code&gt; flags to get more details during cluster creation.&lt;/p&gt;
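
&lt;p&gt;For example, a debugging run might look like this (the cluster name &lt;code&gt;dev&lt;/code&gt; is illustrative):&lt;/p&gt;

&lt;p&gt;&lt;code&gt;k3d cluster create dev --image rancher/k3s:v1.33.0-k3s-c2efae3e-arm64 --verbose&lt;/code&gt;&lt;/p&gt;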

&lt;p&gt;&lt;strong&gt;Verify Cluster:&lt;/strong&gt;&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;~ k3d cluster list
NAME          SERVERS   AGENTS   LOADBALANCER
k3s-default   1/1       0/0      true
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;&lt;strong&gt;Check Pods:&lt;/strong&gt;&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;
~ kubectl get pods -A
NAMESPACE     NAME                                      READY   STATUS      RESTARTS   AGE
kube-system   coredns-697968c856-gbvc8                  1/1     Running     0          25h
kube-system   helm-install-traefik-crd-vlrvr            0/1     Completed   0          25h
kube-system   helm-install-traefik-j5tm8                0/1     Completed   1          25h
kube-system   local-path-provisioner-774c6665dc-pt44v   1/1     Running     0          25h
kube-system   metrics-server-6f4c6675d5-6j47v           1/1     Running     0          25h
kube-system   svclb-traefik-a74de106-kzddc              2/2     Running     0          25h
kube-system   traefik-c98fdf6fb-gc2f5                   1/1     Running     0          25h
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;h2&gt;
  
  
  ✅ Conclusion
&lt;/h2&gt;

&lt;p&gt;You’ve just built K3s from source in a Linux VM, transferred the custom Docker image to your macOS host, and used it to spin up a Kubernetes cluster via k3d. This setup gives you full control over the version and build of K3s you’re using — great for testing new features, debugging, or contributing upstream.&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;With K3s now running in a Kubernetes cluster on your macOS host, you are ready to:
• Experiment with lightweight Kubernetes for development or testing environments.
• Deploy containerized workloads and explore Kubernetes features.
• Build further expertise with Kubernetes, multi-cloud setups, or edge computing.
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;

&lt;p&gt;If you encounter any issues, refer to the official K3s Build Guide or the k3d Documentation.&lt;/p&gt;
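
&lt;p&gt;When you're done, you can reclaim resources by deleting the k3d cluster and the build VM (this assumes the default cluster name &lt;code&gt;k3s-default&lt;/code&gt; and the VM name &lt;code&gt;k3sServer&lt;/code&gt; used above):&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;# Delete the k3d cluster created earlier
k3d cluster delete k3s-default
# Delete and purge the Multipass build VM
multipass delete k3sServer
multipass purge
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;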

</description>
      <category>k3s</category>
      <category>kubernetes</category>
      <category>multipass</category>
      <category>k3d</category>
    </item>
  </channel>
</rss>
