In this tutorial, we’ll walk through the process of building a sophisticated, tool-using AI agent. We’ll leverage the power of LangGraph to orchestrate the agent’s logic and integrate external services via the MCP. This approach allows you to create agents that can interact with a wide range of external systems, from simple APIs to complex, stateful services.
Note: This is the core backbone of a project. The code snippets are reference working models that need some polishing and enhancement to fit your requirements, but you can use them as the core structure and hand them to Copilot, Windsurf, or Cursor to build out your project quickly.
The last section also covers considerations for taking this to production scale.
Core Concepts
Before we dive in, let’s clarify the key technologies we’ll be using:
LangGraph: A library for building stateful, multi-actor applications with LLMs. It allows us to define our agent’s behavior as a graph, making it easy to manage complex control flows.
MCP (Model Context Protocol): A standardized protocol for communication between an AI agent and external tools or services. MCP servers expose their functionality as a set of tools that the agent can discover and use.
Step 1: Project Structure
A well-organized project structure is key to a maintainable application. Here’s a recommended layout:
/your-project
|-- streamlit_app.py # Main application entry point (UI)
|-- chatbot.py # Core LangGraph agent logic
|-- mcp_tools.py # Logic for loading tools from MCP servers
|-- custom_tools.py # Definitions for simple, local tools
|-- tools_config.json # Configuration for all tools and MCP servers
|-- .env # Environment variables (API keys, etc.)
|-- … (other files)

Step 2: Configuring Tools with tools_config.json
This file is the central hub for defining your agent’s capabilities. It specifies which tools are available, how to connect to MCP servers, and any necessary configurations.
Here’s an example tools_config.json:
{
 "tools": [
   {
     "name": "tavily_search",
     "description": "Search the web using Tavily API",
     "type": "tavily",
     "enabled": true
   },
   {
     "name": "calculator",
     "description": "Performs basic mathematical operations",
     "type": "custom",
     "function": "calculator",
     "enabled": true
   }
 ],
 "mcp_servers": {
     "grafana-mcp": {
        "command": "docker",
        "args": [
           "run",
           "-i",
           "--rm",
           "-e", "GRAFANA_URL=https://grafana.xxxxxx.com/",
           "-e", "GRAFANA_API_KEY=<auth_key>",
           "mcp/grafana",
           "--transport=stdio"
         ],
         "type": "stdio"
      },
      "some-remote-service": {
         "url": "https://example.com/mcp-sse",
         "type": "sse"
      }
   }
} 
In this configuration:
- We define a tavily_search tool for web searches and a calculator tool, which will be a custom Python function.
- We specify two MCP servers:
  - grafana-mcp is run to get metrics information via stdio.
  - some-remote-service is a remote service accessed via Server-Sent Events (SSE).
In practice, we have two more MCP servers developed internally for product use cases. You can add as many MCP servers as you need, but keep the number small so that the model is not overloaded with tools and the context stays limited. Ultimately, it is all about giving the model the right context, both in the prompts and in the MCP tool definitions.
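To keep an eye on how many tools the model will see, it can help to summarize the config before loading anything. The following is a minimal sketch, not part of the original project: the `summarize_config` helper is a hypothetical name, and it assumes the config follows the structure shown above (with `type` defaulting to `stdio`, as the loader in the next step does).

```python
import json


def summarize_config(config: dict) -> dict:
    """Return the enabled tool names and the transport used by each MCP server."""
    enabled = [t["name"] for t in config.get("tools", []) if t.get("enabled", False)]
    transports = {
        name: server.get("type", "stdio")
        for name, server in config.get("mcp_servers", {}).items()
    }
    return {"enabled_tools": enabled, "mcp_transports": transports}


# Inline sample mirroring the structure of tools_config.json
sample = json.loads("""
{
  "tools": [
    {"name": "tavily_search", "type": "tavily", "enabled": true},
    {"name": "calculator", "type": "custom", "function": "calculator", "enabled": false}
  ],
  "mcp_servers": {
    "grafana-mcp": {"command": "docker", "args": ["run", "-i"], "type": "stdio"},
    "some-remote-service": {"url": "https://example.com/mcp-sse", "type": "sse"}
  }
}
""")

summary = summarize_config(sample)
print(summary)
```

In a real project you would pass the result of `json.load()` on tools_config.json instead of the inline sample.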
Step 3: Loading MCP Tools
Now, let’s write the Python code to load the tools from our MCP servers. We’ll use the langchain_mcp_adapters library for this, in mcp_tools.py:
# mcp_tools.py
import json
from langchain_mcp_adapters.tools import load_mcp_tools
from langchain_mcp_adapters.sessions import StdioConnection, SSEConnection
async def load_mcp_servers_from_config(config_path: str):
    """Load tools from every MCP server listed under "mcp_servers" in the config."""
    with open(config_path, 'r') as f:
        config = json.load(f)
    all_tools = []
    mcp_servers = config.get("mcp_servers", {})
    for name, server_config in mcp_servers.items():
        server_type = server_config.get("type", "stdio")
        connection = None
        # The connection types are TypedDicts; the "transport" key tells the
        # adapter which kind of session to open
        if server_type == "stdio":
            connection = StdioConnection(
                transport="stdio",
                command=server_config.get("command"),
                args=server_config.get("args", []),
            )
        elif server_type == "sse":
            connection = SSEConnection(transport="sse", url=server_config.get("url"))
        if connection:
            # No session is passed, so a new session is opened for each tool call
            tools = await load_mcp_tools(None, connection=connection)
            all_tools.extend(tools)
    return all_tools
This code reads the mcp_servers section of our config, creates the appropriate connection (StdioConnection or SSEConnection), and then uses load_mcp_tools to fetch the tools from each server.
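One thing the loop above does not handle is a server that is down or misconfigured: a single failure would abort loading for every server. A possible way to isolate failures is sketched below. It is an illustration under assumptions, not the author's implementation: `load_tools_tolerantly` and `fake_load_one` are hypothetical names, and the loader is injected as a callable so the sketch runs without any MCP server.

```python
import asyncio
from typing import Any, Awaitable, Callable


async def load_tools_tolerantly(
    servers: dict[str, dict],
    load_one: Callable[[str, dict], Awaitable[list[Any]]],
) -> tuple[list[Any], dict[str, str]]:
    """Load tools server by server, collecting errors instead of raising."""
    tools: list[Any] = []
    errors: dict[str, str] = {}
    for name, server_config in servers.items():
        try:
            tools.extend(await load_one(name, server_config))
        except Exception as exc:  # keep going if one server is unavailable
            errors[name] = f"{type(exc).__name__}: {exc}"
    return tools, errors


# Stand-in loader (hypothetical) so the sketch runs without real servers
async def fake_load_one(name: str, server_config: dict) -> list[str]:
    if server_config.get("type") == "sse":
        raise ConnectionError("server unreachable")
    return [f"{name}:query_metrics"]


servers = {
    "grafana-mcp": {"type": "stdio"},
    "some-remote-service": {"type": "sse"},
}
tools, errors = asyncio.run(load_tools_tolerantly(servers, fake_load_one))
print(tools)
print(errors)
```

In the real loader, `load_one` would build the connection and call load_mcp_tools; the collected errors can then be logged or shown in the UI.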
Handling Sync and Async MCP Tools
Tools loaded from MCP servers might be asynchronous, but LangGraph may need to invoke them in a synchronous context. To handle this, we wrap each MCP tool in a StructuredTool that provides both synchronous (func) and asynchronous (coroutine) entry points.
First, we need a synchronous function to load the tools, as the chatbot’s initialization is synchronous. We create a simple wrapper using asyncio.run():
# mcp_tools.py
import asyncio
# … (async load_mcp_servers_from_config function)
def load_mcp_servers_sync(config_path: str):
   """Synchronous wrapper for loading MCP servers."""
   return asyncio.run(load_mcp_servers_from_config(config_path))
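One caveat with asyncio.run() is that it raises a RuntimeError when called from a thread that already has a running event loop (for example, inside an async web framework or a notebook). The sketch below shows one possible workaround, assuming it is acceptable to block the caller while the coroutine runs in a helper thread. The name `run_coroutine_sync` is hypothetical and not part of the original code.

```python
import asyncio
import threading


def run_coroutine_sync(coro):
    """Run a coroutine to completion from synchronous code."""
    try:
        asyncio.get_running_loop()
    except RuntimeError:
        # No loop is running in this thread, so asyncio.run() is safe
        return asyncio.run(coro)

    # A loop is already running here: run the coroutine in a helper thread
    # with its own event loop, and block until it finishes
    result = {}

    def worker():
        try:
            result["value"] = asyncio.run(coro)
        except BaseException as exc:
            result["error"] = exc

    thread = threading.Thread(target=worker)
    thread.start()
    thread.join()
    if "error" in result:
        raise result["error"]
    return result["value"]


async def fetch_answer():
    await asyncio.sleep(0)
    return 42


async def caller_inside_loop():
    # Simulates synchronous code being invoked while a loop is running
    return run_coroutine_sync(fetch_answer())


plain_result = run_coroutine_sync(fetch_answer())
nested_result = asyncio.run(caller_inside_loop())
```

Note that the second path blocks the already-running loop until the helper thread finishes, so it is a fallback rather than something to rely on in hot paths.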
Step 4: Building the LangGraph Agent
This is where we define the core logic of our agent. In chatbot.py, we’ll create a LangGraphChatbot class that builds and runs the graph.
First, define the state of our graph:
# chatbot.py
from typing import Annotated, TypedDict
from langgraph.graph.message import add_messages
class State(TypedDict):
    messages: Annotated[list, add_messages]
Next, create the LangGraphChatbot class. The __init__ method will set up the LLM and load all the tools (both custom and MCP).
We showcase Ollama as a local LLM here, but this can be extended to any chat model. Further down we switch between Ollama and AWS Bedrock, and the same pattern can easily be adapted to OpenAI.
# chatbot.py
from langchain_ollama import ChatOllama
from langgraph.prebuilt import ToolNode
from custom_tools import CUSTOM_FUNCTIONS # Assuming you have this map
from mcp_tools import load_mcp_servers_from_config
class LangGraphChatbot:
    def __init__(self, tools_config_path: str):
        self.llm = ChatOllama(model="llama3") # Or any other LLM
        # load tools will load tools from standard tools defined such as tavily search 
        # also it will load all mcp tools from mcp servers
        self.tools = self._load_all_tools(tools_config_path)
        self.llm_with_tools = self.llm.bind_tools(self.tools)
        self.graph = self._build_graph()
    def _load_all_tools(self, config_path: str):
        # … (logic to load custom tools and call load_mcp_servers_from_config)
        pass
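The import above assumes that custom_tools.py exposes a CUSTOM_FUNCTIONS map, which the `function` field in tools_config.json can be looked up in. The article does not show that file, so the following is only a sketch of what it might contain: the calculator implementation (a small arithmetic evaluator built on the standard ast module rather than eval) is an assumption for illustration.

```python
import ast
import operator

# Supported binary operators for the hypothetical calculator
_BINARY_OPS = {
    ast.Add: operator.add,
    ast.Sub: operator.sub,
    ast.Mult: operator.mul,
    ast.Div: operator.truediv,
}


def _evaluate(node: ast.AST):
    """Recursively evaluate a parsed arithmetic expression."""
    if (
        isinstance(node, ast.Constant)
        and isinstance(node.value, (int, float))
        and not isinstance(node.value, bool)
    ):
        return node.value
    if isinstance(node, ast.BinOp) and type(node.op) in _BINARY_OPS:
        return _BINARY_OPS[type(node.op)](_evaluate(node.left), _evaluate(node.right))
    if isinstance(node, ast.UnaryOp) and isinstance(node.op, ast.USub):
        return -_evaluate(node.operand)
    raise ValueError("Unsupported expression")


def calculator(expression: str) -> str:
    """Evaluate a basic arithmetic expression such as '2 + 3 * 4'."""
    try:
        tree = ast.parse(expression, mode="eval")
        return str(_evaluate(tree.body))
    except (ValueError, SyntaxError, ZeroDivisionError) as exc:
        return f"Error: {exc}"


# Map from the "function" field in tools_config.json to the Python callable
CUSTOM_FUNCTIONS = {"calculator": calculator}

print(CUSTOM_FUNCTIONS["calculator"]("2 + 3 * 4"))
```

Returning an error string instead of raising lets the LLM see what went wrong and try again with a corrected expression.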
Next, in chatbot.py, we load the tools and wrap them:
# chatbot.py
import asyncio
from langchain_core.tools import StructuredTool
from mcp_tools import load_mcp_servers_sync
class LangGraphChatbot:
 # …
    def _load_mcp_tools(self, config_path: str):
       mcp_tools = load_mcp_servers_sync(config_path)
       return [self._create_wrapped_tool(tool) for tool in mcp_tools]
    def _create_wrapped_tool(self, tool):
       """Create a tool that works in both sync and async contexts."""
       return StructuredTool(
           name=tool.name,
           description=tool.description,
           args_schema=tool.args_schema,
           func=self._make_sync_wrapper(tool), # For sync calls
           coroutine=self._make_async_wrapper(tool) # For async calls
       )
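    def _make_sync_wrapper(self, tool):
       def sync_wrapper(**kwargs):
           # Every LangChain tool has an `invoke` attribute, so check for a sync
           # implementation (`func`) instead; MCP tools are usually async-only
           if getattr(tool, "func", None) is None:
               # Run the async method in a new event loop. Note that asyncio.run()
               # fails if an event loop is already running in this thread.
               return asyncio.run(tool.ainvoke(kwargs))
           return tool.invoke(kwargs)
       return sync_wrapper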
    def _make_async_wrapper(self, tool):
       async def async_wrapper(**kwargs):
           # Directly call the async method
           return await tool.ainvoke(kwargs)
       return async_wrapper
This setup ensures that no matter how LangGraph decides to call a tool, our wrapper will execute it correctly, either by calling its native method or by bridging the sync/async gap.
Now, let’s build the graph itself. We’ll define a more advanced node for calling the LLM, which injects a system prompt to guide the model’s behavior. We will also integrate conversation memory.
The LLM-Calling Node with a System Prompt
Instead of a simple LLM call, we’ll create a dedicated function that dynamically adds a system message to the conversation. This message instructs the LLM on how to behave, especially regarding tool usage, ensuring it uses your specialized MCP tools when needed.
# chatbot.py
class LangGraphChatbot:
    # ... (previous code)
    def _tool_calling_llm(self, state: State):
        """Node function for calling LLM with a system prompt for tools."""
        messages = state["messages"]
        # Create a list of available tool names for the system message
        tool_names = [tool.name for tool in self.tools]
        # Add system message to explicitly instruct tool usage
        system_msg = (
            "You are an AI assistant with access to specialized tools.\n"
            "When asked about specific resources, you MUST use the appropriate tool.\n"
            f"Available tools: {', '.join(tool_names)}"
        )
        # Combine system message with conversation history
        messages_with_prompt = [("system", system_msg)] + messages
        # Invoke the LLM with the modified messages
        response = self.llm_with_tools.invoke(messages_with_prompt)
        return {"messages": [response]}
Building the Graph with Memory
With our new LLM-calling node, we can build the graph. The structure remains similar, but we’ll use our new _tool_calling_llm function for the llm node. Crucially, we’ll also add a checkpointer during compilation to enable conversation memory.
# chatbot.py
from langgraph.graph import StateGraph, START, END
from langgraph.prebuilt import ToolNode, tools_condition
from langgraph.checkpoint.memory import MemorySaver
class LangGraphChatbot:
    # ... (previous code)
    def _build_graph(self):
        graph_builder = StateGraph(State)
        # Add our custom LLM node and the standard tool node
        graph_builder.add_node("llm", self._tool_calling_llm)
        graph_builder.add_node("tools", ToolNode(self.tools))
        graph_builder.add_edge(START, "llm")
        graph_builder.add_conditional_edges(
            "llm",
            tools_condition,
            {
                "tools": "tools",
                "__end__": END
            }
        )
        graph_builder.add_edge("tools", "llm")
        # Compile the graph with a memory checkpointer
        self.memory = MemorySaver()
        return graph_builder.compile(checkpointer=self.memory)
This updated graph now intelligently guides the LLM with a system prompt and preserves conversation history across turns, making the agent much more robust and stateful.
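Because the checkpointer keeps the full history, the message list sent to the LLM grows with every turn. A simple sliding window is one way to bound it. The sketch below is an illustration under assumptions: `trim_history` is a hypothetical helper, and it works on plain (role, content) tuples rather than LangChain message objects. It also drops leading non-user messages so the window never starts with a tool result that has lost its matching tool call.

```python
def trim_history(messages: list, max_messages: int = 6) -> list:
    """Keep at most the last `max_messages` items, starting at a user turn."""
    if max_messages < 1:
        raise ValueError("max_messages must be at least 1")
    window = list(messages[-max_messages:])
    # Drop leading messages until the window starts with a user turn
    while window and window[0][0] != "user":
        window.pop(0)
    return window


history = [
    ("user", "q1"),
    ("assistant", "calling tool"),
    ("tool", "result"),
    ("assistant", "a1"),
    ("user", "q2"),
    ("assistant", "a2"),
]
trimmed = trim_history(history, max_messages=4)
print(trimmed)
```

In the agent, a function like this could be applied to state["messages"] inside the LLM node before the system prompt is prepended, after adapting the role check to the message objects in use.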
Step 5: Integrating Multiple LLM Providers (AWS Bedrock and Ollama)
A powerful feature of this architecture is the ability to easily switch between different LLM providers. This allows you to use powerful cloud-based models from services like AWS Bedrock for production scenarios, while leveraging local models via Ollama for development, testing, and offline use. This flexibility is achieved by reading an environment variable to decide which LLM to initialize.
Let’s add this logic to our LangGraphChatbot’s __init__ method:
# chatbot.py
import os
from dotenv import load_dotenv
from langchain_aws import ChatBedrockConverse
from langchain_ollama import ChatOllama
class LangGraphChatbot:
    def __init__(self, tools_config_path: str):
        load_dotenv()
        # Determine which LLM to use from environment variables
        llm_provider = os.getenv("LLM_PROVIDER", "ollama").lower()
        if llm_provider == "bedrock":
             # Initialize AWS Bedrock LLM
             model_id = os.getenv("BEDROCK_MODEL_ID", "anthropic.claude-3-sonnet-20240229-v1:0")
             self.llm = ChatBedrockConverse(
                 model_id=model_id,
                 region_name=os.getenv("AWS_REGION"),
                 credentials_profile_name=os.getenv("AWS_CREDENTIALS_PROFILE")
             )
             print(f"🤖 Initialized AWS Bedrock with model: {model_id}")
        elif llm_provider == "ollama":
             # Initialize Ollama LLM for local models
             ollama_model = os.getenv("OLLAMA_MODEL", "llama3")
             self.llm = ChatOllama(
                 model=ollama_model,
                 base_url=os.getenv("OLLAMA_BASE_URL", "http://localhost:11434")
             )
             print(f"🤖 Initialized Ollama with model: {ollama_model}")
        else:
             raise ValueError(f"Unsupported LLM provider: {llm_provider}")
        self.tools = self._load_all_tools(tools_config_path)
        self.llm_with_tools = self.llm.bind_tools(self.tools)
        self.graph = self._build_graph()
     # … rest of the class
With this setup, you can switch from a local llama3 model to a powerful Claude 3 Sonnet model on Bedrock by simply changing the LLM_PROVIDER environment variable from "ollama" to "bedrock".
Step 6: Creating the UI with Streamlit
Finally, let’s create a simple UI in streamlit_app.py to interact with our agent.
# streamlit_app.py
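import uuid
import streamlit as st
from chatbot import LangGraphChatbot
st.title("My AI Agent")
# Build the chatbot once; Streamlit reruns this script on every interaction,
# which would otherwise recreate the agent and its in-memory checkpointer
@st.cache_resource
def get_chatbot():
    return LangGraphChatbot(tools_config_path="tools_config.json")
chatbot = get_chatbot()
if "messages" not in st.session_state:
    st.session_state.messages = []
if "thread_id" not in st.session_state:
    # The checkpointer stores history per thread_id; use one per browser session
    st.session_state.thread_id = str(uuid.uuid4())
for message in st.session_state.messages:
    with st.chat_message(message["role"]):
        st.markdown(message["content"])
if prompt := st.chat_input("What would you like to do?"):
    st.session_state.messages.append({"role": "user", "content": prompt})
    with st.chat_message("user"):
        st.markdown(prompt)
    with st.chat_message("assistant"):
        # A graph compiled with a checkpointer requires a thread_id in the config
        config = {"configurable": {"thread_id": st.session_state.thread_id}}
        response = chatbot.graph.invoke({"messages": [("user", prompt)]}, config=config)
        assistant_response = response["messages"][-1].content
        st.markdown(assistant_response)
    st.session_state.messages.append({"role": "assistant", "content": assistant_response})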
This code creates a basic chat interface. When the user enters a prompt, it’s sent to our LangGraph agent, and the response is displayed on the screen.
Step 7: Deploying with Docker and a Start Script
To streamline deployment, we’ll use Docker Compose to manage our application’s services, including the Streamlit UI, the LLM server, and multiple MCP servers. A powerful start.sh script will serve as the main entry point for managing the entire stack.
  
  
A Comprehensive docker-compose.yml
This docker-compose.yml defines the Streamlit app, a local Ollama server, and an example MCP server configured to communicate via stdio; further MCP servers can be added in the same way. We use Docker Compose profiles to selectively run only the services we need.
version: '3.8'
services:
  # --- Application UI and Local LLM --- 
  streamlit_app:
    build: .
    container_name: ai_agent_ui
    ports: ["8501:8501"]
    environment:
      - LLM_PROVIDER=ollama
      - OLLAMA_BASE_URL=http://ollama:11434
    depends_on: [ollama]
    volumes: [.:/app]
    networks: [mcp-network]
    profiles: ["app"]
  ollama:
    image: ollama/ollama:latest
    container_name: ollama
    ports: ["11434:11434"]
    volumes: [./ollama_data:/root/.ollama]
    networks: [mcp-network]
    profiles: ["app"]
  # --- Example MCP Tool Server (stdio communication) ---
  grafana-mcp:
    image: mcp/grafana:latest
    container_name: grafana-mcp
    environment:
      - GRAFANA_URL=https://grafana.example.com/
      - GRAFANA_API_KEY=${GRAFANA_API_KEY} # Loaded from .env file
    stdin_open: true  # Required for stdio MCP
    tty: true         # Required for stdio MCP
    profiles: ["mcp"]
    networks: [mcp-network]
networks:
  mcp-network:
    driver: bridge
  
  
The start.sh Management Script
Instead of running docker-compose commands manually, we use a start.sh script that provides a simple, high-level interface for managing the application. It handles environment checks, configuration validation, and service management.
Make the script executable with chmod +x start.sh.
Key Commands:
- ./start.sh web: The primary command. It checks your environment, validates configs, and starts the Streamlit web UI.
- ./start.sh chat: Starts the interactive terminal-based chat client.
- ./start.sh mcp-start: Starts all MCP server containers defined in the docker-compose.yml under the mcp profile.
- ./start.sh mcp-stop: Stops all running MCP server containers.
- ./start.sh mcp-status: Checks the status and health of the MCP servers.
- ./start.sh mcp-test: Runs communication tests against the MCP servers to ensure they are responsive.
- ./start.sh check: Checks system requirements, such as the availability of your configured LLM provider (Ollama or AWS Bedrock).
- ./start.sh validate: Runs a script to validate your tools_config.json and other configuration files.
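The validation script behind the validate command is not shown in this article. As a rough idea of what it could check, here is a minimal sketch in Python. The function name `validate_tools_config` and the specific rules are assumptions derived from the tools_config.json structure shown earlier, not the actual script.

```python
def validate_tools_config(config: dict) -> list[str]:
    """Return a list of human-readable problems; an empty list means the config looks OK."""
    problems = []
    for i, tool in enumerate(config.get("tools", [])):
        for key in ("name", "type"):
            if key not in tool:
                problems.append(f"tools[{i}]: missing '{key}'")
        if tool.get("type") == "custom" and "function" not in tool:
            problems.append(f"tools[{i}]: custom tool needs 'function'")
    for name, server in config.get("mcp_servers", {}).items():
        transport = server.get("type", "stdio")
        if transport == "stdio" and not server.get("command"):
            problems.append(f"mcp_servers.{name}: stdio server needs 'command'")
        elif transport == "sse" and not server.get("url"):
            problems.append(f"mcp_servers.{name}: sse server needs 'url'")
        elif transport not in ("stdio", "sse"):
            problems.append(f"mcp_servers.{name}: unknown type '{transport}'")
    return problems


good_config = {
    "tools": [{"name": "calculator", "type": "custom", "function": "calculator"}],
    "mcp_servers": {"grafana-mcp": {"command": "docker", "args": ["run"], "type": "stdio"}},
}
bad_config = {
    "tools": [{"name": "calculator", "type": "custom"}],
    "mcp_servers": {"broken": {"type": "sse"}},
}
print(validate_tools_config(good_config))
print(validate_tools_config(bad_config))
```

A start script could run a check like this on the parsed tools_config.json and exit with a non-zero status if the returned list is not empty.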
Scale Considerations
Taking a LangGraph agent from a prototype to a production system requires careful consideration of scalability, reliability, and performance. Here are key areas to focus on:
- Memory and State Management: For production, the default in-memory MemorySaver is insufficient. You should switch to a persistent, scalable checkpointer backend like langgraph.checkpoint.sqlite.SqliteSaver or a custom implementation using Redis or a PostgreSQL database. This ensures that conversation state is not lost between application restarts and can be shared across multiple instances.
- Context Window Optimization: LLMs have finite context windows. As conversations grow, you must implement strategies to prune the context sent to the model. This can be done by summarizing older messages, using a sliding window approach, or implementing more advanced techniques like vector-based retrieval of relevant past messages to keep the context concise and effective.
- Concurrency and Asynchronous Execution: To handle multiple users simultaneously, your application must be asynchronous. LangGraph is designed for this, but you need to deploy it on an ASGI server like Uvicorn, potentially managed by Gunicorn with multiple worker processes. Ensure all your custom tools and I/O operations are non-blocking to maximize throughput.
- Deployment Model: While Streamlit is excellent for demos, production applications should typically run the LangGraph agent as a standalone web service (e.g., using FastAPI). This service exposes an API that any frontend client can interact with, allowing you to scale the agent logic independently of the user interface and place it behind a load balancer.
- Observability and Tracing: Production systems require robust monitoring. Integrating a tool like LangSmith is highly recommended for tracing the execution of your graphs, debugging issues, and monitoring performance metrics like latency, cost, and token usage. Structured logging is also essential for tracking the agent’s behavior and diagnosing errors.
- Robust Tool Handling: Production-grade tools need to be resilient. Implement proper error handling, timeouts, and retry mechanisms (e.g., with exponential backoff) for any external API calls. This prevents a single failing tool from causing the entire agent to fail.
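The timeout and retry advice from the last point can be sketched as a small async helper. This is an illustration under assumptions rather than a production implementation: `call_with_retry` and `flaky_tool` are hypothetical names, and the set of exceptions worth retrying depends on the tool being called.

```python
import asyncio
import random


async def call_with_retry(func, *args, retries=3, base_delay=0.5, timeout=10.0, **kwargs):
    """Await func(*args, **kwargs) with a per-attempt timeout and exponential backoff."""
    for attempt in range(retries + 1):
        try:
            return await asyncio.wait_for(func(*args, **kwargs), timeout=timeout)
        except (asyncio.TimeoutError, OSError):
            # ConnectionError is a subclass of OSError, so it is retried too
            if attempt == retries:
                raise
            # Exponential backoff with a little jitter to avoid synchronized retries
            delay = base_delay * (2 ** attempt) + random.uniform(0, base_delay / 10)
            await asyncio.sleep(delay)


attempts = {"count": 0}


# Stand-in for an external tool call that fails twice before succeeding
async def flaky_tool(query: str) -> str:
    attempts["count"] += 1
    if attempts["count"] < 3:
        raise ConnectionError("temporary failure")
    return f"result for {query}"


result = asyncio.run(call_with_retry(flaky_tool, "cpu usage", base_delay=0.01))
print(result, attempts["count"])
```

A helper like this could be used inside the async wrapper around each MCP tool, so that a slow or briefly unavailable server results in a retried call instead of a failed agent run.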
Conclusion
By following this core structure, you can build a powerful and extensible AI agent. The combination of LangGraph for orchestration and MCP for tool integration provides a robust framework for creating agents that can interact with a wide variety of external systems. From here, you can add more tools, integrate different LLMs, and build more complex agent behaviors.
 

 
    