Gao Dalie (Ilyass)
Long Term Memory + RAG + MCP + LangGraph = The Key To Powerful Agentic AI

In this story, I give you a quick tutorial on building a multi-agent chatbot with LangGraph, MCP, RAG, and long-term memory — a powerful agent for your business or personal use.

This AI agent is the most powerful one I have ever built. It uses RAG to answer questions by looking up information in dictionaries and other documents.

Just as we answer difficult questions by looking up information in books or on the internet, the MCP server serves as the “hands and feet” of the AI. To use a human analogy: even if the brain (the AI agent) thinks, “Get me that book,” the book cannot be retrieved unless the hand (MCP) actually moves. The MCP server acts as a bridge that converts the AI’s “thoughts” into actual “actions.”

One of the big problems with agents is that they don’t remember. My earlier agents worked fine at first, but the more I used them, the worse they got: they didn’t learn from past mistakes and kept repeating them. With this powerful AI agent, we solve those major pain points.

If this is your first time here, I highly recommend checking out my previous stories. I created a video about the latest AI technology, which became a big hit in the AI community.

So, let me give you a quick demo of a live chatbot to show you what I mean.

Check out the video demo.

Before asking the question, I will load a saved memory containing a past conversation, then ask the chatbot: “Find the latest information about Large Language Models.”

If you take a look at how the agent generates its output, you’ll see that the system we are building uses Google’s generative AI model (Gemini). It is a multi-AI-agent system in which a web search agent and a file operation agent work together autonomously under a manager agent that interacts with the user and issues instructions to the specialised agents.

Just as humans work in teams, AI agents also work together, utilising their respective areas of expertise.

The three agents featured in this system are:

Supervisor (Manager): The brains of the team. It understands instructions from users, plans the entire task, decides which worker should do what and when, and gives precise instructions.

Web Surfer (Worker): A professional information gatherer. Searches the web using keywords instructed by the Supervisor, gathers the necessary information, and reports it.

File Operator (Worker): A master of organisation and record keeping. Follows the Supervisor’s instructions to write information to files and read from existing files.

By having these agents work together, complex tasks that combine web searches and file operations, such as “find information about any product and compile it into a CSV file,” can be automatically executed with just a single user command.

In this example, the tasks performed by the specialised agents are limited to web searches and file operations. Still, by increasing the number of specialised agents and assigning them personas and tools, you can flexibly expand functionality to match the use case.

For example, by utilising MCP, you can implement additional worker agents, allowing for the automation of more complex and practical tasks.

Let’s start coding:

Let us now explore step by step how to create an agent with LangGraph, RAG, MCP, and long-term memory. First, we install the libraries that support the model with a pip install of the requirements.

I would like to inform you that the code I shared here is only a part of my code. If you would like the full folder, you can find it on my Patreon. This code took me a considerable amount of time, and this agent is the most powerful and advanced agent I have built. All the techniques are in my folder.


pip install -r requirements.txt
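Based on the imports used later in this post, a requirements.txt along these lines should work; this is a sketch inferred from the code, so pin versions as your environment requires:

```text
streamlit
python-dotenv
langchain-core
langchain-google-genai
langgraph
langchain-mcp-adapters
```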

The next step is the usual one: We will import the relevant libraries, the significance of which will become evident as we proceed and perform some basic configuration.

import streamlit as st
import json
import os
import logging
import uuid
import asyncio
import warnings
from dotenv import load_dotenv, find_dotenv
from typing import List, TypedDict

# LangChain and LangGraph components
from langchain_core.prompts import ChatPromptTemplate, MessagesPlaceholder
from langchain_core.messages import HumanMessage, AIMessage, BaseMessage, ToolMessage, messages_to_dict, messages_from_dict
from langchain_core.utils.function_calling import convert_to_openai_function
from langchain_google_genai import ChatGoogleGenerativeAI
from langgraph.prebuilt import ToolNode
from langgraph.graph import StateGraph, START, END
from langgraph.checkpoint.memory import MemorySaver
from langchain_mcp_adapters.client import MultiServerMCPClient
mcp_config.json

Agents use “tools” to perform specific “actions” such as web searches or file operations. This system accesses tools via a mechanism called the Model Context Protocol (MCP). The mcp_config.json file is a configuration file that defines which tool servers to launch and how.

Create a file named mcp_config.json directly under the project folder and write the following content in it.

web-search: Tool server settings for performing web searches. Launches the Playwright MCP server via npx.

file-system: Tool server settings for reading and writing files.

Change the /path/to/your/project/multi-agent-system/output part of args to suit your environment. This is the absolute path of the folder where the file operator agent is allowed to read and write files. For example, create an output folder in the project and specify its path. Please note that if you specify the path incorrectly, file operations will not be possible.

{
    "mcpServers": {
      "web-search": {
        "command": "npx",
        "args": [
          "@playwright/mcp@latest"
        ],
        "transport": "stdio"
      },
      "file-system": {
        "command": "npx",
        "args": [
          "-y",
          "@modelcontextprotocol/server-filesystem",
          "/path/to/your/project/multi-agent-system/output"
        ],
        "transport": "stdio"
      }
    }
}
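As a quick sanity check, here is a hedged sketch of how this config might be read and handed to the MCP client. The load_mcp_servers helper is mine, not part of the original code, and the client call is shown commented out because it would actually spawn the configured npx servers:

```python
import json

def load_mcp_servers(path: str = "mcp_config.json") -> dict:
    """Read the mcpServers section of the config file written above."""
    with open(path, "r", encoding="utf-8") as f:
        return json.load(f)["mcpServers"]

# Handing the definitions to the MCP adapter might look like this
# (left commented out: it spawns the configured npx servers):
#
# from langchain_mcp_adapters.client import MultiServerMCPClient
# client = MultiServerMCPClient(load_mcp_servers())
# tools = await client.get_tools()
```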

I wrote sanitize_schema to walk dictionaries and lists recursively, remove unwanted keys like additionalProperties and $schema, and normalise a type field that may be a list by selecting the first non-null value and uppercasing it, applying the same cleaning to every nested value. I added save_conversation(session_id, messages), which guards against empty inputs, builds a path under CONVERSATION_HISTORY_DIR, converts message objects to plain dictionaries with messages_to_dict, and writes them as UTF-8 JSON with ensure_ascii=False and indent=2.

I implemented load_conversation(session_id) to return an empty list if the file is missing; otherwise it loads the JSON and turns it back into message objects with messages_from_dict, returning an empty list on JSONDecodeError or TypeError to fail gracefully.

I built list_conversations() to scan the directory for .json files, pull each file’s modification time, load its messages, and pick the first human message that isn’t an internal instruction to use as a short title (truncated with an ellipsis at 40 characters). It collects {id, title, mtime} entries while skipping files that raise errors, and finally sorts the list by modification time, descending. I also added delete_conversation(session_id) to safely remove the corresponding JSON file if it exists.

# Directory for saved conversations (created during setup in the full code;
# shown here so the helpers below are self-contained)
CONVERSATION_HISTORY_DIR = "conversation_history"
os.makedirs(CONVERSATION_HISTORY_DIR, exist_ok=True)

def sanitize_schema(item):
    """Sanitize MCP tool schema for LangChain compatibility"""
    if isinstance(item, dict):
        item.pop('additionalProperties', None)
        item.pop('$schema', None)
        if 'type' in item and isinstance(item['type'], list):
            non_null_types = [t for t in item['type'] if str(t).upper() != 'NULL']
            item['type'] = str(non_null_types[0]).upper() if non_null_types else None
        for key, value in item.items():
            item[key] = sanitize_schema(value)
    elif isinstance(item, list):
        return [sanitize_schema(i) for i in item]
    return item

def save_conversation(session_id: str, messages: List[BaseMessage]):
    """Save conversation to JSON file"""
    if not session_id or not messages:
        return
    file_path = os.path.join(CONVERSATION_HISTORY_DIR, f"{session_id}.json")
    with open(file_path, "w", encoding="utf-8") as f:
        json.dump(messages_to_dict(messages), f, ensure_ascii=False, indent=2)

def load_conversation(session_id: str) -> List[BaseMessage]:
    """Load conversation from JSON file"""
    file_path = os.path.join(CONVERSATION_HISTORY_DIR, f"{session_id}.json")
    if not os.path.exists(file_path):
        return []
    with open(file_path, "r", encoding="utf-8") as f:
        try:
            data = json.load(f)
            return messages_from_dict(data)
        except (json.JSONDecodeError, TypeError):
            return []

def list_conversations() -> List[dict]:
    """Get list of saved conversations"""
    conversations = []
    for filename in os.listdir(CONVERSATION_HISTORY_DIR):
        if filename.endswith(".json"):
            session_id = filename[:-5]
            file_path = os.path.join(CONVERSATION_HISTORY_DIR, filename)
            try:
                mtime = os.path.getmtime(file_path)
                messages = load_conversation(session_id)
                # Get first user message as conversation title
                first_user_message = next(
                    (m.content for m in messages
                     if isinstance(m, HumanMessage) and m.additional_kwargs.get("role") != "internal_instruction"),
                    "New conversation"
                )
                title = first_user_message[:40] + "..." if len(first_user_message) > 40 else first_user_message
                conversations.append({"id": session_id, "title": title, "mtime": mtime})
            except Exception:
                continue
    conversations.sort(key=lambda x: x["mtime"], reverse=True)
    return conversations

def delete_conversation(session_id: str):
    """Delete conversation file"""
    file_path = os.path.join(CONVERSATION_HISTORY_DIR, f"{session_id}.json")
    if os.path.exists(file_path):
        os.remove(file_path)
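To make the schema cleaning concrete, here is a small self-contained demo of its behaviour. It repeats sanitize_schema so the snippet runs on its own, and the sample schema is a hypothetical Playwright-style tool input, not one reported by a real server:

```python
def sanitize_schema(item):
    # Same logic as above: strip keys that break the LangChain/Gemini tool
    # conversion and collapse list-valued "type" fields to one uppercase name.
    if isinstance(item, dict):
        item.pop('additionalProperties', None)
        item.pop('$schema', None)
        if 'type' in item and isinstance(item['type'], list):
            non_null_types = [t for t in item['type'] if str(t).upper() != 'NULL']
            item['type'] = str(non_null_types[0]).upper() if non_null_types else None
        for key, value in item.items():
            item[key] = sanitize_schema(value)
    elif isinstance(item, list):
        return [sanitize_schema(i) for i in item]
    return item

# Hypothetical tool schema as an MCP server might report it:
raw = {
    "$schema": "http://json-schema.org/draft-07/schema#",
    "type": "object",
    "additionalProperties": False,
    "properties": {"url": {"type": ["string", "null"]}},
}
clean = sanitize_schema(raw)
# clean == {"type": "object", "properties": {"url": {"type": "STRING"}}}
```

Note that only list-valued type fields are uppercased; a plain string like "object" passes through unchanged.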

I made a pair of helper functions to spin up a worker agent and a supervisor agent that coordinate tasks in a multi-agent setup. create_worker takes a language model, a list of tools, and a system prompt, builds a ChatPromptTemplate consisting of a system message plus a placeholder for past conversation history, and returns a pipeline that connects this prompt with the LLM bound to its tools.

I then built create_supervisor to orchestrate the workers. It defines a long system prompt that explains the manager’s responsibilities: analysing the user’s request, breaking it into subtasks, deciding which worker acts next, passing along prior results for continuity, finishing when complete, and retrying if a worker fails. The prompt also dynamically lists the available workers.

I created an output_schema that forces the supervisor to respond with a structured object containing a next field (worker name or FINISH) and a content field (instructions or the final user response). Finally, I constructed a ChatPromptTemplate for the supervisor and bound the LLM to this schema using bind_tools with tool_choice="supervisor_decision", returning both the prompt and the configured LLM so they can drive the agent loop together.

def create_worker(llm: ChatGoogleGenerativeAI, tools: list, system_prompt: str):
    """Create a worker agent with specific role"""
    prompt = ChatPromptTemplate.from_messages([
        ("system", system_prompt),
        MessagesPlaceholder(variable_name="messages"),
    ])
    return prompt | llm.bind_tools(tools)

def create_supervisor(llm: ChatGoogleGenerativeAI, worker_names: List[str]):
    """Create supervisor that manages tasks and directs workers"""
    system_prompt = (
        "You are the manager of an AI team. Your job is to supervise your worker team to achieve user requests.\n"
        "Carefully review the entire conversation history (user requests, workers' previous results, etc.).\n\n"
        "Follow these steps:\n"
        "1. **Task Analysis**: Consider the steps needed to fulfill the user's request. Multiple workers may need to collaborate. "
        "For example, 'WebSurfer' collects information that 'FileOperator' writes to a file.\n"
        "2. **Decide Next Action**: Based on analysis, determine the next action:\n"
        "   - **Worker Instructions**: When assigning a task to a worker, specify the worker name in `next` and detailed instructions in `content`. "
        "**Important: Include previous workers' output results in the next worker's instructions.** This enables information flow between workers.\n"
        "   - **Direct User Response**: When all tasks are complete or for simple responses not requiring workers, "
        "set `next` to 'FINISH' and provide the final response in `content`.\n"
        "   - **Recovery from Failure**: If a worker fails, review conversation history, modify instructions and retry, or try a different approach.\n\n"
        f"Available workers:\n{chr(10).join(f'- {name}' for name in worker_names)}"
    )

    output_schema = {
        "title": "supervisor_decision",
        "type": "object",
        "properties": {
            "next": {"type": "string", "description": f"Next worker name ({', '.join(worker_names)} or FINISH)"},
            "content": {"type": "string", "description": "Instructions for worker or final response to user"}
        },
        "required": ["next", "content"]
    }

    prompt = ChatPromptTemplate.from_messages([
        ("system", system_prompt),
        MessagesPlaceholder(variable_name="messages"),
    ])

    llm_with_tool = llm.bind_tools(tools=[output_schema], tool_choice="supervisor_decision")
    return prompt, llm_with_tool

I developed a supervisor_node function to serve as the brain of the supervisor agent, guiding the workflow and recording its reasoning. I started by logging that the supervisor node is running, then built a chain by piping supervisor_prompt into supervisor_llm and invoked it with the current conversation history (state["messages"]).

I extracted the usage_metadata from the response to calculate and log the cost of running the supervisor model. I then pulled the supervisor’s structured decision from the first tool call, capturing both the content (instructions or final message) and the next action (worker name or FINISH), and printed a debug statement with those values. I created an AIMessage that reflects the supervisor’s reasoning, formatted as an instruction when directing a worker and as plain content when finishing.

If the decision wasn’t FINISH, I generated an internal HumanMessage flagged with role="internal_instruction" to pass along to the worker, returning an updated state with both the supervisor’s comment and the worker’s instruction, along with the next action. If the decision was FINISH, I simply appended the supervisor’s comment and returned the state with next="FINISH".

Finally, I wrapped everything in a try/except block to catch errors, log them, and gracefully return an error AIMessage with next="FINISH" so the flow doesn’t break.

def supervisor_node(state: AgentState):
    """Supervisor node that decides what to do next and records its thinking"""
    logger.info("--- Supervisor Node ---")

    try:
        chain = supervisor_prompt | supervisor_llm
        response_message = chain.invoke({"messages": state["messages"]})

        # Calculate and log costs
        usage_metadata = response_message.response_metadata.get("usage_metadata", {})
        costs = calculate_cost(usage_metadata, supervisor_model_name)
        logger.info(f"Cost (Supervisor): ${costs['total']:.6f}")

        # Extract supervisor decision
        tool_call = response_message.tool_calls[0]
        supervisor_output = tool_call['args']
        logger.info(f"Supervisor Decision: {supervisor_output}")

        content = supervisor_output.get("content", "")
        next_action = supervisor_output.get("next", "FINISH")

        print(f"DEBUG Supervisor Decision: next='{next_action}', content='{content}'")

        # Create supervisor's thinking message for UI
        supervisor_comment_content = content if next_action == "FINISH" else f"【Instruction to {next_action}】\n{content}"
        supervisor_comment = AIMessage(content=supervisor_comment_content, name="Supervisor")

        if next_action != "FINISH":
            # Internal instruction for worker
            instruction_for_worker = HumanMessage(
                content=content,
                additional_kwargs={"role": "internal_instruction"}
            )
            return {
                "messages": state["messages"] + [supervisor_comment, instruction_for_worker],
                "next": next_action
            }
        else:
            return {
                "messages": state["messages"] + [supervisor_comment],
                "next": next_action
            }
    except Exception as e:
        logger.error(f"Supervisor error: {e}")
        error_response = AIMessage(content=f"I encountered an error while processing your request: {str(e)}", name="Supervisor")
        return {"messages": state["messages"] + [error_response], "next": "FINISH"}
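Because the supervisor is forced to answer through the supervisor_decision tool, its decision always arrives as the args of the first tool call. This stdlib-only sketch mocks that shape (SimpleNamespace stands in for the real Gemini response object) to show the extraction step in isolation:

```python
from types import SimpleNamespace

# Mocked response shaped like response_message.tool_calls in supervisor_node:
mock_response = SimpleNamespace(tool_calls=[{
    "name": "supervisor_decision",
    "args": {"next": "WebSurfer", "content": "Search for the latest LLM news."},
}])

supervisor_output = mock_response.tool_calls[0]["args"]
next_action = supervisor_output.get("next", "FINISH")  # worker name or FINISH
content = supervisor_output.get("content", "")         # instructions or final reply
print(next_action)  # WebSurfer
```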

I made a worker_node and its supporting routing logic to let workers execute tasks, call tools, and feed results back into the multi-agent loop with robust error handling. worker_node begins by looking up the assigned worker’s name from state["next"], logging which worker is running, and invoking the worker with the conversation history while enforcing a recursion limit of 10. I added debug prints to show whether the worker response included tool calls and which tools were triggered, and I wrapped the cost calculation in a try/except, logging the model’s cost when usage_metadata is available and warning otherwise.

I checked whether the response carried meaningful content or tool_calls; if neither is present, I replace it with an apologetic fallback AIMessage so the system never returns empty output. I also ensured the response carries the correct worker name and appended it to the message history in the returned state, and I surrounded the entire block in a try/except so any exception is caught and turned into an error message from the worker instead of crashing.

Then I created _tool_node as a ToolNode(tools) instance and wrapped it in an async custom_tool_node that executes tool calls via ainvoke and appends the results back into the state. Finally, I defined the routing helpers: after_worker_router, which checks whether the worker’s last message included tool calls and routes either to "tools" or back to "supervisor", and supervisor_router, which inspects the supervisor’s next decision and routes either to the specified worker or to END if no further action is required.

def worker_node(state: AgentState):
    """Worker node that executes assigned tasks with error handling"""
    worker_name = state["next"]
    worker = workers[worker_name]
    logger.info(f"--- Worker Node: {worker_name} ---")

    try:
        response = worker.invoke({"messages": state["messages"]}, {"recursion_limit": 10})

        print(f"DEBUG Worker {worker_name} response has tool_calls: {hasattr(response, 'tool_calls') and bool(response.tool_calls)}")
        if hasattr(response, 'tool_calls') and response.tool_calls:
            print(f"DEBUG Tool calls: {[tc['name'] for tc in response.tool_calls]}")

        # Calculate costs safely
        try:
            usage_metadata = response.response_metadata.get("usage_metadata", {})
            costs = calculate_cost(usage_metadata, worker_model_name)
            logger.info(f"Cost ({worker_name}): ${costs['total']:.6f}")
        except Exception:
            logger.warning("Could not calculate costs")

        # Check if response has content or tool calls
        has_content = bool(response.content)
        has_tool_calls = hasattr(response, 'tool_calls') and bool(response.tool_calls)

        if not has_content and not has_tool_calls:
            error_message = "I apologize, but I encountered a technical issue and couldn't complete the task. Please try rephrasing your request."
            response = AIMessage(content=error_message, name=worker_name)

        # Ensure response has a name
        response.name = worker_name
        return {"messages": state["messages"] + [response]}

    except Exception as e:
        logger.error(f"Worker {worker_name} exception: {e}")
        error_message = "I encountered an error while processing your request. Please try again or rephrase your question."
        error_response = AIMessage(content=error_message, name=worker_name)
        return {"messages": state["messages"] + [error_response]}

# Tool execution node
_tool_node = ToolNode(tools)

async def custom_tool_node(state: AgentState):
    """Node that executes tools called by workers"""
    tool_results = await _tool_node.ainvoke(state)
    return {"messages": state["messages"] + tool_results["messages"]}

# --- Routing Functions ---

def after_worker_router(state: AgentState) -> str:
    """Router that decides where to go after worker execution"""
    last_message = state["messages"][-1]
    if hasattr(last_message, "tool_calls") and last_message.tool_calls:
        return "tools"
    return "supervisor"

def supervisor_router(state: AgentState) -> str:
    """Router that decides where to go after supervisor decision"""
    next_val = state.get("next")
    if not next_val or next_val == "FINISH":
        return END
    return next_val
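To see the two routers in isolation, here is a stdlib-only simulation: END is replaced by a string stand-in and messages by SimpleNamespace objects, so none of the LangGraph machinery is needed. The router bodies mirror the ones above:

```python
from types import SimpleNamespace

END = "__end__"  # stand-in for langgraph.graph.END in this demo

def after_worker_router(state) -> str:
    # Worker requested a tool call -> run tools; otherwise report back.
    last_message = state["messages"][-1]
    if getattr(last_message, "tool_calls", None):
        return "tools"
    return "supervisor"

def supervisor_router(state) -> str:
    # Supervisor either names the next worker or finishes.
    next_val = state.get("next")
    if not next_val or next_val == "FINISH":
        return END
    return next_val

tool_msg = SimpleNamespace(tool_calls=[{"name": "browser_navigate"}])
plain_msg = SimpleNamespace(tool_calls=[])
print(after_worker_router({"messages": [tool_msg]}))   # tools
print(after_worker_router({"messages": [plain_msg]}))  # supervisor
print(supervisor_router({"next": "WebSurfer"}))        # WebSurfer
print(supervisor_router({"next": "FINISH"}))           # __end__
```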

I made a workflow orchestration graph that connects the supervisor, workers, and tools into a single state machine. I started by initialising a StateGraph with the AgentState type, then added nodes for the supervisor, the tools, and each worker dynamically by looping over the workers dictionary.

I set up conditional edges for each worker using after_worker_router, so that after completing a task the flow routes either to "tools" when tool calls are present or back to "supervisor" otherwise. I defined a direct edge from "tools" back to "supervisor" so tool results are always reviewed, then configured the supervisor’s routing with supervisor_router, whose decisions can branch to specific workers or end the workflow when tasks are complete.

I marked the supervisor as the entry point by adding an edge from START to "supervisor", ensuring all requests begin under its control. Finally, I compiled the workflow with a MemorySaver checkpointer to persist conversation state across steps, returned the resulting app, and logged that graph initialisation had completed.

# (This block lives inside the graph-construction function in the full code,
# where `workers`, `supervisor_node`, and `custom_tool_node` are in scope.)
workflow = StateGraph(AgentState)

# Add nodes
workflow.add_node("supervisor", supervisor_node)
workflow.add_node("tools", custom_tool_node)
for name in workers:
    workflow.add_node(name, worker_node)

# Add conditional edges for workers
for name in workers:
    workflow.add_conditional_edges(
        name,
        after_worker_router,
        {"tools": "tools", "supervisor": "supervisor"}
    )

# Tools always return to supervisor
workflow.add_edge("tools", "supervisor")

# Supervisor conditional routing
workflow.add_conditional_edges(
    "supervisor",
    supervisor_router,
    {**{name: name for name in workers}, END: END}
)

# Start with supervisor
workflow.add_edge(START, "supervisor")

# Compile with memory
memory = MemorySaver()
app = workflow.compile(checkpointer=memory)

logger.info("Graph initialization completed.")
return app  # returned by the enclosing function
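As a mental model of the compiled graph, this toy loop (plain Python, no LangGraph) replays the edge wiring above: the supervisor hands off to a worker, tool calls detour through the tools node, and everything funnels back to the supervisor until it says FINISH. The run_simulation helper and its scripted decisions are mine, purely for illustration:

```python
def run_simulation(decisions):
    """Replay a scripted sequence of supervisor decisions.

    decisions: list of (next_action, worker_uses_tool) pairs, mimicking
    the supervisor_decision outputs consumed by the real graph.
    """
    visited = ["supervisor"]                 # START edge goes to supervisor
    for next_action, worker_uses_tool in decisions:
        if next_action == "FINISH":          # supervisor_router -> END
            break
        visited.append(next_action)          # supervisor routes to a worker
        if worker_uses_tool:
            visited.append("tools")          # after_worker_router -> tools
        visited.append("supervisor")         # both paths return to supervisor
    return visited

print(run_simulation([("WebSurfer", True), ("FileOperator", False), ("FINISH", None)]))
# ['supervisor', 'WebSurfer', 'tools', 'supervisor', 'FileOperator', 'supervisor']
```

The funnel-back-to-supervisor design is why workers can build on each other's results: every intermediate output passes through the manager before the next delegation.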

Conclusion:

The combination of AI agents is expected to dramatically change the way we work and conduct business. The role of AI will shift from its current role as a “teacher” to a “reliable partner,” and even to “someone who acts on our behalf.”

The following abilities are considered particularly important for making effective use of AI agents:

“The power to ask questions”: The ability to define problems and give clear instructions and requirements to AI.

“Ability to confirm and decide”: The ability to evaluate the results generated by AI and make final decisions.

“Ability to assign work through multitasking”: The ability to appropriately use multiple AI agents and allocate tasks efficiently.

🧙‍♂️ I am a Generative AI expert! If you want to collaborate on a project, drop an inquiry here or book a 1-on-1 Consulting Call With Me.

I would highly appreciate it if you:

❣ Join my Patreon: https://www.patreon.com/GaoDalie_AI
