
Sarvagya Jaiswal

Building an Adaptive RAG Agent with LangGraph: Dynamic Routing and Stateful Memory

Building a basic Retrieval-Augmented Generation (RAG) pipeline takes about ten lines of code these days. But what happens when a user sends a simple greeting? Your system wastes compute querying a vector database. And what happens on turn five of a conversation when the user says, "Wait, explain that second point again?" A naive RAG system suffers from amnesia and fails outright.

To build a production-grade AI assistant, you need more than a linear chain. You need a stateful, decision-making agent.

Here is how I engineered an Adaptive RAG Assistant using LangGraph to handle dynamic search routing and stateful memory injection, eliminating context amnesia.

1. The Core Problem: Linear Chains vs. State Machines

Standard LangChain workflows are Directed Acyclic Graphs (DAGs). Data flows from A -> B -> C. But real human conversation is cyclical. We loop back, we clarify, and we change topics.

I migrated the architecture to LangGraph because it treats the LLM workflow as a state machine. By defining a global State object that gets passed between nodes, the application can loop, make decisions, and retain context over time.

Here is the foundation of the graph state:

from typing import TypedDict, Annotated, Sequence
from langchain_core.messages import BaseMessage
import operator

# The graph state that persists across all nodes
class AgentState(TypedDict):
    messages: Annotated[Sequence[BaseMessage], operator.add]
    context: str
    routing_decision: str
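The `operator.add` annotation is what gives the graph its memory: when a node returns a partial update containing `messages`, LangGraph applies the reducer instead of overwriting the key. Here is a toy, dependency-free sketch of that merge semantics (the `merge` helper is illustrative only, not a LangGraph API):

```python
import operator
from typing import Annotated, Sequence, TypedDict

class AgentState(TypedDict):
    # Reducer-annotated key: new messages are appended, never overwritten
    messages: Annotated[Sequence[str], operator.add]
    context: str
    routing_decision: str

def merge(state: dict, update: dict) -> dict:
    """Illustrative stand-in for how LangGraph merges a node's partial update."""
    merged = dict(state)
    for key, value in update.items():
        if key == "messages":               # reducer-annotated key: accumulate
            merged[key] = operator.add(list(merged[key]), list(value))
        else:                               # plain key: last write wins
            merged[key] = value
    return merged

state = {"messages": ["Hi"], "context": "", "routing_decision": ""}
state = merge(state, {"messages": ["Hello! How can I help?"]})
# state["messages"] now holds both turns
```

This is why every node below can simply return the one or two keys it changed: the graph, not the node, is responsible for accumulating conversation history.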

2. The Brain: Dynamic Routing Strategy

Not every query requires a massive vector database search. To optimize latency and compute, I built a routing node that evaluates the user's query and assigns it to one of three strategies:

  1. Light Search: For greetings or general knowledge ("Hello," "What is Python?"). Bypasses the retriever entirely and uses the LLM's internal knowledge.
  2. Standard Search: For direct factual questions. Triggers a standard semantic search against the vector store.
  3. Deep Search: For complex, multi-hop queries. Triggers an agentic loop that might query the database multiple times to synthesize an answer.

Here is what that routing logic looks like in the graph:

def route_query(state: AgentState):
    query = state["messages"][-1].content

    # Prompting the LLM to act as a router
    router_prompt = f"Analyze this query and classify the required search depth: 'Light', 'Standard', or 'Deep'. Query: {query}"
    decision = llm.invoke(router_prompt).content.strip()

    return {"routing_decision": decision}

# Defining the LangGraph conditional edges
workflow.add_conditional_edges(
    "router_node",
    lambda x: x["routing_decision"],
    {
        "Light": "llm_direct_node",
        "Standard": "vector_search_node",
        "Deep": "agentic_research_node"
    }
)
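One practical gotcha: the router LLM rarely returns the bare label. It may answer "Deep." or "I would use a 'Standard' search," neither of which matches a key in the conditional-edge mapping. A small normalizer with a safe fallback (my own addition, not part of the graph above) keeps the edges from raising `KeyError`:

```python
def normalize_route(raw: str, default: str = "Standard") -> str:
    """Map a free-form LLM routing answer onto one of the three edge keys."""
    text = raw.strip().lower()
    for label in ("deep", "standard", "light"):
        if label in text:
            return label.capitalize()
    return default  # unrecognized output: fall back to a standard search
```

Returning `{"routing_decision": normalize_route(decision)}` from the router node makes the conditional edges robust to chatty model output.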

3. Curing Amnesia: Stateful Memory Injection

The most frustrating part of interacting with a standard RAG bot is its inability to remember the previous message.

Because LangGraph inherently passes the AgentState object through the execution graph, I structured the messages key to append every new interaction natively using operator.add.

When the workflow routes to the retrieval node, it doesn't just embed the user's latest message. It injects the last 3 turns of conversation into a contextualizer prompt.

def retrieve_and_inject(state: AgentState):
    # Extract the last 3 turns of chat history (up to 6 prior messages)
    chat_history = state["messages"][-7:-1]
    latest_query = state["messages"][-1].content

    # Rewrite the query based on conversation history
    contextualized_query = contextualize_llm.invoke(
        f"History: {chat_history}\nLatest: {latest_query}\nRewrite query for vector search:"
    ).content

    # Perform retrieval using the rewritten, context-aware query
    docs = vector_store.similarity_search(contextualized_query, k=4)
    context_str = "\n".join([d.page_content for d in docs])

    return {"context": context_str}

If the user says, "Tell me about LangGraph," and then follows up with, "How does it compare to LangChain?", the retriever understands that "it" refers to LangGraph and pulls the correct documents from the vector store.
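In production you would also persist this state between requests, keyed by conversation, which LangGraph handles with checkpointers. The class below is a minimal in-memory stand-in (my own sketch, not the LangGraph API) showing the idea: message history keyed by a thread ID and trimmed to the last few turns before it reaches the contextualizer:

```python
class SessionMemory:
    """Per-thread message store; a toy analogue of a LangGraph checkpointer."""

    def __init__(self) -> None:
        self._threads: dict[str, list[tuple[str, str]]] = {}

    def append(self, thread_id: str, role: str, content: str) -> None:
        self._threads.setdefault(thread_id, []).append((role, content))

    def recent_turns(self, thread_id: str, turns: int = 3) -> list[tuple[str, str]]:
        # One turn is a user/assistant pair, so keep 2 * turns messages
        return self._threads.get(thread_id, [])[-2 * turns:]

memory = SessionMemory()
memory.append("thread-1", "user", "Tell me about LangGraph")
memory.append("thread-1", "assistant", "LangGraph models workflows as state machines...")
memory.append("thread-1", "user", "How does it compare to LangChain?")
# recent_turns("thread-1") feeds the contextualizer, so "it" resolves to LangGraph
```

Keying history by thread is what lets a single deployed graph serve many concurrent conversations without their contexts bleeding into each other.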

The Takeaway

If you are building an AI application meant for real users, you have to move past naive linear chains.

By leveraging LangGraph for stateful orchestration, you can build systems that actually think about how to answer a question before they start searching, saving compute and creating a vastly superior, context-aware user experience.
