If you've built a LangGraph agent from a tutorial, you already know the feeling.
It works perfectly in the notebook. Clean output. Agents routing correctly. Everything looks great.
Then you try to actually ship it.
Suddenly you're dealing with agents that forget context between sessions, zero visibility into what's happening inside the graph, no clean API to call from your app, and a setup process that breaks on every fresh machine.
This isn't a LangGraph problem. LangGraph is excellent. It's a gap between tutorial code and production code — and almost nobody talks about what actually needs to change.
This article covers exactly that.
What's Missing From Every LangGraph Tutorial
Here's what most tutorials give you:
from langgraph.graph import StateGraph
graph = StateGraph(AgentState)
graph.add_node("agent", agent_fn)
graph.add_edge(START, "agent")
graph.add_edge("agent", END)
app = graph.compile()
result = app.invoke({"messages": [...]})
print(result)
It works. But it's missing everything you need to ship:
- No memory — every run starts from zero, no context from previous sessions
- No observability — you can't see what's happening inside the graph when it fails
- No API — you can't call this from a frontend or another service
- No deployment — you can't run this on a server without rewriting it
- No error handling — one bad LLM response breaks the whole chain
Let's fix all of that.
The Production Architecture
Here's what a production-ready multi-agent LangGraph system actually looks like:
┌─────────────────────────────────────────┐
│              FastAPI Layer              │
│        /run    /stream    /health       │
└─────────────────┬───────────────────────┘
                  │
┌─────────────────▼───────────────────────┐
│          LangGraph StateGraph           │
│                                         │
│  ┌──────────┐     ┌─────────────────┐   │
│  │Supervisor│────▶│ Researcher Agent│   │
│  │  Agent   │     └─────────────────┘   │
│  │          │     ┌─────────────────┐   │
│  │          │────▶│  Drafter Agent  │   │
│  └──────────┘     └─────────────────┘   │
└─────────────────┬───────────────────────┘
                  │
     ┌────────────┴────────────┐
     │                         │
┌────▼──────┐          ┌───────▼──────┐
│ ChromaDB  │          │  LangSmith   │
│  Memory   │          │   Tracing    │
└───────────┘          └──────────────┘
Six layers. Each one solves a real production problem.
Layer 1: Typed State Schema
The first thing that breaks in production is untyped state. When agents pass data between each other with no schema, you get silent failures and impossible-to-debug errors.
Fix it with a typed schema from the start:
# core/state.py
from typing import Annotated, TypedDict
from langchain_core.messages import BaseMessage
from langgraph.graph.message import add_messages
class AgentState(TypedDict):
    messages: Annotated[list[BaseMessage], add_messages]
    task: str
    context: str
    output: str
    error: str | None
    retry_count: int
    next: str  # set by the supervisor to route to a worker
Every field is typed. Every agent knows exactly what it's getting and what it needs to return. No surprises.
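The `Annotated[..., add_messages]` field behaves differently from the plain fields: nodes return partial updates, and the reducer appends to `messages` while plain fields are simply overwritten. A toy stdlib sketch of that merge behavior (`merge_state` is an illustrative stand-in, not LangGraph's actual implementation):

```python
# Toy illustration of reducer semantics: fields with a reducer (messages)
# are merged by appending; plain fields are replaced by the last write.

def merge_state(state: dict, update: dict) -> dict:
    """Merge a node's partial update into the state (hypothetical helper)."""
    merged = dict(state)
    for key, value in update.items():
        if key == "messages":   # reducer field: append, don't replace
            merged[key] = state.get("messages", []) + value
        else:                   # plain field: last write wins
            merged[key] = value
    return merged

state = {"messages": [{"role": "user", "content": "hi"}], "task": "research", "retry_count": 0}
update = {"messages": [{"role": "ai", "content": "on it"}], "retry_count": 1}
state = merge_state(state, update)

print(len(state["messages"]))  # → 2 (appended, not replaced)
print(state["retry_count"])    # → 1 (overwritten)
```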
Layer 2: Supervisor + Worker Pattern
The supervisor decides which worker handles the task. Workers focus on doing one thing well.
# core/supervisor.py
from core.state import AgentState

def supervisor_node(state: AgentState) -> AgentState:
    task = state["task"]
    # Route based on task type
    if "research" in task.lower() or "find" in task.lower():
        return {**state, "next": "researcher"}
    elif "write" in task.lower() or "draft" in task.lower():
        return {**state, "next": "drafter"}
    else:
        return {**state, "next": "researcher"}  # default
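Because the supervisor is a pure function of state, its routing is trivially unit-testable with no LLM or graph in the loop. A sketch (the routing logic is inlined here so it runs standalone; in the repo you'd import `supervisor_node` from `core.supervisor`):

```python
# Routing tests for the supervisor, runnable with no dependencies.
# Inlined copy of the keyword-routing logic shown above.

def supervisor_node(state: dict) -> dict:
    task = state["task"].lower()
    if "research" in task or "find" in task:
        return {**state, "next": "researcher"}
    if "write" in task or "draft" in task:
        return {**state, "next": "drafter"}
    return {**state, "next": "researcher"}  # default route

assert supervisor_node({"task": "Find recent case law"})["next"] == "researcher"
assert supervisor_node({"task": "Draft a client email"})["next"] == "drafter"
assert supervisor_node({"task": "Summarize this"})["next"] == "researcher"
```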
# core/graph.py
from langgraph.graph import StateGraph, START, END
from core.state import AgentState
from core.supervisor import supervisor_node
from core.workers.researcher import researcher_node
from core.workers.drafter import drafter_node

def build_graph():
    graph = StateGraph(AgentState)
    graph.add_node("supervisor", supervisor_node)
    graph.add_node("researcher", researcher_node)
    graph.add_node("drafter", drafter_node)
    graph.add_edge(START, "supervisor")
    graph.add_conditional_edges(
        "supervisor",
        lambda state: state["next"],
        {"researcher": "researcher", "drafter": "drafter"}
    )
    graph.add_edge("researcher", END)
    graph.add_edge("drafter", END)
    return graph.compile()
Clean. Testable. Each node is a pure function.
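The graph above handles the happy path; the `error` and `retry_count` fields in the state exist for the unhappy one. A sketch of one way to use them: wrap a worker so a failed LLM call retries a bounded number of times instead of killing the run (`with_retries` is an illustrative helper, not part of LangGraph's API):

```python
# One way to use the error / retry_count state fields: a retry wrapper
# around a worker node. with_retries is a hypothetical helper.

MAX_RETRIES = 2

def with_retries(node_fn):
    def wrapped(state: dict) -> dict:
        retries = state.get("retry_count", 0)
        try:
            return {**node_fn(state), "error": None}
        except Exception as exc:
            if retries >= MAX_RETRIES:
                return {**state, "error": str(exc)}  # give up, surface the error
            return wrapped({**state, "retry_count": retries + 1})
    return wrapped

# Simulate a node that fails twice, then succeeds.
calls = {"n": 0}
def flaky_node(state):
    calls["n"] += 1
    if calls["n"] < 3:
        raise RuntimeError("LLM timeout")
    return {**state, "output": "done"}

result = with_retries(flaky_node)({"task": "draft", "retry_count": 0})
print(result["output"])       # → done
print(result["retry_count"])  # → 2
```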
Layer 3: Long-Term Memory with ChromaDB
This is the biggest gap in tutorial code. Agents that forget everything between sessions are useless for real applications.
# memory/long_term.py
from langchain_chroma import Chroma
from langchain_openai import OpenAIEmbeddings

def get_memory_store():
    embeddings = OpenAIEmbeddings()
    return Chroma(
        collection_name="agent_memory",
        embedding_function=embeddings,
        persist_directory="./chroma_db"
    )

def save_to_memory(store, content: str, metadata: dict):
    store.add_texts(
        texts=[content],
        metadatas=[metadata]
    )

def retrieve_from_memory(store, query: str, k: int = 3):
    results = store.similarity_search(query, k=k)
    return [doc.page_content for doc in results]
Now your agents remember what happened in previous sessions. Context persists. Users don't have to repeat themselves.
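To see the save/retrieve interface in action without running an embedding model, here's a toy in-memory stand-in for the store. Word-overlap scoring is a crude proxy for vector similarity, and the tuple return differs from Chroma's `Document` objects — illustrative only:

```python
# Toy stand-in for the ChromaDB store, mimicking the add_texts /
# similarity_search interface with keyword-overlap scoring.

class InMemoryStore:
    def __init__(self):
        self.docs = []  # list of (text, metadata) tuples

    def add_texts(self, texts, metadatas):
        self.docs.extend(zip(texts, metadatas))

    def similarity_search(self, query, k=3):
        # Rank stored docs by word overlap with the query (crude similarity)
        q = set(query.lower().split())
        scored = sorted(
            self.docs,
            key=lambda d: len(q & set(d[0].lower().split())),
            reverse=True,
        )
        return scored[:k]

store = InMemoryStore()
store.add_texts(["User prefers concise summaries"], [{"session": "1"}])
store.add_texts(["User is researching Virginia contract law"], [{"session": "2"}])

top = store.similarity_search("contract law research", k=1)
print(top[0][0])  # → User is researching Virginia contract law
```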
Layer 4: LangSmith Observability
You cannot debug what you cannot see. LangSmith gives you full visibility into every node execution, every LLM call, every token.
# observability/langsmith_setup.py
import os

def setup_langsmith():
    os.environ["LANGCHAIN_TRACING_V2"] = "true"
    os.environ["LANGCHAIN_PROJECT"] = os.getenv(
        "LANGSMITH_PROJECT",
        "langgraph-production-kit"
    )
    # LANGCHAIN_API_KEY set in .env
Call this once at startup. Every graph execution is automatically traced. When something breaks in production, you open LangSmith and see exactly which node failed and why.
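One hedged addition worth making alongside `setup_langsmith()`: if `LANGCHAIN_API_KEY` is missing, tracing tends to quietly do nothing rather than crash, which is the worst failure mode. A small fail-fast check at startup (`require_env` is an illustrative helper, not part of LangSmith):

```python
# Fail fast at startup if a required env var is missing, instead of
# discovering mid-incident that no traces were ever recorded.
import os

def require_env(name: str) -> str:
    value = os.getenv(name)
    if not value:
        raise RuntimeError(f"{name} is not set -- tracing would be silently disabled")
    return value

os.environ["LANGCHAIN_API_KEY"] = "ls-example-key"  # normally loaded from .env
assert require_env("LANGCHAIN_API_KEY") == "ls-example-key"
```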
Layer 5: FastAPI Wrapper
Your agent needs to be callable from the outside world.
# api/main.py
from fastapi import FastAPI
from fastapi.responses import StreamingResponse
from pydantic import BaseModel
from core.graph import build_graph
from observability.langsmith_setup import setup_langsmith

setup_langsmith()
app = FastAPI()
graph = build_graph()

class RunRequest(BaseModel):
    task: str
    context: str = ""

@app.post("/run")
async def run_agent(request: RunRequest):
    # ainvoke keeps the event loop free; a blocking invoke() here
    # would stall every other request while the graph runs
    result = await graph.ainvoke({
        "task": request.task,
        "context": request.context,
        "messages": [],
        "output": "",
        "error": None,
        "retry_count": 0
    })
    return {"output": result["output"], "error": result["error"]}

@app.post("/stream")
async def stream_agent(request: RunRequest):
    async def generate():
        async for chunk in graph.astream({
            "task": request.task,
            "context": request.context,
            "messages": [],
            "output": "",
            "error": None,
            "retry_count": 0
        }):
            yield f"data: {chunk}\n\n"
    return StreamingResponse(generate(), media_type="text/event-stream")

@app.get("/health")
def health():
    return {"status": "ok"}
Now any frontend, mobile app, or service can call your agent over HTTP.
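The `/stream` endpoint speaks Server-Sent Events: each chunk arrives as a `data: ...` line followed by a blank line. On the client side, any HTTP library can feed response lines into a minimal parser like this sketch:

```python
# Minimal parser for the SSE wire format produced by the /stream endpoint.
# Feed it an iterable of response lines from any HTTP client.

def parse_sse(lines):
    """Yield the payload of each 'data:' line in an SSE stream."""
    for line in lines:
        line = line.strip()
        if line.startswith("data:"):
            yield line[len("data:"):].strip()

# Example stream, shaped like the per-node chunks astream emits:
stream = [
    "data: {'supervisor': {'next': 'drafter'}}",
    "",
    "data: {'drafter': {'output': 'done'}}",
    "",
]
chunks = list(parse_sse(stream))
print(len(chunks))  # → 2
```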
Layer 6: Docker Compose
The setup that works on your machine needs to work everywhere.
# docker-compose.yml
version: '3.8'
services:
  api:
    build: .
    ports:
      - "8000:8000"
    environment:
      - OPENAI_API_KEY=${OPENAI_API_KEY}
      - LANGCHAIN_API_KEY=${LANGCHAIN_API_KEY}
      - LANGCHAIN_TRACING_V2=true
    volumes:
      - ./chroma_db:/app/chroma_db
    command: uvicorn api.main:app --host 0.0.0.0 --port 8000
  chromadb:
    image: chromadb/chroma:latest
    ports:
      - "8001:8000"
    volumes:
      - ./chroma_data:/chroma/chroma
One command from any machine:
docker-compose up
Your agent is running at localhost:8000. No dependency issues. No "works on my machine."
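Compose reads the `${...}` substitutions above from a `.env` file in the project root. A sketch with placeholder values (substitute your own keys):

```shell
# .env -- read automatically by docker-compose from the project root.
# Placeholder values only; never commit this file.
OPENAI_API_KEY=sk-your-key-here
LANGCHAIN_API_KEY=ls-your-key-here
LANGSMITH_PROJECT=langgraph-production-kit
```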
The Free Lite Version
I've packaged the core architecture above — supervisor + worker agents, typed state, FastAPI — into a free starter repo.
No Docker. No ChromaDB. No LangSmith. Just the clean foundation to understand the pattern.
→ Get the free lite version on GitHub
The Full Production Kit
If you're building something real and need everything — memory, observability, Docker, streaming, error handling — I packaged the complete production version.
What's included:
- Supervisor + researcher + drafter agents
- ChromaDB long-term memory
- LangSmith tracing wired in
- FastAPI with /run, /stream, /health endpoints
- Docker Compose cold start
- Error handling + retry logic
- Full README — setup under 10 minutes
→ LangGraph Multi-Agent Production Starter Kit on Gumroad
Summary
The gap between a tutorial agent and a production agent comes down to six things:
- Typed state schema — no silent failures
- Supervisor + worker pattern — clean routing, testable nodes
- Long-term memory — ChromaDB for persistent context
- Observability — LangSmith so you can actually debug
- FastAPI wrapper — callable from anywhere
- Docker Compose — runs the same everywhere
None of these are hard individually. The value is having them all wired together correctly from the start.
That's what the kit gives you.
Built by an AI Engineer who used this architecture in production for a legal AI platform serving law firms in Virginia. Questions? Drop them in the comments.