If you've built a LangGraph agent from a tutorial, you already know the feeling.
It works perfectly in the notebook. Clean output. Agents routing correctly. Everything looks great.
Then you try to actually ship it.
Suddenly you're dealing with agents that forget context between sessions, zero visibility into what's happening inside the graph, no clean API to call from your app, and a setup process that breaks on every fresh machine.
This isn't a LangGraph problem. LangGraph is excellent. It's a gap between tutorial code and production code — and almost nobody talks about what actually needs to change.
This article covers exactly that.
What's Missing From Every LangGraph Tutorial
Here's what most tutorials give you:
from langgraph.graph import StateGraph
graph = StateGraph(AgentState)
graph.add_node("agent", agent_fn)
graph.add_edge(START, "agent")
graph.add_edge("agent", END)
app = graph.compile()
result = app.invoke({"messages": [...]})
print(result)
It works. But it's missing everything you need to ship:
- No memory — every run starts from zero, no context from previous sessions
- No observability — you can't see what's happening inside the graph when it fails
- No API — you can't call this from a frontend or another service
- No deployment — you can't run this on a server without rewriting it
- No error handling — one bad LLM response breaks the whole chain
Let's fix all of that.
The Production Architecture
Here's what a production-ready multi-agent LangGraph system actually looks like:
┌─────────────────────────────────────────┐
│              FastAPI Layer              │
│        /run    /stream    /health       │
└─────────────────┬───────────────────────┘
                  │
┌─────────────────▼───────────────────────┐
│          LangGraph StateGraph           │
│                                         │
│  ┌──────────┐     ┌─────────────────┐   │
│  │Supervisor│────▶│ Researcher Agent│   │
│  │  Agent   │     └─────────────────┘   │
│  │          │     ┌─────────────────┐   │
│  │          │────▶│  Drafter Agent  │   │
│  └──────────┘     └─────────────────┘   │
└─────────────────┬───────────────────────┘
                  │
     ┌────────────┴────────────┐
     │                         │
┌────▼──────┐          ┌───────▼──────┐
│ ChromaDB  │          │  LangSmith   │
│  Memory   │          │   Tracing    │
└───────────┘          └──────────────┘
Six layers. Each one solves a real production problem.
Layer 1: Typed State Schema
The first thing that breaks in production is untyped state. When agents pass data between each other with no schema, you get silent failures and impossible-to-debug errors.
Fix it with a typed schema from the start:
# core/state.py
from typing import Annotated, TypedDict
from langchain_core.messages import BaseMessage
from langgraph.graph.message import add_messages
class AgentState(TypedDict):
    messages: Annotated[list[BaseMessage], add_messages]
    task: str
    context: str
    output: str
    error: str | None
    retry_count: int
    next: str  # set by the supervisor to route to a worker
Every field is typed. Every agent knows exactly what it's getting and what it needs to return. No surprises.
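The `Annotated[..., add_messages]` field behaves differently from the plain fields: nodes return partial updates, and the reducer appends to `messages` while plain fields are simply overwritten. A toy stdlib sketch of that merge behavior (`merge_state` is an illustrative stand-in, not LangGraph's actual implementation):

```python
# Toy illustration of reducer semantics: fields with a reducer (messages)
# are merged by appending; plain fields are replaced by the last write.

def merge_state(state: dict, update: dict) -> dict:
    """Merge a node's partial update into the state (hypothetical helper)."""
    merged = dict(state)
    for key, value in update.items():
        if key == "messages":   # reducer field: append, don't replace
            merged[key] = state.get("messages", []) + value
        else:                   # plain field: last write wins
            merged[key] = value
    return merged

state = {"messages": [{"role": "user", "content": "hi"}], "task": "research", "retry_count": 0}
update = {"messages": [{"role": "ai", "content": "on it"}], "retry_count": 1}
state = merge_state(state, update)

print(len(state["messages"]))  # → 2 (appended, not replaced)
print(state["retry_count"])    # → 1 (overwritten)
```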
Layer 2: Supervisor + Worker Pattern
The supervisor decides which worker handles the task. Workers focus on doing one thing well.
# core/supervisor.py
from core.state import AgentState

def supervisor_node(state: AgentState) -> AgentState:
    task = state["task"]
    # Route based on task type
    if "research" in task.lower() or "find" in task.lower():
        return {**state, "next": "researcher"}
    elif "write" in task.lower() or "draft" in task.lower():
        return {**state, "next": "drafter"}
    else:
        return {**state, "next": "researcher"}  # default
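Because the supervisor is a pure function of state, its routing is trivially unit-testable with no LLM or graph in the loop. A sketch (the routing logic is inlined here so it runs standalone; in the repo you'd import `supervisor_node` from `core.supervisor`):

```python
# Routing tests for the supervisor, runnable with no dependencies.
# Inlined copy of the keyword-routing logic shown above.

def supervisor_node(state: dict) -> dict:
    task = state["task"].lower()
    if "research" in task or "find" in task:
        return {**state, "next": "researcher"}
    if "write" in task or "draft" in task:
        return {**state, "next": "drafter"}
    return {**state, "next": "researcher"}  # default route

assert supervisor_node({"task": "Find recent case law"})["next"] == "researcher"
assert supervisor_node({"task": "Draft a client email"})["next"] == "drafter"
assert supervisor_node({"task": "Summarize this"})["next"] == "researcher"
```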
# core/graph.py
from langgraph.graph import StateGraph, START, END
from core.state import AgentState
from core.supervisor import supervisor_node
from core.workers.researcher import researcher_node
from core.workers.drafter import drafter_node

def build_graph():
    graph = StateGraph(AgentState)
    graph.add_node("supervisor", supervisor_node)
    graph.add_node("researcher", researcher_node)
    graph.add_node("drafter", drafter_node)
    graph.add_edge(START, "supervisor")
    graph.add_conditional_edges(
        "supervisor",
        lambda state: state["next"],
        {"researcher": "researcher", "drafter": "drafter"}
    )
    graph.add_edge("researcher", END)
    graph.add_edge("drafter", END)
    return graph.compile()
Clean. Testable. Each node is a pure function.
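The graph above handles the happy path; the `error` and `retry_count` fields in the state exist for the unhappy one. A sketch of one way to use them: wrap a worker so a failed LLM call retries a bounded number of times instead of killing the run (`with_retries` is an illustrative helper, not part of LangGraph's API):

```python
# One way to use the error / retry_count state fields: a retry wrapper
# around a worker node. with_retries is a hypothetical helper.

MAX_RETRIES = 2

def with_retries(node_fn):
    def wrapped(state: dict) -> dict:
        retries = state.get("retry_count", 0)
        try:
            return {**node_fn(state), "error": None}
        except Exception as exc:
            if retries >= MAX_RETRIES:
                return {**state, "error": str(exc)}  # give up, surface the error
            return wrapped({**state, "retry_count": retries + 1})
    return wrapped

# Simulate a node that fails twice, then succeeds.
calls = {"n": 0}
def flaky_node(state):
    calls["n"] += 1
    if calls["n"] < 3:
        raise RuntimeError("LLM timeout")
    return {**state, "output": "done"}

result = with_retries(flaky_node)({"task": "draft", "retry_count": 0})
print(result["output"])       # → done
print(result["retry_count"])  # → 2
```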
Layer 3: Long-Term Memory with ChromaDB
This is the biggest gap in tutorial code. Agents that forget everything between sessions are useless for real applications.
# memory/long_term.py
from langchain_chroma import Chroma
from langchain_openai import OpenAIEmbeddings

def get_memory_store():
    embeddings = OpenAIEmbeddings()
    return Chroma(
        collection_name="agent_memory",
        embedding_function=embeddings,
        persist_directory="./chroma_db"
    )

def save_to_memory(store, content: str, metadata: dict):
    store.add_texts(
        texts=[content],
        metadatas=[metadata]
    )

def retrieve_from_memory(store, query: str, k: int = 3):
    results = store.similarity_search(query, k=k)
    return [doc.page_content for doc in results]
Now your agents remember what happened in previous sessions. Context persists. Users don't have to repeat themselves.
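To see the save/retrieve interface in action without running an embedding model, here's a toy in-memory stand-in for the store. Word-overlap scoring is a crude proxy for vector similarity, and the tuple return differs from Chroma's `Document` objects — illustrative only:

```python
# Toy stand-in for the ChromaDB store, mimicking the add_texts /
# similarity_search interface with keyword-overlap scoring.

class InMemoryStore:
    def __init__(self):
        self.docs = []  # list of (text, metadata) tuples

    def add_texts(self, texts, metadatas):
        self.docs.extend(zip(texts, metadatas))

    def similarity_search(self, query, k=3):
        # Rank stored docs by word overlap with the query (crude similarity)
        q = set(query.lower().split())
        scored = sorted(
            self.docs,
            key=lambda d: len(q & set(d[0].lower().split())),
            reverse=True,
        )
        return scored[:k]

store = InMemoryStore()
store.add_texts(["User prefers concise summaries"], [{"session": "1"}])
store.add_texts(["User is researching Virginia contract law"], [{"session": "2"}])

top = store.similarity_search("contract law research", k=1)
print(top[0][0])  # → User is researching Virginia contract law
```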
Layer 4: LangSmith Observability
You cannot debug what you cannot see. LangSmith gives you full visibility into every node execution, every LLM call, every token.
# observability/langsmith_setup.py
import os

def setup_langsmith():
    os.environ["LANGCHAIN_TRACING_V2"] = "true"
    os.environ["LANGCHAIN_PROJECT"] = os.getenv(
        "LANGSMITH_PROJECT",
        "langgraph-production-kit"
    )
    # LANGCHAIN_API_KEY set in .env
Call this once at startup. Every graph execution is automatically traced. When something breaks in production, you open LangSmith and see exactly which node failed and why.
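One hedged addition worth making alongside `setup_langsmith()`: if `LANGCHAIN_API_KEY` is missing, tracing tends to quietly do nothing rather than crash, which is the worst failure mode. A small fail-fast check at startup (`require_env` is an illustrative helper, not part of LangSmith):

```python
# Fail fast at startup if a required env var is missing, instead of
# discovering mid-incident that no traces were ever recorded.
import os

def require_env(name: str) -> str:
    value = os.getenv(name)
    if not value:
        raise RuntimeError(f"{name} is not set -- tracing would be silently disabled")
    return value

os.environ["LANGCHAIN_API_KEY"] = "ls-example-key"  # normally loaded from .env
assert require_env("LANGCHAIN_API_KEY") == "ls-example-key"
```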
Layer 5: FastAPI Wrapper
Your agent needs to be callable from the outside world.
# api/main.py
from fastapi import FastAPI
from fastapi.responses import StreamingResponse
from pydantic import BaseModel
from core.graph import build_graph
from observability.langsmith_setup import setup_langsmith

setup_langsmith()
app = FastAPI()
graph = build_graph()

class RunRequest(BaseModel):
    task: str
    context: str = ""

@app.post("/run")
async def run_agent(request: RunRequest):
    # ainvoke keeps the event loop free; a blocking invoke() here
    # would stall every other request while the graph runs
    result = await graph.ainvoke({
        "task": request.task,
        "context": request.context,
        "messages": [],
        "output": "",
        "error": None,
        "retry_count": 0
    })
    return {"output": result["output"], "error": result["error"]}

@app.post("/stream")
async def stream_agent(request: RunRequest):
    async def generate():
        async for chunk in graph.astream({
            "task": request.task,
            "context": request.context,
            "messages": [],
            "output": "",
            "error": None,
            "retry_count": 0
        }):
            yield f"data: {chunk}\n\n"
    return StreamingResponse(generate(), media_type="text/event-stream")

@app.get("/health")
def health():
    return {"status": "ok"}
Now any frontend, mobile app, or service can call your agent over HTTP.
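The `/stream` endpoint speaks Server-Sent Events: each chunk arrives as a `data: ...` line followed by a blank line. On the client side, any HTTP library can feed response lines into a minimal parser like this sketch:

```python
# Minimal parser for the SSE wire format produced by the /stream endpoint.
# Feed it an iterable of response lines from any HTTP client.

def parse_sse(lines):
    """Yield the payload of each 'data:' line in an SSE stream."""
    for line in lines:
        line = line.strip()
        if line.startswith("data:"):
            yield line[len("data:"):].strip()

# Example stream, shaped like the per-node chunks astream emits:
stream = [
    "data: {'supervisor': {'next': 'drafter'}}",
    "",
    "data: {'drafter': {'output': 'done'}}",
    "",
]
chunks = list(parse_sse(stream))
print(len(chunks))  # → 2
```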
Layer 6: Docker Compose
The setup that works on your machine needs to work everywhere.
# docker-compose.yml
version: '3.8'
services:
  api:
    build: .
    ports:
      - "8000:8000"
    environment:
      - OPENAI_API_KEY=${OPENAI_API_KEY}
      - LANGCHAIN_API_KEY=${LANGCHAIN_API_KEY}
      - LANGCHAIN_TRACING_V2=true
    volumes:
      - ./chroma_db:/app/chroma_db
    command: uvicorn api.main:app --host 0.0.0.0 --port 8000
  chromadb:
    image: chromadb/chroma:latest
    ports:
      - "8001:8000"
    volumes:
      - ./chroma_data:/chroma/chroma
One command from any machine:
docker-compose up
Your agent is running at localhost:8000. No dependency issues. No "works on my machine."
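Compose reads the `${...}` substitutions above from a `.env` file in the project root. A sketch with placeholder values (substitute your own keys):

```shell
# .env -- read automatically by docker-compose from the project root.
# Placeholder values only; never commit this file.
OPENAI_API_KEY=sk-your-key-here
LANGCHAIN_API_KEY=ls-your-key-here
LANGSMITH_PROJECT=langgraph-production-kit
```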
The Free Lite Version
I've packaged the core architecture above — supervisor + worker agents, typed state, FastAPI — into a free starter repo.
No Docker. No ChromaDB. No LangSmith. Just the clean foundation to understand the pattern.
→ Get the free lite version on GitHub
The Full Production Kit
If you're building something real and need everything — memory, observability, Docker, streaming, error handling — I packaged the complete production version.
What's included:
- Supervisor + researcher + drafter agents
- ChromaDB long-term memory
- LangSmith tracing wired in
- FastAPI with /run, /stream, /health endpoints
- Docker Compose cold start
- Error handling + retry logic
- Full README — setup under 10 minutes
→ LangGraph Multi-Agent Production Starter Kit on Gumroad
Summary
The gap between a tutorial agent and a production agent comes down to six things:
- Typed state schema — no silent failures
- Supervisor + worker pattern — clean routing, testable nodes
- Long-term memory — ChromaDB for persistent context
- Observability — LangSmith so you can actually debug
- FastAPI wrapper — callable from anywhere
- Docker Compose — runs the same everywhere
None of these are hard individually. The value is having them all wired together correctly from the start.
That's what the kit gives you.
Built by an AI Engineer who used this architecture in production for a legal AI platform serving law firms in Virginia. Questions? Drop them in the comments.