LangGraph State Machine Patterns — How to Stop AI Agents from Going Off-Script

The "Agent Does Something Unexpected" Problem

LLM-powered agents can run outside their intended scope: infinite loops, unauthorized operations, processes that cannot be interrupted. This tends to happen when the agent's state management is implicit rather than explicitly defined.

LangGraph solves this with a graph-based state machine that makes state transitions explicit. Built on LangChain, it lets you define exactly which states exist and how the agent moves between them.

Core Concepts

Concept	Description
State	Data structure the agent holds (TypedDict)
Node	Processing unit that transforms state (a function)
Edge	Transition rules between nodes (supports conditional branching)
Checkpoint	State snapshot for resume and rollback
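
To make these four terms concrete, here is a toy runner in plain Python. It is only a sketch of the idea, not LangGraph's implementation, and it does not import the library; `run`, `NODES`, `EDGES`, `upper`, and `exclaim` are hypothetical names chosen for illustration.

```python
from typing import Callable, TypedDict


class State(TypedDict):
    text: str
    visited: list[str]


def upper(state: State) -> dict:
    # Node: reads the state and returns only the keys it changes
    return {"text": state["text"].upper(), "visited": state["visited"] + ["upper"]}


def exclaim(state: State) -> dict:
    return {"text": state["text"] + "!", "visited": state["visited"] + ["exclaim"]}


# Nodes are named functions; edges say which node follows which
NODES: dict[str, Callable[[State], dict]] = {"upper": upper, "exclaim": exclaim}
EDGES: dict[str, str] = {"upper": "exclaim", "exclaim": "END"}


def run(state: State, entry: str) -> tuple[State, list[State]]:
    checkpoints: list[State] = []
    current = entry
    while current != "END":
        update = NODES[current](state)
        state = {**state, **update}      # merge the node's update into the state
        checkpoints.append(dict(state))  # snapshot after every step
        current = EDGES[current]         # only declared transitions are possible
    return state, checkpoints


final, history = run({"text": "hi", "visited": []}, entry="upper")
```

The point of the sketch is that the set of reachable nodes is fixed by `EDGES` before anything runs, and each snapshot in `history` is a place a run could be inspected or restarted from.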

Pattern 1: Linear Chain

from langgraph.graph import StateGraph, END
from typing import TypedDict

class AgentState(TypedDict):
    messages: list[str]
    result: str

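# `llm` is assumed to be a LangChain chat model created elsewhere (not shown here).
def analyze(state: AgentState) -> dict:
    # Return only the keys this node updates; they are merged into the state.
    result = llm.invoke(state["messages"])
    return {"result": result.content}

def validate(state: AgentState) -> dict:
    # Flag suspiciously short answers instead of passing them on silently.
    if len(state["result"]) < 10:
        return {"result": "ERROR: result too short"}
    return {"result": state["result"]}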

graph = StateGraph(AgentState)
graph.add_node("analyze", analyze)
graph.add_node("validate", validate)
graph.add_edge("analyze", "validate")
graph.add_edge("validate", END)
graph.set_entry_point("analyze")

app = graph.compile()
result = app.invoke({"messages": ["Analyze this"], "result": ""})
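
Because nodes are plain functions, they can be exercised without a model or a compiled graph. The sketch below assumes the `llm` object only needs an `invoke()` method returning something with a `.content` attribute; `FakeLLM`, `FakeReply`, and `make_analyze` are hypothetical helpers, and the state merge is done by hand rather than by LangGraph.

```python
from dataclasses import dataclass
from typing import TypedDict


class AgentState(TypedDict):
    messages: list[str]
    result: str


@dataclass
class FakeReply:
    content: str


class FakeLLM:
    """Hypothetical stand-in that exposes only invoke()."""

    def __init__(self, reply: str) -> None:
        self.reply = reply

    def invoke(self, messages: list[str]) -> FakeReply:
        return FakeReply(content=self.reply)


def make_analyze(llm):
    # Inject the model so tests can swap in a fake
    def analyze(state: AgentState) -> dict:
        return {"result": llm.invoke(state["messages"]).content}
    return analyze


def validate(state: AgentState) -> dict:
    if len(state["result"]) < 10:
        return {"result": "ERROR: result too short"}
    return {"result": state["result"]}


def run_chain(reply: str) -> AgentState:
    # Mimic the analyze -> validate chain by merging each update by hand
    state: AgentState = {"messages": ["Analyze this"], "result": ""}
    state = {**state, **make_analyze(FakeLLM(reply))(state)}
    return {**state, **validate(state)}


short = run_chain("ok")
long = run_chain("A sufficiently long analysis.")
```

Here `short["result"]` carries the error marker while `long["result"]` passes through unchanged, which is the behavior the `validate` node is meant to enforce.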

Pattern 2: Conditional Router

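Branch to a different node depending on the query. The routing function below uses keyword matching, but it could equally read a label that an LLM-backed node wrote to state["route"]: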

class RoutingState(TypedDict):
    query: str
    route: str
    answer: str

def router(state: RoutingState) -> str:
    if "SQL" in state["query"] or "database" in state["query"]:
        return "sql_node"
    elif "code" in state["query"]:
        return "code_node"
    else:
        return "general_node"

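# classify_query, handle_sql, handle_code and handle_general are node
# functions defined elsewhere.
graph = StateGraph(RoutingState)
graph.add_node("router_node", classify_query)
graph.add_node("sql_node", handle_sql)
graph.add_node("code_node", handle_code)
graph.add_node("general_node", handle_general)
graph.set_entry_point("router_node")

# Keys are the values router() may return; values are destination nodes
graph.add_conditional_edges(
    "router_node",
    router,
    {
        "sql_node": "sql_node",
        "code_node": "code_node",
        "general_node": "general_node"
    }
)

# Each handler ends the run
for node in ("sql_node", "code_node", "general_node"):
    graph.add_edge(node, END)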

The agent can only go to nodes you've explicitly defined. Out-of-scope destinations are structurally impossible.
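
One way to keep the routing function and the edge mapping in sync is to declare the allowed labels once and check them against the mapping. This is a sketch, not part of the original article: `Route`, `ROUTE_MAP`, and `route_query` are hypothetical names, and the lowercase matching is an added assumption (the router above is case-sensitive).

```python
from typing import Literal, TypedDict, get_args

# The only labels the router is allowed to return
Route = Literal["sql_node", "code_node", "general_node"]


class RoutingState(TypedDict):
    query: str
    route: str
    answer: str


# Same shape as the mapping passed to add_conditional_edges
ROUTE_MAP = {
    "sql_node": "sql_node",
    "code_node": "code_node",
    "general_node": "general_node",
}


def route_query(state: RoutingState) -> Route:
    # Normalize case so "sql", "SQL" and "Sql" are treated alike
    text = state["query"].lower()
    if "sql" in text or "database" in text:
        return "sql_node"
    if "code" in text:
        return "code_node"
    return "general_node"


# Fails at import time if a label has no destination, or vice versa
assert set(get_args(Route)) == set(ROUTE_MAP)
```

With this check in place, adding a new label to `Route` without adding a destination is caught before any request is routed.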

Pattern 3: Loop Control (Max Attempts)

Prevent infinite loops with a counter in state:

class RetryState(TypedDict):
    task: str
    result: str
    attempts: int
    max_attempts: int

def should_retry(state: RetryState) -> str:
    # Check for success first, so a success on the final attempt
    # is not reported as a failure.
    if "ERROR" not in state["result"]:
        return "done"
    if state["attempts"] >= state["max_attempts"]:
        return "give_up"
    return "retry"

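# execute_task and handle_failure are node functions defined elsewhere.
# execute_task must return an incremented "attempts" value on every run,
# otherwise the limit is never reached.
graph = StateGraph(RetryState)
graph.add_node("execute", execute_task)
graph.add_node("give_up", handle_failure)
graph.set_entry_point("execute")

graph.add_conditional_edges(
    "execute",
    should_retry,
    {
        "retry": "execute",  # Loop back
        "done": END,
        "give_up": "give_up"
    }
)
graph.add_edge("give_up", END)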

max_attempts in State means the caller controls the limit — no code changes needed to adjust retry behavior.
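
The loop only terminates if the executing node increments `attempts` on every pass. The plain-Python simulation below shows that contract; `flaky_execute`, `drive`, and the `succeed_on` parameter are hypothetical, and the loop is driven by hand rather than by LangGraph. This version of `should_retry` checks for success before checking the limit, so a success on the last allowed attempt still counts as done.

```python
from typing import TypedDict


class RetryState(TypedDict):
    task: str
    result: str
    attempts: int
    max_attempts: int


def flaky_execute(state: RetryState, succeed_on: int) -> dict:
    # Hypothetical node: fails until the given attempt number is reached
    attempts = state["attempts"] + 1  # the node owns the counter increment
    result = "ok" if attempts >= succeed_on else "ERROR: transient failure"
    return {"result": result, "attempts": attempts}


def should_retry(state: RetryState) -> str:
    if "ERROR" not in state["result"]:
        return "done"
    if state["attempts"] >= state["max_attempts"]:
        return "give_up"
    return "retry"


def drive(max_attempts: int, succeed_on: int) -> tuple[str, RetryState]:
    state: RetryState = {
        "task": "demo", "result": "", "attempts": 0, "max_attempts": max_attempts,
    }
    while True:
        state = {**state, **flaky_execute(state, succeed_on)}
        decision = should_retry(state)
        if decision != "retry":
            return decision, state


early = drive(max_attempts=3, succeed_on=2)
last_chance = drive(max_attempts=3, succeed_on=3)
never = drive(max_attempts=3, succeed_on=99)
```

In this simulation `early` finishes after two attempts, `last_chance` succeeds on the third, and `never` gives up after exactly three.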

Pattern 4: Checkpointing for Long Tasks

Pause and resume long-running agent tasks:

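import sqlite3
from langgraph.checkpoint.sqlite import SqliteSaver  # pip install langgraph-checkpoint-sqlite

# In recent releases from_conn_string() is a context manager, so this
# version builds the saver from an explicit connection instead.
conn = sqlite3.connect("checkpoints.db", check_same_thread=False)
checkpointer = SqliteSaver(conn)
app = graph.compile(checkpointer=checkpointer)

# Thread ID scopes state to a specific user + task
config = {"configurable": {"thread_id": "user-123-task-456"}}
result = app.invoke(initial_state, config=config)

# Resume an interrupted run: passing None continues from the last checkpoint
result2 = app.invoke(None, config=config)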

Combined with Supabase Edge Functions: use a request ID as the thread ID and state automatically isolates per-user, per-task without extra logic.
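
Since isolation depends entirely on the thread ID, it can help to build the config in one place. The helper below is a hypothetical sketch that reproduces the `user-...-task-...` format used above; the name `make_config` and the empty-value check are assumptions, not part of LangGraph.

```python
def make_config(user_id: str, task_id: str) -> dict:
    """Build an invoke() config whose thread ID is unique per user and task."""
    if not user_id or not task_id:
        # An empty part would let unrelated runs share one thread
        raise ValueError("user_id and task_id must both be non-empty")
    return {"configurable": {"thread_id": f"user-{user_id}-task-{task_id}"}}


config = make_config("123", "456")
```

Passing the same `user_id` and `task_id` later yields the same thread ID, which is what lets a later call pick up the saved state.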

Real Application: CS Auto-Reply

class CSState(TypedDict):
    ticket: str
    category: str
    response: str
    escalated: bool

def classify(state): ...  # FAQ / bug / feature request
def auto_reply(state): ...  # Auto-reply if FAQ
def escalate(state): ...   # Route bugs and feature requests to humans

def should_escalate(state: CSState) -> str:
    if state["category"] in ["bug", "feature"]:
        return "escalate"
    return "auto_reply"

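The state machine structure provides a structural guarantee: any ticket classified as a bug or feature request is routed to a human, and the agent has no path that auto-closes it. The guarantee covers routing only; it still depends on classify assigning the right category.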
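
Note that `should_escalate` above sends any category other than "bug" or "feature" to auto-reply, including labels the classifier was never meant to produce. A stricter variant, sketched below as an alternative rather than the author's version, fails closed: only a recognized FAQ label is answered automatically. The label "faq" and the name `should_escalate_strict` are assumptions.

```python
from typing import TypedDict


class CSState(TypedDict):
    ticket: str
    category: str
    response: str
    escalated: bool


def should_escalate_strict(state: CSState) -> str:
    # Fail closed: anything that is not clearly an FAQ goes to a human
    if state["category"].strip().lower() == "faq":
        return "auto_reply"
    return "escalate"


def route_for(category: str) -> str:
    # Small helper for trying out labels
    state: CSState = {
        "ticket": "example", "category": category, "response": "", "escalated": False,
    }
    return should_escalate_strict(state)


routes = {label: route_for(label) for label in ["faq", "bug", "feature", "billing", ""]}
```

With this variant, an empty or unexpected label such as "billing" ends up with a human instead of receiving an automatic reply.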

Caveats

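Python focus: The examples here use the Python library. A JavaScript/TypeScript port (LangGraph.js) exists but is not covered here, and whether it runs well in a Deno-based runtime such as Supabase Edge Functions should be verified separately
LangChain dependency: Subject to LangChain version changes
Async: Supported, but async checkpointing uses a different class (for SQLite, AsyncSqliteSaver)
Visualization: app.get_graph().draw_mermaid(), called on the compiled graph, returns Mermaid source for a flow diagram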

Summary

LangGraph applies state machine discipline to AI agents. Instead of "let the agent figure it out," you define exactly which states exist and how transitions happen.

Three patterns cover most production needs: