Ilya Rubtsov

Building Conversational AI Agents That Remember: LangGraph, Postgres Checkpointing, and the Future of Financial UX

How interrupt/resume graph topology turns stateless LLMs into stateful financial advisors — and why this changes everything for CFO-facing AI products.


The Problem Nobody Talks About

Every demo of a financial AI agent looks the same: the user asks a question, the agent answers, end of story. One shot. One turn. The agent forgets you exist the moment the response is sent.

But real financial conversations don't work that way.

A CFO doesn't ask a single question and walk away. She starts with "What drove the variance in OPEX this quarter?", gets an answer, then drills down: "Break that out by department." Then pivots: "OK, run a scenario where we delay the European expansion by one quarter: what happens to our cash runway?" Each question builds on the last. Context accumulates. The agent needs to remember where the conversation has been, what analyses it has already run, and what the user cares about.

This is the gap between AI demos and AI products. And closing it requires a fundamentally different architecture.

I recently had the opportunity to build a conversational AI agent with multi-turn memory, interrupt/resume capabilities, and persistent state stored in Postgres. The patterns I discovered apply directly to financial AI, and I believe they represent a UX paradigm shift for how CFOs and finance teams will interact with AI systems.

This article walks through the architecture, the core ideas, and the implications for financial products.


Why Stateless Agents Fail in Finance

Most agent frameworks treat each invocation as independent. The user sends a message, the agent processes it, returns a response, and the entire computational graph, along with all intermediate state, evaporates.

For simple Q&A, this works. For financial workflows, it's a disaster. Consider what a real financial conversation looks like:

Turn 1: "What was our revenue growth rate last quarter?"
Turn 2: "How does that compare to our three closest competitors?"
Turn 3: "Pull the gross margin trends for the same period."
Turn 4: "Based on all of this, draft a board commentary paragraph."

By turn 4, the agent needs to remember the revenue figures from turn 1, the competitive data from turn 2, and the margin analysis from turn 3. Without persistent state, each turn starts from scratch. The user is forced to repeat context, re-upload documents, and re-explain what they're trying to accomplish.

This isn't just an inconvenience — it's a fundamental UX failure that prevents AI from replacing the iterative, conversational workflow that finance professionals actually use.


The Core Idea: Graphs That Pause and Resume

The solution relies on three primitives from LangGraph working together:

  1. A looping graph topology where the agent responds, waits for human input, and loops back
  2. interrupt() to suspend execution mid-graph and persist state
  3. A Postgres checkpointer that saves the full graph state to a database at every suspension point

Here's the conversation lifecycle in plain terms:

User sends message
        ↓
   Agent processes message + full history
        ↓
   Agent responds, decides it needs more input
        ↓
   interrupt() is called
   Full state → serialized to Postgres
        ↓
   ... minutes, hours, days pass ...
        ↓
   User sends a follow-up message
        ↓
   Graph resumes from the Postgres checkpoint
   New message is injected into conversation history
        ↓
   Agent processes everything (old + new context)
        ↓
   (cycle repeats until conversation is resolved)

The critical insight: the graph doesn't terminate between turns. It suspends. The entire state — message history, turn counter, intermediate results, routing decisions — is serialized to Postgres. When the user comes back, the graph resumes exactly where it left off.

Let's build this step by step.


Step 1: Define What the Agent Remembers

The first decision is what to persist across turns. LangGraph uses a TypedDict as the state schema:

from typing import Annotated, TypedDict
from langchain_core.messages import BaseMessage
from langgraph.graph import add_messages


class ChatState(TypedDict):
    # Conversation history — new messages are appended automatically
    messages: Annotated[list[BaseMessage], add_messages]

    # Whether the agent needs more input from the user
    awaiting_input: bool

    # How many turns the conversation has gone through
    turn: int

The add_messages annotation is a LangGraph reducer — it tells the framework to append new messages to the existing list rather than overwriting it. This is how conversation history accumulates across turns without any manual bookkeeping.
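To see the reducer semantics in isolation: a LangGraph reducer is just a function the framework calls as `new_value = reducer(old_value, update)`. The sketch below approximates `add_messages` with plain list concatenation (the real reducer also deduplicates by message ID); the message strings are illustrative stand-ins for `BaseMessage` objects.

```python
import operator

# Existing history already in the checkpointed state
history = ["Human: What was our OPEX last quarter?", "AI: OPEX came in at $4.2M"]

# A node returns only the delta...
update = ["Human: Break that out by department"]

# ...and the reducer merges it into state instead of overwriting it.
# operator.add on lists stands in for add_messages here.
merged = operator.add(history, update)

assert len(merged) == 3
assert merged[-1] == "Human: Break that out by department"
```

Without the reducer annotation, a node returning `{"messages": [...]}` would replace the whole history with just the delta.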

awaiting_input is the flag the LLM sets when it decides it needs more information from the user. It drives the routing logic that determines whether to suspend the graph or end the conversation.

This is a minimal example. In a real financial agent, you'd add fields for accumulated analysis results, which specialized tools have been called, and any structured data the agent has gathered. The principle is the same: everything the agent needs to remember goes into the state, and the checkpointer handles persistence automatically.
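As a sketch of what that richer state might look like. The field names here are hypothetical, and `operator.add` / `operator.or_` stand in for whichever reducers you actually choose:

```python
import operator
from typing import Annotated, TypedDict


class FinancialChatState(TypedDict):
    # Conversation history (add_messages in the real graph)
    messages: Annotated[list, operator.add]
    awaiting_input: bool
    turn: int
    # Accumulated tool results keyed by analysis name; dict-union reducer
    # means each sub-agent can merge in its results without clobbering others
    analyses: Annotated[dict, operator.or_]
    # Which specialized tools have already run this conversation
    tools_called: Annotated[list[str], operator.add]


state: FinancialChatState = {
    "messages": [],
    "awaiting_input": False,
    "turn": 0,
    "analyses": {},
    "tools_called": [],
}
```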


Step 2: Build the Looping Graph

The graph creates a cycle between two nodes — the agent and a "human gate" that suspends execution:

from langgraph.checkpoint.postgres.aio import AsyncPostgresSaver
from langgraph.constants import END
from langgraph.graph import StateGraph
from langgraph.types import interrupt
from langchain_core.messages import AIMessage, SystemMessage

# Assumed context: `llm` is any LangChain chat model instance and
# SYSTEM_PROMPT is the agent's system prompt string.

async def agent_node(state: ChatState) -> dict:
    """
    The agent node. It receives the full conversation history,
    reasons over it, and decides whether to continue or wait
    for more input.
    """
    # In production, you'd use with_structured_output() here
    # to get a typed response with an explicit awaiting_input flag.
    # For simplicity, this example uses a plain LLM call.
    response = await llm.ainvoke(
        [SystemMessage(content=SYSTEM_PROMPT)] + state["messages"]
    )

    # Determine if we need more input (simplified logic)
    needs_input = "?" in response.content  # naive heuristic for demo

    return {
        "messages": [AIMessage(content=response.content)],
        "awaiting_input": needs_input,
        "turn": state["turn"] + 1,
    }


async def human_gate(state: ChatState) -> dict:
    """
    Suspends the graph and waits for the user's next message.

    interrupt() does three things:
    1. Triggers the checkpointer to save full state to Postgres
    2. Halts execution of the graph
    3. Returns the user's new message when the graph resumes
    """
    user_message = interrupt("Waiting for user")
    return {
        "messages": [user_message],
        "awaiting_input": False,
    }


def route(state: ChatState) -> str:
    """Send to human gate if the agent wants more input, otherwise end."""
    return "human_gate" if state["awaiting_input"] else "end"


# Assemble the graph
builder = StateGraph(ChatState)
builder.add_node("agent", agent_node)
builder.add_node("human_gate", human_gate)
builder.set_entry_point("agent")
builder.add_conditional_edges("agent", route, {"human_gate": "human_gate", "end": END})
builder.add_edge("human_gate", "agent")

# Compile with checkpointer — this is what makes interrupt() work.
# from_conn_string() returns an async context manager, so compile inside it:
async with AsyncPostgresSaver.from_conn_string("postgresql://...") as checkpointer:
    await checkpointer.setup()  # creates checkpoint tables (idempotent)
    graph = builder.compile(checkpointer=checkpointer)

This creates the following topology:

entry → agent → [awaiting_input=True]  → human_gate → (back to agent)
              → [awaiting_input=False] → END

Without the checkpointer, interrupt() would raise an error — there's nowhere to persist the state. The checkpointer is not optional infrastructure; it's a structural requirement of the interrupt/resume pattern.


Step 3: Drive the Conversation

On the application side, you invoke the graph with a thread_id that identifies the conversation:

from langchain_core.messages import HumanMessage
from langgraph.types import Command

thread_config = {
    "configurable": {"thread_id": "conversation-001"}
}

# First turn — start the conversation
result = await graph.ainvoke(
    {
        "messages": [HumanMessage(content="What was our OPEX last quarter?")],
        "awaiting_input": False,
        "turn": 0,
    },
    config=thread_config,
)

# ... time passes, user comes back ...

# Second turn — resume the interrupted graph with Command(resume=...);
# the resume value becomes the return value of interrupt() in human_gate
result = await graph.ainvoke(
    Command(resume=HumanMessage(content="Break that out by department")),
    config=thread_config,
)

# Third turn — still the same thread, full history available
result = await graph.ainvoke(
    Command(resume=HumanMessage(content="Draft a board paragraph from this")),
    config=thread_config,
)

Same thread_id = same conversation = resume from the last checkpoint. The graph loads the full state from Postgres before processing each new message. By turn 3, the agent has the OPEX figures from turn 1, the departmental breakdown from turn 2, and the full reasoning chain — all without the user repeating anything.
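Thread isolation falls out of the config: different `thread_id`s are entirely separate checkpoint streams in Postgres. A tiny hypothetical helper makes this explicit:

```python
def thread_config(thread_id: str) -> dict:
    """Build the per-conversation config LangGraph uses to key checkpoints."""
    return {"configurable": {"thread_id": thread_id}}


# Two users (or two topics) never share state:
cfo_review = thread_config("cfo-board-prep")
close_checklist = thread_config("controller-month-end")

assert cfo_review["configurable"]["thread_id"] != close_checklist["configurable"]["thread_id"]
```

In practice you'd derive the thread ID from your own conversation table (e.g. a row's UUID) so application state and checkpoint state stay joined.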


Step 4: Add Specialized Sub-Agents

The pattern becomes truly powerful when the conversational agent can delegate to specialized agents. Instead of one monolithic LLM doing everything, you have an orchestrator that routes to domain experts:

async def revenue_agent(state: ChatState) -> dict:
    """Specialized agent for revenue analysis.
    run_revenue_analysis() is a placeholder for your domain logic."""
    analysis = await run_revenue_analysis(state["messages"])
    return {"messages": [AIMessage(content=analysis)]}


async def forecast_agent(state: ChatState) -> dict:
    """Specialized agent for scenario modeling.
    run_forecast_model() is a placeholder for your domain logic."""
    forecast = await run_forecast_model(state["messages"])
    return {"messages": [AIMessage(content=forecast)]}


# Extended routing — assumes ChatState gains an optional `next_agent` field
# that the orchestrator sets (ideally via structured output)
def route(state: ChatState) -> str:
    if state.get("next_agent") == "revenue":
        return "revenue_agent"
    if state.get("next_agent") == "forecast":
        return "forecast_agent"
    if state["awaiting_input"]:
        return "human_gate"
    return "end"


# Register the sub-agents, extend the conditional-edge mapping to include
# them, and route their output back to the orchestrator
builder.add_node("revenue_agent", revenue_agent)
builder.add_node("forecast_agent", forecast_agent)
builder.add_edge("revenue_agent", "agent")
builder.add_edge("forecast_agent", "agent")

Now the conversation flow becomes:

User: "Compare our margins to competitors"
  → agent decides: need margin data first
  → routes to revenue_agent
  → revenue_agent returns results into state
  → agent synthesizes, responds to user
  → interrupt() → state saved to Postgres

User: "Now model what happens if we cut R&D by 10%"
  → graph resumes from checkpoint
  → agent decides: need forecast model
  → routes to forecast_agent
  → forecast_agent runs scenario, returns results
  → agent combines revenue analysis + forecast
  → responds with comprehensive answer

The user experiences a natural conversation. Behind the scenes, multiple specialized agents are being orchestrated, their results accumulated in state, and the entire history persisted across turns. Each sub-agent can use different tools, different prompts, even different LLM models — the conversational agent just cares about results.


The Financial AI Implications

This architecture isn't just a technical pattern — it's a UX paradigm shift for financial AI products. Here's why it matters.

From Q&A Interfaces to Collaborative Conversations

Today's financial AI tools are essentially search engines with natural language wrappers. You ask a question, you get an answer. The interaction model is transactional.

The interrupt/resume pattern enables a fundamentally different model: conversations. A CFO can start an analysis, drill down into anomalies, pivot to scenario modeling, and build up to a complex deliverable — a board presentation, a variance analysis, a budget recommendation — over multiple turns. The AI maintains full context throughout.

This mirrors how CFOs actually work with their FP&A teams. You don't hand your analyst a single question and wait for a report. You have a conversation. You iterate. You refine. The conversation is the interface.

Asynchronous Financial Workflows

Not every financial question has an instant answer. Some analyses require running complex models, querying multiple data sources, or waiting for market data feeds. With the interrupt/resume pattern, the agent can say "I'm running the Monte Carlo simulation on your revenue scenarios — I'll notify you when results are ready" and checkpoint its state. When the computation finishes, the conversation resumes where it left off.

This opens the door to financial AI that handles genuinely complex workflows: multi-day budget review processes, iterative forecast refinement, or collaborative analysis sessions where the CFO and the AI work through a problem over the course of a week.

Audit Trail by Architecture

Every checkpoint is a serialized snapshot of the full conversation state at a specific point in time. This means you get a complete, immutable audit trail of every decision, every analysis, and every piece of data the agent considered — as a natural byproduct of the architecture. In financial services, where regulatory compliance demands traceability, this isn't a feature. It's table stakes.

You can query the checkpoint history for any conversation thread and reconstruct exactly what the agent knew, what it recommended, and why — at any point in the conversation. No additional logging infrastructure required.
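A sketch of that reconstruction, assuming a compiled LangGraph graph with a checkpointer attached. `aget_state_history` yields `StateSnapshot` objects newest-first, and the field access below matches the `ChatState` schema from Step 1; the summary keys are illustrative.

```python
async def reconstruct_audit_trail(graph, thread_config: dict) -> list[dict]:
    """Walk every checkpoint of a conversation thread and summarize
    what the agent knew at each point (newest checkpoint first)."""
    trail = []
    async for snapshot in graph.aget_state_history(thread_config):
        trail.append({
            "checkpoint_id": snapshot.config["configurable"].get("checkpoint_id"),
            "turn": snapshot.values.get("turn"),
            "messages_seen": len(snapshot.values.get("messages", [])),
        })
    return trail
```

Called with the same `thread_config` used to drive the conversation, this gives you a per-checkpoint timeline without touching the checkpoint tables directly.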

Multi-Agent Financial Intelligence

The sub-agent pattern maps naturally to how finance teams are organized. You build specialized agents for different domains — revenue analysis, cost allocation, cash flow forecasting, competitive intelligence, regulatory compliance — and let the conversational agent route between them based on what the user is asking about.

Each agent maintains its own domain expertise while the orchestrator maintains conversational context. The result is an AI system that mirrors the organizational structure of a finance team: specialized expertise coordinated by a generalist who understands the big picture and remembers the full conversation.


Practical Lessons

Building this pattern for production taught me several things I wouldn't have learned from documentation alone.

The checkpointer is not optional. It's tempting to think of persistence as a nice-to-have that you'll add later. It's not. Without interrupt() + checkpointer, you simply cannot build multi-turn conversational agents. The entire architecture depends on the graph's ability to suspend and resume with full state intact. Start with the checkpointer from day one.

Use structured output for routing. Don't try to parse routing decisions out of free-text LLM output. Use with_structured_output() to get a typed response object with explicit fields like awaiting_input: bool and next_agent: str | None. Free-text parsing is fragile and leads to subtle bugs that only surface in production conversations.
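A minimal sketch of such a schema using Pydantic; `RoutingDecision` and its fields are hypothetical, chosen to mirror the state used earlier, and would be passed to `llm.with_structured_output(RoutingDecision)` inside the agent node.

```python
from typing import Literal, Optional

from pydantic import BaseModel


class RoutingDecision(BaseModel):
    """Typed response the orchestrator LLM returns each turn,
    replacing free-text parsing of routing intent."""
    reply: str                  # what to say to the user
    awaiting_input: bool        # suspend at human_gate?
    next_agent: Optional[Literal["revenue", "forecast"]] = None


decision = RoutingDecision(reply="Here is the OPEX breakdown.", awaiting_input=True)
```

The node then copies `decision.awaiting_input` and `decision.next_agent` straight into state, so the `route()` function never touches model prose.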

Track conversation status explicitly. You need a way to distinguish "the agent is actively processing" from "the agent is waiting for the user to respond." A distinct PAUSED status in your task or conversation model gives you this, and enables operational features like timeout cleanup, stale conversation alerts, and accurate status indicators in the UI.
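One lightweight way to model this — the enum and its values are illustrative application code, not a LangGraph API:

```python
from enum import Enum


class ConversationStatus(str, Enum):
    RUNNING = "running"      # graph is actively executing nodes
    PAUSED = "paused"        # interrupt() fired; waiting on the user
    COMPLETED = "completed"  # graph reached END


# Operational queries become trivial, e.g. "find stale paused threads"
# is a WHERE status = 'paused' filter on your conversation table.
status = ConversationStatus.PAUSED
```

Inheriting from `str` keeps the values database- and JSON-friendly.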

State accumulation is the killer feature. The ability to accumulate analysis results across turns means the agent's context grows richer with every interaction. By the end of a 10-turn conversation, the agent has a comprehensive picture of the analysis the user is building — the revenue data from turn 1, the competitive benchmarks from turn 4, the scenario models from turn 7. No stateless agent can achieve this.

Keep the graph topology simple. It's tempting to build elaborate conditional routing with dozens of edges. Resist this. A clean loop — agent → human gate → agent, with sub-agents branching off and returning to the orchestrator — handles the vast majority of conversational workflows. Complexity in the graph is complexity in debugging.


What This Means for the Future of Financial AI

The industry is converging on a model where AI financial assistants are not tools you query but collaborators you converse with. The technical infrastructure to support this — persistent state, interrupt/resume, multi-agent orchestration — is now mature enough for production.

I believe the next generation of CFO-facing AI products will be built on these patterns. Not single-shot Q&A systems, but stateful conversational agents that remember your context, orchestrate specialized analyses, and evolve their understanding of your business over time.

The companies that figure this out first will have a decisive advantage. Not because the underlying LLMs are better, but because the architecture around them — the state management, the orchestration, the persistence — creates an experience that feels like working with an exceptionally capable colleague rather than querying a database with natural language.

The technology is ready. The question is who builds the product.


I'm a CFO and AI Solutions Architect with 20+ years in fintech and banking. I build production agentic systems at the intersection of finance and AI. If you're working on similar problems — particularly conversational AI for enterprise finance — let's connect on LinkedIn.
