DEV Community: Ilya Rubtsov

Building Conversational AI Agents That Remember: LangGraph, Postgres Checkpointing, and the Future of Financial UX

Ilya Rubtsov — Sun, 15 Mar 2026 18:44:49 +0000

How interrupt/resume graph topology turns stateless LLMs into stateful financial advisors — and why this changes everything for CFO-facing AI products.

The Problem Nobody Talks About

Every demo of a financial AI agent looks the same: the user asks a question, the agent answers, end of story. One shot. One turn. The agent forgets you exist the moment the response is sent.

But real financial conversations don't work that way.

A CFO doesn't ask a single question and walk away. She starts with "What drove the variance in OPEX this quarter?", gets an answer, then drills down: "Break that out by department." Then pivots: "OK, run a scenario where we delay the European expansion by one quarter - what happens to our cash runway?" Each question builds on the last. Context accumulates. The agent needs to remember where the conversation has been, what analyses it has already run, and what the user cares about.

This is the gap between AI demos and AI products. And closing it requires a fundamentally different architecture.

I recently had the opportunity to build a conversational AI agent with multi-turn memory, interrupt/resume capabilities, and persistent state stored in Postgres. The patterns I discovered apply directly to financial AI, and I believe they represent a UX paradigm shift for how CFOs and finance teams will interact with AI systems.

This article walks through the architecture, the core ideas, and the implications for financial products.

Why Stateless Agents Fail in Finance

Most agent frameworks treat each invocation as independent. The user sends a message, the agent processes it, returns a response, and the entire computational graph - along with all intermediate state - evaporates.

For simple Q&A, this works. For financial workflows, it's a disaster. Consider what a real financial conversation looks like:

Turn 1: "What was our revenue growth rate last quarter?"
Turn 2: "How does that compare to our three closest competitors?"
Turn 3: "Pull the gross margin trends for the same period."
Turn 4: "Based on all of this, draft a board commentary paragraph."

By turn 4, the agent needs to remember the revenue figures from turn 1, the competitive data from turn 2, and the margin analysis from turn 3. Without persistent state, each turn starts from scratch. The user is forced to repeat context, re-upload documents, and re-explain what they're trying to accomplish.

This isn't just an inconvenience — it's a fundamental UX failure that prevents AI from replacing the iterative, conversational workflow that finance professionals actually use.

The Core Idea: Graphs That Pause and Resume

The solution relies on three primitives from LangGraph working together:

A looping graph topology where the agent responds, waits for human input, and loops back
interrupt() to suspend execution mid-graph and persist state
A Postgres checkpointer that saves the full graph state to a database at every suspension point

Here's the conversation lifecycle in plain terms:

User sends message
        ↓
   Agent processes message + full history
        ↓
   Agent responds, decides it needs more input
        ↓
   interrupt() is called
   Full state → serialized to Postgres
        ↓
   ... minutes, hours, days pass ...
        ↓
   User sends a follow-up message
        ↓
   Graph resumes from the Postgres checkpoint
   New message is injected into conversation history
        ↓
   Agent processes everything (old + new context)
        ↓
   (cycle repeats until conversation is resolved)

The critical insight: the graph doesn't terminate between turns. It suspends. The entire state — message history, turn counter, intermediate results, routing decisions — is serialized to Postgres. When the user comes back, the graph resumes exactly where it left off.
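To make "suspend, serialize, resume" concrete before bringing in LangGraph, here is a minimal sketch of the persistence half of the idea. It is not LangGraph's checkpointer: it assumes an in-memory SQLite database as a stand-in for Postgres, and the `checkpoints` table and the `save_checkpoint` / `load_latest` helpers are hypothetical names invented for this illustration.

```python
import json
import sqlite3
from typing import Optional

# Stand-in for the Postgres checkpoint store: one row per (thread_id, step).
# sqlite3 is used only so the sketch runs without a database server.
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE checkpoints ("
    "thread_id TEXT, step INTEGER, state TEXT, "
    "PRIMARY KEY (thread_id, step))"
)


def save_checkpoint(thread_id: str, state: dict) -> int:
    """Append a new snapshot for the thread and return its step number."""
    row = conn.execute(
        "SELECT COALESCE(MAX(step), -1) FROM checkpoints WHERE thread_id = ?",
        (thread_id,),
    ).fetchone()
    step = row[0] + 1
    conn.execute(
        "INSERT INTO checkpoints VALUES (?, ?, ?)",
        (thread_id, step, json.dumps(state)),
    )
    return step


def load_latest(thread_id: str) -> Optional[dict]:
    """Return the most recent snapshot for the thread, or None if it is new."""
    row = conn.execute(
        "SELECT state FROM checkpoints WHERE thread_id = ? "
        "ORDER BY step DESC LIMIT 1",
        (thread_id,),
    ).fetchone()
    return json.loads(row[0]) if row else None


# Turn 1: the graph suspends, so its state is written out.
save_checkpoint(
    "conversation-001",
    {"messages": ["What was our OPEX last quarter?"], "turn": 1},
)

# Later: the follow-up arrives and the state is restored before processing.
state = load_latest("conversation-001")
state["messages"].append("Break that out by department")
state["turn"] += 1
save_checkpoint("conversation-001", state)
```

Each suspension adds a row rather than overwriting the previous one, which is what makes the per-thread history inspectable later.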

Let's build this step by step.

Step 1: Define What the Agent Remembers

The first decision is what to persist across turns. LangGraph uses a TypedDict as the state schema:

from typing import Annotated, TypedDict, Literal
from langchain_core.messages import BaseMessage
from langgraph.graph import add_messages


class ChatState(TypedDict):
    # Conversation history — new messages are appended automatically
    messages: Annotated[list[BaseMessage], add_messages]

    # Whether the agent needs more input from the user
    awaiting_input: bool

    # How many turns the conversation has gone through
    turn: int

The add_messages annotation is a LangGraph reducer — it tells the framework to append new messages to the existing list rather than overwriting it. This is how conversation history accumulates across turns without any manual bookkeeping.
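If the reducer idea is new, the sketch below shows the general mechanism in plain Python: a node returns a partial update, and the framework merges it into the existing state, using a reducer for annotated fields and plain overwrite for everything else. This is a simplified mental model, not LangGraph's implementation; `append_messages`, `REDUCERS`, and `apply_update` are hypothetical names, and the real add_messages does more (for example, it can replace an existing message that has the same ID).

```python
from typing import Callable, Dict


def append_messages(existing: list, new: list) -> list:
    """Reducer: combine old and new values instead of overwriting."""
    return existing + new


# Fields listed here are merged with their reducer; all others are overwritten.
REDUCERS: Dict[str, Callable[[list, list], list]] = {
    "messages": append_messages,
}


def apply_update(state: dict, update: dict) -> dict:
    """Merge a node's partial update into the current state."""
    merged = dict(state)
    for key, value in update.items():
        reducer = REDUCERS.get(key)
        if reducer is not None:
            merged[key] = reducer(state.get(key, []), value)
        else:
            merged[key] = value
    return merged


state = {"messages": ["user: What was our OPEX?"], "awaiting_input": False, "turn": 0}
update = {"messages": ["ai: Which quarter?"], "awaiting_input": True, "turn": 1}
new_state = apply_update(state, update)
```

The node only returns the new message; the history grows because the merge step appends it.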

awaiting_input is the flag the LLM sets when it decides it needs more information from the user. It drives the routing logic that determines whether to suspend the graph or end the conversation.

This is a minimal example. In a real financial agent, you'd add fields for accumulated analysis results, which specialized tools have been called, and any structured data the agent has gathered. The principle is the same: everything the agent needs to remember goes into the state, and the checkpointer handles persistence automatically.

Step 2: Build the Looping Graph

The graph creates a cycle between two nodes — the agent and a "human gate" that suspends execution:

from langgraph.checkpoint.postgres.aio import AsyncPostgresSaver
from langgraph.constants import END
from langgraph.graph import StateGraph
from langgraph.types import interrupt
from langchain_core.messages import AIMessage, SystemMessage


async def agent_node(state: ChatState) -> dict:
    """
    The agent node. It receives the full conversation history,
    reasons over it, and decides whether to continue or wait
    for more input.
    """
    # In production, you'd use with_structured_output() here
    # to get a typed response with an explicit awaiting_input flag.
    # For simplicity, this example uses a plain LLM call.
    response = await llm.ainvoke(
        [SystemMessage(content=SYSTEM_PROMPT)] + state["messages"]
    )

    # Determine if we need more input (simplified logic)
    needs_input = "?" in response.content  # naive heuristic for demo

    return {
        "messages": [AIMessage(content=response.content)],
        "awaiting_input": needs_input,
        "turn": state["turn"] + 1,
    }


async def human_gate(state: ChatState) -> dict:
    """
    Suspends the graph and waits for the user's next message.

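    interrupt() does three things:
    1. Triggers the checkpointer to save full state to Postgres
    2. Halts execution of the graph
    3. Returns the user's new message when the graph resumes

    On resume, this node runs again from the top, so keep any code
    placed before interrupt() free of side effects.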
    """
    user_message = interrupt("Waiting for user")
    return {
        "messages": [user_message],
        "awaiting_input": False,
    }


def route(state: ChatState) -> str:
    """Send to human gate if the agent wants more input, otherwise end."""
    return "human_gate" if state["awaiting_input"] else "end"


# Assemble the graph
builder = StateGraph(ChatState)
builder.add_node("agent", agent_node)
builder.add_node("human_gate", human_gate)
builder.set_entry_point("agent")
builder.add_conditional_edges("agent", route, {"human_gate": "human_gate", "end": END})
builder.add_edge("human_gate", "agent")

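# Compile with checkpointer: this is what makes interrupt() work.
# from_conn_string() is an async context manager, so the saver and the
# compiled graph must be used inside the `async with` block.
async with AsyncPostgresSaver.from_conn_string("postgresql://...") as checkpointer:
    await checkpointer.setup()  # creates checkpoint tables (idempotent)
    graph = builder.compile(checkpointer=checkpointer)
    # ... invoke the graph here, while the connection is open ...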

This creates the following topology:

entry → agent → [awaiting_input=True]  → human_gate → (back to agent)
              → [awaiting_input=False] → END

Without the checkpointer, interrupt() would raise an error — there's nowhere to persist the state. The checkpointer is not optional infrastructure; it's a structural requirement of the interrupt/resume pattern.
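A toy model helps explain why the two are inseparable. In the sketch below, which is my own simplification and not LangGraph's internals, `interrupt` stops the run by raising an exception, the runtime saves the state, and a later run with a resume value re-enters the node from the top, where `interrupt` now returns that value. `MiniRuntime` and `Suspended` are hypothetical names.

```python
from typing import Any, Callable, Optional

_NO_RESUME = object()  # sentinel: "no resume value was supplied"


class Suspended(Exception):
    """Raised to stop the run; carries the payload shown to the caller."""

    def __init__(self, payload: Any):
        super().__init__(payload)
        self.payload = payload


class MiniRuntime:
    def __init__(self) -> None:
        self.saved_state: Optional[dict] = None  # stand-in for the Postgres row
        self._resume: Any = _NO_RESUME

    def interrupt(self, payload: Any) -> Any:
        if self._resume is _NO_RESUME:
            raise Suspended(payload)  # first pass: stop here
        value, self._resume = self._resume, _NO_RESUME
        return value  # resumed pass: hand back the user's input

    def run(self, node: Callable, state: dict, resume: Any = _NO_RESUME) -> dict:
        self._resume = resume
        try:
            update = node(self, state)
        except Suspended as pause:
            self.saved_state = dict(state)  # persist before returning control
            return {"interrupted": pause.payload}
        return {**state, **update}


entries = []


def human_gate(runtime: MiniRuntime, state: dict) -> dict:
    entries.append("entered")  # runs on every attempt, including the resume
    user_message = runtime.interrupt("Waiting for user")
    return {"messages": state["messages"] + [user_message]}


runtime = MiniRuntime()
first = runtime.run(human_gate, {"messages": ["Here is the OPEX summary."]})
second = runtime.run(
    human_gate, runtime.saved_state, resume="Break that out by department"
)
```

Take away `saved_state` and the second call has nothing to resume from, which is the situation a missing checkpointer creates.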

Step 3: Drive the Conversation

On the application side, you invoke the graph with a thread_id that identifies the conversation:

from langchain_core.messages import HumanMessage
from langgraph.types import Command

thread_config = {
    "configurable": {"thread_id": "conversation-001"}
}

# First turn: start the conversation
result = await graph.ainvoke(
    {
        "messages": [HumanMessage(content="What was our OPEX last quarter?")],
        "awaiting_input": False,
        "turn": 0,
    },
    config=thread_config,
)

# ... time passes, user comes back ...

# Second turn: resume the suspended graph on the same thread_id.
# Command(resume=...) supplies the value that interrupt() returns
# inside human_gate. (If the previous run reached END instead of
# pausing, start a new run with an input dict as in the first turn.)
result = await graph.ainvoke(
    Command(resume=HumanMessage(content="Break that out by department")),
    config=thread_config,
)

# Third turn: still the same thread, full history available
result = await graph.ainvoke(
    Command(resume=HumanMessage(content="Draft a board paragraph from this")),
    config=thread_config,
)

Same thread_id = same conversation = resume from the last checkpoint. The graph loads the full state from Postgres before processing each new message. By turn 3, the agent has the OPEX figures from turn 1, the departmental breakdown from turn 2, and the full reasoning chain — all without the user repeating anything.

Step 4: Add Specialized Sub-Agents

The pattern becomes truly powerful when the conversational agent can delegate to specialized agents. Instead of one monolithic LLM doing everything, you have an orchestrator that routes to domain experts:

async def revenue_agent(state: ChatState) -> dict:
    """Specialized agent for revenue analysis."""
    analysis = await run_revenue_analysis(state["messages"])
    return {"messages": [AIMessage(content=analysis)]}


async def forecast_agent(state: ChatState) -> dict:
    """Specialized agent for scenario modeling."""
    forecast = await run_forecast_model(state["messages"])
    return {"messages": [AIMessage(content=forecast)]}


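# Extended routing. This assumes ChatState gains a field such as
# `next_agent: str | None`, set by the orchestrator and cleared once
# the sub-agent has run (otherwise the graph keeps re-routing).
def route(state: ChatState) -> str:
    if state.get("next_agent") == "revenue":
        return "revenue_agent"
    if state.get("next_agent") == "forecast":
        return "forecast_agent"
    if state["awaiting_input"]:
        return "human_gate"
    return "end"


# Register the sub-agents, then send them back to the orchestrator.
# The path map passed to add_conditional_edges must also list
# "revenue_agent" and "forecast_agent" as targets.
builder.add_node("revenue_agent", revenue_agent)
builder.add_node("forecast_agent", forecast_agent)
builder.add_edge("revenue_agent", "agent")
builder.add_edge("forecast_agent", "agent")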

Now the conversation flow becomes:

User: "Compare our margins to competitors"
  → agent decides: need margin data first
  → routes to revenue_agent
  → revenue_agent returns results into state
  → agent synthesizes, responds to user
  → interrupt() → state saved to Postgres

User: "Now model what happens if we cut R&D by 10%"
  → graph resumes from checkpoint
  → agent decides: need forecast model
  → routes to forecast_agent
  → forecast_agent runs scenario, returns results
  → agent combines revenue analysis + forecast
  → responds with comprehensive answer

The user experiences a natural conversation. Behind the scenes, multiple specialized agents are being orchestrated, their results accumulated in state, and the entire history persisted across turns. Each sub-agent can use different tools, different prompts, even different LLM models — the conversational agent just cares about results.
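One detail worth sketching is how the delegation request gets cleared, because a sub-agent that returns to the orchestrator with the request still set can be routed to again indefinitely. The example below is a plain-Python sketch under my own assumptions: the state carries a `next_agent` field, and `KNOWN_AGENTS` and `clear_delegation` are hypothetical names, not part of LangGraph.

```python
from typing import Optional, TypedDict


class RoutingState(TypedDict):
    awaiting_input: bool
    next_agent: Optional[str]


# Maps the orchestrator's request to a graph node name.
KNOWN_AGENTS = {"revenue": "revenue_agent", "forecast": "forecast_agent"}


def route(state: RoutingState) -> str:
    """Delegate if a known sub-agent was requested, else pause or end."""
    target = KNOWN_AGENTS.get(state.get("next_agent") or "")
    if target is not None:
        return target
    return "human_gate" if state["awaiting_input"] else "end"


def clear_delegation(state: RoutingState) -> dict:
    """Applied when a sub-agent finishes, so the same request is not replayed."""
    return {**state, "next_agent": None}


requested: RoutingState = {"awaiting_input": False, "next_agent": "revenue"}
first_hop = route(requested)                     # goes to the sub-agent
second_hop = route(clear_delegation(requested))  # request cleared, so it ends
```

Unknown agent names fall through to the normal pause-or-end logic instead of raising, which keeps a bad routing decision from crashing the conversation.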

The Financial AI Implications

This architecture isn't just a technical pattern — it's a UX paradigm shift for financial AI products. Here's why it matters.

From Q&A Interfaces to Collaborative Conversations

Today's financial AI tools are essentially search engines with natural language wrappers. You ask a question, you get an answer. The interaction model is transactional.

The interrupt/resume pattern enables a fundamentally different model: conversations. A CFO can start an analysis, drill down into anomalies, pivot to scenario modeling, and build up to a complex deliverable — a board presentation, a variance analysis, a budget recommendation — over multiple turns. The AI maintains full context throughout.

This mirrors how CFOs actually work with their FP&A teams. You don't hand your analyst a single question and wait for a report. You have a conversation. You iterate. You refine. The conversation is the interface.

Asynchronous Financial Workflows

Not every financial question has an instant answer. Some analyses require running complex models, querying multiple data sources, or waiting for market data feeds. With the interrupt/resume pattern, the agent can say "I'm running the Monte Carlo simulation on your revenue scenarios — I'll notify you when results are ready" and checkpoint its state. When the computation finishes, the conversation resumes where it left off.

This opens the door to financial AI that handles genuinely complex workflows: multi-day budget review processes, iterative forecast refinement, or collaborative analysis sessions where the CFO and the AI work through a problem over the course of a week.

Audit Trail by Architecture

Every checkpoint is a serialized snapshot of the full conversation state at a specific point in time. As a natural byproduct of the architecture, you get a step-by-step record of the messages, analyses, and data the agent held at each suspension point. In financial services, where regulatory compliance demands traceability, this isn't a feature. It's table stakes. Keep in mind that checkpoints are ordinary database rows: making the trail tamper-evident still takes database-level controls such as restricted write access and a retention policy.

You can query the checkpoint history for any conversation thread and reconstruct what the agent knew and what it recommended at any point in the conversation. The reasoning behind a recommendation is recoverable only to the extent that it was written into state, so store the rationale alongside the result if auditors will need it.

Multi-Agent Financial Intelligence

The sub-agent pattern maps naturally to how finance teams are organized. You build specialized agents for different domains — revenue analysis, cost allocation, cash flow forecasting, competitive intelligence, regulatory compliance — and let the conversational agent route between them based on what the user is asking about.

Each agent maintains its own domain expertise while the orchestrator maintains conversational context. The result is an AI system that mirrors the organizational structure of a finance team: specialized expertise coordinated by a generalist who understands the big picture and remembers the full conversation.

Practical Lessons

Building this pattern for production taught me several things I wouldn't have learned from documentation alone.

The checkpointer is not optional. It's tempting to think of persistence as a nice-to-have that you'll add later. It's not. Without interrupt() and a checkpointer, the graph cannot pause mid-run and pick up later; the only fallback is to re-send the whole history on every call, which throws away intermediate results and routing decisions. The entire architecture depends on the graph's ability to suspend and resume with full state intact. Start with the checkpointer from day one.

Use structured output for routing. Don't try to parse routing decisions out of free-text LLM output. Use with_structured_output() to get a typed response object with explicit fields like awaiting_input: bool and next_agent: str | None. Free-text parsing is fragile and leads to subtle bugs that only surface in production conversations.
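To show what the typed response buys you, here is a small standard-library sketch. In a real agent, with_structured_output() and a Pydantic model would do this validation; the dataclass, the `ALLOWED_AGENTS` set, the `reply` field, and `parse_decision` below are stand-ins I made up for illustration.

```python
import json
from dataclasses import dataclass
from typing import Optional

ALLOWED_AGENTS = {"revenue", "forecast"}  # hypothetical sub-agent names


@dataclass(frozen=True)
class AgentDecision:
    reply: str
    awaiting_input: bool
    next_agent: Optional[str] = None


def parse_decision(raw: str) -> AgentDecision:
    """Validate a JSON decision instead of guessing from free text."""
    data = json.loads(raw)
    if not isinstance(data.get("reply"), str):
        raise ValueError("reply must be a string")
    if not isinstance(data.get("awaiting_input"), bool):
        raise ValueError("awaiting_input must be a boolean")
    next_agent = data.get("next_agent")
    if next_agent is not None and next_agent not in ALLOWED_AGENTS:
        raise ValueError(f"unknown next_agent: {next_agent!r}")
    return AgentDecision(data["reply"], data["awaiting_input"], next_agent)


# A rhetorical question in the reply would fool a '"?" in text' heuristic,
# but the explicit flag says the agent is not waiting for input.
decision = parse_decision(
    '{"reply": "Why did OPEX rise? Mostly headcount.", "awaiting_input": false}'
)
heuristic_guess = "?" in decision.reply
```

Here the heuristic and the explicit flag disagree, and the flag is the one that reflects what the model intended.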

Track conversation status explicitly. You need a way to distinguish "the agent is actively processing" from "the agent is waiting for the user to respond." A distinct PAUSED status in your task or conversation model gives you this, and enables operational features like timeout cleanup, stale conversation alerts, and accurate status indicators in the UI.
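A sketch of what that status model might look like is below. The status names other than PAUSED, the seven-day idle threshold, and the helper functions are my assumptions for the example, not something LangGraph provides.

```python
from datetime import datetime, timedelta, timezone
from enum import Enum


class ConversationStatus(str, Enum):
    RUNNING = "running"      # the graph is executing right now
    PAUSED = "paused"        # suspended at interrupt(), waiting for the user
    COMPLETED = "completed"  # the graph reached END


def status_after_run(interrupted: bool) -> ConversationStatus:
    """Map the outcome of one graph run to a stored status."""
    if interrupted:
        return ConversationStatus.PAUSED
    return ConversationStatus.COMPLETED


def is_stale(
    status: ConversationStatus,
    last_activity: datetime,
    now: datetime,
    max_idle: timedelta = timedelta(days=7),
) -> bool:
    """Only paused conversations can go stale; the others are not waiting."""
    return status is ConversationStatus.PAUSED and now - last_activity > max_idle


now = datetime(2026, 3, 15, tzinfo=timezone.utc)
ten_days_ago = now - timedelta(days=10)
stale = is_stale(ConversationStatus.PAUSED, ten_days_ago, now)
```

A scheduled job can run `is_stale` over paused threads to drive the timeout cleanup and alerts mentioned above.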

State accumulation is the killer feature. The ability to accumulate analysis results across turns means the agent's context grows richer with every interaction. By the end of a 10-turn conversation, the agent has a comprehensive picture of the analysis the user is building — the revenue data from turn 1, the competitive benchmarks from turn 4, the scenario models from turn 7. No stateless agent can achieve this.

Keep the graph topology simple. It's tempting to build elaborate conditional routing with dozens of edges. Resist this. A clean loop — agent → human gate → agent, with sub-agents branching off and returning to the orchestrator — handles the vast majority of conversational workflows. Complexity in the graph is complexity in debugging.

What This Means for the Future of Financial AI

The industry is converging on a model where AI financial assistants are not tools you query but collaborators you converse with. The technical infrastructure to support this — persistent state, interrupt/resume, multi-agent orchestration — is now mature enough for production.

I believe the next generation of CFO-facing AI products will be built on these patterns. Not single-shot Q&A systems, but stateful conversational agents that remember your context, orchestrate specialized analyses, and evolve their understanding of your business over time.

The companies that figure this out first will have a decisive advantage. Not because the underlying LLMs are better, but because the architecture around them — the state management, the orchestration, the persistence — creates an experience that feels like working with an exceptionally capable colleague rather than querying a database with natural language.

The technology is ready. The question is who builds the product.

I'm a CFO and AI Solutions Architect with 20+ years in fintech and banking. I build production agentic systems at the intersection of finance and AI. If you're working on similar problems — particularly conversational AI for enterprise finance — let's connect on LinkedIn.

Three LangGraph Agent Patterns That Replaced Hundreds of Lines of Glue Code

Ilya Rubtsov — Fri, 13 Feb 2026 15:33:25 +0000

What if your AI system's biggest problem isn't the AI?

I've watched teams spend months fine-tuning prompts, swapping models, and chasing benchmark improvements — only to realize their actual bottleneck was architecture. The model was fine. The way they wired it together was the problem.

After building production multi-agent systems with LangGraph and LangChain across financial analysis, document processing, and operational automation, I've converged on three reusable agent patterns that handle the vast majority of agentic workflows. They're not novel research. They won't trend on AI Twitter. But they quietly eliminated entire categories of bugs, cut development time on new pipelines by half, and — most importantly — made the systems predictable enough that non-AI engineers on the team could reason about them.

This article walks through each pattern with simplified code samples and practical examples. Whether you're a CTO evaluating agentic architectures or an engineer knee-deep in LangChain, you should walk away with something you can use on Monday morning.

The Problem with "Just Use an Agent"

Most LangGraph tutorials show you a single agent doing everything: reasoning, tool-calling, routing, and output formatting. That works for demos. In production, it falls apart.

Why? Because a single mega-agent conflates three fundamentally different cognitive tasks:

Analysis — understanding data using tools
Decision-making — choosing what happens next in a workflow
Structured extraction — converting unstructured reasoning into validated output

Mixing these in one prompt leads to brittle behavior: the model tries to analyze and route and format simultaneously, and gets confused. Splitting them into specialized agents with clear contracts between them made everything more reliable.

Here are the three patterns I now use as building blocks.

Pattern 1: The Analyzer Agent

What it does: Takes a prompt and a set of tools, reasons over data, and produces a free-text summary.

When to use it: Any time you need an LLM to investigate something — read financial filings, scan customer support tickets, evaluate vendor contracts — and produce a human-readable analysis.

The key insight is that the Analyzer is generic. The same class handles wildly different tasks depending on which prompt and tools you inject. Need to assess a company's quarterly earnings? Pass it SEC filing tools and a financial analysis prompt. Need to review insurance claims? Same class, different prompt and tools.

The Architecture

from langchain_core.language_models import BaseChatModel
from langchain_core.tools import BaseTool
from typing import List


class AnalyzerAgent:
    """
    A generic analysis agent. Give it a prompt and tools,
    and it will reason over data to produce a text summary.
    """

    def __init__(
        self,
        llm: BaseChatModel,
        tools: List[BaseTool],
        prompt: str,
    ):
        self._llm = llm
        self._tools = tools
        self._prompt = prompt

    async def analyze(self, context: str) -> str:
        """Run the analysis loop: LLM reasons, calls tools, summarizes."""

        # Bind tools to the LLM so it can call them during reasoning
        llm_with_tools = self._llm.bind_tools(self._tools)

        messages = [
            {"role": "system", "content": self._prompt},
            {"role": "user", "content": context},
        ]

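        # Agentic loop: let the LLM call tools until it's done.
        # The step cap (an arbitrary 10 here) guards against a model
        # that never stops requesting tools.
        tools_by_name = {t.name: t for t in self._tools}
        for _ in range(10):
            response = await llm_with_tools.ainvoke(messages)
            messages.append(response)

            if not response.tool_calls:
                # No more tool calls: the LLM is done reasoning
                return response.content

            # Execute each tool call and feed results back
            for tool_call in response.tool_calls:
                tool = tools_by_name.get(tool_call["name"])
                if tool is None:
                    # Report unknown tool names to the model instead of crashing
                    result = f"Unknown tool: {tool_call['name']}"
                else:
                    result = await tool.ainvoke(tool_call["args"])
                messages.append({
                    "role": "tool",
                    "content": str(result),
                    "tool_call_id": tool_call["id"],
                })

        raise RuntimeError("Analyzer exceeded the tool-call step limit")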

Example: Financial Earnings Analysis

EARNINGS_ANALYSIS_PROMPT = """
You are a financial analysis agent specializing in public company
earnings. Your job is to examine quarterly filings and earnings
call transcripts to produce an investment-relevant summary.

## PROCESS
1. Use the fetch_filing tool to retrieve the latest 10-Q data.
2. Use the get_transcript tool to pull the most recent earnings call.
3. Cross-reference reported figures against analyst consensus.

## OUTPUT
Return a clear summary covering:
- Revenue and EPS vs. consensus estimates
- Management guidance changes (raised, maintained, or lowered)
- Key risk factors mentioned in the filing or call
- Notable shifts in segment performance
"""

The same class, different configuration — here's a customer support use case:

TICKET_TRIAGE_PROMPT = """
You are a support ticket analysis agent. Examine the incoming
ticket and any related customer history to assess urgency and topic.

## PROCESS
1. Use the get_customer_history tool to pull past interactions.
2. Use the check_sla tool to determine the customer's service tier.
3. Analyze the ticket content for severity indicators.

## OUTPUT
Return a summary covering:
- Issue category (billing, technical, account access, feature request)
- Severity assessment (critical, high, medium, low)
- Relevant customer context (tenure, tier, recent issues)
- Recommended routing
"""

Why This Pattern Works

The beauty is in the separation of concerns. The AnalyzerAgent class knows nothing about finance, support tickets, or any specific domain. All domain knowledge lives in the prompt and tool selection. This means:

Reusability: One class, unlimited use cases
Testability: Swap the LLM for a mock, test tool interactions independently
Composability: Chain analyzers together in a LangGraph workflow, each one adding context for the next

In production, I run analyzers for financial document review, compliance checking, data quality assessment, and more — all using the same class with different configurations.
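The testability point is easiest to see with a scripted stand-in for the model. The sketch below does not import LangChain: `ScriptedLLM`, `FakeResponse`, `EchoTool`, and `run_tool_loop` are hypothetical names that mimic only the attributes the analyzer loop relies on (`ainvoke`, `tool_calls`, `content`, `name`), and the ticker "ACME" is made up.

```python
import asyncio
from dataclasses import dataclass, field


@dataclass
class FakeResponse:
    content: str
    tool_calls: list = field(default_factory=list)


class ScriptedLLM:
    """Returns pre-written responses in order and records what it was sent."""

    def __init__(self, responses):
        self._responses = list(responses)
        self.seen = []

    async def ainvoke(self, messages):
        self.seen.append(list(messages))  # snapshot of the prompt at this step
        return self._responses.pop(0)


class EchoTool:
    name = "fetch_filing"

    async def ainvoke(self, args):
        return f"filing for {args['ticker']}"


async def run_tool_loop(llm, tools, messages, max_steps=5):
    """Same shape as the analyzer loop, with a step cap."""
    by_name = {t.name: t for t in tools}
    for _ in range(max_steps):
        response = await llm.ainvoke(messages)
        messages.append(response)
        if not response.tool_calls:
            return response.content
        for call in response.tool_calls:
            tool = by_name.get(call["name"])
            if tool is None:
                result = f"Unknown tool: {call['name']}"
            else:
                result = await tool.ainvoke(call["args"])
            messages.append(
                {"role": "tool", "content": str(result), "tool_call_id": call["id"]}
            )
    raise RuntimeError(f"No final answer after {max_steps} steps")


llm = ScriptedLLM([
    FakeResponse("", [{"name": "fetch_filing", "args": {"ticker": "ACME"}, "id": "call-1"}]),
    FakeResponse("Summary based on the filing."),
])
summary = asyncio.run(
    run_tool_loop(llm, [EchoTool()], [{"role": "user", "content": "Analyze ACME"}])
)
```

Because the script is fixed, the test can assert both the final summary and that the tool result was fed back to the model on the second call.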

Pattern 2: The Router Agent

What it does: Reads upstream context and routes the workflow to the correct next step by calling a special routing tool.

When to use it: Any time your workflow needs to branch — different processing paths based on analysis results, document type classification, or risk level assessment.

Most routing in LangGraph tutorials is done with conditional edges and deterministic functions. That's fine for simple cases. But when the routing decision requires understanding unstructured text — e.g., "based on the financial analysis, does this company need a full due diligence review or a standard summary?" — you need an LLM to make the call.

The Routing Tool

The trick is a dedicated RoutingTool that stores the LLM's decision as state:

from typing import Optional


class RoutingTool:
    """
    A tool that captures the LLM's routing decision.
    The selected route is stored and can be read by the
    LangGraph workflow to determine the next node.
    """

    def __init__(self):
        self._route: Optional[str] = None

    @property
    def route(self) -> Optional[str]:
        return self._route

    async def select_route(
        self,
        route: Optional[str] = None,
        error_details: Optional[str] = None,
    ) -> str:
        """
        Call this tool to select which path the workflow should take.

        Args:
            route: The chosen route (e.g., "full_review", "standard").
            error_details: If routing fails, explain why.
        """
        if error_details:
            self._route = None
            return f"Routing failed: {error_details}"

        self._route = route
        return f"Route selected: {route}"

The Router Agent

class RouterAgent:
    """
    Reads previous context and selects a workflow route
    by calling the RoutingTool.
    """

    def __init__(
        self,
        llm: BaseChatModel,
        routing_tool: RoutingTool,
        prompt: str,
    ):
        self._llm = llm
        self._routing_tool = routing_tool
        self._prompt = prompt

    @property
    def route(self) -> Optional[str]:
        """The route selected by the LLM after execution."""
        return self._routing_tool.route

    async def decide(self, context: str) -> Optional[str]:
        """Run the router: LLM reads context and calls select_route."""

        llm_with_tools = self._llm.bind_tools(
            [self._routing_tool.select_route]
        )

        messages = [
            {"role": "system", "content": self._prompt},
            {"role": "user", "content": context},
        ]

        response = await llm_with_tools.ainvoke(messages)

        # Execute the tool call to store the route
        if response.tool_calls:
            tool_call = response.tool_calls[0]
            await self._routing_tool.select_route(**tool_call["args"])

        return self._routing_tool.route
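One caveat with keeping the decision on the tool instance: if the same RoutingTool is reused across runs and the model makes no tool call, the previous run's route is still there. The sketch below illustrates the problem and one possible remedy, a fresh tool per decision. The RoutingTool here is a condensed copy of the class above; `decide_with_fresh_tool` and its `tool_args` parameter (standing in for the LLM's tool-call arguments) are hypothetical.

```python
import asyncio
from typing import Optional


class RoutingTool:
    """Condensed copy of the RoutingTool shown earlier."""

    def __init__(self) -> None:
        self._route: Optional[str] = None

    @property
    def route(self) -> Optional[str]:
        return self._route

    async def select_route(
        self,
        route: Optional[str] = None,
        error_details: Optional[str] = None,
    ) -> str:
        if error_details:
            self._route = None
            return f"Routing failed: {error_details}"
        self._route = route
        return f"Route selected: {route}"


async def decide_with_fresh_tool(tool_args: Optional[dict]) -> Optional[str]:
    """tool_args mimics the LLM's tool-call arguments; None means no call."""
    tool = RoutingTool()  # new instance, so no state leaks between runs
    if tool_args is not None:
        await tool.select_route(**tool_args)
    return tool.route


# Reusing one instance: a run with no tool call still sees the old route.
shared = RoutingTool()
asyncio.run(shared.select_route(route="risk_alert"))
stale_route = shared.route

# Fresh instance per decision: no tool call means no route.
fresh_route = asyncio.run(decide_with_fresh_tool(None))
```

Whether you create a new tool per run or reset the shared one is a design choice; the point is that "no tool call" should surface as None so the graph can send it to the error handler.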

Example: Financial Document Routing

DOCUMENT_ROUTER_PROMPT = """
You are a routing agent for a financial document processing pipeline.
Read the previous agent's analysis and call select_route with the
appropriate processing path.

## ROUTES
- route="earnings_deep_dive" → Revenue miss >5% OR guidance lowered
- route="standard_summary" → Results in line with expectations
- route="risk_alert" → Material risk factors flagged (litigation,
  restatement, going concern, covenant breach)
- route=None → Cannot determine (provide error_details)

## INSTRUCTIONS
1. Read the upstream analysis in the conversation history
2. Evaluate against the route criteria above
3. Call select_route EXACTLY ONCE

## EXAMPLES
Analysis shows revenue missed consensus by 12%, guidance cut
→ select_route(route="earnings_deep_dive")

Analysis shows EPS beat by $0.02, guidance maintained
→ select_route(route="standard_summary")

Analysis flags ongoing SEC investigation and auditor concerns
→ select_route(route="risk_alert")
"""

Here's another example — routing in an HR automation pipeline:

CANDIDATE_ROUTER_PROMPT = """
You are a routing agent for a recruitment pipeline. Read the
candidate screening summary and route to the appropriate
next step.

## ROUTES
- route="technical_interview" → Strong technical match, meets requirements
- route="culture_screen" → Technical skills borderline, strong soft signals
- route="reject_with_feedback" → Clear mismatch on must-have criteria
- route=None → Insufficient data to decide (provide error_details)

Call select_route EXACTLY ONCE based on the screening analysis.
"""

Wiring It Into LangGraph

In your LangGraph workflow, the router agent's decision directly controls the graph's conditional edge:

from langgraph.graph import StateGraph

def route_decision(state):
    """LangGraph conditional edge function."""
    route = state["selected_route"]
    routing_map = {
        "earnings_deep_dive": "deep_analysis",
        "standard_summary": "quick_summary",
        "risk_alert": "risk_pipeline",
    }
    return routing_map.get(route, "handle_error")

# In the graph definition:
graph.add_conditional_edges(
    "router",
    route_decision,
    {
        "deep_analysis": "detailed_review_agents",
        "quick_summary": "summary_generator",
        "risk_pipeline": "risk_assessment_agents",
        "handle_error": "error_handler",
    },
)

Why Not Just Use a Classifier?

You could classify with a simple function or even keyword matching. But LLM-based routing shines when:

The decision requires interpreting nuanced, unstructured context — a 2,000-word earnings analysis isn't something you regex through
Routes aren't purely deterministic — the same data could warrant different paths depending on subtle signals like management tone on the earnings call
You want the routing logic to be expressed in natural language (the prompt), not code — product managers can read and adjust routing criteria without touching Python
The set of possible routes may change and you want to update a prompt, not refactor a decision tree

The RoutingTool pattern also gives you observability: you can log every routing decision, inspect the LLM's reasoning, and debug misroutes by looking at the conversation history.

Pattern 3: The Report Compiler

What it does: Takes all upstream conversation context and extracts structured data into a validated Pydantic schema — no tools, no reasoning loops, just extraction.

When to use it: At the end of any multi-agent pipeline where you need clean, typed, validated output. Think: generating a JSON report for a dashboard, populating a database record, or returning structured results to an API caller.

This is the pattern I'm most proud of because it's the most boring. And boring is exactly what you want at the output stage.

The Core Idea

LangChain's with_structured_output() forces the LLM to return data matching a Pydantic schema. But the prompt engineering matters enormously. Feed it a vague prompt and you'll get hallucinated field values. Feed it a dynamically generated prompt that mirrors the schema exactly, and extraction becomes remarkably reliable.

The Report Agent

from pydantic import BaseModel
from langchain_core.language_models import BaseChatModel
from langchain_core.messages import BaseMessage
from typing import List, Type


class ReportCompiler:
    """
    Extracts structured data from conversation history
    into a validated Pydantic schema.

    No tools, no reasoning loops — pure extraction.
    """

    def __init__(
        self,
        llm: BaseChatModel,
        schema: Type[BaseModel],
        prompt: str,
    ):
        self._llm = llm
        self._schema = schema
        self._prompt = prompt

    async def compile(self, messages: List[BaseMessage]) -> BaseModel:
        """
        Takes conversation history and returns a populated schema instance.
        """
        structured_llm = self._llm.with_structured_output(self._schema)

        all_messages = [
            {"role": "system", "content": self._prompt},
        ] + messages

        return await structured_llm.ainvoke(all_messages)

The Dynamic Prompt Generator

Here's where the magic lives. Instead of manually writing extraction prompts for every schema, I generate them automatically from the Pydantic model:

import json
from pydantic import BaseModel
from typing import get_origin, get_args, Literal, Type, List


def build_extraction_prompt(schema: Type[BaseModel]) -> str:
    """
    Dynamically generates an extraction prompt from a Pydantic schema.
    Handles Literal constraints, Optional fields, and lists; any other
    type (including nested models) falls back to its string representation.
    """

    field_descriptions = []
    example_output = {}

    for idx, (name, field) in enumerate(schema.model_fields.items(), 1):
        annotation = field.annotation
        origin = get_origin(annotation)

        # Resolve Optional[X] → X
        is_optional = False
        if origin is type(None) or (origin and type(None) in get_args(annotation)):
            is_optional = True
            annotation = next(
                a for a in get_args(annotation) if a is not type(None)
            )
            origin = get_origin(annotation)

        # Build type string and example value
        if origin is Literal:
            allowed = get_args(annotation)
            type_str = f"one of: {', '.join(repr(v) for v in allowed)}"
            example_output[name] = allowed[0]

        elif origin is list:
            inner = get_args(annotation)[0]
            type_str = f"List[{inner.__name__}]"
            example_output[name] = []

        elif annotation is str:
            type_str = "string"
            example_output[name] = "extracted_value"

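        elif annotation is int:
            type_str = "integer"
            example_output[name] = 0

        elif annotation is float:
            # float fields (e.g. revenue, EPS) would otherwise fall through to the generic branch
            type_str = "number"
            example_output[name] = 0.0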

        elif annotation is bool:
            type_str = "boolean"
            example_output[name] = False

        else:
            type_str = str(annotation)
            example_output[name] = None

        opt = " (optional)" if is_optional else ""
        desc = field.description or "No description"
        field_descriptions.append(f"{idx}. {name} ({type_str}{opt}) — {desc}")

    return f"""You are a report compilation agent. Extract information
from the conversation history into the structured output below.

CRITICAL: Do NOT invent values. Only extract what is explicitly
present in the conversation. If a value is not found, use the
default for its type (string→null, int→null, list→[]).

FIELDS:
{chr(10).join(field_descriptions)}

Expected structure:
{json.dumps(example_output, indent=2)}

Extract all fields now. Every field must have a value."""

Example: Financial Report Compilation

from pydantic import BaseModel, Field
from typing import Optional, Literal, List


class EarningsReport(BaseModel):
    """Structured output for quarterly earnings analysis."""

    company_name: str = Field(description="Company legal name")
    ticker: str = Field(description="Stock ticker symbol")
    quarter: str = Field(description="Fiscal quarter, e.g. Q3 2024")
    revenue_actual: float = Field(description="Reported revenue in millions USD")
    revenue_consensus: float = Field(description="Analyst consensus revenue estimate")
    eps_actual: float = Field(description="Reported earnings per share")
    eps_consensus: float = Field(description="Analyst consensus EPS estimate")
    guidance_direction: Literal["raised", "maintained", "lowered", "withdrawn"] = Field(
        description="Direction of forward guidance change"
    )
    has_material_risks: bool = Field(
        description="Whether material risk factors were identified"
    )
    key_risks: Optional[str] = Field(
        description="Summary of material risks if any were found"
    )
    sentiment: Literal["bullish", "neutral", "bearish"] = Field(
        description="Overall analyst sentiment based on the analysis"
    )


# Generate the prompt automatically — no manual prompt writing
prompt = build_extraction_prompt(EarningsReport)

# Create the compiler
compiler = ReportCompiler(
    llm=my_llm,
    schema=EarningsReport,
    prompt=prompt,
)

# At the end of the pipeline, pass all accumulated messages
report = await compiler.compile(conversation_history)

# report is a validated EarningsReport instance
print(report.company_name)         # "Acme Corp"
print(report.guidance_direction)   # "lowered"
print(report.has_material_risks)   # True

And here's a completely different domain — same compiler, different schema:

class InsuranceClaimAssessment(BaseModel):
    """Structured output for insurance claim triage."""

    claim_id: str = Field(description="Unique claim identifier")
    claimant_name: str = Field(description="Name of the person filing the claim")
    incident_type: Literal["auto", "property", "liability", "health"] = Field(
        description="Category of the insurance claim"
    )
    estimated_amount: float = Field(description="Estimated claim amount in USD")
    fraud_risk: Literal["low", "medium", "high"] = Field(
        description="Assessed fraud risk level"
    )
    requires_adjuster: bool = Field(description="Whether a field adjuster visit is needed")
    notes: Optional[str] = Field(description="Additional context from the analysis")

# Same pattern — schema drives the prompt, compiler does the rest
prompt = build_extraction_prompt(InsuranceClaimAssessment)
compiler = ReportCompiler(llm=my_llm, schema=InsuranceClaimAssessment, prompt=prompt)

Why Dynamic Prompts Matter

You might ask: why not just let with_structured_output() handle everything? It does work without a custom prompt. But in practice, I found that:

Explicit field descriptions in the prompt dramatically reduce hallucination
Example output structures help the model understand the expected format, especially for nested objects and lists
Default value instructions prevent the model from inventing data when information is genuinely missing
Constraint reminders (for Literal fields) reduce mismatches between what the model generates and what Pydantic accepts

The dynamic prompt generator means I never write extraction prompts by hand. Define a schema, call build_extraction_prompt(), and the prompt stays perfectly in sync with the data model. When a product manager asks to add a new field to the report, I add it to the Pydantic class and the prompt updates itself.

How They Compose: A Complete Pipeline

Here's how these three patterns fit together in a real LangGraph workflow — using the financial analysis example end to end:

Each component has a single, well-defined job. The conversation history flows through the graph as messages, and each agent adds its contribution. By the time context reaches the Report Compiler, all the analysis and decisions are already in the message history — the compiler just extracts and structures.

LangGraph Integration

from langgraph.graph import StateGraph, MessagesState, START, END

workflow = StateGraph(MessagesState)

# Add nodes
workflow.add_node("analyze_earnings", run_earnings_analyzer)
workflow.add_node("route_by_risk", run_risk_router)
workflow.add_node("deep_dive", run_deep_analysis)
workflow.add_node("standard_track", run_standard_summary)
workflow.add_node("compile_report", run_report_compiler)

# Wire the flow (the graph needs an entry point before it can compile)
workflow.add_edge(START, "analyze_earnings")
workflow.add_edge("analyze_earnings", "route_by_risk")
workflow.add_conditional_edges("route_by_risk", route_decision, {
    "earnings_deep_dive": "deep_dive",
    "standard_summary": "standard_track",
    "risk_alert": "deep_dive",
})
workflow.add_edge("deep_dive", "compile_report")
workflow.add_edge("standard_track", "compile_report")
workflow.add_edge("compile_report", END)

graph = workflow.compile()
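
The graph references route_decision without showing it. Here is a minimal sketch, assuming the router node appends a message whose content is exactly one of the route keys. The fallback route and the tolerance for plain-dict messages are assumptions made for illustration, not part of the original pipeline:

```python
from types import SimpleNamespace

# Route keys taken from the conditional-edge mapping in the graph.
VALID_ROUTES = {"earnings_deep_dive", "standard_summary", "risk_alert"}
# Hypothetical fallback used when the router's output is not a known key.
DEFAULT_ROUTE = "standard_summary"


def route_decision(state: dict) -> str:
    """Return the route key chosen by the router node.

    Assumes the last message in state["messages"] carries the route
    name as its text content (an assumption for this sketch).
    """
    messages = state.get("messages", [])
    if not messages:
        return DEFAULT_ROUTE
    last = messages[-1]
    # Accept both message objects (with .content) and plain dicts.
    if isinstance(last, dict):
        content = last.get("content", "")
    else:
        content = getattr(last, "content", "")
    route = str(content).strip().lower()
    return route if route in VALID_ROUTES else DEFAULT_ROUTE


# SimpleNamespace stands in for a chat message object here.
state = {"messages": [SimpleNamespace(content=" risk_alert\n")]}
print(route_decision(state))
```

Falling back to a safe default instead of raising keeps a malformed router answer from crashing the whole run; whether that is the right trade-off depends on how costly a misroute is in your workflow.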

The Deeper Lesson: Agents Are Functions, Not Personalities

If there's one idea I'd want to stick, it's this: stop thinking of agents as autonomous entities with personalities, and start thinking of them as specialized functions with natural-language interfaces.

The Analyzer is a function that takes (prompt, tools, context) and returns analysis_text. The Router is a function that takes (context, route_options) and returns selected_route. The Compiler is a function that takes (context, schema) and returns structured_data. The fact that an LLM powers each one is an implementation detail.

This mental shift has three practical consequences:

First, it makes architecture decisions obvious. When you need to branch a workflow, you don't ask "how do I make my agent smarter?" — you add a Router. When you need structured output, you don't tune the analysis prompt to also produce JSON — you add a Compiler. Each problem has a pattern, and each pattern has a single responsibility.

Second, it makes testing tractable. You can test each agent in isolation with known inputs and expected outputs. You can mock the LLM and verify that tool calls happen in the right order. You can validate that the Router's output correctly drives the conditional edge. These are normal software engineering practices, applied to AI systems.

Third, it makes the system legible to your entire team. A product manager can read a Router prompt and understand the business logic. A QA engineer can look at the Pydantic schema and know exactly what the output should contain. An ops engineer can trace a misrouted document by reading the conversation history. The system is transparent because each piece does one thing and documents it in plain language.
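
To make the testing point concrete, here is a rough sketch of checking a compiler-style agent in isolation with a fake LLM. FakeStructuredLLM, FakeReport, and MiniCompiler are hypothetical names invented for this example; MiniCompiler only mirrors the shape of the ReportCompiler shown earlier, and the fake implements just the two methods that class calls:

```python
import asyncio
from dataclasses import dataclass


@dataclass
class FakeReport:
    """Hypothetical stand-in for a Pydantic schema instance."""
    ticker: str


class FakeStructuredLLM:
    """Test double that records its input and returns a canned result."""

    def __init__(self, canned):
        self.canned = canned
        self.calls = []
        self.schema = None

    def with_structured_output(self, schema):
        # The real method returns a runnable bound to the schema;
        # the fake remembers the schema and returns itself.
        self.schema = schema
        return self

    async def ainvoke(self, messages):
        self.calls.append(messages)
        return self.canned


class MiniCompiler:
    """Trimmed-down compiler with the same shape as ReportCompiler."""

    def __init__(self, llm, schema, prompt):
        self._llm, self._schema, self._prompt = llm, schema, prompt

    async def compile(self, messages):
        structured = self._llm.with_structured_output(self._schema)
        return await structured.ainvoke(
            [{"role": "system", "content": self._prompt}] + messages
        )


fake = FakeStructuredLLM(FakeReport(ticker="ACME"))
compiler = MiniCompiler(fake, FakeReport, "Extract the ticker.")
history = [{"role": "user", "content": "Acme Corp (ACME) beat estimates."}]
report = asyncio.run(compiler.compile(history))

# The compiler must put the system prompt first and pass history through unchanged.
assert fake.calls[0][0]["role"] == "system"
assert fake.calls[0][1:] == history
assert report.ticker == "ACME"
```

No network call and no model are involved, so a test like this runs in milliseconds and fails only when the wiring around the LLM changes.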

The multi-agent systems that actually work in production aren't the ones with the cleverest prompts or the most autonomous agents. They're the ones built from simple, composable, well-tested parts — where the architecture does the heavy lifting and each agent is just good enough at its one job.

Build boring agents. Compose them well. Ship on Monday.

I'm a finance and AI professional building production agentic systems at the intersection of enterprise workflows and modern AI. If you're working on similar problems, let's connect — I'm always interested in comparing notes on what actually works.

Object tracking and video cropping with computer vision and machine learning

Ilya Rubtsov — Tue, 11 Apr 2023 20:10:21 +0000

Imagine you have video footage captured by a fixed camera, and you want a clip that follows an object on the screen. It could be a person walking left and right on a stage, a moving car, or a tennis player making sudden movements. Following an object with a camera can be difficult. It is often much easier to take a wider shot and then crop it around the object of interest. Doing that crop manually, however, takes a lot of time, and this is exactly where machine learning can help: a model can find and locate the object in each frame. We then need a second piece of software to crop the video stream.

Both parts of this solution, object tracking and video/image processing, belong to computer vision, a technology used in many applications and software systems.

I am going to present a simple implementation of these technologies in Python. I use the state-of-the-art YOLOv8 machine learning model for object detection and the OpenCV library for video processing.

A finished solution needs a fair amount of math for selecting the right object, smoothing the movement, and so on, but the basic logic is as follows:

1. Open an input video stream (I use a video file, but it could be a screen or camera capture, etc.)
2. Open an output video stream (a video file in this example)
3. Read a frame from the input video stream as an image
4. Find the position of the object in the image
5. Crop the image around the object
6. Add the cropped image to the output video stream
7. Read the next frame from the input video stream
8. If the video isn't finished, return to step 4
9. Close the input video stream
10. Close the output video stream

This covers the video processing only. If you want to keep the original audio track, the best way is to extract the audio from the original video and combine it with the processed video using FFmpeg or MoviePy.
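
As a rough sketch of the FFmpeg route, the helper below builds a command that takes the video stream from the processed file and the audio stream from the original. The function names and file names are assumptions for illustration, and re-encoding the audio to AAC is one reasonable choice for an MP4 container rather than the only one:

```python
import shutil
import subprocess


def buildMuxCommand(processedVideo, originalVideo, outputFile):
    """Build an ffmpeg command that combines the processed video
    stream with the audio stream of the original file."""
    return [
        "ffmpeg", "-y",
        "-i", processedVideo,   # input 0: cropped video (no audio)
        "-i", originalVideo,    # input 1: original video with audio
        "-map", "0:v:0",        # first video stream of input 0
        "-map", "1:a:0",        # first audio stream of input 1
        "-c:v", "copy",         # copy video without re-encoding
        "-c:a", "aac",          # encode audio as AAC for the MP4 container
        "-shortest",            # stop at the shorter of the two streams
        outputFile,
    ]


def runMux(cmd):
    """Run the command if ffmpeg is on PATH; return True on success."""
    if shutil.which("ffmpeg") is None:
        return False
    return subprocess.run(cmd).returncode == 0


cmd = buildMuxCommand("processed.mp4", "original.mp4", "final.mp4")
print(" ".join(cmd))
# Call runMux(cmd) once both input files exist.
```

This only works cleanly when the output keeps the same FPS and frame count as the source, which is the case in the approach described here.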

Next, I will show all the steps with appropriate software code in Python for each step.

Initialization

But first, we need to install some libraries and initialize our machine learning model. I will use the YOLOv8 model here since, at the time of writing, it is the latest and best pretrained model for object detection. It can also classify and segment objects, and it can process different data sources, including camera streams, video files, and images. It is pretrained on the COCO dataset, but you can improve its performance by training it on a custom dataset. It comes in several sizes, from nano to xlarge. Generally speaking, a bigger model gives more precise results but runs slower, so choose the one that best fits your needs. I use the small model here since, in my experience, it performs well enough without a significant decrease in speed.
First, we need to create a virtual environment so all our software is isolated from other projects and environments. You can do it with Anaconda or Python. Anaconda is a very useful package for data science and machine learning. With Anaconda you need to install it first (https://www.anaconda.com/products/distribution#Downloads). Then you can create a virtual environment with the following console command:
conda create --name myenv python
Then, you have to activate your virtual environment with the command:
conda activate myenv
The next command installs not only the YOLOv8 model but also all of its required dependencies:
pip install ultralytics
The next command installs the OpenCV library, which is used for video and image processing:
pip install opencv-python
Sometimes you need to compile OpenCV from source or adjust the package, but usually the base approach works.
That's all, we are ready to start writing our Python script for the computer vision task. The initialisation part of the script:

from ultralytics import YOLO # import our machine learning model
import cv2 # import OpenCV
import math # used by closestBox to measure the distance between box centers
model = YOLO("yolov8s.pt")  # load a pretrained model

fileSource = 'test_video.mp4' # this is the source file we will process
fileTarget = 'test_video_processed.mp4' # this is the file path where processed video will be saved
cropCoords = [100,100,500,500] # coordinates of the cropping box we will start with, this cropping box will follow our object

Open an input video stream.

Also I will adjust the size of the cropping box if its size is bigger than the size of the video.

vidCapture = cv2.VideoCapture(fileSource)
fps = vidCapture.get(cv2.CAP_PROP_FPS)
totalFrames = vidCapture.get(cv2.CAP_PROP_FRAME_COUNT)
width = int(vidCapture.get(cv2.CAP_PROP_FRAME_WIDTH))
height = int(vidCapture.get(cv2.CAP_PROP_FRAME_HEIGHT))
if not cropCoords:
    [box_left, box_top, box_right, box_bottom] = [0, 0, width, height]
else:
    [box_left, box_top, box_right, box_bottom] = cropCoords
    if (box_left<0):
        box_left=0
    if (box_top<0):
        box_top=0
    if (box_right)>width:
        box_right=width
    if (box_bottom>height):
        box_bottom=height
lastCoords = [box_left, box_top, box_right, box_bottom]
lastBoxCoords = lastCoords
box_width = box_right-box_left
box_height = box_bottom-box_top

Open an output video stream.

I use the MPEG codec for the video file, the same FPS as the input video stream (so the playback speed stays the same, though you can adjust that), and the size of the cropping box as the video dimensions. You can also resize the processed image, but be consistent: every frame must have exactly the size passed to the VideoWriter constructor.

outputWriter = cv2.VideoWriter(fileTarget, cv2.VideoWriter_fourcc(*'MPEG'), fps, (box_width, box_height))

Read a frame from the input video stream as an image. Read the next frame from the input video stream until the end of the video.

All the operations are performed inside a while loop that runs until the end of the input video stream.

frameCounter = 1
while True:

    r, im = vidCapture.read()

    if not r:
        print("Video Finished!")
        break

    print("Frame: "+str(frameCounter))
    frameCounter = frameCounter+1

Find a position of the object in the image.

Sometimes video can have several objects. I would say that usually a video contains many objects. And since we process video frame-by-frame, the machine learning model finds many objects in the frame. Our task is to figure out which object is the correct one. I use pretty complicated logic to find it out, but for the purpose of this tutorial the easiest way to do it is to find the closest object to the position of the selected object in the previous frame. This approach works since frames usually don't change much.

    results = model.predict(source=im, conf=0.5, iou=0.1) # ask the YOLO model to find objects; see the YOLO documentation for the parameters
    boxes = results[0].boxes # boxes are coordinates of objects YOLO has found
    if len(boxes) == 0:
        continue # nothing detected: skip this frame (simplest option; the frame is dropped from the output)
    box = closestBox(boxes, lastBoxCoords)  # returns the best box - closest to the last one
    lastBoxCoords = box.xyxy[0].cpu().numpy().astype(int) # converts the PyTorch tensor (moved to CPU if needed) into box coordinates and saves them for the next iteration

Crop the image around the object

    newCoords = adjustBoxSize(lastBoxCoords, box_width, box_height) # YOLO's box fits the object, not our cropping area, so build a box_width x box_height cropping area centered on the object
    newCoords = adjustBoundaries(newCoords,[width, height]) # don't allow to get the cropping area go out of video screen edges
    [box_left, box_top, box_right, box_bottom] = newCoords
    imCropped = im[box_top:box_bottom, box_left:box_right] # cropping the image

Add the cropped image to the output video stream

    outputWriter.write(imCropped) # writing the cropped image as the new frame into the output video stream

This is the end of the code block inside the while loop. The program then reads the next frame from the input stream and processes it.

Close input and output video streams

After all frames are processed we have to close the video streams.

vidCapture.release()
outputWriter.release()

You can set the source and result file names with fileSource and fileTarget variables or you can use environment or other ways to tell the program what files to process.
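
One possible way to do that, sketched with the standard library only. The option names --source and --target and the environment variables VIDEO_SOURCE and VIDEO_TARGET are hypothetical choices for this example:

```python
import argparse
import os


def parseFiles(argv=None, env=None):
    """Resolve source/target paths from command-line arguments first,
    then environment variables, then the defaults used in this article."""
    env = os.environ if env is None else env
    parser = argparse.ArgumentParser(description="Crop a video around a tracked object")
    parser.add_argument("--source", default=env.get("VIDEO_SOURCE", "test_video.mp4"))
    parser.add_argument("--target", default=env.get("VIDEO_TARGET", "test_video_processed.mp4"))
    args = parser.parse_args(argv)
    return args.source, args.target


# Explicit arguments are passed here for demonstration;
# in a real script call parseFiles() with no arguments to read sys.argv.
fileSource, fileTarget = parseFiles(["--source", "stage.mp4"], env={})
print(fileSource, fileTarget)
```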

Additional support functions

We also need three functions: closestBox to find the best next object, adjustBoxSize to convert the object's area into a cropping area of the right size, and adjustBoundaries to keep the cropping area inside the video boundaries. A fourth helper, boxCenter, returns the horizontal and vertical coordinates of an area's center. In the final script these definitions must appear before the while loop that calls them.

def boxCenter(coords):
    [left, top, right, bottom] = coords
    return [(left+right)/2,(top+bottom)/2]

def closestBox(boxes, coords):
    distance = []
    center = boxCenter(coords)
    for box in boxes:
        boxCent = boxCenter(box.xyxy[0].cpu().numpy().astype(int)) # .cpu() keeps this working when the model runs on a GPU
        distance.append(math.dist(boxCent,center))
    return boxes[distance.index(min(distance))]

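def adjustBoxSize(coords, box_width, box_height):
    [centerX, centerY] = boxCenter(coords)
    left = round(centerX-box_width/2)
    top = round(centerY-box_height/2)
    # derive right/bottom from left/top so the crop is always exactly box_width x box_height
    return [left, top, left+box_width, top+box_height]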

def adjustBoundaries(coords, screen):
    [left, top, right, bottom] = coords
    [width, height]=screen
    if left<0:
        right=right-left
        left=0
    if top<0:
        bottom=bottom-top
        top=0
    if right>width:
        left=left-(right-width)
        right=width
    if bottom>height:
        top=top-(bottom-height)
        bottom=height
    return [round(left), round(top), round(right), round(bottom)]

And that's all for now.
You could smooth the movement of the crop window, remove the background, adjust lighting, and perform many other operations on the video. This article only covers the basic principles of using machine learning, computer vision, and a few useful Python libraries to process any video you like.
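
For the smoothing idea, one simple option is an exponential moving average over the position of the crop window. This is a sketch rather than the logic I use in my own solution; the function name smoothCoords and the alpha value are assumptions for illustration:

```python
def smoothCoords(prevCoords, newCoords, alpha=0.2):
    """Move the previous crop box part of the way toward the new one.

    alpha is the weight of the new position: 1.0 means no smoothing,
    values near 0 make the crop window move very slowly.
    """
    prevLeft, prevTop, prevRight, prevBottom = prevCoords
    newLeft, newTop = newCoords[0], newCoords[1]
    left = round((1 - alpha) * prevLeft + alpha * newLeft)
    top = round((1 - alpha) * prevTop + alpha * newTop)
    # Keep the crop size fixed so every output frame has the same dimensions.
    return [left, top, left + (prevRight - prevLeft), top + (prevBottom - prevTop)]


# The detected box jumps 100 px to the right between two frames;
# the smoothed crop window follows it gradually.
cropBox = [100, 100, 500, 500]
target = [200, 100, 600, 500]
for _ in range(3):
    cropBox = smoothCoords(cropBox, target)
    print(cropBox)
```

In the main loop this would be applied to the cropping area after adjustBoxSize and before adjustBoundaries, so the smoothed window is still clamped to the frame.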

Example

Here is an example of what you can get with this software:
Source video:

Processed file with the computer vision:

Cover photo by cottonbro studio from Pexels: https://www.pexels.com/photo/hand-of-a-person-and-a-bionic-hand-6153343/

YOLOv8 classifier trained on a custom dataset

Ilya Rubtsov — Tue, 28 Feb 2023 01:05:56 +0000

YOLO (You Only Look Once) is an advanced deep learning model that allows ML software developers to solve computer vision problems easily and efficiently. YOLOv8 is the latest version, released in January 2023. It includes a number of pretrained models with different numbers of parameters (5 options from nano to xlarge).
YOLOv8 can solve three tasks related to computer vision: object detection, segmentation, and classification. Each task has its own scope of application, which can be visualized in the image:

Classification is needed when you want to understand what kind of object is shown in the image. It doesn't matter where in the image the object is located. Often it can be similar images, such as products on a store shelf or letters on a license plate. With a machine learning model, the object can be recognized quickly and accurately.

YOLOv8 has several model variants, which have been pretrained on known and common datasets. Detection and Segmentation models are pretrained on the COCO dataset, while Classification models are pretrained on the ImageNet dataset.
Unfortunately, these datasets and the models trained on them are not always well suited for a particular application. For example, if you need to track people in a video, the COCO dataset may not be a good fit, because in addition to people it will find chairs, cars, phones and other objects. So in many application tasks there is a need to train models on a custom dataset.

YOLOv8 allows developers to train the model on custom datasets. This can be done both from the command line and from Python code.
CLI:

yolo detect train data=coco128.yaml model=yolov8n.pt epochs=100

Python:

from ultralytics import YOLO

model = YOLO("yolov8n.pt")
model.train(data="coco128.yaml", epochs=5)

The key to a model's ability to make accurate predictions is to prepare the dataset in the format required for use in the model.

Model training consists of 5 stages:

preparing images and assigning them to classes
splitting the data into train, validation, and test sets
preparing the configuration file
selecting the structure of the model
running the model’s training

1. Preparing images and assigning them to classes

Usually a dataset prepared for training an object detection model consists of images plus an annotation file for each image that lists the objects in it together with their coordinates. These files are not needed for the classification task.
For the YOLOv8 classification model, the class of an object is determined by the folder the image is placed in: it is sufficient to put each image into a folder on disk, and the folder names define the class names.

2. Splitting data into train, validation, and test sets

Traditionally in machine learning model training, a dataset is divided into three parts: the first part is used to train the model, the second part to validate the accuracy of the model, and the third part to objectively test the model on new data that the model has not seen. Usually the dataset is divided into these three parts in the proportion of 70-20-10, but it can be any ratio.

In order to divide the data for the YOLOv8 model, you need to create special folders within the dataset's directory. The "datasets" folder should reside in the folder where your project's work files are located and where model training runs. Within this "datasets" folder, create a folder with the name of your dataset, and inside it the train, val, and test folders. Each of the train, val, and test folders should contain one folder per class name, holding that class's images.

As a result the structure of folders looks like this:
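
A minimal sketch of a script that produces this layout, assuming your raw images already sit in one folder per class. The function name splitDataset, the raw source folder, the dataset name mydataset, and the class names are all assumptions for illustration; the 70-20-10 split is the one mentioned above, expressed as integer percentages:

```python
import random
import shutil
import tempfile
from pathlib import Path


def splitDataset(sourceDir, datasetDir, percents=(70, 20, 10), seed=0):
    """Copy images from sourceDir/<class>/ into
    datasetDir/{train,val,test}/<class>/ using the given percentages."""
    rng = random.Random(seed)  # fixed seed so the split is reproducible
    counts = {"train": 0, "val": 0, "test": 0}
    for classDir in sorted(p for p in Path(sourceDir).iterdir() if p.is_dir()):
        images = sorted(f for f in classDir.iterdir() if f.is_file())
        rng.shuffle(images)
        nTrain = len(images) * percents[0] // 100
        nVal = len(images) * percents[1] // 100
        parts = {
            "train": images[:nTrain],
            "val": images[nTrain:nTrain + nVal],
            "test": images[nTrain + nVal:],  # the remainder goes to test
        }
        for split, files in parts.items():
            targetDir = Path(datasetDir) / split / classDir.name
            targetDir.mkdir(parents=True, exist_ok=True)
            for f in files:
                shutil.copy2(f, targetDir / f.name)
            counts[split] += len(files)
    return counts


# Demo on empty placeholder files in a temporary directory.
with tempfile.TemporaryDirectory() as tmp:
    raw = Path(tmp) / "raw"
    for cls in ("class1", "class2"):
        (raw / cls).mkdir(parents=True)
        for i in range(10):
            (raw / cls / f"img{i}.jpg").touch()
    counts = splitDataset(raw, Path(tmp) / "datasets" / "mydataset")
    createdDirs = sorted(
        str(p.relative_to(tmp)) for p in (Path(tmp) / "datasets").rglob("*") if p.is_dir()
    )
    print(counts)
    print("\n".join(createdDirs))
```

Shuffling per class before slicing keeps the class proportions roughly equal across the three parts.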

3. Preparing the configuration file

In the datasets directory, you need to prepare a configuration file that tells the model which classes it should recognize. Here is an example of a configuration file with three classes:

train: train/
val: val/
test: test/

# number of classes
nc: 3

# class names
names: ["class1","class2","class3"]

The filename should be the name of your dataset (the same as your dataset's folder name), with the extension of ".yaml". The structure of this configuration file is obvious, and you can adjust it to fit your project.

4. Selecting the structure of the model

A neural machine learning model consists of multiple layers with varying numbers of parameters. The structure of the model layers can be defined manually, but you can use a ready-made structure of one of the pretrained models. For the classification problem, YOLOv8 has 5 ready-made models, which differ in the number of parameters, accuracy and speed:

It is advisable to try different models and choose the one that will be optimal for your particular project with respect to speed and accuracy.

5. Running the model’s training (and waiting for a long time)

After you have placed all the images in folders and prepared the configuration file, the final step is to run the model training. The easiest way to do this is from the command line or terminal, if you are using a server.

yolo task=classify mode=train data=mydataset model=yolov8n-cls.pt epochs=50

As you can see, the data parameter sets the name of your dataset (matching its folder and configuration file), and the model parameter selects the model structure (yolov8n-cls.pt in this example). You should run at least 10 epochs; depending on the model and your dataset, 50-100 may be needed.
You will see the whole training process and the results of each epoch. For training, it is desirable to use a computer with a powerful graphics card, plenty of memory, and a PyTorch build with CUDA support.
When finished, the trained model will be saved to the path runs/classify/trainX/weights/best.pt, where X is the sequential number of the training run.

In order to use the new model in the program, you can use the code in Python:

from ultralytics import YOLO

model = YOLO("runs/classify/train1/weights/best.pt")  # load the trained weights
filePath = "img.jpg"
results = model(filePath)[0]  # prediction results for the single input image
# probs.data is the tensor of per-class probabilities; convert it to a plain list
probs = results.probs.data.tolist()
print("Maximum probability: ", max(probs))
print("Class with maximum probability: ", probs.index(max(probs)) + 1)  # 1-based class number

This code will show the class number that the neural network has detected as the best match for the selected image in the img.jpg file.
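
If you prefer the class name to its number, a small helper can map the probabilities back to names. This sketch works on plain Python values: the function name topClass is hypothetical, and the names dictionary mirrors the index-to-name mapping that Ultralytics results expose as results.names (here filled with made-up values):

```python
def topClass(probabilities, names):
    """Return (class name, probability) for the most likely class."""
    bestIndex = max(range(len(probabilities)), key=probabilities.__getitem__)
    return names[bestIndex], probabilities[bestIndex]


# Made-up values standing in for the model output and results.names.
probabilities = [0.1, 0.7, 0.2]
names = {0: "class1", 1: "class2", 2: "class3"}
label, confidence = topClass(probabilities, names)
print(label, confidence)
```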
Thanks for reading and good luck!
If you have any questions, write in the comments, I will do my best to help you.
Regards, Ilya.