DEV Community: Ajay Gupta

AG-UI + LangGraph Streaming: Technical Implementation Guide

Ajay Gupta — Thu, 11 Sep 2025 10:33:49 +0000

🎯 Purpose

This guide shows how to stream events in real time from AI workflows to a UI using the AG-UI protocol with LangGraph StateGraph execution. The approach targets sub-100ms latency for live user feedback during complex AI operations.

🚀 Demo Application

A complete working implementation of this architecture is available at:
https://github.com/cimulink/ai-workflow-engine

The repository includes:

Pure LangGraph + AG-UI server implementation
React frontend with real-time streaming
Document processing workflow example
Complete setup and deployment instructions

📋 AG-UI Protocol Context

AG-UI defines a standardized set of event types for real-time agent-user interaction. Our goal is to adapt our entire LangGraph workflow to generate these predefined events and stream them from backend to frontend for real-time user feedback.

AG-UI Message Types

AG-UI defines several event categories for different aspects of agent communication:

Lifecycle Events

RUN_STARTED, RUN_FINISHED, RUN_ERROR
STEP_STARTED, STEP_FINISHED

Text Message Events

TEXT_MESSAGE_START, TEXT_MESSAGE_CONTENT, TEXT_MESSAGE_END

Tool Call Events

TOOL_CALL_START, TOOL_CALL_ARGS, TOOL_CALL_END

State Management Events

STATE_SNAPSHOT, STATE_DELTA, MESSAGES_SNAPSHOT

Special Events

RAW, CUSTOM

🏗️ Architecture Overview

The system combines three key components:

LangGraph StateGraph - Handles workflow orchestration with nodes and conditional edges
AG-UI Protocol - Defines event types for real-time UI communication
HTTP Streaming - Uses Server-Sent Events (SSE) for browser-compatible streaming

High-Level Architecture

Data Flow Overview

🔧 Core Technical Components

1. asyncio.Queue: Event Management Hub

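The asyncio.Queue acts as a coroutine-safe buffer between LangGraph node execution and HTTP streaming. It is not thread-safe: it is designed for tasks sharing one event loop, so code running in another thread should hand events over with loop.call_soon_threadsafe. When LangGraph nodes complete, they place events in the queue. The HTTP streaming endpoint continuously reads from the queue and sends events to the frontend.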

Key Benefits:

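Concurrency Safety: Multiple tasks on the same event loop can emit events concurrently (asyncio.Queue itself is not thread-safe)
Backpressure: A bounded queue (maxsize set) makes producers wait when it is full, which prevents memory overflow during heavy processing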
Order Preservation: Events maintain chronological sequence
Non-blocking: Event production and consumption happen independently
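
To make the queue's role concrete, here is a minimal sketch using only the standard library. The step names, the step_name field, and the None end-of-stream marker are illustrative assumptions, not part of AG-UI or the demo repository.

```python
import asyncio

SENTINEL = None  # hypothetical end-of-stream marker


async def run_nodes(queue: asyncio.Queue) -> None:
    """Stand-in for workflow execution: emit one event per finished step."""
    for step in ("extract", "analyze", "summarize"):  # hypothetical step names
        await asyncio.sleep(0)  # give other tasks a turn, as real async work would
        # put() waits while the queue is full; this is where backpressure comes from
        await queue.put({"type": "STEP_FINISHED", "step_name": step})
    await queue.put(SENTINEL)


async def drain(queue: asyncio.Queue) -> list:
    """Stand-in for the streaming endpoint: read events until the marker arrives."""
    received = []
    while True:
        event = await queue.get()
        if event is SENTINEL:
            break
        received.append(event)
    return received


async def main() -> list:
    queue = asyncio.Queue(maxsize=2)  # bounded, so the producer cannot run far ahead
    producer = asyncio.create_task(run_nodes(queue))
    events = await drain(queue)
    await producer
    return events


if __name__ == "__main__":
    print(asyncio.run(main()))
```

Because the queue is first-in, first-out, the consumer sees events in the order they were produced, and the small maxsize keeps the producer at most two events ahead.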

2. yield Keyword: Python Streaming Pattern

In Python, a function whose body contains yield becomes a generator, and an async def function containing yield becomes an async generator. Instead of returning all results at once, the function yields events one by one as they're produced. This creates a memory-efficient streaming pipeline where events are processed immediately rather than buffered.

Streaming Benefits:

Memory Efficient: Events are yielded as produced, not stored
Real-time: Each event is handed to the consumer as soon as it is produced, with no batching step in between
Lazy Evaluation: Execution pauses until next event is requested
Natural Flow Control: Consumer controls processing pace
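
As a rough illustration of the pattern, the sketch below wraps a queue in an async generator. The function names and the None end-of-stream marker are assumptions made for this example.

```python
import asyncio
from typing import AsyncIterator


async def stream_events(queue: asyncio.Queue) -> AsyncIterator[dict]:
    """Async generator: yields each event as soon as it comes off the queue."""
    while True:
        event = await queue.get()
        if event is None:  # hypothetical end-of-stream marker
            return
        # Execution pauses here until the consumer asks for the next event
        yield event


async def collect() -> list:
    """Feed the generator a short, pre-filled queue and list the event types."""
    queue = asyncio.Queue()
    for item in ({"type": "RUN_STARTED"}, {"type": "RUN_FINISHED"}, None):
        queue.put_nowait(item)
    return [event["type"] async for event in stream_events(queue)]


if __name__ == "__main__":
    print(asyncio.run(collect()))
```

The consumer drives the loop: nothing is pulled from the queue until the async for asks for the next item.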

3. Custom AGUIStreamingCheckpointer: State + Events

This extends LangGraph's built-in SqliteSaver checkpointer to automatically emit AG-UI events whenever workflow state changes. Every time a node completes and LangGraph saves a checkpoint, the custom checkpointer also emits a corresponding AG-UI event.

Checkpointer Advantages:

Automatic Events: No manual event emission required in nodes
State Consistency: Events always reflect actual LangGraph state
Unified Persistence: Database and streaming work together
Recovery Support: Failed workflows can resume with complete event history
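
The override-then-emit idea can be sketched without LangGraph installed. Everything below is a hypothetical stand-in: InMemorySaver is not LangGraph's SqliteSaver, and LangGraph's real savers have a richer put signature, so treat this only as the shape of the technique.

```python
import asyncio


class InMemorySaver:
    """Hypothetical stand-in for a persistent saver; not LangGraph's class."""

    def __init__(self):
        self.saved = []

    def put(self, thread_id: str, state: dict) -> None:
        self.saved.append((thread_id, dict(state)))


class EventEmittingSaver(InMemorySaver):
    """Saves state as usual, then publishes a STATE_SNAPSHOT event for it."""

    def __init__(self, events: asyncio.Queue):
        super().__init__()
        self.events = events

    def put(self, thread_id: str, state: dict) -> None:
        # Persist first so an emitted event never describes unsaved state
        super().put(thread_id, state)
        # put_nowait() is safe here because the queue in this sketch is unbounded
        self.events.put_nowait({"type": "STATE_SNAPSHOT", "snapshot": dict(state)})


if __name__ == "__main__":
    queue = asyncio.Queue()
    saver = EventEmittingSaver(queue)
    saver.put("thread-1", {"status": "review"})
    print(queue.get_nowait())
```

Since the event is emitted inside the save call, nodes never need to know that streaming exists.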

4. HTTP Streaming: Server-Sent Events (SSE)

FastAPI serves AG-UI events using Server-Sent Events format. SSE is a web standard that allows servers to push data to browsers over a single HTTP connection. Each event is formatted as data: {json}\n\n and sent immediately to the client.

SSE Benefits Over WebSocket:

Simpler Protocol: Standard HTTP, no connection upgrades needed
Auto-Reconnection: The browser's EventSource API reconnects automatically after a dropped connection
Firewall Friendly: Uses standard HTTP ports, works through proxies
One-Way Optimal: Perfect for event streaming (no bidirectional needed)
Better Debugging: Standard HTTP tools work (curl, Postman)
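
The wire format is simple enough to show directly. This is a small sketch of the data: {json}\n\n framing described above; the helper names format_sse and parse_sse are made up for the example.

```python
import json


def format_sse(event: dict) -> str:
    """Serialize one event as an SSE frame: a data: line followed by a blank line."""
    # json.dumps emits no raw newlines by default, so the payload fits on one data: line
    return f"data: {json.dumps(event)}\n\n"


def parse_sse(frame: str) -> dict:
    """Inverse of format_sse for single-line frames, handy in tests."""
    prefix = "data: "
    line = frame.strip()
    if not line.startswith(prefix):
        raise ValueError("not a data-only SSE frame")
    return json.loads(line[len(prefix):])


if __name__ == "__main__":
    frame = format_sse({"type": "RUN_STARTED"})
    print(repr(frame))
    print(parse_sse(frame))
```

In a FastAPI app, frames like these would typically be yielded from an async generator passed to StreamingResponse with media_type="text/event-stream".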

🔄 End-to-End Flow

Complete Sequence Diagram

Step-by-Step Flow

Frontend Request: React component sends document content via useAgent hook
HTTP Streaming Setup: FastAPI creates async generator for event streaming
LangGraph Execution: Workflow nodes execute with conditional routing
Event Generation: Custom checkpointer emits AG-UI events on state changes
Queue Management: Events flow through asyncio.Queue to HTTP response
Frontend Processing: Browser receives SSE events and updates UI in real-time

Event Flow Visualization

✅ Pros and Cons

Advantages

Real-Time UX: Immediate progress feedback, responsive human-in-the-loop
True LangGraph: Preserves StateGraph orchestration, conditional edges, checkpointing
Scalable: asyncio handles many concurrent streams in one process, and the event log doubles as an audit trail
Developer Friendly: Standard HTTP debugging, type-safe events, familiar React patterns

Disadvantages

Complexity: Requires understanding async generators, custom checkpointer maintenance
Resource Usage: Persistent connections, database growth
Error Handling: Stream interruptions, partial failures, connection recovery logic
Testing: Async streaming harder to unit test, timing considerations, race conditions

🎯 Implementation Strategy

Progressive Implementation Roadmap

Event Design Patterns

Error Handling Strategy

Graceful Degradation: Emit error events, return partial results
Connection Recovery: Frontend auto-reconnection with exponential backoff
Event Batching: Reduce network overhead during high-volume periods
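
Graceful degradation can be sketched as a wrapper around the event stream. The wrapper below is one possible approach, not the repository's implementation; flaky_workflow and the message field are illustrative assumptions.

```python
import asyncio
from typing import AsyncIterator


async def with_error_events(source: AsyncIterator[dict]) -> AsyncIterator[dict]:
    """Forward events; on failure, end the stream with RUN_ERROR instead of dropping it."""
    try:
        async for event in source:
            yield event
    except Exception as exc:
        # Events already forwarded remain valid partial results for the UI
        yield {"type": "RUN_ERROR", "message": str(exc)}
        return
    yield {"type": "RUN_FINISHED"}


async def flaky_workflow() -> AsyncIterator[dict]:
    """Hypothetical workflow that fails after its first step."""
    yield {"type": "RUN_STARTED"}
    yield {"type": "STEP_FINISHED", "step_name": "extract"}
    raise RuntimeError("analysis service unavailable")


async def demo() -> list:
    return [event async for event in with_error_events(flaky_workflow())]


if __name__ == "__main__":
    for event in asyncio.run(demo()):
        print(event)
```

The frontend always receives a terminal event, either RUN_FINISHED or RUN_ERROR, so it can stop its progress indicator in both cases.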

🚀 When to Use This Architecture

Perfect For:

Document Processing: Multi-step analysis with human review
Data Pipelines: Real-time ETL progress tracking
AI Agents: Conversational workflows with tool usage
Long Tasks: Processes >30 seconds needing progress updates

Consider Alternatives:

Simple Request/Response: Single API calls
Batch Processing: No real-time requirements
Resource Constrained: Limited memory/bandwidth

💡 Key Takeaways

This architecture delivers production-ready real-time AI workflow interfaces by combining:

LangGraph's orchestration (StateGraph, nodes, edges)
AG-UI's streaming protocol (real-time events)
HTTP SSE streaming (browser-compatible, simple debugging)
asyncio.Queue + yield (memory-efficient event pipeline)

Success Factors: Design events for user value, implement robust error handling, monitor performance, test streaming behavior, plan for horizontal scaling.

The complexity is justified when user experience and workflow transparency are critical. Start simple and add streaming capabilities as real-time interaction becomes essential.

Don't Run it Twice: Mastering Idempotency in Production LangGraph Agents

Ajay Gupta — Wed, 10 Sep 2025 07:15:02 +0000

You've built an amazing AI agent with LangGraph, but what happens when things fail? What if an API times out, or a process restarts? Will you charge a customer twice or create duplicate users? If these questions make you nervous, you need to think about idempotency. It's the unsung hero of reliable systems and a must for production-grade AI agents.

This post covers what idempotency is, why it's critical for LangGraph, and how to implement it with practical code for both simple and concurrent scenarios.

What is Idempotency, and Why Should I Care?

An operation is idempotent if calling it multiple times has the same effect as calling it once. Think of setting a light to ON. Whether you send the command once or ten times, the result is the same: the light is on.

Many actions in agentic workflows are not naturally idempotent, like creating a booking (POST /api/bookings), charging a customer, or sending a notification. When your multi-step graph executes, any step can fail. Naively retrying a non-idempotent operation leads to duplicate data and unhappy users.

The Core Pattern: The Idempotency Key

The standard way to enforce idempotency is through a contract between your LangGraph node (the client) and the API you're calling (the server).

Client Generates Key: Before the first attempt, the client generates a unique idempotency key for that specific operation.
Client Sends Key: The client sends this key with every request, usually in an HTTP header such as Idempotency-Key.
Server Checks Key: The server tracks processed keys. If a request has a new key, it processes it and stores the result. If the key has been seen before, it skips processing and returns the stored result.

This guarantees that even with multiple retries, the server-side logic runs only once.
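
The server side of that contract can be sketched in a few lines. BookingServer is a hypothetical in-memory example; a real service would keep the processed keys in durable storage.

```python
import uuid


class BookingServer:
    """Hypothetical in-memory server that remembers results by idempotency key."""

    def __init__(self):
        self._results = {}
        self.bookings_created = 0

    def create_booking(self, idempotency_key: str, flight: str) -> dict:
        # Seen this key before: skip processing and return the stored result
        if idempotency_key in self._results:
            return self._results[idempotency_key]
        # New key: do the work once and remember the outcome
        self.bookings_created += 1
        result = {"booking_id": self.bookings_created, "flight": flight}
        self._results[idempotency_key] = result
        return result


if __name__ == "__main__":
    server = BookingServer()
    key = str(uuid.uuid4())  # the client generates the key before the first attempt
    print(server.create_booking(key, "LH123"))
    print(server.create_booking(key, "LH123"))  # retry returns the same booking
    print(server.bookings_created)
```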

Example: The Idempotent Flight Booker ✈️

Let's implement this in LangGraph for a flaky flight booking agent using the tenacity library (pip install tenacity).

Step 1: The Graph State

Our State needs a field to hold the idempotency key, keeping it stable across retries of a node.
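
A minimal sketch of such a state, assuming a TypedDict and field names chosen for this example (the original listing is not reproduced here):

```python
from typing import Optional, TypedDict


class BookingState(TypedDict):
    """Hypothetical graph state for the flight booker."""

    flight_details: dict             # what to book
    idempotency_key: Optional[str]   # set once, reused on every retry
    booking_result: Optional[dict]   # filled in when the booking succeeds


initial_state: BookingState = {
    "flight_details": {"flight": "LH123"},
    "idempotency_key": None,
    "booking_result": None,
}
```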

Step 2: The Graph Nodes

We'll use one node to generate the key and another to perform the retriable action.
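
Here is one way the two nodes might look. To keep the sketch runnable with the standard library alone, a small hand-rolled retry loop stands in for tenacity's @retry decorator, and FlakyBookingAPI is a hypothetical service that loses its first two responses.

```python
import uuid


class FlakyBookingAPI:
    """Hypothetical API: books on the first call but loses the first responses."""

    def __init__(self, failures: int = 2):
        self.failures_left = failures
        self.bookings = {}

    def book(self, key: str, details: dict) -> dict:
        # The server dedupes on the key, so repeated calls create one booking
        if key not in self.bookings:
            self.bookings[key] = {"confirmation": f"CONF-{len(self.bookings) + 1}", **details}
        if self.failures_left > 0:
            self.failures_left -= 1
            raise TimeoutError("response lost")
        return self.bookings[key]


def generate_key(state: dict) -> dict:
    """Node 1: create the key once; keep an existing key if the node re-runs."""
    return {"idempotency_key": state.get("idempotency_key") or str(uuid.uuid4())}


def book_flight(state: dict, api: FlakyBookingAPI, max_attempts: int = 3) -> dict:
    """Node 2: retry the booking, sending the same key on every attempt."""
    last_error = None
    for _ in range(max_attempts):
        try:
            result = api.book(state["idempotency_key"], state["flight_details"])
            return {"booking_result": result}
        except TimeoutError as exc:
            last_error = exc
    raise RuntimeError("booking failed after retries") from last_error


if __name__ == "__main__":
    api = FlakyBookingAPI()
    state = {"flight_details": {"flight": "LH123"}}
    state.update(generate_key(state))
    state.update(book_flight(state, api))
    print(state["booking_result"], len(api.bookings))
```

A LangGraph node receives only the state, so in a real graph the api argument would be bound beforehand, for example with functools.partial.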

Step 3: Assemble and Run

The flow is simple: generate the key, then attempt the booking.

This pattern is perfect for a single process. But how do you handle concurrency?

The Hard Part: Idempotency with Concurrent Workers

When your application is deployed with multiple replicas (e.g., on Kubernetes), two workers could retry the exact same task at the exact same time. This creates a race condition, undermining our idempotency guarantee.

The solution is to use a shared, persistent state manager that supports atomic operations, like Redis.

The "Claim Check" Pattern with Redis 🎟️

This pattern ensures only one worker can "claim" the right to execute an operation for a given key.

Stable Key: The idempotency key must be deterministic (e.g., a hash of the flight details) so any worker can regenerate it.
Atomic SET: Before acting, a worker tries to claim the key in Redis using the atomic SET ... NX command. NX means "only set this key if it does not already exist."
Race Solved: The first worker's SET NX command succeeds, granting it a "lock" to proceed. Any other worker's attempt will fail, telling it to back off.

LangGraph's persistent Checkpointers (like RedisSaver) are perfect for this, as your graph's state already lives in the shared store you can use for locking.

Here's a conceptual snippet for a concurrent book_flight node:
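
What follows is a sketch under stated assumptions rather than the original listing. FakeRedis is a hypothetical in-memory stand-in that mirrors only the set(name, value, nx=True, ex=...) call from redis-py, so the example runs without a Redis server; the key prefix, worker IDs, and 300-second expiry are also illustrative.

```python
import hashlib
import json


class FakeRedis:
    """Hypothetical stand-in exposing only set(); the ex expiry is accepted but ignored."""

    def __init__(self):
        self._data = {}

    def set(self, name, value, nx=False, ex=None):
        if nx and name in self._data:
            return None  # redis-py also returns None when NX blocks the write
        self._data[name] = value
        return True


def stable_key(flight_details: dict) -> str:
    """Deterministic key: any worker hashing the same details gets the same value."""
    canonical = json.dumps(flight_details, sort_keys=True)
    return hashlib.sha256(canonical.encode("utf-8")).hexdigest()


def book_flight(state: dict, client, worker_id: str) -> dict:
    key = stable_key(state["flight_details"])
    # Atomic claim: only the first worker to set the key gets a truthy reply
    claimed = client.set(f"booking-claim:{key}", worker_id, nx=True, ex=300)
    if not claimed:
        return {"status": "skipped", "idempotency_key": key}
    # The call to the booking API would go here, sending key as the idempotency key
    return {"status": "booked", "idempotency_key": key}


if __name__ == "__main__":
    client = FakeRedis()
    state = {"flight_details": {"from": "BLR", "to": "SFO", "date": "2025-10-01"}}
    print(book_flight(state, client, "worker-a")["status"])
    print(book_flight(state, client, "worker-b")["status"])
```

With a real server, client would be a redis.Redis instance, and the expiry keeps a crashed worker from holding the claim forever.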

Key Takeaways & Best Practices

Identify Critical Actions: Focus on nodes with external side effects (database writes, payments, etc.).
Generate Keys Before the Action: The key must be created and saved to the state before the fallible operation begins.
Use Persistent Checkpointers: For any serious workload, use a persistent checkpointer (RedisSaver, SqliteSaver). This is the foundation for resilience.
Embrace the Claim Check: For concurrent workers, use a distributed locking mechanism like Redis SET NX to prevent race conditions.
Log Everything: Log key generation, retries, and lock statuses. These logs will be invaluable for debugging.

By mastering idempotency, you can turn a cool LangGraph prototype into a robust, reliable, and production-ready application.

Happy building!