DEV Community

Nebula

How to Stream AI Agent Responses in 5 Min

Your agent calls three tools, thinks for 20 seconds, then dumps a wall of text. The user sees nothing until it's done. That's a broken experience.

Here's how to stream agent responses -- both text tokens and tool call events -- so users see progress in real time.

The Code

import asyncio
from agents import Agent, Runner, function_tool
from openai.types.responses import ResponseTextDeltaEvent


@function_tool
def lookup_price(ticker: str) -> str:
    """Look up the current price of a stock."""
    prices = {"AAPL": "$198.50", "GOOG": "$176.30", "TSLA": "$245.10"}
    return prices.get(ticker.upper(), f"No data for {ticker}")


agent = Agent(
    name="StockAssistant",
    instructions="You help users check stock prices. Use the lookup_price tool.",
    tools=[lookup_price],
)


async def main():
    result = Runner.run_streamed(agent, input="What's the price of AAPL and GOOG?")

    async for event in result.stream_events():
        if event.type == "raw_response_event" and isinstance(event.data, ResponseTextDeltaEvent):
            print(event.data.delta, end="", flush=True)
        elif event.type == "run_item_stream_event":
            if event.item.type == "tool_call_item":
                print("\n>> Calling tool...")
            elif event.item.type == "tool_call_output_item":
                print(f">> Tool returned: {event.item.output}")
            elif event.item.type == "message_output_item":
                pass  # Already streaming via raw events above

    print()  # Final newline


if __name__ == "__main__":
    asyncio.run(main())

Install and run:

pip install openai-agents
export OPENAI_API_KEY="sk-..."
python stream_agent.py

What You'll See

>> Calling tool...
>> Tool returned: $198.50
>> Calling tool...
>> Tool returned: $176.30
Apple (AAPL) is currently at $198.50 and Alphabet (GOOG)
is at $176.30. Both are showing...

Tokens appear one by one as the agent generates them. Tool calls show up the moment they happen -- not after the entire run completes.

How It Works

Runner.run_streamed() replaces Runner.run(). Instead of blocking until the agent finishes, it returns a result object immediately. You consume events from result.stream_events() as an async iterator.

Three event types matter:

raw_response_event fires for every token the LLM generates. Filter for ResponseTextDeltaEvent to grab the actual text deltas. Print them with end="" and flush=True to get the token-by-token effect.

run_item_stream_event fires when a complete item is generated -- a tool call, a tool output, or a finished message. This is where you show "Calling tool..." progress indicators.

agent_updated_stream_event fires when the current agent changes (during handoffs). You can skip this for single-agent setups.

The key insight: raw events give you real-time text, while item events give you structured milestones. Use both together for the best UX.
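To make the two-layer idea concrete, here's a minimal sketch of the dispatch logic with the SDK objects stubbed out as plain tuples. The event names match the SDK's; the `render` helper and the simulated event list are mine, for illustration only:

```python
def render(event_type: str, payload: str) -> str:
    """Map a stream event to a display string (hypothetical helper)."""
    if event_type == "raw_response_event":
        return payload                       # raw token delta: print inline
    if event_type == "tool_call_item":
        return "\n>> Calling tool..."        # structured milestone
    if event_type == "tool_call_output_item":
        return f">> Tool returned: {payload}"
    return ""                                # ignore everything else

# Simulated event stream (stands in for result.stream_events())
events = [
    ("tool_call_item", ""),
    ("tool_call_output_item", "$198.50"),
    ("raw_response_event", "AAPL is at "),
    ("raw_response_event", "$198.50."),
]
print("".join(render(t, p) for t, p in events))
```

Raw deltas flow straight through, while item events become progress markers -- the same split the full example above uses.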

Streaming vs Blocking: The Difference

With Runner.run(), a two-tool-call agent means 10-15 seconds of silence followed by the complete response. With Runner.run_streamed(), users see the first tool call within 1-2 seconds and text appearing token by token immediately after.
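You can see the effect with a simulated stream -- artificial delays stand in for model latency, and nothing here touches the API:

```python
import asyncio
import time


async def fake_stream(tokens, delay=0.05):
    """Yield tokens with an artificial delay standing in for model latency."""
    for tok in tokens:
        await asyncio.sleep(delay)
        yield tok


async def main():
    start = time.monotonic()
    first_token_at = None
    async for tok in fake_stream(["AAPL ", "is ", "at ", "$198.50"]):
        if first_token_at is None:
            first_token_at = time.monotonic() - start
        print(tok, end="", flush=True)
    total = time.monotonic() - start
    print(f"\nfirst token: {first_token_at:.2f}s, total: {total:.2f}s")
    return first_token_at, total


if __name__ == "__main__":
    asyncio.run(main())
```

The first token arrives after one delay; a blocking call would make the user wait the full total before seeing anything.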

For CLI apps, the code above works as-is. For web apps, replace print() with writes to a Server-Sent Events stream or WebSocket.
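For the SSE case, each delta has to be wrapped in a `data:` frame before it goes over the wire. A stdlib-only sketch (the `sse_frame` helper is mine, not part of the SDK):

```python
def sse_frame(delta: str) -> str:
    """Wrap a text delta as a Server-Sent Events frame.

    SSE frames are `data: <payload>` lines terminated by a blank line;
    a multi-line payload needs one `data:` line per line of text.
    """
    lines = delta.split("\n")
    return "".join(f"data: {line}\n" for line in lines) + "\n"


print(repr(sse_frame("Hello")))  # prints "'data: Hello\\n\\n'"
```

In a web handler you'd write `sse_frame(event.data.delta)` to the response instead of calling print().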

Quick Tips

  • Don't touch .final_output mid-stream. result.final_output is only populated once the stream is fully consumed. Read all the events first, then access it.
  • Handle empty deltas. Not every raw event carries text -- the isinstance check on ResponseTextDeltaEvent filters out the chunks that don't.
  • Tool approval works too. If a tool requires human approval, the stream pauses at that point. Check result.interruptions after the stream ends.
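If you're also accumulating the streamed text yourself (for logging, say), concatenate defensively so empty or missing deltas can't break the result. A tiny sketch; the helper name is mine:

```python
def join_deltas(deltas) -> str:
    """Concatenate text deltas, skipping None and empty chunks."""
    return "".join(d for d in deltas if d)


# None and "" chunks are dropped; the real text survives
print(join_deltas(["Hel", None, "", "lo"]))  # prints "Hello"
```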

Next Steps

Combine streaming with the patterns from earlier articles in this series -- retry logic, structured output, and human approval gates -- to build agents that are responsive, reliable, and safe.

Building agents that need streaming, tools, and orchestration out of the box? Nebula handles the infrastructure so you can focus on the logic.
