Your agent calls three tools, thinks for 20 seconds, then dumps a wall of text. The user sees nothing until it's done. That's a broken experience.
Here's how to stream agent responses -- both text tokens and tool call events -- so users see progress in real time.
## The Code

```python
import asyncio

from agents import Agent, Runner, function_tool
from openai.types.responses import ResponseTextDeltaEvent


@function_tool
def lookup_price(ticker: str) -> str:
    """Look up the current price of a stock."""
    prices = {"AAPL": "$198.50", "GOOG": "$176.30", "TSLA": "$245.10"}
    return prices.get(ticker.upper(), f"No data for {ticker}")


agent = Agent(
    name="StockAssistant",
    instructions="You help users check stock prices. Use the lookup_price tool.",
    tools=[lookup_price],
)


async def main():
    result = Runner.run_streamed(agent, input="What's the price of AAPL and GOOG?")

    async for event in result.stream_events():
        if event.type == "raw_response_event" and isinstance(event.data, ResponseTextDeltaEvent):
            print(event.data.delta, end="", flush=True)
        elif event.type == "run_item_stream_event":
            if event.item.type == "tool_call_item":
                print("\n>> Calling tool...")
            elif event.item.type == "tool_call_output_item":
                print(f">> Tool returned: {event.item.output}")
            elif event.item.type == "message_output_item":
                pass  # Already streaming via raw events above

    print()  # Final newline


if __name__ == "__main__":
    asyncio.run(main())
```
Install and run:

```shell
pip install openai-agents
export OPENAI_API_KEY="sk-..."
python stream_agent.py
```
## What You'll See

```
>> Calling tool...
>> Tool returned: $198.50
>> Calling tool...
>> Tool returned: $176.30
Apple (AAPL) is currently at $198.50 and Alphabet (GOOG)
is at $176.30. Both are showing...
```
Tokens appear one by one as the agent generates them. Tool calls show up the moment they happen -- not after the entire run completes.
## How It Works

`Runner.run_streamed()` replaces `Runner.run()`. Instead of blocking until the agent finishes, it returns a result object immediately. You then consume events from `result.stream_events()` as an async iterator.
Three event types matter:
`raw_response_event` fires for every token the LLM generates. Filter for `ResponseTextDeltaEvent` to grab the actual text deltas. Print them with `end=""` and `flush=True` to get the token-by-token effect.

`run_item_stream_event` fires when a complete item is generated -- a tool call, a tool output, or a finished message. This is where you show "Calling tool..." progress indicators.

`agent_updated_stream_event` fires when the current agent changes (during handoffs). You can skip this for single-agent setups.
The key insight: raw events give you real-time text, while item events give you structured milestones. Use both together for the best UX.
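To see the two channels working together without hitting the API, here's a minimal sketch that replays mocked events through the same dispatch logic. The `RawEvent` and `ItemEvent` classes are simplified stand-ins for illustration, not the SDK's real event types:

```python
from dataclasses import dataclass


# Simplified stand-ins for the SDK's event objects (assumption: the real
# events carry more fields; only type/delta/item info is mimicked here).
@dataclass
class RawEvent:
    type: str
    delta: str


@dataclass
class ItemEvent:
    type: str
    item_type: str
    output: str = ""


def render(events):
    """Dispatch mocked events the same way the streaming loop does."""
    out = []
    for ev in events:
        if ev.type == "raw_response_event":
            out.append(ev.delta)  # real-time text channel
        elif ev.type == "run_item_stream_event":
            if ev.item_type == "tool_call_item":
                out.append("\n>> Calling tool...\n")  # structured milestone
            elif ev.item_type == "tool_call_output_item":
                out.append(f">> Tool returned: {ev.output}\n")
    return "".join(out)


events = [
    ItemEvent("run_item_stream_event", "tool_call_item"),
    ItemEvent("run_item_stream_event", "tool_call_output_item", "$198.50"),
    RawEvent("raw_response_event", "AAPL is at "),
    RawEvent("raw_response_event", "$198.50."),
]
print(render(events))
```

The milestone lines interleave with the token deltas exactly as they do in the real run, which is the whole point: one loop, two kinds of feedback.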
## Streaming vs Blocking: The Difference

With `Runner.run()`, a two-tool-call agent takes 10-15 seconds of silence followed by the complete response. With `Runner.run_streamed()`, users see tool calls happening within 1-2 seconds and text appearing token-by-token immediately after.
For CLI apps, the code above works as-is. For web apps, replace print() with writes to a Server-Sent Events stream or WebSocket.
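For the web case, each event needs to be serialized into the SSE wire format before it goes out. A minimal sketch of that formatting step -- the `to_sse` helper and its event/payload shapes are illustrative assumptions, not part of the openai-agents SDK:

```python
import json


def to_sse(event_type: str, payload: dict) -> str:
    """Format one agent event as a Server-Sent Events message.

    SSE messages are plain text: an optional `event:` line, a `data:`
    line, and a blank line terminating the message.
    """
    return f"event: {event_type}\ndata: {json.dumps(payload)}\n\n"


# In a web handler you'd yield these strings into the response body
# instead of calling print():
msg = to_sse("token", {"delta": "AAPL is at "})
print(msg)
```

On the browser side, an `EventSource` listener for the `token` event type would append each delta to the page as it arrives.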
## Quick Tips

- **Don't mix streaming with `.final_output`.** The `result.final_output` is only available after the stream is fully consumed. Read events first, then access it.
- **Handle `None` deltas.** Some chunks have empty deltas -- the `isinstance` check filters those out.
- **Tool approval works too.** If a tool requires human approval, the stream pauses at that point. Check `result.interruptions` after the stream ends.
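The first tip above boils down to an ordering rule: drain the async iterator before touching the final result. Here's a generic sketch of that pattern -- `StreamResult` is a toy stand-in for illustration, not the SDK's actual result object:

```python
import asyncio


class StreamResult:
    """Toy stand-in for a streamed result: the final output only
    exists once every event has been consumed."""

    def __init__(self, chunks):
        self._chunks = chunks
        self._buffer = []
        self._done = False

    async def stream_events(self):
        for chunk in self._chunks:
            self._buffer.append(chunk)
            yield chunk
        self._done = True

    @property
    def final_output(self):
        if not self._done:
            raise RuntimeError("Stream not fully consumed yet")
        return "".join(self._buffer)


async def main():
    result = StreamResult(["AAPL ", "is at ", "$198.50"])
    async for _ in result.stream_events():  # drain the stream first...
        pass
    return result.final_output  # ...then read the final output


print(asyncio.run(main()))
```

Accessing `final_output` before the loop finishes raises, which is the same failure mode you'd hit reading the real result object too early.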
## Next Steps
Combine streaming with the patterns from earlier articles in this series -- retry logic, structured output, and human approval gates -- to build agents that are responsive, reliable, and safe.
Building agents that need streaming, tools, and orchestration out of the box? Nebula handles the infrastructure so you can focus on the logic.