Build Your First Multi-Agent System with OpenAI Agents SDK — Step-by-Step Python Tutorial (2026)
You have heard the buzz around AI agents. You have probably built a single-agent chatbot. But real-world automation needs multiple agents working together — one to research, one to write, one to review and catch mistakes.
The OpenAI Agents SDK makes this surprisingly straightforward. In this tutorial, you will build a complete multi-agent content pipeline: a Research Agent gathers information, a Writer Agent drafts content, and a Reviewer Agent validates quality with guardrails. All orchestrated through handoffs, running in under 100 lines of core logic.
By the end, you will understand every building block — Agent, Runner, Handoff, and Guardrails — and have a working system you can adapt to your own projects.
What you will build: A three-agent content pipeline where agents hand off work to each other automatically. The Research Agent finds information, the Writer Agent creates a draft, and the Reviewer Agent enforces quality standards using guardrails.
What Is the OpenAI Agents SDK?
The OpenAI Agents SDK is an open-source Python framework for building multi-agent AI systems. Originally developed as a successor to the experimental Swarm library, it provides production-ready primitives for creating agents that can use tools, delegate work to each other, and enforce safety checks — all with minimal boilerplate.
Key characteristics:
| Feature | Detail |
|---|---|
| Language | Python (3.10+) |
| Current version | 0.13.4 (April 2026) |
| License | MIT |
| LLM support | OpenAI models natively, 100+ models via LiteLLM |
| Core primitives | Agent, Runner, Handoff, Guardrails, Tools |
| Install size | Lightweight — Pydantic and Requests as main dependencies |
Unlike heavier frameworks that require you to learn complex graph abstractions, the OpenAI Agents SDK keeps things Pythonic. You define agents as objects, wire them together with handoffs, and run them with a single Runner.run() call.
If you have worked with the LangGraph framework (covered in our previous tutorial), the Agents SDK takes a different philosophy: less explicit graph construction, more implicit orchestration through handoffs and tool calls.
Prerequisites
Before we start building, make sure you have:
- Python 3.10 or higher (the SDK requires 3.10+, supports up to 3.14)
- An OpenAI API key with access to GPT-4o or later models
- Basic Python knowledge — functions, classes, async/await
- A terminal and a code editor
Installation
Create a project directory and set up a virtual environment:
mkdir multi-agent-pipeline && cd multi-agent-pipeline
python -m venv venv
source venv/bin/activate # On Windows: venv\Scripts\activate
Install the SDK with a pinned version:
pip install openai-agents==0.13.4
Create a requirements.txt for reproducibility:
openai-agents==0.13.4
pydantic>=2.0
API Key Setup
Set your OpenAI API key as an environment variable:
export OPENAI_API_KEY="sk-your-key-here"
Or create a .env file (never commit this to version control):
OPENAI_API_KEY=sk-your-key-here
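If you go the .env route, something has to load it at startup. The usual choice is the `python-dotenv` package (`pip install python-dotenv`, then call `load_dotenv()` before creating any agents), but a minimal hand-rolled loader is only a few lines. A sketch that handles simple `KEY=VALUE` lines only — no quoting or multiline values:

```python
import os

def load_env_file(path: str = ".env") -> None:
    """Set simple KEY=VALUE lines from a file as environment variables.

    Skips blank lines and comments; no quoting or multiline support.
    """
    with open(path) as f:
        for raw in f:
            line = raw.strip()
            if not line or line.startswith("#") or "=" not in line:
                continue
            key, _, value = line.partition("=")
            # setdefault: a variable already exported in the shell wins
            os.environ.setdefault(key.strip(), value.strip())
```

Call `load_env_file()` at the top of your entry script, before the SDK reads `OPENAI_API_KEY`.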
Cost note: This tutorial uses `gpt-4o-mini` for most agents to keep costs low. A full pipeline run typically consumes 3,000–8,000 tokens. At current pricing (approximately $0.15 per 1M input tokens and $0.60 per 1M output tokens for gpt-4o-mini), each run costs well under $0.01. We break down real costs in the cost analysis section below.
Core Concepts: Agent, Runner, Handoff, Guardrails
Before writing code, let us understand the four building blocks. These are the only concepts you need to build sophisticated multi-agent systems.
Agent
An Agent is an LLM equipped with instructions and tools. Think of it as a specialized worker with a clear job description.
from agents import Agent
research_agent = Agent(
name="Research Agent",
instructions="You are a research specialist. Find accurate, up-to-date information on any topic.",
model="gpt-4o-mini",
)
Key parameters:
- `name` — Human-readable identifier
- `instructions` — The system prompt that defines the agent's behavior
- `model` — Which LLM to use
- `tools` — Functions the agent can call
- `handoffs` — Other agents it can delegate to
- `output_type` — Pydantic model for structured output
Runner
The Runner executes agents. It manages the agent loop: call the LLM, process tool calls, handle handoffs, and repeat until the agent produces a final output.
from agents import Runner
result = Runner.run_sync(research_agent, "What is retrieval-augmented generation?")
print(result.final_output)
Three execution modes:
| Method | Use case |
|---|---|
| `Runner.run()` | Async execution (recommended for production) |
| `Runner.run_sync()` | Synchronous wrapper (simpler for scripts) |
| `Runner.run_streamed()` | Async with streaming events |
Handoff
A Handoff lets one agent delegate work to another. Under the hood, handoffs appear as tools to the LLM — when the triage agent decides to hand off to the writer, it calls a transfer_to_writer_agent tool.
triage_agent = Agent(
name="Triage Agent",
instructions="Route research requests to the Research Agent, writing tasks to the Writer Agent.",
handoffs=[research_agent, writer_agent],
)
Guardrails
Guardrails validate inputs and outputs. They can run in parallel with agent execution (for speed) or block execution until validation passes (for safety).
from agents import input_guardrail, GuardrailFunctionOutput
@input_guardrail
async def check_topic_safety(ctx, agent, input):
# Validate the input before the agent processes it
result = ... # Your validation logic
return GuardrailFunctionOutput(
output_info={"safe": True},
tripwire_triggered=False,
)
Project Structure
Here is what we are building:
multi-agent-pipeline/
├── requirements.txt
├── agents/
│ ├── __init__.py
│ ├── research_agent.py
│ ├── writer_agent.py
│ └── reviewer_agent.py
├── tools/
│ ├── __init__.py
│ └── search_tools.py
├── guardrails/
│ ├── __init__.py
│ └── quality_checks.py
├── main.py
└── run_pipeline.py
Let us build each piece step by step. One caveat first: a local package named `agents/` can shadow the SDK's own `agents` module when you run scripts from the project root. If Python complains that it cannot import `Agent` from `agents`, rename the local directory (for example, to `pipeline_agents/`) and update the imports to match.
Building Agent #1: The Research Agent
The Research Agent's job is to gather information on a given topic. We will give it a web search tool and structured output so downstream agents get clean data.
Define the Output Schema
First, define what the Research Agent should return:
# agents/research_agent.py
from pydantic import BaseModel, Field
from agents import Agent
class ResearchResult(BaseModel):
"""Structured output from the Research Agent."""
topic: str = Field(description="The researched topic")
summary: str = Field(description="A 2-3 paragraph summary of findings")
key_facts: list[str] = Field(description="5-8 key facts discovered")
sources_note: str = Field(description="Note about information sources and currency")
Create the Search Tool
The Research Agent needs a tool to search for information. Here we create a simulated search tool — in production, you would connect this to a real search API:
# tools/search_tools.py
from agents import function_tool
@function_tool
def web_search(query: str) -> str:
"""Search the web for information on a given query.
Args:
query: The search query to look up.
"""
# In production, connect to a real search API (Brave, Serper, Tavily, etc.)
# For this tutorial, the agent will use its training knowledge
# and note that results should be verified.
return (
f"Search results for: '{query}'\n"
f"Note: In production, this would return real search results. "
f"The agent should use its knowledge and clearly mark any claims "
f"that need verification."
)
@function_tool
def save_research_notes(topic: str, notes: str) -> str:
"""Save research notes for a topic.
Args:
topic: The topic being researched.
notes: The research notes to save.
"""
# In production, persist to a database or file
return f"Research notes saved for topic: {topic}"
Assemble the Research Agent
# agents/research_agent.py (continued)
from tools.search_tools import web_search, save_research_notes
research_agent = Agent(
name="Research Agent",
instructions="""You are an expert research analyst. Your job is to gather
accurate, comprehensive information on any given topic.
Rules:
- Search for the topic using the web_search tool
- Compile findings into a structured format
- Include 5-8 key facts with specific details
- Note the currency and reliability of information
- If you cannot verify a claim, mark it as [UNVERIFIED]
- Never fabricate statistics or quotes""",
model="gpt-4o-mini",
tools=[web_search, save_research_notes],
output_type=ResearchResult,
)
Test It Standalone
from agents import Runner
result = Runner.run_sync(
research_agent,
"Research the current state of multi-agent AI systems in 2026"
)
print(f"Topic: {result.final_output.topic}")
print(f"Summary: {result.final_output.summary}")
for fact in result.final_output.key_facts:
print(f" • {fact}")
Pro tip: Using `output_type=ResearchResult` forces the agent to return a Pydantic model instead of free text. This is critical for multi-agent pipelines — downstream agents receive predictable, typed data instead of parsing unstructured strings. The SDK handles JSON schema generation and validation automatically.
Building Agent #2: The Writer Agent
The Writer Agent takes research output and produces a well-structured draft. It receives the Research Agent's structured output as its input context.
Define the Writer Output
# agents/writer_agent.py
from pydantic import BaseModel, Field
from agents import Agent
class WriterOutput(BaseModel):
"""Structured output from the Writer Agent."""
title: str = Field(description="Article title")
draft: str = Field(description="The full article draft in markdown")
word_count: int = Field(description="Approximate word count")
sections: list[str] = Field(description="List of section headings used")
Assemble the Writer Agent
# agents/writer_agent.py (continued)
writer_agent = Agent(
name="Writer Agent",
instructions="""You are a skilled technical writer. Your job is to take
research findings and produce a well-structured, engaging article draft.
Rules:
- Write in a clear, practical tone suitable for developers
- Use markdown formatting with proper headings (##, ###)
- Include code examples where relevant
- Target 800-1200 words for the draft
- Structure: Introduction → Main Sections → Practical Takeaways → Conclusion
- Never fabricate quotes, statistics, or case studies
- If the research notes something as [UNVERIFIED], keep that marker""",
model="gpt-4o-mini",
output_type=WriterOutput,
)
Notice the Writer Agent has no tools — it is a pure text generation agent. Not every agent needs tools. The Writer focuses entirely on transforming structured research into polished prose.
Building Agent #3: The Reviewer Agent with Guardrails
The Reviewer Agent is our quality gate. It checks the draft for accuracy, completeness, and quality issues. This is where guardrails shine.
Define Quality Check Guardrails
# guardrails/quality_checks.py
from agents import output_guardrail, GuardrailFunctionOutput
@output_guardrail
async def check_no_fabrication(ctx, agent, output):
"""Check that the output does not contain fabricated data markers."""
draft_text = output.draft if hasattr(output, 'draft') else str(output)
fabrication_markers = [
"according to a study", # vague attribution without source
"research shows that 99%", # suspicious round statistics
"as John Smith, CEO", # likely fabricated quotes
]
issues = []
for marker in fabrication_markers:
if marker.lower() in draft_text.lower():
issues.append(f"Potential fabrication detected: '{marker}'")
return GuardrailFunctionOutput(
output_info={"issues": issues, "passed": len(issues) == 0},
tripwire_triggered=len(issues) > 0,
)
@output_guardrail
async def check_minimum_length(ctx, agent, output):
"""Ensure the draft meets minimum word count."""
draft_text = output.draft if hasattr(output, 'draft') else str(output)
word_count = len(draft_text.split())
return GuardrailFunctionOutput(
output_info={"word_count": word_count, "minimum": 200},
tripwire_triggered=word_count < 200,
)
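The decorated guardrails above are async functions wrapped by the SDK, which makes them awkward to unit-test directly. One option is to keep the raw checks in plain helpers and test those without the SDK at all. A sketch — these helper names are ours, not part of openai-agents:

```python
def count_words(text: str) -> int:
    """Word count, as used by the minimum-length check."""
    return len(text.split())

def find_fabrication_markers(text: str, markers: list[str]) -> list[str]:
    """Return every marker that appears (case-insensitively) in the text."""
    lowered = text.lower()
    return [m for m in markers if m.lower() in lowered]
```

The guardrail bodies then reduce to thin wrappers that build a `GuardrailFunctionOutput` from these results, and your unit tests assert on the helpers directly — no agent run required.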
Define the Review Output
# agents/reviewer_agent.py
from pydantic import BaseModel, Field
from agents import Agent
from guardrails.quality_checks import check_no_fabrication, check_minimum_length
class ReviewResult(BaseModel):
"""Structured output from the Reviewer Agent."""
approved: bool = Field(description="Whether the draft passes review")
score: int = Field(description="Quality score from 1-10")
feedback: list[str] = Field(description="List of feedback items")
final_draft: str = Field(description="The approved or revised draft")
Assemble the Reviewer Agent
# agents/reviewer_agent.py (continued)
reviewer_agent = Agent(
name="Reviewer Agent",
instructions="""You are a meticulous content reviewer and editor. Your job
is to evaluate article drafts for quality, accuracy, and completeness.
Review checklist:
1. Factual accuracy — flag any claims that seem unsupported
2. Structure — verify logical flow and proper headings
3. Completeness — ensure the topic is covered adequately
4. Tone — confirm it matches a practical, developer-friendly style
5. No fabrication — reject any invented statistics, quotes, or case studies
Scoring guide:
- 8-10: Approve with minor notes
- 5-7: Needs revision, provide specific feedback
- 1-4: Reject, major issues found
If approved, return the draft as-is in final_draft.
If revisions are needed, apply them yourself and return the improved version.""",
model="gpt-4o-mini",
output_type=ReviewResult,
output_guardrails=[check_no_fabrication, check_minimum_length],
)
Pro tip: Output guardrails run after the agent produces its result but before it is returned to your code. If a guardrail trips, the SDK raises `OutputGuardrailTripwireTriggered`, giving you a chance to handle the failure programmatically. This is different from input guardrails, which can run in parallel with the agent for lower latency.
Orchestrating Multi-Agent Handoffs
Now we connect all three agents. There are two patterns for this, and we will show both.
Pattern 1: Handoffs (Delegation Chain)
With handoffs, each agent delegates to the next. The Research Agent hands off to the Writer, who hands off to the Reviewer.
# main.py — Handoff pattern
from agents import Agent, Runner, handoff
from agents.research_agent import research_agent, ResearchResult
from agents.writer_agent import writer_agent
from agents.reviewer_agent import reviewer_agent
# Wire up handoffs: Research → Writer → Reviewer
research_agent_with_handoff = Agent(
name="Research Agent",
instructions=research_agent.instructions + """
After completing your research, hand off to the Writer Agent
with your findings so they can draft the article.""",
model="gpt-4o-mini",
tools=research_agent.tools,
handoffs=[writer_agent],
)
writer_agent_with_handoff = Agent(
name="Writer Agent",
instructions=writer_agent.instructions + """
After completing the draft, hand off to the Reviewer Agent
for quality review.""",
model="gpt-4o-mini",
handoffs=[reviewer_agent],
)
def run_with_handoffs(topic: str):
"""Run the full pipeline using the handoff pattern."""
result = Runner.run_sync(
research_agent_with_handoff,
f"Research and produce an article about: {topic}",
max_turns=30,
)
print(f"Final agent: {result.last_agent.name}")
print(f"Output: {result.final_output}")
return result
Pattern 2: Agents as Tools (Orchestrator)
With the orchestrator pattern, a manager agent calls specialist agents as tools:
# main.py — Orchestrator pattern
from agents import Agent, Runner
orchestrator = Agent(
name="Content Pipeline Orchestrator",
instructions="""You manage a content production pipeline. For any topic:
1. First, use the research tool to gather information
2. Then, use the writing tool to produce a draft from the research
3. Finally, use the review tool to check quality
Pass the full output from each step to the next tool.
Return the final reviewed draft to the user.""",
model="gpt-4o-mini",
tools=[
research_agent.as_tool(
tool_name="research_topic",
tool_description="Research a topic and return structured findings with key facts.",
),
writer_agent.as_tool(
tool_name="write_draft",
tool_description="Write an article draft based on provided research findings.",
),
reviewer_agent.as_tool(
tool_name="review_draft",
tool_description="Review an article draft for quality, accuracy, and completeness.",
),
],
)
def run_with_orchestrator(topic: str):
"""Run the full pipeline using the orchestrator pattern."""
result = Runner.run_sync(
orchestrator,
f"Produce a reviewed article about: {topic}",
max_turns=15,
)
print(f"Output: {result.final_output}")
return result
Which Pattern Should You Use?
| Aspect | Handoffs | Agents as Tools |
|---|---|---|
| Control | Each agent decides when to hand off | Orchestrator controls flow |
| Visibility | Active agent changes mid-run | Orchestrator sees all outputs |
| Best for | Linear pipelines, customer service routing | Complex coordination, parallel tasks |
| Guardrails | Input on first agent, output on last | Can apply at orchestrator level |
| Debugging | Follow the handoff chain | Check orchestrator's tool calls |
For our content pipeline, the orchestrator pattern gives more control since we want to pass structured data between steps. The handoff pattern works better for conversational routing where you do not know the path in advance.
Putting It All Together: The Run Script
Here is the complete pipeline using the orchestrator pattern:
# run_pipeline.py
import asyncio
from agents import Agent, Runner, function_tool
from pydantic import BaseModel, Field
# ── Output Schemas ──────────────────────────────
class ResearchResult(BaseModel):
topic: str = Field(description="The researched topic")
summary: str = Field(description="2-3 paragraph summary")
key_facts: list[str] = Field(description="5-8 key facts")
class ReviewResult(BaseModel):
approved: bool
score: int = Field(ge=1, le=10)
feedback: list[str]
final_draft: str
# ── Tools ───────────────────────────────────────
@function_tool
def web_search(query: str) -> str:
"""Search the web for current information.
Args:
query: The search query.
"""
return f"Results for '{query}': Use your knowledge and mark unverified claims."
# ── Agents ──────────────────────────────────────
research_agent = Agent(
name="Research Agent",
instructions=(
"You are a research specialist. Use web_search to find information. "
"Return structured findings with key facts. Mark anything unverified."
),
model="gpt-4o-mini",
tools=[web_search],
output_type=ResearchResult,
)
writer_agent = Agent(
name="Writer Agent",
instructions=(
"You are a technical writer. Take research findings and write a clear, "
"well-structured article in markdown. Target 800-1200 words. "
"Never fabricate data."
),
model="gpt-4o-mini",
)
reviewer_agent = Agent(
name="Reviewer Agent",
instructions=(
"You are a content reviewer. Check the draft for accuracy, structure, "
"and quality. Score 1-10. If score >= 7, approve. Return the final draft."
),
model="gpt-4o-mini",
output_type=ReviewResult,
)
# ── Orchestrator ────────────────────────────────
orchestrator = Agent(
name="Pipeline Orchestrator",
instructions=(
"You manage a content pipeline. For any topic:\n"
"1. Call research_topic to gather information\n"
"2. Call write_draft with the research results\n"
"3. Call review_draft with the written draft\n"
"Return the reviewer's final output to the user."
),
model="gpt-4o-mini",
tools=[
research_agent.as_tool(
tool_name="research_topic",
tool_description="Research a topic thoroughly.",
),
writer_agent.as_tool(
tool_name="write_draft",
tool_description="Write an article from research findings.",
),
reviewer_agent.as_tool(
tool_name="review_draft",
tool_description="Review and score an article draft.",
),
],
)
# ── Run ─────────────────────────────────────────
async def main():
topic = "How multi-agent AI systems are changing software development in 2026"
print(f"Starting pipeline for: {topic}\n")
result = await Runner.run(orchestrator, f"Produce a reviewed article about: {topic}")
print("=" * 60)
    print("Pipeline complete!")
print(f"Final output:\n{result.final_output}")
print(f"\nToken usage: {result.raw_responses[-1].usage if result.raw_responses else 'N/A'}")
if __name__ == "__main__":
asyncio.run(main())
Run it:
python run_pipeline.py
You should see the orchestrator call each agent in sequence, producing a researched, written, and reviewed article.
Real Cost Breakdown
One of the most common questions about multi-agent systems: how much does it cost to run?
Here is a realistic breakdown for our three-agent pipeline using gpt-4o-mini:
| Agent | Input Tokens (est.) | Output Tokens (est.) | Cost per Run |
|---|---|---|---|
| Research Agent | ~1,500 | ~800 | ~$0.0007 |
| Writer Agent | ~2,000 | ~1,500 | ~$0.0012 |
| Reviewer Agent | ~2,500 | ~600 | ~$0.0008 |
| Orchestrator overhead | ~1,000 | ~500 | ~$0.0005 |
| Total | ~7,000 | ~3,400 | ~$0.003 |
Note: These are estimates based on gpt-4o-mini pricing as of April 2026 (~$0.15/1M input, ~$0.60/1M output tokens). Actual costs vary by prompt length and output verbosity. Always check OpenAI's pricing page for current rates before production use.
Scaling the math:
- 100 articles/day: ~$0.30/day
- 1,000 articles/day: ~$3.00/day
If you switch to gpt-4o for higher quality output, costs increase roughly 15–20x. A common pattern: use gpt-4o-mini for research and writing, gpt-4o for the reviewer agent where quality judgment matters most.
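The totals above are straight multiplication, so a small helper keeps the math reusable when pricing changes. The default rates below are the April 2026 gpt-4o-mini figures quoted in this article — always substitute current pricing:

```python
def run_cost(input_tokens: int, output_tokens: int,
             input_rate_per_m: float = 0.15,
             output_rate_per_m: float = 0.60) -> float:
    """USD cost of one run, given token counts and per-1M-token rates."""
    return (input_tokens * input_rate_per_m
            + output_tokens * output_rate_per_m) / 1_000_000

cost = run_cost(7_000, 3_400)  # totals from the table: ~$0.0031 per run
```

At 100 runs per day that works out to roughly $0.31, consistent with the scaling figures above.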
Reducing Costs Further
- Cache research results — Skip the Research Agent for previously researched topics
- Use structured outputs — Pydantic models reduce wasted tokens on formatting
- Set `max_turns` — Prevent agents from looping excessively
- Use `gpt-4o-mini` by default — Only upgrade models where quality is critical
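The caching idea in the first bullet can be as simple as a dict keyed by a normalized topic string. A sketch with a stand-in research function — in real code, the miss branch would call `Runner.run` on the Research Agent:

```python
_research_cache: dict[str, str] = {}

def cached_research(topic: str, do_research) -> str:
    """Return cached findings for a topic; call `do_research` only on a miss."""
    key = topic.strip().lower()  # "AI Agents" and "ai agents" share one entry
    if key not in _research_cache:
        _research_cache[key] = do_research(topic)
    return _research_cache[key]

calls = 0
def fake_research(topic: str) -> str:  # stand-in for a real agent run
    global calls
    calls += 1
    return f"findings for {topic}"

first = cached_research("AI Agents", fake_research)
second = cached_research("ai agents ", fake_research)  # cache hit: no second run
```

For anything beyond a single process, swap the dict for Redis or a database table with a TTL so stale research eventually expires.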
OpenAI Agents SDK vs LangGraph vs CrewAI — When to Use Which
If you are evaluating agent frameworks, here is how they compare:
| Feature | OpenAI Agents SDK | LangGraph | CrewAI |
|---|---|---|---|
| Philosophy | Minimal, Pythonic | Graph-based, explicit | Role-based, high-level |
| Learning curve | Low | Medium-High | Low-Medium |
| Multi-agent pattern | Handoffs + tools | State graphs + nodes | Crews + tasks |
| Structured output | Native Pydantic | Via output parsers | Built-in |
| Guardrails | Built-in (input/output) | Custom nodes | Limited |
| LLM support | OpenAI native, 100+ via LiteLLM | Any LLM via LangChain | Multiple providers |
| State management | Context object | Explicit state graph | Shared memory |
| Streaming | Built-in | Built-in | Limited |
| Best for | OpenAI-first teams, rapid prototyping | Complex workflows with branching | Team simulations, role-play agents |
| Production readiness | High | High | Medium |
Choose OpenAI Agents SDK when:
- You primarily use OpenAI models
- You want the fastest path from prototype to production
- Your workflow is a pipeline or triage pattern
- You need built-in guardrails without extra dependencies
Choose LangGraph when:
- Your workflow has complex branching and cycles
- You need fine-grained control over state transitions
- You want explicit, visual workflow graphs
- You are already in the LangChain ecosystem
We covered LangGraph in depth in our LangGraph step-by-step tutorial — if you want to compare both frameworks hands-on, work through both tutorials with the same project.
Choose CrewAI when:
- You think in terms of team roles and collaboration
- You want the highest-level abstraction
- Your use case is research, analysis, or content generation
- You prefer convention over configuration
Advanced Patterns Worth Knowing
Dynamic Instructions
Agent behavior can adapt at runtime:
from agents import Agent, RunContextWrapper
def dynamic_instructions(ctx: RunContextWrapper, agent: Agent) -> str:
user_tier = ctx.context.get("tier", "free")
if user_tier == "pro":
return "Provide detailed, in-depth analysis with code examples."
return "Provide a concise summary suitable for beginners."
adaptive_agent = Agent(
name="Adaptive Agent",
instructions=dynamic_instructions,
model="gpt-4o-mini",
)
Parallel Agent Execution
Run independent agents simultaneously with asyncio.gather:
import asyncio
from agents import Runner
async def parallel_research(topics: list[str]):
tasks = [
Runner.run(research_agent, f"Research: {topic}")
for topic in topics
]
results = await asyncio.gather(*tasks)
return results
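You can verify the fan-out shape without touching the API by swapping `Runner.run` for stub coroutines — the gather semantics are identical:

```python
import asyncio

async def fake_run(topic: str) -> str:
    # Stand-in for `await Runner.run(research_agent, ...)`
    await asyncio.sleep(0.01)  # simulate model latency
    return f"result: {topic}"

async def parallel(topics: list[str]) -> list[str]:
    # gather preserves input order regardless of completion order
    return list(await asyncio.gather(*(fake_run(t) for t in topics)))

results = asyncio.run(parallel(["rag", "handoffs", "guardrails"]))
```

Note that `asyncio.gather` raises the first exception by default; pass `return_exceptions=True` if one failed research run should not abort the whole batch.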
Agent Cloning
Create agent variants without duplicating configuration:
formal_writer = writer_agent.clone(
name="Formal Writer",
instructions="Write in a formal, academic tone. " + writer_agent.instructions,
)
casual_writer = writer_agent.clone(
name="Casual Writer",
instructions="Write in a casual, conversational tone. " + writer_agent.instructions,
)
Common Pitfalls and How to Avoid Them
| Pitfall | Solution |
|---|---|
| Agents looping infinitely | Set `max_turns` on `Runner.run()` |
| Vague handoff behavior | Write explicit handoff instructions in the agent's prompt |
| Unstructured data between agents | Use output_type with Pydantic models |
| High costs from GPT-4o | Use gpt-4o-mini for most agents, upgrade selectively |
| Guardrail false positives | Test guardrails independently before integrating |
| Lost context in handoffs | Use input_filter on handoffs to control what the next agent sees |
FAQ
What is the OpenAI Agents SDK?
The OpenAI Agents SDK is an open-source Python framework for building single-agent and multi-agent AI systems. It provides primitives for agent creation, tool use, inter-agent handoffs, and input/output guardrails. It is the production successor to OpenAI's experimental Swarm library.
How do I install the OpenAI Agents SDK?
Install it via pip: pip install openai-agents==0.13.4. The SDK requires Python 3.10 or higher. Set your OPENAI_API_KEY environment variable before running any agent code.
What is the difference between handoffs and agents-as-tools?
Handoffs transfer control entirely — the receiving agent becomes the active agent and responds directly. Agents-as-tools keeps the orchestrator in control — specialist agents run as tool calls and return results to the orchestrator. Use handoffs for routing, agents-as-tools for coordination.
Can I use non-OpenAI models with the Agents SDK?
Yes. The SDK supports over 100 LLMs through LiteLLM integration. You can use Anthropic, Google, Mistral, and local models — though OpenAI models have the most native support.
How much does it cost to run a multi-agent pipeline?
With gpt-4o-mini, a three-agent pipeline typically costs under $0.01 per run. See our cost breakdown for detailed estimates.
Is the OpenAI Agents SDK a replacement for Swarm?
Yes. The Agents SDK is the production-ready evolution of OpenAI's experimental Swarm library. It adds structured outputs, guardrails, streaming, and MCP tool support that Swarm did not have.
How do guardrails work in the OpenAI Agents SDK?
Input guardrails validate user input before or in parallel with the first agent. Output guardrails check the final agent's response. If a guardrail triggers its tripwire, the SDK raises an exception that you can catch and handle. Tool guardrails can also validate individual function calls.
What to Build Next
You now have a working multi-agent pipeline. Here are some directions to take it further:
- Add real search tools — Connect to Brave Search, Serper, or Tavily for live web data
- Combine with RAG — Use retrieval-augmented generation to ground your agents in your own documents
- Add MCP tools — The SDK has built-in MCP server support for connecting to external services
- Build a UI — Wrap the pipeline in a Streamlit or Gradio interface
- Explore vibe coding tools — Use AI app builders to create a frontend for your agent pipeline
If you are exploring the broader AI development ecosystem, check out our guide to free AI coding tools and see how tools like Claude Code approach multi-agent patterns differently with subagents and commands.
Wrapping Up
The OpenAI Agents SDK makes multi-agent systems accessible without requiring deep framework expertise. The core pattern is simple:
- Define agents with clear instructions and tools
- Connect them via handoffs or the orchestrator pattern
- Add guardrails to enforce quality and safety
- Run with the Runner and let the SDK handle orchestration
The hardest part is not the code — it is designing clear agent boundaries and instructions. Spend your time there, and the SDK handles the rest.
All code from this tutorial is available in the project structure above. Clone it, swap in your own tools and prompts, and start building.
This article is part of Effloow's AI Agent Tutorial series. We build and test every framework we write about — see how we run our own company with 16 AI agents.
Some links in this article may be affiliate links. We only recommend tools we have actually tested. See our affiliate disclosure for details.