
Jangwook Kim

Posted on • Originally published at effloow.com

Build Your First Multi-Agent System with OpenAI Agents SDK — Step-by-Step Python Tutorial (2026)


You have heard the buzz around AI agents. You have probably built a single-agent chatbot. But real-world automation needs multiple agents working together — one to research, one to write, one to review and catch mistakes.

The OpenAI Agents SDK makes this surprisingly straightforward. In this tutorial, you will build a complete multi-agent content pipeline: a Research Agent gathers information, a Writer Agent drafts content, and a Reviewer Agent validates quality with guardrails. All orchestrated through handoffs, running in under 100 lines of core logic.

By the end, you will understand every building block — Agent, Runner, Handoff, and Guardrails — and have a working system you can adapt to your own projects.

What you will build: A three-agent content pipeline where agents hand off work to each other automatically. The Research Agent finds information, the Writer Agent creates a draft, and the Reviewer Agent enforces quality standards using guardrails.


What Is the OpenAI Agents SDK?

The OpenAI Agents SDK is an open-source Python framework for building multi-agent AI systems. Originally developed as a successor to the experimental Swarm library, it provides production-ready primitives for creating agents that can use tools, delegate work to each other, and enforce safety checks — all with minimal boilerplate.

Key characteristics:

| Feature | Detail |
| --- | --- |
| Language | Python (3.10+) |
| Current version | 0.13.4 (April 2026) |
| License | MIT |
| LLM support | OpenAI models natively, 100+ models via LiteLLM |
| Core primitives | Agent, Runner, Handoff, Guardrails, Tools |
| Install size | Lightweight — Pydantic and Requests as main dependencies |

Unlike heavier frameworks that require you to learn complex graph abstractions, the OpenAI Agents SDK keeps things Pythonic. You define agents as objects, wire them together with handoffs, and run them with a single Runner.run() call.

If you have worked with the LangGraph framework (covered in our previous tutorial), the Agents SDK takes a different philosophy: less explicit graph construction, more implicit orchestration through handoffs and tool calls.


Prerequisites

Before we start building, make sure you have:

  • Python 3.10 or higher (the SDK requires 3.10+, supports up to 3.14)
  • An OpenAI API key with access to GPT-4o or later models
  • Basic Python knowledge — functions, classes, async/await
  • A terminal and a code editor

Installation

Create a project directory and set up a virtual environment:

mkdir multi-agent-pipeline && cd multi-agent-pipeline
python -m venv venv
source venv/bin/activate  # On Windows: venv\Scripts\activate

Install the SDK with a pinned version:

pip install openai-agents==0.13.4

Create a requirements.txt for reproducibility:

openai-agents==0.13.4
pydantic>=2.0

API Key Setup

Set your OpenAI API key as an environment variable:

export OPENAI_API_KEY="sk-your-key-here"

Or create a .env file (never commit this to version control):

OPENAI_API_KEY=sk-your-key-here

Cost note: This tutorial uses gpt-4o-mini for most agents to keep costs low. A full pipeline run typically consumes 3,000–8,000 tokens. At current pricing (approximately $0.15 per 1M input tokens and $0.60 per 1M output tokens for gpt-4o-mini), each run costs well under $0.01. We break down real costs in the cost analysis section below.


Core Concepts: Agent, Runner, Handoff, Guardrails

Before writing code, let us understand the four building blocks. These are the only concepts you need to build sophisticated multi-agent systems.

Agent

An Agent is an LLM equipped with instructions and tools. Think of it as a specialized worker with a clear job description.

from agents import Agent

research_agent = Agent(
    name="Research Agent",
    instructions="You are a research specialist. Find accurate, up-to-date information on any topic.",
    model="gpt-4o-mini",
)

Key parameters:

  • name — Human-readable identifier
  • instructions — The system prompt that defines the agent's behavior
  • model — Which LLM to use
  • tools — Functions the agent can call
  • handoffs — Other agents it can delegate to
  • output_type — Pydantic model for structured output

Runner

The Runner executes agents. It manages the agent loop: call the LLM, process tool calls, handle handoffs, and repeat until the agent produces a final output.

from agents import Runner

result = Runner.run_sync(research_agent, "What is retrieval-augmented generation?")
print(result.final_output)

Three execution modes:

| Method | Use case |
| --- | --- |
| Runner.run() | Async execution (recommended for production) |
| Runner.run_sync() | Synchronous wrapper (simpler for scripts) |
| Runner.run_streamed() | Async with streaming events |
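Conceptually, the loop the Runner manages can be sketched without the SDK at all. The following is an illustrative simulation only: `fake_llm`, `run_tool`, and `run_agent` are stand-ins invented for this sketch, not SDK API.

```python
def fake_llm(messages):
    """Stand-in for an LLM call: requests a tool once, then answers."""
    if not any(m["role"] == "tool" for m in messages):
        return {"tool_call": ("web_search", "RAG")}
    return {"final_output": "RAG augments an LLM with retrieved documents."}


def run_tool(name, arg):
    """Stand-in for executing a registered tool."""
    return f"results for '{arg}' from {name}"


def run_agent(user_input, max_turns=10):
    """The shape of the agent loop: call model, run tools, repeat."""
    messages = [{"role": "user", "content": user_input}]
    for _ in range(max_turns):
        reply = fake_llm(messages)
        if "final_output" in reply:      # model produced a final answer
            return reply["final_output"]
        name, arg = reply["tool_call"]   # otherwise execute the requested tool
        messages.append({"role": "tool", "content": run_tool(name, arg)})
    raise RuntimeError("max_turns exceeded")


print(run_agent("What is retrieval-augmented generation?"))
```

The real Runner does far more (handoffs, guardrails, structured output parsing), but this is the core cycle, and it is why `max_turns` matters: without a cap, a confused model can loop indefinitely.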

Handoff

A Handoff lets one agent delegate work to another. Under the hood, handoffs appear as tools to the LLM — when the triage agent decides to hand off to the writer, it calls a transfer_to_writer_agent tool.

triage_agent = Agent(
    name="Triage Agent",
    instructions="Route research requests to the Research Agent, writing tasks to the Writer Agent.",
    handoffs=[research_agent, writer_agent],
)
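Because a handoff is exposed to the model as a tool, the SDK derives a default tool name from the target agent's name. A rough re-implementation of that naming convention (my own sketch for illustration, not the SDK's actual code):

```python
def handoff_tool_name(agent_name: str) -> str:
    """Approximate the default handoff tool naming:
    'Writer Agent' -> 'transfer_to_writer_agent'."""
    return "transfer_to_" + agent_name.strip().lower().replace(" ", "_")


print(handoff_tool_name("Writer Agent"))  # transfer_to_writer_agent
```

Keeping agent names short and descriptive therefore pays off twice: in your logs and in the tool names the model sees.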

Guardrails

Guardrails validate inputs and outputs. They can run in parallel with agent execution (for speed) or block execution until validation passes (for safety).

from agents import input_guardrail, GuardrailFunctionOutput

@input_guardrail
async def check_topic_safety(ctx, agent, input):
    # Validate the input before the agent processes it
    result = ...  # Your validation logic
    return GuardrailFunctionOutput(
        output_info={"safe": True},
        tripwire_triggered=False,
    )

Project Structure

Here is what we are building:

multi-agent-pipeline/
├── requirements.txt
├── pipeline_agents/
│   ├── __init__.py
│   ├── research_agent.py
│   ├── writer_agent.py
│   └── reviewer_agent.py
├── tools/
│   ├── __init__.py
│   └── search_tools.py
├── guardrails/
│   ├── __init__.py
│   └── quality_checks.py
├── main.py
└── run_pipeline.py

Note: the local package is named pipeline_agents rather than agents. A local agents/ directory would shadow the SDK's own agents package and break every from agents import ... statement.

Let us build each piece step by step.


Building Agent #1: The Research Agent

The Research Agent's job is to gather information on a given topic. We will give it a web search tool and structured output so downstream agents get clean data.

Define the Output Schema

First, define what the Research Agent should return:

# pipeline_agents/research_agent.py
from pydantic import BaseModel, Field
from agents import Agent, function_tool


class ResearchResult(BaseModel):
    """Structured output from the Research Agent."""
    topic: str = Field(description="The researched topic")
    summary: str = Field(description="A 2-3 paragraph summary of findings")
    key_facts: list[str] = Field(description="5-8 key facts discovered")
    sources_note: str = Field(description="Note about information sources and currency")

Create the Search Tool

The Research Agent needs a tool to search for information. Here we create a simulated search tool — in production, you would connect this to a real search API:

# tools/search_tools.py
from agents import function_tool


@function_tool
def web_search(query: str) -> str:
    """Search the web for information on a given query.

    Args:
        query: The search query to look up.
    """
    # In production, connect to a real search API (Brave, Serper, Tavily, etc.)
    # For this tutorial, the agent will use its training knowledge
    # and note that results should be verified.
    return (
        f"Search results for: '{query}'\n"
        f"Note: In production, this would return real search results. "
        f"The agent should use its knowledge and clearly mark any claims "
        f"that need verification."
    )


@function_tool
def save_research_notes(topic: str, notes: str) -> str:
    """Save research notes for a topic.

    Args:
        topic: The topic being researched.
        notes: The research notes to save.
    """
    # In production, persist to a database or file
    return f"Research notes saved for topic: {topic}"

Assemble the Research Agent

# pipeline_agents/research_agent.py (continued)
from tools.search_tools import web_search, save_research_notes

research_agent = Agent(
    name="Research Agent",
    instructions="""You are an expert research analyst. Your job is to gather
accurate, comprehensive information on any given topic.

Rules:
- Search for the topic using the web_search tool
- Compile findings into a structured format
- Include 5-8 key facts with specific details
- Note the currency and reliability of information
- If you cannot verify a claim, mark it as [UNVERIFIED]
- Never fabricate statistics or quotes""",
    model="gpt-4o-mini",
    tools=[web_search, save_research_notes],
    output_type=ResearchResult,
)

Test It Standalone

from agents import Runner

result = Runner.run_sync(
    research_agent,
    "Research the current state of multi-agent AI systems in 2026"
)
print(f"Topic: {result.final_output.topic}")
print(f"Summary: {result.final_output.summary}")
for fact in result.final_output.key_facts:
    print(f"- {fact}")

Pro tip: Using output_type=ResearchResult forces the agent to return a Pydantic model instead of free text. This is critical for multi-agent pipelines — downstream agents receive predictable, typed data instead of parsing unstructured strings. The SDK handles JSON schema generation and validation automatically.
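For reference, the schema sent to the model for ResearchResult looks roughly like this hand-written approximation (the real generated schema may include extra keys such as titles and field descriptions):

```python
import json

# Hand-written approximation of the JSON schema Pydantic generates for
# ResearchResult -- illustrative shape only, not the SDK's exact output.
research_result_schema = {
    "type": "object",
    "properties": {
        "topic": {"type": "string"},
        "summary": {"type": "string"},
        "key_facts": {"type": "array", "items": {"type": "string"}},
        "sources_note": {"type": "string"},
    },
    "required": ["topic", "summary", "key_facts", "sources_note"],
}

print(json.dumps(research_result_schema, indent=2))
```

The model is constrained to emit JSON matching this shape, which is what makes the downstream agents' inputs predictable.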


Building Agent #2: The Writer Agent

The Writer Agent takes research output and produces a well-structured draft. It receives the Research Agent's structured output as its input context.

Define the Writer Output

# pipeline_agents/writer_agent.py
from pydantic import BaseModel, Field
from agents import Agent


class WriterOutput(BaseModel):
    """Structured output from the Writer Agent."""
    title: str = Field(description="Article title")
    draft: str = Field(description="The full article draft in markdown")
    word_count: int = Field(description="Approximate word count")
    sections: list[str] = Field(description="List of section headings used")

Assemble the Writer Agent

# pipeline_agents/writer_agent.py (continued)

writer_agent = Agent(
    name="Writer Agent",
    instructions="""You are a skilled technical writer. Your job is to take
research findings and produce a well-structured, engaging article draft.

Rules:
- Write in a clear, practical tone suitable for developers
- Use markdown formatting with proper headings (##, ###)
- Include code examples where relevant
- Target 800-1200 words for the draft
- Structure: Introduction → Main Sections → Practical Takeaways → Conclusion
- Never fabricate quotes, statistics, or case studies
- If the research notes something as [UNVERIFIED], keep that marker""",
    model="gpt-4o-mini",
    output_type=WriterOutput,
)

Notice the Writer Agent has no tools — it is a pure text generation agent. Not every agent needs tools. The Writer focuses entirely on transforming structured research into polished prose.


Building Agent #3: The Reviewer Agent with Guardrails

The Reviewer Agent is our quality gate. It checks the draft for accuracy, completeness, and quality issues. This is where guardrails shine.

Define Quality Check Guardrails

# guardrails/quality_checks.py
from agents import output_guardrail, GuardrailFunctionOutput


@output_guardrail
async def check_no_fabrication(ctx, agent, output):
    """Check that the output does not contain fabricated data markers."""
    draft_text = output.draft if hasattr(output, 'draft') else str(output)

    fabrication_markers = [
        "according to a study",  # vague attribution without source
        "research shows that 99%",  # suspicious round statistics
        "as John Smith, CEO",  # likely fabricated quotes
    ]

    issues = []
    for marker in fabrication_markers:
        if marker.lower() in draft_text.lower():
            issues.append(f"Potential fabrication detected: '{marker}'")

    return GuardrailFunctionOutput(
        output_info={"issues": issues, "passed": len(issues) == 0},
        tripwire_triggered=len(issues) > 0,
    )


@output_guardrail
async def check_minimum_length(ctx, agent, output):
    """Ensure the draft meets minimum word count."""
    draft_text = output.draft if hasattr(output, 'draft') else str(output)
    word_count = len(draft_text.split())

    return GuardrailFunctionOutput(
        output_info={"word_count": word_count, "minimum": 200},
        tripwire_triggered=word_count < 200,
    )

Define the Review Output

# pipeline_agents/reviewer_agent.py
from pydantic import BaseModel, Field
from agents import Agent
from guardrails.quality_checks import check_no_fabrication, check_minimum_length


class ReviewResult(BaseModel):
    """Structured output from the Reviewer Agent."""
    approved: bool = Field(description="Whether the draft passes review")
    score: int = Field(description="Quality score from 1-10")
    feedback: list[str] = Field(description="List of feedback items")
    final_draft: str = Field(description="The approved or revised draft")

Assemble the Reviewer Agent

# pipeline_agents/reviewer_agent.py (continued)

reviewer_agent = Agent(
    name="Reviewer Agent",
    instructions="""You are a meticulous content reviewer and editor. Your job
is to evaluate article drafts for quality, accuracy, and completeness.

Review checklist:
1. Factual accuracy — flag any claims that seem unsupported
2. Structure — verify logical flow and proper headings
3. Completeness — ensure the topic is covered adequately
4. Tone — confirm it matches a practical, developer-friendly style
5. No fabrication — reject any invented statistics, quotes, or case studies

Scoring guide:
- 8-10: Approve with minor notes
- 5-7: Needs revision, provide specific feedback
- 1-4: Reject, major issues found

If approved, return the draft as-is in final_draft.
If revisions are needed, apply them yourself and return the improved version.""",
    model="gpt-4o-mini",
    output_type=ReviewResult,
    output_guardrails=[check_no_fabrication, check_minimum_length],
)

Pro tip: Output guardrails run after the agent produces its result but before it is returned to your code. If a guardrail trips, the SDK raises OutputGuardrailTripwireTriggered, giving you a chance to handle the failure programmatically. This is different from input guardrails, which can run in parallel with the agent for lower latency.
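The control flow you get when a tripwire fires can be simulated without the SDK. In real code you would catch OutputGuardrailTripwireTriggered from the agents package; here a hypothetical TripwireTriggered exception stands in for it:

```python
# Simulated tripwire mechanics -- NOT the SDK, just the control flow it
# produces when a guardrail rejects an output.

class TripwireTriggered(Exception):
    """Stand-in for the SDK's OutputGuardrailTripwireTriggered."""
    def __init__(self, info):
        super().__init__("guardrail tripwire triggered")
        self.info = info


def check_minimum_length(draft: str, minimum: int = 200) -> dict:
    """Same logic as the article's minimum-length guardrail."""
    words = len(draft.split())
    return {"word_count": words, "tripped": words < minimum}


def run_with_guardrail(draft: str) -> str:
    result = check_minimum_length(draft)
    if result["tripped"]:        # guardrail fails: raise instead of returning
        raise TripwireTriggered(result)
    return draft                 # guardrail passes: output reaches the caller


try:
    run_with_guardrail("Too short to publish.")
except TripwireTriggered as e:
    print(f"Rejected: only {e.info['word_count']} words")  # Rejected: only 4 words
```

The point of the exception-based design is that a rejected output never silently reaches your calling code; you must decide what to do (retry, fall back, or surface an error).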


Orchestrating Multi-Agent Handoffs

Now we connect all three agents. There are two patterns for this, and we will show both.

Pattern 1: Handoffs (Delegation Chain)

With handoffs, each agent delegates to the next. The Research Agent hands off to the Writer, who hands off to the Reviewer.

# main.py — Handoff pattern
from agents import Agent, Runner

# Import from the local pipeline_agents package. Naming it "agents" would
# shadow the SDK's agents package and break the import above.
from pipeline_agents.research_agent import research_agent, ResearchResult
from pipeline_agents.writer_agent import writer_agent
from pipeline_agents.reviewer_agent import reviewer_agent


# Wire up handoffs: Research → Writer → Reviewer
research_agent_with_handoff = Agent(
    name="Research Agent",
    instructions=research_agent.instructions + """

After completing your research, hand off to the Writer Agent
with your findings so they can draft the article.""",
    model="gpt-4o-mini",
    tools=research_agent.tools,
    handoffs=[writer_agent],
)

writer_agent_with_handoff = Agent(
    name="Writer Agent",
    instructions=writer_agent.instructions + """

After completing the draft, hand off to the Reviewer Agent
for quality review.""",
    model="gpt-4o-mini",
    handoffs=[reviewer_agent],
)


def run_with_handoffs(topic: str):
    """Run the full pipeline using the handoff pattern."""
    result = Runner.run_sync(
        research_agent_with_handoff,
        f"Research and produce an article about: {topic}",
        max_turns=30,
    )
    print(f"Final agent: {result.last_agent.name}")
    print(f"Output: {result.final_output}")
    return result

Pattern 2: Agents as Tools (Orchestrator)

With the orchestrator pattern, a manager agent calls specialist agents as tools:

# main.py — Orchestrator pattern
from agents import Agent, Runner


orchestrator = Agent(
    name="Content Pipeline Orchestrator",
    instructions="""You manage a content production pipeline. For any topic:

1. First, use the research tool to gather information
2. Then, use the writing tool to produce a draft from the research
3. Finally, use the review tool to check quality

Pass the full output from each step to the next tool.
Return the final reviewed draft to the user.""",
    model="gpt-4o-mini",
    tools=[
        research_agent.as_tool(
            tool_name="research_topic",
            tool_description="Research a topic and return structured findings with key facts.",
        ),
        writer_agent.as_tool(
            tool_name="write_draft",
            tool_description="Write an article draft based on provided research findings.",
        ),
        reviewer_agent.as_tool(
            tool_name="review_draft",
            tool_description="Review an article draft for quality, accuracy, and completeness.",
        ),
    ],
)


def run_with_orchestrator(topic: str):
    """Run the full pipeline using the orchestrator pattern."""
    result = Runner.run_sync(
        orchestrator,
        f"Produce a reviewed article about: {topic}",
        max_turns=15,
    )
    print(f"Output: {result.final_output}")
    return result

Which Pattern Should You Use?

| Aspect | Handoffs | Agents as Tools |
| --- | --- | --- |
| Control | Each agent decides when to hand off | Orchestrator controls flow |
| Visibility | Active agent changes mid-run | Orchestrator sees all outputs |
| Best for | Linear pipelines, customer service routing | Complex coordination, parallel tasks |
| Guardrails | Input on first agent, output on last | Can apply at orchestrator level |
| Debugging | Follow the handoff chain | Check orchestrator's tool calls |

For our content pipeline, the orchestrator pattern gives more control since we want to pass structured data between steps. The handoff pattern works better for conversational routing where you do not know the path in advance.


Putting It All Together: The Run Script

Here is the complete pipeline using the orchestrator pattern:

# run_pipeline.py
import asyncio
from agents import Agent, Runner, function_tool
from pydantic import BaseModel, Field


# ── Output Schemas ──────────────────────────────

class ResearchResult(BaseModel):
    topic: str = Field(description="The researched topic")
    summary: str = Field(description="2-3 paragraph summary")
    key_facts: list[str] = Field(description="5-8 key facts")


class ReviewResult(BaseModel):
    approved: bool
    score: int = Field(ge=1, le=10)
    feedback: list[str]
    final_draft: str


# ── Tools ───────────────────────────────────────

@function_tool
def web_search(query: str) -> str:
    """Search the web for current information.

    Args:
        query: The search query.
    """
    return f"Results for '{query}': Use your knowledge and mark unverified claims."


# ── Agents ──────────────────────────────────────

research_agent = Agent(
    name="Research Agent",
    instructions=(
        "You are a research specialist. Use web_search to find information. "
        "Return structured findings with key facts. Mark anything unverified."
    ),
    model="gpt-4o-mini",
    tools=[web_search],
    output_type=ResearchResult,
)

writer_agent = Agent(
    name="Writer Agent",
    instructions=(
        "You are a technical writer. Take research findings and write a clear, "
        "well-structured article in markdown. Target 800-1200 words. "
        "Never fabricate data."
    ),
    model="gpt-4o-mini",
)

reviewer_agent = Agent(
    name="Reviewer Agent",
    instructions=(
        "You are a content reviewer. Check the draft for accuracy, structure, "
        "and quality. Score 1-10. If score >= 7, approve. Return the final draft."
    ),
    model="gpt-4o-mini",
    output_type=ReviewResult,
)

# ── Orchestrator ────────────────────────────────

orchestrator = Agent(
    name="Pipeline Orchestrator",
    instructions=(
        "You manage a content pipeline. For any topic:\n"
        "1. Call research_topic to gather information\n"
        "2. Call write_draft with the research results\n"
        "3. Call review_draft with the written draft\n"
        "Return the reviewer's final output to the user."
    ),
    model="gpt-4o-mini",
    tools=[
        research_agent.as_tool(
            tool_name="research_topic",
            tool_description="Research a topic thoroughly.",
        ),
        writer_agent.as_tool(
            tool_name="write_draft",
            tool_description="Write an article from research findings.",
        ),
        reviewer_agent.as_tool(
            tool_name="review_draft",
            tool_description="Review and score an article draft.",
        ),
    ],
)


# ── Run ─────────────────────────────────────────

async def main():
    topic = "How multi-agent AI systems are changing software development in 2026"

    print(f"Starting pipeline for: {topic}\n")
    result = await Runner.run(orchestrator, f"Produce a reviewed article about: {topic}")

    print("=" * 60)
    print("Pipeline complete!")
    print(f"Final output:\n{result.final_output}")
    print(f"\nToken usage: {result.raw_responses[-1].usage if result.raw_responses else 'N/A'}")


if __name__ == "__main__":
    asyncio.run(main())

Run it:

python run_pipeline.py

You should see the orchestrator call each agent in sequence, producing a researched, written, and reviewed article.


Real Cost Breakdown

One of the most common questions about multi-agent systems: how much does it cost to run?

Here is a realistic breakdown for our three-agent pipeline using gpt-4o-mini:

| Agent | Input Tokens (est.) | Output Tokens (est.) | Cost per Run |
| --- | --- | --- | --- |
| Research Agent | ~1,500 | ~800 | ~$0.0007 |
| Writer Agent | ~2,000 | ~1,500 | ~$0.0012 |
| Reviewer Agent | ~2,500 | ~600 | ~$0.0008 |
| Orchestrator overhead | ~1,000 | ~500 | ~$0.0005 |
| Total | ~7,000 | ~3,400 | ~$0.003 |

Note: These are estimates based on gpt-4o-mini pricing as of April 2026 (~$0.15/1M input, ~$0.60/1M output tokens). Actual costs vary by prompt length and output verbosity. Always check OpenAI's pricing page for current rates before production use.

Scaling the math:

  • 100 articles/day: ~$0.30/day
  • 1,000 articles/day: ~$3.00/day
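These figures are easy to sanity-check with a few lines of Python, using the same assumed gpt-4o-mini rates (verify them against OpenAI's current pricing before relying on this):

```python
# Back-of-envelope cost check using the rates assumed in this article:
# $0.15 per 1M input tokens, $0.60 per 1M output tokens for gpt-4o-mini.
INPUT_RATE = 0.15 / 1_000_000
OUTPUT_RATE = 0.60 / 1_000_000


def run_cost(input_tokens: int, output_tokens: int) -> float:
    """Dollar cost of one pipeline run at the assumed rates."""
    return input_tokens * INPUT_RATE + output_tokens * OUTPUT_RATE


per_run = run_cost(7_000, 3_400)
print(f"per run:            ${per_run:.4f}")        # roughly $0.003
print(f"100 articles/day:   ${per_run * 100:.2f}")
print(f"1,000 articles/day: ${per_run * 1_000:.2f}")
```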

If you switch to gpt-4o for higher quality output, costs increase roughly 15–20x. A common pattern: use gpt-4o-mini for research and writing, gpt-4o for the reviewer agent where quality judgment matters most.

Reducing Costs Further

  1. Cache research results — Skip the Research Agent for previously researched topics
  2. Use structured outputs — Pydantic models reduce wasted tokens on formatting
  3. Set max_turns — Prevent agents from looping excessively
  4. Use gpt-4o-mini by default — Only upgrade models where quality is critical

OpenAI Agents SDK vs LangGraph vs CrewAI — When to Use Which

If you are evaluating agent frameworks, here is how they compare:

| Feature | OpenAI Agents SDK | LangGraph | CrewAI |
| --- | --- | --- | --- |
| Philosophy | Minimal, Pythonic | Graph-based, explicit | Role-based, high-level |
| Learning curve | Low | Medium-High | Low-Medium |
| Multi-agent pattern | Handoffs + tools | State graphs + nodes | Crews + tasks |
| Structured output | Native Pydantic | Via output parsers | Built-in |
| Guardrails | Built-in (input/output) | Custom nodes | Limited |
| LLM support | OpenAI native, 100+ via LiteLLM | Any LLM via LangChain | Multiple providers |
| State management | Context object | Explicit state graph | Shared memory |
| Streaming | Built-in | Built-in | Limited |
| Best for | OpenAI-first teams, rapid prototyping | Complex workflows with branching | Team simulations, role-play agents |
| Production readiness | High | High | Medium |

Choose OpenAI Agents SDK when:

  • You primarily use OpenAI models
  • You want the fastest path from prototype to production
  • Your workflow is a pipeline or triage pattern
  • You need built-in guardrails without extra dependencies

Choose LangGraph when:

  • Your workflow has complex branching and cycles
  • You need fine-grained control over state transitions
  • You want explicit, visual workflow graphs
  • You are already in the LangChain ecosystem

We covered LangGraph in depth in our LangGraph step-by-step tutorial — if you want to compare both frameworks hands-on, work through both tutorials with the same project.

Choose CrewAI when:

  • You think in terms of team roles and collaboration
  • You want the highest-level abstraction
  • Your use case is research, analysis, or content generation
  • You prefer convention over configuration

Advanced Patterns Worth Knowing

Dynamic Instructions

Agent behavior can adapt at runtime:

from agents import Agent, RunContextWrapper


def dynamic_instructions(ctx: RunContextWrapper, agent: Agent) -> str:
    user_tier = ctx.context.get("tier", "free")
    if user_tier == "pro":
        return "Provide detailed, in-depth analysis with code examples."
    return "Provide a concise summary suitable for beginners."


adaptive_agent = Agent(
    name="Adaptive Agent",
    instructions=dynamic_instructions,
    model="gpt-4o-mini",
)

Parallel Agent Execution

Run independent agents simultaneously with asyncio.gather:

import asyncio
from agents import Runner


async def parallel_research(topics: list[str]):
    tasks = [
        Runner.run(research_agent, f"Research: {topic}")
        for topic in topics
    ]
    results = await asyncio.gather(*tasks)
    return results

Agent Cloning

Create agent variants without duplicating configuration:

formal_writer = writer_agent.clone(
    name="Formal Writer",
    instructions="Write in a formal, academic tone. " + writer_agent.instructions,
)

casual_writer = writer_agent.clone(
    name="Casual Writer",
    instructions="Write in a casual, conversational tone. " + writer_agent.instructions,
)

Common Pitfalls and How to Avoid Them

| Pitfall | Solution |
| --- | --- |
| Agents looping infinitely | Set max_turns on Runner.run() |
| Vague handoff behavior | Write explicit handoff instructions in the agent's prompt |
| Unstructured data between agents | Use output_type with Pydantic models |
| High costs from GPT-4o | Use gpt-4o-mini for most agents, upgrade selectively |
| Guardrail false positives | Test guardrails independently before integrating |
| Lost context in handoffs | Use input_filter on handoffs to control what the next agent sees |

FAQ

What is the OpenAI Agents SDK?

The OpenAI Agents SDK is an open-source Python framework for building single-agent and multi-agent AI systems. It provides primitives for agent creation, tool use, inter-agent handoffs, and input/output guardrails. It is the production successor to OpenAI's experimental Swarm library.

How do I install the OpenAI Agents SDK?

Install it via pip: pip install openai-agents==0.13.4. The SDK requires Python 3.10 or higher. Set your OPENAI_API_KEY environment variable before running any agent code.

What is the difference between handoffs and agents-as-tools?

Handoffs transfer control entirely — the receiving agent becomes the active agent and responds directly. Agents-as-tools keeps the orchestrator in control — specialist agents run as tool calls and return results to the orchestrator. Use handoffs for routing, agents-as-tools for coordination.

Can I use non-OpenAI models with the Agents SDK?

Yes. The SDK supports over 100 LLMs through LiteLLM integration. You can use Anthropic, Google, Mistral, and local models — though OpenAI models have the most native support.

How much does it cost to run a multi-agent pipeline?

With gpt-4o-mini, a three-agent pipeline typically costs under $0.01 per run. See our cost breakdown for detailed estimates.

Is the OpenAI Agents SDK a replacement for Swarm?

Yes. The Agents SDK is the production-ready evolution of OpenAI's experimental Swarm library. It adds structured outputs, guardrails, streaming, and MCP tool support that Swarm did not have.

How do guardrails work in the OpenAI Agents SDK?

Input guardrails validate user input before or in parallel with the first agent. Output guardrails check the final agent's response. If a guardrail triggers its tripwire, the SDK raises an exception that you can catch and handle. Tool guardrails can also validate individual function calls.


What to Build Next

You now have a working multi-agent pipeline. Here are some directions to take it further:

  1. Add real search tools — Connect to Brave Search, Serper, or Tavily for live web data
  2. Combine with RAG — Use retrieval-augmented generation to ground your agents in your own documents
  3. Add MCP tools — The SDK has built-in MCP server support for connecting to external services
  4. Build a UI — Wrap the pipeline in a Streamlit or Gradio interface
  5. Explore vibe coding tools — Use AI app builders to create a frontend for your agent pipeline

If you are exploring the broader AI development ecosystem, check out our guide to free AI coding tools and see how tools like Claude Code approach multi-agent patterns differently with subagents and commands.


Wrapping Up

The OpenAI Agents SDK makes multi-agent systems accessible without requiring deep framework expertise. The core pattern is simple:

  1. Define agents with clear instructions and tools
  2. Connect them via handoffs or the orchestrator pattern
  3. Add guardrails to enforce quality and safety
  4. Run with the Runner and let the SDK handle orchestration

The hardest part is not the code — it is designing clear agent boundaries and instructions. Spend your time there, and the SDK handles the rest.

All code from this tutorial is available in the project structure above. Clone it, swap in your own tools and prompts, and start building.


This article is part of Effloow's AI Agent Tutorial series. We build and test every framework we write about — see how we run our own company with 16 AI agents.

Some links in this article may be affiliate links. We only recommend tools we have actually tested. See our affiliate disclosure for details.

