
Jangwook Kim

Posted on • Originally published at effloow.com

Build Your First Multi-Agent System with OpenAI Agents SDK — Step-by-Step Python Tutorial (2026)


You have heard the buzz around AI agents. You have probably built a single-agent chatbot. But real-world automation needs multiple agents working together — one to research, one to write, one to review and catch mistakes.

The OpenAI Agents SDK makes this surprisingly straightforward. In this tutorial, you will build a complete multi-agent content pipeline: a Research Agent gathers information, a Writer Agent drafts content, and a Reviewer Agent validates quality with guardrails. All orchestrated through handoffs, running in under 100 lines of core logic.

By the end, you will understand every building block — Agent, Runner, Handoff, and Guardrails — and have a working system you can adapt to your own projects.

What you will build: A three-agent content pipeline where agents hand off work to each other automatically. The Research Agent finds information, the Writer Agent creates a draft, and the Reviewer Agent enforces quality standards using guardrails.


What Is the OpenAI Agents SDK?

The OpenAI Agents SDK is an open-source Python framework for building multi-agent AI systems. Originally developed as a successor to the experimental Swarm library, it provides production-ready primitives for creating agents that can use tools, delegate work to each other, and enforce safety checks — all with minimal boilerplate.

Key characteristics:

| Feature | Detail |
| --- | --- |
| Language | Python (3.10+) |
| Current version | 0.13.4 (April 2026) |
| License | MIT |
| LLM support | OpenAI models natively, 100+ models via LiteLLM |
| Core primitives | Agent, Runner, Handoff, Guardrails, Tools |
| Install size | Lightweight — Pydantic and Requests as main dependencies |

Unlike heavier frameworks that require you to learn complex graph abstractions, the OpenAI Agents SDK keeps things Pythonic. You define agents as objects, wire them together with handoffs, and run them with a single Runner.run() call.

If you have worked with the LangGraph framework (covered in our previous tutorial), the Agents SDK takes a different philosophy: less explicit graph construction, more implicit orchestration through handoffs and tool calls.


Prerequisites

Before we start building, make sure you have:

  • Python 3.10 or higher (the SDK requires 3.10+, supports up to 3.14)
  • An OpenAI API key with access to GPT-4o or later models
  • Basic Python knowledge — functions, classes, async/await
  • A terminal and a code editor

Installation

Create a project directory and set up a virtual environment:

mkdir multi-agent-pipeline && cd multi-agent-pipeline
python -m venv venv
source venv/bin/activate  # On Windows: venv\Scripts\activate

Install the SDK with a pinned version:

pip install openai-agents==0.13.4

Create a requirements.txt for reproducibility:

openai-agents==0.13.4
pydantic>=2.0

API Key Setup

Set your OpenAI API key as an environment variable:

export OPENAI_API_KEY="sk-your-key-here"

Or create a .env file (never commit this to version control):

OPENAI_API_KEY=sk-your-key-here

Cost note: This tutorial uses gpt-4o-mini for most agents to keep costs low. A full pipeline run typically consumes 3,000–8,000 tokens. At current pricing (approximately $0.15 per 1M input tokens and $0.60 per 1M output tokens for gpt-4o-mini), each run costs well under $0.01. We break down real costs in the cost analysis section below.


Core Concepts: Agent, Runner, Handoff, Guardrails

Before writing code, let us understand the four building blocks. These are the only concepts you need to build sophisticated multi-agent systems.

Agent

An Agent is an LLM equipped with instructions and tools. Think of it as a specialized worker with a clear job description.

from agents import Agent

research_agent = Agent(
    name="Research Agent",
    instructions="You are a research specialist. Find accurate, up-to-date information on any topic.",
    model="gpt-4o-mini",
)

Key parameters:

  • name — Human-readable identifier
  • instructions — The system prompt that defines the agent's behavior
  • model — Which LLM to use
  • tools — Functions the agent can call
  • handoffs — Other agents it can delegate to
  • output_type — Pydantic model for structured output

Runner

The Runner executes agents. It manages the agent loop: call the LLM, process tool calls, handle handoffs, and repeat until the agent produces a final output.

from agents import Runner

result = Runner.run_sync(research_agent, "What is retrieval-augmented generation?")
print(result.final_output)

Three execution modes:

| Method | Use case |
| --- | --- |
| Runner.run() | Async execution (recommended for production) |
| Runner.run_sync() | Synchronous wrapper (simpler for scripts) |
| Runner.run_streamed() | Async with streaming events |
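Conceptually, the loop the Runner manages can be sketched without the SDK at all. The following is an illustrative simulation only: `fake_llm`, `run_tool`, and `run_agent` are stand-ins invented for this sketch, not SDK API.

```python
def fake_llm(messages):
    """Stand-in for an LLM call: requests a tool once, then answers."""
    if not any(m["role"] == "tool" for m in messages):
        return {"tool_call": ("web_search", "RAG")}
    return {"final_output": "RAG augments an LLM with retrieved documents."}


def run_tool(name, arg):
    """Stand-in for executing a registered tool."""
    return f"results for '{arg}' from {name}"


def run_agent(user_input, max_turns=10):
    """The shape of the agent loop: call model, run tools, repeat."""
    messages = [{"role": "user", "content": user_input}]
    for _ in range(max_turns):
        reply = fake_llm(messages)
        if "final_output" in reply:      # model produced a final answer
            return reply["final_output"]
        name, arg = reply["tool_call"]   # otherwise execute the requested tool
        messages.append({"role": "tool", "content": run_tool(name, arg)})
    raise RuntimeError("max_turns exceeded")


print(run_agent("What is retrieval-augmented generation?"))
```

The real Runner does far more (handoffs, guardrails, structured output parsing), but this is the core cycle, and it is why `max_turns` matters: without a cap, a confused model can loop indefinitely.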

Handoff

A Handoff lets one agent delegate work to another. Under the hood, handoffs appear as tools to the LLM — when the triage agent decides to hand off to the writer, it calls a transfer_to_writer_agent tool.

triage_agent = Agent(
    name="Triage Agent",
    instructions="Route research requests to the Research Agent, writing tasks to the Writer Agent.",
    handoffs=[research_agent, writer_agent],
)
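Because a handoff is exposed to the model as a tool, the SDK derives a default tool name from the target agent's name. A rough re-implementation of that naming convention (my own sketch for illustration, not the SDK's actual code):

```python
def handoff_tool_name(agent_name: str) -> str:
    """Approximate the default handoff tool naming:
    'Writer Agent' -> 'transfer_to_writer_agent'."""
    return "transfer_to_" + agent_name.strip().lower().replace(" ", "_")


print(handoff_tool_name("Writer Agent"))  # transfer_to_writer_agent
```

Keeping agent names short and descriptive therefore pays off twice: in your logs and in the tool names the model sees.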

Guardrails

Guardrails validate inputs and outputs. They can run in parallel with agent execution (for speed) or block execution until validation passes (for safety).

from agents import input_guardrail, GuardrailFunctionOutput

@input_guardrail
async def check_topic_safety(ctx, agent, input):
    # Validate the input before the agent processes it
    result = ...  # Your validation logic
    return GuardrailFunctionOutput(
        output_info={"safe": True},
        tripwire_triggered=False,
    )

Project Structure

Here is what we are building:

multi-agent-pipeline/
├── requirements.txt
├── pipeline_agents/
│   ├── __init__.py
│   ├── research_agent.py
│   ├── writer_agent.py
│   └── reviewer_agent.py
├── tools/
│   ├── __init__.py
│   └── search_tools.py
├── guardrails/
│   ├── __init__.py
│   └── quality_checks.py
├── main.py
└── run_pipeline.py

Note: the local package is named pipeline_agents rather than agents. A local agents/ directory would shadow the SDK's own agents package and break every from agents import ... statement.

Let us build each piece step by step.


Building Agent #1: The Research Agent

The Research Agent's job is to gather information on a given topic. We will give it a web search tool and structured output so downstream agents get clean data.

Define the Output Schema

First, define what the Research Agent should return:

# pipeline_agents/research_agent.py
from pydantic import BaseModel, Field
from agents import Agent, function_tool


class ResearchResult(BaseModel):
    """Structured output from the Research Agent."""
    topic: str = Field(description="The researched topic")
    summary: str = Field(description="A 2-3 paragraph summary of findings")
    key_facts: list[str] = Field(description="5-8 key facts discovered")
    sources_note: str = Field(description="Note about information sources and currency")

Create the Search Tool

The Research Agent needs a tool to search for information. Here we create a simulated search tool — in production, you would connect this to a real search API:

# tools/search_tools.py
from agents import function_tool


@function_tool
def web_search(query: str) -> str:
    """Search the web for information on a given query.

    Args:
        query: The search query to look up.
    """
    # In production, connect to a real search API (Brave, Serper, Tavily, etc.)
    # For this tutorial, the agent will use its training knowledge
    # and note that results should be verified.
    return (
        f"Search results for: '{query}'\n"
        f"Note: In production, this would return real search results. "
        f"The agent should use its knowledge and clearly mark any claims "
        f"that need verification."
    )


@function_tool
def save_research_notes(topic: str, notes: str) -> str:
    """Save research notes for a topic.

    Args:
        topic: The topic being researched.
        notes: The research notes to save.
    """
    # In production, persist to a database or file
    return f"Research notes saved for topic: {topic}"

Assemble the Research Agent

# pipeline_agents/research_agent.py (continued)
from tools.search_tools import web_search, save_research_notes

research_agent = Agent(
    name="Research Agent",
    instructions="""You are an expert research analyst. Your job is to gather
accurate, comprehensive information on any given topic.

Rules:
- Search for the topic using the web_search tool
- Compile findings into a structured format
- Include 5-8 key facts with specific details
- Note the currency and reliability of information
- If you cannot verify a claim, mark it as [UNVERIFIED]
- Never fabricate statistics or quotes""",
    model="gpt-4o-mini",
    tools=[web_search, save_research_notes],
    output_type=ResearchResult,
)

Test It Standalone

from agents import Runner

result = Runner.run_sync(
    research_agent,
    "Research the current state of multi-agent AI systems in 2026"
)
print(f"Topic: {result.final_output.topic}")
print(f"Summary: {result.final_output.summary}")
for fact in result.final_output.key_facts:
    print(f"- {fact}")

Pro tip: Using output_type=ResearchResult forces the agent to return a Pydantic model instead of free text. This is critical for multi-agent pipelines — downstream agents receive predictable, typed data instead of parsing unstructured strings. The SDK handles JSON schema generation and validation automatically.
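For reference, the schema sent to the model for ResearchResult looks roughly like this hand-written approximation (the real generated schema may include extra keys such as titles and field descriptions):

```python
import json

# Hand-written approximation of the JSON schema Pydantic generates for
# ResearchResult -- illustrative shape only, not the SDK's exact output.
research_result_schema = {
    "type": "object",
    "properties": {
        "topic": {"type": "string"},
        "summary": {"type": "string"},
        "key_facts": {"type": "array", "items": {"type": "string"}},
        "sources_note": {"type": "string"},
    },
    "required": ["topic", "summary", "key_facts", "sources_note"],
}

print(json.dumps(research_result_schema, indent=2))
```

The model is constrained to emit JSON matching this shape, which is what makes the downstream agents' inputs predictable.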


Building Agent #2: The Writer Agent

The Writer Agent takes research output and produces a well-structured draft. It receives the Research Agent's structured output as its input context.

Define the Writer Output

# pipeline_agents/writer_agent.py
from pydantic import BaseModel, Field
from agents import Agent


class WriterOutput(BaseModel):
    """Structured output from the Writer Agent."""
    title: str = Field(description="Article title")
    draft: str = Field(description="The full article draft in markdown")
    word_count: int = Field(description="Approximate word count")
    sections: list[str] = Field(description="List of section headings used")

Assemble the Writer Agent

# pipeline_agents/writer_agent.py (continued)

writer_agent = Agent(
    name="Writer Agent",
    instructions="""You are a skilled technical writer. Your job is to take
research findings and produce a well-structured, engaging article draft.

Rules:
- Write in a clear, practical tone suitable for developers
- Use markdown formatting with proper headings (##, ###)
- Include code examples where relevant
- Target 800-1200 words for the draft
- Structure: Introduction → Main Sections → Practical Takeaways → Conclusion
- Never fabricate quotes, statistics, or case studies
- If the research notes something as [UNVERIFIED], keep that marker""",
    model="gpt-4o-mini",
    output_type=WriterOutput,
)

Notice the Writer Agent has no tools — it is a pure text generation agent. Not every agent needs tools. The Writer focuses entirely on transforming structured research into polished prose.


Building Agent #3: The Reviewer Agent with Guardrails

The Reviewer Agent is our quality gate. It checks the draft for accuracy, completeness, and quality issues. This is where guardrails shine.

Define Quality Check Guardrails

# guardrails/quality_checks.py
from agents import output_guardrail, GuardrailFunctionOutput


@output_guardrail
async def check_no_fabrication(ctx, agent, output):
    """Check that the output does not contain fabricated data markers."""
    draft_text = output.draft if hasattr(output, 'draft') else str(output)

    fabrication_markers = [
        "according to a study",  # vague attribution without source
        "research shows that 99%",  # suspicious round statistics
        "as John Smith, CEO",  # likely fabricated quotes
    ]

    issues = []
    for marker in fabrication_markers:
        if marker.lower() in draft_text.lower():
            issues.append(f"Potential fabrication detected: '{marker}'")

    return GuardrailFunctionOutput(
        output_info={"issues": issues, "passed": len(issues) == 0},
        tripwire_triggered=len(issues) > 0,
    )


@output_guardrail
async def check_minimum_length(ctx, agent, output):
    """Ensure the draft meets minimum word count."""
    draft_text = output.draft if hasattr(output, 'draft') else str(output)
    word_count = len(draft_text.split())

    return GuardrailFunctionOutput(
        output_info={"word_count": word_count, "minimum": 200},
        tripwire_triggered=word_count < 200,
    )

Define the Review Output

# pipeline_agents/reviewer_agent.py
from pydantic import BaseModel, Field
from agents import Agent
from guardrails.quality_checks import check_no_fabrication, check_minimum_length


class ReviewResult(BaseModel):
    """Structured output from the Reviewer Agent."""
    approved: bool = Field(description="Whether the draft passes review")
    score: int = Field(description="Quality score from 1-10")
    feedback: list[str] = Field(description="List of feedback items")
    final_draft: str = Field(description="The approved or revised draft")

Assemble the Reviewer Agent

# pipeline_agents/reviewer_agent.py (continued)

reviewer_agent = Agent(
    name="Reviewer Agent",
    instructions="""You are a meticulous content reviewer and editor. Your job
is to evaluate article drafts for quality, accuracy, and completeness.

Review checklist:
1. Factual accuracy — flag any claims that seem unsupported
2. Structure — verify logical flow and proper headings
3. Completeness — ensure the topic is covered adequately
4. Tone — confirm it matches a practical, developer-friendly style
5. No fabrication — reject any invented statistics, quotes, or case studies

Scoring guide:
- 8-10: Approve with minor notes
- 5-7: Needs revision, provide specific feedback
- 1-4: Reject, major issues found

If approved, return the draft as-is in final_draft.
If revisions are needed, apply them yourself and return the improved version.""",
    model="gpt-4o-mini",
    output_type=ReviewResult,
    output_guardrails=[check_no_fabrication, check_minimum_length],
)

Pro tip: Output guardrails run after the agent produces its result but before it is returned to your code. If a guardrail trips, the SDK raises OutputGuardrailTripwireTriggered, giving you a chance to handle the failure programmatically. This is different from input guardrails, which can run in parallel with the agent for lower latency.
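The control flow you get when a tripwire fires can be simulated without the SDK. In real code you would catch OutputGuardrailTripwireTriggered from the agents package; here a hypothetical TripwireTriggered exception stands in for it:

```python
# Simulated tripwire mechanics -- NOT the SDK, just the control flow it
# produces when a guardrail rejects an output.

class TripwireTriggered(Exception):
    """Stand-in for the SDK's OutputGuardrailTripwireTriggered."""
    def __init__(self, info):
        super().__init__("guardrail tripwire triggered")
        self.info = info


def check_minimum_length(draft: str, minimum: int = 200) -> dict:
    """Same logic as the article's minimum-length guardrail."""
    words = len(draft.split())
    return {"word_count": words, "tripped": words < minimum}


def run_with_guardrail(draft: str) -> str:
    result = check_minimum_length(draft)
    if result["tripped"]:        # guardrail fails: raise instead of returning
        raise TripwireTriggered(result)
    return draft                 # guardrail passes: output reaches the caller


try:
    run_with_guardrail("Too short to publish.")
except TripwireTriggered as e:
    print(f"Rejected: only {e.info['word_count']} words")  # Rejected: only 4 words
```

The point of the exception-based design is that a rejected output never silently reaches your calling code; you must decide what to do (retry, fall back, or surface an error).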


Orchestrating Multi-Agent Handoffs

Now we connect all three agents. There are two patterns for this, and we will show both.

Pattern 1: Handoffs (Delegation Chain)

With handoffs, each agent delegates to the next. The Research Agent hands off to the Writer, who hands off to the Reviewer.

# main.py — Handoff pattern
from agents import Agent, Runner

# Import from the local pipeline_agents package. Naming it "agents" would
# shadow the SDK's agents package and break the import above.
from pipeline_agents.research_agent import research_agent, ResearchResult
from pipeline_agents.writer_agent import writer_agent
from pipeline_agents.reviewer_agent import reviewer_agent


# Wire up handoffs: Research → Writer → Reviewer
research_agent_with_handoff = Agent(
    name="Research Agent",
    instructions=research_agent.instructions + """

After completing your research, hand off to the Writer Agent
with your findings so they can draft the article.""",
    model="gpt-4o-mini",
    tools=research_agent.tools,
    handoffs=[writer_agent],
)

writer_agent_with_handoff = Agent(
    name="Writer Agent",
    instructions=writer_agent.instructions + """

After completing the draft, hand off to the Reviewer Agent
for quality review.""",
    model="gpt-4o-mini",
    handoffs=[reviewer_agent],
)


def run_with_handoffs(topic: str):
    """Run the full pipeline using the handoff pattern."""
    result = Runner.run_sync(
        research_agent_with_handoff,
        f"Research and produce an article about: {topic}",
        max_turns=30,
    )
    print(f"Final agent: {result.last_agent.name}")
    print(f"Output: {result.final_output}")
    return result

Pattern 2: Agents as Tools (Orchestrator)

With the orchestrator pattern, a manager agent calls specialist agents as tools:

# main.py — Orchestrator pattern
from agents import Agent, Runner


orchestrator = Agent(
    name="Content Pipeline Orchestrator",
    instructions="""You manage a content production pipeline. For any topic:

1. First, use the research tool to gather information
2. Then, use the writing tool to produce a draft from the research
3. Finally, use the review tool to check quality

Pass the full output from each step to the next tool.
Return the final reviewed draft to the user.""",
    model="gpt-4o-mini",
    tools=[
        research_agent.as_tool(
            tool_name="research_topic",
            tool_description="Research a topic and return structured findings with key facts.",
        ),
        writer_agent.as_tool(
            tool_name="write_draft",
            tool_description="Write an article draft based on provided research findings.",
        ),
        reviewer_agent.as_tool(
            tool_name="review_draft",
            tool_description="Review an article draft for quality, accuracy, and completeness.",
        ),
    ],
)


def run_with_orchestrator(topic: str):
    """Run the full pipeline using the orchestrator pattern."""
    result = Runner.run_sync(
        orchestrator,
        f"Produce a reviewed article about: {topic}",
        max_turns=15,
    )
    print(f"Output: {result.final_output}")
    return result

Which Pattern Should You Use?

| Aspect | Handoffs | Agents as Tools |
| --- | --- | --- |
| Control | Each agent decides when to hand off | Orchestrator controls flow |
| Visibility | Active agent changes mid-run | Orchestrator sees all outputs |
| Best for | Linear pipelines, customer service routing | Complex coordination, parallel tasks |
| Guardrails | Input on first agent, output on last | Can apply at orchestrator level |
| Debugging | Follow the handoff chain | Check orchestrator's tool calls |

For our content pipeline, the orchestrator pattern gives more control since we want to pass structured data between steps. The handoff pattern works better for conversational routing where you do not know the path in advance.


Putting It All Together: The Run Script

Here is the complete pipeline using the orchestrator pattern:

# run_pipeline.py
import asyncio
from agents import Agent, Runner, function_tool
from pydantic import BaseModel, Field


# ── Output Schemas ──────────────────────────────

class ResearchResult(BaseModel):
    topic: str = Field(description="The researched topic")
    summary: str = Field(description="2-3 paragraph summary")
    key_facts: list[str] = Field(description="5-8 key facts")


class ReviewResult(BaseModel):
    approved: bool
    score: int = Field(ge=1, le=10)
    feedback: list[str]
    final_draft: str


# ── Tools ───────────────────────────────────────

@function_tool
def web_search(query: str) -> str:
    """Search the web for current information.

    Args:
        query: The search query.
    """
    return f"Results for '{query}': Use your knowledge and mark unverified claims."


# ── Agents ──────────────────────────────────────

research_agent = Agent(
    name="Research Agent",
    instructions=(
        "You are a research specialist. Use web_search to find information. "
        "Return structured findings with key facts. Mark anything unverified."
    ),
    model="gpt-4o-mini",
    tools=[web_search],
    output_type=ResearchResult,
)

writer_agent = Agent(
    name="Writer Agent",
    instructions=(
        "You are a technical writer. Take research findings and write a clear, "
        "well-structured article in markdown. Target 800-1200 words. "
        "Never fabricate data."
    ),
    model="gpt-4o-mini",
)

reviewer_agent = Agent(
    name="Reviewer Agent",
    instructions=(
        "You are a content reviewer. Check the draft for accuracy, structure, "
        "and quality. Score 1-10. If score >= 7, approve. Return the final draft."
    ),
    model="gpt-4o-mini",
    output_type=ReviewResult,
)

# ── Orchestrator ────────────────────────────────

orchestrator = Agent(
    name="Pipeline Orchestrator",
    instructions=(
        "You manage a content pipeline. For any topic:\n"
        "1. Call research_topic to gather information\n"
        "2. Call write_draft with the research results\n"
        "3. Call review_draft with the written draft\n"
        "Return the reviewer's final output to the user."
    ),
    model="gpt-4o-mini",
    tools=[
        research_agent.as_tool(
            tool_name="research_topic",
            tool_description="Research a topic thoroughly.",
        ),
        writer_agent.as_tool(
            tool_name="write_draft",
            tool_description="Write an article from research findings.",
        ),
        reviewer_agent.as_tool(
            tool_name="review_draft",
            tool_description="Review and score an article draft.",
        ),
    ],
)


# ── Run ─────────────────────────────────────────

async def main():
    topic = "How multi-agent AI systems are changing software development in 2026"

    print(f"Starting pipeline for: {topic}\n")
    result = await Runner.run(orchestrator, f"Produce a reviewed article about: {topic}")

    print("=" * 60)
    print("Pipeline complete!")
    print(f"Final output:\n{result.final_output}")
    print(f"\nToken usage: {result.raw_responses[-1].usage if result.raw_responses else 'N/A'}")


if __name__ == "__main__":
    asyncio.run(main())

Run it:

python run_pipeline.py

You should see the orchestrator call each agent in sequence, producing a researched, written, and reviewed article.


Real Cost Breakdown

One of the most common questions about multi-agent systems: how much does it cost to run?

Here is a realistic breakdown for our three-agent pipeline using gpt-4o-mini:

| Agent | Input Tokens (est.) | Output Tokens (est.) | Cost per Run |
| --- | --- | --- | --- |
| Research Agent | ~1,500 | ~800 | ~$0.0007 |
| Writer Agent | ~2,000 | ~1,500 | ~$0.0012 |
| Reviewer Agent | ~2,500 | ~600 | ~$0.0008 |
| Orchestrator overhead | ~1,000 | ~500 | ~$0.0005 |
| Total | ~7,000 | ~3,400 | ~$0.003 |

Note: These are estimates based on gpt-4o-mini pricing as of April 2026 (~$0.15/1M input, ~$0.60/1M output tokens). Actual costs vary by prompt length and output verbosity. Always check OpenAI's pricing page for current rates before production use.

Scaling the math:

  • 100 articles/day: ~$0.30/day
  • 1,000 articles/day: ~$3.00/day
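These figures are easy to sanity-check with a few lines of Python, using the same assumed gpt-4o-mini rates (verify them against OpenAI's current pricing before relying on this):

```python
# Back-of-envelope cost check using the rates assumed in this article:
# $0.15 per 1M input tokens, $0.60 per 1M output tokens for gpt-4o-mini.
INPUT_RATE = 0.15 / 1_000_000
OUTPUT_RATE = 0.60 / 1_000_000


def run_cost(input_tokens: int, output_tokens: int) -> float:
    """Dollar cost of one pipeline run at the assumed rates."""
    return input_tokens * INPUT_RATE + output_tokens * OUTPUT_RATE


per_run = run_cost(7_000, 3_400)
print(f"per run:            ${per_run:.4f}")        # roughly $0.003
print(f"100 articles/day:   ${per_run * 100:.2f}")
print(f"1,000 articles/day: ${per_run * 1_000:.2f}")
```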

If you switch to gpt-4o for higher quality output, costs increase roughly 15–20x. A common pattern: use gpt-4o-mini for research and writing, gpt-4o for the reviewer agent where quality judgment matters most.

Reducing Costs Further

  1. Cache research results — Skip the Research Agent for previously researched topics
  2. Use structured outputs — Pydantic models reduce wasted tokens on formatting
  3. Set max_turns — Prevent agents from looping excessively
  4. Use gpt-4o-mini by default — Only upgrade models where quality is critical

OpenAI Agents SDK vs LangGraph vs CrewAI — When to Use Which

If you are evaluating agent frameworks, here is how they compare:

| Feature | OpenAI Agents SDK | LangGraph | CrewAI |
| --- | --- | --- | --- |
| Philosophy | Minimal, Pythonic | Graph-based, explicit | Role-based, high-level |
| Learning curve | Low | Medium-High | Low-Medium |
| Multi-agent pattern | Handoffs + tools | State graphs + nodes | Crews + tasks |
| Structured output | Native Pydantic | Via output parsers | Built-in |
| Guardrails | Built-in (input/output) | Custom nodes | Limited |
| LLM support | OpenAI native, 100+ via LiteLLM | Any LLM via LangChain | Multiple providers |
| State management | Context object | Explicit state graph | Shared memory |
| Streaming | Built-in | Built-in | Limited |
| Best for | OpenAI-first teams, rapid prototyping | Complex workflows with branching | Team simulations, role-play agents |
| Production readiness | High | High | Medium |

Choose OpenAI Agents SDK when:

  • You primarily use OpenAI models
  • You want the fastest path from prototype to production
  • Your workflow is a pipeline or triage pattern
  • You need built-in guardrails without extra dependencies

Choose LangGraph when:

  • Your workflow has complex branching and cycles
  • You need fine-grained control over state transitions
  • You want explicit, visual workflow graphs
  • You are already in the LangChain ecosystem

We covered LangGraph in depth in our LangGraph step-by-step tutorial — if you want to compare both frameworks hands-on, work through both tutorials with the same project.

Choose CrewAI when:

  • You think in terms of team roles and collaboration
  • You want the highest-level abstraction
  • Your use case is research, analysis, or content generation
  • You prefer convention over configuration

Advanced Patterns Worth Knowing

Dynamic Instructions

Agent behavior can adapt at runtime:

from agents import Agent, RunContextWrapper


def dynamic_instructions(ctx: RunContextWrapper, agent: Agent) -> str:
    user_tier = ctx.context.get("tier", "free")
    if user_tier == "pro":
        return "Provide detailed, in-depth analysis with code examples."
    return "Provide a concise summary suitable for beginners."


adaptive_agent = Agent(
    name="Adaptive Agent",
    instructions=dynamic_instructions,
    model="gpt-4o-mini",
)

Parallel Agent Execution

Run independent agents simultaneously with asyncio.gather:

import asyncio
from agents import Runner


async def parallel_research(topics: list[str]):
    tasks = [
        Runner.run(research_agent, f"Research: {topic}")
        for topic in topics
    ]
    results = await asyncio.gather(*tasks)
    return results

Agent Cloning

Create agent variants without duplicating configuration:

formal_writer = writer_agent.clone(
    name="Formal Writer",
    instructions="Write in a formal, academic tone. " + writer_agent.instructions,
)

casual_writer = writer_agent.clone(
    name="Casual Writer",
    instructions="Write in a casual, conversational tone. " + writer_agent.instructions,
)

Common Pitfalls and How to Avoid Them

| Pitfall | Solution |
| --- | --- |
| Agents looping infinitely | Set max_turns on Runner.run() |
| Vague handoff behavior | Write explicit handoff instructions in the agent's prompt |
| Unstructured data between agents | Use output_type with Pydantic models |
| High costs from GPT-4o | Use gpt-4o-mini for most agents, upgrade selectively |
| Guardrail false positives | Test guardrails independently before integrating |
| Lost context in handoffs | Use input_filter on handoffs to control what the next agent sees |

FAQ

What is the OpenAI Agents SDK?

The OpenAI Agents SDK is an open-source Python framework for building single-agent and multi-agent AI systems. It provides primitives for agent creation, tool use, inter-agent handoffs, and input/output guardrails. It is the production successor to OpenAI's experimental Swarm library.

How do I install the OpenAI Agents SDK?

Install it via pip: pip install openai-agents==0.13.4. The SDK requires Python 3.10 or higher. Set your OPENAI_API_KEY environment variable before running any agent code.

What is the difference between handoffs and agents-as-tools?

Handoffs transfer control entirely — the receiving agent becomes the active agent and responds directly. Agents-as-tools keeps the orchestrator in control — specialist agents run as tool calls and return results to the orchestrator. Use handoffs for routing, agents-as-tools for coordination.

Can I use non-OpenAI models with the Agents SDK?

Yes. The SDK supports over 100 LLMs through LiteLLM integration. You can use Anthropic, Google, Mistral, and local models — though OpenAI models have the most native support.

How much does it cost to run a multi-agent pipeline?

With gpt-4o-mini, a three-agent pipeline typically costs under $0.01 per run. See our cost breakdown for detailed estimates.

Is the OpenAI Agents SDK a replacement for Swarm?

Yes. The Agents SDK is the production-ready evolution of OpenAI's experimental Swarm library. It adds structured outputs, guardrails, streaming, and MCP tool support that Swarm did not have.

How do guardrails work in the OpenAI Agents SDK?

Input guardrails validate user input before or in parallel with the first agent. Output guardrails check the final agent's response. If a guardrail triggers its tripwire, the SDK raises an exception that you can catch and handle. Tool guardrails can also validate individual function calls.


What to Build Next

You now have a working multi-agent pipeline. Here are some directions to take it further:

  1. Add real search tools — Connect to Brave Search, Serper, or Tavily for live web data
  2. Combine with RAG — Use retrieval-augmented generation to ground your agents in your own documents
  3. Add MCP tools — The SDK has built-in MCP server support for connecting to external services
  4. Build a UI — Wrap the pipeline in a Streamlit or Gradio interface
  5. Explore vibe coding tools — Use AI app builders to create a frontend for your agent pipeline

If you are exploring the broader AI development ecosystem, check out our guide to free AI coding tools and see how tools like Claude Code approach multi-agent patterns differently with subagents and commands.


Wrapping Up

The OpenAI Agents SDK makes multi-agent systems accessible without requiring deep framework expertise. The core pattern is simple:

  1. Define agents with clear instructions and tools
  2. Connect them via handoffs or the orchestrator pattern
  3. Add guardrails to enforce quality and safety
  4. Run with the Runner and let the SDK handle orchestration

The hardest part is not the code — it is designing clear agent boundaries and instructions. Spend your time there, and the SDK handles the rest.

All code from this tutorial is available in the project structure above. Clone it, swap in your own tools and prompts, and start building.


This article is part of Effloow's AI Agent Tutorial series. We build and test every framework we write about — see how we run our own company with 16 AI agents.

Some links in this article may be affiliate links. We only recommend tools we have actually tested. See our affiliate disclosure for details.

