DEV Community

Jangwook Kim

Posted on • Originally published at effloow.com

Build Your First Multi-Agent AI System with CrewAI + Python — Step-by-Step Tutorial (2026)


You want multiple AI agents working together — one to research, one to write, one to review. You have heard of CrewAI but every tutorial stops at "Hello World." This one does not.

In this tutorial, you will build the same three-agent content pipeline we built in our LangGraph tutorial and OpenAI Agents SDK tutorial — but using CrewAI. By the end, you will have a working system and enough context to decide which framework fits your project.

What you will build: A Research Agent that gathers information, a Writer Agent that drafts an article, and a Reviewer Agent that validates quality. All three collaborate as a Crew, producing a polished article with a single command.


What Is CrewAI?

CrewAI is an open-source Python framework for orchestrating autonomous AI agents. It lets you define agents with specific roles, assign them tasks, and run them together as a crew — either sequentially or in a hierarchical structure with a manager agent.

| Feature | Detail |
| --- | --- |
| Language | Python (3.10 to 3.13) |
| Current version | 0.175.0 (as of April 2026) |
| License | MIT |
| LLM support | OpenAI (default), Anthropic, Ollama, 100+ via LiteLLM |
| Core primitives | Agent, Task, Crew, Tool, Process |
| Package manager | uv (recommended), pip also works |

CrewAI's philosophy: role-playing agents with structured collaboration. Instead of building explicit graphs (like LangGraph) or wiring handoffs between agents (like the OpenAI Agents SDK), you define each agent's role, goal, and backstory — then let the framework handle coordination.

Think of it like assembling a project team. You pick the people, define who does what, and the team works together to deliver the result.


Prerequisites

Before we start:

  • Python 3.10 or higher (CrewAI supports 3.10 through 3.13)
  • An OpenAI API key — CrewAI defaults to GPT-4, but you can swap in any LLM
  • Basic Python knowledge — functions, classes, f-strings
  • A terminal and code editor

Installation

We will use pip for simplicity. CrewAI's official toolchain uses uv, but pip works fine for tutorials.

```bash
mkdir crewai-content-pipeline && cd crewai-content-pipeline
python -m venv venv
source venv/bin/activate  # On Windows: venv\Scripts\activate
```

Install CrewAI and the tools package:

```bash
pip install crewai crewai-tools
```

Version pinning: This tutorial was built with CrewAI 0.175.0. Pin your version in production:

```bash
pip install crewai==0.175.0 crewai-tools==0.38.0
```

Set your API key:

```bash
export OPENAI_API_KEY="sk-your-key-here"
```

Verify the installation:

```python
import crewai
print(crewai.__version__)
# Expected: 0.175.0
```

Core Concepts: Agents, Tasks, Crews, Tools, and Processes

CrewAI has five building blocks. Understanding these is the key to everything that follows.

Agents

An Agent is an autonomous unit with a role, a goal, and a backstory. The role defines its expertise. The goal drives its decisions. The backstory gives the LLM context for how to behave.

```python
from crewai import Agent

researcher = Agent(
    role="Senior Research Analyst",
    goal="Find comprehensive, accurate information about the given topic",
    backstory=(
        "You are a seasoned research analyst with 15 years of experience. "
        "You are known for your ability to find the most relevant information "
        "and distinguish credible sources from noise."
    ),
    verbose=True
)
```

Key parameters beyond the basics:

  • llm — which model to use (default: GPT-4)
  • tools — list of tools the agent can use
  • allow_delegation — whether the agent can pass work to other agents (default: False)
  • max_iter — maximum reasoning iterations before forcing an answer (default: 20)
  • memory — whether to maintain conversation history across tasks
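Putting these parameters together, a more fully configured agent might look like the sketch below. The model string, tool choice, and iteration cap are illustrative choices, not requirements:

```python
from crewai import Agent
from crewai_tools import SerperDevTool

analyst = Agent(
    role="Senior Research Analyst",
    goal="Find comprehensive, accurate information about the given topic",
    backstory="A seasoned analyst who distinguishes credible sources from noise.",
    llm="gpt-4o-mini",        # override the default model for this agent
    tools=[SerperDevTool()],  # tools this agent is allowed to call
    allow_delegation=False,   # keep work on this agent in sequential mode
    max_iter=15,              # cap reasoning loops before forcing an answer
    verbose=True,
)
```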

Tasks

A Task is a specific assignment for an agent. It needs a description (what to do) and an expected_output (what the result should look like).

```python
from crewai import Task

research_task = Task(
    description=(
        "Research the topic '{topic}' thoroughly. "
        "Find key facts, recent developments, and expert opinions. "
        "Focus on information from 2025-2026."
    ),
    expected_output=(
        "A structured research brief with at least 10 key findings, "
        "each with its source or basis."
    ),
    agent=researcher
)
```

Tasks can depend on each other through the context parameter — a task receives the output of its context tasks as input.
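A minimal sketch of that dependency, assuming a `writer` agent defined alongside the researcher:

```python
writing_task = Task(
    description="Write an article based on the research brief.",
    expected_output="A markdown article of 1500-2000 words.",
    agent=writer,
    context=[research_task],  # research_task's output is injected into this task's input
)
```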

Crews

A Crew is a team of agents working on a set of tasks. You choose a process — sequential or hierarchical — that determines how work flows.

```python
from crewai import Crew, Process

crew = Crew(
    agents=[researcher, writer, reviewer],
    tasks=[research_task, writing_task, review_task],
    process=Process.sequential,
    verbose=True
)
```

Tools

Tools extend what agents can do. CrewAI ships 30+ built-in tools (web search, file reading, code execution) and lets you build custom ones.

```python
from crewai.tools import tool

@tool("word_count")
def word_count(text: str) -> str:
    """Count the number of words in the given text."""
    count = len(text.split())
    return f"Word count: {count}"
```

Processes

CrewAI supports two execution strategies:

  • Sequential — tasks run in order, each feeding into the next. Simple and predictable.
  • Hierarchical — a manager agent distributes tasks to the best-suited agent. Requires manager_llm to be set.

For this tutorial, we use sequential. It maps naturally to a pipeline: research, then write, then review.
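Before wiring up CrewAI, it helps to see that the sequential process is conceptually just a fold: each task consumes the previous task's output. This plain-Python toy (no CrewAI involved) illustrates the data flow:

```python
def run_sequential(tasks, initial_input):
    """Toy model of Process.sequential: each task receives the
    previous task's output as its context."""
    context = initial_input
    outputs = []
    for task in tasks:
        context = task(context)  # each "task" is just a function here
        outputs.append(context)
    return outputs

# Three stand-in "agents": research, write, review
pipeline = [
    lambda topic: f"brief about {topic}",
    lambda brief: f"article from {brief}",
    lambda draft: f"reviewed {draft}",
]
results = run_sequential(pipeline, "multi-agent systems")
print(results[-1])
# prints: reviewed article from brief about multi-agent systems
```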


Project Setup

Create the following file structure:

```text
crewai-content-pipeline/
├── main.py
├── agents.py
├── tasks.py
├── tools.py
└── requirements.txt
```

requirements.txt:

```text
crewai==0.175.0
crewai-tools
python-dotenv
```

Building Agent #1: The Research Agent

The Research Agent gathers information on a given topic. We give it the SerperDevTool for web search capability.

agents.py:

```python
from crewai import Agent
from crewai_tools import SerperDevTool

search_tool = SerperDevTool()

def create_research_agent() -> Agent:
    return Agent(
        role="Senior Research Analyst",
        goal=(
            "Produce a comprehensive research brief on the assigned topic. "
            "Find accurate, up-to-date facts with credible sources."
        ),
        backstory=(
            "You are a veteran research analyst who has spent 15 years "
            "distilling complex topics into clear, well-sourced briefs. "
            "You never fabricate information — if you cannot verify a claim, "
            "you flag it as unverified."
        ),
        tools=[search_tool],
        verbose=True,
        allow_delegation=False,
        max_iter=25
    )
```

Why allow_delegation=False? In sequential mode, each agent handles its own task. Delegation is useful in hierarchical setups where a manager assigns work dynamically.

Pro tip: If you do not have a Serper API key, you can skip the search tool and the Research Agent will rely on its LLM training data. The pipeline still works — you just lose real-time web results. Get a key at serper.dev (free tier available).
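If you want the pipeline to adapt automatically, you can attach the search tool only when a key is configured. This sketch assumes SerperDevTool reads the SERPER_API_KEY environment variable, which is where the library looks by default:

```python
import os

from crewai import Agent
from crewai_tools import SerperDevTool

# Only attach the search tool when a Serper key is configured;
# otherwise the agent falls back to its LLM training data.
tools = [SerperDevTool()] if os.getenv("SERPER_API_KEY") else []

researcher = Agent(
    role="Senior Research Analyst",
    goal="Produce a comprehensive research brief on the assigned topic.",
    backstory="A veteran analyst who flags unverified claims.",
    tools=tools,
    verbose=True,
)
```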


Building Agent #2: The Writer Agent

The Writer Agent takes the research brief and produces a well-structured article.

Add to agents.py:

```python
def create_writer_agent() -> Agent:
    return Agent(
        role="Senior Content Writer",
        goal=(
            "Transform research findings into an engaging, well-structured "
            "article that is informative and easy to read."
        ),
        backstory=(
            "You are an experienced technical writer who turns dense research "
            "into clear, compelling articles. You write in a practical, "
            "builder-oriented tone. You use short paragraphs, concrete "
            "examples, and avoid jargon unless you define it first."
        ),
        verbose=True,
        allow_delegation=False,
        max_iter=20
    )
```

No tools needed here — the Writer Agent works purely with the text it receives from the Research Agent.


Building Agent #3: The Reviewer Agent

The Reviewer Agent checks the article for quality, accuracy, and completeness. We give it a custom word-count tool so it can enforce length requirements.

Add to tools.py:

```python
from crewai.tools import tool

@tool("word_count")
def word_count(text: str) -> str:
    """Count the number of words in a given text."""
    count = len(text.split())
    return f"The text contains {count} words."

@tool("check_structure")
def check_structure(text: str) -> str:
    """Check if the article has proper heading structure."""
    lines = text.split("\n")
    h2_count = sum(1 for line in lines if line.startswith("## "))
    h3_count = sum(1 for line in lines if line.startswith("### "))
    # Look at the first non-empty line so leading blank lines
    # do not fool the introduction check
    first_line = next((line for line in lines if line.strip()), "")
    has_intro = bool(first_line) and not first_line.startswith("#")

    issues = []
    if h2_count < 3:
        issues.append(f"Only {h2_count} H2 headings found (minimum 3 recommended)")
    if h3_count < 2:
        issues.append(f"Only {h3_count} H3 headings found (minimum 2 recommended)")
    if not has_intro:
        issues.append("Article may be missing an introduction before the first heading")

    if issues:
        return "Structure issues found:\n- " + "\n- ".join(issues)
    return "Article structure looks good: sufficient headings and proper introduction."
```

Add the Reviewer Agent to agents.py:

```python
from tools import word_count, check_structure

def create_reviewer_agent() -> Agent:
    return Agent(
        role="Senior Content Reviewer",
        goal=(
            "Review the article for accuracy, completeness, readability, "
            "and proper structure. Provide specific, actionable feedback."
        ),
        backstory=(
            "You are a meticulous editor with a sharp eye for factual errors, "
            "weak arguments, and structural problems. You review with the "
            "reader in mind — if something is confusing, you flag it."
        ),
        tools=[word_count, check_structure],
        verbose=True,
        allow_delegation=False,
        max_iter=15
    )
```

Defining the Tasks

Each task maps to one agent and describes what that agent needs to deliver. The context parameter creates the data flow between tasks.

tasks.py:

```python
from crewai import Task
from agents import (
    create_research_agent,
    create_writer_agent,
    create_reviewer_agent,
)

# Create agents
research_agent = create_research_agent()
writer_agent = create_writer_agent()
reviewer_agent = create_reviewer_agent()

def create_tasks(topic: str) -> list:
    research_task = Task(
        description=(
            f"Research the topic: '{topic}'\n\n"
            "Your deliverables:\n"
            "1. At least 10 key findings with supporting details\n"
            "2. Recent developments (2025-2026 preferred)\n"
            "3. Notable expert opinions or industry data\n"
            "4. Any controversies or counterarguments\n\n"
            "Flag any claim you cannot verify as [UNVERIFIED]."
        ),
        expected_output=(
            "A structured research brief in markdown format with "
            "numbered findings, each containing the fact and its basis."
        ),
        agent=research_agent,
    )

    writing_task = Task(
        description=(
            f"Write a comprehensive article about: '{topic}'\n\n"
            "Requirements:\n"
            "- Use the research brief from the previous task as your source\n"
            "- Write 1500-2000 words\n"
            "- Use clear H2 and H3 headings\n"
            "- Include a compelling introduction and conclusion\n"
            "- Write in a practical, builder-oriented tone\n"
            "- Use short paragraphs (3-4 sentences max)\n"
            "- Include concrete examples where possible"
        ),
        expected_output=(
            "A well-structured article in markdown format, "
            "1500-2000 words, with proper heading hierarchy."
        ),
        agent=writer_agent,
        context=[research_task],
    )

    review_task = Task(
        description=(
            "Review the article produced by the Writer Agent.\n\n"
            "Check for:\n"
            "1. Factual accuracy — flag any unsupported claims\n"
            "2. Structure — proper heading hierarchy (H2/H3)\n"
            "3. Word count — minimum 1500 words\n"
            "4. Readability — clear language, short paragraphs\n"
            "5. Completeness — does it cover the topic adequately?\n\n"
            "Use the word_count and check_structure tools.\n"
            "Produce the final article with any necessary corrections applied."
        ),
        expected_output=(
            "The final, reviewed article in markdown format with a "
            "brief reviewer note at the top summarizing any changes made."
        ),
        agent=reviewer_agent,
        context=[writing_task],
        output_file="output/article.md",
    )

    return [research_task, writing_task, review_task]
```

How context works: When writing_task has context=[research_task], the Writer Agent automatically receives the Research Agent's output as part of its input. This is how data flows through the pipeline without you manually passing strings between agents.


Assembling the Crew — Sequential and Hierarchical Processes

main.py:

```python
from dotenv import load_dotenv
from crewai import Crew, Process
from tasks import create_tasks, research_agent, writer_agent, reviewer_agent

load_dotenv()

def run_content_pipeline(topic: str, process_type: str = "sequential"):
    """Run the three-agent content pipeline."""

    tasks = create_tasks(topic)

    if process_type == "hierarchical":
        crew = Crew(
            agents=[research_agent, writer_agent, reviewer_agent],
            tasks=tasks,
            process=Process.hierarchical,
            manager_llm="gpt-4o",
            verbose=True,
        )
    else:
        crew = Crew(
            agents=[research_agent, writer_agent, reviewer_agent],
            tasks=tasks,
            process=Process.sequential,
            verbose=True,
        )

    result = crew.kickoff()

    print("\n" + "=" * 60)
    print("PIPELINE COMPLETE")
    print("=" * 60)
    print(f"\nToken usage: {result.token_usage}")
    print("\nFinal output saved to: output/article.md")

    return result

if __name__ == "__main__":
    topic = "The rise of multi-agent AI systems in 2026 and why they matter for developers"
    run_content_pipeline(topic)
```

Sequential vs. Hierarchical: When to Use Each

Sequential (Process.sequential):

  • Tasks run in a fixed order: research → write → review
  • Each task's output feeds into the next via context
  • Predictable, easy to debug
  • Best for linear pipelines where the workflow is known upfront

Hierarchical (Process.hierarchical):

  • A manager agent (powered by manager_llm) decides which agent handles which task
  • Agents can be reassigned dynamically based on results
  • More flexible but harder to predict
  • Best for complex projects where task assignment depends on intermediate results

For most content pipelines, sequential is the right choice. Use hierarchical when you have many agents and tasks where the optimal routing is not obvious upfront.

Running the Pipeline

```bash
python main.py
```

You will see verbose output showing each agent's reasoning process. The final article is saved to output/article.md.

Expected output structure:

```text
[Research Agent] Starting research on: The rise of multi-agent AI systems...
[Research Agent] Using SerperDevTool to search...
[Research Agent] Found 10 key findings...
[Writer Agent] Received research brief, beginning article...
[Writer Agent] Draft complete: 1,847 words
[Reviewer Agent] Reviewing article structure...
[Reviewer Agent] Using word_count tool... Result: 1,847 words
[Reviewer Agent] Using check_structure tool... Structure looks good
[Reviewer Agent] Final article approved with minor edits

PIPELINE COMPLETE
Token usage: {'total_tokens': 18432, 'prompt_tokens': 14201, 'completion_tokens': 4231}
```

Note: Actual output format and token counts will vary depending on your CrewAI version and LLM model. The pattern above shows typical behavior, not exact output.


Real Cost Breakdown

Running this three-agent pipeline costs real money. Here is what to expect with GPT-4o pricing as of early 2026:

| Metric | Typical Range |
| --- | --- |
| Total tokens | 15,000 – 25,000 |
| Prompt tokens | 12,000 – 18,000 |
| Completion tokens | 3,000 – 7,000 |
| Cost per run (GPT-4o) | $0.08 – $0.15 |
| Cost per run (GPT-4o-mini) | $0.005 – $0.015 |
| Execution time | 45 – 120 seconds |

Source: Token counts based on OpenAI's published pricing at openai.com/pricing. Actual costs depend on prompt complexity, agent verbosity, and tool usage. [UNVERIFIED: exact per-run costs — run your own pipeline and check result.token_usage for actual numbers.]

Cost optimization tips:

  1. Use GPT-4o-mini for the Writer and Reviewer — set llm="gpt-4o-mini" on agents where you do not need maximum reasoning capability
  2. Reduce max_iter — lower the iteration cap to prevent agents from over-thinking
  3. Enable caching — CrewAI caches tool results by default, avoiding duplicate API calls
  4. Use verbose=False in production — verbose logging adds overhead
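To turn result.token_usage into a dollar figure, a small helper is enough. The per-million-token prices below are placeholders; substitute the current numbers from openai.com/pricing:

```python
def estimate_cost(token_usage, prompt_price, completion_price):
    """Rough cost estimate from a CrewAI-style token_usage dict.
    Prices are USD per 1M tokens."""
    prompt = token_usage["prompt_tokens"]
    completion = token_usage["completion_tokens"]
    return (prompt * prompt_price + completion * completion_price) / 1_000_000

# Placeholder prices -- check openai.com/pricing for current values
usage = {"total_tokens": 18432, "prompt_tokens": 14201, "completion_tokens": 4231}
cost = estimate_cost(usage, prompt_price=2.50, completion_price=10.00)
print(f"${cost:.4f}")
# prints: $0.0778
```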

CrewAI vs LangGraph vs OpenAI Agents SDK — Comparison

Having built the same content pipeline with all three frameworks (Articles #10, #26, and this one), here is how they compare:

| Feature | CrewAI | LangGraph | OpenAI Agents SDK |
| --- | --- | --- | --- |
| Philosophy | Role-playing teams | Graph-based workflows | Handoff-based orchestration |
| Learning curve | Low — define roles, run crew | Medium — learn graph concepts | Low — Pythonic, minimal API |
| Multi-agent setup | Built-in (Crew + Process) | Manual (nodes + edges) | Built-in (Handoffs) |
| Sequential workflow | Process.sequential | Define edges explicitly | Chain handoffs |
| Hierarchical workflow | Process.hierarchical | Custom manager node | Not built-in |
| Tool integration | 30+ built-in + custom | LangChain tools ecosystem | Function-based tools |
| Guardrails | Task-level guardrails | Custom validation nodes | Built-in Guardrails class |
| Memory | Built-in (short/long-term) | Built-in persistence | External (bring your own) |
| Structured output | Pydantic models | Pydantic via LangChain | Pydantic models |
| Async support | Yes (kickoff_async()) | Yes (native) | Yes (Runner.run() is async) |
| LLM flexibility | 100+ models via LiteLLM | 100+ via LangChain | OpenAI native, LiteLLM addon |
| Best for | Team-based AI workflows | Complex stateful workflows | OpenAI-native agent systems |
| Lines of code (this tutorial) | ~120 | ~180 | ~100 |
| Python requirement | 3.10 – 3.13 | 3.9+ | 3.10+ |
| License | MIT | MIT | MIT |

When to Choose CrewAI

  • You want the fastest setup for multi-agent collaboration
  • Your workflow maps naturally to roles and responsibilities
  • You need built-in hierarchical delegation (manager agent)
  • You prefer YAML configuration over code for agent definitions

When to Choose LangGraph

  • You need fine-grained control over execution flow
  • Your workflow has complex branching, loops, or conditional logic
  • You need durable execution with checkpoint/resume
  • You want human-in-the-loop approval steps at specific points

When to Choose OpenAI Agents SDK

  • You are already in the OpenAI ecosystem
  • You want the simplest possible API with minimal abstractions
  • Handoff-based routing between agents fits your use case
  • You need built-in guardrails for safety-critical applications

Extending the Pipeline: Adding RAG and Custom Tools

Once your basic pipeline works, here are practical next steps.

Adding a Knowledge Base with RAG

Combine CrewAI agents with a RAG (Retrieval-Augmented Generation) system so your Research Agent can query your own documents instead of just the web.

```python
from crewai_tools import RagTool

# Point to your document directory
rag_tool = RagTool(
    config=dict(
        llm=dict(provider="openai", config=dict(model="gpt-4o-mini")),
        embedder=dict(provider="openai", config=dict(model="text-embedding-3-small")),
    )
)

research_agent = Agent(
    role="Internal Knowledge Researcher",
    goal="Find relevant information from our internal knowledge base",
    backstory="Expert at querying and synthesizing internal documentation.",
    tools=[rag_tool],
    verbose=True,
)
```

For a deeper dive into building RAG systems, see our LlamaIndex RAG tutorial.

Building Custom Tools

The @tool decorator makes it easy to give agents new capabilities:

```python
import json
import os

from crewai.tools import tool

@tool("read_json_file")
def read_json_file(file_path: str) -> str:
    """Read and parse a JSON file, returning its contents as formatted text."""
    with open(file_path, "r") as f:
        data = json.load(f)
    return json.dumps(data, indent=2)

@tool("save_to_file")
def save_to_file(content: str, file_path: str) -> str:
    """Save content to a file at the specified path."""
    os.makedirs(os.path.dirname(file_path), exist_ok=True)
    with open(file_path, "w") as f:
        f.write(content)
    return f"Content saved to {file_path}"
```

Using Structured Output

Force agents to return data in a specific format using Pydantic models:

```python
from pydantic import BaseModel
from typing import List

class ResearchBrief(BaseModel):
    topic: str
    key_findings: List[str]
    sources: List[str]
    confidence_level: str

research_task = Task(
    description="Research the given topic thoroughly.",
    expected_output="Structured research brief.",
    agent=research_agent,
    output_pydantic=ResearchBrief,
)
```

This is useful when your pipeline feeds into other systems that expect structured data — APIs, databases, or downstream processing steps.
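After kickoff, the parsed model should be available on the task's output. This is a sketch, assuming a crew that includes the task above and the `.pydantic` attribute exposed by recent CrewAI releases:

```python
result = crew.kickoff()

# The task output parsed into the ResearchBrief Pydantic model
brief = research_task.output.pydantic
print(brief.topic)
print(len(brief.key_findings))
```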


Common Mistakes and How to Avoid Them

After building multi-agent systems with CrewAI, here are the pitfalls we see most often:

1. Vague agent backstories. The backstory is not decoration — it shapes how the LLM interprets the role. "You are a writer" produces worse results than "You are a technical writer who specializes in developer tutorials, writes in short paragraphs, and always includes code examples."

2. Missing expected_output detail. A task with expected_output="A good article" gives the agent no quality criteria. Be specific: word count, format, required sections.

3. Forgetting context on dependent tasks. Without context=[previous_task], the next agent has no access to prior work. This is the most common cause of "my agents are ignoring each other's output."

4. Setting max_iter too low. If your agent's task is complex and it hits the iteration limit, it will return whatever it has — often incomplete work. Start with the default (20) and adjust based on observed behavior.

5. Running hierarchical without understanding the overhead. Hierarchical mode adds a manager agent that consumes tokens deciding who should do what. For simple 3-agent pipelines, sequential is cheaper and more predictable.


FAQ

How is CrewAI different from LangChain?

LangChain is a general-purpose LLM framework for building chains and retrievals. CrewAI is specifically designed for multi-agent collaboration. CrewAI can use LangChain tools, but it adds the Agent-Task-Crew orchestration layer that LangChain does not have natively. LangGraph (part of the LangChain ecosystem) is the closer comparison — see our full comparison above.

Can I use CrewAI with local models like Ollama?

Yes. Set the llm parameter on any agent to point to your local model:

```python
agent = Agent(
    role="Researcher",
    goal="Research topics using local LLM",
    backstory="Local-first researcher.",
    llm="ollama/llama3.1",
    verbose=True,
)
```

For a guide on setting up Ollama, see our Ollama + Open WebUI self-hosting guide.

How many agents can a Crew have?

There is no hard limit. Practically, 3-7 agents cover most use cases. Beyond that, consider using hierarchical process with sub-crews to keep complexity manageable.

Can CrewAI agents call other Crews?

Yes — this is the Flows feature. A Flow can trigger multiple Crews as part of a larger pipeline. This is useful for complex applications where different stages need different team compositions.
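A rough sketch of a Flow that chains two crews, assuming the Flow, start, and listen primitives from recent CrewAI releases; research_crew and publishing_crew are hypothetical crews you would define yourself:

```python
from crewai.flow.flow import Flow, listen, start

class ContentFlow(Flow):
    @start()
    def run_research(self):
        # First crew: gather and structure the research
        return research_crew.kickoff()

    @listen(run_research)
    def run_publishing(self, research_result):
        # Second crew: turn the research output into a published article
        return publishing_crew.kickoff(inputs={"brief": research_result.raw})

flow = ContentFlow()
flow.kickoff()
```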

Is CrewAI production-ready?

CrewAI is used in production by companies building AI-powered workflows. It has built-in memory, caching, rate limiting, and error handling. For production deployments, pin your CrewAI version and test thoroughly — the API surface is still evolving between releases.

How does CrewAI compare to building agents with Claude Code?

Claude Code is an AI coding assistant that uses its own agent architecture for software engineering tasks. CrewAI is a framework you use to build custom agent systems for any domain. They solve different problems — you might even use Claude Code to help you write your CrewAI agents.


What to Build Next

You now have a working multi-agent content pipeline with CrewAI. Here are paths to explore:

  1. Add more agents — a Fact-Checker Agent, an SEO Agent, or a Translator Agent
  2. Try hierarchical mode — let a manager agent decide task routing
  3. Connect to your data — use RAG tools to feed agents your internal docs
  4. Build a Slack bot — trigger the pipeline from a Slack message and post results back
  5. Compare frameworks — build the same pipeline with LangGraph or OpenAI Agents SDK and decide which fits your team

If you are curious how we use multi-agent systems in our own workflow, read how we built a company powered by 14 AI agents.


This is Part 3 of our Agent Trilogy series. Part 1 covers LangGraph, Part 2 covers the OpenAI Agents SDK, and this article covers CrewAI. All three tutorials build the same content pipeline so you can compare frameworks directly.

Effloow uses affiliate links where noted. If you sign up for a service through our links, we may earn a commission at no extra cost to you. We only recommend tools we actually use.


