If you've tried building an AI agent in the last six months, you've hit the same wall: there are half a dozen frameworks, each with a different philosophy, a different API surface, and a different definition of what an "agent" even is.
I spent a weekend writing the same simple agent — "read a GitHub issue, classify it as bug/feature/question, and post a comment" — in six different frameworks. This is what I found.
## TL;DR
| Framework | Lines of Code* | Native Tool System | Multi-Agent | Learning Curve | Best For |
|---|---|---|---|---|---|
| LangChain / LangGraph | ~85 | Rich (500+ integrations) | Yes (LangGraph) | Steep | Production pipelines with complex RAG/tool chains |
| CrewAI | ~60 | Built-in + custom tools | Yes (role-based) | Moderate | Multi-agent roleplay workflows |
| AutoGen (Microsoft) | ~55 | Custom function tools | Yes (conversation-based) | Moderate | Research experiments, agent-to-agent conversations |
| OpenAI Agents SDK | ~40 | Built-in (function calling) | Yes (handoffs) | Low | Quick prototyping, simple single-agent tasks |
| Pydantic AI | ~45 | Structured via Pydantic models | Limited | Low | Type-safe agents, ML pipeline integration |
| Nebula | ~35 | Declarative tool bindings | Yes (native) | Low | Multi-agent orchestration, production deployments |
*Approximate lines for the "read a GitHub issue and classify it" task.
## What We're Building
The test task: a GitHub issue classifier agent that reads a new issue, decides whether it's a bug report, feature request, or general question, and posts a label recommendation comment.
It's simple enough to be a fair comparison, but real enough to expose each framework's ergonomics.
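Several of the snippets below lean on the same naive keyword heuristic for the classification step. As a framework-agnostic baseline, it looks roughly like this (purely illustrative — in a real agent the LLM itself does the classifying):

```python
# Naive keyword classifier shared (conceptually) by the examples below.
# Purely illustrative: in practice the model handles ambiguity the
# keyword match can't.
KEYWORDS = {
    "bug": ["error", "crash", "broken", "fix", "bug"],
    "feature": ["request", "add", "want", "feature", "would be great"],
    "question": ["how", "what", "help", "confused", "?"],
}

def classify(title: str, body: str) -> str:
    text = f"{title} {body}".lower()
    for label, terms in KEYWORDS.items():
        if any(t in text for t in terms):
            return label
    return "question"  # default bucket when nothing matches
```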
## 1. LangChain / LangGraph — The Heavy Lifter
LangChain is the most mature framework and the most complex. For a single-agent task you'd use LangChain directly; for anything involving state machines or cycles you'd move to LangGraph.
```python
from langchain_openai import ChatOpenAI
from langchain.agents import create_openai_functions_agent, AgentExecutor
from langchain.tools import tool
from langchain_community.tools.github import GitHubIssueTool
from langchain_core.prompts import ChatPromptTemplate, MessagesPlaceholder

@tool
def classify_issue(title: str, body: str) -> str:
    """Classify a GitHub issue based on title and body."""
    keywords = {
        "bug": ["error", "crash", "broken", "fix", "bug"],
        "feature": ["request", "add", "want", "feature", "would be great"],
        "question": ["how", "what", "help", "confused", "?"],
    }
    for category, terms in keywords.items():
        if any(t in title.lower() or t in body.lower() for t in terms):
            return category
    return "question"

tools = [classify_issue, GitHubIssueTool()]
prompt = ChatPromptTemplate.from_messages([
    ("system", "You classify GitHub issues. Use the tools to read and comment."),
    ("human", "Classify issue #{issue_number} from {repo}"),
    MessagesPlaceholder("agent_scratchpad"),
])

agent = create_openai_functions_agent(ChatOpenAI(model="gpt-4"), tools, prompt)
executor = AgentExecutor(agent=agent, tools=tools, verbose=True)
executor.invoke({"issue_number": 42, "repo": "nebula-gg/nebula"})
```
Verdict: LangChain is powerful but heavy. The abstraction layers (runnables, callbacks, message types) add cognitive overhead. Great for complex RAG pipelines. Overkill for straightforward agent apps.
## 2. CrewAI — Role-Playing Agents
CrewAI's defining idea: agents are actors with roles, goals, and backstories. You compose them into crews that work together.
```python
from crewai import Agent, Task, Crew, Process
from crewai_tools import GithubTool

classifier = Agent(
    role="GitHub Issue Triage Specialist",
    goal="Classify issues and suggest appropriate labels",
    backstory="Expert at reading GitHub issues and categorizing them",
    tools=[GithubTool()],
    allow_delegation=False,
    verbose=True,
)

classify_task = Task(
    description="Read issue #{issue_number} on {repo} and classify it",
    expected_output="A label suggestion and brief reasoning",
    agent=classifier,
)

crew = Crew(
    agents=[classifier],
    tasks=[classify_task],
    process=Process.sequential,
)
result = crew.kickoff(inputs={"issue_number": 42, "repo": "nebula-gg/nebula"})
```
CrewAI's role-based architecture is genuinely useful when you need two agents with distinct personalities to collaborate (a coder + a reviewer, for example). For a single-agent task the role scaffolding feels excessive, but the API is clean and Pythonic.
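Under the hood, `Process.sequential` just runs tasks in order, feeding each result into the next. Stripped of the framework, the control flow is roughly this (a plain-Python sketch with hypothetical names, not CrewAI's internals):

```python
# Minimal sketch of a sequential task pipeline, in the spirit of
# CrewAI's Process.sequential. Hypothetical names, not CrewAI internals.
from typing import Callable

def run_sequential(tasks: list[Callable[[str], str]], initial: str = "") -> str:
    """Run tasks in order; each receives the previous task's output."""
    result = initial
    for task in tasks:
        result = task(result)
    return result

# Toy stand-ins for "read the issue" and "classify it"
read_issue = lambda _: "Title: crash on startup"
classify = lambda text: "bug" if "crash" in text else "question"

print(run_sequential([read_issue, classify]))  # → bug
```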
## 3. AutoGen (Microsoft) — Conversational Agents
AutoGen frames everything as conversations between agents. Even your tool calls are "the agent talks to a function."
```python
import autogen

classifier = autogen.AssistantAgent(
    name="Classifier",
    system_message="You classify GitHub issues. Reply with BUG, FEATURE, or QUESTION.",
    llm_config={"config_list": [{"model": "gpt-4", "api_key": "..."}]},
)

github_tool = autogen.UserProxyAgent(
    name="GitHubTool",
    human_input_mode="NEVER",
    code_execution_config={"use_docker": False},
    function_map={...},  # tool functions elided
)

user = autogen.UserProxyAgent(
    name="User",
    human_input_mode="ALWAYS",
    code_execution_config=False,
)

# Conversation drives execution
user.initiate_chat(
    classifier,
    message="Classify GitHub issue #42 from my repo. Use the GitHub tool to read it.",
)
```
AutoGen is designed for multi-agent conversation research — it was born out of Microsoft Research. The conversational model is powerful for debugging (you see every message), but verbose for production use. If you're doing research on how agents communicate, this is your pick.
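AutoGen's "everything is a conversation" model boils down to agents exchanging messages until a termination condition fires. A toy version of that loop, with no LLM and hypothetical names (not AutoGen internals), makes the mechanism concrete:

```python
# Toy two-party message loop illustrating the conversational model.
# Hypothetical names; real AutoGen adds LLM replies, tool calls, and
# configurable termination checks.
def initiate_chat(sender_reply, receiver_reply, opening: str,
                  max_turns: int = 4) -> list[str]:
    """Alternate replies between two parties; stop on TERMINATE."""
    transcript = [opening]
    message = opening
    for turn in range(max_turns):
        reply_fn = receiver_reply if turn % 2 == 0 else sender_reply
        message = reply_fn(message)
        transcript.append(message)
        if message == "TERMINATE":
            break
    return transcript

classifier = lambda msg: "BUG" if "crash" in msg.lower() else "QUESTION"
user = lambda msg: "TERMINATE"  # user is satisfied after one answer

print(initiate_chat(user, classifier, "Issue #42: app crashes on login"))
```

The debugging appeal mentioned above falls out of this shape: the transcript *is* the execution trace.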
## 4. OpenAI Agents SDK — The Minimalist
OpenAI released their Agents SDK in early 2025, and it strips everything down to the essentials: agents have instructions, tools, and handoffs. That's it.
```python
from agents import Agent, Runner, function_tool
from github import Github

@function_tool
async def classify_github_issue(repo: str, issue_number: int) -> dict:
    """Read a GitHub issue and classify it."""
    g = Github("your_token")
    issue = g.get_repo(repo).get_issue(issue_number)
    keywords = {
        "bug": ["error", "crash", "broken"],
        "feature": ["request", "add", "feature"],
        "question": ["how", "what", "help"],
    }
    text = f"{issue.title} {issue.body}".lower()
    for cat, terms in keywords.items():
        if any(t in text for t in terms):
            return {"label": cat, "title": issue.title}
    return {"label": "question", "title": issue.title}

agent = Agent(
    name="Issue Classifier",
    instructions="Classify GitHub issues and return a label recommendation.",
    tools=[classify_github_issue],
)

result = Runner.run_sync(agent, "Check issue #42 in nebula-gg/nebula")
```
The OpenAI Agents SDK is the fastest path to a working agent. The tradeoff: you're locked into OpenAI models, and the handoff system (for multi-agent) is simpler than LangGraph's state machine. For shipping something quickly, this is hard to beat.
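The handoff mechanism is essentially "the current agent picks which agent handles the rest." Conceptually (a plain-Python sketch with hypothetical names — in the real SDK the model makes this choice via tool calls):

```python
# Conceptual handoff: a triage step routes to a specialist agent.
# Plain-Python sketch; not the SDK's API.
def triage(issue_text: str) -> str:
    """Decide which specialist should take over."""
    return "bug_agent" if "crash" in issue_text.lower() else "question_agent"

AGENTS = {
    "bug_agent": lambda t: f"bug: {t}",
    "question_agent": lambda t: f"question: {t}",
}

def run_with_handoff(issue_text: str) -> str:
    handler = AGENTS[triage(issue_text)]  # the "handoff"
    return handler(issue_text)

print(run_with_handoff("App crashes on save"))  # → bug: App crashes on save
```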
## 5. Pydantic AI — Type-Safe by Default
Pydantic AI lets you define your agent's output as a Pydantic model, giving you structured, validated results.
```python
from pydantic_ai import Agent
from pydantic import BaseModel
from github import Github

class Classification(BaseModel):
    label: str  # bug, feature, or question
    confidence: float
    reasoning: str

agent = Agent(
    "openai:gpt-4",
    result_type=Classification,
    system_prompt="Classify GitHub issues into bug/feature/question.",
)

# tool_plain registers a tool that doesn't need the run context
@agent.tool_plain
def read_issue(repo: str, number: int) -> str:
    g = Github()
    issue = g.get_repo(repo).get_issue(number)
    return f"Title: {issue.title}\nBody: {issue.body}"

result = agent.run_sync("Classify issue #42 from nebula-gg/nebula")
# result.data is already a validated Classification object
print(f"Label: {result.data.label}, Confidence: {result.data.confidence}")
```
If you love types, you'll love Pydantic AI. The `result_type` parameter means you never parse raw LLM output yourself. The tradeoff: the multi-agent story is less mature than CrewAI's or LangGraph's. Perfect for ML pipelines where downstream tasks need typed inputs.
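What typed output buys you is validation at the boundary. Every untyped setup ends up reimplementing something like this by hand (stdlib-only sketch with hypothetical names, shown for contrast — Pydantic AI does the equivalent for you):

```python
# Hand-rolled version of "parse and validate the LLM's JSON output" —
# the chore a typed result model eliminates. Hypothetical names.
import json
from dataclasses import dataclass

VALID_LABELS = {"bug", "feature", "question"}

@dataclass
class Classification:
    label: str
    confidence: float
    reasoning: str

def parse_llm_output(raw: str) -> Classification:
    data = json.loads(raw)            # may raise on malformed JSON
    c = Classification(**data)        # may raise on missing fields
    if c.label not in VALID_LABELS:
        raise ValueError(f"unexpected label: {c.label!r}")
    if not 0.0 <= c.confidence <= 1.0:
        raise ValueError("confidence out of range")
    return c

raw = '{"label": "bug", "confidence": 0.92, "reasoning": "stack trace present"}'
print(parse_llm_output(raw).label)  # → bug
```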
## 6. Nebula — Declarative Multi-Agent Orchestration
Nebula takes a different approach: agents are defined declaratively, with tool bindings, triggers, and permissions configured at the platform level — not in code.
```yaml
# agent configuration (declarative)
name: issue-classifier
model: claude-sonnet-4
tools:
  - github:read_issue
  - github:create_comment
instructions: |
  Read the issue, classify it as bug/feature/question,
  and post a label recommendation comment.
triggers:
  - event: github:issue_opened
    on_repo: nebula-gg/nebula
```
The agent doesn't import a framework — it *is* the framework. Tool bindings are pre-configured: `github:read_issue` automatically authenticates via the platform's OAuth connection. Triggers wire the agent to events without polling code. Adding another agent means adding another config block, not managing thread pools.
```python
# The same task via Nebula's Python SDK
from nebula import Agent, tool, on
from github import Github

agent = Agent(name="issue-classifier")

@tool
def read_and_classify(repo: str, issue_number: int) -> str:
    g = Github()  # auth handled by platform
    issue = g.get_repo(repo).get_issue(issue_number)
    # classification logic
    ...

@on("github:issue_opened")
async def handle(event):
    result = await agent.run(
        f"Classify issue #{event.issue_number} from {event.repo}"
    )
    print(f"Classification: {result}")

agent.deploy()
```
Nebula shines when you need to run many agents that talk to each other, trigger off events, and require zero DevOps. The tradeoff: it's a platform, not a pip package — you deploy to Nebula rather than running locally.
## The Verdict
- **Ship fastest:** OpenAI Agents SDK or Nebula. The OpenAI SDK wins for pure prototyping speed. Nebula wins when you need triggers, auth, and multi-agent orchestration out of the box.
- **Most capable in production:** LangChain/LangGraph. The ecosystem is unmatched, but be ready for the learning curve.
- **Best for research:** AutoGen. The conversational model is ideal for studying agent behavior.
- **Best for type-safety:** Pydantic AI. If your team lives in type annotations, this is your framework.
- **Best for role-based workflows:** CrewAI. When you need a writer agent and a reviewer agent with distinct personalities, CrewAI's role system is elegant.
## Where to Go From Here
- Start with the simplest framework that solves your problem. Don't adopt LangChain because it's popular if a flat agent with two tools is all you need.
- Your first agent should read input, use a tool, and return a result. Add multi-agent complexity only when you have a concrete reason.
- Pay attention to deployment. A framework that works in a notebook but takes two weeks to deploy to production isn't the right choice for a shipped product.
- Try the same task in two frameworks. The 30 minutes you spend will teach you more about your actual requirements than reading docs for an hour.
The right framework is the one that gets out of your way. For me, that means OpenAI Agents SDK for quick scripts and Nebula for production multi-agent systems. Your mileage will vary — and that's the point.