The Daily Agent

6 AI Agent Frameworks Compared: Which One Ships Your First Agent Fastest?

If you've tried building an AI agent in the last six months, you've hit the same wall: there are half a dozen frameworks, each with a different philosophy, a different API surface, and a different definition of what an "agent" even is.

I spent a weekend writing the same simple agent — "read a GitHub issue, classify it as bug/feature/question, and post a comment" — in six different frameworks. This is what I found.

TL;DR

| Framework | Lines of Code* | Native Tool System | Multi-Agent | Learning Curve | Best For |
| --- | --- | --- | --- | --- | --- |
| LangChain / LangGraph | ~85 | Rich (500+ integrations) | Yes (LangGraph) | Steep | Production pipelines with complex RAG/tool chains |
| CrewAI | ~60 | Built-in + custom tools | Yes (role-based) | Moderate | Multi-agent roleplay workflows |
| AutoGen (Microsoft) | ~55 | Custom function tools | Yes (conversation-based) | Moderate | Research experiments, agent-to-agent conversations |
| OpenAI Agents SDK | ~40 | Built-in (function calling) | Yes (handoffs) | Low | Quick prototyping, simple single-agent tasks |
| Pydantic AI | ~45 | Structured via Pydantic models | Limited | Low | Type-safe agents, ML pipeline integration |
| Nebula | ~35 | Declarative tool bindings | Yes (native) | Low | Multi-agent orchestration, production deployments |

*Approximate lines for the "read a GitHub issue and classify it" task.

What We're Building

The test task: a GitHub issue classifier agent that reads a new issue, decides whether it's a bug report, feature request, or general question, and posts a label recommendation comment.

It's simple enough to be a fair comparison, but real enough to expose each framework's ergonomics.
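Stripped of any framework, the core of the task is a function from issue text to one of three labels. A framework-free sketch of that contract (keyword heuristics stand in for the LLM call, and the keyword lists are illustrative):

```python
# The task's core contract, with no framework at all.
# A real agent would delegate this decision to an LLM; simple
# keyword matching stands in for it here.
KEYWORDS = {
    "bug": ["error", "crash", "broken", "fix"],
    "feature": ["request", "add", "would be great"],
}

def classify(title: str, body: str) -> str:
    text = f"{title} {body}".lower()
    for label, terms in KEYWORDS.items():
        if any(term in text for term in terms):
            return label
    return "question"  # default bucket when nothing matches

print(classify("App crashes on startup", "Segfault in init"))  # bug
```

Every framework below is, in one way or another, wrapping this function with model calls, tool plumbing, and orchestration.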

1. LangChain / LangGraph — The Heavy Lifter

LangChain is the most mature framework and the most complex. For a single-agent task you'd use LangChain directly; for anything involving state machines or cycles you'd move to LangGraph.

```python
from langchain_openai import ChatOpenAI
from langchain.agents import create_openai_functions_agent, AgentExecutor
from langchain.tools import tool
from langchain_community.tools.github import GitHubIssueTool
from langchain_core.prompts import ChatPromptTemplate, MessagesPlaceholder

@tool
def classify_issue(title: str, body: str) -> str:
    """Classify a GitHub issue based on title and body."""
    keywords = {"bug": ["error", "crash", "broken", "fix", "bug"],
                "feature": ["request", "add", "want", "feature", "would be great"],
                "question": ["how", "what", "help", "confused", "?"]}
    for category, terms in keywords.items():
        if any(t in title.lower() or t in body.lower() for t in terms):
            return category
    return "question"

tools = [classify_issue, GitHubIssueTool()]

prompt = ChatPromptTemplate.from_messages([
    ("system", "You classify GitHub issues. Use the tools to read and comment."),
    ("human", "Classify issue #{issue_number} from {repo}"),
    MessagesPlaceholder("agent_scratchpad"),
])

agent = create_openai_functions_agent(ChatOpenAI(model="gpt-4"), tools, prompt)
executor = AgentExecutor(agent=agent, tools=tools, verbose=True)

# Fill in the prompt variables and run the agent loop
executor.invoke({"issue_number": 42, "repo": "owner/repo"})
```

Verdict: LangChain is powerful but heavy. The abstraction layers (runnables, callbacks, message types) add cognitive overhead. Great for complex RAG pipelines. Overkill for straightforward agent apps.

2. CrewAI — Role-Playing Agents

CrewAI's defining idea: agents are actors with roles, goals, and backstories. You compose them into crews that work together.

```python
from crewai import Agent, Task, Crew, Process
from crewai_tools import GithubTool

classifier = Agent(
    role="GitHub Issue Triage Specialist",
    goal="Classify issues and suggest appropriate labels",
    backstory="Expert at reading GitHub issues and categorizing them",
    tools=[GithubTool()],
    allow_delegation=False,
    verbose=True
)

classify_task = Task(
    description="Read issue #{issue_number} on {repo} and classify it",
    expected_output="A label suggestion and brief reasoning",
    agent=classifier
)

crew = Crew(
    agents=[classifier],
    tasks=[classify_task],
    process=Process.sequential
)

# Interpolate the task variables and run the crew
result = crew.kickoff(inputs={"issue_number": 42, "repo": "owner/repo"})
```

CrewAI's role-based architecture is genuinely useful when you need two agents with distinct personalities to collaborate (a coder + a reviewer, for example). For a single-agent task the role scaffolding feels excessive, but the API is clean and Pythonic.
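To make the coder + reviewer case concrete, here is a sketch of a two-agent crew. All names, prompts, and task descriptions are illustrative, not from the article's code, and it assumes `crewai` is installed and an API key is configured:

```python
# Hypothetical two-agent crew: a coder drafts a patch, a reviewer checks it.
from crewai import Agent, Task, Crew, Process

coder = Agent(
    role="Python Developer",
    goal="Write a fix for the reported issue",
    backstory="Pragmatic engineer who writes minimal, tested patches",
)
reviewer = Agent(
    role="Code Reviewer",
    goal="Review the proposed fix for correctness and style",
    backstory="Meticulous reviewer who catches edge cases",
)

write_fix = Task(
    description="Draft a patch for issue #{issue_number}",
    expected_output="A code diff with a short rationale",
    agent=coder,
)
review_fix = Task(
    description="Review the patch and approve or request changes",
    expected_output="APPROVE or a list of requested changes",
    agent=reviewer,
)

crew = Crew(
    agents=[coder, reviewer],
    tasks=[write_fix, review_fix],
    process=Process.sequential,  # the reviewer runs after the coder
)
# crew.kickoff(inputs={"issue_number": 42})
```

The sequential process hands the coder's output to the reviewer's task automatically, which is exactly the kind of plumbing the role abstraction is buying you.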

3. AutoGen (Microsoft) — Conversational Agents

AutoGen frames everything as conversations between agents. Even your tool calls are "the agent talks to a function."

```python
import autogen

classifier = autogen.AssistantAgent(
    name="Classifier",
    system_message="You classify GitHub issues. Reply with BUG, FEATURE, or QUESTION.",
    llm_config={"config_list": [{"model": "gpt-4", "api_key": "..."}]}
)

github_tool = autogen.UserProxyAgent(
    name="GitHubTool",
    human_input_mode="NEVER",
    code_execution_config={"use_docker": False},
    function_map={...}
)

user = autogen.UserProxyAgent(
    name="User",
    human_input_mode="ALWAYS",
    code_execution_config=False
)

# Conversation drives execution
user.initiate_chat(
    classifier,
    message="Classify GitHub issue #42 from my repo. Use the GitHub tool to read it.",
)
```

AutoGen is designed for multi-agent conversation research — it was born out of Microsoft Research. The conversational model is powerful for debugging (you see every message), but verbose for production use. If you're doing research on how agents communicate, this is your pick.
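One way to fill in the elided `function_map` wiring is `autogen.register_function`, which registers a tool with both the agent that proposes calls and the agent that executes them. A sketch, assuming the `pyautogen` package; the tool body is a stub:

```python
import autogen

def read_issue(repo: str, issue_number: int) -> str:
    """Illustrative stub; a real version would call the GitHub API."""
    return f"Issue #{issue_number} from {repo}: <title and body here>"

assistant = autogen.AssistantAgent(
    name="Classifier",
    system_message="You classify GitHub issues. Reply with BUG, FEATURE, or QUESTION.",
    llm_config={"config_list": [{"model": "gpt-4", "api_key": "..."}]},
)
executor = autogen.UserProxyAgent(
    name="GitHubTool",
    human_input_mode="NEVER",
    code_execution_config=False,
)

# The assistant may propose read_issue calls; the proxy executes them.
autogen.register_function(
    read_issue,
    caller=assistant,
    executor=executor,
    description="Read a GitHub issue's title and body",
)
```

That caller/executor split is the conversational model in miniature: even a function call is a message from one agent to another.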

4. OpenAI Agents SDK — The Minimalist

OpenAI released their Agents SDK in early 2025, and it strips everything down to the essentials: agents have instructions, tools, and handoffs. That's it.

```python
from agents import Agent, Runner, function_tool
from github import Github

@function_tool
async def classify_github_issue(repo: str, issue_number: int) -> dict:
    """Read a GitHub issue and classify it."""
    g = Github("your_token")
    issue = g.get_repo(repo).get_issue(issue_number)
    keywords = {"bug": ["error", "crash", "broken"],
                "feature": ["request", "add", "feature"],
                "question": ["how", "what", "help"]}
    text = f"{issue.title} {issue.body}".lower()
    for cat, terms in keywords.items():
        if any(t in text for t in terms):
            return {"label": cat, "title": issue.title}
    return {"label": "question", "title": issue.title}

agent = Agent(
    name="Issue Classifier",
    instructions="Classify GitHub issues and return a label recommendation.",
    tools=[classify_github_issue]
)

result = Runner.run_sync(agent, "Check issue #42 in nebula-gg/nebula")
```

The OpenAI Agents SDK is the fastest path to a working agent. The tradeoff: you're locked into OpenAI models, and the handoff system (for multi-agent) is simpler than LangGraph's state machine. For shipping something quickly, this is hard to beat.
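The handoff system really is that minimal: an agent lists the other agents it may delegate to. A sketch with illustrative agent names and instructions:

```python
from agents import Agent, Runner

bug_agent = Agent(
    name="Bug Handler",
    instructions="Reproduce the bug and suggest a severity label.",
)
feature_agent = Agent(
    name="Feature Handler",
    instructions="Summarize the request and suggest a roadmap label.",
)

# The triage agent can hand the conversation off to either specialist.
triage = Agent(
    name="Triage",
    instructions="Classify the issue, then hand off to the matching specialist.",
    handoffs=[bug_agent, feature_agent],
)

# result = Runner.run_sync(triage, "Issue #42: app crashes on startup")
```

Compared to LangGraph, there is no explicit state machine: the model decides when to hand off, and the SDK routes the conversation.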

5. Pydantic AI — Type-Safe by Default

Pydantic AI lets you define your agent's output as a Pydantic model, giving you structured, validated results.

```python
from pydantic_ai import Agent, RunContext
from pydantic import BaseModel
from github import Github

class Classification(BaseModel):
    label: str  # bug, feature, or question
    confidence: float
    reasoning: str

agent = Agent(
    "openai:gpt-4",
    result_type=Classification,
    system_prompt="Classify GitHub issues into bug/feature/question."
)

# Tools that take a RunContext use @agent.tool;
# @agent.tool_plain is for context-free functions.
@agent.tool
def read_issue(ctx: RunContext, repo: str, number: int) -> str:
    g = Github()
    issue = g.get_repo(repo).get_issue(number)
    return f"Title: {issue.title}\nBody: {issue.body}"

result = agent.run_sync("Classify issue #42 from nebula-gg/nebula")
# result.data is already a validated Classification object
print(f"Label: {result.data.label}, Confidence: {result.data.confidence}")
```

If you love types, you'll love Pydantic AI. The result_type parameter means you never parse raw LLM output. The tradeoff: the multi-agent story is less mature than CrewAI or LangGraph. Perfect for ML pipelines where downstream tasks need typed inputs.
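The payoff of `result_type` is that a malformed model response fails loudly at the boundary instead of flowing downstream. Pydantic alone demonstrates the behavior; the field values below are made up for illustration:

```python
from pydantic import BaseModel, ValidationError

class Classification(BaseModel):
    label: str  # bug, feature, or question
    confidence: float
    reasoning: str

# A well-formed response validates into a typed object.
ok = Classification(label="bug", confidence=0.92, reasoning="Stack trace in body")
print(ok.label, ok.confidence)

# A malformed "LLM response" is rejected before it reaches your pipeline.
try:
    Classification(label="bug", confidence="very high", reasoning="...")
except ValidationError as e:
    print("rejected field:", e.errors()[0]["loc"])
```

Pydantic AI runs the same validation against the model's output for you, retrying when validation fails.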

6. Nebula — Declarative Multi-Agent Orchestration

Nebula takes a different approach: agents are defined declaratively, with tool bindings, triggers, and permissions configured at the platform level rather than in code.

```yaml
# agent configuration (declarative)
name: issue-classifier
model: claude-sonnet-4
tools:
  - github:read_issue
  - github:create_comment
instructions: |
  Read the issue, classify it as bug/feature/question,
  and post a label recommendation comment.
triggers:
  - event: github:issue_opened
    on_repo: nebula-gg/nebula
```

The agent doesn't import a framework — it is the framework. Tool bindings are pre-configured: github:read_issue automatically authenticates via the platform's OAuth connection. Triggers wire the agent to events without polling code. Multi-agent means spinning up another config block, not managing thread pools.

```python
# The same task via Nebula's Python SDK
from nebula import Agent, tool, on
from github import Github

agent = Agent(name="issue-classifier")

@tool
def read_and_classify(repo: str, issue_number: int) -> str:
    g = Github()  # auth handled by platform
    issue = g.get_repo(repo).get_issue(issue_number)
    # classification logic
    ...

@on("github:issue_opened")
async def handle(event):
    result = await agent.run(
        f"Classify issue #{event.issue_number} from {event.repo}"
    )
    print(f"Classification: {result}")

agent.deploy()
```

Nebula shines when you need to run many agents that talk to each other, trigger off events, and require zero DevOps. The tradeoff: it's a platform, not a pip package — you deploy to Nebula rather than running locally.

The Verdict

Ship fastest: OpenAI Agents SDK or Nebula. OpenAI SDK wins for pure prototyping speed. Nebula wins when you need triggers, auth, and multi-agent orchestration out of the box.

Most capable in production: LangChain/LangGraph. The ecosystem is unmatched, but be ready for the learning curve.

Best for research: AutoGen. The conversational model is ideal for studying agent behavior.

Best for type-safety: Pydantic AI. If your team lives in type annotations, this is your framework.

Best for role-based workflows: CrewAI. When you need a writer agent and a reviewer agent with distinct personalities, CrewAI's role system is elegant.

Where to Go From Here

  1. Start with the simplest framework that solves your problem. Don't adopt LangChain because it's popular if a flat agent with two tools is all you need.
  2. Your first agent should read input, use a tool, and return a result. Add multi-agent complexity only when you have a concrete reason.
  3. Pay attention to deployment. A framework that works in a notebook but takes two weeks to deploy to production isn't the right choice for a shipped product.
  4. Try the same task in two frameworks. The 30 minutes you spend will teach you more about your actual requirements than reading docs for an hour.
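Point 2 is worth making concrete. Before reaching for any framework, the shape of a first agent is a three-step loop: read input, use a tool, return a result. A stdlib-only sketch, with a stubbed "model" step standing in for the LLM call:

```python
# Minimal read -> tool -> result loop, no framework.
# model_decide() is a stub; a real agent would substitute an LLM
# call that picks a tool and its arguments.

def fetch_issue(issue_number: int) -> dict:
    """Stub tool; a real version would call the GitHub API."""
    return {"number": issue_number, "title": "App crashes on startup"}

TOOLS = {"fetch_issue": fetch_issue}

def model_decide(user_input: str) -> dict:
    # Stand-in for the LLM's tool-choice step.
    return {"tool": "fetch_issue", "args": {"issue_number": 42}}

def run_agent(user_input: str) -> str:
    decision = model_decide(user_input)      # 1. read input, pick a tool
    tool = TOOLS[decision["tool"]]
    observation = tool(**decision["args"])   # 2. use the tool
    # 3. return a result built from the observation
    return f"Issue #{observation['number']}: {observation['title']}"

print(run_agent("Check issue #42"))
```

Every framework in this post is an elaboration of this loop; once it is clear in your head, the frameworks' abstractions are much easier to evaluate.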

The right framework is the one that gets out of your way. For me, that means OpenAI Agents SDK for quick scripts and Nebula for production multi-agent systems. Your mileage will vary — and that's the point.
