Jangwook Kim

Posted on • Originally published at jangwook.net

Microsoft AutoGen 0.7.x Multi-Agent Tutorial — From AssistantAgent to GraphFlow From Scratch

When I first tried to pick up AutoGen, Google search results mixed 0.2.x and 0.4.x examples on the same page. One snippet configures an agent with llm_config={"model": "gpt-4"}, another with model_client=OpenAIChatCompletionClient(...). Those two patterns target completely different AutoGen versions and are not interchangeable.

At the time of writing, the latest stable release is 0.7.5, published as the autogen-agentchat package. Its API is a complete break from 0.2.x, so code copied from an old tutorial will not run. This post is based on a direct install and run on macOS, and walks through the new 0.7.x API from the ground up.

Why AutoGen 0.7.x Replaced the Entire API

In 0.2.x, creating an AssistantAgent looked like this:

# 0.2.x pattern (no longer works)
from autogen import AssistantAgent

assistant = AssistantAgent(
    name="assistant",
    llm_config={"model": "gpt-4", "api_key": "..."}
)

In 0.7.x, the model client is a separate injected object:

# 0.7.x pattern
from autogen_agentchat.agents import AssistantAgent
from autogen_ext.models.openai import OpenAIChatCompletionClient

model_client = OpenAIChatCompletionClient(model="gpt-4o", api_key="...")
assistant = AssistantAgent(name="assistant", model_client=model_client)

The motivation is multi-backend model support. In 0.7.x you can swap between Anthropic Claude, Azure OpenAI, Ollama (local LLM), and LLaMA.cpp through the same interface. Change the model without touching agent code.
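To make the "swap without touching agent code" point concrete, here is a minimal sketch of a provider switch. The build_model_client helper and the CLIENTS table are hypothetical names invented for this illustration; the client classes they point at ship in autogen-ext, and each one needs its matching extra installed (for example autogen-ext[openai]).

```python
import importlib
from typing import Any

# Provider name -> (module path, class name).
# The table itself is an illustration, not part of AutoGen.
CLIENTS = {
    "openai": ("autogen_ext.models.openai", "OpenAIChatCompletionClient"),
    "azure": ("autogen_ext.models.openai", "AzureOpenAIChatCompletionClient"),
    "anthropic": ("autogen_ext.models.anthropic", "AnthropicChatCompletionClient"),
    "ollama": ("autogen_ext.models.ollama", "OllamaChatCompletionClient"),
}


def build_model_client(provider: str, **kwargs: Any) -> Any:
    """Hypothetical helper: import the client class lazily and instantiate it."""
    if provider not in CLIENTS:
        raise ValueError(f"Unknown provider: {provider!r}")
    module_name, class_name = CLIENTS[provider]
    # Importing lazily means only the chosen provider's SDK has to be installed.
    module = importlib.import_module(module_name)
    return getattr(module, class_name)(**kwargs)


# Agent code stays the same whichever provider is picked, e.g.:
#   model_client = build_model_client("openai", model="gpt-4o-mini")
#   assistant = AssistantAgent(name="assistant", model_client=model_client)
```

Only the provider string and the model keyword arguments change between backends; everything that receives model_client is untouched.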

Installation (5 Minutes)

python3 -m pip install autogen-agentchat autogen-ext

On my setup (macOS, Python 3.12.8), this installed autogen-agentchat-0.7.5, autogen-core-0.7.5, and autogen-ext-0.7.5 together. These three packages form a layered architecture:

  • autogen-core: message routing, runtime, base abstractions
  • autogen-agentchat: high-level agent/team API designed for human readability
  • autogen-ext: model clients (OpenAI, Anthropic, Ollama, etc.) + CodeExecutor

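To use Anthropic Claude as the backend, the autogen_ext.models.anthropic module is already part of autogen-ext. The provider SDKs themselves are optional extras, though, so if the import fails, install them with pip install "autogen-ext[anthropic]" (and "autogen-ext[openai]" for the OpenAI client).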

Three Core Building Blocks

1. AssistantAgent — The Fundamental Unit

from autogen_agentchat.agents import AssistantAgent
from autogen_ext.models.openai import OpenAIChatCompletionClient

model_client = OpenAIChatCompletionClient(model="gpt-4o-mini")

developer = AssistantAgent(
    name="Developer",
    model_client=model_client,
    system_message="You are a senior Python engineer. Answer questions concisely.",
    tools=[],           # list of FunctionTools (optional)
    handoffs=[],        # other agent names for Swarm routing (optional)
)

AssistantAgent does three things: LLM invocation, tool execution, and message buffer management. It holds state internally; once it joins a team, the team takes over message routing.

2. FunctionTool — Giving Agents Real Capabilities

from autogen_core.tools import FunctionTool

def get_weather(city: str) -> str:
    """Get current weather for a city."""
    return f"{city}: 22°C, partly cloudy"

weather_tool = FunctionTool(
    get_weather,
    name="get_weather",
    description="Retrieve current weather conditions for a city"
)

The type hints are converted into a JSON Schema automatically, and the description is attached to it. Here is what the generated schema actually looks like when you run it:

Tool Schema:
  name: get_weather
  description: Retrieve current weather conditions for a city
  parameters: {
    'city': {'description': 'city', 'title': 'City', 'type': 'string'}
  }

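The description shown here comes from the description argument passed to FunctionTool; if you pass a bare function in an agent's tools list instead, the docstring is used. Either way, write it clearly. A vague description causes the LLM to call the tool in unexpected ways.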

3. Termination Conditions — Preventing Infinite Loops

from autogen_agentchat.conditions import (
    MaxMessageTermination,
    TextMentionTermination,
    TokenUsageTermination,
    TimeoutTermination,
)

# Combine with | (OR) or & (AND)
termination = (
    MaxMessageTermination(max_messages=10) | TextMentionTermination("TERMINATE")
)

In practice the safest setup is a hard cap via MaxMessageTermination plus a task-completion signal via TextMentionTermination. That way a runaway conversation always stops eventually.

Four Team Types — When to Use Which

In AutoGen 0.7.x, a team determines the order and rules by which agents communicate.

[Figure: AutoGen 0.7.x execution log, multi-agent code review session]

RoundRobinGroupChat

Agents speak in order, one turn each. The most predictable pattern.

from autogen_agentchat.teams import RoundRobinGroupChat

team = RoundRobinGroupChat(
    participants=[developer, reviewer],
    termination_condition=MaxMessageTermination(4),
)
result = await team.run(task="Review this code: def add(a, b): return a + b")

Here is the actual output from my run:

[USER] Is `def add(a, b): return a + b` production-ready Python?

[DEVELOPER]
...type hints like `int | float`, docstring, input validation...

[CODEREVIEWER]
...Union[int, float] for Python 3.9 compat... TERMINATE

[RESULT] Stop reason: Text 'TERMINATE' mentioned
[RESULT] Total messages: 3

The Developer → Reviewer ordering was consistently enforced.

SelectorGroupChat

An LLM dynamically chooses who speaks next. Works well when roles are clearly differentiated.

from autogen_agentchat.teams import SelectorGroupChat

team = SelectorGroupChat(
    participants=[planner, coder, tester, reviewer],
    model_client=model_client,
    termination_condition=termination,
)

More flexible than RoundRobin but harder to predict. When I have more than three agents, the dynamic selection pays off. For exactly two agents, RoundRobin is simpler and equally effective.

GraphFlow — The Standout New Feature in 0.7.x

Directed-graph routing. Edges, optionally guarded by conditions, determine which agent runs next.

from autogen_agentchat.teams import GraphFlow, DiGraphBuilder

builder = DiGraphBuilder()
builder.add_node(planner)
builder.add_node(coder)
builder.add_node(tester)

builder.add_edge(planner, coder)
builder.add_edge(coder, tester)

graph = builder.build()
team = GraphFlow(participants=[planner, coder, tester], graph=graph)

Conditional edges are supported too. A feedback loop where a failing test sends execution back to the coder is expressible as a graph. For complex workflows this is far cleaner than hard-coding branching logic inside system prompts.

My honest take: the GraphFlow API is still a bit verbose. There is no equivalent of LangGraph's add_conditional_edges convenience method, so edge definitions get long. That said, having explicit graph routing as a first-class team type, right next to round-robin, selector, and swarm teams, is something few Python agent frameworks offer. I compared this with LangGraph, CrewAI, and Dapr in the AI agent framework comparison post.

Swarm

Handoff-based routing. An agent decides "this task belongs to X, not me" and passes it along.

from autogen_agentchat.teams import Swarm
from autogen_agentchat.conditions import HandoffTermination

triage_agent = AssistantAgent(
    name="Triage",
    model_client=model_client,
    handoffs=["billing_agent", "technical_agent"],
)

team = Swarm(
    participants=[triage_agent, billing_agent, technical_agent],
    termination_condition=HandoffTermination(target="human") | MaxMessageTermination(10),
)

Natural for customer support scenarios where the right agent depends on request type. The strings in handoffs must match the name of the target agents. Since the handoff decision is made by the LLM, each agent's system prompt needs to define its handoff criteria precisely.

Hierarchical Agents: SocietyOfMindAgent

The feature I found most interesting in 0.7.x. An entire agent team can be wrapped as a single agent and plugged into another team.

from autogen_agentchat.agents import SocietyOfMindAgent

inner_team = RoundRobinGroupChat(
    participants=[developer, tester],
    termination_condition=MaxMessageTermination(6),
)

coding_unit = SocietyOfMindAgent(
    name="CodingUnit",
    team=inner_team,
    model_client=model_client,
    response_prompt="Summarize the inner team discussion in one paragraph.",
)

outer_team = RoundRobinGroupChat(
    participants=[coding_unit, product_manager],
    termination_condition=MaxMessageTermination(4),
)

From outside, coding_unit looks like a regular agent. Inside, a developer → tester loop is running. Only the summary surfaces to the outer team. The concept is similar to subagent orchestration in the Claude Agent SDK, but AutoGen makes the team structure more explicit in code.

Limitations I Hit in Practice

1. State is session-scoped

Agent memory in AutoGen 0.7.x only persists within a conversation session. There is no built-in cross-session memory. You need to wire in an external database or memory layer yourself.

2. Debugging is still awkward

Streaming with run_stream() shows each agent's messages, but seeing intermediate tool call results at a glance is difficult. Connecting an external tracing tool like Langfuse is practically essential. I covered the setup in the Langfuse self-hosted tracing guide.

3. Async only

Every API is async/await. Wrap with asyncio.run() for synchronous contexts, and be mindful of async handling when integrating with FastAPI or Django.
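A minimal sketch of that bridge, using only the standard library. Here run_review is a stand-in coroutine for something like await team.run(task=...), and run_review_sync is a hypothetical wrapper name.

```python
import asyncio


async def run_review(task: str) -> str:
    """Stand-in for an AutoGen call such as `await team.run(task=task)`."""
    await asyncio.sleep(0)
    return f"reviewed: {task}"


def run_review_sync(task: str) -> str:
    """Blocking entry point for scripts, CLIs, or synchronous views."""
    try:
        asyncio.get_running_loop()
    except RuntimeError:
        # No event loop is running in this thread, so it is safe to start one.
        return asyncio.run(run_review(task))
    # asyncio.run() cannot be nested inside a running loop.
    raise RuntimeError("Already inside an event loop; use `await run_review(...)` instead.")
```

Inside an async framework handler (a FastAPI async def endpoint, for instance) the loop is already running, so the coroutine should be awaited directly rather than wrapped.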

Complete Code — A Copy-Paste 2-Agent Review Team

import asyncio

from autogen_agentchat.agents import AssistantAgent
from autogen_agentchat.base import TaskResult
from autogen_agentchat.conditions import MaxMessageTermination, TextMentionTermination
from autogen_agentchat.teams import RoundRobinGroupChat
from autogen_ext.models.anthropic import AnthropicChatCompletionClient

async def main():
    # The client reads ANTHROPIC_API_KEY from the environment
    model_client = AnthropicChatCompletionClient(
        model="claude-haiku-4-5-20251001",
    )

    developer = AssistantAgent(
        name="Developer",
        model_client=model_client,
        system_message="""You are a senior Python developer.
Suggest up to 3 code quality improvements, briefly.""",
    )

    reviewer = AssistantAgent(
        name="Reviewer",
        model_client=model_client,
        system_message="""You are a code reviewer.
Review the developer's suggestions, add your own comment, then say TERMINATE to end.""",
    )

    # Hard cap plus an explicit completion signal
    termination = (
        MaxMessageTermination(max_messages=6) |
        TextMentionTermination("TERMINATE")
    )

    team = RoundRobinGroupChat(
        participants=[developer, reviewer],
        termination_condition=termination,
    )

    # Messages arrive one by one; the last item yielded is the TaskResult
    async for message in team.run_stream(
        task="Review this code: def add(a, b): return a + b"
    ):
        if isinstance(message, TaskResult):
            print(f"Stop reason: {message.stop_reason}")
        else:
            print(f"[{message.source}]\n{message.content}\n")

    await model_client.close()

asyncio.run(main())

Set ANTHROPIC_API_KEY in your environment and run it directly.

Testing Without an API Key: ReplayChatCompletionClient

To test agent logic without spending on API calls, ReplayChatCompletionClient returns predefined responses in sequence.

from autogen_ext.models.replay import ReplayChatCompletionClient
from autogen_core.models import CreateResult, RequestUsage

model_client = ReplayChatCompletionClient(
    [
        CreateResult(
            finish_reason="stop",
            content="I'd suggest adding type hints and a docstring.",
            usage=RequestUsage(prompt_tokens=50, completion_tokens=20),
            cached=False,
        ),
        CreateResult(
            finish_reason="stop",
            content="Agreed. Also worth considering Union types for broader compatibility. TERMINATE",
            usage=RequestUsage(prompt_tokens=70, completion_tokens=18),
            cached=False,
        ),
    ]
)

Useful in unit tests and CI pipelines where you want to verify routing logic without a live LLM. One thing to watch: the replay client consumes responses in order. If agents invoke the LLM more times than there are replay entries, the client raises an error once the list runs out. Keep MaxMessageTermination consistent with the size of your replay list, remembering that the task message counts toward the limit too.

Migration Checklist: 0.2.x to 0.7.x

If you have existing 0.2.x code, these are the steps to work through:

  1. Package swap: pyautogen/autogen → autogen-agentchat autogen-ext
  2. Import paths: from autogen import AssistantAgent → from autogen_agentchat.agents import AssistantAgent
  3. Remove llm_config: Replace every llm_config dict with a model_client object
  4. UserProxyAgent role: In 0.7.x, UserProxyAgent no longer handles code execution. That is CodeExecutorAgent's job
  5. Go async: Replace all initiate_chat() calls with await team.run(), or iterate with async for over team.run_stream()
  6. Explicit termination: The human_input_mode="NEVER" pattern is gone. Always pass a termination_condition to the team

The most time-consuming step is usually converting llm_config to model_client. If multiple agents use the same model, share a single client instance for efficiency.

When to Use AutoGen and When to Skip It

My honest position: AutoGen is strong when inter-agent collaboration protocols are complex. Team composition, cross-team routing, and hierarchical agent structures are first-class constructs in the API.

For a single agent with many tools, PydanticAI results in cleaner code — AutoGen's team abstraction becomes unnecessary overhead. The Python AI agent library comparison shows where each library fits.

If Kubernetes-level infrastructure durability is the concern, look at Dapr Agents instead. AutoGen focuses squarely on the agent conversation layer, not infrastructure.

Wrapping Up

AutoGen 0.7.x is a fundamentally different framework from the 0.2.x era. The new API is more explicit and type-safe. GraphFlow and SocietyOfMind are genuinely useful for complex multi-agent workflows — not just architectural showcases.

The ecosystem is still stabilizing. Official docs and examples are version-mixed, making the initial learning curve steeper than it needs to be. The practical first step: always check whether a search result targets 0.2.x or 0.7.x before copying code.


Test environment: macOS, Python 3.12.8, autogen-agentchat 0.7.5 (2026-05-19)

Install: pip install autogen-agentchat autogen-ext
