DEV Community

Chung Duy

Building a Multi-Agent Orchestration System with AG2 (Agentic framework) and Local LLMs

Ever wished you could simulate an entire software development team — a PM, architect, developer, code reviewer, and QA engineer — all collaborating on your project idea? In this tutorial, I'll walk you through building exactly that: a multi-agent orchestration system that transforms a simple project idea into a comprehensive, structured project plan.

We'll use AG2 (formerly AutoGen), a powerful multi-agent framework, paired with local LLMs running on Ollama or LM Studio. No cloud API keys needed.


What We're Building

Here's the big picture: you describe a project idea, and five AI agents take turns analyzing, designing, implementing, reviewing, and testing the plan — just like a real dev team would.

User Input (project idea)
    │
    ▼
   PM ──► Architect ──► Developer ──► Reviewer ──► QA
                            ▲              │
                            └──────────────┘
                      (REVISION NEEDED feedback loop)

Each agent has a specialized role, its own system prompt, and even its own LLM model configuration. The Reviewer can reject work back to the Developer, creating a realistic feedback loop.

Why multi-agent instead of a single prompt? A single LLM prompt trying to do requirements + architecture + implementation + review + testing would produce shallow, generic output. By splitting responsibilities across specialized agents, each one focuses deeply on its domain — and they build on each other's work through shared conversation history.


Prerequisites

Before we start, make sure you have:

  • Python 3.11+ installed
  • Ollama or LM Studio running locally with at least one model downloaded
  • Basic familiarity with Python and LLMs

Step 1: Project Setup

Create a new project directory and set up a virtual environment:

mkdir multi-agents && cd multi-agents
python -m venv venv
source venv/bin/activate  # On Windows: venv\Scripts\activate

What's happening here? We create a folder for our project, then create an isolated Python environment (venv) so our dependencies don't conflict with other projects on your system. The source venv/bin/activate command switches your terminal session into that isolated environment.

Install the dependencies:

pip install "ag2[ollama,openai]" python-dotenv

Why these packages?

  • ag2[ollama,openai] — This is the AG2 framework (the community-maintained continuation of AutoGen) with built-in Ollama and OpenAI integration. AG2 provides the core building blocks: agents, group chats, and orchestration logic. The [ollama] extra installs the adapter for talking to local Ollama models, and the [openai] extra is needed for LM Studio (which exposes an OpenAI-compatible API).
  • python-dotenv — A small utility that loads environment variables from a .env file. This lets us change LLM models and settings without modifying code.

Create a requirements.txt so others can reproduce your setup:

ag2[ollama,openai]
python-dotenv

Step 2: Configure Your LLM Provider

Create a .env file in your project root. This is where we tell the system which LLM provider and models to use.

Option A: Using Ollama

LLM_PROVIDER=ollama                    # Which LLM backend to use
LLM_BASE_URL=http://localhost:11434    # Ollama's default local address
REASONING_MODEL=qwen3:latest           # Model for analytical agents (PM, Architect, QA)
REASONING_TEMPERATURE=0.7              # Higher = more creative reasoning
CODE_MODEL=qwen3:latest               # Model for code-focused agents (Developer, Reviewer)
CODE_TEMPERATURE=0.3                   # Lower = more precise, deterministic code output
LLM_NUM_CTX=8192                       # Context window size in tokens

Option B: Using LM Studio

LLM_PROVIDER=lmstudio                          # Switch to LM Studio backend
LLM_BASE_URL=http://localhost:1234/v1           # LM Studio uses OpenAI-compatible endpoint
REASONING_MODEL=openai/gpt-oss-20b             # A larger model for complex reasoning
REASONING_TEMPERATURE=0.3                       # Lower temp for more consistent analysis
CODE_MODEL=qwen3-coder-next-mlx                # A code-specialized model
CODE_TEMPERATURE=0.1                            # Very low = highly focused code generation
LLM_NUM_CTX=60000                               # Larger context for complex projects

Why two different models? This is what we call a dual-model strategy. Not every agent needs the same kind of intelligence:

  • Reasoning agents (PM, Architect, QA) need to think analytically, weigh trade-offs, and make judgments. A higher temperature gives them more creative room.
  • Code agents (Developer, Reviewer) need precision and consistency. A very low temperature keeps them focused and reduces hallucination in code output.

What is temperature? It controls randomness in LLM output. 0.0 = always pick the most likely token (deterministic), 1.0 = more random/creative. For code, we want low randomness. For analysis, a bit more flexibility helps.
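To make that concrete: temperature divides the model's raw token scores (logits) before they are turned into sampling probabilities. This toy sketch — plain Python, not AG2 code — shows how a low temperature concentrates nearly all probability on the top token:

```python
import math

def softmax_with_temperature(logits, temperature):
    """Convert raw logits to sampling probabilities at a given temperature.

    Lower temperature -> probability mass concentrates on the top token;
    higher temperature -> the distribution flattens toward uniform.
    """
    scaled = [l / temperature for l in logits]
    m = max(scaled)  # subtract the max for numerical stability
    exps = [math.exp(s - m) for s in scaled]
    total = sum(exps)
    return [e / total for e in exps]

# Three candidate tokens with logits 2.0, 1.0, 0.5:
cold = softmax_with_temperature([2.0, 1.0, 0.5], temperature=0.1)
warm = softmax_with_temperature([2.0, 1.0, 0.5], temperature=1.0)
```

At temperature 0.1 the top token gets essentially all of the probability mass; at 1.0 the alternatives stay in play — which is exactly why we run the code agents cold and the reasoning agents warmer.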

What is context window (LLM_NUM_CTX)? This is the maximum number of tokens the model can "see" at once — including the entire conversation history. Since all our agents share one conversation, a larger context window means agents can reference more of what previous agents said.
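A rough budget check makes this concrete. Using the common ~4-characters-per-token heuristic (an approximation — real tokenizers vary), you can estimate whether the accumulating history still fits:

```python
def estimate_tokens(text, chars_per_token=4):
    """Rough token estimate using the common ~4 chars/token heuristic."""
    return len(text) // chars_per_token

def fits_in_context(messages, num_ctx, reserve_for_reply=2000):
    """Check whether the accumulated history leaves room for the next reply."""
    used = sum(estimate_tokens(m) for m in messages)
    return used + reserve_for_reply <= num_ctx

# Five agents each producing ~8000 characters (~2000 tokens) of output:
history = ["x" * 8000 for _ in range(5)]
```

With five agents each writing a few thousand tokens, an 8192-token window is already tight once you reserve room for the next reply — which is why the LM Studio example bumps LLM_NUM_CTX to 60000.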

Now create config.py to load these settings and create LLM configuration objects:

import os
from dotenv import load_dotenv
from autogen import LLMConfig  # the ag2 package installs the "autogen" module

# Load variables from .env file into the environment.
# After this call, os.getenv("LLM_PROVIDER") will return "ollama" or "lmstudio"
# depending on what's in your .env file.
load_dotenv()

# Read each setting from the environment.
# The second argument to os.getenv() is a default value used if the variable isn't set.
provider = os.getenv("LLM_PROVIDER", "ollama")
base_url = os.getenv("LLM_BASE_URL", "http://localhost:11434")
num_ctx = int(os.getenv("LLM_NUM_CTX", "8192"))  # Convert string to integer

# Reasoning model settings — used by PM, Architect, and QA agents.
reasoning_model = os.getenv("REASONING_MODEL", "qwen3:latest")
reasoning_temp = float(os.getenv("REASONING_TEMPERATURE", "0.7"))

# Code model settings — used by Developer and Reviewer agents.
code_model = os.getenv("CODE_MODEL", "qwen3:latest")
code_temp = float(os.getenv("CODE_TEMPERATURE", "0.3"))

# Create LLMConfig objects based on the chosen provider.
# LLMConfig is AG2's way of telling agents how to connect to an LLM.
# We need different configurations because Ollama and LM Studio have different APIs.
if provider == "ollama":
    # Ollama uses its own API format with api_type="ollama" and client_host.
    reasoning_config = LLMConfig(
        model=reasoning_model,       # Which model to use
        api_type="ollama",           # Tell AG2 this is an Ollama backend
        client_host=base_url,        # Where Ollama is running
        temperature=reasoning_temp,  # Controls output randomness
        num_ctx=num_ctx,             # Context window size
    )
    code_config = LLMConfig(
        model=code_model,
        api_type="ollama",
        client_host=base_url,
        temperature=code_temp,
        num_ctx=num_ctx,
    )
else:
    # LM Studio exposes an OpenAI-compatible API, so we use api_key + base_url.
    # The api_key "lm-studio" is a dummy value — LM Studio doesn't require real auth.
    reasoning_config = LLMConfig(
        model=reasoning_model,
        api_key="lm-studio",         # Dummy key — LM Studio doesn't validate it
        base_url=base_url,           # Points to LM Studio's OpenAI-compatible endpoint
        temperature=reasoning_temp,
    )
    code_config = LLMConfig(
        model=code_model,
        api_key="lm-studio",
        base_url=base_url,
        temperature=code_temp,
    )

What does this file produce? Two objects — reasoning_config and code_config — that we'll import into other files. Think of them as "connection settings" that tell each agent which model to use and how to talk to it. By centralizing configuration here, changing a model is just editing .env — no code changes needed.


Step 3: Define the Agents

This is where things get interesting. Each agent is a ConversableAgent from AG2 — an autonomous entity that has its own personality (system prompt), its own LLM connection, and the ability to participate in group conversations.

Create agents.py:

from autogen import ConversableAgent  # the ag2 package installs the "autogen" module
from config import reasoning_config, code_config

What is ConversableAgent? It's AG2's core agent class. Each instance represents one "team member" that can:

  • Receive messages from other agents
  • Generate responses using its assigned LLM
  • Follow rules defined in its system prompt
  • Participate in group chats

The name "Conversable" means these agents are designed to have multi-turn conversations — they remember context and build on previous messages.

The Project Manager (PM)

The PM is the first agent to speak. It receives the user's raw project idea and transforms it into structured requirements:

PM_SYSTEM_MESSAGE = """You are a Senior Project Manager.
Your job is to analyze the user's project request and produce:
1. A clear list of functional and non-functional requirements
2. Project scope and boundaries (what's in, what's out)
3. A structured task breakdown with priorities

Format your response with these sections:
## Requirements
## Scope
## Task Breakdown

Be specific, practical, and prioritize MVP features.
Do NOT write any code. Focus on WHAT needs to be built, not HOW."""

Why this prompt structure? Notice three key design choices:

  1. Role assignment ("You are a Senior Project Manager") — This anchors the LLM's behavior. It will respond as a PM, not a developer or general assistant.
  2. Explicit output format (## Requirements, ## Scope, etc.) — By specifying exact markdown sections, we get consistent, parseable output every time. This matters because downstream agents need to find and reference specific sections.
  3. Boundary instruction ("Do NOT write any code") — Without this, the LLM might jump ahead and start coding. We explicitly constrain each agent to its role.
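Because every agent emits the same fixed ## sections, the output is easy to post-process. As a sketch (this helper is hypothetical, not part of AG2), you could split any agent's reply into named sections:

```python
def split_sections(markdown_text):
    """Split an agent's reply into {section_title: body} using '## ' headers.

    A hypothetical helper that assumes the fixed section format the
    system prompts enforce.
    """
    sections = {}
    current = None
    for line in markdown_text.splitlines():
        if line.startswith("## "):
            current = line[3:].strip()
            sections[current] = []
        elif current is not None:
            sections[current].append(line)
    return {title: "\n".join(body).strip() for title, body in sections.items()}

pm_reply = """## Requirements
- User auth
## Scope
MVP only
## Task Breakdown
1. Auth service"""
parsed = split_sections(pm_reply)
```

This is the payoff of forcing an explicit output format: downstream agents (or your own tooling) can reliably locate the section they need.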

The Architect

The Architect receives the PM's structured requirements and designs the technical blueprint:

ARCHITECT_SYSTEM_MESSAGE = """You are a Senior Software Architect.
Based on the PM's requirements, you must:
1. Propose a tech stack with justification for each choice
2. Design the system architecture (components, services, layers)
3. Define data models and their relationships
4. Describe the data flow and control flow

Format your response with these sections:
## Tech Stack
## Architecture
## Data Models
## Data Flow

Be practical and justify every technical decision.
Do NOT write implementation code — focus on design and structure."""

How does the Architect know what the PM said? All agents share the same conversation history through AG2's GroupChat. When it's the Architect's turn, it can see the full chat — including the user's original idea and the PM's analysis. The instruction "Based on the PM's requirements" tells the LLM to specifically reference and build upon the PM's output.

Why "justify every technical decision"? This produces higher-quality output. When forced to justify choices, the LLM is less likely to pick random technologies and more likely to consider actual trade-offs (e.g., "PostgreSQL for relational data with complex queries" vs. just "use PostgreSQL").

The Developer

The Developer takes the Architect's design and creates the concrete implementation plan:

DEVELOPER_SYSTEM_MESSAGE = """You are a Senior Full-Stack Developer.
Based on the Architect's design, you must:
1. Create a detailed file/folder structure
2. Write an implementation plan with clear ordering
3. Provide key code snippets for critical components
4. Define API endpoints with request/response formats

Format your response with these sections:
## File Structure
## Implementation Plan
## Key Code Snippets
## API Design

Write practical, production-ready code snippets.
Focus on critical paths and complex logic."""

Why "key code snippets" and not "full implementation"? A full implementation would be thousands of lines long and exceed the LLM's output limit. Instead, we ask for critical path code — the trickiest parts that a developer would actually need help with (auth middleware, database schemas, WebSocket handlers, etc.). The file structure and API design provide the roadmap for filling in the rest.

This agent uses code_config — the low-temperature, code-specialized model. This is where the dual-model strategy pays off: code snippets generated at temperature=0.1 are more syntactically correct and consistent than at 0.7.

The Reviewer

The Reviewer is the quality gate — the most important agent for ensuring plan quality:

REVIEWER_SYSTEM_MESSAGE = """You are a Senior Code Reviewer.
Review the entire plan (architecture + implementation) for:
1. Technical consistency between architecture and implementation
2. Feasibility — can this actually be built as described?
3. Missing pieces — gaps in the plan
4. Best practices — security, scalability, maintainability

Format your response with these sections:
## Review Summary
## Issues Found
## Suggestions
## Verdict

CRITICAL: End with exactly one of:
- "APPROVED" if the plan is solid
- "REVISION NEEDED: [specific issues]" if changes are required"""

The CRITICAL instruction is the most important line in the entire system. The words "APPROVED" and "REVISION NEEDED" aren't just text — they're control signals that our orchestrator checks to decide what happens next:

  • If the Reviewer says "APPROVED" → conversation moves forward to QA
  • If the Reviewer says "REVISION NEEDED" → conversation loops back to Developer for fixes

This is how we create a feedback loop using just keyword detection. The Reviewer essentially acts as a router, deciding whether the plan is ready or needs more work. This mirrors real code review workflows where PRs get approved or sent back with comments.
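The keyword check itself is trivial. Here's a standalone sketch of the classification the router performs (a hypothetical helper — the real check lives in the speaker selection function in Step 4):

```python
def parse_verdict(review_text):
    """Classify a Reviewer message by its control keywords (a sketch).

    Case-insensitive, so "APPROVED", "Approved", and "approved" all match.
    """
    text = review_text.lower()
    if "revision needed" in text:
        return "revision"
    if "approved" in text:
        return "approved"
    return "unknown"
```

Note that "REVISION NEEDED" is checked first: a rejection message might contain the word "approved" in passing (e.g., "cannot be approved yet"), so the stronger signal wins.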

The QA Engineer

The QA agent provides the final sign-off with a testing strategy:

QA_SYSTEM_MESSAGE = """You are a Senior QA Engineer.
Create a comprehensive test strategy:
1. Define testing approach (unit, integration, e2e)
2. List key test cases for critical functionality
3. Define acceptance criteria for MVP
4. Recommend testing tools and frameworks

Format your response with these sections:
## Test Strategy
## Key Test Cases
## Acceptance Criteria
## Recommended Tools

End your response with exactly:
"FINAL SIGN-OFF: Project plan is complete."
"""

Why does QA need to say "FINAL SIGN-OFF" exactly? This phrase is the termination signal for the entire orchestration. Our chat manager (which we'll build in Step 4) constantly checks every message for this phrase. When it appears, the system knows the planning session is complete and stops the conversation. Without this, the agents would keep talking in circles.

We put the termination trigger on the QA agent because it's the last agent in the pipeline — only after requirements, architecture, implementation, AND review are all done should the session end.
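Expressed as a standalone predicate (the same idea as the lambda we'll wire into the manager in Step 4), the check is a one-liner:

```python
def is_termination_message(message):
    """Return True when a message carries the QA sign-off phrase (a sketch).

    Case-insensitive substring check over the message's content field.
    """
    return "final sign-off" in (message.get("content") or "").lower()

qa_msg = {"content": "FINAL SIGN-OFF: Project plan is complete."}
mid_msg = {"content": "## Review Summary\nLooks good so far."}
```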

Put It All Together

Now we create a factory function that instantiates all five agents and returns them:

def create_agents():
    # PM agent — receives user input, outputs structured requirements.
    # Uses reasoning_config because requirement analysis is analytical work.
    pm = ConversableAgent(
        name="pm",                              # Unique identifier used in routing
        system_message=PM_SYSTEM_MESSAGE,        # The "personality" and instructions
        description="Project Manager - analyzes requirements",  # Metadata for AG2
        human_input_mode="NEVER",                # Fully autonomous — no human prompts
        llm_config=reasoning_config,             # Connect to the reasoning model
    )

    # Architect agent — reads PM's output, designs technical architecture.
    # Uses reasoning_config because architecture requires analytical thinking.
    architect = ConversableAgent(
        name="architect",
        system_message=ARCHITECT_SYSTEM_MESSAGE,
        description="Architect - designs system architecture",
        human_input_mode="NEVER",
        llm_config=reasoning_config,
    )

    # Developer agent — reads Architect's design, creates implementation plan.
    # Uses code_config because this agent writes code snippets and technical specs.
    developer = ConversableAgent(
        name="developer",
        system_message=DEVELOPER_SYSTEM_MESSAGE,
        description="Developer - creates implementation plan",
        human_input_mode="NEVER",
        llm_config=code_config,                  # Code-specialized model
    )

    # Reviewer agent — reads everything above, approves or rejects.
    # Uses code_config because reviewing code requires precise technical judgment.
    reviewer = ConversableAgent(
        name="reviewer",
        system_message=REVIEWER_SYSTEM_MESSAGE,
        description="Reviewer - reviews and approves plans",
        human_input_mode="NEVER",
        llm_config=code_config,                  # Code-specialized model
    )

    # QA agent — creates test strategy and gives final sign-off.
    # Uses reasoning_config because test strategy is analytical/planning work.
    qa = ConversableAgent(
        name="qa",
        system_message=QA_SYSTEM_MESSAGE,
        description="QA - defines test strategy and sign-off",
        human_input_mode="NEVER",
        llm_config=reasoning_config,
    )

    return pm, architect, developer, reviewer, qa

Key parameters explained:

  • name — A unique string identifier. This is how the orchestrator knows which agent is which. It also appears in the chat log (e.g., "pm (to manager): ...").
  • system_message — The agent's "personality." This is prepended to every LLM call, so the model always knows its role.
  • description — Metadata used by AG2 internally. When send_introductions=True (which we'll set later), this text is shared with other agents so they know who their teammates are.
  • human_input_mode="NEVER" — This tells AG2 to never pause and ask a human for input. The agents run fully autonomously. Other options are "ALWAYS" (ask every turn) and "TERMINATE" (ask only at the end).
  • llm_config — Which LLM connection to use. This is where our dual-model strategy comes to life — different agents get different models and temperatures.

Step 4: Build the Orchestrator

This is the heart of the system. The orchestrator answers two fundamental questions: "Who speaks next?" and "When do we stop?"

Create orchestrator.py:

from autogen import GroupChat, GroupChatManager  # the ag2 package installs the "autogen" module
from agents import create_agents
from config import reasoning_config

# Create all five agents by calling our factory function.
# We unpack them into individual variables so we can reference them
# in the transition graph and speaker selection function.
pm, architect, developer, reviewer, qa = create_agents()

Why unpack into individual variables? We need to reference specific agents (like pm, reviewer) in our routing logic below. If we kept them in a list, the code would be less readable — agents[3] is much harder to understand than reviewer.

Define the Transition Graph

First, we declare which agent is allowed to speak after which. This creates a directed graph:

# This dictionary defines the "rules of conversation."
# Each key is an agent, and its value is a list of agents that can speak next.
# Think of it as a state machine: from state X, you can transition to states Y, Z.
allowed_transitions = {
    pm:        [architect],           # After PM speaks → only Architect can go next
    architect: [developer],           # After Architect → only Developer
    developer: [reviewer],            # After Developer → only Reviewer
    reviewer:  [developer, qa],       # After Reviewer → Developer (revise) OR QA (approve)
    qa:        [pm],                  # After QA → back to PM (but we'll terminate before this)
}

Why define transitions explicitly? Without this, AG2 would allow any agent to speak after any other agent. By constraining transitions, we ensure the conversation follows a logical workflow. The Reviewer having two possible next agents ([developer, qa]) is what creates our feedback loop — the actual choice between them is handled by the speaker selection function below.

Why does QA point back to PM? In practice, we terminate the conversation when QA speaks (via the "FINAL SIGN-OFF" signal). The qa: [pm] transition is just a safety fallback — if for some reason the termination doesn't trigger, the conversation loops back to the beginning rather than crashing.
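You can think of the dict as a tiny state machine. This sketch re-states the same graph with string names (so it runs without AG2 or agent objects) and checks whether a hand-off is legal:

```python
# The same transition graph, keyed by agent name instead of agent object (a sketch).
ALLOWED = {
    "pm": ["architect"],
    "architect": ["developer"],
    "developer": ["reviewer"],
    "reviewer": ["developer", "qa"],  # the one branch point
    "qa": ["pm"],                     # safety fallback
}

def is_valid_transition(current, proposed):
    """Would an "allowed"-type transition check permit this hand-off?"""
    return proposed in ALLOWED.get(current, [])
```

Every state has exactly one successor except reviewer, which is where the feedback loop lives.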

Custom Speaker Selection

This function is called by AG2 after every message to determine who speaks next:

def select_next_speaker(last_speaker, groupchat):
    """Determine which agent speaks next based on who just spoke and what they said.

    Args:
        last_speaker: The agent object that just sent a message.
        groupchat: The GroupChat object containing the full message history.

    Returns:
        The next agent to speak, or None to end the conversation.
    """
    # Get the last message content and convert to lowercase for keyword matching.
    # We check keywords like "approved" to decide routing — case-insensitive
    # so it works whether the LLM outputs "APPROVED", "Approved", or "approved".
    last_msg = groupchat.messages[-1]["content"].lower()

    # Simple linear routing for most agents:
    if last_speaker == pm:
        return architect          # PM done → Architect designs
    elif last_speaker == architect:
        return developer          # Architecture done → Developer implements
    elif last_speaker == developer:
        return reviewer           # Implementation done → Reviewer checks quality

    # The critical branching point — Reviewer decides the path:
    elif last_speaker == reviewer:
        if "approved" in last_msg:
            return qa             # Plan approved → QA does final sign-off
        else:
            return developer      # Not approved → Developer must revise
            # This creates the feedback loop! The Developer will see the
            # Reviewer's feedback in the chat history and address the issues.

    # QA is the last agent — returning None signals "end of conversation"
    elif last_speaker == qa:
        return None

    return None  # Fallback: end conversation if something unexpected happens

Why deterministic routing instead of letting the LLM choose? AG2 supports speaker_selection_method="auto", where the LLM decides who speaks next. This sounds smart, but in practice:

  • The LLM might pick the wrong agent (e.g., QA before the Developer has spoken)
  • It adds an extra LLM call per turn just for routing (slower + more expensive)
  • The conversation order becomes unpredictable between runs

Our deterministic function gives us 100% predictable routing with one exception: the Reviewer's branch. And even that branch is controlled by a simple keyword check, not an LLM decision.

How does the feedback loop work in practice? When the Reviewer says "REVISION NEEDED: Missing input validation on the API endpoints," the conversation routes back to the Developer. The Developer sees the full history — including the Reviewer's feedback — and generates an updated implementation that addresses the issues. Then it goes back to the Reviewer, who checks again. This can repeat until the Reviewer says "APPROVED" or we hit the safety limit.
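Because the routing is deterministic, you can exercise it without any LLM at all. This sketch replays the same logic with plain strings (stand-ins for the agent objects) and walks one rejection cycle:

```python
def route(last_speaker, last_message):
    """String-based replica of select_next_speaker (a sketch for testing).

    Returns the name of the next speaker, or None to end the conversation.
    """
    msg = last_message.lower()
    linear = {"pm": "architect", "architect": "developer", "developer": "reviewer"}
    if last_speaker in linear:
        return linear[last_speaker]
    if last_speaker == "reviewer":
        return "qa" if "approved" in msg else "developer"
    return None  # qa, or anything unexpected, ends the run

# Scripted replies: the reviewer rejects, so the run loops back to the developer.
messages = {
    "pm": "## Requirements ...",
    "architect": "## Tech Stack ...",
    "developer": "## File Structure ...",
    "reviewer": "REVISION NEEDED: missing input validation",
}
trace = ["pm"]
speaker = "pm"
for _ in range(5):
    speaker = route(speaker, messages.get(speaker, ""))
    if speaker is None:
        break
    trace.append(speaker)
```

The trace shows the rejection cycle in miniature: pm → architect → developer → reviewer → developer → reviewer. Flip the reviewer's message to "APPROVED" and the same function routes to qa instead.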

Create the GroupChat

Now we assemble everything into AG2's GroupChat — the container that holds our agents and conversation rules:

group_chat = GroupChat(
    # The list of all agents participating in this conversation.
    # Order doesn't matter here — routing is controlled by select_next_speaker.
    agents=[pm, architect, developer, reviewer, qa],

    # The transition graph we defined above.
    # This acts as a safety net: even if our speaker selection function has a bug,
    # AG2 will reject any transition not in this graph.
    allowed_or_disallowed_speaker_transitions=allowed_transitions,
    speaker_transitions_type="allowed",  # "allowed" means the dict defines PERMITTED transitions

    # Start with an empty message history. Messages accumulate as agents speak.
    messages=[],

    # Safety limit: stop after 15 messages maximum.
    # Without this, a picky Reviewer could send work back to the Developer
    # indefinitely, creating an infinite loop. 15 rounds is enough for
    # the full pipeline + a few revision cycles.
    max_round=15,

    # When True, each agent's description is shared with all others at the start.
    # This gives agents context about who their "teammates" are, leading to
    # better collaboration (e.g., the Architect knows a Reviewer will check its work).
    send_introductions=True,

    # Use our custom function instead of AG2's default LLM-based selection.
    speaker_selection_method=select_next_speaker,
)

What is GroupChat exactly? Think of it as a virtual meeting room. It holds:

  • A list of participants (agents)
  • The conversation rules (who can speak after whom)
  • The shared message history (all agents can read everything)
  • Settings like max rounds and speaker selection

The GroupChat itself doesn't run the conversation — that's the GroupChatManager's job (below). The GroupChat just defines the rules and holds the state.

The Chat Manager

The GroupChatManager is the "moderator" that actually runs the conversation:

manager = GroupChatManager(
    # Link to the GroupChat containing our agents and rules.
    groupchat=group_chat,

    # The manager itself needs an LLM config. Even though we use custom speaker
    # selection (so it doesn't need the LLM for routing), AG2 requires this.
    # We use reasoning_config since it's the more conservative configuration.
    llm_config=reasoning_config,

    # This lambda function is called after every message.
    # It checks if the message contains "final sign-off" (case-insensitive).
    # When QA outputs "FINAL SIGN-OFF: Project plan is complete.",
    # this returns True and the conversation stops gracefully.
    is_termination_msg=lambda msg: "final sign-off" in msg["content"].lower(),
)

How does is_termination_msg work? After every single message in the group chat, AG2 calls this function with the message. It's a simple lambda (one-line anonymous function) that:

  1. Takes the message content: msg["content"]
  2. Converts to lowercase: .lower()
  3. Checks if "final sign-off" appears anywhere in the text
  4. Returns True (stop) or False (continue)

This is why we told the QA agent to end with "FINAL SIGN-OFF: Project plan is complete." in its system prompt — it's the trigger that tells the manager the session is done.

What happens if QA doesn't say "FINAL SIGN-OFF"? The max_round=15 safety limit kicks in. After 15 messages, the conversation stops regardless. This prevents the system from running forever if the LLM doesn't follow instructions perfectly.
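The interplay between the two stop conditions — the sign-off keyword and the round cap — can be sketched as a simple loop over scripted replies (pure Python, no AG2):

```python
def run_session(replies, max_round=15):
    """Consume scripted replies until sign-off or the round cap (a sketch)."""
    history = []
    for reply in replies:
        if len(history) >= max_round:
            break  # safety limit: stop regardless of content
        history.append(reply)
        if "final sign-off" in reply.lower():
            break  # graceful termination on the QA signal
    return history

# A reviewer that never approves: the round cap stops the run.
capped = run_session(["REVISION NEEDED: nitpick"] * 100)
# A well-behaved run stops at the sign-off, well under the cap.
clean = run_session(["plan", "design", "code", "review",
                     "FINAL SIGN-OFF: Project plan is complete."])
```

Whichever condition fires first wins — the happy path ends on the keyword, the pathological path on the cap.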


Step 5: Create the Entry Point

Finally, create main.py — the script that ties everything together and provides the user interface:

from orchestrator import pm, manager

def main():
    # Display a simple banner so the user knows what they're running.
    print("=" * 60)
    print("  Multi-Agent Software Project Planner")
    print("=" * 60)

    # Provide a default project idea so users can quickly test the system
    # without having to think of an idea first.
    default_idea = (
        "Build a REST API for a task management app with user auth, "
        "CRUD operations, and real-time notifications"
    )

    # Prompt the user for their project idea.
    # If they press Enter without typing anything, we use the default.
    user_input = input("\nDescribe your project idea (Enter for default):\n> ").strip()
    if not user_input:
        user_input = default_idea
        print(f"\nUsing default: {default_idea}")

    print("\n" + "=" * 60)
    print("  Starting Planning Session...")
    print("=" * 60 + "\n")

    # This is where the magic happens!
    # pm.initiate_chat() does the following:
    # 1. Sends the user's project idea as the first message
    # 2. The PM agent processes it and generates its response (requirements)
    # 3. The manager takes over, calling select_next_speaker() after each message
    # 4. Agents take turns: PM → Architect → Developer → Reviewer → QA
    # 5. If Reviewer rejects, it loops: Developer → Reviewer → Developer → ...
    # 6. When QA says "FINAL SIGN-OFF", is_termination_msg returns True and it stops
    # 7. The entire conversation history is returned in `result`
    result = pm.initiate_chat(
        manager,        # The GroupChatManager that orchestrates the conversation
        message=user_input,  # The user's project idea becomes the first message
    )

    # Display a summary after the session ends.
    # result.chat_history contains every message from every agent.
    # result.cost tracks token usage / API costs (useful for cloud LLMs).
    print("\n" + "=" * 60)
    print("  Session Complete!")
    print(f"  Messages: {len(result.chat_history)}")
    print(f"  Cost: {result.cost}")
    print("=" * 60)

# Standard Python idiom: only run main() when this file is executed directly,
# not when it's imported by another file.
if __name__ == "__main__":
    main()

What does pm.initiate_chat(manager, message=...) actually do under the hood?

This single line triggers the entire multi-agent pipeline:

  1. The PM receives user_input as a message
  2. The PM calls its LLM with: system prompt + the user's message
  3. The PM's response is added to GroupChat.messages
  4. The manager calls select_next_speaker(pm, groupchat) → returns architect
  5. The Architect calls its LLM with: system prompt + entire chat history so far
  6. Repeat steps 3-5 for each agent in sequence
  7. Eventually QA speaks, is_termination_msg returns True, and the loop ends

Every agent sees the full conversation history when generating its response. This means the Developer can reference both the PM's requirements AND the Architect's design. This shared context is what makes the agents feel like they're truly collaborating.


Step 6: Run It!

Make sure your LLM provider is running (Ollama or LM Studio), then:

python main.py

You'll see something like:

============================================================
  Multi-Agent Software Project Planner
============================================================

Describe your project idea (Enter for default):
> Build a real-time chat application with rooms and file sharing

============================================================
  Starting Planning Session...
============================================================

pm (to manager):
## Requirements
- User registration and authentication
- Real-time messaging with WebSocket support
- Chat rooms (public and private)
- File upload and sharing within rooms
...

architect (to manager):
## Tech Stack
- Backend: Node.js with Express + Socket.io
- Database: PostgreSQL for users/rooms, Redis for pub/sub
- Storage: MinIO for file uploads
...

developer (to manager):
## File Structure
├── src/
│   ├── controllers/
│   ├── models/
│   ├── middleware/
│   ├── services/
│   └── websocket/
...

reviewer (to manager):
## Verdict
APPROVED
...

qa (to manager):
## Test Strategy
...
FINAL SIGN-OFF: Project plan is complete.

============================================================
  Session Complete!
  Messages: 6
============================================================

What to observe: Notice how each agent builds on the previous one's work. The Architect references the PM's requirements. The Developer follows the Architect's tech stack choices. The Reviewer checks consistency between all of them. And QA creates test cases that match the actual implementation plan. This emergent collaboration happens naturally through shared conversation history.


How It All Fits Together

Here's the final project structure:

multi-agents/
├── .env                # LLM provider configuration (models, URLs, temperatures)
├── config.py           # Reads .env → creates reasoning_config and code_config
├── agents.py           # Defines 5 agents with specialized system prompts
├── orchestrator.py     # Wires agents into GroupChat with routing + termination
├── main.py             # Entry point — takes user input, starts the session
└── requirements.txt    # Python dependencies (ag2, python-dotenv)

The data flow through these files:

.env  ──►  config.py  ──►  agents.py  ──►  orchestrator.py  ──►  main.py
(settings)  (LLMConfig)    (5 agents)     (GroupChat +        (user input
                                           Manager +           + run loop)
                                           routing logic)
  1. .env holds all configurable settings (models, temperatures, URLs)
  2. config.py reads .env and creates two LLMConfig objects
  3. agents.py imports configs and creates five specialized ConversableAgent instances
  4. orchestrator.py imports agents, defines the transition graph and speaker selection, creates GroupChat + GroupChatManager
  5. main.py imports the PM and manager, gets user input, and kicks off the conversation

Key Takeaways

1. Deterministic Routing > LLM-Based Routing

Letting the LLM decide who speaks next sounds flexible, but in practice it leads to unpredictable behavior — agents speaking out of turn, skipping steps, or getting stuck in loops. Our custom select_next_speaker() function gives us full control over the conversation flow while still allowing dynamic branching (the Reviewer's approve/revise decision).
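A minimal sketch of such a routing function. For illustration it takes the last message's text directly; the actual AG2 callback receives the last speaker and the GroupChat object instead.

```python
# Deterministic hand-off with one dynamic branch (the Reviewer's verdict).
# Standalone illustration, not the AG2 callback signature.
ORDER = ["pm", "architect", "developer", "reviewer", "qa"]

def select_next_speaker(last_speaker, last_message):
    if last_speaker == "reviewer":
        # The only dynamic decision: loop back to the Developer or move on.
        return "developer" if "REVISION NEEDED" in last_message else "qa"
    idx = ORDER.index(last_speaker)
    return ORDER[idx + 1] if idx + 1 < len(ORDER) else None

print(select_next_speaker("reviewer", "## Verdict\nREVISION NEEDED: no auth"))
# → developer
print(select_next_speaker("reviewer", "## Verdict\nAPPROVED"))
# → qa
```

Every transition except the Reviewer's is a fixed table lookup, which is why the pipeline never skips a step or loops unexpectedly.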

2. Dual-Model Strategy

Not every agent needs the same model. Analytical agents (PM, Architect, QA) benefit from reasoning-focused models with moderate temperature, while implementation agents (Developer, Reviewer) need precision with low temperature. Splitting configurations lets you optimize both quality and cost — use a cheaper model for simple tasks, a better one for complex reasoning.
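As a rough sketch, the two configs might look like this. The model names and URLs are placeholders for whatever you run locally, and the dict shape follows the OpenAI-compatible config_list convention; verify it against your installed ag2 version.

```python
# Sketch of a dual-model setup. Model names, URLs, and temperatures are
# placeholders, not a fixed contract.
reasoning_config = {
    "config_list": [{
        "model": "qwen2.5:14b",                  # placeholder reasoning model
        "base_url": "http://localhost:11434/v1",  # Ollama's OpenAI-compatible endpoint
        "api_key": "ollama",                      # Ollama ignores the key
    }],
    "temperature": 0.7,  # some creativity for analysis and design
}

code_config = {
    "config_list": [{
        "model": "qwen2.5-coder:7b",             # placeholder code model
        "base_url": "http://localhost:11434/v1",
        "api_key": "ollama",
    }],
    "temperature": 0.2,  # precision for implementation and review
}
```

PM, Architect, and QA would take reasoning_config; Developer and Reviewer would take code_config.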

3. Structured Output Formats

Each agent's system prompt specifies exact output sections (## Requirements, ## Tech Stack, etc.). This isn't just about readability — it makes outputs consistent and parseable. When the Developer needs to reference the Architect's tech stack, it knows exactly where to look in the conversation. Structured outputs also make it easier to extract and save results programmatically.
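For example, a small helper (not part of the tutorial's code, just an illustration) can pull a section out of an agent's message with a regex, precisely because the headings are fixed:

```python
import re

# Extract the body of one "## Heading" section from an agent's message.
# Illustrative helper; relies on agents emitting the fixed section headings.
def extract_section(text, heading):
    pattern = rf"## {re.escape(heading)}\n(.*?)(?=\n## |\Z)"
    match = re.search(pattern, text, re.DOTALL)
    return match.group(1).strip() if match else None

architect_msg = "## Tech Stack\n- Node.js + Socket.io\n- PostgreSQL\n\n## Risks\n- ..."
print(extract_section(architect_msg, "Tech Stack"))
# → - Node.js + Socket.io
#   - PostgreSQL
```

No extra LLM call is needed to save, say, the tech stack to a file after the session ends.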

4. Keyword-Driven Control Flow

The Reviewer's "APPROVED" / "REVISION NEEDED" and QA's "FINAL SIGN-OFF" are more than just text — they're control signals that drive the orchestration logic. This is a simple but powerful pattern: use natural language keywords as routing triggers. The LLM generates them naturally as part of its response, and our code checks for them to make routing decisions. No complex parsing or additional LLM calls needed.
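The termination side of this pattern is equally small. The message dict shape below mirrors what AG2 passes to an is_termination_msg callback, but this standalone version is a sketch rather than the library's own code:

```python
# Keyword check used as a termination signal. Sketch of the predicate the
# GroupChat is configured with; not copied from the AG2 source.
def is_termination_msg(message: dict) -> bool:
    return "FINAL SIGN-OFF" in (message.get("content") or "")

print(is_termination_msg({"content": "All tests covered. FINAL SIGN-OFF"}))
# → True
print(is_termination_msg({"content": "## Verdict\nAPPROVED"}))
# → False
```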

5. Safety Mechanisms Matter

The max_round=15 limit prevents infinite revision loops. Without it, a picky Reviewer could keep sending work back to the Developer forever, burning tokens and time. Always build in safety limits for multi-agent systems. Other safety patterns include timeout limits, cost caps, and fallback behaviors.
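A rough cost cap can be layered on top of the round limit. The thresholds and helper name below are illustrative, not AG2 API; total characters of history stand in as a cheap proxy for token spend:

```python
# Check accumulated history against both a message-count and a size budget.
# Illustrative safety helper; thresholds are placeholders.
def within_budget(msgs, max_messages=15, max_total_chars=60_000):
    total_chars = sum(len(m["content"]) for m in msgs)
    return len(msgs) <= max_messages and total_chars <= max_total_chars

msgs = [{"content": "x" * 1000} for _ in range(5)]
print(within_budget(msgs))                    # → True
print(within_budget(msgs, max_messages=3))    # → False
```

A check like this could run inside the speaker-selection step and return None (ending the chat) once the budget is exhausted.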


Source code

The complete source code for this project is available on GitHub.
