<?xml version="1.0" encoding="UTF-8"?>
<rss version="2.0" xmlns:atom="http://www.w3.org/2005/Atom" xmlns:dc="http://purl.org/dc/elements/1.1/">
  <channel>
    <title>DEV Community: Chung Duy</title>
    <description>The latest articles on DEV Community by Chung Duy (@chung_duy_51a346946b27a3d).</description>
    <link>https://dev.to/chung_duy_51a346946b27a3d</link>
    <image>
      <url>https://media2.dev.to/dynamic/image/width=90,height=90,fit=cover,gravity=auto,format=auto/https:%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Fuser%2Fprofile_image%2F3692659%2F8813d9cb-b85a-4fce-a811-701637eb094f.png</url>
      <title>DEV Community: Chung Duy</title>
      <link>https://dev.to/chung_duy_51a346946b27a3d</link>
    </image>
    <atom:link rel="self" type="application/rss+xml" href="https://dev.to/feed/chung_duy_51a346946b27a3d"/>
    <language>en</language>
    <item>
      <title>Building a Multi-Agent Orchestration System with AG2 (Agentic framework) and Local LLMs</title>
      <dc:creator>Chung Duy</dc:creator>
      <pubDate>Mon, 16 Feb 2026 09:32:20 +0000</pubDate>
      <link>https://dev.to/chung_duy_51a346946b27a3d/building-a-multi-agent-orchestration-system-with-ag2-agentic-framework-and-local-llms-4d3g</link>
      <guid>https://dev.to/chung_duy_51a346946b27a3d/building-a-multi-agent-orchestration-system-with-ag2-agentic-framework-and-local-llms-4d3g</guid>
      <description>&lt;p&gt;Ever wished you could simulate an entire software development team — a PM, architect, developer, code reviewer, and QA engineer — all collaborating on your project idea? In this tutorial, I'll walk you through building exactly that: a &lt;strong&gt;multi-agent orchestration system&lt;/strong&gt; that transforms a simple project idea into a comprehensive, structured project plan.&lt;/p&gt;

&lt;p&gt;We'll use &lt;strong&gt;&lt;a href="https://github.com/ag2ai/ag2" rel="noopener noreferrer"&gt;AG2&lt;/a&gt;&lt;/strong&gt; (formerly AutoGen), a powerful multi-agent framework, paired with &lt;strong&gt;local LLMs&lt;/strong&gt; running on Ollama or LM Studio. No cloud API keys needed.&lt;/p&gt;




&lt;h2&gt;
  
  
  What We're Building
&lt;/h2&gt;

&lt;p&gt;Here's the big picture: you describe a project idea, and five AI agents take turns analyzing, designing, implementing, reviewing, and testing the plan — just like a real dev team would.&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;User Input (project idea)
    │
    ▼
   PM ──► Architect ──► Developer ──► Reviewer ──► QA
                            ▲              │
                            └──────────────┘
                      (REVISION NEEDED feedback loop)
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;Each agent has a specialized role, its own system prompt, and even its own LLM model configuration. The Reviewer can reject work back to the Developer, creating a realistic feedback loop.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Why multi-agent instead of a single prompt?&lt;/strong&gt; A single LLM prompt trying to do requirements + architecture + implementation + review + testing would produce shallow, generic output. By splitting responsibilities across specialized agents, each one focuses deeply on its domain — and they build on each other's work through shared conversation history.&lt;/p&gt;




&lt;h2&gt;
  
  
  Prerequisites
&lt;/h2&gt;

&lt;p&gt;Before we start, make sure you have:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;
&lt;strong&gt;Python 3.11+&lt;/strong&gt; installed&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Ollama&lt;/strong&gt; or &lt;strong&gt;LM Studio&lt;/strong&gt; running locally with at least one model downloaded&lt;/li&gt;
&lt;li&gt;Basic familiarity with Python and LLMs&lt;/li&gt;
&lt;/ul&gt;




&lt;h2&gt;
  
  
  Step 1: Project Setup
&lt;/h2&gt;

&lt;p&gt;Create a new project directory and set up a virtual environment:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight shell"&gt;&lt;code&gt;&lt;span class="nb"&gt;mkdir &lt;/span&gt;multi-agents &lt;span class="o"&gt;&amp;amp;&amp;amp;&lt;/span&gt; &lt;span class="nb"&gt;cd &lt;/span&gt;multi-agents
python &lt;span class="nt"&gt;-m&lt;/span&gt; venv venv
&lt;span class="nb"&gt;source &lt;/span&gt;venv/bin/activate  &lt;span class="c"&gt;# On Windows: venv\Scripts\activate&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;blockquote&gt;
&lt;p&gt;&lt;strong&gt;What's happening here?&lt;/strong&gt; We create a folder for the project, then create an isolated Python environment (&lt;code&gt;venv&lt;/code&gt;) so its dependencies don't conflict with other projects on your system. The &lt;code&gt;source venv/bin/activate&lt;/code&gt; command switches your terminal into that isolated environment.&lt;/p&gt;
&lt;/blockquote&gt;

&lt;p&gt;Install the dependencies:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight shell"&gt;&lt;code&gt;pip &lt;span class="nb"&gt;install&lt;/span&gt; &lt;span class="s2"&gt;"ag2[ollama,openai]"&lt;/span&gt; python-dotenv
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;blockquote&gt;
&lt;p&gt;&lt;strong&gt;Why these packages?&lt;/strong&gt;&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;
&lt;code&gt;ag2[ollama,openai]&lt;/code&gt; — This is the AG2 framework (the community-driven continuation of AutoGen) with built-in Ollama and OpenAI integration. AG2 provides the core building blocks: agents, group chats, and orchestration logic. The &lt;code&gt;[ollama]&lt;/code&gt; extra installs the adapter for talking to local Ollama models, and the &lt;code&gt;[openai]&lt;/code&gt; extra is needed for LM Studio (which exposes an OpenAI-compatible API).&lt;/li&gt;
&lt;li&gt;
&lt;code&gt;python-dotenv&lt;/code&gt; — A small utility that loads environment variables from a &lt;code&gt;.env&lt;/code&gt; file. This lets us change LLM models and settings without modifying code.&lt;/li&gt;
&lt;/ul&gt;
&lt;/blockquote&gt;

&lt;p&gt;Create a &lt;code&gt;requirements.txt&lt;/code&gt; so others can reproduce your setup:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;ag2[ollama,openai]
python-dotenv
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;






&lt;h2&gt;
  
  
  Step 2: Configure Your LLM Provider
&lt;/h2&gt;

&lt;p&gt;Create a &lt;code&gt;.env&lt;/code&gt; file in your project root. This is where we tell the system which LLM provider and models to use.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Option A: Using Ollama&lt;/strong&gt;&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;LLM_PROVIDER=ollama                    # Which LLM backend to use
LLM_BASE_URL=http://localhost:11434    # Ollama's default local address
REASONING_MODEL=qwen3:latest           # Model for analytical agents (PM, Architect, QA)
REASONING_TEMPERATURE=0.7              # Higher = more creative reasoning
CODE_MODEL=qwen3:latest               # Model for code-focused agents (Developer, Reviewer)
CODE_TEMPERATURE=0.3                   # Lower = more precise, deterministic code output
LLM_NUM_CTX=8192                       # Context window size in tokens
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;&lt;strong&gt;Option B: Using LM Studio&lt;/strong&gt;&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;LLM_PROVIDER=lmstudio                          # Switch to LM Studio backend
LLM_BASE_URL=http://localhost:1234/v1           # LM Studio uses OpenAI-compatible endpoint
REASONING_MODEL=openai/gpt-oss-20b             # A larger model for complex reasoning
REASONING_TEMPERATURE=0.3                       # Lower temp for more consistent analysis
CODE_MODEL=qwen3-coder-next-mlx                # A code-specialized model
CODE_TEMPERATURE=0.1                            # Very low = highly focused code generation
LLM_NUM_CTX=60000                               # Larger context for complex projects
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;blockquote&gt;
&lt;p&gt;&lt;strong&gt;Why two different models?&lt;/strong&gt; This is what we call a &lt;strong&gt;dual-model strategy&lt;/strong&gt;. Not every agent needs the same kind of intelligence:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;
&lt;strong&gt;Reasoning agents&lt;/strong&gt; (PM, Architect, QA) need to think analytically, weigh trade-offs, and make judgments. A higher temperature gives them more creative room.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Code agents&lt;/strong&gt; (Developer, Reviewer) need precision and consistency. A very low temperature keeps them focused and reduces hallucination in code output.&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;&lt;strong&gt;What is temperature?&lt;/strong&gt; It controls randomness in LLM output. &lt;code&gt;0.0&lt;/code&gt; = always pick the most likely token (deterministic), &lt;code&gt;1.0&lt;/code&gt; = more random/creative. For code, we want low randomness. For analysis, a bit more flexibility helps.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;What is context window (&lt;code&gt;LLM_NUM_CTX&lt;/code&gt;)?&lt;/strong&gt; This is the maximum number of tokens the model can "see" at once — including the entire conversation history. Since all our agents share one conversation, a larger context window means agents can reference more of what previous agents said.&lt;/p&gt;
&lt;/blockquote&gt;
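&lt;p&gt;To make the temperature idea concrete, here's a toy softmax calculation (a self-contained sketch with made-up logits, not AG2 code):&lt;/p&gt;

```python
import math

def softmax_with_temperature(logits, temperature):
    """Scale logits by 1/temperature, then apply softmax."""
    scaled = [x / temperature for x in logits]
    m = max(scaled)  # subtract the max for numerical stability
    exps = [math.exp(x - m) for x in scaled]
    total = sum(exps)
    return [e / total for e in exps]

# Hypothetical logits for three candidate tokens.
logits = [2.0, 1.0, 0.5]

cold = softmax_with_temperature(logits, 0.1)  # near-deterministic
warm = softmax_with_temperature(logits, 1.0)  # more varied

print(cold)  # the top token gets almost all of the probability mass
print(warm)  # a noticeably flatter distribution across all three tokens
```

&lt;p&gt;At &lt;code&gt;0.1&lt;/code&gt; the model almost always emits the single most likely token; at &lt;code&gt;1.0&lt;/code&gt; the alternatives keep meaningful probability, which is why we reserve higher values for the reasoning agents.&lt;/p&gt;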

&lt;p&gt;Now create &lt;code&gt;config.py&lt;/code&gt; to load these settings and create LLM configuration objects:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight python"&gt;&lt;code&gt;&lt;span class="kn"&gt;import&lt;/span&gt; &lt;span class="n"&gt;os&lt;/span&gt;
&lt;span class="kn"&gt;from&lt;/span&gt; &lt;span class="n"&gt;dotenv&lt;/span&gt; &lt;span class="kn"&gt;import&lt;/span&gt; &lt;span class="n"&gt;load_dotenv&lt;/span&gt;
&lt;span class="kn"&gt;from&lt;/span&gt; &lt;span class="n"&gt;ag2&lt;/span&gt; &lt;span class="kn"&gt;import&lt;/span&gt; &lt;span class="n"&gt;LLMConfig&lt;/span&gt;

&lt;span class="c1"&gt;# Load variables from .env file into the environment.
# After this call, os.getenv("LLM_PROVIDER") will return "ollama" or "lmstudio"
# depending on what's in your .env file.
&lt;/span&gt;&lt;span class="nf"&gt;load_dotenv&lt;/span&gt;&lt;span class="p"&gt;()&lt;/span&gt;

&lt;span class="c1"&gt;# Read each setting from the environment.
# The second argument to os.getenv() is a default value used if the variable isn't set.
&lt;/span&gt;&lt;span class="n"&gt;provider&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;os&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;getenv&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;LLM_PROVIDER&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;ollama&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
&lt;span class="n"&gt;base_url&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;os&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;getenv&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;LLM_BASE_URL&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;http://localhost:11434&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
&lt;span class="n"&gt;num_ctx&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="nf"&gt;int&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;os&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;getenv&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;LLM_NUM_CTX&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;8192&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;))&lt;/span&gt;  &lt;span class="c1"&gt;# Convert string to integer
&lt;/span&gt;
&lt;span class="c1"&gt;# Reasoning model settings — used by PM, Architect, and QA agents.
&lt;/span&gt;&lt;span class="n"&gt;reasoning_model&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;os&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;getenv&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;REASONING_MODEL&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;qwen3:latest&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
&lt;span class="n"&gt;reasoning_temp&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="nf"&gt;float&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;os&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;getenv&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;REASONING_TEMPERATURE&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;0.7&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;))&lt;/span&gt;

&lt;span class="c1"&gt;# Code model settings — used by Developer and Reviewer agents.
&lt;/span&gt;&lt;span class="n"&gt;code_model&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;os&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;getenv&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;CODE_MODEL&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;qwen3:latest&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
&lt;span class="n"&gt;code_temp&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="nf"&gt;float&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;os&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;getenv&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;CODE_TEMPERATURE&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;0.3&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;))&lt;/span&gt;

&lt;span class="c1"&gt;# Create LLMConfig objects based on the chosen provider.
# LLMConfig is AG2's way of telling agents how to connect to an LLM.
# We need different configurations because Ollama and LM Studio have different APIs.
&lt;/span&gt;&lt;span class="k"&gt;if&lt;/span&gt; &lt;span class="n"&gt;provider&lt;/span&gt; &lt;span class="o"&gt;==&lt;/span&gt; &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;ollama&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;
    &lt;span class="c1"&gt;# Ollama uses its own API format with api_type="ollama" and client_host.
&lt;/span&gt;    &lt;span class="n"&gt;reasoning_config&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="nc"&gt;LLMConfig&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;
        &lt;span class="n"&gt;model&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="n"&gt;reasoning_model&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;       &lt;span class="c1"&gt;# Which model to use
&lt;/span&gt;        &lt;span class="n"&gt;api_type&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;ollama&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;           &lt;span class="c1"&gt;# Tell AG2 this is an Ollama backend
&lt;/span&gt;        &lt;span class="n"&gt;client_host&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="n"&gt;base_url&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;        &lt;span class="c1"&gt;# Where Ollama is running
&lt;/span&gt;        &lt;span class="n"&gt;temperature&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="n"&gt;reasoning_temp&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;  &lt;span class="c1"&gt;# Controls output randomness
&lt;/span&gt;        &lt;span class="n"&gt;num_ctx&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="n"&gt;num_ctx&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;             &lt;span class="c1"&gt;# Context window size
&lt;/span&gt;    &lt;span class="p"&gt;)&lt;/span&gt;
    &lt;span class="n"&gt;code_config&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="nc"&gt;LLMConfig&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;
        &lt;span class="n"&gt;model&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="n"&gt;code_model&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
        &lt;span class="n"&gt;api_type&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;ollama&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
        &lt;span class="n"&gt;client_host&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="n"&gt;base_url&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
        &lt;span class="n"&gt;temperature&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="n"&gt;code_temp&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
        &lt;span class="n"&gt;num_ctx&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="n"&gt;num_ctx&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
    &lt;span class="p"&gt;)&lt;/span&gt;
&lt;span class="k"&gt;else&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;
    &lt;span class="c1"&gt;# LM Studio exposes an OpenAI-compatible API, so we use api_key + base_url.
&lt;/span&gt;    &lt;span class="c1"&gt;# The api_key "lm-studio" is a dummy value — LM Studio doesn't require real auth.
&lt;/span&gt;    &lt;span class="n"&gt;reasoning_config&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="nc"&gt;LLMConfig&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;
        &lt;span class="n"&gt;model&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="n"&gt;reasoning_model&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
        &lt;span class="n"&gt;api_key&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;lm-studio&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;         &lt;span class="c1"&gt;# Dummy key — LM Studio doesn't validate it
&lt;/span&gt;        &lt;span class="n"&gt;base_url&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="n"&gt;base_url&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;           &lt;span class="c1"&gt;# Points to LM Studio's OpenAI-compatible endpoint
&lt;/span&gt;        &lt;span class="n"&gt;temperature&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="n"&gt;reasoning_temp&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
    &lt;span class="p"&gt;)&lt;/span&gt;
    &lt;span class="n"&gt;code_config&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="nc"&gt;LLMConfig&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;
        &lt;span class="n"&gt;model&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="n"&gt;code_model&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
        &lt;span class="n"&gt;api_key&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;lm-studio&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
        &lt;span class="n"&gt;base_url&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="n"&gt;base_url&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
        &lt;span class="n"&gt;temperature&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="n"&gt;code_temp&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
    &lt;span class="p"&gt;)&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;blockquote&gt;
&lt;p&gt;&lt;strong&gt;What does this file produce?&lt;/strong&gt; Two objects — &lt;code&gt;reasoning_config&lt;/code&gt; and &lt;code&gt;code_config&lt;/code&gt; — that we'll import into other files. Think of them as "connection settings" that tell each agent which model to use and how to talk to it. By centralizing configuration here, changing a model is just editing &lt;code&gt;.env&lt;/code&gt; — no code changes needed.&lt;/p&gt;
&lt;/blockquote&gt;




&lt;h2&gt;
  
  
  Step 3: Define the Agents
&lt;/h2&gt;

&lt;p&gt;This is where things get interesting. Each agent is a &lt;code&gt;ConversableAgent&lt;/code&gt; from AG2 — an autonomous entity that has its own personality (system prompt), its own LLM connection, and the ability to participate in group conversations.&lt;/p&gt;

&lt;p&gt;Create &lt;code&gt;agents.py&lt;/code&gt;:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight python"&gt;&lt;code&gt;&lt;span class="kn"&gt;from&lt;/span&gt; &lt;span class="n"&gt;ag2&lt;/span&gt; &lt;span class="kn"&gt;import&lt;/span&gt; &lt;span class="n"&gt;ConversableAgent&lt;/span&gt;
&lt;span class="kn"&gt;from&lt;/span&gt; &lt;span class="n"&gt;config&lt;/span&gt; &lt;span class="kn"&gt;import&lt;/span&gt; &lt;span class="n"&gt;reasoning_config&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;code_config&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;blockquote&gt;
&lt;p&gt;&lt;strong&gt;What is &lt;code&gt;ConversableAgent&lt;/code&gt;?&lt;/strong&gt; It's AG2's core agent class. Each instance represents one "team member" that can:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Receive messages from other agents&lt;/li&gt;
&lt;li&gt;Generate responses using its assigned LLM&lt;/li&gt;
&lt;li&gt;Follow rules defined in its system prompt&lt;/li&gt;
&lt;li&gt;Participate in group chats&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;The name "Conversable" means these agents are designed to have multi-turn conversations — they remember context and build on previous messages.&lt;/p&gt;
&lt;/blockquote&gt;

&lt;h3&gt;
  
  
  The Project Manager (PM)
&lt;/h3&gt;

&lt;p&gt;The PM is the first agent to speak. It receives the user's raw project idea and transforms it into structured requirements:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight python"&gt;&lt;code&gt;&lt;span class="n"&gt;PM_SYSTEM_MESSAGE&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="sh"&gt;"""&lt;/span&gt;&lt;span class="s"&gt;You are a Senior Project Manager.
Your job is to analyze the user&lt;/span&gt;&lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="s"&gt;s project request and produce:
1. A clear list of functional and non-functional requirements
2. Project scope and boundaries (what&lt;/span&gt;&lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="s"&gt;s in, what&lt;/span&gt;&lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="s"&gt;s out)
3. A structured task breakdown with priorities

Format your response with these sections:
## Requirements
## Scope
## Task Breakdown

Be specific, practical, and prioritize MVP features.
Do NOT write any code. Focus on WHAT needs to be built, not HOW.&lt;/span&gt;&lt;span class="sh"&gt;"""&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;blockquote&gt;
&lt;p&gt;&lt;strong&gt;Why this prompt structure?&lt;/strong&gt; Notice three key design choices:&lt;/p&gt;

&lt;ol&gt;
&lt;li&gt;
&lt;strong&gt;Role assignment&lt;/strong&gt; (&lt;code&gt;"You are a Senior Project Manager"&lt;/code&gt;) — This anchors the LLM's behavior. It will respond as a PM, not a developer or general assistant.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Explicit output format&lt;/strong&gt; (&lt;code&gt;## Requirements&lt;/code&gt;, &lt;code&gt;## Scope&lt;/code&gt;, etc.) — By specifying exact markdown sections, we get consistent, parseable output every time. This matters because downstream agents need to find and reference specific sections.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Boundary instruction&lt;/strong&gt; (&lt;code&gt;"Do NOT write any code"&lt;/code&gt;) — Without this, the LLM might jump ahead and start coding. We explicitly constrain each agent to its role.&lt;/li&gt;
&lt;/ol&gt;
&lt;/blockquote&gt;
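&lt;p&gt;Because the prompt pins down exact headings, downstream code can check the PM's output structurally. A minimal sketch (the &lt;code&gt;missing_sections&lt;/code&gt; helper is illustrative, not part of AG2):&lt;/p&gt;

```python
REQUIRED_SECTIONS = ["## Requirements", "## Scope", "## Task Breakdown"]

def missing_sections(reply, required=REQUIRED_SECTIONS):
    """Return the section headings that are absent from an agent's reply."""
    return [h for h in required if h not in reply]

sample = """## Requirements
- User auth
## Scope
- MVP only
## Task Breakdown
1. Set up repo"""

print(missing_sections(sample))           # []
print(missing_sections("## Scope only"))  # ['## Requirements', '## Task Breakdown']
```

&lt;p&gt;The same pattern works for every agent in the pipeline: fixed headings turn free-form LLM text into something you can validate, and retry on, programmatically.&lt;/p&gt;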

&lt;h3&gt;
  
  
  The Architect
&lt;/h3&gt;

&lt;p&gt;The Architect receives the PM's structured requirements and designs the technical blueprint:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight python"&gt;&lt;code&gt;&lt;span class="n"&gt;ARCHITECT_SYSTEM_MESSAGE&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="sh"&gt;"""&lt;/span&gt;&lt;span class="s"&gt;You are a Senior Software Architect.
Based on the PM&lt;/span&gt;&lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="s"&gt;s requirements, you must:
1. Propose a tech stack with justification for each choice
2. Design the system architecture (components, services, layers)
3. Define data models and their relationships
4. Describe the data flow and control flow

Format your response with these sections:
## Tech Stack
## Architecture
## Data Models
## Data Flow

Be practical and justify every technical decision.
Do NOT write implementation code — focus on design and structure.&lt;/span&gt;&lt;span class="sh"&gt;"""&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;blockquote&gt;
&lt;p&gt;&lt;strong&gt;How does the Architect know what the PM said?&lt;/strong&gt; All agents share the same conversation history through AG2's &lt;code&gt;GroupChat&lt;/code&gt;. When it's the Architect's turn, it can see the full chat — including the user's original idea and the PM's analysis. The instruction &lt;code&gt;"Based on the PM's requirements"&lt;/code&gt; tells the LLM to specifically reference and build upon the PM's output.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Why "justify every technical decision"?&lt;/strong&gt; This produces higher-quality output. When forced to justify choices, the LLM is less likely to pick random technologies and more likely to consider actual trade-offs (e.g., "PostgreSQL for relational data with complex queries" vs. just "use PostgreSQL").&lt;/p&gt;
&lt;/blockquote&gt;

&lt;h3&gt;
  
  
  The Developer
&lt;/h3&gt;

&lt;p&gt;The Developer takes the Architect's design and creates the concrete implementation plan:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight python"&gt;&lt;code&gt;&lt;span class="n"&gt;DEVELOPER_SYSTEM_MESSAGE&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="sh"&gt;"""&lt;/span&gt;&lt;span class="s"&gt;You are a Senior Full-Stack Developer.
Based on the Architect&lt;/span&gt;&lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="s"&gt;s design, you must:
1. Create a detailed file/folder structure
2. Write an implementation plan with clear ordering
3. Provide key code snippets for critical components
4. Define API endpoints with request/response formats

Format your response with these sections:
## File Structure
## Implementation Plan
## Key Code Snippets
## API Design

Write practical, production-ready code snippets.
Focus on critical paths and complex logic.&lt;/span&gt;&lt;span class="sh"&gt;"""&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;blockquote&gt;
&lt;p&gt;&lt;strong&gt;Why "key code snippets" and not "full implementation"?&lt;/strong&gt; A full implementation would be thousands of lines long and exceed the LLM's output limit. Instead, we ask for &lt;strong&gt;critical path code&lt;/strong&gt; — the trickiest parts that a developer would actually need help with (auth middleware, database schemas, WebSocket handlers, etc.). The file structure and API design provide the roadmap for filling in the rest.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;This agent uses &lt;code&gt;code_config&lt;/code&gt;&lt;/strong&gt; — the low-temperature, code-specialized model. This is where the dual-model strategy pays off: code snippets generated at &lt;code&gt;temperature=0.1&lt;/code&gt; are more syntactically correct and consistent than at &lt;code&gt;0.7&lt;/code&gt;.&lt;/p&gt;
&lt;/blockquote&gt;

&lt;h3&gt;
  
  
  The Reviewer
&lt;/h3&gt;

&lt;p&gt;The Reviewer is the quality gate — the most important agent for ensuring plan quality:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight python"&gt;&lt;code&gt;&lt;span class="n"&gt;REVIEWER_SYSTEM_MESSAGE&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="sh"&gt;"""&lt;/span&gt;&lt;span class="s"&gt;You are a Senior Code Reviewer.
Review the entire plan (architecture + implementation) for:
1. Technical consistency between architecture and implementation
2. Feasibility — can this actually be built as described?
3. Missing pieces — gaps in the plan
4. Best practices — security, scalability, maintainability

Format your response with these sections:
## Review Summary
## Issues Found
## Suggestions
## Verdict

CRITICAL: End with exactly one of:
- &lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;APPROVED&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt; if the plan is solid
- &lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;REVISION NEEDED: [specific issues]&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt; if changes are required&lt;/span&gt;&lt;span class="sh"&gt;"""&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;blockquote&gt;
&lt;p&gt;&lt;strong&gt;The &lt;code&gt;CRITICAL&lt;/code&gt; instruction is the most important line in the entire system.&lt;/strong&gt; The words "APPROVED" and "REVISION NEEDED" aren't just text — they're &lt;strong&gt;control signals&lt;/strong&gt; that our orchestrator checks to decide what happens next:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;If the Reviewer says "APPROVED" → conversation moves forward to QA&lt;/li&gt;
&lt;li&gt;If the Reviewer says "REVISION NEEDED" → conversation loops back to Developer for fixes&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;This is how we create a &lt;strong&gt;feedback loop&lt;/strong&gt; using just keyword detection. The Reviewer essentially acts as a router, deciding whether the plan is ready or needs more work. This mirrors real code review workflows where PRs get approved or sent back with comments.&lt;/p&gt;
&lt;/blockquote&gt;
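&lt;p&gt;The routing this describes can be sketched as plain keyword detection (a simplified stand-in for the speaker-selection logic we wire up in Step 4; the function name and returned agent names are illustrative):&lt;/p&gt;

```python
def route_after_reviewer(last_message):
    """Decide who speaks after the Reviewer, based on its verdict keywords."""
    if "REVISION NEEDED" in last_message:
        return "developer"   # loop back for fixes
    if "APPROVED" in last_message:
        return "qa"          # move forward to testing
    return "reviewer"        # no clear verdict: ask the Reviewer to restate

print(route_after_reviewer("## Verdict\nAPPROVED"))                    # developer? no: qa
print(route_after_reviewer("REVISION NEEDED: missing auth handling"))  # developer
```

&lt;p&gt;Checking for &lt;code&gt;REVISION NEEDED&lt;/code&gt; before &lt;code&gt;APPROVED&lt;/code&gt; matters: a sloppy verdict that mentions both phrases should still trigger a revision rather than slip through to QA.&lt;/p&gt;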

&lt;h3&gt;
  
  
  The QA Engineer
&lt;/h3&gt;

&lt;p&gt;The QA agent provides the final sign-off with a testing strategy:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight python"&gt;&lt;code&gt;&lt;span class="n"&gt;QA_SYSTEM_MESSAGE&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="sh"&gt;"""&lt;/span&gt;&lt;span class="s"&gt;You are a Senior QA Engineer.
Create a comprehensive test strategy:
1. Define testing approach (unit, integration, e2e)
2. List key test cases for critical functionality
3. Define acceptance criteria for MVP
4. Recommend testing tools and frameworks

Format your response with these sections:
## Test Strategy
## Key Test Cases
## Acceptance Criteria
## Recommended Tools

End your response with exactly:
&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;FINAL SIGN-OFF: Project plan is complete.&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;
&lt;/span&gt;&lt;span class="sh"&gt;"""&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;blockquote&gt;
&lt;p&gt;&lt;strong&gt;Why does QA need to say "FINAL SIGN-OFF" exactly?&lt;/strong&gt; This phrase is the &lt;strong&gt;termination signal&lt;/strong&gt; for the entire orchestration. Our chat manager (which we'll build in Step 4) constantly checks every message for this phrase. When it appears, the system knows the planning session is complete and stops the conversation. Without this, the agents would keep talking in circles.&lt;/p&gt;

&lt;p&gt;We put the termination trigger on the QA agent because it's the last agent in the pipeline — only after requirements, architecture, implementation, AND review are all done should the session end.&lt;/p&gt;
&lt;/blockquote&gt;

&lt;h3&gt;
  
  
  Put It All Together
&lt;/h3&gt;

&lt;p&gt;Now we create a factory function that instantiates all five agents and returns them:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight python"&gt;&lt;code&gt;&lt;span class="k"&gt;def&lt;/span&gt; &lt;span class="nf"&gt;create_agents&lt;/span&gt;&lt;span class="p"&gt;():&lt;/span&gt;
    &lt;span class="c1"&gt;# PM agent — receives user input, outputs structured requirements.
&lt;/span&gt;    &lt;span class="c1"&gt;# Uses reasoning_config because requirement analysis is analytical work.
&lt;/span&gt;    &lt;span class="n"&gt;pm&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="nc"&gt;ConversableAgent&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;
        &lt;span class="n"&gt;name&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;pm&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;                              &lt;span class="c1"&gt;# Unique identifier used in routing
&lt;/span&gt;        &lt;span class="n"&gt;system_message&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="n"&gt;PM_SYSTEM_MESSAGE&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;        &lt;span class="c1"&gt;# The "personality" and instructions
&lt;/span&gt;        &lt;span class="n"&gt;description&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;Project Manager - analyzes requirements&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;  &lt;span class="c1"&gt;# Metadata for AG2
&lt;/span&gt;        &lt;span class="n"&gt;human_input_mode&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;NEVER&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;                &lt;span class="c1"&gt;# Fully autonomous — no human prompts
&lt;/span&gt;        &lt;span class="n"&gt;llm_config&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="n"&gt;reasoning_config&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;             &lt;span class="c1"&gt;# Connect to the reasoning model
&lt;/span&gt;    &lt;span class="p"&gt;)&lt;/span&gt;

    &lt;span class="c1"&gt;# Architect agent — reads PM's output, designs technical architecture.
&lt;/span&gt;    &lt;span class="c1"&gt;# Uses reasoning_config because architecture requires analytical thinking.
&lt;/span&gt;    &lt;span class="n"&gt;architect&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="nc"&gt;ConversableAgent&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;
        &lt;span class="n"&gt;name&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;architect&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
        &lt;span class="n"&gt;system_message&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="n"&gt;ARCHITECT_SYSTEM_MESSAGE&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
        &lt;span class="n"&gt;description&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;Architect - designs system architecture&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
        &lt;span class="n"&gt;human_input_mode&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;NEVER&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
        &lt;span class="n"&gt;llm_config&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="n"&gt;reasoning_config&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
    &lt;span class="p"&gt;)&lt;/span&gt;

    &lt;span class="c1"&gt;# Developer agent — reads Architect's design, creates implementation plan.
&lt;/span&gt;    &lt;span class="c1"&gt;# Uses code_config because this agent writes code snippets and technical specs.
&lt;/span&gt;    &lt;span class="n"&gt;developer&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="nc"&gt;ConversableAgent&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;
        &lt;span class="n"&gt;name&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;developer&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
        &lt;span class="n"&gt;system_message&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="n"&gt;DEVELOPER_SYSTEM_MESSAGE&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
        &lt;span class="n"&gt;description&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;Developer - creates implementation plan&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
        &lt;span class="n"&gt;human_input_mode&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;NEVER&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
        &lt;span class="n"&gt;llm_config&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="n"&gt;code_config&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;                  &lt;span class="c1"&gt;# Code-specialized model
&lt;/span&gt;    &lt;span class="p"&gt;)&lt;/span&gt;

    &lt;span class="c1"&gt;# Reviewer agent — reads everything above, approves or rejects.
&lt;/span&gt;    &lt;span class="c1"&gt;# Uses code_config because reviewing code requires precise technical judgment.
&lt;/span&gt;    &lt;span class="n"&gt;reviewer&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="nc"&gt;ConversableAgent&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;
        &lt;span class="n"&gt;name&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;reviewer&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
        &lt;span class="n"&gt;system_message&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="n"&gt;REVIEWER_SYSTEM_MESSAGE&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
        &lt;span class="n"&gt;description&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;Reviewer - reviews and approves plans&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
        &lt;span class="n"&gt;human_input_mode&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;NEVER&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
        &lt;span class="n"&gt;llm_config&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="n"&gt;code_config&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;                  &lt;span class="c1"&gt;# Code-specialized model
&lt;/span&gt;    &lt;span class="p"&gt;)&lt;/span&gt;

    &lt;span class="c1"&gt;# QA agent — creates test strategy and gives final sign-off.
&lt;/span&gt;    &lt;span class="c1"&gt;# Uses reasoning_config because test strategy is analytical/planning work.
&lt;/span&gt;    &lt;span class="n"&gt;qa&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="nc"&gt;ConversableAgent&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;
        &lt;span class="n"&gt;name&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;qa&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
        &lt;span class="n"&gt;system_message&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="n"&gt;QA_SYSTEM_MESSAGE&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
        &lt;span class="n"&gt;description&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;QA - defines test strategy and sign-off&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
        &lt;span class="n"&gt;human_input_mode&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;NEVER&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
        &lt;span class="n"&gt;llm_config&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="n"&gt;reasoning_config&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
    &lt;span class="p"&gt;)&lt;/span&gt;

    &lt;span class="k"&gt;return&lt;/span&gt; &lt;span class="n"&gt;pm&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;architect&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;developer&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;reviewer&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;qa&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;blockquote&gt;
&lt;p&gt;&lt;strong&gt;Key parameters explained:&lt;/strong&gt;&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;
&lt;code&gt;name&lt;/code&gt; — A unique string identifier. This is how the orchestrator knows which agent is which. It also appears in the chat log (e.g., &lt;code&gt;"pm (to manager): ..."&lt;/code&gt;).&lt;/li&gt;
&lt;li&gt;
&lt;code&gt;system_message&lt;/code&gt; — The agent's "personality." This is prepended to every LLM call, so the model always knows its role.&lt;/li&gt;
&lt;li&gt;
&lt;code&gt;description&lt;/code&gt; — Metadata used by AG2 internally. When &lt;code&gt;send_introductions=True&lt;/code&gt; (which we'll set later), this text is shared with other agents so they know who their teammates are.&lt;/li&gt;
&lt;li&gt;
&lt;code&gt;human_input_mode="NEVER"&lt;/code&gt; — This tells AG2 to never pause and ask a human for input. The agents run fully autonomously. Other options are &lt;code&gt;"ALWAYS"&lt;/code&gt; (ask every turn) and &lt;code&gt;"TERMINATE"&lt;/code&gt; (ask only at the end).&lt;/li&gt;
&lt;li&gt;
&lt;code&gt;llm_config&lt;/code&gt; — Which LLM connection to use. This is where our dual-model strategy comes to life — different agents get different models and temperatures.&lt;/li&gt;
&lt;/ul&gt;
&lt;/blockquote&gt;




&lt;h2&gt;
  
  
  Step 4: Build the Orchestrator
&lt;/h2&gt;

&lt;p&gt;This is the heart of the system. The orchestrator answers two fundamental questions: &lt;strong&gt;"Who speaks next?"&lt;/strong&gt; and &lt;strong&gt;"When do we stop?"&lt;/strong&gt;&lt;/p&gt;

&lt;p&gt;Create &lt;code&gt;orchestrator.py&lt;/code&gt;:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight python"&gt;&lt;code&gt;&lt;span class="kn"&gt;from&lt;/span&gt; &lt;span class="n"&gt;ag2&lt;/span&gt; &lt;span class="kn"&gt;import&lt;/span&gt; &lt;span class="n"&gt;GroupChat&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;GroupChatManager&lt;/span&gt;
&lt;span class="kn"&gt;from&lt;/span&gt; &lt;span class="n"&gt;agents&lt;/span&gt; &lt;span class="kn"&gt;import&lt;/span&gt; &lt;span class="n"&gt;create_agents&lt;/span&gt;
&lt;span class="kn"&gt;from&lt;/span&gt; &lt;span class="n"&gt;config&lt;/span&gt; &lt;span class="kn"&gt;import&lt;/span&gt; &lt;span class="n"&gt;reasoning_config&lt;/span&gt;

&lt;span class="c1"&gt;# Create all five agents by calling our factory function.
# We unpack them into individual variables so we can reference them
# in the transition graph and speaker selection function.
&lt;/span&gt;&lt;span class="n"&gt;pm&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;architect&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;developer&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;reviewer&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;qa&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="nf"&gt;create_agents&lt;/span&gt;&lt;span class="p"&gt;()&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;blockquote&gt;
&lt;p&gt;&lt;strong&gt;Why unpack into individual variables?&lt;/strong&gt; We need to reference specific agents (like &lt;code&gt;pm&lt;/code&gt;, &lt;code&gt;reviewer&lt;/code&gt;) in our routing logic below. If we kept them in a list, the code would be less readable — &lt;code&gt;agents[3]&lt;/code&gt; is much harder to understand than &lt;code&gt;reviewer&lt;/code&gt;.&lt;/p&gt;
&lt;/blockquote&gt;

&lt;h3&gt;
  
  
  Define the Transition Graph
&lt;/h3&gt;

&lt;p&gt;First, we declare which agent is allowed to speak after which. This creates a directed graph:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight python"&gt;&lt;code&gt;&lt;span class="c1"&gt;# This dictionary defines the "rules of conversation."
# Each key is an agent, and its value is a list of agents that can speak next.
# Think of it as a state machine: from state X, you can transition to states Y, Z.
&lt;/span&gt;&lt;span class="n"&gt;allowed_transitions&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt;
    &lt;span class="n"&gt;pm&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;        &lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="n"&gt;architect&lt;/span&gt;&lt;span class="p"&gt;],&lt;/span&gt;           &lt;span class="c1"&gt;# After PM speaks → only Architect can go next
&lt;/span&gt;    &lt;span class="n"&gt;architect&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="n"&gt;developer&lt;/span&gt;&lt;span class="p"&gt;],&lt;/span&gt;           &lt;span class="c1"&gt;# After Architect → only Developer
&lt;/span&gt;    &lt;span class="n"&gt;developer&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="n"&gt;reviewer&lt;/span&gt;&lt;span class="p"&gt;],&lt;/span&gt;            &lt;span class="c1"&gt;# After Developer → only Reviewer
&lt;/span&gt;    &lt;span class="n"&gt;reviewer&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;  &lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="n"&gt;developer&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;qa&lt;/span&gt;&lt;span class="p"&gt;],&lt;/span&gt;       &lt;span class="c1"&gt;# After Reviewer → Developer (revise) OR QA (approve)
&lt;/span&gt;    &lt;span class="n"&gt;qa&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;        &lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="n"&gt;pm&lt;/span&gt;&lt;span class="p"&gt;],&lt;/span&gt;                  &lt;span class="c1"&gt;# After QA → back to PM (but we'll terminate before this)
&lt;/span&gt;&lt;span class="p"&gt;}&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;blockquote&gt;
&lt;p&gt;&lt;strong&gt;Why define transitions explicitly?&lt;/strong&gt; Without this, AG2 would allow any agent to speak after any other agent. By constraining transitions, we ensure the conversation follows a logical workflow. The Reviewer having two possible next agents (&lt;code&gt;[developer, qa]&lt;/code&gt;) is what creates our feedback loop — the actual choice between them is handled by the speaker selection function below.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Why does QA point back to PM?&lt;/strong&gt; In practice, we terminate the conversation when QA speaks (via the "FINAL SIGN-OFF" signal). The &lt;code&gt;qa: [pm]&lt;/code&gt; transition is just a safety fallback — if for some reason the termination doesn't trigger, the conversation loops back to the beginning rather than crashing.&lt;/p&gt;
&lt;/blockquote&gt;
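&lt;p&gt;As a quick mental model, the graph amounts to plain dictionary lookups. This sketch uses agent names as keys in place of the &lt;code&gt;ConversableAgent&lt;/code&gt; objects the real code uses:&lt;/p&gt;

```python
# Illustrative stand-in for the transition graph, keyed by agent name.
allowed = {
    "pm":        ["architect"],
    "architect": ["developer"],
    "developer": ["reviewer"],
    "reviewer":  ["developer", "qa"],  # the branch that enables the feedback loop
    "qa":        ["pm"],               # safety fallback, normally never taken
}

def can_speak_next(current: str, proposed: str) -> bool:
    """True only if the graph permits `proposed` to speak after `current`."""
    return proposed in allowed.get(current, [])

print(can_speak_next("reviewer", "qa"))  # True: the approval path exists
print(can_speak_next("pm", "qa"))        # False: QA can't jump the queue
```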

&lt;h3&gt;
  
  
  Custom Speaker Selection
&lt;/h3&gt;

&lt;p&gt;This function is called by AG2 after every message to determine who speaks next:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight python"&gt;&lt;code&gt;&lt;span class="k"&gt;def&lt;/span&gt; &lt;span class="nf"&gt;select_next_speaker&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;last_speaker&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;groupchat&lt;/span&gt;&lt;span class="p"&gt;):&lt;/span&gt;
    &lt;span class="sh"&gt;"""&lt;/span&gt;&lt;span class="s"&gt;Determine which agent speaks next based on who just spoke and what they said.

    Args:
        last_speaker: The agent object that just sent a message.
        groupchat: The GroupChat object containing the full message history.

    Returns:
        The next agent to speak, or None to end the conversation.
    &lt;/span&gt;&lt;span class="sh"&gt;"""&lt;/span&gt;
    &lt;span class="c1"&gt;# Get the last message content and convert to lowercase for keyword matching.
&lt;/span&gt;    &lt;span class="c1"&gt;# We check keywords like "approved" to decide routing — case-insensitive
&lt;/span&gt;    &lt;span class="c1"&gt;# so it works whether the LLM outputs "APPROVED", "Approved", or "approved".
&lt;/span&gt;    &lt;span class="n"&gt;last_msg&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;groupchat&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;messages&lt;/span&gt;&lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="o"&gt;-&lt;/span&gt;&lt;span class="mi"&gt;1&lt;/span&gt;&lt;span class="p"&gt;][&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;content&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;].&lt;/span&gt;&lt;span class="nf"&gt;lower&lt;/span&gt;&lt;span class="p"&gt;()&lt;/span&gt;

    &lt;span class="c1"&gt;# Simple linear routing for most agents:
&lt;/span&gt;    &lt;span class="k"&gt;if&lt;/span&gt; &lt;span class="n"&gt;last_speaker&lt;/span&gt; &lt;span class="o"&gt;==&lt;/span&gt; &lt;span class="n"&gt;pm&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;
        &lt;span class="k"&gt;return&lt;/span&gt; &lt;span class="n"&gt;architect&lt;/span&gt;          &lt;span class="c1"&gt;# PM done → Architect designs
&lt;/span&gt;    &lt;span class="k"&gt;elif&lt;/span&gt; &lt;span class="n"&gt;last_speaker&lt;/span&gt; &lt;span class="o"&gt;==&lt;/span&gt; &lt;span class="n"&gt;architect&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;
        &lt;span class="k"&gt;return&lt;/span&gt; &lt;span class="n"&gt;developer&lt;/span&gt;          &lt;span class="c1"&gt;# Architecture done → Developer implements
&lt;/span&gt;    &lt;span class="k"&gt;elif&lt;/span&gt; &lt;span class="n"&gt;last_speaker&lt;/span&gt; &lt;span class="o"&gt;==&lt;/span&gt; &lt;span class="n"&gt;developer&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;
        &lt;span class="k"&gt;return&lt;/span&gt; &lt;span class="n"&gt;reviewer&lt;/span&gt;           &lt;span class="c1"&gt;# Implementation done → Reviewer checks quality
&lt;/span&gt;
    &lt;span class="c1"&gt;# The critical branching point — Reviewer decides the path:
&lt;/span&gt;    &lt;span class="k"&gt;elif&lt;/span&gt; &lt;span class="n"&gt;last_speaker&lt;/span&gt; &lt;span class="o"&gt;==&lt;/span&gt; &lt;span class="n"&gt;reviewer&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;
        &lt;span class="k"&gt;if&lt;/span&gt; &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;approved&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt; &lt;span class="ow"&gt;in&lt;/span&gt; &lt;span class="n"&gt;last_msg&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;
            &lt;span class="k"&gt;return&lt;/span&gt; &lt;span class="n"&gt;qa&lt;/span&gt;             &lt;span class="c1"&gt;# Plan approved → QA does final sign-off
&lt;/span&gt;        &lt;span class="k"&gt;else&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;
            &lt;span class="k"&gt;return&lt;/span&gt; &lt;span class="n"&gt;developer&lt;/span&gt;      &lt;span class="c1"&gt;# Not approved → Developer must revise
&lt;/span&gt;            &lt;span class="c1"&gt;# This creates the feedback loop! The Developer will see the
&lt;/span&gt;            &lt;span class="c1"&gt;# Reviewer's feedback in the chat history and address the issues.
&lt;/span&gt;
    &lt;span class="c1"&gt;# QA is the last agent — returning None signals "end of conversation"
&lt;/span&gt;    &lt;span class="k"&gt;elif&lt;/span&gt; &lt;span class="n"&gt;last_speaker&lt;/span&gt; &lt;span class="o"&gt;==&lt;/span&gt; &lt;span class="n"&gt;qa&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;
        &lt;span class="k"&gt;return&lt;/span&gt; &lt;span class="bp"&gt;None&lt;/span&gt;

    &lt;span class="k"&gt;return&lt;/span&gt; &lt;span class="bp"&gt;None&lt;/span&gt;  &lt;span class="c1"&gt;# Fallback: end conversation if something unexpected happens
&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;blockquote&gt;
&lt;p&gt;&lt;strong&gt;Why deterministic routing instead of letting the LLM choose?&lt;/strong&gt; AG2 supports &lt;code&gt;speaker_selection_method="auto"&lt;/code&gt;, where the LLM decides who speaks next. This sounds smart, but in practice:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;The LLM might pick the wrong agent (e.g., QA before the Developer has spoken)&lt;/li&gt;
&lt;li&gt;It adds an extra LLM call per turn just for routing (slower + more expensive)&lt;/li&gt;
&lt;li&gt;The conversation order becomes unpredictable between runs&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;Our deterministic function gives us &lt;strong&gt;100% predictable routing&lt;/strong&gt; with one exception: the Reviewer's branch. And even that branch is controlled by a simple keyword check, not an LLM decision.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;How does the feedback loop work in practice?&lt;/strong&gt; When the Reviewer says "REVISION NEEDED: Missing input validation on the API endpoints," the conversation routes back to the Developer. The Developer sees the full history — including the Reviewer's feedback — and generates an updated implementation that addresses the issues. Then it goes back to the Reviewer, who checks again. This can repeat until the Reviewer says "APPROVED" or we hit the safety limit.&lt;/p&gt;
&lt;/blockquote&gt;
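&lt;p&gt;To see the loop's shape, here is a toy simulation that replays the routing logic against a scripted sequence of Reviewer verdicts. No AG2 objects are involved; it only mirrors the control flow described above:&lt;/p&gt;

```python
def simulate_review_loop(verdicts, max_round=15):
    """Replay the speaker order for a scripted sequence of Reviewer verdicts."""
    transcript = ["pm", "architect", "developer"]  # linear opening phase
    for verdict in verdicts:
        transcript.append("reviewer")
        if len(transcript) >= max_round:   # mirrors GroupChat's safety limit
            break
        if "approved" in verdict.lower():
            transcript.append("qa")        # approval ends the loop
            break
        transcript.append("developer")     # rejection: one more revision cycle
    return transcript

# Two rejections, then approval: the Developer ends up speaking three times.
print(simulate_review_loop(["REVISION NEEDED", "REVISION NEEDED", "APPROVED"]))
```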

&lt;h3&gt;
  
  
  Create the GroupChat
&lt;/h3&gt;

&lt;p&gt;Now we assemble everything into AG2's &lt;code&gt;GroupChat&lt;/code&gt; — the container that holds our agents and conversation rules:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight python"&gt;&lt;code&gt;&lt;span class="n"&gt;group_chat&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="nc"&gt;GroupChat&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;
    &lt;span class="c1"&gt;# The list of all agents participating in this conversation.
&lt;/span&gt;    &lt;span class="c1"&gt;# Order doesn't matter here — routing is controlled by select_next_speaker.
&lt;/span&gt;    &lt;span class="n"&gt;agents&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="n"&gt;pm&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;architect&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;developer&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;reviewer&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;qa&lt;/span&gt;&lt;span class="p"&gt;],&lt;/span&gt;

    &lt;span class="c1"&gt;# The transition graph we defined above.
&lt;/span&gt;    &lt;span class="c1"&gt;# This acts as a safety net: even if our speaker selection function has a bug,
&lt;/span&gt;    &lt;span class="c1"&gt;# AG2 will reject any transition not in this graph.
&lt;/span&gt;    &lt;span class="n"&gt;allowed_or_disallowed_speaker_transitions&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="n"&gt;allowed_transitions&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
    &lt;span class="n"&gt;speaker_transitions_type&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;allowed&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;  &lt;span class="c1"&gt;# "allowed" means the dict defines PERMITTED transitions
&lt;/span&gt;
    &lt;span class="c1"&gt;# Start with an empty message history. Messages accumulate as agents speak.
&lt;/span&gt;    &lt;span class="n"&gt;messages&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="p"&gt;[],&lt;/span&gt;

    &lt;span class="c1"&gt;# Safety limit: stop after 15 messages maximum.
&lt;/span&gt;    &lt;span class="c1"&gt;# Without this, a picky Reviewer could send work back to the Developer
&lt;/span&gt;    &lt;span class="c1"&gt;# indefinitely, creating an infinite loop. 15 rounds is enough for
&lt;/span&gt;    &lt;span class="c1"&gt;# the full pipeline + a few revision cycles.
&lt;/span&gt;    &lt;span class="n"&gt;max_round&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="mi"&gt;15&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;

    &lt;span class="c1"&gt;# When True, each agent's description is shared with all others at the start.
&lt;/span&gt;    &lt;span class="c1"&gt;# This gives agents context about who their "teammates" are, leading to
&lt;/span&gt;    &lt;span class="c1"&gt;# better collaboration (e.g., the Architect knows a Reviewer will check its work).
&lt;/span&gt;    &lt;span class="n"&gt;send_introductions&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="bp"&gt;True&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;

    &lt;span class="c1"&gt;# Use our custom function instead of AG2's default LLM-based selection.
&lt;/span&gt;    &lt;span class="n"&gt;speaker_selection_method&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="n"&gt;select_next_speaker&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
&lt;span class="p"&gt;)&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;blockquote&gt;
&lt;p&gt;&lt;strong&gt;What is &lt;code&gt;GroupChat&lt;/code&gt; exactly?&lt;/strong&gt; Think of it as a virtual meeting room. It holds:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;A list of participants (agents)&lt;/li&gt;
&lt;li&gt;The conversation rules (who can speak after whom)&lt;/li&gt;
&lt;li&gt;The shared message history (all agents can read everything)&lt;/li&gt;
&lt;li&gt;Settings like max rounds and speaker selection&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;The &lt;code&gt;GroupChat&lt;/code&gt; itself doesn't run the conversation — that's the &lt;code&gt;GroupChatManager&lt;/code&gt;'s job (below). The &lt;code&gt;GroupChat&lt;/code&gt; just defines the rules and holds the state.&lt;/p&gt;
&lt;/blockquote&gt;

&lt;h3&gt;
  
  
  The Chat Manager
&lt;/h3&gt;

&lt;p&gt;The &lt;code&gt;GroupChatManager&lt;/code&gt; is the "moderator" that actually runs the conversation:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight python"&gt;&lt;code&gt;&lt;span class="n"&gt;manager&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="nc"&gt;GroupChatManager&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;
    &lt;span class="c1"&gt;# Link to the GroupChat containing our agents and rules.
&lt;/span&gt;    &lt;span class="n"&gt;groupchat&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="n"&gt;group_chat&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;

    &lt;span class="c1"&gt;# The manager itself needs an LLM config. Even though we use custom speaker
&lt;/span&gt;    &lt;span class="c1"&gt;# selection (so it doesn't need the LLM for routing), AG2 requires this.
&lt;/span&gt;    &lt;span class="c1"&gt;# We use reasoning_config since it's the more conservative configuration.
&lt;/span&gt;    &lt;span class="n"&gt;llm_config&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="n"&gt;reasoning_config&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;

    &lt;span class="c1"&gt;# This lambda function is called after every message.
&lt;/span&gt;    &lt;span class="c1"&gt;# It checks if the message contains "final sign-off" (case-insensitive).
&lt;/span&gt;    &lt;span class="c1"&gt;# When QA outputs "FINAL SIGN-OFF: Project plan is complete.",
&lt;/span&gt;    &lt;span class="c1"&gt;# this returns True and the conversation stops gracefully.
&lt;/span&gt;    &lt;span class="n"&gt;is_termination_msg&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="k"&gt;lambda&lt;/span&gt; &lt;span class="n"&gt;msg&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;final sign-off&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt; &lt;span class="ow"&gt;in&lt;/span&gt; &lt;span class="n"&gt;msg&lt;/span&gt;&lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;content&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;].&lt;/span&gt;&lt;span class="nf"&gt;lower&lt;/span&gt;&lt;span class="p"&gt;(),&lt;/span&gt;
&lt;span class="p"&gt;)&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;blockquote&gt;
&lt;p&gt;&lt;strong&gt;How does &lt;code&gt;is_termination_msg&lt;/code&gt; work?&lt;/strong&gt; After every single message in the group chat, AG2 calls this function with the message. It's a simple lambda (one-line anonymous function) that:&lt;/p&gt;

&lt;ol&gt;
&lt;li&gt;Takes the message content: &lt;code&gt;msg["content"]&lt;/code&gt;
&lt;/li&gt;
&lt;li&gt;Converts to lowercase: &lt;code&gt;.lower()&lt;/code&gt;
&lt;/li&gt;
&lt;li&gt;Checks if "final sign-off" appears anywhere in the text&lt;/li&gt;
&lt;li&gt;Returns &lt;code&gt;True&lt;/code&gt; (stop) or &lt;code&gt;False&lt;/code&gt; (continue)&lt;/li&gt;
&lt;/ol&gt;

&lt;p&gt;This is why we told the QA agent to end with &lt;code&gt;"FINAL SIGN-OFF: Project plan is complete."&lt;/code&gt; in its system prompt — it's the trigger that tells the manager the session is done.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;What happens if QA doesn't say "FINAL SIGN-OFF"?&lt;/strong&gt; The &lt;code&gt;max_round=15&lt;/code&gt; safety limit kicks in. After 15 messages, the conversation stops regardless. This prevents the system from running forever if the LLM doesn't follow instructions perfectly.&lt;/p&gt;
&lt;/blockquote&gt;
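&lt;p&gt;You can replay the check by hand to confirm the behavior. Note that the &lt;code&gt;or ""&lt;/code&gt; guard below is an extra hardening I'd suggest, not something the lambda above already does — in some chat frameworks, tool-call messages can carry &lt;code&gt;content=None&lt;/code&gt;, which would make a bare &lt;code&gt;.lower()&lt;/code&gt; raise:&lt;/p&gt;

```python
# Standalone replay of the termination predicate, with a None-content guard
# added as a suggested hardening (an assumption, not AG2's stock behavior).
def is_termination_msg(msg: dict) -> bool:
    return "final sign-off" in (msg.get("content") or "").lower()

print(is_termination_msg({"content": "FINAL SIGN-OFF: Project plan is complete."}))  # True
print(is_termination_msg({"content": "## Test Strategy\n..."}))                      # False
print(is_termination_msg({"content": None}))                                         # False
```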




&lt;h2&gt;
  
  
  Step 5: Create the Entry Point
&lt;/h2&gt;

&lt;p&gt;Finally, create &lt;code&gt;main.py&lt;/code&gt; — the script that ties everything together and provides the user interface:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight python"&gt;&lt;code&gt;&lt;span class="kn"&gt;from&lt;/span&gt; &lt;span class="n"&gt;orchestrator&lt;/span&gt; &lt;span class="kn"&gt;import&lt;/span&gt; &lt;span class="n"&gt;pm&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;manager&lt;/span&gt;

&lt;span class="k"&gt;def&lt;/span&gt; &lt;span class="nf"&gt;main&lt;/span&gt;&lt;span class="p"&gt;():&lt;/span&gt;
    &lt;span class="c1"&gt;# Display a simple banner so the user knows what they're running.
&lt;/span&gt;    &lt;span class="nf"&gt;print&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;=&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt; &lt;span class="o"&gt;*&lt;/span&gt; &lt;span class="mi"&gt;60&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
    &lt;span class="nf"&gt;print&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;  Multi-Agent Software Project Planner&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
    &lt;span class="nf"&gt;print&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;=&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt; &lt;span class="o"&gt;*&lt;/span&gt; &lt;span class="mi"&gt;60&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;

    &lt;span class="c1"&gt;# Provide a default project idea so users can quickly test the system
&lt;/span&gt;    &lt;span class="c1"&gt;# without having to think of an idea first.
&lt;/span&gt;    &lt;span class="n"&gt;default_idea&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="p"&gt;(&lt;/span&gt;
        &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;Build a REST API for a task management app with user auth, &lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;
        &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;CRUD operations, and real-time notifications&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;
    &lt;span class="p"&gt;)&lt;/span&gt;

    &lt;span class="c1"&gt;# Prompt the user for their project idea.
&lt;/span&gt;    &lt;span class="c1"&gt;# If they press Enter without typing anything, we use the default.
&lt;/span&gt;    &lt;span class="n"&gt;user_input&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="nf"&gt;input&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="sa"&gt;f&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="se"&gt;\n&lt;/span&gt;&lt;span class="s"&gt;Describe your project idea (Enter for default):&lt;/span&gt;&lt;span class="se"&gt;\n&lt;/span&gt;&lt;span class="s"&gt;&amp;gt; &lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;).&lt;/span&gt;&lt;span class="nf"&gt;strip&lt;/span&gt;&lt;span class="p"&gt;()&lt;/span&gt;
    &lt;span class="k"&gt;if&lt;/span&gt; &lt;span class="ow"&gt;not&lt;/span&gt; &lt;span class="n"&gt;user_input&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;
        &lt;span class="n"&gt;user_input&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;default_idea&lt;/span&gt;
        &lt;span class="nf"&gt;print&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="sa"&gt;f&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="se"&gt;\n&lt;/span&gt;&lt;span class="s"&gt;Using default: &lt;/span&gt;&lt;span class="si"&gt;{&lt;/span&gt;&lt;span class="n"&gt;default_idea&lt;/span&gt;&lt;span class="si"&gt;}&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;

    &lt;span class="nf"&gt;print&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="se"&gt;\n&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt; &lt;span class="o"&gt;+&lt;/span&gt; &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;=&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt; &lt;span class="o"&gt;*&lt;/span&gt; &lt;span class="mi"&gt;60&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
    &lt;span class="nf"&gt;print&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;  Starting Planning Session...&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
    &lt;span class="nf"&gt;print&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;=&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt; &lt;span class="o"&gt;*&lt;/span&gt; &lt;span class="mi"&gt;60&lt;/span&gt; &lt;span class="o"&gt;+&lt;/span&gt; &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="se"&gt;\n&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;

    &lt;span class="c1"&gt;# This is where the magic happens!
&lt;/span&gt;    &lt;span class="c1"&gt;# pm.initiate_chat() does the following:
&lt;/span&gt;    &lt;span class="c1"&gt;# 1. Sends the user's project idea as the first message
&lt;/span&gt;    &lt;span class="c1"&gt;# 2. The PM agent processes it and generates its response (requirements)
&lt;/span&gt;    &lt;span class="c1"&gt;# 3. The manager takes over, calling select_next_speaker() after each message
&lt;/span&gt;    &lt;span class="c1"&gt;# 4. Agents take turns: PM → Architect → Developer → Reviewer → QA
&lt;/span&gt;    &lt;span class="c1"&gt;# 5. If Reviewer rejects, it loops: Developer → Reviewer → Developer → ...
&lt;/span&gt;    &lt;span class="c1"&gt;# 6. When QA says "FINAL SIGN-OFF", is_termination_msg returns True and it stops
&lt;/span&gt;    &lt;span class="c1"&gt;# 7. The entire conversation history is returned in `result`
&lt;/span&gt;    &lt;span class="n"&gt;result&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;pm&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;initiate_chat&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;
        &lt;span class="n"&gt;manager&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;        &lt;span class="c1"&gt;# The GroupChatManager that orchestrates the conversation
&lt;/span&gt;        &lt;span class="n"&gt;message&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="n"&gt;user_input&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;  &lt;span class="c1"&gt;# The user's project idea becomes the first message
&lt;/span&gt;    &lt;span class="p"&gt;)&lt;/span&gt;

    &lt;span class="c1"&gt;# Display a summary after the session ends.
&lt;/span&gt;    &lt;span class="c1"&gt;# result.chat_history contains every message from every agent.
&lt;/span&gt;    &lt;span class="c1"&gt;# result.cost tracks token usage / API costs (useful for cloud LLMs).
&lt;/span&gt;    &lt;span class="nf"&gt;print&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="se"&gt;\n&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt; &lt;span class="o"&gt;+&lt;/span&gt; &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;=&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt; &lt;span class="o"&gt;*&lt;/span&gt; &lt;span class="mi"&gt;60&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
    &lt;span class="nf"&gt;print&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;  Session Complete!&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
    &lt;span class="nf"&gt;print&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="sa"&gt;f&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;  Messages: &lt;/span&gt;&lt;span class="si"&gt;{&lt;/span&gt;&lt;span class="nf"&gt;len&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;result&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;chat_history&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;&lt;span class="si"&gt;}&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
    &lt;span class="nf"&gt;print&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="sa"&gt;f&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;  Cost: &lt;/span&gt;&lt;span class="si"&gt;{&lt;/span&gt;&lt;span class="n"&gt;result&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;cost&lt;/span&gt;&lt;span class="si"&gt;}&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
    &lt;span class="nf"&gt;print&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;=&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt; &lt;span class="o"&gt;*&lt;/span&gt; &lt;span class="mi"&gt;60&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;

&lt;span class="c1"&gt;# Standard Python idiom: only run main() when this file is executed directly,
# not when it's imported by another file.
&lt;/span&gt;&lt;span class="k"&gt;if&lt;/span&gt; &lt;span class="n"&gt;__name__&lt;/span&gt; &lt;span class="o"&gt;==&lt;/span&gt; &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;__main__&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;
    &lt;span class="nf"&gt;main&lt;/span&gt;&lt;span class="p"&gt;()&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;blockquote&gt;
&lt;p&gt;&lt;strong&gt;What does &lt;code&gt;pm.initiate_chat(manager, message=...)&lt;/code&gt; actually do under the hood?&lt;/strong&gt;&lt;/p&gt;

&lt;p&gt;This single line triggers the entire multi-agent pipeline:&lt;/p&gt;

&lt;ol&gt;
&lt;li&gt;The PM receives &lt;code&gt;user_input&lt;/code&gt; as a message&lt;/li&gt;
&lt;li&gt;The PM calls its LLM with: system prompt + the user's message&lt;/li&gt;
&lt;li&gt;The PM's response is added to &lt;code&gt;GroupChat.messages&lt;/code&gt;
&lt;/li&gt;
&lt;li&gt;The manager calls &lt;code&gt;select_next_speaker(pm, groupchat)&lt;/code&gt; → returns &lt;code&gt;architect&lt;/code&gt;
&lt;/li&gt;
&lt;li&gt;The Architect calls its LLM with: system prompt + entire chat history so far&lt;/li&gt;
&lt;li&gt;Repeat steps 3-5 for each agent in sequence&lt;/li&gt;
&lt;li&gt;Eventually QA speaks, &lt;code&gt;is_termination_msg&lt;/code&gt; returns &lt;code&gt;True&lt;/code&gt;, and the loop ends&lt;/li&gt;
&lt;/ol&gt;

&lt;p&gt;Every agent sees the &lt;strong&gt;full conversation history&lt;/strong&gt; when generating its response. This means the Developer can reference both the PM's requirements AND the Architect's design. This shared context is what makes the agents feel like they're truly collaborating.&lt;/p&gt;
&lt;/blockquote&gt;
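
&lt;p&gt;The loop above can be sketched as a toy round-robin (purely illustrative, not AG2's actual implementation). It shows the key property: every agent gets the full history:&lt;/p&gt;

```python
# Toy round-robin (illustrative only, not AG2's implementation).
# Each "agent" is just a function that maps the full history to a reply.
def run_round_robin(agents, first_message, max_round=15):
    history = [first_message]
    for i in range(max_round - 1):
        # Every agent receives the ENTIRE history, which is what lets the
        # Developer reference both the PM's and the Architect's output.
        reply = agents[i % len(agents)](history)
        history.append(reply)
        if "final sign-off" in reply.lower():
            break
    return history
```

&lt;p&gt;Swap the toy functions for real LLM-backed agents and you have the basic shape of what &lt;code&gt;initiate_chat&lt;/code&gt; drives.&lt;/p&gt;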




&lt;h2&gt;
  
  
  Step 6: Run It!
&lt;/h2&gt;

&lt;p&gt;Make sure your LLM provider is running (Ollama or LM Studio), then:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight shell"&gt;&lt;code&gt;python main.py
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;You'll see something like:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;============================================================
  Multi-Agent Software Project Planner
============================================================

Describe your project idea (Enter for default):
&amp;gt; Build a real-time chat application with rooms and file sharing

============================================================
  Starting Planning Session...
============================================================

pm (to manager):
## Requirements
- User registration and authentication
- Real-time messaging with WebSocket support
- Chat rooms (public and private)
- File upload and sharing within rooms
...

architect (to manager):
## Tech Stack
- Backend: Node.js with Express + Socket.io
- Database: PostgreSQL for users/rooms, Redis for pub/sub
- Storage: MinIO for file uploads
...

developer (to manager):
## File Structure
├── src/
│   ├── controllers/
│   ├── models/
│   ├── middleware/
│   ├── services/
│   └── websocket/
...

reviewer (to manager):
## Verdict
APPROVED
...

qa (to manager):
## Test Strategy
...
FINAL SIGN-OFF: Project plan is complete.

============================================================
  Session Complete!
  Messages: 6
============================================================
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;blockquote&gt;
&lt;p&gt;&lt;strong&gt;What to observe:&lt;/strong&gt; Notice how each agent builds on the previous one's work. The Architect references the PM's requirements. The Developer follows the Architect's tech stack choices. The Reviewer checks consistency between all of them. And QA creates test cases that match the actual implementation plan. This emergent collaboration happens naturally through shared conversation history.&lt;/p&gt;
&lt;/blockquote&gt;




&lt;h2&gt;
  
  
  How It All Fits Together
&lt;/h2&gt;

&lt;p&gt;Here's the final project structure:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;multi-agents/
├── .env                # LLM provider configuration (models, URLs, temperatures)
├── config.py           # Reads .env → creates reasoning_config and code_config
├── agents.py           # Defines 5 agents with specialized system prompts
├── orchestrator.py     # Wires agents into GroupChat with routing + termination
├── main.py             # Entry point — takes user input, starts the session
└── requirements.txt    # Python dependencies (ag2, python-dotenv)
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;The data flow through these files:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;.env  ──►  config.py  ──►  agents.py  ──►  orchestrator.py  ──►  main.py
(settings)  (LLMConfig)    (5 agents)     (GroupChat +        (user input
                                           Manager +           + run loop)
                                           routing logic)
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;ol&gt;
&lt;li&gt;
&lt;strong&gt;&lt;code&gt;.env&lt;/code&gt;&lt;/strong&gt; holds all configurable settings (models, temperatures, URLs)&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;&lt;code&gt;config.py&lt;/code&gt;&lt;/strong&gt; reads &lt;code&gt;.env&lt;/code&gt; and creates two &lt;code&gt;LLMConfig&lt;/code&gt; objects&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;&lt;code&gt;agents.py&lt;/code&gt;&lt;/strong&gt; imports configs and creates five specialized &lt;code&gt;ConversableAgent&lt;/code&gt; instances&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;&lt;code&gt;orchestrator.py&lt;/code&gt;&lt;/strong&gt; imports agents, defines the transition graph and speaker selection, creates &lt;code&gt;GroupChat&lt;/code&gt; + &lt;code&gt;GroupChatManager&lt;/code&gt;
&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;&lt;code&gt;main.py&lt;/code&gt;&lt;/strong&gt; imports the PM and manager, gets user input, and kicks off the conversation&lt;/li&gt;
&lt;/ol&gt;




&lt;h2&gt;
  
  
  Key Takeaways
&lt;/h2&gt;

&lt;h3&gt;
  
  
  1. Deterministic Routing &amp;gt; LLM-Based Routing
&lt;/h3&gt;

&lt;p&gt;Letting the LLM decide who speaks next sounds flexible, but in practice it leads to unpredictable behavior — agents speaking out of turn, skipping steps, or getting stuck in loops. Our custom &lt;code&gt;select_next_speaker()&lt;/code&gt; function gives us full control over the conversation flow while still allowing dynamic branching (the Reviewer's approve/revise decision).&lt;/p&gt;
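
&lt;p&gt;A minimal sketch of that deterministic routing (agents shown as plain strings here; the real function receives agent objects plus the &lt;code&gt;GroupChat&lt;/code&gt;):&lt;/p&gt;

```python
# Sketch of deterministic routing (agents as plain strings; the real
# function receives agent objects plus the GroupChat instance).
ORDER = ["pm", "architect", "developer", "reviewer", "qa"]

def select_next_speaker(last_speaker, last_message):
    if last_speaker == "reviewer":
        # The only dynamic branch: the Reviewer's verdict keyword decides.
        if "revision needed" in last_message.lower():
            return "developer"   # send it back for fixes
        return "qa"              # approved, move on
    if last_speaker == "qa":
        return None              # session over
    return ORDER[ORDER.index(last_speaker) + 1]  # fixed pipeline order
```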

&lt;h3&gt;
  
  
  2. Dual-Model Strategy
&lt;/h3&gt;

&lt;p&gt;Not every agent needs the same model. Analytical agents (PM, Architect, QA) benefit from reasoning-focused models with moderate temperature, while implementation agents (Developer, Reviewer) need precision with low temperature. Splitting configurations lets you optimize both quality and cost — use a cheaper model for simple tasks, a better one for complex reasoning.&lt;/p&gt;
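
&lt;p&gt;A sketch of what that split might look like (the model names and temperatures here are examples, not the tutorial's exact values; point &lt;code&gt;base_url&lt;/code&gt; at your own Ollama or LM Studio server):&lt;/p&gt;

```python
# Illustrative dual-config split (model names and temperatures are
# examples; point base_url at your own Ollama or LM Studio server).
base = {"base_url": "http://localhost:11434/v1", "api_key": "ollama"}

# Analytical agents (PM, Architect, QA): more creative latitude.
reasoning_config = dict(base, model="qwen2.5:14b", temperature=0.7)

# Implementation agents (Developer, Reviewer): precision over creativity.
code_config = dict(base, model="qwen2.5-coder:7b", temperature=0.2)
```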

&lt;h3&gt;
  
  
  3. Structured Output Formats
&lt;/h3&gt;

&lt;p&gt;Each agent's system prompt specifies exact output sections (&lt;code&gt;## Requirements&lt;/code&gt;, &lt;code&gt;## Tech Stack&lt;/code&gt;, etc.). This isn't just about readability — it makes outputs &lt;strong&gt;consistent and parseable&lt;/strong&gt;. When the Developer needs to reference the Architect's tech stack, it knows exactly where to look in the conversation. Structured outputs also make it easier to extract and save results programmatically.&lt;/p&gt;
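
&lt;p&gt;As a quick illustration of why the fixed sections pay off, here's a hypothetical helper (not part of the tutorial code) that pulls one agent's section out of a message with plain string handling:&lt;/p&gt;

```python
# Hypothetical helper: extract one "## Heading" section from a message.
# Structured outputs make this trivial; free-form prose would not be.
def extract_section(text, heading):
    out, capturing = [], False
    for line in text.splitlines():
        if line.startswith("## "):
            capturing = (line.strip() == "## " + heading)
            continue
        if capturing:
            out.append(line)
    return "\n".join(out).strip()
```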

&lt;h3&gt;
  
  
  4. Keyword-Driven Control Flow
&lt;/h3&gt;

&lt;p&gt;The Reviewer's "APPROVED" / "REVISION NEEDED" and QA's "FINAL SIGN-OFF" are more than just text — they're &lt;strong&gt;control signals&lt;/strong&gt; that drive the orchestration logic. This is a simple but powerful pattern: use natural language keywords as routing triggers. The LLM generates them naturally as part of its response, and our code checks for them to make routing decisions. No complex parsing or additional LLM calls needed.&lt;/p&gt;
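
&lt;p&gt;One robustness tweak worth considering (my suggestion, not part of the tutorial code): only check the &lt;em&gt;tail&lt;/em&gt; of the message, so an agent that merely quotes a keyword mid-discussion doesn't trip the router:&lt;/p&gt;

```python
def has_signal(message, keyword, tail_chars=200):
    # Only inspect the end of the message: agents sometimes QUOTE a
    # keyword ("if rejected, say REVISION NEEDED") without meaning it.
    return keyword.lower() in message[-tail_chars:].lower()
```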

&lt;h3&gt;
  
  
  5. Safety Mechanisms Matter
&lt;/h3&gt;

&lt;p&gt;The &lt;code&gt;max_round=15&lt;/code&gt; limit prevents infinite revision loops. Without it, a picky Reviewer could keep sending work back to the Developer forever, burning tokens and time. Always build in safety limits for multi-agent systems. Other safety patterns include timeout limits, cost caps, and fallback behaviors.&lt;/p&gt;
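
&lt;p&gt;To see the cap doing its job, here's a toy simulation of that runaway loop (hypothetical code; in AG2 the real limit is just &lt;code&gt;GroupChat(max_round=15)&lt;/code&gt;):&lt;/p&gt;

```python
# Toy simulation of the runaway revision loop described above
# (hypothetical code; in AG2 the real cap is GroupChat(max_round=15)).
def developer(history):
    return "updated plan"

def picky_reviewer(history):
    return "REVISION NEEDED: still not perfect"  # never approves

def revision_loop(max_round=15):
    # Without the cap, this Developer/Reviewer ping-pong would never end.
    history = ["initial plan"]
    agents = [picky_reviewer, developer]
    while len(history) != max_round:
        history.append(agents[len(history) % 2](history))
    return len(history)
```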




&lt;h2&gt;
  
  
  Source code
&lt;/h2&gt;

&lt;p&gt;The complete source code for this project is available on &lt;a href="https://github.com/duymap/software-team-planner" rel="noopener noreferrer"&gt;GitHub&lt;/a&gt;. &lt;/p&gt;

</description>
      <category>agents</category>
      <category>ai</category>
      <category>llm</category>
      <category>tutorial</category>
    </item>
    <item>
      <title>Use OpenCode with local LLM, not bad at all</title>
      <dc:creator>Chung Duy</dc:creator>
      <pubDate>Thu, 05 Feb 2026 14:58:38 +0000</pubDate>
      <link>https://dev.to/chung_duy_51a346946b27a3d/use-opencode-with-local-llm-not-bad-all-at-5cdm</link>
      <guid>https://dev.to/chung_duy_51a346946b27a3d/use-opencode-with-local-llm-not-bad-all-at-5cdm</guid>
      <description>&lt;h1&gt;
  
  
  Local LLM Coding Setup: LMStudio + OpenCode
&lt;/h1&gt;

&lt;p&gt;A guide to setting up a local AI coding assistant using LMStudio and OpenCode — a solid alternative to Claude Code when you run out of daily usage.&lt;/p&gt;

&lt;h2&gt;
  
  
  1. Install LMStudio
&lt;/h2&gt;

&lt;p&gt;Download and install from &lt;a href="https://lmstudio.ai/" rel="noopener noreferrer"&gt;https://lmstudio.ai/&lt;/a&gt;&lt;/p&gt;

&lt;h2&gt;
  
  
  2. Select a Model
&lt;/h2&gt;

&lt;p&gt;Choose an appropriate model depending on your hardware. In this case, I chose &lt;strong&gt;Qwen3-Coder-Next-MLX-6bit&lt;/strong&gt; because:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;It fits within my available RAM&lt;/li&gt;
&lt;li&gt;It's optimized for macOS with Apple Silicon (M4 chip)&lt;/li&gt;
&lt;li&gt;It can leverage the M4 GPU&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fghmbbtklh8tdug29x7gg.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fghmbbtklh8tdug29x7gg.png" alt=" " width="800" height="379"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;blockquote&gt;
&lt;p&gt;You may need to wait a bit for the model to fully download.&lt;/p&gt;
&lt;/blockquote&gt;

&lt;h2&gt;
  
  
  3. Load and Configure the Model
&lt;/h2&gt;

&lt;p&gt;Load the model you selected in Step 2 (e.g., &lt;code&gt;Qwen3-Coder-Next-MLX-6bit&lt;/code&gt;) and configure the following:&lt;/p&gt;

&lt;div class="table-wrapper-paragraph"&gt;&lt;table&gt;
&lt;thead&gt;
&lt;tr&gt;
&lt;th&gt;Setting&lt;/th&gt;
&lt;th&gt;Value&lt;/th&gt;
&lt;/tr&gt;
&lt;/thead&gt;
&lt;tbody&gt;
&lt;tr&gt;
&lt;td&gt;Temperature&lt;/td&gt;
&lt;td&gt;&lt;code&gt;1.0&lt;/code&gt;&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Context Length&lt;/td&gt;
&lt;td&gt;&lt;code&gt;80000&lt;/code&gt;&lt;/td&gt;
&lt;/tr&gt;
&lt;/tbody&gt;
&lt;/table&gt;&lt;/div&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2F6ky8plvt850zjn0fnyhj.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2F6ky8plvt850zjn0fnyhj.png" alt=" " width="408" height="714"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fk9wizhi3g1otq5ytx38o.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fk9wizhi3g1otq5ytx38o.png" alt=" " width="427" height="282"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;blockquote&gt;
&lt;p&gt;⚠️ Do &lt;strong&gt;not&lt;/strong&gt; leave the context length at the default &lt;code&gt;16000&lt;/code&gt; — it's too small.&lt;/p&gt;
&lt;/blockquote&gt;

&lt;h2&gt;
  
  
  4. Install OpenCode
&lt;/h2&gt;

&lt;p&gt;Install from &lt;a href="https://opencode.ai/" rel="noopener noreferrer"&gt;https://opencode.ai/&lt;/a&gt;&lt;/p&gt;

&lt;h2&gt;
  
  
  5. Configure OpenCode to Use LMStudio
&lt;/h2&gt;

&lt;p&gt;Open the config file at &lt;code&gt;~/.config/opencode/opencode.jsonc&lt;/code&gt; and paste the following:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight json-doc"&gt;&lt;code&gt;&lt;span class="p"&gt;{&lt;/span&gt;&lt;span class="w"&gt;
  &lt;/span&gt;&lt;span class="nl"&gt;"$schema"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="s2"&gt;"https://opencode.ai/config.json"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;&lt;span class="w"&gt;
  &lt;/span&gt;&lt;span class="nl"&gt;"theme"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="s2"&gt;"tokyonight"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;&lt;span class="w"&gt;
  &lt;/span&gt;&lt;span class="nl"&gt;"disabled_providers"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="p"&gt;[],&lt;/span&gt;&lt;span class="w"&gt;
  &lt;/span&gt;&lt;span class="nl"&gt;"provider"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="p"&gt;{&lt;/span&gt;&lt;span class="w"&gt;
    &lt;/span&gt;&lt;span class="nl"&gt;"localllm"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="p"&gt;{&lt;/span&gt;&lt;span class="w"&gt;
      &lt;/span&gt;&lt;span class="nl"&gt;"name"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="s2"&gt;"Local LLM"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;&lt;span class="w"&gt;
      &lt;/span&gt;&lt;span class="nl"&gt;"npm"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="s2"&gt;"@ai-sdk/openai-compatible"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;&lt;span class="w"&gt;
      &lt;/span&gt;&lt;span class="nl"&gt;"models"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="p"&gt;{&lt;/span&gt;&lt;span class="w"&gt;
        &lt;/span&gt;&lt;span class="nl"&gt;"qwen3-coder-next-mlx"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="p"&gt;{&lt;/span&gt;&lt;span class="w"&gt;
          &lt;/span&gt;&lt;span class="nl"&gt;"name"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="s2"&gt;"Qwen3-Coder-Next"&lt;/span&gt;&lt;span class="w"&gt;
        &lt;/span&gt;&lt;span class="p"&gt;}&lt;/span&gt;&lt;span class="w"&gt;
      &lt;/span&gt;&lt;span class="p"&gt;},&lt;/span&gt;&lt;span class="w"&gt;
      &lt;/span&gt;&lt;span class="nl"&gt;"options"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="p"&gt;{&lt;/span&gt;&lt;span class="w"&gt;
        &lt;/span&gt;&lt;span class="nl"&gt;"baseURL"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="s2"&gt;"http://127.0.0.1:1234/v1"&lt;/span&gt;&lt;span class="w"&gt;
      &lt;/span&gt;&lt;span class="p"&gt;}&lt;/span&gt;&lt;span class="w"&gt;
    &lt;/span&gt;&lt;span class="p"&gt;}&lt;/span&gt;&lt;span class="w"&gt;
  &lt;/span&gt;&lt;span class="p"&gt;}&lt;/span&gt;&lt;span class="w"&gt;
&lt;/span&gt;&lt;span class="p"&gt;}&lt;/span&gt;&lt;span class="w"&gt;
&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;blockquote&gt;
&lt;p&gt;⚠️ Make sure the key &lt;code&gt;"qwen3-coder-next-mlx"&lt;/code&gt; matches the model name in LMStudio exactly, otherwise you'll get an error: &lt;em&gt;"can not load model..."&lt;/em&gt;&lt;/p&gt;
&lt;/blockquote&gt;
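
&lt;p&gt;You can check for that mismatch programmatically before launching OpenCode. A small sketch (the helper names are mine; LM Studio serves the OpenAI-compatible &lt;code&gt;/v1/models&lt;/code&gt; endpoint):&lt;/p&gt;

```python
import json
import urllib.request

def served_model_ids(base_url="http://127.0.0.1:1234/v1"):
    # LM Studio exposes an OpenAI-compatible /models endpoint that
    # returns {"data": [{"id": ...}, ...]}. [] means the server is not up.
    try:
        with urllib.request.urlopen(base_url + "/models", timeout=3) as resp:
            payload = json.load(resp)
        return [m["id"] for m in payload.get("data", [])]
    except OSError:
        return []

def config_key_is_served(config_key, ids):
    # The model key in opencode.jsonc must match a served id exactly.
    return config_key in ids
```

&lt;p&gt;Run it while LM Studio is up; if &lt;code&gt;config_key_is_served("qwen3-coder-next-mlx", served_model_ids())&lt;/code&gt; comes back &lt;code&gt;False&lt;/code&gt;, fix the key before launching OpenCode.&lt;/p&gt;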

&lt;h2&gt;
  
  
  6. Run OpenCode
&lt;/h2&gt;

&lt;p&gt;Open a new terminal, navigate to your project directory, and run:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight shell"&gt;&lt;code&gt;opencode
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;E.g: "help me to understand the code base", while opencode running, you can watch out LMstudio server log to see it really works&lt;/p&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2F606vp3651rc6wbv3twud.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2F606vp3651rc6wbv3twud.png" alt=" " width="800" height="436"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;h2&gt;
  
  
  7. Results
&lt;/h2&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fllucarzhqnnazzoahgby.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fllucarzhqnnazzoahgby.png" alt=" " width="800" height="371"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;Tested with a demo project and the results are &lt;strong&gt;not bad at all&lt;/strong&gt; compared to Sonnet 4.5. More testing on larger projects is needed, but the output quality makes it a worthwhile alternative:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;🔄 Use as a fallback when you run out of daily Claude Code usage&lt;/li&gt;
&lt;li&gt;💡 Explore other use cases where a local LLM fits your workflow&lt;/li&gt;
&lt;li&gt;💰 Zero API cost — everything runs locally&lt;/li&gt;
&lt;/ul&gt;

</description>
      <category>ai</category>
      <category>llm</category>
      <category>tooling</category>
      <category>tutorial</category>
    </item>
    <item>
      <title>Running Mistral Vibe CLI with Local LLMs: A Complete Guide</title>
      <dc:creator>Chung Duy</dc:creator>
      <pubDate>Sun, 04 Jan 2026 13:32:34 +0000</pubDate>
      <link>https://dev.to/chung_duy_51a346946b27a3d/running-mistral-vibe-with-local-llms-a-complete-guide-1mde</link>
      <guid>https://dev.to/chung_duy_51a346946b27a3d/running-mistral-vibe-with-local-llms-a-complete-guide-1mde</guid>
      <description>&lt;h2&gt;
  
  
  Why run Mistral Vibe CLI with local LLM?
&lt;/h2&gt;

&lt;p&gt;So, Mistral Vibe is this amazing CLI tool that usually talks to Mistral's cloud. But honestly? Running it &lt;strong&gt;locally&lt;/strong&gt; is a total game changer. It feels so much cooler when the "brain" is actually inside your own computer!&lt;/p&gt;

&lt;p&gt;Here's why I'm loving the local setup:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;
&lt;strong&gt;Privacy Core&lt;/strong&gt;: My code stays right here on my machine. No "sending to the cloud" anxiety! 🔒&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Free Forever&lt;/strong&gt;: Zero API bills. My wallet is so happy right now. 💸&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Internet? Optional&lt;/strong&gt;: I can literally code in a cabin in the woods (if I had one). 🌲&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Model Hopping&lt;/strong&gt;: I can try out whatever models I want just by pulling them from Ollama.&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;I'm using &lt;strong&gt;Ollama&lt;/strong&gt; with the &lt;strong&gt;devstral-small-2&lt;/strong&gt; model for this guide because it's super snappy for coding.&lt;/p&gt;




&lt;h2&gt;
  
  
  Prerequisites
&lt;/h2&gt;

&lt;ul&gt;
&lt;li&gt;macOS with at least 32GB of RAM and an M4 chip (I haven't tried lower-spec machines). You'll also need at least 20GB of free disk space, since devstral-small-2 is about 15GB.&lt;/li&gt;
&lt;li&gt;Git&lt;/li&gt;
&lt;li&gt;Python 3.12 or higher&lt;/li&gt;
&lt;li&gt;pip or pipx&lt;/li&gt;
&lt;li&gt;ollama (&lt;a href="https://ollama.com/" rel="noopener noreferrer"&gt;https://ollama.com/&lt;/a&gt;)&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;I haven't tried this on Linux or Windows yet, but the setup should be much the same. If you're on Linux or Windows, make sure your GPU is powerful enough to run the 24B devstral-small-2 model.&lt;/p&gt;




&lt;h2&gt;
  
  
  Installation Guide
&lt;/h2&gt;

&lt;h3&gt;
  
  
  Step 1: Install Ollama
&lt;/h3&gt;

&lt;p&gt;Ollama is a tool for running large language models locally with ease.&lt;/p&gt;

&lt;h4&gt;
  
  
  On macOS
&lt;/h4&gt;



&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight shell"&gt;&lt;code&gt;&lt;span class="c"&gt;# Download and install from official website&lt;/span&gt;
curl &lt;span class="nt"&gt;-fsSL&lt;/span&gt; https://ollama.com/install.sh | sh

&lt;span class="c"&gt;# Or using Homebrew&lt;/span&gt;
brew &lt;span class="nb"&gt;install &lt;/span&gt;ollama
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;h4&gt;
  
  
  Verify Installation
&lt;/h4&gt;



&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight shell"&gt;&lt;code&gt;ollama &lt;span class="nt"&gt;--version&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;h3&gt;
  
  
  Step 2: Pull the Model
&lt;/h3&gt;

&lt;p&gt;Download the &lt;code&gt;devstral-small-2&lt;/code&gt; model (or any model you prefer):&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight shell"&gt;&lt;code&gt;&lt;span class="c"&gt;# Pull the model (this may take a few minutes depending on your internet speed)&lt;/span&gt;
ollama pull devstral-small-2
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;Verify the model is downloaded:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight shell"&gt;&lt;code&gt;ollama list
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;h3&gt;
  
  
  Step 3: Start Ollama Server
&lt;/h3&gt;



&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight shell"&gt;&lt;code&gt;&lt;span class="c"&gt;# Start the Ollama server&lt;/span&gt;
ollama serve
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;&lt;strong&gt;Note:&lt;/strong&gt; Keep this terminal window open. The server needs to run while you use Vibe.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Default endpoint:&lt;/strong&gt; &lt;code&gt;http://localhost:11434&lt;/code&gt;&lt;/p&gt;
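
&lt;p&gt;Before wiring Vibe up, it's worth confirming the server actually answers. A minimal Python sketch against Ollama's &lt;code&gt;/api/tags&lt;/code&gt; endpoint (standard Ollama REST API; the helper names are mine):&lt;/p&gt;

```python
import json
import urllib.request

def model_names(payload):
    # Shape of Ollama's /api/tags response: {"models": [{"name": ...}, ...]}
    return [m["name"] for m in payload.get("models", [])]

def ollama_models(base_url="http://localhost:11434"):
    # Ask the running server which models have been pulled.
    # Returns [] when the server is unreachable.
    try:
        with urllib.request.urlopen(base_url + "/api/tags", timeout=3) as resp:
            return model_names(json.load(resp))
    except OSError:
        return []
```

&lt;p&gt;If &lt;code&gt;devstral-small-2&lt;/code&gt; doesn't appear in the returned list, re-run &lt;code&gt;ollama pull devstral-small-2&lt;/code&gt;.&lt;/p&gt;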

&lt;p&gt;To run Ollama in the background on macOS/Linux:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight shell"&gt;&lt;code&gt;&lt;span class="c"&gt;# Create a background service (optional)&lt;/span&gt;
ollama serve &amp;amp;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;h3&gt;
  
  
  Step 4: Install Mistral Vibe
&lt;/h3&gt;

&lt;p&gt;Check out the repo : &lt;a href="https://github.com/mistralai/mistral-vibe" rel="noopener noreferrer"&gt;https://github.com/mistralai/mistral-vibe&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;Using pipx&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight shell"&gt;&lt;code&gt;&lt;span class="c"&gt;# Install pipx if you don't have it&lt;/span&gt;
brew &lt;span class="nb"&gt;install &lt;/span&gt;pipx  &lt;span class="c"&gt;# macOS&lt;/span&gt;
&lt;span class="c"&gt;# or&lt;/span&gt;
pip &lt;span class="nb"&gt;install &lt;/span&gt;pipx   &lt;span class="c"&gt;# Linux/Windows&lt;/span&gt;

&lt;span class="c"&gt;# Ensure pipx path is configured&lt;/span&gt;
pipx ensurepath

&lt;span class="c"&gt;# Install Mistral Vibe from source&lt;/span&gt;
&lt;span class="nb"&gt;cd&lt;/span&gt; /path/to/mistral-vibe
pipx &lt;span class="nb"&gt;install&lt;/span&gt; &lt;span class="nt"&gt;-e&lt;/span&gt; &lt;span class="nb"&gt;.&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;h4&gt;
  
  
  Open a new terminal tab and verify the installation
&lt;/h4&gt;



&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight shell"&gt;&lt;code&gt;vibe &lt;span class="nt"&gt;--version&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;You should see: &lt;code&gt;vibe 1.3.3&lt;/code&gt; (or your current version)&lt;/p&gt;




&lt;h2&gt;
  
  
  Configuration
&lt;/h2&gt;

&lt;p&gt;Mistral Vibe uses a TOML configuration file located at &lt;code&gt;~/.vibe/config.toml&lt;/code&gt;.&lt;/p&gt;

&lt;h3&gt;
  
  
  Step 1: Locate or Create Config File
&lt;/h3&gt;

&lt;p&gt;When you first run &lt;code&gt;vibe&lt;/code&gt;, it creates a default config file. However, we need to modify it to use Ollama.&lt;/p&gt;
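&lt;p&gt;If the file doesn't exist yet, you can create it (and its directory) by hand before editing:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight shell"&gt;&lt;code&gt;# Create the config directory and an empty config file if needed
mkdir -p ~/.vibe
touch ~/.vibe/config.toml
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;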

&lt;h3&gt;
  
  
  Step 2: Configure for Ollama
&lt;/h3&gt;

&lt;p&gt;Create or edit &lt;code&gt;~/.vibe/config.toml&lt;/code&gt; with the following configuration:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight toml"&gt;&lt;code&gt;&lt;span class="c"&gt;# ~/.vibe/config.toml&lt;/span&gt;

&lt;span class="c"&gt;# Set Ollama model as default&lt;/span&gt;
&lt;span class="py"&gt;active_model&lt;/span&gt; &lt;span class="p"&gt;=&lt;/span&gt; &lt;span class="s"&gt;"ollama-devstral-small-2"&lt;/span&gt;

&lt;span class="c"&gt;# UI and behavior settings&lt;/span&gt;
&lt;span class="py"&gt;textual_theme&lt;/span&gt; &lt;span class="p"&gt;=&lt;/span&gt; &lt;span class="s"&gt;"terminal"&lt;/span&gt;
&lt;span class="py"&gt;vim_keybindings&lt;/span&gt; &lt;span class="p"&gt;=&lt;/span&gt; &lt;span class="kc"&gt;false&lt;/span&gt;
&lt;span class="py"&gt;disable_welcome_banner_animation&lt;/span&gt; &lt;span class="p"&gt;=&lt;/span&gt; &lt;span class="kc"&gt;false&lt;/span&gt;
&lt;span class="py"&gt;auto_compact_threshold&lt;/span&gt; &lt;span class="p"&gt;=&lt;/span&gt; &lt;span class="mi"&gt;200000&lt;/span&gt;
&lt;span class="py"&gt;context_warnings&lt;/span&gt; &lt;span class="p"&gt;=&lt;/span&gt; &lt;span class="kc"&gt;false&lt;/span&gt;
&lt;span class="py"&gt;system_prompt_id&lt;/span&gt; &lt;span class="p"&gt;=&lt;/span&gt; &lt;span class="s"&gt;"cli"&lt;/span&gt;
&lt;span class="py"&gt;include_commit_signature&lt;/span&gt; &lt;span class="p"&gt;=&lt;/span&gt; &lt;span class="kc"&gt;true&lt;/span&gt;
&lt;span class="py"&gt;include_model_info&lt;/span&gt; &lt;span class="p"&gt;=&lt;/span&gt; &lt;span class="kc"&gt;true&lt;/span&gt;
&lt;span class="py"&gt;include_project_context&lt;/span&gt; &lt;span class="p"&gt;=&lt;/span&gt; &lt;span class="kc"&gt;true&lt;/span&gt;
&lt;span class="py"&gt;enable_update_checks&lt;/span&gt; &lt;span class="p"&gt;=&lt;/span&gt; &lt;span class="kc"&gt;true&lt;/span&gt;
&lt;span class="py"&gt;api_timeout&lt;/span&gt; &lt;span class="p"&gt;=&lt;/span&gt; &lt;span class="mf"&gt;720.0&lt;/span&gt;

&lt;span class="c"&gt;# Tool configurations (optional - customize as needed)&lt;/span&gt;
&lt;span class="py"&gt;enabled_tools&lt;/span&gt; &lt;span class="p"&gt;=&lt;/span&gt; &lt;span class="p"&gt;[]&lt;/span&gt;
&lt;span class="py"&gt;disabled_tools&lt;/span&gt; &lt;span class="p"&gt;=&lt;/span&gt; &lt;span class="p"&gt;[]&lt;/span&gt;

&lt;span class="c"&gt;# ============================================&lt;/span&gt;
&lt;span class="c"&gt;# PROVIDERS CONFIGURATION&lt;/span&gt;
&lt;span class="c"&gt;# ============================================&lt;/span&gt;

&lt;span class="c"&gt;# Ollama Provider (Local)&lt;/span&gt;
&lt;span class="nn"&gt;[[providers]]&lt;/span&gt;
&lt;span class="py"&gt;name&lt;/span&gt; &lt;span class="p"&gt;=&lt;/span&gt; &lt;span class="s"&gt;"ollama"&lt;/span&gt;
&lt;span class="py"&gt;api_base&lt;/span&gt; &lt;span class="p"&gt;=&lt;/span&gt; &lt;span class="s"&gt;"http://localhost:11434/v1"&lt;/span&gt;
&lt;span class="py"&gt;api_key_env_var&lt;/span&gt; &lt;span class="p"&gt;=&lt;/span&gt; &lt;span class="s"&gt;""&lt;/span&gt;  &lt;span class="c"&gt;# No API key needed for local&lt;/span&gt;
&lt;span class="py"&gt;api_style&lt;/span&gt; &lt;span class="p"&gt;=&lt;/span&gt; &lt;span class="s"&gt;"openai"&lt;/span&gt;
&lt;span class="py"&gt;backend&lt;/span&gt; &lt;span class="p"&gt;=&lt;/span&gt; &lt;span class="s"&gt;"generic"&lt;/span&gt;
&lt;span class="py"&gt;reasoning_field_name&lt;/span&gt; &lt;span class="p"&gt;=&lt;/span&gt; &lt;span class="s"&gt;"reasoning_content"&lt;/span&gt;

&lt;span class="c"&gt;# Mistral Cloud Provider (Optional - keep for fallback)&lt;/span&gt;
&lt;span class="nn"&gt;[[providers]]&lt;/span&gt;
&lt;span class="py"&gt;name&lt;/span&gt; &lt;span class="p"&gt;=&lt;/span&gt; &lt;span class="s"&gt;"mistral"&lt;/span&gt;
&lt;span class="py"&gt;api_base&lt;/span&gt; &lt;span class="p"&gt;=&lt;/span&gt; &lt;span class="s"&gt;"https://api.mistral.ai/v1"&lt;/span&gt;
&lt;span class="py"&gt;api_key_env_var&lt;/span&gt; &lt;span class="p"&gt;=&lt;/span&gt; &lt;span class="s"&gt;"MISTRAL_API_KEY"&lt;/span&gt;
&lt;span class="py"&gt;api_style&lt;/span&gt; &lt;span class="p"&gt;=&lt;/span&gt; &lt;span class="s"&gt;"openai"&lt;/span&gt;
&lt;span class="py"&gt;backend&lt;/span&gt; &lt;span class="p"&gt;=&lt;/span&gt; &lt;span class="s"&gt;"mistral"&lt;/span&gt;
&lt;span class="py"&gt;reasoning_field_name&lt;/span&gt; &lt;span class="p"&gt;=&lt;/span&gt; &lt;span class="s"&gt;"reasoning_content"&lt;/span&gt;

&lt;span class="c"&gt;# ============================================&lt;/span&gt;
&lt;span class="c"&gt;# MODELS CONFIGURATION&lt;/span&gt;
&lt;span class="c"&gt;# ============================================&lt;/span&gt;

&lt;span class="c"&gt;# Ollama Models (Local)&lt;/span&gt;
&lt;span class="nn"&gt;[[models]]&lt;/span&gt;
&lt;span class="py"&gt;name&lt;/span&gt; &lt;span class="p"&gt;=&lt;/span&gt; &lt;span class="s"&gt;"devstral-small-2"&lt;/span&gt;
&lt;span class="py"&gt;provider&lt;/span&gt; &lt;span class="p"&gt;=&lt;/span&gt; &lt;span class="s"&gt;"ollama"&lt;/span&gt;
&lt;span class="py"&gt;alias&lt;/span&gt; &lt;span class="p"&gt;=&lt;/span&gt; &lt;span class="s"&gt;"ollama-devstral-small-2"&lt;/span&gt;
&lt;span class="py"&gt;temperature&lt;/span&gt; &lt;span class="p"&gt;=&lt;/span&gt; &lt;span class="mf"&gt;0.2&lt;/span&gt;
&lt;span class="py"&gt;input_price&lt;/span&gt; &lt;span class="p"&gt;=&lt;/span&gt; &lt;span class="mf"&gt;0.0&lt;/span&gt;  &lt;span class="c"&gt;# Free - local model&lt;/span&gt;
&lt;span class="py"&gt;output_price&lt;/span&gt; &lt;span class="p"&gt;=&lt;/span&gt; &lt;span class="mf"&gt;0.0&lt;/span&gt;

&lt;span class="nn"&gt;[[models]]&lt;/span&gt;
&lt;span class="py"&gt;name&lt;/span&gt; &lt;span class="p"&gt;=&lt;/span&gt; &lt;span class="s"&gt;"mistral"&lt;/span&gt;
&lt;span class="py"&gt;provider&lt;/span&gt; &lt;span class="p"&gt;=&lt;/span&gt; &lt;span class="s"&gt;"ollama"&lt;/span&gt;
&lt;span class="py"&gt;alias&lt;/span&gt; &lt;span class="p"&gt;=&lt;/span&gt; &lt;span class="s"&gt;"ollama-mistral"&lt;/span&gt;
&lt;span class="py"&gt;temperature&lt;/span&gt; &lt;span class="p"&gt;=&lt;/span&gt; &lt;span class="mf"&gt;0.2&lt;/span&gt;
&lt;span class="py"&gt;input_price&lt;/span&gt; &lt;span class="p"&gt;=&lt;/span&gt; &lt;span class="mf"&gt;0.0&lt;/span&gt;
&lt;span class="py"&gt;output_price&lt;/span&gt; &lt;span class="p"&gt;=&lt;/span&gt; &lt;span class="mf"&gt;0.0&lt;/span&gt;

&lt;span class="nn"&gt;[[models]]&lt;/span&gt;
&lt;span class="py"&gt;name&lt;/span&gt; &lt;span class="p"&gt;=&lt;/span&gt; &lt;span class="s"&gt;"devstral-2"&lt;/span&gt;
&lt;span class="py"&gt;provider&lt;/span&gt; &lt;span class="p"&gt;=&lt;/span&gt; &lt;span class="s"&gt;"ollama"&lt;/span&gt;
&lt;span class="py"&gt;alias&lt;/span&gt; &lt;span class="p"&gt;=&lt;/span&gt; &lt;span class="s"&gt;"ollama-devstral-2"&lt;/span&gt;
&lt;span class="py"&gt;temperature&lt;/span&gt; &lt;span class="p"&gt;=&lt;/span&gt; &lt;span class="mf"&gt;0.2&lt;/span&gt;
&lt;span class="py"&gt;input_price&lt;/span&gt; &lt;span class="p"&gt;=&lt;/span&gt; &lt;span class="mf"&gt;0.0&lt;/span&gt;
&lt;span class="py"&gt;output_price&lt;/span&gt; &lt;span class="p"&gt;=&lt;/span&gt; &lt;span class="mf"&gt;0.0&lt;/span&gt;

&lt;span class="c"&gt;# Cloud Models (Optional - for fallback)&lt;/span&gt;
&lt;span class="nn"&gt;[[models]]&lt;/span&gt;
&lt;span class="py"&gt;name&lt;/span&gt; &lt;span class="p"&gt;=&lt;/span&gt; &lt;span class="s"&gt;"mistral-vibe-cli-latest"&lt;/span&gt;
&lt;span class="py"&gt;provider&lt;/span&gt; &lt;span class="p"&gt;=&lt;/span&gt; &lt;span class="s"&gt;"mistral"&lt;/span&gt;
&lt;span class="py"&gt;alias&lt;/span&gt; &lt;span class="p"&gt;=&lt;/span&gt; &lt;span class="s"&gt;"devstral-2-cloud"&lt;/span&gt;
&lt;span class="py"&gt;temperature&lt;/span&gt; &lt;span class="p"&gt;=&lt;/span&gt; &lt;span class="mf"&gt;0.2&lt;/span&gt;
&lt;span class="py"&gt;input_price&lt;/span&gt; &lt;span class="p"&gt;=&lt;/span&gt; &lt;span class="mf"&gt;0.4&lt;/span&gt;
&lt;span class="py"&gt;output_price&lt;/span&gt; &lt;span class="p"&gt;=&lt;/span&gt; &lt;span class="mf"&gt;2.0&lt;/span&gt;

&lt;span class="c"&gt;# ============================================&lt;/span&gt;
&lt;span class="c"&gt;# PROJECT CONTEXT SETTINGS&lt;/span&gt;
&lt;span class="c"&gt;# ============================================&lt;/span&gt;

&lt;span class="nn"&gt;[project_context]&lt;/span&gt;
&lt;span class="py"&gt;max_chars&lt;/span&gt; &lt;span class="p"&gt;=&lt;/span&gt; &lt;span class="mi"&gt;40000&lt;/span&gt;
&lt;span class="py"&gt;default_commit_count&lt;/span&gt; &lt;span class="p"&gt;=&lt;/span&gt; &lt;span class="mi"&gt;5&lt;/span&gt;
&lt;span class="py"&gt;max_doc_bytes&lt;/span&gt; &lt;span class="p"&gt;=&lt;/span&gt; &lt;span class="mi"&gt;32768&lt;/span&gt;
&lt;span class="py"&gt;truncation_buffer&lt;/span&gt; &lt;span class="p"&gt;=&lt;/span&gt; &lt;span class="mi"&gt;1000&lt;/span&gt;
&lt;span class="py"&gt;max_depth&lt;/span&gt; &lt;span class="p"&gt;=&lt;/span&gt; &lt;span class="mi"&gt;3&lt;/span&gt;
&lt;span class="py"&gt;max_files&lt;/span&gt; &lt;span class="p"&gt;=&lt;/span&gt; &lt;span class="mi"&gt;1000&lt;/span&gt;
&lt;span class="py"&gt;max_dirs_per_level&lt;/span&gt; &lt;span class="p"&gt;=&lt;/span&gt; &lt;span class="mi"&gt;20&lt;/span&gt;
&lt;span class="py"&gt;timeout_seconds&lt;/span&gt; &lt;span class="p"&gt;=&lt;/span&gt; &lt;span class="mf"&gt;2.0&lt;/span&gt;

&lt;span class="c"&gt;# ============================================&lt;/span&gt;
&lt;span class="c"&gt;# SESSION LOGGING&lt;/span&gt;
&lt;span class="c"&gt;# ============================================&lt;/span&gt;

&lt;span class="nn"&gt;[session_logging]&lt;/span&gt;
&lt;span class="py"&gt;save_dir&lt;/span&gt; &lt;span class="p"&gt;=&lt;/span&gt; &lt;span class="s"&gt;"~/.vibe/logs/session"&lt;/span&gt;
&lt;span class="py"&gt;session_prefix&lt;/span&gt; &lt;span class="p"&gt;=&lt;/span&gt; &lt;span class="s"&gt;"session"&lt;/span&gt;
&lt;span class="py"&gt;enabled&lt;/span&gt; &lt;span class="p"&gt;=&lt;/span&gt; &lt;span class="kc"&gt;true&lt;/span&gt;

&lt;span class="c"&gt;# ============================================&lt;/span&gt;
&lt;span class="c"&gt;# TOOL PERMISSIONS&lt;/span&gt;
&lt;span class="c"&gt;# ============================================&lt;/span&gt;

&lt;span class="nn"&gt;[tools.read_file]&lt;/span&gt;
&lt;span class="py"&gt;permission&lt;/span&gt; &lt;span class="p"&gt;=&lt;/span&gt; &lt;span class="s"&gt;"always"&lt;/span&gt;
&lt;span class="py"&gt;max_read_bytes&lt;/span&gt; &lt;span class="p"&gt;=&lt;/span&gt; &lt;span class="mi"&gt;64000&lt;/span&gt;

&lt;span class="nn"&gt;[tools.write_file]&lt;/span&gt;
&lt;span class="py"&gt;permission&lt;/span&gt; &lt;span class="p"&gt;=&lt;/span&gt; &lt;span class="s"&gt;"ask"&lt;/span&gt;
&lt;span class="py"&gt;max_write_bytes&lt;/span&gt; &lt;span class="p"&gt;=&lt;/span&gt; &lt;span class="mi"&gt;64000&lt;/span&gt;

&lt;span class="nn"&gt;[tools.search_replace]&lt;/span&gt;
&lt;span class="py"&gt;permission&lt;/span&gt; &lt;span class="p"&gt;=&lt;/span&gt; &lt;span class="s"&gt;"ask"&lt;/span&gt;
&lt;span class="py"&gt;max_content_size&lt;/span&gt; &lt;span class="p"&gt;=&lt;/span&gt; &lt;span class="mi"&gt;100000&lt;/span&gt;

&lt;span class="nn"&gt;[tools.bash]&lt;/span&gt;
&lt;span class="py"&gt;permission&lt;/span&gt; &lt;span class="p"&gt;=&lt;/span&gt; &lt;span class="s"&gt;"ask"&lt;/span&gt;
&lt;span class="py"&gt;max_output_bytes&lt;/span&gt; &lt;span class="p"&gt;=&lt;/span&gt; &lt;span class="mi"&gt;16000&lt;/span&gt;
&lt;span class="py"&gt;default_timeout&lt;/span&gt; &lt;span class="p"&gt;=&lt;/span&gt; &lt;span class="mi"&gt;30&lt;/span&gt;

&lt;span class="nn"&gt;[tools.grep]&lt;/span&gt;
&lt;span class="py"&gt;permission&lt;/span&gt; &lt;span class="p"&gt;=&lt;/span&gt; &lt;span class="s"&gt;"always"&lt;/span&gt;
&lt;span class="py"&gt;max_output_bytes&lt;/span&gt; &lt;span class="p"&gt;=&lt;/span&gt; &lt;span class="mi"&gt;64000&lt;/span&gt;
&lt;span class="py"&gt;default_max_matches&lt;/span&gt; &lt;span class="p"&gt;=&lt;/span&gt; &lt;span class="mi"&gt;100&lt;/span&gt;

&lt;span class="nn"&gt;[tools.todo]&lt;/span&gt;
&lt;span class="py"&gt;permission&lt;/span&gt; &lt;span class="p"&gt;=&lt;/span&gt; &lt;span class="s"&gt;"always"&lt;/span&gt;
&lt;span class="py"&gt;max_todos&lt;/span&gt; &lt;span class="p"&gt;=&lt;/span&gt; &lt;span class="mi"&gt;100&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;h3&gt;
  
  
  Step 3: Understanding Key Configuration Options
&lt;/h3&gt;

&lt;h4&gt;
  
  
  Provider Configuration
&lt;/h4&gt;



&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight toml"&gt;&lt;code&gt;&lt;span class="nn"&gt;[[providers]]&lt;/span&gt;
&lt;span class="py"&gt;name&lt;/span&gt; &lt;span class="p"&gt;=&lt;/span&gt; &lt;span class="s"&gt;"ollama"&lt;/span&gt;                              &lt;span class="c"&gt;# Provider identifier&lt;/span&gt;
&lt;span class="py"&gt;api_base&lt;/span&gt; &lt;span class="p"&gt;=&lt;/span&gt; &lt;span class="s"&gt;"http://localhost:11434/v1"&lt;/span&gt;       &lt;span class="c"&gt;# Ollama's OpenAI-compatible endpoint&lt;/span&gt;
&lt;span class="py"&gt;api_key_env_var&lt;/span&gt; &lt;span class="p"&gt;=&lt;/span&gt; &lt;span class="s"&gt;""&lt;/span&gt;                         &lt;span class="c"&gt;# Empty = no API key required&lt;/span&gt;
&lt;span class="py"&gt;api_style&lt;/span&gt; &lt;span class="p"&gt;=&lt;/span&gt; &lt;span class="s"&gt;"openai"&lt;/span&gt;                         &lt;span class="c"&gt;# Use OpenAI-compatible API format&lt;/span&gt;
&lt;span class="py"&gt;backend&lt;/span&gt; &lt;span class="p"&gt;=&lt;/span&gt; &lt;span class="s"&gt;"generic"&lt;/span&gt;                          &lt;span class="c"&gt;# Use generic HTTP backend&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;h4&gt;
  
  
  Model Configuration
&lt;/h4&gt;



&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight toml"&gt;&lt;code&gt;&lt;span class="nn"&gt;[[models]]&lt;/span&gt;
&lt;span class="py"&gt;name&lt;/span&gt; &lt;span class="p"&gt;=&lt;/span&gt; &lt;span class="s"&gt;"devstral-small-2"&lt;/span&gt;                    &lt;span class="c"&gt;# Exact model name in Ollama&lt;/span&gt;
&lt;span class="py"&gt;provider&lt;/span&gt; &lt;span class="p"&gt;=&lt;/span&gt; &lt;span class="s"&gt;"ollama"&lt;/span&gt;                          &lt;span class="c"&gt;# Links to provider above&lt;/span&gt;
&lt;span class="py"&gt;alias&lt;/span&gt; &lt;span class="p"&gt;=&lt;/span&gt; &lt;span class="s"&gt;"ollama-devstral-small-2"&lt;/span&gt;            &lt;span class="c"&gt;# Friendly name you'll use&lt;/span&gt;
&lt;span class="py"&gt;temperature&lt;/span&gt; &lt;span class="p"&gt;=&lt;/span&gt; &lt;span class="mf"&gt;0.2&lt;/span&gt;                            &lt;span class="c"&gt;# Lower = more deterministic&lt;/span&gt;
&lt;span class="py"&gt;input_price&lt;/span&gt; &lt;span class="p"&gt;=&lt;/span&gt; &lt;span class="mf"&gt;0.0&lt;/span&gt;                            &lt;span class="c"&gt;# Free for local&lt;/span&gt;
&lt;span class="py"&gt;output_price&lt;/span&gt; &lt;span class="p"&gt;=&lt;/span&gt; &lt;span class="mf"&gt;0.0&lt;/span&gt;                           &lt;span class="c"&gt;# Free for local&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;
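&lt;p&gt;The &lt;code&gt;name&lt;/code&gt; field must match the model name exactly as Ollama reports it (via &lt;code&gt;ollama list&lt;/code&gt;), or requests will fail. A quick sanity check against the same OpenAI-compatible endpoint Vibe will use (assuming the default port and model name from this tutorial):&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight shell"&gt;&lt;code&gt;# Send a minimal chat completion to the configured model
curl http://localhost:11434/v1/chat/completions \
  -H "Content-Type: application/json" \
  -d '{"model": "devstral-small-2", "messages": [{"role": "user", "content": "Say hello"}]}'
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;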






&lt;h2&gt;
  
  
  Usage Examples
&lt;/h2&gt;

&lt;h3&gt;
  
  
  Basic Usage
&lt;/h3&gt;

&lt;h4&gt;
  
  
  1. Start Vibe
&lt;/h4&gt;



&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight shell"&gt;&lt;code&gt;&lt;span class="c"&gt;# Navigate to your project&lt;/span&gt;
&lt;span class="nb"&gt;cd&lt;/span&gt; /path/to/your/project

&lt;span class="c"&gt;# Start Vibe&lt;/span&gt;
vibe
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;You should see the Vibe interface running against your local LLM. Every prompt is now processed locally rather than by a cloud model, and no API key is required.&lt;/p&gt;

&lt;p&gt;Happy coding!&lt;/p&gt;

</description>
    </item>
  </channel>
</rss>
