DEV Community

Sumesh Ramasamy


Building a Multi-Agent PR Reviewer in Microsoft Agent Framework 1.10

Microsoft Agent Framework (MAF) reached 1.0 GA on April 3, 2026. I wanted to try it on something small, useful, and true to my own workflow rather than a hello-world demo, so I built pr-triage-agent: a CLI that takes a GitHub PR URL, runs three specialist agents concurrently (Security, Performance, Style), and hands their findings to a Consolidator agent that produces a structured markdown report.

Repo: https://github.com/sumesh-ramasamy/pr-triage-agent

This post walks through what it does, why MAF was a natural fit, the architecture, and the four places the installed 1.10 API diverges from the docs (which cost me an hour before I figured out what was going on).

What it does

pr-triage-agent runs like this:

python triage.py https://github.com/pallets/flask/pull/6013

It fetches the PR diff, fans it out to three specialist agents concurrently via MAF's ConcurrentBuilder, then consolidates their JSON findings into a single markdown report printed with rich. Full run in ~10 seconds, ~$0.004 on gpt-4o-mini.
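A per-run cost like the one above can be sanity-checked with simple arithmetic. The sketch below is not the repo's code: `estimate_cost` and `estimate_run_cost` are hypothetical helpers, the per-million-token prices are assumptions to verify against current OpenAI pricing, and the token counts for findings and the final report are made-up round numbers.

```python
def estimate_cost(
    input_tokens: int,
    output_tokens: int,
    input_price_per_m: float = 0.15,   # ASSUMPTION: USD per 1M input tokens
    output_price_per_m: float = 0.60,  # ASSUMPTION: USD per 1M output tokens
) -> float:
    """Return the estimated USD cost of a single model call."""
    total = input_tokens * input_price_per_m + output_tokens * output_price_per_m
    return total / 1_000_000


def estimate_run_cost(
    diff_tokens: int,
    specialists: int = 3,
    findings_tokens: int = 400,  # ASSUMPTION: size of one specialist's JSON output
    report_tokens: int = 800,    # ASSUMPTION: size of the final markdown report
) -> float:
    """Estimate one triage run: N specialist calls plus one consolidator call."""
    # Each specialist reads the whole diff and writes one findings block.
    specialist_cost = specialists * estimate_cost(diff_tokens, findings_tokens)
    # The consolidator reads every findings block and writes the report.
    consolidator_cost = estimate_cost(specialists * findings_tokens, report_tokens)
    return specialist_cost + consolidator_cost


if __name__ == "__main__":
    print(f"${estimate_run_cost(5_000):.4f}")
```

Under these assumptions the input side dominates: every specialist reads the full diff, so cost grows roughly linearly with diff size times the number of specialists.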

Example output:

[Image: example run]

Why Microsoft Agent Framework

I ship small PRs constantly, both to my own repos and to codebases at work. Triaging them for the obvious stuff (hardcoded secrets, N+1 queries, functions that should have been split) is exactly the kind of pattern-matching work I've been curious to see LLMs handle in a structured, multi-agent way.

Two things drew me to MAF specifically. First, the concurrent workflow primitive is exactly the shape of the PR triage problem: fan out to N specialists, collect their outputs, hand them to a consolidator. In frameworks without this primitive, you're gluing together async patterns yourself. Second, MAF is Microsoft's first-party framework for agents, and I wanted to build fluency in it while it's still fresh, before the ecosystem crowds up.
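For comparison, this is roughly what the hand-rolled version of that fan-out/fan-in shape looks like with plain asyncio. It is a sketch with stub coroutines, not MAF code and not the repo's code: `run_specialist`, `consolidate`, and `triage` are hypothetical names, and the stubs return canned data where a real agent would call a model.

```python
import asyncio


async def run_specialist(name: str, diff: str) -> dict:
    """Stub specialist; a real one would send the diff to an LLM."""
    await asyncio.sleep(0)  # yield to the event loop, standing in for network I/O
    return {"agent": name, "summary": f"{name} reviewed {len(diff)} chars"}


async def consolidate(results: list[dict]) -> str:
    """Stub consolidator; joins specialist summaries into one report."""
    return "\n".join(f"- {r['agent']}: {r['summary']}" for r in results)


async def triage(diff: str) -> str:
    names = ["security", "performance", "style"]
    # Fan out: run every specialist concurrently. gather() returns results
    # in the same order as the awaitables passed in.
    results = await asyncio.gather(*(run_specialist(n, diff) for n in names))
    # Fan in: hand the collected outputs to a single consolidator.
    return await consolidate(list(results))


if __name__ == "__main__":
    print(asyncio.run(triage("diff --git a/app.py b/app.py")))
```

The stubs hide the parts that get tedious in practice: per-agent error handling, timeouts, and carrying typed results between steps.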

I also just wanted to build something in it rather than only read the docs. Reading and building are very different levels of understanding.

Architecture

GitHub PR URL
     │
     ▼
[github_utils.fetch_pr_diff]  ← filters binaries, caps diff at ~20K tokens
     │
     ▼
[MAF ConcurrentBuilder]
     ├──► SecurityAgent      (hardcoded secrets, injection, auth)
     ├──► PerformanceAgent   (N+1, unbounded loops, sync I/O)
     └──► StyleAgent         (naming, docstrings, magic numbers)
     │
     ▼
[Aggregator collects specialist JSON]
     │
     ▼
[ConsolidatorAgent]           ← merges into structured markdown
     │
     ▼
[rich renderer to stdout]
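The first box in the diagram filters binaries and caps the diff at roughly 20K tokens. One way that step could work is sketched below. This is an illustration under stated assumptions, not the repo's `fetch_pr_diff`: the function names are hypothetical, the suffix list is arbitrary, and the 4-characters-per-token ratio is a rough heuristic rather than a real tokenizer.

```python
BINARY_SUFFIXES = (".png", ".jpg", ".gif", ".pdf", ".zip")  # ASSUMPTION: illustrative list
CHARS_PER_TOKEN = 4  # ASSUMPTION: rough heuristic, not a tokenizer


def split_file_sections(diff: str) -> list[str]:
    """Split a unified diff into per-file sections that start at 'diff --git'."""
    sections: list[str] = []
    current: list[str] = []
    for line in diff.splitlines(keepends=True):
        if line.startswith("diff --git ") and current:
            sections.append("".join(current))
            current = []
        current.append(line)
    if current:
        sections.append("".join(current))
    return sections


def is_binary_section(section: str) -> bool:
    """Detect binary files by header suffix or git's 'Binary files ... differ' line."""
    lines = section.splitlines()
    header = lines[0] if lines else ""
    if header.rstrip().endswith(BINARY_SUFFIXES):
        return True
    return any(line.startswith("Binary files ") for line in lines)


def cap_diff(diff: str, max_tokens: int = 20_000) -> str:
    """Drop binary sections, then keep whole file sections until the budget is spent."""
    budget = max_tokens * CHARS_PER_TOKEN
    kept: list[str] = []
    used = 0
    for section in split_file_sections(diff):
        if is_binary_section(section):
            continue
        if used + len(section) > budget:
            break  # stop at a file boundary instead of cutting mid-hunk
        kept.append(section)
        used += len(section)
    return "".join(kept)
```

Cutting at file boundaries keeps every hunk the agents see intact. The cost is that a single oversized file at the front of the diff exhausts the budget and yields an empty result, so a real implementation would probably want a fallback for that case.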

Each specialist returns a JSON block with severity, findings, and a summary. The Consolidator merges the three JSONs into a single markdown report. The whole pipeline coordinator is under 100 lines because ConcurrentBuilder does the heavy lifting.
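To make the data flow concrete, here is a deterministic stand-in for the merge step. In the actual tool the Consolidator is an LLM agent, so this is only a sketch of the input and output shapes. The three field names come from the description above; the field types, the severity scale, and the `merge_findings` helper are assumptions.

```python
import json

# ASSUMPTION: the post does not give the severity scale; this ordering is illustrative.
SEVERITY_ORDER = ["none", "low", "medium", "high"]


def merge_findings(raw_blocks: dict[str, str]) -> str:
    """Render specialist JSON blocks (keyed by agent name) as one markdown report."""
    parsed = {name: json.loads(raw) for name, raw in raw_blocks.items()}
    # The overall severity is the worst severity reported by any specialist.
    worst = max((p["severity"] for p in parsed.values()), key=SEVERITY_ORDER.index)
    lines = [f"# PR triage (overall severity: {worst})"]
    for name, block in parsed.items():
        lines.append(f"## {name}: {block['severity']}")
        lines.append(block["summary"])
        lines.extend(f"- {finding}" for finding in block["findings"])
    return "\n".join(lines)


if __name__ == "__main__":
    example = {
        "Security": json.dumps({
            "severity": "high",
            "findings": ["Hardcoded API key in config.py"],
            "summary": "One secret found.",
        }),
        "Style": json.dumps({
            "severity": "low",
            "findings": [],
            "summary": "Minor naming issues.",
        }),
    }
    print(merge_findings(example))
```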

What surprised me about MAF 1.10

This is the section I want to spend the most time on, because I lost about an hour to it and the fix wasn't obvious from the docs.

I started by following the official Microsoft Learn quickstart. Import statements from the docs failed. Class names in the docs didn't exist in the installed package. When I introspected agent-framework==1.10.0 directly, I found the actual API surface, and it turned out to be cleaner than the docs suggest, but different enough that copying and pasting sample code doesn't work.

Four specific divergences I hit, and the fixes:

1. Agent class name. Docs show ChatAgent. The installed 1.10.0 package exposes Agent:

# Docs
from agent_framework import ChatAgent
agent = client.create_agent(...)

# Installed 1.10.0
from agent_framework import Agent
agent = Agent(client=..., instructions=..., name=...)

2. OpenAI chat client model kwarg. Docs say model_id. Installed package raises TypeError on that. Actual kwarg is model:

# Docs
OpenAIChatClient(api_key=..., model_id="gpt-4o")

# Installed 1.10.0
OpenAIChatClient(api_key=..., model="gpt-4o")

3. ConcurrentBuilder location. Not at the package root. Lives in agent_framework.orchestrations:

from agent_framework.orchestrations import ConcurrentBuilder

4. Aggregator response shape. Docs show r.agent_run_response.messages[-1].text. Installed version delivers list[AgentExecutorResponse] where each has .agent_response.text.
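A small accessor that tolerates both shapes can be sketched as follows. The attribute names are the ones quoted above; `response_text` is a hypothetical helper, and the objects in the usage example are `SimpleNamespace` stand-ins rather than real MAF types, so the sketch runs without the framework installed.

```python
from types import SimpleNamespace


def response_text(result) -> str:
    """Extract the text from one aggregator result, trying the installed shape first."""
    installed = getattr(result, "agent_response", None)
    if installed is not None:
        return installed.text
    documented = getattr(result, "agent_run_response", None)
    if documented is not None:
        return documented.messages[-1].text
    raise AttributeError(f"Unrecognized result shape: {type(result).__name__}")


# Stand-ins that mimic the two shapes; NOT real framework objects.
installed_shape = SimpleNamespace(agent_response=SimpleNamespace(text="from installed shape"))
documented_shape = SimpleNamespace(
    agent_run_response=SimpleNamespace(messages=[SimpleNamespace(text="from documented shape")])
)

if __name__ == "__main__":
    for result in (installed_shape, documented_shape):
        print(response_text(result))
```

Failing loudly on an unknown shape is deliberate: if the surface moves again, an explicit error is easier to debug than an empty report.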

The way I found these was by introspecting the installed package with dir() and reading a few open GitHub issues on the microsoft/agent-framework repo. Once I stopped trusting the docs and started trusting the installed package, the friction dropped to zero.
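The introspection step can be done with the standard library alone. The helpers below are a sketch (`public_names` and `init_parameters` are hypothetical names), and the demo runs against the stdlib `json` module so it works without MAF installed.

```python
import importlib
import inspect


def public_names(module_name: str) -> list[str]:
    """List the public attributes a module actually exposes."""
    module = importlib.import_module(module_name)
    return sorted(name for name in dir(module) if not name.startswith("_"))


def init_parameters(module_name: str, class_name: str) -> list[str]:
    """List the constructor parameters of a class, excluding 'self'."""
    cls = getattr(importlib.import_module(module_name), class_name)
    signature = inspect.signature(cls.__init__)
    return [name for name in signature.parameters if name != "self"]


if __name__ == "__main__":
    # Demo on a stdlib module; swap in the package you are investigating.
    print(public_names("json"))
    print(init_parameters("json", "JSONDecoder"))
```

With the framework installed, the same calls would be pointed at `agent_framework` and `agent_framework.orchestrations`, for example `init_parameters("agent_framework", "Agent")` to see which keyword arguments the constructor really accepts.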

This is what working with a 1.0 framework three months out from GA looks like: the shape is right, the API is stable enough to build on, but the surface is still moving faster than the documentation site. That's a normal tradeoff of being early, not a criticism. I'd rather build fluency now than wait for every corner of the docs to catch up.

I documented all four in the repo's MAF 1.10.0 API notes section, in case anyone else hits the same friction.

What I'd do differently

Two things I'd change with more time.

Structured output. MAF supports response_format on the run call for native Pydantic-typed structured outputs, and I'd prefer to use it over asking the model to "return only JSON" in the instructions. I skipped it because of open issue #3325, where response.value returns None in versions after 1.0.0b260107. My workaround was a small tolerant JSON parser that handles both markdown-fenced JSON and JSON-with-preamble. It works reliably, but it's a workaround. Once the upstream fix lands, I'll swap to native structured outputs and write about the migration.
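A tolerant parser of the kind described can be fairly short. The version below is a sketch of the general technique, not the parser in the repo: it looks for a markdown fence first, then tries each opening brace as a possible start of a JSON object. `parse_tolerant_json` is a hypothetical name.

```python
import json
import re

FENCE = "`" * 3  # a markdown code fence, built this way to avoid a literal fence here
_FENCED_BLOCK = re.compile(FENCE + r"(?:json)?\s*(.*?)" + FENCE, re.DOTALL | re.IGNORECASE)


def parse_tolerant_json(text: str) -> dict:
    """Parse a JSON object from model output that may be fenced or have a preamble."""
    fenced = _FENCED_BLOCK.search(text)
    candidate = fenced.group(1) if fenced else text
    decoder = json.JSONDecoder()
    # Try every '{' as a possible start. raw_decode parses one value from the
    # given index and ignores whatever text follows it.
    for match in re.finditer(r"\{", candidate):
        try:
            value, _end = decoder.raw_decode(candidate, match.start())
        except json.JSONDecodeError:
            continue
        if isinstance(value, dict):
            return value
    raise ValueError("No JSON object found in model output")


if __name__ == "__main__":
    print(parse_tolerant_json('Here you go: {"severity": "low", "findings": []} Done.'))
```

Using `raw_decode` instead of a greedy brace-matching regex means nested objects and braces inside strings are handled by the JSON parser itself.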

Specialist prompts. The three specialist prompts (Security, Performance, Style) are strongly opinionated but not tuned. On a proper next iteration, I'd build a small evaluation harness with a set of golden PRs where I already know what the specialists should flag, then adjust the prompts against that. Right now the prompts are shaped by intuition, not measurement. That's fine for a first version. It won't scale.
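A minimal version of such a harness might look like the sketch below. Everything in it is hypothetical: the `GoldenCase` structure, the label names, and the keyword-based `keyword_specialist`, which stands in for an LLM-backed agent so the harness runs offline.

```python
from dataclasses import dataclass
from typing import Callable


@dataclass
class GoldenCase:
    name: str
    diff: str
    expected: set[str]  # labels a specialist should flag for this diff


def evaluate(specialist: Callable[[str], set[str]], cases: list[GoldenCase]) -> dict[str, float]:
    """Score a specialist against golden cases using label-level precision and recall."""
    true_pos = false_pos = false_neg = 0
    for case in cases:
        flagged = specialist(case.diff)
        true_pos += len(flagged & case.expected)
        false_pos += len(flagged - case.expected)
        false_neg += len(case.expected - flagged)
    precision = true_pos / (true_pos + false_pos) if true_pos + false_pos else 1.0
    recall = true_pos / (true_pos + false_neg) if true_pos + false_neg else 1.0
    return {"precision": precision, "recall": recall}


def keyword_specialist(diff: str) -> set[str]:
    """Stand-in for an LLM specialist: flags by keyword only."""
    flags: set[str] = set()
    if "API_KEY =" in diff:
        flags.add("hardcoded-secret")
    if "for " in diff and ".query(" in diff:
        flags.add("n-plus-one")
    return flags


GOLDEN = [
    GoldenCase("secret", '+API_KEY = "sk-123"', {"hardcoded-secret"}),
    GoldenCase("n+1", "+for u in users:\n+    db.query(u.id)", {"n-plus-one"}),
    GoldenCase("eval", "+eval(user_input)", {"injection"}),
]

if __name__ == "__main__":
    print(evaluate(keyword_specialist, GOLDEN))
```

On these three cases the stub catches the secret and the N+1 query but misses the injection, so precision is 1.0 and recall is 2/3. Rerunning the same cases after each prompt change turns tuning into a measurable loop.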

I also cut scope hard. This was a one-evening build, not a production tool. GitHub Action integration, webhook mode, and multi-repo support are all on the roadmap but not in v0.1.

What's next

Public roadmap on the repo:

  • GitHub Models support (one PAT for both PR fetching and LLM inference)
  • Azure OpenAI + Foundry variant
  • GitHub Action mode (post triage as a PR comment automatically)
  • Native structured output via response_format (pending upstream fix for issue #3325)
  • Custom specialist agents via plugin config
  • Evaluation harness with golden PRs

Each of these is a small commit and a small follow-up post.

Try it yourself

Repo: github.com/sumesh-ramasamy/pr-triage-agent

Quickstart:

git clone https://github.com/sumesh-ramasamy/pr-triage-agent
cd pr-triage-agent
python3.13 -m venv .venv && source .venv/bin/activate
pip install -r requirements.txt
cp .env.example .env  # add OPENAI_API_KEY and GITHUB_TOKEN
python triage.py https://github.com/pallets/flask/pull/6013

Preview cost without making any LLM calls:

python triage.py <URL> --dry-run

If you're building on MAF and hitting similar friction between docs and the installed API, I'd like to hear about it. Issues and PRs welcome.
