Jarrad Bermingham

I Built a Framework for Multi-Agent MCP Servers in Python — Here's How

Most MCP servers do one thing: wrap a single API call as a tool. But what if your tool needs multiple AI agents collaborating — analyzing, scoring, and reporting — before returning a result?
That's the problem I solved with agent-mcp-framework, an open-source Python library for building multi-agent MCP servers. Define agents, compose them into pipelines, and expose the whole thing as MCP tools that Claude, VSCode, or any MCP client can call.
Here's how it works and why I built it this way.

The Problem
I was building an internal tool that analyzes codebases — think automated code review with multiple specialized agents: one for quality issues, one for security vulnerabilities, one for architecture patterns, and one that combines everything into a scored report.
The MCP SDK gives you FastMCP for exposing tools, but there's no built-in way to:

Define reusable agent abstractions with lifecycle hooks
Compose agents into sequential, parallel, or conditional workflows
Handle errors gracefully across a multi-step pipeline
Format results consistently

I needed infrastructure. So I built it.

The Architecture
```
Agent (unit of work)
  → Pipeline (composition pattern)
    → AgentMCPServer (MCP exposure layer)
```
Three layers, each doing one thing.

1. Agents — The Building Blocks

Every agent subclasses Agent and implements run():

```python
from agent_mcp_framework import Agent, AgentContext, AgentResult

class SecurityScanner(Agent):
    async def run(self, context: AgentContext) -> AgentResult:
        code = context.get("code", "")
        findings = []

        dangerous_patterns = ["eval(", "exec(", "os.system("]
        for pattern in dangerous_patterns:
            if pattern in code:
                findings.append(f"Found {pattern} — potential injection risk")

        context.set("security_findings", findings)
        return AgentResult(
            success=True,
            output={"findings": findings, "count": len(findings)},
        )
```

The AgentContext is a shared data store that agents read from and write to. Each agent is self-contained — it pulls what it needs, does its work, and pushes results back.
There are three agent types:

Agent — subclass and implement run()
LLMAgent — built-in Anthropic client for Claude-powered agents
FunctionAgent — wrap any async function without subclassing

All agents get lifecycle hooks (before_run, after_run, on_error), automatic timing, and error handling for free.
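To make the hook flow concrete, here is a framework-independent sketch of that lifecycle pattern in plain Python. The hook names mirror the ones listed above, but the class and method bodies are illustrative stand-ins, not the library's actual code:

```python
import asyncio
import time

# A sketch of the lifecycle pattern: execute() wraps run() with hooks,
# timing, and error capture. The hook names mirror the framework's; the
# bodies are illustrative only.
class SketchAgent:
    async def before_run(self, context):
        pass

    async def after_run(self, context, result):
        pass

    async def on_error(self, context, error):
        pass

    async def run(self, context):
        return {"success": True, "output": context.get("input")}

    async def execute(self, context):
        await self.before_run(context)
        start = time.perf_counter()
        try:
            result = await self.run(context)
        except Exception as exc:
            await self.on_error(context, exc)
            return {"success": False, "error": str(exc)}
        result["duration"] = time.perf_counter() - start
        await self.after_run(context, result)
        return result

result = asyncio.run(SketchAgent().execute({"input": "hello"}))
```

Because the wrapping lives in execute(), subclasses only ever implement run() and optionally override the hooks.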

2. Pipelines — The Composition Layer

This is where it gets interesting. Four patterns.

Sequential — agents run one after another, each seeing the updated context:

```python
from agent_mcp_framework import SequentialPipeline

pipeline = SequentialPipeline("review", agents=[
    QualityAnalyzer("quality"),
    SecurityScanner("security"),
    ReportGenerator("reporter"),
])
```
Parallel — agents run concurrently with isolated context copies (no race conditions), merged back after completion:

```python
from agent_mcp_framework import ParallelPipeline

# All three analyze simultaneously, results merge
analysis = ParallelPipeline("analysis", agents=[
    QualityAnalyzer("quality"),
    SecurityScanner("security"),
    ArchitectureReviewer("architecture"),
], max_concurrency=3)
```
Conditional — route to different agents based on context:

```python
from agent_mcp_framework import ConditionalPipeline

def router(ctx):
    if ctx.get("language") == "python":
        return "python-analyzer"
    return "generic-analyzer"

pipeline = ConditionalPipeline("route", agents=[
    PythonAnalyzer("python-analyzer"),
    GenericAnalyzer("generic-analyzer"),
], router=router)
```
MapReduce — split work across agents, then reduce:

```python
from agent_mcp_framework import MapReducePipeline, AgentContext

pipeline = MapReducePipeline("batch",
    agents=[FileAnalyzer(f"worker-{i}") for i in range(4)],
    splitter=lambda ctx: [
        AgentContext(data={"file": f}) for f in ctx.get("files")
    ],
    reducer=lambda results, ctx: ctx.set(
        "all_results", [r.output for r in results]
    ),
)
```
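The splitter/reducer contract is easiest to see traced in plain Python. A minimal, framework-independent sketch of the same data flow, where analyze_file stands in for a worker agent and dicts stand in for contexts:

```python
import asyncio

async def analyze_file(child: dict) -> dict:
    # Stand-in worker: report the line count of each file's contents
    return {"file": child["file"], "lines": child["file"].count("\n") + 1}

async def map_reduce(ctx: dict) -> dict:
    # Splitter: one child context per file
    children = [{"file": f} for f in ctx["files"]]
    # Map: all workers run concurrently
    results = await asyncio.gather(*(analyze_file(c) for c in children))
    # Reducer: fold worker outputs back into the parent context
    ctx["all_results"] = list(results)
    return ctx

ctx = asyncio.run(map_reduce({"files": ["a\nb\nc", "d"]}))
```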

3. MCP Server — The Exposure Layer

One line to turn any pipeline into an MCP tool:

```python
from agent_mcp_framework import AgentMCPServer

server = AgentMCPServer("code-review", description="Multi-agent code review")
server.add_pipeline_tool(
    pipeline,
    name="review_code",
    description="Analyze code for quality, security, and architecture issues.",
)

server.run()  # Starts MCP server on stdio
```
Now any MCP client can call review_code and get a multi-agent analysis back.
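Since the server speaks stdio, clients register it through their standard MCP config. For Claude Desktop the entry looks roughly like this (the command and path are placeholders for however you launch your script):

```json
{
  "mcpServers": {
    "code-review": {
      "command": "python",
      "args": ["/path/to/review_server.py"]
    }
  }
}
```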

Design Decision: Context Isolation in Parallel Pipelines
The trickiest part was parallel execution. When multiple agents run concurrently on the same context, you get race conditions — two agents writing to the same key, lost updates, stale reads.
My solution: each parallel agent gets a deep copy of the context. After all agents complete, their contexts merge back into the original. This means:

No locks, no mutexes, no shared mutable state
Each agent writes freely without stepping on others
The merge is deterministic (last-write-wins per key)

```python
# Inside ParallelPipeline.execute():
snapshots = [ctx.model_copy(deep=True) for _ in self.agents]

results = await asyncio.gather(
    *[a.execute(s) for a, s in zip(self.agents, snapshots)]
)

# Merge back
for snap in snapshots:
    ctx.data.update(snap.data)
```

Simple, correct, no surprises.
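The merge semantics are worth seeing in isolation. A framework-independent demo using plain dicts (the real context is a Pydantic model, per the model_copy call above):

```python
import copy

# Each "agent" writes to its own deep copy of the context; copies then
# merge in list order, so the last writer to a shared key wins.
ctx = {"code": "print('hi')"}
snapshots = [copy.deepcopy(ctx) for _ in range(2)]

snapshots[0]["quality"] = "ok"
snapshots[0]["shared"] = "from-quality"
snapshots[1]["security"] = "clean"
snapshots[1]["shared"] = "from-security"

for snap in snapshots:
    ctx.update(snap)  # later snapshots overwrite earlier ones per key
```

Disjoint keys from different agents all survive; only genuinely contested keys resolve by order, which is what makes the merge deterministic.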

Real-World Use Case: Code Review Server
The repo includes a complete code review server example with four agents:

QualityAnalyzer — checks line length, wildcard imports, missing docstrings
SecurityScanner — detects eval(), exec(), os.system(), pickle.loads()
ArchitectureReviewer — flags too many classes, global state, deep nesting
ReportGenerator — combines findings into a scored report (A through F)

The analysis agents run in parallel (they're independent), then the report generator runs sequentially (it needs all findings).
Here's what a scan of insecure code produces:
```json
{
  "score": 52,
  "grade": "C",
  "quality": {"count": 3, "issues": ["..."]},
  "security": {"count": 1, "findings": ["eval() — potential code injection"]},
  "architecture": {"count": 1, "notes": ["Global state detected"]}
}
```
I've used this same pattern to build internal tools that analyze entire repositories — scanning tech stacks, detecting anti-patterns, and producing readiness assessments. The framework handles the orchestration; the domain logic lives in the agents.

What I'd Build Next
The framework is intentionally minimal right now — agents, pipelines, MCP server. Things I'm considering:

Agent-to-agent messaging — let agents communicate mid-pipeline
Retry policies — configurable retry with backoff for flaky LLM calls
Streaming results — progressive output as agents complete
Pipeline visualization — render the DAG of agent dependencies

Try It
```bash
pip install agent-mcp-framework
```

```python
from agent_mcp_framework import Agent, AgentContext, AgentResult, SequentialPipeline

class MyAgent(Agent):
    async def run(self, context: AgentContext) -> AgentResult:
        data = context.get("input", "")
        return AgentResult(success=True, output=f"Processed: {data}")

pipeline = SequentialPipeline("demo", agents=[MyAgent("worker")])
```
80 tests. Zero lint errors. Typed with py.typed marker. MIT licensed.

GitHub: github.com/Jbermingham1/agent-mcp-framework
PyPI: pypi.org/project/agent-mcp-framework

If you're building multi-agent systems with MCP, I'd love to hear how you're approaching composition and orchestration. Drop a comment or open an issue on the repo.
