- Book: AI Agents Pocket Guide: Patterns for Building Autonomous Systems with LLMs
- Also by me: Thinking in Go (2-book series) — Complete Guide to Go Programming + Hexagonal Architecture in Go
- My project: Hermes IDE | GitHub — an IDE for developers who ship with Claude Code and other AI coding tools
- Me: xgabriel.com | GitHub
A friend rebuilt his internal coding agent over a long weekend. He'd shipped the first version with a hardcoded tools=[...] list of nine functions baked into the agent process. By month four he was up to twenty-three tools. Every new tool meant a redeploy of the agent. Every new tool also meant the system prompt grew, the schema list grew, and tool selection got noticeably wobblier. He spent the weekend porting the whole thing onto the Model Context Protocol and split the tools into four servers owned by four teams. Selection got crisper. Redeploys stopped blocking other teams.
That story is one half of the answer. The other half is the team next to him that wrote three tools, ships them in process, and absolutely should not adopt MCP. Both are right. The decision comes down to how your tool catalog grows, who owns it, and where you want the cost to live.
This post puts the same task through both shapes: a list_files tool that returns a directory listing. Plain tool calling first. Then the MCP version. Then the signals that tell you which one your situation wants.
The plain version: tools live next to the agent
Plain tool calling is the shape most agent demos use. You declare a JSON schema, send it on every messages.create call, and dispatch in process when the model returns a tool_use block. The runtime cost is one Python function call per tool turn.
import json
from pathlib import Path
import anthropic

client = anthropic.Anthropic()
LIST_FILES_TOOL = {
    "name": "list_files",
    "description": (
        "List files in a directory on the local "
        "filesystem. Returns names and sizes."
    ),
    "input_schema": {
        "type": "object",
        "properties": {
            "path": {"type": "string"},
        },
        "required": ["path"],
    },
}
The handler is a normal Python function. No protocol, no daemon, no transport.
def list_files(path: str) -> dict:
    p = Path(path).expanduser().resolve()
    if not p.is_dir():
        return {"error": f"not a directory: {p}"}
    items = []
    for entry in sorted(p.iterdir()):
        items.append({
            "name": entry.name,
            "size": (
                entry.stat().st_size
                if entry.is_file() else None
            ),
            "is_dir": entry.is_dir(),
        })
    return {"path": str(p), "items": items}
The agent loop wires the schema and the handler together.
def run_turn(user_msg: str) -> str:
    messages = [{"role": "user", "content": user_msg}]
    while True:
        resp = client.messages.create(
            model="claude-sonnet-4-5",
            max_tokens=1024,
            tools=[LIST_FILES_TOOL],
            messages=messages,
        )
        # No tool call means the model answered; return the text.
        if resp.stop_reason != "tool_use":
            return "".join(
                b.text for b in resp.content
                if b.type == "text"
            )
        # Echo the assistant turn, then answer every tool_use block.
        messages.append(
            {"role": "assistant", "content": resp.content}
        )
        results = []
        for block in resp.content:
            if block.type != "tool_use":
                continue
            # In-process dispatch: one plain function call.
            out = list_files(**block.input)
            results.append({
                "type": "tool_result",
                "tool_use_id": block.id,
                "content": json.dumps(out),
            })
        messages.append(
            {"role": "user", "content": results}
        )
That is the whole thing. The agent loop is about thirty lines. The schema and the implementation live in the same file. Adding a second tool is one more dict and one more function. Shipping is whatever your agent's deploy story already is.
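When that second tool shows up, a name-to-handler dict keeps dispatch from hardening into an if/elif chain. A minimal sketch; read_file is a hypothetical second tool, not something the post's agent needs:

# Hypothetical second tool: one more dict, one more function.
READ_FILE_TOOL = {
    "name": "read_file",
    "description": "Read a UTF-8 text file and return its contents.",
    "input_schema": {
        "type": "object",
        "properties": {"path": {"type": "string"}},
        "required": ["path"],
    },
}

def read_file(path: str) -> dict:
    p = Path(path).expanduser().resolve()
    if not p.is_file():
        return {"error": f"not a file: {p}"}
    return {"path": str(p), "text": p.read_text(encoding="utf-8")}

# Dispatch by name so run_turn never mentions a specific tool:
TOOLS = [LIST_FILES_TOOL, READ_FILE_TOOL]
HANDLERS = {"list_files": list_files, "read_file": read_file}
# In run_turn: tools=TOOLS, and the dispatch line becomes
#     out = HANDLERS[block.name](**block.input)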
The MCP version: tool host and agent client
The MCP version splits the world in two. The tool runs in a server process. The agent runs as a client that connects to one or more servers, lists their tools, and forwards calls. The agent never imports list_files. It only knows how to speak the protocol.
The server uses FastMCP (originally jlowin/fastmcp, now bundled in the official Python SDK as mcp.server.fastmcp), with the @mcp.tool() decorator generating the JSON schema from your type hints.
from pathlib import Path
from mcp.server.fastmcp import FastMCP

mcp = FastMCP("filesystem")

@mcp.tool()
def list_files(path: str) -> dict:
    """List files in a directory. Returns names and sizes."""
    p = Path(path).expanduser().resolve()
    if not p.is_dir():
        return {"error": f"not a directory: {p}"}
    items = []
    for entry in sorted(p.iterdir()):
        items.append({
            "name": entry.name,
            "size": (
                entry.stat().st_size
                if entry.is_file() else None
            ),
            "is_dir": entry.is_dir(),
        })
    return {"path": str(p), "items": items}

if __name__ == "__main__":
    mcp.run(transport="stdio")
That file is the entire server. Save it as fs_server.py. The stdio transport means the agent will spawn this script as a child process and talk to it over its stdin and stdout. No port. No daemon. No auth story. The MCP spec also defines a Streamable HTTP transport for remote servers, but stdio is the right default for local development and for tools that ship alongside the agent.
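If a server later needs to live behind a URL instead of a pipe, the change is small. A sketch, assuming a recent Python SDK with Streamable HTTP support; host and port are illustrative, and you now own the auth story stdio gave you for free:

# Same server object; only the transport and bind address change.
mcp = FastMCP("filesystem", host="127.0.0.1", port=8000)

if __name__ == "__main__":
    # Serves MCP over Streamable HTTP instead of stdin/stdout.
    mcp.run(transport="streamable-http")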
The client side is where the cost shows up. The agent has to launch the server, list its tools, translate them into Anthropic's tool schema shape, and forward tool_use calls back through the MCP session.
import json
import asyncio
from contextlib import AsyncExitStack
import anthropic
from mcp import ClientSession, StdioServerParameters
from mcp.client.stdio import stdio_client

# Async client so the API call does not block the event loop.
client = anthropic.AsyncAnthropic()

async def list_remote_tools(session):
    # Translate MCP tool metadata into Anthropic's tool schema shape.
    listed = await session.list_tools()
    tools = []
    for t in listed.tools:
        tools.append({
            "name": t.name,
            "description": t.description or "",
            "input_schema": t.inputSchema,
        })
    return tools
list_tools is the discovery call. Whatever the server registers shows up here at runtime. New tool on the server means new tool in the agent without redeploying the agent.
async def run_turn(user_msg: str) -> str:
    params = StdioServerParameters(
        command="python", args=["fs_server.py"]
    )
    async with AsyncExitStack() as stack:
        # Spawn the server as a child process and open a session
        # over its stdin/stdout.
        read, write = await stack.enter_async_context(
            stdio_client(params)
        )
        session = await stack.enter_async_context(
            ClientSession(read, write)
        )
        await session.initialize()
        tools = await list_remote_tools(session)
        messages = [{"role": "user", "content": user_msg}]
        while True:
            resp = await client.messages.create(
                model="claude-sonnet-4-5",
                max_tokens=1024,
                tools=tools,
                messages=messages,
            )
            if resp.stop_reason != "tool_use":
                return "".join(
                    b.text for b in resp.content
                    if b.type == "text"
                )
            messages.append(
                {"role": "assistant", "content": resp.content}
            )
            results = []
            for block in resp.content:
                if block.type != "tool_use":
                    continue
                # Dispatch over the protocol instead of in process.
                out = await session.call_tool(
                    block.name, block.input
                )
                payload = [
                    c.text for c in out.content
                    if c.type == "text"
                ]
                results.append({
                    "type": "tool_result",
                    "tool_use_id": block.id,
                    "content": json.dumps(payload),
                })
            messages.append(
                {"role": "user", "content": results}
            )
Roughly fifty lines, plus the server, plus the protocol library. The shape is the same agent loop you saw in the plain version with a different dispatch line. session.call_tool(name, args) replaces the in-process function call. Everything else is unchanged: the model, the loop, the message format.
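Running it end to end is one asyncio.run call; the prompt here is just an example:

if __name__ == "__main__":
    print(asyncio.run(run_turn("What's in my current directory?")))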
One thing worth flagging in the dispatch block: out.content is the protocol's list of typed content blocks (text, image, resource), not a plain dict. The code extracts just the text chunks and re-encodes them as a JSON array so the model sees a flat string. If your server returns non-text blocks, you'll want to handle them explicitly here instead of dropping them.
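A sketch of what handling them explicitly might look like: keep the text inline and surface the presence of anything else rather than silently dropping it.

def render_result(out) -> str:
    # Flatten an MCP tool result into one string for the model.
    parts = []
    for c in out.content:
        if c.type == "text":
            parts.append(c.text)
        else:
            # Image and resource blocks need app-specific handling;
            # at minimum, tell the model something was there.
            parts.append(f"[{c.type} content block omitted]")
    return json.dumps(parts)

Swap that in for the payload extraction above and the model at least learns that a non-text block existed.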
You can connect to multiple servers from the same client. Spin up the filesystem server, a GitHub server, a Postgres server, merge their tool lists, and the model picks across all of them. That is the moment MCP starts paying for itself.
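The merge itself is small. A sketch assuming two local stdio servers, the fs_server.py from above plus a hypothetical gh_server.py; the important part is the name-to-session map, so dispatch knows which server owns each tool:

SERVERS = [
    StdioServerParameters(command="python", args=["fs_server.py"]),
    # Hypothetical second server; any MCP stdio server slots in here.
    StdioServerParameters(command="python", args=["gh_server.py"]),
]

async def connect_all(stack: AsyncExitStack):
    tools, owner = [], {}
    for params in SERVERS:
        read, write = await stack.enter_async_context(
            stdio_client(params)
        )
        session = await stack.enter_async_context(
            ClientSession(read, write)
        )
        await session.initialize()
        for t in (await session.list_tools()).tools:
            tools.append({
                "name": t.name,
                "description": t.description or "",
                "input_schema": t.inputSchema,
            })
            # Remember which session owns this tool name.
            owner[t.name] = session
    return tools, owner

Dispatch then becomes owner[block.name].call_tool(block.name, block.input). Watch for name collisions across servers; prefixing tool names with the server name is a common fix.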
When MCP earns its keep
Four signals. If two or more land for your project, the separate server process is worth it.
The tool catalog is dynamic across sessions. New tools get added without shipping a new agent build. Plugin marketplaces, internal tool registries that other teams own, anything where the agent process and the tool process have different release cadences.
Multiple agents share the same tools. A coding agent and a support agent both want search_jira. Hosting that tool in one MCP server and connecting both agents to it beats duplicating the implementation and watching the two copies drift.
You want third-party tool integration. The official MCP servers cover filesystem, GitHub, Slack, Postgres, browser automation, and more. Adding "the agent can read files and open issues" is now a config change instead of an engineering project.
You want a clean separation of concerns. The team that owns the database knows how search_orders should rank. The team that owns the agent knows how to prompt. MCP draws a line between them. Each side owns one process and one contract.
When plain tool calling is the right boring answer
The mirror of the above. The catalog is small and known at build time. One agent, one team, three or four tools that change at the same cadence as the agent itself. You do not want to ship a second process. You do not want to learn a transport. You want one process, one deploy, one log stream.
In-process performance matters. MCP's stdio round trip is cheap, but it is not free. Each call pays for JSON encode, pipe write, decode on the other side, and the same on the way back. Inside a tight inner loop with many tool calls per turn the round trips add up. Plain tool calling stays in the same process and pays nothing.
The tool needs the agent's runtime context. Shared state, an open database connection, a request-scoped trace ID. You can pass these into an MCP server through env vars or initialization arguments, but the in-process version is one variable away, as the sketch below shows.
You are running an evaluation harness or a batch job. No human is waiting on a UI, no plugin ecosystem, no other team. The simplest thing that works is the right thing.
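Here is the runtime-context point as code, a minimal sketch where db and trace_id are hypothetical stand-ins for whatever your agent already holds:

def make_handlers(db, trace_id: str) -> dict:
    # The handler closes over live runtime state: no env vars,
    # no init handshake, no serialization boundary.
    def search_orders(status: str) -> dict:
        rows = db.execute(
            "SELECT id, total FROM orders WHERE status = ?",
            (status,),
        ).fetchall()
        return {"trace_id": trace_id, "rows": [list(r) for r in rows]}
    return {"search_orders": search_orders}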
How to choose without overthinking it
Default to plain tool calling on day one. The agent loop is small and the deploy story is whatever you already have. You will know within a sprint or two whether the catalog is going to stay small or start growing across teams. Once it starts growing across owners, port the heaviest tools to MCP servers, leave the lightweight ones in process, and let the agent client hold both at once. Anthropic's Python SDK does not care which side a tool came from once tools=[...] is built.
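Holding both at once is one dispatch check, not a rewrite. A sketch that stitches the earlier pieces together: HANDLERS is the in-process dict from the plain version's sketch, owner is the name-to-session map from the multi-server sketch, and mcp_tools is whatever discovery returned:

async def dispatch(block) -> str:
    # In-process tools are a plain function call; everything
    # else goes over the owning MCP session.
    if block.name in HANDLERS:
        return json.dumps(HANDLERS[block.name](**block.input))
    out = await owner[block.name].call_tool(block.name, block.input)
    return json.dumps([c.text for c in out.content if c.type == "text"])

# The API call sees one flat catalog:
#     tools = TOOLS + mcp_tools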
The protocol you used is not the win condition. What matters is whether tool selection stays accurate, latency stays acceptable, and the team that owns the data also owns the tool description. MCP helps with the third when the catalog gets big. Plain tool calling is the right tool when it doesn't.
If this was useful
The AI Agents Pocket Guide covers the full split between agent host and tool host: where to draw the line, how to keep tool selection accurate as the catalog grows, and how to instrument both sides so a slow tool does not look like a slow agent. It pairs the MCP picture with the plain tools=[...] patterns and shows where each one breaks under production load.
