An AI model without tools is a brain without hands. It can analyze, summarize, and reason about anything you put in front of it -- but it cannot reach into the world and change anything.
Before 2025, every AI platform had its own proprietary way of defining and calling tools. A tool you built for Claude would not work with GPT-4 without rewriting the adapter layer. A tool server built for one application could not be shared with another. The ecosystem was fragmented the way peripheral hardware was before USB replaced the tangle of serial ports, parallel ports, and proprietary connectors.
MCP is the USB port for AI agents.
Anthropic created the Model Context Protocol and donated it to the Linux Foundation in December 2025. The Linux Foundation's involvement matters: it means no single company controls the standard. Claude, GPT-5.4, Gemini 3.1, Cursor, Windsurf, and dozens of other tools all speak MCP natively as of 2026.
How Tool Calling Works (The Foundation)
Before getting into MCP specifically, it helps to understand the underlying mechanism.
When an agent decides to use a tool, the exchange follows a four-step cycle: schema, invocation, execution, and reasoning.
The schema is a description of the tool included in the model's context. It tells the model what the tool does, what parameters it accepts, and what those parameters mean. The model never sees the tool's source code -- only this description. Your schema is your interface, and a bad schema produces bad tool calls.
Invocation happens when the model generates a structured tool call (JSON) rather than plain text. The runtime intercepts this, validates the arguments, and executes the tool.
The result gets appended to the conversation as a tool result message. The model can now see what happened.
Reasoning is what the model does with that result: decides whether the task is done, whether to call more tools, or whether to take a different approach.
Here is the basic pattern in Python:
```python
import anthropic

client = anthropic.Anthropic()

web_search_tool = {
    "name": "web_search",
    "description": "Search the web for current information. Use when you need facts, news, or data you don't already have. Returns relevant snippets and URLs.",
    "input_schema": {
        "type": "object",
        "properties": {
            "query": {
                "type": "string",
                "description": "The search query. Use keywords, not natural language questions."
            }
        },
        "required": ["query"]
    }
}

def run_agent(user_message: str) -> str:
    messages = [{"role": "user", "content": user_message}]
    while True:
        response = client.messages.create(
            model="claude-opus-4.6",
            max_tokens=4096,
            tools=[web_search_tool],
            messages=messages
        )
        if response.stop_reason == "end_turn":
            return response.content[0].text
        if response.stop_reason == "tool_use":
            messages.append({"role": "assistant", "content": response.content})
            tool_results = []
            for block in response.content:
                if block.type == "tool_use":
                    result = execute_tool(block.name, block.input)
                    tool_results.append({
                        "type": "tool_result",
                        "tool_use_id": block.id,
                        "content": result
                    })
            messages.append({"role": "user", "content": tool_results})
```
The model is never "running" the tool itself. It generates structured output that says "I want to call this tool with these arguments." Your runtime does the actual execution. This distinction matters for security.
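The loop above leaves `execute_tool` undefined because it is yours to write: it is the runtime-side boundary where validation happens. Here is a minimal sketch; the registry and the `fake_web_search` implementation are hypothetical stand-ins for real tool code:

```python
def fake_web_search(query: str) -> str:
    # Placeholder implementation, for illustration only
    return f"Results for: {query}"

# Tool implementations live in an ordinary dict; the model only ever
# names a tool, it never touches this code directly
TOOL_REGISTRY = {"web_search": fake_web_search}

def execute_tool(name: str, arguments: dict) -> str:
    # Unknown tools and malformed arguments are caught here,
    # before anything reaches the outside world
    if name not in TOOL_REGISTRY:
        return f"Error: unknown tool '{name}'"
    try:
        return TOOL_REGISTRY[name](**arguments)
    except TypeError as exc:
        # The model passed wrong or missing arguments; return a
        # readable error it can correct on the next turn
        return f"Error: invalid arguments for '{name}': {exc}"
```

Returning errors as strings rather than raising keeps the loop alive and gives the model something to reason about on its next turn.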
What MCP Actually Is
MCP defines a protocol for how AI agents communicate with tool servers. The architecture has two sides:
An MCP server exposes tools, resources, and prompts. It is a process (local or remote) that the agent can connect to.
An MCP client (your agent runtime) connects to one or more servers and makes their capabilities available to the model.
This separation is what makes MCP composable. You can connect your agent to a database MCP server, a GitHub MCP server, and a custom internal API MCP server simultaneously -- and the model sees all their tools as a single flat list.
The protocol defines three primitives:
Tools are callable functions with a defined schema. The agent can invoke a tool to do something. This is the core primitive.
Resources are URI-addressable data the agent can read: files, database records, API responses. Resources are not callable -- they are readable context. A GitHub MCP server might expose resources at URIs like github://repo/myorg/repo/file/src/main.py. When the agent reads a resource, it is more like reading context than performing an action, which keeps the tool-call trace clean.
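To make the distinction concrete, here is a stdlib-only Python sketch of resource reading: a lookup by URI with no side effects. The URIs and in-memory store are hypothetical; a real server would register resources through an MCP SDK.

```python
from urllib.parse import urlparse

# Hypothetical in-memory resource store keyed by URI
RESOURCES = {
    "github://repo/myorg/repo/file/src/main.py": "def main():\n    print('hello')\n",
}

def read_resource(uri: str) -> str:
    # Reading a resource returns context only -- it performs no action
    scheme = urlparse(uri).scheme
    if scheme != "github":
        raise ValueError(f"Unsupported scheme: {scheme}")
    if uri not in RESOURCES:
        raise KeyError(f"Unknown resource: {uri}")
    return RESOURCES[uri]
```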
Prompts are reusable prompt templates stored on the server. Less commonly used, but valuable for standardizing how agents approach specific tasks across an organization. Instead of copy-pasting prompt instructions across five agent configurations, you put them in an MCP server and every agent fetches them at runtime.
MCP supports three transport mechanisms:
- stdio: client and server communicate over standard input/output. Simplest option, works for local processes.
- HTTP/SSE: server runs remotely and pushes events back to the client over Server-Sent Events. An older remote transport, superseded in the current spec by Streamable HTTP but still found in existing deployments.
- Streamable HTTP: bidirectional streaming over a single HTTP connection. Preferred for production deployments.
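Under all three transports, the messages themselves are JSON-RPC 2.0. A simplified sketch of the `tools/call` request a client writes to a stdio server's standard input, one JSON object per line (a real session begins with an initialize handshake before any tool calls):

```python
import json

# What a tool invocation looks like on the wire (simplified)
request = {
    "jsonrpc": "2.0",
    "id": 1,
    "method": "tools/call",
    "params": {
        "name": "query_database",
        "arguments": {"sql": "SELECT * FROM users", "limit": 10},
    },
}

wire_line = json.dumps(request)   # the client writes this line to the server's stdin
decoded = json.loads(wire_line)   # the server parses it back out
```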
Building a Custom MCP Server
Here is a minimal but complete MCP server in TypeScript. It exposes a single tool that queries a PostgreSQL database and returns results in a format the agent can reason about:
```typescript
import { McpServer } from "@modelcontextprotocol/sdk/server/mcp.js";
import { StdioServerTransport } from "@modelcontextprotocol/sdk/server/stdio.js";
import { z } from "zod";
import { Pool } from "pg";

const db = new Pool({ connectionString: process.env.DATABASE_URL });

const server = new McpServer({
  name: "database-tools",
  version: "1.0.0",
});

server.tool(
  "query_database",
  "Run a read-only SQL query against the application database. Only SELECT statements are allowed. Use this to look up user data, analytics, or any structured information.",
  {
    sql: z.string().describe("The SQL SELECT query. Must be a SELECT statement."),
    limit: z.number().optional().default(100).describe("Max rows to return. Defaults to 100."),
  },
  async ({ sql, limit }) => {
    // Security: reject anything that isn't a SELECT
    const normalized = sql.trim().toUpperCase();
    if (!normalized.startsWith("SELECT")) {
      return {
        content: [{ type: "text", text: "Error: Only SELECT queries are allowed." }],
        isError: true,
      };
    }
    try {
      // Strip any trailing semicolon so the injected LIMIT stays valid SQL
      const trimmed = sql.trim().replace(/;+\s*$/, "");
      const safeSql = normalized.includes("LIMIT") ? trimmed : `${trimmed} LIMIT ${limit}`;
      const result = await db.query(safeSql);
      return {
        content: [{
          type: "text",
          text: JSON.stringify({ rows: result.rows, count: result.rowCount }, null, 2),
        }],
      };
    } catch (error) {
      // Return structured errors the agent can reason about
      return {
        content: [{ type: "text", text: `Query failed: ${(error as Error).message}` }],
        isError: true,
      };
    }
  }
);

const transport = new StdioServerTransport();
await server.connect(transport);
```
Three things to notice in this implementation:
First, input validation happens at the tool layer, not in the agent prompt. The SQL check is a hard guard -- no matter what the agent generates, it will never execute a non-SELECT statement through this tool.
Second, errors are returned as structured content rather than thrown exceptions. The agent can read "Query failed: column does not exist" and adjust its next query. An unhandled exception gives the agent nothing to reason about.
Third, the LIMIT clause is injected automatically, preventing the agent from accidentally pulling an entire large table into context.
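The same guard-and-limit logic can be sketched as a standalone Python helper. Treat it as a heuristic convenience, not the security boundary itself; the real boundary should be a database role with read-only permissions:

```python
def guard_sql(sql: str, limit: int = 100) -> str:
    """Reject non-SELECT statements and append a LIMIT if one is missing."""
    trimmed = sql.strip().rstrip(";").strip()
    upper = trimmed.upper()
    if not upper.startswith("SELECT"):
        raise ValueError("Only SELECT queries are allowed")
    # Heuristic: a string literal containing "LIMIT" would also match,
    # which is why this is a convenience check, not a security control
    if " LIMIT " not in f" {upper} ":
        trimmed = f"{trimmed} LIMIT {limit}"
    return trimmed
```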
Tool Design Patterns That Actually Matter
How you design your tools shapes how well your agent performs. Models interact with tools through schemas, so tool design is really prompt engineering for your interface layer.
Description quality is everything. The description field on each parameter is not documentation for human developers -- it is the signal the model uses to decide when and how to call the tool. Write descriptions that answer three questions: what does this tool do, when should I use it instead of other tools, and what should the input look like?
A tool named get_data with no description will be called incorrectly. A tool named get_customer_orders with a description like "Retrieves all orders for a specific customer ID. Use this when you need purchase history, outstanding orders, or spending patterns for a known customer. Do not use this for searching by email or name -- use search_customers instead" will be called correctly.
Design for idempotency. Agents retry. Networks fail. Tools should be safe to call multiple times with the same inputs. A tool that sends an email every time it is called is dangerous in a retry scenario. A tool that checks whether the email was already sent before sending is safe.
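A minimal sketch of that idempotent pattern, using a key derived from the action's content. The in-memory set stands in for a persistent store, and actual delivery is omitted:

```python
import hashlib

_sent: set[str] = set()

def send_email_once(to: str, subject: str, body: str) -> dict:
    # Derive an idempotency key from the action's content; a retry
    # with identical inputs hits the same key and becomes a no-op
    key = hashlib.sha256(f"{to}|{subject}|{body}".encode()).hexdigest()
    if key in _sent:
        return {"status": "already_sent", "idempotency_key": key}
    # ... actual delivery would happen here ...
    _sent.add(key)
    return {"status": "sent", "idempotency_key": key}
```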
Return structured errors the agent can reason about. Compare:
```python
# Bad: agent gets nothing to work with
raise Exception("Failed")
```

```python
# Good: agent can diagnose and adapt
return {
    "error": "rate_limit_exceeded",
    "message": "GitHub API rate limit reached",
    "retry_after_seconds": 60,
    "requests_remaining": 0
}
```
The structured response lets the agent decide: wait and retry, try a different approach, or escalate to the user.
Keep tool counts manageable. Models perform best with 5 to 10 active tools. Above 15, performance degrades because the model spends reasoning capacity on tool selection instead of the actual task. Use dynamic tool loading to give the agent only the tools relevant to the current phase of its work.
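A sketch of phase-based tool filtering, with hypothetical phase names and tool lists:

```python
# Hypothetical mapping of workflow phases to allowed tool names
TOOLS_BY_PHASE = {
    "research": {"web_search", "read_file", "query_database"},
    "execution": {"write_file", "run_tests", "send_email"},
}

def select_tools(all_tools: list[dict], phase: str) -> list[dict]:
    # Expose only the tools relevant to the current phase of work
    allowed = TOOLS_BY_PHASE.get(phase, set())
    return [tool for tool in all_tools if tool["name"] in allowed]
```

The filtered list is what you pass as the `tools` parameter on each model call, so the active set shrinks to whatever the current phase actually needs.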
Security at the Tool Layer
Tools are where agents touch the real world. A poorly secured tool is not a software bug -- it is a security incident.
Blast radius thinking. For every tool, ask: what is the worst that can happen if this tool is called incorrectly? A read-only search tool has a small blast radius. A tool that can delete database records has a large one. Tools with large blast radii need more protection: schema validation, server-side permission checks, human approval flows.
Sandbox code execution. Tools that run arbitrary code need process isolation -- containers or VMs with no access to the host filesystem, no unrestricted network access, and resource limits. Without sandboxing, a prompt injection attack that causes your agent to generate malicious code could execute it with full host privileges.
Permissions in code, not prompts. A system prompt instruction that says "do not delete files" can be overridden by a sufficiently clever user input. A runtime check that says "this agent token has no write access to the filesystem" cannot.
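That runtime check can be as simple as a scope test enforced inside the tool implementation itself (the scope names here are hypothetical):

```python
def require_scope(token_scopes: set[str], required: str) -> None:
    # Enforced in code: no prompt injection can talk its way past this
    if required not in token_scopes:
        raise PermissionError(f"Token lacks required scope: {required}")

def delete_file_tool(path: str, token_scopes: set[str]) -> dict:
    require_scope(token_scopes, "fs:write")
    # ... deletion would happen here ...
    return {"status": "deleted", "path": path}
```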
Human-in-the-loop for irreversible actions. Sending emails to external users, spending money, deleting data -- these should require explicit human confirmation regardless of what the agent decides:
```python
async def send_email_tool(to: str, subject: str, body: str) -> dict:
    pending_id = await create_pending_action({
        "type": "send_email", "to": to, "subject": subject, "body": body
    })
    approval = await request_human_approval(pending_id, timeout_seconds=300)
    if not approval.approved:
        return {"status": "rejected", "reason": approval.reason}
    result = await email_service.send(to=to, subject=subject, body=body)
    return {"status": "sent", "message_id": result.id}
```
The agent calls this tool freely. The human confirmation is enforced at the implementation level.
Connecting MCP Servers: The Practical Part
In Claude Code or any MCP-compatible runtime, you declare server connections in a JSON configuration:
```json
{
  "mcpServers": {
    "database": {
      "command": "node",
      "args": ["./mcp-servers/database/index.js"],
      "env": { "DATABASE_URL": "${DATABASE_URL}" }
    },
    "github": {
      "command": "npx",
      "args": ["-y", "@modelcontextprotocol/server-github"],
      "env": { "GITHUB_TOKEN": "${GITHUB_TOKEN}" }
    }
  }
}
```
When the agent starts, the runtime launches each server as a subprocess, establishes the connection, and queries each server for its tool list. From the model's perspective, all tools from all connected servers appear as a single flat list.
For remote servers using HTTP transport:
```json
{
  "mcpServers": {
    "company-api": {
      "url": "https://mcp.internal.company.com/api",
      "headers": { "Authorization": "Bearer ${API_TOKEN}" }
    }
  }
}
```
Here is where MCP's ecosystem value becomes concrete. There are already hundreds of publicly available MCP servers for common services: GitHub, Slack, Linear, Notion, Stripe, PostgreSQL, Google Drive, and many more. For most external services your agent needs to interact with, you do not need to build a server from scratch -- install an existing one and configure credentials. Your development effort goes into the custom tools that do not yet have a server.
Using MCP with the Vercel AI SDK
If you are building with Next.js or the Vercel AI SDK, MCP integration is straightforward:
```typescript
import { generateText, experimental_createMCPClient as createMCPClient } from "ai";
import { Experimental_StdioMCPTransport as StdioTransport } from "ai/mcp-stdio";
import { anthropic } from "@ai-sdk/anthropic";

// Connect to MCP servers and gather their tools
const dbClient = await createMCPClient({
  transport: new StdioTransport({ command: "node", args: ["./mcp-servers/database/index.js"] }),
});
const ghClient = await createMCPClient({
  transport: new StdioTransport({ command: "node", args: ["./mcp-servers/github/index.js"] }),
});
const allTools = { ...(await dbClient.tools()), ...(await ghClient.tools()) };

// Filter to task-appropriate tools before invoking the model
const taskPhase: string = "research";
const activeTools = Object.fromEntries(
  Object.entries(allTools).filter(([name]) =>
    taskPhase === "research"
      ? name.startsWith("query") || name.startsWith("read")
      : name.startsWith("write")
  )
);

const result = await generateText({
  model: anthropic("claude-opus-4.6"),
  tools: activeTools,
  prompt: "Research the bug report and summarize findings",
});
```
This keeps the active tool count manageable and reduces the chance of the model using the wrong tool at the wrong time.
Why This Matters More Than It Seems
The real value of MCP is not technical -- it is economic. Before standardization, every tool integration was custom work that had to be rebuilt for each agent and each framework. After standardization, a tool built once works everywhere.
The database MCP server you build today for your customer support agent can be reused for your analytics agent and your admin dashboard without modification. The GitHub MCP server someone else built and open-sourced works in your agent without any adapter code.
This is how ecosystems work. The standardization of HTTP enabled the web. The standardization of USB enabled modern peripherals. MCP is doing the same thing for agent tooling -- enabling a composable ecosystem where tools are built once and shared broadly.
Tool use is not just a feature of agents. It is the defining capability. The quality of your tools determines the quality of your agent. Design tools for the model, validate inputs aggressively at the implementation level, return structured errors the agent can reason about, and treat the security of your tool layer with the same seriousness you would treat any external API endpoint.
Because that is effectively what it is.
This post is adapted from Production AI Agents: Build, Deploy, and Monetize Autonomous Systems, available on Amazon Kindle. The book goes deeper with 12 chapters of real code, battle-tested patterns, and a complete hands-on tutorial.
I build production AI systems. More at astraedus.dev.