Alex Cloudstar

Posted on May 22 • Originally published at alexcloudstar.com

How to Build Your First MCP Server in 2026: A Practical Developer Guide

#ai #agents #mcp #devtools

The first MCP server I wrote did one thing. It read my Postgres database and returned the schema as structured JSON. That was it. No fancy joins, no query builder, just a list of tables, columns, and types.

It took me an afternoon. Two weeks later it had saved me hours. Every time I asked Claude Code to add a new feature that touched the database, it pulled the schema through the MCP server instead of hallucinating column names. The bug rate on AI-generated migrations dropped to roughly zero. The pattern was so obviously useful that I now write a small MCP server for almost every project I work on.

If you have read my MCP developer guide, you know what the protocol is. This post is the part I did not cover there: actually building one. Not theoretical, not waving at the spec. The exact steps I take when I sit down and decide a project needs its own MCP server, and the mistakes I keep watching other developers make on the way.

Why You Should Build One Even If You Are Not At Anthropic

There is a weird assumption I keep running into. People think MCP servers are something big AI companies write for their integrations, and that the rest of us just install them. That is half right. The official servers (GitHub, Linear, Notion, Postgres) are excellent. They cover the obvious cases. But they cannot cover your case.

Your case is your repo's weird internal CLI. Your case is the JSON schema you keep pasting into prompts by hand. Your case is the three internal APIs nobody outside your team will ever wrap. Your case is the build script that ships your product, the migration tool you wrote in 2023, the analytics dashboard you query through a homemade SQL view.

Every one of those things is a candidate for an MCP server. Not because the protocol is glamorous, but because once it is wrapped, every AI agent you use can hit it. Claude Code, Cursor, Windsurf, Zed, the official Anthropic chat app, any future agent runtime. One server, many consumers. The leverage is hard to overstate.

The other reason to build is simpler. You will understand MCP at a level that is not available from reading the spec. The first time you have to decide what your searchTickets tool returns and what it does not, you learn more about agent design than a year of theory.

What You Are Actually Building

An MCP server is a process that speaks JSON-RPC and exposes a small set of primitives. The protocol calls them tools, resources, and prompts. Most servers ship only tools. Some ship resources. Almost nobody ships prompts. Start with tools.

A tool is a function with a name, a description, an input schema (JSON Schema), and an implementation. When the model decides to call your tool, the runtime serialises the arguments, sends them to your server, your server runs the function, and the result comes back as text or structured content. That is the whole loop.
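To make that loop concrete, here is a minimal sketch of what one tool call looks like once the SDK is stripped away. The envelope shapes follow the JSON-RPC 2.0 format MCP uses for the tools/call method. The echo tool and the toolImpls registry are hypothetical names made up for illustration, and a real server would let the SDK do this dispatch.

```typescript
// Request and response envelopes for a single "tools/call" round trip.
type ToolCallRequest = {
  jsonrpc: "2.0";
  id: number;
  method: "tools/call";
  params: { name: string; arguments: Record<string, unknown> };
};

type ToolCallResponse = {
  jsonrpc: "2.0";
  id: number;
  result: { content: { type: "text"; text: string }[]; isError?: boolean };
};

// Hypothetical registry: tool name -> implementation.
const toolImpls = new Map<string, (args: Record<string, unknown>) => string>([
  ["echo", (args) => `you said: ${String(args["message"])}`],
]);

// Look up the tool, run it, and wrap the output as text content.
function handleToolCall(req: ToolCallRequest): ToolCallResponse {
  const impl = toolImpls.get(req.params.name);
  if (impl === undefined) {
    return {
      jsonrpc: "2.0",
      id: req.id,
      result: {
        content: [{ type: "text", text: `Unknown tool: ${req.params.name}` }],
        isError: true,
      },
    };
  }
  return {
    jsonrpc: "2.0",
    id: req.id,
    result: { content: [{ type: "text", text: impl(req.params.arguments) }] },
  };
}
```

The response carries the same id as the request, which is how the runtime matches results to calls.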

A resource is something the model can read but not write. Think of it as a file the agent can fetch by URI. A common pattern is to expose your project's docs, your database schema, or a snapshot of system state as resources. The model pulls them when relevant.

A prompt is a templated instruction the user can invoke by name. They are useful for codifying common workflows. In practice almost every server I have seen skips them and lets the user invoke slash commands or skills instead.

The protocol does not care what language you write the server in. There are mature SDKs for TypeScript, Python, Go, Rust, Kotlin, C#, and Java. I write almost all of mine in TypeScript because the toolchain matches the rest of my stack and because the official @modelcontextprotocol/sdk is the most actively maintained. Pick whatever language gets you to a working server fastest. The model does not know or care.

Pick Your Transport: stdio Versus HTTP

There are two transports that matter in 2026, and the choice between them shapes everything else.

The stdio transport is the one you want for local-only servers. The agent runtime spawns your server as a child process and pipes JSON-RPC over stdin and stdout. There is no port, no auth, no network. The server lives and dies with the agent session. Most local development tools (Postgres helpers, git wrappers, file system tools, build runners) ship as stdio servers because the security model is dead simple: if the agent can run a process on your machine, it already has the same trust level as that process.
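If you are curious what "pipes JSON-RPC over stdin and stdout" means on the wire: each message is a single line of JSON terminated by a newline. The SDK's stdio transport handles this for you, so the sketch below is only for intuition. LineFramer is a hypothetical name, not an SDK class.

```typescript
// Accumulates raw stdin chunks and returns one parsed message per complete line.
class LineFramer {
  private buffer = "";

  push(chunk: string): unknown[] {
    this.buffer += chunk;
    const lines = this.buffer.split("\n");
    // The last element is an incomplete line, or "" if the chunk ended on a newline.
    this.buffer = lines.pop() ?? "";
    return lines
      .filter((line) => line.trim() !== "")
      .map((line) => JSON.parse(line) as unknown);
  }
}
```

Chunks from a pipe do not respect message boundaries, which is why the partial tail has to be held back until the next chunk arrives.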

The streamable HTTP transport is the one you want for hosted servers. It runs over HTTP with Server-Sent Events for the streaming half. You stand it up on a real server (Fluid Compute, Lambda, a VM, whatever), give it a URL, and any agent that supports remote MCP can connect. Use this when the server needs to be shared across machines, when it needs centralised auth, or when it wraps an API that should not have its credentials on every developer's laptop.

There is a third option, the deprecated HTTP+SSE transport, which you should ignore. The 2025 spec replaced it with streamable HTTP. New servers should not implement the old SSE-only transport.

For your first MCP server I strongly recommend stdio. The feedback loop is fast, the auth story is non-existent, and the deployment story is "drop the binary on your laptop." You can graduate to HTTP later when you have something worth hosting.

The Minimum Viable MCP Server

Here is the smallest useful TypeScript server I would actually ship. It exposes one tool that returns the current git status of the repo it was launched in.


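import { execSync } from 'node:child_process';
import { Server } from '@modelcontextprotocol/sdk/server/index.js';
import { StdioServerTransport } from '@modelcontextprotocol/sdk/server/stdio.js';
import {
  CallToolRequestSchema,
  ListToolsRequestSchema,
} from '@modelcontextprotocol/sdk/types.js';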

const server = new Server(
  { name: 'git-status-mcp', version: '0.1.0' },
  { capabilities: { tools: {} } }
);

server.setRequestHandler(ListToolsRequestSchema, async () => ({
  tools: [
    {
      name: 'git_status',
      description:
        'Return the porcelain git status of the current working directory. Use this before suggesting commits.',
      inputSchema: {
        type: 'object',
        properties: {},
      },
    },
  ],
}));

server.setRequestHandler(CallToolRequestSchema, async (req) => {
  if (req.params.name !== 'git_status') {
    throw new Error(`Unknown tool: ${req.params.name}`);
  }
  const output = execSync('git status --porcelain=v1', {
    encoding: 'utf-8',
  });
  return {
    content: [{ type: 'text', text: output || '(clean)' }],
  };
});

const transport = new StdioServerTransport();
await server.connect(transport);

That is a complete, working MCP server. Around forty lines. It does one job, but the job is real, and the model can call it the moment the server is wired in. The pattern scales: every tool you add follows the same shape. List the tool in ListToolsRequestSchema. Handle it in CallToolRequestSchema. Return text or structured content. Done.

There are two things I want you to notice. First, the description on the tool. It is not "returns git status." It is "Return the porcelain git status of the current working directory. Use this before suggesting commits." That last sentence tells the model when to call the tool, and the description is what it reads when it decides whether to call it at all. Tool descriptions are not documentation for humans. They are activation prompts for the agent. Treat them accordingly.

Second, the input schema is empty. The tool takes no arguments. If your tool takes arguments, define them as a proper JSON Schema with required, types, and description fields on every property. The model uses the schema to construct calls. Fuzzy schemas produce fuzzy calls.
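For contrast with the empty schema, here is a sketch of a tool definition that does take arguments. The search_tickets tool, its fields, and the missingRequired helper are hypothetical and only illustrate the shape: a type and a description on every property, plus an explicit required list.

```typescript
// Hypothetical tool definition with a fully described input schema.
const searchTicketsTool = {
  name: "search_tickets",
  description:
    "Search tickets with a natural-language query. Use this before referring to a ticket by id.",
  inputSchema: {
    type: "object",
    properties: {
      query: {
        type: "string",
        description: "What to look for, e.g. 'open login bugs assigned to me'.",
      },
      limit: {
        type: "integer",
        description: "Maximum number of results to return.",
        minimum: 1,
        maximum: 50,
        default: 10,
      },
    },
    required: ["query"],
    additionalProperties: false,
  },
} as const;

// Returns the names of required properties that are absent from a call's arguments.
function missingRequired(args: Record<string, unknown>): string[] {
  return searchTicketsTool.inputSchema.required.filter((key) => !(key in args));
}
```

A check like missingRequired is not a replacement for a real JSON Schema validator. It is just enough to reject obviously malformed calls with a clear message.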

Designing Tools The Model Will Actually Use

This is the part nobody tells you, and the part that makes the difference between a server you wrote and a server you actually use.

Tools are not API endpoints. The temptation is to expose every method of your underlying system one to one. Resist it. A server with forty tools is worse than a server with eight, because the model has to read every tool description on every call and the noise drowns out the signal. I have hit this directly. A server I wrote with twenty-five tools was used less than a server I refactored to expose eight tools that grouped related operations.

The rule I follow: tools should match user intentions, not implementation details. A search_tickets tool that takes a natural-language query and returns ranked results is better than five tools called filter_by_status, filter_by_assignee, filter_by_label, filter_by_date, and combine_filters. The model can compose the natural-language query. It cannot reliably compose five micro-tools without getting confused.

The second rule: return what the model needs to decide what to do next, not the raw payload. If your tool returns a 10,000-line JSON blob, the model will spend half its context reading it. Trim aggressively. Summarise. Paginate. If the user really needs the full payload, expose a second tool that fetches by ID. Default to small, focused responses.
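One way to keep responses small is cursor-based pagination inside the tool itself. This is a sketch under my own assumptions: the paginate helper, the Page shape, and the use of a stringified offset as the cursor are all illustrative choices, not something the protocol mandates for tool results.

```typescript
// One page of results plus a cursor the agent can pass back to get the next page.
type Page<T> = { items: T[]; total: number; nextCursor?: string };

function paginate<T>(all: T[], cursor: string | undefined, pageSize = 20): Page<T> {
  const parsed = cursor === undefined ? 0 : Number.parseInt(cursor, 10);
  // Fall back to the first page if the cursor is malformed.
  const start = Number.isFinite(parsed) && parsed >= 0 ? parsed : 0;
  const items = all.slice(start, start + pageSize);
  const end = start + items.length;
  return {
    items,
    total: all.length,
    // Only include nextCursor when there is actually more to fetch.
    ...(end < all.length ? { nextCursor: String(end) } : {}),
  };
}
```

Returning total alongside the page lets the model decide whether fetching more is worth the context.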

The third rule: make errors educational, not stack-tracey. When a tool fails, return a message that tells the model how to recover. "User not found. Try search_users first to get a valid user id." is fifty times more useful than Error: 404 Not Found. The model is not your operator. It cannot read your terminal. The error message is its entire view of the failure.
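A small helper makes the educational-error habit cheap to keep. The sketch below assumes a text-only result shape with the isError flag that MCP tool results support. The toolError helper, the getUser handler, and the in-memory users map are hypothetical.

```typescript
type ToolResult = { content: { type: "text"; text: string }[]; isError?: boolean };

// Builds an error result that says what went wrong and what to try next.
function toolError(problem: string, recovery: string): ToolResult {
  return { content: [{ type: "text", text: `${problem} ${recovery}` }], isError: true };
}

// Hypothetical data source standing in for a real user store.
const users = new Map<string, string>([["u_1", "Ada"]]);

function getUser(id: string): ToolResult {
  const name = users.get(id);
  if (name === undefined) {
    return toolError(
      `User "${id}" not found.`,
      "Try search_users first to get a valid user id."
    );
  }
  return { content: [{ type: "text", text: name }] };
}
```

Forcing every error through a function that requires a recovery argument means you cannot forget to write one.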

The fourth rule: idempotency where you can swing it. Models retry. Sometimes for good reasons, sometimes because they got confused. A tool that creates duplicate records on retry will burn you. Either return existing records when called with the same arguments, or expose a dry_run mode that the agent can call first.
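Both options from the fourth rule fit in a few lines. This is a sketch with hypothetical names (createTicket, an in-memory Map keyed on the normalised title). A real implementation would key on whatever makes two requests "the same" in your system and would persist the lookup.

```typescript
type Ticket = { id: string; title: string };
type CreateOutcome = { ticket: Ticket | null; status: "created" | "existing" | "dry_run" };

// Hypothetical in-memory store keyed on the normalised title.
const ticketsByKey = new Map<string, Ticket>();
let nextTicketNumber = 1;

function createTicket(title: string, dryRun = false): CreateOutcome {
  const key = title.trim().toLowerCase();
  const existing = ticketsByKey.get(key);
  // A retry with the same arguments returns the record that already exists.
  if (existing !== undefined) return { ticket: existing, status: "existing" };
  // A dry run reports what would happen without writing anything.
  if (dryRun) return { ticket: null, status: "dry_run" };
  const ticket: Ticket = { id: `T-${nextTicketNumber++}`, title };
  ticketsByKey.set(key, ticket);
  return { ticket, status: "created" };
}
```

The status field matters as much as the dedup itself, because it tells the model whether its call changed anything.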

If you internalise those four rules before writing any code, you will skip the year of pain I went through.

Resources, Prompts, And The Other Primitives

I said start with tools. I meant it. But there is a small, specific case where resources earn their keep, and it is worth covering.

Resources are for things the model needs to read but never modify. The canonical example is project context: a README.md, a CLAUDE.md, an OpenAPI spec, a database schema dump, a CHANGELOG. The model fetches them by URI, reads them, uses them, and moves on. Resources are cheaper than tools because the model does not have to spend a tool call to get them. The host application decides when to load them, and some clients do so automatically when a session starts.

If your server wraps a system with structured context that does not change per-request (the schema, the docs, the config), expose it as resources. Otherwise stick to tools.
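For a sense of what exposing resources involves, here is a sketch of the two operations behind them: listing what exists and reading one by URI. The result shapes mirror what resources/list and resources/read return in the protocol. The catalog, the schema:// and docs:// URIs, and the function names are hypothetical, and in a real server the SDK wires these up as request handlers.

```typescript
type ResourceEntry = { uri: string; name: string; mimeType: string };
type ResourceContents = { uri: string; mimeType: string; text: string };

// Hypothetical static catalog of read-only context.
const catalog: (ResourceEntry & { text: string })[] = [
  {
    uri: "schema://main",
    name: "Database schema",
    mimeType: "application/json",
    text: '{"tables":["users","tickets"]}',
  },
  { uri: "docs://readme", name: "Project README", mimeType: "text/markdown", text: "# My project" },
];

// Metadata only, so the client can decide what is worth reading.
function listResources(): { resources: ResourceEntry[] } {
  return { resources: catalog.map(({ uri, name, mimeType }) => ({ uri, name, mimeType })) };
}

// Full contents for a single URI.
function readResource(uri: string): { contents: ResourceContents[] } {
  const found = catalog.find((entry) => entry.uri === uri);
  if (found === undefined) {
    throw new Error(`Unknown resource: ${uri}. List resources first to see valid URIs.`);
  }
  return { contents: [{ uri: found.uri, mimeType: found.mimeType, text: found.text }] };
}
```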

Prompts are the third primitive and I genuinely have not seen a useful one in production. The idea is that you ship reusable prompt templates that the user can invoke. In practice, slash commands in Claude Code and rules in Cursor cover the same ground with less friction. Skip prompts for your first server. Revisit them if you ever feel the lack.

There is also a notifications/sampling/elicitation surface in the protocol that I am skipping entirely here. It is not necessary for any normal server. If you find yourself needing it, you have already outgrown this guide.

Authentication And Secrets

The auth model is where local and hosted servers diverge sharply.

For stdio servers, you mostly do not have an auth problem. The server inherits the user's environment. If your tool needs an API key, the standard pattern is to read it from an env var the user sets in their shell or in the agent runtime's config. There is no token exchange, no OAuth, no session. The trust boundary is the user's local machine.

For hosted HTTP servers, you have a real auth problem and you should treat it as such. The MCP spec aligned in 2025 on OAuth 2.1 with Dynamic Client Registration. The agent presents a token. Your server validates it. There is also a simpler bearer-token pattern for first-party servers, where the user pastes a token into their agent config and your server checks it on every request. Both are fine for different use cases.
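The bearer-token pattern is small enough to sketch. The version below is an illustration, not a vetted implementation: isAuthorized and constantTimeEqual are hypothetical helpers, and the comparison is hand-rolled only to keep the example dependency-free. On Node you would normally reach for crypto.timingSafeEqual instead.

```typescript
// Compares two strings without returning early on the first differing character.
function constantTimeEqual(a: string, b: string): boolean {
  let diff = a.length ^ b.length;
  const length = Math.max(a.length, b.length);
  for (let i = 0; i < length; i++) {
    // charCodeAt returns NaN past the end of a string, so fall back to 0.
    diff |= (a.charCodeAt(i) || 0) ^ (b.charCodeAt(i) || 0);
  }
  return diff === 0;
}

// Checks an Authorization header of the form "Bearer <token>".
function isAuthorized(authorizationHeader: string | undefined, expectedToken: string): boolean {
  // An unset server-side token must never authorize anyone.
  if (authorizationHeader === undefined || expectedToken === "") return false;
  const match = /^Bearer (.+)$/.exec(authorizationHeader);
  if (match === null) return false;
  return constantTimeEqual(match[1] ?? "", expectedToken);
}
```

Run this check on every request before any tool dispatch, and log the failures.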

The mistake I keep seeing: developers ship an HTTP MCP server with no auth at all because they assume only they will hit it. Then six weeks later they leave the URL in a tweet, somebody scrapes their database, and there is a bad afternoon. If your server is on the public internet and it does anything beyond returning constants, it needs auth. No exceptions.

The other mistake: storing the wrong secrets. Your MCP server is going to handle the model's queries, which include user data. Treat the server like any other production service. Use env vars, not hardcoded values. Use a secrets manager for production. Rotate credentials. Log auth failures. The fact that the consumer is an AI agent does not change the security model. If anything, it raises the stakes, because the agent will retry queries automatically and amplify any leak.

Testing Without A Model In The Loop

The single biggest mistake I made on my first three MCP servers was testing them only through Claude Code. The feedback loop is too slow and too coarse. The model decides whether to call your tool, what to call it with, and how to interpret the result, which means a single end-to-end test exercises ten degrees of freedom you cannot isolate.

Test the server like a normal HTTP API. The MCP project ships an Inspector tool that lets you send requests to your server and see the raw responses. Use it.

Here is the workflow that saved me hours.

Step one, run npx @modelcontextprotocol/inspector against your server. It opens a UI where you can list tools, call them with arbitrary arguments, and inspect the responses. Every tool gets exercised here first.

Step two, write integration tests against your tool handlers directly. They are normal async functions. Call them from a test runner. Assert on the output. This catches schema mismatches, edge cases on inputs, and regressions when you refactor.

Step three, only after the above two pass, wire the server into Claude Code and exercise the actual model loop. The model will do things you did not expect with your tools. That is fine. You are watching for "did the model find this tool" and "did it use the tool sensibly," not for "did this tool work." Those questions were already answered.

If you follow that order, you will catch ninety percent of bugs without ever burning a token.
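Step two is easier if the handler logic lives in a plain function that takes its side effects as parameters. The sketch below reworks the git status handler that way. gitStatusHandler, the Runner type, and the check helper are hypothetical names, and in a real project the assertions would live in your test runner of choice.

```typescript
type ToolResult = { content: { type: "text"; text: string }[] };

// The command runner is injected so tests never need a real git repo.
type Runner = (command: string) => string;

function gitStatusHandler(run: Runner): ToolResult {
  const output = run("git status --porcelain=v1");
  return { content: [{ type: "text", text: output || "(clean)" }] };
}

// Minimal assertion helper standing in for a test framework.
function check(condition: boolean, label: string): void {
  if (!condition) throw new Error(`failed: ${label}`);
}

// A clean repo produces the placeholder text.
check(gitStatusHandler(() => "").content[0]?.text === "(clean)", "clean repo");

// A dirty repo passes the porcelain output through untouched.
check(
  gitStatusHandler(() => " M src/index.ts\n").content[0]?.text === " M src/index.ts\n",
  "dirty repo"
);
```

In the real server you would pass a runner built on execSync. The tests stay fast and deterministic either way.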

Wiring It Into Claude Code

The integration surface for Claude Code is a .mcp.json file at the project root for project-scoped servers, or the claude mcp add command for user-scoped ones. For a stdio server, the config looks like this.

{
  "mcpServers": {
    "git-status": {
      "command": "node",
      "args": ["/absolute/path/to/your/server/index.js"],
      "env": {}
    }
  }
}

For a hosted HTTP server.

{
  "mcpServers": {
    "my-internal-api": {
      "type": "http",
      "url": "https://mcp.example.com",
      "headers": {
        "Authorization": "Bearer ${MY_INTERNAL_TOKEN}"
      }
    }
  }
}

Restart Claude Code. Run /mcp to see the server's status. If it shows up green, your tools are loaded. If it shows up red, the bottom of the panel tells you why. The most common failure is a wrong path or a missing env var. The second most common is the server crashing on startup. Run the server manually from the same shell Claude Code launches from to confirm it boots.

Cursor and Windsurf both ship MCP support with similar config files. The shape is nearly identical. Whatever language and transport you picked, the server runs the same against every consumer.

What I Wish I Knew Before Shipping The First Version

Six things, in order of how much pain they would have saved me.

Pick a clear scope before writing the first tool. "An MCP server for my project" is not a scope. "An MCP server that exposes our Postgres schema and lets the agent run safe read-only queries" is a scope. The narrow servers I have shipped have all been useful. The "general-purpose" ones have all been deleted within a month.

Write the tool descriptions before the tool implementations. This is the single highest-leverage practice I have found. If you cannot describe what a tool does in two sentences that include when the model should call it, the tool is wrong. Rewrite. The description is the API contract with the agent. The code is just implementation.

Log every tool call with arguments and results. Even in development. You need to see what the model is actually doing with your server. The patterns are not what you expect. I have watched models call my tools with arguments I would never have predicted, and the logs are how I find out. Without logging you are guessing.
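A wrapper is the least intrusive way to get this logging. The sketch below is synchronous to keep it short, so an async handler would need an awaiting variant. withLogging and the sink parameter are hypothetical. The default sink is console.error because a stdio server's stdout carries the JSON-RPC stream and anything else written there corrupts it.

```typescript
type Handler<A, R> = (args: A) => R;

// Wraps a handler so every call is logged as one JSON line, success or failure.
function withLogging<A, R>(
  tool: string,
  handler: Handler<A, R>,
  sink: (line: string) => void = console.error
): Handler<A, R> {
  return (args) => {
    const startedAt = Date.now();
    try {
      const result = handler(args);
      sink(JSON.stringify({ tool, args, result, ms: Date.now() - startedAt }));
      return result;
    } catch (error) {
      sink(JSON.stringify({ tool, args, error: String(error), ms: Date.now() - startedAt }));
      throw error;
    }
  };
}
```

One JSON object per line keeps the log greppable and easy to load into whatever you use for analysis later.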

Version your server from day one. Use semver. Tag releases. When you make a breaking change to a tool's schema, bump the major version, and add a deprecation period for the old schema if the server is shared. Agents do not handle silent breaking changes well. They will keep calling the old shape until the descriptions tell them otherwise.

Cap response sizes. Set a hard ceiling on how much text any tool can return (I default to 16 KB, less for noisy tools). When you hit the cap, return a truncation notice with a hint about how to fetch more. Letting a tool dump 200 KB into the model context once will teach you why this matters.
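The cap itself is a few lines. This sketch counts string length (UTF-16 code units) rather than bytes, which is a simplifying assumption that is close enough for mostly-ASCII output. capResponse and the wording of the notice are hypothetical.

```typescript
const DEFAULT_MAX_CHARS = 16 * 1024;

// Truncates oversized tool output and tells the model how to get the rest.
function capResponse(text: string, maxChars: number = DEFAULT_MAX_CHARS): string {
  if (text.length <= maxChars) return text;
  const omitted = text.length - maxChars;
  return (
    text.slice(0, maxChars) +
    `\n[truncated: ${omitted} more characters. Narrow the query or fetch a specific item by id to see more.]`
  );
}
```

Apply it at the last step before returning from a handler so no tool can bypass it by accident.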

Treat the server like a product. Your future self is its first user. Your team is its second. The agent is its third. Write a README that explains what the server does, what tools it exposes, and what to do when something breaks. Six months from now you will be grateful.

Where To Take It Next

Once you have shipped one MCP server you will see candidates everywhere. The instinct is to write more servers. Resist a little. The better move is usually to grow the first server with carefully chosen tools until it covers most of your daily workflow, then split out a second server only when the first server starts to feel unfocused.

A few directions worth exploring once you are comfortable.

Combine your MCP server with Claude Code skills by packaging both into a single plugin. Skills tell the model when to reach for the tools your server exposes. The combination is dramatically more reliable than either piece in isolation.

If your server wraps a third-party API, look at the agent tool design patterns for what production-grade tool signatures look like, especially around pagination, partial failure, and rate limiting.

If you are running the server in production for multiple users, look at agent observability for what you actually need to monitor. The interesting metrics are not the ones you would track for a normal HTTP API. They are things like "what percentage of tool calls resulted in the agent making a follow-up call to the same tool with corrected arguments," which is a strong signal that your descriptions are unclear.
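That metric can be approximated straight from the tool-call log. The sketch below uses a deliberately crude heuristic that I am assuming for illustration: a call counts as "corrected" when the very next call hits the same tool with different arguments. CallLog and correctionRate are hypothetical names.

```typescript
// One logged call: the tool name and its arguments serialised to a string.
type CallLog = { tool: string; args: string };

// Fraction of calls immediately followed by a same-tool call with different arguments.
function correctionRate(calls: CallLog[]): number {
  if (calls.length === 0) return 0;
  let corrected = 0;
  for (let i = 0; i < calls.length - 1; i++) {
    const current = calls[i];
    const next = calls[i + 1];
    if (current !== undefined && next !== undefined) {
      if (next.tool === current.tool && next.args !== current.args) corrected++;
    }
  }
  return corrected / calls.length;
}
```

The heuristic will also count legitimate follow-ups such as fetching the next page, so treat a high number as a prompt to read the logs, not as proof of a bad description.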

If you are wondering how MCP fits next to direct agent-to-agent communication, the A2A vs MCP comparison is the right next read. Short version: MCP exposes capabilities to a single agent, A2A coordinates multiple agents. They solve adjacent problems.

The Honest Read On MCP Servers In 2026

Most developers will never write one. That is fine. The official integrations are good enough for most use cases. But the developers who do write their own end up with a leverage advantage that is hard to describe until you have it. Every model gets your tool. Every agent runtime gets your tool. Every future tool gets your tool, for free, the moment they ship MCP support, which they all will because the protocol won.

The cost of entry is one afternoon. The payoff is permanent.

If you are sitting on a project that has any repetitive interaction you wish the agent could automate, write the server. Start with one tool. Make sure the tool description is clear enough that the model uses it without prompting. Wire it in. See what happens.

The first time the agent calls your tool unprompted, in the middle of a task, and uses the result correctly, the rest of the post will make sense. That is the moment MCP stops being a protocol you read about and starts being something you build with.

That moment is one afternoon away. Worth the afternoon.
