Sreeharsha

Posted on Feb 25

Shipping an MCP Server alongside your API

#mcp #ai #fastapi #python

TL;DR

MCP tools and REST APIs serve fundamentally different consumers — humans navigate discovery, agents need curated capabilities.

Instead of deploying a separate MCP service or auto-converting your existing API, you can mount an MCP server directly alongside your FastAPI application using FastMCP 3.x.

Both interfaces share the same services, database, and authentication, but each is designed for its own consumer.

This article covers the design philosophy, architecture, and engineering challenges of shipping this dual-interface pattern.

Introduction

MCP (Model Context Protocol) is an open standard that defines how AI systems interact with external tools, data sources, and applications. Rather than building bespoke integrations for each model-service pair, MCP provides a single protocol that any AI agent can speak.

An MCP server exposes three types of capabilities:

Tools — Functions the LLM calls autonomously to take action. The model decides when and how to invoke them based on conversation context.
Resources — Structured data the application provides as context. These are read-only and addressed via URIs.
Prompts — Pre-built templates that guide users through structured workflows with validated inputs.

Of these, tools are the primary interface for agentic workflows — and they're where the design challenge lives.

Deployment

When teams adopt MCP, the default instinct is to deploy the MCP server as a separate service alongside the existing API. This mirrors the microservices pattern: independent scaling, isolated failure domains.

But for most applications, the MCP tools and REST endpoints operate on the same data, apply the same business rules, and authenticate the same users. A separate service means duplicated logic, duplicated auth configuration, and additional infrastructure — all for a protocol adapter.

The alternative — and the pattern this article explores — is mounting the MCP server as a sub-application inside your existing API server. FastMCP 3.x generates a standard ASGI application, which means it slots natively into FastAPI or Starlette with a single mount() call.

The result:

Path	Consumer	Interface
`/api/*`	Web/mobile clients, curl	REST (FastAPI)
`/mcp`	AI agents, LLM clients	MCP (FastMCP)
`/health`	Infrastructure	HTTP GET

One deployment. One database. One authentication secret. Two interfaces, each designed for its consumer.
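To make the routing in that table concrete, here is a minimal sketch of prefix-based dispatch between ASGI apps, written with the standard library only. It is a simplified stand-in for what mount() does, not Starlette's or FastMCP's implementation; make_app, mount, and call are hypothetical helpers invented for this example.

```python
import asyncio


def make_app(label):
    """Build a tiny ASGI app that reports which app handled which path."""
    async def app(scope, receive, send):
        body = f"{label} handled {scope['path']}".encode()
        await send({"type": "http.response.start", "status": 200, "headers": []})
        await send({"type": "http.response.body", "body": body})
    return app


def mount(routes, fallback):
    """Dispatch to the first sub-app whose prefix matches, stripping the prefix."""
    async def root(scope, receive, send):
        path = scope["path"]
        for prefix, sub_app in routes:
            if path == prefix or path.startswith(prefix + "/"):
                child = dict(
                    scope,
                    path=path[len(prefix):] or "/",
                    root_path=scope.get("root_path", "") + prefix,
                )
                await sub_app(child, receive, send)
                return
        await fallback(scope, receive, send)
    return root


async def call(app, path):
    """Drive an ASGI app with one GET request and return the response body."""
    chunks = []

    async def receive():
        return {"type": "http.request", "body": b"", "more_body": False}

    async def send(message):
        if message["type"] == "http.response.body":
            chunks.append(message["body"])

    scope = {"type": "http", "method": "GET", "path": path, "root_path": ""}
    await app(scope, receive, send)
    return b"".join(chunks).decode()


# One root app, two mounted interfaces, everything else handled at the root.
root_app = mount(
    [("/api", make_app("REST")), ("/mcp", make_app("MCP"))],
    fallback=make_app("root"),
)

results = [asyncio.run(call(root_app, p)) for p in ("/api/todos", "/mcp", "/health")]
print(results)
```

Because both sub-apps are plain ASGI callables, the root process can serve them from a single port and a single deployment.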

When This Pattern Fits

Good fit: Your MCP tools and API endpoints serve the same domain with the same data. You want fast time-to-market without new infrastructure.

Less ideal: You need independent scaling for heavy MCP traffic, or your MCP tools integrate data sources that your API doesn't touch.

Design

REST APIs Are Not MCP Tools

The fundamental insight, articulated well by Jeremiah Lowin (jlowin, FastMCP's creator), is that REST APIs and MCP tools are designed for consumers with opposite needs:

REST APIs are designed for human developers. They optimize for discoverability and atomicity. Hundreds of single-purpose endpoints are a feature, because human developers do discovery once, write code that chains atomic calls together, and iterate cheaply. More choice is good.

MCP tools are designed for LLM agents. Every tool an LLM has access to is processed in its context window on every reasoning step — every name, description, and parameter schema. More tools means more tokens, more latency, and more opportunities for the model to make wrong choices. Atomicity is an antipattern: each tool call triggers a full reasoning cycle that costs time and money.

This creates a concrete design mismatch:

Dimension	REST API	MCP Tool
Audience	Human developer writing code	LLM agent reasoning in context
Cost of choice	Cheap — ignored at compile time	Expensive — processed every inference
Iteration cost	Fast — network hops are milliseconds	Slow — each call is a full reasoning cycle
Design ideal	Rich, composable, atomic	Ruthlessly curated, minimal
Failure mode	404 / 500 errors	Hallucinated tools, wrong tool selection
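The "cost of choice" row can be turned into rough arithmetic. The sketch below serializes tool definitions in the name/description/inputSchema shape that MCP uses and estimates how many tokens are spent re-reading them across a multi-step task. The four-characters-per-token ratio is a crude assumption, the parameter lists are invented, and the nine "mirrored" tools are hypothetical placeholders for an auto-converted API.

```python
import json


def tool_schema(name, description, params):
    """One tool definition in the shape an MCP client passes to the model."""
    return {
        "name": name,
        "description": description,
        "inputSchema": {
            "type": "object",
            "properties": {p: {"type": "string"} for p in params},
            "required": list(params),
        },
    }


def estimated_tokens(tools, chars_per_token=4):
    """Rough estimate: serialized length divided by an assumed chars-per-token ratio."""
    return len(json.dumps(tools)) // chars_per_token


def context_overhead(tools, reasoning_steps):
    """Tokens spent re-reading the tool definitions on every reasoning step."""
    return estimated_tokens(tools) * reasoning_steps


curated = [
    tool_schema("get_all_todos", "List every todo for the current user.", []),
    tool_schema("get_todo_by_id", "Fetch one todo by id.", ["todo_id"]),
    tool_schema("create_todo", "Create a todo.", ["title"]),
    tool_schema("update_todo", "Replace all fields of a todo.", ["todo_id", "title"]),
    tool_schema("delete_todo", "Delete a todo after confirming with the user.", ["todo_id"]),
]

# Hypothetical 1:1 mirror of a nine-endpoint REST API.
mirrored = [
    tool_schema(f"endpoint_{i}", f"Wraps REST endpoint number {i}.", ["id", "body"])
    for i in range(1, 10)
]

for label, tools in (("curated", curated), ("mirrored", mirrored)):
    print(label, len(tools), "tools:", context_overhead(tools, reasoning_steps=10),
          "estimated tokens over 10 steps")
```

The absolute numbers mean little; the point is that the overhead is multiplied by the number of reasoning steps, so every extra tool is paid for repeatedly.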

Designing Tools from Agent Stories

The right approach is to start not from your API spec, but from the agent story: "As an agent, given {context}, I use {tools} to achieve {outcome}."

For a todos application, the agent story is simple: "As an agent, given an authenticated user, I use todo management tools to help the user create, view, update, and complete tasks."

This leads to five curated tools — not nine REST endpoints:

MCP Tool	Purpose	Why it exists
`get_all_todos`	Overview of all tasks	Agent needs full picture to reason about priorities
`get_todo_by_id`	Inspect a specific task	Agent needs current state before suggesting updates
`create_todo`	Create a task	Core capability for task management
`update_todo`	Modify a task	Includes completion, so there is no separate "mark done" tool
`delete_todo`	Remove a task	Destructive — tool description tells agent to confirm first

Each tool gets a rich description that instructs the LLM on when and how to use it. For example, update_todo's description states: "All fields are required — provide current values for fields you do not want to change. Use get_todo_by_id first to fetch the current values before updating." This kind of guidance is meaningless in REST but critical for MCP.
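As a rough illustration of how a description steers the agent, the sketch below registers two tools with their docstrings as descriptions and then replays the read-then-write sequence those descriptions ask for. The tool decorator, the TOOLS registry, the todo fields, and mark_done are all hypothetical; this is a standard-library stand-in, not FastMCP's API, and the update_todo wording is lightly adapted from the description quoted above.

```python
TOOLS = {}


def tool(fn):
    """Register fn as a tool, using its docstring as the description the model reads."""
    TOOLS[fn.__name__] = {"description": " ".join((fn.__doc__ or "").split()), "call": fn}
    return fn


# In-memory data with assumed fields; the real app reads from its database.
_TODOS = {1: {"title": "Write report", "description": "Q3 summary", "completed": False}}


@tool
def get_todo_by_id(todo_id: int) -> dict:
    """Fetch one todo. Call this before update_todo to read its current values."""
    return dict(_TODOS[todo_id])


@tool
def update_todo(todo_id: int, title: str, description: str, completed: bool) -> dict:
    """Replace a todo. All fields are required: provide current values for fields
    you do not want to change. Use get_todo_by_id first to fetch the current
    values before updating. Set completed=True to mark the task done."""
    _TODOS[todo_id] = {"title": title, "description": description, "completed": completed}
    return dict(_TODOS[todo_id])


def mark_done(todo_id: int) -> dict:
    """The call sequence the descriptions steer an agent toward: read, then write."""
    current = TOOLS["get_todo_by_id"]["call"](todo_id)
    return TOOLS["update_todo"]["call"](todo_id, **{**current, "completed": True})


print(TOOLS["update_todo"]["description"])
print(mark_done(1))
```

Folding completion into update_todo keeps the tool count at five, at the price of a longer description that the agent must follow.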

Implementation Overview

Architecture

github: todos

The application follows a layered architecture where the services and data access layers are shared, and only the interface layer differs:

┌──────────────────────┐    ┌──────────────────────┐
│   REST API (api.py)  │    │  MCP Tools (mcp.py)  │
│   9 endpoints        │    │   5 tools            │
│   FastAPI Depends()  │    │   FastMCP Depends()  │
└──────────┬───────────┘    └──────────┬───────────┘
           │                           │
           └─────────┬─────────────────┘
                     ▼
           ┌─────────────────────┐
           │   Services Layer    │
           │ AuthService          │
           │ TodoService          │
           └─────────┬───────────┘
                     ▼
           ┌─────────────────────┐
           │  Repository Layer   │
           │ UserRepository       │
           │ TodoRepository       │
           └─────────┬───────────┘
                     ▼
           ┌─────────────────────┐
           │   SQLAlchemy/SQLite │
           └─────────────────────┘
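The layering in the diagram can be sketched in a few lines. This is a simplified model, assuming an in-memory repository instead of SQLAlchemy and invented method names; rest_create_todo and mcp_get_all_todos are hypothetical adapters that stand in for a FastAPI route and an MCP tool.

```python
from dataclasses import asdict, dataclass
from itertools import count


@dataclass
class Todo:
    id: int
    owner: str
    title: str
    completed: bool = False


class TodoRepository:
    """Data access. In-memory here; the article's app uses SQLAlchemy/SQLite."""

    def __init__(self):
        self._rows = {}
        self._ids = count(1)

    def add(self, owner, title):
        todo = Todo(next(self._ids), owner, title)
        self._rows[todo.id] = todo
        return todo

    def list_for(self, owner):
        return [t for t in self._rows.values() if t.owner == owner]


class TodoService:
    """Business rules live here, written once for both interfaces."""

    def __init__(self, repo):
        self.repo = repo

    def create(self, owner, title):
        title = title.strip()
        if not title:
            raise ValueError("title must not be empty")
        return self.repo.add(owner, title)

    def list(self, owner):
        return self.repo.list_for(owner)


service = TodoService(TodoRepository())


def rest_create_todo(user, payload):
    """REST-shaped adapter: request body in, (status, body) out."""
    return 201, asdict(service.create(user, payload["title"]))


def mcp_get_all_todos(user):
    """MCP-shaped adapter: returns plain data for the agent to reason over."""
    return [asdict(t) for t in service.list(user)]


status, body = rest_create_todo("alice", {"title": "  Buy milk  "})
print(status, body)
print(mcp_get_all_todos("alice"))
```

A todo created through the REST adapter is immediately visible through the MCP adapter because both call the same service instance.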

How It Comes Together

The root application (main.py) composes everything:

Creates the MCP ASGI app from the FastMCP server instance
Combines lifespans — database table initialization and MCP session management are merged using combine_lifespans()
Applies middleware at root level — CORS (including MCP-specific headers like mcp-session-id) and request logging apply uniformly to all sub-apps
Registers OAuth discovery routes at the root (these must live outside the /mcp prefix)
Mounts both interfaces — REST API at /api, MCP server at /mcp
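The lifespan-combining step is the least obvious of these, so here is a minimal sketch of the idea using only contextlib. merge_lifespans is a hypothetical stand-in written for this example, not FastMCP's combine_lifespans(), and the two lifespans only record events instead of touching a database or a session manager.

```python
import asyncio
from contextlib import AsyncExitStack, asynccontextmanager

events = []


@asynccontextmanager
async def db_lifespan(app):
    events.append("db: create tables")
    yield
    events.append("db: dispose engine")


@asynccontextmanager
async def mcp_lifespan(app):
    events.append("mcp: start session manager")
    yield
    events.append("mcp: stop session manager")


def merge_lifespans(*lifespans):
    """Enter each lifespan in order on startup and exit them in reverse on shutdown."""
    @asynccontextmanager
    async def combined(app):
        async with AsyncExitStack() as stack:
            for lifespan in lifespans:
                await stack.enter_async_context(lifespan(app))
            yield
    return combined


async def run_once():
    lifespan = merge_lifespans(db_lifespan, mcp_lifespan)
    # A server would hold this context open for the life of the process.
    async with lifespan(None):
        events.append("serving requests")


asyncio.run(run_once())
print(events)
```

The ordering matters: whatever starts first is torn down last, so the database outlives the MCP session manager that depends on it.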

Shared Authentication

The REST API issues JWTs at /api/auth/token. The MCP server validates those same tokens using FastMCP's JWTVerifier, configured with the same HS256 secret. Users authenticate once through the REST API and can use the token with either interface.

Inside MCP tools, user identity is extracted from the verified token claims — the same sub claim that the REST API reads, resolving to the same user record in the same database.
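To show why one secret is enough for both interfaces, the sketch below signs and verifies an HS256 token with the standard library. It is hand-rolled for illustration only: a real application would use a JWT library on the REST side and FastMCP's JWTVerifier on the MCP side. SECRET, issue_token, verify_token, and current_user are hypothetical names.

```python
import base64
import hashlib
import hmac
import json
import time

SECRET = b"change-me"  # hypothetical shared secret; load it from configuration in practice


def _b64url(data: bytes) -> str:
    return base64.urlsafe_b64encode(data).rstrip(b"=").decode()


def _b64url_decode(text: str) -> bytes:
    return base64.urlsafe_b64decode(text + "=" * (-len(text) % 4))


def _sign(signing_input: str) -> str:
    return _b64url(hmac.new(SECRET, signing_input.encode(), hashlib.sha256).digest())


def issue_token(sub: str, ttl_seconds: int = 3600) -> str:
    """What the REST token endpoint does: sign the claims with the shared secret."""
    header = _b64url(json.dumps({"alg": "HS256", "typ": "JWT"}).encode())
    claims = {"sub": sub, "exp": int(time.time()) + ttl_seconds}
    payload = _b64url(json.dumps(claims).encode())
    return f"{header}.{payload}.{_sign(f'{header}.{payload}')}"


def verify_token(token: str) -> dict:
    """What either interface does per request: check signature, algorithm, and expiry."""
    header, payload, signature = token.split(".")
    if not hmac.compare_digest(signature, _sign(f"{header}.{payload}")):
        raise ValueError("bad signature")
    if json.loads(_b64url_decode(header)).get("alg") != "HS256":
        raise ValueError("unexpected algorithm")
    claims = json.loads(_b64url_decode(payload))
    if claims["exp"] < time.time():
        raise ValueError("token expired")
    return claims


def current_user(token: str) -> str:
    """Both the REST dependency and the MCP tool resolve identity from the sub claim."""
    return verify_token(token)["sub"]


token = issue_token("alice")
print(current_user(token))
```

Since verification needs nothing but the secret and the token, the MCP side never has to call the REST side to authenticate a request.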

Shared Business Logic

Both interfaces resolve to the same AuthService and TodoService classes. The service and repository layers are written once and used by both. Only the dependency injection wiring differs between the two frameworks (see Challenges below).

Challenges

1. OAuth Discovery Routes Must Live at Root

MCP clients that connect over Streamable HTTP expect OAuth discovery endpoints at well-known paths:

GET /.well-known/oauth-protected-resource/mcp
GET /.well-known/oauth-authorization-server/mcp

These cannot be nested under the /mcp mount prefix — they must be registered on the root application. Missing these routes caused silent authentication failures that were difficult to diagnose. FastMCP provides mcp.auth.get_well_known_routes() to generate these routes, but you must explicitly append them to the root app.
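The reason these paths escape the mount prefix is visible in how they are built: the OAuth metadata RFCs (8414 and 9728) insert the /.well-known/ segment between the host and the resource path rather than appending it. A small sketch, assuming a hypothetical deployment at https://example.com/mcp; well_known_url and is_under_mount are helpers invented for this example.

```python
from urllib.parse import urlsplit, urlunsplit


def well_known_url(resource_url: str, suffix: str) -> str:
    """Insert /.well-known/<suffix> between the host and the resource path."""
    parts = urlsplit(resource_url)
    path = parts.path.rstrip("/")
    return urlunsplit((parts.scheme, parts.netloc, f"/.well-known/{suffix}{path}", "", ""))


def is_under_mount(url: str, prefix: str = "/mcp") -> bool:
    """Would a request for this URL be routed into the app mounted at prefix?"""
    path = urlsplit(url).path
    return path == prefix or path.startswith(prefix + "/")


MCP_URL = "https://example.com/mcp"  # hypothetical deployment URL

protected_resource = well_known_url(MCP_URL, "oauth-protected-resource")
authorization_server = well_known_url(MCP_URL, "oauth-authorization-server")

for url in (MCP_URL, protected_resource, authorization_server):
    print(url, "-> routed to /mcp mount:", is_under_mount(url))
```

Both discovery URLs start with /.well-known/, so a router that only forwards /mcp traffic to the MCP app never sees them unless the root app registers them itself.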

2. Middleware Configuration at Root

The initial implementation configured middleware separately on the FastAPI and FastMCP sub-applications. This caused CORS issues because the MCP protocol uses custom headers (mcp-protocol-version, mcp-session-id) that must be allowed and exposed:

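from fastapi.middleware.cors import CORSMiddleware

# Registered on the root app so the policy covers both /api and /mcp.
# allow_origins and allow_methods are omitted here; set them for your deployment.
app.add_middleware(
    CORSMiddleware,
    allow_headers=["mcp-protocol-version", "mcp-session-id", "Authorization", "Content-Type"],
    expose_headers=["mcp-session-id"],  # lets browser clients read the session id
)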

Moving CORS and logging middleware to the root-level app (so it applies to all sub-apps uniformly) resolved the issue. This is also the approach recommended by FastMCP's documentation.

3. Dependency Injection — Two Systems, One Service Layer

FastAPI and FastMCP each provide their own Depends() function, and they are not interchangeable:

FastAPI's Depends uses generator-based resolution (yield in a generator function)
FastMCP's Depends uses its own resolver that expects a @contextmanager decorated function

Using FastAPI's DI in FastMCP tools caused resolution errors like Failed to resolve dependency 'user'. The fix was defining separate dependency providers in mcp.py using fastmcp.dependencies.Depends and @contextmanager for DB sessions, while keeping the FastAPI providers in services/__init__.py.

Both resolve to the same AuthService and TodoService classes — the dependency wiring is separate, but the business logic is fully shared.
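The two provider shapes can be compared side by side without either framework installed. In this sketch, Session and TodoService are simplified stand-ins, and the manual next()/close() calls only approximate how a generator-aware resolver drives a provider; how each framework actually resolves dependencies is not modeled here.

```python
from contextlib import contextmanager

log = []


class Session:
    """Stand-in for a SQLAlchemy session."""

    def close(self):
        log.append("closed")


class TodoService:
    def __init__(self, session):
        self.session = session


def get_session():
    """Generator-style provider: yield the resource, clean up afterwards."""
    session = Session()
    try:
        yield session
    finally:
        session.close()


@contextmanager
def session_scope():
    """Context-manager provider wrapping the same setup and cleanup."""
    yield from get_session()


# Generator style: the caller advances the generator, then closes it to run cleanup.
gen = get_session()
rest_service = TodoService(next(gen))
gen.close()

# Context-manager style: the with block handles setup and cleanup.
with session_scope() as session:
    mcp_service = TodoService(session)

print(type(rest_service).__name__, type(mcp_service).__name__, log)
```

Either way the service receives an open session and the session is closed afterwards, which is why only the wiring needs to exist twice.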

Conclusion

The dual-interface pattern — a REST API and MCP server sharing a single deployment — is a pragmatic choice for teams adding agent capabilities to an existing backend. You avoid duplicating business logic, authentication, and infrastructure. You ship one container instead of two.

But the design lesson runs deeper than deployment topology. REST APIs and MCP tools serve fundamentally different consumers. The endpoints that make your API discoverable and composable for human developers can drown an LLM in context. The right approach is to curate your MCP tools independently — start from the agent's story, not from your OpenAPI spec.

The code reuse happens at the service layer, not the interface layer. That's the key architectural insight: share the logic, design the surfaces separately.

Note: The official MCP guide to building a server references from mcp.server.fastmcp import FastMCP, which is the FastMCP 1.0 API bundled with the official Python SDK. As of February 2026, the standalone FastMCP v3 (from fastmcp import FastMCP) is the recommended high-level framework, offering Streamable HTTP, ASGI integration, web framework mounting, and a CLI.
