DEV Community

韩

FastMCP Self-Hosted MCP Framework: 5 Hidden Uses for Production Agent Stacks

FastMCP is a Python framework, downloaded a million times a day, that powers 70% of MCP servers across all languages. You may know it for the decorator-on-a-function quick start, but underneath the surface sits a machine designed for production MCP work — context-aware tools, server composition, in-process sandboxes, OpenAPI auto-conversion, and a Cloud platform that turns a Python file into a hosted MCP server in under thirty seconds.

The repository at PrefectHQ/fastmcp has climbed to 25,548 stars (2,056 forks) and was last pushed 2026-06-06. The accompanying story "Welcome to FastMCP" on Hacker News pulled in 80 points and 68 comments in March 2026, and FastMCP 3.0 GA reached HN's front page in February. Yet most tutorials still only cover @mcp.tool plus mcp.run() — the surface area that fits in a tweet. This article walks through five techniques the README buries two clicks deep, each one turning a one-file toy into something a real team would deploy.

In 2026, the Model Context Protocol is the de facto integration layer between LLM agents and external systems. Agent tools such as Claude Code, Cursor, Goose, Cline, and OpenHands all speak MCP, and roughly seventy percent of community servers are built on FastMCP. FastMCP is no longer just a wrapper. It now ships its own client, its own composition primitives, its own image/audio types, an OpenAPI-to-MCP converter, a Pythonic code-mode sandbox, and a hosted Cloud runtime. The leverage you get from a 200-line script today would have taken a full SDK and a deployment platform twelve months ago.


Hidden Use #1: Elicitation — Make the LLM Ask the User Mid-Tool-Call

What most people do: build a tool that takes every parameter up front, then rage-quit when the LLM hallucinates a missing argument. They hand-write a separate ask_user tool and re-route the conversation through it.

The hidden trick: FastMCP 2.13+ ships ctx.elicit() inside the Context object. A tool can pause its own execution, send a structured question to the client, and resume with the typed answer. The schema is the type — pass a @dataclass and you get a form, not a free-text prompt.

from dataclasses import dataclass
from fastmcp import Context, FastMCP

mcp = FastMCP("Elicitation Demo")


@mcp.tool
async def plan_dinner(ctx: Context) -> str:
    """Plan a dinner menu, asking the user what they're in the mood for."""
    @dataclass
    class DinnerPrefs:
        cuisine: str
        vegetarian: bool

    result = await ctx.elicit(
        "What kind of dinner are you in the mood for?",
        response_type=DinnerPrefs,
    )
    if result.action == "accept":
        prefs = result.data
        veg = "vegetarian " if prefs.vegetarian else ""
        return f"Tonight's menu: a lovely {veg}{prefs.cuisine} dinner!"
    return "Dinner cancelled!"

The result: a single tool handles a multi-turn interaction. The Claude Desktop or Cursor client renders a native form with the DinnerPrefs shape, the LLM never sees the question, and your tool's return type stays str. There is no parallel ask_user plumbing.
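To make "the schema is the type" concrete, here is a minimal stdlib-only sketch of how a dataclass can be turned into a form description and how the submitted values can be turned back into a typed object. It does not use FastMCP; the helper names form_schema and parse_answer and the JSON_TYPES mapping are hypothetical, and FastMCP's own conversion may differ in detail.

```python
from dataclasses import dataclass, fields

# Hypothetical mapping from Python annotations to JSON Schema type names.
JSON_TYPES = {str: "string", bool: "boolean", int: "integer", float: "number"}


@dataclass
class DinnerPrefs:
    cuisine: str
    vegetarian: bool


def form_schema(cls) -> dict:
    """Build a flat, JSON-Schema-style description of a dataclass."""
    properties = {f.name: {"type": JSON_TYPES[f.type]} for f in fields(cls)}
    return {
        "type": "object",
        "properties": properties,
        "required": [f.name for f in fields(cls)],
    }


def parse_answer(cls, answer: dict):
    """Turn the values a client submitted back into a typed instance."""
    return cls(**answer)


# The client would render `schema` as a form with one text field and one checkbox.
schema = form_schema(DinnerPrefs)

# The tool receives the submitted values as a typed object, not free text.
prefs = parse_answer(DinnerPrefs, {"cuisine": "Thai", "vegetarian": True})
print(schema)
print(prefs)
```

The point of the design is that the tool author declares the shape once, as a type, and both the form and the parsed answer follow from it.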

Data sources: PrefectHQ/fastmcp examples/elicitation.py (verified 2026-06-09 against main branch); HN "Welcome to FastMCP" thread 80 pts (2026-03-24, objectID 47508149).


Hidden Use #2: Image Return Type — Give the LLM Eyes Without Re-Encoding

What most people do: capture a screenshot with Pillow, base64-encode the bytes, return a JSON string, and tell the model "this is an image." Then debug for an hour when the model says it sees a "long string of characters."

The hidden trick: FastMCP defines fastmcp.utilities.types.Image (and Audio, and File). Returning one of these tells the MCP transport to ship the bytes as a native multimodal content block, which the model receives as actual pixels.

import io
import pyautogui
from fastmcp import FastMCP
from fastmcp.utilities.types import Image

mcp = FastMCP("Screenshot Demo")


@mcp.tool
def take_screenshot() -> Image:
    """Take a screenshot of the user's screen and return it as an image.
    Use this tool anytime the user wants to look at something."""
    buffer = io.BytesIO()
    screenshot = pyautogui.screenshot()
    screenshot.convert("RGB").save(buffer, format="JPEG", quality=60, optimize=True)
    return Image(data=buffer.getvalue(), format="jpeg")

The result: Claude Desktop renders the screenshot directly, and the model can describe what is on screen. Returning bytes or a data: URL string instead produces a client-side error or a hallucinated description.
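The difference between the two approaches is easiest to see in the content blocks that end up in the tool result. The sketch below builds both by hand with the standard library, following the text and image content shapes from the MCP specification. The placeholder bytes are not a real image, and the variable names are illustrative only. Note that the image bytes are still base64-encoded inside the JSON message; what changes is that the block is tagged as an image with a MIME type, so the client knows to decode it.

```python
import base64
import json

# Placeholder bytes standing in for a JPEG; not a real image.
fake_jpeg = b"\xff\xd8\xff\xe0" + b"\x00" * 16
encoded = base64.b64encode(fake_jpeg).decode("ascii")

# Hand-rolled approach: the base64 string travels inside a text block,
# so the model is shown characters rather than pixels.
as_text = {"type": "text", "text": json.dumps({"image": encoded})}

# Image content block: same base64 payload, but typed as an image.
as_image = {"type": "image", "data": encoded, "mimeType": "image/jpeg"}

# A client that understands the image block can recover the original bytes.
decoded = base64.b64decode(as_image["data"])
print(as_text["type"], as_image["type"], decoded == fake_jpeg)
```

Returning an Image object from the tool is what lets the framework emit the second shape for you.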

Data sources: PrefectHQ/fastmcp examples/screenshot.py (verified 2026-06-09 against main branch); the fastmcp.utilities.types module exposes Image, Audio, and File as the canonical multimodal carriers.


Hidden Use #3: CodeMode — Collapse 50 Tools Into Two Meta-Tools

What most people do: ship a server with forty @mcp.tool functions, watch the context window melt when the LLM tries to enumerate them, and bolt on a RAG-over-tool-descriptions hack to recover tokens.

The hidden trick: FastMCP ships an experimental CodeMode transform that replaces the entire tool catalog with two meta-tools — search (keyword discovery) and execute (run a Python snippet that calls the real tools in a sandbox). The model writes one round-trip of Python instead of orchestrating dozens of tool calls.

from fastmcp import FastMCP
from fastmcp.experimental.transforms.code_mode import CodeMode

mcp = FastMCP("CodeMode Demo")

@mcp.tool
def list_files(directory: str) -> list[str]:
    """List files in a directory."""
    import os
    return os.listdir(directory)

@mcp.tool
def read_file(path: str) -> str:
    """Read the contents of a file."""
    with open(path) as f:
        return f.read()

# CodeMode replaces the registered tools (the two above) with just `search` + `execute`.
# The LLM discovers tools via keyword search, then writes Python
# scripts that chain calls inside a pydantic-monty sandbox.
mcp.add_transform(CodeMode())

if __name__ == "__main__":
    mcp.run()

After install (pip install "fastmcp[code-mode]"), the LLM sees only search_code_mode and execute_code_mode. A query like "find the largest .py file under /repo" becomes a single execute call running max((f for f in __list_files('/repo') if f.endswith('.py')), key=lambda f: len(__read_file(f))). This assumes the server's working directory is /repo, because list_files returns bare file names rather than full paths.

The result: a 40-tool server becomes a 2-tool surface, the LLM uses the same Python it already knows, and round-trips drop from N to 1. The sandbox is pydantic-monty, which is a Rust-backed Python interpreter — not a subprocess shim.
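The search-then-execute pattern can be sketched in plain Python without FastMCP. Everything below is a toy: the TOOLS registry, the tool decorator, and the search and execute functions are hypothetical stand-ins, and execute uses plain exec(), which is not a sandbox and should never run untrusted code. It only illustrates how one script can chain several tool calls in a single round-trip.

```python
import os
import tempfile

TOOLS = {}  # toy catalog: tool name -> function


def tool(fn):
    """Register a function in the toy catalog."""
    TOOLS[fn.__name__] = fn
    return fn


@tool
def list_files(directory: str) -> list[str]:
    """List files in a directory."""
    return sorted(os.listdir(directory))


@tool
def read_file(path: str) -> str:
    """Read the contents of a file."""
    with open(path) as f:
        return f.read()


def search(keyword: str) -> list[str]:
    """Meta-tool 1: names of tools whose name or docstring mentions the keyword."""
    kw = keyword.lower()
    return sorted(
        name for name, fn in TOOLS.items()
        if kw in name.lower() or kw in (fn.__doc__ or "").lower()
    )


def execute(script: str) -> object:
    """Meta-tool 2: run a snippet with the tools in scope and return `result`.

    WARNING: plain exec() is NOT a sandbox; this is for illustration only.
    """
    namespace = dict(TOOLS)
    exec(script, namespace)
    return namespace.get("result")


with tempfile.TemporaryDirectory() as repo:
    bodies = {"small.py": "x = 1\n", "big.py": "x = 1\n" * 50, "notes.txt": "n" * 999}
    for name, body in bodies.items():
        with open(os.path.join(repo, name), "w") as f:
            f.write(body)

    found = search("file")
    # One script replaces a list_files call plus one read_file call per file.
    script = f"""
names = [n for n in list_files({repo!r}) if n.endswith('.py')]
result = max(names, key=lambda n: len(read_file({repo!r} + '/' + n)))
"""
    largest = execute(script)

print(found, largest)
```

Here the model would call search once to learn which tools exist, then send a single script to execute instead of issuing one tool call per file.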

Data sources: PrefectHQ/fastmcp examples/code_mode/server.py (verified 2026-06-09); fastmcp.experimental.transforms.code_mode module introduced in FastMCP 2.x.


Hidden Use #4: Background Tasks with task=True — Long-Running Tools That Survive Disconnects

What most people do: spawn a tool that takes 10 minutes, watch the MCP request time out at 60 seconds, and bolt on a polling endpoint plus a Redis backend they maintain by hand.

The hidden trick: FastMCP integrates with Docket, a Redis-backed task queue, via the task=True decorator flag. The tool runs asynchronously, reports progress through the Progress dependency, and the client can disconnect and reconnect without losing state.

import asyncio
from typing import Annotated
from docket import Logged
from fastmcp import FastMCP
from fastmcp.dependencies import Progress

mcp = FastMCP("Tasks Example")


@mcp.tool(task=True)
async def slow_computation(
    duration: Annotated[int, Logged],
    progress: Progress = Progress(),
) -> str:
    """Perform a slow computation that takes `duration` seconds."""
    if duration < 1 or duration > 60:
        raise ValueError("Duration must be between 1 and 60 seconds")

    await progress.set_total(duration)
    for i in range(duration):
        await asyncio.sleep(1)
        await progress.increment()
        await progress.set_message(
            f"Working... {i+1}/{duration}s ({duration-i-1}s remaining)"
        )
    return f"Completed in {duration}s"

The result: the client receives a task handle, polls progress over the same MCP connection, and gets the final return value when the worker finishes. The same decorator works for CPU-bound work (offloaded to a worker) and I/O-bound work (kept in the event loop). A companion example, task_elicitation.py, shows ctx.elicit() working inside a background task: the user is asked the question 30 seconds into a 5-minute job, answers it, and the task resumes.
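The handle-then-poll flow can be illustrated with asyncio alone. This is an in-process sketch, not Docket and not FastMCP: the TASKS dictionary, submit, and poll_until_done are hypothetical names, state lives in memory rather than Redis, and nothing here survives a process restart. It only shows the shape of the interaction, in which the caller gets an ID back immediately and reads progress separately from the work.

```python
import asyncio
import uuid

TASKS: dict[str, dict] = {}  # in-memory stand-in for a persistent task store


async def slow_computation(task_id: str, duration: int) -> None:
    """Do `duration` units of work, recording progress as it goes."""
    state = TASKS[task_id]
    state["total"] = duration
    for i in range(duration):
        await asyncio.sleep(0.01)  # stands in for one unit of real work
        state["done"] = i + 1
        state["message"] = f"Working... {i + 1}/{duration}"
    state["result"] = f"Completed {duration} steps"
    state["status"] = "completed"


def submit(duration: int) -> str:
    """Start the work in the background and return a handle immediately."""
    task_id = uuid.uuid4().hex
    TASKS[task_id] = {"status": "working", "done": 0, "total": None,
                      "message": "", "result": None}
    # Keep a reference so the task is not garbage-collected mid-run.
    TASKS[task_id]["handle"] = asyncio.create_task(slow_computation(task_id, duration))
    return task_id


async def poll_until_done(task_id: str) -> list[int]:
    """Read progress repeatedly, as a reconnecting client might."""
    seen = []
    while TASKS[task_id]["status"] != "completed":
        seen.append(TASKS[task_id]["done"])
        await asyncio.sleep(0.005)
    return seen


async def main() -> tuple[str, list[int]]:
    task_id = submit(5)  # returns before any work has happened
    seen = await poll_until_done(task_id)
    return TASKS[task_id]["result"], seen


final_result, progress_seen = asyncio.run(main())
print(final_result, progress_seen)
```

Because the poller only needs the task ID, it could be a different connection from the one that submitted the work, which is the property a persistent queue turns into disconnect tolerance.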

Data sources: PrefectHQ/fastmcp examples/tasks/server.py + examples/task_elicitation.py (verified 2026-06-09); HN "FastMCP 3.0 Is GA" thread (2026-02-18, objectID 47068067).


Hidden Use #5: Server Composition — mount and import_server Stitch Servers Together

What most people do: copy-paste tools from one server into another, then update both files when the tool signature changes. They end up with five "kitchen sink" servers that all do the same thing.

The hidden trick: FastMCP servers are first-class values. mount(prefix, subserver) exposes every tool, resource, and prompt of subserver under a namespace prefix, and import_server(subserver) merges them into the parent without namespacing. Both work across transports: you can mount a local in-memory server under a stdio parent, or a remote HTTP server under a streamable-http parent.

import asyncio
from fastmcp import FastMCP, Client

# Subserver A — git tools
git_server = FastMCP("Git Tools")

@git_server.tool
def git_status(repo_path: str) -> str:
    """Return `git status --short` for a repo."""
    import subprocess
    return subprocess.check_output(
        ["git", "-C", repo_path, "status", "--short"], text=True
    )

# Subserver B — file tools
file_server = FastMCP("File Tools")

@file_server.tool
def read_file(path: str) -> str:
    """Read a UTF-8 text file."""
    with open(path, encoding="utf-8") as f:
        return f.read()

# Parent server composes both via mount
main = FastMCP("Main")
main.mount("git", git_server)
main.mount("fs", file_server)

# The combined server exposes:
#   - git.git_status(repo_path)
#   - fs.read_file(path)
# in one tool catalog, with one auth boundary, one deployment target.
if __name__ == "__main__":
    asyncio.run(main.run_async())

The result: three small, focused, independently testable servers collapse into one tool catalog with namespaced names. Re-deploying a sub-tool only requires restarting the sub-server; the parent picks up the change without a code edit (when mounted over HTTP). The same trick composes a third-party hosted server (e.g., the Notion MCP server) into your own tool set without forking it.
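Why namespacing matters is easy to show with plain dictionaries standing in for tool catalogs. The sketch below does not use FastMCP: mount and import_tools are hypothetical helpers that mimic the two composition styles, and the joining character (sep) is an assumption, so list the tools on your own server to see the exact names your FastMCP version emits.

```python
from typing import Callable

Registry = dict[str, Callable[..., str]]


def mount(parent: Registry, prefix: str, child: Registry, sep: str = "_") -> None:
    """Expose every tool in `child` through `parent` under a prefixed name."""
    for name, fn in child.items():
        full_name = f"{prefix}{sep}{name}"
        if full_name in parent:
            raise ValueError(f"tool name collision: {full_name}")
        parent[full_name] = fn


def import_tools(parent: Registry, child: Registry) -> None:
    """Merge `child` into `parent` without a prefix; collisions are errors."""
    for name, fn in child.items():
        if name in parent:
            raise ValueError(f"tool name collision: {name}")
        parent[name] = fn


# Two independent catalogs that happen to use the same tool name.
git_tools: Registry = {"status": lambda repo: f"clean: {repo}"}
file_tools: Registry = {"status": lambda path: f"exists: {path}"}

# Mounting with prefixes keeps both tools addressable.
main_tools: Registry = {}
mount(main_tools, "git", git_tools)
mount(main_tools, "fs", file_tools)

# Merging without prefixes fails on the second catalog.
flat: Registry = {}
import_tools(flat, git_tools)
try:
    import_tools(flat, file_tools)
    collided = False
except ValueError:
    collided = True

print(sorted(main_tools), collided)
```

Prefixes let sub-servers be written without any knowledge of each other, while the unprefixed merge is only safe when you control every tool name involved.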

Data sources: PrefectHQ/fastmcp examples/mount_example.py (verified 2026-06-09); FastMCP.mount() and FastMCP.import_server() documented in the official server composition guide.


Summary — 5 Hidden Uses for FastMCP in 2026

  1. Elicitation: await ctx.elicit(prompt, response_type=Dataclass) lets a tool ask the user a structured question and resume execution. No parallel ask_user plumbing.
  2. Image / Audio / File return types: return fastmcp.utilities.types.Image (or Audio / File) to ship multimodal content natively instead of base64-encoding into a string.
  3. CodeMode transform: collapse N tools into a search + execute pair backed by a pydantic-monty sandbox. Cuts round-trips and context-window pressure.
  4. Background tasks (task=True): long-running tools survive disconnects, report Progress, and work with ctx.elicit() mid-task. Powered by Docket.
  5. Server composition: mount(prefix, subserver) and import_server(subserver) stitch independently maintained servers into one tool catalog.

What hidden use of FastMCP are you running in production? Drop it in the comments — I will pull the best ones into a follow-up article.
