Gabriel Melendez

Posted on Sep 17

Building MCP Tools: A PDF Processing Server

#ai #mcp #pdf #python

Model Context Protocol (MCP) is an open standard for connecting AI models to external tools and services. In this post I'll give a high-level overview of how I built a PDF processing server with FastMCP, covering its architecture, its error handling, and the features that make it suitable for production use.

Available Tools at a Glance

Server & File Utilities

server_info(): Get the server's configuration and status.
list_temp_resources(): List files currently in the server's temporary directory.
upload_file(), upload_file_base64(), upload_file_url(): Upload files to the server from your local machine or a URL.
get_resource_base64(): Download a file from the server's temp directory.

Text & Metadata

get_pdf_info(): Quickly get page count, file size, and encryption status.
extract_text(): Extract the full text content from a PDF.
extract_text_by_page(): Extract text from specific pages or page ranges.
extract_metadata(): Read the PDF's metadata (author, title, creation date, etc.).

PDF Manipulation

merge_pdfs(): Combine several PDF files into a single document.
split_pdf(): Split a PDF into multiple smaller files based on page ranges.
rotate_pages(): Rotate specific pages within a PDF.

Conversion

pdf_to_images(): Convert specified PDF pages into image files (PNG, JPEG).
images_to_pdf(): Create a new PDF from a list of image files.

You can find the codebase in the GitHub Repo 📁 MCP PDF Server

Our Case Study: Tracing the "extract_text" Tool

We'll explore 'extract_text'; all other tools follow the same workflow and are available in the repo if you'd like to check them out.

Pattern

By separating the logic into "Service" -> "Tool" -> "Registration", we keep the code clean, testable, and easy to extend. You can add your own tool by following this exact pattern.

Step 1: The Core Logic - the "Service"

Before we think about servers, tools, or protocols, we need a simple, reliable Python function that performs our core task. This is the "Service Layer", the engine of the server.

File: src/fastmcp_pdf_server/services/pdf_processor.py

Our first step is to write a function that takes a file path and returns the text. We use the "pdfplumber" library for this. Note that the function returns a "TextExtractionResult" dataclass, which gives every caller the same predictable data structure.

from __future__ import annotations

from dataclasses import dataclass
from typing import List

import pdfplumber

from ..utils.validators import validate_pdf

# A dataclass provides a structured, predictable return type for our service.
# It's like a lightweight, self-documenting class.
@dataclass
class TextExtractionResult:
    text: str
    page_count: int
    char_count: int


def extract_text(file_path: str, encoding: str = "utf-8") -> TextExtractionResult:
    # Note: 'encoding' is accepted but not used anywhere in this function.
    # First, run the file through a validator to ensure it exists, is a PDF,
    # and is within the allowed size limits. This fails early if the input is bad.
    pdf_path = validate_pdf(file_path)

    # Use pdfplumber to robustly open and process the PDF.
    with pdfplumber.open(str(pdf_path)) as pdf:
        texts: List[str] = []
        for page in pdf.pages:
            # Extract text, defaulting to an empty string if a page has no text.
            texts.append(page.extract_text() or "")

        # Join the text from all pages into a single string.
        text = "\n".join(texts)

    # Return an instance of our dataclass, ensuring the contract is met.
    return TextExtractionResult(text=text, page_count=len(texts), char_count=len(text))

This function is pure Python. It knows nothing about FastMCP. It could be unit-tested with "pytest" or used in a completely different application. This separation is the foundation of a maintainable system. With the service logic in place, we move on to the MCP "Tool".
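To illustrate that testability, here is a minimal sketch of a pytest-style test. It assumes the page loop is mirrored in a hypothetical helper, `collect_text`, and it uses a stub class, `FakePage`, in place of real pdfplumber pages, so no PDF file is needed. Neither name exists in the repo; only the `TextExtractionResult` fields come from the service above.

```python
from dataclasses import dataclass
from typing import Iterable, Optional


@dataclass
class TextExtractionResult:
    text: str
    page_count: int
    char_count: int


class FakePage:
    """Stub standing in for a pdfplumber page (hypothetical test double)."""

    def __init__(self, text: Optional[str]) -> None:
        self._text = text

    def extract_text(self) -> Optional[str]:
        return self._text


def collect_text(pages: Iterable[FakePage]) -> TextExtractionResult:
    # Same logic as the service loop: pages without text become "",
    # and all pages are joined with newlines.
    texts = [page.extract_text() or "" for page in pages]
    text = "\n".join(texts)
    return TextExtractionResult(text=text, page_count=len(texts), char_count=len(text))


def test_collect_text_handles_empty_pages() -> None:
    result = collect_text([FakePage("Hello"), FakePage(None), FakePage("World")])
    assert result.text == "Hello\n\nWorld"
    assert result.page_count == 3
    assert result.char_count == 12
```

Because the helper only needs objects with an `extract_text()` method, the same test style would also work against the real service if you supply a small sample PDF.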

Step 2: The Bridge - The "Tool"

Now we need to expose our service function to the outside world as an MCP Tool. This "Tool Layer" acts as a bridge. It handles the messy reality of a tool call and translates it into a clean call to our service.

File: src/fastmcp_pdf_server/tools/text_extraction.py

This is the most critical piece of the puzzle. It will handle the tool call, resolve the file, call the service, and format the response.

# Inside src/fastmcp_pdf_server/tools/text_extraction.py

from __future__ import annotations

import time
import uuid
from typing import Any

from fastmcp import FastMCP  # type: ignore

from ..services import pdf_processor
from ..services.file_manager import resolve_to_path
from ..utils.logger import get_logger

logger = get_logger(__name__)


# The 'register' function is a convention to group tool registrations.
# The main app will call this function, passing itself as an argument.
def register(app: FastMCP) -> None:
    # The @app.tool() decorator is what officially registers this function as an MCP tool.
    @app.tool()
    async def extract_text(file: Any, encoding: str | None = "utf-8") -> dict:
        """Extract all text from a PDF.

        Accepts:
        - Full path string
        - Short filename previously written to temp storage
        - Bytes / file-like / dict with base64 (will be saved to temp)
        """
        # 1. Generate a unique ID for this specific operation. This is crucial for
        #    tracing a single request through logs.
        op_id = uuid.uuid4().hex
        start = time.perf_counter()

        try:
            # 2. Resolve the flexible 'file' input (which could be a path, filename, or
            #    base64 object) into a concrete, validated absolute file path.
            resolved = resolve_to_path(file, filename_hint="uploaded.pdf")

            # 3. Call the clean, testable service function with the resolved path.
            #    This is where the actual PDF processing happens.
            res = pdf_processor.extract_text(str(resolved), encoding or "utf-8")

            # 4. The service returns a dataclass. We now format this into the final
            #    JSON-friendly dictionary for the client.
            duration_ms = int((time.perf_counter() - start) * 1000)
            return {
                "text": res.text,
                "page_count": res.page_count,
                "char_count": res.char_count,
                # The 'meta' block provides valuable operational data to the client.
                "meta": {
                    "operation_id": op_id,
                    "execution_ms": duration_ms,
                    "resolved_path": str(resolved),
                },
            }
        except Exception as e:  # noqa: BLE001
            # 5. This is the safety net. If any part of the process fails,
            #    log the error with its traceback and operation ID for debugging...
            logger.exception("extract_text error (op_id=%s): %s", op_id, e)
            hint = (
                "Provide a full path, upload the file first via 'upload_file', "
                "or pass bytes/base64. Example payload:\n"
                "{\n"
                "  \"name\": \"upload_file\",\n"
                "  \"arguments\": {\n"
                "    \"file\": { \"base64\": \"<...>\", \"filename\": \"my.pdf\" }\n"
                "  }\n"
                "}"
            )
            # ...and raise a simple ValueError. FastMCP will turn this into a
            # clean, structured error response for the LLM, preventing a crash.
            # 'from e' keeps the original exception attached as the cause.
            raise ValueError(f"extract_text failed: {e}. {hint}") from e

The tool is just a wrapper. It's a manager that coordinates other parts of the code. It handles messy inputs, calls the clean service logic, and packages the final response. Catching any exception and re-raising it as a 'ValueError' with an actionable hint is the key pattern here: the client gets a readable error instead of a raw stack trace.
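The "messy inputs" part is delegated to `resolve_to_path`, whose implementation is not shown in this post. The following is only a rough sketch of how such a resolver could work, based on the input kinds listed in the tool's docstring. The function name `resolve_input`, the `TEMP_DIR` location, and every detail of the logic are assumptions for illustration, not the repo's actual code.

```python
import base64
import tempfile
from pathlib import Path
from typing import Any

# Hypothetical temp directory; a real server would read this from its config.
TEMP_DIR = Path(tempfile.gettempdir()) / "mcp_pdf_sketch"


def resolve_input(file: Any, filename_hint: str = "uploaded.pdf") -> Path:
    """Turn a path, a short filename, raw bytes, or a base64 dict into a file path."""
    TEMP_DIR.mkdir(parents=True, exist_ok=True)

    if isinstance(file, dict) and "base64" in file:
        # Base64 payload: decode it, then fall through to the bytes branch.
        name = Path(file.get("filename") or filename_hint).name
        file, filename_hint = base64.b64decode(file["base64"]), name

    if isinstance(file, (bytes, bytearray)):
        # Raw bytes: save them to temp storage, keeping only the bare file name.
        target = TEMP_DIR / Path(filename_hint).name
        target.write_bytes(bytes(file))
        return target

    if isinstance(file, (str, Path)):
        candidate = Path(file)
        if candidate.is_file():
            return candidate.resolve()
        # Short filename: look it up in temp storage (name only, no traversal).
        in_temp = TEMP_DIR / candidate.name
        if in_temp.is_file():
            return in_temp.resolve()
        raise FileNotFoundError(f"No such file: {file}")

    raise TypeError(f"Unsupported file input: {type(file).__name__}")
```

Funnelling every input kind into a single `Path` is what lets the service layer stay simple: by the time the service runs, it only ever sees a path on disk.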

Step 3: The Final Wiring - The "Registration"

Our tool function is defined, but the server application doesn't know it exists yet. The final step is to connect, or register, our tool module with the main "FastMCP" application instance.

File: src/fastmcp_pdf_server/main.py

This file is the entry point of our entire server. Its job is to build the application object and register all the toolsets.

# Inside src/fastmcp_pdf_server/main.py

from __future__ import annotations

from typing import Any

from .config import settings
from .utils.logger import get_logger

logger = get_logger(__name__)


def build_app() -> Any:
    # This try/except block provides a user-friendly error if the user
    # forgot to install the dependencies from requirements.txt.
    try:
        from fastmcp import FastMCP  # type: ignore
    except ImportError as exc:  # pragma: no cover
        raise SystemExit(
            "fastmcp is not installed. Please install dependencies first."
        ) from exc

    # Initialize the main application, pulling name and version from config.
    app = FastMCP(settings.server_name, version=settings.server_version)

    # --- Tool Registration ---
    # Import the modules that contain our tool definitions.
    from .tools import utilities, text_extraction, pdf_manipulation, conversion, uploads
    from .services.file_manager import cleanup_expired

    # Call the 'register' function from each module to attach its tools to the app.
    # This modular approach keeps the main file clean.
    utilities.register(app)
    text_extraction.register(app)
    pdf_manipulation.register(app)
    conversion.register(app)
    uploads.register(app)

    # --- Startup Tasks ---
    # It's a good practice to run cleanup tasks on startup.
    # Here, we delete any old files from the temporary directory.
    try:
        cleanup_expired()
    except Exception as exc:  # noqa: BLE001
        logger.error("cleanup_expired at startup failed: %s", exc)

    return app

By importing the modules and calling a "register" function from each, the main file stays clean and acts as a high-level summary of the server's capabilities. Adding or removing a whole category of tools is as simple as adding or removing one line here.
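To see why the "register" convention works, here is a self-contained sketch that replaces FastMCP with a tiny stand-in class. `MiniApp` is not FastMCP and only imitates the decorator shape used above; `count_words` and the `word_count` tool are hypothetical examples of a new service and tool.

```python
from typing import Any, Callable, Dict


class MiniApp:
    """Tiny stand-in for the app object (NOT FastMCP), just enough to show the pattern."""

    def __init__(self) -> None:
        self.tools: Dict[str, Callable[..., Any]] = {}

    def tool(self) -> Callable[[Callable[..., Any]], Callable[..., Any]]:
        def decorator(func: Callable[..., Any]) -> Callable[..., Any]:
            # Record the function under its own name, then return it unchanged.
            self.tools[func.__name__] = func
            return func

        return decorator


# Service layer: plain logic with no knowledge of the app (hypothetical example).
def count_words(text: str) -> int:
    return len(text.split())


# Tool layer + registration: what a new tool module might contain.
def register(app: MiniApp) -> None:
    @app.tool()
    def word_count(text: str) -> dict:
        return {"word_count": count_words(text)}


app = MiniApp()
register(app)  # the one line the main file would gain
print(sorted(app.tools))                 # ['word_count']
print(app.tools["word_count"]("a b c"))  # {'word_count': 3}
```

The tool function is defined inside `register`, so it only exists once an app is passed in. That is what keeps each tool module importable without side effects.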

The Complete Picture

Now, let's trace a request from start to finish:

An LLM calls the extract_text tool.
The FastMCP app, built in main.py, routes the call to the extract_text async function inside text_extraction.py.
The tool function calls resolve_to_path to get a clean file path.
The tool function then calls the pdf_processor.extract_text service with that clean path.
The service does the heavy lifting and returns a TextExtractionResult dataclass with the fields text, page_count, and char_count.
The tool function receives this dataclass, copies its fields into a dictionary, adds the meta block, and returns the final, enriched dictionary.
FastMCP sends this final dictionary back to the LLM as a JSON response.
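The response-shaping part of this trace can be sketched in isolation. In the tool code from Step 2, the service result is a `TextExtractionResult` dataclass whose fields are copied into a dictionary by hand; the sketch below swaps in `dataclasses.asdict` for that copy. The helper name `build_response` and the example path are assumptions for illustration.

```python
import time
import uuid
from dataclasses import asdict, dataclass


@dataclass
class TextExtractionResult:
    text: str
    page_count: int
    char_count: int


def build_response(res: TextExtractionResult, resolved_path: str, start: float) -> dict:
    # asdict() turns the dataclass fields into a plain, JSON-friendly dict.
    payload = asdict(res)
    # The meta block carries operational data alongside the actual result.
    payload["meta"] = {
        "operation_id": uuid.uuid4().hex,
        "execution_ms": int((time.perf_counter() - start) * 1000),
        "resolved_path": resolved_path,
    }
    return payload


start = time.perf_counter()
res = TextExtractionResult(text="Hello", page_count=1, char_count=5)
response = build_response(res, "/tmp/example.pdf", start)
print(response["text"], response["page_count"], response["char_count"])  # Hello 1 5
print(sorted(response["meta"]))  # ['execution_ms', 'operation_id', 'resolved_path']
```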

The Final Result

Using Claude Desktop as the MCP client, we can test the "extract_text" tool by registering our server in the configuration file "claude_desktop_config.json":

{
  "mcpServers": {
    "pdf-processor-server": {
      "command": "D:\\Github Projects\\mcp_pdf_server\\.venv\\Scripts\\python.exe",
      "args": [
        "-m",
        "fastmcp_pdf_server"
      ],
      "env": {
        "TEMP_DIR": "D:\\Github Projects\\mcp_pdf_server\\temp_files"
      }
    }
  }
}

Once you have added the MCP server, it should look like this:

With this type of MCP client, you usually need to mention the MCP server in your prompt, in this case our "PDF Processor Server". Sometimes you must also give the full path of the file.
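Going back to the configuration above: the "env" block is how TEMP_DIR reaches the server process as an environment variable. The post does not show the server's config module, so the following is only a sketch of how such a setting might be read. The `Settings` class, the `load_settings` function, and the fallback directory are all assumptions, not the repo's actual code.

```python
import os
import tempfile
from dataclasses import dataclass
from pathlib import Path
from typing import Mapping


@dataclass(frozen=True)
class Settings:
    temp_dir: Path


def load_settings(env: Mapping[str, str] = os.environ) -> Settings:
    # Fall back to a folder under the system temp dir when TEMP_DIR is unset
    # or empty (an assumed default, chosen for this sketch).
    default = Path(tempfile.gettempdir()) / "mcp_pdf_server"
    return Settings(temp_dir=Path(env.get("TEMP_DIR") or default))


# Passing a dict instead of os.environ makes the behaviour easy to check.
settings = load_settings({"TEMP_DIR": "D:\\Github Projects\\mcp_pdf_server\\temp_files"})
print(settings.temp_dir)
```

Taking the environment as a parameter keeps the function testable without touching the real process environment.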

Where to Go From Here?

You've done it! You've set up a server, learned how to connect to it, commanded it to extract text, and even peeked under the hood to see how it all works.

What's next?

Explore Other Tools: Look at the README.md file. You'll find a whole list of other tools you can call, like merge_pdfs, split_pdf, and pdf_to_images.
Extend the Server: Try adding your own tool! Follow the pattern.
Automate Your Life: Think about your own workflows. Could you use this server to automatically extract text from invoices? Or to combine your weekly reports into a single PDF? The power is yours.

Happy coding! 🤖
