Dennis
Building an MCP Server for Screenshot Capture: Step by Step

A screenshot MCP server gives AI agents the ability to capture any web page as an image by calling a tool. You define a capture_screenshot tool with parameters like URL, format, and viewport size, wire it to a rendering backend (Puppeteer or a screenshot API), and connect it to any MCP client. This guide covers three approaches: building from scratch with the MCP SDK and Puppeteer, wrapping a screenshot API, and using SnapRender's published screenshot MCP server, which is already available on npm.

What You Are Building

The end result is an MCP server that exposes a capture_screenshot tool. When an AI agent (Claude, GPT via an MCP bridge, or any MCP client) needs to see a web page, it calls this tool with a URL and optional parameters. The server renders the page, returns the screenshot as a base64-encoded image, and the agent can analyze the visual content.

The tool schema looks like this:

| Parameter | Type | Required | Description |
| --- | --- | --- | --- |
| url | string | yes | The URL to capture |
| format | enum | no | png, jpeg, webp, or pdf |
| width | number | no | Viewport width in pixels |
| height | number | no | Viewport height in pixels |
| full_page | boolean | no | Capture the entire scrollable page |

This is a natural fit for MCP. The inputs are well-defined, the output is structured (an image), and AI agents regularly need visual information about web pages.
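On the wire, an MCP client invokes the tool with a JSON-RPC `tools/call` request. A sketch of what that request looks like for this schema (argument values are illustrative):

```json
{
  "jsonrpc": "2.0",
  "id": 1,
  "method": "tools/call",
  "params": {
    "name": "capture_screenshot",
    "arguments": {
      "url": "https://example.com",
      "format": "png",
      "full_page": true
    }
  }
}
```

You never construct this payload yourself; the MCP SDK and client handle the protocol framing.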

Approach 1: Build Your Own with Puppeteer

This approach gives you full control. You run Chromium on the same machine as the MCP server and use Puppeteer to render pages.

Project Setup

mkdir screenshot-mcp-server
cd screenshot-mcp-server
npm init -y
npm install @modelcontextprotocol/sdk zod puppeteer

Add "type": "module" to your package.json since the MCP SDK uses ES modules.
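After these steps, your package.json should look roughly like this (version numbers are illustrative; use whatever npm resolved):

```json
{
  "name": "screenshot-mcp-server",
  "version": "1.0.0",
  "type": "module",
  "dependencies": {
    "@modelcontextprotocol/sdk": "^1.0.0",
    "puppeteer": "^23.0.0",
    "zod": "^3.23.0"
  }
}
```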

The Complete Server

// server.js
import { McpServer } from "@modelcontextprotocol/sdk/server/mcp.js";
import { StdioServerTransport } from "@modelcontextprotocol/sdk/server/stdio.js";
import { z } from "zod";
import puppeteer from "puppeteer";

const server = new McpServer({
  name: "screenshot-server",
  version: "1.0.0",
});

// Reuse a single browser instance across calls
let browser = null;

async function getBrowser() {
  if (!browser || !browser.connected) {
    browser = await puppeteer.launch({
      headless: true,
      args: [
        "--no-sandbox",
        "--disable-setuid-sandbox",
        "--disable-dev-shm-usage",
        "--disable-gpu",
      ],
    });
  }
  return browser;
}

server.tool(
  "capture_screenshot",
  "Capture a screenshot of a web page. Returns the image as base64-encoded data.",
  {
    url: z.string().url().describe("The URL to capture"),
    format: z
      .enum(["png", "jpeg", "webp"])
      .optional()
      .describe("Image format (default: png)"),
    width: z
      .number()
      .int()
      .min(320)
      .max(3840)
      .optional()
      .describe("Viewport width in pixels (default: 1280)"),
    height: z
      .number()
      .int()
      .min(200)
      .max(2160)
      .optional()
      .describe("Viewport height in pixels (default: 720)"),
    full_page: z
      .boolean()
      .optional()
      .describe("Capture the full scrollable page (default: false)"),
  },
  async ({ url, format = "png", width = 1280, height = 720, full_page = false }) => {
    const instance = await getBrowser();
    const page = await instance.newPage();

    try {
      await page.setViewport({ width, height });
      await page.goto(url, {
        waitUntil: "networkidle2",
        timeout: 30000,
      });

      const screenshot = await page.screenshot({
        type: format,
        fullPage: full_page,
        encoding: "base64",
      });

      return {
        content: [
          {
            type: "image",
            data: screenshot,
            mimeType: `image/${format}`,
          },
        ],
      };
    } catch (error) {
      return {
        content: [
          {
            type: "text",
            text: `Screenshot failed: ${error.message}`,
          },
        ],
        isError: true,
      };
    } finally {
      await page.close();
    }
  }
);

// Graceful shutdown
process.on("SIGINT", async () => {
  if (browser) await browser.close();
  process.exit(0);
});

const transport = new StdioServerTransport();
await server.connect(transport);

This is a working screenshot MCP server. The key decisions:

Browser Instance Management

The getBrowser() function reuses a single Puppeteer browser instance across tool calls. Launching a new browser per screenshot takes 2 to 5 seconds. Reusing one brings that down to the page load time only. The browser.connected check handles cases where the browser process crashes between calls.

Each screenshot gets its own page (tab) via newPage(), and the finally block ensures it closes even if the screenshot fails. This prevents tab accumulation, which is one of the most common Puppeteer memory leak sources.

Error Handling

MCP tools can return errors by setting isError: true in the response. The client (and the AI model) sees the error message and can decide what to do. Common failures include:

  • Navigation timeouts (page takes too long to load)
  • Invalid URLs
  • Pages that require authentication
  • Chromium crashes under memory pressure

The try/catch/finally pattern here handles all of these gracefully.
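One pragmatic extension is retrying transient failures before surfacing an error to the model. A small hypothetical helper (not part of the server above) that retries timeout errors once and rethrows everything else:

```javascript
// Hypothetical helper: retry an async operation when the error looks like
// a transient navigation timeout; rethrow any other error immediately.
async function withRetry(fn, retries = 1) {
  try {
    return await fn();
  } catch (error) {
    if (retries > 0 && /timeout/i.test(error.message)) {
      return withRetry(fn, retries - 1);
    }
    throw error;
  }
}
```

Inside the tool handler, you would wrap the goto-and-screenshot sequence in `withRetry(() => ...)` so a single slow load does not immediately fail the tool call.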

Input Validation

Zod schemas handle input validation before your handler runs. If the AI sends width: -100, the MCP SDK rejects the call with a validation error before Puppeteer ever gets involved. This is one of MCP's advantages over raw function calling, where you typically validate inputs yourself.

Testing with Claude Desktop

Configure the server in Claude Desktop's MCP settings file (~/Library/Application Support/Claude/claude_desktop_config.json on macOS):

{
  "mcpServers": {
    "screenshot": {
      "command": "node",
      "args": ["/absolute/path/to/server.js"]
    }
  }
}

Restart Claude Desktop. You should see the screenshot tool in the tools panel. Ask Claude to "take a screenshot of https://example.com" and it will call your MCP server.

Limitations of the Puppeteer Approach

Running Puppeteer locally works for development and personal use. In production, the problems pile up:

  • Resource consumption: Each Chromium instance uses 100 to 300 MB of RAM. Concurrent screenshots multiply this.
  • Chrome installation: The MCP server host needs Chromium installed. On some systems (minimal Docker containers, serverless environments), this is nontrivial.
  • Font rendering: Different operating systems render fonts differently. A screenshot taken on macOS looks different from one taken on Linux.
  • Maintenance: Chromium updates, security patches, and Puppeteer version compatibility are ongoing work.
  • Cookie banners and popups: You need custom logic to dismiss consent dialogs, which varies by site.

Approach 2: Wrap a Screenshot API

The alternative is to skip local Chrome entirely and call a screenshot API from your MCP server. The API provider handles browser infrastructure, and your MCP server is just a thin wrapper.

Here is the same MCP server using SnapRender's API instead of Puppeteer:

// server-api.js
import { McpServer } from "@modelcontextprotocol/sdk/server/mcp.js";
import { StdioServerTransport } from "@modelcontextprotocol/sdk/server/stdio.js";
import { z } from "zod";

const API_KEY = process.env.SNAPRENDER_API_KEY;
if (!API_KEY) {
  // stderr is safe here: stdout is reserved for the MCP protocol
  console.error("SNAPRENDER_API_KEY environment variable is not set");
  process.exit(1);
}
const BASE_URL = "https://app.snap-render.com/v1/screenshot";

const server = new McpServer({
  name: "screenshot-server",
  version: "1.0.0",
});

server.tool(
  "capture_screenshot",
  "Capture a screenshot of a web page using SnapRender API.",
  {
    url: z.string().url().describe("The URL to capture"),
    format: z
      .enum(["png", "jpeg", "webp", "pdf"])
      .optional()
      .describe("Output format (default: png)"),
    width: z
      .number()
      .int()
      .min(320)
      .max(3840)
      .optional()
      .describe("Viewport width in pixels (default: 1280)"),
    height: z
      .number()
      .int()
      .min(200)
      .max(2160)
      .optional()
      .describe("Viewport height in pixels (default: 720)"),
    full_page: z
      .boolean()
      .optional()
      .describe("Capture the full scrollable page (default: false)"),
    block_ads: z
      .boolean()
      .optional()
      .describe("Block ads and trackers (default: false)"),
    dark_mode: z
      .boolean()
      .optional()
      .describe("Emulate dark mode (default: false)"),
  },
  async ({
    url,
    format = "png",
    width = 1280,
    height = 720,
    full_page = false,
    block_ads = false,
    dark_mode = false,
  }) => {
    const params = new URLSearchParams({
      url,
      format,
      width: width.toString(),
      height: height.toString(),
      full_page: full_page.toString(),
      block_ads: block_ads.toString(),
      dark_mode: dark_mode.toString(),
    });

    try {
      const response = await fetch(`${BASE_URL}?${params}`, {
        headers: { "X-API-Key": API_KEY },
      });

      if (!response.ok) {
        const error = await response.text();
        return {
          content: [{ type: "text", text: `API error ${response.status}: ${error}` }],
          isError: true,
        };
      }

      const buffer = await response.arrayBuffer();
      const base64 = Buffer.from(buffer).toString("base64");
      const mimeType =
        format === "pdf" ? "application/pdf" : `image/${format}`;

      return {
        content: [
          {
            type: "image",
            data: base64,
            mimeType,
          },
        ],
      };
    } catch (error) {
      return {
        content: [{ type: "text", text: `Request failed: ${error.message}` }],
        isError: true,
      };
    }
  }
);

const transport = new StdioServerTransport();
await server.connect(transport);

Notice what is missing compared to the Puppeteer version: no browser management, no Chrome launch flags, no page lifecycle handling. The server is a pure HTTP client. It runs anywhere Node.js runs, with zero system dependencies beyond Node.js itself.

The API approach also gives you features that are painful to implement with raw Puppeteer: ad blocking, cookie banner removal, device emulation, and dark mode. These come from the API, not from custom Puppeteer scripts.

Approach 3: Use SnapRender's Published MCP Server

If you do not want to write any server code at all, SnapRender publishes a ready-made screenshot MCP server on npm. It exposes the full SnapRender API as MCP tools and works with Claude Desktop and any other MCP client.

Installation

npm install -g snaprender-mcp

The server is also listed on Smithery, PulseMCP, mcp.so, and mcpservers.org, so you can find it through any major MCP registry.

Configuration in Claude Desktop

Add this to your Claude Desktop MCP configuration:

{
  "mcpServers": {
    "snaprender": {
      "command": "npx",
      "args": ["-y", "snaprender-mcp"],
      "env": {
        "SNAPRENDER_API_KEY": "your-api-key-here"
      }
    }
  }
}

Restart Claude Desktop, and screenshot tools appear automatically. Ask Claude to capture any URL and it will use the SnapRender screenshot MCP server behind the scenes.

What Tools It Exposes

The published server exposes all of SnapRender's capabilities as MCP tools: viewport configuration, device emulation, format selection (PNG, JPEG, WebP, PDF), full-page capture, ad blocking, dark mode, selector hiding, and cache control. Each feature is a parameter on the tool, with proper descriptions and validation.

Architecture Decision: Puppeteer vs API

The choice between running your own Chrome and calling an API comes down to a few factors:

| Factor | Puppeteer (DIY) | Screenshot API |
| --- | --- | --- |
| Setup complexity | High (Chrome + dependencies) | Low (just an API key) |
| System requirements | Chrome, 300 MB+ RAM per instance | Node.js only |
| Concurrent screenshots | Limited by host resources | Limited by API plan |
| Response time | 3-10 s per capture | 2-5 s fresh, under 200 ms cached |
| Features | Whatever you build | Ad blocking, dark mode, device emulation included |
| Cost | Compute costs | API subscription ($0/mo for 500 captures) |
| Maintenance | Chrome updates, crash recovery, memory management | None |

For local development and experimentation, Puppeteer is fine. You learn how screenshot capture works, you control everything, and you pay nothing beyond your electricity.

For production use, especially when the MCP server runs on user machines (as Claude Desktop extensions do), the API approach wins. You cannot assume users have Chrome installed, have enough RAM for Puppeteer, or want to manage browser processes. An API call works anywhere.

Adding Rate Limiting

Whichever approach you choose, rate limiting protects your server from runaway agents. An AI model in a loop can generate hundreds of tool calls per minute if nothing stops it.

const RATE_LIMIT = 10; // max calls per minute
const callTimestamps = [];

function checkRateLimit() {
  const now = Date.now();
  const oneMinuteAgo = now - 60000;

  // Remove timestamps older than 1 minute
  while (callTimestamps.length > 0 && callTimestamps[0] < oneMinuteAgo) {
    callTimestamps.shift();
  }

  if (callTimestamps.length >= RATE_LIMIT) {
    return false;
  }

  callTimestamps.push(now);
  return true;
}

// Inside your tool handler, before taking the screenshot:
if (!checkRateLimit()) {
  return {
    content: [{ type: "text", text: "Rate limit exceeded. Max 10 screenshots per minute." }],
    isError: true,
  };
}

This is a simple sliding-window rate limiter. For the API approach, the API provider also enforces rate limits, so you get two layers of protection.
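To sanity-check the window logic, you can drive the same limiter with injected timestamps. This sketch reproduces the function with a `now` parameter added purely for deterministic testing:

```javascript
const RATE_LIMIT = 10; // max calls per minute
const callTimestamps = [];

// Same sliding window as above, with an injectable clock for testing.
function checkRateLimit(now = Date.now()) {
  const oneMinuteAgo = now - 60000;
  while (callTimestamps.length > 0 && callTimestamps[0] < oneMinuteAgo) {
    callTimestamps.shift();
  }
  if (callTimestamps.length >= RATE_LIMIT) return false;
  callTimestamps.push(now);
  return true;
}

// 11 calls within the same minute: the first 10 pass, the 11th is rejected.
const results = Array.from({ length: 11 }, (_, i) => checkRateLimit(1000 + i));
console.log(results.filter(Boolean).length); // 10
console.log(results[10]); // false

// A call one minute later evicts the stale timestamps and passes again.
console.log(checkRateLimit(1000 + 60001)); // true
```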

Adding Tool Discovery Metadata

Good tool descriptions make or break the AI's ability to use your MCP server effectively. The model reads the tool description and parameter descriptions to decide when and how to call the tool. Vague descriptions lead to misuse.

Compare these two tool descriptions:

// Bad: too vague
"Take a screenshot"

// Good: specific about capabilities and output
"Capture a screenshot of a web page at a given URL.
Returns the image as base64-encoded data.
Supports PNG, JPEG, WebP, and PDF formats.
Can capture full scrollable pages up to 32,768px tall."

The detailed description tells the model exactly what the tool can do, so it knows when to call it (user wants to see a web page) and when not to (user wants to edit a local file).

Parameter descriptions matter too. Instead of "The width", write "Viewport width in pixels. Common values: 1280 (desktop), 768 (tablet), 375 (mobile)". The model uses these descriptions to pick sensible defaults even when the user does not specify a size.

Testing Your Server

Beyond Claude Desktop, you can test MCP servers programmatically using the MCP SDK's client:

import { Client } from "@modelcontextprotocol/sdk/client/index.js";
import { StdioClientTransport } from "@modelcontextprotocol/sdk/client/stdio.js";

const transport = new StdioClientTransport({
  command: "node",
  args: ["server.js"],
});

const client = new Client({ name: "test-client", version: "1.0.0" });
await client.connect(transport);

// List available tools
const tools = await client.listTools();
console.log("Available tools:", tools);

// Call the screenshot tool
// Call the screenshot tool (callTool takes a { name, arguments } object)
const result = await client.callTool({
  name: "capture_screenshot",
  arguments: {
    url: "https://example.com",
    format: "png",
    width: 1280,
  },
});

console.log("Result type:", result.content[0].type);
console.log("Data length:", result.content[0].data.length);

await client.close();

This is useful for automated testing. You can verify that your server starts correctly, responds to tool discovery, validates inputs, handles errors, and returns properly formatted screenshots.

Where to Go From Here

Building a screenshot MCP server is a good first MCP project because the tool interface is simple (URL in, image out) and the result is immediately useful. Once you have it working, consider extending it:

  • Add a capture_element tool that takes a CSS selector and captures just that element
  • Add a compare_screenshots tool that takes two URLs and highlights visual differences
  • Add a capture_sequence tool that takes a list of URLs and returns all screenshots in one call
  • Expose device presets (iPhone, Pixel, iPad) as an enum parameter

Each of these is a natural extension that makes the screenshot MCP server more useful to AI agents. The pattern is always the same: define the tool schema, implement the handler, and the MCP protocol handles everything else.
