A screenshot MCP server gives AI agents the ability to capture any web page as an image by calling a tool. You define a capture_screenshot tool with parameters like URL, format, and viewport size, wire it to a rendering backend (Puppeteer or a screenshot API), and connect it to any MCP client. This guide covers both approaches: building from scratch with the MCP SDK and Puppeteer, and using SnapRender's published screenshot MCP server that is already available on npm.
What You Are Building
The end result is an MCP server that exposes a capture_screenshot tool. When an AI agent (Claude, GPT via an MCP bridge, or any MCP client) needs to see a web page, it calls this tool with a URL and optional parameters. The server renders the page, returns the screenshot as a base64-encoded image, and the agent can analyze the visual content.
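On the wire, such a call arrives as a JSON-RPC tools/call request, per the MCP specification. An illustrative payload (the argument values here are examples, not required defaults):

```json
{
  "jsonrpc": "2.0",
  "id": 1,
  "method": "tools/call",
  "params": {
    "name": "capture_screenshot",
    "arguments": {
      "url": "https://example.com",
      "format": "png",
      "width": 1280,
      "full_page": true
    }
  }
}
```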
The tool schema looks like this:
| Parameter | Type | Required | Description |
|---|---|---|---|
| url | string | yes | The URL to capture |
| format | enum | no | png, jpeg, webp, or pdf |
| width | number | no | Viewport width in pixels |
| height | number | no | Viewport height in pixels |
| full_page | boolean | no | Capture the entire scrollable page |
This is a natural fit for MCP. The inputs are well-defined, the output is structured (an image), and AI agents regularly need visual information about web pages.
Approach 1: Build Your Own with Puppeteer
This approach gives you full control. You run Chromium on the same machine as the MCP server and use Puppeteer to render pages.
Project Setup
mkdir screenshot-mcp-server
cd screenshot-mcp-server
npm init -y
npm install @modelcontextprotocol/sdk zod puppeteer
Add "type": "module" to your package.json since the MCP SDK uses ES modules.
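After that edit, the relevant parts of package.json look roughly like this (the version ranges are illustrative; use whatever npm installed):

```json
{
  "name": "screenshot-mcp-server",
  "type": "module",
  "dependencies": {
    "@modelcontextprotocol/sdk": "^1.0.0",
    "zod": "^3.23.0",
    "puppeteer": "^23.0.0"
  }
}
```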
The Complete Server
// server.js
import { McpServer } from "@modelcontextprotocol/sdk/server/mcp.js";
import { StdioServerTransport } from "@modelcontextprotocol/sdk/server/stdio.js";
import { z } from "zod";
import puppeteer from "puppeteer";
const server = new McpServer({
name: "screenshot-server",
version: "1.0.0",
});
// Reuse a single browser instance across calls
let browser = null;
async function getBrowser() {
if (!browser || !browser.connected) {
browser = await puppeteer.launch({
headless: true,
args: [
"--no-sandbox",
"--disable-setuid-sandbox",
"--disable-dev-shm-usage",
"--disable-gpu",
],
});
}
return browser;
}
server.tool(
"capture_screenshot",
"Capture a screenshot of a web page. Returns the image as base64-encoded data.",
{
url: z.string().url().describe("The URL to capture"),
format: z
.enum(["png", "jpeg", "webp"])
.optional()
.describe("Image format (default: png)"),
width: z
.number()
.int()
.min(320)
.max(3840)
.optional()
.describe("Viewport width in pixels (default: 1280)"),
height: z
.number()
.int()
.min(200)
.max(2160)
.optional()
.describe("Viewport height in pixels (default: 720)"),
full_page: z
.boolean()
.optional()
.describe("Capture the full scrollable page (default: false)"),
},
async ({ url, format = "png", width = 1280, height = 720, full_page = false }) => {
const instance = await getBrowser();
const page = await instance.newPage();
try {
await page.setViewport({ width, height });
await page.goto(url, {
waitUntil: "networkidle2",
timeout: 30000,
});
const screenshot = await page.screenshot({
type: format, // Puppeteer accepts "png", "jpeg", and "webp" directly
fullPage: full_page,
encoding: "base64",
});
return {
content: [
{
type: "image",
data: screenshot,
mimeType: `image/${format}`,
},
],
};
} catch (error) {
return {
content: [
{
type: "text",
text: `Screenshot failed: ${error.message}`,
},
],
isError: true,
};
} finally {
await page.close();
}
}
);
// Graceful shutdown
process.on("SIGINT", async () => {
if (browser) await browser.close();
process.exit(0);
});
const transport = new StdioServerTransport();
await server.connect(transport);
This is a working screenshot MCP server. The key decisions:
Browser Instance Management
The getBrowser() function reuses a single Puppeteer browser instance across tool calls. Launching a new browser per screenshot takes 2 to 5 seconds. Reusing one brings that down to the page load time only. The browser.connected check handles cases where the browser process crashes between calls.
Each screenshot gets its own page (tab) via newPage(), and the finally block ensures it closes even if the screenshot fails. This prevents tab accumulation, which is one of the most common Puppeteer memory leak sources.
Error Handling
MCP tools can return errors by setting isError: true in the response. The client (and the AI model) sees the error message and can decide what to do. Common failures include:
- Navigation timeouts (page takes too long to load)
- Invalid URLs
- Pages that require authentication
- Chromium crashes under memory pressure
The try/catch/finally pattern here handles all of these gracefully.
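Navigation timeouts in particular are often transient, so it can be worth retrying before surfacing an error to the model. Here is a generic retry wrapper you could layer around the page.goto/page.screenshot calls; the withRetry helper is a sketch, not part of the server above:

```javascript
// Retry an async operation with a fixed delay between attempts.
// Rethrows the last error once all attempts are exhausted.
async function withRetry(fn, { attempts = 3, delayMs = 1000 } = {}) {
  let lastError;
  for (let i = 0; i < attempts; i++) {
    try {
      return await fn();
    } catch (error) {
      lastError = error;
      if (i < attempts - 1) {
        await new Promise((resolve) => setTimeout(resolve, delayMs));
      }
    }
  }
  throw lastError;
}

// Usage inside the tool handler, e.g.:
// await withRetry(() => page.goto(url, { waitUntil: "networkidle2" }));
```

Keep the attempt count low: an AI agent is already waiting on the response, and three 30-second timeouts in a row is a poor experience.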
Input Validation
Zod schemas handle input validation before your handler runs. If the AI sends width: -100, the MCP SDK rejects the call with a validation error before Puppeteer ever gets involved. This is one of MCP's advantages over raw function calling, where you typically validate inputs yourself.
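For comparison, here is roughly what that validation looks like when written by hand, as you would with raw function calling. This sketch mirrors the Zod schema's constraints and is not part of the server above:

```javascript
// Manual input validation mirroring the Zod schema's constraints.
// Returns an array of error messages; an empty array means valid input.
function validateScreenshotInput({ url, format, width, height }) {
  const errors = [];
  try {
    new URL(url); // throws on malformed URLs
  } catch {
    errors.push("url must be a valid URL");
  }
  if (format !== undefined && !["png", "jpeg", "webp"].includes(format)) {
    errors.push("format must be png, jpeg, or webp");
  }
  if (width !== undefined && (!Number.isInteger(width) || width < 320 || width > 3840)) {
    errors.push("width must be an integer between 320 and 3840");
  }
  if (height !== undefined && (!Number.isInteger(height) || height < 200 || height > 2160)) {
    errors.push("height must be an integer between 200 and 2160");
  }
  return errors;
}
```

With the Zod schema, all of this boilerplate disappears and the SDK produces structured validation errors for free.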
Testing with Claude Desktop
Configure the server in Claude Desktop's MCP settings file (~/Library/Application Support/Claude/claude_desktop_config.json on macOS):
{
"mcpServers": {
"screenshot": {
"command": "node",
"args": ["/absolute/path/to/server.js"]
}
}
}
Restart Claude Desktop. You should see the screenshot tool in the tools panel. Ask Claude to "take a screenshot of https://example.com" and it will call your MCP server.
Limitations of the Puppeteer Approach
Running Puppeteer locally works for development and personal use. In production, the problems pile up:
- Resource consumption: Each Chromium instance uses 100 to 300 MB of RAM. Concurrent screenshots multiply this.
- Chrome installation: The MCP server host needs Chromium installed. On some systems (minimal Docker containers, serverless environments), this is nontrivial.
- Font rendering: Different operating systems render fonts differently. A screenshot taken on macOS looks different from one taken on Linux.
- Maintenance: Chromium updates, security patches, and Puppeteer version compatibility are ongoing work.
- Cookie banners and popups: You need custom logic to dismiss consent dialogs, which varies by site.
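The resource problem can at least be softened by capping how many captures run at once. A minimal promise-based semaphore (a generic sketch, not tied to any library) could wrap the screenshot work:

```javascript
// A counting semaphore: at most `limit` tasks run concurrently;
// the rest queue up and start as running tasks finish.
function createSemaphore(limit) {
  let active = 0;
  const queue = [];
  const next = () => {
    if (active < limit && queue.length > 0) {
      active++;
      queue.shift()(); // release the next queued task
    }
  };
  return async function run(task) {
    await new Promise((resolve) => {
      queue.push(resolve);
      next();
    });
    try {
      return await task();
    } finally {
      active--;
      next();
    }
  };
}

// Usage: const limited = createSemaphore(2);
// const shot = await limited(() => capturePage(url));
```

Capping concurrency at two or three keeps Chromium's memory footprint bounded even when an agent fires off many tool calls at once.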
Approach 2: Wrap a Screenshot API
The alternative is to skip local Chrome entirely and call a screenshot API from your MCP server. The API provider handles browser infrastructure, and your MCP server is just a thin wrapper.
Here is the same MCP server using SnapRender's API instead of Puppeteer:
// server-api.js
import { McpServer } from "@modelcontextprotocol/sdk/server/mcp.js";
import { StdioServerTransport } from "@modelcontextprotocol/sdk/server/stdio.js";
import { z } from "zod";
const API_KEY = process.env.SNAPRENDER_API_KEY;
const BASE_URL = "https://app.snap-render.com/v1/screenshot";
const server = new McpServer({
name: "screenshot-server",
version: "1.0.0",
});
server.tool(
"capture_screenshot",
"Capture a screenshot of a web page using SnapRender API.",
{
url: z.string().url().describe("The URL to capture"),
format: z
.enum(["png", "jpeg", "webp", "pdf"])
.optional()
.describe("Output format (default: png)"),
width: z
.number()
.int()
.min(320)
.max(3840)
.optional()
.describe("Viewport width in pixels (default: 1280)"),
height: z
.number()
.int()
.min(200)
.max(2160)
.optional()
.describe("Viewport height in pixels (default: 720)"),
full_page: z
.boolean()
.optional()
.describe("Capture the full scrollable page (default: false)"),
block_ads: z
.boolean()
.optional()
.describe("Block ads and trackers (default: false)"),
dark_mode: z
.boolean()
.optional()
.describe("Emulate dark mode (default: false)"),
},
async ({
url,
format = "png",
width = 1280,
height = 720,
full_page = false,
block_ads = false,
dark_mode = false,
}) => {
const params = new URLSearchParams({
url,
format,
width: width.toString(),
height: height.toString(),
full_page: full_page.toString(),
block_ads: block_ads.toString(),
dark_mode: dark_mode.toString(),
});
try {
const response = await fetch(`${BASE_URL}?${params}`, {
headers: { "X-API-Key": API_KEY },
});
if (!response.ok) {
const error = await response.text();
return {
content: [{ type: "text", text: `API error ${response.status}: ${error}` }],
isError: true,
};
}
const buffer = await response.arrayBuffer();
const base64 = Buffer.from(buffer).toString("base64");
const mimeType =
format === "pdf" ? "application/pdf" : `image/${format}`;
return {
content: [
{
type: "image",
data: base64,
mimeType,
},
],
};
} catch (error) {
return {
content: [{ type: "text", text: `Request failed: ${error.message}` }],
isError: true,
};
}
}
);
const transport = new StdioServerTransport();
await server.connect(transport);
Notice what is missing compared to the Puppeteer version: no browser management, no Chrome launch flags, no page lifecycle handling. The server is a pure HTTP client. It runs anywhere Node.js runs, with zero system dependencies beyond Node.js itself.
The API approach also gives you features that are painful to implement with raw Puppeteer: ad blocking, cookie banner removal, device emulation, and dark mode. These come from the API, not from custom Puppeteer scripts.
Approach 3: Use SnapRender's Published MCP Server
If you do not want to write any server code at all, SnapRender publishes a ready-made screenshot MCP server on npm. It exposes the full SnapRender API as MCP tools and works with Claude Desktop and any other MCP client.
Installation
npm install -g snaprender-mcp
The server is also listed on Smithery, PulseMCP, mcp.so, and mcpservers.org, so you can find it through any major MCP registry.
Configuration in Claude Desktop
Add this to your Claude Desktop MCP configuration:
{
"mcpServers": {
"snaprender": {
"command": "npx",
"args": ["-y", "snaprender-mcp"],
"env": {
"SNAPRENDER_API_KEY": "your-api-key-here"
}
}
}
}
Restart Claude Desktop, and screenshot tools appear automatically. Ask Claude to capture any URL and it will use the SnapRender screenshot MCP server behind the scenes.
What Tools It Exposes
The published server exposes all of SnapRender's capabilities as MCP tools: viewport configuration, device emulation, format selection (PNG, JPEG, WebP, PDF), full-page capture, ad blocking, dark mode, selector hiding, and cache control. Each feature is a parameter on the tool, with proper descriptions and validation.
Architecture Decision: Puppeteer vs API
The choice between running your own Chrome and calling an API comes down to a few factors:
| Factor | Puppeteer (DIY) | Screenshot API |
|---|---|---|
| Setup complexity | High (Chrome + dependencies) | Low (just an API key) |
| System requirements | Chrome, 300MB+ RAM per instance | Node.js only |
| Concurrent screenshots | Limited by host resources | Limited by API plan |
| Response time | 3-10s per capture | 2-5s fresh, under 200ms cached |
| Features | Whatever you build | Ad blocking, dark mode, device emulation included |
| Cost | Compute costs | API subscription ($0/mo for 500 captures) |
| Maintenance | Chrome updates, crash recovery, memory management | None |
For local development and experimentation, Puppeteer is fine. You learn how screenshot capture works, you control everything, and you pay nothing beyond your electricity.
For production use, especially when the MCP server runs on user machines (as Claude Desktop extensions do), the API approach wins. You cannot assume users have Chrome installed, have enough RAM for Puppeteer, or want to manage browser processes. An API call works anywhere.
Adding Rate Limiting
Whichever approach you choose, rate limiting protects your server from runaway agents. An AI model in a loop can generate hundreds of tool calls per minute if nothing stops it.
const RATE_LIMIT = 10; // max calls per minute
const callTimestamps = [];
function checkRateLimit() {
const now = Date.now();
const oneMinuteAgo = now - 60000;
// Remove timestamps older than 1 minute
while (callTimestamps.length > 0 && callTimestamps[0] < oneMinuteAgo) {
callTimestamps.shift();
}
if (callTimestamps.length >= RATE_LIMIT) {
return false;
}
callTimestamps.push(now);
return true;
}
// Inside your tool handler, before taking the screenshot:
if (!checkRateLimit()) {
return {
content: [{ type: "text", text: "Rate limit exceeded. Max 10 screenshots per minute." }],
isError: true,
};
}
This is a simple sliding-window rate limiter. For the API approach, the API provider also enforces rate limits, so you get two layers of protection.
Adding Tool Discovery Metadata
Good tool descriptions make or break the AI's ability to use your MCP server effectively. The model reads the tool description and parameter descriptions to decide when and how to call the tool. Vague descriptions lead to misuse.
Compare these two tool descriptions:
// Bad: too vague
"Take a screenshot"
// Good: specific about capabilities and output
"Capture a screenshot of a web page at a given URL.
Returns the image as base64-encoded data.
Supports PNG, JPEG, WebP, and PDF formats.
Can capture full scrollable pages up to 32,768px tall."
The detailed description tells the model exactly what the tool can do, so it knows when to call it (user wants to see a web page) and when not to (user wants to edit a local file).
Parameter descriptions matter too. Instead of "The width", write "Viewport width in pixels. Common values: 1280 (desktop), 768 (tablet), 375 (mobile)". The model uses these descriptions to pick sensible defaults even when the user does not specify a size.
Testing Your Server
Beyond Claude Desktop, you can test MCP servers programmatically using the MCP SDK's client:
import { Client } from "@modelcontextprotocol/sdk/client/index.js";
import { StdioClientTransport } from "@modelcontextprotocol/sdk/client/stdio.js";
const transport = new StdioClientTransport({
command: "node",
args: ["server.js"],
});
const client = new Client({ name: "test-client", version: "1.0.0" });
await client.connect(transport);
// List available tools
const tools = await client.listTools();
console.log("Available tools:", tools);
// Call the screenshot tool
const result = await client.callTool({
  name: "capture_screenshot",
  arguments: {
    url: "https://example.com",
    format: "png",
    width: 1280,
  },
});
console.log("Result type:", result.content[0].type);
console.log("Data length:", result.content[0].data.length);
await client.close();
This is useful for automated testing. You can verify that your server starts correctly, responds to tool discovery, validates inputs, handles errors, and returns properly formatted screenshots.
Where to Go From Here
Building a screenshot MCP server is a good first MCP project because the tool interface is simple (URL in, image out) and the result is immediately useful. Once you have it working, consider extending it:
- Add a capture_element tool that takes a CSS selector and captures just that element
- Add a compare_screenshots tool that takes two URLs and highlights visual differences
- Add a capture_sequence tool that takes a list of URLs and returns all screenshots in one call
- Expose device presets (iPhone, Pixel, iPad) as an enum parameter
Each of these is a natural extension that makes the screenshot MCP server more useful to AI agents. The pattern is always the same: define the tool schema, implement the handler, and the MCP protocol handles everything else.
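The device-preset idea is the easiest to sketch. The preset names and dimensions below are illustrative choices, not an official list; you would expose the keys as a Zod enum parameter and feed the resolved viewport to setViewport:

```javascript
// Illustrative viewport presets keyed by device name.
const DEVICE_PRESETS = {
  iphone: { width: 390, height: 844, deviceScaleFactor: 3 },
  pixel: { width: 412, height: 915, deviceScaleFactor: 2.6 },
  ipad: { width: 820, height: 1180, deviceScaleFactor: 2 },
  desktop: { width: 1280, height: 720, deviceScaleFactor: 1 },
};

// Resolve a preset name to a viewport, falling back to explicit
// width/height when no (or an unknown) preset is given.
function resolveViewport({ device, width = 1280, height = 720 }) {
  if (device && DEVICE_PRESETS[device]) {
    return DEVICE_PRESETS[device];
  }
  return { width, height, deviceScaleFactor: 1 };
}
```

In the tool schema this becomes something like z.enum(Object.keys(DEVICE_PRESETS)).optional(), so the model can simply ask for "iphone" instead of guessing pixel dimensions.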