How I Built a Claude Tool That Screenshots Every Page It Visits
Claude is incredibly powerful. I use it in Claude Desktop to browse websites, fill forms, extract data, and monitor pages. But the browser tools I gave it returned only text.
When Claude clicks a button, it can't see the result. When Claude submits a form, it can't see the confirmation page. It's working blind.
So I built an MCP tool that changes that. After every browser action, it screenshots the page and shows Claude what it's looking at.
Now Claude can actually see.
The Problem: Claude's Tool Calls Are Invisible
Here's what using Claude with browser tools feels like:
- I ask Claude: "Check if that button is clickable"
- Claude calls my inspect_page tool
- Tool returns: { button_found: true, clickable: true }
- Claude continues
- But Claude has no idea what the button looks like
Claude is working from text descriptions. If my tool makes a mistake, Claude doesn't know. If the page renders differently than I described, Claude goes off the rails.
The Solution: Visual Feedback Loop
I added a screenshot tool. After Claude takes any action (click, fill, navigate), the tool screenshots the page and returns the image to Claude.
Now Claude sees the page, not just a text description.
For example, Claude can see that it's looking at a form:
"Screenshot shows form with fields: Name, Email, Phone"
(Not: "Tool returned: { fields_found: 3 }")
This changes everything. Claude can verify its own actions.
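To make the difference concrete, here is a minimal sketch of the two result shapes a tool can hand back. It assumes the MCP convention that an image content item carries base64 data plus a mimeType; the helper names textOnlyResult and imageResult are hypothetical, and the bytes are placeholders rather than a real PNG.

```javascript
// Text-only result: Claude learns only what the tool chose to report.
function textOnlyResult(summary) {
  return {
    content: [{ type: 'text', text: JSON.stringify(summary) }],
  };
}

// Image result: the screenshot travels as an image content item
// (base64 data plus MIME type), with a short caption alongside it.
function imageResult(pngBuffer, caption) {
  return {
    content: [
      { type: 'image', data: pngBuffer.toString('base64'), mimeType: 'image/png' },
      { type: 'text', text: caption },
    ],
  };
}

// Usage with placeholder bytes (not a real PNG).
const textResult = textOnlyResult({ fields_found: 3 });
const visualResult = imageResult(Buffer.from('fake-png-bytes'), 'Screenshot of the form page');
console.log(textResult.content[0].text); // prints {"fields_found":3}
console.log(visualResult.content.map((item) => item.type));
```

With the first shape, Claude has to trust the summary. With the second, it can inspect the rendered page itself.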
Building the MCP Tool
Here's the MCP tool I built:
// Requires Node 18+, which ships a global fetch (node-fetch v3 is ESM-only).
const { Server } = require('@modelcontextprotocol/sdk/server/index.js');
const { StdioServerTransport } = require('@modelcontextprotocol/sdk/server/stdio.js');
const {
  ListToolsRequestSchema,
  CallToolRequestSchema,
} = require('@modelcontextprotocol/sdk/types.js');

const PAGEBOLT_API_KEY = process.env.PAGEBOLT_API_KEY;

// The second argument declares the "tools" capability; without it the SDK
// refuses to register the tool handlers below.
const server = new Server(
  { name: 'claude-screenshot-tool', version: '1.0.0' },
  { capabilities: { tools: {} } }
);

// List available tools
server.setRequestHandler(ListToolsRequestSchema, async () => {
  return {
    tools: [
      {
        name: 'screenshot_page',
        description: 'Take a screenshot of the page at the given URL to show Claude what it looks like',
        inputSchema: {
          type: 'object',
          properties: {
            url: {
              type: 'string',
              description: 'URL of the page to screenshot',
            },
            fullPage: {
              type: 'boolean',
              description: 'Capture full page (including lazy-loaded content)',
              default: false,
            },
          },
          required: ['url'],
        },
      },
    ],
  };
});

// Handle tool calls
server.setRequestHandler(CallToolRequestSchema, async (request) => {
  if (request.params.name === 'screenshot_page') {
    const { url, fullPage } = request.params.arguments;
    try {
      const response = await fetch('https://api.pagebolt.dev/v1/screenshot', {
        method: 'POST',
        headers: {
          'Authorization': `Bearer ${PAGEBOLT_API_KEY}`,
          'Content-Type': 'application/json',
        },
        body: JSON.stringify({
          url: url,
          format: 'png',
          width: 1280,
          height: 720,
          fullPage: fullPage || false,
          blockBanners: true,
        }),
      });

      if (!response.ok) {
        // isError tells the client that the tool call itself failed
        return {
          isError: true,
          content: [
            {
              type: 'text',
              text: `Screenshot failed: ${response.status}`,
            },
          ],
        };
      }

      const buffer = await response.arrayBuffer();
      const base64 = Buffer.from(buffer).toString('base64');

      // MCP image content carries the base64 payload in `data`
      // and the MIME type in `mimeType`.
      return {
        content: [
          {
            type: 'image',
            data: base64,
            mimeType: 'image/png',
          },
          {
            type: 'text',
            text: `Screenshot of ${url} captured. Claude can now see the page.`,
          },
        ],
      };
    } catch (error) {
      return {
        isError: true,
        content: [
          {
            type: 'text',
            text: `Error: ${error.message}`,
          },
        ],
      };
    }
  }

  return {
    isError: true,
    content: [{ type: 'text', text: 'Tool not found' }],
  };
});

// stdout is reserved for the MCP protocol, so startup errors go to stderr
const transport = new StdioServerTransport();
server.connect(transport).catch((error) => {
  console.error('Failed to start MCP server:', error);
  process.exit(1);
});
The key parts:
- ListToolsRequestSchema: tells Claude which tools are available
- CallToolRequestSchema: handles tool calls from Claude
- Image returned as base64: Claude receives the screenshot as an image it can inspect
- blockBanners set to true: hides cookie popups so Claude sees clean pages
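One thing the handler above does not do is check its arguments before calling the API. A rough sketch of how that could look, assuming you want to reject missing keys and non-HTTP URLs up front: buildScreenshotRequest is a hypothetical helper, and the endpoint and body fields are copied from the handler rather than from any API reference.

```javascript
// Hypothetical helper: validates tool arguments and builds the fetch options
// for the screenshot request. Endpoint and body fields mirror the handler.
const SCREENSHOT_ENDPOINT = 'https://api.pagebolt.dev/v1/screenshot';

function buildScreenshotRequest(args, apiKey) {
  if (!apiKey) {
    throw new Error('PAGEBOLT_API_KEY is not set');
  }
  // new URL() throws a TypeError on malformed input.
  const parsed = new URL(args.url);
  if (parsed.protocol !== 'http:' && parsed.protocol !== 'https:') {
    throw new Error(`Unsupported URL scheme: ${parsed.protocol}`);
  }
  return {
    endpoint: SCREENSHOT_ENDPOINT,
    options: {
      method: 'POST',
      headers: {
        Authorization: `Bearer ${apiKey}`,
        'Content-Type': 'application/json',
      },
      body: JSON.stringify({
        url: parsed.href,
        format: 'png',
        width: 1280,
        height: 720,
        fullPage: args.fullPage === true,
        blockBanners: true,
      }),
    },
  };
}

// Usage: build the request, then pass it to fetch(endpoint, options).
const screenshotRequest = buildScreenshotRequest({ url: 'https://example.com/form' }, 'test-key');
console.log(JSON.parse(screenshotRequest.options.body).url); // prints https://example.com/form
```

Failing early gives Claude a readable error message instead of an opaque status code from the API.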
How Claude Uses It
Once I install this MCP tool in Claude Desktop, I can ask Claude to use it:
Me: "Navigate to example.com/form and take a screenshot"
Claude: navigates and calls the tool
Tool: returns screenshot as image
Claude: (seeing the image) "I can see a form with three fields: Name, Email, and Phone. Let me fill them in."
Claude can now see what it's doing. It can verify its actions. It can handle unexpected page layouts.
Real Example: Form Filling With Visual Verification
Without visual feedback, Claude's form-filling workflow looks like:
- Navigate to form page
- Fill "name" field with "John"
- Fill "email" field with "john@example.com"
- Fill "phone" field with "555-1234"
- Click submit button
- Hope the form submitted
With visual feedback:
- Navigate to form page
- Take screenshot → Claude sees the form
- Fill "name" field → Claude can read the placeholder text
- Take screenshot → Claude sees "Name" field is filled
- Fill "email" field → Claude verifies it's the email field
- Take screenshot → Claude sees both fields filled
- Fill "phone" field → Claude knows where to click
- Take screenshot → Claude sees all three fields filled
- Click submit button
- Take screenshot → Claude sees confirmation message
- Claude confirms: "Form successfully submitted. I can see the thank you page."
No guessing. No blind steps. Claude sees what's happening.
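The act-then-look pattern in the second list can be sketched as a small loop. This is only an illustration under stated assumptions: act and capture are hypothetical injected functions standing in for real browser and screenshot tools, and in practice Claude drives this loop itself through tool calls.

```javascript
// Runs each step, then captures a screenshot so the result can be checked
// before the next step. `act` and `capture` are injected stand-ins for
// real browser and screenshot tools.
async function runWithVisualChecks(steps, act, capture) {
  const log = [];
  for (const step of steps) {
    await act(step);
    const screenshot = await capture();
    log.push({ step, screenshot });
  }
  return log;
}

// Usage with stubs in place of real tools.
const formSteps = ['fill name', 'fill email', 'fill phone', 'click submit'];
const performed = [];
const visualCheckRun = runWithVisualChecks(
  formSteps,
  async (step) => { performed.push(step); },
  async () => `screenshot after ${performed.length} step(s)`
);
visualCheckRun.then((log) => console.log(log.length)); // prints 4
```

Each log entry pairs an action with what the page looked like right after it, which is the feedback the blind workflow lacks.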
Why This Matters
Claude is an agent now. It can browse, fill forms, extract data, monitor pages. But agents need feedback loops. Screenshots are that feedback.
When Claude can see:
- ✅ It makes better decisions (error detection)
- ✅ It recovers from mistakes (can see what went wrong)
- ✅ It adapts to page changes (sees dynamic content)
- ✅ It gains confidence in its own actions (sees results)
Installation
- Save the MCP tool code to claude-screenshot-tool.js
- Set API key: export PAGEBOLT_API_KEY=your_key_here
- Add to Claude Desktop (~/Library/Application Support/Claude/claude_desktop_config.json):
{
  "mcpServers": {
    "claude-screenshot": {
      "command": "node",
      "args": ["/path/to/claude-screenshot-tool.js"]
    }
  }
}
- Restart Claude Desktop
- Claude can now call the screenshot_page tool
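One caveat: a variable exported in a terminal may not reach a server that Claude Desktop launches itself. Server entries in claude_desktop_config.json also accept an env object, so the key can be passed there instead. A small sketch of building such an entry, where addScreenshotServer is a hypothetical helper and the path and key are placeholders:

```javascript
// Returns a copy of a Claude Desktop config with the screenshot server added.
// Passing the key through "env" hands it to the server process directly,
// without relying on variables exported in a shell.
function addScreenshotServer(config, scriptPath, apiKey) {
  return {
    ...config,
    mcpServers: {
      ...(config.mcpServers || {}),
      'claude-screenshot': {
        command: 'node',
        args: [scriptPath],
        env: { PAGEBOLT_API_KEY: apiKey },
      },
    },
  };
}

// Usage: print the JSON to paste into claude_desktop_config.json.
const updatedConfig = addScreenshotServer({}, '/path/to/claude-screenshot-tool.js', 'your_key_here');
console.log(JSON.stringify(updatedConfig, null, 2));
```

The helper copies the existing config rather than mutating it, so any servers already listed under mcpServers are preserved.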
Real-World Use Cases
Use Case 1: Website Monitoring
- Claude navigates a monitoring dashboard
- Takes screenshots at each step
- Detects visual changes (red alerts, status changes)
- Reports findings with visual evidence
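As a cheap first pass before asking Claude to interpret anything, consecutive screenshots can be compared byte for byte. This is a deliberately naive sketch with hypothetical helper names: any pixel difference (a clock, an ad) counts as a change, so it only answers "did anything change?", not "what changed?".

```javascript
// Byte-exact comparison of two screenshot buffers.
function screenshotChanged(previous, current) {
  if (previous === null) return false; // first capture: nothing to compare against
  return !previous.equals(current);
}

// Keeps the last screenshot as the baseline for the next check.
let baseline = null;
function checkForChange(current) {
  const changed = screenshotChanged(baseline, current);
  baseline = current;
  return changed;
}

// Usage with placeholder bytes standing in for PNG data.
console.log(checkForChange(Buffer.from('all green'))); // prints false (no baseline yet)
console.log(checkForChange(Buffer.from('all green'))); // prints false
console.log(checkForChange(Buffer.from('red alert'))); // prints true
```

Only the screenshots flagged as changed then need to go to Claude for a closer look.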
Use Case 2: Competitor Analysis
- Claude visits competitor websites
- Screenshots pricing pages, feature lists
- Compares visually across competitors
- Summarizes findings with screenshot evidence
Use Case 3: Form Automation
- Claude fills complex forms
- Verifies each field visually
- Detects validation errors (sees red text, error messages)
- Retries intelligently based on visual feedback
Use Case 4: Content Extraction
- Claude navigates a site
- Screenshots key pages
- Extracts content with visual context
- Higher accuracy because it can see layout
Pricing
| Plan | Requests/Month | Cost | Best For |
|---|---|---|---|
| Free | 100 | $0 | Testing Claude tools |
| Starter | 5,000 | $29 | Small projects |
| Growth | 25,000 | $79 | Production Claude agents |
| Scale | 100,000 | $199 | Enterprise AI workflows |
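Because every browser action adds one screenshot request, it helps to estimate volume before picking a plan. A back-of-the-envelope sketch using the limits from the table above; the workload numbers (5 actions per task, 20 tasks per day) are made-up assumptions, not measurements.

```javascript
// Plan limits copied from the pricing table.
const PLANS = [
  { name: 'Free', requestsPerMonth: 100, cost: 0 },
  { name: 'Starter', requestsPerMonth: 5000, cost: 29 },
  { name: 'Growth', requestsPerMonth: 25000, cost: 79 },
  { name: 'Scale', requestsPerMonth: 100000, cost: 199 },
];

// One screenshot per browser action, so usage scales with actions per task.
function estimateMonthlyRequests(actionsPerTask, tasksPerDay, daysPerMonth = 30) {
  return actionsPerTask * tasksPerDay * daysPerMonth;
}

// Smallest plan whose quota covers the estimate, or null if none does.
function smallestPlanFor(requests) {
  return PLANS.find((plan) => plan.requestsPerMonth >= requests) || null;
}

const estimate = estimateMonthlyRequests(5, 20); // 5 actions x 20 tasks x 30 days
console.log(estimate, smallestPlanFor(estimate).name); // prints 3000 Starter
```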
Summary
- ✅ Claude is powerful, but text-only tool results leave it working blind
- ✅ MCP screenshot tool gives Claude visual awareness
- ✅ One tool call after each browser action
- ✅ Claude sees results, verifies actions, adapts intelligently
- ✅ 100 requests/month free
Claude with visual feedback loops becomes smarter, more reliable, more capable.
Get started: Try PageBolt free (100 requests/month, no credit card).