DEV Community

Custodia-Admin
Custodia-Admin

Posted on • Originally published at pagebolt.dev

How I Built a Claude Tool That Screenshots Every Page It Visits

How I Built a Claude Tool That Screenshots Every Page It Visits

Claude is incredibly powerful. I use it in Claude Desktop to browse websites, fill forms, extract data, monitor pages. But Claude is text-only.

When Claude clicks a button, it can't see the result. When Claude submits a form, it can't see the confirmation page. It's working blind.

So I built an MCP tool that changes that. After every browser action, it screenshots the page and shows Claude what it's looking at.

Now Claude can actually see.

The Problem: Claude's Tool Calls Are Invisible

Here's what using Claude with browser tools feels like:

  1. I ask Claude: "Check if that button is clickable"
  2. Claude calls my inspect_page tool
  3. Tool returns: { button_found: true, clickable: true }
  4. Claude continues
  5. But Claude has no idea what the button looks like

Claude is working from text descriptions. If my tool makes a mistake, Claude doesn't know. If the page renders differently than I described, Claude goes off the rails.

The Solution: Visual Feedback Loop

I added a screenshot tool. After Claude takes any action (click, fill, navigate), the tool screenshots the page and returns the image to Claude.

Now Claude sees the page, not just a text description.

const Claude can see it's looking at a form.
"Screenshot shows form with fields: Name, Email, Phone"
(Not: "Tool returned: { fields_found: 3 }")
Enter fullscreen mode Exit fullscreen mode

This changes everything. Claude can verify its own actions.

Building the MCP Tool

Here's the MCP tool I built:

const { Server } = require('@modelcontextprotocol/sdk/server/index.js');
const { StdioServerTransport } = require('@modelcontextprotocol/sdk/server/stdio.js');
const {
  ListToolsRequestSchema,
  CallToolRequestSchema,
} = require('@modelcontextprotocol/sdk/types.js');
const fetch = require('node-fetch');
const fs = require('fs');

const PAGEBOLT_API_KEY = process.env.PAGEBOLT_API_KEY;

const server = new Server({
  name: 'claude-screenshot-tool',
  version: '1.0.0',
});

// List available tools
server.setRequestHandler(ListToolsRequestSchema, async () => {
  return {
    tools: [
      {
        name: 'screenshot_page',
        description: 'Take a screenshot of the current page to show Claude what it looks like',
        inputSchema: {
          type: 'object',
          properties: {
            url: {
              type: 'string',
              description: 'URL of the page to screenshot',
            },
            fullPage: {
              type: 'boolean',
              description: 'Capture full page (including lazy-loaded content)',
              default: false,
            },
          },
          required: ['url'],
        },
      },
    ],
  };
});

// Handle tool calls
server.setRequestHandler(CallToolRequestSchema, async (request) => {
  if (request.params.name === 'screenshot_page') {
    const { url, fullPage } = request.params.arguments;

    try {
      const response = await fetch('https://api.pagebolt.dev/v1/screenshot', {
        method: 'POST',
        headers: {
          'Authorization': `Bearer ${PAGEBOLT_API_KEY}`,
          'Content-Type': 'application/json',
        },
        body: JSON.stringify({
          url: url,
          format: 'png',
          width: 1280,
          height: 720,
          fullPage: fullPage || false,
          blockBanners: true,
        }),
      });

      if (!response.ok) {
        return {
          content: [
            {
              type: 'text',
              text: `Screenshot failed: ${response.status}`,
            },
          ],
        };
      }

      const buffer = await response.arrayBuffer();
      const base64 = Buffer.from(buffer).toString('base64');

      return {
        content: [
          {
            type: 'image',
            source: {
              type: 'base64',
              media_type: 'image/png',
              data: base64,
            },
          },
          {
            type: 'text',
            text: `Screenshot of ${url} captured. Claude can now see the page.`,
          },
        ],
      };
    } catch (error) {
      return {
        content: [
          {
            type: 'text',
            text: `Error: ${error.message}`,
          },
        ],
      };
    }
  }

  return {
    content: [{ type: 'text', text: 'Tool not found' }],
  };
});

const transport = new StdioServerTransport();
server.connect(transport);
Enter fullscreen mode Exit fullscreen mode

The key parts:

  1. ListToolsRequestSchema — Tells Claude what tools are available
  2. CallToolRequestSchema — Handles tool calls from Claude
  3. Return image as base64 — Claude can display the screenshot inline
  4. blockBanners: true — Hide cookie popups so Claude sees clean pages

How Claude Uses It

Once I install this MCP tool in Claude Desktop, I can ask Claude to use it:

Me: "Navigate to example.com/form and take a screenshot"

Claude: navigates and calls the tool

Tool: returns screenshot as image

Claude: (seeing the image) "I can see a form with three fields: Name, Email, and Phone. Let me fill them in."

Claude can now see what it's doing. It can verify its actions. It can handle unexpected page layouts.

Real Example: Form Filling With Visual Verification

Without visual feedback, Claude's form-filling workflow looks like:

  1. Navigate to form page
  2. Fill "name" field with "John"
  3. Fill "email" field with "john@example.com"
  4. Fill "phone" field with "555-1234"
  5. Click submit button
  6. Hope the form submitted

With visual feedback:

  1. Navigate to form page
  2. Take screenshot → Claude sees the form
  3. Fill "name" field → Claude can read the placeholder text
  4. Take screenshot → Claude sees "Name" field is filled
  5. Fill "email" field → Claude verifies it's the email field
  6. Take screenshot → Claude sees both fields filled
  7. Fill "phone" field → Claude knows where to click
  8. Take screenshot → Claude sees confirmation message
  9. Claude confirms: "Form successfully submitted. I can see the thank you page."

No guessing. No blind steps. Claude sees what's happening.

Why This Matters

Claude is an agent now. It can browse, fill forms, extract data, monitor pages. But agents need feedback loops. Screenshots are that feedback.

When Claude can see:

  • ✅ It makes better decisions (error detection)
  • ✅ It recovers from mistakes (can see what went wrong)
  • ✅ It adapts to page changes (sees dynamic content)
  • ✅ It gains confidence in its own actions (sees results)

Installation

  1. Save the MCP tool code to claude-screenshot-tool.js
  2. Set API key:
   export PAGEBOLT_API_KEY=your_key_here
Enter fullscreen mode Exit fullscreen mode
  1. Add to Claude Desktop (~/Library/Application Support/Claude/claude_desktop_config.json):
   {
     "mcpServers": {
       "claude-screenshot": {
         "command": "node",
         "args": ["/path/to/claude-screenshot-tool.js"]
       }
     }
   }
Enter fullscreen mode Exit fullscreen mode
  1. Restart Claude Desktop
  2. Claude can now call screenshot_page tool

Real-World Use Cases

Use Case 1: Website Monitoring

  • Claude navigates a monitoring dashboard
  • Takes screenshots at each step
  • Detects visual changes (red alerts, status changes)
  • Reports findings with visual evidence

Use Case 2: Competitor Analysis

  • Claude visits competitor websites
  • Screenshots pricing pages, feature lists
  • Compares visually across competitors
  • Summarizes findings with screenshot evidence

Use Case 3: Form Automation

  • Claude fills complex forms
  • Verifies each field visually
  • Detects validation errors (sees red text, error messages)
  • Retries intelligently based on visual feedback

Use Case 4: Content Extraction

  • Claude navigates a site
  • Screenshots key pages
  • Extracts content with visual context
  • Higher accuracy because it can see layout

Pricing

Plan Requests/Month Cost Best For
Free 100 $0 Testing Claude tools
Starter 5,000 $29 Small projects
Growth 25,000 $79 Production Claude agents
Scale 100,000 $199 Enterprise AI workflows

Summary

  • ✅ Claude is powerful but text-only
  • ✅ MCP screenshot tool gives Claude visual awareness
  • ✅ One tool call after each browser action
  • ✅ Claude sees results, verifies actions, adapts intelligently
  • ✅ 100+ requests/month free

Claude with visual feedback loops becomes smarter, more reliable, more capable.

Get started: Try PageBolt free — 100 requests/month, no credit card →

Top comments (0)