How to Build Claude Agents That Can Prove What They Actually Saw on the Web
Claude API agents are powerful reasoners. But when you ask them to navigate the web, grab data, or verify page state, you hit a problem: you can't see what they actually saw.
Your agent called a tool. It got back HTML. Did it parse the right element? Was the page interactive, or did it time out? Did a banner block the content? The LLM reasoned its way through, but you're left guessing whether the visual reality matched the HTML response.
This is where visual proof changes everything. Screenshot each tool call. Give Claude a mirror of what actually rendered.
The Problem: Agents Flying Blind
Imagine a Claude agent tasked with:
- "Check if the signup form is active on example.com"
- "Grab the price from the pricing table"
- "Verify the login button changed after clicking it"
The agent's tools return raw HTML. But HTML isn't reality. CSS might hide elements. JavaScript might not have loaded. A modal might be blocking content. The agent reasons based on incomplete signals.
Result: False positives. Wrong decisions. Workflows that fail in production.
The Solution: Visual Documentation at Each Step
Add a screenshot to every tool call. When Claude asks the browser to fetch a page, capture what actually rendered. When it clicks a button, prove the page changed. The agent gets visual evidence, not just HTML.
Here's how.
Step 1: Add the PageBolt Screenshot Tool to Your Agent
import anthropic
import base64
import json
import urllib.request

client = anthropic.Anthropic()

# Define the screenshot tool
tools = [
    {
        "name": "screenshot",
        "description": "Take a screenshot of a URL and return base64 image data",
        "input_schema": {
            "type": "object",
            "properties": {
                "url": {
                    "type": "string",
                    "description": "URL to screenshot"
                },
                "width": {
                    "type": "integer",
                    "description": "Viewport width (default 1280)",
                    "default": 1280
                },
                "height": {
                    "type": "integer",
                    "description": "Viewport height (default 720)",
                    "default": 720
                }
            },
            "required": ["url"]
        }
    }
]

# Tool call handler
def take_screenshot(url, width=1280, height=720):
    """Call the PageBolt API to capture a screenshot."""
    api_key = "YOUR_API_KEY"  # Get from pagebolt.dev/dashboard
    payload = json.dumps({
        "url": url,
        "width": width,
        "height": height
    }).encode("utf-8")
    req = urllib.request.Request(
        "https://pagebolt.dev/api/v1/screenshot",
        data=payload,
        headers={"x-api-key": api_key, "Content-Type": "application/json"},
        method="POST"
    )
    with urllib.request.urlopen(req) as resp:
        data = json.loads(resp.read())
        return data  # Returns {"image": "base64_encoded_png", ...}
Step 2: Build the Agent Loop with Visual Checkpoints
def run_agent_with_visual_proof(user_query):
    """Agent that screenshots each action for verification."""
    messages = [
        {
            "role": "user",
            "content": user_query
        }
    ]
    screenshots = []  # Store visual proof

    while True:
        # Call Claude with tools enabled
        response = client.messages.create(
            model="claude-3-5-sonnet-20241022",
            max_tokens=1024,
            tools=tools,
            messages=messages
        )

        # If Claude is done, return its answer plus the visual proof
        if response.stop_reason == "end_turn":
            final_text = next(
                (block.text for block in response.content if hasattr(block, "text")),
                None
            )
            return {
                "result": final_text,
                "screenshots": screenshots  # Include visual proof
            }

        # Process tool calls
        if response.stop_reason == "tool_use":
            # Add Claude's response to message history
            messages.append({
                "role": "assistant",
                "content": response.content
            })

            tool_results = []
            for block in response.content:
                if block.type == "tool_use":
                    tool_name = block.name
                    tool_input = block.input

                    # Execute the tool
                    if tool_name == "screenshot":
                        result = take_screenshot(**tool_input)

                        # Store screenshot for audit trail
                        screenshots.append({
                            "step": len(screenshots) + 1,
                            "url": tool_input["url"],
                            "image_base64": result["image"]
                        })

                        # Return the image data to Claude so it can see what rendered
                        tool_results.append({
                            "type": "tool_result",
                            "tool_use_id": block.id,
                            "content": [
                                {
                                    "type": "image",
                                    "source": {
                                        "type": "base64",
                                        "media_type": "image/png",
                                        "data": result["image"]
                                    }
                                },
                                {
                                    "type": "text",
                                    "text": f"Screenshot captured at {tool_input['url']}"
                                }
                            ]
                        })

            # Add tool results to history
            messages.append({
                "role": "user",
                "content": tool_results
            })
        else:
            # Any other stop reason (e.g. max_tokens) would spin forever; bail out
            raise RuntimeError(f"Unexpected stop_reason: {response.stop_reason}")

# Run it
result = run_agent_with_visual_proof(
    "Check if the pricing page at https://example.com/pricing is loading correctly"
)
print("Agent response:", result["result"])
print(f"Captured {len(result['screenshots'])} screenshots for visual proof")
Step 3: Store and Audit
Each screenshot becomes part of your agent's audit trail. No more mystery about what the LLM saw.
import json

# Save the execution record
execution_record = {
    "agent_task": "Verify pricing page state",
    "timestamp": "2026-03-04T14:32:00Z",
    "steps": result["screenshots"],
    "agent_conclusion": result["result"]
}

with open("agent_execution_audit.json", "w") as f:
    json.dump(execution_record, f, indent=2)
Now you can:
- Debug agent failures: "Oh, the agent didn't see the signup button because CSS hid it"
- Audit agent decisions: "Here's proof of what the agent saw when it made decision X"
- Build reliable workflows: "If the screenshot shows the error state, take this path instead"
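For the "verify the page changed after clicking" case, you don't always need Claude to compare two images; a cheap first pass is to compare the screenshot bytes directly. This is a rough sketch with a hypothetical `page_changed` helper; note that byte-identical captures guarantee no visual change, but dynamic content (ads, clocks) can make two visually identical renders differ, so a real pipeline might follow up with pixel diffing:

```python
import base64
import hashlib

def page_changed(before_b64, after_b64):
    """Detect whether two base64-encoded screenshots differ.

    Hashes the decoded PNG bytes; different hashes suggest the
    action had a visible effect. Identical hashes mean the render
    is byte-for-byte the same.
    """
    def digest(b64):
        return hashlib.sha256(base64.b64decode(b64)).hexdigest()
    return digest(before_b64) != digest(after_b64)
```

In the agent loop from Step 2, you could run this on consecutive entries in `screenshots` and branch the workflow when a click produced no visible change.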
Why This Matters
Claude agents are shipping now in production. Slack workflows. Customer support bots. Automation platforms. But without visual proof, they're black boxes.
Competing solutions (self-hosted Puppeteer, Selenium) give you the screenshot tool — but not the audit trail. You build the infrastructure, patch the libraries, debug the timeouts, manage the Chrome instances.
PageBolt gives you both: one API endpoint, instant visual proof, permanent audit history. Your agent sees it. You see it. No mysteries.
Try It Now
- Get your API key at pagebolt.dev (free tier: 100 requests/month, no credit card)
- Add the screenshot tool to your agent (copy-paste the code above)
- Run your first Claude agent with visual proof
Then watch what actually happens when your agent navigates the web. Build more reliable workflows. Ship with confidence.
Your AI agents will thank you.