DEV Community

Custodia-Admin

Posted on • Originally published at pagebolt.dev

The MCP Security Checklist: How to Audit Your AI Agent's Browser Actions

You just gave your AI agent access to your production servers via MCP.

It can:

  • Click buttons
  • Fill forms
  • Read page content
  • Interact with APIs

But you can't see what it actually did.

The problem: MCP servers run with broad permissions. Without visual proof, you have no way to audit whether an agent:

  • Clicked the right button or hallucinated a selector
  • Read the correct data or misinterpreted the page
  • Made the intended action or took a detour

This is a security and governance nightmare. Your compliance team asks: "Show me what happened." You have logs, but no proof.

Here's the MCP security checklist: screenshot, inspect, and keep an audit trail of every action.


The MCP Security Problem: Broad Permissions, No Visibility

MCP servers grant agents access to real systems. MCP itself is a general-purpose protocol for connecting models to tools, so instead of a narrow, purpose-built API, a server can expose browser automation, file access, or system commands.

An agent using MCP can:

# Example: Agent interacts with your dashboard via MCP
result = await mcp_server.use_tool("click", selector="#approve-payment")

But what actually happened?

  • Did the agent find the right button?
  • Was it a payment approval or a fraud alert?
  • Did the page state change as expected?

You don't know. You only have the agent's word.

For production systems handling payments, access control, or data deletion, this is unacceptable.


The Solution: Visual Proof at Every Step

Add PageBolt to your MCP security stack. After every action, capture:

  1. Screenshot — What does the page look like now?
  2. Inspect — What elements exist and what are their states?
  3. Video — Record the entire automation and hash the file, so any later modification is detectable

This transforms your agent from a black box into an auditable system.


The MCP Security Checklist

Use this checklist before deploying any MCP agent to production:

✅ Pre-Deployment Verification

  • [ ] Agent can screenshot the target page after login
  • [ ] Agent can inspect page structure (find selectors, elements, form fields)
  • [ ] Agent understands the difference between expected and actual page state
  • [ ] Agent logs every action with timestamp and visual proof
  • [ ] Failed actions trigger alerts (screenshot shows error state)

✅ Per-Action Security

For each action (click, fill, navigate):

  1. Take a screenshot before the action — Establish baseline state
  2. Execute the action — Agent performs the operation
  3. Inspect the page — Verify the action had the expected effect
  4. Take a screenshot after the action — Prove the state changed
  5. Compare the two screenshots — Detect unexpected outcomes
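The five steps above can be sketched as a small wrapper. This is a minimal illustration, not PageBolt's or any MCP server's API: `run_with_evidence`, `ActionEvidence`, and the `capture`/`act` callables are hypothetical names, and a plain dict stands in for the browser page. The inspect step (3) is left out to keep the sketch short.

```python
import hashlib
from dataclasses import dataclass
from typing import Callable


@dataclass
class ActionEvidence:
    before_hash: str
    after_hash: str
    changed: bool


def run_with_evidence(capture: Callable[[], bytes], act: Callable[[], None]) -> ActionEvidence:
    """Capture state before and after `act` and report whether it changed."""
    before = capture()   # step 1: baseline
    act()                # step 2: perform the action
    after = capture()    # step 4: post-action state
    before_hash = hashlib.sha256(before).hexdigest()
    after_hash = hashlib.sha256(after).hexdigest()
    # step 5: byte-level comparison; it only says that *something* changed
    return ActionEvidence(before_hash, after_hash, before_hash != after_hash)


# Usage with a stand-in "page" instead of a real browser session
page = {"status": "pending"}
evidence = run_with_evidence(
    capture=lambda: repr(page).encode(),
    act=lambda: page.update(status="approved"),
)
print(evidence.changed)
```

A differing hash shows that the page changed, not that it changed in the intended way, which is why the inspect step still matters in a real pipeline.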

✅ Audit Trail

  • [ ] All screenshots timestamped and logged
  • [ ] Video recording of entire sequence available for compliance review
  • [ ] Action log includes: timestamp, selector used, element text, page URL, screenshot hash
  • [ ] Video is encrypted at rest, and its hash is stored in the log so any modification is detectable
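One way to make the action log itself tamper-evident is to chain entries by hash, so that editing any earlier entry invalidates everything after it. The sketch below is an assumption about how such a log could be built, not something PageBolt provides; `append_entry`, `verify_chain`, and the field names are illustrative.

```python
import hashlib
import json
from datetime import datetime, timezone

GENESIS = "0" * 64  # placeholder "previous hash" for the first entry


def _digest(body: dict) -> str:
    # Canonical JSON (sorted keys) so the same body always hashes the same way
    return hashlib.sha256(json.dumps(body, sort_keys=True).encode()).hexdigest()


def append_entry(log: list, selector: str, element_text: str, url: str, screenshot: bytes) -> dict:
    body = {
        "timestamp": datetime.now(timezone.utc).isoformat(),
        "selector": selector,
        "element_text": element_text,
        "url": url,
        "screenshot_hash": hashlib.sha256(screenshot).hexdigest(),
        "prev_hash": log[-1]["entry_hash"] if log else GENESIS,
    }
    entry = {**body, "entry_hash": _digest(body)}
    log.append(entry)
    return entry


def verify_chain(log: list) -> bool:
    prev = GENESIS
    for entry in log:
        body = {k: v for k, v in entry.items() if k != "entry_hash"}
        if entry["prev_hash"] != prev or entry["entry_hash"] != _digest(body):
            return False
        prev = entry["entry_hash"]
    return True


log = []
append_entry(log, "#login", "Log in", "https://dashboard.example.com", b"png-bytes-1")
append_entry(log, "#approve-btn", "Approve", "https://dashboard.example.com", b"png-bytes-2")
print(verify_chain(log))         # True
log[0]["selector"] = "#delete-all"  # simulate someone editing the log afterwards
print(verify_chain(log))         # False
```

The chain only detects tampering if the latest `entry_hash` is also kept somewhere the agent cannot write to, such as a separate append-only store.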

Real Example: Agent Approves a Payment

Here's what secure MCP automation looks like:

import asyncio
import hashlib
import json
import os
from datetime import datetime, timezone
from typing import Optional

from anthropic import Anthropic

# Initialize Anthropic client (reads ANTHROPIC_API_KEY from the environment)
client = Anthropic()

# PageBolt configuration
PAGEBOLT_API_KEY = os.getenv("PAGEBOLT_API_KEY")
PAGEBOLT_BASE_URL = "https://pagebolt.dev/api/v1"

async def audit_agent_action(url: str, action_description: str, screenshot_before: Optional[bytes] = None):
    """
    Secure MCP action: Take visual proof before and after every action
    """

    # Step 1: Screenshot BEFORE the action (reuse a caller-supplied baseline if given)
    if screenshot_before is None:
        screenshot_before = await take_screenshot(url)
    # Hash outside the branch so it is defined on both paths
    screenshot_before_hash = hashlib.sha256(screenshot_before).hexdigest()
    print(f"[AUDIT] Before screenshot hash: {screenshot_before_hash}")

    # Step 2: Inspect the page to find the correct selector
    inspect_response = await inspect_page(url)
    print(f"[AUDIT] Page elements found: {len(inspect_response['elements'])} items")

    # Step 3: Ask Claude to identify the right action based on page inspection
    inspection_text = json.dumps(inspect_response, indent=2)

    response = client.messages.create(
        model="claude-opus-4-5-20251101",
        max_tokens=500,
        messages=[
            {
                "role": "user",
                "content": f"""
                You are a security auditor validating MCP agent actions.

                TASK: {action_description}

                PAGE INSPECTION RESULTS:
                {inspection_text}

                Based on the page inspection, identify:
                1. The correct element selector to interact with
                2. Any security risks (e.g., clicking the wrong button)
                3. Whether the action is safe to execute

                Respond in JSON format:
                {{
                  "selector": "#element-id",
                  "safe": true,
                  "reasoning": "Found the correct payment approval button. No risks detected.",
                  "risks": []
                }}
                """
            }
        ]
    )

    try:
        action_plan = json.loads(response.content[0].text)
    except json.JSONDecodeError:
        # Treat an unparseable reply as a failed validation, not a crash
        print("[SECURITY] Action blocked: validator did not return valid JSON")
        return None
    print(f"[AUDIT] Claude validation: {action_plan['reasoning']}")

    if not action_plan["safe"]:
        print(f"[SECURITY] Action blocked: {action_plan['risks']}")
        return None

    # Reject any selector that does not appear in the inspection results
    known_selectors = {el.get("selector") for el in inspect_response["elements"]}
    if action_plan["selector"] not in known_selectors:
        print(f"[SECURITY] Action blocked: unknown selector {action_plan['selector']}")
        return None

    # Step 4: Execute the action via MCP
    print(f"[ACTION] Clicking selector: {action_plan['selector']}")
    # In a real system, this would call your MCP server
    # await mcp_server.click(action_plan['selector'])

    # Step 5: Screenshot AFTER the action
    await asyncio.sleep(2)  # Wait for page to update
    screenshot_after = await take_screenshot(url)
    screenshot_after_hash = hashlib.sha256(screenshot_after).hexdigest()
    print(f"[AUDIT] After screenshot hash: {screenshot_after_hash}")

    # Step 6: Log the action
    # A differing hash only shows that the page changed, not that the change was the intended one
    log_entry = {
        "timestamp": datetime.now(timezone.utc).isoformat(),
        "action": action_description,
        "selector": action_plan["selector"],
        "url": url,
        "screenshot_before_hash": screenshot_before_hash,
        "screenshot_after_hash": screenshot_after_hash,
        "status": "CHANGE_DETECTED" if screenshot_before_hash != screenshot_after_hash else "NO_CHANGE_DETECTED"
    }

    print(f"[AUDIT] Action logged: {json.dumps(log_entry, indent=2)}")
    return log_entry

async def take_screenshot(url: str) -> bytes:
    """
    Take a screenshot for audit proof
    """
    import requests

    # requests is blocking, so run it in a worker thread to keep the event loop free
    response = await asyncio.to_thread(
        requests.post,
        f"{PAGEBOLT_BASE_URL}/screenshot",
        headers={"x-api-key": PAGEBOLT_API_KEY, "Content-Type": "application/json"},
        json={"url": url, "format": "png"},
        timeout=60,
    )

    if response.status_code != 200:
        raise RuntimeError(f"Screenshot failed: {response.status_code}")

    return response.content

async def inspect_page(url: str) -> dict:
    """
    Inspect page structure to prevent selector hallucination
    """
    import requests

    # Same pattern as take_screenshot: blocking HTTP call moved off the event loop
    response = await asyncio.to_thread(
        requests.post,
        f"{PAGEBOLT_BASE_URL}/inspect",
        headers={"x-api-key": PAGEBOLT_API_KEY, "Content-Type": "application/json"},
        json={"url": url},
        timeout=60,
    )

    if response.status_code != 200:
        raise RuntimeError(f"Inspect failed: {response.status_code}")

    return response.json()

# Example usage
if __name__ == "__main__":
    asyncio.run(audit_agent_action(
        url="https://dashboard.example.com/payments/pending",
        action_description="Approve the pending payment for $5000 to vendor XYZ"
    ))

What this does:

  1. Takes a screenshot before the action (baseline)
  2. Inspects the page to find the correct selector (prevent hallucination)
  3. Asks Claude to validate the action is safe
  4. Executes the action if approved
  5. Takes a screenshot after (proof of change)
  6. Logs everything with cryptographic hashes

Why it's secure:

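  • Screenshots give visual evidence of the page state before and after each action
  • Inspection grounds selectors in the real page structure, which reduces selector hallucination
  • Claude validates the action before execution
  • The audit log records a SHA-256 hash of each screenshot, so a swapped or edited image is detectable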

Preventing Selector Hallucination with /inspect

Agents hallucinate. They see a page and guess at selectors.

// Agent might hallucinate a selector that doesn't exist
await page.click("#approve-payment-btn");  // Wrong! No such element.

The /inspect endpoint prevents this by providing the actual page structure:

{
  "elements": [
    {
      "type": "button",
      "text": "Approve Payment",
      "selector": "#payment-approve-7q3x",
      "visible": true,
      "x": 450,
      "y": 320
    }
  ]
}

Now Claude can choose from selectors that actually exist on the page instead of guessing.
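Even with the real structure in hand, it is worth checking the selector the agent proposes before executing anything. The sketch below assumes the `/inspect` response has the shape shown above (an `elements` list with `selector`, `text`, and `visible` fields); `check_selector` is a hypothetical helper, not part of any SDK.

```python
from typing import Optional, Tuple


def check_selector(inspect_result: dict, selector: str,
                   expected_text: Optional[str] = None) -> Tuple[bool, str]:
    """Return (ok, reason) for a selector proposed by the agent."""
    matches = [el for el in inspect_result.get("elements", [])
               if el.get("selector") == selector]
    if not matches:
        return False, "selector not present in inspection results"
    element = matches[0]
    if not element.get("visible", False):
        return False, "element exists but is not visible"
    if expected_text is not None and element.get("text") != expected_text:
        return False, f"element text is {element.get('text')!r}, expected {expected_text!r}"
    return True, "ok"


inspect_result = {
    "elements": [
        {"type": "button", "text": "Approve Payment",
         "selector": "#payment-approve-7q3x", "visible": True, "x": 450, "y": 320}
    ]
}

# The guessed selector is rejected; the real one passes
print(check_selector(inspect_result, "#approve-payment-btn"))
print(check_selector(inspect_result, "#payment-approve-7q3x", "Approve Payment"))
```

Matching on the element text as well as the selector guards against the case raised earlier, where the selector exists but belongs to a different button than the one the task describes.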


Tamper-Evident Audit Trails with /video

For compliance audits, a static screenshot isn't enough. You need a video.

import hashlib
import os

import requests

response = requests.post(
    "https://pagebolt.dev/api/v1/video",
    headers={"x-api-key": os.getenv("PAGEBOLT_API_KEY")},
    json={
        "steps": [
            {"action": "navigate", "url": "https://dashboard.example.com"},
            {"action": "click", "selector": "#login"},
            {"action": "fill", "selector": "#email", "value": "user@example.com"},
            {"action": "click", "selector": "#approve-btn"}
        ],
        "format": "mp4",
        "cursor": {"visible": True, "style": "classic"}
    },
    timeout=120,
)
# Fail loudly rather than saving an error body to disk as if it were a video
response.raise_for_status()

video_bytes = response.content
with open("audit-trail.mp4", "wb") as f:
    f.write(video_bytes)

# Hash the video for tamper-evidence
video_hash = hashlib.sha256(video_bytes).hexdigest()
print(f"Audit trail video hash: {video_hash}")

# Store hash in log: if video is ever modified, hash won't match

A video provides:

  • Proof of execution — You can watch each recorded step in the order it ran
  • Tamper detection — Hash mismatch = video was modified
  • Compliance evidence — Play it for auditors to prove what happened
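The tamper-detection point only holds if someone later recomputes the hash and compares it with the recorded one. A minimal sketch of that check follows; `sha256_file` and `is_unmodified` are illustrative names, and a temporary file with placeholder bytes stands in for the real recording.

```python
import hashlib
import hmac
import os
import tempfile


def sha256_file(path: str, chunk_size: int = 1 << 20) -> str:
    """Hash a file in chunks so large videos are never loaded into memory at once."""
    digest = hashlib.sha256()
    with open(path, "rb") as f:
        for chunk in iter(lambda: f.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def is_unmodified(path: str, recorded_hash: str) -> bool:
    return hmac.compare_digest(sha256_file(path), recorded_hash)


with tempfile.TemporaryDirectory() as tmp:
    path = os.path.join(tmp, "audit-trail.mp4")
    with open(path, "wb") as f:
        f.write(b"placeholder video bytes")
    recorded = sha256_file(path)              # what would be stored in the audit log
    intact = is_unmodified(path, recorded)
    with open(path, "ab") as f:
        f.write(b"!")                          # simulate a later edit to the file
    still_intact = is_unmodified(path, recorded)

print(intact, still_intact)  # True False
```

The recorded hash has to live somewhere separate from the video; if both sit in the same writable location, whoever alters one can alter the other.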

The MCP Security Checklist (Executable)

Before deploying any agent to production, run this checklist:

async def pre_deployment_security_check(agent_url: str):
    # Reuses take_screenshot, inspect_page, and the PAGEBOLT_* settings from the example above
    import requests

    checks = {
        "can_screenshot": False,
        "can_inspect": False,
        "can_record_video": False,
        "detects_hallucinated_selectors": False,
        "logs_every_action": False
    }
    inspect_data = None

    # 1. Test screenshot capability
    try:
        await take_screenshot(agent_url)
        checks["can_screenshot"] = True
        print("✅ Screenshot capability verified")
    except Exception as e:
        print(f"❌ Screenshot failed: {e}")

    # 2. Test inspect capability
    try:
        inspect_data = await inspect_page(agent_url)
        if "elements" in inspect_data:
            checks["can_inspect"] = True
            print("✅ Inspect capability verified")
        else:
            print("❌ Inspect response has no 'elements' field")
    except Exception as e:
        print(f"❌ Inspect failed: {e}")

    # 3. Test video recording
    try:
        # Record a simple navigation
        response = requests.post(
            f"{PAGEBOLT_BASE_URL}/video",
            headers={"x-api-key": PAGEBOLT_API_KEY},
            json={"steps": [{"action": "navigate", "url": agent_url}]},
            timeout=120,
        )
        if response.status_code == 200:
            checks["can_record_video"] = True
            print("✅ Video recording verified")
        else:
            print(f"❌ Video recording failed: {response.status_code}")
    except Exception as e:
        print(f"❌ Video recording failed: {e}")

    # 4. Hallucination detection: a made-up selector must be rejected, a real one accepted
    if checks["can_inspect"]:
        real_selectors = {el["selector"] for el in inspect_data["elements"]}
        fake_selector = "#selector-that-should-not-exist"
        rejects_fake = fake_selector not in real_selectors
        accepts_real = all(s in real_selectors for s in list(real_selectors)[:1])
        if rejects_fake and accepts_real:
            checks["detects_hallucinated_selectors"] = True
            print("✅ Hallucination detection enabled")
        else:
            print("❌ Selector allowlist check failed")

    # 5. Action logging
    if all([checks["can_screenshot"], checks["can_inspect"], checks["can_record_video"]]):
        checks["logs_every_action"] = True
        print("✅ Full action logging enabled")

    # Final verdict: this confirms the audit tooling works, not that the agent's behavior is correct
    if all(checks.values()):
        print("\n✅ SECURITY CHECK PASSED: audit capabilities verified")
        return True
    else:
        print(f"\n❌ SECURITY CHECK FAILED: {sum(not v for v in checks.values())} checks failed")
        return False

Summary: The MCP Security Stack

| Layer      | Tool        | What It Does                                                |
|------------|-------------|-------------------------------------------------------------|
| Proof      | Screenshot  | Captures page state before & after actions                  |
| Prevention | Inspect     | Shows real page structure, prevents selector hallucination  |
| Evidence   | Video       | Records entire automation sequence for audits               |
| Validation | Claude API  | Approves actions before execution                           |
| Logging    | Audit trail | Timestamps, hashes, and logs every action                   |

Deploy with Confidence

MCP agents are powerful. But power without visibility is risk.

Add PageBolt to your MCP security stack:

  • Screenshot before and after every action (baseline + proof)
  • Inspect before every action (prevent hallucination)
  • Record video for compliance audits (tamper-evident trails)

You gain confidence in your agents. Your compliance team gets proof.


Ready to audit your MCP agents?

Start free: 100 API requests/month. No credit card required. Includes /screenshot, /inspect, and /video.

Get started →
