How to Debug Cursor Agents That Make Wrong Decisions — With Visual Proof
You ask a Cursor agent to fetch pricing from a competitor's website. It runs through several steps, hits an API endpoint, parses the response.
Then it reports back: "Pricing not found."
But you know the pricing is there. You've seen it on the website. What went wrong?
The problem: you can't see what the agent actually saw.
Cursor agents execute in the background. They call tools. They get responses. They make decisions. But you're flying blind. Did the page load? Was the data in the HTML? Did the agent parse it correctly? Did a modal block the content?
Without visual proof, debugging is guesswork.
The Root Cause: Invisible Agent Execution
Cursor's agent framework is powerful. You define goals. The agent breaks them into steps. It calls tools, processes responses, adapts.
But the intermediate execution is invisible. You see the final result. You don't see:
- What HTML the agent parsed
- What CSS was applied (hidden vs visible elements)
- Whether JavaScript loaded the data
- What the agent actually "saw" when making decisions
Result: when an agent fails, you have no evidence. You can't trace the decision path.
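To see why raw HTML alone misleads, here's a minimal sketch (with a hypothetical HTML snippet, not from any real site): the price is present in the markup, so text extraction "finds" it, but a CSS rule hides it from the rendered page — which is all a screenshot, or a user, would see.

```python
import re

# Hypothetical HTML an agent might fetch: the price exists in the markup,
# but a CSS class keeps it hidden until JavaScript reveals it.
html = """
<style>.gated { display: none; }</style>
<div class="price gated">$49/mo</div>
"""

# Text-only extraction happily "finds" the price in the source...
match = re.search(r"\$\d+/mo", html)
print(match.group())  # $49/mo

# ...but the rendered page shows nothing, because the element is
# display:none. Presence in HTML is not the same as visual presence --
# only a screenshot proves what the agent actually "saw".
```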
The Solution: Screenshots at Every Checkpoint
Add a screenshot to every agent step. When the agent calls a tool, capture visual proof of what it's working with. When it parses a response, screenshot what actually rendered.
Now you can:
- See what the agent saw: visual proof of page state at each checkpoint
- Trace decision failures: "The agent didn't parse the price because CSS hid it"
- Debug faster: reproduce the exact conditions the agent faced
- Fix confidently: know exactly why it failed, fix the root cause
Real-World Example: Debugging a Pricing Scrape
Cursor agent task: "Fetch competitor pricing from example.com/pricing and report."
The agent fails. Pricing shows as "Not found". Here's how to debug with screenshots:
```python
import anthropic
import json
import urllib.request

client = anthropic.Anthropic()

def get_screenshot(url):
    """Capture visual proof of page state."""
    api_key = "YOUR_API_KEY"  # pagebolt.dev API key
    payload = json.dumps({"url": url}).encode('utf-8')
    req = urllib.request.Request(
        'https://pagebolt.dev/api/v1/screenshot',
        data=payload,
        headers={'x-api-key': api_key, 'Content-Type': 'application/json'},
        method='POST'
    )
    with urllib.request.urlopen(req) as resp:
        result = json.loads(resp.read())
    return {"image_base64": result["image"], "url": url}

def debug_agent_execution():
    """Step through the Cursor agent task with visual checkpoints."""
    tools = [
        {
            "name": "screenshot",
            "description": "Capture visual proof of a webpage",
            "input_schema": {
                "type": "object",
                "properties": {
                    "url": {"type": "string", "description": "URL to screenshot"}
                },
                "required": ["url"]
            }
        }
    ]

    # Initial task
    messages = [
        {
            "role": "user",
            "content": "Go to https://example.com/pricing and find the pricing "
                       "information. Take screenshots at each step to prove what you see."
        }
    ]
    execution_log = []

    while True:
        # Agent processes
        response = client.messages.create(
            model="claude-3-5-sonnet-20241022",
            max_tokens=1024,
            tools=tools,
            messages=messages
        )

        # Agent is done (or stopped for any reason other than a tool call):
        # return the final text plus the visual log
        if response.stop_reason != "tool_use":
            final_response = next(
                (block.text for block in response.content if hasattr(block, "text")),
                None
            )
            return {
                "final_result": final_response,
                "execution_log": execution_log
            }

        # Process tool calls
        messages.append({"role": "assistant", "content": response.content})
        tool_results = []
        for block in response.content:
            if block.type == "tool_use" and block.name == "screenshot":
                # Take the screenshot
                screenshot_data = get_screenshot(block.input["url"])
                # Log for debugging
                execution_log.append({
                    "step": len(execution_log) + 1,
                    "action": "screenshot",
                    "url": block.input["url"],
                    "image": screenshot_data["image_base64"]
                })
                # Return the image to the agent
                tool_results.append({
                    "type": "tool_result",
                    "tool_use_id": block.id,
                    "content": [
                        {
                            "type": "image",
                            "source": {
                                "type": "base64",
                                "media_type": "image/png",
                                "data": screenshot_data["image_base64"]
                            }
                        },
                        {
                            "type": "text",
                            "text": f"Screenshot of {block.input['url']} captured"
                        }
                    ]
                })
        messages.append({"role": "user", "content": tool_results})

# Run the debug session
result = debug_agent_execution()

# Now you have:
#   1. The final result
#   2. Visual proof at each step (execution_log contains screenshots)
#   3. Evidence of what the agent saw
print("Agent Result:", result["final_result"])
print(f"\nExecution Steps: {len(result['execution_log'])}")
for step in result['execution_log']:
    print(f"  Step {step['step']}: {step['action']} at {step['url']}")
```
What this gives you:
- Screenshot at each checkpoint
- Proof of what HTML rendered
- Evidence of CSS display states
- Visual record of agent decision context
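Base64 strings in a log aren't much to look at, so in practice you'll want to dump them to image files. A small sketch, assuming the `execution_log` shape from the example above (the `fake_log` entry here uses dummy bytes, not a real screenshot):

```python
import base64
import os

def save_execution_log(execution_log, out_dir="agent_debug"):
    """Write each logged screenshot to a numbered PNG for visual inspection."""
    os.makedirs(out_dir, exist_ok=True)
    paths = []
    for step in execution_log:
        path = os.path.join(out_dir, f"step_{step['step']:02d}.png")
        with open(path, "wb") as f:
            f.write(base64.b64decode(step["image"]))
        paths.append(path)
    return paths

# Demo with a fake log entry (dummy bytes stand in for real PNG data)
fake_log = [{
    "step": 1,
    "action": "screenshot",
    "url": "https://example.com/pricing",
    "image": base64.b64encode(b"not-a-real-png").decode()
}]
paths = save_execution_log(fake_log)
print(paths)
```

Open the resulting files side by side and you can replay the agent's run frame by frame.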
Now when the agent fails to find pricing, you have visual proof of why. "The price was in the HTML but CSS hid it." Or "The page never loaded the price element."
Why This Matters for Cursor Developers
Cursor agents are powerful but opaque. You deploy them and hope they work. When they fail, debugging is painful.
With screenshots, you debug like a human would: look at what the agent saw, understand why it made that decision.
Try It Now
- Get API key at pagebolt.dev (free: 100 requests/month, no credit card)
- Add the screenshot tool to your Cursor agent
- Add screenshots to each checkpoint
- Run your agent and inspect the visual log
Next time it fails, you'll know exactly why.
Debug with confidence. Ship Cursor agents that actually work.