How to Debug Cursor Agents That Make Wrong Decisions — With Visual Proof
You ask a Cursor agent to fetch pricing from a competitor's website. It runs through several steps, hits an API endpoint, parses the response.
Then it reports back: "Pricing not found."
But you know the pricing is there. You've seen it on the website. What went wrong?
The problem: you can't see what the agent actually saw.
Cursor agents execute in the background. They call tools. They get responses. They make decisions. But you're flying blind. Did the page load? Was the data in the HTML? Did the agent parse it correctly? Did a modal block the content?
Without visual proof, debugging is guesswork.
The Root Cause: Invisible Agent Execution
Cursor's agent framework is powerful. You define goals. The agent breaks them into steps. It calls tools, processes responses, adapts.
But the intermediate execution is invisible. You see the final result. You don't see:
- What HTML the agent parsed
- What CSS was applied (hidden vs visible elements)
- Whether JavaScript loaded the data
- What the agent actually "saw" when making decisions
Result: when an agent fails, you have no evidence. You can't trace the decision path.
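To see why raw HTML alone misleads, here's a minimal sketch (with a hypothetical HTML snippet, not from any real site): the price is present in the markup, so text extraction "finds" it, but a CSS rule hides it from the rendered page — which is all a screenshot, or a user, would see.

```python
import re

# Hypothetical HTML an agent might fetch: the price exists in the markup,
# but a CSS class keeps it hidden until JavaScript reveals it.
html = """
<style>.gated { display: none; }</style>
<div class="price gated">$49/mo</div>
"""

# Text-only extraction happily "finds" the price in the source...
match = re.search(r"\$\d+/mo", html)
print(match.group())  # $49/mo

# ...but the rendered page shows nothing, because the element is
# display:none. Presence in HTML is not the same as visual presence --
# only a screenshot proves what the agent actually "saw".
```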
The Solution: Screenshots at Every Checkpoint
Add a screenshot to every agent step. When the agent calls a tool, capture visual proof of what it's working with. When it parses a response, screenshot what actually rendered.
Now you can:
- See what the agent saw: visual proof of page state at each checkpoint
- Trace decision failures: "The agent didn't parse the price because CSS hid it"
- Debug faster: reproduce the exact conditions the agent faced
- Fix confidently: know exactly why it failed, fix the root cause
Real-World Example: Debugging a Pricing Scrape
Cursor agent task: "Fetch competitor pricing from example.com/pricing and report."
The agent fails. Pricing shows as "Not found". Here's how to debug with screenshots:
```python
import anthropic
import json
import urllib.request

client = anthropic.Anthropic()

def get_screenshot(url):
    """Capture visual proof of page state."""
    api_key = "YOUR_API_KEY"  # pagebolt.dev API key
    payload = json.dumps({"url": url}).encode('utf-8')
    req = urllib.request.Request(
        'https://pagebolt.dev/api/v1/screenshot',
        data=payload,
        headers={'x-api-key': api_key, 'Content-Type': 'application/json'},
        method='POST'
    )
    with urllib.request.urlopen(req) as resp:
        result = json.loads(resp.read())
    return {"image_base64": result["image"], "url": url}

def debug_agent_execution():
    """Step through the Cursor agent task with visual checkpoints."""
    tools = [
        {
            "name": "screenshot",
            "description": "Capture visual proof of a webpage",
            "input_schema": {
                "type": "object",
                "properties": {
                    "url": {"type": "string", "description": "URL to screenshot"}
                },
                "required": ["url"]
            }
        }
    ]

    # Initial task
    messages = [
        {
            "role": "user",
            "content": "Go to https://example.com/pricing and find the pricing "
                       "information. Take screenshots at each step to prove what you see."
        }
    ]
    execution_log = []

    while True:
        # Agent processes
        response = client.messages.create(
            model="claude-3-5-sonnet-20241022",
            max_tokens=1024,
            tools=tools,
            messages=messages
        )

        # Agent is done (or stopped for any reason other than a tool call):
        # return the final text plus the visual log
        if response.stop_reason != "tool_use":
            final_response = next(
                (block.text for block in response.content if hasattr(block, "text")),
                None
            )
            return {
                "final_result": final_response,
                "execution_log": execution_log
            }

        # Process tool calls
        messages.append({"role": "assistant", "content": response.content})
        tool_results = []
        for block in response.content:
            if block.type == "tool_use" and block.name == "screenshot":
                # Take the screenshot
                screenshot_data = get_screenshot(block.input["url"])
                # Log for debugging
                execution_log.append({
                    "step": len(execution_log) + 1,
                    "action": "screenshot",
                    "url": block.input["url"],
                    "image": screenshot_data["image_base64"]
                })
                # Return the image to the agent
                tool_results.append({
                    "type": "tool_result",
                    "tool_use_id": block.id,
                    "content": [
                        {
                            "type": "image",
                            "source": {
                                "type": "base64",
                                "media_type": "image/png",
                                "data": screenshot_data["image_base64"]
                            }
                        },
                        {
                            "type": "text",
                            "text": f"Screenshot of {block.input['url']} captured"
                        }
                    ]
                })
        messages.append({"role": "user", "content": tool_results})

# Run the debug session
result = debug_agent_execution()

# Now you have:
#   1. The final result
#   2. Visual proof at each step (execution_log contains screenshots)
#   3. Evidence of what the agent saw
print("Agent Result:", result["final_result"])
print(f"\nExecution Steps: {len(result['execution_log'])}")
for step in result['execution_log']:
    print(f"  Step {step['step']}: {step['action']} at {step['url']}")
```
What this gives you:
- Screenshot at each checkpoint
- Proof of what HTML rendered
- Evidence of CSS display states
- Visual record of agent decision context
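Base64 strings in a log aren't much to look at, so in practice you'll want to dump them to image files. A small sketch, assuming the `execution_log` shape from the example above (the `fake_log` entry here uses dummy bytes, not a real screenshot):

```python
import base64
import os

def save_execution_log(execution_log, out_dir="agent_debug"):
    """Write each logged screenshot to a numbered PNG for visual inspection."""
    os.makedirs(out_dir, exist_ok=True)
    paths = []
    for step in execution_log:
        path = os.path.join(out_dir, f"step_{step['step']:02d}.png")
        with open(path, "wb") as f:
            f.write(base64.b64decode(step["image"]))
        paths.append(path)
    return paths

# Demo with a fake log entry (dummy bytes stand in for real PNG data)
fake_log = [{
    "step": 1,
    "action": "screenshot",
    "url": "https://example.com/pricing",
    "image": base64.b64encode(b"not-a-real-png").decode()
}]
paths = save_execution_log(fake_log)
print(paths)
```

Open the resulting files side by side and you can replay the agent's run frame by frame.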
Now when the agent fails to find pricing, you have visual proof of why. "The price was in the HTML but CSS hid it." Or "The page never loaded the price element."
Why This Matters for Cursor Developers
Cursor agents are powerful but opaque. You deploy them and hope they work. When they fail, debugging is painful.
With screenshots, you debug like a human would: look at what the agent saw, understand why it made that decision.
Try It Now
- Get API key at pagebolt.dev (free: 100 requests/month, no credit card)
- Add the screenshot tool to your Cursor agent
- Add screenshots to each checkpoint
- Run your agent and inspect the visual log
Next time it fails, you'll know exactly why.
Debug with confidence. Ship Cursor agents that actually work.