Why screenshot MCPs cost 170x less than Playwright MCP (and when that matters)
You're building an AI agent. You need it to interact with web pages. Two MCP approaches:
- Accessibility tree MCPs (like Playwright MCP): Claude gets the page's full accessibility tree and can click buttons and fill forms
- Screenshot MCPs (like PageBolt MCP): Claude sees a visual screenshot and can reason about layout
Which is cheaper to run?
Screenshot MCPs cost ~170x less per page.
By one community-reported comparison: $0.09 vs $15.30 for the same 100-page task.
But there's a tradeoff. Each approach wins in different scenarios.
The token cost difference: accessibility trees vs screenshots
Accessibility tree (Playwright MCP)
When your agent needs to interact with a page, Playwright MCP provides an accessibility tree:
{
  "nodes": [
    {
      "id": 1,
      "role": "button",
      "text": "Add to Cart",
      "selector": "button.add-to-cart",
      "children": []
    },
    {
      "id": 2,
      "role": "textbox",
      "name": "email",
      "value": "",
      "children": []
    },
    ...
    // 500+ nodes for a typical e-commerce page
  ]
}
A typical e-commerce page has 500-1000 nodes in the accessibility tree.
Claude needs to reason about this entire tree to click the right button, and every token of it counts toward the context and is billed as input.
Based on community-reported data from r/Anthropic, a typical Playwright MCP session for 100 pages costs ~$15.30 in API costs, or about $0.15 per page. At $0.003 per 1K input tokens, that implies roughly 51,000 tokens per page interaction once you account for the full accessibility tree, reasoning, and follow-up tool calls.
Screenshot MCP (PageBolt MCP)
When your agent uses a screenshot MCP:
{
  "screenshot": "base64-encoded-png",
  "size": "6KB",
  "width": 1280,
  "height": 720
}
Claude sees the screenshot visually.
Token cost per page: ~200 tokens (estimated vision tokens for the 6KB screenshot, priced at claude-3-5-sonnet rates)
Plus agent reasoning: ~200 tokens
Total per page: ~400 tokens
400 tokens × $0.003 per 1K tokens = $0.0012 per page
For 100 pages: $0.12
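The arithmetic above can be sketched as a small helper. This is a minimal sketch that assumes the $0.003 figure is a price per 1,000 input tokens (that is, $3 per million); the function and constant names are illustrative, not part of any MCP or SDK.

```python
# Assumption: $0.003 is the price per 1,000 input tokens ($3 per million).
PRICE_PER_1K_TOKENS_USD = 0.003


def page_cost_usd(vision_tokens: int, reasoning_tokens: int,
                  price_per_1k: float = PRICE_PER_1K_TOKENS_USD) -> float:
    """Return the estimated cost of one page from its token counts."""
    total_tokens = vision_tokens + reasoning_tokens
    return total_tokens / 1000 * price_per_1k


# The article's estimate: ~200 vision tokens plus ~200 reasoning tokens.
per_page = page_cost_usd(vision_tokens=200, reasoning_tokens=200)
print(f"per page:  ${per_page:.4f}")        # per page:  $0.0012
print(f"100 pages: ${per_page * 100:.2f}")  # 100 pages: $0.12
```

Swapping in your own measured token counts is the quickest way to check whether these estimates hold for your pages.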
The math: 170x cost difference
| Metric | Playwright MCP | Screenshot MCP | Ratio |
|---|---|---|---|
| Tokens per page | ~51,000 | ~400 | ~128x |
| Cost per page | $0.15 | $0.0012 | 125x |
| Cost per 100 pages | $15.30 | $0.12 | ~128x |
| Cost per 1000 pages | $153 | $1.20 | ~128x |
The 170x number from r/Anthropic likely reflects a more optimized screenshot setup ($0.09 rather than $0.12 per 100 pages), but the estimates here land in the same 125-170x range.
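To see where both ends of that range come from, here is a short sketch that divides the Playwright figure by the two screenshot figures used in this article. The variable names are illustrative.

```python
def cost_ratio(expensive_usd: float, cheap_usd: float) -> float:
    """How many times more expensive the first option is than the second."""
    return expensive_usd / cheap_usd


playwright_100_pages = 15.30   # community-reported session cost
screenshot_estimated = 0.12    # from the ~400 tokens/page estimate
screenshot_reported = 0.09     # the $0.09 figure quoted in the introduction

print(round(cost_ratio(playwright_100_pages, screenshot_estimated), 1))  # 127.5
print(round(cost_ratio(playwright_100_pages, screenshot_reported)))      # 170
```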
Why the difference?
Accessibility trees are comprehensive but verbose:
- Full DOM structure (every node)
- ARIA attributes (descriptions)
- Form field values
- Focus state
- Parent-child relationships
All of this is useful, but it is text-heavy and adds up to thousands of tokens per page.
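A rough way to get a feel for this is to serialize a synthetic tree and estimate its size. The sketch below assumes about 4 characters per token, which is only a coarse heuristic (real tokenizers vary, and JSON punctuation often tokenizes less efficiently than prose), and it builds nodes shaped like the sample above rather than reading real Playwright MCP output.

```python
import json


def synthetic_tree(node_count: int) -> dict:
    """Build a fake tree whose nodes mirror the sample node shown earlier."""
    nodes = [
        {"id": i, "role": "button", "text": "Add to Cart",
         "selector": "button.add-to-cart", "children": []}
        for i in range(1, node_count + 1)
    ]
    return {"nodes": nodes}


def rough_token_estimate(payload: dict, chars_per_token: float = 4.0) -> int:
    """Estimate tokens from compact JSON length (heuristic, not a tokenizer)."""
    serialized = json.dumps(payload, separators=(",", ":"))
    return int(len(serialized) / chars_per_token)


for count in (500, 1000):
    estimate = rough_token_estimate(synthetic_tree(count))
    print(f"{count} nodes -> roughly {estimate} tokens")
```

For a real measurement, count tokens on an actual tool response with your provider's tokenizer instead of relying on this heuristic.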
Screenshots are visual and compact:
- Single image (6-10KB)
- Vision tokens (~130-200 tokens)
- Claude can "see" everything at once
- Much lower token overhead
When to use each approach
Use Playwright MCP (accessibility trees) if:
✅ Complex form filling: the agent needs to find and fill 10+ fields precisely
✅ Interactive workflows: multi-step sequences (click → fill → click → validate)
✅ Accessibility testing: checking ARIA labels and semantic HTML
✅ Real-time state tracking: validating form states, errors, and so on
✅ Low-frequency, high-value tasks: ~$15 per 100 pages doesn't matter if it saves 2 hours of manual work
Example: "Fill out this insurance claim form with my data"
- Agent needs to find each field by label, validate error messages, submit
- Accessibility tree gives exact selectors and state
- Cost per interaction: ~$0.15 (far pricier than a screenshot, but necessary)
Use screenshot MCP (visual) if:
✅ Capture and monitoring: regular screenshots for visual regression testing
✅ Read-only analysis: the agent just needs to "see" and reason about layout
✅ Batch operations: 100+ pages of screenshots, where cost is critical
✅ Automated testing: visual verification without interaction
✅ Documentation/reporting: generating visual reports
Example: "Take a screenshot of the homepage on mobile and desktop"
- Agent navigates, screenshots, returns images
- No form filling needed
- Cost per screenshot: ~$0.001 (cheap at scale)
Example: "Check if our pricing page layout is correct across devices"
- Agent takes screenshots on 5 devices
- Compares them
- Flags visual differences
- Cost per device: ~$0.001 (total ~$0.005)
Hybrid approach: Use both
The smartest agents use both:
Agent workflow:
1. Take a screenshot to see the page layout ($0.001)
2. If interaction is needed:
   - Switch to Playwright MCP
   - Get the accessibility tree ($0.15)
   - Click the button, fill the form
3. Take a screenshot to verify the result ($0.001)
Cost: ~$0.15 for a complex interaction (mostly the tree)
Benefit: precise interaction where it is needed, cheap visual checks everywhere else
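The workflow above can be sketched as a small router with cost accounting. This is an illustration under stated assumptions: `take_screenshot` and `get_tree` are hypothetical callables standing in for whatever MCP tool calls your agent makes, and the per-call costs are this article's estimates, not measured values.

```python
from dataclasses import dataclass, field
from typing import Callable

SCREENSHOT_COST_USD = 0.001  # article's per-screenshot estimate
TREE_COST_USD = 0.15         # article's per-tree estimate


@dataclass
class HybridAgent:
    take_screenshot: Callable[[str], bytes]  # hypothetical screenshot tool
    get_tree: Callable[[str], dict]          # hypothetical accessibility-tree tool
    spent_usd: float = 0.0
    steps: list = field(default_factory=list)

    def _charge(self, step: str, cost: float) -> None:
        self.steps.append(step)
        self.spent_usd += cost

    def visit(self, url: str, needs_interaction: bool) -> float:
        # Step 1: a cheap look at the page.
        self.take_screenshot(url)
        self._charge("screenshot", SCREENSHOT_COST_USD)
        if needs_interaction:
            # Step 2: pay for the tree only when we must click or fill.
            tree = self.get_tree(url)
            self._charge("accessibility_tree", TREE_COST_USD)
            # Clicking and filling would use selectors from `tree` here.
            # Step 3: verify the result visually.
            self.take_screenshot(url)
            self._charge("verify_screenshot", SCREENSHOT_COST_USD)
        return self.spent_usd


# Usage with stub tools so the sketch runs without a browser.
agent = HybridAgent(lambda url: b"png-bytes", lambda url: {"nodes": []})
agent.visit("https://example.com/pricing", needs_interaction=False)
agent.visit("https://example.com/checkout", needs_interaction=True)
print(agent.steps)
print(f"${agent.spent_usd:.3f}")  # $0.153
```

Deciding `needs_interaction` is the hard part in practice; here it is passed in explicitly to keep the cost logic visible.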
Real-world example: Batch screenshot monitoring
Your team needs daily screenshots of 1000 competitor pricing pages.
With Playwright MCP:
- 1000 pages × $0.15 = $150/day = $4,500/month
- Plus: pages break under load, need retry logic
With screenshot MCP:
- 1000 pages × $0.001 = $1/day = $30/month
- Plus: parallelizable, reliable
Savings: $4,470/month
For this use case, the screenshot approach is 150x cheaper and the better fit.
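The monthly figures follow from a simple projection. A minimal sketch, assuming a 30-day month (which is what $150/day = $4,500/month implies) and the rounded per-page costs used in this example:

```python
def monthly_cost_usd(pages_per_day: int, cost_per_page_usd: float,
                     days: int = 30) -> float:
    """Project a daily batch job to a monthly bill."""
    return pages_per_day * cost_per_page_usd * days


playwright = monthly_cost_usd(1000, 0.15)
screenshot = monthly_cost_usd(1000, 0.001)

print(f"Playwright MCP: ${playwright:,.0f}/month")              # $4,500/month
print(f"Screenshot MCP: ${screenshot:,.0f}/month")              # $30/month
print(f"Savings:        ${playwright - screenshot:,.0f}/month")  # $4,470/month
```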
Example: E-commerce checkout testing
Your agent needs to test checkout flow (5 steps, fill form, submit).
With Playwright MCP:
- 5 interactions × $0.15 = $0.75 per checkout test
- Benefit: Agent can precisely find form fields, handle validation
With screenshot MCP:
- 5 screenshots × $0.001 = $0.005 per checkout test
- Tradeoff: the agent sees the visual layout but must infer where buttons and fields are
Which is better?
- For automated testing (run daily): screenshot wins (cheaper, still accurate)
- For complex form validation (custom error messages): Playwright wins (worth the cost)
The honest take
Playwright MCP is expensive but valuable if:
- You need real interaction
- Cost isn't a constraint
- Token overhead doesn't matter for your use case
Screenshot MCP is cheap and efficient if:
- You need visual information
- Cost matters (batch operations)
- You don't need to click/fill (or do it rarely)
Don't pick based on cost alone. Pick based on what your agent actually needs to do.
Installing PageBolt MCP
If you decide screenshot-based interaction is right for your use case:
npm install -g pagebolt-mcp
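Configure in claude_desktop_config.json (on macOS: ~/Library/Application Support/Claude/claude_desktop_config.json; on Windows: %APPDATA%\Claude\claude_desktop_config.json):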
{
  "mcpServers": {
    "pagebolt": {
      "command": "pagebolt-mcp",
      "env": {
        "PAGEBOLT_API_KEY": "your-key-here"
      }
    }
  }
}
Now your agent can call take_screenshot, generate_pdf, record_video, inspect_page, and run_sequence natively from Claude Desktop, Cursor, or Windsurf. Free tier: 100 requests/month.
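If you already have other MCP servers configured, pasting the snippet above over the file would drop them. The sketch below merges the `pagebolt` entry into an existing config instead. The helper name `add_mcp_server` is hypothetical, and the demo writes to a temporary directory rather than your real config; point it at your actual config path only after backing the file up.

```python
import json
import tempfile
from pathlib import Path


def add_mcp_server(config_path: Path, name: str, command: str,
                   env: dict) -> dict:
    """Add or replace one entry under "mcpServers", keeping the others."""
    if config_path.exists() and config_path.read_text().strip():
        config = json.loads(config_path.read_text())
    else:
        config = {}
    servers = config.setdefault("mcpServers", {})
    servers[name] = {"command": command, "env": env}
    config_path.parent.mkdir(parents=True, exist_ok=True)
    config_path.write_text(json.dumps(config, indent=2) + "\n")
    return config


# Demo against a throwaway file that already lists another (made-up) server.
with tempfile.TemporaryDirectory() as tmp:
    path = Path(tmp) / "claude_desktop_config.json"
    path.write_text(json.dumps({"mcpServers": {"existing": {"command": "other-mcp"}}}))
    merged = add_mcp_server(path, "pagebolt", "pagebolt-mcp",
                            {"PAGEBOLT_API_KEY": "your-key-here"})
    print(sorted(merged["mcpServers"]))  # ['existing', 'pagebolt']
```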
Conclusion
Token economics matter. A 170x cost difference is real. But it's not a reason to dismiss Playwright MCP or over-rely on screenshots.
Use the right tool for the job:
- Complex interaction? Playwright MCP
- Visual capture and analysis? Screenshot MCP
- Both? Combine them strategically
Start with PageBolt MCP — free tier, 100 requests/month. See which approach fits your agent's needs.