Why AI Agents Use inspect_page Instead of Dumping the Full DOM
You're building a Claude agent to automate web tasks. The agent needs to navigate a page and interact with buttons, forms, and links.
Your first instinct: get the full HTML and let Claude parse it.
response = client.messages.create(
    model="claude-3-5-sonnet-20241022",
    max_tokens=256,  # required by the Messages API
    messages=[{
        "role": "user",
        "content": f"Here's the page HTML:\n\n{full_html_dump}"
    }]
)
Simple. Direct. Extremely expensive.
That full HTML dump is 8,000+ tokens. At $3 per 1M input tokens, one page costs about $0.024 per agent query. Scale to 100 queries a day, and you're spending about $2.40/day just on token overhead.
But here's the thing: Claude doesn't need the full DOM. It needs to know what it can interact with.
The Problem: DOM Bloat
A typical website's HTML includes:
- Layout divs (nesting chains, 50+ levels deep)
- CSS classes and inline styles (framework boilerplate, Tailwind utilities)
- Script tags and data attributes
- Comment nodes and meta tags
- Images, videos, analytics trackers
Result: 10,000+ DOM nodes for a page that has maybe 50 interactive elements.
The agent needs to know:
- Button at coordinates X saying "Submit"
- Input field for "email"
- Link to "/checkout"
It doesn't need to know:
- The 200-line CSS in a <style> tag
- The 500 nested divs from the framework
- The tracking pixels and analytics
But when you dump the full DOM, Claude has to parse all of it. Tokens wasted. Money wasted.
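To see the imbalance on a page of your own, a rough count is enough. The sketch below uses Python's standard `html.parser` to compare the total number of elements with the number of interactive ones. The `INTERACTIVE_TAGS` set and the `count_elements` helper are illustrative assumptions, not part of any PageBolt API, and the tag list is deliberately non-exhaustive.

```python
from html.parser import HTMLParser

# Tags treated as interactive for this illustration (an assumed, non-exhaustive set).
INTERACTIVE_TAGS = {"a", "button", "input", "select", "textarea"}


class ElementCounter(HTMLParser):
    """Counts every start tag and the subset that is interactive."""

    def __init__(self):
        super().__init__()
        self.total = 0
        self.interactive = 0

    def handle_starttag(self, tag, attrs):
        # HTMLParser lowercases tag names before calling this hook.
        self.total += 1
        if tag in INTERACTIVE_TAGS:
            self.interactive += 1


def count_elements(html):
    """Return (total_elements, interactive_elements) for an HTML string."""
    counter = ElementCounter()
    counter.feed(html)
    counter.close()
    return counter.total, counter.interactive


sample = """
<div class="wrapper"><div class="container"><div class="row">
  <form><input name="email" type="email"><button id="submit-btn">Submit</button></form>
  <a href="/checkout">Checkout</a>
</div></div></div>
"""
total, interactive = count_elements(sample)
print(f"{interactive} of {total} elements are interactive")
```

Even in this tiny sample, fewer than half of the elements are ones an agent could act on. On a real page built with a frontend framework, the gap is far wider.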
The Solution: Structured Element Inspection
Instead of dumping the full DOM, inspect only the interactive elements.
PageBolt's inspect_page does exactly this:
import json
import urllib.request

def inspect_page(url):
    """Get structured map of interactive elements only"""
    api_key = "YOUR_API_KEY"  # pagebolt.dev
    payload = json.dumps({"url": url}).encode()
    req = urllib.request.Request(
        'https://pagebolt.dev/api/v1/inspect',
        data=payload,
        headers={'x-api-key': api_key, 'Content-Type': 'application/json'},
        method='POST'
    )
    with urllib.request.urlopen(req) as resp:
        return json.loads(resp.read())
Returns:
{
  "buttons": [
    {"text": "Submit", "selector": "#submit-btn", "type": "primary"},
    {"text": "Cancel", "selector": ".cancel-btn", "type": "secondary"}
  ],
  "inputs": [
    {"name": "email", "selector": "#email-field", "type": "email"},
    {"name": "password", "selector": "#password-field", "type": "password"}
  ],
  "links": [
    {"text": "Forgot password?", "href": "/forgot", "selector": "a.forgot"}
  ],
  "headings": [
    {"text": "Sign In", "level": "h1"}
  ]
}
That's 500 tokens instead of 8,000.
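If you want to trim the prompt further, the element map can be flattened into one short line per element before it goes to Claude. This is a minimal sketch that assumes the response has exactly the shape shown above (`buttons`, `inputs`, `links`, `headings` with those keys). `compact_element_lines` is a hypothetical helper, not something PageBolt provides, and it drops the button `type` field on the assumption that the agent does not need it.

```python
import json


def compact_element_lines(page_elements):
    """Flatten an element map into one short line per element."""
    lines = []
    for button in page_elements.get("buttons", []):
        lines.append(f'button "{button["text"]}" -> {button["selector"]}')
    for field in page_elements.get("inputs", []):
        lines.append(f'input {field["name"]} ({field["type"]}) -> {field["selector"]}')
    for link in page_elements.get("links", []):
        lines.append(f'link "{link["text"]}" ({link["href"]}) -> {link["selector"]}')
    for heading in page_elements.get("headings", []):
        lines.append(f'{heading["level"]}: {heading["text"]}')
    return "\n".join(lines)


# Sample data copied from the response shape shown in the article.
sample_elements = {
    "buttons": [
        {"text": "Submit", "selector": "#submit-btn", "type": "primary"},
        {"text": "Cancel", "selector": ".cancel-btn", "type": "secondary"},
    ],
    "inputs": [
        {"name": "email", "selector": "#email-field", "type": "email"},
        {"name": "password", "selector": "#password-field", "type": "password"},
    ],
    "links": [
        {"text": "Forgot password?", "href": "/forgot", "selector": "a.forgot"}
    ],
    "headings": [{"text": "Sign In", "level": "h1"}],
}

compact = compact_element_lines(sample_elements)
print(compact)
print(len(compact), "characters vs", len(json.dumps(sample_elements, indent=2)))
```

Character counts are only a proxy for tokens, but the direction holds: fewer brackets, quotes, and repeated keys means less for the model to read.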
Token Cost Comparison
Full DOM approach:
  Page HTML: 8,000 tokens
  Agent reasoning: 200 tokens
  Response: 100 tokens
  TOTAL: 8,300 tokens per query

inspect_page approach:
  Structured element map: 500 tokens
  Agent reasoning: 200 tokens
  Response: 100 tokens
  TOTAL: 800 tokens per query

Savings: 90% reduction in tokens
100 agent queries (at the $3 per 1M rate, applied to all tokens for simplicity):
- Full DOM: $2.49
- inspect_page: $0.24
Scale to 10,000 queries a month, and you're saving about $225/month. For a startup, that's meaningful.
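The arithmetic behind this comparison is easy to reproduce for your own volumes. The sketch below assumes every token is billed at the flat $3 per 1M rate quoted earlier, which is a simplification since output tokens are typically priced higher than input tokens. The function names and defaults are illustrative, taken from the per-query token counts listed above.

```python
PRICE_PER_MILLION_TOKENS = 3.00  # USD; flat rate assumed for all tokens


def query_cost(context_tokens, reasoning_tokens=200, response_tokens=100):
    """Cost in USD of one agent query with the given page-context size."""
    total_tokens = context_tokens + reasoning_tokens + response_tokens
    return total_tokens * PRICE_PER_MILLION_TOKENS / 1_000_000


def savings(queries, full_dom_tokens=8_000, inspect_tokens=500):
    """USD saved over `queries` calls by sending the element map instead of full HTML."""
    return queries * (query_cost(full_dom_tokens) - query_cost(inspect_tokens))


for queries in (100, 10_000):
    full = queries * query_cost(8_000)
    inspected = queries * query_cost(500)
    print(f"{queries:>6} queries: full DOM ${full:.2f}, "
          f"inspect_page ${inspected:.2f}, saved ${savings(queries):.2f}")
```

Swap in your own measured token counts per page. The 8,000 and 500 defaults are the article's round numbers, not measurements.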
Real Example: Multi-Page Automation
Build an agent that navigates 5 pages to complete a workflow:
import anthropic
import json
import urllib.request

client = anthropic.Anthropic()

def inspect_page(url):
    api_key = "YOUR_API_KEY"
    payload = json.dumps({"url": url}).encode()
    req = urllib.request.Request(
        'https://pagebolt.dev/api/v1/inspect',
        data=payload,
        headers={'x-api-key': api_key, 'Content-Type': 'application/json'},
        method='POST'
    )
    with urllib.request.urlopen(req) as resp:
        return json.loads(resp.read())

def automate_workflow(task):
    """Agent navigates multiple pages efficiently"""
    pages = [
        "https://example.com/login",
        "https://example.com/account",
        "https://example.com/settings",
        "https://example.com/billing",
        "https://example.com/confirm"
    ]
    total_tokens_used = 0
    for page_url in pages:
        # Inspect the page (not dump full HTML)
        page_elements = inspect_page(page_url)
        # Build the prompt from the structured elements
        prompt = (
            f"Task: {task}\n\n"
            f"Current page elements:\n{json.dumps(page_elements, indent=2)}\n\n"
            "What should the agent do next? Respond with a single action."
        )
        # Send structured elements to Claude
        response = client.messages.create(
            model="claude-3-5-sonnet-20241022",
            max_tokens=256,
            messages=[{"role": "user", "content": prompt}]
        )
        # Track tokens
        total_tokens_used += response.usage.input_tokens + response.usage.output_tokens
        print(f"Page {page_url}: {response.content[0].text}")
    print(f"\nTotal tokens for 5-page workflow: {total_tokens_used}")
    # Rough estimate: applies the $3 per 1M input rate to all tokens
    print(f"Cost: ${(total_tokens_used / 1_000_000) * 3:.3f}")

# Run the agent
automate_workflow("Complete the account setup and enable 2FA")
With full DOM: ~41,500 tokens, ~$0.12
With inspect_page: ~4,000 tokens, ~$0.01
That's 10x cheaper. Same automation. Same results.
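One side benefit of a structured element map is that the agent's chosen action can be checked before it is executed. The sketch below assumes you have asked Claude to reply with a JSON object such as `{"action": "click", "selector": "#submit-btn"}`. That reply format, the `ALLOWED_ACTIONS` vocabulary, and both helper functions are hypothetical, chosen only for illustration.

```python
import json

ALLOWED_ACTIONS = {"click", "type", "navigate"}  # hypothetical action vocabulary


def known_selectors(page_elements):
    """Collect every selector present in the element map."""
    selectors = set()
    for group in page_elements.values():
        if not isinstance(group, list):
            continue  # ignore any non-list metadata in the response
        for element in group:
            if isinstance(element, dict) and element.get("selector"):
                selectors.add(element["selector"])
    return selectors


def parse_action(raw_text, page_elements):
    """Parse a JSON action and reject ones that target unknown elements."""
    try:
        action = json.loads(raw_text)
    except json.JSONDecodeError as exc:
        raise ValueError(f"Reply is not valid JSON: {exc}") from exc
    if not isinstance(action, dict):
        raise ValueError("Reply must be a JSON object")
    if action.get("action") not in ALLOWED_ACTIONS:
        raise ValueError(f"Unsupported action: {action.get('action')!r}")
    if action.get("selector") not in known_selectors(page_elements):
        raise ValueError(f"Selector not on this page: {action.get('selector')!r}")
    return action


# Sample element map in the shape returned by inspect_page in this article.
page_elements = {
    "buttons": [{"text": "Submit", "selector": "#submit-btn", "type": "primary"}],
    "inputs": [{"name": "email", "selector": "#email-field", "type": "email"}],
    "headings": [{"text": "Sign In", "level": "h1"}],
}

print(parse_action('{"action": "click", "selector": "#submit-btn"}', page_elements))
```

With a raw HTML dump, the same check would require parsing the DOM yourself. Here the set of valid targets is already in hand.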
Why This Matters
Token cost is becoming the limiting factor for AI automation. As agents run longer workflows and access more pages, efficiency compounds.
Claude agents are already cheaper than hiring humans. But inefficient agent implementations waste that advantage.
The lesson: give your agent exactly the information it needs, not everything. Your token bill and your agent's reasoning speed will both improve.
Try It Now
- Get your API key at pagebolt.dev (free: 100 requests/month, no credit card)
- Replace full_html_dump with inspect_page(url)
- Watch your token costs drop by 90%
- Scale your agent workflows without the token overhead
Your Claude agents will be smarter and cheaper.
That's the power of structural inspection.