DEV Community

Custodia-Admin

Posted on • Originally published at pagebolt.dev

Why AI Agents Use inspect_page Instead of Dumping the Full DOM

You're building a Claude agent to automate web tasks. The agent needs to navigate a page and interact with buttons, forms, and links.

Your first instinct: get the full HTML and let Claude parse it.

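import anthropic

client = anthropic.Anthropic()  # reads ANTHROPIC_API_KEY from the environment

# full_html_dump: the page's raw HTML as a string
response = client.messages.create(
    model="claude-3-5-sonnet-20241022",
    max_tokens=256,  # required by the Messages API
    messages=[{
        "role": "user",
        "content": f"Here's the page HTML:\n\n{full_html_dump}"
    }]
)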

Simple. Direct. Extremely expensive.

That full HTML dump is 8,000+ tokens. Claude 3.5 Sonnet charges $3 per 1M input tokens, so one page costs about $0.024 per agent query. Scale to 100 queries a day, and you're spending about $2.40/day just on token overhead.

But here's the thing: Claude doesn't need the full DOM. It needs to know what it can interact with.


The Problem: DOM Bloat

A typical website's HTML includes:

  • Layout divs (nesting chains, 50+ levels deep)
  • CSS classes and inline styles (framework boilerplate, Tailwind utilities)
  • Script tags and data attributes
  • Comment nodes and meta tags
  • Images, videos, analytics trackers

Result: 10,000+ DOM nodes for a page that has maybe 50 interactive elements.

The agent needs to know:

  • Button at coordinates X saying "Submit"
  • Input field for "email"
  • Link to "/checkout"

It doesn't need to know:

  • The 200-line CSS in a <style> tag
  • The 500 nested divs from the framework
  • The tracking pixels and analytics

But when you dump the full DOM, Claude has to parse all of it. Tokens wasted. Money wasted.
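To get a feel for that ratio on your own pages, here is a minimal sketch that counts all element tags against the interactive subset using Python's standard `html.parser`. The `INTERACTIVE_TAGS` set and the `sample_html` snippet are assumptions made for illustration; they are not how inspect_page itself classifies elements.

```python
from html.parser import HTMLParser

# Tags treated as "interactive" in this sketch (an assumption, not
# PageBolt's actual criteria).
INTERACTIVE_TAGS = {"a", "button", "input", "select", "textarea"}


class ElementCounter(HTMLParser):
    """Counts every element start tag and the interactive subset."""

    def __init__(self):
        super().__init__()
        self.total = 0
        self.interactive = 0

    def handle_starttag(self, tag, attrs):
        # The parser lowercases tag names before calling this hook.
        self.total += 1
        if tag in INTERACTIVE_TAGS:
            self.interactive += 1


def count_elements(html):
    """Return (total_elements, interactive_elements) for an HTML string."""
    counter = ElementCounter()
    counter.feed(html)
    counter.close()
    return counter.total, counter.interactive


# Hypothetical page fragment: wrapper divs around three useful elements.
sample_html = """
<div class="wrapper"><div class="container mx-auto px-4">
  <form>
    <div class="field"><input type="email" name="email"></div>
    <div class="actions"><button id="submit-btn">Submit</button></div>
  </form>
  <a href="/checkout">Checkout</a>
</div></div>
"""

total, interactive = count_elements(sample_html)
print(f"{interactive} interactive out of {total} elements")
# prints "3 interactive out of 8 elements"
```

Even in this tiny fragment most elements are layout wrappers; on a real page the gap is typically much wider.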


The Solution: Structured Element Inspection

Instead of dumping the full DOM, inspect only the interactive elements.

PageBolt's inspect_page does exactly this:

import json
import urllib.request

def inspect_page(url):
    """Get structured map of interactive elements only"""
    api_key = "YOUR_API_KEY"  # pagebolt.dev

    payload = json.dumps({"url": url}).encode()
    req = urllib.request.Request(
        'https://pagebolt.dev/api/v1/inspect',
        data=payload,
        headers={'x-api-key': api_key, 'Content-Type': 'application/json'},
        method='POST'
    )

    with urllib.request.urlopen(req) as resp:
        return json.loads(resp.read())

Returns:

{
  "buttons": [
    {"text": "Submit", "selector": "#submit-btn", "type": "primary"},
    {"text": "Cancel", "selector": ".cancel-btn", "type": "secondary"}
  ],
  "inputs": [
    {"name": "email", "selector": "#email-field", "type": "email"},
    {"name": "password", "selector": "#password-field", "type": "password"}
  ],
  "links": [
    {"text": "Forgot password?", "href": "/forgot", "selector": "a.forgot"}
  ],
  "headings": [
    {"text": "Sign In", "level": "h1"}
  ]
}

For a typical page, that's roughly 500 tokens instead of 8,000.
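How the map is serialized before it goes into the prompt also affects its size. The sketch below compares pretty-printed and compact JSON for the example response above. Character count is only a rough proxy for tokens, so treat the comparison as indicative rather than exact.

```python
import json

# Element map copied from the example response above.
page_elements = {
    "buttons": [
        {"text": "Submit", "selector": "#submit-btn", "type": "primary"},
        {"text": "Cancel", "selector": ".cancel-btn", "type": "secondary"},
    ],
    "inputs": [
        {"name": "email", "selector": "#email-field", "type": "email"},
        {"name": "password", "selector": "#password-field", "type": "password"},
    ],
    "links": [
        {"text": "Forgot password?", "href": "/forgot", "selector": "a.forgot"},
    ],
    "headings": [
        {"text": "Sign In", "level": "h1"},
    ],
}

# indent=2 adds newlines and leading spaces on every line.
pretty = json.dumps(page_elements, indent=2)

# Compact separators drop all optional whitespace.
compact = json.dumps(page_elements, separators=(",", ":"))

print(f"pretty: {len(pretty)} chars, compact: {len(compact)} chars")
```

Both strings decode to the same data, so the compact form can be sent to the model without losing any information.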


Token Cost Comparison

Full DOM approach:

Page HTML: 8,000 tokens
Agent reasoning: 200 tokens
Response: 100 tokens
TOTAL: 8,300 tokens per query

inspect_page approach:

Structured element map: 500 tokens
Agent reasoning: 200 tokens
Response: 100 tokens
TOTAL: 800 tokens per query

Savings: 90% reduction in tokens

100 agent queries (every token priced at the $3 per 1M input rate):

  • Full DOM: $2.49
  • inspect_page: $0.24

Scale to 10,000 queries a month, and you're saving about $225/month. For a startup, that's meaningful.
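The arithmetic behind these figures can be checked with a short script. This is a simplified model: it prices every token at the $3 per 1M input rate quoted earlier and uses the per-query token estimates from the two breakdowns above.

```python
PRICE_PER_MILLION = 3.00  # USD per 1M tokens (input rate quoted earlier)

FULL_DOM_TOKENS = 8_000 + 200 + 100      # 8,300 tokens per query
INSPECT_PAGE_TOKENS = 500 + 200 + 100    # 800 tokens per query


def cost_usd(tokens_per_query, queries):
    """Cost of `queries` calls, pricing every token at the input rate."""
    return tokens_per_query * queries * PRICE_PER_MILLION / 1_000_000


for queries in (100, 10_000):
    full = cost_usd(FULL_DOM_TOKENS, queries)
    slim = cost_usd(INSPECT_PAGE_TOKENS, queries)
    print(f"{queries} queries: full DOM ${full:.2f}, "
          f"inspect_page ${slim:.2f}, saved ${full - slim:.2f}")

reduction = 1 - INSPECT_PAGE_TOKENS / FULL_DOM_TOKENS
print(f"Token reduction: {reduction:.1%}")
```

Swapping in your own measured token counts gives a more realistic estimate than the round numbers used here.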


Real Example: Multi-Page Automation

Build an agent that navigates 5 pages to complete a workflow:

import anthropic
import json
import urllib.request

client = anthropic.Anthropic()

def inspect_page(url):
    api_key = "YOUR_API_KEY"
    payload = json.dumps({"url": url}).encode()
    req = urllib.request.Request(
        'https://pagebolt.dev/api/v1/inspect',
        data=payload,
        headers={'x-api-key': api_key, 'Content-Type': 'application/json'},
        method='POST'
    )
    with urllib.request.urlopen(req) as resp:
        return json.loads(resp.read())

def automate_workflow(task):
    """Agent navigates multiple pages efficiently"""

    pages = [
        "https://example.com/login",
        "https://example.com/account",
        "https://example.com/settings",
        "https://example.com/billing",
        "https://example.com/confirm"
    ]

    total_tokens_used = 0

    for page_url in pages:
        # Inspect the page (not dump full HTML)
        page_elements = inspect_page(page_url)

        # Send structured elements to Claude
        response = client.messages.create(
            model="claude-3-5-sonnet-20241022",
            max_tokens=256,
            messages=[{
                "role": "user",
                "content": f"""
Task: {task}

Current page elements:
{json.dumps(page_elements, indent=2)}

What should the agent do next? Respond with a single action."""
            }]
        )

        # Track tokens
        total_tokens_used += response.usage.input_tokens + response.usage.output_tokens
        print(f"Page {page_url}: {response.content[0].text}")

    print(f"\nTotal tokens for 5-page workflow: {total_tokens_used}")
    # Rough estimate: prices every token at the $3 per 1M input rate
    print(f"Cost: ${(total_tokens_used / 1_000_000) * 3:.3f}")

# Run the agent
automate_workflow("Complete the account setup and enable 2FA")

With full DOM: ~41,500 tokens, ~$0.12
With inspect_page: ~4,000 tokens, ~$0.01

That's 10x cheaper. Same automation. Same results.
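The cost line in the script prices every token at the input rate, but output tokens are billed at a higher rate. A hedged sketch of a more careful estimate follows. The $15 per 1M output price is an assumed list price that should be checked against current pricing, and the split of each query into 700 input and 100 output tokens is an assumption based on the breakdown earlier in this post.

```python
# Assumed list prices in USD per 1M tokens; verify before relying on them.
INPUT_PRICE = 3.00
OUTPUT_PRICE = 15.00


def workflow_cost(usages):
    """Sum the cost over (input_tokens, output_tokens) pairs, one per API call."""
    input_total = sum(inp for inp, _ in usages)
    output_total = sum(out for _, out in usages)
    return (input_total * INPUT_PRICE + output_total * OUTPUT_PRICE) / 1_000_000


# Five pages each. Assumed split: element map or HTML plus 200 tokens of
# task text as input, and a 100-token reply as output.
inspect_usages = [(500 + 200, 100)] * 5
full_dom_usages = [(8_000 + 200, 100)] * 5

print(f"inspect_page: ${workflow_cost(inspect_usages):.4f}")
print(f"full DOM:     ${workflow_cost(full_dom_usages):.4f}")
```

In the loop above, each pair would come from `response.usage.input_tokens` and `response.usage.output_tokens`. Because the output cost is the same in both approaches, the saving in absolute dollars is unchanged, though the ratio between the two totals shrinks somewhat.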


Why This Matters

Token cost is becoming the limiting factor for AI automation. As agents run longer workflows and access more pages, efficiency compounds.

Claude agents are already cheaper than hiring humans. But inefficient agent implementations waste that advantage.

The lesson: give your agent exactly the information it needs, not everything. Your token bill and your agent's reasoning speed will both improve.


Try It Now

  1. Get your API key at pagebolt.dev (free: 100 requests/month, no credit card)
  2. Replace full_html_dump with inspect_page(url)
  3. Watch your token costs drop by 90%
  4. Scale your agent workflows without the token overhead
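When wiring the key from step 1 into real code, it may be safer to read it from the environment than to hardcode it. The sketch below is one way to do that. `PAGEBOLT_API_KEY` and `build_inspect_request` are hypothetical names chosen for this example, and the 30-second timeout is an arbitrary choice; the endpoint and headers are the ones used earlier in this post.

```python
import json
import os
import urllib.request

INSPECT_URL = "https://pagebolt.dev/api/v1/inspect"  # endpoint used earlier in this post


def build_inspect_request(url, api_key=None):
    """Build (but do not send) the POST request for the inspect endpoint."""
    # PAGEBOLT_API_KEY is a hypothetical variable name, not an official one.
    api_key = api_key or os.environ.get("PAGEBOLT_API_KEY")
    if not api_key:
        raise RuntimeError("Set PAGEBOLT_API_KEY or pass api_key explicitly")
    return urllib.request.Request(
        INSPECT_URL,
        data=json.dumps({"url": url}).encode(),
        headers={"x-api-key": api_key, "Content-Type": "application/json"},
        method="POST",
    )


def inspect_page(url, timeout=30):
    """Send the request and decode the JSON body; fails fast on a hung connection."""
    request = build_inspect_request(url)
    with urllib.request.urlopen(request, timeout=timeout) as resp:
        return json.loads(resp.read())
```

Separating request construction from sending also makes the function easy to check without touching the network.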

Your Claude agents will be smarter and cheaper.

That's the power of structural inspection.
