linou518
Scraping SPAs with Chrome CDP — How I Auto-Fetch claude.ai Usage Stats


2026-03-30 | techsfree-web


How do you pull data from a web app with no public API into your own dashboard?
Try fetching HTML with requests and you get an empty shell — it's a SPA. BeautifulSoup has nothing to work with.
The answer: Chrome DevTools Protocol (CDP).


Why CDP?

The claude.ai usage page shows current token consumption rates, reset times, and overage status. Managing multiple accounts meant manually checking the browser every time — tedious.

Three options were on the table:

| Approach | Problem |
| --- | --- |
| `requests` + BeautifulSoup | SPA — no data in the initial HTML |
| Selenium / Playwright | Heavy startup cost, separate process management |
| Chrome CDP | Direct access to an already-running Chrome ✅ |

Chrome was already running with --remote-debugging-port enabled. Why not use it?
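For reference, a typical launch looks like this — the port (9225) matches the scraper code below, and the profile path is just an example:

```shell
# Launch Chrome with the DevTools debug port exposed.
# 9225 matches the scraper code; the user-data-dir is an example path
# (recent Chrome versions require a non-default profile dir for remote debugging).
google-chrome \
  --remote-debugging-port=9225 \
  --user-data-dir="$HOME/.chrome-profiles/work" &
```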


How CDP Works

CDP is a WebSocket-based protocol. Hit /json on the debug port and you get a JSON list of currently open tabs.

import urllib.request, json

tabs = json.loads(
    urllib.request.urlopen("http://127.0.0.1:9225/json").read().decode()
)
# [{"title": "...", "url": "...", "webSocketDebuggerUrl": "ws://..."}, ...]

Then connect to the target tab's webSocketDebuggerUrl and send commands.

import asyncio, websockets

async def scrape_tab(tab):
    async with websockets.connect(
        tab['webSocketDebuggerUrl'], max_size=5*1024*1024
    ) as ws:
        # Reload the page to get fresh data
        await ws.send(json.dumps({
            "id": 1, "method": "Page.reload",
            "params": {"ignoreCache": True}
        }))
        # ...

The SPA Wall: Load Complete ≠ Render Complete

This was the first major gotcha.

Waiting for Page.loadEventFired isn't enough — SPAs render the DOM afterward via JavaScript. Read document.body.innerText immediately after the event and you'll often get a loading spinner.

The fix: two-stage waiting.

# 1. Wait for Page.loadEventFired (up to 10 seconds)
deadline = asyncio.get_event_loop().time() + 10
while asyncio.get_event_loop().time() < deadline:
    try:
        msg = json.loads(await asyncio.wait_for(ws.recv(), 2))
    except asyncio.TimeoutError:
        continue  # nothing received in this window — keep waiting
    if msg.get('method') == 'Page.loadEventFired':
        break

# 2. Extra sleep for React rendering
await asyncio.sleep(3)  # ← this matters

3 seconds is empirical. 500ms wasn't enough. Tune this per site.


Extracting Data via JS Injection

Once the DOM has settled, inject JavaScript with Runtime.evaluate.

Since the data isn't structured in the DOM, I parse document.body.innerText line by line:

(function() {
    const lines = document.body.innerText
        .split('\n').map(l => l.trim()).filter(l => l);

    const r = { session_pct: null, weekly_all_pct: null };

    for (let i = 0; i < lines.length; i++) {
        const line = lines[i];

        // "42% used" → extract percentage
        const pct = line.match(/^(\d+)%\s*used$/);
        if (pct) {
            if (r.session_pct === null) r.session_pct = parseInt(pct[1]);
        }

        // "Resets in 2h 30m" → reset time
        const ri = line.match(/^Resets in (\d+) hr (\d+) min$/);
        if (ri) r.session_reset = `${ri[1]}h ${ri[2]}m`;
    }

    return JSON.stringify(r);
})()

It's regex guerrilla warfare against unstructured text — but as long as the display logic doesn't change, it's surprisingly stable.
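On the Python side, the snippet above gets shipped over the WebSocket as a `Runtime.evaluate` command with `returnByValue: True`, and the stringified JSON comes back in the matching response. A minimal sketch — the helper names (`evaluate_msg`, `extract_value`) are mine, not part of the post's code:

```python
import json

def evaluate_msg(msg_id: int, expression: str) -> str:
    """Build a CDP Runtime.evaluate command frame (helper name is mine)."""
    return json.dumps({
        "id": msg_id,
        "method": "Runtime.evaluate",
        "params": {"expression": expression, "returnByValue": True},
    })

def extract_value(raw: str, msg_id: int):
    """Parse the JS return value out of the CDP response with our id.

    Returns None for unrelated frames (events, other commands),
    so the caller can keep reading from the socket.
    """
    msg = json.loads(raw)
    if msg.get("id") != msg_id:
        return None
    # The injected JS returns JSON.stringify(r), so decode it again.
    return json.loads(msg["result"]["result"]["value"])
```

Usage: `await ws.send(evaluate_msg(2, js_src))`, then loop on `ws.recv()` until `extract_value(frame, 2)` returns something other than `None`.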


Multiple Account Support

Multiple accounts run in separate Chrome profiles. Keep the usage page open in each and CDP returns all the tabs at once.

usage_tabs = [
    t for t in tabs if "claude.ai/settings/usage" in t.get('url', '')
]
tasks = [scrape_tab(t) for t in usage_tabs]
results = await asyncio.gather(*tasks, return_exceptions=True)

asyncio.gather runs them in parallel. Multiple tabs finish in seconds.

Account names are extracted from page text and mapped to display labels like "Work" or "Personal."
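That mapping step can be as simple as grabbing the first email-looking token from the scraped text and looking it up in a small table. A sketch under my own assumptions — the emails, labels, and `label_for` helper are all hypothetical:

```python
import re

# Hypothetical mapping from the account email found in the page text
# to a short dashboard label — fill in your own accounts.
LABELS = {"me@company.com": "Work", "me@gmail.com": "Personal"}

def label_for(page_text: str) -> str:
    """Find the first email-like token and map it to a display label."""
    m = re.search(r"[\w.+-]+@[\w-]+\.[\w.]+", page_text)
    if not m:
        return "Unknown"
    email = m.group(0).lower()
    # Fall back to the raw email for accounts without a label.
    return LABELS.get(email, email)
```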


Dashboard Integration

A Flask endpoint serves the scraped data with a 5-minute cache. The frontend fetches once on page load — no polling needed.

[Chrome browser] ←WebSocket→ [Python scraper] → [Flask cache] → [Dashboard UI]
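The post doesn't show the Flask endpoint itself, but the 5-minute cache it sits on can be sketched with the standard library alone — `get_usage` and the injectable `now` clock are my names, added so the time logic is easy to test:

```python
import time

CACHE_TTL = 300  # seconds — the 5-minute window from the post
_cache = {"at": 0.0, "data": None}

def get_usage(fetch, now=time.monotonic):
    """Return cached scrape results; re-run `fetch` only when stale.

    `fetch` is whatever kicks off the CDP scrape and returns its results.
    A Flask route would just `return jsonify(get_usage(run_scrape))`.
    """
    if _cache["data"] is None or now() - _cache["at"] > CACHE_TTL:
        _cache["data"] = fetch()
        _cache["at"] = now()
    return _cache["data"]
```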

The morning ritual of "how much quota have I used this month?" is now answered by just opening the dashboard.


Summary & Gotchas

Key points of the CDP approach:

  • Pros: Reuses an existing Chrome session, no authentication needed, lightweight
  • Cons: Fragile to page structure changes, requires Chrome to be running
  • Watch out: Pass max_size=5*1024*1024 to websockets.connect — large Runtime.evaluate responses blow past the library's 1 MiB default, and the connection gets closed with a message-too-big error rather than truncated

When you need data from a SaaS with no public API, CDP is a far lighter alternative to Selenium. It's become a reliable tool in my homelab toolkit.
