linou518
Scraping SPAs with Chrome CDP — How I Auto-Fetch claude.ai Usage Stats


2026-03-30 | techsfree-web


How do you pull data from a web app with no public API into your own dashboard?
Try fetching HTML with requests and you get an empty shell — it's a SPA. BeautifulSoup has nothing to work with.
The answer: Chrome DevTools Protocol (CDP).


Why CDP?

The claude.ai usage page shows current token consumption rates, reset times, and overage status. Managing multiple accounts meant manually checking the browser every time — tedious.

Three options were on the table:

| Approach | Problem |
| --- | --- |
| `requests` + BeautifulSoup | SPA — no data in the initial HTML |
| Selenium / Playwright | Heavy startup cost, separate process management |
| Chrome CDP | Direct access to an already-running Chrome ✅ |

Chrome was already running with --remote-debugging-port enabled. Why not use it?
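For reference, a typical launch looks like this — the port (9225) matches the scraper code below, and the profile path is just an example:

```shell
# Launch Chrome with the DevTools debug port exposed.
# 9225 matches the scraper code; the user-data-dir is an example path
# (recent Chrome versions require a non-default profile dir for remote debugging).
google-chrome \
  --remote-debugging-port=9225 \
  --user-data-dir="$HOME/.chrome-profiles/work" &
```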


How CDP Works

CDP is a WebSocket-based protocol. Hit /json on the debug port and you get a JSON list of currently open tabs.

import urllib.request, json

tabs = json.loads(
    urllib.request.urlopen("http://127.0.0.1:9225/json").read().decode()
)
# [{"title": "...", "url": "...", "webSocketDebuggerUrl": "ws://..."}, ...]

Then connect to the target tab's webSocketDebuggerUrl and send commands.

import asyncio, websockets

async def scrape_tab(tab):
    async with websockets.connect(
        tab['webSocketDebuggerUrl'], max_size=5*1024*1024
    ) as ws:
        # Reload the page to get fresh data
        await ws.send(json.dumps({
            "id": 1, "method": "Page.reload",
            "params": {"ignoreCache": True}
        }))
        # ...

The SPA Wall: Load Complete ≠ Render Complete

This was the first major gotcha.

Waiting for Page.loadEventFired isn't enough — SPAs render the DOM afterward via JavaScript. Read document.body.innerText immediately after the event and you'll often get a loading spinner.

The fix: two-stage waiting.

# 1. Wait for Page.loadEventFired (up to 10 seconds)
deadline = asyncio.get_event_loop().time() + 10
while asyncio.get_event_loop().time() < deadline:
    try:
        msg = json.loads(await asyncio.wait_for(ws.recv(), 2))
    except asyncio.TimeoutError:
        continue  # nothing received in this window — keep waiting
    if msg.get('method') == 'Page.loadEventFired':
        break

# 2. Extra sleep for React rendering
await asyncio.sleep(3)  # ← this matters

3 seconds is empirical. 500ms wasn't enough. Tune this per site.


Extracting Data via JS Injection

Once the DOM has settled, inject JavaScript with Runtime.evaluate.

Since the data isn't structured in the DOM, I parse document.body.innerText line by line:

(function() {
    const lines = document.body.innerText
        .split('\n').map(l => l.trim()).filter(l => l);

    const r = { session_pct: null, weekly_all_pct: null };

    for (let i = 0; i < lines.length; i++) {
        const line = lines[i];

        // "42% used" → extract percentage
        const pct = line.match(/^(\d+)%\s*used$/);
        if (pct) {
            if (r.session_pct === null) r.session_pct = parseInt(pct[1]);
        }

        // "Resets in 2h 30m" → reset time
        const ri = line.match(/^Resets in (\d+) hr (\d+) min$/);
        if (ri) r.session_reset = `${ri[1]}h ${ri[2]}m`;
    }

    return JSON.stringify(r);
})()

It's regex guerrilla warfare against unstructured text — but as long as the display logic doesn't change, it's surprisingly stable.
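On the Python side, the snippet above gets shipped over the WebSocket as a `Runtime.evaluate` command with `returnByValue: True`, and the stringified JSON comes back in the matching response. A minimal sketch — the helper names (`evaluate_msg`, `extract_value`) are mine, not part of the post's code:

```python
import json

def evaluate_msg(msg_id: int, expression: str) -> str:
    """Build a CDP Runtime.evaluate command frame (helper name is mine)."""
    return json.dumps({
        "id": msg_id,
        "method": "Runtime.evaluate",
        "params": {"expression": expression, "returnByValue": True},
    })

def extract_value(raw: str, msg_id: int):
    """Parse the JS return value out of the CDP response with our id.

    Returns None for unrelated frames (events, other commands),
    so the caller can keep reading from the socket.
    """
    msg = json.loads(raw)
    if msg.get("id") != msg_id:
        return None
    # The injected JS returns JSON.stringify(r), so decode it again.
    return json.loads(msg["result"]["result"]["value"])
```

Usage: `await ws.send(evaluate_msg(2, js_src))`, then loop on `ws.recv()` until `extract_value(frame, 2)` returns something other than `None`.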


Multiple Account Support

Multiple accounts run in separate Chrome profiles. Keep the usage page open in each and CDP returns all the tabs at once.

usage_tabs = [
    t for t in tabs if "claude.ai/settings/usage" in t.get('url', '')
]
tasks = [scrape_tab(t) for t in usage_tabs]
results = await asyncio.gather(*tasks, return_exceptions=True)

asyncio.gather runs them in parallel. Multiple tabs finish in seconds.

Account names are extracted from page text and mapped to display labels like "Work" or "Personal."
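That mapping step can be as simple as grabbing the first email-looking token from the scraped text and looking it up in a small table. A sketch under my own assumptions — the emails, labels, and `label_for` helper are all hypothetical:

```python
import re

# Hypothetical mapping from the account email found in the page text
# to a short dashboard label — fill in your own accounts.
LABELS = {"me@company.com": "Work", "me@gmail.com": "Personal"}

def label_for(page_text: str) -> str:
    """Find the first email-like token and map it to a display label."""
    m = re.search(r"[\w.+-]+@[\w-]+\.[\w.]+", page_text)
    if not m:
        return "Unknown"
    email = m.group(0).lower()
    # Fall back to the raw email for accounts without a label.
    return LABELS.get(email, email)
```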


Dashboard Integration

A Flask endpoint serves the scraped data with a 5-minute cache. The frontend fetches once on page load — no polling needed.

[Chrome browser] ←WebSocket→ [Python scraper] → [Flask cache] → [Dashboard UI]
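The post doesn't show the Flask endpoint itself, but the 5-minute cache it sits on can be sketched with the standard library alone — `get_usage` and the injectable `now` clock are my names, added so the time logic is easy to test:

```python
import time

CACHE_TTL = 300  # seconds — the 5-minute window from the post
_cache = {"at": 0.0, "data": None}

def get_usage(fetch, now=time.monotonic):
    """Return cached scrape results; re-run `fetch` only when stale.

    `fetch` is whatever kicks off the CDP scrape and returns its results.
    A Flask route would just `return jsonify(get_usage(run_scrape))`.
    """
    if _cache["data"] is None or now() - _cache["at"] > CACHE_TTL:
        _cache["data"] = fetch()
        _cache["at"] = now()
    return _cache["data"]
```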

The morning ritual of "how much quota have I used this month?" is now answered by just opening the dashboard.


Summary & Gotchas

Key points of the CDP approach:

  • Pros: Reuses an existing Chrome session, no authentication needed, lightweight
  • Cons: Fragile to page structure changes, requires Chrome to be running
  • Watch out: Pass max_size=5*1024*1024 to websockets.connect — large Runtime.evaluate responses blow past the library's 1 MiB default, and the connection gets closed with a message-too-big error rather than truncated

When you need data from a SaaS with no public API, CDP is a far lighter alternative to Selenium. It's become a reliable tool in my homelab toolkit.
