Scraping SPAs with Chrome CDP — How I Auto-Fetch claude.ai Usage Stats
2026-03-30 | techsfree-web
How do you pull data from a web app with no public API into your own dashboard?
Try fetching HTML with requests and you get an empty shell — it's a SPA. BeautifulSoup has nothing to work with.
The answer: Chrome DevTools Protocol (CDP).
Why CDP?
The claude.ai usage page shows current token consumption rates, reset times, and overage status. Managing multiple accounts meant manually checking the browser every time — tedious.
Three options were on the table:
| Approach | Problem |
|---|---|
| requests + BeautifulSoup | SPA: no data in the initial HTML |
| Selenium / Playwright | Heavy startup cost, separate process management |
| Chrome CDP | Direct access to an already-running Chrome ✅ |
Chrome was already running with --remote-debugging-port enabled. Why not use it?
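For reference, launching Chrome with the debug port looks roughly like this. The port 9225 matches the scripts below; the profile directory is my own choice (recent Chrome builds refuse remote debugging on the default profile, so a dedicated --user-data-dir is the safe assumption):

```shell
# Dedicated profile dir; port must match the scraper's URL
google-chrome \
  --remote-debugging-port=9225 \
  --user-data-dir="$HOME/.chrome-cdp-profile" &

# Sanity check: list the open tabs as JSON
curl -s http://127.0.0.1:9225/json | head
```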
How CDP Works
CDP is a WebSocket-based protocol. Hit /json on the debug port and you get a JSON list of currently open tabs.
```python
import urllib.request, json

tabs = json.loads(
    urllib.request.urlopen("http://127.0.0.1:9225/json").read().decode()
)
# [{"title": "...", "url": "...", "webSocketDebuggerUrl": "ws://..."}, ...]
```
Then connect to the target tab's webSocketDebuggerUrl and send commands.
```python
import asyncio, json, websockets

async def scrape_tab(tab):
    async with websockets.connect(
        tab['webSocketDebuggerUrl'], max_size=5*1024*1024
    ) as ws:
        # Enable Page events first, otherwise Page.loadEventFired never arrives
        await ws.send(json.dumps({"id": 1, "method": "Page.enable"}))
        # Reload the page to get fresh data
        await ws.send(json.dumps({
            "id": 2, "method": "Page.reload",
            "params": {"ignoreCache": True}
        }))
        # ...
```
The SPA Wall: Load Complete ≠ Render Complete
This was the first major gotcha.
Waiting for Page.loadEventFired isn't enough — SPAs render the DOM afterward via JavaScript. Read document.body.innerText immediately after the event and you'll often get a loading spinner.
The fix: two-stage waiting.
```python
# 1. Wait for Page.loadEventFired (up to 10 seconds)
deadline = asyncio.get_event_loop().time() + 10
while asyncio.get_event_loop().time() < deadline:
    try:
        msg = json.loads(await asyncio.wait_for(ws.recv(), 2))
    except asyncio.TimeoutError:
        continue  # nothing for 2 s; keep waiting until the deadline
    if msg.get('method') == 'Page.loadEventFired':
        break

# 2. Extra sleep for React rendering
await asyncio.sleep(3)  # ← this matters
```
3 seconds is empirical. 500ms wasn't enough. Tune this per site.
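An alternative to the fixed sleep is to poll the page until a marker string you expect in the rendered output actually appears. This is a sketch on top of the same websockets setup; wait_for_marker, the marker argument, and the starting message id are my own names, not from the original script:

```python
import asyncio, json

def evaluate_payload(msg_id, expression):
    """Build a Runtime.evaluate request; returnByValue yields a plain value."""
    return json.dumps({
        "id": msg_id,
        "method": "Runtime.evaluate",
        "params": {"expression": expression, "returnByValue": True},
    })

async def wait_for_marker(ws, marker, timeout=10.0, interval=0.5):
    """Poll until `marker` (e.g. "% used") appears in document.body.innerText."""
    expr = f"document.body.innerText.includes({json.dumps(marker)})"
    deadline = asyncio.get_event_loop().time() + timeout
    msg_id = 1000
    while asyncio.get_event_loop().time() < deadline:
        msg_id += 1
        await ws.send(evaluate_payload(msg_id, expr))
        while True:  # CDP interleaves events with command replies
            reply = json.loads(await ws.recv())
            if reply.get("id") == msg_id:
                break
        if reply.get("result", {}).get("result", {}).get("value"):
            return True
        await asyncio.sleep(interval)
    return False
```

This exits as soon as the data shows up instead of always paying the worst-case delay, at the cost of a few extra round trips.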
Extracting Data via JS Injection
Once the DOM has settled, inject JavaScript with Runtime.evaluate.
Since the data isn't structured in the DOM, I parse document.body.innerText line by line:
```javascript
(function () {
  const lines = document.body.innerText
    .split('\n').map(l => l.trim()).filter(l => l);
  const r = { session_pct: null, weekly_all_pct: null, session_reset: null };
  for (let i = 0; i < lines.length; i++) {
    const line = lines[i];
    // "42% used" → extract the percentage
    const pct = line.match(/^(\d+)%\s*used$/);
    if (pct && r.session_pct === null) r.session_pct = parseInt(pct[1], 10);
    // "Resets in 2 hr 30 min" → reset time
    const ri = line.match(/^Resets in (\d+) hr (\d+) min$/);
    if (ri) r.session_reset = `${ri[1]}h ${ri[2]}m`;
  }
  return JSON.stringify(r);
})()
```
It's regex guerrilla warfare against unstructured text — but as long as the display logic doesn't change, it's surprisingly stable.
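On the Python side, the Runtime.evaluate reply wraps that JSON.stringify output one level deep, so the page data arrives double-encoded. A small unwrapping helper (parse_evaluate_reply is my name; the reply shape follows the CDP Runtime.evaluate response format):

```python
import json

def parse_evaluate_reply(reply_text):
    """Unwrap a Runtime.evaluate reply whose expression ended in
    JSON.stringify(...): decode the envelope, then the payload string."""
    reply = json.loads(reply_text)
    value = reply["result"]["result"]["value"]  # the stringified page object
    return json.loads(value)
```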
Multiple Account Support
Multiple accounts run in separate Chrome profiles. Keep the usage page open in each and CDP returns all the tabs at once.
```python
usage_tabs = [
    t for t in tabs if "claude.ai/settings/usage" in t.get('url', '')
]
tasks = [scrape_tab(t) for t in usage_tabs]
results = await asyncio.gather(*tasks, return_exceptions=True)
```
asyncio.gather runs the tabs concurrently, so even several accounts finish within seconds.
Account names are extracted from page text and mapped to display labels like "Work" or "Personal."
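The mapping itself is just a dictionary keyed on the scraped account name. A minimal sketch with hypothetical account names; the real keys depend on what the page exposes:

```python
# Hypothetical scraped-name → display-label mapping; adjust per account
ACCOUNT_LABELS = {
    "me@company.example": "Work",
    "me@home.example": "Personal",
}

def label_for(account_name):
    """Fall back to the raw scraped name when no label is configured."""
    return ACCOUNT_LABELS.get(account_name, account_name)
```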
Dashboard Integration
A Flask endpoint serves the scraped data with a 5-minute cache. The frontend fetches once on page load — no polling needed.
[Chrome browser] ←WebSocket→ [Python scraper] → [Flask cache] → [Dashboard UI]
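The 5-minute cache doesn't need anything fancy. A minimal stdlib sketch, assuming the Flask route simply calls cache.get(fetch_usage) where fetch_usage wraps the CDP scraper (the class and its injectable clock are my own construction, added so the behavior is easy to test):

```python
import time

class TTLCache:
    """Cache a single value, refreshing it when older than `ttl` seconds."""

    def __init__(self, ttl=300, clock=time.time):
        self.ttl = ttl
        self.clock = clock          # injectable for testing
        self._value = None
        self._ts = None             # timestamp of last refresh

    def get(self, refresh):
        """Return the cached value, calling refresh() if it has expired."""
        now = self.clock()
        if self._ts is None or now - self._ts > self.ttl:
            self._value = refresh()
            self._ts = now
        return self._value
```

The Flask endpoint then stays a one-liner body, and the frontend's single fetch on page load never triggers more than one scrape per five minutes.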
The morning ritual of "how much quota have I used this month?" is now answered by just opening the dashboard.
Summary & Gotchas
Key points of the CDP approach:
- Pros: reuses an existing logged-in Chrome session (no separate authentication), lightweight
- Cons: fragile to page-structure changes, requires Chrome to be running
- Watch out: pass max_size=5*1024*1024 to websockets.connect; the library's default limit is 1 MiB, and an oversized CDP payload closes the connection rather than being truncated
When you need data from a SaaS with no public API, CDP is a far lighter alternative to Selenium. It's become a reliable tool in my homelab toolkit.