Before you automate a browser, check the network tab

#webdev #automation #scraping #api

You need data from a website, so the first instinct is often Playwright, Puppeteer, Selenium, or an AI browser agent. That works, but it is easy to miss something important: the browser probably did not create the data you want. It fetched it from an internal API and rendered it.

Open DevTools on a product page, job board, travel site, or finance dashboard. Filter the Network tab to Fetch/XHR and reload the page. Much of the data the page displays is often already there as JSON responses.

The browser is often the expensive part

A browser automation script has real costs:

Chromium has to start or be kept warm
The page has to load assets you may not care about
Selectors break when the UI changes
Timeouts happen when ads, modals, or lazy loading behave differently
If an LLM agent drives the browser, every step may involve screenshots, DOM snapshots, and model calls

For example, this Playwright script can extract product data from rendered HTML:

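import { chromium } from "playwright";

// Launch a headless Chromium instance for a single extraction.
const browser = await chromium.launch();

try {
  const page = await browser.newPage();

  // Wait until the network goes quiet so client-rendered content can appear.
  await page.goto("https://example.com/products/123", {
    waitUntil: "networkidle",
    timeout: 30000,
  });

  // Read the values out of the rendered DOM.
  const price = await page.locator("[data-testid='price']").innerText();
  const title = await page.locator("h1").innerText();

  console.log({ title, price });
} finally {
  // Close the browser even if a locator times out.
  await browser.close();
}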

This works until the site changes the data-testid value, renders the price after a delayed client-side request, or shows a cookie modal that covers the page. The failure usually looks boring:

locator.innerText: Timeout 30000ms exceeded.
Call log:
  - waiting for locator('[data-testid="price"]')

That error tells you the selector failed, not whether the data was unavailable. The JSON endpoint may still have returned the price correctly.
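If you do stay in the browser, one way to separate those two failure modes is to wait for the underlying response instead of the rendered element. The following is a minimal sketch, not a drop-in implementation. It assumes the page loads its data from a URL containing /api/products/ (a hypothetical path; the Network tab shows the real one). captureJson is an illustrative helper name, while waitForResponse, url(), ok(), and json() are Playwright's own Page and Response methods.

```javascript
// Hypothetical helper: resolve with the JSON body of the first successful
// response whose URL contains `urlPart`, instead of reading the rendered DOM.
// `page` is a Playwright Page (or any object with the same waitForResponse shape).
async function captureJson(page, urlPart, navigate) {
  // Start listening before navigating so the response cannot be missed.
  const [response] = await Promise.all([
    page.waitForResponse(
      (candidate) => candidate.url().includes(urlPart) && candidate.ok()
    ),
    navigate(),
  ]);
  return response.json();
}
```

With a real page this might be called as captureJson(page, "/api/products/", () => page.goto("https://example.com/products/123")). If it times out, the request itself never succeeded, which is a different problem from a renamed selector.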

Start by finding the data source

Before writing browser automation, capture the request that populated the page.

In Chrome DevTools:

Open Network
Filter by Fetch/XHR
Reload the page
Click likely requests
Check Response and Preview for structured JSON
Right-click the request and choose Copy > Copy as cURL

You might end up with something like this:

curl 'https://www.example.com/api/products/123?currency=USD' \
  -H 'accept: application/json' \
  -H 'user-agent: Mozilla/5.0' \
  -H 'x-client-version: web-2026.06.1'

If that works outside the browser, you can usually replace a browser job with a normal HTTP client:

const res = await fetch("https://www.example.com/api/products/123?currency=USD", {
  headers: {
    accept: "application/json",
    "user-agent": "Mozilla/5.0",
    "x-client-version": "web-2026.06.1",
  },
});

if (!res.ok) {
  throw new Error(`HTTP ${res.status}: ${await res.text()}`);
}

const product = await res.json();
console.log(product.price, product.availability);

That is the clean path: less infrastructure, lower latency, fewer moving parts, and easier retries.
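To make "easier retries" concrete, here is a rough sketch of a retry wrapper with exponential backoff. The helper name fetchJsonWithRetry, the retry count, and the base delay are all assumptions chosen for illustration, as is the choice to retry only network errors, 429, and 5xx responses. fetchImpl and sleep are injectable so the logic can be exercised without a network.

```javascript
// Sketch: retry transient failures (network errors, 429, 5xx) with exponential
// backoff. Other 4xx responses are treated as permanent and thrown immediately.
async function fetchJsonWithRetry(url, init = {}, opts = {}) {
  const {
    retries = 3, // assumed default, tune per target
    baseDelayMs = 500, // assumed default, doubles on each attempt
    fetchImpl = fetch,
    sleep = (ms) => new Promise((resolve) => setTimeout(resolve, ms)),
  } = opts;

  for (let attempt = 0; ; attempt++) {
    let res;
    try {
      res = await fetchImpl(url, init);
    } catch (err) {
      // Network-level failure: retry until the budget is spent.
      if (attempt >= retries) throw err;
      await sleep(baseDelayMs * 2 ** attempt);
      continue;
    }

    if (res.ok) return res.json();

    const retryable = res.status === 429 || res.status >= 500;
    if (!retryable || attempt >= retries) {
      throw new Error(`HTTP ${res.status}: ${await res.text()}`);
    }
    await sleep(baseDelayMs * 2 ** attempt);
  }
}
```

The equivalent in a browser job usually means relaunching a page and replaying navigation, which is why retries are cheaper at the HTTP layer.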

For teams that want maintained access to these private website APIs instead of managing request signatures themselves, Wire exposes cataloged site actions as normal REST calls with structured JSON responses.

Where direct HTTP gets messy

The copied cURL request is often not enough in production.

Many sites include values that change per session or per request:

CSRF tokens
signed query parameters
device or browser fingerprints
short-lived session cookies
nonces that prevent replay
GraphQL operation hashes

The failure mode is usually a 401, a 403, or a 200 response whose JSON looks valid but carries an application-level denial:

{
  "error": "invalid_signature",
  "message": "request signature expired"
}

This is where a direct network approach stops being just “copy as cURL”. You need to understand how the frontend generates those headers and tokens. Sometimes that is straightforward. Sometimes the signing code changes weekly.
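Because a denial can hide inside an otherwise successful response, it helps to classify results explicitly instead of trusting the status code alone. A small sketch, assuming the denial arrives as a string error field like the payload above; real sites use other shapes, and the category names here are made up for illustration.

```javascript
// Sketch: classify a parsed response so an application-level denial
// is not mistaken for success. The `error` field mirrors the example
// payload above and is an assumption about the target site.
function classifyResult(status, body) {
  if (status === 401 || status === 403) return "auth_rejected";
  if (status < 200 || status >= 300) return "http_error";
  if (body !== null && typeof body === "object" && typeof body.error === "string") {
    return "app_denied";
  }
  return "ok";
}
```

Counting auth_rejected and app_denied separately makes it easier to tell an expired session from a signing scheme that changed.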

For supported sites, Wire maintains stable action identifiers while handling endpoint changes, signatures, and authenticated session details behind the API boundary.

If you build this yourself, treat the integration like any other external dependency. Add contract tests against the response shape. Alert on 403 spikes. Store HAR files for debugging. Do not assume a private endpoint is stable just because it returned JSON today.
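A contract test on the response shape can be very small. The sketch below checks only the two fields used in the fetch example earlier (price and availability); the expected types are assumptions, and contractViolations is an illustrative name rather than part of any library.

```javascript
// Sketch of a response-shape contract check. Field names come from the
// fetch example earlier in this post; the expected types are assumptions.
const PRODUCT_CONTRACT = { price: "number", availability: "string" };

function contractViolations(payload, contract = PRODUCT_CONTRACT) {
  if (payload === null || typeof payload !== "object") {
    return ["payload is not an object"];
  }
  const problems = [];
  for (const [field, expectedType] of Object.entries(contract)) {
    const actualType = typeof payload[field];
    if (actualType !== expectedType) {
      problems.push(`${field}: expected ${expectedType}, got ${actualType}`);
    }
  }
  return problems;
}
```

Running this against a live response on a schedule, and alerting when the returned list is not empty, catches a silent field rename before it reaches downstream consumers.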

When a browser is still the right tool

Direct HTTP is not always better. It is better when the data already exists as a request you can reproduce reliably.

Use browser automation when the task depends on the visual state of the page:

multi-page forms with conditional fields
portals that require interactive navigation
flows where validation messages affect the next action
sites that render data only after complex client-side state changes
tasks where a human would say “click the second matching result”

This is where tools like Skyvern make sense. Skyvern runs Chromium, observes screenshots and the DOM, asks an LLM what to do, and executes actions through Playwright. Its cached runs can replay generated Playwright scripts, which removes the LLM from successful repeat paths, but it still runs a browser.

That distinction matters. A cached browser workflow may avoid model latency, but it still pays for page load, browser memory, navigation, and UI fragility. A live AI browser agent can adapt to UI changes, but it may take many seconds per step and can fail in ways that are harder to reproduce than a bad HTTP response.

A practical decision rule

Use this order:

Check the Network tab first
If the data is available as JSON, try to reproduce the request with curl
If the request needs tokens or signatures, decide whether maintaining that logic is worth it
If the workflow is visual, stateful, or form-heavy, use browser automation
If volume is high, avoid browsers unless there is no reliable network-layer option

The next time you start a scraping or automation task, spend 10 minutes capturing a HAR file before opening Playwright. If you can turn the target action into one authenticated HTTP request with a testable JSON schema, do that first. If you cannot, then reach for the browser with a clearer reason.
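Once a HAR capture exists, triaging it can be scripted. A minimal sketch, assuming the standard HAR 1.2 layout (log.entries, each with a request and a response) and a parsed HAR object as input; jsonRequestsFromHar is an illustrative name.

```javascript
// Sketch: list the JSON-returning requests recorded in a HAR capture.
// Assumes the HAR 1.2 layout: log.entries[].request and log.entries[].response.
function jsonRequestsFromHar(har) {
  const entries = (har && har.log && har.log.entries) || [];
  return entries
    .filter((entry) => {
      const content = (entry.response && entry.response.content) || {};
      const mimeType = content.mimeType || "";
      // Matches application/json as well as variants such as application/ld+json.
      return mimeType.toLowerCase().includes("json");
    })
    .map((entry) => ({
      method: entry.request.method,
      url: entry.request.url,
      status: entry.response.status,
    }));
}
```

Feed it the parsed contents of the exported file, for example JSON.parse of the file text, and the output is a short list of candidate endpoints to try with curl.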