You start with Playwright because it works. The agent needs data from a page, so you give it a browser, navigate to the URL, wait for selectors, extract text, and move on. Then the workflow grows from 3 pages to 300. Suddenly you are debugging "Navigation timeout of 30000 ms exceeded", "Target closed", rate limits, memory pressure, and a queue full of half-dead browser sessions.
The problem is not that browser automation is bad. It is that a browser is often the wrong abstraction for getting structured data into an AI system.
Browsers are useful, but expensive
A headless browser gives you high fidelity. It runs JavaScript, stores cookies, clicks buttons, submits forms, follows client-side routing, and sees the page roughly as a user would.
That matters for:
- multi-step login flows
- pages that render all useful data client-side
- visual testing
- checkout or booking flows
- workflows that need clicks, hovers, uploads, or form submission
But if your agent only needs titles, prices, comments, abstracts, availability, or review text, a full browser can become unnecessary overhead.
Each browser session consumes CPU and memory. Parallel extraction means parallel browser contexts or instances. You also have to manage lifecycle issues: browser startup, page crashes, timeouts, proxy assignment, retries, and cleanup.
A typical browser-based extraction loop looks like this:
import { chromium } from "playwright";

const browser = await chromium.launch();
const page = await browser.newPage();
try {
  await page.goto("https://example.com/product/123", {
    waitUntil: "networkidle",
    timeout: 30_000,
  });
  const title = await page.locator("h1").innerText();
  const price = await page.locator("[data-testid='price']").innerText();
  console.log({ title, price });
} finally {
  await browser.close();
}
This is fine for a small number of pages. At scale, the failure modes stack up. The networkidle wait may never resolve because analytics requests keep running. A selector change breaks extraction. A page crash loses the whole session. If the agent launches many of these calls at once, infrastructure becomes part of the reasoning loop whether you wanted it or not.
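One way to keep a burst of agent calls from launching an unbounded number of sessions is to cap concurrency in code. The sketch below uses asyncio.Semaphore for that. Note the assumptions: fetch_page is a hypothetical stand-in for whatever actually drives a browser, and the limit of 3 is an arbitrary placeholder rather than a recommendation.

```python
import asyncio

MAX_CONCURRENT = 3  # assumed cap; tune to the CPU and memory you actually have


async def fetch_page(url: str) -> dict:
    """Hypothetical stand-in for a browser-backed extraction call."""
    await asyncio.sleep(0.01)  # simulates page load time
    return {"url": url, "title": f"title for {url}"}


async def fetch_all(urls: list[str], limit: int = MAX_CONCURRENT) -> list[dict]:
    """Fetch every URL, never running more than `limit` sessions at once."""
    semaphore = asyncio.Semaphore(limit)

    async def bounded(url: str) -> dict:
        # Each task waits here until one of the `limit` slots is free.
        async with semaphore:
            return await fetch_page(url)

    # gather returns results in the same order as the input URLs.
    return await asyncio.gather(*(bounded(u) for u in urls))


if __name__ == "__main__":
    urls = [f"https://example.com/p/{i}" for i in range(10)]
    pages = asyncio.run(fetch_all(urls))
    print(len(pages))
```

The cap does not make each session cheaper. It only turns "as many browsers as the agent asks for" into a number you chose.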
Structured extraction is a better default when the shape is known
For many agent workflows, the goal is not to interact with the site. The goal is to turn web content into typed data.
Instead of giving the agent a browser, give it an extraction API that returns JSON:
{
  "title": "Example product",
  "price": 42.99,
  "currency": "USD",
  "availability": "in_stock"
}
That changes the agent’s job. It no longer has to reason about selectors, loading states, cookie banners, or whether a button is visible. It receives data and decides what to do with it.
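On the receiving side, it can help to validate that JSON into a typed record before the agent sees it. A minimal sketch, assuming the four fields from the example above; the Product class and parse_product function are illustrative names, not part of any particular API.

```python
from dataclasses import dataclass

REQUIRED_FIELDS = ("title", "price", "currency", "availability")


@dataclass(frozen=True)
class Product:
    """Typed view of the example payload (field names assumed from the JSON above)."""
    title: str
    price: float
    currency: str
    availability: str


def parse_product(payload: dict) -> Product:
    """Validate an extraction payload and convert it to a typed record."""
    missing = [key for key in REQUIRED_FIELDS if key not in payload]
    if missing:
        raise ValueError(f"missing fields: {missing}")
    return Product(
        title=str(payload["title"]),
        price=float(payload["price"]),  # raises ValueError on non-numeric input
        currency=str(payload["currency"]),
        availability=str(payload["availability"]),
    )
```

Rejecting a malformed payload here means the model never has to guess whether a missing price is zero, unknown, or a parsing failure.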
Wire is Anakin’s API layer for web actions, including catalog-based extractors that return structured data without making the agent manage browser sessions directly.
The broader pattern is what matters: move brittle web interaction out of the prompt and into deterministic infrastructure. Let code handle retries, parsing, authentication, and rate limits. Let the model handle ranking, summarizing, planning, or answering the user.
This works best when:
- the data shape is predictable
- you need many pages per workflow
- visual fidelity does not matter
- the agent can run jobs asynchronously
- failed extractions can be retried or skipped
It works poorly when the site requires complex interaction, CAPTCHA solving, dynamic user-specific flows, or visual verification. In those cases, use a browser. Do not pretend JSON extraction replaces the entire browser automation stack.
Use async jobs instead of long HTTP requests
Long-running extraction should usually be asynchronous. A synchronous request ties your agent to an HTTP timeout. If the page is slow, the proxy retries, or the extraction spans multiple pages, you either block the agent or fail the request.
A better pattern is:
- Submit a job.
- Get a job_id.
- Continue other work.
- Poll until the job reaches completed or failed.
- Feed the result back into the agent.
The API shape usually looks like this:
curl -X POST https://api.example.com/v1/tasks \
  -H "Authorization: Bearer $API_KEY" \
  -H "Content-Type: application/json" \
  -d '{
    "action": "extract.product",
    "params": {
      "url": "https://example.com/product/123"
    }
  }'
# 202 Accepted
# { "job_id": "job_abc123", "status": "processing" }
Then poll:
curl https://api.example.com/v1/jobs/job_abc123 \
  -H "Authorization: Bearer $API_KEY"
# 200 OK
# {
#   "status": "completed",
#   "data": {
#     "title": "Example product",
#     "price": 42.99
#   }
# }
Wire uses this same async job style for heavier web actions: submit work, receive a job id, and poll for structured results while the agent continues other tasks.
The important part is not the vendor-specific endpoint. It is the control flow. Agents often run several web lookups in parallel, and async jobs make that manageable.
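To make the parallel case concrete, here is a sketch of submitting several jobs up front and waiting on all of them concurrently. FakeJobClient is a hypothetical in-memory stand-in so the example runs without a network; a real client would make HTTP calls, and the status values simply mirror the ones used in this article.

```python
import asyncio
import itertools


class FakeJobClient:
    """In-memory stand-in for an async job API (hypothetical, for illustration)."""

    def __init__(self, polls_until_done: int = 2):
        self._ids = itertools.count(1)
        self._jobs: dict[str, dict] = {}
        self._polls_until_done = polls_until_done

    async def submit(self, url: str) -> str:
        job_id = f"job_{next(self._ids)}"
        self._jobs[job_id] = {"url": url, "polls": 0}
        return job_id

    async def status(self, job_id: str) -> dict:
        job = self._jobs[job_id]
        job["polls"] += 1
        if job["polls"] < self._polls_until_done:
            return {"status": "processing"}
        return {"status": "completed", "data": {"url": job["url"]}}


async def wait_for_job(client, job_id: str, interval: float = 0.01,
                       max_polls: int = 50) -> dict:
    """Poll one job until it reaches a terminal state or runs out of attempts."""
    for _ in range(max_polls):
        payload = await client.status(job_id)
        if payload["status"] == "completed":
            return payload["data"]
        if payload["status"] == "failed":
            raise RuntimeError(payload.get("error", "extraction failed"))
        await asyncio.sleep(interval)
    raise TimeoutError(f"job {job_id} did not finish after {max_polls} polls")


async def extract_many(client, urls: list[str]) -> list[dict]:
    """Submit every job first, then wait on all of them concurrently."""
    job_ids = [await client.submit(url) for url in urls]
    return await asyncio.gather(*(wait_for_job(client, j) for j in job_ids))
```

Submitting everything before polling anything is the point: total wall time approaches the slowest job rather than the sum of all jobs.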
Polling needs backoff and terminal states
Do not poll every 100ms. Also do not poll forever.
Your polling code should handle:
- processing as non-terminal
- completed as success
- failed as terminal failure
- 429 with Retry-After
- transient 5xx errors
- a maximum attempt count or deadline
Example:
import time
import httpx

def poll_job(job_id: str, api_key: str, deadline_seconds: int = 120):
    """Poll a job until it completes, fails, or the deadline passes."""
    url = f"https://api.example.com/v1/jobs/{job_id}"
    headers = {"Authorization": f"Bearer {api_key}"}
    started = time.monotonic()
    attempt = 0
    while time.monotonic() - started < deadline_seconds:
        resp = httpx.get(url, headers=headers, timeout=10)
        # Rate limited: honor Retry-After when it is a number of seconds.
        if resp.status_code == 429:
            try:
                retry_after = int(resp.headers.get("Retry-After", "1"))
            except ValueError:
                retry_after = 1  # header was an HTTP date; fall back to 1s
            time.sleep(retry_after)
            continue
        # Transient server error: back off exponentially, capped at 30s.
        if 500 <= resp.status_code < 600:
            time.sleep(min(2 ** attempt, 30))
            attempt += 1
            continue
        # Any other error status is not retryable; fail fast.
        resp.raise_for_status()
        payload = resp.json()
        status = payload.get("status")
        if status == "completed":
            return payload["data"]
        if status == "failed":
            raise RuntimeError(payload.get("error", "extraction failed"))
        # Still processing: wait before the next poll.
        time.sleep(min(2 ** attempt, 30))
        attempt += 1
    raise TimeoutError(f"job {job_id} did not finish within {deadline_seconds}s")
This code is boring on purpose. The agent should not improvise retry policy in natural language. Put that behavior in normal application code where you can test it.
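One way to make the retry policy testable is to pull the delay calculation into a pure function with no sleeping and no network. The sketch below does that; next_delay and its base and cap parameters are illustrative names, and the choice to let a numeric Retry-After override the backoff is an assumption, not a rule.

```python
from typing import Optional


def next_delay(attempt: int, retry_after: Optional[str] = None,
               base: float = 1.0, cap: float = 30.0) -> float:
    """Return the number of seconds to wait before the next poll.

    A numeric Retry-After value wins; otherwise use capped exponential backoff.
    """
    if retry_after is not None:
        try:
            return max(0.0, float(retry_after))
        except ValueError:
            pass  # e.g. an HTTP date; fall through to backoff
    return min(base * (2 ** attempt), cap)


def test_next_delay() -> None:
    assert next_delay(0) == 1.0
    assert next_delay(3) == 8.0
    assert next_delay(10) == 30.0  # capped
    assert next_delay(2, retry_after="5") == 5.0
    # A date-style header is not parsed here, so backoff applies instead.
    assert next_delay(2, retry_after="Wed, 21 Oct 2015 07:28:00 GMT") == 4.0
```

With the policy isolated like this, the polling loop only has to call the function and sleep, and the edge cases live in a unit test instead of in production logs.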
Pick the lowest-fidelity tool that works
A useful rule: use the lowest-fidelity web access method that gives correct data.
Start with direct APIs if the site provides them. Use structured extraction when the data is public or session-backed but predictable. Use browser automation when interaction or rendering fidelity is required.
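That ordering can be written down as an explicit fallback chain. A minimal sketch, assuming each tier is a function that returns a dict or None; the three stub functions are hypothetical placeholders for a real API client, extractor, and browser path.

```python
from typing import Callable, Optional

Fetcher = Callable[[str], Optional[dict]]


def fetch_with_fallback(url: str, tiers: list[tuple[str, Fetcher]]) -> tuple[str, dict]:
    """Try each tier in order, cheapest first; return (tier name, data)."""
    errors = []
    for name, fetch in tiers:
        try:
            data = fetch(url)
        except Exception as exc:  # a failed tier should not stop the chain
            errors.append(f"{name}: {exc}")
            continue
        if data is not None:
            return name, data
        errors.append(f"{name}: no data")
    raise LookupError(f"all tiers failed for {url}: {errors}")


# Hypothetical stubs standing in for real implementations.
def direct_api(url: str) -> Optional[dict]:
    return None  # pretend the site has no API for this page


def structured_extractor(url: str) -> Optional[dict]:
    return {"title": "Example product", "price": 42.99}


def browser(url: str) -> Optional[dict]:
    raise RuntimeError("browser tier should not be reached in this example")


TIERS = [("api", direct_api), ("extract", structured_extractor), ("browser", browser)]
```

Returning the tier name alongside the data makes it easy to count how often each path is actually used, which is the number that tells you whether the browser fleet is still earning its cost.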
Mixing all three is normal, but keep the boundaries clear. If every extraction path eventually falls back to a browser, you still need browser infrastructure. If most tasks return JSON and only a few need Playwright, your system gets simpler and cheaper to operate.
A practical next step: take one existing browser-based extraction in your agent stack and log what it actually uses from the page. If it only reads a handful of fields, replace that path with a JSON-producing function and keep the browser version as a fallback.