Zee
We Built a Custom Playwright Rendering Pipeline for Our MCP Server — Here is What We Learned


At Haunt API, we build web extraction tools for AI agents. Our MCP server lets Claude and other AI assistants extract structured data from any URL. Simple enough on paper — fetch a page, parse the HTML, return JSON.

The problem? Half the internet doesn't want to be fetched.

The Problem With "Just Use Playwright"

Most web scraping tutorials go something like this:

import asyncio

from playwright.async_api import async_playwright

async def fetch_page(url: str) -> str:
    async with async_playwright() as p:
        browser = await p.chromium.launch()
        page = await browser.new_page()
        try:
            await page.goto(url)
            return await page.content()
        finally:
            await browser.close()

And that works! For a demo. For a product that real users depend on, it falls apart fast:

  • Sites detect headless browsers and serve captchas or empty pages
  • SPA pages need time to render — how long do you wait? 2 seconds? 5? 10?
  • You are burning resources loading images, fonts, and CSS when you only need text
  • Every render costs the same — no caching, no intelligence

We went through all of these. Here is how we solved each one.

Lesson 1: Do Not Use One Tool For Everything

Our pipeline has three tiers, and most requests never hit Playwright:

  1. Direct HTTP — Works for approximately 80% of the web. Fast, cheap, no browser needed.
  2. FlareSolverr — Handles Cloudflare challenges and basic JS rendering.
  3. Playwright — Full browser rendering for JS-heavy SPAs that return empty skeletons.

The key insight: we detect skeleton pages (HTML with an empty root div and no real text content) and only spin up the browser when we need to.
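A skeleton check can be sketched like this. The root-div selectors and the 200-character threshold are illustrative assumptions, not Haunt API's actual values:

```python
import re

# Empty SPA mount points like <div id="root"></div> are the strongest signal.
SKELETON_ROOT = re.compile(
    r'<div[^>]+id=["\'](?:root|app|__next)["\'][^>]*>\s*</div>',
    re.IGNORECASE,
)

def is_skeleton(html: str, min_text_chars: int = 200) -> bool:
    """Return True if the HTML looks like an unrendered SPA shell."""
    if SKELETON_ROOT.search(html):
        return True
    # Strip scripts, styles, and tags; measure what a reader would see.
    text = re.sub(r"<(script|style)[^>]*>.*?</\1>", " ", html,
                  flags=re.IGNORECASE | re.DOTALL)
    text = re.sub(r"<[^>]+>", " ", text)
    return len(" ".join(text.split())) < min_text_chars
```

Only when this returns True does the request escalate to the next tier.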

Lesson 2: Smart Wait Strategies Beat Fixed Timers

The worst thing about browser automation is the waiting. A fixed sleep is either too short or too long. We built three concurrent wait strategies — first one to trigger wins:

  • Content Stability — Poll visible text every 200ms. If unchanged for 1 second, done.
  • Network Idle — Wait for no new requests for 500ms.
  • Meaningful Content — Wait until 500+ chars of visible text exist.

This cut our average render time from 6 seconds to under 3.

Lesson 3: Fingerprint Rotation Matters

Headless Chromium has tells. We rotate fingerprints per-URL — same site sees a consistent browser, different sites see different browsers. 10 viewport variants across Windows, macOS, and Linux UAs.
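Per-URL consistency falls out naturally if you hash the hostname into a fingerprint table. This sketch uses three example fingerprints rather than the full set of 10, and the UA strings are illustrative:

```python
import hashlib
from urllib.parse import urlparse

FINGERPRINTS = [
    {"viewport": {"width": 1920, "height": 1080},
     "user_agent": "Mozilla/5.0 (Windows NT 10.0; Win64; x64) "
                   "AppleWebKit/537.36 (KHTML, like Gecko) "
                   "Chrome/124.0.0.0 Safari/537.36"},
    {"viewport": {"width": 1440, "height": 900},
     "user_agent": "Mozilla/5.0 (Macintosh; Intel Mac OS X 10_15_7) "
                   "AppleWebKit/537.36 (KHTML, like Gecko) "
                   "Chrome/124.0.0.0 Safari/537.36"},
    {"viewport": {"width": 1366, "height": 768},
     "user_agent": "Mozilla/5.0 (X11; Linux x86_64) "
                   "AppleWebKit/537.36 (KHTML, like Gecko) "
                   "Chrome/124.0.0.0 Safari/537.36"},
]

def fingerprint_for(url: str) -> dict:
    """Same host always maps to the same fingerprint; hosts vary."""
    host = urlparse(url).hostname or ""
    digest = hashlib.sha256(host.encode()).digest()
    return FINGERPRINTS[digest[0] % len(FINGERPRINTS)]
```

Hashing instead of random choice means a retry against the same site never looks like a suddenly different browser.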

Lesson 4: Block What You Do Not Need

When extracting text data, images and fonts are dead weight. We block them at the network level plus 20+ tracking domains. This cuts HTML payload by 40-60%.
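With Playwright this kind of blocking hooks into `page.route`. The resource types and the tracker list below are a small illustrative sample, not our full 20+ domain list:

```python
BLOCKED_TYPES = {"image", "font", "media", "stylesheet"}
BLOCKED_DOMAINS = ("google-analytics.com", "doubleclick.net", "facebook.net")

def should_block(resource_type: str, url: str) -> bool:
    """Decide whether a request is dead weight for text extraction."""
    if resource_type in BLOCKED_TYPES:
        return True
    return any(domain in url for domain in BLOCKED_DOMAINS)

async def install_blocking(page) -> None:
    """Abort blocked requests before they ever hit the network."""
    async def handler(route, request):
        if should_block(request.resource_type, request.url):
            await route.abort()
        else:
            await route.continue_()
    await page.route("**/*", handler)
```

Aborting at the route level means the bytes are never downloaded at all, which is where the 40-60% saving comes from.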

Lesson 5: Cache Renders, Not Requests

If two users extract data from the same URL within 5 minutes, the page probably has not changed. Cache hits return in 0ms.
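An LRU cache with a TTL is a few lines over an `OrderedDict`. The 5-minute TTL matches the description above; the capacity is an assumption:

```python
import time
from collections import OrderedDict

class RenderCache:
    def __init__(self, max_entries: int = 256, ttl_seconds: float = 300.0):
        self._store: OrderedDict = OrderedDict()  # url -> (timestamp, html)
        self._max = max_entries
        self._ttl = ttl_seconds

    def get(self, url: str):
        entry = self._store.get(url)
        if entry is None:
            return None
        ts, html = entry
        if time.monotonic() - ts > self._ttl:
            del self._store[url]          # expired: drop and re-render
            return None
        self._store.move_to_end(url)      # mark as recently used
        return html

    def put(self, url: str, html: str) -> None:
        self._store[url] = (time.monotonic(), html)
        self._store.move_to_end(url)
        if len(self._store) > self._max:
            self._store.popitem(last=False)  # evict least-recently used
```

Keying on the URL after normalization (and not on request headers) is what makes this a render cache rather than a request cache.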

The Architecture

Six modules, each with a single job:

  • server.py — FastAPI orchestration, browser lifecycle
  • fingerprint.py — UA/viewport/locale rotation
  • smart_wait.py — Content stability + network idle detection
  • site_detect.py — Static vs SPA classification
  • cache.py — LRU render cache with TTL
  • stealth.py — Resource blocking + headless detection evasion

Each module is approximately 100 lines. Easy to test, easy to modify.

What We Learned

  1. Do not reach for the browser first. Most pages are server-rendered.
  2. Wait smarter, not longer.
  3. Be a moving target with fingerprint rotation.
  4. Cache aggressively.
  5. Build modules, not monoliths.

The Playwright browser engine is the oven. Everything around it — the routing, the waiting, the caching, the stealth — is the recipe. That is where the actual engineering lives.


We are Haunt API — web extraction built for AI agents. If you are building with Claude, Cursor, or any AI assistant, our MCP server gives your agent the ability to extract data from any URL.
