Most Playwright tutorials teach you to scrape a single page. Real scrapers need to scrape thousands. The thing that kills you isn't the selector — it's everything Playwright does before it touches the selector.
By default, Playwright loads a page like a human visiting a website. It downloads CSS, fonts, analytics scripts, A/B testing pixels, hero images, lazy-loaded carousels, and three different chat widgets. On a product catalog page, that's 4–6 MB of stuff you don't need. Times 10,000 pages, that's the difference between a 20-minute run and a 3-hour run.
Here's the 10-line route handler I drop into every actor:
```javascript
const BLOCKED = ['image', 'media', 'font', 'stylesheet']; // resource types the scraper never reads
await context.route('**/*', (route) => {
  const type = route.request().resourceType();
  const url = route.request().url();
  if (BLOCKED.includes(type)) return route.abort();
  // Substring match on common tracker names; loose on purpose, so it can over-match.
  if (/google-analytics|doubleclick|hotjar|segment|gtm/.test(url)) {
    return route.abort();
  }
  return route.continue();
});
```
That's it. Two lists: resource types you don't need, and tracking domains you definitely don't need.
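One caveat with the regex: it matches substrings anywhere in the URL, so a legitimate path such as `/api/segments/` would be dropped too. Here is a slightly stricter sketch that matches on hostname instead. The names `shouldBlock` and `installBlocking`, the `blockTrackers` option, and the host list are illustrative assumptions, not part of the original handler.

```javascript
// Hypothetical helper names; the host list is an example, not an exhaustive blocklist.
const BLOCKED_TYPES = new Set(['image', 'media', 'font', 'stylesheet']);
const TRACKER_HOSTS = [
  'google-analytics.com',
  'doubleclick.net',
  'hotjar.com',
  'segment.com',
  'segment.io',
  'googletagmanager.com',
];

// True when the hostname equals a tracker host or is a subdomain of one.
function isTrackerHost(hostname) {
  return TRACKER_HOSTS.some((h) => hostname === h || hostname.endsWith('.' + h));
}

// Pure decision function, so it can be unit-tested without launching a browser.
function shouldBlock(resourceType, url, { blockTrackers = true } = {}) {
  if (BLOCKED_TYPES.has(resourceType)) return true;
  if (!blockTrackers) return false;
  try {
    return isTrackerHost(new URL(url).hostname);
  } catch {
    return false; // unparseable URL: leave the request alone
  }
}

// Registers the handler on a Playwright BrowserContext (a Page works the same way).
async function installBlocking(context, options) {
  await context.route('**/*', (route) => {
    const request = route.request();
    return shouldBlock(request.resourceType(), request.url(), options)
      ? route.abort()
      : route.continue();
  });
}
```

Calling `installBlocking(context, { blockTrackers: false })` keeps the resource-type blocking but lets analytics requests through, which is handy if a site seems to check for them.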
The 3-item checklist before you ship this
- Test that your data is still there. Some sites lazy-load product info into image data-attributes, so aborting images can sometimes break extraction. Run with and without the route handler and diff the output.
- Don't block scripts. Modern sites build the DOM with JS, so aborting scripts will often give you an empty page. (CSS and fonts are usually safe: Playwright doesn't need them to find selectors, though visibility checks can behave differently on an unstyled page.)
- Watch for sites that detect this. Some bot-detection scripts check whether you fetched the analytics pixel. If your success rate drops after enabling this, allow the analytics domains back through.
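The first check in the list, running with and without the handler and diffing the output, can be sketched as a small comparison helper. This assumes each run produces an array of plain records sharing an `id` field. The function name `diffRuns` and the field names are hypothetical.

```javascript
// Compares a baseline run (no blocking) against a run with blocking enabled
// and reports records or fields that went missing or changed.
function diffRuns(baseline, blocked, key = 'id') {
  const blockedByKey = new Map(blocked.map((row) => [row[key], row]));
  const problems = [];
  for (const row of baseline) {
    const other = blockedByKey.get(row[key]);
    if (!other) {
      problems.push({ [key]: row[key], issue: 'missing record' });
      continue;
    }
    for (const field of Object.keys(row)) {
      // JSON comparison is enough for flat, JSON-serializable scrape output.
      if (JSON.stringify(row[field]) !== JSON.stringify(other[field])) {
        problems.push({ [key]: row[key], issue: `field "${field}" differs` });
      }
    }
  }
  return problems;
}
```

An empty result on a sample of a few hundred pages is a reasonable signal that blocking did not cost you any data. Anything else tells you which fields to look at first.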
Quick case
On our Sephora product info actor, this single change cut average page load from 4.8s to 1.3s. Across a 5000-product catalog scrape run one page at a time, that's the difference between about 6.7 hours and 1.8 hours. Same selectors, same data, same success rate. We just stopped downloading hero images of moisturizers we never look at.
It also dropped our Apify compute units per run by ~60%, which directly affects what we charge customers. Faster scraper, lower cost, same output. The route handler now ships with the Sephora product info actor and every new scraper after it.
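The runtime comparison in this case is simple arithmetic if you assume pages load strictly one after another. Here is a minimal sketch of that estimate. The function name is illustrative, and real runs with concurrency, retries, or per-page overhead will land somewhere else.

```javascript
// Back-of-the-envelope runtime: pages * seconds per page, converted to hours.
// Assumes sequential loading with no concurrency, retries, or other overhead.
function estimateHours(pageCount, secondsPerPage) {
  return (pageCount * secondsPerPage) / 3600;
}

const hoursBefore = estimateHours(5000, 4.8);
const hoursAfter = estimateHours(5000, 1.3);
console.log(hoursBefore.toFixed(1), hoursAfter.toFixed(1));
```

Plug in your own page count and measured load times before and after enabling the handler to see whether the change is worth shipping.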
The CTA you didn't ask for
This route handler ships with our starter actor template. New scrapers get it on day one. Old scrapers got it bolted on the first time we noticed runtime > 1 hour.
The pattern works on any browser-based scraper — Playwright, Puppeteer, Selenium with CDP. The shape is always: tell the browser what not to load, before you tell it what to find.
One quick note for the JS-heavy among you: the same pattern applies in Puppeteer via page.setRequestInterception(true) and a 'request' event handler. Same idea, slightly different API, same wins.
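For reference, a rough Puppeteer equivalent might look like the sketch below. It assumes `page` is a Puppeteer `Page` created elsewhere. The names `handleRequest` and `blockWithPuppeteer` are made up for illustration.

```javascript
const SKIP_TYPES = ['image', 'media', 'font', 'stylesheet'];
const TRACKER_PATTERN = /google-analytics|doubleclick|hotjar|segment|gtm/;

// Decides the fate of one intercepted request.
function handleRequest(request) {
  // Another handler may already have resolved this request.
  if (request.isInterceptResolutionHandled()) return;
  const drop =
    SKIP_TYPES.includes(request.resourceType()) ||
    TRACKER_PATTERN.test(request.url());
  if (drop) request.abort();
  else request.continue();
}

// Interception must be switched on before 'request' handlers can abort or continue.
async function blockWithPuppeteer(page) {
  await page.setRequestInterception(true);
  page.on('request', handleRequest);
}
```

The main difference from Playwright is that interception is enabled per page and every intercepted request has to be explicitly aborted or continued, otherwise it stalls.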
Drop your slowest scraper's runtime in the comments. I'll guess what's eating your minutes. (Hint: it's probably hero images.)
Agree, disagree, or have a site where blocking images breaks something subtle? Reply.
Written by **Nova Chen**, Automation Dev Advocate at SIÁN Agency. Find more from Nova on dev.to. For custom scraping or automation work, hire SIÁN Agency.
