Best way to find out where your scraper is fragile? Break it. On purpose. In a controlled way, in a test environment, with a checklist of failure modes you actively try to inject.
This is chaos engineering for scrapers. Most teams don't do it because they're convinced their scraper "works." Then they discover what doesn't work the hard way, in production, on a Sunday.
I ran the exercise on our image metadata scraper last week. Here's what I broke and what I found.
The 3-item attack list
Three categories of injected failure that catch most fragility:
- Network failure — slow responses, dropped connections, partial bodies, 5xx responses.
- Content failure — malformed HTML, missing fields, unexpected types (string where number was expected).
- Adversarial input — empty inputs, very large inputs, URLs that 404, URLs that redirect to login pages.
If your scraper survives all three, you have a real scraper. If it crashes, hangs, or quietly returns nothing on any of them, you've found a bug.
The trick — Playwright route handlers as fault injectors
Playwright's request routing isn't just for blocking ads. It's a controlled chaos primitive:
    // Inject a 30% rate of 503 responses
    await page.route('**/*', async (route) => {
      if (Math.random() < 0.3) {
        return route.fulfill({
          status: 503,
          body: 'Service Unavailable',
        });
      }
      return route.continue();
    });

    // Inject latency
    await page.route('**/api/*', async (route) => {
      await new Promise(r => setTimeout(r, 5000));
      return route.continue();
    });

    // Inject malformed JSON
    await page.route('**/metadata.json', async (route) => {
      return route.fulfill({
        status: 200,
        contentType: 'application/json',
        body: '{"title": "Test", "size": ', // truncated JSON
      });
    });
Now run your scraper. See what falls over.
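If your scraper is written in Python, the same injectors can be sketched as plain handler factories. This is a minimal sketch, not the code from our suite: `make_503_injector` and `make_latency_injector` are names I made up for illustration, and the wiring in the trailing comment assumes Playwright's async Python API (`page.route`, `route.fulfill`, `route.continue_`). The one deliberate change from the JavaScript version is a seeded random generator, so the sequence of pass/fail decisions can be replayed.

```python
import asyncio
import random


def make_503_injector(rate: float, seed: int = 0):
    """Return a route handler that answers roughly `rate` of requests with a 503."""
    rng = random.Random(seed)  # seeded so the decision sequence is repeatable

    async def handler(route):
        if rng.random() < rate:
            await route.fulfill(status=503, body="Service Unavailable")
        else:
            await route.continue_()

    return handler


def make_latency_injector(delay_s: float):
    """Return a route handler that delays every matching request by `delay_s` seconds."""

    async def handler(route):
        await asyncio.sleep(delay_s)
        await route.continue_()

    return handler


# Hypothetical wiring, given an existing Playwright `page`:
#   await page.route("**/*", make_503_injector(0.3, seed=42))
#   await page.route("**/api/*", make_latency_injector(5.0))
```

The seed fixes the order of decisions, not which request receives which decision. If the browser issues requests in a different order between runs, the same URL can still get a different outcome.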
What broke when I did this last week
Image metadata scraper, running against a fixture set of 100 URLs with the failure handlers above wired in:
- 503 injection at 30% → scraper hung on a single URL for 90 seconds before failing. Found: missing per-request timeout. Fix: 15-second hard timeout per page.
- 5-second latency injection → scraper completed but reported 0 results for affected URLs. Found: `wait_for_selector` had an implicit 5-second timeout that exactly matched the injected latency, so it failed silently. Fix: explicit timeout, longer than expected p99 page load.
- Truncated JSON injection → uncaught `JSONDecodeError`, killed the entire run. Found: no try/except around the JSON parser. Fix: wrap in `try/except`, push to failures dataset (per last week's post).
- Empty input array → scraper exited with code 0 and an empty dataset. Found: no validation of input shape. Fix: assert `len(input.urls) > 0` at start.
- 404 URLs (mixed in with valid URLs) → scraper retried each three times before giving up, doubling run time. Found: 404 was being treated as transient, not permanent. Fix: 404 → push to failures immediately, no retry.
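Three of those fixes are small enough to sketch in one place: the input check, the retry decision, and the guarded JSON parse. Treat this as an illustration under assumptions, not our production code. The function names, the shape of a failure record, and the exact set of statuses treated as transient are all hypothetical, and `failures` is a plain list standing in for the failures dataset.

```python
import json
from typing import Any, Optional

MAX_RETRIES = 3                            # the three retries mentioned above
PERMANENT_STATUSES = {404}                 # from the finding above; extend as needed
TRANSIENT_STATUSES = {500, 502, 503, 504}  # assumption: 5xx is worth retrying


def validate_input(urls: list) -> None:
    """Fail fast on empty input instead of exiting 0 with an empty dataset."""
    if not urls:
        raise ValueError("input.urls is empty; refusing to run")


def should_retry(status: int, retries_done: int) -> bool:
    """Retry only transient statuses, and only while retries remain."""
    return status in TRANSIENT_STATUSES and retries_done < MAX_RETRIES


def parse_metadata(url: str, body: str, failures: list) -> Optional[Any]:
    """Parse a JSON body; record a failure and return None instead of raising."""
    try:
        return json.loads(body)
    except json.JSONDecodeError as exc:
        failures.append({"url": url, "reason": f"bad JSON: {exc.msg}"})
        return None
```

One swap to flag: the sketch raises `ValueError` rather than using `assert`, because Python drops `assert` statements when run with `-O`. With these rules, `should_retry(404, 0)` is `False`, so a 404 goes straight to the failures list, while a 503 gets up to three retries.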
Five real bugs, found in 90 minutes. Every one of them would have eventually hit production. Two of them already had — the timeout one was the cause of a Slack alarm we got in March that we'd "fixed" by restarting the actor.
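For the timeout bug specifically, the 15-second hard limit can be sketched with `asyncio.wait_for`. This assumes the per-URL scrape is a coroutine; `scrape_one` and `scrape_with_deadline` are hypothetical names, and `failures` is again a plain list standing in for the failures dataset.

```python
import asyncio

PAGE_TIMEOUT_S = 15.0  # the 15-second hard limit from the first finding


async def scrape_with_deadline(scrape_one, url, failures, timeout_s=PAGE_TIMEOUT_S):
    """Run `scrape_one(url)` but give up after `timeout_s` seconds."""
    try:
        return await asyncio.wait_for(scrape_one(url), timeout=timeout_s)
    except asyncio.TimeoutError:
        # wait_for has already cancelled the inner coroutine at this point
        failures.append({"url": url, "reason": f"timed out after {timeout_s}s"})
        return None
```

A caveat: cancellation only lands when the scrape reaches an `await`. A blocking synchronous call inside `scrape_one` will not be interrupted by this wrapper.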
The CTA you didn't ask for
We now run a chaos test suite against every actor before it ships. Same five injections every time:
- Random 503s at 30%.
- Random 5s latency at 20%.
- Malformed JSON on the data endpoint.
- Empty input array.
- 50% invalid URLs in the input.
It takes 5 minutes to run, and it catches things real-traffic testing won't, because real traffic doesn't reliably produce the bad cases. The chaos suite is what caught the timeouts in the image metadata scraper before its first paying user noticed.
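If you want a starting point for a suite like this, here is a rough sketch of a runner. It is not our actual harness: `Verdict`, `run_scenario`, and `run_suite` are invented names, and each scenario is assumed to be an async callable that wires in one injection and runs the scraper against fixtures.

```python
import asyncio
from dataclasses import dataclass
from typing import Awaitable, Callable, Dict, List


@dataclass
class Verdict:
    name: str
    ok: bool
    detail: str


async def run_scenario(name: str, scenario: Callable[[], Awaitable[None]],
                       budget_s: float) -> Verdict:
    """A scenario passes if it finishes within budget without raising."""
    try:
        await asyncio.wait_for(scenario(), timeout=budget_s)
    except asyncio.TimeoutError:
        return Verdict(name, False, f"hung past {budget_s}s")
    except Exception as exc:  # a crash is a finding, not a harness error
        return Verdict(name, False, f"crashed: {type(exc).__name__}")
    return Verdict(name, True, "survived")


async def run_suite(scenarios: Dict[str, Callable[[], Awaitable[None]]],
                    budget_s: float = 60.0) -> List[Verdict]:
    """Run scenarios one at a time so each failure maps to one injection."""
    return [await run_scenario(n, s, budget_s) for n, s in scenarios.items()]
```

"Survived" here only means no crash and no hang. The latency and empty-input bugs above were silent, so each scenario should also assert on what the scraper produced (for example, that affected URLs ended up either in results or in failures) and raise if that check fails.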
So:
Pick one of the five injections above. Run it against your scraper today. Drop what broke in the comments — I'll guess the failure mode if you give me one detail.
Agree, disagree, or have a chaos test that catches something subtler? Reply.
Written by **Nova Chen**, Automation Dev Advocate at SIÁN Agency. Find more from Nova on dev.to. For custom scraping or automation work, hire SIÁN Agency.
