Most scraper "incidents" I'm pulled into start the same way: someone shows me a graph of 429 responses and asks how to make them go away. The honest answer — that nobody likes — is that the 429s are the well-behaved part of the system. The rest is what's broken.
I'm going to argue that rate limits are not your enemy. They're a contract. And scrapers that treat them like a contract — instead of an obstacle — are the only ones I trust to run unsupervised for more than a quarter.
## The teardown
Three things teams typically do when they hit rate limits, in order of how bad they are:
- Add proxies. "If they limit me, I'll just be more people." This works for about six weeks. Then the target site fingerprints your residential proxy pool and you're back to where you started, with a higher monthly bill.
- Decrease delays. "If we go faster, we'll finish before they notice." Faster only matters if the request budget exists. Going faster against a hard limit just stacks failures earlier.
- Retry harder. Add exponential backoff with a 30-minute cap. Now your "1-hour scraper" is a 4-hour scraper that completes when the throttle window expires.
All three are forms of the same denial: refusing to accept that the source site is telling you the rate at which they're willing to serve you data. They are. You should listen.
## What rate limits actually are
A rate limit is the source-site engineer's way of saying: here is the contract under which my system stays healthy. They published the rate (often: in headers) because they've measured what their infrastructure can serve before things degrade. When you exceed it, you don't just hurt yourself — you contribute to the conditions that get scrapers blocked entirely.
There are three signals you should be reading from every response, not just the body:
- `Retry-After` header. This is the source telling you when it'll talk to you again — either a number of seconds or an HTTP date. Respect it literally.
- `X-RateLimit-Remaining` (or equivalent). Some sites publish their budget. Use it. Slow down before you hit zero.
- Status code distribution over time. If your 200 rate is dropping while 429 rises, you're approaching a soft limit you can't see. Back off proactively.
If you're not reading those, your scraper is operating blind against an opponent who is leaving the lights on for you.
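Reading those signals can be as small as one helper that turns response headers into the next inter-request delay. A minimal sketch — the header names and the back-off constants here are illustrative, since sites vary in what they publish:

```python
def throttle_delay(headers: dict, base_delay: float = 1.0) -> float:
    """Pick the next inter-request delay from rate-limit headers.

    Header names and thresholds are hypothetical examples; adjust them
    to whatever the target site actually sends.
    """
    # Hard signal: the source named an exact wait, so respect it literally.
    if "Retry-After" in headers:
        return float(headers["Retry-After"])
    # Soft signal: slow down *before* the published budget hits zero.
    remaining = headers.get("X-RateLimit-Remaining")
    if remaining is not None and int(remaining) <= 5:
        return base_delay * 4  # back off proactively near the floor
    return base_delay
```

Call it after every response and sleep for the returned value before the next request; the point is that the delay is driven by what the site said, not by a constant you guessed.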
## The replacement pattern
Here's the rate-aware request loop I drop into every actor:
```python
import asyncio
import time


class RateBudget:
    """Token bucket — refills at `rate` tokens per second, holds at most `burst`."""

    def __init__(self, rate: float, burst: int):
        self.rate = rate
        self.burst = burst
        self.tokens = float(burst)
        self.last_refill = time.monotonic()

    def _refill(self):
        now = time.monotonic()
        self.tokens = min(self.burst,
                          self.tokens + (now - self.last_refill) * self.rate)
        self.last_refill = now

    async def take(self):
        self._refill()
        while self.tokens < 1:
            # Sleep just long enough for one token to accrue, then re-check.
            await asyncio.sleep((1 - self.tokens) / self.rate)
            self._refill()
        self.tokens -= 1


async def fetch(url, budget, session):
    await budget.take()
    resp = await session.get(url)
    if resp.status == 429:
        # The source told us exactly when to come back — honour it.
        # (This assumes the seconds form; Retry-After can also be an HTTP date.)
        retry_after = int(resp.headers.get('Retry-After', '60'))
        await asyncio.sleep(retry_after)
        return await fetch(url, budget, session)
    return resp
```
Three things this does that "decrease the delay" doesn't:
- Token bucket means the rate is global, not per-request. Concurrency works without exceeding the contract.
- `Retry-After` is honoured literally. No exponential backoff guessing — the source already told you.
- No proxy rotation. You don't need to be more people. You need to be one well-behaved person.
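To see the "global, not per-request" point concretely, here's a runnable sketch of several workers drawing from one shared bucket. The class is repeated in compact form so the snippet stands alone, and the real network call is replaced by a timestamp append — the shape of the concurrency is what matters:

```python
import asyncio
import time


class RateBudget:  # same token bucket as above, repeated so this runs standalone
    def __init__(self, rate: float, burst: int):
        self.rate, self.burst = rate, burst
        self.tokens = float(burst)
        self.last = time.monotonic()

    def _refill(self):
        now = time.monotonic()
        self.tokens = min(self.burst, self.tokens + (now - self.last) * self.rate)
        self.last = now

    async def take(self):
        self._refill()
        while self.tokens < 1:
            await asyncio.sleep((1 - self.tokens) / self.rate)
            self._refill()
        self.tokens -= 1


async def worker(budget: RateBudget, hits: list):
    for _ in range(5):
        await budget.take()            # every worker draws from the same bucket
        hits.append(time.monotonic())  # stand-in for the real session.get(...)


async def main():
    budget = RateBudget(rate=5.0, burst=2)  # 5 req/s shared across all workers
    hits: list = []
    start = time.monotonic()
    await asyncio.gather(*(worker(budget, hits) for _ in range(4)))
    return hits, time.monotonic() - start


hits, elapsed = asyncio.run(main())
print(f"{len(hits)} requests in {elapsed:.1f}s")
```

Four workers make 20 requests total, but the bucket holds them collectively to roughly 5 per second — adding more workers changes latency per worker, never the aggregate rate the site sees.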
## Result
On the two scrapers I migrated to this pattern this quarter:
- Idealista. 429 rate dropped from 8% to 0.4%. Total run time went up by 11% (from 47min to 52min average) — because we stopped hammering. Per-run cost went down 38% — because we stopped paying for retries that were never going to succeed.
- Sephora. 429 rate from 15% to <1%. Run time about the same. Block rate (full IP block requiring rotation) went from "monthly" to "zero in the last 90 days." This one's the real win — we used to burn a residential proxy pool subscription. Now we don't need it.
The pattern that emerges every time: respecting the rate makes you slower per-request, but more reliable per-run, and significantly cheaper per-result. The unit economics of a polite scraper beat the unit economics of an aggressive one. By a lot.
## When it's wrong
This is wrong if the source site doesn't publish a contract — no Retry-After, no rate header, just blanket blocks. There you genuinely are guessing. But the guess should still bias toward "much slower than you think you need to be," not toward "more proxies." A token bucket at 1 req/sec is a fine starting point for an unknown site; you can ratchet up while watching error rates.
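That "ratchet up while watching error rates" step can be made mechanical with a classic additive-increase, multiplicative-decrease loop. A sketch — the constants here are hypothetical starting points, not tuned values:

```python
class AdaptiveRate:
    """Start slow, creep up on success, back off hard on any limit signal.

    All constants are illustrative; tune them per site while watching
    the error-rate distribution.
    """

    def __init__(self, start: float = 1.0, ceiling: float = 10.0):
        self.rate = start      # requests per second
        self.ceiling = ceiling

    def on_success(self):
        # Additive increase: edge upward while the site stays happy.
        self.rate = min(self.ceiling, self.rate + 0.1)

    def on_throttle(self):
        # Multiplicative decrease: one 429 wipes out many successes.
        self.rate = max(0.1, self.rate / 2)
```

Feed the current `rate` into the token bucket above and the scraper converges on the fastest pace the site tolerates without anyone having to guess it up front.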
This is also wrong if you have explicit business permission to scrape at higher rates — a partnership, an API key, a contract. Those are different relationships. The advice here is for scrapers running against the public web, where 429 is the only contract you have.
## Closing
Stop thinking of rate limits as the cost of doing business. Start thinking of them as a free service the target site is providing you: telling you exactly how to stay welcome. Most blocked scrapers I see weren't blocked because they "got caught" — they were blocked because they ignored repeated, clearly articulated signals that they were being rude.
We packaged the token bucket + Retry-After honour into a small middleware that sits in front of every actor we ship — visible across our Apify portfolio. About 30 lines of code. It's the most boring reliability win I've shipped this year, and the most consistent.
Which response header is your scraper currently ignoring? Drop it in the comments — I'll show you what to do with it.
Written by **Jonas Keller**, Senior Automation Architect at SIÁN Agency. Find more from Jonas on dev.to. For custom scraping or automation work, hire SIÁN Agency.
