I got rate limited scraping 100 pages. Here's what actually worked
Was pulling product data from an ecommerce site. Page 47 out of 100. Script crashes. 429 Too Many Requests.
Zero data collected.
What I tried first
Thought adding a 1 second delay would fix it.
import requests
import time

for page in range(1, 101):
    response = requests.get(f'https://example.com/products?page={page}')
    time.sleep(1)  # This won't save you
    products = response.json()
Got blocked again. Page 52 this time.
Tried proxies next. Bought a cheap proxy list. Half of them didn't work. The ones that did got flagged within 20 requests. Wasted $15.
What worked
Randomized delays between 2 and 5 seconds, not consistent 1-second sleeps.
import random
import time
import requests

for page in range(1, 101):
    while True:
        # Random delay: 2-5 seconds, never a fixed interval
        time.sleep(random.uniform(2, 5))
        response = requests.get(
            f'https://example.com/products?page={page}',
            headers={
                'User-Agent': 'Mozilla/5.0 (Windows NT 10.0; Win64; x64)'
            }
        )
        if response.status_code == 429:
            time.sleep(60)  # Back off, then retry this same page
            continue
        break
    if response.status_code == 200:
        products = response.json()
        # process data
Added a proper User-Agent header. Some sites check for this.
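Rather than passing the same headers on every call, you can set them once on a requests.Session. This is a small sketch of that idea, not from the original script; the Accept-Language value is just an illustrative extra header.

```python
import requests

# A Session sends the same headers on every request and reuses the
# underlying TCP connection, which is faster and looks more like a
# normal browser session than 100 fresh connections.
session = requests.Session()
session.headers.update({
    'User-Agent': 'Mozilla/5.0 (Windows NT 10.0; Win64; x64)',
    'Accept-Language': 'en-US,en;q=0.9',  # illustrative extra header
})

# Then inside the loop:
# response = session.get(f'https://example.com/products?page={page}')
```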
Built in retry logic for 429 errors. When you hit the rate limit, wait a minute and retry the same page instead of crashing.
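The flat 60-second sleep works, but a slightly smarter version honors the server's Retry-After response header (which many sites send with a 429) and backs off exponentially on repeated failures. `backoff_delay` is a hypothetical helper, a minimal sketch of that idea:

```python
import random

def backoff_delay(attempt, base=60, cap=300, retry_after=None):
    """Seconds to wait before retry number `attempt` (0-indexed).

    Uses the server-supplied Retry-After value when present;
    otherwise doubles the base delay each attempt, capped at `cap`,
    with a little jitter so retries don't land in lockstep.
    """
    if retry_after is not None:
        return float(retry_after)
    delay = min(base * (2 ** attempt), cap)
    return delay + random.uniform(0, 5)

# In the loop, something like:
# retry_after = response.headers.get('Retry-After')
# time.sleep(backoff_delay(attempt, retry_after=retry_after))
```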
Took 8 minutes to scrape 100 pages instead of 2. But it worked.
The thing nobody mentions
Sites don't just check request speed. They check patterns.
If you hit pages 1, 2, 3, 4 in perfect sequence, exactly 1 second apart, that's obviously a bot.
Real users jump around. They spend different time on different pages. They don't go 1→2→3→4→5.
Randomizing the delay helps. But also consider randomizing which pages you hit in what order if your use case allows it.
import random
import time

pages = list(range(1, 101))
random.shuffle(pages)  # Random order

for page in pages:
    # scrape page
    time.sleep(random.uniform(2, 5))
Got all 100 pages. No blocks.