I got rate-limited scraping 100 pages. Here's what actually worked
Broke a scraper last Tuesday because I was too impatient. Hit rate limits on page 47 of 100, lost all the data, had to start over. Fun times.
The Problem
I needed product data from an e-commerce site. Simple job - name, price, availability. But their API was locked behind enterprise pricing ($500/month, no thanks), so scraping it was.
First attempt: blasted through requests as fast as possible.
import requests
from bs4 import BeautifulSoup

for page in range(1, 101):
    response = requests.get(f'https://example.com/products?page={page}')
    soup = BeautifulSoup(response.text, 'html.parser')
    # Extract data...
Result: banned at page 47. Zero data collected.
What Actually Worked
Three changes made it work:
1. Add random delays
import time
import random
time.sleep(random.uniform(2, 5)) # 2-5 second delays
2. Rotate user agents
user_agents = [
    'Mozilla/5.0 (Windows NT 10.0; Win64; x64)...',
    'Mozilla/5.0 (Macintosh; Intel Mac OS X 10_15_7)...',
    # Add 3-4 more
]

headers = {'User-Agent': random.choice(user_agents)}
response = requests.get(url, headers=headers)
3. Save progress
import json

with open('progress.json', 'w') as f:
    json.dump({'last_page': page, 'data': results}, f)
If it breaks, restart from last page instead of page 1.
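A minimal resume helper might look like this. It's a sketch: `load_progress` is a name I'm making up here, and it assumes the `{'last_page': ..., 'data': ...}` shape saved above.

```python
import json
import os

def load_progress(path='progress.json'):
    """Return (next page to fetch, data collected so far)."""
    if os.path.exists(path):
        with open(path) as f:
            saved = json.load(f)
        return saved['last_page'] + 1, saved['data']
    return 1, []  # no checkpoint yet: fresh start from page 1

start_page, results = load_progress()
```

On a crash at page 47, the next run starts at page 48 with the first 47 pages of data already in hand.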
What I Learned
- Slow scraping that finishes beats fast scraping that gets you banned
- User agent rotation matters (sites check this)
- Save progress every 10-20 pages
- Some sites are fine with scraping if you're polite about it
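Putting all three fixes together, the loop looks roughly like this. It's a sketch, not my exact script: the URL, page count, and checkpoint interval are placeholders, and the extraction step is elided.

```python
import json
import random
import time

import requests
from bs4 import BeautifulSoup

USER_AGENTS = [
    'Mozilla/5.0 (Windows NT 10.0; Win64; x64)',
    'Mozilla/5.0 (Macintosh; Intel Mac OS X 10_15_7)',
]

def save_progress(page, results, path='progress.json'):
    """Checkpoint so a crash means resuming, not restarting."""
    with open(path, 'w') as f:
        json.dump({'last_page': page, 'data': results}, f)

def scrape(base_url, start_page=1, end_page=100):
    results = []
    for page in range(start_page, end_page + 1):
        headers = {'User-Agent': random.choice(USER_AGENTS)}  # fix 2: rotate UAs
        response = requests.get(f'{base_url}?page={page}',
                                headers=headers, timeout=10)
        response.raise_for_status()
        soup = BeautifulSoup(response.text, 'html.parser')
        # ... extract name / price / availability from soup here ...

        if page % 10 == 0:
            save_progress(page, results)      # fix 3: checkpoint every 10 pages
        time.sleep(random.uniform(2, 5))      # fix 1: polite 2-5 second delay
    return results
```

The delay goes at the end of each iteration so every request, including the first, is spaced out from the next one.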
Second run: finished all 100 pages. Took 15 minutes instead of 2, but actually worked.
For bigger jobs now I just use ParseForge scrapers because they handle this stuff automatically, but this approach works fine for smaller projects.