DEV Community

Nico Reyes

I got rate-limited scraping 100 pages. Here's what actually worked

Broke a scraper last Tuesday because I was too impatient. I hit rate limits on page 47 of 100, lost all the data, and had to start over. Fun times.

The Problem

I needed product data from an e-commerce site. Simple job - name, price, availability. But their API was locked behind enterprise pricing ($500/month, no thanks), so scraping it was.

First attempt: blasted through requests as fast as possible.

import requests
from bs4 import BeautifulSoup

for page in range(1, 101):
    response = requests.get(f'https://example.com/products?page={page}')
    soup = BeautifulSoup(response.text, 'html.parser')
    # Extract data...

Result: banned at page 47. Zero data collected.

What Actually Worked

Three changes made it work:

1. Add random delays

import time
import random

time.sleep(random.uniform(2, 5))  # 2-5 second delays

2. Rotate user agents

user_agents = [
    'Mozilla/5.0 (Windows NT 10.0; Win64; x64)...',
    'Mozilla/5.0 (Macintosh; Intel Mac OS X 10_15_7)...',
    # Add 3-4 more
]

headers = {'User-Agent': random.choice(user_agents)}
response = requests.get(url, headers=headers)

3. Save progress

import json

with open('progress.json', 'w') as f:
    json.dump({'last_page': page, 'data': results}, f)

If it breaks, restart from last page instead of page 1.
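Putting the three fixes together, here's a minimal sketch of how the full loop might look. It assumes the same example.com URL as above; `load_progress`, `save_progress`, and `scrape` are hypothetical names I'm using for illustration, and the user agent strings are placeholders you'd fill in with real ones.

```python
import json
import os
import random
import time

import requests
from bs4 import BeautifulSoup

PROGRESS_FILE = 'progress.json'

USER_AGENTS = [
    'Mozilla/5.0 (Windows NT 10.0; Win64; x64)',
    'Mozilla/5.0 (Macintosh; Intel Mac OS X 10_15_7)',
    # Add 3-4 more
]

def load_progress(path=PROGRESS_FILE):
    """Return (next_page, results) — resumes after the last saved page."""
    if os.path.exists(path):
        with open(path) as f:
            saved = json.load(f)
        return saved['last_page'] + 1, saved['data']
    return 1, []

def save_progress(page, results, path=PROGRESS_FILE):
    with open(path, 'w') as f:
        json.dump({'last_page': page, 'data': results}, f)

def scrape(last_page=100):
    start_page, results = load_progress()
    for page in range(start_page, last_page + 1):
        # Rotate user agents per request
        headers = {'User-Agent': random.choice(USER_AGENTS)}
        response = requests.get(
            f'https://example.com/products?page={page}', headers=headers)
        response.raise_for_status()
        soup = BeautifulSoup(response.text, 'html.parser')
        # Extract data into results...
        if page % 10 == 0:  # checkpoint every 10 pages
            save_progress(page, results)
        time.sleep(random.uniform(2, 5))  # polite 2-5 second delay
    return results
```

If the run dies at page 47, the next run picks up at page 41 (the last multiple-of-10 checkpoint plus one) instead of page 1 — you lose at most a few pages, not the whole run.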

What I Learned

  • Scraping slow beats scraping fast and getting banned
  • User agent rotation matters (sites check this)
  • Save progress every 10-20 pages
  • Some sites are fine with scraping if you're polite about it

Second run: finished all 100 pages. Took 15 minutes instead of 2, but actually worked.

For bigger jobs now I just use ParseForge scrapers because they handle this stuff automatically, but this approach works fine for smaller projects.
