Web scrapers get banned. That's not a bug in your code — it's an intended feature of the sites you're scraping. Rate limiting, IP reputation checks, and behavioral fingerprinting are all designed to block automated access. Proxy rotation is the primary countermeasure.
This guide covers everything: proxy types, rotation strategies, working Python code, and the honest trade-off between rolling your own rotation vs paying for a managed scraping API.
## Why Rotate Proxies at All?
When you make requests from a single IP address, the target server can:
- Rate-limit your IP after N requests per minute
- Soft-ban your IP after detecting non-human request patterns
- Hard-ban your IP and all IPs in the same subnet
- Serve degraded content — fake prices, empty results, honeypot data
Rotating proxies distributes your requests across many IP addresses, making your traffic look like many independent users rather than one aggressive bot.
But rotation alone isn't enough. How you rotate matters as much as whether you rotate.
## Proxy Types: What You're Actually Buying
### Datacenter Proxies
Datacenter proxies are IP addresses hosted in commercial data centers — AWS, DigitalOcean, Hetzner, etc. They're fast, cheap, and easy to get at scale.
**Pros:** Low latency, high throughput, cheap ($0.50–$2/GB)

**Cons:** Easily identified as non-residential. Sites like LinkedIn, Airbnb, and Ticketmaster block entire datacenter ASNs. Subnet bans are common — if one IP gets banned, all IPs in the /24 often follow.

**Use when:** Scraping sites with low anti-bot sophistication (public APIs, static HTML sites, small e-commerce).
### Residential Proxies
Residential proxies are real consumer IP addresses — often sourced from opt-in VPN or mobile apps that sell bandwidth. Traffic appears to come from real homes, with ISP-assigned addresses.
**Pros:** High trust scores, bypass most IP-reputation checks, geographically diverse

**Cons:** Expensive ($5–$15/GB), slower than datacenter, ethical greyness around how the IPs are sourced

**Use when:** Scraping sites with aggressive anti-bot defenses: Amazon, Google, social platforms, travel sites.
### Mobile Proxies
Mobile proxies use IP addresses assigned by mobile carriers (4G/5G). These are the highest-trust IPs on the internet: carriers put thousands of subscribers behind a single NAT'd address, so blocking one IP would cut off thousands of legitimate users, and sites almost never risk it.

**Pros:** Nearly impossible to ban outright, very high trust scores

**Cons:** Very expensive ($15–$50/GB), limited pool sizes, high latency

**Use when:** You're scraping something where even residential proxies fail: heavily protected SERPs, ticketing sites, social media at scale.
## Rotation Strategies
### Round-Robin
Cycle through a proxy list sequentially. Request 1 uses proxy[0], request 2 uses proxy[1], and so on. When you hit the end of the list, wrap back to the start.
**Use when:** You have a large, homogeneous proxy pool and each request is independent.

**Risk:** Predictable patterns. If all your proxies hit the same endpoint in sequence, the server can detect the pattern even without recognizing each individual IP.
### Random Rotation
Pick a random proxy from the pool for each request. Harder to detect patterns than round-robin but offers no guarantees — you might use the same proxy twice in a row.
**Use when:** Your proxy pool is large (100+) and requests are stateless.
### Sticky Sessions
Assign a proxy to a session or workflow, not a single request. All requests in a session use the same IP. Rotate only when a session completes or fails.
**Use when:** Scraping workflows that require authentication, shopping carts, pagination — anywhere a single IP must appear consistent across multiple requests.
### Failure-Based Rotation
Don't rotate on a schedule — rotate on failure. Start with one proxy, stick with it until you get a ban signal (403, 429, CAPTCHA), then switch.
**Use when:** You have a small proxy pool and want to preserve IPs rather than burn through them unnecessarily.
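A minimal sketch of this strategy (the `FailureRotator` name and its methods are my own, not from any library): keep serving the current proxy until a ban signal arrives, then advance.

```python
import itertools

class FailureRotator:
    """Keep using the same proxy until it fails, then move to the next."""

    def __init__(self, proxies: list[str]):
        self._cycle = itertools.cycle(proxies)
        self.current = next(self._cycle)

    def mark_failed(self) -> str:
        # Call on a ban signal (403, 429, CAPTCHA): advance to the next proxy.
        self.current = next(self._cycle)
        return self.current
```

Every request reads `rotator.current`; only a ban signal triggers `rotator.mark_failed()`, so healthy proxies never get burned on a schedule.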
## Python Code: Basic Proxy Rotation

Here's a minimal round-robin rotator using `requests`:
```python
import requests
import itertools
import time

PROXIES = [
    "http://user:pass@proxy1.example.com:9001",
    "http://user:pass@proxy2.example.com:9001",
    "http://user:pass@proxy3.example.com:9001",
]

proxy_cycle = itertools.cycle(PROXIES)

def scrape(url: str) -> str | None:
    proxy = next(proxy_cycle)
    try:
        response = requests.get(
            url,
            proxies={"http": proxy, "https": proxy},
            timeout=10,
            headers={"User-Agent": "Mozilla/5.0 (compatible; research-bot/1.0)"},
        )
        response.raise_for_status()
        return response.text
    except requests.RequestException as e:
        print(f"Request failed with proxy {proxy}: {e}")
        return None

urls = ["https://example.com/page/1", "https://example.com/page/2"]
for url in urls:
    html = scrape(url)
    if html:
        print(f"Got {len(html)} bytes from {url}")
    time.sleep(1)
```
## Python Code: Rotation With Backoff and Retry

Production scrapers need retry logic. A 429 doesn't mean the proxy is permanently banned — often a brief wait is enough. Use `tenacity` for clean retry behavior:
```python
import requests
import random
from tenacity import retry, stop_after_attempt, wait_exponential, retry_if_exception_type

PROXIES = [
    "http://user:pass@proxy1.example.com:9001",
    "http://user:pass@proxy2.example.com:9001",
    "http://user:pass@proxy3.example.com:9001",
    "http://user:pass@proxy4.example.com:9001",
]
BANNED_PROXIES: set[str] = set()

def pick_proxy() -> str:
    available = [p for p in PROXIES if p not in BANNED_PROXIES]
    if not available:
        raise RuntimeError("All proxies are banned")
    return random.choice(available)

class ProxyBanned(Exception):
    pass

@retry(
    retry=retry_if_exception_type(ProxyBanned),
    stop=stop_after_attempt(5),
    wait=wait_exponential(multiplier=1, min=2, max=30),
)
def scrape_with_retry(url: str) -> str:
    proxy = pick_proxy()
    proxies = {"http": proxy, "https": proxy}
    try:
        response = requests.get(url, proxies=proxies, timeout=15)
        if response.status_code == 403:
            # Hard ban signal: drop this proxy from the pool for good.
            print(f"Proxy {proxy} banned — removing from pool")
            BANNED_PROXIES.add(proxy)
            raise ProxyBanned(f"403 on {proxy}")
        if response.status_code == 429:
            # Rate limit: retry with a different proxy, but keep this one.
            raise ProxyBanned(f"429 rate limited on {proxy}")
        response.raise_for_status()
        return response.text
    except requests.Timeout:
        raise ProxyBanned(f"Timeout on {proxy}")

# Usage
try:
    html = scrape_with_retry("https://example.com/product/123")
    print(f"Success: {len(html)} bytes")
except Exception as e:
    print(f"All retries failed: {e}")
```
## Python Code: Sticky Session Management

For multi-step workflows (login → navigate → extract), you need the same proxy for the entire session:
```python
import requests
import random
from contextlib import contextmanager

PROXY_POOL = [
    "http://user:pass@proxy1.example.com:9001",
    "http://user:pass@proxy2.example.com:9001",
    "http://user:pass@proxy3.example.com:9001",
]

@contextmanager
def proxy_session():
    """Create a requests.Session pinned to one proxy for its lifetime."""
    proxy = random.choice(PROXY_POOL)
    session = requests.Session()
    session.proxies = {"http": proxy, "https": proxy}
    session.headers.update({
        "User-Agent": "Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36",
        "Accept-Language": "en-US,en;q=0.9",
    })
    try:
        yield session
    finally:
        session.close()

# Usage — all requests in this block use the same proxy
with proxy_session() as session:
    login_response = session.post(
        "https://example.com/login",
        data={"username": "user", "password": "pass"},
    )
    profile_response = session.get("https://example.com/profile")
    data_response = session.get("https://example.com/data")
    print(f"Data: {data_response.text[:200]}")
```
## Common Pitfalls
### Subnet Detection
If you buy a /24 block of datacenter IPs from the same provider, sophisticated targets will detect the shared ASN and subnet and ban the entire range. Fix: diversify across multiple proxy providers or use residential IPs with genuinely different ASNs.
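One way to spot this risk before your target does is to measure how concentrated your pool is. A quick check using the standard-library `ipaddress` module (the helper name is mine):

```python
import ipaddress
from collections import Counter

def subnet_concentration(ips: list[str], prefix: int = 24) -> dict[str, int]:
    """Count how many proxy IPs fall into each /prefix subnet.
    Several IPs in one subnet means one ban can take them all out."""
    nets = Counter(
        ipaddress.ip_network(f"{ip}/{prefix}", strict=False) for ip in ips
    )
    return {str(net): n for net, n in nets.items()}
```

If any subnet holds a large share of your pool, spread future purchases across providers with different ASNs.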
### Geographic Mismatch
If you're scraping a US price comparison site but your proxies are in Eastern Europe, the site might serve you different content, redirect you, or flag your session as suspicious. Always match proxy geography to your target's expected audience. Most paid proxy services let you specify the country.
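A simple way to enforce the match, assuming a hypothetical pool where each entry is tagged with its exit country (the structure and names here are illustrative, not any provider's API):

```python
# Hypothetical pool: each entry records the country its IP exits from.
PROXY_POOL = [
    {"url": "http://user:pass@us1.example.com:9001", "country": "US"},
    {"url": "http://user:pass@us2.example.com:9001", "country": "US"},
    {"url": "http://user:pass@de1.example.com:9001", "country": "DE"},
]

def proxies_for_country(pool: list[dict], country: str) -> list[str]:
    """Return only proxies whose exit geography matches the target audience."""
    return [p["url"] for p in pool if p["country"] == country]
```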
### DNS Leaks
Even with an HTTP(S) proxy configured, DNS resolution can still happen locally, leaking the hostnames you're targeting to your own network and tying those lookups to your real IP. Use a proxy that resolves DNS server-side: with SOCKS proxies, `requests` supports the `socks5h://` scheme, which routes hostname resolution through the proxy itself.
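A DNS-safe config for `requests` looks like this (hostname and port are placeholders; SOCKS support requires `pip install requests[socks]`):

```python
# The "socks5h" scheme (note the trailing "h") tells requests to resolve
# hostnames on the proxy itself, so no DNS query for the target site
# ever leaves your machine. Plain "socks5" resolves locally and leaks.
PROXIES = {
    "http": "socks5h://user:pass@proxy.example.com:1080",
    "https": "socks5h://user:pass@proxy.example.com:1080",
}
# requests.get(url, proxies=PROXIES, timeout=10) then tunnels DNS too.
```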
### Not Rotating User Agents
Rotating proxies while sending identical User-Agent headers every time is like wearing a new mask but keeping the same voice. Rotate user agents in sync with proxy rotation. Match realistic browser UA strings to the OS — don't mix Windows Chrome UAs with Linux request patterns.
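One way to keep the pairing consistent is to rotate whole identities rather than independent proxy and UA lists. The structure below is a hypothetical sketch (proxy hosts are placeholders):

```python
import random

# Each proxy travels with a fixed, OS-consistent browser fingerprint
# instead of a User-Agent picked independently per request.
IDENTITIES = [
    {
        "proxy": "http://user:pass@proxy1.example.com:9001",
        "user_agent": "Mozilla/5.0 (Windows NT 10.0; Win64; x64) "
                      "AppleWebKit/537.36 (KHTML, like Gecko) "
                      "Chrome/120.0.0.0 Safari/537.36",
    },
    {
        "proxy": "http://user:pass@proxy2.example.com:9001",
        "user_agent": "Mozilla/5.0 (Macintosh; Intel Mac OS X 10_15_7) "
                      "AppleWebKit/605.1.15 (KHTML, like Gecko) "
                      "Version/17.0 Safari/605.1.15",
    },
]

def pick_identity() -> dict:
    """Rotate proxy and User-Agent together so they always stay paired."""
    return random.choice(IDENTITIES)
```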
### Rotating Too Aggressively
Burning through 50 proxies in 30 seconds on one site is more suspicious than using 3 proxies slowly. Sites look at request velocity, not just IP uniqueness. Add realistic delays (1–5 seconds between requests) and introduce jitter.
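A small helper for this (the name and default values are my own choices, not a standard):

```python
import random
import time

def polite_sleep(base: float = 1.0, spread: float = 4.0) -> float:
    """Sleep base + uniform jitter (1-5s by default) so request timing
    never looks like clockwork to the target's velocity checks."""
    delay = base + random.uniform(0, spread)
    time.sleep(delay)
    return delay
```

Call `polite_sleep()` between requests instead of a fixed `time.sleep(1)`.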
## DIY vs Managed Scraping API: The Real Trade-Off
Here's where most guides hedge. I'll be direct about the numbers.
### DIY Proxy Rotation: The Full Cost
| Item | Monthly Cost (cash or dev hours) |
|---|---|
| Residential proxy pool (10GB) | $50–$150 |
| Proxy management code (dev time) | 5–20 hrs |
| Monitoring and retry logic | 3–10 hrs |
| Debugging bans and rotations | Ongoing |
| CAPTCHA solving service | $10–$30 |
| Total | $60–$180 + ongoing dev time |
DIY makes sense when you're scraping at very high volume (millions of requests/month) where per-request pricing on managed APIs gets expensive, or when you need fine-grained control that APIs don't offer.
### Managed Scraping APIs: What You Get
| Service | Free Tier | Paid Entry | What It Handles |
|---|---|---|---|
| ScraperAPI | 5,000 credits | $49/mo | Proxies, CAPTCHAs, JS, headers |
| Scrape.do | 1,000 credits | $29/mo | Proxies, CAPTCHAs, TLS fingerprints |
| ScrapeOps | 1,000 credits | $49/mo | Proxy aggregation, monitoring |
Managed APIs handle proxy rotation, CAPTCHA solving, browser fingerprinting, and header management. One HTTP call replaces 400 lines of infrastructure code.
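In practice that one call looks something like this. The endpoint and parameter names follow ScraperAPI's documented interface (`api_key`, `url`, optional `country_code`), but verify against their current docs before relying on them:

```python
from urllib.parse import urlencode

API_KEY = "YOUR_API_KEY"  # placeholder

def scraperapi_url(target_url: str, **options) -> str:
    """Build a ScraperAPI-style request URL. The service fetches target_url
    for you, handling proxies, CAPTCHAs, and retries behind one endpoint."""
    params = {"api_key": API_KEY, "url": target_url, **options}
    return "https://api.scraperapi.com/?" + urlencode(params)

# One GET to this URL replaces the entire rotation stack:
# requests.get(scraperapi_url("https://example.com/product/123", country_code="us"))
```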
### Decision Matrix
| Scenario | Recommendation |
|---|---|
| Under 100K requests/month | Managed API — cheaper than your dev time |
| 100K–1M requests/month | Compare managed vs DIY, run the numbers |
| Over 1M requests/month | DIY usually wins on cost |
| Scraping JS-heavy sites | Managed API (browser infra is expensive to maintain) |
| Need geo-targeting | Both work — check API's country list first |
| Scraping protected sites (Cloudflare, Akamai) | Managed API — they update fingerprints constantly |
| You want full control | DIY |
| Solo dev, time-constrained | Managed API — skip the ops overhead |
| Enterprise with dedicated scraping team | DIY |
The honest answer: below 500K requests/month, the engineering time to maintain a reliable DIY rotation system almost always costs more than just paying for a managed API. Above that, the math shifts.
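To run the numbers for your own volume, here's a toy break-even model. Every constant is an illustrative assumption, not a quoted price; plug in your real proxy rates, API pricing, and loaded engineering cost:

```python
def monthly_cost_diy(requests_per_month: float,
                     gb_per_1k_requests: float = 0.05,
                     proxy_price_per_gb: float = 5.0,
                     maintenance_usd: float = 400.0) -> float:
    """Rough DIY cost: proxy bandwidth plus a flat monthly charge for
    the engineering time spent on rotation, monitoring, and ban debugging."""
    bandwidth_gb = requests_per_month / 1000 * gb_per_1k_requests
    return bandwidth_gb * proxy_price_per_gb + maintenance_usd

def monthly_cost_api(requests_per_month: float,
                     price_per_1k: float = 0.5) -> float:
    """Managed API cost at a flat per-1k-request rate."""
    return requests_per_month / 1000 * price_per_1k
```

With these (made-up) numbers the curves cross around 1.6M requests/month: below that the flat maintenance charge dominates and the API wins; above it, cheaper per-request bandwidth pulls DIY ahead.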
## Tools Worth Knowing
ScrapeOps has one of the best free proxy comparison tools in the space. Before committing to any proxy provider, run their benchmarks to see real success rates against your target sites — it's free and saves you from buying a proxy pool that won't work.
Their monitoring dashboard is also worth a look if you're running multiple scrapers — you get success rate tracking, latency histograms, and cost-per-successful-request across all your providers.
## Final Recommendations
Start with a managed API unless you have a clear reason not to. The free tiers on ScraperAPI, Scrape.do, and ScrapeOps are generous enough to build and test a complete scraper before spending a dollar.
When you hit volume thresholds where managed APIs get expensive, migrate the high-frequency scrapes to DIY rotation while keeping the complex ones (JS-heavy, CAPTCHA-protected) on managed infrastructure.
The rotation strategy matters as much as the proxy type. Use sticky sessions for authenticated workflows, random rotation for stateless scraping, and failure-based rotation when preserving proxy longevity matters more than throughput.
## Try These Services
ScraperAPI — Use code SCRAPE13833889 for 50% off your first month. Best for high-volume e-commerce scraping with structured data endpoints.
Scrape.do — Best budget option with strong Cloudflare bypass and TLS fingerprinting. Starts at $29/mo.
ScrapeOps — Best for monitoring and proxy comparison. Free benchmarking tools even on the free tier.
Get the full guide: The Complete Web Scraping Playbook 2026 — 48 pages covering proxy rotation, browser automation, CAPTCHA solving, anti-detection, and production scraper architecture. $9.
Disclosure: This article contains affiliate links. I may earn a commission if you sign up through my links, at no extra cost to you. I only recommend tools I've personally used or benchmarked.