Proxies are the backbone of any serious web scraping operation. Without them, your IP gets blocked after a few hundred requests. But not all proxies are equal — choosing the wrong type can waste your budget or get you detected anyway.
Let's break down the three main proxy types and when to use each.
## The Three Proxy Types
### Datacenter Proxies
Datacenter proxies come from cloud providers (AWS, GCP, OVH). They're fast and cheap, but websites can easily identify them because their IP ranges are publicly known.
- **Best for:** High-volume scraping of sites with minimal anti-bot protection
- **Cost:** $1-5 per GB
- **Speed:** Fastest (1-10 ms latency)
- **Detection risk:** High
### Residential Proxies
Residential proxies route traffic through real consumer ISP connections. They look like regular users browsing from home, making them much harder to detect.
- **Best for:** Scraping sites with strong anti-bot measures (Amazon, Google, social media)
- **Cost:** $5-15 per GB
- **Speed:** Medium (50-200 ms latency)
- **Detection risk:** Low
### Mobile Proxies
Mobile proxies use 4G/5G connections from real mobile carriers. Since carriers use CGNAT (shared IPs), blocking a mobile IP would block thousands of real users. Sites rarely block them.
- **Best for:** The most protected sites, account-related operations
- **Cost:** $15-30 per GB
- **Speed:** Slowest (100-500 ms latency)
- **Detection risk:** Very low
## Comparison Table
| Feature | Datacenter | Residential | Mobile |
|---|---|---|---|
| Speed | ★★★★★ | ★★★ | ★★ |
| Cost | ★★★★★ | ★★★ | ★ |
| Stealth | ★★ | ★★★★ | ★★★★★ |
| IP Pool | 10K-100K | 10M-50M | 1M-5M |
| Best Use | Bulk scraping | Protected sites | High-value targets |
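To see what those per-GB prices mean for a budget, here's a quick back-of-the-envelope calculator. The prices are mid-range values from the table above, and the 0.5 MB average page size is an illustrative assumption, not a vendor quote:

```python
# Rough monthly bandwidth cost per proxy type (illustrative mid-range USD prices).
PRICE_PER_GB = {"datacenter": 3.0, "residential": 10.0, "mobile": 22.0}

def estimate_monthly_cost(requests_per_month: int, avg_page_mb: float, proxy_type: str) -> float:
    """Estimate cost as total traffic (requests * page size) billed per GB."""
    gb_used = requests_per_month * avg_page_mb / 1024
    return gb_used * PRICE_PER_GB[proxy_type]

# 1M requests at ~0.5 MB each is roughly 488 GB of traffic
for ptype in PRICE_PER_GB:
    print(f"{ptype}: ${estimate_monthly_cost(1_000_000, 0.5, ptype):,.2f}/month")
```

At that volume the gap between tiers is large, which is why the tiered strategy later in this post matters.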
## Implementing Proxy Rotation in Python
Here's a practical proxy rotation setup:
```python
import random
from itertools import cycle

import requests


class ProxyRotator:
    def __init__(self, proxies: list[str]):
        self.proxies = proxies
        self.proxy_pool = cycle(proxies)
        self.failed_proxies = set()

    def get_next_proxy(self) -> dict:
        # Guard: without this, the while loop below spins forever
        # once every proxy has been marked as failed
        if len(self.failed_proxies) >= len(self.proxies):
            raise RuntimeError("All proxies have failed")
        proxy = next(self.proxy_pool)
        while proxy in self.failed_proxies:
            proxy = next(self.proxy_pool)
        return {"http": proxy, "https": proxy}

    def mark_failed(self, proxy: str):
        self.failed_proxies.add(proxy)

    def fetch(self, url: str, max_retries: int = 3) -> requests.Response | None:
        for attempt in range(max_retries):
            proxy_dict = self.get_next_proxy()
            try:
                response = requests.get(
                    url,
                    proxies=proxy_dict,
                    timeout=15,
                    headers={"User-Agent": "Mozilla/5.0"},
                )
                if response.status_code == 200:
                    return response
            except requests.RequestException:
                self.mark_failed(proxy_dict["http"])
        return None


# Usage
proxies = [
    "http://user:pass@proxy1.example.com:8080",
    "http://user:pass@proxy2.example.com:8080",
    "http://user:pass@proxy3.example.com:8080",
]

rotator = ProxyRotator(proxies)
response = rotator.fetch("https://example.com/data")
```
## Smart Proxy Rotation Strategies
### 1. Geo-Targeted Rotation
Match your proxy location to your target:
```python
import random

proxy_list = {
    "us": ["http://us-proxy1:8080", "http://us-proxy2:8080"],
    "uk": ["http://uk-proxy1:8080", "http://uk-proxy2:8080"],
    "de": ["http://de-proxy1:8080", "http://de-proxy2:8080"],
}

def get_geo_proxy(target_country: str, proxy_list: dict) -> str:
    """Select a proxy matching the target site's country, falling back to US."""
    country_proxies = proxy_list.get(target_country, proxy_list["us"])
    return random.choice(country_proxies)
```
### 2. Sticky Sessions
Some scraping tasks need the same IP across multiple requests:
```python
import hashlib

def get_sticky_proxy(session_id: str, proxy_list: list[str]) -> str:
    """Deterministically map a session ID to one proxy via hashing."""
    index = int(hashlib.md5(session_id.encode()).hexdigest(), 16) % len(proxy_list)
    return proxy_list[index]

# The same session_id always gets the same proxy
proxy = get_sticky_proxy("user_session_123", proxies)
```
### 3. Tiered Proxy Strategy
Start cheap, escalate only when needed:
```python
import random
import requests

def fetch_with_proxy(url: str, proxy: str) -> requests.Response | None:
    """Single attempt through one proxy; None on connection failure."""
    try:
        return requests.get(url, proxies={"http": proxy, "https": proxy}, timeout=15)
    except requests.RequestException:
        return None

def tiered_fetch(url: str, datacenter_proxies: list[str], residential_proxies: list[str]):
    # Try datacenter first (cheap)
    response = fetch_with_proxy(url, random.choice(datacenter_proxies))
    if response and response.status_code == 200:
        return response
    # Escalate to residential (expensive but reliable)
    return fetch_with_proxy(url, random.choice(residential_proxies))
```
## Proxy Provider Integration
Most proxy providers offer a single gateway endpoint that handles rotation:
```python
import requests

# Provider gateway handles rotation automatically
proxy_url = "http://user:pass@gateway.provider.com:7777"

response = requests.get(
    "https://target-site.com/data",
    proxies={"http": proxy_url, "https": proxy_url},
    timeout=30,
)
```
For reliable residential and mobile proxy access with automatic rotation, ThorData offers competitive pricing and a large IP pool.
## Common Proxy Mistakes
- **Using the same proxy for too many requests:** rotate after every 5-10 requests
- **Not matching proxy location to target:** a German proxy scraping a US site looks suspicious
- **Ignoring proxy speed:** slow proxies create timeouts that waste your budget
- **Not handling proxy failures:** always implement retry logic with fallback proxies
- **Sending too many concurrent requests:** even with proxies, pace your requests
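The first and last mistakes in the list above can both be handled by one small wrapper. This is a minimal sketch; the 8-request rotation threshold and 1-second delay are illustrative defaults you'd tune per target:

```python
import time
from itertools import cycle

class PacedRotator:
    """Rotate to a fresh proxy every N requests and pace requests with a delay."""

    def __init__(self, proxies: list[str], rotate_every: int = 8, delay: float = 1.0):
        self.pool = cycle(proxies)
        self.rotate_every = rotate_every
        self.delay = delay
        self.current = next(self.pool)
        self.uses = 0

    def next_proxy(self) -> str:
        # Pace: wait between requests even though we have many IPs available
        if self.uses:
            time.sleep(self.delay)
        # Rotate once the current proxy has served enough requests
        if self.uses >= self.rotate_every:
            self.current = next(self.pool)
            self.uses = 0
        self.uses += 1
        return self.current
```

Call `next_proxy()` before each request and pass the result into your `requests` call; the wrapper enforces both the rotation cadence and the pacing in one place.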
## Conclusion
Start with datacenter proxies for basic scraping, upgrade to residential for protected sites, and reserve mobile proxies for the toughest targets. A tiered strategy saves money while maintaining high success rates.
For a reliable proxy solution with all three types, check out ThorData — they offer flexible plans that scale with your scraping needs.
Happy scraping!