agenthustler
Web Scraping with Proxies: Residential vs Datacenter vs Mobile in 2026

Proxies are the backbone of any serious web scraping operation. Without them, your IP gets blocked after a few hundred requests. But not all proxies are equal — choosing the wrong type can waste your budget or get you detected anyway.

Let's break down the three main proxy types and when to use each.

The Three Proxy Types

Datacenter Proxies

Datacenter proxies come from cloud providers (AWS, GCP, OVH). They're fast and cheap, but websites can easily identify them because their IP ranges are publicly known.

Best for: High-volume scraping of sites with minimal anti-bot protection
Cost: $1-5 per GB
Speed: Fastest (1-10ms latency)
Detection risk: High

Residential Proxies

Residential proxies route traffic through real consumer ISP connections. They look like regular users browsing from home, making them much harder to detect.

Best for: Scraping sites with strong anti-bot measures (Amazon, Google, social media)
Cost: $5-15 per GB
Speed: Medium (50-200ms latency)
Detection risk: Low

Mobile Proxies

Mobile proxies use 4G/5G connections from real mobile carriers. Since carriers use CGNAT (shared IPs), blocking a mobile IP would block thousands of real users. Sites rarely block them.

Best for: The most protected sites, account-related operations
Cost: $15-30 per GB
Speed: Slowest (100-500ms latency)
Detection risk: Very low

Comparison Table

| Feature  | Datacenter    | Residential     | Mobile             |
|----------|---------------|-----------------|--------------------|
| Speed    | ★★★★★         | ★★★             | ★★                 |
| Cost     | ★★★★★         | ★★★             | ★★                 |
| Stealth  | ★★            | ★★★★            | ★★★★★              |
| IP Pool  | 10K-100K      | 10M-50M         | 1M-5M              |
| Best Use | Bulk scraping | Protected sites | High-value targets |
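The per-GB prices above translate directly into a monthly budget. Here's a quick estimator; the 50 GB/month workload is a made-up example for illustration, not a recommendation:

```python
# Per-GB price ranges (USD) taken from the sections above
PRICE_PER_GB = {
    "datacenter": (1, 5),
    "residential": (5, 15),
    "mobile": (15, 30),
}

def monthly_cost(proxy_type: str, gb_per_month: float) -> tuple[float, float]:
    """Return (low, high) estimated monthly spend in USD."""
    low, high = PRICE_PER_GB[proxy_type]
    return (low * gb_per_month, high * gb_per_month)

# Hypothetical 50 GB/month workload
for ptype in PRICE_PER_GB:
    low, high = monthly_cost(ptype, 50)
    print(f"{ptype}: ${low:.0f}-${high:.0f}/month")
```

Running the numbers like this before committing to a provider makes the tiered strategy discussed below much easier to justify.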

Implementing Proxy Rotation in Python

Here's a practical proxy rotation setup:

import requests
import random
from itertools import cycle

class ProxyRotator:
    def __init__(self, proxies: list[str]):
        self.proxies = proxies
        self.proxy_pool = cycle(proxies)
        self.failed_proxies = set()

    def get_next_proxy(self) -> dict:
        # Guard against spinning forever once every proxy has failed
        if len(self.failed_proxies) >= len(self.proxies):
            raise RuntimeError("All proxies have failed")
        proxy = next(self.proxy_pool)
        while proxy in self.failed_proxies:
            proxy = next(self.proxy_pool)
        return {"http": proxy, "https": proxy}

    def mark_failed(self, proxy: str):
        self.failed_proxies.add(proxy)

    def fetch(self, url: str, max_retries: int = 3) -> requests.Response | None:
        for attempt in range(max_retries):
            try:
                proxy_dict = self.get_next_proxy()
            except RuntimeError:
                return None  # no working proxies left
            try:
                response = requests.get(
                    url,
                    proxies=proxy_dict,
                    timeout=15,
                    headers={"User-Agent": "Mozilla/5.0"},
                )
                if response.status_code == 200:
                    return response
            except requests.RequestException:
                self.mark_failed(proxy_dict["http"])
        return None

# Usage
proxies = [
    "http://user:pass@proxy1.example.com:8080",
    "http://user:pass@proxy2.example.com:8080",
    "http://user:pass@proxy3.example.com:8080",
]

rotator = ProxyRotator(proxies)
response = rotator.fetch("https://example.com/data")

Smart Proxy Rotation Strategies

1. Geo-Targeted Rotation

Match your proxy location to your target:

def get_geo_proxy(target_country: str, proxy_list: dict) -> str:
    """Select proxy matching target site's country."""
    country_proxies = proxy_list.get(target_country, proxy_list["us"])
    return random.choice(country_proxies)

proxy_list = {
    "us": ["http://us-proxy1:8080", "http://us-proxy2:8080"],
    "uk": ["http://uk-proxy1:8080", "http://uk-proxy2:8080"],
    "de": ["http://de-proxy1:8080", "http://de-proxy2:8080"],
}

2. Sticky Sessions

Some scraping tasks need the same IP across multiple requests:

import hashlib

def get_sticky_proxy(session_id: str, proxy_list: list[str]) -> str:
    """Consistent proxy for same session using hashing."""
    index = int(hashlib.md5(session_id.encode()).hexdigest(), 16) % len(proxy_list)
    return proxy_list[index]

# Same session_id always gets same proxy
proxy = get_sticky_proxy("user_session_123", proxies)

3. Tiered Proxy Strategy

Start cheap, escalate only when needed:

def fetch_with_proxy(url: str, proxy: str) -> requests.Response | None:
    """Single request through one proxy; None on network failure."""
    try:
        return requests.get(
            url,
            proxies={"http": proxy, "https": proxy},
            timeout=15,
        )
    except requests.RequestException:
        return None

def tiered_fetch(url: str, datacenter_proxies, residential_proxies):
    # Try datacenter first (cheap)
    response = fetch_with_proxy(url, random.choice(datacenter_proxies))
    if response is not None and response.status_code == 200:
        return response

    # Escalate to residential (expensive but reliable)
    return fetch_with_proxy(url, random.choice(residential_proxies))

Proxy Provider Integration

Most proxy providers offer a single gateway endpoint that handles rotation:

import requests

# Provider gateway handles rotation automatically
proxy_url = "http://user:pass@gateway.provider.com:7777"

response = requests.get(
    "https://target-site.com/data",
    proxies={"http": proxy_url, "https": proxy_url},
    timeout=30
)

For reliable residential and mobile proxy access with automatic rotation, ThorData offers competitive pricing and a large IP pool.

Common Proxy Mistakes

  1. Using the same proxy for too many requests — rotate after every 5-10 requests
  2. Not matching proxy location to target — a German proxy scraping a US site looks suspicious
  3. Ignoring proxy speed — slow proxies create timeouts that waste your budget
  4. Not handling proxy failures — always implement retry logic with fallback proxies
  5. Sending too many concurrent requests — even with proxies, pace your requests
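Mistakes 1 and 5 can both be handled with a little bookkeeping. Here's a minimal sketch that rotates after a fixed quota of requests and adds a randomized delay between them; the 8-request threshold and 1-2 second delay range are illustrative defaults, not hard rules:

```python
import random
import time
from itertools import cycle

class PacedRotator:
    """Rotate proxies every few requests and pace requests with a random delay."""

    def __init__(self, proxies, requests_per_proxy=8, min_delay=1.0, max_delay=2.0):
        self.pool = cycle(proxies)
        self.requests_per_proxy = requests_per_proxy
        self.min_delay = min_delay
        self.max_delay = max_delay
        self.current = next(self.pool)
        self.used = 0

    def next_proxy(self) -> str:
        # Rotate once the current proxy has served its quota (mistake #1)
        if self.used >= self.requests_per_proxy:
            self.current = next(self.pool)
            self.used = 0
        self.used += 1
        return self.current

    def pace(self):
        # Random delay between requests avoids a robotic, fixed-interval
        # traffic pattern (mistake #5)
        time.sleep(random.uniform(self.min_delay, self.max_delay))
```

In a scraping loop you would call `pace()` before each request and `next_proxy()` to pick the proxy, which keeps any single IP from serving a suspicious burst of traffic.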

Conclusion

Start with datacenter proxies for basic scraping, upgrade to residential for protected sites, and reserve mobile proxies for the toughest targets. A tiered strategy saves money while maintaining high success rates.

For a reliable proxy solution with all three types, check out ThorData — they offer flexible plans that scale with your scraping needs.

Happy scraping!
