agenthustler

Posted on Mar 25

ThorData Proxies Tutorial: Rotate Residential IPs for Web Scraping in 2026

#webdev #programming #tutorial #python

Every web scraper hits the same wall: IP bans, CAPTCHAs, and geo-restricted content. You rotate user agents, add delays, retry failed requests — and still get blocked after a few hundred requests.

The fix? Residential proxies that route your traffic through real consumer IPs. In this tutorial, I'll show you how to use ThorData's residential proxy network to scrape reliably with Python — from basic requests to async rotation patterns.

Why Residential Proxies Matter for Scraping

Datacenter proxies are cheap but easy to detect. Websites fingerprint IP ranges owned by cloud providers (AWS, GCP, Hetzner) and block them aggressively.

Residential proxies use IPs assigned to real ISPs and households. To a target website, your request looks like a regular user browsing from their home. This matters for three scenarios:

Anti-bot systems (Cloudflare, DataDome, PerimeterX) that block datacenter IPs on sight
Geo-restricted content — you need an IP in Brazil to see Brazilian pricing, a UK IP for UK news archives
Rate limits — rotating through thousands of IPs means no single IP gets flagged

Without residential proxies, you're fighting an arms race you'll lose. With them, you're invisible.

ThorData: What You Get

ThorData runs a residential proxy pool of 60M+ IPs across 195+ countries. Here's what makes it practical for developers:

Pay-as-you-go pricing — no monthly commitment, no contracts. You pay per GB of traffic.
Sticky sessions — keep the same IP for up to 30 minutes when you need session persistence (login flows, paginated scraping).
City-level targeting — target specific countries, states, or cities via the proxy username string.
HTTP/HTTPS/SOCKS5 support — works with any HTTP client library.
Dashboard with real-time usage — track bandwidth, success rates, and costs.

For scraping projects, the combination of large IP pool + granular geo-targeting + no contracts is hard to beat.

Getting Started: Sign Up and Get Credentials

Create an account at ThorData. You'll get a free trial with bandwidth to test.
Once logged in, navigate to the Dashboard → Residential Proxies section.
Copy your proxy credentials: you'll get a host, port, username, and password.

Your proxy endpoint will look like this:

Host: geo.thordata.net
Port: 9000
Username: your-username-res-any
Password: your-password

The -res-any suffix in the username tells ThorData to use any available residential IP. You can change any to a country code like us, gb, or br for geo-targeting.

Basic Python Tutorial: Requests with ThorData Proxy

Let's start with the simplest approach — using the requests library with ThorData as your proxy:

import requests

PROXY_HOST = "geo.thordata.net"
PROXY_PORT = 9000
PROXY_USER = "your-username-res-any"
PROXY_PASS = "your-password"

proxy_url = f"http://{PROXY_USER}:{PROXY_PASS}@{PROXY_HOST}:{PROXY_PORT}"

proxies = {
    "http": proxy_url,
    "https": proxy_url,
}

response = requests.get(
    "https://httpbin.org/ip",
    proxies=proxies,
    timeout=30,
)

print(response.json())
# {"origin": "186.215.xx.xx"}  — a residential IP, not your server's IP

Each request gets a random IP from the pool. No configuration needed — ThorData handles rotation automatically.

Geo-Targeting

Target a specific country by changing the username suffix:

# US IPs only
PROXY_USER = "your-username-res-us"

# UK IPs only
PROXY_USER = "your-username-res-gb"

# Brazil, São Paulo specifically
PROXY_USER = "your-username-res-br-city-saopaulo"

Sticky Sessions

When you need the same IP across multiple requests (e.g., maintaining a logged-in session), add a session ID:

import random

session_id = random.randint(10000, 99999)
PROXY_USER = f"your-username-res-us-session-{session_id}"

# All requests with this proxy_user will use the same IP
# for up to 30 minutes

Error Handling

Production scrapers need to handle proxy errors gracefully:

import requests
from requests.exceptions import ProxyError, Timeout

def fetch_with_proxy(url, max_retries=3):
    for attempt in range(max_retries):
        try:
            response = requests.get(url, proxies=proxies, timeout=30)

            if response.status_code == 407:
                raise Exception("Proxy auth failed — check credentials")

            if response.status_code == 403:
                print(f"Blocked on attempt {attempt + 1}, rotating IP...")
                continue  # next attempt gets a new IP automatically

            response.raise_for_status()
            return response

        except (ProxyError, Timeout) as e:
            print(f"Proxy error on attempt {attempt + 1}: {e}")
            if attempt == max_retries - 1:
                raise

    return None

# Usage
result = fetch_with_proxy("https://example.com/data")
if result:
    print(result.text[:200])

Key points:

407 means your proxy credentials are wrong. Don't retry — fix the username/password.
403 often means the target site blocked that IP. A retry gets a fresh IP automatically.
Timeouts happen with residential proxies more than datacenter ones. Set a reasonable timeout (30s) and retry.

Advanced: Async Scraping with aiohttp

For high-throughput scraping, use aiohttp to make concurrent requests through ThorData:

import asyncio
import aiohttp

PROXY_URL = "http://user-res-any:pass@geo.thordata.net:9000"

async def fetch(session, url):
    for attempt in range(3):
        try:
            async with session.get(
                url, proxy=PROXY_URL, timeout=aiohttp.ClientTimeout(total=30)
            ) as response:
                if response.status == 403:
                    print(f"403 on {url}, retrying...")
                    await asyncio.sleep(1)
                    continue
                return await response.text()
        except (aiohttp.ClientProxyConnectionError, asyncio.TimeoutError) as e:
            print(f"Error on {url}: {e}")
            if attempt == 2:
                return None
    return None

async def scrape_batch(urls, concurrency=10):
    semaphore = asyncio.Semaphore(concurrency)

    async def bounded_fetch(session, url):
        async with semaphore:
            return await fetch(session, url)

    async with aiohttp.ClientSession() as session:
        tasks = [bounded_fetch(session, url) for url in urls]
        return await asyncio.gather(*tasks)

# Usage
urls = [f"https://httpbin.org/ip?n={i}" for i in range(50)]
results = asyncio.run(scrape_batch(urls, concurrency=10))

successful = [r for r in results if r is not None]
print(f"Fetched {len(successful)}/{len(urls)} URLs successfully")

This pattern gives you:

10 concurrent requests through different residential IPs
Automatic retries on 403s and timeouts
Backpressure via semaphore so you don't overwhelm the proxy or target

For larger jobs, bump concurrency to 20-50. ThorData handles the IP rotation — you just need to manage your own request rate to stay polite to target servers.

Cost Comparison: ThorData vs Competitors

Here's how ThorData stacks up against the two biggest residential proxy providers as of early 2026:

Feature	ThorData	Bright Data	Oxylabs
Pool size	60M+ IPs	72M+ IPs	100M+ IPs
Entry price	~$2/GB (pay-as-you-go)	~$8/GB (min $500/mo)	~$8/GB (min $300/mo)
Contract required	No	Yes (monthly)	Yes (monthly)
Geo-targeting	Country, city	Country, city, ASN	Country, city
Sticky sessions	Up to 30 min	Up to 10 min	Up to 30 min
SOCKS5 support	Yes	Yes	Yes
Free trial	Yes	Yes (limited)	Yes (limited)

The big difference is the entry price and commitment. Bright Data and Oxylabs are enterprise-focused — their per-GB rates are competitive at scale, but you're locked into $300-500/month minimums.

ThorData lets you start at a few dollars and scale up. For indie developers, side projects, and early-stage startups, that pay-as-you-go model is significantly more practical. You're not burning $500/month while you figure out if your scraping project even works.

Conclusion

Residential proxies are the difference between a scraper that works on your laptop and one that works in production. ThorData gives you the IP diversity and geo-targeting you need without enterprise pricing or contracts.

To recap what we covered:

Basic proxy setup with Python requests — one line to add proxy support
Geo-targeting and sticky sessions via the username string
Error handling for 407, 403, and timeout scenarios
Async scraping with aiohttp for high-throughput jobs
Cost comparison showing ThorData's advantage for pay-as-you-go usage

Ready to try it? Sign up for ThorData and grab your proxy credentials. The free trial gives you enough bandwidth to test everything in this tutorial. Start with the basic requests example, verify your IP is rotating, then scale up to async when you need throughput.

Happy scraping.

DEV Community