A workable proxy rotation system has five moving parts: a proxy pool, a selector, request-level proxy injection, failure tracking, and a verification step that proves the outgoing IP behavior matches your design. Requests supports per-request proxies, explicit timeouts, retries through adapters, and optional SOCKS support, while Scrapy’s proxy flow centers on request.meta['proxy'] and HttpProxyMiddleware.
If you already have a scraper and you want to add rotation without a wrapper library, the cleanest implementation is to keep one internal proxy model, rotate at the request layer, put unstable proxies on cooldown instead of deleting them immediately, and verify everything against an IP-echo endpoint before you touch your real target. Requests also documents that session.proxies can be overridden by environment proxy variables, which is why per-request proxy assignment is the safer default in a custom rotator.
Prerequisites
You need three things before you start:
- A scraper that already makes successful requests.
- A proxy list from your provider, either as full URLs or as host, port, username, and password fields.
- A decision about whether your workflow is stateless or session-sensitive.
That last decision matters because both Requests and Scrapy make per-request proxy assignment straightforward, but your target workflow may still require continuity across several requests. Scrapy’s proxy handling uses request metadata, and Requests lets you pass a fresh proxies dict on each call, so the library side is easy; the hard part is deciding when rotation should happen.
Use per-request rotation when each page fetch stands alone. Use a sticky session when login state, carts, multi-step forms, or account flows need the same network identity across several requests.
How Does a Proxy Rotation System Actually Work?
Keep the design simple:
- The proxy pool stores usable endpoints and their current state.
- The selector chooses the next proxy.
- The request hook injects that proxy into the request.
- The feedback loop marks success or failure.
- The verification step confirms that rotation is actually happening.
That model lines up with how public Scrapy rotation packages are described: they focus on rotating proxies, checking that endpoints are alive, and reacting to failures, not just loading a static text file.
Your internal proxy record should hold at least:
- id
- scheme
- host
- port
- username
- password
- fail_count
- cooldown_until
- last_status
- session_label, if your provider exposes sticky-session routing
Here is a minimal model:
```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class ProxyEndpoint:
    id: str
    scheme: str
    host: str
    port: int
    username: Optional[str] = None
    password: Optional[str] = None
    fail_count: int = 0
    cooldown_until: float = 0.0  # epoch seconds; 0 means "available now"
    last_status: Optional[int] = None
    session_label: Optional[str] = None  # only if your provider routes sticky sessions

    def as_url(self) -> str:
        auth = ""
        if self.username and self.password:
            auth = f"{self.username}:{self.password}@"
        return f"{self.scheme}://{auth}{self.host}:{self.port}"

    def as_requests_proxies(self) -> dict:
        # Requests expects one proxy URL per outgoing scheme.
        url = self.as_url()
        return {"http": url, "https": url}
```
And a small pool object:
```python
import random
import time


class ProxyPool:
    def __init__(self, proxies, max_failures=3, cooldown_seconds=120):
        self.proxies = proxies
        self.max_failures = max_failures
        self.cooldown_seconds = cooldown_seconds

    def available(self):
        now = time.time()
        return [p for p in self.proxies if p.cooldown_until <= now]

    def choose(self):
        candidates = self.available()
        if not candidates:
            raise RuntimeError("No proxies are currently available")
        return random.choice(candidates)

    def mark_success(self, proxy, status_code=None):
        # A success clears the failure streak and any pending cooldown.
        proxy.fail_count = 0
        proxy.cooldown_until = 0.0
        proxy.last_status = status_code

    def mark_failure(self, proxy, status_code=None):
        proxy.fail_count += 1
        proxy.last_status = status_code
        if proxy.fail_count >= self.max_failures:
            # Bench the proxy instead of deleting it; it may recover.
            proxy.cooldown_until = time.time() + self.cooldown_seconds
```
Don’t overcomplicate failure handling on day one. A timeout may be transient, a 429 usually points to pacing, and a 407 usually means the proxy URL or credentials are wrong.
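If you later want to encode that triage in code, a tiny classifier is enough to start with. The function below is a hypothetical helper for this article, not a library API, and the categories are this guide's own convention:

```python
def classify_failure(status_code=None, exc=None):
    # Hypothetical triage helper: map a failed attempt to a next action.
    if status_code == 407:
        return "fix-credentials"  # bad proxy URL or auth; retrying won't help
    if status_code == 429:
        return "slow-down"        # pacing problem; back off before rotating
    if exc is not None:
        return "maybe-transient"  # timeout/connection error; count toward cooldown
    return "count-failure"
```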
How Do You Rotate Proxies in Requests Without a Wrapper?
Requests already gives you the pieces you need. The official advanced usage docs show per-request proxies, Session reuse, timeout tuples, proxy authentication in the proxy URL, SOCKS support, and retry support through HTTPAdapter and urllib3.util.Retry.
The values in the sample below are example starter values for this article’s reference implementation, not universal defaults. You should tune them after you run the verification section.
```python
import time

from requests import Session
from requests.adapters import HTTPAdapter
from requests.exceptions import (
    ConnectionError,
    ConnectTimeout,
    ProxyError,
    ReadTimeout,
    SSLError,
)
from urllib3.util import Retry


def build_session():
    session = Session()
    session.headers.update({
        "User-Agent": "Mozilla/5.0 (compatible; scraper/1.0)"
    })
    # Disable urllib3-level retries: the rotator below owns all retry decisions.
    retries = Retry(total=0, backoff_factor=0, status_forcelist=[])
    adapter = HTTPAdapter(max_retries=retries)
    session.mount("http://", adapter)
    session.mount("https://", adapter)
    return session


def should_retry_status(status_code: int) -> bool:
    return status_code in {403, 407, 429, 500, 502, 503, 504}


def backoff_seconds(attempt: int) -> float:
    # Exponential backoff, capped at 8 seconds.
    return min(2 ** attempt, 8)


def fetch_with_rotation(url: str, pool: ProxyPool, attempts: int = 5):
    session = build_session()
    last_error = None
    for attempt in range(attempts):
        proxy = pool.choose()
        try:
            response = session.get(
                url,
                proxies=proxy.as_requests_proxies(),
                timeout=(3.05, 20),  # (connect, read) tuple
            )
            if should_retry_status(response.status_code):
                pool.mark_failure(proxy, response.status_code)
                last_error = RuntimeError(f"Retryable status {response.status_code}")
                time.sleep(backoff_seconds(attempt))
                continue
            pool.mark_success(proxy, response.status_code)
            return response, proxy
        except (ProxyError, ConnectTimeout, ReadTimeout, ConnectionError, SSLError) as exc:
            pool.mark_failure(proxy)
            last_error = exc
            time.sleep(backoff_seconds(attempt))
    raise RuntimeError(f"All attempts failed. Last error: {last_error}")
```
This pattern works well for three reasons. First, Requests does not time out by default, so the docs recommend explicit timeouts on nearly all external calls. Second, the docs distinguish a single timeout value from a (connect, read) tuple, which is why the code separates connect and read limits. Third, Requests notes that session.proxies can be overridden by environment proxy variables, which is why the code above passes the proxy at request time instead of setting it on the session.
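If you want to rule out environment interference entirely, Requests sessions also expose a trust_env flag; turning it off stops Requests from reading proxy settings (and other things, such as .netrc credentials and environment CA bundle paths) from the environment. A minimal sketch:

```python
from requests import Session

session = Session()
# Ignore HTTP_PROXY / HTTPS_PROXY / NO_PROXY from the environment.
# Note: trust_env=False also disables .netrc auth and env CA bundle settings.
session.trust_env = False
```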
Use it like this:
```python
proxies = [
    ProxyEndpoint("p1", "http", "proxy1.example.com", 8000, "user", "pass"),
    ProxyEndpoint("p2", "http", "proxy2.example.com", 8000, "user", "pass"),
]
pool = ProxyPool(proxies, max_failures=2, cooldown_seconds=90)

response, proxy = fetch_with_rotation("https://httpbin.org/ip", pool)
print("Used proxy:", proxy.id)
print("Response:", response.text)
```
Verify Requests Rotation
Do not point this at your production target first. Use an IP-echo endpoint and confirm:
- The selected proxy ID changes over several runs.
- The returned IP changes the way your provider says it should.
- A bad proxy enters cooldown and the code keeps moving.
A basic check loop is enough:
```python
for i in range(10):
    try:
        r, p = fetch_with_rotation("https://httpbin.org/ip", pool)
        print(i, p.id, r.status_code, r.text)
    except Exception as e:
        print(i, "failed", e)
```
The output shape you want is boring: several successful responses, changing proxy IDs, and no full-stop crash when one endpoint goes bad.
socks5 vs socks5h
Requests supports SOCKS as an optional feature, and its docs draw an important line between socks5 and socks5h: socks5 resolves DNS on the client, while socks5h resolves DNS on the proxy server. If hostname resolution or observed region looks wrong, check that first.
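A minimal sketch, assuming the SOCKS extra is installed (pip install "requests[socks]") and that your provider exposes a SOCKS5 endpoint at the hypothetical host below:

```python
import requests

# socks5h:// hands the hostname to the proxy for DNS resolution;
# socks5:// would resolve DNS on the client before connecting.
proxies = {
    "http": "socks5h://user:pass@socks-proxy.example.com:1080",
    "https": "socks5h://user:pass@socks-proxy.example.com:1080",
}
response = requests.get("https://httpbin.org/ip", proxies=proxies, timeout=(3.05, 20))
print(response.text)
```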
How Do You Rotate Proxies in Scrapy Without a Wrapper?
Scrapy’s proxy path is built around request.meta['proxy']. Zyte’s Scrapy proxy write-up explains that HttpProxyMiddleware reads the proxy value from request metadata, and Scrapy’s published middleware source shows the same thing in code.
So your custom middleware only needs to pick a proxy and attach it before the request reaches the downloader.
The values below are sample project values for the example code, not framework defaults.
```python
# middlewares.py
import random
import time

from scrapy.exceptions import IgnoreRequest


class RotatingProxyMiddleware:
    RETRYABLE_STATUSES = {403, 407, 429, 500, 502, 503, 504}

    def __init__(self, proxies, max_failures=3, cooldown_seconds=120, max_retry_times=3):
        self.proxies = proxies
        self.max_failures = max_failures
        self.cooldown_seconds = cooldown_seconds
        self.max_retry_times = max_retry_times

    @classmethod
    def from_crawler(cls, crawler):
        raw = crawler.settings.getlist("ROTATION_PROXY_LIST")
        proxies = []
        for idx, item in enumerate(raw):
            proxies.append({
                "id": f"p{idx + 1}",
                "url": item,
                "fail_count": 0,
                "cooldown_until": 0.0,
                "last_status": None,
            })
        return cls(
            proxies=proxies,
            max_failures=crawler.settings.getint("ROTATION_MAX_FAILURES", 3),
            cooldown_seconds=crawler.settings.getint("ROTATION_COOLDOWN_SECONDS", 120),
            max_retry_times=crawler.settings.getint("ROTATION_MAX_RETRY_TIMES", 3),
        )

    def available(self):
        now = time.time()
        return [p for p in self.proxies if p["cooldown_until"] <= now]

    def choose(self):
        candidates = self.available()
        if not candidates:
            raise IgnoreRequest("No proxies currently available")
        return random.choice(candidates)

    def mark_success(self, proxy, status=None):
        proxy["fail_count"] = 0
        proxy["cooldown_until"] = 0.0
        proxy["last_status"] = status

    def mark_failure(self, proxy, status=None):
        proxy["fail_count"] += 1
        proxy["last_status"] = status
        if proxy["fail_count"] >= self.max_failures:
            proxy["cooldown_until"] = time.time() + self.cooldown_seconds

    def _retry(self, request):
        # Build a retry copy with the proxy keys cleared so process_request
        # assigns a fresh endpoint. Returns None once retries are exhausted.
        retry_times = request.meta.get("rotation_retry_times", 0) + 1
        if retry_times > self.max_retry_times:
            return None
        new_request = request.copy()
        new_request.dont_filter = True
        new_request.meta.pop("rotation_proxy", None)
        new_request.meta.pop("proxy", None)
        new_request.meta["rotation_retry_times"] = retry_times
        return new_request

    def process_request(self, request, spider):
        if request.meta.get("disable_proxy_rotation"):
            return None
        if "rotation_proxy" not in request.meta:
            proxy = self.choose()
            request.meta["rotation_proxy"] = proxy
            # HttpProxyMiddleware reads this key and applies the proxy.
            request.meta["proxy"] = proxy["url"]
        return None

    def process_response(self, request, response, spider):
        proxy = request.meta.get("rotation_proxy")
        if not proxy:
            return response
        if response.status in self.RETRYABLE_STATUSES:
            self.mark_failure(proxy, response.status)
            retry = self._retry(request)
            if retry is not None:
                return retry
        else:
            self.mark_success(proxy, response.status)
        return response

    def process_exception(self, request, exception, spider):
        proxy = request.meta.get("rotation_proxy")
        if proxy:
            self.mark_failure(proxy)
            return self._retry(request)  # None lets other middlewares handle it
        return None
```
Use settings like this:
```python
# settings.py
DOWNLOADER_MIDDLEWARES = {
    "myproject.middlewares.RotatingProxyMiddleware": 610,
}

ROTATION_PROXY_LIST = [
    "http://user:pass@proxy1.example.com:8000",
    "http://user:pass@proxy2.example.com:8000",
]
ROTATION_MAX_FAILURES = 2
ROTATION_COOLDOWN_SECONDS = 90
ROTATION_MAX_RETRY_TIMES = 3

DOWNLOAD_TIMEOUT = 20
RETRY_ENABLED = False  # the middleware owns retries; avoid double-retrying
```
And a tiny spider:
```python
# spiders/ip_test.py
import scrapy


class IpSpider(scrapy.Spider):
    name = "ip_test"

    def start_requests(self):
        for _ in range(10):
            # dont_filter=True, or the dupefilter collapses identical URLs
            # and you get one request instead of ten.
            yield scrapy.Request("https://httpbin.org/ip", dont_filter=True)

    def parse(self, response):
        self.logger.info("IP response: %s", response.text)
```
If you need one request to bypass the rotator, disable it explicitly:
```python
yield scrapy.Request(
    "https://internal.example.net/health",
    meta={"disable_proxy_rotation": True},
)
```
This is the key fix that many tutorials skip: once the middleware is global, your existing spiders should not need per-request “auto proxy” markers. The middleware owns rotation unless a request opts out.
Which Rotation Mode Should You Use?
Use this quick decision rule:
| Workflow | Better choice |
|---|---|
| Public pages where each request stands alone | Per-request rotation |
| Login, cart, checkout, or account steps | Sticky session |
| Multi-page QA flow that must keep the same identity | Sticky session |
| Regional availability checks from several locations | Per-request rotation |
If request 2 would break when it comes from a different network identity than request 1, don’t use per-request rotation.
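A sticky session needs no new machinery: pick one endpoint from the pool and reuse it for the whole flow. The sketch below builds on ProxyEndpoint, ProxyPool, and build_session from earlier; if your provider supports session labels (often via a username suffix), that mapping is provider-specific and belongs in session_label per your provider's docs:

```python
def run_sticky_workflow(urls, pool):
    # Pin one proxy for the whole multi-step flow instead of rotating.
    proxy = pool.choose()
    session = build_session()
    responses = []
    for url in urls:
        response = session.get(
            url,
            proxies=proxy.as_requests_proxies(),
            timeout=(3.05, 20),
        )
        responses.append(response)
    return responses, proxy
```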
Which Proxy Type Fits This System?
In general, residential proxies are a better fit when the workflow is region-aware or trust-sensitive, ISP/static endpoints are better when a session must stay stable for longer, and datacenter proxies are a better fit for simpler internal checks where speed and cost matter more than continuity. Proxy001’s homepage says the service offers static, dynamic, residential, and mobile proxy services, supports Python and Scrapy, and provides access to 100M+ premium IPs across 200+ regions with a free test option.
That’s the natural point to validate your custom rotator against a commercial pool: after your code works against an IP-echo endpoint and before you use it in a larger workflow.
How Do You Know the Rotation Is Actually Working?
Run verification in this order:
1. Validate one proxy first
Check that one known-good endpoint authenticates correctly, honors your timeout settings, and works over HTTPS (see the smoke-test sketch after this list).
2. Validate the rotator on an IP-echo endpoint
Log:
- selected proxy ID,
- returned IP,
- status code,
- elapsed time,
- and whether the proxy entered cooldown.
3. Add one intentionally bad proxy
This proves your failure path actually works. You want to see the bad endpoint fail, accumulate failures, enter cooldown, and stop dragging the whole scraper down.
4. Only then hit your real target
If IP rotation works but the target still returns repeated 403s or 429s, the next place to look is pacing, session continuity, headers, or provider quality.
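For steps 1 and 3, a short script built on the Requests pieces above is enough. This sketch reuses the hypothetical endpoints from earlier and uses an unreachable TEST-NET address as the deliberately bad proxy:

```python
import requests

# Step 1: smoke-test one known-good endpoint directly, without the pool.
good = ProxyEndpoint("good1", "http", "proxy1.example.com", 8000, "user", "pass")
r = requests.get(
    "https://httpbin.org/ip",
    proxies=good.as_requests_proxies(),
    timeout=(3.05, 20),
)
print("single proxy:", r.status_code, r.text)

# Step 3: add a deliberately bad endpoint and confirm it enters cooldown
# while the rest of the pool keeps serving requests.
bad = ProxyEndpoint("bad1", "http", "203.0.113.1", 9999)  # TEST-NET-3; will not connect
pool = ProxyPool([good, bad], max_failures=2, cooldown_seconds=90)
for i in range(6):
    resp, used = fetch_with_rotation("https://httpbin.org/ip", pool)
    print(i, used.id, resp.status_code)
print("bad proxy cooldown until:", bad.cooldown_until)
```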
Why Are You Still Getting 403, 407, or 429 Errors?
A 407 Proxy Authentication Required usually means a bad credential set or a malformed proxy URL. Requests documents proxy auth directly inside the proxy URL, and Scrapy’s proxy path uses the same URL form through request.meta['proxy'].
A 403 does not automatically mean the proxy is dead. It can also mean the workflow is incomplete, the cookies are wrong, or you rotated when the server expected continuity.
A 429 usually points to pacing. Lower your request rate, add backoff, and stop retrying the same path too aggressively.
A hang usually means missing timeouts. Requests says most external requests should use explicit timeouts because otherwise the code may wait for minutes or more.
An SSL failure may come from certificate trust settings. Requests verifies HTTPS certificates by default, allows a CA bundle path, and warns that verify=False weakens security and should be limited to local testing.
A DNS mismatch with SOCKS usually means you chose socks5 when you needed socks5h. Requests documents that distinction directly.
Quick fixes table
| Symptom | Likely cause | Fix |
|---|---|---|
| 407 | Bad credentials or malformed proxy URL | Rebuild the URL as scheme://user:pass@host:port and test one endpoint first. |
| 403 on a stateful flow | You rotated too often | Move that workflow to sticky session mode. |
| 429 | Request pacing is too high | Slow down and add backoff before retrying. |
| Same exit IP every time | Provider-side sticky routing or a non-rotating endpoint | Test over a longer window and check your provider’s session behavior. |
| SSL error | Local CA trust issue | Use the correct CA bundle path; keep verify=False for local testing only. |
| Wrong region with SOCKS | DNS resolved on the client | Try socks5h instead of socks5. |
Compliance Note
Use proxy rotation for legitimate engineering work such as QA, uptime checks, ad verification, authorized regional testing, or data collection you are allowed to perform. Keep your request volume reasonable, review the target site’s terms and robots directives where applicable, and document the internal purpose of the scraper before you scale it.
Proxy rotation is not a fix for sloppy request design. If your headers, session model, retry behavior, or pacing are wrong, adding more endpoints just spreads the same mistake across a larger pool.
If you want to validate this setup with a provider that publicly lists Python and Scrapy support, multiple proxy types, broad regional coverage, and a free test path, Proxy001 is a practical next step for that stage of implementation. The site says it offers static, dynamic, residential, and mobile proxy services and access to 100M+ premium IPs in 200+ regions, which is enough to test whether your rotation logic behaves the way you expect before you expand the pool.