A workable proxy rotation system has five moving parts: a proxy pool, a selector, request-level proxy injection, failure tracking, and a verification step that proves the outgoing IP behavior matches your design. Requests supports per-request proxies, explicit timeouts, retries through adapters, and optional SOCKS support, while Scrapy’s proxy flow centers on request.meta['proxy'] and HttpProxyMiddleware.
If you already have a scraper and you want to add rotation without a wrapper library, the cleanest implementation is to keep one internal proxy model, rotate at the request layer, put unstable proxies on cooldown instead of deleting them immediately, and verify everything against an IP-echo endpoint before you touch your real target. Requests also documents that session.proxies can be overridden by environment proxy variables, which is why per-request proxy assignment is the safer default in a custom rotator.
Prerequisites
You need three things before you start:
- A scraper that already makes successful requests.
- A proxy list from your provider, either as full URLs or as host, port, username, and password fields.
- A decision about whether your workflow is stateless or session-sensitive.
That last decision matters because both Requests and Scrapy make per-request proxy assignment straightforward, but your target workflow may still require continuity across several requests. Scrapy’s proxy handling uses request metadata, and Requests lets you pass a fresh proxies dict on each call, so the library side is easy; the hard part is deciding when rotation should happen.
Use per-request rotation when each page fetch stands alone. Use a sticky session when login state, carts, multi-step forms, or account flows need the same network identity across several requests.
How Does a Proxy Rotation System Actually Work?
Keep the design simple:
- The proxy pool stores usable endpoints and their current state.
- The selector chooses the next proxy.
- The request hook injects that proxy into the request.
- The feedback loop marks success or failure.
- The verification step confirms that rotation is actually happening.
That model lines up with how public Scrapy rotation packages are described: they focus on rotating proxies, checking that endpoints are alive, and reacting to failures, not just loading a static text file.
Your internal proxy record should hold at least:
- id
- scheme
- host
- port
- username
- password
- fail_count
- cooldown_until
- last_status
- session_label, if your provider exposes sticky-session routing
Here is a minimal model:
```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class ProxyEndpoint:
    id: str
    scheme: str
    host: str
    port: int
    username: Optional[str] = None
    password: Optional[str] = None
    fail_count: int = 0
    cooldown_until: float = 0.0  # epoch seconds; 0 means "available now"
    last_status: Optional[int] = None
    session_label: Optional[str] = None  # only if your provider routes sticky sessions

    def as_url(self) -> str:
        auth = ""
        if self.username and self.password:
            auth = f"{self.username}:{self.password}@"
        return f"{self.scheme}://{auth}{self.host}:{self.port}"

    def as_requests_proxies(self) -> dict:
        # Requests expects one proxy URL per outgoing scheme.
        url = self.as_url()
        return {"http": url, "https": url}
```
And a small pool object:
```python
import random
import time


class ProxyPool:
    def __init__(self, proxies, max_failures=3, cooldown_seconds=120):
        self.proxies = proxies
        self.max_failures = max_failures
        self.cooldown_seconds = cooldown_seconds

    def available(self):
        now = time.time()
        return [p for p in self.proxies if p.cooldown_until <= now]

    def choose(self):
        candidates = self.available()
        if not candidates:
            raise RuntimeError("No proxies are currently available")
        return random.choice(candidates)

    def mark_success(self, proxy, status_code=None):
        # A success clears the failure streak and any pending cooldown.
        proxy.fail_count = 0
        proxy.cooldown_until = 0.0
        proxy.last_status = status_code

    def mark_failure(self, proxy, status_code=None):
        proxy.fail_count += 1
        proxy.last_status = status_code
        if proxy.fail_count >= self.max_failures:
            # Bench the proxy instead of deleting it; it may recover.
            proxy.cooldown_until = time.time() + self.cooldown_seconds
```
Don’t overcomplicate failure handling on day one. A timeout may be transient, a 429 usually points to pacing, and a 407 usually means the proxy URL or credentials are wrong.
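If you later want to encode that triage in code, a tiny classifier is enough to start with. The function below is a hypothetical helper for this article, not a library API, and the categories are this guide's own convention:

```python
def classify_failure(status_code=None, exc=None):
    # Hypothetical triage helper: map a failed attempt to a next action.
    if status_code == 407:
        return "fix-credentials"  # bad proxy URL or auth; retrying won't help
    if status_code == 429:
        return "slow-down"        # pacing problem; back off before rotating
    if exc is not None:
        return "maybe-transient"  # timeout/connection error; count toward cooldown
    return "count-failure"
```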
How Do You Rotate Proxies in Requests Without a Wrapper?
Requests already gives you the pieces you need. The official advanced usage docs show per-request proxies, Session reuse, timeout tuples, proxy authentication in the proxy URL, SOCKS support, and retry support through HTTPAdapter and urllib3.util.Retry.
The values in the sample below are example starter values for this article’s reference implementation, not universal defaults. You should tune them after you run the verification section.
```python
import time

from requests import Session
from requests.adapters import HTTPAdapter
from requests.exceptions import (
    ConnectionError,
    ConnectTimeout,
    ProxyError,
    ReadTimeout,
    SSLError,
)
from urllib3.util import Retry


def build_session():
    session = Session()
    session.headers.update({
        "User-Agent": "Mozilla/5.0 (compatible; scraper/1.0)"
    })
    # Disable urllib3-level retries: the rotator below owns all retry decisions.
    retries = Retry(total=0, backoff_factor=0, status_forcelist=[])
    adapter = HTTPAdapter(max_retries=retries)
    session.mount("http://", adapter)
    session.mount("https://", adapter)
    return session


def should_retry_status(status_code: int) -> bool:
    return status_code in {403, 407, 429, 500, 502, 503, 504}


def backoff_seconds(attempt: int) -> float:
    # Exponential backoff, capped at 8 seconds.
    return min(2 ** attempt, 8)


def fetch_with_rotation(url: str, pool: ProxyPool, attempts: int = 5):
    session = build_session()
    last_error = None
    for attempt in range(attempts):
        proxy = pool.choose()
        try:
            response = session.get(
                url,
                proxies=proxy.as_requests_proxies(),
                timeout=(3.05, 20),  # (connect, read) tuple
            )
            if should_retry_status(response.status_code):
                pool.mark_failure(proxy, response.status_code)
                last_error = RuntimeError(f"Retryable status {response.status_code}")
                time.sleep(backoff_seconds(attempt))
                continue
            pool.mark_success(proxy, response.status_code)
            return response, proxy
        except (ProxyError, ConnectTimeout, ReadTimeout, ConnectionError, SSLError) as exc:
            pool.mark_failure(proxy)
            last_error = exc
            time.sleep(backoff_seconds(attempt))
    raise RuntimeError(f"All attempts failed. Last error: {last_error}")
```
This pattern works well for three reasons. First, Requests does not time out by default, so the docs recommend explicit timeouts on nearly all external calls. Second, the docs distinguish a single timeout value from a (connect, read) tuple, which is why the code separates connect and read limits. Third, Requests notes that session.proxies can be overridden by environment proxy variables, which is why the code above passes the proxy at request time instead of setting it on the session.
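If you want to rule out environment interference entirely, Requests sessions also expose a trust_env flag; turning it off stops Requests from reading proxy settings (and other things, such as .netrc credentials and environment CA bundle paths) from the environment. A minimal sketch:

```python
from requests import Session

session = Session()
# Ignore HTTP_PROXY / HTTPS_PROXY / NO_PROXY from the environment.
# Note: trust_env=False also disables .netrc auth and env CA bundle settings.
session.trust_env = False
```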
Use it like this:
```python
proxies = [
    ProxyEndpoint("p1", "http", "proxy1.example.com", 8000, "user", "pass"),
    ProxyEndpoint("p2", "http", "proxy2.example.com", 8000, "user", "pass"),
]
pool = ProxyPool(proxies, max_failures=2, cooldown_seconds=90)

response, proxy = fetch_with_rotation("https://httpbin.org/ip", pool)
print("Used proxy:", proxy.id)
print("Response:", response.text)
```
Verify Requests Rotation
Do not point this at your production target first. Use an IP-echo endpoint and confirm:
- The selected proxy ID changes over several runs.
- The returned IP changes the way your provider says it should.
- A bad proxy enters cooldown and the code keeps moving.
A basic check loop is enough:
```python
for i in range(10):
    try:
        r, p = fetch_with_rotation("https://httpbin.org/ip", pool)
        print(i, p.id, r.status_code, r.text)
    except Exception as e:
        print(i, "failed", e)
```
The output shape you want is boring: several successful responses, changing proxy IDs, and no full-stop crash when one endpoint goes bad.
socks5 vs socks5h
Requests supports SOCKS as an optional feature, and its docs draw an important line between socks5 and socks5h: socks5 resolves DNS on the client, while socks5h resolves DNS on the proxy server. If hostname resolution or observed region looks wrong, check that first.
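A minimal sketch, assuming the SOCKS extra is installed (pip install "requests[socks]") and that your provider exposes a SOCKS5 endpoint at the hypothetical host below:

```python
import requests

# socks5h:// hands the hostname to the proxy for DNS resolution;
# socks5:// would resolve DNS on the client before connecting.
proxies = {
    "http": "socks5h://user:pass@socks-proxy.example.com:1080",
    "https": "socks5h://user:pass@socks-proxy.example.com:1080",
}
response = requests.get("https://httpbin.org/ip", proxies=proxies, timeout=(3.05, 20))
print(response.text)
```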
How Do You Rotate Proxies in Scrapy Without a Wrapper?
Scrapy’s proxy path is built around request.meta['proxy']. Zyte’s Scrapy proxy write-up explains that HttpProxyMiddleware reads the proxy value from request metadata, and Scrapy’s published middleware source shows the same thing in code.
So your custom middleware only needs to pick a proxy and attach it before the request reaches the downloader.
The values below are sample project values for the example code, not framework defaults.
```python
# middlewares.py
import random
import time

from scrapy.exceptions import IgnoreRequest


class RotatingProxyMiddleware:
    RETRYABLE_STATUSES = {403, 407, 429, 500, 502, 503, 504}

    def __init__(self, proxies, max_failures=3, cooldown_seconds=120, max_retry_times=3):
        self.proxies = proxies
        self.max_failures = max_failures
        self.cooldown_seconds = cooldown_seconds
        self.max_retry_times = max_retry_times

    @classmethod
    def from_crawler(cls, crawler):
        raw = crawler.settings.getlist("ROTATION_PROXY_LIST")
        proxies = []
        for idx, item in enumerate(raw):
            proxies.append({
                "id": f"p{idx + 1}",
                "url": item,
                "fail_count": 0,
                "cooldown_until": 0.0,
                "last_status": None,
            })
        return cls(
            proxies=proxies,
            max_failures=crawler.settings.getint("ROTATION_MAX_FAILURES", 3),
            cooldown_seconds=crawler.settings.getint("ROTATION_COOLDOWN_SECONDS", 120),
            max_retry_times=crawler.settings.getint("ROTATION_MAX_RETRY_TIMES", 3),
        )

    def available(self):
        now = time.time()
        return [p for p in self.proxies if p["cooldown_until"] <= now]

    def choose(self):
        candidates = self.available()
        if not candidates:
            raise IgnoreRequest("No proxies currently available")
        return random.choice(candidates)

    def mark_success(self, proxy, status=None):
        proxy["fail_count"] = 0
        proxy["cooldown_until"] = 0.0
        proxy["last_status"] = status

    def mark_failure(self, proxy, status=None):
        proxy["fail_count"] += 1
        proxy["last_status"] = status
        if proxy["fail_count"] >= self.max_failures:
            proxy["cooldown_until"] = time.time() + self.cooldown_seconds

    def _retry(self, request):
        # Build a retry copy with the proxy keys cleared so process_request
        # assigns a fresh endpoint. Returns None once retries are exhausted.
        retry_times = request.meta.get("rotation_retry_times", 0) + 1
        if retry_times > self.max_retry_times:
            return None
        new_request = request.copy()
        new_request.dont_filter = True
        new_request.meta.pop("rotation_proxy", None)
        new_request.meta.pop("proxy", None)
        new_request.meta["rotation_retry_times"] = retry_times
        return new_request

    def process_request(self, request, spider):
        if request.meta.get("disable_proxy_rotation"):
            return None
        if "rotation_proxy" not in request.meta:
            proxy = self.choose()
            request.meta["rotation_proxy"] = proxy
            # HttpProxyMiddleware reads this key and applies the proxy.
            request.meta["proxy"] = proxy["url"]
        return None

    def process_response(self, request, response, spider):
        proxy = request.meta.get("rotation_proxy")
        if not proxy:
            return response
        if response.status in self.RETRYABLE_STATUSES:
            self.mark_failure(proxy, response.status)
            retry = self._retry(request)
            if retry is not None:
                return retry
        else:
            self.mark_success(proxy, response.status)
        return response

    def process_exception(self, request, exception, spider):
        proxy = request.meta.get("rotation_proxy")
        if proxy:
            self.mark_failure(proxy)
            return self._retry(request)  # None lets other middlewares handle it
        return None
```
Use settings like this:
```python
# settings.py
DOWNLOADER_MIDDLEWARES = {
    "myproject.middlewares.RotatingProxyMiddleware": 610,
}

ROTATION_PROXY_LIST = [
    "http://user:pass@proxy1.example.com:8000",
    "http://user:pass@proxy2.example.com:8000",
]
ROTATION_MAX_FAILURES = 2
ROTATION_COOLDOWN_SECONDS = 90
ROTATION_MAX_RETRY_TIMES = 3

DOWNLOAD_TIMEOUT = 20
RETRY_ENABLED = False  # the middleware owns retries; avoid double-retrying
```
And a tiny spider:
```python
# spiders/ip_test.py
import scrapy


class IpSpider(scrapy.Spider):
    name = "ip_test"

    def start_requests(self):
        for _ in range(10):
            # dont_filter=True, or the dupefilter collapses identical URLs
            # and you get one request instead of ten.
            yield scrapy.Request("https://httpbin.org/ip", dont_filter=True)

    def parse(self, response):
        self.logger.info("IP response: %s", response.text)
```
If you need one request to bypass the rotator, disable it explicitly:
```python
yield scrapy.Request(
    "https://internal.example.net/health",
    meta={"disable_proxy_rotation": True},
)
```
This is the key fix that many tutorials skip: once the middleware is global, your existing spiders should not need per-request “auto proxy” markers. The middleware owns rotation unless a request opts out.
Which Rotation Mode Should You Use?
Use this quick decision rule:
| Workflow | Better choice |
|---|---|
| Public pages where each request stands alone | Per-request rotation |
| Login, cart, checkout, or account steps | Sticky session |
| Multi-page QA flow that must keep the same identity | Sticky session |
| Regional availability checks from several locations | Per-request rotation |
If request 2 would break when it comes from a different network identity than request 1, don’t use per-request rotation.
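A sticky session needs no new machinery: pick one endpoint from the pool and reuse it for the whole flow. The sketch below builds on ProxyEndpoint, ProxyPool, and build_session from earlier; if your provider supports session labels (often via a username suffix), that mapping is provider-specific and belongs in session_label per your provider's docs:

```python
def run_sticky_workflow(urls, pool):
    # Pin one proxy for the whole multi-step flow instead of rotating.
    proxy = pool.choose()
    session = build_session()
    responses = []
    for url in urls:
        response = session.get(
            url,
            proxies=proxy.as_requests_proxies(),
            timeout=(3.05, 20),
        )
        responses.append(response)
    return responses, proxy
```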
Which Proxy Type Fits This System?
In general, residential proxies are a better fit when the workflow is region-aware or trust-sensitive, ISP/static endpoints are better when a session must stay stable for longer, and datacenter proxies are a better fit for simpler internal checks where speed and cost matter more than continuity. Proxy001’s homepage says the service offers static, dynamic, residential, and mobile proxy services, supports Python and Scrapy, and provides access to 100M+ premium IPs across 200+ regions with a free test option.
That’s the natural point to validate your custom rotator against a commercial pool: after your code works against an IP-echo endpoint and before you use it in a larger workflow.
How Do You Know the Rotation Is Actually Working?
Run verification in this order:
1. Validate one proxy first
Check that one known-good endpoint authenticates correctly, honors your timeout settings, and works over HTTPS (see the smoke-test sketch after this list).
2. Validate the rotator on an IP-echo endpoint
Log:
- selected proxy ID,
- returned IP,
- status code,
- elapsed time,
- and whether the proxy entered cooldown.
3. Add one intentionally bad proxy
This proves your failure path actually works. You want to see the bad endpoint fail, accumulate failures, enter cooldown, and stop dragging the whole scraper down.
4. Only then hit your real target
If IP rotation works but the target still returns repeated 403s or 429s, the next place to look is pacing, session continuity, headers, or provider quality.
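For steps 1 and 3, a short script built on the Requests pieces above is enough. This sketch reuses the hypothetical endpoints from earlier and uses an unreachable TEST-NET address as the deliberately bad proxy:

```python
import requests

# Step 1: smoke-test one known-good endpoint directly, without the pool.
good = ProxyEndpoint("good1", "http", "proxy1.example.com", 8000, "user", "pass")
r = requests.get(
    "https://httpbin.org/ip",
    proxies=good.as_requests_proxies(),
    timeout=(3.05, 20),
)
print("single proxy:", r.status_code, r.text)

# Step 3: add a deliberately bad endpoint and confirm it enters cooldown
# while the rest of the pool keeps serving requests.
bad = ProxyEndpoint("bad1", "http", "203.0.113.1", 9999)  # TEST-NET-3; will not connect
pool = ProxyPool([good, bad], max_failures=2, cooldown_seconds=90)
for i in range(6):
    resp, used = fetch_with_rotation("https://httpbin.org/ip", pool)
    print(i, used.id, resp.status_code)
print("bad proxy cooldown until:", bad.cooldown_until)
```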
Why Are You Still Getting 403, 407, or 429 Errors?
A 407 Proxy Authentication Required usually means a bad credential set or a malformed proxy URL. Requests documents proxy auth directly inside the proxy URL, and Scrapy’s proxy path uses the same URL form through request.meta['proxy'].
A 403 does not automatically mean the proxy is dead. It can also mean the workflow is incomplete, the cookies are wrong, or you rotated when the server expected continuity.
A 429 usually points to pacing. Lower your request rate, add backoff, and stop retrying the same path too aggressively.
A hang usually means missing timeouts. Requests says most external requests should use explicit timeouts because otherwise the code may wait for minutes or more.
An SSL failure may come from certificate trust settings. Requests verifies HTTPS certificates by default, allows a CA bundle path, and warns that verify=False weakens security and should be limited to local testing.
A DNS mismatch with SOCKS usually means you chose socks5 when you needed socks5h. Requests documents that distinction directly.
Quick fixes table
| Symptom | Likely cause | Fix |
|---|---|---|
| 407 | Bad credentials or malformed proxy URL | Rebuild the URL as scheme://user:pass@host:port and test one endpoint first. |
| 403 on a stateful flow | You rotated too often | Move that workflow to sticky session mode. |
| 429 | Request pacing is too high | Slow down and add backoff before retrying. |
| Same exit IP every time | Provider-side sticky routing or a non-rotating endpoint | Test over a longer window and check your provider’s session behavior. |
| SSL error | Local CA trust issue | Use the correct CA bundle path; keep verify=False for local testing only. |
| Wrong region with SOCKS | DNS resolved on the client | Try socks5h instead of socks5. |
Compliance Note
Use proxy rotation for legitimate engineering work such as QA, uptime checks, ad verification, authorized regional testing, or data collection you are allowed to perform. Keep your request volume reasonable, review the target site’s terms and robots directives where applicable, and document the internal purpose of the scraper before you scale it.
Proxy rotation is not a fix for sloppy request design. If your headers, session model, retry behavior, or pacing are wrong, adding more endpoints just spreads the same mistake across a larger pool.
If you want to validate this setup with a provider that publicly lists Python and Scrapy support, multiple proxy types, broad regional coverage, and a free test path, Proxy001 is a practical next step for that stage of implementation. The site says it offers static, dynamic, residential, and mobile proxy services and access to 100M+ premium IPs in 200+ regions, which is enough to test whether your rotation logic behaves the way you expect before you expand the pool.