Billy

Rotating proxies for Scrapy, Playwright and requests — one small library

If you scrape anything at scale, you know the drill: proxies die, sites start returning 403/429, and your run grinds to a halt. The classic Scrapy answer, scrapy-rotating-proxies, hasn't seen a real update in years — and it's Scrapy-only, so the moment you reach for Playwright or plain requests, you're rebuilding rotation from scratch.

I wanted one small proxy pool I could share across all three. So I wrote proxyspin: a rotating pool with health tracking, ban detection and sticky sessions, and thin adapters for Scrapy, Playwright and requests. Zero required dependencies, pure standard library.

pip install proxyspin

The pool

Everything is built around one object. Load proxies from a list, a file, or a URL:

from proxyspin import ProxyPool

pool = ProxyPool.from_file("proxies.txt", strategy="round_robin")
# or inline / from your provider's export endpoint:
pool = ProxyPool(["http://user:pass@gate1.example.com:8000", "10.0.0.2:8000"])
pool = ProxyPool.from_url("https://example.com/api/my-list.txt")

proxy = pool.get()        # -> Proxy; proxy.url is ready to use
pool.mark_failed(proxy)   # bench it after repeated failures
pool.mark_ok(proxy)       # reset its failure streak

It parses every common list format — host:port, host:port:user:pass, user:pass@host:port, scheme://user:pass@host:port — for HTTP, HTTPS, SOCKS4 and SOCKS5.
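To make those four formats concrete, here is a minimal standard-library sketch of how one list entry could be normalized into a URL. It is an illustration only, not proxyspin's actual parser: `normalize_proxy` and the `http` default scheme are assumptions of mine, and it does not handle IPv6 literals or passwords containing `@`.

```python
SCHEMES = {"http", "https", "socks4", "socks5"}


def normalize_proxy(line: str, default_scheme: str = "http") -> str:
    """Turn one proxy-list entry into scheme://[user:pass@]host:port."""
    entry = line.strip()
    if "://" in entry:
        scheme, rest = entry.split("://", 1)
        scheme = scheme.lower()
        if scheme not in SCHEMES:
            raise ValueError(f"unsupported scheme: {scheme}")
    else:
        # Bare entries carry no scheme, so we assume one (hypothetical default).
        scheme, rest = default_scheme, entry

    if "@" in rest:
        # user:pass@host:port
        creds, hostport = rest.rsplit("@", 1)
    else:
        parts = rest.split(":")
        if len(parts) == 2:
            # host:port
            creds, hostport = "", rest
        elif len(parts) == 4:
            # host:port:user:pass
            host, port, user, password = parts
            creds, hostport = f"{user}:{password}", f"{host}:{port}"
        else:
            raise ValueError(f"unrecognized proxy entry: {line!r}")

    host, _, port = hostport.rpartition(":")
    if not host or not port.isdigit():
        raise ValueError(f"unrecognized proxy entry: {line!r}")
    auth = f"{creds}@" if creds else ""
    return f"{scheme}://{auth}{host}:{port}"
```

All four shapes end up in the same canonical form, which is what lets a single pool feed Scrapy, Playwright and requests alike.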

Rotation strategies

  • round_robin — cycle through healthy proxies in order (default)
  • random — pick one at random
  • sticky — keep returning the same proxy for a given key (a target domain, an account id, a worker name) until it goes unhealthy
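If the difference between the three strategies feels abstract, this toy pool shows one way they could behave. `TinyPool` is a hypothetical stand-in written for this post, not proxyspin's internals; in particular, how a sticky key gets re-pinned after its proxy goes unhealthy is my assumption.

```python
import random


class TinyPool:
    """Toy pool illustrating round_robin, random and sticky selection."""

    def __init__(self, proxies, strategy="round_robin"):
        self.proxies = list(proxies)
        self.strategy = strategy
        self.healthy = set(self.proxies)  # remove entries to simulate benching
        self._cursor = 0
        self._sticky = {}  # key -> pinned proxy

    def _next(self, alive):
        # Walk the healthy list in order, wrapping around at the end.
        proxy = alive[self._cursor % len(alive)]
        self._cursor += 1
        return proxy

    def get(self, key=None):
        alive = [p for p in self.proxies if p in self.healthy]
        if not alive:
            raise RuntimeError("no healthy proxies left")
        if self.strategy == "random":
            return random.choice(alive)
        if self.strategy == "sticky" and key is not None:
            pinned = self._sticky.get(key)
            if pinned in self.healthy:
                return pinned
            # New key, or the pinned proxy went unhealthy: pin a fresh one.
            self._sticky[key] = self._next(alive)
            return self._sticky[key]
        return self._next(alive)
```

With `strategy="sticky"`, repeated `get(key="example.com")` calls keep returning the same proxy until it leaves `healthy`, at which point the key moves to another one.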

The health model

Every proxy starts healthy. mark_failed bumps its failure streak; when the streak hits max_failures (default 2) the proxy is benched with exponential backoff (cooldown * 2**overshoot, base 60s, capped at 1h), then automatically rejoins rotation. mark_ok resets the streak. So dead proxies quietly drop out and recover on their own — you never manually prune the list.
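Here is a small sketch of that arithmetic for a single proxy. The class and its names are hypothetical, and I am assuming "overshoot" means the number of failures beyond `max_failures`; the library's exact bookkeeping may differ.

```python
import time


class ProxyHealth:
    """Failure streak plus exponential-backoff bench for one proxy."""

    def __init__(self, max_failures=2, cooldown=60.0, max_cooldown=3600.0):
        self.max_failures = max_failures
        self.cooldown = cooldown          # base bench time in seconds
        self.max_cooldown = max_cooldown  # cap: one hour
        self.streak = 0
        self.benched_until = 0.0

    def mark_failed(self, now=None):
        """Record a failure; return the bench duration (0.0 if not benched)."""
        now = time.monotonic() if now is None else now
        self.streak += 1
        if self.streak < self.max_failures:
            return 0.0
        # Assumption: overshoot counts failures past the threshold.
        overshoot = self.streak - self.max_failures
        delay = min(self.cooldown * 2 ** overshoot, self.max_cooldown)
        self.benched_until = now + delay
        return delay

    def mark_ok(self):
        """A success resets the failure streak."""
        self.streak = 0

    def is_healthy(self, now=None):
        now = time.monotonic() if now is None else now
        return now >= self.benched_until
```

Under these assumptions the second consecutive failure benches the proxy for 60 s, the third for 120 s, the fourth for 240 s, and so on until the delay hits the one-hour cap.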

Scrapy

Enable the middleware and point it at your proxies. Ban detection and retry-through-the-next-proxy are automatic:

# settings.py
DOWNLOADER_MIDDLEWARES = {
    "proxyspin.scrapy_middleware.ProxySpinMiddleware": 610,
}
PROXYSPIN_FILE = "proxies.txt"
PROXYSPIN_STRATEGY = "sticky"       # one proxy per target host
PROXYSPIN_BAN_CODES = [403, 429]    # these responses rotate the proxy
PROXYSPIN_MAX_RETRIES = 3

Any response with a ban code (403 or 429 unless you configure others) marks the proxy as failed, and the request is retried through the next healthy one. No spider code changes.
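Outside Scrapy, the same ban-detect-and-retry loop can be sketched in a few lines against the pool's `get` / `mark_failed` / `mark_ok` methods. This is not the middleware's code: `fetch_with_rotation`, the `send` callable and `CyclingPool` are hypothetical names for illustration, and I am assuming "max retries" means retries in addition to the first attempt.

```python
BAN_CODES = frozenset({403, 429})


def fetch_with_rotation(url, pool, send, ban_codes=BAN_CODES, max_retries=3):
    """Fetch url, switching proxy whenever a response looks like a ban.

    pool: any object with get(), mark_failed(proxy) and mark_ok(proxy).
    send: hypothetical callable (url, proxy) -> HTTP status code; it may
          raise OSError (which includes timeouts) on connection trouble.
    """
    last_problem = None
    for _ in range(max_retries + 1):  # first attempt plus max_retries retries
        proxy = pool.get()
        try:
            status = send(url, proxy)
        except OSError as exc:
            pool.mark_failed(proxy)
            last_problem = repr(exc)
            continue
        if status in ban_codes:
            pool.mark_failed(proxy)
            last_problem = f"HTTP {status}"
            continue
        pool.mark_ok(proxy)
        return status, proxy
    raise RuntimeError(f"gave up on {url}: last problem was {last_problem}")


class CyclingPool:
    """Stand-in pool for the demo; it only cycles and records marks."""

    def __init__(self, proxies):
        self.proxies = list(proxies)
        self.calls = 0
        self.failed = []
        self.ok = []

    def get(self):
        proxy = self.proxies[self.calls % len(self.proxies)]
        self.calls += 1
        return proxy

    def mark_failed(self, proxy):
        self.failed.append(proxy)

    def mark_ok(self, proxy):
        self.ok.append(proxy)
```

Because `send` is injected, the loop stays transport-agnostic: the same function could wrap a urllib call, a requests call, or a test stub.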

Playwright

Playwright takes a proxy per browser context, which is the natural rotation unit — great for one-IP-per-account flows:

from playwright.sync_api import sync_playwright
from proxyspin import ProxyPool
from proxyspin.playwright_helper import proxy_settings

pool = ProxyPool.from_file("proxies.txt", strategy="sticky")

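with sync_playwright() as p:
    browser = p.chromium.launch()
    for account in accounts:  # accounts: your own list of objects with an .id
        context = browser.new_context(proxy=proxy_settings(pool, key=account.id))
        # each account keeps its own IP for the whole session
        # ... open pages on this context and do the account's work here ...
        context.close()  # release the context before moving to the next account
    browser.close()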
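For context, Playwright's `proxy` option expects a dict with a `server` key and optional `username` and `password` keys, not a single URL with embedded credentials. The sketch below shows how a proxy URL could be mapped to that shape with the standard library. `to_playwright_proxy` is a hypothetical helper for illustration; I am not claiming this is what `proxy_settings` does internally.

```python
from urllib.parse import unquote, urlsplit


def to_playwright_proxy(proxy_url: str) -> dict:
    """Split scheme://user:pass@host:port into Playwright's proxy dict."""
    parts = urlsplit(proxy_url)
    if not parts.scheme or not parts.hostname or not parts.port:
        raise ValueError(f"expected scheme://host:port, got {proxy_url!r}")
    # Credentials go in separate keys, so keep them out of the server URL.
    settings = {"server": f"{parts.scheme}://{parts.hostname}:{parts.port}"}
    if parts.username:
        settings["username"] = unquote(parts.username)
        settings["password"] = unquote(parts.password or "")
    return settings
```

The result can be passed directly as `browser.new_context(proxy=...)`.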

requests

A drop-in Session that rotates on every call and retries failures through another proxy:

from proxyspin import ProxyPool
from proxyspin.requests_adapter import RotatingSession

session = RotatingSession(ProxyPool.from_file("proxies.txt"))
print(session.get("https://httpbin.org/ip").json())   # new IP per call

Check a list first

Bad proxies waste run time. The bundled CLI tests a whole list concurrently and writes out the survivors:

proxyspin check proxies.txt --workers 100 --alive-out alive.txt
OK   45.155.10.4:8000        612 ms  HTTP 200
DEAD 91.10.77.2:3128                 TimeoutError
...
118/200 alive
wrote 118 proxies to alive.txt
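If you are curious what a concurrent check like this involves, here is a rough standard-library sketch using a thread pool. It is not the CLI's implementation: `check_all` and `http_probe` are hypothetical names, the test URL is my choice, and the urllib-based probe only covers HTTP/HTTPS proxies given as full URLs (no SOCKS).

```python
import urllib.request
from concurrent.futures import ThreadPoolExecutor, as_completed


def http_probe(proxy_url, test_url="https://httpbin.org/ip", timeout=10):
    """Make one request through the proxy; raises on any failure."""
    handler = urllib.request.ProxyHandler({"http": proxy_url, "https": proxy_url})
    opener = urllib.request.build_opener(handler)
    with opener.open(test_url, timeout=timeout) as response:
        return response.status


def check_all(proxies, probe=http_probe, workers=100):
    """Probe every proxy concurrently; return (alive, dead) lists.

    dead holds (proxy, error class name) pairs, in completion order.
    """
    alive, dead = [], []
    with ThreadPoolExecutor(max_workers=workers) as executor:
        futures = {executor.submit(probe, proxy): proxy for proxy in proxies}
        for future in as_completed(futures):
            proxy = futures[future]
            try:
                future.result()
            except Exception as exc:  # any failure counts as a dead proxy
                dead.append((proxy, type(exc).__name__))
            else:
                alive.append(proxy)
    # Results arrive in completion order; restore the input order for alive.
    order = {proxy: index for index, proxy in enumerate(proxies)}
    alive.sort(key=order.__getitem__)
    return alive, dead
```

Threads fit here because each probe spends nearly all its time waiting on the network, so a large worker count costs little.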

Getting proxies to test with

Want to try it right now without your own proxies? Bootstrap straight from a live public list:

pool = ProxyPool.from_url("https://raw.githubusercontent.com/gproxynet/free-proxy-list/main/all.txt")

Fair warning: public proxies are unreliable by nature. They're shared, slow, and often die within minutes. They're fine for kicking the tires, not for a real crawl. For production you'll want dedicated proxies (residential, mobile or datacenter). If your provider exposes rotating gateway endpoints, one pool entry per endpoint is all proxyspin needs, since the provider rotates IPs server-side.

Wrapping up

One pool, the same health model everywhere, and you stop babysitting dead proxies. The code is MIT-licensed and on GitHub — issues and PRs welcome. If you've been limping along on an unmaintained rotation middleware, give it a spin.
