This is a runnable mini-lab for evaluating rotating residential proxies in scraping and monitoring. You’ll generate evidence in 60–90 minutes: rotation proof, sticky-session proof, pool collision metrics under concurrency, a ramp-and-soak signal report, and CP1K (cost per 1,000 successful requests). The deeper acceptance gates live in the hub: Rotating Residential Proxies Evaluation Playbook for Web Scraping in 2026
Run the exact same harness against every provider you’re considering, including MaskProxy, so your results compare cleanly. Decide up front what “success” means for your pipeline (not just HTTP 200), and set a hard request budget so you don’t burn time chasing noisy data.
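One way to make that success definition explicit up front is a single predicate the whole harness shares. A minimal sketch; the field names follow the JSONL schema used in this lab, and the 500-byte body floor is an illustrative assumption, not a standard:

```python
def is_success(status: int, sig: str, body_bytes: int) -> bool:
    """Pipeline-level success: 2xx, no challenge signature, non-trivial body.
    The 500-byte floor is an illustrative threshold; tune it per target."""
    return 200 <= status < 300 and sig == "ok" and body_bytes >= 500
```

Keeping the predicate in one place means your CP1K math, stop conditions, and dashboards all agree on the same definition of success.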
Set scope, budget, and evidence fields
Pick two targets you are allowed to test:
Easy target: stable baseline for exit IP and latency (an IP echo endpoint works).
Defended target: a real site that matches your production pain (price intel, availability checks, SERP monitoring), tested within policy and terms.
Write down a request budget and stop conditions:
Stop if 403 or 429 rates stay high for 2–3 minutes.
Stop if p95 latency doubles and stays there.
Stop if challenge pages dominate “success.”
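The stop conditions above can be sketched as a rolling-window check over the most recent request records (the window standing in for the last 2–3 minutes of traffic). Field names match the JSONL schema used in this lab; the 30% block rate and 50% challenge rate cutoffs are assumptions to tune:

```python
def should_stop(window, p95_baseline_ms, block_rate_max=0.3):
    """Rolling-window stop check over recent request records.
    `window` is an iterable of dicts with `status`, `latency_ms`, and `sig`
    fields; the 30% block and 50% challenge thresholds are illustrative."""
    recs = list(window)
    if not recs:
        return False
    n = len(recs)
    block_rate = sum(r["status"] in (403, 429) for r in recs) / n
    lat = sorted(r["latency_ms"] for r in recs)
    p95_ms = lat[int(0.95 * (n - 1))]  # nearest-rank p95, good enough here
    challenge_rate = sum(r.get("sig") == "soft_challenge" for r in recs) / n
    return (block_rate >= block_rate_max
            or p95_ms >= 2 * p95_baseline_ms
            or challenge_rate >= 0.5)
```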
If you need a quick reference for the rotation modes you’ll see in the wild (rotate every request vs. sticky for N minutes), keep your definitions consistent with your session model: Rotating Proxies
Log one JSON record per request. Keep the fields stable across all tests so you can diff runs and compute metrics without hand-waving:
ts, test, target, url, status, latency_ms, exit_ip, session, bytes, retry, sig
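For concreteness, one logged line might look like this (every value here is made up):

```python
import json

# Illustrative record only; all values are invented for the example.
example = {
    "ts": 1760000000, "test": "soak", "target": "defended",
    "url": "https://example.com/", "status": 429, "latency_ms": 812,
    "exit_ip": "203.0.113.45", "session": None, "bytes": 0,
    "retry": 1, "sig": "rate_limited",
}
print(json.dumps(example))
```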
Build the tiny harness with JSONL logs
Create a timestamped run folder and write one JSON line per request. This makes your lab reproducible and audit-friendly when someone asks “what exactly did we measure?”
lab.py
import os, json, time, uuid, asyncio
from typing import Optional, Dict, Any
import httpx
RUN_ID = time.strftime("%Y%m%d-%H%M%S") + "-" + uuid.uuid4().hex[:6]
OUTDIR = f"./runs/{RUN_ID}"
os.makedirs(OUTDIR, exist_ok=True)
LOG_PATH = f"{OUTDIR}/requests.jsonl"
EASY_URL = os.getenv("EASY_URL", "https://api.ipify.org?format=json")
DEFENDED_URL = os.getenv("DEFENDED_URL", "https://example.com/")
MAX_REQUESTS = int(os.getenv("MAX_REQUESTS", "4000"))
MAX_MINUTES = int(os.getenv("MAX_MINUTES", "90"))
PROXY_URL = os.getenv("PROXY_URL") # http://user:pass@host:port
TIMEOUT_S = float(os.getenv("TIMEOUT_S", "20"))
MAX_RETRIES = int(os.getenv("MAX_RETRIES", "2"))
def jlog(rec: Dict[str, Any]) -> None:
    with open(LOG_PATH, "a", encoding="utf-8") as f:
        f.write(json.dumps(rec, ensure_ascii=False) + "\n")
Wrap requests with timing, retries, and signatures
You need three things on every request: exit IP, latency, and a lightweight signature that labels rate limiting, blocking, or challenge behavior. For HTTP semantics, RFC 9110 is the baseline reference when you’re debugging edge behavior: RFC 9110
CHALLENGE_MARKERS = ["captcha", "challenge", "cf-chl", "recaptcha", "px-captcha", "akamai"]
def classify(status: int, body_text: str) -> str:
    lower = (body_text or "").lower()
    if status == 429:
        return "rate_limited"
    if status == 403:
        return "blocked"
    if any(m in lower for m in CHALLENGE_MARKERS):
        return "soft_challenge"
    if status == 0:
        return "error"
    return "ok"

async def get_exit_ip(client: httpx.AsyncClient, session: Optional[str]) -> str:
    headers = {"User-Agent": "eval-lab/1.0"}
    if session:
        headers["X-Session"] = session  # map to your provider’s sticky-session mechanism
    r = await client.get(EASY_URL, headers=headers, timeout=TIMEOUT_S)
    return r.json().get("ip", "")
async def fetch(client: httpx.AsyncClient, test: str, target: str, url: str,
                session: Optional[str] = None) -> Dict[str, Any]:
    headers = {"User-Agent": "eval-lab/1.0"}
    if session:
        headers["X-Session"] = session
    start = time.time()
    last_err = None
    for attempt in range(MAX_RETRIES + 1):
        try:
            r = await client.get(url, headers=headers, timeout=TIMEOUT_S, follow_redirects=True)
            latency_ms = int((time.time() - start) * 1000)
            body = (r.text[:2000] if "text" in (r.headers.get("content-type") or "") else "")
            sig = classify(r.status_code, body)
            rec = {
                "ts": int(time.time()),
                "test": test,
                "target": target,
                "url": url,
                "status": r.status_code,
                "latency_ms": latency_ms,
                "exit_ip": None,  # only the easy-target probe resolves this; see get_exit_ip
                "session": session,
                "bytes": len(r.content or b""),
                "retry": attempt,
                "sig": sig,
            }
            jlog(rec)
            return rec
        except Exception as e:
            last_err = repr(e)
            if attempt == MAX_RETRIES:
                rec = {
                    "ts": int(time.time()),
                    "test": test,
                    "target": target,
                    "url": url,
                    "status": 0,
                    "latency_ms": int((time.time() - start) * 1000),
                    "exit_ip": None,
                    "session": session,
                    "bytes": 0,
                    "retry": attempt,
                    "sig": "error",
                    "err": last_err,
                }
                jlog(rec)
                return rec
            await asyncio.sleep(0.5 * (2 ** attempt))  # exponential backoff before the next attempt
Prove rotation and sticky sessions with a repeatable test
This test answers two questions that matter for any free-trial evaluation of rotating residential proxies:
Does the pool rotate when you do not pin a session?
Does the exit IP stay stable when you do pin a session?
def uniq(seq): return len(set(seq))

async def test_rotation_and_sticky():
    # Note: httpx 0.28+ renamed this argument; use proxy=PROXY_URL there.
    async with httpx.AsyncClient(proxies=PROXY_URL) as client:
        rot_ips = [await get_exit_ip(client, session=None) for _ in range(30)]
        sticky_a = [await get_exit_ip(client, session="A") for _ in range(15)]
        sticky_b = [await get_exit_ip(client, session="B") for _ in range(15)]
        print("rotation unique:", uniq(rot_ips), "of", len(rot_ips))
        print("sticky A unique:", uniq(sticky_a), "of", len(sticky_a))
        print("sticky B unique:", uniq(sticky_b), "of", len(sticky_b))
        print("A vs B overlap:", len(set(sticky_a) & set(sticky_b)))
Expected signals:
Rotation should produce meaningfully more unique IPs than sticky.
Sticky A should be mostly stable, and sticky B should differ from sticky A most of the time.
If rotation uniqueness is tiny, you’re functionally testing a small shared pool with frequent IP reuse.
If you want a product-level reference point while you interpret these rotation and sticky behaviors, use: Rotating Residential Proxies
Measure pool collisions and IP reuse under concurrency
Collisions are the hidden killer for scraping throughput. If 100 workers share 10 exit IPs, one IP-level reputation event becomes a fleet-wide failure pattern. This is where providers can look fine in single-thread tests and fall apart under real concurrency.
Run a micro-burst at your expected in-flight concurrency (50–200). Keep it short and measurable.
async def burst_collisions(concurrency=80, total=400):
    sem = asyncio.Semaphore(concurrency)
    async with httpx.AsyncClient(proxies=PROXY_URL) as client:
        async def one():
            async with sem:
                ip = await get_exit_ip(client, session=None)
                jlog({"ts": int(time.time()), "test": "burst_ip", "target": "easy", "exit_ip": ip})
                return ip
        ips = await asyncio.gather(*[one() for _ in range(total)])
    uniq_ips = len(set(ips))
    collision_rate = 1 - (uniq_ips / max(1, len(ips)))
    print("total:", len(ips), "unique:", uniq_ips, "collision_rate:", round(collision_rate, 3))
Expected signals:
Collision rate increases with concurrency, but it should not instantly collapse into a handful of IPs.
If you see heavy top-IP concentration, treat “residential proxy pool capacity” as a gating risk for monitoring jobs.
Run this exact burst against MaskProxy and any other candidate pool with the same concurrency and total requests. You want to see which pool degrades first.
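Top-IP concentration falls straight out of the same log. A sketch that reads the `burst_ip` records back and reports what share of requests the busiest exits served (the top-3 cut is an arbitrary choice, not a standard metric):

```python
import json
from collections import Counter

def top_ip_share(log_path: str, top_n: int = 3) -> float:
    """Fraction of burst requests served by the top-N exit IPs."""
    ips = []
    with open(log_path, "r", encoding="utf-8") as f:
        for line in f:
            rec = json.loads(line)
            if rec.get("test") == "burst_ip":
                ips.append(rec["exit_ip"])
    if not ips:
        return 0.0
    counts = Counter(ips)
    return sum(c for _, c in counts.most_common(top_n)) / len(ips)
```

A top-3 share near 1.0 at 80-way concurrency is exactly the "fleet-wide failure pattern" risk described above.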
Run a ramp-and-soak and collect p95, 429, 403, and challenge signals
Use a simple load shape: warm-up → ramp → soak. This makes “day-3 style” decay show up within one session.
When you interpret 429, don’t invent semantics. The definitive reference for 429 is RFC 6585: RFC 6585
For practical status summaries, MDN is a useful quick check: MDN 429
and MDN 403
async def ramp_soak():
    phases = [
        ("warmup", 2*60, 20),
        ("ramp", 8*60, 60),
        ("soak", 15*60, 60),
    ]
    async with httpx.AsyncClient(proxies=PROXY_URL) as client:
        for name, seconds, conc in phases:
            end = time.time() + seconds
            while time.time() < end:
                sem = asyncio.Semaphore(conc)
                async def one():
                    async with sem:
                        return await fetch(client, name, "defended", DEFENDED_URL, session=None)
                await asyncio.gather(*[one() for _ in range(conc)])
What you’re looking for:
p95 latency drift during soak suggests pool saturation, retry amplification, or target throttling.
Sustained 429s indicate a rate-limit wall; sustained 403s indicate refusal or policy blocks.
“soft_challenge” is a failure for most pipelines unless you explicitly solve it.
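Once the run finishes, these phase-level numbers fall out of requests.jsonl directly. A sketch that computes p95 latency and 429/403/challenge rates per phase (nearest-rank p95 is a simplification; swap in your preferred percentile method):

```python
import json
from collections import defaultdict

def phase_report(log_path: str) -> dict:
    """Per-phase p95 latency and 429/403/soft-challenge rates from requests.jsonl."""
    buckets = defaultdict(list)
    with open(log_path, "r", encoding="utf-8") as f:
        for line in f:
            rec = json.loads(line)
            if rec.get("test") in ("warmup", "ramp", "soak"):
                buckets[rec["test"]].append(rec)
    report = {}
    for phase, recs in buckets.items():
        lat = sorted(r["latency_ms"] for r in recs)
        n = len(recs)
        report[phase] = {
            "n": n,
            "p95_ms": lat[int(0.95 * (n - 1))],  # nearest-rank p95
            "rate_429": sum(r["status"] == 429 for r in recs) / n,
            "rate_403": sum(r["status"] == 403 for r in recs) / n,
            "rate_challenge": sum(r.get("sig") == "soft_challenge" for r in recs) / n,
        }
    return report
```

Comparing the soak row against the warmup row is the quickest way to see "day-3 style" decay inside one session.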
Compute CP1K from collected numbers
CP1K is cost per 1,000 successful requests. Define success as what your pipeline needs: for many scraping workloads it’s “2xx and not a challenge page.”
Start with a simple model: total cost in USD for the run window (plan proration plus traffic charges, if applicable), divided by the number of successes in thousands.
import json

def compute_cp1k(log_path: str, total_cost_usd: float) -> None:
    attempts = 0
    successes = 0
    with open(log_path, "r", encoding="utf-8") as f:
        for line in f:
            rec = json.loads(line)
            if rec.get("test") not in ("warmup", "ramp", "soak"):
                continue
            attempts += 1
            status = rec.get("status", 0)
            sig = rec.get("sig")
            if 200 <= status < 300 and sig == "ok":
                successes += 1
    cp1k = (total_cost_usd / (successes / 1000)) if successes else float("inf")
    print("attempts:", attempts, "successes:", successes, "CP1K_USD:", round(cp1k, 2))
When you plug in pricing inputs, use the correct unit basis (per GB vs per request vs plan minimum) so CP1K doesn’t lie. For a concrete pricing reference when you compute CP1K, use: Rotating Residential Proxies Pricing
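As a worked sketch of the unit-basis point, here is a per-GB traffic cost derived from the logged response bytes. This undercounts real billing (headers, retries, and request bodies usually count too), so treat it as a lower bound; the price is a placeholder, not real pricing:

```python
import json

def traffic_cost_usd(log_path: str, price_per_gb_usd: float) -> float:
    """Price logged response bytes at a per-GB rate (lower bound on billing)."""
    total_bytes = 0
    with open(log_path, "r", encoding="utf-8") as f:
        for line in f:
            total_bytes += json.loads(line).get("bytes", 0)
    return price_per_gb_usd * total_bytes / (1024 ** 3)
```

Feed this figure, plus any plan proration for the run window, into compute_cp1k as total_cost_usd.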
If MaskProxy yields stable soak signals but a higher CP1K than another pool, you now have a real tradeoff discussion: reliability and operability versus raw unit cost.
Wrap-up
You should now have a run folder with JSONL evidence: rotation and sticky behavior, collision rate under concurrency, ramp-and-soak stability, and a CP1K number you can defend in a go or no-go review. If you want the decision structure that turns these signals into acceptance criteria, close the loop with the hub: Rotating Residential Proxies Evaluation Playbook for Web Scraping in 2026