Miller James

Load Balancing Across a Residential Proxy Pool in Production

A production residential proxy pool should optimize for stable throughput, controlled per-IP pressure, and fast recovery from bad nodes. It should not aim for perfectly equal request counts, because residential endpoints vary in latency, availability, and session stability in ways homogeneous server pools usually do not.

That’s why plain round robin is a weak default. Weighted or score-based routing works better because it can shift traffic away from proxies that are overloaded, degraded, or temporarily unavailable.
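The difference is easy to see in miniature. This hypothetical sketch contrasts a plain round-robin picker with a score-weighted one; the weights are made-up health scores, not values from any real pool.

```python
import itertools
import random

proxies = ["p1", "p2", "p3"]
# Hypothetical health scores: p2 is degraded.
weights = {"p1": 1.0, "p2": 0.2, "p3": 0.9}

# Round robin gives p2 exactly one third of traffic regardless of health.
rr = itertools.cycle(proxies)
rr_picks = [next(rr) for _ in range(300)]

# Score-weighted selection shrinks p2's share in proportion to its score.
random.seed(7)
weighted_picks = random.choices(
    proxies, weights=[weights[p] for p in proxies], k=300
)

print("round robin p2 share:", rr_picks.count("p2") / 300)
print("weighted p2 share:", weighted_picks.count("p2") / 300)
```

With these weights, p2's expected share drops from one third to under a tenth, which is the whole argument for score-based routing in one number.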

There’s also an important compliance line here. If a target starts returning 429 or repeated 403 responses, the correct move is to reduce or suspend traffic for that target and review your allowed request budget, not to use routing logic to keep pressing the same target through other IPs.

What does “balanced” mean in a residential proxy pool?

Balanced means traffic is distributed according to each proxy’s current health and usable capacity, not split into identical request counts.

For a real residential proxy pool, measure these fields per proxy:

  • Success rate over a rolling window.
  • P95 latency.
  • Current in-flight requests.
  • Recent timeout and connection-error count.
  • Recent 429 and 403 signals by target host, when those signals apply to your workflow.

If you don’t record those metrics, you can still rotate proxies, but you can’t prove you’re balancing them safely. That’s the difference between a proxy list and a production scheduler.
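The rolling-window bookkeeping behind those fields can be sketched in a few lines; the window size and field names here are illustrative, not tied to any provider.

```python
from collections import deque


class ProxyMetrics:
    """Tracks the per-proxy fields above over fixed rolling windows."""

    def __init__(self, window: int = 50):
        self.outcomes = deque(maxlen=window)   # 1 = success, 0 = failure
        self.latencies = deque(maxlen=window)  # seconds
        self.in_flight = 0

    def record(self, ok: bool, latency: float):
        self.outcomes.append(1 if ok else 0)
        self.latencies.append(latency)

    def success_rate(self) -> float:
        return sum(self.outcomes) / len(self.outcomes) if self.outcomes else 1.0

    def p95_latency(self) -> float:
        if not self.latencies:
            return 0.0
        vals = sorted(self.latencies)
        return vals[min(len(vals) - 1, int(len(vals) * 0.95))]


m = ProxyMetrics(window=4)
for ok, lat in [(True, 0.2), (True, 0.3), (False, 1.5), (True, 0.25)]:
    m.record(ok, lat)
print(m.success_rate())  # 0.75
print(m.p95_latency())
```

The `deque(maxlen=...)` trick keeps the window bounded without manual eviction, which is why the full scheduler later in this article uses the same structure.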

What components do you need in production?

You need five components: a proxy registry, a scheduler, a feedback loop, a session-affinity map, and metrics storage.

The proxy registry tracks endpoint details and live state such as in_flight, recent outcomes, and cooldown status. The scheduler selects the next proxy. The feedback loop updates proxy health after each request. The session-affinity map preserves continuity for workflows that need it. Metrics storage gives you the evidence to verify that load distribution is improving rather than just moving failures around.

Session affinity is worth using only when the task actually needs continuity, such as cookie-bound pagination or a multi-step account flow. If requests are independent, long-lived affinity usually makes distribution worse.

Which scheduling algorithm should you use?

For a residential proxy pool, score-based weighted selection is the best general default.

Round robin is easy to implement, but it ignores live differences in latency and error rate. Weighted round robin is better when capacity differences are stable, yet it still does not respond to fresh timeout bursts or sudden degradation unless you update weights from recent outcomes.

Use this rule:

  • Independent requests: score-based routing.
  • Session-bound workflows: score-based routing plus session affinity.
  • New pool with little feedback data: weighted round robin as a temporary fallback until enough outcome data exists to score nodes credibly.
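For the cold-start fallback, a minimal weighted round-robin sketch; the static weights here are placeholders you would take from plan capacity, not from measured health.

```python
import itertools


def weighted_round_robin(weights: dict[str, int]):
    """Yield proxy names in proportion to static integer weights."""
    expanded = [name for name, w in weights.items() for _ in range(w)]
    return itertools.cycle(expanded)


# Hypothetical static capacities until real outcome data exists.
picker = weighted_round_robin({"p1": 3, "p2": 1, "p3": 2})
picks = [next(picker) for _ in range(12)]
print(picks.count("p1"), picks.count("p2"), picks.count("p3"))  # 6 2 4
```

Note this naive expansion sends each node its share in bursts; a production variant would interleave (smooth weighted round robin), but the proportions are the point here, and they hold either way.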

How should you score each proxy?

A useful health score combines recent success, latency, current load, and cooldown state. If you ignore current load, a proxy can look healthy historically while still being the one you’re about to overload.

A practical model is:

score = success_factor × latency_factor × load_factor × cooldown_factor × penalty_factor

Interpret the factors this way:

  • success_factor: recent success rate over your rolling window.
  • latency_factor: inverse weight from recent latency, with a cap so a few fast outliers do not dominate.
  • load_factor: penalty based on current in_flight pressure.
  • cooldown_factor: zero when the proxy is cooling down, one when it is eligible again.
  • penalty_factor: extra downgrade after repeated transport failures.
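Plugged with hypothetical numbers, the factors compose like this; every value below is invented for illustration.

```python
# Hypothetical snapshot for one proxy: 90% recent success, 0.8s p95 latency,
# 3 of 10 slots in flight, not cooling down, no recent hard failures.
success_factor = 0.9
latency_factor = min(2.0, 1.5 / 0.8)   # capped inverse-latency weight
load_factor = max(0.1, 1.0 - 3 / 10)   # penalize current in-flight pressure
cooldown_factor = 1.0                  # eligible (0.0 while cooling down)
penalty_factor = 1.0                   # no repeated transport failures

score = (
    success_factor * latency_factor * load_factor
    * cooldown_factor * penalty_factor
)
print(score)
```

Because the factors multiply, any single zero (a cooldown) removes the proxy entirely, while partial degradation only shrinks its share, which is exactly the behavior you want from weighted selection.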

For compliant operations, treat failure types differently:

  • Timeout or connect error: lower the proxy’s score and retry only within your approved retry budget.
  • 429 from a target host: suspend or sharply reduce traffic to that target and review whether your approved frequency has been exceeded.
  • Repeated 403 from a target host: stop the workflow for that target and review authorization, session design, and request necessity before sending more traffic.

What provider-specific values do you need before the code will work?

Before you run any scheduler code, collect the provider-specific fields from your proxy vendor’s dashboard or API docs. This is the part many examples skip, and it is where otherwise-correct code often breaks in production.

You need:

  1. Gateway host and port.
  2. Username and password format.
  3. How region targeting is encoded.
  4. Whether the provider supports session affinity, and how that key is passed.
  5. Whether the provider expects HTTP, HTTPS, or SOCKS5 proxy URLs.
  6. Any documented plan limits that affect concurrency or session duration.

Do not guess these values. Proxy vendors often encode region or session parameters differently, and those formats are not interchangeable across providers.
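You can at least fail fast on malformed values. A minimal validation sketch, assuming the vendor hands you a full proxy URL; it checks structure only, not whether the vendor's parameter encoding is correct, and the hostname below is a placeholder.

```python
from urllib.parse import urlparse


def validate_proxy_url(proxy_url: str) -> str:
    """Reject structurally broken proxy URLs before they reach the scheduler."""
    parsed = urlparse(proxy_url)
    if parsed.scheme not in {"http", "https", "socks5"}:
        raise ValueError(f"Unsupported proxy scheme: {parsed.scheme!r}")
    if not parsed.hostname or not parsed.port:
        raise ValueError("Proxy URL must include host and port")
    if parsed.username is None or parsed.password is None:
        raise ValueError("Proxy URL must embed username and password")
    return proxy_url


print(validate_proxy_url("http://user:pass@gateway.example.com:8000"))
```

Running this at startup turns "every proxy looks bad" integration bugs into a single clear exception before any traffic is sent.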

How do you implement the scheduler in code?

HTTPX supports async clients and proxy configuration, which makes it a reasonable base for an application-level scheduler in Python.

The code below is intentionally split into two parts:

  • Provider-specific settings you must fill from your vendor documentation.
  • Policy settings you must fill from your own approved traffic budget and internal rules.

That separation is deliberate. It keeps the article actionable without inventing proxy vendor syntax or target-rate values that should come from live docs and approved operating policy.

Prerequisites

Use:

  • Python 3.11+
  • httpx
  • asyncio

Install:

pip install httpx

Create scheduler.py:

import asyncio
import os
import random
import statistics
import time
from collections import defaultdict, deque
from dataclasses import dataclass, field
from typing import Optional
from urllib.parse import urlparse

import httpx


class TargetSuspendedError(RuntimeError):
    pass


def required_env(name: str) -> str:
    value = os.getenv(name)
    if not value:
        raise RuntimeError(f"Missing required environment variable: {name}")
    return value


def required_int(name: str) -> int:
    return int(required_env(name))


@dataclass
class RuntimePolicy:
    outcome_window: int
    latency_window: int
    max_in_flight_per_proxy: int
    timeout_cooldown_seconds: int
    hard_failure_threshold: int
    per_proxy_rate_limit: int
    per_proxy_rate_window_seconds: int
    request_timeout_seconds: int
    retry_budget: int
    target_suspend_seconds: int
    repeated_forbidden_threshold: int


@dataclass
class ProxyNode:
    name: str
    proxy_url: str
    max_in_flight: int
    outcome_window: int
    latency_window: int
    in_flight: int = 0
    cooldown_until: float = 0.0
    hard_failures: int = 0
    outcome_samples: deque = field(init=False)
    latency_samples: deque = field(init=False)
    request_timestamps: deque = field(default_factory=deque)

    def __post_init__(self):
        self.outcome_samples = deque(maxlen=self.outcome_window)
        self.latency_samples = deque(maxlen=self.latency_window)

    def available(self) -> bool:
        return time.time() >= self.cooldown_until and self.in_flight < self.max_in_flight

    def success_rate(self) -> float:
        if not self.outcome_samples:
            return 1.0
        return max(0.1, sum(self.outcome_samples) / len(self.outcome_samples))

    def p95_latency(self) -> float:
        if not self.latency_samples:
            return 1.0
        vals = sorted(self.latency_samples)
        idx = max(0, min(len(vals) - 1, int(len(vals) * 0.95) - 1))
        return max(0.05, vals[idx])

    def health_score(self) -> float:
        if time.time() < self.cooldown_until:
            return 0.0

        success_factor = self.success_rate()
        latency_factor = min(2.0, 1.5 / self.p95_latency())
        load_ratio = self.in_flight / max(1, self.max_in_flight)
        load_factor = max(0.1, 1.0 - load_ratio)
        penalty_factor = 0.5 if self.hard_failures > 0 else 1.0

        return success_factor * latency_factor * load_factor * penalty_factor


class PerProxyRateLimiter:
    def __init__(self, max_requests: int, period_seconds: int):
        self.max_requests = max_requests
        self.period_seconds = period_seconds

    async def acquire(self, node: ProxyNode):
        while True:
            now = time.time()

            while node.request_timestamps and now - node.request_timestamps[0] > self.period_seconds:
                node.request_timestamps.popleft()

            if len(node.request_timestamps) < self.max_requests:
                node.request_timestamps.append(now)
                return

            wait_for = self.period_seconds - (now - node.request_timestamps[0]) + 0.01
            await asyncio.sleep(max(0.05, wait_for))


class TargetPolicyGate:
    def __init__(self, suspend_seconds: int, repeated_forbidden_threshold: int):
        self.suspend_seconds = suspend_seconds
        self.repeated_forbidden_threshold = repeated_forbidden_threshold
        self.suspended_until: dict[str, float] = {}
        self.forbidden_counts: dict[str, int] = defaultdict(int)

    def _host(self, url: str) -> str:
        return urlparse(url).netloc

    def assert_allowed(self, url: str):
        host = self._host(url)
        suspended_until = self.suspended_until.get(host, 0)
        if time.time() < suspended_until:
            raise TargetSuspendedError(
                f"Traffic to {host} is suspended until {time.strftime('%Y-%m-%d %H:%M:%S', time.localtime(suspended_until))}"
            )

    def record_success(self, url: str):
        host = self._host(url)
        self.forbidden_counts[host] = 0

    def record_429(self, url: str):
        host = self._host(url)
        self.suspended_until[host] = time.time() + self.suspend_seconds

    def record_403(self, url: str):
        host = self._host(url)
        self.forbidden_counts[host] += 1
        if self.forbidden_counts[host] >= self.repeated_forbidden_threshold:
            self.suspended_until[host] = time.time() + self.suspend_seconds


class ProxyScheduler:
    def __init__(self, nodes: list[ProxyNode]):
        self.nodes = nodes
        self.session_affinity_map: dict[str, str] = {}

    def _find_by_name(self, name: str) -> Optional[ProxyNode]:
        for node in self.nodes:
            if node.name == name:
                return node
        return None

    def choose(self, session_key: Optional[str] = None) -> ProxyNode:
        if session_key and session_key in self.session_affinity_map:
            preferred = self._find_by_name(self.session_affinity_map[session_key])
            if preferred and preferred.available():
                preferred.in_flight += 1
                return preferred

        candidates = [node for node in self.nodes if node.available()]
        if not candidates:
            raise RuntimeError("No available proxies")

        weights = [max(0.01, node.health_score()) for node in candidates]
        chosen = random.choices(candidates, weights=weights, k=1)[0]
        chosen.in_flight += 1

        if session_key:
            self.session_affinity_map[session_key] = chosen.name

        return chosen

    def release(self, node: ProxyNode):
        node.in_flight = max(0, node.in_flight - 1)

    def record_success(self, node: ProxyNode, latency: float):
        node.outcome_samples.append(1)
        node.latency_samples.append(latency)
        node.hard_failures = max(0, node.hard_failures - 1)
        self.release(node)

    def record_transport_failure(self, node: ProxyNode, cooldown_seconds: int, threshold: int):
        node.outcome_samples.append(0)
        node.hard_failures += 1
        if node.hard_failures >= threshold:
            node.cooldown_until = time.time() + cooldown_seconds
        self.release(node)


async def fetch_with_scheduler(
    client: httpx.AsyncClient,
    scheduler: ProxyScheduler,
    rate_limiter: PerProxyRateLimiter,
    target_gate: TargetPolicyGate,
    policy: RuntimePolicy,
    url: str,
    session_key: Optional[str] = None,
):
    target_gate.assert_allowed(url)
    last_error = None

    for attempt in range(policy.retry_budget + 1):
        node = scheduler.choose(session_key=session_key)
        await rate_limiter.acquire(node)
        started = time.perf_counter()

        try:
            # httpx binds a proxy to the client, not to an individual request,
            # so route this call through a short-lived client for the chosen node.
            async with httpx.AsyncClient(
                proxy=node.proxy_url,
                headers=client.headers,
                timeout=policy.request_timeout_seconds,
            ) as proxied_client:
                response = await proxied_client.get(url)
            latency = time.perf_counter() - started

            if response.status_code == 429:
                scheduler.release(node)
                target_gate.record_429(url)
                raise TargetSuspendedError(
                    f"Received 429 from {urlparse(url).netloc}; target workflow suspended for policy review."
                )

            if response.status_code == 403:
                scheduler.release(node)
                target_gate.record_403(url)
                raise TargetSuspendedError(
                    f"Received 403 from {urlparse(url).netloc}; target workflow halted pending authorization review."
                )

            response.raise_for_status()
            scheduler.record_success(node, latency)
            target_gate.record_success(url)

            return {
                "proxy": node.name,
                "status_code": response.status_code,
                "latency": round(latency, 3),
                "attempt": attempt,
                "host": urlparse(url).netloc,
            }

        except TargetSuspendedError:
            raise
        except (httpx.ConnectError, httpx.ReadTimeout, httpx.ConnectTimeout) as exc:
            scheduler.record_transport_failure(
                node,
                cooldown_seconds=policy.timeout_cooldown_seconds,
                threshold=policy.hard_failure_threshold,
            )
            last_error = exc
            continue
        except Exception as exc:
            scheduler.record_transport_failure(
                node,
                cooldown_seconds=policy.timeout_cooldown_seconds,
                threshold=policy.hard_failure_threshold,
            )
            last_error = exc
            continue

    raise last_error or RuntimeError("Request failed after allowed retries")


def summarize_results(rows: list[dict]):
    grouped = defaultdict(list)
    for row in rows:
        grouped[row["proxy"]].append(row)

    summary = []
    for proxy, items in grouped.items():
        latencies = [x["latency"] for x in items]
        summary.append(
            {
                "proxy": proxy,
                "requests": len(items),
                "avg_latency": round(statistics.mean(latencies), 3) if latencies else None,
                "p95_latency": round(sorted(latencies)[max(0, int(len(latencies) * 0.95) - 1)], 3)
                if latencies
                else None,
            }
        )
    return sorted(summary, key=lambda x: x["proxy"])


def load_policy() -> RuntimePolicy:
    return RuntimePolicy(
        outcome_window=required_int("OUTCOME_WINDOW"),
        latency_window=required_int("LATENCY_WINDOW"),
        max_in_flight_per_proxy=required_int("MAX_IN_FLIGHT_PER_PROXY"),
        timeout_cooldown_seconds=required_int("TIMEOUT_COOLDOWN_SECONDS"),
        hard_failure_threshold=required_int("HARD_FAILURE_THRESHOLD"),
        per_proxy_rate_limit=required_int("PER_PROXY_RATE_LIMIT"),
        per_proxy_rate_window_seconds=required_int("PER_PROXY_RATE_WINDOW_SECONDS"),
        request_timeout_seconds=required_int("REQUEST_TIMEOUT_SECONDS"),
        retry_budget=required_int("RETRY_BUDGET"),
        target_suspend_seconds=required_int("TARGET_SUSPEND_SECONDS"),
        repeated_forbidden_threshold=required_int("REPEATED_FORBIDDEN_THRESHOLD"),
    )


def build_proxy_nodes(policy: RuntimePolicy) -> list[ProxyNode]:
    proxy_urls = [
        required_env("PROXY_URL_1"),
        required_env("PROXY_URL_2"),
        required_env("PROXY_URL_3"),
    ]

    nodes = []
    for idx, proxy_url in enumerate(proxy_urls, start=1):
        nodes.append(
            ProxyNode(
                name=f"p{idx}",
                proxy_url=proxy_url,
                max_in_flight=policy.max_in_flight_per_proxy,
                outcome_window=policy.outcome_window,
                latency_window=policy.latency_window,
            )
        )
    return nodes


async def main():
    policy = load_policy()
    nodes = build_proxy_nodes(policy)

    scheduler = ProxyScheduler(nodes)
    rate_limiter = PerProxyRateLimiter(
        max_requests=policy.per_proxy_rate_limit,
        period_seconds=policy.per_proxy_rate_window_seconds,
    )
    target_gate = TargetPolicyGate(
        suspend_seconds=policy.target_suspend_seconds,
        repeated_forbidden_threshold=policy.repeated_forbidden_threshold,
    )

    test_url = required_env("TARGET_URL")
    total_requests = required_int("TOTAL_REQUESTS")

    async with httpx.AsyncClient(
        headers={"User-Agent": "OpsValidation/1.0"},
        limits=httpx.Limits(max_connections=required_int("MAX_CONNECTIONS")),
    ) as client:
        tasks = []
        for i in range(total_requests):
            session_key = f"session-{i // 4}"
            tasks.append(
                fetch_with_scheduler(
                    client=client,
                    scheduler=scheduler,
                    rate_limiter=rate_limiter,
                    target_gate=target_gate,
                    policy=policy,
                    url=test_url,
                    session_key=session_key,
                )
            )

        results = await asyncio.gather(*tasks, return_exceptions=True)
        ok = [r for r in results if isinstance(r, dict)]
        errors = [str(r) for r in results if not isinstance(r, dict)]

        print("SUMMARY")
        for row in summarize_results(ok):
            print(row)

        print("\nERRORS")
        for err in errors[:20]:
            print(err)


if __name__ == "__main__":
    asyncio.run(main())

Required environment variables

Create a .env or deployment secret set with these values:

  • PROXY_URL_1, PROXY_URL_2, PROXY_URL_3
  • TARGET_URL
  • OUTCOME_WINDOW
  • LATENCY_WINDOW
  • MAX_IN_FLIGHT_PER_PROXY
  • TIMEOUT_COOLDOWN_SECONDS
  • HARD_FAILURE_THRESHOLD
  • PER_PROXY_RATE_LIMIT
  • PER_PROXY_RATE_WINDOW_SECONDS
  • REQUEST_TIMEOUT_SECONDS
  • RETRY_BUDGET
  • TARGET_SUSPEND_SECONDS
  • REPEATED_FORBIDDEN_THRESHOLD
  • MAX_CONNECTIONS
  • TOTAL_REQUESTS

Populate proxy URLs from your provider’s live documentation or control panel. Populate rate, retry, cooldown, and suspension settings from your own approved traffic budget and operational policy rather than copying values from another team or vendor marketing page.

Why this implementation is safe to use in production

This version does four things a lab demo usually misses:

  1. It tracks current in-flight pressure, not just past success.
  2. It enforces per-proxy rate controls.
  3. It preserves session affinity only when you pass a session key.
  4. It suspends a target workflow after 429 or repeated 403 instead of treating those responses as a signal to keep routing traffic elsewhere.

That last part is the compliance-critical distinction. Transport failures are a proxy-health problem. Repeated 429 and 403 responses are often a target-policy problem, so they should trigger review, not evasive redistribution.

How do you verify that balancing actually works?

You verify a residential proxy pool with logs and distribution reports, not intuition. If you can’t see request share, latency, and failure concentration per proxy, you can’t tell whether the scheduler fixed overload or just hid it.

Record these fields on every request:

  • Timestamp.
  • Request ID.
  • Target host.
  • Selected proxy.
  • Session key, if used.
  • Attempt number.
  • Status code or failure class.
  • End-to-end latency.
  • Whether the target workflow was suspended.

Add this helper if you want a CSV distribution report:

import csv
from collections import defaultdict

def write_distribution_report(rows, filename="proxy_distribution.csv"):
    grouped = defaultdict(lambda: {"requests": 0, "latencies": [], "errors": 0})

    for row in rows:
        proxy = row.get("proxy", "unknown")
        grouped[proxy]["requests"] += 1
        if "latency" in row:
            grouped[proxy]["latencies"].append(row["latency"])
        if row.get("status_code", 200) >= 400:
            grouped[proxy]["errors"] += 1

    with open(filename, "w", newline="") as f:
        writer = csv.writer(f)
        writer.writerow(["proxy", "requests", "avg_latency", "error_count"])
        for proxy, stats in grouped.items():
            lat = stats["latencies"]
            avg = round(sum(lat) / len(lat), 3) if lat else None
            writer.writerow([proxy, stats["requests"], avg, stats["errors"]])

Use these success criteria:

  • No proxy should sit at saturation while peers remain mostly idle.
  • Degraded proxies should lose share naturally because their score drops.
  • Session-bound tasks should stay coherent without taking over unrelated traffic.
  • If a target triggers suspension logic, the workflow should stop cleanly and visibly rather than silently continuing through other proxies.

A healthy output usually looks like a distribution report, not a single “all good” line. You want to see per-proxy request counts and latency values close enough to show balanced use, while still allowing stronger nodes to carry slightly more work than weaker ones.
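One way to turn that report into a pass/fail signal is a share-imbalance ratio over the per-proxy request counts. The 3.0 threshold below is an illustrative starting point to tune against your own logs, not a standard.

```python
def share_balanced(summary_rows: list[dict], max_ratio: float = 3.0) -> bool:
    """True if the busiest proxy carries no more than max_ratio times the quietest."""
    counts = [row["requests"] for row in summary_rows if row["requests"] > 0]
    if len(counts) < 2:
        return True  # nothing to compare
    return max(counts) / min(counts) <= max_ratio


# Hypothetical report rows: 40/25 = 1.6x spread, within the threshold.
rows = [
    {"proxy": "p1", "requests": 40},
    {"proxy": "p2", "requests": 25},
    {"proxy": "p3", "requests": 35},
]
print(share_balanced(rows))  # True
```

A ratio check deliberately tolerates stronger nodes carrying more work; it only alarms when one proxy is doing several times the share of another.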

What did this revision focus on?

This revision was reviewed on April 20, 2026 against public proxy-pool management guidance, residential proxy usage best practices, and HTTPX proxy documentation. The main conclusion was straightforward: the hard part is rarely the picker by itself; it is the combination of session-affinity scope, per-proxy pressure, and clear stop conditions when a target starts signaling that the workflow needs review.

That is also where most articles cut corners. They explain rotation, then jump straight to “keep trying.” In production, the better design is to distinguish between proxy-health failures and target-policy signals so your scheduler doesn’t blur the two.

Why is one proxy still getting overloaded?

The most common reason is that the scheduler is balancing selection count instead of live pressure. A slower proxy can accumulate more open requests even if it is not chosen much more often than its peers.

Check these patterns:

Symptom: one proxy keeps the most open requests

Cause: the score does not penalize rising in_flight pressure early enough.

Fix: make load_factor react sooner and lower MAX_IN_FLIGHT_PER_PROXY until your logs show stable distribution under your real workload.
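One way to make the load penalty bite earlier is a sublinear curve instead of the linear one; the exponent here is a tuning knob to validate against your own logs, not a recommended constant.

```python
def linear_load_factor(in_flight: int, max_in_flight: int) -> float:
    """The linear penalty used in the scheduler above."""
    return max(0.1, 1.0 - in_flight / max(1, max_in_flight))


def steep_load_factor(in_flight: int, max_in_flight: int, exponent: float = 0.5) -> float:
    """Sublinear exponent makes the penalty grow quickly at low occupancy."""
    ratio = in_flight / max(1, max_in_flight)
    return max(0.1, 1.0 - ratio ** exponent)


# At half occupancy the linear factor is 0.5; the steep one is ~0.29,
# so a filling-up proxy loses share well before it saturates.
print(linear_load_factor(5, 10))
print(steep_load_factor(5, 10))
```

Swapping the curve is a one-line change inside `health_score`, which makes it cheap to A/B against your real workload.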

Symptom: distribution got worse after enabling session affinity

Cause: the session scope is too broad or too long-lived.

Fix: keep session affinity only for workflows that truly depend on continuity, and expire mappings when the task ends.

Symptom: a few bad proxies drag down the whole pool

Cause: they never cool down or leave rotation after transport failures.

Fix: send repeated transport failures into cooldown, then let those proxies earn their way back through successful traffic instead of restoring them immediately.

Symptom: every proxy looks “bad” at once

Cause: the problem may not be the pool at all.

Fix: verify your provider’s gateway syntax, credentials, region parameters, and session-affinity format before changing the scheduler. A malformed proxy URL can look like a balancing failure when it is really an integration bug.

Build-vs-buy note

If you need custom routing logic, internal observability, and tight control over how tasks are mapped to proxies, building this scheduler layer makes sense. If your team mainly needs reliable residential proxy access without owning the full control plane, evaluating a managed provider is usually the better tradeoff.

That is the natural place to look at Proxy001. Its public site includes residential proxy use-case content and example traffic-based pricing in blog material, which is enough to make it relevant in a build-vs-buy discussion, but you should still confirm the current gateway format, onboarding flow, and any plan-specific details on the live site before integrating.

Compliance note

Use residential proxies only for permitted tasks such as QA, price monitoring, ad verification, fraud analysis, or data-quality workflows where you have a lawful basis for the traffic. A residential proxy pool is not a substitute for authorization, contract review, or target-site policy review.

In practice, this means your scheduler should make it easy to stop or reduce traffic when the target indicates pressure or restriction. Good load balancing is not just about throughput; it is also about making policy boundaries visible in the control flow.

Go-live checklist

Before rollout, make sure you have:

  • A per-proxy concurrency cap.
  • A per-proxy rate control.
  • A documented session-affinity rule.
  • Separate handling for transport errors versus target-policy signals.
  • Provider-specific gateway and credential syntax verified from live docs.
  • Request logs with proxy choice, latency, and outcome.
  • A verification report after test traffic.
  • A clean suspension path for targets that need policy review.

If you want a provider to test this pattern against a real commercial residential proxy offering, Proxy001 is one option worth evaluating. Its site already publishes residential proxy use cases and example traffic-based pricing, which makes it a reasonable starting point for practical assessment. The next step is to confirm the current live signup path, gateway syntax, region parameters, and session-affinity format directly on proxy001.com, then plug those values into the scheduler structure above and validate with your own approved workload.
