Your proxy provider goes down at 3 AM. Your scraping pipeline stops. Your accounts cannot log in. Your monitoring goes blind. Single points of failure in proxy infrastructure are unacceptable for serious operations. Here is how to build resilient systems.
Common Failure Modes
Provider-Level Failures
- Complete outage — Provider gateway goes offline
- Partial degradation — Slow responses, increased error rates
- Pool exhaustion — All IPs in your target geo are used or blocked
- Authentication issues — API key expired, billing problems
Network-Level Failures
- DNS resolution failure — Cannot resolve proxy gateway hostname
- Connection timeouts — Network path to proxy is congested
- SSL/TLS errors — Certificate issues on proxy gateway
IP-Level Failures
- IP burned — Target platform blocked the proxy IP
- IP misclassified — Proxy IP detected as datacenter instead of residential
- Geographic mismatch — IP geolocates to wrong location
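These categories call for different responses. A burned IP only needs rotation within the same provider, while a gateway outage or an authentication problem means switching providers. The minimal sketch below shows that triage. The `FailureScope` enum and `classify_failure` function are hypothetical names. The mapping from status codes and exception types to categories is an illustrative heuristic, not something proxy providers document. Misclassified or mis-geolocated IPs usually cannot be detected from a single response and need a separate check of the exit IP's metadata.

```python
import socket
import ssl
from enum import Enum
from typing import Optional


class FailureScope(Enum):
    PROVIDER = "provider"  # fail over to the next provider
    NETWORK = "network"    # retry, then fail over if it persists
    IP = "ip"              # rotate to a new IP from the same provider


def classify_failure(status: Optional[int] = None,
                     error: Optional[BaseException] = None) -> Optional[FailureScope]:
    """Map a response status or exception to a failure scope.

    Heuristic only: the mapping below is an illustrative assumption.
    Returns None when the result does not look like a proxy failure.
    """
    if error is not None:
        if isinstance(error, socket.gaierror):  # DNS resolution failure
            return FailureScope.NETWORK
        if isinstance(error, ssl.SSLError):     # TLS problem on the gateway
            return FailureScope.PROVIDER
        if isinstance(error, (TimeoutError, ConnectionError)):
            return FailureScope.NETWORK
        return FailureScope.PROVIDER            # unknown error: blame the provider
    if status is None:
        return None
    if status == 407:               # Proxy Authentication Required: credentials or billing
        return FailureScope.PROVIDER
    if status in (403, 429):        # target refused or rate-limited this IP
        return FailureScope.IP
    if status in (502, 503, 504):   # gateway errors from the proxy itself
        return FailureScope.PROVIDER
    return None                     # anything else is treated as a normal target response
```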
Failover Architecture
```
        Your Application
               |
               v
          Proxy Router
         /     |      \
  Provider   Provider   Provider
     A          B          C
 (Primary) (Secondary) (Tertiary)
```
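At its core, the router walks the providers in priority order and falls through to the next one when a request fails. The minimal sketch below assumes a hypothetical `send` callable that performs the request through a named provider and raises on failure. `route_request` and `AllProvidersFailed` are illustrative names.

```python
from typing import Callable, Dict, List, Tuple


class AllProvidersFailed(Exception):
    """Raised when every provider in the chain has failed for one request."""

    def __init__(self, errors: Dict[str, Exception]):
        super().__init__(f"all providers failed: {sorted(errors)}")
        self.errors = errors


def route_request(providers: List[str],
                  send: Callable[[str], str]) -> Tuple[str, str]:
    """Try each provider in priority order; return (provider_name, response).

    `send` is a hypothetical callable that performs the request through the
    named provider and raises an exception on failure.
    """
    errors: Dict[str, Exception] = {}
    for name in providers:
        try:
            return name, send(name)
        except Exception as exc:  # record the error and fall through to the next provider
            errors[name] = exc
    raise AllProvidersFailed(errors)


def fake_send(name: str) -> str:
    # Simulated transport: the primary gateway is down.
    if name == "primary":
        raise ConnectionError("gateway down")
    return f"ok via {name}"


print(route_request(["primary", "secondary", "tertiary"], fake_send))
# ('secondary', 'ok via secondary')
```

The function collects every provider's error rather than only the last one. That makes a total outage much easier to diagnose from the exception alone.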
Multi-Provider Setup
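```python
import threading


class ResilientProxyManager:
    def __init__(self):
        # Placeholder gateways and credentials: substitute your providers' values.
        self.providers = [
            {
                "name": "primary",
                "gateway": "gateway.provider-a.com:8080",
                "auth": "user_a:pass_a",
                "priority": 1,
                "healthy": True,
                "consecutive_failures": 0,
            },
            {
                "name": "secondary",
                "gateway": "gateway.provider-b.com:8080",
                "auth": "user_b:pass_b",
                "priority": 2,
                "healthy": True,
                "consecutive_failures": 0,
            },
            {
                "name": "tertiary",
                "gateway": "gateway.provider-c.com:8080",
                "auth": "user_c:pass_c",
                "priority": 3,
                "healthy": True,
                "consecutive_failures": 0,
            },
        ]

    def get_provider(self, provider_name):
        return next(p for p in self.providers if p["name"] == provider_name)

    def get_proxy(self):
        """Return (provider_name, proxy_url) so callers can report the outcome."""
        healthy = [p for p in self.providers if p["healthy"]]
        if not healthy:
            # All providers down: reset and try again in priority order
            self.reset_all()
            healthy = self.providers
        provider = min(healthy, key=lambda p: p["priority"])
        return provider["name"], f"http://{provider['auth']}@{provider['gateway']}"

    def report_failure(self, provider_name):
        provider = self.get_provider(provider_name)
        provider["consecutive_failures"] += 1
        # Three consecutive failures take the provider out of rotation (once)
        if provider["consecutive_failures"] >= 3 and provider["healthy"]:
            provider["healthy"] = False
            self.schedule_health_check(provider, delay=60)

    def report_success(self, provider_name):
        provider = self.get_provider(provider_name)
        provider["consecutive_failures"] = 0
        provider["healthy"] = True

    def reset_all(self):
        for provider in self.providers:
            provider["healthy"] = True
            provider["consecutive_failures"] = 0

    def schedule_health_check(self, provider, delay):
        # Simplest recovery: after `delay` seconds, return the provider to
        # rotation so the next real request acts as the probe.
        timer = threading.Timer(delay, self.report_success, args=(provider["name"],))
        timer.daemon = True
        timer.start()
```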
Circuit Breaker Pattern
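A circuit breaker stops sending requests to a failing provider. After a cooldown, it lets a trial request through to test whether the provider has recovered: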
```python
import time


class CircuitBreaker:
    CLOSED = "closed"        # Normal operation
    OPEN = "open"            # Provider failed, blocking requests
    HALF_OPEN = "half_open"  # Testing if provider recovered

    def __init__(self, failure_threshold=5, recovery_timeout=60):
        self.state = self.CLOSED
        self.failures = 0
        self.threshold = failure_threshold
        self.recovery_timeout = recovery_timeout
        self.last_failure_time = 0.0

    def can_execute(self):
        if self.state == self.OPEN:
            # time.monotonic() is unaffected by wall-clock adjustments
            if time.monotonic() - self.last_failure_time > self.recovery_timeout:
                self.state = self.HALF_OPEN
            else:
                return False
        # CLOSED and HALF_OPEN both allow requests
        return True

    def record_success(self):
        self.failures = 0
        self.state = self.CLOSED

    def record_failure(self):
        self.failures += 1
        self.last_failure_time = time.monotonic()
        # A failed trial request in HALF_OPEN re-opens the circuit immediately
        if self.state == self.HALF_OPEN or self.failures >= self.threshold:
            self.state = self.OPEN
```
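You can watch the state machine without waiting a real minute. The sketch below is a compact variant of the breaker with an injectable clock. The `clock` parameter and `opened_at` field are additions for testing, not part of the class above. A fake clock drives the breaker through a full open, half-open, and closed cycle.

```python
import time
from typing import Callable


class CircuitBreaker:
    """Compact breaker with an injectable clock so transitions can be tested."""

    def __init__(self, failure_threshold: int = 5, recovery_timeout: float = 60,
                 clock: Callable[[], float] = time.monotonic):
        self.state = "closed"
        self.failures = 0
        self.threshold = failure_threshold
        self.recovery_timeout = recovery_timeout
        self.clock = clock
        self.opened_at = 0.0

    def can_execute(self) -> bool:
        if self.state == "open" and self.clock() - self.opened_at >= self.recovery_timeout:
            self.state = "half_open"  # cooldown elapsed: let a trial request through
        return self.state != "open"

    def record_success(self) -> None:
        self.failures = 0
        self.state = "closed"

    def record_failure(self) -> None:
        self.failures += 1
        # A failed trial in half_open re-opens immediately.
        if self.state == "half_open" or self.failures >= self.threshold:
            self.state = "open"
            self.opened_at = self.clock()


# Drive the breaker with a fake clock instead of real time.
now = [0.0]
breaker = CircuitBreaker(failure_threshold=3, recovery_timeout=60, clock=lambda: now[0])
for _ in range(3):
    breaker.record_failure()
print(breaker.state, breaker.can_execute())   # open False
now[0] = 61.0
print(breaker.can_execute(), breaker.state)   # True half_open
breaker.record_success()
print(breaker.state)                          # closed
```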
Health Check System
```python
import logging
import threading
import time

import requests


def send_alert(message):
    # Hypothetical alert hook: wire this to your paging or chat tool.
    logging.error(message)


class HealthChecker:
    def __init__(self, proxy_manager, check_interval=30):
        self.proxy_manager = proxy_manager
        self.interval = check_interval
        self.running = True

    def start(self):
        thread = threading.Thread(target=self.run, daemon=True)
        thread.start()

    def stop(self):
        self.running = False

    def run(self):
        while self.running:
            for provider in self.proxy_manager.providers:
                self.check_provider(provider)
            time.sleep(self.interval)

    def check_provider(self, provider):
        proxy = f"http://{provider['auth']}@{provider['gateway']}"
        try:
            start = time.monotonic()
            response = requests.get(
                "https://httpbin.org/ip",  # echoes the caller's (proxy's) IP
                proxies={"http": proxy, "https": proxy},
                timeout=10,
            )
            latency = time.monotonic() - start
        except requests.RequestException:  # timeouts, proxy and TLS errors
            self.handle_check_failure(provider)
            return
        if response.status_code == 200:
            provider["healthy"] = True
            provider["latency"] = latency
            provider["consecutive_failures"] = 0
        else:
            self.handle_check_failure(provider)

    def handle_check_failure(self, provider):
        provider["consecutive_failures"] += 1
        # Alert only on the healthy -> unhealthy transition, not on every check
        if provider["consecutive_failures"] >= 3 and provider["healthy"]:
            provider["healthy"] = False
            send_alert(f"Provider {provider['name']} is DOWN")
```
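The check above only looks for HTTP 200. It catches complete outages but not partial degradation such as slow responses or rising error rates. Below is a sketch of a rolling-window tracker that could sit alongside it. `DegradationTracker` is a hypothetical helper. The window size, 20% error-rate limit, and 3-second median-latency limit are illustrative assumptions; tune them against your own traffic.

```python
from collections import deque
from statistics import median


class DegradationTracker:
    """Rolling window of recent request results for one provider."""

    def __init__(self, window: int = 50, max_error_rate: float = 0.2,
                 max_median_latency: float = 3.0):
        self.results = deque(maxlen=window)  # (ok, latency_seconds); oldest drop out
        self.max_error_rate = max_error_rate
        self.max_median_latency = max_median_latency

    def record(self, ok: bool, latency: float) -> None:
        self.results.append((ok, latency))

    def error_rate(self) -> float:
        if not self.results:
            return 0.0
        return sum(1 for ok, _ in self.results if not ok) / len(self.results)

    def median_latency(self) -> float:
        # Only successful requests count; failures often end in a timeout.
        latencies = [lat for ok, lat in self.results if ok]
        return median(latencies) if latencies else 0.0

    def is_degraded(self) -> bool:
        return (self.error_rate() > self.max_error_rate
                or self.median_latency() > self.max_median_latency)


tracker = DegradationTracker(window=10)
for _ in range(8):
    tracker.record(ok=True, latency=0.5)
for _ in range(2):
    tracker.record(ok=False, latency=10.0)
print(tracker.error_rate(), tracker.is_degraded())  # 0.2 False
tracker.record(ok=False, latency=10.0)  # window is full, so the oldest success drops out
print(tracker.error_rate(), tracker.is_degraded())  # 0.3 True
```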
Failover Best Practices
- Use at least 2 providers — A single provider is a single point of failure
- Test failover regularly — Simulate provider failures to verify your system works
- Set up monitoring — Know about failures before they impact operations
- Maintain warm standby — Secondary providers should have active credentials and tested connectivity
- Document provider SLAs — Know what uptime each provider guarantees
- Budget for redundancy — Failover providers cost money even when idle
- Automate recovery — Manual failover at 3 AM is not reliable
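Failover drills can run as ordinary tests. The sketch below wraps a send function in a hypothetical `FaultInjector` that forces chosen providers to fail. It then checks that traffic moves down the priority list and returns once the outage ends. `send_with_failover` is a deliberately minimal stand-in. In practice, the drill should exercise your real routing code.

```python
import contextlib
from typing import Callable, Iterator, List, Set


def send_with_failover(providers: List[str], send: Callable[[str], str]) -> str:
    """Minimal stand-in router: return the name of the first provider that succeeds."""
    for name in providers:
        try:
            send(name)
            return name
        except ConnectionError:
            continue
    raise RuntimeError("no provider available")


class FaultInjector:
    """Wraps a send function and forces selected providers to fail."""

    def __init__(self, send: Callable[[str], str]):
        self.send = send
        self.down: Set[str] = set()

    @contextlib.contextmanager
    def outage(self, name: str) -> Iterator[None]:
        self.down.add(name)
        try:
            yield
        finally:
            self.down.discard(name)  # outage ends even if the drill fails

    def __call__(self, name: str) -> str:
        if name in self.down:
            raise ConnectionError(f"injected outage: {name}")
        return self.send(name)


providers = ["primary", "secondary", "tertiary"]
injector = FaultInjector(lambda name: "ok")

assert send_with_failover(providers, injector) == "primary"
with injector.outage("primary"):
    assert send_with_failover(providers, injector) == "secondary"
    with injector.outage("secondary"):
        assert send_with_failover(providers, injector) == "tertiary"
assert send_with_failover(providers, injector) == "primary"  # recovered
```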
Cost of Downtime vs Cost of Redundancy
| Metric | Single Provider | Multi-Provider |
|---|---|---|
| Monthly proxy cost | $200 | $250-300 |
| Expected uptime | 99% | 99.9%+ |
| Monthly downtime | ~7 hours | ~43 minutes |
| Impact of outage | Total stop | Seamless failover |
In this example, paying 25-50% more for redundancy cuts expected downtime from about 7 hours to under an hour per month.
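The downtime column follows from simple arithmetic on an average 730-hour month. A rough upper bound on multi-provider uptime comes from assuming the providers fail independently, since the chain is only down when every provider is down at once. Real providers can share upstream networks, and failover is not instantaneous. Treat the independence result as a ceiling rather than a prediction; the table's more conservative 99.9%+ sits below it.

```python
HOURS_PER_MONTH = 730  # 8760 hours per year / 12


def monthly_downtime_minutes(uptime: float) -> float:
    """Expected minutes of downtime per month at a given uptime fraction."""
    return (1 - uptime) * HOURS_PER_MONTH * 60


def combined_uptime(*uptimes: float) -> float:
    """Uptime of a failover chain, ASSUMING independent failures and instant failover."""
    p_all_down = 1.0
    for uptime in uptimes:
        p_all_down *= 1 - uptime
    return 1 - p_all_down


print(round(monthly_downtime_minutes(0.99)))   # 438 (about 7.3 hours)
print(round(monthly_downtime_minutes(0.999)))  # 44
print(round(combined_uptime(0.99, 0.99), 6))   # 0.9999
```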
For proxy failover architecture and infrastructure resilience guides, visit DataResearchTools.