Your scraper runs perfectly on your laptop. You deploy it to AWS, and it fails immediately. This is not a code bug—it is an environment detection problem. This article provides a diagnostic framework to identify why your web scraping proxy configuration works locally but fails in cloud or datacenter environments, with validation steps and defensive-only remediation paths.
Direct Answer: Why Local Works but Cloud Fails
The core mechanism: When a scraper runs correctly on your local machine but fails in production, the scraper itself has no issue—the website detects something about your running environment. Your local machine uses a residential IP with high trust, browser-native TLS signatures, and natural request timing. A cloud server uses a datacenter IP with low trust (often pre-blocked), HTTP library TLS fingerprints, and parallel request patterns. Websites detect these differences at multiple layers simultaneously.
Five detection layers cause local-vs-cloud blocking:
IP Trust Score: Datacenter IPs from providers like AWS, GCP, and Azure are flagged before any request reaches the server. Cloud providers publish their IP subnet lists, which websites use for immediate blocking. An estimated 99% of traffic from traceable datacenter IPs is bot traffic.
ASN Recognition: AWS WAF maintains a HostingProviderIPList containing all known hosting providers, with inclusion determined on an ASN basis. If your proxy provider's IP range falls within a known datacenter ASN, you may be blocked before sending a single request.
TLS Fingerprint Mismatch: HTTP client libraries produce TLS Client Hello messages with parameters distinct from real browsers. The JA3 fingerprint algorithm hashes five fields (TLSVersion, Ciphers, Extensions, EllipticCurves, EllipticCurvePointFormats), and anti-scraping services maintain databases of whitelisted browser fingerprints versus blacklisted scraping tool fingerprints.
Header and Behavioral Signals: Default library user-agents explicitly identify automation. Beyond headers, modern anti-bot systems use machine learning to analyze request patterns, timing, and behavioral fingerprints across dozens of signals.
Browser Fingerprint Leakage: For browser automation, headless Chrome differs from standard Chrome in subtle fingerprint ways. The navigator.webdriver property returns true for automation-controlled browsers, and websites gather browser API information including audio/video devices and WebGL renderer data.
Key insight: Websites use a layered approach combining network-level signals (JA3/JA4, IP geolocation, ASN reputation) with application-level signals (missing fonts, unusual screen sizes, headless-detection scripts). Fixing one layer while ignoring others will not resolve blocking.
Diagnostic Flowchart: Identifying Your Blocking Cause
Use this decision tree to systematically identify why your scraper fails in cloud deployment. Start at the top and follow the branches based on your observed symptoms.
START: Scraper works locally but fails in cloud?
│
▼
[1] Check if anti-bot is installed on target
(Use Wappalyzer browser extension to detect protection)
│
├─► Anti-bot detected → Refer to anti-bot specific diagnostic
│ (protection-specific validation required)
│
└─► No anti-bot detected
│
▼
[2] Does manual browser access work from same cloud IP?
(SSH to server, curl or browser test)
│
├─► NO: Browser also blocked
│ │
│ ▼
│ CAUSE: IP/ASN-based blocking
│ → Validate: Check if IP belongs to known datacenter ASN
│ → Test: Try residential proxy or different datacenter provider
│ → See: "IP-Based Blocking" row in Troubleshooting Matrix
│
└─► YES: Browser works, scraper fails
│
▼
[3] Is your scraper browserless or browser automation?
│
├─► BROWSERLESS (requests, httpx, aiohttp, etc.)
│ │
│ ▼
│ [4] What HTTP error code is returned?
│ │
│ ├─► 403 on ALL requests immediately
│ │ → Likely: TLS fingerprint or IP blocking
│ │ → Test: Use TLS impersonation library (curl_cffi)
│ │ → If still blocked: IP/ASN issue
│ │
│ ├─► 403 after some successful requests
│ │ → Likely: Header mismatch or rate detection
│ │ → Test: Copy exact browser headers from network tab
│ │ → Validate header order and capitalization
│ │
│ ├─► 429 Too Many Requests
│ │ → Likely: Rate limiting
│ │ → Fix: Reduce threads, add random delays
│ │ → Consider: Session-based IP rotation
│ │
│ └─► Timeout / Connection errors
│ → Likely: Rate limiting or IP ban mid-session
│ → Reduce parallel requests
│ → Rotate IPs more frequently
│
└─► BROWSER AUTOMATION (Puppeteer, Playwright, Selenium)
│
▼
[5] Check browser fingerprint signals
│
├─► navigator.webdriver = true?
│ → Fix: Add --disable-blink-features=AutomationControlled
│
├─► Missing plugins/fonts/WebGL anomalies?
│ → Apply stealth plugin
│ → Test at fingerprint validation sites
│
└─► Fingerprint appears normal but still blocked?
→ Likely: IP/ASN blocking compound issue
→ Test with residential proxy
→ Check for behavioral pattern detection
The flowchart's terminal diagnoses map to remediation paths as follows:
| Diagnosis | Primary Cause | Validation Method | Remediation Path |
|---|---|---|---|
| Blocked before any content loads | ASN/IP range pre-blocking | Check error 1005 or immediate 403 | Switch to residential proxies or different ASN |
| Blocked with library but not browser | TLS fingerprint mismatch | Compare JA3 hash of client vs browser | Use TLS impersonation library or browser automation |
| Works locally, 403 in cloud with same code | Datacenter IP detection | Test same code with residential proxy | Use rotating residential proxies |
| Initial success then failures | Rate limiting or behavioral detection | Monitor error rate over time | Reduce concurrency, add delays, implement session management |
| Geo-specific blocking (error 1009) | Country-based access control | Test with geo-targeted proxy | Use proxy for web scraping with country parameter |
Troubleshooting Matrix: Symptoms, Causes, Validation, and Fixes
This matrix maps observable symptoms to likely causes, provides validation steps to confirm the diagnosis, and offers defensive-only remediation approaches.
Row 1: IP-Based Blocking (Cloudflare 1005, Immediate 403)
| Symptom | Likely Cause | Validation Step | Fix / Mitigation | Escalation Path |
|---|---|---|---|---|
| Cloudflare error 1005 or immediate 403 on first request | ASN/IP range is pre-blocked; datacenter IP recognized | Check if IP belongs to AWS/GCP published ranges; test with manual browser from same IP | Use datacenter proxy from different ASN, or switch to residential proxy | If datacenter proxies from multiple ASNs fail within 50 requests, escalate to residential proxies |
| All requests blocked regardless of headers/timing | Target maintains HostingProviderIPList blocking known hosting ASNs | ASN lookup on proxy IP; compare against AWS/GCP published ranges | Use proxy providers for web scraping with residential or ISP IP ranges | If residential also blocked, target may use advanced fingerprinting |
Key mechanism: Cloud providers like AWS publish their IP subnet lists, and websites proactively block these ranges. Some targets, particularly high-security platforms, block entire subnets rather than single IPs. When facing subnet blocking, use multiple proxy providers and distribute requests across geographic regions.
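To validate the first row's diagnosis, check whether your proxy's exit IP falls inside a cloud provider's published ranges. A minimal Python sketch, assuming AWS's public ip-ranges.json endpoint (GCP and Azure publish comparable lists under their own URLs):
# Minimal sketch: does an exit IP sit inside AWS's published ranges?
# Assumes the public https://ip-ranges.amazonaws.com/ip-ranges.json endpoint.
import ipaddress
import requests

def is_aws_ip(ip: str) -> bool:
    ranges = requests.get('https://ip-ranges.amazonaws.com/ip-ranges.json', timeout=10).json()
    addr = ipaddress.ip_address(ip)
    return any(addr in ipaddress.ip_network(p['ip_prefix']) for p in ranges['prefixes'])

print(is_aws_ip('54.239.28.85'))  # hypothetical exit IP; replace with your proxy's reported IP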
Row 2: TLS Fingerprint Mismatch (Blocked Before Content Loads)
| Symptom | Likely Cause | Validation Step | Fix / Mitigation | Escalation Path |
|---|---|---|---|---|
| 403 immediately with no page content; works with browser from same IP | HTTP library TLS fingerprint differs from browser | Compare JA3 hash of your client to known browser fingerprints | Use TLS impersonation library (curl_cffi for Python, tls-client for Go) | If TLS impersonation fails, switch to full browser automation |
| Request rejected at TLS handshake level | Anti-scraping service maintains JA3/JA4 blacklist | Test with curl-impersonate command-line tool | Browser automation tools (Puppeteer, Playwright) use authentic TLS fingerprints | Consider managed web scraping proxy service with built-in TLS handling |
Technical detail: JA3 fingerprint uses five fields from the TLS Client Hello: TLSVersion, Ciphers, Extensions, EllipticCurves, EllipticCurvePointFormats. These are concatenated and hashed with MD5. JA4 is the successor with improved accuracy for TLS 1.3 and QUIC/HTTP3 traffic. Standard HTTP libraries produce fingerprints that differ from browsers and are catalogued by anti-scraping services.
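To make the hashing step concrete, here is a minimal sketch of how a JA3 string is assembled and hashed; the numeric values are placeholders, not a real browser's fingerprint:
# Minimal JA3 sketch: the five Client Hello fields are comma-joined (values
# within a field dash-joined) and the resulting string is MD5-hashed.
# Field values below are placeholders, not a real browser fingerprint.
import hashlib

tls_version = '771'                      # TLS 1.2 as its decimal code point
ciphers = [4865, 4866, 49195]
extensions = [0, 23, 65281]
curves = [29, 23, 24]
curve_point_formats = [0]

ja3_string = ','.join([
    tls_version,
    '-'.join(map(str, ciphers)),
    '-'.join(map(str, extensions)),
    '-'.join(map(str, curves)),
    '-'.join(map(str, curve_point_formats)),
])
print(ja3_string, hashlib.md5(ja3_string.encode()).hexdigest())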
Row 3: Header/User-Agent Detection (403 with Partial Content)
| Symptom | Likely Cause | Validation Step | Fix / Mitigation | Escalation Path |
|---|---|---|---|---|
| 403 returned with error page content (not blank) | Default library user-agent identifies automation | Check User-Agent header; compare against browser DevTools network tab | Configure spider to send browser-like User-Agent; optimize all request headers | If headers fix doesn't work, IP is likely flagged—need rotating proxy pool |
| Same headers work in one library but fail in another | Framework adds extra headers; header capitalization differs | Compare raw requests between libraries (Scrapy vs requests); check header case | Use raw HTTP client or middleware to control exact header output | Test with proxy web scraping configuration to isolate header issues |
Library-specific issue: Scrapy (built on Twisted) uses a different HTTP client than the requests library (which uses urllib3). This causes different HTTP requests even with the same configured headers. Some websites detect bots by looking for capitalized headers versus lowercase headers from real browsers.
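One way to confirm what each library actually puts on the wire is to echo the received headers back and compare the output across clients. A minimal sketch, assuming the public httpbin.org echo service:
# Minimal sketch: inspect the headers a client really sends by echoing them back.
# Assumes the public httpbin.org service; run the same check with each library.
import requests

browser_like = {
    'User-Agent': 'Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/91.0.4472.124 Safari/537.36',
    'Accept-Language': 'en-US,en;q=0.9',
}
resp = requests.get('https://httpbin.org/headers', headers=browser_like)
print(resp.json()['headers'])  # look for extra headers or changed casing added by the library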
Row 4: Browser Fingerprint Detection (Headless/Automation Signals)
| Symptom | Likely Cause | Validation Step | Fix / Mitigation | Escalation Path |
|---|---|---|---|---|
| Browser automation blocked; manual browser works | navigator.webdriver returns true; headless fingerprint differences | Check navigator.webdriver in console; test at fingerprint validation sites | Use --disable-blink-features=AutomationControlled flag; apply stealth plugin | If stealth patches fail, check for WebGL/canvas fingerprint leaks |
| Blocked despite stealth plugin | Websites gather browser API information: audio/video devices, WebGL renderer | Compare browser fingerprint between headless and headed mode | Use full headed browser with virtual display; rotate fingerprint components | Sophisticated detection uses ML across multiple signals—may need managed service |
Detection mechanism: Headless Chrome differs from standard Chrome in subtle fingerprint ways. Stealth plugins attempt to hide these differences, for example by preventing navigator.webdriver from being set (via the --disable-blink-features=AutomationControlled flag) and by overriding navigator.plugins. However, once a stealth plugin is paired with a realistic user agent, HTTP/TLS fingerprints become less useful for detection, and detection shifts to behavioral signals.
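A minimal sketch of the Row 4 validation step using Playwright's Python API (assumed installed), comparing navigator.webdriver with and without the flag:
# Minimal sketch: compare navigator.webdriver with and without the flag.
# Assumes Playwright for Python is installed (pip install playwright).
from playwright.sync_api import sync_playwright

with sync_playwright() as p:
    for args in ([], ['--disable-blink-features=AutomationControlled']):
        browser = p.chromium.launch(headless=True, args=args)
        page = browser.new_page()
        page.goto('https://example.com')
        # Expect True without the flag, undefined (None) with it
        print(args, page.evaluate('navigator.webdriver'))
        browser.close()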
Row 5: Rate Limiting (429 After N Requests)
| Symptom | Likely Cause | Validation Step | Fix / Mitigation | Escalation Path |
|---|---|---|---|---|
| 429 Too Many Requests after initial success | Request rate exceeds threshold; pattern detected | Monitor error rate over time; check if 429 correlates with request count | Randomize intervals, reduce request frequency, rotate IP addresses, vary user-agent | If rate limiting persists at low volume, behavioral fingerprinting is active |
| Timeout errors after initial success | Rate limiting manifests as timeouts instead of explicit 429 | Check if timeouts correlate with request volume or elapsed time | Reduce parallel threads and add request delays | Implement exponential backoff on failures |
Behavioral signals: Modern anti-bot systems use machine learning to spot automated traffic by examining request patterns, browser fingerprints, and dozens of other signals. Fixed intervals between requests are a strong automation signal.
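Because fixed intervals are such a strong signal, here is a minimal sketch of jittered pacing between requests (the delay bounds are illustrative, not target-specific thresholds):
# Minimal sketch: jittered delays instead of a fixed interval between requests.
# Bounds below are illustrative; tune them against your target's rate limits.
import random
import time

def polite_pause(min_s: float = 2.0, max_s: float = 6.0) -> None:
    time.sleep(random.uniform(min_s, max_s))

for url in ['https://example.com/page/1', 'https://example.com/page/2']:
    # fetch url here, then pause a randomized amount before the next request
    polite_pause()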
Row 6: Session/Cookie Inconsistency (Works Initially Then Fails)
| Symptom | Likely Cause | Validation Step | Fix / Mitigation | Escalation Path |
|---|---|---|---|---|
| Scraper works for first requests then fails | Session state lost between requests; cookies not maintained | Check if session cookies are persisted across requests | Implement session management with consistent cookies per IP | Use rotating proxies for web scraping with session locking |
| Works with one IP, fails when rotating | Same cookie sent from multiple IPs (impossible for real user) | Audit cookie handling across IP rotations | Never send a single cookie from multiple IP addresses; can send multiple cookies from single IP | Use proxy sessions to lock specific IP for consistent use |
Critical rule: Sending the same cookie from multiple IPs never happens for a real user and is an immediate automation signal. Sending multiple cookies from a single IP, however, is normal, since many users share public IPs. Proxy sessions let you lock a specific IP, so you can pair an IP address with human-like cookies and headers.
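A minimal Python sketch of that pairing, assuming a provider that accepts a session id in the proxy username (the session URL format shown in the integration snippets below):
# Minimal sketch: lock one proxy IP per session so cookies never cross IPs.
# Assumes a provider that accepts session-<id> in the proxy username.
import requests

def make_session(session_id: str, password: str) -> requests.Session:
    proxy_url = f'http://session-{session_id}:{password}@proxy.apify.com:8000'
    s = requests.Session()                       # one cookie jar per session
    s.proxies = {'http': proxy_url, 'https': proxy_url}
    s.headers['User-Agent'] = 'YOUR_BROWSER_UA'  # keep one user-agent per session, too
    return s

user1 = make_session('user1', 'p455w0rd')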
Row 7: Geo-Blocking (1009 Region-Based)
| Symptom | Likely Cause | Validation Step | Fix / Mitigation | Escalation Path |
|---|---|---|---|---|
| Error 1009 or region-specific access denied | Target restricts access by country; proxy geo doesn't match | Check proxy IP geolocation against target's expected region | Use proxy for web scraping with country parameter (e.g., country-US) | If geo-targeted proxy still fails, target may use additional validation |
Measurement Plan Template
To diagnose blocking causes and validate fixes, establish baseline metrics and monitoring before deployment. This template provides the fields to track—actual thresholds depend on your target site and should be determined through testing.
Baseline Metrics to Establish
| Metric | Description | Measurement Method | Validation |
|---|---|---|---|
| Response Time (baseline) | Average response time under normal conditions | Log response times for first 100 successful requests | Deviation >2x baseline may indicate throttling |
| Error Rate (baseline) | Percentage of non-2xx responses | Track HTTP status codes over sample period | Spike in 403/429 signals suspicion triggering |
| Data Completeness | Percentage of expected fields retrieved | Compare output against expected schema | Partial data may indicate soft blocking |
| Success Rate per IP | Requests completed before block per IP | Track block events correlated to IP lifecycle | Determine when to rotate IPs |
| Success Rate per Session | Requests completed before block per session | Track block events correlated to session lifecycle | Determine session duration limits |
Monitoring Fields for Production
| Field | What It Indicates | Action Threshold |
|---|---|---|
| 403 Forbidden rate | Header/IP/TLS detection | Investigate at >YOUR_THRESHOLD_PERCENT |
| 429 Too Many Requests rate | Rate limiting triggered | Reduce concurrency or add delays |
| Timeout rate | Soft blocking or rate limiting | Check for pattern correlation with volume |
| Response body anomaly | Content served differs from expected (captcha, empty) | Fingerprint or behavioral detection |
| IP block correlation | Which IPs get blocked and when | Identify problematic ASNs or providers |
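A minimal sketch of collecting these fields in-process; action thresholds stay as placeholders to be derived from your measured baseline:
# Minimal sketch: track the monitoring fields above per status code and proxy IP.
# Action thresholds are not hard-coded; derive them from your own baseline.
from collections import Counter, defaultdict

status_counts = Counter()
blocks_per_ip = defaultdict(int)

def record(ip: str, status: int, body: str) -> None:
    status_counts[status] += 1
    if status in (403, 429) or 'captcha' in body.lower():
        blocks_per_ip[ip] += 1

def error_rate() -> float:
    total = sum(status_counts.values())
    return (status_counts[403] + status_counts[429]) / total if total else 0.0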
Acceptance Criteria Template
Before declaring your proxy web scraping configuration production-ready:
- [ ] Error rate remains below YOUR_ACCEPTABLE_RATE over YOUR_SAMPLE_SIZE requests
- [ ] Same configuration works from at least YOUR_SAMPLE_REGIONS geographic regions
- [ ] Success rate per IP exceeds YOUR_MIN_REQUESTS_PER_IP before rotation needed
- [ ] No 403/429 spikes correlated with deployment timing
- [ ] Data completeness matches local testing baseline
Note: Specific thresholds (percentages, request counts, timeframes) must be determined through testing against your target site. The knowledge base provides the measurement method framework but not universal threshold values.
Integration Snippets: Validated Code Patterns
The following code examples are drawn from authoritative sources and lightly adapted for correctness. Use them as starting points, adapting credentials and endpoints to your environment.
Proxy Configuration with Session Locking
When implementing web scraping with proxy servers, maintaining consistent IP for session-based requests is critical:
http://session-my_session_id_123:p455w0rd@proxy.apify.com:8000
Proxy with Country Parameter for Geo-Targeting
To access region-restricted content, specify country in proxy URL:
http://country-US:p455w0rd@proxy.apify.com:8000
Python Requests with Proxy Server
Basic proxy server for web scraping configuration:
import requests

# Proxy credentials go in the proxy URL itself; passing a plain auth= tuple
# would authenticate against the target site rather than the proxy.
proxy_servers = {
    'http': 'http://auto:p455w0rd@proxy.apify.com:8000',
    'https': 'http://auto:p455w0rd@proxy.apify.com:8000',
}
response = requests.get('https://example.com', proxies=proxy_servers)
Complete Browser Headers Configuration
To avoid header-based detection, include full browser header set:
headers = {
'User-Agent': 'Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/91.0.4472.124 Safari/537.36',
'Accept': 'text/html,application/xhtml+xml,application/xml;q=0.9,image/webp,image/apng,*/*;q=0.8,application/signed-exchange;v=b3;q=0.9',
'Accept-Language': 'en-US,en;q=0.9',
'Accept-Encoding': 'gzip, deflate, br',
'Referer': 'https://www.google.com/',
'Connection': 'keep-alive',
'Upgrade-Insecure-Requests': '1',
'Cache-Control': 'max-age=0',
}
Disabling Automation Detection Flag (Browser Automation)
For Playwright (Puppeteer accepts the same flag through its own launch options), pass the Chromium flag that prevents navigator.webdriver from being set:
const { chromium } = require('playwright');

const browser = await chromium.launch({
  args: ['--disable-blink-features=AutomationControlled']
});
Session Pool for Rotating Proxies
Managing multiple sessions with consistent fingerprints per session:
const { CookieJar } = require('tough-cookie');  // cookie-jar dependency assumed

const user1 = {
sessionId: 'user1',
headers: {
"user-agent": "Mozilla/5.0 (Macintosh; Intel Mac OS X 10_15_7) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/107.0.0.0 Safari/537.36",
},
cookieJar: new CookieJar(),
}
Error Handling with Exponential Backoff
Graceful retry logic for 403 errors:
import time
import requests

url = 'YOUR_TARGET_URL'
for attempt in range(10):
    response = requests.get(url)
    if response.status_code == 403:
        retry_delay = 2 ** attempt          # 1, 2, 4, 8, ... seconds
        print(f'403! Retrying in {retry_delay} seconds...')
        time.sleep(retry_delay)
    else:
        break
TLS Impersonation Template (TEMPLATE)
For sites blocking standard HTTP library TLS fingerprints, use TLS impersonation. The curl_cffi library can simulate browser TLS/JA3 and HTTP/2 fingerprints, unlike standard requests or httpx.
# Standard example (not verbatim from sources)
# curl_cffi wraps curl-impersonate to mimic browser TLS handshakes
from curl_cffi import requests
response = requests.get(
'YOUR_TARGET_URL',
impersonate='YOUR_BROWSER_CHOICE', # e.g., 'chrome110'
proxies={'https': 'YOUR_PROXY_URL'}
)
# Validation Steps:
# 1. Compare response status vs standard requests library
# 2. Verify JA3 fingerprint matches browser at fingerprint testing service
# 3. Confirm response body contains expected content (not block page)
Decision Matrix: Selecting Proxy Type After Diagnosis
Once you have diagnosed the blocking cause, use this matrix to select the appropriate proxy type. This table uses vendor-claimed success rates for reference; actual performance requires testing against your specific target.
| Target Protection Level | Recommended Proxy Type | Expected Success Rate (Vendor Claims) | Cost Model | When to Escalate |
|---|---|---|---|---|
| No anti-bot protection | Datacenter proxy | High (test with 50+ requests to validate) | Per IP or flat-rate bandwidth | If 403s persist after headers fix, check TLS layer |
| Basic protection (rate limits only) | Rotating datacenter proxy + session management | Moderate-High | Per IP | If blocked within 50 requests per IP, escalate to residential |
| Moderate (Cloudflare basic, WAF) | Residential proxy or ISP proxies | Vendor claims: 95-99% on protected sites | Per GB of traffic | If JS challenge persists despite residential IP |
| Aggressive (advanced bot management) | Rotating residential proxies + browser automation | Vendor claims: 40-60% datacenter vs 95-99% residential on protected sites | Per GB + compute costs | Consider managed web scraping proxy service |
Decision rules:
Always try datacenter proxies first: They are faster, more stable, and cheaper than residential proxies. Test with at least 50 requests per IP before concluding they don't work.
Escalate to residential proxies when: Datacenter IPs are blocked regardless of headers/TLS fixes, error 1005 appears, or target maintains aggressive HostingProviderIPList.
When to buy rotating residential proxies: High-security targets (e-commerce, social media, travel aggregators) where datacenter success rate is unacceptable and session management alone doesn't resolve blocking.
When to buy datacenter proxy: Target is low-security, speed is critical, budget is constrained, and testing confirms acceptable success rate. Consider static datacenter proxies for consistent IP assignment.
When rotating proxy for scraping is needed: Rate limiting triggers at volumes that require IP diversity beyond what sticky sessions provide.
Vendor selection considerations: When evaluating the best web scraping proxy for your use case, test against your actual target site rather than relying solely on vendor claims. The right proxy depends on your specific target's protection mechanisms, not generic benchmarks. Explore options at proxy001.com to compare proxy types.
Procurement Due Diligence Checklist (TEMPLATE)
Before purchasing from proxy providers for web scraping, validate these criteria. Fields marked with YOUR_* require input from vendor documentation or testing.
Technical Validation
- [ ] IP Type Verification: Confirm whether IPs are datacenter, residential, or ISP
- [ ] ASN Diversity: Check if provider offers IPs from multiple ASNs (reduces subnet blocking risk)
- [ ] Geographic Coverage: Verify available locations match your target sites' expected regions
- [ ] Session Support: Confirm ability to lock IP for session-based requests (session parameter in proxy URL)
- [ ] Protocol Support: Verify HTTP/HTTPS/SOCKS support as needed
- [ ] Rotation Options: Understand rotation frequency and control (per request, timed, manual)
Operational Validation
- [ ] Request Volume Testing: Test with at least 50 requests per IP against your actual target before committing
- [ ] Error Rate Baseline: Establish baseline error rate before production deployment
- [ ] Support Responsiveness: Test support channel response time for technical issues
- [ ] Dashboard/API Availability: Verify monitoring and usage tracking capabilities
- [ ] Documentation Quality: Review developer documentation for integration clarity
Compliance and Risk
- [ ] Proxy Sourcing Transparency: Understand how residential IPs are sourced (ethical sourcing verification)
- [ ] Terms of Service Review: Confirm permitted use cases align with your application
- [ ] Data Retention Policy: Understand what request logs are retained and for how long
- [ ] Exclusion Risk: For residential proxies, understand if provider's ASN appears on HostingProviderIPLists (some residential ranges advertised from hosting ASNs may be flagged)
Trial and Escalation
- [ ] Trial Availability: Request demo or trial access before commitment
- [ ] Escalation Path: Understand options if current proxy type insufficient (datacenter → ISP → residential)
- [ ] Contract Flexibility: Verify ability to adjust volume or type based on testing results
Risk Boundary Box: Compliance and Defensive Limits
This framework is for diagnosing and resolving legitimate scraping failures. The following boundaries define allowed diagnostic activities versus prohibited actions.
Allowed Activities (Defensive Diagnosis and Reliability Engineering)
- Diagnosing why your scraper fails in cloud environment
- Testing different proxy types (datacenter proxy, residential proxy, rotating residential proxy, rotating datacenter proxy) to find working configuration
- Adjusting headers and TLS settings to match browser behavior for public data access
- Implementing rate limiting and session management to reduce server load
- Using legitimate anti-bot bypass techniques for public data collection where legally permissible
- Validating that your rotating proxies for web scraping configuration maintains ethical request rates
Boundary Conditions (Must Be Maintained)
- Respect robots.txt directives where applicable to your use case
- Do not overload target servers—implement appropriate delays and concurrency limits
- Do not bypass authentication or access control for private data
- Do not use scraped data in violation of terms of service
- Do not scrape personal data without legal basis (GDPR, CCPA, etc.)
Stop Conditions (Immediate Halt Required)
- If receiving legal notices from target site, consult legal counsel immediately before continuing
- If scraping causes measurable target site performance degradation, reduce load or stop
- If data is behind paywall or login, do not circumvent access controls
What This Guide Does NOT Cover
This diagnostic framework does not provide:
- Techniques to bypass CAPTCHAs or challenge pages requiring human verification
- Methods to access authenticated or paywalled content without authorization
- Guidance on scraping personal data or content with specific legal restrictions
- Tools or configurations for attacking or overloading target infrastructure
Pre-Deployment Checklist: Validating Before Cloud Deployment
Before moving your scraper from local development to cloud production, validate each layer. This checklist synthesizes the diagnostic framework into actionable verification steps.
IP and Network Layer
- [ ] Target site does NOT block datacenter IPs (validated with 50+ test requests)
- [ ] Proxy provider ASN is not on known blocklists (check against AWS/GCP published ranges if using datacenter)
- [ ] Geo-location of proxy matches target site's expected region
- [ ] IP rotation configured with appropriate session management (session ID parameter)
- [ ] For residential proxies, sourcing ethics verified with provider
- [ ] For static datacenter proxies, ASN diversity confirmed
TLS Layer
- [ ] Using TLS impersonation library (curl_cffi, tls-client) OR full browser automation
- [ ] JA3/JA4 fingerprint matches target browser (verify at fingerprint testing service)
- [ ] HTTP/2 support enabled if target expects it
- [ ] For browserless scraping, confirm library TLS fingerprint is not blacklisted
HTTP Headers Layer
- [ ] User-Agent matches real browser (not library default like "python-requests/2.x")
- [ ] All standard browser headers included (Accept, Accept-Language, Accept-Encoding, Referer)
- [ ] Header order and capitalization matches browser (some sites detect capitalized headers as bot signal)
- [ ] Referer header set appropriately for navigation context
Browser Automation Layer (If Applicable)
- [ ] navigator.webdriver flag disabled (--disable-blink-features=AutomationControlled)
- [ ] Stealth plugin applied (puppeteer-extra-stealth or equivalent)
- [ ] Plugins/fonts/WebGL fingerprint appears normal (test at fingerprint validation sites)
- [ ] No automation indicators in JavaScript environment (window.chrome, navigator.languages)
Behavioral Patterns Layer
- [ ] Request rate configured below target's rate limit threshold (determined through testing)
- [ ] Random delays between requests (not fixed intervals—fixed timing is automation signal)
- [ ] Session cookies maintained consistently per IP (never send same cookie from multiple IPs)
- [ ] Concurrency level appropriate for target capacity
- [ ] Error handling includes exponential backoff
Monitoring and Observability
- [ ] Baseline metrics established (response time, error rate, data completeness)
- [ ] Alerting configured for error rate spikes (403, 429 thresholds)
- [ ] IP block events logged with correlation to request patterns
- [ ] Fallback path defined (escalation to different proxy type if primary fails)
For production-grade unlimited residential proxies or static residential proxies, ensure your provider supports the session management and geographic targeting your use case requires.