Why Your Web Scraper Works Locally But Fails in Production (and How to Fix It)

When you build a web scraper locally, it feels invincible. curl works, Playwright works, and your Python script pulls data without a hitch. Then you deploy, and within minutes you see 403s, 429s, Cloudflare challenges, and blank pages.

This is one of the most common debugging traps in web scraping, and the culprit is very often your IP address.

The local vs production gap

On your laptop you have a real residential IP, assigned by your home ISP and sitting in an address range that mostly carries ordinary browsing traffic. Bot detection systems tend to score it as low risk. You look human.

In production you are almost certainly on AWS, GCP, Hetzner, or DigitalOcean. The IP ranges and ASNs of those providers are public knowledge. Bot detection from Cloudflare, Akamai, and PerimeterX treats them as high risk from the first request, often before any other signal gets a chance to help you. Perfectly crafted headers rarely make up for it. The IP gives you away.
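To make the idea concrete, here is a minimal sketch of the kind of range lookup a detection system could start with. It is an illustration only: the CIDR blocks are reserved documentation ranges standing in for real cloud ranges, the name is_datacenter_ip is hypothetical, and real vendors combine ASN databases with many other signals.

```python
import ipaddress

# Placeholder CIDR blocks (RFC 5737 documentation ranges), NOT real cloud ranges.
# A real check would load the ranges that cloud providers publish.
DATACENTER_RANGES = [
    ipaddress.ip_network(cidr)
    for cidr in ("192.0.2.0/24", "198.51.100.0/24")
]


def is_datacenter_ip(ip, ranges=DATACENTER_RANGES):
    """Return True if the address falls inside any known datacenter range."""
    addr = ipaddress.ip_address(ip)
    return any(addr in net for net in ranges)


print(is_datacenter_ip("192.0.2.10"))   # inside a listed range
print(is_datacenter_ip("203.0.113.5"))  # outside every listed range
```

A lookup like this costs almost nothing per request, which is part of why IP reputation tends to be checked so early.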

The fix: residential proxies

Residential proxies route traffic through real ISP-assigned IPs on home broadband connections. Bot detection sees what looks like a request from a real user in a real location, not from a data center.

Key things to evaluate:

Pool size: a larger pool means less IP reuse and a lower chance of hitting burned IPs (addresses the target has already flagged). As a rough guide, I treat anything under 10M as risky for serious work.
ASN type: must be genuinely residential, not datacenter ranges relabeled as "residential."
Sticky sessions: login flows and multi-step scraping need the same IP across requests.
Price: most enterprise providers charge $7-10/GB, which is brutal at scale.

What I use

After testing a few, I settled on v-proxies.com for most projects:

$0.99/GB, versus Bright Data ($8.40), Smartproxy ($7), and Oxylabs ($8+)
84M+ residential IPs across 196+ countries, 5,000+ cities
Sticky sessions up to 60 minutes — encoded directly in the proxy username
Unlimited concurrency — no thread caps
99.97% uptime, avg response under 0.8s
Pay-as-you-go, credits never expire, $5 minimum to start

Proxy format:

# Rotating (new IP per request)
http://username:password@gate.v-proxies.com:9000

# City-targeted
http://username-country-us-city-new_york:password@gate.v-proxies.com:9000

# Sticky 30min session
http://username-session-abc123-sticky-30:password@gate.v-proxies.com:9000
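Assembling these usernames by hand gets error-prone once you mix options, so a small helper can help. This is a sketch based only on the three formats listed above. The function name build_proxy_url and its parameters are hypothetical, and combining geo-targeting with a sticky session in one username is an assumption you should confirm with the provider.

```python
from urllib.parse import quote


def build_proxy_url(username, password, country=None, city=None,
                    session_id=None, sticky_minutes=None,
                    host="gate.v-proxies.com", port=9000):
    """Build a proxy URL with options encoded in the username."""
    parts = [username]
    if country:
        parts += ["country", country]
    if city:
        parts += ["city", city]
    if session_id:
        parts += ["session", session_id]
    if sticky_minutes:
        parts += ["sticky", str(sticky_minutes)]
    user = "-".join(parts)
    # Percent-encode credentials so characters like "@" or ":" cannot break the URL.
    return f"http://{quote(user, safe='')}:{quote(password, safe='')}@{host}:{port}"


print(build_proxy_url("username", "password", country="us", city="new_york"))
print(build_proxy_url("username", "password", session_id="abc123", sticky_minutes=30))
```

Percent-encoding matters more than it looks: a password containing "@" would otherwise be parsed as part of the host.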

Playwright example

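import asyncio

from playwright.async_api import async_playwright


async def scrape(url):
    async with async_playwright() as p:
        # Route all browser traffic through the proxy gateway.
        browser = await p.chromium.launch(
            proxy={
                "server": "http://gate.v-proxies.com:9000",
                "username": "username-country-us",  # targeting is encoded in the username
                "password": "your_password",
            }
        )
        try:
            page = await browser.new_page()
            # "networkidle" waits until network activity has settled.
            await page.goto(url, wait_until="networkidle")
            return await page.content()
        finally:
            # Close the browser even if navigation raises.
            await browser.close()


if __name__ == "__main__":
    html = asyncio.run(scrape("https://target-site.com"))
    print(len(html))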

Python requests example

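import requests

# Both keys point at the same HTTP proxy; HTTPS requests are tunneled through it via CONNECT.
proxies = {
    "http": "http://username-country-us:password@gate.v-proxies.com:9000",
    "https": "http://username-country-us:password@gate.v-proxies.com:9000",
}

# Always set a timeout: requests has none by default and can hang on a dead proxy.
response = requests.get("https://target-site.com", proxies=proxies, timeout=30)
response.raise_for_status()  # surface 403/429 as exceptions instead of silent failures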
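A 200 status does not prove the scrape worked, since challenge pages and blank bodies can come back with a success code. Below is a rough heuristic for flagging the failure modes mentioned at the top of this post. Treat it as a sketch: the name looks_blocked, the marker strings, and the 500-character threshold are illustrative assumptions, and real challenge pages vary by vendor and change over time.

```python
BLOCK_STATUS_CODES = {403, 429}

# Illustrative marker strings only; tune these against the pages you actually see.
CHALLENGE_MARKERS = ("just a moment", "attention required", "captcha")


def looks_blocked(status_code, body, min_body_chars=500):
    """Guess whether a response is a block, a challenge page, or a blank page."""
    if status_code in BLOCK_STATUS_CODES:
        return True
    lowered = body.lower()
    if any(marker in lowered for marker in CHALLENGE_MARKERS):
        return True
    # Suspiciously short bodies usually mean an empty shell rather than real content.
    return len(body.strip()) < min_body_chars
```

With requests you would call it as looks_blocked(response.status_code, response.text) and log or retry when it returns True.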

Rule of thumb

Always test against the real target with production-equivalent proxies before you ship. The moment you introduce different IP types between dev and prod, you are setting yourself up for phantom failures that are miserable to debug.

At $0.99/GB there is no reason not to. A session that fetches 10,000 pages at ~200KB each uses roughly 2GB, which comes to about $2.
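The arithmetic is easy to rerun for your own workload. The sketch below assumes decimal units (1 GB = 1,000,000 KB) and a flat per-GB price with no minimums or overhead; estimate_cost is a hypothetical helper name.

```python
def estimate_cost(pages, avg_page_kb, price_per_gb):
    """Estimate proxy bandwidth cost in dollars for a scraping run."""
    gigabytes = pages * avg_page_kb / 1_000_000  # decimal KB to GB
    return gigabytes * price_per_gb


# The example from this post: 10,000 pages at ~200KB each, at $0.99/GB.
print(f"${estimate_cost(10_000, 200, 0.99):.2f}")  # prints $1.98
```

Keep in mind that a headless browser usually downloads scripts, styles, and images on top of the HTML, so measure your real per-page transfer before trusting the estimate.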

Have you hit this local-vs-production gap? What helped you close it?