When you build a web scraper locally it feels invincible. curl works, Playwright works, your Python script pulls data without a hitch. Then you deploy — and within minutes you see 403s, 429s, Cloudflare challenges, and blank pages.
This is the most common debugging trap in web scraping. The culprit is almost always your IP.
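Before blaming your own code, it helps to confirm that the failures match the symptoms above. The sketch below is a rough heuristic, not a reliable detector: the name `looks_blocked` and the 512-byte "blank page" threshold are assumptions chosen for illustration, and real challenge pages may need site-specific checks.

```python
def looks_blocked(status_code: int, body: str, min_body_bytes: int = 512) -> bool:
    """Heuristically decide whether a response looks like a bot-detection block."""
    # 403 (forbidden) and 429 (rate limited) are the status codes named above.
    if status_code in (403, 429):
        return True
    # A 200 with a near-empty body matches the "blank page" symptom.
    # The 512-byte threshold is an arbitrary illustrative value.
    if status_code == 200 and len(body.encode("utf-8")) < min_body_bytes:
        return True
    return False
```

Logging this flag next to the outgoing IP in both dev and prod makes the pattern visible quickly: same code, different block rate.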
The local vs production gap
On your laptop you have a real residential IP — assigned by your home ISP, with years of browsing history behind it. Bot detection systems score it low risk. You look human.
In production you are almost certainly on AWS, GCP, Hetzner, or DigitalOcean. The IP ranges and ASNs of those providers are public knowledge, and bot-detection vendors such as Cloudflare, Akamai, and PerimeterX typically treat them as high risk before weighing most other signals. Perfectly crafted headers rarely make up for that. The IP gives it away.
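To see how cheap this check is for a detector, here is a minimal sketch of a range lookup with Python's standard `ipaddress` module. The CIDR blocks are reserved documentation ranges standing in for a real provider list, and `is_datacenter_ip` is a hypothetical helper; real systems presumably rely on much larger ASN databases plus reputation data.

```python
import ipaddress

# Placeholder ranges (RFC 5737 documentation blocks), NOT real cloud ranges.
# A real list would be built from the ranges published by hosting providers.
EXAMPLE_DATACENTER_RANGES = ["198.51.100.0/24", "203.0.113.0/24"]


def is_datacenter_ip(ip: str, ranges=EXAMPLE_DATACENTER_RANGES) -> bool:
    """Return True if `ip` falls inside any of the given CIDR ranges."""
    addr = ipaddress.ip_address(ip)
    return any(addr in ipaddress.ip_network(cidr) for cidr in ranges)
```

A lookup like this needs nothing but the source address, which is why it can run before headers, TLS fingerprints, or browser behavior are even considered.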
The fix: residential proxies
Residential proxies route traffic through real ISP-assigned IPs on home broadband connections. Bot detection sees a request from a real user in a real location, not a data center.
Key things to evaluate:
- Pool size: larger = less reuse = less chance of hitting burned IPs. Under 10M is risky for serious work.
- ASN type: must be genuine residential, not repackaged datacenter ranges relabeled "residential."
- Sticky sessions: login flows and multi-step scraping need the same IP across requests.
- Price: most enterprise providers charge $7-10/GB — brutal at scale.
What I use
After testing a few, I settled on v-proxies.com for most projects:
- $0.99/GB — vs Bright Data ($8.40), Smartproxy ($7), Oxylabs ($8+)
- 84M+ residential IPs across 196+ countries, 5,000+ cities
- Sticky sessions up to 60 minutes — encoded directly in the proxy username
- Unlimited concurrency — no thread caps
- 99.97% uptime, avg response under 0.8s
- Pay-as-you-go, credits never expire, $5 minimum to start
Proxy format:
# Rotating (new IP per request)
http://username:password@gate.v-proxies.com:9000
# City-targeted
http://username-country-us-city-new_york:password@gate.v-proxies.com:9000
# Sticky 30min session
http://username-session-abc123-sticky-30:password@gate.v-proxies.com:9000
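Since the targeting options live in the username, a small helper can keep the string-building in one place. This is a sketch based only on the three formats shown above: `build_proxy_url` is a hypothetical name, and the ordering of combined options (country and city before session) and the lowercase, underscore-separated city slug are assumptions to check against the provider's documentation.

```python
from urllib.parse import quote

GATEWAY = "gate.v-proxies.com:9000"  # host:port taken from the formats above


def build_proxy_url(username, password, country=None, city=None,
                    session=None, sticky_minutes=None):
    """Assemble a proxy URL with targeting options encoded in the username."""
    parts = [username]
    if country:
        parts += ["country", country]
    if city:
        # Assumed slug format: lowercase with underscores, e.g. "new_york".
        parts += ["city", city.lower().replace(" ", "_")]
    if session:
        parts += ["session", session]
        if sticky_minutes:
            parts += ["sticky", str(sticky_minutes)]
    user = "-".join(parts)
    # Percent-encode credentials so characters like "@" or ":" cannot break the URL.
    return f"http://{quote(user, safe='')}:{quote(password, safe='')}@{GATEWAY}"
```

For example, `build_proxy_url("username", "password", session="abc123", sticky_minutes=30)` reproduces the sticky-session URL listed above.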
Playwright example
from playwright.async_api import async_playwright

# Run with: asyncio.run(scrape("https://target-site.com"))
async def scrape(url):
    async with async_playwright() as p:
        # Route all browser traffic through the proxy gateway
        browser = await p.chromium.launch(
            proxy={
                "server": "http://gate.v-proxies.com:9000",
                "username": "username-country-us",
                "password": "your_password",
            }
        )
        try:
            page = await browser.new_page()
            await page.goto(url, wait_until="networkidle")
            return await page.content()
        finally:
            # Close the browser even if navigation fails
            await browser.close()
Python requests example
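import requests

# The same HTTP proxy endpoint carries both http:// and https:// targets
proxies = {
    "http": "http://username-country-us:password@gate.v-proxies.com:9000",
    "https": "http://username-country-us:password@gate.v-proxies.com:9000",
}

# A timeout keeps a dead proxy or stalled target from hanging the script
response = requests.get("https://target-site.com", proxies=proxies, timeout=30)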
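One way to keep the proxy setup identical in every environment is to read the proxy URL from configuration and fail loudly when it is missing, so a run never silently falls back to the host's own IP. This is a minimal sketch: the variable name `SCRAPER_PROXY_URL` and the helper `proxies_from_env` are hypothetical.

```python
import os


def proxies_from_env(env=os.environ, var="SCRAPER_PROXY_URL"):
    """Build a requests-style proxies dict from an environment variable."""
    proxy_url = env.get(var)
    if not proxy_url:
        # Failing here is deliberate: an unproxied run would hide the
        # dev/prod difference instead of exposing it.
        raise RuntimeError(f"{var} is not set; refusing to run without a proxy")
    return {"http": proxy_url, "https": proxy_url}
```

The returned dict can be passed straight to `requests.get(..., proxies=proxies_from_env())`, with the same variable set on your laptop, in CI, and on the production host.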
Rule of thumb
Always test against the real target with production-equivalent proxies before you ship. The moment you introduce different IP types between dev and prod, you are setting yourself up for phantom failures that are miserable to debug.
At $0.99/GB there is no reason not to. A session that fetches 10,000 pages at ~200KB each uses roughly 2GB — that is $2.
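The arithmetic is easy to rerun for your own page counts and prices. This sketch assumes decimal units (1 GB = 1,000,000 KB) and counts only response bodies; `estimate_cost_usd` is a hypothetical helper, and real usage will run somewhat higher once headers, retries, and page assets are included.

```python
def estimate_cost_usd(pages: int, kb_per_page: float, usd_per_gb: float) -> float:
    """Estimate bandwidth cost for a scraping run, using decimal GB."""
    gigabytes = pages * kb_per_page / 1_000_000  # KB -> GB (decimal)
    return gigabytes * usd_per_gb


# The example from the text: 10,000 pages at ~200 KB each, $0.99/GB
print(f"${estimate_cost_usd(10_000, 200, 0.99):.2f}")  # prints $1.98
```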
Have you hit this local-vs-production gap? What helped you close it?