Residential and mobile proxy bandwidth is expensive, typically $5-50 per GB, so every wasted byte is wasted money. Here are practical techniques to dramatically reduce your proxy bandwidth consumption.
## Where Bandwidth Gets Wasted

### 1. Downloading Full Pages When You Need One Data Point

A typical web page is 2-5 MB. If you only need a price or a title, you are wasting 99% of the bandwidth on images, CSS, JavaScript, and ads.

### 2. Loading Resources You Do Not Need

Images, videos, fonts, tracking scripts, and ads consume the bulk of page weight but are rarely needed for data extraction.

### 3. Redundant Requests

Scraping the same page multiple times because of poor caching or retry logic.

### 4. Failed Requests

Requests that fail (403, CAPTCHA, timeout) still consume bandwidth before they fail.
## Optimization Techniques

### 1. Block Unnecessary Resources

In Playwright or Puppeteer, intercept requests and abort heavy resources:
```python
from playwright.sync_api import sync_playwright

def create_optimized_page(browser):
    page = browser.new_page()
    # Block images, fonts, stylesheets, media
    page.route("**/*.{png,jpg,jpeg,gif,svg,webp}", lambda route: route.abort())
    page.route("**/*.{woff,woff2,ttf,eot}", lambda route: route.abort())
    page.route("**/*.css", lambda route: route.abort())
    page.route("**/analytics*", lambda route: route.abort())
    page.route("**/tracking*", lambda route: route.abort())
    page.route("**/ads*", lambda route: route.abort())
    return page
```
This alone can reduce bandwidth by 60-80%.
### 2. Use HTTP Instead of Headless Browsers

Headless browsers download and render everything. Direct HTTP requests download only the HTML.
```python
import requests

# Headless browser: downloads 3-5 MB per page
# Direct HTTP: downloads 50-200 KB per page
response = requests.get(
    url,
    proxies=proxy,
    headers={"Accept-Encoding": "gzip, deflate, br"},  # Enable compression
    timeout=15,
)
```
Use headless browsers only when JavaScript rendering is required.
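A practical way to follow that rule is to attempt a plain HTTP fetch first and fall back to a browser only when the data is missing from the raw HTML. A minimal sketch; the function names, stub fetchers, and marker string are illustrative, not a real API:

```python
def fetch_html_cheaply(url, fetch_http, fetch_browser, marker):
    """Try a cheap plain-HTTP fetch first; fall back to a headless
    browser only if `marker` (a string the target data always
    contains, e.g. a CSS class) is missing from the raw HTML."""
    html = fetch_http(url)
    if marker in html:
        return html                # cheap path: ~50-200 KB
    return fetch_browser(url)      # expensive path: several MB

# Stub fetchers to show the control flow without any network I/O
static_page = lambda url: '<span class="price">$9.99</span>'
empty_shell = lambda url: '<div id="app"></div>'   # JS-rendered shell
rendered = lambda url: '<span class="price">$9.99</span>'
```

With `static_page` the cheap path wins; with `empty_shell` the marker is absent and the browser fetcher runs instead.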
### 3. Enable Compression

Always request compressed responses:
```python
headers = {
    # The server sends a compressed response; the requests library
    # decompresses gzip/deflate automatically (br needs the brotli package)
    "Accept-Encoding": "gzip, deflate, br",
}
```
Compression typically reduces HTML payload by 70-80%.
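You can see the effect locally with the standard library. This gzip round-trip on an invented, highly repetitive HTML sample illustrates the mechanism (such a sample compresses even harder than a typical real page, which lands closer to the 70-80% figure):

```python
import gzip

# A crude stand-in for real HTML: markup is repetitive,
# which is exactly why it compresses so well
html = ('<div class="product"><span class="price">$19.99</span>'
        '<span class="title">Example item</span></div>\n' * 500).encode()

compressed = gzip.compress(html)
ratio = 1 - len(compressed) / len(html)
print(f"{len(html)} B -> {len(compressed)} B ({ratio:.0%} smaller)")
```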
### 4. Use APIs Instead of Scraping

Many sites have APIs (official or unofficial) that return structured JSON, which is much smaller than a full HTML page.
```python
# Scraping HTML: ~200 KB per product
response = requests.get("https://site.com/product/123")

# Using API: ~2 KB per product (100x smaller)
response = requests.get("https://api.site.com/products/123")
```
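The gap exists because an API returns only the fields, while the HTML route wraps the same data in scripts, styles, and ads. A toy comparison with invented payloads:

```python
import json

# What an API returns: just the data
api_payload = json.dumps({"id": 123, "title": "Example item", "price": 19.99})

# What the HTML route returns: the same data buried in page chrome
html_payload = (
    "<html><head>"
    + "<script src='bundle.js'></script>" * 20
    + "<link rel='stylesheet' href='site.css'>" * 10
    + "</head><body><div class='product' data-id='123'>"
      "<h1>Example item</h1><span class='price'>$19.99</span></div>"
    + "<div class='ad'></div>" * 50
    + "</body></html>"
)

print(f"API: {len(api_payload)} B vs HTML: {len(html_payload)} B")
```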
### 5. Implement Smart Caching

Never download the same page twice within its freshness window:
```python
import hashlib
import time

class ProxyCache:
    def __init__(self, cache_ttl=3600):
        self.cache = {}
        self.ttl = cache_ttl

    def get(self, url):
        key = hashlib.md5(url.encode()).hexdigest()
        if key in self.cache:
            entry = self.cache[key]
            if time.time() - entry["timestamp"] < self.ttl:
                return entry["data"]  # Cache hit - zero bandwidth
        return None

    def set(self, url, data):
        key = hashlib.md5(url.encode()).hexdigest()
        self.cache[key] = {
            "data": data,
            "timestamp": time.time(),
        }
```
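Wiring such a cache into the fetch path looks roughly like this; `fetch_fn` is a stand-in for the real proxied `requests.get` call so the flow can be shown (and tested) without any network:

```python
import hashlib
import time

def cached_fetch(url, cache, fetch_fn, ttl=3600):
    """Serve from cache while fresh; otherwise fetch once and store."""
    key = hashlib.md5(url.encode()).hexdigest()
    entry = cache.get(key)
    if entry and time.time() - entry["timestamp"] < ttl:
        return entry["data"]              # cache hit: zero proxy bandwidth
    data = fetch_fn(url)                  # cache miss: one proxied request
    cache[key] = {"data": data, "timestamp": time.time()}
    return data

calls = []
def fake_fetch(url):
    calls.append(url)
    return f"body of {url}"

cache = {}
cached_fetch("https://example.com/a", cache, fake_fetch)
cached_fetch("https://example.com/a", cache, fake_fetch)  # no second fetch
```

After the two calls, `fake_fetch` has run only once; the second lookup was served from the cache.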
### 6. Conditional Requests

Use HTTP conditional headers to avoid downloading unchanged content:
```python
# First request
response = requests.get(url, proxies=proxy)
etag = response.headers.get("ETag")
last_modified = response.headers.get("Last-Modified")

# Subsequent requests
headers = {}
if etag:
    headers["If-None-Match"] = etag
if last_modified:
    headers["If-Modified-Since"] = last_modified

response = requests.get(url, proxies=proxy, headers=headers)
if response.status_code == 304:
    # Content unchanged - minimal bandwidth used
    pass
```
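When polling many URLs, it helps to keep the validators in one place. A small helper pair, same idea as above, with hypothetical names:

```python
def remember_validators(validators, url, response_headers):
    """Store a response's ETag / Last-Modified for the next poll."""
    validators[url] = {
        "etag": response_headers.get("ETag"),
        "last_modified": response_headers.get("Last-Modified"),
    }

def conditional_headers(validators, url):
    """Build If-None-Match / If-Modified-Since from stored validators."""
    saved = validators.get(url, {})
    headers = {}
    if saved.get("etag"):
        headers["If-None-Match"] = saved["etag"]
    if saved.get("last_modified"):
        headers["If-Modified-Since"] = saved["last_modified"]
    return headers

validators = {}
remember_validators(validators, "https://example.com", {"ETag": '"abc123"'})
headers = conditional_headers(validators, "https://example.com")
# headers == {"If-None-Match": '"abc123"'}
```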
### 7. Minimize Retry Bandwidth

When retrying failed requests, do not retry immediately on the same proxy:
```python
import requests

def smart_retry(url, proxy_manager, max_retries=3):
    for attempt in range(max_retries):
        proxy = proxy_manager.get_fresh_proxy()  # Different proxy each time
        try:
            response = requests.get(url, proxies=proxy, timeout=10)
            if response.status_code == 200:
                return response
            elif response.status_code in (403, 429):
                proxy_manager.mark_failed(proxy)
                continue  # Try a different proxy
        except requests.Timeout:
            proxy_manager.mark_slow(proxy)
            continue
    return None
```
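One refinement worth adding: the loop above retries immediately, but spacing attempts out with exponential backoff avoids burning bandwidth against a target that is actively rate-limiting you. The delay schedule is a pure function (parameters are illustrative; add jitter in production):

```python
def backoff_delays(max_retries, base=1.0, cap=30.0):
    """Exponential backoff schedule: base * 2^attempt, capped."""
    return [min(base * 2 ** attempt, cap) for attempt in range(max_retries)]

backoff_delays(6)  # [1.0, 2.0, 4.0, 8.0, 16.0, 30.0]
```

Sleep for `delays[attempt]` seconds (e.g. with `time.sleep`) before each retry.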
## Bandwidth Savings Summary
| Technique | Bandwidth Reduction |
|---|---|
| Block images/media | 60-80% |
| HTTP vs headless browser | 90-95% |
| Enable compression | 70-80% |
| Use APIs vs scraping | 95-99% |
| Caching | 100% on cache hits |
| Conditional requests | 95%+ on unchanged content |
## Cost Impact Example

Scraping 10,000 pages per day:

| Method | Per-Page Size | Daily Bandwidth | Daily Cost ($10/GB) |
|---|---|---|---|
| Headless, no optimization | 3 MB | 30 GB | $300 |
| Headless, blocked resources | 500 KB | 5 GB | $50 |
| Direct HTTP, compressed | 50 KB | 500 MB | $5 |
| API requests | 2 KB | 20 MB | $0.20 |
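The figures in the table follow directly from pages x per-page size x price per GB; a quick sanity check of the first and last rows:

```python
def daily_cost(pages_per_day, page_bytes, usd_per_gb=10.0):
    """Return (GB transferred per day, cost per day in USD)."""
    gb = pages_per_day * page_bytes / 1e9
    return gb, gb * usd_per_gb

MB, KB = 1_000_000, 1_000

gb, cost = daily_cost(10_000, 3 * MB)   # headless, no optimization
print(f"{gb:g} GB/day -> ${cost:.2f}/day")

gb, cost = daily_cost(10_000, 2 * KB)   # API requests
print(f"{gb:g} GB/day -> ${cost:.2f}/day")
```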
Combined, these optimizations can cut your proxy costs by more than 99%.
For proxy optimization guides and cost-saving strategies, visit DataResearchTools.