DEV Community

Xavier Fok


Proxy Bandwidth Optimization: Cut Costs Without Sacrificing Performance

Residential and mobile proxy bandwidth is expensive, typically $5-50 per GB, so every wasted byte is wasted money. Here are practical techniques to dramatically reduce your proxy bandwidth consumption.

Where Bandwidth Gets Wasted

1. Downloading Full Pages When You Need One Data Point

A typical web page weighs 2-5 MB. If you only need a price or a title, roughly 99% of that bandwidth goes to images, CSS, JavaScript, and ads.

2. Loading Resources You Do Not Need

Images, videos, fonts, tracking scripts, and ads consume the bulk of page weight but are rarely needed for data extraction.

3. Redundant Requests

Scraping the same page multiple times because of poor caching or retry logic.

4. Failed Requests

Requests that fail (403, CAPTCHA, timeout) still consume bandwidth.

Optimization Techniques

1. Block Unnecessary Resources

In Playwright or Puppeteer, intercept and block heavy resources:

from playwright.sync_api import sync_playwright

def create_optimized_page(browser):
    page = browser.new_page()

    # Block images, fonts, stylesheets, media
    page.route("**/*.{png,jpg,jpeg,gif,svg,webp}", lambda route: route.abort())
    page.route("**/*.{woff,woff2,ttf,eot}", lambda route: route.abort())
    page.route("**/*.css", lambda route: route.abort())
    page.route("**/analytics*", lambda route: route.abort())
    page.route("**/tracking*", lambda route: route.abort())
    page.route("**/ads*", lambda route: route.abort())

    return page

This alone can reduce bandwidth by 60-80%.
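URL globs can miss resources served from extension-less or CDN URLs. An alternative is to filter on the resource type Playwright reports for each request, using a single catch-all route. This is a sketch; `BLOCKED_TYPES` and the helper names are illustrative, not part of the Playwright API.

```python
# Resource types (as reported by Playwright) rarely needed for data extraction.
BLOCKED_TYPES = {"image", "media", "font", "stylesheet"}

def should_block(resource_type: str) -> bool:
    """Decide whether a request should be aborted, based on its resource type."""
    return resource_type in BLOCKED_TYPES

def create_type_filtered_page(browser):
    """Like create_optimized_page, but with one catch-all route that
    inspects route.request.resource_type instead of matching URL globs."""
    page = browser.new_page()
    page.route(
        "**/*",
        lambda route: route.abort()
        if should_block(route.request.resource_type)
        else route.continue_(),
    )
    return page
```

The advantage over globs is coverage: an image served from `/cdn/asset?id=123` has no file extension, but Playwright still classifies it as an `image` request.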

2. Use HTTP Instead of Headless Browsers

Headless browsers download and render everything. Direct HTTP requests download only the HTML.

import requests

# Headless browser: Downloads 3-5 MB per page
# Direct HTTP: Downloads 50-200 KB per page

response = requests.get(
    url,
    proxies=proxy,
    headers={"Accept-Encoding": "gzip, deflate, br"},  # Enable compression
    timeout=15
)

Use headless browsers only when JavaScript rendering is required.
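One way to apply that rule automatically is to try plain HTTP first and fall back to a browser only when the raw HTML lacks the content you need. A sketch, assuming a per-site marker string and a `render_with_browser()` helper you would supply:

```python
def needs_browser(html: str, marker: str) -> bool:
    """Heuristic: if the expected marker (e.g. a price span) is missing
    from the raw HTML, the page is probably rendered client-side."""
    return marker not in html

# Usage sketch (assumes `proxy` and a render_with_browser() helper exist):
#
#   resp = requests.get(url, proxies=proxy, timeout=15)
#   if resp.ok and not needs_browser(resp.text, 'class="price"'):
#       html = resp.text                 # cheap path: ~50-200 KB
#   else:
#       html = render_with_browser(url)  # expensive path: 3-5 MB
```

The marker should be something the server-rendered HTML always contains; when a site changes its markup, the heuristic degrades gracefully by falling back to the expensive path.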

3. Enable Compression

Always request compressed responses:

headers = {
    "Accept-Encoding": "gzip, deflate, br",
    # Server will send compressed response
    # requests library decompresses automatically
}

Compression typically reduces HTML payload by 70-80%.
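You can get a feel for the savings with the standard library's gzip module. The synthetic HTML below is more repetitive than a real page, so it compresses better than the 70-80% you should expect in practice:

```python
import gzip

def compression_ratio(payload: bytes) -> float:
    """Fraction of bytes saved by gzip - roughly what
    Accept-Encoding: gzip saves on the wire."""
    return 1 - len(gzip.compress(payload)) / len(payload)

# Markup is highly repetitive, so it compresses very well:
html = b"<div class='item'><span class='price'>9.99</span></div>" * 1000
print(f"{compression_ratio(html):.0%} saved")
```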

4. Use APIs Instead of Scraping

Many sites have APIs (official or unofficial) that return structured JSON — much smaller than full HTML pages.

# Scraping HTML: ~200 KB per product
response = requests.get("https://site.com/product/123")

# Using API: ~2 KB per product (100x smaller)
response = requests.get("https://api.site.com/products/123")

5. Implement Smart Caching

import hashlib
import json
import time

class ProxyCache:
    def __init__(self, cache_ttl=3600):
        self.cache = {}
        self.ttl = cache_ttl

    def get(self, url):
        key = hashlib.md5(url.encode()).hexdigest()
        if key in self.cache:
            entry = self.cache[key]
            if time.time() - entry["timestamp"] < self.ttl:
                return entry["data"]  # Cache hit - zero bandwidth
        return None

    def set(self, url, data):
        key = hashlib.md5(url.encode()).hexdigest()
        self.cache[key] = {
            "data": data,
            "timestamp": time.time()
        }
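A thin wrapper makes the cache transparent to callers and lets you count how often you actually pay for bandwidth. This is a simplified, self-contained variant of the cache above; `fetch` is any callable that performs the real proxied request.

```python
import time

class CachedFetcher:
    """TTL cache in front of any fetch callable: repeated URLs inside
    the TTL window cost zero proxy bandwidth."""

    def __init__(self, fetch, ttl=3600):
        self.fetch = fetch
        self.ttl = ttl
        self.cache = {}          # url -> (timestamp, data)
        self.network_calls = 0   # how often we actually hit the network

    def get(self, url):
        entry = self.cache.get(url)
        if entry and time.time() - entry[0] < self.ttl:
            return entry[1]      # cache hit: no request made
        self.network_calls += 1
        data = self.fetch(url)
        self.cache[url] = (time.time(), data)
        return data
```

Tracking `network_calls` against total `get()` calls gives you a hit rate, which translates directly into the bandwidth (and money) the cache is saving.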

6. Conditional Requests

Use HTTP conditional headers to avoid downloading unchanged content:

# First request
response = requests.get(url, proxies=proxy)
etag = response.headers.get("ETag")
last_modified = response.headers.get("Last-Modified")

# Subsequent requests
headers = {}
if etag:
    headers["If-None-Match"] = etag
if last_modified:
    headers["If-Modified-Since"] = last_modified

response = requests.get(url, proxies=proxy, headers=headers)
if response.status_code == 304:
    # Content unchanged - minimal bandwidth used
    pass
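The validators are only useful if you store them between requests. The sketch below keeps the last body and ETag per URL and reuses the stored body on a 304. Here `fetch` is an injected callable returning `(status, headers, body)`, an assumption made so the revalidation logic stays self-contained; in production you would wrap `requests.get`.

```python
def fetch_with_revalidation(url, fetch, store):
    """On a 304 Not Modified, reuse the stored body: the response costs
    a few hundred bytes of headers instead of the full page."""
    saved = store.get(url)
    headers = {}
    if saved and saved.get("etag"):
        headers["If-None-Match"] = saved["etag"]
    status, resp_headers, body = fetch(url, headers)
    if status == 304 and saved:
        return saved["body"]          # unchanged: minimal bandwidth
    store[url] = {"etag": resp_headers.get("ETag"), "body": body}
    return body
```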

7. Minimize Retry Bandwidth

When retrying failed requests, do not retry immediately on the same proxy:

import requests

def smart_retry(url, proxy_manager, max_retries=3):
    for attempt in range(max_retries):
        proxy = proxy_manager.get_fresh_proxy()  # Different proxy each time
        try:
            response = requests.get(url, proxies=proxy, timeout=10)
            if response.status_code == 200:
                return response
            elif response.status_code in (403, 429):
                proxy_manager.mark_failed(proxy)
                continue  # Try a different proxy
        except requests.Timeout:
            proxy_manager.mark_slow(proxy)
            continue
    return None
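Spacing retries out matters too: bursts of immediate retries burn bandwidth on responses that are likely to fail anyway. A common addition, sketched here, is exponential backoff with jitter between proxy switches:

```python
import random
import time

def backoff_delay(attempt, base=1.0, cap=30.0):
    """Delay in seconds before retry number `attempt` (0-based):
    exponential growth, capped, with jitter so retries do not synchronize."""
    return min(cap, base * 2 ** attempt) * random.uniform(0.5, 1.0)

# Inside the retry loop, before switching to a fresh proxy:
#   time.sleep(backoff_delay(attempt))
```

The jitter factor matters when many workers share a proxy pool: without it, all workers retry at the same instants and hit rate limits together.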

Bandwidth Savings Summary

| Technique | Bandwidth Reduction |
| --- | --- |
| Block images/media | 60-80% |
| HTTP vs headless browser | 90-95% |
| Enable compression | 70-80% |
| Use APIs vs scraping | 95-99% |
| Caching | 100% on cache hits |
| Conditional requests | 95%+ on unchanged content |

Cost Impact Example

Scraping 10,000 pages per day:

| Method | Per-Page Size | Daily Bandwidth | Daily Cost ($10/GB) |
| --- | --- | --- | --- |
| Headless, no optimization | 3 MB | 30 GB | $300 |
| Headless, blocked resources | 500 KB | 5 GB | $50 |
| Direct HTTP, compressed | 50 KB | 500 MB | $5 |
| API requests | 2 KB | 20 MB | $0.20 |

Optimization can reduce your proxy costs by 99%.

For proxy optimization guides and cost-saving strategies, visit DataResearchTools.
