Vhub Systems

User Agent Rotation for Web Scraping: What Works, What Doesn't, and What You Actually Need

User-agent rotation is one of the most commonly misunderstood anti-detection techniques. It helps — but it's not magic. Here's what actually works and why.

What is a user agent?

The user agent string is a header sent with every HTTP request that identifies the client software:

Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/122.0.0.0 Safari/537.36

This tells the server:

  • OS: Windows 10, 64-bit
  • Browser engine: WebKit/Blink
  • Browser: Chrome 122
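
For illustration, the same string can be picked apart programmatically. A quick sketch using the standard re module — the regex here is ad hoc for this one string, not a general UA parser:

```python
import re

UA = ('Mozilla/5.0 (Windows NT 10.0; Win64; x64) '
      'AppleWebKit/537.36 (KHTML, like Gecko) '
      'Chrome/122.0.0.0 Safari/537.36')

# Grab the parenthesized platform segment and the Chrome major version
match = re.search(r'\(([^)]*)\).*?Chrome/(\d+)', UA)
platform, chrome_major = match.group(1), int(match.group(2))
print(platform)      # Windows NT 10.0; Win64; x64
print(chrome_major)  # 122
```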

Websites use this to:

  1. Serve device-appropriate content (mobile vs desktop)
  2. Detect bots (unusual or outdated user agents)
  3. Identify scraper frameworks (Python's requests sends python-requests/2.28.0 by default)

The basics: always set a real user agent

import requests

# BAD - sends "python-requests/2.28.0" — immediately flagged
response = requests.get('https://example.com')

# GOOD - looks like a real browser
headers = {
    'User-Agent': 'Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/122.0.0.0 Safari/537.36'
}
response = requests.get('https://example.com', headers=headers)

Simple rotation pool

import requests
import random

USER_AGENTS = [
    # Chrome on Windows (most common)
    'Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/122.0.0.0 Safari/537.36',
    'Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/121.0.0.0 Safari/537.36',
    # Chrome on Mac
    'Mozilla/5.0 (Macintosh; Intel Mac OS X 10_15_7) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/122.0.0.0 Safari/537.36',
    # Firefox on Windows
    'Mozilla/5.0 (Windows NT 10.0; Win64; x64; rv:124.0) Gecko/20100101 Firefox/124.0',
    # Safari on Mac
    'Mozilla/5.0 (Macintosh; Intel Mac OS X 14_3_1) AppleWebKit/605.1.15 (KHTML, like Gecko) Version/17.3.1 Safari/605.1.15',
    # Edge on Windows
    'Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/122.0.0.0 Safari/537.36 Edg/122.0.0.0',
]

def get_random_ua() -> str:
    return random.choice(USER_AGENTS)

def make_request(url: str) -> requests.Response:
    headers = {'User-Agent': get_random_ua()}
    return requests.get(url, headers=headers, timeout=15)

Using fake-useragent library

import requests
from fake_useragent import UserAgent

ua = UserAgent()

def make_request(url: str) -> requests.Response:
    headers = {'User-Agent': ua.random}
    return requests.get(url, headers=headers, timeout=15)

# Specific browser types
chrome_ua = ua.chrome
firefox_ua = ua.firefox
random_ua = ua.random  # draws from the full pool, which occasionally includes mobile UAs

print(ua.chrome)
# e.g. Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/122.0.0.0 Safari/537.36

Install: pip install fake-useragent

Stateful rotation: consistent sessions

Some sites track user-agent consistency across requests. Switching to a different UA on every request while reusing the same cookies looks suspicious. Maintain one UA per session:

import requests
import random

USER_AGENTS = [...]  # the pool defined above

class ScrapingSession:
    def __init__(self):
        self.session = requests.Session()
        # Pick one UA for this session's lifetime
        self.user_agent = random.choice(USER_AGENTS)
        self.session.headers.update({'User-Agent': self.user_agent})

    def get(self, url: str, **kwargs) -> requests.Response:
        return self.session.get(url, **kwargs)

    def rotate_ua(self):
        """Call this when starting a new session/domain"""
        self.user_agent = random.choice(USER_AGENTS)
        self.session.headers.update({'User-Agent': self.user_agent})

# Example usage
session = ScrapingSession()
session.get('https://example.com')       # Request 1 - UA: "Chrome 122"
session.get('https://example.com/page2') # Request 2 - same UA: "Chrome 122"

session.rotate_ua()  # New "browsing session"
session.get('https://example.com/page3') # Request 3 - new UA: "Firefox 124"

What user agent rotation does NOT fix

User agent rotation alone is insufficient against modern bot detection. Sites like Cloudflare, Akamai, and Imperva check:

TLS fingerprint — Python's requests library sends a TLS handshake that looks nothing like Chrome, regardless of the User-Agent header:

# Even with a Chrome UA, this still fails TLS fingerprint check
import requests
headers = {'User-Agent': 'Mozilla/5.0 Chrome/122.0.0.0'}
response = requests.get('https://cloudflare-protected.com', headers=headers)
# Returns 403 despite "correct" user agent

Fix: Use curl_cffi, which replicates Chrome's actual TLS fingerprint:

from curl_cffi import requests as cf_requests

# This passes TLS fingerprint checks
response = cf_requests.get(
    'https://cloudflare-protected.com',
    impersonate='chrome122',  # Chrome TLS fingerprint
)

Browser fingerprint — If you're using Playwright/Puppeteer, they expose automation markers that user agent rotation doesn't hide:

  • navigator.webdriver = true
  • Missing browser plugins
  • Different canvas/WebGL fingerprint
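
As a hedged sketch, the first of those markers can be masked with an init script in Playwright. Assumes `playwright` is installed; `launch_stealth_page` is a hypothetical helper, and this hides only the `navigator.webdriver` flag — dedicated stealth plugins patch many more surfaces:

```python
# Hypothetical helper: mask navigator.webdriver before any page script runs.
# Covers only one marker; canvas/WebGL and plugin checks need more work.
STEALTH_INIT_SCRIPT = """
Object.defineProperty(navigator, 'webdriver', { get: () => undefined });
"""

def launch_stealth_page(playwright):
    """Open a page with the webdriver flag masked.

    `playwright` is the object yielded by sync_playwright().
    """
    browser = playwright.chromium.launch(headless=True)
    context = browser.new_context(
        user_agent=('Mozilla/5.0 (Windows NT 10.0; Win64; x64) '
                    'AppleWebKit/537.36 (KHTML, like Gecko) '
                    'Chrome/122.0.0.0 Safari/537.36')
    )
    # Init scripts are injected before the page's own JavaScript executes
    context.add_init_script(STEALTH_INIT_SCRIPT)
    return context.new_page()
```

Usage would look like `with sync_playwright() as p: page = launch_stealth_page(p)`.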

IP reputation — A consistent user agent from a known datacenter IP still gets flagged.

The right combination for serious anti-detection

For sites with real bot detection:

from curl_cffi import requests as cf_requests
import random

# Residential proxy + Chrome TLS fingerprint + rotating UA
proxies = {
    "https": "http://user:pass@residential-proxy.example.com:8080"
}

chrome_uas = [
    "Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 Chrome/122.0.0.0 Safari/537.36",
    "Mozilla/5.0 (Macintosh; Intel Mac OS X 10_15_7) AppleWebKit/537.36 Chrome/122.0.0.0 Safari/537.36",
]

response = cf_requests.get(
    "https://protected-site.com",
    impersonate="chrome122",    # TLS fingerprint
    headers={"User-Agent": random.choice(chrome_uas)},  # UA rotation
    proxies=proxies             # IP rotation
)

In practice this combination gets past the large majority of anti-bot systems; the toughest tier (JS challenges, behavioral analysis) still requires a real browser with stealth patches.

Quick reference: what each technique fixes

Technique                  Fixes
Set any browser UA         Python requests signature
UA rotation                Pattern-based rate limiting
Session-based UA           Session consistency checks
curl_cffi + impersonate    TLS fingerprint detection
Playwright stealth         Browser automation markers
Residential proxies        IP reputation, rate limits

UA rotation is step one, not step ten. Most sites need the full stack.
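
One way to act on that: only escalate up the stack when a cheaper tier gets blocked. A minimal sketch — `looks_blocked` is a hypothetical helper, and the block markers are illustrative; real signals vary per site:

```python
def looks_blocked(status_code: int, body: str) -> bool:
    """Heuristic: does this response look like an anti-bot block page?"""
    # Status codes commonly returned by WAFs and rate limiters
    if status_code in (403, 429, 503):
        return True
    # Illustrative text markers only; tune these per target site
    markers = ('captcha', 'access denied', 'verify you are human')
    return any(m in body.lower() for m in markers)
```

A scraper would try the cheapest tier first (plain requests with a UA pool) and move down the table — curl_cffi, proxies, then a stealth browser — only when `looks_blocked` returns True.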


Skip the stack entirely

The Apify Scrapers Bundle ($29) includes 35+ actors that handle UA rotation, TLS fingerprinting, and proxy rotation internally. No configuration needed.

n8n AI Automation Pack ($39) — 5 production-ready workflows

Production-Ready Scrapers

For scraping at scale without managing infrastructure.
