guardlabs_team

Originally published at guardlabs.online

How to Stop Selenium Scrapers from Getting Blocked

Websites block Selenium because default configurations broadcast automation signatures (such as the navigator.webdriver flag) and exhibit predictable, non-human behavior. To reduce the chance of a block, modify your browser fingerprint, rotate IP addresses, randomize request timing, and send realistic headers.

1. Mask Browser Fingerprints with Undetected ChromeDriver

Stock ChromeDriver leaves JavaScript variables (like cdc_adoQpoasnfa76pfcZLmcfl_Array) in the page that anti-bot systems check for. A common way to avoid this is the undetected-chromedriver library, which patches the driver binary so these variables no longer appear under their known names.

import undetected_chromedriver as uc

options = uc.ChromeOptions()
# Headful mode is less suspicious. Uncomment the next line only if you must
# run headless ('--headless=new' requires Chrome 109 or later).
# options.add_argument('--headless=new')

driver = uc.Chrome(options=options)
driver.get("https://targetwebsite.com")
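
If you would rather stay on stock Selenium than add a dependency, some of the automation hints can be reduced by hand. The sketch below shows one way this might look, assuming Chrome and the standard `selenium` package. `apply_stealth_settings` and `build_driver` are hypothetical helper names, not part of any library. It only addresses the `navigator.webdriver` flag and the `enable-automation` switch, so it does not replace the `cdc_` patching described above, and these options are meant for plain Selenium rather than `undetected-chromedriver`.

```python
def apply_stealth_settings(options):
    """Apply two common anti-detection settings to a Chrome options object.

    Works with any object exposing Selenium's add_argument and
    add_experimental_option methods (hypothetical helper for illustration).
    """
    # Stops Blink from reporting navigator.webdriver as true
    options.add_argument("--disable-blink-features=AutomationControlled")
    # Drops the enable-automation switch that Selenium passes by default
    options.add_experimental_option("excludeSwitches", ["enable-automation"])
    return options


def build_driver():
    """Start stock Selenium Chrome with the settings above applied."""
    # Imported lazily so apply_stealth_settings can be used on its own
    from selenium import webdriver

    options = apply_stealth_settings(webdriver.ChromeOptions())
    return webdriver.Chrome(options=options)


def webdriver_flag(driver):
    """Return the value the page sees for navigator.webdriver."""
    return driver.execute_script("return navigator.webdriver")
```

After `driver = build_driver()`, calling `webdriver_flag(driver)` is a quick way to confirm whether the flag is still exposed before pointing the scraper at a real target.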

2. Configure Realistic Headers and User-Agents

If your User-Agent string indicates an outdated browser or doesn't match your actual browser engine, you are likely to be flagged. Set a modern User-Agent and ensure your request headers match those of a standard consumer browser.

import undetected_chromedriver as uc

options = uc.ChromeOptions()
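# Use a real, up-to-date User-Agent string. The Chrome version in it should
# match the Chrome build you actually run (122 here is only an example).
user_agent = "Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/122.0.0.0 Safari/537.36"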
options.add_argument(f'--user-agent={user_agent}')

driver = uc.Chrome(options=options)
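
A hard-coded User-Agent drifts out of date as Chrome updates itself, which recreates the mismatch described above. The following sketch shows one way to catch that, assuming a Chrome-style User-Agent string and a version string such as the one Selenium exposes in `driver.capabilities["browserVersion"]`. The function names `ua_chrome_major` and `ua_matches_browser` are hypothetical helpers introduced here for illustration.

```python
import re


def ua_chrome_major(user_agent):
    """Return the Chrome major version advertised in a User-Agent, or None."""
    match = re.search(r"Chrome/(\d+)\.", user_agent)
    return int(match.group(1)) if match else None


def ua_matches_browser(user_agent, browser_version):
    """Check that the UA's Chrome major version equals the real browser's.

    browser_version is a dotted string such as "122.0.6261.95".
    """
    browser_major = int(browser_version.split(".")[0])
    return ua_chrome_major(user_agent) == browser_major


# Possible use with a running driver:
#   actual_ua = driver.execute_script("return navigator.userAgent")
#   if not ua_matches_browser(actual_ua, driver.capabilities["browserVersion"]):
#       print("User-Agent and browser version disagree; update the UA string")
```

Comparing only the major version is a deliberate simplification, since recent Chrome builds report the remaining version fields as zeros in the User-Agent anyway.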


3. Implement Randomized Delays and Human Interactions

Repetitive, rapid actions trigger rate limits. Avoid static wait times (like time.sleep(5)). Instead, use randomized intervals and simulate basic human interactions like scrolling.

import random
import time

from selenium.webdriver.common.by import By
from selenium.webdriver.support import expected_conditions as EC
from selenium.webdriver.support.ui import WebDriverWait

# Assumes `driver` is the browser instance created in the earlier sections

# 1. Randomized sleep intervals
time.sleep(random.uniform(2.0, 6.0))

# 2. Scroll halfway down the page, then pause as a reader would
driver.execute_script("window.scrollTo(0, document.body.scrollHeight * 0.5);")
time.sleep(random.uniform(1.0, 3.0))

# 3. Use explicit waits instead of hardcoded sleeps for elements
element = WebDriverWait(driver, 10).until(
    EC.presence_of_element_located((By.ID, "content-loaded"))
)
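
A single jump to the middle of the page is still a fairly mechanical movement. One possible refinement, sketched below, is to scroll in several uneven steps with a short random pause after each. The helpers `jittered_delay`, `scroll_offsets`, and `scroll_like_a_reader` are hypothetical names, and the step count and pause ranges are arbitrary assumptions rather than tuned values.

```python
import random
import time


def jittered_delay(low=2.0, high=6.0):
    """Return a random pause length in seconds between low and high."""
    return random.uniform(low, high)


def scroll_offsets(total_height, steps=5):
    """Return `steps` increasing scroll positions ending at total_height.

    The intermediate positions are random, so the gaps between them are uneven.
    """
    cuts = sorted(random.uniform(0, total_height) for _ in range(steps - 1))
    return [int(cut) for cut in cuts] + [int(total_height)]


def scroll_like_a_reader(driver, steps=5):
    """Scroll to the bottom of the page in uneven steps with short pauses."""
    height = driver.execute_script("return document.body.scrollHeight")
    for y in scroll_offsets(height, steps):
        driver.execute_script("window.scrollTo(0, arguments[0]);", y)
        time.sleep(jittered_delay(0.4, 1.5))
```

Calling `scroll_like_a_reader(driver)` after a page load replaces the single `scrollTo` call, at the cost of a few extra seconds per page.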


4. Rotate IPs with Proxies

Making hundreds of requests from a single IP address is likely to get that address banned. For production scrapers, route your traffic through a rotating proxy service (preferably residential or mobile proxies).

import undetected_chromedriver as uc

options = uc.ChromeOptions()

# Placeholder address; the format is HOST:PORT or your proxy provider's gateway
PROXY = "192.168.1.100:8080"
options.add_argument(f'--proxy-server={PROXY}')

driver = uc.Chrome(options=options)
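
Because `--proxy-server` is fixed when Chrome starts, rotating through a list of proxies (rather than relying on a provider gateway that rotates for you) generally means starting a fresh browser per proxy. The sketch below illustrates a simple round-robin under that assumption. The addresses are placeholders from a documentation-only IP range, and `proxy_arguments` and `new_driver` are hypothetical helper names.

```python
import itertools

# Placeholder proxies (203.0.113.0/24 is reserved for documentation)
PROXIES = ["203.0.113.10:8080", "203.0.113.11:8080", "203.0.113.12:8080"]


def proxy_arguments(proxies):
    """Yield an endless round-robin of --proxy-server arguments."""
    for proxy in itertools.cycle(proxies):
        yield f"--proxy-server={proxy}"


def new_driver(proxy_argument):
    """Start a browser that routes its traffic through the given proxy."""
    # Imported lazily so the rotation logic can be reused without the library
    import undetected_chromedriver as uc

    options = uc.ChromeOptions()
    options.add_argument(proxy_argument)
    return uc.Chrome(options=options)


# Possible use: one browser session per batch of pages
#   rotation = proxy_arguments(PROXIES)
#   driver = new_driver(next(rotation))
#   ... scrape a batch ...
#   driver.quit()
```

Restarting the browser for every request would be slow, so a batch of pages per proxy is the more practical unit of rotation in this design.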


Note: If your proxy requires username/password authentication, standard Chrome command-line arguments do not support it directly. You will need to use a proxy-auth extension or a tool like Selenium Wire.
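
For the Selenium Wire route, the proxy credentials go into a `seleniumwire_options` dictionary instead of a Chrome argument. The sketch below builds that dictionary, assuming the `selenium-wire` package is installed and that the proxy speaks plain HTTP. The host, port, and credentials are placeholders, and `seleniumwire_proxy_options` and `authenticated_driver` are hypothetical helper names. Credentials containing characters such as `@` or `:` may need escaping, which this sketch does not attempt.

```python
def seleniumwire_proxy_options(username, password, host, port, scheme="http"):
    """Build a seleniumwire_options dict for an authenticated upstream proxy."""
    proxy_url = f"{scheme}://{username}:{password}@{host}:{port}"
    return {
        "proxy": {
            "http": proxy_url,
            "https": proxy_url,
            # Hosts that should bypass the proxy entirely
            "no_proxy": "localhost,127.0.0.1",
        }
    }


def authenticated_driver(username, password, host, port):
    """Start Chrome through Selenium Wire with proxy authentication."""
    # Imported lazily; requires the selenium-wire package
    from seleniumwire import webdriver

    options = seleniumwire_proxy_options(username, password, host, port)
    return webdriver.Chrome(seleniumwire_options=options)


# Possible use (placeholder values):
#   driver = authenticated_driver("user", "secret", "proxy.example.com", 8000)
```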

Summary Checklist for Evading Detection

- **Disable the WebDriver flag:** Use `undetected-chromedriver`, or with stock Selenium manually exclude the `enable-automation` switch and disable the `AutomationControlled` Blink feature.
- **Match your headers:** Ensure your User-Agent matches the browser version you are running.
- **Use Residential Proxies:** Datacenter IPs are easily flagged and blocked by Cloudflare and Akamai.
- **Limit concurrent requests:** Distribute your scraping load over time to mimic organic traffic.
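
The last item on the checklist can be approximated with a small throttle that enforces a minimum, slightly randomized gap between page loads. This is a sketch only: `Throttle` is a hypothetical class, and the 3 second gap and 2 second jitter are arbitrary assumptions that would need adjusting per site.

```python
import random
import time


class Throttle:
    """Enforce a minimum, slightly randomized gap between consecutive calls."""

    def __init__(self, min_gap=3.0, jitter=2.0,
                 clock=time.monotonic, sleep=time.sleep):
        self.min_gap = min_gap
        self.jitter = jitter
        self._clock = clock  # injectable so the class can be tested quickly
        self._sleep = sleep
        self._last = None    # time of the previous call, if any

    def wait(self):
        """Block until enough time has passed since the previous call."""
        if self._last is not None:
            target_gap = self.min_gap + random.uniform(0, self.jitter)
            remaining = target_gap - (self._clock() - self._last)
            if remaining > 0:
                self._sleep(remaining)
        self._last = self._clock()


# Possible use:
#   throttle = Throttle()
#   for url in urls:
#       throttle.wait()
#       driver.get(url)
```

Time already spent parsing a page counts toward the gap, so the throttle only sleeps for whatever remains.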

While these techniques mitigate blocks, highly secured sites using advanced behavioral analysis may still detect automated browsers. No single method guarantees 100% bypass rates indefinitely.
