I recently worked on a small market analysis project where I needed pricing data from a competitor’s website. Sounds simple — until I realized the site was protected with aggressive bot detection.
Basic requests got blocked instantly.
Headless browsers triggered CAPTCHAs.
And rate limiting kicked in after only a few attempts.
After a lot of testing, I found a setup that worked surprisingly well:
rotating user agents
realistic request headers
slower request timing
and a scraping API that handles proxy rotation + anti-bot protection
Here’s a simplified example:
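```python
import requests
from fake_useragent import UserAgent

# Pick a random real-world User-Agent string for this request
ua = UserAgent()
headers = {
    'User-Agent': ua.random
}

url = 'https://competitor.com/pricing'
# A timeout keeps the script from hanging if the site stalls the connection
response = requests.get(url, headers=headers, timeout=10)

if response.status_code == 200:
    print("Successfully retrieved content")
else:
    print(f"Blocked with status: {response.status_code}")
```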
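The snippet above only covers the user-agent part. As a rough sketch of the other two pieces (realistic headers and slower, less regular timing), something like the following could work. The header values, the 3 to 5 second delay window, and the helper names `build_headers`, `jittered_delay`, and `fetch_all` are all assumptions for illustration, not a tested recipe. `session` is expected to be a `requests.Session()` or anything else with a compatible `.get()` method.

```python
import random
import time

# Illustrative pool only; keep these in line with current browser releases.
USER_AGENTS = [
    "Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 "
    "(KHTML, like Gecko) Chrome/124.0.0.0 Safari/537.36",
    "Mozilla/5.0 (Macintosh; Intel Mac OS X 10_15_7) AppleWebKit/605.1.15 "
    "(KHTML, like Gecko) Version/17.4 Safari/605.1.15",
]


def build_headers(rng=random):
    """Return browser-like headers with a randomly chosen User-Agent."""
    return {
        "User-Agent": rng.choice(USER_AGENTS),
        "Accept": "text/html,application/xhtml+xml,application/xml;q=0.9,*/*;q=0.8",
        "Accept-Language": "en-US,en;q=0.9",
        "Referer": "https://www.google.com/",
    }


def jittered_delay(base=3.0, jitter=2.0, rng=random):
    """Return a wait in seconds, uniformly spread over [base, base + jitter]."""
    return base + rng.uniform(0, jitter)


def fetch_all(session, urls, sleep=time.sleep):
    """Fetch URLs one at a time, pausing a random interval between requests."""
    results = {}
    for i, url in enumerate(urls):
        if i > 0:
            sleep(jittered_delay())  # no pause needed before the first request
        response = session.get(url, headers=build_headers(), timeout=10)
        results[url] = response.status_code
    return results
```

With `requests` installed, `fetch_all(requests.Session(), [url])` would return a dict mapping each URL to its status code. Reusing one session also keeps cookies between requests, which looks more like a real browser than a series of unrelated hits.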
For heavily protected sites, I eventually switched to a dedicated scraping endpoint that handled:
IP rotation
browser fingerprinting
CAPTCHA bypassing
automatic retries
That made the process much more reliable.
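Most of these services follow a similar pattern: you send the target URL and your key to their endpoint, and they fetch the page for you. A minimal sketch of that shape is below. The endpoint `https://api.scraper.example/v1/scrape` and the parameter names `api_key`, `url`, and `render_js` are made up for illustration, so check your provider's docs for the real ones. As before, `session` is assumed to be a `requests.Session()` or compatible object.

```python
from urllib.parse import urlencode

# Hypothetical endpoint and parameter names; substitute your provider's own.
SCRAPE_ENDPOINT = "https://api.scraper.example/v1/scrape"


def build_scrape_url(target_url, api_key, render_js=True):
    """Build the request URL for the (hypothetical) scraping endpoint."""
    params = {
        "api_key": api_key,
        "url": target_url,  # urlencode() percent-encodes this for us
        "render_js": "true" if render_js else "false",
    }
    return f"{SCRAPE_ENDPOINT}?{urlencode(params)}"


def fetch_via_api(session, target_url, api_key):
    """Fetch a page through the scraping endpoint and return its HTML."""
    # Generous timeout: the service may be solving challenges or retrying.
    response = session.get(build_scrape_url(target_url, api_key), timeout=60)
    response.raise_for_status()
    return response.text
```

Since the service does the rotation and retries on its side, the client code stays about as simple as the original `requests.get` call.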
A few things I learned:
user-agent rotation alone isn’t enough anymore
request timing patterns matter
residential proxies perform much better than datacenter proxies
some sites specifically detect headless Chrome behavior
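On the timing point, one common pattern when a site answers with HTTP 429 is to back off exponentially with some randomness, and to honor the `Retry-After` header if the server sends one. This is a generic sketch, not necessarily what my setup above did. The helper names, the 2 second base, and the 60 second cap are assumptions, and `session` is again a `requests.Session()` or compatible object.

```python
import random
import time


def backoff_delay(attempt, retry_after=None, base=2.0, cap=60.0, rng=random):
    """Seconds to wait before retrying after failed attempt `attempt` (0-based)."""
    if retry_after is not None:
        try:
            return min(float(retry_after), cap)
        except ValueError:
            pass  # Retry-After can also be an HTTP date; fall back to backoff
    # "Full jitter": random wait up to an exponentially growing ceiling.
    return rng.uniform(0, min(cap, base * 2 ** attempt))


def get_with_backoff(session, url, max_attempts=4, sleep=time.sleep, **kwargs):
    """GET a URL, waiting and retrying while the server responds with 429."""
    for attempt in range(max_attempts):
        response = session.get(url, **kwargs)
        if response.status_code != 429:
            return response
        if attempt < max_attempts - 1:
            sleep(backoff_delay(attempt, response.headers.get("Retry-After")))
    return response  # still rate limited after the final attempt
```

The jitter matters as much as the growth: evenly spaced retries are themselves a recognizable pattern.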
I’m curious what everyone else is using these days for scraping protected sites.
Still using Selenium/Playwright?
Proxy pools?
Dedicated APIs?
Or custom browser automation setups?