Getting blocked is the #1 frustration in web scraping. Here's how to avoid it.
Rule 1: Always Set User-Agent
```javascript
const headers = {
  'User-Agent': 'Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36'
};
```
Without a User-Agent, many sites return 403 immediately.
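A lone User-Agent can still stand out, because real browsers send several other headers with every request. A minimal sketch of a fuller, browser-like header set (the values are illustrative; copy them from a real browser via DevTools):

```javascript
// Illustrative browser-like headers; inspect a real browser's requests
// in DevTools and mirror what it actually sends.
const browserHeaders = {
  'User-Agent': 'Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36',
  'Accept': 'text/html,application/xhtml+xml,application/xml;q=0.9,*/*;q=0.8',
  'Accept-Language': 'en-US,en;q=0.9',
  'Connection': 'keep-alive',
};
```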
Rule 2: Add Random Delays
```javascript
const delay = (ms) => new Promise(r => setTimeout(r, ms));

for (const url of urls) {
  const data = await scrape(url);
  await delay(1000 + Math.random() * 3000); // 1-4s random
}
```
Fixed delays get detected. Random delays look human.
Rule 3: Rotate User-Agents
```javascript
const agents = [
  'Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36',
  'Mozilla/5.0 (Macintosh; Intel Mac OS X 10_15_7) AppleWebKit/537.36',
  'Mozilla/5.0 (X11; Linux x86_64) AppleWebKit/537.36'
];

const randomAgent = agents[Math.floor(Math.random() * agents.length)];
```
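Note that `randomAgent` is picked once, when the script starts, so every request still shares one identity. To rotate per request, wrap the choice in a small helper (a sketch; `nextAgent` is my name for it):

```javascript
const agents = [
  'Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36',
  'Mozilla/5.0 (Macintosh; Intel Mac OS X 10_15_7) AppleWebKit/537.36',
  'Mozilla/5.0 (X11; Linux x86_64) AppleWebKit/537.36'
];

// Pick a fresh agent for every request instead of once at startup
function nextAgent() {
  return agents[Math.floor(Math.random() * agents.length)];
}

// Usage: fetch(url, { headers: { 'User-Agent': nextAgent() } })
```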
Rule 4: Handle Rate Limits Gracefully
```javascript
async function fetchWithRetry(url, maxRetries = 3) {
  for (let i = 0; i < maxRetries; i++) {
    const res = await fetch(url, { headers });
    if (res.status === 429) {
      const wait = Math.pow(2, i) * 5000; // exponential backoff
      await delay(wait);
      continue;
    }
    return res;
  }
  throw new Error('Max retries reached');
}
```
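Some servers also send a `Retry-After` header with the 429, telling you exactly how many seconds to wait. A sketch of a backoff calculator that prefers the server's value over the exponential guess (the function name is mine):

```javascript
// Returns how long to wait (ms) before retry number `attempt` (0-based).
// Prefers the server's Retry-After value (seconds) when it's a number.
function backoffMs(attempt, retryAfter) {
  const secs = Number(retryAfter);
  if (retryAfter != null && !Number.isNaN(secs)) return secs * 1000;
  return Math.pow(2, attempt) * 5000; // 5s, 10s, 20s, ...
}

// Inside fetchWithRetry:
//   const wait = backoffMs(i, res.headers.get('Retry-After'));
```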
Rule 5: Use APIs Instead
The best anti-bot strategy: don't scrape HTML at all. Use the site's JSON API.
Many sites serve their data as JSON from endpoints you can call directly: no blocks, no CAPTCHAs, no HTML parsing.
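The pattern: open DevTools' Network tab, find the XHR/fetch call that feeds the page, and hit that endpoint directly. A sketch, with a made-up endpoint and the fetch function injectable so the helper is easy to test:

```javascript
// Hypothetical endpoint; find the real one in DevTools' Network tab.
async function fetchProducts(page = 1, fetchFn = fetch) {
  const res = await fetchFn(`https://example.com/api/products?page=${page}`, {
    headers: { 'Accept': 'application/json' },
  });
  if (!res.ok) throw new Error(`HTTP ${res.status}`);
  return res.json(); // structured data: no selectors, no brittle parsing
}
```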
When You Actually Need Proxies
- Scraping 10,000+ pages from one site
- Site uses IP-based rate limiting
- Need geographic diversity (prices vary by country)
For most small jobs (under 1,000 pages), delays plus user-agent rotation are enough.
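When proxies are warranted, rotate them per request. A sketch of a round-robin picker (the addresses are placeholders); how you attach the chosen proxy depends on your HTTP client, typically an agent or dispatcher option:

```javascript
// Placeholder addresses: swap in your provider's proxy endpoints.
const proxyPool = [
  'http://proxy-a.example.com:8080',
  'http://proxy-b.example.com:8080',
  'http://proxy-c.example.com:8080',
];

let proxyIndex = 0;

// Round-robin: spreads requests evenly instead of hammering one IP
function nextProxy() {
  return proxyPool[proxyIndex++ % proxyPool.length];
}
```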
More Resources
Getting blocked? I'll handle it. $20-50 depending on complexity. 77 production scrapers. Email: Spinov001@gmail.com | Hire me