Amazon is one of the most-requested scraping targets. Here's how to approach it responsibly.
The Challenge
Amazon actively blocks scrapers with:
- IP-based rate limiting
- CAPTCHAs
- Request fingerprinting
- Bot detection scripts
Strategy 1: Product Advertising API (Official)
Amazon has an official API for affiliates. No scraping needed.
// Amazon Product Advertising API v5: search parameters
const params = {
  Keywords: 'wireless headphones',
  SearchIndex: 'Electronics',
  Resources: ['ItemInfo.Title', 'Offers.Listings.Price']
};
Requires an Amazon Associates account (free).
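A fuller SearchItems request body might look like the sketch below. It assumes the PA API 5.0 request fields `PartnerTag`, `PartnerType`, and `Marketplace` alongside the search parameters; `buildSearchItemsPayload` and the `mytag-20` tag are hypothetical names for illustration. Signing and sending the request are left out.

```javascript
// Sketch: assemble a PA API 5.0 SearchItems request body.
// buildSearchItemsPayload is a hypothetical helper, not part of any SDK.
function buildSearchItemsPayload(keywords, partnerTag, marketplace = 'www.amazon.com') {
  if (!keywords || !partnerTag) {
    throw new Error('keywords and partnerTag are required');
  }
  return {
    Keywords: keywords,
    SearchIndex: 'Electronics',
    Resources: ['ItemInfo.Title', 'Offers.Listings.Price'],
    PartnerTag: partnerTag, // your Associates tracking ID (placeholder here)
    PartnerType: 'Associates',
    Marketplace: marketplace
  };
}

const payload = buildSearchItemsPayload('wireless headphones', 'mytag-20');
console.log(JSON.stringify(payload, null, 2));
```

Keeping the payload in one function makes it easy to validate inputs before spending a request on them.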
Strategy 2: API-First Scraping
Amazon product pages make AJAX calls for price/review data. Intercept them instead of parsing HTML.
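// Using Playwright to intercept API calls (ES module, so top-level await works)
import { chromium } from 'playwright';

const browser = await chromium.launch();
const page = await browser.newPage();
const apiData = [];
page.on('response', async (response) => {
  const url = response.url();
  // These URL patterns are illustrative; check the Network tab for the real endpoints
  if (url.includes('/api/') || url.includes('data-ajax')) {
    try {
      const json = await response.json();
      apiData.push(json);
    } catch (e) {
      // Body was not JSON (HTML fragment, image, etc.); skip it
    }
  }
});
// productUrl: the product page you want to load
await page.goto(productUrl, { waitUntil: 'networkidle' });
await browser.close();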
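Rather than relying on `try`/`catch` to weed out non-JSON responses, one option is to check the URL and the `content-type` header before parsing. In this sketch `shouldCapture` is a hypothetical helper, and the URL patterns are the same illustrative guesses as in the snippet above, not confirmed Amazon endpoints. It assumes a headers object with lowercase names, which is what Playwright's `response.headers()` returns.

```javascript
// Sketch: decide whether an intercepted response is worth parsing as JSON.
// shouldCapture is a hypothetical helper; the URL patterns are assumptions.
function shouldCapture(url, headers) {
  const contentType = (headers['content-type'] || '').toLowerCase();
  const isJson = contentType.includes('application/json');
  const matchesPattern = url.includes('/api/') || url.includes('data-ajax');
  return isJson && matchesPattern;
}

console.log(shouldCapture('https://example.com/api/price', {
  'content-type': 'application/json; charset=utf-8'
})); // true
console.log(shouldCapture('https://example.com/api/price', {
  'content-type': 'text/html'
})); // false
```

Inside the response handler, the check would read `if (shouldCapture(response.url(), response.headers())) { ... }`.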
Strategy 3: Respectful HTML Scraping
If you must parse HTML:
const delay = (ms) => new Promise(r => setTimeout(r, ms));

async function scrapeProduct(url, retriesLeft = 3) {
  const res = await fetch(url, {
    headers: {
      'User-Agent': 'Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36',
      'Accept-Language': 'en-US,en;q=0.9'
    }
  });
  if (res.status === 503) {
    // Cap the retries so a persistent block cannot recurse forever
    if (retriesLeft === 0) {
      throw new Error(`Still blocked after retries: ${url}`);
    }
    console.log('Blocked! Waiting 30s...');
    await delay(30000);
    return scrapeProduct(url, retriesLeft - 1); // retry
  }
  const html = await res.text();
  // Parse with Cheerio...
  return html;
}

// Always add delays between requests
// productUrls: the list of product page URLs to fetch
for (const url of productUrls) {
  await scrapeProduct(url);
  await delay(3000 + Math.random() * 5000); // 3-8 seconds
}
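A fixed 30-second wait retries at the same pace no matter how often the block repeats. A common alternative is exponential backoff with jitter. The sketch below is one way to compute the wait; `backoffDelay`, the 30-second base, and the 5-minute cap are assumptions for illustration, not values published by Amazon.

```javascript
// Sketch: exponential backoff with jitter.
// attempt is 0-based; random is injectable so the function can be tested.
function backoffDelay(attempt, baseMs = 30000, capMs = 300000, random = Math.random) {
  // Double the ceiling on every attempt, but never exceed the cap
  const ceiling = Math.min(capMs, baseMs * 2 ** attempt);
  // Pick a wait between half the ceiling and the ceiling
  return Math.floor(ceiling / 2 + random() * (ceiling / 2));
}

for (let attempt = 0; attempt < 5; attempt++) {
  console.log(`attempt ${attempt}: waiting ${backoffDelay(attempt)} ms`);
}
```

In `scrapeProduct`, `await delay(backoffDelay(attempt))` could stand in for the fixed `delay(30000)`, with an attempt counter passed through the retry call.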
Key Rules
- Use official APIs first: the Product Advertising API (PA API) for product data, the Selling Partner API (SP-API) for seller data
- Respect rate limits: wait 3-8 seconds between requests
- Rotate user agents: look like different browsers
- Don't scrape at scale without proxies
- Cache results: don't re-scrape unchanged pages
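Two of these rules, rotating user agents and caching, can be sketched in a few lines. This is a minimal in-memory version under stated assumptions: the user-agent strings are illustrative examples, the helper names (`pickUserAgent`, `getFresh`, `remember`) are hypothetical, and a one-hour TTL is a simple stand-in for actually detecting whether a page changed.

```javascript
// Sketch: rotate user agents and cache fetched pages in memory.
// The UA strings are illustrative examples; keep your own list current.
const USER_AGENTS = [
  'Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/120.0.0.0 Safari/537.36',
  'Mozilla/5.0 (Macintosh; Intel Mac OS X 10_15_7) AppleWebKit/605.1.15 (KHTML, like Gecko) Version/17.0 Safari/605.1.15',
  'Mozilla/5.0 (X11; Linux x86_64; rv:120.0) Gecko/20100101 Firefox/120.0'
];

// random is injectable so the choice can be made deterministic in tests
function pickUserAgent(random = Math.random) {
  return USER_AGENTS[Math.floor(random() * USER_AGENTS.length)];
}

const cache = new Map(); // url -> { html, fetchedAt }

// Return the cached HTML if it is younger than ttlMs, otherwise null
function getFresh(url, ttlMs = 60 * 60 * 1000, now = Date.now()) {
  const hit = cache.get(url);
  return hit && now - hit.fetchedAt < ttlMs ? hit.html : null;
}

// Store a page together with the time it was fetched
function remember(url, html, now = Date.now()) {
  cache.set(url, { html, fetchedAt: now });
}

remember('https://example.com/p/1', '<html>...</html>');
console.log(getFresh('https://example.com/p/1') !== null); // true
console.log(pickUserAgent());
```

A scraper would call `getFresh(url)` before fetching and `remember(url, html)` after, sending `pickUserAgent()` as the `User-Agent` header only on a cache miss.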