DEV Community

Алексей Спинов

How to Scrape Amazon Product Data Without Getting Blocked

Amazon is one of the most-requested scraping targets. Here's how to approach it without getting blocked.

The Challenge

Amazon actively blocks scrapers with:

  • IP-based rate limiting
  • CAPTCHAs
  • Request fingerprinting
  • Bot detection scripts

Strategy 1: Product Advertising API (Official)

Amazon has an official API for affiliates. No scraping needed.

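// Amazon Product Advertising API v5: SearchItems request parameters
const params = {
  Keywords: 'wireless headphones',
  SearchIndex: 'Electronics',
  Resources: ['ItemInfo.Title', 'Offers.Listings.Price'],
  // Required by SearchItems; 'yourtag-20' is a placeholder for your own Associates tag
  PartnerTag: 'yourtag-20',
  PartnerType: 'Associates'
};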

Requires an Amazon Associates account (free to create). Requests must be signed with the API credentials issued to that account.
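To make the request and response handling concrete, here is a minimal sketch that builds a SearchItems payload and reads the title and price back out. The helper names (`buildSearchItemsPayload`, `extractItems`), the `yourtag-20` tag, and the sample response are hypothetical; the response field paths reflect my understanding of the PA API v5 format and should be checked against the official docs. Request signing and the HTTP call itself are left out.

```javascript
// Sketch only: helper names, the partner tag, and the sample response are illustrative.
function buildSearchItemsPayload({ keywords, searchIndex, partnerTag }) {
  if (!keywords) throw new Error('keywords is required');
  if (!partnerTag) throw new Error('partnerTag is required');
  return {
    Keywords: keywords,
    SearchIndex: searchIndex || 'All',
    PartnerTag: partnerTag,
    PartnerType: 'Associates',
    Resources: ['ItemInfo.Title', 'Offers.Listings.Price'],
  };
}

// Pull the title and display price out of each item, tolerating missing fields.
// The field paths are an assumption about the response shape.
function extractItems(response) {
  const items = response?.SearchResult?.Items ?? [];
  return items.map((item) => ({
    asin: item.ASIN ?? null,
    title: item.ItemInfo?.Title?.DisplayValue ?? null,
    price: item.Offers?.Listings?.[0]?.Price?.DisplayAmount ?? null,
  }));
}

const payload = buildSearchItemsPayload({
  keywords: 'wireless headphones',
  searchIndex: 'Electronics',
  partnerTag: 'yourtag-20', // placeholder
});

// Hand-written sample, not real API output.
const sample = {
  SearchResult: {
    Items: [
      {
        ASIN: 'EXAMPLE001',
        ItemInfo: { Title: { DisplayValue: 'Example Headphones' } },
        Offers: { Listings: [{ Price: { DisplayAmount: '$59.99' } }] },
      },
      { ASIN: 'EXAMPLE002', ItemInfo: { Title: { DisplayValue: 'No Offer Item' } } },
    ],
  },
};

console.log(payload);
console.log(extractItems(sample));
```

Items without an offer come back with `price: null` instead of throwing, which keeps a batch job from dying on one incomplete listing.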

Strategy 2: API-First Scraping

Amazon product pages can load price and review data through background (AJAX) requests. Where that is the case, intercept the JSON responses instead of parsing HTML.

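// Using Playwright to intercept API calls (ES module, uses top-level await)
import { chromium } from 'playwright';

const productUrl = 'https://www.amazon.com/dp/ASIN'; // placeholder: substitute a real product URL
const browser = await chromium.launch();
const page = await browser.newPage();

const apiData = [];
page.on('response', async (response) => {
  const url = response.url();
  if (url.includes('/api/') || url.includes('data-ajax')) {
    try {
      apiData.push(await response.json());
    } catch (e) {
      // Body was not JSON (HTML fragment, image, redirect): skip it
    }
  }
});

// Wait for background requests to settle so late responses are captured
await page.goto(productUrl, { waitUntil: 'networkidle' });
await browser.close();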
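The response filter can be pulled out into a small helper so it is testable without a browser. This is a rough sketch: `isJsonDataResponse` and `createCollector` are hypothetical names, and the URL hints are just the guesses from the snippet above, not confirmed Amazon endpoints. It checks the `content-type` header before parsing and records parse failures instead of swallowing them.

```javascript
// Hypothetical helper: is this response worth parsing as JSON?
// The default URL hints are guesses, not confirmed endpoints.
function isJsonDataResponse(url, contentType, urlHints = ['/api/', 'data-ajax']) {
  const matchesUrl = urlHints.some((hint) => url.includes(hint));
  const isJson = (contentType || '').toLowerCase().includes('application/json');
  return matchesUrl && isJson;
}

// Returns a handler for page.on('response', handler) plus the arrays it fills.
// Works with any object exposing url(), headers(), and json(), as Playwright responses do.
function createCollector(urlHints) {
  const collected = [];
  const errors = [];

  async function handler(response) {
    const url = response.url();
    const contentType = response.headers()['content-type'];
    if (!isJsonDataResponse(url, contentType, urlHints)) return;
    try {
      collected.push({ url, body: await response.json() });
    } catch (err) {
      // Keep a record of bodies that claimed to be JSON but failed to parse.
      errors.push({ url, message: err.message });
    }
  }

  return { handler, collected, errors };
}

// Usage with Playwright (not run here):
//   const collector = createCollector();
//   page.on('response', collector.handler);
//   await page.goto(productUrl, { waitUntil: 'networkidle' });
//   console.log(collector.collected);
```

Keeping the URL alongside each body makes it easier to work out later which endpoint actually carries the price data.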

Strategy 3: Respectful HTML Scraping

If you must parse HTML:

const delay = (ms) => new Promise(r => setTimeout(r, ms));

async function scrapeProduct(url, retriesLeft = 3) {
  const res = await fetch(url, {
    headers: {
      'User-Agent': 'Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36',
      'Accept-Language': 'en-US,en;q=0.9'
    }
  });

  if (res.status === 503) {
    // Give up after a few attempts instead of retrying forever
    if (retriesLeft === 0) throw new Error(`Still blocked: ${url}`);
    console.log('Blocked! Waiting 30s...');
    await delay(30000);
    return scrapeProduct(url, retriesLeft - 1); // retry
  }

  const html = await res.text();
  // Parse with Cheerio...
  return html;
}

// Always add delays between requests
const productUrls = []; // fill in the product URLs to fetch
for (const url of productUrls) {
  await scrapeProduct(url);
  await delay(3000 + Math.random() * 5000); // 3-8 seconds
}
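A fixed 30-second wait works, but growing the wait after each failed attempt is gentler on the server. Below is one possible sketch using exponential backoff with jitter, which is a swapped-in technique rather than what the snippet above does. `backoffMs` and `fetchWithRetry` are hypothetical names, the timing values are arbitrary, and treating 429 as retryable alongside 503 is my assumption. The fetch and sleep functions are injectable so the logic can be exercised without touching the network.

```javascript
const sleep = (ms) => new Promise((resolve) => setTimeout(resolve, ms));

// Wait time for a given attempt: base * 2^attempt, capped at maxMs,
// then stretched by up to 25% of random jitter.
function backoffMs(attempt, baseMs = 5000, maxMs = 120000, rng = Math.random) {
  const capped = Math.min(maxMs, baseMs * 2 ** attempt);
  return Math.round(capped * (1 + 0.25 * rng()));
}

// Retries on "blocked" statuses up to maxRetries times, then throws.
async function fetchWithRetry(url, options = {}) {
  const {
    fetchImpl = fetch,           // global fetch (Node 18+) unless a stub is passed
    sleepImpl = sleep,
    maxRetries = 4,
    retryStatuses = [429, 503],  // 429 is an assumption; the article only mentions 503
    headers = {},
  } = options;

  for (let attempt = 0; ; attempt++) {
    const res = await fetchImpl(url, { headers });
    if (!retryStatuses.includes(res.status)) return res;
    if (attempt >= maxRetries) {
      throw new Error(`Gave up on ${url} after ${attempt + 1} attempts (status ${res.status})`);
    }
    await sleepImpl(backoffMs(attempt));
  }
}

// Usage (not run here):
//   const res = await fetchWithRetry(url, { headers: { 'Accept-Language': 'en-US,en;q=0.9' } });
//   const html = await res.text();
```

With the defaults, the waits start around 5 seconds and roughly double each time, so a short block costs little while a persistent one ends in an error you can log and skip.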

Key Rules

  1. Use official APIs first: PA API (Product Advertising API) for products, SP API (Selling Partner API) for sellers
  2. Respect rate limits: 3-8 second delays between requests
  3. Rotate user agents: look like different browsers
  4. Don't scrape at scale without proxies: heavy traffic from a single IP runs into the IP-based rate limiting
  5. Cache results: don't re-scrape unchanged pages
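Rules 3 and 5 are easy to sketch in a few lines. The block below is an illustration, not a complete solution: `createTtlCache`, `createUserAgentRotator`, and `cachedScrape` are hypothetical names, and time-based expiry stands in for real change detection (it simply assumes a page is unchanged until the TTL runs out).

```javascript
// In-memory cache whose entries expire after ttlMs. The clock is injectable for testing.
function createTtlCache(ttlMs, now = Date.now) {
  const entries = new Map();
  return {
    get(key) {
      const hit = entries.get(key);
      if (!hit) return undefined;
      if (now() - hit.storedAt >= ttlMs) {
        entries.delete(key); // expired: drop it so the caller re-fetches
        return undefined;
      }
      return hit.value;
    },
    set(key, value) {
      entries.set(key, { value, storedAt: now() });
    },
    size: () => entries.size,
  };
}

// Hands out user agents round-robin from a list you supply.
function createUserAgentRotator(userAgents) {
  if (userAgents.length === 0) throw new Error('need at least one user agent');
  let index = 0;
  return () => userAgents[index++ % userAgents.length];
}

// Only calls scrape(url) when there is no fresh cached copy.
async function cachedScrape(url, cache, scrape) {
  const cached = cache.get(url);
  if (cached !== undefined) return cached;
  const fresh = await scrape(url);
  cache.set(url, fresh);
  return fresh;
}

// Usage (not run here), assuming a scrapeProduct(url) function that returns HTML:
//   const cache = createTtlCache(6 * 60 * 60 * 1000); // six hours, an arbitrary choice
//   const html = await cachedScrape(url, cache, scrapeProduct);
```

An in-memory `Map` disappears when the process exits; for anything long-running, the same interface could sit in front of a file or database.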


Need Amazon data extracted? Product listings, reviews, prices, competitor analysis — $20-50. Email: Spinov001@gmail.com | Hire me
