DEV Community

Charles

The Complete Guide to Web Scraping E-Commerce Sites in 2026

E-commerce scraping is one of the most common scraping tasks, and one of the most difficult. Here's the complete playbook.

Why E-Commerce is Hard

  • Anti-bot protection: Amazon, Walmart, and Target all use aggressive bot detection
  • Dynamic content: Product data is often rendered client-side by JavaScript rather than delivered in the initial HTML
  • Rate limits: Sites throttle or block clients that exceed a request threshold
  • Session tracking: Behavioral analysis tracks mouse movements and scroll patterns
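
For the rate-limit point in particular, a common mitigation is to back off and retry when the server signals throttling. The sketch below is a minimal illustration, not a complete anti-blocking solution. `fetchWithBackoff` and its options are hypothetical names, and it assumes throttling shows up as HTTP 429 or 503, which is typical but not universal.

```javascript
// Hypothetical helper: retry a request when the server signals throttling.
// `doRequest` is any async function that resolves to an object with a numeric `status`.
async function fetchWithBackoff(doRequest, options = {}) {
  const { retries = 4, baseDelayMs = 500, sleep = defaultSleep } = options;
  for (let attempt = 0; ; attempt++) {
    const response = await doRequest();
    const throttled = response.status === 429 || response.status === 503;
    // Return on success, on a non-throttling error, or when retries are exhausted.
    if (!throttled || attempt >= retries) return response;
    // Exponential backoff with jitter: wait between 50% and 100% of base * 2^attempt.
    const ceiling = baseDelayMs * 2 ** attempt;
    await sleep(ceiling / 2 + Math.random() * (ceiling / 2));
  }
}

function defaultSleep(ms) {
  return new Promise((resolve) => setTimeout(resolve, ms));
}
```

With Node 18+, it could be called as `fetchWithBackoff(() => fetch(url))`, since the built-in `fetch` response exposes `status`.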

Step-by-Step Strategy

Step 1: Choose Your Approach

Approach         | Best For                    | Difficulty
API              | Simple sites, small scale   | Easy
Headless Browser | JS-rendered, moderate scale | Medium
Scraping API     | Any site, any scale         | Easy (just configure)

Step 2: Handle Product Pages

Key data to extract:

  • Title, price, availability
  • Reviews and ratings
  • Specifications
  • Images (URLs)
  • SKU/ASIN
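
One way to pull several of these fields at once is to parse the schema.org `Product` block that many stores (but not all) embed in the page as JSON-LD. The sketch below assumes such a block is present. `extractProduct` is a hypothetical helper name, and matching `<script>` tags with a regular expression is a simplification of real HTML parsing.

```javascript
// Extract product fields from schema.org JSON-LD embedded in a page's HTML.
// Returns null when no Product block is found.
function extractProduct(html) {
  const scriptRe = /<script[^>]*type=["']application\/ld\+json["'][^>]*>([\s\S]*?)<\/script>/gi;
  for (const match of html.matchAll(scriptRe)) {
    let data;
    try {
      data = JSON.parse(match[1]);
    } catch {
      continue; // skip malformed JSON-LD blocks
    }
    const candidates = Array.isArray(data) ? data : [data];
    const product = candidates.find((item) => item && item['@type'] === 'Product');
    if (!product) continue;

    // `offers` may be a single object or an array; use the first offer if any.
    const offer = [].concat(product.offers ?? [])[0] ?? {};
    const rating = product.aggregateRating;
    return {
      title: product.name ?? null,
      sku: product.sku ?? null,
      price: offer.price != null ? Number(offer.price) : null,
      currency: offer.priceCurrency ?? null,
      inStock: typeof offer.availability === 'string'
        ? offer.availability.endsWith('InStock')
        : null,
      rating: rating ? Number(rating.ratingValue) : null,
      images: [].concat(product.image ?? []), // normalize string or array to array
    };
  }
  return null;
}
```

Specifications and individual reviews usually are not in this block, so they would still need site-specific selectors.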

Step 3: Handle Pagination

Most e-commerce sites paginate. Solutions:

  • URL parameter cycling (?page=1, ?page=2)
  • "Show More" button clicking (requires headless browser)
  • Infinite scroll (requires headless browser)
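
The first option, URL parameter cycling, can be sketched as a loop that stops when a page comes back empty. This is a minimal sketch under stated assumptions: `scrapeAllPages` is a hypothetical name, `fetchPage` is a caller-supplied function that returns the items found at a URL, and the `page` parameter name varies by site.

```javascript
// Walk ?page=1, ?page=2, ... until a page returns no items or maxPages is reached.
// `fetchPage` (hypothetical) takes a URL string and resolves to an array of items.
async function scrapeAllPages(baseUrl, fetchPage, { param = 'page', maxPages = 50 } = {}) {
  const items = [];
  for (let page = 1; page <= maxPages; page++) {
    const url = new URL(baseUrl);
    url.searchParams.set(param, String(page)); // preserves any existing query string
    const pageItems = await fetchPage(url.toString());
    if (pageItems.length === 0) break; // an empty page is treated as the end
    items.push(...pageItems);
  }
  return items;
}
```

The `maxPages` cap guards against sites that keep returning the last page for out-of-range page numbers.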

Step 4: Handle Variants

Products often come in multiple colors, sizes, and models. Each variant typically has its own SKU and often its own URL, so treat each variant as a separate record.
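
A simple way to model this is one flat record per SKU, linked back to the parent product. The sketch below is illustrative only: `flattenVariants` is a hypothetical helper, and the input shape (`title`, `url`, and a `variants` array) is an assumption, not a format any particular site returns.

```javascript
// Turn one parent product plus its variants into one flat record per SKU.
// Assumed input shape: { title, url, variants: [{ sku, color, size, price, url }] }
function flattenVariants(product) {
  const seen = new Set();
  const rows = [];
  for (const variant of product.variants ?? []) {
    if (!variant.sku || seen.has(variant.sku)) continue; // skip blanks and duplicates
    seen.add(variant.sku);
    rows.push({
      parentTitle: product.title,
      sku: variant.sku,
      color: variant.color ?? null,
      size: variant.size ?? null,
      price: variant.price ?? null,
      url: variant.url ?? product.url, // fall back to the parent URL
    });
  }
  return rows;
}
```

Keying on SKU keeps the same variant from being stored twice when it is reachable from several URLs.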

Step 5: Scale

Run 5-10 requests in parallel, rotate proxies, and add random delays between requests.
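
These three ideas can be combined in a small worker pool. The sketch below is a minimal illustration, not a production scheduler. `runPool` and its options are hypothetical names, proxies are rotated round-robin (one of several possible strategies), and `worker` is a caller-supplied function that performs the actual request.

```javascript
// Run `worker(item, proxy)` over `items` with at most `concurrency` in flight,
// pausing a random interval before each call and rotating proxies round-robin.
async function runPool(items, worker, options = {}) {
  const { concurrency = 5, minDelayMs = 200, maxDelayMs = 1000, proxies = [] } = options;
  const results = new Array(items.length);
  let next = 0;

  async function lane() {
    while (next < items.length) {
      const index = next++; // no await between check and increment, so no race
      const delay = minDelayMs + Math.random() * (maxDelayMs - minDelayMs);
      await new Promise((resolve) => setTimeout(resolve, delay));
      const proxy = proxies.length ? proxies[index % proxies.length] : undefined;
      results[index] = await worker(items[index], proxy);
    }
  }

  // Start up to `concurrency` lanes; each pulls the next item when it is free.
  const lanes = Array.from({ length: Math.min(concurrency, items.length) }, () => lane());
  await Promise.all(lanes);
  return results; // results keep the order of `items`
}
```

A rejected `worker` call rejects the whole pool here; a real crawler would more likely catch per-item errors and record them.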

Quick Start with XCrawl

const { XcrawlScraper } = require('xcrawl-scraper');
const client = new XcrawlScraper({ apiKey: 'YOUR_KEY' });

// CommonJS modules do not support top-level await, so wrap the call in an async function.
async function main() {
  const product = await client.scrape({
    url: 'https://amazon.com/dp/EXAMPLE',
    js_render: true,
    proxy: { country: 'US' },
    extraction: {
      mode: 'llm',
      schema: { title: 'string', price: 'string', rating: 'number' }
    }
  });
  console.log(product);
}

main().catch(console.error);

Scrape e-commerce sites reliably: XCrawl API
