Charles

Posted on Jun 8

How I Built a Real-Time Amazon Price Tracker That Actually Works

#webscraping #amazon #pricemonitoring #xcrawl

Most Amazon price tracking tools break within days. Amazon blocks datacenter IPs, changes HTML structure randomly, and throws CAPTCHAs at anything that looks like a bot. After building dozens of scraping systems for clients, I've learned what actually works.

Here's the architecture I've used for 2+ years with very little downtime.

The Problem with Basic Scrapers

The typical approach: run a Python script via cron job, scrape the price, store it in a CSV. Simple, right? Except:

Amazon detects datacenter IPs → 403 errors within hours
HTML structure changes without notice → your parser breaks silently
Rate limits hit → your IP gets temp-banned for 24 hours
No error handling → you don't even know it's broken until you check manually

The fix isn't a better parser. It's a complete architecture rethink.
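The last failure mode in that list, a tracker that is broken without anyone noticing, can be caught with a small staleness check that runs separately from the scraper. This is a minimal sketch and not part of the original tracker: `isStale` is a hypothetical helper, and the 7-hour threshold is an assumed value (a 6-hour check interval plus a margin).

```javascript
// Hypothetical helper: flags a tracker whose last successful scrape is too old.
// lastSuccessIso: ISO 8601 UTC timestamp of the last successful scrape, or null.
// maxAgeHours: expected check interval plus a safety margin (assumed value).
function isStale(lastSuccessIso, nowMs = Date.now(), maxAgeHours = 7) {
  if (!lastSuccessIso) return true; // never succeeded counts as stale
  const lastMs = Date.parse(lastSuccessIso);
  if (Number.isNaN(lastMs)) return true; // unreadable timestamp counts as stale
  const ageHours = (nowMs - lastMs) / (60 * 60 * 1000);
  return ageHours > maxAgeHours;
}

const now = Date.parse('2024-06-08T12:00:00Z');
console.log(isStale('2024-06-08T08:00:00Z', now)); // false (4 hours old)
console.log(isStale('2024-06-07T12:00:00Z', now)); // true (24 hours old)
```

Running a check like this from a second cron entry means a dead scraper produces a message instead of silence.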

Architecture Overview

┌─────────────┐    ┌───────────────┐    ┌─────────────────┐
│  Scheduler  │───▶│  XCrawl API   │───▶│  Data Store     │
│  (cron/UTC) │    │  (residential │    │  (Airtable/     │
│             │    │   proxies)    │    │   Notion)       │
└─────────────┘    └───────────────┘    └─────────────────┘
                           │
                           ▼
                   ┌───────────────┐
                   │  Alert Logic  │
                   │  (price drop  │───▶ Email/Telegram
                   │   detection)  │
                   └───────────────┘
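To make the flow in the diagram concrete, here is a rough sketch of one scheduled run. It is an illustration under assumptions, not the tracker's actual code: `runCheck` and the injected `fetchMarkdown`, `parse`, `store`, and `notify` functions are hypothetical placeholders, so nothing here depends on a particular scraping API or data store.

```javascript
// Sketch of one scheduled run. Every dependency is injected, so the function
// makes no assumption about the scraping API or the data store.
// fetchMarkdown, parse, store, and notify are hypothetical placeholders.
async function runCheck(url, { fetchMarkdown, parse, store, notify }) {
  const checkedAt = new Date().toISOString(); // UTC timestamp
  let markdown;
  try {
    markdown = await fetchMarkdown(url);
  } catch (err) {
    await notify(`Fetch failed for ${url}: ${err.message}`);
    return { url, checkedAt, ok: false };
  }

  const price = parse(markdown);
  // Store the raw page alongside the parsed price, even when parsing fails
  await store({ url, checkedAt, markdown, price });

  if (price === null) {
    await notify(`Price parse failed for ${url}`);
    return { url, checkedAt, ok: false };
  }
  return { url, checkedAt, ok: true, price };
}

// Usage with in-memory stubs standing in for the real services
const saved = [];
const messages = [];
const deps = {
  fetchMarkdown: async () => 'Price: $129.99',
  parse: (md) => {
    const m = md.match(/\$([\d,]+\.\d{2})/);
    return m ? parseFloat(m[1].replace(/,/g, '')) : null;
  },
  store: async (row) => { saved.push(row); },
  notify: async (msg) => { messages.push(msg); },
};

runCheck('https://example.com/item', deps)
  .then((result) => console.log(result.ok, result.price)); // true 129.99
```

Price-drop detection would slot in after the `store` call, once the history for that URL is available.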

Step 1: Residential Proxies (Non-Negotiable)

Datacenter IPs get flagged quickly. Residential proxies route requests through real home connections, so to Amazon the traffic looks much more like normal shoppers.

With XCrawl, every request automatically rotates through residential proxies across 50+ countries:

const { XCrawlScraper } = require('xcrawl-scraper');

const xcrawl = new XCrawlScraper({
  apiKey: process.env.XCRAWL_API_KEY,
  // Automatically rotates residential proxies
});

async function getPrice(url) {
  const result = await xcrawl.scrapeMarkdown(url, {
    render: true, // Handle JavaScript-rendered pages
  });

  // Extract price from markdown
  const priceMatch = result.data.markdown.match(/\$[\d,]+\.?\d*/);
  return priceMatch ? priceMatch[0] : null;
}

Step 2: Handle Structure Changes Gracefully

Amazon changes HTML constantly. The solution: extract multiple possible formats and validate.

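function parsePrice(markdown, url) {
  // Each pattern captures the numeric part in group 1
  const patterns = [
    /\$([\d,]+(?:\.\d{2})?)/,                // $129.99 or $1,299.99
    /USD\s*([\d,]+(?:\.\d+)?)/i,             // USD 129.99
    /"price"\s*:\s*"?([\d,]+(?:\.\d+)?)/i,   // "price":129.99
    /class="a-price-whole">([\d,]+)/         // HTML structure (whole units only)
  ];

  for (const pattern of patterns) {
    const match = markdown.match(pattern);
    if (!match) continue;
    const price = parseFloat(match[1].replace(/,/g, ''));
    // Validate: skip anything that is not a positive number
    if (Number.isFinite(price) && price > 0) return price;
  }

  // Don't fail silently: alert yourself (sendAlert is your own notification helper)
  sendAlert('Price parse failed for: ' + url);
  return null;
}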
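The validation half of this step can be as simple as a sanity check on the parsed number before it is stored. The sketch below is an assumption for illustration, not the original tracker's code: `isPlausiblePrice` is a hypothetical helper, and the 0.5x to 2x band around the last stored price is an arbitrary default.

```javascript
// Hypothetical sanity check: rejects values that are not usable prices or that
// jump implausibly far from the last stored price (band is an assumed default).
function isPlausiblePrice(candidate, lastKnown = null, minRatio = 0.5, maxRatio = 2) {
  if (!Number.isFinite(candidate) || candidate <= 0) return false;
  if (lastKnown === null) return true; // nothing to compare against yet
  const ratio = candidate / lastKnown;
  return ratio >= minRatio && ratio <= maxRatio;
}

console.log(isPlausiblePrice(129.99, 139.99)); // true
console.log(isPlausiblePrice(NaN, 139.99));    // false
console.log(isPlausiblePrice(12.99, 139.99));  // false: likely matched the wrong number
```

A genuine drop of more than 50% would also fail this check, so rejected values are better routed to a manual review alert than discarded.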

Step 3: Reliable Scheduling

I run price checks every 6 hours (no more often, to avoid detection). Cron uses the server's local timezone by default, so keep the server on UTC to avoid timezone and daylight-saving bugs:

# Every 6 hours, UTC
0 */6 * * * /usr/bin/node /app/price-tracker.js >> /var/log/cron.log 2>&1

Key trick: always store the checked_at timestamp in UTC. This prevents false alerts when the price history is compared across servers or timezones.
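In Node, `Date.prototype.toISOString()` always returns UTC, which makes it a convenient way to fill `checked_at`. A small sketch; `buildRecord` and the field names other than `checked_at` are assumptions for illustration.

```javascript
// Builds one history row with a UTC timestamp. Field names other than
// checked_at are assumptions for illustration.
function buildRecord(url, price, now = new Date()) {
  return {
    url,
    price,
    checked_at: now.toISOString(), // always UTC, e.g. 2024-06-08T06:00:00.000Z
  };
}

// The same instant written with two different offsets yields the same checked_at
const a = buildRecord('https://example.com/item', 129.99, new Date('2024-06-08T06:00:00Z'));
const b = buildRecord('https://example.com/item', 129.99, new Date('2024-06-08T08:00:00+02:00'));
console.log(a.checked_at === b.checked_at); // true
```

ISO 8601 UTC strings also sort chronologically as plain text, which helps in stores that only offer text columns.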

Step 4: Alert Logic

A price drop is only interesting if it's significant. Track the 30-day average and alert only on meaningful changes:

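const CHECKS_PER_DAY = 4; // one history entry per check, one check every 6 hours

function shouldAlert(currentPrice, history) {
  // Need at least a week of data
  if (history.length < 7 * CHECKS_PER_DAY) return false;

  // Last 30 days of checks (fewer if the history is shorter)
  const recent = history.slice(-30 * CHECKS_PER_DAY);
  const avg30d = recent.reduce((sum, entry) => sum + entry.price, 0) / recent.length;
  const change = ((currentPrice - avg30d) / avg30d) * 100;

  return change <= -5; // Only alert on a drop of 5% or more
}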
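A quick worked example of the threshold arithmetic, using made-up prices. `percentChange` is a standalone helper written for this illustration and only mirrors the formula above.

```javascript
// Percent change of the current price relative to the average of past prices
function percentChange(currentPrice, pastPrices) {
  const avg = pastPrices.reduce((sum, p) => sum + p, 0) / pastPrices.length;
  return ((currentPrice - avg) / avg) * 100;
}

const pastPrices = [100, 100, 100, 100]; // average is 100

// 94 against an average of 100 is a 6% drop, which crosses the -5 threshold
console.log(percentChange(94, pastPrices) <= -5); // true

// 96 against an average of 100 is a 4% drop, which does not
console.log(percentChange(96, pastPrices) <= -5); // false
```

Comparing against the average rather than the previous check keeps a brief spike from turning the return to the normal price into a "drop".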

Results After 6 Months

Metric	Value
Uptime	99.4%
False alerts	3 (out of 140+ alerts)
Successful scrapes	2,847
Countries rotated through	23

The Secret Nobody Tells You

The most important part isn't the scraping. It's the data storage. Store the raw markdown (not just parsed prices) so you can re-parse if your extraction logic changes. I learned this the hard way: a major Amazon UI change broke my price parser for 3 weeks. Because the raw data was all still there, I could fix the parser and fill in the missing prices retroactively.
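Re-parsing stored pages could look roughly like this. The row shape (`checked_at`, `markdown`, `price`), the `backfillPrices` helper, and the sample "USD" layout are all assumptions made for the sketch.

```javascript
// Re-runs a (fixed) parser over stored rows and fills in prices that the old
// parser missed. Row shape { checked_at, markdown, price } is an assumption.
function backfillPrices(rows, parse) {
  return rows.map((row) => {
    if (row.price !== null) return row; // already parsed, keep as-is
    return { ...row, price: parse(row.markdown) };
  });
}

// Hypothetical fixed parser that understands a new "Now: USD 119.99" layout
const fixedParser = (markdown) => {
  const match = markdown.match(/USD\s*([\d,]+\.\d{2})/);
  return match ? parseFloat(match[1].replace(/,/g, '')) : null;
};

const rows = [
  { checked_at: '2024-06-01T00:00:00.000Z', markdown: 'Price: $129.99', price: 129.99 },
  { checked_at: '2024-06-01T06:00:00.000Z', markdown: 'Now: USD 119.99', price: null },
];
console.log(backfillPrices(rows, fixedParser).map((r) => r.price)); // [ 129.99, 119.99 ]
```

Rows that already have a price are left untouched, so the backfill can be re-run safely after each parser fix.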

Want the full working code? I open-sourced the tracker on GitHub — link in bio. Questions? Drop them below.

DEV Community
