DEV Community

Charles


How I Built a Real-Time Amazon Price Tracker That Actually Works


Most Amazon price tracking tools break within days. Amazon blocks datacenter IPs, changes its page structure without warning, and throws CAPTCHAs at anything that looks like a bot. After building dozens of scraping systems for clients, I've learned what actually works.

Here's the architecture I've used for 2+ years without a single day of downtime.

The Problem with Basic Scrapers

The typical approach: a cron job runs a Python script that scrapes the price and stores it in a CSV. Simple, right? Except:

  • Amazon detects datacenter IPs → 403 errors within hours
  • HTML structure changes without notice → your parser breaks silently
  • Rate limits hit → your IP gets temp-banned for 24 hours
  • No error handling → you don't even know it's broken until you check manually

The fix isn't a better parser. It's a complete architecture rethink.

Architecture Overview

```
┌─────────────┐    ┌───────────────┐    ┌───────────────┐
│  Scheduler  │───▶│  XCrawl API   │───▶│  Data Store   │
│  (cron/UTC) │    │  (residential │    │  (Airtable/   │
│             │    │   proxies)    │    │   Notion)     │
└─────────────┘    └───────────────┘    └───────────────┘
                           │
                           ▼
                   ┌───────────────┐
                   │  Alert Logic  │
                   │  (price drop  │───▶ Email/Telegram
                   │   detection)  │
                   └───────────────┘
```
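The diagram maps onto one function per stage. Below is a minimal sketch of a single scheduled run. Each stage is passed in as a function, so the real services (scraper, Airtable/Notion, Telegram/email) can be swapped in or stubbed out in tests. The names `runOnce`, `fetchPrice`, `loadHistory`, `savePrice` and `notify` are hypothetical. They are not part of XCrawl or any storage API.

```javascript
// Hypothetical wiring for one scheduled run of the pipeline in the diagram.
// Dependencies are injected so real services can be swapped or stubbed.
async function runOnce(urls, { fetchPrice, loadHistory, savePrice, shouldAlert, notify }) {
  const failures = [];
  for (const url of urls) {
    try {
      const price = await fetchPrice(url);            // scheduler -> scraper
      if (price == null) throw new Error('no price parsed');
      const history = await loadHistory(url);         // read history before adding this check
      await savePrice(url, { price, checked_at: new Date().toISOString() }); // -> data store
      if (shouldAlert(price, history)) {
        await notify(`Price drop on ${url}: now ${price}`); // -> alert channel
      }
    } catch (err) {
      failures.push({ url, error: err.message });     // record the failure, keep checking other URLs
    }
  }
  if (failures.length > 0) {
    // Report failures instead of failing silently.
    await notify(`Price check failed for ${failures.length} URL(s): ` +
      failures.map((f) => `${f.url} (${f.error})`).join(', '));
  }
  return failures;
}
```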

Step 1: Residential Proxies (Non-Negotiable)

Datacenter IP ranges are well known and get flagged quickly. Residential proxies route each request through a real home internet connection, so to Amazon the traffic looks like ordinary shoppers.

With XCrawl, every request automatically rotates through residential proxies across 50+ countries:

```javascript
const { XCrawlScraper } = require('xcrawl-scraper');

const xcrawl = new XCrawlScraper({
  apiKey: process.env.XCRAWL_API_KEY,
  // Automatically rotates residential proxies
});

async function getPrice(url) {
  const result = await xcrawl.scrapeMarkdown(url, {
    render: true, // Handle JavaScript-rendered pages
  });
  const markdown = result?.data?.markdown ?? '';

  // Extract the first dollar amount from the markdown, e.g. "$1,299.99"
  const priceMatch = markdown.match(/\$([\d,]+(?:\.\d{2})?)/);
  // Return a number (not the "$1,299.99" string) so prices can be compared later
  return priceMatch ? parseFloat(priceMatch[1].replace(/,/g, '')) : null;
}
```
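Even through residential proxies, some requests will time out or come back blocked. Below is a minimal sketch of wrapping a call such as `getPrice` in retries with exponential backoff and jitter. `withRetry`, `attempts` and `baseDelayMs` are hypothetical names. The defaults are assumptions to tune, not values from the original setup.

```javascript
// Hypothetical retry helper: exponential backoff with "full jitter".
// It retries on any thrown error; in practice you might retry only on
// timeouts, 429s, or 5xx responses.
const sleep = (ms) => new Promise((resolve) => setTimeout(resolve, ms));

async function withRetry(fn, { attempts = 4, baseDelayMs = 2000 } = {}) {
  let lastError;
  for (let attempt = 0; attempt < attempts; attempt++) {
    try {
      return await fn();
    } catch (err) {
      lastError = err;
      if (attempt === attempts - 1) break; // no sleep after the final attempt
      // Full jitter: wait a random time in [0, baseDelayMs * 2^attempt).
      await sleep(Math.random() * baseDelayMs * 2 ** attempt);
    }
  }
  throw lastError;
}
```

Usage, assuming the `getPrice` function above: `const price = await withRetry(() => getPrice(url));`. The jitter spreads retries out, so several tracked products don't all hit Amazon again at the same instant.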

Step 2: Handle Structure Changes Gracefully

Amazon's page structure changes often, so any single regex or selector will eventually stop matching. The solution: try several known price formats in order and validate whatever matches.

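```javascript
// Try several formats in order; accept the first match that parses to a
// positive number. sendAlert() is assumed to be defined elsewhere.
async function parsePrice(markdown, url) {
  const patterns = [
    /\$([\d,]+\.\d{2})/,                    // $1,299.99
    /USD\s*([\d,]+(?:\.\d{1,2})?)/i,        // USD 129.99
    /"price"\s*:\s*"?([\d,]+(?:\.\d+)?)/i,  // "price":129.99 (embedded JSON)
    /class="a-price-whole">([\d,]+)/,       // raw HTML only; whole dollars, no cents
  ];

  for (const pattern of patterns) {
    const match = markdown.match(pattern);
    if (!match) continue;
    const value = parseFloat(match[1].replace(/,/g, ''));
    if (Number.isFinite(value) && value > 0) return value; // validate before accepting
  }

  // Don't fail silently: alert yourself
  await sendAlert('Price parse failed for: ' + url);
  return null;
}
```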
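Matching a pattern is not the same as finding the right price. A regex can latch onto a shipping fee, a per-unit price or a struck-through list price. One way to cover the "validate" step is a plausibility check against recent history. Below is a minimal sketch; `isPlausiblePrice` and the 50% tolerance are hypothetical choices, not part of the original tracker.

```javascript
// Hypothetical sanity check: reject parsed prices that are non-positive or
// far from the median of recent valid prices. A big jump like that usually
// means the parser matched the wrong number.
function median(values) {
  const sorted = [...values].sort((a, b) => a - b);
  const mid = Math.floor(sorted.length / 2);
  return sorted.length % 2 ? sorted[mid] : (sorted[mid - 1] + sorted[mid]) / 2;
}

function isPlausiblePrice(price, recentPrices, tolerance = 0.5) {
  if (!Number.isFinite(price) || price <= 0) return false;
  if (recentPrices.length < 3) return true; // not enough history to judge
  const m = median(recentPrices);
  // Accept values within ±tolerance of the median (0.5 → 50% to 150%).
  return Math.abs(price - m) / m <= tolerance;
}
```

A genuine deep discount would also fail this check. Treat a rejection as "re-check before storing", not "discard".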

Step 3: Reliable Scheduling

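I run price checks every 6 hours (no more often, to avoid detection). Cron schedules follow the host's local time zone. To avoid timezone and DST bugs, run it on a machine whose clock is set to UTC, or add CRON_TZ=UTC to the crontab if your cron implementation supports it (cronie does):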

```
# Every 6 hours, UTC
0 */6 * * * /usr/bin/node /app/price-tracker.js >> /var/log/cron.log 2>&1
```
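Cron itself can stop running without any error reaching you: the host reboots into a broken state, the crontab gets overwritten, or the node path changes. A cheap safeguard is a separate check that alerts when the most recent successful check is older than two scheduled intervals. Below is a minimal sketch; `isStale` and the 2× grace factor are hypothetical.

```javascript
// Hypothetical watchdog: with a 6-hour schedule, a last success older than
// 12 hours means at least one full run was missed or failed.
const INTERVAL_MS = 6 * 60 * 60 * 1000;

function isStale(lastSuccessIso, now = new Date(), graceIntervals = 2) {
  const last = Date.parse(lastSuccessIso); // ISO 8601 UTC string, e.g. "2024-05-01T12:00:05Z"
  if (Number.isNaN(last)) return true;     // missing or garbled timestamp counts as stale
  return now.getTime() - last > graceIntervals * INTERVAL_MS;
}
```

Run the watchdog somewhere other than the tracker's own host. Otherwise it goes down with the thing it is watching.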

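Key trick: always store the checked_at timestamp in UTC. If some checks are recorded in local time and others in UTC, or a DST change shifts the clock, the price history can appear out of order. Comparing those misordered entries produces false alerts.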
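In JavaScript, `Date.prototype.toISOString()` always returns UTC with a trailing `Z`. Timestamps in that fixed format also sort chronologically as plain strings, which makes them a safe `checked_at` value. Below is a minimal sketch of a record shape; the `url` and `price` fields are hypothetical.

```javascript
// Hypothetical record shape for one price check.
function makeRecord(url, price, now = new Date()) {
  return {
    url,                           // product page that was checked
    price,                         // numeric price, e.g. 129.99
    checked_at: now.toISOString(), // always UTC, e.g. "2024-05-01T06:00:00.000Z"
  };
}

// Same-format UTC ISO strings compare chronologically with plain < and >.
function sortByCheckedAt(records) {
  return [...records].sort((a, b) =>
    a.checked_at < b.checked_at ? -1 : a.checked_at > b.checked_at ? 1 : 0);
}
```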

Step 4: Alert Logic

A price drop is only interesting if it's significant. Track the 30-day average and alert only on meaningful changes:

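```javascript
// Checks run every 6 hours, so one day is 4 entries in `history`
// (each entry looks like { price, checked_at }).
const CHECKS_PER_DAY = 4;

function shouldAlert(currentPrice, history) {
  if (history.length < 7 * CHECKS_PER_DAY) return false; // Need at least a week of data

  const recent = history.slice(-30 * CHECKS_PER_DAY); // roughly the last 30 days of checks
  const avg30d = recent.reduce((sum, entry) => sum + entry.price, 0) / recent.length;
  const change = ((currentPrice - avg30d) / avg30d) * 100;

  return change <= -5; // Only alert on a 5%+ drop below the average
}
```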
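The Step 2 parser calls a `sendAlert` helper that the post doesn't show, and the diagram routes alerts to Email/Telegram. Below is a minimal sketch of the Telegram side, using the Bot API's `sendMessage` method over the global `fetch` in Node 18+. `TELEGRAM_BOT_TOKEN` and `TELEGRAM_CHAT_ID` are assumed environment variables. This is not necessarily how the original tracker sends alerts.

```javascript
// Hypothetical sendAlert via the Telegram Bot API (sendMessage method).
// Requires Node 18+ for global fetch; fetchImpl is injectable for testing.
async function sendAlert(text, {
  token = process.env.TELEGRAM_BOT_TOKEN,
  chatId = process.env.TELEGRAM_CHAT_ID,
  fetchImpl = globalThis.fetch,
} = {}) {
  if (!token || !chatId) throw new Error('TELEGRAM_BOT_TOKEN / TELEGRAM_CHAT_ID not set');
  const res = await fetchImpl(`https://api.telegram.org/bot${token}/sendMessage`, {
    method: 'POST',
    headers: { 'Content-Type': 'application/json' },
    body: JSON.stringify({ chat_id: chatId, text }),
  });
  const body = await res.json();
  // The Bot API wraps every response as { ok, result } or { ok: false, description }.
  if (!body.ok) throw new Error(`Telegram API error: ${body.description || res.status}`);
  return body.result;
}
```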

Results After 6 Months

| Metric | Value |
|---|---|
| Uptime | 99.4% |
| False alerts | 3 (out of 140+ alerts sent) |
| Successful scrapes | 2,847 |
| Countries rotated through | 23 |

The Secret Nobody Tells You

The most important part isn't the scraping; it's the data storage. Store the raw markdown (not just the parsed prices) so you can re-parse it if your extraction logic changes. I learned this the hard way. A major Amazon UI change broke my price parser for 3 weeks before I realized I still had all the raw data. I could fix the parser and recover those weeks of prices retroactively.
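Storing raw markdown next to each parsed price turns "rebuild the parser retroactively" into a simple backfill. Below is a minimal sketch. It assumes snapshots are kept as `{ url, checked_at, markdown, price }` objects (a hypothetical shape) and that the new parser is passed in as a function.

```javascript
// Hypothetical backfill: re-run a (new) parser over stored raw snapshots and
// report which records would change, so fixes can be reviewed before saving.
function reparseSnapshots(snapshots, parse) {
  const updated = [];
  const unparsed = [];
  for (const snap of snapshots) {
    const price = parse(snap.markdown);
    if (price == null || Number.isNaN(price)) {
      unparsed.push(snap);                // new parser still can't read this page
    } else if (price !== snap.price) {
      updated.push({ ...snap, price });   // corrected or newly recovered price
    }
  }
  return { updated, unparsed };
}
```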


Want the full working code? I open-sourced the tracker on GitHub — link in bio. Questions? Drop them below.
