Jerry A. Henley
Automate Competitor Price Tracking: Turn One-Off Scrapes into Weekly Audits

Manual competitor analysis is a silent productivity killer. For growth and revenue teams, keeping tabs on a competitor like Zappos often involves an exhausting cycle of spot-checking top-selling items or waiting weeks for a busy engineering team to prioritize a custom scraper. By the time you get the data, the sale is over or the stock has shifted.

The problem with spot-checking is that it only captures a moment, not a trend. To truly understand a competitor's strategy, you need to monitor entire categories—like "Nike Running Shoes"—to see how prices fluctuate and when items go out of stock.

This guide shows you how to build a repeatable Zappos price monitor using Node.js and Playwright. We’ll use production-ready patterns from the ScrapeOps Zappos Scraper Bank to extract entire category listings and create a "Diff" engine that alerts you to price drops and new arrivals.

The Strategy: From Spot-Check to Audit

Most developers start by scraping a single product page. While useful, the real business value lies in Category Snapshotting.

Instead of looking at one shoe, we scrape the entire "Men's Running" category on Day 1. We then repeat the process on Day 7. By comparing these two snapshots, we can calculate the delta (the change) between the datasets. This allows us to identify:

  • Price Drops: Items where the current price is lower than last week.
  • Stock-outs: Items that were present last week but are missing now.
  • New Arrivals: Fresh inventory that didn't exist in the previous snapshot.

Prerequisites

To follow this tutorial, you'll need:

  • Node.js installed on your machine.
  • A text editor like VS Code.
  • A ScrapeOps API Key for proxy rotation (a free key is available on the ScrapeOps site).
  • Basic familiarity with the terminal.

Setup: Configuring the Category Scraper

We’ll use Playwright because Zappos uses dynamic content that simple HTML parsers often miss. Playwright renders the page fully, ensuring we capture prices even if they load via JavaScript.

Start by cloning the repository and installing the dependencies:

# Clone the repository
git clone https://github.com/scraper-bank/Zappos.com-Scrapers.git

# Navigate to the Playwright category scraper directory
cd Zappos.com-Scrapers/node/playwright/product_category

# Install dependencies
npm install playwright-extra puppeteer-extra-plugin-stealth cheerio

Open scraper/zappos_scraper_product_category_v1.js and replace YOUR-API_KEY with your actual ScrapeOps API key. This routes your requests through residential proxies to avoid anti-bot detection.
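Under the hood, routing through ScrapeOps means wrapping each target URL in the proxy endpoint before Playwright navigates to it. The repo handles this for you, but conceptually it looks roughly like the sketch below. `getScrapeOpsUrl` and the query parameter names are illustrative assumptions, not the repo's actual code; check the ScrapeOps docs for the exact API.

```javascript
// Hypothetical sketch -- parameter names are assumptions; consult the
// ScrapeOps documentation for the exact proxy API.
const SCRAPEOPS_API_KEY = 'YOUR-API-KEY'; // replace with your real key

function getScrapeOpsUrl(targetUrl) {
    // URLSearchParams handles the URL-encoding of the target address
    const params = new URLSearchParams({
        api_key: SCRAPEOPS_API_KEY,
        url: targetUrl,
    });
    return `https://proxy.scrapeops.io/v1/?${params.toString()}`;
}

// Wrap the category URL before handing it to Playwright:
const proxied = getScrapeOpsUrl('https://www.zappos.com/men-running-shoes');
```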

Code Walkthrough: Ensuring Data Integrity

When building a price monitor, data reliability is everything. If your scraper saves the same product twice because of a page refresh or a sorting shift, your analysis will be skewed.

The ScrapeOps implementation uses a DataPipeline class to handle this. It uses a Set to track itemsSeen. Even if Zappos shifts product order during pagination, the script only records each item once.

const fs = require('fs');
const { promisify } = require('util');

// CONFIG comes from the script's configuration block
class DataPipeline {
    constructor(outputFile = CONFIG.outputFile) {
        this.itemsSeen = new Set();
        this.outputFile = outputFile;
        this.writeFile = promisify(fs.appendFile);
    }

    isDuplicate(data) {
        // Use the unique Product ID or URL as the key
        const itemKey = data.productId || data.url;
        if (this.itemsSeen.has(itemKey)) {
            console.warn('Duplicate item found, skipping');
            return true;
        }
        this.itemsSeen.add(itemKey);
        return false;
    }

    async addData(scrapedData) {
        if (!this.isDuplicate(scrapedData)) {
            const jsonLine = JSON.stringify(scrapedData) + '\n';
            await this.writeFile(this.outputFile, jsonLine, 'utf8');
        }
    }
}

Why JSONL?

The script outputs to .jsonl (JSON Lines). Unlike a standard JSON array, JSONL stores one object per line. This is a recommended approach for web scraping because:

  1. It’s streamable: You can read the file line-by-line without loading massive files into memory.
  2. It’s reliable: If the scraper crashes on page 10, the first 9 pages of data are already safely saved on disk.

Execution: Running the Scrape

To monitor a specific category, find the URL on Zappos, such as a filtered search for "Nike Running Shoes," and pass it to the script.

Run this command in your terminal:

node scraper/zappos_scraper_product_category_v1.js

The scraper will navigate through the pagination. You'll see logs as it saves items:
Saved item to zappos_com_product_category_page_scraper_data_20240214_120000.jsonl

Pro Tip: Rename your output files with the date, like nike_running_week_01.jsonl. This makes the comparison step much easier.
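If you'd rather not rename files by hand, a tiny helper can generate ISO-dated names that sort chronologically on disk. `snapshotName` is a hypothetical utility, not part of the repo:

```javascript
// Build a date-stamped snapshot filename, e.g. "nike_running_2024-02-14.jsonl",
// so weekly files sort in chronological order.
function snapshotName(prefix, date = new Date()) {
    const stamp = date.toISOString().slice(0, 10); // "YYYY-MM-DD"
    return `${prefix}_${stamp}.jsonl`;
}
```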

The "Diff" Strategy: Finding Insights

Once you have two files (e.g., week1.jsonl and week2.jsonl), you need to find the differences. We can use a simple Node.js utility script to perform this audit.

This script loads the old data into a Map for fast lookup and then iterates through the new data to find changes.

const fs = require('fs');

function loadData(filePath) {
    const data = new Map();
    const lines = fs.readFileSync(filePath, 'utf8').split('\n');
    lines.forEach(line => {
        if (line) {
            const item = JSON.parse(line);
            data.set(item.productId, item);
        }
    });
    return data;
}

const oldWeek = loadData('nike_running_week_01.jsonl');
const newWeek = loadData('nike_running_week_02.jsonl');

newWeek.forEach((newItem, pid) => {
    const oldItem = oldWeek.get(pid);

    if (oldItem) {
        if (newItem.price < oldItem.price) {
            console.log(`🚨 PRICE DROP: ${newItem.name} is now $${newItem.price} (was $${oldItem.price})`);
        }
    } else {
        console.log(`✨ NEW ARRIVAL: ${newItem.name} added at $${newItem.price}`);
    }
});

oldWeek.forEach((oldItem, pid) => {
    if (!newWeek.has(pid)) {
        console.log(`❌ STOCK OUT: ${oldItem.name} is no longer listed.`);
    }
});

This logic transforms raw text files into actionable business intelligence. You can now see exactly which products your competitors are discounting and what new inventory they are betting on.
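If cent-level fluctuations create noise in your alerts, you can extend the price-drop check with a percentage threshold. A minimal sketch, assuming numeric `price` fields on both snapshot items; `significantDrop` is a hypothetical helper, not part of the repo:

```javascript
// Only flag drops of at least `minPct` percent, filtering out
// cent-level price noise between snapshots.
function significantDrop(oldItem, newItem, minPct = 5) {
    if (!oldItem || typeof oldItem.price !== 'number') return false;
    const dropPct = ((oldItem.price - newItem.price) / oldItem.price) * 100;
    return dropPct >= minPct;
}
```

Inside the diff loop, you would replace the raw `<` comparison with `significantDrop(oldItem, newItem)`.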

Operationalizing: Making it Repeatable

Running a script manually every Monday is still manual work. To turn this into a true monitor, you should automate the execution.

Using Cron (Linux/Mac)

You can schedule the scraper to run every Monday at 9:00 AM by adding a line to your crontab (use absolute paths for both node and the script, since cron runs with a minimal environment):
0 9 * * 1 /usr/local/bin/node /path/to/zappos_scraper.js

Storing Data

Create a dedicated folder for your snapshots. Over six months, these timestamped JSONL files become a powerful historical dataset. You can use them to identify seasonal pricing trends. For example, does Zappos always drop prices on specific brands in October?

To wrap up

Building a professional-grade price monitor doesn't require a massive engineering team. By using robust open-source scrapers and a "snapshot and diff" strategy, you can automate your competitor research.

Key Takeaways:

  • Category Scrapes > Product Scrapes: Monitoring whole categories provides context that single-product tracking lacks.
  • Data Integrity: Use unique identifiers like productId and pipelines to prevent duplicates from ruining your analysis.
  • JSONL is better: Use streamable formats for long-running scrapes to ensure data safety.
  • Automate the Audit: Use a simple comparison script to highlight price drops and stock changes automatically.
