DEV Community: Jonathan D. Fisher

Market Gap Analysis with Beautylish Data: A Node.js Guide

Jonathan D. Fisher — Sat, 14 Mar 2026 15:55:00 +0000

In the competitive world of e-commerce, knowing what your competitors sell is only half the battle. To gain a real edge, you need to understand their Share of Shelf, which is the percentage of total inventory a specific brand occupies within a category. Manually counting products across dozens of pages is a recipe for burnout, but you can automate this process in minutes using Node.js.

This guide walks through building a data pipeline to scrape Beautylish category data for market gap analysis. We will identify which brands dominate "New Arrivals," calculate average price points, and spot potential gaps where new products could thrive.

Prerequisites & Setup

You’ll need Node.js installed on your machine. We will use the Beautylish Scrapers repository as a foundation.

Clone the repository and navigate to the Cheerio-Axios category scraper:

git clone https://github.com/scraper-bank/Beautylish.com-Scrapers.git
cd Beautylish.com-Scrapers/node/cheerio-axios/product_category
npm install

You also need a ScrapeOps API Key. Beautylish employs anti-bot measures that often block standard Axios requests. ScrapeOps provides a proxy wrapper that handles retries and rotates IP addresses to keep your scraper running. You can get a free API key from the ScrapeOps website.

Step 1: The Category Scraper Strategy

To perform a market analysis, we target "Category" or "Browse" pages. Unlike a single product page, a category page contains a grid of items where each card offers a snapshot of data, specifically the brand name, product name, and price.

The beautylish_scraper_product_category_v1.js script in the repository uses Cheerio for fast HTML parsing and Axios for requests. The core extraction logic lives in the extractData function:

// node/cheerio-axios/product_category/scraper/beautylish_scraper_product_category_v1.js

const items = $(".product-list-item, .product-grid-item, .product-card");

items.each((i, el) => {
    const s = $(el);
    const product = {};

    // Target the Brand and Name
    product.brand = s.find(".product-brand, .brand-name").first().text().trim();
    product.name = s.find(".product-name, .product-title").first().text().trim();

    // Extract Price and Currency
    const priceText = s.find(".product-price, .price").first().text().trim();
    if (priceText) {
        product.priceValue = parseFloat(priceText.replace(/[^0-9.]/g, ""));
        product.currency = detectCurrency(priceText);
    }
    products.push(product);
});

This approach is efficient because we extract data for 20-50 products in one request instead of visiting every individual product URL.

Step 2: Handling Pagination for Complete Data

Analyzing just the first page of "New Arrivals" gives a biased view of the most recent stock. To get the full picture, we have to traverse the pagination.

Beautylish uses a URL parameter like ?page=2. We can modify the logic to loop through these pages until no more products are found. While the base script handles a single URL, you can wrap it in a simple loop:

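async function scrapeAllPages(baseUrl, pipeline, maxPages = 50) {
    let currentPage = 1;
    let hasMore = true;

    // maxPages is a safety cap so a page that never comes back empty cannot loop forever
    while (hasMore && currentPage <= maxPages) {
        // Build the URL with the URL API so the page parameter is added correctly
        // whether or not baseUrl already has a query string (e.g. ?tag=new-arrivals)
        const pageUrl = new URL(baseUrl);
        pageUrl.searchParams.set('page', String(currentPage));
        console.log(`Scraping Page ${currentPage}...`);

        const result = await scrapePage(pageUrl.toString(), pipeline);

        // Stop the loop if the page returns no products
        if (!result || !result.products || result.products.length === 0) {
            hasMore = false;
        } else {
            currentPage++;
        }
    }
}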

By setting maxConcurrency: 1 in the script's CONFIG, we avoid overwhelming the servers. This is a better approach for staying undetected and practicing ethical scraping.

Step 3: Running the Extraction

Update beautylish_scraper_product_category_v1.js with your API_KEY and the target URL. For this analysis, we will target the "New Arrivals" section.

const API_KEY = 'YOUR_SCRAPEOPS_API_KEY';
const urls = ['https://www.beautylish.com/shop/browse?tag=new-arrivals'];

Run the script from your terminal:

node scraper/beautylish_scraper_product_category_v1.js

The script generates a .jsonl file. JSONL (JSON Lines) is the preferred format here because it allows you to stream data line-by-line during analysis, which uses much less memory than loading a giant JSON array.

Step 4: Building the Market Gap Analyzer

Now we process the raw data. Create a new script called analyze_trends.js to calculate two metrics:

Share of Shelf: Each brand's product count as a percentage of all products scraped.
Average Price: The typical price point for those products.

const fs = require('fs');
const readline = require('readline');

async function analyzeData(filePath) {
    const fileStream = fs.createReadStream(filePath);
    const rl = readline.createInterface({ input: fileStream, crlfDelay: Infinity });

    const stats = {};
    let totalProducts = 0;

    for await (const line of rl) {
        if (!line.trim()) continue; // Skip blank lines
        const item = JSON.parse(line);
        (item.products || []).forEach(product => {
            const brand = product.brand || "Unknown";

            if (!stats[brand]) {
                stats[brand] = { count: 0, pricedCount: 0, totalPrice: 0 };
            }
            stats[brand].count++;
            totalProducts++;

            // Only average over products that actually have a price,
            // so missing prices don't drag the average toward zero
            if (Number.isFinite(product.priceValue)) {
                stats[brand].pricedCount++;
                stats[brand].totalPrice += product.priceValue;
            }
        });
    }

    const report = Object.keys(stats).map(brand => {
        const s = stats[brand];
        return {
            Brand: brand,
            ProductCount: s.count,
            AvgPrice: s.pricedCount ? (s.totalPrice / s.pricedCount).toFixed(2) : "n/a",
            // Share of Shelf: this brand's slice of every product scraped
            ShareOfShelf: ((s.count / totalProducts) * 100).toFixed(1) + "%"
        };
    }).sort((a, b) => b.ProductCount - a.ProductCount);

    console.table(report.slice(0, 10)); // Show Top 10
}

analyzeData('your_output_file.jsonl');

Step 5: Interpreting the Data

The analyzer produces a table revealing the power dynamics of the category. Here is a hypothetical example of skincare data:

Brand	Product Count	Avg Price	Share of Shelf
Brand A	45	$12.50	22.5%
Brand B	30	$85.00	15.0%
Brand C	12	$42.00	6.0%

Identifying the Gaps

Pricing Gaps: If Brand B dominates the high-end ($80+) and Brand A dominates the budget tier ($10-$20), but there are few products in the $40-$60 range, you have identified a pricing gap.
Brand Saturation: If the top three brands account for 60% of "New Arrivals," the category is highly consolidated. A newcomer would need a significant marketing budget to compete for visibility.
Assortment Gaps: If 80% of "New Arrivals" are serums but only 2% are cleansers, there is a clear product type gap.
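
To make the pricing-gap idea concrete, here is a minimal sketch that buckets prices into tiers and flags the sparse ones. It is written in Python (the repository ships Python scrapers as well) and can run against the same .jsonl output. The tier boundaries, the 10% threshold, and the sample prices are all assumptions for illustration, not values from Beautylish.

```python
# Hypothetical price tiers: lower bound inclusive, upper bound exclusive
PRICE_TIERS = [(0, 20), (20, 40), (40, 60), (60, 80), (80, float("inf"))]


def tier_label(low, high):
    """Format a tier as a readable label such as '$40-$60' or '$80+'."""
    return f"${low}+" if high == float("inf") else f"${low}-${high}"


def count_by_tier(prices, tiers=PRICE_TIERS):
    """Count how many prices fall into each tier."""
    counts = {tier_label(low, high): 0 for low, high in tiers}
    for price in prices:
        for low, high in tiers:
            if low <= price < high:
                counts[tier_label(low, high)] += 1
                break
    return counts


def find_gaps(tier_counts, max_share=0.10):
    """Return the tiers that hold at most max_share of all products."""
    total = sum(tier_counts.values())
    if total == 0:
        return []
    return [tier for tier, n in tier_counts.items() if n / total <= max_share]


# Made-up prices for illustration only
sample_prices = [12.5, 14.0, 18.0, 11.0, 85.0, 92.0, 88.0, 42.0, 15.0, 19.5,
                 13.0, 16.0, 95.0, 81.0, 12.0, 17.0, 10.0, 84.0, 90.0, 11.5]
tier_counts = count_by_tier(sample_prices)
gaps = find_gaps(tier_counts)
print(tier_counts)
print("Sparse tiers:", gaps)
```

A sparse tier is only a lead, not a conclusion: it may be empty because shoppers in the category don't buy at that price.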

Recommended Approaches & Anti-Bot Considerations

When scraping e-commerce sites, keep these points in mind:

Respect the Server: Even if you can send 100 requests per second, don't. Use a concurrency of 1 or 2 to stay under the radar and prevent server strain.
Use Proxies: Beautylish uses bot protection that often flags data center IPs. The ScrapeOps integration in these scripts helps bypass 403 Forbidden errors by using residential proxies.
Data Cleaning: Scraped data is rarely perfect. Use fallbacks in your code, such as brand || "Unknown", to prevent the analysis script from crashing on malformed entries.

To Wrap Up

By moving from manual browsing to automated extraction, you turn a website into a structured database. This workflow—Scrape, Clean, Analyze—is the foundation of modern e-commerce intelligence.

You now have a system to extract brand and price data, handle large datasets with JSONL, and calculate the metrics needed to find market gaps. To take this further, try running the scraper on a schedule. By comparing Share of Shelf week-over-week, you can see which brands are losing momentum and which newcomers are starting to take over.

Beyond requests.get: Analyzing the Architecture of an AI-Generated Spider

Jonathan D. Fisher — Tue, 10 Mar 2026 16:13:00 +0000

There is a common stigma that AI-generated code is "toy-grade"—fine for a quick script, but too messy for a production pipeline. We often expect to see spaghetti code that lacks error handling, deduplication, or stealth.

However, the reality is shifting. Modern AI-generated scrapers increasingly use sophisticated design patterns that many developers miss on their first pass. We’ve seen this in the Beautylish.com-Scrapers repository, which contains production-ready spiders for both Python and Node.js.

By dissecting the beautylish_scraper_product_data_v1.py script, we can look past simple requests.get calls to see how to implement stealth, robust data pipelines, and intelligent extraction strategies that withstand modern anti-bot measures.

Why requests.get Fails

Modern e-commerce sites like Beautylish present significant hurdles for basic scraping scripts. Fetching a product page using a standard HTTP client usually leads to three major problems:

Dynamic Content: Beautylish uses frontend frameworks like React or Next.js. Much of the product data is hydrated into the DOM via JavaScript after the initial page load. A simple GET request sees only an empty shell.
Anti-Bot Measures: High-traffic retail sites use fingerprinting to detect automated scripts, looking for "headless" browser signatures and non-residential IP ranges.
Data Fragility: Layouts change. If a scraper relies on a single CSS selector for the price, it breaks the moment the UI is updated.

To solve this, the architecture in our repository moves away from simple requests toward a "Browser-First" approach using Playwright and Puppeteer integrated with residential proxies.

Architecture and Configuration

A professional scraper should be maintainable. The Beautylish script follows a clear separation of concerns, splitting logic into three distinct layers:

Configuration: Centralized settings for API keys, retries, and browser timeouts.
Data Pipeline: A dedicated class for handling deduplication and storage.
Extraction Logic: A strategy-based function that tries multiple ways to find data.

The script also focuses on dynamic output. Rather than overwriting files, it uses a timestamping utility to ensure every run is isolated, preventing data corruption and simplifying debugging.

from datetime import datetime

def generate_output_filename() -> str:
    """Generate output filename with current timestamp."""
    timestamp = datetime.now().strftime("%Y%m%d_%H%M%S")
    return f"beautylish_com_product_page_scraper_data_{timestamp}.jsonl"

The Stealth Layer

To bypass anti-bot protections, the script implements a "Stealth Layer." It uses playwright_stealth to mask common automation signals, such as the navigator.webdriver flag, that websites use to identify bots.

It also integrates ScrapeOps Residential Proxies. Unlike data center IPs, which are easily flagged, residential proxies route traffic through home devices, making it much harder to distinguish from a standard user's traffic.

The architecture initializes the browser context like this:

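from playwright.async_api import Browser
from playwright_stealth import stealth_async  # provided by the playwright-stealth package

async def scrape_page(browser: Browser, url: str, pipeline: DataPipeline, retries: int = 3) -> None: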
    context = await browser.new_context(
        ignore_https_errors=True,
        viewport={"width": 1920, "height": 1080},
        user_agent="Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36..."
    )

    page = await context.new_page()
    await stealth_async(page) 

    # Block unnecessary resources to save bandwidth and speed up load
    async def block_resources(route, request):
        if request.resource_type in ["image", "media", "font"]:
            await route.abort()
        else:
            await route.continue_()

    await page.route("**/*", block_resources)
    await page.goto(url, wait_until="domcontentloaded", timeout=60000)

Blocking images and fonts reduces proxy load while still allowing JavaScript to execute and populate the data.

The DataPipeline Class: Handling Scale

One of the most effective parts of this architecture is the DataPipeline class. Beginners often store scraped data in a list and write it to a JSON file at the end. This is risky: if the script crashes at item 999 of 1,000, you lose everything.

The DataPipeline avoids this by using JSON Lines (JSONL) and incremental writes: each item is appended to the file as soon as it is scraped.

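import json
import logging
from dataclasses import asdict

logger = logging.getLogger(__name__)

# ScrapedData is the dataclass defined elsewhere in the script
class DataPipeline: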
    def __init__(self, jsonl_filename="output.jsonl"):
        self.items_seen = set()
        self.jsonl_filename = jsonl_filename

    def is_duplicate(self, input_data: ScrapedData):
        item_key = input_data.url
        if item_key in self.items_seen:
            logger.warning(f"Duplicate item found: {item_key}. Skipping.")
            return True
        self.items_seen.add(item_key)
        return False

    def add_data(self, scraped_data: ScrapedData):
        if not self.is_duplicate(scraped_data):
            # Append mode ('a') ensures we don't lose data if the script restarts
            with open(self.jsonl_filename, mode="a", encoding="UTF-8") as output_file:
                json_line = json.dumps(asdict(scraped_data), ensure_ascii=False)
                output_file.write(json_line + "\n")
            logger.info(f"Saved item to {self.jsonl_filename}")

This approach provides three main benefits:

Memory Efficiency: Writing line-by-line means the script doesn't keep the entire dataset in RAM.
Resume Capability: If the scraper stops, the .jsonl file contains all data collected up to that moment.
Deduplication: The items_seen set prevents saving the same product twice if the crawler hits a circular link.
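
One caveat: items_seen lives in memory, so a restarted run begins with an empty set. A minimal sketch of one way to close that gap is to rebuild the set from the existing file before scraping. This assumes you reuse the same output file between runs (the timestamped filename utility would otherwise create a fresh file each time) and that every record has a url field. The helper name load_seen_urls is hypothetical and not part of the repository.

```python
import json
import os
import tempfile


def load_seen_urls(jsonl_path):
    """Collect the 'url' of every record already saved in a JSONL file."""
    seen = set()
    if not os.path.exists(jsonl_path):
        return seen
    with open(jsonl_path, encoding="utf-8") as f:
        for line in f:
            line = line.strip()
            if not line:
                continue
            try:
                record = json.loads(line)
            except json.JSONDecodeError:
                # A crash mid-write can leave a truncated last line; skip it
                continue
            if isinstance(record, dict) and record.get("url"):
                seen.add(record["url"])
    return seen


# Demo with a throwaway file: two good records and one truncated line
demo_path = os.path.join(tempfile.mkdtemp(), "demo.jsonl")
with open(demo_path, "w", encoding="utf-8") as f:
    f.write('{"url": "https://example.com/a", "name": "A"}\n')
    f.write('{"url": "https://example.com/b", "name": "B"}\n')
    f.write('{"url": "https://example.com/c", "na')  # simulated crash mid-write
seen_urls = load_seen_urls(demo_path)
print(sorted(seen_urls))
```

The resulting set could then be assigned to items_seen when the pipeline is created, so URLs saved by an earlier run are skipped.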

Intelligent Extraction and Fallback Strategies

The extract_data function doesn't just look for a CSS class; it uses a multi-tiered strategy to remain resilient against website updates.

Strategy 1: JSON-LD

Most modern e-commerce sites embed JSON-LD (Linked Data) for SEO. This structured JSON object is hidden in a <script> tag and is highly reliable because it follows the Schema.org standard.

json_ld_scripts = await page.locator("script[type='application/ld+json']").all_text_contents()
json_data = None
for script in json_ld_scripts:
    try:
        data = json.loads(script)
        if isinstance(data, dict) and data.get("@type", "").lower() == "product":
            json_data = data
            break
    except Exception:
        continue
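
JSON-LD in the wild is not always a single object: the top level can be a list, @type can be a list of strings, and some sites nest entities under an @graph key. The sketch below shows one way to handle those shapes. It works on plain strings, so it could be fed the output of all_text_contents(). The function names and the sample data are illustrative assumptions, not code from the repository.

```python
import json


def _type_matches(node, wanted):
    """True if a JSON-LD node's @type (string or list) includes `wanted`."""
    node_type = node.get("@type", "")
    types = node_type if isinstance(node_type, list) else [node_type]
    return any(isinstance(t, str) and t.lower() == wanted for t in types)


def find_product_json_ld(script_texts):
    """Return the first Product node found in a list of JSON-LD strings."""
    for text in script_texts:
        try:
            data = json.loads(text)
        except (json.JSONDecodeError, TypeError):
            continue
        # The top level may be one object or a list of objects
        candidates = data if isinstance(data, list) else [data]
        for node in candidates:
            if not isinstance(node, dict):
                continue
            # Some sites nest every entity under an "@graph" list
            graph = node.get("@graph")
            nested = graph if isinstance(graph, list) else []
            for item in [node, *nested]:
                if isinstance(item, dict) and _type_matches(item, "product"):
                    return item
    return None


# Made-up script contents for illustration
samples = [
    "not json at all",
    '{"@type": "BreadcrumbList"}',
    '{"@graph": [{"@type": "WebPage"}, {"@type": ["Product"], "name": "Lip Balm"}]}',
]
product = find_product_json_ld(samples)
print(product)
```

Catching only JSON decoding errors, rather than every Exception, also keeps genuine bugs visible instead of silently skipping a script tag.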

Strategy 2: CSS Fallbacks

If JSON-LD is missing or incomplete, the script falls back to DOM scraping.

if not brand:
    brand_el = page.locator(".product-brand").first
    brand = (await brand_el.inner_text()).strip() if await brand_el.count() > 0 else ""

By prioritizing invisible data (JSON-LD) over the visible UI, the scraper is far more likely to survive a redesign of the website's theme.

Concurrency and Error Handling

The architecture is built on Python's asyncio, allowing the script to handle non-blocking I/O. While one request waits for a proxy response, the CPU processes data from another page.

The script wraps execution in try/except blocks and uses the logging module rather than print statements. This is essential for production, as it allows you to pipe logs to a file or a monitoring service.

async def main():
    tasks = [scrape_page(browser, url, pipeline) for url in urls]
    await asyncio.gather(*tasks) # Run multiple scrapes concurrently
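
A plain asyncio.gather starts every task at once, which with a long URL list means many browser pages open simultaneously. One common way to cap that is a semaphore. The sketch below is an assumption about how you might bound concurrency, not the repository's implementation: run_bounded and fake_scrape are hypothetical names, and fake_scrape stands in for the real scrape_page.

```python
import asyncio


async def run_bounded(coroutine_factories, limit=3):
    """Run zero-argument coroutine factories with at most `limit` in flight."""
    semaphore = asyncio.Semaphore(limit)

    async def guarded(factory):
        async with semaphore:
            return await factory()

    # return_exceptions=True collects failures as results
    # instead of raising the first one to the caller
    return await asyncio.gather(
        *(guarded(f) for f in coroutine_factories), return_exceptions=True
    )


async def _demo():
    in_flight = 0
    peak = 0

    async def fake_scrape(url):
        nonlocal in_flight, peak
        in_flight += 1
        peak = max(peak, in_flight)
        await asyncio.sleep(0.01)  # stand-in for waiting on the network
        in_flight -= 1
        return url

    urls = [f"https://example.com/p/{i}" for i in range(10)]
    results = await run_bounded([lambda u=u: fake_scrape(u) for u in urls], limit=3)
    return results, peak


demo_results, demo_peak = asyncio.run(_demo())
print(f"{len(demo_results)} pages scraped, peak concurrency {demo_peak}")
```

With the real scraper, each factory would be something like lambda u=u: scrape_page(browser, u, pipeline), and limit plays the same role as a maxConcurrency setting.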

To Wrap Up

This Beautylish scraper demonstrates that AI can implement design patterns that ensure data integrity and stealth at scale.

Key Takeaways:

Stealth is mandatory: Use plugins like playwright-stealth and residential proxies to avoid detection.
JSONL over JSON: Use streamable formats to protect data from crashes and minimize RAM usage.
Extract structure, not style: Prioritize JSON-LD and Schema.org data over brittle CSS selectors.
Deduplicate at the source: Use a DataPipeline class to manage state and prevent duplicate records.

To see these patterns in action, you can clone the full repository:

git clone https://github.com/scraper-bank/Beautylish.com-Scrapers.git

Use this architecture as a template for your next project. It solves the common problems of scraping—blocking, duplicates, and storage—right from the start.

Stop Waiting for Engineers: Build a Competitor Price Monitor in 15 Minutes

Jonathan D. Fisher — Mon, 09 Mar 2026 16:08:00 +0000

The "Data Breadline" is a frustrating place to be. It’s that invisible queue where growth marketers, pricing analysts, and product managers wait for weeks—sometimes months—for engineering to fulfill a "simple" data extraction ticket. In the fast-moving world of e-commerce, waiting three weeks to see a competitor's price drop means you've already lost the sale.

Dynamic pricing isn't just for airlines anymore. Retail giants like Crate & Barrel adjust prices constantly, and to stay competitive, you need that data now.

This guide shows you how to bypass the engineering backlog entirely. By the end, you will have a functional competitor price monitor running on your machine. It will extract product names, prices, and availability from Crate & Barrel into a clean spreadsheet. No coding mastery required—just a bit of confidence with a terminal.

Prerequisites

You’ll need a few basic tools installed. These are standard "set it and forget it" utilities.

Python 3.x: The engine that runs the script. Download it at python.org.
VS Code (Optional): A clean text editor to view your files. Download it at code.visualstudio.com.
A ScrapeOps API Key: This handles proxy rotation and anti-bot bypass so you don't get blocked. You can grab a free key at ScrapeOps.io.

Phase 1: The Setup

We aren't going to write a scraper from scratch. Instead, we’ll use a production-ready template from the ScrapeOps Scraper Bank. This repository contains optimized logic specifically for Crate & Barrel.

First, download the code. You can use Git or simply download the ZIP file from the Crate & Barrel Scraper repository.

Once downloaded, open your terminal (or Command Prompt) and navigate to the project folder. We need to install two essential Python libraries: requests, to send messages to the website, and beautifulsoup4, to read the HTML data.

Run this command:

pip install requests beautifulsoup4

Phase 2: Configuring the Script

Now, let’s point the script in the right direction. We will use the BeautifulSoup implementation because it is fast, lightweight, and perfect for price monitoring.

Locate this file in your folder:
python/BeautifulSoup/product_data/scraper/crateandbarrel_scraper_product_data_v1.py

Open it in your text editor. You only need to change one line to make it work. Look for the API_KEY variable at the top of the file:

import requests
from bs4 import BeautifulSoup

# PASTE YOUR KEY HERE
API_KEY = "YOUR_SCRAPEOPS_API_KEY"

Monitoring Multiple Products

The default script is set up to scrape a single example URL. To turn this into a real monitor, we want to loop through a list of products, such as your Top 50 SKUs.

Replace the execution block at the bottom of your script with this snippet:

if __name__ == "__main__":
    # List the URLs you want to track
    competitor_urls = [
        "https://www.crateandbarrel.com/example-product-1",
        "https://www.crateandbarrel.com/example-product-2",
    ]

    pipeline = DataPipeline(jsonl_filename="competitor_prices.jsonl")

    for url in competitor_urls:
        print(f"Monitoring: {url}")
        # Logic to call extract_data goes here...

Phase 3: Running the Monitor

Go back to your terminal, ensure you are in the directory containing your script, and run:

python crateandbarrel_scraper_product_data_v1.py

You’ll see text scrolling by. These are logs telling you exactly what the script is doing. Look for a message like:
INFO:root:Saved item: [Product Name]

This confirms the script successfully bypassed Crate & Barrel's anti-bot protections using the ScrapeOps proxy and saved the data to competitor_prices.jsonl, the file name passed to DataPipeline.

Phase 4: From JSONL to Excel

The script outputs data in JSONL (JSON Lines) format. Developers use this format because it’s efficient for large datasets, but most analysts prefer a spreadsheet.

If you try to open a .jsonl file in Excel, it will look like a jumbled mess. We can fix this with a small helper script. Create a new file named converter.py in the same folder and paste this code:

import json

import pandas as pd  # Install first with: pip install pandas

# This script turns your raw data into a clean spreadsheet
data = []
with open('competitor_prices.jsonl', 'r', encoding='utf-8') as f:
    for line in f:
        if line.strip():  # Skip blank lines
            data.append(json.loads(line))

df = pd.DataFrame(data)
# We only want the most important columns for our monitor.
# reindex leaves a column empty instead of crashing if a field is missing
columns_to_keep = ['name', 'productId', 'price', 'availability', 'url']
df.reindex(columns=columns_to_keep).to_csv('price_report.csv', index=False)

print("Success! Open price_report.csv in Excel or Google Sheets.")

The converter needs the pandas library, so run pip install pandas first. Then run python converter.py to generate a clean CSV file ready for your weekly pricing meeting.

Phase 5: Scaling & Automation

You’ve built a monitor for one competitor, but you likely need to track others like Pottery Barn or IKEA.

The logic we used—sending a request, parsing the HTML with BeautifulSoup, and saving to JSONL—is the standard blueprint for web scraping. While every site has different "CSS selectors" (the labels for price and name), the underlying infrastructure remains the same.

If you don't want to hunt for selectors on every new site, you can use the ScrapeOps AI Scraper Generator. Provide a URL, and it generates the Python code for you, formatted exactly like the Crate & Barrel script we used.

To Wrap Up

You have officially graduated from the "Data Breadline." By using pre-made open-source scripts and a reliable proxy API, you've built a professional data pipeline in minutes rather than weeks.

Key Takeaways:

Don't reinvent the wheel: Use the Scraper Bank for production-ready templates.
Prevent blocks early: Use ScrapeOps to handle proxy rotation so you don't waste time debugging 403 Forbidden errors.
Format for the user: Always include a conversion step to get data into CSV or Excel for your stakeholders.

Your next step? Set a calendar reminder to run this script every Monday morning, or look into "Cron Jobs" to automate the execution entirely. You now have the tools to make data-driven pricing decisions as often as you choose to run the script.

Tracking Search Rankings & SEO on Depop

Jonathan D. Fisher — Sat, 07 Mar 2026 16:10:00 +0000

Visibility drives sales on Depop. For high-volume sellers and fashion brands, slipping from the first row of search results to the tenth is the difference between a quick sale and a stale listing. Because the Depop algorithm prioritizes fresh, relevant content, your search position changes constantly.

Monitoring these positions manually is tedious, especially if you manage dozens of items across multiple keywords. This guide demonstrates how to build an automated Depop SEO tool using Python and Selenium. We will use the open-source Depop.com-Scrapers repository to extract search data and implement logic to track exactly where your products rank over time.

Understanding Depop’s Search Structure

Before writing code, we need to look at the technical layout of a Depop search page. When you search for "vintage nike sweatshirt," Depop returns a grid of products.

Technically, these results are an ordered list of product objects. A product's rank is its index in that list, plus one to make it human-readable. For example, the first item in the results array has an index of 0 and a rank of 1.

To track rankings reliably, use a unique identifier. Tracking by title is unreliable because sellers often use similar titles or update them for SEO. Instead, use the productId, a unique string assigned by Depop that never changes. The logic follows these steps:

Send a search query to Depop.
Extract the list of product IDs from the results.
Find the index of your TARGET_PRODUCT_ID.
Log the rank.

Step 1: Setting Up the Search Scraper

We’ll use the Selenium implementation from the ScrapeOps repository, as it handles Depop’s dynamic content effectively.

First, clone the repository and install the dependencies:

git clone https://github.com/scraper-bank/Depop.com-Scrapers.git
cd Depop.com-Scrapers/python/selenium
pip install -r requirements.txt

Configuring the ScrapeOps API Key

Depop uses anti-bot measures on their search pages. To avoid blocks or CAPTCHAs, you need proxy rotation. The repository is pre-configured to work with ScrapeOps.

Open product_search/scraper/depop_scraper_product_search_v1.py and find the API_KEY variable. Replace it with your key from the ScrapeOps Dashboard.

# python/selenium/product_search/scraper/depop_scraper_product_search_v1.py
API_KEY = "YOUR_SCRAPEOPS_API_KEY"

This routes your Selenium requests through a residential proxy network, rotating your IP address with every request.

Step 2: Extracting Search Results

The base scraper uses the extract_data function to parse search results into a structured ScrapedData object. This object contains a list of products, each with its own productId, name, and price.

The scraper identifies individual items using CSS selectors:

# Snippet from extract_data in the repository
items = driver.find_elements(By.CSS_SELECTOR, "li.styles_listItem__Uv9lb")

for item in items:
    # Logic to extract href, price, and image
    p_id = href.strip("/").split("/")[-1] if href else ""
    product["productId"] = p_id
    products.append(product)

This provides a clean list of every product visible on the search page.

Step 3: Implementing the Rank Finder Logic

Next, create a wrapper script to import the scraper, perform a search, and locate your item. Create a new file named rank_tracker.py in the product_search directory, next to the scraper folder, so the import resolves:

from urllib.parse import quote_plus

from scraper.depop_scraper_product_search_v1 import get_driver, extract_data

# Configuration
TARGET_PRODUCT_ID = "12345678"  # Replace with your Depop Product ID
KEYWORD = "vintage 90s windbreaker"

def get_product_rank(keyword, target_id):
    driver = get_driver()
    # quote_plus URL-encodes the keyword: spaces become "+", and characters
    # such as "&" are escaped so they cannot break the query string
    search_url = f"https://www.depop.com/search/?q={quote_plus(keyword)}"

    try:
        driver.get(search_url)
        scraped_result = extract_data(driver, search_url)

        if not scraped_result or not scraped_result.products:
            return -1 # Search failed or no results

        for index, product in enumerate(scraped_result.products):
            if product['productId'] == target_id:
                return index + 1  # Ranks are 1-based

        return 0  # Item not found in the current results
    finally:
        driver.quit()

# Only run the check when this file is executed directly, not when imported
if __name__ == "__main__":
    rank = get_product_rank(KEYWORD, TARGET_PRODUCT_ID)
    print(f"Your item is currently ranked: {rank if rank > 0 else 'Not Found'}")

How it works

get_driver(): Initializes the undetected-chromedriver with ScrapeOps proxy settings.
extract_data(): Scrapes the page and returns the product list.
enumerate(): Loops through the list to find the matching productId.

Step 4: Handling Pagination and Depth

Depop uses infinite scrolling. If your item isn't in the first 30 results, a basic scrape will miss it. You need to tell Selenium to scroll down to increase the search depth.

Modify the logic to include a scroll loop:

import time

from selenium.webdriver.common.by import By

def scroll_to_depth(driver, max_items=100):
    last_height = driver.execute_script("return document.body.scrollHeight")

    while True:
        items = driver.find_elements(By.CSS_SELECTOR, "li.styles_listItem__Uv9lb")
        if len(items) >= max_items:
            break

        driver.execute_script("window.scrollTo(0, document.body.scrollHeight);")
        time.sleep(2)  # Wait for products to load

        new_height = driver.execute_script("return document.body.scrollHeight")
        if new_height == last_height:
            break # Reached the end of results
        last_height = new_height

Calling this before extract_data lets you check up to the first 100 items, or every result if the search returns fewer. Checking beyond 200 items is rarely necessary, as click-through rates drop significantly after the first few pages.

Step 5: Automating History

A single rank check is just a snapshot. To see if SEO efforts like refreshing listings or changing tags work, you need historical data. You can store findings in a CSV file:

import csv
import os
from datetime import datetime

def log_rank(keyword, product_id, rank):
    # Write the header row only when the file is first created
    file_exists = os.path.exists('rank_history.csv')

    with open('rank_history.csv', 'a', newline='') as f:
        writer = csv.writer(f)
        if not file_exists:
            writer.writerow(['Date', 'Keyword', 'ProductID', 'Rank'])

        writer.writerow([
            datetime.now().strftime("%Y-%m-%d %H:%M:%S"),
            keyword,
            product_id,
            rank
        ])

# Usage
current_rank = get_product_rank(KEYWORD, TARGET_PRODUCT_ID)
log_rank(KEYWORD, TARGET_PRODUCT_ID, current_rank)

Running this script daily via a Cron job creates a dataset that reveals ranking volatility. If a rank drops from 5 to 50 overnight, it’s a clear signal to update the listing or check for new competitors.
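
Once the history builds up, you can scan it for sharp drops automatically. The sketch below is one possible approach, assuming the Date, Keyword, ProductID, and Rank columns written by log_rank and rows in chronological order. The function name find_rank_drops, the threshold of 10 positions, and the sample rows are assumptions for illustration.

```python
import csv
import io


def find_rank_drops(rows, threshold=10):
    """Return (keyword, product_id, old_rank, new_rank) for large drops.

    `rows` is an iterable of dicts with Keyword, ProductID and Rank keys,
    ordered from oldest to newest.
    """
    last_rank = {}
    drops = []
    for row in rows:
        key = (row["Keyword"], row["ProductID"])
        rank = int(row["Rank"])
        if rank <= 0:
            continue  # 0 and -1 mean "not found" or "search failed"
        previous = last_rank.get(key)
        # A larger number is a worse position, so a drop is a positive change
        if previous is not None and rank - previous >= threshold:
            drops.append((key[0], key[1], previous, rank))
        last_rank[key] = rank
    return drops


# Made-up history for illustration only
sample_csv = io.StringIO(
    "Date,Keyword,ProductID,Rank\n"
    "2026-03-01 09:00:00,vintage 90s windbreaker,12345678,5\n"
    "2026-03-02 09:00:00,vintage 90s windbreaker,12345678,50\n"
    "2026-03-03 09:00:00,vintage 90s windbreaker,12345678,0\n"
    "2026-03-04 09:00:00,vintage 90s windbreaker,12345678,48\n"
)
drops = find_rank_drops(csv.DictReader(sample_csv))
print(drops)
```

To use it on real data, open rank_history.csv with newline='' and pass csv.DictReader(f) in place of the sample.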

Recommended Approaches to Avoid Bans

When building a rank tracker, the main risk is getting your IP flagged for excessive search requests.

Use Proxy Rotation: Search pages are more heavily guarded than product pages. Use ScrapeOps proxy rotation to distribute the load.
Control Frequency: Don't check your rank every 10 minutes. Depop's search index doesn't update that fast. Once or twice a day is sufficient.
Randomize Delays: If you are checking multiple keywords, add time.sleep(random.uniform(5, 15)) between queries to mimic human browsing.
Headless Mode: The repository uses --headless=new by default. This is faster and uses fewer resources. Ensure your User-Agent is set correctly to avoid detection.
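
The randomized-delay tip can be sketched as a small loop. This is a minimal illustration, not repository code: check_keywords is a hypothetical helper, and check_rank stands for whatever function performs the search (for example, a wrapper around get_product_rank). The sleep function is passed in so the demo can record delays instead of actually waiting.

```python
import random
import time


def check_keywords(keywords, check_rank, min_delay=5, max_delay=15,
                   sleep=time.sleep):
    """Call check_rank(keyword) for each keyword, pausing between queries."""
    results = {}
    for position, keyword in enumerate(keywords):
        if position > 0:
            # Random pause between queries, never before the first one
            sleep(random.uniform(min_delay, max_delay))
        results[keyword] = check_rank(keyword)
    return results


# Demo: record the delays rather than sleeping, and fake the rank lookup
recorded_delays = []
demo_keywords = ["vintage nike sweatshirt", "vintage 90s windbreaker"]
demo_ranks = check_keywords(
    demo_keywords,
    check_rank=lambda kw: len(kw),  # placeholder instead of a real search
    sleep=recorded_delays.append,
)
print(demo_ranks, recorded_delays)
```

In real use you would leave sleep at its default and pass something like lambda kw: get_product_rank(kw, TARGET_PRODUCT_ID) as check_rank.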

To Wrap Up

A custom Depop SEO tool replaces guesswork with data. By combining ScrapeOps scrapers with a rank-finding script, you can detect ranking drops before they impact sales, test which keywords perform best, and monitor competitor movements.

You can expand this by turning your TARGET_PRODUCT_ID into a dictionary to loop through all your top items. You could even integrate a messaging service like Slack or Discord to send an alert whenever an item drops out of the top 10.

For the full source code or alternative implementations using Playwright or Node.js, visit the Depop.com-Scrapers repository.

How to Track Competitor Pricing on StockX: A Low-Code Guide

Jonathan D. Fisher — Fri, 06 Mar 2026 05:10:32 +0000

Agility is the lifeblood of growth teams in the sneaker and collectible markets. However, that agility often dies when hours are spent manually refreshing StockX pages to monitor competitor bids, price volatility, and market trends. If you are still manually checking multiple SKUs to spot arbitrage opportunities, you are already behind the curve.

The solution isn't to hire an expensive engineering team to build a custom monitoring platform. You can use open-source tools to automate the heavy lifting. This guide shows you how to use a pre-built Python script to extract real-time StockX product data and transform it into an actionable spreadsheet for price tracking and market analysis.

Prerequisites & Setup (Low-Code Friendly)

This workflow is accessible even if you aren't a full-time developer. You only need a few basic tools to get started.

1. Install Python

Ensure you have Python installed on your machine. Check this by opening your terminal (or Command Prompt) and typing:

python --version

If you don't have it, download the latest version from python.org.

2. Download the Scrapers

Use the open-source Stockx.com-Scrapers repository. You can either clone it via Git or download it as a ZIP file.

git clone https://github.com/scraper-bank/Stockx.com-Scrapers.git
cd Stockx.com-Scrapers

3. Get a ScrapeOps API Key

StockX employs sophisticated anti-bot protections that block standard requests. Use ScrapeOps to handle proxy rotation and browser fingerprinting automatically.

Sign up for a free account at ScrapeOps.
Copy your API Key from the dashboard. You’ll need this to bypass StockX's blocks.

Step 1: Configuring the Python Scraper

The repository contains several implementations. To track specific products, use the Playwright version. Playwright is a browser automation tool that drives a real browser, so its traffic looks much more like a human user's and is harder for StockX to detect.

Navigate to this directory:
python/playwright/product_data/

Open stockx_scraper_product_data_v1.py in any text editor like VS Code or Notepad++. You only need to modify two sections.

1. Insert Your API Key

Locate the API_KEY variable at the top of the script and paste your ScrapeOps key:

# Inside stockx_scraper_product_data_v1.py
API_KEY = "YOUR-SCRAPEOPS-API-KEY-HERE"
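If you would rather not save the key inside the script, one option is to read it from an environment variable. The sketch below is an assumption rather than something the repository does: the variable name SCRAPEOPS_API_KEY and the load_api_key helper are both hypothetical.

```python
import os

# Hypothetical variable name; the repository's script hard-codes API_KEY instead.
ENV_VAR = "SCRAPEOPS_API_KEY"
PLACEHOLDER = "YOUR-SCRAPEOPS-API-KEY-HERE"


def load_api_key(env=os.environ) -> str:
    """Return the ScrapeOps key from the environment or raise a readable error."""
    key = env.get(ENV_VAR, "").strip()
    if not key or key == PLACEHOLDER:
        raise RuntimeError(f"Set {ENV_VAR} before running the scraper.")
    return key
```

With this in place, the line in the script could become API_KEY = load_api_key(), and a missing key fails immediately with a clear message instead of a block from StockX later on.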

2. Define Your Target URLs

At the bottom of the script, define which products you want to track. You can pass a single URL or a list of competitor products:

# Example of targeting specific products for tracking
urls = [
    "https://stockx.com/nike-dunk-low-retro-white-black-2021",
    "https://stockx.com/adidas-yeezy-slide-pure-re-release-2021"
]

Step 2: Running the Scraper

Now you can install the necessary libraries and execute the script. In your terminal, run:

# Install the required Python libraries
pip install playwright playwright-stealth beautifulsoup4

# Install the browser binaries for Playwright
playwright install

Run the scraper from the repository root (the Stockx.com-Scrapers folder):

python python/playwright/product_data/scraper/stockx_scraper_product_data_v1.py

As the script runs, it opens a stealth browser instance, navigates to the products, and extracts the data. Once finished, a new file will appear in your folder with a name like stockx_com_product_page_scraper_data_20240522.jsonl.

Step 3: From JSONL to Actionable Spreadsheet

The output file is in JSONL format (JSON Lines). While this is great for developers, growth teams usually need this data in Excel or Google Sheets.

Method 1: Importing into Microsoft Excel

Open Excel and go to the Data tab.
Select Get Data > From File > From JSON.
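Select your .jsonl file. Excel's JSON import expects a single JSON document, so if it rejects a file with one object per line, convert the file to CSV first (as in Method 2) and open that instead.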
The Power Query Editor will open. Click To Table in the top left.
Click the "Expand" icon (two arrows) in the column header to choose the fields you want, such as name, price, and market_data.

Method 2: Importing into Google Sheets

The easiest way is to use a free online "JSON to CSV" converter. Upload your .jsonl file, download the CSV, and open it in Google Sheets.
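If you prefer to keep the data on your own machine, a few lines of Python can do the same conversion. This is a minimal sketch that assumes every line of the file is a JSON object; jsonl_to_csv is a hypothetical helper, not part of the repository, and nested fields such as market_data are written into a single cell as JSON text.

```python
import csv
import json


def jsonl_to_csv(jsonl_path: str, csv_path: str) -> int:
    """Convert a JSON Lines file to CSV and return the number of rows written."""
    rows = []
    with open(jsonl_path, encoding="utf-8") as src:
        for line in src:
            line = line.strip()
            if line:  # skip blank lines
                rows.append(json.loads(line))

    # Collect every top-level key, preserving first-seen order.
    columns = []
    for row in rows:
        for key in row:
            if key not in columns:
                columns.append(key)

    with open(csv_path, "w", newline="", encoding="utf-8") as dst:
        writer = csv.DictWriter(dst, fieldnames=columns)
        writer.writeheader()
        for row in rows:
            # Nested values (dicts/lists) are stored as JSON text in one cell.
            writer.writerow({
                key: json.dumps(value) if isinstance(value, (dict, list)) else value
                for key, value in row.items()
            })
    return len(rows)
```

Calling jsonl_to_csv("stockx_com_product_page_scraper_data_20240522.jsonl", "stockx.csv") would produce a file that both Excel and Google Sheets open directly.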

You now have a clean table containing:

Name: The specific SKU or sneaker name.
Price: The current lowest ask.
Market Data: Last sale price and historical volatility.

Step 4: Using the Data for Growth Strategies

With the data in hand, you can move from guessing to strategizing. Here are three ways to use your new tracker:

1. Spotting Arbitrage Opportunities

Compare the lowest_ask on StockX against prices on platforms like eBay or GOAT. If the StockX price is still significantly lower than the going price elsewhere after you account for seller fees and shipping, it's a buy signal.
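To make "significantly lower" concrete, you can estimate the margin left after resale costs. The sketch below is illustrative only: the 10% fee rate and $15 shipping are placeholder assumptions, not actual platform fees, and arbitrage_margin is a hypothetical helper.

```python
def arbitrage_margin(buy_price: float, resale_price: float,
                     resale_fee_rate: float = 0.10,
                     shipping: float = 15.0) -> float:
    """Estimated profit from buying at buy_price and reselling elsewhere.

    resale_fee_rate and shipping are placeholder assumptions; replace them
    with the real costs of the platform you plan to sell on.
    """
    net_resale = resale_price * (1 - resale_fee_rate) - shipping
    return round(net_resale - buy_price, 2)


# A $110 ask that resells for $150 leaves a small margin; at $130 it loses money.
print(arbitrage_margin(110, 150))  # 10.0
print(arbitrage_margin(110, 130))  # -8.0
```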

2. Optimized Bidding

If the last_sale was $200 but the lowest_ask is $240, a bid at $205 can put you at the front of the line without overpaying, provided it beats the current highest bid. Automated tracking allows you to adjust these bids daily as the market shifts.

3. Risk Management

Monitor the volatility metric. If a sneaker shows high price swings over a short period, it might be a "hype" play that is too risky to hold long-term. Stable, low-volatility items are better for consistent, slower growth.
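If your export lacks a ready-made volatility figure, you can approximate one from the prices you collect on successive runs. The sketch below uses the coefficient of variation (standard deviation divided by mean); this is an assumed stand-in for illustration, not necessarily how StockX or the scraper computes volatility.

```python
from statistics import mean, pstdev
from typing import Sequence


def price_volatility(prices: Sequence[float]) -> float:
    """Coefficient of variation: standard deviation divided by mean price."""
    if len(prices) < 2:
        return 0.0  # not enough observations to measure a swing
    avg = mean(prices)
    return pstdev(prices) / avg if avg else 0.0


# Prices recorded on successive runs of the scraper (made-up numbers).
stable = [200, 202, 199, 201]
hyped = [200, 260, 180, 310]
print(price_volatility(stable) < price_volatility(hyped))  # True
```

A higher value means larger swings relative to the average price, which is the "hype" pattern described above.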

Product	StockX Lowest Ask	Last Sale	Strategy
Nike Dunk Low	$180	$175	Bid $176
Yeezy Slide	$110	$115	Buy Now (Arbitrage)
Jordan 1 High	$350	$310	Pass (Overpriced)
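The Strategy column above can be reproduced with a simple rule of thumb. The sketch below is one possible reading of the table, not a rule from the repository: the 5% premium threshold and the "last sale plus $1" bid are assumptions chosen so the three rows come out the same.

```python
def suggest_strategy(lowest_ask: float, last_sale: float,
                     max_premium: float = 0.05) -> str:
    """Suggest an action by comparing the lowest ask with the last sale.

    max_premium is an assumed threshold: asks priced more than 5% above
    the last sale are treated as overpriced.
    """
    if last_sale <= 0:
        return "No sales data"
    if lowest_ask <= last_sale:
        return "Buy Now"
    premium = (lowest_ask - last_sale) / last_sale
    if premium <= max_premium:
        return f"Bid ${last_sale + 1:.0f}"
    return "Pass"


print(suggest_strategy(180, 175))  # Bid $176
print(suggest_strategy(110, 115))  # Buy Now
print(suggest_strategy(350, 310))  # Pass
```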

Common Issues & Troubleshooting

Empty Output File: This usually happens if the StockX URL is incorrect or the site layout has changed. Double-check your URLs in the script.
403 Forbidden Errors: This means StockX has detected the bot. Ensure your ScrapeOps API key is active and that the playwright-stealth package from Step 2 installed successfully.
Missing Market Data: Some new or unreleased items don't have "Last Sale" data yet. The script will return null or 0 for these fields.
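A quick pre-flight check can catch the most common cause of empty output, a mistyped URL, before the browser even launches. The sketch below assumes product pages follow the single-segment pattern of the examples in this guide (https://stockx.com/product-slug); looks_like_stockx_product is a hypothetical helper.

```python
from urllib.parse import urlparse


def looks_like_stockx_product(url: str) -> bool:
    """Return True if the URL matches the product URL shape used in this guide."""
    parsed = urlparse(url)
    slug = parsed.path.strip("/")
    return (
        parsed.scheme == "https"
        and parsed.netloc in ("stockx.com", "www.stockx.com")
        and slug != ""
        and "/" not in slug  # assumption: product slugs are a single path segment
    )


urls = [
    "https://stockx.com/nike-dunk-low-retro-white-black-2021",
    "https://stockx.com/",  # homepage, not a product
]
for url in urls:
    print(url, looks_like_stockx_product(url))
```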

Summary

Moving away from manual checks and adopting a low-code automation strategy gives your team a competitive advantage. You can transform hours of tedious browsing into a data-rich spreadsheet in minutes.

Key Takeaways:

Automate efficiently: Use the Stockx.com-Scrapers repo to save on engineering costs.
Avoid blocks: Use ScrapeOps to ensure your scraper bypasses anti-bot measures.
Focus on Action: Use the extracted JSONL data to fuel arbitrage and bidding strategies in Excel.

To go further, try running the product_search script in the repository to discover trending products before they hit your main tracking list.

Stop Breaking Your Pipeline: Using Schema Validation to Clean Scraped Zappos Data

Jonathan D. Fisher — Thu, 05 Mar 2026 05:30:45 +0000

Web scraping is often described as the process of turning the "wild west" of the internet into structured data. However, anyone who has managed a production data pipeline knows that "structured" is a relative term. HTML is inherently chaotic. A price might be a string like "$120.00" on one page, "120" on another, or missing entirely on a third.

If your scraper simply dumps these raw strings into a database, your downstream applications—whether they are price trackers, AI models, or analytics dashboards—will eventually crash. The solution is Schema-First Extraction: an approach to scraping that enforces strict data types at the moment of collection.

This guide explores how to implement this approach using the Zappos.com-Scrapers repository as a blueprint. It looks at using Python dataclasses and Node.js helper functions to ensure your data is clean, consistent, and pipeline-ready.

Prerequisites

To follow the code examples, you should have:

A basic understanding of Python (specifically dataclasses) or Node.js.
The Zappos.com-Scrapers repository cloned locally.
Playwright installed in your environment.

Phase 1: The Contract – Analyzing the `ScrapedData` Dataclass

In a high-quality scraper, the data structure isn't an afterthought. It is the contract that the scraper must fulfill. In the Zappos repository, this contract is defined using Python’s @dataclass.

The implementation in python/playwright/product_data/scraper/zappos_scraper_product_data_v1.py looks like this:

from dataclasses import dataclass, field
from typing import Dict, Any, Optional, List

@dataclass
class ScrapedData:
    aggregateRating: Dict[str, Any] = field(default_factory=dict)
    availability: str = "in_stock"
    brand: str = ""
    category: str = ""
    currency: str = "USD"
    description: str = ""
    features: List[str] = field(default_factory=list)
    images: List[Dict[str, Any]] = field(default_factory=list)
    name: str = ""
    preDiscountPrice: Optional[float] = None
    price: float = 0.0
    productId: str = ""
    url: str = ""

Why Explicit Types Matter

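By using ScrapedData, we move away from generic, unpredictable dictionaries. Keep in mind that Python does not check dataclass annotations at runtime: the types document the contract, and the helper functions in Phase 2 do the actual enforcement.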

price: float = 0.0: This ensures that if a price is missing, we get a consistent numeric fallback rather than a NoneType error during a calculation.
List[str]: Explicitly typing lists tells your IDE and your pipeline exactly what to expect.
Optional[float]: This is vital for fields like preDiscountPrice. Not every item is on sale. Optional allows us to distinguish between a price of zero and a price that simply doesn't exist.
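Because the annotations are not checked when an object is created, some teams add an explicit coercion step. The sketch below shows one way to do that with __post_init__; PricedItem is a hypothetical, trimmed-down stand-in for ScrapedData, and the repository's class may not include this hook.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class PricedItem:  # hypothetical stand-in for ScrapedData
    name: str = ""
    price: float = 0.0
    preDiscountPrice: Optional[float] = None

    def __post_init__(self):
        # Annotations are not enforced at runtime, so coerce explicitly.
        # float() raises ValueError for text such as "abc".
        self.price = float(self.price)
        if self.preDiscountPrice is not None:
            self.preDiscountPrice = float(self.preDiscountPrice)


item = PricedItem(name="Trail Runner", price="120")
print(item.price)             # 120.0
print(item.preDiscountPrice)  # None
```

Here a string that slipped through extraction is converted to a float on construction, while preDiscountPrice stays None when the item is not on sale.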

Phase 2: Enforcing Types – The Extraction Logic

Defining a schema is only half the battle. The second half is the "enforcer" logic, the code that bridges the gap between a messy HTML string and your strict types.

In the Zappos scraper, helper functions act as validators. Consider this parse_price logic:

import re

def parse_price(price_str: str) -> float:
    # Missing or empty input falls back to a numeric default
    if not price_str:
        return 0.0
    # Remove thousands separators so "1,200.00" becomes "1200.00"
    cleaned = price_str.replace(",", "")
    # Use Regex to extract only the numeric parts, ignoring currency symbols
    match = re.search(r'[\d,]+\.?\d*', cleaned)
    if match:
        try:
            return float(match.group())
        except ValueError:
            return 0.0
    return 0.0

The Strategy

This function handles three common "dirty data" scenarios:

Currency Symbols: It strips $ or € using regex.
Formatting: It removes thousands-separator commas.
Missing Data: It returns a default 0.0 instead of raising an exception that would crash the entire scraping loop.

When building a scraper, use these cleaning utilities rather than accepting raw inner text.
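To see the three scenarios in action, you can run the helper against a few sample strings. The block below repeats the parse_price logic from above so that it runs on its own; the sample inputs are made up for illustration.

```python
import re


def parse_price(price_str):
    """Same logic as the helper above, repeated so this block is runnable."""
    if not price_str:
        return 0.0
    cleaned = price_str.replace(",", "")
    match = re.search(r"[\d,]+\.?\d*", cleaned)
    if match:
        try:
            return float(match.group())
        except ValueError:
            return 0.0
    return 0.0


# Made-up inputs covering symbols, separators, and missing data.
samples = ["$1,200.00", "€89", "120", "Sold out", "", None]
for raw in samples:
    print(repr(raw), "->", parse_price(raw))
```

Every input, including None and text with no digits, comes back as a float, so the scraping loop never has to special-case a bad price.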

Phase 3: Handling Nulls and Defaults Safely

One of the most frequent causes of pipeline failure is the "None" (null) value. If your database expects an array but receives null, the import fails.

The Zappos repository uses Python's field(default_factory=list) to solve this. This ensures that even if no features or images are found on the page, the resulting JSON contains [] instead of null.

# From python/playwright/product_data/scraper/zappos_scraper_product_data_v1.py

features: List[str] = field(default_factory=list)
images: List[Dict[str, Any]] = field(default_factory=list)

By using a default_factory, every instance of ScrapedData starts with a fresh, empty list. This maintains structural integrity. Your downstream code can always run for image in data['images'] without checking if the key exists or if it's null.
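The effect is easy to verify with a small experiment. The sketch below uses a hypothetical three-field Product class rather than the full ScrapedData, and serializes it with dataclasses.asdict and json.dumps; the repository may write its JSON differently.

```python
import json
from dataclasses import asdict, dataclass, field
from typing import Any, Dict, List


@dataclass
class Product:  # hypothetical, reduced version of ScrapedData
    name: str = ""
    features: List[str] = field(default_factory=list)
    images: List[Dict[str, Any]] = field(default_factory=list)


a = Product(name="A")
b = Product(name="B")
a.features.append("Waterproof")  # only touches a's list

# b never had features or images, yet both keys serialize as [].
payload = json.dumps(asdict(b))
print(payload)  # {"name": "B", "features": [], "images": []}
```

Because each instance gets its own list, appending to a.features leaves b.features empty, and the serialized output contains [] rather than null.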

Phase 4: Node.js Comparison – Type Safety in JavaScript

While JavaScript lacks native dataclasses, the Zappos repository achieves the same discipline in its Node.js implementation.

The Node scraper uses a functional approach to mimic type safety in node/playwright/product_data/scraper/zappos_scraper_product_data_v1.js:

const parsePrice = (priceText) => {
    if (!priceText) return 0.0;
    // Strip commas and extract the float
    const match = priceText.replace(/,/g, '').match(/([\d,]+\.?\d*)/);
    return match ? parseFloat(match[1]) : 0.0;
};

// Usage inside the extraction logic
const outputData = {
    price: parsePrice($('.price-selector').text()),
    availability: "in_stock", // Default value
    features: [] // Initialized as empty array
};

Python offers better developer tooling through type hints, while Node.js requires more runtime discipline. However, by using a centralized parsePrice function, the Zappos repository ensures that the final JSON output has the same structure and types regardless of the language used.

Phase 5: Prompting for Strict Code

This repository was generated using the ScrapeOps AI Scraper Generator. When using AI to build scrapers, don't just ask it to "scrape Zappos." To get production-grade results, your prompt should include the schema requirements.

Example of a Schema-First Prompt:

"Extract product data from Zappos. Use the following JSON schema. Constraints: Prices must be floats (remove currency symbols), lists like 'features' must always return an empty array if no data is found, and 'availability' must be mapped to the string 'in_stock' or 'out_of_stock'."

Providing the schema as the primary requirement forces the generator to create the helper functions (parse_price, clean_float) shown in the Zappos repository. This moves the complexity from the data processing stage to the extraction stage, where it belongs.
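The availability constraint in that prompt translates into a small mapping helper. The sketch below is an assumption about how such a helper might look, not code from the repository: normalize_availability and its list of out-of-stock phrases are hypothetical, and the in_stock fallback mirrors the dataclass default from Phase 1.

```python
from typing import Optional

# Assumed phrases; extend this list to match the wording on the target site.
OUT_OF_STOCK_MARKERS = ("out of stock", "sold out", "unavailable")


def normalize_availability(raw: Optional[str]) -> str:
    """Map free-form availability text to 'in_stock' or 'out_of_stock'."""
    text = (raw or "").strip().lower()
    if any(marker in text for marker in OUT_OF_STOCK_MARKERS):
        return "out_of_stock"
    return "in_stock"  # same default as ScrapedData.availability


print(normalize_availability("Sold Out"))  # out_of_stock
print(normalize_availability("In Stock"))  # in_stock
print(normalize_availability(None))        # in_stock
```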

To Wrap Up

Strict schema validation is the difference between a script and a data product. By enforcing types at the edge of your network—inside the scraper itself—you prevent technical debt from accumulating in your databases.

The Zappos.com-Scrapers repository demonstrates these principles:

Use Dataclasses to define a clear contract for your data.
Implement Helper Functions like parse_price to handle HTML inconsistencies.
Default to Empty Collections instead of nulls to keep pipelines running smoothly.
Ensure Language Agnosticism so your parsing logic produces identical JSON whether you use Python or Node.js.

If you're starting a new project, use the ScrapeOps AI Scraper Generator to build the base extraction logic, then add Pydantic for production-grade data validation.
