Jonathan D. Fisher

Market Gap Analysis with Beautylish Data: A Node.js Guide

In the competitive world of e-commerce, knowing what your competitors sell is only half the battle. To gain a real edge, you need to understand their Share of Shelf, which is the percentage of total inventory a specific brand occupies within a category. Manually counting products across dozens of pages is a recipe for burnout, but you can automate this process in minutes using Node.js.

This guide walks through building a data pipeline to scrape Beautylish category data for market gap analysis. We will identify which brands dominate "New Arrivals," calculate average price points, and spot potential gaps where new products could thrive.

Prerequisites & Setup

You’ll need Node.js installed on your machine. We will use the Beautylish Scrapers repository as a foundation.

Clone the repository and navigate to the Cheerio-Axios category scraper:

git clone https://github.com/scraper-bank/Beautylish.com-Scrapers.git
cd Beautylish.com-Scrapers/node/cheerio-axios/product_category
npm install

You also need a ScrapeOps API key. Beautylish employs anti-bot measures that often block plain Axios requests. ScrapeOps provides a proxy wrapper that handles retries and rotates IP addresses to keep your scraper running. A free API key is available on the ScrapeOps website.
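Under the hood, the proxy integration boils down to wrapping each target URL in a call to the ScrapeOps proxy endpoint. Here is a minimal standalone sketch of that idea; it uses Node 18+'s built-in fetch rather than the repo's Axios setup, and `API_KEY` is a placeholder you would replace with your own key:

```javascript
// Sketch: routing requests through the ScrapeOps proxy endpoint.
// Standalone illustration, not the repo's exact implementation.
function buildProxyUrl(apiKey, targetUrl) {
    // ScrapeOps expects the destination URL as an encoded query parameter
    const params = new URLSearchParams({ api_key: apiKey, url: targetUrl });
    return `https://proxy.scrapeops.io/v1/?${params.toString()}`;
}

async function fetchPage(apiKey, targetUrl) {
    const res = await fetch(buildProxyUrl(apiKey, targetUrl)); // Node 18+ global fetch
    if (!res.ok) throw new Error(`Request failed with status ${res.status}`);
    return res.text(); // raw HTML, ready for Cheerio
}
```

Every request goes out through a rotating proxy IP, so a block on one address doesn't stall the whole run.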

Step 1: The Category Scraper Strategy

To perform a market analysis, we target "Category" or "Browse" pages. Unlike a single product page, a category page contains a grid of items where each card offers a snapshot of data, specifically the brand name, product name, and price.

The beautylish_scraper_product_category_v1.js script in the repository uses Cheerio for fast HTML parsing and Axios for requests. The core extraction logic lives in the extractData function:

// node/cheerio-axios/product_category/scraper/beautylish_scraper_product_category_v1.js

const items = $(".product-list-item, .product-grid-item, .product-card");

items.each((i, el) => {
    const s = $(el);
    const product = {};

    // Target the Brand and Name
    product.brand = s.find(".product-brand, .brand-name").first().text().trim();
    product.name = s.find(".product-name, .product-title").first().text().trim();

    // Extract Price and Currency
    const priceText = s.find(".product-price, .price").first().text().trim();
    if (priceText) {
        product.priceValue = parseFloat(priceText.replace(/[^0-9.]/g, ""));
        product.currency = detectCurrency(priceText);
    }
    products.push(product);
});

This approach is efficient because we extract data for 20-50 products in one request instead of visiting every individual product URL.
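The snippet above calls a `detectCurrency` helper that isn't shown. A minimal sketch of what such a helper might look like, mapping a currency symbol in the raw price text to an ISO code (this is an illustration, not the repo's exact implementation):

```javascript
// Hypothetical minimal detectCurrency helper: maps the symbol found in
// the raw price string to an ISO currency code. Defaults to USD, since
// Beautylish prices are listed in dollars.
function detectCurrency(priceText) {
    if (priceText.includes("€")) return "EUR";
    if (priceText.includes("£")) return "GBP";
    return "USD";
}
```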

Step 2: Handling Pagination for Complete Data

Analyzing just the first page of "New Arrivals" gives a biased view of the most recent stock. To get the full picture, we have to traverse the pagination.

Beautylish uses a URL parameter like ?page=2. We can modify the logic to loop through these pages until no more products are found. While the base script handles a single URL, you can wrap it in a simple loop:

async function scrapeAllPages(baseUrl) {
    let currentPage = 1;
    let hasMore = true;

    while (hasMore) {
        const url = `${baseUrl}&page=${currentPage}`;
        console.log(`Scraping Page ${currentPage}...`);

        // scrapePage and pipeline come from the base scraper script
        const result = await scrapePage(url, pipeline);

        // Stop the loop if the page returns no products
        if (!result || result.products.length === 0) {
            hasMore = false;
        } else {
            currentPage++;
        }
    }
}

By setting maxConcurrency: 1 in the script's CONFIG, we avoid overwhelming the servers. A single concurrent request is slower, but it keeps your traffic pattern closer to a human browser, reduces server strain, and is the more ethical way to scrape.
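Beyond limiting concurrency, you can add a short pause between pages. A small sketch of that idea, where `DELAY_MS` is an illustrative value rather than a setting from the repo's CONFIG:

```javascript
// Sketch: a randomized delay between page requests so the traffic
// pattern looks less mechanical. DELAY_MS is an illustrative value.
const DELAY_MS = 2000;

function sleep(ms) {
    return new Promise(resolve => setTimeout(resolve, ms));
}

// Inside the scrapeAllPages loop, after each page:
//   await sleep(DELAY_MS + Math.random() * 1000); // jitter avoids a fixed cadence
```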

Step 3: Running the Extraction

Update beautylish_scraper_product_category_v1.js with your API_KEY and the target URL. For this analysis, we will target the "New Arrivals" section.

const API_KEY = 'YOUR_SCRAPEOPS_API_KEY';
const urls = ['https://www.beautylish.com/shop/browse?tag=new-arrivals'];

Run the script from your terminal:

node scraper/beautylish_scraper_product_category_v1.js

The script generates a .jsonl file. JSONL (JSON Lines) is the preferred format here because it allows you to stream data line-by-line during analysis, which uses much less memory than loading a giant JSON array.

Step 4: Building the Market Gap Analyzer

Now we process the raw data. Create a new script called analyze_trends.js to calculate two metrics:

  1. Share of Shelf: The product count for each brand.
  2. Average Price: The typical price point for those products.

const fs = require('fs');
const readline = require('readline');

async function analyzeData(filePath) {
    const fileStream = fs.createReadStream(filePath);
    const rl = readline.createInterface({ input: fileStream, crlfDelay: Infinity });

    const stats = {};

    for await (const line of rl) {
        if (!line.trim()) continue; // skip blank lines in the JSONL file
        const item = JSON.parse(line);
        item.products.forEach(product => {
            const brand = product.brand || "Unknown";
            const price = product.priceValue || 0;

            if (!stats[brand]) {
                stats[brand] = { count: 0, totalCash: 0 };
            }
            stats[brand].count++;
            stats[brand].totalCash += price;
        });
    }

    const report = Object.keys(stats).map(brand => ({
        Brand: brand,
        ProductCount: stats[brand].count,
        AvgPrice: (stats[brand].totalCash / stats[brand].count).toFixed(2)
    })).sort((a, b) => b.ProductCount - a.ProductCount);

    console.table(report.slice(0, 10)); // Show Top 10
}

analyzeData('your_output_file.jsonl');

Step 5: Interpreting the Data

The analyzer produces a table revealing the power dynamics of the category. Here is a hypothetical example of skincare data:

| Brand   | Product Count | Avg Price | Share of Shelf |
| ------- | ------------- | --------- | -------------- |
| Brand A | 45            | $12.50    | 22.5%          |
| Brand B | 30            | $85.00    | 15.0%          |
| Brand C | 12            | $42.00    | 6.0%           |
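The Share of Shelf column can be derived from the Step 4 report by dividing each brand's count by the category total. A minimal sketch, assuming `report` is the array of rows built in analyze_trends.js:

```javascript
// Sketch: deriving Share of Shelf from the Step 4 report rows.
// Assumes each row has the shape { Brand, ProductCount, AvgPrice }.
function addShareOfShelf(report) {
    const total = report.reduce((sum, row) => sum + row.ProductCount, 0);
    return report.map(row => ({
        ...row,
        ShareOfShelf: ((row.ProductCount / total) * 100).toFixed(1) + "%"
    }));
}
```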

Identifying the Gaps

  1. Pricing Gaps: If Brand B dominates the high-end ($80+) and Brand A dominates the budget tier ($10-$20), but there are few products in the $40-$60 range, you have identified a pricing gap.
  2. Brand Saturation: If the top three brands account for 60% of "New Arrivals," the category is highly consolidated. A newcomer would need a significant marketing budget to compete for visibility.
  3. Assortment Gaps: If 80% of "New Arrivals" are serums but only 2% are cleansers, there is a clear product type gap.

Recommended Approaches & Anti-Bot Considerations

When scraping e-commerce sites, keep these points in mind:

  • Respect the Server: Even if you can send 100 requests per second, don't. Use a concurrency of 1 or 2 to stay under the radar and prevent server strain.
  • Use Proxies: Beautylish uses bot protection that often flags data center IPs. The ScrapeOps integration in these scripts helps bypass 403 Forbidden errors by using residential proxies.
  • Data Cleaning: Scraped data is rarely perfect. Use fallbacks in your code, such as brand || "Unknown", to prevent the analysis script from crashing on malformed entries.
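When a request does fail despite the proxy, retrying with exponential backoff usually recovers it. This is a generic pattern rather than the repo's retry logic; `maxRetries` and `baseDelayMs` are illustrative defaults:

```javascript
// Sketch: retrying a flaky async operation with exponential backoff.
async function withRetries(fn, maxRetries = 3, baseDelayMs = 1000) {
    for (let attempt = 0; attempt <= maxRetries; attempt++) {
        try {
            return await fn();
        } catch (err) {
            if (attempt === maxRetries) throw err; // out of retries: surface the error
            const delay = baseDelayMs * 2 ** attempt; // 1s, 2s, 4s, ...
            await new Promise(resolve => setTimeout(resolve, delay));
        }
    }
}
```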

To Wrap Up

By moving from manual browsing to automated extraction, you turn a website into a structured database. This workflow—Scrape, Clean, Analyze—is the foundation of modern e-commerce intelligence.

You now have a system to extract brand and price data, handle large datasets with JSONL, and calculate the metrics needed to find market gaps. To take this further, try running the scraper on a schedule. By comparing Share of Shelf week-over-week, you can see which brands are losing momentum and which newcomers are starting to take over.
