<?xml version="1.0" encoding="UTF-8"?>
<rss version="2.0" xmlns:atom="http://www.w3.org/2005/Atom" xmlns:dc="http://purl.org/dc/elements/1.1/">
  <channel>
    <title>DEV Community: Jonathan D. Fisher</title>
    <description>The latest articles on DEV Community by Jonathan D. Fisher (@jonathanfisher).</description>
    <link>https://dev.to/jonathanfisher</link>
    <image>
      <url>https://media2.dev.to/dynamic/image/width=90,height=90,fit=cover,gravity=auto,format=auto/https:%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Fuser%2Fprofile_image%2F3795908%2F495af344-231e-4070-9dd6-0228ac5dd86d.jpg</url>
      <title>DEV Community: Jonathan D. Fisher</title>
      <link>https://dev.to/jonathanfisher</link>
    </image>
    <atom:link rel="self" type="application/rss+xml" href="https://dev.to/feed/jonathanfisher"/>
    <language>en</language>
    <item>
      <title>Market Gap Analysis with Beautylish Data: A Node.js Guide</title>
      <dc:creator>Jonathan D. Fisher</dc:creator>
      <pubDate>Sat, 14 Mar 2026 15:55:00 +0000</pubDate>
      <link>https://dev.to/jonathanfisher/market-gap-analysis-with-beautylish-data-a-nodejs-guide-4jn2</link>
      <guid>https://dev.to/jonathanfisher/market-gap-analysis-with-beautylish-data-a-nodejs-guide-4jn2</guid>
      <description>&lt;p&gt;In the competitive world of e-commerce, knowing what your competitors sell is only half the battle. To gain a real edge, you need to understand their &lt;strong&gt;Share of Shelf&lt;/strong&gt;, which is the percentage of total inventory a specific brand occupies within a category. Manually counting products across dozens of pages is a recipe for burnout, but you can automate this process in minutes using Node.js.&lt;/p&gt;

&lt;p&gt;This guide walks through building a data pipeline to scrape Beautylish category data for market gap analysis. We will identify which brands dominate "New Arrivals," calculate average price points, and spot potential gaps where new products could thrive.&lt;/p&gt;

&lt;h2&gt;
  
  
  Prerequisites &amp;amp; Setup
&lt;/h2&gt;

&lt;p&gt;You’ll need Node.js installed on your machine. We will use the &lt;a href="https://github.com/scraper-bank/Beautylish.com-Scrapers.git" rel="noopener noreferrer"&gt;Beautylish Scrapers repository&lt;/a&gt; as a foundation.&lt;/p&gt;

&lt;p&gt;Clone the repository and navigate to the Cheerio-Axios category scraper:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight shell"&gt;&lt;code&gt;git clone https://github.com/scraper-bank/Beautylish.com-Scrapers.git
&lt;span class="nb"&gt;cd &lt;/span&gt;Beautylish.com-Scrapers/node/cheerio-axios/product_category
npm &lt;span class="nb"&gt;install&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;You also need a &lt;strong&gt;ScrapeOps API Key&lt;/strong&gt;. Beautylish employs anti-bot measures that often block standard Axios requests. ScrapeOps provides a proxy wrapper that handles retries and rotates IP addresses to keep your scraper running. You can get a &lt;a href="https://scrapeops.io/app/register/ai-scraper" rel="noopener noreferrer"&gt;free API key here&lt;/a&gt;.&lt;/p&gt;

&lt;h2&gt;
  
  
  Step 1: The Category Scraper Strategy
&lt;/h2&gt;

&lt;p&gt;To perform a market analysis, we target "Category" or "Browse" pages. Unlike a single product page, a category page contains a grid of items where each card offers a snapshot of data, specifically the brand name, product name, and price.&lt;/p&gt;

&lt;p&gt;The &lt;code&gt;beautylish_scraper_product_category_v1.js&lt;/code&gt; script in the repository uses &lt;strong&gt;Cheerio&lt;/strong&gt; for fast HTML parsing and &lt;strong&gt;Axios&lt;/strong&gt; for requests. The core extraction logic lives in the &lt;code&gt;extractData&lt;/code&gt; function:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight javascript"&gt;&lt;code&gt;&lt;span class="c1"&gt;// node/cheerio-axios/product_category/scraper/beautylish_scraper_product_category_v1.js&lt;/span&gt;

&lt;span class="kd"&gt;const&lt;/span&gt; &lt;span class="nx"&gt;items&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="nf"&gt;$&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="dl"&gt;"&lt;/span&gt;&lt;span class="s2"&gt;.product-list-item, .product-grid-item, .product-card&lt;/span&gt;&lt;span class="dl"&gt;"&lt;/span&gt;&lt;span class="p"&gt;);&lt;/span&gt;

&lt;span class="nx"&gt;items&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;each&lt;/span&gt;&lt;span class="p"&gt;((&lt;/span&gt;&lt;span class="nx"&gt;i&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="nx"&gt;el&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt; &lt;span class="o"&gt;=&amp;gt;&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt;
    &lt;span class="kd"&gt;const&lt;/span&gt; &lt;span class="nx"&gt;s&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="nf"&gt;$&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="nx"&gt;el&lt;/span&gt;&lt;span class="p"&gt;);&lt;/span&gt;
    &lt;span class="kd"&gt;const&lt;/span&gt; &lt;span class="nx"&gt;product&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="p"&gt;{};&lt;/span&gt;

    &lt;span class="c1"&gt;// Target the Brand and Name&lt;/span&gt;
    &lt;span class="nx"&gt;product&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nx"&gt;brand&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="nx"&gt;s&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;find&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="dl"&gt;"&lt;/span&gt;&lt;span class="s2"&gt;.product-brand, .brand-name&lt;/span&gt;&lt;span class="dl"&gt;"&lt;/span&gt;&lt;span class="p"&gt;).&lt;/span&gt;&lt;span class="nf"&gt;first&lt;/span&gt;&lt;span class="p"&gt;().&lt;/span&gt;&lt;span class="nf"&gt;text&lt;/span&gt;&lt;span class="p"&gt;().&lt;/span&gt;&lt;span class="nf"&gt;trim&lt;/span&gt;&lt;span class="p"&gt;();&lt;/span&gt;
    &lt;span class="nx"&gt;product&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nx"&gt;name&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="nx"&gt;s&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;find&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="dl"&gt;"&lt;/span&gt;&lt;span class="s2"&gt;.product-name, .product-title&lt;/span&gt;&lt;span class="dl"&gt;"&lt;/span&gt;&lt;span class="p"&gt;).&lt;/span&gt;&lt;span class="nf"&gt;first&lt;/span&gt;&lt;span class="p"&gt;().&lt;/span&gt;&lt;span class="nf"&gt;text&lt;/span&gt;&lt;span class="p"&gt;().&lt;/span&gt;&lt;span class="nf"&gt;trim&lt;/span&gt;&lt;span class="p"&gt;();&lt;/span&gt;

    &lt;span class="c1"&gt;// Extract Price and Currency&lt;/span&gt;
    &lt;span class="kd"&gt;const&lt;/span&gt; &lt;span class="nx"&gt;priceText&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="nx"&gt;s&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;find&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="dl"&gt;"&lt;/span&gt;&lt;span class="s2"&gt;.product-price, .price&lt;/span&gt;&lt;span class="dl"&gt;"&lt;/span&gt;&lt;span class="p"&gt;).&lt;/span&gt;&lt;span class="nf"&gt;first&lt;/span&gt;&lt;span class="p"&gt;().&lt;/span&gt;&lt;span class="nf"&gt;text&lt;/span&gt;&lt;span class="p"&gt;().&lt;/span&gt;&lt;span class="nf"&gt;trim&lt;/span&gt;&lt;span class="p"&gt;();&lt;/span&gt;
    &lt;span class="k"&gt;if &lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="nx"&gt;priceText&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt;
        &lt;span class="nx"&gt;product&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nx"&gt;priceValue&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="nf"&gt;parseFloat&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="nx"&gt;priceText&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;replace&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="sr"&gt;/&lt;/span&gt;&lt;span class="se"&gt;[^&lt;/span&gt;&lt;span class="sr"&gt;0-9.&lt;/span&gt;&lt;span class="se"&gt;]&lt;/span&gt;&lt;span class="sr"&gt;/g&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="dl"&gt;""&lt;/span&gt;&lt;span class="p"&gt;));&lt;/span&gt;
        &lt;span class="nx"&gt;product&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nx"&gt;currency&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="nf"&gt;detectCurrency&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="nx"&gt;priceText&lt;/span&gt;&lt;span class="p"&gt;);&lt;/span&gt;
    &lt;span class="p"&gt;}&lt;/span&gt;
    &lt;span class="nx"&gt;products&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;push&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="nx"&gt;product&lt;/span&gt;&lt;span class="p"&gt;);&lt;/span&gt;
&lt;span class="p"&gt;});&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;This approach is efficient because we extract data for 20-50 products in one request instead of visiting every individual product URL.&lt;/p&gt;
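
&lt;p&gt;The price handling above can be isolated into a small helper for testing. The following is a minimal sketch, not the repository's code: &lt;code&gt;parsePrice&lt;/code&gt; and &lt;code&gt;CURRENCY_SYMBOLS&lt;/code&gt; are hypothetical names, and the repository's own &lt;code&gt;detectCurrency&lt;/code&gt; may behave differently.&lt;/p&gt;

```javascript
// Hypothetical symbol table; the repository's detectCurrency may cover more cases.
const CURRENCY_SYMBOLS = { "$": "USD", "€": "EUR", "£": "GBP" };

function parsePrice(priceText) {
    const text = String(priceText);
    // Keep digits and the decimal point only, mirroring the scraper's regex.
    const numeric = parseFloat(text.replace(/[^0-9.]/g, ""));
    const symbol = Object.keys(CURRENCY_SYMBOLS).find((s) => text.includes(s));
    return {
        // Use null rather than NaN so missing prices are easy to filter out later.
        priceValue: Number.isNaN(numeric) ? null : numeric,
        currency: symbol ? CURRENCY_SYMBOLS[symbol] : null,
    };
}

console.log(parsePrice("$1,250.00"));
console.log(parsePrice("Sold out"));
```

&lt;p&gt;Returning &lt;code&gt;null&lt;/code&gt; for unparseable prices keeps them from being mistaken for free products during analysis.&lt;/p&gt;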

&lt;h2&gt;
  
  
  Step 2: Handling Pagination for Complete Data
&lt;/h2&gt;

&lt;p&gt;Analyzing only the first page of "New Arrivals" skews the results toward the most recent stock. To get the full picture, we have to traverse the pagination.&lt;/p&gt;

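&lt;p&gt;Beautylish paginates with a &lt;code&gt;page&lt;/code&gt; query parameter, such as &lt;code&gt;?page=2&lt;/code&gt; (or &lt;code&gt;&amp;amp;page=2&lt;/code&gt; when the URL already carries a query string, as our target URL does). While the base script handles a single URL, you can wrap it in a simple loop that requests successive pages until no more products are found:&lt;br&gt;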
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight javascript"&gt;&lt;code&gt;&lt;span class="k"&gt;async&lt;/span&gt; &lt;span class="kd"&gt;function&lt;/span&gt; &lt;span class="nf"&gt;scrapeAllPages&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="nx"&gt;baseUrl&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="nx"&gt;pipeline&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt;
    &lt;span class="kd"&gt;let&lt;/span&gt; &lt;span class="nx"&gt;currentPage&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="mi"&gt;1&lt;/span&gt;&lt;span class="p"&gt;;&lt;/span&gt;
    &lt;span class="kd"&gt;let&lt;/span&gt; &lt;span class="nx"&gt;hasMore&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="kc"&gt;true&lt;/span&gt;&lt;span class="p"&gt;;&lt;/span&gt;

    &lt;span class="k"&gt;while &lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="nx"&gt;hasMore&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt;
        &lt;span class="kd"&gt;const&lt;/span&gt; &lt;span class="nx"&gt;url&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="s2"&gt;`&lt;/span&gt;&lt;span class="p"&gt;${&lt;/span&gt;&lt;span class="nx"&gt;baseUrl&lt;/span&gt;&lt;span class="p"&gt;}&lt;/span&gt;&lt;span class="s2"&gt;&amp;amp;page=&lt;/span&gt;&lt;span class="p"&gt;${&lt;/span&gt;&lt;span class="nx"&gt;currentPage&lt;/span&gt;&lt;span class="p"&gt;}&lt;/span&gt;&lt;span class="s2"&gt;`&lt;/span&gt;&lt;span class="p"&gt;;&lt;/span&gt;
        &lt;span class="nx"&gt;console&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;log&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="s2"&gt;`Scraping Page &lt;/span&gt;&lt;span class="p"&gt;${&lt;/span&gt;&lt;span class="nx"&gt;currentPage&lt;/span&gt;&lt;span class="p"&gt;}&lt;/span&gt;&lt;span class="s2"&gt;...`&lt;/span&gt;&lt;span class="p"&gt;);&lt;/span&gt;

        &lt;span class="kd"&gt;const&lt;/span&gt; &lt;span class="nx"&gt;result&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="k"&gt;await&lt;/span&gt; &lt;span class="nf"&gt;scrapePage&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="nx"&gt;url&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="nx"&gt;pipeline&lt;/span&gt;&lt;span class="p"&gt;);&lt;/span&gt;

        &lt;span class="c1"&gt;// Stop the loop if the page returns no products&lt;/span&gt;
        &lt;span class="k"&gt;if &lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="o"&gt;!&lt;/span&gt;&lt;span class="nx"&gt;result&lt;/span&gt; &lt;span class="o"&gt;||&lt;/span&gt; &lt;span class="nx"&gt;result&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nx"&gt;products&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nx"&gt;length&lt;/span&gt; &lt;span class="o"&gt;===&lt;/span&gt; &lt;span class="mi"&gt;0&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt;
            &lt;span class="nx"&gt;hasMore&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="kc"&gt;false&lt;/span&gt;&lt;span class="p"&gt;;&lt;/span&gt;
        &lt;span class="p"&gt;}&lt;/span&gt; &lt;span class="k"&gt;else&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt;
            &lt;span class="nx"&gt;currentPage&lt;/span&gt;&lt;span class="o"&gt;++&lt;/span&gt;&lt;span class="p"&gt;;&lt;/span&gt;
        &lt;span class="p"&gt;}&lt;/span&gt;
    &lt;span class="p"&gt;}&lt;/span&gt;
&lt;span class="p"&gt;}&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;Setting &lt;code&gt;maxConcurrency: 1&lt;/code&gt; in the script's &lt;code&gt;CONFIG&lt;/code&gt; sends one request at a time, so we avoid overwhelming the servers. Slower, sequential requests are also less likely to be flagged and are a more ethical way to scrape.&lt;/p&gt;
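
&lt;p&gt;When building page URLs for the pagination loop, string concatenation only works if you already know whether the base URL has a query string. As a hedged alternative, the sketch below uses the &lt;code&gt;URL&lt;/code&gt; class built into Node.js. The helper names &lt;code&gt;buildPageUrl&lt;/code&gt; and &lt;code&gt;sleep&lt;/code&gt; are hypothetical and are not part of the repository.&lt;/p&gt;

```javascript
// Sketch: build paginated URLs with the WHATWG URL API built into Node.js.
function buildPageUrl(baseUrl, page) {
    const url = new URL(baseUrl);
    // searchParams chooses the correct separator whether or not a query string exists.
    url.searchParams.set("page", String(page));
    return url.toString();
}

// Hypothetical politeness delay to await between page requests.
function sleep(ms) {
    return new Promise((resolve) => setTimeout(resolve, ms));
}

console.log(buildPageUrl("https://www.beautylish.com/shop/browse?tag=new-arrivals", 2));
console.log(buildPageUrl("https://www.beautylish.com/shop/browse", 2));
```

&lt;p&gt;Inside the loop, you could call &lt;code&gt;await sleep(1000)&lt;/code&gt; after each page to space out requests; the one-second value is only an illustration.&lt;/p&gt;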

&lt;h2&gt;
  
  
  Step 3: Running the Extraction
&lt;/h2&gt;

&lt;p&gt;Update &lt;code&gt;beautylish_scraper_product_category_v1.js&lt;/code&gt; with your &lt;code&gt;API_KEY&lt;/code&gt; and the target URL. For this analysis, we will target the "New Arrivals" section.&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight javascript"&gt;&lt;code&gt;&lt;span class="kd"&gt;const&lt;/span&gt; &lt;span class="nx"&gt;API_KEY&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="dl"&gt;'&lt;/span&gt;&lt;span class="s1"&gt;YOUR_SCRAPEOPS_API_KEY&lt;/span&gt;&lt;span class="dl"&gt;'&lt;/span&gt;&lt;span class="p"&gt;;&lt;/span&gt;
&lt;span class="kd"&gt;const&lt;/span&gt; &lt;span class="nx"&gt;urls&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="dl"&gt;'&lt;/span&gt;&lt;span class="s1"&gt;https://www.beautylish.com/shop/browse?tag=new-arrivals&lt;/span&gt;&lt;span class="dl"&gt;'&lt;/span&gt;&lt;span class="p"&gt;];&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;Run the script from your terminal:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight shell"&gt;&lt;code&gt;node scraper/beautylish_scraper_product_category_v1.js
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;The script generates a &lt;code&gt;.jsonl&lt;/code&gt; file. &lt;strong&gt;JSONL&lt;/strong&gt; (JSON Lines) is the preferred format here because it allows you to stream data line-by-line during analysis, which uses much less memory than loading a giant JSON array.&lt;/p&gt;
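
&lt;p&gt;To make the format concrete, here is a minimal sketch of writing and re-reading JSONL. The record shape (one object per scraped page with a &lt;code&gt;products&lt;/code&gt; array) is an assumption based on how the analyzer in the next step reads the file, and the temporary file name is hypothetical.&lt;/p&gt;

```javascript
const fs = require("fs");
const os = require("os");
const path = require("path");

// Hypothetical record shape: one JSON object per scraped page.
const outFile = path.join(os.tmpdir(), `jsonl_demo_${process.pid}.jsonl`);
const pages = [
    { url: "page-1", products: [{ brand: "Brand A", priceValue: 12.5 }] },
    { url: "page-2", products: [{ brand: "Brand B", priceValue: 85 }] },
];

fs.writeFileSync(outFile, "");
for (const page of pages) {
    // Each record is serialized onto its own line, so writes are append-only.
    fs.appendFileSync(outFile, JSON.stringify(page) + "\n");
}

// For brevity this demo reads the whole file; large files should be streamed instead.
const lines = fs.readFileSync(outFile, "utf8").split("\n").filter((line) => line.trim() !== "");
const parsed = lines.map((line) => JSON.parse(line));
console.log(`Wrote and re-read ${parsed.length} records`);
fs.unlinkSync(outFile);
```

&lt;p&gt;Because every line is an independent JSON document, a run that is interrupted partway still leaves a file whose completed lines can be parsed.&lt;/p&gt;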

&lt;h2&gt;
  
  
  Step 4: Building the Market Gap Analyzer
&lt;/h2&gt;

&lt;p&gt;Now we process the raw data. Create a new script called &lt;code&gt;analyze_trends.js&lt;/code&gt; to calculate two metrics:&lt;/p&gt;

&lt;ol&gt;
&lt;li&gt; &lt;strong&gt;Share of Shelf&lt;/strong&gt;: The product count for each brand, which becomes a percentage once you divide it by the total number of products.&lt;/li&gt;
&lt;li&gt; &lt;strong&gt;Average Price&lt;/strong&gt;: The typical price point for those products.
&lt;/li&gt;
&lt;/ol&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight javascript"&gt;&lt;code&gt;&lt;span class="kd"&gt;const&lt;/span&gt; &lt;span class="nx"&gt;fs&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="nf"&gt;require&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="dl"&gt;'&lt;/span&gt;&lt;span class="s1"&gt;fs&lt;/span&gt;&lt;span class="dl"&gt;'&lt;/span&gt;&lt;span class="p"&gt;);&lt;/span&gt;
&lt;span class="kd"&gt;const&lt;/span&gt; &lt;span class="nx"&gt;readline&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="nf"&gt;require&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="dl"&gt;'&lt;/span&gt;&lt;span class="s1"&gt;readline&lt;/span&gt;&lt;span class="dl"&gt;'&lt;/span&gt;&lt;span class="p"&gt;);&lt;/span&gt;

&lt;span class="k"&gt;async&lt;/span&gt; &lt;span class="kd"&gt;function&lt;/span&gt; &lt;span class="nf"&gt;analyzeData&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="nx"&gt;filePath&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt;
    &lt;span class="kd"&gt;const&lt;/span&gt; &lt;span class="nx"&gt;fileStream&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="nx"&gt;fs&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;createReadStream&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="nx"&gt;filePath&lt;/span&gt;&lt;span class="p"&gt;);&lt;/span&gt;
    &lt;span class="kd"&gt;const&lt;/span&gt; &lt;span class="nx"&gt;rl&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="nx"&gt;readline&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;createInterface&lt;/span&gt;&lt;span class="p"&gt;({&lt;/span&gt; &lt;span class="na"&gt;input&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="nx"&gt;fileStream&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="na"&gt;crlfDelay&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="kc"&gt;Infinity&lt;/span&gt; &lt;span class="p"&gt;});&lt;/span&gt;

    &lt;span class="kd"&gt;const&lt;/span&gt; &lt;span class="nx"&gt;stats&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="p"&gt;{};&lt;/span&gt;

    &lt;span class="k"&gt;for&lt;/span&gt; &lt;span class="k"&gt;await &lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="kd"&gt;const&lt;/span&gt; &lt;span class="nx"&gt;line&lt;/span&gt; &lt;span class="k"&gt;of&lt;/span&gt; &lt;span class="nx"&gt;rl&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt;
        &lt;span class="kd"&gt;const&lt;/span&gt; &lt;span class="nx"&gt;item&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="nx"&gt;JSON&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;parse&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="nx"&gt;line&lt;/span&gt;&lt;span class="p"&gt;);&lt;/span&gt;
        &lt;span class="nx"&gt;item&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nx"&gt;products&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;forEach&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="nx"&gt;product&lt;/span&gt; &lt;span class="o"&gt;=&amp;gt;&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt;
            &lt;span class="kd"&gt;const&lt;/span&gt; &lt;span class="nx"&gt;brand&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="nx"&gt;product&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nx"&gt;brand&lt;/span&gt; &lt;span class="o"&gt;||&lt;/span&gt; &lt;span class="dl"&gt;"&lt;/span&gt;&lt;span class="s2"&gt;Unknown&lt;/span&gt;&lt;span class="dl"&gt;"&lt;/span&gt;&lt;span class="p"&gt;;&lt;/span&gt;
            &lt;span class="kd"&gt;const&lt;/span&gt; &lt;span class="nx"&gt;price&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="nx"&gt;product&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nx"&gt;priceValue&lt;/span&gt; &lt;span class="o"&gt;||&lt;/span&gt; &lt;span class="mi"&gt;0&lt;/span&gt;&lt;span class="p"&gt;;&lt;/span&gt;

            &lt;span class="k"&gt;if &lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="o"&gt;!&lt;/span&gt;&lt;span class="nx"&gt;stats&lt;/span&gt;&lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="nx"&gt;brand&lt;/span&gt;&lt;span class="p"&gt;])&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt;
                &lt;span class="nx"&gt;stats&lt;/span&gt;&lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="nx"&gt;brand&lt;/span&gt;&lt;span class="p"&gt;]&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt; &lt;span class="na"&gt;count&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="mi"&gt;0&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="na"&gt;totalCash&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="mi"&gt;0&lt;/span&gt; &lt;span class="p"&gt;};&lt;/span&gt;
            &lt;span class="p"&gt;}&lt;/span&gt;
            &lt;span class="nx"&gt;stats&lt;/span&gt;&lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="nx"&gt;brand&lt;/span&gt;&lt;span class="p"&gt;].&lt;/span&gt;&lt;span class="nx"&gt;count&lt;/span&gt;&lt;span class="o"&gt;++&lt;/span&gt;&lt;span class="p"&gt;;&lt;/span&gt;
            &lt;span class="nx"&gt;stats&lt;/span&gt;&lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="nx"&gt;brand&lt;/span&gt;&lt;span class="p"&gt;].&lt;/span&gt;&lt;span class="nx"&gt;totalCash&lt;/span&gt; &lt;span class="o"&gt;+=&lt;/span&gt; &lt;span class="nx"&gt;price&lt;/span&gt;&lt;span class="p"&gt;;&lt;/span&gt;
        &lt;span class="p"&gt;});&lt;/span&gt;
    &lt;span class="p"&gt;}&lt;/span&gt;

    &lt;span class="kd"&gt;const&lt;/span&gt; &lt;span class="nx"&gt;report&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="nb"&gt;Object&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;keys&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="nx"&gt;stats&lt;/span&gt;&lt;span class="p"&gt;).&lt;/span&gt;&lt;span class="nf"&gt;map&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="nx"&gt;brand&lt;/span&gt; &lt;span class="o"&gt;=&amp;gt;&lt;/span&gt; &lt;span class="p"&gt;({&lt;/span&gt;
        &lt;span class="na"&gt;Brand&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="nx"&gt;brand&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
        &lt;span class="na"&gt;ProductCount&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="nx"&gt;stats&lt;/span&gt;&lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="nx"&gt;brand&lt;/span&gt;&lt;span class="p"&gt;].&lt;/span&gt;&lt;span class="nx"&gt;count&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
        &lt;span class="na"&gt;AvgPrice&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="nx"&gt;stats&lt;/span&gt;&lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="nx"&gt;brand&lt;/span&gt;&lt;span class="p"&gt;].&lt;/span&gt;&lt;span class="nx"&gt;totalCash&lt;/span&gt; &lt;span class="o"&gt;/&lt;/span&gt; &lt;span class="nx"&gt;stats&lt;/span&gt;&lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="nx"&gt;brand&lt;/span&gt;&lt;span class="p"&gt;].&lt;/span&gt;&lt;span class="nx"&gt;count&lt;/span&gt;&lt;span class="p"&gt;).&lt;/span&gt;&lt;span class="nf"&gt;toFixed&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="mi"&gt;2&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
    &lt;span class="p"&gt;})).&lt;/span&gt;&lt;span class="nf"&gt;sort&lt;/span&gt;&lt;span class="p"&gt;((&lt;/span&gt;&lt;span class="nx"&gt;a&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="nx"&gt;b&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt; &lt;span class="o"&gt;=&amp;gt;&lt;/span&gt; &lt;span class="nx"&gt;b&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nx"&gt;ProductCount&lt;/span&gt; &lt;span class="o"&gt;-&lt;/span&gt; &lt;span class="nx"&gt;a&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nx"&gt;ProductCount&lt;/span&gt;&lt;span class="p"&gt;);&lt;/span&gt;

    &lt;span class="nx"&gt;console&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;table&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="nx"&gt;report&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;slice&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="mi"&gt;0&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="mi"&gt;10&lt;/span&gt;&lt;span class="p"&gt;));&lt;/span&gt; &lt;span class="c1"&gt;// Show Top 10&lt;/span&gt;
&lt;span class="p"&gt;}&lt;/span&gt;

&lt;span class="nf"&gt;analyzeData&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="dl"&gt;'&lt;/span&gt;&lt;span class="s1"&gt;your_output_file.jsonl&lt;/span&gt;&lt;span class="dl"&gt;'&lt;/span&gt;&lt;span class="p"&gt;);&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;
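
&lt;p&gt;The analyzer above reports raw counts. To express Share of Shelf as a percentage, divide each brand's count by the total. The following is a minimal sketch that assumes the same &lt;code&gt;stats&lt;/code&gt; shape as the analyzer; &lt;code&gt;addShareOfShelf&lt;/code&gt; is a hypothetical helper and the sample numbers are invented for illustration.&lt;/p&gt;

```javascript
// Sketch: convert per-brand counts into Share of Shelf percentages.
function addShareOfShelf(stats) {
    const total = Object.values(stats).reduce((sum, s) => sum + s.count, 0);
    return Object.entries(stats)
        .map(([brand, s]) => ({
            Brand: brand,
            ProductCount: s.count,
            // Guard against an empty dataset to avoid dividing by zero.
            ShareOfShelf: total === 0 ? "0.0%" : `${((s.count * 100) / total).toFixed(1)}%`,
        }))
        .sort((a, b) => b.ProductCount - a.ProductCount);
}

// Hypothetical counts chosen so the total is 200 products.
const sampleStats = {
    "Brand A": { count: 45, totalCash: 562.5 },
    "Brand B": { count: 30, totalCash: 2550 },
    "Other": { count: 125, totalCash: 5000 },
};
const shareReport = addShareOfShelf(sampleStats);
console.table(shareReport);
```

&lt;p&gt;With a total of 200 products, 45 items work out to a 22.5% share, which is the kind of figure interpreted in the next step.&lt;/p&gt;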



&lt;h2&gt;
  
  
  Step 5: Interpreting the Data
&lt;/h2&gt;

&lt;p&gt;The analyzer produces a table revealing the power dynamics of the category. Here is a hypothetical example of skincare data:&lt;/p&gt;

&lt;div class="table-wrapper-paragraph"&gt;&lt;table&gt;
&lt;thead&gt;
&lt;tr&gt;
&lt;th&gt;Brand&lt;/th&gt;
&lt;th&gt;Product Count&lt;/th&gt;
&lt;th&gt;Avg Price&lt;/th&gt;
&lt;th&gt;Share of Shelf&lt;/th&gt;
&lt;/tr&gt;
&lt;/thead&gt;
&lt;tbody&gt;
&lt;tr&gt;
&lt;td&gt;Brand A&lt;/td&gt;
&lt;td&gt;45&lt;/td&gt;
&lt;td&gt;$12.50&lt;/td&gt;
&lt;td&gt;22.5%&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Brand B&lt;/td&gt;
&lt;td&gt;30&lt;/td&gt;
&lt;td&gt;$85.00&lt;/td&gt;
&lt;td&gt;15.0%&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Brand C&lt;/td&gt;
&lt;td&gt;12&lt;/td&gt;
&lt;td&gt;$42.00&lt;/td&gt;
&lt;td&gt;6.0%&lt;/td&gt;
&lt;/tr&gt;
&lt;/tbody&gt;
&lt;/table&gt;&lt;/div&gt;

&lt;h3&gt;
  
  
  Identifying the Gaps
&lt;/h3&gt;

&lt;ol&gt;
&lt;li&gt; &lt;strong&gt;Pricing Gaps&lt;/strong&gt;: If Brand B dominates the high-end ($80+) and Brand A dominates the budget tier ($10-$20), but there are few products in the $40-$60 range, you have identified a pricing gap.&lt;/li&gt;
&lt;li&gt; &lt;strong&gt;Brand Saturation&lt;/strong&gt;: If the top three brands account for 60% of "New Arrivals," the category is highly consolidated. A newcomer would need a significant marketing budget to compete for visibility.&lt;/li&gt;
&lt;li&gt; &lt;strong&gt;Assortment Gaps&lt;/strong&gt;: If 80% of "New Arrivals" are serums but only 2% are cleansers, there is a clear product type gap.&lt;/li&gt;
&lt;/ol&gt;
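
&lt;p&gt;Pricing gaps become easier to spot once products are grouped into tiers. The sketch below is one possible approach; the tier boundaries, the &lt;code&gt;TIERS&lt;/code&gt; table, and the &lt;code&gt;countByTier&lt;/code&gt; helper are all assumptions to adjust for your own category.&lt;/p&gt;

```javascript
// Hypothetical tier boundaries; adjust them to the category you are studying.
const TIERS = [
    { label: "Budget (under $20)", max: 20 },
    { label: "Mid ($20 to $60)", max: 60 },
    { label: "Premium ($60 and up)", max: Infinity },
];

function countByTier(products) {
    const counts = Object.fromEntries(TIERS.map((t) => [t.label, 0]));
    for (const product of products) {
        // Skip entries without a usable price instead of counting them as $0.
        if (!Number.isFinite(product.priceValue)) continue;
        // The first tier whose upper bound exceeds the price is the match.
        const tier = TIERS.find((t) => t.max > product.priceValue);
        counts[tier.label] += 1;
    }
    return counts;
}

const sampleProducts = [
    { brand: "Brand A", priceValue: 12.5 },
    { brand: "Brand A", priceValue: 14 },
    { brand: "Brand B", priceValue: 85 },
    { brand: "Brand B", priceValue: 90 },
    { brand: "Brand C", priceValue: 42 },
    { brand: "Brand C" },
];
const tierCounts = countByTier(sampleProducts);
console.table(tierCounts);
```

&lt;p&gt;A tier with a noticeably low count relative to its neighbors is a candidate pricing gap worth investigating further.&lt;/p&gt;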

&lt;h2&gt;
  
  
  Recommended Approaches &amp;amp; Anti-Bot Considerations
&lt;/h2&gt;

&lt;p&gt;When scraping e-commerce sites, keep these points in mind:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;  &lt;strong&gt;Respect the Server&lt;/strong&gt;: Even if you can send 100 requests per second, don't. Use a concurrency of 1 or 2 to stay under the radar and prevent server strain.&lt;/li&gt;
&lt;li&gt;  &lt;strong&gt;Use Proxies&lt;/strong&gt;: Beautylish uses bot protection that often flags data center IPs. The ScrapeOps integration in these scripts helps bypass 403 Forbidden errors by using residential proxies.&lt;/li&gt;
&lt;li&gt;  &lt;strong&gt;Data Cleaning&lt;/strong&gt;: Scraped data is rarely perfect. Use fallbacks in your code, such as &lt;code&gt;brand || "Unknown"&lt;/code&gt;, so that entries with missing fields are grouped under a placeholder instead of breaking the analysis script.&lt;/li&gt;
&lt;/ul&gt;
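
&lt;p&gt;The data-cleaning advice above can be sketched as a single normalization step that runs before analysis. &lt;code&gt;cleanProduct&lt;/code&gt; is a hypothetical helper, and the field names are assumed to match the scraper output shown earlier.&lt;/p&gt;

```javascript
// Sketch: normalize one scraped product record before it reaches the analyzer.
function cleanProduct(raw) {
    // Collapse repeated whitespace so the same brand is not split into two groups.
    const brand = typeof raw.brand === "string" ? raw.brand.replace(/\s+/g, " ").trim() : "";
    const name = typeof raw.name === "string" ? raw.name.trim() : "";
    const price = typeof raw.priceValue === "number" ? raw.priceValue : NaN;
    return {
        brand: brand || "Unknown",
        name: name,
        // null marks a missing price so averages can exclude it explicitly.
        priceValue: Number.isFinite(price) ? price : null,
    };
}

console.log(cleanProduct({ brand: "  Brand   A ", name: " Serum ", priceValue: 12.5 }));
console.log(cleanProduct({ name: "Mystery item" }));
```

&lt;p&gt;Marking missing prices as &lt;code&gt;null&lt;/code&gt; rather than &lt;code&gt;0&lt;/code&gt; is a design choice: it lets you leave those items out of the average instead of dragging it down.&lt;/p&gt;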

&lt;h2&gt;
  
  
  To Wrap Up
&lt;/h2&gt;

&lt;p&gt;By moving from manual browsing to automated extraction, you turn a website into a structured database. This workflow of &lt;strong&gt;Scrape, Clean, Analyze&lt;/strong&gt; is the foundation of modern e-commerce intelligence.&lt;/p&gt;

&lt;p&gt;You now have a system to extract brand and price data, handle large datasets with JSONL, and calculate the metrics needed to find market gaps. To take this further, try running the scraper on a schedule. By comparing Share of Shelf week-over-week, you can see which brands are losing momentum and which newcomers are starting to take over.&lt;/p&gt;
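
&lt;p&gt;As a starting point for that week-over-week comparison, here is a minimal sketch. It assumes you have saved each run's Share of Shelf as a plain object of percentages keyed by brand; &lt;code&gt;compareSnapshots&lt;/code&gt; and the sample figures are hypothetical.&lt;/p&gt;

```javascript
// Sketch: compare two Share of Shelf snapshots (percent values keyed by brand).
function compareSnapshots(previous, current) {
    const brands = new Set([...Object.keys(previous), ...Object.keys(current)]);
    return [...brands]
        .map((brand) => {
            // A brand missing from a snapshot is treated as holding 0% that week.
            const before = previous[brand] ?? 0;
            const after = current[brand] ?? 0;
            return { Brand: brand, Before: before, After: after, Change: Number((after - before).toFixed(1)) };
        })
        .sort((a, b) => b.Change - a.Change);
}

// Hypothetical snapshots from two consecutive weekly runs.
const lastWeek = { "Brand A": 22.5, "Brand B": 15.0 };
const thisWeek = { "Brand A": 18.0, "Brand B": 15.0, "Brand C": 6.0 };
const changes = compareSnapshots(lastWeek, thisWeek);
console.table(changes);
```

&lt;p&gt;Sorting by &lt;code&gt;Change&lt;/code&gt; puts the fastest-growing brands at the top and the ones losing shelf space at the bottom.&lt;/p&gt;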

</description>
      <category>webscraping</category>
      <category>webdev</category>
      <category>devops</category>
      <category>node</category>
    </item>
    <item>
      <title>Beyond requests.get: Analyzing the Architecture of an AI-Generated Spider</title>
      <dc:creator>Jonathan D. Fisher</dc:creator>
      <pubDate>Tue, 10 Mar 2026 16:13:00 +0000</pubDate>
      <link>https://dev.to/jonathanfisher/beyond-requestsget-analyzing-the-architecture-of-an-ai-generated-spider-11da</link>
      <guid>https://dev.to/jonathanfisher/beyond-requestsget-analyzing-the-architecture-of-an-ai-generated-spider-11da</guid>
      <description>&lt;p&gt;There is a common stigma that AI-generated code is "toy-grade"—fine for a quick script, but too messy for a production pipeline. We often expect to see spaghetti code that lacks error handling, deduplication, or stealth.&lt;/p&gt;

&lt;p&gt;However, the reality is shifting. Modern AI-generated scrapers increasingly use sophisticated design patterns that many developers miss on their first pass. We’ve seen this in the &lt;a href="https://github.com/scraper-bank/Beautylish.com-Scrapers.git" rel="noopener noreferrer"&gt;Beautylish.com-Scrapers&lt;/a&gt; repository, which contains production-ready spiders for both Python and Node.js.&lt;/p&gt;

&lt;p&gt;By dissecting the &lt;code&gt;beautylish_scraper_product_data_v1.py&lt;/code&gt; script, we can look past simple &lt;code&gt;requests.get&lt;/code&gt; calls to see how to implement stealth, robust data pipelines, and intelligent extraction strategies that withstand modern anti-bot measures.&lt;/p&gt;

&lt;h2&gt;
  
  
  Why requests.get Fails
&lt;/h2&gt;

&lt;p&gt;Modern e-commerce sites like Beautylish present significant hurdles for basic scraping scripts. Fetching a product page using a standard HTTP client usually leads to three major problems:&lt;/p&gt;

&lt;ol&gt;
&lt;li&gt; &lt;strong&gt;Dynamic Content&lt;/strong&gt;: Beautylish uses frontend frameworks like React or Next.js. Much of the product data is hydrated into the DOM via JavaScript after the initial page load. A simple &lt;code&gt;GET&lt;/code&gt; request sees only an empty shell.&lt;/li&gt;
&lt;li&gt; &lt;strong&gt;Anti-Bot Measures&lt;/strong&gt;: High-traffic retail sites use fingerprinting to detect automated scripts, looking for "headless" browser signatures and non-residential IP ranges.&lt;/li&gt;
&lt;li&gt; &lt;strong&gt;Data Fragility&lt;/strong&gt;: Layouts change. If a scraper relies on a single CSS selector for the price, it breaks the moment the UI is updated.&lt;/li&gt;
&lt;/ol&gt;

&lt;p&gt;To solve this, the architecture in our repository moves away from simple requests toward a "Browser-First" approach using Playwright and Puppeteer integrated with residential proxies.&lt;/p&gt;

&lt;h2&gt;
  
  
  Architecture and Configuration
&lt;/h2&gt;

&lt;p&gt;A professional scraper should be maintainable. The Beautylish script follows a clear separation of concerns, splitting logic into three distinct layers:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;  &lt;strong&gt;Configuration&lt;/strong&gt;: Centralized settings for API keys, retries, and browser timeouts.&lt;/li&gt;
&lt;li&gt;  &lt;strong&gt;Data Pipeline&lt;/strong&gt;: A dedicated class for handling deduplication and storage.&lt;/li&gt;
&lt;li&gt;  &lt;strong&gt;Extraction Logic&lt;/strong&gt;: A strategy-based function that tries multiple ways to find data.&lt;/li&gt;
&lt;/ul&gt;
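
&lt;p&gt;The strategy-based extraction idea can be illustrated independently of any browser library. The sketch below is written in JavaScript rather than Python and is not taken from the repository: &lt;code&gt;extractWithFallbacks&lt;/code&gt; and the &lt;code&gt;fakePage&lt;/code&gt; object are hypothetical stand-ins for real page lookups.&lt;/p&gt;

```javascript
// Sketch: try several extraction strategies in order and return the first usable value.
function extractWithFallbacks(strategies) {
    for (const strategy of strategies) {
        try {
            const value = strategy();
            const isBlank = value == null || (typeof value === "string" ? value.trim() === "" : false);
            if (!isBlank) return value;
        } catch (err) {
            // A failing strategy should not abort the remaining ones.
        }
    }
    return null;
}

// Hypothetical page data standing in for structured data, meta tags, and visible text.
const fakePage = { jsonLd: null, metaPrice: "", visiblePrice: " $42.00 " };
const price = extractWithFallbacks([
    () => fakePage.jsonLd.offers.price, // throws because jsonLd is null
    () => fakePage.metaPrice,           // empty string, so it is skipped
    () => fakePage.visiblePrice.trim(), // succeeds
]);
console.log(price);
```

&lt;p&gt;Ordering the strategies from most to least reliable means a layout change degrades the scraper gradually instead of breaking it outright.&lt;/p&gt;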

&lt;p&gt;The script also focuses on dynamic output. Rather than overwriting files, it uses a timestamping utility to ensure every run is isolated, preventing data corruption and simplifying debugging.&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight python"&gt;&lt;code&gt;&lt;span class="k"&gt;def&lt;/span&gt; &lt;span class="nf"&gt;generate_output_filename&lt;/span&gt;&lt;span class="p"&gt;()&lt;/span&gt; &lt;span class="o"&gt;-&amp;gt;&lt;/span&gt; &lt;span class="nb"&gt;str&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;
    &lt;span class="sh"&gt;"""&lt;/span&gt;&lt;span class="s"&gt;Generate output filename with current timestamp.&lt;/span&gt;&lt;span class="sh"&gt;"""&lt;/span&gt;
    &lt;span class="n"&gt;timestamp&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;datetime&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;now&lt;/span&gt;&lt;span class="p"&gt;().&lt;/span&gt;&lt;span class="nf"&gt;strftime&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;%Y%m%d_%H%M%S&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
    &lt;span class="k"&gt;return&lt;/span&gt; &lt;span class="sa"&gt;f&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;beautylish_com_product_page_scraper_data_&lt;/span&gt;&lt;span class="si"&gt;{&lt;/span&gt;&lt;span class="n"&gt;timestamp&lt;/span&gt;&lt;span class="si"&gt;}&lt;/span&gt;&lt;span class="s"&gt;.jsonl&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;h2&gt;
  
  
  The Stealth Layer
&lt;/h2&gt;

&lt;p&gt;To bypass anti-bot protections, the script implements a "Stealth Layer." It uses &lt;code&gt;playwright_stealth&lt;/code&gt; to mask common automation signals, such as the &lt;code&gt;navigator.webdriver&lt;/code&gt; flag, that websites use to identify bots.&lt;/p&gt;

&lt;p&gt;It also integrates ScrapeOps Residential Proxies. Unlike data center IPs, which are easily flagged, residential proxies route traffic through home devices, so requests look much more like those of an ordinary visitor.&lt;/p&gt;
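
&lt;p&gt;A minimal sketch of how proxy credentials might be handed to Playwright is shown here. Playwright's &lt;code&gt;proxy&lt;/code&gt; launch option accepts a dictionary with &lt;code&gt;server&lt;/code&gt;, &lt;code&gt;username&lt;/code&gt;, and &lt;code&gt;password&lt;/code&gt; keys. The helper name, the placeholder server address, and the username value are assumptions for illustration, so take the real values from your proxy provider's documentation.&lt;/p&gt;

```python
def build_proxy_settings(api_key, server="http://proxy.example.com:8181"):
    """Return a dict in the shape Playwright's `proxy` launch option accepts.

    The server address is a placeholder and the username is an assumption;
    neither is taken from the repository.
    """
    if not api_key:
        raise ValueError("A proxy API key is required")
    return {"server": server, "username": "scrapeops", "password": api_key}


# Hypothetical usage with Playwright's async API:
#   browser = await playwright.chromium.launch(
#       headless=True, proxy=build_proxy_settings(API_KEY)
#   )
```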

&lt;p&gt;The architecture initializes the browser context like this:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight python"&gt;&lt;code&gt;&lt;span class="k"&gt;async&lt;/span&gt; &lt;span class="k"&gt;def&lt;/span&gt; &lt;span class="nf"&gt;scrape_page&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;browser&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="n"&gt;Browser&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;url&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="nb"&gt;str&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;pipeline&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="n"&gt;DataPipeline&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;retries&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="nb"&gt;int&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="mi"&gt;3&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt; &lt;span class="o"&gt;-&amp;gt;&lt;/span&gt; &lt;span class="bp"&gt;None&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;
    &lt;span class="n"&gt;context&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="k"&gt;await&lt;/span&gt; &lt;span class="n"&gt;browser&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;new_context&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;
        &lt;span class="n"&gt;ignore_https_errors&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="bp"&gt;True&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
        &lt;span class="n"&gt;viewport&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="p"&gt;{&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;width&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="mi"&gt;1920&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;height&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="mi"&gt;1080&lt;/span&gt;&lt;span class="p"&gt;},&lt;/span&gt;
        &lt;span class="n"&gt;user_agent&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36...&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;
    &lt;span class="p"&gt;)&lt;/span&gt;

    &lt;span class="n"&gt;page&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="k"&gt;await&lt;/span&gt; &lt;span class="n"&gt;context&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;new_page&lt;/span&gt;&lt;span class="p"&gt;()&lt;/span&gt;
    &lt;span class="k"&gt;await&lt;/span&gt; &lt;span class="nf"&gt;stealth_async&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;page&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt; 

    &lt;span class="c1"&gt;# Block unnecessary resources to save bandwidth and speed up load
&lt;/span&gt;    &lt;span class="k"&gt;async&lt;/span&gt; &lt;span class="k"&gt;def&lt;/span&gt; &lt;span class="nf"&gt;block_resources&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;route&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;request&lt;/span&gt;&lt;span class="p"&gt;):&lt;/span&gt;
        &lt;span class="k"&gt;if&lt;/span&gt; &lt;span class="n"&gt;request&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;resource_type&lt;/span&gt; &lt;span class="ow"&gt;in&lt;/span&gt; &lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;image&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;media&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;font&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;]:&lt;/span&gt;
            &lt;span class="k"&gt;await&lt;/span&gt; &lt;span class="n"&gt;route&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;abort&lt;/span&gt;&lt;span class="p"&gt;()&lt;/span&gt;
        &lt;span class="k"&gt;else&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;
            &lt;span class="k"&gt;await&lt;/span&gt; &lt;span class="n"&gt;route&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;continue_&lt;/span&gt;&lt;span class="p"&gt;()&lt;/span&gt;

    &lt;span class="k"&gt;await&lt;/span&gt; &lt;span class="n"&gt;page&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;route&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;**/*&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;block_resources&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
    &lt;span class="k"&gt;await&lt;/span&gt; &lt;span class="n"&gt;page&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;goto&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;url&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;wait_until&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;domcontentloaded&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;timeout&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="mi"&gt;60000&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;Blocking images and fonts reduces proxy load while still allowing JavaScript to execute and populate the data.&lt;/p&gt;

&lt;h2&gt;
  
  
  The DataPipeline Class: Handling Scale
&lt;/h2&gt;

&lt;p&gt;One of the most effective parts of this architecture is the &lt;code&gt;DataPipeline&lt;/code&gt; class. Beginners often store scraped data in a list and write it to a JSON file at the end. This is risky: if the script crashes at item 999 of 1,000, you lose everything.&lt;/p&gt;

&lt;p&gt;The &lt;code&gt;DataPipeline&lt;/code&gt; avoids this by using JSON Lines (JSONL) and append-only writes: each record is written to disk as soon as it is scraped.&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight python"&gt;&lt;code&gt;&lt;span class="k"&gt;class&lt;/span&gt; &lt;span class="nc"&gt;DataPipeline&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;
    &lt;span class="k"&gt;def&lt;/span&gt; &lt;span class="nf"&gt;__init__&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;self&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;jsonl_filename&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;output.jsonl&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;):&lt;/span&gt;
        &lt;span class="n"&gt;self&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;items_seen&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="nf"&gt;set&lt;/span&gt;&lt;span class="p"&gt;()&lt;/span&gt;
        &lt;span class="n"&gt;self&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;jsonl_filename&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;jsonl_filename&lt;/span&gt;

    &lt;span class="k"&gt;def&lt;/span&gt; &lt;span class="nf"&gt;is_duplicate&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;self&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;input_data&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="n"&gt;ScrapedData&lt;/span&gt;&lt;span class="p"&gt;):&lt;/span&gt;
        &lt;span class="n"&gt;item_key&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;input_data&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;url&lt;/span&gt;
        &lt;span class="k"&gt;if&lt;/span&gt; &lt;span class="n"&gt;item_key&lt;/span&gt; &lt;span class="ow"&gt;in&lt;/span&gt; &lt;span class="n"&gt;self&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;items_seen&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;
            &lt;span class="n"&gt;logger&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;warning&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="sa"&gt;f&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;Duplicate item found: &lt;/span&gt;&lt;span class="si"&gt;{&lt;/span&gt;&lt;span class="n"&gt;item_key&lt;/span&gt;&lt;span class="si"&gt;}&lt;/span&gt;&lt;span class="s"&gt;. Skipping.&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
            &lt;span class="k"&gt;return&lt;/span&gt; &lt;span class="bp"&gt;True&lt;/span&gt;
        &lt;span class="n"&gt;self&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;items_seen&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;add&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;item_key&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
        &lt;span class="k"&gt;return&lt;/span&gt; &lt;span class="bp"&gt;False&lt;/span&gt;

    &lt;span class="k"&gt;def&lt;/span&gt; &lt;span class="nf"&gt;add_data&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;self&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;scraped_data&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="n"&gt;ScrapedData&lt;/span&gt;&lt;span class="p"&gt;):&lt;/span&gt;
        &lt;span class="k"&gt;if&lt;/span&gt; &lt;span class="ow"&gt;not&lt;/span&gt; &lt;span class="n"&gt;self&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;is_duplicate&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;scraped_data&lt;/span&gt;&lt;span class="p"&gt;):&lt;/span&gt;
            &lt;span class="c1"&gt;# Append mode ('a') ensures we don't lose data if the script restarts
&lt;/span&gt;            &lt;span class="k"&gt;with&lt;/span&gt; &lt;span class="nf"&gt;open&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;self&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;jsonl_filename&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;mode&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;a&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;encoding&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;UTF-8&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt; &lt;span class="k"&gt;as&lt;/span&gt; &lt;span class="n"&gt;output_file&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;
                &lt;span class="n"&gt;json_line&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;json&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;dumps&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="nf"&gt;asdict&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;scraped_data&lt;/span&gt;&lt;span class="p"&gt;),&lt;/span&gt; &lt;span class="n"&gt;ensure_ascii&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="bp"&gt;False&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
                &lt;span class="n"&gt;output_file&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;write&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;json_line&lt;/span&gt; &lt;span class="o"&gt;+&lt;/span&gt; &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="se"&gt;\n&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
            &lt;span class="n"&gt;logger&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;info&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="sa"&gt;f&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;Saved item to &lt;/span&gt;&lt;span class="si"&gt;{&lt;/span&gt;&lt;span class="n"&gt;self&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;jsonl_filename&lt;/span&gt;&lt;span class="si"&gt;}&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;This approach provides three main benefits:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;  &lt;strong&gt;Memory Efficiency&lt;/strong&gt;: Writing line-by-line means the script doesn't keep the entire dataset in RAM.&lt;/li&gt;
&lt;li&gt;  &lt;strong&gt;Crash Recovery&lt;/strong&gt;: If the scraper stops, the &lt;code&gt;.jsonl&lt;/code&gt; file still contains every record written up to that moment.&lt;/li&gt;
&lt;li&gt;  &lt;strong&gt;Deduplication&lt;/strong&gt;: The &lt;code&gt;items_seen&lt;/code&gt; set prevents saving the same product twice if the crawler reaches the same URL more than once in a run. The set lives in memory, so it starts empty on every run.&lt;/li&gt;
&lt;/ul&gt;
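
&lt;p&gt;Because &lt;code&gt;items_seen&lt;/code&gt; is held in memory, a restarted run does not know which URLs an earlier run already saved. One way to bridge that gap is to rebuild the set from the existing file before scraping. The helper below is a sketch with a hypothetical name, and it assumes each record carries a &lt;code&gt;url&lt;/code&gt; field, as the deduplication key above suggests.&lt;/p&gt;

```python
import json
import os


def load_seen_urls(jsonl_path):
    """Collect the `url` field of every valid record already on disk."""
    seen = set()
    if not os.path.exists(jsonl_path):
        return seen
    with open(jsonl_path, encoding="utf-8") as existing:
        for line in existing:
            line = line.strip()
            if not line:
                continue
            try:
                record = json.loads(line)
            except json.JSONDecodeError:
                # A crash mid-write can leave a truncated final line; skip it.
                continue
            if isinstance(record, dict) and record.get("url"):
                seen.add(record["url"])
    return seen
```

&lt;p&gt;To use it, point the pipeline at the earlier run's file instead of a freshly timestamped one, then assign &lt;code&gt;pipeline.items_seen = load_seen_urls(pipeline.jsonl_filename)&lt;/code&gt; before the first page is scraped.&lt;/p&gt;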

&lt;h2&gt;
  
  
  Intelligent Extraction and Fallback Strategies
&lt;/h2&gt;

&lt;p&gt;The &lt;code&gt;extract_data&lt;/code&gt; function doesn't just look for a CSS class; it uses a multi-tiered strategy to remain resilient against website updates.&lt;/p&gt;

&lt;h3&gt;
  
  
  Strategy 1: JSON-LD
&lt;/h3&gt;

&lt;p&gt;Many modern e-commerce sites embed JSON-LD (Linked Data) for SEO. This structured JSON object sits in a &lt;code&gt;&amp;lt;script&amp;gt;&lt;/code&gt; tag and tends to be more stable than CSS class names because it follows the &lt;a href="https://schema.org/Product" rel="noopener noreferrer"&gt;Schema.org&lt;/a&gt; vocabulary.&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight python"&gt;&lt;code&gt;&lt;span class="n"&gt;json_ld_scripts&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="k"&gt;await&lt;/span&gt; &lt;span class="n"&gt;page&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;locator&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;script[type=&lt;/span&gt;&lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="s"&gt;application/ld+json&lt;/span&gt;&lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="s"&gt;]&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;).&lt;/span&gt;&lt;span class="nf"&gt;all_text_contents&lt;/span&gt;&lt;span class="p"&gt;()&lt;/span&gt;
&lt;span class="n"&gt;json_data&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="bp"&gt;None&lt;/span&gt;
&lt;span class="k"&gt;for&lt;/span&gt; &lt;span class="n"&gt;script&lt;/span&gt; &lt;span class="ow"&gt;in&lt;/span&gt; &lt;span class="n"&gt;json_ld_scripts&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;
    &lt;span class="k"&gt;try&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;
        &lt;span class="n"&gt;data&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;json&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;loads&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;script&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
        &lt;span class="k"&gt;if&lt;/span&gt; &lt;span class="nf"&gt;isinstance&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;data&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="nb"&gt;dict&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt; &lt;span class="ow"&gt;and&lt;/span&gt; &lt;span class="n"&gt;data&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;get&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;@type&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="sh"&gt;""&lt;/span&gt;&lt;span class="p"&gt;).&lt;/span&gt;&lt;span class="nf"&gt;lower&lt;/span&gt;&lt;span class="p"&gt;()&lt;/span&gt; &lt;span class="o"&gt;==&lt;/span&gt; &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;product&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;
            &lt;span class="n"&gt;json_data&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;data&lt;/span&gt;
            &lt;span class="k"&gt;break&lt;/span&gt;
    &lt;span class="k"&gt;except&lt;/span&gt; &lt;span class="nb"&gt;Exception&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;
        &lt;span class="k"&gt;continue&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;
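
&lt;p&gt;The loop above matches a single top-level object whose &lt;code&gt;@type&lt;/code&gt; is a string. JSON-LD in the wild can also arrive as a top-level array, nested under &lt;code&gt;@graph&lt;/code&gt;, or with a list-valued &lt;code&gt;@type&lt;/code&gt;. The following is a sketch of a more tolerant lookup, not the repository's code, and the function names are hypothetical.&lt;/p&gt;

```python
import json


def _is_product(node):
    """True when a JSON-LD node declares the Product type (string or list)."""
    if not isinstance(node, dict):
        return False
    declared = node.get("@type", "")
    types = declared if isinstance(declared, list) else [declared]
    return any(isinstance(t, str) and t.lower() == "product" for t in types)


def find_product(raw_scripts):
    """Return the first Product node found in a list of JSON-LD strings."""
    for raw in raw_scripts:
        try:
            data = json.loads(raw)
        except (TypeError, ValueError):
            continue
        # Normalise the common shapes into one flat list of candidates.
        candidates = data if isinstance(data, list) else [data]
        for candidate in list(candidates):
            if isinstance(candidate, dict) and isinstance(candidate.get("@graph"), list):
                candidates = candidates + candidate["@graph"]
        for candidate in candidates:
            if _is_product(candidate):
                return candidate
    return None
```

&lt;p&gt;It takes the same list of strings that &lt;code&gt;all_text_contents()&lt;/code&gt; returns, so it could stand in for the loop without changing the surrounding code.&lt;/p&gt;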



&lt;h3&gt;
  
  
  Strategy 2: CSS Fallbacks
&lt;/h3&gt;

&lt;p&gt;If JSON-LD is missing or incomplete, the script falls back to DOM scraping.&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight python"&gt;&lt;code&gt;&lt;span class="k"&gt;if&lt;/span&gt; &lt;span class="ow"&gt;not&lt;/span&gt; &lt;span class="n"&gt;brand&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;
    &lt;span class="n"&gt;brand_el&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;page&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;locator&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;.product-brand&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;).&lt;/span&gt;&lt;span class="n"&gt;first&lt;/span&gt;
    &lt;span class="n"&gt;brand&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="k"&gt;await&lt;/span&gt; &lt;span class="n"&gt;brand_el&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;inner_text&lt;/span&gt;&lt;span class="p"&gt;()).&lt;/span&gt;&lt;span class="nf"&gt;strip&lt;/span&gt;&lt;span class="p"&gt;()&lt;/span&gt; &lt;span class="k"&gt;if&lt;/span&gt; &lt;span class="k"&gt;await&lt;/span&gt; &lt;span class="n"&gt;brand_el&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;count&lt;/span&gt;&lt;span class="p"&gt;()&lt;/span&gt; &lt;span class="o"&gt;&amp;gt;&lt;/span&gt; &lt;span class="mi"&gt;0&lt;/span&gt; &lt;span class="k"&gt;else&lt;/span&gt; &lt;span class="sh"&gt;""&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;By prioritizing invisible data (JSON-LD) over the visible UI, the scraper can keep working through a theme change, as long as the site continues to publish its structured data.&lt;/p&gt;
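
&lt;p&gt;To decide when a fallback is needed, it helps to read each field from the JSON-LD object defensively. In Schema.org data, &lt;code&gt;brand&lt;/code&gt; may be a plain string or a nested object with a &lt;code&gt;name&lt;/code&gt;, and &lt;code&gt;offers&lt;/code&gt; may be a single object or a list. The helpers below are an illustrative sketch with hypothetical names. They return an empty value when the field is missing, so the caller knows to try a CSS selector instead.&lt;/p&gt;

```python
def brand_from_json_ld(product):
    """Return the brand name, or an empty string if JSON-LD does not provide one."""
    brand = (product or {}).get("brand", "")
    if isinstance(brand, dict):
        brand = brand.get("name", "")
    return brand.strip() if isinstance(brand, str) else ""


def price_from_json_ld(product):
    """Return the first parseable offer price as a float, or None."""
    offers = (product or {}).get("offers")
    if isinstance(offers, dict):
        offers = [offers]
    for offer in offers or []:
        if not isinstance(offer, dict):
            continue
        try:
            return float(offer.get("price"))
        except (TypeError, ValueError):
            # Missing or non-numeric price; try the next offer.
            continue
    return None
```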

&lt;h2&gt;
  
  
  Concurrency and Error Handling
&lt;/h2&gt;

&lt;p&gt;The architecture is built on Python's &lt;code&gt;asyncio&lt;/code&gt;, so network I/O does not block the script. While one page waits for a proxy response, the event loop can process data from another.&lt;/p&gt;

&lt;p&gt;The script wraps execution in &lt;code&gt;try/except&lt;/code&gt; blocks and uses the &lt;code&gt;logging&lt;/code&gt; module rather than &lt;code&gt;print&lt;/code&gt; statements. This is essential for production, as it allows you to pipe logs to a file or a monitoring service.&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight python"&gt;&lt;code&gt;&lt;span class="k"&gt;async&lt;/span&gt; &lt;span class="k"&gt;def&lt;/span&gt; &lt;span class="nf"&gt;main&lt;/span&gt;&lt;span class="p"&gt;():&lt;/span&gt;
    &lt;span class="n"&gt;tasks&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="nf"&gt;scrape_page&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;browser&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;url&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;pipeline&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt; &lt;span class="k"&gt;for&lt;/span&gt; &lt;span class="n"&gt;url&lt;/span&gt; &lt;span class="ow"&gt;in&lt;/span&gt; &lt;span class="n"&gt;urls&lt;/span&gt;&lt;span class="p"&gt;]&lt;/span&gt;
    &lt;span class="k"&gt;await&lt;/span&gt; &lt;span class="n"&gt;asyncio&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;gather&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="o"&gt;*&lt;/span&gt;&lt;span class="n"&gt;tasks&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt; &lt;span class="c1"&gt;# Run multiple scrapes concurrently
&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;
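
&lt;p&gt;With a long URL list, launching every task at once can open more browser pages than the machine or the proxy plan comfortably handles. A common pattern is to cap the number of tasks in flight with a semaphore. The helper below is a sketch under that assumption. Its name and the default limit of 5 are illustrative and do not come from the repository.&lt;/p&gt;

```python
import asyncio


async def run_bounded(coroutine_factories, limit=5):
    """Run zero-argument coroutine factories with at most `limit` in flight."""
    semaphore = asyncio.Semaphore(limit)

    async def guarded(factory):
        async with semaphore:
            return await factory()

    # return_exceptions=True collects failures as results instead of
    # raising the first one, so every page gets an outcome.
    return await asyncio.gather(
        *(guarded(factory) for factory in coroutine_factories),
        return_exceptions=True,
    )
```

&lt;p&gt;Inside &lt;code&gt;main()&lt;/code&gt;, this could be called as &lt;code&gt;await run_bounded([lambda u=url: scrape_page(browser, u, pipeline) for url in urls])&lt;/code&gt;. Any exception objects in the returned list can then be logged.&lt;/p&gt;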



&lt;h2&gt;
  
  
  To Wrap Up
&lt;/h2&gt;

&lt;p&gt;This Beautylish scraper demonstrates that AI can implement design patterns that ensure data integrity and stealth at scale. &lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Key Takeaways:&lt;/strong&gt;&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;  &lt;strong&gt;Stealth is mandatory&lt;/strong&gt;: Use plugins like &lt;code&gt;playwright-stealth&lt;/code&gt; and residential proxies to avoid detection.&lt;/li&gt;
&lt;li&gt;  &lt;strong&gt;JSONL over JSON&lt;/strong&gt;: Use streamable formats to protect data from crashes and minimize RAM usage.&lt;/li&gt;
&lt;li&gt;  &lt;strong&gt;Extract structure, not style&lt;/strong&gt;: Prioritize JSON-LD and Schema.org data over brittle CSS selectors.&lt;/li&gt;
&lt;li&gt;  &lt;strong&gt;Deduplicate at the source&lt;/strong&gt;: Use a DataPipeline class to manage state and prevent duplicate records.&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;To see these patterns in action, you can clone the full repository:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight shell"&gt;&lt;code&gt;git clone https://github.com/scraper-bank/Beautylish.com-Scrapers.git
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;Use this architecture as a template for your next project. It solves the common problems of scraping—blocking, duplicates, and storage—right from the start.&lt;/p&gt;

</description>
      <category>ai</category>
      <category>webdev</category>
      <category>node</category>
      <category>webscraping</category>
    </item>
    <item>
      <title>Stop Waiting for Engineers: Build a Competitor Price Monitor in 15 Minutes</title>
      <dc:creator>Jonathan D. Fisher</dc:creator>
      <pubDate>Mon, 09 Mar 2026 16:08:00 +0000</pubDate>
      <link>https://dev.to/jonathanfisher/stop-waiting-for-engineers-build-a-competitor-price-monitor-in-15-minutes-2noa</link>
      <guid>https://dev.to/jonathanfisher/stop-waiting-for-engineers-build-a-competitor-price-monitor-in-15-minutes-2noa</guid>
      <description>&lt;p&gt;The "Data Breadline" is a frustrating place to be. It’s that invisible queue where growth marketers, pricing analysts, and product managers wait for weeks—sometimes months—for engineering to fulfill a "simple" data extraction ticket. In the fast-moving world of e-commerce, waiting three weeks to see a competitor's price drop means you've already lost the sale.&lt;/p&gt;

&lt;p&gt;Dynamic pricing isn't just for airlines anymore. Retail giants like Crate &amp;amp; Barrel adjust prices constantly, and to stay competitive, you need that data now. &lt;/p&gt;

&lt;p&gt;This guide shows you how to bypass the engineering backlog entirely. By the end, you will have a functional competitor price monitor running on your machine. It will extract product names, prices, and availability from Crate &amp;amp; Barrel into a clean spreadsheet. No coding mastery required—just a bit of confidence with a terminal.&lt;/p&gt;

&lt;h2&gt;
  
  
  Prerequisites
&lt;/h2&gt;

&lt;p&gt;You’ll need a few basic tools installed. These are standard "set it and forget it" utilities.&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;  &lt;strong&gt;Python 3.x&lt;/strong&gt;: The engine that runs the script. Download it at &lt;a href="https://www.python.org/downloads/" rel="noopener noreferrer"&gt;python.org&lt;/a&gt;.&lt;/li&gt;
&lt;li&gt;  &lt;strong&gt;VS Code (Optional)&lt;/strong&gt;: A clean text editor to view your files. Download it at &lt;a href="https://code.visualstudio.com/" rel="noopener noreferrer"&gt;code.visualstudio.com&lt;/a&gt;.&lt;/li&gt;
&lt;li&gt;  &lt;strong&gt;A ScrapeOps API Key&lt;/strong&gt;: This handles proxy rotation and anti-bot bypass so you don't get blocked. You can grab a free key at &lt;a href="https://scrapeops.io/app/register/ai-scraper" rel="noopener noreferrer"&gt;ScrapeOps.io&lt;/a&gt;.&lt;/li&gt;
&lt;/ul&gt;

&lt;h2&gt;
  
  
  Phase 1: The Setup
&lt;/h2&gt;

&lt;p&gt;We aren't going to write a scraper from scratch. Instead, we’ll use a production-ready template from the ScrapeOps &lt;strong&gt;Scraper Bank&lt;/strong&gt;. This repository contains optimized logic specifically for Crate &amp;amp; Barrel.&lt;/p&gt;

&lt;p&gt;First, download the code. You can use Git or simply download the ZIP file from the &lt;a href="https://github.com/scraper-bank/Crateandbarrel.com-Scrapers.git" rel="noopener noreferrer"&gt;Crate &amp;amp; Barrel Scraper repository&lt;/a&gt;.&lt;/p&gt;

&lt;p&gt;Once downloaded, open your terminal (or Command Prompt) and navigate to the project folder. We need to install two essential Python libraries: &lt;code&gt;requests&lt;/code&gt;, to send messages to the website, and &lt;code&gt;beautifulsoup4&lt;/code&gt;, to read the HTML data.&lt;/p&gt;

&lt;p&gt;Run this command:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight shell"&gt;&lt;code&gt;pip &lt;span class="nb"&gt;install &lt;/span&gt;requests beautifulsoup4
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;h2&gt;
  
  
  Phase 2: Configuring the Script
&lt;/h2&gt;

&lt;p&gt;Now, let’s point the script in the right direction. We will use the BeautifulSoup implementation because it is fast, lightweight, and perfect for price monitoring.&lt;/p&gt;

&lt;p&gt;Locate this file in your folder:&lt;br&gt;
&lt;code&gt;python/BeautifulSoup/product_data/scraper/crateandbarrel_scraper_product_data_v1.py&lt;/code&gt;&lt;/p&gt;

&lt;p&gt;Open it in your text editor. You only need to change one line to make it work. Look for the &lt;code&gt;API_KEY&lt;/code&gt; variable at the top of the file:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight python"&gt;&lt;code&gt;&lt;span class="kn"&gt;import&lt;/span&gt; &lt;span class="n"&gt;requests&lt;/span&gt;
&lt;span class="kn"&gt;from&lt;/span&gt; &lt;span class="n"&gt;bs4&lt;/span&gt; &lt;span class="kn"&gt;import&lt;/span&gt; &lt;span class="n"&gt;BeautifulSoup&lt;/span&gt;

&lt;span class="c1"&gt;# PASTE YOUR KEY HERE
&lt;/span&gt;&lt;span class="n"&gt;API_KEY&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;YOUR_SCRAPEOPS_API_KEY&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt; 
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;h3&gt;
  
  
  Monitoring Multiple Products
&lt;/h3&gt;

&lt;p&gt;The default script is set up to scrape a single example URL. To turn this into a real monitor, we want to loop through a list of products, such as your Top 50 SKUs. &lt;/p&gt;

&lt;p&gt;Replace the execution block at the bottom of your script with this snippet:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight python"&gt;&lt;code&gt;&lt;span class="k"&gt;if&lt;/span&gt; &lt;span class="n"&gt;__name__&lt;/span&gt; &lt;span class="o"&gt;==&lt;/span&gt; &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;__main__&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;
    &lt;span class="c1"&gt;# List the URLs you want to track
&lt;/span&gt;    &lt;span class="n"&gt;competitor_urls&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="p"&gt;[&lt;/span&gt;
        &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;https://www.crateandbarrel.com/example-product-1&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
        &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;https://www.crateandbarrel.com/example-product-2&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
    &lt;span class="p"&gt;]&lt;/span&gt;

    &lt;span class="n"&gt;pipeline&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="nc"&gt;DataPipeline&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;jsonl_filename&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;competitor_prices.jsonl&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;

    &lt;span class="k"&gt;for&lt;/span&gt; &lt;span class="n"&gt;url&lt;/span&gt; &lt;span class="ow"&gt;in&lt;/span&gt; &lt;span class="n"&gt;competitor_urls&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;
        &lt;span class="nf"&gt;print&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="sa"&gt;f&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;Monitoring: &lt;/span&gt;&lt;span class="si"&gt;{&lt;/span&gt;&lt;span class="n"&gt;url&lt;/span&gt;&lt;span class="si"&gt;}&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
        &lt;span class="c1"&gt;# Logic to call extract_data goes here...
&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;h2&gt;
  
  
  Phase 3: Running the Monitor
&lt;/h2&gt;

&lt;p&gt;Go back to your terminal, ensure you are in the directory containing your script, and run:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight shell"&gt;&lt;code&gt;python crateandbarrel_scraper_product_data_v1.py
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;You’ll see text scrolling by. These are logs telling you exactly what the script is doing. Look for a message like:&lt;br&gt;
&lt;code&gt;INFO:root:Saved item: [Product Name]&lt;/code&gt;&lt;/p&gt;

&lt;p&gt;This confirms the script successfully bypassed Crate &amp;amp; Barrel's anti-bot protections using the ScrapeOps proxy and saved the data to &lt;code&gt;competitor_prices.jsonl&lt;/code&gt;, the filename passed to &lt;code&gt;DataPipeline&lt;/code&gt; in the snippet above.&lt;/p&gt;

&lt;h2&gt;
  
  
  Phase 4: From JSONL to Excel
&lt;/h2&gt;

&lt;p&gt;The script outputs data in &lt;strong&gt;JSONL (JSON Lines)&lt;/strong&gt; format. Developers use this format because it’s efficient for large datasets, but most analysts prefer a spreadsheet. &lt;/p&gt;

&lt;p&gt;If you try to open a &lt;code&gt;.jsonl&lt;/code&gt; file in Excel, it will look like a jumbled mess. We can fix this with a small helper script. Create a new file named &lt;code&gt;converter.py&lt;/code&gt; in the same folder and paste this code:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight python"&gt;&lt;code&gt;&lt;span class="kn"&gt;import&lt;/span&gt; &lt;span class="n"&gt;pandas&lt;/span&gt; &lt;span class="k"&gt;as&lt;/span&gt; &lt;span class="n"&gt;pd&lt;/span&gt;
&lt;span class="kn"&gt;import&lt;/span&gt; &lt;span class="n"&gt;json&lt;/span&gt;

&lt;span class="c1"&gt;# This script turns your raw data into a clean spreadsheet
&lt;/span&gt;&lt;span class="n"&gt;data&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="p"&gt;[]&lt;/span&gt;
&lt;span class="k"&gt;with&lt;/span&gt; &lt;span class="nf"&gt;open&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="s"&gt;competitor_prices.jsonl&lt;/span&gt;&lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="s"&gt;r&lt;/span&gt;&lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;encoding&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="s"&gt;utf-8&lt;/span&gt;&lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt; &lt;span class="k"&gt;as&lt;/span&gt; &lt;span class="n"&gt;f&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;
    &lt;span class="k"&gt;for&lt;/span&gt; &lt;span class="n"&gt;line&lt;/span&gt; &lt;span class="ow"&gt;in&lt;/span&gt; &lt;span class="n"&gt;f&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;
        &lt;span class="n"&gt;data&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;append&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;json&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;loads&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;line&lt;/span&gt;&lt;span class="p"&gt;))&lt;/span&gt;

&lt;span class="n"&gt;df&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;pd&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nc"&gt;DataFrame&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;data&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
&lt;span class="c1"&gt;# We only want the most important columns for our monitor
&lt;/span&gt;&lt;span class="n"&gt;columns_to_keep&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="s"&gt;name&lt;/span&gt;&lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="s"&gt;productId&lt;/span&gt;&lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="s"&gt;price&lt;/span&gt;&lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="s"&gt;availability&lt;/span&gt;&lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="s"&gt;url&lt;/span&gt;&lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="p"&gt;]&lt;/span&gt;
&lt;span class="n"&gt;df&lt;/span&gt;&lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="n"&gt;columns_to_keep&lt;/span&gt;&lt;span class="p"&gt;].&lt;/span&gt;&lt;span class="nf"&gt;to_csv&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="s"&gt;price_report.csv&lt;/span&gt;&lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;index&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="bp"&gt;False&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;

&lt;span class="nf"&gt;print&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;Success! Open price_report.csv in Excel or Google Sheets.&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;Run &lt;code&gt;python converter.py&lt;/code&gt; to generate a clean CSV file ready for your weekly pricing meeting.&lt;/p&gt;
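&lt;p&gt;If pandas is not installed, the same conversion can be sketched with only the Python standard library. This is a minimal sketch rather than part of the original script: the function name &lt;code&gt;jsonl_to_csv&lt;/code&gt; is hypothetical, and it assumes every non-empty line of the JSONL file is a single JSON object. Fields that a record lacks are written as empty cells instead of raising an error.&lt;/p&gt;

```python
import csv
import json


def jsonl_to_csv(jsonl_path, csv_path, columns):
    """Copy the chosen columns from a JSONL file into a CSV file.

    Hypothetical helper for illustration; returns the number of rows written.
    """
    rows_written = 0
    with open(jsonl_path, "r", encoding="utf-8") as src, \
            open(csv_path, "w", newline="", encoding="utf-8") as dst:
        # extrasaction="ignore" drops keys that are not in `columns`;
        # restval="" fills in columns that a record does not have.
        writer = csv.DictWriter(dst, fieldnames=columns,
                                extrasaction="ignore", restval="")
        writer.writeheader()
        for line in src:
            line = line.strip()
            if not line:
                continue  # skip blank lines
            writer.writerow(json.loads(line))
            rows_written += 1
    return rows_written
```

&lt;p&gt;Calling &lt;code&gt;jsonl_to_csv('competitor_prices.jsonl', 'price_report.csv', ['name', 'productId', 'price', 'availability', 'url'])&lt;/code&gt; would mirror the pandas version above, again assuming that is the file your scraper produced.&lt;/p&gt;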

&lt;h2&gt;
  
  
  Phase 5: Scaling &amp;amp; Automation
&lt;/h2&gt;

&lt;p&gt;You’ve built a monitor for one competitor, but you likely need to track others like Pottery Barn or IKEA. &lt;/p&gt;

&lt;p&gt;The logic we used—sending a request, parsing the HTML with BeautifulSoup, and saving to JSONL—is the standard blueprint for web scraping. While every site has different "CSS selectors" (the labels for price and name), the underlying infrastructure remains the same.&lt;/p&gt;

&lt;p&gt;If you don't want to hunt for selectors on every new site, you can use the &lt;a href="https://scrapeops.io/app/register/ai-scraper-generator" rel="noopener noreferrer"&gt;ScrapeOps AI Scraper Generator&lt;/a&gt;. Provide a URL, and it generates the Python code for you, formatted exactly like the Crate &amp;amp; Barrel script we used.&lt;/p&gt;

&lt;h2&gt;
  
  
  To Wrap Up
&lt;/h2&gt;

&lt;p&gt;You have officially graduated from the "Data Breadline." By using pre-made open-source scripts and a reliable proxy API, you've built a professional data pipeline in minutes rather than weeks.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Key Takeaways:&lt;/strong&gt;&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;  &lt;strong&gt;Don't reinvent the wheel:&lt;/strong&gt; Use the Scraper Bank for production-ready templates.&lt;/li&gt;
&lt;li&gt;  &lt;strong&gt;Prevent blocks early:&lt;/strong&gt; Use ScrapeOps to handle proxy rotation so you don't waste time debugging 403 Forbidden errors.&lt;/li&gt;
&lt;li&gt;  &lt;strong&gt;Format for the user:&lt;/strong&gt; Always include a conversion step to get data into CSV or Excel for your stakeholders.&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;Your next step? Set a calendar reminder to run this script every Monday morning, or look into "Cron Jobs" to automate the execution entirely. You now have the tools to base pricing decisions on fresh data each time the script runs.&lt;/p&gt;

</description>
      <category>webdev</category>
      <category>webscraping</category>
      <category>devops</category>
    </item>
    <item>
      <title>Tracking Search Rankings &amp; SEO on Depop</title>
      <dc:creator>Jonathan D. Fisher</dc:creator>
      <pubDate>Sat, 07 Mar 2026 16:10:00 +0000</pubDate>
      <link>https://dev.to/jonathanfisher/tracking-search-rankings-seo-on-depop-2o59</link>
      <guid>https://dev.to/jonathanfisher/tracking-search-rankings-seo-on-depop-2o59</guid>
      <description>&lt;p&gt;Visibility drives sales on Depop. For high-volume sellers and fashion brands, slipping from the first row of search results to the tenth is the difference between a quick sale and a stale listing. Because the Depop algorithm prioritizes fresh, relevant content, your search position changes constantly.&lt;/p&gt;

&lt;p&gt;Monitoring these positions manually is tedious, especially if you manage dozens of items across multiple keywords. This guide demonstrates how to build an automated Depop SEO tool using Python and Selenium. We will use the open-source &lt;a href="https://github.com/scraper-bank/Depop.com-Scrapers.git" rel="noopener noreferrer"&gt;Depop.com-Scrapers&lt;/a&gt; repository to extract search data and implement logic to track exactly where your products rank over time.&lt;/p&gt;

&lt;h2&gt;
  
  
  Understanding Depop’s Search Structure
&lt;/h2&gt;

&lt;p&gt;Before writing code, we need to look at the technical layout of a Depop search page. When you search for "vintage nike sweatshirt," Depop returns a grid of products. &lt;/p&gt;

&lt;p&gt;Technically, these results are an ordered list of product objects. A product's rank is its index in that list, plus one to make it human-readable. For example, the first item in the results array has an index of 0 and a rank of 1.&lt;/p&gt;

&lt;p&gt;To track rankings reliably, use a unique identifier. Tracking by title is unreliable because sellers often use similar titles or update them for SEO. Instead, use the &lt;code&gt;productId&lt;/code&gt;, a unique string assigned by Depop that never changes. The logic follows these steps:&lt;/p&gt;

&lt;ol&gt;
&lt;li&gt;Send a search query to Depop.&lt;/li&gt;
&lt;li&gt;Extract the list of product IDs from the results.&lt;/li&gt;
&lt;li&gt;Find the index of your &lt;code&gt;TARGET_PRODUCT_ID&lt;/code&gt;.&lt;/li&gt;
&lt;li&gt;Log the rank.&lt;/li&gt;
&lt;/ol&gt;
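&lt;p&gt;Steps 3 and 4 come down to a list lookup. The sketch below shows the idea on a plain list of IDs. The function name &lt;code&gt;find_rank&lt;/code&gt; and the sample IDs are made up for illustration and are not taken from the repository.&lt;/p&gt;

```python
def find_rank(product_ids, target_id):
    """Return the 1-based rank of target_id, or 0 if it is not in the list."""
    try:
        return product_ids.index(target_id) + 1
    except ValueError:
        return 0


# Made-up IDs for illustration only
results = ["111", "222", "333"]
print(find_rank(results, "222"))  # 2
print(find_rank(results, "999"))  # 0
```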

&lt;h2&gt;
  
  
  Step 1: Setting Up the Search Scraper
&lt;/h2&gt;

&lt;p&gt;We’ll use the Selenium implementation from the ScrapeOps repository, as it handles Depop’s dynamic content effectively. &lt;/p&gt;

&lt;p&gt;First, clone the repository and install the dependencies:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight shell"&gt;&lt;code&gt;git clone https://github.com/scraper-bank/Depop.com-Scrapers.git
&lt;span class="nb"&gt;cd &lt;/span&gt;Depop.com-Scrapers/python/selenium
pip &lt;span class="nb"&gt;install&lt;/span&gt; &lt;span class="nt"&gt;-r&lt;/span&gt; requirements.txt
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;h3&gt;
  
  
  Configuring the ScrapeOps API Key
&lt;/h3&gt;

&lt;p&gt;Depop uses anti-bot measures on their search pages. To avoid blocks or CAPTCHAs, you need proxy rotation. The repository is pre-configured to work with ScrapeOps.&lt;/p&gt;

&lt;p&gt;Open &lt;code&gt;product_search/scraper/depop_scraper_product_search_v1.py&lt;/code&gt; and find the &lt;code&gt;API_KEY&lt;/code&gt; variable. Replace it with your key from the &lt;a href="https://scrapeops.io/app/register/ai-scraper-generator" rel="noopener noreferrer"&gt;ScrapeOps Dashboard&lt;/a&gt;.&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight python"&gt;&lt;code&gt;&lt;span class="c1"&gt;# python/selenium/product_search/scraper/depop_scraper_product_search_v1.py
&lt;/span&gt;&lt;span class="n"&gt;API_KEY&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;YOUR_SCRAPEOPS_API_KEY&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;This routes your Selenium requests through a residential proxy network, rotating your IP address with every request.&lt;/p&gt;

&lt;h2&gt;
  
  
  Step 2: Extracting Search Results
&lt;/h2&gt;

&lt;p&gt;The base scraper uses the &lt;code&gt;extract_data&lt;/code&gt; function to parse search results into a structured &lt;code&gt;ScrapedData&lt;/code&gt; object. This object contains a list of products, each with its own &lt;code&gt;productId&lt;/code&gt;, &lt;code&gt;name&lt;/code&gt;, and &lt;code&gt;price&lt;/code&gt;.&lt;/p&gt;

&lt;p&gt;The scraper identifies individual items using CSS selectors:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight python"&gt;&lt;code&gt;&lt;span class="c1"&gt;# Snippet from extract_data in the repository
&lt;/span&gt;&lt;span class="n"&gt;items&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;driver&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;find_elements&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;By&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;CSS_SELECTOR&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;li.styles_listItem__Uv9lb&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;

&lt;span class="k"&gt;for&lt;/span&gt; &lt;span class="n"&gt;item&lt;/span&gt; &lt;span class="ow"&gt;in&lt;/span&gt; &lt;span class="n"&gt;items&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;
    &lt;span class="c1"&gt;# Logic to extract href, price, and image
&lt;/span&gt;    &lt;span class="n"&gt;p_id&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;href&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;strip&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;/&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;).&lt;/span&gt;&lt;span class="nf"&gt;split&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;/&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;)[&lt;/span&gt;&lt;span class="o"&gt;-&lt;/span&gt;&lt;span class="mi"&gt;1&lt;/span&gt;&lt;span class="p"&gt;]&lt;/span&gt; &lt;span class="k"&gt;if&lt;/span&gt; &lt;span class="n"&gt;href&lt;/span&gt; &lt;span class="k"&gt;else&lt;/span&gt; &lt;span class="sh"&gt;""&lt;/span&gt;
    &lt;span class="n"&gt;product&lt;/span&gt;&lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;productId&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;]&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;p_id&lt;/span&gt;
    &lt;span class="n"&gt;products&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;append&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;product&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;This provides a clean list of every product visible on the search page.&lt;/p&gt;
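&lt;p&gt;The ID extraction in that snippet takes the last path segment of each product link. A slightly more defensive version might look like the sketch below, which also ignores any query string or fragment. The helper name &lt;code&gt;product_id_from_href&lt;/code&gt; and the sample links are assumptions for illustration; check the links on a live search page before relying on their shape.&lt;/p&gt;

```python
from urllib.parse import urlsplit


def product_id_from_href(href):
    """Return the last path segment of a product link, or "" if there is none."""
    if not href:
        return ""
    path = urlsplit(href).path  # drops any ?query or #fragment
    return path.strip("/").split("/")[-1]


# Hypothetical links, used only to show the behaviour
print(product_id_from_href("/products/abc-123/"))          # abc-123
print(product_id_from_href("/products/abc-123/?ref=search"))  # abc-123
```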

&lt;h2&gt;
  
  
  Step 3: Implementing the Rank Finder Logic
&lt;/h2&gt;

&lt;p&gt;Next, create a wrapper script to import the scraper, perform a search, and locate your item. Create a new file named &lt;code&gt;rank_tracker.py&lt;/code&gt;:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight python"&gt;&lt;code&gt;&lt;span class="kn"&gt;import&lt;/span&gt; &lt;span class="n"&gt;logging&lt;/span&gt;
&lt;span class="kn"&gt;from&lt;/span&gt; &lt;span class="n"&gt;scraper.depop_scraper_product_search_v1&lt;/span&gt; &lt;span class="kn"&gt;import&lt;/span&gt; &lt;span class="n"&gt;get_driver&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;extract_data&lt;/span&gt;

&lt;span class="c1"&gt;# Configuration
&lt;/span&gt;&lt;span class="n"&gt;TARGET_PRODUCT_ID&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;12345678&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;  &lt;span class="c1"&gt;# Replace with your Depop Product ID
&lt;/span&gt;&lt;span class="n"&gt;KEYWORD&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;vintage 90s windbreaker&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;

&lt;span class="k"&gt;def&lt;/span&gt; &lt;span class="nf"&gt;get_product_rank&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;keyword&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;target_id&lt;/span&gt;&lt;span class="p"&gt;):&lt;/span&gt;
    &lt;span class="n"&gt;driver&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="nf"&gt;get_driver&lt;/span&gt;&lt;span class="p"&gt;()&lt;/span&gt;
    &lt;span class="n"&gt;search_url&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="sa"&gt;f&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;https://www.depop.com/search/?q=&lt;/span&gt;&lt;span class="si"&gt;{&lt;/span&gt;&lt;span class="n"&gt;keyword&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;replace&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="s"&gt; &lt;/span&gt;&lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="s"&gt;+&lt;/span&gt;&lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;&lt;span class="si"&gt;}&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;

    &lt;span class="k"&gt;try&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;
        &lt;span class="n"&gt;driver&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;get&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;search_url&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
        &lt;span class="n"&gt;scraped_result&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="nf"&gt;extract_data&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;driver&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;search_url&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;

        &lt;span class="k"&gt;if&lt;/span&gt; &lt;span class="ow"&gt;not&lt;/span&gt; &lt;span class="n"&gt;scraped_result&lt;/span&gt; &lt;span class="ow"&gt;or&lt;/span&gt; &lt;span class="ow"&gt;not&lt;/span&gt; &lt;span class="n"&gt;scraped_result&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;products&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;
            &lt;span class="k"&gt;return&lt;/span&gt; &lt;span class="o"&gt;-&lt;/span&gt;&lt;span class="mi"&gt;1&lt;/span&gt; &lt;span class="c1"&gt;# Search failed or no results
&lt;/span&gt;
        &lt;span class="k"&gt;for&lt;/span&gt; &lt;span class="n"&gt;index&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;product&lt;/span&gt; &lt;span class="ow"&gt;in&lt;/span&gt; &lt;span class="nf"&gt;enumerate&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;scraped_result&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;products&lt;/span&gt;&lt;span class="p"&gt;):&lt;/span&gt;
            &lt;span class="k"&gt;if&lt;/span&gt; &lt;span class="n"&gt;product&lt;/span&gt;&lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="s"&gt;productId&lt;/span&gt;&lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="p"&gt;]&lt;/span&gt; &lt;span class="o"&gt;==&lt;/span&gt; &lt;span class="n"&gt;target_id&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;
                &lt;span class="k"&gt;return&lt;/span&gt; &lt;span class="n"&gt;index&lt;/span&gt; &lt;span class="o"&gt;+&lt;/span&gt; &lt;span class="mi"&gt;1&lt;/span&gt;  &lt;span class="c1"&gt;# Ranks are 1-based
&lt;/span&gt;
        &lt;span class="k"&gt;return&lt;/span&gt; &lt;span class="mi"&gt;0&lt;/span&gt;  &lt;span class="c1"&gt;# Item not found in the current results
&lt;/span&gt;    &lt;span class="k"&gt;finally&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;
        &lt;span class="n"&gt;driver&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;quit&lt;/span&gt;&lt;span class="p"&gt;()&lt;/span&gt;

&lt;span class="n"&gt;rank&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="nf"&gt;get_product_rank&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;KEYWORD&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;TARGET_PRODUCT_ID&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
&lt;span class="nf"&gt;print&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="sa"&gt;f&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;Your item is currently ranked: &lt;/span&gt;&lt;span class="si"&gt;{&lt;/span&gt;&lt;span class="n"&gt;rank&lt;/span&gt; &lt;span class="k"&gt;if&lt;/span&gt; &lt;span class="n"&gt;rank&lt;/span&gt; &lt;span class="o"&gt;&amp;gt;&lt;/span&gt; &lt;span class="mi"&gt;0&lt;/span&gt; &lt;span class="k"&gt;else&lt;/span&gt; &lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="s"&gt;Not Found&lt;/span&gt;&lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="si"&gt;}&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;h3&gt;
  
  
  How it works
&lt;/h3&gt;

&lt;ul&gt;
&lt;li&gt;
&lt;strong&gt;&lt;code&gt;get_driver()&lt;/code&gt;&lt;/strong&gt;: Initializes the undetected-chromedriver with ScrapeOps proxy settings.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;&lt;code&gt;extract_data()&lt;/code&gt;&lt;/strong&gt;: Scrapes the page and returns the product list.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;&lt;code&gt;enumerate()&lt;/code&gt;&lt;/strong&gt;: Loops through the list to find the matching &lt;code&gt;productId&lt;/code&gt;.&lt;/li&gt;
&lt;/ul&gt;
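&lt;p&gt;One detail worth hardening: the search URL above is built by swapping spaces for plus signs, which works for simple keywords but not for ones containing apostrophes or other reserved characters. A possible alternative, sketched here with a hypothetical &lt;code&gt;build_search_url&lt;/code&gt; helper, is to let &lt;code&gt;urllib.parse.quote_plus&lt;/code&gt; from the standard library do the encoding.&lt;/p&gt;

```python
from urllib.parse import quote_plus


def build_search_url(keyword):
    """Build a search URL with the keyword safely encoded."""
    return f"https://www.depop.com/search/?q={quote_plus(keyword)}"


print(build_search_url("vintage 90s windbreaker"))
print(build_search_url("levi's 501 jeans"))
```

&lt;p&gt;Spaces still become plus signs, while characters such as the apostrophe are percent-encoded (&lt;code&gt;%27&lt;/code&gt;), so the keyword reaches the search page intact.&lt;/p&gt;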

&lt;h2&gt;
  
  
  Step 4: Handling Pagination and Depth
&lt;/h2&gt;

&lt;p&gt;Depop uses infinite scrolling. If your item isn't in the first 30 results, a basic scrape will miss it. You need to tell Selenium to scroll down to increase the search depth.&lt;/p&gt;

&lt;p&gt;Modify the logic to include a scroll loop:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
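&lt;pre class="highlight python"&gt;&lt;code&gt;&lt;span class="kn"&gt;import&lt;/span&gt; &lt;span class="n"&gt;time&lt;/span&gt;
&lt;span class="kn"&gt;from&lt;/span&gt; &lt;span class="n"&gt;selenium.webdriver.common.by&lt;/span&gt; &lt;span class="kn"&gt;import&lt;/span&gt; &lt;span class="n"&gt;By&lt;/span&gt;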

&lt;span class="k"&gt;def&lt;/span&gt; &lt;span class="nf"&gt;scroll_to_depth&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;driver&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;max_items&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="mi"&gt;100&lt;/span&gt;&lt;span class="p"&gt;):&lt;/span&gt;
    &lt;span class="n"&gt;last_height&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;driver&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;execute_script&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;return document.body.scrollHeight&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;

    &lt;span class="k"&gt;while&lt;/span&gt; &lt;span class="bp"&gt;True&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;
        &lt;span class="n"&gt;items&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;driver&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;find_elements&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;By&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;CSS_SELECTOR&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;li.styles_listItem__Uv9lb&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
        &lt;span class="k"&gt;if&lt;/span&gt; &lt;span class="nf"&gt;len&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;items&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt; &lt;span class="o"&gt;&amp;gt;=&lt;/span&gt; &lt;span class="n"&gt;max_items&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;
            &lt;span class="k"&gt;break&lt;/span&gt;

        &lt;span class="n"&gt;driver&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;execute_script&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;window.scrollTo(0, document.body.scrollHeight);&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
        &lt;span class="n"&gt;time&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;sleep&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="mi"&gt;2&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;  &lt;span class="c1"&gt;# Wait for products to load
&lt;/span&gt;
        &lt;span class="n"&gt;new_height&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;driver&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;execute_script&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;return document.body.scrollHeight&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
        &lt;span class="k"&gt;if&lt;/span&gt; &lt;span class="n"&gt;new_height&lt;/span&gt; &lt;span class="o"&gt;==&lt;/span&gt; &lt;span class="n"&gt;last_height&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;
            &lt;span class="k"&gt;break&lt;/span&gt; &lt;span class="c1"&gt;# Reached the end of results
&lt;/span&gt;        &lt;span class="n"&gt;last_height&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;new_height&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



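&lt;p&gt;Call &lt;code&gt;scroll_to_depth(driver)&lt;/code&gt; after &lt;code&gt;driver.get()&lt;/code&gt; and before &lt;code&gt;extract_data&lt;/code&gt; so that up to 100 items are loaded before the rank is computed (fewer if the search returns fewer results). Checking beyond 200 items is rarely necessary, as click-through rates drop significantly after the first few pages.&lt;/p&gt;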
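&lt;p&gt;One way to wire the scroll step into the rank finder is sketched below. The function name &lt;code&gt;get_rank_with_depth&lt;/code&gt; is hypothetical. The scroll and extract functions are passed in as parameters so the snippet runs on its own; in &lt;code&gt;rank_tracker.py&lt;/code&gt; you would pass &lt;code&gt;scroll_to_depth&lt;/code&gt; and &lt;code&gt;extract_data&lt;/code&gt;. It assumes, as in Step 3, that the extract function returns an object with a &lt;code&gt;products&lt;/code&gt; list of dictionaries.&lt;/p&gt;

```python
def get_rank_with_depth(driver, search_url, target_id, scroll, extract, max_items=100):
    """Load the page, scroll to load more results, then look for target_id.

    Returns the 1-based rank, 0 if not found, or -1 if nothing was scraped.
    `scroll` and `extract` are passed in so this sketch runs on its own.
    """
    driver.get(search_url)
    scroll(driver, max_items)  # load more results before parsing
    result = extract(driver, search_url)
    if not result or not result.products:
        return -1
    # Only count the first max_items results, matching the scroll depth
    for index, product in enumerate(result.products[:max_items]):
        if product["productId"] == target_id:
            return index + 1
    return 0
```

&lt;p&gt;The sketch does not close the browser, so the caller should still wrap it in &lt;code&gt;try&lt;/code&gt;/&lt;code&gt;finally&lt;/code&gt; with &lt;code&gt;driver.quit()&lt;/code&gt;, as in Step 3.&lt;/p&gt;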

&lt;h2&gt;
  
  
  Step 5: Automating History
&lt;/h2&gt;

&lt;p&gt;A single rank check is just a snapshot. To see if SEO efforts like refreshing listings or changing tags work, you need historical data. You can store findings in a CSV file:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight python"&gt;&lt;code&gt;&lt;span class="kn"&gt;import&lt;/span&gt; &lt;span class="n"&gt;csv&lt;/span&gt;
&lt;span class="kn"&gt;import&lt;/span&gt; &lt;span class="n"&gt;os&lt;/span&gt;
&lt;span class="kn"&gt;from&lt;/span&gt; &lt;span class="n"&gt;datetime&lt;/span&gt; &lt;span class="kn"&gt;import&lt;/span&gt; &lt;span class="n"&gt;datetime&lt;/span&gt;

&lt;span class="k"&gt;def&lt;/span&gt; &lt;span class="nf"&gt;log_rank&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;keyword&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;product_id&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;rank&lt;/span&gt;&lt;span class="p"&gt;):&lt;/span&gt;
    &lt;span class="c1"&gt;# Write the header row only if the history file does not exist yet
&lt;/span&gt;    &lt;span class="n"&gt;file_exists&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;os&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;path&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;isfile&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="s"&gt;rank_history.csv&lt;/span&gt;&lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;

    &lt;span class="k"&gt;with&lt;/span&gt; &lt;span class="nf"&gt;open&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="s"&gt;rank_history.csv&lt;/span&gt;&lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="s"&gt;a&lt;/span&gt;&lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;newline&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="sh"&gt;''&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt; &lt;span class="k"&gt;as&lt;/span&gt; &lt;span class="n"&gt;f&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;
        &lt;span class="n"&gt;writer&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;csv&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;writer&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;f&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
        &lt;span class="k"&gt;if&lt;/span&gt; &lt;span class="ow"&gt;not&lt;/span&gt; &lt;span class="n"&gt;file_exists&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;
            &lt;span class="n"&gt;writer&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;writerow&lt;/span&gt;&lt;span class="p"&gt;([&lt;/span&gt;&lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="s"&gt;Date&lt;/span&gt;&lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="s"&gt;Keyword&lt;/span&gt;&lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="s"&gt;ProductID&lt;/span&gt;&lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="s"&gt;Rank&lt;/span&gt;&lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="p"&gt;])&lt;/span&gt;

        &lt;span class="n"&gt;writer&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;writerow&lt;/span&gt;&lt;span class="p"&gt;([&lt;/span&gt;
            &lt;span class="n"&gt;datetime&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;now&lt;/span&gt;&lt;span class="p"&gt;().&lt;/span&gt;&lt;span class="nf"&gt;strftime&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;%Y-%m-%d %H:%M:%S&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;),&lt;/span&gt;
            &lt;span class="n"&gt;keyword&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
            &lt;span class="n"&gt;product_id&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
            &lt;span class="n"&gt;rank&lt;/span&gt;
        &lt;span class="p"&gt;])&lt;/span&gt;

&lt;span class="c1"&gt;# Usage
&lt;/span&gt;&lt;span class="n"&gt;current_rank&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="nf"&gt;get_product_rank&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;KEYWORD&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;TARGET_PRODUCT_ID&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
&lt;span class="nf"&gt;log_rank&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;KEYWORD&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;TARGET_PRODUCT_ID&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;current_rank&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;Running this script daily via a Cron job creates a dataset that reveals ranking volatility. If a rank drops from 5 to 50 overnight, it’s a clear signal to update the listing or check for new competitors.&lt;/p&gt;
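&lt;p&gt;Once a few days of history exist, spotting that kind of drop can itself be automated. The sketch below reads &lt;code&gt;rank_history.csv&lt;/code&gt; using the column names written by &lt;code&gt;log_rank&lt;/code&gt; and reports large moves between consecutive checks. The function name &lt;code&gt;find_rank_drops&lt;/code&gt; and the default threshold of 10 positions are assumptions chosen for illustration.&lt;/p&gt;

```python
import csv


def find_rank_drops(csv_path, threshold=10):
    """Return (date, keyword, product_id, old_rank, new_rank) for big drops.

    A drop is a move of at least `threshold` positions between two
    consecutive checks of the same keyword and product. Rows with a rank
    of 0 or -1 (not found / failed) are skipped in this sketch.
    """
    last_seen = {}
    drops = []
    with open(csv_path, newline="") as f:
        for row in csv.DictReader(f):
            rank = int(row["Rank"])
            if rank in (0, -1):
                continue
            key = (row["Keyword"], row["ProductID"])
            previous = last_seen.get(key)
            if previous is not None and rank - previous >= threshold:
                drops.append((row["Date"], key[0], key[1], previous, rank))
            last_seen[key] = rank
    return drops
```

&lt;p&gt;Skipping the 0 and -1 rows keeps a failed scrape from being mistaken for a ranking change. Whether a "not found" result should count as a drop is a design choice you may want to revisit.&lt;/p&gt;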

&lt;h2&gt;
  
  
  Recommended Approaches to Avoid Bans
&lt;/h2&gt;

&lt;p&gt;When building a rank tracker, the main risk is getting your IP flagged for excessive search requests.&lt;/p&gt;

&lt;ol&gt;
&lt;li&gt;
&lt;strong&gt;Use Proxy Rotation&lt;/strong&gt;: Search pages are more heavily guarded than product pages. Use ScrapeOps proxy rotation to distribute the load.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Control Frequency&lt;/strong&gt;: Don't check your rank every 10 minutes. Frequent checks add request volume and raise the risk of blocks without revealing much more about the trend. Once or twice a day is sufficient.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Randomize Delays&lt;/strong&gt;: If you are checking multiple keywords, add &lt;code&gt;time.sleep(random.uniform(5, 15))&lt;/code&gt; between queries to mimic human browsing.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Headless Mode&lt;/strong&gt;: The repository uses &lt;code&gt;--headless=new&lt;/code&gt; by default. This is faster and uses fewer resources. Ensure your User-Agent is set correctly to avoid detection.&lt;/li&gt;
&lt;/ol&gt;
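&lt;p&gt;The randomized delay from point 3 could be wrapped in a small loop like the one sketched below. The function name &lt;code&gt;check_keywords&lt;/code&gt; and its parameters are hypothetical. &lt;code&gt;check_rank&lt;/code&gt; stands in for whatever function performs a single search, and the &lt;code&gt;sleep&lt;/code&gt; parameter exists only so the loop can be tested without real waiting.&lt;/p&gt;

```python
import random
import time


def check_keywords(keywords, check_rank, min_delay=5, max_delay=15, sleep=time.sleep):
    """Call check_rank(keyword) for each keyword, pausing between queries.

    `check_rank` is any function that takes a keyword and returns a rank.
    `sleep` can be swapped out in tests to avoid real waiting.
    """
    ranks = {}
    for position, keyword in enumerate(keywords):
        if position != 0:
            # Random pause between queries, never before the first one
            sleep(random.uniform(min_delay, max_delay))
        ranks[keyword] = check_rank(keyword)
    return ranks
```

&lt;p&gt;With the tracker from Step 3, a call might look like &lt;code&gt;check_keywords(["vintage 90s windbreaker", "vintage nike sweatshirt"], lambda kw: get_product_rank(kw, TARGET_PRODUCT_ID))&lt;/code&gt;.&lt;/p&gt;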

&lt;h2&gt;
  
  
  To Wrap Up
&lt;/h2&gt;

&lt;p&gt;A custom Depop SEO tool replaces guesswork with data. By combining ScrapeOps scrapers with a rank-finding script, you can detect ranking drops before they impact sales, test which keywords perform best, and monitor competitor movements.&lt;/p&gt;

&lt;p&gt;You can expand this by turning your &lt;code&gt;TARGET_PRODUCT_ID&lt;/code&gt; into a dictionary to loop through all your top items. You could even integrate a messaging service like Slack or Discord to send an alert whenever an item drops out of the top 10.&lt;/p&gt;
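&lt;p&gt;A rough sketch of that expansion follows. The &lt;code&gt;TARGETS&lt;/code&gt; dictionary, the second product ID, and the &lt;code&gt;find_alerts&lt;/code&gt; helper are all made up for illustration. &lt;code&gt;get_rank&lt;/code&gt; is expected to behave like &lt;code&gt;get_product_rank&lt;/code&gt; from Step 3, and sending the actual Slack or Discord message is left out.&lt;/p&gt;

```python
# Hypothetical mapping of product ID to the keyword it should rank for
TARGETS = {
    "12345678": "vintage 90s windbreaker",
    "87654321": "vintage nike sweatshirt",
}


def find_alerts(targets, get_rank, top_n=10):
    """Return {product_id: rank} for items outside the top N.

    `get_rank(keyword, product_id)` is expected to behave like
    get_product_rank: a 1-based rank, 0 if not found, -1 on failure.
    """
    alerts = {}
    for product_id, keyword in targets.items():
        rank = get_rank(keyword, product_id)
        if rank not in range(1, top_n + 1):
            alerts[product_id] = rank
    return alerts
```

&lt;p&gt;Calling &lt;code&gt;find_alerts(TARGETS, get_product_rank)&lt;/code&gt; would return only the items that need attention, which you could then forward to the messaging service of your choice.&lt;/p&gt;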

&lt;p&gt;For the full source code or alternative implementations using Playwright or Node.js, visit the &lt;a href="https://github.com/scraper-bank/Depop.com-Scrapers" rel="noopener noreferrer"&gt;Depop.com-Scrapers repository&lt;/a&gt;.&lt;/p&gt;

</description>
      <category>seo</category>
      <category>webscraping</category>
      <category>webdev</category>
      <category>devops</category>
    </item>
    <item>
      <title>How to Track Competitor Pricing on StockX: A Low-Code Guide</title>
      <dc:creator>Jonathan D. Fisher</dc:creator>
      <pubDate>Fri, 06 Mar 2026 05:10:32 +0000</pubDate>
      <link>https://dev.to/jonathanfisher/how-to-track-competitor-pricing-on-stockx-a-low-code-guide-lkc</link>
      <guid>https://dev.to/jonathanfisher/how-to-track-competitor-pricing-on-stockx-a-low-code-guide-lkc</guid>
      <description>&lt;p&gt;Agility is the lifeblood of growth teams in the sneaker and collectible markets. However, that agility often dies when hours are spent manually refreshing StockX pages to monitor competitor bids, price volatility, and market trends. If you are still manually checking multiple SKUs to spot arbitrage opportunities, you are already behind the curve.&lt;/p&gt;

&lt;p&gt;The solution isn't to hire an expensive engineering team to build a custom monitoring platform. You can use open-source tools to automate the heavy lifting. This guide shows you how to use a pre-built Python script to extract real-time StockX product data and transform it into an actionable spreadsheet for price tracking and market analysis.&lt;/p&gt;

&lt;h2&gt;
  
  
  Prerequisites &amp;amp; Setup (Low-Code Friendly)
&lt;/h2&gt;

&lt;p&gt;This workflow is accessible even if you aren't a full-time developer. You only need a few basic tools to get started.&lt;/p&gt;

&lt;h3&gt;
  
  
  1. Install Python
&lt;/h3&gt;

&lt;p&gt;Ensure you have Python installed on your machine. Check this by opening your terminal (or Command Prompt) and typing the command below. On macOS and Linux the command may be &lt;code&gt;python3&lt;/code&gt; instead:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight shell"&gt;&lt;code&gt;python &lt;span class="nt"&gt;--version&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;If you don't have it, download the latest version from &lt;a href="https://python.org" rel="noopener noreferrer"&gt;python.org&lt;/a&gt;.&lt;/p&gt;

&lt;h3&gt;
  
  
  2. Download the Scrapers
&lt;/h3&gt;

&lt;p&gt;Use the open-source &lt;a href="https://github.com/scraper-bank/Stockx.com-Scrapers.git" rel="noopener noreferrer"&gt;Stockx.com-Scrapers&lt;/a&gt; repository. You can either clone it via Git or download it as a ZIP file.&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight shell"&gt;&lt;code&gt;git clone https://github.com/scraper-bank/Stockx.com-Scrapers.git
&lt;span class="nb"&gt;cd &lt;/span&gt;Stockx.com-Scrapers
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;h3&gt;
  
  
  3. Get a ScrapeOps API Key
&lt;/h3&gt;

&lt;p&gt;StockX employs sophisticated anti-bot protections that block standard requests. Use &lt;strong&gt;ScrapeOps&lt;/strong&gt; to handle proxy rotation and browser fingerprinting automatically.&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;  Sign up for a free account at &lt;a href="https://scrapeops.io/app/register/ai-scraper" rel="noopener noreferrer"&gt;ScrapeOps&lt;/a&gt;.&lt;/li&gt;
&lt;li&gt;  Copy your &lt;strong&gt;API Key&lt;/strong&gt; from the dashboard. You’ll need this to bypass StockX's blocks.&lt;/li&gt;
&lt;/ul&gt;

&lt;h2&gt;
  
  
  Step 1: Configuring the Python Scraper
&lt;/h2&gt;

&lt;p&gt;The repository contains several implementations. To track specific products, use the &lt;strong&gt;Playwright&lt;/strong&gt; version. Playwright is a browser automation tool that drives a real browser, so its traffic looks more like a human visitor's and is harder for StockX to detect.&lt;/p&gt;

&lt;p&gt;Navigate to this directory:&lt;br&gt;
&lt;code&gt;python/playwright/product_data/scraper/&lt;/code&gt;&lt;/p&gt;

&lt;p&gt;Open &lt;code&gt;stockx_scraper_product_data_v1.py&lt;/code&gt; in any text editor like VS Code or Notepad++. You only need to modify two sections.&lt;/p&gt;
&lt;h3&gt;
  
  
  1. Insert Your API Key
&lt;/h3&gt;

&lt;p&gt;Locate the &lt;code&gt;API_KEY&lt;/code&gt; variable at the top of the script and paste your ScrapeOps key:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight python"&gt;&lt;code&gt;&lt;span class="c1"&gt;# Inside stockx_scraper_product_data_v1.py
&lt;/span&gt;&lt;span class="n"&gt;API_KEY&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;YOUR-SCRAPEOPS-API-KEY-HERE&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt; 
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;h3&gt;
  
  
  2. Define Your Target URLs
&lt;/h3&gt;

&lt;p&gt;At the bottom of the script, define which products you want to track. You can pass a single URL or a list of competitor products:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight python"&gt;&lt;code&gt;&lt;span class="c1"&gt;# Example of targeting specific products for tracking
&lt;/span&gt;&lt;span class="n"&gt;urls&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="p"&gt;[&lt;/span&gt;
    &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;https://stockx.com/nike-dunk-low-retro-white-black-2021&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
    &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;https://stockx.com/adidas-yeezy-slide-pure-re-release-2021&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;
&lt;span class="p"&gt;]&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;h2&gt;
  
  
  Step 2: Running the Scraper
&lt;/h2&gt;

&lt;p&gt;Before running the script, install the required libraries and browser binaries. In your terminal, run:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight shell"&gt;&lt;code&gt;&lt;span class="c"&gt;# Install the required Python libraries&lt;/span&gt;
pip &lt;span class="nb"&gt;install &lt;/span&gt;playwright playwright-stealth beautifulsoup4

&lt;span class="c"&gt;# Install the browser binaries for Playwright&lt;/span&gt;
playwright &lt;span class="nb"&gt;install&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;Then run the scraper from the repository root:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight shell"&gt;&lt;code&gt;python python/playwright/product_data/scraper/stockx_scraper_product_data_v1.py
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;As the script runs, it opens a stealth browser instance, navigates to the products, and extracts the data. Once finished, a new file will appear in your folder with a name like &lt;code&gt;stockx_com_product_page_scraper_data_20240522.jsonl&lt;/code&gt;.&lt;/p&gt;

&lt;h2&gt;
  
  
  Step 3: From JSONL to Actionable Spreadsheet
&lt;/h2&gt;

&lt;p&gt;The output file is in JSONL format (JSON Lines). While this is great for developers, growth teams usually need this data in Excel or Google Sheets.&lt;/p&gt;

&lt;h3&gt;
  
  
  Method 1: Importing into Microsoft Excel
&lt;/h3&gt;

&lt;ol&gt;
&lt;li&gt; Open Excel and go to the &lt;strong&gt;Data&lt;/strong&gt; tab.&lt;/li&gt;
&lt;li&gt; Select &lt;strong&gt;Get Data&lt;/strong&gt; &amp;gt; &lt;strong&gt;From File&lt;/strong&gt; &amp;gt; &lt;strong&gt;From JSON&lt;/strong&gt;.&lt;/li&gt;
&lt;li&gt; Select your &lt;code&gt;.jsonl&lt;/code&gt; file.&lt;/li&gt;
&lt;li&gt; The Power Query Editor will open. Click &lt;strong&gt;To Table&lt;/strong&gt; in the top left.&lt;/li&gt;
&lt;li&gt; Click the "Expand" icon (two arrows) in the column header to choose the fields you want, such as &lt;code&gt;name&lt;/code&gt;, &lt;code&gt;price&lt;/code&gt;, and &lt;code&gt;market_data&lt;/code&gt;.&lt;/li&gt;
&lt;/ol&gt;

&lt;h3&gt;
  
  
  Method 2: Importing into Google Sheets
&lt;/h3&gt;

&lt;p&gt;The easiest way is to use a free online "JSON to CSV" converter. Upload your &lt;code&gt;.jsonl&lt;/code&gt; file, download the CSV, and open it in Google Sheets.&lt;/p&gt;
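
&lt;p&gt;If you would rather not upload your data to a third-party site, the same conversion can be done locally. The following is a minimal sketch using only the Python standard library. The function name &lt;code&gt;jsonl_to_csv&lt;/code&gt; is hypothetical, and the script assumes each line of the file is one JSON object; nested fields are kept as JSON text in a single cell.&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight python"&gt;&lt;code&gt;import csv
import json


def jsonl_to_csv(jsonl_path, csv_path):
    """Convert a JSON Lines file to CSV, one row per record. Returns the row count."""
    with open(jsonl_path, encoding="utf-8") as source:
        records = [json.loads(line) for line in source if line.strip()]

    # Collect every top-level key, preserving first-seen order.
    columns = []
    for record in records:
        for key in record:
            if key not in columns:
                columns.append(key)

    with open(csv_path, "w", newline="", encoding="utf-8") as target:
        writer = csv.DictWriter(target, fieldnames=columns)
        writer.writeheader()
        for record in records:
            # Nested values (dicts, lists) are written as JSON text in one cell.
            row = {
                key: json.dumps(value) if isinstance(value, (dict, list)) else value
                for key, value in record.items()
            }
            writer.writerow(row)
    return len(records)
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;

&lt;p&gt;Call it with the path to your scraper output and a destination path, for example &lt;code&gt;jsonl_to_csv("output.jsonl", "output.csv")&lt;/code&gt;, then open the resulting CSV in Google Sheets or Excel.&lt;/p&gt;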

&lt;p&gt;You now have a clean table containing:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;  &lt;strong&gt;Name&lt;/strong&gt;: The specific SKU or sneaker name.&lt;/li&gt;
&lt;li&gt;  &lt;strong&gt;Price&lt;/strong&gt;: The current lowest ask.&lt;/li&gt;
&lt;li&gt;  &lt;strong&gt;Market Data&lt;/strong&gt;: Last sale price and historical volatility.&lt;/li&gt;
&lt;/ul&gt;

&lt;h2&gt;
  
  
  Step 4: Using the Data for Growth Strategies
&lt;/h2&gt;

&lt;p&gt;With the data in hand, you can move from guessing to strategizing. Here are three ways to use your new tracker:&lt;/p&gt;

&lt;h3&gt;
  
  
  1. Spotting Arbitrage Opportunities
&lt;/h3&gt;

&lt;p&gt;Compare the &lt;code&gt;lowest_ask&lt;/code&gt; on StockX against prices on platforms like eBay or GOAT. If the StockX price is significantly lower than the going price elsewhere, even after fees and shipping, it's a buy signal.&lt;/p&gt;

&lt;h3&gt;
  
  
  2. Optimized Bidding
&lt;/h3&gt;

&lt;p&gt;If the &lt;code&gt;last_sale&lt;/code&gt; was $200 but the &lt;code&gt;lowest_ask&lt;/code&gt; is $240, setting a bid at $205 puts you ahead of any bids at or below the last sale price without paying the full ask. Automated tracking allows you to adjust these bids daily as the market shifts.&lt;/p&gt;
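
&lt;p&gt;That arithmetic is easy to apply to every row of your spreadsheet export. The helper below is a sketch under stated assumptions: &lt;code&gt;suggest_bid&lt;/code&gt; is a hypothetical function, and the default $5 increment is an arbitrary choice taken from the example above, not a StockX rule.&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight python"&gt;&lt;code&gt;def suggest_bid(last_sale, lowest_ask, increment=5.0):
    """Return a bid slightly above the last sale, capped at the lowest ask.

    Returns None when either price is missing (None or 0), which can
    happen for new or unreleased items.
    """
    if not last_sale or not lowest_ask:
        return None
    return min(last_sale + increment, lowest_ask)


print(suggest_bid(200, 240))  # 205.0
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;

&lt;p&gt;Capping the result at the lowest ask means the helper never suggests bidding more than you could pay to buy immediately.&lt;/p&gt;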

&lt;h3&gt;
  
  
  3. Risk Management
&lt;/h3&gt;

&lt;p&gt;Monitor the &lt;code&gt;volatility&lt;/code&gt; metric. If a sneaker shows high price swings over a short period, it might be a "hype" play that is too risky to hold long-term. Stable, low-volatility items are better for consistent, slower growth.&lt;/p&gt;

&lt;div class="table-wrapper-paragraph"&gt;&lt;table&gt;
&lt;thead&gt;
&lt;tr&gt;
&lt;th&gt;Product&lt;/th&gt;
&lt;th&gt;StockX Lowest Ask&lt;/th&gt;
&lt;th&gt;Last Sale&lt;/th&gt;
&lt;th&gt;Strategy&lt;/th&gt;
&lt;/tr&gt;
&lt;/thead&gt;
&lt;tbody&gt;
&lt;tr&gt;
&lt;td&gt;Nike Dunk Low&lt;/td&gt;
&lt;td&gt;$180&lt;/td&gt;
&lt;td&gt;$175&lt;/td&gt;
&lt;td&gt;&lt;strong&gt;Bid $176&lt;/strong&gt;&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Yeezy Slide&lt;/td&gt;
&lt;td&gt;$110&lt;/td&gt;
&lt;td&gt;$115&lt;/td&gt;
&lt;td&gt;&lt;strong&gt;Buy Now (Arbitrage)&lt;/strong&gt;&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Jordan 1 High&lt;/td&gt;
&lt;td&gt;$350&lt;/td&gt;
&lt;td&gt;$310&lt;/td&gt;
&lt;td&gt;&lt;strong&gt;Pass (Overpriced)&lt;/strong&gt;&lt;/td&gt;
&lt;/tr&gt;
&lt;/tbody&gt;
&lt;/table&gt;&lt;/div&gt;

&lt;h2&gt;
  
  
  Common Issues &amp;amp; Troubleshooting
&lt;/h2&gt;

&lt;ul&gt;
&lt;li&gt;  &lt;strong&gt;Empty Output File&lt;/strong&gt;: This usually happens if the StockX URL is incorrect or the site layout has changed. Double-check your URLs in the script.&lt;/li&gt;
&lt;li&gt;  &lt;strong&gt;403 Forbidden Errors&lt;/strong&gt;: This means StockX has detected the bot. Ensure your ScrapeOps API key is active and that the &lt;code&gt;playwright-stealth&lt;/code&gt; package from Step 2 is installed.&lt;/li&gt;
&lt;li&gt;  &lt;strong&gt;Missing Market Data&lt;/strong&gt;: Some new or unreleased items don't have "Last Sale" data yet. The script will return &lt;code&gt;null&lt;/code&gt; or &lt;code&gt;0&lt;/code&gt; for these fields.&lt;/li&gt;
&lt;/ul&gt;

&lt;h2&gt;
  
  
  Summary
&lt;/h2&gt;

&lt;p&gt;Moving away from manual checks and adopting a low-code automation strategy gives your team a competitive advantage. You can transform hours of tedious browsing into a data-rich spreadsheet in minutes.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Key Takeaways:&lt;/strong&gt;&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;  &lt;strong&gt;Automate efficiently&lt;/strong&gt;: Use the &lt;code&gt;Stockx.com-Scrapers&lt;/code&gt; repo to save on engineering costs.&lt;/li&gt;
&lt;li&gt;  &lt;strong&gt;Avoid blocks&lt;/strong&gt;: Use ScrapeOps to help your scraper get past anti-bot measures.&lt;/li&gt;
&lt;li&gt;  &lt;strong&gt;Focus on action&lt;/strong&gt;: Use the extracted JSONL data to fuel arbitrage and bidding strategies in Excel.&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;To go further, try running the &lt;code&gt;product_search&lt;/code&gt; script in the repository to discover trending products before they hit your main tracking list.&lt;/p&gt;

</description>
      <category>webscraping</category>
      <category>devops</category>
      <category>lowcode</category>
      <category>webdev</category>
    </item>
    <item>
      <title>Stop Breaking Your Pipeline: Using Schema Validation to Clean Scraped Zappos Data</title>
      <dc:creator>Jonathan D. Fisher</dc:creator>
      <pubDate>Thu, 05 Mar 2026 05:30:45 +0000</pubDate>
      <link>https://dev.to/jonathanfisher/stop-breaking-your-pipeline-using-schema-validation-to-clean-scraped-zappos-data-53fa</link>
      <guid>https://dev.to/jonathanfisher/stop-breaking-your-pipeline-using-schema-validation-to-clean-scraped-zappos-data-53fa</guid>
      <description>&lt;p&gt;Web scraping is often described as the process of turning the "wild west" of the internet into structured data. However, anyone who has managed a production data pipeline knows that "structured" is a relative term. HTML is inherently chaotic. A price might be a string like &lt;code&gt;"$120.00"&lt;/code&gt; on one page, &lt;code&gt;"120"&lt;/code&gt; on another, or missing entirely on a third.&lt;/p&gt;

&lt;p&gt;If your scraper simply dumps these raw strings into a database, your downstream applications—whether they are price trackers, AI models, or analytics dashboards—will eventually crash. The solution is &lt;strong&gt;Schema-First Extraction&lt;/strong&gt;: an approach to scraping that enforces strict data types at the moment of collection.&lt;/p&gt;

&lt;p&gt;This guide shows how to implement it, using the &lt;a href="https://github.com/scraper-bank/Zappos.com-Scrapers.git" rel="noopener noreferrer"&gt;Zappos.com-Scrapers&lt;/a&gt; repository as a blueprint. It covers Python &lt;code&gt;dataclasses&lt;/code&gt; and Node.js helper functions that keep your data clean, consistent, and pipeline-ready.&lt;/p&gt;

&lt;h2&gt;
  
  
  Prerequisites
&lt;/h2&gt;

&lt;p&gt;To follow the code examples, you should have:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;  A basic understanding of Python (specifically &lt;code&gt;dataclasses&lt;/code&gt;) or Node.js.&lt;/li&gt;
&lt;li&gt;  The &lt;a href="https://github.com/scraper-bank/Zappos.com-Scrapers.git" rel="noopener noreferrer"&gt;Zappos.com-Scrapers&lt;/a&gt; repository cloned locally.&lt;/li&gt;
&lt;li&gt;  Playwright installed in your environment.&lt;/li&gt;
&lt;/ul&gt;

&lt;h2&gt;
  
  
  Phase 1: The Contract – Analyzing the &lt;code&gt;ScrapedData&lt;/code&gt; Dataclass
&lt;/h2&gt;

&lt;p&gt;In a high-quality scraper, the data structure isn't an afterthought. It is the contract that the scraper must fulfill. In the Zappos repository, this contract is defined using Python’s &lt;code&gt;@dataclass&lt;/code&gt;.&lt;/p&gt;

&lt;p&gt;The implementation in &lt;code&gt;python/playwright/product_data/scraper/zappos_scraper_product_data_v1.py&lt;/code&gt; looks like this:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight python"&gt;&lt;code&gt;&lt;span class="kn"&gt;from&lt;/span&gt; &lt;span class="n"&gt;dataclasses&lt;/span&gt; &lt;span class="kn"&gt;import&lt;/span&gt; &lt;span class="n"&gt;dataclass&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;field&lt;/span&gt;
&lt;span class="kn"&gt;from&lt;/span&gt; &lt;span class="n"&gt;typing&lt;/span&gt; &lt;span class="kn"&gt;import&lt;/span&gt; &lt;span class="n"&gt;Dict&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;Any&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;Optional&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;List&lt;/span&gt;

&lt;span class="nd"&gt;@dataclass&lt;/span&gt;
&lt;span class="k"&gt;class&lt;/span&gt; &lt;span class="nc"&gt;ScrapedData&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;
    &lt;span class="n"&gt;aggregateRating&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="n"&gt;Dict&lt;/span&gt;&lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="nb"&gt;str&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;Any&lt;/span&gt;&lt;span class="p"&gt;]&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="nf"&gt;field&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;default_factory&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="nb"&gt;dict&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
    &lt;span class="n"&gt;availability&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="nb"&gt;str&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;in_stock&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;
    &lt;span class="n"&gt;brand&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="nb"&gt;str&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="sh"&gt;""&lt;/span&gt;
    &lt;span class="n"&gt;category&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="nb"&gt;str&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="sh"&gt;""&lt;/span&gt;
    &lt;span class="n"&gt;currency&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="nb"&gt;str&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;USD&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;
    &lt;span class="n"&gt;description&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="nb"&gt;str&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="sh"&gt;""&lt;/span&gt;
    &lt;span class="n"&gt;features&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="n"&gt;List&lt;/span&gt;&lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="nb"&gt;str&lt;/span&gt;&lt;span class="p"&gt;]&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="nf"&gt;field&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;default_factory&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="nb"&gt;list&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
    &lt;span class="n"&gt;images&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="n"&gt;List&lt;/span&gt;&lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="n"&gt;Dict&lt;/span&gt;&lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="nb"&gt;str&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;Any&lt;/span&gt;&lt;span class="p"&gt;]]&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="nf"&gt;field&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;default_factory&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="nb"&gt;list&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
    &lt;span class="n"&gt;name&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="nb"&gt;str&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="sh"&gt;""&lt;/span&gt;
    &lt;span class="n"&gt;preDiscountPrice&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="n"&gt;Optional&lt;/span&gt;&lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="nb"&gt;float&lt;/span&gt;&lt;span class="p"&gt;]&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="bp"&gt;None&lt;/span&gt;
    &lt;span class="n"&gt;price&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="nb"&gt;float&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="mf"&gt;0.0&lt;/span&gt;
    &lt;span class="n"&gt;productId&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="nb"&gt;str&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="sh"&gt;""&lt;/span&gt;
    &lt;span class="n"&gt;url&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="nb"&gt;str&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="sh"&gt;""&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;h3&gt;
  
  
  Why Explicit Types Matter
&lt;/h3&gt;

&lt;p&gt;By using &lt;code&gt;ScrapedData&lt;/code&gt;, we move away from generic, unpredictable dictionaries. &lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;  &lt;strong&gt;&lt;code&gt;price: float = 0.0&lt;/code&gt;&lt;/strong&gt;: This ensures that if a price is missing, we get a consistent numeric fallback rather than a &lt;code&gt;TypeError&lt;/code&gt; when a calculation hits &lt;code&gt;None&lt;/code&gt;.&lt;/li&gt;
&lt;li&gt;  &lt;strong&gt;&lt;code&gt;List[str]&lt;/code&gt;&lt;/strong&gt;: Explicitly typing lists tells your IDE and your pipeline exactly what to expect.&lt;/li&gt;
&lt;li&gt;  &lt;strong&gt;&lt;code&gt;Optional[float]&lt;/code&gt;&lt;/strong&gt;: This is vital for fields like &lt;code&gt;preDiscountPrice&lt;/code&gt;. Not every item is on sale. &lt;code&gt;Optional&lt;/code&gt; allows us to distinguish between a price of zero and a price that simply doesn't exist.&lt;/li&gt;
&lt;/ul&gt;

&lt;h2&gt;
  
  
  Phase 2: Enforcing Types – The Extraction Logic
&lt;/h2&gt;

&lt;p&gt;Defining a schema is only half the battle. The second half is the "enforcer" logic, the code that bridges the gap between a messy HTML string and your strict types. &lt;/p&gt;

&lt;p&gt;In the Zappos scraper, helper functions act as validators. Consider this &lt;code&gt;parse_price&lt;/code&gt; logic:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight python"&gt;&lt;code&gt;&lt;span class="k"&gt;def&lt;/span&gt; &lt;span class="nf"&gt;parse_price&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;price_str&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="nb"&gt;str&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt; &lt;span class="o"&gt;-&amp;gt;&lt;/span&gt; &lt;span class="nb"&gt;float&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;
    &lt;span class="k"&gt;if&lt;/span&gt; &lt;span class="ow"&gt;not&lt;/span&gt; &lt;span class="n"&gt;price_str&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; 
        &lt;span class="k"&gt;return&lt;/span&gt; &lt;span class="mf"&gt;0.0&lt;/span&gt;
    &lt;span class="c1"&gt;# Remove commas for large numbers like 1,200.00
&lt;/span&gt;    &lt;span class="n"&gt;cleaned&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;price_str&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;replace&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;,&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="sh"&gt;""&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
    &lt;span class="c1"&gt;# Use Regex to extract only the numeric parts, ignoring currency symbols
&lt;/span&gt;    &lt;span class="n"&gt;match&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;re&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;search&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="sa"&gt;r&lt;/span&gt;&lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="s"&gt;[\d,]+\.?\d*&lt;/span&gt;&lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;cleaned&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
    &lt;span class="k"&gt;if&lt;/span&gt; &lt;span class="n"&gt;match&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;
        &lt;span class="k"&gt;try&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;
            &lt;span class="k"&gt;return&lt;/span&gt; &lt;span class="nf"&gt;float&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;match&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;group&lt;/span&gt;&lt;span class="p"&gt;())&lt;/span&gt;
        &lt;span class="k"&gt;except&lt;/span&gt; &lt;span class="nb"&gt;ValueError&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;
            &lt;span class="k"&gt;return&lt;/span&gt; &lt;span class="mf"&gt;0.0&lt;/span&gt;
    &lt;span class="k"&gt;return&lt;/span&gt; &lt;span class="mf"&gt;0.0&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;h3&gt;
  
  
  The Strategy
&lt;/h3&gt;

&lt;p&gt;This function handles three common "dirty data" scenarios:&lt;/p&gt;

&lt;ol&gt;
&lt;li&gt; &lt;strong&gt;Currency Symbols&lt;/strong&gt;: It ignores symbols such as &lt;code&gt;$&lt;/code&gt; or &lt;code&gt;€&lt;/code&gt; by extracting only the numeric portion with a regex.&lt;/li&gt;
&lt;li&gt; &lt;strong&gt;Formatting&lt;/strong&gt;: It removes thousands-separator commas.&lt;/li&gt;
&lt;li&gt; &lt;strong&gt;Missing Data&lt;/strong&gt;: It returns a default &lt;code&gt;0.0&lt;/code&gt; instead of raising an exception that would crash the entire scraping loop.&lt;/li&gt;
&lt;/ol&gt;

&lt;p&gt;When building a scraper, use these cleaning utilities rather than accepting raw inner text.&lt;/p&gt;
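
&lt;p&gt;For optional fields such as &lt;code&gt;preDiscountPrice&lt;/code&gt;, a &lt;code&gt;0.0&lt;/code&gt; fallback would blur the line between "costs nothing" and "not on sale". One way to keep that distinction is a variant that returns &lt;code&gt;None&lt;/code&gt; when no number is present. This is an illustrative sketch, not code from the repository, and &lt;code&gt;parse_optional_price&lt;/code&gt; is a hypothetical name.&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight python"&gt;&lt;code&gt;import re


def parse_optional_price(price_str):
    """Return the first number in price_str as a float, or None if there is none."""
    if not price_str:
        return None
    # Drop thousands separators, then look for digits with an optional decimal part.
    match = re.search(r"\d+(?:\.\d+)?", price_str.replace(",", ""))
    return float(match.group()) if match else None


print(parse_optional_price("$1,200.50"))  # 1200.5
print(parse_optional_price("Sold out"))   # None
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;

&lt;p&gt;The result maps directly onto an &lt;code&gt;Optional[float]&lt;/code&gt; field, while required fields like &lt;code&gt;price&lt;/code&gt; can keep using the &lt;code&gt;0.0&lt;/code&gt; default.&lt;/p&gt;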

&lt;h2&gt;
  
  
  Phase 3: Handling Nulls and Defaults Safely
&lt;/h2&gt;

&lt;p&gt;One of the most frequent causes of pipeline failure is the "None" (null) value. If your database expects an array but receives &lt;code&gt;null&lt;/code&gt;, the import fails. &lt;/p&gt;

&lt;p&gt;The Zappos repository uses Python's &lt;code&gt;field(default_factory=list)&lt;/code&gt; to solve this. This ensures that even if no features or images are found on the page, the resulting JSON contains &lt;code&gt;[]&lt;/code&gt; instead of &lt;code&gt;null&lt;/code&gt;.&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight python"&gt;&lt;code&gt;&lt;span class="c1"&gt;# From python/playwright/product_data/scraper/zappos_scraper_product_data_v1.py
&lt;/span&gt;
&lt;span class="n"&gt;features&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="n"&gt;List&lt;/span&gt;&lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="nb"&gt;str&lt;/span&gt;&lt;span class="p"&gt;]&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="nf"&gt;field&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;default_factory&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="nb"&gt;list&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
&lt;span class="n"&gt;images&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="n"&gt;List&lt;/span&gt;&lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="n"&gt;Dict&lt;/span&gt;&lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="nb"&gt;str&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;Any&lt;/span&gt;&lt;span class="p"&gt;]]&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="nf"&gt;field&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;default_factory&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="nb"&gt;list&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;By using a &lt;code&gt;default_factory&lt;/code&gt;, every instance of &lt;code&gt;ScrapedData&lt;/code&gt; starts with a fresh, empty list. This maintains structural integrity. Your downstream code can always run &lt;code&gt;for image in data['images']&lt;/code&gt; without checking if the key exists or if it's null.&lt;/p&gt;
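
&lt;p&gt;The effect is easy to verify with a trimmed-down example. The &lt;code&gt;Product&lt;/code&gt; class below is a hypothetical stand-in for &lt;code&gt;ScrapedData&lt;/code&gt; with only four fields, used here to show that each instance gets its own list and that an untouched instance serializes to &lt;code&gt;[]&lt;/code&gt;.&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight python"&gt;&lt;code&gt;import json
from dataclasses import asdict, dataclass, field
from typing import List, Optional


@dataclass
class Product:  # hypothetical, trimmed-down stand-in for ScrapedData
    name: str = ""
    price: float = 0.0
    preDiscountPrice: Optional[float] = None
    features: List[str] = field(default_factory=list)


first = Product(name="Sneaker")
second = Product(name="Boot")
first.features.append("Cushioned insole")

# Each instance owns its own list, so the append above does not leak into `second`.
print(json.dumps(asdict(second)))
# {"name": "Boot", "price": 0.0, "preDiscountPrice": null, "features": []}
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;

&lt;p&gt;Note that the optional price serializes as &lt;code&gt;null&lt;/code&gt; while the list serializes as &lt;code&gt;[]&lt;/code&gt;, which matches the contract described in Phase 1.&lt;/p&gt;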

&lt;h2&gt;
  
  
  Phase 4: Node.js Comparison – Type Safety in JavaScript
&lt;/h2&gt;

&lt;p&gt;While JavaScript lacks native &lt;code&gt;dataclasses&lt;/code&gt;, the Zappos repository achieves the same discipline in its Node.js implementation. &lt;/p&gt;

&lt;p&gt;The Node scraper uses a functional approach to mimic type safety in &lt;code&gt;node/playwright/product_data/scraper/zappos_scraper_product_data_v1.js&lt;/code&gt;:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight javascript"&gt;&lt;code&gt;&lt;span class="kd"&gt;const&lt;/span&gt; &lt;span class="nx"&gt;parsePrice&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="nx"&gt;priceText&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt; &lt;span class="o"&gt;=&amp;gt;&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt;
    &lt;span class="k"&gt;if &lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="o"&gt;!&lt;/span&gt;&lt;span class="nx"&gt;priceText&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt; &lt;span class="k"&gt;return&lt;/span&gt; &lt;span class="mf"&gt;0.0&lt;/span&gt;&lt;span class="p"&gt;;&lt;/span&gt;
    &lt;span class="c1"&gt;// Strip commas and extract the float&lt;/span&gt;
    &lt;span class="kd"&gt;const&lt;/span&gt; &lt;span class="nx"&gt;match&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="nx"&gt;priceText&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;replace&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="sr"&gt;/,/g&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="dl"&gt;''&lt;/span&gt;&lt;span class="p"&gt;).&lt;/span&gt;&lt;span class="nf"&gt;match&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="sr"&gt;/&lt;/span&gt;&lt;span class="se"&gt;([\d&lt;/span&gt;&lt;span class="sr"&gt;,&lt;/span&gt;&lt;span class="se"&gt;]&lt;/span&gt;&lt;span class="sr"&gt;+&lt;/span&gt;&lt;span class="se"&gt;\.?\d&lt;/span&gt;&lt;span class="sr"&gt;*&lt;/span&gt;&lt;span class="se"&gt;)&lt;/span&gt;&lt;span class="sr"&gt;/&lt;/span&gt;&lt;span class="p"&gt;);&lt;/span&gt;
    &lt;span class="k"&gt;return&lt;/span&gt; &lt;span class="nx"&gt;match&lt;/span&gt; &lt;span class="p"&gt;?&lt;/span&gt; &lt;span class="nf"&gt;parseFloat&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="nx"&gt;match&lt;/span&gt;&lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="mi"&gt;1&lt;/span&gt;&lt;span class="p"&gt;])&lt;/span&gt; &lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="mf"&gt;0.0&lt;/span&gt;&lt;span class="p"&gt;;&lt;/span&gt;
&lt;span class="p"&gt;};&lt;/span&gt;

&lt;span class="c1"&gt;// Usage inside the extraction logic&lt;/span&gt;
&lt;span class="kd"&gt;const&lt;/span&gt; &lt;span class="nx"&gt;outputData&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt;
    &lt;span class="na"&gt;price&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="nf"&gt;parsePrice&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="nf"&gt;$&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="dl"&gt;'&lt;/span&gt;&lt;span class="s1"&gt;.price-selector&lt;/span&gt;&lt;span class="dl"&gt;'&lt;/span&gt;&lt;span class="p"&gt;).&lt;/span&gt;&lt;span class="nf"&gt;text&lt;/span&gt;&lt;span class="p"&gt;()),&lt;/span&gt;
    &lt;span class="na"&gt;availability&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="dl"&gt;"&lt;/span&gt;&lt;span class="s2"&gt;in_stock&lt;/span&gt;&lt;span class="dl"&gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="c1"&gt;// Default value&lt;/span&gt;
    &lt;span class="na"&gt;features&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="p"&gt;[]&lt;/span&gt; &lt;span class="c1"&gt;// Initialized as empty array&lt;/span&gt;
&lt;span class="p"&gt;};&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;Python offers better developer tooling through type hints, while plain JavaScript has no built-in type annotations and relies on runtime discipline. However, by routing every price through a centralized &lt;code&gt;parsePrice&lt;/code&gt; function, the Zappos repository keeps the final JSON output consistent regardless of the language used.&lt;/p&gt;

&lt;h2&gt;
  
  
  Phase 5: Prompting for Strict Code
&lt;/h2&gt;

&lt;p&gt;This repository was generated using the &lt;strong&gt;ScrapeOps AI Scraper Generator&lt;/strong&gt;. When using AI to build scrapers, don't just ask it to "scrape Zappos." To get production-grade results, your prompt should include the schema requirements.&lt;/p&gt;

&lt;h3&gt;
  
  
  Example of a Schema-First Prompt:
&lt;/h3&gt;

&lt;blockquote&gt;
&lt;p&gt;"Extract product data from Zappos. Use the following JSON schema. Constraints: Prices must be floats (remove currency symbols), lists like 'features' must always return an empty array if no data is found, and 'availability' must be mapped to the string 'in_stock' or 'out_of_stock'."&lt;/p&gt;
&lt;/blockquote&gt;

&lt;p&gt;Providing the schema as the primary requirement forces the generator to create the helper functions (&lt;code&gt;parse_price&lt;/code&gt;, &lt;code&gt;clean_float&lt;/code&gt;) shown in the Zappos repository. This moves the complexity from the data processing stage to the extraction stage, where it belongs.&lt;/p&gt;

&lt;h2&gt;
  
  
  To Wrap Up
&lt;/h2&gt;

&lt;p&gt;Strict schema validation is the difference between a script and a data product. By enforcing types at the edge of your pipeline, inside the scraper itself, you prevent technical debt from accumulating in your databases.&lt;/p&gt;

&lt;p&gt;The &lt;a href="https://github.com/scraper-bank/Zappos.com-Scrapers.git" rel="noopener noreferrer"&gt;Zappos.com-Scrapers&lt;/a&gt; repository demonstrates these principles:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;  &lt;strong&gt;Use Dataclasses&lt;/strong&gt; to define a clear contract for your data.&lt;/li&gt;
&lt;li&gt;  &lt;strong&gt;Implement Helper Functions&lt;/strong&gt; like &lt;code&gt;parse_price&lt;/code&gt; to handle HTML inconsistencies.&lt;/li&gt;
&lt;li&gt;  &lt;strong&gt;Default to Empty Collections&lt;/strong&gt; instead of nulls to keep pipelines running smoothly.&lt;/li&gt;
&lt;li&gt;  &lt;strong&gt;Ensure Language Agnosticism&lt;/strong&gt; so your parsing logic produces identical JSON whether you use Python or Node.js.&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;If you're starting a new project, use the &lt;a href="https://scrapeops.io/app/register/ai-scraper" rel="noopener noreferrer"&gt;ScrapeOps AI Scraper Generator&lt;/a&gt; to build the base extraction logic, then add Pydantic for production-grade data validation.&lt;/p&gt;

</description>
      <category>webscraping</category>
      <category>devops</category>
      <category>schema</category>
      <category>cicd</category>
    </item>
  </channel>
</rss>
