In e-commerce, price isn't the only factor that determines success. If a customer searches for "winter boots" on Zappos and your product appears on the third page, the sale is likely lost before the customer even sees your price tag. This visibility is known as Share of Shelf (SoS).
Measuring your digital Share of Shelf manually is impossible at scale. Zappos is a highly protected site that frequently updates its layout and employs sophisticated bot detection. To gain a competitive edge, you need to automate the collection of search rankings.
This guide covers how to build a Python script using Selenium and undetected-chromedriver to track product positions for specific keywords. By the end, you’ll be able to calculate exactly what percentage of the "digital shelf" your brand owns on Zappos.
Prerequisites
To follow along, you'll need:
- Python 3.7+
- A ScrapeOps API Key: Zappos uses aggressive anti-bot measures. ScrapeOps proxies rotate IPs to bypass these blocks. You can sign up for a free API key on the ScrapeOps website.
- Chrome Browser (required for Selenium).
Environment Setup
We'll use undetected-chromedriver to make the browser instance look like a real user and selenium-wire to handle proxy authentication.
pip install selenium-wire undetected-chromedriver pandas
Mapping the Zappos Search Page
Before writing code, examine how Zappos structures its search results. Open Chrome DevTools (F12) on a search page to find these patterns:
- Product Containers: Each product is wrapped in an `<article>` tag, typically using classes like `yW-z` or attributes like `data-style-id`.
- Data Points: Inside these articles, you'll find the brand name, product name, and price.
- Pagination: Zappos uses a URL parameter (usually `p`) for pagination. For example, `&p=1` loads page two.
- JSON-LD: Zappos often embeds JSON-LD (Linked Data) in `<script type="application/ld+json">` tags. This is a goldmine for scrapers because structured data is much more stable than HTML classes.
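Because those JSON-LD blocks survive layout redesigns, it is worth checking for them before falling back to CSS selectors. Here is a minimal sketch of pulling structured data out of raw page HTML with only the standard library; the `ItemList` shape shown in `sample_html` is an assumption about how a product listing might be encoded, not the verified Zappos schema.

```python
import json
import re

def extract_json_ld(html):
    """Pull every application/ld+json block out of raw HTML.

    Returns a list of parsed JSON objects; blocks that fail to
    parse are skipped rather than crashing the scraper.
    """
    pattern = re.compile(
        r'<script type="application/ld\+json">(.*?)</script>',
        re.DOTALL,
    )
    blocks = []
    for match in pattern.findall(html):
        try:
            blocks.append(json.loads(match))
        except json.JSONDecodeError:
            continue
    return blocks

# Illustrative ItemList, similar in spirit to product-listing JSON-LD
sample_html = """
<script type="application/ld+json">
{"@type": "ItemList", "itemListElement": [
  {"position": 1, "name": "Trail Runner X"},
  {"position": 2, "name": "Road Glide 7"}
]}
</script>
"""
for block in extract_json_ld(sample_html):
    if block.get("@type") == "ItemList":
        for item in block["itemListElement"]:
            print(item["position"], item["name"])
```

In a live scraper you would feed `driver.page_source` into `extract_json_ld` instead of the sample string.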
Implementing the Search Scraper
The core scraper needs to use ScrapeOps proxies to ensure Zappos doesn't rate-limit your IP address after a few requests.
Proxy and Driver Configuration
We use selenium-wire to inject ScrapeOps credentials into the request headers.
# selenium-wire ships an undetected-chromedriver integration, so one
# driver gets both proxy authentication and stealth patches
from seleniumwire import undetected_chromedriver as uc

API_KEY = "YOUR_SCRAPEOPS_API_KEY"

# ScrapeOps Residential Proxy Configuration
PROXY_CONFIG = {
    'proxy': {
        'http': f'http://scrapeops:{API_KEY}@residential-proxy.scrapeops.io:8181',
        'https': f'http://scrapeops:{API_KEY}@residential-proxy.scrapeops.io:8181',
        'no_proxy': 'localhost,127.0.0.1'
    }
}

def get_driver():
    options = uc.ChromeOptions()
    options.add_argument("--headless=new")  # Modern headless mode, no visible window
    driver = uc.Chrome(
        options=options,
        seleniumwire_options=PROXY_CONFIG
    )
    return driver
The Extraction Logic
Next, we need a function to loop through the search results. We'll focus on grabbing the brand and product name to identify which items belong to you versus your competitors.
from selenium.webdriver.common.by import By

ITEMS_PER_PAGE = 100  # Zappos shows up to 100 results per page

def extract_search_results(driver, page_number):
    products = []
    # Locate all product articles (class names are obfuscated and may change)
    elements = driver.find_elements(By.CSS_SELECTOR, "article.yW-z")
    for index, el in enumerate(elements):
        try:
            # Organic rank: (Page - 1) * ItemsPerPage + CurrentIndex + 1
            rank = ((page_number - 1) * ITEMS_PER_PAGE) + index + 1
            brand = el.find_element(By.CSS_SELECTOR, ".dC-z").text
            name = el.find_element(By.CSS_SELECTOR, ".eC-z").text
            price = el.find_element(By.CSS_SELECTOR, "._Y-z").text
            products.append({
                "rank": rank,
                "brand": brand.strip(),
                "product_name": name.strip(),
                "price": price.strip()
            })
        except Exception:
            # Skip cards missing any field (ads, placeholders, layout changes)
            continue
    return products
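Listing pages can occasionally surface the same product more than once (sponsored placements are a common cause), which would inflate a brand's share. Before aggregating, you may want to keep only the first, highest-ranked occurrence of each item. A small helper, assuming the dict shape produced by `extract_search_results` above:

```python
def dedupe_results(products):
    """Keep only the first (highest-ranked) occurrence of each product."""
    seen = set()
    unique = []
    for item in products:
        key = (item["brand"], item["product_name"])
        if key not in seen:
            seen.add(key)
            unique.append(item)
    return unique

# Illustrative duplicate: the same shoe appearing twice in the results
sample = [
    {"rank": 1, "brand": "Hoka", "product_name": "Clifton 9", "price": "$145"},
    {"rank": 2, "brand": "Hoka", "product_name": "Clifton 9", "price": "$145"},
    {"rank": 3, "brand": "Brooks", "product_name": "Ghost 16", "price": "$140"},
]
print(len(dedupe_results(sample)))  # 2
```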
Calculating Share of Shelf
To calculate SoS, aggregate results across multiple pages and group them by brand. If you are monitoring "running shoes," you want to know what percentage of the top 50 or 100 results your brand occupies.
def run_sos_analysis(keyword, target_pages=2):
    driver = get_driver()
    # Wait up to 10s for elements to appear (set once for the whole session)
    driver.implicitly_wait(10)
    search_url = f"https://www.zappos.com/search?term={keyword.replace(' ', '+')}"
    all_data = []
    for page in range(1, target_pages + 1):
        # Zappos pagination is zero-indexed: p=0 is page one
        url = f"{search_url}&p={page - 1}"
        driver.get(url)
        page_results = extract_search_results(driver, page)
        all_data.extend(page_results)
        print(f"Scraped {len(page_results)} items from page {page}")
    driver.quit()
    return all_data

# Example Usage
results = run_sos_analysis("running shoes", target_pages=1)
Analyzing the Data with Pandas
With the data collected, use Pandas to calculate the visibility metrics.
import pandas as pd
df = pd.DataFrame(results)
# Calculate Share of Shelf (Percentage of total visible slots)
sos = df['brand'].value_counts(normalize=True) * 100
print("--- Share of Shelf Analysis ---")
print(sos.head(10)) # Top 10 brands by visibility
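The `value_counts` call above spreads the percentage across every scraped slot, but managers often care specifically about the top of the shelf. Here is a sketch of computing one brand's share of the first N ranked slots, using a small illustrative DataFrame in the same shape as the scraper's output (the brand names and cutoff are made up):

```python
import pandas as pd

def share_of_top_n(df, brand, n=50):
    """Percentage of the first n ranked slots occupied by `brand`."""
    top = df.sort_values("rank").head(n)
    if top.empty:
        return 0.0
    return 100.0 * (top["brand"] == brand).mean()

# Illustrative data in the same shape as the scraper's output
sample = pd.DataFrame({
    "rank": [1, 2, 3, 4],
    "brand": ["Nike", "Hoka", "Nike", "Brooks"],
})
print(share_of_top_n(sample, "Nike", n=4))  # 50.0
```

Running this against the real `df` with `n=50` answers the "top 50" question posed earlier.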
Overcoming Anti-Bot Challenges
Zappos uses several layers of protection, including TLS fingerprinting and behavioral analysis. This script addresses these issues in three ways:
- Undetected Chromedriver: Standard Selenium is easily detected via the `navigator.webdriver` property. `undetected-chromedriver` patches the binary to hide these signals.
- Proxy Rotation: Using the ScrapeOps Residential Proxy network makes every request appear to come from a different residential connection, preventing IP-based rate limiting.
- Modern Headless Mode: Running in `headless=new` mode ensures the page renders exactly as a user would see it, without the performance overhead of a GUI.
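One layer the script above does not include is request pacing: firing page loads back-to-back at machine speed is itself a behavioral signal. A jittered delay between `driver.get()` calls is a cheap mitigation; this helper is a minimal sketch, and the 2-5 second defaults are an assumption rather than a documented threshold.

```python
import random
import time

def polite_pause(min_s=2.0, max_s=5.0):
    """Sleep for a random, human-ish interval between page loads."""
    time.sleep(random.uniform(min_s, max_s))
```

You would call `polite_pause()` inside the pagination loop, after each `extract_search_results`.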
Exporting for Business Intelligence
While JSON is useful for development, e-commerce managers usually need Excel or CSV files.
# Save to CSV for reporting
output_file = "zappos_sos_report.csv"
df.to_csv(output_file, index=False)
print(f"Analysis complete. Data saved to {output_file}")
This CSV provides a ranked list of every product on the search page. Running this script daily lets you see whether SEO efforts or ad spend are actually improving your shelf position.
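Since the value of SoS tracking comes from trends over time, one approach to daily runs is to append each day's results to a running history file instead of overwriting the report. A sketch, assuming the `df` built earlier (the history file name is illustrative):

```python
from datetime import date
import os

import pandas as pd

def append_to_history(df, path="zappos_sos_history.csv"):
    """Stamp today's date on the results and append them to a running log."""
    stamped = df.copy()
    stamped["capture_date"] = date.today().isoformat()
    # Write the header only when creating the file for the first time
    stamped.to_csv(path, mode="a", header=not os.path.exists(path), index=False)
```

Loading the history file back into Pandas then lets you plot each brand's share per `capture_date`.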
To Wrap Up
Measuring Share of Shelf on Zappos provides a clear picture of your organic visibility compared to competitors. Automating this process moves your strategy away from guesswork and toward data-driven decision making.
Key Takeaways:
- Rank is relative: Your position on page one is often more valuable than your price.
- Use structured data: Check for JSON-LD scripts first, as they are more reliable than CSS selectors.
- Rotate proxies: E-commerce sites block data center IPs quickly. Residential proxies are necessary for consistent data collection.
For more advanced implementations, including Node.js and Playwright versions, check out the Zappos Scraper Bank on GitHub. Your next step is to schedule this script to run automatically to track visibility fluctuations in real-time.