In e-commerce, price isn't the only factor that determines success. If a customer searches for "winter boots" on Zappos and your product appears on the third page, the sale is likely lost before the customer even sees your price tag. This visibility is known as Share of Shelf (SoS).
Measuring your digital Share of Shelf manually is impossible at scale. Zappos is a highly protected site that frequently updates its layout and employs sophisticated bot detection. To gain a competitive edge, you need to automate the collection of search rankings.
This guide covers how to build a Python script using Selenium and undetected-chromedriver to track product positions for specific keywords. By the end, you’ll be able to calculate exactly what percentage of the "digital shelf" your brand owns on Zappos.
Prerequisites
To follow along, you'll need:
- Python 3.7+
- A ScrapeOps API Key: Zappos uses aggressive anti-bot measures. ScrapeOps proxies rotate IPs to bypass these blocks. You can sign up for a free API key on the ScrapeOps website.
- Chrome Browser (required for Selenium).
Environment Setup
We'll use undetected-chromedriver to make the browser instance look like a real user and selenium-wire to handle proxy authentication.
pip install selenium-wire undetected-chromedriver pandas
Mapping the Zappos Search Page
Before writing code, examine how Zappos structures its search results. Open Chrome DevTools (F12) on a search page to find these patterns:
- Product Containers: Each product is wrapped in an `<article>` tag, typically using classes like `yW-z` or attributes like `data-style-id`.
- Data Points: Inside these articles, you'll find the brand name, product name, and price.
- Pagination: Zappos uses a URL parameter (usually `p`) for pagination. For example, `&p=1` loads page two.
- JSON-LD: Zappos often embeds JSON-LD (Linked Data) in `<script type="application/ld+json">` tags. This is a goldmine for scrapers because structured data is much more stable than HTML classes.
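Because those JSON-LD blocks survive layout redesigns, it is worth checking for them before falling back to CSS selectors. Here is a minimal sketch of pulling structured data out of raw page HTML with only the standard library; the `ItemList` shape shown in `sample_html` is an assumption about how a product listing might be encoded, not the verified Zappos schema.

```python
import json
import re

def extract_json_ld(html):
    """Pull every application/ld+json block out of raw HTML.

    Returns a list of parsed JSON objects; blocks that fail to
    parse are skipped rather than crashing the scraper.
    """
    pattern = re.compile(
        r'<script type="application/ld\+json">(.*?)</script>',
        re.DOTALL,
    )
    blocks = []
    for match in pattern.findall(html):
        try:
            blocks.append(json.loads(match))
        except json.JSONDecodeError:
            continue
    return blocks

# Illustrative ItemList, similar in spirit to product-listing JSON-LD
sample_html = """
<script type="application/ld+json">
{"@type": "ItemList", "itemListElement": [
  {"position": 1, "name": "Trail Runner X"},
  {"position": 2, "name": "Road Glide 7"}
]}
</script>
"""
for block in extract_json_ld(sample_html):
    if block.get("@type") == "ItemList":
        for item in block["itemListElement"]:
            print(item["position"], item["name"])
```

In a live scraper you would feed `driver.page_source` into `extract_json_ld` instead of the sample string.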
Implementing the Search Scraper
The core scraper needs to use ScrapeOps proxies to ensure Zappos doesn't rate-limit your IP address after a few requests.
Proxy and Driver Configuration
We use selenium-wire to inject ScrapeOps credentials into the request headers.
# selenium-wire ships an undetected-chromedriver integration, so one
# driver gets both proxy authentication and stealth patches
from seleniumwire import undetected_chromedriver as uc

API_KEY = "YOUR_SCRAPEOPS_API_KEY"

# ScrapeOps Residential Proxy Configuration
PROXY_CONFIG = {
    'proxy': {
        'http': f'http://scrapeops:{API_KEY}@residential-proxy.scrapeops.io:8181',
        'https': f'http://scrapeops:{API_KEY}@residential-proxy.scrapeops.io:8181',
        'no_proxy': 'localhost,127.0.0.1'
    }
}

def get_driver():
    options = uc.ChromeOptions()
    options.add_argument("--headless=new")  # Modern headless mode, no visible window
    driver = uc.Chrome(
        options=options,
        seleniumwire_options=PROXY_CONFIG
    )
    return driver
The Extraction Logic
Next, we need a function to loop through the search results. We'll focus on grabbing the brand and product name to identify which items belong to you versus your competitors.
from selenium.webdriver.common.by import By

ITEMS_PER_PAGE = 100  # Zappos shows up to 100 results per page

def extract_search_results(driver, page_number):
    products = []
    # Locate all product articles (class names are obfuscated and may change)
    elements = driver.find_elements(By.CSS_SELECTOR, "article.yW-z")
    for index, el in enumerate(elements):
        try:
            # Organic rank: (Page - 1) * ItemsPerPage + CurrentIndex + 1
            rank = ((page_number - 1) * ITEMS_PER_PAGE) + index + 1
            brand = el.find_element(By.CSS_SELECTOR, ".dC-z").text
            name = el.find_element(By.CSS_SELECTOR, ".eC-z").text
            price = el.find_element(By.CSS_SELECTOR, "._Y-z").text
            products.append({
                "rank": rank,
                "brand": brand.strip(),
                "product_name": name.strip(),
                "price": price.strip()
            })
        except Exception:
            # Skip cards missing any field (ads, placeholders, layout changes)
            continue
    return products
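Listing pages can occasionally surface the same product more than once (sponsored placements are a common cause), which would inflate a brand's share. Before aggregating, you may want to keep only the first, highest-ranked occurrence of each item. A small helper, assuming the dict shape produced by `extract_search_results` above:

```python
def dedupe_results(products):
    """Keep only the first (highest-ranked) occurrence of each product."""
    seen = set()
    unique = []
    for item in products:
        key = (item["brand"], item["product_name"])
        if key not in seen:
            seen.add(key)
            unique.append(item)
    return unique

# Illustrative duplicate: the same shoe appearing twice in the results
sample = [
    {"rank": 1, "brand": "Hoka", "product_name": "Clifton 9", "price": "$145"},
    {"rank": 2, "brand": "Hoka", "product_name": "Clifton 9", "price": "$145"},
    {"rank": 3, "brand": "Brooks", "product_name": "Ghost 16", "price": "$140"},
]
print(len(dedupe_results(sample)))  # 2
```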
Calculating Share of Shelf
To calculate SoS, aggregate results across multiple pages and group them by brand. If you are monitoring "running shoes," you want to know what percentage of the top 50 or 100 results your brand occupies.
def run_sos_analysis(keyword, target_pages=2):
    driver = get_driver()
    # Wait up to 10s for elements to appear (set once for the whole session)
    driver.implicitly_wait(10)
    search_url = f"https://www.zappos.com/search?term={keyword.replace(' ', '+')}"
    all_data = []
    for page in range(1, target_pages + 1):
        # Zappos pagination is zero-indexed: p=0 is page one
        url = f"{search_url}&p={page - 1}"
        driver.get(url)
        page_results = extract_search_results(driver, page)
        all_data.extend(page_results)
        print(f"Scraped {len(page_results)} items from page {page}")
    driver.quit()
    return all_data

# Example Usage
results = run_sos_analysis("running shoes", target_pages=1)
Analyzing the Data with Pandas
With the data collected, use Pandas to calculate the visibility metrics.
import pandas as pd
df = pd.DataFrame(results)
# Calculate Share of Shelf (Percentage of total visible slots)
sos = df['brand'].value_counts(normalize=True) * 100
print("--- Share of Shelf Analysis ---")
print(sos.head(10)) # Top 10 brands by visibility
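The `value_counts` call above spreads the percentage across every scraped slot, but managers often care specifically about the top of the shelf. Here is a sketch of computing one brand's share of the first N ranked slots, using a small illustrative DataFrame in the same shape as the scraper's output (the brand names and cutoff are made up):

```python
import pandas as pd

def share_of_top_n(df, brand, n=50):
    """Percentage of the first n ranked slots occupied by `brand`."""
    top = df.sort_values("rank").head(n)
    if top.empty:
        return 0.0
    return 100.0 * (top["brand"] == brand).mean()

# Illustrative data in the same shape as the scraper's output
sample = pd.DataFrame({
    "rank": [1, 2, 3, 4],
    "brand": ["Nike", "Hoka", "Nike", "Brooks"],
})
print(share_of_top_n(sample, "Nike", n=4))  # 50.0
```

Running this against the real `df` with `n=50` answers the "top 50" question posed earlier.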
Overcoming Anti-Bot Challenges
Zappos uses several layers of protection, including TLS fingerprinting and behavioral analysis. This script addresses these issues in three ways:
- Undetected Chromedriver: Standard Selenium is easily detected via the `navigator.webdriver` property. `undetected-chromedriver` patches the binary to hide these signals.
- Proxy Rotation: Using the ScrapeOps Residential Proxy network makes every request appear to come from a different residential connection, preventing IP-based rate limiting.
- Modern Headless Mode: Running in `headless=new` mode ensures the page renders exactly as a user would see it, without the performance overhead of a GUI.
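One layer the script above does not include is request pacing: firing page loads back-to-back at machine speed is itself a behavioral signal. A jittered delay between `driver.get()` calls is a cheap mitigation; this helper is a minimal sketch, and the 2-5 second defaults are an assumption rather than a documented threshold.

```python
import random
import time

def polite_pause(min_s=2.0, max_s=5.0):
    """Sleep for a random, human-ish interval between page loads."""
    time.sleep(random.uniform(min_s, max_s))
```

You would call `polite_pause()` inside the pagination loop, after each `extract_search_results`.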
Exporting for Business Intelligence
While JSON is useful for development, e-commerce managers usually need Excel or CSV files.
# Save to CSV for reporting
output_file = "zappos_sos_report.csv"
df.to_csv(output_file, index=False)
print(f"Analysis complete. Data saved to {output_file}")
This CSV provides a ranked list of every product on the search page. Running this script daily lets you see whether SEO efforts or ad spend are actually improving your shelf position.
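Since the value of SoS tracking comes from trends over time, one approach to daily runs is to append each day's results to a running history file instead of overwriting the report. A sketch, assuming the `df` built earlier (the history file name is illustrative):

```python
from datetime import date
import os

import pandas as pd

def append_to_history(df, path="zappos_sos_history.csv"):
    """Stamp today's date on the results and append them to a running log."""
    stamped = df.copy()
    stamped["capture_date"] = date.today().isoformat()
    # Write the header only when creating the file for the first time
    stamped.to_csv(path, mode="a", header=not os.path.exists(path), index=False)
```

Loading the history file back into Pandas then lets you plot each brand's share per `capture_date`.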
To Wrap Up
Measuring Share of Shelf on Zappos provides a clear picture of your organic visibility compared to competitors. Automating this process moves your strategy away from guesswork and toward data-driven decision making.
Key Takeaways:
- Rank is relative: Your position on page one is often more valuable than your price.
- Use structured data: Check for JSON-LD scripts first, as they are more reliable than CSS selectors.
- Rotate proxies: E-commerce sites block data center IPs quickly. Residential proxies are necessary for consistent data collection.
For more advanced implementations, including Node.js and Playwright versions, check out the Zappos Scraper Bank on GitHub. Your next step is to schedule this script to run automatically to track visibility fluctuations in real-time.