DEV Community

Robert N. Gutierrez

Building a Costco Price Tracker with Python, Playwright, and Streamlit

In e-commerce, prices are rarely static. Retail giants like Costco adjust prices based on inventory levels, seasonal promotions, and regional demand. If you are a developer hunting for deals or an analyst monitoring competitive intelligence, a single snapshot of a price isn't enough. You need historical data to identify trends and predict future drops.

This guide walks through building a production-ready Costco price tracker. Using the open-source Costco.com-Scrapers repository, we will extract product data with Playwright, store it with timestamps, and visualize the history in a Streamlit dashboard.

Prerequisites

Before starting, ensure you have:

  • Python 3.8+
  • ScrapeOps API Key: Costco employs aggressive anti-bot measures. We’ll use ScrapeOps to handle proxy rotation and browser headers. You can get a free API key from the ScrapeOps website.
  • Terminal access: Comfort working within a virtual environment.

Setup and Installation

Clone the repository and install the dependencies:

# Clone the repository
git clone https://github.com/scraper-bank/Costco.com-Scrapers.git
cd Costco.com-Scrapers/python/playwright

# Create and activate a virtual environment
python -m venv venv
source venv/bin/activate  # On Windows use `venv\Scripts\activate`

# Install requirements
pip install playwright playwright-stealth streamlit pandas
playwright install chromium

Part 1: Data Extraction

The foundation of the tracker is the Playwright scraper located at python/playwright/product_data/scraper/costco_scraper_product_data_v1.py. This script uses playwright-stealth and structured data extraction to avoid detection.

A standard scraper only captures the current price. To track trends, we need to modify the script to include a timestamp. Update the ScrapedData dataclass and the extraction logic to ensure every record is ready for time-series analysis.

Modifying for Time-Series Data

Import the datetime module and add a scraped_at field to the data object:

from dataclasses import dataclass
from datetime import datetime
from typing import Optional

from playwright.async_api import Page

@dataclass
class ScrapedData:
    # ... existing fields ...
    scraped_at: str = ""  # Add this field

async def extract_data(page: Page) -> Optional[ScrapedData]:
    # ... existing extraction logic ...
    res = ScrapedData()

    # Inject the current timestamp
    res.scraped_at = datetime.now().isoformat()

    # Extract name, price, etc.
    # Playwright targets the JSON-LD schema for accuracy
    scripts = await page.locator("script[type='application/ld+json']").all_text_contents()
    # ...
    return res

Adding datetime.now().isoformat() ensures that every time the scraper runs, the price is anchored to a specific moment. This allows the dashboard to plot price changes over time.
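The round-trip below shows why ISO-8601 strings are a safe storage format: `datetime.fromisoformat` recovers the exact moment later, with no parsing ambiguity (a minimal sketch, not code from the repository):

```python
from datetime import datetime

# Capture a timestamp the same way the scraper does
stamp = datetime.now().isoformat()

# The ISO-8601 string round-trips losslessly, so the dashboard can
# reconstruct the exact scrape time from the stored field
parsed = datetime.fromisoformat(stamp)
assert parsed.isoformat() == stamp
```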

Part 2: Handling Anti-Bot Measures

Scraping Costco repeatedly from a local IP usually results in a 403 Forbidden error or a CAPTCHA. The repository addresses this by integrating with the ScrapeOps Residential Proxy Aggregator.

Residential proxies route requests through real home IP addresses, making the scraper look like a regular shopper. In your costco_scraper_product_data_v1.py file, configure your ScrapeOps credentials:

# ScrapeOps Residential Proxy Configuration
PROXY_CONFIG = {
    "server": "http://residential-proxy.scrapeops.io:8181",
    "username": "scrapeops",
    "password": "YOUR_SCRAPEOPS_API_KEY"
}

The script uses these credentials to authenticate every request. ScrapeOps handles the rotation automatically, so you don't need to manage proxy lists manually.
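In Playwright, proxy credentials like these are passed to the browser `launch()` call via its `proxy` option. The helper below is illustrative (the function name and structure are not from the repository), but the `proxy` dictionary matches the shape Playwright's Python API expects:

```python
# Sketch: assembling launch options that carry the ScrapeOps proxy
# settings shown above. build_launch_options is a hypothetical helper.
def build_launch_options(api_key: str) -> dict:
    return {
        "headless": True,
        "proxy": {
            "server": "http://residential-proxy.scrapeops.io:8181",
            "username": "scrapeops",
            "password": api_key,  # your ScrapeOps API key
        },
    }

opts = build_launch_options("YOUR_SCRAPEOPS_API_KEY")
# Later, inside the async scraper:
#     browser = await p.chromium.launch(**opts)
```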

Part 3: Automating the Tracker

To build a useful history, the scraper needs to run periodically, such as once every 24 hours. Instead of overwriting files, we use the JSONL (JSON Lines) format.

The DataPipeline class in the repository appends a new line to the file for every successful scrape:

class DataPipeline:
    def __init__(self, jsonl_filename="price_history.jsonl"):
        self.jsonl_filename = jsonl_filename

    def add_data(self, scraped_data: ScrapedData):
        with open(self.jsonl_filename, mode="a", encoding="UTF-8") as output_file:
            json_line = json.dumps(asdict(scraped_data), ensure_ascii=False)
            output_file.write(json_line + "\n")
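The append-then-read round trip below demonstrates why JSONL works well here: each scrape adds one self-contained line, and the full history is recovered by parsing the file line by line (a self-contained sketch with simplified fields, written to a temporary directory):

```python
import json
import os
import tempfile
from dataclasses import asdict, dataclass

@dataclass
class ScrapedData:
    name: str = ""
    price: str = ""
    scraped_at: str = ""

# Append two records the same way DataPipeline.add_data does
path = os.path.join(tempfile.mkdtemp(), "price_history.jsonl")
records = [
    ScrapedData("Item", "19.99", "2024-01-01T00:00:00"),
    ScrapedData("Item", "17.99", "2024-01-02T00:00:00"),
]
for rec in records:
    with open(path, mode="a", encoding="UTF-8") as f:
        f.write(json.dumps(asdict(rec), ensure_ascii=False) + "\n")

# Read the history back, one JSON object per line
with open(path, encoding="UTF-8") as f:
    history = [json.loads(line) for line in f]

assert len(history) == 2
assert history[1]["price"] == "17.99"
```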

You can automate this with a Cron job on Linux/Mac or a Scheduled Task on Windows. Here is a basic shell script (run_tracker.sh) to trigger the scrape:

#!/bin/bash
# run_tracker.sh
python costco_scraper_product_data_v1.py --urls "https://www.costco.com/product-1.html" "https://www.costco.com/product-2.html"
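On Linux/Mac, a crontab entry like the following runs the script once a day at 3 a.m. (the paths are illustrative; point them at your own checkout and log location):

```shell
# crontab -e
# Run the tracker daily at 03:00 and append output to a log file
0 3 * * * /home/user/Costco.com-Scrapers/python/playwright/run_tracker.sh >> /home/user/costco_tracker.log 2>&1
```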

Part 4: Building the Dashboard

With price_history.jsonl collecting data, we can visualize it using Streamlit. Create a file named app.py:

import streamlit as st
import pandas as pd
import json

st.set_page_config(page_title="Costco Price Tracker", layout="wide")
st.title("πŸ›’ Costco Price Intelligence Dashboard")

# Load data from JSONL
@st.cache_data(ttl=3600)
def load_data():
    data = []
    with open('price_history.jsonl', 'r') as f:
        for line in f:
            data.append(json.loads(line))
    df = pd.DataFrame(data)

    # Data Cleaning
    df['date'] = pd.to_datetime(df['scraped_at'])
    df['price'] = pd.to_numeric(df['price'], errors='coerce')
    return df

try:
    df = load_data()

    # Sidebar for filtering
    product_list = df['name'].unique()
    selected_product = st.sidebar.selectbox("Select a Product", product_list)

    # Filtered Data
    filtered_df = df[df['name'] == selected_product].sort_values('date')

    # Metrics
    current_price = filtered_df['price'].iloc[-1]
    min_price = filtered_df['price'].min()

    col1, col2 = st.columns(2)
    col1.metric("Current Price", f"${current_price:,.2f}")
    col2.metric("All-Time Low", f"${min_price:,.2f}", delta=float(current_price - min_price), delta_color="inverse")

    # Price History Chart
    st.subheader(f"Price Trend: {selected_product}")
    st.line_chart(filtered_df.set_index('date')['price'])

except FileNotFoundError:
    st.error("No data found. Run the scraper first to generate 'price_history.jsonl'.")

Why this approach works:

  1. Efficiency: JSONL allows the app to read data line-by-line, which is more memory-efficient than loading a single massive JSON array.
  2. Context: Streamlit metrics show immediately if the current price is a deal compared to the historical low.
  3. Speed: The @st.cache_data decorator ensures the UI stays responsive even as the dataset grows.
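If you'd rather let pandas handle the line-by-line parsing, `pd.read_json` with `lines=True` loads the same JSONL file in one call. The sketch below uses an in-memory buffer so it is self-contained; in `load_data()` you would pass the filename instead:

```python
import io

import pandas as pd

# Simulate a small price_history.jsonl; field names match the scraper's
jsonl = io.StringIO(
    '{"name": "Item", "price": "19.99", "scraped_at": "2024-01-01T00:00:00"}\n'
    '{"name": "Item", "price": "17.99", "scraped_at": "2024-01-02T00:00:00"}\n'
)
df = pd.read_json(jsonl, lines=True)

# Same cleaning steps as the dashboard
df["date"] = pd.to_datetime(df["scraped_at"])
df["price"] = pd.to_numeric(df["price"], errors="coerce")

assert len(df) == 2
```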

Next Steps

Once the pipeline is stable, you can extend it with several features:

  • Price Drop Alerts: Integrate SendGrid or Slack notifications to alert you when a price hits a specific threshold.
  • Availability Tracking: The scraper already extracts stock status. You can plot this to see how often high-demand items sell out.
  • Database Migration: If the JSONL file grows to thousands of entries, consider switching the DataPipeline to SQLite or PostgreSQL for faster querying.
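As a starting point for price drop alerts, the check itself is a few lines. The helper below is hypothetical (names and signature are illustrative); it returns a message to hand off to SendGrid or Slack, or `None` when the price is still above the threshold:

```python
from typing import Optional

def price_drop_alert(name: str, latest: float, threshold: float) -> Optional[str]:
    """Return an alert message when the latest price hits the threshold."""
    if latest <= threshold:
        return f"{name} dropped to ${latest:,.2f} (threshold ${threshold:,.2f})"
    return None

# Fires: the latest price is at/below the target
assert price_drop_alert("Item", 17.99, 18.00) is not None
# Stays quiet: still above the target
assert price_drop_alert("Item", 19.99, 18.00) is None
```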

To wrap up, combining Playwright with Streamlit creates a powerful tool for monitoring market trends. You can find more Costco scraping implementations in the Costco.com-Scrapers repository.
