In the high-stakes world of e-commerce, price is often the only thing that decides whether a customer clicks "Buy It Now" on your listing or your competitor's. But simply scraping data isn't enough to stay ahead. If all you have is a folder full of raw JSONL files, you don't have a strategy; you have a storage problem.
The real value of web scraping lies in turning raw data into actionable insights. You need to see undercutting as it happens, identify price floors, and spot trends over time.
This guide covers how to build an end-to-end pipeline that scrapes eBay product data using Playwright, processes it with Pandas, and visualizes competitor price movements in a Streamlit dashboard.
1. The Setup
Before building the dashboard, you need the engine that powers it. We will use the Ebay.com-Scrapers repository, which contains production-ready scrapers optimized for eBay's structure.
Prerequisites
- Python 3.8+
- A ScrapeOps API Key for anti-bot bypass
- Basic terminal skills
First, clone the repository and install the dependencies:
git clone https://github.com/scraper-bank/Ebay.com-Scrapers.git
cd Ebay.com-Scrapers/python/playwright/product_data
pip install playwright playwright-stealth pandas streamlit
playwright install chromium
We are using the playwright/product_data implementation because it extracts granular details like productId, price, and availability, which are essential for time-series tracking.
2. Configuring the Scraper for Competitors
The default scraper handles individual URLs. To track competitors, run the scraper against a specific list of product pages at regular intervals.
Instead of modifying the core library, create a wrapper script called run_tracker.py. This script loops through target competitor URLs and saves the results. The repository's scraper automatically appends a timestamp to the filename (e.g., ebay_com_product_page_scraper_data_20260116_090000.jsonl), which makes historical tracking straightforward.
import asyncio

from playwright.async_api import async_playwright

from scraper.ebay_com_scraper_product_v1 import extract_data

# List of specific competitor product URLs to monitor
COMPETITOR_URLS = [
    "https://www.ebay.com/itm/123456789012",
    "https://www.ebay.com/itm/987654321098",
]

async def run_monitoring_session():
    async with async_playwright() as p:
        browser = await p.chromium.launch(headless=True)
        page = await browser.new_page()
        for url in COMPETITOR_URLS:
            print(f"Scraping competitor: {url}")
            await page.goto(url)
            data = await extract_data(page)
            # Data saving is handled by the DataPipeline class
            # in the core scraper file
            print(f"Extracted Price: {data.price} {data.currency}")
        await browser.close()

if __name__ == "__main__":
    asyncio.run(run_monitoring_session())
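Run the script once by hand to confirm prices come back before you automate anything:
python run_tracker.py
Each successful run should leave a new timestamped JSONL file in the working directory.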
3. Understanding the Data Structure
The scraper outputs data in JSONL (JSON Lines) format. This is more efficient than a single JSON array for price tracking: each run simply appends one record per line, and files can be read line by line without loading the whole dataset into memory.
A typical record looks like this:
{
  "productId": "123456789012",
  "name": "Apple iPhone 15 Pro - 128GB - Blue Titanium",
  "price": 899.0,
  "currency": "USD",
  "availability": "in_stock",
  "seller": {"name": "TopTierElectronics", "rating": 99.8}
}
Two key details to note:
- Price Cleaning: The scraper automatically converts "$899.00" into a float (899.0).
- Availability: If a competitor goes "out_of_stock", you should decide how to represent that on a chart, such as by breaking the line or marking it as zero (see the sketch below).
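If you go the line-breaking route, one approach is to mask those prices as NaN before charting; a minimal sketch, assuming the availability values match the "out_of_stock" string shown above:
import numpy as np

def mask_out_of_stock(df):
    """Blank out prices recorded while a listing was out of stock.

    NaN rows are ignored by min/mean aggregations, and most charting
    libraries draw a gap instead of a misleading data point.
    """
    df = df.copy()
    df.loc[df["availability"] == "out_of_stock", "price"] = np.nan
    return df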
4. Data Ingestion & Cleaning with Pandas
Since every scraper run creates a new file, the first task is to merge these files into a single chronological DataFrame. We can extract the date and time directly from the filenames to build the time axis.
import glob
import json
import re
from datetime import datetime

import pandas as pd

def load_historical_data(directory="./"):
    all_data = []
    # Find all JSONL files generated by the scraper
    files = glob.glob(f"{directory}/ebay_com_product_page_scraper_data_*.jsonl")
    for file in files:
        # Extract timestamp from filename: 20260116_090000
        match = re.search(r'(\d{8}_\d{6})', file)
        if not match:
            continue
        timestamp = datetime.strptime(match.group(1), "%Y%m%d_%H%M%S")
        with open(file, 'r', encoding='utf-8') as f:
            for line in f:
                item = json.loads(line)
                item['scrape_timestamp'] = timestamp
                all_data.append(item)
    df = pd.DataFrame(all_data)
    # Ensure price is numeric
    df['price'] = pd.to_numeric(df['price'], errors='coerce')
    # Sort chronologically so "latest price" lookups downstream are reliable
    df = df.sort_values('scrape_timestamp').reset_index(drop=True)
    return df
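A quick sanity check after a couple of scraper runs, assuming the JSONL files sit in the current directory:
df = load_historical_data("./")
print(df[["scrape_timestamp", "name", "price", "availability"]].tail())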
5. Building the Dashboard with Streamlit
Now we use Streamlit to create a visual interface. This allows you to filter by product name and see price fluctuations over days or weeks.
Create a file named dashboard.py:
import pandas as pd
import streamlit as st

# Reuse the loader from Section 4: paste the function into this file,
# or import it from wherever you saved it (the module name is up to you)
from ingest import load_historical_data

st.set_page_config(page_title="eBay Price Tracker", layout="wide")
st.title("📈 eBay Competitor Price Trends")

# Load data
df = load_historical_data()

# Sidebar filters
available_products = df['name'].unique()
selected_products = st.sidebar.multiselect(
    "Select Products to Track",
    options=available_products,
    default=list(available_products[:2])
)

filtered_df = df[df['name'].isin(selected_products)]

# Main Chart
if not filtered_df.empty:
    # Pivot data so each product has its own column for the line chart.
    # pivot_table averages any duplicate (timestamp, name) pairs instead
    # of raising an error like pivot() would.
    chart_data = filtered_df.pivot_table(
        index='scrape_timestamp',
        columns='name',
        values='price',
        aggfunc='mean'
    )
    st.subheader("Price History")
    st.line_chart(chart_data)

    # Metrics: rows are chronological (sorted in the loader),
    # so iloc[-1] is the most recent observation
    cols = st.columns(len(selected_products))
    for i, product in enumerate(selected_products):
        latest_price = filtered_df[filtered_df['name'] == product].iloc[-1]['price']
        cols[i].metric(label=product[:30] + "...", value=f"${latest_price:.2f}")
else:
    st.write("Please select a product to see the price trend.")
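The intro mentioned spotting undercutting and price floors. A minimal sketch of how you might extend dashboard.py to surface both, assuming MY_PRICES is a hypothetical dict you maintain of your own listing prices keyed by product name (it is not part of the scraped data):
# Hypothetical: your own listing prices, keyed by scraped product name
MY_PRICES = {
    "Apple iPhone 15 Pro - 128GB - Blue Titanium": 929.00,
}

st.subheader("Price Floors & Undercuts")
# Historical floor and most recent price per product
# ("last" is the latest observation because the loader sorts chronologically)
summary = (
    filtered_df.groupby("name")["price"]
    .agg(floor="min", latest="last")
    .reset_index()
)
summary["my_price"] = summary["name"].map(MY_PRICES)
# Positive value = a competitor is currently undercutting you
summary["undercut_by"] = summary["my_price"] - summary["latest"]
st.dataframe(summary)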
Run the dashboard with:
streamlit run dashboard.py
6. Automating the Workflow
A dashboard is only useful if the data is fresh. Running the scraper manually every morning is inefficient.
On Linux or macOS, set up a cron job to run the run_tracker.py script every 6 hours:
0 */6 * * * /usr/bin/python3 /path/to/run_tracker.py
For Windows users, Task Scheduler achieves the same result (see the schtasks sketch below). Once automated, the Streamlit dashboard will update with the latest price points every time you refresh the page.
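One way to create that schedule from the Windows command line is schtasks; the task name and path here are placeholders:
schtasks /create /tn "eBayPriceTracker" /tr "python C:\path\to\run_tracker.py" /sc hourly /mo 6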
To Wrap Up
We have moved from simple data extraction to building a functional business tool. By combining the scraping capabilities of the ScrapeOps eBay repository with the analytical power of Pandas and Streamlit, you now have a custom price intelligence platform.
Key Takeaways:
- JSONL is efficient: It is the best format for logging time-series scraping data.
- Filename Timestamps: Keeping the timestamp in the filename means the time axis survives even if the internal JSON structure changes.
- Visualization works: A 5% price drop on a line chart is much easier to act on than scanning raw text files.
To learn more about bypassing anti-bot measures, check out the eBay Scraping Breakdown.