Robert N. Gutierrez

Building a 'Living' Market Intelligence Dashboard with Python and Streamlit

You’ve built a web scraper and now have a folder full of JSON files containing thousands of product rows. If you’re still developing your scraper, you can speed things up by using templates from the Etsy.com Scrapers GitHub repository.

Raw data is just digital noise until it’s interpreted. For most developers and businesses, the real challenge isn't just getting the data—it's turning that "dead" data into "living" intelligence. A static spreadsheet shows you what a competitor's price was yesterday. A living dashboard shows you price trends, review velocity, and inventory gaps in real-time. By visualizing these metrics, you move from observing the market to predicting it.

This guide covers how to build a custom market intelligence dashboard using Streamlit and Pandas. We’ll transform raw scraped data into an interactive application that tracks competitor movements and identifies market opportunities using pure Python.

Prerequisites

You’ll need a Python 3.8+ environment and the following libraries:

  • Streamlit: For the web interface.
  • Pandas: For data manipulation and cleaning.
  • Plotly: For interactive visualizations.

Install the required packages using pip:

pip install streamlit pandas plotly

1. Architecture & Setup

A reliable market intelligence pipeline follows a clear path:

  1. Input: Your scraper (e.g., Scrapy or Playwright) saves data to a local file or database.
  2. Processing: Pandas cleans raw strings and calculates "deltas" (changes over time).
  3. Visualization: Streamlit renders the data into charts and metrics.

Use a simple folder structure to keep your scraping logic separate from your dashboard:

market-intelligence-app/
├── data/
│   └── scraped_products.csv  # Our data source
├── scripts/
│   └── mock_data.py          # Script to generate sample data
└── app.py                    # Our Streamlit dashboard

2. Defining the Data Structure

To track trends, you need more than a single snapshot. A single price point is a fact; a price change over three weeks is a signal.

Your scraped data should contain these fields:

  • timestamp: When the data was collected.
  • competitor_name: The seller.
  • product_name: The item name.
  • price: The current selling price.
  • review_count: Total number of reviews (used for popularity trends).
  • rating: The average star rating.
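Before any processing, it's worth failing fast if a scrape run is missing one of these columns. A minimal schema check (the `REQUIRED_COLUMNS` set and `validate_schema` helper are illustrative, not part of any library):

```python
import pandas as pd

# Mirrors the field list above
REQUIRED_COLUMNS = {'timestamp', 'competitor_name', 'product_name',
                    'price', 'review_count', 'rating'}

def validate_schema(df: pd.DataFrame) -> pd.DataFrame:
    """Raise early if the scraper output is missing expected fields."""
    missing = REQUIRED_COLUMNS - set(df.columns)
    if missing:
        raise ValueError(f"Scraped data is missing columns: {sorted(missing)}")
    return df
```

Catching a malformed scrape here is much cheaper than debugging a blank chart later.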

If you don't have a dataset ready, run this script to generate a scraped_products.csv file with 30 days of mock market data:

import os
import pandas as pd
import numpy as np
from datetime import datetime, timedelta

def generate_mock_data():
    competitors = ['AlphaCorp', 'BetaStore', 'GammaRetail']
    products = ['Wireless Headphones', 'Smart Watch', 'Gaming Mouse']
    data = []

    for i in range(30):
        date = datetime.now() - timedelta(days=i)
        for comp in competitors:
            for prod in products:
                data.append({
                    'timestamp': date,
                    'competitor_name': comp,
                    'product_name': prod,
                    'price': np.random.uniform(50, 150),
                    # Reviews accumulate over time, so older rows have fewer
                    'review_count': 100 + (30 - i) * np.random.randint(1, 5),
                    'rating': np.random.uniform(3.5, 4.8)
                })

    df = pd.DataFrame(data)
    os.makedirs('data', exist_ok=True)  # ensure the output folder exists
    df.to_csv('data/scraped_products.csv', index=False)
    print("Mock data generated successfully!")

if __name__ == "__main__":
    generate_mock_data()

3. Data Processing with Pandas

Raw scraped data is often messy. Prices might appear as strings like "$199.99", and timestamps can be inconsistent. You need to clean this before it hits your charts.

We also want to calculate Review Velocity. This metric measures how many new reviews a product receives over a specific period, serving as a proxy for sales volume.

import pandas as pd

def load_and_clean_data(filepath):
    df = pd.read_csv(filepath)
    df['timestamp'] = pd.to_datetime(df['timestamp'])

    # Strip currency symbols and thousands separators before casting.
    # regex=False matters: as a regex, '$' matches end-of-string and removes nothing.
    if df['price'].dtype == 'object':
        df['price'] = (df['price']
                       .str.replace('$', '', regex=False)
                       .str.replace(',', '', regex=False)
                       .astype(float))

    # Calculate Review Velocity (change in reviews per scrape interval)
    df = df.sort_values(['product_name', 'competitor_name', 'timestamp'])
    df['review_velocity'] = df.groupby(['product_name', 'competitor_name'])['review_count'].diff().fillna(0)

    return df

By using groupby and diff(), we can isolate each product and see how many reviews were added since the last scrape. This helps spot products gaining traction quickly.
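To make that behavior concrete, here is a tiny worked example on two hypothetical scrapes one day apart (the frame and values are illustrative, not drawn from the mock dataset):

```python
import pandas as pd

# Two scrapes, one day apart, for two competitors selling the same product
df = pd.DataFrame({
    'timestamp': pd.to_datetime(['2024-01-01', '2024-01-02',
                                 '2024-01-01', '2024-01-02']),
    'competitor_name': ['AlphaCorp'] * 2 + ['BetaStore'] * 2,
    'product_name': ['Smart Watch'] * 4,
    'review_count': [100, 112, 40, 43],
})

df = df.sort_values(['product_name', 'competitor_name', 'timestamp'])
# diff() runs within each group, so each group's first row is NaN -> 0
df['review_velocity'] = (
    df.groupby(['product_name', 'competitor_name'])['review_count']
      .diff()
      .fillna(0)
)
print(df[['competitor_name', 'timestamp', 'review_velocity']])
```

AlphaCorp gained 12 reviews overnight versus BetaStore's 3, a signal worth watching even though BetaStore's totals look stable.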

4. Building the Dashboard Layout

Streamlit makes UI design simple. We’ll set up a sidebar for filtering and a row of KPI (Key Performance Indicator) cards for an immediate market health check.

import streamlit as st

st.set_page_config(page_title="Market Intelligence Pro", layout="wide")

st.title("📈 Market Intelligence Dashboard")
st.markdown("Real-time competitor tracking and price positioning.")

st.sidebar.header("Filters")
df = load_and_clean_data('data/scraped_products.csv')

selected_competitor = st.sidebar.multiselect(
    "Select Competitors",
    options=df['competitor_name'].unique(),
    default=df['competitor_name'].unique()
)

filtered_df = df[df['competitor_name'].isin(selected_competitor)]

# KPI Metrics
latest_data = filtered_df[filtered_df['timestamp'] == filtered_df['timestamp'].max()]
avg_price = latest_data['price'].mean()
total_skus = latest_data.shape[0]

col1, col2, col3 = st.columns(3)
col1.metric("Total SKUs Tracked", total_skus)
col2.metric("Avg Market Price", f"${avg_price:.2f}")
col3.metric("Active Competitors", len(selected_competitor))

The st.metric component provides a professional look and can show "deltas" to indicate if prices shifted compared to the previous week.
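As a sketch of that week-over-week delta (the seven-day window and the `avg_price_with_delta` helper are assumptions about your scrape cadence, not part of Streamlit):

```python
import pandas as pd

def avg_price_with_delta(df: pd.DataFrame, days_back: int = 7):
    """Average price on the latest scrape and its change vs. `days_back` days earlier."""
    latest = df['timestamp'].max()
    current_avg = df.loc[df['timestamp'] == latest, 'price'].mean()
    # Use the most recent scrape at or before the comparison date,
    # so slightly irregular scrape times still match something.
    prev_day = df.loc[df['timestamp'] <= latest - pd.Timedelta(days=days_back),
                      'timestamp'].max()
    previous_avg = df.loc[df['timestamp'] == prev_day, 'price'].mean()  # NaN if no history
    return current_avg, current_avg - previous_avg

# In the dashboard, feed the result into the metric card:
# current, delta = avg_price_with_delta(filtered_df)
# col2.metric("Avg Market Price", f"${current:.2f}", delta=f"${delta:.2f}")
```

Streamlit colors the delta green or red automatically, so a market-wide price drop is visible at a glance.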

5. Visualizing the Insights

We'll use Plotly Express for the charts because it provides interactive tooltips and zooming features.

Price Positioning Chart

To see how each competitor prices their version of a product, a box plot is ideal.

import plotly.express as px

st.subheader("Price Positioning by Competitor")
fig_price = px.box(
    filtered_df, 
    x="product_name", 
    y="price", 
    color="competitor_name",
    points="all",
    title="Price Distribution Across Market"
)
st.plotly_chart(fig_price, use_container_width=True)

Review Velocity

This chart identifies which products are gaining popularity. If a competitor's headphones suddenly spike in review velocity, they might be running a promotion or a successful ad campaign.

st.subheader("Review Velocity Over Time")
velocity_trends = filtered_df.groupby(['timestamp', 'competitor_name'])['review_velocity'].sum().reset_index()

fig_trend = px.line(
    velocity_trends, 
    x="timestamp", 
    y="review_velocity", 
    color="competitor_name",
    title="New Reviews per Day"
)
st.plotly_chart(fig_trend, use_container_width=True)

6. Automation and Refreshing

A dashboard is only as good as its last update. In production, your scraper might run every hour. At higher volumes, you’ll also need reliable IP rotation and anti-blocking infrastructure, which is why a proxy aggregator service is useful.

To keep the app snappy, use Streamlit’s caching.

@st.cache_data(ttl=3600) # Cache data for 1 hour
def get_data():
    return load_and_clean_data('data/scraped_products.csv')

if st.button('🔄 Refresh Data'):
    st.cache_data.clear()
    st.rerun()

The ttl=3600 parameter expires the cache after an hour, so the next page load rereads the file. The "Refresh Data" button gives users manual control, clearing the cache and forcing an immediate re-read of the CSV file.
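On the scraping side, each run should append to the history file rather than overwrite it, so the velocity calculations always have a previous snapshot to diff against. A minimal append helper (the `scrape_results` argument is a hypothetical list of dicts produced by your scraper; the path matches the folder layout above):

```python
import os
import pandas as pd

def append_scrape(scrape_results, filepath='data/scraped_products.csv'):
    """Append a batch of scraped rows, writing the CSV header only on the first run."""
    df = pd.DataFrame(scrape_results)
    df['timestamp'] = pd.Timestamp.now()  # stamp the whole batch at once
    write_header = not os.path.exists(filepath)
    df.to_csv(filepath, mode='a', header=write_header, index=False)
```

Schedule this with cron or a task queue at your scrape interval, and the dashboard picks up new rows on its next cache expiry.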

To wrap up

We have transformed raw scraped data into a dynamic market intelligence tool. By moving beyond simple data collection, you can now identify pricing anomalies and trending products at a glance.

Key Takeaways:

  • Historical context is vital: Always include timestamps in your scrapes to calculate trends like review velocity.
  • Pandas is the engine: Use it to clean currency strings and handle missing data before visualizing.
  • Interactivity matters: Tools like Plotly and Streamlit allow stakeholders to filter data themselves, leading to faster insights.

Your next step is to move this dashboard to the web. You can use Streamlit Cloud for quick deployment or wrap the application in a Docker container for a production setup. As your data grows, consider swapping the CSV file for a SQL database to maintain performance.
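As a starting point for that migration, pandas can read and write SQLite directly, so the only dashboard change is the loader (the `market.db` path and `scraped_products` table name are assumptions for this sketch):

```python
import sqlite3
import pandas as pd

def load_from_db(db_path='data/market.db'):
    """Load scrape history from SQLite instead of the CSV file."""
    with sqlite3.connect(db_path) as conn:
        df = pd.read_sql('SELECT * FROM scraped_products', conn,
                         parse_dates=['timestamp'])
    return df

# One-time migration from the existing CSV:
# df = pd.read_csv('data/scraped_products.csv')
# with sqlite3.connect('data/market.db') as conn:
#     df.to_sql('scraped_products', conn, if_exists='replace', index=False)
```

From there, swapping `pd.read_sql` to a Postgres connection string via SQLAlchemy is a one-line change when the dataset outgrows a single file.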
