agenthustler

Web Scraping for Supply Chain Intelligence: Tracking Shortages

Web scraping has become an essential tool for supply chain professionals who need real-time visibility into product availability, pricing shifts, and shortage patterns. In this tutorial, I'll show you how to build a supply chain intelligence tracker using Python.

Why Scrape for Supply Chain Data?

Traditional supply chain monitoring relies on delayed reports and manual checks. By scraping supplier websites, marketplaces, and inventory pages, you can:

  • Detect shortages before they hit mainstream news
  • Track price fluctuations across multiple suppliers
  • Monitor stock levels for critical components
  • Build early warning systems for disruptions

Setting Up the Scraper

First, install the required packages:

pip install requests beautifulsoup4 pandas schedule matplotlib

We'll use ScraperAPI to handle proxies and anti-bot measures, which is critical when scraping at scale.

import requests
from bs4 import BeautifulSoup
import pandas as pd
from datetime import datetime

SCRAPER_API_KEY = "YOUR_SCRAPERAPI_KEY"

def scrape_product_availability(url):
    """Scrape product availability from a supplier page."""
    # Let requests URL-encode the target so query strings survive the pass-through
    response = requests.get(
        "http://api.scraperapi.com",
        params={"api_key": SCRAPER_API_KEY, "url": url},
        timeout=60,
    )
    response.raise_for_status()
    soup = BeautifulSoup(response.text, "html.parser")

    products = []
    for item in soup.select(".product-listing"):
        name = item.select_one(".product-name")
        stock = item.select_one(".stock-status")
        price = item.select_one(".price")

        if name:
            products.append({
                "name": name.text.strip(),
                "in_stock": "in stock" in (stock.text.lower() if stock else ""),
                "price": price.text.strip() if price else "N/A",
                "scraped_at": datetime.now().isoformat()
            })
    return products
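The substring check above works for simple pages, but stock labels vary widely between suppliers. A slightly more defensive classifier might look like this (the phrase lists are assumptions — extend them to match each supplier's actual wording):

```python
def classify_stock(text):
    """Map a raw stock-status string to True (in stock), False (out), or None (unknown).

    The phrase lists below are illustrative -- tune them per supplier.
    """
    t = (text or "").strip().lower()
    # Check negative phrases first: "unavailable" contains "available",
    # so the order of these checks matters.
    if any(p in t for p in ("out of stock", "sold out", "unavailable", "backorder")):
        return False
    if any(p in t for p in ("in stock", "available", "add to cart")):
        return True
    return None

print(classify_stock("In Stock"))          # True
print(classify_stock("2-week backorder"))  # False
```

Returning `None` for unrecognized labels lets you log them for review instead of silently misclassifying a critical part.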

Building the Monitoring Pipeline

Now let's create a pipeline that tracks multiple suppliers and detects changes:

import json
import os

class SupplyChainMonitor:
    def __init__(self, data_file="supply_data.json"):
        self.data_file = data_file
        self.history = self._load_history()

    def _load_history(self):
        if os.path.exists(self.data_file):
            with open(self.data_file) as f:
                return json.load(f)
        return {}

    def check_supplier(self, supplier_name, url):
        """Check a supplier and detect changes."""
        products = scrape_product_availability(url)
        alerts = []

        for product in products:
            key = f"{supplier_name}:{product['name']}"
            prev = self.history.get(key)

            if prev and prev["in_stock"] and not product["in_stock"]:
                alerts.append(f"SHORTAGE: {product['name']} at {supplier_name}")

            if prev and prev["price"] != product["price"]:
                alerts.append(
                    f"PRICE CHANGE: {product['name']} "
                    f"{prev['price']} -> {product['price']}"
                )

            self.history[key] = product

        self._save_history()
        return alerts

    def _save_history(self):
        with open(self.data_file, "w") as f:
            json.dump(self.history, f, indent=2)
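Before wiring the monitor up to live pages, you can exercise the change-detection rules on synthetic snapshots. This sketch applies the same comparison logic as `check_supplier` to two in-memory dicts (the part name and prices are made up):

```python
def diff_snapshots(prev, curr, supplier):
    """Apply the same shortage/price rules as check_supplier to two
    snapshots, each a dict keyed by product name."""
    alerts = []
    for name, product in curr.items():
        old = prev.get(name)
        if old and old["in_stock"] and not product["in_stock"]:
            alerts.append(f"SHORTAGE: {name} at {supplier}")
        if old and old["price"] != product["price"]:
            alerts.append(f"PRICE CHANGE: {name} {old['price']} -> {product['price']}")
    return alerts

before = {"100uF capacitor": {"in_stock": True, "price": "$0.12"}}
after = {"100uF capacitor": {"in_stock": False, "price": "$0.18"}}
for alert in diff_snapshots(before, after, "Supplier A"):
    print(alert)
```

Keeping the diff logic in a pure function like this also makes it easy to unit-test edge cases (new products, products that disappear) separately from the scraping code.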

Scheduling Regular Checks

Use the schedule library to run checks periodically:

import schedule
import time

monitor = SupplyChainMonitor()

suppliers = [
    ("Supplier A", "https://supplier-a.com/products"),
    ("Supplier B", "https://supplier-b.com/inventory"),
]

def run_check():
    all_alerts = []
    for name, url in suppliers:
        alerts = monitor.check_supplier(name, url)
        all_alerts.extend(alerts)

    if all_alerts:
        print(f"[{datetime.now()}] ALERTS:")
        for alert in all_alerts:
            print(f"  - {alert}")

run_check()  # run once at startup, then every 30 minutes
schedule.every(30).minutes.do(run_check)

while True:
    schedule.run_pending()
    time.sleep(60)

Scaling Up with Proxy Rotation

When monitoring dozens of suppliers, you'll need reliable proxy rotation. ScraperAPI handles this automatically, but for custom setups, ThorData provides residential proxies that work well for supply chain sites. For monitoring your scraper health, ScrapeOps gives you dashboards to track success rates.

Visualizing Shortage Trends

With history accumulating in supply_data.json, pandas and matplotlib can chart the share of tracked products that are in stock over time:

import pandas as pd
import matplotlib.pyplot as plt

def plot_availability_trends(data_file="supply_data.json"):
    with open(data_file) as f:
        data = json.load(f)

    # Note: check_supplier overwrites each product's record, so this file holds
    # only the latest snapshot per product. For a true time series, append each
    # scrape to a log instead of overwriting.
    df = pd.DataFrame.from_dict(data, orient="index")
    df["scraped_at"] = pd.to_datetime(df["scraped_at"])

    availability = df.groupby(
        df["scraped_at"].dt.date
    )["in_stock"].mean() * 100

    plt.figure(figsize=(12, 6))
    plt.plot(availability.index, availability.values)
    plt.title("Product Availability Over Time")
    plt.ylabel("% In Stock")
    plt.xlabel("Date")
    plt.savefig("availability_trend.png")
    plt.show()

Key Takeaways

Supply chain scraping gives you a competitive edge by providing real-time data that most organizations only get in weekly reports. Start small with a few critical suppliers and expand as you validate the approach.

The combination of Python, BeautifulSoup, and a reliable proxy service like ScraperAPI makes it straightforward to build production-grade supply chain monitoring without a massive infrastructure investment.

Remember to respect robots.txt and rate limits when scraping. Space out your requests and cache aggressively to be a good citizen of the web.
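Python's standard library can enforce the first part of that advice before any request goes out. This sketch parses an inline robots.txt (a made-up example; in practice, fetch the file from the site root) and checks paths against it:

```python
from urllib.robotparser import RobotFileParser

# A made-up robots.txt; in practice, fetch this from https://<site>/robots.txt.
ROBOTS_TXT = """\
User-agent: *
Disallow: /admin/
Crawl-delay: 10
"""

rp = RobotFileParser()
rp.parse(ROBOTS_TXT.splitlines())

print(rp.can_fetch("supply-chain-monitor", "https://supplier-a.com/products"))
print(rp.can_fetch("supply-chain-monitor", "https://supplier-a.com/admin/orders"))
print(rp.crawl_delay("supply-chain-monitor"))  # honor this between requests
```

Checking `crawl_delay` and sleeping at least that long between requests to the same host keeps your monitor from being mistaken for an abusive bot.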
