Web scraping has become an essential tool for supply chain professionals who need real-time visibility into product availability, pricing shifts, and shortage patterns. In this tutorial, I'll show you how to build a supply chain intelligence tracker using Python.
Why Scrape for Supply Chain Data?
Traditional supply chain monitoring relies on delayed reports and manual checks. By scraping supplier websites, marketplaces, and inventory pages, you can:
- Detect shortages before they hit mainstream news
- Track price fluctuations across multiple suppliers
- Monitor stock levels for critical components
- Build early warning systems for disruptions
Setting Up the Scraper
First, install the required packages:
```shell
pip install requests beautifulsoup4 pandas schedule matplotlib
```
We'll use ScraperAPI to handle proxies and anti-bot measures, which is critical when scraping at scale.
```python
import requests
from bs4 import BeautifulSoup
from datetime import datetime
from urllib.parse import quote

SCRAPER_API_KEY = "YOUR_SCRAPERAPI_KEY"

def scrape_product_availability(url):
    """Scrape product availability from a supplier page."""
    # URL-encode the target so its own query string survives the pass-through
    api_url = f"http://api.scraperapi.com?api_key={SCRAPER_API_KEY}&url={quote(url, safe='')}"
    response = requests.get(api_url, timeout=60)
    response.raise_for_status()
    soup = BeautifulSoup(response.text, "html.parser")

    products = []
    for item in soup.select(".product-listing"):
        name = item.select_one(".product-name")
        stock = item.select_one(".stock-status")
        price = item.select_one(".price")
        if name:
            products.append({
                "name": name.text.strip(),
                "in_stock": "in stock" in (stock.text.lower() if stock else ""),
                "price": price.text.strip() if price else "N/A",
                "scraped_at": datetime.now().isoformat(),
            })
    return products
```
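The CSS selectors above (`.product-listing`, `.product-name`, and so on) are placeholders; every supplier's markup differs. A quick way to validate your selectors is to run the parsing logic against a static HTML snippet before pointing it at a live page. A sketch with invented markup:

```python
from bs4 import BeautifulSoup

# Hypothetical markup mimicking a supplier listing page
sample_html = """
<div class="product-listing">
  <span class="product-name">M3 Hex Bolt (500 pk)</span>
  <span class="stock-status">In Stock</span>
  <span class="price">$12.40</span>
</div>
<div class="product-listing">
  <span class="product-name">M4 Washer (1000 pk)</span>
  <span class="stock-status">Backordered</span>
  <span class="price">$8.15</span>
</div>
"""

soup = BeautifulSoup(sample_html, "html.parser")
for item in soup.select(".product-listing"):
    name = item.select_one(".product-name").text.strip()
    stock = item.select_one(".stock-status").text.lower()
    print(name, "->", "in stock" in stock)
# M3 Hex Bolt (500 pk) -> True
# M4 Washer (1000 pk) -> False
```

Once the output matches what you see on the rendered page, swap the snippet for the real response.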
Building the Monitoring Pipeline
Now let's create a pipeline that tracks multiple suppliers and detects changes:
```python
import json
import os

class SupplyChainMonitor:
    def __init__(self, data_file="supply_data.json"):
        self.data_file = data_file
        self.history = self._load_history()

    def _load_history(self):
        if os.path.exists(self.data_file):
            with open(self.data_file) as f:
                return json.load(f)
        return {}

    def _save_history(self):
        with open(self.data_file, "w") as f:
            json.dump(self.history, f, indent=2)

    def check_supplier(self, supplier_name, url):
        """Check a supplier and detect changes."""
        products = scrape_product_availability(url)
        alerts = []
        for product in products:
            key = f"{supplier_name}:{product['name']}"
            prev = self.history.get(key)
            if prev and prev["in_stock"] and not product["in_stock"]:
                alerts.append(f"SHORTAGE: {product['name']} at {supplier_name}")
            if prev and prev["price"] != product["price"]:
                alerts.append(
                    f"PRICE CHANGE: {product['name']} "
                    f"{prev['price']} -> {product['price']}"
                )
            self.history[key] = product
        self._save_history()
        return alerts
```
Scheduling Regular Checks
Use the schedule library to run checks periodically:
```python
import schedule
import time
from datetime import datetime

monitor = SupplyChainMonitor()
suppliers = [
    ("Supplier A", "https://supplier-a.com/products"),
    ("Supplier B", "https://supplier-b.com/inventory"),
]

def run_check():
    all_alerts = []
    for name, url in suppliers:
        alerts = monitor.check_supplier(name, url)
        all_alerts.extend(alerts)
    if all_alerts:
        print(f"[{datetime.now()}] ALERTS:")
        for alert in all_alerts:
            print(f"  - {alert}")

schedule.every(30).minutes.do(run_check)

while True:
    schedule.run_pending()
    time.sleep(60)
```
Scaling Up with Proxy Rotation
When monitoring dozens of suppliers, you'll need reliable proxy rotation. ScraperAPI handles this automatically, but for custom setups, ThorData provides residential proxies that work well for supply chain sites. For monitoring your scraper health, ScrapeOps gives you dashboards to track success rates.
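If you do roll your own rotation instead of relying on a service, the usual pattern is to cycle through a proxy pool and retry through the next proxy on failure. A minimal sketch, assuming a list of proxy URLs from your provider (the addresses and credentials below are placeholders):

```python
from itertools import cycle

import requests

# Placeholder proxy endpoints; substitute your provider's hosts/credentials
PROXY_POOL = cycle([
    "http://user:pass@proxy1.example.com:8000",
    "http://user:pass@proxy2.example.com:8000",
    "http://user:pass@proxy3.example.com:8000",
])

def fetch_with_rotation(url, retries=3):
    """Try the request through successive proxies until one succeeds."""
    last_error = None
    for _ in range(retries):
        proxy = next(PROXY_POOL)
        try:
            return requests.get(
                url, proxies={"http": proxy, "https": proxy}, timeout=30
            )
        except requests.RequestException as exc:
            last_error = exc  # rotate to the next proxy and retry
    raise last_error
```

Because `cycle` is shared across calls, load spreads evenly over the pool rather than hammering the first proxy.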
Visualizing Shortage Trends
Once the monitor has accumulated snapshots, pandas and matplotlib can chart the share of tracked products that are in stock:

```python
import json

import matplotlib.pyplot as plt
import pandas as pd

def plot_availability_trends(data_file="supply_data.json"):
    with open(data_file) as f:
        data = json.load(f)

    df = pd.DataFrame.from_dict(data, orient="index")
    df["scraped_at"] = pd.to_datetime(df["scraped_at"])

    # Percent of tracked products marked in stock, grouped by scrape date
    availability = df.groupby(df["scraped_at"].dt.date)["in_stock"].mean() * 100

    plt.figure(figsize=(12, 6))
    plt.plot(availability.index, availability.values)
    plt.title("Product Availability Over Time")
    plt.ylabel("% In Stock")
    plt.xlabel("Date")
    plt.savefig("availability_trend.png")
    plt.show()
```
Key Takeaways
Supply chain scraping gives you a competitive edge by providing real-time data that most organizations only get in weekly reports. Start small with a few critical suppliers and expand as you validate the approach.
The combination of Python, BeautifulSoup, and a reliable proxy service like ScraperAPI makes it straightforward to build production-grade supply chain monitoring without a massive infrastructure investment.
Remember to respect robots.txt and rate limits when scraping. Space out your requests and cache aggressively to be a good citizen of the web.
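Python's standard library can handle the robots.txt check via `urllib.robotparser`. A minimal sketch (the rules below are an invented example, parsed from a string; in practice call `set_url(...)` and `read()` against the site's real robots.txt):

```python
from urllib.robotparser import RobotFileParser

# Invented robots.txt content for illustration
rules = """
User-agent: *
Disallow: /admin/
Crawl-delay: 10
"""

rp = RobotFileParser()
rp.parse(rules.splitlines())

print(rp.can_fetch("*", "https://supplier-a.com/products"))  # True
print(rp.can_fetch("*", "https://supplier-a.com/admin/"))    # False
print(rp.crawl_delay("*"))                                   # 10
```

Honoring `crawl_delay` when setting your `time.sleep` interval between requests keeps your monitor off a supplier's blocklist.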