How to Build a Supply Chain Visibility Tool with Web Scraping
Supply chain disruptions cost businesses billions. Real-time visibility into shipping, inventory signals, and supplier status can mean the difference between a minor delay and a major crisis.
What We'll Track
- Port congestion and shipping delays
- Supplier website changes (stock status, lead times)
- Commodity price movements
- Logistics provider status pages
Setup
import requests
from bs4 import BeautifulSoup
import json       # persisting state between runs
import hashlib    # fingerprinting pages for change detection
from datetime import datetime

# ScraperAPI proxy endpoint and credentials
PROXY_URL = "https://api.scraperapi.com"
API_KEY = "YOUR_SCRAPERAPI_KEY"
Shipping and logistics sites vary by region, and some serve different content depending on where the request originates. ScraperAPI provides geo-targeted requests, which helps when pulling international supply chain data.
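For example, if a port authority or carrier serves region-specific pages, you can route the request through a proxy in a specific country. A minimal sketch, assuming ScraperAPI's documented country_code parameter (geo-targeting availability depends on your plan):

    def fetch_geo(url, country="us"):
        # Route the request through a proxy in the given country
        params = {
            "api_key": API_KEY,
            "url": url,
            "country_code": country,
        }
        response = requests.get(PROXY_URL, params=params, timeout=60)
        response.raise_for_status()
        return response.text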
Monitoring Port Congestion
def scrape_port_status(port_url):
    """Scrape a port authority's vessel list and compute a congestion ratio."""
    params = {
        "api_key": API_KEY,
        "url": port_url,
        "render": "true",  # vessel tables are often loaded via JavaScript
    }
    response = requests.get(PROXY_URL, params=params, timeout=60)
    response.raise_for_status()
    soup = BeautifulSoup(response.text, "html.parser")

    vessels = []
    for row in soup.select("table.vessel-list tbody tr"):
        cells = row.select("td")
        if len(cells) >= 4:
            vessels.append({
                "vessel_name": cells[0].text.strip(),
                "status": cells[1].text.strip(),
                "eta": cells[2].text.strip(),
                "berth": cells[3].text.strip(),
            })

    # Vessels still waiting for a berth are the congestion signal
    waiting = len([v for v in vessels if "waiting" in v["status"].lower()])
    return {
        "total_vessels": len(vessels),
        "waiting": waiting,
        "congestion_ratio": round(waiting / len(vessels) * 100, 1) if vessels else 0,
        "vessels": vessels,
    }
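Calling this against a port schedule page (the URL here is a placeholder) yields a snapshot you can log or alert on:

    port_data = scrape_port_status("https://example-port-authority.com/vessel-schedule")
    print(f"{port_data['waiting']}/{port_data['total_vessels']} vessels waiting "
          f"({port_data['congestion_ratio']}% congestion)")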
Tracking Supplier Inventory
def check_supplier_status(supplier):
    """Scrape one supplier's catalog page for stock status and lead times."""
    params = {
        "api_key": API_KEY,
        "url": supplier["url"],
        "render": "true",
    }
    response = requests.get(PROXY_URL, params=params, timeout=60)
    response.raise_for_status()
    soup = BeautifulSoup(response.text, "html.parser")

    # Fingerprint the raw page once so changes can be detected between runs
    page_hash = hashlib.md5(response.text.encode()).hexdigest()

    products = []
    for item in soup.select(".product-item"):
        name = item.select_one(".product-name, h3")
        stock = item.select_one(supplier["stock_sel"])  # per-supplier selector
        lead_time = item.select_one(".lead-time")
        products.append({
            "supplier": supplier["name"],
            "product": name.text.strip() if name else "",
            "in_stock": "in stock" in (stock.text.lower() if stock else ""),
            "lead_time": lead_time.text.strip() if lead_time else "",
            "page_hash": page_hash,
        })
    return products
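The page_hash field is what makes cheap change detection possible: store the hash from the last run and treat a different hash as a signal that the supplier's page moved. A minimal sketch, assuming a local JSON file as the state store (the filename and function name are illustrative):

    def detect_supplier_changes(supplier, state_file="supplier_hashes.json"):
        # Compare the current page hash against the one saved last run
        try:
            with open(state_file) as f:
                old_hashes = json.load(f)
        except FileNotFoundError:
            old_hashes = {}

        products = check_supplier_status(supplier)
        new_hash = products[0]["page_hash"] if products else None
        changed = new_hash is not None and old_hashes.get(supplier["name"]) != new_hash

        if new_hash:
            old_hashes[supplier["name"]] = new_hash
            with open(state_file, "w") as f:
                json.dump(old_hashes, f)
        return changed, products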
Risk Scoring
def calculate_supply_risk(port_data, supplier_data, commodity_data):
    """Combine the scraped signals into a single supply-risk score."""
    risk_score = 0
    factors = []

    # Heavy port congestion: more than 30% of vessels waiting for a berth
    if port_data["congestion_ratio"] > 30:
        risk_score += 3
        factors.append(f"Port congestion at {port_data['congestion_ratio']}%")

    # Supplier availability: more than 20% of tracked items out of stock
    out_of_stock = sum(1 for s in supplier_data if not s["in_stock"])
    if out_of_stock > len(supplier_data) * 0.2:
        risk_score += 3
        factors.append(f"{out_of_stock} supplier items out of stock")

    # Commodity checks (e.g., sharp price spikes) would add to the score here
    level = "LOW" if risk_score < 3 else "MEDIUM" if risk_score < 6 else "HIGH"
    return {"risk_score": risk_score, "risk_level": level, "factors": factors}
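Wiring the pieces together, a monitoring run might look like the sketch below. It reuses the placeholder PORT_PAGES and SUPPLIERS config from the setup, and passes an empty dict for commodity_data since this excerpt doesn't include a commodity scraper:

    if __name__ == "__main__":
        port_data = scrape_port_status(PORT_PAGES[0])
        supplier_data = []
        for supplier in SUPPLIERS:
            supplier_data.extend(check_supplier_status(supplier))

        risk = calculate_supply_risk(port_data, supplier_data, commodity_data={})
        print(f"[{datetime.now():%Y-%m-%d %H:%M}] "
              f"Risk: {risk['risk_level']} ({risk['risk_score']})")
        for factor in risk["factors"]:
            print(f"  - {factor}")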
Infrastructure
- ScraperAPI — geo-targeting for port authorities and logistics sites
- ThorData — residential proxies for global supply chain coverage
- ScrapeOps — monitor uptime across your supply chain scraping network
Conclusion
Supply chain visibility through web scraping turns scattered public data into an early warning system. Start with your most critical suppliers and expand from there.