Competitive intelligence used to mean hiring analysts, subscribing to expensive SaaS tools, and manually tracking competitor websites. In 2026, web scraping has changed the game entirely — you can build automated monitoring systems that track pricing, product launches, hiring patterns, and customer sentiment in real time.
In this guide, I'll show you how to architect a competitive intelligence pipeline using web scraping, with practical examples you can deploy today.
## Why Web Scraping for Competitive Intelligence?
The web is the largest public dataset in the world. Your competitors publish their pricing, job openings, product features, and customer reviews openly. The challenge isn't access — it's automation and scale.
Manual monitoring breaks down when you're tracking:
- 50+ competitor product pages for price changes
- Hundreds of job postings across 10 companies
- Thousands of customer reviews on G2, Trustpilot, and app stores
- Product launch announcements across blogs and social media
## Use Case 1: E-Commerce Price Monitoring
Price monitoring is the most common competitive intelligence use case. Whether you're selling on Amazon, eBay, or your own store, knowing when competitors change prices lets you react instantly.
### Architecture
```
[Scheduler] → [Scraper] → [Parser] → [Database] → [Alert System]
   (cron)      (proxy)     (extract)   (compare)   (email/slack)
```
Here's a basic price monitoring scraper:
```python
import httpx
from selectolax.parser import HTMLParser
from datetime import datetime, timezone


def scrape_product_price(url: str, proxy_url: str | None = None) -> dict:
    headers = {
        "User-Agent": "Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36"
    }
    client_kwargs = {"headers": headers, "follow_redirects": True}
    if proxy_url:
        client_kwargs["proxy"] = proxy_url

    with httpx.Client(**client_kwargs) as client:
        response = client.get(url)

    tree = HTMLParser(response.text)

    # Extract price (adapt selectors per site)
    price_el = tree.css_first('[data-testid="price"], .price, .product-price')
    title_el = tree.css_first('h1, .product-title')

    return {
        "url": url,
        "title": title_el.text(strip=True) if title_el else None,
        "price": price_el.text(strip=True) if price_el else None,
        "scraped_at": datetime.now(timezone.utc).isoformat(),
    }
```
For eBay specifically, I've built an eBay Scraper on Apify that handles pagination, anti-bot measures, and structured data extraction out of the box.
### Scaling with Proxies
When monitoring hundreds of product pages, you'll need residential proxies to avoid blocks. I use ThorData for their rotating residential proxy pool — they handle IP rotation automatically so your scrapers stay reliable.
For simpler setups, ScraperAPI provides a single endpoint that manages proxies, headers, and JavaScript rendering for you:
```python
import httpx
from urllib.parse import quote

SCRAPERAPI_KEY = "your_key"


def scrape_with_api(url: str) -> str:
    # URL-encode the target so its own query string doesn't break the API call
    api_url = (
        "http://api.scraperapi.com"
        f"?api_key={SCRAPERAPI_KEY}&url={quote(url, safe='')}"
    )
    response = httpx.get(api_url)
    response.raise_for_status()
    return response.text
```
## Use Case 2: SaaS Feature & Pricing Tracking
SaaS companies change their pricing pages, feature lists, and positioning constantly. Tracking these changes gives you insight into their strategy.
### What to Monitor
| Data Point | Source | Frequency |
|---|---|---|
| Pricing tiers | /pricing page | Daily |
| Feature lists | /features, /product | Weekly |
| Job postings | careers page, LinkedIn | Daily |
| Customer reviews | G2, Capterra | Weekly |
| Blog posts | /blog RSS feed | Daily |
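A table like this maps naturally onto a small config that the scheduler iterates over. Here's one way to sketch it; the URLs are placeholders and the structure is my own, not from any particular tool:

```python
from dataclasses import dataclass


@dataclass
class MonitorTarget:
    name: str            # which data point this is
    url: str             # where to scrape it
    interval_hours: int  # how often to check (24 = daily, 168 = weekly)


# Placeholder targets -- swap in real competitor URLs
TARGETS = [
    MonitorTarget("pricing tiers", "https://competitor.example.com/pricing", 24),
    MonitorTarget("feature list", "https://competitor.example.com/features", 168),
    MonitorTarget("job postings", "https://competitor.example.com/careers", 24),
]


def due_targets(hours_since_epoch: int) -> list[MonitorTarget]:
    """Return the targets whose interval divides the current hourly tick."""
    return [t for t in TARGETS if hours_since_epoch % t.interval_hours == 0]
```

An hourly cron job can then call `due_targets` and scrape only what's due, rather than hammering every page on every run.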
For G2 reviews, my G2 Reviews Scraper extracts structured review data including ratings, pros/cons, and reviewer details — perfect for sentiment analysis.
### Change Detection Pipeline
```python
import hashlib
import json
from pathlib import Path


def detect_changes(url: str, new_content: str, storage_dir: str = "./snapshots") -> dict:
    Path(storage_dir).mkdir(exist_ok=True)
    url_hash = hashlib.md5(url.encode()).hexdigest()
    snapshot_path = Path(storage_dir) / f"{url_hash}.json"
    new_hash = hashlib.sha256(new_content.encode()).hexdigest()

    if snapshot_path.exists():
        previous = json.loads(snapshot_path.read_text())
        if previous["hash"] != new_hash:
            snapshot_path.write_text(json.dumps({"hash": new_hash, "content": new_content}))
            return {"changed": True, "url": url}

    snapshot_path.write_text(json.dumps({"hash": new_hash, "content": new_content}))
    return {"changed": False, "url": url}
```
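A hash only tells you *that* a page changed; for the alert itself you usually want to see *what* changed. The standard library's `difflib` renders a readable unified diff (the snapshot strings below are made up for illustration):

```python
import difflib


def render_diff(old: str, new: str, label: str) -> str:
    """Unified diff of two page snapshots, suitable for an alert body."""
    lines = difflib.unified_diff(
        old.splitlines(),
        new.splitlines(),
        fromfile=f"{label} (previous)",
        tofile=f"{label} (current)",
        lineterm="",
    )
    return "\n".join(lines)


# Example: a competitor bumps one pricing tier
diff = render_diff(
    "Pro: $49/mo\nTeam: $99/mo",
    "Pro: $59/mo\nTeam: $99/mo",
    "pricing",
)
```

Storing the full content alongside the hash (as the snapshot pattern above does) is what makes this diff possible later.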
## Use Case 3: Hiring Intelligence
Job postings reveal a company's strategic direction. If a competitor suddenly posts 15 machine learning engineer roles, they're building an AI product. If they're hiring sales reps in Europe, they're expanding internationally.
### What Job Data Tells You
- Engineering roles: What technologies they're investing in
- Sales roles: Which markets they're targeting
- Leadership roles: Strategic pivots or scaling
- Role volume: Growth rate and burn rate signals
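Turning raw postings into these signals can start as simple keyword bucketing. This is a rough sketch with made-up job titles and categories of my own choosing; a real pipeline would want smarter matching:

```python
from collections import Counter

# Hypothetical job titles scraped from a competitor's careers page
POSTINGS = [
    "Senior Machine Learning Engineer",
    "Machine Learning Engineer",
    "ML Infrastructure Engineer",
    "Enterprise Account Executive (DACH)",
    "Sales Development Representative (France)",
]

# Keyword buckets are illustrative; tune them per competitor
KEYWORD_CATEGORIES = {
    "ai_investment": ("machine learning", "ml "),
    "eu_expansion": ("dach", "france", "germany", "emea"),
}


def categorize(postings: list[str]) -> Counter:
    """Count postings per strategic signal based on keyword matches."""
    counts: Counter = Counter()
    for title in postings:
        lowered = f" {title.lower()} "
        for signal, keywords in KEYWORD_CATEGORIES.items():
            if any(kw in lowered for kw in keywords):
                counts[signal] += 1
    return counts
```

A week-over-week jump in any bucket is the signal worth alerting on, not the absolute count.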
My Glassdoor Scraper extracts job postings, company reviews, and salary data — useful for both competitive intelligence and talent market analysis.
## Building the Full Pipeline
Here's how I architect a complete competitive monitoring system:
```
┌─────────────┐     ┌──────────────┐     ┌─────────────┐
│  Scheduler  │────▶│   Scrapers   │────▶│   Storage   │
│ (cron/APF)  │     │ (per source) │     │  (SQLite)   │
└─────────────┘     └──────────────┘     └──────┬──────┘
                                                │
                    ┌──────────────┐     ┌──────▼──────┐
                    │    Alerts    │◀────│  Analyzer   │
                    │ (email/slack)│     │ (diff/trend)│
                    └──────────────┘     └─────────────┘
```
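The Storage box can start as a single SQLite table that keeps every snapshot, which is what later makes trend analysis possible. The schema names here are my own, not from any particular tool:

```python
import sqlite3

SCHEMA = """
CREATE TABLE IF NOT EXISTS snapshots (
    id INTEGER PRIMARY KEY AUTOINCREMENT,
    url TEXT NOT NULL,
    content_hash TEXT NOT NULL,
    content TEXT NOT NULL,
    scraped_at TEXT NOT NULL
);
CREATE INDEX IF NOT EXISTS idx_snapshots_url ON snapshots (url, scraped_at);
"""


def init_db(path: str = "intel.db") -> sqlite3.Connection:
    conn = sqlite3.connect(path)
    conn.executescript(SCHEMA)
    return conn


def save_snapshot(conn: sqlite3.Connection, url: str, content_hash: str,
                  content: str, scraped_at: str) -> None:
    conn.execute(
        "INSERT INTO snapshots (url, content_hash, content, scraped_at) "
        "VALUES (?, ?, ?, ?)",
        (url, content_hash, content, scraped_at),
    )
    conn.commit()
```

Appending rather than overwriting means the analyzer can query price history per URL, not just the latest value.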
### Key Components
- Scheduler: Cron jobs or Apify's scheduling for cloud-based runs
- Scrapers: One per data source, each with its own parsing logic
- Proxy Layer: ThorData residential proxies for reliability
- Storage: SQLite for small scale, PostgreSQL for larger deployments
- Analyzer: Compare snapshots, detect changes, compute trends
- Alerts: Email/Slack notifications when significant changes are detected
## Getting Started
The fastest path to competitive intelligence:
1. Pick 3-5 competitors to monitor
2. Identify 2-3 data sources per competitor (pricing page, job board, review site)
3. Start with pre-built scrapers on Apify to validate the approach
4. Set up change detection with the snapshot pattern above
5. Add alerts for price drops, new job postings, or review sentiment shifts
You don't need a massive infrastructure to start. A single server running cron jobs with ScraperAPI for proxy management can monitor dozens of competitors effectively.
## What Competitive Data Are You Tracking?
I'd love to hear what competitive intelligence use cases you're working on. Are you monitoring pricing, hiring, reviews, or something else entirely? Drop a comment below — let's share approaches.
I build open-source web scrapers on Apify. Check out my actors for eBay, Glassdoor, and G2 Reviews if you need ready-to-use competitive intelligence tools.