Scraping publicly available competitor prices is generally lawful in most jurisdictions. GDPR governs personal data, and competitor pricing pages rarely contain any.
Here is what is actually legal, what to avoid, and how to build a compliant monitoring stack.
The Legal Reality of Web Scraping in 2026
What the courts have actually said:
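- hiQ v. LinkedIn (9th Circuit, 2022): Scraping publicly available data likely does not violate the Computer Fraud and Abuse Act (CFAA). The ruling came at the preliminary-injunction stage, and hiQ was later found to have breached LinkedIn's user agreement, so contract claims remain a separate risk
- Meta v. Bright Data (N.D. Cal., 2024): Meta's terms did not prohibit logged-out scraping of publicly available data. This was a contract ruling, not a CFAA one, and it did not approve scraping while logged in
Separately, GDPR applies only to personal data: names, emails, phone numbers, and IP addresses tied to individuals.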
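Pricing data is NOT personal data under GDPR. Product prices, inventory levels, delivery times, and generic business contact information (business addresses, general company phone numbers) are outside GDPR scope. Contact details of a named employee or a sole trader are personal data, so treat those as in scope.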
What IS risky under GDPR:
- Scraping names, emails, phone numbers of individuals
- Building profiles on identifiable natural persons
- Processing employee data without a lawful basis (consent is only one of the six bases in Article 6)
Building a GDPR-Safe Price Monitor
Rule 1: Target business/product data only
Safe to scrape:
- Product names and descriptions
- Prices and promotional pricing
- Stock levels
- Delivery times and shipping costs
- Business addresses and general contact pages
- Reviews (aggregate scores, not individual reviewer names)
NOT safe without careful handling:
- Individual reviewer names and emails
- Employee directories
- User-generated content with personal identifiers
Rule 2: Do not store unnecessary data
GDPR's data minimization principle: only collect what you need.
If you need competitor prices, store: product_id, price, timestamp, competitor_domain.
Do NOT store: request metadata you have no use for (user agent strings, server IP addresses) or any personal data scraped incidentally.
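As a rough illustration of Rule 2, the sketch below reduces a raw scraped item to the four fields listed above and discards everything else. The input keys (`sku`, `price`, `url`, `reviewer_name`) and the names `PriceRecord` and `to_price_record` are assumptions made for this example, not the output format of any particular scraper.

```python
from dataclasses import dataclass
from datetime import datetime, timezone
from urllib.parse import urlparse


@dataclass(frozen=True)
class PriceRecord:
    """The only fields that ever reach storage."""
    product_id: str
    price: float
    timestamp: str          # ISO 8601, UTC
    competitor_domain: str


def to_price_record(raw: dict) -> PriceRecord:
    """Copy the four needed fields; anything else in `raw` is dropped.

    The keys "sku", "price" and "url" are hypothetical scraper output keys.
    """
    return PriceRecord(
        product_id=str(raw["sku"]),
        price=float(raw["price"]),
        timestamp=datetime.now(timezone.utc).isoformat(timespec="seconds"),
        competitor_domain=urlparse(raw["url"]).netloc,
    )


raw_item = {
    "sku": "A-100",
    "price": "19.99",
    "url": "https://competitor1.com/products/a-100",
    "reviewer_name": "Jane Doe",  # incidental personal data, never copied
}
record = to_price_record(raw_item)
```

Whitelisting fields at the point of ingestion means incidental personal data is dropped by default, without relying on a later cleanup step.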
Rule 3: Respect robots.txt (for ethical and legal protection)
While robots.txt is not legally binding in itself, ignoring it can create liability in jurisdictions with computer misuse laws. Respecting it is also good practice, and it shows good faith if a site's Terms of Service are ever disputed.
Check robots.txt before scraping:
import urllib.robotparser
from urllib.parse import urlparse

def is_allowed(url, user_agent='*'):
    """Return True if robots.txt on the URL's host lets user_agent fetch url."""
    parsed = urlparse(url)
    robots_url = f"{parsed.scheme}://{parsed.netloc}/robots.txt"
    rp = urllib.robotparser.RobotFileParser()
    rp.set_url(robots_url)
    rp.read()  # downloads and parses robots.txt (one network request per call)
    return rp.can_fetch(user_agent, url)
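A robots.txt file can also carry a `Crawl-delay` directive, which the same standard-library parser exposes. The sketch below parses an inline example file so it runs without network access; the rules shown are made up for illustration and do not come from any real site.

```python
import urllib.robotparser

# Hypothetical robots.txt content, for illustration only.
example_robots = """\
User-agent: *
Disallow: /checkout/
Crawl-delay: 5
"""

rp = urllib.robotparser.RobotFileParser()
rp.parse(example_robots.splitlines())  # parse text directly instead of rp.read()

products_allowed = rp.can_fetch("*", "https://competitor1.com/products")
checkout_allowed = rp.can_fetch("*", "https://competitor1.com/checkout/cart")
delay_seconds = rp.crawl_delay("*")  # None when no Crawl-delay is declared
```

With this example file, the product page is allowed, the checkout path is not, and the declared delay is 5 seconds. When a site declares a delay, using it as the minimum gap between requests is the natural choice.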
Rule 4: Rate limit to avoid service disruption
Making thousands of requests per minute to a competitor's site could be characterized as a denial-of-service attack in some jurisdictions. Keep your rate to a level that would be indistinguishable from normal user traffic.
Rule of thumb: no more than one request every 2-5 seconds per target domain.
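One way to enforce that rule of thumb is a small per-domain limiter called before every request. This is a minimal sketch, not a production implementation: the class name `DomainRateLimiter` and the 3-second default are assumptions, and the injectable `clock` and `sleep` parameters exist only to make the class testable.

```python
import time
from urllib.parse import urlparse


class DomainRateLimiter:
    """Enforce a minimum delay between requests to the same domain."""

    def __init__(self, min_delay_seconds=3.0, clock=time.monotonic, sleep=time.sleep):
        self.min_delay = min_delay_seconds
        self._clock = clock
        self._sleep = sleep
        self._last_request = {}  # domain -> time of the most recent request

    def wait(self, url):
        """Block until the URL's domain may be hit again; return seconds slept."""
        domain = urlparse(url).netloc
        last = self._last_request.get(domain)
        delay = 0.0
        if last is not None:
            elapsed = self._clock() - last
            delay = max(0.0, self.min_delay - elapsed)
        if delay > 0:
            self._sleep(delay)
        self._last_request[domain] = self._clock()
        return delay
```

Call `limiter.wait(url)` immediately before each fetch. Requests to different domains do not block each other, so the limit applies per target site rather than globally.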
The Compliant Stack
Here is what I use to monitor 200+ competitor products daily:
Component 1: Apify Price Scraper
Apify's Amazon and e-commerce scrapers handle rate limiting, browser fingerprinting, and proxy rotation automatically. They follow reasonable crawl policies.
Input:
{
  "urls": [
    "https://competitor1.com/products",
    "https://competitor2.com/pricing"
  ],
  "maxItems": 500
}
Output: product name, price, availability, timestamp.
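Cost: ~$0.002-0.005 per product page. About $3-8/month for 200 products checked daily, which assumes listing pages that return several products per request. At one page load per product per day (about 6,000 loads a month), the same rates come to roughly $12-30/month.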
Component 2: Google Shopping Scraper
For public market pricing intelligence, Google Shopping already aggregates competitor prices. Scraping Google Shopping results gives you pricing data without ever touching competitor servers directly.
With respect to competitors, this is the most legally conservative approach, though Google's own terms of service still govern how you access its results.
Component 3: Data Pipeline
Apify scraper -> Google Sheets (raw data) -> n8n (calculate deltas) -> Slack/Telegram alert
n8n logic:
- Pull latest prices from Apify dataset
- Compare to prices from 24h ago in Google Sheets
- If price change > 5%: send alert with product name, old price, new price, % change
- Log to Sheets for trend analysis
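The comparison step is simple enough to sketch outside n8n. The function below mirrors the logic in the list above, assuming prices arrive as plain dictionaries keyed by product name; the function name `price_changes` and the sample data are hypothetical.

```python
def price_changes(previous, latest, threshold_pct=5.0):
    """Return alerts for products whose price moved by more than threshold_pct.

    `previous` and `latest` map product name -> price.
    """
    alerts = []
    for product, new_price in latest.items():
        old_price = previous.get(product)
        if not old_price:  # new product, or missing/zero baseline: nothing to compare
            continue
        change_pct = (new_price - old_price) / old_price * 100
        if abs(change_pct) > threshold_pct:
            alerts.append({
                "product": product,
                "old_price": old_price,
                "new_price": new_price,
                "change_pct": round(change_pct, 1),
            })
    return alerts


previous = {"Widget": 20.00, "Gadget": 50.00}
latest = {"Widget": 18.00, "Gadget": 51.00, "Gizmo": 9.99}
alerts = price_changes(previous, latest)
```

Here only "Widget" triggers an alert (a 10% drop). "Gadget" moved 2%, and "Gizmo" has no price from 24 hours ago to compare against. The absolute value makes the threshold apply to increases and decreases alike.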
Component 4: Privacy-by-design data handling
- Store only: competitor_domain, product_id, price, timestamp
- Auto-delete raw scrape data after 7 days (keep only the price log)
- No personal data ever enters the pipeline
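The 7-day retention rule can be expressed as a small filter run on a schedule. This sketch assumes each raw row carries a timezone-aware `scraped_at` datetime; that field name and the function name `purge_raw_scrapes` are illustrative, and in a Sheets or database setup the same cutoff would drive a delete query instead.

```python
from datetime import datetime, timedelta, timezone

RAW_RETENTION = timedelta(days=7)


def purge_raw_scrapes(raw_rows, now=None):
    """Return only the raw rows scraped within the retention window."""
    now = now or datetime.now(timezone.utc)
    cutoff = now - RAW_RETENTION
    return [row for row in raw_rows if row["scraped_at"] >= cutoff]


now = datetime(2026, 1, 15, tzinfo=timezone.utc)
raw_rows = [
    {"product_id": "A-100", "scraped_at": now - timedelta(days=1)},
    {"product_id": "B-200", "scraped_at": now - timedelta(days=8)},
]
kept = purge_raw_scrapes(raw_rows, now=now)
```

Only the one-day-old row survives. The price log itself is stored separately and is not touched by this purge.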
What This Replaces
| Tool | Cost | What it does |
|---|---|---|
| Prisync | $99/month | Competitor price monitoring |
| Wiser | $200/month+ | Retail price intelligence |
| DataWeave | Custom pricing | Enterprise price monitoring |
| Our stack | $8-15/month | Same monitoring, full control |
Get the Complete Scraper Bundle
The scrapers used in this stack are part of the Apify Scrapers Bundle — $29 one-time, no subscription.
Includes Amazon, Google Shopping, Shopify, and 27 other scrapers with pre-configured GDPR-safe settings.
Note: This is not legal advice. For specific jurisdiction questions, consult a data protection solicitor. The hiQ and Meta/Bright Data cases are US law. EU law may differ for your use case.