Vhub Systems

Posted on Apr 3

Is Scraping Competitor Prices Legal Under GDPR? The Actual Answer (With Case Law)

#webscraping #startup #opensource

Scraping competitor prices is legal in most jurisdictions. The GDPR question is about personal data, and competitor pricing pages rarely contain any.

Here is what is actually legal, what to avoid, and how to build a compliant monitoring stack.

The Legal Reality of Web Scraping in 2026

What the courts have actually said:

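hiQ v. LinkedIn (9th Circuit, 2022): Scraping publicly available data likely does not violate the Computer Fraud and Abuse Act (CFAA). hiQ was later found to have breached LinkedIn's user agreement, so contract claims remain a separate risk.
Meta v. Bright Data (N.D. Cal., 2024): Meta's terms of service did not bar logged-off scraping of public data. This was a contract ruling, not a CFAA one, and it did not approve scraping while logged in.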
GDPR applies to personal data — names, emails, phone numbers, IP addresses tied to individuals

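Pricing data is NOT personal data under GDPR. Product prices, inventory levels, delivery times, and company contact details (business addresses, general company phone numbers) are outside GDPR scope. The exception is contact details that identify an individual, such as a sole trader or a named employee, which remain personal data.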

What IS risky under GDPR:

Scraping names, emails, phone numbers of individuals
Building profiles on identifiable natural persons
Processing employee data without a lawful basis (consent is only one of the six listed in Article 6)

Building a GDPR-Safe Price Monitor

Rule 1: Target business/product data only

Safe to scrape:

Product names and descriptions
Prices and promotional pricing
Stock levels
Delivery times and shipping costs
Business addresses and general contact pages
Reviews (aggregate scores, not individual reviewer names)

NOT safe without careful handling:

Individual reviewer names and emails
Employee directories
User-generated content with personal identifiers

Rule 2: Do not store unnecessary data

GDPR's data minimization principle: only collect what you need.

If you need competitor prices, store: product_id, price, timestamp, competitor_domain.

Do NOT store: request or response metadata you don't need (user agent strings, server IP addresses), or any personal data incidentally scraped.
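One way to enforce this in code is to whitelist fields at the point of storage, so anything else a scraper returns is dropped before it reaches disk. The sketch below assumes scraped items arrive as dictionaries; the `PriceRecord` class and `to_record` helper are hypothetical names for illustration, not part of any scraper's API.

```python
from dataclasses import dataclass, asdict
from datetime import datetime, timezone


@dataclass(frozen=True)
class PriceRecord:
    """The only four fields the pipeline persists (hypothetical schema)."""
    product_id: str
    price: float
    timestamp: str  # ISO 8601, UTC
    competitor_domain: str


def to_record(raw: dict, competitor_domain: str) -> PriceRecord:
    """Copy whitelisted fields from a raw scraped item; ignore everything else."""
    return PriceRecord(
        product_id=str(raw["product_id"]),
        price=float(raw["price"]),
        timestamp=datetime.now(timezone.utc).isoformat(),
        competitor_domain=competitor_domain,
    )


# Example: the reviewer name in the raw item never reaches storage.
raw_item = {"product_id": "SKU-123", "price": "19.99", "reviewer": "Jane D."}
record = to_record(raw_item, "competitor1.com")
stored = asdict(record)
```

Because the record type has no slot for extra fields, adding a new stored column becomes a deliberate schema change rather than an accident.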

Rule 3: Respect robots.txt (for ethical and legal protection)

While robots.txt is not itself legally binding, ignoring it can count against you in jurisdictions with computer misuse laws. More importantly, respecting it is good practice and shows good faith if a site later raises a Terms of Service complaint.

Check robots.txt before scraping:

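import urllib.robotparser
from urllib.parse import urlparse

def is_allowed(url, user_agent='*'):
    """Return True if robots.txt on the URL's host lets user_agent fetch url."""
    parsed = urlparse(url)
    robots_url = f"{parsed.scheme}://{parsed.netloc}/robots.txt"
    rp = urllib.robotparser.RobotFileParser()
    rp.set_url(robots_url)
    try:
        rp.read()  # fetches and parses robots.txt over the network
    except OSError:
        return False  # robots.txt unreachable: skip the URL rather than guess
    return rp.can_fetch(user_agent, url)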
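
To see how the parser interprets rules without making a network call, you can feed it robots.txt text directly. The sample rules and the `price-monitor-bot` user agent below are made up for illustration; `parse`, `can_fetch`, and `crawl_delay` are standard `urllib.robotparser` methods.

```python
import urllib.robotparser

# Hypothetical robots.txt content, for illustration only.
SAMPLE_ROBOTS = """\
User-agent: *
Disallow: /checkout/
Crawl-delay: 5
"""

rp = urllib.robotparser.RobotFileParser()
rp.parse(SAMPLE_ROBOTS.splitlines())

# Product pages are not covered by any Disallow rule; the checkout path is.
product_ok = rp.can_fetch("price-monitor-bot", "https://competitor1.com/products/widget")
checkout_ok = rp.can_fetch("price-monitor-bot", "https://competitor1.com/checkout/cart")

# Seconds between requests that the site asks for, or None if unspecified.
delay = rp.crawl_delay("price-monitor-bot")
```

If a site publishes a crawl delay, it is a sensible floor for your own request interval.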

Rule 4: Rate limit to avoid service disruption

Making thousands of requests per minute to a competitor's site could be characterized as a denial-of-service attack in some jurisdictions. Keep your rate to a level that would be indistinguishable from normal user traffic.

Rule of thumb: at most one request every 2-5 seconds per target domain.
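
A per-domain limiter is one simple way to apply that rule of thumb. The sketch below is a minimal, single-threaded example; the `DomainRateLimiter` class is a hypothetical helper, and the 3-second default is just a value inside the 2-5 second range above.

```python
import time
from urllib.parse import urlparse


class DomainRateLimiter:
    """Enforce a minimum gap between requests to the same domain."""

    def __init__(self, min_interval=3.0, clock=time.monotonic, sleep=time.sleep):
        self.min_interval = min_interval
        self._clock = clock    # injectable so the limiter can be tested without waiting
        self._sleep = sleep
        self._last_request = {}  # domain -> time of the last permitted request

    def wait(self, url):
        """Block until a request to url's domain is allowed; return seconds slept."""
        domain = urlparse(url).netloc
        now = self._clock()
        last = self._last_request.get(domain)
        delay = 0.0
        if last is not None:
            delay = max(0.0, self.min_interval - (now - last))
        if delay > 0:
            self._sleep(delay)
        self._last_request[domain] = now + delay
        return delay
```

Call `limiter.wait(url)` immediately before each request. Requests to different domains do not delay each other, which matches the "per target domain" wording.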

The Compliant Stack

Here is what I use to monitor 200+ competitor products daily:

Component 1: Apify Price Scraper

Apify's Amazon and e-commerce scrapers handle rate limiting, browser fingerprinting, and proxy rotation automatically. They follow reasonable crawl policies.

Input:

{
  "urls": [
    "https://competitor1.com/products",
    "https://competitor2.com/pricing"
  ],
  "maxItems": 500
}

Output: product name, price, availability, timestamp.

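Cost: ~$0.002-0.005 per page scraped. About $3-8/month in my setup for 200 products checked daily. Note that 200 separate pages fetched daily is about 6,000 requests a month, or $12-30 at those rates, so budget for the higher figure unless one page returns several products.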

Component 2: Google Shopping Scraper

For public market pricing intelligence, Google Shopping aggregates competitor prices publicly. Scraping Google Shopping results gives you pricing data without ever touching competitor servers directly.

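This is the most conservative option with respect to competitors, though Google's own terms of service still govern automated access to its results.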

Component 3: Data Pipeline

Apify scraper -> Google Sheets (raw data) -> n8n (calculate deltas) -> Slack/Telegram alert

n8n logic:

Pull latest prices from Apify dataset
Compare to prices from 24h ago in Google Sheets
If price change > 5%: send alert with product name, old price, new price, % change
Log to Sheets for trend analysis
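
The comparison step can be sketched outside n8n as a plain function. This is an illustration of the logic, not the actual n8n workflow: it assumes prices are held in dictionaries keyed by product name, treats "price change > 5%" as a move in either direction, and uses a hypothetical `price_alerts` helper.

```python
def price_alerts(previous, latest, threshold_pct=5.0):
    """Return alert dicts for products whose price moved more than threshold_pct."""
    alerts = []
    for product, new_price in latest.items():
        old_price = previous.get(product)
        if not old_price:
            # New product, or a zero/missing old price: no baseline to compare.
            continue
        change_pct = (new_price - old_price) / old_price * 100
        if abs(change_pct) > threshold_pct:
            alerts.append({
                "product": product,
                "old_price": old_price,
                "new_price": new_price,
                "change_pct": round(change_pct, 1),
            })
    return alerts


# Made-up prices: Widget A drops 10%, Widget B rises 2%, Widget C is new.
previous = {"Widget A": 20.00, "Widget B": 50.00}
latest = {"Widget A": 18.00, "Widget B": 51.00, "Widget C": 9.99}
alerts = price_alerts(previous, latest)
```

Only Widget A crosses the threshold here; Widget C is skipped because there is no 24h-old price to compare against.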

Component 4: Privacy-by-design data handling

Store only: domain, product_sku, price, timestamp
Auto-delete raw scrape data after 7 days (keep only the price log)
No personal data ever enters the pipeline
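
The 7-day retention rule can be applied with a small purge step run on a schedule. The sketch below assumes each raw row carries an ISO 8601 `scraped_at` field; that field name, the `purge_old_raw` helper, and the sample dates are hypothetical.

```python
from datetime import datetime, timedelta, timezone


def purge_old_raw(raw_rows, now, max_age_days=7):
    """Return only the raw scrape rows that fall inside the retention window."""
    cutoff = now - timedelta(days=max_age_days)
    return [
        row for row in raw_rows
        if datetime.fromisoformat(row["scraped_at"]) >= cutoff
    ]


# Made-up rows: one nine days old, one a day old.
now = datetime(2026, 4, 10, tzinfo=timezone.utc)
raw_rows = [
    {"scraped_at": "2026-04-01T08:00:00+00:00", "html": "<old page>"},
    {"scraped_at": "2026-04-09T08:00:00+00:00", "html": "<recent page>"},
]
kept = purge_old_raw(raw_rows, now)
```

The price log itself is untouched by this step, since it lives in a separate store and contains only domain, product_sku, price, and timestamp.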

What This Replaces

Tool	Cost	What it does
Prisync	$99/month	Competitor price monitoring
Wiser	$200/month+	Retail price intelligence
DataWeave	Custom pricing	Enterprise price monitoring
Our stack	$8-15/month	Same monitoring, full control

Get the Complete Scraper Bundle

The scrapers used in this stack are part of the Apify Scrapers Bundle — $29 one-time, no subscription.

Includes Amazon, Google Shopping, Shopify, and 27 other scrapers with pre-configured GDPR-safe settings.

Get the bundle here

Note: This is not legal advice. For specific jurisdiction questions, consult a data protection solicitor. The hiQ and Meta/Bright Data cases are US law. EU law may differ for your use case.
