DEV Community

Vhub Systems

Is Scraping Competitor Prices Legal Under GDPR? The Actual Answer (With Case Law)

Scraping competitor prices is legal in most jurisdictions. GDPR regulates personal data, and prices themselves are not personal data, so a scraper that collects only pricing fields generally falls outside its scope.

Here is what is actually legal, what to avoid, and how to build a compliant monitoring stack.

The Legal Reality of Web Scraping in 2026

What the courts have actually said:

  • hiQ v. LinkedIn (9th Circuit, 2022): Scraping publicly available data likely does not violate the Computer Fraud and Abuse Act (CFAA)
  • Meta v. Bright Data (N.D. Cal., 2024): Meta's terms of service did not bar scraping public data while logged out. The ruling turned on contract law, not the CFAA
  • GDPR applies to personal data — names, emails, phone numbers, IP addresses tied to individuals

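Pricing data is NOT personal data under GDPR. Product prices, inventory levels, delivery times, and company-level contact information (business addresses, general company phone numbers) are outside GDPR scope. The exception is contact information that identifies an individual, such as a sole trader or a named employee.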

What IS risky under GDPR:

  • Scraping names, emails, phone numbers of individuals
  • Building profiles on identifiable natural persons
  • Processing employee data without a lawful basis (consent is only one of the six bases in Article 6)

Building a GDPR-Safe Price Monitor

Rule 1: Target business/product data only

Safe to scrape:

  • Product names and descriptions
  • Prices and promotional pricing
  • Stock levels
  • Delivery times and shipping costs
  • Business addresses and general contact pages
  • Reviews (aggregate scores, not individual reviewer names)

NOT safe without careful handling:

  • Individual reviewer names and emails
  • Employee directories
  • User-generated content with personal identifiers
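
As a minimal sketch of the review rule above, the hypothetical helper below reduces scraped reviews to an aggregate score and a count, so reviewer names and emails never leave the scraping step. The field names (`rating`, `author`, `email`) are assumptions about what a scraper might return, not a real schema.

```python
from statistics import mean

def aggregate_reviews(reviews):
    """Reduce raw reviews to aggregate figures, discarding personal fields.

    `reviews` is assumed to be a list of dicts with a numeric "rating" key.
    Any other keys (author, email, free text) are ignored and never returned.
    """
    ratings = [r["rating"] for r in reviews if "rating" in r]
    if not ratings:
        return {"review_count": 0, "average_rating": None}
    return {
        "review_count": len(ratings),
        "average_rating": round(mean(ratings), 2),
    }

# Hypothetical scraped input: the personal fields are dropped by the helper.
raw = [
    {"rating": 5, "author": "Jane D.", "email": "jane@example.com"},
    {"rating": 3, "author": "Sam K."},
]
summary = aggregate_reviews(raw)  # holds only a count and a mean
```

Only `summary` would be passed on to storage; the `raw` list is discarded once the aggregate is computed.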

Rule 2: Do not store unnecessary data

GDPR's data minimization principle: only collect what you need.

If you need competitor prices, store: product_id, price, timestamp, competitor_domain.

Do NOT store: request metadata you don't need (user agent strings, server IP addresses) or any personal data incidentally scraped.
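
One way to enforce this in code is an explicit whitelist: convert every scraped item into a fixed record type before it is stored. The sketch below is an illustration only. `PriceRecord` and `to_price_record` are hypothetical names, and the input keys are assumed to match the four fields listed above.

```python
from dataclasses import dataclass
from datetime import datetime, timezone

@dataclass(frozen=True)
class PriceRecord:
    """The only fields that are ever persisted."""
    product_id: str
    price: float
    timestamp: datetime
    competitor_domain: str

def to_price_record(scraped: dict, competitor_domain: str) -> PriceRecord:
    """Keep the whitelisted fields; everything else in `scraped` is dropped."""
    return PriceRecord(
        product_id=str(scraped["product_id"]),
        price=float(scraped["price"]),
        timestamp=datetime.now(timezone.utc),
        competitor_domain=competitor_domain,
    )

# Hypothetical scraped item with an incidental personal field.
item = {"product_id": "sku-1", "price": "19.99", "seller_name": "J. Smith"}
record = to_price_record(item, "competitor1.com")
```

Because the record type has no slot for extra keys, an incidental field such as `seller_name` cannot reach storage by accident.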

Rule 3: Respect robots.txt (for ethical and legal protection)

While robots.txt is not legally binding, ignoring it in jurisdictions with computer misuse laws can create liability. Respecting it is also good practice and helps show good faith if a site owner ever alleges a Terms of Service violation.

Check robots.txt before scraping:

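import urllib.robotparser
from urllib.parse import urlparse

def is_allowed(url, user_agent='*'):
    """Return True if the host's robots.txt permits fetching this URL."""
    parsed = urlparse(url)
    robots_url = f"{parsed.scheme}://{parsed.netloc}/robots.txt"
    rp = urllib.robotparser.RobotFileParser()
    rp.set_url(robots_url)
    try:
        rp.read()  # fetches robots.txt over the network
    except OSError:
        return False  # robots.txt unreachable: skip the URL to be safe
    return rp.can_fetch(user_agent, url)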

Rule 4: Rate limit to avoid service disruption

Making thousands of requests per minute to a competitor's site could be characterized as a denial-of-service attack in some jurisdictions. Keep your rate to a level that would be indistinguishable from normal user traffic.

Rule of thumb: no more than 1 request per 2-5 seconds per target domain.
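
A minimal sketch of that rule of thumb, assuming a single-process scraper: a small per-domain limiter that sleeps until the minimum interval has passed. `DomainRateLimiter` is a hypothetical helper, the 3-second default is just one value inside the 2-5 second range, and the `clock` and `sleep` parameters exist only so the class can be tested without real waiting.

```python
import time
from urllib.parse import urlparse

class DomainRateLimiter:
    """Enforce a minimum delay between requests to the same domain."""

    def __init__(self, min_interval=3.0, clock=time.monotonic, sleep=time.sleep):
        self.min_interval = min_interval
        self._clock = clock
        self._sleep = sleep
        self._last_request = {}  # domain -> time of the last request

    def wait(self, url):
        """Block until min_interval seconds have passed for this URL's domain."""
        domain = urlparse(url).netloc
        last = self._last_request.get(domain)
        if last is not None:
            remaining = self.min_interval - (self._clock() - last)
            if remaining > 0:
                self._sleep(remaining)
        # Record the time at which this request is allowed to proceed.
        self._last_request[domain] = self._clock()
        return domain
```

Call `limiter.wait(url)` immediately before each request. Requests to different domains do not delay each other, because the interval is tracked per domain.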

The Compliant Stack

Here is what I use to monitor 200+ competitor products daily:

Component 1: Apify Price Scraper

Apify's Amazon and e-commerce scrapers handle rate limiting, browser fingerprinting, and proxy rotation automatically. They follow reasonable crawl policies.

Input:

{
  "urls": [
    "https://competitor1.com/products",
    "https://competitor2.com/pricing"
  ],
  "maxItems": 500
}

Output: product name, price, availability, timestamp.

Cost: ~$0.002-0.005 per product page. About $3-8/month for 200 products checked daily.

Component 2: Google Shopping Scraper

For public market pricing intelligence, Google Shopping aggregates competitor prices publicly. Scraping Google Shopping results gives you pricing data without ever touching competitor servers directly.

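This is the most legally conservative approach with respect to competitors, though Google's own terms of service still apply to the scraping itself.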

Component 3: Data Pipeline

Apify scraper -> Google Sheets (raw data) -> n8n (calculate deltas) -> Slack/Telegram alert

n8n logic:

  1. Pull latest prices from Apify dataset
  2. Compare to prices from 24h ago in Google Sheets
  3. If price change > 5%: send alert with product name, old price, new price, % change
  4. Log to Sheets for trend analysis
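
The comparison in steps 2 and 3 can be sketched in plain Python, for example inside an n8n code step or a standalone script. This is an illustration under assumptions: both inputs are taken to be simple name-to-price mappings, `price_alerts` is a hypothetical function, and "price change > 5%" is read as a change in either direction.

```python
def price_alerts(previous, latest, threshold_pct=5.0):
    """Return alerts for products whose price moved more than threshold_pct.

    `previous` and `latest` are assumed to map product name -> price.
    """
    alerts = []
    for name, new_price in latest.items():
        old_price = previous.get(name)
        if not old_price:  # new product, or missing/zero baseline: skip
            continue
        change_pct = (new_price - old_price) / old_price * 100
        if abs(change_pct) > threshold_pct:
            alerts.append({
                "product": name,
                "old_price": old_price,
                "new_price": new_price,
                "change_pct": round(change_pct, 1),
            })
    return alerts

# Hypothetical data: only Widget moves by more than 5%.
alerts = price_alerts(
    previous={"Widget": 100.0, "Gadget": 50.0},
    latest={"Widget": 90.0, "Gadget": 51.0, "Gizmo": 20.0},
)
```

Products with no price from 24h ago are skipped rather than alerted on, since there is no baseline to compare against.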

Component 4: Privacy-by-design data handling

  • Store only: domain, product_sku, price, timestamp
  • Auto-delete raw scrape data after 7 days (keep only the price log)
  • No personal data ever enters the pipeline
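
The 7-day deletion rule could look like the sketch below, assuming raw scrape entries are held as dicts with a timezone-aware `scraped_at` value. The names `RAW_RETENTION` and `purge_raw_scrapes` are hypothetical, and a real pipeline would run the equivalent delete against wherever the raw data lives.

```python
from datetime import datetime, timedelta, timezone

RAW_RETENTION = timedelta(days=7)

def purge_raw_scrapes(raw_scrapes, now=None):
    """Return only the raw scrape entries inside the retention window.

    Each entry is assumed to be a dict with a timezone-aware "scraped_at"
    datetime. The long-term price log is stored separately and is not touched.
    """
    now = now or datetime.now(timezone.utc)
    cutoff = now - RAW_RETENTION
    return [entry for entry in raw_scrapes if entry["scraped_at"] >= cutoff]
```

Running this once a day on a schedule keeps the raw store bounded to roughly a week of data.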

What This Replaces

| Tool | Cost | What it does |
| --- | --- | --- |
| Prisync | $99/month | Competitor price monitoring |
| Wiser | $200/month+ | Retail price intelligence |
| DataWeave | Custom pricing | Enterprise price monitoring |
| Our stack | $8-15/month | Same monitoring, full control |

Get the Complete Scraper Bundle

The scrapers used in this stack are part of the Apify Scrapers Bundle — $29 one-time, no subscription.

Includes Amazon, Google Shopping, Shopify, and 27 other scrapers with pre-configured GDPR-safe settings.

Get the bundle here


Note: This is not legal advice. For specific jurisdiction questions, consult a data protection solicitor. The hiQ and Meta/Bright Data cases are US law. EU law may differ for your use case.
