Web scraping sits in one of the most contested legal grey zones in technology. You can build an entire business on it — and get hit with a cease-and-desist the same week. In 2026, the legal landscape is clearer than it was five years ago, but it is still far from settled.
This guide breaks down what developers need to know: the landmark cases, what Terms of Service (ToS) and robots.txt actually mean, GDPR traps, and a practical decision framework you can use before writing a single line of scraping code.
The Core Law: The Computer Fraud and Abuse Act (CFAA)
The CFAA, passed in 1986, was written to prosecute hacking. It makes it illegal to access a computer "without authorization" or in a manner that "exceeds authorized access." For decades, website owners tried to use the CFAA as a weapon against scrapers, arguing that violating a ToS equals unauthorized access.
That argument took a serious blow in 2021 — and the repercussions are still shaping litigation in 2026.
Van Buren v. United States (2021)
The Supreme Court's ruling in Van Buren v. United States narrowed the CFAA significantly. The Court held that "exceeds authorized access" covers someone who uses otherwise legitimate access to reach parts of a system (files, folders, databases) that are off-limits to them. It does not cover someone who obtains information they are entitled to access and then uses it in a way the owner dislikes.
In plain terms: if a scraper can access a page by visiting it normally in a browser, accessing the same page programmatically is unlikely to be "unauthorized" under the CFAA. The law was not designed to criminalize every ToS violation.
This was a major win for the scraping community, though it is not a blanket free pass.
hiQ v. LinkedIn — The Long Road to Clarity
The hiQ v. LinkedIn saga is the defining web scraping case of the last decade. hiQ, a workforce analytics company, scraped public LinkedIn profiles. LinkedIn sent cease-and-desist letters and deployed technical blocks. hiQ sued, and what followed was years of appeals.
The Ninth Circuit ruled — twice — that scraping publicly available data likely does not violate the CFAA, because there is no authorization mechanism to circumvent. Anyone can visit a public LinkedIn profile without logging in. A bot doing the same is not "breaking in."
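The case did not end as a clean win for hiQ, however. In November 2022 the district court held that hiQ had breached LinkedIn's User Agreement, and in December 2022 the parties settled with a consent judgment that included a permanent injunction against hiQ. The Ninth Circuit's CFAA reasoning was left undisturbed: scraping publicly accessible data is unlikely to count as "unauthorized access" under the CFAA, but contract claims can still succeed.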
The key word is "publicly." The moment you log in, authenticate, or bypass any access control — the calculus changes entirely.
Public Data vs. Authenticated Data: The Dividing Line
This is the most important practical distinction in web scraping law.
Public data, meaning pages accessible to any anonymous visitor, generally falls outside the CFAA's reach. Since hiQ, the prevailing reasoning has been that if LinkedIn, Yelp, or Amazon displays something to anyone who loads the URL, a bot loading the same URL is not engaging in unauthorized access.
Authenticated data — anything behind a login — is a different matter. By logging in, you agreed to the platform's Terms of Service. Scraping behind authentication:
- Almost certainly violates the ToS
- May violate the CFAA, depending on whether a court finds that you reached areas your account was not permitted to access
- Exposes you to civil liability even if criminal prosecution is unlikely
If your scraper logs in as a user, you are operating in legally risky territory. If it scrapes public pages without credentials, you are on firmer ground.
robots.txt — A Convention, Not a Law
robots.txt is a technical standard (the Robots Exclusion Protocol, published as RFC 9309 in 2022) that tells crawlers which paths to avoid. It is a voluntary convention, not a statute or a contract, and ignoring it is not by itself a crime.
That said, ignoring robots.txt can be used against you in civil litigation as evidence of bad faith. If you scraped a site that explicitly blocked crawlers in robots.txt, and then caused harm to that site, a court may view your scraping as willful rather than incidental.
The practical rule: respect robots.txt not because you are legally required to, but because it signals how a site wants to be treated, and violating it can strengthen a plaintiff's case against you.
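As a minimal sketch of that practical rule, Python's standard library can check a URL against robots.txt before you request it. The robots.txt content, the `example.com` URLs, and the `example-research-bot` user agent below are all hypothetical; in a real crawler you would load the live file with `set_url()` and `read()` instead of parsing a string.

```python
from urllib.robotparser import RobotFileParser

# Hypothetical robots.txt content, used here so the example runs offline.
SAMPLE_ROBOTS_TXT = """\
User-agent: *
Disallow: /private/
Crawl-delay: 5
"""

USER_AGENT = "example-research-bot"  # hypothetical crawler name


def build_parser(robots_txt: str) -> RobotFileParser:
    """Parse robots.txt text into a RobotFileParser."""
    parser = RobotFileParser()
    parser.parse(robots_txt.splitlines())
    return parser


parser = build_parser(SAMPLE_ROBOTS_TXT)

# True when no Disallow rule matches the path for this user agent.
allowed_public = parser.can_fetch(USER_AGENT, "https://example.com/products/1")
allowed_private = parser.can_fetch(USER_AGENT, "https://example.com/private/report")

# Crawl-delay is a non-standard but widely used directive; None if absent.
delay = parser.crawl_delay(USER_AGENT)
```

Skipping any URL where `can_fetch` returns `False`, and waiting at least `delay` seconds between requests when a value is present, gives you a simple, documentable record that the crawler followed the site's stated preferences.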
Terms of Service — Binding Contract or Wishful Thinking?
ToS agreements are contracts — but only if you agreed to them. For clickwrap agreements (where you explicitly click "I agree"), courts generally treat them as enforceable. If you created an account and accepted the ToS, you are bound by what you agreed to.
For browsewrap agreements (where ToS are linked in a footer but never explicitly accepted), enforceability is much weaker. Courts have split on this, and many have refused to enforce browsewrap ToS against scrapers who never saw or agreed to them.
The bottom line: ToS violations can lead to civil breach-of-contract claims, even if they do not trigger the CFAA. The damages available in contract cases are typically limited to actual provable harm — but that harm can still be significant if you disrupted a platform's infrastructure.
GDPR and Privacy Law: The Real Danger Zone
If you are scraping personal data (names, emails, profile photos, location data, anything that can identify a living person), you have entered GDPR territory if any of that data relates to people in the EU.
Under GDPR, scraping personal data without a lawful basis is illegal, even if the data is publicly posted. "It was public" is not a lawful basis. The lawful bases are consent, legitimate interest, contractual necessity, and a few others — and regulators have been skeptical that commercial scraping qualifies.
In 2021, the Italian data protection authority fined a company for scraping personal data from a social network without a proper legal basis. Similar enforcement actions have followed across Europe.
Practical implications:
- Do not scrape email addresses, phone numbers, or personal profile data at scale without a clear GDPR legal basis and a privacy policy.
- Do not store personal data longer than needed. If you scrape it, document why, and have a retention policy.
- Do not transfer EU personal data outside the EU without appropriate safeguards (Standard Contractual Clauses or similar).
- Do not assume the US is exempt. California's CPRA, which amended and extended the CCPA, creates similar obligations for California residents.
GDPR is where most scraping operations face their most serious legal exposure in 2026.
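The minimization and retention points above can be sketched in a few lines. This is an illustration only, not legal advice encoded as software: the field allowlist, the 30-day window, and the record layout are hypothetical and would need to match your own documented purpose.

```python
from datetime import datetime, timedelta, timezone

# Hypothetical policy values; choose yours from your documented purpose.
ALLOWED_FIELDS = {"product_name", "price", "scraped_at"}
RETENTION = timedelta(days=30)


def minimize(record: dict) -> dict:
    """Keep only allowlisted fields, dropping everything else (e.g. emails)."""
    return {key: value for key, value in record.items() if key in ALLOWED_FIELDS}


def purge_expired(records: list, now: datetime) -> list:
    """Return only the records that are still inside the retention window."""
    cutoff = now - RETENTION
    return [record for record in records if record["scraped_at"] >= cutoff]


raw = {
    "product_name": "Widget",
    "price": "9.99",
    "seller_email": "seller@example.com",  # personal data we have no basis to keep
    "scraped_at": datetime(2026, 1, 1, tzinfo=timezone.utc),
}
clean = minimize(raw)  # seller_email is dropped before anything is stored
```

An allowlist is used rather than a blocklist so that a new personal field appearing on the target site is dropped by default instead of stored by accident.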
Decision Flowchart: Should I Scrape This?
Use this before you start any scraping project:
Is the target data publicly accessible (no login required)?
├── NO  → High legal risk. Consult a lawyer before proceeding.
└── YES → Continue.

Does the site's ToS explicitly prohibit scraping?
├── YES → Civil breach-of-contract risk. Weigh business need vs. exposure.
│         If you proceed, document your reasoning.
└── NO  → Lower risk. Continue.

Does the data include personal information about identifiable individuals?
├── YES → GDPR/CPRA obligations apply. Do you have a lawful basis?
│         ├── NO  → Do not scrape personal data.
│         └── YES → Proceed with data minimization + retention policy.
└── NO  → Continue.

Are you respecting robots.txt and reasonable rate limits?
├── NO  → Fix this. Ignoring both signals bad faith and risks infrastructure harm claims.
└── YES → You are in the safest zone available. Document your approach and proceed.
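If you want the flowchart as a pre-flight check in a project, one possible encoding is below. The `Target` fields and the `assess` function are hypothetical names introduced for illustration, and the answers still have to come from a human reading the site's ToS and thinking about the data; the code only keeps the questions from being skipped.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Target:
    """Answers to the four flowchart questions for one scraping target."""
    public: bool
    tos_prohibits_scraping: bool
    has_personal_data: bool
    has_lawful_basis: bool
    respects_robots_and_rate_limits: bool


def assess(target: Target) -> list:
    """Walk the flowchart top to bottom and collect the resulting notes."""
    if not target.public:
        return ["High legal risk: consult a lawyer before proceeding."]

    notes = []
    if target.tos_prohibits_scraping:
        notes.append("Breach-of-contract risk: weigh need vs. exposure and document your reasoning.")
    if target.has_personal_data:
        if not target.has_lawful_basis:
            # Hard stop: no lawful basis means no personal data.
            return notes + ["Do not scrape personal data."]
        notes.append("Apply data minimization and a retention policy.")
    if not target.respects_robots_and_rate_limits:
        notes.append("Fix crawl behaviour: respect robots.txt and rate limits.")
    if not notes:
        notes.append("Safest zone available: document your approach and proceed.")
    return notes


# Example: a public catalogue with no ToS ban and no personal data.
catalogue = Target(
    public=True,
    tos_prohibits_scraping=False,
    has_personal_data=False,
    has_lawful_basis=False,
    respects_robots_and_rate_limits=True,
)
result = assess(catalogue)
```

Storing the returned notes alongside the date and the ToS version you reviewed doubles as part of the paper trail recommended in the guidelines.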
Practical Guidelines for Legal Scraping
1. Stick to public data. Never log in to scrape unless you have explicit written permission from the platform.
2. Read the ToS. Not to follow every clause blindly, but to understand the risk profile before you build a business on it.
3. Respect robots.txt. Not legally required, but it reduces bad-faith arguments.
4. Rate-limit your requests. Scraping so aggressively that you degrade a site's performance can trigger claims of interference or trespass to chattels. Keep your crawl polite.
5. Do not collect personal data without a plan. If you scrape names, emails, or profiles, you need a GDPR legal basis, a privacy policy, and a data retention schedule before you store a single row.
6. Keep records. Document when you scraped, what you scraped, and why. If you are ever challenged, your paper trail shows good faith.
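7. Do not mistake infrastructure for compliance. Rotating proxies and browser fingerprint management can help you avoid IP bans, but they do not change the legal analysis, and using them to get around a block a site has deliberately placed on you can be cited as evidence of bad faith.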
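Guidelines 4 and 6 (rate limiting and record keeping) are the easiest to turn into code. The sketch below is one possible shape, not a standard library or framework API: `PoliteScheduler`, its method names, and the log fields are hypothetical, and the clock and sleep functions are injectable only so the behaviour can be tested without real waiting.

```python
import time
from datetime import datetime, timezone


class PoliteScheduler:
    """Enforces a minimum delay between requests and keeps an audit trail."""

    def __init__(self, min_delay_seconds, clock=time.monotonic, sleep=time.sleep):
        self.min_delay = min_delay_seconds
        self._clock = clock
        self._sleep = sleep
        self._last_request = None  # monotonic timestamp of the previous request
        self.audit_log = []

    def wait_turn(self):
        """Sleep just long enough to honour min_delay; return seconds slept."""
        waited = 0.0
        if self._last_request is not None:
            elapsed = self._clock() - self._last_request
            remaining = self.min_delay - elapsed
            if remaining > 0:
                self._sleep(remaining)
                waited = remaining
        self._last_request = self._clock()
        return waited

    def record(self, url, purpose):
        """Append a when/what/why entry to the audit log and return it."""
        entry = {
            "url": url,
            "purpose": purpose,
            "scraped_at": datetime.now(timezone.utc).isoformat(),
        }
        self.audit_log.append(entry)
        return entry
```

In a crawl loop you would call `wait_turn()` before each request and `record(url, purpose)` after it, then persist `audit_log` (for example as JSON lines) so the record of when, what, and why survives the run.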
Tools That Handle the Infrastructure
Managing scraping at scale — rotating proxies, handling CAPTCHAs, avoiding blocks — is its own engineering problem. These services are built for it:
- ScraperAPI — Handles proxy rotation and CAPTCHA solving automatically.
- Scrape.do — Headless browser rendering with built-in residential proxies.
- ScrapeOps — Proxy aggregator and Scrapy monitoring platform. Great for production pipelines. Check out their scraping tools and proxy comparison guides for in-depth benchmarks.
These tools do not make illegal scraping legal — but they make legal scraping significantly more reliable.
Final Thought
The law in 2026 is more developer-friendly than it was in 2019. The hiQ and Van Buren decisions shifted the CFAA away from being a general-purpose weapon against scrapers. Scraping public data, with care and documentation, carries relatively low CFAA risk in the United States, although contract and privacy claims remain available to site owners.
The real risks now are GDPR (scraping personal data without a legal basis), breach of contract (scraping behind authentication after agreeing to a ToS that prohibits it), and infrastructure harm (aggressive scraping that degrades site performance).
Know those lines. Stay on the right side of them. Build your scraper accordingly.
Tags: #webscraping #legal #python #tutorial