DEV Community

Discussion on: Web Scraping - A Complete Guide

cubiclesocial • Edited

You can't get in legal trouble for scraping public websites where you haven't accepted a clickwrap agreement for the Terms of Service. (Your IP might get banned by an admin or automated system for abusing web server resources, but that's a completely different issue.) Terms of Service documents are not legally binding if the data being scraped is publicly available, that is, if no account or clickwrap approval was required to obtain the data. Data is generally more like a recipe, and recipes are not protected by copyright law. Most website operators allow Googlebot to scrape their content so that the website can be indexed in search results, but in doing so Googlebot violates any Terms of Service document that claims to disallow web scraping. It's a good thing, then, that Googlebot ignores ToS documents.

As an example, imagine if I were allowed to say, "You now owe me $1,000 for the privilege of reading this message on dev.to. Go to any CubicleSoft repo on GitHub and use the Donate link to pay up." Not only is that ridiculous, but you didn't agree to it, and allowing such claims would result in the collapse of society. No sane court of law would entertain such an argument.

Similarly, a Terms of Service document on a website is not legally enforceable unless the user actually agrees to it, either by creating an account whose sign-up includes language to that effect or because every entry point to the valuable data requires agreement before the data can be accessed. Either way, a contract is formed between the user and the data provider, and contract law then takes effect. It's a subtle but important distinction. Everyone who has gotten in legal trouble to date for scraping content had formally agreed to the provider's ToS.

Whether digital clickwrap agreements like ToS documents and software EULAs should actually have the force of law under contract law is still a matter of ample debate, with very little case law to settle it.

Note that I'm not a lawyer and this isn't legal advice, but any assumption that simply accessing a website means automatically agreeing to that website's ToS is an obviously invalid argument. As with any contract, the agreement has no effect unless you sign it.