DEV Community

Discussion on: WebScraping [Part-1]

Sunil Aleti Author

The main aim of this tutorial is to help people understand what scraping is and how it works. I don't have any intention or need to scrape this data. It's just a popular website, and scraping is easy to explain using it 🙂

cubiclesocial

It's against IMDb's Terms of Service (ToS) to scrape their content. Not that their ToS has actually stopped anyone from scraping their site in the past; my response was just to point out an alternative to scraping that doesn't violate their ToS.

Scraping websites of private entities is a legal minefield. U.S. federal government websites, however, are generally legal to scrape, as content produced by the federal government is in the public domain, and they usually have data worth scraping that's more up to date than what shows up on data.gov. There are also massive multi-petabyte public datasets on Amazon S3 (e.g. commoncrawl.org/the-data/get-started/) that require a scraper toolset to properly retrieve and process, but that might be a tad more advanced than a beginner's tutorial can cover.

Anywho, just a couple of thoughts.