What is web scraping and how does it work?

#python #javascript #webdev #security

What is web scraping?

Web scraping is a way to extract specific data from the large amount of data on a website and export it in formats such as JSON, CSV, or Excel sheets, depending on the application or framework we use. The purpose is usually to analyze that data to draw conclusions and make comparisons.

How does web scraping work?

First, the web scraper takes the URL of one or more websites.
Then the scraper loads the HTML page; an advanced scraper will render the entire page, including CSS and JavaScript.
Next, the scraper extracts all the page data, or only the specific elements we need.
Finally, it exports the data as CSV, Excel, JSON, or any other supported format.
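The four steps above can be sketched in Python using only the standard library. This is a minimal sketch, assuming a hypothetical page where each product is a `<div class="product">` containing `name` and `price` spans. The HTML is hard-coded instead of downloaded (step 1), and nothing is rendered (no JavaScript, unlike the advanced case in step 2). `ProductParser`, `scrape`, `to_json`, and `to_csv` are illustrative names, not part of any scraping framework.

```python
import csv
import io
import json
from html.parser import HTMLParser

# Hypothetical sample page. In a real scraper this HTML would be downloaded,
# e.g. with urllib.request.urlopen(url).read().decode().
SAMPLE_HTML = """
<html><body>
  <div class="product"><span class="name">Keyboard</span><span class="price">49.99</span></div>
  <div class="product"><span class="name">Mouse</span><span class="price">19.99</span></div>
</body></html>
"""


class ProductParser(HTMLParser):
    """Collects the text of name/price spans inside each product div."""

    def __init__(self):
        super().__init__()
        self.rows = []      # one dict per product
        self._field = None  # field whose text we are currently reading

    def handle_starttag(self, tag, attrs):
        classes = (dict(attrs).get("class") or "").split()
        if tag == "div" and "product" in classes:
            self.rows.append({})  # a new product starts here
        elif tag == "span" and self.rows:
            for field in ("name", "price"):
                if field in classes:
                    self._field = field

    def handle_endtag(self, tag):
        if tag == "span":
            self._field = None

    def handle_data(self, data):
        # Only keep text that appears inside a name or price span.
        if self._field:
            row = self.rows[-1]
            row[self._field] = row.get(self._field, "") + data


def scrape(html):
    """Step 3: extract the elements we need from the loaded HTML."""
    parser = ProductParser()
    parser.feed(html)
    parser.close()
    return [{key: value.strip() for key, value in row.items()} for row in parser.rows]


def to_json(rows):
    """Step 4: export as JSON text."""
    return json.dumps(rows, indent=2)


def to_csv(rows):
    """Step 4: export as CSV text with a header row."""
    buffer = io.StringIO()
    writer = csv.DictWriter(buffer, fieldnames=["name", "price"])
    writer.writeheader()
    writer.writerows(rows)
    return buffer.getvalue()


if __name__ == "__main__":
    products = scrape(SAMPLE_HTML)
    print(to_json(products))
    print(to_csv(products))
```

In a real project, `SAMPLE_HTML` would be replaced by the downloaded page, and a framework such as Scrapy can take care of fetching, selecting, and exporting for you.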

What are the uses of web scraping?

Scraping data from websites to generate leads
Scraping product data from sites like Amazon for competitor analysis
Scraping product details for comparison shopping
Scraping financial data for market insights and research
Scraping job websites to find the most appropriate positions for clients
There are many other uses; it depends on the person using it.

What do I need as a programmer to learn it?

Basic knowledge of a programming language such as Python or JavaScript
Basic knowledge of a scraping framework; for Python, some examples are Scrapy, PySpider, and Selenium
Basic HTML knowledge, to know the types of elements on the target website you want to scrape
Basic knowledge of CSS selectors or XPath, to select HTML elements on the website with the framework's tools
(Optional) Basic knowledge of regular expressions, to search for patterns in the page's HTML
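To show what "selecting" and "regular expressions" look like in practice, here is a small Python sketch using only the standard library. It assumes a hypothetical, well-formed job list. `xml.etree.ElementTree` supports only a limited subset of XPath and only parses well-formed XML, so real-world HTML usually calls for a more tolerant parser (for example lxml) or a framework's own selectors. The function names are illustrative.

```python
import re
import xml.etree.ElementTree as ET

# Hypothetical, well-formed markup for a job listing page.
SNIPPET = """
<ul id="jobs">
  <li class="job"><a href="/jobs/101">Python Developer</a></li>
  <li class="job"><a href="/jobs/102">Data Analyst</a></li>
</ul>
"""


def titles_with_xpath(markup):
    """Select link text with ElementTree's limited XPath support."""
    root = ET.fromstring(markup.strip())
    # Any <li> whose class attribute is exactly "job", then its <a> child.
    # Note: unlike the CSS selector ".job", this compares the whole attribute value.
    return [link.text for link in root.findall(".//li[@class='job']/a")]


# Regular expression that captures the numeric ID in each job link.
JOB_ID = re.compile(r'href="/jobs/(\d+)"')


def job_ids_with_regex(markup):
    """Return the captured job IDs as strings, in document order."""
    return JOB_ID.findall(markup)


if __name__ == "__main__":
    print(titles_with_xpath(SNIPPET))
    print(job_ids_with_regex(SNIPPET))
```

Selectors are usually the more robust choice for picking elements, while regular expressions are handy for pulling patterns (IDs, prices, dates) out of text you have already selected.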

Conclusion:

In the end, web scraping is an important topic and easy to learn: with some basic knowledge, you can begin working in this niche.

Top comments (6)

ADS-BNE • Sep 4 '20

So ultimately you're using CSS classes and IDs to pull the data from an HTML element and save it?

What happens with React-generated elements that don't have consistent CSS classes?

Florian Rappl • Sep 4 '20

You can find any element via some selector. The only difference is the robustness of the solution. There is, however, no fully robust solution, as everything (the DOM hierarchy, the CSS classes, and the IDs used) may be changed by the site owner.

Just open your dev tools, click on a DOM node in the Elements tab, and choose "Copy selector".