Ahmed Atef

What is web scraping and how does it work?

What is web scraping?

Web scraping is a way to extract selected data from the large amount of content on a website and export it in formats such as JSON, CSV, or Excel sheets, depending on the application or framework in use. The goal is to analyze that data and draw conclusions and comparisons from it.

How does web scraping work?

  • First, the scraper takes one or more website URLs.
  • Then the scraper loads the HTML page; a more advanced scraper renders the entire page, including CSS and JavaScript.
  • Next, the scraper extracts either all of the page data or only the specific elements we need.
  • Finally, it exports the data as CSV, Excel, JSON, or any other supported format.
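
The four steps above can be sketched in Python using only the standard library. This is a minimal illustration under stated assumptions, not a production scraper: the names `fetch_html`, `ProductParser`, `extract_products`, `export_csv`, and `export_json`, the sample HTML, and the `product`, `name`, and `price` class names are all made up for this example. It loads raw HTML only and does not render JavaScript.

```python
import csv
import io
import json
from html.parser import HTMLParser
from urllib.request import urlopen


def fetch_html(url: str) -> str:
    """Steps 1 and 2: take a URL and load the raw HTML (no JavaScript rendering)."""
    with urlopen(url, timeout=10) as response:
        charset = response.headers.get_content_charset() or "utf-8"
        return response.read().decode(charset, errors="replace")


class ProductParser(HTMLParser):
    """Step 3: collect the text of <span class="name"> and <span class="price">
    inside each <li class="product"> (hypothetical markup)."""

    FIELDS = ("name", "price")

    def __init__(self):
        super().__init__()
        self.rows = []
        self._current = None  # the row currently being filled, if any
        self._field = None    # the field whose text we are currently inside

    def handle_starttag(self, tag, attrs):
        classes = (dict(attrs).get("class") or "").split()
        if tag == "li" and "product" in classes:
            self._current = {}
        elif tag == "span" and self._current is not None:
            for field in self.FIELDS:
                if field in classes:
                    self._field = field

    def handle_data(self, data):
        if self._field and self._current is not None:
            previous = self._current.get(self._field, "")
            self._current[self._field] = previous + data.strip()

    def handle_endtag(self, tag):
        if tag == "span":
            self._field = None
        elif tag == "li" and self._current is not None:
            self.rows.append(self._current)
            self._current = None


def extract_products(html: str) -> list:
    parser = ProductParser()
    parser.feed(html)
    parser.close()
    return parser.rows


def export_csv(rows: list) -> str:
    """Step 4: export the extracted rows as CSV text."""
    buffer = io.StringIO()
    writer = csv.DictWriter(buffer, fieldnames=["name", "price"], lineterminator="\n")
    writer.writeheader()
    writer.writerows(rows)
    return buffer.getvalue()


def export_json(rows: list) -> str:
    """Step 4 (alternative): export the same rows as JSON text."""
    return json.dumps(rows, indent=2)


# Sample page used instead of a live request, so the sketch runs offline.
SAMPLE_HTML = """
<ul>
  <li class="product"><span class="name">Keyboard</span> <span class="price">49.90</span></li>
  <li class="product"><span class="name">Mouse</span> <span class="price">19.50</span></li>
</ul>
"""

if __name__ == "__main__":
    products = extract_products(SAMPLE_HTML)
    print(export_csv(products))
    print(export_json(products))
```

To scrape a real page, you would pass the result of `fetch_html(some_url)` to `extract_products` and adjust the tag and class names to match that site's markup. A dedicated framework handles the same steps with far less code and copes better with messy HTML.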

What are the uses of web scraping?

  • Scraping data from websites to generate leads
  • Scraping product data from sites like Amazon for competitor analysis
  • Scraping product details for comparison shopping
  • Scraping financial data for market insights and research
  • Scraping job websites to find the most appropriate jobs for clients
  • There are many other uses, depending on the needs of the person doing the scraping

What do I need as a programmer to learn it?

  • Basic knowledge of a programming language such as Python or JavaScript
  • Basic knowledge of a scraping framework or tool; some examples for Python are Scrapy, PySpider, and Selenium
  • Basic HTML knowledge, to recognize the types of elements on the target website that you want to scrape
  • Basic knowledge of CSS selectors or XPath (an XML path language), which the framework's tools use to select HTML elements on the website
  • (Optional) Basic knowledge of regular expressions, to search for HTML elements on the website
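
To make the last two points concrete, here is a small sketch of selecting elements with an XPath-style expression and with a regular expression, using only Python's standard library. The sample markup and the helper names `job_titles` and `all_links` are assumptions for illustration. Note that `xml.etree.ElementTree` supports only a limited subset of XPath and requires well-formed markup, so real-world HTML usually calls for the more tolerant selectors that scraping frameworks provide.

```python
import re
import xml.etree.ElementTree as ET

# Hypothetical, well-formed snippet standing in for a scraped page.
SAMPLE = """
<div id="jobs">
  <ul>
    <li class="job"><a href="/jobs/1">Backend Developer</a></li>
    <li class="job"><a href="/jobs/2">Data Analyst</a></li>
    <li class="ad"><a href="/promo">Sponsored</a></li>
  </ul>
</div>
"""


def job_titles(markup: str) -> list:
    """Select the link text of every <li class="job"> with an XPath-style path."""
    root = ET.fromstring(markup.strip())
    return [link.text for link in root.findall(".//li[@class='job']/a")]


def all_links(markup: str) -> list:
    """Pull every double-quoted href value out of the markup with a regex."""
    return re.findall(r'href="([^"]+)"', markup)


if __name__ == "__main__":
    print(job_titles(SAMPLE))
    print(all_links(SAMPLE))
```

The path expression relies on the page structure (only `li` elements with class `job`), while the regular expression matches raw text and so also picks up the sponsored link. That difference is one reason regular expressions are listed as optional: they are handy for simple patterns but know nothing about the document structure.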

Conclusion:

In the end, web scraping is an important topic and easy to learn; with some basic knowledge, you can begin to work in this niche.

Top comments (7)

AdamDSherman

So ultimately you're using CSS classes and IDs to pull the data from an HTML element and save it?

What happens with React-generated elements that don't have consistent CSS classes?

Florian Rappl

You can find any element via some selector. The only difference is the robustness of the solution. There is, however, no fully robust solution, as everything (the DOM hierarchy, the CSS classes, and the IDs used) may be changed by the site owner.

Just open your dev tools, click in the elements tab on some DOM node and select "copy selector".

stalha97

I have been seeing "Xpath" everywhere. Can it not be used as an absolute path? I am interested in web scraping.

P.S - Great article

Ahmed Atef (Author)

You can use CSS selectors to target HTML elements, but I advise you to use XPath for web scraping; it has a lot of advantages.

Jonyk56

👏

alex24409331

Thank you for your article.
As a newbie, I am using e-scraper.com to scrape the eCommerce data I need.
