Making the first ever web scraper

This is a hands-on, to-the-point tutorial to get you started with web scraping. Before starting the project, we have to install two required libraries.

Let's get started.

Beautiful Soup
The Beautiful Soup package is the heart of web scraping and of everything we are about to do. In short, it's a Python package for parsing HTML and XML documents. Install it with: pip install beautifulsoup4

Requests
Requests is a Python HTTP library that makes sending HTTP requests simple and convenient. Install it with: pip install requests

Now we have installed all the ingredients we need to cook our first web scraper. We will use the Newegg website and scrape the prices of the graphics cards listed there.

Step 1
Hop on to your favourite editor and follow my lead.

Let's first import our libraries that we just pip-installed.
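A minimal sketch of the imports, assuming both packages were installed with pip as `requests` and `beautifulsoup4`. Note that Beautiful Soup installs under the name `beautifulsoup4` but is imported as `bs4`.

```python
# Requests downloads the page; Beautiful Soup parses the HTML it returns.
import requests
from bs4 import BeautifulSoup
```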

Step 2
After importing the libraries, we have to make an HTTP request to the Newegg website to fetch the page we want to scrape. We will use the Requests library to make the request. Once the request succeeds, we will parse the returned HTML with Beautiful Soup.
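Here is one way this step could look. Treat it as a sketch: the URL is a placeholder chosen for illustration (not necessarily the page this tutorial links to), and `fetch_soup` and the `User-Agent` value are hypothetical names and settings, not something the site requires by contract.

```python
import requests
from bs4 import BeautifulSoup

# Placeholder search URL: an assumption for illustration only.
URL = "https://www.newegg.com/p/pl?d=graphics+card"


def fetch_soup(url):
    """Download a page and return it parsed as a BeautifulSoup object."""
    # Some sites reject the default client string, so send a browser-like one.
    headers = {"User-Agent": "Mozilla/5.0"}
    response = requests.get(url, headers=headers, timeout=10)
    # Raise an exception on 4xx/5xx responses instead of parsing an error page.
    response.raise_for_status()
    # "html.parser" is the parser bundled with Python, so nothing extra to install.
    return BeautifulSoup(response.text, "html.parser")
```

Calling `fetch_soup(URL)` gives you the parsed page, and `soup.title` is a quick way to check that you received the page you expected.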

Step 3
Phew, we are halfway there.

Okay, now we have to do a bit of an exercise. Navigate to this link and really explore the page. Let the information sink in.

Now, open your browser's developer tools (available in both Chrome and Firefox) and look through the code, even if you don't understand all of it. Try to find where the prices are hidden among these layers of markup. Sorry for sounding gloomy, but as a web scraper you have to develop this intuition.

Once you are done experimenting and playing around, let's dig into the website and do what we came to do. We will use Beautiful Soup to find the price in our parsed page, and we will repeat this step to extract the prices of all the graphics cards on the page.
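The sketch below shows the idea on a small stand-in snippet instead of the live page, so it runs without a network connection. The tag and class names (`li`, `price-current`) are assumptions about the page's markup; confirm the real ones in the developer tools, since sites change their HTML often.

```python
from bs4 import BeautifulSoup

# Stand-in markup for illustration; the real page is far larger.
SAMPLE_HTML = """
<div class="item-container">
  <li class="price-current">
    $<strong>299</strong><sup>.99</sup>
  </li>
</div>
<div class="item-container">
  <li class="price-current">
    $<strong>1,149</strong><sup>.00</sup>
  </li>
</div>
"""

soup = BeautifulSoup(SAMPLE_HTML, "html.parser")

# find() returns only the first matching tag (or None if nothing matches).
first_price = soup.find("li", class_="price-current")
print(first_price.get_text())

# find_all() returns every matching tag, so one loop covers the whole page.
all_prices = [tag.get_text() for tag in soup.find_all("li", class_="price-current")]
print(all_prices)
```

The printed text still carries the line breaks and spaces from the markup, which is why the output can look untidy at this stage.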

Congratulations! You just made your first ever web scraper.

Here is the full code:
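A minimal sketch of how the steps fit together, under the same assumptions as before: the URL is a placeholder, the `price-current` class name is a guess at the page's markup, and `extract_prices` and `main` are hypothetical helper names.

```python
import requests
from bs4 import BeautifulSoup

# Placeholder search URL: an assumption for illustration only.
URL = "https://www.newegg.com/p/pl?d=graphics+card"


def extract_prices(html):
    """Return the raw text of every price element found in the HTML."""
    soup = BeautifulSoup(html, "html.parser")
    # Assumed markup: each price sits in <li class="price-current">.
    return [tag.get_text() for tag in soup.find_all("li", class_="price-current")]


def main():
    # Step 2: request the page and stop early on an HTTP error.
    response = requests.get(URL, headers={"User-Agent": "Mozilla/5.0"}, timeout=10)
    response.raise_for_status()
    # Step 3: pull out the prices and print them one per line.
    for price in extract_prices(response.text):
        print(price)
```

Calling `main()` runs the scraper against the live site. Keeping the parsing in `extract_prices` means you can also try it on a saved copy of the page without making a request.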

Note: Your output may not be clean, but you got the prices. Try cleaning them up and removing the unnecessary spaces. Good luck!
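If you want a hint, here is one possible approach using only the standard library. `clean_price` is a hypothetical helper, and it assumes prices are written with a comma as the thousands separator and a dot before the cents.

```python
import re
from decimal import Decimal
from typing import Optional


def clean_price(raw: str) -> Optional[Decimal]:
    """Convert scraped price text to a Decimal, or None if no number is found."""
    # Grab the first run of digits, allowing commas and an optional decimal part.
    match = re.search(r"\d[\d,]*(?:\.\d+)?", raw)
    if match is None:
        return None
    # Drop the thousands separators before converting.
    return Decimal(match.group().replace(",", ""))
```

For example, `clean_price(" $1,149.00\n")` gives `Decimal('1149.00')`, and text with no digits gives `None`. Using `Decimal` instead of `float` keeps the cents exact.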
