Start Web Scraping with NodeJs

Grohs Fabian ・6 min read

Hey there,

Today we're gonna get started with web scraping in NodeJs, using some cool and simple examples.

Let's get started

Introduction

I'm not going to bore you with a scientific, technical explanation, so I'm gonna give you a simple example instead.

Let's say you want to get information from an Instagram profile: followers, followings, uploads, description and other details that may not be available through an API, or you may not have access to that API.

This is the case where you turn to Web Scraping.

💻 Tools we're gonna use

Here are the tools that I am going to use for this example. They are great tools for getting started:

  • Request - Peer dependency for request-promise

  • Request-Promise - Makes the requests and gets the contents of the website you want to scrape.

  • Cheerio - Probably the most used library for parsing HTML content in NodeJs, with a jQuery-like syntax

  • Nothing else. Yes, that's right!

Getting started

I will assume that you already have Node.js installed on your laptop or PC, and if not, what are you waiting for? 🔥

Now, we need to make sure that you have a new project ready for the code.

You can easily initialize one in a new empty folder with npm.

npm init

After completing these steps, install the libraries that we're gonna use by running the following lines (while inside the same new project folder):

npm install cheerio --save
npm install --save request
npm install request-promise --save

What are we scraping? 🤔

For this example I am going to use this community website, dev.to, because I want to make this unique and directly dedicated to all you people 😋

We're gonna scrape the basic details of any dev.to member page.

Mentions

I very much want to mention that if you still scrape the web with callbacks or chained promises, this is going to be a nice refresher for you, because we are going to use async/await syntax.

I also post a lot of content like this on my Scraping Blog, including a nice article on Scraping Instagram Profile Data with NodeJs 💻

Let's Code 👨‍💻👩‍💻

Let's get right to it. I don't like to waste time talking nonsense without actually showing some code and results.

1. Initial request and parsing

The first phase is pretty straightforward. We need to send a request to the dev.to website, just like a normal browser would, and get its HTML content.

Here's what you can do:

const request = require('request-promise');
const cheerio = require('cheerio');

const BASE_URL = 'https://dev.to/';
const USERNAME = 'grohsfabian';

(async () => {

  /* Send the request to the user page and get the results */
  let response = await request(`${BASE_URL}${USERNAME}`);

  /* Start processing the response */
  let $ = cheerio.load(response);

  /* Parse details from the html with query selectors */
  let fullName = $('span[itemprop="name"]').text();

  console.log({ fullName });

})();

I really do think that this code is pretty self-explanatory, even for someone who knows little or nothing about scraping.

This example shows how easily you can get someone's full name from their dev.to profile page.
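If the username ever comes from user input instead of a constant, it may be safer to build the profile URL with Node's built-in URL class than with plain string concatenation. Here is a minimal sketch; `buildProfileUrl` is a hypothetical helper of mine and not part of the script above.

```javascript
// Hypothetical helper (an assumption, not from the original script).
// Builds the profile URL from a base URL and a username.
function buildProfileUrl(baseUrl, username) {
  // encodeURIComponent escapes slashes, spaces and similar characters,
  // so a strange username cannot change the path that gets requested.
  return new URL(encodeURIComponent(username), baseUrl).toString();
}

console.log(buildProfileUrl('https://dev.to/', 'grohsfabian'));
```

The returned string can then be passed to `request(...)` in place of the template string.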

Pretty cool? Let's move further 👁

2. Getting more data

Now that we have a base to start from, we continue doing the same thing for the other profile data that we want to get.

Again, because we are using Cheerio to parse the HTML, we can use the jQuery-style selectors that Cheerio supports.

So, this means that you should have at least some basic knowledge of CSS selectors, which is the syntax both Cheerio and jQuery use.

So, before going any further, I want to break down the selector that we are using to get the full name of the profile.

span[itemprop="name"]

This tells Cheerio to look for a span element that has an itemprop attribute whose value is exactly "name".

We are going to use the same structure and logic for the other selectors 💻.

Let's create.

I've made a few more selectors to parse more data from the profile, and here they are 🔥

let description = $('span[itemprop="description"]').text();
let profilePictureUrl = $('img[class="profile-pic"]').attr('src');

And this is just the start. These are some simple examples that are pretty easy to get and don't require much thinking.

Going a bit deeper.

Here is some interesting information that could be a little more challenging for a beginner to get, but is still a nice exercise: the extra details in the profile's metadata block.

These details may or may not be present. People can choose whether to make their email public, for example. Still, we want to be able to scrape whatever is there.

Here's what I'm gonna do..

  /* Get extra properties from the profile */
  let details = {};

  $('div[class="user-metadata-details-inner"] > div[class="row"]').each((i, elm) => {

    let key = $(elm).find('div[class="key"]').text().trim();
    let value = $(elm).find('div[class="value"]').text().trim();

    details[key] = value;
  });

This piece of code iterates over all the available properties of the profile, which include things like joined date, email (if available), location (if available), etc.
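Because some of these properties are optional, whatever reads the details object later should not assume that a key exists. Below is a minimal sketch of one way to handle that. The sample object, its keys and values, and the `getDetail` helper are all assumptions of mine, since the real keys depend on the page markup.

```javascript
// Hypothetical sample of what `details` might contain after the loop;
// the keys and values here are placeholders, not real scraped data.
const details = { joined: 'Jan 1, 2020', location: 'Somewhere' };

// Hypothetical helper: return the value for a key, or a fallback
// when the profile does not expose that property.
function getDetail(obj, key, fallback = null) {
  return Object.prototype.hasOwnProperty.call(obj, key) ? obj[key] : fallback;
}

console.log(getDetail(details, 'joined'));
console.log(getDetail(details, 'email', 'not public'));
```

Using hasOwnProperty instead of a plain truthiness check keeps an empty string value distinct from a missing key.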

Getting another round of details

We're not stopping here. I'm going even deeper to get all the social links available on the person's page.

I'm gonna use a technique similar to the one above, and here is what it's going to look like:

  /* Get socials from the profile */
  let socials = [];
  $('p[class="social"] > a').each((i, elm) => {

    let url = $(elm).attr('href');

    socials.push(url);
  });

In this code I'm basically iterating over each of the links inside the element that holds the social icon buttons and storing their URLs in an array.
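Raw href values can be messy: `.attr('href')` returns undefined when the attribute is missing, and a link may be relative or repeated. The sketch below shows one possible cleanup step using only Node built-ins. `cleanLinks` and the sample array are hypothetical and not part of the original script.

```javascript
// Hypothetical helper: drop empty values, resolve relative links
// against a base URL, and remove duplicates.
function cleanLinks(hrefs, baseUrl) {
  const seen = new Set();
  for (const href of hrefs) {
    if (!href) continue; // missing href attribute
    try {
      seen.add(new URL(href, baseUrl).toString());
    } catch (err) {
      // Skip values that cannot be parsed as a URL.
    }
  }
  return [...seen];
}

// Made-up input for illustration.
const sample = [
  'https://twitter.com/example',
  undefined,
  '/example',
  'https://twitter.com/example'
];

const cleaned = cleanLinks(sample, 'https://dev.to/');
console.log(cleaned);
```

With the scraper above, you would call it as `cleanLinks(socials, BASE_URL)`.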

3. Finishing it

Of course, a lot more data can be scraped depending on your needs, but I think you get the point now.

Scraping is a nice skill to have, and once you know the basics it opens up your imagination to what you can do 🔥

Full Code

TL;DR: Here's everything you need if you didn't want to read the article 😅

const request = require('request-promise');
const cheerio = require('cheerio');

const BASE_URL = 'https://dev.to/';
const USERNAME = 'peter';

(async () => {

  /* Send the request to the user page and get the results */
  let response = await request(`${BASE_URL}${USERNAME}`);

  /* Start processing the response */
  let $ = cheerio.load(response, { normalizeWhitespace: true });

  /* Parse details from the html */
  let fullName = $('span[itemprop="name"]').text();
  let description = $('span[itemprop="description"]').text();
  let profilePictureUrl = $('img[class="profile-pic"]').attr('src');

  /* Get extra properties from the profile */
  let details = {};

  $('div[class="user-metadata-details-inner"] > div[class="row"]').each((i, elm) => {

    let key = $(elm).find('div[class="key"]').text().trim();
    let value = $(elm).find('div[class="value"]').text().trim();

    details[key] = value;
  });

  /* Get socials from the profile */
  let socials = [];
  $('p[class="social"] > a').each((i, elm) => {

    let url = $(elm).attr('href');

    socials.push(url);
  });

  console.log({
    fullName,
    profilePictureUrl,
    description,
    details,
    socials
  });

})();

Running this code prints an object with the profile's fullName, profilePictureUrl, description, details and socials.

But please do NOT use this code for malicious intent and spamming!

The Plug

Here comes the plug, people..

I have recently launched my new blog dedicated to helping you learn more about scraping with NodeJs, and it has some good, in-depth articles like this one.

Make sure to check it out, I'm sure you will like it -> LearnScraping with NodeJs.

If you really like this kind of stuff, I also have a great 5-star, best-selling course on Udemy. Also, I have a secret coupon for all the dev.to members:

Learn Web Scraping with NodeJs - The Crash Course

Ask me anything and please let me know what you thought about the article 🔥
