Nitin Reddy

Posted on Apr 11, 2020

Web-scraping with NodeJS

#node #webscraping #programming #rottentomatoes

Today we will learn how to scrape web pages with NodeJS and a few supporting packages.
We will fetch the HTML from a URL with a GET request, extract the data we need, and store it in a CSV file.

The codebase is available at Node-WEbScrap

Tools and prerequisites:

NodeJS
NPM packages
   1. request-promise - Makes HTTP requests to the source URI and returns the response as a promise
   2. cheerio - Loads and parses the HTML markup
   3. json2csv - Converts JSON data to CSV format
Basic knowledge of JavaScript

Let's get started with the project

Create a NodeJS project

   $ mkdir node-webscrap
   $ cd node-webscrap
   $ npm init
   $ npm install request-promise request cheerio json2csv

Create an index.js file in the root directory of your project

   $ touch index.js

Import the required modules at the top of index.js

    const request = require("request-promise")
    const cheerio = require("cheerio")
    const fs = require("fs")
    const json2csv = require("json2csv").Parser;

Next, create an array of movie page URLs. I have used Rotten Tomatoes movie pages here

   const movies = [
     "https://www.rottentomatoes.com/m/the_last_full_measure",
     "https://www.rottentomatoes.com/m/stray_dolls"
   ];

Now create an async function that fetches each page, extracts the fields, and writes the CSV file

   const dataRepresent = async () => {
     const rottenTomatoData = [];

     // Fetch and parse the pages one at a time.
     for (const movie of movies) {
       const response = await request({
         uri: movie,
         headers: {
           "accept": "text/html,application/xhtml+xml,application/xml;q=0.9,image/webp,image/apng,*/*;q=0.8,application/signed-exchange;v=b3;q=0.9",
           // "br" is left out because request only decodes gzip and deflate responses.
           "accept-encoding": "gzip, deflate",
           "accept-language": "en-US,en;q=0.9,es;q=0.8"
         },
         gzip: true, // decompress the response body
       });

       // Load the HTML and pull out each field with jQuery-style selectors.
       const $ = cheerio.load(response);
       const title = $("h1[class='mop-ratings-wrap__title mop-ratings-wrap__title--top']").text().trim();
       const tomatoMeter = $("#tomato_meter_link > .mop-ratings-wrap__percentage").text().trim();
       const audMeter = $(".audience-score > .mop-ratings-wrap__score > .articleLink > .mop-ratings-wrap__percentage").text().trim();
       const summary = $(".mop-ratings-wrap__text").text().trim();

       rottenTomatoData.push({ title, tomatoMeter, audMeter, summary });
     }

     // Convert the collected rows to CSV and write them to disk.
     const j2cp = new json2csv();
     const csv = j2cp.parse(rottenTomatoData);
     fs.writeFileSync("./rottenTomatoes.csv", csv, "utf-8");
   };
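
The scraped scores are plain strings such as "85%", and cheerio's text() returns an empty string when a selector matches nothing (for example, if the page markup changes). A minimal sketch of one way to normalise such values before saving them; parsePercent is a hypothetical helper and is not part of the original code:

```javascript
// Hypothetical helper: convert a scraped score such as "85%" to a number.
// Returns null when the text is empty or not a percentage.
const parsePercent = (text) => {
  const match = /^(\d{1,3})\s*%$/.exec(text.trim());
  return match ? Number(match[1]) : null;
};

console.log(parsePercent(" 85% ")); // 85
console.log(parsePercent(""));      // null
```

Storing null for a missing score makes it easier to spot a broken selector than an empty CSV cell would.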

Call the function at the end of index.js

    dataRepresent();
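
Because dataRepresent is an async function, calling it returns a promise, and a failed request or file write would otherwise surface as an unhandled rejection. A minimal sketch of one way to catch that; scrape is a hypothetical stand-in for dataRepresent so that the snippet runs on its own, and run is an illustrative wrapper:

```javascript
// Hypothetical stand-in for dataRepresent(): an async function that may reject.
const scrape = async (shouldFail) => {
  if (shouldFail) {
    throw new Error("request failed");
  }
  return "done";
};

// Run an async task and report a failure instead of leaving the rejection unhandled.
const run = async (task) => {
  try {
    const result = await task();
    return { ok: true, result };
  } catch (err) {
    console.error("Scraping failed:", err.message);
    return { ok: false, error: err.message };
  }
};

run(() => scrape(false)).then((outcome) => console.log(outcome));
```

In index.js the same idea could be as short as chaining .catch(console.error) onto the call.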

After running index.js from the command line, you should see the file "rottenTomatoes.csv" generated in the project's root directory

   $ node index.js

Here we iterate over the movies array and await each request in turn, so the pages are fetched one after another. With the request-promise module we pass the uri, the headers, and options such as gzip to fetch the raw HTML. cheerio then parses that HTML and lets us pick out the values we need with jQuery-style selectors.
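
To make the difference between awaiting requests in a loop and starting them all at once concrete, here is a small sketch. fetchPage is a hypothetical stand-in that resolves with fake HTML after a short delay, so no network access is involved; the function names and URLs are illustrative only:

```javascript
// Hypothetical stand-in for an HTTP request: resolves with fake HTML after a delay.
const fetchPage = (url, delayMs = 20) =>
  new Promise((resolve) =>
    setTimeout(() => resolve(`<h1>${url}</h1>`), delayMs)
  );

// Sequential: each request starts only after the previous one has finished.
const fetchSequentially = async (urls) => {
  const pages = [];
  for (const url of urls) {
    pages.push(await fetchPage(url));
  }
  return pages;
};

// Concurrent: all requests start immediately; results keep the input order.
const fetchConcurrently = (urls) =>
  Promise.all(urls.map((url) => fetchPage(url)));

const urls = ["/m/one", "/m/two"];
fetchSequentially(urls).then((pages) => console.log("sequential:", pages));
fetchConcurrently(urls).then((pages) => console.log("concurrent:", pages));
```

For a handful of URLs the sequential loop is simple and gentle on the target site. The concurrent version finishes sooner but sends every request at the same time.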

We then push each result into the "rottenTomatoData" array, convert the array to CSV with json2csv, and write it to a file named "rottenTomatoes.csv" using the fs module that ships with NodeJS
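
To show roughly what the conversion step has to deal with, here is a hand-rolled sketch of turning an array of objects into CSV text. It is not json2csv's implementation; toCsv and escapeField are hypothetical helpers, the sample row is made up, and the file is written to the OS temp directory. The point is that fields such as the summary can contain commas and quotes, which have to be escaped:

```javascript
const fs = require("fs");
const os = require("os");
const path = require("path");

// Quote every field and double any embedded quotes.
const escapeField = (value) => `"${String(value).replace(/"/g, '""')}"`;

// Build CSV text from an array of flat objects, using the first row's keys as the header.
const toCsv = (rows) => {
  if (rows.length === 0) return "";
  const columns = Object.keys(rows[0]);
  const header = columns.map(escapeField).join(",");
  const lines = rows.map((row) =>
    columns.map((col) => escapeField(row[col] ?? "")).join(",")
  );
  return [header, ...lines].join("\n");
};

// Made-up sample row, shaped like the objects pushed into rottenTomatoData.
const sample = [
  { title: "Example Movie", tomatoMeter: "85%", audMeter: "90%", summary: 'A "bold", tense drama' },
];

const outFile = path.join(os.tmpdir(), "example-scores.csv");
fs.writeFileSync(outFile, toCsv(sample), "utf-8");
console.log(fs.readFileSync(outFile, "utf-8"));
```

In the actual script json2csv takes care of this, so the sketch only illustrates why a dedicated converter is preferable to joining strings with commas.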

That's it for today. I will be back with more learnings to share with you.

Thanks for reading! Please share this with others, and keep learning!

Top comments (3)

allnulled • Apr 11 '20

With this tool you can reuse your node and your browser js knowledge:

dev.to/allnulled/live-web-scrappin...

Nitin Reddy • Apr 11 '20

Let me try this as well.

allnulled • Apr 11 '20 • Edited

Sure. With it you can watch the scrape live, render Angular/React/Vue/xxx applications, run asynchronous operations in both the client and local environments, and of course pass data between them.

I wish I knew more about Electron... because web2os has a cool interface, but poorly got under the hood... but it has turned the de facto solution for my small scraps.