
I built a web scraper with NodeJS

By Tom Whitaker. Originally published at tomwritescode.com.

This article was originally published on my personal blog - tomWritesCode

This post covers how I made a terminal command that goes to the VS Code marketplace page of raspberryCandy, scrapes the number of downloads it has with Cheerio, and then prints it in a styled fashion using Figlet and Chalk.


After going through a few tutorials, I built a Node app that shows me, from my terminal, the number of downloads I have had and works out how long raspberryCandy has been released for.

I went through this article from scotch.io which used NodeJS, Cheerio and request-promise.


- request-promise - Makes the HTTP request and returns a promise. It is a promise-enabled extension of the standard request package.

- Cheerio - Helps to traverse the DOM, letting us select the parts of the page we would like to extract.


Figlet is a great package for making ASCII art from text and comes with a library of different fonts to do so. Chalk colours the text of the response in the terminal, which means I could keep the raspberryCandy colours in the output.

The first part of my code displays when raspberryCandy was released and how many days it has been out, which sits below the number of downloads in the command line result. All I needed for this was the Date object built into JavaScript. The first date is the release, which I already knew and which wasn't going to change, so I put it straight in. The second is the date and time at which the script is run.


const release = new Date("February 19, 2019 11:46:11");
const current = new Date();

Following that, I made a function which calculates how many days raspberryCandy has been out. The function works in two steps. Firstly, it subtracts the release date from the current date, which gives us the amount of time between the dates in milliseconds (which isn't the most readable).


function dateDiff() {
    let difference = current - release;
}

Step two of the function takes the result in milliseconds and divides it by the number of milliseconds in a day (60 * 60 * 24 * 1000). This is wrapped in Math.round() so that it returns whole days without a decimal.


function dateDiff() {
    let difference = current - release;
    return Math.round(difference / (60 * 60 * 24 * 1000));
}
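
As a variation on the function above, here is a minimal sketch that takes both dates as arguments, which makes it easy to try with known values. The names daysBetween and MS_PER_DAY are my own and not part of the original script. It also assumes the release time is in UTC and writes it as an ISO 8601 string, since strings like "February 19, 2019 11:46:11" are parsed as local time and are not guaranteed to parse the same way on every engine.

```javascript
// Number of milliseconds in one day.
const MS_PER_DAY = 24 * 60 * 60 * 1000;

// Hypothetical helper (not from the original script): whole days between two
// dates. Math.floor counts only completed days, whereas Math.round would
// round 1.5 days up to 2.
function daysBetween(start, end) {
  return Math.floor((end.getTime() - start.getTime()) / MS_PER_DAY);
}

// Assumption: the release time is treated as UTC (note the trailing "Z").
const releaseUtc = new Date("2019-02-19T11:46:11Z");
const sampleDay = new Date("2019-03-01T11:46:11Z");

console.log(daysBetween(releaseUtc, sampleDay)); // 10
console.log(daysBetween(releaseUtc, new Date())); // days since release, as of now
```

Whether to use Math.floor or Math.round is a matter of taste: floor reports full days elapsed, round reports the nearest day.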


Now for the main party trick: scraping the marketplace page for raspberryCandy to get the number of downloads. This is where we use the request-promise package and give it the target URL, which in this case is the page on the Visual Studio Marketplace.

Inspecting the page prior to building this I found that the piece I was after was a span with the class name 'downloads-text'. Using Cheerio I can target the span with the right class name from the HTML document and then return it as plain text. Below I have laid out the basic structure of what it is doing.


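const rp = require("request-promise");
const cheerio = require("cheerio");

// URL holds the address of the raspberryCandy marketplace page.
rp(URL).then(function (html) {
  // Select the span by class name and return its contents as plain text.
  return cheerio("span.downloads-text", html).text();
});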
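
The fetch-then-select step can also be wrapped in a small function that checks for an empty result and reports failures. This is only a sketch: scrapeText, fakeFetch and fakeLoad are hypothetical names of mine, and the fetcher and loader are passed in as arguments so the example runs without a network call or installed packages. The value in the stand-in HTML is made up for illustration.

```javascript
// Hypothetical helper (not from the original script). `fetchHtml` is any
// function that takes a URL and returns a promise of an HTML string (rp from
// request-promise fits), and `load` is a function like cheerio.load.
function scrapeText(url, selector, fetchHtml, load) {
  return fetchHtml(url).then(function (html) {
    const $ = load(html);
    const text = $(selector).text().trim();
    if (text === "") {
      // The selector matched nothing, e.g. because the page markup changed.
      throw new Error("No text found for selector: " + selector);
    }
    return text;
  });
}

// Stand-ins for request-promise and cheerio.load, with made-up content.
const fakeFetch = function () {
  return Promise.resolve('<span class="downloads-text"> 1234 </span>');
};
const fakeLoad = function () {
  return function (selector) {
    return {
      text: function () {
        return selector === "span.downloads-text" ? " 1234 " : "";
      }
    };
  };
};

scrapeText("https://example.invalid", "span.downloads-text", fakeFetch, fakeLoad)
  .then(function (text) {
    console.log(text);
  })
  .catch(function (err) {
    console.error("Scrape failed:", err.message);
  });
```

With the real packages installed, the same function would be called as scrapeText(URL, "span.downloads-text", rp, cheerio.load), where rp is request-promise.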

Now that we have the heavy lifting out of the way, the structure comes into play. In my example I have wrapped the contents of each console.log() call with the Chalk package, which lets me set the colour. In this case I am going with the purple and aqua colours of raspberryCandy.


console.log(
  chalk.hex("#e592faff").bold(" WOW! raspberryCandy has:")
);

The other extra piece is Figlet, which gives me the ASCII art. It wraps around the Cheerio call and takes options for which font to use, as well as the horizontal and vertical layouts.


console.log(
  chalk.hex("#00feff")(
    figlet.textSync($("span.downloads-text", html).text(), {
      font: "Big",
      horizontalLayout: "default",
      verticalLayout: "default"
    })
  )
);

Now that it's all pieced together, the last thing left to do is to link it to my terminal as a single command, rather than having to go to the file system and run the JS file. To do this, we add a line to the top of the file which will let us add our command to the package.json.


#!/usr/bin/env node

This shebang line tells the shell to run the file with Node, so the script can be triggered as a command. It is also what makes the next addition to the package.json work. Inside the package.json I have added:


"bin": {
  "raspberry": "./raspberryScraper.js"
},

The final step is to run npm link in the terminal from inside the project folder. This registers the bin entry, so typing "raspberry" runs "./raspberryScraper.js". It works like any other shorthand command, such as npm run start or gatsby develop.


Links

GitHub: tomWritesCode / raspberryScraper

NodeJS web scraper to show how many downloads my raspberryCandy VS Code theme has in the terminal.

After building my VS Code theme raspberryCandy I wanted an easier way of checking how many downloads I have and was also curious about how to use NodeJS as a web scraper.

It uses Cheerio and request-promise, as well as Figlet and Chalk for styling the terminal.

- Scotch.io article I got most of the resources from.

- request-promise GitHub page

- Cheerio GitHub page

- Figlet NPM page
