How to build a price scraper using JavaScript and cheerio.js

Hugo (@hugoliconv) ・4 min read

Motivation

A few months ago I decided to build a computer, and I wanted to save some money, so I started using Keepa, a Chrome extension that monitors Amazon prices and notifies you when the price of a product drops below the price you set. The problem is that I was buying the parts not only on Amazon but also on a couple of other sites.

So I decided to create something similar to Keepa using Node, Cheerio.js and React.

PC part price Scraper project

Description

A cron job runs every hour and checks the prices of the products you are tracking. If the current price is lower than the desired price, an email is sent to you. Every price is also logged so that historical price changes can be shown.
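
The flow described above can be sketched as a small pure function. This is only an illustration, not the project's actual code: recordPrice, the shape of the product object, and the priceLog field are hypothetical names, and sending the email is left to the caller.

```javascript
// Hypothetical sketch of the hourly check: log the price, then decide
// whether the user should be notified. Names and shapes are assumptions.
function recordPrice(product, currentPrice, checkedAt = new Date()) {
  // Keep every observation so historical price changes can be shown later.
  const priceLog = [...product.priceLog, { price: currentPrice, checkedAt }];
  // Notify only when the price is below the price the user asked for.
  const shouldNotify = currentPrice < product.desiredPrice;
  return { product: { ...product, priceLog }, shouldNotify };
}

const monitor = { name: 'Monitor', desiredPrice: 15000, priceLog: [] };
const first = recordPrice(monitor, 16489);
const second = recordPrice(first.product, 14999);
console.log(first.shouldNotify, second.shouldNotify, second.product.priceLog.length);
```

Returning a new object instead of mutating the product keeps the function easy to test; whether the real project stores its logs this way is not something the article states.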

Let's start

Today I will show you how to build a simple scraper similar to the one I built. I will skip a couple of things to make it easier to understand.

Note:
I will use CyberPuerta, a Mexican site that sells computer parts and one of the sites I used in the project, but you can use any site you like.

Let's imagine you want to buy a cool but expensive monitor, such as the ASUS ROG Strix XG35VQ.

The first thing you need is the URL of the page you want to get information from. In this case, it is the following:

https://www.cyberpuerta.mx/Computo-Hardware/Monitores/Monitores/Monitor-Gamer-Curvo-ASUS-ROG-Strix-XG35VQ-LED-35-Quad-HD-Ultra-Wide-FreeSync-100Hz-HDMI-Negro-Gris-Rojo.html?nosto=shop_api_home0_1

Then we need to locate the information that matters to us, which in this case is only the price, and find an HTML selector that identifies it. Open the developer tools, click the element picker icon, and then click on the price in the page.

The inspector shows this:

<span class="priceText">$ 16,489.00</span>

As I said before, we need a selector that identifies the price and nothing else. An id would be ideal because it is unique, but in this case all we have is a class, so we will identify the price that way.

Sometimes this is easy because the class turns out to be unique, as on this page. If that is not your case, you can right-click the element in the inspector and select Copy > Copy selector, which gives you this:

#productinfo > form > div.detailsInfo.clear > div:nth-child(1) > div:nth-child(2) > div > div:nth-child(4) > div.medium-7.cell.cp-pr > div > div > div.mainPrice > span.priceText

Now comes the fun part: let’s code.

Installation

We need to install three dependencies:

  • axios to make HTTP requests (npm install axios)
  • cheerio, a library similar to jQuery but for the server (npm install cheerio)
  • node-cron to run the scraper every hour (npm install --save node-cron)

First, we need the link we want to scrape, the desired price, and the selector for the price, so we create a few variables to store them.

const productPage = 'https://www.cyberpuerta.mx/Computo-Hardware/Monitores/Monitores/Monitor-Gamer-Curvo-ASUS-ROG-Strix-XG35VQ-LED-35-Quad-HD-Ultra-Wide-FreeSync-100Hz-HDMI-Negro-Gris-Rojo.html?nosto=shop_api_home0_1'
const desiredPrice = 15000
const selector = '.priceText';
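
If you track more than one product, the three variables above can be grouped into a list. The sketch below is one possible layout, not the project's code: trackedProducts and isValidProduct are hypothetical names, and the second entry uses a placeholder URL and selector rather than a real store.

```javascript
// Hypothetical configuration: one entry per tracked product.
const trackedProducts = [
  {
    url: 'https://www.cyberpuerta.mx/Computo-Hardware/Monitores/Monitores/Monitor-Gamer-Curvo-ASUS-ROG-Strix-XG35VQ-LED-35-Quad-HD-Ultra-Wide-FreeSync-100Hz-HDMI-Negro-Gris-Rojo.html?nosto=shop_api_home0_1',
    desiredPrice: 15000,
    selector: '.priceText',
  },
  {
    // Placeholder values for a second store (assumption, not a real page).
    url: 'https://example.com/some-product',
    desiredPrice: 2500,
    selector: '#price',
  },
];

// Basic sanity check so a typo in the configuration fails early.
function isValidProduct({ url, desiredPrice, selector }) {
  try {
    new URL(url); // throws on a malformed URL
  } catch {
    return false;
  }
  return Number.isFinite(desiredPrice) && desiredPrice > 0 &&
    typeof selector === 'string' && selector.length > 0;
}

console.log(trackedProducts.every(isValidProduct));
```

Keeping the selector next to the URL matters because each site marks up its price differently.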

Next, we need to fetch the page before we can scrape it, so we make an HTTP request to obtain its HTML:

const axios = require('axios');
/* ... */
// Fetch the page and return its HTML, or null if the request fails.
async function getHTML (url) {
  try {
    const { data: html } = await axios.get(url);
    return html;
  } catch (error) {
    console.log("Couldn't get the page ☹️");
    return null;
  }
}

Now that we have the HTML, we need to find the price. This is where the selector comes in handy:

const cheerio = require('cheerio');
/* ... */
function scrapPrice(html) {
  const $ = cheerio.load(html); // first, load the HTML
  const price = $(selector)
    .text() // get the text of the matched element
    .trim(); // remove surrounding whitespace
  return price;
}

If we run this function we get the string "$ 16,489.00", but to compare prices we need a number, so I found this function that converts a currency string to a number:

const currencyStringToNumber = currency => Number(currency.replace(/[^0-9.-]+/g, ''));
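
One edge case is worth knowing about: an empty string converts to 0, which would be lower than any desired price. The sketch below repeats the helper and adds a stricter variant; parsePrice is a hypothetical name, and both functions assume prices that use a comma for thousands and a dot for decimals, as on this page.

```javascript
// Same helper as in the article.
const currencyStringToNumber = currency => Number(currency.replace(/[^0-9.-]+/g, ''));

// Hypothetical stricter variant: returns null instead of 0 or NaN
// when the text does not contain a usable price.
function parsePrice(text) {
  const cleaned = String(text).replace(/[^0-9.-]+/g, '');
  if (cleaned === '') return null;
  const value = Number(cleaned);
  return Number.isFinite(value) ? value : null;
}

console.log(currencyStringToNumber('$ 16,489.00')); // 16489
console.log(currencyStringToNumber(''));            // 0
console.log(parsePrice(''));                        // null
console.log(parsePrice('$ 16,489.00'));             // 16489
```

With the stricter variant, the caller can skip the comparison when the result is null instead of treating a missing price as a bargain.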

Finally, we need to schedule a task using node-cron. The app will be running in the background and checking the price every hour.

const cron = require('node-cron');

/* ... */
// '0 * * * *' fires at minute 0 of every hour.
cron.schedule('0 * * * *', async () => {
  console.log('running a task every hour ⏲️');
  const html = await getHTML(productPage).catch(console.log);
  if (!html) return; // the request failed, try again next hour
  const currentPrice = currencyStringToNumber(scrapPrice(html));
  if (currentPrice < desiredPrice) {
    console.log('Congratulations! you just saved some bucks 💵');
  }
});
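
If the schedule string looks cryptic, it helps to label its parts. The sketch below is a hypothetical helper, not part of node-cron, that names the five fields of a standard cron expression. node-cron also accepts an optional leading seconds field, which this sketch does not handle.

```javascript
// Hypothetical helper that labels the five fields of a standard cron expression.
function describeCron(expression) {
  const names = ['minute', 'hour', 'dayOfMonth', 'month', 'dayOfWeek'];
  const fields = expression.trim().split(/\s+/);
  if (fields.length !== names.length) {
    throw new Error(`Expected ${names.length} fields, got ${fields.length}`);
  }
  return Object.fromEntries(names.map((name, i) => [name, fields[i]]));
}

console.log(describeCron('0 * * * *'));
// { minute: '0', hour: '*', dayOfMonth: '*', month: '*', dayOfWeek: '*' }
```

Read this way, the expression used above means "at minute 0 of every hour, every day". Changing the first field to */30 would run the check every 30 minutes instead.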

And that’s it: you have a simple but powerful scraper. Now that you know the basics, you can add more logic or collect any other kind of data.

If you have any questions, please let me know.

And here is the code if you need it.
