DEV Community

Hugo

How to build a price scraper using JavaScript and cheerio.js

Motivation

A few months ago I decided to build a computer, and I wanted to save some money, so I started using Keepa, a Chrome extension that monitors Amazon prices and notifies you when the price of a product drops below the price you defined. The problem was that I was buying the parts not only on Amazon but also on a couple of other sites.

So I decided to create something similar to Keepa using Node, Cheerio.js and React.

PC part price Scraper project

Description

A cron job runs every hour and checks the prices of the products you are tracking. If the current price is lower than the desired price, an email is sent to you. Price logs are also stored to show historical price changes.
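
The check described above could be sketched roughly like this. Everything in it is illustrative: `checkProduct`, the in-memory `priceLog` array, and the `notify` callback are hypothetical names that stand in for the email and storage steps, and the current price is passed in directly instead of being scraped.

```javascript
// Hypothetical in-memory price log; a real project would persist this somewhere.
const priceLog = [];

// Records the current price and notifies when it is below the desired price.
// `notify` stands in for the email step and is an assumption of this sketch.
function checkProduct(product, currentPrice, notify) {
  priceLog.push({ name: product.name, price: currentPrice, checkedAt: new Date() });
  if (currentPrice < product.desiredPrice) {
    notify(`${product.name} dropped to ${currentPrice}`);
    return true;
  }
  return false;
}

const monitor = { name: 'Monitor', desiredPrice: 15000 };
checkProduct(monitor, 16489, console.log); // above the desired price, nothing is sent
checkProduct(monitor, 14500, console.log); // below the desired price, notify is called
```

Both checks end up in `priceLog`, which is what makes it possible to show historical price changes later.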

Let's start

Today I will show you how to make a simple scraper similar to the one I built. I will skip a couple of things to make this easier to understand.

Note:
I will use CyberPuerta, a Mexican online store for computer parts and one of the sites I used in the project, but you can use any site you like.

Let's imagine you want to buy a cool but expensive monitor, something like this one:
[Image: really cool monitor]

The first thing the scraper needs is the link of the page you want to get information from. In this case, it is the following:

https://www.cyberpuerta.mx/Computo-Hardware/Monitores/Monitores/Monitor-Gamer-Curvo-ASUS-ROG-Strix-XG35VQ-LED-35-Quad-HD-Ultra-Wide-FreeSync-100Hz-HDMI-Negro-Gris-Rojo.html?nosto=shop_api_home0_1

Next, we need the piece of information we care about, which in this case is only the price, and a way to identify it with a CSS selector. Open the developer tools, click the small icon that lets you select an element on the page to inspect it, and then click on the price.

[Image: inspect element]

This gives us the following element:

<span class="priceText">$ 16,489.00</span>

As I said before, we need a selector that identifies the price and nothing else. The best option would be an id, since it should be unique, but in this case all we have is a class, so we will identify the price that way.

Sometimes this is easy because the class turns out to be unique, as on this page. If that is not your case, you can right-click the element in the developer tools and select Copy > Copy selector, which gives you this:

#productinfo > form > div.detailsInfo.clear > div:nth-child(1) > div:nth-child(2) > div > div:nth-child(4) > div.medium-7.cell.cp-pr > div > div > div.mainPrice > span.priceText

[Image: unique identifier]

Now comes the fun part: let’s code.

Installation

We need to install three dependencies:

  • axios, to make HTTP requests (npm install axios)
  • cheerio, a library similar to jQuery but for the server (npm install cheerio)
  • node-cron, to run the scraper every hour (npm install node-cron)

First, we need the link we want to scrape, the desired price, and the selector for the price, so we create a few variables to store them.

const productPage = 'https://www.cyberpuerta.mx/Computo-Hardware/Monitores/Monitores/Monitor-Gamer-Curvo-ASUS-ROG-Strix-XG35VQ-LED-35-Quad-HD-Ultra-Wide-FreeSync-100Hz-HDMI-Negro-Gris-Rojo.html?nosto=shop_api_home0_1';
const desiredPrice = 15000;
const selector = '.priceText';

Next, we need the page itself, so we make an HTTP request to obtain its HTML:

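const axios = require('axios');
/* ... */
// Fetches the page and resolves to its HTML, or to null if the request fails.
async function getHTML (url) {
  try {
    const { data: html } = await axios.get(url);
    return html;
  } catch (error) {
    // Without this guard, a failed request would crash on the destructuring above.
    console.log("Couldn't get the page ☹️");
    return null;
  }
}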
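
If you would rather not add axios, the same step can be sketched with the `fetch` that is built into Node 18 and later. This is an alternative to the axios version, not what the project uses: the name `getHTMLWithFetch` and the 10-second default timeout are assumptions made for this example.

```javascript
// Sketch: fetches a page with Node's built-in fetch (Node 18+) instead of axios.
// Resolves to the HTML string, or to null if the request fails or times out.
async function getHTMLWithFetch(url, timeoutMs = 10000) {
  try {
    const response = await fetch(url, { signal: AbortSignal.timeout(timeoutMs) });
    if (!response.ok) {
      // fetch does not reject on 4xx/5xx responses, so check the status ourselves.
      console.log(`Unexpected status ${response.status}`);
      return null;
    }
    return await response.text();
  } catch (error) {
    console.log("Couldn't get the page ☹️");
    return null;
  }
}
```

Returning null on failure lets the caller skip the rest of the run with a simple check before parsing.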

Now that we have the HTML, we need to find the price. This is where the selector comes in handy:

const cheerio = require('cheerio');
/* ... */
function scrapPrice(html) {
  const $ = cheerio.load(html); // first, load the HTML into cheerio
  const price = $(selector)
    .text() // get the text of the matched element
    .trim(); // remove surrounding whitespace
  return price;
}

If we run this function we get $ 16,489.00, but to compare this price we need it to be a number, so I found this function to convert a currency string to a number:

const currencyStringToNumber = currency => Number(currency.replace(/[^0-9.-]+/g, ''));
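
One caveat with this conversion: an empty string converts to 0, so a selector that matches nothing would look like a price of 0. A slightly more defensive variant could look like the sketch below. The name `parsePrice` is hypothetical, and it assumes prices formatted like `$ 16,489.00`, with commas as thousands separators and a dot for decimals.

```javascript
// Converts a currency string such as '$ 16,489.00' to a number.
// Returns null when no usable number is found, instead of 0 or NaN.
function parsePrice(currency) {
  const cleaned = String(currency).replace(/[^0-9.-]+/g, '');
  if (cleaned === '') return null;
  const value = Number(cleaned);
  return Number.isFinite(value) ? value : null;
}

console.log(parsePrice('$ 16,489.00')); // 16489
console.log(parsePrice(''));            // null
```

Prices that use a comma as the decimal separator would need different handling.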

Finally, we need to schedule a task using node-cron. The app will be running in the background and checking the price every hour.

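const cron = require('node-cron');

/* ... */
// '0 * * * *' fires at minute 0 of every hour.
cron.schedule('0 * * * *', async () => {
  console.log('running a task every hour ⏲️');
  const html = await getHTML(productPage).catch(console.log);
  if (!html) return; // the request failed, try again on the next run
  const currentPrice = currencyStringToNumber(scrapPrice(html));
  // An empty match converts to 0, so skip anything that is not a positive number.
  if (!(currentPrice > 0)) return;
  if (currentPrice < desiredPrice) {
    console.log('Congratulations! you just saved some bucks 💵');
  }
});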
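
To make the schedule more concrete, here is a rough sketch of the same "top of every hour" behaviour using only `setTimeout`. It is not how node-cron works internally, just an illustration; `msUntilNextHour` and `runEveryHour` are hypothetical helpers.

```javascript
// Milliseconds from `now` until the next full hour (local time).
function msUntilNextHour(now = new Date()) {
  const next = new Date(now);
  next.setHours(now.getHours() + 1, 0, 0, 0);
  return next.getTime() - now.getTime();
}

// Runs `task` at the top of every hour, rescheduling itself after each run.
function runEveryHour(task) {
  setTimeout(async () => {
    try {
      await task();
    } catch (error) {
      console.log(error); // keep the schedule alive even if one run fails
    }
    runEveryHour(task);
  }, msUntilNextHour());
}

console.log(msUntilNextHour(new Date(2024, 0, 1, 10, 15))); // 2700000 (45 minutes)
```

In practice node-cron is the more convenient choice, since the cron expression can be changed without touching any timing code.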


And that’s it: you have a simple but powerful scraper. Now that you know the basics, you can add more logic or get any other kind of data.

If you have any questions, please let me know.

And here is the code if you need it.

Top comments (4)

alex24409331

awesome tutorial, you saved me a lot of time, thank you for your post it is very clear and easy. Also as newbie in WooCommerce eCommerce i am using e-scraper.com/woocommerce/ to scrape all product data from my supplier sites and other sources. It helps me a lot. maybe it helps somebody too.

Hugo

Thanks Alex! I'm glad it was useful to you :D

ilong3

when i run the scraper this is what i got

ilong3

please can somebody help me out?