DEV Community

How to make a web scraper with JavaScript

The Vik
・2 min read

In this post I will show you how to build a web scraper with axios and cheerio.

Source Code

const axios = require('axios')
const cheerio = require('cheerio')
// Replace the url with your url
const url = 'https://www.premierleague.com/stats/top/players/goals?se=-1&cl=-1&iso=-1&po=-1?se=-1'

axios(url)
    .then(response => {
        const html = response.data
        const $ = cheerio.load(html)
        const statsTable = $('.statsTableContainer > tr')
        const statsData = []

        statsTable.each(function() {
            const rank = $(this).find('.rank > strong').text()
            const playerName = $(this).find('.playerName > strong').text()
            const nationality = $(this).find('.playerCountry').text()
            const mainStat = $(this).find('.mainStat').text()
            statsData.push({
                rank,
                playerName,
                nationality,
                mainStat
            })
        })
        // Will print the collected data
        console.log(statsData)
    })
    // In case of any error it will print the error
    .catch(console.error)

Whew!

That's a lot of code, so let's go through it piece by piece.

npm install axios cheerio --save

Run this to install all of the required dependencies.

const axios = require('axios')
const cheerio = require('cheerio')

These lines import the installed dependencies.

const url = 'https://www.premierleague.com/stats/top/players/goals?se=-1&cl=-1&iso=-1&po=-1?se=-1'

This is the URL we will scrape the data from. You can change it if you want,
but then you will also have to change the selectors used further down.
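
If you do change the URL, it can help to build it from its parts so the query string stays readable. The sketch below uses Node's built-in URL class. The parameter names are copied from the URL above, but what each one means on the Premier League site is not covered in this post, so treat the values as placeholders to check for yourself.

```javascript
// Build the stats URL from a base path plus query parameters.
// The parameter names (se, cl, iso, po) are copied from the URL in this post;
// their meaning is an assumption that you should verify yourself.
const base = 'https://www.premierleague.com/stats/top/players/goals'
const params = { se: '-1', cl: '-1', iso: '-1', po: '-1' }

const statsUrl = new URL(base)
for (const [key, value] of Object.entries(params)) {
    statsUrl.searchParams.set(key, value)
}

// Print the assembled URL as a string
console.log(statsUrl.toString())
```

You can pass statsUrl.toString() to axios in place of the hard-coded string. Note that this version leaves out the extra ?se=-1 at the very end of the original URL, which appears to be a repeated parameter.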

axios(url)
    .then(response => {
        const html = response.data
        const $ = cheerio.load(html)
        const statsTable = $('.statsTableContainer > tr')
        const statsData = []
    })

On the first line we call axios with the URL, then chain a .then callback that receives the response.
Next we create a const named html and assign response.data to it, which is the body of the response.
If you now add

console.log(html)

then it will print the whole HTML of the page.
Next we create a const named $ by loading the HTML with cheerio. After that, $ works much like jQuery for querying the document.
Then we create a const named statsTable and pass $ the selector for the table rows we are going to scrape the data from.
Finally we create statsData, an empty array in which we will store the scraped data.
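
To see how the response flows through .then without hitting the network, here is a small sketch. fakeRequest and the sample HTML are made up for illustration. The stand-in only mimics the one part of an axios response used in this post, the data property.

```javascript
// A hypothetical stand-in for axios(url): it ignores the url and resolves with
// an object that has a `data` property, which is where axios puts the response body.
function fakeRequest(url) {
    const sampleHtml = '<table><tbody class="statsTableContainer"></tbody></table>'
    return Promise.resolve({ data: sampleHtml })
}

// Same shape as the real code: make the request, then read response.data.
const htmlPromise = fakeRequest('https://example.com/stats')
    .then(response => {
        const html = response.data
        console.log(html.length + ' characters of HTML received')
        return html
    })
```

In the real scraper, axios(url) takes the place of fakeRequest and the rest of the chain stays the same.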


statsTable.each(function() {
    // If you replaced the url then you have to replace these selectors too
    const rank = $(this).find('.rank > strong').text()
    const playerName = $(this).find('.playerName > strong').text()
    const nationality = $(this).find('.playerCountry').text()
    const mainStat = $(this).find('.mainStat').text()
    statsData.push({
        rank,
        playerName,
        nationality,
        mainStat
    })
})

// This code should be inside the .then(response => {}) callback that we made above

Here we loop over each row, find the specific elements that hold the data, and convert them to text using .text().
Then we push that text into statsData, which we created above.
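
Text pulled out with .text() often carries extra whitespace, and rows that do not contain the expected elements produce empty strings. The sketch below is one possible clean-up step and is not part of the original scraper. cleanStats and the sample rows are made up for illustration.

```javascript
// Tidy up scraped rows: trim whitespace, drop rows with no player name,
// and turn the numeric fields into numbers.
function cleanStats(rows) {
    return rows
        .map(row => ({
            rank: Number.parseInt(row.rank.trim(), 10),
            // Collapse runs of whitespace inside the name into single spaces
            playerName: row.playerName.trim().replace(/\s+/g, ' '),
            nationality: row.nationality.trim(),
            // Remove thousands separators such as "1,234" before parsing
            mainStat: Number.parseInt(row.mainStat.replace(/,/g, '').trim(), 10)
        }))
        .filter(row => row.playerName !== '')
}

// Made-up sample in the same shape as statsData
const sample = [
    { rank: ' 1. ', playerName: '\n  Player   One ', nationality: ' England ', mainStat: ' 260 ' },
    { rank: '', playerName: '', nationality: '', mainStat: '' }
]

const cleaned = cleanStats(sample)
console.log(cleaned)
```

In the scraper you could call cleanStats(statsData) just before printing the result.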

Now we just have to add

console.log(statsData) // inside .then(response => {})

and it should show all of the scraped data.

Finally, after the .then callback is closed with }), we add

.catch(console.error)

which will print the error if there is one. And we're done.
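
Printing the raw error works, but axios errors can be long. As a rough sketch, you could turn the error into a short message first. describeError is a made-up helper name, and the error objects at the bottom are hand-written samples in the shape axios uses.

```javascript
// Turn an axios-style error into a short message.
// axios sets error.response when the server replied with a non-2xx status,
// and error.request when the request was sent but no reply arrived.
function describeError(error) {
    if (error.response) {
        return 'Server responded with status ' + error.response.status
    }
    if (error.request) {
        return 'No response received: ' + error.message
    }
    // Anything else, including errors thrown inside the .then callback
    return 'Something else went wrong: ' + error.message
}

// Made-up error objects for illustration
console.log(describeError({ response: { status: 404 }, message: 'Request failed with status code 404' }))
console.log(describeError({ request: {}, message: 'timeout of 1000ms exceeded' }))
```

To use it, replace .catch(console.error) with .catch(error => console.error(describeError(error))).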

This is my first time explaining code, so I don't know how well I did.

THANKS

Discussion (5)

LUKESHIRU

Nice!

The only suggestion I'd make is to use node-fetch instead of axios (axios is kind of overkill for this scenario), or, if you really prefer the axios API, you can use redaxios, which is way lighter. I would also prefer to use DOMParser instead of cheerio, but sadly, afaik, there is no DOMParser implementation in node, so that would only be useful in a browser implementation :'(

Thanks for putting this together.

Cheers!

Web Dev Ken

Showing the full code at the beginning was very helpful to me: I could scan through it and note questions, then get them answered while reading your post.

Your scraper might work now, but it is very static because it is fundamentally based on the CSS classes that premierleague is using. Some sites might even have auto-generated classes that change whenever you request a new page. So adjusting the scraper to look for HTML structure instead might mitigate this issue a little, although the structure could also change in the future.

Thanks for the blog!

The Vik Author

Umm yeah, that's correct. I will make a new version later.

Jordan Brennan • Edited

After the first three backticks type "javascript" and your code will get syntax highlighting

const foo = 'bar'

The Vik Author

Oh really? Thanks, I didn't know that we could change the syntax :)