Piyush Kumar Das
Web Scraper using NodeJS, Express, Cheerio and Axios

Web Scraper

A web scraper is a software tool or program that automates the process of collecting data from websites. It uses automated scripts or bots to extract data from web pages by reading and analyzing the HTML code of the page. Web scrapers can be used to extract a wide range of data, such as product prices, reviews, social media posts, and more.

Web scraping has become increasingly popular in recent years as a means of gathering data for research, market analysis, and business intelligence. However, it is important to note that some websites explicitly prohibit web scraping in their terms of service, and scraping certain types of data may be illegal in some jurisdictions. As such, it is important to ensure that you are not violating any laws or policies before using a web scraper.

Here we will create a very basic version of a web scraper that will allow us to scrape HTML data from websites.

Steps

  • Make sure you have Node.js installed and up to date. You can check the version with:

node --version

  • Start by creating a directory and running npm init. This creates a package.json file, which handles dependency and script management. Then create an index.js file.

  • Install four packages using npm:

  • npm i express

  • npm i cheerio

  • npm i axios

  • npm i nodemon
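They can also be installed with a single command:

npm i express cheerio axios nodemon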

  • Now in your package.json, change the "start" script under "scripts" to nodemon index.js. Your package.json will then look something like this.

[Image: package.json]
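As a rough sketch, the relevant part of package.json might look like this (the name and version fields are just the defaults npm init generated, so yours may differ):

{
  "name": "web-scraper",
  "version": "1.0.0",
  "main": "index.js",
  "scripts": {
    "start": "nodemon index.js"
  }
}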

  • Now in your index.js, load the modules using:
const express = require('express')
const cheerio = require('cheerio')
const axios = require('axios')
  • Create an Express app and make it listen on a port like this:
const PORT = 8000
const app = express()
app.listen(PORT, () => console.log(`Server running on port ${PORT}`))
  • Now take the URL of the website you want to scrape and store it in a variable.

  • Then call axios with that URL; it returns a promise that resolves with the page's HTML.

axios(url)
    .then(response => {
        const html = response.data
        const $ = cheerio.load(html)
        const articles = []
        // Collect the text of every <li> and the href of any nested <a>
        $('li', html).each(function () {
            const title = $(this).text()
            const url = $(this).find('a').attr('href')
            articles.push({
                title,
                url
            })
        })
        console.log(articles)
    }).catch(err => console.log('Error occurred', err))
  • Here articles is an array that stores the scraped data. In the above code I scrape the text of every <li> tag and, if it contains an <a> tag, the link from its href attribute.
  • Finally, all the data is pushed into the articles array.

  • The final code will look like this:

[Image: final code]
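Putting the snippets above together, the complete index.js looks roughly like this (the url value is a placeholder; replace it with the site you want to scrape):

const express = require('express')
const cheerio = require('cheerio')
const axios = require('axios')

const PORT = 8000
const app = express()

// Placeholder URL; replace with the website you want to scrape
const url = 'https://example.com'

axios(url)
    .then(response => {
        const html = response.data
        const $ = cheerio.load(html)
        const articles = []
        // Collect the text of every <li> and the href of any nested <a>
        $('li', html).each(function () {
            const title = $(this).text()
            const url = $(this).find('a').attr('href')
            articles.push({ title, url })
        })
        console.log(articles)
    }).catch(err => console.log('Error occurred', err))

app.listen(PORT, () => console.log(`Server running on port ${PORT}`))

Since express is already installed, you could optionally expose the scraped data on a route (for example with app.get) instead of only logging it, but that would be an extension beyond the code shown here.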

  • Now in the terminal run npm start to see the results.

Output of the above code:

[Image: output]
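The exact output depends on the site being scraped, but each entry of the articles array printed to the console has this shape (the values below are placeholders, not real scraped data):

[
  { title: 'Some list item text', url: 'https://example.com/some-link' },
  { title: 'Another list item', url: '/relative-link' }
]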
