Piyush Kumar Das
Web Scraper using NodeJS, Express, Cheerio and Axios

Web Scraper

A web scraper is a software tool or program that automates the process of collecting data from websites. It uses automated scripts or bots to extract data from web pages by reading and analyzing the HTML code of the page. Web scrapers can be used to extract a wide range of data, such as product prices, reviews, social media posts, and more.

Web scraping has become increasingly popular in recent years as a means of gathering data for research, market analysis, and business intelligence. However, it is important to note that some websites explicitly prohibit web scraping in their terms of service, and scraping certain types of data may be illegal in some jurisdictions. As such, it is important to ensure that you are not violating any laws or policies before using a web scraper.

Here we will create a very basic version of a web scraper which will allow us to scrape HTML data from websites.

Steps

  • Make sure you have Node.js installed and up to date; you can check your version with:

node --version

  • Start by creating a directory and running npm init; this will create a package.json file, which helps with dependency management and script management. Next, create an index.js file.
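
A minimal sketch of this step (the directory name web-scraper is only an example, and npm init -y simply accepts the default answers instead of prompting):

mkdir web-scraper
cd web-scraper
npm init -y
touch index.js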

  • Install four packages using npm:

npm i express
npm i cheerio
npm i axios
npm i nodemon
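
All four can also be installed with a single command:

npm i express cheerio axios nodemon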

  • Now in your package.json, change the "start" script under "scripts" to nodemon index.js. Your package.json should look something like this:

(Image: package.json file)
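
As a sketch, after this change the "scripts" section of package.json should contain something like:

"scripts": {
    "start": "nodemon index.js"
}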

  • Now in your index.js, load the modules using:
const express = require('express')
const cheerio = require('cheerio')
const axios = require('axios')
  • Create an Express app and have it listen on a port like this:
const PORT = 8000
const app = express()
app.listen(PORT, () => console.log(`Server running on port ${PORT}`))
  • Now take the URL of the website you want to scrape and store it in a variable.
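
For example (https://example.com is only a placeholder; replace it with the site you actually want to scrape):

const url = 'https://example.com'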

  • Then make the request with axios, passing the url as its parameter; axios returns a promise.

axios(url)
    .then(response => {
        const html = response.data   // raw HTML of the page
        const $ = cheerio.load(html) // load the HTML into cheerio
        const articles = []
        // Select every li element on the page
        $('li', html).each(function () {
            const title = $(this).text()               // text content of the li
            const url = $(this).find('a').attr('href') // link, if the li contains an a tag
            articles.push({
                title,
                url
            })
        })
        console.log(articles)
    })
    .catch(err => console.log('Error occurred'))
  • Here articles is an array which will store the scraped data. In the above code I have scraped a website to get the text of all the li tags and, for those that contain an a tag, the link as well.

  • Finally, all of this data is pushed into the articles array.

  • The final code should look something like this:

(Image: final index.js code)
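
For reference, here is a sketch of the complete index.js assembled from the snippets above (the URL is still a placeholder):

const express = require('express')
const cheerio = require('cheerio')
const axios = require('axios')

const PORT = 8000
const url = 'https://example.com' // placeholder; use the site you want to scrape

const app = express()

axios(url)
    .then(response => {
        const html = response.data
        const $ = cheerio.load(html)
        const articles = []
        $('li', html).each(function () {
            const title = $(this).text()
            const url = $(this).find('a').attr('href')
            articles.push({ title, url })
        })
        console.log(articles)
    })
    .catch(err => console.log('Error occurred'))

app.listen(PORT, () => console.log(`Server running on port ${PORT}`))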

  • Now in the terminal run npm start to see the results.

Output of the above code:

(Image: output)
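
The exact contents depend on the site being scraped, but each entry in the logged array has this shape (the values below are only illustrative):

[
    { title: 'First list item text', url: 'https://example.com/first-link' },
    { title: 'Second list item text', url: 'https://example.com/second-link' }
]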
