Web Scraper
A web scraper is a software tool or program that automates the process of collecting data from websites. It uses automated scripts or bots to extract data from web pages by reading and analyzing the HTML code of the page. Web scrapers can be used to extract a wide range of data, such as product prices, reviews, social media posts, and more.
Web scraping has become increasingly popular in recent years as a means of gathering data for research, market analysis, and business intelligence. However, it is important to note that some websites explicitly prohibit web scraping in their terms of service, and scraping certain types of data may be illegal in some jurisdictions. As such, it is important to ensure that you are not violating any laws or policies before using a web scraper.
Here we will create a very basic version of a web scraper which will allow us to scrape HTML data from websites.
Steps
- Make sure you have the latest version of Node.js installed:
node --version
- Start by creating a directory and run
npm init
This will create a package.json file, which helps with dependency and script management. Next, create an index.js file. Then install four packages using npm:
npm i express
npm i cheerio
npm i axios
npm i nodemon
- Now in your package.json, change the "start" script under "scripts" to
nodemon index.js
Your package.json should now look something like this:
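(A minimal sketch; the name, version, and exact dependency versions below are placeholders and will differ on your machine.)

```json
{
  "name": "web-scraper",
  "version": "1.0.0",
  "main": "index.js",
  "scripts": {
    "start": "nodemon index.js"
  },
  "dependencies": {
    "axios": "^1.6.0",
    "cheerio": "^1.0.0",
    "express": "^4.18.0",
    "nodemon": "^3.0.0"
  }
}
```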
- Now in your index.js, load the modules using:
const express = require('express')
const cheerio = require('cheerio')
const axios = require('axios')
- Create an Express server listening on a port like this:
const PORT = 8000
const app = express()
app.listen(PORT, () => console.log(`Server running on port ${PORT}`))
- Now take the URL of the website you want to scrape and store it in a variable. Then call axios with the url; it returns a promise that resolves with the page's HTML.
axios(url)
    .then(response => {
        // The raw HTML of the page
        const html = response.data
        // Load the HTML into cheerio so we can query it like the DOM
        const $ = cheerio.load(html)
        const articles = []
        // For every <li> element, grab its text and the href of any <a> inside it
        $('li', html).each(function() {
            const title = $(this).text()
            const url = $(this).find('a').attr('href')
            articles.push({
                title,
                url
            })
        })
        console.log(articles)
    })
    .catch(err => console.log('Error occurred'))
Here articles is an array which will store the scraped data. In the above code I have scraped a website to get the text of all the <li> tags and, if they contain an <a> tag, their links as well. Finally, all the data is pushed into the articles array.
The final code will look like this:
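(This is a sketch assembled from the snippets above; the url value is a placeholder, so replace it with the site you actually want to scrape.)

```js
const express = require('express')
const cheerio = require('cheerio')
const axios = require('axios')

const PORT = 8000
const app = express()

// Placeholder url; replace with the website you want to scrape
const url = 'https://example.com'

axios(url)
    .then(response => {
        const html = response.data
        const $ = cheerio.load(html)
        const articles = []
        // Collect the text of every <li> and the href of any <a> inside it
        $('li', html).each(function() {
            const title = $(this).text()
            const url = $(this).find('a').attr('href')
            articles.push({ title, url })
        })
        console.log(articles)
    })
    .catch(err => console.log('Error occurred'))

app.listen(PORT, () => console.log(`Server running on port ${PORT}`))
```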
- Now in the terminal run
npm start
to see the results.
Output of the above code:
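Something like the following array is printed to the console (the values here are purely illustrative; the actual titles and links depend on the site you scrape):

```js
// Illustrative output shape, not real scraped data
[
  { title: 'Some list item text', url: '/some-link' },
  { title: 'Another list item', url: '/another-link' }
]
```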