DEV Community

Željko Šević

Originally published at sevic.dev

Web scraping with cheerio

Web scraping means extracting data from websites. This post covers extracting data from a page's HTML elements with cheerio.

Prerequisites

  • the cheerio package is installed

  • the HTML page has been retrieved with an HTTP client
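
Retrieving the page is outside cheerio's scope. As a minimal sketch, assuming Node.js 18 or later (where fetch is available globally), the HTML can be downloaded as shown below; fetchHtml is a hypothetical helper name, and any other HTTP client works just as well.

```javascript
// Minimal sketch: download a page's HTML with the fetch API built into Node.js 18+.
// fetchHtml is a hypothetical helper, not part of cheerio.
async function fetchHtml(url) {
  const response = await fetch(url);

  // Fail early on 4xx/5xx responses instead of parsing an error page.
  if (!response.ok) {
    throw new Error(`Request failed with status ${response.status}`);
  }

  // Resolve with the raw HTML string, ready to be passed to cheerio's load.
  return response.text();
}

// Usage (inside an async function):
// const html = await fetchHtml('https://example.com');
```

Throwing on non-2xx statuses keeps error pages from being scraped as if they were real content.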

Usage

  • create a scraper object by passing the HTML content to the load method
    • set the decodeEntities option to false to keep HTML entities (such as &amp;) in their original, encoded form
  const { load } = require('cheerio');
  const $ = load('<div><!-- HTML content --></div>', { decodeEntities: false });
  • find DOM elements by using CSS-like selectors
  const items = $('.item');
  • iterate through the found elements with the each method
  items.each((index, element) => {
    // ...
  });

  • access element content using specific methods
    • text: $(element).text()
    • HTML: $(element).html()
    • attributes
      • all (as an object): $(element).attr()
      • a specific one: $(element).attr('href')
    • child elements
      • first: $(element).children().first()
      • last: $(element).children().last()
      • all: $(element).children()
      • descendants matching a selector: $(element).find('a')
    • siblings
      • previous: $(element).prev()
      • next: $(element).next()

Disclaimer

Please check a website's terms of service before scraping it, since some websites prohibit scraping.

