Željko Šević

Posted on • Edited on • Originally published at sevic.dev

Web scraping with cheerio

Web scraping means extracting data from websites. This post covers extracting data from a page's HTML elements with cheerio.

Prerequisites

  • cheerio package is installed

  • HTML page is retrieved via an HTTP client
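
As a minimal sketch of the second prerequisite, the HTML could be retrieved with the fetch function that is global in Node.js 18 and later; any other HTTP client works as well. The fetchHtml helper name and the placeholder URL are assumptions made for this example and do not come from the original post.

```javascript
// Hypothetical helper (not from the original post): downloads a page and
// resolves with its HTML as a string. Assumes Node.js 18+, where fetch is global.
async function fetchHtml(url) {
  const response = await fetch(url);
  if (!response.ok) {
    // Surface HTTP errors instead of trying to parse an error page
    throw new Error(`Request failed with status ${response.status}`);
  }
  return response.text();
}

// Example usage with a placeholder URL:
// fetchHtml('https://example.com').then((html) => console.log(html.length));
```

The returned string is what gets passed to cheerio's load method in the next section.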

Usage

  • create a scraper object with the load method by passing the HTML content as an argument
    • set the decodeEntities option to false to preserve encoded characters (like &amp;) in their original form
  import { load } from 'cheerio';

  const $ = load('<div><!-- HTML content --></div>', { decodeEntities: false });
  • find DOM elements by using CSS-like selectors
  const items = $('.item');
  • iterate through the found elements using the each method
  items.each((index, element) => {
    // ...
  });

  • access element content using specific methods
    • text $(element).text()
    • HTML $(element).html()
    • attributes
      • all $(element).attr()
      • specific one $(element).attr('href')
    • child elements
      • first $(element).children().first()
      • last $(element).children().last()
      • all $(element).children()
      • matching a selector $(element).find('a') (searches all descendants, not only direct children)
    • siblings
      • previous $(element).prev()
      • next $(element).next()

Disclaimer

Please check the website's terms of service before scraping it. Some websites may have terms of service that prohibit such activity.
