Web scraping means extracting data from websites. This post covers extracting data from the page's HTML tags.
Prerequisites
- `cheerio` package is installed
- HTML page is retrieved via an HTTP client
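One way to retrieve the HTML is the `fetch` API built into Node.js 18 and later. The sketch below is an assumption rather than part of the original setup: `fetchHtml` is a hypothetical helper name and the URL is a placeholder. Any other HTTP client works the same way, as long as it hands the page body to `load` as a string.

```javascript
// Minimal sketch: download a page's HTML with the fetch API built into Node.js 18+.
// fetchHtml is a hypothetical helper, not part of cheerio.
async function fetchHtml(url) {
  const response = await fetch(url);
  if (!response.ok) {
    // Fail early instead of passing an error page on to the parser.
    throw new Error(`Request to ${url} failed with status ${response.status}`);
  }
  return response.text(); // raw HTML string, ready for cheerio's load()
}

// Usage (placeholder URL):
// fetchHtml('https://example.com').then((html) => console.log(html.length));
```

Checking `response.ok` matters because `fetch` resolves normally on 4xx and 5xx responses. Without the check, the scraper would silently parse an error page.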
Usage
- create a scraper object with the `load` method by passing the HTML content as an argument
  - set the `decodeEntities` option to `false` to preserve encoded characters (like `&amp;`) in their original form
const { load } = require('cheerio');
const $ = load('<div><!-- HTML content --></div>', { decodeEntities: false });
- find DOM elements by using CSS-like selectors
const items = $('.item');
- iterate through the found elements using the `each` method
items.each((index, element) => {
// ...
});
- access element content using specific methods
  - text: `$(element).text()`
  - HTML: `$(element).html()`
  - attributes
    - all (as an object): `$(element).attr()`
    - specific one: `$(element).attr('href')`
  - child elements
    - first: `$(element).children().first()`
    - last: `$(element).children().last()`
    - all: `$(element).children()`
    - matching descendants: `$(element).find('a')`
  - siblings
    - previous: `$(element).prev()`
    - next: `$(element).next()`
Disclaimer
Please check the website's terms of service before scraping it. Some websites may have terms of service that prohibit such activity.