DEV Community

ByteBricks.ai

Cheerio & ChatGPT: A Primer on Web Scraping with Node.js

Hey there fellow data digger (we dig at bytebricks.ai)! The web is a treasure trove of information waiting to be unearthed. And guess what? With a sprinkle of Cheerio and a dash of Node.js, you can turn your code into a data-gathering wizard.

But, hey, let's add a fun twist to it! Ever heard of ChatGPT? 😂 This buddy can take a peek at HTML and whip up the Cheerio code you need to grab that data. Let’s dive into this delicious bowl of Cheerio (pun totally intended) and see how we can make web scraping a breeze.

Prepping Up

Before we start, make sure Node.js is comfortably nestled in your machine. Create a cozy little space for your project, hop into that directory via the terminal, and kickstart a new Node.js project with a simple:

npm init -y

Next up, let’s invite Cheerio and Axios (our trusty HTTP client) to the party with:

npm install cheerio axios

Snagging that HTML

Alright, with the gang all set, let’s nab the HTML of the website we’re eyeing. For this little adventure, we’re gonna pretend we’re extracting goodies from a make-believe e-commerce site.

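const axios = require('axios');

// Download a page and return its HTML as a string.
async function fetchHTML(url) {
  const { data } = await axios.get(url);
  return data;
}

const url = 'https://fictional-ecommerce-site.com';
fetchHTML(url)
  .then(console.log)
  .catch((err) => console.error(`Request failed: ${err.message}`));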

Let Cheerio Lead the Way

Got the HTML? Sweet! Now, let’s hand it over to Cheerio for some parsing action.

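const cheerio = require('cheerio');

// Print the title and price of every product found in the HTML.
function parseHTML(html) {
  const $ = cheerio.load(html);

  // Let’s pretend each product is nestled in an element with the class "product"
  $('.product').each((i, element) => {
    const title = $(element).find('.product-title').text().trim();
    const price = $(element).find('.product-price').text().trim();

    console.log(`${title}: ${price}`);
  });
}

// Reuses fetchHTML and url from the previous snippet.
fetchHTML(url)
  .then(parseHTML)
  .catch((err) => console.error(`Scrape failed: ${err.message}`));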

See what we did there? It’s just like using jQuery, but with the turbo engines of Node.js.
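One thing to keep in mind: .text() hands back raw strings, so prices usually arrive with currency symbols and stray whitespace attached. Here is a minimal sketch of how you might tidy them up before storing them. The helper names cleanText and parsePrice are made up for this example (they are not part of Cheerio), and the price parser assumes "." is the decimal separator and "," groups thousands.

```javascript
// Hypothetical helpers for this post; neither is part of Cheerio.

// Collapse runs of whitespace (newlines, tabs, indentation) into single spaces.
function cleanText(raw) {
  return raw.replace(/\s+/g, ' ').trim();
}

// Turn a price string such as "$1,299.99" into a number.
// Assumption: "." is the decimal separator and "," only groups thousands.
// Returns null when the string contains no number at all.
function parsePrice(raw) {
  const digits = raw.replace(/[^0-9.]/g, '');
  const value = Number.parseFloat(digits);
  return Number.isNaN(value) ? null : value;
}

console.log(cleanText('  Blue\n    Widget  ')); // Blue Widget
console.log(parsePrice(' $1,299.99 '));         // 1299.99
console.log(parsePrice('Sold out'));            // null
```

Inside the .each() loop you could wrap the two .text() calls in these helpers. Sites that format prices differently (for example "1.299,99 €") would need a different parser.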

ChatGPT Comes in Handy

Now for the cherry on top! ChatGPT can take a look at HTML and conjure up the Cheerio code you need to snatch that data. Just feed it the HTML, and voila, you’ve got your data extraction code ready to roll. It's like having a buddy who writes code while you munch on snacks!
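If you want to make "feeding it the HTML" repeatable, you could build the prompt in code. This is only a sketch: buildScraperPrompt, the prompt wording, and the 4000-character cap are all assumptions for illustration, not anything ChatGPT requires. It simply glues a short instruction, the fields you want, and a trimmed HTML sample into one string you can paste into the chat.

```javascript
// Hypothetical helper: builds the text you would paste into ChatGPT.
// maxChars is an assumed cap that keeps the pasted HTML short; tune it to taste.
function buildScraperPrompt(htmlSnippet, fields, maxChars = 4000) {
  const trimmed = htmlSnippet.trim().slice(0, maxChars);
  return [
    'Write Node.js code using Cheerio that extracts these fields from each product:',
    fields.map((field) => `- ${field}`).join('\n'),
    'Here is a sample of the HTML:',
    trimmed,
  ].join('\n\n');
}

// Made-up markup matching the make-believe shop used in this post.
const sample = `
<div class="product">
  <h2 class="product-title">Blue Widget</h2>
  <span class="product-price">$19.99</span>
</div>`;

console.log(buildScraperPrompt(sample, ['title', 'price']));
```

Whatever code comes back, run it against the real page and check the output yourself. Generated selectors can look right and still miss elements.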

Polishing Your Data Scooper

Crafting a web scraper is kinda like brewing the perfect cup of coffee. It needs a little tinkering to hit that sweet spot between speed and accuracy. With Cheerio, Node.js, and a little help from ChatGPT, you’ve got a solid start. Don’t forget to handle pesky pagination, content that loads asynchronously (Cheerio only parses the HTML you hand it and doesn’t run any JavaScript), and rate limits to scrape like a pro!😎
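To make the pagination and rate-limit advice concrete, here is a rough sketch of a polite page loop. Everything in it is an assumption for illustration: the scrapeAllPages and sleep helpers, the ?page=N URL scheme, and the idea that an empty page means you have run out of results. The fetchPage argument is whatever function you write to fetch and parse one page; a stand-in is used here so the sketch runs offline.

```javascript
// Resolve after `ms` milliseconds; used to pause between requests.
function sleep(ms) {
  return new Promise((resolve) => setTimeout(resolve, ms));
}

// Hypothetical helper. Assumes pages live at `${baseUrl}?page=N` and that
// fetchPage(url) resolves to an array of items, empty once pages run out.
async function scrapeAllPages(baseUrl, fetchPage, { delayMs = 1000, maxPages = 50 } = {}) {
  const items = [];
  for (let page = 1; page <= maxPages; page++) {
    const pageItems = await fetchPage(`${baseUrl}?page=${page}`);
    if (pageItems.length === 0) break; // an empty page means we are done
    items.push(...pageItems);
    await sleep(delayMs); // rate limit: wait before asking for the next page
  }
  return items;
}

// Stand-in for a real fetch-and-parse step, so the sketch runs offline.
const fakePages = { 1: ['A', 'B'], 2: ['C'] };
const fakeFetch = async (url) => fakePages[new URL(url).searchParams.get('page')] || [];

scrapeAllPages('https://fictional-ecommerce-site.com/products', fakeFetch, { delayMs: 10 })
  .then((items) => console.log(items)); // [ 'A', 'B', 'C' ]
```

The maxPages cap is a safety net so a site that never returns an empty page cannot keep the loop running forever. Real sites paginate in different ways (next links, cursors, offsets), so treat the URL scheme as a placeholder.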

Further

You need to make the data useful! At bytebricks we build with a Laravel backend and a Vue front end. We find that stack fast to market, and Laravel with SQL has a low ongoing overhead cost, not to mention the magic of Eloquent (as in this example of whereHas) or the very easy integration with AWS SES!
