A JavaScript scraper for the Wikipedia Academy Award List.

#javascript #webscraping

Scraping the Academy Award winners listed on Wikipedia with cheerio and saving them to a CSV file.

Today's post is a simple demonstration of how to scrape data with JavaScript using the cheerio library. As the data source, we'll use the list of Academy Award-winning films on Wikipedia.

First, install the necessary packages:

npm install cheerio axios

The URL used is:

const url = 'https://en.wikipedia.org/wiki/List_of_Academy_Award%E2%80%93winning_films';
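
The %E2%80%93 in that URL is just the percent-encoded form of the en dash (–) in the article title. A quick sketch to confirm it with the built-in decodeURIComponent and encodeURIComponent (the variable names are only for illustration):

```javascript
// Hypothetical variable names; only the article path comes from the URL above.
const encodedTitle = 'List_of_Academy_Award%E2%80%93winning_films';

// decodeURIComponent turns the UTF-8 bytes E2 80 93 back into U+2013 (en dash).
const decodedTitle = decodeURIComponent(encodedTitle);
console.log(decodedTitle);

// Encoding the dash again gives the same three-byte sequence.
const encodedDash = encodeURIComponent('\u2013');
console.log(encodedDash);
```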

Next, we'll fetch the page with axios, parse the HTML with cheerio's load function, and prepare two arrays to hold the column headers and the rows of the table:

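// These imports belong at the top of scraper.js
import axios from 'axios';
import * as cheerio from 'cheerio';
import fs from 'node:fs';

const { data: html } = await axios.get(url);
const $ = cheerio.load(html);

const theadData = [];
const tableData = [];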

Now we'll select the elements we need as we traverse the DOM. The $ function returns Cheerio objects, which expose jQuery-style methods such as find, each, and text:

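// Collect the header cells (<th>) found inside each <tbody>
$('tbody').each((i, column) => {
  const columnData = [];
  $(column)
    .find('th')
    .each((j, cell) => {
      // trim() strips all surrounding whitespace; replace('\n', '') only removed the first newline
      columnData.push($(cell).text().trim());
    });
  theadData.push(columnData);
});

// The first <tbody> supplies the header row of the CSV
tableData.push(theadData[0]);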

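// Collect the data cells (<td>) of every table row
$('table tr').each((i, row) => {
  const rowData = [];
  $(row)
    .find('td')
    .each((j, cell) => {
      rowData.push($(cell).text().trim());
    });

  // Rows without any <td> (header-only rows) are skipped
  if (rowData.length) tableData.push(rowData);
});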

Glad you still know jQuery...
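
The cell text comes out raw, so if some cleanup is wanted, a small helper can handle the most common noise. This is only a sketch: cleanCell is a hypothetical helper, and it assumes that footnote markers look like [1] or [a] and that stray whitespace should collapse to single spaces.

```javascript
// Hypothetical helper: normalizes the raw text of one table cell.
function cleanCell(text) {
  return text
    // Drop bracketed footnote markers such as [1] or [a].
    .replace(/\[[^\]]*\]/g, '')
    // Collapse runs of whitespace (newlines and non-breaking spaces included) into one space.
    .replace(/\s+/g, ' ')
    .trim();
}

console.log(cleanCell('  The Godfather\n[2] '));
```

It could stand in for the bare trim() call in the row loop, e.g. rowData.push(cleanCell($(cell).text())).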

Finally, save the data as it is, ~~even without processing the data 😅~~ into a .csv spreadsheet with fs.writeFileSync.

Note that I used ";" as the delimiter.

const csvContent = tableData
    .map((row) => row.join(';')) 
    .join('\n');

fs.writeFileSync('academy_awards.csv', csvContent, 'utf-8');
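
A plain join works as long as no cell contains the delimiter. As a hedged sketch of a safer variant, the hypothetical escapeField and toCsv helpers below quote any field that contains the delimiter, a double quote, or a line break, and double any embedded quotes, following the usual CSV convention (RFC 4180 style). The sample rows are made up for illustration.

```javascript
// Hypothetical helper: quote a single field only when it needs it.
function escapeField(value, delimiter = ';') {
  const text = String(value);
  const needsQuotes =
    text.includes(delimiter) || text.includes('"') || /[\r\n]/.test(text);
  // Embedded double quotes are escaped by doubling them.
  return needsQuotes ? `"${text.replace(/"/g, '""')}"` : text;
}

// Hypothetical helper: turn an array of rows into CSV text.
function toCsv(rows, delimiter = ';') {
  return rows
    .map((row) => row.map((cell) => escapeField(cell, delimiter)).join(delimiter))
    .join('\n');
}

// Illustrative rows, not scraped data.
const sampleRows = [
  ['Film', 'Year'],
  ['Title with ; inside', '2000'],
  ['Say "hi"', '2001'],
];
console.log(toCsv(sampleRows));
```

With these helpers, toCsv(tableData) could replace the map/join chain before the call to fs.writeFileSync.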

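Run the script with the command below. Because it uses top-level await, Node.js has to treat the file as an ES module; one way is to set "type": "module" in package.json.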

node scraper.js

I’ve written other tutorials here on dev.to about scraping with Go and Python. If this article helped you or you enjoyed it, consider contributing.
