Scraping the Academy Award winners listed on Wikipedia with cheerio and saving them to a CSV file.
Today, I'll give a simple demonstration of how to scrape data with JavaScript and the cheerio library. For this, we'll use the list of Academy Award winners directly from Wikipedia.
First, install the necessary packages:
npm install cheerio axios
The URL used is:
const url = 'https://en.wikipedia.org/wiki/List_of_Academy_Award%E2%80%93winning_films';
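Next, we import the libraries (fs is built into Node.js), fetch the page with axios, load the HTML with cheerio's load function, and prepare two arrays: one for the column headers and one for the table data:
import axios from 'axios';
import * as cheerio from 'cheerio';
import fs from 'node:fs';
const { data: html } = await axios.get(url);
const $ = cheerio.load(html);
const theadData = []; // header cells of each table body
const tableData = []; // rows that will be written to the CSV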
Now we traverse the DOM, selecting elements with the $ function, which returns Cheerio objects. First we collect the header cells (th) of every table body, then the data cells (td) of every table row:
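// Collect the header cells (th) of every tbody; theadData[0] holds
// the headers of the first table on the page
$('tbody').each((i, tbody) => {
  const columnData = [];
  $(tbody)
    .find('th')
    .each((j, cell) => {
      // trim() removes all surrounding whitespace, not just the first '\n'
      columnData.push($(cell).text().trim());
    });
  theadData.push(columnData);
});

// Use the first table's headers as the CSV header row
tableData.push(theadData[0]);

// Collect the data cells (td) of every table row; rows without td cells are skipped
$('table tr').each((i, row) => {
  const rowData = [];
  $(row)
    .find('td')
    .each((j, cell) => {
      rowData.push($(cell).text().trim());
    });
  if (rowData.length) tableData.push(rowData);
});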
Glad you still know jQuery...
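Calling trim() only strips whitespace at the ends of a cell. If you want slightly tidier cells, here is a minimal sketch of a cleanup helper, assuming the cell text may contain internal line breaks and numeric footnote markers such as "[1]". cleanCell is a hypothetical name, not part of cheerio, and the sample strings are made up:

```javascript
// Hypothetical helper (not part of the original scraper): tidies the raw
// text returned by $(cell).text() before it goes into tableData.
function cleanCell(text) {
  return text
    .replace(/\[\d+\]/g, '') // drop numeric footnote markers such as "[1]"
    .replace(/\s+/g, ' ')    // collapse newlines, tabs and repeated spaces
    .trim();                 // strip leading/trailing whitespace
}

// Examples with made-up cell text
console.log(cleanCell('  Wings[1]\n'));                    // → Wings
console.log(cleanCell('All Quiet on\nthe Western Front')); // → All Quiet on the Western Front
```

Inside the row loop you could then write rowData.push(cleanCell($(cell).text())); instead of calling trim() directly.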
Finally, we save the data as is, without any further processing 😅, into a .csv spreadsheet with fs.writeFileSync.
Note that I used ";" as the delimiter.
const csvContent = tableData
.map((row) => row.join(';'))
.join('\n');
fs.writeFileSync('academy_awards.csv', csvContent, 'utf-8');
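Joining cells with ";" breaks the layout if a cell itself contains a semicolon, a double quote or a line break. Here is a minimal sketch of delimiter-safe output, assuming RFC 4180-style quoting (wrap such fields in double quotes and double any embedded quotes). escapeField and toCsv are hypothetical helper names, and the sample rows are made up:

```javascript
// Sketch of delimiter-safe CSV output (an assumption, not what the original
// script does): quote a field when it contains the delimiter, a double quote
// or a line break, and double any embedded double quotes (RFC 4180 style).
function escapeField(value, delimiter = ';') {
  const text = String(value);
  const needsQuotes = text.includes(delimiter) || /["\r\n]/.test(text);
  return needsQuotes ? `"${text.replace(/"/g, '""')}"` : text;
}

// Turn an array of rows (arrays of cells) into CSV text, one line per row
function toCsv(rows, delimiter = ';') {
  return rows
    .map((row) => row.map((cell) => escapeField(cell, delimiter)).join(delimiter))
    .join('\n');
}

// Example with made-up rows
const sample = [
  ['Film', 'Year'],
  ['Title; with semicolon', '2000'],
  ['Title with "quotes"', '2001'],
];
console.log(toCsv(sample));
// Film;Year
// "Title; with semicolon";2000
// "Title with ""quotes""";2001
```

With these helpers, the CSV content would be built as const csvContent = toCsv(tableData); and written with the same fs.writeFileSync call.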
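Because the script uses top-level await, Node.js has to load it as an ES module: add "type": "module" to your package.json (or name the file scraper.mjs). Then run it:
node scraper.js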
I’ve written other tutorials about scraping here on dev.to, with Go and Python. If this article helped you or you enjoyed it, consider contributing: