Hello and welcome to the second post in this series on web scraping with Puppeteer. If you missed the first post, you can check it out here. In this post we'll pick up where we left off and scrape some weather data from weather.com. The goal is to scrape the 10-day forecast for Austin, Texas. Feel free to swap out Austin for your favorite city.
Picking up where we left off
Our scrape function that we created in the previous post looks like this:
async function scrape() {
  const browser = await puppeteer.launch({ dumpio: true });
  const page = await browser.newPage();
  await page.goto("https://weather.com/weather/tenday/l/Austin+TX");
  const weatherData = await page.evaluate(() =>
    Array.from(
      document.querySelectorAll(".DaypartDetails--DayPartDetail--2XOOV"),
      (e) => ({
        date: e.querySelector("h3").innerText,
      })
    )
  );
  await browser.close();
  return weatherData;
}

const scrapedData = await scrape();
console.log(scrapedData);
Let's now add to the weatherData. In addition to the innerText of the h3, we'll get the high temperature, the low temperature, and the precipitation percentage for the day.
Let's have a look at how we can do that:
const weatherData = await page.evaluate(() =>
  Array.from(
    document.querySelectorAll(".DaypartDetails--DayPartDetail--2XOOV"),
    (e) => ({
      date: e.querySelector("h3").innerText,
      highTemp: e.querySelector(".DetailsSummary--highTempValue--3PjlX")
        .innerText,
      lowTemp: e.querySelector(".DetailsSummary--lowTempValue--2tesQ")
        .innerText,
      precipitationPercentage: e.querySelector(
        ".DetailsSummary--precip--1a98O"
      ).innerText,
    })
  )
);
As you can see, I am adding three new properties to the object that's returned by the Array.from mapping function: highTemp, lowTemp, and precipitationPercentage. I found the class names by inspecting the document in the browser. These values work for now, but the suffixes (such as 2XOOV) look auto-generated, so the selectors may have to be updated if the site's markup changes.
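Since the trailing part of each class name (for example 3PjlX) looks build-generated, one way to soften that dependency is to match only the stable prefix. The snippet below is a minimal sketch under that assumption, not something weather.com documents: toPrefixSelector is a hypothetical helper that turns a full class selector into a [class*="…"] attribute selector, and it assumes the hash is whatever follows the last "--".

```javascript
// Hypothetical helper (not part of Puppeteer or weather.com):
// turn ".DetailsSummary--highTempValue--3PjlX" into a selector that
// matches on the stable prefix and ignores the trailing hash.
function toPrefixSelector(classSelector) {
  // Remove the leading dot so we are left with the bare class name.
  const className = classSelector.replace(/^\./, "");
  // Assumption: everything after the last "--" is the generated hash.
  const hashStart = className.lastIndexOf("--");
  const prefix = hashStart > 0 ? className.slice(0, hashStart + 2) : className;
  // [class*="..."] matches elements whose class attribute contains the prefix.
  return `[class*="${prefix}"]`;
}

console.log(toPrefixSelector(".DetailsSummary--highTempValue--3PjlX"));
// [class*="DetailsSummary--highTempValue--"]
```

The resulting string can be passed to querySelector in place of the full class name. The trade-off is that a substring match is looser, so it is worth checking that it still selects only the element you want.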
Let's now run node scraper.js in the terminal and check out the results:
[
{
date: 'Tonight',
highTemp: '--',
lowTemp: '31°',
precipitationPercentage: '84%'
},
{
date: 'Thu 02',
highTemp: '41°',
lowTemp: '32°',
precipitationPercentage: '53%'
},
{
date: 'Fri 03',
highTemp: '55°',
lowTemp: '30°',
precipitationPercentage: '6%'
},
{
date: 'Sat 04',
highTemp: '57°',
lowTemp: '40°',
precipitationPercentage: '7%'
},
{
date: 'Sun 05',
highTemp: '64°',
lowTemp: '47°',
precipitationPercentage: '9%'
},
{
date: 'Mon 06',
highTemp: '71°',
lowTemp: '58°',
precipitationPercentage: '14%'
},
{
date: 'Tue 07',
highTemp: '68°',
lowTemp: '50°',
precipitationPercentage: '54%'
},
{
date: 'Wed 08',
highTemp: '60°',
lowTemp: '47°',
precipitationPercentage: '40%'
},
{
date: 'Thu 09',
highTemp: '60°',
lowTemp: '42°',
precipitationPercentage: '52%'
},
{
date: 'Fri 10',
highTemp: '62°',
lowTemp: '38°',
precipitationPercentage: '17%'
},
{
date: 'Sat 11',
highTemp: '59°',
lowTemp: '42°',
precipitationPercentage: '11%'
},
{
date: 'Sun 12',
highTemp: '64°',
lowTemp: '48°',
precipitationPercentage: '15%'
},
{
date: 'Mon 13',
highTemp: '67°',
lowTemp: '51°',
precipitationPercentage: '24%'
},
{
date: 'Tue 14',
highTemp: '71°',
lowTemp: '51°',
precipitationPercentage: '24%'
},
{
date: 'Wed 15',
highTemp: '70°',
lowTemp: '50°',
precipitationPercentage: '21%'
}
]
Very cool. We're getting the values I'd expect to get.
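All of these values come back as strings, and the first entry has '--' where a high temperature would normally be. If you'd rather work with numbers, here is a minimal sketch of a post-processing step. The helpers toNumber and normalizeDay are hypothetical names I'm introducing for illustration, and the sketch assumes every value is either a placeholder or contains a plain integer.

```javascript
// Hypothetical helper: pull the integer out of strings like "31°" or "84%".
// Returns null when there are no digits, e.g. for the "--" placeholder.
function toNumber(text) {
  const match = /-?\d+/.exec(text);
  return match ? Number(match[0]) : null;
}

// Convert one scraped entry into numeric fields, keeping the date label as-is.
function normalizeDay(day) {
  return {
    date: day.date,
    highTemp: toNumber(day.highTemp),
    lowTemp: toNumber(day.lowTemp),
    precipitationPercentage: toNumber(day.precipitationPercentage),
  };
}

const tonight = normalizeDay({
  date: "Tonight",
  highTemp: "--",
  lowTemp: "31°",
  precipitationPercentage: "84%",
});
console.log(tonight);
// { date: 'Tonight', highTemp: null, lowTemp: 31, precipitationPercentage: 84 }
```

To apply it to the whole result, you could call scrapedData.map(normalizeDay) after the scrape function returns.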
GitHub Repo
I've set up a GitHub repository for this project. You can find the link here. Feel free to fork or clone this repository and play around. If you're not too comfortable with using git, there are plenty of resources out there. If you'd be interested in a beginner-friendly tutorial, please let me know in the comment section.
Wrapping up
In this post we were able to scrape a bit more weather forecast data and return it in our scrape function. In the next post I'll show you how to create a GitHub Action that will run the scrape function once a day and save the scraped weather data in a .json file in the same GitHub repository.