Python is one of the most popular languages for web scraping. It is widely used to build automated scrapers that extract valuable data from the internet for purposes such as data analysis, SEO monitoring, and news and media monitoring.
In this post, we’ll be creating a web scraping tool to scrape Google News Results with Python and the Google News API.
Why Scrape Google News Results?
Scraping Google News Results provides several benefits, including:
Brand Monitoring: Scraping Google News Results can help you monitor media coverage and public perception of your brand. It lets you keep track of any issues or negative publicity that could affect your business.
Keeps You Updated: News results keep you up to date on current political events around the world. They also help you follow the latest advancements in your areas of interest.
Market Research: Google News Results can help you study historical trends in your industry, and the data can also be used for research purposes such as consumer sentiment and competitor analysis.
Competitor Analysis: You can use news data to monitor your competitors' latest developments and product launches. You can also study their media strategy to identify gaps in your own media marketing tactics.
Let’s Start Scraping Google News Using Python
In this post, we'll write a Python script that extracts the first 100 Google News results, including the title, description, link, source, and date.
Requirements
To scrape Google News, we will be installing these two libraries:
Beautiful Soup: Used for parsing the raw HTML data.
Requests: Used for making HTTP requests.
You can install both libraries by running the following commands in your terminal:
pip install requests
pip install beautifulsoup4
Process:
Before starting, I assume you have already set up a Python project on your machine. Open the project in your code editor and import the libraries we will use in this tutorial (json is part of Python's standard library, so it needs no installation):
import requests
import json
from bs4 import BeautifulSoup
Now, let’s create a function to scrape the Google News Results:
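def getNewsData():
    # Send a browser-like User-Agent so the request looks like a regular browser visit
    headers = {
        "User-Agent":
            "Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/101.0.4951.54 Safari/537.36"
    }
    # Request the News tab (tbm=nws) for "amazon" with US results (gl=us), up to 100 results
    response = requests.get("https://www.google.com/search?q=amazon&gl=us&tbm=nws&num=100", headers=headers)
    # Parse the raw HTML
    soup = BeautifulSoup(response.content, 'html.parser')
    # Collects one dict per news result
    news_results = []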
First, we set the User-Agent header so that the request looks like it comes from a regular web browser rather than a script, which makes it less likely to be blocked. Then we make an HTTP request to the target URL with the Requests library and store the response in the response variable. In the URL, q is the search query, gl is the country, tbm=nws selects the News tab, and num asks for up to 100 results. Finally, we create a BeautifulSoup instance to parse the HTML and an empty news_results list to collect the extracted articles.
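If you want to search for something other than amazon, it is easier to build the URL from a dictionary than to edit the string by hand. Below is a minimal sketch using Python's standard library; build_news_url is a hypothetical helper name, and it reuses the same q, gl, tbm, and num parameters as the URL in getNewsData():

```python
from urllib.parse import urlencode


def build_news_url(query: str, country: str = "us", num: int = 100) -> str:
    """Build a Google News search URL (hypothetical helper for this sketch)."""
    params = {
        "q": query,       # search term
        "gl": country,    # country code for localized results
        "tbm": "nws",     # restrict the search to the News tab
        "num": num,       # number of results to request
    }
    # urlencode escapes special characters, e.g. spaces become "+"
    return "https://www.google.com/search?" + urlencode(params)


print(build_news_url("amazon"))
```

With the default arguments, this reproduces the URL used above. Alternatively, you can pass the same dictionary to requests.get() through its params argument and let Requests do the encoding.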
Next, let's find the HTML tags that contain the data we need.
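If you inspect the page's HTML, you will find that every news result sits inside a div.SoaBEf tag. Searching further inside that container, you will find the link in the first a tag, the title in div.mCBkyc, the description in .GI74Re, the source in .NUnG9d span, and the date in .ZE0LJd span. Google's class names are obfuscated and change from time to time, so check them in your browser's developer tools if the script stops returning results.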
Now, use these selectors in the parser:
    # Each div.SoaBEf container holds one news result
    for el in soup.select("div.SoaBEf"):
        news_results.append({
            "link": el.find("a")["href"],
            "title": el.select_one("div.mCBkyc").get_text(),
            "snippet": el.select_one(".GI74Re").get_text(),
            "date": el.select_one(".ZE0LJd span").get_text(),
            "source": el.select_one(".NUnG9d span").get_text(),
        })
    print(json.dumps(news_results, indent=2))

getNewsData()
Now let's run the script in the terminal to see the results:
[
  {
    "link": "https://people.com/home/housewarming-gifts-amazon-march-2023/",
    "title": "15 Beautiful Housewarming Gifts Under $100 at Amazon",
    "snippet": "Peak moving season has arrived, and you may have a few housewarmings on the \nhorizon. Don't show up empty-handed \u2014 pick up one of these...",
    "date": "6 hours ago",
    "source": "People"
  },
  {
    "link": "https://www.yahoo.com/lifestyle/zombie-pack-mask-amazon-deal-154108906.html",
    "title": "'I look 7 years younger': The popular Zombie Mask is down to $17 at Amazon, \ntoday only",
    "snippet": "Daydreaming about having tighter skin? How about smaller pores? We're \nalways on the lookout for beauty products that can help get back that...",
    "date": "2 hours ago",
    "source": "Yahoo"
  },
  {
    "link": "https://people.com/home/front-porch-spring-decor-under-25-amazon/",
    "title": "10 Under-$25 Amazon Finds to Upgrade Your Front Porch This Spring",
    "snippet": "If you're looking to give your front porch a spring-ready makeover, check \nout Amazon's selection of front porch decor.",
    "date": "4 hours ago",
    "source": "People"
  },
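The loop in getNewsData() chains .get_text() directly onto select_one(), which returns None when a selector matches nothing, so a single result without a date or snippet stops the whole script with an AttributeError. A more forgiving variant could store None for missing fields instead. This is a sketch, not the original script: the helper names text_or_none and parse_news and the sample HTML are hypothetical, and the class names are the ones found above, which Google may change.

```python
from bs4 import BeautifulSoup

# Hypothetical markup imitating two Google News results; the second has no snippet or date.
SAMPLE_HTML = """
<div class="SoaBEf">
  <a href="https://example.com/story">
    <div class="mCBkyc">Example headline</div>
    <div class="GI74Re">Example snippet</div>
    <div class="ZE0LJd"><span>2 hours ago</span></div>
    <div class="NUnG9d"><span>Example Source</span></div>
  </a>
</div>
<div class="SoaBEf">
  <a href="https://example.com/other">
    <div class="mCBkyc">Headline without a snippet</div>
    <div class="NUnG9d"><span>Another Source</span></div>
  </a>
</div>
"""


def text_or_none(parent, selector):
    """Return the stripped text of the first element matching selector, or None."""
    node = parent.select_one(selector)  # None when nothing matches
    return node.get_text(strip=True) if node is not None else None


def parse_news(html):
    """Extract news results, using None for any field that is missing."""
    soup = BeautifulSoup(html, "html.parser")
    results = []
    for el in soup.select("div.SoaBEf"):
        anchor = el.find("a")
        results.append({
            "link": anchor.get("href") if anchor is not None else None,
            "title": text_or_none(el, "div.mCBkyc"),
            "snippet": text_or_none(el, ".GI74Re"),
            "date": text_or_none(el, ".ZE0LJd span"),
            "source": text_or_none(el, ".NUnG9d span"),
        })
    return results


for item in parse_news(SAMPLE_HTML):
    print(item)
```

To use it with the live page, you could pass response.text from the request above to parse_news() in place of SAMPLE_HTML.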
With Google News API
If you don't want to write and maintain the scraper yourself in the long run, you can use our Google News API to scrape News Results instead.
We offer 100 free requests when you first sign up.
After registering on our website, you will get an API key. Add it to the code below, and you will be able to scrape Google News Results much faster.
import requests
# Replace APIKEY with the API key from your dashboard
payload = {'api_key': 'APIKEY', 'q': 'football', 'gl': 'us'}
resp = requests.get('https://api.serpdog.io/news', params=payload)
print(resp.text)
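Hardcoding the key works for a quick test, but it is easy to leak it by committing the file. A minimal sketch of one alternative reads the key from an environment variable and adds a timeout and a status check. SERPDOG_API_KEY, build_params, and fetch_news are hypothetical names chosen for this example; the endpoint and the api_key, q, and gl parameters are the ones used above.

```python
import os

import requests

API_URL = "https://api.serpdog.io/news"  # endpoint from the example above


def build_params(query, country="us", api_key=None):
    """Build the request parameters.

    Falls back to the SERPDOG_API_KEY environment variable (a hypothetical
    name for this sketch) when no key is passed explicitly.
    """
    key = api_key if api_key is not None else os.environ["SERPDOG_API_KEY"]
    return {"api_key": key, "q": query, "gl": country}


def fetch_news(query, country="us", api_key=None):
    """Call the news endpoint and return the raw response body as text."""
    resp = requests.get(API_URL, params=build_params(query, country, api_key), timeout=30)
    resp.raise_for_status()  # raise requests.HTTPError on 4xx/5xx responses
    return resp.text
```

After setting the SERPDOG_API_KEY environment variable, print(fetch_news("football")) should behave like the snippet above, except that HTTP errors raise an exception instead of being printed.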
Conclusion:
In this tutorial, we learned how to scrape Google News Results using Python. Feel free to message me if anything needs clarification. Follow me on Twitter. Thanks for reading!
Frequently Asked Questions
Q. Can you scrape Google News Results?
Yes. You can scrape them with Python as shown in this tutorial, or use the Serpdog API, which can handle millions of API calls per second without getting blocked or running into CAPTCHAs.