In this tutorial, we'll walk through how to analyze the sentiment of news headlines using VADER — a fast and effective sentiment analysis model designed for social media text, which works well for news headlines too.
We will:
- Collect news data using HasData's Google News API.
- Use VADER to analyze the sentiment of each article's headline and snippet.
- Classify articles into positive, neutral, and negative categories.
- Save the results in a structured JSON format.
Table of Contents
- Introduction
- Setup
- Fetching Google News Data
- Sentiment Analysis with VADER
- Saving and Visualizing Results
- Full Code
- Next Steps
- Further Reading
Introduction
Understanding the sentiment of news headlines is essential for various applications like trend analysis, brand monitoring, and political sentiment tracking.
To perform sentiment analysis, we’ll use VADER (Valence Aware Dictionary and sEntiment Reasoner), which is particularly effective for short texts such as headlines and tweets. The key advantage of VADER is that it’s a lightweight, rule-based model that works well for social media and news data, making it an ideal choice for our case.
Setup
You will need the following tools and libraries:
- Python 3.x
- requests for API calls
- nltk for sentiment analysis
- matplotlib for optional visualizations
- json (built into Python) for saving results
pip install requests matplotlib nltk
You also need to download the VADER lexicon from NLTK:
import nltk
nltk.download('vader_lexicon')
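If the script runs more than once, you can guard the download so the lexicon is only fetched when it isn't already cached; here's a small sketch using NLTK's data lookup:
import nltk

try:
    # Look for the lexicon in NLTK's local data directories
    nltk.data.find('sentiment/vader_lexicon.zip')
except LookupError:
    nltk.download('vader_lexicon')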
Make sure to get your HasData API key from your HasData dashboard.
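If you'd prefer not to hardcode the key, a common pattern is to read it from an environment variable (the name HASDATA_API_KEY below is our own choice, not something HasData requires):
import os

# Assumes HASDATA_API_KEY was exported in your shell beforehand
API_KEY = os.environ["HASDATA_API_KEY"]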
Fetching Google News Data
We’ll start by collecting news data from Google News using HasData's API. The goal is to fetch the headline and snippet from each article.
import requests
import json

API_KEY = "HASDATA-API-KEY"  # Replace with your HasData API key

# Base request parameters; empty values are filtered out below
params_raw = {
    "q": "",
    "gl": "us",
    "hl": "en",
    "topicToken": "CAAqJggKIiBDQkFTRWdvSUwyMHZNREpxYW5RU0FtVnVHZ0pWVXlnQVAB",  # Example: Entertainment
}
# Keep only the parameters that actually have a value
params = {k: v for k, v in params_raw.items() if v}

news_url = "https://api.hasdata.com/scrape/google/news"
news_headers = {"Content-Type": "application/json", "x-api-key": API_KEY}

# Fetch the news data and fail fast on HTTP errors
resp = requests.get(news_url, params=params, headers=news_headers)
resp.raise_for_status()
data = resp.json()
In this example, we're fetching articles from the Entertainment section. You can change the topicToken to explore other categories, such as Business or Technology.
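If you'd rather search by keyword than browse a curated topic, drop topicToken and set q instead; the query string below is just an example:
# Keyword search instead of a topic feed
params_raw = {
    "q": "artificial intelligence",
    "gl": "us",
    "hl": "en",
}
params = {k: v for k, v in params_raw.items() if v}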
Sentiment Analysis with VADER
Now that we have the data, let’s analyze the sentiment of each article’s headline and snippet. VADER’s compound score ranges from -1 (most negative) to +1 (most positive), and the conventional thresholds are:
- Positive sentiment: +0.05 or higher
- Negative sentiment: -0.05 or lower
- Neutral sentiment: between -0.05 and +0.05
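To get a feel for these thresholds before running the full pipeline, you can score a sample headline (the headline is made up, and exact values depend on your NLTK lexicon version):
from nltk.sentiment.vader import SentimentIntensityAnalyzer

sid = SentimentIntensityAnalyzer()
# Print all four VADER scores for one sample headline
print(sid.polarity_scores("Studio celebrates record-breaking box office weekend"))
# -> something like {'neg': 0.0, 'neu': ..., 'pos': ..., 'compound': ...}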
We'll classify the news articles based on these scores.
from nltk.sentiment.vader import SentimentIntensityAnalyzer

sid = SentimentIntensityAnalyzer()
grouped_news = {"positive": [], "neutral": [], "negative": []}

for item in data.get("newsResults", []):
    highlight = item.get("highlight", {})
    text_to_analyze = highlight.get("title", "") + " " + highlight.get("snippet", "")
    score = sid.polarity_scores(text_to_analyze)

    news_item = {
        "title": highlight.get("title"),
        "link": highlight.get("link"),
        "snippet": highlight.get("snippet"),
        "source_name": highlight.get("source", {}).get("name"),
        "date": highlight.get("date"),
        "thumbnail": highlight.get("thumbnail"),
    }

    if score['compound'] >= 0.05:
        grouped_news["positive"].append(news_item)
    elif score['compound'] <= -0.05:
        grouped_news["negative"].append(news_item)
    else:
        grouped_news["neutral"].append(news_item)
In this code:
- Sentiment score: we analyze both the title and snippet of each article using VADER's polarity_scores.
- Classification: based on the compound score, each article is labeled positive, neutral, or negative.
- Grouping: every article is saved into exactly one of the three categories.
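If you expect to tweak the cutoffs later, the threshold logic can be pulled into a small helper; this is just an optional refactor of the code above:
def classify(compound):
    """Map a VADER compound score to one of our three buckets."""
    if compound >= 0.05:
        return "positive"
    if compound <= -0.05:
        return "negative"
    return "neutral"

# Usage inside the loop:
# grouped_news[classify(score['compound'])].append(news_item)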
Saving and Visualizing Results
Once the sentiment analysis is complete, we’ll save the results in a structured JSON format.
with open("news_sentiment.json", "w", encoding="utf-8") as f:
    json.dump(grouped_news, f, ensure_ascii=False, indent=2)
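Because the output is plain JSON, you can load it back at any time, for example to count how many articles landed in each category:
with open("news_sentiment.json", encoding="utf-8") as f:
    loaded = json.load(f)

# Print the number of articles per sentiment bucket
print({sentiment: len(articles) for sentiment, articles in loaded.items()})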
Optionally, you can also visualize the distribution of positive, neutral, and negative articles using matplotlib.
import matplotlib.pyplot as plt
sentiments = ['positive', 'neutral', 'negative']
counts = [len(grouped_news['positive']), len(grouped_news['neutral']), len(grouped_news['negative'])]
plt.figure(figsize=(8, 6))
plt.bar(sentiments, counts, color=['green', 'gray', 'red'])
plt.title("News Sentiment Distribution")
plt.xlabel("Sentiment")
plt.ylabel("Number of Articles")
plt.show()
This chart will give you a quick overview of how the news is distributed across positive, neutral, and negative sentiments.
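If you're running the script on a server or in CI where no display is available, you can write the chart to a file instead; plt.savefig is standard matplotlib:
# Call this before (or instead of) plt.show()
plt.savefig("news_sentiment.png", dpi=150, bbox_inches="tight")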
Full Code
Here's the full code that combines everything, from fetching news data to performing sentiment analysis and visualizing the results. You can easily copy and run it:
import requests
import json
import nltk
import matplotlib.pyplot as plt
from nltk.sentiment.vader import SentimentIntensityAnalyzer

# Download NLTK's VADER lexicon (only needed once per machine)
nltk.download('vader_lexicon')
API_KEY = "HASDATA-API-KEY" # Replace with your HasData API Key
# Set parameters for the API request
params_raw = {
    "q": "",
    "gl": "us",
    "hl": "en",
    "topicToken": "CAAqJggKIiBDQkFTRWdvSUwyMHZNREpxYW5RU0FtVnVHZ0pWVXlnQVAB",  # Example: Entertainment
}
params = {k: v for k, v in params_raw.items() if v}
news_url = "https://api.hasdata.com/scrape/google/news"
news_headers = {"Content-Type": "application/json", "x-api-key": API_KEY}
# Fetch news data from Google News
resp = requests.get(news_url, params=params, headers=news_headers)
resp.raise_for_status()
data = resp.json()
# Initialize VADER SentimentIntensityAnalyzer
sid = SentimentIntensityAnalyzer()
# Create a dictionary to store the categorized news
grouped_news = {"positive": [], "neutral": [], "negative": []}
# Process each article and classify based on sentiment
for item in data.get("newsResults", []):
    highlight = item.get("highlight", {})
    text_to_analyze = highlight.get("title", "") + " " + highlight.get("snippet", "")
    score = sid.polarity_scores(text_to_analyze)

    news_item = {
        "title": highlight.get("title"),
        "link": highlight.get("link"),
        "snippet": highlight.get("snippet"),
        "source_name": highlight.get("source", {}).get("name"),
        "date": highlight.get("date"),
        "thumbnail": highlight.get("thumbnail"),
    }

    # Classify based on the compound sentiment score
    if score['compound'] >= 0.05:
        grouped_news["positive"].append(news_item)
    elif score['compound'] <= -0.05:
        grouped_news["negative"].append(news_item)
    else:
        grouped_news["neutral"].append(news_item)
# Save the results in a JSON file
with open("news_sentiment.json", "w", encoding="utf-8") as f:
    json.dump(grouped_news, f, ensure_ascii=False, indent=2)
# Visualize sentiment distribution
sentiments = ['positive', 'neutral', 'negative']
counts = [len(grouped_news['positive']), len(grouped_news['neutral']), len(grouped_news['negative'])]
plt.figure(figsize=(8, 6))
plt.bar(sentiments, counts, color=['green', 'gray', 'red'])
plt.title("News Sentiment Distribution")
plt.xlabel("Sentiment")
plt.ylabel("Number of Articles")
plt.show()
Next Steps
- Improve sentiment accuracy by using a custom sentiment model tailored to news data.
- Combine this with topic modeling to analyze sentiment by specific topics.
- Use this analysis for real-time news tracking in specific domains (e.g., politics or business).
Further Reading
For more advanced Google News scraping techniques, check out our full blog post on HasData: Google News Scraping: RSS, SERP, and Topic Pages.