Valentina Skakun for HasData

Analyzing News Sentiment with VADER

In this tutorial, we'll walk through how to analyze the sentiment of news headlines using VADER — a fast and effective sentiment analysis model designed for social media text, which works well for news headlines too.

We will:

  1. Collect news data using HasData's Google News API.
  2. Use VADER to analyze the sentiment of each article's headline and snippet.
  3. Classify articles into positive, neutral, and negative categories.
  4. Save the results in a structured JSON format.


Introduction

Understanding the sentiment of news headlines is useful for applications such as trend analysis, brand monitoring, and political sentiment tracking.

To perform sentiment analysis, we’ll use VADER (Valence Aware Dictionary and sEntiment Reasoner), which is particularly effective for short texts such as headlines and tweets. The key advantage of VADER is that it’s a lightweight, rule-based model that works well for social media and news data, making it an ideal choice for our case.
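
To get a feel for what VADER produces, here is a quick standalone example (the exact numbers depend on your NLTK version, so the comment describes the shape of the output rather than specific values):

from nltk.sentiment.vader import SentimentIntensityAnalyzer

# Assumes nltk.download('vader_lexicon') has already been run (see Setup below).
sid = SentimentIntensityAnalyzer()
print(sid.polarity_scores("Stocks rally as markets celebrate strong earnings"))
# Prints a dict with 'neg', 'neu', 'pos', and 'compound' keys;
# 'compound' is the normalized overall score we'll rely on later.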

Setup

You will need the following tools and libraries:

  • Python 3.x
  • requests for API calls
  • nltk for sentiment analysis
  • matplotlib for optional visualizations
  • json (built into Python) for saving results
pip install requests matplotlib nltk

You also need to download the VADER lexicon from NLTK:

import nltk
nltk.download('vader_lexicon')

Make sure to get your HasData API key from your HasData dashboard.
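
A common pattern (not required by HasData, just good hygiene) is to read the key from an environment variable instead of hardcoding it in the script. The variable name here is just a convention:

import os

API_KEY = os.environ.get("HASDATA_API_KEY")  # hypothetical variable name; set it in your shell first
if not API_KEY:
    raise RuntimeError("Set the HASDATA_API_KEY environment variable first")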

Fetching Google News Data

We’ll start by collecting news data from Google News using HasData's API. The goal is to fetch the headline and snippet from each article.

import requests
import json

API_KEY = "HASDATA-API-KEY"  # Replace with your HasData API key

params_raw = {
    "q": "",
    "gl": "us",
    "hl": "en",
    "topicToken": "CAAqJggKIiBDQkFTRWdvSUwyMHZNREpxYW5RU0FtVnVHZ0pWVXlnQVAB",  # Example: Entertainment
}

params = {k: v for k, v in params_raw.items() if v}
news_url = "https://api.hasdata.com/scrape/google/news"
news_headers = {"Content-Type": "application/json", "x-api-key": API_KEY}

resp = requests.get(news_url, params=params, headers=news_headers)
resp.raise_for_status()
data = resp.json()

In this example, we're fetching articles from the Entertainment section. You can change the topicToken to explore other categories like Business, Technology, etc.
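
Because the dict comprehension above drops empty values, you can also switch from a topic feed to a keyword search by filling in q and leaving topicToken empty (this assumes the endpoint accepts a q parameter for keyword queries, which the empty q field in the original request suggests):

params_raw = {
    "q": "artificial intelligence",  # keyword query instead of a topic feed
    "gl": "us",
    "hl": "en",
    "topicToken": "",  # left empty, so it's filtered out below
}
params = {k: v for k, v in params_raw.items() if v}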

Sentiment Analysis with VADER

Now that we have the data, let’s analyze the sentiment of each article’s headline and snippet. VADER returns several scores per text; the one we’ll use is the compound score, a normalized value between -1 (most negative) and +1 (most positive). The conventional thresholds are:

  • Positive sentiment: +0.05 or higher
  • Negative sentiment: -0.05 or lower
  • Neutral sentiment: between -0.05 and +0.05

We'll classify the news articles based on these scores.

from nltk.sentiment.vader import SentimentIntensityAnalyzer

sid = SentimentIntensityAnalyzer()

grouped_news = {"positive": [], "neutral": [], "negative": []}

for item in data.get("newsResults", []):
    highlight = item.get("highlight", {})
    text_to_analyze = highlight.get("title", "") + " " + highlight.get("snippet", "")

    score = sid.polarity_scores(text_to_analyze)
    news_item = {
        "title": highlight.get("title"),
        "link": highlight.get("link"),
        "snippet": highlight.get("snippet"),
        "source_name": highlight.get("source", {}).get("name"),
        "date": highlight.get("date"),
        "thumbnail": highlight.get("thumbnail")
    }

    if score['compound'] >= 0.05:
        grouped_news["positive"].append(news_item)
    elif score['compound'] <= -0.05:
        grouped_news["negative"].append(news_item)
    else:
        grouped_news["neutral"].append(news_item)

In this code:

  • Sentiment score: We analyze both the title and snippet of each news article using VADER's polarity_scores.
  • Classification: Based on the compound score, we classify the article as positive, neutral, or negative.
  • Group data: All articles are saved into one of these three categories.
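
As a quick sanity check before saving, you can print how many articles landed in each bucket:

for label, items in grouped_news.items():
    print(f"{label}: {len(items)} articles")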

Saving and Visualizing Results

Once the sentiment analysis is complete, we’ll save the results in a structured JSON format.

with open("news_sentiment.json", "w", encoding="utf-8") as f:
    json.dump(grouped_news, f, ensure_ascii=False, indent=2)

Optionally, you can also visualize the distribution of positive, neutral, and negative articles using matplotlib.

import matplotlib.pyplot as plt

sentiments = ['positive', 'neutral', 'negative']
counts = [len(grouped_news['positive']), len(grouped_news['neutral']), len(grouped_news['negative'])]

plt.figure(figsize=(8, 6))
plt.bar(sentiments, counts, color=['green', 'gray', 'red'])
plt.title("News Sentiment Distribution")
plt.xlabel("Sentiment")
plt.ylabel("Number of Articles")
plt.show()

This chart will give you a quick overview of how the news is distributed across positive, neutral, and negative sentiments.
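
If you’re running this on a server or anywhere plt.show() can’t open a window, you can write the chart to a file instead. Call this before plt.show() (or in place of it), since showing a figure clears it:

plt.savefig("news_sentiment.png", dpi=150, bbox_inches="tight")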

Full Code

Here's the full script, from fetching the news data to performing sentiment analysis and visualizing the results. Paste in your API key and run it:

import requests
import json
from nltk.sentiment.vader import SentimentIntensityAnalyzer
import matplotlib.pyplot as plt
import nltk

# Download NLTK's VADER lexicon
nltk.download('vader_lexicon')

API_KEY = "HASDATA-API-KEY"  # Replace with your HasData API Key

# Set parameters for the API request
params_raw = {
    "q": "",
    "gl": "us",
    "hl": "en",
    "topicToken": "CAAqJggKIiBDQkFTRWdvSUwyMHZNREpxYW5RU0FtVnVHZ0pWVXlnQVAB",  # Example: Entertainment
}

params = {k: v for k, v in params_raw.items() if v}
news_url = "https://api.hasdata.com/scrape/google/news"
news_headers = {"Content-Type": "application/json", "x-api-key": API_KEY}

# Fetch news data from Google News
resp = requests.get(news_url, params=params, headers=news_headers)
resp.raise_for_status()
data = resp.json()

# Initialize VADER SentimentIntensityAnalyzer
sid = SentimentIntensityAnalyzer()

# Create a dictionary to store the categorized news
grouped_news = {"positive": [], "neutral": [], "negative": []}

# Process each article and classify based on sentiment
for item in data.get("newsResults", []):
    highlight = item.get("highlight", {})
    text_to_analyze = highlight.get("title", "") + " " + highlight.get("snippet", "")

    score = sid.polarity_scores(text_to_analyze)
    news_item = {
        "title": highlight.get("title"),
        "link": highlight.get("link"),
        "snippet": highlight.get("snippet"),
        "source_name": highlight.get("source", {}).get("name"),
        "date": highlight.get("date"),
        "thumbnail": highlight.get("thumbnail")
    }

    # Classify based on sentiment score
    if score['compound'] >= 0.05:
        grouped_news["positive"].append(news_item)
    elif score['compound'] <= -0.05:
        grouped_news["negative"].append(news_item)
    else:
        grouped_news["neutral"].append(news_item)

# Save the results in a JSON file
with open("news_sentiment.json", "w", encoding="utf-8") as f:
    json.dump(grouped_news, f, ensure_ascii=False, indent=2)

# Visualize sentiment distribution
sentiments = ['positive', 'neutral', 'negative']
counts = [len(grouped_news['positive']), len(grouped_news['neutral']), len(grouped_news['negative'])]

plt.figure(figsize=(8, 6))
plt.bar(sentiments, counts, color=['green', 'gray', 'red'])
plt.title("News Sentiment Distribution")
plt.xlabel("Sentiment")
plt.ylabel("Number of Articles")
plt.show()

Next Steps

  • Improve sentiment accuracy by using a custom sentiment model tailored to news data.
  • Combine this with topic modeling to analyze sentiment by specific topics (the per-source breakdown sketched after this list is a simpler starting point).
  • Use this analysis for real-time news tracking in specific domains (e.g., political, business).
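
As a minimal sketch in that direction, here is one way to slice sentiment by news source rather than by topic. It reuses the sid analyzer and grouped_news dict from above and re-scores each article; the per-source averaging is an illustration, not part of the original pipeline:

from collections import defaultdict

# Collect compound scores per source across all three buckets.
source_scores = defaultdict(list)
for items in grouped_news.values():
    for article in items:
        text = (article.get("title") or "") + " " + (article.get("snippet") or "")
        source = article.get("source_name") or "unknown"
        source_scores[source].append(sid.polarity_scores(text)["compound"])

# Print the average compound score per source.
for source, scores in sorted(source_scores.items()):
    print(f"{source}: avg compound = {sum(scores) / len(scores):.3f} ({len(scores)} articles)")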

Further Reading

For more advanced Google News scraping techniques, check out our full blog post on HasData: Google News Scraping: RSS, SERP, and Topic Pages.
