Valentina Skakun for HasData

Analyzing News Sentiment with VADER

In this tutorial, we'll walk through how to analyze the sentiment of news headlines using VADER — a fast and effective sentiment analysis model designed for social media text, which works well for news headlines too.

We will:

  1. Collect news data using HasData's Google News API.
  2. Use VADER to analyze the sentiment of each article's headline and snippet.
  3. Classify articles into positive, neutral, and negative categories.
  4. Save the results in a structured JSON format.


Introduction

Understanding the sentiment of news headlines is useful for applications such as trend analysis, brand monitoring, and political sentiment tracking.

To perform sentiment analysis, we’ll use VADER (Valence Aware Dictionary and sEntiment Reasoner), which is particularly effective for short texts such as headlines and tweets. The key advantage of VADER is that it’s a lightweight, rule-based model that works well for social media and news data, making it an ideal choice for our case.
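
To get a feel for what VADER produces, here is a quick standalone example (the exact numbers depend on your NLTK version, so the comment describes the shape of the output rather than specific values):

from nltk.sentiment.vader import SentimentIntensityAnalyzer

# Assumes nltk.download('vader_lexicon') has already been run (see Setup below).
sid = SentimentIntensityAnalyzer()
print(sid.polarity_scores("Stocks rally as markets celebrate strong earnings"))
# Prints a dict with 'neg', 'neu', 'pos', and 'compound' keys;
# 'compound' is the normalized overall score we'll rely on later.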

Setup

You will need the following tools and libraries:

  • Python 3.x
  • requests for API calls
  • nltk for sentiment analysis
  • matplotlib for optional visualizations
  • json (built into Python) for saving results
pip install requests matplotlib nltk

You also need to download the VADER lexicon from NLTK:

import nltk
nltk.download('vader_lexicon')

Make sure to get your HasData API key from your HasData dashboard.
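
A common pattern (not required by HasData, just good hygiene) is to read the key from an environment variable instead of hardcoding it in the script. The variable name here is just a convention:

import os

API_KEY = os.environ.get("HASDATA_API_KEY")  # hypothetical variable name; set it in your shell first
if not API_KEY:
    raise RuntimeError("Set the HASDATA_API_KEY environment variable first")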

Fetching Google News Data

We’ll start by collecting news data from Google News using HasData's API. The goal is to fetch the headline and snippet from each article.

import requests
import json

API_KEY = "HASDATA-API-KEY"  # Replace with your HasData API key

params_raw = {
    "q": "",
    "gl": "us",
    "hl": "en",
    "topicToken": "CAAqJggKIiBDQkFTRWdvSUwyMHZNREpxYW5RU0FtVnVHZ0pWVXlnQVAB",  # Example: Entertainment
}

params = {k: v for k, v in params_raw.items() if v}
news_url = "https://api.hasdata.com/scrape/google/news"
news_headers = {"Content-Type": "application/json", "x-api-key": API_KEY}

resp = requests.get(news_url, params=params, headers=news_headers)
resp.raise_for_status()
data = resp.json()

In this example, we're fetching articles from the Entertainment section. You can change the topicToken to explore other categories like Business, Technology, etc.
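
Because the dict comprehension above drops empty values, you can also switch from a topic feed to a keyword search by filling in q and leaving topicToken empty (this assumes the endpoint accepts a q parameter for keyword queries, which the empty q field in the original request suggests):

params_raw = {
    "q": "artificial intelligence",  # keyword query instead of a topic feed
    "gl": "us",
    "hl": "en",
    "topicToken": "",  # left empty, so it's filtered out below
}
params = {k: v for k, v in params_raw.items() if v}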

Sentiment Analysis with VADER

Now that we have the data, let’s analyze the sentiment of each article’s headline and snippet. VADER returns several scores per text; the one we’ll use is the compound score, a normalized value between -1 (most negative) and +1 (most positive). The conventional thresholds are:

  • Positive sentiment: +0.05 or higher
  • Negative sentiment: -0.05 or lower
  • Neutral sentiment: between -0.05 and +0.05

We'll classify the news articles based on these scores.

from nltk.sentiment.vader import SentimentIntensityAnalyzer

sid = SentimentIntensityAnalyzer()

grouped_news = {"positive": [], "neutral": [], "negative": []}

for item in data.get("newsResults", []):
    highlight = item.get("highlight", {})
    text_to_analyze = highlight.get("title", "") + " " + highlight.get("snippet", "")

    score = sid.polarity_scores(text_to_analyze)
    news_item = {
        "title": highlight.get("title"),
        "link": highlight.get("link"),
        "snippet": highlight.get("snippet"),
        "source_name": highlight.get("source", {}).get("name"),
        "date": highlight.get("date"),
        "thumbnail": highlight.get("thumbnail")
    }

    if score['compound'] >= 0.05:
        grouped_news["positive"].append(news_item)
    elif score['compound'] <= -0.05:
        grouped_news["negative"].append(news_item)
    else:
        grouped_news["neutral"].append(news_item)

In this code:

  • Sentiment score: We analyze both the title and snippet of each news article using VADER's polarity_scores.
  • Classification: Based on the compound score, we classify the article as positive, neutral, or negative.
  • Group data: All articles are saved into one of these three categories.
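
As a quick sanity check before saving, you can print how many articles landed in each bucket:

for label, items in grouped_news.items():
    print(f"{label}: {len(items)} articles")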

Saving and Visualizing Results

Once the sentiment analysis is complete, we’ll save the results in a structured JSON format.

with open("news_sentiment.json", "w", encoding="utf-8") as f:
    json.dump(grouped_news, f, ensure_ascii=False, indent=2)

Optionally, you can also visualize the distribution of positive, neutral, and negative articles using matplotlib.

import matplotlib.pyplot as plt

sentiments = ['positive', 'neutral', 'negative']
counts = [len(grouped_news['positive']), len(grouped_news['neutral']), len(grouped_news['negative'])]

plt.figure(figsize=(8, 6))
plt.bar(sentiments, counts, color=['green', 'gray', 'red'])
plt.title("News Sentiment Distribution")
plt.xlabel("Sentiment")
plt.ylabel("Number of Articles")
plt.show()

This chart will give you a quick overview of how the news is distributed across positive, neutral, and negative sentiments.
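
If you’re running this on a server or anywhere plt.show() can’t open a window, you can write the chart to a file instead. Call this before plt.show() (or in place of it), since showing a figure clears it:

plt.savefig("news_sentiment.png", dpi=150, bbox_inches="tight")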

Full Code

Here's the full script, from fetching the news data to performing sentiment analysis and visualizing the results. Paste in your API key and run it:

import requests
import json
from nltk.sentiment.vader import SentimentIntensityAnalyzer
import matplotlib.pyplot as plt
import nltk

# Download NLTK's VADER lexicon
nltk.download('vader_lexicon')

API_KEY = "HASDATA-API-KEY"  # Replace with your HasData API Key

# Set parameters for the API request
params_raw = {
    "q": "",
    "gl": "us",
    "hl": "en",
    "topicToken": "CAAqJggKIiBDQkFTRWdvSUwyMHZNREpxYW5RU0FtVnVHZ0pWVXlnQVAB",  # Example: Entertainment
}

params = {k: v for k, v in params_raw.items() if v}
news_url = "https://api.hasdata.com/scrape/google/news"
news_headers = {"Content-Type": "application/json", "x-api-key": API_KEY}

# Fetch news data from Google News
resp = requests.get(news_url, params=params, headers=news_headers)
resp.raise_for_status()
data = resp.json()

# Initialize VADER SentimentIntensityAnalyzer
sid = SentimentIntensityAnalyzer()

# Create a dictionary to store the categorized news
grouped_news = {"positive": [], "neutral": [], "negative": []}

# Process each article and classify based on sentiment
for item in data.get("newsResults", []):
    highlight = item.get("highlight", {})
    text_to_analyze = highlight.get("title", "") + " " + highlight.get("snippet", "")

    score = sid.polarity_scores(text_to_analyze)
    news_item = {
        "title": highlight.get("title"),
        "link": highlight.get("link"),
        "snippet": highlight.get("snippet"),
        "source_name": highlight.get("source", {}).get("name"),
        "date": highlight.get("date"),
        "thumbnail": highlight.get("thumbnail")
    }

    # Classify based on sentiment score
    if score['compound'] >= 0.05:
        grouped_news["positive"].append(news_item)
    elif score['compound'] <= -0.05:
        grouped_news["negative"].append(news_item)
    else:
        grouped_news["neutral"].append(news_item)

# Save the results in a JSON file
with open("news_sentiment.json", "w", encoding="utf-8") as f:
    json.dump(grouped_news, f, ensure_ascii=False, indent=2)

# Visualize sentiment distribution
sentiments = ['positive', 'neutral', 'negative']
counts = [len(grouped_news['positive']), len(grouped_news['neutral']), len(grouped_news['negative'])]

plt.figure(figsize=(8, 6))
plt.bar(sentiments, counts, color=['green', 'gray', 'red'])
plt.title("News Sentiment Distribution")
plt.xlabel("Sentiment")
plt.ylabel("Number of Articles")
plt.show()

Next Steps

  • Improve sentiment accuracy by using a custom sentiment model tailored to news data.
  • Combine this with topic modeling to analyze sentiment by specific topics (the per-source breakdown sketched after this list is a simpler starting point).
  • Use this analysis for real-time news tracking in specific domains (e.g., political, business).
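
As a minimal sketch in that direction, here is one way to slice sentiment by news source rather than by topic. It reuses the sid analyzer and grouped_news dict from above and re-scores each article; the per-source averaging is an illustration, not part of the original pipeline:

from collections import defaultdict

# Collect compound scores per source across all three buckets.
source_scores = defaultdict(list)
for items in grouped_news.values():
    for article in items:
        text = (article.get("title") or "") + " " + (article.get("snippet") or "")
        source = article.get("source_name") or "unknown"
        source_scores[source].append(sid.polarity_scores(text)["compound"])

# Print the average compound score per source.
for source, scores in sorted(source_scores.items()):
    print(f"{source}: avg compound = {sum(scores) / len(scores):.3f} ({len(scores)} articles)")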

Further Reading

For more advanced Google News scraping techniques, check out our full blog post on HasData: Google News Scraping: RSS, SERP, and Topic Pages.
