DEV Community

FairPrice

Building a Tech Trend Dashboard from Hacker News Data with Python

Every day, thousands of developers discuss emerging technologies on Hacker News. What if you could turn that firehose of discussion into a structured trend dashboard?

In this tutorial, we build a Python script that analyzes HN stories and comments to detect trending topics and visualize what the developer community cares about right now.

The Data Pipeline

We need stories and their comments. The official HN API gives you individual items, but fetching hundreds of stories plus nested comments is slow since each comment requires a separate HTTP call.
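To make that cost concrete, here is a minimal sketch against the official Firebase-hosted API. The `/v0/item/{id}.json` endpoint is real; the helper names, the injectable `fetch` parameter, and the comment cap are my own choices for illustration. Every story and every comment is its own round trip:

```python
import json
from urllib.request import urlopen

HN_API = "https://hacker-news.firebaseio.com/v0"

def _get_item(item_id, fetch=None):
    """Fetch one HN item as a dict; `fetch` is injectable for testing."""
    fetch = fetch or (lambda url: json.load(urlopen(url, timeout=10)))
    return fetch(f"{HN_API}/item/{item_id}.json")

def fetch_story_with_comments(story_id, max_comments=20, fetch=None):
    """One HTTP call for the story, then one more per comment id in `kids`."""
    story = _get_item(story_id, fetch)
    comments = []
    for kid in (story.get("kids") or [])[:max_comments]:
        c = _get_item(kid, fetch)
        if c and not c.get("deleted"):
            comments.append({"text": c.get("text", ""), "score": 0})
    story["comments"] = comments
    return story
```

Fetching 200 stories with 100 comments each means roughly 20,000 sequential requests, which is why a bulk source pays off.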

For bulk collection, I use the HN Stories + Comments Scraper on Apify, which grabs full story metadata and comment trees in one run. But you can adapt this pipeline to any data source.

Assume we have our data as JSON:

stories = [
    {
        "title": "Show HN: I built a Rust web framework",
        "score": 342,
        "num_comments": 156,
        "comments": [
            {"text": "Really fast compared to Actix...", "score": 45},
            {"text": "How does this handle async?", "score": 23},
        ]
    },
]

Step 1: Extract Tech Keywords

Simple keyword extraction weighted by engagement:

import re
from collections import Counter

TECH_TERMS = {
    'rust', 'python', 'go', 'typescript', 'zig', 'kotlin',
    'react', 'vue', 'svelte', 'htmx', 'nextjs',
    'llm', 'gpt', 'claude', 'openai', 'transformer',
    'kubernetes', 'docker', 'wasm', 'sqlite', 'postgres',
}

def extract_trends(stories):
    """Count tech terms across titles and comments, weighted by engagement."""
    weighted_counts = Counter()
    for story in stories:
        # Pool the title and all comment text into one searchable string.
        text = story['title'].lower()
        for c in story.get('comments', []):
            text += ' ' + c.get('text', '').lower()
        # Weight each mention by the story's score plus comment count.
        weight = story.get('score', 0) + story.get('num_comments', 0)
        for term in TECH_TERMS:
            if re.search(r'\b' + re.escape(term) + r'\b', text):
                weighted_counts[term] += weight
    return weighted_counts.most_common(15)
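A quick sanity check on why the `\b` word boundaries matter: short language names like go appear inside longer words, and plain substring search would over-count them.

```python
import re

# Plain substring search over-counts: "go" is a substring of "django".
assert "go" in "django users"

# Word boundaries only match the standalone term.
assert re.search(r"\bgo\b", "we rewrote the service in go")
assert not re.search(r"\bgo\b", "django users")
```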

Step 2: Detect Rising vs Falling Topics

Compare recent mentions against a baseline period:

from datetime import datetime, timedelta

def trend_momentum(stories, days_recent=7, days_baseline=30):
    """Compare mention rates in the recent window against the prior baseline."""
    now = datetime.now()
    recent_cutoff = now - timedelta(days=days_recent)
    baseline_cutoff = now - timedelta(days=days_baseline)
    recent, baseline = Counter(), Counter()

    for story in stories:
        ts = datetime.fromtimestamp(story.get('time', 0))
        title = story['title'].lower()
        terms = {t for t in TECH_TERMS
                 if re.search(r'\b' + re.escape(t) + r'\b', title)}
        if ts >= recent_cutoff:
            for t in terms:
                recent[t] += 1
        elif ts >= baseline_cutoff:
            for t in terms:
                baseline[t] += 1

    momentum = {}
    baseline_days = days_baseline - days_recent
    for term in TECH_TERMS:
        r = recent[term] / days_recent
        # Guard against division by zero for terms with no baseline mentions.
        b = baseline[term] / baseline_days if baseline[term] else 0.001
        momentum[term] = r / b

    rising = sorted(momentum.items(), key=lambda x: -x[1])[:5]
    falling = sorted(momentum.items(), key=lambda x: x[1])[:5]
    return rising, falling
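To sanity-check the ratio with made-up numbers: suppose a term shows up in 14 titles over the last 7 days and in 23 titles over the 23 baseline days before that.

```python
days_recent, days_baseline = 7, 30

recent_mentions = 14    # hypothetical count in the last 7 days
baseline_mentions = 23  # hypothetical count in the prior 23 days

r = recent_mentions / days_recent                      # 2.0 mentions/day
b = baseline_mentions / (days_baseline - days_recent)  # 1.0 mention/day
momentum = r / b

print(momentum)  # 2.0 -> twice the baseline mention rate
```

A momentum above 1.0 means the term is being mentioned more often per day than it was during the baseline period; below 1.0 means it is cooling off.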

Step 3: Build the Dashboard

def print_dashboard(stories):
    trends = extract_trends(stories)
    rising, falling = trend_momentum(stories)

    print("=" * 50)
    print("  HN TECH TREND DASHBOARD")
    print("=" * 50)

    print("\nTOP TECHNOLOGIES (by weighted mentions):")
    for i, (term, score) in enumerate(trends[:10], 1):
        bar = "#" * min(score // 100, 30)
        print(f"  {i:2d}. {term:12s} {bar} ({score})")

    print("\nRISING:")
    for term, ratio in rising:
        if ratio > 1.2:
            print(f"  ^ {term} ({ratio:.1f}x baseline)")

    print("\nCOOLING:")
    for term, ratio in falling:
        if ratio < 0.8:
            print(f"  v {term} ({ratio:.1f}x baseline)")

Collecting Real Data

For production, here are your options:

  1. HN Official API - Free, but slow for bulk. Good for under 50 stories.
  2. HN Algolia API - Search-oriented, great for keyword queries.
  3. Web scraping - Fragile, breaks when HN changes markup.
  4. Pre-built scrapers - Tools like the Apify HN scraper handle pagination, rate limits, and comment tree traversal.

import requests

def fetch_hn_stories(query="python", hits=100):
    """Search HN stories via the Algolia API (no auth required)."""
    url = "https://hn.algolia.com/api/v1/search"
    params = {"query": query, "tags": "story", "hitsPerPage": hits}
    resp = requests.get(url, params=params, timeout=10)
    resp.raise_for_status()
    return resp.json()["hits"]
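The Algolia response uses slightly different field names than our story schema (`points` instead of `score`, `created_at_i` for the Unix timestamp), so a small adapter bridges the gap. Note that search hits don't include comment bodies, so `comments` starts empty; this sketch only maps the fields our pipeline reads:

```python
def algolia_hit_to_story(hit):
    """Map an Algolia HN search hit onto the story schema used above."""
    return {
        "title": hit.get("title") or "",
        "score": hit.get("points") or 0,
        "num_comments": hit.get("num_comments") or 0,
        "time": hit.get("created_at_i", 0),
        "comments": [],  # search hits carry no comment text
    }
```

With this in place, `extract_trends([algolia_hit_to_story(h) for h in hits])` works unchanged on Algolia results.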

What You Can Build From Here

  • Weekly email digest of trending tech topics
  • Investment signal - track which technologies gain developer mindshare
  • Content planning tool - write about what devs are actively discussing

The code runs in under a second on a few hundred stories. For continuous monitoring, throw it in a cron job with a SQLite database. Happy trend hunting!
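For that cron-plus-SQLite setup, here is a minimal sketch of persisting each run's weighted counts; the table name and schema are my own choice, not part of the pipeline above:

```python
import sqlite3
import time

def save_snapshot(db_path, counts):
    """Append one timestamped row per term so later runs can diff trends."""
    con = sqlite3.connect(db_path)
    con.execute("""CREATE TABLE IF NOT EXISTS trend_snapshots
                   (taken_at INTEGER, term TEXT, weighted_count INTEGER)""")
    now = int(time.time())
    con.executemany(
        "INSERT INTO trend_snapshots VALUES (?, ?, ?)",
        [(now, term, count) for term, count in counts],
    )
    con.commit()
    con.close()
```

Each cron run calls `save_snapshot(db_path, extract_trends(stories))`, and week-over-week comparisons become a simple SQL query over `taken_at`.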
