Every day, thousands of stories compete for the Hacker News front page. I wanted to detect trending topics before they blow up — using nothing but public APIs and a bit of Python.
Here's how I built a simple HN trend detector, and what I learned about the data along the way.
The HN Firebase API
Hacker News exposes its data through a public Firebase API. No auth needed. The two endpoints that matter:
import requests
# Get current top 500 story IDs
top_ids = requests.get("https://hacker-news.firebaseio.com/v0/topstories.json").json()
# Get details for a single story
story = requests.get(f"https://hacker-news.firebaseio.com/v0/item/{top_ids[0]}.json").json()
print(story["title"], story["score"], story.get("descendants", 0))  # descendants = comment count; absent on job posts
This gives you title, score, author, timestamp, comment count, and URL for every item. Simple, but powerful.
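Not every item carries every field, so it can help to normalize items before using them. The sketch below is one way to do that; `summarize_item` is a hypothetical helper (not part of the API), and the defaults it picks are assumptions. It relies only on field names the API itself uses (`by`, `time`, `descendants`, `url`).

```python
def summarize_item(item):
    """Flatten a raw HN item dict into the fields used in this post.

    Returns None when the API returned null (for example, an unknown ID).
    """
    if not item:
        return None
    return {
        "id": item.get("id"),
        "title": item.get("title", ""),
        "score": item.get("score", 0),
        "author": item.get("by", ""),            # the API calls the author "by"
        "time": item.get("time", 0),             # Unix timestamp, in seconds
        "comments": item.get("descendants", 0),  # missing on job posts
        "url": item.get("url"),                  # missing on text-only posts
    }
```

With this in place, `summarize_item(story)["comments"]` is safe to read even for a job post.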
Detecting Velocity, Not Just Score
A story with 300 points after 12 hours is stale. A story with 80 points after 40 minutes is exploding. The key metric is points per hour:
import time
def velocity(story):
    age_hours = (time.time() - story["time"]) / 3600
    if age_hours < 0.1:
        return 0  # too new to judge
    return story["score"] / age_hours
# Fetch top 30 stories and rank by velocity
stories = []
for sid in top_ids[:30]:
    s = requests.get(f"https://hacker-news.firebaseio.com/v0/item/{sid}.json").json()
    if not s:
        continue  # the API can return null for an item
    s["velocity"] = velocity(s)
    stories.append(s)
stories.sort(key=lambda s: s["velocity"], reverse=True)
for s in stories[:10]:
    print(f"{s['velocity']:.1f} pts/hr | {s['score']} pts | {s['title']}")
Sample output:
142.3 pts/hr | 87 pts | Show HN: I made a tool that...
98.1 pts/hr | 203 pts | The hidden cost of...
67.4 pts/hr | 312 pts | Why we switched from...
Stories with high velocity in their first 1-2 hours almost always make it to #1.
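To make the arithmetic concrete, here is the velocity calculation applied to the two stories from the start of this section. `points_per_hour` is a hypothetical helper that takes the age in minutes directly instead of reading the clock, so the numbers are reproducible.

```python
def points_per_hour(score, age_minutes):
    """Points per hour for a story of the given age (age in minutes)."""
    return score * 60 / age_minutes

stale = points_per_hour(300, 12 * 60)  # 300 points after 12 hours
hot = points_per_hour(80, 40)          # 80 points after 40 minutes

print(f"stale: {stale:.1f} pts/hr")  # stale: 25.0 pts/hr
print(f"hot:   {hot:.1f} pts/hr")    # hot:   120.0 pts/hr
```

The older story has almost four times the score but less than a quarter of the velocity, which is why ranking by raw score hides the stories that are still climbing.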
Adding Comment Sentiment
HN comments are gold. A story with 200 points but hostile comments won't last. I don't run real sentiment analysis on the comment text; as a rough proxy, I added a simple comment-to-score ratio check:
def engagement_ratio(story):
    comments = story.get("descendants", 0)
    if story["score"] == 0:
        return 0
    return comments / story["score"]
Ratios above 1.5 usually mean controversy. Below 0.3 means people upvote but don't discuss — often link-heavy or self-explanatory content. The sweet spot (0.5-1.2) indicates genuine interest.
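Those thresholds can be turned into a small labeling function. This is a sketch, not the code from my pipeline: `classify_engagement` and the label strings are hypothetical, and the text above says nothing about the 0.3-0.5 and 1.2-1.5 bands, so the sketch leaves them unclassified rather than guessing.

```python
def classify_engagement(ratio):
    """Map a comments-to-score ratio onto the bands described above."""
    if ratio > 1.5:
        return "controversial"
    if ratio < 0.3:
        return "upvote-only"
    if 0.5 <= ratio <= 1.2:
        return "sweet spot"
    # 0.3-0.5 and 1.2-1.5 (and exactly 1.5) are not labeled in the text
    return "unclassified"

for ratio in (2.0, 0.1, 0.8, 0.4):
    print(ratio, classify_engagement(ratio))
```

Treat the labels as prompts to go read the thread, not as a verdict; the ratio says how much people are talking, not what they are saying.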
Tracking Topics Over Time
To spot trends across days, I store snapshots in SQLite:
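import sqlite3
from datetime import datetime, timezone
db = sqlite3.connect("hn_trends.db")
db.execute("""CREATE TABLE IF NOT EXISTS snapshots (
    id INTEGER, title TEXT, score INTEGER,
    comments INTEGER, velocity REAL,
    captured_at TEXT
)""")
# Store UTC timestamps as "YYYY-MM-DD HH:MM:SS", the format SQLite's datetime() returns,
# so string comparisons against datetime('now', ...) behave correctly
captured_at = datetime.now(timezone.utc).strftime("%Y-%m-%d %H:%M:%S")
for s in stories:
    db.execute(
        "INSERT INTO snapshots VALUES (?,?,?,?,?,?)",
        (s["id"], s["title"], s["score"],
         s.get("descendants", 0), s["velocity"],
         captured_at)
    )
db.commit()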
Run this every 30 minutes via cron, and after a week you can query for patterns:
-- Stories captured in 3+ snapshots over the past 7 days
SELECT title, COUNT(*) AS appearances, MAX(score) AS peak_score
FROM snapshots
WHERE captured_at > datetime('now', '-7 days')
GROUP BY id
HAVING appearances >= 3
ORDER BY peak_score DESC;
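If you would rather run that query from Python than from the sqlite3 shell, something like the following should work. It is a sketch under a few assumptions: `recurring_stories` and `make_demo_db` are hypothetical names, the demo rows are made up purely to exercise the query, and `captured_at` is assumed to be stored in SQLite's own `YYYY-MM-DD HH:MM:SS` format. The query also selects `id`, which the SQL above does not.

```python
import sqlite3

RECURRING_SQL = """
SELECT id, title, COUNT(*) AS appearances, MAX(score) AS peak_score
FROM snapshots
WHERE captured_at > datetime('now', '-7 days')
GROUP BY id
HAVING appearances >= 3
ORDER BY peak_score DESC
"""

def recurring_stories(db):
    """Return (id, title, appearances, peak_score) rows for recurring stories."""
    return db.execute(RECURRING_SQL).fetchall()

def make_demo_db():
    """Build an in-memory database with made-up snapshots (illustrative only)."""
    db = sqlite3.connect(":memory:")
    db.execute("""CREATE TABLE snapshots (
        id INTEGER, title TEXT, score INTEGER,
        comments INTEGER, velocity REAL, captured_at TEXT
    )""")
    demo_rows = [
        (1, "Story A", 50, 10, 100.0, "-3 hours"),
        (1, "Story A", 120, 40, 80.0, "-2 hours"),
        (1, "Story A", 180, 75, 60.0, "-1 hours"),
        (2, "Story B", 90, 20, 45.0, "-2 hours"),   # only two snapshots
        (2, "Story B", 95, 22, 30.0, "-1 hours"),
        (3, "Story C", 300, 150, 20.0, "-9 days"),  # outside the 7-day window
        (3, "Story C", 310, 160, 10.0, "-8 days"),
        (3, "Story C", 315, 161, 5.0, "-8 days"),
    ]
    for sid, title, score, comments, vel, offset in demo_rows:
        # Let SQLite compute the timestamp so it matches datetime('now', ...)
        db.execute(
            "INSERT INTO snapshots VALUES (?,?,?,?,?, datetime('now', ?))",
            (sid, title, score, comments, vel, offset),
        )
    db.commit()
    return db
```

Calling `recurring_stories(make_demo_db())` returns only Story A: Story B has too few snapshots and Story C falls outside the window. Against real data, pass in the connection to `hn_trends.db` instead.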
Scaling Up: When the Firebase API Isn't Enough
The Firebase API is great for real-time data, but it has limits:
- No bulk export (you fetch one item at a time)
- No historical data beyond current top/new/best lists
- Comment trees require recursive fetching (slow for 500+ comment threads)
If you need structured historical data or full comment threads at scale, Apify has several HN scrapers that handle the heavy lifting. I've been using HN Top Stories Scraper, which pulls structured data including full comment threads. That is useful when you want to analyze discussion patterns without writing your own recursive crawler.
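For smaller threads, a recursive crawler over the Firebase API is not much code. The sketch below is an illustration, not what runs in my pipeline: `fetch_item`, `fetch_thread`, and `count_comments` are hypothetical names, and it uses the standard library's `urllib` instead of `requests` so it has no dependencies. It relies on the API's `kids` field (child comment IDs) and skips items flagged `deleted` or `dead`.

```python
import json
from urllib.request import urlopen

ITEM_URL = "https://hacker-news.firebaseio.com/v0/item/{}.json"

def fetch_item(item_id):
    """Fetch one item from the HN Firebase API (one HTTP request per item)."""
    with urlopen(ITEM_URL.format(item_id), timeout=10) as resp:
        return json.load(resp)

def fetch_thread(item_id, fetch=fetch_item, max_depth=10):
    """Recursively fetch an item and its replies as a nested dict.

    Returns None for missing, deleted, or dead items. Replies under a
    skipped comment are dropped too, which is a simplification.
    """
    item = fetch(item_id)
    if not item or item.get("deleted") or item.get("dead"):
        return None
    node = {
        "id": item["id"],
        "by": item.get("by"),
        "text": item.get("text", ""),
        "replies": [],
    }
    if max_depth > 0:
        for kid in item.get("kids", []):
            child = fetch_thread(kid, fetch, max_depth - 1)
            if child is not None:
                node["replies"].append(child)
    return node

def count_comments(node):
    """Count every reply beneath a node (the node itself is not counted)."""
    return sum(1 + count_comments(reply) for reply in node["replies"])
```

Because this makes one sequential request per comment, a 500-comment thread means 500+ round trips, which is exactly the slowness described in the list above. The `fetch` parameter exists so you can swap in a cached or fake fetcher while testing.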
The Full Pipeline
My production setup:
- Cron job every 30 min → fetches top 100 via Firebase API
- Velocity calculator flags stories above 50 pts/hr
- SQLite storage for historical analysis
- Weekly digest email with recurring topics and velocity outliers
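The flagging step in that list is the simplest part. A minimal sketch, assuming each story dict already carries the `velocity` key computed earlier; `flag_fast_movers` and `VELOCITY_THRESHOLD` are hypothetical names:

```python
VELOCITY_THRESHOLD = 50.0  # pts/hr, the cutoff from the pipeline list

def flag_fast_movers(stories, threshold=VELOCITY_THRESHOLD):
    """Return stories whose velocity exceeds the threshold, fastest first."""
    flagged = [s for s in stories if s.get("velocity", 0) > threshold]
    return sorted(flagged, key=lambda s: s["velocity"], reverse=True)

sample = [
    {"title": "A", "velocity": 142.3},
    {"title": "B", "velocity": 12.0},
    {"title": "C", "velocity": 67.4},
]
for s in flag_fast_movers(sample):
    print(f"{s['velocity']:.1f} pts/hr | {s['title']}")
```

For scheduling, a crontab entry along the lines of `*/30 * * * * python3 /path/to/hn_trends.py` runs the script every 30 minutes; the script path here is a placeholder.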
Total code: ~120 lines of Python. No ML, no fancy NLP. Just velocity math and some SQL.
What I Found
After running this for a few weeks:
- AI/LLM stories consistently hit the highest velocities (80+ pts/hr)
- Show HN posts have the best engagement ratios
- Stories posted between 9-11am ET get 2x the velocity of evening posts
- The comment-to-score ratio reliably predicts whether a story stays on the front page
The full code is straightforward enough to run on any $5 VPS. If you're interested in HN data analysis, start with the Firebase API — it's surprisingly capable for a free, unauthenticated endpoint.
What patterns have you noticed on HN? Drop a comment if you've built something similar.