DEV Community

Linghua Jin


I built a real-time Hacker News trend tracker in one weekend (step-by-step guide)

Ever wondered what's actually trending on Hacker News right now? Not just today's front page, but the real patterns emerging across threads and comments?

I built a live trend tracker that monitors HN 24/7 and extracts structured topics using LLMs, all in one weekend, with the open source project CocoIndex (https://github.com/cocoindex-io/cocoindex).
And the best part? You can build your own.

Star the repo if you like it!

Why This Matters

Most developers check HN manually or rely on third-party aggregators. But what if you could:

  • Query live trends with natural language: "What's hot in AI infrastructure today?"
  • Track specific topics over time: See when Claude, Rust, or any framework starts gaining traction
  • Get structured data ready for agents, dashboards, or analytics—no scraping gymnastics

This isn't just about reading HN. It's about treating every API as a live, incremental data source that stays fresh automatically.

What I Built (The Technical Details)

The system has three main components:

1. Custom Source for HN API

I wrapped the Hacker News Algolia API with CocoIndex's Custom Source pattern:

  • list() does cheap discovery of new posts
  • get_value() only fetches changed threads (using timestamps as ordinals)
  • Result: no full refetches; only what changed gets processed
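The list/fetch split above can be sketched in plain Python. This is not CocoIndex's Custom Source API: `ListedItem`, `IncrementalSource`, and `fake_fetch` are hypothetical stand-ins used only to show the idea. Listing returns just a key and an ordinal (here, an update timestamp), and the expensive fetch runs only when the ordinal is newer than the one recorded on the previous pass.

```python
from dataclasses import dataclass
from typing import Callable, Dict, Iterable, List


@dataclass(frozen=True)
class ListedItem:
    """Cheap listing result: a key plus an ordinal (e.g. last-updated timestamp)."""
    key: str
    ordinal: int


class IncrementalSource:
    """Hypothetical wrapper that fetches full values only when an ordinal advances."""

    def __init__(self, fetch: Callable[[str], dict]):
        self._fetch = fetch                  # expensive per-item call (e.g. one API request)
        self._seen: Dict[str, int] = {}      # key -> last processed ordinal

    def sync(self, listing: Iterable[ListedItem]) -> List[dict]:
        changed = []
        for item in listing:
            previous = self._seen.get(item.key)
            if previous is not None and item.ordinal <= previous:
                continue                     # unchanged since last pass: skip the fetch
            changed.append(self._fetch(item.key))
            self._seen[item.key] = item.ordinal
        return changed


fetch_calls: List[str] = []


def fake_fetch(key: str) -> dict:
    # Stand-in for a real HTTP request to the HN API.
    fetch_calls.append(key)
    return {"id": key}


source = IncrementalSource(fake_fetch)
first = source.sync([ListedItem("1", 100), ListedItem("2", 200)])
second = source.sync([ListedItem("1", 100), ListedItem("2", 250), ListedItem("3", 300)])
```

On the second pass, thread "1" is skipped because its ordinal did not move, so only the updated thread "2" and the new thread "3" are fetched.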

2. LLM-Powered Topic Extraction

Every thread and comment is analyzed, and the results are stored in two tables:

  • hn_messages: Full text, authors, timestamps (perfect for semantic search)
  • hn_topics: Normalized topics with relevance scores

The magic? It runs incrementally: when new comments arrive, only those comments go through topic extraction.
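One way to picture incremental extraction is to key each message's result by a hash of its text and call the expensive LLM only for text it has not seen before. This is a plain-Python sketch, not how CocoIndex caches internally; `ExtractionCache` is hypothetical, and `extract_topics_llm` is a stub standing in for a real LLM call (it just collects capitalized words).

```python
import hashlib
from typing import Callable, Dict, List


def fingerprint(text: str) -> str:
    """Stable content hash used as the cache key."""
    return hashlib.sha256(text.encode("utf-8")).hexdigest()


class ExtractionCache:
    """Runs the extractor once per distinct message text (hypothetical helper)."""

    def __init__(self, extractor: Callable[[str], List[str]]):
        self._extractor = extractor
        self._cache: Dict[str, List[str]] = {}
        self.llm_calls = 0                   # how many times the extractor actually ran

    def topics_for(self, text: str) -> List[str]:
        key = fingerprint(text)
        if key not in self._cache:
            self.llm_calls += 1
            self._cache[key] = self._extractor(text)
        return self._cache[key]


def extract_topics_llm(text: str) -> List[str]:
    # Hypothetical stub: a real version would prompt an LLM for normalized topics.
    words = (w.strip(".,!?") for w in text.split())
    return sorted({w for w in words if w[:1].isupper()})


cache = ExtractionCache(extract_topics_llm)
comments = ["Claude beats GPT here.", "Rust compile times are fine."]
for comment in comments:
    cache.topics_for(comment)

comments.append("Postgres is enough.")      # one new comment arrives
all_topics = [cache.topics_for(c) for c in comments]
```

Re-processing all three comments costs only one extra extractor call, because the first two hit the cache.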

3. Query Handlers for Easy Access

Two main queries make the data instantly useful:

search_by_topic("Claude")
→ Returns every HN discussion about Claude with links and context

get_trending_topics(limit=20)
→ Ranked list of what's trending now, with top threads per topic
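The two queries could look roughly like the SQL below. The single-table `hn_topics` schema (`topic`, `thread_id`, `score`, `created_at`) and the extra `conn`/`since` parameters are assumptions for illustration, not the example's real schema or signatures, and `sqlite3` stands in for Postgres so the sketch runs without a server. Ranking by summed relevance score is one reasonable choice among several.

```python
import sqlite3
from typing import List, Tuple

SCHEMA = """
CREATE TABLE hn_topics (
    topic      TEXT    NOT NULL,  -- normalized topic, e.g. 'Claude'
    thread_id  TEXT    NOT NULL,  -- HN story id the mention belongs to
    score      REAL    NOT NULL,  -- relevance score from extraction
    created_at INTEGER NOT NULL   -- Unix timestamp of the message
)
"""


def search_by_topic(conn: sqlite3.Connection, topic: str) -> List[Tuple[str, float]]:
    """Threads mentioning a topic (case-insensitive), strongest mention first."""
    return conn.execute(
        """
        SELECT thread_id, MAX(score) AS best
        FROM hn_topics
        WHERE lower(topic) = lower(?)
        GROUP BY thread_id
        ORDER BY best DESC, thread_id
        """,
        (topic,),
    ).fetchall()


def get_trending_topics(
    conn: sqlite3.Connection, since: int, limit: int = 20
) -> List[Tuple[str, float, int]]:
    """Topics ranked by summed relevance over messages created at or after `since`."""
    return conn.execute(
        """
        SELECT topic, SUM(score) AS total, COUNT(DISTINCT thread_id) AS threads
        FROM hn_topics
        WHERE created_at >= ?
        GROUP BY topic
        ORDER BY total DESC, topic
        LIMIT ?
        """,
        (since, limit),
    ).fetchall()


conn = sqlite3.connect(":memory:")
conn.execute(SCHEMA)
conn.executemany(
    "INSERT INTO hn_topics VALUES (?, ?, ?, ?)",
    [
        ("Claude", "t1", 0.75, 1000),
        ("Claude", "t2", 0.5, 1010),
        ("Rust", "t2", 0.625, 1020),
        ("Claude", "t1", 0.25, 1030),
        ("Rust", "t3", 0.5, 500),  # older than the trending window used below
    ],
)
hits = search_by_topic(conn, "claude")
trending = get_trending_topics(conn, since=900, limit=20)
```

The "top threads per topic" part of the trending view could then come from calling `search_by_topic` for each returned topic.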

How It Works in Practice

Run this once:

cocoindex update -L main

Now your system polls HN every 30 seconds and:

  • Discovers new threads automatically
  • Extracts topics from new comments
  • Updates your Postgres database incrementally
  • Handles deletions and stale data cleanup
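The per-cycle bookkeeping in that list can be sketched as a reconcile step: compare the current listing with what is already stored, upsert anything new or newer, and delete entries whose keys disappeared upstream (for example, a removed story). CocoIndex handles this for you; `reconcile`, `poll`, and the dict-based `store` are hypothetical names that only illustrate the idea, and the 30-second default mirrors the interval mentioned above.

```python
import time
from typing import Callable, Dict, List, Tuple

Listing = Dict[str, int]  # key -> ordinal (assumed non-negative, e.g. a timestamp)


def reconcile(stored: Listing, current: Listing) -> Tuple[List[str], List[str]]:
    """Return (keys to upsert, keys to delete) for one polling cycle."""
    upserts = sorted(k for k, ordinal in current.items() if stored.get(k, -1) < ordinal)
    deletes = sorted(set(stored) - set(current))
    return upserts, deletes


def poll(list_fn: Callable[[], Listing], store: Listing, cycles: int, interval: float = 30.0) -> None:
    """Run `cycles` polling rounds, sleeping `interval` seconds between rounds."""
    for i in range(cycles):
        current = list_fn()
        upserts, deletes = reconcile(store, current)
        for key in upserts:
            store[key] = current[key]  # a real pipeline would fetch and re-extract here
        for key in deletes:
            del store[key]             # drop rows for items that vanished upstream
        if i < cycles - 1:
            time.sleep(interval)


# Two simulated listings: "b" disappears, "a" is updated, "c" is new.
snapshots = iter([{"a": 1, "b": 1}, {"a": 2, "c": 1}])
store: Listing = {}
poll(lambda: next(snapshots), store, cycles=2, interval=0)
```

After the second cycle the store holds only `a` (at its new ordinal) and `c`; the stale `b` entry is gone.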

No cron jobs. No custom ETL pipelines. Just declarative transformations.

The Real Power: This Pattern Works Everywhere

The same architecture I used for Hacker News works for:

  • Reddit - Track subreddit trends
  • Internal Slack - Monitor team discussions
  • GitHub Issues - Analyze project activity
  • CRM Events - Understand customer patterns
  • Any API with timestamps - Make it incremental

Why? Because CocoIndex handles:

  • ✅ Change tracking
  • ✅ Incremental recomputation
  • ✅ Stale data cleanup
  • ✅ Ordinal-based polling

You just declare what transformations you want—CocoIndex figures out when to run them.

Why This Beats Traditional Approaches

Before (Traditional Polling):

  • Write custom polling scripts with cron
  • Manually track what you've already processed
  • Handle API rate limits yourself
  • Store raw data, then build ETL pipelines
  • Hope you don't miss changes

After (CocoIndex):

  • Declare your source and transformations
  • Run cocoindex update -L main
  • Get incremental updates automatically
  • Query structured data immediately
  • Changes propagate automatically on the next polling cycle

Agent-Ready by Default

Because everything is stored as normalized topics in Postgres, your AI agents can:

  • Monitor "what's happening in X today?"
  • Track sentiment on specific frameworks
  • Alert on emerging trends
  • Answer questions without vector DB complexity

The data is already structured for agent consumption.
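For the "alert on emerging trends" case, one simple heuristic (an assumption here, not something the example prescribes) is to compare each topic's mention count in the current window with the previous window and flag topics that grew by some factor. Add-one smoothing keeps brand-new topics from dividing by zero. `emerging_topics`, `growth`, and `min_mentions` are hypothetical names.

```python
from collections import Counter
from typing import Iterable, List, Tuple


def emerging_topics(
    previous: Iterable[str],
    current: Iterable[str],
    growth: float = 2.0,
    min_mentions: int = 3,
) -> List[Tuple[str, float]]:
    """Topics whose mentions grew by at least `growth`x between two time windows."""
    before = Counter(previous)
    now = Counter(current)
    flagged = []
    for topic, count in now.items():
        if count < min_mentions:
            continue                                # ignore noise from rare topics
        ratio = (count + 1) / (before[topic] + 1)   # add-one smoothing for new topics
        if ratio >= growth:
            flagged.append((topic, ratio))
    # Fastest-growing first; ties broken alphabetically.
    return sorted(flagged, key=lambda item: (-item[1], item[0]))


yesterday = ["Rust"] * 5 + ["Claude"] * 2
today = ["Rust"] * 6 + ["Claude"] * 7 + ["Zig"] * 3
alerts = emerging_topics(yesterday, today)
```

Here Rust stays flat and is not flagged, while Zig (new) and Claude (more than doubled) are. The mention lists could come from a windowed query over `hn_topics`.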

Try It Yourself

Full walkthrough with all the code:
👉 CocoIndex HackerNews Trending Topics Example

The example includes:

  • Complete Custom Source implementation
  • LLM extraction flows
  • Query handler code
  • Database schema
  • Real usage examples

Key Takeaways

If you're building:

  • Trend dashboards - Real-time topic tracking
  • AI agents - Structured data sources
  • Research tools - Historical trend analysis
  • Monitoring systems - Alert on emerging topics

...stop polling APIs like it's 2012. Treat them as live, incremental data sources instead.

The weekend project became my production monitoring system. What will you build?


What would you track with this pattern? Drop your ideas in the comments! 👇
