
Aman Sachan

I automated reading 600+ RSS feeds into one daily India news brief

Every morning I used to spend an hour jumping between news sites — NDTV, The Hindu, Economic Times, scrolling through dozens of RSS feeds. Then I built a script to do it for me.

The problem with more sources

More feeds actually made quality worse until I built a velocity scoring system. Without it, breaking stories drowned in noise and structural trends kept resurfacing even after I had already read them.

What I built

600+ RSS feeds, deduplicated, ranked by source authority and story velocity — synthesized into one email delivered at 8:30 AM IST. Politics, economy, tech, environment, markets — all in one readable brief.

The pipeline:

  1. Fetch — parallel HTTP requests to all feeds, 10-second timeout per feed
  2. Deduplicate — simhash-based near-duplicate detection across all sources
  3. Rank — source authority weight × story velocity × recency score
  4. Summarize — Groq AI generates a 3-paragraph situational brief from top stories
  5. Email — formatted HTML email, direct to inbox
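Step 2 leans on simhash, which fingerprints a text so that similar texts get fingerprints differing in only a few bit positions. A minimal sketch of the idea (the md5 token hashing, unweighted tokens, and distance threshold of 3 are illustrative assumptions; a production version would weight tokens and tune the threshold):

```python
import hashlib


def simhash(text: str, bits: int = 64) -> int:
    """64-bit simhash: each token votes +1/-1 on every bit position;
    the final fingerprint keeps the sign of each bit's vote tally."""
    counts = [0] * bits
    for token in text.lower().split():
        h = int.from_bytes(hashlib.md5(token.encode()).digest()[:8], "big")
        for i in range(bits):
            counts[i] += 1 if (h >> i) & 1 else -1
    return sum(1 << i for i, c in enumerate(counts) if c > 0)


def is_near_duplicate(a: int, b: int, max_distance: int = 3) -> bool:
    """Near-duplicates differ in at most max_distance bit positions
    (Hamming distance). The threshold here is a guess; tune on real headlines."""
    return bin(a ^ b).count("1") <= max_distance
```

Because the hash is a bag-of-words vote, reordered headlines collide exactly, and longer, token-weighted texts stay close under small edits.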

The hardest part

The velocity scoring. I tried naive tf-idf first; it ranked opinion pieces over breaking news. The fix was a momentum score that tracks how many new sources pick up a story in its first 2 hours. Now genuine breaking stories surface instead of being buried under editorial content.
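The momentum idea can be sketched as counting distinct sources that pick a story up within two hours of its first appearance (the function name, inputs, and raw-count scoring are my assumptions about the shape of the logic, not the project's actual code):

```python
from datetime import datetime, timedelta


def momentum(first_seen: datetime, pickups: list[tuple[str, datetime]],
             window: timedelta = timedelta(hours=2)) -> int:
    """Count distinct sources that picked the story up within `window`
    of its first appearance. An opinion piece tends to stay at one
    source; breaking news spreads, so this count climbs fast."""
    cutoff = first_seen + window
    return len({src for src, ts in pickups if first_seen <= ts <= cutoff})
```

A pickup arriving after the window (or a repeat pickup by the same source) adds nothing, which is what keeps slow-burn editorial content from accumulating a high score.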

Stack

  • Python 3.12 + feedparser + httpx
  • Redis for feed metadata cache
  • Groq API for summarization
  • SQLite for story fingerprints
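A fingerprint store like the one listed above can be a single SQLite table. A minimal sketch (the schema, column names, and linear-scan lookup are my assumptions; note that fingerprints above 2**63-1 would need to be stored as signed 64-bit values):

```python
import sqlite3


def open_store(path: str = ":memory:") -> sqlite3.Connection:
    """One table mapping a story's simhash fingerprint to its URL."""
    conn = sqlite3.connect(path)
    conn.execute("""CREATE TABLE IF NOT EXISTS fingerprints (
                        fingerprint INTEGER,
                        url         TEXT PRIMARY KEY,
                        seen_at     TEXT DEFAULT CURRENT_TIMESTAMP)""")
    return conn


def seen_before(conn: sqlite3.Connection, fp: int, max_distance: int = 3) -> bool:
    """Compare a new fingerprint against stored ones by Hamming distance.
    A linear scan is fine at a few thousand stories a day; bit-bucket
    indexing can come later if it ever becomes the bottleneck."""
    rows = conn.execute("SELECT fingerprint FROM fingerprints").fetchall()
    return any(bin(fp ^ row[0]).count("1") <= max_distance for row in rows)


def remember(conn: sqlite3.Connection, fp: int, url: str) -> None:
    """Record a story so later near-duplicates are skipped."""
    conn.execute("INSERT OR IGNORE INTO fingerprints (fingerprint, url) VALUES (?, ?)",
                 (fp, url))
    conn.commit()
```

The `url` primary key makes re-inserting the same story a no-op, so a feed that republishes an item doesn't create duplicate fingerprints.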

Result

One email at 8:30 AM instead of 60 minutes of scattered reading. MIT licensed.

GitHub: https://github.com/amsach/btc-research (skill lives in /home/workspace/Skills/india-daily/)

If you consume India news for research or journalism, this might save you serious time.

#Python #India #RSS #Automation #OpenSource #Journalism
