Every morning I used to spend an hour jumping between news sites — NDTV, The Hindu, Economic Times, scrolling through dozens of RSS feeds. Then I built a script to do it for me.
## The problem with more sources
Adding feeds actually made the digest worse until I built a velocity scoring system. Without it, breaking stories drowned in noise, and slow-moving structural trends kept resurfacing even after I had already read them.
## What I built
600+ RSS feeds, deduplicated, ranked by source authority and story velocity — synthesized into one email delivered at 8:30 AM IST. Politics, economy, tech, environment, markets — all in one readable brief.
The pipeline:
- Fetch — parallel HTTP requests to all feeds, 10-second timeout per feed
- Deduplicate — simhash-based near-duplicate detection across all sources
- Rank — source authority weight × story velocity × recency score
- Summarize — Groq AI generates a 3-paragraph situational brief from top stories
- Email — formatted HTML email, direct to inbox
## The hardest part
The velocity scoring. I tried naive tf-idf first, and it ranked opinion pieces above breaking news. The fix was a momentum score that tracks how many new sources pick up a story in its first 2 hours. Now genuine breaking news surfaces instead of being buried under editorial content.
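A sketch of that scoring, under assumed names (none of these are the repo's actual functions): velocity counts source pickups inside the 2-hour window, recency decays exponentially, and the final rank is the multiplicative combination from the pipeline above.

```python
# Illustrative ranking math: authority × velocity × recency.
import math
from datetime import datetime, timedelta, timezone

TWO_HOURS = timedelta(hours=2)

def velocity_score(first_seen: datetime, pickups: list[datetime]) -> float:
    """Momentum: count pickups within 2h of first sighting (one timestamp
    per distinct source), log-damped so mega-stories don't dominate."""
    early = sum(1 for t in pickups if t - first_seen <= TWO_HOURS)
    return math.log1p(early)

def recency_score(published: datetime, now: datetime,
                  half_life_hours: float = 6.0) -> float:
    """Exponential decay: a story loses half its weight every half-life."""
    age_h = (now - published).total_seconds() / 3600
    return 0.5 ** (age_h / half_life_hours)

def rank(authority: float, velocity: float, recency: float) -> float:
    """Multiplicative rank: a zero in any factor sinks the story."""
    return authority * velocity * recency
```

The multiplicative form is what separates it from tf-idf: an opinion piece can have high term weight, but with no cross-source pickup its velocity factor stays near zero and the whole product collapses.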
## Stack
- Python 3.12 + feedparser + httpx
- Redis for feed metadata cache
- Groq API for summarization
- SQLite for story fingerprints
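The SQLite fingerprint store can be sketched like this; table and column names are my assumption, not the repo's actual schema. An in-memory database stands in for the on-disk file the pipeline would use.

```python
# Illustrative fingerprint store: one row per story URL, so already-seen
# stories can be skipped on the next run.
import sqlite3

conn = sqlite3.connect(":memory:")  # the real pipeline would use a file path
conn.execute("""
    CREATE TABLE IF NOT EXISTS fingerprints (
        url        TEXT PRIMARY KEY,
        simhash    INTEGER NOT NULL,
        first_seen TEXT NOT NULL
    )
""")

def remember(url: str, fingerprint: int, seen_at: str) -> None:
    """Record a story; re-inserting the same URL is a silent no-op."""
    conn.execute(
        "INSERT OR IGNORE INTO fingerprints VALUES (?, ?, ?)",
        (url, fingerprint, seen_at),
    )

def seen_before(url: str) -> bool:
    """Has this exact URL been recorded on a previous run?"""
    row = conn.execute(
        "SELECT 1 FROM fingerprints WHERE url = ?", (url,)
    ).fetchone()
    return row is not None
```

Persisting fingerprints is what lets dedup work across days, not just within one run: a story that trickles in over 48 hours is still caught.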
## Result
One email at 8:30 AM instead of 60 minutes of scattered reading. MIT licensed.
GitHub: https://github.com/amsach/btc-research (skill lives in `/home/workspace/Skills/india-daily/`)
If you consume India news for research or journalism, this might save you serious time.