Want to build your own automated news channel? Here's exactly how I did it — the complete architecture, code patterns, and lessons learned.
The Stack
- Python 3 (stdlib only — no pip installs needed)
- RSS feeds (free, reliable, real-time)
- Telegram Bot API (free, unlimited messages to channels)
- Cron (15-minute intervals)
- SearXNG (optional: self-hosted search fallback)
Total cost: $0/month. Runs on any Linux box, VPS, or even a Raspberry Pi.
Step 1: Create Your Telegram Bot
- Message @BotFather on Telegram
- Send `/newbot` and follow the prompts
- Save your bot token
- Create a public channel (e.g., @YourNewsChannel)
- Add your bot as an admin with posting rights
Step 2: Find RSS Feeds
Most major news sites still offer RSS. Here's how to find them:
```python
# Common RSS URL patterns:
#   /feed/
#   /rss/
#   /rss.xml
#   /feeds/rss/headlines
#   /atom.xml

# Example feeds:
FEEDS = [
    ('TechCrunch', 'https://techcrunch.com/feed/'),
    ('Ars Technica', 'https://feeds.arstechnica.com/arstechnica/index'),
    ('The Verge', 'https://www.theverge.com/rss/index.xml'),
    ('Hacker News', 'https://hnrss.org/frontpage?points=100'),
]
```
Pro tip: If a site doesn't have RSS, use Google News RSS:
https://news.google.com/rss/search?q=site:example.com&hl=en-US
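If you'd rather build that fallback URL programmatically, here's a small sketch (the `google_news_feed` helper is hypothetical, not part of the bot itself):

```python
from urllib.parse import quote

def google_news_feed(domain, lang='en-US'):
    """Build a Google News RSS search URL for a site without a native feed."""
    # quote() percent-encodes the ':' in the site: operator
    return f'https://news.google.com/rss/search?q={quote("site:" + domain)}&hl={lang}'

print(google_news_feed('example.com'))
# https://news.google.com/rss/search?q=site%3Aexample.com&hl=en-US
```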
Step 3: The Core Engine (~60 lines)
```python
import urllib.request
import urllib.error
import xml.etree.ElementTree as ET
import json, hashlib, re, os
from datetime import datetime, timezone
from html import unescape

def fetch_rss(url, max_items=10):
    """Fetch and parse an RSS feed. Returns a list of stories."""
    stories = []
    req = urllib.request.Request(url, headers={
        'User-Agent': 'Mozilla/5.0 (compatible; NewsBot/1.0)'
    })
    try:
        resp = urllib.request.urlopen(req, timeout=12)
        root = ET.fromstring(resp.read())
    except (urllib.error.URLError, ET.ParseError, OSError):
        return stories  # one dead feed shouldn't kill the whole run
    for item in root.findall('.//item')[:max_items]:
        title = item.findtext('title', '').strip()
        link = item.findtext('link', '').strip()
        desc = item.findtext('description', '').strip()
        desc = re.sub(r'<[^>]+>', '', unescape(desc))[:200]  # strip HTML, cap at 200 chars
        pub = item.findtext('pubDate', '')
        if title and link:
            stories.append({
                'title': unescape(title),
                'url': link,
                'desc': desc,
                'pub': pub
            })
    return stories
```
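To sanity-check the parsing logic without touching the network, you can run the same ElementTree + regex steps against an inline RSS snippet (a standalone sketch, not code from the bot itself):

```python
import re
import xml.etree.ElementTree as ET
from html import unescape

SAMPLE = '''<rss><channel>
<item><title>Hello &amp; welcome</title><link>https://example.com/a</link>
<description>&lt;p&gt;First story&lt;/p&gt;</description></item>
</channel></rss>'''

root = ET.fromstring(SAMPLE)
item = root.find('.//item')
# ElementTree decodes XML entities; unescape handles HTML-escaped payloads
title = unescape(item.findtext('title', '').strip())
desc = re.sub(r'<[^>]+>', '', unescape(item.findtext('description', '')))

print(title)  # Hello & welcome
print(desc)   # First story
```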
Step 4: Deduplication
Without dedup, you'll post the same AP/Reuters story from 6 different sources:
```python
def story_hash(title):
    """Normalize the headline, then hash it for a stable dedup key."""
    clean = re.sub(r'[^a-z0-9 ]', '', title.lower().strip())
    return hashlib.md5(clean[:80].encode()).hexdigest()[:12]

# Load previously posted stories
state = json.load(open('state.json')) if os.path.exists('state.json') else {}
posted = state.get('posted', {})

# Filter new stories
new_stories = []
for story in all_stories:
    h = story_hash(story['title'])
    if h not in posted:
        new_stories.append(story)
        posted[h] = {'title': story['title'], 'ts': datetime.now().isoformat()}
```
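A quick way to see why the normalization step matters: punctuation and casing variants of the same headline collapse to one key. This is a standalone demo of `story_hash`, so the function is repeated here:

```python
import hashlib, re

def story_hash(title):
    clean = re.sub(r'[^a-z0-9 ]', '', title.lower().strip())
    return hashlib.md5(clean[:80].encode()).hexdigest()[:12]

# Different punctuation/casing, same dedup key:
a = story_hash('Apple unveils new iPhone!')
b = story_hash('apple unveils new iphone')
print(a == b)  # True
```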
Step 5: Freshness Filter
Only post stories from the last hour — nobody wants yesterday's news:
```python
from datetime import timezone

def is_fresh(pub_date_str, max_hours=1):
    """Return True if the story was published within the last max_hours."""
    formats = [
        '%a, %d %b %Y %H:%M:%S %z',  # RFC 822 (most RSS feeds)
        '%Y-%m-%dT%H:%M:%S%z',       # ISO 8601 with offset
        '%Y-%m-%dT%H:%M:%SZ',        # ISO 8601, literal "Z" suffix
    ]
    for fmt in formats:
        try:
            dt = datetime.strptime(pub_date_str.strip(), fmt)
            if dt.tzinfo is None:
                dt = dt.replace(tzinfo=timezone.utc)
            age = datetime.now(timezone.utc) - dt
            return age.total_seconds() < (max_hours * 3600)
        except ValueError:
            continue
    return True  # Can't parse = assume fresh
```
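Here's a self-contained check of that logic (the function is repeated so the snippet runs on its own): a 10-minute-old RFC 822 timestamp passes, a 3-hour-old one doesn't:

```python
from datetime import datetime, timedelta, timezone

def is_fresh(pub_date_str, max_hours=1):
    formats = ['%a, %d %b %Y %H:%M:%S %z', '%Y-%m-%dT%H:%M:%S%z']
    for fmt in formats:
        try:
            dt = datetime.strptime(pub_date_str.strip(), fmt)
            if dt.tzinfo is None:
                dt = dt.replace(tzinfo=timezone.utc)
            return (datetime.now(timezone.utc) - dt).total_seconds() < max_hours * 3600
        except ValueError:
            continue
    return True  # unparseable -> assume fresh

# Build RFC 822 timestamps relative to now:
recent = (datetime.now(timezone.utc) - timedelta(minutes=10)).strftime('%a, %d %b %Y %H:%M:%S +0000')
stale = (datetime.now(timezone.utc) - timedelta(hours=3)).strftime('%a, %d %b %Y %H:%M:%S +0000')
print(is_fresh(recent), is_fresh(stale))  # True False
```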
Step 6: Image Extraction
Posts with images get 3-5x more engagement:
```python
def extract_og_image(url):
    """Scrape og:image from the article page."""
    try:
        req = urllib.request.Request(url, headers={'User-Agent': 'Mozilla/5.0'})
        html = urllib.request.urlopen(req, timeout=5).read(100000)  # first ~100 KB is enough
        html = html.decode('utf-8', errors='ignore')
        match = re.search(
            r'<meta[^>]*property=["\']og:image["\'][^>]*content=["\']'
            r'(https?://[^"\']+)["\']', html
        )
        return match.group(1) if match else None
    except Exception:
        return None
```
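You can verify the regex against a canned page before pointing it at real articles. One caveat: it assumes `property` appears before `content` inside the tag, which is the common order but not guaranteed by any spec:

```python
import re

SAMPLE_HTML = '<head><meta property="og:image" content="https://example.com/cover.jpg"/></head>'

match = re.search(
    r'<meta[^>]*property=["\']og:image["\'][^>]*content=["\']'
    r'(https?://[^"\']+)["\']', SAMPLE_HTML
)
print(match.group(1))  # https://example.com/cover.jpg
```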
Step 7: Post to Telegram
```python
def post_to_telegram(token, channel, text, image_url=None):
    """Send a story to the channel, as a photo with caption when we have an image."""
    if image_url:
        payload = {
            'chat_id': channel,
            'photo': image_url,
            'caption': text,
            'parse_mode': 'HTML'
        }
        url = f'https://api.telegram.org/bot{token}/sendPhoto'
    else:
        payload = {
            'chat_id': channel,
            'text': text,
            'parse_mode': 'HTML'
        }
        url = f'https://api.telegram.org/bot{token}/sendMessage'
    req = urllib.request.Request(url, data=json.dumps(payload).encode(),
                                 headers={'Content-Type': 'application/json'})
    resp = json.loads(urllib.request.urlopen(req, timeout=15).read())
    return resp.get('ok', False)
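One gotcha with `parse_mode: 'HTML'`: Telegram rejects messages containing raw `<`, `>`, or `&` outside its supported tags, so escape the headline and description first. A minimal sketch of a formatter (the `format_post` layout is my assumption, not the exact format the bot uses):

```python
from html import escape

def format_post(title, url, desc):
    """Build an HTML-mode message; only the URL goes into the tag unescaped."""
    return f'<b>{escape(title)}</b>\n\n{escape(desc)}\n\n<a href="{url}">Read more</a>'

msg = format_post('AT&T <update>', 'https://example.com', 'Details inside')
print(msg)
```

Also note that Telegram caps photo captions at 1024 characters, so truncate the description before sending.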
Step 8: Cron It
```shell
# Run every 15 minutes
*/15 * * * * /usr/bin/python3 /path/to/news_bot.py >> /var/log/news_bot.log 2>&1
```
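If a run ever hangs past 15 minutes, cron will happily start a second copy on top of it. Wrapping the job in `flock` (from util-linux; path and lock file here are assumptions) is a cheap guard:

```shell
# -n: skip this run instead of queuing if the previous one still holds the lock
*/15 * * * * /usr/bin/flock -n /tmp/news_bot.lock /usr/bin/python3 /path/to/news_bot.py >> /var/log/news_bot.log 2>&1
```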
Quality Controls I Added Later
These came from running the system for a week:
1. Content Filtering
BBC's main RSS feed includes sports, lifestyle, and entertainment. Filter by category or use specific feeds:
```python
EXCLUDE = re.compile(r'football|soccer|rugby|cricket|recipe|horoscope', re.I)

if EXCLUDE.search(title):
    continue  # Skip off-topic stories
```
2. Fuzzy Deduplication
Same story from AP appears as slightly different headlines on CNN, BBC, Guardian:
```python
def is_near_duplicate(new_title, existing_titles, threshold=0.7):
    """Word-overlap check against recently posted headlines."""
    words_new = set(new_title.lower().split())
    for existing in existing_titles:
        words_ex = set(existing.lower().split())
        overlap = len(words_new & words_ex) / max(len(words_new), len(words_ex))
        if overlap >= threshold:
            return True
    return False
```
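To get a feel for the 0.7 threshold, here's the function run against a rephrased headline (standalone copy so the snippet runs on its own):

```python
def is_near_duplicate(new_title, existing_titles, threshold=0.7):
    words_new = set(new_title.lower().split())
    for existing in existing_titles:
        words_ex = set(existing.lower().split())
        overlap = len(words_new & words_ex) / max(len(words_new), len(words_ex))
        if overlap >= threshold:
            return True
    return False

seen = ['fed raises interest rates by quarter point']
# 6 of 7 words shared (~0.86 overlap) -> caught as a duplicate:
print(is_near_duplicate('Fed raises rates by quarter point', seen))  # True
# Unrelated headline, zero overlap -> allowed through:
print(is_near_duplicate('Bitcoin hits new record', seen))  # False
```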
3. Default Images
Some RSS feeds don't include images. Always have a fallback:
```python
DEFAULT_IMAGES = {
    'tech': 'https://images.unsplash.com/photo-1518770660439-4636190af475?w=800',
    'world': 'https://images.unsplash.com/photo-1451187580459-43490279c0fa?w=800',
}

image = story.get('image') or extract_og_image(url) or DEFAULT_IMAGES[category]
```
Results
Running 4 channels with this architecture:
- ⚡ Pokemon News — 14 sources + 2 scrapers
- 🤖 AI/Tech — 10 feeds
- 🌍 World News — 7 feeds
- ₿ Crypto — 6 feeds
Total infrastructure cost: $0/month (runs on existing server).
What's Next
- AI-generated summaries and analysis layer
- Premium tier with sentiment analysis
- More niche channels (sports, science, gaming)
- Telegram Mini App for custom news feeds
Questions? Drop a comment. Want to see the full source? Check out MFS Corp on GitHub.
Part of the Building MFS Corp series — documenting how we're building an AI-powered company from scratch.