Build a Medium Content Aggregator: One Pipeline, Many Sources
Niche newsletters, competitor monitors, and topic dashboards share one insight: Medium is a firehose of signal if you normalize it. This tutorial builds a single ingestion layer—not five one-off scrapers.
Tool outcome: A config-driven cron job that upserts articles from publications + keyword search into one database table.
Why aggregators die in maintenance
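Someone scripts three publications with CSS selectors. One publication redesigns its pages, the selectors stop matching, and the feed goes empty. The fix is to pull structured data from an API and map every source into a single schema: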
type Article = {
article_id: string;
title: string;
url: string;
source: string; // e.g. "pub:7f60cf5620c9" | "search:ml"
published_at?: string;
fetched_at: string;
};
Every source maps into this shape.
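As a minimal sketch of that mapping, the helper below turns one raw API item into the shape above. The name `toArticle` is hypothetical, and so is the assumption that raw items expose `id`, `title`, and `url` (the fields the starter cron reads) plus an optional `published_at`; adjust the field names to whatever your API actually returns.

```javascript
// Hypothetical helper: maps one raw API item into the Article shape.
// Assumes raw items carry `id`, `title`, and `url`; `published_at` on the
// raw item is an assumed field name and is copied only when present.
function toArticle(raw, source, now = new Date()) {
  if (!raw || typeof raw.id !== 'string' || raw.id === '') {
    throw new Error('raw item is missing a string id');
  }
  const article = {
    article_id: raw.id,
    title: String(raw.title ?? '').trim(),
    url: String(raw.url ?? ''),
    source,
    fetched_at: now.toISOString(),
  };
  if (typeof raw.published_at === 'string') {
    article.published_at = raw.published_at;
  }
  return article;
}
```

Rejecting items without an id up front keeps rows that cannot be deduplicated out of the table.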
Architecture (four layers)
- Sources: publications, writers, tags, saved searches (rows in config).
- Ingestion: a scheduled job that asks "what's new since `last_seen`?"
- Normalization: map API JSON → `Article`.
- Distribution: RSS, email, GraphQL, or your UI.
To learn how to resolve publication IDs, see "Monitor Medium publications"; for keyword feeds, see "Medium keyword research."
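One way to wire the first two layers together is a dispatch table keyed by source type. The sketch below is illustrative only: `runIngestion` and the `fetchers` map are hypothetical names, and each fetcher is assumed to return an array of already-normalized articles.

```javascript
// Hypothetical ingestion step: `fetchers` maps a source type to an async
// function that returns an array of Article objects for that source.
async function runIngestion(sources, fetchers) {
  // allSettled lets one failing source be reported without losing the rest.
  const results = await Promise.allSettled(
    sources.map(async (source) => {
      const fetcher = fetchers[source.type];
      if (!fetcher) throw new Error(`no fetcher for source type "${source.type}"`);
      return fetcher(source);
    })
  );

  const articles = [];
  const failures = [];
  results.forEach((result, i) => {
    if (result.status === 'fulfilled') articles.push(...result.value);
    else failures.push({ source: sources[i], error: result.reason });
  });
  return { articles, failures };
}
```

Under this design, supporting a new kind of source means adding one entry to `fetchers` and one row to the config; the loop itself does not change.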
Starter cron (Node)
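// Assumes Node 18+ (global fetch) and ZENNDRA_API_KEY set in the environment.
const API = 'https://api.zenndra.com';
const headers = { Authorization: `Bearer ${process.env.ZENNDRA_API_KEY}` };

// Config rows: adding a feed means adding an entry here, not writing a new scraper.
const sources = [
  { type: 'publication', id: '7f60cf5620c9', label: 'Towards Data Science' },
  { type: 'search', query: 'machine learning', label: 'search:ml' },
];

// Fetch the articles for one publication and map them into the Article shape.
async function fetchPublicationArticles(publicationId) {
  const res = await fetch(`${API}/publication/${publicationId}/articles`, { headers });
  // Fail loudly on non-2xx responses instead of treating an error body as "no articles".
  if (!res.ok) throw new Error(`publication ${publicationId}: HTTP ${res.status}`);
  const data = await res.json();
  return (data.articles ?? []).map((a) => ({
    article_id: a.id,
    title: a.title,
    url: a.url,
    source: `pub:${publicationId}`,
    fetched_at: new Date().toISOString(),
  }));
}

// Fetch keyword search results and map them into the same shape.
// Note: source becomes "search:<full query>", e.g. "search:machine learning".
async function fetchSearchArticles(query) {
  const res = await fetch(
    `${API}/search/articles?query=${encodeURIComponent(query)}`,
    { headers }
  );
  if (!res.ok) throw new Error(`search "${query}": HTTP ${res.status}`);
  const data = await res.json();
  return (data.articles ?? []).map((a) => ({
    article_id: a.id,
    title: a.title,
    url: a.url,
    source: `search:${query}`,
    fetched_at: new Date().toISOString(),
  }));
}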
// Merge, upsert on article_id, skip duplicates
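The merge step named in that closing comment could look roughly like the sketch below. Both function names are hypothetical. The SQL assumes a PostgreSQL table called `articles` with a unique constraint on `article_id` and columns mirroring the Article type. The returned `{ text, values }` object is meant for a parameterized-query client such as node-postgres, but check it against the driver you actually use.

```javascript
// Keep one row per article_id; the first source to report an article wins.
function mergeById(batches) {
  const byId = new Map();
  for (const article of batches.flat()) {
    if (!byId.has(article.article_id)) byId.set(article.article_id, article);
  }
  return [...byId.values()];
}

// Builds a parameterized PostgreSQL upsert. The `articles` table and its
// columns are assumptions that mirror the Article type.
function buildUpsert(article) {
  return {
    text: `INSERT INTO articles (article_id, title, url, source, published_at, fetched_at)
VALUES ($1, $2, $3, $4, $5, $6)
ON CONFLICT (article_id) DO UPDATE
SET title = EXCLUDED.title, url = EXCLUDED.url, fetched_at = EXCLUDED.fetched_at`,
    values: [
      article.article_id,
      article.title,
      article.url,
      article.source,
      article.published_at ?? null,
      article.fetched_at,
    ],
  };
}
```

Deduplicating in memory first keeps the write count down, while `ON CONFLICT` makes reruns idempotent even when the same article arrives in a later batch.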
Production checklist
- Deduplicate on `article_id` across sources.
- Track cursors (timestamp or last id) per source.
- Alert when a source returns zero items three runs in a row—often a bad slug, not “no news.”
- Attribute clearly; link to original Medium URLs.
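The cursor and zero-item items above can share one small piece of per-source state. This is a sketch under assumptions: `recordRun` and the state shape are hypothetical, the cursor is taken from `published_at` when the API supplies it, and a real job would persist the state between runs instead of holding it in memory.

```javascript
// Alert after this many consecutive empty runs (threshold from the checklist).
const EMPTY_RUN_THRESHOLD = 3;

// Newest parseable published_at in a batch, as epoch milliseconds, or null.
function newestTimestamp(articles) {
  let newest = null;
  for (const a of articles) {
    const t = Date.parse(a.published_at ?? '');
    if (!Number.isNaN(t) && (newest === null || t > newest)) newest = t;
  }
  return newest;
}

// Returns the next state for one source after a run.
function recordRun(previous, articles) {
  const state = previous ?? { emptyRuns: 0, cursor: null };
  if (articles.length === 0) {
    const emptyRuns = state.emptyRuns + 1;
    return {
      emptyRuns,
      cursor: state.cursor,
      shouldAlert: emptyRuns >= EMPTY_RUN_THRESHOLD,
    };
  }
  const newest = newestTimestamp(articles);
  let cursor = state.cursor;
  if (newest !== null) {
    cursor = state.cursor === null ? newest : Math.max(newest, state.cursor);
  }
  // Any non-empty run resets the empty-run streak.
  return { emptyRuns: 0, cursor, shouldAlert: false };
}
```

Calling `recordRun(state, articles)` once per source per run is enough: when `shouldAlert` flips to true, check the slug or ID before assuming the source has simply gone quiet.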
Who pays for this pattern
| Buyer | Value |
|---|---|
| Newsletter operators | Tomorrow’s digest built tonight |
| Media startups | Topic pages without a newsroom |
| B2B tools | “Everything about {keyword}” for sales intel |
Keywords
medium content aggregator, medium rss alternative, medium publication feed api, automated newsletter medium, medium api json.
Further reading
- Cron best practices on Node
- PostgreSQL UPSERT for idempotent writes
- Zenndra: Build a Medium content aggregator