Sebastian Casvean

Posted on • Originally published at zenndra.com

Build a Medium Content Aggregator: One Pipeline, Many Sources

Niche newsletters, competitor monitors, and topic dashboards share one insight: Medium is a firehose of signal if you normalize it. This tutorial builds a single ingestion layer—not five one-off scrapers.

Tool outcome: A config-driven cron job that upserts articles from publications + keyword search into one database table.


Why aggregators die in maintenance

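A typical aggregator starts as a script that scrapes three publications with CSS selectors. When one of them redesigns its pages, the selectors stop matching and that feed silently goes empty. The fix is to map every source into a single schema: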

type Article = {
  article_id: string;
  title: string;
  url: string;
  source: string;      // e.g. "pub:7f60cf5620c9" | "search:ml"
  published_at?: string;
  fetched_at: string;
};

Every source maps into this shape.
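As a minimal sketch of that mapping, the function below turns one raw item into the Article shape. The raw field names (`id`, `title`, `url`, `published_at`) and the helper name `normalizeArticle` are assumptions for illustration, not a documented API contract; adjust them to whatever your source actually returns.

```javascript
// Hypothetical normalizer: the raw field names used here are assumptions
// about the upstream JSON, not a documented contract.
function normalizeArticle(raw, source, now = new Date()) {
  // Without an id and a url the item cannot be deduplicated or linked.
  if (!raw || typeof raw.id !== 'string' || typeof raw.url !== 'string') {
    return null;
  }
  const article = {
    article_id: raw.id,
    title: typeof raw.title === 'string' ? raw.title.trim() : '',
    url: raw.url,
    source,
    fetched_at: now.toISOString(),
  };
  // published_at is optional in the schema, so set it only when it parses.
  const published = raw.published_at ? new Date(raw.published_at) : null;
  if (published && !Number.isNaN(published.getTime())) {
    article.published_at = published.toISOString();
  }
  return article;
}
```

Returning `null` for unusable items lets the caller filter them out in one place instead of letting half-empty rows reach the table.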


Architecture (four layers)

  1. Sources — publications, writers, tags, saved searches (rows in config).
  2. Ingestion — scheduled job: “what’s new since last_seen?”
  3. Normalization — map API JSON → Article.
  4. Distribution — RSS, email, GraphQL, or your UI.
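The ingestion question, "what's new since last_seen?", can be sketched as a small filter over normalized articles. The helper name `selectNew` and the choice of an ISO timestamp as the cursor are assumptions made for this example; a last-seen id would work as well.

```javascript
// Hypothetical helper: keeps only articles newer than the stored cursor.
// lastSeen is an ISO-8601 timestamp string, or null on a first run.
function selectNew(articles, lastSeen) {
  const cutoff = lastSeen ? Date.parse(lastSeen) : -Infinity;
  // published_at is optional in the schema, so fall back to fetched_at.
  // Items without published_at will therefore always look new; the
  // upsert on article_id is what keeps them from being duplicated.
  const stamp = (a) => Date.parse(a.published_at ?? a.fetched_at);
  const fresh = articles.filter((a) => {
    const t = stamp(a);
    return !Number.isNaN(t) && t > cutoff;
  });
  // The next cursor is the newest timestamp seen; it never moves backwards.
  const newest = fresh.reduce((max, a) => Math.max(max, stamp(a)), cutoff);
  return {
    fresh,
    nextCursor: Number.isFinite(newest) ? new Date(newest).toISOString() : lastSeen,
  };
}
```

Store `nextCursor` per source after each successful run and pass it back in as `lastSeen` on the next one.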

To learn how to resolve publication IDs, see "Monitor Medium publications"; for keyword feeds, see "Medium keyword research."


Starter cron (Node)

const API = 'https://api.zenndra.com';
const headers = { Authorization: `Bearer ${process.env.ZENNDRA_API_KEY}` };

const sources = [
  { type: 'publication', id: '7f60cf5620c9', label: 'Towards Data Science' },
  { type: 'search', query: 'machine learning', label: 'search:ml' },
];

async function fetchPublicationArticles(publicationId) {
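  const res = await fetch(`${API}/publication/${publicationId}/articles`, { headers });
  // Fail loudly on HTTP errors instead of treating an error body as "no articles".
  if (!res.ok) throw new Error(`Publication ${publicationId} request failed: HTTP ${res.status}`);
  const data = await res.json();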
  return (data.articles ?? []).map((a) => ({
    article_id: a.id,
    title: a.title,
    url: a.url,
    source: `pub:${publicationId}`,
    fetched_at: new Date().toISOString(),
  }));
}

async function fetchSearchArticles(query) {
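  const res = await fetch(
    `${API}/search/articles?query=${encodeURIComponent(query)}`,
    { headers }
  );
  // Fail loudly on HTTP errors instead of treating an error body as "no articles".
  if (!res.ok) throw new Error(`Search "${query}" request failed: HTTP ${res.status}`);
  const data = await res.json();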
  return (data.articles ?? []).map((a) => ({
    article_id: a.id,
    title: a.title,
    url: a.url,
    source: `search:${query}`,
    fetched_at: new Date().toISOString(),
  }));
}

// Merge, upsert on article_id, skip duplicates
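The merge step named in that final comment could look roughly like the sketch below. It uses an in-memory `Map` as a stand-in for the database table, and the names `upsertArticles`, `runOnce`, and the `fetchers` lookup are hypothetical. A real deployment would replace the `Map` with an upsert in its own database.

```javascript
// In-memory stand-in for the database table, keyed by article_id.
function upsertArticles(table, batches) {
  let inserted = 0;
  let updated = 0;
  for (const article of batches.flat()) {
    const existing = table.get(article.article_id);
    if (existing) {
      // Keep the source that found the article first; refresh mutable fields.
      table.set(article.article_id, {
        ...existing,
        title: article.title,
        fetched_at: article.fetched_at,
      });
      updated += 1;
    } else {
      table.set(article.article_id, article);
      inserted += 1;
    }
  }
  return { inserted, updated };
}

// Hypothetical runner. `fetchers` maps a source type to a fetch function, e.g.
// { publication: (s) => fetchPublicationArticles(s.id),
//   search: (s) => fetchSearchArticles(s.query) }.
async function runOnce(sources, fetchers, table) {
  // allSettled lets one failing source fail alone instead of sinking the run.
  const results = await Promise.allSettled(
    sources.map(async (s) => fetchers[s.type](s))
  );
  const batches = results
    .filter((r) => r.status === 'fulfilled')
    .map((r) => r.value);
  const failed = results.filter((r) => r.status === 'rejected').length;
  return { ...upsertArticles(table, batches), failed };
}
```

Because the table is keyed on `article_id`, an article that appears in both a publication feed and a search result is stored once and counted as an update the second time.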

Production checklist

  • Deduplicate on article_id across sources.
  • Track cursors (timestamp or last id) per source.
  • Alert when a source returns zero items three runs in a row—often a bad slug, not “no news.”
  • Attribute clearly; link to original Medium URLs.
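The zero-item alert from the checklist can be sketched as a small streak counter. The function name `recordRun` and the `Map`-based state are assumptions for illustration; in production the streak would live next to the per-source cursor in your database.

```javascript
// Hypothetical per-source health tracker. `state` maps a source label to
// the number of consecutive runs that returned no items.
function recordRun(state, sourceLabel, itemCount, threshold = 3) {
  const streak = itemCount === 0 ? (state.get(sourceLabel) ?? 0) + 1 : 0;
  state.set(sourceLabel, streak);
  // Alert exactly once, on the run where the streak reaches the threshold.
  return { streak, alert: streak === threshold };
}
```

Alerting only when the streak equals the threshold, rather than on every run past it, avoids repeating the same notification while the source stays broken.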

Who pays for this pattern

Buyer                   Value
Newsletter operators    Tomorrow’s digest built tonight
Media startups          Topic pages without a newsroom
B2B tools               “Everything about {keyword}” for sales intel

Keywords

medium content aggregator, medium rss alternative, medium publication feed api, automated newsletter medium, medium api json.

