I've been running ai-tldr.dev for about six months now. It auto-aggregates AI releases — models, tools, repos, papers — from a set of curated sources, deduplicates them, categorizes them, and surfaces the day's signal on a clean feed.
This is a technical retrospective on what broke, what surprised me, and what I'd do differently.
The problem I was solving
My own reading workflow was a mess. I had 20+ RSS feeds, Twitter lists, Discord servers, and GitHub watchlists. I was spending 40+ minutes a day on "staying current" and retaining maybe 10% of it.
The naive solution is a newsletter. But newsletters have a fundamental structural problem: they're push, not pull. They arrive on their schedule, and they optimize for perceived completeness rather than actual relevance to you.
I wanted something more like a database of releases, queryable by category, that I could check when I wanted to, filtered to what I'm actually building.
What I built (high level)
The system has three layers:
1. Ingestion — Scheduled sweeps of ~30 sources: arXiv, Hacker News, GitHub trending, deeplearning.ai, Nathan Lambert's blog, and a few Discord servers. Each source has a different fetch strategy (RSS, API, scrape); see the sketch after this list.
2. Processing — Deduplication (same release covered by multiple sources gets one card), categorization (model / repo / tool / paper / ecosystem), and LLM-assisted title normalization and summary generation.
3. Display — A filterable card grid at ai-tldr.dev/?cat=tool (or model, repo, paper, ecosystem). Currently 421 tracked releases.
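The ingestion layer is, conceptually, a per-source config plus a dispatch table of fetchers. A minimal sketch of that shape, not the actual code (the `Source` class, field names, and fetcher functions are illustrative):

```python
from dataclasses import dataclass
from typing import Callable

import feedparser  # third-party; used for RSS sources
import requests    # third-party; used for API sources

@dataclass
class Source:
    name: str
    url: str
    kind: str  # "rss", "api", or "scrape"

def fetch_rss(src: Source) -> list[dict]:
    feed = feedparser.parse(src.url)
    return [{"title": e.get("title"), "link": e.get("link"),
             "published": e.get("published")} for e in feed.entries]

def fetch_api(src: Source) -> list[dict]:
    # Assumes the endpoint returns a JSON list of items.
    return requests.get(src.url, timeout=30).json()

def fetch_scrape(src: Source) -> list[dict]:
    # Scrapers are source-specific; stubbed here.
    raise NotImplementedError(src.name)

FETCHERS: dict[str, Callable[[Source], list[dict]]] = {
    "rss": fetch_rss,
    "api": fetch_api,
    "scrape": fetch_scrape,
}

def sweep(sources: list[Source]) -> list[dict]:
    items: list[dict] = []
    for src in sources:
        try:
            items.extend(FETCHERS[src.kind](src))
        except Exception as exc:
            # One dead source shouldn't take down the whole sweep.
            print(f"[sweep] {src.name} failed: {exc}")
    return items
```

Each sweep feeds raw items into the processing layer, which is where dedup and categorization happen.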
What broke
Deduplication is harder than it sounds. "GPT-4 Turbo" and "gpt-4-turbo-preview" and "OpenAI releases new flagship model" are the same thing, but fuzzy string matching gets messy fast. I ended up with an LLM-based semantic similarity check as a fallback, which works but adds cost and latency.
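The shape of the tiered check is simple: do the cheap lexical comparison first, and only pay for the semantic check when the fuzzy score lands in the ambiguous middle band. A simplified sketch; the thresholds are illustrative, and the `embed()` callable here stands in for whatever semantic check you use (mine is LLM-based):

```python
from difflib import SequenceMatcher
from typing import Callable

def fuzzy_score(a: str, b: str) -> float:
    # Cheap lexical similarity on normalized titles.
    return SequenceMatcher(None, a.lower().strip(), b.lower().strip()).ratio()

def is_duplicate(title_a: str, title_b: str,
                 embed: Callable[[str], list[float]]) -> bool:
    score = fuzzy_score(title_a, title_b)
    if score > 0.85:   # obviously the same
        return True
    if score < 0.40:   # obviously different
        return False
    # Ambiguous zone: fall back to semantic similarity (costs an API call).
    va, vb = embed(title_a), embed(title_b)
    dot = sum(x * y for x, y in zip(va, vb))
    norm = (sum(x * x for x in va) ** 0.5) * (sum(y * y for y in vb) ** 0.5)
    return norm > 0 and dot / norm > 0.82
```

The middle band is where all the cost and latency lives, which is why it pays to keep it narrow.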
Categories drift. Something that starts as a "tool" often becomes a "model" a month later (or vice versa). My initial categorization scheme was too rigid. I've since made it multi-label and added manual overrides.
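Conceptually, the multi-label model is just: the classifier proposes labels, a human can add or remove them, and the manual decisions always win. A sketch of that data model, with illustrative field names:

```python
from dataclasses import dataclass, field

CATEGORIES = {"model", "repo", "tool", "paper", "ecosystem"}

@dataclass
class Release:
    title: str
    auto_labels: set[str] = field(default_factory=set)     # from the classifier
    manual_labels: set[str] = field(default_factory=set)   # human additions
    removed_labels: set[str] = field(default_factory=set)  # human removals

    @property
    def labels(self) -> set[str]:
        # Manual decisions override whatever the classifier said.
        return (self.auto_labels | self.manual_labels) - self.removed_labels
```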
Source reliability varies a lot. Some sources go quiet for weeks, some change their structure, some start publishing garbage. I added freshness monitoring and source health scores, but this is ongoing maintenance.
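The health score itself doesn't need to be fancy. Something in the spirit of the sketch below is enough: penalize sources that have been quiet longer than their usual cadence and sources whose recent fetches keep failing. The weights here are illustrative, not the real ones:

```python
from datetime import datetime, timezone

def health_score(last_item_at: datetime, expected_interval_days: float,
                 recent_error_rate: float) -> float:
    """Rough 0-1 health score. last_item_at must be timezone-aware;
    recent_error_rate is the fraction of failed fetches in a recent window."""
    days_quiet = (datetime.now(timezone.utc) - last_item_at).days
    staleness = min(days_quiet / (expected_interval_days * 3), 1.0)
    return max(0.0, 1.0 - 0.6 * staleness - 0.4 * recent_error_rate)
```

Anything below some threshold goes on a review list rather than being dropped automatically, since "quiet" and "dead" aren't the same thing.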
What surprised me
The curation judgment is the actual hard problem. The automation gets you 80% there. The remaining 20% — deciding whether something is actually notable, catching nuance in source descriptions, knowing when to merge vs. keep separate — that's judgment, and it doesn't automate easily.
After six months, I've built intuitions about what makes a release notable that I can't fully articulate, let alone code. The system is better at surfacing candidates. I'm better at rating them.
People use it very differently than I expected. I built it for daily review. Most users seem to use it as a reference: searching for a specific release they heard about, or browsing by category for a specific use case. The filterable feed turned out to be more useful as a lookup tool than as a digest.
What I'd do differently
Invest earlier in source quality over source quantity. I spent too long adding sources and not enough time improving coverage of the ones that mattered. 20 good sources beat 60 mediocre ones.
Build the "freshness" signal earlier. When did this release actually happen vs. when did we index it? The distinction matters. News from six months ago shouldn't compete with news from yesterday.
Current state
421 releases tracked. 6 categories. Daily sweeps. Free to browse.
If you're building in the AI space and want to see what landed this week: https://ai-tldr.dev/?cat=tool
Feedback welcome — especially if you notice something missing that should be in the feed.