Open GitHub and search for awesome-python in Trending. You'll find a repo with 240k stars, 39k forks, and an open-PR list pushing past 2,000. It looks like the holy scripture of modern Python.
Now check when the last merge happened. A few days ago. Okay, fine. Now do the same with awesome-react, awesome-nodejs, awesome-flutter. Half of them are frozen solid. Last commit 8 months ago. PRs rotting in the hundreds. Entries pointing to expired domains. Tools that stopped being maintained in 2022.
That's the most common story out there: an awesome-* list that was once gold, now just a museum.
The Problem
Awesome lists are the entry point for devs who want to discover tools in an ecosystem. They're the first Google result, they have tens of thousands of stars, they get linked in tutorials, threads, bookmarks.
The problem is they don't scale over time. The original curator eventually loses interest or gets swallowed by their day job. Contribution PRs pile up faster than they get merged. And when the maintainer disappears, the list doesn't "break" — it just quietly ages while the world moves on around it.
A dev landing on awesome-X in 2026 might end up installing a tool that's been deprecated for two years. And nobody warns them.
The Idea
A few days ago a friend told me:
"I want to write a post about every awesome repo I read."
I said something like:
"But first we need to filter which ones are actually worth it. And we need a system that does it automatically — because if every month we're manually deciding which lists are alive and which aren't, this doesn't scale."
That's how this project started.
The simple idea: build a self-regulating system that maintains a dynamic "roster" of the 15 most active, highest-quality awesome lists in the ecosystem, plus a "bench" of 5 candidates waiting for promotion. Re-evaluate weekly. Scrape only the live ones. Deduplicate items cross-source (if three lists mention the same tool, that's a signal). Classify with AI to pre-sort. And at the end, publish our own list that updates itself.
In other words: don't create another awesome list by hand. Build a system that curates the awesomes.
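The roster/bench mechanic itself is simple: re-score every candidate list weekly, keep the top 15 in the roster, and the next 5 on the bench. A minimal sketch of that promotion rule (the names and exact logic here are my illustration of the idea, not the project's actual code):

```typescript
// Hypothetical sketch of the weekly roster/bench re-evaluation.
// Sizes come from the article; everything else is illustrative.

interface ScoredList {
  repo: string;   // e.g. "vinta/awesome-python"
  score: number;  // 0-100, from the 5-dimension formula
}

const ROSTER_SIZE = 15;
const BENCH_SIZE = 5;

// Sort all candidates by score: top 15 -> roster, next 5 -> bench.
// A bench list that outscores a roster list simply swaps in next week.
function reevaluate(lists: ScoredList[]): { roster: ScoredList[]; bench: ScoredList[] } {
  const sorted = [...lists].sort((a, b) => b.score - a.score);
  return {
    roster: sorted.slice(0, ROSTER_SIZE),
    bench: sorted.slice(ROSTER_SIZE, ROSTER_SIZE + BENCH_SIZE),
  };
}
```

Because promotion and demotion fall out of a single sort, there's no special-case logic for "kicking out" a dead list: it just stops making the cut.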
Three-Phase Architecture
```
┌─────────────────────────────────┐
│  PHASE 1 — Discovery            │
│  GitHub search + seeds          │
│  → scoring 0-100 (5 dims)       │
│  → ROSTER 15 · BENCH 5          │
└──────────────┬──────────────────┘
               │
┌──────────────▼──────────────────┐
│  PHASE 2 — Scrape + Dedupe      │
│  README parsing                 │
│  → normalized items             │
│  → unique CuratedTools          │
│  → appearsInCount = signal      │
└──────────────┬──────────────────┘
               │
┌──────────────▼──────────────────┐
│  PHASE 3 — Curate               │
│  Claude classifies GEM/HYPE/…   │
│  Human confirms or overrides    │
│  → auto-generated README        │
└─────────────────────────────────┘
```
Each phase is a different problem. Discovery is a search + ranking problem. Scrape is parsing + deduplication. Curation is prompt engineering + cost control.
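The cross-source dedup in Phase 2 boils down to collapsing every scraped item onto a canonical key, then counting how many lists it appears in. A minimal sketch — the normalization rules here are my assumption of what the pipeline does, not the real code:

```typescript
// Hypothetical dedup sketch for Phase 2. The canonicalKey rules
// (lowercase, strip protocol/www/trailing slash) are illustrative.

interface ScrapedItem {
  name: string;
  url: string;
  sourceList: string; // which awesome-* list it came from
}

interface CuratedTool {
  name: string;
  url: string;
  appearsInCount: number; // cross-source signal
  sources: string[];
}

// Normalize a URL so the same tool linked from three lists
// collapses onto one key.
function canonicalKey(url: string): string {
  return url
    .toLowerCase()
    .replace(/^https?:\/\//, "")
    .replace(/^www\./, "")
    .replace(/\/+$/, "");
}

function dedupe(items: ScrapedItem[]): CuratedTool[] {
  const byKey = new Map<string, CuratedTool>();
  for (const item of items) {
    const key = canonicalKey(item.url);
    const existing = byKey.get(key);
    if (existing) {
      // Only count each source list once per tool.
      if (!existing.sources.includes(item.sourceList)) {
        existing.sources.push(item.sourceList);
        existing.appearsInCount++;
      }
    } else {
      byKey.set(key, {
        name: item.name,
        url: item.url,
        appearsInCount: 1,
        sources: [item.sourceList],
      });
    }
  }
  return [...byKey.values()];
}
```

The `appearsInCount` that falls out of this is exactly the "three lists mention the same tool" signal from the idea above.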
The next three posts are each a technical deep dive:
- Part 2: Batched GraphQL + raw SQL: from 5 minutes down to 25 seconds processing 28,000 items. The optimization journey, the real bottlenecks (connection_limit), and how Prisma sometimes betrays you.
- Part 3: Classifying 5,000 tools with Claude for $1. Batched Haiku, prompt design, the quality/cost tradeoff, and how to write a prompt the model won't trash.
- Part 4: The launch. The public awesome-curated repo, the weekly automation, and the metric of whether anyone's actually using it.
The Scoring
Before going further, it's worth showing the heart of the system — the formula that decides whether an awesome list is alive or dead.
```
score = freshness        * 0.35
      + activity         * 0.20
      + popularity       * 0.15
      + depth            * 0.20
      + community_health * 0.10
```
Five dimensions, 0 to 100 each.
- Freshness — when was the last commit. Steep curve: under 3 days is 100, over 180 days is 0.
- Activity — PRs merged in the last 30 days. A live list has active contributions even if the maintainer isn't writing much themselves.
- Popularity — stars. But not pure log: in the 250–2500 star range the curve is linear so we don't crush legitimate niches (cryptography, rust-embedded, etc.).
- Depth — number of items in the README plus organized categories. An awesome with 30 poorly grouped items is worth less than one with 300 well-structured ones.
- Community health — the openIssues / stars ratio. If it's >10%, it's a neglected project. If it's ~1%, it's a project that actually responds.
The first run threw some interesting data: vinta/awesome-python with 240k stars dropped to the BENCH because the unattended issues ratio was high and PRs merged per month were low. Meanwhile awesome-mcp-servers with barely 12k stars entered the ROSTER with a score of 99 because it's being actively maintained right as the MCP world is exploding.
That's exactly what a dev needs: not the biggest list, but the one that's going to have the tool that shipped yesterday.
What's Coming
In the next post I get into the code: how I went from the first prototype with classic REST (4 requests per repo) to the pipeline with batched GraphQL + raw SQL that processes 20 repos and 28,000 items in 25 seconds. With the real bugs along the way.
In the meantime, you can follow the live evolution:
- The repo with the curated list: github.com/JuanTorchia/awesome-curated (public launch in a few weeks)
- This blog: a new post in the series every 3 days
- Discussion: @Juanchi_AR on Twitter if you want to weigh in on the scoring or propose an awesome-* that should make the roster
If everything goes to plan, in six months nobody should have to open awesome-X and find out it's been dead for a year. The list lives on its own.
That's the plan.