Jason Zhu

I Indexed 67,000 Open-Source AI Agent Projects. Here's What's Actually Inside.

A few months ago I started a side project called AgentSkillsHub — a directory that indexes every meaningful open-source AI agent project on GitHub: MCP servers, Claude Skills, Codex Skills, agent tools, the works.

Six months in, the database has 67,196 projects, refreshed every 8 hours.

I expected to find a healthy ecosystem with a long tail. What I actually found was so lopsided I had to stop and write a 12-chapter book about it (free, CC BY-NC-SA, PDF).

This post is a 1,500-word version. If any of it surprises you, the source data is open and reproducible.


TL;DR (5 findings)

  1. The Gini coefficient of star distribution is 0.983 — more lopsided than the App Store (0.95), npm (0.93), or YouTube (0.87)
  2. 54% of all projects have 0 stars. Not "few stars." Zero.
  3. The top 1% of projects own 83% of all stars in the entire ecosystem
  4. Monthly new project creation grew 45× between January 2025 and March 2026
  5. The best engineering pattern I found isn't more stars or better code — it's MISTAKES.md files (only 2.8% of top projects have one)

I'll unpack each below, plus three things my own data proved I was wrong about.


Finding 1: A new world record for inequality

The Gini coefficient measures distribution inequality on a 0–1 scale. 0 = perfect equality (everyone has the same), 1 = one person owns everything.

AI Agent Projects (2026):  0.983
App Store (2024):          0.95
npm packages (2022):       0.93
YouTube videos (2020):     0.87
US income (2023):          0.40
China income (2023):       0.47

I checked the math three times. The 67K project dataset really does have a Gini of 0.983.
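
If you want to sanity-check the number yourself, the computation is only a few lines. Here is a minimal sketch using the standard formula, assuming you have the star counts as a flat list (this is not the script from the repo, just the textbook definition):

import numpy as np

def gini(values):
    """Gini coefficient of a non-negative 1-D array (0 = perfect equality, 1 = one owner)."""
    x = np.sort(np.asarray(values, dtype=float))  # ascending
    n = len(x)
    if n == 0 or x.sum() == 0:
        return 0.0
    # G = (2 * sum(i * x_i)) / (n * sum(x)) - (n + 1) / n, with i the 1-based rank
    ranks = np.arange(1, n + 1)
    return (2.0 * np.sum(ranks * x) / (n * x.sum())) - (n + 1.0) / n

# stars = [repo["stargazers_count"] for repo in snapshot]  # hypothetical snapshot
# print(round(gini(stars), 3))  # ~0.983 on the 67K-project dataset, per the text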

For context: the bottom 50% of all open-source AI agent projects own 0.4% of the stars. The top 0.1% (about 67 projects) own about half of all the stars on the platform.

This isn't a "long tail." This is a needle and a desert.

Finding 2: 54% have zero stars

Of 67,196 projects:

  • 36,346 have exactly 0 stars (54.1%)
  • 47,381 have ≤ 5 stars (70.5%)
  • Only 1,693 have ≥ 100 stars (2.5%)
  • Only 403 have ≥ 1,000 stars (0.6%)

The 36K zero-star projects represent a real human behavior: someone wrote a Skill or MCP server, pushed it to GitHub, and literally no one — not even themselves on a different account — clicked the star button.

Most of these aren't spam. They're earnest first attempts. Someone learned about Skills, wrote one in a weekend, pushed it, and then never came back.

Finding 3: 1% own 83%

Top 1% of projects:    83.2% of all stars
Top 10%:               96.8% of all stars
Bottom 90%:            3.2% of all stars

If you're an open-source agent author and you're not in the top 10%, mathematically you are competing for 3% of the visible attention.

The cliff between #1,000 and #10,000 in star rank is steeper than between #100 and #1,000.
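
The shares above fall out of the same star list. A rough sketch of the percentile math, again assuming a plain list of star counts rather than the actual repo script:

import numpy as np

def top_share(stars, fraction):
    """Share of all stars owned by the top `fraction` of projects."""
    x = np.sort(np.asarray(stars, dtype=float))[::-1]  # descending
    k = max(1, int(round(len(x) * fraction)))
    return x[:k].sum() / x.sum()

# print(top_share(stars, 0.01))             # ~0.83 in the dataset described above
# print(top_share(stars, 0.10))             # ~0.97
# print((np.asarray(stars) == 0).mean())    # ~0.54 of projects sit at zero stars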

Finding 4: The 2026 supply explosion

New agent projects created per month (selected months):

Month       Count
2024 Jan    ~50
2024 Dec    ~280
2025 Jun    ~620
2025 Dec    ~1,400
2026 Mar    ~27,720

That's a 45× jump from the January 2025 rate to a single month in 2026.

What changed? Three things compounded:

  1. Anthropic published the Skill Spec (October 2025), giving creators a concrete format
  2. Claude Code shipped ~/.claude/skills/ (February 2026), making installation one-step
  3. Cursor + Codex CLI added Skill loading in the same quarter

When the format went from "5 commands and one config file" to "drop a folder, done," the supply curve broke.
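
"Drop a folder, done" is barely an exaggeration; installation is essentially a directory copy. A minimal sketch, assuming the skill folder carries a SKILL.md manifest at its root (that manifest name is my reading of the spec, so treat it as an assumption):

import shutil
from pathlib import Path

def install_skill(skill_dir: str) -> Path:
    """Copy a local skill folder into Claude Code's skills directory."""
    src = Path(skill_dir).expanduser().resolve()
    if not (src / "SKILL.md").exists():          # assumed manifest filename
        raise FileNotFoundError(f"{src} has no SKILL.md manifest")
    dest = Path.home() / ".claude" / "skills" / src.name
    dest.parent.mkdir(parents=True, exist_ok=True)
    shutil.copytree(src, dest, dirs_exist_ok=True)
    return dest

# install_skill("./pdf-parsing-skill")  # hypothetical local folder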

The demand curve hasn't kept up. Hence the 54% zero-star rate.

Finding 5: The 2.8% who write MISTAKES.md

Of the top 500 projects (≥500 stars), I checked which had files like MISTAKES.md, LESSONS.md, or POSTMORTEM.md.

Only 14 do (2.8%). Of those, 8 are forks/templates. Real authors actively recording mistakes: 6.

Those 6 projects' mean quality score is 55.8 vs 47.2 for the Top 500 average. A +8.6 delta on a 100-point scale.

Sample size is small, but the signal is loud: the engineers who write down their mistakes ship better Skills. This isn't a Skill design rule. It's a personality trait that leaks into the artifact.

If I had to pick one signal to predict whether a Skill will still be alive in 6 months, it'd be "does the author maintain a MISTAKES.md?" — beating star count, commit frequency, and quality score combined.
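
If you want to rerun that check on your own list of repos, it is one GitHub API call per candidate file. A minimal sketch using the public contents endpoint (unauthenticated calls are tightly rate-limited, so pass a token for anything beyond a handful of repos; `repos` here is a hypothetical list of "owner/name" strings):

import requests

CANDIDATES = ["MISTAKES.md", "LESSONS.md", "POSTMORTEM.md"]

def has_mistakes_file(repo: str, token: str | None = None) -> bool:
    """True if the repo's root contains any mistake-log style file."""
    headers = {"Authorization": f"Bearer {token}"} if token else {}
    for name in CANDIDATES:
        url = f"https://api.github.com/repos/{repo}/contents/{name}"
        if requests.get(url, headers=headers, timeout=10).status_code == 200:
            return True
    return False

# hits = [r for r in repos if has_mistakes_file(r, token=my_token)]  # hypothetical inputs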


Three things I was wrong about

This is the part I almost didn't write.

Wrong #1: "Quality score will surface hidden gems"

I built a 6-dimension quality score (completeness, clarity, specificity, examples, README structure, agent readiness). It's open-source: quality_analyzer.py.
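
The real scoring logic lives in that file; as a rough illustration of the shape only, here is a sketch of how six 0-100 dimensions could be combined (the weights and field names below are invented for the example, not what quality_analyzer.py actually uses):

# Hypothetical weighting of the six dimensions, each scored 0-100.
WEIGHTS = {
    "completeness": 0.20,
    "clarity": 0.20,
    "specificity": 0.15,
    "examples": 0.15,
    "readme_structure": 0.15,
    "agent_readiness": 0.15,
}

def quality_score(dimensions: dict[str, float]) -> float:
    """Weighted 0-100 composite from per-dimension scores (missing dimensions count as 0)."""
    return sum(WEIGHTS[k] * dimensions.get(k, 0.0) for k in WEIGHTS)

# quality_score({"completeness": 70, "clarity": 60, "specificity": 50,
#                "examples": 40, "readme_structure": 55, "agent_readiness": 65})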

I assumed: if I rank by quality instead of stars, the underrated stuff will float up.

Reality: quality and stars correlate at r = 0.71. The hidden gems mostly aren't hidden; they're below the noise floor. Quality scoring helps within tiers (separating B from A), but it doesn't move a project from rank 5,000 to rank 50.

Wrong #2: "Categories will help users find what they need"

I categorized everything into 7 buckets (mcp-server, claude-skill, codex-skill, agent-tool, etc.). 9.5% of projects ended up in uncategorized — too generic to classify.

Bigger problem: users don't search by category. They search by use case ("PDF parsing", "code review", "Slack integration"). Category is an artifact of how I think, not how anyone uses the site.

I had to build 58 separate /best/{scenario}/ landing pages to fix this.

Wrong #3: "Verified Creator badges will reward real authors"

I designed a Verified Creator program with strict criteria. I even pre-filled the founding member list with 4 well-known names from the ecosystem.

I forgot to ask them.

One of them politely said "I haven't actually joined." I pulled the entire list within 4 hours. The lesson — never pre-announce someone else's name without consent, even when it makes your launch look better — is in chapter 10 of the book if you want the full postmortem.


The chart that explains everything

If you take one image away, take this one:

       LOG SCALE — stars distribution (67K projects)

10K █▍
 5K █████▎
 1K ████████████████▎
500 ███████████████████████████████▎
100 ████████████████████████████████████████████████▎
 10 █████████████████████████████████████████████████████████████████████████████▎
  0 ██████████████████████████████████████████████████████████████████████████████████████████████████ (54%)

Every healthy ecosystem looks like a Pareto curve. This one looks like a wall.


What this means if you build agent stuff

For consumers: don't grade Skills by stars alone. The signal stops being useful below the top 1%. Look at MISTAKES.md, recent commits, and whether the README has decision rules vs. prose.

For authors: you're competing in a market where 99% of attention goes to 1% of projects. Either invest in becoming top 1% (months of consistent shipping) or pick a niche where the top 1% doesn't exist yet (most domain-specific MCPs).

For platform builders: the constraint isn't supply. It's discovery. Whoever solves "how do I find the right Skill in 30 seconds" wins more than whoever ships the next 1,000 Skills.


Where to dig deeper

If you found this useful and want the daily Top 10 picks of newly-indexed projects, AgentSkillsHub has a free newsletter (Mondays only).

If you have data that contradicts any of this, please post it. The point of indexing 67K projects publicly is so the conclusions can be checked.


Cover image and methodology are from the Skill Blue Book 2026 (Skill 蓝皮书 2026), Chapter 3. All data snapshots are reproducible; pull the script from data/ch03_analysis.py.
