airano

Posted on Feb 3

How I Indexed 172,000+ AI Agent Skills Using Multi-Strategy Discovery

#opensource #ai #github #javascript

GitHub's search API has a hard limit: 1,000 results per query.

We have 172,000+ skills indexed.

Here's how we built a discovery system that found them all—without breaking any rules.

The Problem: Skills Are Everywhere

AI agents like Claude Code, OpenAI Codex, and GitHub Copilot use SKILL.md files to learn new capabilities. These skills teach agents how to handle PDFs, write Excel formulas, follow brand guidelines, and much more.

The problem? These skills are scattered across thousands of GitHub repositories:

Some live in .claude/skills/
Others in .github/skills/
Many in random skills/ folders
And countless more in personal dotfiles repos

Finding the right skill is like searching for a needle in a haystack of haystacks.

I tried GitHub's search: filename:SKILL.md. It returned results, but never more than 1,000. The GitHub API documentation confirms this limit—and there's no way around it with a single query.
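To make the cap concrete: the search endpoints return at most 100 items per page and stop at 1,000 items per query, so only ten pages are ever reachable. A minimal sketch of that arithmetic, where `searchPageUrls` is a hypothetical helper name and not part of any library:

```javascript
// GitHub's search endpoints serve at most 100 items per page and
// at most 1,000 items per query, so only pages 1..10 are reachable.
const PER_PAGE = 100;
const MAX_RESULTS = 1000;

// Hypothetical helper: list every page URL that a single query can reach.
function searchPageUrls(query) {
  const pages = Math.ceil(MAX_RESULTS / PER_PAGE);
  const urls = [];
  for (let page = 1; page <= pages; page++) {
    const params = new URLSearchParams({
      q: query,
      per_page: String(PER_PAGE),
      page: String(page),
    });
    urls.push(`https://api.github.com/search/code?${params}`);
  }
  return urls;
}
```

Calling `searchPageUrls("filename:SKILL.md")` yields ten URLs. Anything past the last one is out of reach for that query, however many files actually match.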

So I built something different.

Our Approach: Multi-Strategy Discovery

Instead of fighting the 1,000-result limit, we work with it by running multiple specialized searches. Each strategy targets a different slice of the skill ecosystem.

Strategy 1: Path-Based Search

Skills follow predictable directory patterns. We search each path separately:

filename:SKILL.md path:skills
filename:SKILL.md path:.claude
filename:SKILL.md path:.github
filename:SKILL.md path:.codex

Each query can return up to 1,000 results, so four queries yield up to 4,000 hits. Results from different paths can overlap, so the number of unique skills may be lower after deduplication.
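A rough sketch of how the path queries could be generated and their results merged. The function names are hypothetical. The item shape (`path`, `repository.full_name`) is assumed to follow the code-search response format:

```javascript
// The four path prefixes listed above.
const PATHS = ["skills", ".claude", ".github", ".codex"];

// Build one search query per path prefix.
function buildPathQueries(paths) {
  return paths.map((p) => `filename:SKILL.md path:${p}`);
}

// Merge the result lists of several queries, keeping one entry per file.
// Each item is assumed to look like { path, repository: { full_name } }.
function mergeUnique(resultSets) {
  const seen = new Map();
  for (const items of resultSets) {
    for (const item of items) {
      const key = `${item.repository.full_name}:${item.path}`;
      if (!seen.has(key)) seen.set(key, item);
    }
  }
  return [...seen.values()];
}
```

Keying on repository plus path means a file returned by two different queries is only counted once.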

Strategy 2: File Size Segmentation

GitHub lets you filter by file size. We segment our searches:

filename:SKILL.md size:<1000      # Small skills
filename:SKILL.md size:1000..5000 # Medium skills
filename:SKILL.md size:>5000      # Large skills

Same filename, different queries, and non-overlapping result sets, since each file falls into exactly one size bucket (sizes are in bytes).
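The three fixed buckets above work as long as none of them exceeds the cap. One possible extension, which is an assumption here and not something the indexer is confirmed to do, is to split a bucket in half whenever it still reports more than 1,000 matches. In this sketch `countFor` is a hypothetical callback that would return the `total_count` of a search for the given byte range:

```javascript
// Render an inclusive byte range as a GitHub `size:` qualifier.
function sizeQualifier({ min, max }) {
  return `size:${min}..${max}`;
}

// Recursively halve a byte range until each piece fits under the cap.
// `countFor(range)` is a hypothetical async callback returning how many
// files match that range (for example, `total_count` from a search call).
async function splitUntilUnderCap(range, countFor, cap = 1000) {
  const total = await countFor(range);
  // Stop when the range fits, or when it cannot be split any further.
  if (total <= cap || range.min >= range.max) return [range];
  const mid = Math.floor((range.min + range.max) / 2);
  const left = await splitUntilUnderCap({ min: range.min, max: mid }, countFor, cap);
  const right = await splitUntilUnderCap({ min: mid + 1, max: range.max }, countFor, cap);
  return [...left, ...right];
}
```

The range has to be finite for the halving to work, so a caller would need to pick an upper bound for the largest file size worth searching.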

Strategy 3: Topic-Based Discovery

Many skill repositories use GitHub topics. We search for repos tagged with:

claude-skills
agent-skills
ai-skills
mcp-skills
llm-skills

Then deep-scan each repository for SKILL.md files.
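The deep-scan step could look roughly like this, assuming the repository's file list comes from the Git Trees API with `recursive=1`, whose response contains a `tree` array of `{ path, type }` entries. Both function names are hypothetical:

```javascript
// Build a repository-search URL for one topic tag.
function topicSearchUrl(topic) {
  const params = new URLSearchParams({ q: `topic:${topic}`, per_page: "100" });
  return `https://api.github.com/search/repositories?${params}`;
}

// Given a recursive Git Trees API response, return the paths of all
// SKILL.md files. Entries of type "blob" are files, "tree" are directories.
function findSkillFiles(treeResponse) {
  return treeResponse.tree
    .filter((entry) => entry.type === "blob")
    .map((entry) => entry.path)
    .filter((path) => path === "SKILL.md" || path.endsWith("/SKILL.md"));
}
```

Listing the tree once per repository avoids the search cap for this step, because the listing does not go through the search API at all. Very large repositories may return a truncated tree, which this sketch does not handle.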

Strategy 4: Awesome List Crawling

The community maintains curated lists of skills:

awesome-claude-skills
awesome-agent-skills
awesome-copilot

We parse these lists and index every linked repository.
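Parsing an awesome list mostly comes down to pulling repository links out of Markdown. A minimal sketch, assuming the lists link to repositories with ordinary `https://github.com/owner/repo` URLs. The reserved-name filter is an illustrative guess and not an exhaustive list:

```javascript
// First path segments that are GitHub site pages, not account names.
// Illustrative only; a real crawler would need a longer list.
const RESERVED = new Set(["topics", "orgs", "sponsors", "features", "marketplace"]);

// Extract unique "owner/repo" slugs from a Markdown document.
function extractRepos(markdown) {
  const pattern = /https?:\/\/github\.com\/([A-Za-z0-9-]+)\/([A-Za-z0-9._-]+)/g;
  const repos = new Set();
  for (const [, owner, repo] of markdown.matchAll(pattern)) {
    if (RESERVED.has(owner.toLowerCase())) continue;
    // Drop a trailing ".git" and any sentence-ending dots.
    const name = repo.replace(/\.git$/, "").replace(/\.+$/, "");
    if (name) repos.add(`${owner}/${name}`);
  }
  return [...repos];
}
```

Deep links such as `.../tree/main/pdf` collapse to the owning repository, so each repository is queued once no matter how many of its skills the list mentions.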

Strategy 5: Fork Network Traversal

When we find a popular skills repository, we also check its forks. Forks often contain additional or modified skills that never made it back to the original repo.
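Fork traversal is essentially a breadth-first walk over the fork graph. In this sketch `listForks` is a hypothetical async callback (in practice it might wrap the `GET /repos/{owner}/{repo}/forks` endpoint), and `maxRepos` is an assumed safety limit so a very popular repository cannot exhaust the request budget:

```javascript
// Breadth-first walk over a fork network, starting at `root`.
// `listForks(repo)` is a hypothetical async callback that returns the
// "owner/repo" slugs of the direct forks of `repo`.
async function collectForkNetwork(root, listForks, maxRepos = 500) {
  const seen = new Set([root]);
  const queue = [root];
  while (queue.length > 0 && seen.size < maxRepos) {
    const repo = queue.shift();
    for (const fork of await listForks(repo)) {
      if (seen.has(fork) || seen.size >= maxRepos) continue;
      seen.add(fork);
      queue.push(fork); // forks of forks are visited too
    }
  }
  return [...seen];
}
```

Each repository in the returned list can then be scanned for SKILL.md files like any other discovered repository.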

The Stack

Here's what powers the discovery and search:

Component	Technology	Purpose
Web App	Next.js 15	Marketplace UI
Database	PostgreSQL	Skill metadata, ratings
Search	Meilisearch	Full-text search with typo tolerance
Queue	Redis + BullMQ	Background crawl jobs
CLI	Node.js	Install skills from terminal

The indexer runs on a schedule:

Daily: Incremental crawl (new/updated skills)
Weekly: Full discovery (all strategies)
On-demand: Process user-submitted repositories

All queries use authenticated GitHub API requests with proper rate limit handling. We rotate between multiple tokens to stay well within limits.
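Rate limit handling usually means reading the response headers and waiting when the budget runs out. A minimal sketch, assuming the headers arrive as a plain object with lowercase keys. `msUntilRetry` is a hypothetical helper, not the indexer's actual code:

```javascript
// Decide how long to wait before the next request, in milliseconds.
// Uses GitHub's `retry-after` (seconds), `x-ratelimit-remaining`, and
// `x-ratelimit-reset` (UTC epoch seconds) response headers.
function msUntilRetry(headers, nowMs = Date.now()) {
  // An explicit retry-after always wins.
  const retryAfter = Number(headers["retry-after"]);
  if (Number.isFinite(retryAfter) && retryAfter > 0) return retryAfter * 1000;

  // Otherwise wait only when the current window is exhausted.
  const remaining = Number(headers["x-ratelimit-remaining"]);
  const resetAtMs = Number(headers["x-ratelimit-reset"]) * 1000;
  if (remaining === 0 && Number.isFinite(resetAtMs)) {
    return Math.max(0, resetAtMs - nowMs);
  }
  return 0;
}
```

A crawl job could call this after every response and sleep for the returned duration before sending the next request.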

Results

After running our multi-strategy discovery:

Metric	Count
Skills Indexed	172,000+
Contributors	4,000+
Categories	30
Platforms	Claude, Codex, Copilot

The search is fast. Type "pdf" and get relevant results in milliseconds, ranked by GitHub stars, download count, and security status.

Every skill is scanned for:

Dangerous shell commands
Prompt injection patterns
Data exfiltration attempts

Skills that pass get a green checkmark. Those with issues get flagged.
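To give a feel for what such a scan involves, here is a deliberately small sketch built on regular expressions. The rule IDs and patterns are illustrative assumptions, not the scanner's real rule set, and pattern matching alone will miss many cases:

```javascript
// Illustrative rules only: one pattern per category listed above.
const RULES = [
  // `rm -rf` on root or home, or piping a download straight into a shell.
  { id: "dangerous-shell", pattern: /\brm\s+-rf\s+[\/~]|curl[^\n|]*\|\s*(ba)?sh\b/i },
  // A common phrase used to override an agent's earlier instructions.
  { id: "prompt-injection", pattern: /ignore (all )?(previous|prior) instructions/i },
  // A secret-bearing file mentioned on the same line as an upload tool or URL.
  { id: "data-exfiltration", pattern: /(\.env|id_rsa|\.aws\/credentials)[^\n]*(curl|wget|https?:\/\/)/i },
];

// Return which rules a skill's text triggers, and whether it passes.
function scanSkill(text) {
  const findings = RULES.filter((rule) => rule.pattern.test(text)).map((rule) => rule.id);
  return { passed: findings.length === 0, findings };
}
```

A skill with an empty `findings` list would get the green checkmark. Anything else would be flagged for review.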

Try It Now

Install the CLI:

npm install -g skillhub

Search for skills:

skillhub search pdf

Install a skill:

skillhub install anthropics/skills/pdf

Or browse all 172,000+ skills on the web:

skills.palebluedot.live

What's Next

We're working on:

Native Claude Code integration via MCP (Model Context Protocol)
Skill verification with author confirmation
Usage analytics so you know which skills actually work

The entire project is open source under the MIT license.

Your Turn

What skills would you like to see indexed? Any repositories we should add?

Drop a comment below—I read every one.

Built with Next.js, PostgreSQL, Meilisearch, and way too much coffee.

DEV Community