DEV Community

Cover image for How I Indexed 172,000+ AI Agent Skills Using Multi-Strategy Discovery
airano
airano

Posted on

How I Indexed 172,000+ AI Agent Skills Using Multi-Strategy Discovery

GitHub's search API has a hard limit: 1,000 results per query.

We have 172,000+ skills indexed.

Here's how we built a discovery system that found them all—without breaking any rules.


The Problem: Skills Are Everywhere

AI agents like Claude Code, OpenAI Codex, and GitHub Copilot use SKILL.md files to learn new capabilities. These skills teach agents how to handle PDFs, write Excel formulas, follow brand guidelines, and much more.

The problem? These skills are scattered across thousands of GitHub repositories:

  • Some live in ~/.claude/skills/
  • Others in .github/skills/
  • Many in random skills/ folders
  • And countless more in personal dotfiles repos

Finding the right skill is like searching for a needle in a haystack of haystacks.

I tried GitHub's search: filename:SKILL.md. It returned results, but never more than 1,000. The GitHub API documentation confirms this limit—and there's no way around it with a single query.

So I built something different.

SkillHub Homepage


Our Approach: Multi-Strategy Discovery

Instead of fighting the 1,000-result limit, we work with it by running multiple specialized searches. Each strategy targets a different slice of the skill ecosystem.

Strategy 1: Path-Based Search

Skills follow predictable directory patterns. We search each path separately:

filename:SKILL.md path:skills
filename:SKILL.md path:.claude
filename:SKILL.md path:.github
filename:SKILL.md path:.codex
Enter fullscreen mode Exit fullscreen mode

Each query can return up to 1,000 results. Four queries = up to 4,000 potential discoveries.

Strategy 2: File Size Segmentation

GitHub lets you filter by file size. We segment our searches:

filename:SKILL.md size:<1000      # Small skills
filename:SKILL.md size:1000..5000 # Medium skills
filename:SKILL.md size:>5000      # Large skills
Enter fullscreen mode Exit fullscreen mode

Same file, different queries, different result sets.

Strategy 3: Topic-Based Discovery

Many skill repositories use GitHub topics. We search for repos tagged with:

  • claude-skills
  • agent-skills
  • ai-skills
  • mcp-skills
  • llm-skills

Then deep-scan each repository for SKILL.md files.

Strategy 4: Awesome List Crawling

The community maintains curated lists of skills:

  • awesome-claude-skills
  • awesome-agent-skills
  • awesome-copilot

We parse these lists and index every linked repository.

Strategy 5: Fork Network Traversal

When we find a popular skills repository, we also check its forks. Forks often contain additional or modified skills that never made it back to the original repo.


The Stack

Here's what powers the discovery and search:

Component Technology Purpose
Web App Next.js 15 Marketplace UI
Database PostgreSQL Skill metadata, ratings
Search Meilisearch Full-text search with typo tolerance
Queue Redis + BullMQ Background crawl jobs
CLI Node.js Install skills from terminal

The indexer runs on a schedule:

  • Daily: Incremental crawl (new/updated skills)
  • Weekly: Full discovery (all strategies)
  • On-demand: Process user-submitted repositories

All queries use authenticated GitHub API requests with proper rate limit handling. We rotate between multiple tokens to stay well within limits.


Results

After running our multi-strategy discovery:

Metric Count
Skills Indexed 172,000+
Contributors 4,000+
Categories 30
Platforms Claude, Codex, Copilot

Search Results

The search is fast. Type "pdf" and get relevant results in milliseconds, ranked by GitHub stars, download count, and security status.

Every skill is scanned for:

  • Dangerous shell commands
  • Prompt injection patterns
  • Data exfiltration attempts

Skills that pass get a green checkmark. Those with issues get flagged.


Try It Now

Install the CLI:

npm install -g skillhub
Enter fullscreen mode Exit fullscreen mode

Search for skills:

skillhub search pdf
Enter fullscreen mode Exit fullscreen mode

Install a skill:

skillhub install anthropics/skills/pdf
Enter fullscreen mode Exit fullscreen mode

CLI Demo

Or browse all 172,000+ skills on the web:

skills.palebluedot.live


What's Next

We're working on:

  1. Native Claude Code integration via MCP protocol
  2. Skill verification with author confirmation
  3. Usage analytics so you know which skills actually work

The entire project is open source under MIT license.


Your Turn

What skills would you like to see indexed? Any repositories we should add?

Drop a comment below—I read every one.


Built with Next.js, PostgreSQL, Meilisearch, and way too much coffee.

Top comments (0)