<?xml version="1.0" encoding="UTF-8"?>
<rss version="2.0" xmlns:atom="http://www.w3.org/2005/Atom" xmlns:dc="http://purl.org/dc/elements/1.1/">
  <channel>
    <title>DEV Community: Adam Lankamer</title>
    <description>The latest articles on DEV Community by Adam Lankamer (@adamlankamer).</description>
    <link>https://dev.to/adamlankamer</link>
    <image>
      <url>https://media2.dev.to/dynamic/image/width=90,height=90,fit=cover,gravity=auto,format=auto/https:%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Fuser%2Fprofile_image%2F3949472%2F924f361e-a6bb-461d-8be5-2d5a4679ff26.webp</url>
      <title>DEV Community: Adam Lankamer</title>
      <link>https://dev.to/adamlankamer</link>
    </image>
    <atom:link rel="self" type="application/rss+xml" href="https://dev.to/feed/adamlankamer"/>
    <language>en</language>
    <item>
      <title>How I indexed 69,000 Claude Code skills (and what I learned doing it)</title>
      <dc:creator>Adam Lankamer</dc:creator>
      <pubDate>Sun, 24 May 2026 19:46:54 +0000</pubDate>
      <link>https://dev.to/adamlankamer/how-i-indexed-69000-claude-code-skills-and-what-i-learned-doing-it-76f</link>
      <guid>https://dev.to/adamlankamer/how-i-indexed-69000-claude-code-skills-and-what-i-learned-doing-it-76f</guid>
      <description>

&lt;p&gt;One month ago I started building an open catalog of Claude Code skills. Yesterday it crossed &lt;strong&gt;69,369 indexed &lt;code&gt;SKILL.md&lt;/code&gt; files&lt;/strong&gt;. This post is the engineering story — what I built, what surprised me, and what's free for anyone to use.&lt;/p&gt;

&lt;p&gt;If you've never written a Claude Code skill: it's a Markdown file with YAML frontmatter that gives Anthropic's Claude Code agent specialized behavior. Drop it in &lt;code&gt;~/.claude/skills/&amp;lt;name&amp;gt;/SKILL.md&lt;/code&gt; and Claude can invoke it as a slash command. Think of it like a Vim plugin or a VSCode extension, except the contract is "instructions in English" rather than "code in Lua / TypeScript."&lt;/p&gt;

&lt;p&gt;The format is brand-new. The official spec doesn't ship a catalog. The awesome-* lists I could find at the time covered maybe 300 hand-picked entries. Meanwhile, GitHub's code search showed thousands of public repos with &lt;code&gt;SKILL.md&lt;/code&gt; files in them. &lt;strong&gt;The long tail of the ecosystem was completely invisible.&lt;/strong&gt; That's the gap I set out to close.&lt;/p&gt;

&lt;h2&gt;The shape of the problem&lt;/h2&gt;

&lt;p&gt;Here's what I knew going in:&lt;/p&gt;

&lt;ol&gt;
&lt;li&gt;&lt;p&gt;&lt;strong&gt;Discovery was broken.&lt;/strong&gt; A skill author would push their &lt;code&gt;SKILL.md&lt;/code&gt; to GitHub and ... nothing. No directory, no aggregator, no search surface. The only way another developer found it was Twitter, Discord, or stumbling onto the repo.&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;&lt;strong&gt;Quality varied wildly.&lt;/strong&gt; Some skills were 200-line operator-grade tools with pricing tables, anti-trigger sections, and structured examples. Others were 4-line stubs that read like "TODO: write a skill that does X." Both were indexable; neither was distinguishable from the outside.&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;&lt;strong&gt;The format itself was changing fast.&lt;/strong&gt; The frontmatter spec gained fields monthly — &lt;code&gt;allowed-tools&lt;/code&gt;, &lt;code&gt;user-invokable&lt;/code&gt;, &lt;code&gt;model&lt;/code&gt;, &lt;code&gt;metadata.api_base&lt;/code&gt;. Yesterday's "good" SKILL.md could be tomorrow's missing-required-field.&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;&lt;strong&gt;There was no good API surface.&lt;/strong&gt; If you wanted to build something on top of the skill ecosystem (a tool for evaluating skills, a recommender, an installer), you had to scrape GitHub yourself.&lt;/p&gt;&lt;/li&gt;
&lt;/ol&gt;

&lt;p&gt;I wanted a catalog that fixed all four. Open data, daily refresh, free API, free dataset. No pay-to-list, no listing fees, no ranking-for-money. &lt;strong&gt;The only paid product would be an evaluation layer for end-users (a quality score in the desktop app), never anything skill authors had to opt into.&lt;/strong&gt; Anti-rent-seeking by construction.&lt;/p&gt;

&lt;h2&gt;The miner: 24 sources, every night&lt;/h2&gt;

&lt;p&gt;The catalog is built by a single Python script that runs every night at 01:00 local time on a Mac mini in my office. It crawls 24 public sources looking for &lt;code&gt;SKILL.md&lt;/code&gt; files:&lt;/p&gt;

&lt;div class="table-wrapper-paragraph"&gt;&lt;table&gt;
&lt;thead&gt;
&lt;tr&gt;
&lt;th&gt;Source&lt;/th&gt;
&lt;th&gt;What it discovers&lt;/th&gt;
&lt;/tr&gt;
&lt;/thead&gt;
&lt;tbody&gt;
&lt;tr&gt;
&lt;td&gt;GitHub code search (&lt;code&gt;filename:SKILL.md&lt;/code&gt;)&lt;/td&gt;
&lt;td&gt;The bulk of the catalog — 101 query variants covering language hints, frontmatter fields, and date-bounded slices to defeat the 1000-result hard cap&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;GitHub Topics (&lt;code&gt;topic:claude-code-skills&lt;/code&gt;) + 31 variants&lt;/td&gt;
&lt;td&gt;Topic-tagged repos&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;GitHub Gists&lt;/td&gt;
&lt;td&gt;Single-file skills posted as gists (most catalogs miss these)&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Awesome-list READMEs (32 lists)&lt;/td&gt;
&lt;td&gt;Anything the existing curators picked&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;GitLab, Codeberg&lt;/td&gt;
&lt;td&gt;Skills outside GitHub&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;HuggingFace&lt;/td&gt;
&lt;td&gt;Skills uploaded as datasets&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Reddit, Hacker News (Algolia), Bluesky, Mastodon, dev.to, YouTube, Telegram&lt;/td&gt;
&lt;td&gt;Mentions in posts/comments — text-blob scan for repo URLs&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Wayback Machine CDX API&lt;/td&gt;
&lt;td&gt;Renamed / deleted repos still discoverable via archive.org&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Stargazer graph mining&lt;/td&gt;
&lt;td&gt;Once we find one good skill repo, mine who starred it — they often have skills too&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Author repo enumeration&lt;/td&gt;
&lt;td&gt;When we admit one of an author's skills, scan their other repos&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Topic co-occurrence&lt;/td&gt;
&lt;td&gt;Topics tagged alongside &lt;code&gt;claude-code-skills&lt;/code&gt; get crawled on the next run&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;VSCode + Open VSX marketplaces&lt;/td&gt;
&lt;td&gt;Some extensions ship with SKILL.md companions&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Brave Search API&lt;/td&gt;
&lt;td&gt;Web-search-anchored discovery&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;LLM query expansion&lt;/td&gt;
&lt;td&gt;Claude generates next week's search queries based on what's been found&lt;/td&gt;
&lt;/tr&gt;
&lt;/tbody&gt;
&lt;/table&gt;&lt;/div&gt;
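
&lt;p&gt;To make the "date-bounded slices" idea from the first row concrete, here is a rough sketch of how a date range could be cut into windows so that each query stays under the 1000-result cap. The helper names, the 7-day window, and the &lt;code&gt;created&lt;/code&gt; qualifier are placeholders for illustration; the post does not say which qualifier or search endpoint the miner actually uses.&lt;/p&gt;

```python
from datetime import date, timedelta


def date_windows(start, end, days):
    """Split the inclusive range [start, end] into windows of at most `days` days."""
    total = (end - start).days + 1
    windows = []
    for offset in range(0, total, days):
        first = start + timedelta(days=offset)
        last = min(first + timedelta(days=days - 1), end)
        windows.append((first, last))
    return windows


def sliced_queries(base, start, end, days=7, qualifier="created"):
    """Build one query string per window; `qualifier` is a hypothetical name."""
    return [
        f"{base} {qualifier}:{first.isoformat()}..{last.isoformat()}"
        for first, last in date_windows(start, end, days)
    ]
```

&lt;p&gt;Each returned string would be issued as its own search, and a window that still hits the cap could be split again with a smaller &lt;code&gt;days&lt;/code&gt; value.&lt;/p&gt;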

&lt;p&gt;Each source returns candidate repo URLs. The miner fetches the &lt;code&gt;SKILL.md&lt;/code&gt;, validates the YAML frontmatter, runs admission scoring (more on this below), categorizes by domain (Engineering / Security / Growth / etc. — 10 categories total), tags across ~100 orthogonal dimensions (language, framework, AI provider, cloud, integration type), and writes a static HTML page at &lt;code&gt;/skills/&amp;lt;slug&amp;gt;/&lt;/code&gt;.&lt;/p&gt;

&lt;p&gt;The miner is bounded: per-source caps prevent any one source from draining the GitHub API budget; every section runs inside a &lt;code&gt;_safe_section()&lt;/code&gt; try-block so a single broken endpoint can't kill the run.&lt;/p&gt;
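
&lt;p&gt;A minimal sketch of what that isolation could look like. Only the name &lt;code&gt;_safe_section&lt;/code&gt; comes from the miner; the body, the &lt;code&gt;run_sources&lt;/code&gt; helper, and the cap value of 500 are assumptions made up for this example.&lt;/p&gt;

```python
import logging
from contextlib import contextmanager

log = logging.getLogger("miner")


@contextmanager
def _safe_section(name, failures):
    """Run one crawl section; record any error instead of aborting the run."""
    try:
        yield
    except Exception as exc:  # deliberately broad: one bad endpoint must not kill the run
        log.warning("section %s failed: %s", name, exc)
        failures.append(name)


def run_sources(sources, cap_per_source=500):
    """`sources` maps a name to a zero-argument callable returning candidate URLs."""
    candidates, failures = [], []
    for name, fetch in sources.items():
        with _safe_section(name, failures):
            # Per-source cap: keep at most cap_per_source candidates from each source.
            candidates.extend(list(fetch())[:cap_per_source])
    return candidates, failures
```

&lt;p&gt;A source that raises ends up in &lt;code&gt;failures&lt;/code&gt; while the remaining sources still run, which is the property the paragraph above describes.&lt;/p&gt;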

&lt;p&gt;A full run takes about 4 hours. New skills appear on the live catalog the same day they're discovered.&lt;/p&gt;

&lt;h2&gt;Admission: content signals only, no popularity&lt;/h2&gt;

&lt;p&gt;This is the part I'm most opinionated about. &lt;strong&gt;Ranking can't be bought.&lt;/strong&gt; The moment a paid signal influences who appears in the catalog (or in what order), the value proposition collapses — nobody pays for "objective evaluation" when it isn't objective.&lt;/p&gt;

&lt;p&gt;So the catalog admits skills based on a content score derived from the &lt;code&gt;SKILL.md&lt;/code&gt; itself:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;
&lt;strong&gt;Anti-trigger discipline&lt;/strong&gt; — does the SKILL.md have a "when NOT to use" or "out of scope" section? That's a +4 per pattern, capped at +16. Strong negative-space marking is the single best signal that the author thought about edge cases.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Pricing / quota transparency&lt;/strong&gt; — does it document costs, rate limits, or expected API spend? +10.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Frontmatter depth&lt;/strong&gt; — beyond &lt;code&gt;name:&lt;/code&gt; and &lt;code&gt;description:&lt;/code&gt;, how many other fields are present (&lt;code&gt;model:&lt;/code&gt;, &lt;code&gt;tags:&lt;/code&gt;, &lt;code&gt;version:&lt;/code&gt;, &lt;code&gt;license:&lt;/code&gt;, &lt;code&gt;allowed-tools:&lt;/code&gt;, &lt;code&gt;metadata.*&lt;/code&gt;)? Capped at 10 distinct keys to prevent gaming.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Length × structure&lt;/strong&gt; — is the body substantive (&amp;gt;800 chars in &lt;code&gt;description:&lt;/code&gt;, multiple code blocks, headings)?&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Filler-phrase penalty&lt;/strong&gt; — &lt;code&gt;// TODO&lt;/code&gt;, &lt;code&gt;Lorem ipsum&lt;/code&gt;, generic templated phrases → minus 5.&lt;/li&gt;
&lt;/ul&gt;
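
&lt;p&gt;The signals above could be combined roughly like this. The weights (+4 per pattern capped at +16, +10, cap of 10 keys, minus 5) are the ones listed; the regex patterns and the one-point-per-key rule are guesses for illustration, and the length × structure signal is left out because no weight is given for it.&lt;/p&gt;

```python
import re

# Hypothetical patterns; the post does not list the real ones.
ANTI_TRIGGER = [r"when not to use", r"out of scope", r"do not use (this|when)", r"not intended for"]
PRICING = [r"rate limit", r"pricing", r"quota", r"api spend"]
FILLER = [r"//\s*todo", r"lorem ipsum"]


def count_hits(patterns, text):
    """Number of patterns that match at least once (case-insensitive)."""
    return sum(1 for p in patterns if re.search(p, text, re.IGNORECASE))


def admission_score(frontmatter, body):
    """`frontmatter` is a dict of parsed YAML keys; `body` is the Markdown after it."""
    score = 0
    # +4 per anti-trigger pattern, capped at +16.
    score += min(4 * count_hits(ANTI_TRIGGER, body), 16)
    # +10 if costs, rate limits, or API spend are documented.
    if count_hits(PRICING, body):
        score += 10
    # Keys beyond name/description, capped at 10; +1 per key is an assumption.
    extra = [k for k in frontmatter if k not in ("name", "description")]
    score += min(len(extra), 10)
    # Flat -5 if any filler phrase appears.
    if count_hits(FILLER, body):
        score -= 5
    return score
```

&lt;p&gt;Note what the function does not take as input: stars, forks, or follower counts never reach it, so they cannot influence the result.&lt;/p&gt;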

&lt;p&gt;The score never weighs stars, forks, install counts, GitHub follower count, or any other popularity signal. &lt;strong&gt;A skill written by a developer with 0 GitHub followers and a clear anti-trigger section beats a flashy skill by a 50k-follower influencer that's just frontmatter-and-vibes.&lt;/strong&gt; That's the bar.&lt;/p&gt;

&lt;p&gt;For ranking inside the desktop app's Pro tier — a separate evaluation layer — the formula is the same content-only structural score plus frontmatter-completeness, rescaled to [50, 100]. Still no popularity signals.&lt;/p&gt;
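
&lt;p&gt;For illustration, a plain linear rescale onto [50, 100] might look like the following. Only the target interval is stated above; the linear mapping, the clamping, and the raw score bounds passed in are assumptions.&lt;/p&gt;

```python
def rescale(raw, raw_min, raw_max, lo=50.0, hi=100.0):
    """Linearly map `raw` from [raw_min, raw_max] onto [lo, hi], clamping outliers."""
    if raw_max == raw_min:
        return lo  # degenerate range: nothing to spread out
    t = (raw - raw_min) / (raw_max - raw_min)
    t = max(0.0, min(1.0, t))
    return lo + t * (hi - lo)
```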

&lt;p&gt;This costs me about 30% of what an unconstrained "rank by stars" catalog would surface. I'm OK with that trade.&lt;/p&gt;

&lt;h2&gt;What surprised me&lt;/h2&gt;

&lt;p&gt;&lt;strong&gt;1. The catalog is dominated by a handful of prolific authors.&lt;/strong&gt; One contributor has 3,446 admitted skills (yes, really). The top 25 authors account for ~30% of the catalog. There's a Pareto distribution underneath the long tail.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;2. Sales-category skills score highest on content quality.&lt;/strong&gt; Counter-intuitive — I expected Engineering or Security to be most polished. Turns out sales-focused skill authors over-index on structure (anti-trigger sections, scope discipline, pricing transparency) because that's their professional habit. Engineering authors more often skip the "when NOT to use" section because they assume it's obvious.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;3. Vendor-side adoption is still 0.&lt;/strong&gt; The catalog has zero skills with &lt;code&gt;author_url&lt;/code&gt; pointing at anthropic.com, openai.com, or any other large AI vendor. Every entry is independent. The ecosystem is fully community-driven.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;4. The SKILL.md format is leaking sideways.&lt;/strong&gt; I found skills in repos tagged &lt;code&gt;cline-skills&lt;/code&gt;, &lt;code&gt;cursor-rules&lt;/code&gt;, &lt;code&gt;aider-skills&lt;/code&gt;, &lt;code&gt;windsurf-rules&lt;/code&gt;. The format is becoming a portable agent-skill standard, not just a Claude Code thing. The catalog admits these too: they're SKILL.md files, and which agent loads them is the user's choice.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;5. The biggest discovery surface isn't GitHub code search.&lt;/strong&gt; It's the stargazer graph. When a SKILL.md repo hits a few hundred stars, 30% or more of the people who starred it have their own SKILL.md somewhere in their account. Mining the graph yields skills the code-search queries don't find.&lt;/p&gt;

&lt;h2&gt;What's free&lt;/h2&gt;

&lt;p&gt;Everything the catalog produces is open:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;
&lt;strong&gt;Public catalog&lt;/strong&gt; at &lt;code&gt;https://claudskills.com/&lt;/code&gt; — browseable + searchable.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Open dataset&lt;/strong&gt; at &lt;a href="https://github.com/claudskills/catalog-public" rel="noopener noreferrer"&gt;github.com/claudskills/catalog-public&lt;/a&gt; — daily refresh in 6 formats (JSON, NDJSON, CSV, Parquet, Atom feed, README). CC BY 4.0.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;HuggingFace mirror&lt;/strong&gt; at &lt;a href="https://huggingface.co/datasets/claudskills/skills" rel="noopener noreferrer"&gt;huggingface.co/datasets/claudskills/skills&lt;/a&gt; — same data, parquet-native, suitable for LLM training.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Public REST API&lt;/strong&gt; at &lt;code&gt;https://claudskills.com/api/v1/&lt;/code&gt; — read-only, no auth, CORS-open, edge-cached. &lt;a href="https://claudskills.com/api/v1/openapi.json" rel="noopener noreferrer"&gt;OpenAPI 3.1 spec&lt;/a&gt; covers every endpoint. Paginated &lt;code&gt;/skills&lt;/code&gt;, single-skill &lt;code&gt;/skills/&amp;lt;slug&amp;gt;&lt;/code&gt;, &lt;code&gt;/categories&lt;/code&gt;, &lt;code&gt;/tags&lt;/code&gt;, &lt;code&gt;/stats&lt;/code&gt;. The catalog API itself is ~300 LOC of Cloudflare Worker code; the heavy lifting is the daily miner.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Embeddable skill card&lt;/strong&gt; at &lt;code&gt;https://claudskills.com/embed/&amp;lt;slug&amp;gt;.js&lt;/code&gt; — one-line &lt;code&gt;&amp;lt;script&amp;gt;&lt;/code&gt; tag that injects a styled card into any blog post or doc page. The card you'd drop into your own writeup of a favorite skill.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Shields.io-style badge&lt;/strong&gt; at &lt;code&gt;https://claudskills.com/badge/&amp;lt;slug&amp;gt;.svg&lt;/code&gt; — for skill authors to drop into their GitHub READMEs.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Daily Skill-of-the-Day archive&lt;/strong&gt; at &lt;code&gt;/sotd/YYYY-MM-DD/&lt;/code&gt; — every UTC day picks one skill via a date-hash that stays consistent across mobile push, social posts, and the web.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Per-category, per-tag, per-author, and per-use-case landing pages&lt;/strong&gt; — about 2,800 hub pages total covering the catalog from every browsing angle.&lt;/li&gt;
&lt;/ul&gt;
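
&lt;p&gt;The Skill-of-the-Day pick is the easiest of these to sketch. Assuming a SHA-256 hash of the ISO date and a sorted slug list (both are illustrative choices, and &lt;code&gt;skill_of_the_day&lt;/code&gt; is a made-up name), any client that runs the same function over the same list gets the same answer without coordinating:&lt;/p&gt;

```python
import hashlib
from datetime import date


def skill_of_the_day(slugs, day):
    """Deterministically pick one slug for a given UTC date."""
    ordered = sorted(slugs)  # stable order so every client indexes the same list
    digest = hashlib.sha256(day.isoformat().encode("utf-8")).digest()
    index = int.from_bytes(digest[:8], "big") % len(ordered)
    return ordered[index]
```

&lt;p&gt;One caveat with this approach: the pick only stays stable if every caller uses the same snapshot of the slug list, so a growing catalog would need to pin the list per day.&lt;/p&gt;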

&lt;h2&gt;What I'd change if starting over&lt;/h2&gt;

&lt;p&gt;A few things I learned the hard way:&lt;/p&gt;

&lt;ol&gt;
&lt;li&gt;&lt;p&gt;&lt;strong&gt;Build the public dataset first, the website second.&lt;/strong&gt; I spent the first two months making the website nice. The dataset would have driven more usage faster — researchers and tool-builders pick up CC BY 4.0 data within days of finding it; consumer-facing UIs take months to build word-of-mouth.&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;&lt;strong&gt;Cloudflare Workers + R2 + Netlify together is more reliable than any one of them.&lt;/strong&gt; The site has 64,000+ per-skill HTML pages, which would blow Netlify's deploy-prep budget at scale. So per-skill HTML files live in Cloudflare R2 with a Netlify rewrite to serve them from &lt;code&gt;claudskills.com/skills/&amp;lt;slug&amp;gt;/&lt;/code&gt;. API + embed + badge endpoints are Cloudflare Workers bound to the same domain. The homepage + static pages are direct from Netlify. Each layer doing what it's best at.&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;&lt;strong&gt;Anti-popularity signals were the hardest decision and the most important one.&lt;/strong&gt; Every time I evaluate a candidate change to the ranking algorithm, "would skill authors pay to influence this?" is the test. If yes, the change doesn't ship. The discipline pays off when you have a Pro subscription product — it's "pay $9/month for the multi-signal Quality Score in the desktop app," and there's nothing for me to defend about why the score is honest. It's honest by construction.&lt;/p&gt;&lt;/li&gt;
&lt;/ol&gt;

&lt;h2&gt;What's next&lt;/h2&gt;

&lt;p&gt;The next quarter is about distribution — the catalog exists, now developers need to find it. The roadmap:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;25 awesome-list PRs (live next week)&lt;/li&gt;
&lt;li&gt;A weekly catalog-growth report cross-posted to dev.to / Hashnode / Medium / LinkedIn&lt;/li&gt;
&lt;li&gt;Embed cards in third-party blog posts (the API is ready; the inbound demand will tell us if the embed surface gets traction)&lt;/li&gt;
&lt;li&gt;iOS and Android companion apps for discovery (already in App Store review at the time of writing)&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;If you've written a SKILL.md, it's probably already in the catalog. Search for your repo name at &lt;a href="https://claudskills.com/" rel="noopener noreferrer"&gt;claudskills.com&lt;/a&gt;. If you haven't written one yet, the catalog will pick it up within 24 hours of your pushing it to a public GitHub repo. If you want to fast-track it, there's a submit form on the homepage.&lt;/p&gt;

&lt;p&gt;If you're a researcher, a tool-builder, or an LLM-pipeline operator who wants to ingest the data: the &lt;a href="https://github.com/claudskills/catalog-public" rel="noopener noreferrer"&gt;public dataset&lt;/a&gt; refreshes daily, and the &lt;a href="https://claudskills.com/api/" rel="noopener noreferrer"&gt;API&lt;/a&gt; is rate-limit-free for normal use. Build something cool — I'd love to hear about it.&lt;/p&gt;

&lt;p&gt;The catalog is at &lt;a href="https://claudskills.com/" rel="noopener noreferrer"&gt;claudskills.com&lt;/a&gt;. The dataset is at &lt;a href="https://github.com/claudskills/catalog-public" rel="noopener noreferrer"&gt;github.com/claudskills/catalog-public&lt;/a&gt;. Comments + questions welcome.&lt;/p&gt;




&lt;p&gt;&lt;em&gt;ClaudSkills is an independent community catalog. Claude™ is a trademark of Anthropic PBC; ClaudSkills is not affiliated with, endorsed by, or sponsored by Anthropic.&lt;/em&gt;&lt;/p&gt;

</description>
      <category>claude</category>
      <category>ai</category>
      <category>opensource</category>
      <category>devtools</category>
    </item>
  </channel>
</rss>
