nasuy

The most valuable AI agent skills are buried in GitHub repos

Most agent skills people create are rewrites of what the LLM already knows: Claude Code's skill-creator, a "save this as a skill" request at the end of a chat. These tools are easy to use, but the output rarely adds real capability, because the LLM can already do what the skill describes.

Skills become valuable when they encode procedural knowledge. Not facts. Not prompts. Procedures that someone found through debugging and iteration. That kind of knowledge is buried in open-source repositories on GitHub. In March 2026, a research team at East China Normal University proposed a three-stage pipeline that extracts skills from OSS repos and converts them to the standard SKILL.md format.

Why writing skills by hand does not scale

There are three ways to build agent skills.

Manual creation by experts produces high quality. Anthropic officially adopted this approach, and OpenAI's Codex also supports the SKILL.md format. But it does not scale: each skill requires domain knowledge and testing time, and when an agent needs hundreds of skills, manual creation breaks down.

Autonomous discovery by the agent is the second path. EvoSkill, covered in a previous post, takes this approach. It scales, but semantic consistency is hard to maintain. The quality of auto-generated skills varies widely.

OSS mining is the third path and the focus of this paper. Agent repositories on GitHub contain procedures that someone spent time debugging and iterating on. The framework finds those procedures automatically and converts them to standard format. It reuses existing human knowledge, so semantic consistency is higher than generating from scratch.

How the pipeline works, and what it found

The pipeline has three stages.

Repository structure analysis comes first. Tools like repo2AI convert the entire codebase to Markdown and map core scripts and helper modules in a hierarchy.

Semantic skill identification follows. Code modules are converted to dense vectors. A bi-encoder calculates cosine similarity to narrow down candidates. A cross-encoder then refines the ranking. Only modules that pass four promotion criteria become skill candidates. The criteria are recurrence (appears in multiple contexts), verified (works and is documented), non-obviousness (required domain expertise to discover), and generalizability (can be parameterized for other contexts). Modules that fail any criterion are not promoted. This filter prevents both extremes of "make everything a skill" and "make nothing a skill."
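
The two retrieval steps can be sketched as follows. Real implementations would use embedding models (a bi-encoder for cheap similarity, a cross-encoder for pairwise scoring); here the vectors and the `cross_score` function are stand-ins I supply so the control flow is runnable, not the paper's models.

```python
# Hedged sketch of the stage-2 retrieval funnel: a cheap bi-encoder pass
# (cosine similarity over precomputed vectors) narrows candidates, then a
# costlier cross-encoder pass reranks the shortlist.
import math

def cosine(a: list[float], b: list[float]) -> float:
    dot = sum(x * y for x, y in zip(a, b))
    na = math.sqrt(sum(x * x for x in a))
    nb = math.sqrt(sum(y * y for y in b))
    return dot / (na * nb) if na and nb else 0.0

def shortlist(query_vec, module_vecs: dict, k: int = 2) -> list[str]:
    """Bi-encoder step: rank modules by cosine similarity, keep top k."""
    ranked = sorted(module_vecs.items(),
                    key=lambda kv: cosine(query_vec, kv[1]), reverse=True)
    return [name for name, _ in ranked[:k]]

def rerank(query: str, candidates: list[str], cross_score) -> list[str]:
    """Cross-encoder step: a pairwise scorer refines the shortlist order."""
    return sorted(candidates, key=lambda c: cross_score(query, c), reverse=True)
```

The funnel matters for cost: the bi-encoder scores every module once against cached vectors, while the cross-encoder only sees the survivors.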

SKILL.md conversion is the final stage. Identified patterns are standardized into three layers. YAML frontmatter for metadata. Markdown body for procedures. An assets directory for scripts and templates. Hardcoded paths and API keys are removed to make the skill portable.
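
A minimal sketch of that final stage, under my own assumptions: the regexes for scrubbing keys and home-directory paths are illustrative, not the paper's sanitizer, and the helper names are invented. The `name`/`description` frontmatter fields do match the published SKILL.md format.

```python
# Sketch of stage 3: wrap an extracted procedure in the three-layer
# SKILL.md layout (YAML frontmatter, Markdown body, assets section) and
# scrub obvious hardcoded secrets and absolute paths for portability.
import re

def sanitize(text: str) -> str:
    """Replace hardcoded API keys and user paths with placeholders."""
    text = re.sub(r"(?i)(api[_-]?key\s*[=:]\s*)\S+", r"\1<API_KEY>", text)
    text = re.sub(r"(/home/\w+|/Users/\w+)", "<PROJECT_ROOT>", text)
    return text

def to_skill_md(name: str, description: str, body: str, assets=()) -> str:
    parts = ["---", f"name: {name}", f"description: {description}", "---",
             "", sanitize(body)]
    if assets:
        parts += ["", "## Assets"] + [f"- assets/{a}" for a in assets]
    return "\n".join(parts)
```

Usage: `to_skill_md("render-theorem", "Render a Manim scene", "export API_KEY=... then run render.py")` yields a portable file with the key masked.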

The team tested this on two repositories. TheoremExplainAgent from TIGER AI Lab generates Manim animations that explain STEM theorems using a 2-agent system (Planner and Coding Agent). Code2Video from Show Lab at the National University of Singapore, accepted at the NeurIPS 2025 Deep Learning for Code Workshop, generates educational videos using a 3-agent system (Planner, Coder, Critic). The Code2Video paper reports a 40-point improvement in knowledge transfer efficiency when comparing its full pipeline to a baseline code generation model: TeachQuiz scores went from about 40 to about 80 (arXiv:2510.01174).

For managing large skill libraries, the paper proposes SkillNet. It is an ontology-based structure that connects skills through relationships like "is a subset of" and "requires output from." The paper cites a 30% reduction in execution steps and 40% improvement in task reward, though the experimental conditions behind these numbers are limited.
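
The ontology idea can be sketched as a graph of typed edges. The edge name below comes from the relationships the paper describes ("requires output from"); the data structure and resolution logic are my assumptions about how an agent might consume it.

```python
# Minimal sketch of a SkillNet-style dependency walk: skills are nodes,
# "requires output from" edges link them, and resolving a skill returns
# an execution order with prerequisites first (a depth-first topological
# ordering).
def resolve_order(skill: str, requires: dict, seen=None) -> list:
    """Return skills in execution order: dependencies before `skill`."""
    seen = seen if seen is not None else []
    for dep in requires.get(skill, []):
        if dep not in seen:
            resolve_order(dep, requires, seen)
    if skill not in seen:
        seen.append(skill)
    return seen
```

The claimed reduction in execution steps would come from exactly this kind of lookup: the agent loads only the chain a task needs instead of searching the whole library.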

Caveats

This paper is a preprint, submitted March 12, 2026. It has not been peer-reviewed. The 40-point improvement in knowledge transfer efficiency comes from the Code2Video paper (arXiv:2510.01174), not from the mining framework itself. The framework was tested on only two education-focused repositories. Its applicability to other domains has not been verified. The framework code has not been open-sourced.

On security, a survey paper (arXiv:2602.12430) found vulnerabilities in 26.1% of community-distributed skills. Data theft accounts for 13.3% and privilege escalation for 11.8%. OSS mining amplifies this risk. The paper proposes a 4-stage verification pipeline from static analysis (G1) to permission verification (G4), but no production deployment has been reported.
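
A G1-style static gate might look like the sketch below. The pattern list is entirely my own illustration of the idea; a production gate would parse the AST and check declared permissions rather than grep for strings.

```python
# Hedged sketch of a G1 static-analysis gate: flag skill assets that spawn
# shells, evaluate dynamic code, hit the network, or touch credentials.
# The SUSPICIOUS list is illustrative, not from the survey.
import re

SUSPICIOUS = [
    r"\bsubprocess\b", r"\bos\.system\b", r"\beval\(", r"\bexec\(",
    r"requests\.(get|post)", r"\.ssh/", r"AWS_SECRET",
]

def g1_static_scan(source: str) -> list[str]:
    """Return the suspicious patterns found in a skill's script."""
    return [p for p in SUSPICIOUS if re.search(p, source)]
```

Anything flagged here would be held back for the later stages (through permission verification at G4) rather than shipped to the agent.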

The SKILL.md specification was published as an open standard by Anthropic on December 18, 2025. OpenAI has documented compatibility in both Codex and its API. The output format is becoming an industry standard.

Conclusion

The skill mines are on GitHub. Manual creation does not scale. Autonomous discovery is inconsistent. OSS mining reuses existing human knowledge, making it a credible third path for skill acquisition.

The four promotion criteria from this paper (recurrence, verified, non-obviousness, and generalizability) work as a practical filter for deciding what should become a skill. You do not need the full pipeline to use them. Start by looking at the OSS repositories your team already uses. There may be procedural knowledge buried in the code that is worth extracting.
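
The criteria even fit in a few lines of code. The booleans are judgment calls a reviewer supplies; the function only enforces the paper's rule that failing any one criterion blocks promotion. The class and field names are mine.

```python
# The four promotion criteria as a runnable checklist: a candidate module
# becomes a skill only if all four hold.
from dataclasses import dataclass

@dataclass
class Candidate:
    name: str
    recurrent: bool      # appears in multiple contexts
    verified: bool       # works and is documented
    non_obvious: bool    # took domain expertise to discover
    generalizable: bool  # can be parameterized for other contexts

def should_promote(c: Candidate) -> bool:
    return all([c.recurrent, c.verified, c.non_obvious, c.generalizable])
```

Run your team's utility modules through this mentally and most will fail non-obviousness, which is the point: the filter keeps trivial wrappers out of the skill library.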
