How I Manage 200+ Agent Skills: L0-L4 Classification + YAML Templates + Python Toolchain
When your Agent project balloons past 200 Skills, "it works" and "it's manageable" are two very different things. In this post, I'll walk you through an open-source governance framework—skill-framework—that uses a five-tier classification model, standardized templates, and an automated toolchain to take Agent skills from wild-west chaos to engineering-grade ops.
1. The Problem: Why Do You Even Need a Skill Governance Framework?
The Agent ecosystem is repeating the same mistake microservices made—grow wild early, lose control later.
Classic symptoms:
| Symptom | What It Looks Like | Root Cause |
|---|---|---|
| Hard to locate | "Where's that credit-check Skill again?" | No unified classification, skills piled flat |
| Dependency chaos | Tweak one atomic Skill, 3 scenario Skills break | Dependencies spread by word of mouth, no explicit declarations |
| Format drift | Same team, different field names and structures | No enforced templates, convention is voluntary |
| Production incidents | New Skill ships with zero security audit | No quality gate, no checklist |
| Reuse deadlock | Project A wrote a Skill, Project B has no idea it exists | No industry blueprints, start from scratch every time |
At 10 Skills, you can keep it all in your head. At 50, you patch with docs. At 200+—you need an engineering framework.
skill-framework exists for exactly this: https://github.com/yuzhaopeng-up/skill-framework
2. The Core: L0-L4 Five-Tier Classification Model
The foundation of the whole framework is a five-tier model. Each tier has a clear responsibility boundary and dependency direction—higher tiers depend on lower ones, lower tiers never know about higher ones.
┌───────────────────────────────────────────────┐
│ L4  Multi-Agent      Agent team orchestration │
├───────────────────────────────────────────────┤
│ L3  Scenario         Business composition     │
├───────────────────────────────────────────────┤
│ L2  Gateway/Routing  Intent routing           │
├───────────────────────────────────────────────┤
│ L1  Base Skills      Atomic skills            │
├───────────────────────────────────────────────┤
│ L0  Infrastructure   DB/API connectors        │
└───────────────────────────────────────────────┘
Tier Breakdown
| Tier | Name | Responsibility | Scope | Typical Examples |
|---|---|---|---|---|
| L0 | Infrastructure | Connect to external systems, wrap data source access | DB connectors, API clients, file I/O | mysql-connector, redis-client, oss-file-handler |
| L1 | Base Skill | Atomic operations, single responsibility, independently executable | Info extraction, data query, report generation, security checks | info-extractor, data-analyst, report-generator, security-guard |
| L2 | Gateway/Routing | Accept natural language input, identify intent, route to the right L1 Skill | Intent recognition, permission checks, query dispatch | nl2-query, l3-gw-01 (data query gateway) |
| L3 | Scenario (Ceiling) | Orchestrate multiple L1/L2 Skills into end-to-end business flows | Multi-phase pipelines, business composition | scoring-engine (opportunity scoring), evidence-chain (evidence chain analysis) |
| L4 | Multi-Agent | Spin up independent sub-Agent teams, role isolation, parallel collaboration | Team orchestration, task scheduling | agent-teams-orchestrator, l7-arkclaw-01 (enterprise ops assistant) |
Key constraints:
- One-way dependency: L4 → L3 → L2 → L1 → L0, reverse dependencies strictly forbidden
- L1 is stateless: Base skills must be pure-function-style, no session state
- L2 is stateful: Gateway tier manages session context and routing tables
- L3 orchestrates, doesn't execute: Scenario tier only schedules; the actual work is delegated down to L1
- L4 runs in isolation: Each sub-Agent has independent context, data passes via structured JSON
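The one-way dependency constraint is easy to check mechanically. Here is a minimal sketch, assuming a dependency must sit on a strictly lower tier than the Skill that declares it. The function name and the sample data are hypothetical and are not taken from the skill-framework codebase.

```python
# Hypothetical helper; the real toolchain may implement this check differently.
TIER_ORDER = {"L0": 0, "L1": 1, "L2": 2, "L3": 3, "L4": 4}


def find_tier_violations(levels, dependencies):
    """Return (skill, dependency) pairs that break the one-way rule.

    levels: mapping of skill name -> tier label ("L0".."L4")
    dependencies: mapping of skill name -> list of skill names it depends on
    """
    violations = []
    for skill, deps in dependencies.items():
        for dep in deps:
            # Legal only if the dependency is on a strictly lower tier.
            if TIER_ORDER[levels[dep]] >= TIER_ORDER[levels[skill]]:
                violations.append((skill, dep))
    return violations


# Illustrative data only; the edges below are made up for the example.
levels = {
    "mysql-connector": "L0",
    "info-extractor": "L1",
    "scoring-engine": "L3",
    "evidence-chain": "L3",
}
dependencies = {
    "info-extractor": ["mysql-connector"],                   # L1 -> L0, fine
    "scoring-engine": ["info-extractor", "evidence-chain"],  # second is same-tier
    "mysql-connector": ["info-extractor"],                   # L0 -> L1, reversed
}
violations = find_tier_violations(levels, dependencies)
```

With this data, the same-tier edge and the reversed edge are reported, while the legal L1 → L0 and L3 → L1 edges pass.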
The biggest value of this model isn't theoretical completeness—it's that it assigns each of the 208 Skills to exactly one tier. When you need to find a Skill, first pin down the tier, then narrow by domain, and you're looking at 10–20 candidates max.
3. YAML Templates: 3 Ready-to-Copy Specifications
skill-framework provides three YAML templates covering the three most common Skill shapes:
| Template | Target Tier | File | Key Feature |
|---|---|---|---|
| L1 Base Skill | L1 Base Skill tier | templates/l1-base-skill.yaml | Single responsibility, declares inputs/outputs and trigger keywords |
| L2 Gateway Skill | L2 Gateway/Routing tier | templates/l2-gateway-skill.yaml | Routing table + permission checks + downstream dependency declarations |
| L3 Ceiling Skill | L3 Scenario tier | templates/l3-ceiling-skill.yaml | Multi-phase orchestration + structured JSON data passing |
L1 Base Skill Template Example
# templates/l1-base-skill.yaml
skill_name: ""              # Required: skill name, kebab-case
skill_level: L1             # Required: fixed at L1
version: "1.0.0"            # Required: semantic version
description: ""             # Required: one-line description
trigger_keywords: []        # Required: trigger keyword list
  # - "keyword1"
  # - "keyword2"
inputs:                     # Required: input parameter definitions
  - name: ""                # Parameter name
    type: ""                # Type: string/number/boolean/json/file
    required: true          # Is it required?
    description: ""         # Parameter description
outputs:                    # Required: output definitions
  - name: ""
    type: ""
    description: ""
dependencies:               # Dependency declarations (L0 only)
  - skill_name: ""
    version: ">=1.0.0"
    usage: ""               # What this dependency is used for
execution:                  # Execution spec
  type: prompt              # prompt | python | hybrid
  timeout_seconds: 300      # Timeout
  retry_policy:
    max_retries: 2
    backoff: exponential
security:                   # Security declarations
  data_access_scope: []     # Data access scope
  sensitive_fields: []      # Sensitive field list
  audit_logging: true       # Enable audit logging?
quality:                    # Quality metrics
  min_accuracy: 0.85        # Minimum accuracy
  test_cases: []            # Test case paths
Field design philosophy:
- skill_level is mandatory: the toolchain uses it for dependency legality checks
- dependencies may only point to lower tiers, so an L1 Skill can only depend on L0
- The security section is required for the quality gate; skill-lint blocks any Skill that is missing an item
- The quality section is currently declarative; future versions will plug into automated test frameworks
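To make these field rules concrete, here is a small sketch of a checker that works on an already-parsed declaration (a plain dict, as a YAML loader would return). The function name, the exact set of required keys, and the choice to treat a security key as "missing" only when it is absent are assumptions for illustration; skill-lint's actual rules may differ.

```python
# Hypothetical checker over a parsed L1 declaration; not the skill-lint source.
REQUIRED_TOP_LEVEL = [
    "skill_name", "skill_level", "version", "description",
    "trigger_keywords", "inputs", "outputs",
]
SECURITY_KEYS = ["data_access_scope", "sensitive_fields", "audit_logging"]


def check_l1_declaration(decl):
    """Return a list of human-readable problems; an empty list means it passes."""
    problems = []
    for key in REQUIRED_TOP_LEVEL:
        if decl.get(key) in (None, "", []):
            problems.append(f"missing or empty required field: {key}")
    if decl.get("skill_level") != "L1":
        problems.append("skill_level must be L1 for this template")
    security = decl.get("security") or {}
    for key in SECURITY_KEYS:
        # Assumption: a key counts as missing only when it is not declared at all.
        if key not in security:
            problems.append(f"security.{key} is not declared")
    return problems


declaration = {
    "skill_name": "info-extractor",
    "skill_level": "L1",
    "version": "2.1.0",
    "description": "Extracts structured fields from unstructured text",
    "trigger_keywords": ["extract fields"],
    "inputs": [{"name": "raw_text", "type": "string", "required": True}],
    "outputs": [{"name": "fields", "type": "json"}],
    "security": {"data_access_scope": [], "audit_logging": True},
}
problems = check_l1_declaration(declaration)
```

Here the only problem reported is the undeclared security.sensitive_fields entry.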
Key Differences in L2 and L3 Templates
L2 Gateway adds:
routing_table:                  # L2-only: routing table
  - intent: ""                  # User intent
    target_skill: ""            # Target Skill to route to
    confidence_threshold: 0.8   # Confidence threshold
permission_check:               # L2-only: permission checks
  enabled: true
  whitelist: []
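One plausible reading of the routing table is "route only when the intent classifier clears the entry's threshold". The sketch below assumes that semantics; the function name and intent labels are hypothetical, and permission_check is left out because the post does not say what the whitelist contains.

```python
# Hypothetical routing logic; the framework's gateway may behave differently.
def route(routing_table, intent, confidence):
    """Return the target Skill for an intent, or None if it should not be routed."""
    for entry in routing_table:
        if entry["intent"] == intent:
            if confidence >= entry["confidence_threshold"]:
                return entry["target_skill"]
            return None  # known intent, but the classifier is not confident enough
    return None  # unknown intent


routing_table = [
    {"intent": "extract_fields", "target_skill": "info-extractor", "confidence_threshold": 0.8},
    {"intent": "query_data", "target_skill": "data-analyst", "confidence_threshold": 0.9},
]
target = route(routing_table, "extract_fields", 0.85)
```

Returning None for both low-confidence and unknown intents leaves the fallback decision (ask the user, use a default Skill) to the caller.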
L3 Ceiling adds:
phases:                     # L3-only: multi-phase orchestration
  - phase: 1
    name: ""
    skill: ""               # L1/L2 Skill being called
    input_mapping: {}       # Input mapping
    output_key: ""          # Key to store output
  - phase: 2
    name: ""
    skill: ""
    input_mapping: {}
    output_key: ""
orchestration:              # L3-only: orchestration strategy
  mode: sequential          # sequential | parallel | conditional
  failure_policy: stop      # stop | skip | retry
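The phases section reads naturally as a small pipeline over a shared context. Below is a sketch of the sequential mode with the stop and skip failure policies, assuming input_mapping maps a parameter name to a key in that context and that the initial request is stored under the key "input". All names here are illustrative, and retry is omitted.

```python
# Hypothetical sequential runner; not the framework's orchestration engine.
def run_phases(phases, skills, initial_input, failure_policy="stop"):
    """Run phases in order and return the shared context dict."""
    context = {"input": initial_input}
    for phase in sorted(phases, key=lambda p: p["phase"]):
        try:
            # input_mapping: parameter name -> key in the shared context
            kwargs = {param: context[src] for param, src in phase["input_mapping"].items()}
            context[phase["output_key"]] = skills[phase["skill"]](**kwargs)
        except Exception as exc:
            if failure_policy == "stop":
                raise RuntimeError(f"phase {phase['phase']} ({phase['name']}) failed") from exc
            continue  # "skip": leave the output key unset and move on
    return context


def extract(raw_text):
    """Stand-in for an L1 extraction Skill."""
    return {"company": raw_text.split()[0]}


def score(fields):
    """Stand-in for an L1 scoring Skill."""
    return len(fields["company"])


skills = {"info-extractor": extract, "data-analyst": score}
phases = [
    {"phase": 1, "name": "extract", "skill": "info-extractor",
     "input_mapping": {"raw_text": "input"}, "output_key": "fields"},
    {"phase": 2, "name": "score", "skill": "data-analyst",
     "input_mapping": {"fields": "fields"}, "output_key": "result"},
]
context = run_phases(phases, skills, "Acme signs new contract")
```

Because every phase only reads and writes named keys in the context, the data passed between phases stays plain, JSON-serializable structure.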
4. Python Toolchain: Scan → Lint → Backfill Pipeline
skill-framework bundles 3 Python tools that form an automated pipeline from discovery to compliance:
inventory-scan ──→ skill-lint ──→ backfill-frontmatter
(scan & build inventory) (compliance check) (backfill frontmatter)
Tool 1: inventory-scan — Scan & Build Inventory
Scans all Skills under a directory, auto-detects tiers, extracts metadata, and generates a unified inventory.
# Basic: scan all Skills in your project
python tools/inventory-scan.py --root ./skills --output ./output
# With tier validation: auto-detect level tag legality
python tools/inventory-scan.py \
--root ./skills \
--output ./output \
--validate-levels
Outputs:
- unified_skill_inventory.json: complete inventory of 208 Skills (structured JSON)
- unified_skill_inventory.csv: the same inventory in tabular form (easy Excel filtering)
- skills_dependencies.json: Skill dependency graph
Inventory JSON structure example:
{
  "scan_timestamp": "2026-07-01T10:00:00Z",
  "total_skills": 208,
  "by_level": {
    "L0": 12,
    "L1": 86,
    "L2": 24,
    "L3": 58,
    "L4": 28
  },
  "skills": [
    {
      "name": "info-extractor",
      "level": "L1",
      "version": "2.1.0",
      "description": "Extracts structured fields from unstructured text",
      "trigger_keywords": ["information extraction", "extract fields", "structured"],
      "dependencies": ["mysql-connector@L0"],
      "file_path": "skills/info-extractor/SKILL.md"
    }
  ]
}
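Once the inventory exists, the "pin down the tier, then narrow" lookup is a few lines of Python. This is a sketch against the JSON shape shown above; find_skills is a hypothetical helper, and the inline sample stands in for the real unified_skill_inventory.json.

```python
import json

# Hypothetical lookup helper over the inventory structure shown in this post.
def find_skills(inventory, level=None, keyword=None):
    """Return names of Skills matching a tier and/or a case-insensitive keyword."""
    matches = []
    for skill in inventory["skills"]:
        if level and skill["level"] != level:
            continue
        haystack = " ".join(
            [skill["name"], skill["description"], *skill["trigger_keywords"]]
        ).lower()
        if keyword and keyword.lower() not in haystack:
            continue
        matches.append(skill["name"])
    return matches


# Trimmed sample data; in practice you would json.load() the generated file.
inventory = json.loads("""
{
  "skills": [
    {"name": "mysql-connector", "level": "L0",
     "description": "MySQL access wrapper", "trigger_keywords": ["mysql"]},
    {"name": "info-extractor", "level": "L1",
     "description": "Extracts structured fields from unstructured text",
     "trigger_keywords": ["extract fields"]}
  ]
}
""")
l1_extractors = find_skills(inventory, level="L1", keyword="extract")
```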
Tool 2: skill-lint — YAML Compliance Checker
Checks each Skill's YAML declarations against the template spec and outputs a violation report.
# Check a single Skill
python tools/skill-lint.py --target ./skills/info-extractor
# Batch check the whole project
python tools/skill-lint.py --root ./skills --template-dir ./templates
# Strict mode: treat Warnings as Errors
python tools/skill-lint.py --root ./skills --strict
Sample lint rules:
| Rule ID | Level | What It Checks |
|---|---|---|
| L001 | Error | skill_level must be one of L0–L4 |
| L002 | Error | Dependencies must be at a lower tier than the current Skill |
| L003 | Error | security section cannot be empty |
| L004 | Warning | Recommend adding description for every input |
| L005 | Error | L3 Skills must have a phases section |
| L006 | Warning | trigger_keywords should have at least 3 entries |
Output example:
[ERROR] L002: skills/scoring-engine/skill.yaml
  → dependency "evidence-chain" is L3, same level as current skill (L3)
  → Suggestion: L3 skills should depend on L1/L2, not other L3 skills directly
[WARNING] L004: skills/info-extractor/skill.yaml
  → Input "raw_text" missing description
  → Suggestion: Add description field for better discoverability
Scan complete: 208 skills checked, 3 errors, 7 warnings
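How rule levels interact with --strict can be sketched in a few lines. The rule levels below come from the table above; the summarize function, the findings format, and the "non-zero exit code when there are errors" convention are assumptions for illustration rather than documented skill-lint behavior.

```python
# Rule levels as listed in the lint rules table.
RULE_LEVELS = {
    "L001": "error", "L002": "error", "L003": "error",
    "L004": "warning", "L005": "error", "L006": "warning",
}


def summarize(findings, strict=False):
    """findings: list of (rule_id, path). Returns (errors, warnings, exit_code)."""
    errors = warnings = 0
    for rule_id, _path in findings:
        level = RULE_LEVELS[rule_id]
        if strict and level == "warning":
            level = "error"  # strict mode promotes warnings to errors
        if level == "error":
            errors += 1
        else:
            warnings += 1
    # Assumed convention: exit code 1 when any error remains, else 0.
    return errors, warnings, (1 if errors else 0)


findings = [
    ("L002", "skills/scoring-engine/skill.yaml"),
    ("L004", "skills/info-extractor/skill.yaml"),
    ("L006", "skills/info-extractor/skill.yaml"),
]
normal = summarize(findings)
strict = summarize(findings, strict=True)
```

In a CI job, the practical effect of strict mode is that a run with only warnings would also fail the build.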
Tool 3: backfill-frontmatter — Auto-Fill Missing Frontmatter
For Skill files missing YAML frontmatter, this tool extracts content from SKILL.md and generates standard frontmatter.
# Dry run: preview what would be backfilled
python tools/backfill-frontmatter.py --root ./skills --dry-run
# After reviewing, apply changes
python tools/backfill-frontmatter.py --root ./skills --apply
Typical scenario: Your team wrote SKILL.md files early on without filling in YAML templates. This tool will:
- Read the description and trigger keywords from SKILL.md
- Infer skill_level from file path and content
- Scan import/require statements to extract dependencies
- Generate template-compliant frontmatter and prepend it to the file
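The steps above can be sketched roughly as follows. This is not the tool's implementation: it assumes the tier appears as a path segment (for example skills/L1/info-extractor/SKILL.md), takes the first non-heading line as the description, uses the parent directory as the Skill name, and skips dependency scanning entirely.

```python
import json
from pathlib import PurePosixPath

TIERS = {"L0", "L1", "L2", "L3", "L4"}


def infer_level(file_path):
    """Assumed convention: one path segment is the tier label."""
    for part in PurePosixPath(file_path).parts:
        if part.upper() in TIERS:
            return part.upper()
    return "UNKNOWN"


def backfill(skill_md_text, file_path):
    """Return the file text with generated frontmatter prepended (if absent)."""
    if skill_md_text.startswith("---\n"):
        return skill_md_text  # frontmatter already present, leave untouched
    description = next(
        (line.strip() for line in skill_md_text.splitlines()
         if line.strip() and not line.lstrip().startswith("#")),
        "",
    )
    frontmatter = "\n".join([
        "---",
        f"skill_name: {PurePosixPath(file_path).parent.name}",
        f"skill_level: {infer_level(file_path)}",
        # json.dumps yields a double-quoted string, which is also valid YAML.
        f"description: {json.dumps(description)}",
        "---",
        "",
    ])
    return frontmatter + skill_md_text


original = "# Info Extractor\n\nExtracts structured fields from unstructured text.\n"
updated = backfill(original, "skills/L1/info-extractor/SKILL.md")
```

Returning the new text instead of writing it mirrors the dry-run idea: the caller can print a diff first and write the file only after review.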
5. Skill Dependency Graph: skills_dependencies.json
The dependency graph is skill-framework's second-largest data asset (after the inventory itself). It explicitly declares the call relationships between Skills.
Structure design:
{
  "version": "1.0.0",
  "generated_at": "2026-07-01T10:00:00Z",
  "nodes": [
    {
      "id": "info-extractor",
      "level": "L1",
      "group": "data-processing"
    },
    {
      "id": "scoring-engine",
      "level": "L3",
      "group": "risk-management"
    }
  ],
  "edges": [
    {
      "from": "scoring-engine",
      "to": "info-extractor",
      "type": "phase-1",
      "data_contract": "structured-json"
    },
    {
      "from": "scoring-engine",
      "to": "knowledge-rag",
      "type": "phase-2",
      "data_contract": "structured-json"
    }
  ],
  "orphan_nodes": ["unused-skill-demo"]
}
Three killer use cases:
- Change impact analysis: Before modifying info-extractor, check the edges to see which dependent Skills, such as scoring-engine, are affected
- Dead skill discovery: orphan_nodes lists Skills with no dependency relationships, which makes them candidates for deletion or archival
- Tier violation detection: Use alongside skill-lint to catch illegal calls like L1 depending on L3
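The first two use cases are short graph traversals. Here is a sketch against the nodes/edges shape shown above, reading an edge as "from depends on to". The helper names are hypothetical, and the edge from agent-teams-orchestrator is made up to show transitive impact.

```python
from collections import defaultdict, deque


def affected_by(graph, changed):
    """Return every Skill that directly or transitively depends on `changed`."""
    dependents = defaultdict(set)
    for edge in graph["edges"]:
        dependents[edge["to"]].add(edge["from"])
    seen, queue = set(), deque([changed])
    while queue:
        for parent in dependents[queue.popleft()]:
            if parent not in seen:
                seen.add(parent)
                queue.append(parent)
    return seen


def orphans(graph):
    """Return node ids that no edge touches, in sorted order."""
    touched = {e["from"] for e in graph["edges"]} | {e["to"] for e in graph["edges"]}
    return sorted(n["id"] for n in graph["nodes"] if n["id"] not in touched)


graph = {
    "nodes": [
        {"id": "info-extractor"}, {"id": "knowledge-rag"},
        {"id": "scoring-engine"}, {"id": "agent-teams-orchestrator"},
        {"id": "unused-skill-demo"},
    ],
    "edges": [
        {"from": "scoring-engine", "to": "info-extractor"},
        {"from": "scoring-engine", "to": "knowledge-rag"},
        {"from": "agent-teams-orchestrator", "to": "scoring-engine"},  # illustrative
    ],
}
impact = affected_by(graph, "info-extractor")
dead = orphans(graph)
```

A change to info-extractor therefore needs a regression pass on scoring-engine and on anything that orchestrates it.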
6. Quality Gate: 6-Step Checklist from Dev to Production
skill-framework ships with audit-checklist.md that defines a 6-step quality gate. Every Skill must pass all items before going live:
| Step | Check Item | Owner | Tool Support |
|---|---|---|---|
| 1. Structural compliance | YAML fields complete, tier correct | Developer | skill-lint |
| 2. Dependency legality | One-way dependencies, no cycles | Developer | inventory-scan --validate-levels |
| 3. Security audit | Minimal data scope, sensitive fields masked | Security reviewer | skill-lint L003 |
| 4. Integration test | End-to-end flow verification, timeout & retry testing | QA engineer | Manual + automated test framework |
| 5. Documentation completeness | README, trigger keyword examples, I/O samples | Developer | backfill-frontmatter --dry-run |
| 6. Production approval | Manual sign-off + archived approval record | Tech lead | Manual |
Practical command sequence:
# Step 1: Structural compliance
python tools/skill-lint.py --root ./skills --strict
# Step 2: Dependency legality
python tools/inventory-scan.py --root ./skills --validate-levels
# Step 3: Security audit (focus on L003 rule)
python tools/skill-lint.py --root ./skills --rule L003 --strict
# Step 4: Integration tests run manually or in your own test framework
# Step 5: Documentation backfill
python tools/backfill-frontmatter.py --root ./skills --dry-run
# review dry-run output, then:
python tools/backfill-frontmatter.py --root ./skills --apply
# Step 6: Manual approval (review check report, sign off)
cat output/audit-report.md
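Step 2 asks for "no cycles" in addition to one-way tiers. For readers who want to see what that check involves, here is a standard depth-first-search cycle finder over (from, to) dependency pairs. It is a generic sketch, not the code behind --validate-levels.

```python
def find_cycle(edges):
    """Return one dependency cycle as a list of names, or None if the graph is acyclic."""
    graph = {}
    for src, dst in edges:
        graph.setdefault(src, []).append(dst)
        graph.setdefault(dst, [])

    WHITE, GRAY, BLACK = 0, 1, 2  # unvisited, on the current path, fully explored
    color = {node: WHITE for node in graph}
    stack = []

    def visit(node):
        color[node] = GRAY
        stack.append(node)
        for nxt in graph[node]:
            if color[nxt] == GRAY:
                # Back edge: the cycle is the path from nxt to here, closed with nxt.
                return stack[stack.index(nxt):] + [nxt]
            if color[nxt] == WHITE:
                found = visit(nxt)
                if found:
                    return found
        stack.pop()
        color[node] = BLACK
        return None

    for node in graph:
        if color[node] == WHITE:
            found = visit(node)
            if found:
                return found
    return None


acyclic = find_cycle([("scoring-engine", "info-extractor"), ("info-extractor", "mysql-connector")])
cyclic = find_cycle([("scoring-engine", "evidence-chain"), ("evidence-chain", "scoring-engine")])
```

Strict tier ordering already rules out cycles across tiers, so a separate cycle check mainly matters if same-tier dependencies are ever allowed.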
7. Four Industry Vertical Blueprints
The framework bundles 4 industry blueprints, each pre-defining the core Skill combinations and dependency relationships for that industry:
| Industry | Blueprint File | Core Skill Combo | Special Component |
|---|---|---|---|
| Finance | blueprints/finance.yaml | Risk scoring, compliance review, investment research, customer profiling | financial-ai-skills integration |
| Telecom | blueprints/telecom.yaml | Complaint analysis, field service dispatch, 5G private network assessment, network root cause | teleagent-skills integration |
| Healthcare | blueprints/healthcare.yaml | Medical record extraction, diagnosis assistance, medication review, scheduling optimization | HIPAA compliance Skill |
| Government | blueprints/government.yaml | Document review, policy interpretation, public opinion analysis, approval workflows | Red-header document parser Skill |
How to use:
# Initialize a project based on the finance blueprint
python tools/inventory-scan.py \
--blueprint blueprints/finance.yaml \
--init ./my-finance-project
The blueprint auto-generates the Skill directory skeleton, dependency declarations, and pre-filled YAML templates for that industry.
8. The Full 208-Skill Inventory at a Glance
unified_skill_inventory.json catalogs 208 Skills, distributed by tier as follows:
| Tier | Count | Share | Representative Skills |
|---|---|---|---|
| L0 | 12 | 5.8% | mysql-connector, redis-client, oss-handler |
| L1 | 86 | 41.3% | info-extractor, data-analyst, report-generator, security-guard, knowledge-rag |
| L2 | 24 | 11.5% | nl2-query, l3-gw-01, data-query-gateway |
| L3 | 58 | 27.9% | scoring-engine, evidence-chain, live-stream-script-system, contract-review |
| L4 | 28 | 13.5% | agent-teams-orchestrator, l7-arkclaw-01, auto-pilot |
The inventory comes in both JSON and CSV—JSON for programmatic use, CSV for Excel filtering and human review.
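The share column can be reproduced directly from the by_level counts, which is a handy sanity check after each rescan. A minimal sketch:

```python
# Counts taken from the by_level block of the inventory.
by_level = {"L0": 12, "L1": 86, "L2": 24, "L3": 58, "L4": 28}

total = sum(by_level.values())
# Percentage share per tier, rounded to one decimal place.
shares = {tier: round(100 * count / total, 1) for tier, count in by_level.items()}
```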
9. Quick Start
# Clone the repo
git clone https://github.com/yuzhaopeng-up/skill-framework.git
cd skill-framework
# Install dependencies
pip install -r requirements.txt
# Step 1: Scan your Skill project
python tools/inventory-scan.py --root /path/to/your/skills --output ./output
# Step 2: Compliance check
python tools/skill-lint.py --root /path/to/your/skills --template-dir ./templates
# Step 3: Backfill missing metadata
python tools/backfill-frontmatter.py --root /path/to/your/skills --dry-run
# Confirm and apply
python tools/backfill-frontmatter.py --root /path/to/your/skills --apply
Repo structure:
skill-framework/
├── templates/                  # 3 YAML templates
│   ├── l1-base-skill.yaml
│   ├── l2-gateway-skill.yaml
│   └── l3-ceiling-skill.yaml
├── tools/                      # 3 Python tools
│   ├── inventory-scan.py
│   ├── skill-lint.py
│   └── backfill-frontmatter.py
├── blueprints/                 # 4 industry blueprints
│   ├── finance.yaml
│   ├── telecom.yaml
│   ├── healthcare.yaml
│   └── government.yaml
├── data/                       # Data assets
│   ├── unified_skill_inventory.json
│   ├── unified_skill_inventory.csv
│   └── skills_dependencies.json
├── docs/
│   └── audit-checklist.md      # Quality gate checklist
├── LICENSE                     # MIT
└── README.md
10. Design Trade-offs
| Decision | Choice | Rejected Alternative | Reason |
|---|---|---|---|
| Classification model | 5 tiers | 3 tiers, 7 tiers | 5 tiers balance granularity and complexity |
| Template format | YAML | JSON Schema, TOML | YAML is readable, comment-friendly, and dominant in the Agent ecosystem |
| Dependency declarations | Static files | Runtime discovery | Static declarations enable offline checks and are secure/controllable |
| Tooling language | Python | Node.js | Higher Python penetration in AI/data teams |
| License | MIT | Apache 2.0 | MIT is short and permissive, which lowers the adoption barrier |
The Open-Source Agent Skills Ecosystem
skill-framework isn't an isolated project—it's the governance hub of an open-source Agent Skills ecosystem. These 5 repos work together to cover the full chain from Skill development to industry adoption:
| Repo | Role | GitHub |
|---|---|---|
| skill-framework | L0-L4 classification + YAML templates + Python toolchain | https://github.com/yuzhaopeng-up/skill-framework |
| financial-ai-skills | Finance industry vertical Skill set (risk, compliance, research) | https://github.com/yuzhaopeng-up/financial-ai-skills |
| teleagent-skills | Telecom industry vertical Skill set (field service, complaints, 5G) | https://github.com/yuzhaopeng-up/teleagent-skills |
| agent-cluster-comm | Multi-Agent cluster communication protocol & orchestration engine | https://github.com/yuzhaopeng-up/agent-cluster-comm |
| fintech-h5-demos | Fintech H5 interactive demos (courses/training) | https://github.com/yuzhaopeng-up/fintech-h5-demos |
How they work together:
- skill-framework defines the standards and tools; other repos follow its classification system and template specs
- financial-ai-skills and teleagent-skills are two concrete implementations of industry blueprints
- agent-cluster-comm provides the L4 multi-Agent orchestration communication protocol; skill-framework's L4 tier is based on it
- fintech-h5-demos is the front-end display layer, visualizing Skill execution results as interactive H5
License: MIT — use it, fork it, modify it, just don't delete the copyright notice.
If this framework helped you make sense of Skill governance, drop a Star on https://github.com/yuzhaopeng-up/skill-framework