DEV Community

兆鹏 于
How to Govern 200+ Agent Skills: L0-L4 Classification, YAML Templates, and Python Toolchain

When your Agent project balloons past 200 Skills, "it works" and "it's manageable" are two very different things. In this post, I'll walk you through an open-source governance framework—skill-framework—that uses a five-tier classification model, standardized templates, and an automated toolchain to take Agent skills from wild-west chaos to engineering-grade ops.

1. The Problem: Why Do You Even Need a Skill Governance Framework?

The Agent ecosystem is repeating the same mistake microservices made—grow wild early, lose control later.

Classic symptoms:

| Symptom | What It Looks Like | Root Cause |
| --- | --- | --- |
| Hard to locate | "Where's that credit-check Skill again?" | No unified classification, skills piled flat |
| Dependency chaos | Tweak one atomic Skill, 3 scenario Skills break | Dependencies spread by word of mouth, no explicit declarations |
| Format drift | Same team, different field names and structures | No enforced templates, convention is voluntary |
| Production incidents | New Skill ships with zero security audit | No quality gate, no checklist |
| Reuse deadlock | Project A wrote a Skill, Project B has no idea it exists | No industry blueprints, start from scratch every time |

At 10 Skills, you can keep it all in your head. At 50, you patch with docs. At 200+—you need an engineering framework.

skill-framework exists for exactly this: https://github.com/yuzhaopeng-up/skill-framework


2. The Core: L0-L4 Five-Tier Classification Model

The foundation of the whole framework is a five-tier model. Each tier has a clear responsibility boundary and dependency direction—higher tiers depend on lower ones, lower tiers never know about higher ones.

┌────────────────────────────────────────────────┐
│  L4 Multi-Agent      Agent team orchestration  │
├────────────────────────────────────────────────┤
│  L3 Scenario         Business composition      │
├────────────────────────────────────────────────┤
│  L2 Gateway/Routing  Intent routing            │
├────────────────────────────────────────────────┤
│  L1 Base Skills      Atomic skills             │
├────────────────────────────────────────────────┤
│  L0 Infrastructure   DB/API connectors         │
└────────────────────────────────────────────────┘

Tier Breakdown

| Tier | Name | Responsibility | Scope | Typical Examples |
| --- | --- | --- | --- | --- |
| L0 | Infrastructure | Connect to external systems, wrap data source access | DB connectors, API clients, file I/O | mysql-connector, redis-client, oss-file-handler |
| L1 | Base Skill | Atomic operations, single responsibility, independently executable | Info extraction, data query, report generation, security checks | info-extractor, data-analyst, report-generator, security-guard |
| L2 | Gateway/Routing | Accept natural language input, identify intent, route to the right L1 Skill | Intent recognition, permission checks, query dispatch | nl2-query, l3-gw-01 (data query gateway) |
| L3 | Scenario (Ceiling) | Orchestrate multiple L1/L2 Skills into end-to-end business flows | Multi-phase pipelines, business composition | scoring-engine (opportunity scoring), evidence-chain (evidence chain analysis) |
| L4 | Multi-Agent | Spin up independent sub-Agent teams, role isolation, parallel collaboration | Team orchestration, task scheduling | agent-teams-orchestrator, l7-arkclaw-01 (enterprise ops assistant) |

Key constraints:

  • One-way dependency: L4 → L3 → L2 → L1 → L0, reverse dependencies strictly forbidden
  • L1 is stateless: Base skills must be pure-function-style, no session state
  • L2 is stateful: Gateway tier manages session context and routing tables
  • L3 orchestrates, doesn't execute: Scenario tier only schedules; actual execution sinks to L1
  • L4 runs in isolation: Each sub-Agent has independent context, data passes via structured JSON

The biggest value of this model isn't theoretical completeness—it's that it assigns each of the 208 Skills to exactly one tier. When you need to find a Skill, first pin down the tier, then narrow by domain, and you're looking at 10–20 candidates max.
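The one-way rule is easy to state and just as easy to check mechanically. Here is a minimal sketch, assuming the strict reading where a dependency must sit on a lower tier than the Skill declaring it. `TIER_RANK`, `is_legal_dependency`, and `find_violations` are hypothetical names for illustration, not the framework's actual code.

```python
# Hypothetical helper: tier names mapped to their rank in the L0-L4 model.
TIER_RANK = {"L0": 0, "L1": 1, "L2": 2, "L3": 3, "L4": 4}


def is_legal_dependency(from_level: str, to_level: str) -> bool:
    """Return True when a skill at from_level may depend on one at to_level.

    Assumption: the dependency must be on a strictly lower tier.
    """
    return TIER_RANK[to_level] < TIER_RANK[from_level]


def find_violations(levels: dict[str, str],
                    edges: list[tuple[str, str]]) -> list[tuple[str, str]]:
    """Return every (from, to) edge that points sideways or upward."""
    return [
        (src, dst)
        for src, dst in edges
        if not is_legal_dependency(levels[src], levels[dst])
    ]


levels = {"scoring-engine": "L3", "info-extractor": "L1", "mysql-connector": "L0"}
edges = [
    ("scoring-engine", "info-extractor"),   # L3 -> L1, allowed
    ("info-extractor", "mysql-connector"),  # L1 -> L0, allowed
    ("info-extractor", "scoring-engine"),   # L1 -> L3, reverse dependency
]
print(find_violations(levels, edges))  # [('info-extractor', 'scoring-engine')]
```

Because the check only compares ranks, skipping tiers (L3 calling L1 directly) passes, which matches how the L3 tier is described above.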


3. YAML Templates: 3 Ready-to-Copy Specifications

skill-framework provides three YAML templates covering the three most common Skill shapes:

| Template | Target Tier | File | Key Feature |
| --- | --- | --- | --- |
| L1 Base Skill | L1 Base Skill tier | templates/l1-base-skill.yaml | Single responsibility, declares inputs/outputs and trigger keywords |
| L2 Gateway Skill | L2 Gateway/Routing tier | templates/l2-gateway-skill.yaml | Routing table + permission checks + downstream dependency declarations |
| L3 Ceiling Skill | L3 Scenario tier | templates/l3-ceiling-skill.yaml | Multi-phase orchestration + structured JSON data passing |

L1 Base Skill Template Example

# templates/l1-base-skill.yaml
skill_name: ""                    # Required: skill name, kebab-case
skill_level: L1                   # Required: fixed at L1
version: "1.0.0"                  # Required: semantic version

description: ""                   # Required: one-line description
trigger_keywords: []              # Required: trigger keyword list
  # - "keyword1"
  # - "keyword2"

inputs:                           # Required: input parameter definitions
  - name: ""                      # Parameter name
    type: ""                      # Type: string/number/boolean/json/file
    required: true                # Is it required?
    description: ""               # Parameter description

outputs:                          # Required: output definitions
  - name: ""
    type: ""
    description: ""

dependencies:                     # Dependency declarations (L0 only)
  - skill_name: ""
    version: ">=1.0.0"
    usage: ""                     # What this dependency is used for

execution:                        # Execution spec
  type: prompt                    # prompt | python | hybrid
  timeout_seconds: 300            # Timeout
  retry_policy:
    max_retries: 2
    backoff: exponential

security:                         # Security declarations
  data_access_scope: []           # Data access scope
  sensitive_fields: []            # Sensitive field list
  audit_logging: true             # Enable audit logging?

quality:                          # Quality metrics
  min_accuracy: 0.85              # Minimum accuracy
  test_cases: []                  # Test case paths

Field design philosophy:

  • skill_level is mandatory—the toolchain uses it for dependency legality checks
  • dependencies may only point to lower tiers, so an L1 Skill can only depend on L0
  • security section is required for the quality gate—missing any item gets blocked by skill-lint
  • quality section is currently declarative; future versions will plug into automated test frameworks
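To show what the "Required" markers in the L1 template could mean in practice, here is a rough sketch of a field check over an already-parsed manifest (a plain dict, as a YAML loader would return). The function name, the regexes, and the error wording are all assumptions for illustration; the real skill-lint rules may differ.

```python
import re

KEBAB_CASE = re.compile(r"^[a-z0-9]+(-[a-z0-9]+)*$")
SEMVER = re.compile(r"^\d+\.\d+\.\d+$")  # simplified: no pre-release tags
REQUIRED_KEYS = ("skill_name", "skill_level", "version", "description",
                 "trigger_keywords", "inputs", "outputs")


def check_l1_manifest(manifest: dict) -> list[str]:
    """Return human-readable problems; an empty list means the checks passed."""
    problems = [f"missing required field: {key}"
                for key in REQUIRED_KEYS if not manifest.get(key)]
    name = manifest.get("skill_name")
    if name and not KEBAB_CASE.match(name):
        problems.append(f"skill_name is not kebab-case: {name!r}")
    level = manifest.get("skill_level")
    if level and level != "L1":
        problems.append("skill_level must be L1 for this template")
    version = manifest.get("version")
    if version and not SEMVER.match(version):
        problems.append(f"version is not MAJOR.MINOR.PATCH: {version!r}")
    return problems


manifest = {
    "skill_name": "Info_Extractor",       # wrong: not kebab-case
    "skill_level": "L1",
    "version": "2.1",                     # wrong: patch number missing
    "description": "Extract structured fields from unstructured text",
    "trigger_keywords": ["extract fields"],
    "inputs": [{"name": "raw_text", "type": "string", "required": True}],
    "outputs": [],                        # wrong: required but empty
}
for problem in check_l1_manifest(manifest):
    print(problem)
```

Returning a list of problems instead of raising on the first one lets a batch run report everything wrong with a Skill in a single pass.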

Key Differences in L2 and L3 Templates

L2 Gateway adds:

routing_table:                    # L2-only: routing table
  - intent: ""                    # User intent
    target_skill: ""              # Target Skill to route to
    confidence_threshold: 0.8     # Confidence threshold
permission_check:                 # L2-only: permission checks
  enabled: true
  whitelist: []
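To make the L2 fields concrete, here is a minimal sketch of how a gateway might consume them. It assumes the whitelist holds caller IDs and that a route matches only when the intent classifier's confidence reaches the entry's threshold. The `route` function and the sample values are hypothetical, not taken from the repo.

```python
from typing import Optional


def route(intent: str, confidence: float, caller: str,
          gateway: dict) -> Optional[str]:
    """Pick a target skill for a recognised intent, or None if nothing applies."""
    check = gateway.get("permission_check", {})
    # Assumed semantics: with checks enabled, only whitelisted callers pass.
    if check.get("enabled") and caller not in check.get("whitelist", []):
        return None
    for entry in gateway.get("routing_table", []):
        if (entry["intent"] == intent
                and confidence >= entry["confidence_threshold"]):
            return entry["target_skill"]
    return None


gateway = {
    "routing_table": [
        {"intent": "extract_fields", "target_skill": "info-extractor",
         "confidence_threshold": 0.8},
    ],
    "permission_check": {"enabled": True, "whitelist": ["analyst-team"]},
}
print(route("extract_fields", 0.92, "analyst-team", gateway))  # info-extractor
print(route("extract_fields", 0.50, "analyst-team", gateway))  # None
```

Returning `None` rather than a fallback Skill keeps low-confidence requests from silently reaching the wrong L1 Skill; the caller decides whether to ask the user to rephrase.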

L3 Ceiling adds:

phases:                           # L3-only: multi-phase orchestration
  - phase: 1
    name: ""
    skill: ""                     # L1/L2 Skill being called
    input_mapping: {}             # Input mapping
    output_key: ""                # Key to store output
  - phase: 2
    name: ""
    skill: ""
    input_mapping: {}
    output_key: ""
orchestration:                    # L3-only: orchestration strategy
  mode: sequential               # sequential | parallel | conditional
  failure_policy: stop            # stop | skip | retry
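Here is one way the `phases` list could be executed in `sequential` mode. This is a sketch under stated assumptions: `input_mapping` maps a Skill parameter name to a key in a shared context, Skills are plain Python callables, and only the `stop` and `skip` failure policies are handled (`retry` is left out). `run_phases` and the stand-in Skills are hypothetical.

```python
from typing import Any, Callable


def run_phases(phases: list[dict], skills: dict[str, Callable[[dict], Any]],
               initial: dict, failure_policy: str = "stop") -> dict:
    """Run phases in order, storing each result in the context under output_key."""
    context = dict(initial)
    for phase in sorted(phases, key=lambda p: p["phase"]):
        try:
            # Build the skill's arguments from values already in the context.
            args = {param: context[key]
                    for param, key in phase["input_mapping"].items()}
            context[phase["output_key"]] = skills[phase["skill"]](args)
        except Exception:
            if failure_policy == "skip":
                continue  # leave this phase's output unset and move on
            raise  # "stop": abort on the first failure
    return context


# Stand-in callables; real L1 Skills would do the actual work.
skills = {
    "info-extractor": lambda args: {"amount": len(args["text"])},
    "data-analyst": lambda args: args["fields"]["amount"] * 2,
}
phases = [
    {"phase": 1, "name": "extract", "skill": "info-extractor",
     "input_mapping": {"text": "raw_text"}, "output_key": "fields"},
    {"phase": 2, "name": "score", "skill": "data-analyst",
     "input_mapping": {"fields": "fields"}, "output_key": "score"},
]
result = run_phases(phases, skills, {"raw_text": "hello"})
print(result["score"])  # 10
```

The runner itself does no business logic, which is the point of "L3 orchestrates, doesn't execute": it only moves structured data between lower-tier Skills.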

4. Python Toolchain: Scan → Lint → Backfill Pipeline

skill-framework bundles 3 Python tools that form an automated pipeline from discovery to compliance:

inventory-scan            ──→  skill-lint          ──→  backfill-frontmatter
(scan & build inventory)       (compliance check)       (backfill frontmatter)

Tool 1: inventory-scan — Scan & Build Inventory

Scans all Skills under a directory, auto-detects tiers, extracts metadata, and generates a unified inventory.

# Basic: scan all Skills in your project
python tools/inventory-scan.py --root ./skills --output ./output

# With tier validation: auto-detect level tag legality
python tools/inventory-scan.py \
  --root ./skills \
  --output ./output \
  --validate-levels

Outputs:

  • unified_skill_inventory.json — Complete inventory of 208 Skills (structured JSON)
  • unified_skill_inventory.csv — Same inventory in tabular form (easy Excel filtering)
  • skills_dependencies.json — Skill dependency graph

Inventory JSON structure example:

{
  "scan_timestamp": "2026-07-01T10:00:00Z",
  "total_skills": 208,
  "by_level": {
    "L0": 12,
    "L1": 86,
    "L2": 24,
    "L3": 58,
    "L4": 28
  },
  "skills": [
    {
      "name": "info-extractor",
      "level": "L1",
      "version": "2.1.0",
      "description": "Extract structured fields from unstructured text",
      "trigger_keywords": ["information extraction", "extract fields", "structured"],
      "dependencies": ["mysql-connector@L0"],
      "file_path": "skills/info-extractor/SKILL.md"
    }
  ]
}
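Once the inventory exists as JSON, the "pin down the tier, then narrow" lookup takes a few lines of Python. The sketch below works on a trimmed, made-up inventory with the same field names as the structure above; `count_by_level` and `find_skills` are hypothetical helpers, not part of the toolchain.

```python
import json
from collections import Counter


def count_by_level(inventory: dict) -> dict[str, int]:
    """Recount skills per tier from the skills list itself."""
    return dict(Counter(skill["level"] for skill in inventory["skills"]))


def find_skills(inventory: dict, level: str, keyword: str) -> list[str]:
    """Return names of skills on a tier whose trigger keywords contain keyword."""
    return [
        skill["name"]
        for skill in inventory["skills"]
        if skill["level"] == level
        and any(keyword in kw for kw in skill.get("trigger_keywords", []))
    ]


# A trimmed, made-up inventory for illustration only.
inventory = json.loads("""
{
  "skills": [
    {"name": "info-extractor", "level": "L1",
     "trigger_keywords": ["extract fields"]},
    {"name": "mysql-connector", "level": "L0", "trigger_keywords": []}
  ]
}
""")
print(count_by_level(inventory))               # {'L1': 1, 'L0': 1}
print(find_skills(inventory, "L1", "extract"))  # ['info-extractor']
```

Comparing the recount against the file's own `by_level` block is a cheap sanity check that the inventory was not edited by hand.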

Tool 2: skill-lint — YAML Compliance Checker

Checks each Skill's YAML declarations against the template spec and outputs a violation report.

# Check a single Skill
python tools/skill-lint.py --target ./skills/info-extractor

# Batch check the whole project
python tools/skill-lint.py --root ./skills --template-dir ./templates

# Strict mode: treat Warnings as Errors
python tools/skill-lint.py --root ./skills --strict

Sample lint rules:

| Rule ID | Level | What It Checks |
| --- | --- | --- |
| L001 | Error | skill_level must be one of L0–L4 |
| L002 | Error | Dependencies must be at a lower tier than the current Skill |
| L003 | Error | security section cannot be empty |
| L004 | Warning | Recommend adding description for every input |
| L005 | Error | L3 Skills must have a phases section |
| L006 | Warning | trigger_keywords should have at least 3 entries |

Output example:

[ERROR] L002: skills/scoring-engine/skill.yaml
  → dependency "evidence-chain" is L3, same level as current skill (L3)
  → Suggestion: L3 skills should depend on L1/L2, not other L3 skills directly

[WARNING] L004: skills/info-extractor/skill.yaml
  → Input "raw_text" missing description
  → Suggestion: Add description field for better discoverability

Scan complete: 208 skills checked, 3 errors, 7 warnings
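For a feel of how rules like these and the `--strict` switch might fit together, here is a small sketch covering L001, L003, L004, and L006 on a parsed Skill dict. The function names, message wording, and exit-code convention are assumptions for illustration; they are not lifted from skill-lint.py.

```python
def lint_skill(skill: dict) -> list[tuple[str, str, str]]:
    """Return (rule_id, severity, message) findings for one parsed skill."""
    findings = []
    if skill.get("skill_level") not in {"L0", "L1", "L2", "L3", "L4"}:
        findings.append(("L001", "ERROR", "skill_level must be one of L0-L4"))
    if not skill.get("security"):
        findings.append(("L003", "ERROR", "security section cannot be empty"))
    for item in skill.get("inputs", []):
        if not item.get("description"):
            findings.append(("L004", "WARNING",
                             f'Input "{item.get("name")}" missing description'))
    if len(skill.get("trigger_keywords", [])) < 3:
        findings.append(("L006", "WARNING", "fewer than 3 trigger_keywords"))
    return findings


def exit_code(findings: list[tuple[str, str, str]], strict: bool = False) -> int:
    """Non-zero on any error; in strict mode warnings fail the run too."""
    failing = {"ERROR", "WARNING"} if strict else {"ERROR"}
    return 1 if any(severity in failing for _, severity, _ in findings) else 0


skill = {
    "skill_level": "L1",
    "security": {"audit_logging": True},
    "inputs": [{"name": "raw_text", "type": "string"}],
    "trigger_keywords": ["extract", "fields", "structured"],
}
findings = lint_skill(skill)
print(findings)
print(exit_code(findings), exit_code(findings, strict=True))  # 0 1
```

Keeping severity as data and deciding pass/fail in one place is what makes a strict mode cheap: the rules never need to know about it.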

Tool 3: backfill-frontmatter — Auto-Fill Missing Frontmatter

For Skill files missing YAML frontmatter, this tool extracts content from SKILL.md and generates standard frontmatter.

# Dry run: preview what would be backfilled
python tools/backfill-frontmatter.py --root ./skills --dry-run

# After reviewing, apply changes
python tools/backfill-frontmatter.py --root ./skills --apply

Typical scenario: Your team wrote SKILL.md files early on without filling in YAML templates. This tool will:

  1. Read the description and trigger keywords from SKILL.md
  2. Infer skill_level from file path and content
  3. Scan import/require statements to extract dependencies
  4. Generate template-compliant frontmatter and prepend it to the file
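The steps above can be sketched roughly as follows. This is a deliberately small approximation: it takes the Skill name and tier as arguments instead of inferring them, uses the first non-heading line as the description, and skips dependency extraction entirely. All function names are hypothetical and the real tool will do more.

```python
def has_frontmatter(text: str) -> bool:
    """True when the file already opens with a '---' frontmatter fence."""
    return text.lstrip().startswith("---")


def first_paragraph(text: str) -> str:
    """Use the first non-heading, non-empty line as a stand-in description."""
    for line in text.splitlines():
        stripped = line.strip()
        if stripped and not stripped.startswith("#"):
            return stripped
    return ""


def backfill(text: str, skill_name: str, skill_level: str) -> str:
    """Prepend minimal frontmatter unless the file already has some."""
    if has_frontmatter(text):
        return text  # never overwrite what a human already wrote
    description = first_paragraph(text).replace('"', "'")
    header = (
        "---\n"
        f"skill_name: {skill_name}\n"
        f"skill_level: {skill_level}\n"
        f'description: "{description}"\n'
        "---\n"
    )
    return header + text


doc = "# Info Extractor\n\nExtracts structured fields from unstructured text.\n"
print(backfill(doc, "info-extractor", "L1"))
```

Because `backfill` returns the text unchanged when frontmatter is present, running it twice is harmless, which is the property that makes a dry-run-then-apply workflow safe.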

5. Skill Dependency Graph: skills_dependencies.json

The dependency graph is skill-framework's second-largest data asset (after the inventory itself). It explicitly declares the call relationships between Skills.

Structure design:

{
  "version": "1.0.0",
  "generated_at": "2026-07-01T10:00:00Z",
  "nodes": [
    {
      "id": "info-extractor",
      "level": "L1",
      "group": "data-processing"
    },
    {
      "id": "scoring-engine",
      "level": "L3",
      "group": "risk-management"
    }
  ],
  "edges": [
    {
      "from": "scoring-engine",
      "to": "info-extractor",
      "type": "phase-1",
      "data_contract": "structured-json"
    },
    {
      "from": "scoring-engine",
      "to": "knowledge-rag",
      "type": "phase-2",
      "data_contract": "structured-json"
    }
  ],
  "orphan_nodes": ["unused-skill-demo"]
}

Three killer use cases:

  1. Change impact analysis: Before modifying info-extractor, check edges to see which downstream Skills like scoring-engine are affected
  2. Dead skill discovery: orphan_nodes lists Skills with no dependency edges to or from any other Skill—candidates for deletion or archival
  3. Tier violation detection: Use alongside skill-lint to catch illegal calls like L1 depending on L3
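The first two use cases come down to simple graph walks. Below is a minimal sketch over a graph dict with the same `nodes`/`edges` keys as the structure above; the second edge in the sample data is made up for illustration, and `impacted_by` and `unconnected` are hypothetical helper names.

```python
from collections import defaultdict, deque


def impacted_by(graph: dict, changed: str) -> set[str]:
    """Return every skill that depends on `changed`, directly or transitively."""
    dependents = defaultdict(set)
    for edge in graph["edges"]:
        dependents[edge["to"]].add(edge["from"])  # reverse the edge direction
    seen, queue = set(), deque([changed])
    while queue:
        current = queue.popleft()
        for parent in dependents[current] - seen:
            seen.add(parent)
            queue.append(parent)
    return seen


def unconnected(graph: dict) -> set[str]:
    """Return node ids that appear in no edge at all."""
    linked = ({e["from"] for e in graph["edges"]}
              | {e["to"] for e in graph["edges"]})
    return {node["id"] for node in graph["nodes"]} - linked


graph = {
    "nodes": [{"id": "info-extractor"}, {"id": "scoring-engine"},
              {"id": "agent-teams-orchestrator"}, {"id": "unused-skill-demo"}],
    "edges": [
        {"from": "scoring-engine", "to": "info-extractor"},
        {"from": "agent-teams-orchestrator", "to": "scoring-engine"},
    ],
}
print(sorted(impacted_by(graph, "info-extractor")))
# ['agent-teams-orchestrator', 'scoring-engine']
print(unconnected(graph))  # {'unused-skill-demo'}
```

Walking the edges in reverse is the key step: the file stores "who I call", while impact analysis needs "who calls me".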

6. Quality Gate: 6-Step Checklist from Dev to Production

skill-framework ships with audit-checklist.md that defines a 6-step quality gate. Every Skill must pass all items before going live:

| Step | Check Item | Owner | Tool Support |
| --- | --- | --- | --- |
| 1. Structural compliance | YAML fields complete, tier correct | Developer | skill-lint |
| 2. Dependency legality | One-way dependencies, no cycles | Developer | inventory-scan --validate-levels |
| 3. Security audit | Minimal data scope, sensitive fields masked | Security reviewer | skill-lint L003 |
| 4. Integration test | End-to-end flow verification, timeout & retry testing | QA engineer | Manual + automated test framework |
| 5. Documentation completeness | README, trigger keyword examples, I/O samples | Developer | backfill-frontmatter --dry-run |
| 6. Production approval | Manual sign-off + archived approval record | Tech lead | Manual |
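The "no cycles" part of step 2 is a standard graph check. Here is a minimal depth-first sketch that returns one offending cycle so the report can name the Skills involved. It assumes dependencies are available as a plain name-to-list mapping; `find_cycle` is a hypothetical helper and not necessarily how inventory-scan does it.

```python
def find_cycle(dependencies: dict[str, list[str]]) -> list[str]:
    """Return one dependency cycle as a list of skill names, or [] if none."""
    WHITE, GREY, BLACK = 0, 1, 2  # unvisited, on the current path, finished
    colour = {name: WHITE for name in dependencies}
    path: list[str] = []

    def visit(name: str) -> list[str]:
        colour[name] = GREY
        path.append(name)
        for dep in dependencies.get(name, []):
            state = colour.get(dep, WHITE)
            if state == GREY:  # back edge: dep is already on the current path
                return path[path.index(dep):] + [dep]
            if state == WHITE:
                cycle = visit(dep)
                if cycle:
                    return cycle
        path.pop()
        colour[name] = BLACK
        return []

    for name in list(dependencies):
        if colour[name] == WHITE:
            cycle = visit(name)
            if cycle:
                return cycle
    return []


clean = {"scoring-engine": ["info-extractor"],
         "info-extractor": ["mysql-connector"]}
broken = {"a": ["b"], "b": ["c"], "c": ["a"]}
print(find_cycle(clean))   # []
print(find_cycle(broken))  # ['a', 'b', 'c', 'a']
```

If every edge already passes a strict lower-tier check, cycles are impossible by construction, so this check mostly matters when same-tier dependencies are tolerated.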

Practical command sequence:

# Step 1: Structural compliance
python tools/skill-lint.py --root ./skills --strict

# Step 2: Dependency legality
python tools/inventory-scan.py --root ./skills --validate-levels

# Step 3: Security audit (focus on L003 rule)
python tools/skill-lint.py --root ./skills --rule L003 --strict

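# Step 4: Integration test (manual plus your own test framework; no bundled command)
# Step 5: Documentation completeness (frontmatter backfill)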
python tools/backfill-frontmatter.py --root ./skills --dry-run
# review dry-run output, then:
python tools/backfill-frontmatter.py --root ./skills --apply

# Step 6: Manual approval (review check report, sign off)
cat output/audit-report.md

7. Four Industry Vertical Blueprints

The framework bundles 4 industry blueprints, each pre-defining the core Skill combinations and dependency relationships for that industry:

| Industry | Blueprint File | Core Skill Combo | Special Component |
| --- | --- | --- | --- |
| Finance | blueprints/finance.yaml | Risk scoring, compliance review, investment research, customer profiling | financial-ai-skills integration |
| Telecom | blueprints/telecom.yaml | Complaint analysis, field service dispatch, 5G private network assessment, network root cause | teleagent-skills integration |
| Healthcare | blueprints/healthcare.yaml | Medical record extraction, diagnosis assistance, medication review, scheduling optimization | HIPAA compliance Skill |
| Government | blueprints/government.yaml | Document review, policy interpretation, public opinion analysis, approval workflows | Red-header document parser Skill |

How to use:

# Initialize a project based on the finance blueprint
python tools/inventory-scan.py \
  --blueprint blueprints/finance.yaml \
  --init ./my-finance-project

The blueprint auto-generates the Skill directory skeleton, dependency declarations, and pre-filled YAML templates for that industry.


8. The Full 208-Skill Inventory at a Glance

unified_skill_inventory.json catalogs 208 Skills, distributed by tier as follows:

| Tier | Count | Share | Representative Skills |
| --- | --- | --- | --- |
| L0 | 12 | 5.8% | mysql-connector, redis-client, oss-handler |
| L1 | 86 | 41.3% | info-extractor, data-analyst, report-generator, security-guard, knowledge-rag |
| L2 | 24 | 11.5% | nl2-query, l3-gw-01, data-query-gateway |
| L3 | 58 | 27.9% | scoring-engine, evidence-chain, live-stream-script-system, contract-review |
| L4 | 28 | 13.5% | agent-teams-orchestrator, l7-arkclaw-01, auto-pilot |

The inventory comes in both JSON and CSV—JSON for programmatic use, CSV for Excel filtering and human review.
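If you regenerate the inventory and want to refresh the share figures, the arithmetic is a one-liner. A quick sketch using the tier counts quoted in this post (`tier_shares` is a hypothetical helper):

```python
def tier_shares(counts: dict[str, int]) -> dict[str, float]:
    """Return each tier's share of the total, as a percentage to one decimal."""
    total = sum(counts.values())
    return {level: round(100 * n / total, 1) for level, n in counts.items()}


by_level = {"L0": 12, "L1": 86, "L2": 24, "L3": 58, "L4": 28}
print(sum(by_level.values()))  # 208
print(tier_shares(by_level))
# {'L0': 5.8, 'L1': 41.3, 'L2': 11.5, 'L3': 27.9, 'L4': 13.5}
```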


9. Quick Start

# Clone the repo
git clone https://github.com/yuzhaopeng-up/skill-framework.git
cd skill-framework

# Install dependencies
pip install -r requirements.txt

# Step 1: Scan your Skill project
python tools/inventory-scan.py --root /path/to/your/skills --output ./output

# Step 2: Compliance check
python tools/skill-lint.py --root /path/to/your/skills --template-dir ./templates

# Step 3: Backfill missing metadata
python tools/backfill-frontmatter.py --root /path/to/your/skills --dry-run

# Confirm and apply
python tools/backfill-frontmatter.py --root /path/to/your/skills --apply

Repo structure:

skill-framework/
├── templates/                    # 3 YAML templates
│   ├── l1-base-skill.yaml
│   ├── l2-gateway-skill.yaml
│   └── l3-ceiling-skill.yaml
├── tools/                        # 3 Python tools
│   ├── inventory-scan.py
│   ├── skill-lint.py
│   └── backfill-frontmatter.py
├── blueprints/                   # 4 industry blueprints
│   ├── finance.yaml
│   ├── telecom.yaml
│   ├── healthcare.yaml
│   └── government.yaml
├── data/                         # Data assets
│   ├── unified_skill_inventory.json
│   ├── unified_skill_inventory.csv
│   └── skills_dependencies.json
├── docs/
│   └── audit-checklist.md        # Quality gate checklist
├── LICENSE                       # MIT
└── README.md

10. Design Trade-offs

| Decision | Choice | Rejected Alternative | Reason |
| --- | --- | --- | --- |
| Classification model | 5 tiers | 3 tiers, 7 tiers | 5 tiers balances granularity and complexity |
| Template format | YAML | JSON Schema, TOML | YAML is readable, comment-friendly, and dominant in the Agent ecosystem |
| Dependency declarations | Static files | Runtime discovery | Static declarations enable offline checks and are secure/controllable |
| Tooling language | Python | Node.js | Higher Python penetration in AI/data teams |
| License | MIT | Apache 2.0 | MIT is short and permissive, which lowers the adoption barrier |

The Open-Source Agent Skills Ecosystem

skill-framework isn't an isolated project—it's the governance hub of an open-source Agent Skills ecosystem. These 5 repos work together to cover the full chain from Skill development to industry adoption:

| Repo | Role | GitHub |
| --- | --- | --- |
| skill-framework | L0-L4 classification + YAML templates + Python toolchain | https://github.com/yuzhaopeng-up/skill-framework |
| financial-ai-skills | Finance industry vertical Skill set (risk, compliance, research) | https://github.com/yuzhaopeng-up/financial-ai-skills |
| teleagent-skills | Telecom industry vertical Skill set (field service, complaints, 5G) | https://github.com/yuzhaopeng-up/teleagent-skills |
| agent-cluster-comm | Multi-Agent cluster communication protocol & orchestration engine | https://github.com/yuzhaopeng-up/agent-cluster-comm |
| fintech-h5-demos | Fintech H5 interactive demos (courses/training) | https://github.com/yuzhaopeng-up/fintech-h5-demos |

How they work together:

  • skill-framework defines the standards and tools; other repos follow its classification system and template specs
  • financial-ai-skills and teleagent-skills are two concrete implementations of industry blueprints
  • agent-cluster-comm provides the L4 multi-Agent orchestration communication protocol; skill-framework's L4 template is based on it
  • fintech-h5-demos is the front-end display layer, visualizing Skill execution results as interactive H5

License: MIT — use it, fork it, modify it, just don't delete the copyright notice.

If this framework helped you make sense of Skill governance, drop a Star on https://github.com/yuzhaopeng-up/skill-framework
