兆鹏于

Posted on Jul 1

How to Govern 200+ Agent Skills: L0-L4 Classification, YAML Templates, and Python Toolchain

#ai #agents #architecture #python


When your Agent project balloons past 200 Skills, "it works" and "it's manageable" are two very different things. In this post, I'll walk you through an open-source governance framework—skill-framework—that uses a five-tier classification model, standardized templates, and an automated toolchain to take Agent skills from wild-west chaos to engineering-grade ops.

1. The Problem: Why Do You Even Need a Skill Governance Framework?

The Agent ecosystem is repeating the same mistake microservices made—grow wild early, lose control later.

Classic symptoms:

Symptom	What It Looks Like	Root Cause
Hard to locate	"Where's that credit-check Skill again?"	No unified classification, skills piled flat
Dependency chaos	Tweak one atomic Skill, 3 scenario Skills break	Dependencies spread by word of mouth, no explicit declarations
Format drift	Same team, different field names and structures	No enforced templates, convention is voluntary
Production incidents	New Skill ships with zero security audit	No quality gate, no checklist
Reuse deadlock	Project A wrote a Skill, Project B has no idea it exists	No industry blueprints, start from scratch every time

At 10 Skills, you can keep it all in your head. At 50, you patch with docs. At 200+—you need an engineering framework.

skill-framework exists for exactly this: https://github.com/yuzhaopeng-up/skill-framework

2. The Core: L0-L4 Five-Tier Classification Model

The foundation of the whole framework is a five-tier model. Each tier has a clear responsibility boundary and dependency direction—higher tiers depend on lower ones, lower tiers never know about higher ones.

┌────────────────────────────────────────────────┐
│  L4 Multi-Agent      Agent Team Orchestration  │  ← Team orchestration
├────────────────────────────────────────────────┤
│  L3 Scenario         Business Composition      │  ← Business composition
├────────────────────────────────────────────────┤
│  L2 Gateway/Routing  Intent Routing            │  ← Intent routing
├────────────────────────────────────────────────┤
│  L1 Base Skills      Atomic Skills             │  ← Atomic skills
├────────────────────────────────────────────────┤
│  L0 Infrastructure   Infra Connectors          │  ← DB/API connectors
└────────────────────────────────────────────────┘

Tier Breakdown

Tier	Name	Responsibility	Scope	Typical Examples
L0	Infrastructure	Connect to external systems, wrap data source access	DB connectors, API clients, file I/O	`mysql-connector`, `redis-client`, `oss-file-handler`
L1	Base Skill	Atomic operations, single responsibility, independently executable	Info extraction, data query, report generation, security checks	`info-extractor`, `data-analyst`, `report-generator`, `security-guard`
L2	Gateway/Routing	Accept natural language input, identify intent, route to the right L1 Skill	Intent recognition, permission checks, query dispatch	`nl2-query`, `l3-gw-01` (data query gateway)
L3	Scenario (Ceiling)	Orchestrate multiple L1/L2 Skills into end-to-end business flows	Multi-phase pipelines, business composition	`scoring-engine` (opportunity scoring), `evidence-chain` (evidence chain analysis)
L4	Multi-Agent	Spin up independent sub-Agent teams, role isolation, parallel collaboration	Team orchestration, task scheduling	`agent-teams-orchestrator`, `l7-arkclaw-01` (enterprise ops assistant)

Key constraints:

One-way dependency: L4 → L3 → L2 → L1 → L0, reverse dependencies strictly forbidden
L1 is stateless: Base skills must be pure-function-style, no session state
L2 is stateful: Gateway tier manages session context and routing tables
L3 orchestrates, doesn't execute: Scenario tier only schedules; actual execution is delegated to L1
L4 runs in isolation: Each sub-Agent has independent context, data passes via structured JSON
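The one-way rule comes down to comparing tier numbers. Below is a minimal sketch. It assumes tiers are the strings `L0` to `L4` and that same-tier dependencies are forbidden, as lint rule `L002` states. `level_rank` and `is_legal_dependency` are hypothetical helpers, not part of skill-framework's API:

```python
# Hypothetical helpers illustrating the one-way dependency rule;
# not taken from skill-framework's source.
LEVELS = ("L0", "L1", "L2", "L3", "L4")


def level_rank(level: str) -> int:
    """Map 'L0'..'L4' to 0..4 and reject anything else."""
    if level not in LEVELS:
        raise ValueError(f"unknown skill level: {level!r}")
    return LEVELS.index(level)


def is_legal_dependency(caller_level: str, callee_level: str) -> bool:
    """A Skill may depend only on Skills at a strictly lower tier."""
    return level_rank(callee_level) < level_rank(caller_level)


print(is_legal_dependency("L3", "L1"))  # True: scenario calls a base skill
print(is_legal_dependency("L1", "L3"))  # False: reverse dependency
print(is_legal_dependency("L3", "L3"))  # False: same tier
```

This check allows a Skill to skip a tier, for example an L3 Skill calling an L1 Skill directly. That matches the L3 row of the tier table.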

The biggest value of this model isn't theoretical completeness—it's that it assigns each of the 208 Skills to exactly one tier. When you need to find a Skill, first pin down the tier, then narrow by domain, and you're looking at 10–20 candidates max.
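You can script the tier-first lookup against the inventory. Below is a minimal sketch. It assumes records shaped like the `skills` entries of `unified_skill_inventory.json`, which is shown later in this post. The sample records and `find_candidates` are illustrative only:

```python
# Illustrative records shaped like entries in unified_skill_inventory.json.
SKILLS = [
    {"name": "info-extractor", "level": "L1",
     "description": "Extract structured fields from unstructured text",
     "trigger_keywords": ["information extraction", "extract fields"]},
    {"name": "report-generator", "level": "L1",
     "description": "Report generation",
     "trigger_keywords": ["report"]},
    {"name": "evidence-chain", "level": "L3",
     "description": "Evidence chain analysis",
     "trigger_keywords": ["evidence"]},
]


def find_candidates(skills, level, keyword):
    """Filter by tier first, then by a keyword in name, description, or triggers."""
    kw = keyword.lower()
    return [
        s["name"]
        for s in skills
        if s["level"] == level
        and (
            kw in s["name"].lower()
            or kw in s.get("description", "").lower()
            or any(kw in t.lower() for t in s.get("trigger_keywords", []))
        )
    ]


print(find_candidates(SKILLS, "L1", "extract"))  # ['info-extractor']
```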

3. YAML Templates: 3 Ready-to-Copy Specifications

skill-framework provides three YAML templates covering the three most common Skill shapes:

Template	Target Tier	File	Key Feature
L1 Base Skill	L1 Base Skill tier	`templates/l1-base-skill.yaml`	Single responsibility, declares inputs/outputs and trigger keywords
L2 Gateway Skill	L2 Gateway/Routing tier	`templates/l2-gateway-skill.yaml`	Routing table + permission checks + downstream dependency declarations
L3 Ceiling Skill	L3 Scenario tier	`templates/l3-ceiling-skill.yaml`	Multi-phase orchestration + structured JSON data passing

L1 Base Skill Template Example

# templates/l1-base-skill.yaml
skill_name: ""                    # Required: skill name, kebab-case
skill_level: L1                   # Required: fixed at L1
version: "1.0.0"                  # Required: semantic version

description: ""                   # Required: one-line description
trigger_keywords: []              # Required: trigger keyword list
  # - "keyword1"
  # - "keyword2"

inputs:                           # Required: input parameter definitions
  - name: ""                      # Parameter name
    type: ""                      # Type: string/number/boolean/json/file
    required: true                # Is it required?
    description: ""               # Parameter description

outputs:                          # Required: output definitions
  - name: ""
    type: ""
    description: ""

dependencies:                     # Dependency declarations (L0 only)
  - skill_name: ""
    version: ">=1.0.0"
    usage: ""                     # What this dependency is used for

execution:                        # Execution spec
  type: prompt                    # prompt | python | hybrid
  timeout_seconds: 300            # Timeout
  retry_policy:
    max_retries: 2
    backoff: exponential

security:                         # Security declarations
  data_access_scope: []           # Data access scope
  sensitive_fields: []            # Sensitive field list
  audit_logging: true             # Enable audit logging?

quality:                          # Quality metrics
  min_accuracy: 0.85              # Minimum accuracy
  test_cases: []                  # Test case paths

Field design philosophy:

skill_level is mandatory—the toolchain uses it for dependency legality checks
dependencies is restricted to strictly lower tiers; L1 can only depend on L0
security section is required for the quality gate: skill-lint (rule L003) blocks any Skill whose security section is missing or empty
quality section is currently declarative; future versions will plug into automated test frameworks
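Once the YAML is parsed, the required-field and `security` rules can be checked mechanically. Below is a minimal sketch. It assumes the file has already been loaded into a plain `dict`, for example with a YAML library. `check_l1_skill` and its messages are hypothetical and do not reproduce `skill-lint`'s real output:

```python
from __future__ import annotations

import re

# Required fields of templates/l1-base-skill.yaml and their expected types.
REQUIRED = {
    "skill_name": str,
    "skill_level": str,
    "version": str,
    "description": str,
    "trigger_keywords": list,
    "inputs": list,
    "outputs": list,
}
KEBAB_CASE = re.compile(r"[a-z0-9]+(?:-[a-z0-9]+)*")
SEMVER = re.compile(r"\d+\.\d+\.\d+")


def check_l1_skill(spec: dict) -> list[str]:
    """Return a list of problems; an empty list means the spec passes."""
    problems = []
    for field, expected in REQUIRED.items():
        value = spec.get(field)
        if value in (None, "", []):
            problems.append(f"missing or empty required field: {field}")
        elif not isinstance(value, expected):
            problems.append(f"{field} should be a {expected.__name__}")
    if spec.get("skill_level") not in (None, "", "L1"):
        problems.append("skill_level must be L1 for this template")
    name = spec.get("skill_name")
    if isinstance(name, str) and name and not KEBAB_CASE.fullmatch(name):
        problems.append("skill_name must be kebab-case")
    version = spec.get("version")
    if isinstance(version, str) and version and not SEMVER.fullmatch(version):
        problems.append("version must look like MAJOR.MINOR.PATCH")
    if not spec.get("security"):
        problems.append("security section cannot be empty")
    return problems


# An untouched template fails on every empty required field.
template_defaults = {"skill_name": "", "skill_level": "L1", "version": "1.0.0",
                     "description": "", "trigger_keywords": []}
for problem in check_l1_skill(template_defaults):
    print(problem)
```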

Key Differences in L2 and L3 Templates

L2 Gateway adds:

routing_table:                    # L2-only: routing table
  - intent: ""                    # User intent
    target_skill: ""              # Target Skill to route to
    confidence_threshold: 0.8     # Confidence threshold
permission_check:                 # L2-only: permission checks
  enabled: true
  whitelist: []
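At runtime, the routing table reduces to two steps. First, pass the permission check. Then pick the highest-scoring intent that clears its own threshold. Below is a minimal sketch. It assumes an upstream intent classifier (not shown) supplies per-intent confidence scores and that `whitelist` holds caller identifiers. `Route` and `route` are hypothetical names:

```python
from __future__ import annotations

from dataclasses import dataclass


@dataclass
class Route:
    intent: str
    target_skill: str
    confidence_threshold: float = 0.8


def route(scores: dict[str, float], table: list[Route],
          caller: str, permission_check: dict) -> str | None:
    """Return the target L1 Skill for the best qualifying intent, or None."""
    # Permission check first: an enabled check rejects callers not on the whitelist.
    if permission_check.get("enabled") and caller not in permission_check.get("whitelist", []):
        return None
    best_skill, best_score = None, float("-inf")
    for r in table:
        score = scores.get(r.intent, 0.0)
        # Only intents whose confidence clears their own threshold are eligible.
        if score >= r.confidence_threshold and score > best_score:
            best_skill, best_score = r.target_skill, score
    return best_skill


table = [
    Route("query_data", "data-analyst"),
    Route("extract_fields", "info-extractor"),
]
print(route({"query_data": 0.92, "extract_fields": 0.40}, table,
            caller="analyst", permission_check={"enabled": True, "whitelist": ["analyst"]}))
# data-analyst
```

When no intent is confident enough, the function returns `None` rather than the closest match. The gateway can then ask the user a clarifying question. That fallback is a design assumption here, not something the template specifies.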

L3 Ceiling adds:

phases:                           # L3-only: multi-phase orchestration
  - phase: 1
    name: ""
    skill: ""                     # L1/L2 Skill being called
    input_mapping: {}             # Input mapping
    output_key: ""                # Key to store output
  - phase: 2
    name: ""
    skill: ""
    input_mapping: {}
    output_key: ""
orchestration:                    # L3-only: orchestration strategy
  mode: sequential               # sequential | parallel | conditional
  failure_policy: stop            # stop | skip | retry
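The `phases` list can be read as a loop. Each iteration resolves the phase's `input_mapping` from a shared context, calls the Skill, and stores the result under `output_key`. Below is a minimal sketch of `mode: sequential` only. It assumes `input_mapping` maps a parameter name to a context key and that each Skill is a Python callable taking one dict. `run_phases` is a hypothetical name:

```python
from __future__ import annotations

from typing import Any, Callable


def run_phases(phases: list[dict], skills: dict[str, Callable[[dict], Any]],
               initial: dict, failure_policy: str = "stop",
               max_retries: int = 1) -> dict:
    """Run phases in order; return the shared context with every phase output."""
    context = dict(initial)
    attempts = 1 + (max_retries if failure_policy == "retry" else 0)
    for phase in sorted(phases, key=lambda p: p["phase"]):
        error = None
        for _ in range(attempts):
            try:
                args = {param: context[key] for param, key in phase["input_mapping"].items()}
                context[phase["output_key"]] = skills[phase["skill"]](args)
                error = None
                break
            except Exception as exc:  # the Skill failed or an input is missing
                error = exc
        if error is None:
            continue
        if failure_policy == "skip":
            continue  # leave output_key unset; later phases that need it fail too
        # "stop", or "retry" with retries exhausted: abort the whole flow.
        raise RuntimeError(f"phase {phase['phase']} ({phase['name']}) failed") from error
    return context


skills = {
    "info-extractor": lambda a: {"amount": len(a["text"])},
    "data-analyst": lambda a: a["fields"]["amount"] * 2,
}
phases = [
    {"phase": 1, "name": "extract", "skill": "info-extractor",
     "input_mapping": {"text": "raw_text"}, "output_key": "fields"},
    {"phase": 2, "name": "analyze", "skill": "data-analyst",
     "input_mapping": {"fields": "fields"}, "output_key": "score"},
]
print(run_phases(phases, skills, {"raw_text": "abcd"}))
# {'raw_text': 'abcd', 'fields': {'amount': 4}, 'score': 8}
```

This sketch does not cover the `parallel` and `conditional` modes. `parallel` would only be safe for phases that do not read each other's outputs.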

4. Python Toolchain: Scan → Lint → Backfill Pipeline

skill-framework bundles 3 Python tools that form an automated pipeline from discovery to compliance:

inventory-scan (scan & build inventory)  ──→  skill-lint (compliance check)  ──→  backfill-frontmatter (backfill frontmatter)

Tool 1: inventory-scan — Scan & Build Inventory

Scans all Skills under a directory, auto-detects tiers, extracts metadata, and generates a unified inventory.

# Basic: scan all Skills in your project
python tools/inventory-scan.py --root ./skills --output ./output

# With tier validation: check that each Skill's level tag is legal
python tools/inventory-scan.py \
  --root ./skills \
  --output ./output \
  --validate-levels

Outputs:

unified_skill_inventory.json — Complete inventory of 208 Skills (structured JSON)
unified_skill_inventory.csv — Same inventory in tabular form (easy Excel filtering)
skills_dependencies.json — Skill dependency graph
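The CSV is a flattened view of the same records. Below is a minimal sketch of the conversion using only the standard library. It assumes list fields are joined with semicolons. The real column layout of `unified_skill_inventory.csv` may differ, and `inventory_to_csv` is a hypothetical name:

```python
import csv
import io

COLUMNS = ["name", "level", "version", "description",
           "trigger_keywords", "dependencies", "file_path"]


def inventory_to_csv(inventory: dict) -> str:
    """Flatten inventory['skills'] into CSV text, one row per Skill."""
    buf = io.StringIO()
    writer = csv.DictWriter(buf, fieldnames=COLUMNS)
    writer.writeheader()
    for skill in inventory["skills"]:
        row = {col: skill.get(col, "") for col in COLUMNS}
        # Lists do not fit in one cell, so join them with ';' for Excel filtering.
        row["trigger_keywords"] = ";".join(skill.get("trigger_keywords", []))
        row["dependencies"] = ";".join(skill.get("dependencies", []))
        writer.writerow(row)
    return buf.getvalue()
```

When saving the result for Excel, open the file with `newline=""` and `encoding="utf-8-sig"`. The first avoids doubled line endings on Windows, and the second adds a BOM so Excel detects UTF-8.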

Inventory JSON structure example:

{
  "scan_timestamp": "2026-07-01T10:00:00Z",
  "total_skills": 208,
  "by_level": {
    "L0": 12,
    "L1": 86,
    "L2": 24,
    "L3": 58,
    "L4": 28
  },
  "skills": [
    {
      "name": "info-extractor",
      "level": "L1",
      "version": "2.1.0",
      "description": "Extract structured fields from unstructured text",
      "trigger_keywords": ["information extraction", "extract fields", "structuring"],
      "dependencies": ["mysql-connector@L0"],
      "file_path": "skills/info-extractor/SKILL.md"
    }
  ]
}
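`total_skills` and `by_level` are derived from the `skills` array, so you can recompute them as a cross-check. Below is a minimal sketch. It assumes each record has a `level` field as above. `summarize` is a hypothetical name and may not match how `inventory-scan` builds its output:

```python
from __future__ import annotations

from collections import Counter
from datetime import datetime, timezone

LEVELS = ("L0", "L1", "L2", "L3", "L4")


def summarize(skills: list[dict]) -> dict:
    """Build the header fields of the inventory from per-Skill records."""
    counts = Counter(s["level"] for s in skills)
    unknown = set(counts) - set(LEVELS)
    if unknown:
        raise ValueError(f"records with invalid levels: {sorted(unknown)}")
    return {
        "scan_timestamp": datetime.now(timezone.utc).strftime("%Y-%m-%dT%H:%M:%SZ"),
        "total_skills": len(skills),
        "by_level": {level: counts[level] for level in LEVELS},
        "skills": skills,
    }
```

In the sample header, 12 + 86 + 24 + 58 + 28 = 208, so `by_level` agrees with `total_skills`.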

Tool 2: skill-lint — YAML Compliance Checker

Checks each Skill's YAML declarations against the template spec and outputs a violation report.

# Check a single Skill
python tools/skill-lint.py --target ./skills/info-extractor

# Batch check the whole project
python tools/skill-lint.py --root ./skills --template-dir ./templates

# Strict mode: treat Warnings as Errors
python tools/skill-lint.py --root ./skills --strict

Sample lint rules:

Rule ID	Level	What It Checks
`L001`	Error	skill_level must be one of L0–L4
`L002`	Error	Dependencies must be at a lower tier than the current Skill
`L003`	Error	security section cannot be empty
`L004`	Warning	Recommend adding description for every input
`L005`	Error	L3 Skills must have a phases section
`L006`	Warning	trigger_keywords should have at least 3 entries
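Most of these rules are simple predicates over the parsed YAML. Below is a minimal sketch of rules `L001` to `L006` and the `--strict` exit behavior. It assumes three things: the spec is already a `dict`, a name-to-level map comes from the inventory, and each dependency entry has a `skill_name` key as in the L1 template. `lint` and `exit_code` are hypothetical and are not `skill-lint`'s actual code:

```python
from __future__ import annotations

LEVELS = ("L0", "L1", "L2", "L3", "L4")


def lint(spec: dict, level_of: dict[str, str]) -> list[tuple[str, str, str]]:
    """Return (rule_id, severity, message) findings for one parsed Skill spec."""
    findings = []
    level = spec.get("skill_level")
    if level not in LEVELS:  # L001; the remaining rules need a valid level
        return [("L001", "Error", f"skill_level {level!r} is not one of L0-L4")]
    for dep in spec.get("dependencies", []):
        dep_level = level_of.get(dep["skill_name"])  # None if not in inventory
        if dep_level in LEVELS and LEVELS.index(dep_level) >= LEVELS.index(level):
            findings.append(("L002", "Error",
                             f'dependency "{dep["skill_name"]}" is {dep_level}, '
                             f"not below current skill ({level})"))
    if not spec.get("security"):
        findings.append(("L003", "Error", "security section cannot be empty"))
    for inp in spec.get("inputs", []):
        if not inp.get("description"):
            findings.append(("L004", "Warning", f'input "{inp.get("name")}" missing description'))
    if level == "L3" and not spec.get("phases"):
        findings.append(("L005", "Error", "L3 Skills must have a phases section"))
    if len(spec.get("trigger_keywords", [])) < 3:
        findings.append(("L006", "Warning", "trigger_keywords should have at least 3 entries"))
    return findings


def exit_code(findings, strict: bool = False) -> int:
    """Non-zero if any Error is found; strict mode also fails on Warnings."""
    failing = {"Error", "Warning"} if strict else {"Error"}
    return 1 if any(sev in failing for _, sev, _ in findings) else 0
```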

Output example:

[ERROR] L002: skills/scoring-engine/skill.yaml
  → dependency "evidence-chain" is L3, same level as current skill (L3)
  → Suggestion: L3 skills should depend on L1/L2, not other L3 skills directly

[WARNING] L004: skills/info-extractor/skill.yaml
  → Input "raw_text" missing description
  → Suggestion: Add description field for better discoverability

Scan complete: 208 skills checked, 3 errors, 7 warnings

Tool 3: backfill-frontmatter — Auto-Fill Missing Frontmatter

For Skill files missing YAML frontmatter, this tool extracts content from SKILL.md and generates standard frontmatter.

# Dry run: preview what would be backfilled
python tools/backfill-frontmatter.py --root ./skills --dry-run

# After reviewing, apply changes
python tools/backfill-frontmatter.py --root ./skills --apply

Typical scenario: Your team wrote SKILL.md files early on without filling in YAML templates. This tool will:

Read the description and trigger keywords from SKILL.md
Infer skill_level from file path and content
Scan import/require statements to extract dependencies
Generate template-compliant frontmatter and prepend it to the file
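Below is a minimal sketch of the backfill idea. It covers only `skill_name`, `skill_level`, and `description`; trigger keywords and dependency extraction are omitted. It assumes the tier appears as a whole directory name such as `skills/L1/...`. Inferring the tier from Skill names is risky, because `l3-gw-01` is an L2 gateway. That is one reason to review the `--dry-run` output first. All function names are hypothetical:

```python
from __future__ import annotations

import json
import re
from pathlib import PurePosixPath


def infer_level(path: str) -> str | None:
    """Trust only a whole directory name such as 'L1' or 'l1'."""
    for part in PurePosixPath(path).parts:
        if re.fullmatch(r"[Ll][0-4]", part):
            return part.upper()
    return None  # e.g. 'l3-gw-01' is deliberately not treated as a tier


def first_paragraph(markdown: str) -> str:
    """First non-empty line that is not a Markdown heading."""
    for line in markdown.splitlines():
        line = line.strip()
        if line and not line.startswith("#"):
            return line
    return ""


def backfill(path: str, markdown: str) -> str:
    """Prepend minimal frontmatter unless the file already starts with one."""
    if markdown.startswith("---\n"):
        return markdown
    front = [
        "---",
        f"skill_name: {PurePosixPath(path).parent.name}",
        f"skill_level: {infer_level(path) or 'TODO'}",
        'version: "1.0.0"',
        # json.dumps yields a double-quoted string that YAML also accepts.
        f"description: {json.dumps(first_paragraph(markdown))}",
        "---",
        "",
    ]
    return "\n".join(front) + markdown
```

When the tier cannot be inferred, the sketch writes `TODO` instead of guessing, which leaves the decision to the reviewer.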

5. Skill Dependency Graph: skills_dependencies.json

The dependency graph is skill-framework's second-largest data asset (after the inventory itself). It explicitly declares the call relationships between Skills.

Structure design:

{
  "version": "1.0.0",
  "generated_at": "2026-07-01T10:00:00Z",
  "nodes": [
    {
      "id": "info-extractor",
      "level": "L1",
      "group": "data-processing"
    },
    {
      "id": "scoring-engine",
      "level": "L3",
      "group": "risk-management"
    }
  ],
  "edges": [
    {
      "from": "scoring-engine",
      "to": "info-extractor",
      "type": "phase-1",
      "data_contract": "structured-json"
    },
    {
      "from": "scoring-engine",
      "to": "knowledge-rag",
      "type": "phase-2",
      "data_contract": "structured-json"
    }
  ],
  "orphan_nodes": ["unused-skill-demo"]
}

Three killer use cases:

Change impact analysis: Before modifying info-extractor, follow edges backwards to find the Skills that call it (such as scoring-engine) and would be affected
Dead skill discovery: orphan_nodes lists Skills that appear in no edge (nothing calls them and they call nothing), making them candidates for deletion or archival
Tier violation detection: Use alongside skill-lint to catch illegal calls like L1 depending on L3
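Each of the three use cases is a short graph query. Below is a minimal sketch. It assumes the node/edge layout above and treats an orphan as a node that appears in no edge. The function names are hypothetical. Also note that `knowledge-rag` (L1 in the tier table) would need its own entry in `nodes` for the tier check to see it:

```python
from __future__ import annotations

from collections import defaultdict, deque

LEVELS = ("L0", "L1", "L2", "L3", "L4")


def impacted_by(graph: dict, skill: str) -> set[str]:
    """Every Skill that calls `skill` directly or transitively (reverse BFS)."""
    callers = defaultdict(set)
    for edge in graph["edges"]:
        callers[edge["to"]].add(edge["from"])
    seen: set[str] = set()
    queue = deque([skill])
    while queue:
        for caller in callers[queue.popleft()]:
            if caller not in seen:
                seen.add(caller)
                queue.append(caller)
    return seen


def orphans(graph: dict) -> list[str]:
    """Nodes that appear in no edge at all."""
    linked = {e["from"] for e in graph["edges"]} | {e["to"] for e in graph["edges"]}
    return sorted(n["id"] for n in graph["nodes"] if n["id"] not in linked)


def tier_violations(graph: dict) -> list[tuple[str, str]]:
    """Edges whose callee is not at a strictly lower tier than the caller."""
    level = {n["id"]: n["level"] for n in graph["nodes"]}
    return [
        (e["from"], e["to"])
        for e in graph["edges"]
        if e["from"] in level and e["to"] in level
        and LEVELS.index(level[e["to"]]) >= LEVELS.index(level[e["from"]])
    ]
```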

6. Quality Gate: 6-Step Checklist from Dev to Production

skill-framework ships with audit-checklist.md that defines a 6-step quality gate. Every Skill must pass all items before going live:

Step	Check Item	Owner	Tool Support
1. Structural compliance	YAML fields complete, tier correct	Developer	`skill-lint`
2. Dependency legality	One-way dependencies, no cycles	Developer	`inventory-scan --validate-levels`
3. Security audit	Minimal data scope, sensitive fields masked	Security reviewer	`skill-lint` (rule `L003`)
4. Integration test	End-to-end flow verification, timeout & retry testing	QA engineer	Manual + automated test framework
5. Documentation completeness	README, trigger keyword examples, I/O samples	Developer	`backfill-frontmatter --dry-run`
6. Production approval	Manual sign-off + archived approval record	Tech lead	Manual
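The "no cycles" condition in Step 2 is a standard topological-sort check. Below is a minimal sketch using Python's standard-library `graphlib` (3.9+). It assumes edges are (caller, callee) pairs taken from `skills_dependencies.json`. This may not match how `inventory-scan --validate-levels` does it, and `check_acyclic` is a hypothetical name:

```python
from graphlib import CycleError, TopologicalSorter


def check_acyclic(edges):
    """Return (load_order, None) for an acyclic graph, or (None, cycle)."""
    # TopologicalSorter expects node -> predecessors; a caller must come
    # after its callees, so callees are the caller's predecessors.
    deps = {}
    for caller, callee in edges:
        deps.setdefault(caller, set()).add(callee)
    try:
        return list(TopologicalSorter(deps).static_order()), None
    except CycleError as err:
        # err.args[1] holds the nodes of one detected cycle.
        return None, err.args[1]


order, cycle = check_acyclic([("scoring-engine", "info-extractor"),
                              ("info-extractor", "mysql-connector")])
print(order)  # ['mysql-connector', 'info-extractor', 'scoring-engine']
```

An acyclic graph also yields a load order in which every Skill comes after the Skills it calls.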

Practical command sequence:

# Step 1: Structural compliance
python tools/skill-lint.py --root ./skills --strict

# Step 2: Dependency legality
python tools/inventory-scan.py --root ./skills --validate-levels

# Step 3: Security audit (focus on L003 rule)
python tools/skill-lint.py --root ./skills --rule L003 --strict

# Step 4: Integration test (manual + automated test framework; no bundled command)

# Step 5: Documentation backfill
python tools/backfill-frontmatter.py --root ./skills --dry-run
# review dry-run output, then:
python tools/backfill-frontmatter.py --root ./skills --apply

# Step 6: Manual approval (review check report, sign off)
cat output/audit-report.md

7. Four Industry Vertical Blueprints

The framework bundles 4 industry blueprints, each pre-defining the core Skill combinations and dependency relationships for that industry:

Industry	Blueprint File	Core Skill Combo	Special Component
Finance	`blueprints/finance.yaml`	Risk scoring, compliance review, investment research, customer profiling	`financial-ai-skills` integration
Telecom	`blueprints/telecom.yaml`	Complaint analysis, field service dispatch, 5G private network assessment, network root cause	`teleagent-skills` integration
Healthcare	`blueprints/healthcare.yaml`	Medical record extraction, diagnosis assistance, medication review, scheduling optimization	HIPAA compliance Skill
Government	`blueprints/government.yaml`	Document review, policy interpretation, public opinion analysis, approval workflows	Red-header (official government document) parser Skill

How to use:

# Initialize a project based on the finance blueprint
python tools/inventory-scan.py \
  --blueprint blueprints/finance.yaml \
  --init ./my-finance-project

The blueprint auto-generates the Skill directory skeleton, dependency declarations, and pre-filled YAML templates for that industry.
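This post does not show the blueprint file format, so the following only sketches the skeleton-generation idea. It assumes a blueprint parsed into a dict with a hypothetical `skills` list of `name`/`level` entries. It writes stubs named `skill.yaml`, the file name used in the lint output above. `init_project` is a hypothetical name, not the tool's `--init` implementation:

```python
from __future__ import annotations

import tempfile
from pathlib import Path


def init_project(blueprint: dict, root: Path) -> list[Path]:
    """Create skills/<name>/skill.yaml stubs; never overwrite existing files."""
    created = []
    for skill in blueprint["skills"]:
        skill_dir = root / "skills" / skill["name"]
        skill_dir.mkdir(parents=True, exist_ok=True)
        spec = skill_dir / "skill.yaml"
        if spec.exists():
            continue  # keep hand-written specs intact
        spec.write_text(
            f"skill_name: {skill['name']}\n"
            f"skill_level: {skill['level']}\n"
            'version: "1.0.0"\n'
            'description: ""\n',
            encoding="utf-8",
        )
        created.append(spec)
    return created


with tempfile.TemporaryDirectory() as tmp:
    blueprint = {"skills": [{"name": "info-extractor", "level": "L1"},
                            {"name": "scoring-engine", "level": "L3"}]}
    created = init_project(blueprint, Path(tmp))
    print([p.relative_to(tmp).as_posix() for p in created])
    # ['skills/info-extractor/skill.yaml', 'skills/scoring-engine/skill.yaml']
```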

8. The Full 208-Skill Inventory at a Glance

unified_skill_inventory.json catalogs 208 Skills, distributed by tier as follows:

Tier	Count	Share	Representative Skills
L0	12	5.8%	mysql-connector, redis-client, oss-handler
L1	86	41.3%	info-extractor, data-analyst, report-generator, security-guard, knowledge-rag
L2	24	11.5%	nl2-query, l3-gw-01, data-query-gateway
L3	58	27.9%	scoring-engine, evidence-chain, live-stream-script-system, contract-review
L4	28	13.5%	agent-teams-orchestrator, l7-arkclaw-01, auto-pilot
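The Share column is each tier's count divided by the 208-Skill total, rounded to one decimal place. Recomputing it is a quick consistency check:

```python
# Tier counts from the table above.
counts = {"L0": 12, "L1": 86, "L2": 24, "L3": 58, "L4": 28}
total = sum(counts.values())
shares = {level: round(100 * n / total, 1) for level, n in counts.items()}

print(total)   # 208
print(shares)  # {'L0': 5.8, 'L1': 41.3, 'L2': 11.5, 'L3': 27.9, 'L4': 13.5}
```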

The inventory comes in both JSON and CSV—JSON for programmatic use, CSV for Excel filtering and human review.

9. Quick Start

# Clone the repo
git clone https://github.com/yuzhaopeng-up/skill-framework.git
cd skill-framework

# Install dependencies
pip install -r requirements.txt

# Step 1: Scan your Skill project
python tools/inventory-scan.py --root /path/to/your/skills --output ./output

# Step 2: Compliance check
python tools/skill-lint.py --root /path/to/your/skills --template-dir ./templates

# Step 3: Backfill missing metadata
python tools/backfill-frontmatter.py --root /path/to/your/skills --dry-run

# Confirm and apply
python tools/backfill-frontmatter.py --root /path/to/your/skills --apply

Repo structure:

skill-framework/
├── templates/                    # 3 YAML templates
│   ├── l1-base-skill.yaml
│   ├── l2-gateway-skill.yaml
│   └── l3-ceiling-skill.yaml
├── tools/                        # 3 Python tools
│   ├── inventory-scan.py
│   ├── skill-lint.py
│   └── backfill-frontmatter.py
├── blueprints/                   # 4 industry blueprints
│   ├── finance.yaml
│   ├── telecom.yaml
│   ├── healthcare.yaml
│   └── government.yaml
├── data/                         # Data assets
│   ├── unified_skill_inventory.json
│   ├── unified_skill_inventory.csv
│   └── skills_dependencies.json
├── docs/
│   └── audit-checklist.md        # Quality gate checklist
├── LICENSE                       # MIT
└── README.md

10. Design Trade-offs

Decision	Choice	Rejected Alternative	Reason
Classification model	5 tiers	3 tiers, 7 tiers	5 tiers balances granularity and complexity
Template format	YAML	JSON Schema, TOML	YAML is readable, comment-friendly, and dominant in the Agent ecosystem
Dependency declarations	Static files	Runtime discovery	Static declarations enable offline checks and are easier to audit and control
Tooling language	Python	Node.js	Higher Python penetration in AI/data teams
License	MIT	Apache 2.0	MIT is short and highly permissive, which lowers the adoption barrier

The Open-Source Agent Skills Ecosystem

skill-framework isn't an isolated project—it's the governance hub of an open-source Agent Skills ecosystem. These 5 repos work together to cover the full chain from Skill development to industry adoption:

Repo	Role	GitHub
skill-framework	L0-L4 classification + YAML templates + Python toolchain	https://github.com/yuzhaopeng-up/skill-framework
financial-ai-skills	Finance industry vertical Skill set (risk, compliance, research)	https://github.com/yuzhaopeng-up/financial-ai-skills
teleagent-skills	Telecom industry vertical Skill set (field service, complaints, 5G)	https://github.com/yuzhaopeng-up/teleagent-skills
agent-cluster-comm	Multi-Agent cluster communication protocol & orchestration engine	https://github.com/yuzhaopeng-up/agent-cluster-comm
fintech-h5-demos	Fintech H5 interactive demos (courses/training)	https://github.com/yuzhaopeng-up/fintech-h5-demos

How they work together:

skill-framework defines the standards and tools; other repos follow its classification system and template specs
financial-ai-skills and teleagent-skills are two concrete implementations of industry blueprints
agent-cluster-comm provides the L4 multi-Agent orchestration communication protocol; skill-framework's L4 template is based on it
fintech-h5-demos is the front-end display layer, visualizing Skill execution results as interactive H5

License: MIT — use it, fork it, modify it, just don't delete the copyright notice.

If this framework helped you make sense of Skill governance, drop a Star on https://github.com/yuzhaopeng-up/skill-framework
