DEV Community

Dechun Wang


Refactoring Agent Skills: From Context Explosion to a Fast, Reliable Workflow

Refactoring Agent Skills: The Day My Context Window Died

There’s a specific kind of pain you only experience once:

You’re in Claude Code, you trigger a couple of “helpful” Skills, and suddenly the model is chewing through thousands of lines of markdown + snippets it didn’t ask for.

Your “AI co-pilot” stops feeling like a co-pilot and starts feeling like a browser tab you can’t close.

This piece is a practical rewrite (and upgrade) of a popular Claude Code community refactor story: a developer thought “more info = better,” built Skills like mini-wikis, and accidentally created a context explosion. The fix wasn’t a clever prompt. It was an architectural refactor. The result: dramatically leaner initial context and much better token efficiency.

Let’s steal the playbook.


1) The Root Cause: Treating Skills Like Docs

The first trap is incredibly human:

“If I include everything, the model will always have what it needs.”

So you create one Skill per tool, and each Skill becomes a documentation dump:

  • setup steps
  • API references
  • exhaustive examples
  • “don’t do X” lists
  • every edge case since 2017

Then a task like “deploy a serverless function with a small UI” pulls in:

  • your Cloudflare skill,
  • your Docker skill,
  • your UI styling skill,
  • your web framework skill…

…and the model starts its job already half-drowned.

Claude Code’s own docs warn that Skills share the context window with the conversation, the request, and other Skills — which means uncontrolled loading is a direct performance tax. (You feel it as slowness, drift, and “why is it ignoring the obvious part?”)

So: your problem isn’t “lack of info.” It’s “too much irrelevant info.”


2) The Fix: Progressive Disclosure (Three Layers)

Claude Code docs explicitly recommend progressive disclosure: keep essential info in SKILL.md, and store the heavy stuff in separate files that get loaded only when the task requires them.

This maps cleanly to a three-layer system:

Layer 1 — Metadata (always loaded)

A short YAML frontmatter: name + description. The description doubles as the “routing signal.”

Think of it like a book cover and blurb. You’re not teaching. You’re helping the model decide whether to open the book.
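To make the "book cover" idea concrete, here is a minimal sketch of reading only the frontmatter of a SKILL.md, roughly what a router needs in order to decide whether to open the rest. It is hand-rolled and only handles flat `key: value` pairs; the function name `read_frontmatter` and the sample `devops` description are made up for illustration and are not part of Claude Code.

```python
from typing import Dict


def read_frontmatter(text: str) -> Dict[str, str]:
    """Return flat key/value pairs from a leading '---' block, or {} if absent."""
    lines = text.splitlines()
    if not lines or lines[0].strip() != "---":
        return {}
    meta: Dict[str, str] = {}
    for line in lines[1:]:
        if line.strip() == "---":
            return meta  # closing delimiter reached
        key, sep, value = line.partition(":")
        if sep:
            meta[key.strip()] = value.strip()
    return {}  # no closing delimiter: treat the block as malformed


# Hypothetical SKILL.md content, for illustration only.
example = """---
name: devops
description: Deploy and operate services. Use when shipping or configuring environments.
---

# DevOps Skill
"""

meta = read_frontmatter(example)
print(meta["name"], "|", meta["description"])
```

If the description alone can't tell you when the Skill applies, the model probably can't tell either.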

Layer 2 — Entry point: SKILL.md (loaded on activation)

Your navigation map:

  • what the Skill is for
  • when to use it
  • what steps to follow
  • what files to open next

Not a tutorial. Not a wiki.

Layer 3 — References & scripts (loaded only when needed)

Small, focused files:

  • one topic per file
  • 200–300 lines per file is a good target
  • scripts do deterministic work so the model doesn’t burn tokens “describing” actions

Here’s what that looks like in a real folder:

.claude/skills/devops/
├── SKILL.md
├── references/
│   ├── serverless-cloudflare.md
│   ├── containers-docker.md
│   └── ci-cd-basics.md
└── scripts/
    ├── validate_env.py
    └── deploy_helper.sh
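
The tree lists `scripts/validate_env.py` without showing its contents, so the following is only a guess at what a deterministic helper like that might look like. The variable names in `REQUIRED_VARS` are hypothetical. The point is that the agent runs the script and reads one line of output instead of reasoning about the environment in prose.

```python
import os
from typing import Iterable, List, Mapping

# Hypothetical variable names, chosen only for illustration.
REQUIRED_VARS = ("DEPLOY_ENV", "API_TOKEN")


def missing_vars(env: Mapping[str, str], required: Iterable[str] = REQUIRED_VARS) -> List[str]:
    """Return the required variables that are unset or empty, in order."""
    return [name for name in required if not env.get(name)]


def report(env: Mapping[str, str]) -> str:
    """Produce a one-line result the agent can read directly."""
    missing = missing_vars(env)
    return "OK" if not missing else "MISSING: " + ", ".join(missing)


print(report(os.environ))
```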

3) The “200-Line Rule”: Brutal, Slightly Arbitrary, Weirdly Effective

In the community refactor story, the author landed on a hard constraint:

Keep SKILL.md under ~200 lines.

If you can’t, you’re putting too much in the entry point.

Claude’s own best-practices docs recommend keeping the SKILL.md body under 500 lines (and splitting content into separate files as you approach that limit). But “200 lines” is a sharper knife: it forces you to write a table of contents, not a textbook.

Why it works:

  • The model can scan the entry quickly
  • It can decide what reference file to load next
  • Total “initial load” stays small enough that the conversation still has room to breathe

A quick test you can steal

  • Start a fresh session (cold start)
  • Trigger your Skill
  • If your first activation loads more than ~500 lines of content, your design is likely leaking scope
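
The cold-start test can be reduced to simple arithmetic once you know which files were pulled in. A rough sketch, assuming you note the loaded files yourself during the session and pass them in; the function names and the default budget of 500 are illustrative, not a Claude Code feature.

```python
from pathlib import Path
from typing import Iterable, Tuple


def activation_lines(files: Iterable[Path]) -> int:
    """Total line count across the files loaded on first activation."""
    total = 0
    for path in files:
        with path.open("r", encoding="utf-8", errors="ignore") as handle:
            total += sum(1 for _ in handle)
    return total


def check_budget(files: Iterable[Path], budget: int = 500) -> Tuple[int, bool]:
    """Return (total_lines, within_budget) for a cold-start load."""
    total = activation_lines(files)
    return total, total <= budget
```

For example, `check_budget([Path(".claude/skills/devops/SKILL.md")])` tells you whether the entry point alone already eats the budget.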

4) The Real Mental Shift: From Tool-Centric to Workflow-Centric

This is the part most people miss.

Tool-centric Skills look like:

  • cloudflare-skill
  • tailwind-skill
  • postgres-skill
  • kubernetes-skill

They’re encyclopedias. They don’t compose well.

Workflow-centric Skills look like:

  • devops (deploy + environments + CI/CD)
  • ui-styling (design rules + component patterns)
  • web-frameworks (routing + project structure + SSR pitfalls)
  • databases (schema design + migrations + query patterns)

They map to what you actually do during development.

A workflow Skill answers:

“When I’m in this stage of work, what does the agent need to know to act correctly?”

Not:

“What is everything this tool can do?”

That one reframing prevents context blowups almost by itself.
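
One way to plan the move from tool-centric to workflow-centric is to write the mapping down before touching any files. A small sketch: the Skill names come from the lists above, but the `kubernetes` and `postgres` reference file names are hypothetical placeholders, and `plan_by_workflow` is just an illustrative helper.

```python
from collections import defaultdict
from typing import Dict, List, Tuple

# Old tool-centric Skill -> (workflow Skill, reference file inside it).
# The kubernetes and postgres file names are hypothetical placeholders.
MIGRATION: Dict[str, Tuple[str, str]] = {
    "cloudflare-skill": ("devops", "references/serverless-cloudflare.md"),
    "kubernetes-skill": ("devops", "references/containers-kubernetes.md"),
    "tailwind-skill": ("ui-styling", "references/tailwind-patterns.md"),
    "postgres-skill": ("databases", "references/postgres-queries.md"),
}


def plan_by_workflow(migration: Dict[str, Tuple[str, str]]) -> Dict[str, List[str]]:
    """Group target reference files under the workflow Skill that will own them."""
    plan: Dict[str, List[str]] = defaultdict(list)
    for _tool, (workflow, reference) in sorted(migration.items()):
        plan[workflow].append(reference)
    return dict(plan)


for workflow, refs in plan_by_workflow(MIGRATION).items():
    print(workflow, "->", refs)
```

Each old Skill shrinks into one reference file; the workflow Skill's SKILL.md only has to point at it.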


5) A Minimal, Production-Grade SKILL.md (Example)

Here’s a deliberately small entry point you can copy and customise.

Notice what’s missing: long examples, full docs, and “everything you might ever need.”

---
name: ui-styling
description: Apply consistent UI styling across the app (Tailwind + component conventions). Use when building or refactoring UI.
---

# UI Styling Skill

## When to use
- You are building UI components or pages
- You need consistent spacing, typography, and responsive behaviour
- You need to align with existing design conventions

## Workflow
1. Identify the UI surface (page/component) and constraints (responsive, dark mode, accessibility)
2. Apply styling rules from the references (pick only what you need)
3. Validate output against the checklist

## References (load only if needed)
- `references/design-tokens.md` — spacing, font scale, colour usage
- `references/tailwind-patterns.md` — layouts, common utility combos
- `references/accessibility-checklist.md` — keyboard, focus, contrast

## Output contract
- Use UK English in UI strings
- Prefer reusable components over copy-paste blocks
- Keep className readable (extract when it gets messy)

That’s it.

The Skill’s job is to route the agent to the right file at the right moment — not to become an on-page encyclopedia.
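
A routing file is only useful if the files it points to exist. Here is a minimal sketch of a link check, assuming references are written as backticked relative paths the way the example above does; `referenced_paths` and `missing_references` are illustrative names, not an existing tool.

```python
import re
from pathlib import Path
from typing import List

# Matches backticked relative paths such as `references/design-tokens.md`.
REF_PATTERN = re.compile(r"`((?:references|scripts)/[^`\s]+)`")


def referenced_paths(skill_md_text: str) -> List[str]:
    """Return the reference/script paths mentioned in a SKILL.md body."""
    return REF_PATTERN.findall(skill_md_text)


def missing_references(skill_dir: Path) -> List[str]:
    """List paths named in SKILL.md that do not exist under the Skill folder."""
    text = (skill_dir / "SKILL.md").read_text(encoding="utf-8")
    return [rel for rel in referenced_paths(text) if not (skill_dir / rel).exists()]
```

Running `missing_references(Path(".claude/skills/ui-styling"))` after a refactor catches renamed or forgotten reference files before the agent trips over them.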


6) Measuring Improvements (Without Lying to Yourself)

If you want repeatable results, track metrics that actually matter:

  • Initial lines loaded on activation
  • Time to activation (roughly: how “snappy” it feels)
  • Relevance ratio (how much of the loaded content is used)
  • Context overflow frequency (how often long tasks run out of context window before they finish)
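
Two of these metrics are plain ratios, so a notebook-grade sketch is enough to track them. This assumes you log each activation by hand (how many lines were loaded, roughly how many were actually used, and whether the task overflowed); the `Activation` record and both function names are hypothetical.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class Activation:
    """One hand-logged Skill activation (all fields are manual estimates)."""
    lines_loaded: int
    lines_used: int
    overflowed: bool = False


def relevance_ratio(runs: List[Activation]) -> float:
    """Share of loaded lines that were actually used, across all runs."""
    loaded = sum(r.lines_loaded for r in runs)
    used = sum(r.lines_used for r in runs)
    return used / loaded if loaded else 0.0


def overflow_rate(runs: List[Activation]) -> float:
    """Fraction of runs that ran out of context."""
    return sum(r.overflowed for r in runs) / len(runs) if runs else 0.0
```

The numbers will be rough, but comparing them before and after a refactor is more honest than going by feel.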

You don’t need a full observability stack. A simple repo audit script helps.

Tiny Python audit: count lines per Skill

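from pathlib import Path

# Root folder that holds one sub-folder per Skill.
skills_dir = Path(".claude/skills")
MAX_LINES = 200  # entry-point budget from the "200-Line Rule"

def count_lines(p: Path) -> int:
    # Close the file promptly; skip undecodable bytes rather than crash.
    with p.open("r", encoding="utf-8", errors="ignore") as f:
        return sum(1 for _ in f)

# Guard against running from a directory that has no Skills folder.
skill_folders = sorted(skills_dir.iterdir()) if skills_dir.is_dir() else []

for skill in skill_folders:
    skill_md = skill / "SKILL.md"
    if skill_md.exists():
        lines = count_lines(skill_md)
        status = "OK" if lines <= MAX_LINES else "REFACTOR"
        print(f"{skill.name:20} {lines:4} lines  ->  {status}")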

If you run this weekly, you’ll catch “documentation creep” before it becomes a crisis.
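
If you want the weekly check to cover reference files too, a variant that returns offenders instead of printing them is easier to wire into CI. A sketch under the line targets used in this article (200 for SKILL.md, 300 for references); `oversized_files` is an illustrative name and the limits are parameters you should tune.

```python
from pathlib import Path
from typing import List, Tuple


def oversized_files(
    skills_dir: Path, entry_limit: int = 200, ref_limit: int = 300
) -> List[Tuple[str, int]]:
    """Return (relative_path, line_count) for every markdown file over its limit."""
    offenders = []
    for md in sorted(skills_dir.rglob("*.md")):
        # SKILL.md gets the stricter entry-point limit; everything else is a reference.
        limit = entry_limit if md.name == "SKILL.md" else ref_limit
        with md.open("r", encoding="utf-8", errors="ignore") as handle:
            lines = sum(1 for _ in handle)
        if lines > limit:
            offenders.append((md.relative_to(skills_dir).as_posix(), lines))
    return offenders
```

In a CI job, fail the build when `oversized_files(Path(".claude/skills"))` returns a non-empty list.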


7) Common Failure Modes (And How to Avoid Them)

Failure mode: Claude writes “a doc” instead of “a Skill”

LLMs love expanding markdown into tutorials.

Fix:

  • explicitly tell it: this is not documentation
  • remove “beginner” filler
  • keep examples short, push detail into references

Failure mode: Entry point bloats because the Skill scope is too wide

Fix:

  • split the Skill by workflow stage
  • or move decision trees into references

Failure mode: Too many references, still hard to navigate

Fix:

  • put a short “map” section in SKILL.md
  • keep reference files single-topic and named by intent, not by tool

8) A Copyable Refactor Checklist

1) Audit: list Skills + line counts, find any SKILL.md > 200 lines

2) Group by workflow: merge tool-specific Skills into capability Skills

3) Create references: move detailed info out of SKILL.md

4) Enforce entry constraints: keep SKILL.md lean and navigational

5) Cold start test: ensure first activation stays under your chosen budget

6) Keep scripts deterministic: offload “do the thing” to code where possible

7) Re-check monthly: Skills drift over time; treat them like code


Final Take: Context Engineering Is “Right Info, Right Time”

The big lesson isn’t “200 lines” or “three layers.”

It’s this:

Context is a budget.

And the best Skill design spends it like an engineer, not like a librarian.

Don’t load everything. Load what matters — when it matters — and keep the rest one file away.
