DEV Community


AGENTS.md, SKILL.md, DESIGN.md: How AI Instructions Split into Three Layers

In April 2026, Google Labs released a spec called DESIGN.md. It's a design system specification readable by AI agents, packaged with a CLI validator: npx @google/design.md lint.

With DESIGN.md in the picture, we now have three different file types for instructing AI agents. AGENTS.md has been spreading as an industry standard since 2025 (jointly developed by OpenAI, Google, Sourcegraph, Cursor, and Factory; donated to the Linux Foundation in December 2025). SKILL.md sits at the core of Anthropic's Claude Skills. And now DESIGN.md. The three handle different concerns and don't overlap.

This article is for developers who use coding agents like Claude Code, Cursor, or Codex at work, and for tech leads who maintain natural-language instruction files like CLAUDE.md and style guides. If your team practices Spec-Driven Development (SDD), it's for you as well.

I want to lay out two things: how AI instructions are starting to split across three layers (behavior, individual tasks, and visual appearance), and how that split connects with SDD, a movement running in parallel.

The Old Pattern: Natural-Language Documents

A few years into the ChatGPT era, most engineers have written some form of "rules I want the AI to follow" in a Markdown file. CLAUDE.md, styleguide.md, CONTRIBUTING.md, internal coding conventions. The locations vary, but the format is roughly the same: unstructured natural language.

The writing-style-guide.md I've been building over the past few months is a typical example. It's a style guide I use when writing technical articles with Claude: a list of patterns common in AI-generated text, written down as forbidden phrases. Because Claude Desktop reads it every session, the tone of my output stays consistent. It lives in a personal repository (ikenyal-ai-agents) that serves as the harness for my business automation agents, which I covered in my previous post.

https://dev.to/aws-builders/harness-engineering-with-nothing-but-markdown-g6b

The file contains roughly 150 lines: rules like "don't use em dashes," "avoid invitations like 'let's try…!'," "drop AI-style preambles like 'what's interesting is…'." The same repository has 15 instruction files under agents/, organized by team and role: executive-assistant.md, sre-support.md, qa-support.md, accounting.md. Each describes "the assumptions to operate under as this role" in plain natural language.

This approach has clear benefits. You can articulate tone, stance, and implicit rules. New team members can read the files and pick up the expectations. Claude Code loads CLAUDE.md at the start of every session, so persona-level instructions are applied consistently.

There are limits, too. First, validation falls on humans. Whether a rule was followed or not gets decided by a human reading the output. Second, individual judgment leaks in. "Write politely" means different things to different reviewers.

The third limit is the actual subject of this article. Rules that are formally verifiable (forbidden phrases, em-dash usage, specific pattern matches) and rules that require judgment (tone, structural choices, how to open with empathy) sit in the same file. So even the verifiable parts end up depending on human review. That's the problem the three new file types are addressing.
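To make that concrete: the verifiable rules in a style guide like mine can be checked by a script instead of a reviewer. Below is a minimal sketch, assuming three hypothetical regex rules modeled on the examples above (no em dashes, no "let's try" invitations, no "what's interesting is" preambles); the real file's ~150 rules and their exact wording are not reproduced here.

```python
import re
from dataclasses import dataclass

# Hypothetical rules modeled on the style-guide examples; a real guide holds many more.
RULES = {
    "em-dash": re.compile("\u2014"),
    "invitation": re.compile(r"\blet['\u2019]s try\b", re.IGNORECASE),
    "ai-preamble": re.compile(r"\bwhat['\u2019]s interesting is\b", re.IGNORECASE),
}


@dataclass
class Violation:
    rule: str
    line: int
    excerpt: str


def check(text: str) -> list[Violation]:
    """Return every forbidden-pattern match with its 1-based line number."""
    found = []
    for lineno, line in enumerate(text.splitlines(), start=1):
        for name, pattern in RULES.items():
            for m in pattern.finditer(line):
                found.append(Violation(name, lineno, m.group(0)))
    return found


if __name__ == "__main__":
    draft = "What's interesting is the CLI.\nLet's try it \u2014 now!"
    for v in check(draft):
        print(f"line {v.line}: {v.rule} ({v.excerpt!r})")
```

Judgment-based rules such as tone have no pattern to match, which is exactly why they stay with the human reviewer.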

New Type 1: How DESIGN.md (Google Labs) Specifies Visual Appearance

On April 10, 2026, Google Labs published the DESIGN.md specification at google-labs-code/design.md. As of early May, the repo has over 11,000 stars. It's the reference implementation for Google Stitch (stitch.withgoogle.com), an AI-driven UI generation product.

https://github.com/google-labs-code/design.md

The specification doc lives on the Stitch side.

https://stitch.withgoogle.com/docs/design-md/specification

DESIGN.md covers the design system specification. Machine-readable design tokens (colors, typography, spacing, components) go in YAML at the top of the file, and human-readable design intent goes in the Markdown body underneath. Both live in the same file.

---
name: Heritage
colors:
  primary: "#1A1C1E"
  tertiary: "#B8422E"
typography:
  h1:
    fontFamily: Public Sans
    fontSize: 3rem
---

## Overview

Architectural Minimalism meets Journalistic Gravitas.

## Colors

- Primary (#1A1C1E): Deep ink for headlines and core text.
- Tertiary (#B8422E): "Boston Clay", the sole driver for interaction.
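Because both halves share one file, any tool reading DESIGN.md first has to separate them. A minimal sketch of that split, assuming the common convention that the YAML block sits between two --- lines at the very top of the file; the official CLI's own parser is not shown or implied. Turning the YAML half into data would then need a YAML library such as PyYAML, left out here to keep the sketch stdlib-only.

```python
def split_front_matter(text: str) -> tuple[str, str]:
    """Split a Markdown file into (front_matter, body).

    Assumes the file opens with a line containing only '---' and the YAML
    block ends at the next such line. Returns ("", text) when there is no
    front matter at all.
    """
    lines = text.splitlines(keepends=True)
    if not lines or lines[0].strip() != "---":
        return "", text
    for i in range(1, len(lines)):
        if lines[i].strip() == "---":
            front = "".join(lines[1:i])                   # machine-readable tokens
            body = "".join(lines[i + 1:]).lstrip("\n")    # human-readable intent
            return front, body
    raise ValueError("front matter opened with '---' but never closed")


if __name__ == "__main__":
    doc = '---\nname: Heritage\ncolors:\n  primary: "#1A1C1E"\n---\n\n## Overview\n\nArchitectural Minimalism.\n'
    front, body = split_front_matter(doc)
    print(front)
    print(body)
```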

The headline feature of this format is the CLI validator that ships with it.

npx @google/design.md lint DESIGN.md

This checks token reference integrity, WCAG contrast ratios, and structural rule compliance, returning the result as JSON. Wire it into CI and you can verify design system consistency on every pull request. There's also a diff command that compares two DESIGN.md files and returns token-level changes in a structured form. Design system version control — historically a manual process — gains a verifiable layer.
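The contrast check is the easiest part of the lint to reason about, because WCAG defines it as a formula: linearize each sRGB channel, weight the channels into a relative luminance L, and compute (L_lighter + 0.05) / (L_darker + 0.05). Below is a sketch of that formula applied to the two Heritage colors, assuming white as the background; which color pairs the real linter tests, and how it reports them, is not something this sketch claims.

```python
def _channel(c8: int) -> float:
    """Linearize one 8-bit sRGB channel (WCAG relative-luminance formula).

    WCAG 2.x text gives 0.03928 as the cutoff and sRGB gives 0.04045; no
    8-bit value falls between them, so the choice doesn't change results.
    """
    c = c8 / 255
    return c / 12.92 if c <= 0.04045 else ((c + 0.055) / 1.055) ** 2.4


def relative_luminance(hex_color: str) -> float:
    h = hex_color.lstrip("#")
    if len(h) != 6:
        raise ValueError(f"expected #RRGGBB, got {hex_color!r}")
    r, g, b = (int(h[i:i + 2], 16) for i in (0, 2, 4))
    return 0.2126 * _channel(r) + 0.7152 * _channel(g) + 0.0722 * _channel(b)


def contrast_ratio(fg: str, bg: str) -> float:
    """Order-independent contrast ratio between two colors, from 1 to 21."""
    lighter, darker = sorted((relative_luminance(fg), relative_luminance(bg)), reverse=True)
    return (lighter + 0.05) / (darker + 0.05)


AA_NORMAL_TEXT = 4.5  # WCAG level AA minimum for normal-size text

if __name__ == "__main__":
    for name, color in {"primary": "#1A1C1E", "tertiary": "#B8422E"}.items():
        ratio = contrast_ratio(color, "#FFFFFF")
        status = "pass" if ratio >= AA_NORMAL_TEXT else "fail"
        print(f"{name} on white: {ratio:.2f}:1 ({status})")
```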

For Japanese UIs, the Google Labs spec alone falls short. It doesn't define the typography requirements specific to Japanese (CJK font fallback chains, line height, letter-spacing, kinsoku shori, mixed typesetting). The gap is filled by kzhrknt/awesome-design-md-jp, which publishes Japan-localized DESIGN.md files for over 10 services including Apple Japan, SmartHR, freee, note, MUJI, Mercari, LINE, and Toyota. For Japanese products, using both the Google Labs spec and the Japan edition together is the practical approach.

https://github.com/kzhrknt/awesome-design-md-jp

DESIGN.md takes the design system that used to be scattered across Figma files and style-guide PDFs and consolidates it into a single file with machine-readable and human-readable parts. Think of it as the spec foundation that lets AI agents generate UIs with a consistent look every time.

New Type 2: How SKILL.md (Anthropic) and AGENTS.md Specify Behavior

While DESIGN.md covers "appearance," SKILL.md and AGENTS.md cover "behavior" — defining what the agent is trying to do, how it should proceed, and what it must not do.

SKILL.md is the file format standardized by agentskills.io as part of the Agent Skills open standard. Anthropic's Claude Skills is one implementation of this standard; the same SKILL.md works across Claude Code, Claude.ai, and the Agent SDK. Because it's standards-compliant, the same file is also readable by other agents like OpenClaw and Hermes. The structure: declare metadata (skill name, description, allowed tools) in the YAML at the top of the file, and write the task procedure or domain knowledge in the Markdown body below.

https://agentskills.io/home
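The metadata half of SKILL.md is small enough to check mechanically. The Agent Skills spec requires a name and a description in the front matter; the sketch below checks only that those two keys exist and are non-empty. The parser is deliberately simplified (flat key: value lines only, no nested YAML), and the monthly-aggregation skill in the test is a hypothetical example.

```python
import re

REQUIRED_KEYS = ("name", "description")

# YAML block between two '---' lines at the very top of the file.
FRONT_MATTER = re.compile(r"---\r?\n(.*?)\r?\n---[ \t]*(?:\r?\n|$)", re.DOTALL)


def read_skill_metadata(text: str) -> dict[str, str]:
    """Extract top-level 'key: value' pairs from SKILL.md front matter.

    Simplification: nested or multi-line YAML values are skipped; a real
    loader should use a YAML parser.
    """
    m = FRONT_MATTER.match(text)
    if m is None:
        raise ValueError("SKILL.md must start with a YAML front matter block")
    meta = {}
    for line in m.group(1).splitlines():
        if line and not line[0].isspace() and ":" in line:
            key, _, value = line.partition(":")
            meta[key.strip()] = value.strip().strip('"')
    return meta


def missing_keys(meta: dict[str, str]) -> list[str]:
    """Required keys that are absent or empty."""
    return [k for k in REQUIRED_KEYS if not meta.get(k)]
```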

A clear example of SKILL.md is conorbronsdon/avoid-ai-writing, a skill that detects and rewrites AI patterns in English text: transition phrases like "Moreover," significance inflation like "watershed moment," and roundabout verb constructions like "serves as." It uses a replacement table of 100+ words organized into 3 tiers (Tier 1 always replaces, Tier 2 flags when 2+ Tier 2 words appear in the same paragraph, Tier 3 flags only at high density), and audits 36 pattern categories. It has two modes: detect and rewrite.

https://github.com/conorbronsdon/avoid-ai-writing

What sets it apart from a one-shot prompt is the structured audit it returns. In rewrite mode, you get four discrete sections: identified issues, the rewritten text, a summary of changes, and a second-pass audit. What changed and why becomes transparent.
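The skill itself is a set of instructions the model follows; the code below is not its implementation. It is a deterministic Python sketch of how the three tiers and the four-section rewrite report fit together. The word lists, the Tier 3 thresholds, and the Report field names are hypothetical stand-ins, not the skill's actual table.

```python
import re
from dataclasses import dataclass, field

# Hypothetical stand-ins for the skill's 100+ word table.
TIER1 = {"leverage": "use", "utilize": "use"}        # always replaced
TIER2 = {"moreover", "furthermore", "additionally"}  # flagged when 2+ share a paragraph
TIER3 = {"robust", "seamless"}                       # flagged only at high density
TIER3_MIN_HITS = 2        # hypothetical threshold
TIER3_DENSITY = 0.03      # hypothetical: at least 3 hits per 100 words

WORD = re.compile(r"[A-Za-z']+")


@dataclass
class Report:
    """The four sections of a rewrite-mode audit."""
    issues: list[str] = field(default_factory=list)
    rewritten: str = ""
    changes: list[str] = field(default_factory=list)
    second_pass: list[str] = field(default_factory=list)


def detect(text: str) -> list[str]:
    """Detect mode: flag tier hits paragraph by paragraph."""
    issues = []
    paragraphs = [p for p in text.split("\n\n") if p.strip()]
    for n, para in enumerate(paragraphs, start=1):
        words = [w.lower() for w in WORD.findall(para)]
        issues += [f"p{n}: tier 1 '{w}'" for w in words if w in TIER1]
        t2 = [w for w in words if w in TIER2]
        if len(t2) >= 2:
            issues.append(f"p{n}: tier 2 cluster {t2}")
        t3 = [w for w in words if w in TIER3]
        if len(t3) >= TIER3_MIN_HITS and len(t3) / len(words) >= TIER3_DENSITY:
            issues.append(f"p{n}: tier 3 density {len(t3)}/{len(words)}")
    return issues


def rewrite(text: str) -> Report:
    """Rewrite mode: auto-replace Tier 1 only, then audit the result again."""
    report = Report(issues=detect(text))
    pattern = re.compile(r"\b(" + "|".join(map(re.escape, TIER1)) + r")\b", re.IGNORECASE)

    def swap(m: re.Match) -> str:
        word = m.group(0)
        repl = TIER1[word.lower()]
        report.changes.append(f"{word} -> {repl}")
        return repl.capitalize() if word[0].isupper() else repl

    report.rewritten = pattern.sub(swap, text)
    report.second_pass = detect(report.rewritten)
    return report
```

Only Tier 1 words are rewritten automatically in this sketch. Tier 2 and Tier 3 hits stay as flags for a human, which is why they reappear in the second-pass audit.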

AGENTS.md covers the agent's overall behavior: project assumptions, roles, prohibitions, escalation rules. It started with the Amp team at Sourcegraph; as I mentioned at the top, OpenAI, Google, Cursor, and Factory now drive it jointly, and it was donated to the Linux Foundation in December 2025. Think of CLAUDE.md as the Claude-specific version of AGENTS.md. Claude Code reads CLAUDE.md, not AGENTS.md, so the pattern recommended by agents.md is to keep AGENTS.md as the actual file and make CLAUDE.md a symlink to it. In the personal repository I introduced earlier, the files under agents/ belong to this layer.
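The symlink pattern takes a single call. Below is a Python sketch, assuming it runs against the repository root and that any existing CLAUDE.md content has already been merged into AGENTS.md; link_claude_md is a hypothetical helper name. From a shell, ln -s AGENTS.md CLAUDE.md does the same thing. On Windows, creating symlinks may require Developer Mode or administrator rights.

```python
import os
from pathlib import Path


def link_claude_md(repo: Path) -> Path:
    """Make CLAUDE.md a relative symlink to AGENTS.md inside `repo`.

    Refuses to overwrite a regular CLAUDE.md so no instructions are lost;
    merge its content into AGENTS.md first.
    """
    agents = repo / "AGENTS.md"
    claude = repo / "CLAUDE.md"
    if not agents.is_file():
        raise FileNotFoundError(agents)
    if claude.is_symlink():
        claude.unlink()  # replace an existing (possibly stale) link
    elif claude.exists():
        raise FileExistsError(f"{claude} is a regular file; merge it into AGENTS.md first")
    # Relative target, so the link still works after the repo is cloned elsewhere.
    os.symlink("AGENTS.md", claude)
    return claude
```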

SKILL.md and AGENTS.md cover different ranges. AGENTS.md handles "overall context and boundaries." SKILL.md handles "an executable unit for a specific task."

The avoid-ai-writing English style auditor I mentioned is a specific task, so it ships as SKILL.md. A file like agents/genda/qa-support.md, which describes the assumptions and engagement style of a QA role, defines the agent's boundary — that goes on the AGENTS.md side.

The shared concern of these formats is "behavior and procedure," not visual appearance: what the agent knows, what it's tasked with, what it must avoid. The movement is about pinning these down in a verifiable form.

The Three-Layer Split

Lining up the three file types, the layers each one handles become clear.

| Layer | Format | What it carries | Examples |
|---|---|---|---|
| Behavior | AGENTS.md / CLAUDE.md (natural language + rules) | Overall context, roles, prohibitions | CLAUDE.md, role-specific files like agents/genda/qa-support.md |
| Individual task | SKILL.md (YAML at top + Markdown body) | Reusable tasks, procedures, domain knowledge | avoid-ai-writing, in-house procedure skills |
| Appearance | DESIGN.md (YAML at top + Markdown body) | Design system spec, verifiable visual rules | The Google Labs reference, individual service files in kzhrknt/awesome-design-md-jp |

The three are complementary, not competing. CLIs like bergside/typeui are emerging as tools that can generate or update either SKILL.md or DESIGN.md, depending on what you choose — a sign of tooling that assumes the division of labor.

https://github.com/bergside/typeui

What actually differs across the layers is where the balance between machine-readable and human-readable sits. AGENTS.md skews almost entirely human-readable; over-structuring it would block the contextual judgment and nuance it needs to convey. SKILL.md is partly structured by the YAML at the top, but the body stays human-readable, because a task has to make sense to a human before an agent can be instructed to carry it out. DESIGN.md puts machine-readable design tokens in the top YAML and human-readable design intent in the body, with the two cleanly separated.

Placing that balance differently per layer is the standard structuring principle, "manage things at different layers in different files," applied to AI agents. The file names themselves spell out the division: AGENTS.md ("instructions to the agent"), SKILL.md ("a reusable skill"), DESIGN.md ("the design system").

Teams that have been packing all their "AI rules" into a single CLAUDE.md now have to decide how to split it. Open your CLAUDE.md, run these questions against it, and the split points start to surface:

  • Is there a section writing design system rules? → If yes, that goes to DESIGN.md
  • Are specific task procedures in there (monthly aggregation, test review, contract review)? → If yes, those go to SKILL.md
  • What's left is overall agent context and boundaries (roles, prohibitions, escalation criteria) → that's the AGENTS.md equivalent that stays

The three-layer split works as a framework for splitting your file.
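The three questions can be turned into a rough first-pass sorter. Below is a sketch that assumes your CLAUDE.md uses ## headings for its sections. The keyword lists are hypothetical heuristics, and every suggestion still needs a human decision.

```python
import re

# Hypothetical keyword heuristics for the three questions; tune them to your own file.
DESIGN_HINTS = re.compile(
    r"\b(?:colou?rs?|fonts?|typography|spacing|contrast)\b|#[0-9a-f]{6}\b", re.IGNORECASE
)
TASK_HINTS = re.compile(
    r"\b(?:procedures?|steps?|monthly|aggregation|checklist|review)\b", re.IGNORECASE
)


def split_sections(markdown: str) -> dict[str, str]:
    """Map each '## ' heading to the text under it; text before the first one is ignored."""
    sections: dict[str, str] = {}
    current = None
    for line in markdown.splitlines():
        if line.startswith("## "):
            current = line[3:].strip()
            sections[current] = ""
        elif current is not None:
            sections[current] += line + "\n"
    return sections


def suggest_layer(heading: str, body: str) -> str:
    """Apply the three questions in order and return the suggested file."""
    text = f"{heading}\n{body}"
    if DESIGN_HINTS.search(text):
        return "DESIGN.md"   # design system rules
    if TASK_HINTS.search(text):
        return "SKILL.md"    # a specific task procedure
    return "AGENTS.md"       # what's left: context, roles, prohibitions, escalation
```

The checks run in the same order as the questions, so a section that mentions both colors and steps lands in DESIGN.md. Reorder them if your file often mixes the two.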

Connecting with SDD

Stepping back to look at the bigger picture: how does the three-layer split relate to the broader movement of "specs for AI"?

SDD is a development style where you write the spec (requirements, design, tasks) before generating the implementation. The underlying idea: "specs aren't disposable scaffolding, they're executable artifacts that produce code." AWS's Kiro provides a workflow that generates requirements.md, design.md, and tasks.md in order under .kiro/specs/{feature}/. GitHub's Spec Kit (over 90,000 stars) supports the same flow with slash commands like /specify, /plan, /tasks, and /implement. The EARS notation (Easy Approach to Requirements Syntax) used by Kiro reduces ambiguity by fitting requirements into 5 fixed templates. SDD spread quickly through 2025 and into 2026.

https://kiro.dev/

https://github.com/github/spec-kit
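The five EARS templates are regular enough to recognize by their leading keyword: Ubiquitous (The <system> shall ...), Event-driven (When ...), State-driven (While ...), Unwanted behavior (If ..., then ...), and Optional feature (Where ...). Below is a minimal classifier sketch, assuming one requirement per string. It accepts both the comma style of classic EARS and an uppercase WHEN ... THEN ... SHALL variant, and it does not handle EARS's complex (combined-keyword) requirements beyond matching the first keyword.

```python
import re
from typing import Optional

# One pattern per EARS template, keyed on its leading keyword.
# Order matters: the ubiquitous form ("The ... shall") is the fallback.
EARS_TEMPLATES = [
    ("event-driven", re.compile(r"^when\b.+\bshall\b", re.IGNORECASE)),
    ("state-driven", re.compile(r"^while\b.+\bshall\b", re.IGNORECASE)),
    ("unwanted-behavior", re.compile(r"^if\b.+\bthen\b.+\bshall\b", re.IGNORECASE)),
    ("optional-feature", re.compile(r"^where\b.+\bshall\b", re.IGNORECASE)),
    ("ubiquitous", re.compile(r"^the\b.+\bshall\b", re.IGNORECASE)),
]


def classify(requirement: str) -> Optional[str]:
    """Return the EARS template a requirement follows, or None if it fits none."""
    text = requirement.strip()
    for name, pattern in EARS_TEMPLATES:
        if pattern.search(text):
            return name
    return None


if __name__ == "__main__":
    for req in [
        "When the user submits the form, the system shall validate every field.",
        "WHEN a user submits the form THEN the system SHALL validate every field",
        "Make checkout faster.",
    ]:
        print(classify(req), "|", req)
```

A requirement that comes back as None is the kind of ambiguous sentence EARS is meant to catch before it reaches design.md.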

The three-layer split (AGENTS.md / SKILL.md / DESIGN.md) and SDD look like separate movements on the surface. The SDD community concentrates on Kiro and spec-kit usage; the DESIGN.md side concentrates on formal specs and validation tooling. You don't see many articles bridging the two.

But put their philosophies side by side and the overlap is striking.

| # | Shared philosophy | SDD (Kiro etc.) | DESIGN.md / SKILL.md / AGENTS.md |
|---|---|---|---|
| 1 | Specify before implementing | requirements → design → tasks → implementation | behavior → implementation, appearance → implementation |
| 2 | Mix machine-readable + human-readable | requirements.md (EARS notation) + natural language | YAML at top + Markdown body |
| 3 | Persistent context for the AI | reference .kiro/specs/{feature}/ every time | reference DESIGN.md / AGENTS.md every time |
| 4 | Reduce ambiguity through structured syntax | EARS notation structures requirements (5 templates) | lint validates WCAG contrast ratios and structural rules |
| 5 | Give decisions a fixed home | spec files are where decisions live | spec files are where decisions live |

Both sit inside the larger "specs for AI" movement and share the same underlying philosophy.

That said, they're not the same thing. The biggest difference, in one phrase: time horizon.

| # | Axis | SDD | DESIGN.md / SKILL.md / AGENTS.md |
|---|---|---|---|
| 1 | Time horizon | Describes "what to build next" | Describes "rules that already exist" |
| 2 | Scope | Single feature / project lifecycle | Persistent rules and styles |
| 3 | Update rhythm | New per feature → consume → archive | Long-term maintenance, gradual growth |
| 4 | Subject | Requirements, design, tasks (procedure for action) | Rules for behavior, individual tasks, appearance |

SDD specs describe "what we're going to build." requirements.md is "what this feature needs to satisfy"; design.md is "how to implement this feature"; tasks.md is "how to break the feature into work." Once the feature ships, they finish their job and get archived.

The three-layer specs describe "what should always hold." DESIGN.md provides the color and typography rules every time you generate a UI; AGENTS.md provides the agent's assumptions across every session. They get maintained long-term and grow incrementally.

This time-horizon difference is why the two don't compete. Transient specs and persistent specs coexist in the same project. They can also reference each other. Imagine writing "use {colors.tertiary} for the button" inside .kiro/specs/checkout-feature/design.md — that lets a transient feature spec reference a color token from a persistent DESIGN.md. The pattern isn't widely established yet, but the structure fits cleanly.
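That cross-reference can be sketched as a resolver that replaces {group.token} references in a feature spec with values from the DESIGN.md tokens and reports any that don't exist. Assumptions: the tokens are already loaded into a dict (shown inline, mirroring the Heritage front matter), and the brace syntax follows the example in this paragraph. Neither Kiro nor the DESIGN.md CLI is claimed to resolve references this way. Requiring at least one dot keeps path placeholders like .kiro/specs/{feature}/ from being treated as references.

```python
import re
from typing import Any, Optional

# Hypothetical token tree mirroring the Heritage front matter, as a YAML loader would return it.
TOKENS: dict[str, Any] = {
    "colors": {"primary": "#1A1C1E", "tertiary": "#B8422E"},
    "typography": {"h1": {"fontFamily": "Public Sans", "fontSize": "3rem"}},
}

# {group.token} with at least one dot, so placeholders like {feature} are ignored.
REF = re.compile(r"\{([A-Za-z0-9_-]+(?:\.[A-Za-z0-9_-]+)+)\}")


def lookup(tokens: dict[str, Any], path: str) -> Optional[str]:
    """Walk a dotted path; return a leaf value as a string, or None if it doesn't resolve."""
    node: Any = tokens
    for part in path.split("."):
        if not isinstance(node, dict) or part not in node:
            return None
        node = node[part]
    return None if isinstance(node, dict) else str(node)


def resolve_refs(spec_text: str, tokens: dict[str, Any]) -> tuple[str, list[str]]:
    """Substitute resolvable references; collect the ones that don't exist."""
    missing: list[str] = []

    def sub(m: re.Match) -> str:
        value = lookup(tokens, m.group(1))
        if value is None:
            missing.append(m.group(1))
            return m.group(0)  # keep the broken reference visible
        return value

    return REF.sub(sub, spec_text), missing
```

Run in CI, a non-empty missing list would flag a feature spec that points at a token the persistent DESIGN.md no longer defines.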

One thing worth noting: as of May 2026, the active areas of SDD (the Kiro community and similar) and the active areas of DESIGN.md / SKILL.md / AGENTS.md haven't really crossed paths. The SDD side concentrates on "how to build a feature"; the three-layer side concentrates on "how to deliver the rules."

You don't have to be doing SDD to start with the three-layer split; the split alone gets you in the door of "specs for AI." If your team already practices SDD, reference DESIGN.md tokens from inside your feature specs so you don't maintain the same rules in two places. The two movements look likely to converge in the next phase.

Not Everything Becomes a Spec

The discussion of the three-layer split tends to drift toward "shouldn't we just spec everything," but in practice, that doesn't happen.

Rules that can't be formally verified stay as natural-language documents: tone, structural choices, cultural nuance. Things like "how to open an article with empathy" or "how to give an ending the right amount of resonance" are judgment-based qualities. The cost of speccing them isn't the issue; their essence gets lost when you try.

The judgment is straightforward: "is this formally verifiable?"

  • Color contrast ratios (verifiable) → DESIGN.md
  • Word substitutions like "leverage → use" (verifiable) → SKILL.md
  • Tone (soft assertions, not textbook-sounding), overall stance (not teaching, just organizing) and similar (not verifiable) → stays in AGENTS.md / CLAUDE.md

For small teams, "one natural-language file" is often enough. If CLAUDE.md alone keeps things running, there's no need to force a split. The trade-off between the up-front cost of writing specs and the ongoing load of maintaining them depends on team size and how long the operation has to last.

The three-layer split is something you adopt incrementally, just like SDD — you don't need to spec everything at once. Start with the complex areas, the areas where verification helps most.

In other words, the three-layer split isn't a goal. It's an option you adopt when the situation calls for it.

Where to Start

A few options come into view from this overview.

A reasonable first move is to open your CLAUDE.md or style guide and sort its contents into "formally verifiable" and "judgment-based" sections. Color and typography rules, word substitution lists, and structural rules usually land on the verifiable side. If a useful amount of verifiable content is there, pick one piece to break out into either DESIGN.md (appearance) or SKILL.md (task). Don't try to split everything at once; start with the most independent piece.

Pulling in external skills is another route. Put a ready-made skill like avoid-ai-writing in its own folder under ~/.claude/skills/ (so the file sits at ~/.claude/skills/avoid-ai-writing/SKILL.md). Your stance as a writer doesn't change; only the verification gets handed off to the machine.
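The copy step can be scripted. Below is a sketch that assumes Claude Code's one-folder-per-skill layout under ~/.claude/skills/, and that you've already cloned or downloaded the skill into a folder that directly contains SKILL.md. Check the repository's README for its actual layout. install_skill is a hypothetical helper.

```python
import shutil
from pathlib import Path


def install_skill(src_dir: Path, skills_root: Path = Path.home() / ".claude" / "skills") -> Path:
    """Copy a skill folder (one that contains SKILL.md) to <skills_root>/<folder name>/."""
    if not (src_dir / "SKILL.md").is_file():
        raise FileNotFoundError(f"{src_dir} has no SKILL.md")
    dest = skills_root / src_dir.name
    if dest.exists():
        raise FileExistsError(f"{dest} already exists; remove it or pick another name")
    skills_root.mkdir(parents=True, exist_ok=True)
    # Skip git metadata if the folder is a fresh clone.
    shutil.copytree(src_dir, dest, ignore=shutil.ignore_patterns(".git"))
    return dest
```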

Teams already running Kiro or spec-kit are probably at the stage where they could try referencing DESIGN.md tokens from inside .kiro/specs/{feature}/design.md. Public examples of cross-references between feature specs and persistent specs are still scarce.

The shared stance: don't try to spec everything at once. Document split → operational trial → speccing — staged migration is the realistic path. The three-layer split isn't a finished form. It's a movement still in progress, and that's the safer way to read it.

AI rules have started splitting from a single natural-language document into three spec formats. That's another side of the same movement as SDD.

Not everything becomes a spec, but managing different roles in different files — that ordinary structuring is starting to apply to AI agents, too.
