How files like AGENTS.md, CLAUDE.md, MEMORY.md, SKILLS.md, slash commands, hooks, MCP servers, and settings.json plug into the agent architecture.
1. The big idea
Configuration files are not a new architectural layer. They're a control plane, a parallel set of artifacts that parameterize the existing layers (interface, orchestrator, LLM, memory, tools, environment) without changing the agent's code.
The terminology is borrowed from networking and distributed systems: the data plane processes traffic at runtime, while the control plane is everything that configures what the data plane does (routes, policies, ACLs) without participating in the per-packet path. The split is standard in software-defined networking and Kubernetes. (§7 explains the analogy in more detail.)
A useful teaching framing (not a standard literature term, just a way these notes organize the same material) is that the agent has three knobs you can turn from outside:
- What it knows: the memory layer (facts, history, docs).
- How it acts: the reasoning layer (instructions, conventions, style).
- What it can do: the tool and orchestrator layers (tools, permissions, hooks, runtime limits).
Every config file fits one of these knobs.
2. The mapping: from file to layer
Reasoning layer (becomes part of the system prompt)
| File | Purpose |
|---|---|
AGENTS.md / CLAUDE.md / .cursorrules
|
Project-level conventions, dos/don'ts, output style. Loaded once at session start and merged into the system prompt. |
.claude/agents/*.md |
Subagent definitions: system prompt, tool allow-list, model. Loaded only when that subagent is spawned. See Claude Code subagent docs. |
commands/*.md (slash commands) |
User-message templates. Expand into the user turn when typed. See Claude Code slash commands. |
skills/* (body) |
Capability instructions. Lazily injected when the skill is invoked. See Claude Code skills. |
These shape how the LLM thinks before it sees a single user message.
AGENTS.mdis a small, opinionated, cross-vendor spec (agents.md) originated by OpenAI Codex, Amp, Jules (Google), Cursor, and Factory, now stewarded by the Agentic AI Foundation under the Linux Foundation.CLAUDE.mdis the Claude-Code-specific equivalent, predatesAGENTS.md, and is read by Claude Code in addition to (not instead of) anyAGENTS.mdit finds.
Memory layer (persistent context the LLM reads)
| File | Purpose |
|---|---|
MEMORY.md |
Always-loaded index of persistent memories. |
memory/*.md |
Individual memory files. Loaded on reference or retrieval. |
docs/, READMEs, knowledge bases |
Semantic memory available for RAG. |
| User profile / preference files | Long-term user-level state. |
Distinction vs system-prompt files: memory is factual (what's true, what happened);
AGENTS.md-style files are behavioral (how to act).
Tool layer
| File | Purpose |
|---|---|
.mcp.json (MCP server registry) |
Declares external MCP tool servers; their tools become callable. |
| Permission allow / deny lists | Which tools run freely, which need approval. |
| Custom tool definitions | User-defined functions exposed to the LLM. |
skills/* (metadata) |
The frontmatter (name, description) appears in the tool catalog so the LLM knows skills exist. |
Orchestrator layer (the loop, not the LLM)
These are read by the harness (the orchestrator runtime). The LLM never sees them.
| File | Purpose |
|---|---|
settings.json |
Model selection, permission mode, max steps, env vars, caching, cost limits. See Claude Code settings. |
hooks/ |
Shell commands the harness runs at specific points: PreToolUse, PostToolUse, UserPromptSubmit, SessionStart, Stop, Notification. See Claude Code hooks. |
| Output style / status line / keybindings | Interface and runtime config. |
Hooks are special. They let you inject deterministic behavior into the loop without trusting the LLM to do it. The orchestrator triggers them on events; the LLM never sees or invokes them. This is the "harness-only" loading strategy in §5.
Environment layer
| File | Purpose |
|---|---|
| Sandbox config | Containers, jails, ephemeral worktrees. |
.env |
Secrets and environment variables for tool execution. |
| Working dir / worktree settings | Where tool calls actually run. |
| Container settings | Isolation, resource limits. |
3. Skills: the interesting case
Skills span the boundary between Reasoning and Tools. They're neither pure prompt nor pure tool, they're prompt expansions triggered on demand.
- The metadata (YAML frontmatter:
name,description,when_to_use) is always in the tool catalog, so the LLM knows the skill exists. - The body (the actual instructions) is only injected into context when the skill is invoked (via the
Skilltool or auto-matched).
This is the clever bit: lazy loading. You can have 50 skills available and only ~50 lines of metadata in context. Full bodies enter only when relevant. That's how capability scales without blowing the prompt budget.
Skills are now a cross-vendor open standard. Originally a Claude Code feature, the
SKILL.mdformat was released by Anthropic as the open Agent Skills spec (agentskills.io) and has been adopted by Cursor, GitHub Copilot, VS Code, Gemini CLI, OpenAI Codex, OpenHands, Goose, Letta, JetBrains Junie, Factory, Amp, Kiro, OpenCode, Spring AI, Mistral Vibe, Snowflake Cortex, Databricks Genie, Laravel Boost, and others. So while the rest of this section uses Claude Code's terminology, the primitive (lazy-loaded prompt expansions discoverable through tool-catalog metadata) is portable across the ecosystem in the same way that MCP tool servers are. The spec lives at agentskills.io/specification, the source at github.com/agentskills/agentskills, and Anthropic's reference example skills at github.com/anthropics/skills.
Slash commands work similarly but expand into the user message rather than being invoked as a tool call. The official descriptions live at Claude Code skills and Claude Code slash commands.
4. Loading model: when each artifact enters context
| Artifact | LLM sees it? | When loaded |
|---|---|---|
AGENTS.md / CLAUDE.md
|
Yes | Session start, always |
MEMORY.md (index) |
Yes | Session start, always |
| Individual memory files | Yes | On reference / retrieval |
| Skill metadata (frontmatter) | Yes | Session start, always |
| Skill body | Yes | Only when invoked |
| Slash command body | Yes | Only when typed |
| Subagent prompt | Yes (in subagent's own context) | When spawned |
| MCP tool schemas | Yes | Session start, always |
| RAG retrieved chunks | Yes | On retrieval |
settings.json |
No | Process start, used by harness |
| Hooks | No | Executed by harness on events |
.mcp.json (server registry) |
No | Read by harness at startup |
keybindings.json, statusline |
No | Used by interface layer |
.env, sandbox config |
No | Used by environment layer |
The bottom half is the important distinction: the harness reads it, the LLM never does. Hooks especially, they're how you make things happen around the LLM without trusting the LLM to do them.
5. Three loading strategies
- Always loaded: sits in the system prompt or context every turn. Cost: context-window space. Use for: rules that must never be skipped.
- Lazy / on-demand: only enters context when invoked, retrieved, or referenced. Cost: a tool call or retrieval step. Use for: large bodies of capability that aren't always needed.
- Harness-only: never enters the LLM context at all. The runtime executes it deterministically. Cost: none on context. Use for: anything you don't want the LLM to be able to skip, override, or hallucinate around.
The art of agent configuration is choosing the right strategy for each piece. Critical safety policy → harness-only (hook). Skill the LLM picks when relevant → lazy. Project conventions → always loaded.
6. Mental model: three knobs
(Teaching framing, not standard terminology, see §1.)
| Knob | Layer it tunes | Files |
|---|---|---|
| What it knows | Memory |
MEMORY.md, memory files, docs, RAG sources |
| How it acts | Reasoning |
AGENTS.md, CLAUDE.md, subagent prompts, skills, slash commands |
| What it can do | Tools + Orchestrator | MCP servers, permissions, hooks, settings.json
|
Skills are special: they tune the how it acts knob lazily, so capability doesn't cost context until you need it.
7. Why "control plane"?
Borrowing the term from networking and systems: the data plane is the runtime that processes requests turn by turn (the architecture in agent-architecture.md). The control plane is everything that configures the data plane without participating in it directly: policies, capabilities, permissions, prompts, hooks. This is the same split used in software-defined networking, in Kubernetes (kube-apiserver + controllers vs kubelet + CNI), and in service meshes.
A well-designed agent has a clean separation:
- The data plane is generic: same loop, same LLM, same orchestrator regardless of project.
- The control plane is project-specific:
AGENTS.md, the MCP servers you've enabled, the hooks you've installed, the skills you've added.
This is why a single tool like Claude Code can become wildly different agents in different projects — same data plane, different control planes.
8. Practical implications
-
Don't put behavior in code that belongs in the control plane. If you find yourself forking the harness to change how the agent behaves in your project, you probably want a hook, an
AGENTS.mdrule, or a skill instead. - Don't trust the LLM to follow rules that matter. Critical policies belong in hooks (harness-enforced), not in the system prompt (LLM-suggested).
-
Use lazy loading aggressively. Putting everything in
AGENTS.mdblows your context. Move large or situational content to skills or memory. - Treat the control plane as code. Version it, review it, test it. Hooks especially can have outsized effects.
9. TL;DR
The runtime architecture is how an agent works. The control plane is how you tell it to work in your context. Each config file plugs into a specific layer, with one of three loading strategies (always / lazy / harness-only). The art is choosing the right strategy for each piece.
References
Notes on framing
A few framings used above are the author's teaching devices, not standard literature terms; they're flagged here so the reader doesn't go looking for them in a textbook:
- "The harness": common in industry usage for "the orchestrator runtime that wraps the LLM" (Claude Code, Aider, OpenHands, Devin), not in academic papers.
- "Three knobs" (what it knows / how it acts / what it can do): a teaching device specific to these notes. The categories themselves correspond to standard architectural layers; only the triplet framing is original.
- "Control plane / data plane": borrowed from networking and Kubernetes; an analogy, not a deep claim about agent internals.
- "Always loaded / lazy / harness-only": descriptive labels; the underlying mechanisms are standard but the three-way split is a notes-internal taxonomy.


Top comments (0)