CLAUDE.md, .cursor/rules, Kiro Specs, Devin Playbooks. The past year has seen an explosion of instruction files for AI. While every new tool comes with a new name, doesn't what they are doing feel like deja vu?
Isn't this a requirements definition document? Isn't this onboarding material? Isn't this a runbook?
Anyone involved in software development for a long time likely feels the same way. At the same time, you might feel that some practices previously considered common sense are starting to become a hindrance in modern development.
This article explores the true nature of this deja vu and sense of misalignment.
To state the conclusion first: the design philosophy of AI coding tools is a reinvention of practices built over 20 years of human team development. In the process, some of software development's common sense is structurally flipping. There appears to be a simple law for categorizing what flips and what remains universal.
What are Instruction Files Reinventing?
First, let's organize the facts.
If we map the ways to provide instructions in various AI tools to what humans have done in team development for years, it looks like this:
| AI Tool Concept | Human Team Development Equivalent |
|---|---|
| Kiro Specs | PRD / Requirements Definition |
| CLAUDE.md | Onboarding Documentation |
| .cursor/rules | Coding Conventions (.eslintrc, etc.) |
| Devin Playbooks | Operating Procedures / Runbooks |
| Hooks (PreToolUse, etc.) | Git hooks / CI Pipelines |
| Skills | Internal Common Libraries |
| Plugins | Templates distributed via npm/pip |
Many of you have likely noticed this correspondence.
The important part is what comes next. In human team development, there was a long period of trial and error regarding the design of what to provide. Giving only coding conventions doesn't help a new member get moving. Giving only a PRD doesn't convey design intent. Giving only a procedure manual doesn't allow for exceptional judgments. The knowledge that a person becomes effective only when provided with layers of why this design was chosen, what must not be touched, and criteria for judgment when stuck has been accumulated over many years.
The instruction design for AI tools is exactly a reinvention of this knowledge.
Furthermore, the differences in design philosophy among tools can be translated into differences in what kind of organization provides what to a new member first.
Kiro is like a company that starts with specifications. Let's structure the requirements and clarify acceptance criteria before starting implementation. The design of having three stages—requirements.md, design.md, and tasks.md—aims for a middle ground between pre-definition and iteration. (Note: Kiro provides Requirements-First and Design-First workflows; here, we focus mainly on the Requirements-First flow).
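Concretely, a Kiro spec is a small directory of markdown files. The sketch below is illustrative only (the feature name and comments are invented, not taken from a real project):

```text
.kiro/specs/user-auth/
├── requirements.md   # user stories and acceptance criteria (EARS-style)
├── design.md         # architecture, data model, interfaces
└── tasks.md          # ordered implementation checklist derived from the design
```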
Claude Code is like a company that emphasizes verbalizing implicit knowledge. The guideline to write persistent context that cannot be inferred from code is spreading as a best practice for CLAUDE.md, which is the essence of onboarding materials. It is the idea of organizing technical stacks, reasons for past design decisions, and hidden pitfalls that cannot be read from the code itself.
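As a minimal sketch of what "context that cannot be inferred from code" looks like in practice, a CLAUDE.md might read as follows. Every project detail and rule below is invented for illustration:

```markdown
# CLAUDE.md (hypothetical excerpt)

## Stack
- Next.js 14 (App Router), TypeScript strict mode, pnpm workspaces

## Things you cannot infer from the code
- We use REST, not GraphQL, because our largest client cannot upgrade until 2026.
- `apps/admin` intentionally duplicates some UI components; do not extract them
  into the shared package (the admin app is scheduled for a rewrite).

## When stuck
- Prefer asking for clarification over guessing at ambiguous requirements.
```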
Cursor is like a company that gives rules and lets people work freely. You write project rules flatly in .cursor/rules and leave the rest to the agent's judgment. Because the degree of freedom is high, the quality of the rules directly affects the output quality. Conversely, since the ability to write rules serves as leverage for the team's output quality, the operational design of rules (who updates them and when) becomes implicitly important.
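For a sense of the format: modern Cursor project rules live in `.mdc` files whose frontmatter controls when they apply. The exact frontmatter fields are from memory and may differ by version, and the rules themselves are invented:

```markdown
---
description: API route conventions
globs: ["src/app/api/**/*.ts"]
alwaysApply: false
---

- Validate request bodies before touching the database.
- Return errors as `{ error: { code, message } }`; never leak stack traces.
```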
Devin is like a company that makes people follow procedures. It is a design where humans control the task execution flow by explicitly defining steps in Playbooks.
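A playbook is essentially a written procedure handed to the agent. The sketch below shows the general shape; the section layout and every step are invented for illustration:

```markdown
# Playbook: Dependency upgrade (illustrative)

1. Create a branch named `chore/deps-YYYY-MM-DD`.
2. Run the upgrade and record every major-version bump in the PR description.
3. Run the full test suite; if any test fails, stop and report instead of fixing.
4. Open a draft PR and request review from the platform team.
```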
None of these is the single correct answer; rather, they represent design differences based on the nature of the team and the task.
Predecessors Hit the Same Walls
Now for the main topic.
Looking at the failure patterns occurring in AI tool instruction design, we notice they are structurally identical to the failures repeated in human team development over the last 20 years.
Collapse Without Documentation
Using Claude Code without writing CLAUDE.md and repeatedly re-instructing it because "it's not right" has almost the same structure as the silos created when starting a project without design documents.
In human teams, if the basis for design decisions is not documented, you end up in a state of "you have to ask that person." For AI, it is even simpler: context disappears once you cross sessions. As a result, you repeat the same explanation, yet get a different output than before. This is the very challenge stated over a decade ago: without documentation, implicit knowledge evaporates.
The efforts of Kiro to structure requirements.md in EARS format and Claude Code to persist prerequisite knowledge via CLAUDE.md are architectural answers to this evaporation.
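EARS (Easy Approach to Requirements Syntax) constrains each requirement to a small set of sentence templates, which makes ambiguity easy to spot. A hypothetical requirements.md fragment (the feature and numbers are invented) might read:

```markdown
## Requirement: Session timeout (illustrative)

- WHEN a user is idle for 30 minutes, THE system SHALL invalidate the session.
- IF token refresh fails, THEN THE system SHALL redirect to the login page.
- WHILE a data export is running, THE system SHALL disable the export button.
```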
Paralysis by Information Overload
Adding ten MCP servers only to have the response quality collapse is another common pattern.
Defining too many tools crowds the context window, and the agent begins to skim the results. As pointed out in Anthropic's article on Code execution with MCP, direct calls to MCP tools can cause token consumption to explode. One example mentioned is a case where transferring a transcript of a two-hour sales meeting from Google Drive to Salesforce could consume an additional 50,000 tokens.
https://www.anthropic.com/engineering/code-execution-with-mcp
This is the same structure as inviting a new hire to 30 Slack channels on their first day and causing them to freeze from information overload. Both humans and AI have limits to the amount of information they can process, necessitating a design that provides necessary information at the appropriate granularity. The evolution of Claude Code Skills, Cursor Rules, and Kiro Powers as mechanisms to provide only necessary information when needed is a response to this problem.
Chaos Without Specifications
Kiro Specs' design philosophy of writing specifications first is an answer to the classic failure pattern of starting development without specs and seeing the project go up in flames in the latter half.
Since AI generates code probabilistically, it will fill in the blanks of ambiguous specifications on its own. You won't know if that matches your intent until you verify the output. If you run with ambiguous specs, a game of whack-a-mole style bug fixing begins. This is the same for both human and AI development.
The Inevitability of Automation
Forcing automatic formatting or linting via Hooks upon saving files is a reenactment of the history where we moved from relying on human goodwill for reviews and style consistency to automating it with CI.
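As a sketch of what this looks like in practice: Claude Code hooks are configured in settings JSON, and each hook command receives the tool call as JSON on stdin. The exact schema below is from memory and may differ across versions, and the prettier command is only an example:

```json
{
  "hooks": {
    "PostToolUse": [
      {
        "matcher": "Edit|Write",
        "hooks": [
          {
            "type": "command",
            "command": "jq -r '.tool_input.file_path' | xargs npx prettier --write"
          }
        ]
      }
    ]
  }
}
```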
Ten to fifteen years ago, themes like Jenkins adoption, test automation, and continuous integration were staples at web-industry conferences. Discussions on securing quality through systems flourished, leading to build servers, automated-testing cultures, and established deployment pipelines. The rapid development of Hooks and Skills in today's AI coding context is exactly a reenactment of this history.
The Law: What Happens When Generation Costs Drop?
While the summary so far is that history repeats itself, another point is that some common sense is flipping.
To explain this flip, I propose a law:
When generation cost decreases, the center of gravity for value shifts from the product to the intent.
The cost of writing code has dropped dramatically. Instruct an AI, and hundreds of lines of code appear in seconds. Let's consider the structural consequences of this change.
It is the same structure as the movement of bottlenecks in the Theory of Constraints (TOC). When the cost of one process drops sharply, the bottleneck moves to another process. Ten years ago, implementation man-hours were often the bottleneck, so it was rational to invest in technologies that improved code quality, such as test automation, refactoring, and coding conventions.
Now, writing code itself has become cheap. Consequently, the bottleneck moves to what to make it write. The process of verbalizing specifications, recording design decisions, and organizing context—in other words, the process of refining intent—is becoming the bottleneck for productivity.
Note that this law relies on the current asymmetry where generation costs have dropped, but verification costs have not decreased at the same rate. If AI-driven code review or formal verification advances significantly in the future, verification costs may also drop dramatically, moving the bottleneck to yet another process. In that sense, this is not a permanent law of the universe but a law explaining current structural dynamics.
Using this law, let's categorize development common sense into what flips and what remains universal.
What Flips: Refining the Product (Code)
These are things considered a given for any good developer over the past 20 years, but for which the ROI is changing in the AI era.
Note that these haven't become meaningless; rather, they have changed from "things you should always invest in" to "things you judge based on the situation."
DRY Principle (Flip Degree: Medium to High)
DRY (Don't Repeat Yourself) is a design principle to eliminate code duplication and centralize changes. This was a rational investment when humans were maintaining the code. Duplication leads to missed updates and becomes a hotbed for bugs.
However, in the context of AI generating code, abstraction for the sake of DRY can sometimes become a risk.
Introducing abstraction layers for unification increases the chance of the AI misreading the context. If the responsibility of a common function is unclear, the AI might use that function for unintended purposes or call it in inappropriate places. Concrete, self-contained code is often less likely to be misunderstood by AI.
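A toy illustration (the function names and pricing rules are invented): the generic helper below centralizes the logic but forces any editor, human or AI, to reason about every call site's expectations, while the concrete versions each state their full contract locally.

```typescript
// Over-abstracted: one generic helper serves every discount in the app.
// An agent editing it must reason about all call sites, and a caller can
// easily pick the wrong roundMode without any type error.
function applyDiscount(
  priceInCents: number,
  rate: number,
  roundMode: "floor" | "round" = "round",
): number {
  const discounted = priceInCents * (1 - rate);
  return roundMode === "floor" ? Math.floor(discounted) : Math.round(discounted);
}

// Concrete and self-contained: the arithmetic is duplicated, but each
// function states its full contract locally, leaving less room for misreading.
function memberPriceInCents(priceInCents: number): number {
  // Members get 10% off, rounded down to whole cents.
  return Math.floor(priceInCents * 0.9);
}

function campaignPriceInCents(priceInCents: number): number {
  // The current campaign is 25% off, rounded to the nearest cent.
  return Math.round(priceInCents * 0.75);
}
```

Which style wins depends on who maintains the code: the concrete version trades change-in-one-place for local legibility.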
Of course, DRY remains important in core parts of the codebase that humans will maintain long-term. The criteria for judgment have changed: we must now discern whether this code is a core component for long-term maintenance or a peripheral one intended for regeneration.
Refactoring (Flip Degree: Medium)
Refactoring is the process of improving internal structure without changing external behavior. It has long been valued as a means of paying off technical debt.
As code generation costs drop, the calculation changes. Which is faster and more reliable: carefully refactoring existing code over a week, or clarifying specifications and having it regenerated? The latter is increasingly becoming a realistic option.
However, this doesn't mean refactoring is unnecessary. Regeneration is only effective within the scope where specifications can be clearly verbalized. Code containing implicit specifications accumulated over years of operation—such as edge case handling not written in docs or performance tuning results—risks being lost during regeneration.
The axis of investment has shifted from "always refactor" to "refactor or regenerate, case by case."
Test Coverage (Flip Degree: Low to Medium)
The value of writing tests itself does not change. What has changed is the ROI of humans writing test code by hand. What is flipping here is the allocation of investment toward metrics, not the design principle.
If you leave identifying test cases and generating test code to AI, comprehensive tests appear in seconds. What humans should do here is design the test strategy—judging what should be tested—rather than implementing the test code.
Spending time on determining whether a test verifies a truly meaningful specification is becoming higher ROI than spending time just to raise coverage numbers.
Naming and Readability (Flip Degree: Low)
The degree of flip for investment in readability is smaller compared to other items, but judgment is beginning to change depending on who is reading. If the code is for humans to read and understand, beautiful naming and clever comments remain important.
On the other hand, in code primarily read and written by AI, type information and intent explanations in JSDoc can improve AI understanding more than beautiful naming intended for humans. The definition of readability itself is changing depending on the reader.
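For instance, a hypothetical function (the rollover rule here is invented) shows how a JSDoc block can carry exactly the kind of intent an AI cannot recover from the implementation alone:

```typescript
/**
 * Converts a UTC timestamp to the store's local "business date".
 *
 * Intent note (useful to both human and AI readers): the business day rolls
 * over at 05:00 local time, not midnight, because overnight orders belong to
 * the previous day's sales report.
 *
 * @param utcMillis - Unix epoch time in milliseconds.
 * @param utcOffsetHours - Store timezone offset from UTC, e.g. 9 for JST.
 * @returns The business day as an ISO date string (YYYY-MM-DD).
 */
function businessDate(utcMillis: number, utcOffsetHours: number): string {
  const ROLLOVER_HOUR = 5;
  // Shift into local time, then pull back by the rollover hour so that
  // anything before 05:00 local still counts as the previous day.
  const shifted = new Date(
    utcMillis + (utcOffsetHours - ROLLOVER_HOUR) * 3_600_000,
  );
  return shifted.toISOString().slice(0, 10);
}
```

Without the comment, "subtract five hours" looks like a bug; with it, both a reviewer and a code-generating agent know the behavior is deliberate.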
What is Universal: Refining the Intent
While some things flip, others have always been important and are even more worth investing in now. According to the law, these are things related to the definition of intent and judgment—processes AI cannot do.
Verbalizing Specifications and Intent
The ability to accurately verbalize what to make and why is actually increasing in value in the AI era.
It is no coincidence that Kiro Specs is designed to have you write specifications first. If you throw ambiguous specs at an AI, it will probabilistically fill in the blanks, resulting in code that sort of works but you don't know if it meets the specs. The effort to verify and fix this often exceeds the effort of writing specifications from the start.
Starting development without a PRD leads to chaos later. This lesson from 10 years ago is perfectly valid for AI development. In fact, because AI generation speed is fast, ambiguous specs can lead to a large amount of rework in a short time, arguably making the importance of specifications greater than before.
Recording Design Decisions
The culture of ADR (Architecture Decision Records)—why this configuration was chosen, what alternatives existed, and why they were rejected—is being rediscovered in the context of CLAUDE.md.
The guideline for CLAUDE.md to write persistent context that cannot be inferred from code is exactly the recording of design decisions. Information such as "packages/shared/src/legacy/ is a compatibility layer for old APIs; you will want to refactor it, but do not touch it (three external systems depend on it, scheduled for removal in 2026 Q3)" cannot be inferred no matter how closely an AI reads the code.
To have AI write correct code, we must provide the background of these design decisions as context. This is the same structure as onboarding a new human member; the verbalization and persistence of prerequisite knowledge is universally important across eras.
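As a sketch, the legacy-layer example above could be captured in a short ADR. The number, status, and alternatives below are invented for illustration:

```markdown
# ADR-014: Keep the legacy compatibility layer (illustrative)

## Status
Accepted

## Context
`packages/shared/src/legacy/` adapts our old v1 API. Three external systems
still depend on it.

## Decision
Do not refactor or remove it before 2026 Q3, when the last dependent migrates.

## Alternatives considered
- Immediate removal: rejected, it breaks external integrations.
- Versioned API gateway: rejected for now, too costly for a layer with a
  planned end of life.
```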
Review and Verification Ability
The value of the ability to read and judge is rising relative to the ability to write.
In the era when CI meant little more than "if the build passes, it's okay," the quality gate was the success or failure of the build. Now, the phase where humans judge whether the AI output meets the requirements is becoming the quality bottleneck.
Reviewing AI-generated code differs in character from reviewing human-written code. AI makes mistakes inconsistently, and it is not rare for it to "lie" convincingly. To spot library version mismatches, calls to non-existent APIs, or logic that quietly ignores instructions, one needs a grasp of the entire codebase and a deep understanding of the specifications.
Since review is becoming the process that determines quality, investment here is universally important.
Organizing Context
CLAUDE.md, Steering, Rules. Though the names differ, they all represent the act of organizing prerequisite knowledge to provide to the AI.
We are in an era where the most cost should be spent on organizing preconditions before writing code. This is a direct consequence of the law: when generation cost drops, the bottleneck moves to intent. Code can be regenerated as many times as you like, but if the preconditions are wrong, it will never be correct no matter how many times you regenerate it.
Categorizing by the Law
If we look at the summary so far along one axis, the criteria for categorization are simple:
- What Flips = Investment in processes AI can now do cheaply (Refining the product)
- What is Universal = Investment in processes AI still cannot do (Refining the intent)
As generation costs drop, the ROI of the former decreases, and the ROI of the latter increases. This can be called a structural law that does not depend on specific tools or languages.
Predicting the Next from History
Combining this law with patterns from software development history, we can see several problems that have not yet fully materialized but are almost certain to come in the context of AI coding. While each of these themes is deep enough to be explored in its own separate article, I will only outline their structures here.
The Problem of Document Obsolescence
The issue where no one updates the internal Wiki and a new member causes an accident using old procedures has occurred repeatedly in technical organizations.
The same will happen with CLAUDE.md or Kiro Specs. Code is generated and updated by AI, but the maintenance of instruction files is done by humans. This asymmetry will eventually become a problem. A state where the code is up-to-date but the instruction file is a lie is the classic "the code works but the docs are wrong" problem.
Kiro's Agent Hooks is a mechanism that can trigger agents based on events such as file changes, allowing for the configuration of tasks like automatic document updates. However, as pointed out in the SDD (Spec-Driven Development) tool comparison article published on martinfowler.com, drift between specification and implementation remains a challenge. Addressing this issue may become a key factor in the future competition among tools.
https://martinfowler.com/articles/exploring-gen-ai/sdd-3-tools.html
The Return of "It Works on My Machine"
Before Docker, deployments often broke due to differences in development environments. "It works on my machine" hindered team development through a lack of reproducibility.
A similar problem is appearing in AI tool instruction design. On GitHub, dotfiles-style repositories along the lines of "My Claude Code Settings" or "My AI Agent Setup" are surging, creating a culture of sharing personally optimized CLAUDE.md files, skills, and hooks. This is healthy for individual productivity, but "it works with my CLAUDE.md" could eventually become an obstacle in team development.
The AI Version of the DevOps Problem
DevOps was born out of the disconnect between those who build and those who run. In the future, the same wall may arise in the structure of humans operating code in production that was written by AI.
Lisanne Bainbridge's 1983 paper Ironies of Automation pointed out that as automation becomes more sophisticated, operators lose manual skills and become unable to respond to exceptions. This insight, repeatedly confirmed in aviation and process control, also applies to software development. As AI writes more code, humans understand the details of the codebase less, and humans may become unable to handle failures that the AI cannot handle.
https://en.wikipedia.org/wiki/Ironies_of_Automation
The Orchestration Problem of Multi-Agents
With the decentralization of microservices, how to maintain consistency between services became a major challenge. The same could happen with Agent Teams. Running 10 agents in parallel might decrease efficiency due to the overhead of sharing context. This is the Agent version of Brooks's Law—adding manpower to a late software project makes it later—proposed by Fred Brooks in The Mythical Man-Month in 1975.
https://en.wikipedia.org/wiki/Brooks%27s_law
Conclusion
The design philosophy of AI coding tools is a reinvention of practices built over 20 years of human team development. Some parts of history repeat, while some common sense flips.
The law for categorization is simple: When generation cost decreases, the center of gravity for value shifts from the product to the intent.
Refining the code (DRY, refactoring, naming, coverage) is changing from "things you should always invest in" to "things you judge based on the situation." Refining the intent (verbalizing specifications, recording design decisions, review ability, organizing context) remains as important as ever, and its weight as a bottleneck has increased.
And if you know the patterns of history, you can see through new tools and concepts, recognizing them as "that thing from 10 years ago." If you can see through them, you can apply the lessons left by your predecessors as they are.
Problems that look new may boil down to known patterns when viewed structurally. Thinking this way, there might not be that many truly unknown challenges after all.