Why I stopped prompting for code and started treating AI as a system
We’ve all been there.
You ask your AI coding assistant to build a simple component.
The code generates instantly. It looks clean. It runs without errors.
But then you look closer:
- It introduced a new icon library we don’t use.
- It ignored the project’s strict folder structure.
- It hardcoded 20px padding instead of using the design system’s spacing scale.
What follows is usually a back-and-forth: nudging the model, refactoring the output, and correcting architectural drift.
At some point it became clear to me that the issue wasn’t code quality.
It was context.
The AI behaves like a highly capable junior developer who just joined the team:
it understands syntax, but not local conventions, constraints, or intent.
Most attempts to fix this focus on “better prompts”.
What proved more effective for me was a structural shift: treating AI interaction as a system design problem.
Building an AI “Operating System”
Instead of using the AI as a conversational tool, I started treating it as a programmable engine.
Inside the repository, I gradually introduced a file-based structure that governs how the AI behaves across different tasks.
It doesn’t aim to be intelligent on its own.
It loads context explicitly.
Some protocols are always available.
Others are loaded only when they provide clear value—primarily through @plan or @deep-think.
Over time, this evolved into a consistent working model.
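As a rough illustration, the repository layout looks something like this. The `.github/copilot-instructions.md` location is the tooling’s convention; the protocols folder and individual file names below are my own and purely illustrative:

```
.github/
  copilot-instructions.md   # always loaded: the routing layer
  protocols/
    plan.md                 # loaded on @plan
    deep-think.md           # loaded on @deep-think
    ui-perfect.md           # loaded on @ui-perfect
    test.md                 # loaded on @test
docs/
  architecture.md           # optional governance layer
  anti-patterns.md
```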
1. The Switchboard: A Single Entry Point
One limitation of large system prompts is that they don’t scale well.
As rules accumulate, behavior becomes harder to reason about.
Instead of adding more instructions, I repurposed copilot-instructions.md—a convention file provided by the tooling—as a routing layer.
Functionally, it acts as a switchboard.
It doesn’t contain rules.
It points to them.
For example:
“If the user types @plan, load the Architecture Guidelines.
If they type @ui-perfect, load the Design System rules.
If they type @test, load the QA protocols.”
Smaller prompts remain lightweight.
Complex tasks load additional context explicitly.
This separation significantly reduced unpredictable behavior.
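A stripped-down sketch of that routing file, with illustrative protocol file names, might look like this:

```markdown
<!-- .github/copilot-instructions.md -->
# Routing

Keep this file small. Do not add rules here; link to them.

- When the user message starts with `@plan`, read `protocols/plan.md` and follow it.
- When it starts with `@deep-think`, read `protocols/deep-think.md`.
- When it starts with `@ui-perfect`, read `protocols/ui-perfect.md`.
- When it starts with `@test`, read `protocols/test.md`.
- Otherwise, apply only the default project conventions.
```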
2. Deep Thinking as a Decision Step
Directly asking an AI for a solution tends to produce common or generic patterns.
For tasks involving trade-offs, I use a dedicated step: @deep-think.
In this mode, the model is required to:
- Examine the problem from a system-level perspective
- Propose multiple approaches
- Make risks and constraints explicit
To structure that reasoning, I use a simple traffic-light rubric:
- 🔴 BLOCKER — conflicts with constraints or introduces clear risk
- 🟡 WARNING — viable, but with notable trade-offs
- 🟢 OPTIMAL — aligned, maintainable, and consistent with the system
The value here isn’t the labels themselves, but the requirement to justify decisions before implementation.
A key part of this step is “search before build”.
Before proposing new solutions, the model scans the existing codebase for patterns, utilities, and prior art.
This consistently reduced duplication and divergence.
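As a sketch, the @deep-think protocol file amounts to something like this; the wording is abbreviated and the file name is my own convention:

```markdown
<!-- protocols/deep-think.md -->
# @deep-think

Before proposing anything:
1. Search the codebase for existing patterns, utilities, and prior art
   that already solve part of the problem ("search before build").
2. Describe the problem at the system level: data flow, ownership, boundaries.
3. Propose at least two approaches.
4. Rate each approach:
   - 🔴 BLOCKER: conflicts with constraints or introduces clear risk
   - 🟡 WARNING: viable, but with notable trade-offs
   - 🟢 OPTIMAL: aligned, maintainable, consistent with the system
5. Recommend one approach and justify the choice. Do not write code yet.
```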
3. Governance as an Optional Layer
For larger or long-lived features, I sometimes introduce explicit governance files:
- Architecture boundaries
- Code style constraints
- Known anti-patterns
These files exist primarily to anchor AI behavior, not as human-facing documentation.
They are intentionally opt-in.
For smaller or exploratory work, I often skip them entirely.
The goal is measurable leverage, not procedural overhead.
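When I do bring them in, a governance file stays deliberately terse. Something along these lines, where the folder names are placeholders for whatever the project actually uses:

```markdown
<!-- docs/architecture.md (opt-in, loaded only for larger features) -->
# Architecture boundaries
- UI components never import from data/ directly; go through the service layer.
- Shared utilities live in lib/; do not duplicate helpers inside features.

# Known anti-patterns
- No new third-party UI or icon libraries without an explicit decision.
- No hardcoded spacing or colors; use design-system tokens.
```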
4. Separating Planning from Execution
Another adjustment that proved useful was separating design from implementation.
Rather than asking the AI to plan and build simultaneously, I split the process.
@plan
This step produces a PLAN.md document outlining:
- Affected files
- High-level data flow
- Module boundaries and contracts
- Testing considerations
No code is generated.
The plan can be reviewed and adjusted independently.
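The resulting PLAN.md is just a markdown document. A trimmed example of the skeleton I review, with placeholder file names:

```markdown
# PLAN: <feature name>

## Affected files
- src/features/<feature>/ (new)
- src/routes/ (modified)

## Data flow
Request -> route handler -> service -> store -> UI state

## Boundaries and contracts
- The feature exposes a single public component; internals stay private.

## Testing
- Unit tests for the service layer; one integration test for the happy path.
```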
@build
Only after the plan is accepted do I run @build.
At that point, the AI treats the plan as a specification and implements it directly.
This separation reduced unintended structural changes.
5. Handling UI Accuracy
Visual accuracy remains an area where AI output is unreliable without guidance.
When UI details matter, I use a dedicated @ui-perfect step.
The flow is straightforward:
- Analyze layout and spacing from the design
- Map measurements to design-system tokens
- Implement only after normalization
This step isn’t universal, but when precision is required, separating analysis from implementation produces more consistent results.
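Concretely, the normalization step produces a small mapping table before any code is written. The token names below are illustrative, not taken from a real design system:

```markdown
<!-- @ui-perfect: measurement normalization -->
| From design        | Raw value | Token        |
|--------------------|-----------|--------------|
| Card padding       | 20px      | spacing.md   |
| Gap between items  | 8px       | spacing.xs   |
| Title size         | 18px      | font.size.lg |

Implementation starts only after this table is confirmed,
and only token names appear in the code.
```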
6. A Typical Flow
A standard feature workflow now looks like this:
- Run @plan
- Review and refine PLAN.md
- Run @build
- Validate behavior
- Run @ui-perfect if visual precision matters
- Run @extract-concerns
- Run @test
The process itself isn’t complex.
What changed was predictability.
Outcome
This approach didn’t eliminate the need for judgment.
What it changed was where that judgment is applied.
Instead of correcting output after the fact, effort is invested upfront in defining the context in which output is produced.
This isn’t presented as a general prescription.
It’s simply the working style that emerged for me—and the one that brought AI output in line with production expectations.