Why I stopped prompting for code and started treating AI as a system
We’ve all been there.
You ask your AI coding assistant to build a simple component.
The code generates instantly. It looks clean. It runs without errors.
But then you look closer:
- It introduced a new icon library we don’t use.
- It ignored the project’s strict folder structure.
- It hardcoded 20px padding instead of using the design system’s spacing scale.
What follows is usually a back-and-forth: nudging the model, refactoring the output, and correcting architectural drift.
At some point it became clear to me that the issue wasn’t code quality.
It was context.
The AI behaves like a highly capable junior developer who just joined the team:
it understands syntax, but not local conventions, constraints, or intent.
Most attempts to fix this focus on “better prompts”.
What proved more effective for me was a structural shift: treating AI interaction as a system design problem.
Building an AI “Operating System”
Instead of using the AI as a conversational tool, I started treating it as a programmable engine.
Inside the repository, I gradually introduced a file-based structure that governs how the AI behaves across different tasks.
It doesn’t aim to be intelligent on its own.
It loads context explicitly.
Some protocols are always available.
Others are loaded only when they provide clear value—primarily through @plan or @deep-think.
Over time, this evolved into a consistent working model.
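As a rough illustration, the repository layout looks something like this. The `.github/copilot-instructions.md` location is the tooling’s convention; the protocols folder and individual file names below are my own and purely illustrative:

```
.github/
  copilot-instructions.md   # always loaded: the routing layer
  protocols/
    plan.md                 # loaded on @plan
    deep-think.md           # loaded on @deep-think
    ui-perfect.md           # loaded on @ui-perfect
    test.md                 # loaded on @test
docs/
  architecture.md           # optional governance layer
  anti-patterns.md
```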
1. The Switchboard: A Single Entry Point
One limitation of large system prompts is that they don’t scale well.
As rules accumulate, behavior becomes harder to reason about.
Instead of adding more instructions, I repurposed copilot-instructions.md—a convention file provided by the tooling—as a routing layer.
Functionally, it acts as a switchboard.
It doesn’t contain rules.
It points to them.
For example:
“If the user types @plan, load the Architecture Guidelines.
If they type @ui-perfect, load the Design System rules.
If they type @test, load the QA protocols.”
Smaller prompts remain lightweight.
Complex tasks load additional context explicitly.
This separation significantly reduced unpredictable behavior.
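A stripped-down sketch of that routing file, with illustrative protocol file names, might look like this:

```markdown
<!-- .github/copilot-instructions.md -->
# Routing

Keep this file small. Do not add rules here; link to them.

- When the user message starts with `@plan`, read `protocols/plan.md` and follow it.
- When it starts with `@deep-think`, read `protocols/deep-think.md`.
- When it starts with `@ui-perfect`, read `protocols/ui-perfect.md`.
- When it starts with `@test`, read `protocols/test.md`.
- Otherwise, apply only the default project conventions.
```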
2. Deep Thinking as a Decision Step
Directly asking an AI for a solution tends to produce common or generic patterns.
For tasks involving trade-offs, I use a dedicated step: @deep-think.
In this mode, the model is required to:
- Examine the problem from a system-level perspective
- Propose multiple approaches
- Make risks and constraints explicit
To structure that reasoning, I use a simple traffic-light rubric:
- 🔴 BLOCKER — conflicts with constraints or introduces clear risk
- 🟡 WARNING — viable, but with notable trade-offs
- 🟢 OPTIMAL — aligned, maintainable, and consistent with the system
The value here isn’t the labels themselves, but the requirement to justify decisions before implementation.
A key part of this step is “search before build”.
Before proposing new solutions, the model scans the existing codebase for patterns, utilities, and prior art.
This consistently reduced duplication and divergence.
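As a sketch, the @deep-think protocol file amounts to something like this; the wording is abbreviated and the file name is my own convention:

```markdown
<!-- protocols/deep-think.md -->
# @deep-think

Before proposing anything:
1. Search the codebase for existing patterns, utilities, and prior art
   that already solve part of the problem ("search before build").
2. Describe the problem at the system level: data flow, ownership, boundaries.
3. Propose at least two approaches.
4. Rate each approach:
   - 🔴 BLOCKER: conflicts with constraints or introduces clear risk
   - 🟡 WARNING: viable, but with notable trade-offs
   - 🟢 OPTIMAL: aligned, maintainable, consistent with the system
5. Recommend one approach and justify the choice. Do not write code yet.
```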
3. Governance as an Optional Layer
For larger or long-lived features, I sometimes introduce explicit governance files:
- Architecture boundaries
- Code style constraints
- Known anti-patterns
These files exist primarily to anchor AI behavior, not as human-facing documentation.
They are intentionally opt-in.
For smaller or exploratory work, I often skip them entirely.
The goal is measurable leverage, not procedural overhead.
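When I do bring them in, a governance file stays deliberately terse. Something along these lines, where the folder names are placeholders for whatever the project actually uses:

```markdown
<!-- docs/architecture.md (opt-in, loaded only for larger features) -->
# Architecture boundaries
- UI components never import from data/ directly; go through the service layer.
- Shared utilities live in lib/; do not duplicate helpers inside features.

# Known anti-patterns
- No new third-party UI or icon libraries without an explicit decision.
- No hardcoded spacing or colors; use design-system tokens.
```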
4. Separating Planning from Execution
Another adjustment that proved useful was separating design from implementation.
Rather than asking the AI to plan and build simultaneously, I split the process.
@plan
This step produces a PLAN.md document outlining:
- Affected files
- High-level data flow
- Module boundaries and contracts
- Testing considerations
No code is generated.
The plan can be reviewed and adjusted independently.
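The resulting PLAN.md is just a markdown document. A trimmed example of the skeleton I review, with placeholder file names:

```markdown
# PLAN: <feature name>

## Affected files
- src/features/<feature>/ (new)
- src/routes/ (modified)

## Data flow
Request -> route handler -> service -> store -> UI state

## Boundaries and contracts
- The feature exposes a single public component; internals stay private.

## Testing
- Unit tests for the service layer; one integration test for the happy path.
```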
@build
Only after the plan is accepted do I run @build.
At that point, the AI treats the plan as a specification and implements it directly.
This separation reduced unintended structural changes.
5. Handling UI Accuracy
Visual accuracy remains an area where AI output is unreliable without guidance.
When UI details matter, I use a dedicated @ui-perfect step.
The flow is straightforward:
- Analyze layout and spacing from the design
- Map measurements to design-system tokens
- Implement only after normalization
This step isn’t universal, but when precision is required, separating analysis from implementation produces more consistent results.
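Concretely, the normalization step produces a small mapping table before any code is written. The token names below are illustrative, not taken from a real design system:

```markdown
<!-- @ui-perfect: measurement normalization -->
| From design        | Raw value | Token        |
|--------------------|-----------|--------------|
| Card padding       | 20px      | spacing.md   |
| Gap between items  | 8px       | spacing.xs   |
| Title size         | 18px      | font.size.lg |

Implementation starts only after this table is confirmed,
and only token names appear in the code.
```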
6. A Typical Flow
A standard feature workflow now looks like this:
- Run @plan
- Review and refine PLAN.md
- Run @build
- Validate behavior
- Run @ui-perfect if visual precision matters
- Run @extract-concerns
- Run @test
The process itself isn’t complex.
What changed was predictability.
Outcome
This approach didn’t eliminate the need for judgment.
What it changed was where that judgment is applied.
Instead of correcting output after the fact, effort is invested upfront in defining the context in which output is produced.
This isn’t presented as a general prescription.
It’s simply the working style that emerged for me—and the one that brought AI output in line with production expectations.