mufeng

Posted on Jun 7

How to Make AI Coding Agents Actually Follow Engineering Process

#ai #codex #softwareengineering

The problem isn't that AI coding agents write bad code.

The problem is that they skip steps.

Ask an agent to fix a bug—it reads a few files, guesses a cause, patches the code. Ask it to add a feature—it starts writing before anyone's agreed on what the feature actually does. Ask it to refactor—it touches unrelated files, reformats half the codebase, and hands you a diff too large to review.

None of this is stupidity. It's the absence of process discipline.

Software development has always required workflow constraints: clarify before implementing, plan before coding, test before shipping, debug root causes not symptoms, verify before declaring done. The question is whether your AI agent follows them—or bypasses them entirely.

Superpowers is a plugin framework for Claude Code and Codex that encodes those constraints as loadable, composable agent workflows. This post covers what it is, when to use it, and how to get started.

What "Skills" Actually Are

The word "skill" is overloaded in AI contexts. Here it means something specific: a workflow protocol that loads into an agent session and constrains how the agent approaches a category of task.

Not "be more careful." Not a style guide. A specific sequence of steps with defined inputs, outputs, and verification gates.

The analogy is a checklist for a surgeon or a pilot—not because either lacks expertise, but because cognitive discipline under pressure requires procedural anchors.
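To make "steps with defined inputs, outputs, and verification gates" concrete, here is a rough TypeScript sketch of how such a protocol could be modeled. The type names, field names, and gate wording are assumptions made up for illustration; this is not the Superpowers file format. The three step names loosely follow what this article says `systematic-debugging` produces.

```typescript
// Illustrative types only; field names are assumptions, not the Superpowers format.
interface WorkflowStep {
  name: string;
  input: string;  // what must exist before the step starts
  output: string; // the artifact the step must produce
  gate: string;   // the check that must pass before the next step begins
}

interface Skill {
  name: string;
  steps: WorkflowStep[];
}

// Returns the first step whose gate has not passed yet, or null when every gate has passed.
function nextStep(skill: Skill, passedGates: string[]): WorkflowStep | null {
  for (const step of skill.steps) {
    if (passedGates.indexOf(step.name) === -1) return step;
  }
  return null;
}

const systematicDebugging: Skill = {
  name: "systematic-debugging",
  steps: [
    { name: "hypothesize", input: "bug report", output: "root cause hypotheses", gate: "each hypothesis is testable" },
    { name: "eliminate", input: "root cause hypotheses", output: "confirmed root cause", gate: "evidence rules out the alternatives" },
    { name: "fix", input: "confirmed root cause", output: "minimal fix", gate: "original failure no longer reproduces" },
  ],
};

const blocked = nextStep(systematicDebugging, ["hypothesize"]);
// blocked is the "eliminate" step: the fix cannot start until elimination passes its gate.
```

The point of the shape is that "be more careful" has nowhere to live in it. Every step either has a gate that passed or it doesn't.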

The core Superpowers Skills cover the major failure modes in AI-assisted development:

Skill	Failure Mode It Prevents	What It Produces
`brainstorming`	Implementing the wrong thing	Clarified scope with edge cases surfaced
`writing-plans`	Drifting mid-implementation	Executable task list: file scope + verification per step
`test-driven-development`	"Works on my machine" guesswork	RED-GREEN-REFACTOR cycles that lock behavior first
`systematic-debugging`	Shotgun-patching symptoms	Root cause hypotheses, evidence-based elimination, minimal fix
`verification-before-completion`	"Should be done" claims	Actual test runs, browser paths, or device checks
`requesting-code-review`	Merging unreviewed code	Severity-ranked risk list before merge
`using-git-worktrees`	Task bleed across workstreams	Isolated workspaces with clean baseline

These aren't independent tips—they chain into a complete development pipeline:

Vague requirement
  → brainstorming  (scope + edge cases)
  → writing-plans  (executable task list)
  → test-driven-development  (behavior locked by tests)
  → requesting-code-review  (risks surfaced)
  → verification-before-completion  (actually verified)
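The chaining above can be sketched as a small runner in which each stage consumes the previous stage's artifact and the pipeline halts at the first failed gate. This is a toy model under stated assumptions: `Stage`, `runPipeline`, and the string artifacts are hypothetical names for illustration, not part of Superpowers.

```typescript
// A stage consumes the previous stage's artifact and returns its own,
// or null when its gate fails. Names and shapes are hypothetical.
interface Stage {
  skill: string;
  run: (input: string) => string | null;
}

interface PipelineResult {
  completed: string[];
  stoppedAt: string | null;
  artifact: string;
}

function runPipeline(stages: Stage[], requirement: string): PipelineResult {
  const completed: string[] = [];
  let artifact = requirement;
  for (const stage of stages) {
    const next = stage.run(artifact);
    if (next === null) {
      // Gate failed: stop here rather than building on an unverified artifact.
      return { completed, stoppedAt: stage.skill, artifact };
    }
    artifact = next;
    completed.push(stage.skill);
  }
  return { completed, stoppedAt: null, artifact };
}

const result = runPipeline(
  [
    { skill: "brainstorming", run: (req) => `scope(${req})` },
    { skill: "writing-plans", run: (scope) => `plan(${scope})` },
    { skill: "test-driven-development", run: () => null }, // simulate a failing gate
    { skill: "requesting-code-review", run: (code) => `review(${code})` },
  ],
  "add billing export",
);
// result.completed is ["brainstorming", "writing-plans"]
// result.stoppedAt is "test-driven-development"
```

Review never runs on code whose tests failed. That ordering is the whole value of the chain.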

The Key Insight: Process Errors vs. Code Errors

AI agents will get better at writing correct code over time. They won't automatically get better at following process—unless process is encoded somewhere.

The failures Superpowers Skills prevents aren't syntax errors or logic bugs. They're process failures:

Building the wrong feature because nobody asked the right clarifying questions
Writing code that "looks complete" but has zero coverage on the edge cases that matter
Patching a symptom while the root cause persists
Refactoring that expands scope until the diff is unmergeable
Shipping because the agent said "done" without running anything

A more capable model doesn't fix these. A faster agent arguably makes them worse—more code written in the wrong direction before anyone catches it.

A Real Example: Adding Invoice Export

Imagine you tell an agent: "Add a billing export feature."

Without workflow constraints, it will probably find the billing service, write an endpoint, add a download button, and report completion. Whether that implementation handles empty data, unauthorized requests, large datasets, or export format edge cases depends entirely on whether the model guessed right.

With Superpowers Skills, the flow looks like this:

Step 1: `brainstorming`

Before touching any files, the agent surfaces questions:

Export format: PDF, CSV, or Excel?
Date range limits?
Permission checks required?
Sync download or async background job?
What does the user see on failure?

This isn't bureaucracy. This is the list of decisions that will otherwise get made silently—by the model, in the wrong direction.

Step 2: `writing-plans`

A compliant plan doesn't say "implement invoice export." It says:

1. Add exportInvoiceCsv(userId, range) to billing service.
   Verify: unit tests covering empty data, normal data, unauthorized access.

2. Wire export endpoint in API routes.
   Verify: 403 on missing permissions, valid text/csv response on success.

3. Add download button to billing page.
   Verify: file downloads on click, loading and error states render correctly.

Every task has a file scope and a verification gate. That's what makes it executable instead of aspirational.
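One way to enforce "file scope plus verification gate" is to treat the plan as data and reject any task that lacks either. The sketch below assumes a hypothetical `PlanTask` shape and a made-up file path; it only illustrates the check, not how `writing-plans` stores plans.

```typescript
interface PlanTask {
  description: string;
  files: string[]; // file scope: paths the task is allowed to touch
  verify: string;  // how completion will be checked
}

// Returns descriptions of tasks that lack a file scope or a verification gate.
function findAspirationalTasks(plan: PlanTask[]): string[] {
  return plan
    .filter((task) => task.files.length === 0 || task.verify.trim() === "")
    .map((task) => task.description);
}

const plan: PlanTask[] = [
  {
    description: "Add exportInvoiceCsv to billing service",
    files: ["src/billing/export.ts"], // hypothetical path
    verify: "unit tests covering empty data, normal data, unauthorized access",
  },
  { description: "implement invoice export", files: [], verify: "" },
];

const rejected = findAspirationalTasks(plan);
// rejected is ["implement invoice export"]
```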

Step 3: `test-driven-development`

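Tests first. Not as documentation, but as behavior contracts. The example below covers only the CSV formatting, so it passes invoices in directly rather than the (userId, range) arguments from the plan: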

describe("exportInvoiceCsv", () => {
  it("exports invoices as csv rows", () => {
    const csv = exportInvoiceCsv([
      { id: "inv_001", amount: 1999, currency: "USD" },
      { id: "inv_002", amount: 2999, currency: "USD" },
    ]);

    expect(csv).toContain("id,amount,currency");
    expect(csv).toContain("inv_001,1999,USD");
    expect(csv).toContain("inv_002,2999,USD");
  });
});

Write the failing test. Confirm it fails. Implement the minimum to pass. Confirm it passes. Then refactor. The order matters.
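For the GREEN step, a minimal implementation that would satisfy the test above might look like this. The `Invoice` interface is inferred from the test data and is an assumption; the function takes invoices directly, as the test does.

```typescript
// Shape inferred from the objects used in the test; an assumption, not a real schema.
interface Invoice {
  id: string;
  amount: number;
  currency: string;
}

// Minimum needed to pass the test: a header row followed by one row per invoice.
// No quoting of commas or quotes yet: in TDD that behavior would be added
// only after a failing test demands it.
function exportInvoiceCsv(invoices: Invoice[]): string {
  const header = "id,amount,currency";
  const rows = invoices.map((inv) => [inv.id, String(inv.amount), inv.currency].join(","));
  return [header].concat(rows).join("\n");
}
```

Notice what is missing: escaping, permissions, date ranges. Each of those gets its own failing test first, which is exactly how the plan's "empty data" and "unauthorized access" cases should arrive.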

Step 4: `requesting-code-review`

Before merge, the review targets:

Does this match the agreed plan?
Any authorization gaps?
Large dataset edge cases?
Unhandled error states?
Files changed outside the agreed scope?
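The last question in that list is mechanical enough to check in code. A minimal sketch, assuming the plan's scope is a list of exact file paths or directory prefixes ending in "/", and that the changed-file list comes from wherever your tooling reports it. The function names and paths are hypothetical.

```typescript
// A scope entry ending in "/" covers everything under that directory;
// any other entry must match the file path exactly.
function inScope(file: string, entry: string): boolean {
  const isDirectory = entry.charAt(entry.length - 1) === "/";
  return isDirectory ? file.indexOf(entry) === 0 : file === entry;
}

// Returns changed files that no scope entry covers.
function filesOutsideScope(changedFiles: string[], agreedScope: string[]): string[] {
  return changedFiles.filter((file) => !agreedScope.some((entry) => inScope(file, entry)));
}

const outOfScope = filesOutsideScope(
  ["src/billing/export.ts", "src/api/routes.ts", "src/auth/session.ts"], // hypothetical diff
  ["src/billing/", "src/api/routes.ts"],                                  // hypothetical plan scope
);
// outOfScope is ["src/auth/session.ts"]
```

A non-empty result isn't automatically a rejection. It is a prompt for the reviewer to ask why the diff grew.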

Step 5: `verification-before-completion`

Depending on project type:

Project Type	Verification Method
Web app	Start dev server, walk the critical path in browser
Backend service	Run tests, type check, hit the endpoint
CLI tool	Run the command, check actual output
iOS app	Test on real device (especially IAP, StoreKit, permissions)
SDK / Library	Unit tests + integration tests + example project

The principle: evidence over claims. "I think it's done" is not verification.
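"Evidence over claims" can be modeled as a report where every claim must carry what was run and what was observed. The `Check` shape, the `npm test` command, and the observed strings below are illustrative assumptions, not output from a real project.

```typescript
interface Check {
  claim: string;           // what is supposed to work
  command: string | null;  // what was actually run or done, if anything
  observed: string | null; // what actually happened
  passed: boolean;
}

// A claim counts as verified only if something was run, a result was observed,
// and that result was a pass. Everything else is returned as unverified.
function unverifiedClaims(checks: Check[]): string[] {
  return checks
    .filter((c) => c.command === null || c.observed === null || !c.passed)
    .map((c) => c.claim);
}

const report: Check[] = [
  { claim: "export returns CSV", command: "npm test", observed: "all tests passed", passed: true },
  { claim: "download button works", command: null, observed: null, passed: true },
];

const gaps = unverifiedClaims(report);
// gaps is ["download button works"]: marked as passing, but nothing was actually run.
```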

How to Install

Claude Code

/plugin install superpowers@claude-plugins-official

Or via the Superpowers marketplace:

/plugin marketplace add obra/superpowers-marketplace
/plugin install superpowers@superpowers-marketplace

Codex CLI

/plugins

Search superpowers, select Install Plugin.

Codex App

Sidebar → Plugins → Coding category → Superpowers → +

When to Use vs. Skip

Not every task needs a full workflow. A typo fix doesn't need a plan. A one-liner doesn't need TDD.

The right mental model is risk-proportional discipline:

Task	Recommended Approach
Typo fix, config lookup	Direct action—just verify the output
Single-file small change	Optional workflow; at minimum verify
Bug with unclear root cause	`systematic-debugging` required
New feature	`brainstorming` + `writing-plans` + TDD
Cross-module refactor	Plan + verification strongly recommended
Pre-merge / pre-deploy	`requesting-code-review` + `verification-before-completion`

Skills should add friction proportional to the blast radius of getting it wrong.
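The table above amounts to a lookup from task type to required workflow. Here is one possible encoding; the `TaskKind` labels are invented for illustration, and mapping the table's "verify" and "Plan" entries to specific skill names is an interpretation rather than something the table states outright.

```typescript
type TaskKind =
  | "typo-fix"
  | "small-change"
  | "unclear-bug"
  | "new-feature"
  | "cross-module-refactor"
  | "pre-merge";

const requiredSkills: Record<TaskKind, string[]> = {
  "typo-fix": [], // direct action; still check the output by hand
  "small-change": ["verification-before-completion"],
  "unclear-bug": ["systematic-debugging"],
  "new-feature": ["brainstorming", "writing-plans", "test-driven-development"],
  "cross-module-refactor": ["writing-plans", "verification-before-completion"],
  "pre-merge": ["requesting-code-review", "verification-before-completion"],
};

// Returns a copy so callers cannot mutate the shared table.
function skillsFor(kind: TaskKind): string[] {
  return requiredSkills[kind].slice();
}

const forFeature = skillsFor("new-feature");
// forFeature is ["brainstorming", "writing-plans", "test-driven-development"]
```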

Three Skills to Start With

If you're integrating Superpowers into an existing project, don't try to use everything at once. Start with three:

1. `systematic-debugging`

Tell the agent:

"Use systematic-debugging. Do not modify any code yet. List your root cause hypotheses first, then we'll validate them one by one."

This stops the shotgun-patch reflex before it starts.
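The "hypotheses first, code later" rule can be expressed as a gate over a hypothesis log. The sketch below is an assumption about how such a log might look, not how `systematic-debugging` tracks state, and the example causes are invented.

```typescript
type Status = "open" | "ruled-out" | "confirmed";

interface Hypothesis {
  cause: string;
  status: Status;
  evidence: string[]; // observations that support the current status
}

// Code changes are allowed only once exactly one hypothesis is confirmed
// and that confirmation is backed by at least one piece of evidence.
function mayModifyCode(hypotheses: Hypothesis[]): boolean {
  const confirmed = hypotheses.filter((h) => h.status === "confirmed");
  const first = confirmed[0];
  return confirmed.length === 1 && first !== undefined && first.evidence.length > 0;
}

const beforeValidation: Hypothesis[] = [
  { cause: "date range filter excludes the last day", status: "open", evidence: [] },
  { cause: "permission check runs after the query", status: "open", evidence: [] },
];

const afterValidation: Hypothesis[] = [
  { cause: "date range filter excludes the last day", status: "confirmed", evidence: ["failing test reproduces with a one-day range"] },
  { cause: "permission check runs after the query", status: "ruled-out", evidence: ["request log shows the check runs first"] },
];

const canPatchEarly = mayModifyCode(beforeValidation); // false: nothing confirmed yet
const canPatchNow = mayModifyCode(afterValidation);    // true: one cause confirmed with evidence
```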

2. `writing-plans`

Before any non-trivial feature or change:

"Use writing-plans. Produce an executable plan first. I'll confirm before you implement anything."

This surfaces scope creep before it happens, not after you're reviewing a 500-line diff.

3. `verification-before-completion`

Add this to your project's CLAUDE.md or AGENTS.md:

"Before declaring any task complete, use verification-before-completion. Run tests, verify in browser or device, report exactly what you checked and what the result was."

This closes the gap between "I think it works" and "I confirmed it works."

The Broader Pattern: Startup Superpowers

Startup Superpowers, a companion project for startup validation, illustrates why this pattern generalizes beyond coding.

It codifies hypothesis tracking, competitor research, customer interviews, and MVP scoping into loadable agent protocols. Available slash commands:

Command	Purpose
`/whats-next`	Assess current stage, recommend next action
`/competitors`	Map direct and indirect competitors
`/market-research`	Research customers, pricing, and trends
`/hypotheses`	Write testable hypotheses with evidence tracking
`/interviews`	Design scripts and analyze transcripts
`/surveys`	Design surveys and manage responses
`/mvp`	Design the minimum testable product

Everything is stored as Markdown in a startup/ directory—version-controllable, agent-readable, no SaaS dependency.

That's the actual pattern: take a repeatable professional workflow, encode it as agent steps with defined inputs and outputs, make it loadable in any session, and store all state in files the agent can read and write. The AI doesn't get smarter. The process gets stable.

Summary

Superpowers Skills solves a specific problem: AI coding agents that know how to write code but don't know how to do software development.

The six questions it forces an agent to answer before declaring a task complete:

Did you clarify the requirements before implementing?
Did you make a verifiable plan before writing code?
Did you write tests before the implementation?
Did you find the root cause before patching?
Did you get a review before merging?
Did you actually verify—not just assume—that it works?

Without workflow constraints, developers have to ask these questions themselves, every session, every task. With Superpowers, the constraints are stable, loadable, and consistent across sessions, developers, and projects.

If you're using AI coding agents in real projects today, start with three skills: systematic-debugging, writing-plans, and verification-before-completion. They won't make development magical. They'll make your agent behave like a collaborator with engineering discipline instead of one without it.

Superpowers: github.com/obra/superpowers
Startup Superpowers: github.com/SergeiGorbatiuk/startup-superpowers
