DEV Community

mufeng

How to Make AI Coding Agents Actually Follow Engineering Process

The problem isn't that AI coding agents write bad code.

The problem is that they skip steps.

Ask an agent to fix a bug—it reads a few files, guesses a cause, patches the code. Ask it to add a feature—it starts writing before anyone's agreed on what the feature actually does. Ask it to refactor—it touches unrelated files, reformats half the codebase, and hands you a diff too large to review.

None of this is stupidity. It's the absence of process discipline.

Software development has always required workflow constraints: clarify before implementing, plan before coding, test before shipping, debug root causes rather than symptoms, and verify before declaring done. The question is whether your AI agent follows them or bypasses them entirely.

Superpowers is a plugin framework for Claude Code and Codex that encodes those constraints as loadable, composable agent workflows. This post covers what it is, when to use it, and how to get started.


What "Skills" Actually Are

The word "skill" is overloaded in AI contexts. Here it means something specific: a workflow protocol that loads into an agent session and constrains how the agent approaches a category of task.

Not "be more careful." Not a style guide. A specific sequence of steps with defined inputs, outputs, and verification gates.

The analogy is a checklist for a surgeon or a pilot—not because either lacks expertise, but because cognitive discipline under pressure requires procedural anchors.

The core Superpowers Skills cover the major failure modes in AI-assisted development:

| Skill | Failure Mode It Prevents | What It Produces |
|---|---|---|
| brainstorming | Implementing the wrong thing | Clarified scope with edge cases surfaced |
| writing-plans | Drifting mid-implementation | Executable task list: file scope + verification per step |
| test-driven-development | "Works on my machine" guesswork | RED-GREEN-REFACTOR cycles that lock behavior first |
| systematic-debugging | Shotgun-patching symptoms | Root-cause hypotheses, evidence-based elimination, minimal fix |
| verification-before-completion | "Should be done" claims | Actual test runs, browser paths, or device checks |
| requesting-code-review | Merging unreviewed code | Severity-ranked risk list before merge |
| using-git-worktrees | Task bleed across workstreams | Isolated workspaces with a clean baseline |

These aren't independent tips—they chain into a complete development pipeline:

Vague requirement
  → brainstorming  (scope + edge cases)
  → writing-plans  (executable task list)
  → test-driven-development  (behavior locked by tests)
  → requesting-code-review  (risks surfaced)
  → verification-before-completion  (actually verified)
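One way to read the chain is as a sequence of gates: each stage hands over evidence, or the run stops. The TypeScript below is a hypothetical model to make that idea concrete, not Superpowers' implementation; `Stage`, `runPipeline`, and the evidence strings are invented for illustration.

```typescript
// Hypothetical model of the skill chain: each stage must return evidence
// (a non-empty list of what it checked) before the next stage runs.
type StageResult = { evidence: string[] };

interface Stage {
  skill: string;            // e.g. "brainstorming"
  run: () => StageResult;   // does the work and reports what it checked
}

function runPipeline(stages: Stage[]): string[] {
  const entries: string[] = [];
  for (const stage of stages) {
    const { evidence } = stage.run();
    // Gate: a stage with no evidence halts the run instead of
    // silently handing control to the next skill.
    if (evidence.length === 0) {
      throw new Error(`${stage.skill} produced no evidence; stopping`);
    }
    entries.push(`${stage.skill}: ${evidence.join("; ")}`);
  }
  return entries;
}

// Usage with stubbed stages standing in for real agent work.
const pipelineLog = runPipeline([
  { skill: "brainstorming", run: () => ({ evidence: ["scope agreed", "5 edge cases listed"] }) },
  { skill: "writing-plans", run: () => ({ evidence: ["3 tasks, each with a verify step"] }) },
  { skill: "test-driven-development", run: () => ({ evidence: ["test failed first, then passed"] }) },
]);
console.log(pipelineLog.join("\n"));
```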

The Key Insight: Process Errors vs. Code Errors

AI agents will get better at writing correct code over time. They won't automatically get better at following process—unless process is encoded somewhere.

The failures Superpowers Skills prevent aren't syntax errors or logic bugs. They're:

  • Building the wrong feature because nobody asked the right clarifying questions
  • Writing code that "looks complete" but has zero coverage on the edge cases that matter
  • Patching a symptom while the root cause persists
  • Refactoring that expands scope until the diff is unmergeable
  • Shipping because the agent said "done" without running anything

A more capable model doesn't fix these. A faster agent arguably makes them worse—more code written in the wrong direction before anyone catches it.


A Real Example: Adding Invoice Export

Imagine you tell an agent: "Add a billing export feature."

Without workflow constraints, it will probably find the billing service, write an endpoint, add a download button, and report completion. Whether that implementation handles empty data, unauthorized requests, large datasets, or export format edge cases depends entirely on whether the model guessed right.

With Superpowers Skills, the flow looks like this:

Step 1: brainstorming

Before touching any files, the agent surfaces questions:

  • Export format: PDF, CSV, or Excel?
  • Date range limits?
  • Permission checks required?
  • Sync download or async background job?
  • What does the user see on failure?
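Each of those questions is an unmade decision. A minimal sketch, assuming you keep them as data (the `ScopeDecision` type, the `unresolved` helper, and the sample answers are hypothetical), shows how implementation can be held back until every question has an answer:

```typescript
// Hypothetical record of the brainstorming questions for invoice export.
// `answer` stays undefined until a human has made the decision.
interface ScopeDecision {
  question: string;
  answer?: string;
}

const decisions: ScopeDecision[] = [
  { question: "Export format: PDF, CSV, or Excel?", answer: "CSV" },
  { question: "Date range limits?" },
  { question: "Permission checks required?", answer: "billing admins only" },
  { question: "Sync download or async background job?" },
  { question: "What does the user see on failure?" },
];

// Returns the questions that still need a decision.
function unresolved(items: ScopeDecision[]): string[] {
  return items.filter((d) => d.answer === undefined).map((d) => d.question);
}

const openQuestions = unresolved(decisions);
if (openQuestions.length > 0) {
  console.log(`Not ready to implement. Open questions:\n- ${openQuestions.join("\n- ")}`);
}
```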

This isn't bureaucracy. This is the list of decisions that will otherwise get made silently—by the model, in the wrong direction.

Step 2: writing-plans

A compliant plan doesn't say "implement invoice export." It says:

1. Add exportInvoiceCsv(userId, range) to billing service.
   Verify: unit tests covering empty data, normal data, unauthorized access.

2. Wire export endpoint in API routes.
   Verify: 403 on missing permissions, valid text/csv response on success.

3. Add download button to billing page.
   Verify: file downloads on click, loading and error states render correctly.
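Because every task must name its files and its verification, a plan can be checked mechanically before any code is written. This sketch uses a hypothetical `PlanTask` shape and made-up file paths; the third task is the aspirational "implement invoice export" version, which the check rejects:

```typescript
// Hypothetical shape of one executable plan step.
interface PlanTask {
  description: string;
  files: string[];   // the only files this step may touch
  verify: string[];  // how we will know the step is done
}

// Returns a list of problems; an empty list means every task is executable.
function validatePlan(tasks: PlanTask[]): string[] {
  const problems: string[] = [];
  tasks.forEach((task, i) => {
    if (task.files.length === 0) problems.push(`task ${i + 1}: no file scope`);
    if (task.verify.length === 0) problems.push(`task ${i + 1}: no verification gate`);
  });
  return problems;
}

const plan: PlanTask[] = [
  {
    description: "Add exportInvoiceCsv(userId, range) to billing service",
    files: ["src/billing/service.ts", "src/billing/service.test.ts"],
    verify: ["unit tests: empty data, normal data, unauthorized access"],
  },
  {
    description: "Wire export endpoint in API routes",
    files: ["src/api/routes.ts"],
    verify: ["403 on missing permissions", "text/csv response on success"],
  },
  {
    description: "Implement invoice export", // aspirational: no scope, no gate
    files: [],
    verify: [],
  },
];

console.log(validatePlan(plan));
```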

Every task has a file scope and a verification gate. That's what makes it executable instead of aspirational.

Step 3: test-driven-development

Tests first. Not as documentation—as behavior contracts:

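// Jest-style test. The import path is hypothetical; adjust it to wherever
// exportInvoiceCsv lives. This version receives already-loaded invoices,
// so it pins down the CSV formatting step of the plan's first task.
import { exportInvoiceCsv } from "./billingService";

describe("exportInvoiceCsv", () => {
  it("exports invoices as csv rows", () => {
    const csv = exportInvoiceCsv([
      { id: "inv_001", amount: 1999, currency: "USD" },
      { id: "inv_002", amount: 2999, currency: "USD" },
    ]);

    // Header row plus one row per invoice.
    expect(csv).toContain("id,amount,currency");
    expect(csv).toContain("inv_001,1999,USD");
    expect(csv).toContain("inv_002,2999,USD");
  });
});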

Write the failing test. Confirm it fails. Implement the minimum to pass. Confirm it passes. Then refactor. The order matters.
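For the GREEN step, the minimum that satisfies the test above is a header line plus one row per invoice. This sketch assumes, as the test does, that `exportInvoiceCsv` receives invoice records directly; the `Invoice` interface and the reading of `amount` as minor units (cents) are assumptions. It deliberately skips CSV quoting until a failing test asks for it:

```typescript
// Shape taken from the test fixtures; amount is assumed to be in cents.
interface Invoice {
  id: string;
  amount: number;
  currency: string;
}

// Minimum implementation to turn the test green: header + one row per invoice.
// No quoting or escaping yet; that should arrive only when a failing test
// (e.g. a field containing a comma) demands it.
function exportInvoiceCsv(invoices: Invoice[]): string {
  const header = "id,amount,currency";
  const rows = invoices.map((inv) => `${inv.id},${inv.amount},${inv.currency}`);
  return [header, ...rows].join("\n");
}

console.log(exportInvoiceCsv([{ id: "inv_001", amount: 1999, currency: "USD" }]));
// prints:
// id,amount,currency
// inv_001,1999,USD
```

With this sketch an empty list yields the header line alone; the plan's empty-data test is where that behavior would be confirmed or changed.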

Step 4: requesting-code-review

Before merge, the review targets:

  • Does this match the agreed plan?
  • Any authorization gaps?
  • Large dataset edge cases?
  • Unhandled error states?
  • Files changed outside the agreed scope?
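The scope question is the easiest one to automate. A sketch, assuming the changed paths come from something like `git diff --name-only main...HEAD` and the allowed paths from the plan (the `outOfScope` helper and all paths are hypothetical):

```typescript
// Compares changed paths against the files the plan allowed.
// `changed` could be the lines printed by `git diff --name-only main...HEAD`.
function outOfScope(changed: string[], allowed: string[]): string[] {
  const allowedSet = new Set(allowed);
  return changed.filter((file) => !allowedSet.has(file));
}

const changedFiles = [
  "src/billing/service.ts",
  "src/api/routes.ts",
  "src/utils/date.ts", // not in the plan
];
const planScope = ["src/billing/service.ts", "src/billing/service.test.ts", "src/api/routes.ts"];

const outsideScope = outOfScope(changedFiles, planScope);
console.log(outsideScope.length === 0 ? "diff within scope" : `outside scope: ${outsideScope.join(", ")}`);
```

Anything this flags is not automatically wrong, but it belongs in the review conversation rather than slipping through in a large diff.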

Step 5: verification-before-completion

Depending on project type:

| Project Type | Verification Method |
|---|---|
| Web app | Start dev server, walk the critical path in browser |
| Backend service | Run tests, type check, hit the endpoint |
| CLI tool | Run the command, check actual output |
| iOS app | Test on real device (especially IAP, StoreKit, permissions) |
| SDK / Library | Unit tests + integration tests + example project |

The principle: evidence over claims. "I think it's done" is not verification.
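"Evidence over claims" can be made structural: a completion report that records what was run and what was observed, and refuses to say "done" otherwise. The `Check` type, `completionReport` function, endpoint path, and sample results below are invented for illustration:

```typescript
// One piece of evidence: what was run and what actually happened.
interface Check {
  command: string;   // e.g. "npm test", or a described browser path
  passed: boolean;
  observed: string;  // what was seen, not what was expected
}

// Reports "done" only if at least one check ran and every check passed.
function completionReport(checks: Check[]): { done: boolean; report: string } {
  if (checks.length === 0) {
    return { done: false, report: "No checks run: cannot claim completion." };
  }
  const lines = checks.map((c) => `${c.passed ? "PASS" : "FAIL"} ${c.command}: ${c.observed}`);
  return { done: checks.every((c) => c.passed), report: lines.join("\n") };
}

const verification = completionReport([
  { command: "npm test", passed: true, observed: "all tests passed" },
  { command: "npx tsc --noEmit", passed: true, observed: "no type errors" },
  { command: "GET /api/invoices/export (no session)", passed: false, observed: "200, expected 403" },
]);
console.log(verification.done);
console.log(verification.report);
```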


How to Install

Claude Code

/plugin install superpowers@claude-plugins-official

Or via the Superpowers marketplace:

/plugin marketplace add obra/superpowers-marketplace
/plugin install superpowers@superpowers-marketplace

Codex CLI

/plugins

Search for superpowers, then select Install Plugin.

Codex App

Sidebar → Plugins → Coding category → Superpowers → +


When to Use vs. Skip

Not every task needs a full workflow. A typo fix doesn't need a plan. A one-liner doesn't need TDD.

The right mental model is risk-proportional discipline:

| Task | Recommended Approach |
|---|---|
| Typo fix, config lookup | Direct action; just verify the output |
| Single-file small change | Optional workflow; at minimum, verify |
| Bug with unclear root cause | systematic-debugging required |
| New feature | brainstorming + writing-plans + TDD |
| Cross-module refactor | Plan + verification strongly recommended |
| Pre-merge / pre-deploy | requesting-code-review + verification-before-completion |

Skills should add friction proportional to the blast radius of getting it wrong.
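The table can also be read as a lookup from task type to the minimum skills to load. A hypothetical encoding follows: the `TaskKind` labels and the prompt wording are assumptions, the skill names come from the table, and "plan + verification" is read as writing-plans plus verification-before-completion.

```typescript
type TaskKind =
  | "typo-or-config"
  | "single-file-change"
  | "unclear-bug"
  | "new-feature"
  | "cross-module-refactor"
  | "pre-merge";

// Minimum skills per task kind, following the risk-proportional table.
const requiredSkills: Record<TaskKind, string[]> = {
  "typo-or-config": [],
  "single-file-change": [],
  "unclear-bug": ["systematic-debugging"],
  "new-feature": ["brainstorming", "writing-plans", "test-driven-development"],
  "cross-module-refactor": ["writing-plans", "verification-before-completion"],
  "pre-merge": ["requesting-code-review", "verification-before-completion"],
};

// Builds an instruction for the agent; "verify the output" applies to every
// row of the table, so it is appended as a baseline for all task kinds.
function promptFor(kind: TaskKind): string {
  const skills = requiredSkills[kind];
  return skills.length === 0
    ? "Make the change directly, then verify the output."
    : `Use ${skills.join(", then ")}. Verify the output before reporting done.`;
}

console.log(promptFor("new-feature"));
```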


Three Skills to Start With

If you're integrating Superpowers into an existing project, don't try to use everything at once. Start with three:

1. systematic-debugging

Tell the agent:

"Use systematic-debugging. Do not modify any code yet. List your root cause hypotheses first, then we'll validate them one by one."

This stops the shotgun-patch reflex before it starts.
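Behind that prompt is simple bookkeeping: each hypothesis is open, ruled out with evidence, or confirmed with evidence, and the fix waits until exactly one is confirmed. A sketch with hypothetical names and invented example causes:

```typescript
type HypothesisStatus = "open" | "ruled-out" | "confirmed";

interface Hypothesis {
  cause: string;
  status: HypothesisStatus;
  evidence?: string; // expected once status leaves "open"
}

// A fix is allowed only when exactly one hypothesis is confirmed
// and every status change is backed by evidence.
function readyToFix(hypotheses: Hypothesis[]): boolean {
  const unsupported = hypotheses.some((h) => h.status !== "open" && !h.evidence);
  const confirmed = hypotheses.filter((h) => h.status === "confirmed");
  return !unsupported && confirmed.length === 1;
}

const hypotheses: Hypothesis[] = [
  { cause: "Timezone shift drops invoices at the range boundary", status: "open" },
  { cause: "Permission check rejects valid users", status: "ruled-out", evidence: "no 403s in server logs" },
  { cause: "Query returns only the first page of results", status: "open" },
];

console.log(readyToFix(hypotheses)); // false: nothing confirmed yet

// After validating the third hypothesis with real evidence:
hypotheses[2] = {
  cause: "Query returns only the first page of results",
  status: "confirmed",
  evidence: "export row count stops at the page size",
};
console.log(readyToFix(hypotheses)); // true
```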

2. writing-plans

Before any non-trivial feature or change:

"Use writing-plans. Produce an executable plan first. I'll confirm before you implement anything."

This surfaces scope creep before it happens, not after you're reviewing a 500-line diff.

3. verification-before-completion

Add this to your project's CLAUDE.md or AGENTS.md:

"Before declaring any task complete, use verification-before-completion. Run tests, verify in browser or device, report exactly what you checked and what the result was."

This closes the gap between "I think it works" and "I confirmed it works."


The Broader Pattern: Startup Superpowers

Startup Superpowers, a separate project that applies the same framework to startup validation, illustrates why this pattern generalizes beyond coding.

It codifies a professional workflow into loadable agent protocols for hypothesis tracking, competitor research, customer interviews, and MVP scoping. Available slash commands:

| Command | Purpose |
|---|---|
| /whats-next | Assess current stage, recommend next action |
| /competitors | Map direct and indirect competitors |
| /market-research | Research customers, pricing, and trends |
| /hypotheses | Write testable hypotheses with evidence tracking |
| /interviews | Design scripts and analyze transcripts |
| /surveys | Design surveys and manage responses |
| /mvp | Design the minimum testable product |

Everything is stored as Markdown in a startup/ directory—version-controllable, agent-readable, no SaaS dependency.

That's the actual pattern: take a repeatable professional workflow, encode it as agent steps with defined inputs and outputs, make it loadable in any session, and store all state in files the agent can read and write. The AI doesn't get smarter. The process gets stable.
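To make "store all state in files" concrete, here is a sketch that writes a hypothesis record as Markdown and reads it back, so a later session can resume from the file. The layout, the `TrackedHypothesis` type, and the sample content are invented for illustration; this is not Startup Superpowers' actual file format.

```typescript
interface TrackedHypothesis {
  statement: string;
  status: "untested" | "supported" | "refuted";
  evidence: string[];
}

// Render to a small Markdown document that an agent or a human can diff and edit.
function toMarkdown(h: TrackedHypothesis): string {
  const lines = [
    `# ${h.statement}`,
    "",
    `Status: ${h.status}`,
    "",
    "## Evidence",
    ...h.evidence.map((e) => `- ${e}`),
  ];
  return lines.join("\n") + "\n";
}

// Parse the same layout back. The status is cast, not validated,
// to keep the sketch short.
function fromMarkdown(md: string): TrackedHypothesis {
  const lines = md.split("\n");
  const statement = (lines[0] ?? "").replace(/^# /, "");
  const statusLine = lines.find((l) => l.startsWith("Status: ")) ?? "Status: untested";
  const status = statusLine.slice("Status: ".length) as TrackedHypothesis["status"];
  const evidence = lines.filter((l) => l.startsWith("- ")).map((l) => l.slice(2));
  return { statement, status, evidence };
}

const sampleHypothesis: TrackedHypothesis = {
  statement: "Freelancers will pay for one-click invoice export",
  status: "supported",
  evidence: ["Interview 1: exports invoices by hand every month", "Interview 2: asked for CSV export unprompted"],
};
console.log(toMarkdown(sampleHypothesis));
```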


Summary

Superpowers Skills solve a specific problem: AI coding agents that know how to write code but not how to do software development.

The six questions it forces an agent to answer before declaring a task complete:

  1. Did you clarify the requirements before implementing?
  2. Did you make a verifiable plan before writing code?
  3. Did you write tests before the implementation?
  4. Did you find the root cause before patching?
  5. Did you get a review before merging?
  6. Did you actually verify—not just assume—that it works?

Without workflow constraints, developers have to ask these questions themselves, every session, every task. With Superpowers, the constraints are stable, loadable, and consistent across sessions, developers, and projects.

If you're using AI coding agents in real projects today, start with three skills: systematic-debugging, writing-plans, and verification-before-completion. They won't make development magical. They'll make your agent behave like a collaborator with engineering discipline instead of one without it.


Superpowers: github.com/obra/superpowers
Startup Superpowers: github.com/SergeiGorbatiuk/startup-superpowers
