HIROKI II

203k Stars — How I Finally Made Claude Code, Codex & Cursor Follow the Rules

Thursday night, 11 PM.
I asked my AI agent to design a dashboard data panel. It said "done."
I ran git diff.
It had modified 12 files. Five of them had nothing to do with the panel I asked for. It had rewritten the entire sidebar layout CSS, deleted a utility file I assumed it wouldn't touch, and introduced style conflicts in three separate places.
I sat in front of my screen for two minutes. Not angry. Something deeper — a bone-tired exhaustion.
I started counting. Over the past three months, my AI agents had built components, tweaked styles, prototyped features — everything. Token bills: roughly ¥4,500.
Time? I didn't want to calculate. But I did anyway.
Each round of "fixing the bug the agent created" averaged 20 minutes. At least ten times a week. That is about 200 minutes a week; over three months (roughly 13 weeks), more than forty hours.

Forty-plus hours. Spent fixing code my AI wrote for me.

The problem isn't that agents aren't smart enough

I started reviewing the wreckage and realized it boils down to three root causes.
First: Sprinting without a direction.
You tell an AI agent "design this dashboard panel." It doesn't ask — desktop or mobile? What design system are you using? What's the existing component tree structure? It just starts generating hundreds of lines of code. When it's done, what it built and what you needed are two completely different things.
Second: Scope wandering.
An AI agent is happily working on your task when it decides to "also fix" something else. You ask it to adjust some modal spacing — it refactors your entire design token system. You ask it to add a button — it rewrites your layout module. You ask it to run a component test — it decides your test directory structure is wrong and reorganizes everything.
You can't say it's not trying. But the direction it's trying in is orthogonal to what you asked for.
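Scope wandering is easy to detect mechanically, even if it is hard to prevent with words. Here is a minimal sketch in Python, an illustration rather than anything taken from Superpowers: it takes the file list you would get from `git diff --name-only` and reports everything outside the directories the task was allowed to touch. The function name `out_of_scope` and the file paths are hypothetical.

```python
from pathlib import PurePosixPath


def out_of_scope(changed_files: list[str], allowed_dirs: list[str]) -> list[str]:
    """Return the changed files that sit outside every allowed directory."""
    allowed = [PurePosixPath(d) for d in allowed_dirs]
    strays = []
    for name in changed_files:
        path = PurePosixPath(name)
        # A file is in scope if one of the allowed directories is among its parents.
        if not any(root in path.parents for root in allowed):
            strays.append(name)
    return strays


# Hypothetical output of `git diff --name-only` after a "tweak the modal" task.
changed = [
    "src/components/Modal/index.tsx",
    "src/components/Modal/Modal.css",
    "src/layout/Sidebar.css",
    "src/utils/format.ts",
]
strays = out_of_scope(changed, ["src/components/Modal"])
print(strays)  # ['src/layout/Sidebar.css', 'src/utils/format.ts']
```

A check like this only tells you after the fact that the agent wandered. It is a smoke detector, not a fence.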
Third: Premature victory declarations.
The agent says "done." Its output says "task succeeded." You run it — Storybook errors, unit tests failing, dark mode completely ignored. You asked for a modal with a confirmation dialog. The agent wrote it assuming a UI library was already installed. It wasn't.
Imagine hiring a genius programmer. Types faster than anyone, knows every language, produces a week's output in a day. But has three fatal habits — starts coding without confirming requirements, wanders off to refactor unrelated modules mid-task, and declares "done" without ever running the output. Would you let them push to main?

That's what we're doing every day with AI agents.

The core contradiction

AI agents are, at their core, high-throughput code generators. But software engineering demands low-entropy, incremental delivery. The two are fundamentally in conflict.

The faster an agent writes code, the more frequently you step on landmines. Speed isn't the solution. Speed is an amplifier. It takes every flaw in your existing workflow and cranks it up by 10x.

Why writing longer prompts won't save you

I tried. I really did.
I wrote a 300-line project instruction file. Added constraints. Examples. Explicit prohibitions. Fine-tuned the system prompt.
Did it help? A little. Did it last? No.
Agents periodically relapse. Like a brilliant coworker who simply refuses to listen.
You tell it "don't modify unrelated files." It remembers — for five turns. Then the context window scrolls, and it forgets.
You tell it "write tests first." It nods. Then it writes assert True.
This isn't the agent's fault. You're relying on textual suggestions to enforce something that requires architectural guarantees.
Think of it like putting a "Please don't speed" sign on a highway instead of installing speed cameras. One is a suggestion. The other is an enforcement mechanism. They are not in the same league.
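To make the sign-versus-camera difference concrete, here is a small Python sketch of a "camera" for the `assert True` problem. It is an illustration under my own assumptions, not part of Superpowers: it parses test source with the standard `ast` module and reports the line numbers of assertions whose condition is a constant truthy value. The function name `vacuous_asserts` and the sample tests are hypothetical.

```python
import ast


def vacuous_asserts(source: str) -> list[int]:
    """Return line numbers of asserts whose condition is a constant truthy value."""
    lines = []
    for node in ast.walk(ast.parse(source)):
        # `assert True` or `assert "ok"` can never fail, so they test nothing.
        if isinstance(node, ast.Assert) and isinstance(node.test, ast.Constant):
            if node.test.value:
                lines.append(node.lineno)
    return sorted(lines)


# The source is only parsed, never executed, so `render_title` need not exist.
sample = '''
def test_modal_opens():
    assert True

def test_modal_title():
    assert render_title("Confirm") == "Confirm"
'''
print(vacuous_asserts(sample))  # [3]
```

A prompt can ask for real tests. A check like this can refuse to continue until they exist.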
The author of Superpowers clearly hit this wall too.
They didn't create "a better prompt template." They turned software engineering best practices — requirements clarification, workspace isolation, task decomposition, TDD, code review, branch cleanup — into non-skippable steps.

Not suggesting the agent do this. Making it impossible to do anything else.

So how does it actually work?

Imagine opening your AI agent with Superpowers installed.
You say: "Design this dashboard data panel."
The agent stops. It doesn't start writing code. Instead, it fires back a few questions: Desktop or mobile? Which design system? What's the existing component tree? What breakpoints need to be covered?
This is brainstorming — Superpowers' first gate. It forces the agent to confirm three things before writing a single line: scope, existing structure, and expected outcome.
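The idea of a gate can be sketched in a few lines of Python. This is a hypothetical illustration, not how Superpowers implements brainstorming: the `Brief` class and `may_start_coding` function are names I made up, and the three fields simply mirror the three things the agent has to confirm.

```python
from dataclasses import dataclass


@dataclass
class Brief:
    """The three answers the agent must collect before writing code."""
    scope: str = ""
    existing_structure: str = ""
    expected_outcome: str = ""

    def missing(self) -> list[str]:
        # Any field that is empty or whitespace-only is still unanswered.
        return [name for name, value in vars(self).items() if not value.strip()]


def may_start_coding(brief: Brief) -> bool:
    """The gate: coding is allowed only when nothing is missing."""
    return not brief.missing()


brief = Brief(scope="dashboard data panel, desktop only")
print(may_start_coding(brief))  # False
print(brief.missing())          # ['existing_structure', 'expected_outcome']
```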
"Sprinting without direction" — blocked at step one.
Okay, scope is clear. Next, the agent says: "Let me set up an isolated workspace."
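This is a worktree. Designing the dashboard? Fine, but your main branch is untouched. Your layout module is untouched. Your global styles are untouched. The agent works in a separate checkout on its own branch. A worktree is still a full checkout, so the agent can see (and edit) the other files, but "scope wandering" stays contained: stray changes land on a throwaway branch you can review and discard, not in the tree you're working in.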
Remember that time the agent rewrote your entire sidebar layout when you just wanted to tweak the nav? The worktree is the solution for that exact disaster.
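For reference, the underlying git feature is `git worktree add`. The Python sketch below shows one way to create an isolated worktree on a new branch. It is an illustration, not the Superpowers skill itself; the helper names, the path, and the branch name are hypothetical.

```python
import subprocess


def worktree_command(path: str, branch: str) -> list[str]:
    """Build the command `git worktree add -b <branch> <path>`."""
    return ["git", "worktree", "add", "-b", branch, path]


def create_worktree(repo_dir: str, path: str, branch: str) -> None:
    """Create a new worktree on a new branch; raises if git reports an error."""
    subprocess.run(worktree_command(path, branch), cwd=repo_dir, check=True)


# Hypothetical path and branch name for the dashboard task.
cmd = worktree_command("../dashboard-panel", "feature/dashboard-panel")
print(" ".join(cmd))  # git worktree add -b feature/dashboard-panel ../dashboard-panel
```

If the agent's branch turns out to be a mess, `git worktree remove` on that path plus deleting the branch throws the whole thing away without ever touching main.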
Then the agent writes a plan. Not "design dashboard" — that's too vague. A plan that reads: "Add dark mode support after line 85 in src/components/Modal/index.tsx, run existing 3 Storybook tests to confirm no regressions."
This is planning — breaking requirements into 2-to-5-minute tasks. Each with a file path, a code snippet, and a validation criterion.
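A plan like that is structured data, which means it can be checked. Here is a hedged Python sketch: the `PlanTask` fields and the `plan_problems` function are my own hypothetical names, and the `estimated_minutes` field is an assumption I added so the 2-to-5-minute rule has something to test.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class PlanTask:
    file_path: str          # where the change goes
    change: str             # what to do, ideally with a snippet
    validation: str         # how to confirm it worked
    estimated_minutes: int  # assumption: the plan records an estimate


def plan_problems(tasks: list[PlanTask]) -> list[str]:
    """Return human-readable reasons why a plan is too vague to execute."""
    problems = []
    for i, task in enumerate(tasks, start=1):
        if not task.file_path:
            problems.append(f"task {i}: no file path")
        if not task.validation:
            problems.append(f"task {i}: no validation criterion")
        if not 2 <= task.estimated_minutes <= 5:
            problems.append(f"task {i}: not a 2-to-5-minute task")
    return problems


good = PlanTask(
    file_path="src/components/Modal/index.tsx",
    change="Add dark mode support after line 85",
    validation="Run the existing 3 Storybook tests",
    estimated_minutes=5,
)
vague = PlanTask(file_path="", change="design dashboard", validation="", estimated_minutes=60)
print(plan_problems([good, vague]))
```

The first task passes; the second is flagged three times, once for each thing "design dashboard" fails to say.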
Every task goes to an independent subagent for execution. When it's done, the agent doesn't just say "okay." There are two gates. Gate one: is it correct (spec compliance)? Gate two: is it good (code quality)?
Then the most uncompromising step — TDD.
Not a suggestion to write tests first. A requirement.
The agent must write a failing test first. It must see the red light. Only then is it allowed to write the code that makes it pass. Green light. Only then can it refactor. Skip the test? The workflow won't let you proceed.
Now you see how "premature victory declarations" get blocked. The agent says "done" — but the TDD red light hasn't even lit up yet. It literally can't say "done."
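The red-green-refactor rule is a small state machine, and a sketch makes the blocking behavior visible. The Python below is a hypothetical illustration, not Superpowers code; `TddGate` and its method names are my own.

```python
class TddGate:
    """Tracks red -> green -> refactor for a single task."""

    def __init__(self) -> None:
        self.phase = "need_failing_test"

    def record_test_run(self, passed: bool) -> None:
        if self.phase == "need_failing_test":
            if passed:
                # A test that passes before any implementation proves nothing.
                raise RuntimeError("test passed before implementation: no red light seen")
            self.phase = "need_passing_test"
        elif self.phase == "need_passing_test":
            if passed:
                self.phase = "may_refactor"
        elif not passed:
            # A refactor broke something: back to making the test pass.
            self.phase = "need_passing_test"

    def may_write_implementation(self) -> bool:
        return self.phase == "need_passing_test"

    def may_declare_done(self) -> bool:
        return self.phase == "may_refactor"


gate = TddGate()
print(gate.may_declare_done())      # False: no test has run yet
gate.record_test_run(passed=False)  # red
gate.record_test_run(passed=True)   # green
print(gate.may_declare_done())      # True
```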
Then comes code review. Security vulnerabilities? Rejected. Logic defects? Blocked from merging. Style issues? Flagged but not blocked.
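That severity policy fits in a few lines. The sketch below is a hypothetical Python rendering of it, with made-up names (`Finding`, `review`) and made-up findings; it only encodes the rule as described here: security and logic findings block the merge, style findings are reported but do not.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Finding:
    category: str  # "security", "logic", or "style"
    message: str


BLOCKING = {"security", "logic"}


def review(findings: list[Finding]) -> tuple[bool, list[str], list[str]]:
    """Return (mergeable, blockers, warnings) for a list of review findings."""
    blockers = [f.message for f in findings if f.category in BLOCKING]
    warnings = [f.message for f in findings if f.category not in BLOCKING]
    return (not blockers, blockers, warnings)


# Hypothetical findings from one review pass.
findings = [
    Finding("style", "inconsistent spacing in Modal.css"),
    Finding("logic", "confirm handler fires twice on double click"),
]
mergeable, blockers, warnings = review(findings)
print(mergeable)  # False
print(blockers)   # ['confirm handler fires twice on double click']
print(warnings)   # ['inconsistent spacing in Modal.css']
```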

Finally — finishing. Not "okay push it." Run full validation, then four options: merge, PR, keep, discard. Your call.
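Put together, "non-skippable" means the steps form a fixed sequence and nothing runs out of order. Here is a minimal Python sketch of that idea. It is an illustration only: the `Workflow` class is hypothetical, and the step order is my reading of the walkthrough above, not something taken from the Superpowers source.

```python
# Assumed order, based on the walkthrough in this post.
STEPS = (
    "brainstorming",
    "worktree",
    "planning",
    "subagent execution",
    "tdd",
    "code review",
    "finishing",
)


class Workflow:
    """Allows steps to complete only in the fixed order above."""

    def __init__(self) -> None:
        self.completed: list[str] = []

    def complete(self, step: str) -> None:
        if len(self.completed) == len(STEPS):
            raise RuntimeError("workflow already finished")
        expected = STEPS[len(self.completed)]
        if step != expected:
            raise RuntimeError(f"cannot run {step!r} yet: next step is {expected!r}")
        self.completed.append(step)


flow = Workflow()
flow.complete("brainstorming")
try:
    flow.complete("tdd")  # trying to jump ahead
except RuntimeError as err:
    print(err)  # cannot run 'tdd' yet: next step is 'worktree'
```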

Architecture: how the enforcement actually works

Superpowers isn't one big monolith. It has four layers.
| Layer | What it does |
|-------|-------------|
| Distribution | Packages skills into different agent platforms. Different AI agents get their corresponding harness. One skill set, multi-platform delivery. |
| Enforcement | The most elegant layer. When the agent starts, project instruction files are injected directly into context. The first thing the agent reads isn't "hello, I'm your assistant" — it's "these are the process rules you must follow." |
| Execution | Brainstorming, TDD, worktree, subagent, code review — all implemented as callable skill files. |
| Verification | Hooks and tests that check whether the skills were actually followed in real agent sessions. |

A traditional skill system is a toolbox sitting in the corner. The agent can use it, but it can also ignore it.

Superpowers is a toolbox mounted at the entrance, with a sign that says "you can't open the door without taking a tool from here."

The meta-skill loop is an underrated feature. Superpowers includes a skill called writing-skills — it teaches you how to write new Superpowers skills. Think there's a missing "security audit" step? Write it with writing-skills and drop it into the workflow. The framework evolves itself.

But is it perfect?

I'm a Superpowers advocate. But I won't sugarcoat it.
Token consumption goes up 2-3x. Seven steps. Each step requires the agent to process significant context. Brainstorming: hundreds of tokens of conversation. Planning: another few hundred. The TDD cycle (red, green, refactor) takes at least three rounds. What used to cost 1,000 tokens now costs 2,000 to 3,000. Your API bill doubles or triples. This is a real cost.
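To put that multiplier next to a real number, here is a back-of-the-envelope calculation in Python using the roughly ¥4,500 three-month token bill from the start of this post. The function name is hypothetical, and it assumes the 2-3x overhead applies evenly to the whole bill, which is a simplification.

```python
def overhead_range(baseline: float, low: float = 2.0, high: float = 3.0) -> tuple[float, float]:
    """Scale a baseline cost by the low and high overhead multipliers."""
    return (baseline * low, baseline * high)


# Baseline: the ~4,500 yen three-month bill mentioned earlier.
low_cost, high_cost = overhead_range(4500)
print(low_cost, high_cost)  # 9000.0 13500.0
```

So the same three months would land somewhere between ¥9,000 and ¥13,500 under that assumption.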
Process friction is real. Sometimes you just need to add a single if statement. Three lines of code. Superpowers will take you through brainstorming, planning, subagent, TDD, code review, finishing. You want to hang a picture on the wall — Superpowers hands you a full construction plan: geological survey, structural calculation, building permit, inspection, acceptance.
Is a three-line if statement worth this process? No. But the design philosophy is "rather do too much than too little."
Installation isn't trivial. This isn't npm install and done. You need to understand the four-layer architecture, configure harnesses for each platform, adjust project instruction file priority, ensure hooks trigger correctly. When something breaks, the debugging chain is long.
TDD isn't optional. If TDD isn't your thing, Superpowers will be painful. It's not a toggle — it's a core flow constraint.
PR acceptance rate is punishing. 94% of PRs are rejected. That sounds brutal. But from another angle, it's the price of methodological purity. A "process framework" that accepts too many compromises stops being a process framework and becomes a "suggestion collection."

Not a good fit for: One-off scripts, quick prototype validation, simple conversational tasks, teams without Git/testing habits, severely constrained token budgets.

The alternatives: Amplifier vs. Speckit vs. Superpowers

Superpowers isn't alone in this space. Microsoft Amplifier and GitHub Speckit are working on similar problems.
| Dimension | Superpowers | Microsoft Amplifier | GitHub Speckit |
|-----------|-------------|-------------------|----------------|
| Core focus | Enforcement + TDD framework | Dev assistant framework | Requirements-driven dev |
| Constraint strength | Strong (non-skippable) | Partial | Strong |
| Cross-platform | 11 platforms | Microsoft ecosystem | GitHub ecosystem |
| TDD mandatory | Yes — core flow | Suggested, not required | Not involved |
| Community | 203k stars | Smaller | Smaller |
| Install complexity | Medium-high | Low | Medium |
| Meta-skill loop | Yes | No | No |
| Token overhead | 2-3x | ~1.5x | 1.5-2x |

One-liner: Amplifier is a Microsoft-ecosystem dev assistant. Speckit is a GitHub-ecosystem requirements-driven tool. Superpowers is (currently) the only cross-platform framework willing to bake TDD and code review directly into the agent's execution path.

Is it worth it?

My answer is layered.
Install it now if:

- You use AI agents daily for real projects, not tinkering.
- You've been burned by agents that "write fast but break things" and were genuinely hurt by it.
- You believe in TDD, code review, and branch discipline.
- Your project has medium or higher complexity.
- You can stomach a 2-3x token bill in exchange for not having to personally review every single line of generated code.

Don't install it yet if:

- You only occasionally ask AI to write small scripts.
- Your token budget is tight.
- You don't already have Git and testing habits. Build those foundations first.
- You see AI as a "faster typist."

Final advice: If you experience "the agent wrote it but now I have to fix it" at least twice a week — invest the time to set up Superpowers. The cost is real. But the cost of not having it might be higher.

I installed it. Day three.
The AI agent asked me, before designing a component: "Can you confirm the scope is limited to modifying the modal component only, and does not involve global layout changes?"
I stared at that line for five seconds.

Not because it was smart. But because, finally — under constraint — it started working like an actual engineer.

Published: 2026-06-26 · Cover by KD Agentic
