After using Superpowers, my AI agent finally feels like a real companion

#ai #superpowers #agents #agentskills

I've been doing agentic development for a while, handing real tasks to coding agents and letting them write and refactor while I steer. The failure mode was always the same. I'd describe something at the wrong altitude, the agent would happily fill the gaps with its own assumptions, and we'd both sprint confidently in the wrong direction. The fix was never "a smarter model." It was structure.

That's what Superpowers gives you. It's an open-source methodology by Jesse Vincent (MIT, actively maintained, currently on v5.x) that ships as a set of composable skills plus the instructions that make the agent actually use them. It runs on Claude Code, Codex, Cursor, Gemini CLI, GitHub Copilot CLI and a few others. After a few weeks of running it on real work, this is my honest feedback, most of it aimed at the one skill that changed how I operate: brainstorming.

How to install it, and how it works

GitHub repo:

https://github.com/obra/superpowers

Install is a one-liner per agent. In Claude Code you run

/plugin install superpowers@claude-plugins-official

Gemini CLI uses

gemini extensions install https://github.com/obra/superpowers

Cursor uses

/add-plugin superpowers

and Codex and Copilot CLI pull it from their own marketplaces.

There's no command to remember afterward. A meta-skill called using-superpowers hooks the agent's loop: on each message it checks whether a skill applies and, critically, before the agent enters plan mode it asks "have we already brainstormed?" If not, brainstorming fires first. The skills are treated as mandatory workflows, not polite suggestions, and the agent announces which one it's running and tracks each checklist item as a todo.
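To make that gate concrete, here's a minimal Python sketch of the decision as I understand it. This is my own illustration, not Superpowers' implementation: `Session`, `next_skill` and the announcement string are hypothetical names I made up for the example.

```python
from dataclasses import dataclass, field
from typing import List, Optional


@dataclass
class Session:
    """Hypothetical per-conversation state (not a Superpowers type)."""
    brainstormed: bool = False
    announced: List[str] = field(default_factory=list)


def next_skill(session: Session, wants_plan_mode: bool) -> Optional[str]:
    """Pick the skill to run before the agent is allowed to plan.

    Assumption: the only rule modeled here is "no planning until
    brainstorming has happened". Real skill matching is richer.
    """
    if not wants_plan_mode:
        return None  # nothing to gate on this message
    skill = "writing-plans" if session.brainstormed else "brainstorming"
    # The agent announces which skill it is running.
    session.announced.append(f"Using skill: {skill}")
    return skill
```

The point of the sketch is the ordering: asking for a plan on a fresh session returns `"brainstorming"`, and only a session that has already brainstormed gets `"writing-plans"`.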

The full pipeline is what makes the whole thing coherent:

brainstorming turns a rough idea into an approved design and a written spec.
using-git-worktrees spins up an isolated branch and verifies a clean test baseline before any work starts.
writing-plans breaks the approved design into 2-to-5-minute tasks, each with exact file paths, the actual code, and verification steps. The bar is that an enthusiastic junior with no context could execute it.
subagent-driven-development dispatches a fresh subagent per task and runs a two-stage review on each one: first spec compliance, then code quality. (executing-plans is the batch alternative with human checkpoints.)
test-driven-development enforces real RED-GREEN-REFACTOR, and it will delete code that was written before its test.
requesting-code-review runs between tasks and blocks progress on critical findings.
finishing-a-development-branch verifies tests and hands you the merge, PR, keep or discard decision, then cleans up the worktree.
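The bar writing-plans sets for a task is easy to state as a check. Here's a rough Python sketch of it; `PlanTask`, its field names and `problems` are hypothetical, invented for illustration, and the real plans are written documents rather than objects like this.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class PlanTask:
    """Hypothetical shape of one plan task (not a Superpowers type)."""
    title: str
    file_paths: List[str]      # exact paths the task touches
    code: str                  # the actual code to write
    verification: List[str]    # commands or checks that prove it works
    estimated_minutes: float


def problems(task: PlanTask) -> List[str]:
    """Return the reasons a task falls short of the bar described above."""
    issues = []
    if not task.file_paths:
        issues.append("missing exact file paths")
    if not task.code.strip():
        issues.append("missing code")
    if not task.verification:
        issues.append("missing verification steps")
    if not 2 <= task.estimated_minutes <= 5:
        issues.append("outside the 2 to 5 minute range")
    return issues
```

A task that comes back with an empty list is one a junior with no context could plausibly pick up; anything else is still a wish, not a task.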

The underlying philosophy is explicit: tests first, systematic over ad hoc, complexity reduction as a primary goal, and evidence over claims. None of that is novel to anyone senior. The point is that the agent now follows it without me policing every step.

A closer look at brainstorming

Brainstorming is the front door, and it sets the contract for everything downstream. The moment the agent senses you're building something, it does not touch the keyboard. It explores the existing project context, then runs a Socratic loop: one question at a time, multiple choice when it can manage it, building each question on your last answer. It proposes two or three distinct approaches rather than committing to the first. Then it presents the design in sections short enough to actually read, writes the spec to docs/superpowers/specs/YYYY-MM-DD-<topic>-design.md, commits it, runs a self review for placeholders, contradictions and scope creep, and asks you to review the file before anything proceeds.
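The spec filename convention is simple enough to reproduce if you want to script around it. A small Python sketch follows; the directory and the date-topic-design pattern come from the skill as described above, but `spec_path` and the slug rule (lowercase, non-alphanumerics collapsed to hyphens) are my assumptions, not something the skill documents.

```python
import re
from datetime import date
from pathlib import PurePosixPath


def spec_path(topic: str, day: date) -> PurePosixPath:
    """Build the dated spec path for a topic.

    Assumption: the topic is slugified by lowercasing and replacing
    runs of non-alphanumeric characters with a single hyphen.
    """
    slug = re.sub(r"[^a-z0-9]+", "-", topic.lower()).strip("-")
    return PurePosixPath("docs/superpowers/specs") / f"{day.isoformat()}-{slug}-design.md"


print(spec_path("Rate Limiter", date(2025, 1, 15)))
# docs/superpowers/specs/2025-01-15-rate-limiter-design.md
```

Because the date leads the filename, a plain alphabetical listing of the specs directory doubles as a timeline of design decisions.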

Two design decisions stand out to me as an engineer. First, the skill will only ever hand off to writing-plans. It explicitly refuses to jump to frontend-design, mcp-builder or any implementation skill, which is what keeps "let's just scaffold something" from leaking into the design phase. Second, there's an optional visual companion that routes the question to the right surface: mockups, wireframes and architecture comparisons go to a browser, while requirements, tradeoffs and scope decisions stay as text. A UI topic isn't automatically a visual question, and the skill is careful about that distinction.

And underneath all of it sits one rule I found almost confrontational at first, in the best way. No code, no scaffolding, no implementation of any kind until the design is presented and approved. It applies to every project regardless of perceived complexity.

Feedback 1: the hard gate produces a durable artifact

The strongest part is also the simplest. There is a hard gate between intent and implementation, and the agent will not cross it without an explicit yes.

I expected this to be friction. Instead it's the part I'd fight to keep. Most of my rework with agents traced back to ambiguous intent meeting a literal executor. The gate forces the ambiguity out of the conversation and onto the page before a single line is generated. By the time code exists, the assumptions are visible and arguable.

The spec being a committed file, not a chat message, is the detail experts will appreciate. It's a real artifact in the repo with a date and a topic, sitting right next to the code it justifies. Months later it answers "why was this built this way," which is the question that usually has no answer in an agent-built codebase. It has quietly become the most useful documentation I have.

Feedback 2: scope assessment before depth

The second thing I value is subtle. Before the questioning even starts, brainstorming assesses scope. If you hand it something that's really several independent subsystems, it refuses to refine details and instead helps you decompose into sub-projects, each of which gets its own spec, plan and implementation cycle. That single behavior prevents the classic failure of producing one enormous, unbuildable spec, and it mirrors how a careful tech lead would push back on a vague epic.

Combined with the one-question-at-a-time loop, the effect on me was the part I didn't anticipate. I now decompose and think in approaches and tradeoffs before I ever open the agent. The skill trained the operator, not just the output.

The honest critique

It isn't free, and a few costs are worth naming for anyone evaluating it seriously.

The gate applies uniformly, so a one-line change gets the same ceremony as a new service. For trivial edits the design step reads as paperwork, and there are moments I just want the diff. The subagent-driven model also has a real price: a fresh subagent per task with two-stage review trades wall-clock time and tokens for fidelity, which is the right trade for substantial features and the wrong one for a quick spike. And the TDD enforcement is genuinely strict. Deleting code written before its test is principled, but it will fight you during exploratory work where you're still discovering the shape of the problem.

The saving grace is that the hierarchy is explicit and sane. Skills override the model's default behavior, but your direct instructions in CLAUDE.md, AGENTS.md or GEMINI.md always win. So if a skill says "always TDD" and your project says don't, your project wins. It's an opinionated default, not a cage. In practice I keep the gate on for anything spanning more than one file and dial it down for throwaway work.
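That precedence order is just a lookup that stops at the first layer with an opinion. Here's a minimal Python sketch of it; the layer names, the `resolve` function and the `tdd` setting are hypothetical, chosen only to mirror the example above, and nothing in Superpowers is configured through a dictionary like this.

```python
from typing import Dict, Optional

# Highest priority first: project instructions beat skills,
# and skills beat the model's default behavior.
PRECEDENCE = ("project_instructions", "skill", "model_default")


def resolve(setting: str, sources: Dict[str, Dict[str, str]]) -> Optional[str]:
    """Return the value from the highest-priority layer that defines it."""
    for layer in PRECEDENCE:
        value = sources.get(layer, {}).get(setting)
        if value is not None:
            return value
    return None


sources = {
    "model_default": {"tdd": "sometimes"},
    "skill": {"tdd": "always"},
    "project_instructions": {"tdd": "off"},
}
print(resolve("tdd", sources))  # off
```

Remove the `project_instructions` entry and the same call returns `always`, which is the "opinionated default" part of the deal.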

Why I keep it on

What hooked me wasn't speed; it was the collapse in surprises. The code now matches what I pictured because we agreed on the picture, in writing, first. PRs are smaller, the specs exist, and the loop feels less like dictation and more like pairing with someone who refuses to let me hand-wave through a design.

The irony is that the most valuable feature is a refusal. An agent that says "not yet, let's think" turned out to be far more useful than one that says "sure" to everything.

If you build with agents and you've felt that gap between what you asked for and what you got, it's worth a weekend. Superpowers is open source by Jesse Vincent and the team at Prime Radiant, and brainstorming is the right place to start.

Have you tried making your agent design before it codes? I'd like to hear whether the think-first gate helped you or got in your way.