Yaohua Chen for ImagineX

Posted on Apr 6

A Claude Code Skills Stack: How to Combine Superpowers, gstack, and GSD Without the Chaos

#ai #sde #claude

One article to compare the frameworks, see where they overlap, and land on a stable three-layer practice.

Introduction

Claude Code has quickly become one of the most widely adopted AI coding tools. Individual developers, startups, and large engineering teams alike have integrated it into their daily workflows—writing production code, reviewing pull requests, debugging, and shipping features at a pace that was hard to imagine a year ago. As usage has scaled, so has the ecosystem around it. Claude Skills—composable, auto-invoked instruction sets that shape how the agent plans, builds, and verifies—have emerged as one of the most important extension points in Claude Code. They let you go beyond one-off prompts and encode repeatable workflows directly into the agent's behavior. In fact, Anthropic has doubled down on this direction: the latest version of Claude Code consolidates the previously separate "slash commands" and "skills" systems into a single, unified skills format, signaling that skills are now the canonical way to extend the agent.

With Skills now central to the experience, the community has rallied around a handful of open-source frameworks that package best practices into ready-made skill sets. The two most discussed stacks are Superpowers and gstack. Installing both sounds easy; in practice they can conflict, and piling frameworks on without a plan often makes the setup less stable, not more. So where do they differ, and how should you choose?

This post does three things:

Compare Superpowers and gstack on repos, features, and philosophy—the material below on stars, skill lists, and trade-offs.
Add a third layer many guides skip: GSD as a context / spec stabilizer so long-running work does not drift (informed by Tricia Notes Editorial’s three-layer framing).
End with a single playbook: who owns decision, context, and execution, and how to cherry-pick skills without blowing up token use or cognitive load.

The useful question is not only “Superpowers or gstack?” but: what are you missing—decision-making, durable context, or execution?

In one line: gstack thinks, GSD stabilizes, Superpowers executes.

Orientation: Three Layers, Not Only Two

What stays stable in practice is often not picking one framework over another, but a three-way division of labor.

Layer	Stack	Role
Decision / roles	gstack	Judgment from CEO, design, architecture, QA-style lenses—not only “how to code.”
Context / spec	GSD	Keeps spec, status, boundaries, and long-horizon context from rotting.
Execution	Superpowers	Requirement clarification → plan → TDD → acceptance as a closed loop.

How each is “strong”:

Superpowers — How work gets done; smooth execution loop.
gstack — What to do and whether it should be done; richer role-based judgment.
GSD — Not drifting; steadier specs and context over long chains.

Both Superpowers and gstack have gone viral. On the surface they add process to AI; in use, they help you think clearly about what matters. When the model codes fast, that is exactly when you need clear requirements and stable context—that is what most people still overlook.

Superpowers vs gstack: Quick Facts

Superpowers (GitHub ~137K stars)

Repository: obra/superpowers
An Agent Skills framework and software development methodology: 14 built-in skills across brainstorming, planning, TDD, execution, and verification.

gstack (GitHub ~65K stars)

Repository: garrytan/gstack
From YC CEO Garry Tan, open source.
Philosophy: a team beside you—CEO, designer, eng manager, release manager, doc engineer, QA, and more—23 opinionated tools (product thinking, CEO review, architecture review, real browser testing, design review, security audits, etc.).
Garry has claimed 600K+ lines of production code (35% tests) in 60 days, part-time while running YC full-time.

Stars are a weak proxy: a high star count does not mean every skill fits your workflow.

Feature Comparison (Superpowers vs gstack)

Category	Superpowers	gstack
Product brainstorming	brainstorming	/office-hours, /plan-ceo-review
Architecture planning	writing-plans	/plan-eng-review, /autoplan
Design	—	/design-consultation, /plan-design-review, /design-shotgun, /design-html
Development execution	executing-plans, subagent-driven-development, dispatching-parallel-agents	—
Testing	test-driven-development	/qa, /qa-only
Debugging	systematic-debugging	/investigate
Code review	requesting-code-review, receiving-code-review	/review, /codex
Verification & acceptance	verification-before-completion, finishing-a-development-branch	/ship, /land-and-deploy, /canary, /document-release
Security	—	/cso, /careful, /freeze, /guard, /unfreeze
Observability	—	/learn, /retro
Browser testing	—	/browse, /connect-chrome, /setup-browser-cookies
Git worktrees	using-git-worktrees	—
Skill management	using-superpowers, writing-skills	/gstack-upgrade
Performance	—	/benchmark
Deployment	—	/setup-deploy

Coverage differs a lot; quantity is not the point—design philosophy is.
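To make the coverage gap easier to see, the table can be loaded into a small data structure and queried. This is only a sketch: the skill names are copied from the table above, while the `COVERAGE` dictionary and the `gaps` helper are illustrative names and not part of either framework.

```python
# Rows copied from the comparison table above as
# category -> (Superpowers skills, gstack skills); an empty list is a blank cell.
COVERAGE = {
    "Product brainstorming": (["brainstorming"], ["/office-hours", "/plan-ceo-review"]),
    "Architecture planning": (["writing-plans"], ["/plan-eng-review", "/autoplan"]),
    "Design": ([], ["/design-consultation", "/plan-design-review",
                    "/design-shotgun", "/design-html"]),
    "Development execution": (["executing-plans", "subagent-driven-development",
                               "dispatching-parallel-agents"], []),
    "Testing": (["test-driven-development"], ["/qa", "/qa-only"]),
    "Debugging": (["systematic-debugging"], ["/investigate"]),
    "Code review": (["requesting-code-review", "receiving-code-review"],
                    ["/review", "/codex"]),
    "Verification & acceptance": (["verification-before-completion",
                                   "finishing-a-development-branch"],
                                  ["/ship", "/land-and-deploy", "/canary",
                                   "/document-release"]),
    "Security": ([], ["/cso", "/careful", "/freeze", "/guard", "/unfreeze"]),
    "Observability": ([], ["/learn", "/retro"]),
    "Browser testing": ([], ["/browse", "/connect-chrome", "/setup-browser-cookies"]),
    "Git worktrees": (["using-git-worktrees"], []),
    "Skill management": (["using-superpowers", "writing-skills"], ["/gstack-upgrade"]),
    "Performance": ([], ["/benchmark"]),
    "Deployment": ([], ["/setup-deploy"]),
}


def gaps(coverage):
    """Return (categories only gstack covers, categories only Superpowers covers)."""
    only_gstack = [cat for cat, (sp, gs) in coverage.items() if gs and not sp]
    only_superpowers = [cat for cat, (sp, gs) in coverage.items() if sp and not gs]
    return only_gstack, only_superpowers


if __name__ == "__main__":
    only_gstack, only_superpowers = gaps(COVERAGE)
    print("Only gstack:", ", ".join(only_gstack))
    print("Only Superpowers:", ", ".join(only_superpowers))
```

Run as a script, it lists six categories that only gstack covers and two that only Superpowers covers, which is the asymmetry the table shows at a glance.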

Design Philosophy: “How” vs “What” (and Where GSD Fits)

Superpowers — focused on how code gets built

The workflow centers on high-quality output: clarify, plan, TDD (tests before implementation), verify. Checkpoints at each step—little room to skip. In practice it feels disciplined: you ask for X, it tends to build X. Engineers who already know what to build often find that empowering.

(Execution-layer detail from hands-on use: strong process and steady execution; small tasks can still feel heavy because the full rhythm applies even to tiny asks.)

gstack — focused on what and what not to do

Before heavy coding, flows like /office-hours walk requirements; CEO and engineering reviews stress-test the approach. It is not only code—it can run real browser tests from a user angle. Rough split:

Decision layer: /office-hours, /plan-ceo-review, /plan-eng-review
Execution layer: /review, /qa, /ship, etc.

gstack shines when requirements are still fuzzy—PMs, indies, or “think while building.” Caveat: turning all roles on can feel bloated; decision skills also burn serious tokens (see below).

GSD — context / spec, not another “team chart”

GSD is not “install another team.” It is context engineering: goals, specs, status, boundaries, and summaries are anchored so that context rot slows down. Short demos hide this; long projects show it. When context wobbles, output scatters, and that is a state problem, not only “bad execution.”

gstack thinks but is not, by itself, a long-term context vault.
Superpowers executes but is not, by itself, a spec/context system.
GSD fills that gap so chains stay coherent.

Three-Way Comparison (Problems, Not “Who Wins”)

Dimension	Superpowers	gstack	GSD
Core question	How to get things done	What to do; whether it should	How to keep the project from diverging
Layer	Execution	Decision / roles	Context / spec
Strongest fit	Planning, TDD, acceptance loop	Multi-perspective judgment, review, QA	Context engineering; stable state
Best for	Clear requirements	Think-while-building	Long chains / many iterations
Common pain	Front-loaded process can feel heavy (details below)	Bloated and token-hungry when fully enabled (details below)	Little standalone “shipping” value on its own (details below)
Role	Own execution	Own decision-making	Own long-term context

Common Pain Points in Detail

Superpowers — front-loaded process can feel heavy. Every task, no matter how small, runs through the full cycle: clarify requirements, draft a plan, write tests first, then implement, then verify. For a large feature this rhythm pays off handsomely. For a two-line config fix or a quick copy change, the same ceremony kicks in and you end up spending more time on process than on the actual change. The overhead does not scale down with task size, so small requests can feel disproportionately slow.

gstack — bloated and token-hungry when fully enabled. Each gstack role (CEO, designer, architect, QA, etc.) injects its own perspective and prompts into the context. Turn them all on and a single execution-layer skill can consume 10K+ tokens before any real code is written. Daily usage burns through tokens fast, and the back-and-forth between multiple “virtual team members” can make even straightforward tasks feel sluggish and redundant. You may also encounter irrelevant meta-questions (e.g. “Are you applying to become a YC company?”) while your codebase is being scanned—artifacts of the framework’s opinionated persona layer.

GSD — little standalone “shipping” value. GSD excels at keeping specs, goals, and state anchored across long sessions. But if you use it alone, it does not directly produce code, run tests, or open a PR. It is a stabilizer, not a builder. Without an execution layer (Superpowers) or a decision layer (gstack) alongside it, GSD manages context that nothing acts on—useful plumbing, but no visible output. Its value only becomes apparent when paired with tools that actually ship work.

Practical takeaway: they are complements, not substitutes—Superpowers executes, gstack decides, GSD stabilizes specs and context over time.

Strengths, Weaknesses, and Friction

Superpowers

Strengths: Brainstorming and overall workflow feel solid; full process even on small asks can become smooth once habitual; execution and TDD are strong.
Weaknesses: Early decision skills (e.g. planning, brainstorming) are weaker than gstack’s decision layer, which is why many people pair gstack’s front end with Superpowers’ execution.

gstack

Strengths: The decision-layer skills (/office-hours, /plan-ceo-review, /plan-eng-review) stand out for positioning and approach review.
Weaknesses: Execution feels rougher than with Superpowers, and the token cost is real: a single execution-layer skill can cost 10K+ tokens, and heavy scans can feel like noisy “process” rather than help.
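A rough back-of-the-envelope calculation shows how quickly that cost adds up. The sketch below treats the 10K+ figure above as a per-skill floor, so the result is a lower bound. The 200,000-token context window is an assumption for illustration, and `skill_overhead` is a hypothetical helper, not something either framework provides.

```python
def skill_overhead(skills_invoked: int,
                   tokens_per_skill: int = 10_000,    # floor taken from the "10K+" figure
                   context_window: int = 200_000):    # assumed window size, adjust to your model
    """Return (tokens spent on skill prompts, share of the context window used)."""
    if skills_invoked < 0:
        raise ValueError("skills_invoked must be non-negative")
    overhead = skills_invoked * tokens_per_skill
    return overhead, overhead / context_window


if __name__ == "__main__":
    for count in (1, 3, 6):
        tokens, share = skill_overhead(count)
        print(f"{count} skill(s): at least {tokens:,} tokens, {share:.0%} of the window")
```

Under these assumptions, running three skills in one session already spends at least 30,000 tokens before any code is written, which is the practical argument for cherry-picking.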

The analogy

Superpowers is a scalpel — precise and efficient.

gstack is a full clinic — from diagnosis to aftercare.

Use the metaphor to choose depth: narrow execution vs full-spectrum product and review.

Consolidated Best Practices

1. Choose skills deliberately—do not install everything

Skill counts spiral easily (Superpowers today, gstack tomorrow, another stack next week). Selective deployment beats volume: with too many skills installed, invocation starts to feel random and unstable, and a higher skill count adds no clarity.

Underlying idea: both stacks are experiments in Harness Engineering. The mindset is leverage strengths, cover weaknesses—not “I want it all.”
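One way to make "choose deliberately" concrete is to look for skills whose descriptions say nearly the same thing before installing them side by side. The sketch below scores word overlap between descriptions with Jaccard similarity. How the agent actually picks a skill is not specified here, so word overlap is only a crude proxy. The skill names, descriptions, stopword list, and 0.4 threshold are all made up for illustration.

```python
import re
from itertools import combinations

# Small illustrative stopword list; extend it for real descriptions.
STOPWORDS = {"a", "an", "and", "the", "to", "of", "for", "or", "when",
             "use", "before", "any", "is", "in", "on", "with"}


def keywords(description):
    """Lowercased words of a description, minus stopwords."""
    return {w for w in re.findall(r"[a-z]+", description.lower()) if w not in STOPWORDS}


def jaccard(a, b):
    """Size of the intersection divided by size of the union (0.0 if both are empty)."""
    union = a | b
    return len(a & b) / len(union) if union else 0.0


def overlapping_pairs(skills, threshold=0.4):
    """Return (skill, skill, score) for every pair scoring at or above the threshold."""
    pairs = []
    for (name_a, desc_a), (name_b, desc_b) in combinations(skills.items(), 2):
        score = jaccard(keywords(desc_a), keywords(desc_b))
        if score >= threshold:
            pairs.append((name_a, name_b, round(score, 2)))
    return sorted(pairs, key=lambda pair: pair[2], reverse=True)


# Hypothetical descriptions, NOT the real text shipped by any framework.
EXAMPLE_SKILLS = {
    "stack-a-planning": "Use when planning a new feature before writing code",
    "stack-b-planning": "Use when planning a feature and reviewing architecture before code",
    "stack-a-tdd": "Use when implementing a change with tests written first",
}

if __name__ == "__main__":
    for name_a, name_b, score in overlapping_pairs(EXAMPLE_SKILLS):
        print(f"{name_a} <-> {name_b}: {score}")
```

With these invented descriptions, only the two planning skills are flagged. A flagged pair is a prompt to keep one, or to sharpen the descriptions so each states when it should and should not be used.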

2. Decision vs execution (the classic split)—then add context when needed

gstack for the decision layer (cherry-picked):

Prioritize high-value flows: e.g. /office-hours, /plan-ceo-review, /plan-eng-review for requirements and alignment—avoid over-investing in every role.

Superpowers for the execution layer:

Prefer Superpowers as the base for TDD, plan execution, and verification. If gstack already covers the decision phase, optionally de-emphasize Superpowers’ own heavy decision skills so small tasks do not inherit double process.

GSD when the chain diverges:

If work spreads across sessions and threads, add GSD so spec and state stay anchored. The point is anti-drift, not flash.
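The anti-drift idea can be made checkable with a few lines of code. The sketch below reports which of the GSD v2 artifact names mentioned in this article (PROJECT.md, DECISIONS.md, KNOWLEDGE.md, M001-ROADMAP.md) are absent from a directory. It assumes all four sit in a single folder, which may not match where GSD actually writes them, and `missing_artifacts` is a hypothetical helper rather than part of GSD.

```python
import tempfile
from pathlib import Path

# Artifact names as listed in this article for GSD v2; their location is an assumption.
GSD_V2_ARTIFACTS = ("PROJECT.md", "DECISIONS.md", "KNOWLEDGE.md", "M001-ROADMAP.md")


def missing_artifacts(root, expected=GSD_V2_ARTIFACTS):
    """Return the expected file names that do not exist directly under root."""
    root = Path(root)
    return [name for name in expected if not (root / name).is_file()]


if __name__ == "__main__":
    # Demo in a throwaway directory so nothing in a real project is touched.
    with tempfile.TemporaryDirectory() as tmp:
        (Path(tmp) / "PROJECT.md").write_text("# What this project is\n")
        print("Missing:", missing_artifacts(tmp))
```

A check like this could run at the start of a session: if the anchors are missing, long-horizon context has nothing to hold on to, regardless of which execution stack is installed.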

3. Stable workflow (three steps)

Decision → gstack — Start with /office-hours to stress-test the idea, then run /plan-ceo-review for a founder-level sanity check and /plan-eng-review to lock architecture and data flow. If design matters, add /plan-design-review. The goal: decide what to build and whether to build it before touching code.
Context → GSD — Once the decision is made, use GSD (v2) to anchor the plan: PROJECT.md for what the project is, DECISIONS.md for architectural choices, KNOWLEDGE.md for cross-session rules and patterns, and milestone roadmaps (M001-ROADMAP.md) for sliced execution. These v2 artifacts keep spec, status, and boundaries stable so context does not rot between sessions. (The original GSD uses REQUIREMENTS.md, ROADMAP.md, and STATE.md instead.)
Execution → Superpowers — With clear requirements and stable context in place, hand off to Superpowers’ execution loop: brainstorming (if lightweight refinement is still needed), writing-plans → executing-plans for implementation, test-driven-development for the RED-GREEN-REFACTOR cycle, requesting-code-review / receiving-code-review for review, and verification-before-completion → finishing-a-development-branch to close the loop. For parallel work, use dispatching-parallel-agents or subagent-driven-development.
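The three steps above can be written down as an ordered checklist. This is a sketch, not a runner: it only lists the skills and artifacts named in the steps, in order, and does not invoke anything. `Stage`, `build_workflow`, and both flags are illustrative names. The parallel-work skills are left out because the text does not say where they slot into the sequence.

```python
from __future__ import annotations

from dataclasses import dataclass


@dataclass(frozen=True)
class Stage:
    layer: str              # "Decision", "Context", or "Execution"
    stack: str              # framework that owns the stage
    steps: tuple[str, ...]  # skills or artifacts, in the order named above


def build_workflow(design_matters: bool = False,
                   needs_refinement: bool = False) -> list[Stage]:
    """Assemble the decision -> context -> execution checklist."""
    decision = ["/office-hours", "/plan-ceo-review", "/plan-eng-review"]
    if design_matters:
        decision.append("/plan-design-review")

    context = ["PROJECT.md", "DECISIONS.md", "KNOWLEDGE.md", "M001-ROADMAP.md"]

    execution = [
        "writing-plans", "executing-plans", "test-driven-development",
        "requesting-code-review", "receiving-code-review",
        "verification-before-completion", "finishing-a-development-branch",
    ]
    if needs_refinement:
        # Optional lightweight refinement before planning.
        execution.insert(0, "brainstorming")

    return [
        Stage("Decision", "gstack", tuple(decision)),
        Stage("Context", "GSD", tuple(context)),
        Stage("Execution", "Superpowers", tuple(execution)),
    ]


if __name__ == "__main__":
    for stage in build_workflow(design_matters=True):
        print(f"{stage.layer} ({stage.stack}): {' -> '.join(stage.steps)}")
```

Writing the sequence down once, even informally, makes it easier to notice when a small task is about to inherit a step it does not need.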

Merged tagline: gstack handles thinking, Superpowers handles doing, GSD keeps long context honest. Combining the strong decision slice of gstack with Superpowers’ execution (and GSD when needed) keeps skill count and collisions under control—similar to the author’s experience building a small tool on a weekend with a curated mix.

4. Final heuristics

Requirements still fuzzy → start with gstack (decision).
Work keeps diverging across the chain → add GSD (context).
You want execution steady and closed-loop → lean on Superpowers (execution).
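These heuristics reduce to a small lookup. The sketch below encodes them as written; the function name and its three flags are hypothetical, chosen only to mirror the wording of the list.

```python
def missing_layers(fuzzy_requirements: bool,
                   work_keeps_diverging: bool,
                   needs_steady_execution: bool) -> list[str]:
    """Map the symptoms from the heuristics above to the layer that addresses each."""
    layers = []
    if fuzzy_requirements:
        layers.append("gstack (decision)")
    if work_keeps_diverging:
        layers.append("GSD (context)")
    if needs_steady_execution:
        layers.append("Superpowers (execution)")
    return layers


if __name__ == "__main__":
    # Clear requirements, but a long-running project that keeps drifting.
    print(missing_layers(fuzzy_requirements=False,
                         work_keeps_diverging=True,
                         needs_steady_execution=True))
```

An empty result means none of the three symptoms applies, in which case adding another stack is unlikely to help.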

Stop asking only: “Superpowers or gstack?” Ask: Am I missing decision, context, or execution?

Closing

Skills do not get stronger because you install more of them. They get stronger when you understand what each layer does, combine the right pieces for the gap you actually have, and assemble a workflow that is yours.

References

Superpowers — github.com/obra/superpowers
gstack — github.com/garrytan/gstack
GSD (Get Shit Done) — github.com/gsd-build/get-shit-done (original) | github.com/gsd-build/gsd-2 (v2, standalone CLI)

Top comments (2)

Jared Sisk • May 21

Thanks. I've read several developer posts like this on how they ideally can be used for the different layers (decision, context, execution layer), but isn't there chaos in the model auto-invocation for overlapping skills? Each has auto-invoked skills for each of those steps, despite their different strengths in the steps, and can conflict with each other or a bit randomly be selected by the agents based on the keyword similarities.

Yaohua Chen ImagineX • May 21 • Edited

Simply put, yes, if the skill descriptions and definitions overlap heavily. That is why you need to read each skill's description carefully to understand what it is, when it should be used, and when it should not. A good skill follows best practices: a clear, structured description plus details on its definition, purpose, invocation conditions, workflow, tools, and so on.