
Vitor Norton


I tried to fork GSD, or: it's for vibe coders, not real devs

A senior engineer's deep dive into spec-driven AI development: the architecture, the maintenance math, a 250-fork graveyard, a rejected GitHub issue, and how I chose between GSD, GitHub Spec Kit, and OpenSpec.

TL;DR

  • I genuinely like GSD. Its research, planning, and roadmap/phase model are some of the best I've used. It still doesn't fit how I work.
  • My problem was never a missing setting. It was the category of tool: GSD is an autonomous orchestrator. I'm a senior engineer who wants a human-in-the-loop assistant.
  • I considered forking it into a "lite" version. The maintenance math says a divergent fork is the worst available option, and a scan of all 250 forks confirms it: not one is a workable starting point for me.
  • I'm switching to GitHub Spec Kit, with OpenSpec as a close runner-up. This is the entire reasoning, including the parts that argue against me.

Loving a tool that doesn't fit you

There's a specific kind of frustration that doesn't get written about much: when a tool is good, you can see the craft in it, and it still fights you on every project.

That's GSD for me. GSD ("Git. Ship. Done.") is a spec-driven, context-engineering framework for AI coding agents. It drives an agent through a disciplined loop (discuss → plan → execute → verify → ship), running the heavy research and planning work in fresh-context subagents so your main session never rots. The phase/roadmap model is excellent. The research step is excellent. I like the ceremony. For me it's a comprehension gate, the thing that forces the agent (and me) to actually understand the work before touching code.

And yet, every time I ran it on a real project, it worked against the way I work. I'm a senior engineer. My job is to be helped, not replaced. GSD is built to take the wheel.

I almost did the obvious engineer thing: fork it, file off the parts I don't like, ship my own "GSD lite". This post is what I found when I actually did the homework first, and why I'm doing something else entirely.

What GSD actually is, under the hood

Before you can reason about forking or replacing a tool, you have to know how it's wired. So I read the repo.

GSD is three layers stacked on top of each other:

| Layer | What it is | Do I care about it? | Cost to maintain in a fork |
|---|---|---|---|
| Markdown "brain" | commands/workflows/agents/references/templates/: the prompts that drive the whole loop | Yes, this is what I love | High, these are the most-churned files upstream |
| CJS tooling + installer | gsd-tools.cjs, ~104 library modules, plus a ~10,700-line multi-runtime installer and a set of hooks | Almost none | Medium/high, and it's work that gives me nothing |
| Config | .planning/config.json | This is where the cheap wins live | ~Zero |

Two facts fall out of this that matter for any customization decision:

The workflows aren't standalone prompts. They call gsd-tools constantly, to load context, update state, and commit. So "just lift the good prompts into my own tool" isn't a copy-paste. The prompts assume a 104-module CLI sitting next to them. Lifting the brain means lifting the whole nervous system.
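One rough way to see this coupling for yourself is to count how often the prompt files mention the CLI. The sketch below is only an illustration: the function name `count_tool_refs` is made up here, and the directory you point it at (for example a local checkout's workflows folder) is an assumption about where the Markdown lives.

```shell
# Hypothetical helper: count lines in Markdown files under a directory
# that mention a given tool name (fixed-string match, recursive).
count_tool_refs() {
  dir="$1"
  tool="$2"
  # --include limits the search to *.md; -F treats the name literally.
  grep -r --include='*.md' -F -- "$tool" "$dir" | wc -l | tr -d ' '
}

# Example call, assuming a checkout with a workflows/ directory:
#   count_tool_refs workflows gsd-tools
```

A high count relative to the number of workflow files is a quick signal that the prompts cannot be lifted out without the tooling beside them.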

Most of GSD's complexity is infrastructure I'd never touch. The installer alone supports 16+ runtimes (Claude Code, Codex, Gemini CLI, Cursor, Windsurf, Copilot, and more). There are 33 agents, ~67 skills, 11 hooks, a capability registry, and a supply-chain gate against hallucinated packages. It's impressive engineering. It's also 90% surface area I don't need but would inherit the moment I forked.

There's a name for what GSD is, and it's the crux of this entire post: GSD is an autonomous orchestrator. It spawns subagents, commits per task, and advances on its own. The human approves at checkpoints. That's by design, and it's made for a different operator than me.


The one hard "no": atomic per-task commits

If I strip my complaints down to the irreducible core, it's this: GSD commits automatically, one commit per task, as it executes. Here's the literal guidance from its git-integration.md:

| Task completed | YES | Atomic unit of work (1 commit per task) |
# How GSD stages and commits during execution
git add src/api/auth.ts src/types/user.ts
git commit -m "feat(08-02): create user registration endpoint"

The workflow I actually want is human-in-the-loop:

  1. the agent does the work, locally, uncommitted
  2. it stops. I validate on my machine
  3. I give the order: now test it, run the checks, do the code review, open the PR
  4. the commit happens under my command, late, not per task, not automatically
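The four steps above can be sketched with plain git. This is a minimal illustration and not something GSD ships: the function name `land_phase` and the `CHECK_CMD` variable are hypothetical, and it assumes the agent has left its work uncommitted in the working tree.

```shell
# Hypothetical helper (not part of GSD): land a whole phase as one late commit.
# Assumes the agent left its work uncommitted in the working tree.
land_phase() {
  message="$1"
  # CHECK_CMD is an assumed variable: whatever runs your tests and linters.
  check_cmd="${CHECK_CMD:-true}"

  # Step 2: list everything the agent touched, tracked or untracked.
  git status --short

  # Step 3: run the checks on my order; refuse to commit if they fail.
  if ! sh -c "$check_cmd"; then
    echo "checks failed, nothing committed" >&2
    return 1
  fi

  # Step 4: one commit for the whole phase, made under my command.
  git add --all
  git commit --quiet -m "$message"
}
```

Something like `CHECK_CMD="npm test" land_phase "feat: user registration"` would then produce a single commit only after the checks pass. The code review and the PR stay manual on purpose.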

This is not a niche preference. The 2026 consensus on agentic workflows says the same thing in plain terms: "code review is the step most agentic workflows quietly drop," and the healthy pattern is "a human reviews the plan before execution begins, not after." GSD's auto-commit removes exactly the control points a senior engineer needs.

Plus, I can't touch the code myself mid-run. If I change even one line, it muddles the per-task commit history, and the agent then spends a few minutes (and lots of tokens) working out what happened. Yeah, I can write my own code, thank you.

So I opened an issue…

I filed issue #745: add a config option to defer all commits during execution, leave the working tree dirty, and let me review the whole phase as one diff before anything lands.

The maintainer's response was fast and unambiguous:

"this is how its designed, not interested in changing the design at this time. the commits are there to protect against loss of context or other. this would be a redesign of how it works, not an enhancement."

I want to be fair here, because this is the part that actually clarified everything: the maintainer is right, from inside GSD's design. If your tool is an autonomous agent that may lose its context window mid-run, then committing every task is a feature, a crash-recovery and context-recovery mechanism. My request wasn't a tweak to that design. It was a request for a different philosophy. We don't disagree about quality. We disagree about who holds the wheel.

But the practical consequence is sharp: I can't upstream my way out of this. The thing I most want to change is precisely what the maintainer defends as the core of the design, and will keep elaborating with every release, rightfully so. Any customization I keep would be fighting an actively-maintained opposite opinion.

The subtler tax: it looks like collaboration, right up until it isn't

Auto-commit is the complaint I can point at. The one that actually cost me is harder to name, because it wears the disguise of good work.

Here's the shape of it. I had a tiny, one-off task on a small personal repo: pull a handful of scattered notes into a single file. Twenty minutes by hand. I ran it through the proper loop anyway: discuss → plan → execute. Roughly two hours later it handed me a polished, completely wrong deliverable: a reusable, unit-tested tool for what was a one-time edit. Something I never asked for, never wanted, and now had to unwind.

The maddening part is where it went wrong. The discuss step is the part I like most, the comprehension gate, the thing that's supposed to catch exactly this. And it didn't. Not because discussion is bad, but because it has a grain: it latches onto one reading of what you said and drives hard in that direction, confidently, past the point where it still makes sense. It keeps asking sharp questions (which feels like rigor) while quietly walking off the edge of the scope you actually gave it. By the time the drift is visible it's already downstream in a plan, then in code, and you're the one paying to reel it back.

And the whole time it looks productive. It's debating. It's planning. It's spinning up research. Every signal says "this is going well," so you trust it, and the bill only arrives at the end, as hours spent generating something that misses the nuance you could never get it to hold. That's the real failure mode of a highly-opinionated orchestrator: the opinions are load-bearing, and when your intent doesn't match them the tool can't bend. It can only drift, eloquently.

And this isn't one unlucky run. On a recent 70-phase project I had to redo or restructure at least 50 of them. The ones that survived only did because I'd front-loaded a lot of time organizing the code and kept each phase small and tightly scoped, and even then, every ~5 phases I had to stop and fix things in the code by hand so the next 5 wouldn't wander off. The review-and-re-prompt loop isn't the exception. It's the steady state. A huge share of my time on these projects goes to catching the tool before it commits to the wrong idea.

The maintenance math I did next

If you can't upstream it, your options are fork, patch, config, or build from scratch. I worked through all four.

Config can turn things off. It can't reshape the flow. GSD's config.json is genuinely deep. You can lean it out a lot:

{
  "mode": "interactive",
  "granularity": "coarse",
  "workflow": {
    "research": false,
    "plan_check": false,
    "verifier": false,
    "discuss_mode": "assumptions"
  },
  "planning": { "commit_docs": false },
  "parallelization": false
}

But no key reshapes the flow. You can't tell config "merge discuss into plan," "change how discussion works," or "commit my way." Reshaping behavior means editing the workflow Markdown, the hottest files in the repo. Which lands you in the maintenance trap.

The fork-vs-patch comparison is less flattering to patches than people think:

| | Fork (tracking upstream) | Patch repo |
|---|---|---|
| Conflict frequency | Same, breaks when upstream edits your lines | Same |
| Conflict resolution | git merge is 3-way (knows the ancestor) → resolves better | git apply is 2-way → rejects more on moved code |
| Real advantage | "I own the tree" | Legibility: your delta is a small, separate artifact |
| As a public product | Installs cleanly | Bad UX: two-step install, version-coupled rejects hit your users |

The only overlay pattern that survives upstream churn cleanly is additive files marked merge=ours: you only ever add files, never edit existing ones, so a periodic sync never conflicts. That's powerful, but it has a hard limit: it works for things you can add (your config, a launcher, your own skills, a host-side wrapper) and never for behavior edits. My change is a behavior edit. So it's stuck in the expensive lane no matter what.
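For readers who have not used it, here is a minimal, self-contained sketch of the merge=ours mechanism in a throwaway directory. One detail worth knowing: `merge=ours` in .gitattributes only names a merge driver, and git honors it once a driver called `ours` is defined in config, commonly as `true` (a command that exits 0 and leaves our version untouched). The repository layout, file names, and the `g` wrapper below are assumptions made for the demo.

```shell
#!/bin/sh
# Throwaway demo of a merge=ours overlay; every name here is illustrative.
set -e
demo=$(mktemp -d)
g() { git -c user.name=demo -c user.email=demo@example.com "$@"; }

# A stand-in "upstream" repo with one file.
g init --quiet "$demo/upstream"
cd "$demo/upstream"
echo "upstream notice v1" > NOTICE
g add NOTICE
g commit --quiet -m "initial"

# Our fork: define the driver, mark NOTICE as ours, then edit it.
g clone --quiet "$demo/upstream" "$demo/fork"
cd "$demo/fork"
g config merge.ours.driver true   # defines the driver that merge=ours names
echo "NOTICE merge=ours" > .gitattributes
echo "our notice" > NOTICE
g add .gitattributes NOTICE
g commit --quiet -m "overlay: keep our NOTICE"

# Upstream later edits the same file.
cd "$demo/upstream"
echo "upstream notice v2" > NOTICE
g commit --quiet -am "update notice"

# Periodic sync: both sides changed NOTICE, yet the merge completes.
cd "$demo/fork"
g fetch --quiet origin
g merge --quiet --no-edit '@{upstream}'
merged_notice=$(cat NOTICE)
echo "$merged_notice"
```

On a stock git install the merge should finish without a conflict and NOTICE should still hold our version. Note that the driver is consulted when both sides changed a file, which is why this protects files you own and does nothing for edits inside upstream's workflow Markdown.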

The conclusion that reframed my whole decision: for a public tool that re-opinionates the flow, a fork is the worst quadrant. It makes you maintain 90% infrastructure you don't want and fight merge conflicts on exactly the files you customized.

The fork graveyard

I didn't want to argue this from theory, so I looked at the evidence: every fork of the repo.

250 forks. Not one is a successful divergent "lite." The breakdown:

  • The vast majority are stale snapshots carrying an old name the upstream itself used: untouched personal copies, not projects.
  • A few are README rebrands or translations, one commit of real work.
  • One is a runtime adapter with zero commits on its main branch despite an ambitious description.
  • The only serious one is an integration wrapper, and it's the most instructive object in the whole study.
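A first-pass triage like the one above can be partly mechanized. The sketch below is an assumption about how one might do it, not the exact method used for this scan: it expects one line per fork in the form `<fork> <commits-ahead-of-upstream>`, however you gather those numbers, and the name `divergent_forks` and the default threshold of 2 are invented for illustration.

```shell
# Hypothetical filter: read "<fork> <commits-ahead>" lines on stdin and
# print only forks with at least MIN commits of their own (default 2).
divergent_forks() {
  min_ahead="${1:-2}"
  awk -v min="$min_ahead" '$2 + 0 >= min { print $1 }'
}

# Example:
#   printf 'alice/gsd 0\nbob/gsd 1\ncarol/gsd 14\n' | divergent_forks
```

Stale snapshots (0 ahead) and one-commit README rebrands fall out immediately, which leaves a short list that still has to be read by hand.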

That serious fork belongs to a company embedding GSD into a product. Their entire diff against upstream is this:

added     .gitattributes              (merge=ours rules)
added     scripts/sync-upstream.sh
added     NOTICE / CONTRIBUTING-UPSTREAM.md
modified  README.md                   (their banner only)

Their README states the strategy outright: "GSD itself runs unmodified." They wrap it host-side via the Claude Agent SDK and intercept the approval checkpoints in their own orchestrator. And their .gitattributes is the merge=ours trick in the wild:

# Keep "ours" on merge with upstream so periodic
# sync-upstream.sh runs don't conflict on files we maintain.
NOTICE                    merge=ours
scripts/sync-upstream.sh  merge=ours
.gitattributes            merge=ours

The reframe that changed everything

Here's the thing I wish I'd understood on day one. When you line GSD up against the rest of the field, a split appears that has nothing to do with features:

  • Autonomous orchestrators (GSD, BMAD): they take the wheel. Spawn subagents, commit on their own, advance on their own. The human approves occasionally.
  • Spec/planning layers (GitHub Spec Kit, OpenSpec, Agent OS, Taskmaster): they hand you structure and hold the door. They generate spec/plan/tasks, and execution runs through your normal agent, where you're already in the loop and commits are yours by default.

The landscape

Spec-driven development exploded in 2026. A community map tracks 30+ frameworks. Mapped against what I, as a senior engineer, actually care about:

| Tool | Category | Commit / HIL | Keeps ideas open? | Research/planning depth | Cost to me |
|---|---|---|---|---|---|
| GSD | Autonomous orchestrator | Auto-commit + auto-advance | Locks requirements early | High (4 parallel researchers + plan-check + verifier) | Fork = high |
| OpenSpec | Spec layer | You drive; no auto-commit | Yes, per-change proposals | Lean | Adopt = low |
| Spec Kit | Spec layer | Human review each phase | Yes, per-feature | Medium (/clarify, /plan) | Adopt = low |
| Agent OS | Standards layer | "Complements, doesn't replace" | Encodes your standards | Low | Adopt = low |
| BMAD | Multi-agent orchestrator | Approves docs, but heavy | PRD + arch + epics locked | High | Adopt = medium; ceremony enormous |
| Taskmaster | Task layer | You drive | Incremental | Low | Adopt = low |

The two that fit a senior engineer who likes ceremony but wants the wheel are GitHub Spec Kit and OpenSpec. So I went deep on both.

Spec Kit vs OpenSpec, for real

The difference between them is conceptual, not a matter of commands. In one line each:

  • Spec Kit = "specify the feature, then build it." The unit is the feature. You start from a constitution (project principles). Each feature flows spec → plan → tasks → implement, on its own git branch.
  • OpenSpec = "propose a change as a delta to a living spec." The unit is the change. There's a specs/ folder (current truth) and a changes/ folder (proposals). Each proposal folds into specs/ only when you apply and archive it.

That's feature-centric vs change-centric, and nearly everything else derives from it.

| Axis | GitHub Spec Kit | OpenSpec |
|---|---|---|
| Mental model | Spec the feature → build | Delta change over a living spec |
| Structure | constitution.md, specs/<feature>/{spec,plan,tasks}.md | specs/ (truth) + changes/<name>/{proposal,specs,design,tasks}.md |
| Git | Creates a branch per feature (001-...) | You manage git; the "change" is the review unit |
| Ceremony | Higher (constitution, /clarify, checklists) | Deliberately lean ("no personas, fluid") |
| Greenfield vs brownfield | Both, greenfield-leaning | Brownfield-first |
| Encode "my way" | Constitution every spec inherits | Conventions in agent files (lighter) |
| Commit behavior | Yours (scaffolds a branch, no per-task auto-commit) | Yours |
| Ecosystem | GitHub-backed, 105 extensions, 22 presets, 200+ contributors | Fission-AI, ~54k★, healthy but smaller |

The same task through both, to feel it, "add dark mode to an existing app":

# Spec Kit
/speckit.specify   → branch 002-dark-mode + spec.md  (you review)
/speckit.plan      → plan.md + research.md            (you review)
/speckit.tasks     → tasks.md
/speckit.implement → builds; you commit

# OpenSpec
/opsx:propose dark mode → changes/add-dark-mode/ (proposal + tasks)  (you review)
/opsx:apply             → implements; you validate locally
/opsx:archive           → the delta folds into specs/

In both, the commit and the PR are mine.

About benchmarks (and why they lie a little)

You'll see a benchmark passed around in which one CRM dashboard was built three ways: OpenSpec ~12 min, Spec Kit ~90 min, BMAD ~5.5 h. Useful for order-of-magnitude, but it's one task and one reviewer. So I ran my own measurement against something real.

The method is simple and reproducible: read the git commit timestamps. On a real production project, I took three consecutive GSD phases, each wiring one external integration, and measured wall-clock from the first to the last commit of each phase.

| Phase | Span (first → last commit) | Duration | Plans |
|---|---|---|---|
| A | small, single adapter | ~31 min | 1 |
| B | larger, 4 plans / 3 waves | ~44 min | 4 |
| C | medium (+ a mid-phase re-scope) | ~50 min | ~3 |
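The timestamp method can be reproduced with a few lines of shell. This is a sketch under assumptions: the function name `span_minutes` is made up, and the usage comment assumes commit subjects carry a phase tag in the style of the earlier example (`feat(08-02): ...`), so that a phase can be selected with `--grep`.

```shell
# Hypothetical helper: read Unix commit timestamps (one per line) on stdin
# and print the first-to-last span in whole minutes.
span_minutes() {
  sort -n | awk '
    NR == 1 { first = $1 }
            { last = $1 }
    END     { if (NR) print int((last - first) / 60) }'
}

# Usage inside a repo; %ct is the committer date as a Unix timestamp:
#   git log --format=%ct --grep='(08-' | span_minutes
```

Committer dates change when history is rebased, so this only reflects wall-clock time on history that was left as the agent wrote it.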

Average ~40 minutes per phase. But the revealing part is the breakdown. In the most complete phase, code generation itself was the small part. The bulk was discuss + plan + research + verify. In another, the pure execution was about 3 minutes for three plans. The rest was context-gathering and a scope decision I made.

The takeaway: GSD's wall-clock isn't the machine grinding. It's the ceremony, the part I actually value.

That reframes "is it slow?" entirely. On a rough scale it lands between OpenSpec and BMAD, but who cares, because the time goes where I want it to. The real lesson for choosing a replacement: a tool with similar ceremony (Spec Kit) will feel about the same. A leaner tool (OpenSpec) is faster mostly because it cuts the comprehension gate I like. Speed was never my axis.

The verdict

For my posture (a senior engineer, existing projects, a backlog in triage, someone who likes ceremony as a comprehension gate, whose only hard "no" is atomic per-task commits), I'm going with GitHub Spec Kit. Mapping it to what I actually weighed:

| My criterion | Winner | Why |
|---|---|---|
| Ceremony as a comprehension gate | Spec Kit | Heavier on purpose, OpenSpec strips it by philosophy |
| Research + planning depth | Spec Kit | /clarify, /plan, research.md, data-model.md, contracts/ |
| Branch-per-feature | Spec Kit | Native; OpenSpec leaves git to you |
| Roadmap / phases | Spec Kit | Not built in, but the extension ecosystem can add whole phases |
| No atomic commits (my one "no") | Tie | Neither auto-commits per task |
| Encode my own standards | Spec Kit | constitution.md is the native lever |
| One standard, longevity | Spec Kit | GitHub-backed, MIT, biggest ecosystem |
| Brownfield + in-tool triage | OpenSpec | But I already triage in Notion, so it's partly redundant for me |

OpenSpec is the runner-up, and an excellent tool: leaner, brownfield-first, with a genuinely elegant change-as-delta model. The only thing that should flip my decision is a hands-on test where Spec Kit's ceremony feels like friction instead of comprehension on my existing repos. That's the test I'll run next.

Sources & further reading

  • GSD: open-gsd/gsd-core · original gsd-build/get-shit-done · issue #745
  • GitHub Spec Kit · Spec Kit docs
  • OpenSpec (Fission-AI)
  • BMAD-METHOD · Agent OS · claude-task-master
  • Spec-Driven Development: a map of 30+ frameworks (2026)
  • BMAD vs Spec Kit vs OpenSpec (Reenbit)
  • 9 Best Spec-Driven Tools 2026 (MarkTechPost)
