Vitor Norton

Posted on Jun 15

I tried to fork GSD, or: it's for vibe coders, not real devs

#ai #agentskills #productivity #opensource

A senior engineer's deep dive into spec-driven AI development: the architecture, the maintenance math, a 250-fork graveyard, a rejected GitHub issue, and how I chose between GSD, GitHub Spec Kit, and OpenSpec.

TL;DR

I genuinely like GSD. Its research, planning, and roadmap/phase model are some of the best I've used. It still doesn't fit how I work.

My problem was never a missing setting. It was the category of tool: GSD is an autonomous orchestrator. I'm a senior engineer who wants a human-in-the-loop assistant.

I considered forking it into a "lite" version. The maintenance math says a divergent fork is the worst available option, and a scan of all 250 forks confirms it: not one is a viable starting point for me.

I'm switching to GitHub Spec Kit, with OpenSpec as a close runner-up. This is the entire reasoning, including the parts that argue against me.

Loving a tool that doesn't fit you

There's a specific kind of frustration that doesn't get written about much: when a tool is good, you can see the craft in it, and it still fights you on every project.

That's GSD for me. GSD ("Git. Ship. Done.") is a spec-driven, context-engineering framework for AI coding agents. It drives an agent through a disciplined loop (discuss → plan → execute → verify → ship), running the heavy research and planning work in fresh-context subagents so your main session never rots. The phase/roadmap model is excellent. The research step is excellent. I like the ceremony. For me it's a comprehension gate, the thing that forces the agent (and me) to actually understand the work before touching code.

And yet, every time I ran it on a real project, it worked against the way I work. I'm a senior engineer. My job is to be helped, not replaced. GSD is built to take the wheel.

I almost did the obvious engineer thing: fork it, file off the parts I don't like, ship my own "GSD lite". This post is what I found when I actually did the homework first, and why I'm doing something else entirely.

What GSD actually is, under the hood

Before you can reason about forking or replacing a tool, you have to know how it's wired. So I read the repo.

GSD is three layers stacked on top of each other:

Layer	What it is	Do I care about it?	Cost to maintain in a fork
Markdown "brain"	`commands/` → `workflows/` → `agents/` → `references/` → `templates/`: the prompts that drive the whole loop	Yes, this is what I love	High, these are the most-churned files upstream
CJS tooling + installer	`gsd-tools.cjs` • ~104 library modules, plus a ~10,700-line multi-runtime installer and a set of hooks	Almost none	Medium/high, and it's work that gives me nothing
Config	`.planning/config.json`	This is where the cheap wins live	~Zero

Two facts fall out of this that matter for any customization decision:

The workflows aren't standalone prompts. They call gsd-tools constantly, to load context, update state, and commit. So "just lift the good prompts into my own tool" isn't a copy-paste. The prompts assume a 104-module CLI sitting next to them. Lifting the brain means lifting the whole nervous system.

Most of GSD's complexity is infrastructure I'd never touch. The installer alone supports 16+ runtimes (Claude Code, Codex, Gemini CLI, Cursor, Windsurf, Copilot, and more). There are 33 agents, ~67 skills, 11 hooks, a capability registry, and a supply-chain gate against hallucinated packages. It's impressive engineering. It's also 90% surface area I don't need but would inherit the moment I forked.

There's a name for what GSD is, and it's the crux of this entire post: GSD is an autonomous orchestrator. It spawns subagents, commits per task, and advances on its own. The human approves at checkpoints. That's by design, and it's made for a different operator than me.

The one hard "no": atomic per-task commits

If I strip my complaints down to the irreducible core, it's this: GSD commits automatically, one commit per task, as it executes. Here's the literal guidance from its git-integration.md:

| Task completed | YES | Atomic unit of work (1 commit per task) |

# How GSD stages and commits during execution
git add src/api/auth.ts src/types/user.ts
git commit -m "feat(08-02): create user registration endpoint"

The workflow I actually want is human-in-the-loop:

the agent does the work, locally, uncommitted
it stops. I validate on my machine
I give the order: now test it, run the checks, do the code review, open the PR
the commit happens under my command, late, not per task, not automatically
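A minimal sketch of that gate, to make the control points concrete. `ReviewGate` and its method names are hypothetical and not part of GSD or any tool named here. The sketch only builds the `git add` / `git commit` argument lists and never runs them, so the commit stays a deliberate human step.

```python
from dataclasses import dataclass, field


@dataclass
class ReviewGate:
    """Hypothetical gate: the agent records edits, only a human can release a commit."""

    pending: list = field(default_factory=list)
    approved: bool = False

    def record(self, path: str) -> None:
        """Agent side: note a changed file. Any new edit revokes earlier approval."""
        if path not in self.pending:
            self.pending.append(path)
        self.approved = False

    def approve(self) -> None:
        """Human side: called only after local validation, tests, and review."""
        self.approved = True

    def commit_commands(self, message: str) -> list:
        """Return the git invocations for one late commit; refuse without approval."""
        if not self.approved:
            raise PermissionError("human approval required before committing")
        if not self.pending:
            raise ValueError("nothing to commit")
        return [["git", "add", "--", *self.pending], ["git", "commit", "-m", message]]


gate = ReviewGate()
gate.record("src/api/auth.ts")    # the agent works, uncommitted
gate.record("src/types/user.ts")
gate.approve()                    # the human validates, then gives the order
print(gate.commit_commands("feat: create user registration endpoint"))
```

The design choice worth noticing is that `record` resets `approved`: any edit after the review means the review has to happen again.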

This is not a niche preference. The 2026 consensus on agentic workflows says the same thing in plain terms: "code review is the step most agentic workflows quietly drop," and the healthy pattern is "a human reviews the plan before execution begins, not after." GSD's auto-commit removes exactly the control points a senior engineer needs.

Plus, I can't touch the code myself. If I change even one line, it muddles the commit history, and the agent then spends a few minutes (and a lot of tokens) figuring out that, yes, I can write my own code, thank you.

So I opened an issue…

I filed issue #745: add a config option to defer all commits during execution, leave the working tree dirty, and let me review the whole phase as one diff before anything lands.

The maintainer's response was fast and unambiguous:

"this is how its designed, not interested in changing the design at this time. the commits are there to protect against loss of context or other. this would be a redesign of how it works, not an enhancement."

I want to be fair here, because this is the part that actually clarified everything: the maintainer is right, from inside GSD's design. If your tool is an autonomous agent that may lose its context window mid-run, then committing every task is a feature, a crash-recovery and context-recovery mechanism. My request wasn't a tweak to that design. It was a request for a different philosophy. We don't disagree about quality. We disagree about who holds the wheel.

But the practical consequence is sharp: I can't upstream my way out of this. The thing I most want to change is precisely what the maintainer defends as the core of the design, and will keep elaborating with every release, rightfully so. Any customization I keep would be fighting an actively-maintained opposite opinion.

The subtler tax: it looks like collaboration, right up until it isn't

Auto-commit is the complaint I can point at. The one that actually cost me is harder to name, because it wears the disguise of good work.

Here's the shape of it. I had a tiny, one-off task on a small personal repo: pull a handful of scattered notes into a single file. Twenty minutes by hand. I ran it through the proper loop anyway: discuss → plan → execute. Roughly two hours later it handed me a polished, completely wrong deliverable: a reusable, unit-tested tool for what was a one-time edit. Something I never asked for, never wanted, and now had to unwind.

The maddening part is where it went wrong. The discuss step is the part I like most, the comprehension gate, the thing that's supposed to catch exactly this. And it didn't. Not because discussion is bad, but because it has a grain: it latches onto one reading of what you said and drives hard in that direction, confidently, past the point where it still makes sense. It keeps asking sharp questions (which feels like rigor) while quietly walking off the edge of the scope you actually gave it. By the time the drift is visible it's already downstream in a plan, then in code, and you're the one paying to reel it back.

And the whole time it looks productive. It's debating. It's planning. It's spinning up research. Every signal says "this is going well," so you trust it, and the bill only arrives at the end, as hours spent generating something that misses the nuance you could never get it to hold. That's the real failure mode of a highly-opinionated orchestrator: the opinions are load-bearing, and when your intent doesn't match them the tool can't bend. It can only drift, eloquently.

And this isn't one unlucky run. On a recent 70-phase project I had to redo or restructure at least 50 of them. The ones that survived only did because I'd front-loaded a lot of time organizing the code and kept each phase small and tightly scoped, and even then, every ~5 phases I had to stop and fix things in the code by hand so the next 5 wouldn't wander off. The review-and-re-prompt loop isn't the exception. It's the steady state. A huge share of my time on these projects goes to catching the tool before it commits to the wrong idea.

The maintenance math I did next

If you can't upstream it, your options are fork, patch, config, or build from scratch. I worked through all four.

Config can turn things off. It can't reshape the flow. GSD's config.json is genuinely deep. You can lean it out a lot:

{
  "mode": "interactive",
  "granularity": "coarse",
  "workflow": {
    "research": false,
    "plan_check": false,
    "verifier": false,
    "discuss_mode": "assumptions"
  },
  "planning": { "commit_docs": false },
  "parallelization": false
}

But no key reshapes the flow. You can't tell config "merge discuss into plan," "change how discussion works," or "commit my way." Reshaping behavior means editing the workflow Markdown, the hottest files in the repo. Which lands you in the maintenance trap.

The fork-vs-patch comparison is less flattering to patches than people think:

	Fork (tracking upstream)	Patch repo
Conflict frequency	Same, breaks when upstream edits your lines	Same
Conflict resolution	`git merge` is 3-way (knows the ancestor) → resolves better	`git apply` is 2-way → rejects more on moved code
Real advantage	"I own the tree"	Legibility: your delta is a small, separate artifact
As a public product	Installs cleanly	Bad UX: two-step install, version-coupled rejects hit your users
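Why knowing the ancestor helps can be shown with a toy model. This is a simplified sketch of the principle on single values, not how git implements `git merge` or `git apply`; the function names and the `CONFLICT` sentinel are made up for illustration.

```python
CONFLICT = object()  # sentinel: the merge cannot be resolved automatically


def merge_two_way(ours, theirs):
    """No ancestor available: any difference is ambiguous, so it is a conflict."""
    return ours if ours == theirs else CONFLICT


def merge_three_way(base, ours, theirs):
    """With the common ancestor we can tell which side actually changed."""
    if ours == theirs:
        return ours          # both sides agree
    if ours == base:
        return theirs        # only upstream changed: take upstream
    if theirs == base:
        return ours          # only we changed: keep ours
    return CONFLICT          # both changed, differently


# Upstream rewrote a line that we never touched.
base, ours, theirs = "step: plan", "step: plan", "step: plan-v2"
print(merge_two_way(ours, theirs) is CONFLICT)   # True
print(merge_three_way(base, ours, theirs))       # step: plan-v2
```

The only case both approaches fail on is the last branch, where both sides edited the same thing. That is exactly the situation a behavior-editing fork keeps landing in.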

The only overlay pattern that survives upstream churn cleanly is additive files marked merge=ours: you only ever add files, never edit existing ones, so a periodic sync never conflicts. That's powerful, but it has a hard limit: it works for things you can add (your config, a launcher, your own skills, a host-side wrapper) and never for behavior edits. My change is a behavior edit. So it's stuck in the expensive lane no matter what.

The conclusion that reframed my whole decision: for a public tool that re-opinionates the flow, a fork is the worst quadrant. It makes you maintain 90% infrastructure you don't want and fight merge conflicts on exactly the files you customized.

The fork graveyard

I didn't want to argue this from theory, so I looked at the evidence: every fork of the repo.

250 forks. Not one is a successful divergent "lite." The breakdown:

The vast majority are stale snapshots carrying an old name the upstream itself used: untouched personal copies, not projects.
A few are README rebrands or translations, one commit of real work.
One is a runtime adapter with zero commits on its main branch despite an ambitious description.
The only serious one is an integration wrapper, and it's the most instructive object in the whole study.

That serious fork belongs to a company embedding GSD into a product. Their entire diff against upstream is this:

added     .gitattributes              (merge=ours rules)
added     scripts/sync-upstream.sh
added     NOTICE / CONTRIBUTING-UPSTREAM.md
modified  README.md                   (their banner only)

Their README states the strategy outright: "GSD itself runs unmodified." They wrap it host-side via the Claude Agent SDK and intercept the approval checkpoints in their own orchestrator. And their .gitattributes is the merge=ours trick in the wild:

# Keep "ours" on merge with upstream so periodic
# sync-upstream.sh runs don't conflict on files we maintain.
NOTICE                    merge=ours
scripts/sync-upstream.sh  merge=ours
.gitattributes            merge=ours

The reframe that changed everything

Here's the thing I wish I'd understood on day one. When you line GSD up against the rest of the field, a split appears that has nothing to do with features:

Autonomous orchestrators (GSD, BMAD): they take the wheel. Spawn subagents, commit on their own, advance on their own. The human approves occasionally.
Spec/planning layers (GitHub Spec Kit, OpenSpec, Agent OS, Taskmaster): they hand you structure and hold the door. They generate spec/plan/tasks, and execution runs through your normal agent, where you're already in the loop and commits are yours by default.

The landscape

Spec-driven development exploded in 2026. A community map tracks 30+ frameworks. Mapped against what I, as a senior engineer, actually care about:

Tool	Category	Commit / HIL	Keeps ideas open?	Research/planning depth	Cost to me
GSD	Autonomous orchestrator	Auto-commit + auto-advance	Locks requirements early	High (4 parallel researchers + plan-check + verifier)	Fork = high
OpenSpec	Spec layer	You drive; no auto-commit	Yes, per-change proposals	Lean	Adopt = low
Spec Kit	Spec layer	Human review each phase	Yes, per-feature	Medium (`/clarify` • `/plan`)	Adopt = low
Agent OS	Standards layer	"Complements, doesn't replace"	Encodes your standards	Low	Adopt = low
BMAD	Multi-agent orchestrator	Approves docs, but heavy	PRD + arch + epics locked	High	Adopt = medium; ceremony enormous
Taskmaster	Task layer	You drive	Incremental	Low	Adopt = low

The two that fit a senior engineer who likes ceremony but wants the wheel are GitHub Spec Kit and OpenSpec. So I went deep on both.

Spec Kit vs OpenSpec, for real

The difference between them is conceptual, not a matter of commands. In one line each:

Spec Kit = "specify the feature, then build it." The unit is the feature. You start from a constitution (project principles). Each feature flows spec → plan → tasks → implement, on its own git branch.
OpenSpec = "propose a change as a delta to a living spec." The unit is the change. There's a specs/ folder (current truth) and a changes/ folder (proposals). Each proposal folds into specs/ only when you apply and archive it.

That's feature-centric vs change-centric, and nearly everything else derives from it.
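The change-centric model can be sketched as a toy data structure. This is an illustration of the idea only, under loud assumptions: the `added` / `modified` / `removed` keys and the `archive_change` function are hypothetical and do not reproduce OpenSpec's actual file format or commands.

```python
def archive_change(specs: dict, delta: dict) -> dict:
    """Fold one change's delta into the living spec and return a new dict.

    `specs` is the current truth; it is left untouched until the change is archived.
    """
    merged = dict(specs)
    for name in delta.get("removed", []):
        merged.pop(name, None)
    merged.update(delta.get("modified", {}))
    merged.update(delta.get("added", {}))
    return merged


specs = {"theme": "The app renders with a light theme."}
change = {
    "modified": {"theme": "The app renders with a light or dark theme."},
    "added": {"theme-toggle": "The user can switch themes from settings."},
}

# The proposal sits next to the spec for review; nothing changes until it is archived.
print(specs["theme"])            # The app renders with a light theme.
specs = archive_change(specs, change)
print(sorted(specs))             # ['theme', 'theme-toggle']
```

In the feature-centric model there is no equivalent fold step: each feature keeps its own spec, plan, and tasks side by side.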

Axis	GitHub Spec Kit	OpenSpec
Mental model	Spec the feature → build	Delta change over a living spec
Structure	`constitution.md` • `specs/<feature>/{spec,plan,tasks}.md`	`specs/` (truth) + `changes/<name>/{proposal,specs,design,tasks}.md`
Git	Creates a branch per feature (`001-...`)	You manage git; the "change" is the review unit
Ceremony	Higher (constitution, `/clarify`, checklists)	Deliberately lean ("no personas, fluid")
Greenfield vs brownfield	Both, greenfield-leaning	Brownfield-first
Encode "my way"	Constitution every spec inherits	Conventions in agent files (lighter)
Commit behavior	Yours (scaffolds a branch, no per-task auto-commit)	Yours
Ecosystem	GitHub-backed, 105 extensions, 22 presets, 200+ contributors	Fission-AI, ~54k★, healthy but smaller

To get a feel for it, here is the same task, "add dark mode to an existing app", run through both:

# Spec Kit
/speckit.specify   → branch 002-dark-mode + spec.md  (you review)
/speckit.plan      → plan.md + research.md            (you review)
/speckit.tasks     → tasks.md
/speckit.implement → builds; you commit

# OpenSpec
/opsx:propose dark mode → changes/add-dark-mode/ (proposal + tasks)  (you review)
/opsx:apply             → implements; you validate locally
/opsx:archive           → the delta folds into specs/

In both, the commit and the PR are mine.

About benchmarks (and why they lie a little)

You'll see a benchmark passed around, one CRM dashboard built three ways: OpenSpec ~12 min, Spec Kit ~90 min, BMAD ~5.5 h. Useful for order-of-magnitude, but it's one task and one reviewer. So I ran my own measurement against something real.

The method is simple and reproducible: read the git commit timestamps. On a real production project, I took three consecutive GSD phases, each wiring one external integration, and measured wall-clock from the first to the last commit of each phase.

Phase	Span (first → last commit)	Duration	Plans
A	small, single adapter	~31 min	1
B	larger, 4 plans / 3 waves	~44 min	4
C	medium (+ a mid-phase re-scope)	~50 min	~3
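The measurement above can be reproduced with a few lines. This is a sketch under one assumption: that commit subjects follow the `type(phase-plan): ...` shape seen in the `feat(08-02): ...` example, so the phase number can be read from the subject. The function name `phase_durations` is made up, and the input is the output of `git log --format='%ct %s'` (committer timestamp, then subject).

```python
import re
from collections import defaultdict

# Assumed subject format, based on "feat(08-02): ...": type(phase-plan): text
PHASE_RE = re.compile(r"^\w+\((\d+)-(\d+)\):")


def phase_durations(log_lines):
    """Map phase id -> minutes between that phase's first and last commit.

    Each line is '<unix timestamp> <subject>'. Commits whose subject does not
    match the assumed format are ignored.
    """
    stamps = defaultdict(list)
    for line in log_lines:
        ts, _, subject = line.partition(" ")
        match = PHASE_RE.match(subject)
        if match:
            stamps[match.group(1)].append(int(ts))
    return {phase: (max(t) - min(t)) / 60 for phase, t in stamps.items()}


sample = [
    "1700000000 feat(08-01): add adapter skeleton",
    "1700001860 feat(08-02): create user registration endpoint",
    "1700002000 chore: bump dependencies",
    "1700003000 feat(09-01): wire second integration",
]
print(phase_durations(sample))  # {'08': 31.0, '09': 0.0}
```

A phase with a single commit reports 0 minutes, so the number is a lower bound on wall-clock: it misses any discussion or planning that happened before the first commit landed.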

Average ~40 minutes per phase. But the revealing part is the breakdown. In the most complete phase, code generation itself was the small part. The bulk was discuss + plan + research + verify. In another, the pure execution was about 3 minutes for three plans. The rest was context-gathering and a scope decision I made.

The takeaway: GSD's wall-clock isn't the machine grinding. It's the ceremony, the part I actually value.

That reframes "is it slow?" entirely. On a rough scale it lands between OpenSpec and BMAD, but who cares, because the time goes where I want it to. The real lesson for choosing a replacement: a tool with similar ceremony (Spec Kit) will feel about the same. A leaner tool (OpenSpec) is faster mostly because it cuts the comprehension gate I like. Speed was never my axis.

The verdict

For my posture (a senior engineer, existing projects, a backlog in triage, someone who likes ceremony as a comprehension gate, whose only hard "no" is atomic per-task commits), I'm going with GitHub Spec Kit. Mapping it to what I actually weighed:

My criterion	Winner	Why
Ceremony as a comprehension gate	Spec Kit	Heavier on purpose, OpenSpec strips it by philosophy
Research + planning depth	Spec Kit	`/clarify` • `/plan` → `research.md`, `data-model.md`, `contracts/`
Branch-per-feature	Spec Kit	Native; OpenSpec leaves git to you
Roadmap / phases	Spec Kit	Not built in, but the extension ecosystem can add whole phases
No atomic commits (my one "no")	Tie	Neither auto-commits per task
Encode my own standards	Spec Kit	`constitution.md` is the native lever
One standard, longevity	Spec Kit	GitHub-backed, MIT, biggest ecosystem
Brownfield + in-tool triage	OpenSpec	But I already triage in Notion, so it's partly redundant for me

OpenSpec is the runner-up, and an excellent tool: leaner, brownfield-first, with a genuinely elegant change-as-delta model. The only thing that should flip my decision is a hands-on test where Spec Kit's ceremony feels like friction instead of comprehension on my existing repos. That's the test I'll run next.

Sources & further reading

GSD: open-gsd/gsd-core · original gsd-build/get-shit-done · issue #745
GitHub Spec Kit · Spec Kit docs
OpenSpec (Fission-AI)
BMAD-METHOD · Agent OS · claude-task-master
Spec-Driven Development: a map of 30+ frameworks (2026)
BMAD vs Spec Kit vs OpenSpec (Reenbit)
9 Best Spec-Driven Tools 2026 (MarkTechPost)

Top comments (1)

Adam Lewis • Jun 17

Agree it's the category and not a missing setting. An autonomous orchestrator takes out the review step, and reading the diff is the part you can't skip now an agent writes a week of code in an afternoon. I keep mine human-in-the-loop for that - plan, approve, let it run, read the diff before it merges. A better planning model doesn't move where the human has to stand.