Here's a number that doesn't make sense: 20,000 lines of code per day. Not a team. Not a startup with 15 engineers and a CI/CD pipeline. One person. Part-time.
Garry Tan — President & CEO of Y Combinator — shipped 600,000+ lines of production code in the last 60 days. While running YC full-time. With 35% test coverage.
For context: his last /retro across 3 projects shows 140,751 lines added, 362 commits, ~115k net LOC in a single week.
Same guy went from 772 GitHub contributions in 2013 to 1,237 in 2026.
The difference isn't effort. It's tooling.
I'll be honest — I was skeptical.
When someone first told me about gstack, my brain went straight to eye-roll territory. "Yeah sure, one person shipping like a team of 20. What's next, AI that does standups?"
Then I cloned the repo.
gstack is Garry's open-source system that turns Claude Code into a virtual engineering team of 28 specialists. Not a copilot. Not a pair programmer. A team.
Each specialist is a slash command:
- /office-hours — YC Partner who reframes your product before you write code
- /plan-eng-review — Eng Manager who locks architecture with ASCII diagrams
- /review — Staff Engineer who finds bugs that pass CI but blow up in production
- /qa — QA Lead who opens a real browser and clicks through your flows
- /cso — Chief Security Officer running OWASP + STRIDE audits
- /ship — Release Engineer who pushes the PR
- /land-and-deploy — Deploys to production and verifies health
- /retro — Eng Manager who gives you weekly metrics (my favorite)
Twenty-eight specialists. Eight power tools. All free. MIT license.
And after testing it on my own projects, I get it. This isn't hype. This is the new baseline.
We use gstack internally for API development workflows. The /qa skill integrates naturally with Apidog for API testing, and /document-release keeps your API docs in sync with shipped changes. If you're building API products, this combination is powerful.
Okay, What Actually Is gstack?
gstack is Garry's open-source system that turns Claude Code into a virtual engineering team of 28 specialists.
Not a copilot. Not a pair programmer. A team.
Each specialist is a slash command you run in Claude Code:
| Command | Specialist | What it does |
|---|---|---|
| /office-hours | YC Partner | Reframes your product before you write code |
| /plan-ceo-review | CEO | Challenges your scope and timeline |
| /plan-eng-review | Eng Manager | Locks architecture with ASCII diagrams |
| /plan-design-review | Senior Designer | Rates every design dimension 0-10 |
| /review | Staff Engineer | Finds production bugs |
| /qa | QA Lead | Opens a real browser and clicks through flows |
| /cso | Chief Security Officer | Runs OWASP + STRIDE audits |
| /ship | Release Engineer | Pushes the PR |
| /land-and-deploy | Deployment Engineer | Deploys to production and verifies health |
| /retro | Eng Manager | Weekly engineering retro with metrics |
Twenty-eight specialists. Eight power tools. All free.
Repo: github.com/garrytan/gstack
The Sprint Structure (It's Not Random Tools)
Here's what makes gstack different from just prompting Claude randomly: it's a process.
Think → Plan → Build → Review → Test → Ship → Reflect
Each skill feeds into the next. Nothing falls through the cracks.
Here's a real session:
# Step 1: Challenge the idea
You: I want to build a daily briefing app for my calendar.
You: /office-hours
Claude: I'm going to push back on the framing. You said "daily
briefing app." But what you actually described is a
personal chief of staff AI.
[extracts 5 capabilities you didn't realize you needed]
[challenges 4 premises]
[generates 3 implementation approaches]
RECOMMENDATION: Ship the narrowest wedge tomorrow.
# Step 2: Plan it
You: /plan-ceo-review
You: /plan-eng-review
You: Approve plan. Exit plan mode.
# 8 minutes later: 2,400 lines across 11 files
# Step 3: Review it
You: /review
# → [AUTO-FIXED] 2 issues. [ASK] Race condition → you approve fix
# Step 4: Test it
You: /qa https://staging.myapp.com
# → [opens browser, clicks flows, finds + fixes a bug]
# Step 5: Ship it
You: /ship
# → Tests: 42 → 51 (+9 new). PR opened.
Eight commands. End to end.
That's not a copilot. That's a team.
The 28 Skills Explained
Product & Strategy
/office-hours — YC Office Hours
Your specialist: YC Partner
What it does: Starts every project with six forcing questions that reframe your product before you write code. Pushes back on your framing, challenges premises, generates implementation alternatives.
Real output:
You said "daily briefing app." But what you actually described is a
personal chief of staff AI. Here are 5 capabilities you didn't realize
you were describing...
[challenges 4 premises — you agree, disagree, or adjust]
[generates 3 implementation approaches with effort estimates]
RECOMMENDATION: Ship the narrowest wedge tomorrow, learn from real usage.
When to use: First skill on any new feature or product. The design doc it writes feeds into every downstream skill automatically.
/plan-ceo-review — CEO / Founder
Your specialist: CEO who rethinks the product
What it does: Rethinks the problem from first principles. Finds the 10-star product hiding inside the request. Four modes:
- Expansion — what if we went bigger?
- Selective Expansion — which parts deserve 10x?
- Hold Scope — this is right as-is
- Reduction — what if we cut 80%?
When to use: After /office-hours produces a design doc. Run before any implementation starts.
/plan-design-review — Senior Product Designer
Your specialist: Senior Product Designer
What it does: Rates each design dimension 0-10, explains what a 10 looks like, then edits the plan to get there. Includes AI slop detection. Interactive — one decision per design choice.
When to use: After eng review, before implementation. Catches design debt before it becomes code debt.
/design-consultation — Design Partner
Your specialist: Design Partner
What it does: Builds a complete design system from scratch. Researches the landscape, proposes creative risks, generates realistic product mockups.
When to use: When you need a full design system, not just a review.
Engineering & Architecture
/plan-eng-review — Engineering Manager
Your specialist: Engineering Manager
What it does: Locks in architecture, data flow, diagrams, edge cases, and tests. Forces hidden assumptions into the open. Generates ASCII diagrams for data flow, state machines, and error paths.
Example output:
Architecture Review:
┌─────────────┐ ┌──────────────┐ ┌────────────┐
│ Client │────▶│ API Gateway │────▶│ Database │
└─────────────┘ └──────────────┘ └────────────┘
│ │
▼ ▼
[State Cache] [Rate Limiter]
Test Matrix:
- Happy path: authenticated user, valid data
- Edge case: concurrent modifications
- Failure mode: database connection timeout
- Security: SQL injection, XSS, CSRF
When to use: After CEO/design review, before coding. The test plan it writes feeds into /qa.
/review — Staff Engineer Code Review
Your specialist: Staff Engineer who finds production bugs
What it does: Finds bugs that pass CI but blow up in production. Auto-fixes the obvious ones. Flags completeness gaps.
Real output from my session:
[AUTO-FIXED] 2 issues:
- Null check missing in getUserById()
- Unhandled promise rejection in api handler
[ASK] Race condition in concurrent update → you approve fix
[COMPLETENESS GAP] No retry logic for transient failures
It auto-fixes the obvious stuff. Flags the hard decisions. You approve. Done.
When to use: After implementation, before /qa. Run on any branch with changes.
/investigate — Root-Cause Debugger
Your specialist: Debugger
What it does: Systematic root-cause debugging. Iron Law: no fixes without investigation. Traces data flow, tests hypotheses, stops after 3 failed fixes.
When to use: When you hit a bug that /review couldn't auto-fix. Never skip investigation — the Iron Law exists for a reason.
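The three-strikes rule is simple enough to encode. Here is a toy sketch of the idea in Python (illustrative only, not gstack's implementation; the fix names and verifier are made up):

```python
# Toy sketch of the Iron Law's failure budget: verify every candidate fix,
# and stop guessing after three misses instead of spraying more changes.
def apply_fixes(candidate_fixes, verify, max_failures=3):
    """Try fixes in order; stop and escalate after `max_failures` misses."""
    failures = 0
    for fix in candidate_fixes:
        if verify(fix):           # a real verifier would re-run the failing test
            return f"fixed by {fix}"
        failures += 1
        if failures >= max_failures:
            return "stop: reassess the root cause"
    return "no fix worked"

# Usage with toy hypotheses that all fail:
print(apply_fixes(["fix-a", "fix-b", "fix-c", "fix-d"], verify=lambda f: False))
# → stop: reassess the root cause
```

The point of the budget is the last branch: after three verified misses, the loop refuses to keep patching and forces you back to investigation.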
/codex — Second Opinion
Your specialist: OpenAI Codex CLI
What it does: Independent code review from a different model. Three modes: review (pass/fail gate), adversarial challenge, and open consultation. Cross-model analysis when both /review and /codex have run.
When to use: After /review for a second opinion. Especially valuable for critical paths or when you want cross-model validation.
Testing & QA
/qa — QA Lead with Real Browser
Your specialist: QA Engineer with a real browser
What it does: Opens a real Chromium browser, clicks through flows, finds and fixes bugs with atomic commits. Auto-generates regression tests for every fix.
Example workflow:
1. Opens staging URL in headless Chromium
2. Executes test plan from /plan-eng-review
3. Finds bug: "Submit button doesn't disable during loading"
4. Creates atomic commit with fix
5. Re-verifies: clicks again, confirms fix
6. Generates regression test: test_submit_button_disables()
I caught a bug in 30 seconds that would have taken me 30 minutes to find manually.
When to use: After /review clears the branch. Run on your staging URL.
/qa-only — QA Reporter
Your specialist: QA Reporter
What it does: Same methodology as /qa but report only. Pure bug report without code changes.
When to use: When you want a bug report without auto-fixes. Useful for audit trails.
/benchmark — Performance Engineer
Your specialist: Performance Engineer
What it does: Baselines page load times, Core Web Vitals, and resource sizes. Compares before/after on every PR.
Metrics tracked:
- First Contentful Paint (FCP)
- Largest Contentful Paint (LCP)
- Cumulative Layout Shift (CLS)
- Time to Interactive (TTI)
- Bundle sizes
When to use: Before major refactors, after performance optimizations.
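The before/after comparison boils down to flagging metrics that moved past a threshold. A minimal Python sketch, with numbers and a 10% cutoff that are mine for illustration, not /benchmark's actual output:

```python
# Illustrative sketch of a before/after regression check: flag any metric
# that got worse (here, higher = worse) by more than a fixed threshold.
THRESHOLD = 0.10  # flag regressions worse than 10%

def regressions(before: dict, after: dict, threshold: float = THRESHOLD) -> dict:
    """Return {metric: (before, after)} for metrics that regressed past threshold."""
    return {
        name: (before[name], after[name])
        for name in before
        if name in after and after[name] > before[name] * (1 + threshold)
    }

# Made-up snapshots from two PR builds:
before = {"FCP_ms": 1200, "LCP_ms": 2400, "CLS": 0.05, "bundle_kb": 310}
after  = {"FCP_ms": 1180, "LCP_ms": 2900, "CLS": 0.05, "bundle_kb": 335}

print(regressions(before, after))  # → {'LCP_ms': (2400, 2900)}
```

Only LCP trips the cutoff here: the bundle grew too, but by less than 10%, which is exactly the kind of noise a threshold is meant to absorb.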
/browse — Browser Automation
Your specialist: Browser Automation
What it does: Real Chromium browser, real clicks, real screenshots. ~100ms per command.
Commands:
- goto <url> — Navigate to URL
- click <selector> — Click element
- type <selector> <text> — Type in input
- screenshot <name> — Capture screen
- wait <selector> — Wait for element
When to use: Anytime you need to verify something in a browser. Used internally by /qa.
/setup-browser-cookies — Session Manager
Your specialist: Browser Session Manager
What it does: Imports cookies from your real browser (Chrome, Arc, Brave, Edge) into the headless session. Test authenticated pages.
When to use: Before /qa if your staging app requires login.
Security & Compliance
/cso — Chief Security Officer
Your specialist: Chief Security Officer
What it does: OWASP Top 10 + STRIDE threat model. Zero-noise: 17 false positive exclusions, 8/10+ confidence gate, independent finding verification. Each finding includes a concrete exploit scenario.
Example output:
[CRITICAL] SQL Injection in /api/users?id= parameter
Exploit: GET /api/users?id=1' OR '1'='1
Impact: Full database read access
Fix: Use parameterized queries
Confidence: 9/10
[FALSE POSITIVE EXCLUDED] XSS in admin panel
Reason: Output is properly escaped with DOMPurify
When to use: Before any production release. Run on any feature that handles user data or authentication.
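The fix that finding recommends is standard practice. Here is a minimal sketch of the difference using Python's built-in sqlite3 (the table, data, and payload are illustrative, not gstack output):

```python
import sqlite3

# Toy in-memory table standing in for the users endpoint's backing store.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE users (id INTEGER, name TEXT)")
conn.executemany("INSERT INTO users VALUES (?, ?)", [(1, "alice"), (2, "bob")])

payload = "1 OR 1=1"  # attacker-controlled "id" query parameter

# Vulnerable: string interpolation lets the payload rewrite the WHERE clause,
# so every row comes back.
leaked = conn.execute(f"SELECT name FROM users WHERE id = {payload}").fetchall()

# Fixed: a parameterized query treats the whole payload as one literal value,
# so it matches nothing.
safe = conn.execute("SELECT name FROM users WHERE id = ?", (payload,)).fetchall()

print(len(leaked), len(safe))  # → 2 0
```

Same payload, two queries: the interpolated one leaks both rows, the parameterized one returns none. That gap is what the exploit scenario in the finding is describing.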
Shipping & Deployment
/ship — Release Engineer
Your specialist: Release Engineer
What it does: Syncs main, runs tests, audits coverage, pushes, opens PR. Bootstraps test frameworks if you don't have one.
Example workflow:
1. git checkout main && git pull
2. git checkout -b feature/daily-briefing
3. npm test (or bootstraps Jest/Vitest if missing)
4. Coverage audit: 42 tests → 51 tests (+9 new)
5. git push origin feature/daily-briefing
6. Opens PR: github.com/you/app/pull/42
When to use: After /qa clears the branch. One command from "tested" to "PR opened."
/land-and-deploy — Deployment Engineer
Your specialist: Deployment Engineer
What it does: Merges the PR, waits for CI and deploy, verifies production health. One command from "approved" to "verified in production."
Example workflow:
1. Merge PR via GitHub API
2. Wait for CI (GitHub Actions, CircleCI, etc.)
3. Wait for deploy (Vercel, Railway, Fly.io, etc.)
4. Run production health checks
5. Report: "Deployed to production, all checks passing"
When to use: After PR approval. Handles the entire release pipeline.
/canary — SRE
Your specialist: Site Reliability Engineer
What it does: Post-deploy monitoring loop. Watches for console errors, performance regressions, and page failures.
Monitors:
- Browser console errors
- API error rates
- Page load regressions
- JavaScript exceptions
When to use: Immediately after /land-and-deploy. Runs for 5-15 minutes post-deploy.
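The monitoring loop itself is simple in principle. Here is a hedged sketch of the failure-budget logic in Python, with a fake probe injected so it runs anywhere; a real probe would hit your health endpoint, and none of this is gstack's actual code:

```python
import time

def canary(check, interval_s=30, budget=10, max_failures=3):
    """Illustrative post-deploy canary loop: run `check()` up to `budget`
    times, flagging a rollback after `max_failures` consecutive errors."""
    failures = 0
    for _ in range(budget):
        if check():
            failures = 0           # a healthy probe resets the failure streak
        else:
            failures += 1
            if failures >= max_failures:
                return "rollback"  # sustained errors: flag for rollback
        time.sleep(interval_s)
    return "healthy"

# Usage with a fake probe (two good checks, then three straight failures):
probes = iter([True, True, False, False, False])
print(canary(lambda: next(probes), interval_s=0, budget=5))  # → rollback
```

Requiring consecutive failures is the design choice that matters: one flaky probe shouldn't trigger a rollback, but a sustained streak should.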
/document-release — Technical Writer
Your specialist: Technical Writer
What it does: Updates all project docs to match what you just shipped. Catches stale READMEs automatically.
Example output:
[UPDATED] README.md — added new /qa command to docs
[UPDATED] CHANGELOG.md — v0.4.2 release notes
[CREATED] docs/qa-guide.md — new QA workflow guide
[FLAGGED] API.md — may need update for new endpoints
When to use: After /ship or /land-and-deploy. Keeps docs in sync with code.
Reflection & Analytics
/retro — Engineering Manager
Your specialist: Engineering Manager
What it does: Team-aware weekly retro. Per-person breakdowns, shipping streaks, test health trends, growth opportunities. /retro global runs across all your projects and AI tools.
Real output from my week:
Week of March 17-23, 2026
- 140,751 lines added
- 362 commits
- ~115k net LOC
- Test coverage: 35% (↑2% from last week)
Shipping streak: 47 days
Per-person breakdowns. Test health trends. Growth opportunities.
It's like having an eng manager who actually cares about your growth.
When to use: End of week. Run /retro for team insights, /retro global for cross-project view.
Power Tools (Safety & Automation)
| Command | What it does |
|---|---|
| /careful | Warns before destructive commands (rm -rf, DROP TABLE, force-push) |
| /freeze | Restricts file edits to one directory |
| /guard | /careful + /freeze — maximum safety |
| /unfreeze | Removes the /freeze boundary |
| /setup-deploy | One-time setup for /land-and-deploy |
| /autoplan | CEO → design → eng review in one command |
| /gstack-upgrade | Upgrades gstack to latest version |
Installation (Actually 30 Seconds)
Requirements:
- Claude Code
- Git
- Bun v1.0+
Install to your machine:
Open Claude Code and paste:
git clone https://github.com/garrytan/gstack.git ~/.claude/skills/gstack \
&& cd ~/.claude/skills/gstack && ./setup
That's it. Nothing touches your PATH. Nothing runs in the background.
Add to your repo (so teammates get it on clone):
cp -Rf ~/.claude/skills/gstack .claude/skills/gstack \
&& rm -rf .claude/skills/gstack/.git \
&& cd .claude/skills/gstack && ./setup
Real files. No submodules. git clone just works.
Works on Codex, Gemini CLI, Cursor Too
gstack works on any agent that supports the SKILL.md standard. Skills live in .agents/skills/ and are discovered automatically.
Install to one repo:
git clone https://github.com/garrytan/gstack.git .agents/skills/gstack
cd .agents/skills/gstack && ./setup --host codex
Install once for your user:
git clone https://github.com/garrytan/gstack.git ~/gstack
cd ~/gstack && ./setup --host codex
Auto-detect which agents you have:
git clone https://github.com/garrytan/gstack.git ~/gstack
cd ~/gstack && ./setup --host auto
Should You Use This?
Yes, if you're:
A founder or CEO — especially technical ones who still want to ship. gstack lets you move at startup speed without hiring a team.
New to Claude Code — structured roles instead of a blank prompt. If you're new to AI coding, this gives you guardrails.
A tech lead or staff engineer — rigorous review, QA, and release automation on every PR. Even if you only use /review and /qa, you'll catch bugs that would have reached production.
Building solo — if you're building alone, gstack is your virtual team.
In a YC startup — Garry built this for YC founders. If you're in the batch, this is the house stack.
Skip it if you're:
On a team with established workflows — if you already have a review process, CI/CD pipeline, and design system, gstack might be overkill. Pick individual skills instead of the full sprint.
Not using Claude Code — gstack is optimized for Claude Code. It works on Codex, Gemini CLI, and Cursor, but the experience is built for Claude.
Prefer freeform AI — if you like open-ended prompts and seeing what happens, gstack's structure will feel constraining. It's designed for rigor, not exploration.
The Philosophy (It's Not Just Tools)
gstack isn't just tools. It's a philosophy.
Three principles stuck with me:
1. Boil the Lake
Don't half-boil the lake. If you're going to do something, do it completely. Half measures create more work than full commitment.
2. Search Before Building
Before writing code, search for existing solutions. The best code is code you don't write.
3. The Iron Law of Debugging
No fixes without investigation. Three failed fixes, stop and reassess.
This exists because AI agents (and humans) tend to spray fixes without understanding root causes.
The Real Takeaway
We're witnessing a fundamental shift in software development.
One person with the right tooling can now move faster than a traditional team of twenty.
This isn't theory. Garry's doing it. Peter Steinberger did it with OpenClaw (247K GitHub stars, essentially solo). I'm seeing it in my own workflow after one week.
The tooling is here. It's free. MIT licensed. Open source.
The question is: what will you build with it?
Try It Yourself
Repo: github.com/garrytan/gstack
Installation: 30 seconds
Cost: Free forever
Skills: 28 specialists ready to go
Start with /office-hours on your next feature idea. See if the output changes how you think about the problem.
Then run /review on your current branch. Catch the bugs before production.
Then /qa on your staging URL. Test like a real user.
Eight commands later, you'll understand why Garry ships 20K lines/day.
P.S. — If you're building something interesting with gstack, drop a comment. I'd love to see what you're shipping.
