Ash Inno

Originally published at apidog.com

This Free Tool Lets You Ship Like a 20-Person Team

Here's a number that doesn't make sense: 20,000 lines of code per day. Not a team. Not a startup with 15 engineers and a CI/CD pipeline. One person. Part-time.

Garry Tan — President & CEO of Y Combinator — shipped 600,000+ lines of production code in the last 60 days. While running YC full-time. With 35% test coverage.

For context: his last /retro across 3 projects shows 140,751 lines added, 362 commits, ~115k net LOC in a single week.

Same guy went from 772 GitHub contributions in 2013 to 1,237 in 2026.

The difference isn't effort. It's tooling.

I'll be honest — I was skeptical.

When someone first told me about gstack, my brain went straight to eye-roll territory. "Yeah sure, one person shipping like a team of 20. What's next, AI that does standups?"

Then I cloned the repo.

gstack is Garry's open-source system that turns Claude Code into a virtual engineering team of 28 specialists. Not a copilot. Not a pair programmer. A team.

Each specialist is a slash command:

  • /office-hours — YC Partner who reframes your product before you write code
  • /plan-eng-review — Eng Manager who locks architecture with ASCII diagrams
  • /review — Staff Engineer who finds bugs that pass CI but blow up in production
  • /qa — QA Lead who opens a real browser and clicks through your flows
  • /cso — Chief Security Officer running OWASP + STRIDE audits
  • /ship — Release Engineer who pushes the PR
  • /land-and-deploy — Deploys to production and verifies health
  • /retro — Eng Manager who gives you weekly metrics (my favorite)

Twenty-eight specialists. Eight power tools. All free. MIT license.

And after testing it on my own projects, I get it. This isn't hype. This is the new baseline.

We use gstack internally for API development workflows. The /qa skill integrates naturally with Apidog for API testing, and /document-release keeps your API docs in sync with shipped changes. If you're building API products, this combination is powerful.


Okay, What Actually Is gstack?

gstack is Garry's open-source system that turns Claude Code into a virtual engineering team of 28 specialists.

Not a copilot. Not a pair programmer. A team.

Each specialist is a slash command you run in Claude Code:

| Command             | Specialist             | What it does                                  |
|---------------------|------------------------|-----------------------------------------------|
| /office-hours       | YC Partner             | Reframes your product before you write code   |
| /plan-ceo-review    | CEO                    | Challenges your scope and timeline            |
| /plan-eng-review    | Eng Manager            | Locks architecture with ASCII diagrams        |
| /plan-design-review | Senior Designer        | Rates every design dimension 0-10             |
| /review             | Staff Engineer         | Finds production bugs                         |
| /qa                 | QA Lead                | Opens a real browser and clicks through flows |
| /cso                | Chief Security Officer | Runs OWASP + STRIDE audits                    |
| /ship               | Release Engineer       | Pushes the PR                                 |
| /land-and-deploy    | Deployment Engineer    | Deploys to production and verifies health     |
| /retro              | Eng Manager            | Weekly engineering retro with metrics         |

Twenty-eight specialists. Eight power tools. All free.

Repo: github.com/garrytan/gstack

The Sprint Structure (It's Not Random Tools)

Here's what makes gstack different from just prompting Claude randomly: it's a process.

Think → Plan → Build → Review → Test → Ship → Reflect

Each skill feeds into the next. Nothing falls through the cracks.
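To make the flow concrete, here is a rough sketch of how the phases line up with the slash commands covered in this post. The `phase_command` helper is hypothetical (it is not part of gstack), and the phase-to-command mapping is only one reading of the sprint above.

```shell
# Hypothetical helper: maps each sprint phase to the slash command(s) this
# post associates with it. Illustration only; gstack does not ship this.
phase_command() {
  case "$1" in
    think)   echo "/office-hours" ;;
    plan)    echo "/plan-ceo-review /plan-eng-review" ;;
    build)   echo "(approve the plan; Claude Code writes the code)" ;;
    review)  echo "/review" ;;
    test)    echo "/qa" ;;
    ship)    echo "/ship" ;;
    reflect) echo "/retro" ;;
    *)       echo "unknown phase: $1" >&2; return 1 ;;
  esac
}

# Print the whole sprint in order.
for phase in think plan build review test ship reflect; do
  printf '%-8s -> %s\n' "$phase" "$(phase_command "$phase")"
done
```

The point of the ordering is that each phase has exactly one obvious next step, which is what separates a process from a bag of prompts.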

Here's a real session:

# Step 1: Challenge the idea
You: I want to build a daily briefing app for my calendar.
You: /office-hours

Claude: I'm going to push back on the framing. You said "daily
        briefing app." But what you actually described is a
        personal chief of staff AI.

        [extracts 5 capabilities you didn't realize you needed]
        [challenges 4 premises]
        [generates 3 implementation approaches]

        RECOMMENDATION: Ship the narrowest wedge tomorrow.

# Step 2: Plan it
You: /plan-ceo-review
You: /plan-eng-review
You: Approve plan. Exit plan mode.

# 8 minutes later: 2,400 lines across 11 files

# Step 3: Review it
You: /review
# → [AUTO-FIXED] 2 issues. [ASK] Race condition → you approve fix

# Step 4: Test it
You: /qa https://staging.myapp.com
# → [opens browser, clicks flows, finds + fixes a bug]

# Step 5: Ship it
You: /ship
# → Tests: 42 → 51 (+9 new). PR opened.

Six slash commands. End to end.

That's not a copilot. That's a team.

The 28 Skills Explained

Product & Strategy

/office-hours — YC Office Hours

Your specialist: YC Partner

What it does: Starts every project with six forcing questions that reframe your product before you write code. Pushes back on your framing, challenges premises, generates implementation alternatives.

Real output:

You said "daily briefing app." But what you actually described is a
personal chief of staff AI. Here are 5 capabilities you didn't realize
you were describing...

[challenges 4 premises — you agree, disagree, or adjust]
[generates 3 implementation approaches with effort estimates]

RECOMMENDATION: Ship the narrowest wedge tomorrow, learn from real usage.

When to use: First skill on any new feature or product. The design doc it writes feeds into every downstream skill automatically.

/plan-ceo-review — CEO / Founder

Your specialist: CEO who rethinks the product

What it does: Rethinks the problem from first principles. Finds the 10-star product hiding inside the request. Four modes:

  • Expansion — what if we went bigger?
  • Selective Expansion — which parts deserve 10x?
  • Hold Scope — this is right as-is
  • Reduction — what if we cut 80%?

When to use: After /office-hours produces a design doc. Run before any implementation starts.

/plan-design-review — Senior Product Designer

Your specialist: Senior Product Designer

What it does: Rates each design dimension 0-10, explains what a 10 looks like, then edits the plan to get there. Includes AI slop detection. Interactive — one decision per design choice.

When to use: After eng review, before implementation. Catches design debt before it becomes code debt.

/design-consultation — Design Partner

Your specialist: Design Partner

What it does: Builds a complete design system from scratch. Researches the landscape, proposes creative risks, generates realistic product mockups.

When to use: When you need a full design system, not just a review.

Engineering & Architecture

/plan-eng-review — Engineering Manager

Your specialist: Engineering Manager

What it does: Locks in architecture, data flow, diagrams, edge cases, and tests. Forces hidden assumptions into the open. Generates ASCII diagrams for data flow, state machines, and error paths.

Example output:

Architecture Review:
┌─────────────┐     ┌──────────────┐     ┌────────────┐
│   Client    │────▶│  API Gateway │────▶│  Database  │
└─────────────┘     └──────────────┘     └────────────┘
       │                    │
       ▼                    ▼
  [State Cache]      [Rate Limiter]

Test Matrix:
- Happy path: authenticated user, valid data
- Edge case: concurrent modifications
- Failure mode: database connection timeout
- Security: SQL injection, XSS, CSRF

When to use: After CEO/design review, before coding. The test plan it writes feeds into /qa.

/review — Staff Engineer Code Review

Your specialist: Staff Engineer who finds production bugs

What it does: Finds bugs that pass CI but blow up in production. Auto-fixes the obvious ones. Flags completeness gaps.

Real output from my session:

[AUTO-FIXED] 2 issues:
- Null check missing in getUserById()
- Unhandled promise rejection in api handler

[ASK] Race condition in concurrent update → you approve fix

[COMPLETENESS GAP] No retry logic for transient failures

It auto-fixes the obvious stuff. Flags the hard decisions. You approve. Done.

When to use: After implementation, before /qa. Run on any branch with changes.

/investigate — Root-Cause Debugger

Your specialist: Debugger

What it does: Systematic root-cause debugging. Iron Law: no fixes without investigation. Traces data flow, tests hypotheses, stops after 3 failed fixes.

When to use: When you hit a bug that /review couldn't auto-fix. Never skip investigation — the Iron Law exists for a reason.
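The "stop after 3 failed fixes" rule is easy to picture as a guard around a retry loop. The sketch below is an illustration of that idea only, not how /investigate is implemented; `try_fixes` and `MAX_FAILED_FIXES` are made-up names.

```shell
# Hypothetical "three strikes" guard: apply candidate fixes one at a time,
# re-run the check after each, and stop once three have failed.
# $1 is a command that succeeds when the bug is gone; the remaining
# arguments are candidate fix commands, tried in order.
MAX_FAILED_FIXES=3

try_fixes() {
  check="$1"; shift
  failed=0
  for fix in "$@"; do
    sh -c "$fix" || true   # a fix that errors out still counts as an attempt
    if sh -c "$check"; then
      echo "fixed after $failed failed attempt(s)"
      return 0
    fi
    failed=$((failed + 1))
    if [ "$failed" -ge "$MAX_FAILED_FIXES" ]; then
      echo "stopping: $failed failed fixes, reassess the root cause" >&2
      return 1
    fi
  done
  echo "ran out of candidate fixes" >&2
  return 1
}
```

Even with four candidate fixes queued up, the fourth never runs if the first three fail. That hard stop is the whole value of the rule.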

/codex — Second Opinion

Your specialist: OpenAI Codex CLI

What it does: Independent code review from a different model. Three modes: review (pass/fail gate), adversarial challenge, and open consultation. Cross-model analysis when both /review and /codex have run.

When to use: After /review for a second opinion. Especially valuable for critical paths or when you want cross-model validation.

Testing & QA

/qa — QA Lead with Real Browser

Your specialist: QA Engineer with a real browser

What it does: Opens a real Chromium browser, clicks through flows, finds and fixes bugs with atomic commits. Auto-generates regression tests for every fix.

Example workflow:

1. Opens staging URL in headless Chromium
2. Executes test plan from /plan-eng-review
3. Finds bug: "Submit button doesn't disable during loading"
4. Creates atomic commit with fix
5. Re-verifies: clicks again, confirms fix
6. Generates regression test: test_submit_button_disables()

I caught a bug in 30 seconds that would have taken me 30 minutes to find manually.

When to use: After /review clears the branch. Run on your staging URL.

/qa-only — QA Reporter

Your specialist: QA Reporter

What it does: Same methodology as /qa but report only. Pure bug report without code changes.

When to use: When you want a bug report without auto-fixes. Useful for audit trails.

/benchmark — Performance Engineer

Your specialist: Performance Engineer

What it does: Baselines page load times, Core Web Vitals, and resource sizes. Compares before/after on every PR.

Metrics tracked:

  • First Contentful Paint (FCP)
  • Largest Contentful Paint (LCP)
  • Cumulative Layout Shift (CLS)
  • Time to Interactive (TTI)
  • Bundle sizes

When to use: Before major refactors, after performance optimizations.
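A before/after comparison boils down to a percent change and a threshold. Here is a minimal sketch of that arithmetic, assuming you already have the two measurements as plain numbers. The helper names and the 10% threshold are assumptions for illustration, not gstack's actual logic.

```shell
# Hypothetical helper: percent change between a baseline and a new measurement.
# Positive means the metric grew (slower page, bigger bundle).
# Assumes the baseline is non-zero.
pct_change() {
  awk -v before="$1" -v after="$2" \
    'BEGIN { printf "%.1f\n", (after - before) / before * 100 }'
}

# Succeeds (exit 0) when the metric grew by more than <limit> percent.
is_regression() {
  awk -v before="$1" -v after="$2" -v limit="$3" \
    'BEGIN { exit !((after - before) / before * 100 > limit) }'
}

# Example with made-up numbers: LCP goes from 2000 ms to 2300 ms.
echo "LCP change: $(pct_change 2000 2300)%"
if is_regression 2000 2300 10; then
  echo "LCP regressed by more than 10%"
fi
```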

/browse — Browser Automation

Your specialist: Browser Automation

What it does: Real Chromium browser, real clicks, real screenshots. ~100ms per command.

Commands:

  • goto <url> — Navigate to URL
  • click <selector> — Click element
  • type <selector> <text> — Type in input
  • screenshot <name> — Capture screen
  • wait <selector> — Wait for element

When to use: Anytime you need to verify something in a browser. Used internally by /qa.

/setup-browser-cookies — Session Manager

Your specialist: Browser Session Manager

What it does: Imports cookies from your real browser (Chrome, Arc, Brave, Edge) into the headless session. Test authenticated pages.

When to use: Before /qa if your staging app requires login.

Security & Compliance

/cso — Chief Security Officer

Your specialist: Chief Security Officer

What it does: OWASP Top 10 + STRIDE threat model. Zero-noise: 17 false positive exclusions, 8/10+ confidence gate, independent finding verification. Each finding includes a concrete exploit scenario.

Example output:

[CRITICAL] SQL Injection in /api/users?id= parameter
Exploit: GET /api/users?id=1' OR '1'='1
Impact: Full database read access
Fix: Use parameterized queries
Confidence: 9/10

[FALSE POSITIVE EXCLUDED] XSS in admin panel
Reason: Output is properly escaped with DOMPurify

When to use: Before any production release. Run on any feature that handles user data or authentication.
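The confidence gate is the part that keeps the report quiet. As a rough sketch of the idea, assume each finding is a line of the form `confidence|severity|title`. That format and the `confidence_gate` name are invented for this example and are not /cso's real output format.

```shell
# Hypothetical finding format: "<confidence 0-10>|<severity>|<title>".
# Keeps only findings at or above the gate (8 by default, matching the
# "8/10+ confidence gate" described above).
confidence_gate() {
  awk -F'|' -v gate="${1:-8}" '$1 + 0 >= gate + 0'
}

# Example with made-up findings: only the first and third lines survive.
printf '%s\n' \
  '9|CRITICAL|SQL injection in /api/users' \
  '5|LOW|Possible XSS in admin panel' \
  '8|HIGH|Missing auth check' | confidence_gate
```

Pass a different number as the first argument (for example `confidence_gate 6`) to see how much noise a lower gate lets through.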

Shipping & Deployment

/ship — Release Engineer

Your specialist: Release Engineer

What it does: Syncs main, runs tests, audits coverage, pushes, opens PR. Bootstraps test frameworks if you don't have one.

Example workflow:

1. git checkout main && git pull
2. git checkout -b feature/daily-briefing
3. npm test (or bootstraps Jest/Vitest if missing)
4. Coverage audit: 42 tests → 51 tests (+9 new)
5. git push origin feature/daily-briefing
6. Opens PR: github.com/you/app/pull/42

When to use: After /qa clears the branch. One command from "tested" to "PR opened."

/land-and-deploy — Deployment Engineer

Your specialist: Deployment Engineer

What it does: Merges the PR, waits for CI and deploy, verifies production health. One command from "approved" to "verified in production."

Example workflow:

1. Merge PR via GitHub API
2. Wait for CI (GitHub Actions, CircleCI, etc.)
3. Wait for deploy (Vercel, Railway, Fly.io, etc.)
4. Run production health checks
5. Report: "Deployed to production, all checks passing"

When to use: After PR approval. Handles the entire release pipeline.
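The "wait, then verify" steps are essentially a polling loop. Below is a minimal sketch of one, assuming the health check is any command that exits 0 when the deploy is healthy. `wait_until` is a hypothetical helper, and the URL in the usage comment is a placeholder, not something gstack defines.

```shell
# Hypothetical helper: retry a check command until it succeeds or the
# attempts run out.
# usage: wait_until <attempts> <sleep_seconds> <command...>
wait_until() {
  attempts="$1"; delay="$2"; shift 2
  i=1
  while [ "$i" -le "$attempts" ]; do
    if "$@"; then
      echo "healthy after $i attempt(s)"
      return 0
    fi
    if [ "$i" -lt "$attempts" ]; then
      sleep "$delay"
    fi
    i=$((i + 1))
  done
  echo "still unhealthy after $attempts attempt(s)" >&2
  return 1
}

# Example (placeholder URL and endpoint):
# wait_until 30 10 curl --fail --silent --output /dev/null https://myapp.example.com/healthz
```

Keeping the check as a plain command means the same loop works for "is CI green", "is the deploy live", and "does the health endpoint answer".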

/canary — SRE

Your specialist: Site Reliability Engineer

What it does: Post-deploy monitoring loop. Watches for console errors, performance regressions, and page failures.

Monitors:

  • Browser console errors
  • API error rates
  • Page load regressions
  • JavaScript exceptions

When to use: Immediately after /land-and-deploy. Runs for 5-15 minutes post-deploy.

/document-release — Technical Writer

Your specialist: Technical Writer

What it does: Updates all project docs to match what you just shipped. Catches stale READMEs automatically.

Example output:

[UPDATED] README.md — added new /qa command to docs
[UPDATED] CHANGELOG.md — v0.4.2 release notes
[CREATED] docs/qa-guide.md — new QA workflow guide
[FLAGGED] API.md — may need update for new endpoints

When to use: After /ship or /land-and-deploy. Keeps docs in sync with code.

Reflection & Analytics

/retro — Engineering Manager

Your specialist: Engineering Manager

What it does: Team-aware weekly retro. Per-person breakdowns, shipping streaks, test health trends, growth opportunities. /retro global runs across all your projects and AI tools.

Example output, using the figures from Garry's /retro quoted earlier:

Week of March 17-23, 2026

- 140,751 lines added
- 362 commits
- ~115k net LOC
- Test coverage: 35% (↑2% from last week)

Shipping streak: 47 days

Per-person breakdowns. Test health trends. Growth opportunities.

It's like having an eng manager who actually cares about your growth.

When to use: End of week. Run /retro for team insights, /retro global for cross-project view.
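If you want to sanity-check numbers like "lines added" and "net LOC" yourself, plain git gets you close. This is a sketch under assumptions: `sum_shortstat` is a made-up helper that totals `git log --shortstat` output, and /retro may well count things differently.

```shell
# Hypothetical helper: sum insertions and deletions from
# `git log --shortstat` output read on stdin.
sum_shortstat() {
  awk '
    /files? changed/ {
      for (i = 1; i <= NF; i++) {
        if ($i ~ /^insertion/) added += $(i - 1)
        if ($i ~ /^deletion/)  removed += $(i - 1)
      }
    }
    END { printf "added=%d removed=%d net=%d\n", added, removed, added - removed }
  '
}

# Inside a git repository:
# git log --since="1 week ago" --shortstat --pretty=tformat: | sum_shortstat
# git rev-list --count --since="1 week ago" HEAD   # commit count for the week
```

Net LOC here is simply insertions minus deletions, which is consistent with the gap between "lines added" and "net LOC" in the output above.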

Power Tools (Safety & Automation)

| Command         | What it does                                                       |
|-----------------|--------------------------------------------------------------------|
| /careful        | Warns before destructive commands (rm -rf, DROP TABLE, force-push) |
| /freeze         | Restricts file edits to one directory                              |
| /guard          | /careful + /freeze (maximum safety)                                |
| /unfreeze       | Removes the /freeze boundary                                       |
| /setup-deploy   | One-time setup for /land-and-deploy                                |
| /autoplan       | CEO → design → eng review in one command                           |
| /gstack-upgrade | Upgrades gstack to latest version                                  |
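To get a feel for what a /careful-style warning involves, here is a deliberately naive sketch that pattern-matches a command line against the three destructive examples named above. The function names are hypothetical, and the real skill presumably does something more careful than substring matching.

```shell
# Hypothetical guard in the spirit of /careful: treat a command line as
# destructive if it contains one of the patterns named in this post.
is_destructive() {
  case "$1" in
    *"rm -rf"*|*"DROP TABLE"*|*"push --force"*|*"push -f"*) return 0 ;;
    *) return 1 ;;
  esac
}

# Warn instead of running when the command looks destructive.
confirm_or_skip() {
  if is_destructive "$1"; then
    echo "WARNING: destructive command: $1" >&2
    return 1
  fi
  echo "ok: $1"
}
```

Substring matching like this will produce false positives and miss creative variants, so treat it as a seatbelt, not a sandbox.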

Installation (Actually 30 Seconds)

Requirements:

  • Claude Code
  • Git
  • Bun v1.0+

Install to your machine:

Open Claude Code and paste:

git clone https://github.com/garrytan/gstack.git ~/.claude/skills/gstack \
  && cd ~/.claude/skills/gstack && ./setup

That's it. Nothing touches your PATH. Nothing runs in the background.

Add to your repo (so teammates get it on clone):

mkdir -p .claude/skills \
  && cp -Rf ~/.claude/skills/gstack .claude/skills/gstack \
  && rm -rf .claude/skills/gstack/.git \
  && cd .claude/skills/gstack && ./setup

Real files. No submodules. git clone just works.

Works on Codex, Gemini CLI, Cursor Too

gstack works on any agent that supports the SKILL.md standard. Skills live in .agents/skills/ and are discovered automatically.
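"Discovered automatically" most likely means the agent scans the skills directory for SKILL.md files. The sketch below shows that idea so you can check what an agent would see in your repo. `list_skills` is a hypothetical helper, and how each agent actually scans is an assumption here.

```shell
# Hypothetical discovery: list every directory under a skills root that
# contains a SKILL.md file. Defaults to .agents/skills in the current repo.
list_skills() {
  root="${1:-.agents/skills}"
  find "$root" -type f -name SKILL.md 2>/dev/null |
    while IFS= read -r file; do
      dirname "$file"
    done | sort
}

# Example: list_skills            # scans ./.agents/skills
#          list_skills ~/gstack   # scans a user-level checkout
```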

Install to one repo:

git clone https://github.com/garrytan/gstack.git .agents/skills/gstack
cd .agents/skills/gstack && ./setup --host codex

Install once for your user:

git clone https://github.com/garrytan/gstack.git ~/gstack
cd ~/gstack && ./setup --host codex

Auto-detect which agents you have:

git clone https://github.com/garrytan/gstack.git ~/gstack
cd ~/gstack && ./setup --host auto

Should You Use This?

Yes, if you're:

A founder or CEO — especially technical ones who still want to ship. gstack lets you move at startup speed without hiring a team.

New to Claude Code — structured roles instead of a blank prompt. If you're new to AI coding, this gives you guardrails.

A tech lead or staff engineer — rigorous review, QA, and release automation on every PR. Even if you only use /review and /qa, you'll catch bugs that would have reached production.

Building solo — if you're building alone, gstack is your virtual team.

In a YC startup — Garry built this for YC founders. If you're in the batch, this is the house stack.

Skip it, if you're:

On a team with established workflows — if you already have a review process, CI/CD pipeline, and design system, gstack might be overkill. Pick individual skills instead of the full sprint.

Not using Claude Code — gstack is optimized for Claude Code. It works on Codex, Gemini CLI, and Cursor, but the experience is built for Claude.

Prefer freeform AI — if you like open-ended prompts and seeing what happens, gstack's structure will feel constraining. It's designed for rigor, not exploration.

The Philosophy (It's Not Just Tools)

gstack isn't just tools. It's a philosophy.

Three principles stuck with me:

1. Boil the Lake

Don't half-boil the lake. If you're going to do something, do it completely. Half measures create more work than full commitment.

2. Search Before Building

Before writing code, search for existing solutions. The best code is code you don't write.

3. The Iron Law of Debugging

No fixes without investigation. Three failed fixes, stop and reassess.

This exists because AI agents (and humans) tend to spray fixes without understanding root causes.

The Real Takeaway

We're witnessing a fundamental shift in software development.

One person with the right tooling can now move faster than a traditional team of twenty.

This isn't theory. Garry's doing it. Peter Steinberger did it with OpenClaw (247K GitHub stars, essentially solo). I'm seeing it in my own workflow after one week.

The tooling is here. It's free. MIT licensed. Open source.

The question is: what will you build with it?

Try It Yourself

Repo: github.com/garrytan/gstack

Installation: 30 seconds

Cost: Free forever

Skills: 28 specialists ready to go

Start with /office-hours on your next feature idea. See if the output changes how you think about the problem.

Then run /review on your current branch. Catch the bugs before production.

Then /qa on your staging URL. Test like a real user.

Eight commands later, you'll understand why Garry ships 20K lines/day.

P.S. — If you're building something interesting with gstack, drop a comment. I'd love to see what you're shipping.
