DEV Community: Muhammad Ali Kazmi

The Complete Beginner to Advanced Guide to ChatGPT Codex

Muhammad Ali Kazmi — Thu, 02 Apr 2026 15:13:46 +0000

Most people use Codex like a smarter autocomplete.

That is usually where the frustration starts.

Codex works much better when you treat it like a teammate with access to your repo, tools, tests, and instructions. Once that clicks, the quality of the output changes fast.

This guide is for both camps:

the person opening Codex for the first time
the person who already tried it, got mixed results, and wants a workflow they can trust

I verified this guide against official OpenAI Codex docs and Help Center material on April 2, 2026. Since Codex moves fast, that matters.

What Codex Actually Is

Codex is OpenAI's coding agent.

It can work in your terminal, inside supported IDEs, inside the Codex app, and in cloud-backed workflows. According to OpenAI's Help Center, you can use it to write code, review changes, run commands, execute tests, and delegate work in isolated sandboxes.

That means the right mental model is not:

Ask for code and hope for the best.

It is:

Give Codex the task, the context, the rules, and the definition of done.

That one shift fixes a surprising number of bad Codex sessions.

Quick Start in 5 Minutes

As of April 2, 2026, OpenAI's Help Center says Codex is included with ChatGPT Plus, Pro, Business, and Enterprise/Edu plans, and temporarily also included with Free and Go.

If you want the CLI, the official install command is:

npm i -g @openai/codex

Then run:

codex

The first run prompts you to sign in with your ChatGPT account or an API key.

OpenAI's CLI docs also note:

CLI support is available on macOS and Linux
Windows support is experimental
WSL is the recommended path on Windows for the best experience

If you prefer GUI-heavy workflows, the Codex app is worth a look. OpenAI positions it as the place for multiple parallel agents, worktrees, automations, and built-in git flows. If you already live in VS Code, Cursor, or Windsurf, the IDE extension is the most natural entry point.

My recommendation is simple:

Start with CLI if you want to learn the fundamentals
Use the IDE extension if most of your work is file-by-file editing
Use the Codex app when you want multiple concurrent threads, worktrees, or automations

The Mental Model That Makes Codex Better

OpenAI's best practices guide gives a simple default structure for prompts. Use these four parts every time:

Part	What it means
Goal	What exactly should change?
Context	Which files, folders, docs, errors, or examples matter?
Constraints	What rules, architecture, safety limits, or conventions must be followed?
Done when	How do we know the task is complete?

That is the whole game.

Most bad Codex output comes from missing one of these.

For example, this is weak:

Fix auth.

This is much better:

Goal: Fix the login redirect loop for authenticated users.
Context: The issue is in `src/middleware.ts`, `src/lib/auth.ts`, and the `/login` flow. It started after the session cookie rename.
Constraints: Do not change the database schema. Keep the current JWT strategy. Avoid unrelated refactors.
Done when: Logged-in users can refresh `/dashboard` without being redirected to `/login`, and the auth test suite passes.

That prompt is not fancy. It is just clear.

Codex rewards clarity much more than clever prompting.
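If you want to make the four-part structure hard to skip, you can template it. This is a minimal sketch, not anything Codex ships: `TaskPrompt` and `render` are names I made up for illustration, and the example values are shortened from the prompt above.

```python
from dataclasses import dataclass


@dataclass
class TaskPrompt:
    """The four parts of a task prompt: goal, context, constraints, done-when."""

    goal: str
    context: str
    constraints: str
    done_when: str

    def render(self) -> str:
        """Return the prompt text, refusing to render if any part is blank."""
        parts = {
            "Goal": self.goal,
            "Context": self.context,
            "Constraints": self.constraints,
            "Done when": self.done_when,
        }
        missing = [label for label, text in parts.items() if not text.strip()]
        if missing:
            raise ValueError(f"Missing prompt parts: {', '.join(missing)}")
        return "\n".join(f"{label}: {text.strip()}" for label, text in parts.items())


prompt = TaskPrompt(
    goal="Fix the login redirect loop for authenticated users.",
    context="The issue is in src/middleware.ts and src/lib/auth.ts.",
    constraints="Do not change the database schema. Avoid unrelated refactors.",
    done_when="Logged-in users can refresh /dashboard and the auth tests pass.",
)
print(prompt.render())
```

The point of the `ValueError` is that "Fix auth." cannot slip through: a prompt with a goal and nothing else fails before it ever reaches Codex.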

Your First Good Prompts

If you are new, save these.

1. Fix a Bug

Goal: Fix a bug in [feature].
Context: The problem appears in [files]. The current behavior is [bad behavior]. The expected behavior is [expected behavior]. Relevant error/output: [paste it].
Constraints: Keep the change minimal. Do not rename public APIs. No unrelated formatting changes.
Done when: The bug no longer reproduces, the relevant tests pass, and you explain the root cause in plain English.

2. Build a Feature

Goal: Implement [feature].
Context: Follow the existing patterns in [files/components]. Use [library/tool] if possible. The UI/API should match [reference].
Constraints: Keep this scoped to [files or directories]. Add tests. Do not change unrelated modules.
Done when: The feature works end to end, tests pass, and the final diff is limited to the intended files.

3. Refactor Safely

Goal: Refactor [module] for maintainability.
Context: Start by understanding the current flow in [files]. Identify risks before editing. There are existing callers in [paths].
Constraints: Preserve behavior. No public API breaks. Update tests if needed.
Done when: The code is simpler, behavior is unchanged, and you summarize what changed and what was intentionally left alone.

4. Debug Before Editing

Do not edit anything yet.
First, inspect the relevant files, explain the likely root causes, rank them by confidence, and propose the smallest safe fix.
Only after that, ask for confirmation or proceed with the smallest fix if confidence is high.

That last one is underrated.

A lot of Codex frustration comes from asking for implementation before understanding.

Beginner Workflow: What To Do on Real Tasks

If you are just starting, use this sequence.

Step 1: Ask for a plan on anything non-trivial

OpenAI explicitly recommends planning first for difficult or ambiguous tasks. In Codex, Plan mode exists for exactly this reason.

If the task is bigger than a quick bug fix, do not start with code generation. Start with:

Use plan mode. Inspect the relevant files, ask clarifying questions if needed, then propose the implementation plan before writing code.

That reduces wasted edits.

Step 2: Point Codex at the right files

Large repos are where vague prompting gets expensive.

Tell Codex where to look:

exact files
exact folder
exact failing test
exact error message
exact screenshot or spec

Do not make it guess the neighborhood.

Step 3: Tell it what not to touch

This is where many people lose control of the diff.

If you do not want a refactor, say so.

If you do not want renamed files, say so.

If you do not want dependency changes, say so.

A lot of Codex quality is really scope control.

Step 4: Tell it how to verify the work

OpenAI's best practices guide is very clear here: do not stop at asking Codex to make a change. Ask it to create tests when needed, run checks, confirm the behavior, and review the result.

A solid line to add is:

Run the relevant tests, lint/type checks if applicable, and review the diff for regressions before you consider the task complete.

Step 5: Review like a teammate wrote it

Codex is good. It is not exempt from review.

You should still check:

did it solve the actual problem?
did it change more than necessary?
did it preserve existing behavior?
did it add the right tests?
did it quietly break something adjacent?

If you accept output without review, the problem is not Codex. The problem is process.
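The "did it change more than necessary?" check is easy to script. Here is a rough sketch that takes a list of changed paths, for example the output of `git diff --name-only`, and flags anything outside the directories you meant to touch. The function name and the sample paths are hypothetical.

```python
from pathlib import PurePosixPath


def out_of_scope(changed_files: list[str], allowed_dirs: list[str]) -> list[str]:
    """Return the changed files that fall outside every allowed directory."""
    allowed = [PurePosixPath(d) for d in allowed_dirs]
    offenders = []
    for name in changed_files:
        path = PurePosixPath(name)
        # A file is in scope if it is one of the allowed paths or sits under one.
        if not any(path == root or root in path.parents for root in allowed):
            offenders.append(name)
    return offenders


# Paths as they would appear in `git diff --name-only` output (hypothetical).
changed = ["src/lib/auth.ts", "tests/auth.test.ts", "package.json"]
print(out_of_scope(changed, ["src", "tests"]))  # ['package.json']
```

An unexpected `package.json` in that list is exactly the kind of quiet dependency change worth catching before you merge.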

The First Two Files Serious Users Set Up

Once you get one or two good sessions, stop repeating yourself manually.

OpenAI's docs say the next step is reusable guidance through AGENTS.md and durable configuration through config.toml.

1. `AGENTS.md`

OpenAI describes AGENTS.md as an open-format README for agents. It is the best place to encode how Codex should work in a repo.

A practical starter file should cover:

repo layout
build, test, and lint commands
engineering conventions
do-not rules
what done means
how to verify work

A minimal starter looks like this:

# AGENTS.md

## Repo map
- App code: `src/`
- Tests: `tests/`
- Shared utilities: `src/lib/`

## Commands
- Dev: `pnpm dev`
- Test: `pnpm test`
- Lint: `pnpm lint`
- Typecheck: `pnpm typecheck`

## Rules
- Keep diffs minimal.
- Do not rename public APIs without explicit instruction.
- Follow existing folder conventions.
- Add or update tests for behavior changes.

## Done when
- Relevant tests pass.
- Lint and typecheck pass.
- Final diff is reviewed for regressions.

OpenAI also recommends keeping AGENTS.md short and practical.

That is correct.

Do not turn it into a manifesto.

If Codex makes the same mistake twice, update AGENTS.md. That is a much better loop than rewriting the same instruction in every prompt.

The CLI also has /init to scaffold a starter AGENTS.md.
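If you want to keep AGENTS.md from drifting, a small check can confirm the sections from the starter file above are still there. This is a sketch under my own assumptions: the required section names come from that starter, not from any official schema, and `missing_sections` is a made-up helper.

```python
import re

# Section headings taken from the starter file above; adjust to your own layout.
REQUIRED_SECTIONS = ("Repo map", "Commands", "Rules", "Done when")


def missing_sections(
    agents_md: str, required: tuple[str, ...] = REQUIRED_SECTIONS
) -> list[str]:
    """Return required section names that have no matching '## ' heading."""
    headings = {
        match.group(1).strip().lower()
        for match in re.finditer(r"^##\s+(.+)$", agents_md, flags=re.MULTILINE)
    }
    return [name for name in required if name.lower() not in headings]


sample = """# AGENTS.md

## Repo map
- App code: `src/`

## Commands
- Test: `pnpm test`

## Rules
- Keep diffs minimal.
"""
print(missing_sections(sample))  # ['Done when']
```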

2. `config.toml`

A sane starting point for local work is:

model = "gpt-5.4"
approval_policy = "on-request"
sandbox_mode = "workspace-write"
web_search = "cached"
model_reasoning_effort = "high"

[features]
multi_agent = true
shell_snapshot = true

Why these matter:

approval_policy = "on-request" keeps you in control without constant friction
sandbox_mode = "workspace-write" is a good default for normal repo work
web_search = "cached" uses OpenAI's web search cache by default, which the docs position as safer than blindly fetching arbitrary live pages
model_reasoning_effort should go up as tasks get more complex

OpenAI's guidance here is sensible: keep approvals and sandboxing tight by default, then loosen only for trusted repos and specific workflows.

That is the right default attitude.

Intermediate Workflow: How People Move Past One-Off Prompts

This is where Codex starts becoming a real system instead of a novelty.

Use one thread per task

OpenAI explicitly warns against using one giant thread per project.

That leads to bloated context and worse results over time.

Use one thread per coherent task. If the work branches, fork the thread.

Useful session controls from the docs:

/fork to branch the conversation while keeping context
/compact when the thread is getting long
/resume to pick work back up
/status to inspect the current session state

Review inside the workflow

Codex supports review loops, not just code generation.

OpenAI documents /review for reviewing a branch, commit, or uncommitted changes. That is worth using, especially after larger diffs.

A strong pattern is:

Ask Codex to implement
Ask Codex to run checks
Ask Codex to review its own diff against your repo rules
Then do your human review

Keep verification explicit

If your project has specific commands, put them in AGENTS.md and repeat them in the prompt when needed.

For example:

After the change, run `pnpm test auth`, then `pnpm lint`, then review the diff for auth regressions.

Codex is much more reliable when it can see the finish line.
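The same "run this, then this, then stop if something fails" sequence is worth having as a script, so you and Codex run the identical checks. A minimal sketch, with stand-in commands so it runs anywhere; `run_checks` is a name I picked for illustration.

```python
import subprocess
import sys


def run_checks(commands: list[list[str]]) -> list[tuple[str, int]]:
    """Run each command in order and stop at the first failure.

    Returns (command, exit code) pairs for every command that actually ran.
    """
    results = []
    for command in commands:
        completed = subprocess.run(command, capture_output=True, text=True)
        results.append((" ".join(command), completed.returncode))
        if completed.returncode != 0:
            break  # later checks are skipped once one fails
    return results


# Stand-in commands so the sketch runs anywhere; in a real repo these would be
# the project's own checks, e.g. ["pnpm", "test", "auth"] and ["pnpm", "lint"].
passing = [sys.executable, "-c", "raise SystemExit(0)"]
failing = [sys.executable, "-c", "raise SystemExit(3)"]
for name, code in run_checks([passing, failing, passing]):
    print(code, name)
```

Only two results come back for three commands, because the third never runs after the failure.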

Advanced Workflow: Worktrees, Subagents, MCP, Skills, Automations

This is where Codex gets genuinely powerful.

1. Worktrees for parallel work

OpenAI's worktree docs are one of the most practical parts of the whole Codex stack.

Worktrees let Codex run multiple independent tasks in the same project without interfering with each other. The docs frame Local as your foreground workspace and Worktree as a background workspace.

That matters because the common mistake is obvious:

one active task in your local checkout
another Codex task editing the same branch
confusion, collisions, messy git state

Worktrees fix that.

OpenAI's documented advantages are straightforward:

work in parallel without disturbing local setup
queue background work while you stay focused on the foreground
hand work back into local later when you are ready to inspect or test

If you want Codex doing one larger task while you keep shipping something else, use a worktree.
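The Codex app manages its own worktrees, so you do not need to script this. Still, it helps to see what a worktree is underneath: a second checkout of the same repo on its own branch. The sketch below builds the plain Git command for that and only prints it by default. The path and branch name are hypothetical.

```python
import subprocess


def worktree_command(path: str, branch: str, base: str = "HEAD") -> list[str]:
    """Build a `git worktree add` call that checks out a new branch in `path`."""
    return ["git", "worktree", "add", "-b", branch, path, base]


def create_worktree(path: str, branch: str, dry_run: bool = True) -> list[str]:
    """Print the command, and run it only when dry_run is False."""
    command = worktree_command(path, branch)
    print(" ".join(command))
    if not dry_run:
        subprocess.run(command, check=True)
    return command


# Hypothetical path and branch name for a background task.
create_worktree("../myrepo-flaky-tests", "codex/flaky-tests")
```

Because the second checkout lives in its own directory, edits there never touch the files in your foreground workspace.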

2. Subagents for truly parallel tasks

OpenAI says Codex can spawn specialized agents in parallel and then consolidate the results.

This is useful when the task is actually parallel, not just large.

Good examples:

one agent reviews security
one agent reviews bugs
one agent inspects flaky tests
one agent maps the codebase around a subsystem

Bad example:

splitting a tightly coupled change across five agents that all want the same files

OpenAI also notes two important constraints:

subagents only run when you explicitly ask for them
they cost more tokens than a single-agent run

Use them when parallelism is real, not because it sounds advanced.
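A quick way to test whether a split is really parallel is to list the files each task expects to edit and look for overlap. This is a rough sketch, not a Codex feature; the task names and file paths are placeholders.

```python
from itertools import combinations


def overlapping_tasks(
    task_files: dict[str, set[str]],
) -> list[tuple[str, str, set[str]]]:
    """Return every pair of tasks that touch at least one common file."""
    conflicts = []
    for (name_a, files_a), (name_b, files_b) in combinations(task_files.items(), 2):
        shared = files_a & files_b
        if shared:
            conflicts.append((name_a, name_b, shared))
    return conflicts


# Hypothetical task split; file names are placeholders.
tasks = {
    "fix-login-redirect": {"src/lib/auth.ts", "src/middleware.ts"},
    "stabilize-cart-tests": {"tests/cart.test.ts"},
    "rename-session-cookie": {"src/lib/auth.ts", "src/lib/session.ts"},
}
for a, b, shared in overlapping_tasks(tasks):
    print(f"{a} and {b} both touch {sorted(shared)}")
```

Any pair that shows up here is a candidate for one agent, not two.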

3. MCP when the context lives outside the repo

OpenAI's docs are crisp on MCP: use it when the context Codex needs lives outside the repo and changes frequently.

That means things like:

internal docs
ticketing systems
dashboards
design systems
external APIs
runbooks

If you keep pasting the same outside context into prompts, that is usually an MCP smell.

OpenAI's warning here is also important: do not wire in every tool on day one. Start with one or two tools that remove a real manual loop.

That is good advice. Tool sprawl makes agents worse, not better.

4. Skills when you repeat the same workflow

OpenAI's rule of thumb is excellent:

If you keep reusing the same prompt or correcting the same workflow, it should probably become a skill.

Skills are great for:

incident summaries
release note drafting
repeated debugging flows
checklist-based PR reviews
migration planning

Keep each skill narrow. One job. Clear input. Clear output.

5. Automations when the workflow is stable

OpenAI puts this nicely: skills define the method, automations define the schedule.

That is the right order.

Do not automate a workflow that still needs a lot of steering.

Once it is predictable, automations are useful for things like:

CI failure summaries
recent commit summaries
scheduled repo health checks
standup drafts
recurring analysis jobs

Why Codex Gets Stuck, And How To Recover Fast

This is the section you will come back to.

Symptom: Codex makes a big messy diff

Likely cause:

the prompt was too open-ended
no file scope was given
no constraints were stated

Fix:

Redo this with the smallest safe change.
Only edit the files directly involved.
Do not refactor unrelated code.
Explain the exact files you plan to change before editing.

Symptom: Codex edits before understanding the bug

Likely cause:

you asked for implementation too early

Fix:

Stop coding.
Inspect the relevant files and logs first.
List the most likely root causes, rank them by confidence, and propose the smallest fix.
Do not edit anything until that analysis is complete.

Symptom: Codex solves the wrong problem

Likely cause:

the goal was vague
done criteria were missing

Fix:

Reset the task.
Goal: [one sentence]
Context: [files, errors, references]
Constraints: [rules]
Done when: [testable outcomes]
Repeat your understanding of the task before making changes.

Symptom: Codex keeps repeating the same mistakes across sessions

Likely cause:

repo rules live only in your head

Fix:

Move the rule into AGENTS.md.

This is exactly what OpenAI recommends. Repeated friction should become reusable guidance.

Symptom: Codex is good on small tasks and weak on bigger ones

Likely cause:

you skipped planning
the thread is too broad
the task should be broken down

Fix:

use Plan mode
split the work into smaller tasks
use one thread per task
fork when the work branches

Symptom: Two Codex tasks step on each other

Likely cause:

multiple live threads on the same files or branch

Fix:

Use git worktrees.

OpenAI explicitly warns against running live threads on the same files without worktrees.

Symptom: Codex cannot verify the result well

Likely cause:

the repo does not expose clear commands
test and lint steps were not provided

Fix:

Put the actual commands in AGENTS.md, then restate them in the task.

Symptom: Codex needs information from outside the repo

Likely cause:

missing external context

Fix:

Use MCP for repeatable outside context. If it is a one-off research task, use web search carefully. OpenAI's config docs note that cached web search is the default and should still be treated as untrusted.

The Copy-Paste Rescue Prompts

These are the ones I would actually keep in a note.

Rescue Prompt 1: Plan Before Touching Code

Use plan mode.
Inspect the relevant files first, ask any clarifying questions, and propose the implementation plan.
Do not write code yet.

Rescue Prompt 2: Smallest Safe Fix

Fix this with the smallest safe change.
No unrelated refactors.
No dependency changes.
No file moves unless absolutely necessary.
Explain the intended diff before editing.

Rescue Prompt 3: Understand Before Editing

First explain:
1. what the current code is doing
2. where the bug most likely is
3. what the smallest fix is
4. what could regress if we change it
Then implement only after that analysis.

Rescue Prompt 4: Tight Review Loop

After making the change:
- add or update tests if needed
- run the relevant test, lint, and typecheck commands
- review the diff for regressions
- summarize what changed, what was verified, and any remaining risk

Rescue Prompt 5: Stay Inside the Lane

Only work in these files:
- `src/...`
- `tests/...`
If you believe another file must change, stop and justify it first.

Rescue Prompt 6: Turn This Into Durable Guidance

You made the same mistake twice on this repo.
Write a short retrospective and propose the exact `AGENTS.md` update that would prevent it next time.

That last prompt is how you stop paying the same tax repeatedly.

What Advanced Users Usually Figure Out

After enough Codex usage, the lessons are consistent.

1. Better repos get better Codex output

If your project has:

clear structure
real tests
reliable commands
a useful AGENTS.md
stable conventions

Codex looks much smarter.

That is not magic. The environment is simply legible.

2. Planning is not overhead

People skip planning because they want speed.

Then they burn that time back in rework.

OpenAI's documentation leans hard toward planning for difficult tasks, and I think that is the correct default.

3. Reusability beats prompt gymnastics

A reusable AGENTS.md, one or two good skills, and a sane config file will outperform heroic one-off prompts over time.

4. Parallelism only helps when the task is actually parallel

Use worktrees and subagents when the work can truly branch.

Do not force parallelism into tightly coupled edits.

5. Codex is strongest when it can inspect, change, run, and verify

The more artificial the environment, the worse the results.

If Codex cannot see the repo properly, cannot run the right commands, or cannot verify success, you are leaving a big part of the product unused.

Final Thoughts

If you only remember three things from this guide, make it these:

Use Goal + Context + Constraints + Done when in every serious prompt.
Put recurring rules into AGENTS.md instead of repeating yourself forever.
Ask Codex to plan, verify, and review, not just generate code.

That is the difference between random AI help and a workflow you can actually depend on.

Codex is not hard to use.

But it is easy to use badly.

Once you stop treating it like a code slot machine and start treating it like an engineer with tools, it becomes much more useful.

Official Sources

If OpenAI changes the product behavior after April 2, 2026, treat the official OpenAI Codex docs and Help Center as the source of truth and update your workflow accordingly.

The Complete Beginner's Guide to GSD (Get Shit Done) Framework for Claude Code

Muhammad Ali Kazmi — Tue, 17 Mar 2026 21:36:51 +0000

If you've been using Claude Code, you've probably hit that wall. You know the one: everything is going great, Claude is writing perfect code, and then suddenly, around the 50% context mark, things start to fall apart. Code gets sloppy. Requirements get forgotten. Claude starts "being more concise" (translation: cutting corners).

This is called context rot, and it's the #1 reason vibecoding has a bad reputation.

The GSD (Get Shit Done) framework solves this. Let me walk you through everything.

What is GSD?

GSD is an open-source, lightweight meta-prompting and spec-driven development system built specifically for Claude Code. Created by Lex Christopherson (aka glittercowboy), it has exploded to 31,000+ GitHub stars and is trusted by engineers at Amazon, Google, Shopify, and Webflow.

At its core, GSD does one thing: it keeps Claude Code operating at peak quality throughout your entire project, no matter how large or complex.

How? By breaking your project into small, well-defined tasks -- each executed in a fresh 200K-token context window by specialized sub-agents. Your main session stays lean at 30-40% context usage while the heavy lifting happens in isolated, pristine environments.

"It's not magic. It's just really good context engineering wrapped in a workflow that doesn't get in your way."
-- Lex Christopherson, GSD Creator

Why Should You Care?

Here's the reality of Claude Code without a framework:

Context Utilization	Quality
0-30%	Peak quality, thorough work
30-50%	Good but starting to rush
50-70%	Corner-cutting, missed requirements
70%+	Hallucinations, forgotten context

GSD ensures every task runs in the 0-30% sweet spot. Every. Single. Time.
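To make the table concrete, here is a tiny sketch that maps token usage to those bands. The 200K window comes from the description above. Which band an exact boundary such as 30% belongs to is my assumption, because the table's ranges overlap.

```python
# Bands copied from the table above; where each boundary value falls
# (e.g. exactly 30%) is an assumption, since the table ranges overlap.
BANDS = [
    (30, "Peak quality, thorough work"),
    (50, "Good but starting to rush"),
    (70, "Corner-cutting, missed requirements"),
]


def quality_band(used_tokens: int, window_tokens: int = 200_000) -> str:
    """Map context usage to the article's quality band."""
    percent = 100 * used_tokens / window_tokens
    for upper, label in BANDS:
        if percent < upper:
            return label
    return "Hallucinations, forgotten context"


print(quality_band(40_000))   # 20% -> Peak quality, thorough work
print(quality_band(120_000))  # 60% -> Corner-cutting, missed requirements
```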

The Before vs After

Without GSD:

Start a project in Claude Code
First 3-4 tasks go great
Context fills up
Quality degrades silently
You restart sessions, lose context, repeat
Code inconsistencies pile up

With GSD:

Define your project once
GSD creates a roadmap with phases
Each task runs in a fresh sub-agent context
Quality stays consistent from task 1 to task 100
Clean git history with atomic commits
Everything is documented and traceable

Installation (2 Minutes)

Getting started is dead simple:

npx get-shit-done-cc@latest

The installer will ask you two things:

Runtime -- Choose Claude Code (or OpenCode, Gemini CLI, Codex, Copilot, Antigravity if you use those)
Location -- Global (all projects) or Local (current project only)

For beginners, I recommend Claude Code + Global.

Verify Installation

Open Claude Code and run:

/gsd:help

If you see the help menu, you're good to go.

Recommended: Skip Permissions Mode

GSD spawns multiple agents that run commands. Getting prompted to approve every git commit and date command defeats the purpose, so you can start Claude Code with permission prompts disabled:

claude --dangerously-skip-permissions

Alternatively, add granular permissions to .claude/settings.json:

{
  "permissions": {
    "allow": [
      "Bash(date:*)",
      "Bash(echo:*)",
      "Bash(cat:*)",
      "Bash(ls:*)",
      "Bash(mkdir:*)",
      "Bash(git add:*)",
      "Bash(git commit:*)",
      "Bash(git status:*)",
      "Bash(git log:*)",
      "Bash(git diff:*)"
    ]
  }
}
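If you already have a settings file, you probably want to add these rules without wiping out what is there. A minimal sketch of that merge, working on the parsed JSON; `add_allow_rules` is a hypothetical helper and the example uses only a few of the rules listed above.

```python
import json

# A subset of the allow rules shown above.
GSD_ALLOW = ["Bash(git add:*)", "Bash(git commit:*)", "Bash(git status:*)"]


def add_allow_rules(settings: dict, rules: list[str]) -> dict:
    """Return a copy of settings with rules appended to permissions.allow.

    Existing entries and their order are kept; duplicates are not added.
    """
    merged = json.loads(json.dumps(settings))  # cheap deep copy for JSON data
    allow = merged.setdefault("permissions", {}).setdefault("allow", [])
    for rule in rules:
        if rule not in allow:
            allow.append(rule)
    return merged


existing = {"permissions": {"allow": ["Bash(ls:*)", "Bash(git status:*)"]}}
print(json.dumps(add_allow_rules(existing, GSD_ALLOW), indent=2))
```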

The GSD Workflow (Step by Step)

GSD follows a disciplined 6-step cycle. Think of it like a mini software development lifecycle, but without the enterprise BS:

New Project -> Discuss Phase -> Plan Phase -> Execute Phase -> Verify Work -> Complete Milestone

Let's walk through each step.

Step 1: Initialize Your Project

/gsd:new-project

This is where the magic starts. GSD will:

Interview you -- Ask questions until it fully understands your project (goals, constraints, tech preferences, edge cases)
Research -- Spawn parallel agents to investigate the domain (libraries, best practices, pitfalls)
Extract requirements -- Separate what's v1, v2, and out of scope
Create a roadmap -- Map requirements to executable phases

You approve the roadmap, and you're ready to build.

Files created:

PROJECT.md -- Your project vision (always loaded for context)
REQUIREMENTS.md -- Scoped requirements with phase traceability
ROADMAP.md -- Phases mapped to requirements
STATE.md -- Living state document for decisions and blockers
.planning/research/ -- Research findings

Pro tip: Come prepared with a detailed description. The more specific you are upfront, the fewer follow-up questions GSD needs to ask. Include goals, target users, core features, constraints, and tech stack preferences.

Step 2: Discuss the Phase

/gsd:discuss-phase 1

This is the step most beginners skip -- don't. This is where you shape the implementation.

Your roadmap has a sentence or two per phase. That's not enough context to build something the way you imagine it. GSD analyzes the phase and identifies gray areas:

Visual features? -- Layout, density, interactions, empty states
APIs/CLIs? -- Response format, error handling, flags
Content systems? -- Structure, tone, depth

For each gray area, it asks targeted questions. Your answers go into CONTEXT.md, which feeds directly into planning and execution.

Why this matters: Without this step, Claude makes assumptions. With it, Claude builds exactly what you envisioned.

"Plan twice. Prompt once." -- Mauvis Ledford

Step 3: Plan the Phase

/gsd:plan-phase 1

GSD now:

Researches -- How to implement this phase (guided by your CONTEXT.md)
Plans -- Creates 2-3 atomic task plans in XML structure
Verifies -- Checks plans against requirements, loops until they pass

Each plan is small enough to execute in a fresh context window. This is the secret sauce -- no single task ever gets degraded context.

Here's what an atomic plan looks like:

<task type="auto">
  <name>Create login endpoint</name>
  <files>src/app/api/auth/login/route.ts</files>
  <action>
    Use jose for JWT (not jsonwebtoken - CommonJS issues).
    Validate credentials against users table.
    Return httpOnly cookie on success.
  </action>
  <verify>curl -X POST localhost:3000/api/auth/login returns 200</verify>
  <done>Valid credentials return cookie, invalid return 401</done>
</task>

Precise instructions. No guessing. Verification built in.
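Because the plan is structured, it is also easy to check mechanically. The sketch below parses a task like the one above and fails if any part is missing. It is my own illustration of why the XML shape is useful, not how GSD reads plans internally, and the `action` text is shortened.

```python
import xml.etree.ElementTree as ET

PLAN = """
<task type="auto">
  <name>Create login endpoint</name>
  <files>src/app/api/auth/login/route.ts</files>
  <action>Use jose for JWT. Validate credentials against users table.</action>
  <verify>curl -X POST localhost:3000/api/auth/login returns 200</verify>
  <done>Valid credentials return cookie, invalid return 401</done>
</task>
"""


def load_task(xml_text: str) -> dict[str, str]:
    """Parse one <task> element and fail if a required child is missing or empty."""
    root = ET.fromstring(xml_text.strip())
    task = {"type": root.get("type", "")}
    for field in ("name", "files", "action", "verify", "done"):
        text = (root.findtext(field) or "").strip()
        if not text:
            raise ValueError(f"task is missing <{field}>")
        task[field] = text
    return task


task = load_task(PLAN)
print(task["name"], "->", task["verify"])
```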

Step 4: Execute the Phase

/gsd:execute-phase 1

This is where you sit back and watch GSD work. It:

Groups plans into waves -- Independent tasks run in parallel, dependent tasks wait
Spawns fresh sub-agents -- Each gets 200K tokens purely for implementation
Commits per task -- Every completed task gets its own atomic git commit
Verifies against goals -- Checks the codebase delivers what the phase promised

  WAVE 1 (parallel)          WAVE 2 (parallel)          WAVE 3
  +-----------+ +-----------+  +-----------+ +-----------+  +-----------+
  | User      | | Product   |  | Orders    | | Cart      |  | Checkout  |
  | Model     | | Model     |  | API       | | API       |  | UI        |
  +-----------+ +-----------+  +-----------+ +-----------+  +-----------+
       |            |               ^            ^               ^
       +------------+---------------+------------+               |
           Dependencies flow forward through waves

Walk away, come back to completed work with clean git history.
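The wave idea is a layered dependency sort: anything whose dependencies are finished can run together. Here is a minimal sketch of that grouping. It is not GSD's actual scheduler, and the dependency map is my reading of what the diagram implies.

```python
def plan_waves(dependencies: dict[str, set[str]]) -> list[list[str]]:
    """Group tasks into waves: a task runs once all of its dependencies are done."""
    remaining = {task: set(deps) for task, deps in dependencies.items()}
    done: set[str] = set()
    waves = []
    while remaining:
        # Everything whose dependencies are already finished can run in parallel.
        ready = sorted(task for task, deps in remaining.items() if deps <= done)
        if not ready:
            raise ValueError(f"Circular or unknown dependency among: {sorted(remaining)}")
        waves.append(ready)
        done.update(ready)
        for task in ready:
            del remaining[task]
    return waves


# Dependencies are a guess at what the diagram implies.
tasks = {
    "User Model": set(),
    "Product Model": set(),
    "Orders API": {"User Model", "Product Model"},
    "Cart API": {"User Model", "Product Model"},
    "Checkout UI": {"Orders API", "Cart API"},
}
for number, wave in enumerate(plan_waves(tasks), start=1):
    print(f"Wave {number}: {', '.join(wave)}")
```

With this input the two models land in wave 1, the two APIs in wave 2, and the checkout UI in wave 3, which matches the diagram.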

Step 5: Verify Your Work

/gsd:verify-work 1

Automated tests confirm that the code compiles and the tests pass. But does the feature work the way you expected? GSD:

Extracts testable deliverables -- What you should be able to do now
Walks you through each one -- "Can you log in with email?" Yes/no.
Diagnoses failures -- Spawns debug agents to find root causes
Creates fix plans -- Ready for re-execution

If everything passes, move on. If something's broken, run /gsd:execute-phase again.

Step 6: Rinse and Repeat

/gsd:discuss-phase 2
/gsd:plan-phase 2
/gsd:execute-phase 2
/gsd:verify-work 2
...
/gsd:complete-milestone

Loop through all phases. When done, /gsd:complete-milestone archives everything and tags the release.

Want to build more? /gsd:new-milestone starts the next version.

Quick Mode: For Smaller Tasks

Not everything needs the full workflow. For ad-hoc tasks:

/gsd:quick
> What do you want to do? "Add dark mode toggle to settings"

Quick mode gives you GSD guarantees (atomic commits, state tracking) without the full research and planning overhead.

You can add flags for more thorough work:

--discuss -- Gather your preferences first
--research -- Investigate approaches before planning
--full -- Add plan-checking and post-execution verification

Flags are composable: /gsd:quick --discuss --research --full gives you the full experience in a lighter package.

Essential Commands Cheat Sheet

Command	What It Does
`/gsd:new-project`	Initialize a new project with full planning
`/gsd:discuss-phase N`	Capture implementation decisions
`/gsd:plan-phase N`	Research + plan + verify for a phase
`/gsd:execute-phase N`	Execute plans in parallel waves
`/gsd:verify-work N`	Manual user acceptance testing
`/gsd:quick`	Ad-hoc task with GSD guarantees
`/gsd:progress`	Where am I? What's next?
`/gsd:map-codebase`	Analyze existing codebase (brownfield)
`/gsd:debug`	Systematic debugging with persistent state
`/gsd:pause-work`	Save state for later
`/gsd:resume-work`	Restore from last session
`/gsd:help`	Show all commands

Configuration: Model Profiles

GSD lets you control which Claude model each agent uses. This is crucial for managing costs:

Profile	Planning	Execution	Verification
`quality`	Opus	Opus	Sonnet
`balanced` (default)	Opus	Sonnet	Sonnet
`budget`	Sonnet	Sonnet	Haiku

Switch profiles:

/gsd:set-profile budget

Common Pitfalls (And How to Avoid Them)

Based on hundreds of user reports across Reddit, Hacker News, and Twitter, here are the mistakes beginners make most often:

1. Using GSD for Tiny Tasks

GSD's full workflow spawns multiple agents. For a color change or a typo fix, that's massive overkill.

Fix: Use /gsd:quick for small tasks, or just prompt Claude directly.

2. Rushing Through the Discussion Phase

/gsd:discuss-phase exists because Claude makes assumptions when you don't specify preferences. Skipping it means you'll iterate more later.

Fix: Spend 5-10 minutes in the discussion phase. It saves hours of back-and-forth.

3. Not Mapping Existing Codebases

Starting GSD on an existing project without context leads to conflicts with existing patterns.

Fix: Always run /gsd:map-codebase first for brownfield projects.

4. Ignoring Token Costs

GSD is token-heavy. Multiple users reported burning through Pro plan ($20/mo) limits quickly. One user reported a 4:1 overhead ratio -- for every 1 token writing code, 4 tokens went to orchestration.

Fix: The Max plan ($100-200/mo) is strongly recommended for regular GSD usage. Use the budget model profile for less critical phases.
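To see what a 4:1 overhead means for a budget, here is the arithmetic as a short sketch. The ratio is the one user's report quoted above, and the 50,000-token figure is an arbitrary example, not a measurement.

```python
def token_budget(code_tokens: int, overhead_ratio: float = 4.0) -> dict[str, float]:
    """Split total usage into code and orchestration for a given overhead ratio."""
    orchestration = code_tokens * overhead_ratio
    total = code_tokens + orchestration
    return {
        "code": code_tokens,
        "orchestration": orchestration,
        "total": total,
        "code_share": code_tokens / total,
    }


# 50,000 is an arbitrary example figure, not a measurement.
usage = token_budget(50_000)
print(f"total={usage['total']:,.0f} code_share={usage['code_share']:.0%}")
```

At that ratio, only a fifth of the tokens you pay for end up writing code.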

5. Vague Project Descriptions

Vague input at /gsd:new-project triggers excessive follow-up questions and unfocused research.

Fix: Prepare a detailed description before starting. Include: goals, target users, core features, constraints, and tech preferences.

6. Not Clearing Context Between Phases

Letting context accumulate across phases defeats GSD's purpose.

Fix: Use /clear between major phases to keep your session lean.

GSD vs Other Frameworks

The spec-driven development space has exploded. Here's how GSD compares:

Framework	Philosophy	Best For
GSD	Lightweight spec-driven, context-rot prevention	Solo devs, multi-phase projects
BMAD	Enterprise SDLC, 21+ agents	Teams, enterprise projects
Ralph Loop	Self-iterating autonomy	Bulk refactors, overnight runs
Superpowers	Skills + guardrails	Speed-focused workflows
Spec Kit	Static markdown specs	Vendor-independent workflows

GSD's sweet spot: You're a solo developer or small team building something non-trivial, and you want consistent quality without enterprise ceremony.

As one Hacker News user put it:

"I love the focus on defining what needs to be done and the criteria for completion. These are great practices with or without AI."

Real-World Results

Here's what actual users are reporting:

Mauvis Ledford (LinkedIn): Spent 8 hours testing GSD, compressed 2-3 days of work into ~1 day
Steve Adams (Hacker News): Successfully delegated Effect pipeline refactoring, DuckDB error parsing, and test suite auditing
Max Buckley (LinkedIn): Completed a 6-month research project (comparing GLiNER, Mistral 7B, and Claude Haiku for NER) in days
Esteban Torres (Blog): GSD produced functionally correct code on first execution for a BlogWatcher UI, requiring only minor stylistic tweaks

Not everyone loves it though. Some users find it "so slow, too detailed" for their workflow, and others report that the overhead doesn't pay off for simple projects. Know your use case.

When to Use GSD vs When to Skip It

Use GSD When:

Building a multi-page app or complex feature
Working on a project spanning multiple sessions
You need consistent quality across dozens of tasks
Refactoring large existing codebases
You want clean, traceable git history

Skip GSD When:

Quick bug fixes or one-liner changes
Initial prototyping/exploration
You're on a tight token budget (Pro plan)
The task is simple enough for a single prompt

My Workflow Tips

After researching extensively and talking to GSD power users, here are the productivity hacks that stand out:

Use separate git worktrees for quick tasks while maintaining your primary branch
Test uncertain features with /gsd:quick --research before committing to your roadmap
Design tokens first, components second, screens third -- prevents cascading redesigns
Capture stray ideas with /gsd:add-todo instead of derailing your current phase
Use /gsd:stats to track your project progress and git metrics
For brownfield projects, always run /gsd:map-codebase first -- this prevents Claude from fighting your existing patterns

Getting Help

Official Docs: The GSD Mintlify documentation is excellent
Discord: Run /gsd:join-discord to join the community (active and helpful)
X Community: 1,200+ members sharing tips and workflows
GitHub Issues: Report bugs and feature requests

Final Thoughts

GSD doesn't make Claude Code smarter. It makes Claude Code reliable. By solving context rot through disciplined context engineering, every task gets the full power of Claude's 200K context window.

The learning curve is minimal -- install with one command, follow the 6-step workflow, and let GSD handle the complexity behind the scenes. The creator said it best:

"The complexity is in the system, not in your workflow."

If you're serious about building with AI, GSD is worth the investment. Not because it's magic, but because it's engineering discipline applied to AI workflows. And in a world of vibecoding chaos, discipline is the superpower.

Now go get shit done.

Have questions about GSD? Found a workflow tip I missed? Connect with me on LinkedIn -- I'd love to hear about your experience with the framework.

I Built an Email Validation Library and Published It on npm. Here's Everything I Learned

Muhammad Ali Kazmi — Fri, 26 Dec 2025 05:57:12 +0000

A Quick Intro

Hey, I'm Muhammad Ali Kazmi, a Full Stack Developer from Karachi, Pakistan with 5+ years of experience building web applications. I've worked on everything from AI-powered SaaS platforms to fintech apps, mostly using React, Node.js, and TypeScript.

You can find me on GitHub, LinkedIn, or check out my work at developerkazmi.com.

Now, onto the story...

So I spent the last few weeks building an email validation library from scratch and publishing it on npm. Not because the world desperately needed another npm package, but because I wanted to deeply understand how email validation actually works. And honestly, the existing solutions frustrated me.

In this post, I'll share everything: why I built it, what I learned, the technical decisions that made it 3x faster than alternatives, and how you can use this same approach to build your own npm packages.

Let's dive in...

What I'm Going To Cover:

• Why I started this project
• The 5 layers of email validation most developers don't know about
• How I made it 3x faster than the competition
• The tech stack decisions (and why)
• How to publish your first npm package
• What I'd do differently next time

Why Build Another Email Validator?

Here's the thing. I was using deep-email-validator in a project. It worked fine, but every time I dug into the code, I found things that bothered me:

• The TypeScript types were outdated (still using 3.8)
• No bulk validation support
• No caching (validating the same email twice meant doing all the work again)
• Error messages were generic strings like "Invalid email"
• The regex validation wasn't even RFC 5322 compliant

I thought to myself: "How hard can it be to build something better?"

Famous last words.

Turns out, email validation is way more complex than most people think. But that complexity is exactly what made this project worth building.

The 5 Layers of Email Validation

Most developers think email validation is just a regex check. Run a pattern, get a boolean, done.

That's maybe 20% of proper email validation.

Here's what a production-ready email validator actually needs to check:

Layer 1: Regex (Format Validation)

Does the email look like an email? This needs to follow RFC 5322, the actual standard for email formats.

Fun fact: these are all technically valid emails:
• "john doe"@example.com (quoted strings)
• user+tag@example.com (plus addressing)
• user@[192.168.1.1] (IP address domains)

Most regex patterns people use from Stack Overflow would reject half of these.
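To make that concrete, here is a small sketch. The pattern below is a hypothetical example of the kind of simplified regex that gets copied around, not one taken from this library or from any specific answer. It accepts an ordinary address but rejects all three of the valid ones listed above.

```typescript
// A typical simplified pattern (hypothetical example, not RFC 5322 compliant).
const naiveEmailPattern = /^[a-zA-Z0-9._-]+@[a-zA-Z0-9.-]+\.[a-zA-Z]{2,}$/;

// Addresses from the list above; all are valid under RFC 5322.
const validButUnusual: string[] = [
  '"john doe"@example.com', // quoted local part
  'user+tag@example.com',   // plus addressing
  'user@[192.168.1.1]',     // IP address literal as the domain
];

// Collect the valid addresses the naive pattern wrongly rejects.
const wronglyRejected = validButUnusual.filter(
  (email) => !naiveEmailPattern.test(email),
);

console.log(wronglyRejected.length); // 3
```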

Layer 2: Typo Detection

Is user@gmial.com a valid email format? Yes.
Does Gmail actually exist at gmial.com? No.

This layer catches common typos and suggests corrections. "Did you mean gmail.com?"
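One common way to build this kind of suggestion is to compare the domain against a list of popular providers using edit distance. The sketch below is only an illustration of that idea, not the library's actual implementation; KNOWN_DOMAINS, editDistance, suggestDomain, and the maxDistance threshold are all hypothetical names and values.

```typescript
// Hypothetical list of popular domains to compare against.
const KNOWN_DOMAINS = ['gmail.com', 'yahoo.com', 'outlook.com', 'hotmail.com'];

// Classic dynamic-programming Levenshtein (edit) distance.
function editDistance(a: string, b: string): number {
  // At the start of row i, previous[j] is the distance between
  // the first i-1 characters of a and the first j characters of b.
  let previous = Array.from({ length: b.length + 1 }, (_, j) => j);
  for (let i = 1; i <= a.length; i++) {
    const current = [i];
    for (let j = 1; j <= b.length; j++) {
      const substitution = previous[j - 1] + (a[i - 1] === b[j - 1] ? 0 : 1);
      current[j] = Math.min(previous[j] + 1, current[j - 1] + 1, substitution);
    }
    previous = current;
  }
  return previous[b.length];
}

// Returns a suggested domain, or null if the domain is already known
// or nothing on the list is within maxDistance edits.
function suggestDomain(email: string, maxDistance = 2): string | null {
  const domain = email.slice(email.lastIndexOf('@') + 1).toLowerCase();
  if (KNOWN_DOMAINS.includes(domain)) return null;

  let best: string | null = null;
  let bestDistance = maxDistance + 1;
  for (const candidate of KNOWN_DOMAINS) {
    const distance = editDistance(domain, candidate);
    if (distance < bestDistance) {
      best = candidate;
      bestDistance = distance;
    }
  }
  return best;
}
```

With this sketch, suggestDomain('user@gmial.com') returns 'gmail.com', because swapping two letters counts as two edits. A small threshold matters here: set it too high and unrelated domains start getting "corrected".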

Layer 3: Disposable Email Detection

Mailinator. Guerrillamail. 10minutemail.

There are over 40,000 disposable email services out there. If you're not blocking these, enjoy your database full of fake users.

Layer 4: MX Record Validation

Even if the domain looks legit, does it actually have mail servers configured? This requires DNS lookups to check for MX records.

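No MX records usually means the domain can't receive email, so the address gets marked invalid. (Strictly speaking, RFC 5321 lets a sender fall back to the domain's address record when no MX record exists, so a strict validator may want to check that too.)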
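A minimal sketch of the MX step might look like this. It is not the library's code: findMailServers and MxLookup are hypothetical names, and the DNS lookup is passed in as a function so the logic can be tried without touching the network.

```typescript
// Shape of one MX record, matching what Node's dns.promises.resolveMx returns.
interface MxRecord {
  exchange: string;
  priority: number;
}

// Any async function that looks up MX records for a domain.
type MxLookup = (domain: string) => Promise<MxRecord[]>;

// Resolves to the MX hosts sorted by priority (lowest number first),
// or an empty list when the lookup fails or returns nothing.
async function findMailServers(domain: string, lookup: MxLookup): Promise<string[]> {
  try {
    const records = await lookup(domain);
    return [...records]
      .sort((a, b) => a.priority - b.priority)
      .map((record) => record.exchange);
  } catch {
    // DNS errors such as ENOTFOUND or ENODATA are treated as "no mail servers".
    return [];
  }
}
```

In Node.js you could pass resolveMx from node:dns/promises as the lookup, for example findMailServers('example.com', resolveMx). An empty result is the signal to fail the check (or to try the address-record fallback).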

Layer 5: SMTP Verification

The final boss. Actually connect to the mail server and ask: "Hey, does this mailbox exist?"

This is slow (network calls), unreliable (some servers block verification), and optional. But when you need to know for sure, this is how you do it.

Making It 3x Faster

Here's where it got interesting.

The original deep-email-validator validates emails sequentially. One at a time. No caching. Every validation hits the network for DNS and SMTP.

I wanted something that could process 100 emails in under 5 seconds.

Solution 1: Concurrent Processing

Instead of validating emails one by one, I built a bulk processor that runs validations in parallel with configurable concurrency.

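import { validateBulk } from '@mailtester/core';

// `emails` is an array of address strings to validate
const result = await validateBulk(emails, {
  concurrency: 10, // max validations in flight at once
  onProgress: (completed, total) => {
    console.log(`Progress: ${completed}/${total}`);
  }
});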

With 10 validations in flight at once, bulk processing can be up to roughly 10x faster, since most of the time is spent waiting on the network rather than on the CPU. Simple math, big impact.
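If you're curious how a concurrency limit like this can work under the hood, here is a generic sketch of a small worker pool. It is an illustration of the technique, not the library's internals, and mapWithConcurrency is a hypothetical helper name.

```typescript
// Runs `worker` over every item with at most `concurrency` calls in flight.
// Results come back in the same order as the input.
async function mapWithConcurrency<T, R>(
  items: readonly T[],
  concurrency: number,
  worker: (item: T, index: number) => Promise<R>,
): Promise<R[]> {
  const results = new Array<R>(items.length);
  let nextIndex = 0;

  // Each runner pulls the next unclaimed index until the list is exhausted.
  async function runner(): Promise<void> {
    while (nextIndex < items.length) {
      const index = nextIndex++;
      results[index] = await worker(items[index], index);
    }
  }

  const runnerCount = Math.max(1, Math.min(concurrency, items.length));
  await Promise.all(Array.from({ length: runnerCount }, () => runner()));
  return results;
}
```

Because JavaScript runs one callback at a time, claiming an index with nextIndex++ needs no locking. Note that this sketch rejects as soon as any worker throws, so a real bulk validator would likely catch errors per item and record them in the result instead.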

Solution 2: Rate Limiting Built-In

But wait. If you're hammering mail servers with verification requests, you'll get blacklisted. Fast.

I implemented a token bucket rate limiter that controls requests per domain and globally:

await validateBulk(emails, {
  rateLimit: {
    perDomain: { requests: 10, window: 60 },
    global: { requests: 100, window: 60 }
  }
});

This prevents abuse while still keeping things fast.
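For readers who haven't met a token bucket before, here is a minimal sketch of the idea. It is not the library's implementation: the TokenBucket class is hypothetical, and it assumes the window value is in seconds, which the config above doesn't state.

```typescript
// Token bucket: holds up to `capacity` tokens and refills continuously,
// adding `capacity` tokens over every `windowSeconds` (assumed unit: seconds).
class TokenBucket {
  private readonly capacity: number;
  private readonly windowSeconds: number;
  private readonly now: () => number;
  private tokens: number;
  private lastRefillMs: number;

  // `now` is an injectable millisecond clock so the behavior is easy to test.
  constructor(capacity: number, windowSeconds: number, now: () => number = Date.now) {
    this.capacity = capacity;
    this.windowSeconds = windowSeconds;
    this.now = now;
    this.tokens = capacity;
    this.lastRefillMs = now();
  }

  // Takes one token if available; returns false when the caller should wait.
  tryAcquire(): boolean {
    this.refill();
    if (this.tokens < 1) return false;
    this.tokens -= 1;
    return true;
  }

  // Adds tokens in proportion to the time elapsed since the last refill.
  private refill(): void {
    const currentMs = this.now();
    const elapsedSeconds = (currentMs - this.lastRefillMs) / 1000;
    const refilled = (elapsedSeconds * this.capacity) / this.windowSeconds;
    this.tokens = Math.min(this.capacity, this.tokens + refilled);
    this.lastRefillMs = currentMs;
  }
}
```

A per-domain limit could then be a Map from domain name to its own TokenBucket, with one extra shared bucket for the global limit. A request goes out only when both buckets hand over a token.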

Solution 3: Early Exit

If the regex check fails, why bother checking MX records? I added an early exit option that stops validation on first failure.

With early exit enabled, invalid emails get rejected in < 1ms instead of waiting for network calls.
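The idea can be sketched as a list of checks run in order, cheapest first. This is a simplified, synchronous illustration, not the library's pipeline; the Check shape, runChecks, and the earlyExit parameter are hypothetical names.

```typescript
interface CheckResult {
  check: string;
  passed: boolean;
}

// A check is a name plus a function; real MX and SMTP checks would be async.
interface Check {
  name: string;
  run: (email: string) => boolean;
}

// Runs checks in order. With earlyExit, stops at the first failure
// so the slower checks after it never run.
function runChecks(email: string, checks: Check[], earlyExit: boolean): CheckResult[] {
  const results: CheckResult[] = [];
  for (const { name, run } of checks) {
    const passed = run(email);
    results.push({ check: name, passed });
    if (!passed && earlyExit) break;
  }
  return results;
}
```

The trade-off is detail versus speed: without early exit you get a full report of everything that is wrong with an address, while with it you get only the first failure, but almost instantly.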

Solution 4: Lazy Loading

The disposable email list has 40,000+ domains. Loading that into memory on every import is wasteful.

I implemented lazy loading. The dataset only loads when you actually try to validate against it. Faster startup, lower memory footprint.
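Here is a minimal sketch of that pattern, assuming nothing about how the library actually stores its dataset. The lazy helper, getDisposableDomains, and the three-entry sample set are hypothetical stand-ins for the real list.

```typescript
// Wraps a loader so it runs at most once, on first use.
function lazy<T>(load: () => T): () => T {
  let loaded = false;
  let value: T | undefined;
  return () => {
    if (!loaded) {
      value = load();
      loaded = true;
    }
    return value as T;
  };
}

// Hypothetical stand-in for reading the real 40,000+ entry dataset.
let loadCount = 0;
const getDisposableDomains = lazy(() => {
  loadCount++;
  return new Set(['mailinator.com', 'guerrillamail.com', '10minutemail.com']);
});

// Nothing is loaded at import time; the first call here triggers the load.
function isDisposable(email: string): boolean {
  const domain = email.slice(email.lastIndexOf('@') + 1).toLowerCase();
  return getDisposableDomains().has(domain);
}
```

Users who never enable the disposable check never pay for the dataset, and everyone else pays exactly once. A Set keeps each lookup constant-time no matter how long the list grows.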

The Tech Stack

Here's what I used and why:

TypeScript 5.3+ (strict mode)
• Modern type system with template literals
• Better inference than older versions
• Strict mode catches bugs at compile time

Node.js 20+
• LTS release line with native ES modules
• Better performance
• Built-in fetch (finally)

Vitest (not Jest)
• 10x faster test runs
• Native ESM support
• Jest-compatible API, so easy migration

tsup (not tsc directly)
• Zero config bundling
• Dual ESM + CommonJS output
• Proper tree-shaking

Custom Validation Utils (not Zod)
• Zero dependencies for runtime validation
• Smaller bundle size
• Zod-compatible API for easy migration later

The Final Results

After weeks of work, here's what I shipped:

644 tests passing with 90%+ coverage

5 validators:
• Regex (RFC 5322 compliant)
• Typo detection (suggests corrections)
• Disposable email blocking (40,000+ domains)
• MX record validation (with retry logic)
• SMTP verification (optional, for when you really need it)

Performance:
• Single validation: < 150ms (without SMTP)
• Bulk 100 emails: < 5 seconds
• Package size: ~31KB gzipped

Developer Experience:
• Full TypeScript support
• Three presets (strict, balanced, permissive)
• Detailed error messages with suggestions
• Reputation scoring (0-100)

Publishing to npm

This was my first npm package, so here's what I learned:

Step 1: Get Your package.json Right

{
  "name": "@mailtester/core",
  "version": "1.0.0",
  "main": "./dist/index.cjs",
  "module": "./dist/index.js",
  "types": "./dist/index.d.ts",
  "files": ["dist", "README.md", "LICENSE"]
}

The files field is crucial. It controls what actually gets published.

Step 2: Test With npm pack

Before publishing, run npm pack to see exactly what will be in your package (npm pack --dry-run lists the contents without writing a tarball). Check the size. Make sure no secrets or dev files are included.

Step 3: Publish Beta First

npm publish --tag beta --access public

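Publish under the beta tag, test in a real project, fix any issues, then publish stable. Keep in mind that npm never lets you reuse a version number, so give the beta a prerelease version (for example 1.0.0-beta.0) if you want to save 1.0.0 for the stable release.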

Step 4: Create GitHub Release

Tag your release, write release notes, push to GitHub. This gives users confidence that the package is maintained.

What I'd Do Differently

Start with the API design, not the implementation.

I spent a lot of time refactoring because I didn't nail down the public API first. Next time, I'd write the README before writing any code.

Write tests for the public API first.

This forces you to think about how users will actually use your library. My best tests were the ones I wrote before implementing the feature.

Don't over-engineer v1.

I had plans for plugins, caching layers, browser builds, machine learning... I cut most of it. Ship something that works, then iterate.

Try It Yourself

The package is live on npm:

npm install @mailtester/core

Basic usage:

import { validate, validateBulk } from '@mailtester/core';

// Single email
const result = await validate('user@example.com');
console.log(result.valid); // true/false
console.log(result.score); // 0-100 reputation score

// Bulk validation
const results = await validateBulk(['email1@test.com', 'email2@test.com'], {
  concurrency: 10
});

Links

📦 npm: npmjs.com/package/@mailtester/core

💻 GitHub: github.com/kazmiali/mailtester

📚 Docs: kazmiali.github.io/mailtester

What's Next?

I'm planning to add:
• In-memory LRU caching (v1.1)
• Enhanced reputation scoring with configurable weights
• CLI tool for quick validations
• Maybe a browser build

If you found this useful, star the repo on GitHub. It helps more than you'd think.

And if you build something cool with it, let me know. I'd love to see what you create.
