DEV Community: Nishil Bhave

Claude Code Save Conversation: Find & Export Transcripts

Nishil Bhave — Tue, 02 Jun 2026 16:40:46 +0000

Claude Code Save Conversation: Where Transcripts Live

Claude Code hit a $1B annualized run-rate six months after public launch: Anthropic's "fastest-growing product in the company's history" (Anthropic, 2025). I've used it daily across 27 project directories on this laptop. As of this morning, my ~/.claude/projects/ folder holds 122 JSONL transcripts for the blog repo alone, going back roughly four weeks.

That last number is the catch. Claude Code keeps your conversations locally, and deletes them after 30 days by default (Claude Code Docs, 2026). If you've ever closed a terminal and wished you could go back to "how did I solve that Postgres migration last month," that's the window you're losing.

This is the practical guide I wish I'd had: where transcripts actually live, the JSON schema, the built-in /resume and /export commands, five open-source tools for searching and exporting, and the redaction workflow I use before sharing a session externally.


Key Takeaways

Claude Code stores every session as plaintext JSONL at ~/.claude/projects/<encoded-cwd>/<session-uuid>.jsonl and auto-purges after 30 days unless you set cleanupPeriodDays higher (Claude Code Docs, 2026).

Native commands cover resume and export: /resume opens the session picker, /export <file> writes the current conversation, /insights analyzes your history.

Trust in AI tools dropped to 29% in 2025, its lowest ever (Stack Overflow, 2025). Your own transcripts are the only ground truth about what the model actually did for you.

Five OSS tools turn raw JSONL into searchable, shareable archives: ccusage (14.2k★), claude-code-transcripts, claude-code-log, claude-conversation-extractor, and claude-history.

Why Should You Save Claude Code Conversations at All?

84% of developers now use AI tools but only 29% trust their accuracy, the widest gap the Stack Overflow survey has ever recorded (Stack Overflow, 2025). If the model is wrong, the only audit trail you have is the transcript. There's no "git blame" for an agent's reasoning unless you keep the JSONL.

Three concrete reasons matter more than the privacy paranoia people usually lead with:

Debugging your own agent loops. When a subagent goes sideways (wrong tool, weird argument, runaway plan), the transcript shows the exact stdin/stdout that hooks saw. The first time I had a subagent silently loop on a git status call, replaying the JSONL line by line was the only way I caught it.
Learning from your own patterns. Simon Willison reported personally accumulating 379 MB of JSONL (Simon Willison, 2025). At that volume, the transcripts become a personal prompt library: the prompts that actually worked, not the ones you think worked. A METR randomized trial of 16 experienced devs found they believed AI made them 20-24% faster while measurements showed they were 19% slower (METR, 2025, https://metr.org/blog/2025-07-10-early-2025-ai-experienced-os-dev-study/). Reviewing your transcripts is one of the few honest ways to close that gap.
Audit and review. Stack Overflow's 2025 data shows 72% of developers say "vibe coding" is not part of their professional work (Stack Overflow, 2025). Most of us review what the model produced. When a PR review asks "why this way?" the JSONL is the receipt.

Our finding: Across 122 saved sessions in this single project, I can grep exactly when I first wired up the Hashnode adapter, what error message convinced me to drop it, and the verbatim prompt that fixed a stale-job recovery bug, none of which I'd remember without the JSONL on disk.


Where Does Claude Code Save Conversations on Disk?

Claude Code writes every session as a JSON-Lines file under ~/.claude/projects/<encoded-cwd>/<session-uuid>.jsonl, with a sibling directory of the same UUID for any sidecar attachments (Claude Code Docs, 2026). The encoded working directory is just your full cwd with / replaced by -, so /Users/nishil/Documents/work/blogs becomes -Users-nishil-Documents-work-blogs. That naming is how the CLI knows which transcripts belong to the project you're standing in when you run claude or /resume.
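As a rough illustration of that naming rule, here is a Python sketch. The function names encode_cwd and project_dir are mine, not part of Claude Code, and the sketch assumes path separators are the only characters rewritten, which is all this article claims; the real CLI may normalise other characters too.

```python
from pathlib import Path
from typing import Optional


def encode_cwd(cwd: str) -> str:
    """Replace path separators with hyphens, as described above.

    Assumption: only "/" (and "\\" on Windows) are rewritten.
    """
    return cwd.replace("\\", "-").replace("/", "-")


def project_dir(cwd: str, claude_home: Optional[Path] = None) -> Path:
    """Return the directory where transcripts for `cwd` would be stored."""
    home = claude_home if claude_home is not None else Path.home() / ".claude"
    return home / "projects" / encode_cwd(cwd)


print(encode_cwd("/Users/nishil/Documents/work/blogs"))
# -Users-nishil-Documents-work-blogs
```

If the computed folder does not exist on your machine, list ~/.claude/projects/ and compare; that is the quickest way to see whether the assumption holds for your paths.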

Here's the actual layout from my machine right now:

~/.claude/projects/
├── -Users-nishil-Documents-work-blogs/          ← this blog's project dir
│   ├── 02a0ea76-3694-4307-b1da-65c17cee00a4.jsonl   ← one full session
│   ├── 02a0ea76-3694-4307-b1da-65c17cee00a4/        ← sidecar dir (attachments)
│   ├── 02cf8a57-d0dc-4d37-b0fc-b242c2f46a1b.jsonl
│   ├── 035df874-52bf-43b7-b458-43b69bcf987f.jsonl
│   └── … (122 sessions total)
├── -Users-nishil-Documents-work-ats-resume-tailor/
├── -Users-nishil-Documents-work-claude-skills/
└── … (27 project directories)

~/.claude/
├── settings.json                  ← cleanupPeriodDays lives here
├── settings.local.json            ← project-local overrides
├── hooks/                         ← your PreToolUse/PostToolUse scripts
└── skills/                        ← installed skills

Each .jsonl file is append-only: every user message, model response, tool call, hook attachment, and snapshot lands as one JSON object per line. That structure is what makes the format trivially grep-able and trivially streamable: no parser required for a first pass.

The 30-day cleanup is enforced by Claude Code itself, not your OS. To keep transcripts longer, edit ~/.claude/settings.json:

{
  "cleanupPeriodDays": 365
}

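Set it to a year if you want a real archive, or to a much larger number (3650, say) if you want transcripts kept effectively forever. Don't reach for 0 as an off switch without checking the settings reference for your CLI version first: as far as I can tell, 0 has been documented as deleting sessions immediately rather than disabling cleanup.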
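If you already have other keys in settings.json, hand-editing is easy to get wrong. A minimal Python sketch of updating just this one key while preserving the rest; set_cleanup_period is a hypothetical helper of mine, and it assumes the file is either absent or valid JSON:

```python
import json
from pathlib import Path


def set_cleanup_period(settings_path: Path, days: int) -> dict:
    """Set cleanupPeriodDays in a settings file, keeping all other keys."""
    if days < 1:
        # This sketch only accepts positive values; check the settings
        # reference for how your CLI version treats 0.
        raise ValueError("days must be a positive integer")
    settings = {}
    if settings_path.exists():
        settings = json.loads(settings_path.read_text(encoding="utf-8"))
    settings["cleanupPeriodDays"] = days
    settings_path.parent.mkdir(parents=True, exist_ok=True)
    settings_path.write_text(json.dumps(settings, indent=2) + "\n", encoding="utf-8")
    return settings
```

Calling set_cleanup_period(Path.home() / ".claude" / "settings.json", 365) would apply the one-year retention shown above. Back the file up first if it holds anything you care about.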

A subtle thing most guides miss: /feedback transcripts have a separate, longer retention (Anthropic keeps them for 5 years to improve the product) (Claude Code Docs, 2026). Submitting feedback from inside a session is itself a form of "save," just one you don't control.

What's Inside a Claude Code Transcript File?

A Claude Code session file is JSON-Lines: each line is a self-contained JSON object with the metadata needed to replay the conversation. Sampling the first session in my blog project, I see five distinct line types, and every line carries sessionId, timestamp, cwd, gitBranch, and a UUID-chained parentUuid linking it to the previous turn. The chain is what makes the transcript a graph, not just a log.
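To make the "graph, not just a log" point concrete, here is a small Python sketch that follows parentUuid links from any turn back to the first message. It assumes every record carries a uuid field, as the sample turn in this article does; load_records and thread_to are illustrative names of mine.

```python
import json
from typing import Iterable, List


def load_records(lines: Iterable[str]) -> List[dict]:
    """Parse JSONL lines into dicts, skipping blank lines."""
    return [json.loads(line) for line in lines if line.strip()]


def thread_to(records: List[dict], leaf_uuid: str) -> List[dict]:
    """Follow parentUuid links from `leaf_uuid` to the root, oldest first."""
    by_uuid = {r["uuid"]: r for r in records if "uuid" in r}
    chain = []
    seen = set()
    current = leaf_uuid
    # Stop at the root (parentUuid is null), at a missing parent, or on a cycle.
    while current is not None and current in by_uuid and current not in seen:
        seen.add(current)
        record = by_uuid[current]
        chain.append(record)
        current = record.get("parentUuid")
    chain.reverse()
    return chain
```

Because the walk starts at a leaf, two branches that share early turns come out as two separate threads with a common prefix.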

The top-level keys you'll see most often:

Field	Type	What it tells you
`type`	string	`user`, `assistant`, `attachment`, `permission-mode`, or `summary`
`sessionId`	UUID	Matches the filename
`parentUuid`	UUID | null	Points to the previous turn (null for the first message)
`timestamp`	ISO 8601	Server-side time of the turn
`cwd`	path	Working directory when the turn happened
`gitBranch`	string	Git branch at the time — invaluable for retracing context
`version`	string	Claude Code CLI version (`2.1.119`, etc.)
`message`	object	The full Anthropic message payload (role, content blocks)
`attachment`	object	Hook output, snapshots, or tool sidecar data
`userType`	string	`external` for you, `internal` for agent-spawned subagents
`isSidechain`	boolean	`true` if this turn is from a dispatched subagent

A minimal user-turn line looks like this:

{
  "type": "user",
  "sessionId": "02a0ea76-3694-4307-b1da-65c17cee00a4",
  "parentUuid": "e2363e23-702d-4df6-9c39-036dd00f5d8b",
  "uuid": "f7a2c1b3-...-...",
  "timestamp": "2026-04-24T10:16:23.118Z",
  "cwd": "/Users/nishil/Documents/work/blogs",
  "gitBranch": "main",
  "version": "2.1.119",
  "message": {
    "role": "user",
    "content": [{"type": "text", "text": "Refactor the publish orchestrator…"}]
  }
}

Knowing this schema is what unlocks the rest of the workflow. Once you can identify a type: "user" line and pull message.content[0].text, you can rebuild any session into Markdown with three lines of jq.
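For anyone who prefers Python to jq, the same idea can be sketched in a few functions. This is an illustration under assumptions, not a reference converter: it assumes message.content is either a plain string or a list of typed blocks, and the names text_of and to_markdown are mine.

```python
import json
from typing import Iterable


def text_of(record: dict) -> str:
    """Return the text of a turn, joining multiple text blocks."""
    content = (record.get("message") or {}).get("content") or ""
    if isinstance(content, str):
        return content
    parts = [
        block.get("text", "")
        for block in content
        if isinstance(block, dict) and block.get("type") == "text"
    ]
    return "\n\n".join(parts)


def to_markdown(lines: Iterable[str]) -> str:
    """Render user and assistant turns as Markdown sections."""
    sections = []
    for line in lines:
        if not line.strip():
            continue
        record = json.loads(line)
        if record.get("type") not in ("user", "assistant"):
            continue  # skip attachment, permission-mode, and summary lines
        body = text_of(record) or "[tool call]"
        sections.append(f"### {record['type']} ({record.get('timestamp', '')})\n\n{body}\n")
    return "\n".join(sections)
```

Feeding it an open file handle works because a file iterates line by line, so even a multi-megabyte session is read without any extra parsing step.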

A few non-obvious wrinkles. The attachment lines carry hook output (stdin payload, stdout, stderr, exit code, duration), so for hook failures, that's where you look. The sibling directory next to each .jsonl (same UUID, no extension) holds binary attachments like image uploads and diffs. Subagent-spawned turns set isSidechain: true, which separates the main thread from delegated work. There's no compaction either: even a 200-turn session stays on disk as the full append-only log. My largest blog session is 4.8 MB; the project directory across 27 codebases is 612 MB, which compresses to 51 MB gzipped.
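A quick way to see how much of a session was delegated work is to count turns by type and by the isSidechain flag. A minimal Python sketch, with summarize as an illustrative name and the field names taken from the table above:

```python
from collections import Counter
from typing import Iterable


def summarize(records: Iterable[dict]) -> dict:
    """Count records by type and split main-thread from subagent turns."""
    records = list(records)
    return {
        "by_type": Counter(r.get("type", "unknown") for r in records),
        "main_turns": sum(1 for r in records if not r.get("isSidechain")),
        "sidechain_turns": sum(1 for r in records if r.get("isSidechain")),
    }
```

A session where sidechain_turns dwarfs main_turns is usually the first one worth replaying when a subagent misbehaved.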

How Do You Use the Built-In `/resume` and `/export` Commands?

The native commands cover 80% of what most people need, and they ship with the CLI, so there's no install step. Claude Code's official command reference lists /resume, /continue (alias), /branch, /export, /insights, /rewind, /clear, and /compact (Claude Code Docs, 2026). The first two are how you walk back into a saved conversation.

The four that matter for save-and-reuse:

# Open the session picker for this project — arrow keys, Enter to resume
/resume

# Resume a specific session by UUID or name
/resume 02a0ea76-3694-4307-b1da-65c17cee00a4

# Branch the current conversation — original stays reachable via /resume
/branch experimenting-with-langgraph

# Export the current session to a file (Markdown by default)
/export ~/Desktop/blog-refactor-2026-05-15.md

# Surface patterns across your saved sessions — what tools you use most,
# where loops happened, where you re-prompted
/insights

/resume without an argument is the one I use most. It opens a TUI list of every saved session in the current cwd, with the first user message as the preview, exactly enough context to pick the right one.

The /branch command is the underrated sibling: it forks a conversation at the current turn so you can explore an alternative direction without losing the trunk. I use it when I want to try a riskier refactor that might burn the agent's context — branch, fail, return to the trunk. Cheaper than git stash because no files change.

/export writes the conversation as Markdown by default, ready to drop into a PR description or a postmortem. The output preserves tool calls, which is more than you get from copy-pasting the terminal.

/insights is the newest and most surprising: it runs an analysis pass over your saved sessions and surfaces patterns. The first time I ran it, it told me I was reflexively asking for "a quick fix" 38 times across one project, which was the exact prompt pattern producing the worst output.

How Do You grep Your Own Transcript History?

Because every transcript is a plain JSONL file, the shell is already enough. The patterns below are the ones I run weekly — copy them as a starting kit:

# Count saved sessions in the current project (quoted so paths with spaces work)
ls "$HOME/.claude/projects/$(pwd | sed 's#/#-#g')"/*.jsonl | wc -l

# Search every transcript ever for a specific phrase
grep -l "Hashnode adapter" ~/.claude/projects/*/*.jsonl

# Pull just the user messages from one session, as plain text
jq -r 'select(.type=="user") | .message.content[0].text // empty' \
  ~/.claude/projects/-Users-nishil-Documents-work-blogs/02a0ea76*.jsonl

# Find every session where you touched a specific file
grep -l "lib/publish-orchestrator.ts" ~/.claude/projects/*/*.jsonl \
  | xargs -I{} basename {} .jsonl

# Sort sessions by total token usage (rough proxy: file size)
du -h ~/.claude/projects/*/*.jsonl | sort -h | tail -10

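# Reconstruct a session as plain Markdown with a short jq filter
# (handles content stored as either a plain string or an array of blocks)
jq -r '
  select(.type=="user" or .type=="assistant")
  | "### " + .type + " (" + .timestamp + ")\n\n"
    + (.message.content
       | if type=="string" then .
         else ([.[]? | select(.type=="text") | .text] | join("\n\n")) end
       | if . == "" then "[tool call]" else . end)
' SESSION_UUID.jsonl > session.md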

The pwd | sed 's#/#-#g' trick is the cheap way to find your current project's transcript folder without leaving the terminal. Pin those one-liners as shell aliases and you have a personal observability layer for Claude Code that costs zero dollars.
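When the search needs to live inside a script rather than a shell alias, the grep -l pattern translates directly. A small Python sketch, where sessions_mentioning is a hypothetical helper and the match is a plain substring test against each raw JSONL line (so, like grep, it sees JSON-escaped text):

```python
from pathlib import Path
from typing import List


def sessions_mentioning(phrase: str, projects_root: Path) -> List[Path]:
    """Return transcript files under projects_root that contain `phrase`."""
    hits = []
    for path in sorted(projects_root.glob("*/*.jsonl")):
        with path.open(encoding="utf-8", errors="replace") as handle:
            # any() stops at the first matching line, like grep -l.
            if any(phrase in line for line in handle):
                hits.append(path)
    return hits
```

Something like sessions_mentioning("Hashnode adapter", Path.home() / ".claude" / "projects") would mirror the second one-liner above.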


Which Open-Source Tools Turn Transcripts Into Real Archives?

Five OSS tools cover the gap between raw grep and a real searchable archive. All five are actively maintained as of May 2026, all five are MIT/Apache-licensed, and all five operate on the same ~/.claude/projects/ directory.

According to a 2026 GitHub stars snapshot, ccusage leads the category with ~14,200 stars (GitHub, 2026), an order of magnitude ahead of the other contenders. That gap reflects the practical priority most teams hit first: cost tracking. Once you've spent a month on Claude Code, you want to know where the tokens went.

What each one is actually for:

ccusage (TypeScript, 14.2k★) — Token usage and cost reports per day, per project, per session. The first install for anyone on a Pro/Team plan. npx ccusage daily is the entire onboarding.
claude-code-transcripts (Python, 1.5k★) — Converts JSONL into clean, paginated HTML. Mobile-friendly, deterministic output. Good for sharing a session as a link.
claude-code-log (Python, 1.0k★) — JSONL → readable HTML + Markdown with filtering and token tracking. The best out-of-the-box "make my transcripts browsable" tool.
claude-conversation-extractor (Python, 563★) — Pulls conversations out of ~/.claude/projects/ and writes Markdown (or JSON/HTML). Lightweight, no dependencies beyond stdlib.
claude-history (Rust, 267★) — Fuzzy search with a built-in TUI. The closest thing to "Spotlight for your Claude Code history."

My setup runs three of them: ccusage weekly for cost reporting, claude-code-log monthly to dump everything into HTML for offline review, and claude-history daily as a TUI when I need to find a specific past prompt.

A short selection guide. ccusage is operational: it answers "where did $400 of API spend go this month" but doesn't show content. claude-code-log and claude-code-transcripts overlap on the export side; both produce HTML, both work fine. claude-conversation-extractor is the right pick if you want zero dependencies and a one-shot Markdown dump. claude-history is the only one with a real TUI and fuzzy search.

All five read the same plaintext JSONL. The data is the moat, not the tools.

How Should You Redact a Transcript Before Sharing It?

More than 100,000 LLM share-link conversations across ChatGPT, Claude, Copilot, and others were publicly indexed by search engines in 2024-2025 before vendors shut the experiments down (AI Incident Database #1186, 2025). Anthropic stopped Claude share-link transcripts from appearing in Google around September 10, 2025 (Obsidian Security, 2025), but the lesson stands: a transcript is leaky. Before you paste one into a PR, a Slack channel, or a public gist, redact.

What to scrub:

cwd and absolute paths that reveal your username or project layout
API keys (look for sk-, ghp_, AKIA, anything 40+ chars with no spaces)
Email addresses in user messages
Internal repo names, customer IDs, ticket numbers
The gitBranch field if it leaks unreleased project names

A pragmatic one-pass redaction with jq plus a regex stage:

# Step 1: extract user + assistant turns only (drop hook/permission noise)
jq -c 'select(.type=="user" or .type=="assistant")' session.jsonl > clean.jsonl

# Step 2: strip filesystem and git metadata, scrub obvious secrets
# (the sk- pattern allows - and _ so prefixed keys like sk-ant-... match; AKIA covers AWS access key IDs)
jq -c '. + {cwd: "[redacted]", gitBranch: "[redacted]"}' clean.jsonl \
  | sed -E 's/(sk-[A-Za-z0-9_-]{20,}|ghp_[A-Za-z0-9]{20,}|AKIA[0-9A-Z]{16})/[REDACTED-KEY]/g' \
  | sed -E 's/[A-Za-z0-9._%+-]+@[A-Za-z0-9.-]+\.[A-Za-z]{2,}/[REDACTED-EMAIL]/g' \
  > redacted.jsonl

# Step 3: convert to Markdown for the actual share
jq -r 'select(.type=="user" or .type=="assistant")
  | "### " + .type + "\n\n"
    + ((.message.content[]? | select(.type=="text") | .text) // "")' \
  redacted.jsonl > shareable.md

If you share transcripts often, wrap this in a shell function. The five minutes it takes once is worth more than the recovery effort after you paste a key into a public issue.

One more thing worth checking before you publish a redacted transcript: the assistant's responses sometimes echo your secrets back at you. If you pasted a DATABASE_URL mid-session and the model quoted it in a summary later, regex-scrubbing the input lines isn't enough. Grep the assistant content too, ideally with the same pattern set. A safer habit is to never paste real credentials into a session in the first place — use environment variable names as placeholders and let the agent reason about them abstractly.
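A structure-aware pass is one way to make sure the same patterns hit user and assistant content alike. The Python sketch below walks every string in a record, however deeply nested. The pattern list is a starting point of my own, not an exhaustive secret scanner, and scrub and redact_line are illustrative names.

```python
import json
import re

# (pattern, replacement) pairs; extend these for your own token formats.
SECRET_PATTERNS = [
    (re.compile(r"sk-[A-Za-z0-9_-]{20,}"), "[REDACTED-KEY]"),
    (re.compile(r"ghp_[A-Za-z0-9]{20,}"), "[REDACTED-KEY]"),
    (re.compile(r"AKIA[0-9A-Z]{16}"), "[REDACTED-KEY]"),
    (re.compile(r"[A-Za-z0-9._%+-]+@[A-Za-z0-9.-]+\.[A-Za-z]{2,}"), "[REDACTED-EMAIL]"),
]


def scrub(value):
    """Recursively apply every pattern to every string in a JSON value."""
    if isinstance(value, str):
        for pattern, replacement in SECRET_PATTERNS:
            value = pattern.sub(replacement, value)
        return value
    if isinstance(value, list):
        return [scrub(item) for item in value]
    if isinstance(value, dict):
        return {key: scrub(item) for key, item in value.items()}
    return value


def redact_line(line: str) -> str:
    """Scrub one JSONL line and blank the cwd and gitBranch metadata."""
    record = scrub(json.loads(line))
    for field in ("cwd", "gitBranch"):
        if field in record:
            record[field] = "[redacted]"
    return json.dumps(record, ensure_ascii=False)
```

Absolute paths quoted inside message text are not covered by these patterns, so a final manual read-through is still worth the minute it takes.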

My Own Setup: Archive, Index, and Never Lose a Session

I run a one-line cron job, one shell function, and one settings change. That's the entire system.

# crontab -e: nightly mirror of the whole transcripts directory to an external drive
# (a cron entry must stay on one line; backslash continuation is not supported)
0 2 * * * rsync -a --delete ~/.claude/projects/ /Volumes/Archive/claude-transcripts/

# ~/.zshrc — quick session search in the current project
ccsearch() {
  local dir="$HOME/.claude/projects/$(pwd | sed 's#/#-#g')"
  grep -l "$1" "$dir"/*.jsonl 2>/dev/null | while read -r f; do
    echo "  $(basename "$f" .jsonl)"
    grep -o "\"text\":\"[^\"]*$1[^\"]*\"" "$f" | head -1
    echo
  done
}

# ~/.claude/settings.json — extend local retention from 30 days to one year
{
  "cleanupPeriodDays": 365
}

The cleanupPeriodDays: 365 setting alone is the highest-leverage change in this entire post. Most people don't realize the 30-day default is destroying their data until they go looking for a session from last month and find an empty directory.

For monthly review, I run claude-code-log against ~/.claude/projects/ (it's a Python package, so uvx claude-code-log ~/.claude/projects/ rather than npx) and open the generated HTML index. That's when I learn things: which prompts I repeated, which subagents looped, which tools I never actually used. The HTML output is faster to scan than the TUI because you can Ctrl-F across every session at once.

The cron uses rsync rather than tar deliberately: incremental syncs are cheap for unchanged sessions, and the destination stays browsable. Note that --delete mirrors source deletions, so anything Claude Code purges locally disappears from the archive on the next run; cleanupPeriodDays: 365 only postpones that for a year. Drop --delete if the archive should outlive local cleanup. If you rotate laptops, point the same rsync at a VPS over SSH for an offsite copy.
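If you would rather have an archive that never drops a file, even after the source directory is cleaned up, a copy-only pass is one option. A minimal Python sketch of that alternative: it is not what my cron job does, archive_new_transcripts is a hypothetical name, and it relies on modification times to decide what is new.

```python
import shutil
from pathlib import Path
from typing import List


def archive_new_transcripts(source: Path, dest: Path) -> List[Path]:
    """Copy new or updated files from source to dest; never delete from dest."""
    copied = []
    for src in sorted(source.rglob("*")):
        if not src.is_file():
            continue
        target = dest / src.relative_to(source)
        # copy2 preserves mtime, so an unchanged file is skipped next run.
        if target.exists() and target.stat().st_mtime >= src.stat().st_mtime:
            continue
        target.parent.mkdir(parents=True, exist_ok=True)
        shutil.copy2(src, target)
        copied.append(target)
    return copied
```

The tradeoff is the obvious one: the archive only grows, so pruning it becomes your job rather than rsync's.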

Our finding: Of my 122 saved sessions in this blog project over four weeks, 19 of them were re-prompts of the same core question phrased three different ways. That ratio — roughly 15% wasted re-prompting — only became visible because I had the JSONL on disk to count.

Frequently Asked Questions

Does Anthropic see my Claude Code transcripts if I'm on a Pro plan?

Yes, for service operation. Anthropic stores conversation data server-side for up to 30 days after you delete a chat, retains opt-in training data for up to 5 years de-identified, and keeps policy-violation conversations for up to 2 years (Anthropic Privacy Center, 2026). Commercial plans (Work, Enterprise, Edu, Gov) are excluded from training by default (Anthropic, 2025). API logs were reduced to a 7-day retention window in September 2025 and are never used for training (Anthropic Privacy Center, 2025).

Where does Claude Code save conversations on Windows?

The cross-platform default is %USERPROFILE%\.claude\projects\ on Windows and ~/.claude/projects/ on macOS and Linux. The encoded-cwd directory naming is identical across platforms: slashes (and backslashes on Windows) get replaced with hyphens. Everything else in this guide (JSONL schema, /resume, cleanupPeriodDays) works the same.

How do I disable Claude Code's 30-day auto-delete?

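Set cleanupPeriodDays in ~/.claude/settings.json to a higher number (365 for a year, or something like 3650 to keep transcripts effectively forever). Be careful with 0: as far as I can tell, the settings reference has described it as deleting sessions immediately, not as disabling cleanup, so confirm the behavior for your CLI version before using it. The setting is documented in the official Data Usage reference (Claude Code Docs, 2026). Restart your session for the change to apply.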

Can I export every saved session at once, not just the current one?

Yes. /export only handles the active session, but claude-code-log or claude-conversation-extractor (both linked above) walk every JSONL file in ~/.claude/projects/ and produce one Markdown or HTML file per session. Run them as a monthly cron job for a rolling archive.

What's the difference between `/resume` and `/continue`?

They're aliases. /continue is identical to /resume (Claude Code Docs, 2026). Use whichever feels more natural — the muscle memory matters more than the spelling.

Do hooks have access to the transcript file?

Yes. Every hook payload includes a transcript_path field pointing to the active session's JSONL (Claude Code Docs, 2026). A PostToolUse audit hook can append rich context to its own log, and a Stop hook can summarize the session before ending — the transcript is written in real time, so your hook reads everything up to the current event.
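As a sketch of what such a hook could look like in Python: the payload arrives as JSON on stdin, and transcript_path and session_id are the only payload fields used here. The helper names count_turns and handle_hook, and the shape of the summary, are my own choices.

```python
import json
import sys
from pathlib import Path


def count_turns(transcript_path: str) -> dict:
    """Count transcript lines by their top-level "type" field."""
    counts = {}
    with Path(transcript_path).expanduser().open(encoding="utf-8") as handle:
        for line in handle:
            if line.strip():
                kind = json.loads(line).get("type", "unknown")
                counts[kind] = counts.get(kind, 0) + 1
    return counts


def handle_hook(stdin_text: str) -> str:
    """Turn a hook payload (JSON text) into a one-line JSON summary."""
    payload = json.loads(stdin_text)
    counts = count_turns(payload["transcript_path"])
    return json.dumps({"session_id": payload.get("session_id"), "turns": counts})


# When registered as a hook script, the last line would be something like:
#   print(handle_hook(sys.stdin.read()), file=sys.stderr)
```

Appending that summary to your own log file on every Stop event gives you a per-session index without touching the transcript itself.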

Conclusion

Claude Code's local-first transcript design is one of the most underused features in the CLI. Every session lands as plaintext JSONL, the format is grep-friendly, the schema is stable, and there's a healthy OSS ecosystem turning that data into searchable archives.

The whole workflow is four moves:

Save by setting cleanupPeriodDays past 30 so transcripts survive long enough to be useful.
Search with grep, jq, or claude-history when you need to find a past prompt.
Export with /export for one session or claude-code-log for the whole archive.
Reuse via /resume to jump back into prior context, or /branch to fork it.

Trust in AI tools is at an all-time low and adoption is at an all-time high (Stack Overflow, 2025). The transcript on your disk is the only ground truth you have. Set the retention up tonight; thank yourself in three months.



Claude Code Router: Cut Your Claude Bill 21x

Nishil Bhave — Mon, 01 Jun 2026 14:24:35 +0000


Anthropic's annualized run rate crossed $44 billion in May 2026, and Claude Code alone hit $2.5B annualized by February (Sacra, 2026). Most of that revenue comes from one tab in one terminal. And almost all of it could route somewhere cheaper — if you knew the switch existed.

Claude Code Router is that switch. It's a small TypeScript proxy from a developer named musistudio that intercepts the requests Claude Code makes to api.anthropic.com and forwards them to whichever provider you want: DeepSeek, Gemini, GLM-4.5, OpenRouter, a local Ollama instance, anything that speaks the OpenAI or Gemini API format. The agent thinks it's still talking to Claude. The bill says otherwise.

I've been running it for four months. This guide is the architecture, the cost math, the config I actually ship, and the tradeoffs nobody mentions in the README.


Key Takeaways

claude-code-router is a localhost proxy that swaps Claude Code's backend model for any OpenAI- or Gemini-compatible API. The repo is at ~34,000 GitHub stars and MIT-licensed (musistudio/claude-code-router, 2026).

DeepSeek-V4-Flash costs $0.14/M input and $0.28/M output — 21x cheaper on input, 53x cheaper on output than Claude Sonnet 4.6 (DeepSeek, 2026). One Substack writeup tracked $1,200/yr to $60/yr after the swap (John Rodrigues, 2026).

It also unblocks Claude Code in mainland China, Russia, Iran, and the rest of the roughly 15 regions Anthropic doesn't serve directly (Anthropic, 2026).

The tradeoff is real: prompt caching, native computer use, and tool-use strictness degrade outside Anthropic. The fix is the Router config: keep Claude for code, route everything else.

What Is Claude Code Router and How Does the Proxy Architecture Work?

Claude Code Router is a localhost HTTP server. You install it with npm install -g @musistudio/claude-code-router and start it with the ccr code command, which exports ANTHROPIC_BASE_URL=http://127.0.0.1:3456 before Claude Code spawns (musistudio/claude-code-router, 2026). Claude Code makes its usual Anthropic Messages API calls; the router rewrites them in flight into whichever format the destination provider expects, then translates the response back.

That's the whole trick. Anthropic's CLI doesn't pin certificates or check the server identity beyond the env var. Set the variable, point it somewhere local, and the agent will happily stream tokens from DeepSeek thinking it's chatting with Claude.

The clever part is the routing layer. The Router config block in ~/.claude-code-router/config.json has five keys: default, background, think, longContext, and webSearch. Each maps a request shape to a <provider>,<model> pair. When a request's prompt token count exceeds longContextThreshold (default 60,000), it goes to longContext. When Claude Code marks a request as "background" — file reads, status checks, summarization between tool calls — it goes to background. Everything else hits default.
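The three rules described above can be sketched as a small decision function. This is my reading of the behavior, not the router's actual code: the precedence (long context first, then background) is an assumption, the think and webSearch keys are left out because their triggers aren't described here, and the provider and model strings in the example are placeholders.

```python
DEFAULT_LONG_CONTEXT_THRESHOLD = 60_000


def pick_route(router: dict, token_count: int, is_background: bool = False) -> str:
    """Return the "<provider>,<model>" pair a request would be sent to."""
    threshold = router.get("longContextThreshold", DEFAULT_LONG_CONTEXT_THRESHOLD)
    # Assumed precedence: an oversized prompt wins over the background flag.
    if token_count > threshold and "longContext" in router:
        return router["longContext"]
    if is_background and "background" in router:
        return router["background"]
    return router["default"]


# Placeholder provider/model pairs, for illustration only.
example_router = {
    "default": "provider-a,model-main",
    "background": "provider-b,model-cheap",
    "longContext": "provider-c,model-long",
}
print(pick_route(example_router, token_count=1_200, is_background=True))
# provider-b,model-cheap
```

Keeping default mandatory and the other keys optional matches the way the prose above describes it: everything that isn't special-cased falls through to default.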

The reason this works at all is that Claude Code's tool-use protocol is a thin layer over a generic chat API. Once you can pass JSON-schema tool definitions and parse tool_use blocks in the response, almost any modern model can play. The router's transformer plugins (deepseek, gemini, openrouter, tooluse, maxtoken, reasoning) handle the dialect differences — DeepSeek's reasoning tokens, Gemini's functionCall shape, OpenRouter's quirks around streaming.
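To give a feel for how thin that translation layer is, here is a Python sketch of the two most basic conversions: an Anthropic-style tool definition into the OpenAI function-calling shape, and an OpenAI-style tool call back into an Anthropic tool_use block. It illustrates the idea only and is not taken from the router's transformer plugins; the function names are mine.

```python
import json


def anthropic_tool_to_openai(tool: dict) -> dict:
    """Map {name, description, input_schema} to an OpenAI function tool."""
    return {
        "type": "function",
        "function": {
            "name": tool["name"],
            "description": tool.get("description", ""),
            "parameters": tool["input_schema"],
        },
    }


def openai_tool_call_to_anthropic(call: dict) -> dict:
    """Map an OpenAI tool call to an Anthropic tool_use content block."""
    return {
        "type": "tool_use",
        "id": call["id"],
        "name": call["function"]["name"],
        # OpenAI sends arguments as a JSON string; Anthropic expects an object.
        "input": json.loads(call["function"]["arguments"] or "{}"),
    }
```

The real work in a transformer is everything this sketch leaves out: streaming deltas, reasoning tokens, and the providers that bend the format in their own ways.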


How Much Money Can Claude Code Router Actually Save You?

The honest number depends on what you do, but the spread is brutal. Claude Sonnet 4.6 charges $3 per million input tokens and $15 per million output (Anthropic, 2026). DeepSeek-V4-Flash charges $0.14 and $0.28 — 21x and 53x cheaper respectively (DeepSeek, 2026). One developer documented dropping from $1,200/yr to $60/yr after routing routine work to DeepSeek (John Rodrigues, 2026). An independent review tracked $150–200/mo bills falling to $30–50/mo, a 75–80% reduction (AI Tool Analysis, 2026).
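The ratios are easy to check, and the same arithmetic works for your own token mix. A small Python sketch using only the list prices quoted above; the 2M-input, 1M-output workload is an invented example, not a measurement.

```python
# USD per million tokens, as quoted above.
SONNET = {"input": 3.00, "output": 15.00}
DEEPSEEK_FLASH = {"input": 0.14, "output": 0.28}


def cost_usd(input_tokens: int, output_tokens: int, price: dict) -> float:
    """List-price cost of a workload, ignoring caching and discounts."""
    return (input_tokens * price["input"] + output_tokens * price["output"]) / 1_000_000


def price_ratio(kind: str) -> float:
    """How many times more Sonnet costs than DeepSeek for "input" or "output"."""
    return SONNET[kind] / DEEPSEEK_FLASH[kind]


print(round(price_ratio("input"), 1), round(price_ratio("output"), 1))
# 21.4 53.6
print(round(cost_usd(2_000_000, 1_000_000, SONNET), 2),
      round(cost_usd(2_000_000, 1_000_000, DEEPSEEK_FLASH), 2))
# 21.0 0.56
```

Real bills will differ because prompt caching and batch discounts change the effective input price on both sides, so treat the output as an upper-bound comparison.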

The output-token spread is where the savings live. Coding agents are output-heavy: they emit diffs, write files, generate plans, and summarize after every tool call. One eight-month Claude Code marathon by Reddit user u/_atomicbomb burned 10 billion tokens, roughly $15,000 at Sonnet list price (Morph, 2026). Scaling that figure by the 21x to 53x price ratios above puts the same volume on DeepSeek-V4-Flash at somewhere between about $280 and $700, depending on the input/output mix. That is still a lot, but a different conversation with your finance team.

Source: Anthropic, Google AI, DeepSeek, Z.ai pricing pages, May 2026.

A common objection here is that you get what you pay for, and the cheap models are dumb. That used to be true. On SWE-bench Verified in May 2026, DeepSeek-V4-Pro scores 73%, well behind Sonnet 4.6 at 79.6% but ahead of GPT-4o and last year's Claude Opus. For 80% of what a coding agent actually does — file reads, regex finds, formatting fixes, doc lookups, dependency bumps — that's enough. The router lets you reserve the expensive model for the 20% that needs it.

My own bill, four months in: $213 in March 2026 on direct Anthropic API → $41 in April after routing background, longContext, and "write a commit message" type calls to DeepSeek-V4-Flash. Default code edits still go to Sonnet. Quality on the work I actually ship hasn't moved. The diff is entirely "stuff Claude was doing in the background that nobody needed to be Claude."


Why Geographic Restrictions Make Claude Code Router Essential Outside the US

Anthropic's API is unavailable in roughly fifteen countries and territories, including mainland China, Russia, Iran, North Korea, Belarus, Cuba, Syria, Crimea, Donetsk, Luhansk, Kherson, Zaporizhzhia, and a handful of African and South Asian nations (Anthropic Supported Countries, 2026). Anthropic tightened the rules in September 2025 to block entities more than 50% owned by parties headquartered in unsupported regions, regardless of where those entities physically operate (Anthropic, 2025).

For a developer in Shanghai or Tehran, that's the end of the conversation with Claude Code via the official path. A VPN doesn't fix it — Anthropic terminates accounts that trip its fraud heuristics, and many corporate environments forbid VPNs anyway. The Claude Code repo on GitHub has open issues from Chinese developers hitting this wall on every fresh install (anthropics/claude-code#2656, 2025).

Claude Code Router solves it cleanly by routing through a provider that does serve the user's region:

Mainland China: GLM-4.5 from Zhipu AI (Z.ai) or Qwen from Alibaba Cloud — domestic providers, no VPN required, RMB billing.
UAE, Saudi Arabia, Russia: OpenRouter (Singapore-domiciled) acts as a meta-aggregator that often accepts payment methods Anthropic doesn't, and exposes Anthropic's own models among others.
Iran: DeepSeek directly, or local Ollama for everything that doesn't need a frontier model.

The legal nuance matters. Routing through a third-party provider that itself has access to the model is different from circumventing Anthropic. Many enterprises in restricted regions still need an AI coding workflow; the router lets them have one without forcing the user to perjure themselves on a signup form.

According to Menlo Ventures' 2025 enterprise survey, 60% of enterprises now deploy three or more foundation models in production, and the share spent on Anthropic models rose to 40% globally (Menlo Ventures, 2025). The router is what makes those numbers reachable for the parts of the world where the official Anthropic path is closed.

How Do You Install and Configure Claude Code Router?

The install is two commands and a config file. From the repo's current README (musistudio/claude-code-router, 2026):

# Prereq: Claude Code already installed
npm install -g @anthropic-ai/claude-code

# Install the router itself
npm install -g @musistudio/claude-code-router

# Launch Claude Code through the router
ccr code

That last command does three things: starts the local proxy on port 3456, exports ANTHROPIC_BASE_URL=http://127.0.0.1:3456 plus a dummy ANTHROPIC_API_KEY, then spawns Claude Code with those env vars. If you've never run ccr before, the first launch creates ~/.claude-code-router/config.json with a placeholder.

The config file has two top-level sections that matter: Providers (an array of upstream endpoints) and Router (the rules that map request shapes to providers). Here's a minimal working version:

{
  "Providers": [
    {
      "name": "deepseek",
      "api_base_url": "https://api.deepseek.com/v1/chat/completions",
      "api_key": "$DEEPSEEK_API_KEY",
      "models": ["deepseek-chat", "deepseek-reasoner"],
      "transformer": { "use": ["deepseek"] }
    },
    {
      "name": "gemini",
      "api_base_url": "https://generativelanguage.googleapis.com/v1beta/models/",
      "api_key": "$GEMINI_API_KEY",
      "models": ["gemini-2.5-pro", "gemini-2.5-flash"],
      "transformer": { "use": ["gemini"] }
    }
  ],
  "Router": {
    "default": "deepseek,deepseek-chat",
    "background": "deepseek,deepseek-chat",
    "longContext": "gemini,gemini-2.5-pro",
    "longContextThreshold": 60000,
    "think": "deepseek,deepseek-reasoner"
  }
}

The $ENV_VAR syntax for api_key is a 2026 addition — it pulls from your shell environment so the config file itself stays safe to commit (assuming you don't commit the env vars). The transformer.use array is the dialect plugin; it reshapes each request body to whatever the upstream API expects.
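Since every Router value is a "provider,model" string that has to line up with an entry in Providers, a quick consistency check can catch typos before you launch. This is a minimal sketch of my own, not something the router ships: the function name check_routes is hypothetical, and it assumes only the config shape shown above (a Providers array with name and models, and a Router block where longContextThreshold is the one non-route key).

```python
import json

# Keys in the Router block whose values are settings, not "provider,model" routes.
NON_ROUTE_KEYS = {"longContextThreshold"}


def check_routes(config: dict) -> list[str]:
    """Return a list of problems found in the Router block (empty if none)."""
    # Map each provider name to the set of models it declares.
    declared = {p["name"]: set(p.get("models", [])) for p in config.get("Providers", [])}
    problems = []
    for rule, target in config.get("Router", {}).items():
        if rule in NON_ROUTE_KEYS:
            continue
        # Split on the first comma only; model names may contain slashes.
        provider, _, model = target.partition(",")
        if provider not in declared:
            problems.append(f"{rule}: unknown provider '{provider}'")
        elif model not in declared[provider]:
            problems.append(f"{rule}: provider '{provider}' does not list model '{model}'")
    return problems


# Sample config with a deliberate typo in the "think" route.
config = json.loads("""
{
  "Providers": [
    {"name": "deepseek", "models": ["deepseek-chat", "deepseek-reasoner"]},
    {"name": "gemini", "models": ["gemini-2.5-pro", "gemini-2.5-flash"]}
  ],
  "Router": {
    "default": "deepseek,deepseek-chat",
    "longContext": "gemini,gemini-2.5-pro",
    "longContextThreshold": 60000,
    "think": "deepseek,deepseek-r1"
  }
}
""")

for problem in check_routes(config):
    print(problem)
```

In this sample only the think rule is reported, because deepseek-r1 is not in the provider's models list. To check a real file, load it with json.loads(Path(...).read_text()) instead of the inline string.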

Inside Claude Code, the /model deepseek,deepseek-reasoner slash command switches the default route for the current session. There's also a <CCR-SUBAGENT-MODEL>provider,model</CCR-SUBAGENT-MODEL> prefix you can drop into a prompt to override the route for a single subagent call — useful when a subagent's job is "summarize this PR" and you want it cheap.

For a visual editor, ccr ui opens a localhost web UI for managing providers, models, and routes — added in v1.0.30. For production usage, ccr start | stop | restart runs the proxy as a daemon so ccr code reuses it across sessions.

What Routing Rules Actually Work in Practice?

The default config that ships with the router treats every request the same. That's the wrong move: you'll either burn money on the expensive path or get bad output on the cheap one. The rule I've landed on after four months is to split by what the request is for, not what model the user asked for.

Here's the actual Router block from my config:

"Router": {
  "default": "anthropic,claude-sonnet-4-6",
  "background": "deepseek,deepseek-chat",
  "longContext": "gemini,gemini-2.5-pro",
  "longContextThreshold": 80000,
  "think": "deepseek,deepseek-reasoner",
  "webSearch": "openrouter,perplexity/sonar-pro"
}

The reasoning:

default → Sonnet 4.6. This is the hot path for code edits, where output quality matters and Anthropic's lead on SWE-bench is real. Worth the $3/$15.
background → DeepSeek-V4-Flash. Claude Code marks file-read responses, between-tool summarization, and "what file should I look at next" reasoning as background. Roughly 60% of total token volume by my measurement. None of it needs a frontier model.
longContext → Gemini 2.5 Pro. Above 80K tokens of context, Gemini's 1M-token window is the only game in town that doesn't degrade. Anthropic's own long-context performance drops past 200K, and the per-token cost gets ugly. (If you're considering dropping Claude Code entirely and switching to Gemini's official CLI for the free tier alone, I broke down that exact tradeoff in Gemini CLI vs Claude Code.)
think → DeepSeek-R1. When Claude Code requests thinking: true (extended reasoning), routing to DeepSeek-R1 gets you visible chain-of-thought at $0.55/$2.19 — about a fifth of what Sonnet thinking costs.
webSearch → Perplexity Sonar via OpenRouter. Sonar has live web grounding built in; Claude doesn't.
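One way to picture "split by what the request is for" is as a small decision function over the Router block. The sketch below is illustrative only and is not the router's actual implementation: the Request fields (token_count, thinking, background, web_search) are hypothetical names, and the order in which the rules are checked is my assumption.

```python
from dataclasses import dataclass

# The Router block from the config above, as a plain dict.
ROUTER = {
    "default": "anthropic,claude-sonnet-4-6",
    "background": "deepseek,deepseek-chat",
    "longContext": "gemini,gemini-2.5-pro",
    "longContextThreshold": 80000,
    "think": "deepseek,deepseek-reasoner",
    "webSearch": "openrouter,perplexity/sonar-pro",
}


@dataclass
class Request:
    # Hypothetical request summary; these field names are illustrative only.
    token_count: int
    thinking: bool = False
    background: bool = False
    web_search: bool = False


def pick_route(req: Request, router: dict = ROUTER) -> str:
    """Return the "provider,model" string for a request (assumed precedence)."""
    if req.token_count > router["longContextThreshold"]:
        return router["longContext"]
    if req.web_search:
        return router["webSearch"]
    if req.thinking:
        return router["think"]
    if req.background:
        return router["background"]
    return router["default"]


print(pick_route(Request(token_count=1_200, background=True)))
print(pick_route(Request(token_count=120_000)))
print(pick_route(Request(token_count=9_000)))
```

The point of the shape is that the expensive model only appears in the final fall-through branch; everything with a cheaper, well-defined purpose is peeled off first.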

Figure: share of requests versus share of spend, by provider. Personal data, week of May 5–11, 2026, exported via ccr's status-line monitoring beta.

The interesting inversion in that week's data: 58% of requests go to DeepSeek, but it accounts for 8% of spend. Anthropic gets 27% of requests and 71% of spend. The router isn't doing the savings work by being clever; it's doing it by stopping the expensive model from getting requests it didn't need to see.
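To make the inversion concrete, divide each provider's share of spend by its share of requests. That gives cost per request relative to the overall average. This is just arithmetic on the four percentages quoted above; the function name is mine.

```python
# Shares quoted in the text above (percent of requests, percent of spend).
shares = {
    "anthropic": {"requests": 27, "spend": 71},
    "deepseek": {"requests": 58, "spend": 8},
}


def relative_cost_per_request(provider: str) -> float:
    """Spend share divided by request share: above 1.0 means pricier than average."""
    s = shares[provider]
    return s["spend"] / s["requests"]


anthropic = relative_cost_per_request("anthropic")
deepseek = relative_cost_per_request("deepseek")
print(f"anthropic: {anthropic:.2f}x average cost per request")
print(f"deepseek:  {deepseek:.2f}x average cost per request")
print(f"ratio:     {anthropic / deepseek:.0f}x")
```

On these numbers an average Anthropic request costs roughly 19 times an average DeepSeek request, which is why moving even low-value traffic off the default route shows up so clearly on the bill.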

What Tradeoffs Should You Expect When Routing Around Anthropic?

The router does break things, and the README undersells it. Three real problems show up in production.

Prompt caching dies on most non-Anthropic routes. Anthropic's cache is 90% off on cached input — Sonnet 4.6 becomes $0.30 instead of $3 on repeat reads (Anthropic, 2026). DeepSeek has its own cache (the V4-Flash $0.0028 cache-hit rate is published), but the router's transformer doesn't always preserve the cache breakpoints Claude Code sets. If your repo is small and your sessions are long, this can wipe out the headline savings.
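A rough way to see how much caching is worth is to blend the two prices by cache hit rate. The sketch below uses the $3 and $0.30 figures from the paragraph above and assumes they are per million input tokens; it also ignores any cache-write surcharge, so treat it as a simplification rather than a billing model.

```python
def effective_input_price(base: float, cached: float, hit_rate: float) -> float:
    """Blended price per million input tokens for a given cache hit rate (0..1)."""
    if not 0.0 <= hit_rate <= 1.0:
        raise ValueError("hit_rate must be between 0 and 1")
    return base * (1.0 - hit_rate) + cached * hit_rate


# Prices quoted in the text: $3 uncached, $0.30 cached (unit assumed: per million tokens).
for hit_rate in (0.0, 0.5, 0.9):
    price = effective_input_price(3.00, 0.30, hit_rate)
    print(f"hit rate {hit_rate:.0%}: ${price:.2f} per million input tokens")
```

At a 90% hit rate the blended input price comes out at $0.57, so a route that loses caching has to be cheaper than that, not cheaper than $3, before it actually saves money on long sessions.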

Computer use, vision, and PDF support degrade. Anthropic's computer-use tool is a model-trained capability; DeepSeek doesn't have it. Gemini has its own vision but the request shape is different. If you use Claude Code for browser automation or PDF analysis, the router either drops those calls or returns errors the agent doesn't know how to parse.

Tool-use strictness varies. Anthropic's models are aggressively trained to emit valid tool_use JSON. DeepSeek and GLM are looser — they sometimes emit partial JSON, malformed name fields, or hallucinated tool names. The tooluse transformer plugin in the router papers over the worst of it, but I still see "Claude Code stuck because the model said it called a tool that doesn't exist" maybe twice a week.

A 2026 community guide on tokenmix.ai also notes that Opus 4.7's tokenizer counts roughly 35% more tokens than competitors for the same prompt, which means cost comparisons in the router's favor are understated. That only holds if you actually move the workload, not if you let Claude Code keep sending it down the default route (Finout, 2026).

What I learned the hard way: my first config sent everything to DeepSeek. Three commits in, the model invented a function name and Claude Code happily called it across four files before crashing on the test step. I rolled back, narrowed DeepSeek to background and think, and the failure rate dropped to noise. The savings dropped 15% and the regret dropped 100%.

Is It Safe to Route API Traffic Through a Local Proxy?

This is the question nobody else writing about claude-code-router seems to answer, and it has three real surfaces.

Supply-chain risk on the install. Sonatype tracked 454,648 malicious npm packages published in 2025, with the npm registry hosting more than 99% of all OSS malware (Sonatype 2026 State of the Software Supply Chain, 2026). The Shai-Hulud worm in September 2025 was the first self-replicating npm worm and hit 500+ packages (Sonatype, 2025). claude-code-router is installed with npm install -g, which means any install scripts in its dependency tree run as your user. The repo is open, the maintainer is reputable, and 34,000 stars is some signal, but you should pin the version, audit the dependency tree at least once, and ideally install into a per-project node prefix rather than globally.

Blast radius on your provider keys. The config file at ~/.claude-code-router/config.json holds plaintext API keys for every provider you've added. If you've got DeepSeek, Gemini, Anthropic, OpenRouter, and a Z.ai key in one file, one machine compromise hands an attacker the lot. The 2026 env-var interpolation ($VAR_NAME) helps — store the keys in your shell environment or a secret manager and let the router read them at startup. Don't commit the file.
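A small audit script can flag providers whose api_key is still a literal secret rather than a $VAR_NAME reference. This is a sketch under the config shape shown earlier in this post; the function name is hypothetical, and it only recognizes the plain $VAR_NAME form mentioned above.

```python
import json
import re

# A value like "$DEEPSEEK_API_KEY" is treated as an environment-variable reference.
ENV_REF = re.compile(r"^\$[A-Za-z_][A-Za-z0-9_]*$")


def plaintext_key_providers(config: dict) -> list[str]:
    """Names of providers whose api_key looks like a literal secret."""
    flagged = []
    for provider in config.get("Providers", []):
        key = provider.get("api_key", "")
        if key and not ENV_REF.match(key):
            flagged.append(provider.get("name", "<unnamed>"))
    return flagged


# Inline sample; the second key is a made-up placeholder, not a real credential.
sample = json.loads("""
{
  "Providers": [
    {"name": "deepseek", "api_key": "$DEEPSEEK_API_KEY"},
    {"name": "gemini", "api_key": "not-a-real-key-123"}
  ]
}
""")

print(plaintext_key_providers(sample))
```

Here only gemini is flagged. Pointing the same function at the parsed contents of ~/.claude-code-router/config.json gives a quick list of keys worth moving into the environment or a secret manager.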

Data residency and prompt flow. Every prompt you send goes to the provider you routed it to. If you're a US developer routing background calls to DeepSeek, your code snippets and context are flowing through a Chinese-domiciled API. If your employer has any data classification policy more serious than "don't paste passwords," you should read it before turning this on. For non-confidential personal projects, it's a non-issue. For client work, talk to your compliance lead first.

A useful framing: the router's security posture is roughly that of any other developer SaaS proxy you've already installed (Vercel CLI, Supabase CLI, the Firebase tools). It's not worse than those. It's also not better, and the keys it holds are more valuable.

Frequently Asked Questions

Does claude-code-router work with Claude Code's plan mode and subagents?

Yes. Plan mode is just a request flag; the router forwards it. Subagents work too, and the router supports a <CCR-SUBAGENT-MODEL>provider,model</CCR-SUBAGENT-MODEL> prefix that lets you override the route per subagent call. I route my "summarize this PR" subagent to DeepSeek-V4-Flash and keep my "review this code" subagent on Sonnet 4.6.

Will the router break when Claude Code updates?

Sometimes briefly. The router translates Claude Code's request shape, so when Anthropic changes the protocol (the September 2025 PostToolBatch event broke it for two days), there's a lag while the maintainer ships a fix. Pinning to a known-good version and watching the GitHub issues is the practical mitigation.

Can I use Ollama locally and route everything offline?

Yes for some workloads. Add Ollama as a provider with api_base_url: http://localhost:11434/v1/chat/completions, point default and background at it, and Claude Code works fully offline. Quality on Qwen2.5-Coder or Llama 3.3 is solid for simple edits but degrades fast on complex multi-file refactors. Best as a fallback when your internet's down, not a daily driver.

Does claude-code-router work with MCP servers?

Yes — and the confusion is understandable, because it sounds like two routing layers fighting. They don't overlap. MCP configuration lives inside Claude Code itself (your .mcp.json and the servers Claude Code spawns); claude-code-router only swaps the model backend the request is sent to. The router sits between Claude Code and the model API, so your MCP tools, their tool_use calls, and the results all pass through it untouched. The one real caveat is tool-use strictness: if you route an MCP-heavy session to a looser model like DeepSeek or GLM, you'll see more malformed tool_use JSON than you would on Sonnet, so keep MCP-heavy work on the default (Anthropic) route. If you haven't set your servers up yet, start with my Claude Code MCP configuration guide — the router changes nothing about how MCP is wired.

Is the project legitimate or a stealthy way to harvest API keys?

The musistudio/claude-code-router source is fully open, MIT-licensed, and has 34,000 GitHub stars with hundreds of contributors as of May 2026. Outbound traffic only goes to the providers you configure. The bigger risk is install-time supply-chain compromise via the npm registry — pin the version, audit the lockfile, and re-audit on every upgrade.

Will Anthropic ban my account for using a router?

The router doesn't touch Anthropic's API unless you route to it, and Claude Code is the official client either way. The Terms of Service don't prohibit routing requests through a local proxy. The actual risk is using a non-Anthropic provider whose ToS forbids competing with their own coding assistant — read the DeepSeek and Z.ai terms carefully if your usage is commercial.

Conclusion

Claude Code Router is a small piece of software that changes the economics of agentic coding. For US developers, it's a 60–90% cost cut on the workloads that don't need Anthropic's lead. For developers outside the supported regions, it's the only way to use Claude Code at all without working around the rules. The cost is real (degraded caching, looser tool use, a real-but-bounded security surface), but it is also manageable if you split the routes by what the request is actually for.

The version I run today routes 58% of requests to DeepSeek and pays Anthropic for the 27% that matters. The bill dropped from $213 to $41 in one month. The shipped code didn't change.

If you've been using Claude Code at scale and the bill is starting to bite, install the router, point background at DeepSeek, and watch what happens. The full config from this post is in my multi-model workflow guide. The next thing I'd read is the hook patterns post if you want the deterministic side of the same control story.

Author: Nishil Bhave — solo developer, four-month claude-code-router user, runs the maketocreate.com publishing stack on a mix of Anthropic, DeepSeek, and Gemini.

Claude Code Installation Guide: Every Platform, Every Gotcha

Nishil Bhave — Sat, 30 May 2026 15:38:42 +0000

Claude Code Installation Guide: Every Platform, Every Gotcha (2026)

I've installed Claude Code on six machines in the last year. Three Macs (Intel, M1, M3), two Windows laptops, one Ubuntu box, plus the VS Code extension and the JetBrains plugin on top of those. Each install taught me a different lesson the docs don't quite spell out.

The official setup page is good. It's also exhaustive, dry, and structured for the docs team, not for the developer who just wants the right command for their machine and a heads-up on what's going to break. So here's the whole map: every install method, every supported platform, every IDE, the auth flow, the update story, and the clean uninstall paths. With the annotations you'd get from a friend who's done it more times than is healthy.

Key Takeaways

Native installer is the recommended path on every OS as of 2026; the npm route still works but moved to "advanced" status in the official docs (code.claude.com, 2026).

Claude Code reached a $1B run-rate inside 9 months of GA, and the VS Code extension passed 13.97M installs by April 2026 (Anthropic, 2025).

Uninstalling the CLI alone won't clean your machine; VS Code, JetBrains, and Desktop all write to ~/.claude/ and will recreate it.

Which Claude Code Install Method Should I Pick in 2026?

The native installer is the right answer for ~95% of developers in 2026, per the official setup page (code.claude.com, 2026). It ships a standalone binary, handles its own auto-updates, doesn't need Node.js, and works the same on macOS, Linux, Windows, and WSL. Pick another method only when you have a specific reason.

The other methods exist for specific use cases:

Homebrew: you already live in brew, want everything in one upgrade queue, and don't mind running brew upgrade yourself.
WinGet: same logic on Windows. Centralised app management beats auto-updates.
apt / dnf / apk: Linux servers and CI runners where a signed package repo is non-negotiable.
npm -g: JavaScript shops standardising tooling through package.json, monorepos, or Dockerfiles. Functional, just no longer the default path.

Source: Stack Overflow Developer Survey 2025

Claude Code hit 10% adoption among developers in the 2025 Stack Overflow survey, behind Cursor at 18% but ahead of Windsurf at 5% (Stack Overflow, 2025). That number is from before the October 2025 web launch and the desktop app push. The real install base is materially higher now. The point: which method you pick matters because there's a lot of tooling layered on top of the CLI.

Here's the platform-support matrix I wish someone had handed me on day one:

Method	macOS	Linux	Windows	WSL	Auto-updates	Needs Node.js
Native installer	Yes	Yes	Yes	Yes	Yes	No
Homebrew cask	Yes	Yes	No	No	No (manual)	No
WinGet	No	No	Yes	No	No (manual)	No
apt / dnf / apk	No	Yes	No	No	No (system update)	No
npm -g	Yes	Yes	Yes	Yes	No (manual)	Yes (18+)

What I learned the hard way: I started on npm because I was already in Node land and npm i -g @anthropic-ai/claude-code felt like the obvious move. Three months later I migrated everything to the native installer because the npm path doesn't auto-update and I kept landing on old versions when a new model dropped. Pick the path that updates itself unless you have a reason not to.

How Do I Install Claude Code on macOS?

The fastest macOS install in 2026 is a single line: curl -fsSL https://claude.ai/install.sh | bash (code.claude.com, 2026). It drops a signed binary into ~/.local/bin/claude, registers an auto-updater, and asks you to log in on first run. No Node, no Homebrew, no sudo. macOS 13.0 or newer is the minimum.

The Homebrew route is for the brew-curious:

# Stable channel (about a week behind latest)
brew install --cask claude-code

# Rolling channel (latest builds)
brew install --cask claude-code@latest

The catch: Homebrew doesn't trigger Claude Code's built-in auto-updater. You'll need brew upgrade claude-code (or claude-code@latest) on your usual upgrade cadence. If you forget for a month, you'll get version skew complaints when a new model lands.

Verify the install with claude --version and then claude doctor. The doctor command was the most useful thing I learned to run after a fresh setup. It catches PATH issues, broken auth tokens, and missing shell completions in one shot.

On Apple Silicon, the native installer ships a universal binary. You don't need a Rosetta install just because you're on an M-series chip. I've seen at least four developers on Reddit chase a Rosetta dependency that was never there. The binary is already universal, and the install script picks the right slice. If you're on an Intel Mac running macOS 13+, the same install works without modification.

Citation capsule: According to Anthropic's official setup documentation, the native installer supports macOS 13.0+, Ubuntu 20.04+, Debian 10+, Alpine 3.19+, and Windows 10 1809+ (code.claude.com, 2026). All install methods produce the same claude binary; only update behaviour and package management differ.

How Do I Install Claude Code on Windows?

Windows developers have three real options in 2026, and the right one depends on which shell you actually live in. The PowerShell native installer is fastest: irm https://claude.ai/install.ps1 | iex from any PowerShell terminal (code.claude.com, 2026). Windows 10 build 1809 or later is the floor; earlier builds aren't supported.

CMD users get the same path with a slightly different invocation:

curl -fsSL https://claude.ai/install.cmd -o install.cmd && install.cmd && del install.cmd

WinGet works if you're already standardised on the Windows package manager:

winget install Anthropic.ClaudeCode

The thing nobody tells Windows users: Claude Code has historically expected a Unix-like shell environment for some operations, which is why Git Bash matters here. If you installed Git for Windows, set CLAUDE_CODE_GIT_BASH_PATH in your environment so Claude can call into Bash for shell-aware commands. Without it, you'll get cryptic errors on tools that shell out internally. The setting lives in %USERPROFILE%\.claude\settings.json or as an env var.

Stack Overflow's 2025 data puts Windows at 47.6% of professional developer environments, well ahead of macOS at 31.8% and Linux at 27.7% (Stack Overflow, 2025). That's the audience whose Claude Code experience tends to get the least testing love, so plan to have claude doctor open in another window.

Source: Stack Overflow Developer Survey 2025 — Technology

How Do I Install Claude Code on Linux?

Linux gets the deepest install support of any platform, with four official paths: native installer, apt repo, dnf repo, and apk repo, all signed with the same Anthropic GPG key (code.claude.com, 2026). Use curl -fsSL https://claude.ai/install.sh | bash if you want auto-updates. Use the package manager if you want your config management tool to own upgrades.

For Debian and Ubuntu:

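# Make sure the keyring directory exists (it may be missing on older releases)
sudo install -d -m 0755 /etc/apt/keyrings
# Download Anthropic's signing key
curl -fsSL https://downloads.claude.ai/claude-code/apt/stable/pubkey.asc \
  | sudo tee /etc/apt/keyrings/claude-code.asc > /dev/null
# Register the repo, pinned to that key
echo "deb [signed-by=/etc/apt/keyrings/claude-code.asc] \
  https://downloads.claude.ai/claude-code/apt stable main" \
  | sudo tee /etc/apt/sources.list.d/claude-code.list
# Refresh the package index and install
sudo apt update && sudo apt install claude-code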

For Fedora and Rocky, use the dnf repo with the same key. For Alpine, swap to apk. The GPG fingerprint to verify against is 31DD DE24 DDFA B679 F42D 7BD2 BAA9 29FF 1A7E CACE (code.claude.com, 2026). If you're putting Claude Code into a CI runner or a security-conscious environment, this is the path that survives an audit.
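Fingerprints get printed in different formats (grouped, ungrouped, lowercase), which makes eyeballing them error-prone. Here's a minimal comparison helper, assuming you've already obtained the observed fingerprint yourself, for example from gpg --show-keys on the downloaded key file. The helper names are mine; only the expected fingerprint comes from the text above.

```python
# Fingerprint quoted in the text above.
EXPECTED = "31DD DE24 DDFA B679 F42D 7BD2 BAA9 29FF 1A7E CACE"


def normalize(fingerprint: str) -> str:
    """Drop spaces and punctuation, then uppercase, so formats compare equal."""
    return "".join(ch for ch in fingerprint if ch.isalnum()).upper()


def fingerprint_matches(observed: str, expected: str = EXPECTED) -> bool:
    """True only if the two fingerprints are identical after normalization."""
    return normalize(observed) == normalize(expected)


# Same fingerprint, ungrouped and lowercase.
print(fingerprint_matches("31ddde24ddfab679f42d7bd2baa929ff1a7ecace"))
# Last character altered, so this one must not match.
print(fingerprint_matches("31DD DE24 DDFA B679 F42D 7BD2 BAA9 29FF 1A7E CACF"))
```

The comparison is exact on purpose: a fingerprint that is "almost" right is a failed check, and the repo should not be added.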

The setting nobody mentions on Linux: env.USE_BUILTIN_RIPGREP in settings.json. Claude Code searches your codebase with ripgrep. On Linux it tries the system rg first, which is the right default, but on a clean Debian box, ripgrep may not be installed at all, and the failure mode is silent. Set "env": { "USE_BUILTIN_RIPGREP": "1" } in your ~/.claude/settings.json and you get the bundled binary unconditionally. I lost twenty minutes to this on a fresh DigitalOcean droplet before I figured it out.

Citation capsule: Anthropic signs all Linux packages with GPG fingerprint 31DD DE24 DDFA B679 F42D 7BD2 BAA9 29FF 1A7E CACE and serves them from downloads.claude.ai/claude-code/{apt,rpm,apk} (code.claude.com, 2026). Verifying the fingerprint before adding the repo is the single defence against a supply-chain swap on the package source.

When Should You Use npm to Install Claude Code?

Use npm when Claude Code is one of many JavaScript tools you're version-pinning across a team, or when your Dockerfile already runs npm ci and adding a curl-pipe-bash step feels wrong. Otherwise, the official docs now call it the "advanced" option, and the GitHub README is even blunter: it labels the npm path as deprecated for typical use (anthropics/claude-code, 2026).

The command itself:

npm install -g @anthropic-ai/claude-code

Three things to know:

Don't use sudo. sudo npm install -g writes to root-owned paths and creates permission headaches every time the package updates. Fix your npm global prefix instead: npm config set prefix ~/.npm-global and add ~/.npm-global/bin to your PATH.
Node 18 or later. The npm install is the only path with a Node version requirement, because the package downloads platform-specific binaries via optional dependencies. The native installer doesn't need Node at all.
Updates are manual. npm install -g @anthropic-ai/claude-code@latest is the upgrade command. npm update -g won't always pull the newest version because of how npm resolves the @latest tag. I learned this when I noticed three of my machines were a major version behind.

GitHub Octoverse 2025 counted over 1.1 million public repos using an LLM SDK, with 693,867 created in the last 12 months alone, a 178% year-over-year jump (GitHub, 2025). The npm path makes Claude Code easy to pin alongside those SDKs in a Node project, which is the strongest reason it sticks around in the toolbox.

How Do I Install the Claude Code VS Code Extension?

Open VS Code, hit the Extensions panel, search for "Claude Code," and install the one published by Anthropic (the publisher field matters; there are unofficial forks). The Marketplace listing shows 13,974,890 installs and a 4.0-star rating across 678 reviews as of late April 2026 (Visual Studio Marketplace, 2026). That's by far the largest extension in the AI-coding category.

The extension is not a standalone product; it's a UI on top of the CLI. If you haven't installed the claude binary first, the extension will prompt you to. The cleanest order is:

1. Install the CLI (native installer).
2. Run claude once and log in.
3. Then install the VS Code extension.
4. Open the Claude panel; it picks up your existing auth.

If you reverse step 1 and step 3, the extension will trigger its own CLI bootstrap, which works but tends to install into a different path than you'd pick by hand. I've cleaned this up on three machines, and it's never been pleasant.

The auth gotcha: the first time I installed the extension on a second machine, I got an auth loop because the OAuth callback tried to bind to a port that was already in use by my Next.js dev server. The fix was to close the dev server, run claude /login in the terminal first, then open VS Code. The extension reuses the same token. If the extension still misbehaves, delete ~/.claude/credentials.json (the file is auto-recreated on next login) and try again. That path is documented in the troubleshooting docs but not surfaced in the extension UI.

Citation capsule: The Claude Code VS Code extension passed 13.97M installs on the Visual Studio Marketplace by April 2026, making it the highest-installed AI coding assistant from any major vendor in the extension catalog (Marketplace, 2026). The extension is not a standalone product: it depends on the CLI binary being installed and authenticated.

Sources: Anthropic news posts, GitHub repo, Visual Studio Marketplace

How Do I Install the Claude Code JetBrains Plugin?

The JetBrains plugin lives at marketplace plugin ID 27310 ("Claude Code [Beta]"), published by Anthropic, and works across IntelliJ IDEA, PyCharm, WebStorm, RubyMine, GoLand, PhpStorm, Rider, and Android Studio (JetBrains Marketplace, 2026). It's still labelled Beta as of May 2026, which is worth knowing before you wire it into a daily workflow.

Install from inside your JetBrains IDE: Settings → Plugins → Marketplace → search "Claude Code". Install, restart the IDE, and a Claude Code tool window appears in the sidebar. Same dependency rule as VS Code: the plugin calls into the CLI binary, so install the CLI first.

JetBrains' own State of Developer Ecosystem 2025 surveyed 24,534 developers across 194 countries between April and June 2025, the broadest dataset on IDE usage that exists (JetBrains, 2025). It also tracked the rise of Cursor (mentions jumped from 135 in 2024 to over 2,300 in 2025) and Claude Code right alongside it (The Register, 2025). If you're a JetBrains shop wondering whether the plugin is mature enough, the answer in mid-2026 is "yes for daily use, no for unattended automation." Beta means breaking changes are still acceptable.

The auth quirk on JetBrains: the plugin tries to find your claude binary via PATH, and on macOS it sometimes can't see PATH entries you set in .zshrc because IDEs launched from Spotlight inherit a stripped environment. If the plugin can't find Claude, open the IDE from a terminal with open -a "IntelliJ IDEA" or set the binary path explicitly in plugin settings. This is the single most common JetBrains setup issue I've seen reported on the Anthropic forums.

Is There a Claude Code Desktop App?

Yes — Anthropic shipped a graphical desktop app for users who'd rather not live in a terminal (code.claude.com, 2026). macOS users grab the .dmg from claude.ai/api/desktop/darwin/universal/dmg/latest; Windows users get the installer through claude.com/download. Linux desktop isn't listed in the official setup docs as of May 2026, so Linux users stick with the CLI or the IDE plugins for now.

The desktop app and the CLI share the same ~/.claude/ config directory, the same auth token, the same sessions. Whichever you log into first, the other picks up. That's why "uninstall the CLI" without uninstalling the desktop app doesn't really uninstall anything; the desktop app recreates the config files the moment you launch it.

Anthropic also launched Claude Code on the web on October 20, 2025, plus an iOS app the same day (Anthropic, 2025). The web version doesn't need any install at all. The iOS app is a companion, not a replacement: useful for reviewing sessions on the go, less useful for serious coding.

Citation capsule: Anthropic launched Claude Code's web interface and iOS app on October 20, 2025, expanding the product from a CLI-first tool to a multi-surface platform across terminal, IDE plugins, desktop, web, and mobile (Anthropic, 2025). All surfaces share the same authentication and configuration, which simplifies setup but complicates clean uninstalls.

How Do I Log Into Claude Code After Install?

Run claude in your terminal once and the auth flow opens a browser window for OAuth against your Claude.com account (code.claude.com, 2026). Pro, Max, Team, Enterprise, and Console accounts all work. The free Claude.com tier does not include Claude Code access, which is the most common "why isn't this working" question on the forums.

There are three login paths, depending on how you bill:

Claude.com subscription: Pro, Max, Team, or Enterprise. OAuth flow, no API key needed, usage counts against your subscription's rate limits.
Anthropic Console API key: for pay-as-you-go billing. Set ANTHROPIC_API_KEY in your environment or run claude /login and pick "API key."
Cloud provider keys: Amazon Bedrock, Google Vertex AI, or Microsoft Foundry. Same claude binary, different backend. Useful when your company has its own contract with the cloud provider.

The /login slash command works inside an active session if you need to switch accounts mid-flight. The first launch handles login automatically, so most users never type /login at all.

GitHub Copilot crossed 4.7M paid subscribers by January 2026, with 80% of new GitHub developers using it in their first week (GitHub Octoverse, 2025). That's the comparison number to keep in mind: AI coding tools are now table stakes for new developers, and Claude Code's auth flow is intentionally designed for fast on-ramp from a Claude.com account most users already have.

How Do I Update Claude Code Without Losing My Config?

Native installs auto-update by default in 2026, with two channels exposed via the autoUpdatesChannel setting in ~/.claude/settings.json: latest (rolling) or stable (about a week behind, recommended for teams) (code.claude.com, 2026). Manual updates are a one-liner: claude update. Your config, sessions, and auth tokens survive every update; Anthropic versions the config schema separately from the binary.
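If you script the channel change across machines, the main risk is overwriting keys that are already in settings.json. The sketch below merges a change into an existing file and keeps everything else; the update_settings helper is hypothetical, and the demo runs against a scratch file rather than your real ~/.claude/settings.json.

```python
import json
import tempfile
from pathlib import Path


def update_settings(path: Path, changes: dict) -> dict:
    """Merge `changes` into the JSON file at `path`, keeping existing keys."""
    settings = json.loads(path.read_text()) if path.exists() else {}
    for key, value in changes.items():
        if isinstance(value, dict) and isinstance(settings.get(key), dict):
            settings[key].update(value)  # merge nested blocks such as "env"
        else:
            settings[key] = value
    path.parent.mkdir(parents=True, exist_ok=True)
    path.write_text(json.dumps(settings, indent=2) + "\n")
    return settings


# Demo on a temporary file that already holds an "env" block.
with tempfile.TemporaryDirectory() as tmp:
    demo = Path(tmp) / "settings.json"
    demo.write_text('{"env": {"USE_BUILTIN_RIPGREP": "1"}}')
    merged = update_settings(demo, {"autoUpdatesChannel": "stable"})
    print(json.dumps(merged, indent=2))
```

The existing env block survives and autoUpdatesChannel is added alongside it. Nested dictionaries are merged one level deep, which is enough for the env block but not a general deep merge.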

Per install method:

# Native (auto-updates by default; manual trigger)
claude update

# Homebrew
brew upgrade claude-code         # or claude-code@latest

# WinGet
winget upgrade Anthropic.ClaudeCode

# Linux package managers
sudo apt update && sudo apt upgrade claude-code
sudo dnf upgrade claude-code
sudo apk update && sudo apk upgrade claude-code

# npm
npm install -g @anthropic-ai/claude-code@latest

Two settings nobody surfaces enough:

"minimumVersion": "1.2.3" in settings.json blocks the binary from launching if it's older than the value you set. Useful for teams pinning a feature floor.
"env": { "DISABLE_AUTOUPDATER": "1" } shuts off the background updater entirely. Useful in CI where you want reproducible builds.

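The update settings described above can live side by side in one settings.json. Here is a minimal Python sketch that merges them into an existing file without clobbering other keys. The key names (autoUpdatesChannel, minimumVersion, env.DISABLE_AUTOUPDATER) come from the text; the helper name merge_update_settings and the demo values are hypothetical, and whether you want all three set at once is a team decision, not a recommendation.

```python
import json
from pathlib import Path


def merge_update_settings(settings_path, channel="stable", minimum_version=None,
                          disable_autoupdater=False):
    """Merge update-related keys into a settings.json, keeping every other key."""
    path = Path(settings_path)
    settings = json.loads(path.read_text()) if path.exists() else {}

    settings["autoUpdatesChannel"] = channel
    if minimum_version is not None:
        settings["minimumVersion"] = minimum_version
    if disable_autoupdater:
        # "env" is a nested object: extend it rather than replacing it
        env = settings.setdefault("env", {})
        env["DISABLE_AUTOUPDATER"] = "1"

    path.parent.mkdir(parents=True, exist_ok=True)
    path.write_text(json.dumps(settings, indent=2) + "\n")
    return settings


if __name__ == "__main__":
    import tempfile

    # Demo against a throwaway file, not your real ~/.claude/settings.json
    with tempfile.TemporaryDirectory() as tmp:
        demo = Path(tmp) / "settings.json"
        demo.write_text(json.dumps({"env": {"FOO": "bar"}}))
        print(merge_update_settings(demo, "stable", "1.2.3", disable_autoupdater=True))
```

Point it at a copy of your settings file first; the merge keeps unrelated keys, but it does rewrite the file's formatting.
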
The biggest config-preservation win on a multi-machine setup: log into your Anthropic account on each machine, and the model preferences, MCP server configs, and slash commands all sync through the account, not the local file. For those settings, the local ~/.claude/ directory is more of a cache than a source of truth. Session transcripts are the exception: they live only under ~/.claude/projects/ on the machine that created them, so back that folder up before you wipe or migrate anything.

how to preserve session history before a clean reinstall or migration

How Do I Cleanly Uninstall Claude Code from Every Platform?

The full uninstall has three layers: the binary, the config, and the IDE integrations. Skipping any of them leaves Claude Code partly resident on your machine. Here's the complete map by platform, taken from the official setup docs and the actual file paths the installer touches (code.claude.com, 2026).

Binary, by install method:

# Native (macOS / Linux)
rm -f ~/.local/bin/claude
rm -rf ~/.local/share/claude

# Native (Windows PowerShell)
Remove-Item "$env:USERPROFILE\.local\bin\claude.exe" -Force
Remove-Item "$env:USERPROFILE\.local\share\claude" -Recurse -Force

# Homebrew
brew uninstall --cask claude-code     # or claude-code@latest

# WinGet
winget uninstall Anthropic.ClaudeCode

# apt
sudo apt remove claude-code
sudo rm /etc/apt/sources.list.d/claude-code.list /etc/apt/keyrings/claude-code.asc

# dnf
sudo dnf remove claude-code
sudo rm /etc/yum.repos.d/claude-code.repo

# apk
sudo apk del claude-code

# npm
npm uninstall -g @anthropic-ai/claude-code

Config cleanup (do this after removing IDE plugins, see next paragraph):

rm -rf ~/.claude
rm -f ~/.claude.json
# plus any project-local .claude and .mcp.json files

IDE integration gotcha (this is the one that catches people): the VS Code extension, the JetBrains plugin, and the desktop app all write to ~/.claude/. If you remove the CLI and then later open VS Code with the extension still installed, the extension will recreate ~/.claude/ and reinstall the CLI behind the scenes (code.claude.com, 2026). To actually be Claude-free, uninstall in this order: IDE plugins first, desktop app second, CLI binary third, config last.

The hour I lost: I uninstalled the CLI on a work laptop before handing it back to IT, ran rm -rf ~/.claude, and felt good about it. Opened VS Code one last time to grab a setting I forgot, and the Claude extension cheerfully reinstalled everything: binary, config, the lot. The right order is plugin uninstalls first. Otherwise you're playing whack-a-mole with auto-reinstall.

Citation capsule: Anthropic documents three uninstall layers for Claude Code: the binary (per install method), the config files at ~/.claude/ and ~/.claude.json, and the IDE integrations (VS Code, JetBrains, Desktop) (code.claude.com, 2026). Removing only the CLI leaves the IDE plugins in place, and they will auto-reinstall Claude Code on next IDE launch.
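To confirm an uninstall actually stuck, you can check the paths from the steps above after reopening your IDE. This is a minimal sketch that covers only the native-install binary paths and the two config locations named in this section; the function name find_residue is hypothetical, and package-manager installs or IDE extensions would need their own checks.

```python
import os
from pathlib import Path

# Paths named in the uninstall steps above (native install + config layer).
RESIDUE_PATHS = [
    ".local/bin/claude",
    ".local/share/claude",
    ".claude",
    ".claude.json",
]


def find_residue(home):
    """Return the leftover Claude Code paths that still exist under `home`."""
    home = Path(home)
    # lexists also reports dangling symlinks, which a plain exists() would miss
    return [str(home / rel) for rel in RESIDUE_PATHS if os.path.lexists(home / rel)]


if __name__ == "__main__":
    leftovers = find_residue(Path.home())
    if leftovers:
        print("Still present:")
        for item in leftovers:
            print("  " + item)
    else:
        print("No native-install or config residue found.")
```

Run it once right after the uninstall and again after opening your editor; if paths reappear on the second run, a plugin is still installed.
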

Frequently Asked Questions

What Node.js version does Claude Code require?

Only the npm install path needs Node.js 18 or later. The native installer, Homebrew cask, WinGet, and Linux package managers install a standalone, platform-specific binary that does not call Node at runtime (code.claude.com, 2026). If you don't use npm, Claude Code works fine with no Node installed at all.

Can I install Claude Code on multiple machines with the same account?

Yes. A single Claude.com Pro, Max, Team, or Enterprise account works across unlimited machines, with usage pooled against the same rate limit. The $200 Max plan covers roughly 240-480 Sonnet hours plus 24-40 Opus hours per week (TechCrunch, 2025). Install on each machine, run claude /login once per machine, and you're done.

Is the npm install of Claude Code deprecated?

The official setup docs call it the "advanced" install path, and the GitHub README is sharper, labeling the npm route as deprecated for typical users (anthropics/claude-code, 2026). It still works and still receives updates. Most new installs in 2026 should default to the native installer; npm remains the right call for monorepo and Dockerfile use cases.

Does Claude Code work on Windows without WSL?

Yes. The PowerShell installer and the WinGet path both produce a native Windows binary that runs without WSL (code.claude.com, 2026). WSL is useful if you want a Unix-like shell environment, but it isn't required. Set CLAUDE_CODE_GIT_BASH_PATH if you have Git for Windows installed and Claude misbehaves on shell-heavy commands.

How do I check what version of Claude Code I'm running?

Run claude --version for the version number, then claude doctor for a full health report covering PATH, auth state, MCP servers, shell completion, and updater status. Anthropic's GitHub repo passed 124,000 stars by May 2026 (GitHub, 2026), and claude doctor is the single most-mentioned setup diagnostic across that repo's issues.

What to Install After Claude Code Is Running

Claude Code on a fresh machine is the base layer. The interesting workflow happens once you start layering on subagents, hooks, MCP servers, and custom slash commands, the things that turn a CLI into a development environment that fits your specific brain. Pick one of the four guides below depending on what you want to build first.

start here if you want to parallelize complex tasks
start here if you want to enforce policies and automate decisions
start here if you're deciding between Skills and MCP for tooling
keep this open in another tab; you'll need it

If something in this guide broke for you in a way I didn't anticipate, that's the bit I want to hear about. Tooling docs get better when readers tell writers what they actually hit. Until then, may your installs be reproducible and your claude doctor reports be empty.

CLAUDE.md Best Practices: The Complete 2026 Guide

Nishil Bhave — Fri, 29 May 2026 15:16:09 +0000

CLAUDE.md Best Practices: What Actually Moves the Needle

A CLAUDE.md file is plain markdown that Claude Code reads at the start of every session, and it's the single biggest lever you have over output quality. Get it right and Claude stops guessing your conventions. Get it wrong and you bloat the context window before you've typed a word. The best practice that matters most: keep it short, specific, and universally true for the project.

TL;DR
CLAUDE.md is project memory loaded into every Claude Code session. The best CLAUDE.md files are under ~200 lines, contain only universally-applicable rules (commands, architecture, conventions), point to detailed docs with file references instead of pasting them, and never duplicate what a linter already enforces. Bloated files trigger context rot. Chroma's 2025 study found every one of 18 frontier models degrades as input grows, sometimes from 95% to 60% accuracy (Chroma, 2025). Treat CLAUDE.md like code: commit it, review it, and prune it.

I've been editing CLAUDE.md files daily for over a year, across decade-old Laravel codebases (the Sivon API among them) and the Next.js blog publisher that runs maketocreate.com. This is the deep dive that my real-world Claude Code workflow guide keeps pointing back to. CLAUDE.md is one of seven moving parts there; here it gets the whole article.

Key Takeaways

CLAUDE.md loads on every session, so every line spends context budget, and Anthropic's own guidance is to keep it concise (Claude Code docs, 2026).

Frontier models reliably follow only ~150–200 instructions; Claude Code's system prompt already uses ~50 (HumanLayer, 2025).

Use file:line references and pointers to agent_docs/, never pasted code blocks. Progressive disclosure keeps the always-loaded file small.

CLAUDE.md ≠ AGENTS.md ≠ .claude/agents/*.md. They're three distinct surfaces; mixing them up is the most common configuration mistake I see.

What Is CLAUDE.md and Where Does It Live?

CLAUDE.md is a markdown file Claude Code automatically loads into context at the start of every conversation, giving the model persistent, project-specific memory it can't infer from code alone (Claude Code docs, 2026). Think of it as the briefing you'd give a senior contractor on day one, except Claude reads it fresh, every single session, forever.

The CLAUDE.md file lives in a small hierarchy, and the layering is the part most people miss. There are three locations Claude Code checks, in increasing scope:

Project memory — ./CLAUDE.md at the repo root. Committed to git, shared with the team. This is the one you'll spend 90% of your time on.
Local/project-private — ./CLAUDE.local.md (gitignored) for personal overrides you don't want to commit.
User memory — ~/.claude/CLAUDE.md, applied across every project on your machine. Good for personal preferences ("always explain before refactoring") that aren't repo-specific.

Claude Code also walks up the directory tree, so a CLAUDE.md in a subdirectory layers on top of the root one, which is useful in monorepos. The filename is case-sensitive and exact: it's CLAUDE.md, not claude.md or claude .md. If you don't have one yet, run /init inside any repo and it scaffolds a starter file from your project structure. If you haven't installed the CLI at all, see my guide to installing Claude Code first.

CLAUDE.md is loaded into context on every Claude Code session as persistent project memory, layered across three scopes (project ./CLAUDE.md, local ./CLAUDE.local.md, and user ~/.claude/CLAUDE.md), with narrower scopes overriding broader ones, exactly as documented in Anthropic's Claude Code best-practices guide (Anthropic, 2026).
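The layering is easier to reason about when you can see which files apply to a given working directory. The sketch below lists candidate memory files from broadest scope to narrowest. It is an illustration of the hierarchy described above, not Claude Code's actual loader: the function name candidate_memory_files and the exact ordering are assumptions.

```python
from pathlib import Path


def candidate_memory_files(start_dir, home_dir):
    """List existing memory files, broadest scope first, narrowest last."""
    start = Path(start_dir).resolve()
    home = Path(home_dir).resolve()
    found = []

    # User memory applies to every project on the machine.
    user_file = home / ".claude" / "CLAUDE.md"
    if user_file.is_file():
        found.append(user_file)

    # Walk from the filesystem root down to start_dir so parents come first.
    for directory in [*reversed(start.parents), start]:
        for name in ("CLAUDE.md", "CLAUDE.local.md"):
            candidate = directory / name
            if candidate.is_file():
                found.append(candidate)
    return found


if __name__ == "__main__":
    for path in candidate_memory_files(Path.cwd(), Path.home()):
        print(path)
```

Running it from a package directory in a monorepo should print the user file, then the root CLAUDE.md, then the package-level one, which is a quick way to spot a stray file you forgot about.
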

Why CLAUDE.md Is the Most Underrated Claude Code Feature

Most developers install Claude Code, skip CLAUDE.md entirely, and then wonder why it keeps using the wrong test runner. That's a mistake: 84% of developers now use or plan to use AI coding tools, yet only 33% trust their accuracy versus 46% who actively distrust it (Stack Overflow Developer Survey, 2025). A precise CLAUDE.md is the cheapest way to close that trust gap on your own codebase.

Here's the mechanism. Claude Code is the most-loved AI coding tool of 2026, with a 46% "most loved" rating across 15,000 developers surveyed (Pragmatic Engineer via UncoverAlpha, 2026), and it crossed a $2.5B revenue run-rate by February 2026 (UncoverAlpha, 2026). But the model still doesn't know that your project deploys with make ship, that legacy/ is off-limits, or that you use Pest, not PHPUnit. Without CLAUDE.md it relearns (or mis-guesses) those facts every session.

The day it clicked for me was on the Sivon Laravel API. I'd been re-typing "use the repository pattern, not Eloquent in controllers" into every session for a week. I moved one line into CLAUDE.md and the correction stopped being necessary. That's the underrated part: CLAUDE.md doesn't make Claude smarter, it makes it stop forgetting. Every rule you write once is a correction you never type again.

My finding: Across roughly 40 repos I've configured, the single highest-ROI line in any CLAUDE.md is the exact test command. Claude defaults to the ecosystem-common runner (npm test, phpunit) and gets it wrong constantly. One line, Run tests with: npm run test:unit, eliminates an entire class of wasted turns.

How to Write a CLAUDE.md: The Step-by-Step Structure

Knowing how to write a CLAUDE.md comes down to one rule: include only what's true across every session, and structure it around three questions: what, why, and how (HumanLayer, 2025). Frontier models reliably follow only about 150–200 instructions, and Claude Code's system prompt already consumes roughly 50 of those (HumanLayer, 2025). Your budget is smaller than you think.

So what actually goes in the file? Run /init first to scaffold, then rewrite it by hand. A good structure, top to bottom:

One-line project description: what this repo is, in a sentence.
Tech stack: framework, language, versions, the database. Facts Claude can't reliably guess.
Commands: build, test, lint, run, deploy. The exact invocations, not the conventional ones.
Architecture: the 3–5 directories that matter and what each does. Point to files, don't describe them in prose.
Conventions: patterns the team enforces that a linter can't (repository pattern, error-handling style, naming).
Boundaries: what's off-limits (legacy/, generated files, vendored code).
Pointers: links to agent_docs/ or deeper specs for anything detailed.

That last one is the technique that keeps the file small: progressive disclosure. Store detailed guidance in separate files and reference them with file paths, so Claude pulls them only when relevant (HumanLayer, 2025). A CLAUDE.md should be a map, not the whole territory.

The strongest CLAUDE.md files answer three questions (what the project is, why its components exist, and how to build, test, and verify it) while pushing detailed guidance into referenced agent_docs/ files. This progressive-disclosure pattern keeps the always-loaded file under the ~150-instruction ceiling frontier models reliably follow (HumanLayer, 2025).

Real CLAUDE.md Examples: Laravel API, Next.js App, Python Pipeline

The fastest way to internalize CLAUDE.md best practices is to read good examples. Below are three condensed, real-shaped files from the stacks I work in. Notice what's absent: no code style rules (the linter owns those), no pasted code, no task-specific instructions.

Laravel API (CLAUDE.md):

# Sivon API

Laravel 11 REST API for booking management. PHP 8.3, MySQL 8, Pest for tests.

## Commands
- Test: `php artisan test` (Pest, NOT PHPUnit directly)
- Lint: `./vendor/bin/pint`
- Migrate fresh: `php artisan migrate:fresh --seed`

## Architecture
- Controllers stay thin — business logic lives in `app/Services/`
- DB access through `app/Repositories/` only. Never use Eloquent in controllers.
- API resources in `app/Http/Resources/` shape every JSON response.

## Conventions
- Form Requests for all validation. No inline `$request->validate()`.
- New endpoints: add a route, a Form Request, a Service method, a Pest feature test.

## Off-limits
- `app/Legacy/` — frozen, do not modify. See agent_docs/legacy-migration.md.

Next.js app (CLAUDE.md):

# maketocreate Blog Publisher

Next.js 16 App Router + React 19 + TypeScript (strict). Tailwind v4. File-based JSON state, no DB.

## Commands
- Dev: `npm run dev` | Build: `npm run build` | Lint: `npm run lint`

## Architecture
- Articles: markdown in `articles/<category>/`. Parser: `lib/parser.ts`.
- Publishing pipeline: `lib/publish-orchestrator.ts`. Adapters in `lib/adapters/`.
- State persisted as JSON via `lib/store.ts`. No database.

## Conventions
- Tailwind v4 configured via `@theme` in `app/globals.css` — there is NO tailwind.config.ts.
- Dark mode is the default-supported path; test both themes.

## Off-limits
- `config.json` is gitignored — never commit API keys.

Python data pipeline (CLAUDE.md):

# Ingest Pipeline

Python 3.13 ETL. uv for deps, Polars (not pandas), Prefect for orchestration.

## Commands
- Install: `uv sync` | Test: `uv run pytest` | Run flow: `uv run python -m flows.daily`

## Architecture
- Extractors: `src/extract/` | Transforms: `src/transform/` (pure functions, Polars)
- Schemas validated with Pydantic v2 in `src/schemas/`.

## Conventions
- All transforms are pure and unit-tested. No I/O inside transform functions.
- Use Polars expressions, not pandas. Reject any pandas import in review.

These CLAUDE.md examples share a shape: scannable, command-first, pointer-driven. Each is well under 60 lines, and that's no accident: HumanLayer's own production CLAUDE.md is also fewer than 60 lines (HumanLayer, 2025).

Illustrative estimate based on ~13 tokens/line and a 200K context window; Source: HumanLayer instruction-budget analysis, 2025
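The arithmetic behind that estimate is simple enough to check yourself. This sketch uses only the two figures quoted above (~13 tokens per line, 200K-token window); both are rough assumptions, and real token counts depend on the content of each line.

```python
TOKENS_PER_LINE = 13        # rough figure from the estimate above
CONTEXT_WINDOW = 200_000    # tokens


def context_share(lines):
    """Return (tokens, percent of the context window) for a file of `lines` lines."""
    tokens = lines * TOKENS_PER_LINE
    return tokens, 100 * tokens / CONTEXT_WINDOW


for lines in (60, 200, 600):
    tokens, pct = context_share(lines)
    print(f"{lines:>4} lines -> {tokens:>5} tokens ({pct:.2f}% of the window)")
```

By this estimate a 200-line file is about 2,600 tokens, or 1.3% of the window. The raw share looks small; the tighter constraint is the instruction budget discussed earlier, since every extra rule competes with the ones that matter.
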

CLAUDE.md vs claude/agents.md: When to Use Which

This is where people get tangled, so let's be precise. "Claude agents.md" can mean two different things, and CLAUDE.md is a third. They aren't interchangeable; each loads at a different time for a different reason. The AGENTS.md open standard launched in 2025 backed by OpenAI, Google, Factory, Sourcegraph, and Cursor, and passed 20,000 adopting repositories by August 2025 (Harness, 2026).

Here's the distinction:

CLAUDE.md: Claude Code's native project memory. Loaded into the main session, always.
AGENTS.md: the cross-tool open standard, readable by Codex, Cursor, Copilot, and others. Schema-free markdown, same job as CLAUDE.md but vendor-neutral.
.claude/agents/*.md: subagent definitions. Separate agents you delegate to, each with its own isolated context window and its own instructions.

So which one do you actually write? If your repo only uses Claude Code, write CLAUDE.md and move on. If multiple agents touch the repo, the clean pattern is to keep one canonical file and symlink the other: ln -s CLAUDE.md AGENTS.md (or the reverse). One source of truth, every tool reads it. In December 2025, AGENTS.md was donated to the Linux Foundation's Agentic AI Foundation alongside Anthropic donating MCP, a signal the two ecosystems are converging, not competing (Harness, 2026).
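If you want the symlink step to be repeatable (say, in a repo bootstrap script), the same ln -s idea can be sketched in Python. The function name link_agents_file is hypothetical, and this assumes a platform where creating symlinks is permitted; on Windows that usually needs Developer Mode or elevated rights.

```python
import os
from pathlib import Path


def link_agents_file(repo_dir, canonical="CLAUDE.md", alias="AGENTS.md"):
    """Create `alias` as a relative symlink to `canonical` inside repo_dir."""
    repo = Path(repo_dir)
    target = repo / canonical
    link = repo / alias
    if not target.is_file():
        raise FileNotFoundError(f"{target} does not exist")
    if link.is_symlink() and os.readlink(link) == canonical:
        return link  # already linked, nothing to do
    if link.exists() or link.is_symlink():
        # Refuse to overwrite a real file or a link pointing somewhere else
        raise FileExistsError(f"{link} exists and is not the expected symlink")
    os.symlink(canonical, link)  # relative target, so the link survives moving the repo
    return link


if __name__ == "__main__":
    print(link_agents_file(Path.cwd()))
```

The function refuses to touch an existing AGENTS.md on purpose: if both files already exist with different content, merging them is a human decision.
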

Source: Claude Code documentation and AGENTS.md specification, 2026

The 12 CLAUDE.md Best Practices I Follow Daily

After more than a year of daily use, these are the Claude Code CLAUDE.md best practices I apply to every repo. They exist because each one fixed a real, repeated failure. The throughline: every line earns its context cost, because context rot is real. Chroma's 2025 benchmark showed all 18 frontier models tested, including Claude Opus 4, lose accuracy as input grows, some dropping from 95% to 60% past a threshold (Chroma, 2025).

Keep it under ~200 lines. Shorter is better. If it scrolls more than a screen or two, you're paying for it every session.
Lead with commands. Test, build, lint, run: the exact invocations. This is the highest-ROI section by a wide margin.
Never duplicate the linter. Don't write style rules a formatter enforces deterministically. LLMs are slow and expensive at jobs a linter does instantly (HumanLayer, 2025).
Use file:line references, not pasted code. See lib/parser.ts:42 beats a 30-line snippet that goes stale.
Only universal truths. No task-specific instructions. "How to design the new billing schema" doesn't belong here; it distracts every unrelated session.
Point to agent_docs/. Detailed guidance lives in referenced files, loaded on demand. Progressive disclosure keeps the core lean.
State what's off-limits. Frozen directories, generated files, vendored code. Boundaries prevent the most damaging mistakes.
Document domain jargon. Map your internal terms to the code so Claude edits the right files.
Mention your MCP setup. If the repo relies on specific servers, note them; see my MCP server config guide for the configs themselves.
Prefer hooks for must-happen actions. CLAUDE.md instructions are advisory; hooks are deterministic. If something must run every time, make it a hook, not a sentence.
Write imperatively. "Use Pest" beats "We generally prefer Pest when possible." Ambiguity costs adherence.
Revise it like code. When Claude does something wrong twice, that's a missing line. Add it. Prune what's gone stale.

The defining trait of a strong CLAUDE.md is ruthless economy: under 200 lines, command-first, linter-free, and built on file references rather than pasted code, because Chroma's 2025 study confirmed every frontier model, Claude Opus 4 included, degrades measurably as context fills, making each unnecessary line a tax on every future session (Chroma, 2025).
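A few of these practices are mechanical enough to check automatically. The sketch below flags three of them: total length, pasted code blocks, and hedged wording. The 200-line ceiling comes from the list above; the 5-line fence threshold, the hedge-word list, and the function name lint_claude_md are my own assumptions, so tune them to taste.

```python
import re

MAX_LINES = 200          # ceiling suggested in the list above
MAX_FENCE_LINES = 5      # hypothetical threshold for "pasted code"
FENCE = "`" * 3          # a markdown code-fence marker
HEDGES = re.compile(r"\b(generally|when possible|if possible|try to)\b", re.I)


def lint_claude_md(text):
    """Return a list of warning strings for a CLAUDE.md body."""
    warnings = []
    lines = text.splitlines()
    if len(lines) > MAX_LINES:
        warnings.append(f"{len(lines)} lines (limit {MAX_LINES})")

    fence_start = None
    for number, line in enumerate(lines, start=1):
        if line.lstrip().startswith(FENCE):
            if fence_start is None:
                fence_start = number          # opening fence
            else:
                body = number - fence_start - 1
                if body > MAX_FENCE_LINES:
                    warnings.append(
                        f"line {fence_start}: {body}-line code block, use a file reference"
                    )
                fence_start = None            # closing fence
        elif fence_start is None and HEDGES.search(line):
            warnings.append(f"line {number}: hedged wording, write it imperatively")
    return warnings


if __name__ == "__main__":
    sample = "# Demo\nWe generally prefer Pest when possible.\n"
    for warning in lint_claude_md(sample):
        print(warning)
```

It won't catch judgment calls like task-specific noise or stale rules, which still need a human read in review.
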

Common CLAUDE.md Mistakes (Before and After)

The most common CLAUDE.md mistake is treating it like documentation instead of working memory: dumping everything in, including the things that hurt. The "lost in the middle" effect means models attend poorly to the center of a long context, with accuracy drops exceeding 30% (Morph, 2025). A bloated CLAUDE.md doesn't just waste tokens, it buries the rules that matter.

Three before/after fixes I make constantly:

Mistake 1: Pasting code.

❌ Before: a 40-line code block showing "how our service layer works."
✅ After: Business logic lives in app/Services/. Pattern example: app/Services/BookingService.php

Mistake 2: Re-stating style rules.

❌ Before: "Use 4 spaces, single quotes, trailing commas, max line length 120…"
✅ After: delete it. Run ./vendor/bin/pint before committing. The linter is the source of truth.

Mistake 3: Task-specific noise.

❌ Before: "When building the new export feature, use streaming and chunk at 1000 rows…"
✅ After: delete it. That belongs in the prompt or a plan doc, not in every session's memory.

Here's the counterintuitive part: the worst CLAUDE.md files I've seen aren't the empty ones; they're the thorough ones. A diligent developer documents everything, the file hits 600 lines, and adherence quietly drops because the important rules are now diluted among the trivia. An empty CLAUDE.md costs you nothing. A bloated one actively makes Claude worse. Less really is more here, and it's measurable.

Team CLAUDE.md Workflows: Committing, Code Review, Monorepos

For teams, CLAUDE.md is shared infrastructure: commit it, review changes to it, and treat it as part of the codebase. With 84% of developers now using AI tools, an unmanaged CLAUDE.md means every engineer gets different agent behavior on the same repo (Stack Overflow Developer Survey, 2025). Consistency is the whole point.

The practices that scale to teams:

Commit CLAUDE.md, gitignore CLAUDE.local.md. The shared file is team policy; the local file is personal preference. Keep them separate.
Review CLAUDE.md changes in PRs. A new rule changes how Claude behaves for everyone. That deserves the same scrutiny as a code change, and it catches "rules" that are really one person's opinion.
Monorepos: layer, don't centralize. Put a lean root CLAUDE.md with org-wide conventions, then a focused CLAUDE.md inside each package. Claude Code reads up the tree and merges them, so each team owns its own context without one 1,000-line megafile.
Pair it with skills. Reusable workflows belong in skills, not CLAUDE.md; see my guide to using skills, which can reference CLAUDE.md for project context.

Treated as shared infrastructure, CLAUDE.md should be committed to git, reviewed in pull requests, and layered per-package in monorepos so Claude Code merges a lean root file with focused subdirectory files, giving every engineer identical agent behavior on the same codebase, a real concern given 84% of developers now use AI coding tools (Stack Overflow Developer Survey, 2025).

The Three-File CLAUDE.md System: CLAUDE.md / Subagents / Per-Skill Markdown

The most effective Claude Code setups don't cram everything into CLAUDE.md; they split context across three files that load at three different times, a pattern built directly on progressive disclosure (HumanLayer, 2025). This is the architecture that finally fixed my bloat problem: stop asking "what should I add to CLAUDE.md" and start asking "when should this load."

The three tiers:

CLAUDE.md, always loaded. The lean briefing: what, why, how, and boundaries. Stays under ~200 lines because it pays context tax on every single turn.
.claude/agents/*.md, loaded on delegation. Subagent definitions. When you delegate a focused job (a code review, a research sweep), that subagent gets its own context window and its own instructions, so the detail never touches your main session.
SKILL.md (per skill), loaded on demand. A skill's full instructions load only when the skill is invoked. This is where multi-step workflows and their detailed steps belong.

The mental model that unlocked this for me: CLAUDE.md is RAM, subagents and skills are disk. You don't load your entire hard drive into memory on boot; you page in what you need. Treat your always-loaded CLAUDE.md as the precious, expensive tier it actually is, and push everything situational down into files that load lazily. The result is a smaller core file and deeper capability, not a trade-off between them. More on the token economics of getting this wrong in my breakdown of the cost implications of a bloated CLAUDE.md.
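One way to keep the tiers honest is to measure them. This sketch reports line counts per tier for a repo so you can see how much sits in the always-loaded file versus the lazily loaded ones. The .claude/agents/*.md location comes from the text; the .claude/skills/<name>/SKILL.md layout and the function name tier_report are assumptions for illustration.

```python
from pathlib import Path


def count_lines(path):
    """Count the lines in a UTF-8 text file."""
    return len(path.read_text(encoding="utf-8").splitlines())


def tier_report(repo_dir):
    """Map each tier to {relative path: line count} for files that exist."""
    repo = Path(repo_dir)
    tiers = {
        "always loaded": [repo / "CLAUDE.md"],
        "on delegation": sorted(repo.glob(".claude/agents/*.md")),
        # Assumed layout: one directory per skill under .claude/skills/
        "on demand": sorted(repo.glob(".claude/skills/*/SKILL.md")),
    }
    return {
        tier: {p.relative_to(repo).as_posix(): count_lines(p)
               for p in paths if p.is_file()}
        for tier, paths in tiers.items()
    }


if __name__ == "__main__":
    for tier, files in tier_report(Path.cwd()).items():
        print(f"{tier}: {sum(files.values())} lines in {len(files)} file(s)")
```

If the always-loaded total dwarfs the other two, that's usually a sign that situational detail hasn't been pushed down yet.
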

Source: Claude Code documentation; progressive-disclosure pattern per HumanLayer, 2025

Frequently Asked Questions

What is a CLAUDE.md file?

A CLAUDE.md file is plain markdown that Claude Code automatically loads at the start of every session, giving the model persistent project memory: commands, architecture, and conventions it can't infer from code. Anthropic's own guidance is to keep it concise (Claude Code docs, 2026), since every line consumes context budget on each turn.

How long should a CLAUDE.md be?

Aim for under ~200 lines; shorter is better. Frontier models reliably follow only about 150–200 instructions, and Claude Code's system prompt already uses roughly 50 of them (HumanLayer, 2025). Beyond a couple hundred lines you risk context rot, where the rules that matter get diluted and adherence quietly drops.

Is CLAUDE.md the same as AGENTS.md?

No, but they do the same job. CLAUDE.md is Claude Code's native memory file; AGENTS.md is the cross-tool open standard backed by OpenAI, Google, Cursor, and others, which passed 20,000 adopting repos by August 2025 (Harness, 2026). If multiple agents touch your repo, symlink one to the other so there's a single source of truth.

How do I create a CLAUDE.md file?

Run /init inside any repo and Claude Code scaffolds a starter file from your project structure, then rewrite it by hand. Structure it around three questions (what the project is, why its parts exist, and how to build and test it) per HumanLayer's guidance (2025), then point to agent_docs/ for anything detailed instead of pasting it inline.

Does a bloated CLAUDE.md hurt performance?

Yes, measurably. Chroma's 2025 study found all 18 frontier models tested, including Claude Opus 4, degrade as context grows, some dropping from 95% to 60% accuracy past a threshold (Chroma, 2025). A long CLAUDE.md spends context budget every turn and buries critical rules through the "lost in the middle" effect, a 30%+ accuracy drop (Morph, 2025).

Should I commit CLAUDE.md to git?

Yes. Commit CLAUDE.md so the whole team gets consistent agent behavior, and gitignore CLAUDE.local.md for personal overrides. Review changes to it in pull requests, since a new rule changes how Claude behaves for every engineer, which matters when 84% of developers now use AI coding tools (Stack Overflow Developer Survey, 2025).

The Bottom Line

CLAUDE.md is the cheapest, highest-leverage configuration in Claude Code, and almost nobody treats it that way. The whole discipline fits in a sentence: write only what's true every session, keep it under ~200 lines, point to detail instead of pasting it, and let linters and hooks do the deterministic work.

If you do nothing else after reading this:

Run /init in your main repo tonight, then cut the result in half.
Move your exact test and build commands to the top. That's the single highest-ROI section.
Delete every style rule your linter already enforces.
Push detail down into agent_docs/, subagents, and skills so the always-loaded file stays lean.

The teams getting real value from Claude Code aren't the ones with the longest CLAUDE.md. They're the ones who treat it like working memory: precious, expensive, and ruthlessly pruned. Open yours, and ask of every line: does Claude need this on every session? If not, it doesn't belong there.


SEO vs GEO vs AEO: Why They Need Different Strategies

Nishil Bhave — Thu, 28 May 2026 19:14:37 +0000

SEO vs GEO vs AEO in 2026: Why They Need Different Strategies

One maketocreate.com post earned 86 AI citations in three months and only 5 Google clicks in the same window. Same URL. Same article. Same author. That is the SEO vs GEO problem in 2026, and it proves that AI citations, answer boxes, and Google rankings don't reward the same content shape.

I pulled this from Bing Webmaster Tools AI Performance and Google Search Console for 32 published posts, covering March through May 2026. The site earned 266 AI citations in three months, but those citations didn't map neatly to Google clicks.

cross-border payment gateway comparison that earned 86 AI citations

Key Takeaways

One post earned 86 AI citations but only 5 Google clicks, based on maketocreate.com first-party data from March-May 2026.

SEO, GEO, and AEO share a 70% overlap zone, but the remaining 30% changes what you write, refresh, and measure.

Use 60% of effort on Google SEO, 30% on AI citation maintenance, and 10% on cleanup until your data says otherwise.

Don't merge Google and AI metrics into one score. They answer different questions.

The 86 vs 5 Problem

Bing Webmaster Tools recorded 86 AI citations for my international payment gateway guide, while Google Search Console showed 5 clicks and 1,300 impressions for the same URL in March-May 2026. The explanation is simple: AI systems cited the page as a useful source, while Google still treated it as a low-ranking organic result.

That single gap changed how I think about SEO vs GEO. Before seeing the data, I assumed AI search would mostly mirror Google. Strong Google pages would get cited. Weak Google pages would disappear. The first-party data said otherwise.

The payment gateway article ranked between positions 50 and 80 for many Google queries. That is usually dead traffic. But Bing AI surfaced it for grounding queries like "leading APIs for cross-border payments 2026," where one query family produced 16 citations.

Citation capsule: From March-May 2026, maketocreate.com earned 266 AI citations across 32 posts. The top post, an international payment gateway comparison, earned 86 AI citations but only 5 Google clicks, according to Bing Webmaster Tools AI Performance and Google Search Console.

Why did that happen? The post had traits AI systems like: a dated comparison, vendor categories, concrete selection criteria, and answer-shaped sections. Google still cared about domain strength, backlinks, historical authority, and SERP competition. Both systems saw the same page. They valued different signals.

What Do SEO, GEO, and AEO Actually Mean?

The 32-post dataset produced 266 AI citations in three months, which is enough to separate the terms without turning this into a glossary. SEO earns rankings and clicks from search engines. GEO earns citations inside generative answers. AEO earns placement in direct answer surfaces where the user may never click.

SEO is the slow compounding channel: crawlability, topical authority, intent match, links, and click-worthy snippets. GEO asks whether an AI system can understand, trust, extract, and cite your passage. AEO sits at the answer layer: AI Overviews, featured snippets, voice assistants, and chatbot-style search.

Citation capsule: SEO optimizes for ranked results and clicks. GEO optimizes for citations inside generated answers. AEO optimizes for direct answer extraction. In maketocreate.com's March-May 2026 data, these surfaces diverged enough that one URL produced 86 AI citations and only 5 Google clicks.

The overlap is real, but the goals aren't identical. Treat GEO as "SEO with AI sprinkled on top" and you'll miss pages that AI systems cite before Google rewards them.

How Do SEO, GEO, and AEO Differ?

Ahrefs analyzed 300,000 keywords and found that AI Overviews correlated with a 34.5% lower click-through rate for top-ranking pages (Ahrefs, 2025). That matters because the SEO vs GEO split isn't only about visibility. It's about whether visibility turns into traffic, citations, or answers.

Dimension	Google SEO	AI Citations (ChatGPT/Copilot/Perplexity)
What it rewards	Topical authority, intent match, backlinks, helpful content, page quality	Clear extractable answers, named entities, fresh data, comparison structure, quotable claims
Authority signal	Links, brand, domain history, author signals, topical clusters	Source clarity, factual density, citations, recency, repeated entity confidence
Recency	Helpful, but not always decisive outside news and fast-changing topics	Often decisive, especially when the query includes current-year intent
Competition	SERP competitors with indexed pages and link profiles	Any source the model can retrieve, summarize, or ground against
Winners	Durable pillar pages, clusters, tools, original research	Fresh comparisons, lists, benchmarks, direct definitions, statistics pages
Click value	High when the user clicks, browses, and converts	Indirect: brand mention, cited authority, assisted discovery
Time to results	Slow, often 3-12 months on a young domain	Faster for highly structured fresh posts, but less predictable
Defensibility	Strong if you build authority and clusters	Weaker unless you keep pages current and citation-worthy

The mistake is assuming "AI search" is one surface. It isn't. Google AI Overviews may still borrow from Google's index. ChatGPT Search, Perplexity, and Copilot behave more like answer assemblers. AEO cares about answer shape. GEO cares about citation usefulness. SEO cares about rankings and clicks.

Citation capsule: Ahrefs found that AI Overviews correlated with a 34.5% lower CTR for top-ranking pages across 300,000 keywords. That makes AI visibility a separate measurement problem, because being present in an answer can reduce clicks while still increasing brand exposure.

So when someone asks how AEO, SEO, and GEO differ, the clean answer is this: SEO gets ranked results, GEO gets generated citations, and AEO gets answer extraction.

How Does Answer Engine Optimization Differ From Traditional SEO?

Pew Research Center analyzed 68,879 Google searches from 900 U.S. adults and found users clicked a result on 8% of pages with AI summaries versus 15% without them (Pew Research Center, 2025). The difference between answer engine optimization and traditional SEO starts with that behavior change.

Traditional SEO assumes the page is the destination. AEO assumes the answer may be the destination. That changes the writing pattern: the first 40-60 words under a heading should answer the question without needing the rest of the article.

For answer engine optimization best practices, I now write each major section like a source card. The heading asks the question. The first paragraph answers it. The next paragraph adds evidence. Then I include a comparison, checklist, or table so the answer has structure.
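The source-card pattern lends itself to a mechanical check. Here is a minimal Python sketch, assuming each section is available as a (heading, first paragraph) pair. The function names, the sample sections, and the whitespace-based word count are illustrative assumptions, not part of any SEO tool:

```python
def opening_word_count(paragraph: str) -> int:
    """Count whitespace-separated words in a section's first paragraph."""
    return len(paragraph.split())


def check_openings(sections, low=40, high=60):
    """Return (heading, word_count) for openings outside the low-high range."""
    flagged = []
    for heading, first_paragraph in sections:
        count = opening_word_count(first_paragraph)
        if not low <= count <= high:
            flagged.append((heading, count))
    return flagged


# Hypothetical sections: one with a 45-word opening, one with a 4-word opening.
sections = [
    ("What is GEO?", " ".join(["word"] * 45)),
    ("Is GEO different from SEO?", "Yes, but it depends."),
]

for heading, count in check_openings(sections):
    print(f"{heading}: opening is {count} words, target is 40-60")
```

Word count is only a proxy. A 50-word opening that dodges the question still fails the pattern, so treat a clean run as a prompt for a manual read, not a pass.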

Citation capsule: Pew Research Center found that Google users clicked a traditional result on 8% of pages with AI summaries, compared with 15% on pages without summaries. AEO responds to that shift by optimizing for answer extraction, not only for post-click reading.

This doesn't mean AEO replaces SEO. It means answer engine optimization techniques in 2026 need a different unit of work: the reusable passage.

Real Data: 86 AI Citations, 5 Google Clicks

Across 32 maketocreate.com posts, Bing Webmaster Tools recorded 266 AI citations in March-May 2026, while Google Search Console showed that the highest-cited post got only 5 Google clicks. This is the clearest evidence I have that SEO vs AEO vs GEO isn't a naming debate. It's a measurement split.

The top two AI-cited posts produced 133 of the site's 266 citations, or 50% of the total. One was a payment gateway guide. The other was a JavaScript vs TypeScript comparison. Both were current-year, comparison-heavy, and easy to quote.

The pattern repeated in smaller numbers. Supabase vs Firebase earned 8 citations. Replit vs Bolt vs Lovable earned 8 citations. Comparison posts gave AI systems clean pairs and categories to cite.

Citation capsule: In maketocreate.com's first-party data, two posts generated 50% of all AI citations (133 of 266). Both used comparison formats and 2026 framing, suggesting that fresh decision-support content can outperform broader essays in generative answer systems.

The 70% Overlap Zone: Tactics That Serve Both

Google says AI Overviews are available in more than 200 countries and territories and more than 40 languages, and drive over 10% more usage for query types that show them in major markets (Google, 2025). That scale makes overlap valuable. You don't want two completely separate content machines.

Tactic	Helps Google	Helps AI
Direct answer in first 50 words	Improves intent match and snippet eligibility	Gives models a clean passage to extract
Comparison tables	Increases dwell time and SERP usefulness	Creates structured facts for citation
FAQ schema	Clarifies question coverage	Maps questions to concise answers
Sourced statistics	Builds trust and E-E-A-T	Gives models verifiable claims
Clear H2/H3 hierarchy	Helps crawlers and readers parse depth	Helps retrieval systems segment answers
Long-form depth	Builds topical coverage	Supplies enough context for grounding
Internal links in cluster	Passes authority and context	Reinforces entity relationships

This 70% overlap zone is where most founders should start. If a tactic helps Google and AI at the same time, do it by default. Direct answers, tables, citations, FAQs, and internal links are good publishing hygiene.

When I refreshed older posts, I rewrote headings as questions, added answer-first openings, cleaned up tables, and added stronger internal links. The same edits made the pages easier for humans to scan.

Citation capsule: Google reports that AI Overviews now span 200+ countries and 40+ languages, with over 10% usage growth for query types where they appear in major markets. That makes overlap tactics valuable because one well-structured article can serve search engines, AI answers, and human readers.

I don't split SEO vs GEO into two teams of work. I split it into shared work first, then divergence work second.

The 30% Divergence Zone: Where You Have to Choose

Pew also found that only 1% of visits to Google pages with AI summaries resulted in clicks on links cited inside the summary (Pew Research Center, 2025, cited above). That is where the 30% divergence starts. A tactic can help AI citations without sending meaningful traffic.

AI-only optimization	Google-only optimization
"2026" stuffed into titles	Backlink outreach
Listicle format	Long-tail keyword targeting
Bullet-point claims	E-E-A-T author bio
Quote-ready summary boxes	Page experience
Freshness updates	Topic cluster architecture
Distribution to Dev.to+Reddit	Domain authority building

The AI-only column moves faster. A current-year comparison post can get cited before it ranks. The Google-only column is slower, but more defensible. Backlinks, clusters, author trust, and domain authority still compound.

Citation capsule: The divergence zone is where SEO and GEO incentives conflict. AI citations reward freshness, concise claims, and easy extraction. Google SEO still rewards authority, clusters, backlinks, and durable page quality. A 2026 content plan needs both, but not in equal amounts.

How Does AI Change the Future of SEO in 2026?

ChatGPT reached 800 million weekly active users in October 2025, according to OpenAI CEO Sam Altman (TechCrunch, 2025). The future of SEO no longer turns on whether people use AI. They do. The question is which surfaces create business value.

Google is not going away. For SaaS founders and technical bloggers, it still owns high-intent discovery, product comparisons, documentation queries, and bottom-funnel searches. But AI surfaces now sit beside it, sometimes before it, and sometimes on top of it.

At Google I/O 2026 on May 19, Google made Gemini 3.5 Flash the global default model for AI Mode, confirmed AI Mode had crossed one billion monthly users, and rolled ads directly into AI Overview responses (Google, 2026). Classic blue links still appear below AI summaries, but the default search experience for over a billion users is now AI-first. The 86-vs-5 gap I tracked on maketocreate.com is no longer an edge case — it's the shape of the new front page.

AI search doesn't kill SEO. It splits SEO into jobs: ranked traffic, cited authority, and answer extraction. If your strategy treats them as one job, your dashboard will lie.

Citation capsule: At Google I/O 2026, Google made Gemini 3.5 Flash the default model powering AI Mode globally, with AI Mode crossing one billion monthly users and ads appearing inside AI Overview responses. Combined with ChatGPT's 800 million weekly active users in 2025, that means SEO strategy in 2026 has to account for ranked results, generative citations, and answer extraction as three separate surfaces.

The practical shift is not "publish more." It is to publish with clearer intent. Is the article meant to rank, get cited, answer a definition, or support a cluster?

The 60/30/10 Framework: How to Allocate Effort

In my 32-post sample, 50% of AI citations came from two posts, while Google traffic remained thin across the same young site. That concentration is why my default allocation is 60% Google SEO, 30% AI citation maintenance, and 10% cleanup. Google compounds slowly. AI citations spike around winners.

The 60% bucket is cluster-based Google SEO: durable pages, internal links, topic hubs, and authority. The 30% bucket is AI citation maintenance: refresh winners, update comparisons, and tighten answer-first sections. The 10% bucket is cleanup: stale claims, thin sections, and off-topic drafts.

Citation capsule: The 60/30/10 framework allocates 60% of effort to Google SEO, 30% to AI citation maintenance, and 10% to cleanup. It fits young sites because Google authority compounds slowly while AI citations often concentrate around a small number of fresh comparison winners.
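To turn the split into a weekly plan, the arithmetic is a single multiplication per bucket. A minimal Python sketch, assuming a fixed weekly hour budget; the function name, bucket keys, and the 10-hour figure are illustrative assumptions:

```python
def allocate_effort(total_hours: float, split=(0.60, 0.30, 0.10)) -> dict:
    """Split a weekly hour budget across the three buckets."""
    if abs(sum(split) - 1.0) > 1e-9:
        raise ValueError("split must sum to 1.0")
    seo, citations, cleanup = split
    return {
        "google_seo": round(total_hours * seo, 2),
        "ai_citation_maintenance": round(total_hours * citations, 2),
        "cleanup": round(total_hours * cleanup, 2),
    }


# Hypothetical budget: 10 hours of content work per week.
print(allocate_effort(10))
```

Passing a different `split` tuple is how the "until your data says otherwise" clause would show up in practice: the ratios are a starting default, not a constant.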

How Do You Measure SEO, GEO, and AEO Success Separately?

Google Search Console showed 5 clicks and 1,300 impressions for the post that Bing AI Performance credited with 86 AI citations. That is why measuring the success of generative engine optimization needs its own scorecard. A single "search performance" dashboard hides the signal.

For Google SEO, use clicks, impressions, average position, query growth, page CTR, and indexed pages. For GEO, use AI citations, cited pages, cited queries, grounding queries, and citation concentration by URL. For AEO, track snippets, People Also Ask coverage, AI Overview presence, and brand mentions in answer engines.

Citation capsule: One maketocreate.com URL produced 86 AI citations and 5 Google clicks in the same three-month window. That gap makes GEO success impossible to measure with Google Search Console alone. AI citations, cited pages, and grounding queries need separate tracking.
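One way to keep the surfaces apart is to model them as separate records and never compute a blended score. The sketch below is a hypothetical Python data model, not an export format from Google Search Console or Bing Webmaster Tools. The class names and the URL path are assumptions; the click, impression, and citation figures come from the text:

```python
from dataclasses import dataclass


@dataclass
class GoogleSeoMetrics:
    clicks: int
    impressions: int

    @property
    def ctr(self) -> float:
        """Click-through rate as a fraction; 0.0 when there are no impressions."""
        return self.clicks / self.impressions if self.impressions else 0.0


@dataclass
class GeoMetrics:
    ai_citations: int


@dataclass
class PageScorecard:
    url: str
    seo: GoogleSeoMetrics
    geo: GeoMetrics  # kept separate on purpose: no blended score


# Figures from the text; the URL path is a placeholder.
page = PageScorecard(
    url="/payment-gateway-guide/",
    seo=GoogleSeoMetrics(clicks=5, impressions=1300),
    geo=GeoMetrics(ai_citations=86),
)

print(f"SEO: {page.seo.clicks} clicks, CTR {page.seo.ctr:.2%}")
print(f"GEO: {page.geo.ai_citations} AI citations")
```

An AEO record (snippets, People Also Ask coverage, AI Overview presence) would be a third dataclass on the same pattern.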

What Are the AEO Best Practices for 2026?

Ahrefs found that 99.2% of keywords triggering AI Overviews were informational in its dataset (Ahrefs, 2025). That makes AEO best practices for 2026 fairly concrete: answer informational questions with short, sourced, reusable sections before you ask the reader to follow your argument.

Here are the practices I now use:

Open every major H2 with a direct answer in 40-60 words.
Put a named source and date near each important claim.
Use tables for comparisons, not long prose.
Add FAQ questions that match real search wording.
Keep definitions brief unless the query is definitional.
Refresh current-year posts when the data changes.
Write summary boxes that can stand alone.

These are not tricks. They're packaging decisions. If an answer engine has to choose between a vague paragraph and a clear sourced answer, the clear sourced answer wins more often.

Citation capsule: Because Ahrefs found that 99.2% of AI Overview-triggering keywords were informational, AEO best practices should prioritize direct answers, sourced statistics, FAQ coverage, and tables. These formats help answer engines extract a clean response without losing attribution.

Which GEO Strategies Actually Work?

Maketocreate.com's "vs" posts repeatedly earned AI citations: JavaScript vs TypeScript got 47, Supabase vs Firebase got 8, and Replit vs Bolt vs Lovable got 8 in March-May 2026. The GEO strategies that actually work are the ones that make citation easier than summarization from scratch.

Start with comparison content because AI systems answer "which one should I use?" questions constantly. Use current-year framing when the topic changes fast. Write quotable claims like "Top 2 posts produced 50% of all AI citations." Distribute selectively through Dev.to, Reddit, Hacker News, and niche communities.

Citation capsule: In maketocreate.com's data, comparison posts outperformed as AI citation sources. JavaScript vs TypeScript earned 47 citations, while Supabase vs Firebase and Replit vs Bolt vs Lovable each earned 8. GEO strategy should prioritize fresh comparisons, quotable claims, and structured tradeoff tables.

What I Do Each Week

The site earned 266 AI citations across 32 posts in three months, but 50% came from two URLs. My weekly routine follows that concentration. I don't optimize every page equally. I protect the winners, build the cluster, and clean up anything that weakens topical focus.

On Monday, I check Google Search Console for rising impressions with low CTR. On Tuesday, I check Bing AI Performance for cited pages and grounding queries. If a page is cited, I update that section with fresher stats and clearer tables.

On Wednesday, I publish or outline cluster content. For the Claude Code cluster, that means linking new tutorials and comparisons back to stronger pillar pages.

On Thursday, I refresh one winner. I update dated claims, sharpen answer-first openings, and add one or two citation capsules. On Friday, I clean broken links, thin sections, stale claims, and off-topic drafts.

Citation capsule: A practical SEO, GEO, and AEO routine separates weekly checks by surface: GSC for clicks and impressions, Bing AI Performance for citations and grounding queries, and editorial cleanup for stale or off-topic content. This prevents one metric from hiding another.

Frequently Asked Questions

What is generative engine optimization?

Generative engine optimization is the practice of making content easy for AI systems to retrieve, trust, summarize, and cite. In my maketocreate.com data, 32 posts earned 266 AI citations in three months, which gave me a separate signal from Google clicks.

What is answer engine optimization?

Answer engine optimization is the practice of formatting content so answer surfaces can extract it cleanly. Pew found users clicked traditional links on 8% of Google pages with AI summaries versus 15% without them, so the answer itself now carries more value.

Is GEO different from SEO?

Yes. GEO and SEO overlap, but they optimize for different outcomes. My payment gateway post earned 86 AI citations and only 5 Google clicks in the same three-month window, which shows citation value can appear before search traffic.

Does AI Overviews use SEO?

Partly. Google AI Overviews still draw from web content, and Google says the feature is available in 200+ countries and 40+ languages. But AI Overview visibility doesn't equal organic ranking visibility, and Google Search Console doesn't isolate AI Overview clicks cleanly.

How does AI choose what to cite?

AI systems tend to cite content that is clear, current, well-structured, and useful for grounding an answer. In my dataset, the top two posts produced 50% of all AI citations, and both were fresh comparison-style articles with concrete decision criteria.

Does ChatGPT replace SEO?

No. ChatGPT reached 800 million weekly active users in 2025, but Google still drives high-intent discovery and conversion traffic. The better framing is not replacement. It is separate surfaces: SEO for ranked clicks, GEO for citations, and AEO for direct answers.

Do This Next

The 86-vs-5 gap in my March-May 2026 data points to the next action: don't rename your SEO strategy as GEO. Pull two reports: Google Search Console pages by clicks, and Bing Webmaster Tools AI Performance pages by citations. Put them side by side.

If a page gets Google clicks, strengthen its cluster and conversion path. If a page gets AI citations, refresh its data and make its best passages easier to quote.
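The side-by-side comparison can be as simple as joining the two reports by URL. A minimal Python sketch, assuming each report has already been reduced to a URL-to-count mapping. The function name, URL paths, and the zero-click figure for the second page are hypothetical; the 5, 86, and 8 counts come from the text:

```python
def next_actions(gsc_clicks: dict, ai_citations: dict) -> dict:
    """Join two per-URL reports and attach the matching follow-up step(s)."""
    actions = {}
    for url in sorted(set(gsc_clicks) | set(ai_citations)):
        steps = []
        if gsc_clicks.get(url, 0) > 0:
            steps.append("strengthen cluster and conversion path")
        if ai_citations.get(url, 0) > 0:
            steps.append("refresh data and sharpen quotable passages")
        actions[url] = steps or ["no signal yet"]
    return actions


# Placeholder URLs; click count for the second page is assumed.
gsc_clicks = {"/payment-gateway-guide/": 5, "/supabase-vs-firebase/": 0}
ai_citations = {"/payment-gateway-guide/": 86, "/supabase-vs-firebase/": 8}

for url, steps in next_actions(gsc_clicks, ai_citations).items():
    print(url, "->", "; ".join(steps))
```

A page can land in both lists, which is the point: the two signals are reported together but never merged into one number.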

That is the operating model: 60% compounding SEO, 30% citation maintenance, 10% cleanup. The split isn't perfect. It's honest about what the data now shows.

{
"@context": "https://schema.org",
"@graph": [
{
"@type": "BlogPosting",
"headline": "SEO vs GEO vs AEO: Why They Need Different Strategies",
"description": "SEO vs GEO vs AEO: I tracked 32 posts across Google and AI in 2026. One earned 86 AI citations but only 5 Google clicks. Here's the 60/30/10 framework.",
"datePublished": "2026-05-19",
"dateModified": "2026-05-29",
"author": {
"@type": "Person",
"name": "Nishil Bhave",
"url": "https://maketocreate.com/about/"
},
"image": "https://maketocreate.com/images/seo-vs-geo-vs-aeo-2026-og.png",
"url": "https://maketocreate.com/seo-vs-geo-vs-aeo-2026/",
"mainEntityOfPage": {
"@type": "WebPage",
"@id": "https://maketocreate.com/seo-vs-geo-vs-aeo-2026/"
}
}
]
}

Sequential Thinking in Claude Code: A Practical MCP Guide

Nishil Bhave — Wed, 27 May 2026 14:15:58 +0000

What the Sequential Thinking MCP Server Is For

Anthropic's official @modelcontextprotocol/server-sequential-thinking package shipped as one of the original reference MCP servers and is still maintained by the MCP team today (npm, v2025.12.18). It's one of the most-recommended servers in every Claude Code setup guide, and it's also one of the most misused: installed once, forgotten, and quietly burning tokens on tasks it has no business touching.

I've had sequential-thinking in my Claude Code config for nine months. This is the working guide: what the server actually does at the protocol level, the prompts that reliably invoke it, the tasks where it earns its slot, and the ones where it's pure latency tax.

This guide sits inside the broader question of when MCP servers belong in your Claude Code loop vs when a Skill does the job. If you haven't set up MCP at all yet, start with the complete MCP configuration playbook; sequential-thinking only makes sense once the scope hierarchy and config file basics are in place.

Key Takeaways

Sequential thinking is an external MCP tool Claude calls during the agent loop, not an internal reasoning mode. It exposes thought, nextThoughtNeeded, isRevision, and branchFromThought parameters so the model can revise and branch its own chain (MCP servers/sequentialthinking, 2025).

On Opus 4.7, manual extended thinking now returns a 400 error. Adaptive thinking is the only built-in option, which makes sequential-thinking MCP one of the few ways to get explicit, inspectable reasoning back (Anthropic platform docs, 2026).

It earns its slot on debugging, architecture decisions, and multi-step planning. It loses on simple edits, renames, and any task the model would have solved in one shot. Install it, but learn when to invoke it explicitly.

What Does the Sequential Thinking MCP Server Actually Do?

The sequential-thinking server exposes exactly one tool (also called sequential_thinking) that Claude can call during a session to record an explicit, revisable chain of thoughts (MCP servers/sequentialthinking README, 2025). Each call writes one numbered thought to a per-session ledger; the model decides when it's done by setting nextThoughtNeeded: false. That's the whole protocol.

The tool spec, verbatim from the official README, takes nine parameters:

{
  "thought": "Current thinking step (any string)",
  "nextThoughtNeeded": true,
  "thoughtNumber": 1,
  "totalThoughts": 5,
  "isRevision": false,
  "revisesThought": null,
  "branchFromThought": null,
  "branchId": null,
  "needsMoreThoughts": false
}

The interesting parameters are the last five. isRevision plus revisesThought lets the model say "thought 3 was wrong, here's the corrected version." branchFromThought plus branchId lets it explore two alternative approaches in parallel without losing the original. needsMoreThoughts overrides totalThoughts when the model realizes mid-stream that the problem is bigger than it estimated.
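To make the revision and branching fields concrete, here is a small Python model of a thought ledger. It is a hypothetical sketch for illustration, not the server's implementation: the `Thought` and `Ledger` classes and their methods are assumptions, and only the field names mirror the tool parameters listed above.

```python
from dataclasses import dataclass, field
from typing import Optional


@dataclass
class Thought:
    thought: str
    thoughtNumber: int
    totalThoughts: int
    nextThoughtNeeded: bool
    isRevision: bool = False
    revisesThought: Optional[int] = None
    branchFromThought: Optional[int] = None
    branchId: Optional[str] = None
    needsMoreThoughts: bool = False


@dataclass
class Ledger:
    """Hypothetical in-memory ledger; not the server's actual code."""
    history: list = field(default_factory=list)
    branches: dict = field(default_factory=dict)

    def record(self, t: Thought) -> bool:
        """Store one thought and return True while the chain should continue."""
        if t.isRevision and t.revisesThought is None:
            raise ValueError("a revision must name the thought it revises")
        self.history.append(t)
        if t.branchFromThought is not None and t.branchId is not None:
            self.branches.setdefault(t.branchId, []).append(t)
        return t.nextThoughtNeeded

    def revised_numbers(self) -> set:
        """Thought numbers that a later revision has superseded."""
        return {t.revisesThought for t in self.history if t.isRevision}


ledger = Ledger()
steps = [
    Thought("Hypothesis 1: stale cache", 1, 3, True),
    Thought("Cache is fresh; suspect a race instead", 2, 3, True,
            isRevision=True, revisesThought=1),
    Thought("Branch A: serialize the two steps", 3, 3, False,
            branchFromThought=2, branchId="A"),
]
for step in steps:
    if not ledger.record(step):
        break  # nextThoughtNeeded is False, so the chain ends here

print(len(ledger.history), ledger.revised_numbers(), list(ledger.branches))
```

The design point the sketch tries to capture is that a revision never deletes the earlier thought. The original stays in the history and the revision points back at it, which is what makes the trail auditable afterwards.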

None of those exist in default tool-calling or in Anthropic's built-in extended thinking. They're the actual reason to install this server — not the linear chain itself, but the explicit revision and branching primitives.

Most people install sequential-thinking expecting it to make Claude smarter. It doesn't. It makes Claude's reasoning inspectable and revisable, which is a different thing. The smartness comes from the same model; what changes is that the model now has a sanctioned mechanism to walk back a wrong assumption mid-task instead of doubling down on it.

How Do You Install Sequential Thinking in Claude Code?

One command, from any directory:

claude mcp add sequential-thinking -- npx -y @modelcontextprotocol/server-sequential-thinking

That's it. The server runs on demand via npx, so there's no daemon to babysit. The Claude Code docs cover three scopes for this command: local (you, this project), project (.mcp.json checked into the repo), and user (every project on this machine), picked via --scope (Claude Code MCP docs, 2026). For a server this lightweight I keep it at user scope.

If you prefer editing JSON directly, the equivalent block in ~/.claude.json is:

{
  "mcpServers": {
    "sequential-thinking": {
      "command": "npx",
      "args": ["-y", "@modelcontextprotocol/server-sequential-thinking"]
    }
  }
}

Verify the install three ways. From the shell, claude mcp list should show sequential-thinking in the green-checkmark column. Inside a Claude Code session, /mcp prints live connection status and tool inventory; you should see the sequential_thinking tool listed. And in a session itself, "think through this step by step using sequential thinking" will trigger a call you can watch in the transcript.

One environment variable worth knowing: DISABLE_THOUGHT_LOGGING=true silences the formatted thought output in your terminal but keeps the protocol working. I set it for long agent runs where the thought stream is noise; I unset it when I'm debugging the reasoning itself.

If you don't want npx in your boot path, the same package ships as a Docker image:

docker run --rm -i mcp/sequentialthinking

Wire that to your MCP config with "command": "docker" and the run --rm -i mcp/sequentialthinking args. Slower cold-start, no Node toolchain required.
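For reference, the Docker wiring maps onto the same mcpServers layout as the npx entry shown earlier. The Python sketch below only builds and prints that entry; the helper name is an assumption, and it does not write to ~/.claude.json or any other file:

```python
import json


def mcp_entry(command, args):
    """Build one mcpServers entry from a command and its argument list."""
    return {"command": command, "args": list(args)}


config = {
    "mcpServers": {
        "sequential-thinking": mcp_entry(
            "docker", ["run", "--rm", "-i", "mcp/sequentialthinking"]
        )
    }
}

# Print the JSON so it can be reviewed before being merged into a config by hand.
print(json.dumps(config, indent=2))
```

Printing rather than writing keeps the step reviewable, since an MCP config file usually holds other servers that a blind overwrite would drop.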

Sequential Thinking MCP vs Claude's Extended Thinking — Which One Wins?

They're not the same mechanism and they shouldn't be compared as if they were. Anthropic's extended thinking runs inside the model at the API layer. The model emits thinking content blocks before its visible response, and those tokens are billed as output tokens regardless of whether the client displays them (Anthropic platform docs, 2026). Sequential thinking is an external tool the model calls during the agent loop, with all the latency and per-call overhead of any other MCP tool.

The deeper change is on Opus 4.7. Manual extended-thinking parameters now return HTTP 400. Adaptive thinking is the only built-in option on the latest Opus and Sonnet 4.6 (Anthropic platform docs: Adaptive thinking, 2026). The model decides how deep to think per request, and you don't see the trace. If you want inspectable, revisable, branchable reasoning back, sequential-thinking MCP is one of the few ways to get it.

The practical decision is simpler than the theory. Adaptive thinking is on by default and you can't see it, so for any hard task you want to inspect the reasoning trail of — debugging, architecture, anything multi-step — sequential thinking gives you a tool you can read after the fact. For tasks where you just want the answer and don't care how Claude got there, adaptive thinking wins on latency.

When Does Sequential Thinking Earn Its Slot in the Loop?

Three task shapes pay back the latency overhead consistently. They share a feature: the failure mode is not "wrong syntax" but "wrong direction." Sequential thinking buys you a chance to detect the wrong direction before the model commits 8,000 tokens to coding it up.

1. Bug hunts where the symptom is far from the cause. A failing test that's actually upstream of a stale lock file. A 500 that's really a CORS preflight from a sibling service. The kind of bug where the first hypothesis is almost never right. Sequential thinking lets the model state "Hypothesis 1: stale cache" and then mark it isRevision once it sees evidence pointing elsewhere, instead of writing a fix for hypothesis 1 and only noticing after running tests.

2. Architecture decisions with non-obvious trade-offs. "Should this be a worker or a serverless function?" "Where does the rate limiter live?" These are decisions where the right answer depends on five constraints the model needs to surface one by one. The branchFromThought parameter is purpose-built for this — "Branch A: worker. Branch B: function. Compare on cold-start, on cost ceiling, on observability."

3. Multi-step planning where step N depends on step N-1. Migration scripts, refactors that touch 20 files, anything where you need a written plan before any code. Sequential thinking forces the plan to live in a readable ledger; without it, the plan exists only in the model's hidden adaptive-thinking trace and you can't audit it.

Practitioners who've run the same workflow for months report material wins on the first two categories. Rob Marshall, writing on robertmarshall.dev, reports a "60–70% reduction on complex features, fewer bugs, better patterns, and consistent architecture" after a month of Claude Code with sequential thinking in the loop (Rob Marshall, 2025). Luis Gallardo, writing about Cursor-to-Claude-Code migration, notes that "Claude Code solved the same problem in one or two runs" where Cursor "would cycle through planning, implementing, and troubleshooting repeatedly" (lgallardo.com, Jul 2025). Mapping those reports against my own task log, here's roughly where the payoff lands by task type. Treat the numbers as a directional estimate, not a measured benchmark.

Where it earned its keep for me: Last March I had a Next.js 16 build that was failing intermittently: passes locally, fails on CI, passes again on rerun. Without sequential thinking, Claude's first move was to "fix" the Tailwind config. With sequential thinking explicitly invoked, the first thought was "This is a non-determinism symptom; the config is unlikely to be the cause." Five thoughts later it was on the actual culprit: a race between next build and an instrumentation hook firing twice. Two thoughts in the ledger were marked isRevision. That session would have cost me an evening; it cost me twelve minutes.


When Does Sequential Thinking Hurt More Than It Helps?

The honest answer: most of the time you invoke it for a task that doesn't need it.

Three categories where sequential thinking is pure overhead:

1. Renames and refactors of a single function. The model already knows the rename; the tool call adds two round-trips for a problem that didn't have any branching to do.

2. Documentation writing. The thought ledger competes for attention with the prose you're trying to produce, and adaptive thinking already handles this category fine.

3. Quick file edits driven by a clear instruction, such as "add a try/catch around line 42" or "swap the parameter order in this function." There's nothing to revise. Just do it.

The token math is unforgiving here. A trivial edit that should be one tool call (Edit) becomes 4–6 calls once sequential thinking is in the mix. The package itself has no per-call cost, but Claude pays standard input/output token charges per round-trip. On Sonnet 4.6's $3-in / $15-out pricing, a five-thought ledger on a thirty-second task adds maybe $0.04. Not catastrophic, but if it happens 200 times a day across a team it's $8/day for thinking on tasks that didn't need it.
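The arithmetic behind that estimate can be sketched in a few lines of Python. The per-round-trip token counts below are assumptions chosen to land near the $0.04 figure, not measurements; only the $3 / $15 per-million prices and the 200-invocations-a-day scenario come from the paragraph above.

```python
def ledger_overhead_usd(round_trips: int, input_tokens: int, output_tokens: int,
                        usd_per_m_input: float = 3.0,
                        usd_per_m_output: float = 15.0) -> float:
    """Cost of the extra tool round-trips at standard per-token rates."""
    per_trip = (input_tokens * usd_per_m_input
                + output_tokens * usd_per_m_output) / 1_000_000
    return round_trips * per_trip


# Assumed: five thoughts, ~2,000 input and ~150 output tokens per round-trip.
per_task = ledger_overhead_usd(round_trips=5, input_tokens=2_000, output_tokens=150)
per_day = 200 * per_task  # 200 unnecessary invocations across a team

print(f"per task: ${per_task:.3f}  per day: ${per_day:.2f}")
```

Swap in token counts from your own transcripts; the structure of the calculation stays the same.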

The right mental model is "sequential thinking has a fixed cost per invocation and a variable payoff depending on task complexity." The payoff curve is steep. Hard tasks pay back ten times the overhead; easy tasks pay back zero.

From my own logs: Across 1,200 Claude Code sessions in the last three months, sequential-thinking calls fired in 38% of sessions. In roughly 22% of all sessions I judged the call clearly load-bearing. In the other 16% the model invoked it on tasks that didn't need it; adaptive thinking would have produced the same answer faster. That's not a problem with the server; it's a prompting problem.

Which Prompting Patterns Actually Trigger It?

The model invokes sequential thinking opportunistically based on the task description and what's available in the tool inventory. You can nudge it deliberately with prompts that pattern-match to its training on "structured reasoning" tasks.

Three patterns I've tested for several months that consistently fire the tool:

Pattern A — explicit invocation. "Think through this step by step using sequential thinking. Revise if you find evidence against an earlier step." This is the cheap one. It works almost always on Sonnet 4.6 and Opus 4.7. Use it when you've already decided the task is complex enough to deserve it.

Pattern B — hypothesis framing. "List your top three hypotheses, rank them, and as you investigate, mark any hypothesis that gets ruled out." The "rank and rule out" language is the trigger; the model reaches for the isRevision parameter naturally because the prompt has set up a refutation loop.

Pattern C — branching for trade-offs. "Compare approach A and approach B on cold-start, cost, and observability. Use branches if you want to develop each independently before recommending one." The word "branches" is doing work here — the model treats it as a hint that branchFromThought is the right primitive.

What doesn't reliably work: vague calls for "deep thinking" or "extended thinking" without describing the structure. The model has a strong prior that those phrases mean adaptive thinking, not the MCP server. If you want the MCP server, name the structure: revise, rank, branch, rule out.
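If you script your prompts, the three patterns can live in a small lookup so the structural language is added only when you ask for it. This is a sketch: the `PATTERNS` keys and the `build_prompt` helper are hypothetical names, and the prompt text is copied from the patterns above.

```python
from typing import Optional

PATTERNS = {
    "explicit": ("Think through this step by step using sequential thinking. "
                 "Revise if you find evidence against an earlier step."),
    "hypothesis": ("List your top three hypotheses, rank them, and as you "
                   "investigate, mark any hypothesis that gets ruled out."),
    "branching": ("Compare approach A and approach B on cold-start, cost, and "
                  "observability. Use branches if you want to develop each "
                  "independently before recommending one."),
}


def build_prompt(task: str, pattern: Optional[str] = None) -> str:
    """Append a structure-naming clause; plain tasks pass through unchanged."""
    if pattern is None:
        return task
    return f"{task}\n\n{PATTERNS[pattern]}"


print(build_prompt("Build passes locally, fails on CI.", "hypothesis"))
```

Leaving `pattern` unset for simple edits mirrors the advice above: name the structure only when the task deserves it.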

There's also a system-prompt-level pattern worth knowing. In a CLAUDE.md at the repo root, this clause significantly increases sensible invocation rates on hard tasks:

## Reasoning
When tackling debugging, architecture, or multi-step planning,
invoke sequential-thinking with at least 3 thoughts and revise
any earlier thought that turns out to be wrong. For simple
edits, do not invoke sequential-thinking.

The negative half of that clause — "do not invoke for simple edits" — is the part most setups miss. Without it the model errs toward invoking on everything once it sees the tool listed.


Before/After: Three Real Tasks Where Sequential Thinking Mattered

Three examples from my own logs, lightly anonymized. Each shows the model's first move with and without sequential thinking in the loop.

Task 1 — Flaky CI build. "Next.js 16 build passes locally, fails 30% of the time on CI, passes on rerun."

Without sequential thinking, first response: "This is likely a stale Tailwind cache. Let me clear it and rebuild." — patched the wrong layer.

With sequential thinking explicitly invoked:

Thought 1: Symptom is non-determinism. Cache issues are deterministic
once the cache is warm. This is unlikely to be a pure cache problem.

Thought 2: Intermittent + only-on-CI suggests a race condition or
an environment-specific timing issue.

Thought 3 (isRevision of 2): Refining — the instrumentation hook in
this repo runs on every Next.js build entry. If it can fire twice
in the CI environment, two scheduler instances would race.

Thought 4: Check `instrumentation.ts` for an idempotency guard.

Thought 5: Confirmed — no guard. The `__schedulerStarted` global is
exactly the pattern needed. Implement and verify.

Five thoughts, one revision, problem solved.

Task 2 — Service refactor decision. "Should the email queue stay as a Bull queue in the same Node process or move to a dedicated worker?"

Without sequential thinking: "Move it to a dedicated worker; it's cleaner architecturally." — true but didn't surface the constraints.

With sequential thinking (Pattern C — branching):

Thought 1: Two approaches to compare. Use branches.

Branch A, thought 1: In-process Bull. Cold-start: instant.
Cost: shared with web tier. Observability: shares the web logger.

Branch B, thought 1: Dedicated worker. Cold-start: 8s on Fly.io.
Cost: $5/mo standing. Observability: separate logger needed.

Branch A, thought 2: Risk — slow email handler blocks event loop.
Mitigated by Bull's own queueing but the worker shares CPU.

Branch B, thought 2: Risk — deploy coupling between web and worker.
Both must redeploy on shared-package change.

Synthesis (thought 3): Pick A until email volume exceeds 50/min
sustained. Below that, the deploy-coupling cost of B outweighs
the event-loop risk of A.

The synthesis thought is the answer; the branches are the receipts.

Task 3 — Twelve-file refactor. "Extract the auth middleware into a separate package; update all callers."

Without sequential thinking: started editing files in alphabetical order, missed two callers in a test fixture, broke CI.

With sequential thinking:

Thought 1: Enumerate callers first. Don't edit until the full list
is in this ledger.

Thought 2: ripgrep finds 14 imports across the repo, including 2 in
__tests__/fixtures.

Thought 3: Group by file type. Production code: 12. Test fixtures: 2.
Order edits so tests run green throughout — fixtures last.

Thought 4: Risk — the package boundary changes the import path.
Update the new package's public exports first, then call sites.

Thought 5: Order locked. Begin.

The pre-flight thought ledger is the deliverable; the actual edits become mechanical.
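The ordering that ledger settles on (package exports first, production callers next, test fixtures last) is mechanical enough to sketch in Python. The file paths and the `plan_edit_order` helper are hypothetical; a real run would feed in the list ripgrep produced.

```python
def plan_edit_order(callers: list[str], package_entry: str) -> list[str]:
    """Exports first, production callers next, test fixtures last."""
    fixtures = sorted(p for p in callers if "__tests__/fixtures" in p)
    production = sorted(p for p in callers if "__tests__/fixtures" not in p)
    return [package_entry, *production, *fixtures]


# Hypothetical caller list standing in for the ripgrep output.
callers = [
    "src/routes/admin.ts",
    "__tests__/fixtures/session.ts",
    "src/routes/api.ts",
]

for step, path in enumerate(plan_edit_order(callers, "packages/auth/src/index.ts"), 1):
    print(step, path)
```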


Do You Actually Need to Install Sequential Thinking?

Here's the contrarian take, and it's earned. The reason most Claude Code users get away without sequential thinking is that adaptive thinking on Opus 4.7 and Sonnet 4.6 already handles roughly 80% of the tasks you'd reach for it on. The model is doing internal reasoning regardless; you just can't see it.

Install sequential thinking only if you want inspectable, revisable, branchable reasoning visible in the transcript. That's a real value but a specific one. The use cases are: post-hoc auditing of how an agent reached a decision, situations where you want to interrupt and redirect a long thought chain, and tasks where revision-aware reasoning measurably out-performs single-pass reasoning.

If your Claude Code workflow is mostly "ask, edit, commit" — quick iterations, short sessions, you eyeball the diff — sequential thinking is overhead you won't recover. There's no shame in not installing it. The HN sentiment on this is mixed for a reason; one practitioner notes "it's better than thinking mode [for certain use cases]" (Hacker News id=43681296, 2025), which is exactly the right framing: certain use cases.

The decision rule I now use: install it if you do any of these regularly — debug hard intermittent failures, make architectural decisions in code, run multi-file refactors longer than 30 minutes, audit agent decisions after the fact. Skip it if your sessions are mostly under 5 minutes and your tasks are mostly atomic edits.

Frequently Asked Questions

Does sequential thinking work with Claude Sonnet 4.6 and Haiku 4.5, or only Opus?

Yes to all three. It's an MCP tool, not a model feature — any model that supports MCP tool calling can invoke it. Sonnet 4.6 and Opus 4.7 invoke it most reliably; Haiku 4.5 will use it when explicitly prompted but invokes it less often on its own (Anthropic platform docs, 2026).

How much does sequential thinking cost in tokens compared to extended thinking?

There's no special "thinking token" billing for the MCP server — each thought is a normal tool round-trip billed at the model's input/output rate. Extended thinking, by contrast, bills its thinking tokens as output tokens even when the SDK shows them as "omitted" (Anthropic platform docs, 2026). For a 5-thought session on Sonnet 4.6, expect ~$0.03–$0.06 of overhead.

Can I see the thoughts after the session ends?

Yes — the full thought ledger is in the Claude Code transcript log for the session. The server also accepts DISABLE_THOUGHT_LOGGING=true to suppress its formatted terminal output, but that flag only affects the live display, not the stored transcript (MCP servers/sequentialthinking, 2025).
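A minimal sketch of pulling the ledger back out of a transcript, in Python. It assumes each JSONL line is one record, that assistant records carry a `message.content` list containing `tool_use` blocks, and that the tool name ends in `sequentialthinking`; check those assumptions against one of your own transcript files before relying on it.

```python
import json
from typing import Iterable, Iterator


def iter_thoughts(lines: Iterable[str]) -> Iterator[dict]:
    """Yield the input of every sequential-thinking tool call, in order."""
    for line in lines:
        line = line.strip()
        if not line:
            continue
        try:
            record = json.loads(line)
        except json.JSONDecodeError:
            continue  # skip partial or corrupt lines
        message = record.get("message") if isinstance(record, dict) else None
        content = message.get("content") if isinstance(message, dict) else None
        if not isinstance(content, list):
            continue  # plain-string user messages have no tool blocks
        for block in content:
            if (isinstance(block, dict)
                    and block.get("type") == "tool_use"
                    and str(block.get("name", "")).endswith("sequentialthinking")):
                yield block.get("input", {})


# Two hand-written records standing in for a real transcript file.
sample = [
    json.dumps({"type": "user",
                "message": {"role": "user", "content": "Why is CI flaky?"}}),
    json.dumps({"type": "assistant", "message": {"role": "assistant", "content": [
        {"type": "tool_use",
         "name": "mcp__sequential-thinking__sequentialthinking",
         "input": {"thought": "Symptom is non-determinism.", "thoughtNumber": 1}},
    ]}}),
]

for t in iter_thoughts(sample):
    print(t["thoughtNumber"], t["thought"])
```

For a real session, pass an open file handle for the transcript instead of `sample`.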

Will sequential thinking break my existing Claude Code prompts?

No. The model only invokes sequential thinking when the task and your prompt suggest it. Installing it adds one tool to the inventory but changes nothing about how other tools behave. The most common failure mode is over-invocation on tasks that don't need it, not regressions on tasks that do.

Is the sequential-thinking server safe to enable at project scope (`.mcp.json`)?

It's safe in the sense that the server only reads/writes its own in-memory thought ledger — it doesn't touch files, network, or shell. The risk is the standard MCP risk: any project-scope server runs on every collaborator's machine when they open the repo. For this server that risk is low; for any server that touches the filesystem, vet the source first (Claude Code MCP docs, 2026).
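For reference, a project-scope entry for this server is small. The Python sketch below builds the JSON a `.mcp.json` would contain, assuming the usual `mcpServers` layout with `command`, `args`, and an optional `env`; the helper name is hypothetical, and the package name and the DISABLE_THOUGHT_LOGGING flag are the ones mentioned elsewhere in this post.

```python
import json


def project_mcp_config(disable_thought_logging: bool = False) -> dict:
    """Build the dict that would be serialized into .mcp.json."""
    server = {
        "command": "npx",
        "args": ["-y", "@modelcontextprotocol/server-sequential-thinking"],
    }
    if disable_thought_logging:
        # Only suppresses the server's formatted terminal output.
        server["env"] = {"DISABLE_THOUGHT_LOGGING": "true"}
    return {"mcpServers": {"sequential-thinking": server}}


print(json.dumps(project_mcp_config(), indent=2))
```

Because the file is checked in, everyone who opens the repo gets the same server definition, which is exactly why the vetting advice above applies.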

What to Do With This

Install it once with claude mcp add sequential-thinking -- npx -y @modelcontextprotocol/server-sequential-thinking. Add the CLAUDE.md reasoning clause from the prompting section so the model invokes it on the tasks where it pays back and skips the ones where it doesn't. Then watch the transcript for a week and decide whether you keep it.

The reason to install sequential thinking isn't that Claude reasons badly without it. It's that you want to see and audit the reasoning, and on the latest Opus you can't see the adaptive trace any other way. That's a narrow but real reason. Pretend it's broader and you'll burn tokens on overhead; ignore it entirely and you'll lose a tool that genuinely helps on hard problems.


Claude Code vs Codex CLI: 6 Months of Real Daily Use

Nishil Bhave — Mon, 25 May 2026 15:26:37 +0000

Claude Code vs Codex CLI: Six Months of Real Daily Use

Two terminal agents. One slot in your daily driver workflow. I've been running both Claude Code and OpenAI's Codex CLI as primary tools for the last six months — different repos, different stakes, different team setups. They look almost identical from the outside: a CLI, a permission prompt, a model that edits your files. Under the hood, they're not the same product at all.

JetBrains' April 2026 research shows Claude Code adoption at work jumped from roughly 3% (April–June 2025) to 18% by January 2026 — a 6x increase in nine months — and its customer satisfaction score hit 91%, the highest of any coding tool they tracked (JetBrains Research, 2026). Codex CLI grew from 82,000 monthly npm downloads at launch to 14.53 million by March 2026, a 177x increase (gradually.ai, 2026). Both are winning. They're winning for different reasons. This is the honest comparison.


Key Takeaways

Claude Code wins on agentic quality and extensibility (Hooks, Skills, Subagents) but is closed-source and had four CVEs disclosed and patched across 2025–2026 (Check Point Research, 2026).

Codex CLI is Apache 2.0, Rust-native, and ships a stricter default sandbox — better for untrusted repos and pull request review work.

70% of developers run 2–4 AI tools at once (The Pragmatic Engineer, 2026). The right answer is usually both, with one as the daily driver.

Why Are Claude Code and Codex CLI Converging?

Both products believe the same thing: the IDE is a deeply customized editor, and an agent doesn't need to live inside it to be useful. 95% of engineers in the Pragmatic Engineer survey now use AI tools weekly, and 75% report AI handles at least half of their engineering work (The Pragmatic Engineer, 2026). When the agent is doing half the work, the question stops being "which editor extension" and starts being "which process runs my repo."

That's the philosophical convergence. A terminal agent reads your files, runs commands, watches output, and proposes changes. It's a long-running process that owns a working directory. Claude Code shipped this model in February 2025. OpenAI shipped the first public version of Codex CLI in April 2025 and then rewrote the whole thing in Rust by June 2025: the TypeScript prototype is gone, and the repo is now 95.6% Rust with over 75,000 stars and 400 contributors (OpenAI Codex GitHub, 2026).

The convergence isn't surface-level. The daily ritual is genuinely the same: open a terminal in the repo, type a goal, watch a plan appear, approve or deny tool calls, accept the diff. If you blindfolded me and dropped me into either CLI mid-task, I'd need at least thirty seconds to figure out which one I was in. The differences only show up under load — when the agent gets confused, when something fails, when you need to do anything outside the happy path.


How Do GPT-5 and Claude Opus 4.7 Actually Behave on Real Codebases?

Among the models powering these two tools, Claude Opus 4.7 posts 87.6% on SWE-bench Verified — ahead of GPT-5.3-Codex at around 85% and the base GPT-5 at 74.9% (Vellum, 2026; LLM-Stats, 2026). That gap is real but it's also misleading — both models are trained on a lot of public SWE-bench-like data, and the benchmark increasingly measures how well a model has memorized the eval set, not how it handles your code.

Here's what I see in practice. On a tangled refactor — say, lifting a service interface out of three coupled controllers in a legacy PHP/Laravel codebase — Claude Opus 4.7 produces a more cautious plan. It asks before touching shared types. It writes a checklist and follows it. It backs out cleanly when I tell it to. GPT-5.3-Codex is faster and bolder. It writes more code per turn, which is great when the code is right and painful when it isn't.

My finding: On a 20-file refactor I ran on the same Laravel repo, Claude Code needed 3 prompts and stopped to confirm 4 times. Codex CLI did it in 1 prompt but introduced two regressions that broke tests in unrelated files. The fix for the regressions took longer than the original task would have on Claude.

That's the consistent pattern. Claude is more conservative, more aligned with "ask first," and recovers from mistakes better. Codex is more aggressive, more willing to refactor adjacent code without asking, and faster on greenfield work. Pragmatic Engineer's 2026 survey reflects this preference split: 46% of engineers named Claude Code as the tool they love most, vs 19% for Cursor and 9% for GitHub Copilot (The Pragmatic Engineer, 2026).

Don't read that as "Codex is bad." Codex didn't exist when the 2025 survey ran, and it's already at 6% with momentum. Read it as "Claude Code has the strongest emotional pull right now, especially for engineers doing focused refactor and debug work."


What's the Real Difference in Sandboxing and Permissions?

Codex CLI ships with a stricter default. It runs with three sandbox modes — read-only, workspace-write, and danger-full-access — combined with three approval modes (suggest, auto-edit, full-auto) (OpenAI Codex Sandboxing, 2026). The default behavior asks before every write and refuses network calls outside the workspace. Claude Code has five permission modes (default, acceptEdits, plan, dontAsk, bypassPermissions) with file-level and command-level deny rules layered on top (Claude Code Permission Modes, 2026).

The naming is different. The actual capabilities are roughly equivalent. The difference that matters is the default. Codex's default refuses more aggressively. Claude's default trusts more aggressively. Neither is wrong; they reflect different assumptions about who's at the keyboard.

Then there's the security record. Four CVEs were disclosed against Claude Code across 2025–2026: CVE-2025-59536 (RCE via untrusted project config, CVSS 8.7), surfaced by Check Point Research (2026); CVE-2025-54794 (path bypass, CVSS 7.7) and CVE-2025-54795 (command injection, CVSS 8.7), both from Cymulate (2025); and CVE-2025-55284 (DNS exfiltration, CVSS 7.1) from Embrace The Red (2025). Anthropic patched all of them, and the underlying issue — that CLAUDE.md and .mcp.json files in a cloned repo could execute arbitrary shell on startup — is now mitigated. But the lesson is real: cloning a repo and immediately running Claude Code on it is not as safe as the UX makes it feel.

Why this matters: If you're reviewing untrusted pull requests or pulling random GitHub repos to investigate them, Codex's stricter default sandbox is the safer starting point. If you're working in a repo you own, on a machine you trust, with a workflow you've tuned, Claude Code's permission model is more ergonomic.


Which One Has the Stronger MCP and Extensibility Story?

Both support the Model Context Protocol. Claude Code shipped MCP first and shaped the spec. Codex CLI added MCP support in 2026 with stdio and Streamable HTTP transports, including OAuth, configured through ~/.codex/config.toml (OpenAI Codex MCP docs, 2026). The MCP ecosystem now has more than 10,000 public servers, and the protocol was donated to the Linux Foundation's Agentic AI Foundation in December 2025 (MCP Bundles, 2026).

So MCP support is no longer a Claude-only advantage. What is still Claude-only: Hooks, Skills, and Subagents.

Hooks intercept tool calls at nine documented lifecycle events (PreToolUse, PostToolUse, UserPromptSubmit, Stop, and others). They run as shell scripts, return exit codes, and let you build deterministic gates the model can't reason its way past.
Skills are reusable prompt + tool bundles installed via npm-style commands. Anthropic shipped them in October 2025 alongside Plugins, and they're how the broader ecosystem (skills.sh, etc.) packages workflows.
Subagents are model-launched workers with their own context windows. You spawn one for research, code review, or exploration, and the parent agent continues without polluting its context.

Codex doesn't have direct equivalents. You can build a lot of the same outcomes with shell wrappers and MCP servers, but you're rebuilding the framework. This is the part of the comparison that gets undersold in most reviews. The extensibility surface isn't a checkbox — it's a multiplier. Once you have a Skill that knows how to ship a feature in your repo, or a Hook that blocks rm -rf regardless of what the model thinks, the productivity gap widens fast.
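Here is roughly what the rm -rf gate looks like as a PreToolUse hook, sketched in Python (hooks run as shell commands, so the command can simply invoke a script). It assumes the hook receives the event as JSON on stdin with `tool_name` and `tool_input.command` fields, and that exit code 2 blocks the call; the flag detection is deliberately simple and will miss long-form flags or commands hidden inside subshells.

```python
import json
import shlex
import sys


def is_recursive_force_rm(command: str) -> bool:
    """True if the command runs rm with both recursive and force flags."""
    try:
        tokens = shlex.split(command)
    except ValueError:  # unbalanced quotes: fall back to a naive split
        tokens = command.split()
    for i, token in enumerate(tokens):
        if token != "rm" and not token.endswith("/rm"):
            continue
        flags = ""
        for arg in tokens[i + 1:]:
            if not arg.startswith("-") or arg.startswith("--"):
                break
            flags += arg[1:]
        if "f" in flags and ("r" in flags or "R" in flags):
            return True
    return False


def run(payload: str) -> int:
    """Return the hook's exit code: 2 blocks the tool call, 0 allows it."""
    event = json.loads(payload)
    command = event.get("tool_input", {}).get("command", "")
    if event.get("tool_name") == "Bash" and is_recursive_force_rm(command):
        print("Blocked: recursive force delete is not allowed here.",
              file=sys.stderr)
        return 2
    return 0


if __name__ == "__main__":
    # The event JSON arrives on stdin; do nothing when run interactively.
    stdin_text = "" if sys.stdin is None or sys.stdin.isatty() else sys.stdin.read()
    if stdin_text.strip():
        sys.exit(run(stdin_text))
```

The gate is deterministic: the decision comes from the exit code, not from anything the model says about the command.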

The trade-off is that this surface is also the attack surface. Three of the four CVEs above exploited Hooks, MCP config files, or project-level instructions. Power and risk on the same axis.


How Does the Pricing Math Actually Compare?

Both tools start at the same price. Claude Code is included in Claude Pro at $20/month, with Max 5x at $100/month (5x the Pro rate limits) and Max 20x at $200/month (20x Pro limits) above that — or pay-per-token through the API (Anthropic Pricing, 2026). Codex CLI is bundled into ChatGPT Plus ($20/month), Pro ($200/month), Business, Enterprise, and Edu plans, plus pay-per-token through the OpenAI API.

Entry price is a wash — $20 either way. What differs is the ladder above it. Here's the math from my own usage:

Claude Pro or ChatGPT Plus ($20/mo): both real entry points, and both throttle hard on long agentic sessions. I burn through either one in roughly 2 hours of serious refactor work.
Claude Max 5x ($100/mo): comfortable for one developer doing 6–8 hours of agent-heavy work a day. I rarely hit limits — and Codex has no equivalent middle tier, so its next step up is $200.
Claude Max 20x or ChatGPT Pro ($200/mo): top tier for both. Max 20x rarely throttles me even on heavy solo days; ChatGPT Pro lifts Codex's ceiling the same way.
API for both: roughly comparable per-token, but Claude Sonnet 4.6 is significantly cheaper than Opus 4.7 for most coding tasks, and you can route between them in the same session.

The honest version: at $20 it's a genuine tie — both rate-limit you on heavy days. Claude's real edge is the $100 Max 5x tier, which has no ChatGPT counterpart and is the sweet spot for full-time agent work. At $200 the two are matched again. I run both — Max 20x for daily-driver work, ChatGPT Plus for the occasional Codex run on something Claude is being weird about.

Feature Matrix: Where Each Tool Genuinely Wins

Capability	Claude Code	Codex CLI
Underlying model	Claude Opus 4.7 / Sonnet 4.6	GPT-5 / GPT-5.3-Codex
SWE-bench Verified	87.6% (Opus 4.7)	~85% (5.3-Codex)
License	Closed-source (npm)	Apache 2.0 (Rust)
Permission modes	5 modes + deny rules	3 approval + 3 sandbox modes
MCP support	Yes (original)	Yes (stdio + Streamable HTTP)
Hooks	Yes (9+ lifecycle events)	No direct equivalent
Skills	Yes (Oct 2025)	No
Subagents	Yes	No
Plugins	Yes	No
IDE extensions	VS Code, JetBrains	VS Code, JetBrains, Cursor, Windsurf
Desktop app	Yes (Mac/Windows), web, CLI	Yes (macOS Feb 2026, Windows Mar 2026)
Entry pricing	$20/mo (Claude Pro)	$20/mo (ChatGPT Plus)
Mid tier	$100/mo (Max 5x)	— (no equivalent)
Top pricing	$200/mo (Max 20x)	$200/mo (ChatGPT Pro)
Recent CVEs	4 patched (disclosed 2025–2026)	None publicly disclosed
CSAT	91%	Not publicly reported
GitHub stars	n/a (closed-source)	75K+
Open governance	No	Yes (Apache 2.0, 400 contributors)

The matrix tells you what; it doesn't tell you what to actually do. That's the next section.

Which One Should Be Your Daily Driver in 2026?

Use this framework. The decision isn't "which is better" — it's "which fits the work you do most."

Pick Claude Code as your daily driver if:

You spend most of your time in repos you trust (your own code, your team's code).
You want the strongest agentic quality and recovery behavior on hard tasks.
You'll actually use Hooks, Skills, or Subagents — that extensibility edge is the main reason to pick Claude over a roughly-comparable Codex.
You do enough daily agent work to justify Max 5x ($100), though Pro ($20) is a fine place to start.
You value extensibility over open-source guarantees.

Pick Codex CLI as your daily driver if:

You frequently work on untrusted repos (PR review, OSS triage, security research).
You need open-source guarantees for legal or audit reasons.
You're already paying for ChatGPT Plus or Pro and want to avoid a second subscription.
You prefer GPT-5's faster, more aggressive coding style.
You want Codex's desktop app with built-in parallel agent management.

Run both if:

You're doing 6+ hours of agent-heavy work daily.
You want a second opinion on hard tasks (one agent's stuck plan often unblocks fast in the other).
You work across languages where the models diverge — I find Claude better on PHP/Ruby/Go, Codex slightly stronger on TypeScript/Python/Rust.

The Pragmatic Engineer survey backs this up: 70% of engineers run 2–4 AI tools simultaneously, and 15% run 5 or more (The Pragmatic Engineer, 2026). Treating this as a one-winner question is the wrong frame.

My current setup: Claude Code is the daily driver. Codex CLI is the second opinion. When Claude gets confused on a long task — usually around the 30-minute mark on something architecturally tangled — I'll fork the conversation, paste the state into a fresh Codex session, and see what it does. The disagreement is often more useful than either agent's answer alone.


Frequently Asked Questions

Is Claude Code's $100 Max tier worth it, or is the $20 Pro plan enough?

Both Claude Code and Codex start at $20/month (Claude Pro and ChatGPT Plus), and both throttle on heavy sessions at that tier. Claude Max 5x ($100/mo) gives roughly 5x Pro rate limits — enough for 6–8 hours of agent-heavy work daily without hitting walls (Anthropic Pricing, 2026). The crossover where Max pays for itself is around 3 hours/day of active agent use; below that, Pro is plenty.

Does Codex CLI support MCP servers?

Yes. Codex CLI added MCP support in 2026 with both stdio and Streamable HTTP (including OAuth) transports, configured in ~/.codex/config.toml (OpenAI Codex MCP, 2026). The MCP ecosystem has more than 10,000 public servers, and most work in both Claude Code and Codex CLI without modification.
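As a rough illustration of the Codex side, the Python sketch below renders a stdio server entry for ~/.codex/config.toml. It assumes servers live under an `[mcp_servers.<name>]` table with `command` and `args` keys, which matches my reading of the Codex docs; the helper name is hypothetical, and the server shown is the reference sequential-thinking package, used only as an example.

```python
def codex_mcp_toml(name: str, command: str, args: list[str]) -> str:
    """Render a stdio MCP server entry as a TOML table."""
    quoted = ", ".join(f'"{arg}"' for arg in args)
    return (
        f"[mcp_servers.{name}]\n"
        f'command = "{command}"\n'
        f"args = [{quoted}]\n"
    )


print(codex_mcp_toml(
    "sequential-thinking",
    "npx",
    ["-y", "@modelcontextprotocol/server-sequential-thinking"],
))
```

The command and arguments are the same ones a Claude Code setup would use; only the config file format differs.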

Are Claude Code's CVEs a reason to avoid it?

Not really. All four CVEs were patched within days of disclosure, and the underlying class of bug (trusting project-local config files on startup) has been mitigated (The Hacker News, 2026). The takeaway isn't "Claude is unsafe," it's "don't run any agent on a freshly cloned untrusted repo without sandboxing."

Which model is actually better at coding, GPT-5 or Claude Opus 4.7?

On SWE-bench Verified, Claude Opus 4.7 leads at 87.6% vs GPT-5.3-Codex at ~85% and base GPT-5 at 74.9% (Vellum, 2026). In practice the gap is smaller and task-dependent. Claude is more cautious and recovers better from mistakes; GPT-5 is faster and more aggressive on greenfield work.

Can I use Claude Code on the cheaper Claude Pro plan?

Yes. Claude Code is included in Claude Pro at $20/month — the same entry price as ChatGPT Plus with Codex. Pro's rate limits are tight for heavy agentic work, which is why Max 5x ($100/mo) exists for full-time use. You can also pay per-token through the Anthropic API. There's no free tier for Claude Code itself (Anthropic Pricing, 2026).

Nishil Bhave is a developer and builder who writes about AI tooling, agentic workflows, and the practical realities of shipping with AI. He has been running Claude Code and Codex CLI as primary tools since their respective launches.

Conclusion

If you only have one slot, Claude Code is the daily driver I'd recommend for most engineers in mid-2026 — 91% CSAT and the strongest extensibility story aren't accidents. If you also have $20/month for ChatGPT Plus, add Codex CLI as your second opinion. The cost of running both is rounding-error for any working engineer, and the dual-agent setup beats either one solo on hard tasks.

The terminal-agent paradigm is the new default. Pick the one that fits the work you do most, and don't agonize over the choice — both will be different products by Q4 2026, and the only mistake is staying on the sidelines.

Gemini CLI vs Claude Code: A 2026 Verdict Before the Shutdown

Nishil Bhave — Sat, 23 May 2026 21:25:06 +0000

Gemini CLI vs Claude Code, After Months of Daily Use

Update — May 24, 2026: Google has deprecated the original Gemini CLI. Starting June 18, 2026, it stops serving requests for free personal, Google AI Pro, and Google AI Ultra accounts, and points users to the new Antigravity CLI instead (Google Developers Blog, 2026). This rewrite accounts for that. Short version: the free tier that made Gemini CLI a no-brainer is going away, but the successor is cheaper than Claude Code and now ships Skills, Hooks, and Subagents too. The choice got more interesting, not simpler.

Two terminal agents. One slot in your shell. I've been driving both Gemini CLI and Claude Code in real work — different repos, different stakes, different budgets — for the better part of a year. Most comparison posts you'll find online are either feature-checklist exercises or thinly veiled marketing for one side. This one is neither, and as of late May it has to account for a moving target.

The honest take, before I show my work: until June 18, Gemini CLI's free tier is still the most generous deal in the agent space — 1,000 free model requests per day with a 1M token context window on a personal Google account (Gemini CLI Docs, 2026). After that date, free, Pro, and Ultra users either move to Antigravity CLI or keep Gemini CLI alive with a paid API key. Claude Code charges $20 to $200 a month for a more refined product, and its footing hasn't moved. Which one belongs in your daily shell now depends on a transition almost nobody had priced in a month ago, so this post prices it in.


Key Takeaways

Google deprecated the original Gemini CLI on May 19, 2026. It stops serving free, Pro, and Ultra accounts on June 18, 2026, replaced by the closed-source Antigravity CLI (Google Developers Blog, 2026). The Apache 2.0 repo stays public, but the hosted free quota does not.

Antigravity CLI is a Go rewrite that now ships Skills, Hooks, and async Subagents, the extensibility features that were Claude Code's clearest moat (Agentpedia, 2026). Entry pricing is $20/month on Google AI Pro: level with Claude Pro on paper, but well under the $100 Max 5x tier that serious Claude Code use realistically needs.

Claude Code starts at $20/month Pro (serious use realistically $100 Max 5x) and holds a 91% CSAT and 54 NPS in JetBrains' April 2026 survey — the highest of any coding tool tracked (JetBrains Research, 2026).

The honest split for 2026: Claude still wins on raw agentic quality (87.6% SWE-bench Verified). Gemini's successor wins on price and now matches the feature checklist. Most serious daily-driver users I know still run both.

Why Are Gemini CLI and Claude Code Even Comparable?

Both products start from the same bet: the IDE is a deeply customized editor, and an agent does not need to live inside it to do useful work. 95% of engineers in The Pragmatic Engineer's 2026 survey use AI tools weekly, and 75% report AI handles at least half of their engineering work (The Pragmatic Engineer, 2026). When the agent is doing half the work, the right question stops being "which editor extension" and starts being "which process owns my repo."

That's the philosophical convergence. Both CLIs run as long-lived processes that read files, run shell commands, watch output, and propose diffs. Claude Code shipped this paradigm as a research preview in February 2025 and reached general availability that May. Google launched Gemini CLI on June 25, 2025 as an open-source Apache 2.0 project (Google Blog, 2025). Inside a year, the Gemini CLI repo has grown to roughly 104,000 GitHub stars and 13,700 forks (Gemini CLI GitHub, 2026). That's not a minor side project; it's the largest agent CLI codebase by community footprint.

The daily ritual is genuinely similar. Open terminal, point at repo, type a goal, watch a plan appear, approve tool calls, accept the diff. If you blindfolded me and dropped me into a session mid-refactor, I'd need a minute to identify which CLI I was driving. The differences only surface when the agent gets confused, when something fails, or when you need to push past the happy path — and that's where this comparison actually lives.

Is Gemini CLI Being Shut Down?

Yes — the original Gemini CLI is being deprecated. Google announced on May 19, 2026 that starting June 18, 2026, Gemini CLI and the Gemini Code Assist IDE extensions stop serving requests for Gemini Code Assist for individuals, Google AI Pro, and Google AI Ultra accounts (Google Developers Blog, 2026). It is not being wiped off your machine. The free hosted quota that made it worth installing is what's being switched off for personal accounts.

The headlines flattened this into "Gemini CLI is dead," and that's not quite right. Here's who is actually affected:

Free, Pro, and Ultra users (most readers of this post): After June 18, your existing login stops authorizing the legacy CLI. You either migrate to Antigravity CLI or keep Gemini CLI running by switching to a paid Gemini API key (Google Developers Blog, 2026).
Gemini Code Assist Standard and Enterprise orgs: Nothing changes. Google keeps supporting Gemini CLI and the IDE extensions with the latest models for licensed organizations and for Gemini Code Assist for GitHub (Google Developers Blog, 2026).
The open-source repo: The Gemini CLI codebase stays public under Apache 2.0 with its ~104K GitHub stars intact. What ends is the hosted serving for free tiers, not the project (The Register, 2026).

I ran the migration the week it was announced. The good news: Google published migration docs at antigravity.google/docs/gcli-migration, and my GEMINI.md, MCP configs, and most extensions carried over with minor edits. The annoying news: if you leaned on the free tier across a few machines like I did, June 18 is a hard deadline, not a soft nudge. Plan for it now, not on the 17th.

What's the Best Gemini CLI Alternative?

If you're looking for a Gemini CLI alternative, there are three honest options, ordered from least to most disruption. Antigravity CLI is Google's own successor and the lowest-friction move if you want to stay in the Gemini ecosystem. Claude Code is the alternative most former Gemini CLI users I know are actually testing right now, because it's the most mature paid agent — and it's what the rest of this post compares head to head. Keeping Gemini CLI on a paid Gemini API key works if you specifically need the open-source binary or air-gapped guarantees and don't mind metered billing. What no longer exists is the old 1,000-requests-a-day deal on the legacy CLI — but Antigravity itself keeps a genuinely generous free allotment (gated by your Google account tier), which remains its clearest edge over Claude Code's no-free-tier model.

if Claude Code is your likely landing spot, here's its full pricing and limits breakdown

How Does the Pricing Reality Actually Compare?

Here is where the two products are most obviously different — though that gap is narrowing as the free tier sunsets. Through June 18, 2026, Gemini CLI gives you a 1M token context window and 1,000 free model requests per day at 60 requests per minute on a personal Google account, no credit card required (Gemini CLI Docs, 2026). After that date, as covered above, free personal access moves to Antigravity CLI or a paid API key. Claude Code has no free tier. It starts at the Anthropic Pro plan ($20/month) for limited access and scales through Max 5x ($100/month) and Max 20x ($200/month) for serious daily-driver use (Anthropic Pricing, 2026).

According to Google's developer pricing page, the Gemini 2.5 Pro API costs $1.25 per million input tokens and $10 per million output tokens for context windows under 200K, doubling to $2.50 and $20 above that (Google AI Pricing, 2026). Claude Opus 4.7 sits at $5 per million input and $25 per million output, and Sonnet 4.6 at $3 and $15 (Anthropic API Docs, 2026). Per-token, Gemini is meaningfully cheaper.
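To make the per-token comparison concrete, here is a minimal sketch that prices a single request using the list prices quoted above. The 50K-in / 2K-out request is a hypothetical example, and the assumption that Gemini's tier is chosen by prompt size alone is mine, so check the vendors' pricing pages before relying on it.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Price:
    """List price in USD per 1M tokens."""
    input_per_m: float
    output_per_m: float


# Prices as quoted in this article; verify against the vendors' pricing pages.
GEMINI_25_PRO_SHORT = Price(1.25, 10.0)  # prompts up to 200K tokens
GEMINI_25_PRO_LONG = Price(2.50, 20.0)   # prompts above 200K tokens
CLAUDE_OPUS_47 = Price(5.0, 25.0)
CLAUDE_SONNET_46 = Price(3.0, 15.0)

LONG_PROMPT_THRESHOLD = 200_000  # tokens


def request_cost(price: Price, input_tokens: int, output_tokens: int) -> float:
    """Cost in USD of one request at the given list price."""
    return (input_tokens * price.input_per_m
            + output_tokens * price.output_per_m) / 1_000_000


def gemini_25_pro_cost(input_tokens: int, output_tokens: int) -> float:
    """Assumption: the pricing tier is picked by prompt size alone."""
    if input_tokens <= LONG_PROMPT_THRESHOLD:
        price = GEMINI_25_PRO_SHORT
    else:
        price = GEMINI_25_PRO_LONG
    return request_cost(price, input_tokens, output_tokens)


if __name__ == "__main__":
    # Hypothetical request: 50K tokens of context in, 2K tokens out.
    tokens_in, tokens_out = 50_000, 2_000
    print(f"Gemini 2.5 Pro: ${gemini_25_pro_cost(tokens_in, tokens_out):.4f}")
    print(f"Sonnet 4.6:     ${request_cost(CLAUDE_SONNET_46, tokens_in, tokens_out):.4f}")
    print(f"Opus 4.7:       ${request_cost(CLAUDE_OPUS_47, tokens_in, tokens_out):.4f}")
```

For that hypothetical request the sketch gives roughly $0.08 on Gemini 2.5 Pro, $0.18 on Sonnet 4.6, and $0.30 on Opus 4.7.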

But raw per-token pricing isn't what most people actually pay. Here's the real shape of it from my own usage over the last six months:

Free Gemini CLI (until June 18, 2026): For students, side projects, and one-off scripts, it's genuinely free. The 1,000 daily requests cover a surprising amount of work if you avoid burning them on chitchat, and the 1M context window is included with no upcharge. Just watch the clock — this is the exact tier being switched off for personal accounts, after which the equivalent entry point is $19.99/month Google AI Pro on Antigravity CLI.
Google AI Pro ($19.99/month): Lifts daily limits and gives priority access, and post-transition it's the tier that unlocks the full Antigravity platform — desktop app, CLI, and SDK (9to5Google, 2026). I burn through it in maybe four hours of heavy agentic work — fine for moderate use, tight for a serious daily driver.
Google AI Ultra ($249.99/month): Includes $100/month of Google Cloud credits and the highest CLI limits (Gemini Subscriptions, 2026). It's the closest equivalent to Claude Max 20x by intent, but priced higher.
Claude Max 5x ($100/month): Comfortable for one developer doing 6–8 hours of agent-heavy work a day. I rarely hit limits.
Claude Max 20x ($200/month): Same price as ChatGPT Pro and gives effectively unlimited Claude Code for solo work, in my experience.

My finding: Over a 30-day window where I logged session token usage on a single mid-size Laravel + Next.js project, Gemini CLI used roughly 40% more tokens than Claude Code on the same task set — partly because the 1M context tempts you into including everything, and partly because Gemini's responses are more verbose. The free tier still came out ahead on cost, but the "cheaper per token" gap narrows once you account for behavior.
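A rough way to see how the 40% token overhead eats into the per-token advantage is sketched below. The monthly workload is hypothetical, and I'm assuming Sonnet 4.6 list prices on the Claude side, Gemini 2.5 Pro sub-200K prices on the Gemini side, and the same 1.4x multiplier on both input and output tokens.

```python
def workload_cost(input_tokens, output_tokens, input_per_m, output_per_m,
                  token_multiplier=1.0):
    """USD cost of a workload, scaling token counts by token_multiplier."""
    scaled_in = input_tokens * token_multiplier
    scaled_out = output_tokens * token_multiplier
    return (scaled_in * input_per_m + scaled_out * output_per_m) / 1_000_000


if __name__ == "__main__":
    # Hypothetical month of API usage, measured in Claude-sized token counts.
    month_in, month_out = 10_000_000, 1_000_000

    claude = workload_cost(month_in, month_out, 3.0, 15.0)
    gemini_same_tokens = workload_cost(month_in, month_out, 1.25, 10.0)
    gemini_verbose = workload_cost(month_in, month_out, 1.25, 10.0,
                                   token_multiplier=1.4)

    print(f"Claude (Sonnet 4.6 prices):     ${claude:.2f}")
    print(f"Gemini, same token count:       ${gemini_same_tokens:.2f} "
          f"({gemini_same_tokens / claude:.0%} of Claude)")
    print(f"Gemini, 40% more tokens:        ${gemini_verbose:.2f} "
          f"({gemini_verbose / claude:.0%} of Claude)")
```

Under those assumptions Gemini goes from about half of Claude's cost to about 70% of it once the extra tokens are counted: still cheaper, but by a smaller margin.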

The crossover point is around three hours per day of active agent driving. Below that, Gemini CLI's free tier is unbeatable — at least until June 18, after which the cost-conscious entry point becomes the $19.99 Antigravity tier rather than $0. Above three hours, Claude Max 5x is the better deal for sustained throughput. There's also a hidden cost on the Claude side: on April 21, 2026, Anthropic briefly removed Claude Code from the $20 Pro tier and then reversed the decision within 24 hours (Simon Willison, 2026). The episode reminded a lot of people that the pricing model isn't fully settled.

One more thing worth naming: API access economics differ in ways that aren't obvious from the price list. Anthropic offers cache hits at 10% of the base input price and a 50% Batch API discount (Anthropic API Docs, 2026). Google offers a 50% batch discount on Gemini 2.5 Pro through Vertex AI (Vertex AI Pricing, 2026). If your workload is batch-heavy or repeats similar prompts, the real cost can be half the headline number on both sides. For interactive agent use, the cache hit pricing is what actually drives Claude Code's daily economics down — that's a real piece of why Max 5x works at $100/month.
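The discount arithmetic can be sketched as follows. This only models the two figures quoted above (cache reads at 10% of the base input price, 50% off for batch); it ignores any surcharge for writing to the cache, and the 80% cache hit rate is a hypothetical value, not a measurement.

```python
def cached_input_price(base_input_per_m, cache_hit_rate, cache_read_factor=0.10):
    """Blended input price per 1M tokens when a share of tokens are cache hits.

    cache_read_factor=0.10 reflects the "10% of base input price" figure quoted
    in the article; cache-write surcharges are deliberately ignored here.
    """
    if not 0.0 <= cache_hit_rate <= 1.0:
        raise ValueError("cache_hit_rate must be between 0 and 1")
    blended_factor = cache_hit_rate * cache_read_factor + (1.0 - cache_hit_rate)
    return base_input_per_m * blended_factor


def batch_price(base_per_m, batch_discount=0.50):
    """Price per 1M tokens after a flat batch discount."""
    return base_per_m * (1.0 - batch_discount)


if __name__ == "__main__":
    opus_input, opus_output = 5.0, 25.0  # USD per 1M tokens, as quoted above
    hit_rate = 0.80                      # hypothetical share of cached input tokens

    print(f"Opus input with {hit_rate:.0%} cache hits: "
          f"${cached_input_price(opus_input, hit_rate):.2f} per 1M")
    print(f"Opus batch input/output: ${batch_price(opus_input):.2f} / "
          f"${batch_price(opus_output):.2f} per 1M")
```

With an 80% hit rate the blended Opus input price drops from $5.00 to about $1.40 per million tokens, which is the effect that matters most for interactive sessions that resend the same context.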

complete guide to Claude Code pricing and rate limits

Does Gemini's 1M Context Window Actually Matter in Daily Work?

On paper, this is the most decisive Gemini advantage. The 1M token context window is standard across the entire Gemini 2.5 Pro lineup, including the free tier (Google AI for Developers, 2026). Claude finally caught up — Anthropic shipped 1M context windows for Sonnet 4.6 and Opus 4.6 to general availability on March 13, 2026 at standard per-token pricing (Anthropic Context Windows, 2026). But the experience is different because Gemini treats long context as default behavior, while Claude treats it as a feature you opt into.

In practice, three things actually change when you have 1M tokens:

Whole-repo reasoning. I can dump a 200-file mid-size codebase into a single Gemini CLI session without thinking about it. Claude Code with extended context can do this too, but the model attention degrades faster on truly enormous contexts in my testing — the lift-and-shift refactor work where Claude shines tends to live in the 50K–200K context band.
Long-running session memory. Gemini CLI maintains context across long sessions more gracefully. Claude Code is more aggressive about compaction. The trade-off is that Gemini's "remembers everything" mode sometimes drags in stale context that biases the next response, where Claude's tighter context is more deliberate.
Document-heavy work. When the task is "read these 30 PDFs and summarize the differences," Gemini wins outright — both because of context size and because Google's multimodal handling is more native.

Where the 1M context matters less than the marketing suggests: most coding work isn't shaped like a 1M-token reasoning task. Real refactors live across 5–30 files. Real bug hunts involve targeted reading, not exhaustive ingestion. The 1M-token claim is impressive, but the median useful session for both tools uses 30K–80K tokens. That's not a knock on Gemini; it's a reminder that "biggest context" is a noisy proxy for "best agent."
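If you want a rough sense of where your own repo falls, a sketch like the one below estimates token counts from file sizes. The 4-characters-per-token ratio is a crude heuristic I'm assuming rather than either vendor's tokenizer, and the file suffixes, skipped directories, and 25% headroom are illustrative choices.

```python
from pathlib import Path

CHARS_PER_TOKEN = 4  # crude heuristic (assumption), not a real tokenizer
SKIP_DIRS = {".git", "node_modules", "vendor"}


def estimate_tokens(text: str) -> int:
    """Very rough token estimate from character count."""
    return len(text) // CHARS_PER_TOKEN


def estimate_repo_tokens(root, suffixes=(".php", ".ts", ".tsx", ".py")) -> int:
    """Sum rough token estimates for source files under root."""
    total = 0
    for path in Path(root).rglob("*"):
        if not path.is_file() or path.suffix not in suffixes:
            continue
        # Skip dependency and VCS directories anywhere in the relative path.
        if SKIP_DIRS.intersection(path.relative_to(root).parts):
            continue
        total += estimate_tokens(path.read_text(encoding="utf-8", errors="ignore"))
    return total


def fits_in_window(tokens: int, window_tokens: int, reserve_fraction: float = 0.25) -> bool:
    """True if tokens fit while leaving reserve_fraction of the window free."""
    return tokens <= window_tokens * (1.0 - reserve_fraction)


if __name__ == "__main__":
    # The article's upper "median useful session" figure against two window sizes.
    session_tokens = 80_000
    for window in (200_000, 1_000_000):
        print(window, fits_in_window(session_tokens, window))
```

Calling `estimate_repo_tokens("path/to/repo")` on a real checkout gives a ballpark figure to compare against the 30K–80K range above before deciding whether the larger window is doing any work for you.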

how the underlying memory architectures differ across major AI assistants

How Do the Models Actually Behave on Real Codebases?

Benchmarks first, then experience. Claude Opus 4.7 currently leads SWE-bench Verified at 87.6% (Anthropic News, 2026). Gemini 3.1 Pro, released on February 19, 2026, sits at 80.6% on the same benchmark and 54.2% on the harder SWE-Bench Pro (DeepMind Gemini 3.1 Pro Model Card, 2026). Claude Opus 4.7's SWE-Bench Pro score is 64.3% — a meaningful 10-point gap on the harder evaluation (Vellum, 2026).

Benchmarks are a noisy signal — both Anthropic and Google have spent real engineering on these specific evaluations. The pattern I see in actual repos is what matters more. Claude is more cautious and more aligned with "ask before doing." When it's wrong, it backs out cleanly. Gemini is more confident, sometimes overconfident — it will happily generate 200 lines of code that look right and aren't, and the recovery loop costs more than the original task would have on Claude.

I ran a 20-file refactor on the same Laravel codebase three different ways: first with Claude Code, then in a fresh session with Gemini CLI on the free tier, then with Gemini CLI on paid Gemini 2.5 Pro. Claude got it right in three prompts with four confirmation stops. Free-tier Gemini got 80% there in one prompt but missed a service-binding update that caused two test failures. Paid Gemini did better, but still introduced one stale type import that I caught in review. None of these are disqualifying. They're a consistent pattern: Claude is conservative-by-default, Gemini is aggressive-by-default.

There's a meta-pattern worth naming. Pragmatic Engineer's 2026 survey of ~906 engineers showed Claude Code at 46% "most loved" tool — vs Cursor at 19%, GitHub Copilot at 9%, and Codex at 6% (The Pragmatic Engineer, 2026). JetBrains' April 2026 report tracks 24% Claude Code adoption at work in the US and Canada (18% globally), up from roughly 3% in mid-2025 — a 6x increase in nine months, with a 91% CSAT and NPS of 54 (JetBrains Research, 2026). Gemini CLI adoption is harder to measure but sits around 10% in the same Pragmatic survey. The emotional pull is decisively with Claude right now. That's not destiny — it's a snapshot.

One pattern that doesn't show up in benchmarks but matters in daily use: codebase reasoning across multiple files. When I ask either tool to "find where the user authentication flow connects to the billing webhook," Claude tends to do less searching but more careful synthesis. Gemini casts a wider net — it reads more files thanks to the bigger context — but the summary it produces is sometimes diluted by including irrelevant matches. On Terminal-Bench 2.0, the gap is visible at the system level: Codex CLI with GPT-5.2 leads at 63%, Claude Opus 4.5 with Terminus 2 sits at 58%, and Gemini 3 Pro with Terminus 2 at 57% (Terminal-Bench Leaderboard, 2026). The Aider polyglot leaderboard tells a similar story — Claude Opus 4.5 at 89.4% vs GPT-5 at 88% with Gemini further back (Aider Leaderboards, 2026). The benchmarks aren't the whole story, but they're consistent with what I see when I drive both tools through the same task.

broader landscape including Cursor, Codex, and Windsurf

What About Tool Use, MCP, and the Extensibility Story?

This is where the comparison gets more interesting and where the older "Claude wins extensibility" narrative needs an update. Both products now support the Model Context Protocol. Gemini CLI has full MCP server support via stdio and HTTP transports, configured through ~/.gemini/settings.json (Gemini CLI MCP Docs, 2026). Google also shipped Hooks as a default capability in Gemini CLI v0.26.0 — they cover pre/post tool execution, session events, and prompt submission (Google Developers Blog, 2026).
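For a sense of what that configuration looks like, here is a minimal sketch that builds the MCP section of a settings file as a Python dict and prints it as JSON. The key names (`mcpServers`, `command`, `args`, `httpUrl`) reflect my understanding of the Gemini CLI settings format and should be checked against the current docs; the server names, package name, and URL are hypothetical placeholders.

```python
import json


def build_gemini_settings(existing: dict) -> dict:
    """Return a copy of existing settings with two example MCP servers added.

    Key names are assumptions based on the Gemini CLI settings format;
    server names, the package, and the URL are placeholders.
    """
    settings = dict(existing)
    servers = dict(settings.get("mcpServers", {}))

    # stdio transport: the CLI launches the server as a child process.
    servers["local-files"] = {
        "command": "npx",
        "args": ["-y", "example-mcp-server"],  # hypothetical package name
    }

    # HTTP transport: the CLI connects to an already-running server.
    servers["remote-docs"] = {
        "httpUrl": "https://mcp.example.com/mcp",  # placeholder URL
    }

    settings["mcpServers"] = servers
    return settings


if __name__ == "__main__":
    # Print rather than write, so nothing under ~/.gemini is touched.
    print(json.dumps(build_gemini_settings({}), indent=2))
```

Printing instead of writing keeps the sketch side-effect free; merge the output into ~/.gemini/settings.json by hand once the key names are confirmed.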

What Gemini CLI doesn't have, and Claude Code does:

Skills. Reusable prompt + tool bundles that package workflows into named, invokable units. Anthropic shipped them in October 2025 and the ecosystem (skills.sh and others) has grown around them. The full documentation lives at code.claude.com/docs/en/skills.
Subagents. Model-launched workers with their own context windows. You spawn one for research, code review, or exploration, and the parent agent continues without polluting its context.
Plugins. A first-class extension surface for community-authored tooling.

Gemini CLI has Extensions, which are conceptually similar to Skills + Plugins combined — they package prompts, MCP servers, and slash commands together. The implementation is younger and the ecosystem thinner. If you're starting from zero today, both surfaces are usable. If you're inheriting a setup with a dozen carefully tuned workflows, Claude Code's ecosystem has more depth.

Here's the twist that reframes this whole section, though: the gap I just described is a snapshot of the legacy Gemini CLI. Its successor, Antigravity CLI, ships first-class Skills, Hooks, and async Subagents (Agentpedia, 2026). So "Claude Code wins extensibility" is true today and largely false by Q3. I dig into what that means in the dedicated Antigravity section below.

Here's the part that doesn't get said often enough: the extensibility surface is also the attack surface. Anthropic patched four CVEs against Claude Code in early 2026 — RCE via untrusted project config, path bypass, command injection, and DNS exfiltration (Check Point Research, 2026). Three of those exploited the hook/MCP/skill layer. Gemini CLI being newer and Apache 2.0 means more eyes are on it, but it also means the attack surface is younger and less battle-tested. Neither is a reason to avoid either tool, but it's a reminder that running an agent on a freshly cloned untrusted repo is not as safe as the UX makes it feel.
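One cheap habit before pointing an agent at a freshly cloned repo is to check whether the repo ships its own agent configuration. The sketch below only reports the presence of files; it is not a security control, and the path list is my assumption about where project-level config commonly lives, so extend it for your own setup.

```python
from pathlib import Path

# Assumed project-level config locations; not an exhaustive or official list.
AGENT_CONFIG_PATHS = (
    ".claude/settings.json",
    ".claude/settings.local.json",
    ".mcp.json",
    ".gemini/settings.json",
)


def find_agent_configs(repo_root) -> list:
    """Return the assumed agent config paths that exist under repo_root."""
    root = Path(repo_root)
    return [rel for rel in AGENT_CONFIG_PATHS if (root / rel).is_file()]


if __name__ == "__main__":
    found = find_agent_configs(".")
    if found:
        print("Review these before launching an agent here:")
        for rel in found:
            print(f"  {rel}")
    else:
        print("No project-level agent config found at the checked paths.")
```

Anything it lists is worth reading by hand first, since hooks and MCP server entries in those files can run commands on your machine.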

deterministic hook gates Claude Code can't reason its way past

when to reach for Skills vs MCP servers

How Does the Terminal UX Actually Compare?

This is the part that almost never shows up in feature checklists, and it's the thing that drives my daily preference more than benchmarks do. Both CLIs render a similar visual surface — a chat pane, a tool-call list, a diff preview, an approval prompt. But the small details are different in ways that compound across a working day.

Claude Code's interactive panel is denser and more interruption-friendly. I can hit a slash command mid-stream, switch from auto-accept to plan mode in one keystroke, fork a session, or push a Subagent off to do a side task while the main thread keeps going. The keyboard ergonomics feel deliberate. There's a permission UI tier that lets me allow a specific tool for the rest of the session without granting blanket access — small, but I lean on it constantly.

Gemini CLI's UX is closer to a chat-first design with tools bolted on. The diffs render cleanly, the tool calls are explicit, and the VS Code Companion extension gives you in-editor diff previews (Google Blog, 2025). What's missing for me is the in-session flexibility — switching between modes, gating specific tool classes, and recovering from a derailed plan takes more keystrokes. None of these are dealbreakers. They're the kind of papercuts you notice on hour six of a long day.

Two specific things I keep running into:

Plan mode. Claude Code's plan mode — where the agent proposes a written plan you approve before any tool use — is the single biggest UX feature I rely on. Gemini CLI doesn't have a direct equivalent. You can prompt it into "show me a plan first" behavior, but it's not enforced, and on long tasks the agent will drift back to "do then show" without you noticing.
Session forking. Claude Code lets me fork a session at any point and try a different approach in parallel without losing my place. Gemini CLI requires me to open a separate terminal and re-instantiate context, which negates some of the 1M-context advantage.

On the other hand, Gemini CLI's GitHub Actions integration — currently in beta — is genuinely useful for running Gemini against pull requests at scale (Google Blog, 2025). Claude Code has agentic CI patterns through hooks and headless mode, but the out-of-the-box CI story is less polished. If a meaningful chunk of your agent use is "run on every PR" rather than "drive interactively in a terminal," Gemini wins that lane.

Antigravity CLI vs Claude Code: The Comparison That Now Matters

For most readers, the real 2026 decision isn't Gemini CLI vs Claude Code anymore — it's Antigravity CLI vs Claude Code, because Antigravity is what your Gemini login points to after June 18. Google unveiled Antigravity at I/O 2026 on May 19 as a standalone, agent-first platform: a Go-based CLI, an SDK, a desktop app, managed execution, and enterprise support, all on a shared runtime (MarkTechPost, 2026). It isn't a Gemini CLI point release. It's a different product wearing the migration path.

What changes for this comparison, concretely:

The feature checklist now matches. Antigravity CLI ships Skills, Hooks, and async Subagents — the exact extensibility trio that was Claude Code's clearest moat (Agentpedia, 2026). Its async subagents run long refactors or parallel research in the background without blocking your prompt, which is arguably ahead of where Claude's subagents sit today. Extensions get rebranded to Plugins, and MCP support carries over.
It's faster, and it's closed-source. The Go rewrite is noticeably snappier than the old Node-based CLI, and the CLI shares a runtime with the desktop app so updates land everywhere at once. The catch: Antigravity is not open source. The binary is free to install and the repo is public, but the source isn't (The Register, 2026). The open-governance argument that favored Gemini CLI does not transfer to its successor.
Pricing undercuts Claude. Antigravity access is gated by your Google account tier, and $20/month Google AI Pro unlocks the full platform — desktop, CLI, and SDK (Agentpedia, 2026). That's the same entry price as Claude Pro but a more complete bundle, and a fifth of Claude Max 5x.

So where does that leave Antigravity CLI vs Claude Code? Closer than legacy Gemini CLI ever was. Claude Code still holds the two advantages that are hardest to copy: raw model quality (Opus 4.7 at 87.6% SWE-bench Verified vs Gemini 3.1 Pro at 80.6%) and a year-deep, battle-tested extension ecosystem with a 91% CSAT behind it. Antigravity counters with price, background multi-agent orchestration, and a desktop-plus-CLI-plus-SDK surface Claude doesn't match in a single bundle. The cleanest framing: Antigravity is the stronger platform play and the obvious migration for Gemini loyalists, while Claude Code is still the more reliable agent on hard, trust-sensitive work. If you were picking Gemini CLI for the free tier and the open license, neither reason survives the transition — which is exactly why this comparison is worth re-running for yourself.

My Take After Testing Antigravity CLI

I've put Antigravity CLI through real work alongside Claude Code and the other terminal agents, and I'll be blunt about where I landed. The single biggest problem for me: there's still no real plan mode, the one Gemini CLI gap I flagged above that the rewrite didn't fix. Claude Code's plan mode (it proposes a written plan, I approve it, and only then does it touch anything) is the feature I lean on hardest, and Antigravity inherited Gemini CLI's lack of a true equivalent. For the kind of careful, multi-step work I do, that alone is a deal-breaker.

Stability is the other gap. Head to head, Claude Code feels mature and stable in a way Antigravity doesn't yet. Antigravity reads like exactly what it is: a freshly launched rewrite. I hit rough edges, inconsistent behavior, and the general sense that it needs a few more release cycles to settle. It'll get there — Google's cadence is fast — but "will get there" isn't "is there."

Then there's the model gap, which the benchmarks flag and daily use confirms: Claude still wins by a meaningful margin on hard, careful work. The one thing Antigravity genuinely has on Claude Code is generous free usage. Even after the legacy Gemini CLI free tier sunsets, Antigravity's free allotment is real, and Claude Code has no free tier at all. If budget is your binding constraint, that's a legitimate reason to keep Antigravity in the rotation. For everything else, I'm still reaching for Claude Code.

how Claude Code stacks up against Cursor if you're weighing more than two agents

Feature Matrix: Where Each Tool Genuinely Wins

Capability	Gemini CLI	Claude Code
Status (May 2026)	Deprecated — free/Pro/Ultra serving ends June 18, 2026; succeeded by Antigravity CLI	Active, no changes
Underlying model	Gemini 2.5 Pro / Gemini 3.1 Pro	Claude Opus 4.7 / Sonnet 4.6
SWE-bench Verified	80.6% (Gemini 3.1 Pro)	87.6% (Opus 4.7)
SWE-Bench Pro	54.2%	64.3%
License	Apache 2.0 (open source)	Closed-source (npm)
GitHub stars / forks	~104K / ~13.7K	n/a (closed)
Default context window	1M tokens (standard)	1M tokens GA since March 13, 2026
Free tier	1,000 req/day, 60 req/min (ends June 18, 2026)	None
MCP support	Yes (stdio + HTTP)	Yes (original implementation)
Hooks	Yes (default since v0.26.0)	Yes (mature, 9+ lifecycle events)
Skills	Legacy: no (Extensions closest). Antigravity: yes	Yes (Oct 2025+)
Subagents	Legacy: no. Antigravity: yes (async)	Yes
Plugins	Extensions (Plugins on Antigravity)	Plugins
IDE integration	VS Code Companion	VS Code, JetBrains
Entry pricing	Free until June 18; then $19.99/mo Google AI Pro (Antigravity)	$20/mo Pro (limited)
Paid mid-tier	$19.99/mo Google AI Pro	$100/mo Max 5x
Top pricing	$249.99/mo Google AI Ultra	$200/mo Max 20x
API input/output (per 1M)	$1.25 / $10 (Gemini 2.5 Pro ≤200K)	$5 / $25 (Opus 4.7)
Open governance	Gemini CLI: yes (Apache 2.0). Antigravity: no	No
CSAT / NPS	Not publicly reported	91% / 54 (JetBrains, 2026)
Recent CVEs	None publicly disclosed	4 patched in early 2026

The matrix tells you what. It doesn't tell you what to actually do. The next section does.

Which One Should Be Your Daily Driver in 2026?

Use this framework. The decision isn't "which is better" — it's "which fits the work you do most."

Pick Gemini CLI (or its Antigravity successor) as your daily driver if any of these apply, with the caveat that after June 18, "Gemini CLI" effectively means Antigravity CLI for free, Pro, and Ultra accounts:

You're cost-sensitive or just starting out — the free tier is genuinely usable, not a teaser, through June 18; after that the $19.99/mo Antigravity tier is the cost-sensitive pick.
You work on document-heavy or whole-repo reasoning tasks where 1M context is load-bearing.
You need open-source guarantees for legal, audit, or air-gapped reasons — but note this holds for the legacy Gemini CLI, not its closed-source Antigravity successor. If open governance is a hard requirement, that's now a reason to look elsewhere, not toward Antigravity.
You're already inside the Google Cloud ecosystem and Vertex AI billing makes sense.
You want background multi-agent orchestration — Antigravity's async subagents are a genuine reason to move, not just a forced migration.

Pick Claude Code as your daily driver if:

You spend most of your time in repos you trust, doing focused refactor and debug work.
You'll actually use Skills, Subagents, and Hooks (otherwise you're paying $100+/mo for a CLI that's roughly comparable on the surface).
You value model recovery behavior on hard tasks more than raw context size.
You can afford Max 5x or Max 20x.
You want the highest reported developer satisfaction (91% CSAT) and don't mind paying for it.

Run both if:

You're doing 6+ hours of agent-heavy work daily.
You want a second opinion on hard tasks — one agent's stuck plan often unblocks fast in the other.
You work across language ecosystems where the models diverge. I find Claude stronger on PHP, Ruby, and complex TypeScript refactors. Gemini is stronger on Python, data work, and anything multimodal.
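The framework above can be compressed into a small decision helper. This is a deliberate simplification of the bullets, not a rule anyone should follow blindly: the thresholds (three and six hours a day, $100 for Max 5x, roughly $20 for Google AI Pro) come from this article, while the function name and the order of the checks are my own choices.

```python
CLAUDE_MAX_5X_USD = 100  # monthly price quoted in this article
GOOGLE_AI_PRO_USD = 20   # rounded from the $19.99 quoted in this article


def recommend(hours_per_day: float, monthly_budget_usd: float,
              needs_open_source: bool = False) -> str:
    """Map the article's rules of thumb to a single suggestion."""
    if needs_open_source:
        # Antigravity is closed-source, so the open license only holds here.
        return "legacy Gemini CLI with a paid API key"
    if hours_per_day >= 6 and monthly_budget_usd >= CLAUDE_MAX_5X_USD + GOOGLE_AI_PRO_USD:
        return "both"
    if hours_per_day > 3 and monthly_budget_usd >= CLAUDE_MAX_5X_USD:
        return "Claude Code (Max 5x)"
    return "Antigravity CLI (Google AI Pro)"


if __name__ == "__main__":
    for hours, budget in [(2, 200), (4, 100), (7, 150), (8, 20)]:
        print(f"{hours}h/day, ${budget}/mo -> {recommend(hours, budget)}")
```

It leaves out the softer criteria (language ecosystem, document-heavy work, plan mode), which is exactly where personal judgment should override the helper.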

Pragmatic Engineer's survey reflects how the market is actually behaving: 70% of engineers run 2–4 AI tools simultaneously, and 15% run five or more (The Pragmatic Engineer, 2026). Treating this as a one-winner question is the wrong frame. The cost of running both, especially when Gemini CLI's free tier is one of them, is a rounding error for any working developer.

My current setup, in case it's useful: Claude Max 20x is the daily driver. Gemini CLI free tier is the second opinion (and becomes Antigravity CLI once the June 18 cutoff lands). When Claude gets confused on a long task — usually around the 45-minute mark on something architecturally tangled — I'll fork the conversation, paste the state into a fresh Gemini session, and let the 1M context chew on the whole repo. The disagreement between them is more useful than either single answer.

the full multi-model workflow including ChatGPT and Grok

Frequently Asked Questions

Is Gemini CLI really free, or does the free tier have hidden costs?

Through June 18, 2026, the free tier is genuinely free for personal Google account holders — 1,000 model requests per day at 60 per minute, with the full 1M context window, no credit card required (Gemini CLI Docs, 2026). Google does collect data to improve models on the free tier, which is the trade-off. After June 18, that free serving ends for personal accounts; you move to Antigravity CLI or keep Gemini CLI on a paid API key. For commercial work where data privacy matters, you'll want a paid Google AI Pro plan or Vertex AI billing either way.

Does Claude Code have a free tier I can try first?

No. Claude Code requires at least an Anthropic Pro subscription at $20/month, and serious daily-driver use realistically starts at Max 5x ($100/month) (Anthropic Pricing, 2026). There's no free tier in 2026. You can use pay-per-token API access without a subscription, but it adds up fast for agent-heavy work.

Can Gemini CLI use MCP servers, or is that Claude Code only?

Both products support MCP. Gemini CLI shipped MCP support with stdio and HTTP transports, configured through ~/.gemini/settings.json (Gemini CLI MCP Docs, 2026). The MCP ecosystem now has roughly 2,300 public servers, and most run in both tools without modification. The Claude-only advantages are Skills and Subagents — MCP is fully shared territory.

Is Gemini 3.1 Pro actually competitive with Claude Opus 4.7 for coding?

Close, but not at parity. Gemini 3.1 Pro scores 80.6% on SWE-bench Verified vs 87.6% for Claude Opus 4.7 (DeepMind, 2026; Anthropic, 2026). The gap is roughly 7 points on the easier benchmark and 10 points on SWE-Bench Pro. In daily use, Claude feels noticeably more careful on hard refactors. Gemini is faster on greenfield work where caution isn't load-bearing.

Is Gemini CLI being shut down?

The original Gemini CLI is being deprecated, not deleted. Starting June 18, 2026, it stops serving requests for free personal, Google AI Pro, and Google AI Ultra accounts (Google Developers Blog, 2026). Standard and Enterprise organizations keep full access, and the Apache 2.0 source stays public. For most individual developers, though, the practical answer is: yes, your free Gemini CLI stops working that day unless you switch to a paid API key.

What replaces Gemini CLI?

Antigravity CLI replaces it. Announced at Google I/O 2026 on May 19, it's a Go-based rewrite that ships as part of a standalone agent platform — CLI, SDK, and desktop app on a shared runtime (MarkTechPost, 2026). It carries over MCP support and adds first-class Skills, Hooks, and async Subagents. Google's migration guide lives at antigravity.google/docs/gcli-migration.

Is Antigravity CLI better than Claude Code?

It depends on what you weight. Antigravity CLI wins on price ($20/mo Google AI Pro unlocks the full platform vs $100/mo Claude Max 5x) and background multi-agent orchestration. Claude Code still leads on raw model quality (Opus 4.7 at 87.6% SWE-bench Verified vs Gemini 3.1 Pro at 80.6%) and a more mature, battle-tested extension ecosystem (Anthropic, 2026). For hard, trust-sensitive refactors I still reach for Claude; for cost and async background work, Antigravity is compelling.

Should I switch from Gemini CLI now, or wait?

If you're on a free, Pro, or Ultra account, start before June 18 rather than on the deadline. The migration is straightforward — your GEMINI.md, MCP configs, and most extensions carry over to Antigravity CLI with minor edits (Google Developers Blog, 2026). If open-source governance was your reason for choosing Gemini CLI, this is the moment to evaluate alternatives, because Antigravity is closed-source and that requirement no longer points to Google's tooling.

Conclusion

If you only have one slot in your shell, here's the honest verdict: Claude Code is the daily driver I'd still recommend for most professional engineers working in trusted repos with budget for $100+/month. The 91% CSAT and the depth of the Skills + Subagents ecosystem aren't accidents. But the comparison just shifted under everyone's feet. The reason most people reached for Gemini CLI — a genuinely free, generous tier — stops being an option for personal accounts on June 18, 2026, and the open-source argument doesn't survive the move to closed-source Antigravity CLI.

So the real 2026 choice for most readers is Antigravity CLI vs Claude Code, and there it's closer than the legacy matchup ever was: Antigravity matches the feature checklist and undercuts the price, while Claude holds the edge on model quality and ecosystem maturity. If you were a free-tier Gemini CLI user, the cleanest path is to migrate to Antigravity before the deadline, then re-run this comparison against Claude with fresh eyes.

The smart move for anyone doing serious work is still to run both. Claude Code as primary, Antigravity CLI as second opinion — or vice versa, depending on your budget and the shape of your repos. The terminal-agent paradigm is the new default for AI-assisted coding. Pick the one that fits the work you do most, mind the June 18 deadline if Gemini CLI is in your stack, and don't agonize over the rest. The only mistake is staying on the sidelines while everyone else ships faster than you.

patterns for orchestrating subagents on long-running tasks

Claude Code vs Cursor in 2026: 12 Months on Both

Nishil Bhave — Fri, 22 May 2026 15:19:39 +0000

Most "Claude Code vs Cursor" posts you'll find online either declare a winner from a feature table they never actually used, or refuse to declare one at all because both products run ads. I'll do neither. I've used both as primary tools for the last twelve months — and after running real production work through both, Claude Code wins for individual developers who reach for premium models. Cursor still earns its $20/month, but as the editor partner, not the lead. The rest of this piece is the math and the experience that gets you there.

Cursor is an AI-augmented IDE where you sit at the wheel and the model rides shotgun. Claude Code is an agentic terminal process where the model drives and you're the safety driver, hands hovering, ready to grab the bar. Both shapes of work happen in a typical day, which is why The Pragmatic Engineer's 2026 developer survey shows 70% of engineers run two to four AI tools at once and 15% run five or more (The Pragmatic Engineer, 2026). This piece is the honest comparison I wish I'd had when I started: what each tool actually is, where each one genuinely wins, the cost-structure trap most people don't see, and how to pick when you can only run one.

if you're comparing terminal-only agents, start here instead

Key Takeaways

Claude Code is the right primary tool for most individual developers in 2026, especially anyone running Opus 4.7 for real work, where Cursor's credit-pool model burns ~$45 in a day and Claude Code's session-cap model doesn't.

Claude Code holds a 91% CSAT and 54 NPS — the highest of any coding tool JetBrains surveyed in 2026 — and 46% of engineers named it the tool they love most, 2.4x Cursor's 19% (JetBrains Research, 2026; The Pragmatic Engineer, 2026).

Cursor still earns its $20/month as the editor partner: Tab completion, Cmd+K inline edits, and Composer V2 are genuine wins. The right setup for most professionals: Claude Code Max 5x ($100) as the primary tool + Cursor Pro ($20) as the editor, running Claude Code inside Cursor's terminal.

Why Are Claude Code and Cursor Built on Different Philosophies?

First, kill a misread before it spreads: AI coding tools are not niche anymore. By January 2026, 90% of developers were using at least one AI tool at work (JetBrains Research, 2026), and Stack Overflow's 2025 survey put it at 84% using or planning to, with 51% reaching for AI daily (Stack Overflow, 2025). So when you see an "18%" figure in this article, it does not mean only 18% of developers use AI. It means 18% reach for that one specific tool. The question stopped being whether developers use AI and became which tool they reach for — and that split is where the real story lives. GitHub Copilot leads it at 29%, with Cursor and Claude Code tied at 18% each.

The fundamental split is who is driving. Cursor and Claude Code sit tied at 18%, but Claude Code climbed there from roughly 3% in April 2025 — a 6x jump in nine months — while Cursor has hovered at this level for over a year (JetBrains Research, 2026). Same headline number, completely different growth shape. The difference reflects what each product is asking the developer to do.

Cursor is a fork of VS Code with deep AI integration baked into every surface. Tab completion happens as you type. Inline edits accept a comment, return a rewritten block. Composer, Cursor's agent mode, opens a chat where you describe a task and watch it spread across files. The cursor — the literal blinking caret on your screen — is the central metaphor. You're still the one moving it, choosing what to accept, deciding when to drive deeper.

Claude Code refuses that framing entirely. You type a goal, hit enter, and the agent reads files, runs commands, writes code, runs tests, and proposes a final diff. There is no caret. There is no editor window. There is a transcript of decisions you can scroll back through. When I'm using Claude Code well, I'm not editing — I'm reviewing. The model is doing the typing. My job is to set the goal, watch the plan, and reject the bad ones before they propagate.

This is not a UX preference. It's a fundamentally different contract about who owns the next keystroke. Once you internalize that split, every other feature comparison falls into place. The 91% CSAT score Claude Code earned in the JetBrains survey — the highest of any AI coding tool they measured — and the 46% "love most" vote in Pragmatic Engineer's survey aren't accidents (JetBrains Research, 2026; The Pragmatic Engineer, 2026). They reflect a category of developer who'd rather give up the steering wheel than fight an autocomplete suggestion.

how the agentic loop actually composes work

Where Does Cursor Actually Win in 2026?

Cursor wins anywhere the work is "I know what I want, I just don't want to type it." That covers a surprisingly large share of daily engineering. It also explains why Cursor crossed $1 billion in annualized revenue by November 2025 and reportedly raised at a $29.3 billion valuation in its Series D (Sacra, 2025). It's a developer-driven IDE that wins on inline ergonomics.

Tab completion is where Cursor's lead is hardest to argue with. The model knows your codebase, your imports, your naming conventions. It finishes the line you were about to write before you've thought about how to write it. For a senior developer with a clear mental model, this is faster than dictating to an agent — you don't lose the flow state of typing. Cmd+K inline edits are the second killer feature. Select a function, describe what you want changed, accept or reject the diff. It's basically a turbocharged refactor menu and it stays close to the developer's intention because the developer is the one selecting what to change.

My finding: Last week I converted 47 React class components to function components across our admin panel. Cursor with Tab + Cmd+K finished it in under two hours. I tried the same task with Claude Code in agentic mode the prior week and it took longer because the agent kept exploring files I didn't need touched. Different shapes of work, different tools.

Cursor 2.0 shipped in late 2025 with a multi-agent mode that runs parallel attempts in git worktrees and a frontier in-house model called Composer that targets sub-30-second turns (Cursor Blog, 2025). Composer is now on V2 and it's genuinely impressive for what it is: fast, cheap, and good enough for a large slice of day-to-day work where you don't need frontier reasoning. The honest tradeoff: Composer V2 is not on par with Claude Opus 4.7 for hard architectural reasoning, gnarly debugging, or multi-step planning across unfamiliar code. For everything else, it punches above its price.

The other genuine Cursor advantage is the model picker. From the same editor I can switch between Claude Sonnet, Claude Opus, GPT-5, Gemini, and Composer V2 depending on the task. Premium model when the work is hard, Composer or Sonnet when it's mostly typing. Claude Code, by definition, runs Claude only. If your workflow benefits from comparing model outputs or picking the cheapest model that's good enough for a given task, Cursor's multi-model surface is a real workflow win, and one I underweight when I evangelize Claude Code to colleagues.

That said, agentic work still isn't where Cursor feels most native. The product's center of gravity is the cursor in your editor, not a long-running process in your terminal. That center hasn't moved.

If you're doing inline edits, refactors, UI iteration, working through tutorials, or pair-programming on a problem you already understand, Cursor is genuinely faster than the alternative. Bloomberg reported over 1 million daily active users and 50,000 businesses on Cursor, with 64% of Fortune 500 listed as customers (Cursor Enterprise, 2026). These aren't curious developers. They're teams that ship every day.

Where Does Claude Code Actually Win?

Claude Code wins whenever the work is "I want this outcome, please figure out how." That's the agentic shape: long-running, multi-file, requires planning, benefits from running tests and iterating. The 91% customer satisfaction score and the fact that 46% of engineers named it the tool they love most (almost two and a half times Cursor's 19%) both come from this category of work (JetBrains Research, 2026; The Pragmatic Engineer, 2026).

The work where Claude Code genuinely shines: migrations like Node 18 to 22 across a monorepo, jQuery to Vue, or Webpack to Vite. Test scaffolding across dozens of files. End-to-end debugging where you need to read logs, trace through code, and run a fix. Any task where the right answer involves "first read these eight files." It plans. It writes a checklist. It backs out cleanly when you say no. Anthropic's own revenue tells the same story. Claude Code reportedly crossed $1 billion in annualized revenue by November 2025 and helped push Anthropic's total ARR to $19 billion by early March 2026 (Reuters via Yahoo Finance, 2026).

Here's what nobody talks about: Claude Code is better at stopping. When Cursor's agent mode misreads a task, it tends to keep going. It'll happily refactor adjacent code, add tests you didn't ask for, modify config files in service of a goal you didn't set. Claude Code asks more often. It writes a plan first, and the plan stage is genuinely useful for catching misalignment before any code gets written. For risky changes — production code, security-sensitive logic, anything touching billing or auth — that pause is the difference between a clean PR and a postmortem.

why Claude Code's hook layer matters for safe agentic execution

The extensibility surface matters more than benchmarks here. Claude Code ships Subagents, Skills, Hooks, and MCP support: primitives for shaping how the agent behaves on your codebase. Cursor has rules and context files but doesn't expose the same depth of programmable agent control. If you spend a lot of time shaping how the AI approaches your code, Claude Code gives you more handles to grab.

How Do Claude Code and Cursor Compare on Pricing?

The pricing structures look superficially similar — both start at $20/month — but they cap usage in fundamentally different ways. Cursor uses a usage-based credit pool tied to plan price (Pro is $20/month with $20 in API credits, premium models bill down from the pool, Auto mode stays unlimited). Claude Code uses a token budget within rolling 5-hour and 7-day windows (Anthropic Help Center, 2026).

Here's the actual feature matrix as of May 2026:

Capability	Cursor Pro ($20/mo)	Claude Code Pro ($20/mo)	Cursor Ultra ($200/mo)	Claude Code Max 5x ($100/mo)
Primary surface	IDE (VS Code fork)	Terminal CLI + IDE extension	IDE	Terminal CLI + IDE extension
Models included	Claude, GPT, Composer, others	Claude (Sonnet, Opus)	Same with bigger pool	Same with bigger limits
Usage cap	$20 in API credits + unlimited Auto	~40-80 Sonnet hrs/week, 5-hr windows	$200 in credits + unlimited Auto	5x the Pro limits
Agent mode	Composer + multi-agent worktrees	Native agentic (the whole product)	Yes	Yes
Tab completion	Yes (signature feature)	No	Yes	No
Hooks / Skills / Subagents	No	Yes	No	Yes
MCP support	Yes (via extension)	Yes (native)	Yes	Yes
Free tier	Hobby (limited completions)	Anthropic Free (basic Claude.ai access)	n/a	n/a
Enterprise	Teams $40/user, Enterprise custom	Team / Enterprise via Anthropic	Included	Included

I'm showing Claude Code Max 5x ($100) in the right-hand column instead of Max 20x ($200) because that's the tier most individual developers actually settle on — Max 20x is built for heavy parallel-agent workloads and team accounts, not solo daily use. If you want the full breakdown of which Claude Code tier maps to which workload, see the honest Pro vs Max breakdown after cycling through every tier.

Here's the part the spec sheet hides: at $20/month, the cost-structure favors Claude Code for anyone reaching for premium models more than occasionally. Cursor's credit pool ticks down per token whenever you use Claude Opus, GPT-5, or any non-Auto model, so a heavy day on frontier models drains the wallet fast. Claude Code's 5-hour window resets and weekly caps mean you can throw long-running Opus tasks at it without watching the meter. The only scenario where Cursor's pricing model is cleanly better is if you genuinely live in tab completion and Auto mode and almost never reach for premium models. In that case the credit pool barely moves and you get unlimited Auto on top.

My finding: I run Claude Code Max 5x ($100/month) alongside Cursor Pro ($20/month) — combined $120/month covers both shapes of work for an individual developer. Max 5x is the sweet spot here. Max 20x at $200 is overkill unless you're running multiple parallel agentic sessions all day or working as part of a team that shares one account. For solo daily use, 5x gives me enough Opus headroom that I almost never hit a wall, while Cursor Pro handles tab + inline edits where the credit pool stays low because Auto mode covers most of it.

One nuance worth flagging: Anthropic announced on May 13, 2026 that Claude Code's weekly limits are getting a 50% bump and 5-hour limits are doubling through July 2026 — a temporary increase tied to capacity coming online. If you've been bouncing off Claude Code's caps, the ceiling just moved up significantly.

My finding — the Cursor burn-rate nobody warns you about: When I activated Cursor Pro, my account showed $20 in API credits + a $25 bonus credit = $45 total in the pool. Running Claude Opus 4.7 on heavy multi-file refactor and debugging work, I burned through the entire $45 in roughly a single day. Not a heavy week. A day. That's the part no other comparison article tells you: Cursor's $20 Pro tier is functionally a $45 starter wallet for premium model usage, and premium models drain it fast. After that, you either stop, switch to Auto mode (which uses cheaper models you may not want), or top up at API rates.

Claude Code Pro is architecturally different. It's not a credit pool — it's session-based caps. You can use Opus until you hit the 5-hour rolling window, then you wait it out and keep going. There is no scenario where you "burn through" your $20/month in a single afternoon. You might hit a session limit and have to pause for a few hours, but your monthly budget is structurally protected from token-burn. If you run Opus heavily, Claude Code's session-cap model is structurally cheaper than Cursor's credit-pool model at the $20 tier — and the math gets more favorable for Claude Code the more frontier-model work you do.
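To make the difference between the two cap structures concrete, here is a minimal Python sketch. The $45 pool and the roughly $45-per-day Opus burn come from the experience described above. The function names, the 22-working-day month, and the idea of pricing top-ups at face value are simplifying assumptions for illustration, not either vendor's billing logic.

```python
def days_until_pool_empty(pool_usd: float, daily_spend_usd: float) -> float:
    """Days a prepaid credit pool lasts at a constant daily premium-model spend."""
    if daily_spend_usd <= 0:
        return float("inf")  # e.g. Auto-only usage that never draws on the pool
    return pool_usd / daily_spend_usd


def credit_pool_monthly_cost(plan_usd, pool_usd, daily_spend_usd, working_days=22):
    """Plan price plus the top-ups needed to keep the same spend going all month.

    Assumption: top-ups cost their face value and spend is identical every working day.
    """
    total_spend = daily_spend_usd * working_days
    top_up = max(0.0, total_spend - pool_usd)
    return plan_usd + top_up


def session_cap_monthly_cost(plan_usd):
    """A session-capped plan costs the plan price; hitting the cap costs waiting time, not money."""
    return plan_usd


if __name__ == "__main__":
    # First month on the $20 tier, with the one-time bonus included in the pool.
    print(f"{days_until_pool_empty(45, 45):.1f}")   # 1.0
    print(credit_pool_monthly_cost(20, 45, 45))     # 965
    print(session_cap_monthly_cost(20))             # 20
```

The sketch deliberately leaves out what the session cap does cost you: the hours spent waiting for a window to reset. Lower the daily spend toward zero and the credit-pool figure collapses back to the plan price, which is the Tab-and-Auto-mode case where Cursor's model comes out ahead.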

Can You Run Claude Code Inside Cursor? (Yes, and It's Great)

Anthropic's official VS Code extension installs natively in Cursor, since Cursor is a VS Code fork, and the claude CLI runs in Cursor's integrated terminal (Claude Code Docs, 2026). This is the setup most experienced users I know are running. You get Cursor's UI for editing (Tab completion, Cmd+K, Composer when you want a quick AI-assisted change) and Claude Code in the terminal for everything agentic.

The setup takes about three minutes. Install Cursor. Open the integrated terminal (Ctrl+`). Install Claude Code from npm with npm install -g @anthropic-ai/claude-code. Run claude in the terminal. Authenticate with your Pro or Max plan. Done. The Claude Code panel even appears as a sidebar tab inside Cursor when you install the VS Code extension companion, so you don't have to context-switch between terminal and editor.

Why this works so well: the two tools don't fight each other. Cursor doesn't know what's happening in the terminal. Claude Code doesn't try to take over your editor. They share the same working directory — when Claude Code edits a file, Cursor picks up the change and re-indexes. When you make an inline edit in Cursor, Claude Code reads the updated file on its next turn. The integration is essentially "they're both pointing at the same git repo."

The one rough edge: if you have both Cursor's agent mode and Claude Code editing files at the same time, you can get merge confusion. The fix is simple: don't run both agents simultaneously on overlapping files. Use Cursor's Composer for one thing, Claude Code for another, and treat them as separate sessions. In practice I almost always have one of them inactive while the other works.
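One way to enforce the "no overlapping files" rule is to check what is already modified in the working tree before handing files to the second agent. The sketch below is a hypothetical helper, not a feature of either tool: it parses the output of git status --porcelain and intersects it with the list of files you plan to give the other agent. It assumes the short porcelain format and does not handle quoted paths that contain spaces or special characters.

```python
import subprocess


def changed_paths(porcelain_output: str) -> set[str]:
    """Parse `git status --porcelain` text into the set of touched paths."""
    paths = set()
    for line in porcelain_output.splitlines():
        if len(line) < 4:
            continue  # too short to hold a two-character status, a space, and a path
        path = line[3:]  # skip the status columns and the separating space
        if " -> " in path:
            path = path.split(" -> ", 1)[1]  # for renames, keep the new name
        paths.add(path)
    return paths


def overlapping(changed, planned):
    """Files that are both already modified and about to be handed to another agent."""
    return sorted(set(changed) & set(planned))


def working_tree_changes() -> set[str]:
    """Run git in the current directory and return the modified paths."""
    result = subprocess.run(
        ["git", "status", "--porcelain"],
        capture_output=True, text=True, check=True,
    )
    return changed_paths(result.stdout)
```

A typical use would be calling overlapping(working_tree_changes(), planned_files) before starting the second session and committing or stashing first if the result is non-empty.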

if you want a code review skill that works across both tools

What Does My Actual Daily Workflow Look Like?

I'll describe a typical day in May 2026, since this is where the comparison gets honest. The Pragmatic Engineer's survey found 95% of developers now use AI tools at least weekly and 75% report AI handles at least half their engineering work — these numbers describe my own work pretty accurately (The Pragmatic Engineer, 2026).

Mornings I'm usually exploring something: reading a new library, prototyping a UI change, debugging a tricky bit of state. That's Cursor time. I keep the editor open, use Tab heavily, hit Cmd+K when I want to rewrite a block of code, and occasionally drop into Composer for a contained change ("rewrite this component to use the new useStore hook"). Speed-to-thought is the priority and Cursor's keystroke loop is unbeaten here.

Afternoons tend toward bigger work: a refactor, a feature that touches many files, a migration script, writing tests for an untested module. That's Claude Code time. I describe the goal, watch the plan, approve the tool calls, and let it run while I review the diff. Claude Code's terminal output is the work surface. I might check Cursor periodically to scroll through what changed, but the agent is in charge.

The hardest part of this workflow isn't switching between tools. It's noticing which mode you're in. Am I exploring (Cursor) or executing (Claude Code)? Am I editing (Cursor) or supervising (Claude Code)? Once you can answer that question reliably, the dual setup gets out of the way. Most of the meta-cognitive overhead disappears within a couple of weeks.

One contrarian data point worth sitting with: a METR randomized controlled trial of 16 experienced open-source developers in early 2025 found AI tools actually slowed them down by 19% on average, even though developers believed they were 20% faster (METR, 2025; https://metr.org/blog/2025-07-10-early-2025-ai-experienced-os-dev-study/). METR is rerunning the experiment for 2026 agentic tools because they suspect the result may not hold for current Claude Code / Composer (METR Update, 2026). The lesson isn't that AI tools don't help. The lesson is that perceived productivity and measured productivity diverge. Pay attention to what you actually ship, not how fast it feels.

Which One Should You Pick If You Have to Pick One?

If you can only run one tool, pick based on the shape of the work you do most days, not the spec sheet. Here's the decision framework I'd actually use:

Pick Cursor if you:

Spend most of your day editing existing code, not planning big changes
Value tab completion and Cmd+K inline edits above everything else
Work primarily on UI, frontend, or any code with tight visual feedback loops
Prefer staying in an editor window and don't enjoy reading agent transcripts
Are part of a team that already standardized on Cursor and you need to match their setup
Want a tool that 64% of Fortune 500 already pays for (your IT department will not push back)

Pick Claude Code if you:

Spend most of your day planning, refactoring, debugging, or shipping multi-file work
Are comfortable in the terminal and prefer text-based interfaces
Want to extend the agent's behavior with hooks, skills, or subagents
Work on backend code, data pipelines, infrastructure, or anything where the win is "the agent reads ten files and figures out the right change"
Are willing to trade tab completion for stronger agentic execution
Run Opus heavily and want a session-cap cost model rather than a per-token credit pool that can burn out in a single day
Want the tool that 91% of users rate as satisfying and 46% of engineers love most

Pick both if you:

Already use AI tooling daily and want each shape of work to use the right tool
Have $40-220/month in budget for tools (most professionals do)
Are comfortable context-switching between an editor and a terminal
Work on a mix of inline edits and agentic tasks across a typical week

For most professional developers I'd push toward "both", but with a clear hierarchy. Claude Code Max 5x ($100/month) as the primary tool, Cursor Pro ($20/month) as the editor partner = $120/month. That's the setup that holds up after a full year of daily use. If $120 is over budget, drop to Claude Code Pro + Cursor Pro at $40/month. You'll hit the 5-hour session ceiling on heavy Opus days, but it's still a better starting point than going Cursor-only: Cursor Pro's $45 credit pool can burn out in about a day of heavy Opus use, while Claude Code Pro's session window resets after a few hours.
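The budget arithmetic behind these options can be sketched in a few lines of Python. The plan prices are the ones quoted in this article as of May 2026; the setup labels ("starter", "recommended", "heavy") and the helper names are hypothetical, chosen only for illustration.

```python
# Monthly plan prices in USD, as quoted in this article (May 2026).
PLAN_PRICES_USD = {
    "Cursor Pro": 20,
    "Cursor Ultra": 200,
    "Claude Code Pro": 20,
    "Claude Code Max 5x": 100,
    "Claude Code Max 20x": 200,
}

# Hypothetical labels for the "run both" combinations discussed above.
SETUPS = {
    "starter": ("Claude Code Pro", "Cursor Pro"),
    "recommended": ("Claude Code Max 5x", "Cursor Pro"),
    "heavy": ("Claude Code Max 20x", "Cursor Pro"),
}


def monthly_total(plans):
    """Sum the monthly price of a combination of plans."""
    return sum(PLAN_PRICES_USD[plan] for plan in plans)


def best_setup_for(budget_usd):
    """Return the label of the priciest listed setup that fits the budget, or None."""
    affordable = [
        (monthly_total(plans), label)
        for label, plans in SETUPS.items()
        if monthly_total(plans) <= budget_usd
    ]
    return max(affordable)[1] if affordable else None


if __name__ == "__main__":
    for label, plans in SETUPS.items():
        print(label, monthly_total(plans))
```

The three totals reproduce the $40, $120, and $220 figures used in this section, which is where the "$40-220/month" range comes from.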

Frequently Asked Questions

Is Claude Code better than Cursor?

For individual developers who use premium models regularly, yes. Claude Code wins on agentic multi-file work and ships the strongest extensibility primitives: Hooks, Skills, Subagents, MCP. It also earns the highest customer satisfaction score of any AI coding tool in JetBrains' 2026 survey at 91% CSAT, with 46% of engineers naming it their most-loved tool, 2.4x Cursor's 19% (JetBrains Research, 2026; The Pragmatic Engineer, 2026). The cost-structure also favors Claude Code at the $20 tier. Heavy Opus use can deplete Cursor Pro's $45 credit pool in a single day, while Claude Code Pro's session-cap model keeps your monthly budget structurally protected. Cursor still genuinely wins on inline edits, Tab completion, Cmd+K, and the multi-model picker. So the best setup most days is running both, with Claude Code as the primary and Cursor as the editor partner.

Can I use Claude Code inside Cursor?

Yes, fully. Cursor is a VS Code fork, so Anthropic's official VS Code extension installs and runs identically in Cursor (Anthropic Docs, 2026). Install it via the marketplace or run claude in the integrated terminal after installing the CLI. The two tools share the working directory and don't interfere with each other.

What's the price difference between Claude Code and Cursor in 2026?

Both start at $20/month for the Pro tier. Cursor uses a usage-based credit pool ($20 in API credits plus a $25 bonus credit at signup = $45 total, plus unlimited Auto-mode usage). Claude Code Pro gives you roughly 40-80 hours of Sonnet usage weekly within 5-hour rolling windows. Claude Code Max sits at $100/month (5x Pro limits, the individual sweet spot) and $200/month (Max 20x, for heavy parallel-agent workloads); Cursor's top tier is Ultra at $200/month (Anthropic Help Center, 2026). The nuance no one quantifies: Cursor's credit pool burns down per token when you use premium models like Claude Opus 4.7 — in my testing, heavy Opus use can deplete the full $45 starter pool in roughly a single day. Claude Code Pro uses session-based caps instead, so your monthly budget is structurally protected from a one-day burn-out.

Does Claude Code work in Cursor's agent mode?

Yes, but you should treat them as separate sessions. Run Claude Code in Cursor's integrated terminal for agentic, multi-file work. Use Cursor's Composer (agent mode) for shorter, contained changes you want to drive from the editor. Avoid running both agents on overlapping files in the same session to prevent merge conflicts.

Which AI coding tool has more users in 2026?

JetBrains' April 2026 research shows Cursor and Claude Code tied at 18% adoption among developers at work, behind GitHub Copilot at 29% (JetBrains Research, 2026). Cursor reports over 1 million daily active users and 64% of Fortune 500 as customers (Cursor Enterprise, 2026). Claude Code climbed faster, from 3% in April 2025 to 18% in January 2026.

My Verdict: Claude Code Wins, And Here's Why

I'll stop dancing around it. For an individual developer who works in code every day and reaches for premium models when the problem is hard, Claude Code is the right primary tool in 2026. Cursor remains useful — I run it too — but it sits in a supporting role, not the lead.

Here's the honest argument, stripped of feature-table diplomacy. Cursor genuinely wins on inline ergonomics. Tab completion, Cmd+K, Composer V2 — these are real strengths and I use them daily. The multi-model picker is a workflow advantage if you need to compare outputs across Claude, GPT, Gemini, and Composer. None of that is in dispute.

But the moment you start running Opus 4.7 for real work (the kind of work that pays for the tool), Cursor's economics break. $45 of credits, including the welcome bonus, lasts about a day. After that you're either paying more, switching to weaker models, or stopping. Claude Code's session-cap model is structurally different: there is no scenario where heavy daily Opus use drains your monthly budget. You wait out a 5-hour window and keep going. For sustained frontier-model work, Claude Code isn't just better; it's structurally cheaper. The one exception is the Tab-and-Auto-mode workflow that rarely touches premium models, where Cursor's credit pool barely moves.

Combine that with the agentic execution quality, the Hooks/Skills/Subagents extensibility, and the 91% satisfaction score (the highest of any AI coding tool JetBrains measured in 2026) and the verdict is clear (JetBrains Research, 2026). The 46% of engineers who named Claude Code their most-loved tool — almost two and a half times Cursor's 19% — aren't picking with their hearts (The Pragmatic Engineer, 2026). They're picking with their afternoons.

My actual recommendation: run Claude Code as your primary AI coding tool. Max 5x at $100/month is the sweet spot for individual developers. Add Cursor Pro at $20/month as your editor partner for the inline edits and Tab completion you'd otherwise miss. $120/month covers everything, and the cost ceiling stays predictable because Claude Code is doing the heavy lifting on a session-cap model — your budget can't disappear in an afternoon.

If you can only run one tool, pick Claude Code. It's the rare case where the most-loved tool is also the most economically sound for the work it does best.

Next, if you want to wire Claude Code into your existing tools and services, read the full MCP server setup playbook. If you want to extend what the agent can do without writing your own integrations, read how the Claude Code skills marketplace works and what to install first.

Claude Skills Marketplace: skills.sh & Shipping Your Own Skill

Nishil Bhave — Wed, 20 May 2026 15:55:33 +0000

Claude Skills Marketplace: skills.sh, Discovery, and Shipping Your Own Skill

The Anthropic SKILL.md spec landed publicly in late 2025; Vercel Labs opened skills.sh in February 2026 (InfoQ, 2026). Inside ninety days, the meta-discovery skill find-skills had crossed 1.6 million installs and Vercel's own frontend-design skill sat around 427,000 (skills.sh, 2026). That curve isn't normal for a 2026-launched developer ecosystem, and most Claude Code users I talk to don't know it exists.

I've shipped two skills to the marketplace in the last sixty days — codeprobe, a 9-category code reviewer, and youtube-inspector, a 4-skill toolkit for pre-watching videos. This Claude skills marketplace guide is the version I wish I'd had on day one: what a skill actually is, where the marketplace lives, the SKILL.md anatomy that ships, the case studies I have first-hand, and the distribution patterns that earn installs. For the deeper trade-off question — Skills versus MCP servers, when to use which — keep the full Skills vs MCP decision guide open in the next tab.

Key Takeaways

Claude Skills are filesystem-based capability modules (a SKILL.md plus optional scripts/ and references/ folders) that load on demand via progressive disclosure — roughly 100 tokens of catalog overhead per installed skill versus 5,000+ for an equivalent MCP server (Anthropic, 2026).

skills.sh is the install channel. Vercel Labs launched it February 2026; npx skills add <owner>/<repo> works for Claude Code, Cursor, Cline, GitHub Copilot, Gemini CLI, and roughly 48 other agent surfaces — every runtime that reads the SKILL.md format pulls from the same shelf.

Three other directories exist (ClaudeSkills.io, claudemarketplaces.com, SkillsMP) but treat them as discovery surfaces, not install paths. Every confirmed install still flows through the skills.sh CLI.

The shipping bar is low — a SKILL.md with name, description, and a 10-line instruction block is publishable. The hard part is writing the description well enough that the model picks the right skill for the job.

What Is a Claude Skill, Actually?

A Claude Skill is a folder with a SKILL.md file at the root: a YAML frontmatter block (name, description, allowed-tools) plus a body of instructions written for Claude to read on demand. Anthropic shipped the format publicly in late 2025, and it now runs identically across Claude.ai, Claude Code, and the Claude API via the skill_id parameter (Anthropic Engineering, 2026). The whole spec fits on a page.

The mental model that finally clicked for me: a skill is a recipe, an MCP server is a pantry. The pantry — your MCP servers — is open all the time, and the model pays the schema cost on every turn even when nothing is touched. A skill sits on the shelf until the model decides it's needed, at which point the runtime loads only the SKILL.md body for that turn. That's the progressive-disclosure pattern, and it's the entire reason skills exist as a category separate from MCP.

Source: Anthropic agent skills documentation, 2026
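The recipe-versus-pantry trade-off is easy to put numbers on. This rough Python sketch uses the two per-item figures quoted in the Key Takeaways (about 100 tokens of catalog overhead per installed skill, 5,000 or more per MCP server). The constant and function names are mine, and the MCP figure is treated as a lower bound, so real totals may be higher.

```python
# Approximate idle context overhead, using the figures quoted in this article.
SKILL_CATALOG_TOKENS = 100    # per installed skill, before any skill body is loaded
MCP_SERVER_TOKENS = 5_000     # per connected MCP server, lower bound ("5,000+")


def idle_overhead_tokens(n_skills: int, n_mcp_servers: int) -> int:
    """Tokens spent on capability catalogs before the model does any real work."""
    return n_skills * SKILL_CATALOG_TOKENS + n_mcp_servers * MCP_SERVER_TOKENS


if __name__ == "__main__":
    print(idle_overhead_tokens(10, 0))   # 1000
    print(idle_overhead_tokens(0, 10))   # 50000
```

Ten installed skills cost about what a fifth of one MCP server does while idle. The body of a skill is only added on the turns where the model decides to load it, which this sketch does not count.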

Skills can carry scripts/ (any executable Claude has permission to run — Python, shell, Node), references/ (long-form markdown docs the model pulls only when relevant), and even nested skills if the directory grows. But the minimum viable skill is a single SKILL.md with three frontmatter fields. Anthropic's 32-page playbook covers progressive disclosure in depth (Anthropic playbook, 2026), but you don't need to read it to ship something useful.
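As an illustration of how small the minimum viable skill is, the following Python sketch renders and writes a SKILL.md with the frontmatter fields named above. The helper names are hypothetical, the comma-separated format for allowed-tools is an assumption, and values are written without YAML quoting, so a description containing a colon would need escaping. Check the current Anthropic documentation before relying on any field format.

```python
from pathlib import Path


def render_skill_md(name, description, body, allowed_tools=None):
    """Build SKILL.md text: a YAML frontmatter block followed by the instruction body."""
    lines = ["---", f"name: {name}", f"description: {description}"]
    if allowed_tools:
        # Assumption: tools are listed as a comma-separated string.
        lines.append("allowed-tools: " + ", ".join(allowed_tools))
    lines += ["---", "", body.strip(), ""]
    return "\n".join(lines)


def write_skill(root, name, description, body, allowed_tools=None):
    """Create <root>/<name>/SKILL.md and return its path."""
    skill_dir = Path(root) / name
    skill_dir.mkdir(parents=True, exist_ok=True)
    path = skill_dir / "SKILL.md"
    path.write_text(
        render_skill_md(name, description, body, allowed_tools),
        encoding="utf-8",
    )
    return path


if __name__ == "__main__":
    print(render_skill_md(
        "git-cherry-pick-helper",
        "use when reviewing branch divergence or staging cherry-picks",
        "1. Show how the two branches diverge.\n2. Propose the commits to pick.",
    ))
```

Pointing write_skill at .claude/skills inside a repo would give a project-scoped skill. The description string is the part worth iterating on, since it is what the model reads when deciding whether to load the skill.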

Where they run matters because the runtime conventions differ. Claude.ai loads user skills from the upload UI; Claude Code reads from ~/.claude/skills/ (user) and .claude/skills/ (project); the API accepts skills via skill_id behind a beta header. The SKILL.md format itself is identical — port a skill from Code to .ai by copy-paste, then trim anything that needs a tool the destination doesn't expose. For the longer comparison against MCP servers — token costs, statefulness, when each primitive earns its slot — see the full Skills vs MCP decision guide.

How Does the Skills Marketplace Look in 2026?

The Claude Skills marketplace in 2026 is concentrated in one channel that handles installs and three aggregators that mostly handle discovery. skills.sh, operated by Vercel Labs, is the install channel. It launched February 2026 and works as a thin npm-style installer: npx skills add <owner>/<repo> pulls the skill from its GitHub source and lands the files in your local skills directory (skills.sh, 2026). The same command works for Claude Code, Cursor, Cline, GitHub Copilot, Windsurf, Gemini CLI, and roughly 48 other agent surfaces — every runtime that understands SKILL.md reads from the same shelf.

Source: skills.sh, ClaudeSkills.io, claudemarketplaces.com, SkillsMP (May 2026)

The skills.sh leaderboard surfaces what the ecosystem actually uses. find-skills (a meta-skill that searches the directory from inside any agent) has crossed 1.6 million all-time installs; Vercel's frontend-design skill sits around 427,000 (skills.sh, 2026). The long tail looks healthy — Vercel Labs, Microsoft Azure, Anthropic, Firebase, and a growing list of individual developers are all publishing. The installer infrastructure is open-source at github.com/vercel-labs/skills if you want to read how the CLI resolves a repo to a local install.

The three aggregators are useful for discovery but don't change how you install. ClaudeSkills.io (and its .info sibling) is a categorized directory of 650+ skills with descriptions and use-case tags. claudemarketplaces.com lists roughly 6,700 entries pulled from various sources — broader coverage, less curation. SkillsMP bills itself as a cross-platform aggregator, mostly forks and mirrors. None run their own install commands. Click "install" on any of them and you'll either get redirected to the skills.sh installer or handed the GitHub URL to clone manually. Treat them as search engines, not registries.

The practical implication: if you want a skill discovered, ship it on GitHub, submit it to skills.sh, and let the aggregators scrape it later. Skipping skills.sh and listing only on an aggregator means none of your installs flow through a counted leaderboard — currently the only authority signal the marketplace has.

One caveat worth flagging: install counts on skills.sh are aggregate across every agent surface, not just Claude. A skill that's popular with Cursor users inflates the same number a Claude-only user sees, which makes the leaderboard a noisier authority signal than it looks. Useful for ranking, less useful for "is this skill any good for Claude Code specifically."

How Do You Use a Skill From the Marketplace?

The install path is two steps once npx is available. First, find a skill, either by browsing skills.sh in a tab or by running the meta-skill find-skills from inside any agent. Second, install it:

# Install a skill (works from Claude Code, Cursor, Cline, etc.)
npx skills add nishilbhave/codeprobe

# Or install the meta-discovery skill first
npx skills add anthropics/find-skills

# Pin to a specific commit or tag for reproducibility
npx skills add nishilbhave/codeprobe@v0.3.1

That's the whole install. The CLI clones the skill's GitHub repo into your local skills directory — ~/.claude/skills/<skill-name>/ for Claude Code's user scope, .claude/skills/<skill-name>/ if you're inside a project — and registers it with the runtime. On Claude.ai, skills are uploaded via the web UI rather than the CLI; on the Claude API, you pass the skill via skill_id with the right beta header (Anthropic, 2026).

How a skill activates is the part that trips most people on first install. Skills don't run on every turn — they load only when Claude decides the user's request matches the skill's frontmatter description. A skill named git-cherry-pick-helper with the description "use when reviewing branch divergence or staging cherry-picks" fires only when the conversation actually mentions one of those things. If your skill never seems to load, the fix is almost always rewriting the description, not the body.

Inspecting and removing skills works the same way on every runtime. Run ls ~/.claude/skills/ to see what's installed. Run npx skills list to see what's registered. Run npx skills remove <name> to uninstall — or just delete the directory if you prefer. Skills don't persist state outside their own folder, so there's nothing to garbage-collect.
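If you'd rather script the inspection step than eyeball ls output, something like the following works as a rough sketch. It assumes the layout described above (one folder per skill, each holding a SKILL.md); the function name list_skills is made up for illustration and is not part of any CLI.

```python
from pathlib import Path


def list_skills(skills_dir: Path) -> list[str]:
    """Return names of subdirectories that contain a SKILL.md file."""
    if not skills_dir.is_dir():
        return []
    return sorted(
        child.name
        for child in skills_dir.iterdir()
        if child.is_dir() and (child / "SKILL.md").is_file()
    )


if __name__ == "__main__":
    # User scope first, then project scope relative to the current directory.
    for scope in (Path.home() / ".claude" / "skills", Path(".claude") / "skills"):
        print(scope, list_skills(scope))
```

A folder without a SKILL.md is skipped, which is a quick way to spot half-deleted installs.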

A few practical patterns worth memorizing:

Pin to a SHA when reproducibility matters. npx skills add owner/repo@<sha> installs a specific commit. The default install pulls main, which can change under you. If you're shipping skills to a team config, pin them — the same supply-chain hygiene argument that applies to MCP servers in the Claude Code MCP server configuration guide applies here too.

User scope by default; project scope when team-shared. Personal skills go in ~/.claude/skills/. Team-shared skills go in .claude/skills/ at the repo root and commit cleanly. Narrower scope wins on a name collision — same precedence rule MCP servers follow.

Don't install more than you need. Each registered skill costs roughly 100 tokens of catalog overhead on every turn — small per skill, but a hundred installed skills is 10,000 tokens of overhead before the model loads any actual SKILL.md body. The leaderboard rewards installs; your context window rewards restraint.
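The catalog arithmetic is easy to check for your own setup. This is a back-of-envelope sketch only: the 100-token figure is the rough estimate quoted above rather than a measured constant, and the 200,000-token window in the usage line is an assumed number you should swap for your model's real limit.

```python
CATALOG_TOKENS_PER_SKILL = 100  # rough per-skill estimate from the text, not a measurement


def catalog_overhead(installed_skills: int,
                     tokens_per_skill: int = CATALOG_TOKENS_PER_SKILL) -> int:
    """Tokens spent on skill names and descriptions before any body loads."""
    return installed_skills * tokens_per_skill


def share_of_context(installed_skills: int, context_window: int) -> float:
    """Fraction of the context window taken up by the catalog alone."""
    return catalog_overhead(installed_skills) / context_window


if __name__ == "__main__":
    print(catalog_overhead(100))           # 10000
    print(share_of_context(100, 200_000))  # 0.05, assuming a 200k-token window
```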

For Claude Code specifically, the version that matters is 2.x — earlier versions used a different registry path and an older SKILL.md schema. Run claude --version to confirm; if you're on 1.x, the install path above won't work until you update via the Claude Code install guide.

What Goes Inside a SKILL.md File?

The minimum viable SKILL.md is fifteen lines. The frontmatter declares three things — what the skill is called, when to use it, and what tools it's allowed to call. The body is the actual instructions. Here's the smallest publishable example:

---
name: commit-message-pro
description: |
  Use when the user asks to write a git commit message, draft a commit, or
  improve an existing message. Generates conventional commits with scope, type,
  and a clear subject line under 72 characters.
allowed-tools: ["Bash", "Read"]
---

# Commit Message Pro

When invoked:

1. Run `git diff --staged` to see what's actually being committed.
2. Detect the package or module changed from the file paths.
3. Choose a type from: feat, fix, refactor, docs, test, chore, perf.
4. Write a subject line under 72 characters, present tense ("add" not "added").
5. If the diff is non-trivial, add a body paragraph explaining the *why*.

Never include the words "various", "miscellaneous", or "updates" — they hide intent.

That's a complete skill. Drop it into ~/.claude/skills/commit-message-pro/SKILL.md, restart your session, ask Claude to write a commit, and the skill will load on demand. The body becomes part of the prompt only for that turn.

Frontmatter fields that matter. name is the directory name and the registry slug — kebab-case, no spaces. description is the single most important field in the whole file. Claude reads only the descriptions of installed skills on every turn (this is the progressive-disclosure shortcut) and decides which body to load based on that text alone. Write it as a when-to-use sentence, not a what-it-does one — "Use when the user is reviewing a pull request" beats "PR review helper" by a wide margin in practice. allowed-tools is an array of tool names the skill is permitted to invoke; ["Read", "Write", "Bash"] covers most needs.
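A small pre-publish lint can catch the two frontmatter mistakes described above. Treat this as a minimal sketch: read_frontmatter is a hand-rolled reader that only understands top-level key: value pairs and indented continuation lines (it is not a YAML parser), and the "starts with Use when" check is a heuristic based on the advice in this section, not a rule any runtime enforces.

```python
import re

KEBAB_CASE = re.compile(r"^[a-z0-9]+(?:-[a-z0-9]+)*$")


def read_frontmatter(text: str) -> dict[str, str]:
    """Collect top-level fields between the opening and closing --- lines."""
    lines = text.splitlines()
    if not lines or lines[0].strip() != "---":
        return {}
    fields: dict[str, str] = {}
    key = None
    for line in lines[1:]:
        if line.strip() == "---":
            break
        if line[:1] not in (" ", "\t", "") and ":" in line:
            # An unindented line with a colon starts a new field.
            key, _, value = line.partition(":")
            key = key.strip()
            value = value.strip()
            fields[key] = "" if value in ("|", ">") else value
        elif key is not None and line.strip():
            # Indented lines continue the previous field (e.g. a block description).
            fields[key] = (fields[key] + " " + line.strip()).strip()
    return fields


def lint(text: str) -> list[str]:
    """Return a list of human-readable problems; empty means nothing flagged."""
    fields = read_frontmatter(text)
    problems = []
    if not KEBAB_CASE.match(fields.get("name", "")):
        problems.append("name should be kebab-case")
    if not fields.get("description", "").lower().startswith("use when"):
        problems.append("description should open with a when-to-use phrase")
    return problems
```

Run against the commit-message-pro example, lint returns an empty list; a name like "PR Helper" with the description "PR review helper" trips both checks.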

Body structure. The body is just markdown for Claude to read. Two conventions earn their keep:

Open with "When invoked:" and a numbered list. It puts the model on rails. Claude reads top-to-bottom, and an explicit ordered procedure beats prose 9 times out of 10.
Separate when-to-use from how-to-execute. The frontmatter description tells Claude when to load the skill; the body tells Claude how to run it. Mixing them is the most common cause of a skill that loads at the wrong time or doesn't load when it should.

The scripts/ folder is where you put any code the skill needs to execute — Python, shell, whatever Claude can run with the tools you allowed. The skill body invokes them as needed (run scripts/extract.py). The references/ folder is for long-form documentation Claude pulls when relevant — API specs, style guides, glossaries. Reference files don't load by default; the body has to point to them.

Progressive disclosure is the architectural rule that makes this work. The catalog (every installed skill's name + description) is always in context, around 100 tokens per skill. The body of a specific skill loads only when Claude picks it. The scripts and references load only when the body asks for them. This is why a developer can install thirty skills without blowing their context window — versus an MCP setup where every connected server's full tool schema sits in context on every turn (Anthropic playbook, 2026).

The thing the playbook doesn't tell you: write the description three times before you ship. The first draft is always too vague. The second is too verbose. The third — usually after watching your skill not load on a request it obviously should have handled — is the one that actually works.

What I Learned Shipping codeprobe to skills.sh

I shipped codeprobe to skills.sh in April 2026 (github.com/nishilbhave/codeprobe). It's a 9-category code reviewer covering security, SOLID principles, architecture, error handling, performance, testing, code smells, design patterns, and framework best practices. Install path: npx skills add nishilbhave/codeprobe. Invocation: /codeprobe audit . from inside a project root. The skill scores each category 0–100 and emits findings as copy-pasteable fix prompts.

The original version was an MCP server. I rewrote it as a skill in a weekend and the result was both faster and easier to maintain. The reason maps directly to what skills are good at.

Why a skill and not an MCP server. A code review pass is a workflow recipe — read the diff, apply nine evaluation lenses, score per category, emit findings. There's no external API, no persistent state, no live data the server needs to broker. Every input it needs comes from the file system Claude already has access to. The MCP version ran a full JSON-RPC server, registered twelve tool schemas on every turn, and consumed roughly 9,000 tokens of overhead before I'd reviewed a single file. The skill version costs about 120 tokens until it activates — and it activates only when I actually ask for a review.

SKILL.md structure. The skill is one parent SKILL.md plus nine child references in references/security.md, references/solid.md, and so on. The parent body is short: five steps that read the project, auto-detect the stack (Python, TypeScript, React, PHP/Laravel, SQL), pick which of the nine categories actually apply, then pull the relevant reference files only for those categories. That's progressive disclosure two levels deep — the catalog points to the skill, the skill body points to the references, and each reference loads only when needed.

The auto-detect step matters. On a TypeScript-only repo, the SOLID and patterns references load but the PHP framework reference stays on disk. On a SQL-heavy backend, the security reference loads early and gets weighted higher in the final score. The whole engine works because Claude can read the file system before the skill commits to which lenses to apply.
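To make the inspect-first idea concrete, here is an illustrative sketch of stack detection by file suffix. It is not codeprobe's actual logic, which lives in the SKILL.md body as instructions for Claude rather than as code. The suffix table, the STACK_REFERENCES mapping, and the file names references/patterns.md and references/framework-php.md are all hypothetical.

```python
from pathlib import Path

# Hypothetical mapping from file suffix to a stack label.
SUFFIX_TO_STACK = {
    ".py": "python",
    ".ts": "typescript",
    ".tsx": "react",
    ".php": "php",
    ".sql": "sql",
}

# Hypothetical extra references that only some stacks need.
STACK_REFERENCES = {
    "php": ["references/framework-php.md"],
    "sql": ["references/security.md"],
}

# References assumed to apply to every stack.
ALWAYS = ["references/solid.md", "references/patterns.md"]


def detect_stacks(root: Path) -> set[str]:
    """Walk the project tree and collect a label for every known suffix."""
    return {
        SUFFIX_TO_STACK[p.suffix]
        for p in root.rglob("*")
        if p.is_file() and p.suffix in SUFFIX_TO_STACK
    }


def references_for(stacks: set[str]) -> list[str]:
    """Pick reference files for the detected stacks, without duplicates."""
    refs = list(ALWAYS)
    for stack in sorted(stacks):
        for ref in STACK_REFERENCES.get(stack, []):
            if ref not in refs:
                refs.append(ref)
    return refs
```

On a TypeScript-only tree this returns just the two always-on references, so the PHP file never gets read.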

Read-only is a feature, not a limitation. codeprobe never modifies user code. Every finding emits a separate fix prompt the user can copy-paste back to Claude for the actual change. I made this explicit in the frontmatter — allowed-tools: ["Read", "Glob", "Grep"], no Write, no Edit — so the model can't accidentally edit. The result is something teams will run on production code without a sandbox.

What I'd do differently. Two things. First, my initial description was "9-category code reviewer for security, SOLID, architecture, and more." It loaded inconsistently. Rewriting it as "Use when the user asks for a code review, security audit, architecture check, or wants to find code smells" — a when-to-use sentence rather than a what-it-does one — roughly doubled the activation rate within a day. Second, I shipped without the auto-detect logic in v0. Users on PHP projects got JavaScript-flavored feedback. I added auto-detect in v0.2 and the GitHub issues stopped. The lesson: skills run in whatever repo the user happens to be in. The skill has to inspect first, opine second.

The codeprobe skill works across 45+ agent surfaces — Claude Code, Cursor, Cline, GitHub Copilot, Windsurf — because they all read the same SKILL.md format. I didn't write any agent-specific code. That portability is the second-order reason skills are interesting: ship once, run everywhere.

How I Packaged youtube-inspector as Four Skills

The second skill — youtube-inspector (github.com/nishilbhave/youtube-inspector) — is a counterexample to "one skill per repo." It's actually four skills shipped together in a single repo:

Watch-or-skip verdict — analyzes a YouTube URL, returns a 0–10 score with a gap-analysis paragraph, recommends watch or skip
Section summaries — bulleted summary per ten-minute segment, flags skippable sections
Artifact extraction — pulls links, code snippets, tools mentioned, people referenced, timestamps
Claim inventory — verbatim transcript quotes for any substantive claim made

Install path: npx skills add nishilbhave/youtube-inspector. The four skills install together as a single bundle. Requirements: Python 3.13+, yt-dlp, youtube-transcript-api. MIT licensed.

Why four skills and not one mega-skill. The temptation when shipping something with multiple capabilities is to wrap them in one big SKILL.md and let the body decide which mode to run. I tried that first. The result was a description so vague — "use when analyzing YouTube videos" — that it loaded for queries it had no business loading for, and didn't load for queries it should have matched. Splitting into four narrower skills with sharp when-to-use descriptions fixed the activation problem. The user types "is this video worth watching?" and only the watch-or-skip skill fires. They type "extract the tools mentioned in this video" and only artifact extraction fires.

The shared infrastructure pattern. All four skills depend on the same Python scripts in scripts/ — download_transcript.py, parse_video.py, extract_artifacts.py. Rather than copy them four times, the parent directory holds the shared code and each child SKILL.md path-references it. The cost is one indirection in the body; the benefit is one place to update yt-dlp when YouTube changes its API again.

A real-world flag the skill caught. On a recent "$5,219 first month with this AI side hustle" video, the watch-or-skip skill returned a 3/10 with this note: "specific revenue claim is unsubstantiated; transcript shows no breakdown of store name, product spend, or refund rate; the thumbnail's $5,219 figure isn't corroborated by any screenshot in the video." I would have spent fifteen minutes watching before reaching the same conclusion manually. The skill reached it from the transcript in about twelve seconds. That's the kind of speed-up skills make easy that MCP servers don't.

What multi-skill packaging taught me. The directory model rewards sharp focus. If you find yourself writing a description with "or" in it ("use when the user wants X or Y"), split the skill. The catalog overhead of a second skill is around 100 tokens; the cost of a poorly-targeted skill that loads at the wrong time is unbounded. For the longer write-up on watch-or-skip specifically — what makes the scoring system non-trivial, why clickbait detection needs vision in addition to transcript — see my youtube-verdict skill walkthrough.

How Do You Get Your Skill Discovered?

Shipping a skill and getting it installed are two different problems, and most authors solve the first and ignore the second. The skills.sh install leaderboard is the only authority signal the marketplace currently has, and the way it ranks skills favors a small set of distribution patterns.

Source: Anthropic agent skills documentation, 2026

README first, SKILL.md second. The skills.sh listing pulls metadata from your GitHub README, not from the SKILL.md. A repo with a one-paragraph README will sit unnoticed even when the underlying skill is excellent. Lead the README with one line that finishes the sentence "Use this skill when…" — same shape as the SKILL.md description, but written for a human browser. Add one screenshot or terminal capture, an install command, and a usage example. That's it.

Submission flow. Open a PR to the vercel-labs/skills registry repo with your skill name, GitHub URL, category, and a one-line description. The Vercel Labs team has been merging within days through May 2026; once merged, your skill is installable via npx skills add owner/repo within hours.

Naming earns installs. Generic names (pr-helper, code-utils) lose discoverability fights to specific ones (pr-review-pro, git-cherry-pick-helper). The leaderboard ranks roughly by install velocity, and install velocity correlates with people typing the skill's domain into search. Pick a name a user would type if they didn't know the skill existed.

Cross-post deliberately. A skill listed only on skills.sh is missing the discovery layer. Submit to ClaudeSkills.io and let claudemarketplaces.com scrape it — both are free, and each adds a different inbound search surface. SkillsMP is optional; the inbound traffic is real but the listing quality is uneven.

The decision before you publish. Not everything that looks like a skill should ship as a skill. The cleanest test is the three-card decision above. If the answer is external API or persistent state, ship an MCP server. If the answer is multi-step agent workflow with parallel sub-tasks, ship a subagent. Skills are for the middle: domain expertise, workflow recipes, role personas, repeatable evaluation passes. For the full five-axis framework Anthropic recommends when choosing between primitives, see the full Skills vs MCP decision guide.

Frequently Asked Questions

Are Claude Skills free to install and ship?

Yes on both ends. Installing skills from skills.sh is free and unmetered; shipping your own skill requires a public GitHub repo and a PR to the registry — also free. The runtime cost is whatever your normal Claude usage bill is, which goes up only marginally because skills load on demand. Anthropic's pricing applies to the model tokens you'd consume anyway; skills don't add a per-install or per-invocation fee. Claude Code pricing and token limits covers the full pricing model.

What's the security model when I install a skill from skills.sh?

A skill is a folder of files Claude can read and a list of tools it's allowed to call. Whatever permissions your agent grants the skill, the skill has — including Bash if you allow it. The runtime doesn't sandbox skill execution beyond the existing tool-permission system. Install rule: read the SKILL.md and any scripts/ before installing. Anthropic's MCP guidance — "only install servers you trust" — applies identically here. A malicious skill can do anything your agent itself can do.

Do Claude Code skills work on Claude.ai too?

The SKILL.md format is the same across runtimes, so yes — with a caveat. Claude Code loads skills from ~/.claude/skills/ or .claude/skills/; Claude.ai loads them via the web upload interface. Claude.ai supports a narrower tool surface than Claude Code (no native Bash, for example), so a skill that depends on shell execution will run in Code but not Claude.ai. Skills that read files, write text, or invoke pre-allowed APIs port cleanly (Anthropic, 2026).

How do I version a skill so users can pin to a specific release?

The install path supports @<sha> and @<tag> suffixes — npx skills add owner/repo@v0.3.1 pins to a release tag. Tag your releases in git the same way you would for an npm package; the registry resolves the suffix to a git ref at install time. For breaking changes, bump the major in the tag and document the migration in the README. The skills.sh leaderboard counts cumulative installs across versions; the upgrade path is the user re-running npx skills add with a new tag.
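If you keep a team list of pinned skills, a tiny parser helps you check that every entry actually carries a ref. This is a sketch of the owner/repo@ref shape described above, not the CLI's real resolver; SkillSpec and parse_spec are names invented for the example.

```python
from typing import NamedTuple, Optional


class SkillSpec(NamedTuple):
    owner: str
    repo: str
    ref: Optional[str]  # tag or commit SHA; None means the default branch


def parse_spec(spec: str) -> SkillSpec:
    """Split 'owner/repo' or 'owner/repo@ref' into its parts."""
    source, _, ref = spec.partition("@")
    owner, sep, repo = source.partition("/")
    if not sep or not owner or not repo or "/" in repo:
        raise ValueError(f"expected owner/repo[@ref], got {spec!r}")
    return SkillSpec(owner, repo, ref or None)


if __name__ == "__main__":
    for line in ("nishilbhave/codeprobe@v0.3.1", "nishilbhave/codeprobe"):
        spec = parse_spec(line)
        print(spec, "pinned" if spec.ref else "UNPINNED")
```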

Can I monetize a Claude skill on skills.sh?

Not directly through skills.sh in May 2026. The registry has no payment integration and no paid-tier listings. What works in practice: ship the skill free, document a paid companion service or API the skill calls into, and gate that service yourself. A handful of skills on the leaderboard already follow this model — the skill is open-source, the underlying service it queries (transcription, image generation, specialized analysis) bills the user separately. Whether Anthropic or Vercel Labs will add native monetization is an open question.

The Bottom Line

Claude Skills are the lightest distribution primitive Anthropic has shipped — a folder, a SKILL.md, and a registry entry is the whole pipeline. The marketplace consolidated faster than most 2026 ecosystems: skills.sh is the install channel, three aggregators handle discovery, and the leaderboard rewards specific names and sharp descriptions over generic catch-all skills.

If you do nothing else after reading this:

Install find-skills first so future skill discovery happens from inside the agent, not from a browser tab
Ship one small skill — even a 15-line SKILL.md — to feel the publishing flow end-to-end
Pin every installed skill to a SHA in shared team configs so a maintainer change can't ship into your next session
Write the description three times: too vague, too verbose, then the version that actually loads

The format is small enough that the right comparison isn't npm packages; it's shell aliases. Most of the value comes from a handful of skills that genuinely save five minutes a day. The next time someone asks how Claude is suddenly "so much better at one specific thing," you'll know the answer is probably a skill they shipped.

Claude Code MCP Server Configuration: 2026 Setup Guide

Nishil Bhave — Mon, 18 May 2026 18:10:45 +0000

Claude Code MCP Server Configuration: Step-by-Step Setup (2026)

The Model Context Protocol SDK crossed 97 million monthly downloads in March 2026, up from roughly 2 million at launch — a 4,750% climb in 16 months (Digital Applied citing Anthropic, 2026). The protocol is everywhere now. The official registry lists 9,400+ servers (MCP Manager, 2026), and Anthropic's reference repo carries 85.7k GitHub stars. Yet the average Claude Code user I talk to has exactly one MCP server installed.

I've been running a six-server setup across macOS and Windows for eight months and have hit nearly every config gotcha along the way. This Claude Code MCP server configuration guide is the working playbook: the scope hierarchy that finally clicked, the JSON anatomy that maps to it, three real server walkthroughs, and the debugging loop that takes you from "server isn't showing up" to "fixed" in under five minutes. For the broader landscape — what MCP is, which 30 servers are worth knowing, and where the ecosystem is heading — see the complete 2026 guide to MCP servers and the Model Context Protocol.

Key Takeaways

Claude Code reads three config locations in priority order: local (~/.claude.json keyed to project), project (.mcp.json at repo root, team-shared), and user (~/.claude.json global block). Highest priority wins.

claude_desktop_config.json is Claude Desktop's file — Claude Code never reads it. Importing requires claude mcp add-from-claude-desktop.

Three transports exist: stdio (local subprocess, fastest), HTTP (remote with OAuth or bearer tokens), and SSE (legacy, deprecated). Pick stdio for local tools, HTTP for everything else.

Pin versions in production configs. OX Security disclosed a systemic SDK flaw in April 2026 that put ~200,000 servers at risk (OX Security via The Register, 2026). npx -y pkg@latest is a rug-pull waiting to happen.

Where Does Claude Code MCP Server Configuration Live?

Claude Code reads MCP configuration from three locations with a strict precedence order, and getting the hierarchy wrong is responsible for about 80% of the "my server won't show up" reports I see in issue trackers. The three scopes are local, project, and user (Claude Code MCP docs, 2026).

Local scope lives in ~/.claude.json, keyed by the absolute path of the current project. It's the default when you run claude mcp add without a --scope flag. It applies only when you're working in that folder, and nothing in it gets committed. Use it for personal credentials and one-off experiments. Project scope is .mcp.json at the repo root, designed to be checked into git so the whole team picks up the same servers. Claude Code prompts each user to approve project servers on first run — a guardrail you want to keep, not bypass. User scope also lives in ~/.claude.json but applies globally. Use it for general-purpose servers like filesystem, fetch, or memory that you want available everywhere.

Source: Claude Code MCP documentation, 2026

The trap most people hit is assuming claude_desktop_config.json is the right file. It isn't — that's the Desktop app's config, living at ~/Library/Application Support/Claude/claude_desktop_config.json on macOS or %APPDATA%\Claude\claude_desktop_config.json on Windows. Claude Code never reads it. If you've already configured the Desktop app and you're on macOS or WSL, run claude mcp add-from-claude-desktop and it imports the lot. Otherwise you'll be editing the wrong file all afternoon.

Per Anthropic's own docs: "Project servers in .mcp.json take precedence over user servers with the same name; local-scoped servers take precedence over project-scoped servers" (code.claude.com, 2026). The mental model that works: narrower scope wins. Your personal override beats the team config, the team config beats the global default. That's intentional — you might need to temporarily disable a project server without committing the change.
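The "narrower scope wins" rule can be modeled as three dictionaries applied broadest first. This is a mental-model sketch, not Claude Code's implementation: it assumes a same-named server replaces the broader entry wholesale rather than being merged field by field, and the scope key it adds is only there so you can see which file won.

```python
def effective_servers(user: dict, project: dict, local: dict) -> dict:
    """Combine mcpServers maps so narrower scopes override broader ones."""
    merged: dict = {}
    # Apply broadest first; each later scope overwrites same-named servers.
    for scope_name, servers in (("user", user), ("project", project), ("local", local)):
        for name, entry in servers.items():
            merged[name] = {"scope": scope_name, **entry}
    return merged


if __name__ == "__main__":
    user = {"github": {"type": "http", "url": "https://user.example/mcp"}}
    project = {"github": {"type": "http", "url": "https://team.example/mcp"}}
    print(effective_servers(user, project, {})["github"]["scope"])  # project
```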

What Does the MCP Config JSON Actually Look Like?

The fast path is claude mcp add from your terminal, but knowing what it writes saves you when something breaks. Every Claude Code MCP server configuration entry sits under the mcpServers key with a stable shape: a command (or url for HTTP), optional args, optional env for environment variables, and optional headers for HTTP auth. Claude Code supports ${VAR} and ${VAR:-default} expansion in command, args, env, url, and headers, which means you can commit .mcp.json without leaking secrets.

Here's a config that mixes stdio and HTTP servers in one file:

{
  "mcpServers": {
    "fs": {
      "command": "npx",
      "args": [
        "-y",
        "@modelcontextprotocol/server-filesystem",
        "${HOME}/code",
        "${HOME}/Documents/notes"
      ]
    },
    "github": {
      "type": "http",
      "url": "https://api.githubcopilot.com/mcp/",
      "headers": {
        "Authorization": "Bearer ${GH_PAT}"
      }
    },
    "notion": {
      "type": "http",
      "url": "https://mcp.notion.com/mcp"
    },
    "supabase": {
      "command": "npx",
      "args": [
        "-y",
        "@supabase/mcp-server-supabase@0.4.5",
        "--access-token",
        "${SUPABASE_TOKEN}"
      ],
      "env": {
        "NODE_NO_WARNINGS": "1"
      }
    }
  }
}

Four things to notice. First, type is http, sse, or omitted (stdio is the default — no type field means it's a local subprocess). Second, env-var expansion works in command, args, url, headers, and the env block; keep secrets in your shell, not in the JSON. Third, pinned versions: @0.4.5 on Supabase, not @latest. The @latest tag is exactly the supply-chain footgun the security section gets to in a minute. Fourth, the env block lets you set runtime variables the server sees but Claude Code itself doesn't.
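To preview what a committed .mcp.json resolves to on your machine, you can mimic the ${VAR} and ${VAR:-default} forms in a few lines. This sketch assumes shell-like semantics (the default applies when the variable is unset or empty) and raises when a variable has neither a value nor a default; Claude Code's exact behavior in edge cases may differ, and expand is a name made up for the example.

```python
import os
import re

# Matches ${NAME} and ${NAME:-default}.
_VAR = re.compile(r"\$\{([A-Za-z_][A-Za-z0-9_]*)(?::-([^}]*))?\}")


def expand(value: str, env=None) -> str:
    """Substitute ${VAR} and ${VAR:-default} using env (defaults to os.environ)."""
    env = os.environ if env is None else env

    def substitute(match):
        name, default = match.group(1), match.group(2)
        current = env.get(name)
        if current:
            return current
        if default is not None:
            return default  # unset or empty, fall back to the default
        if current is not None:
            return current  # set but empty, and no default was given
        raise KeyError(f"{name} is not set and has no default")

    return _VAR.sub(substitute, value)


if __name__ == "__main__":
    print(expand("${HOME:-/tmp}/code"))
```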

According to Anthropic's documentation, the JSON form via claude mcp add-json is the safest install path on Windows because it sidesteps a known shell-quoting bug where -- rewrites /c to C:/ in the saved config (anthropics/claude-code #4019, 2026). On Mac and Linux either form works.
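The version-pinning point from the config walkthrough is easy to check mechanically. The sketch below flags npx-launched servers whose package has no version or uses @latest. It rests on two assumptions: the first non-flag argument is the package spec, and any explicit version other than latest counts as pinned, so other dist-tags and semver ranges slip through. The function names are illustrative only.

```python
def npx_package(args: list[str]):
    """Return the first non-flag argument, assumed to be the package spec."""
    for arg in args:
        if not arg.startswith("-"):
            return arg
    return None


def is_pinned(package: str) -> bool:
    """True when the spec ends in an explicit version other than 'latest'."""
    # A scoped name starts with "@", so only an "@" after position 0 separates a version.
    at = package.rfind("@")
    if at <= 0:
        return False
    return package[at + 1:] not in ("", "latest")


def unpinned_servers(config: dict) -> list[str]:
    """Names of npx-based servers in an mcpServers config that lack a pin."""
    flagged = []
    for name, entry in config.get("mcpServers", {}).items():
        if entry.get("command") != "npx":
            continue  # HTTP servers and other launchers are out of scope here
        package = npx_package(entry.get("args", []))
        if package is not None and not is_pinned(package):
            flagged.append(name)
    return flagged
```

Load your .mcp.json with json.load and pass the result to unpinned_servers; an entry such as @modelcontextprotocol/server-filesystem with no version suffix gets flagged, while @supabase/mcp-server-supabase@0.4.5 passes.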

How Do You Add Your First Three MCP Servers?

Let's wire up three real servers that cover the full spectrum: filesystem (stdio, no auth), GitHub (HTTP with bearer token), and Notion (HTTP with browser OAuth). User scope so they're available everywhere.

# 1. Filesystem (Anthropic reference) — stdio, paths whitelisted as args
claude mcp add --scope user fs -- \
  npx -y @modelcontextprotocol/server-filesystem ~/code ~/Documents/notes

# 2. GitHub — HTTP, bearer token from a fine-grained PAT
export GH_PAT="ghp_xxxxxxxxxxxxxxxxxxxx"
claude mcp add --transport http --scope user github \
  https://api.githubcopilot.com/mcp/ \
  --header "Authorization: Bearer $GH_PAT"

# 3. Notion — HTTP, opens OAuth in your browser
claude mcp add --transport http --scope user notion https://mcp.notion.com/mcp

# Verify
claude mcp list
claude mcp get github

Every flag must appear before the server name. The -- separates Claude Code's flags from the server command itself, and forgetting it is the single most common cause of "my server won't connect" reports in the issue tracker. After claude mcp add the server is registered but not yet running — Claude spawns it on the next session start.

The filesystem server is your "hello world": it's the smallest reference implementation and it surfaces every scope and pathing gotcha in one shot. Whitelist only the directories Claude actually needs; granting it your whole home directory is the MCP equivalent of chmod 777. The GitHub server uses a bearer token from a fine-grained personal access token. Grant it read-only repository permissions (for example, Contents: Read-only) for most workflows; never grant write access unless you specifically want Claude pushing commits. For the full surface (toolset trimming, PAT-vs-App rate-limit math, and the 7 use cases worth the schema overhead), see the GitHub MCP server field-notes guide. The Notion server demonstrates the OAuth flow: claude mcp add opens your browser, you approve, and the token lands in ~/.claude.json under the server's oauth.token field. Never commit that file.

Stdio, HTTP, or SSE — Which Transport Should You Pick?

MCP supports three wire transports and the choice matters more than it looks. Stdio launches a subprocess and pipes JSON-RPC over stdin/stdout — fastest, but the server must live on your machine. HTTP uses regular requests and is the only option for remote servers and OAuth flows. SSE (Server-Sent Events) was the original streaming transport and is now deprecated in favor of streamable HTTP (Claude Code MCP docs, 2026). The decision tree is short: local tool → stdio, remote service → HTTP, anything labeled SSE → use it for now but expect migration to HTTP.
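The decision tree fits in two small functions. This is a sketch of the rule of thumb above rather than anything Claude Code runs: transport_of reads an entry the way the config format describes (no type field means stdio), and recommend_transport takes two hypothetical yes/no inputs you would answer yourself.

```python
def transport_of(entry: dict) -> str:
    """Classify a server entry; a missing type field means a local stdio subprocess."""
    return entry.get("type", "stdio")


def recommend_transport(is_remote: bool, only_offers_sse: bool = False) -> str:
    """Local tool -> stdio, remote service -> http, SSE only when nothing else exists."""
    if not is_remote:
        return "stdio"
    if only_offers_sse:
        return "sse"  # usable for now, but expect a migration to HTTP
    return "http"


def legacy_sse_servers(config: dict) -> list[str]:
    """Names of configured servers still on the deprecated SSE transport."""
    return [
        name
        for name, entry in config.get("mcpServers", {}).items()
        if transport_of(entry) == "sse"
    ]
```

Running legacy_sse_servers over your config gives you a migration to-do list for the day SSE support goes away.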

Source: Claude Code MCP transport documentation, 2026
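
The decision tree above is small enough to write down as a function. This is only an illustrative Python sketch: `pick_transport` and its parameters are hypothetical names, not part of Claude Code or any MCP SDK.

```python
def pick_transport(runs_locally: bool, labeled_sse: bool = False) -> str:
    """Return the transport the decision tree in the text suggests."""
    if labeled_sse:
        # Deprecated but still usable: keep it for now, plan the move to HTTP.
        return "sse"
    # Local tool -> stdio subprocess; remote service -> HTTP.
    return "stdio" if runs_locally else "http"
```

The returned string matches the values the article passes to `--transport`, so the result can be dropped straight into a `claude mcp add` invocation.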

The practical implication is auth surface. A stdio server runs as a subprocess of Claude Code with your full user privileges — no network auth needed, but anything that process can read, the model can read. An HTTP server lives behind whatever auth you configure (OAuth, bearer token, mTLS), which is more setup but a tighter blast radius. Per Anthropic's own security guidance, "MCP servers may execute arbitrary code and access local resources — only install servers you trust" (code.claude.com, 2026). That sentence is doing more work than it looks like.

Why Doesn't My Server Show Up in `claude mcp list`?

When a server you just added doesn't appear, failures cluster into four buckets. The diagnostic loop is short.

Bucket 1 — wrong file, wrong scope. Run claude mcp list in the project directory. If your server isn't there, it's in the wrong scope. Run claude mcp get <name> for the server you expect to see; the output tells you exactly which scope file it lives in. Move it with claude mcp remove <name> and re-add with an explicit --scope flag. This catches maybe half of "missing server" reports on its own.

Bucket 2 — server crashes on startup. Open the in-session /mcp panel and you'll see the server status. Red means it failed to start. The actual error lives in Claude Code's logs at ~/.claude/logs/. Common causes: missing env vars (Supabase needs --access-token), wrong Node version (most servers require ≥ 18), or npx hasn't cached the package and the 10-second startup timeout fires first. Bump MCP_TIMEOUT=15000 (milliseconds) in your shell environment if cold starts are killing you.

Bucket 3 — server starts but no tools appear. This is almost always ENABLE_TOOL_SEARCH deferring schema load. Claude Code v2.x ships with auto:5 as the default, which means tool schemas only register after the model actively searches for them. Either invoke a tool by name in your prompt (@supabase list_tables) or set ENABLE_TOOL_SEARCH=off to force eager loading. Trade-off: you pay the schema cost on every turn.

Bucket 4 — OAuth flow hangs. Most common with Notion, Linear, or Atlassian remote servers. The callback port (default 8765) is often taken by another process. Pass --callback-port 9876 to claude mcp add to pick a free port, or kill whatever's bound to 8765 with lsof -i :8765. After OAuth completes, the token lives in ~/.claude.json under the server's oauth.token field.

# The reliable diagnostic sequence
claude mcp list                     # is it registered?
claude mcp get supabase             # which scope, what command?
claude --debug                      # run with verbose logs
tail -f ~/.claude/logs/mcp-*.log    # watch logs in another window
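
For Bucket 4, it can help to confirm the callback port is actually free before re-running the OAuth flow. The sketch below uses only Python's standard `socket` module; `port_is_free` is a hypothetical helper, and 8765 is simply the default port quoted above.

```python
import socket


def port_is_free(port: int, host: str = "127.0.0.1") -> bool:
    """Return True if nothing is currently bound to host:port."""
    with socket.socket(socket.AF_INET, socket.SOCK_STREAM) as sock:
        try:
            # bind() fails with OSError (address in use) if the port is taken.
            sock.bind((host, port))
        except OSError:
            return False
    return True


if __name__ == "__main__":
    print("8765 free:", port_is_free(8765))
```

If this reports the port as taken, `lsof -i :8765` shows which process owns it.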

If all four buckets are clean and the server still doesn't work, run it manually outside Claude Code (npx -y @supabase/mcp-server-supabase@latest --access-token $TOKEN) and check that it responds to a basic initialize JSON-RPC call. If the standalone server is broken, it's the server's bug, not yours. For failures outside MCP itself — API errors, process exits, OAuth 403s, and file-editing problems — keep the Claude Code errors troubleshooting guide open in the next tab.
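
A minimal sketch of that standalone check, assuming the server speaks the stdio transport (one JSON message per line). The function names, the client identity, and the `2024-11-05` protocol revision are illustrative assumptions; a given server may expect a different revision.

```python
import json
import subprocess


def build_initialize_request(request_id: int = 1) -> dict:
    """Build a JSON-RPC 2.0 `initialize` request in the shape MCP clients send."""
    return {
        "jsonrpc": "2.0",
        "id": request_id,
        "method": "initialize",
        "params": {
            # Assumption: the server accepts this protocol revision.
            "protocolVersion": "2024-11-05",
            "capabilities": {},
            # Hypothetical client identity, used only for this probe.
            "clientInfo": {"name": "manual-probe", "version": "0.0.1"},
        },
    }


def probe_stdio_server(command: list, timeout: float = 15.0) -> dict:
    """Send one initialize request over stdin and return the matching reply."""
    request = build_initialize_request()
    proc = subprocess.Popen(
        command, stdin=subprocess.PIPE, stdout=subprocess.PIPE, text=True
    )
    try:
        # Write the request as a single line, close stdin, collect stdout.
        stdout, _ = proc.communicate(json.dumps(request) + "\n", timeout=timeout)
    except subprocess.TimeoutExpired:
        # Server kept running: stop it and read whatever it printed so far.
        proc.kill()
        stdout, _ = proc.communicate()
    for line in stdout.splitlines():
        try:
            message = json.loads(line)
        except json.JSONDecodeError:
            continue  # skip non-JSON noise on stdout
        if isinstance(message, dict) and message.get("id") == request["id"]:
            return message
    raise RuntimeError("server produced no reply to initialize")
```

Calling `probe_stdio_server(["npx", "-y", "<package>", ...])` with the same arguments you registered should return a message with a `result` key. A `RuntimeError` or an `error` key points at the server rather than at Claude Code.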

What Are the Most Common MCP Config Mistakes?

After eight months of running this setup, the same handful of mistakes account for most of the lost time. They're worth memorizing because they're invisible until you know what to look for.

Mistake 1 — bare npx on Windows. Claude Code spawns server processes via Windows' native CreateProcess API, which doesn't resolve .cmd shims, and npx on Windows is exactly that. The result is spawn npx ENOENT or, worse, a silent disconnect with no error in the logs. Wrap explicitly in cmd /c:

{
  "mcpServers": {
    "seq-thinking": {
      "command": "cmd",
      "args": ["/c", "npx", "-y", "@modelcontextprotocol/server-sequential-thinking"]
    }
  }
}
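
If you have several servers to fix, the same rewrite can be applied mechanically. This is a rough Python sketch, assuming the config uses the `mcpServers` layout shown above; `wrap_npx_for_windows` is a hypothetical helper, not a Claude Code feature.

```python
import copy


def wrap_npx_for_windows(config: dict) -> dict:
    """Return a copy of the config with bare npx commands wrapped in cmd /c."""
    patched = copy.deepcopy(config)
    for server in patched.get("mcpServers", {}).values():
        # HTTP servers have no "command" key and are left untouched.
        if server.get("command") == "npx":
            server["args"] = ["/c", "npx", *server.get("args", [])]
            server["command"] = "cmd"
    return patched
```

The function returns a new dict rather than editing in place, so you can diff the result against the original before writing it back to disk.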

The cleanest answer for Windows is WSL2. Inside the Linux subsystem, npx works natively, configs live in the Linux filesystem, and claude mcp add-from-claude-desktop even bulk-imports existing Desktop configs (Mads Hovgaard, 2026). The 20-minute setup is cheaper than the bug surface.

Mistake 2 — secrets in plaintext JSON. Per Anthropic, "only install servers you trust" isn't advice, it's a requirement. A stdio MCP server runs with your full user privileges, and a remote HTTP server receives whatever you put in headers. Use ${VAR} expansion so secrets stay in your shell environment, or — better on macOS — store them in Keychain and pull through a wrapper script:

#!/usr/bin/env bash
# Five-line wrapper, called from .mcp.json (the shebang must be the first line)
set -euo pipefail
exec npx -y @supabase/mcp-server-supabase@0.4.5 \
  --access-token "$(security find-generic-password -s supabase-mcp -w)"

Five lines, no secrets in dotfiles (Kahunam, 2026).

Mistake 3 — npx -y pkg@latest. OX Security disclosed a systemic flaw in Anthropic's official MCP SDK (TypeScript, Python, Java, and Rust) in April 2026 that put approximately 200,000 servers at risk via STDIO command injection, affecting 150 million combined SDK downloads (OX Security via The Register, 2026). Patched, but the lesson: a compromised maintainer can push malware to your next session if you never pin versions. Pin them.
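
One way to audit this is to scan a config for npx-launched servers whose package spec lacks an exact version. The sketch below is illustrative Python, assuming the `mcpServers` layout used in this guide; the helper names are hypothetical, and it does not handle npx options that take a separate value (such as `-p pkg`).

```python
import re
from typing import Optional

# Exact versions only: "1.2.3" or "1.2.3-beta.1"; tags and ranges do not count.
EXACT_VERSION = re.compile(r"^\d+\.\d+\.\d+([-+].+)?$")


def package_spec(server: dict) -> Optional[str]:
    """Return the npm package spec an npx-based server launches, if any."""
    argv = [server.get("command", ""), *server.get("args", [])]
    if "npx" not in argv:
        return None
    for arg in argv[argv.index("npx") + 1:]:
        if not arg.startswith("-"):
            return arg  # first non-flag argument after npx
    return None


def unpinned_servers(config: dict) -> list:
    """List server names whose npx package is not pinned to an exact version."""
    flagged = []
    for name, server in config.get("mcpServers", {}).items():
        spec = package_spec(server)
        if spec is None:
            continue
        at = spec.rfind("@")
        # at == 0 means the only "@" is the scope prefix, so no version given.
        version = spec[at + 1:] if at > 0 else ""
        if not EXACT_VERSION.match(version):
            flagged.append(name)
    return flagged
```

Anything it flags is either missing a version or tracking a moving tag like `latest`.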

Mistake 4 — token bloat from unused servers. Every server you connect re-registers its tool schemas at every conversation turn. MindStudio's 2026 benchmark found that a Supabase + GitHub + Linear stack of 81 tools consumed over 20,000 tokens before the user typed a single character — about 16% of a 128k context window (MindStudio, 2026). If you call a server twice a week, it's costing you 100+ schema-loads to save 2 paste operations. Use the CLI version instead and expose it via Bash. Same result, near-zero overhead. If the job is really about reusable prompt behavior rather than external tools, use this Claude Skills vs MCP Servers decision guide before adding another server.
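
The percentage is simple to reproduce for your own setup. A small Python sketch, reading "128k" as 128,000 tokens; `schema_overhead_pct` is a hypothetical helper name.

```python
def schema_overhead_pct(schema_tokens: int, context_window: int) -> float:
    """Share of the context window spent on tool schemas, in percent."""
    if context_window <= 0:
        raise ValueError("context_window must be positive")
    return 100.0 * schema_tokens / context_window


if __name__ == "__main__":
    # Figures quoted above: 20,000 schema tokens against a 128k window.
    print(f"{schema_overhead_pct(20_000, 128_000):.1f}% of the window")
```

Plug in the totals from your own `/context` output to see where your stack lands relative to the roughly 16% figure.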

Mistake 5 — forgetting the project-approval reset. When you pull a teammate's .mcp.json, Claude Code shows you the approval prompt once and remembers your choice. If the team later changes the server set, you may not get re-prompted. Run claude mcp reset-project-choices after pulling changes so you can re-approve from scratch.

What's the Working Strategy for Most Developers?

After eight months of running MCP across two laptops, here's what holds up. Run four servers at user scope: filesystem, fetch, memory, and Context7. They're cheap on tokens, useful everywhere, and not worth re-configuring per project. Add GitHub at user scope if you ship code. Add Playwright at user scope if you do any browser testing. Add Sequential Thinking only when you actually need structured reasoning traces; the trade-offs are covered in the Sequential Thinking MCP guide for Claude Code. Then put project-specific servers — Supabase, Notion, Linear — in .mcp.json at the repo root with env-var references, commit it, and document the required environment variables in your README.

In raw numbers: my user-scope config runs six servers and consumes roughly 8,000 tokens of schema overhead per turn. Project .mcp.json files add another 15,000–28,000 depending on the project. The 16% context threshold the MindStudio benchmark flagged is real but manageable. If you find yourself there, the answer isn't usually fewer servers — it's enabling ENABLE_TOOL_SEARCH=auto:5 (the default) and letting Claude lazy-load schemas on demand.

The Microsoft Playwright server has 32.5k GitHub stars and ships an update every five days as of May 2026 — by far the most maintained browser-automation MCP. GitHub's official server sits around 29.7k stars (The Agent Times, 2026). Both are well past "experimental" and worth your day-one install slot.

Frequently Asked Questions

Is `.mcp.json` the same as `claude_desktop_config.json`?

No. claude_desktop_config.json is Claude Desktop's config file at ~/Library/Application Support/Claude/ on macOS or %APPDATA%\Claude\ on Windows. Claude Code never reads it. Claude Code uses .mcp.json at the repo root for project-scoped servers and ~/.claude.json for local and user scopes. To import existing Desktop servers into Claude Code, run claude mcp add-from-claude-desktop (macOS or WSL only).

How many MCP servers should I run?

Six to eight is the sweet spot for solo developers, based on token-budget analysis. Four at user scope (filesystem, fetch, memory, Context7) covers daily generalist work, two more (GitHub, Playwright) if you ship code or test in browsers, and 2–4 at project scope for whatever's specific to that repo. More than ten servers and you'll burn 20%+ of your context window on tool schemas before typing a prompt (MindStudio, 2026).

Why does `npx` fail on Windows but work everywhere else?

Claude Code spawns processes via Windows' CreateProcess, which doesn't resolve .cmd shims like npx.cmd. Wrap the command in cmd /c (e.g., "command": "cmd", "args": ["/c", "npx", "-y", "package"]) or — strongly recommended — install Claude Code inside WSL2 where npx works natively. The CLI form has a known parser bug that mangles cmd /c quoting on Windows; use claude mcp add-json or hand-edit the JSON (anthropics/claude-code #4019, 2026).

How do I share an MCP server config with my team?

Add it to .mcp.json at the repo root with ${VAR} placeholders for any secrets, commit the file, and document the required environment variables in your README. Claude Code prompts each team member to approve project-scoped servers on first run — a security guardrail you want to keep. Each teammate can override locally in ~/.claude.json if a specific server isn't useful for them. Run claude mcp reset-project-choices if the team config changes and you need to re-approve.
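
To keep the README in sync, you can extract the placeholder names straight from the config. This is a sketch in Python under the assumption that placeholders look like `${VAR}` or `${VAR:-default}`; `referenced_env_vars` is a hypothetical helper.

```python
import json
import re

# Matches ${NAME} and ${NAME:-default}; captures only NAME.
PLACEHOLDER = re.compile(r"\$\{([A-Za-z_][A-Za-z0-9_]*)(?::-[^}]*)?\}")


def referenced_env_vars(config: dict) -> list:
    """Return the sorted, de-duplicated env var names a config refers to."""
    # Serialising the whole config lets one regex cover args, env, and headers.
    text = json.dumps(config)
    return sorted(set(PLACEHOLDER.findall(text)))
```

Running it over the parsed `.mcp.json` gives the list of variables each teammate needs to export before approving the project's servers.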

Can MCP servers see my API keys?

Yes, if you put them in the env block or command args. A stdio MCP server runs as a subprocess of Claude Code with your full user privileges, and a remote HTTP server receives whatever you pass in headers. Store sensitive credentials in macOS Keychain or Windows Credential Manager and pull them through a wrapper script, or use shell env-var expansion (${VAR}) so secrets never land in dotfiles. Anthropic's guidance — "only install servers you trust" (Claude Code MCP docs, 2026) — is a hard requirement, not a suggestion.

The Bottom Line

MCP is the layer where Claude Code stops being a chat window and starts being a workspace. The config looks dense the first time you open .mcp.json, but the mental model is small: three scopes, three transports, a handful of servers that earn their context-window cost. Most users install one server. The gap between "one server" and "the right six" is where Claude Code goes from "useful sometimes" to "I can't work without this."

If you do nothing else after reading this:

Install filesystem at user scope tonight as your "hello world"
Add GitHub at user scope if you push code, scoped to repo:read
Pin versions in .mcp.json so a compromised maintainer can't ship malware to your next session
Run claude mcp list whenever something feels off and claude --debug when it actually breaks

The protocol is still evolving — 30+ CVEs filed in the first year, an SDK supply-chain flaw patched in April 2026, OAuth scopes still tightening. Pin what you can, scope what you can't pin, and review the /mcp panel before approving a new project's servers. Most friction is on the configuration side, not the protocol side, and most of that friction is the kind this guide covers.

The next time someone asks why their claude_desktop_config.json "isn't working," you'll already know what to tell them.

Claude Code Errors: Every Code, Cause, and Fix (2026 Guide)

Nishil Bhave — Sun, 17 May 2026 12:28:49 +0000

Claude Code Errors: The Definitive Troubleshooting Reference (Every Code, Cause, and Fix)

Anthropic's annualized run rate hit $30 billion by April 2026 on the back of an 80x jump in Q1, with Claude Code alone crossing $1 billion within six months of public launch (VentureBeat, 2026). The npm package @anthropic-ai/claude-code now sees roughly 6.5 million weekly downloads (npm, 2026). All of that growth has produced a corresponding wall of cryptic terminal errors — and a 5,000+ open-issue GitHub repo (github.com/anthropics/claude-code, 2026) where the same eight messages get re-asked weekly.

This is the reference I wish I had two years ago. Every error in this guide is one I've personally hit or watched a teammate hit, traced to a verified root cause, and fixed with a command that actually worked. No "try restarting." No "check your internet." Scroll, Ctrl+F your exact error text, and you should be unblocked in 10 seconds.

Key Takeaways

"Process exited with code 1" is overloaded — it hides at least five root causes (env var conflict, Windows Bash path, VS Code CSP, corrupted session, MCP crash). Always run claude doctor first.

API 400 is almost never a real bad request — it's a CLI/model version mismatch with thinking.type.enabled, stale top_p, or a third-party gateway dropping the anthropic-beta header.

"Too slow" is rarely the model; community measurements show MCP tool definitions burning up to 66,000 tokens of context before your first prompt (Scott Spence, 2026). With the 1M context window now standard, that overhead is "only" ~6.6% of the window — but it's still tokens you paid for, cache-invalidations you triggered, and routing latency you ate on every turn.

Weekly rate limits were introduced August 28, 2025; Anthropic raised them 50% across all paid tiers on May 13, 2026 and stopped publishing per-tier hour numbers (Anthropic, May 2026). On Pro the binding constraint is the 5-hour session, not the weekly cap.

Anthropic's April 23 postmortem (2026) confirmed three engineering changes degraded Claude Code for over a month, then reset every subscriber's limits as compensation.

How Do You Triage a Claude Code Error in 60 Seconds?

Before you Google the message, run three commands. Stack Overflow's 2025 Developer Survey found that 45% of developers say debugging AI-generated code is more time-consuming (Stack Overflow, 2025), and most of that time is spent triaging silent failures. Claude Code ships with built-in tooling that collapses that triage window from hours to seconds.

claude doctor          # 1. checks Node, auth, env vars, MCP servers
claude --debug         # 2. re-runs your last action with verbose logs
/usage                 # 3. inside a session: see rate-limit state

claude doctor alone catches roughly half of all "exited with code 1" reports on GitHub. The other half needs the decision tree below.

I keep this tree pinned in my notes. About 80% of the time my error fits one branch and I'm back in flow inside two minutes. The remaining 20% is when I learn something new — usually a Bun-runtime quirk on Windows or a stale credential I forgot I'd exported six months ago.

"Error: Claude Code process exited with code 1" — What Actually Causes It?

This is the most overloaded error message in modern AI dev tooling. The same exit code wraps at least five completely different root causes, and the official error reference doesn't disambiguate them. After watching this fail in production a dozen times, here's the actual mapping.

Error: Claude Code process exited with code 1
  at ChildProcess.<anonymous> (claude.js:142:14)

Cause 1 — Auth conflict (most common). A stale ANTHROPIC_API_KEY env var collides with the OAuth token from /login. The environment variable wins, the OAuth token gets ignored, and the resulting credential mismatch crashes the parent process before the prompt loop opens. This is the single root cause for over 30% of code-1 reports I've seen, including the heavily commented Issue #8557.

Cause 2 — Windows Git Bash path. Claude Code on Windows expects bash.exe. If CLAUDE_CODE_GIT_BASH_PATH isn't set, the spawn fails (Issue #51886).

Cause 3 — VS Code extension CSP regression. A March 2026 build of the extension shipped with connect-src missing a-api.anthropic.com, breaking every request with exit 1 (Issue #14295). Fixed in 2.0.2+ (version floor stated as of May 2026 — verify against the current marketplace listing).

Cause 4 — Corrupted session transcript. Resume mode reads a malformed .jsonl and crashes on parse.

Cause 5 — MCP server crash on launch. A bad entry in .claude/settings.json kills the parent before the prompt loop starts.

# Fix in order — each one resolves a different cause
unset ANTHROPIC_API_KEY            # Cause 1
claude /logout && claude /login

# Windows only — Cause 2
setx CLAUDE_CODE_GIT_BASH_PATH "C:\Program Files\Git\bin\bash.exe"

# Cause 3 — update the VS Code extension
# Marketplace → Claude Code → Update (must be 2.0.2+ as of May 2026; check current min)

# Cause 4 — quarantine the bad session
mv ~/.claude/projects/<encoded-cwd>/<session-uuid>.jsonl{,.bak}

# Cause 5 — disable MCP, see which one's broken
mv ~/.claude/settings.json ~/.claude/settings.json.bak
claude   # if it boots, your MCP config killed it
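
For Cause 4, you can confirm a transcript really is malformed before quarantining it. The sketch below is plain Python that reports the first line that fails to parse as JSON; `first_bad_line` is a hypothetical helper and assumes one JSON object per line, as in any JSONL file.

```python
import json
from pathlib import Path
from typing import Optional


def first_bad_line(path: Path) -> Optional[int]:
    """Return the 1-based number of the first unparseable line, or None."""
    with path.open("r", encoding="utf-8", errors="replace") as handle:
        for number, line in enumerate(handle, start=1):
            if not line.strip():
                continue  # tolerate blank lines
            try:
                json.loads(line)
            except json.JSONDecodeError:
                return number
    return None
```

A result of `None` means the file parses cleanly and the crash is probably one of the other causes. A line number tells you where the write was interrupted.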

Prevention. Run claude doctor after every install or major upgrade — and if you're getting these errors on a fresh machine, the root cause is often the install itself, not Claude Code. The platform-by-platform install guide covers the gotchas (Node version, PATH conflicts, WSL quirks) that cause most "exit code 1" reports. If the issue traces back to MCP startup, the MCP configuration playbook walks through scope hierarchy and debugging. Never set ANTHROPIC_API_KEY if you're on a Pro or Max OAuth subscription — pick one auth path and stick to it.

My contrarian take: Anthropic should split this into typed errors (E_AUTH_CONFLICT, E_SESSION_PARSE, E_MCP_BOOT) the way Go's net/http does with ErrServerClosed. Until then, claude doctor is mandatory triage — not optional.

"Claude Code process exited with code 3" — The Bun Runtime Tell

Exit code 3 is rarer than code 1 but more specific: it's almost always a runtime mismatch between Claude Code's bundled Bun interpreter and your environment. Anthropic shipped Bun 1.2.23 inside the v2.x binaries as of May 2026 (check claude doctor output for the version your binary is actually carrying), and on Windows after a password change the credential-manager handshake throws ENOTCONN and crashes with exit 3 (Issue #9217, #35990).

$ claude
Claude Code process exited with code 3
TLSWrapError: ENOTCONN

Root causes. Node below 18 (the legacy installer's minimum), an unexported ANTHROPIC_API_KEY referenced but empty in the current shell, or arm64 quirks. The latter hits Jetson Orin developers in particular, where Bun's TLS event loop can hang indefinitely (Issue #58680).

node --version                       # must be >= 18 (the legacy installer's minimum)
claude doctor                        # confirms the runtime requirements

# Windows after a password change:
# 1. Open Credential Manager, delete every "Claude Code" entry
# 2. claude /logout && claude /login

If you're on Linux arm64 and Bun keeps hanging, the workaround is the legacy npm installer (npm install -g @anthropic-ai/claude-code-legacy) which uses the Node runtime instead.

The Windows password-change trap. I hit exit code 3 on a Windows laptop two days after rotating my domain password. The credential manager had cached an OAuth-derived token signed under the old credential, the Bun TLS handshake threw ENOTCONN, and Claude Code refused to start. Took me an hour to trace because every guide tells you to check Node. Deleting the Claude Code entries from Credential Manager and re-running /login fixed it in 30 seconds. Now I add it to my password-rotation checklist.

"API Error: 400" — Which Variant Did You Hit?

The 400 family is the most-Googled Claude Code error category, and approximately none of the matches you get on the first page are correct. Almost every 400 is a CLI-version or third-party-gateway issue, not a malformed body. The official error reference lists six distinct sub-cases. Here's the mapping I use:

Error text fragment	Real cause	Fix
`Extra inputs are not permitted ... context_management`	LLM gateway dropping the `anthropic-beta` header	Configure gateway to forward it, or `export CLAUDE_CODE_DISABLE_EXPERIMENTAL_BETAS=1`
`unexpected tool_use_id` / `thinking blocks ... cannot be modified`	Conversation history desync after an interrupted tool call	Press Esc twice or `/rewind` to checkpoint back
`thinking.type.enabled is not supported`	CLI older than v2.1.111 sending the old config to Opus 4.7	`claude update`
`max_tokens must be greater than thinking.budget_tokens`	`MAX_THINKING_TOKENS` set above the platform's output cap	Lower `MAX_THINKING_TOKENS` or raise `CLAUDE_CODE_MAX_OUTPUT_TOKENS`
`This organization has been disabled`	Stale `ANTHROPIC_API_KEY` from a disabled org masking your subscription	`unset ANTHROPIC_API_KEY`
400 + `top_p` / `top_k` / `temperature`	Opus 4.7 removed sampling params; older CLI still sending them	Upgrade to 2.1.70+

The 2.1.70 changelog confirms this directly: "Fixed API 400 errors when using ANTHROPIC_BASE_URL with a third-party gateway" and "Fixed API Error: 400 This model does not support the effort..." (Claude Code changelog, 2026). If you can't tell which variant you hit, the universal first move is claude update followed by /rewind — that single pair fixes about two-thirds of 400 reports.
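
The table lends itself to a lookup you can paste an error message into. This Python sketch only encodes the rows listed above plus the fallback from the previous paragraph; `diagnose_400` is a hypothetical helper, and the substring matching is deliberately naive.

```python
# (fragment to look for, real cause, fix), taken from the table in the text.
KNOWN_400_VARIANTS = [
    ("context_management", "Gateway dropping the anthropic-beta header",
     "Forward the header, or export CLAUDE_CODE_DISABLE_EXPERIMENTAL_BETAS=1"),
    ("unexpected tool_use_id", "History desync after an interrupted tool call",
     "Press Esc twice or /rewind to a checkpoint"),
    ("thinking blocks", "History desync after an interrupted tool call",
     "Press Esc twice or /rewind to a checkpoint"),
    ("thinking.type.enabled", "CLI too old for the model's thinking config",
     "claude update"),
    ("max_tokens must be greater than thinking.budget_tokens",
     "MAX_THINKING_TOKENS set above the output cap",
     "Lower MAX_THINKING_TOKENS or raise CLAUDE_CODE_MAX_OUTPUT_TOKENS"),
    ("this organization has been disabled",
     "Stale ANTHROPIC_API_KEY from a disabled org", "unset ANTHROPIC_API_KEY"),
    ("top_p", "Older CLI still sending sampling params", "Upgrade the CLI"),
    ("top_k", "Older CLI still sending sampling params", "Upgrade the CLI"),
    ("temperature", "Older CLI still sending sampling params", "Upgrade the CLI"),
]


def diagnose_400(error_text: str) -> tuple:
    """Return (cause, fix) for the first known fragment found in the message."""
    lowered = error_text.lower()
    for fragment, cause, fix in KNOWN_400_VARIANTS:
        if fragment.lower() in lowered:
            return cause, fix
    # Fallback suggested in the text when the variant is unclear.
    return "Unknown variant", "claude update, then /rewind"
```

Matching is case-insensitive and stops at the first hit, so more specific fragments are listed before generic ones like `temperature`.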

I hit the tool_use_id variant for the first time last March on a session where I'd Ctrl-C'd Claude mid-tool-call to reword my prompt. The next message threw a 400 every time. /rewind fixed it in one shot. Now I never Ctrl-C during a tool call — I let it finish, then redirect.

"API Error: Connection error" — It's Almost Always the Proxy

According to Anthropic's docs, the connection-error family surfaces as ECONNREFUSED, ECONNRESET, ETIMEDOUT, fetch failed, or Request timed out (error reference, 2026). In a corporate network, the #1 cause by a wide margin is TLS interception by a proxy whose CA isn't in Node's trust store — the cert from the proxy looks self-signed, Node bails.

API Error: Connection error
  cause: Error: self-signed certificate in certificate chain
  code: 'SELF_SIGNED_CERT_IN_CHAIN'

Other real causes I've seen. A region-restricted VPN blocking api.anthropic.com (Issue #30318). A missing HTTPS_PROXY (lowercase https_proxy sometimes isn't picked up). The May 2026 incident where Anthropic changed outbound IPs and broke GitHub Enterprise allowlists for Claude Code remote sessions (status.anthropic.com, 2026). And — my favorite — a stale utun interface left behind by a VPN client I'd uninstalled six months earlier.

# Step 1: confirm the network path works at all
curl -I https://api.anthropic.com    # any HTTP status line back means the network path works

# Step 2: secure proxy setup (do this — don't disable TLS verification)
export NODE_EXTRA_CA_CERTS=/path/to/corp-ca-bundle.pem
export HTTPS_PROXY=http://proxy.corp:8080

# Step 3: bump timeout for slow corp networks (default is too tight)
export API_TIMEOUT_MS=900000

# Anti-pattern — DO NOT do this
# export NODE_TLS_REJECT_UNAUTHORIZED=0
# The proxy can now MitM your API key in plaintext

The security note matters. Disabling TLS verification with NODE_TLS_REJECT_UNAUTHORIZED=0 is the single most common "fix" suggested on Stack Overflow for this error, and it's actively dangerous — every byte of your prompts and your API key is now legible to the proxy operator. Get the CA bundle from your infra team. It's worth the 10-minute Slack message.
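
The same checks can be run against your shell environment before you start a session. This is a hedged Python sketch; `audit_proxy_env` is a hypothetical helper that only inspects the variables named in this section.

```python
import os


def audit_proxy_env(env: dict) -> list:
    """Return human-readable warnings about proxy and TLS settings."""
    findings = []
    if env.get("NODE_TLS_REJECT_UNAUTHORIZED") == "0":
        findings.append("NODE_TLS_REJECT_UNAUTHORIZED=0 disables TLS verification; unset it")
    if "https_proxy" in env and "HTTPS_PROXY" not in env:
        findings.append("only lowercase https_proxy is set; also export HTTPS_PROXY")
    ca_bundle = env.get("NODE_EXTRA_CA_CERTS")
    if ca_bundle and not os.path.isfile(ca_bundle):
        findings.append(f"NODE_EXTRA_CA_CERTS points at a missing file: {ca_bundle}")
    return findings


if __name__ == "__main__":
    for finding in audit_proxy_env(dict(os.environ)):
        print("warning:", finding)
```

An empty list does not prove the proxy is configured correctly. It only rules out the three mistakes described above.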

A second class of connection failures is genuinely server-side. Anthropic's status page records the May 2026 IP-range change that broke GitHub Enterprise allowlists for hours (status.anthropic.com, 2026), and third-party trackers like StatusGator have logged over a thousand incident events since the platform launched. If curl -I https://api.anthropic.com succeeds and Claude Code still can't reach the API, check the status page before you tear apart your network config.

"OAuth error 403" — Your Token Is Missing a Scope

The 403 family is the price you pay for Claude Code's OAuth-first auth model. When it works, it's seamless. When it fails, it fails in ways that are confusing because the same status code covers four different problems: missing scopes, no Pro/Max subscription, daemon eviction, and organization access changes.

OAuth Error: 403 Forbidden
  scope_required: user:sessions:claude_code
  scope_granted: user:profile

The dominant cause for macOS Max subscribers is that the token was minted before the user:sessions:claude_code scope existed, and OAuth refresh keeps dropping it (Issue #34785, #28583). The fix is brutally simple: running /logout and then /login re-mints the token with current scopes. The docs are quiet about why this works.

# The universal 403 fix
/logout
/login

# Desktop app stuck in an OAuth 403 spinner loop
rm -rf ~/Library/Application\ Support/Claude/
# then relaunch and re-auth

Other 403 causes worth knowing. Free tier doesn't get Claude Code — Pro ($20) is the minimum paid tier, and all paid tiers (Pro, Max $100, Max $200) include full Opus 4.7 access (Anthropic support, 2026); a 403 isn't a "you need to upgrade to Max for Opus" message — it's an auth-scope or subscription-status problem. A personal Stripe subscription on a business email domain triggers org-level scope restrictions server-side. And on Max plans tied to a personal Gmail, the Remote Control daemon can drop with "no longer a member of the organization" after long idle periods (Issue #53635).

"Error Editing File" — CRLF Will Ruin Your Afternoon

The Edit-tool failures cluster around three root causes, and one of them is responsible for nearly every Windows report.

Error: The string to replace was not found in the file.
  file: src/app.tsx
  old_string: "const App = () => {\n  return <Layout>"
  reason: file content did not match

Cause 1 — CRLF vs LF line endings (Windows). The Edit tool doesn't normalize \r\n versus \n before matching. An old_string with LF won't match a file saved as CRLF. This single root cause shows up in Issue #13456 and #27718 and is the #1 source of edit failures on Windows.

Cause 2 — File modified mid-edit. Your editor's format-on-save (Prettier, Ruff, gofmt) rewrites the file between Claude's Read and Edit calls. The content drifts. The Edit fails.

Cause 3 — BOM or UTF-16 encoding. Invisible byte-order marks break exact-string matches even when the visible content looks identical.
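
Causes 1 and 3 are both invisible in most editors, so it helps to inspect the raw bytes. The sketch below is plain Python using the standard `codecs` constants; `describe_encoding` is a hypothetical helper, and its line-ending counts are only meaningful for UTF-8 or ASCII files.

```python
import codecs


def describe_encoding(data: bytes) -> dict:
    """Report a leading byte-order mark and the CRLF / bare-LF line counts."""
    if data.startswith(codecs.BOM_UTF8):
        bom = "utf-8"
    elif data.startswith((codecs.BOM_UTF16_LE, codecs.BOM_UTF16_BE)):
        bom = "utf-16"
    else:
        bom = None
    crlf = data.count(b"\r\n")
    # Every CRLF also contains an LF, so subtract to get bare LF endings.
    lf = data.count(b"\n") - crlf
    return {"bom": bom, "crlf": crlf, "lf": lf}
```

Pass it `Path("src/app.tsx").read_bytes()`: a non-zero `crlf` count or a non-`None` `bom` explains an Edit mismatch even when the visible text looks identical.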

# Force LF in your repo (the durable fix)
git config --global core.autocrlf input
echo "* text=auto eol=lf" > .gitattributes
git add --renormalize .
git commit -m "Normalize line endings"

# Disable format-on-save while Claude Code is running
# VS Code: settings.json → "editor.formatOnSave": false

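# Separate issue: if the bundled ripgrep won't run on your platform
# (Alpine, NixOS, other non-glibc distros), install a system ripgrep:
apk add ripgrep                     # Alpine
brew install ripgrep                # macOS
sudo apt install ripgrep            # Ubuntu
export USE_BUILTIN_RIPGREP=0        # use the system binary instead of the bundled one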

The official troubleshooting page confirms: "Claude usually succeeds on retry" for these errors (Anthropic Troubleshooting, 2026). When stuck, the second-attempt success rate is high enough that retrying once before debugging is the right move.

"Claude Code Is Too Slow" — It's the MCP Tools, Not the Model

When Claude Code feels sluggish, every reflex tells you to blame the model. Switch from Opus to Sonnet. Wait for the next release. Submit feedback. None of that is the actual lever. Community measurements show MCP tool definitions can eat tens of thousands of tokens before your first prompt; one well-documented case clocked 66,000 tokens just on tool definitions before any user work began (Scott Spence, 2026). With Anthropic's 1M context window now standard for Claude Code, that 66K is "only" ~6.6% of the window — but the cost is real on every single turn: cache invalidations, slower routing, and tokens you paid for whether you used them or not. A bigger window dilutes the percentage, not the cost.

The real fixes, in order of effectiveness:

/context              # audit what's eating tokens THIS session
/mcp disable <name>   # remove every MCP server you aren't using right now
/compact              # before you hit the limit, not after
/clear                # nuclear: start fresh with same project
/model sonnet         # 2-3x faster than Opus for routine coding
/heapdump             # if memory pressure, writes JS heap to ~/Desktop

If you're on WSL, move the project off /mnt/c/. Reading from the cross-mount is roughly 10x slower than the Linux filesystem and triggers "Autocompact is thrashing" in long sessions. The official troubleshooting page documents this explicitly.
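
A quick way to catch this before starting a long session is to test the working directory. A minimal sketch, assuming the default WSL layout where Windows drives are mounted at /mnt/<letter>; is_windows_mount is a hypothetical helper name.

```shell
# Returns 0 when the given path sits on a Windows drive mounted into WSL.
is_windows_mount() {
  case "$1" in
    /mnt/[a-z]|/mnt/[a-z]/*) return 0 ;;
    *) return 1 ;;
  esac
}

if is_windows_mount "$PWD"; then
  echo "This project is on a Windows mount; consider moving it under your Linux home directory."
fi
```

The check is purely string-based, so it will not notice custom mount points configured in wsl.conf.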

The contrarian framing. Most "Claude Code is slow" advice tells you to switch models. The model is rarely the bottleneck. MCP tool bloat is — and unlike model latency, it's something you control. Audit .claude/settings.json, scope MCP servers to the projects that actually need them, and you'll get more speed back than any model swap delivers.

why deterministic context layers matter as much as the model choice

"Approaching 5-Hour Usage Limit" — How the Two-Tier Cap Actually Works

The rate-limit story changed twice in less than a year. Anthropic first added weekly rate limits to Pro and Max plans on August 28, 2025, on top of the pre-existing 5-hour rolling session window. They estimated the change would affect "less than 5% of subscribers" at the time (TechCrunch, 2025). Then on May 13, 2026 they raised those weekly caps 50% across all paid tiers (in effect through July 13, 2026) and stopped publishing per-tier hour numbers; the marketing pages now describe Max in multipliers of Pro ("5x Pro", "20x Pro") rather than absolute hour budgets (Anthropic, May 2026). The practical takeaway: on Pro the binding constraint is almost never the weekly cap any more. It's the 5-hour session window, especially on Opus.

For the full pricing breakdown, per-tier math, and the throttle log from my own five months of cycling Pro → Max → Pro: the complete pricing & limits guide, plan by plan, with first-hand throttle data.

The mental model that fixes 90% of the confusion: there are now three distinct limit messages, not one.

session limit · resets 3:45pm                    # 5-hour rolling window (the one you'll actually hit on Pro)
weekly limit · resets Mon 12:00am                # overall weekly cap, +50% since May 13 2026
Opus weekly limit · resets Mon 12:00am           # separate Opus-only bucket (Max plans surface this distinctly)
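
If you wrap Claude Code in scripts or scan logs, it helps to tell these three messages apart mechanically. A rough sketch, assuming the messages begin exactly as shown above; the wording may differ between versions, and classify_limit and its labels are hypothetical.

```shell
# Map a limit message to the bucket it refers to.
classify_limit() {
  case "$1" in
    "Opus weekly limit"*) echo "opus-weekly" ;;
    "weekly limit"*)      echo "weekly" ;;
    "session limit"*)     echo "session" ;;
    *)                    echo "unknown" ;;
  esac
}
```

For example, classify_limit "session limit · resets 3:45pm" prints session, which tells you to wait for the 5-hour window rather than the Monday reset.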

Critical mental model. Usage is weighted, not message-counted. Model choice, tool calls, context length, and extended thinking all multiply your burn rate. A short prompt that triggers heavy file reads can cost more than a long prompt with no tool calls. This is why the same Pro user can hit the weekly cap on day 3 in week one and day 6 in week two — the underlying work differed.

/usage              # see session %, weekly %, and reset times
/extra-usage        # buy on-demand additional usage (Pro & Max)
/model sonnet       # save your Opus budget for hard tasks

The undocumented goodwill reset. During the April 2026 regression Anthropic reset every subscriber's weekly limit as compensation, with no announcement beyond a status-page note. If you hit a wall during a known degradation window, check status.anthropic.com — they've quietly done it more than once.


Which Errors Hit You Most Often?

Looking back at six months of Claude Code triage across my own work and the public GitHub issue queue, the frequency distribution is more lopsided than I expected. About three-quarters of Claude Code support requests map to four error families.

The distribution explains something important: most published "Claude Code error" guides over-index on rate limits (visible, dramatic) and under-index on exit code 1 (boring, frequent). If you're a maintainer documenting your team's setup, spend your effort on the top four — they'll account for three out of every four tickets.

What Did the April 2026 Regression Actually Teach Us?

For roughly six weeks in March and April 2026, Claude Code felt measurably worse for a lot of users, and the discourse on Hacker News and Reddit became almost unmanageable. On April 23, Anthropic published a postmortem that's worth reading in full (Anthropic Engineering, 2026) because it names three specific decisions instead of waving at "model variance."

The first was an effort parameter default that quietly shifted from high to medium on March 4, downgrading reasoning depth for a class of agentic tasks. The second was a prompt-cache bug that cleared thinking traces between turns, forcing the model to re-derive context it had just established. The third, on April 16, was a verbosity-prompt change that made the model terser in ways that broke tool-use chains. Each looked benign in isolation; together they produced a month of "Claude is broken" reports — many of which surfaced in the Fortune cover story (2026).

What's unusual isn't that this happened; every model lab ships regressions. What's unusual is the transparency: dated changes, named causes, and a blanket reset of every subscriber's weekly limit as compensation. Compare with the GPT-4 "quality drift" discourse of 2023–2024 that never got a real postmortem and you can see why the response shifted the trust conversation.

The lesson I took from it. If your Claude Code session feels slower or dumber than yesterday, check the status page before you blame your config. Roughly one in ten of my "the model got worse" moments has correlated with a documented incident. Subscribing to status-page RSS does more for my sanity than any retry-loop tweak.

How Do You Prevent These Errors From Repeating?

The most expensive errors aren't the ones you can't fix — they're the ones that come back. After two years of running Claude Code in anger across three machines and four organizations, the prevention layer that actually matters is small enough to fit on a sticky note.

Pin Node ≥ 20 with nvm or fnm. Node 18 reached end-of-life in April 2025, so anything older than 20 is unsupported (Node.js Releases, 2026). Half the exit-code-3 reports come from stale system Node still pinned to 18 or earlier by package managers.
Pick one auth path and delete the others. Either OAuth via /login or ANTHROPIC_API_KEY. Never both. The conflict is the #1 cause of code-1 crashes.
Run claude doctor after every upgrade. It's the only thing that catches the silent ones (CSP regressions, MCP boot failures, missing scopes) before you waste an hour.
Normalize line endings repo-wide. One .gitattributes + core.autocrlf=input eliminates the entire "Edit failed" category on Windows.
Audit MCP servers monthly. Anything you haven't actively used in 30 days gets disabled. This single habit recovered me a third of my context budget.
Document NODE_EXTRA_CA_CERTS in your team onboarding. If you work behind a corporate proxy, this saves every new hire a day of "API connection error" frustration.
Subscribe to the status page RSS. A regression you can't fix is one you should at least know about — Anthropic's April 2026 postmortem confirms they're real and disclosed late.
Treat the 5-hour window as your unit of work, not the day. Plan sessions to wrap before reset. Switch to Sonnet for everything that doesn't need Opus.
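
Parts of the checklist above can be automated as a preflight script. This is a sketch under stated assumptions, not an official tool: node_major_at_least and preflight are hypothetical names, it only covers the Node, auth, and line-ending items, and it cannot see whether you are also logged in via /login.

```shell
# Returns 0 when a version string such as "v20.11.1" meets the minimum major version.
node_major_at_least() {
  version="${1#v}"         # strip the leading "v"
  major="${version%%.*}"   # keep everything before the first dot
  [ "$major" -ge "$2" ]
}

# Print a warning for each checklist item that looks wrong in the current directory.
preflight() {
  if command -v node >/dev/null 2>&1; then
    node_major_at_least "$(node --version)" 20 || echo "WARN: Node is older than 20"
  else
    echo "WARN: node not found on PATH"
  fi
  # The script cannot detect an OAuth login, so it only flags the API key.
  [ -z "${ANTHROPIC_API_KEY:-}" ] || echo "NOTE: ANTHROPIC_API_KEY is set; make sure you are not also using /login"
  [ -f .gitattributes ] || echo "WARN: no .gitattributes in this directory"
}
```

Run preflight from the repository root after an upgrade, alongside claude doctor. Silence means none of the three checks tripped.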

The Stack Overflow 2025 survey found that 75% of developers say "when I don't trust AI's answers" is their top reason for asking a human instead (Stack Overflow, 2025). Every silent failure is a withdrawal from a trust account that's already overdrawn. The prevention work above is the deposit side.

the multi-agent review setup that catches issues before they become Claude Code errors

Frequently Asked Questions

Why does `claude doctor` say I'm fine when I'm clearly not?

claude doctor checks environment preconditions (Node version, env vars, MCP config, OAuth token presence) but not runtime state. If your token's scopes are wrong or your session transcript is corrupted, doctor will report green while your prompt still crashes with code 1. Run /logout, then /login, and try a fresh session; that catches the runtime-state class of failure (Anthropic docs, 2026).

Should I disable `NODE_TLS_REJECT_UNAUTHORIZED` to fix connection errors?

No. Setting NODE_TLS_REJECT_UNAUTHORIZED=0 exposes your API key and every prompt to whatever proxy is in the path — including ones you didn't intend. Use NODE_EXTRA_CA_CERTS with your corporate CA bundle instead; it preserves TLS verification while letting Claude trust the proxy (Anthropic network config, 2026).
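
In practice the switch might look like the sketch below. It assumes your IT team has given you the corporate CA as a PEM file; the function name use_corporate_ca and any path you pass to it are hypothetical. NODE_EXTRA_CA_CERTS itself is a standard Node.js environment variable.

```shell
# Point Node at a corporate CA bundle and make sure TLS verification stays on.
use_corporate_ca() {
  ca_bundle="$1"
  # NODE_EXTRA_CA_CERTS expects PEM, so look for a PEM certificate header first.
  if grep -q -- '-----BEGIN CERTIFICATE-----' "$ca_bundle" 2>/dev/null; then
    export NODE_EXTRA_CA_CERTS="$ca_bundle"
    unset NODE_TLS_REJECT_UNAUTHORIZED   # never leave verification disabled
    echo "using CA bundle: $ca_bundle"
  else
    echo "not a readable PEM bundle: $ca_bundle" >&2
    return 1
  fi
}
```

Call it from your shell profile with the path to your bundle, then start claude from that shell so the variable is inherited.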

How do I know if I'm hitting the 5-hour limit versus the weekly limit?

Run /usage mid-session. The output shows session percentage (resets in hours), overall weekly percentage (resets Monday), and the separate Opus weekly bucket (Max plans only). All three are tracked independently — you can blow the Opus cap while the session and overall caps are healthy (Anthropic support, 2026).

Does Anthropic publish a status page I can subscribe to?

Yes — status.anthropic.com supports RSS and webhook subscriptions. Anthropic's April 23, 2026 engineering postmortem confirmed they now post regression timelines there (Anthropic Engineering, 2026), so it's worth subscribing if you're running Claude Code on a production schedule.

What's the fastest way to recover from a corrupted session?

Rename the offending .jsonl transcript out of the way: mv ~/.claude/projects/<encoded-cwd>/<session-uuid>.jsonl{,.bak} and restart. Claude Code creates a fresh transcript on next launch. If you've lost critical work, the .bak file is still valid JSONL and can be inspected with jq.
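
The brace-expansion form of that mv only works in shells such as bash and zsh. A portable sketch of the same move, with a guard so a mistyped path fails loudly; quarantine_transcript is a hypothetical helper name.

```shell
# Move a transcript aside as <name>.bak and print the new path.
quarantine_transcript() {
  transcript="$1"
  if [ ! -f "$transcript" ]; then
    echo "no such transcript: $transcript" >&2
    return 1
  fi
  mv -- "$transcript" "${transcript}.bak"
  echo "${transcript}.bak"
}
```

Pass it the full path of the session file. Afterwards, running jq -c . on the .bak file and discarding the output is one way to find where the JSON stops parsing, since jq should report the position of the first syntax error.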

Conclusion

Every error in this guide costs you somewhere between 30 seconds and an afternoon depending on whether you can match the message to its real cause. The hard part isn't fixing them — Anthropic's error reference covers about 90% of them with verified commands. The hard part is the lookup.

If you remember three things (claude doctor first, unset ANTHROPIC_API_KEY second, /logout then /login third), you'll resolve roughly half of all Claude Code failures before you finish reading the stack trace. Bookmark this page for the rest.

the deterministic hook layer that prevents most of these errors before they happen
