Structured AI development — using Claude Code with MCP servers, custom subagents, and project-scoped configuration — produces measurably more reliable software than ad-hoc "vibe coding," with 1.7x fewer defects and 2.74x fewer security vulnerabilities according to a CodeRabbit analysis of 470 open-source PRs (December 2025).
Most developers stop at "drop Claude Code into the terminal and type." That's leaving 80% of the tool on the table. This guide covers the full production setup: MCP server integration, CLAUDE.md project memory, custom slash commands, and specialized subagents — everything you need to move from prototype to repeatable pipeline.
## Why Vibe Coding Has a Documented Failure Rate
"Vibe coding" — Andrej Karpathy's term for describing what you want in natural language and accepting whatever the LLM generates — has a real-world reliability problem. Claude Opus 4.6 scores 75.6% on SWE-bench, the benchmark that tests real GitHub issues requiring multi-file edits and passing existing test suites. That's the best available model. It means one in four production tasks fail without intervention.
HumanEval benchmarks routinely show 90%+ pass rates and mean nothing for production. SWE-bench is the honest number — multi-file, existing codebase, real test suite. Design your workflows around the failure cases, not the averages.
The production fix isn't waiting for better models. It's building workflows that catch and handle the failures.
## What CLAUDE.md Actually Does (and Why It's the Foundation)
CLAUDE.md is persistent project memory — instructions loaded into every Claude Code conversation automatically.
Two locations matter:
- **Global:** `~/.claude/CLAUDE.md` — personal preferences, forbidden patterns, global conventions
- **Project:** `<project>/.claude/CLAUDE.md` (check this into the repo) — tech stack, database schema notes, API endpoint references, architectural decisions
Project scope overrides global on conflicts. If your global CLAUDE.md says "use TypeScript strict mode" but your project CLAUDE.md says "this repo uses CommonJS, no strict mode," Claude Code will follow the project instruction.
A production CLAUDE.md covers:
- Tech stack with exact versions (`Next.js 15 App Router, PostgreSQL via Neon, Prisma 7`)
- Forbidden patterns (never use `eval()`, no string concatenation in SQL)
- Directory conventions (feature logic in `/lib`, not `/utils`)
- API endpoints Claude should know about
- How to run tests locally
This file eliminates the "Claude forgot what we decided" problem. Every session starts with the same context.
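A minimal example of the shape, using a hypothetical stack (the versions and paths here are placeholders, not recommendations):

```markdown
# CLAUDE.md

## Stack
- Next.js 15 (App Router), TypeScript strict mode
- PostgreSQL via Neon, Prisma 7

## Forbidden
- Never use eval() or string-concatenated SQL
- Never put feature logic in /utils; it lives in /lib

## Testing
- Run `npm run test` before proposing any commit
```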
## MCP Servers: How Claude Code Connects to Your Stack
Model Context Protocol (MCP) is the integration layer between Claude Code and external tools. Claude Code acts as the MCP client; each server exposes capabilities like database access, browser automation, or API calls. The `claude mcp add` command is the entry point — stdio servers need the command that launches them (the package names below are illustrative):

```shell
claude mcp add --scope user github -- npx -y @modelcontextprotocol/server-github
claude mcp add --scope project playwright -- npx @playwright/mcp@latest
claude mcp list
```
User-scoped server config lives in `~/.claude.json`; project-scoped servers go in `.mcp.json` at the repo root. (The `~/Library/Application Support/Claude/claude_desktop_config.json` path on macOS belongs to the Claude Desktop app, not Claude Code.)
Three transport types matter:
- stdio — local processes, best for filesystem and direct system access
- HTTP — remote servers, recommended for cloud services like Supabase
- SSE — deprecated, don't use
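Project-scoped servers are checked into the repo as `.mcp.json` so the whole team shares them. A sketch of the file's shape using the stdio transport (server packages and the connection string are illustrative):

```json
{
  "mcpServers": {
    "playwright": {
      "command": "npx",
      "args": ["@playwright/mcp@latest"]
    },
    "postgres": {
      "command": "npx",
      "args": ["-y", "@modelcontextprotocol/server-postgres", "postgresql://localhost/dev"]
    }
  }
}
```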
The MCP servers worth adding for production SaaS work:
GitHub MCP — PR reviews, issue creation, and repo management without leaving your terminal. No more switching context to a browser for a PR status.
Playwright MCP — End-to-end testing via the accessibility tree (no screenshots needed). Runs across Chromium, Firefox, and WebKit. Scope it to your project, not globally.
```shell
claude mcp add --scope project playwright -- npx @playwright/mcp@latest
```
Supabase MCP — Direct line to your database, auth, storage, and edge functions. Configure it project-scoped with your project ref:
```
Server URL: https://mcp.supabase.com/mcp?project_ref=<your-ref>
Auth: OAuth
```
PostgreSQL MCP — Direct SQL queries for non-Supabase setups.
Combining three or more MCP servers in one session eliminates context-switching. Claude can write a feature, verify it against the database schema, create a GitHub issue for the edge case it found, and run the Playwright test suite — without leaving the session.
## Custom Slash Commands: Reusable Workflows in One File
Every `.md` file in `.claude/commands/` becomes a `/command-name` slash command. These are reusable prompts for work you do repeatedly.
```
.claude/
  commands/
    review-pr.md     → /review-pr
    seed-db.md       → /seed-db
    deploy-check.md  → /deploy-check
```
A `deploy-check.md` might contain:

```markdown
Run the following before any deployment:

1. Check for hardcoded API keys in $ARGUMENTS or the staged diff
2. Verify database migrations are in sync with the Prisma schema
3. Run `npm run test` and confirm zero failures
4. Check that environment variable names in .env.example match what the app expects

Report any issues found. If all pass, output "CLEAR TO DEPLOY."
```
Now `/deploy-check` is a repeatable gate. The `$ARGUMENTS` placeholder lets you pass a branch name or PR number.
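Step 4 of that checklist can also be enforced outside of Claude entirely, in CI. A minimal Python sketch of the idea, with one simplifying assumption: it only catches literal `process.env.FOO` reads in source text, not dynamic lookups.

```python
import re


def env_example_names(example_text: str) -> set[str]:
    """Variable names declared in the text of a .env.example file."""
    names = set()
    for line in example_text.splitlines():
        line = line.strip()
        # Skip blanks and comments; everything else should be NAME=value
        if line and not line.startswith("#") and "=" in line:
            names.add(line.split("=", 1)[0].strip())
    return names


def referenced_env_names(source: str) -> set[str]:
    """Names read via a literal process.env.FOO in JS/TS source text."""
    return set(re.findall(r"process\.env\.([A-Z][A-Z0-9_]*)", source))


def missing_from_example(example_text: str, source: str) -> set[str]:
    """Variables the app reads that .env.example never mentions."""
    return referenced_env_names(source) - env_example_names(example_text)
```

Run it over the concatenated source tree in a pre-deploy job and fail the build when the returned set is non-empty.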
## Subagents: Specialized Workers With Their Own Scope and Memory
Subagents are the most underused feature in Claude Code. They're defined as Markdown files in `.claude/agents/` with YAML frontmatter:

```markdown
---
name: api-tester
description: Tests REST endpoints, validates response schemas, catches regressions
tools: [Bash, Read, WebFetch]
model: sonnet
memory: .claude/memory/api-tester/
---

You are a specialized API testing agent. When invoked, you:

1. Read the OpenAPI spec from /docs/api.yaml
2. Test each endpoint against the running dev server
3. Compare responses against the expected schema
4. Report failures with the specific request, expected output, and actual output
```
Key frontmatter fields:
- `tools` — restrict to only what this agent needs. An api-tester doesn't need Edit or Write access.
- `model` — use `haiku` for fast/cheap tasks, `sonnet` for balanced work, `opus` for architectural review
- `memory` — persistent directory that survives across conversations
Claude Code can run up to 10 simultaneous subagents (2026). A three-stage production pipeline looks like:
1. **`pm-spec` agent** — reads task input, writes a structured spec with acceptance criteria
2. **`architect-review` agent** — validates the spec against platform constraints, produces a decision record
3. **`implementer-tester` agent** — writes code and tests, updates documentation
The orchestrator (Claude Code itself) coordinates the three. Each agent has limited tool access — the spec agent can only Read and Write to docs, the implementer can Bash and Edit. Principle of least privilege in AI agents is not theoretical.
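The first stage of that pipeline might look like the sketch below. The name, paths, and tool list are assumed shapes for illustration, not a prescribed layout:

```markdown
---
name: pm-spec
description: Turns a task description into a structured spec with acceptance criteria
tools: [Read, Write]
model: sonnet
---

When invoked with a task description, write a spec file to docs/specs/ containing:
scope, constraints, acceptance criteria, and an explicit out-of-scope list.
Do not write any application code.
```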
## Context Management: The 200K Token Budget
Claude Code has a 200,000-token context window. That sounds enormous. It isn't, once you factor in file contents, tool outputs, and conversation history.
Three levers for long sessions:
Plan mode halves token consumption by reducing back-and-forth generation. Use it at the start of any complex multi-file task — Claude maps the work before executing.
Multi-session splitting — break large features into targeted sessions. "Add Stripe webhooks" is one session. "Refactor the billing service" is a separate session. Don't carry unrelated context.
Context editing (2026 feature) — automatically clears stale tool call outputs while preserving conversation flow. In a 100-turn evaluation, this cut token consumption by 84% while completing workflows that previously failed from context exhaustion.
Use `/clear` between unrelated tasks. Use `--continue` to resume a previous session rather than re-establishing context from scratch.
## The Production Pattern That Actually Works
Specification-first, then AI execution. Before touching Claude Code on any non-trivial feature:
1. Write a structured spec in `CLAUDE.md` or a dedicated spec file — include scope, constraints, acceptance criteria, what NOT to do
2. Open Claude Code in Plan mode, share the spec, ask for the implementation plan before any code is written
3. Review the plan. Make architectural decisions yourself — Claude proposes, you approve
4. Execute the plan with the appropriate subagent or command, with Playwright MCP watching for regressions
The hybrid rule for what to automate versus hand-code: vibe code the repetitive, well-understood parts (CRUD endpoints, data transformation, form validation). Hand-code the novel, security-sensitive, or architecturally critical parts.
An AI-generated security check that fails one in four times is not a security check.
## Key Takeaways
- **CLAUDE.md is persistent memory** — project rules in `.claude/CLAUDE.md` define behavior across every session; commit it to the repo so your whole team uses the same context
- **MCP servers eliminate context-switching** — GitHub + Playwright + Supabase in one session means Claude can write, test, and track work without leaving the terminal
- **Subagents enforce least privilege** — specialized agents with restricted tool access are more reliable than one omnipotent session; define `tools` explicitly in frontmatter
- **Context degrades on long sessions** — Plan mode, multi-session splitting, and context editing are the levers; a full context window is not a feature, it's a warning
- **SWE-bench 75.6% means 1-in-4 failures** — design human checkpoints at architectural decision boundaries, not after deployment
## What This Means for Builders
- Start every project with a `CLAUDE.md` that includes forbidden patterns, stack versions, and database schema notes — this single file eliminates 80% of "Claude forgot" problems
- Add `playwright` and `github` MCP servers scoped to the project before writing your first feature, so regression tests run automatically during development
- Build subagents for work you do more than twice a week — PR review, database seeding, pre-deploy checks — and commit them to `.claude/agents/` with restricted tool access
- When Claude's output surprises you (wrong framework assumption, incorrect schema reference), fix the CLAUDE.md instead of re-prompting — fix the context, not the conversation
Built with IntelFlow — open-source AI intelligence engine. Set up your own daily briefing in 60 seconds.