Discussion on: How to Structure Claude Code for Production: MCP Servers, Subagents, and CLAUDE.md (2026 Guide)

View post

The "75.6% on SWE-bench means 1 in 4 tasks fails" reframe is the line every team adopting Claude Code should internalize. Most planning docs I see still treat the benchmark as a ceiling instead of a baseline expected failure rate.

A few patterns that have held up in our own production setup, in case useful:

CLAUDE.md should be small and load-bearing, not a manual. Ours got to ~600 lines and the model started ignoring the back half. We cut it to ~150 lines of rules and pointers and moved the long-form context into skills that get loaded on demand. Quality jumped immediately.
Subagents are best when they have narrower context than the parent. Counterintuitively, a subagent given the full parent context tends to drift in the same direction the parent already drifted. Giving it a minimal, fresh prompt is usually a feature.
One MCP server per concern, even if it means more servers. Bundling "everything GitHub-related" into one server made tool selection harder for the model, not easier. Splitting issues, PRs, and code search into three made tool calls noticeably more accurate.

The Playwright + Supabase + GitHub combo you describe is also the one that finally made end-to-end "write code, verify it ran, file the edge case" work for us. The "without leaving the session" framing is the actual unlock — context-switching between tools is where most of the latency lived before.