Six weeks ago I started using Claude Code as my daily pair programmer. Not for demos, not for "look what AI can do" — for actual, ship-to-production work.
The internet is full of hot takes. Either AI will replace all developers, or it's a toy that writes broken code. After six weeks of real use, neither is true. Here's what actually works.
The Setup
I run Claude Code from my terminal with a few custom MCP servers for web search, file operations, and database queries. No IDE plugin, no chat interface — just /usr/bin/claude and a terminal split.
Key config decisions:
- Always start with a clear goal file (goal.md) before any session
- Use MCP tools for context (search docs, check logs, read schemas); raw Claude Code without tools hallucinates too much
- Break every task into 15-minute chunks max
Where It Shines
1. Refactoring with confidence. I had a 600-line Express.js route handler that had grown organically over two years. "Split this into controller/service/repository layers, preserve all existing behavior, don't break the session middleware." Claude Code did it in one pass. I reviewed the diff in 10 minutes. Zero regressions.
The trick: be specific about what NOT to change. "Don't touch the error handling pattern in lines 120-180" is better than "refactor this properly".
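For readers who haven't worked with the controller/service/repository split, here is a minimal sketch of the shape that kind of refactor aims for. It is not the original handler: every name in it (User, UserRepository, UserService, getUserController) is hypothetical, and the storage is an in-memory map so the example runs without Express or a database.

```typescript
// Hypothetical names throughout: User, UserRepository, UserService and
// getUserController are illustrative, not taken from the original handler.
// Storage is in-memory and synchronous to keep the sketch self-contained;
// a real repository would query a database and return promises.

interface User {
  id: string;
  email: string;
}

// Repository layer: the only code that knows how users are stored.
interface UserRepository {
  findById(id: string): User | undefined;
}

class InMemoryUserRepository implements UserRepository {
  private readonly users = new Map<string, User>();

  constructor(seed: User[]) {
    for (const user of seed) {
      this.users.set(user.id, user);
    }
  }

  findById(id: string): User | undefined {
    return this.users.get(id);
  }
}

// Service layer: business rules only, with no knowledge of HTTP.
class UserService {
  private readonly repo: UserRepository;

  constructor(repo: UserRepository) {
    this.repo = repo;
  }

  getUser(id: string): User | undefined {
    return this.repo.findById(id.trim());
  }
}

// Controller layer: translates between HTTP shapes and the service.
interface HttpRequest {
  params: Record<string, string>;
}

interface HttpResponse {
  status: number;
  body: unknown;
}

function getUserController(service: UserService, req: HttpRequest): HttpResponse {
  const id = req.params["id"];
  if (id === undefined || id.trim() === "") {
    return { status: 400, body: { error: "missing id" } };
  }
  const user = service.getUser(id);
  if (user === undefined) {
    return { status: 404, body: { error: "user not found" } };
  }
  return { status: 200, body: user };
}
```

Because each layer depends only on the interface of the one below it, a diff for this kind of refactor can be reviewed layer by layer, which is part of why a 10-minute review is realistic.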
2. Writing boilerplate I'd normally procrastinate. Database migrations, CRUD routes for a new model, API tests for an existing endpoint — these are the tasks I'd estimate as "2 hours" and take 4 because I get distracted. Claude Code finishes them in one shot with 90% accuracy. The last 10% is trivial fixes.
3. Debugging cryptic errors. A WASM WebGPU project was rendering a black screen with zero console errors. I pasted the shader code and build config into Claude. It spotted a missing @interpolate(flat) attribute in the WGSL vertex shader that I'd stared at for an hour. Saved me an evening.
Where It Falls Short
Complex multi-step logic. "Implement a new authentication flow with OAuth, session management, rate limiting, and audit logging" — Claude Code produces working code but misses edge cases. CSRF token rotation on password change. Rate limit headers in error responses. Things a senior dev knows from experience.
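To make the rate limit example concrete, here is a rough sketch of a fixed-window limiter whose rejected decisions still carry rate limit headers. The class and field names are hypothetical, the fixed-window strategy is an assumption chosen for brevity, and the X-RateLimit-* header names are a common convention rather than a standard. Retry-After is a standard HTTP header.

```typescript
// Hypothetical sketch: FixedWindowLimiter and RateLimitDecision are
// illustrative names, not part of any framework.

interface RateLimitDecision {
  allowed: boolean;
  // Headers to attach to the response, including the 429 error response.
  headers: Record<string, string>;
}

class FixedWindowLimiter {
  private readonly limit: number;
  private readonly windowMs: number;
  private readonly windows = new Map<string, { start: number; count: number }>();

  constructor(limit: number, windowMs: number) {
    this.limit = limit;
    this.windowMs = windowMs;
  }

  // nowMs is passed in so the behavior is deterministic and easy to test.
  check(key: string, nowMs: number): RateLimitDecision {
    let win = this.windows.get(key);
    if (win === undefined || nowMs - win.start >= this.windowMs) {
      // First request for this key, or the previous window has expired.
      win = { start: nowMs, count: 0 };
      this.windows.set(key, win);
    }

    const allowed = win.count < this.limit;
    if (allowed) {
      win.count += 1;
    }

    const remaining = Math.max(0, this.limit - win.count);
    const headers: Record<string, string> = {
      "X-RateLimit-Limit": String(this.limit),
      "X-RateLimit-Remaining": String(remaining),
    };
    if (!allowed) {
      // The easy-to-miss part: tell the client when it may retry.
      const secondsLeft = Math.ceil((win.start + this.windowMs - nowMs) / 1000);
      headers["Retry-After"] = String(secondsLeft);
    }
    return { allowed, headers };
  }
}
```

The point of returning headers from the same function that makes the decision is that the error path cannot forget them; a handler that builds its 429 response separately is where this edge case usually slips through.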
Design decisions. Claude Code is great at "how" but bad at "whether". Should we use WebSocket or SSE for this feature? Should this be a new microservice or an endpoint in the existing API? It will happily implement whichever approach you suggest, even if it's wrong.
Large refactors with no tests. Without a test suite to validate against, Claude Code makes plausible-looking changes that break silently. Always have at least one integration test that exercises the critical path before letting it loose.
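When no integration test exists yet, one cheap stopgap is a characterization check: run the same inputs through the old and the new implementation and list every disagreement. This is a simpler technique than a real integration test and does not replace one, since it exercises no real infrastructure. The sketch below assumes both versions can be called as plain functions; legacyTotal and refactoredTotal are hypothetical stand-ins, with a deliberate bug planted in the second.

```typescript
// Hypothetical sketch of a characterization check. Outputs are compared as
// JSON strings, so it only suits JSON-serializable results and is sensitive
// to object key order.

interface Mismatch<I> {
  input: I;
  before: string;
  after: string;
}

type Outcome<O> = { ok: true; value: O } | { ok: false; error: string };

// Thrown errors are part of observable behavior, so record them as well.
function capture<O>(run: () => O): Outcome<O> {
  try {
    return { ok: true, value: run() };
  } catch (err) {
    return { ok: false, error: err instanceof Error ? err.message : String(err) };
  }
}

function characterize<I, O>(
  legacy: (input: I) => O,
  refactored: (input: I) => O,
  inputs: I[],
): Mismatch<I>[] {
  const mismatches: Mismatch<I>[] = [];
  for (const input of inputs) {
    const before = JSON.stringify(capture(() => legacy(input)));
    const after = JSON.stringify(capture(() => refactored(input)));
    if (before !== after) {
      mismatches.push({ input, before, after });
    }
  }
  return mismatches;
}

// Stand-in for the code before the refactor.
function legacyTotal(prices: number[]): number {
  let total = 0;
  for (const price of prices) {
    total += price;
  }
  return total;
}

// Stand-in for the refactored code. Deliberate bug: the first price is skipped.
function refactoredTotal(prices: number[]): number {
  return prices.slice(1).reduce((sum, price) => sum + price, 0);
}

const silentBreakage = characterize(legacyTotal, refactoredTotal, [[], [1], [1, 2]]);
console.log(`${silentBreakage.length} input(s) behave differently after the refactor`);
```

The refactored version looks plausible and returns the right answer for an empty list, which is exactly the kind of change that passes a casual read of the diff.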
The Workflow That Works
My current routine:
- Goal.md first — Write 3-5 sentences about what needs to happen, including explicit constraints
- Claude Code takes the first pass — I review the diff, not the conversation
- I fix the subtle stuff — Error handling, edge cases, naming conventions
- Commit with a good message — Claude Code writes the commit message boilerplate, I add the "why"
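The first step of that routine can be checked mechanically. The helper below is a hypothetical sketch, not something Claude Code provides: it flags a goal draft that falls outside the 3-5 sentence range or that contains no wording that looks like a constraint. Both the sentence splitter and the keyword list are rough assumptions.

```typescript
// Hypothetical helper: the sentence splitter and the constraint keywords are
// rough heuristics chosen for illustration, not a rule from any tool.
function checkGoal(goal: string): string[] {
  const problems: string[] = [];

  // Treat a run of ., ! or ? followed by whitespace or end of text as a sentence end.
  const sentences = goal
    .split(/[.!?]+(?:\s+|$)/)
    .filter((part) => part.trim().length > 0);
  if (sentences.length < 3 || sentences.length > 5) {
    problems.push(`expected 3-5 sentences, found ${sentences.length}`);
  }

  // Look for wording that states what must stay untouched.
  const constraintPattern = /\b(do not|don't|must not|never|preserve|without)\b/i;
  if (!constraintPattern.test(goal)) {
    problems.push("no explicit constraint found");
  }

  return problems;
}

const draftGoal =
  "Split the orders route into controller, service and repository layers. " +
  "Preserve all existing behavior. " +
  "Don't touch the session middleware.";
console.log(checkGoal(draftGoal));
```

A check like this says nothing about whether the goal is a good one; it only catches the two failure modes the routine names, a goal that is too thin or too sprawling, and a goal with no stated limits.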
The biggest productivity gain isn't speed. It's flow state. When Claude Code handles the mechanical parts, I stay in the zone for the parts that matter — architecture decisions, edge case thinking, code review.
The Honest Bottom Line
Claude Code saves me about 2-3 hours per day, mostly on tasks I'd procrastinate anyway. It doesn't eliminate the need for senior engineers — it makes them more effective by removing the grunt work.
The developers who get the most value aren't the ones who use it for everything. They're the ones who know exactly when to use it and when to type the code themselves.
I've been collecting the specific prompts and workflows that work best in practice. If you're curious, I wrote up my complete set of reusable Claude Code configurations and MCP server setups.
Top comments (2)
Exactly the trap I've hit too. Integration tests that hit real infra catch a completely different class of bugs than unit tests. I've started writing at least one "smoke test" per service that exercises the real DB before letting Claude Code loose on refactors — catches the silent breakage every time.
Yeah, the only ones that have actually caught a bad refactor for me are the integration tests that hit the real database. The unit tests stay green because nothing underneath them moved. And if you let the agent write the test too it'll usually stub the dependency, so you're back to a green suite that isn't exercising the thing that broke.
prickles.org/tenet/real-dependency...