TL;DR
Measured reality in Claude Code sessions:
- MCP LangSmith tools → 16,100 tokens always loaded (≈8% of 200k context)
- langsmith-cli as Skill → 91 tokens when activated, 0 tokens when idle
- Difference: 177× less context overhead
- Installation: ~30 seconds vs typically 15+ minutes
- Field pruning: up to 95% token reduction on responses
- Startup: 43–87 ms (cold / warm)
Skills win for the majority of stateless AI tooling operations.
The Context Tax – Measured Reality
Right now, in my Claude Code session, the LangSmith MCP tools are consuming:
MCP tools · /mcp
├ mcp__langsmith__run_experiment 3.2k tokens
├ mcp__langsmith__push_prompt 2.8k tokens
├ mcp__langsmith__fetch_runs 2.2k tokens
...
└ mcp__langsmith__get_prompt_by_name 146 tokens
TOTAL: 16,100 tokens (≈8% of 200k context window)
These definitions are permanently loaded — even if I never touch LangSmith during the entire conversation.
The same functionality implemented as a Skill (subprocess-based CLI):
Skills · /skills
├ commit-commands:clean_gone 46 tokens
├ agent-sdk-dev:new-sdk-app 19 tokens
...
TOTAL when activated: 91 tokens (0.045% of context)
Inactive: 0 tokens
A 177× difference.
Not an estimate: these are actual numbers from the /context command.
Why Does This Matter? Context Economics
| Item | Cost / size (Claude Opus 4.5) | Impact |
|---|---|---|
| Input tokens | $15 / million | 16.1k tokens of tool definitions ≈ $0.24 of pure overhead per query |
| 200k context window | shared resource | 8% permanently occupied |
| 3 typical MCP servers | ~36–48k tokens | 18–24% of the context window gone |
| Freed context (35k+ tokens) | — | ≈30 pages of docs, 500+ LOC, or long conversation history |
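The per-query figure in the first row is just tokens times price; a quick back-of-envelope check using the token count and pricing quoted above:
# 16,100 always-loaded tokens at $15 per 1M input tokens
echo "16100 * 15 / 1000000" | bc -l   # ≈ 0.24 USD of overhead per query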
The more MCP servers you add, the faster your effective context window shrinks — before any real work begins.
Architectural Comparison: Persistent vs On-demand
| Aspect | MCP Servers | Skills (subprocess CLI) |
|---|---|---|
| Loading moment | At application start | Only when explicitly activated |
| Context occupation | Permanent | Temporary + very small |
| Startup time (measured) | Usually 1–3+ seconds | 43–87 ms |
| Resource consumption | Persistent process | Starts → works → exits |
| Lifecycle management | Required (start/stop/restart/debug) | None |
| Installation complexity | Medium–high (config, env vars, debugging) | Very low (curl / uv tool) |
| Composability | Limited (JSON only) | Excellent (Unix pipes friendly) |
| Output control | Full objects always | Field pruning + multiple formats |
Most AI tooling operations are stateless queries
→ list, get, create, update, export
→ They don't need persistent connections, pools, watchers, or bidirectional streaming.
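Because each call is just a short-lived subprocess, the 43–87 ms startup figure from the table is the whole per-call overhead, and it is easy to reproduce yourself. A minimal sketch using the shell's time builtin (it assumes langsmith-cli is installed and that --help is a cheap no-op, which is my assumption):
# Time a trivial subprocess call; repeat a few times to see cold vs warm startup
time langsmith-cli --help > /dev/null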
Added Value of langsmith-cli (Beyond Context Efficiency)
- Aggressive field pruning: full Run object ≈ 4.2k tokens, pruned (name, error, latency, etc.) ≈ 200–300 tokens → ~90–95% reduction
- Multiple output formats: --json, --format csv, --format yaml
- Human-friendly + agent-friendly dual UX: Rich tables when interactive, clean JSON when piped
- Advanced filtering presets: --failed, --slow, --today, regex/wildcard on names, etc.
- Live watching TUI: langsmith-cli runs watch --project production
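To make the dual UX concrete, here is a sketch of the same query in three modes, using the flags listed above (the runs list subcommand itself is my assumption; only runs watch is documented here):
# Interactive terminal: renders a Rich table of today's failed runs
langsmith-cli runs list --failed --today
# Agent / script mode: pruned JSON instead of a table
langsmith-cli runs list --failed --today --json
# Export for a spreadsheet
langsmith-cli runs list --failed --today --format csv > failed_runs.csv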
Real Numbers from a Real Session (Debug Example)
Task: find failed runs from the last hour and show their error messages
Skills version
Context cost: 91 tokens (skill definition)
Response: ≈500 tokens (pruned fields, 5 runs)
Total ≈ 591 tokens
MCP version
Context cost: 16,100 tokens (always)
Response: ≈2,000 tokens (full objects)
Total ≈ 18,100 tokens
→ 30.6× more context for the same information
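For reference, the Skills-side call behind those ~591 tokens might look roughly like this (a hypothetical sketch: the runs list subcommand, the jq field names, and the absence of an explicit last-hour filter are my assumptions; only the --failed preset and --json flag appear in this post):
# Hypothetical: failed runs as pruned JSON, error messages extracted with jq
langsmith-cli runs list --failed --json | jq -r '.[] | "\(.name): \(.error)"'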
Installation – 30 Seconds vs 15+ Minutes
Recommended (Skills):
# One-liner (creates isolated venv, adds to PATH)
curl -sSL https://raw.githubusercontent.com/gigaverse-app/langsmith-cli/main/scripts/install.sh | sh
# Then in Claude Code
/plugin marketplace add gigaverse-app/langsmith-cli
Typical MCP path:
- pip install langsmith-mcp-server
- manual editing of config.json
- setting env variables
- debugging Python path / permissions / port conflicts
- restarting the client
- checking logs...
→ frequently 15–40 minutes
When MCP Still Makes Sense (Fair Comparison)
Use MCP servers when you really need:
- persistent expensive state (connection pools, large in-memory caches)
- background processing (file watchers, long-polling)
- bidirectional streaming
- very heavy initialization (5GB+ ML models)
For 90–95% of current LangSmith / tracing / evaluation use-cases → skills are superior.
Quick Start – Measure It Yourself
# Install CLI
curl -sSL https://raw.githubusercontent.com/gigaverse-app/langsmith-cli/main/scripts/install.sh | sh
# Add as skill in Claude Code
/plugin marketplace add gigaverse-app/langsmith-cli
# See the dramatic difference
/context
Repo: https://github.com/gigaverse-app/langsmith-cli
(MIT license – contributions welcome)
Context is the most precious resource in long-context LLMs.
Don't waste it on infrastructure that can be replaced with an 80-millisecond subprocess call.
Try the skills approach.
The numbers don't lie.
Happy (much lighter) hacking!
Aviad