TL;DR
Measured reality in Claude Code sessions:
- MCP LangSmith tools → 16,100 tokens always loaded (≈8% of 200k context)
- langsmith-cli as Skill → 91 tokens when activated, 0 tokens when idle
- Difference: 177× less context overhead
- Installation: ~30 seconds vs typically 15+ minutes
- Field pruning: up to 95% token reduction on responses
- Startup: 43–87 ms (cold / warm)
Skills win for the majority of stateless AI tooling operations.
The Context Tax – Measured Reality
Right now, in my Claude Code session, the LangSmith MCP tools are consuming:
MCP tools · /mcp
├ mcp__langsmith__run_experiment 3.2k tokens
├ mcp__langsmith__push_prompt 2.8k tokens
├ mcp__langsmith__fetch_runs 2.2k tokens
...
└ mcp__langsmith__get_prompt_by_name 146 tokens
TOTAL: 16,100 tokens (≈8% of 200k context window)
These definitions are permanently loaded — even if I never touch LangSmith during the entire conversation.
The same functionality implemented as a Skill (subprocess-based CLI):
Skills · /skills
├ commit-commands:clean_gone 46 tokens
├ agent-sdk-dev:new-sdk-app 19 tokens
...
TOTAL when activated: 91 tokens (0.045% of context)
Inactive: 0 tokens
A 177× difference.
Not an estimate: these are actual numbers from the /context command.
Why Does This Matter? Context Economics
| Item | Cost / size (Claude Opus 4.5) | Impact |
|---|---|---|
| Input tokens | $15 / million | 16.1k tokens of tool definitions ≈ $0.24 of pure overhead per query |
| 200k context window | shared resource | 8% permanently occupied |
| 3 typical MCP servers | ~36–48k tokens | 18–24% of the context window gone |
| Freed context (35k+ tokens) | — | ≈30 pages of docs, 500+ LOC, or long conversation history |
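The per-query figure in the first row is just tokens times price; a quick back-of-envelope check using the token count and pricing quoted above:
# 16,100 always-loaded tokens at $15 per 1M input tokens
echo "16100 * 15 / 1000000" | bc -l   # ≈ 0.24 USD of overhead per query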
The more MCP servers you add, the faster your effective context window shrinks — before any real work begins.
Architectural Comparison: Persistent vs On-demand
| Aspect | MCP Servers | Skills (subprocess CLI) |
|---|---|---|
| Loading moment | At application start | Only when explicitly activated |
| Context occupation | Permanent | Temporary + very small |
| Startup time (measured) | Usually 1–3+ seconds | 43–87 ms |
| Resource consumption | Persistent process | Starts → works → exits |
| Lifecycle management | Required (start/stop/restart/debug) | None |
| Installation complexity | Medium–high (config, env vars, debugging) | Very low (curl / uv tool) |
| Composability | Limited (JSON only) | Excellent (Unix pipes friendly) |
| Output control | Full objects always | Field pruning + multiple formats |
Most AI tooling operations are stateless queries
→ list, get, create, update, export
→ They don't need persistent connections, pools, watchers, or bidirectional streaming.
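Because each call is just a short-lived subprocess, the 43–87 ms startup figure from the table is the whole per-call overhead, and it is easy to reproduce yourself. A minimal sketch using the shell's time builtin (it assumes langsmith-cli is installed and that --help is a cheap no-op, which is my assumption):
# Time a trivial subprocess call; repeat a few times to see cold vs warm startup
time langsmith-cli --help > /dev/null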
Added Value of langsmith-cli (Beyond Context Efficiency)
- Aggressive field pruning: full Run object ≈ 4.2k tokens, pruned (name, error, latency, etc.) ≈ 200–300 tokens → ~90–95% reduction
- Multiple output formats: --json, --format csv, --format yaml
- Human-friendly + agent-friendly dual UX: Rich tables when interactive, clean JSON when piped
- Advanced filtering presets: --failed, --slow, --today, regex/wildcard on names, etc.
- Live watching TUI: langsmith-cli runs watch --project production
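To make the dual UX concrete, here is a sketch of the same query in three modes, using the flags listed above (the runs list subcommand itself is my assumption; only runs watch is documented here):
# Interactive terminal: renders a Rich table of today's failed runs
langsmith-cli runs list --failed --today
# Agent / script mode: pruned JSON instead of a table
langsmith-cli runs list --failed --today --json
# Export for a spreadsheet
langsmith-cli runs list --failed --today --format csv > failed_runs.csv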
Real Numbers from a Real Session (Debug Example)
Task: find failed runs from the last hour and show their error messages
Skills version
Context cost: 91 tokens (skill definition)
Response: ≈500 tokens (pruned fields, 5 runs)
Total ≈ 591 tokens
MCP version
Context cost: 16,100 tokens (always)
Response: ≈2,000 tokens (full objects)
Total ≈ 18,100 tokens
→ 30.6× more context for the same information
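For reference, the Skills-side call behind those ~591 tokens might look roughly like this (a hypothetical sketch: the runs list subcommand, the jq field names, and the absence of an explicit last-hour filter are my assumptions; only the --failed preset and --json flag appear in this post):
# Hypothetical: failed runs as pruned JSON, error messages extracted with jq
langsmith-cli runs list --failed --json | jq -r '.[] | "\(.name): \(.error)"'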
Installation – 30 Seconds vs 15+ Minutes
Recommended (Skills):
# One-liner (creates isolated venv, adds to PATH)
curl -sSL https://raw.githubusercontent.com/gigaverse-app/langsmith-cli/main/scripts/install.sh | sh
# Then in Claude Code
/plugin marketplace add gigaverse-app/langsmith-cli
Typical MCP path:
- pip install langsmith-mcp-server
- manual editing of config.json
- setting env variables
- debugging Python path / permissions / port conflicts
- restarting the client
- checking logs...
→ frequently 15–40 minutes
When MCP Still Makes Sense (Fair Comparison)
Use MCP servers when you really need:
- persistent expensive state (connection pools, large in-memory caches)
- background processing (file watchers, long-polling)
- bidirectional streaming
- very heavy initialization (5GB+ ML models)
For 90–95% of current LangSmith / tracing / evaluation use-cases → skills are superior.
Quick Start – Measure It Yourself
# Install CLI
curl -sSL https://raw.githubusercontent.com/gigaverse-app/langsmith-cli/main/scripts/install.sh | sh
# Add as skill in Claude Code
/plugin marketplace add gigaverse-app/langsmith-cli
# See the dramatic difference
/context
Repo: https://github.com/gigaverse-app/langsmith-cli
(MIT license – contributions welcome)
Context is the most precious resource in long-context LLMs.
Don't waste it on infrastructure that can be replaced with an 80-millisecond subprocess call.
Try the skills approach.
The numbers don't lie.
Happy (much lighter) hacking!
Aviad