LangSmith CLI: Not Just Efficient — Actually Better Than MCP

LangSmith gives us incredible visibility into LLM applications: full traces, datasets, prompt versioning, evaluations — everything we need to build reliable AI systems.

But actually using LangSmith day-to-day has always felt clunky:

  • Constantly refreshing the web UI
  • Writing custom API scripts
  • Or using MCP servers that quietly eat 16,100 tokens of context — permanently.

I built langsmith-cli to solve this properly.

It's not only dramatically more efficient (177× less context overhead).

It is fundamentally better for real debugging, analysis, and production monitoring workflows.

Here’s why — with real measurements and concrete examples.

1. Context Is Precious — 177× Less Waste

Straight from /context in Claude Code:

  • MCP LangSmith tools: 16,100 tokens, always loaded (~8% of 200k context)
  • langsmith-cli as Skill: 91 tokens, only when activated; 0 when idle

177× difference in context overhead.

This is not theoretical.

Every extra 10–20k tokens of tool definitions means less room for:

  • conversation history
  • source code
  • documentation
  • actual reasoning

Add 2–3 more MCP servers → 20–30% of your context disappears before you start working.

2. Real-time Production Monitoring — runs watch

The single feature that made me never want to go back:

langsmith-cli runs watch --project production

You get an auto-refreshing, color-coded terminal dashboard:

  • Live status (🟢 / 🔴)
  • Latency, token usage, relative time
  • Instant visibility into error rate and average performance
  • Filter on the fly (example below): --failed, --slow, --model gpt-4, --tag customer-facing
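
For instance, to watch only failing gpt-4 runs (combining the filters listed above — the exact combination is my assumption):

langsmith-cli runs watch --project production --failed --model gpt-4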

No browser refresh. No delay.

You literally see production break (or recover) in real time.

MCP + web UI simply cannot match this immediacy.

3. Powerful, Developer-first Filtering

Finding the right runs should not require writing custom code every time.

Examples that are awkward at best through MCP or the web UI:

# Regex on run names
langsmith-cli runs list --name-regex "^api-v[0-9]+\.[0-9]+"

# Wildcard + smart presets
langsmith-cli runs list --name-pattern "*auth*" --failed --today

# Time ranges (very natural syntax)
langsmith-cli runs list --since "1 hour ago"
langsmith-cli runs list --last 24h
langsmith-cli runs list --since "2025-12-01" --until "2025-12-02"

# Expensive / slow runs
langsmith-cli runs list --min-tokens 8000 --slow --today

These filters are fast, composable, and — most importantly — stay in your terminal flow.

4. Field Pruning: 95% Token Reduction on Responses

A complex multi-agent trace can easily be ~4,200 tokens.

Fetching 10 failed runs in full → ~42k tokens just for the data.

With --fields:

langsmith-cli --json runs list --failed --limit 10 --fields name,error,latency,status

→ ~214 tokens per run instead of 4,210

~95% reduction

You only pay for the information you actually need.

MCP always returns the complete object. Every time.
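
You can sanity-check the savings yourself with a rough byte count (~4 characters per token is a common rule of thumb):

langsmith-cli --json runs list --failed --limit 10 --fields name,error,latency,status | wc -c
# divide the byte count by ~4 for an approximate token count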

5. Dual Excellent UX — Humans + Agents

# Human mode (beautiful rich table)
langsmith-cli runs list --project production --limit 8

→ Color-coded, aggregates, relative times, clean formatting

# Agent / script mode (strict, minimal JSON)
langsmith-cli --json runs list --failed --fields name,error,latency --limit 20

One tool. Two perfect interfaces.

No compromises.
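
And the JSON mode drops straight into scripts. A quick sketch (jq assumed installed; the flags are the same ones used above):

# Print one "name: error" line per failed run for quick triage
langsmith-cli --json runs list --failed --fields name,error --limit 20 | jq -r '.[] | "\(.name): \(.error)"'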

6. Export Formats That Actually Help Teams

  • --format csv → Excel, pivot tables, stakeholder reports
  • --format yaml → configs, reproducible environments
  • --json → agents, automation, monitoring pipelines

langsmith-cli runs list --failed --today --format csv > failed-runs-today.csv

Open → analyze → share. Done.
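
For example, a YAML snapshot (mirroring the CSV line above; the filename is just an illustration):

langsmith-cli runs list --project production --limit 50 --format yaml > runs-snapshot.yaml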

7. Unix Philosophy — Full Composability

# How many timeout errors today?
langsmith-cli --json runs list --failed --today \
  | jq -c '.[] | select(.error | contains("timeout"))' \
  | wc -l

# Top 5 most common errors
langsmith-cli --json runs list --failed --limit 200 \
  | jq -r '.[] | .error' \
  | sort | uniq -c | sort -rn | head -5

This is where the CLI completely outclasses MCP + web.

You already know these tools.

You already have the scripts.

Now they work with LangSmith too.
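
And because it's all standard shell, it wires into cron or CI with no glue code. A rough sketch (SLACK_WEBHOOK_URL is your own; I'm assuming --last 1h works like the --last 24h shown earlier):

count=$(langsmith-cli --json runs list --failed --last 1h | jq 'length')
if [ "$count" -gt 10 ]; then
  curl -s -X POST -H 'Content-Type: application/json' \
    -d "{\"text\": \"LangSmith: $count failed runs in the last hour\"}" \
    "$SLACK_WEBHOOK_URL"
fi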

Quick Start (Really 30–60 Seconds)

# Install (isolated, safe, works everywhere)
curl -sSL https://raw.githubusercontent.com/gigaverse-app/langsmith-cli/main/scripts/install.sh | sh

# Or faster with uv:
uv tool install langsmith-cli

# Add as skill in Claude Code
/plugin marketplace add gigaverse-app/langsmith-cli

# First login
langsmith-cli auth login

Then try:

langsmith-cli runs watch --project production
# or
langsmith-cli runs list --failed --today

Final Verdict

langsmith-cli is not just "lighter" than MCP.

It is objectively better at the things that matter most when debugging and operating LLM systems in production:

  • Real-time visibility
  • Powerful filtering without code
  • Massive context & token savings
  • Beautiful human UX + perfect machine UX
  • Export formats teams actually use
  • Full Unix-style composability

177× less context overhead is nice.

But being able to watch production live, find problems in seconds, and export meaningful data instantly — that's why I built it, and why I never want to go back.

Give it 60 seconds.

Run /context before and after.

The numbers don't lie.

Repo → https://github.com/gigaverse-app/langsmith-cli (MIT)

Happy (much faster) debugging!

Aviad

#LangSmith #LLM #Observability #AIDevTools #ClaudeCode
