Aviad Rozenhek

I Replaced My LangSmith MCP Server with a 200-Token CLI Skill

How langsmith-cli gives you 100% MCP parity, 96% less context usage, and features the MCP server doesn't have, all in a single pip install.

If you're using LangSmith with Claude Code (or any AI coding agent), you're probably running the official MCP server. It works. But every session, it injects 5,000+ tokens of tool schemas into your context window — whether you touch LangSmith or not.

I built langsmith-cli to fix that. It's a standalone CLI and a Claude Code plugin that replaces the MCP server with a skill definition of about 200 tokens. Against roughly 5,000 tokens of MCP schema, that's a 96% reduction in context overhead.

And it does more than the MCP server does.

The Problem with MCP Servers

MCP servers are always-on. The moment your agent session starts, every tool definition gets loaded into context. For LangSmith's MCP server, that's 66 parameters across multiple tools — around 5,000 tokens of JSON schema that sits in your context window doing nothing until you actually need to query traces.

For agents that need to do many things — write code, run tests, debug, and occasionally check LangSmith — this is wasteful. Context is your agent's working memory. Every token of schema is a token not available for reasoning.

The Fix: On-Demand Skills Instead of Always-On Schemas

langsmith-cli takes a different approach. Instead of an MCP server, it's a CLI tool with a tiny skill file that teaches your agent how to use it:

# Install the CLI
uv tool install langsmith-cli

# Add as Claude Code plugin
claude plugin marketplace add gigaverse-app/langsmith-cli
claude plugin install langsmith-cli@langsmith-cli

The skill file is ~200 tokens. It loads on demand. Your agent learns to run shell commands like:

# Get the latest failed run with only the fields you need
langsmith-cli --json runs get-latest --project my-app \
  --failed --fields id,name,error

No schema bloat. No always-on server. Just a CLI your agent calls when it needs observability data.
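
To make the overhead numbers concrete, here's the arithmetic as a tiny Python check. It uses the round figures from this post (about 5,000 tokens of always-on schema versus about 200 tokens of skill text); the function name is just for illustration.

```python
def context_reduction(before_tokens: int, after_tokens: int) -> float:
    """Fraction of always-on context removed when overhead drops from before to after."""
    if before_tokens <= 0:
        raise ValueError("before_tokens must be positive")
    return (before_tokens - after_tokens) / before_tokens


# Round numbers from the text: ~5,000 schema tokens vs. ~200 skill tokens.
mcp_schema_tokens = 5_000
skill_tokens = 200

saved = context_reduction(mcp_schema_tokens, skill_tokens)
print(f"{saved:.0%} less always-on context")  # 96% less always-on context
```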

96% Token Reduction with --fields

This is the feature that matters most for agents. A typical LangSmith run object is 20KB — easily 1,000+ tokens. With --fields, you get only what you asked for:

# Full run object: ~1000 tokens
langsmith-cli --json runs get abc-123

# Just what you need: ~40 tokens
langsmith-cli --json runs get abc-123 --fields name,status,error

--fields works on every list and get command: runs, projects, datasets, examples, prompts. Your agent stays lean.
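
If you want a feel for why this helps, here's a rough Python sketch of field projection. It is not the CLI's actual code: the `project_fields` helper and the sample run object are made up for illustration, and it compares serialized size rather than real token counts.

```python
import json


def project_fields(record: dict, fields: list[str]) -> dict:
    """Keep only the requested top-level keys that are present in the record."""
    return {key: record[key] for key in fields if key in record}


# A made-up run object; real LangSmith runs carry many more fields.
run = {
    "id": "abc-123",
    "name": "extractor",
    "status": "error",
    "error": "Rate limit",
    "inputs": {"text": "lorem ipsum " * 200},
    "outputs": None,
    "extra": {"metadata": {"revision": "r1"}},
}

slim = project_fields(run, ["name", "status", "error"])
full_size = len(json.dumps(run))
slim_size = len(json.dumps(slim))
print(slim)
print(f"{slim_size} of {full_size} characters kept")
```

The bulky `inputs` payload never reaches the agent, which is where most of the savings come from.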

Built for Two Audiences

Most developer tools pick one audience. langsmith-cli serves both humans and agents:

For humans — rich terminal tables with color-coded statuses, smart column truncation, syntax highlighting:

langsmith-cli runs list --project my-app --status error --last 24h
┏━━━━━━━━━━━━━━┳━━━━━━━━━━━━┳━━━━━━━━┳━━━━━━━━━━┳━━━━━━━━━━━━━┓
┃ Name         ┃ Status     ┃ Tokens ┃ Latency  ┃ Error       ┃
┡━━━━━━━━━━━━━━╇━━━━━━━━━━━━╇━━━━━━━━╇━━━━━━━━━━╇━━━━━━━━━━━━━┩
│ extractor    │ error      │ 2,340  │ 3.2s     │ Rate limit  │
│ classifier   │ error      │ 1,102  │ 12.4s    │ Timeout     │
└──────────────┴────────────┴────────┴──────────┴─────────────┘

For agents — add --json as the first flag and everything switches: strict JSON to stdout, diagnostics to stderr, zero formatting noise:

langsmith-cli --json runs list --project my-app --status error --limit 5

One flag. Two completely different UX modes.
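
The stdout/stderr split is what makes the JSON mode safe to parse. Here's a minimal Python sketch of that pattern, assuming a hypothetical `emit` helper (the real CLI's internals may look quite different):

```python
import io
import json
import sys
from typing import TextIO


def emit(rows: list[dict], json_mode: bool,
         out: TextIO = sys.stdout, err: TextIO = sys.stderr) -> None:
    """Write results to `out`; in JSON mode, keep every diagnostic on `err`."""
    if json_mode:
        err.write(f"fetched {len(rows)} rows\n")  # diagnostics never touch stdout
        out.write(json.dumps(rows) + "\n")        # stdout is exactly one JSON document
    else:
        for row in rows:
            out.write(f"{row['name']:<12} {row['status']}\n")


# Capture both streams to show that stdout alone is valid JSON.
stdout, stderr = io.StringIO(), io.StringIO()
emit([{"name": "extractor", "status": "error"}], json_mode=True, out=stdout, err=stderr)
parsed = json.loads(stdout.getvalue())
print(parsed)
```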

Features the MCP Server Doesn't Have

langsmith-cli has 100% parity with the official MCP server (all 66 parameters mapped). But it also has features the MCP server can't offer:

Live Monitoring with runs watch

A real-time streaming dashboard in your terminal:

langsmith-cli runs watch --project my-app

One-Command Debugging with runs get-latest

No more list | jq | get pipelines:

# Before: three commands piped together
langsmith-cli --json runs list --project X --limit 1 \
  | jq -r '.[0].id' \
  | xargs langsmith-cli --json runs get

# After: one command
langsmith-cli --json runs get-latest --project X --fields inputs,outputs,error

Stratified Sampling with runs sample

Build eval datasets that cover every combination of the tag values you care about:

langsmith-cli runs sample \
  --stratify-by tag:length,tag:content_type \
  --dimension-values "short|long,news|gaming" \
  --samples-per-combination 5 \
  --output eval_samples.jsonl
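
For readers new to stratified sampling, here's a small Python sketch of the general technique: bucket runs by their combination of dimension values, then draw a fixed number from each bucket. The `stratified_sample` function and the `length:short` tag encoding are assumptions for illustration, not the CLI's implementation.

```python
import itertools
import random
from collections import defaultdict


def stratified_sample(runs, dimensions, per_combination, seed=0):
    """Draw up to `per_combination` runs for every combination of dimension values.

    `dimensions` maps a tag prefix (e.g. "length") to the values to cover.
    """
    rng = random.Random(seed)
    buckets = defaultdict(list)
    for run in runs:
        # Assumed encoding: tags such as "length:short" become {"length": "short"}.
        tags = dict(tag.split(":", 1) for tag in run["tags"] if ":" in tag)
        key = tuple(tags.get(dim) for dim in dimensions)
        buckets[key].append(run)

    sample = []
    for combo in itertools.product(*dimensions.values()):
        candidates = buckets.get(combo, [])
        sample.extend(rng.sample(candidates, min(per_combination, len(candidates))))
    return sample


# Synthetic runs: 10 per combination of length and content type.
runs = []
for length in ("short", "long"):
    for kind in ("news", "gaming"):
        for n in range(10):
            runs.append({
                "id": f"{length}-{kind}-{n}",
                "tags": [f"length:{length}", f"content_type:{kind}"],
            })

picked = stratified_sample(
    runs,
    {"length": ["short", "long"], "content_type": ["news", "gaming"]},
    per_combination=5,
)
print(len(picked))  # 4 combinations x 5 runs = 20
```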

Aggregate Analytics with runs analyze

Group-by metrics without leaving the terminal:

langsmith-cli --json runs analyze \
  --group-by tag:model \
  --metrics count,error_rate,p50_latency,avg_cost
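
Conceptually, this kind of report is a group-by followed by a few aggregates. The Python sketch below shows the idea on hand-written data; the `analyze` helper, the `model` and `latency_s` keys, and the sample runs are all hypothetical, and cost is left out to keep it short.

```python
from collections import defaultdict
from statistics import median


def analyze(runs, group_key):
    """Group runs by `group_key` and compute count, error rate, and median latency."""
    groups = defaultdict(list)
    for run in runs:
        groups[run[group_key]].append(run)

    report = {}
    for name, members in groups.items():
        latencies = [r["latency_s"] for r in members]
        report[name] = {
            "count": len(members),
            "error_rate": sum(r["error"] is not None for r in members) / len(members),
            "p50_latency": median(latencies),
        }
    return report


runs = [
    {"model": "model-a", "latency_s": 1.0, "error": None},
    {"model": "model-a", "latency_s": 3.0, "error": "Timeout"},
    {"model": "model-a", "latency_s": 2.0, "error": None},
    {"model": "model-b", "latency_s": 4.0, "error": None},
]
report = analyze(runs, "model")
print(report["model-a"])
```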

Schema Discovery with runs fields / runs describe

Don't know what fields your runs have? Discover them:

langsmith-cli --json runs fields --include inputs,outputs
# Returns field paths, types, presence rates, even language distribution
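
Here's roughly what field discovery involves, sketched in Python: walk each record, collect dotted paths, and track how often each path appears. The `field_presence` function and the sample records are invented for illustration; the CLI's real output format and logic may differ.

```python
from collections import Counter


def field_presence(records):
    """Return {dotted_path: (type_names, presence_rate)} across all records."""
    seen = Counter()
    types = {}

    def walk(value, prefix):
        for key, child in value.items():
            path = f"{prefix}.{key}" if prefix else key
            if isinstance(child, dict):
                walk(child, path)  # descend into nested objects
            else:
                seen[path] += 1
                types.setdefault(path, set()).add(type(child).__name__)

    for record in records:
        walk(record, "")

    total = len(records)
    return {path: (sorted(types[path]), seen[path] / total) for path in seen}


records = [
    {"inputs": {"text": "hello", "lang": "en"}, "outputs": {"summary": "hi"}},
    {"inputs": {"text": "bonjour"}, "outputs": {"summary": "salut"}},
]
schema = field_presence(records)
print(schema["inputs.lang"])  # (['str'], 0.5)
```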

Tag & Metadata Discovery

langsmith-cli runs tags --project my-app
langsmith-cli runs metadata-keys --project my-app

Bulk Export with Pattern Filenames

langsmith-cli runs export ./traces \
  --project my-app --roots --limit 1000 \
  --filename-pattern "{name}-{run_id}"
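
A pattern like `{name}-{run_id}` maps naturally onto Python's `str.format`. The sketch below shows one way such a pattern could be expanded; the `render_filename` helper, the `.json` extension, and the character sanitizing are my assumptions here, not documented CLI behavior.

```python
import re


def render_filename(pattern: str, run: dict, extension: str = ".json") -> str:
    """Fill `{placeholders}` from the run, then replace characters unsafe in filenames."""
    name = pattern.format(**run)
    return re.sub(r"[^A-Za-z0-9._-]+", "_", name) + extension


print(render_filename("{name}-{run_id}", {"name": "my extractor", "run_id": "abc-123"}))
# my_extractor-abc-123.json
```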

Production Run to Eval Example in One Command

langsmith-cli --json examples from-run <run-id> --dataset my-eval-set

Smart Filtering That Translates to FQL

Nobody wants to write raw Filter Query Language. The CLI translates human-friendly flags automatically:

# These flags...
langsmith-cli runs list --tag summarizer --failed --last 24h --slow

# ...become this FQL:
# and(has(tags, "summarizer"), eq(error, true),
#     gt(start_time, "2026-03-03T..."), gt(latency, "5s"))

Time presets like --recent (last hour), --today, --last 7d, and --since 2026-01-01 all work. Content search with --grep supports regex and field-specific matching. Everything composes.
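
To show how flags can compose into one filter, here's a minimal Python sketch that builds the same FQL string as the example above. The `build_fql` function and its parameters are hypothetical, and the 5-second "slow" threshold is simply taken from that example.

```python
import json
from datetime import datetime, timedelta, timezone


def build_fql(tag=None, failed=False, last=None, slow=False, now=None):
    """Combine optional filters into a single FQL expression string."""
    now = now or datetime.now(timezone.utc)
    clauses = []
    if tag:
        clauses.append(f"has(tags, {json.dumps(tag)})")
    if failed:
        clauses.append("eq(error, true)")
    if last:
        clauses.append(f'gt(start_time, "{(now - last).isoformat()}")')
    if slow:
        clauses.append('gt(latency, "5s")')  # threshold taken from the example above
    if not clauses:
        return ""
    # A single clause needs no and(...) wrapper.
    return clauses[0] if len(clauses) == 1 else f"and({', '.join(clauses)})"


fixed_now = datetime(2026, 3, 4, 12, 0, tzinfo=timezone.utc)
fql = build_fql(tag="summarizer", failed=True, last=timedelta(hours=24),
                slow=True, now=fixed_now)
print(fql)
```

Each flag contributes one clause, so adding or dropping a flag never changes how the others are translated.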

What's New in v0.4.0

The v0.4.0 release focused on type safety and code quality:

  • Zero pyright errors — every function has proper type annotations. client: langsmith.Client, not client: Any. Return types are real SDK Pydantic models, not object.
  • datasets delete command with confirmation prompts and JSON mode support
  • Improved error handling across prompts and runs commands using specific SDK exception types (LangSmithNotFoundError, LangSmithConflictError) instead of broad except Exception
  • 702 unit tests passing with real Pydantic model instances (no MagicMock for test data)

Getting Started

# Install
uv tool install langsmith-cli
# or: pip install langsmith-cli

# Authenticate
export LANGSMITH_API_KEY="lsv2_..."
# or: langsmith-cli auth login

# Start exploring
langsmith-cli runs list --project my-app --last 24h
langsmith-cli --json runs get-latest --failed --fields name,error

If you're using Claude Code, add the plugin for the best agent experience:

claude plugin marketplace add gigaverse-app/langsmith-cli
claude plugin install langsmith-cli@langsmith-cli

The code is MIT licensed and on GitHub: gigaverse-app/langsmith-cli

If you're building with LangSmith and tired of context-heavy MCP servers, give it a try. Happy to hear feedback in the issues.
