How langsmith-cli gives you 100% MCP parity, 96% less context usage, and features the MCP server doesn't have, all in a single pip install.
If you're using LangSmith with Claude Code (or any AI coding agent), you're probably running the official MCP server. It works. But every session, it injects 5,000+ tokens of tool schemas into your context window — whether you touch LangSmith or not.
I built langsmith-cli to fix that. It's a standalone CLI and a Claude Code plugin that replaces the MCP server with a <200 token skill definition. Against 5,000+ tokens of schema, that's a reduction of about 96% in context overhead.
And it does more than the MCP server does.
The Problem with MCP Servers
MCP servers are always-on. The moment your agent session starts, every tool definition gets loaded into context. For LangSmith's MCP server, that's 66 parameters across multiple tools — around 5,000 tokens of JSON schema that sits in your context window doing nothing until you actually need to query traces.
For agents that need to do many things — write code, run tests, debug, and occasionally check LangSmith — this is wasteful. Context is your agent's working memory. Every token of schema is a token not available for reasoning.
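To make the overhead concrete, here is a small back-of-the-envelope calculation using the approximate figures quoted in this post (about 5,000 tokens for the MCP schemas, about 200 for the skill file). The constant and function names are illustrative only.

```python
# Approximate figures quoted in this post; neither is an exact measurement.
MCP_SCHEMA_TOKENS = 5_000   # always-on MCP tool schemas
SKILL_FILE_TOKENS = 200     # on-demand skill definition


def reduction_percent(before: int, after: int) -> float:
    """Return the percentage saved when going from `before` to `after` tokens."""
    return (before - after) / before * 100


overhead_saved = reduction_percent(MCP_SCHEMA_TOKENS, SKILL_FILE_TOKENS)
print(f"Context overhead saved: {overhead_saved:.0f}%")
```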
The Fix: On-Demand Skills Instead of Always-On Schemas
langsmith-cli takes a different approach. Instead of an MCP server, it's a CLI tool with a tiny skill file that teaches your agent how to use it:
# Install the CLI
uv tool install langsmith-cli
# Add as Claude Code plugin
claude plugin marketplace add gigaverse-app/langsmith-cli
claude plugin install langsmith-cli@langsmith-cli
The skill file is ~200 tokens. It loads on-demand. Your agent learns to run shell commands like:
# Get the latest failed run with only the fields you need
langsmith-cli --json runs get-latest --project my-app \
--failed --fields id,name,error
No schema bloat. No always-on server. Just a CLI your agent calls when it needs observability data.
96% Token Reduction with --fields
This is the feature that matters most for agents. A typical LangSmith run object is 20KB — easily 1,000+ tokens. With --fields, you get only what you asked for:
# Full run object: ~1000 tokens
langsmith-cli --json runs get abc-123
# Just what you need: ~40 tokens
langsmith-cli --json runs get abc-123 --fields name,status,error
--fields works on every list and get command: runs, projects, datasets, examples, prompts. Your agent stays lean.
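Conceptually, field selection is a projection over the JSON object. The sketch below is not the CLI's implementation; it only illustrates the idea on a made-up, heavily shortened run payload, and it assumes top-level field names. The helper name select_fields is hypothetical.

```python
from typing import Any


def select_fields(record: dict[str, Any], fields: list[str]) -> dict[str, Any]:
    """Keep only the requested top-level keys; missing keys are skipped."""
    return {name: record[name] for name in fields if name in record}


# Hypothetical run payload, heavily shortened for illustration.
full_run = {
    "id": "abc-123",
    "name": "extractor",
    "status": "error",
    "error": "Rate limit",
    "inputs": {"text": "..."},
    "outputs": None,
    "extra": {"metadata": {"revision": "..."}},
}

slim_run = select_fields(full_run, ["name", "status", "error"])
print(slim_run)
```

Everything the agent did not ask for (inputs, outputs, metadata) never reaches its context.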
Built for Two Audiences
Most developer tools pick one audience. langsmith-cli serves both:
For humans — rich terminal tables with color-coded statuses, smart column truncation, syntax highlighting:
langsmith-cli runs list --project my-app --status error --last 24h
┏━━━━━━━━━━━━━━┳━━━━━━━━━━━━┳━━━━━━━━┳━━━━━━━━━━┳━━━━━━━━━━━━━┓
┃ Name         ┃ Status     ┃ Tokens ┃ Latency  ┃ Error       ┃
┡━━━━━━━━━━━━━━╇━━━━━━━━━━━━╇━━━━━━━━╇━━━━━━━━━━╇━━━━━━━━━━━━━┩
│ extractor    │ error      │ 2,340  │ 3.2s     │ Rate limit  │
│ classifier   │ error      │ 1,102  │ 12.4s    │ Timeout     │
└──────────────┴────────────┴────────┴──────────┴─────────────┘
For agents — add --json as the first flag and everything switches: strict JSON to stdout, diagnostics to stderr, zero formatting noise:
langsmith-cli --json runs list --project my-app --status error --limit 5
One flag. Two completely different UX modes.
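The stdout/stderr split is what makes the JSON mode easy to consume from a script or agent harness. As a rough sketch of a caller that relies on that contract, the wrapper below captures both streams separately and parses only stdout. The helper run_json is hypothetical, and a tiny stand-in child process replaces the real CLI so the example runs anywhere.

```python
import json
import subprocess
import sys
from typing import Any


def run_json(command: list[str]) -> tuple[Any, str]:
    """Run a command, parse stdout as JSON, and return (data, stderr text)."""
    completed = subprocess.run(command, capture_output=True, text=True, check=True)
    return json.loads(completed.stdout), completed.stderr


# Stand-in child process so the sketch runs without the CLI installed:
# it prints JSON to stdout and a diagnostic line to stderr.
fake_cli = [
    sys.executable,
    "-c",
    "import json, sys; "
    "print(json.dumps([{'name': 'extractor', 'status': 'error'}])); "
    "print('fetched 1 run', file=sys.stderr)",
]

runs, diagnostics = run_json(fake_cli)
print(runs[0]["name"], "|", diagnostics.strip())
```

With the real tool, the command list would be the one shown above, for example ["langsmith-cli", "--json", "runs", "list", "--project", "my-app", "--status", "error", "--limit", "5"].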
Features the MCP Server Doesn't Have
langsmith-cli has 100% parity with the official MCP server (all 66 parameters mapped). But it also has features the MCP server can't offer:
Live Monitoring with runs watch
A real-time streaming dashboard in your terminal:
langsmith-cli runs watch --project my-app
One-Command Debugging with runs get-latest
No more list | jq | get pipelines:
# Before: three commands piped together
langsmith-cli --json runs list --project X --limit 1 \
| jq -r '.[0].id' \
| xargs langsmith-cli --json runs get
# After: one command
langsmith-cli --json runs get-latest --project X --fields inputs,outputs,error
Stratified Sampling with runs sample
Build statistically sound eval datasets:
langsmith-cli runs sample \
--stratify-by tag:length,tag:content_type \
--dimension-values "short|long,news|gaming" \
--samples-per-combination 5 \
--output eval_samples.jsonl
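For readers unfamiliar with stratified sampling, here is a minimal sketch of the idea: bucket runs by every combination of dimension values, then draw a fixed number from each bucket. This is not the CLI's code. It assumes tags shaped like "length:short", and the helper names and synthetic runs are made up for illustration.

```python
import random
from collections import defaultdict
from itertools import product
from typing import Optional


def tag_value(tags: list[str], dimension: str) -> Optional[str]:
    """Return the value of the first tag shaped like 'dimension:value'."""
    prefix = dimension + ":"
    for tag in tags:
        if tag.startswith(prefix):
            return tag[len(prefix):]
    return None


def stratified_sample(runs, dimensions, values, per_combination, seed=0):
    """Draw up to `per_combination` runs for every combination of values."""
    rng = random.Random(seed)
    buckets = defaultdict(list)
    for run in runs:
        key = tuple(tag_value(run["tags"], dim) for dim in dimensions)
        buckets[key].append(run)
    sample = []
    for combo in product(*values):
        candidates = buckets.get(combo, [])
        sample.extend(rng.sample(candidates, min(per_combination, len(candidates))))
    return sample


# Synthetic runs: 8 per combination of length and content type.
runs = [
    {"id": f"{length}-{kind}-{i}", "tags": [f"length:{length}", f"content_type:{kind}"]}
    for length in ("short", "long")
    for kind in ("news", "gaming")
    for i in range(8)
]
# Split the same string passed to --dimension-values above.
values = [group.split("|") for group in "short|long,news|gaming".split(",")]
picked = stratified_sample(runs, ["length", "content_type"], values, per_combination=5)
print(len(picked))
```

Sampling per combination keeps rare categories from being drowned out by common ones, which is the point of stratifying an eval set.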
Aggregate Analytics with runs analyze
Group-by metrics without leaving the terminal:
langsmith-cli --json runs analyze \
--group-by tag:model \
--metrics count,error_rate,p50_latency,avg_cost
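As an illustration of what a group-by report like this computes, the sketch below derives count, error rate, and median (p50) latency per group from synthetic data. The record layout, model names, and the analyze helper are assumptions for the example, not the CLI's output format.

```python
from collections import defaultdict
from statistics import median


def analyze(runs, group_key):
    """Group runs by `group_key` and compute count, error rate, and p50 latency."""
    groups = defaultdict(list)
    for run in runs:
        groups[run[group_key]].append(run)
    report = {}
    for name, members in groups.items():
        report[name] = {
            "count": len(members),
            "error_rate": sum(1 for r in members if r["error"]) / len(members),
            "p50_latency": median(r["latency_s"] for r in members),
        }
    return report


# Synthetic data; model names and numbers are made up.
sample_runs = [
    {"model": "model-a", "latency_s": 1.0, "error": False},
    {"model": "model-a", "latency_s": 3.0, "error": True},
    {"model": "model-a", "latency_s": 2.0, "error": False},
    {"model": "model-b", "latency_s": 4.0, "error": False},
]
report = analyze(sample_runs, "model")
print(report)
```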
Schema Discovery with runs fields / runs describe
Don't know what fields your runs have? Discover them:
langsmith-cli --json runs fields --include inputs,outputs
# Returns field paths, types, presence rates, even language distribution
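To show what "field paths" and "presence rates" mean, here is a small sketch that flattens nested records into dotted paths and reports how often each path appears. It is an illustration of the concept under assumed record shapes, not the CLI's implementation; the function names are hypothetical.

```python
from collections import Counter


def field_paths(value, prefix=""):
    """Yield (dotted path, type name) for every leaf in a nested dict."""
    if isinstance(value, dict):
        for key, child in value.items():
            yield from field_paths(child, f"{prefix}.{key}" if prefix else key)
    else:
        yield prefix, type(value).__name__


def presence_rates(records):
    """Return {path: fraction of records that contain that path}."""
    counts = Counter()
    for record in records:
        # A set ensures each path is counted once per record.
        counts.update({path for path, _ in field_paths(record)})
    return {path: counts[path] / len(records) for path in sorted(counts)}


# Synthetic records standing in for run inputs/outputs.
records = [
    {"inputs": {"text": "hi", "lang": "en"}, "outputs": {"summary": "..."}},
    {"inputs": {"text": "yo"}, "outputs": {"summary": "..."}},
]
rates = presence_rates(records)
print(rates)
```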
Tag & Metadata Discovery
langsmith-cli runs tags --project my-app
langsmith-cli runs metadata-keys --project my-app
Bulk Export with Pattern Filenames
langsmith-cli runs export ./traces \
--project my-app --roots --limit 1000 \
--filename-pattern "{name}-{run_id}"
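The pattern syntax looks like Python's str.format placeholders. Assuming it behaves that way (an assumption on my part in this sketch, as is the ".json" extension), a pattern expands like this; render_filename is a hypothetical helper.

```python
def render_filename(pattern: str, run: dict, extension: str = ".json") -> str:
    """Fill '{name}'-style placeholders from the run and append an extension."""
    return pattern.format(name=run["name"], run_id=run["id"]) + extension


# Hypothetical run; the '.json' extension is also an assumption.
example_run = {"name": "extractor", "id": "abc-123"}
filename = render_filename("{name}-{run_id}", example_run)
print(filename)
```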
Production Run to Eval Example in One Command
langsmith-cli --json examples from-run <run-id> --dataset my-eval-set
Smart Filtering That Translates to FQL
Nobody wants to write raw Filter Query Language. The CLI translates human-friendly flags automatically:
# These flags...
langsmith-cli runs list --tag summarizer --failed --last 24h --slow
# ...become this FQL:
# and(has(tags, "summarizer"), eq(error, true),
# gt(start_time, "2026-03-03T..."), gt(latency, "5s"))
Time presets like --recent (last hour), --today, --last 7d, and --since 2026-01-01 all work. Content search with --grep supports regex and field-specific matching. Everything composes.
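A flag-to-FQL translation of this kind can be sketched as collecting one clause per flag and wrapping them in and(...). The function below is a simplified illustration, not the CLI's translator; build_fql and its parameter names are hypothetical, and the timestamp is an arbitrary example value.

```python
from typing import Optional


def build_fql(
    tag: Optional[str] = None,
    failed: bool = False,
    since: Optional[str] = None,
    min_latency: Optional[str] = None,
) -> str:
    """Combine optional filters into a single FQL expression string."""
    clauses = []
    if tag is not None:
        clauses.append(f'has(tags, "{tag}")')
    if failed:
        clauses.append("eq(error, true)")
    if since is not None:
        clauses.append(f'gt(start_time, "{since}")')
    if min_latency is not None:
        clauses.append(f'gt(latency, "{min_latency}")')
    if not clauses:
        return ""
    if len(clauses) == 1:
        return clauses[0]
    return "and(" + ", ".join(clauses) + ")"


# The timestamp is an arbitrary illustrative value.
fql = build_fql(tag="summarizer", failed=True, since="2026-01-01T00:00:00Z", min_latency="5s")
print(fql)
```

Because each flag contributes an independent clause, any subset of flags composes without special cases.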
What's New in v0.4.0
The v0.4.0 release focused on type safety and code quality:
- Zero pyright errors: every function has proper type annotations, e.g. client: langsmith.Client, not client: Any. Return types are real SDK Pydantic models, not object.
- datasets delete command with confirmation prompts and JSON mode support
- Improved error handling across prompts and runs commands using specific SDK exception types (LangSmithNotFoundError, LangSmithConflictError) instead of broad except Exception
- 702 unit tests passing with real Pydantic model instances (no MagicMock for test data)
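To illustrate the error-handling change, the sketch below contrasts with a broad except Exception by catching only the named error types and letting anything unexpected propagate. The two exception classes are local stand-ins so the example runs without the SDK installed; describe_failure and missing_dataset are hypothetical names, not code from the project.

```python
# Stand-ins so the sketch runs without the SDK installed. The release notes
# name LangSmithNotFoundError and LangSmithConflictError as the SDK types.
class LangSmithNotFoundError(Exception):
    pass


class LangSmithConflictError(Exception):
    pass


def describe_failure(action) -> str:
    """Run `action` and map known SDK-style errors to user-facing messages."""
    try:
        action()
    except LangSmithNotFoundError:
        return "not found"
    except LangSmithConflictError:
        return "already exists"
    # Any other exception is not caught here and propagates to the caller.
    return "ok"


def missing_dataset():
    # Hypothetical operation that fails the way a missing resource would.
    raise LangSmithNotFoundError("dataset 'my-eval-set' does not exist")


print(describe_failure(missing_dataset))
print(describe_failure(lambda: None))
```

Narrow handlers keep genuine bugs (a TypeError, say) visible instead of silently turning them into a generic error message.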
Getting Started
# Install
uv tool install langsmith-cli
# or: pip install langsmith-cli
# Authenticate
export LANGSMITH_API_KEY="lsv2_..."
# or: langsmith-cli auth login
# Start exploring
langsmith-cli runs list --project my-app --last 24h
langsmith-cli --json runs get-latest --failed --fields name,error
If you're using Claude Code, add the plugin for the best agent experience:
claude plugin marketplace add gigaverse-app/langsmith-cli
claude plugin install langsmith-cli@langsmith-cli
The code is MIT licensed and on GitHub: gigaverse-app/langsmith-cli
If you're building with LangSmith and tired of context-heavy MCP servers, give it a try. Happy to hear feedback in the issues.