When building AI-assisted development workflows, the documentation explains what each approach does—but not the real cost implications or when to use which.
I instrumented network traffic and ran controlled experiments across five approaches using identical tasks: same 500-row dataset, same analysis requirements, same model (Claude Sonnet). The results revealed that architecture matters more than protocol choice.
MCP Optimized consumed 60,420 tokens. MCP Vanilla consumed 309,053 tokens. Same protocol. Same task. 5x difference—driven entirely by one decision: file-path references vs. data-array parameters.
This article provides a decision framework based on measured data, not marketing claims.
## The Decision Framework
Before diving into the data, here's the framework I developed from these experiments:
### Quick Decision Guide
| If your situation is... | Use this approach |
|---|---|
| Repeating task (>20 executions), large datasets, need predictable costs | MCP Optimized |
| One-off exploration, evolving requirements, prototyping | Code-Driven (Skills) |
| User must control when it runs, deterministic behavior needed | Slash Commands |
| Production system with security requirements | MCP Optimized (never Skills) |
### Decision Flowchart
```text
Q1: One-off task (< 5 executions)?
    YES → Code-Driven or direct prompting
    NO  → Continue

Q2: Dataset > 100 rows AND need < 5% cost variance?
    YES → MCP Optimized
    NO  → Continue

Q3: User needs explicit control over invocation?
    YES → Slash Commands
    NO  → Continue

Q4: Execution count > 20 AND requirements stable?
    YES → MCP Optimized
    NO  → Code-Driven (prototype, then migrate)

NEVER:
- MCP Vanilla for production (always suboptimal)
- Skills for multi-user or sensitive systems
```
## The Three Approaches Explained
### MCP (Model Context Protocol)
A structured protocol for AI-tool communication. The model calls tools with JSON parameters, the server executes and returns structured results.
```javascript
// MCP tool call - structured, typed, validated
await call_tool('analyze_csv_file', {
  file_path: '/data/employees.csv',
  analysis_type: 'salary_by_department'
});
```
Characteristics: Structured I/O, access-controlled, model-decided invocation, reusable across applications.
Critical distinction: There's a 5x token difference between vanilla MCP (passing data directly) and optimized MCP (passing file references). Same protocol, vastly different economics.
### Code-Driven (Skills & Code Generation)
The model writes and executes code to accomplish tasks. Claude Code's "skills" feature lets the model invoke capabilities based on semantic matching.
```python
# Claude writes this, executes it, iterates
import pandas as pd

df = pd.read_csv('/data/employees.csv')
result = df.groupby('department')['salary'].mean()
print(result)
```
Characteristics: Maximum flexibility, unstructured I/O, higher variance between runs, requires sandboxing.
### Slash Commands
Pure string substitution. You type `/review @file.js`, the command template expands, and the result is injected into your message.
```markdown
<!-- .claude/commands/review.md -->
Review the following file for security vulnerabilities,
performance issues, and code quality:

{file_content}

Focus on: authentication, input validation, error handling.
```
Characteristics: User-explicit, deterministic, single-turn, zero tool-call overhead.
## Measured Data: What the Numbers Show
### Methodology
- Same workload: load 500-row CSV, perform grouping, summary stats, two plots
- Same model: Claude Sonnet, default settings
- 3-4 runs per approach with logged request/response payloads
- Costs calculated at Claude Sonnet pricing as of the time of writing
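For transparency, here's a rough sketch of the cost model behind the per-execution figures below. The $3/$15 per-million-token Sonnet rates and the input-heavy token split are assumptions for illustration, not measured values; check current pricing.

```typescript
// Rough cost model. Pricing and the input/output split are assumptions
// for illustration, not measured values; check current Anthropic rates.
const INPUT_USD_PER_MTOK = 3.0;
const OUTPUT_USD_PER_MTOK = 15.0;

function estimateRunCost(inputTokens: number, outputTokens: number): number {
  return (
    (inputTokens / 1_000_000) * INPUT_USD_PER_MTOK +
    (outputTokens / 1_000_000) * OUTPUT_USD_PER_MTOK
  );
}

// A 60,420-token MCP Optimized run, assuming ~95% of tokens are input:
console.log(estimateRunCost(57_400, 3_020).toFixed(2)); // ≈ 0.22
```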
### Token Consumption
*Figure: token consumption per API request. MCP Optimized achieves consistently low usage through its file-path architecture.*
| Approach | Avg tokens/run | vs Baseline | Why |
|---|---|---|---|
| MCP Optimized | 60,420 | -55% | File-path parameters; zero data duplication |
| MCP Proxy (warm) | 81,415 | -39% | Shared context + warm cache |
| Code-Skill (baseline) | 133,006 | — | Model-written Python; nothing cached |
| UTCP Code-Mode | 204,011 | +53% | Extra prompt framing |
| MCP Vanilla | 309,053 | +133% | JSON-serialized data in every call |
### Cost at Scale
At 1,000 monthly executions:
| Approach | Per Execution | Monthly | Annual |
|---|---|---|---|
| MCP Optimized | $0.21 | $210 | $2,520 |
| Code-Skill | $0.44 | $440 | $5,280 |
| MCP Vanilla | $0.99 | $990 | $11,880 |
That's a $9,360 annual difference between optimized and vanilla MCP for a single workflow.
### Scalability
*Figure: cumulative token consumption. MCP Optimized maintains low growth; vanilla approaches accumulate steeply.*
| Approach | Scaling factor | Projected tokens at 10K rows |
|---|---|---|
| MCP Optimized | 1.5x | ~65K tokens |
| Code-Skill | 1.1-1.6x | ~150-220K tokens |
| MCP Vanilla | 2.0-2.9x | ~500-800K tokens |
MCP Optimized exhibits sub-linear scaling because file paths cost the same tokens regardless of file size. MCP Vanilla exhibits super-linear scaling because larger datasets require proportionally more tokens for JSON serialization.
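A toy model makes the divergence concrete. The per-row serialization cost below is an assumed constant, not a measured one, but the shape of the curves follows directly:

```typescript
// Toy model of per-call token cost. The constants are illustrative
// assumptions; only the growth shape matters.
const TOKENS_PER_SERIALIZED_ROW = 60; // JSON overhead per row (assumed)
const TOKENS_PER_PATH_CALL = 50;      // a file path costs the same at any size

const vanillaCallTokens = (rows: number) => rows * TOKENS_PER_SERIALIZED_ROW;
const optimizedCallTokens = (_rows: number) => TOKENS_PER_PATH_CALL;

console.log(vanillaCallTokens(500));      // 30000  - grows with the data
console.log(vanillaCallTokens(10_000));   // 600000 - 20x data, 20x tokens
console.log(optimizedCallTokens(10_000)); // 50     - flat regardless of size
```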
### Variance
| Approach | Coefficient of Variation | Consistency |
|---|---|---|
| MCP Optimized | 0.6% | Excellent |
| MCP Proxy (warm) | 0.5% | Excellent |
| Code-Skill | 18.7% | Poor |
| MCP Vanilla | 21.2% | Poor |
MCP Optimized hit 60,307, 60,144, and 60,808 tokens across three runs. Code-Skill ranged from 108K to 158K. High variance breaks capacity planning and makes cost prediction unreliable.
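Coefficient of variation is just the sample standard deviation divided by the mean. A quick check against the three MCP Optimized runs reproduces the 0.6% figure:

```typescript
// CV = sample standard deviation / mean.
function coefficientOfVariation(samples: number[]): number {
  const mean = samples.reduce((a, b) => a + b, 0) / samples.length;
  const variance =
    samples.reduce((sum, x) => sum + (x - mean) ** 2, 0) / (samples.length - 1);
  return Math.sqrt(variance) / mean;
}

const mcpOptimizedRuns = [60_307, 60_144, 60_808];
console.log(`${(coefficientOfVariation(mcpOptimizedRuns) * 100).toFixed(1)}%`); // 0.6%
```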
### Latency
Skills and sub-agents use tool-calling, which means two LLM invocations instead of one:

```text
User message → Model decides → Tool call → Tool result → Final response
```

Slash commands avoid this: they're plain prompt substitution with a direct, single-turn response.
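To make the two round trips visible, here's a minimal sketch of the tool-calling loop with the Anthropic TypeScript SDK. The model id is illustrative, tool definitions are elided, and `runTool` is a hypothetical executor, not part of the SDK:

```typescript
import Anthropic from "@anthropic-ai/sdk";

// Hypothetical executor for the tool call (not part of the SDK).
declare function runTool(block: Anthropic.Messages.ToolUseBlock): Promise<string>;

const client = new Anthropic();
const base = {
  model: "claude-3-5-sonnet-latest", // illustrative model id
  max_tokens: 1024,
  tools: [], // tool definitions elided
};
const userMsg = { role: "user" as const, content: "Analyze /data/employees.csv" };

// Invocation 1: the model reads the request and decides to call a tool.
const first = await client.messages.create({ ...base, messages: [userMsg] });

if (first.stop_reason === "tool_use") {
  const toolUse = first.content.find(
    (b): b is Anthropic.Messages.ToolUseBlock => b.type === "tool_use",
  )!;

  // Invocation 2: the tool result goes back so the model can answer.
  await client.messages.create({
    ...base,
    messages: [
      userMsg,
      { role: "assistant", content: first.content },
      {
        role: "user",
        content: [
          { type: "tool_result", tool_use_id: toolUse.id, content: await runTool(toolUse) },
        ],
      },
    ],
  });
}
```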
## Key Lessons
### 1. Architecture Trumps Protocol
MCP Optimized and MCP Vanilla use the same protocol, yet differ by 5x in token consumption. The difference is entirely architectural: file paths versus data arrays. Focus on data-flow design, not protocol debates.
### 2. The File-Path Pattern
The single biggest efficiency gain: eliminate data duplication.
```javascript
// Anti-pattern: 10,000 tokens just for data
await call_tool('analyze_data', {
  data: [/* 500 rows serialized */]
});

// Pattern: 50 tokens for the same operation
await call_tool('analyze_csv_file', {
  file_path: '/data/employees.csv'
});
```
The MCP server handles file I/O internally. Data never enters the context window.
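For illustration, here's a minimal sketch of the server side of this pattern using the TypeScript MCP SDK. The tool name mirrors the example above; the analysis logic is deliberately simplified:

```typescript
import { McpServer } from "@modelcontextprotocol/sdk/server/mcp.js";
import { StdioServerTransport } from "@modelcontextprotocol/sdk/server/stdio.js";
import { z } from "zod";
import { readFileSync } from "node:fs";

const server = new McpServer({ name: "csv-analyzer", version: "1.0.0" });

server.tool(
  "analyze_csv_file",
  { file_path: z.string(), analysis_type: z.string() },
  async ({ file_path }) => {
    // The file is read here, inside the server process. Its contents
    // never enter the model's context window; only the summary does.
    const rows = readFileSync(file_path, "utf8").trim().split("\n").slice(1);
    const summary = `Analyzed ${rows.length} rows from ${file_path}`; // simplified
    return { content: [{ type: "text" as const, text: summary }] };
  },
);

await server.connect(new StdioServerTransport());
```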
### 3. Prototype with Skills, Ship with MCP
Skills execute arbitrary code—bash commands, file system access, network calls. They're excellent for figuring out what tools you need. They're inappropriate for production systems where security matters.
### 4. Slash Commands Are Underrated
When you need deterministic, user-controlled workflows, slash commands win. No tool-call overhead, no model surprises, no latency penalty. Use them for repeatable tasks like code review checklists or deployment procedures.
### 5. Sub-Agent Context Isolation
Sub-agents can't see your main conversation history. If they need context, you must pass it explicitly in the delegation prompt. This is by design, enabling clean delegation, but it puts the burden of information passing on you.
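A hypothetical delegation prompt shows what "explicit" means in practice; the sub-agent starts from a blank context, so every fact it needs is restated:

```typescript
// Hypothetical delegation prompt. Nothing from the main conversation
// carries over, so all required context is restated inline.
const delegationPrompt = `
You are summarizing results from a completed salary analysis.

Context (you cannot see the main conversation):
- Dataset: /data/employees.csv, 500 rows
- Analysis performed: mean salary grouped by department
- Audience: engineering leadership

Task: write a three-bullet executive summary of the findings.
`;
```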
### 6. CLAUDE.md Costs Compound
CLAUDE.md content is injected into every message, including sub-agent conversations. Keep it concise, and use file references to pull in additional docs only when needed:
```markdown
<!-- CLAUDE.md -->
# Project Standards
See @docs/CODING_STANDARDS.md for detailed guidelines.

Key rules:
- Use TypeScript strict mode
- No `any` types
```
### 7. Measure Before Optimizing
Instrument your network traffic. The Anthropic API returns token usage in every response, so log it. You might be surprised where tokens are actually going.
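A minimal logging sketch with the Anthropic TypeScript SDK; the `usage` field is part of every Messages API response (the model id is illustrative):

```typescript
import Anthropic from "@anthropic-ai/sdk";

const client = new Anthropic();

const response = await client.messages.create({
  model: "claude-3-5-sonnet-latest", // illustrative model id
  max_tokens: 1024,
  messages: [{ role: "user", content: "Summarize /data/employees.csv" }],
});

// Every response reports its own token usage - log it per workflow step.
console.log({
  input_tokens: response.usage.input_tokens,
  output_tokens: response.usage.output_tokens,
});
```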
## Implementation Patterns
### Parallel Tool Execution
File-path architecture enables parallel calls:
```javascript
// Four visualizations, one API call, ~400 tokens total
await Promise.all([
  call_tool('create_viz', { file: '/data/emp.csv', type: 'bar', x: 'dept', y: 'salary' }),
  call_tool('create_viz', { file: '/data/emp.csv', type: 'scatter', x: 'exp', y: 'salary' }),
  call_tool('create_viz', { file: '/data/emp.csv', type: 'pie', col: 'department' }),
  call_tool('create_viz', { file: '/data/emp.csv', type: 'bar', x: 'location', y: 'salary' }),
]);
```
### Progressive Tool Discovery
For large tool catalogs (20+ tools), use meta-tools for on-demand discovery instead of loading all tools upfront:
```javascript
// Initial context: 2 tools, ~400 tokens
const meta_tools = [
  { name: 'describe_tools', description: 'Discover available tools' },
  { name: 'use_tool', description: 'Execute a specific tool' }
];
// Instead of: 50 tools, ~50,000 tokens upfront
```
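A sketch of how those two meta-tools might dispatch on the server side; the catalog, matching logic, and helper names are illustrative assumptions:

```typescript
// Hypothetical dispatch for the two meta-tools above. The full catalog
// stays on the server; descriptions enter the context only on demand.
type CatalogTool = {
  name: string;
  description: string;
  run: (args: unknown) => Promise<string>;
};

const catalog = new Map<string, CatalogTool>(/* ~50 tools registered here */);

// describe_tools: return only the descriptions matching the query.
async function describeTools(query: string): Promise<string> {
  return [...catalog.values()]
    .filter((t) => t.description.toLowerCase().includes(query.toLowerCase()))
    .map((t) => `${t.name}: ${t.description}`)
    .join("\n");
}

// use_tool: look up and execute a single tool by name.
async function useTool(name: string, args: unknown): Promise<string> {
  const tool = catalog.get(name);
  if (!tool) throw new Error(`Unknown tool: ${name}`);
  return tool.run(args);
}
```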
### Phased Migration Strategy
For uncertain repeatability:
- Phase 1: Use code-driven to validate the task. Accept higher per-execution cost for flexibility.
- Phase 2: If the task stabilizes and will repeat, invest in MCP Optimized.
- Phase 3: Track actual execution count and token consumption. Migrate when patterns are clear.
## Summary
| Approach | Best For | Avoid When |
|---|---|---|
| MCP Optimized | Production workloads, large datasets, predictable costs, security requirements | One-off tasks, evolving requirements |
| Code-Driven | Prototyping, novel requirements, maximum flexibility | Production systems, multi-user environments |
| Slash Commands | User-controlled workflows, deterministic behavior, zero overhead | Automation, context-dependent decisions |
The core insight: how you architect data flow matters more than which protocol you choose. The 5x token difference between optimized and vanilla MCP—for the same task—demonstrates this clearly.
Match the tool to your constraints. Measure the results.
## References
- Token Efficiency in AI-Assisted Development - Full analysis of token consumption across approaches
- Claude Code Internals: Reverse Engineering Prompt Augmentation - Deep dive into how Claude Code's prompt mechanisms work
- MCP Specification
- AICode Toolkit (GitHub) - MCP servers and tools for AI-assisted development
- Token efficiency experiments (GitHub)
- Prompt augmentation analysis (GitHub)
All claims are reproducible using the open-source data and tooling in the referenced repositories.

