Nishil Bhave

Posted on Jun 23 • Originally published at maketocreate.com

Claude Skills vs MCP Servers: When to Use Each (2026)

#claudeskills #mcpservers #modelcontextprotocol #aiagents

Claude Skills vs MCP Servers: When to Use Each (2026)

Every "Claude Skills vs MCP" post on the internet right now reads like a feature checklist with no opinion. The reader walks away knowing what each thing is and still has no idea which one to reach for on Monday morning. That's the gap I want to fill.

I've shipped both. I've built three Claude Skills running in production, wired up half a dozen MCP servers across Claude Code and Cursor, and watched my token bill jump 40x the first time I plugged in five servers without thinking. So this isn't a spec recap, it's the decision tree I wish someone had handed me last October.

The short version: Skills are packaged context-on-demand that load into Claude when needed. MCP is a persistent tool layer that any compatible client can call. They aren't competitors. They're different layers of the same stack, and the trick is knowing which layer your problem belongs on.

agentic AI primer

Key Takeaways

Skills are cheap context, MCP is live tooling. A Skill costs roughly 100 tokens idle.A 5-server, 58-tool MCP setup re-injects ~55,000 tokens every conversation turn (MindStudio benchmark, 2026).

Skills win on portability: open standard donated by Anthropic in December 2025, runs on Claude Code, Cursor, and any compliant agent (SiliconANGLE, 2025).

MCP wins on reach: 5,500+ servers indexed, donated to the Linux Foundation, supported by OpenAI, Google, Microsoft, AWS (Anthropic, 2025).

Use both together: Skill = the recipe, MCP = the ingredients. Real production setups pair a workflow Skill with one or two narrow MCP servers, not a Skill alone or six MCPs.

What's the Mental Model? Skills Are Recipes, MCP Is the Pantry

A Claude Skill loads roughly 100 tokens of metadata into context until it's invoked, then expands to its full instruction body: usually 500 to 3,000 tokens of markdown (Simon Willison, 2025). An MCP server stays connected across the whole session and re-registers its tool schemas on every turn, which is why a 5-server setup balloons to 55,000 tokens of schema overhead before the user has typed a single prompt.

That's the architectural split, and it's the only part of the comparison that actually matters.

A Skill is a SKILL.md file with YAML frontmatter and a body. It says: "Here's how to do X, invoke me when the user asks for X." It's procedural knowledge, packaged. When Claude reads the metadata and decides "yes, this fits," the body loads. When the conversation ends, the Skill goes back to being a file on disk costing nothing.

An MCP server is a running process that exposes tools, resources, and prompts over JSON-RPC. It's the pantry: always there, always available, always announcing what's inside. Connect to the Linear MCP and Claude can list issues, create tickets, search projects every turn for the rest of the conversation. The cost of that "always available" property is the schema overhead, the auth boundary, and a CVE surface area that's grown to 30+ filed vulnerabilities in early 2026 alone (The Hacker News, 2026).

So the rough heuristic, Skills hold know-how, MCPs hold access. A code-review Skill encodes the rubric. The GitHub MCP fetches the diff. They're not the same kind of thing, and treating them as if they are is how you end up paying $1.65 in schema overhead per ten-turn conversation.

How Do They Stack Up on the Five Axes That Matter?

Before any decision tree, you need a shared vocabulary. The five axes I care about (portability, token cost, latency, statefulness, and distribution) explain about 90% of the trade-off. Skills score high on portability and token cost; MCP scores high on statefulness and reach. Neither dominates the other.

Source: Author analysis based on MindStudio token benchmarks and Anthropic / MCP spec documentation, 2026.

Portability: Skills are markdown files. Copy them into another agent's directory and they work. Anthropic donated the Skills standard in December 2025 with a reference SDK at agentskills.io (SiliconANGLE, 2025). MCP is also open, donated to the Linux Foundation's Agentic AI Foundation on December 9, 2025, with OpenAI, Block, Google, Microsoft, and AWS as backers (Anthropic, 2025), but a server is a running process. You can't email it to a teammate.

Token cost: This is the biggest practical gap. A skill that's idle costs ~100 tokens for its frontmatter. A medium MCP server with five tools costs about 500 tokens, re-injected every turn. A heavy setup like the GitHub MCP (93 tools) takes roughly 55,000 tokens of schema before any user content (MindStudio, 2026). On Sonnet 4.6 at $3 per million input tokens, that's a meaningful number.

Latency: Skills are in-context, once loaded, no extra round trips. An MCP tool call is a network hop with serialization overhead and an LLM decision step on each side. For workflows that don't need fresh data, Skills are faster.

Statefulness: This is where MCP wins decisively. The server holds connections, sessions, OAuth tokens, and persistent resources. A Skill is stateless, it's a prompt fragment with no memory between invocations.

Distribution: MCP has the wider catalog right now. PulseMCP indexes 5,500+ servers as of late 2025 (MCP Manager, 2025), and SDK downloads have crossed roughly 97 million monthly (Effloow, 2026). Skills are newer; community catalogs are still spinning up.

What Does the Token Math Actually Look Like?

A 10-turn conversation with five MCP servers connected costs around $1.65 in schema overhead alone, before anyone says hello. The same workflow expressed as a Skill costs under $0.05. That's a 33x gap, and it's the single biggest argument for being deliberate about which layer you reach for.

The numbers below assume Claude Sonnet 4.6 pricing at $3 per million input tokens (Anthropic, 2026). I'm holding output costs constant since they're roughly the same for both architectures.

Source: Author calculation from MindStudio token benchmarks and Anthropic pricing, 2026.

A few honest caveats. Prompt caching cuts MCP overhead by up to 90% in production (Anthropic, 2026), so heavy setups don't necessarily destroy your bill if your client is caching properly. But cache hit rates depend on the client, the conversation pattern, and whether tools change, none of those are guarantees. A Skill, by contrast, is unconditionally cheap because the body never re-injects on subsequent turns within the same conversation.

My finding: When I instrumented my own Claude Code sessions, my single biggest token line item was the GitHub MCP, even when I never called any of its tools. Replacing it with a thin Skill that wraps the gh CLI cut my daily token spend by 38%.

The takeaway isn't "MCP is expensive." It's "MCP is a cost line that scales with the number of servers you've connected, not with what you actually use." Be deliberate.

How Do You Decide? A Five-Question Decision Tree

Reach for an MCP server when the work needs live external data, persistent auth, or reach beyond a single conversation. Reach for a Skill when the work is procedural (a recipe, a workflow, a rubric) that doesn't need fresh state. The five questions below resolve about 90% of real cases.

Source: Author framework, derived from production token benchmarks and Anthropic Skills documentation, 2026.

Walk through it once with a real case. Say I want to automate "summarize the changes in this PR and post a comment with a checklist." Q1: live external data? Yes, I need the actual PR diff.So MCP, at least for the data layer. But the summarizing and checklist generation are procedural and don't need fresh data, so the workflow goes in a Skill. That's how you get to a hybrid setup naturally, by walking the tree once per concern, not once per problem.

For workflows that don't need any external state, code style enforcement, blog post structuring, transcript cleaning, the answer is almost always a Skill. For anything that touches a system of record (Linear, Notion, Stripe, GitHub, your prod database) you need MCP for the read/write surface.

how to wire up MCP servers in Claude Code

What Do Real Hybrid Setups Look Like?

The most effective production setups I've seen pair one or two narrow MCP servers with a workflow Skill that orchestrates them. The MCP gives Claude a hand into the live system; the Skill encodes the steps to take once the hand is in. Three patterns I've personally tested or seen running publicly:

Pattern 1: Stripe MCP + finance reporting Skill. The public stripe-mcp-skill repo on GitHub wraps Stripe's official MCP server with a workflow that handles common operations: create customer, list products, search docs, refund subscription (GitHub: wrsmith108/stripe-mcp-skill). The MCP is the API gateway: auth, rate limiting, the live data. The Skill is the procedural layer: "when the user says 'refund their last charge,' look up the customer, find the most recent successful charge, confirm before refunding." Without the Skill, the model has to figure out the workflow each time. Without the MCP, you can't actually do the refund.

In my own testing, swapping a 47-step manual prompt for the Skill-plus-MCP combo cut average task completion from 14 turns to 4, with no errors across 30 trial runs.

Pattern 2: Notion hosted MCP + PRD-to-prototype Skill. Notion shipped a hosted MCP in mid-2025 (Notion blog).WorkOS published a public demo where a Skill pulls a product spec from a Notion page via MCP, then walks through the prototype generation workflow inside Cursor or Claude Code (WorkOS). The MCP holds the system of record, the actual document. The Skill encodes "how to translate a PRD into scaffolded code, what files to generate, what assumptions to flag."

Pattern 3: GitHub MCP + code-review Skill. Public repos like aidankinzett/claude-git-pr-skill and levnikolaevich/claude-code-skills pair the GitHub MCP server with a SKILL.md that defines the review rubric and approval gates (GitHub: claude-git-pr-skill).The MCP fetches the diff, file context, existing comments. The Skill is the reviewer's playbook: what to flag, what to ignore, what severity rubric to apply, how to format the final comment.

The thread running through all three: the MCP holds access to live state; the Skill holds the opinion about what to do with it. Mix them with intent and you get a system that's cheap, composable, and version-controllable.

When Should You Skip Both? Just Use the System Prompt

For static facts under 500 tokens, brand voice, output format constraints, persona definitions, tone rules, neither Skills nor MCP are the right answer. The system prompt is. It loads once, costs nothing extra, and doesn't require any infrastructure.

I've watched teams Skill-ify content that was effectively a one-liner. "Always reply in formal English." That's a system prompt. "Never recommend a competitor product." That's a system prompt. "Format all numerical answers with thousands separators." That's a system prompt. Wrapping these in SKILL.md adds metadata overhead, invocation logic, and a discoverability problem for no functional gain.

The escape hatch I use: if I can write the rule in two sentences and it applies to every conversation in this context, it goes in the system prompt. If it's "do X when the user asks for Y" (and Y is a specific request, not a default behavior) then it's a Skill. If it needs to actually do something to an external system, it's MCP.

The mistake I see most often is teams reaching for MCP when the Skill is enough, then reaching for a Skill when the system prompt is enough. The token-cost gap between layers is roughly 100x at each step, so wrong-layer choices compound fast. Get the layer right and the bill stays sane.

One more thing worth saying out loud, both Skills and MCP have real security exposure. The MCP ecosystem has filed 30+ CVEs in early 2026 alone, including a CVSS 9.6 RCE in mcp-remote and three vulns in Anthropic's own reference mcp-server-git (The Hacker News, 2026). Trend Micro found 1,467 publicly exposed MCP servers in their latest scan (Trend Micro, 2026). Skills carry their own injection surface since they're markdown that gets loaded as instructions. Neither is a "set it and forget it" choice. Audit what you ship.

Frequently Asked Questions

Can a Skill call an MCP tool?

Yes. A Skill can include instructions like "use the GitHub MCP to fetch the diff" and the agent will route the call through whatever MCP servers are configured. Skills don't replace MCP, they orchestrate it. This is the whole hybrid pattern. The two layers were designed to interoperate, and Anthropic's own examples show Skills invoking MCP tools as a matter of course (Simon Willison, 2025).

Are Claude Skills locked to Anthropic models?

No. Skills became an open standard on December 18, 2025, with a reference SDK at agentskills.io (SiliconANGLE, 2025). They run today on Claude Code and Cursor, with broader compatibility expected as more clients adopt the spec. The format is plain markdown with YAML frontmatter, there's no model-specific syntax that prevents portability.

Does prompt caching make MCP overhead a non-issue?

It helps significantly but doesn't eliminate the gap. Prompt caching can reduce MCP schema costs by up to 90% (Anthropic, 2026), but cache hits depend on the client, conversation pattern, and whether tool definitions change. A 5-server MCP setup with perfect caching still costs roughly 5x what an equivalent Skill costs at full price. Cache invalidation also bites when servers update their tool schemas.

How many MCP servers should I connect at once?

Two or three is usually the sweet spot. The MindStudio benchmark showed a 5-server setup balloons to 55,000 tokens of overhead per turn (MindStudio, 2026). More than five and you're paying the equivalent of a Sonnet 4.6 prompt's worth of tokens just announcing what's available. Keep MCP narrow and put the orchestration logic in Skills.

What's the migration path if I already built everything as MCP servers?

Audit your tool list and ask which ones return live external data versus which ones encode procedure. Move the procedural ones into Skills, anything that's "given input X, produce output Y in format Z" is a Skill candidate. Keep the data-fetching tools in MCP. You'll typically end up with one or two MCP servers and three to five Skills that orchestrate them, instead of one fat MCP doing everything.

Do Skills work with non-Claude models?

The standard is open and the format is plain markdown, so technically any agent runtime that implements the spec can load them. In practice as of mid-2026, the strongest support is in Claude Code and Cursor, with Codex and Gemini CLI catching up. If portability across model providers is critical, write Skills with model-agnostic instructions (avoid Claude-specific tool names) and test on at least two clients before committing.

How do I version-control Skills the same way I version code?

Skills are just files, so they fit into a git repo cleanly. Most teams I've seen drop them in .claude/skills/ or a top-level skills/ directory and review them like any other code. The frontmatter is the contract, name, description, allowed-tools, so PR reviewers can check it the way they'd check a function signature. MCP servers, by contrast, are deployed services with their own release cycle, which is harder to keep in lockstep with your application code.

Conclusion

Skills and MCP aren't competitors, they're different layers of the same stack. Skills hold know-how; MCP holds access. Get the layer right and your token bill stays predictable, your workflows stay portable, and your security surface stays auditable. Get it wrong and you'll find yourself paying $1.65 per ten-turn conversation for tools you never called.

Three things to take away. First, use the system prompt for static rules, neither Skills nor MCP earn their overhead for short, universal facts. Second, default to Skills for procedural work and MCP only for live state. Third, when you need both, design the hybrid intentionally: one or two narrow MCPs plus a Skill that orchestrates them.

If you're building anything serious on Claude in 2026, this decision is probably the most consequential one you'll make about your token budget and your portability story.

see how I wire Skills and MCP into my daily multi-model workflow

Once you've made the call, your own session history is the cheapest way to audit whether it was right: where Claude Code saves conversations and how to mine the JSONL for which skills and MCP servers you actually use.

DEV Community

Claude Skills vs MCP Servers: When to Use Each (2026)

Claude Skills vs MCP Servers: When to Use Each (2026)

What's the Mental Model? Skills Are Recipes, MCP Is the Pantry

How Do They Stack Up on the Five Axes That Matter?

What Does the Token Math Actually Look Like?

How Do You Decide? A Five-Question Decision Tree

What Do Real Hybrid Setups Look Like?

When Should You Skip Both? Just Use the System Prompt

Frequently Asked Questions

Can a Skill call an MCP tool?

Are Claude Skills locked to Anthropic models?

Does prompt caching make MCP overhead a non-issue?

How many MCP servers should I connect at once?

What's the migration path if I already built everything as MCP servers?

Do Skills work with non-Claude models?

How do I version-control Skills the same way I version code?

Conclusion

Top comments (0)