DEV Community

Cover image for Skill, MCP, Plugin, or just a CLI: how I pick a Claude Code extension, lightest first
Rapls
Rapls

Posted on • Edited on • Originally published at zenn.dev

Skill, MCP, Plugin, or just a CLI: how I pick a Claude Code extension, lightest first

Nested tool logic and security pitfalls

I was building a plugin release with Claude Code, and the changelog draft came together nicely. Pull git log from the last tag to now, drop it under == Changelog ==. That's a procedure, so it just worked.

The next step is where I tripped. I wanted to add the current WordPress.org active install count to the release post, so I added a line to the same procedure file: "fetch the stats and write them in." It didn't work. Of course it didn't. A file that holds a procedure teaches Claude the steps, but it has no legs to go out and fetch today's numbers from a website. To go get them, you need a mouth that talks to the outside. That was a different tool's job.

Claude Code has several ways to add capability: Skill, MCP, Plugin. The names sound alike and the explanations blur together, and a CLI is in the mix too. I build WordPress plugins and use coding agents most days, and I still hesitated every time about which one to reach for. This post is how I draw the line, settled into one rule: reach for the lightest thing first.

A quick note on setup

These four move fast. Treat the list below as true when I checked it, and verify on your own machine with /context, /mcp, and /plugins.

  • Claude Code: the value from claude --version (swap in your real number)
  • Slash commands are now folded into Skills. If you read this expecting "commands" to be a separate thing, it won't line up.
  • My targets are self-built WordPress plugins and themes.

First: it's nested, not a flat choice of three

Laid out in a comparison table, these three confuse people, because you end up comparing things with different jobs on the same axis. They're actually nested.

A Plugin is a distribution container. Inside it go Skills, MCP configs, slash commands, subagents, and hooks. A Skill is a bundle of procedure or knowledge, and from inside it you can call an MCP tool or a CLI. Slash commands now live inside Skills. Think recipe card (Skill), the plumbing that connects your kitchen to the outside market (MCP), and the whole stocked kitchen that packages it all (Plugin). Nobody argues about whether a recipe card is better than plumbing. They do different jobs. Same here: you don't compare, you ask which layer your problem lives in.

Hold that nesting in your head and my opening mistake explains itself in one line: I tried to do an MCP's job (fetch outside numbers) with a Skill (teach a procedure). I was confusing layers.

The rule: reach for the lightest first

Here's the whole decision. Apply it top to bottom.

  1. Just teaching a procedure or knowledge? Skill.
  2. Need live outside data? Is there a CLI for it? If yes and you use it rarely, call the CLI from a Skill.
  3. No CLI, or you use it deeply every day? MCP.
  4. Want to bundle and reuse or share all of the above? Plugin.

I say "lightest first" because the cost in context and effort grows as you go down. Most of what you want is item 1. But the shiny new tool pulls your eye, and you start thinking from MCP or Plugin. That was me. Ask whether a Skill is enough, then whether a CLI reaches it, and only then reach for the heavy tools.

A few recent calls, run through this. Standardize commit message style: a procedure, so Skill. Peek at the staging post count: outside, but wp-cli handles it and I use it rarely, so call the CLI from a Skill. Operate the production dashboard deeply, every day: a CLI isn't enough and the frequency is high, so MCP. Ship a shared lint config and release steps across plugins: distribution, so Plugin.

The rest of this is each layer in that order.

Skill: procedure and knowledge

A folder with a SKILL.md: YAML frontmatter on top, Markdown body below. The description sits ready, lightly, and the body opens only when a related task comes up.

---
name: release-build
description: "Release prep for a plugin. Version bumps, changelog entry, build the distribution zip."
disable-model-invocation: true
allowed-tools:
  - Read
  - Edit
  - Bash(git log *)
  - Bash(unzip -l *)
---
Enter fullscreen mode Exit fullscreen mode

One thing people misread is allowed-tools. It does not restrict which tools can run; it pre-approves the listed ones to run without a confirmation prompt. Tools not on the list can still be called, subject to your normal permission settings. So something you truly want blocked isn't stopped by leaving it off this list. The other is disable-model-invocation: true, which I use for side-effecting work like a release: I'd rather invoke /release-build myself than have Claude helpfully start it.

The key point: a Skill reaches local files and commands Claude Code can run. Fetching a website's current numbers, anything live, won't happen no matter how you word the steps. That's where I tripped. A Skill memorized the steps; it has no legs to go outside.

MCP: live data and real systems

MCP (Model Context Protocol) connects Claude to outside tools and data. You declare servers in config, for example .mcp.json:

{
  "mcpServers": {
    "github": { "command": "npx", "args": ["-y", "@modelcontextprotocol/server-github"] }
  }
}
Enter fullscreen mode Exit fullscreen mode

WordPress.org stats, a staging database, GitHub issues. When you want the value of the moment from outside, that's MCP. Active install counts change daily, so yesterday's number baked into a Skill is wrong tomorrow. Fixed things you can write down go in a Skill; things that change each time need MCP. /mcp lists connected servers and lets you disconnect them.

But MCP is heavy, and not by a little. I'll get to the numbers below, but it's enough weight that "connect everything that looks useful" is a bad default.

Call a CLI from a Skill: think before reaching for MCP

When I want outside data, I don't jump straight to MCP. First I check whether a CLI already does it. GitHub has gh. WordPress has wp-cli. If the CLI exists, calling it from a Skill's steps via Bash is usually lighter.

wp post list --post_type=post --post_status=publish --format=count
wp option get tmfs_settings --format=json
Enter fullscreen mode Exit fullscreen mode

The reason is how context loads. An MCP server loads its full tool schema the moment you connect, and carries it every turn. A CLI called through Bash loads nothing at startup; you pay only when the command runs. For something you touch occasionally, that difference is large. One reported benchmark put the GitHub CLI at 4 to 32 times cheaper per operation than the GitHub MCP. A daily, deep integration may justify MCP's structured results, but holding a full schema in context for a tool you touch twice a month is a bad trade.

Plugin: bundle, reuse, share

A Plugin packages Skills, MCP config, commands, subagents, and hooks into one distributable unit with a plugin.json.

my-plugin/
├── .claude-plugin/plugin.json
├── skills/
└── .mcp.json
Enter fullscreen mode Exit fullscreen mode

Install with /install from a marketplace, manage with /plugins. If a release procedure is common across several plugins, bundle the release-build Skill and its build script into a Plugin and pull it into a new repo in one shot, instead of copy-pasting files around. A Plugin doesn't solve the content; the content is Skills and MCP. It solves distribution only. Decide the content first, wrap it in a Plugin when you want to ship it. Starting from "let's make a Plugin" usually stalls.

One thing I watch when installing someone else's Plugin: a Plugin can carry MCP configs and hooks, and those run code on my machine. The official marketplace is curated; community ones vary. I check what MCP and hooks are inside before installing.

The context cost, with real numbers

"Lightest first" is grounded in a real weight difference. Numbers vary by version and setup, so measure your own, but the reported figures are stark.

A session starts heavy before you type anything: system prompt, CLAUDE.md, memory, the tool schemas of connected MCP servers, and the names and descriptions of your Skills, all loaded at startup. Reported sessions begin around 20k to 30k tokens with nothing typed.

MCP adds the most. A connected server loads its whole schema regardless of use, and it rides every turn, even in a session that never touches it. A publicly shared /context breakdown looked like this on a 200k window:

System prompt    3.2k   (1.6%)
System tools    16.1k   (8.0%)
MCP tools       98.7k   (49.3%)   <- tool definitions of connected servers
Memory files     3.x k
Enter fullscreen mode Exit fullscreen mode

Nearly half the window on MCP tool definitions. Another report had 7 servers eating 67,300 tokens (about 34%) before any conversation. Per server: the official GitHub MCP runs about 18k for its 27 tools even when you never touch GitHub, and a fuller build hits roughly 55k across 93 tools. In one /doctor dump, a 20-tool server cost about 14.1k, Playwright (21 tools) about 13.6k, a SQLite server (19 tools) about 13.4k, roughly 700 tokens per tool. And it reloads every turn: at 15k of overhead over a 20-turn session, that's 300k tokens spent on tool definitions alone, whether or not you used them.

Skills are the opposite. At startup only the names and descriptions load, tens of tokens each, with the body opening on demand. You can keep many Skills and barely move the startup cost. Same "add capability," completely different weight. That's why lightest first holds up.

To measure and cut: run /context and read the MCP Tools section, then /mcp to disconnect what you're not using. Claude Code's tool search defers schema loading until a tool is needed; one measurement dropped main-thread usage from 51k to 8.5k, about a 47% cut. Allow-listing only the tools you use can cut a 50-tool server to 5 for roughly a tenth of the cost.

One caution: /context historically overstated MCP usage by counting shared overhead per tool, showing nearly 3x the real figure, which was corrected around late January 2026. Discount old numbers and old posts, and measure on your own version.

Common mix-ups

  • Making a fixed procedure or template into an MCP server. That's a Skill. You carry the weight and the capability is no better, and you burn startup tokens for nothing.
  • Standing up an MCP for something gh or wp-cli already do. The startup load grows and the capability doesn't.
  • Cramming long task procedures into CLAUDE.md and bloating it. CLAUDE.md is always loaded; per-task steps belong in a Skill that opens on demand.
  • Treating slash commands as separate. They're folded into Skills now.
  • Standing up a Plugin or MCP for a one-off. A prompt in the moment is enough. Build machinery only for what you know you'll repeat.

Skill lives in the repo, MCP lives on the machine

Disclaimer: The experiences and decisions in this post are my own. English isn't my first language, so I use an AI assistant to help draft and edit the writing.

Sharing differs too, and that affects the choice. A Skill in .claude/skills/ is a file in git: committed, reviewed, traveling with the code. MCP connection config tends to hold secrets like API keys, so it doesn't go in git and stays tied to a machine or environment, different per person. Keep keys in environment variables rather than inline in .mcp.json. A Plugin is the explicit way to package and ship even the machine-bound parts. What you want to share, and with whom, is another axis for choosing.

A note to my next self

When in doubt: procedure, data, or distribution. Procedure is a Skill; outside data is a CLI first, then MCP; bundling to share is a Plugin. Apply lightest first, and stop at Skill if Skill is enough. Don't start from the heavy tool. That alone prevented most of my mix-ups, including trying to fetch live numbers with a procedure file.

How much execution you let an agent do, and how you stop the irreversible commands, is a separate question I wrote up elsewhere. Bundling my release setup into a Plugin is still on my list. When I do it, I'll write that up too. Leaving this map here so future-me doesn't freeze in front of the three again.

References

Token figures above are public measurements from other setups. Measure your own with /context.


Originally written in Japanese on Zenn. I build WordPress plugins.

Top comments (40)

Collapse
 
itskondrat profile image
Mykola Kondratiuk

when the tool needs to touch anything external or stateful, lightest-first reverses on you. a CLI that silently fails mid-procedure is harder to debug than an MCP that surfaces the failure cleanly. I would rather front-load the complexity.

Collapse
 
rapls profile image
Rapls

That's a fair pushback, and I think we're optimizing for two different axes. I was arguing on context cost. You're arguing on failure observability, and on that axis you're right: a CLI that dies mid-procedure with a swallowed exit code is genuinely worse to debug than an MCP that surfaces the error in a structured way.

Where I'd still hold the line is the trigger for front-loading. I'd make it the failure characteristics, not the tool layer. A CLI you call from a Skill can be wrapped to fail loud (check the exit code, surface stderr, stop the procedure), and once you do that, the cheap-context path is still on the table. But for something stateful where a partial failure leaves the world half-changed, I'm with you, the clean failure surface earns its weight, and lightest-first should bend.

Maybe the rule is "lightest first, until the cost of a silent partial failure outweighs the context savings." That moves the decision onto the thing you're actually worried about.

Collapse
 
itskondrat profile image
Mykola Kondratiuk

failure observability is the harder axis to retrofit - context cost recovers, a swallowed exit at step 3 of a stateful write doesn't. what's your front-load trigger: state type, or error class?

Thread Thread
 
rapls profile image
Rapls

You're right that observability doesn't retrofit. Context cost is a dial you can turn back later; a swallowed exit at step 3 of a stateful write is a hole you don't find until something's already half-written. They're not symmetric.

To your either/or: my trigger is state type first, error class second, because the two aren't independent. What makes a failure unrecoverable isn't the error itself, it's whether the operation mutated shared state before it died. A silent failure in a read is an annoyance; the same silent failure mid-write is a half-changed world. So I'd gate on reversibility: does a partial failure leave something committed that I can't cleanly roll back. If yes, that's where I front-load the structured-failure tool, regardless of error class. Error class only changes how loud the failure is; state type decides how much it costs. I optimize for the cost.

Put plainly: stateless or idempotent, stay light and let it fail loud. Mutating and non-idempotent, front-load the clean failure surface. The write is the thing that earns the complexity, not the external call.

Thread Thread
 
itskondrat profile image
Mykola Kondratiuk

state type first is right. most error classes look recoverable until you ask whether anything has been written - at that point the error taxonomy is a distraction and you are just triaging state.

Thread Thread
 
rapls profile image
Rapls

"You're just triaging state" is the right place to land this. That's the sentence I'll keep. Error taxonomy feels like the question because it's visible and structured, but it's downstream of the only thing that actually matters: did this touch state I can't take back. Answer that first and the error class is just detail about a problem you've already classified.

Which loops back to the original rule in a way I like. "Lightest first" was never really about the tool, it was about reversibility all along. Stay light where a failure is recoverable, front-load structure where it isn't. The whole thread basically refactored my context-cost argument into a state-recoverability one, and the second framing is better. Good exchange.

Thread Thread
 
itskondrat profile image
Mykola Kondratiuk

right, and once you frame it that way the rest of the error handling almost writes itself - the taxonomy is just naming a thing you already know

Thread Thread
 
rapls profile image
Rapls

Right. Once reversibility is the question, the error handling stops being a design problem and turns into bookkeeping: you already know what's recoverable, you're just writing it down. The taxonomy was never generating the answer, it was labeling one you'd reached by asking about state.

Genuinely one of the better threads I've had here. You took a context-cost rule and left it a state-recoverability one, and it's the better rule. Thanks for pushing on it.

Thread Thread
 
itskondrat profile image
Mykola Kondratiuk

bookkeeping is exactly the right word for it - once reversibility settles the design question you are just transcribing what you already know. good thread.

Thread Thread
 
rapls profile image
Rapls

That's the one. Reversibility settles the design question, and everything after is transcription. Glad this one landed where it did. Threads like this are why I still write these posts. Thanks, Mykola.

Thread Thread
 
itskondrat profile image
Mykola Kondratiuk

That gap stays open even after you pick your reversibility threshold - the action might be reversible, the trust cost often is not. Treat it as a second filter now, especially for anything that crosses a stakeholder boundary.

Thread Thread
 
rapls profile image
Rapls

That's a sharp addition, and you're right that it's a separate filter. A database write can roll back; a stakeholder who watched the agent do the wrong thing across a trust boundary doesn't roll back the same way. The action is reversible, the confidence isn't. So the rule needs a second gate: technical reversibility, then trust reversibility, and the second one tightens fast the moment a failure is visible to someone whose trust you depend on. Good one to end on. I'll be carrying that filter forward.

Thread Thread
 
itskondrat profile image
Mykola Kondratiuk

and the trust cost does not show up at error time - it shows up two sprints later when someone starts manually checking everything the agent does. that is the invisible tax.

Collapse
 
xulingfeng profile image
xulingfeng

The "nested, not a flat choice" framing is the piece I keep coming back to. Most comparison posts treat them like three columns in a table — which misses the point entirely.

The changelog example lands well because it's the exact reason I reach for a Skill first too: if the knowledge is static, there's no reason to involve MCP plumbing. But the tricky part (and the one I don't have a clean answer for yet) is when a Skill starts as static and gradually needs live data. At what point do you refactor — when the second external call appears, or the fifth?

Your lightest-first rule feels right, but the migration path between layers is where I keep hesitating.

Collapse
 
rapls profile image
Rapls

The "three columns in a table" trap is exactly what I was trying to get out of. Once you see the layers as nested, the question stops being "which is best" and becomes "which layer does this problem live in," and that reframing is most of the work.

On the migration point, the one you keep hesitating at: I'd stop counting calls. Second versus fifth measures the wrong thing. The trigger I'd use now is whether the workflow needs durable state and a stable contract: does call 2 depend on call 1, does auth or session carry across, do several tools need the same schema. A static Skill that grows three independent read-only fetches is still a Skill. The day one fetch has to feed the next, that's the refactor signal, regardless of the count.

One middle step makes the move painless: wrap the CLI output in a tiny script that normalizes it into a predictable JSON shape. That keeps the Skill cheap now, and if the pattern repeats, that shape is already the blueprint for the MCP schema. The migration becomes a promotion of something you shaped, not a rewrite.

Collapse
 
xulingfeng profile image
xulingfeng

Yo that CLI normalization tip you dropped? Tried it. Works. Running before scheming is so much nicer than MCP upfront. And vice versa too — when MCP gets annoying to touch, CLI fallback > rewrite any day.

Thread Thread
 
rapls profile image
Rapls

Made my day that you actually ran it. "Running before scheming" is a better tagline for the whole post than the one I used.

And the reverse you found is the half I undersold: CLI fallback over rewrite. We treat the migration as one-directional, Skill grows up into MCP, but it goes both ways. When an MCP gets annoying to touch, dropping back to a CLI call is almost always cheaper than maintaining the server. Lightest-first isn't just where you start, it's where you return to when the heavy thing stops paying for itself. Thanks for closing that loop.

Collapse
 
theuniverseson profile image
Andrii Krugliak

Lightest-first is the rule I wish I'd started with. A CLI you can read in a minute beats an MCP server you end up debugging at 2am. I only reach for the heavier thing once the light one clearly can't do the job.

Collapse
 
rapls profile image
Rapls

The 2am-debugging line is exactly it. Readability is a cost too, and the light tool wins on that axis before you even count tokens. A CLI you can read in a minute is one you can also fix in a minute, half-asleep. I'd only add your own caveat back at you: "clearly can't do the job" is the right trigger, and the honest version of it is stateful work, where one call has to feed the next. Until then, the thing you can read wins.

Collapse
 
theuniverseson profile image
Andrii Krugliak

That stateful-work line is the cleanest version of the rule I've seen. The minute step N has to remember step N-1, the light tool stops being light and I reach for the heavier one. Short of that, the thing I can read wins.

Collapse
 
xulingfeng profile image
xulingfeng

The "lightest first" rule hit home. We've been running into the exact same trap — reaching for MCP every time something needs to reach outside, when a CLI call from a Skill would do the job at a fraction of the context cost. That 49% MCP tool definitions stat is brutal but real.

The nesting model (Skill → CLI → MCP → Plugin) is the clearest explanation I've seen. The way you framed it — "recipe card vs plumbing vs stocked kitchen" — makes it stick. How do you handle the case where a CLI exists but its output needs structured parsing before it's useful? Do you pipe it through a small script in the Skill, or does that push it into MCP territory?

Collapse
 
rapls profile image
Rapls

Pipe it through a small script in the Skill, every time. A CLI whose output needs parsing doesn't push it into MCP territory on its own. I wrap the command in a tiny script that normalizes the output into a predictable JSON shape, and the Skill works against that shape instead of raw text. Context stays cheap, and the agent gets a stable contract without a server riding every turn.

The thing that would push it to MCP isn't the parsing, it's session state: the day call 2 depends on call 1, or auth has to carry across. Output shape and durable state are different problems, and only the second one earns MCP's weight.

One bonus: that normalize script is also your migration path. If the pattern repeats and you do move to MCP later, the JSON shape you already settled on is basically the schema. The move becomes a promotion of something you built, not a rewrite.

Collapse
 
mininglamp profile image
Mininglamp

The nested mental model is spot on. Biggest mistake devs make is reaching for MCP when a skill file with a shell command would do the same thing in 2 lines. MCP makes sense for persistent state or auth across sessions, but for one-off data fetches its pure overhead. Lightest first saves a lot of debugging time down the road.

Collapse
 
rapls profile image
Rapls

The 2-lines-vs-MCP gap is exactly the trap. A shell command in a Skill file does the job and costs almost nothing at startup, while the MCP rides every turn whether you touch it or not. And you put the boundary in one line better than my whole post did: persistent state or cross-session auth is what MCP is for, and a one-off fetch is pure overhead. That's the split I'd lead with if I rewrote it.

Collapse
 
mudassirworks profile image
Mudassir Khan

the allowed-tools section is the one most people misread. 'does not restrict which tools can run — marks listed ones to skip confirmation' means it is not a security boundary. teams find that out when the skill runs something off the list they assumed was blocked. PreToolUse hooks on Bash are the actual guardrails; allowed-tools is just UX.

on Mykola's point: lightest first and 'surfaces failures cleanly' aren't in tension. a Skill can wrap a CLI call with explicit error handling. the real split is stateful vs stateless: if one call's output needs to influence the next in the same session, MCP earns the context cost. pure reads that reset each run, CLI wins.

what does your deep WordPress dashboard integration look like: MCP over REST API, or a custom wp-cli wrapper?

Collapse
 
rapls profile image
Rapls

You're right to flag allowed-tools, and harder than I put it: it's UX, not a security boundary. PreToolUse hooks on Bash are the actual guardrail, because a hook can fail the action while allowed-tools only skips a confirmation. Something you truly want blocked isn't stopped by leaving it off the list, and that's exactly where teams get surprised. I should have drawn that line sharper in the post.

On the dashboard: it's a wp-cli wrapper, not MCP over REST. The reads reset each run and I touch it occasionally, so a Skill calling wp-cli through Bash stays cheaper, and I normalize the output into a small JSON shape so the agent gets a stable contract without a server riding every turn. If it ever grows session state, that JSON shape is also my migration path into MCP.

And your stateful-versus-stateless split is the line I'd draw now too. Call count was the wrong thing to measure. The day call 2 depends on call 1, MCP earns its cost.

Collapse
 
mnemehq profile image
Theo Valmis

Your lightest-first axis is reach: how much does the thing need to talk to the outside. There's a second axis the table doesn't have, and it changes some of these calls: suggestion versus guarantee. A Skill teaches Claude the steps but can't make Claude follow them, it's advice the model can drop under load or in a long context. For most things that's fine, a style preference, a changelog procedure. It stops being fine when the rule is load-bearing, an architectural invariant the generated code must never cross, because a Skill that gets ignored once is worse than no rule, you believe it's enforced and it isn't. That's where hooks earn their weight: a hook can fail the action, a Skill can only ask nicely. We've been building Mneme around that gap, the rules a codebase can't afford to leave advisory, checked before the change lands instead of described in a procedure file. Same lightest-first instinct, with a "does this need teeth" column added.

Collapse
 
rapls profile image
Rapls

This is the axis my post is missing, and you named it well: reach is about talking to the outside, but suggestion versus guarantee is about whether the rule holds. A Skill teaches the steps and then trusts the model to follow them, which is fine for a changelog procedure or a style preference. It stops being fine when the rule is load-bearing, an invariant the generated code must never cross, because a Skill that gets dropped once is worse than no rule. You believed it was enforced, and it wasn't.

That's the real gap between "described in a procedure file" and "checked before the change lands." A hook can fail the action; a Skill can only ask nicely. I'd add your "does this need teeth" column to the table on the strength of that alone. For load-bearing invariants, lightest-first has to yield to enforceable-first.

Collapse
 
ngoclinh93qt profile image
ngoclinh93qt

The migration point I use is not "second external call" or "fifth external call"; it is whether the workflow needs durable state and a stable contract.

For a read-only lookup, a Skill wrapping a CLI is usually enough. I want the command, output, and failure mode to be visible in the transcript.

For a workflow where call 2 depends on call 1, where auth/session state matters, or where multiple tools need the same schema, MCP starts earning its cost.

One useful middle step is a tiny script that normalizes CLI output into a predictable JSON shape. That keeps the agent workflow cheap, while making the eventual MCP schema obvious if the pattern repeats.

Collapse
 
rapls profile image
Rapls

This is a cleaner statement of the rule than I managed in the post: durable state and a stable contract, not a call count. Counting external calls measured the wrong thing, and you named the right thing.

The normalize-to-JSON middle step is the part I want to underline. A tiny script that turns CLI output into a predictable shape keeps the agent workflow cheap, and it also makes the eventual MCP schema obvious if the pattern repeats. It turns the migration from a rewrite into a promotion of something you already shaped. That's the piece I should have had in the post, and I'm stealing the framing.

Wanting the command, output, and failure mode visible in the transcript is the other half I'd underline. That visibility is exactly what you give up when you jump to MCP too early, and for a read-only lookup it's worth keeping.

Collapse
 
pizza_cat profile image
Pizza Cat

Great framework. I follow a similar "lightest first" approach and it's saved me from over-engineering more times than I can count.
One thing I'd add: CLI tools have an underrated advantage in composability. An MCP server only works within Claude. A Skill only works within Claude Code. But a CLI works in any shell, any CI pipeline, any editor — and you can still wrap it in a Skill or MCP server if you need deeper integration later.
I built my own publishing CLI (content-bridge) exactly this way — CLI first, then a thin wrapper for the tools I use daily. The CLI is the contract; the wrapper is just ergonomics.
Curious — which type have you found best for tools you share with a team vs tools you keep for yourself?

Collapse
 
rapls profile image
Rapls

"The CLI is the contract; the wrapper is just ergonomics" is the part I wish I'd put in the post. Composability is the axis I underweighted. I was ranking by context cost, but you're pointing at portability: a CLI outlives the host it runs in, while a Skill or MCP server is only alive inside the tool that hosts it. Build the CLI and you can re-wrap it later for whatever comes next. The wrapper is disposable, the contract isn't.

On your question, the split fell out almost exactly along that line for me. Tools I keep for myself, I leave as a Skill calling a CLI and never bother wrapping further, because I'm the only caller and I already know the contract. Tools I share with a team, I push down to a plain CLI with a real interface, documented flags, stable output, exit codes that mean something. Reason being, a Skill or MCP wrapper smuggles in assumptions about how I work, and the moment someone else runs it those assumptions break silently. The CLI forces me to write the contract down, which is exactly what a second person needs and a solo me could skip. Shared tools earn the contract; private ones can stay ergonomics.

content-bridge sounds like the right shape. Is its interface stable enough that the daily wrapper almost never changes, or do you find the contract still shifting as you use it?

Collapse
 
chneg_cheng_64b33ab703938 profile image
chneg cheng

Great framework. I follow a similar "lightest first" approach and it's saved me from over-engineering more times than I can count.

One thing I'd add: CLI tools have an underrated advantage in composability. An MCP server only works within Claude. A Skill only works within Claude Code. But a CLI works in any shell, any CI pipeline, any editor — and you can still wrap it in a Skill or MCP server if you need deeper integration later.

I built my own publishing CLI (content-bridge) exactly this way — CLI first, then a thin wrapper for the tools I use daily. The CLI is the contract; the wrapper is just ergonomics.

Curious — which type have you found best for tools you share with a team vs tools you keep for yourself?

Collapse
 
rapls profile image
Rapls

The composability point is the one I'd underline too, and "the CLI is the contract, the wrapper is ergonomics" is a cleaner way to say it than I managed in the post. content-bridge sounds like exactly the right shape.

On shared vs personal, the split I've landed on isn't really about weight, it's about how stable the interface has to be. Anything I hand to other people, I push toward a CLI, because the value is that the contract doesn't move. People wire it into their own shells and pipelines, and a stable stdin/stdout/exit-code surface is what lets them depend on it without me being in the loop. The Skill or MCP wrapper, if it exists, is just the ergonomic layer on top for the people also inside Claude Code.

For tools I keep for myself, I go the other way and stay as light as the post argues. A Skill that only works inside Claude Code is fine, because I'm the only consumer and I can break it tomorrow and rebuild it without warning anyone. There's no contract to honor, so I don't pay the cost of one. The rule I'd phrase it as: the moment a second person depends on a tool, it earns a CLI; until then, the lightest thing that works is the right thing.