I was building a plugin release with Claude Code, and the changelog draft came together nicely. Pull git log from the last tag to now, drop it unde...
For further actions, you may consider blocking this person and/or reporting abuse
when the tool needs to touch anything external or stateful, lightest-first reverses on you. a CLI that silently fails mid-procedure is harder to debug than an MCP that surfaces the failure cleanly. I would rather front-load the complexity.
That's a fair pushback, and I think we're optimizing for two different axes. I was arguing on context cost. You're arguing on failure observability, and on that axis you're right: a CLI that dies mid-procedure with a swallowed exit code is genuinely worse to debug than an MCP that surfaces the error in a structured way.
Where I'd still hold the line is the trigger for front-loading. I'd make it the failure characteristics, not the tool layer. A CLI you call from a Skill can be wrapped to fail loud (check the exit code, surface stderr, stop the procedure), and once you do that, the cheap-context path is still on the table. But for something stateful where a partial failure leaves the world half-changed, I'm with you, the clean failure surface earns its weight, and lightest-first should bend.
Maybe the rule is "lightest first, until the cost of a silent partial failure outweighs the context savings." That moves the decision onto the thing you're actually worried about.
failure observability is the harder axis to retrofit - context cost recovers, a swallowed exit at step 3 of a stateful write doesn't. what's your front-load trigger: state type, or error class?
You're right that observability doesn't retrofit. Context cost is a dial you can turn back later; a swallowed exit at step 3 of a stateful write is a hole you don't find until something's already half-written. They're not symmetric.
To your either/or: my trigger is state type first, error class second, because the two aren't independent. What makes a failure unrecoverable isn't the error itself, it's whether the operation mutated shared state before it died. A silent failure in a read is an annoyance; the same silent failure mid-write is a half-changed world. So I'd gate on reversibility: does a partial failure leave something committed that I can't cleanly roll back. If yes, that's where I front-load the structured-failure tool, regardless of error class. Error class only changes how loud the failure is; state type decides how much it costs. I optimize for the cost.
Put plainly: stateless or idempotent, stay light and let it fail loud. Mutating and non-idempotent, front-load the clean failure surface. The write is the thing that earns the complexity, not the external call.
state type first is right. most error classes look recoverable until you ask whether anything has been written - at that point the error taxonomy is a distraction and you are just triaging state.
"You're just triaging state" is the right place to land this. That's the sentence I'll keep. Error taxonomy feels like the question because it's visible and structured, but it's downstream of the only thing that actually matters: did this touch state I can't take back. Answer that first and the error class is just detail about a problem you've already classified.
Which loops back to the original rule in a way I like. "Lightest first" was never really about the tool, it was about reversibility all along. Stay light where a failure is recoverable, front-load structure where it isn't. The whole thread basically refactored my context-cost argument into a state-recoverability one, and the second framing is better. Good exchange.
The "nested, not a flat choice" framing is the piece I keep coming back to. Most comparison posts treat them like three columns in a table — which misses the point entirely.
The changelog example lands well because it's the exact reason I reach for a Skill first too: if the knowledge is static, there's no reason to involve MCP plumbing. But the tricky part (and the one I don't have a clean answer for yet) is when a Skill starts as static and gradually needs live data. At what point do you refactor — when the second external call appears, or the fifth?
Your lightest-first rule feels right, but the migration path between layers is where I keep hesitating.
The "three columns in a table" trap is exactly what I was trying to get out of. Once you see the layers as nested, the question stops being "which is best" and becomes "which layer does this problem live in," and that reframing is most of the work.
On the migration point, the one you keep hesitating at: I'd stop counting calls. Second versus fifth measures the wrong thing. The trigger I'd use now is whether the workflow needs durable state and a stable contract: does call 2 depend on call 1, does auth or session carry across, do several tools need the same schema. A static Skill that grows three independent read-only fetches is still a Skill. The day one fetch has to feed the next, that's the refactor signal, regardless of the count.
One middle step makes the move painless: wrap the CLI output in a tiny script that normalizes it into a predictable JSON shape. That keeps the Skill cheap now, and if the pattern repeats, that shape is already the blueprint for the MCP schema. The migration becomes a promotion of something you shaped, not a rewrite.
Yo that CLI normalization tip you dropped? Tried it. Works. Running before scheming is so much nicer than MCP upfront. And vice versa too — when MCP gets annoying to touch, CLI fallback > rewrite any day.
Made my day that you actually ran it. "Running before scheming" is a better tagline for the whole post than the one I used.
And the reverse you found is the half I undersold: CLI fallback over rewrite. We treat the migration as one-directional, Skill grows up into MCP, but it goes both ways. When an MCP gets annoying to touch, dropping back to a CLI call is almost always cheaper than maintaining the server. Lightest-first isn't just where you start, it's where you return to when the heavy thing stops paying for itself. Thanks for closing that loop.
Lightest-first is the rule I wish I'd started with. A CLI you can read in a minute beats an MCP server you end up debugging at 2am. I only reach for the heavier thing once the light one clearly can't do the job.
The 2am-debugging line is exactly it. Readability is a cost too, and the light tool wins on that axis before you even count tokens. A CLI you can read in a minute is one you can also fix in a minute, half-asleep. I'd only add your own caveat back at you: "clearly can't do the job" is the right trigger, and the honest version of it is stateful work, where one call has to feed the next. Until then, the thing you can read wins.
That stateful-work line is the cleanest version of the rule I've seen. The minute step N has to remember step N-1, the light tool stops being light and I reach for the heavier one. Short of that, the thing I can read wins.
The "lightest first" rule hit home. We've been running into the exact same trap — reaching for MCP every time something needs to reach outside, when a CLI call from a Skill would do the job at a fraction of the context cost. That 49% MCP tool definitions stat is brutal but real.
The nesting model (Skill → CLI → MCP → Plugin) is the clearest explanation I've seen. The way you framed it — "recipe card vs plumbing vs stocked kitchen" — makes it stick. How do you handle the case where a CLI exists but its output needs structured parsing before it's useful? Do you pipe it through a small script in the Skill, or does that push it into MCP territory?
Pipe it through a small script in the Skill, every time. A CLI whose output needs parsing doesn't push it into MCP territory on its own. I wrap the command in a tiny script that normalizes the output into a predictable JSON shape, and the Skill works against that shape instead of raw text. Context stays cheap, and the agent gets a stable contract without a server riding every turn.
The thing that would push it to MCP isn't the parsing, it's session state: the day call 2 depends on call 1, or auth has to carry across. Output shape and durable state are different problems, and only the second one earns MCP's weight.
One bonus: that normalize script is also your migration path. If the pattern repeats and you do move to MCP later, the JSON shape you already settled on is basically the schema. The move becomes a promotion of something you built, not a rewrite.
The nested mental model is spot on. Biggest mistake devs make is reaching for MCP when a skill file with a shell command would do the same thing in 2 lines. MCP makes sense for persistent state or auth across sessions, but for one-off data fetches its pure overhead. Lightest first saves a lot of debugging time down the road.
The 2-lines-vs-MCP gap is exactly the trap. A shell command in a Skill file does the job and costs almost nothing at startup, while the MCP rides every turn whether you touch it or not. And you put the boundary in one line better than my whole post did: persistent state or cross-session auth is what MCP is for, and a one-off fetch is pure overhead. That's the split I'd lead with if I rewrote it.
the
allowed-toolssection is the one most people misread. 'does not restrict which tools can run — marks listed ones to skip confirmation' means it is not a security boundary. teams find that out when the skill runs something off the list they assumed was blocked.PreToolUsehooks on Bash are the actual guardrails;allowed-toolsis just UX.on Mykola's point: lightest first and 'surfaces failures cleanly' aren't in tension. a Skill can wrap a CLI call with explicit error handling. the real split is stateful vs stateless: if one call's output needs to influence the next in the same session, MCP earns the context cost. pure reads that reset each run, CLI wins.
what does your deep WordPress dashboard integration look like: MCP over REST API, or a custom
wp-cliwrapper?You're right to flag allowed-tools, and harder than I put it: it's UX, not a security boundary. PreToolUse hooks on Bash are the actual guardrail, because a hook can fail the action while allowed-tools only skips a confirmation. Something you truly want blocked isn't stopped by leaving it off the list, and that's exactly where teams get surprised. I should have drawn that line sharper in the post.
On the dashboard: it's a wp-cli wrapper, not MCP over REST. The reads reset each run and I touch it occasionally, so a Skill calling wp-cli through Bash stays cheaper, and I normalize the output into a small JSON shape so the agent gets a stable contract without a server riding every turn. If it ever grows session state, that JSON shape is also my migration path into MCP.
And your stateful-versus-stateless split is the line I'd draw now too. Call count was the wrong thing to measure. The day call 2 depends on call 1, MCP earns its cost.
Your lightest-first axis is reach: how much does the thing need to talk to the outside. There's a second axis the table doesn't have, and it changes some of these calls: suggestion versus guarantee. A Skill teaches Claude the steps but can't make Claude follow them, it's advice the model can drop under load or in a long context. For most things that's fine, a style preference, a changelog procedure. It stops being fine when the rule is load-bearing, an architectural invariant the generated code must never cross, because a Skill that gets ignored once is worse than no rule, you believe it's enforced and it isn't. That's where hooks earn their weight: a hook can fail the action, a Skill can only ask nicely. We've been building Mneme around that gap, the rules a codebase can't afford to leave advisory, checked before the change lands instead of described in a procedure file. Same lightest-first instinct, with a "does this need teeth" column added.
This is the axis my post is missing, and you named it well: reach is about talking to the outside, but suggestion versus guarantee is about whether the rule holds. A Skill teaches the steps and then trusts the model to follow them, which is fine for a changelog procedure or a style preference. It stops being fine when the rule is load-bearing, an invariant the generated code must never cross, because a Skill that gets dropped once is worse than no rule. You believed it was enforced, and it wasn't.
That's the real gap between "described in a procedure file" and "checked before the change lands." A hook can fail the action; a Skill can only ask nicely. I'd add your "does this need teeth" column to the table on the strength of that alone. For load-bearing invariants, lightest-first has to yield to enforceable-first.
The migration point I use is not "second external call" or "fifth external call"; it is whether the workflow needs durable state and a stable contract.
For a read-only lookup, a Skill wrapping a CLI is usually enough. I want the command, output, and failure mode to be visible in the transcript.
For a workflow where call 2 depends on call 1, where auth/session state matters, or where multiple tools need the same schema, MCP starts earning its cost.
One useful middle step is a tiny script that normalizes CLI output into a predictable JSON shape. That keeps the agent workflow cheap, while making the eventual MCP schema obvious if the pattern repeats.
This is a cleaner statement of the rule than I managed in the post: durable state and a stable contract, not a call count. Counting external calls measured the wrong thing, and you named the right thing.
The normalize-to-JSON middle step is the part I want to underline. A tiny script that turns CLI output into a predictable shape keeps the agent workflow cheap, and it also makes the eventual MCP schema obvious if the pattern repeats. It turns the migration from a rewrite into a promotion of something you already shaped. That's the piece I should have had in the post, and I'm stealing the framing.
Wanting the command, output, and failure mode visible in the transcript is the other half I'd underline. That visibility is exactly what you give up when you jump to MCP too early, and for a read-only lookup it's worth keeping.
Great framework. I follow a similar "lightest first" approach and it's saved me from over-engineering more times than I can count.
One thing I'd add: CLI tools have an underrated advantage in composability. An MCP server only works within Claude. A Skill only works within Claude Code. But a CLI works in any shell, any CI pipeline, any editor — and you can still wrap it in a Skill or MCP server if you need deeper integration later.
I built my own publishing CLI (content-bridge) exactly this way — CLI first, then a thin wrapper for the tools I use daily. The CLI is the contract; the wrapper is just ergonomics.
Curious — which type have you found best for tools you share with a team vs tools you keep for yourself?
"The CLI is the contract; the wrapper is just ergonomics" is the part I wish I'd put in the post. Composability is the axis I underweighted. I was ranking by context cost, but you're pointing at portability: a CLI outlives the host it runs in, while a Skill or MCP server is only alive inside the tool that hosts it. Build the CLI and you can re-wrap it later for whatever comes next. The wrapper is disposable, the contract isn't.
On your question, the split fell out almost exactly along that line for me. Tools I keep for myself, I leave as a Skill calling a CLI and never bother wrapping further, because I'm the only caller and I already know the contract. Tools I share with a team, I push down to a plain CLI with a real interface, documented flags, stable output, exit codes that mean something. Reason being, a Skill or MCP wrapper smuggles in assumptions about how I work, and the moment someone else runs it those assumptions break silently. The CLI forces me to write the contract down, which is exactly what a second person needs and a solo me could skip. Shared tools earn the contract; private ones can stay ergonomics.
content-bridge sounds like the right shape. Is its interface stable enough that the daily wrapper almost never changes, or do you find the contract still shifting as you use it?
Great framework. I follow a similar "lightest first" approach and it's saved me from over-engineering more times than I can count.
One thing I'd add: CLI tools have an underrated advantage in composability. An MCP server only works within Claude. A Skill only works within Claude Code. But a CLI works in any shell, any CI pipeline, any editor — and you can still wrap it in a Skill or MCP server if you need deeper integration later.
I built my own publishing CLI (content-bridge) exactly this way — CLI first, then a thin wrapper for the tools I use daily. The CLI is the contract; the wrapper is just ergonomics.
Curious — which type have you found best for tools you share with a team vs tools you keep for yourself?
The composability point is the one I'd underline too, and "the CLI is the contract, the wrapper is ergonomics" is a cleaner way to say it than I managed in the post. content-bridge sounds like exactly the right shape.
On shared vs personal, the split I've landed on isn't really about weight, it's about how stable the interface has to be. Anything I hand to other people, I push toward a CLI, because the value is that the contract doesn't move. People wire it into their own shells and pipelines, and a stable stdin/stdout/exit-code surface is what lets them depend on it without me being in the loop. The Skill or MCP wrapper, if it exists, is just the ergonomic layer on top for the people also inside Claude Code.
For tools I keep for myself, I go the other way and stay as light as the post argues. A Skill that only works inside Claude Code is fine, because I'm the only consumer and I can break it tomorrow and rebuild it without warning anyone. There's no contract to honor, so I don't pay the cost of one. The rule I'd phrase it as: the moment a second person depends on a tool, it earns a CLI; until then, the lightest thing that works is the right thing.
Lightest first is a good default. If a checklist plus a CLI command solves the workflow, jumping straight to MCP or a plugin can add maintenance before there is enough proof.
The "before there is enough proof" part is the one I'd underline. MCP and Plugins are machinery, and machinery is worth building for a pattern you've proven you'll repeat, not for one you're guessing at. A checklist plus a CLI command costs almost nothing to keep, so it's the right place to sit until the workflow proves it needs more. Build the heavy thing once the repetition shows up, not before.
Exactly. The lightweight version is a cheap test of whether the workflow deserves machinery. If the same checklist keeps coming back, then a CLI or MCP has evidence behind it. Otherwise it is just architecture before the habit exists.
"A cheap test of whether the workflow deserves machinery" is the framing I'll keep. The lightweight version isn't a compromise you settle for, it's the experiment that earns the heavy version, and the repetition is the evidence. "Architecture before the habit exists" is the exact failure mode, building the structure for a pattern you imagined instead of one you observed. Light first, and let the habit prove itself before you pour concrete. Good exchange.
This “lightest first” rule is probably the most practical way to think about it.
Most teams jump to MCP too early. If it is just procedure, use a Skill. If it is occasional tool access, call a CLI. MCP should be for repeated, stateful, or deeper integrations.
You compressed the whole post into four lines, and the last one is the key: MCP for repeated, stateful, or deeper integrations. Procedure goes to a Skill, occasional access to a CLI, and the day the work needs durable state is the day MCP earns its cost. "Most teams jump to MCP too early" is the exact trap, and it usually happens because the heavy tool is the one that catches your eye first. Reach for it last, not first.