Ryan Palo for Daily Context

Posted on Jul 3

Choosing the Right Tooling Layer for Your Agent

#aie #agents #ai

AI Engineer World's Fair Coverage

Selecting the right abstraction layer is not a new problem in software. It usually takes some experimental restructuring to find the balance between being abstract enough to consolidate the duplication that belongs together and going too far, making users jump through hoops to use your overly abstracted MiddlewareManagerAbstractFactoryProvider. There are entire books and undergraduate courses on exactly this problem: what level of abstraction is enough?

We now have an analogous problem in how we help our AI agents perform tasks. Do we need an MCP for that capability? A skill? Will a simple CLI tool work? The answer depends on a few questions.

Note: For more context, check out Nikita Kothari's AIE World's Fair talk from Thursday, "MCPs, CLIs, and Skills: Choosing the Right Tooling Layer for Agentic Development."

All the MCPs: The Super-Agent Approach

One option would be to load every possible MCP you might ever need access to at startup. This has the benefit of discoverability. If you're not sure what the agent might need to do and you want it to be as unattended as possible, giving access to all of those tool descriptions allows it to be flexible and combine sources and tools as needed. It also allows it to branch out and parallelize, gathering data from multiple sources at the same time.

There's a sneaky trap here, though: MCP server schemas may gobble up your context window. Nikita references example numbers where the MCP schemas for 55 tools from five MCP servers could take up about 16,500 tokens that live in your context constantly.
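To make that tax concrete, here is a quick back-of-the-envelope calculation using the numbers above. The 200,000-token context window is an assumption picked purely for illustration; substitute whatever your model actually offers.

```python
# Figures quoted from the talk: 55 tools across five MCP servers,
# costing about 16,500 tokens of schema.
TOOLS = 55
SERVERS = 5
SCHEMA_TOKENS = 16_500

# Hypothetical context window size, chosen only for illustration.
CONTEXT_WINDOW = 200_000

tokens_per_tool = SCHEMA_TOKENS / TOOLS            # average schema cost per tool
tokens_per_server = SCHEMA_TOKENS / SERVERS        # average schema cost per server
share_of_window = SCHEMA_TOKENS / CONTEXT_WINDOW   # fraction of the window spent on schemas

print(f"{tokens_per_tool:.0f} tokens per tool on average")
print(f"{tokens_per_server:.0f} tokens per server on average")
print(f"{share_of_window:.2%} of a {CONTEXT_WINDOW:,}-token window")
```

That works out to roughly 300 tokens per tool, paid on every turn whether or not the agent calls any of them.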

Skills and CLIs: The Developer's Approach

This is the full opposite end of the spectrum from the previous approach: make your agent aware of any skills that it needs and use those to instruct it on which CLI tools to use and how. Context- and speed-wise, CLI tools have a huge advantage over MCP tools. There's no extra network call to the MCP server, they eat next to no context, and the output of a CLI tool is reasonably predictable. Additionally, if the agent is running locally, you get the benefit of being familiar with its environment and being able to step in and help it along if things get stuck.

The drawback here is security. CLI tools are among the highest-risk tools an agent can use because, by definition, if it can access a CLI tool, it can access the shell itself. The risk is manageable with careful sandboxing and attention to permissions. You also have to be cognizant of the credential issues you may be signing on for: which credential files the agent has access to, and whether the roles for those credentials are appropriate for its actions.
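One way to narrow that shell access is to route the agent's commands through a small allowlisting wrapper instead of a raw shell. The sketch below is a minimal illustration, not a complete sandbox: the `ALLOWED` table, `is_allowed`, and `run_tool` are hypothetical names, and a real policy would also need to validate flags, paths, and the environment the process runs in.

```python
import shlex
import subprocess

# Hypothetical allowlist: program name -> subcommands the agent may run.
ALLOWED = {
    "git": {"status", "diff", "log"},
}


def is_allowed(command: str) -> bool:
    """Return True only if the program and its first argument are allowlisted."""
    try:
        argv = shlex.split(command)
    except ValueError:
        # Unbalanced quotes and similar parse errors are rejected outright.
        return False
    if len(argv) < 2 or argv[0] not in ALLOWED:
        return False
    return argv[1] in ALLOWED[argv[0]]


def run_tool(command: str, timeout: float = 10.0) -> str:
    """Run an allowlisted command without a shell and return its stdout."""
    if not is_allowed(command):
        raise PermissionError(f"command not allowlisted: {command!r}")
    # Passing a list (no shell=True) means no shell ever interprets
    # characters such as ;, | or &&. They reach the program as plain arguments.
    result = subprocess.run(
        shlex.split(command),
        capture_output=True,
        text=True,
        timeout=timeout,
    )
    return result.stdout
```

With this in place, `run_tool("git status")` runs normally, while `run_tool("rm -rf /")` or `run_tool("git push")` raises `PermissionError` before any process is started.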

Why Not Both: The Hybrid Approach

If you have enough MCP servers and tools that their schemas eat a problematic share of your context, you can combine both of the above: bundle your MCP tool usage into skills to avoid the token bloat. Modern clients don't necessarily bundle the entire schema into the context if it's large enough. They do a mixture of caching and compaction/indexing so that they inject just the bare minimum and then load the actual tool schema as needed. If you wrap your process in a skill that describes exactly what tools it needs and how to use them, your client can unpack just the schemas for those tools more fully, saving you context and a bit of load time. This is often the best choice if it warrants the little bit of extra configuration work.
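The idea of injecting a bare minimum and loading full schemas on demand can be sketched roughly as follows. This is a toy model, not how any particular client implements it: the tool definitions, `routing_view`, `schemas_for_skill`, and `rough_size` are all hypothetical, and character count stands in very crudely for token count.

```python
import json

# Hypothetical tool definitions, made up for illustration.
FULL_SCHEMAS = {
    "search_issues": {
        "description": "Search issues by text query.",
        "inputSchema": {
            "type": "object",
            "properties": {
                "query": {"type": "string", "description": "Free-text search."},
                "limit": {"type": "integer", "description": "Maximum results."},
            },
            "required": ["query"],
        },
    },
    "create_issue": {
        "description": "Create a new issue.",
        "inputSchema": {
            "type": "object",
            "properties": {
                "title": {"type": "string", "description": "Issue title."},
                "body": {"type": "string", "description": "Issue body text."},
            },
            "required": ["title"],
        },
    },
    "list_repos": {
        "description": "List repositories for an owner.",
        "inputSchema": {
            "type": "object",
            "properties": {
                "owner": {"type": "string", "description": "Account name."},
            },
            "required": ["owner"],
        },
    },
}


def routing_view(schemas):
    """Small always-loaded index: tool name -> one-line description."""
    return {name: spec["description"] for name, spec in schemas.items()}


def schemas_for_skill(schemas, tool_names):
    """Full schemas for only the tools that a skill names."""
    return {name: schemas[name] for name in tool_names}


def rough_size(obj):
    """Character count of the JSON form; a crude proxy for token cost."""
    return len(json.dumps(obj))


index = routing_view(FULL_SCHEMAS)
triage_tools = schemas_for_skill(FULL_SCHEMAS, ["search_issues"])

print("all schemas up front:", rough_size(FULL_SCHEMAS))
print("index only:", rough_size(index))
print("index + one skill's tools:", rough_size(index) + rough_size(triage_tools))
```

With only three tools the difference is modest; the point is that the always-loaded index grows by one short line per tool, while the full schemas are paid for only when a skill actually asks for them.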

It Depends

So which one is the right choice? We started with a storied computer science question, and we're ending with a storied computer science answer: it depends. It depends on what the environment is, what kind of tasks you're trying to accomplish, whether there are CLI tools for the functionality you need, and what your organizational auth/governance requirements are. MCPs are definitely more secure and more governable. CLIs are definitely faster and more reliable. The answer probably lies somewhere in the middle for your use case. The main takeaway here may actually be that the ability to write and maintain really good Skills is a huge advantage.

If you want to do more research into this topic, the Firecrawl blog has a much more comprehensive deep dive into the differences, benefits, and drawbacks of each approach.

Top comments (6)

Armorer Labs • Jul 3

The useful distinction for me is less "which layer is best" and more "which layer owns the operational contract." MCP, a skill file, and a CLI can all expose capability, but they should not all be trusted to carry the same safety semantics.

For agent tools that can mutate state, spend money, touch customer data, or trigger retries, I would want the runtime layer to own a few boring fields regardless of the surface: normalized input schema, side-effect class, approval policy, idempotency key, timeout/budget, provenance of inputs, and a receipt that says what actually happened. The model-facing layer can stay lightweight and ergonomic, but the operational layer needs enough structure that an operator can inspect or replay the run later.

That also helps with the context-tax problem you describe. The agent does not need every full tool schema in prompt context all the time; it needs a small routing view first, then the runtime can reveal or bind the exact contract only when the candidate tool is actually in play.

Disclosure: I work on Armorer Labs.

CAI • Jul 10

The token bloat point is real. Once you cross about 10 MCP servers, the schema footprint becomes the thing that dictates whether your context window is usable or not. Nikita's 16,500 token example is not even worst case if you load servers with very large schemas.

The hybrid approach he mentions is where this gets interesting. Bundling MCP tools into skills so the client only loads schemas on demand is a good pattern, but it relies on the client actually supporting that kind of lazy loading. Not all do yet.

One angle that does not get enough attention is tool discovery latency. MCP adds a network roundtrip for every tool call, which matters when your agent is doing dozens of small operations in a loop. CLI tools win on latency even if the expressiveness is lower. For agents that need to iterate fast, that difference adds up.

The security tradeoff is also worth being explicit about. MCP servers run as separate processes with their own permission surface, which is easier to sandbox than giving an agent arbitrary shell access through CLI tools. If your org already has auth infrastructure for web APIs, MCP slots in more naturally than trying to lock down which CLI flags an agent is allowed to invoke.

Choosing the layer is not just a context size question. It is also a latency, security, and operability question, and those answer differently depending on whether the agent runs locally or in a hosted environment.

Mykola Kondratiuk • Jul 5

the analogy holds but with one difference - devs can work around bad abstractions, agents mostly can not. wrong boundary compounds across every call. the cost of getting this wrong is higher than with regular software.

Nazar Boyko • Jul 3

That 16,500 tokens sitting in context permanently for 55 tools is the number that reframes the whole thing for me. It turns "which layer is nicer" into a running tax you pay on every single turn whether the agent touches those tools or not. One angle the hybrid section hints at but doesn't say outright: the CLI security worry and the MCP context worry pull in opposite directions, so the real design question is often "how do I get CLI speed without handing over the whole shell," and a narrow wrapper script the agent calls instead of raw bash access covers a surprising amount of that. Good breakdown of a tradeoff people mostly discover by running out of context.

Raju Dandigam • Jul 3

The framing here is useful because it treats MCPs, skills, and CLIs as different tradeoffs instead of pretending one layer should win every time. The context tax from loading too many tool schemas up front is real, and I like that you make it concrete rather than hand-wavy. In practice I’ve found the sweet spot is often skills plus focused CLIs by default, then MCP only where you truly need live structured access and a richer contract. That same tradeoff is why I care so much about execution visibility in tools like agent-inspect: once an agent has multiple layers available, you need to see which path it chose and why. Curious whether you think teams should optimize first for agent flexibility or for keeping the default tool surface aggressively small.
