DEV Community

cucoleadan
cucoleadan

Posted on • Originally published at vibestacklab.substack.com on

When to Use MCPs, CLIs, or Your Own Tool

This post was originally published on my Substack publication as When to Use MCPs, CLIs, or Your Own Tool.

A while back, I wanted my AI agent to help manage my Asana tasks. Like anyone following the current agent meta, my first instinct was to plug in an Asana MCP server. Of course, this either flat-out broke or took an eternity to load a single task because the agent was trying to digest a massive, complicated integration.

Frustrated, I ripped the MCP out and installed a lightweight Asana CLI instead. It took a little bit of setup, but it worked. I took it one step further and created a custom skill teaching my agent exactly how to trigger those specific CLI commands. Checking my tasks went from a sluggish, bloated mess to happening instantly. I detailed this setup in my morning automation guide.

That experience explains why the default advice in agent-land right now, to connect every integration you can find and sort it out later, is a trap.

I get why people do it. Plug-and-play tools are everywhere right now. Every week another company ships one, another app exposes itself to AI, and another setup thread turns into a shopping list. An agent with more tools feels more capable the same way a dashboard with more widgets feels more complete.

The friction starts soon after. You notice the agent taking longer to think because it's trying to juggle too many complex instructions at once. Simple tasks start driving up your token costs. One tool fails with a timeout, another dumps a wall of messy data when you only needed a single sentence, and eventually, you lose track of what your own setup can do on its own.

A technical illustration of a decision framework comparing CLI, MCP, and custom tools for AI agents.

In my Hermes setup, I rely on three distinct patterns. A lightweight GitHub CLI handles my repository work because it's fast and focused. The Brave Search MCP handles broad web research. My custom OpenCode Cowork Proxy Worker exists because neither an off-the-shelf integration nor a basic command line was the right fit for routing Claude through OpenCode models.

There's a fine line between over-integrating and building everything yourself. I touched on this balance in my build vs buy scorecard. How do you know which type of tool fits which job? Read on to see how to decide.

TL;DR: When deciding how to connect your AI agent:

  • Use CLIs for local, internal tasks where speed matters and you own the credentials.
  • Use MCPs to cross boundaries into external SaaS systems where structured data and secure auth are required.
  • Build Custom Wrappers when you need translation, formatting, or a narrower interface than what off-the-shelf tools provide.

In this edition:

  1. Why MCP vs CLI is the wrong argument
  2. When a CLI is the better interface for Hermes
  3. When an MCP server earns its place
  4. When your own small tool beats both
  5. The 60-second test I use before adding a new tool

MCP vs CLI: Asking a Better Question

Most MCP vs CLI arguments sound cleaner than the real problem. People talk about protocols, tokens, and elegance. When you are in the middle of actual work you are usually trying to answer a simpler question. You want to know the least messy way to let your agent do this one job.

An MCP server gives an AI app a standard way to discover and call external tools. It exposes actions, inputs, and outputs in a format the model-facing app understands. Anthropic introduced MCP in November 2024 as an open standard for connecting AI assistants to data sources, business tools, content repositories, and developer environments.

A CLI gives the agent the same command-line tool a human developer would use. Think git, gh, docker, kubectl, wrangler, gws, or a tiny script you wrote for your own stack. The model writes commands, reads stdout or stderr, and adjusts from there.

Both let an agent act, but they package control differently. MCP gives the agent a typed menu of actions with structured inputs. CLI gives the agent a terminal surface with familiar commands and visible output.

The filter I use is simpler than the debate. Look at where the work happens, who owns the data, and what breaks when the agent gets it wrong. Use a CLI when the agent works as you inside your own workspace. Use an MCP server when the agent needs structured access to external systems or authenticated data. Build your own tool when MCP is too broad, CLI is too loose, or the workflow needs a narrow bridge between two systems.

The custom option matters more than people admit because many agent problems are shape problems, not model problems. A full protocol server or open shell gives the workflow too much room to drift. One small action with the right inputs, rejection rules, and clean output often fits better.

When to Start With CLI for Local AI Workflows

I start with CLI far more often than MCP. If Hermes works inside my own environment, on local files and repos, build commands, deployment checks, server diagnostics, GitHub tasks, or small scripts where I already know the command, the terminal is usually the right first stop.

A command-line tool has a structural advantage here. Most frontier models have years of examples for common command-line patterns. They know how git status behaves, how gh pr list --json returns fields, and how to trim output before the context window fills up.

Local work becomes easier to debug. When a CLI command fails, Hermes gets an exit code and an error message. I rerun the same command myself, copy it into a terminal, and see the failure without translating through a protocol layer.

Scalekit ran a useful benchmark on this in March 2026. They compared CLI, CLI plus skills, and GitHub's MCP server across 75 runs using the same model and the same GitHub tasks. In their test, CLI won on cost and reliability. CLI hit 100 percent reliability, MCP completed 72 percent of runs, and MCP used 4 to 32 times more tokens depending on the task.

I wouldn't stretch that benchmark into a universal law. It tells us something narrower and more actionable: schema weight is real. If the agent connects to a GitHub MCP server with dozens of available tools, it carries descriptions for actions it will never touch during a repo language lookup. A local gh command gives the answer with less ceremony.

StackOne makes the same split from an architecture angle. CLI fits local developer tools like Git, Docker, gh, Terraform, kubectl, and AWS CLI because these tools already have mature command cultures around them. The agent reuses patterns baked into the model and the docs instead of learning a strange new interface from scratch.

My Hermes setup leans on CLI for repo work. If I ask Hermes to clean up a branch, summarize open PRs, or check the status of a deploy, I want it using tools I run myself. I use gh for GitHub, wrangler for Cloudflare Workers, and gws for narrow Google Workspace experiments.

The trade-off is permission shape. Most CLI tools inherit local credentials. Hermes using gh after gh auth login acts with my GitHub access. That works for my own repo on my own machine, then breaks fast once a product needs to act across other users, accounts, or shared business systems.

One user on one machine inside one workspace is CLI territory. Many users across many accounts turns CLI into complex auth plumbing.

An architectural diagram comparing CLI local access versus MCP networked access for AI agents.

When to Reach For MCP At The Boundary of External Data

I don't reach for MCP first. I reach for it when the data lives somewhere else and I want Hermes to touch it without wandering around with raw shell access.

Search tools, shared SaaS systems, business databases, internal APIs, and third-party services need scoped auth, audit logs, and structured actions more than terminal speed.

A command-line tool assumes a person already logged in. That person owns the machine, the credentials, and the risk. An MCP server exposes a narrower set of actions to the agent with defined inputs and outputs. It gives the agent a tool boundary instead of raw shell access.

Anthropic's original MCP pitch makes sense through that lens. Every agent stack eventually hits the same wall: the model works well, but the data lives outside its reach. MCP gives AI systems a standard way to connect to those data sources without every app inventing its own format.

I use Brave Search via MCP because research is external, variable, and structured. I want Hermes calling a defined search action with a defined result format instead of guessing URLs or scraping pages with shell commands.

SaaS tools often fit the same pattern. If Hermes needs to read from Notion, Gmail, Linear, Slack, Greenhouse, or a database with scoped access, MCP is cleaner than a homemade CLI script. The farther the workflow moves from your own machine, the more identity matters.

Descope puts the identity question well: choose based on who the agent works for. If the agent acts as a solo developer inside their own workflow, CLI is enough. If the agent acts across customer data, employee accounts, partner systems, or shared business tools, auth becomes the primary concern.

At that point, you care about scopes, consent, logs, tenant boundaries, and revocation. One ambient shell token doing everything in the background becomes a liability, even if it feels faster during local tests.

Every MCP server still has to earn its place. A bloated server slows the agent down, a badly designed one returns excess data, and a broad action list hands the agent more control than the task needs. The best MCP servers feel boring: a small list of tools, clear input fields, tight output, and an auth model that matches the risk.

You want a clean tool drawer, not a giant toy box.

Build The Bridge Yourself

This sounds like extra work at first. Then you try to force a bad fit through MCP or CLI for hours and realize the small custom tool would've been the simpler path all along.

By small tool, I mean a tiny adapter, wrapper, Worker, script, webhook, or endpoint that does one job in the exact shape your workflow needs.

I used this lane for my OpenCode Cowork Proxy Worker. Claude Code speaks Anthropic's API format. OpenCode Go and Zen models mostly use OpenAI-compatible routes. I wanted Claude Code and Claude Cowork as the interface, with OpenCode as the model layer. A generic MCP server or raw CLI would've made the flow messier.

The workflow lacked translation, so I built a Cloudflare Worker that sits in the middle. Claude sends an Anthropic-style request. The Worker rewrites it for OpenCode. The response returns in the format Claude expects. That is a custom tool doing its job by removing ambiguity.

I wrote the full setup in How to Use Claude Code For Free With OpenCode Models. For this article, the decision matters more than the proxy details. When the workflow needs a translation layer, build the translation layer.

Safer wrappers around risky commands follow the same pattern. Say Hermes needs to deploy a project. One path gives it raw shell access and asks it to remember the right sequence. The better path gives it one command:

deploy-preview --project yahini
Enter fullscreen mode Exit fullscreen mode

That command runs checks, prints the diff, refuses production deploys without a flag, and outputs a clear summary. Hermes gets one safe action instead of an open-ended terminal adventure.

Task tools work the same way. My Asana setup has three possible shapes: raw API calls, MCP, or a small wrapper:

asana-task create --project hermes --title "Research MCP auth tradeoffs" --due tomorrow
Enter fullscreen mode Exit fullscreen mode

That wrapper hides the noisy parts. Hermes gets the project, title, and due date without carrying the project GID, JSON payload shape, or field rules in every prompt. The tool encodes the boring decisions once.

An engineering diagram showing an AI agent workflow using custom tools and approval gates.

Custom tools pay off when the existing interface adds ambiguity. A translator fixes format mismatch, a filter trims redundant output, and a validator stops bad inputs before they reach the real system. The common thread is narrower access to the machinery underneath.

Approval gates fit on top of this lane. Your custom tool prepares a draft, validates inputs, or creates a preview. The final send, publish, delete, deploy, or purchase still pauses for review. I covered the safety layer in How to Add Approval Gates to Your Hermes Agent, and it pairs well with custom tools because the interface and approval rule solve different problems.

A custom tool gives Hermes the right action. An approval gate decides whether Hermes gets to complete it alone.

The Rule I Use

Before giving Hermes a new way to act, I sort the workflow into one of three lanes. The categories are plain enough to use while building, which matters more to me than making the taxonomy perfect.

Use CLI for local work. Choose CLI when the tool is mature, the docs are everywhere, the output is controllable, and Hermes is acting inside your own environment. Good fits include GitHub PR summaries through gh, Cloudflare Worker deploy checks through wrangler, local file operations, build commands, server diagnostics, and one-off scripts. If I would run the command myself in a terminal, and the worst mistake affects my own workspace, CLI is the starting point.

Use MCP for structured external systems. Choose MCP when Hermes needs a defined tool boundary, scoped auth, a remote data source, or runtime tool discovery. Search, Google Drive, Slack, Gmail, CRM data, ATS data, internal databases, and shared business tools fit this lane when permissions and structure matter. If the agent touches data outside your own local workspace, MCP deserves a look.

Build your own tool when the job is narrow. Choose a custom tool when the problem is translation, filtering, validation, or repeatability. API format translation, safer deploy wrappers, task helpers, memory update commands, webhook receivers, cronjob helpers, and scripts compressing risky command chains into one reviewed action fit here. If you keep writing long prompts to make the agent use a tool in the same careful way, that prompt wants to become a tool.

Concrete Hermes examples make the rule easier to apply. GitHub PR cleanup goes through CLI because gh is mature and easy to inspect. Competitive research goes through MCP because search needs structured external results. Morning briefings use connectors or MCP for sources like Gmail and calendars, then a prompt or custom formatter turns those inputs into the briefing structure I covered in my Hermes morning briefing article.

For higher-risk work, I mix lanes. Production deploys should use CLI wrapped in a custom command, plus an approval gate before production. Claude Code to OpenCode routing belongs in a custom Worker. Project memory updates from research should use a custom command or proposed-change format, then pause for review before permanent memory changes.

That last one matters because wrong memory is worse than no memory. If Hermes reads a weak article and updates project memory with a sloppy summary, I pay for that mistake later. A custom "propose memory update" tool is safer than letting the agent edit memory directly.

60-Second Tool Test

Before adding a new MCP server or writing a wrapper, run this test. It takes about a minute, and it saves an afternoon of cleanup.

1. Check for a mature CLI. If the tool has a strong CLI, structured output flags, and common examples in the docs, start there. The agent gets a smaller surface to reason through, and you get commands worth replaying.

2. Check whether the agent acts only as you. If Hermes works inside your own machine, repo, or server, CLI works well. Slow down when the workflow crosses into shared systems.

3. Check whether auth shape matters. MCP moves up the list when you need scopes, consent, tenant boundaries, or audit logs. Local credentials are convenient until the agent needs to act inside a shared business system.

4. Check whether the MCP is too broad. If a server exposes fifty actions and your workflow needs two, consider a custom wrapper or filtered gateway. A smaller interface beats a bigger config when the task has a narrow shape.

5. Check whether a small tool would remove repeated prompting. If your instruction keeps repeating the same safety rules and formatting rules, build a tool that enforces the shape. Repeated prompting points to an interface problem.

After the test, the answer usually sorts itself. CLI handles local work, MCP handles structured external systems, and your own tool handles narrow bridges, translations, and repeatable actions. Approval gates sit on top of all three when the action is expensive, destructive, external-facing, or hard to undo.

This is the shift I wrote about in The Agentic Engineering Shift. The work is moving from asking the model better to designing the system around the model better. Tool choice is part of that system.

More Control, Less Clutter

A well-designed agent stack earns trust when the agent knows what to do, shows what it did, and uses the smallest interface that fits the work. MCP hype tends to blur that distinction because installing another server feels like progress.

A systems design diagram sorting AI tasks into CLI, MCP, and custom tool hierarchies.

MCP is useful. CLI is underrated. Your own small tools save you from both when the workflow has a shape neither one matches.

Start by auditing one workflow. Pick the last task you gave Hermes that involved a tool and ask which lane it belonged in. If the task was local, try CLI. If it crossed into shared systems, look at MCP. If you kept explaining the same careful sequence over and over, build the tiny tool. Use an agent you trust because every interface has a reason to exist.

Top comments (0)