Phil Rentier Digital

Posted on • Originally published at rentierdigital.xyz

Add a CLI to Your App or Watch Claude Code Ping You on Every Feature

CLI is the new MCP. Slogan aside: a CLI is superpowers handed to Claude, Codex, and every agent that codes for you. Letting coding LLMs verify their own work programmatically gives your app an unfair advantage over classic full-stack apps that didn't ship that surface.

CLI is the new full stack.

TLDR: Two apps, same modern stack, a 1.8x gap in commits shipped over 30 days. The gap doesn't come from the AI, the framework, or the backend, but from a layer that "stack 2026" guides forgot, and that your scripts/ folder won't replace.

The Friday night before I left for the Costa Brava, I wanted to ship one small thing before closing the laptop (without my wife yelling at me).

Two Claude Code windows side by side. Same prompt to each. Two different apps, same stack at 95%.

By the time I shut the laptop, one app had shipped a feature in three autonomous iterations. The other had me clicking in the admin like a junior trying to debug prod from a phone. Same agent, same me, one folder of difference.

Same Stack, One Folder of Difference

First app, left window. Claude Code writes the mutation, types a command into its terminal, reads the JSON that comes back, spots that one field is wrongly cast, fixes it, retypes. Three iterations in total autonomy. I come back, say “have a good weekend”. The commit is ready, validated at 100%.

Second app, right window. Claude Code writes the mutation, then stops. It pings me: "Can you open the admin and check that this works?" I click. I refresh, then re-click. Re-ping. My wife starts raising her voice. Whatever, we'll see Monday. Otherwise I would have spent the entire evening being the mouse of an agent that was supposed to code in my place.

Same stack, same me, same Claude Code on both. The only difference fits in one folder.

The two apps are mine. One is a back-office that syncs a WooCommerce catalog through a partner API every night, plus a weekly CSV feed from a distributor. The other is a back-office piloting a network of WooCommerce client e-shops (deploys, theme updates, plugin sync, the usual fleet thing). Both were built over the last six months. Both run Next.js, Convex, shadcn, Vercel. Same CLAUDE.md, same conventions.

One has a CLI as its central layer. The other has a scripts/ folder.

That's it. That's the whole gap.

By "CLI" I mean a real entrypoint with sub-commands named after business actions (catalog refresh, partner sync, site init), wired into the exact same business layer that the dashboard uses. You type bun run cli partner sync --dry-run, and the same code path that runs when an admin clicks "Sync" runs, except it returns JSON to stdout.
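
A minimal sketch of such an entrypoint, under stated assumptions: syncPartner, the registry, and the JSON envelope are hypothetical names for illustration, and the handler is kept synchronous for brevity where a real one would likely be async.

```typescript
// Hypothetical business function. In a real app the dashboard imports it too.
type SyncResult = { synced: number; dryRun: boolean };

function syncPartner(opts: { dryRun: boolean }): SyncResult {
  // Placeholder logic; a real implementation would call the partner API.
  return { synced: opts.dryRun ? 0 : 3, dryRun: opts.dryRun };
}

type Handler = (flags: string[]) => unknown;

// Sub-commands are keyed by business action, e.g. "partner sync".
const commands = new Map<string, Handler>([
  ["partner sync", (flags) => syncPartner({ dryRun: flags.includes("--dry-run") })],
]);

// Returns what the process would print plus its exit code, which keeps it testable.
function runCli(argv: string[]): { exitCode: number; stdout: string } {
  const name = argv.slice(0, 2).join(" ");
  const handler = commands.get(name);
  if (!handler) {
    return { exitCode: 2, stdout: JSON.stringify({ ok: false, error: `unknown command: ${name}` }) };
  }
  const data = handler(argv.slice(2));
  return { exitCode: 0, stdout: JSON.stringify({ ok: true, command: name, data }) };
}

console.log(runCli(["partner", "sync", "--dry-run"]).stdout);
```

In a real entrypoint the argv would come from the process arguments and the exit code would be handed back to the shell; both are left out here so the sketch stays self-contained.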

The other app has none of that. Just .mjs files with names like fix-thing-2025-08.mjs (admit it, you have a folder like that too). Each one written to "pass once". Most of them never ran a second time.

That's the entire difference. And it changed how the agent worked at every level.

30 Days of Commits Don't Lie: 1.8x More Features, Half the Fixes

I went back through the git history of both repos over the same 30-day window in May.

The CLI-app shipped 272 commits. The scripts-app shipped 150. That's a 1.8x ratio, on the same me, same agent, same daily routine.

Inside the CLI repo, every single sub-command got touched at least once during the window. 100%. Inside the scripts/ folder, only 29% of the files saw any activity. The rest were dormant. 41% of all the script files in the scripts-app had been written, run once, and never opened again. The oldest one I found that fits this profile hadn't been touched in 57 days. I had completely forgotten it existed until I went looking.

There's one more number that's interesting, but I want to flag it as a hypothesis, not as proof. Looking at commit messages tagged fix: versus feat:, the CLI-app had a fix-to-feat ratio of 0.44. The scripts-app sat at 0.82. Roughly twice as many fix commits per feature on the side without a CLI.
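
For anyone who wants to reproduce this kind of count, here is a rough sketch. It assumes commit subjects follow the fix: / feat: prefix convention and that you feed it one subject per entry (for example from git log --format=%s over your window); the function name and sample subjects are made up.

```typescript
// Counts "fix:" and "feat:" prefixes (with optional scope) in commit subjects.
function fixToFeatRatio(subjects: string[]): { fix: number; feat: number; ratio: number } {
  const count = (type: string): number =>
    subjects.filter((s) => new RegExp(`^${type}(\\([^)]*\\))?!?:`).test(s)).length;
  const fix = count("fix");
  const feat = count("feat");
  // NaN signals "no feature commits", so the ratio is undefined rather than zero.
  return { fix, feat, ratio: feat === 0 ? NaN : fix / feat };
}

// Made-up sample, only to show the shape of the input.
const sample = [
  "feat: add partner sync",
  "fix: cast price as number",
  "feat(cli): add --limit",
  "chore: bump deps",
];
console.log(fixToFeatRatio(sample)); // fix = 1, feat = 2, ratio = 0.5

// The figures quoted in this post, recomputed:
console.log((272 / 150).toFixed(2)); // "1.81" commits ratio
console.log((0.82 / 0.44).toFixed(2)); // "1.86" fix-to-feat gap
```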

I can't prove the CLI causes that gap. The two apps have different domain maturity, different complexity, different coverage of edge cases. Half the difference might come from the fact that the back-office for client sites is simply older and more fiddly than the partner API one. But the gap is consistent with what I observe daily, and it tracks the autonomy gap I described in the intro: the agent ships cleaner work when it has a way to verify itself, so fewer regressions sneak through.

The orphan rate (41% versus 0%) and the velocity gap (1.8x) aren't hypotheses. Those I can read straight from git log.

The Real Mechanism: Agents Need a Text-Structured Surface

The reason the CLI-app produces autonomous iterations while the scripts-app produces ping-fests has nothing to do with code quality or model size.

It's about the surface the agent has to validate its own work.

Think about what Claude Code actually does in a feature loop. It writes code, then it needs to know if the code does what it was supposed to do. If the only way to check is "open the dashboard, click around, look at the screen", the agent can't do it. Browsers return DOM. DOM without a human eye to interpret what's rendered is opaque to an agent. The colors, the loading states, the modal that popped up, the validation message at the bottom, all of it is meaningful to a person and noise to a model. The agent has no ground truth, so it stops and asks you.

A CLI returns text. JSON, structured stdout, exit codes. Things an agent can read, parse, reason about. The agent runs the command, reads the output, sees that partnerStatus: "rejected" means the mutation didn't go through, fixes the code, runs again. No human in the loop. The feedback signal is natively legible to the model.
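
To make that loop concrete, here is a small sketch of how one run could be interpreted from the agent's side. The partnerStatus field comes from the example above; the function name, the verdict shape, and the rule that anything other than "rejected" counts as healthy are assumptions.

```typescript
type Verdict = { done: boolean; reason: string };

// Interprets one CLI run: exit code first, then the JSON payload.
function interpretRun(exitCode: number, stdout: string): Verdict {
  if (exitCode !== 0) return { done: false, reason: `non-zero exit code ${exitCode}` };
  let payload: unknown;
  try {
    payload = JSON.parse(stdout);
  } catch {
    return { done: false, reason: "stdout is not valid JSON" };
  }
  const status = (payload as { partnerStatus?: unknown } | null)?.partnerStatus;
  if (status === "rejected") return { done: false, reason: "partner rejected the mutation" };
  return { done: true, reason: "output looks healthy" };
}

console.log(interpretRun(0, '{"partnerStatus":"rejected"}'));
```

Every branch returns a reason as plain text, which is the part a model can act on without a human reading a screen.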

That's the whole principle. A text-structured surface gives you an autonomous agent. A DOM-only surface gives you an agent that pings you on every iteration.

This is also why MCP servers, REST APIs, tRPC endpoints, GraphQL all work for agents calling your service. They're all text-structured surfaces. A CLI is just the simplest, most local incarnation of this principle for the agent that's coding your own app. Not calling a remote service. Writing code in your repo and needing to test it now.

You can simulate this with Playwright pointed at your dashboard. People do. It works, sort of. It also costs a 10x slowdown, a flaky retry layer, and a screenshot-comparison step that breaks every time you ship a UI change. A CLI returns the same answer in milliseconds with no flakiness, because text was always the thing the agent wanted in the first place.

The 2026 Stack Forgot a Layer (And Every AI Code Gen Tool Skips It Too)

The Four-Layer Development Stack: Three Bright, One Forgotten

Go read any "best stack to launch your AI-coded tool" guide written between February and April 2026. KDnuggets, Idlen, Context Studios, MindStudio, you can pick one at random. They all converge on the same six or seven layers. Next.js for the frontend. shadcn for the UI kit. Supabase or Convex for the backend. Clerk for auth. Stripe for payments. Resend for transactional email. Vercel for hosting. Some add Tailwind, OpenAI, Claude, Gemini.

There are at least 50 of those guides published in the last three months. None of them mentions a CLI for your app.

Same blind spot on the AI side. Cursor, v0, Bolt, Lovable, Claude Code itself when it scaffolds a new project. All of them generate a frontend, a backend, a hosting config. Zero of them generate a CLI as a first-class layer. If you ask Claude Code to "set up a Next.js app with Convex and Stripe", you'll get those three things and nothing else. The CLI, if any, will appear later as scaffolding (next dev, convex dev) and that's it.

This wasn't a problem in 2020. In 2020, you wrote your own code, and your IDE was your feedback loop. F5, F12, console.log, console.log, console.log. The DOM was fine because you were the one reading it.

In 2026, you're not the one writing most of the code. The agent is. And the agent doesn't have eyes.

A 2026 stack with no CLI layer forces the agent to depend on you for every iteration. The agent writes a mutation, you click in the admin, you tell the agent if it worked. The agent writes a sync job, you tail -f the logs, you tell the agent what you saw. Every feature loop has you as the mandatory middle node. You think you're prompting an agent to ship for you, but you're actually playing browser intermediary for it.

The fourth layer follows from one fact: if you want the agent to ship autonomously, you need to give it a surface it can read.

Idlen's piece argues that picking the wrong backend means rewriting your data models at 2am. Yeah, and it's worse if you don't have a CLI, because you're rewriting them by hand instead of running bun run cli model migrate.

Why Scripts Rot and CLIs Live

The 41% orphan rate doesn't come from laziness. It comes from the fact that a scripts/ folder doesn't ask anything of you architecturally.

You write scripts/migrate-orders-2025-04.mjs because you have an emergency. You run it once. It works. You commit it (or you don't, depending on how panicked you were). Three weeks later, another emergency. You write scripts/migrate-orders-fix.mjs. Same problem, slightly different name. You don't reuse the first one because you don't remember it exists. There's no scripts/ --help. There's just an ls.

The whole folder ends up like Karen from Accounting's filing cabinet: technically organized, practically unusable. Everything is "there", nobody knows where, even Karen has stopped looking.

A CLI forces a different shape. You can't add partner sync as a sub-command without registering it in the entrypoint, which means you see all the other sub-commands every time you add a new one. Discoverability is built into the tool. New sub-commands inherit the same flags (--dry-run, --limit, --verbose), the same logger, the same error handling. Idempotence becomes easy because you're already passing through a shared business layer that the dashboard also uses.
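
The shared-flags idea can be sketched as one parser that every sub-command goes through. The flag names are the ones listed above; the function name, the SharedFlags type, and the rule that --limit must be a non-negative integer are assumptions.

```typescript
type SharedFlags = { dryRun: boolean; verbose: boolean; limit?: number };

// Parses the flags every sub-command inherits; anything else is returned in `rest`.
function parseSharedFlags(argv: string[]): { flags: SharedFlags; rest: string[] } {
  const flags: SharedFlags = { dryRun: false, verbose: false };
  const rest: string[] = [];
  for (let i = 0; i < argv.length; i++) {
    const arg = argv[i];
    if (arg === "--dry-run") flags.dryRun = true;
    else if (arg === "--verbose") flags.verbose = true;
    else if (arg === "--limit" || arg.startsWith("--limit=")) {
      // Accepts both "--limit 10" and "--limit=10".
      const raw = arg === "--limit" ? argv[++i] : arg.slice("--limit=".length);
      const value = Number(raw);
      if (raw === undefined || raw === "" || !Number.isInteger(value) || value < 0) {
        throw new Error("--limit expects a non-negative integer");
      }
      flags.limit = value;
    } else rest.push(arg);
  }
  return { flags, rest };
}

console.log(parseSharedFlags(["--dry-run", "--limit", "5", "shop.example.com"]));
```

Libraries such as cac or citty handle this for you; the hand-rolled version is only here to show that the parsing lives in one place instead of in every script.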

That's why the touched-rate sits at 100% on the CLI side. I'm not more disciplined when I use a CLI. The CLI is just architecturally hostile to throwaway code in a way scripts/ never is.

And --help is doing more than helping you. It's the entrypoint for any agent that lands on your repo. Claude Code types bun run cli --help once and now knows every business action it can trigger, with its flags and its description. No prompt engineering, no doc to feed. The CLI documents itself, to humans and to agents at the same time. That's what scripts/ will never give you, no matter how clean your filenames are.

Caveat I should put right here, while I'm bragging. My own CLI has a real weakness. Out of 14 sub-commands, 11 have no description in the --help output. That's 79% of my commands appearing as bare names with no explanation. The CLI forced execution discipline. It did not force documentation discipline. Claude Code can still discover every command, parse the JSON output, and use it. A junior dev opening the repo for the first time would have to read the source. I'm fixing it slowly, but the lesson stands: the architecture solves the running problem, not the teaching problem. You still have to write the docstrings.
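
A sketch of how that documentation gap could be made visible, assuming a simple registry of command names with optional descriptions. The type, both functions, and the sample description are hypothetical.

```typescript
type Command = { name: string; description?: string };

// Renders a --help style listing; commands without a description are marked.
function renderHelp(commands: Command[]): string {
  const width = Math.max(0, ...commands.map((c) => c.name.length));
  return commands
    .map((c) => `  ${c.name.padEnd(width)}  ${c.description?.trim() || "(no description)"}`)
    .join("\n");
}

// Share of commands that show up in --help as bare names.
function undocumentedShare(commands: Command[]): number {
  if (commands.length === 0) return 0;
  const missing = commands.filter((c) => !c.description?.trim()).length;
  return missing / commands.length;
}

const registry: Command[] = [
  { name: "catalog refresh", description: "Refresh the catalog from the partner API" },
  { name: "partner sync" },
  { name: "site init" },
];
console.log(renderHelp(registry));
console.log(Math.round((11 / 14) * 100)); // 79, the share quoted above
```

A check like undocumentedShare could run in CI and fail above a threshold, which would turn the docstring problem into something the architecture enforces too.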

Your App Is Already Agentic by Accident

The thing nobody tells you in the stack-2026 guides: a CLI that shares the business layer with your UI makes your app natively agent-ready. Not as a separate product. As a free side effect.

Three concrete ways to expose your CLI to an agent that isn't sitting in your IDE.

Wrap it as an MCP server. Maybe 50 lines of TypeScript. You write a thin MCP server that registers each sub-command of your CLI as an MCP tool. The tool input maps to CLI flags. The tool output is the JSON the CLI already returns. Boom, any MCP client (Claude Desktop, Cursor, anything that speaks MCP) can call your CLI as a native tool. You wrapped your existing CLI and called it an MCP server.

Cron plus agent. A scheduler runs bun run cli catalog refresh every six hours. The JSON output goes into a Convex table. A background agent reads the latest row, decides if the refresh hit a partner error, and if so triggers a follow-up bun run cli partner reconnect. No browser. No human. The agent makes decisions based on text the CLI emits, then triggers more CLI commands. You just turned your back-office into a self-healing loop.

HTTP gateway shell-out. You expose a tiny Express or Hono endpoint that takes a CLI command name plus args, shells out to the CLI, returns the JSON. Authenticated of course. Now any external agent that speaks HTTP can drive your app. No SDK to maintain. The CLI is the SDK.

None of those three asks for a refactor of your business logic. They're pure exposure layers on top of code you already wrote. One stack, two modes: dashboard for humans, CLI for agents. The dashboard didn't know it had a twin. Now it does.
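
The mapping step shared by the MCP wrapper and the HTTP gateway can be sketched without any SDK. The descriptor below is modeled on the name / description / inputSchema shape of an MCP tool; the CommandSpec type, both functions, and the partnerSync spec are hypothetical, and the actual server wiring and process spawning are left out.

```typescript
type FlagSpec = { type: "boolean" | "number" | "string"; description?: string };
type CommandSpec = { name: string; description: string; flags: Record<string, FlagSpec> };

type ToolDescriptor = {
  name: string;
  description: string;
  inputSchema: { type: "object"; properties: Record<string, FlagSpec> };
};

// One CLI sub-command becomes one tool descriptor.
function toTool(cmd: CommandSpec): ToolDescriptor {
  return {
    name: cmd.name.replace(/\s+/g, "_"),
    description: cmd.description,
    inputSchema: { type: "object", properties: cmd.flags },
  };
}

// Turns tool input into an argv array; unknown or mistyped flags are rejected.
function toArgv(cmd: CommandSpec, input: Record<string, unknown>): string[] {
  const argv = cmd.name.split(/\s+/);
  for (const [key, value] of Object.entries(input)) {
    if (!Object.prototype.hasOwnProperty.call(cmd.flags, key)) throw new Error(`unknown flag: ${key}`);
    const spec = cmd.flags[key];
    if (typeof value !== spec.type) throw new Error(`flag ${key} expects ${spec.type}`);
    if (spec.type === "boolean") {
      if (value) argv.push(`--${key}`);
    } else {
      argv.push(`--${key}`, String(value));
    }
  }
  return argv;
}

const partnerSync: CommandSpec = {
  name: "partner sync",
  description: "Sync the partner catalog",
  flags: { "dry-run": { type: "boolean" }, limit: { type: "number" } },
};
console.log(toTool(partnerSync).name, toArgv(partnerSync, { "dry-run": true, limit: 10 }));
```

Returning an argv array rather than a command string matters for the gateway case: passing arguments as a list to the process API avoids building a shell line out of external input.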

The Three Integration Patterns (Pick One, Pick Right)

If you're going to bake a CLI into your stack, there are three ways to wire it. Only one of them gives you the autonomy gap I described earlier.

Pattern 1: CLI shares the business layer with the UI. The dashboard "Sync partner" button calls a Convex mutation. The CLI partner sync command calls the same mutation, with the same schema and the same TypeScript types end-to-end. Same idempotence guarantees. Same audit log. This is the one you want. Everything I've been describing assumes this pattern. (Convex pairs particularly well with Claude Code for this exact setup, because the typed end-to-end API makes the CLI a thin wrapper around mutations rather than a parallel implementation of them.)

Pattern 2: CLI as HTTP client of your own API. The CLI calls your REST or tRPC endpoints. Easier to isolate, language-agnostic, you can ship the CLI to clients who don't run your monorepo. But you lose the typing benefits, you have to handle auth manually, and idempotence is up to whoever wrote the endpoint. Acceptable as a fallback if your backend is in a different repo than your CLI consumer. Not optimal.

Pattern 3: DevOps CLI, separate from the app. Deploy commands, backup scripts, monitoring tools. Useful, but it's not a substitute. If your app is live, you also need Pattern 1 or 2 alongside it. Pattern 3 alone is what most teams ship and what gets confused for "we have a CLI". It's just a deploy script.

Verdict: Pattern 1 is the only one that delivers the velocity gap. Pattern 2 is half the work for a fraction of the benefit. Pattern 3 is hosting plumbing dressed up as a CLI.

If you can only build one pattern, build the first.
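
Pattern 1 can be sketched as one business function with two thin adapters. Everything here is a stand-in: the real thing would be a Convex mutation, and syncPartner, the audit log shape, and the placeholder count are invented for illustration.

```typescript
type AuditEntry = { action: string; source: "dashboard" | "cli"; dryRun: boolean };
const auditLog: AuditEntry[] = [];

// The single business function. Both callers go through it, so the audit
// log and the dry-run behavior cannot drift apart.
function syncPartner(source: AuditEntry["source"], dryRun: boolean): { updated: number } {
  auditLog.push({ action: "partner.sync", source, dryRun });
  return { updated: dryRun ? 0 : 3 }; // placeholder count
}

// Dashboard adapter: what the "Sync partner" button handler would call.
function onSyncButtonClick(): { updated: number } {
  return syncPartner("dashboard", false);
}

// CLI adapter: same function, result serialized as JSON for stdout.
function cliPartnerSync(flags: string[]): string {
  return JSON.stringify(syncPartner("cli", flags.includes("--dry-run")));
}

onSyncButtonClick();
console.log(cliPartnerSync(["--dry-run"]), auditLog);
```

The adapters contain no business rules at all, which is the point: a fix to syncPartner lands in the dashboard and in the CLI at the same time.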

Tooling: cac vs citty vs the Rest in 2026

Quick rundown of what's actually worth using to build the CLI itself, since this is where most people get stuck for a weekend.

cac is my default. About 2 KB, zero dependencies, ESM-first. If your CLI has fewer than 20 sub-commands, this is the right tool. Small enough that you don't think about it, and Claude Code generates clean cac code on the first try.

citty from the UnJS folks is the rising pick for 2026. Type-safe, lazy-loading sub-commands (matters when you start hitting 30+), ESM-first, plays nicely with Nitro and the rest of the UnJS world. Migrate to it when your CLI grows past where cac feels cramped.

commander is the legacy mature option. Stable, well-documented, will do the job, but the API feels older and the bundle is heavier than it needs to be. Choose it only if your team already knows it.

clipanion is OOP-flavored, used by Yarn. Good if you like classes and want strict typing. Niche.

oclif is over-architected unless your CLI itself is the product (think Heroku, Salesforce). For a CLI that supports an app, oclif is bringing a forklift to move a couch.

For the rest of the experience, you want clack for prompts (gorgeous TUI, very recent), picocolors for colors (smaller and faster than chalk now), consola for logging, listr2 if you have multi-step tasks with progress bars, and bun shell or zx for embedded scripts.

Start on cac. Migrate to citty when you cross 20 sub-commands.

Don't overthink it.

When the Missing CLI Hurts (Four Scenarios)

Four moments where the absence of a CLI costs you specifically, in case the abstract argument hasn't landed.

Onboarding a new client e-shop. Without a CLI, each new client is two to three hours of clicking in the admin: provision domain, set theme, install plugins, seed catalog, configure DNS. Multiply by ten clients in a month. With a CLI, site init shop.example.com runs the whole sequence in five minutes. The agent can run it on its own when a Stripe webhook fires "new customer".

Recurring data fix. A partner sometimes returns malformed prices in their API. Without a CLI, every incident means rewriting the fix mutation by hand, or digging through scripts/ to find "the one that worked last time". With a CLI, you have bun run cli prices reconcile --dry-run, idempotent, versioned, documented in --help. The agent invokes it itself when the alert fires.

Audit during incident. Something broke in prod, you need to know which orders were affected. Without a CLI, you grep through scripts/ for "that audit thing I wrote in March". With a CLI, cli orders audit --since=2026-04-01 exists, is documented, and the agent can run it while you're still typing in Slack.

External data refresh. Cron has to refresh a partner catalog every night. Without a CLI, the cron points to node scripts/old-thing.mjs and the file slowly drifts out of sync with the schema, until one Tuesday it fails silently for 48 hours before someone notices. With a CLI, the cron points to bun run cli partner refresh, which shares the same business layer as the dashboard, so a schema change breaks the cron at the next deploy instead of in the middle of the night.

Same four problems. The CLI makes each one boring.
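
Taking the recurring data fix as an example, here is a sketch of what an idempotent command with --dry-run could look like. The malformed format (a string with a decimal comma) is an assumption, as are the function name and the row shape.

```typescript
type PriceRow = { sku: string; price: number | string };
type Change = { sku: string; from: number | string; to: number };

// Normalizes malformed prices (assumed here: strings like "12,50") into numbers.
// With dryRun the planned changes are reported but nothing is mutated.
function reconcilePrices(rows: PriceRow[], opts: { dryRun: boolean }): Change[] {
  const changes: Change[] = [];
  for (const row of rows) {
    if (typeof row.price === "number") continue; // already clean, nothing to do
    if (row.price.trim() === "") continue; // empty value: leave it for a human
    const parsed = Number(row.price.replace(",", "."));
    if (Number.isNaN(parsed)) continue; // unparseable: leave it for a human
    changes.push({ sku: row.sku, from: row.price, to: parsed });
    if (!opts.dryRun) row.price = parsed;
  }
  return changes;
}

const rows: PriceRow[] = [
  { sku: "A1", price: "12,50" },
  { sku: "B2", price: 9.99 },
];
console.log(JSON.stringify(reconcilePrices(rows, { dryRun: true })));
```

Running it for real and then running it again returns an empty list the second time, which is what lets an agent invoke it on every alert without worrying about double-applying a fix.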

The 30-Second Test Your Stack Has to Pass Today

Open your terminal. cd into your repo. Type bun run cli --help (or yarn cli --help, or npm run cli -- --help, whatever your package manager).

There are exactly three possible outcomes.

Outcome A. Nothing comes out. Or "command not found". Or package.json doesn't have a cli script. You don't have a CLI. You have a UI bolted onto a backend. The agent that codes your app depends on you for every iteration, and the orphan rate of your scripts/ folder is climbing slowly toward 41% whether you measure it or not.

Outcome B. A list shows up, but the sub-commands are generic devops things (build, dev, test, deploy) with no business actions. You have devops scaffolding. Useful, but the agent can deploy your code without being able to validate that a feature works. You're at Pattern 3 of the three patterns above. Half the journey.

Outcome C. A list shows up with sub-commands named after business actions (site init, partner sync, catalog refresh), each with a description. You have a 2026-ready stack. The agent that writes your code has a way to verify it. Your scripts/ folder is empty or has fewer than five files. You can stop reading.
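
The three outcomes can be written down as a tiny classifier. It is a simplification: it only looks at sub-command names, takes the devops verbs from the list above, and ignores the description requirement of Outcome C. The function name and input shape are made up.

```typescript
type Outcome = "A" | "B" | "C";

// Generic tooling verbs from Outcome B; extend the list for your own repo.
const DEVOPS_VERBS = new Set(["build", "dev", "test", "deploy"]);

// Pass null when `cli --help` fails, otherwise the sub-command names it lists.
function classifyCli(subCommands: string[] | null): Outcome {
  if (!subCommands || subCommands.length === 0) return "A";
  const business = subCommands.filter((name) => !DEVOPS_VERBS.has(name));
  return business.length === 0 ? "B" : "C";
}

console.log(classifyCli(null)); // "A"
console.log(classifyCli(["build", "deploy"])); // "B"
console.log(classifyCli(["site init", "partner sync", "deploy"])); // "C"
```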

If you got A or B, this is where you start. Pick one or two business actions you do most often (the ones that show up in your scripts/ folder under three different names), and make them the first two sub-commands of a real CLI. Wire them through the same business layer the dashboard uses. Make the output JSON-shaped. That's the smallest possible Pattern 1, and it'll change how Claude Code works on your repo by tomorrow morning.

I already wrote about CLIs as the interface for agents calling your tools. This one is about CLIs as the interface for the agent writing your code from the inside. Different problem, same layer. The first one is about MCP versus CLI as a remote calling convention. This one is about whether the agent in your IDE has a way to ship.

The next time you start an app, decide on day one whether the CLI is a kernel or an afterthought. That choice decides how much time Claude Code spends coding for you, versus how much time you spend being the mouse of Claude Code.


Six months building two apps in parallel and I didn't realize I was running a controlled experiment on myself. As an old Linux head, I already knew CLI beats most things, intuitively. What I didn't see coming was the part that mattered most: not just speed and scriptability, but giving the agent a feedback loop it could read on its own.

Claude and Codex won't suggest this by default. So tell them yourself: bake a CLI layer as the kernel, day one.

I'm out, piña colada's waiting 😎

CLI was the layer the whole time.

Sources

  • KDnuggets, Tech Stack for Vibe Coding Modern Applications (February 2026)
  • Idlen, The Best Stack to Launch Your AI-Coded Tool in 2026 (April 2026)
  • Context Studios, The Perfect Vibe Coding Tech Stack 2026: 10 Tools Every App Needs (February 2026)
  • First-hand audit, two of my own apps, 30 days of git history (May 2026)
