A few weeks ago I retired @flxbl-dev/mcp from the FLXBL repo. In its place: a CLI with --json, --stdin, --dry-run, stable exit codes, and an llms.txt. The agents — Claude Code, Cursor, my homegrown ones — got faster, cheaper, and far less weird. This is the post-mortem and the design rationale, with sources for every cited claim, because I respect your time and skepticism.
I was an MCP enthusiast. I wrote one of the first FLXBL MCP servers when the protocol was still warm. It exposed the obvious things — validate_schema, publish_schema_version, check_migration_status, generate_client_types — and for a while it felt like the future. Plug FLXBL into Claude Desktop, watch a model reason about your graph schema, ship a feature. Magic.
Then I started using my own product through the MCP, every day, with real agents in real coding loops. The seams showed up fast.
This post is about those seams — and why I think a well-designed, opinionated CLI is the better default surface for AI agents for most products. Not all. Most.
Why MCP looked like the right answer
Let me steel-man my own past decision. MCP solves a real problem: how do you let a model take meaningful actions against a remote system with structured inputs and outputs, without the model having to invent HTTP requests on the fly?
The protocol gives you a standard way to advertise tools, validate arguments against JSON Schema, and stream results back. It's elegant. It works. Anthropic, OpenAI, Google, and the IDE vendors have all converged on it. There's a reason every backend service is racing to ship one.
For a product like FLXBL — schema-first, multi-tenant, lots of structured operations — MCP felt like a perfect fit. So I shipped one. Then I lived with it.
What killed the MCP server (the four problems)
1. Tool definitions eat your context window before you say "hello"
This is the one where Anthropic's own engineering team finally said the quiet part out loud. In Code execution with MCP: building more efficient AI agents, they describe a real workflow that consumed roughly 150,000 tokens of context just on tool definitions and intermediate results — and showed how a code-execution-based pattern dropped that to about 2,000 tokens, a ~98.7% reduction. From Anthropic. About their own protocol.
I'm not picking a fight with MCP servers that expose two carefully-scoped tools. I'm pointing at what happens when you actually want an agent to do things with a real product — schemas, entities, relationships, webhooks, identity, data, GraphQL — and you naively map every operation to a tool. You end up with 30+ tool definitions front-loaded into every conversation, every turn.
Writer's engineering team published a great breakdown of the same problem (When too many tools become too much context) showing how tool metadata in production agents routinely consumes 25–30% of even a 200k-token context window. The New Stack covered the broader trend in 10 strategies to reduce MCP token bloat. Those tokens aren't free. You pay for them on every turn, and they crowd out the actual context — the user's intent, your schema, the conversation history.
A CLI doesn't have this problem. The agent runs flxbl --help once, or reads flxbl context --full --json once, and the rest is on demand.
2. Tool count degrades model accuracy, not just speed
Even if you can afford the tokens, more tools make the model worse at picking the right one. This is the well-documented "tool overload" problem — when an LLM's attention is spread across many similar-sounding tool schemas, it confuses parameters between them, hallucinates names, and calls the wrong tool with arguments lifted from another's schema. Industry write-ups like The MCP Tool Trap and the Lunar.dev tool-overload analysis both land on a practical ceiling somewhere between 15 and 20 tools per server before quality measurably drops.
FLXBL needs more than 20 verbs to be useful. So either I split into multiple MCP servers (more setup friction, more handshakes, the same tokens) or I accept that an agent driving my product through MCP is going to be measurably worse than an agent driving the same product through a single binary with a coherent verb structure.
3. Debuggability is brutal when something goes wrong
When a CLI fails, I get a non-zero exit code, a stderr message, and — if I designed it right — a structured JSON error I can pipe into jq. I can re-run the exact command. I can paste it into a terminal. I can hand it to a colleague.
When an MCP tool call fails, I get... whatever the agent decided to surface. The actual JSON-RPC frame, the request ID, the schema validation error — it's all behind layers of agent UI. Reproducing a failure means reconstructing what the agent sent, which is a guess at best.
I lost more time debugging "did the model call the tool wrong, or did the tool actually break?" than I want to admit. Every CLI-shaped failure mode is a problem I already know how to solve. Every MCP-shaped failure mode is a new one.
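Here's what that CLI-shaped failure loop looks like in plain POSIX shell. The failing command is a stand-in (a subshell that writes a structured error to stderr and exits 5, the not-found code), not a real flxbl invocation — the point is the shape: exit code in one variable, structured stderr in another, both ready to branch on or pipe into jq.

```bash
# Capture stderr and the exit code from a failing command.
# The { ...; } group simulates a CLI that emits a JSON error on stderr
# and exits 5; in real use this would be e.g. `flxbl entity get ...`.
status=0
err=$( { echo '{"code":"not_found","id":"prod_42"}' >&2; exit 5; } 2>&1 1>/dev/null ) || status=$?

echo "exit=$status"   # branch on the number
echo "stderr=$err"    # structured error, pipeable into jq
```

Re-running the exact failure is one shell history entry away, which is precisely what the MCP path never gave me.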
4. The MCP runtime is one more thing to host, monitor, and version
A CLI is a binary on npm. Users install it, pin a version, run it. The "deployment" is npm publish. Failures are local.
An MCP server is a long-lived process that needs to be launched, kept alive, possibly auth'd against a backend, possibly proxied through Docker, and version-coordinated with the client. For a hosted-only product that's fine. For a product that wants to live in CI pipelines, GitHub Actions, local dev loops, and agent shells, it's friction with no upside.
Why a CLI fits an LLM agent's hand
Here's the thesis: an LLM agent and a UNIX shell are shaped almost identically. Both consume text, produce text, run small programs, pipe outputs into other programs, and treat exit codes as signals. The 1978 Unix philosophy of small composable tools with clean stdin/stdout/stderr is, accidentally, the perfect interface for a 2026 transformer.
The Command Line Interface Guidelines — written for humans, before agents were a serious concern — read like a spec for LLM-friendly tooling. Composability, predictable output, machine-parseable formats, structured output behind a flag and pretty-printing when stdout is a terminal. Every guideline aimed at a careful human happens to also be a guideline for a careful agent.
A CLI gives you, for free:
| What you get | Why an agent loves it |
|---|---|
| Stable exit codes | Branching without parsing prose: `if exit == 4 { re-auth }` |
| JSON on stdout, errors on stderr | Pipe one command into the next without a parser-of-parsers |
| stdin payloads | No 4KB JSON arguments crammed into a tool-call schema |
| `--dry-run` | Agent can preview a destructive op and reason about it before sending |
| Bash composition | `flxbl schema diff --json \| …` |
You don't have to teach a model what `|` and `>` and `$?` mean. It already knows.
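To see how little new vocabulary is involved, here's the idiom with nothing FLXBL-specific at all — plain POSIX plumbing with a tool every model has seen a million times. grep exits 1 on "no match", and the branch reads the number, not the error text:

```bash
# Exit-code branching with stock tools: grep -q is silent and
# signals its result purely through $?.
if printf 'active\n' | grep -q inactive; then
  echo "match"
else
  echo "no match: branch taken on the exit code, not the prose"
fi
```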
What the FLXBL CLI actually looks like
The full reference lives in the docs and the LLM-Friendly CLI page, but here's the shape so you can judge the design.
Verb structure: flxbl <noun> <verb>. Nouns are the things in your tenant — schema, entity, relationship, webhook, team, role, access-key, identity, api. Verbs are what you do to them — list, get, create, patch, delete, validate, diff, versions. A handful of top-level verbs handle the agent loop itself: login, doctor, context, generate, pull, dev, data, graphql, whoami. That's it. An agent that has seen kubectl or gh or git can predict the rest.
Five affordances every command supports. This is the part I obsessed over, because it's where the LLM-friendliness actually lives:
```bash
# 1. --json: deterministic machine-readable output, every command, every time
flxbl schema show --json
flxbl entity list Product --where '{"status":"active"}' --fields id,name --json

# 2. --stdin: pipe payloads in, skip shell-escaping hell
cat new-schema.json | flxbl schema create --stdin --json

# 3. --dry-run: preview the mutation, see the diff, decide
flxbl schema migrate --file ./schema.yaml --dry-run --json

# 4. Single-command bootstrap for any agent starting cold
flxbl context --full --json
# Returns: tenant, active schema, generated endpoints, identity config,
# command inventory, exit-code reference. One call, full picture.

# 5. Stable exit codes, documented and unchanging
# 0 success | 1 general | 2 usage | 3 auth-required | 4 auth-expired
# 5 not-found | 6 validation | 7 breaking-schema-change | 8 rate-limit | 9 network
```
The exit codes deserve a paragraph. When an agent sees exit code 7, it doesn't have to read the error message to know "the schema migration would have broken existing clients." It can branch on the number, surface the diff to the user, and ask. That's the agent loop I wanted: deterministic at the boundary, conversational only when conversation is what's actually needed.
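That dispatch is trivial to sketch in shell. The `( exit 7 )` subshell here is a simulated flxbl call — a stand-in for something like `flxbl schema migrate` hitting a breaking change — because the branch only ever needs the number:

```bash
# Simulated CLI invocation: a subshell exiting with code 7
# (breaking-schema-change in the reference above).
code=0
( exit 7 ) || code=$?

# Agent-side dispatch keyed on the documented exit codes.
case "$code" in
  0)   echo "proceed" ;;
  3|4) echo "re-auth, then retry" ;;
  6)   echo "fix the payload and resubmit" ;;
  7)   echo "breaking change: surface the diff and ask the user" ;;
  8)   echo "back off and retry" ;;
  *)   echo "unexpected failure: show stderr" ;;
esac
```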
--dry-run is the one I'm proudest of. Every mutation has it. The agent can plan a sequence of changes, show me what each one would do, and only after I (or another agent) approves does it run for real. It turns a destructive surface into a reviewable one. Try doing that cleanly in MCP.
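From the agent's side, the plan-then-approve loop can look like this — a sketch only: the dry-run output is simulated inline, and the `breaking` field is my assumed shape for that output, not a documented contract. In practice you'd parse with jq; grep keeps the sketch dependency-free.

```bash
# Simulated output of: flxbl schema migrate --file ./schema.yaml --dry-run --json
plan='{"breaking": false, "changes": 2}'

if echo "$plan" | grep -q '"breaking": true'; then
  echo "breaking plan: stop and ask for approval"
else
  echo "non-breaking plan: apply"
  # flxbl schema migrate --file ./schema.yaml --json   # the real run goes here
fi
```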
The AI dev loop: CLI + Context7 + llms.txt
The CLI is half the story. The other half is making sure the agent knows what FLXBL is without me explaining it every session.
I publish three things specifically for agents:
flxbl context --full --json — for runtime knowledge. The current state of this tenant, this schema, these endpoints. It's what the agent runs first. It's the thing tool definitions in MCP were trying to be, except it's pulled on demand and it includes only what exists.
flxbl.dev/llms.txt — for static knowledge. A curated agent-readable map of the canonical FLXBL docs, in the proposed llms.txt convention. When an agent sees an unfamiliar product, this is the first thing it should fetch. Mine is small on purpose: a one-line product description, links to the philosophy page, the CLI reference, and the schema design guide. Nothing more. Curation matters more than completeness here.
Context7 — for indexed semantic knowledge. The full FLXBL docs are mirrored at context7.com/websites/flxbl_dev (currently 75,812 tokens across 464 snippets, trust score 4.9). Any agent with a Context7 client can pull current FLXBL knowledge into context — not the version baked into its training cut-off, but what's actually true today. The combination — CLI for actions, Context7 for reference, llms.txt for orientation — is what I'd call the "AI-friendly stack" for a developer-facing product. None of it requires running an MCP server.
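Put together, the cold-start sequence for an agent shell is short. A sketch under one stated assumption — the binary may not be installed yet — so the live call is guarded and the fallback mirrors the orientation path above:

```bash
# Bootstrap: actions come from the CLI; orientation from llms.txt / Context7.
if command -v flxbl >/dev/null 2>&1; then
  # Runtime state, pulled on demand (guarded in case auth is missing).
  flxbl context --full --json > context.json || echo "context failed: run flxbl login"
  echo "bootstrapped from live tenant state"
else
  echo "no flxbl binary: fall back to flxbl.dev/llms.txt and Context7 docs"
fi
```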
This is the part I find most interesting in retrospect. Replacing MCP didn't mean replacing structured-tool-use with chaos. It meant disaggregating the three things MCP was trying to bundle: knowledge of the product, operations against the product, and runtime state of my installation. Each of those wants a different shape. A CLI handles operations. Context7 handles knowledge. flxbl context handles runtime state. None of them needs a long-lived JSON-RPC server in between.
When MCP is still the right answer
I'm not telling anyone to delete their MCP server. There are real cases where MCP wins, and I want to be honest about them:
- Tightly scoped, semantically rich tool surfaces. A document-search tool with three parameters and a great schema is a better fit for MCP than for a CLI. The model gets schema-validated arguments and structured results without inventing flags.
- Hosted-only services with no install story. If your users will never run a binary locally — they'll only ever talk to your service through an agent — MCP's transport story is genuinely useful.
- Streaming, long-lived sessions. MCP's bidirectional nature pays off for tools that emit progress, intermediate results, or cancellation. CLIs can do this with streamed JSON, but it's clunkier.
- Discovery in unfamiliar environments. An agent that's never seen your product before benefits from MCP's `tools/list` more than from `--help`. (Though `flxbl context --full --json` is my counter-argument here.)
If you're reading this and thinking "but my product is one of those" — you're probably right. Ship the MCP. The point isn't that CLIs win every fight. It's that CLIs win the fights MCP servers are most often used for, and people don't realize it because the MCP gold rush is loud.
The bottom line
MCPs are great when you have one or two well-shaped tools, a hosted-only product, and an agent that needs schema-validated structured calls. They're a tax when you have a real product surface — dozens of operations, evolving schemas, and users who want to script things from CI.
For FLXBL, the rewrite was a clean win. The agents got measurably faster (no 30k-token tool preamble per turn), the failures got debuggable (exit codes, stderr, paste-able commands), and I got my Sundays back (no MCP server to babysit). The CLI does everything the MCP did, with --dry-run for safety and --json for composition, and it's also a perfectly normal CLI for humans who don't want an agent in the loop.
If you're building a developer-facing product and you're about to ship an MCP server — read Anthropic's own post on the token cost, read Simon Willison on the security model, then read clig.dev and ask yourself whether a tightly-designed CLI would do the same job for one-tenth the agent overhead and zero new infrastructure.
I think you'll come out where I came out.
Try it yourself
```bash
npm install -D @flxbl-dev/cli
flxbl login
flxbl context --full --json   # ← give this to your favourite coding agent
```
- Docs: flxbl.dev/docs
- LLM-Friendly CLI page: flxbl.dev/docs/llm-friendly-cli
- llms.txt: flxbl.dev/llms.txt
- On Context7: context7.com/websites/flxbl_dev
What's the most painful MCP server you've integrated against this year? And — counter-question — what's the case where MCP genuinely beats a CLI for your product? I want the disagreements as much as the agreements.
Marko Mijailović is the creator of FLXBL, a graph-based Backend-as-a-Service for multi-tenant applications with evolving schemas, relationship-heavy data, generated REST and GraphQL APIs, and an LLM-friendly CLI. You can find him on LinkedIn, reach out through email, or join the FLXBL Discord.