Austin Vance for Focused

Posted on • Originally published at focused.io

MCP Is Packaging. Agent-Operable Interfaces Are the Product | Focused Labs

MCP packages tools, but the real product is the narrow, typed, auditable interface an agent can actually operate.

Austin Vance, CEO of Focused

MCP is not the hard part.

The hard part is designing a system that an agent can actually operate, as opposed to guessing at, wandering through, or mangling. The protocol is the distribution, not the architecture.

This distinction matters. Every enterprise AI conversation I’ve had eventually boils down to the same situation: we have a model, we have a workflow, and we have a tangle of internal tools designed for humans to drive through a web interface at human speeds. Then comes the question: “should we build an MCP server to handle all of this?”

Fine. But for what?

The Model Context Protocol makes it easy for applications to expose tools and context to models. That’s useful, and I’m not opposing MCP. I’m opposing using the protocol to dress up a useless shortcut as something useful.

Harrison Chase broke down the lock-in problem well: switching model providers is easy, switching harnesses is less so, and model providers want to lock teams in through the harness. The harness is where the agent learns about the actions in an application, the state, the model’s memory, what can be retried, what needs approval, and what telemetry gets written down.

But then there is the interface below the harness, which gets little recognition.

A bad interface can turn an excellent harness into a nightmare. A good interface can make even a mediocre harness workable.

This is why “just build an MCP server” isn’t the entire answer. An MCP server can expose a messy action. It can wrap a sharp one. But deciding which actions exist in the first place is up to the team, and that is a design and experience problem, not an engineering one.

Teams build integrations for internal agents by wrapping existing APIs, which are often structured to hide awkward frontend decisions, like why an API returns an object nested inside an object inside another object. An endpoint might update state as a side effect because it backs an admin screen. Errors come back as human-readable prose, permissions are implicit, pagination parameters are opaque, there is no dry run, and there are no idempotency keys. The verb the business actually needs, “approve this one invoice after policy rules apply,” doesn’t exist, so the agent gets handed updateInvoice instead. Stricter prompts don’t fix that.

Welcome to production.

After reading yet another question about whether a given subsystem has an MCP server, I paused to ask whether I was missing something. The question shouldn’t be “does it have an MCP server?” It should be whether the system has handles for the agent that just got invited in.

A handle is a small, typed, boring action. It describes what the data contains, what the operation needs from it, and what it will look like afterward. It fails in a way the caller can understand. It is easy to test without a full model in the loop. And it leaves a trace of what it did.
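A minimal sketch of what such a handle might look like in code. All names here (`approve_invoice`, `HandleResult`, the `inv_` prefix) are illustrative assumptions, not a real API:

```python
from dataclasses import dataclass
from typing import Literal

@dataclass(frozen=True)
class HandleResult:
    """Typed outcome a caller can branch on without parsing prose."""
    status: Literal["applied", "rejected", "needs_approval"]
    summary: str   # human-readable statement of what happened
    audit_id: str  # trace left behind for later review

def approve_invoice(invoice_id: str, *, dry_run: bool = True) -> HandleResult:
    """One small, boring action: approve a single invoice.

    Hypothetical stand-in for a real domain call; a real version
    would check policy and write a durable audit record.
    """
    if not invoice_id.startswith("inv_"):
        return HandleResult("rejected", f"unknown invoice id {invoice_id!r}", audit_id="")
    if dry_run:
        return HandleResult("needs_approval",
                            f"would approve {invoice_id}; no state changed",
                            audit_id="dryrun")
    return HandleResult("applied", f"approved {invoice_id}", audit_id=f"audit-{invoice_id}")
```

Note that the default is the safe path: a caller has to opt out of the dry run explicitly.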

Recent examples reinforce the point. Google’s MCP Toolbox for Databases might sound bland, because “database plus MCP” is hardly a magical phrase, but the interesting part is that databases demand exactly this kind of controlled, auditable work whose calls can be inspected. MathWorks released an official MATLAB MCP server, which is interesting because a typed interface into MATLAB’s mature technical environment is far more appropriate than a chat window. Browserbase and LangChain are demonstrating Deep Agents with search, fetch, and browser subagents: a cheap, light subagent does quick retrieval, and a heavier browser-based operation follows only if necessary.

I don’t mean that every single thing suddenly becomes an MCP server. I mean that more of the important tools in a business can become something controlled through an agent instead of through a browser tab or terminal command.

There is a difference.

An MCP server is just one package boundary among several, each with its own strengths and weaknesses. An agent-operable interface is a product decision, choosing specific verbs, inputs, outputs, reversible operations, and mandatory human pause actions. A protocol can then move that interface around, but it cannot make the interface good.

[Diagram: a thin MCP wrapper around a messy API, side by side with an agent-operable interface that has narrow verbs, dry runs, typed errors, and audit records.]

MCP moves an interface around. It does not make the verbs worth trusting.

This is the same anti-pattern we saw with APIs. Companies would publish a REST API to tremendous fanfare, convinced that integration problems were now solved. In practice, the nouns and mutations provided by the API would prove inadequate for anything beyond the simplest cases. Docs would sometimes contradict behavior. And while most of the workflow might be automatable, the remaining chunk still required a human being logged into the admin console.

The gap costs more as agents move further into it, because agents typically do not state the ambiguities at the boundary. Instead they pick a tool, fill in missing fields, retry operations, and summarize the results as if they were progress. Agents do not intend to fail workflows. They are handed an irregular surface with no clear mandate and are forced to pretend competence on it.

A useful way to think about this is developing AI agency. The word “agency” comes with unfortunate connotations of personality, so I think of it in terms of the affordances any agent requires: a goal, tools to pursue it with, memory, feedback, and permission to act. When the tool layer is too vague, the agent ends up with fake agency. It can talk about the work and generate a lot of thoughtful-sounding design language, but it can’t actually do the work.

The current gold rush of MCP building obscures this problem, because when people say “server” they picture something tangible: code, a repo, a README, and a demo of someone, usually Claude or Cursor, opening the tool and making something happen.

That demo is not the test.

Test whether the interface still behaves when the request is boring, partial, duplicated, late, unauthorized, or wrong. Test whether a reviewer can reconstruct what happened to an object after the agent touched its handle. Test whether the action can be replayed in staging without accidentally emailing customers. Everybody tests, even when the thing under test is an agent holding a tool handle.

A useful agent-operable interface has a few properties.

The verbs are narrow. A verb for “create refund request” instead of “update order.” A verb for “draft response” instead of “send message.” A verb for “propose schema migration” instead of “run SQL.” Narrow verbs help because the operation’s name alone signals the operation’s intent.
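A sketch of what a narrow-verb catalog could look like. The function names and fields are hypothetical; the point is that the broad verbs are deliberately absent from the registry:

```python
# Hypothetical tool registry: each narrow verb is its own named entry,
# so the name alone tells a reviewer what the agent intended.
def create_refund_request(order_id: str, amount_cents: int) -> dict:
    """Creates a *request* for review; it does not move money."""
    return {"action": "create_refund_request", "order_id": order_id,
            "amount_cents": amount_cents, "status": "pending_review"}

def draft_response(ticket_id: str, body: str) -> dict:
    """Drafting never sends; a separate, gated verb would do that."""
    return {"action": "draft_response", "ticket_id": ticket_id,
            "body": body, "sent": False}

NARROW_VERBS = {
    "create_refund_request": create_refund_request,
    "draft_response": draft_response,
    # deliberately absent: "update_order", "send_message", "run_sql"
}
```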

Inputs are typed in the form the domain expects, not just JSON Schema for its own sake. Real domain constraints apply wherever possible, reflecting the kind of validation that matters in the application: an account ID that actually exists in the system, a payment amount with a meaningful currency, a timestamp with timezone rules that mean something to the user. And enums should carry meaningful values, not just the ones that appeared in the demo.
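For instance, domain validation can live in the input type itself rather than in a schema bolted on afterward. This is a sketch with assumed rules (the `acct_` prefix and the currency set are made up for illustration):

```python
from dataclasses import dataclass
from datetime import datetime, timezone

ALLOWED_CURRENCIES = {"USD", "EUR", "GBP"}  # assumed set for illustration

@dataclass(frozen=True)
class RefundInput:
    """Input validated against domain rules, not just JSON shape."""
    account_id: str
    amount_cents: int
    currency: str
    requested_at: datetime  # must be timezone-aware

    def __post_init__(self):
        if not self.account_id.startswith("acct_"):
            raise ValueError(f"not a known account id format: {self.account_id!r}")
        if self.amount_cents <= 0:
            raise ValueError("refund amount must be positive")
        if self.currency not in ALLOWED_CURRENCIES:
            raise ValueError(f"unsupported currency: {self.currency}")
        if self.requested_at.tzinfo is None:
            raise ValueError("timestamp must carry a timezone")
```

A bad value fails loudly at construction time, before the verb ever runs, which is exactly the feedback an agent can act on.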

Outputs are machine-readable and human-readable at the same time. The agent needs specific fields populated. A human reviewer wants a plain statement of what changed, what didn’t, and what still needs work.

There’s a dry-run path. A dry run is the cheapest safety mechanism available, and almost nobody shipping generated code tries it first. A dry run turns “can the agent do this?” into “can the agent explain the diff before doing this?” That is where human judgment belongs.
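A dry run can be as small as returning the diff instead of applying it. This is a toy sketch with an in-memory store and invented field names:

```python
# Hypothetical order store; a real version would hit a database.
ORDERS = {"ord_1": {"status": "paid", "refunded_cents": 0}}

def refund_order(order_id: str, amount_cents: int, *, dry_run: bool) -> dict:
    """Same verb either way; dry_run=True explains the diff, nothing more."""
    before = dict(ORDERS[order_id])
    after = {**before, "refunded_cents": before["refunded_cents"] + amount_cents}
    diff = {"before": before, "after": after}
    if dry_run:
        return {"applied": False, "diff": diff}  # no state mutated
    ORDERS[order_id] = after
    return {"applied": True, "diff": diff}
```

The diff is what a human approves; the agent never has to describe its own intentions in prose.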

Interfaces are idempotent to the degree possible. Networks fail, agents retry, and tool calls time out while the downstream system was actually doing the work. If retrying create_refund_request creates a second refund, or a second ticket, or a second production deploy, the interface is not ready for an agent.
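One common pattern is an idempotency key: a retried call with the same key replays the original result instead of creating a second refund. A minimal in-memory sketch, with invented names:

```python
# Store of results already produced, keyed by idempotency key.
_SEEN: dict = {}
_counter = 0

def create_refund_request(idempotency_key: str, order_id: str,
                          amount_cents: int) -> dict:
    """A retry with the same key is a replay, not a new refund."""
    global _counter
    if idempotency_key in _SEEN:
        return _SEEN[idempotency_key]
    _counter += 1
    result = {"refund_id": f"ref_{_counter}", "order_id": order_id,
              "amount_cents": amount_cents}
    _SEEN[idempotency_key] = result
    return result
```

A real implementation would persist the key alongside the write, but the contract is the same: retries are free.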

Every interface has contract tests that don’t involve a model. This matters. If every correctness check has to run an LLM, we have built a slot machine and slapped a CI badge on it. The tool’s schema, how it validates, what a dry run looks like, how permissions fail, and what audit records are generated should all be covered by normal software tests. Save the model evals for when there’s a model involved.
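Such contract checks can be ordinary assertions a CI job runs with no LLM anywhere. The schema shape and field names below are assumed for illustration:

```python
# Hypothetical tool schema and a model-free validator for it.
TOOL_SCHEMA = {
    "name": "create_refund_request",
    "input": {"order_id": "string", "amount_cents": "integer"},
    "supports_dry_run": True,
    "idempotent": True,
}

def validate_call(schema: dict, args: dict) -> list:
    """Return a list of typed validation errors; empty means valid."""
    errors = []
    for field in schema["input"]:
        if field not in args:
            errors.append(f"missing_field:{field}")
    for field in args:
        if field not in schema["input"]:
            errors.append(f"unknown_field:{field}")
    return errors
```

Errors are typed strings, not prose, so a test (or an agent) can branch on them directly.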

The interface leaves evidence. Not vibes: tangible records of who acted, through which agent, under which policy, against which object, with what proposed change, and with what final result. This is where observability connects to governance without inverting into another dashboard cult.
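The record itself can be a small, boring structure. The field names below simply mirror the list above; everything else is an assumption:

```python
from dataclasses import dataclass, asdict

@dataclass(frozen=True)
class AuditRecord:
    """Who acted, through which agent, under which policy, on what."""
    actor: str       # the human or service account responsible
    agent: str       # which agent made the call
    policy: str      # policy version in force at the time
    object_ref: str  # the object that was touched
    proposed: str    # the change the agent proposed
    result: str      # the final outcome

AUDIT_LOG: list = []  # stand-in for an append-only store

def record(entry: AuditRecord) -> dict:
    """Append the record and return a machine-readable copy."""
    AUDIT_LOG.append(entry)
    return asdict(entry)
```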

[Diagram: matrix of the properties of an agent-operable handle: narrow verb, typed input, dry run, idempotency, typed failure, audit record, and human pause.]

A useful handle is a contract the agent cannot creatively reinterpret.

The Google Cloud conversation with Harrison Chase framed harness engineering as the path from demo to production. I think that is right, and I think the next practical step is interface engineering. A harness only makes sense once it has sane interfaces to compose.

This is why abstractions on top of LangChain are useful too. Start with a basic agent primitive, then a graph, then a Deep Agent that can use browser subagents and pause for a human. Every level of abstraction still bottoms out in a tool call, which corresponds either to a clean domain operation or to a tangle of backend code that happens to work.

In practice, Multi-Agent Orchestration in LangGraph is only half the story. The other half is whether the interface lets the worker do anything worth trusting.

It’s getting said out loud in the community now: “Stop building MCP servers. Build CLIs that agents can use.” I don’t care which one the end result is: a CLI, an OpenAPI endpoint, an MCP tool, a stored procedure, an internal command bus, or whatever boring thing is observable, testable, and readable by others.

Interesting new projects are emerging around this idea too. agent-install treats agent capabilities as installable surfaces across coding agents. loadam turns OpenAPI specs into tests, MCP output, and drift reports. freeCodeCamp’s LangGraph, MCP, and A2A guide also illustrates the progress from single-agent demos to more structured systems with protocols between them.

Good. Just make the distinction between what the protocol diagram shows and what the system can actually do.

The work is deciding what actions the agent can take within Salesforce, Jira, GitHub, Postgres, SAP, Stripe, and the lingering internal admin app that is totally going to get replaced tomorrow. Deleting broad verbs is nobody’s favorite hobby. Adding dry runs is straightforward. Making failures typed is tedious. Writing contract tests before a single model sees the tool is boring.

Boring is the point.

Stop eager-loading MCP tools into the context window. A giant pile of tools is not capability; it is usually confusion with a larger token bill. Agents need fewer, sharper handles to their tools, and tool catalogs should feel more like a well-designed command line than a junk drawer with JSON schemas bolted on.
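One way to avoid the junk drawer is to hand the agent only the tools relevant to the task at hand. A sketch, where the catalog, tags, and tool names are all invented for illustration:

```python
# Hypothetical catalog tagging each narrow verb with the domains it serves.
CATALOG = {
    "create_refund_request":    {"tags": {"billing"}},
    "draft_response":           {"tags": {"support"}},
    "propose_schema_migration": {"tags": {"database"}},
}

def tools_for(task_tags: set) -> list:
    """Return only the catalog entries whose tags overlap this task."""
    return sorted(name for name, meta in CATALOG.items()
                  if meta["tags"] & task_tags)
```

Loading three sharp tools instead of three hundred keeps the context window for the work, not the menu.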

Agent-operable interfaces should be treated as part of product architecture, not as leftover integration scraps that product teams don’t want anymore. Enterprise teams should own the verbs the same way they own the database schema. Version them. Deprecate them. Test them and document the failure modes. Require review for dangerous actions. Make the interface boring enough that the agent has no creative wiggle room around the important bits.

MCP will help distribute interfaces. Harnesses will help compose them. Models will get better at calling them.

Companies will not win by having the most MCP-capable servers. They will win by having the cleanest handles in their systems.
