Rhumb

Posted on Apr 12

Read-Only MCP Removes a Failure Class, But Only if the Whole Tool Boundary Is Actually Read-Only

A lot of current agent-safety discussion still treats approval prompts as the main defense.

If the model wants to write a file, call a dangerous tool, or push a change, ask a human first.

That is better than nothing.

But it is a weak substitute for a simpler and often more useful design move.

Sometimes the safest system is the one that cannot write through that surface at all.

That is why the recent read-only MCP conversation matters.

Not because read-only solves everything.
Not because it makes agents magically safe.
And not because every system should stay read-only forever.

It matters because read-only removes an entire failure class.

A hallucinated instruction, a prompt-injected string, or a confused planning step may still happen.
But if the visible surface only supports inspection, the mistake dies as text instead of becoming a real mutation.

That is a meaningful trust boundary.

The catch is that many systems claim read-only in one narrow place while leaving write-capable side doors open everywhere else.

So the useful question is not just, "Does this MCP server expose only read calls?"

It is, "Does the surrounding runtime preserve a real read-only trust class across the whole tool boundary?"

That distinction is the difference between a nice label and real containment.

1. Approval prompts are not the same thing as structural containment

Human approval flows are attractive because they feel flexible.

You can expose a broad surface, then ask for confirmation when something risky happens.

In practice, that creates three recurring problems.

First, humans stop learning from the prompts.

If the system asks for approval constantly, the prompt turns into background noise. The operator is no longer evaluating the action deeply. They are clearing friction.

Second, approval prompts happen late.

By the time the user is asked, the model has already discovered the tool, selected it, framed the action, and often carried a large amount of untrusted or ambiguous context into the plan.

Third, prompts do not simplify the trust story.

The system is still write-capable. The operator still has to reason about whether this write is reversible, whether it crosses a tenant boundary, whether it leaks through shell access, or whether another tool could achieve the same effect by a side path.

That is why approval-heavy systems often feel safer than they really are.

They are adding ceremony around power, not necessarily reducing the power itself.

A real read-only boundary is different.

It changes the reachable authority surface before the model acts.

That is a control-plane move, not just a UI move.

2. What read-only actually buys you

Read-only is valuable because it removes mutation from the allowed action set.

That sounds obvious, but the practical consequence is larger than it first appears.

When a read-only boundary is real, several bad outcomes disappear together:

accidental writes from sloppy planning
prompt-injection-driven mutation through that surface
approval fatigue around low-confidence write prompts
hidden side effects from tools that mix retrieval and mutation
post-incident confusion over whether the system could have changed state at all

That is why read-only deserves to be treated as a trust class.

It is not merely a product feature.
It changes what kinds of failures are possible.

That makes it especially useful in a few cases:

early operator workflows where the agent should inspect before acting
retrieval-heavy systems where evidence gathering matters more than execution
local assistant setups where read-only reduces ambient blast radius without requiring a full security stack
shared or semi-trusted environments where the team has not yet earned confidence in write paths

In those cases, read-only does not just lower risk.
It also improves clarity.

A human can reason more quickly about what the system is allowed to do.
The model has a smaller authority surface to choose from.
And the logs become easier to interpret because denied escalation attempts stand out clearly.

3. Most read-only claims fail because the rest of the boundary is still write-capable

This is where the market language gets fuzzy.

A server can expose only read-oriented MCP tools and still live inside a runtime that is absolutely not read-only.

For example:

the MCP surface is inspect-only, but the agent also has shell access
the filesystem tool itself cannot mutate anything, but the agent can still write through another mounted tool
the server itself is read-only, but the agent can exfiltrate data over open network egress
the model cannot edit a target system directly, but it can generate executable artifacts that another tool will apply later

At that point, "read-only" is a local property on one interface inside a broader write-capable system.

That is still useful to know.
But it is not the same thing as a true read-only trust class.

This is why the real authority classes matter more than a single marketing label.

The practical set usually looks something like this:

inspect: read and observe without mutation
write: change data or configuration
execute: trigger actions whose side effects may be indirect or broad
egress: send information or outputs to another system or party

A system only deserves strong read-only language when those classes stay visibly separate.

If inspect is read-only but execute or egress remain open next to it, the operator still needs to reason about side effects and escape paths.

That does not make the inspect surface worthless.

It just means the correct claim is narrower: read-only MCP surface, not read-only runtime.

That distinction should be explicit.
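
One way to keep that distinction explicit is to compute the claim from the configuration instead of asserting it. The sketch below is illustrative only: the `Authority` enum mirrors the four classes above, while `read_only_claim`, the tool names, and the channel names are hypothetical and not part of MCP or any real runtime.

```python
from enum import Enum


class Authority(Enum):
    INSPECT = "inspect"
    WRITE = "write"
    EXECUTE = "execute"
    EGRESS = "egress"


def read_only_claim(mcp_tools: dict[str, Authority],
                    runtime_channels: dict[str, Authority]) -> str:
    """Return the strongest read-only claim the configuration supports."""
    mcp_is_inspect_only = all(
        a is Authority.INSPECT for a in mcp_tools.values()
    )
    runtime_is_inspect_only = all(
        a is Authority.INSPECT for a in runtime_channels.values()
    )
    if not mcp_is_inspect_only:
        return "not read-only"
    if runtime_is_inspect_only:
        return "read-only runtime"
    # The MCP surface is clean, but something next to it is not.
    return "read-only MCP surface"


# Hypothetical setup: an inspect-only MCP server next to a shell and open egress.
mcp_tools = {"list_tables": Authority.INSPECT, "read_row": Authority.INSPECT}
runtime_channels = {"shell": Authority.EXECUTE, "http_out": Authority.EGRESS}

print(read_only_claim(mcp_tools, runtime_channels))
```

For this setup the function returns the narrower claim, "read-only MCP surface", because the shell and the outbound HTTP channel sit outside the inspect class.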

4. Discovery-time legibility matters as much as runtime enforcement

A lot of teams focus on whether a dangerous action is blocked at execution time.

That matters, but it is not enough.

The model starts shaping plans much earlier, when it sees what tools exist and how those tools are described.

If inspect, write, and execute capabilities are flattened into one tool catalog, the agent is already reasoning across a mixed-authority surface before any denial happens.

That creates planning confusion and weakens the operator's trust model.

A stronger design keeps authority classes legible at discovery time.

That can mean:

separate read-only and write-capable namespaces
explicit authority labels in tool metadata
distinct manifests for inspect-only versus mutate-capable roles
discovery filtered by actor, task, or policy context
capability descriptions that reveal irreversible side effects clearly

The goal is not only to stop bad calls.

The goal is to make the reachable surface understandable before the agent commits to a plan.

That is especially important in MCP, where the tool names and descriptions the model sees affect both token cost and how far its plans drift.

A read-only trust class should feel obvious from discovery alone.

If the operator has to inspect implementation details to find out whether the system can write, the boundary is too blurry.
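
A minimal sketch of discovery filtered by role, with authority labels carried in the tool metadata. Everything here is an assumption made for illustration: the `ToolSpec` shape, the namespaced tool names, and the `ROLE_GRANTS` table are hypothetical and do not come from the MCP specification.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class ToolSpec:
    name: str
    authority: str            # "inspect", "write", "execute", or "egress"
    irreversible: bool = False


# Hypothetical catalog: the namespace prefix repeats the authority class.
CATALOG = [
    ToolSpec("inspect.list_files", "inspect"),
    ToolSpec("inspect.read_file", "inspect"),
    ToolSpec("write.update_config", "write"),
    ToolSpec("execute.run_migration", "execute", irreversible=True),
]

# Hypothetical policy: which authority classes each role may discover.
ROLE_GRANTS = {
    "reviewer": {"inspect"},
    "operator": {"inspect", "write", "execute"},
}


def discover(role: str) -> list[dict]:
    """Return only the tools the role may see, with authority labels attached."""
    granted = ROLE_GRANTS.get(role, set())  # unknown roles see nothing
    return [
        {"name": t.name, "authority": t.authority, "irreversible": t.irreversible}
        for t in CATALOG
        if t.authority in granted
    ]


for tool in discover("reviewer"):
    print(tool["name"], tool["authority"])
```

The reviewer role never sees the write or execute tools, so the agent cannot plan around them. The labels also let an operator read the boundary from the discovery response alone.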

5. Deterministic governors are more useful than hopeful policy statements

Another important split in the current conversation is between declared policy and enforced policy.

Many systems say they are read-only because the prompt tells the model not to write, or because the intended workflow is inspect-first.

That is not a reliable boundary.

A real read-only trust class needs deterministic enforcement before execution.

That can take several forms:

capability manifests that omit mutation entirely
proxy layers that reject writes before they reach the tool
policy engines that classify actions by authority class
allowlists that admit inspect calls and deny mutation by default
separate credentials for inspect and write paths, with write credentials absent from the read-only runtime

The important point is that the control should live outside the model's goodwill.

The model can request an escalation.
It should not be able to silently perform one.
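
As a rough sketch of the allowlist form, the gate below sits between the model and the tools and rejects anything that is not explicitly an inspect call. The tool functions, the `INSPECT_ALLOWLIST` contents, and the `EscalationDenied` exception are hypothetical names chosen for the example.

```python
from typing import Any, Callable

# Hypothetical allowlist: only these tool names count as inspect calls.
INSPECT_ALLOWLIST = frozenset({"read_file"})


class EscalationDenied(PermissionError):
    """Raised before the tool runs when a call is not on the allowlist."""


def make_gate(tools: dict[str, Callable[..., Any]]) -> Callable[..., Any]:
    def call(name: str, **kwargs: Any) -> Any:
        # Deny by default: anything not allowlisted is rejected
        # before the underlying tool function is ever invoked.
        if name not in INSPECT_ALLOWLIST:
            raise EscalationDenied(f"{name!r} is not an inspect call")
        return tools[name](**kwargs)
    return call


executed: list[str] = []  # records which tool functions actually ran


def read_file(path: str) -> str:
    executed.append("read_file")
    return f"<contents of {path}>"


def delete_file(path: str) -> None:
    executed.append("delete_file")


gate = make_gate({"read_file": read_file, "delete_file": delete_file})

print(gate("read_file", path="notes.txt"))
try:
    gate("delete_file", path="notes.txt")
except EscalationDenied as err:
    print("denied:", err)
print(executed)
```

The `executed` list shows that `delete_file` never ran. The decision was made by the gate, not by the model's willingness to follow instructions.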

This is where typed denials start to matter too.

A blocked write should not disappear into a generic error.
A good denial tells the operator and the runtime what happened:

this was a write attempt
the current trust class is inspect-only
the request was blocked before execution
the attempted escalation is available as evidence

That turns denial into more than friction.

It becomes a governance signal.
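
A typed denial could look something like the following. This is a sketch under assumptions: the `Denial` field names and the `evidence_id` format are invented for illustration and are not a standard MCP error shape.

```python
import json
from dataclasses import asdict, dataclass


@dataclass(frozen=True)
class Denial:
    attempted_class: str            # what the caller tried, e.g. "write"
    trust_class: str                # what the runtime allows, e.g. "inspect"
    tool: str                       # the tool the call was aimed at
    blocked_before_execution: bool  # True if the tool was never invoked
    evidence_id: str                # pointer to the stored escalation record


def deny_write(tool: str, evidence_id: str) -> Denial:
    """Build the denial for a write attempt inside an inspect-only runtime."""
    return Denial(
        attempted_class="write",
        trust_class="inspect",
        tool=tool,
        blocked_before_execution=True,
        evidence_id=evidence_id,
    )


denial = deny_write("update_config", evidence_id="esc-0001")
print(json.dumps(asdict(denial), indent=2))
```

Each field answers one of the four points above, so neither the operator nor the runtime has to parse a generic error string to learn what was blocked and why.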

6. Denied escalations are evidence, not just failures

One of the most underused ideas in agent safety is that blocked actions are informative.

If an agent repeatedly tries to cross from inspect into write, that tells you something about one or more of the following:

the task is underspecified
the tool descriptions are misleading
the model is over-generalizing from prior context
untrusted inputs are trying to steer the system toward mutation
the workflow probably needs a different authority tier

In other words, denied writes are not only proof that the guardrail worked.

They are also proof that an escalation pressure exists.

That is why read-only systems should log denied escalation attempts clearly.

Not as noise.
As data.

A trustworthy read-only runtime should preserve at least:

attempted action class
target resource or tool
caller identity or session context
policy reason for denial
whether the attempt came from model planning, user request translation, or chained tool output

That evidence matters operationally.

It tells you whether read-only is the right steady-state boundary, whether a task needs a privileged lane, or whether the agent is drifting into action classes it should never have seen.
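
To make that concrete, here is a small sketch that stores the fields listed above and counts denied attempts by where they came from. The record shape, the `source` values, and the sample log entries are all hypothetical.

```python
from collections import Counter
from dataclasses import dataclass


@dataclass(frozen=True)
class DeniedAttempt:
    action_class: str  # attempted action class, e.g. "write"
    target: str        # target resource or tool
    session: str       # caller identity or session context
    reason: str        # policy reason for denial
    source: str        # "model_plan", "user_request", or "tool_output"


def escalation_pressure(log: list[DeniedAttempt]) -> Counter:
    """Count denied attempts per (source, action_class) pair."""
    return Counter((a.source, a.action_class) for a in log)


# Hypothetical log: two writes steered by tool output, one planned execute.
log = [
    DeniedAttempt("write", "update_config", "s1", "inspect-only", "tool_output"),
    DeniedAttempt("write", "update_config", "s1", "inspect-only", "tool_output"),
    DeniedAttempt("execute", "run_script", "s2", "inspect-only", "model_plan"),
]

for (source, action), count in escalation_pressure(log).most_common():
    print(f"{source:12} {action:8} {count}")
```

Repeated write attempts that originate in chained tool output point toward untrusted input steering the agent. Attempts that originate in model planning point toward an underspecified task or misleading tool descriptions.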

7. How Rhumb should evaluate a read-only trust class

This is where the read-only discussion becomes more than a design opinion.

It should become an evaluation surface.

If Rhumb is going to treat read-only as a meaningful trust class, the useful questions are not just "Does a read-only mode exist?"

They are questions like:

A. Authority-class separation

Are inspect, write, execute, and egress visibly separated?
Is the read-only claim scoped to one tool, one namespace, or the whole runtime?

B. Discovery-time legibility

Can an agent and operator tell which tools are read-only before execution?
Are mutation-capable tools hidden or labeled distinctly?

C. Deterministic enforcement

Are writes impossible through that surface, or merely discouraged?
Does enforcement happen before the call reaches the tool?

D. Side-door consistency

Do shell, filesystem, browser, or network channels undermine the read-only claim?
Is the broader runtime aligned with the claimed trust class?

E. Typed denials and evidence

Are blocked escalations visible and attributable?
Do denials preserve enough context to be operationally useful?

F. Escalation model

If a task really needs write access, is there a clear separate lane?
Or does the system fall back into vague prompt-based approval?

That is the difference between a marketing checkbox and a real governance property.
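
The six questions can be turned into a simple checklist. The sketch below is not Rhumb's actual scoring method. The criterion keys, the pass/fail answers, and the `evaluate` function are assumptions made to show the shape of such an evaluation.

```python
# Criterion keys follow sections A through F above.
CRITERIA = (
    "authority_class_separation",
    "discovery_time_legibility",
    "deterministic_enforcement",
    "side_door_consistency",
    "typed_denials_and_evidence",
    "escalation_model",
)


def evaluate(answers: dict[str, bool]) -> dict:
    """Summarize which criteria a system meets; missing answers count as unmet."""
    unmet = [c for c in CRITERIA if not answers.get(c, False)]
    return {
        "met": len(CRITERIA) - len(unmet),
        "total": len(CRITERIA),
        "unmet": unmet,
    }


# Hypothetical system: inspect-only MCP server with a shell mounted beside it,
# and no stated answer about how write access is granted.
report = evaluate({
    "authority_class_separation": True,
    "discovery_time_legibility": True,
    "deterministic_enforcement": True,
    "side_door_consistency": False,
    "typed_denials_and_evidence": True,
})
print(report)
```

Treating an unanswered criterion as unmet keeps the evaluation conservative: a system does not earn strong read-only language by staying silent about its escalation model.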

8. The practical recommendation

The near-term recommendation for many agent systems is simple.

Start with a real read-only trust class where you can.

Not forever.
Not everywhere.
But as a deliberate boundary.

Let the agent inspect first.
Make write access a separate, explicit authority tier.
Keep side doors visible.
And treat denied escalation attempts as evidence that helps refine the workflow.

That approach tends to outperform approval-heavy broad-authority systems in the early stages because it gives both the human and the runtime a clearer story.

The system either can write here or it cannot.

That is much easier to trust than a broad tool surface wrapped in constant warnings.

Read-only does not remove the need for governance.

It is governance.

Or more precisely, it is one of the cleanest trust classes an operator can give an agent before they are ready to hand it more authority.

And in the current MCP landscape, that is not a small design choice.

It is one of the few moves that removes a failure class instead of merely apologizing for it later.
