What happens to your agent when the team renames users to accounts, and why the tool name you picked six months ago decides whether anything breaks
TL;DR: I have shipped MCP servers where the tool names were a thin shell over the underlying REST API, and I have shipped servers where the names came from the domain model instead. The domain-named ones survived backend refactors with close to zero churn. The pass-through ones broke every time someone renamed a table or split a service. After ranking six naming patterns on two axes (how well a name survives a refactor of the system underneath it, and how cleanly the model can pick the right tool), my house pick is ubiquitous-language naming inside a bounded-context prefix, with Pydantic discriminated-union return types doing the schema work. The one-line version of my opinion: Domain-Driven Design plus tool-use schemas is the production fix for agents. The MCP layer is where the anti-corruption layer belongs, and it should be designed in from the start, not bolted on later.
A quick scoping note. I am going to use a refund as a naming example throughout, because everyone has a mental model of what a refund is. I am not describing a system that lets a model issue refunds on its own. The refund here is a stand-in for any operation whose name you have to choose. Treat it as a label, not as a production design I am endorsing.
The two axes
Refactor-survival is whether the name stays correct when the system underneath it changes. A name with high refactor-survival describes something stable (an intent in the business domain) rather than something volatile (the current shape of the database or the current REST route).
LLM-selection clarity is whether the model reliably picks the right tool from the name and description alone. Names that are too clever, too abstract, or too collision-prone make the model hesitate or pick wrong. This axis sometimes rewards the literal names that the first axis punishes, which is the whole tension.
| # | Pattern | Example | Refactor-survival | LLM-selection clarity |
|---|---|---|---|---|
| 1 | Pass-through (mirrors the API) | create_user, delete_user | Low | High |
| 2 | Verb-first | process_refund | Medium | High |
| 3 | Noun-first / resource-dot-action | refund.create | Medium | Medium |
| 4 | Domain-prefix + bounded context | billing.refund.process | High | Medium-High |
| 5 | Ubiquitous-language naming | deactivate_account (not delete) | High | High |
| 6 | Schema-first (Pydantic discriminated union) | name plus typed return | High | High (with caveats) |
The ranking I defend, best to worst on the combined axes: 5, then 6, then 4, then 2, then 3, then 1. Patterns 5 and 6 are not rivals. You use them together.
1. Pass-through naming
The backend has POST /users, PATCH /users/{id}, DELETE /users/{id}, so the tools become create_user, update_user, delete_user. Fast to write, the model reads it fine. Failure mode: the name is welded to the current API surface. The day someone decides a user is really an account, the underlying route changes and your tool name is now a lie. If you rename the tool, every prompt and eval fixture and agent that learned create_user has to be updated. If you do not, new engineers read create_user and go looking for a users table that no longer exists. High on clarity today, low on survival tomorrow.
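To make the coupling concrete, here is a minimal sketch of how a pass-through name falls out of a route. The `pass_through_name` helper, its verb table, and the naive singularisation are all hypothetical and exist only for this illustration.

```python
# Hypothetical helper: derive a pass-through tool name from an HTTP route.
# The verb table and the "strip a trailing s" singularisation are
# assumptions made for this sketch only.
HTTP_VERBS = {"POST": "create", "PATCH": "update", "DELETE": "delete"}


def pass_through_name(method: str, path: str) -> str:
    """Build a tool name such as create_user from a method and a route."""
    # The first path segment is the resource collection, e.g. "users".
    resource = path.strip("/").split("/")[0]
    singular = resource[:-1] if resource.endswith("s") else resource
    return f"{HTTP_VERBS[method]}_{singular}"


before = pass_through_name("POST", "/users")
after = pass_through_name("POST", "/accounts")  # the route after the rename

# The tool name changes even though the business operation did not.
print(before, "->", after)
```

The name is a pure function of the route, so any route rename becomes a tool rename, or a tool name that no longer matches anything.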
2. Verb-first
process_refund, cancel_subscription, send_invoice. The action leads, which reads well because tool selection is fundamentally a verb-matching problem. A good verb often describes intent rather than transport, so a backend refactor can leave the name intact. Failure mode: verbs collide and drift as the surface grows. You add process_payment, process_payout, process_chargeback, and now process carries four meanings. process is a weak verb, so people reach for it whenever they cannot think of the precise word. Medium survival, high clarity that decays as the tool count climbs.
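Verb collisions are easy to count before they hurt. The sketch below groups tool names by their leading verb and flags any verb that leads more than a chosen number of tools. The function names and the `limit` of 2 are my own assumptions, not a recommended threshold.

```python
from collections import defaultdict


def leading_verb_groups(tool_names: list[str]) -> dict[str, list[str]]:
    """Group tool names by the word before the first underscore."""
    groups: dict[str, list[str]] = defaultdict(list)
    for name in tool_names:
        verb = name.split("_", 1)[0]
        groups[verb].append(name)
    return dict(groups)


def overloaded_verbs(tool_names: list[str], limit: int = 2) -> dict[str, list[str]]:
    """Return the verbs that lead more than `limit` tools."""
    return {
        verb: names
        for verb, names in leading_verb_groups(tool_names).items()
        if len(names) > limit
    }


tools = [
    "process_refund", "process_payment", "process_payout",
    "process_chargeback", "cancel_subscription", "send_invoice",
]
print(overloaded_verbs(tools))
```

On this list only `process` is flagged, which is the signal to go find four precise verbs.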
3. Noun-first, resource-dot-action
refund.create, subscription.cancel. Groups nicely for a human browsing the list. Failure mode: it optimizes for the resource taxonomy at the cost of the thing the model is best at, verb matching. With the action second, the model reads refund first and has to hold it before the verb that tells it what to do. The grouping also tempts you into CRUD-over-resources when the domain operation you want is richer than create/update/delete. The pattern I reach for least.
4. Domain-prefix with a bounded context
billing.refund.process, identity.account.deactivate. The first segment names the bounded context (the term is from Domain-Driven Design): the part of the business this tool lives in. Namespacing that means something: a billing.transfer and an inventory.transfer stay apart for both model and reader. And a bounded context is a deliberately stable concept, so the prefix survives refactors that wreck pass-through names. Failure mode: the prefix is only as good as your context boundaries, and most teams have not drawn them. If billing and payments and invoicing are three overlapping prefixes nobody can distinguish, you have added ceremony without clarity. The other failure is verbosity. Used with real boundaries it is excellent. Used as decoration it is worse than a flat verb-first name.
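A rough sketch of the namespacing point, using a hypothetical in-memory registry rather than any real MCP framework: two tools share the name `transfer` and stay distinct because the bounded context is part of the key. The `tool` decorator, `REGISTRY`, and both handlers are invented for illustration.

```python
from typing import Callable

# Hypothetical registry keyed by the full dotted name "<context>.<name>".
REGISTRY: dict[str, Callable[..., str]] = {}


def tool(context: str, name: str):
    """Register a function under its bounded context and tool name."""
    def decorator(fn: Callable[..., str]) -> Callable[..., str]:
        full_name = f"{context}.{name}"
        if full_name in REGISTRY:
            raise ValueError(f"duplicate tool name: {full_name}")
        REGISTRY[full_name] = fn
        return fn
    return decorator


@tool("billing", "transfer")
def transfer_funds(amount: str) -> str:
    return f"moved {amount} between ledgers"


@tool("inventory", "transfer")
def transfer_stock(sku: str) -> str:
    return f"moved {sku} between warehouses"


print(sorted(REGISTRY))
```

The duplicate check only fires within a context, which is the behaviour you want if the boundaries are real and a warning sign if they are not.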
5. Ubiquitous-language naming
The one I will defend hardest. Name the tool after the word the domain experts actually use, not the database verb. Do not call it delete_account. In almost every real billing or identity domain you do not delete an account, you deactivate it or close it, and the row sticks around for compliance and audit. So the tool is deactivate_account, and that name encodes a true fact about the domain that delete hides. Two good things happen: the model gets a precise verb (deactivate is far less collision-prone than delete or process), and the name stays correct across refactors because the business meaning does not change when you swap the database. Failure mode: it requires that a ubiquitous language actually exists, and on a lot of teams it does not, or there are three competing dialects. And there is a discipline cost: someone has to resist the easy delete and insist on the accurate deactivate in code review, every time.
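The code-review discipline can be partly mechanised. Here is a minimal sketch of a name check that flags database verbs and suggests domain verbs from a glossary. The glossary contents are placeholders I made up; a real one has to come from how the business actually talks.

```python
from typing import Optional

# Hypothetical glossary: database-style verbs mapped to the verbs the
# domain experts are assumed to use. Replace with your team's real terms.
DOMAIN_VERBS = {
    "delete": ["deactivate", "close"],
    "process": ["request", "issue", "settle"],
}


def review_tool_name(name: str) -> Optional[str]:
    """Return a review comment if the name leads with a flagged verb."""
    verb, _, rest = name.partition("_")
    suggestions = DOMAIN_VERBS.get(verb)
    if suggestions is None:
        return None  # the verb is not on the flagged list
    options = ", ".join(f"{s}_{rest}" for s in suggestions)
    return f"'{name}' uses '{verb}'; consider {options}"


print(review_tool_name("delete_account"))
print(review_tool_name("deactivate_account"))
```

A check like this cannot tell you which suggestion is the true domain verb. It only makes the easy wrong choice visible in review.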
6. Schema-first with Pydantic discriminated unions
The first five patterns are about the name. This one is about the return type, and it is where the leverage actually is. A vague return schema (a bare dict, a stringified blob) undoes all the discipline you put into the name. A discriminated union lets one tool return several clearly-distinguished outcomes, each with its own typed shape, tagged by a literal field the model can branch on.
```python
from decimal import Decimal
from typing import Annotated, Literal, Union

from mcp.server.fastmcp import FastMCP
from pydantic import BaseModel, Field

# Assumption: FastMCP from the MCP Python SDK; the server name is a placeholder.
mcp = FastMCP("billing")


class RefundIssued(BaseModel):
    status: Literal["issued"] = "issued"
    refund_id: str
    amount: Decimal
    currency: str = Field(min_length=3, max_length=3)  # three-letter code


class RefundPending(BaseModel):
    status: Literal["pending_review"] = "pending_review"
    request_id: str
    reason: str


class RefundRejected(BaseModel):
    status: Literal["rejected"] = "rejected"
    code: Literal["already_refunded", "outside_window", "amount_exceeds_charge"]
    message: str


# The literal `status` field is the tag that selects the variant.
RefundOutcome = Annotated[
    Union[RefundIssued, RefundPending, RefundRejected],
    Field(discriminator="status"),
]


class RefundResult(BaseModel):
    outcome: RefundOutcome


@mcp.tool()
def request_account_refund(charge_id: str, amount: Decimal) -> RefundResult:
    """Request a refund against a charge. Returns one of three outcomes:
    issued, pending_review, or rejected (with a reason code)."""
    ...
```
The model reads three named outcomes with three shapes, branches on status, and never parses prose to find out what happened. Failure mode: discriminated unions are easy to over-build. Nine variants where three would do is a decision tree the model did not need. The other trap is letting the union drift out of sync with reality, at which point the model gets a typed promise the system cannot keep, which is worse than an honest untyped blob.
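From the consumer's side, branching on the tag looks roughly like this. It is a stdlib-only sketch that works on a serialized result. The JSON payloads are hand-written and their exact shape is an assumption about how a `RefundResult` would arrive on the wire, and `summarize_outcome` is a hypothetical helper.

```python
import json


def summarize_outcome(payload: str) -> str:
    """Branch on the literal `status` tag of a serialized refund result."""
    outcome = json.loads(payload)["outcome"]
    status = outcome["status"]
    if status == "issued":
        return (
            f"refund {outcome['refund_id']} issued for "
            f"{outcome['amount']} {outcome['currency']}"
        )
    if status == "pending_review":
        return f"request {outcome['request_id']} awaiting review: {outcome['reason']}"
    if status == "rejected":
        return f"rejected ({outcome['code']}): {outcome['message']}"
    # A tag outside the three known variants means the union has drifted.
    raise ValueError(f"unknown status: {status!r}")


# Hand-written example payload; the field values are made up.
rejected = '{"outcome": {"status": "rejected", "code": "outside_window", "message": "too late"}}'
print(summarize_outcome(rejected))
```

The final `raise` is the drift trap made explicit: an unknown tag fails loudly instead of being read as prose.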
House pick
My default, for any MCP server fronting a system I expect to change, is pattern 5 inside a pattern 4 prefix, with pattern 6 on the return side. Concretely: billing.deactivate_account, returning a typed discriminated union. The reason I keep coming back to it is the anti-corruption layer. In DDD, an anti-corruption layer is the translation seam between your clean domain model and a messier external system. An MCP server sits in exactly that position between a model and your backend. If you name tools after the backend, you have no anti-corruption layer, only a pass-through, and every backend change corrupts the model's view of your system. That is the whole argument for why Domain-Driven Design plus tool-use schemas is the production fix for agents.
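Here is a minimal sketch of that seam, with no MCP framework involved. The tool handler speaks the domain language, and two hypothetical backend adapters (one from before a users-to-accounts rename, one from after) absorb the difference. Every class name, field, and the return shape here is invented for illustration.

```python
from typing import Protocol


class AccountBackend(Protocol):
    """What the tool needs from the backend, stated in domain terms."""
    def set_active(self, account_id: str, active: bool) -> None: ...


class LegacyUsersBackend:
    """Hypothetical adapter for a backend that still stores 'users'."""
    def __init__(self) -> None:
        self.users = {"a-1": {"enabled": True}}

    def set_active(self, account_id: str, active: bool) -> None:
        self.users[account_id]["enabled"] = active


class AccountsBackend:
    """Hypothetical adapter after the rename to 'accounts'."""
    def __init__(self) -> None:
        self.accounts = {"a-1": {"state": "active"}}

    def set_active(self, account_id: str, active: bool) -> None:
        self.accounts[account_id]["state"] = "active" if active else "inactive"


def deactivate_account(backend: AccountBackend, account_id: str) -> dict:
    """Tool handler: the name and return shape stay fixed across backends."""
    backend.set_active(account_id, False)
    return {"status": "deactivated", "account_id": account_id}


print(deactivate_account(LegacyUsersBackend(), "a-1"))
print(deactivate_account(AccountsBackend(), "a-1"))
```

The rename touches one adapter. The tool name, the prompts, and the eval fixtures that mention `deactivate_account` do not move.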
One number, modest and from one project: when we moved a cluster of pass-through tool names over to ubiquitous-language names on a server with roughly thirty tools, our internal tool-selection eval went from about 88 percent to about 94 percent correct-tool-on-first-try. I would not over-read a single before/after on one codebase. The renames that helped most were the ones that killed verb collisions (three different process_* tools) and the ones that replaced delete with the accurate domain verb.
FAQ
Is dotted naming like billing.refund.process even valid for an MCP tool name? Depends on the server framework and client. Some accept dots, others constrain names and you simulate the hierarchy with underscores. Check what your server and client allow. The principle (a stable context prefix) survives whichever character you use.
Will a discriminated union confuse the model more than a flat dict? In my experience it does the opposite, as long as the variants are genuinely distinct and few. The confusion comes from too many variants, not from the union itself.
How is ubiquitous-language naming different from just picking a good verb? A good verb is a writing instinct. Ubiquitous language is a sourcing rule: the verb has to be the one the domain experts actually use, verified against how the business talks, not invented at your desk.
Do I need DDD to get value here? No. You can adopt ubiquitous-language naming and typed returns without ever drawing a context map. The bounded-context prefix specifically only pays off if you have real boundaries.
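On the dotted-name question above, one way to keep the hierarchy when dots are not allowed is to flatten it with a separator that never appears inside a segment. The sketch below uses a double underscore. The allowed-character pattern is an assumption for illustration only, not a statement about what any particular server or client accepts, and both helper names are hypothetical.

```python
import re

# Assumed constraint for illustration; check what your server and client allow.
ALLOWED = re.compile(r"[A-Za-z0-9_-]{1,64}")


def to_wire_name(dotted: str, sep: str = "__") -> str:
    """Flatten 'billing.refund.process' into a dot-free tool name."""
    wire = dotted.replace(".", sep)
    if not ALLOWED.fullmatch(wire):
        raise ValueError(f"name not allowed under the assumed pattern: {wire}")
    return wire


def from_wire_name(wire: str, sep: str = "__") -> str:
    """Recover the dotted form; relies on segments never containing `sep`."""
    return wire.replace(sep, ".")


print(to_wire_name("billing.refund.process"))
```

A single underscore would also work, but then `billing.refund_process` and `billing.refund.process` flatten to the same string, so the mapping stops being reversible.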
Open questions I am still chewing on
When a discriminated-union return genuinely has to gain a fourth variant, what is the least disruptive way to roll it out to agents that already learned the three-variant shape? If two bounded contexts legitimately share a verb and a noun, is the prefix enough, or does the duplication signal the boundary is drawn wrong? And the one I go back and forth on: how much naming discipline is worth it before the tool count is high enough to matter. On a five-tool server, pass-through names are fine and contexts and unions are overhead. Somewhere between five and fifty the calculus flips, and I do not have a clean threshold.