James O'Connor

We renamed two MCP tools and our agent's tool-call accuracy went from 71% to 94%

Three months ago our customer-service agent confidently issued a $2,400 accounting reversal that should have been a $240 partial refund. The customer had asked for "a refund on the broken item." The agent had two tools available: refund and cancel. It picked cancel. The cancel tool, in our system, performed a full transaction reversal in the accounting ledger.

The agent was technically correct. "Cancel" can mean "undo," which can mean "reverse." The customer was furious. The CFO was annoyed.

For three weeks I tried to fix this with prompt engineering. None of it stuck. Tool-call accuracy on our held-out trace set was 71%.

The fix turned out to be renaming the tools.

The diagnosis

I had been treating MCP tools the way I treated API endpoints. Pick a verb, pick a noun, name it. Clean RESTful naming.

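That naming convention is wrong for tool-using agents. In a REST API the bounded context is implicit: the service, host, or URL path already tells the caller whether it is talking to customer support or to accounting. For an LLM, those boundaries do not exist unless the name carries them. The LLM sees one global tool list and picks based on semantic similarity to the user's intent.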

Eric Evans wrote about this in 2003 in Domain-Driven Design. He called it Ubiquitous Language. Russell Miles applied it to agents in "Domain Driven Agent Design" earlier this year. Dennis Traub wrote about it on Dev.to with the framing "your agent keeps using that word."

The rule: name tools by their bounded context first, not by their operation alone.

The rename

Before:

@mcp_tool
def cancel(order_id: str) -> dict:
    """Cancel an order."""
    ...

@mcp_tool
def refund(order_id: str, amount: Decimal) -> dict:
    """Refund a customer's payment."""
    ...

After:

@mcp_tool
def customer_support_cancel_order(order_id: str) -> CustomerOrderCancellation:
    """Cancel a customer's order before it ships.
    Stops fulfillment. Does NOT issue a refund. Does NOT touch accounting."""
    ...

@mcp_tool
def customer_support_refund_partial(
    order_id: str,
    amount: Decimal,
    reason: RefundReason,
) -> CustomerRefund:
    """Issue a partial refund on a shipped order.
    For full refunds use customer_support_refund_full."""
    ...

@mcp_tool
def accounting_reverse_transaction(
    transaction_id: str,
    reason: ReversalReason,
) -> AccountingReversal:
    """Reverse a posted accounting transaction. Full reversal only.
    NOT for customer-initiated refunds. Use customer_support_refund_partial."""
    ...

Three changes:

  1. Tool names carry the bounded context as a prefix.
  2. Descriptions explicitly cross-reference sibling tools (each tool says what it is NOT).
  3. Return types are named with the context. Pydantic models carry the same vocabulary.
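The first two changes are easy to let drift, so a small lint check can help. What follows is a minimal sketch, not something from our codebase: `KNOWN_CONTEXTS` and `lint_tools` are hypothetical names, and the check assumes you can get a plain mapping of tool names to descriptions out of your server. It flags tools without a bounded-context prefix and descriptions that cross-reference a sibling tool by a name that is not actually registered (for example a dotted name where the registered name uses underscores).

```python
import re

# Hypothetical bounded contexts for illustration; substitute your own.
KNOWN_CONTEXTS = ("customer_support", "accounting")

# Matches anything that looks like a context-qualified tool reference,
# e.g. "customer_support_refund_full" or "customer_support.refund_full".
_REF = re.compile(r"\b(?:" + "|".join(KNOWN_CONTEXTS) + r")[._]\w+")


def lint_tools(tools: dict[str, str]) -> list[str]:
    """Return naming problems for a {tool_name: description} mapping."""
    problems = []
    for name, description in tools.items():
        # Change 1: the name must start with a known context prefix.
        if not any(name.startswith(ctx + "_") for ctx in KNOWN_CONTEXTS):
            problems.append(f"{name}: missing bounded-context prefix")
        # Change 2: every cross-reference must name a registered tool exactly.
        for ref in _REF.findall(description):
            if ref not in tools:
                problems.append(f"{name}: references unknown tool {ref!r}")
    return problems


if __name__ == "__main__":
    example = {
        "customer_support_refund_partial":
            "Issue a partial refund. For full refunds use customer_support.refund_full.",
        "customer_support_refund_full": "Issue a full refund on a shipped order.",
        "cancel": "Cancel an order.",
    }
    for problem in lint_tools(example):
        print(problem)
```

Running a check like this in CI keeps the vocabulary consistent as new tools are added; the regex is deliberately crude and would need tuning for descriptions that mention context names in ordinary prose.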

How we measured

We scored tool selection on the same 500-example held-out trace set before and after the rename. Before: 71% accuracy. After: 94%.
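For readers who want to reproduce this kind of number, here is a minimal sketch of one way to score it. The scoring rule is an assumption on my part for illustration: each trace has a human-labeled expected tool, and a trace counts as correct when the tool the agent called matches that label exactly. `Trace`, `tool_call_accuracy`, and `top_confusions` are hypothetical names.

```python
from collections import Counter
from dataclasses import dataclass


@dataclass(frozen=True)
class Trace:
    expected_tool: str  # label assigned by a human reviewer (assumed)
    called_tool: str    # tool the agent actually called


def tool_call_accuracy(traces: list[Trace]) -> float:
    """Fraction of traces where the called tool matches the label exactly."""
    if not traces:
        raise ValueError("need at least one trace")
    correct = sum(t.expected_tool == t.called_tool for t in traces)
    return correct / len(traces)


def top_confusions(traces: list[Trace], n: int = 3) -> list[tuple[tuple[str, str], int]]:
    """Most frequent (expected, called) pairs among the misses."""
    misses = Counter(
        (t.expected_tool, t.called_tool)
        for t in traces
        if t.expected_tool != t.called_tool
    )
    return misses.most_common(n)


if __name__ == "__main__":
    sample = [
        Trace("refund", "refund"),
        Trace("refund", "cancel"),
        Trace("cancel", "cancel"),
        Trace("refund", "refund"),
    ]
    print(tool_call_accuracy(sample))
    print(top_confusions(sample))
```

The confusion pairs are often more useful than the headline accuracy, because they show which two names the agent is mixing up.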

Wiring this into FastAPI + MCP

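from contextlib import asynccontextmanager
from decimal import Decimal
from enum import Enum

from fastapi import FastAPI
from mcp.server.fastmcp import FastMCP
from pydantic import BaseModel

mcp = FastMCP("customer-support-server")

class RefundReason(str, Enum):
    # Placeholder values for illustration; use your own domain's reasons.
    DAMAGED_ITEM = "damaged_item"
    WRONG_ITEM = "wrong_item"

class CustomerRefund(BaseModel):
    refund_id: str
    order_id: str
    amount: Decimal
    reason: RefundReason
    audit_log_id: str

# create_refund_id and create_audit_log are application helpers, not shown here.

@mcp.tool()
def customer_support_refund_partial(
    order_id: str,
    amount: Decimal,
    reason: RefundReason,
) -> CustomerRefund:
    """Issue a partial refund on a shipped order.
    For full refunds use customer_support_refund_full."""
    return CustomerRefund(
        refund_id=create_refund_id(),
        order_id=order_id,
        amount=amount,
        reason=reason,
        audit_log_id=create_audit_log(order_id, amount, reason),
    )

@asynccontextmanager
async def lifespan(app: FastAPI):
    # The streamable HTTP transport needs its session manager running
    # for the lifetime of the app.
    async with mcp.session_manager.run():
        yield

app = FastAPI(lifespan=lifespan)
# The sub-app serves its endpoint at mcp.settings.streamable_http_path
# (default "/mcp"), relative to this mount point.
app.mount("/mcp", mcp.streamable_http_app())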

The Pydantic return type gives the LLM another disambiguation signal. We saw a roughly 5-point accuracy bump just from replacing bare dict return types with named models.

The anti-corruption layer

class CustomerRefundToAccountingTransaction:
    """Maps a customer-initiated refund into the accounting domain."""

    def translate(self, refund: CustomerRefund) -> AccountingEntry:
        # Assumes CustomerRefund also exposes payment_method; the trimmed
        # model shown in the wiring section omits that field.
        return AccountingEntry(
            account=refund.payment_method.linked_account,
            amount=-refund.amount,  # sign flipped for the ledger entry
            reason=AccountingReason.CUSTOMER_REFUND,
            ref_id=refund.refund_id,
            # Same helper signature as in the tool above.
            audit_log_id=create_audit_log(refund.order_id, refund.amount, refund.reason),
        )

The mapper is a hard boundary. The LLM cannot bypass it because the LLM only sees the customer-context tool surface; the accounting context is invoked deterministically.
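To make "invoked deterministically" concrete, here is a stripped-down sketch using only the standard library. It is an illustration under assumptions, not our production code: `LEDGER`, `to_accounting_entry`, and `AGENT_TOOLS` are hypothetical stand-ins, and the models are reduced to a few fields. The point is that the accounting entry is produced by ordinary code inside the customer-context tool, so there is no accounting tool for the model to pick by mistake.

```python
from dataclasses import dataclass
from decimal import Decimal


@dataclass(frozen=True)
class CustomerRefund:
    refund_id: str
    order_id: str
    amount: Decimal


@dataclass(frozen=True)
class AccountingEntry:
    ref_id: str
    amount: Decimal
    reason: str


# Stand-in for the accounting system (hypothetical).
LEDGER: list[AccountingEntry] = []


def to_accounting_entry(refund: CustomerRefund) -> AccountingEntry:
    """Anti-corruption layer: translate customer vocabulary into accounting vocabulary."""
    return AccountingEntry(
        ref_id=refund.refund_id,
        amount=-refund.amount,
        reason="CUSTOMER_REFUND",
    )


def customer_support_refund_partial(order_id: str, amount: Decimal) -> CustomerRefund:
    """Issue a partial refund on a shipped order."""
    refund = CustomerRefund(refund_id=f"rf-{order_id}", order_id=order_id, amount=amount)
    # The accounting side runs as plain code; the model makes no choice here.
    LEDGER.append(to_accounting_entry(refund))
    return refund


# The only surface the agent sees: customer-context tools.
AGENT_TOOLS = {
    "customer_support_refund_partial": customer_support_refund_partial,
}

if __name__ == "__main__":
    AGENT_TOOLS["customer_support_refund_partial"]("o-1", Decimal("240.00"))
    print(LEDGER)
```

In this shape, the worst the agent can do is call the wrong customer-support tool; it cannot reach the ledger with a vocabulary it was never given.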

Where this lands

MCP tools are not API methods. They are vocabulary items in a shared language. Name them by bounded context first, not by operation alone. Designing an MCP server is closer to a schema-design problem than an API-design problem.

Where I would push back on this

This whole experience made me reconsider how I think about tool layers. They are not APIs. They are vocabulary. I have stopped using the word "tool" internally and started saying "verb in our agent's language" because it surfaces the design question better.

The bounded-context naming convention adds verbosity. Engineers will push back. The pushback I would accept: "we do not have multiple bounded contexts to disambiguate." That is true at small scale.

The pushback I would not accept: "descriptions are enough." We tried that for three weeks. They are not. The agent's behavior is shaped first by the names you pick, and only then by the descriptions you write.

If you have shipped multi-context agents without bounded-context naming and the tool-call accuracy held up above 90%, I want to see the architecture. My prior is strong but not absolute.
