<?xml version="1.0" encoding="UTF-8"?>
<rss version="2.0" xmlns:atom="http://www.w3.org/2005/Atom" xmlns:dc="http://purl.org/dc/elements/1.1/">
  <channel>
    <title>DEV Community: James O'Connor</title>
    <description>The latest articles on DEV Community by James O'Connor (@james_oconnor_dev).</description>
    <link>https://dev.to/james_oconnor_dev</link>
    <image>
      <url>https://media2.dev.to/dynamic/image/width=90,height=90,fit=cover,gravity=auto,format=auto/https:%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Fuser%2Fprofile_image%2F3940356%2F8b7ecfc2-82da-41e4-b887-959272780323.png</url>
      <title>DEV Community: James O'Connor</title>
      <link>https://dev.to/james_oconnor_dev</link>
    </image>
    <atom:link rel="self" type="application/rss+xml" href="https://dev.to/feed/james_oconnor_dev"/>
    <language>en</language>
    <item>
      <title>Bounded retries for agent tool calls: the budget that stopped our infinite-loop incidents</title>
      <dc:creator>James O'Connor</dc:creator>
      <pubDate>Mon, 15 Jun 2026 15:11:50 +0000</pubDate>
      <link>https://dev.to/james_oconnor_dev/bounded-retries-for-agent-tool-calls-the-budget-that-stopped-our-infinite-loop-incidents-4354</link>
      <guid>https://dev.to/james_oconnor_dev/bounded-retries-for-agent-tool-calls-the-budget-that-stopped-our-infinite-loop-incidents-4354</guid>
      <description>&lt;h2&gt;
  
  
  The worst incident our agent caused was not a wrong answer. It was a loop.
&lt;/h2&gt;

&lt;p&gt;The worst incident our agent ever caused was not a wrong answer. It was a loop. A tool call failed, the agent retried, the retry failed the same way, and it kept going, burning tokens and hammering a downstream API a few hundred times in a minute before anything stopped it. The agent was doing exactly what we had told it: if a tool fails, try again. We just never told it when to stop.&lt;/p&gt;

&lt;p&gt;Retries are the right instinct. A transient failure should be retried. The problem is that an agent does not reliably distinguish "transient" from "this will fail every time," and left to its own judgment it will cheerfully retry a permanently broken call until something external kills it. Human-written code learned this lesson decades ago: every retry loop has a bound. Agent tool-calling quietly forgot it, because the retry decision moved from a for-loop you can see into the model's reasoning, where it is invisible and unbounded.&lt;/p&gt;

&lt;p&gt;So we put the bound back, outside the model. Two budgets: a per-call retry cap (this specific tool call gets N attempts, then it is a hard failure the agent must handle differently), and a per-session attempt budget (the whole task gets a ceiling on total tool calls, after which we stop and escalate rather than let it spin).&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight python"&gt;&lt;code&gt;&lt;span class="k"&gt;class&lt;/span&gt; &lt;span class="nc"&gt;ToolBudget&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;
    &lt;span class="k"&gt;def&lt;/span&gt; &lt;span class="nf"&gt;__init__&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;self&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;per_call&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="mi"&gt;2&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;per_session&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="mi"&gt;40&lt;/span&gt;&lt;span class="p"&gt;):&lt;/span&gt;
        &lt;span class="n"&gt;self&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;per_call&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;self&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;per_session&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;per_call&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;per_session&lt;/span&gt;
        &lt;span class="n"&gt;self&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;session_calls&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="mi"&gt;0&lt;/span&gt;
    &lt;span class="k"&gt;def&lt;/span&gt; &lt;span class="nf"&gt;check&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;self&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;tool&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;attempt&lt;/span&gt;&lt;span class="p"&gt;):&lt;/span&gt;
        &lt;span class="n"&gt;self&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;session_calls&lt;/span&gt; &lt;span class="o"&gt;+=&lt;/span&gt; &lt;span class="mi"&gt;1&lt;/span&gt;
        &lt;span class="k"&gt;if&lt;/span&gt; &lt;span class="n"&gt;attempt&lt;/span&gt; &lt;span class="o"&gt;&amp;gt;&lt;/span&gt; &lt;span class="n"&gt;self&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;per_call&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;
            &lt;span class="k"&gt;raise&lt;/span&gt; &lt;span class="nc"&gt;ToolGivingUp&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="sa"&gt;f&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="si"&gt;{&lt;/span&gt;&lt;span class="n"&gt;tool&lt;/span&gt;&lt;span class="si"&gt;}&lt;/span&gt;&lt;span class="s"&gt;: &lt;/span&gt;&lt;span class="si"&gt;{&lt;/span&gt;&lt;span class="n"&gt;attempt&lt;/span&gt;&lt;span class="si"&gt;}&lt;/span&gt;&lt;span class="s"&gt; attempts, stop retrying, try another path&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
        &lt;span class="k"&gt;if&lt;/span&gt; &lt;span class="n"&gt;self&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;session_calls&lt;/span&gt; &lt;span class="o"&gt;&amp;gt;&lt;/span&gt; &lt;span class="n"&gt;self&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;per_session&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;
            &lt;span class="k"&gt;raise&lt;/span&gt; &lt;span class="nc"&gt;SessionGivingUp&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;tool budget exhausted, escalate to a human&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;The important part is not the cap, it is what happens at the cap. A retry that just stops leaves the agent stuck. So when the per-call budget is exhausted we hand the agent a specific message ("this call has failed twice, do not retry it, try a different approach or ask for help"), which turns a dead loop into a decision. Most of the time the agent then does something sensible, because now it knows retrying is off the table.&lt;/p&gt;
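&lt;p&gt;A minimal sketch of that hand-off. The names &lt;code&gt;call_with_cap&lt;/code&gt; and &lt;code&gt;TransientToolError&lt;/code&gt;, and the shape of the returned dict, are hypothetical and chosen only for illustration. What it shows: hitting the cap produces an instruction the agent can read, not just a stopped loop.&lt;/p&gt;

```python
from typing import Any, Callable


class TransientToolError(Exception):
    """Hypothetical error type a tool raises when a retry might help."""


def call_with_cap(tool: Callable[..., Any], name: str, max_attempts: int = 2, **kwargs: Any) -> dict:
    """Run a tool at most max_attempts times, then return a message instead of retrying."""
    last_error = ""
    for attempt in range(1, max_attempts + 1):
        try:
            return {"ok": True, "result": tool(**kwargs), "attempts": attempt}
        except TransientToolError as exc:
            last_error = str(exc)
    # Cap reached: tell the agent what happened and what it may do next.
    return {
        "ok": False,
        "attempts": max_attempts,
        "message": (
            f"{name} has failed {max_attempts} times ({last_error}). "
            "Do not retry it. Try a different approach or ask for help."
        ),
    }


def always_down(order_id: str) -> str:
    """Stand-in tool that fails on every call."""
    raise TransientToolError("upstream timeout")


outcome = call_with_cap(always_down, "lookup_order", order_id="A123")
print(outcome["message"])
```

&lt;p&gt;The returned message goes back to the agent as the tool result, so the next step it plans starts from "retrying is off the table" rather than from a bare error.&lt;/p&gt;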

&lt;p&gt;Tool-misuse loops in our logs went from a handful of nasty incidents a month to basically none, and the few that remain hit the session budget and escalate cleanly instead of paging someone at 2am about a runaway API bill.&lt;/p&gt;

&lt;p&gt;The tension I have not fully resolved: a per-session budget that is too tight kills legitimately long tasks (a genuine multi-step workflow can need a lot of tool calls), and one that is too loose lets a slow loop run up real cost before it trips. We set ours from the 95th percentile of healthy task lengths and pad it, which is empirical and a little arbitrary. If you have found a non-arbitrary way to bound agent tool-call budgets, that is the comment I am reading.&lt;/p&gt;
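&lt;p&gt;For concreteness, here is one way the percentile-plus-padding calculation could look. The nearest-rank method, the 1.25 padding factor, and the sample data are all assumptions made for this sketch, not measured values.&lt;/p&gt;

```python
import math


def session_budget(healthy_lengths: list[int], percentile: int = 95, padding: float = 1.25) -> int:
    """Nearest-rank percentile of healthy task lengths, scaled by a padding factor."""
    if not healthy_lengths:
        raise ValueError("need at least one healthy task length")
    ordered = sorted(healthy_lengths)
    # 1-based nearest rank, computed with integer ceiling division.
    rank = max(1, (percentile * len(ordered) + 99) // 100)
    return math.ceil(ordered[rank - 1] * padding)


# Illustrative data only: tool-call counts from 20 healthy sessions.
lengths = [4, 5, 5, 6, 6, 7, 7, 8, 8, 9, 10, 11, 12, 13, 15, 17, 19, 22, 26, 31]
print(session_budget(lengths))
```

&lt;p&gt;The padding factor is where the arbitrariness lives: the percentile comes from data, the multiplier is a judgment call.&lt;/p&gt;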

</description>
      <category>ai</category>
      <category>agents</category>
      <category>llm</category>
      <category>programming</category>
    </item>
    <item>
      <title>We version our tool schemas like an API contract, because the agent is a consumer</title>
      <dc:creator>James O'Connor</dc:creator>
      <pubDate>Mon, 15 Jun 2026 05:40:22 +0000</pubDate>
      <link>https://dev.to/james_oconnor_dev/we-version-our-tool-schemas-like-an-api-contract-because-the-agent-is-a-consumer-43fc</link>
      <guid>https://dev.to/james_oconnor_dev/we-version-our-tool-schemas-like-an-api-contract-because-the-agent-is-a-consumer-43fc</guid>
      <description>&lt;p&gt;TL;DR: We changed a tool's return schema, shipped it, and watched about 1 in 5 of that tool's calls start failing downstream, even though every call still validated. The schema was internal, so nobody treated the change like a breaking API change. But the agent is a consumer of that schema exactly like an external client is, and we had just shipped a breaking change to a consumer with no version, no deprecation, no migration. Now we version tool schemas like the public contracts they actually are.&lt;/p&gt;

&lt;h2&gt;
  
  
  The agent is a client you forgot you had
&lt;/h2&gt;

&lt;p&gt;When a human team consumes your API, you version it, you deprecate fields with notice, you do not rename a field on a Tuesday. When the consumer is your own agent, all of that discipline evaporates, because the schema lives in your repo and feels internal. It is not internal. The agent was trained, prompted, or wired against the old shape, and renaming &lt;code&gt;order_id&lt;/code&gt; to &lt;code&gt;orderId&lt;/code&gt; is as breaking for it as for any client, just quieter, because it fails as worse decisions rather than a 400.&lt;/p&gt;

&lt;h2&gt;
  
  
  What a breaking change looks like to an agent
&lt;/h2&gt;

&lt;p&gt;This is the part that makes it sneaky. A human client gets a hard error on a missing field. An agent gets a &lt;code&gt;None&lt;/code&gt;, treats it as "the value is absent," and makes a plausible wrong decision with a perfectly valid-looking tool call. Our validation passed every time. The damage showed up two hops later as the agent choosing the wrong branch because a field it depended on had silently moved.&lt;/p&gt;

&lt;h2&gt;
  
  
  How we version them now
&lt;/h2&gt;



&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight python"&gt;&lt;code&gt;&lt;span class="nd"&gt;@dataclass&lt;/span&gt;
&lt;span class="k"&gt;class&lt;/span&gt; &lt;span class="nc"&gt;ToolSchemaV2&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;
    &lt;span class="n"&gt;version&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="nb"&gt;str&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;2&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;
    &lt;span class="n"&gt;order_id&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="nb"&gt;str&lt;/span&gt;          &lt;span class="c1"&gt;# kept; do NOT rename without a major bump
&lt;/span&gt;    &lt;span class="n"&gt;status&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="nb"&gt;str&lt;/span&gt;
    &lt;span class="c1"&gt;# added in v2, optional so v1 consumers still parse
&lt;/span&gt;    &lt;span class="n"&gt;refund_window_closes_at&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="n"&gt;date&lt;/span&gt; &lt;span class="o"&gt;|&lt;/span&gt; &lt;span class="bp"&gt;None&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="bp"&gt;None&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;Rules we adopted, lifted straight from API practice: additive changes are a minor bump, and every added field is optional; renames and removals are a major bump, with both shapes served during a deprecation window; the agent's expected schema version is pinned and checked, so a mismatch is a loud failure at the boundary instead of a quiet wrong decision two hops later.&lt;/p&gt;
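&lt;p&gt;The "pinned and checked" rule can be sketched as a small guard at the tool boundary. &lt;code&gt;PINNED_MAJOR&lt;/code&gt;, &lt;code&gt;check_version&lt;/code&gt;, and &lt;code&gt;SchemaVersionMismatch&lt;/code&gt; are hypothetical names, and the sketch assumes the payload is a dict carrying a &lt;code&gt;version&lt;/code&gt; string such as "2" or "2.1".&lt;/p&gt;

```python
class SchemaVersionMismatch(Exception):
    """Raised at the tool boundary when the payload's major version is not the pinned one."""


PINNED_MAJOR = 2  # hypothetical: the major version this agent was prompted against


def major_of(version: str) -> int:
    """Return the major component of a version such as '2' or '2.1'."""
    return int(version.split(".")[0])


def check_version(payload: dict, pinned_major: int = PINNED_MAJOR) -> dict:
    """Pass the payload through unchanged, or fail loudly on a major mismatch."""
    version = payload.get("version")
    if version is None:
        raise SchemaVersionMismatch("payload carries no version field")
    if major_of(version) != pinned_major:
        raise SchemaVersionMismatch(
            f"agent is pinned to v{pinned_major}, tool returned v{version}"
        )
    return payload


# A minor bump passes; a major bump would raise before the agent sees the payload.
check_version({"version": "2.1", "order_id": "A123", "status": "shipped"})
```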

&lt;h2&gt;
  
  
  The open question
&lt;/h2&gt;

&lt;p&gt;Deprecation windows assume you can run two shapes at once, which is fine for fields but hard for semantics: if the &lt;em&gt;meaning&lt;/em&gt; of a field changes rather than its name, there is no optional-field trick that saves you. I do not have a clean way to version a semantic change to a tool's contract, only structural ones. If you have versioned a meaning change to an agent's tool cleanly, that is the comment I want to read.&lt;/p&gt;

</description>
      <category>llm</category>
      <category>python</category>
      <category>api</category>
    </item>
    <item>
      <title>We logged every rejected tool call for a month. A third were our validation being wrong, not the model.</title>
      <dc:creator>James O'Connor</dc:creator>
      <pubDate>Mon, 15 Jun 2026 05:33:09 +0000</pubDate>
      <link>https://dev.to/james_oconnor_dev/we-logged-every-rejected-tool-call-for-a-month-a-third-were-our-validation-being-wrong-not-the-3nm1</link>
      <guid>https://dev.to/james_oconnor_dev/we-logged-every-rejected-tool-call-for-a-month-a-third-were-our-validation-being-wrong-not-the-3nm1</guid>
      <description>&lt;p&gt;TL;DR: Everyone logs tool calls that error or return junk. We started logging the calls our own validation REJECTED before they ever ran. Over a month, about 1 in 3 of those rejections were false: a valid user intent our schema or precheck was too rigid to accept. We had spent weeks hardening the guardrail and never checked whether it was now blocking real work.&lt;/p&gt;

&lt;h2&gt;
  
  
  The blind spot in "we added validation"
&lt;/h2&gt;

&lt;p&gt;After an incident where our agent made a structurally valid but wrong tool call, we added a precheck layer in front of every state-mutating tool. Failures dropped, we moved on. What we did not log was the other side of the ledger: every time the precheck said no. A block felt like a success by definition. The agent tried something bad, we stopped it, good.&lt;/p&gt;

&lt;p&gt;Then support started forwarding tickets where the agent refused something the user was clearly allowed to do.&lt;/p&gt;

&lt;h2&gt;
  
  
  What the rejection log showed
&lt;/h2&gt;

&lt;p&gt;So we logged every rejection with three fields: which check fired, the full arguments, and the user-visible outcome. One month, 612 rejections. We hand-reviewed a sample.&lt;/p&gt;

&lt;p&gt;Roughly a third were false rejections. The pattern was almost always the same: a check written to stop one specific bad case was also catching a legitimate neighbouring case nobody thought about when they wrote it. The "is this order in the cancellation window" check rejected legitimate cancellations on orders whose timezone put them one hour outside a window they were actually inside. The "does this id exist in retrieved context" check rejected valid ids that arrived through a second tool the author had not considered.&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight python"&gt;&lt;code&gt;&lt;span class="k"&gt;def&lt;/span&gt; &lt;span class="nf"&gt;run_check&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;check&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;args&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;ctx&lt;/span&gt;&lt;span class="p"&gt;):&lt;/span&gt;
    &lt;span class="n"&gt;failures&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="nf"&gt;check&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;args&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;ctx&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
    &lt;span class="n"&gt;log&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;info&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;tool_precheck&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;extra&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="p"&gt;{&lt;/span&gt;
        &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;check&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="n"&gt;check&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;__name__&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
        &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;rejected&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="nf"&gt;bool&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;failures&lt;/span&gt;&lt;span class="p"&gt;),&lt;/span&gt;
        &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;reasons&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="n"&gt;failures&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
        &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;args&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="nf"&gt;redact&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;args&lt;/span&gt;&lt;span class="p"&gt;),&lt;/span&gt;
        &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;outcome&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;blocked&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt; &lt;span class="k"&gt;if&lt;/span&gt; &lt;span class="n"&gt;failures&lt;/span&gt; &lt;span class="k"&gt;else&lt;/span&gt; &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;passed&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
    &lt;span class="p"&gt;})&lt;/span&gt;
    &lt;span class="k"&gt;return&lt;/span&gt; &lt;span class="n"&gt;failures&lt;/span&gt;
&lt;span class="c1"&gt;# the 'rejected' branch is the one nobody reads. read it weekly.
&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;h2&gt;
  
  
  What we changed
&lt;/h2&gt;

&lt;p&gt;Two things. First, a weekly fifteen-minute review of a sample of rejections, same as we review errors. False rejections get the check loosened or split. Second, checks now fail with a specific reason string the agent can act on, not a generic block, so a too-strict check often self-corrects: the agent reads "outside cancellation window by your local timezone" and escalates instead of dead-ending.&lt;/p&gt;

&lt;p&gt;False rejections fell from a third to under a tenth over six weeks. The number that matters more: support tickets about the agent refusing valid requests basically stopped.&lt;/p&gt;

&lt;h2&gt;
  
  
  The tension I have not resolved
&lt;/h2&gt;

&lt;p&gt;Every loosened check is a check that now lets more through, which is the exact surface the check was added to close. We have not found a principled way to loosen a guardrail without quietly reopening the hole. Right now we lean on the canonical-examples test (the bad case that prompted the check stays frozen as a must-block), but that only protects against the failures we have already seen.&lt;/p&gt;
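&lt;p&gt;A sketch of what such a canonical-examples test could look like, using the cancellation-window case from earlier. The 24-hour window, the function names, and the timestamps are hypothetical. The idea is that each loosening adds a must-pass case while the original must-block case stays frozen.&lt;/p&gt;

```python
from datetime import datetime, timedelta, timezone

WINDOW = timedelta(hours=24)  # hypothetical cancellation window


def cancellation_failures(placed_at: datetime, now: datetime) -> list[str]:
    """Return reasons to block; an empty list means the check passes."""
    if placed_at.tzinfo is None or now.tzinfo is None:
        return ["timestamps must be timezone-aware"]
    # Aware datetimes subtract as absolute instants, whatever zone each one is in.
    if now - placed_at > WINDOW:
        return ["outside cancellation window"]
    return []


UTC = timezone.utc
PLUS_ONE = timezone(timedelta(hours=1))

# Frozen bad case that prompted the check: must stay blocked after any loosening.
MUST_BLOCK = [
    (datetime(2026, 6, 1, 9, 0, tzinfo=UTC), datetime(2026, 6, 3, 9, 0, tzinfo=UTC)),
]
# False rejection found in review: 24h30m apart on the wall clock, 23h30m in real time.
MUST_PASS = [
    (datetime(2026, 6, 1, 8, 30, tzinfo=UTC), datetime(2026, 6, 2, 9, 0, tzinfo=PLUS_ONE)),
]


def run_canonical_examples() -> bool:
    """True only if every frozen bad case is blocked and every reviewed good case passes."""
    blocked = all(cancellation_failures(p, n) for p, n in MUST_BLOCK)
    passed = all(not cancellation_failures(p, n) for p, n in MUST_PASS)
    return blocked and passed


print(run_canonical_examples())
```

&lt;p&gt;This has the limitation described above: it only pins down the cases already in the two lists.&lt;/p&gt;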

&lt;p&gt;If you run guardrails on an agent: do you measure your false-rejection rate at all, and if so, how do you loosen a check without trusting that a frozen example covers the regression? That is the part I keep getting wrong.&lt;/p&gt;

</description>
      <category>llm</category>
      <category>python</category>
      <category>devops</category>
    </item>
    <item>
      <title>Your schema validation passes and the agent still picks the wrong tool. The bug is semantic.</title>
      <dc:creator>James O'Connor</dc:creator>
      <pubDate>Wed, 10 Jun 2026 04:05:27 +0000</pubDate>
      <link>https://dev.to/james_oconnor_dev/your-schema-validation-passes-and-the-agent-still-picks-the-wrong-tool-the-bug-is-semantic-2i41</link>
      <guid>https://dev.to/james_oconnor_dev/your-schema-validation-passes-and-the-agent-still-picks-the-wrong-tool-the-bug-is-semantic-2i41</guid>
      <description>&lt;p&gt;Pydantic and JSON-schema guarantee the shape of a tool call. They say nothing about whether it was the right call for the user's intent.&lt;/p&gt;

&lt;p&gt;TL;DR: We put strict Pydantic validation on every tool call our agent makes, expecting tool-call failures to drop. They barely did. When I categorized 40 logged failures, 31 of them passed schema validation cleanly. They were well-formed calls to the wrong tool, or the right tool with arguments that were valid types but wrong values. Schema validation catches structural errors. Our actual problem was semantic, and the validator is blind to it.&lt;/p&gt;

&lt;h2&gt;
  
  
  What schema validation actually guarantees
&lt;/h2&gt;

&lt;p&gt;Pydantic checks types, required fields, enums, ranges. A call like &lt;code&gt;cancel_order(order_id="A123")&lt;/code&gt; is structurally perfect even when the user asked to cancel a subscription, not an order. The validator passes it. The user is still angry. Shape is not intent.&lt;/p&gt;

&lt;h2&gt;
  
  
  The 40-failure breakdown
&lt;/h2&gt;

&lt;p&gt;Of 40 tool-call failures we logged over a few weeks:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;9 were real schema violations the validator caught (working as intended).&lt;/li&gt;
&lt;li&gt;18 were the wrong tool for the intent, all schema-valid.&lt;/li&gt;
&lt;li&gt;13 were the right tool with a semantically wrong argument (valid type, wrong value).&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;So 31 of 40 sailed through validation. The thing we added to fix tool-call failures addressed less than a quarter of them.&lt;/p&gt;

&lt;h2&gt;
  
  
  A cheap semantic precheck that helped
&lt;/h2&gt;

&lt;p&gt;After structural validation passes, run a deterministic check that the call's preconditions match the resolved state. No model required.&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight python"&gt;&lt;code&gt;&lt;span class="k"&gt;def&lt;/span&gt; &lt;span class="nf"&gt;precheck&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;call&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;ctx&lt;/span&gt;&lt;span class="p"&gt;):&lt;/span&gt;
    &lt;span class="c1"&gt;# structural validation already passed; now check semantics vs resolved state
&lt;/span&gt;    &lt;span class="k"&gt;if&lt;/span&gt; &lt;span class="n"&gt;call&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;tool&lt;/span&gt; &lt;span class="o"&gt;==&lt;/span&gt; &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;cancel_order&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;
        &lt;span class="n"&gt;o&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;ctx&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;orders&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;get&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;call&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;args&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;order_id&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
        &lt;span class="k"&gt;return&lt;/span&gt; &lt;span class="n"&gt;o&lt;/span&gt; &lt;span class="ow"&gt;is&lt;/span&gt; &lt;span class="ow"&gt;not&lt;/span&gt; &lt;span class="bp"&gt;None&lt;/span&gt; &lt;span class="ow"&gt;and&lt;/span&gt; &lt;span class="n"&gt;o&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;user_id&lt;/span&gt; &lt;span class="o"&gt;==&lt;/span&gt; &lt;span class="n"&gt;ctx&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;user_id&lt;/span&gt; &lt;span class="ow"&gt;and&lt;/span&gt; &lt;span class="n"&gt;o&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;status&lt;/span&gt; &lt;span class="ow"&gt;in&lt;/span&gt; &lt;span class="n"&gt;CANCELABLE&lt;/span&gt;
    &lt;span class="c1"&gt;# ... one branch per destructive tool
&lt;/span&gt;    &lt;span class="k"&gt;return&lt;/span&gt; &lt;span class="bp"&gt;True&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;This killed the 13 wrong-argument cases almost entirely: the id was valid as a string but did not resolve to a cancelable order owned by this user.&lt;/p&gt;

&lt;h2&gt;
  
  
  The case this does not solve
&lt;/h2&gt;

&lt;p&gt;The wrong-tool-for-intent bucket (the 18) is harder. Detecting that the agent chose cancel_order when the user meant cancel_subscription is itself an intent-understanding problem, and using another model to judge it just inherits the same blind spot. We stopped trying to verify intent automatically for destructive tools and put a one-line confirmation step in front of them instead.&lt;/p&gt;
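&lt;p&gt;A confirmation gate of that kind could be sketched like this. &lt;code&gt;DESTRUCTIVE_TOOLS&lt;/code&gt;, &lt;code&gt;gated_call&lt;/code&gt;, and the &lt;code&gt;run&lt;/code&gt; and &lt;code&gt;ask&lt;/code&gt; callables are hypothetical stand-ins for whatever executor and UI prompt a real system has.&lt;/p&gt;

```python
from typing import Callable

DESTRUCTIVE_TOOLS = {"cancel_order", "cancel_subscription"}  # hypothetical registry


def confirmation_prompt(tool: str, args: dict) -> str:
    """One line the user can say yes or no to, naming the tool and its target."""
    rendered = ", ".join(f"{key}={value}" for key, value in sorted(args.items()))
    return f"About to run {tool}({rendered}). Proceed? [y/N]"


def gated_call(
    tool: str,
    args: dict,
    run: Callable[[str, dict], object],
    ask: Callable[[str], str],
) -> dict:
    """Run non-destructive tools directly; destructive ones only after an explicit 'y'."""
    if tool in DESTRUCTIVE_TOOLS:
        answer = ask(confirmation_prompt(tool, args)).strip().lower()
        if answer != "y":
            return {"ran": False, "reason": "user declined"}
    return {"ran": True, "result": run(tool, args)}


# Usage with stand-ins for the real executor and the real UI prompt.
result = gated_call(
    "cancel_order",
    {"order_id": "A123"},
    run=lambda tool, args: "cancelled",
    ask=lambda question: "n",
)
print(result)
```

&lt;p&gt;Because the prompt names the tool, a user who wanted to cancel a subscription sees "cancel_order" and can say no, which is the intent check the validator cannot do.&lt;/p&gt;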

&lt;h2&gt;
  
  
  Open question
&lt;/h2&gt;

&lt;p&gt;How do you test that an agent picked the right tool, not just a well-formed one, without leaning on an LLM judge that shares the failure mode? The precheck handles arguments. Tool selection itself I still gate behind a human-style confirmation, which feels like admitting defeat.&lt;/p&gt;

&lt;h2&gt;
  
  
  FAQ
&lt;/h2&gt;

&lt;p&gt;&lt;strong&gt;Doesn't an LLM judge catch the wrong-tool case?&lt;/strong&gt; Sometimes, but it misreads intent the same way the agent did, so we do not trust it on the destructive path.&lt;br&gt;
&lt;strong&gt;Which model?&lt;/strong&gt; No specific one: the agent and any judge should be from different model families, but the precheck above is model-agnostic on purpose.&lt;/p&gt;

</description>
      <category>ai</category>
      <category>llm</category>
      <category>python</category>
      <category>programming</category>
    </item>
    <item>
      <title>Validate your Pydantic schema before the LLM call, not after.</title>
      <dc:creator>James O'Connor</dc:creator>
      <pubDate>Tue, 09 Jun 2026 08:12:31 +0000</pubDate>
      <link>https://dev.to/james_oconnor_dev/validate-your-pydantic-schema-before-the-llm-call-not-after-5c5c</link>
      <guid>https://dev.to/james_oconnor_dev/validate-your-pydantic-schema-before-the-llm-call-not-after-5c5c</guid>
      <description>&lt;p&gt;A small change that cut our schema-related retries: validate the Pydantic model before sending the request, not after the LLM responds. The usual flow is call, parse, catch the validation error, retry. That burns a full token budget before you learn the schema was wrong. Instead we instantiate the target model with dummy data at boot and on every schema change, and dry-parse a known-good example before the real call. If the schema itself is broken (a bad discriminator, a wrong field type, a renamed enum) it fails in CI or at boot, not on a paid call. Two lines, zero runtime cost, and it caught about 60 percent of our schema bugs before they reached the model. The other 40 percent are genuine model failures, and those are the ones worth retrying. Separating 'my schema is wrong' from 'the model is wrong' is the whole point. Most retry loops conflate them and pay for it in tokens.&lt;/p&gt;

</description>
      <category>python</category>
      <category>ai</category>
      <category>llm</category>
    </item>
    <item>
      <title>MCP tool naming: 6 patterns ranked by how well they survive a refactor</title>
      <dc:creator>James O'Connor</dc:creator>
      <pubDate>Fri, 05 Jun 2026 05:29:55 +0000</pubDate>
      <link>https://dev.to/james_oconnor_dev/mcp-tool-naming-6-patterns-ranked-by-how-well-they-survive-a-refactor-2gf3</link>
      <guid>https://dev.to/james_oconnor_dev/mcp-tool-naming-6-patterns-ranked-by-how-well-they-survive-a-refactor-2gf3</guid>
      <description>&lt;p&gt;What happens to your agent when the team renames users to accounts, and why the tool name you picked six months ago decides whether anything breaks&lt;/p&gt;

&lt;p&gt;TL;DR: I have shipped MCP servers where the tool names were a thin shell over the underlying REST API, and I have shipped servers where the names came from the domain model instead. The domain-named ones survived backend refactors with close to zero churn. The pass-through ones broke every time someone renamed a table or split a service. After ranking six naming patterns on two axes (how well a name survives a refactor of the system underneath it, and how cleanly the model can pick the right tool), my house pick is ubiquitous-language naming inside a bounded-context prefix, with Pydantic discriminated-union return types doing the schema work. The one-line version of my opinion: Domain-Driven Design plus tool-use schemas is the production fix for agents. The MCP layer is where the anti-corruption layer belongs, not an afterthought you bolt on later.&lt;/p&gt;

&lt;p&gt;A quick scoping note. I am going to use a refund as a naming example throughout, because everyone has a mental model of what a refund is. I am not describing a system that lets a model issue refunds on its own. The refund here is a stand-in for any operation whose name you have to choose. Treat it as a label, not as a production design I am endorsing.&lt;/p&gt;

&lt;h2&gt;
  
  
  The two axes
&lt;/h2&gt;

&lt;p&gt;Refactor-survival is whether the name stays correct when the system underneath it changes. A name with high refactor-survival describes something stable (an intent in the business domain) rather than something volatile (the current shape of the database or the current REST route).&lt;/p&gt;

&lt;p&gt;LLM-selection clarity is whether the model reliably picks the right tool from the name and description alone. Names that are too clever, too abstract, or too collision-prone make the model hesitate or pick wrong. This axis sometimes rewards the literal names that the first axis punishes, which is the whole tension.&lt;/p&gt;

&lt;div class="table-wrapper-paragraph"&gt;&lt;table&gt;
&lt;thead&gt;
&lt;tr&gt;
&lt;th&gt;#&lt;/th&gt;
&lt;th&gt;Pattern&lt;/th&gt;
&lt;th&gt;Example&lt;/th&gt;
&lt;th&gt;Refactor-survival&lt;/th&gt;
&lt;th&gt;LLM-selection clarity&lt;/th&gt;
&lt;/tr&gt;
&lt;/thead&gt;
&lt;tbody&gt;
&lt;tr&gt;
&lt;td&gt;1&lt;/td&gt;
&lt;td&gt;Pass-through (mirrors the API)&lt;/td&gt;
&lt;td&gt;create_user, delete_user&lt;/td&gt;
&lt;td&gt;Low&lt;/td&gt;
&lt;td&gt;High&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;2&lt;/td&gt;
&lt;td&gt;Verb-first&lt;/td&gt;
&lt;td&gt;process_refund&lt;/td&gt;
&lt;td&gt;Medium&lt;/td&gt;
&lt;td&gt;High&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;3&lt;/td&gt;
&lt;td&gt;Noun-first / resource-dot-action&lt;/td&gt;
&lt;td&gt;refund.create&lt;/td&gt;
&lt;td&gt;Medium&lt;/td&gt;
&lt;td&gt;Medium&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;4&lt;/td&gt;
&lt;td&gt;Domain-prefix + bounded context&lt;/td&gt;
&lt;td&gt;billing.refund.process&lt;/td&gt;
&lt;td&gt;High&lt;/td&gt;
&lt;td&gt;Medium-High&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;5&lt;/td&gt;
&lt;td&gt;Ubiquitous-language naming&lt;/td&gt;
&lt;td&gt;deactivate_account (not delete)&lt;/td&gt;
&lt;td&gt;High&lt;/td&gt;
&lt;td&gt;High&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;6&lt;/td&gt;
&lt;td&gt;Schema-first (Pydantic discriminated union)&lt;/td&gt;
&lt;td&gt;name plus typed return&lt;/td&gt;
&lt;td&gt;High&lt;/td&gt;
&lt;td&gt;High (with caveats)&lt;/td&gt;
&lt;/tr&gt;
&lt;/tbody&gt;
&lt;/table&gt;&lt;/div&gt;

&lt;p&gt;The ranking I defend, best to worst on the combined axes: 5, then 6, then 4, then 2, then 3, then 1. Patterns 5 and 6 are not rivals. You use them together.&lt;/p&gt;

&lt;h2&gt;
  
  
  1. Pass-through naming
&lt;/h2&gt;

&lt;p&gt;The backend has POST /users, PATCH /users/{id}, DELETE /users/{id}, so the tools become create_user, update_user, delete_user. Fast to write, the model reads it fine. Failure mode: the name is welded to the current API surface. The day someone decides a user is really an account, the underlying route changes and your tool name is now a lie. If you rename the tool, every prompt and eval fixture and agent that learned create_user has to be updated. If you do not, new engineers read create_user and go looking for a users table that no longer exists. High on clarity today, low on survival tomorrow.&lt;/p&gt;

&lt;h2&gt;
  
  
  2. Verb-first
&lt;/h2&gt;

&lt;p&gt;process_refund, cancel_subscription, send_invoice. The action leads, which reads well because tool selection is fundamentally a verb-matching problem. A good verb often describes intent rather than transport, so a backend refactor can leave the name intact. Failure mode: verbs collide and drift as the surface grows. You add process_payment, process_payout, process_chargeback, and now process carries four meanings. process is a weak verb, so people reach for it whenever they cannot think of the precise word. Medium survival, high clarity that decays as the tool count climbs.&lt;/p&gt;
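&lt;p&gt;The collision problem is easy to check for mechanically. A minimal sketch, assuming verb-first snake_case names: &lt;code&gt;find_verb_collisions&lt;/code&gt; and its &lt;code&gt;min_count&lt;/code&gt; threshold are hypothetical names for illustration, not part of any MCP library.&lt;/p&gt;

```python
from collections import defaultdict


def find_verb_collisions(tool_names: list[str], min_count: int = 3) -> dict[str, list[str]]:
    """Group tool names by leading verb and return the verbs that are getting crowded."""
    by_verb: dict[str, list[str]] = defaultdict(list)
    for name in tool_names:
        # Assumption: the verb is everything before the first underscore.
        verb = name.split("_", 1)[0]
        by_verb[verb].append(name)
    return {verb: names for verb, names in by_verb.items() if len(names) >= min_count}


tools = [
    "process_refund",
    "process_payment",
    "process_payout",
    "process_chargeback",
    "cancel_subscription",
    "send_invoice",
]
collisions = find_verb_collisions(tools)
print(collisions)
```

&lt;p&gt;Run against the list above, only process is flagged. The threshold is a judgment call: two tools sharing a verb is often fine, four is usually a sign the verb has stopped meaning anything.&lt;/p&gt;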

&lt;h2&gt;
  
  
  3. Noun-first, resource-dot-action
&lt;/h2&gt;

&lt;p&gt;refund.create, subscription.cancel. Groups nicely for a human browsing the list. Failure mode: it optimizes for the resource taxonomy at the cost of the thing the model is best at, verb matching. With the action second, the model reads refund first and has to hold it before the verb that tells it what to do. The grouping also tempts you into CRUD-over-resources when the domain operation you want is richer than create/update/delete. The pattern I reach for least.&lt;/p&gt;

&lt;h2&gt;
  
  
  4. Domain-prefix with a bounded context
&lt;/h2&gt;

&lt;p&gt;billing.refund.process, identity.account.deactivate. The first segment names the bounded context (the term is from Domain-Driven Design): the part of the business this tool lives in. Namespacing that means something: a billing.transfer and an inventory.transfer stay apart for both model and reader. And a bounded context is a deliberately stable concept, so the prefix survives refactors that wreck pass-through names. Failure mode: the prefix is only as good as your context boundaries, and most teams have not drawn them. If billing and payments and invoicing are three overlapping prefixes nobody can distinguish, you have added ceremony without clarity. The other failure is verbosity. Used with real boundaries it is excellent. Used as decoration it is worse than a flat verb-first name.&lt;/p&gt;

&lt;h2&gt;
  
  
  5. Ubiquitous-language naming
&lt;/h2&gt;

&lt;p&gt;The one I will defend hardest. Name the tool after the word the domain experts actually use, not the database verb. Do not call it delete_account. In almost every real billing or identity domain you do not delete an account, you deactivate it or close it, and the row sticks around for compliance and audit. So the tool is deactivate_account, and that name encodes a true fact about the domain that delete hides. Two good things happen: the model gets a precise verb (deactivate is far less collision-prone than delete or process), and the name stays correct across refactors because the business meaning does not change when you swap the database. Failure mode: it requires that a ubiquitous language actually exists, and on a lot of teams it does not, or there are three competing dialects. And there is a discipline cost: someone has to resist the easy delete and insist on the accurate deactivate in code review, every time.&lt;/p&gt;

&lt;h2&gt;
  
  
  6. Schema-first with Pydantic discriminated unions
&lt;/h2&gt;

&lt;p&gt;The first five patterns are about the name. This one is about the return type, and it is where the leverage actually is. A vague return schema (a bare dict, a stringified blob) undoes all the discipline you put into the name. A discriminated union lets one tool return several clearly-distinguished outcomes, each with its own typed shape, tagged by a literal field the model can branch on.&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight python"&gt;&lt;code&gt;from typing import Literal, Annotated, Union
from decimal import Decimal

from mcp.server.fastmcp import FastMCP
from pydantic import BaseModel, Field

# The server name is illustrative.
mcp = FastMCP("billing-server")

class RefundIssued(BaseModel):
    status: Literal["issued"] = "issued"
    refund_id: str
    amount: Decimal
    currency: str = Field(min_length=3, max_length=3)

class RefundPending(BaseModel):
    status: Literal["pending_review"] = "pending_review"
    request_id: str
    reason: str

class RefundRejected(BaseModel):
    status: Literal["rejected"] = "rejected"
    code: Literal["already_refunded", "outside_window", "amount_exceeds_charge"]
    message: str

# The literal status field is the tag that selects the variant.
RefundOutcome = Annotated[
    Union[RefundIssued, RefundPending, RefundRejected],
    Field(discriminator="status"),
]

class RefundResult(BaseModel):
    outcome: RefundOutcome

@mcp.tool()
def request_account_refund(charge_id: str, amount: Decimal) -&amp;gt; RefundResult:
    """Request a refund against a charge. Returns one of three outcomes:
    issued, pending_review, or rejected (with a reason code)."""
    ...
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;

&lt;p&gt;The model reads three named outcomes with three shapes, branches on status, and never parses prose to find out what happened. Failure mode: discriminated unions are easy to over-build. Nine variants where three would do is a decision tree the model did not need. The other trap is letting the union drift out of sync with reality, at which point the model gets a typed promise the system cannot keep, which is worse than an honest untyped blob.&lt;/p&gt;
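&lt;p&gt;To make the branching concrete, here is a rough sketch of a consumer that reads only the tag and never parses prose. It works on plain dicts so it runs without any dependencies. &lt;code&gt;describe_refund_outcome&lt;/code&gt; and the sample payloads are hypothetical; the field names follow the three variants described above.&lt;/p&gt;

```python
def describe_refund_outcome(outcome: dict) -> str:
    """Pick the next step from a refund outcome payload, branching only on `status`."""
    status = outcome["status"]
    if status == "issued":
        return f"Refund {outcome['refund_id']} issued for {outcome['amount']} {outcome['currency']}."
    if status == "pending_review":
        return f"Request {outcome['request_id']} is waiting on review: {outcome['reason']}."
    if status == "rejected":
        return f"Rejected ({outcome['code']}): {outcome['message']}"
    # An unknown tag means the union and its consumer have drifted apart.
    raise ValueError(f"unknown refund status: {status!r}")


# Hypothetical payloads, shaped like the typed variants.
issued = {"status": "issued", "refund_id": "re_123", "amount": "25.00", "currency": "USD"}
rejected = {
    "status": "rejected",
    "code": "outside_window",
    "message": "The charge is older than the refund window.",
}

print(describe_refund_outcome(issued))
print(describe_refund_outcome(rejected))
```

&lt;p&gt;The final raise is the part worth keeping in any real consumer: it turns schema drift into a loud failure instead of a silent wrong branch.&lt;/p&gt;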

&lt;h2&gt;
  
  
  House pick
&lt;/h2&gt;

&lt;p&gt;My default, for any MCP server fronting a system I expect to change, is pattern 5 inside a pattern 4 prefix, with pattern 6 on the return side. Concretely: billing.deactivate_account, returning a typed discriminated union. The reason I keep coming back to it is the anti-corruption layer. In DDD an anti-corruption layer is the translation seam between your clean domain model and a messier external system. An MCP server sits in exactly that position between a model and your backend. If you name tools after the backend, you have no anti-corruption layer, you have a pass-through, and every backend change corrupts the model's view of your system. That is the whole argument for why Domain-Driven Design plus tool-use schemas is the production fix for agents.&lt;/p&gt;
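&lt;p&gt;A minimal sketch of that seam, under loud assumptions: &lt;code&gt;BackendCall&lt;/code&gt;, &lt;code&gt;ROUTES&lt;/code&gt; and &lt;code&gt;resolve&lt;/code&gt; are hypothetical names, and the route is borrowed from the pass-through example earlier. The point is only that the domain-facing name and the backend route live on opposite sides of one table.&lt;/p&gt;

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class BackendCall:
    method: str
    path: str


# Hypothetical routing table: domain-facing tool name -> current backend route.
# When the backend is refactored, this table changes and the tool name does not.
ROUTES = {
    "billing.deactivate_account": BackendCall("PATCH", "/users/{id}"),
}


def resolve(tool_name: str, **params: str) -> BackendCall:
    """Translate a domain-level tool call into the backend request it maps to today."""
    call = ROUTES[tool_name]
    return BackendCall(call.method, call.path.format(**params))


print(resolve("billing.deactivate_account", id="42"))
```

&lt;p&gt;If users become accounts tomorrow, the right-hand side of the table is rewritten and nothing the model has learned needs to move.&lt;/p&gt;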

&lt;p&gt;One number, modest and from one project: when we moved a cluster of pass-through tool names over to ubiquitous-language names on a server with roughly thirty tools, our internal tool-selection eval went from about 88 percent to about 94 percent correct-tool-on-first-try. I would not over-read a single before/after on one codebase. The renames that helped most were the ones that killed verb collisions (three different process_* tools) and the ones that replaced delete with the accurate domain verb.&lt;/p&gt;

&lt;h2&gt;
  
  
  FAQ
&lt;/h2&gt;

&lt;p&gt;Is dotted naming like billing.refund.process even valid for an MCP tool name? Depends on the server framework and client. Some accept dots, others constrain names and you simulate the hierarchy with underscores. Check what your server and client allow. The principle (a stable context prefix) survives whichever character you use.&lt;/p&gt;
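&lt;p&gt;If your stack rejects dots, the flattening can be a one-liner you apply at registration time. A sketch only: &lt;code&gt;flatten_tool_name&lt;/code&gt; is a hypothetical helper, and which characters are actually allowed is something to confirm against your own server and client.&lt;/p&gt;

```python
def flatten_tool_name(dotted: str, separator: str = "_") -> str:
    """Turn a dotted, context-prefixed name into one that uses a single separator."""
    parts = dotted.split(".")
    # Reject empty or odd segments early rather than registering a broken name.
    if not all(part.isidentifier() for part in parts):
        raise ValueError(f"not a valid dotted tool name: {dotted!r}")
    return separator.join(parts)


print(flatten_tool_name("billing.refund.process"))
print(flatten_tool_name("billing.deactivate_account", separator="__"))
```

&lt;p&gt;The double-underscore variant is worth considering when segments themselves contain underscores, because it keeps the context boundary visible in the flattened name.&lt;/p&gt;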

&lt;p&gt;Will a discriminated union confuse the model more than a flat dict? In my experience it does the opposite, as long as the variants are genuinely distinct and few. The confusion comes from too many variants, not from the union itself.&lt;/p&gt;

&lt;p&gt;How is ubiquitous-language naming different from just picking a good verb? A good verb is a writing instinct. Ubiquitous language is a sourcing rule: the verb has to be the one the domain experts actually use, verified against how the business talks, not invented at your desk.&lt;/p&gt;

&lt;p&gt;Do I need DDD to get value here? No. You can adopt ubiquitous-language naming and typed returns without ever drawing a context map. The bounded-context prefix specifically only pays off if you have real boundaries.&lt;/p&gt;

&lt;h2&gt;
  
  
  Open questions I am still chewing on
&lt;/h2&gt;

&lt;p&gt;When a discriminated-union return genuinely has to gain a fourth variant, what is the least disruptive way to roll it out to agents that already learned the three-variant shape? If two bounded contexts legitimately share a verb and a noun, is the prefix enough, or does the duplication signal the boundary is drawn wrong? And the one I go back and forth on: how much naming discipline is worth it before the tool count is high enough to matter. On a five-tool server, pass-through names are fine and contexts and unions are overhead. Somewhere between five and fifty the calculus flips, and I do not have a clean threshold.&lt;/p&gt;

</description>
      <category>agents</category>
      <category>architecture</category>
      <category>mcp</category>
      <category>softwareengineering</category>
    </item>
    <item>
      <title>Pydantic V2 discriminator pattern for MCP return types</title>
      <dc:creator>James O'Connor</dc:creator>
      <pubDate>Tue, 02 Jun 2026 14:55:09 +0000</pubDate>
      <link>https://dev.to/james_oconnor_dev/pydantic-v2-discriminator-pattern-for-mcp-return-types-3pnl</link>
      <guid>https://dev.to/james_oconnor_dev/pydantic-v2-discriminator-pattern-for-mcp-return-types-3pnl</guid>
      <description>&lt;p&gt;When an MCP tool can return one of several shapes (success, partial-success, error), the cleanest typing is a Pydantic V2 discriminated union. The pattern survived our last 3 refactors of the tool-return surface.&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight python"&gt;&lt;code&gt;&lt;span class="kn"&gt;from&lt;/span&gt; &lt;span class="n"&gt;typing&lt;/span&gt; &lt;span class="kn"&gt;import&lt;/span&gt; &lt;span class="n"&gt;Literal&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;Annotated&lt;/span&gt;
&lt;span class="kn"&gt;from&lt;/span&gt; &lt;span class="n"&gt;pydantic&lt;/span&gt; &lt;span class="kn"&gt;import&lt;/span&gt; &lt;span class="n"&gt;BaseModel&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;Field&lt;/span&gt;

&lt;span class="k"&gt;class&lt;/span&gt; &lt;span class="nc"&gt;RefundSuccess&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;BaseModel&lt;/span&gt;&lt;span class="p"&gt;):&lt;/span&gt;
    &lt;span class="n"&gt;status&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="n"&gt;Literal&lt;/span&gt;&lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;success&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;]&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;success&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;
    &lt;span class="n"&gt;refund_id&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="nb"&gt;str&lt;/span&gt;
    &lt;span class="n"&gt;amount_refunded&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="nb"&gt;float&lt;/span&gt;

&lt;span class="k"&gt;class&lt;/span&gt; &lt;span class="nc"&gt;RefundPartial&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;BaseModel&lt;/span&gt;&lt;span class="p"&gt;):&lt;/span&gt;
    &lt;span class="n"&gt;status&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="n"&gt;Literal&lt;/span&gt;&lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;partial&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;]&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;partial&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;
    &lt;span class="n"&gt;refund_id&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="nb"&gt;str&lt;/span&gt;
    &lt;span class="n"&gt;amount_refunded&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="nb"&gt;float&lt;/span&gt;
    &lt;span class="n"&gt;amount_remaining&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="nb"&gt;float&lt;/span&gt;
    &lt;span class="n"&gt;reason&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="nb"&gt;str&lt;/span&gt;

&lt;span class="k"&gt;class&lt;/span&gt; &lt;span class="nc"&gt;RefundError&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;BaseModel&lt;/span&gt;&lt;span class="p"&gt;):&lt;/span&gt;
    &lt;span class="n"&gt;status&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="n"&gt;Literal&lt;/span&gt;&lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;error&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;]&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;error&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;
    &lt;span class="n"&gt;error_code&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="nb"&gt;str&lt;/span&gt;
    &lt;span class="n"&gt;message&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="nb"&gt;str&lt;/span&gt;

&lt;span class="n"&gt;RefundResult&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;Annotated&lt;/span&gt;&lt;span class="p"&gt;[&lt;/span&gt;
    &lt;span class="n"&gt;RefundSuccess&lt;/span&gt; &lt;span class="o"&gt;|&lt;/span&gt; &lt;span class="n"&gt;RefundPartial&lt;/span&gt; &lt;span class="o"&gt;|&lt;/span&gt; &lt;span class="n"&gt;RefundError&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
    &lt;span class="nc"&gt;Field&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;discriminator&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;status&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;),&lt;/span&gt;
&lt;span class="p"&gt;]&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;Why this beats the alternatives:&lt;/p&gt;

&lt;p&gt;The literal discriminator (&lt;code&gt;status&lt;/code&gt;) is the field MCP clients (and downstream LLMs) parse first. Pydantic builds the right model based on that field. No if-elif branching in the client.&lt;/p&gt;

&lt;p&gt;Subclassing a common parent works but loses the structural specificity. The LLM gets a parent type and has to guess the actual shape.&lt;/p&gt;

&lt;p&gt;Optional fields with None for the missing branches works but the type signature lies to readers. The discriminator approach makes the shape obvious.&lt;/p&gt;

&lt;p&gt;The gotcha: keep the discriminator field name consistent across the whole MCP server. Pydantic V2's &lt;code&gt;discriminator=&lt;/code&gt; is per-union; if one union uses &lt;code&gt;status&lt;/code&gt; and another uses &lt;code&gt;type&lt;/code&gt;, contributors will mix them up. Pick one and document it in the contributor guide.&lt;/p&gt;
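&lt;p&gt;To see the dispatch on the &lt;code&gt;status&lt;/code&gt; tag in action, here is a small sketch using Pydantic V2's &lt;code&gt;TypeAdapter&lt;/code&gt;. It trims the union to two variants for brevity, and the payload values are made up for illustration.&lt;/p&gt;

```python
from typing import Annotated, Literal, Union

from pydantic import BaseModel, Field, TypeAdapter, ValidationError


class RefundSuccess(BaseModel):
    status: Literal["success"] = "success"
    refund_id: str
    amount_refunded: float


class RefundError(BaseModel):
    status: Literal["error"] = "error"
    error_code: str
    message: str


RefundResult = Annotated[
    Union[RefundSuccess, RefundError],
    Field(discriminator="status"),
]

# TypeAdapter validates against a type that is not itself a BaseModel.
adapter = TypeAdapter(RefundResult)

ok = adapter.validate_python(
    {"status": "success", "refund_id": "r_1", "amount_refunded": 12.5}
)
err = adapter.validate_python(
    {"status": "error", "error_code": "outside_window", "message": "Too late."}
)
print(type(ok).__name__, type(err).__name__)

# A tag that matches no variant is rejected instead of being guessed at.
rejected = False
try:
    adapter.validate_python({"status": "unknown"})
except ValidationError:
    rejected = True
print(rejected)
```

&lt;p&gt;The same adapter can be reused for every payload of that union, which is one more reason to keep the discriminator field name identical across the server.&lt;/p&gt;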

</description>
      <category>python</category>
      <category>mcp</category>
      <category>pydantic</category>
      <category>ai</category>
    </item>
    <item>
      <title>We renamed two MCP tools and our agent's tool-call accuracy went from 71% to 94%</title>
      <dc:creator>James O'Connor</dc:creator>
      <pubDate>Tue, 26 May 2026 15:04:47 +0000</pubDate>
      <link>https://dev.to/james_oconnor_dev/we-renamed-two-mcp-tools-and-our-agents-tool-call-accuracy-went-from-71-to-94-3e47</link>
      <guid>https://dev.to/james_oconnor_dev/we-renamed-two-mcp-tools-and-our-agents-tool-call-accuracy-went-from-71-to-94-3e47</guid>
      <description>&lt;p&gt;Three months ago our customer-service agent confidently issued a $2,400 accounting reversal that should have been a $240 partial refund. The customer had asked for "a refund on the broken item." The agent had two tools available: refund and cancel. It picked cancel. The cancel tool, in our system, performed a full transaction reversal in the accounting ledger.&lt;/p&gt;

&lt;p&gt;The agent was technically correct. "Cancel" can mean "undo," which can mean "reverse." The customer was furious. The CFO was annoyed.&lt;/p&gt;

&lt;p&gt;For three weeks I tried to fix this with prompt engineering. None of it stuck. Tool-call accuracy on our held-out trace set was 71%.&lt;/p&gt;

&lt;p&gt;The fix turned out to be renaming the tools.&lt;/p&gt;

&lt;h2&gt;
  
  
  The diagnosis
&lt;/h2&gt;

&lt;p&gt;I had been treating MCP tools the way I treated API endpoints. Pick a verb, pick a noun, name it. Clean RESTful naming.&lt;/p&gt;

&lt;p&gt;That naming convention is wrong for tool-using agents. For an LLM, the bounded contexts do not exist. The LLM sees one global tool list and picks based on semantic similarity to the user's intent.&lt;/p&gt;

&lt;p&gt;Eric Evans wrote about this in 2003 in Domain-Driven Design. He called it Ubiquitous Language. Russell Miles applied it to agents in "Domain Driven Agent Design" earlier this year. Dennis Traub wrote about it on Dev.to with the framing "your agent keeps using that word."&lt;/p&gt;

&lt;p&gt;The rule: name tools by their bounded context first, not by their operation alone.&lt;/p&gt;

&lt;h2&gt;
  
  
  The rename
&lt;/h2&gt;

&lt;p&gt;Before:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight python"&gt;&lt;code&gt;&lt;span class="nd"&gt;@mcp_tool&lt;/span&gt;
&lt;span class="k"&gt;def&lt;/span&gt; &lt;span class="nf"&gt;cancel&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;order_id&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="nb"&gt;str&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt; &lt;span class="o"&gt;-&amp;gt;&lt;/span&gt; &lt;span class="nb"&gt;dict&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;
    &lt;span class="sh"&gt;"""&lt;/span&gt;&lt;span class="s"&gt;Cancel an order.&lt;/span&gt;&lt;span class="sh"&gt;"""&lt;/span&gt;
    &lt;span class="bp"&gt;...&lt;/span&gt;

&lt;span class="nd"&gt;@mcp_tool&lt;/span&gt;
&lt;span class="k"&gt;def&lt;/span&gt; &lt;span class="nf"&gt;refund&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;order_id&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="nb"&gt;str&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;amount&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="n"&gt;Decimal&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt; &lt;span class="o"&gt;-&amp;gt;&lt;/span&gt; &lt;span class="nb"&gt;dict&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;
    &lt;span class="sh"&gt;"""&lt;/span&gt;&lt;span class="s"&gt;Refund a customer&lt;/span&gt;&lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="s"&gt;s payment.&lt;/span&gt;&lt;span class="sh"&gt;"""&lt;/span&gt;
    &lt;span class="bp"&gt;...&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;After:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight python"&gt;&lt;code&gt;&lt;span class="nd"&gt;@mcp_tool&lt;/span&gt;
&lt;span class="k"&gt;def&lt;/span&gt; &lt;span class="nf"&gt;customer_support_cancel_order&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;order_id&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="nb"&gt;str&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt; &lt;span class="o"&gt;-&amp;gt;&lt;/span&gt; &lt;span class="n"&gt;CustomerOrderCancellation&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;
    &lt;span class="sh"&gt;"""&lt;/span&gt;&lt;span class="s"&gt;Cancel a customer&lt;/span&gt;&lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="s"&gt;s order before it ships.
    Stops fulfillment. Does NOT issue a refund. Does NOT touch accounting.&lt;/span&gt;&lt;span class="sh"&gt;"""&lt;/span&gt;
    &lt;span class="bp"&gt;...&lt;/span&gt;

&lt;span class="nd"&gt;@mcp_tool&lt;/span&gt;
&lt;span class="k"&gt;def&lt;/span&gt; &lt;span class="nf"&gt;customer_support_refund_partial&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;
    &lt;span class="n"&gt;order_id&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="nb"&gt;str&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
    &lt;span class="n"&gt;amount&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="n"&gt;Decimal&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
    &lt;span class="n"&gt;reason&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="n"&gt;RefundReason&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
&lt;span class="p"&gt;)&lt;/span&gt; &lt;span class="o"&gt;-&amp;gt;&lt;/span&gt; &lt;span class="n"&gt;CustomerRefund&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;
    &lt;span class="sh"&gt;"""&lt;/span&gt;&lt;span class="s"&gt;Issue a partial refund on a shipped order.
    For full refunds use customer_support_refund_full.&lt;/span&gt;&lt;span class="sh"&gt;"""&lt;/span&gt;
    &lt;span class="bp"&gt;...&lt;/span&gt;

&lt;span class="nd"&gt;@mcp_tool&lt;/span&gt;
&lt;span class="k"&gt;def&lt;/span&gt; &lt;span class="nf"&gt;accounting_reverse_transaction&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;
    &lt;span class="n"&gt;transaction_id&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="nb"&gt;str&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
    &lt;span class="n"&gt;reason&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="n"&gt;ReversalReason&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
&lt;span class="p"&gt;)&lt;/span&gt; &lt;span class="o"&gt;-&amp;gt;&lt;/span&gt; &lt;span class="n"&gt;AccountingReversal&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;
    &lt;span class="sh"&gt;"""&lt;/span&gt;&lt;span class="s"&gt;Reverse a posted accounting transaction. Full reversal only.
    NOT for customer-initiated refunds. Use customer_support_refund_partial.&lt;/span&gt;&lt;span class="sh"&gt;"""&lt;/span&gt;
    &lt;span class="bp"&gt;...&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;Three changes:&lt;/p&gt;

&lt;ol&gt;
&lt;li&gt;Tool names carry the bounded context as a prefix.&lt;/li&gt;
&lt;li&gt;Descriptions explicitly cross-reference sibling tools (each tool says what it is NOT).&lt;/li&gt;
&lt;li&gt;Return types are named with the context. Pydantic models carry the same vocabulary.&lt;/li&gt;
&lt;/ol&gt;

&lt;h2&gt;
  
  
  How we measured
&lt;/h2&gt;

&lt;p&gt;500-example held-out test set. Before: 71% accuracy. After: 94%.&lt;/p&gt;
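&lt;p&gt;For readers who want the metric spelled out: accuracy here means the fraction of held-out examples where the agent's first tool choice matched the labeled one. A toy sketch of that calculation follows. The &lt;code&gt;Trace&lt;/code&gt; shape and the four sample rows are invented for illustration and are not our real data.&lt;/p&gt;

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Trace:
    user_request: str
    expected_tool: str
    chosen_tool: str


def tool_call_accuracy(traces: list[Trace]) -> float:
    """Fraction of traces where the agent picked the labeled tool."""
    if not traces:
        raise ValueError("need at least one trace")
    correct = sum(1 for trace in traces if trace.chosen_tool == trace.expected_tool)
    return correct / len(traces)


# Invented examples, using the tool names from the rename above.
sample = [
    Trace("refund the broken item", "customer_support_refund_partial", "customer_support_refund_partial"),
    Trace("stop my order", "customer_support_cancel_order", "customer_support_cancel_order"),
    Trace("undo the ledger entry", "accounting_reverse_transaction", "accounting_reverse_transaction"),
    Trace("money back for one item", "customer_support_refund_partial", "accounting_reverse_transaction"),
]
accuracy = tool_call_accuracy(sample)
print(f"{accuracy:.0%}")
```

&lt;p&gt;The important part is that the same labeled set is scored before and after the rename, so the only thing that changed between the two numbers is the tool surface.&lt;/p&gt;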

&lt;h2&gt;
  
  
  Wiring this into FastAPI + MCP
&lt;/h2&gt;



&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight python"&gt;&lt;code&gt;&lt;span class="kn"&gt;from&lt;/span&gt; &lt;span class="n"&gt;mcp.server.fastmcp&lt;/span&gt; &lt;span class="kn"&gt;import&lt;/span&gt; &lt;span class="n"&gt;FastMCP&lt;/span&gt;
&lt;span class="kn"&gt;from&lt;/span&gt; &lt;span class="n"&gt;fastapi&lt;/span&gt; &lt;span class="kn"&gt;import&lt;/span&gt; &lt;span class="n"&gt;FastAPI&lt;/span&gt;
&lt;span class="kn"&gt;from&lt;/span&gt; &lt;span class="n"&gt;pydantic&lt;/span&gt; &lt;span class="kn"&gt;import&lt;/span&gt; &lt;span class="n"&gt;BaseModel&lt;/span&gt;
&lt;span class="kn"&gt;from&lt;/span&gt; &lt;span class="n"&gt;decimal&lt;/span&gt; &lt;span class="kn"&gt;import&lt;/span&gt; &lt;span class="n"&gt;Decimal&lt;/span&gt;

&lt;span class="n"&gt;mcp&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="nc"&gt;FastMCP&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;customer-support-server&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;

&lt;span class="k"&gt;class&lt;/span&gt; &lt;span class="nc"&gt;CustomerRefund&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;BaseModel&lt;/span&gt;&lt;span class="p"&gt;):&lt;/span&gt;
    &lt;span class="n"&gt;refund_id&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="nb"&gt;str&lt;/span&gt;
    &lt;span class="n"&gt;order_id&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="nb"&gt;str&lt;/span&gt;
    &lt;span class="n"&gt;amount&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="n"&gt;Decimal&lt;/span&gt;
    &lt;span class="n"&gt;reason&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="n"&gt;RefundReason&lt;/span&gt;
    &lt;span class="n"&gt;audit_log_id&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="nb"&gt;str&lt;/span&gt;

&lt;span class="nd"&gt;@mcp.tool&lt;/span&gt;&lt;span class="p"&gt;()&lt;/span&gt;
&lt;span class="k"&gt;def&lt;/span&gt; &lt;span class="nf"&gt;customer_support_refund_partial&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;
    &lt;span class="n"&gt;order_id&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="nb"&gt;str&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
    &lt;span class="n"&gt;amount&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="n"&gt;Decimal&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
    &lt;span class="n"&gt;reason&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="n"&gt;RefundReason&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
&lt;span class="p"&gt;)&lt;/span&gt; &lt;span class="o"&gt;-&amp;gt;&lt;/span&gt; &lt;span class="n"&gt;CustomerRefund&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;
    &lt;span class="sh"&gt;"""&lt;/span&gt;&lt;span class="s"&gt;Issue a partial refund on a shipped order.
    For full refunds use customer_support_refund_full.&lt;/span&gt;&lt;span class="sh"&gt;"""&lt;/span&gt;
    &lt;span class="k"&gt;return&lt;/span&gt; &lt;span class="nc"&gt;CustomerRefund&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;
        &lt;span class="n"&gt;refund_id&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="nf"&gt;create_refund_id&lt;/span&gt;&lt;span class="p"&gt;(),&lt;/span&gt;
        &lt;span class="n"&gt;order_id&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="n"&gt;order_id&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
        &lt;span class="n"&gt;amount&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="n"&gt;amount&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
        &lt;span class="n"&gt;reason&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="n"&gt;reason&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
        &lt;span class="n"&gt;audit_log_id&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="nf"&gt;create_audit_log&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;order_id&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;amount&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;reason&lt;/span&gt;&lt;span class="p"&gt;),&lt;/span&gt;
    &lt;span class="p"&gt;)&lt;/span&gt;

&lt;span class="n"&gt;app&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="nc"&gt;FastAPI&lt;/span&gt;&lt;span class="p"&gt;()&lt;/span&gt;
&lt;span class="n"&gt;app&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;mount&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;/mcp&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;mcp&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;streamable_http_app&lt;/span&gt;&lt;span class="p"&gt;())&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;The Pydantic return type gives the LLM another disambiguation signal. We saw a roughly 5-point accuracy bump just from replacing the bare dict return types with named models.&lt;/p&gt;

&lt;h2&gt;
  
  
  The anti-corruption layer
&lt;/h2&gt;



&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight python"&gt;&lt;code&gt;&lt;span class="k"&gt;class&lt;/span&gt; &lt;span class="nc"&gt;CustomerRefundToAccountingTransaction&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;
    &lt;span class="sh"&gt;"""&lt;/span&gt;&lt;span class="s"&gt;Maps a customer-initiated refund into the accounting domain.&lt;/span&gt;&lt;span class="sh"&gt;"""&lt;/span&gt;

    &lt;span class="k"&gt;def&lt;/span&gt; &lt;span class="nf"&gt;translate&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;self&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;refund&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="n"&gt;CustomerRefund&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt; &lt;span class="o"&gt;-&amp;gt;&lt;/span&gt; &lt;span class="n"&gt;AccountingEntry&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;
        &lt;span class="k"&gt;return&lt;/span&gt; &lt;span class="nc"&gt;AccountingEntry&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;
            &lt;span class="n"&gt;account&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="n"&gt;refund&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;payment_method&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;linked_account&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
            &lt;span class="n"&gt;amount&lt;/span&gt;&lt;span class="o"&gt;=-&lt;/span&gt;&lt;span class="n"&gt;refund&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;amount&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
            &lt;span class="n"&gt;reason&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="n"&gt;AccountingReason&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;CUSTOMER_REFUND&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
            &lt;span class="n"&gt;ref_id&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="n"&gt;refund&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;refund_id&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
            &lt;span class="n"&gt;audit_log_id&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="nf"&gt;create_audit_log&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;refund&lt;/span&gt;&lt;span class="p"&gt;),&lt;/span&gt;
        &lt;span class="p"&gt;)&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;The mapper is a hard boundary. The LLM cannot bypass it because the LLM only sees the customer-context tool surface; the accounting context is invoked deterministically.&lt;/p&gt;

&lt;h2&gt;
  
  
  Where this lands
&lt;/h2&gt;

&lt;p&gt;MCP tools are not API methods. They are vocabulary items in a shared language. Name them by bounded context first, not by operation alone. Your MCP server is a database schema problem, not an API problem.&lt;/p&gt;

&lt;h2&gt;
  
  
  Where I would push back on this
&lt;/h2&gt;

&lt;p&gt;This whole experience made me reconsider how I think about tool layers. They are not APIs. They are vocabulary. I have stopped using the word "tool" internally and started saying "verb in our agent's language" because it surfaces the design question better.&lt;/p&gt;

&lt;p&gt;The bounded-context naming convention adds verbosity. Engineers will push back. The pushback I would accept: "we do not have multiple bounded contexts to disambiguate." That is true at small scale.&lt;/p&gt;

&lt;p&gt;The pushback I would not accept: "descriptions are enough." We tried that for three weeks. They are not. The agent's behavior is shaped far more by the names you pick than by the descriptions you write.&lt;/p&gt;

&lt;p&gt;If you have shipped multi-context agents without bounded-context naming and the tool-call accuracy held up above 90%, I want to see the architecture. My prior is strong but not absolute.&lt;/p&gt;

</description>
      <category>ai</category>
      <category>python</category>
      <category>mcp</category>
      <category>ddd</category>
    </item>
  </channel>
</rss>
