<?xml version="1.0" encoding="UTF-8"?>
<rss version="2.0" xmlns:atom="http://www.w3.org/2005/Atom" xmlns:dc="http://purl.org/dc/elements/1.1/">
  <channel>
    <title>DEV Community: Guy</title>
    <description>The latest articles on DEV Community by Guy (@guyernest).</description>
    <link>https://dev.to/guyernest</link>
    <image>
      <url>https://media2.dev.to/dynamic/image/width=90,height=90,fit=cover,gravity=auto,format=auto/https:%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Fuser%2Fprofile_image%2F253942%2F5456be42-99ec-4805-bbf3-99d9ad61feba.jpeg</url>
      <title>DEV Community: Guy</title>
      <link>https://dev.to/guyernest</link>
    </image>
    <atom:link rel="self" type="application/rss+xml" href="https://dev.to/feed/guyernest"/>
    <language>en</language>
    <item>
      <title>MCP Prompts and Resources: The Primitives You're Not Using</title>
      <dc:creator>Guy</dc:creator>
      <pubDate>Thu, 09 Apr 2026 17:36:08 +0000</pubDate>
      <link>https://dev.to/guyernest/mcp-prompts-and-resources-the-primitives-youre-not-using-3oo1</link>
      <guid>https://dev.to/guyernest/mcp-prompts-and-resources-the-primitives-youre-not-using-3oo1</guid>
      <description>&lt;p&gt;Your user asks for a weekly sales report. The LLM has four tools available: querying the database, aggregating data, calculating trends, and formatting the output. It chains them together. Steps 1 and 2 go fine. Step 3 goes wrong: the LLM tries to calculate week-over-week percentage changes itself, mixes up which week is the baseline, and produces a report showing 340% growth in a category that actually declined. The user gets a polished, confident, completely wrong report.&lt;/p&gt;

&lt;p&gt;This isn't a contrived scenario. It's the predictable outcome of asking an LLM to choreograph a multi-step workflow where some steps require symbolic computation. The LLM is good at language. It is bad at arithmetic. And when you give it tools for each individual step, you're asking it to be good at something else entirely: sequencing, data flow management, and knowing which steps it should delegate versus attempt itself.&lt;/p&gt;

&lt;p&gt;Now consider the alternative. The same user clicks a single prompt: "Weekly Sales Report." The server executes the deterministic steps: queries the database, aggregates by category, calculates trends server-side using exact arithmetic, and hands the LLM a precomputed dataset with one instruction: format this as an executive summary. The report is correct every time because the server handled the parts that require precision, and the LLM handled the parts that require language.&lt;/p&gt;

&lt;p&gt;If you read &lt;a href="https://dev.to/aws-heroes/mcp-tool-design-why-your-ai-agent-is-failing-and-how-to-fix-it-40fc"&gt;our article on tool design&lt;/a&gt;, you know how to build tools that LLMs can use well. But tools solve individual tasks. What about multi-step workflows where the steps must happen in a specific order, with data flowing between them, and some steps requiring computation that LLMs shouldn't be doing? That's where MCP's second primitive comes in: prompts.&lt;/p&gt;

&lt;p&gt;The &lt;strong&gt;business analyst&lt;/strong&gt; — one of the two human corners of the Capability Square we introduced in that article — knows which workflows their business users run every week. The right operating model is domain-led, engineering-implemented, platform-governed: the analyst brings the workflow knowledge, engineers implement it, and the platform team governs how it runs in production. That is what lets the weekly sales report, the incident response runbook, or the customer onboarding checklist show up as a reliable one-click workflow for the person who actually runs it.&lt;/p&gt;

&lt;h2&gt;What Is MCP? (The 30-Second Version)&lt;/h2&gt;

&lt;p&gt;The Model Context Protocol (MCP, &lt;a href="https://modelcontextprotocol.io/specification/2025-11-25" rel="noopener noreferrer"&gt;spec 2025-11-25&lt;/a&gt;) defines three primitives for connecting AI models to external services: tools, prompts, and resources. The &lt;a href="https://dev.to/aws-heroes/mcp-tool-design-why-your-ai-agent-is-failing-and-how-to-fix-it-40fc"&gt;previous article&lt;/a&gt; covered tools, which are model-controlled primitives that let LLMs invoke server-side operations. This article covers the other two: prompts, which are user-controlled workflow packages, and resources, which provide application-controlled context. Together, the three primitives form a complete system for AI-service integration.&lt;/p&gt;

&lt;p&gt;The enterprise mental model is the same one from the previous article: MCP for AI is what HTTP-based applications are for humans. MCP servers are the AI-facing web servers or mobile applications for your organization's data systems, which is why they are usually remote services rather than local helpers. They should also be thin and mostly stateless: an interface layer over internal systems, not a stateful application tier of their own. The main exception is explicit long-running task handling, where state is persisted deliberately because the work itself outlives a single request, a pattern known as MCP Tasks. We will cover Tasks in a future article in this series.&lt;/p&gt;

&lt;h2&gt;The Three Control Planes&lt;/h2&gt;

&lt;p&gt;MCP's three primitives aren't just three types of capability. They represent three distinct control planes, or three answers to the question "who decides when this gets used?"&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Tools are model-controlled.&lt;/strong&gt; The LLM (model) decides when to invoke them. When a user asks, "Where's my order?", the LLM selects &lt;code&gt;track_latest_order&lt;/code&gt; from the available tools. The user never explicitly chose that tool; the LLM's reasoning did. This is the right model for individual tasks where the LLM's judgment about which tool to call is sufficient.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Prompts are user-controlled.&lt;/strong&gt; The human explicitly triggers them. In Claude Desktop, they appear as slash commands. In other clients, they show up as menu items or quick actions. The user sees "Weekly Sales Report" and clicks it, entering a week number. There's no ambiguity about what will happen, no LLM judgment about which workflow to run. The user chose.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Resources are application-controlled.&lt;/strong&gt; The host application decides when to pull them into context. A resource might be a database schema, a configuration file, or a live dashboard. The application injects it into the conversation when relevant. For example, loading an API schema before a coding task. Neither the user nor the LLM explicitly requested it; the application determined it was needed.&lt;/p&gt;
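&lt;p&gt;In protocol terms, a resource is just addressable content the host can read and splice into context. A minimal sketch of the shape involved -- with illustrative field names and a hypothetical &lt;code&gt;schema://&lt;/code&gt; URI, not the spec's exact types:&lt;/p&gt;

```rust
// Illustrative stand-in for resource contents served to the host.
// Field names mirror the spec's general shape; this is a sketch, not an SDK type.
#[derive(Debug)]
struct ResourceContents {
    uri: String, // hypothetical scheme, for illustration only
    mime_type: String,
    text: String,
}

fn main() {
    // The host application -- not the user, not the LLM -- decides to read
    // this and inject it into the conversation before a relevant task.
    let schema = ResourceContents {
        uri: "schema://sales/weekly".to_string(),
        mime_type: "application/json".to_string(),
        text: r#"{"columns": ["week", "category", "revenue"]}"#.to_string(),
    };
    assert_eq!(schema.mime_type, "application/json");
    println!("injecting {} into context", schema.uri);
}
```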

&lt;p&gt;This taxonomy tells you which primitive to use. If the LLM should decide, use a tool. If the user should decide, use a prompt. If the application should decide, use a resource.&lt;/p&gt;

&lt;p&gt;In practice, many enterprise deployments add one more concept on top of these three primitives: &lt;strong&gt;Tasks&lt;/strong&gt;. Tasks are not part of the base three-way split. They are an extension pattern for long-running operations such as scans, report generation, provisioning, or approvals. They are also the main exception to the normal stateless model. The request/response interface remains stateless, but the server explicitly persists task state and exposes progress or completion, rather than relying on sticky in-memory sessions.&lt;/p&gt;

&lt;p&gt;This maps cleanly onto the Capability Square from the previous article — and prompts are where the split between the two human corners pays off the most:&lt;/p&gt;

&lt;div class="table-wrapper-paragraph"&gt;&lt;table&gt;
&lt;thead&gt;
&lt;tr&gt;
&lt;th&gt;Control Plane&lt;/th&gt;
&lt;th&gt;Who Triggers at Runtime&lt;/th&gt;
&lt;th&gt;Square Corner(s)&lt;/th&gt;
&lt;th&gt;Strength&lt;/th&gt;
&lt;/tr&gt;
&lt;/thead&gt;
&lt;tbody&gt;
&lt;tr&gt;
&lt;td&gt;Tools&lt;/td&gt;
&lt;td&gt;The LLM (model)&lt;/td&gt;
&lt;td&gt;LLM&lt;/td&gt;
&lt;td&gt;Intent interpretation, tool selection&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Prompts&lt;/td&gt;
&lt;td&gt;The business user&lt;/td&gt;
&lt;td&gt;Business Analyst (authors) + Business User (triggers)&lt;/td&gt;
&lt;td&gt;Workflow knowledge encoded once, invoked many times&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Resources&lt;/td&gt;
&lt;td&gt;The host application&lt;/td&gt;
&lt;td&gt;Host + Server&lt;/td&gt;
&lt;td&gt;Context management, data access&lt;/td&gt;
&lt;/tr&gt;
&lt;/tbody&gt;
&lt;/table&gt;&lt;/div&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fb7qf7k2pfbysr8vbutzw.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fb7qf7k2pfbysr8vbutzw.png" alt="MCP Prompts on the Capability Square" width="800" height="533"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;Prompts span both human corners of the square. The business analyst — the domain lead for workflow design — encodes an expert workflow into a prompt at design time. Engineers implement that workflow, and the platform team governs its deployment and control. The business user triggers it with one click at runtime. The prompt is literally the handoff artifact between the two humans: the analyst's workflow knowledge, packaged so a user doesn't need to recreate it every Monday morning.&lt;/p&gt;

&lt;p&gt;Tools, by contrast, sit under the LLM corner because the model's judgment determines when they are called. Resources sit at the boundary between the host application and the MCP server: the host decides &lt;em&gt;when&lt;/em&gt; to pull a resource into context, but the server &lt;em&gt;provides&lt;/em&gt; the content. This is the one primitive in which two actors collaborate without either human being directly in the loop, which partly explains why its ecosystem support lags behind that of tools and prompts.&lt;/p&gt;

&lt;p&gt;When all three control planes work together, the system covers every type of interaction: ad-hoc tasks (tools), structured workflows (prompts), and ambient context (resources). And because resource loading is application-dependent, the host may or may not inject the right resource at the right time — so an important role of prompt workflows is to explicitly load the relevant resources into context as part of the workflow definition. This ensures the LLM has the context it needs, regardless of what the host application decided to provide.&lt;/p&gt;

&lt;h2&gt;The Primitive You're Not Using&lt;/h2&gt;

&lt;p&gt;Most MCP servers expose tools. A growing number expose resources. Almost none expose prompts.&lt;/p&gt;

&lt;p&gt;Browse the MCP ecosystem, the tutorial repositories, the example servers, and the community showcases, and you'll find tool after tool after tool. Prompts are either absent entirely or limited to trivial "system message" wrappers that add no value beyond what the user could type themselves. The MCP official blog didn't publish its first &lt;a href="https://blog.modelcontextprotocol.io/posts/2025-07-29-prompts-for-automation/" rel="noopener noreferrer"&gt;prompts-for-automation post&lt;/a&gt; until mid-2025, months after the protocol launched. The ecosystem followed suit: tools are easy to demo, prompts require thinking about workflows, and most tutorials took the easy path.&lt;/p&gt;

&lt;p&gt;There's another reason prompts are underutilized: minimal SDK support. Most MCP SDKs treat prompts as simple message templates: you return a list of messages, and that's it. There's no built-in abstraction for multi-step workflows, data flow between steps, or hybrid execution where the server handles some steps and the LLM handles others. This is precisely why the PMCP (Pragmatic MCP) SDK added deep support for workflow prompts as an enterprise feature: the &lt;code&gt;SequentialWorkflow&lt;/code&gt; abstraction we'll demonstrate in this article. Without SDK support, building reliable workflow prompts requires significant boilerplate that most teams don't invest in.&lt;/p&gt;

&lt;p&gt;This is a missed opportunity. Prompts solve a reliability problem that tools cannot solve for known, repeatable workflows.&lt;/p&gt;

&lt;p&gt;Consider the gap. When you leave a multi-step workflow entirely to the LLM, using only tools, you're relying on instruction-only orchestration: the LLM reads the tool descriptions, figures out the right sequence, handles data flow between steps, and decides which computations to delegate versus attempt. In our experience building production MCP servers with the PMCP SDK, testing multi-step workflows like report generation, data pipelines, and incident response across multiple models, instruction-only approaches typically achieve 60-70% compliance for complex workflows. That means 30-40% of the time, the LLM gets something wrong: a step out of order, a calculation it shouldn't have attempted, a variable lost between tool calls.&lt;/p&gt;

&lt;p&gt;Now compare hybrid execution, where the prompt defines the workflow, the server executes the deterministic steps, and the LLM fills in only where its language intelligence is needed. In the same test scenarios, hybrid execution typically achieves 85-95% compliance. These numbers come from internal benchmarks, not published studies, and will vary by model, workflow complexity, and domain, but the direction is consistent: reducing the LLM's decision space materially improves reliability.&lt;/p&gt;

&lt;p&gt;The reason is straightforward. Prompts reduce the LLM's decision space and move &lt;strong&gt;workflow state management&lt;/strong&gt; from the LLM's volatile context to explicit server-side execution state. In a multi-step tool chain, the LLM must track variables between calls, remember which step it's on, and pass results forward correctly, all in its context window, where information degrades with distance. In a workflow prompt, the server manages that state deterministically through request-scoped execution and, when necessary, explicitly persisted state. The LLM receives a pre-built plan with most steps already completed. It only needs to handle the parts that genuinely require language understanding: summarization, formatting, and inference.&lt;/p&gt;

&lt;p&gt;The most common failure mode has a name: &lt;strong&gt;calculation hallucination&lt;/strong&gt;. When an LLM sees a "calculate" tool and a "format" tool, it often skips the calculation tool to save a round trip and attempts the arithmetic itself. The result looks plausible and the format is right, but the numbers are wrong. Hybrid execution prevents this entirely: the server runs the calculation, the LLM never performs the arithmetic, and the result is correct by construction.&lt;/p&gt;
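&lt;p&gt;The fix is mechanical: keep the arithmetic in ordinary code. A minimal sketch of a server-side trend calculation -- illustrative, not the actual implementation behind a &lt;code&gt;calculate_trends&lt;/code&gt; tool:&lt;/p&gt;

```rust
// Deterministic week-over-week change, computed server-side. The baseline
// is always the previous week -- the exact detail the LLM confused in the
// opening scenario.
fn wow_change_pct(previous: f64, current: f64) -> f64 {
    assert!(previous != 0.0, "undefined baseline; surface as N/A upstream");
    (current - previous) / previous * 100.0
}

fn main() {
    // A category that actually declined: 50,000 -> 44,000 is -12%, not growth.
    let change = wow_change_pct(50_000.0, 44_000.0);
    assert!((change - (-12.0)).abs() < 1e-9);
    println!("week-over-week: {:.1}%", change);
}
```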

&lt;p&gt;If you're measuring task completion across diverse requests -- and, as we argued in the tool design article, you should be -- prompts are how you push completion rates from "usually works" to "reliably works" for your most common workflows.&lt;/p&gt;

&lt;h2&gt;From Protocol to Workflow&lt;/h2&gt;

&lt;p&gt;At the protocol level, a prompt is simple: the client calls &lt;code&gt;prompts/get&lt;/code&gt; with a name and arguments, and the server returns a &lt;code&gt;GetPromptResult&lt;/code&gt; containing a sequence of &lt;code&gt;PromptMessage&lt;/code&gt; values. Each message has a role (&lt;code&gt;user&lt;/code&gt; or &lt;code&gt;assistant&lt;/code&gt;) and content (text, images, or embedded resources). The client uses these messages to populate the conversation and guide the LLM's response. Clients discover available prompts via &lt;code&gt;prompts/list&lt;/code&gt; -- parallel to &lt;code&gt;tools/list&lt;/code&gt; -- and present them to users as slash commands, menu items, or quick actions. The key difference from tools: the user explicitly selects them. There's no LLM reasoning about which prompt to invoke.&lt;/p&gt;
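&lt;p&gt;The exchange is easy to picture with plain structs. The names below are simplified stand-ins for the spec's &lt;code&gt;GetPromptResult&lt;/code&gt; and &lt;code&gt;PromptMessage&lt;/code&gt; shapes, not SDK types:&lt;/p&gt;

```rust
// Simplified stand-ins for the prompts/get response shapes.
#[derive(Debug)]
struct PromptMessage {
    role: String, // "user" or "assistant"
    text: String, // real messages can also carry images or embedded resources
}

#[derive(Debug)]
struct GetPromptResult {
    description: String,
    messages: Vec<PromptMessage>,
}

// What a server might return for prompts/get("weekly_sales_report", week).
fn get_prompt(week: &str) -> GetPromptResult {
    GetPromptResult {
        description: format!("Weekly sales report for {week}"),
        messages: vec![PromptMessage {
            role: "user".to_string(),
            text: format!(
                "Format the precomputed sales data for {week} as an executive summary."
            ),
        }],
    }
}

fn main() {
    let result = get_prompt("2026-W12");
    // The client splices these messages into the conversation verbatim;
    // no LLM reasoning decides whether this prompt runs.
    assert_eq!(result.messages[0].role, "user");
    println!("{}", result.description);
}
```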

&lt;p&gt;At this protocol level, prompts are message templates. Useful for setting up context, but not fundamentally different from what the user could type themselves. The real power emerges when you move from templates to workflows: multi-step processes in which data flows between steps, and the server executes what it can before handing off to the LLM. In a production deployment, that workflow engine should still fit the same remote, mostly stateless service model: deterministic steps execute within the request, and truly long-running work is broken out into explicit tasks.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;An important distinction:&lt;/strong&gt; base MCP prompts are message templates. The server-executed workflow behavior shown below is a PMCP SDK abstraction built on top of prompts, tools, and resources. It uses the prompt protocol as the entry point, but adds a workflow engine that executes deterministic steps server-side before returning the message sequence to the client. This is not part of the MCP spec -- it's what a well-designed SDK can layer on top of it.&lt;/p&gt;

&lt;h2&gt;The Weekly Sales Report: One Click, Complex Result&lt;/h2&gt;

&lt;p&gt;Here's the weekly sales report as a &lt;code&gt;SequentialWorkflow&lt;/code&gt; -- a PMCP abstraction where each step can feed data into the next:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight rust"&gt;&lt;code&gt;&lt;span class="k"&gt;use&lt;/span&gt; &lt;span class="nn"&gt;pmcp&lt;/span&gt;&lt;span class="p"&gt;::&lt;/span&gt;&lt;span class="nn"&gt;server&lt;/span&gt;&lt;span class="p"&gt;::&lt;/span&gt;&lt;span class="nn"&gt;workflow&lt;/span&gt;&lt;span class="p"&gt;::{&lt;/span&gt;
    &lt;span class="nn"&gt;dsl&lt;/span&gt;&lt;span class="p"&gt;::{&lt;/span&gt;&lt;span class="n"&gt;constant&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;field&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;from_step&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;prompt_arg&lt;/span&gt;&lt;span class="p"&gt;},&lt;/span&gt;
    &lt;span class="n"&gt;SequentialWorkflow&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;ToolHandle&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;WorkflowStep&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
&lt;span class="p"&gt;};&lt;/span&gt;
&lt;span class="k"&gt;use&lt;/span&gt; &lt;span class="nn"&gt;serde_json&lt;/span&gt;&lt;span class="p"&gt;::&lt;/span&gt;&lt;span class="n"&gt;json&lt;/span&gt;&lt;span class="p"&gt;;&lt;/span&gt;

&lt;span class="c1"&gt;// SequentialWorkflow: a multi-step prompt where data flows between steps.&lt;/span&gt;
&lt;span class="c1"&gt;// Unlike SyncPrompt (which builds static messages), SequentialWorkflow&lt;/span&gt;
&lt;span class="c1"&gt;// orchestrates tool calls with typed data bindings between steps.&lt;/span&gt;
&lt;span class="k"&gt;let&lt;/span&gt; &lt;span class="n"&gt;sales_report&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="nn"&gt;SequentialWorkflow&lt;/span&gt;&lt;span class="p"&gt;::&lt;/span&gt;&lt;span class="nf"&gt;new&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;
    &lt;span class="s"&gt;"weekly_sales_report"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
    &lt;span class="s"&gt;"Generate a formatted weekly sales report with trends and key metrics."&lt;/span&gt;
&lt;span class="p"&gt;)&lt;/span&gt;
&lt;span class="c1"&gt;// Arguments: what the user provides when triggering this prompt&lt;/span&gt;
&lt;span class="nf"&gt;.argument&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="s"&gt;"week"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="s"&gt;"Week identifier (e.g., '2026-W12')"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="k"&gt;true&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
&lt;span class="nf"&gt;.argument&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="s"&gt;"format"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="s"&gt;"Output format: summary or detailed"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="k"&gt;false&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;

&lt;span class="c1"&gt;// Step 1: Query sales database (server executes -- deterministic)&lt;/span&gt;
&lt;span class="c1"&gt;// The server calls query_database with constant + user-provided args.&lt;/span&gt;
&lt;span class="c1"&gt;// No LLM needed: this is pure data retrieval.&lt;/span&gt;
&lt;span class="nf"&gt;.step&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;
    &lt;span class="nn"&gt;WorkflowStep&lt;/span&gt;&lt;span class="p"&gt;::&lt;/span&gt;&lt;span class="nf"&gt;new&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="s"&gt;"query_sales"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="nn"&gt;ToolHandle&lt;/span&gt;&lt;span class="p"&gt;::&lt;/span&gt;&lt;span class="nf"&gt;new&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="s"&gt;"query_database"&lt;/span&gt;&lt;span class="p"&gt;))&lt;/span&gt;
        &lt;span class="nf"&gt;.arg&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="s"&gt;"query_type"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="nf"&gt;constant&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="nd"&gt;json!&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="s"&gt;"weekly_sales"&lt;/span&gt;&lt;span class="p"&gt;)))&lt;/span&gt;
        &lt;span class="nf"&gt;.arg&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="s"&gt;"week"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="nf"&gt;prompt_arg&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="s"&gt;"week"&lt;/span&gt;&lt;span class="p"&gt;))&lt;/span&gt;
        &lt;span class="nf"&gt;.bind&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="s"&gt;"sales_data"&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;  &lt;span class="c1"&gt;// output available as "sales_data" for later steps&lt;/span&gt;
&lt;span class="p"&gt;)&lt;/span&gt;

&lt;span class="c1"&gt;// Step 2: Aggregate by category (server executes -- deterministic)&lt;/span&gt;
&lt;span class="c1"&gt;// Uses the output from step 1. The server chains these automatically.&lt;/span&gt;
&lt;span class="nf"&gt;.step&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;
    &lt;span class="nn"&gt;WorkflowStep&lt;/span&gt;&lt;span class="p"&gt;::&lt;/span&gt;&lt;span class="nf"&gt;new&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="s"&gt;"aggregate"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="nn"&gt;ToolHandle&lt;/span&gt;&lt;span class="p"&gt;::&lt;/span&gt;&lt;span class="nf"&gt;new&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="s"&gt;"aggregate_metrics"&lt;/span&gt;&lt;span class="p"&gt;))&lt;/span&gt;
        &lt;span class="nf"&gt;.arg&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="s"&gt;"data"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="nf"&gt;from_step&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="s"&gt;"sales_data"&lt;/span&gt;&lt;span class="p"&gt;))&lt;/span&gt;  &lt;span class="c1"&gt;// entire output from step 1&lt;/span&gt;
        &lt;span class="nf"&gt;.arg&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="s"&gt;"group_by"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="nf"&gt;constant&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="nd"&gt;json!&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="s"&gt;"product_category"&lt;/span&gt;&lt;span class="p"&gt;)))&lt;/span&gt;
        &lt;span class="nf"&gt;.bind&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="s"&gt;"aggregated"&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
&lt;span class="p"&gt;)&lt;/span&gt;

&lt;span class="c1"&gt;// Step 3: Calculate week-over-week trends (server executes -- deterministic)&lt;/span&gt;
&lt;span class="c1"&gt;// This is the step that failed in our opening scenario when the LLM&lt;/span&gt;
&lt;span class="c1"&gt;// tried to do it. The server handles the arithmetic correctly every time.&lt;/span&gt;
&lt;span class="nf"&gt;.step&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;
    &lt;span class="nn"&gt;WorkflowStep&lt;/span&gt;&lt;span class="p"&gt;::&lt;/span&gt;&lt;span class="nf"&gt;new&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="s"&gt;"calc_trends"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="nn"&gt;ToolHandle&lt;/span&gt;&lt;span class="p"&gt;::&lt;/span&gt;&lt;span class="nf"&gt;new&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="s"&gt;"calculate_trends"&lt;/span&gt;&lt;span class="p"&gt;))&lt;/span&gt;
        &lt;span class="nf"&gt;.arg&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="s"&gt;"current_week"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="nf"&gt;from_step&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="s"&gt;"aggregated"&lt;/span&gt;&lt;span class="p"&gt;))&lt;/span&gt;
        &lt;span class="nf"&gt;.arg&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="s"&gt;"comparison"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="nf"&gt;constant&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="nd"&gt;json!&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="s"&gt;"previous_week"&lt;/span&gt;&lt;span class="p"&gt;)))&lt;/span&gt;
        &lt;span class="nf"&gt;.bind&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="s"&gt;"trends"&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
&lt;span class="p"&gt;)&lt;/span&gt;

&lt;span class="c1"&gt;// Step 4: Format as executive summary (LLM needed -- natural language)&lt;/span&gt;
&lt;span class="c1"&gt;// This step requires intelligence: choosing which metrics to highlight,&lt;/span&gt;
&lt;span class="c1"&gt;// writing prose summaries, deciding what "noteworthy" means.&lt;/span&gt;
&lt;span class="c1"&gt;// The server provides the data and guidance; the LLM provides the writing.&lt;/span&gt;
&lt;span class="nf"&gt;.step&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;
    &lt;span class="nn"&gt;WorkflowStep&lt;/span&gt;&lt;span class="p"&gt;::&lt;/span&gt;&lt;span class="nf"&gt;new&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="s"&gt;"format_report"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="nn"&gt;ToolHandle&lt;/span&gt;&lt;span class="p"&gt;::&lt;/span&gt;&lt;span class="nf"&gt;new&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="s"&gt;"format_output"&lt;/span&gt;&lt;span class="p"&gt;))&lt;/span&gt;
        &lt;span class="nf"&gt;.with_guidance&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;
            &lt;span class="s"&gt;"Format the aggregated data into an executive summary for week {week}.&lt;/span&gt;&lt;span class="se"&gt;\n&lt;/span&gt;&lt;span class="err"&gt;\&lt;/span&gt;&lt;span class="s"&gt;
             Highlight the top 3 performing categories and any &lt;/span&gt;&lt;span class="err"&gt;\&lt;/span&gt;&lt;span class="s"&gt;
             week-over-week trends that exceed 10% change.&lt;/span&gt;&lt;span class="se"&gt;\n&lt;/span&gt;&lt;span class="err"&gt;\&lt;/span&gt;&lt;span class="s"&gt;
             Use the report template for consistent formatting."&lt;/span&gt;
        &lt;span class="p"&gt;)&lt;/span&gt;
        &lt;span class="nf"&gt;.with_resource&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="s"&gt;"template://reports/weekly-sales"&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
        &lt;span class="nf"&gt;.expect&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="s"&gt;"Report template resource"&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
        &lt;span class="nf"&gt;.arg&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="s"&gt;"data"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="nf"&gt;from_step&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="s"&gt;"aggregated"&lt;/span&gt;&lt;span class="p"&gt;))&lt;/span&gt;
        &lt;span class="nf"&gt;.arg&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="s"&gt;"trends"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="nf"&gt;from_step&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="s"&gt;"trends"&lt;/span&gt;&lt;span class="p"&gt;))&lt;/span&gt;
        &lt;span class="nf"&gt;.arg&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="s"&gt;"format"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="nf"&gt;prompt_arg&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="s"&gt;"format"&lt;/span&gt;&lt;span class="p"&gt;))&lt;/span&gt;
        &lt;span class="nf"&gt;.bind&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="s"&gt;"report"&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
&lt;span class="p"&gt;);&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;Follow the data flow through the DSL helpers. &lt;code&gt;prompt_arg("week")&lt;/code&gt; pulls the user-provided week into step 1. &lt;code&gt;from_step("sales_data")&lt;/code&gt; feeds step 1's entire output into step 2. &lt;code&gt;from_step("aggregated")&lt;/code&gt; chains step 2's result into step 3. Each &lt;code&gt;bind("name")&lt;/code&gt; names a step's output, allowing subsequent steps to reference it. The data flows forward through the workflow without any LLM involvement in the plumbing.&lt;/p&gt;

&lt;p&gt;Steps 1-3 are deterministic. The server executes them because each parameter can be resolved from prompt arguments (&lt;code&gt;prompt_arg&lt;/code&gt;), constants (&lt;code&gt;constant&lt;/code&gt;), or prior-step bindings (&lt;code&gt;from_step&lt;/code&gt;). No judgment required. No natural language interpretation. Just data retrieval, aggregation, and arithmetic.&lt;/p&gt;
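&lt;p&gt;The resolution rule is simple enough to sketch. This toy resolver is our illustration, not PMCP's internals, but it shows why the server can execute a step whenever every argument resolves without the LLM:&lt;/p&gt;

```rust
use std::collections::HashMap;

// Toy model of the three argument sources in the workflow DSL --
// illustrative only, not PMCP's real types.
enum Source {
    Constant(String),
    PromptArg(String),
    FromStep(String),
}

// An argument resolves from a literal, a user-supplied prompt argument,
// or a named binding produced by an earlier step.
fn resolve(
    source: &Source,
    prompt_args: &HashMap<String, String>,
    bindings: &HashMap<String, String>,
) -> Option<String> {
    match source {
        Source::Constant(v) => Some(v.clone()),
        Source::PromptArg(name) => prompt_args.get(name).cloned(),
        Source::FromStep(binding) => bindings.get(binding).cloned(),
    }
}

fn main() {
    let prompt_args = HashMap::from([("week".to_string(), "2026-W12".to_string())]);
    // Pretend step 1 already ran and bound its output as "sales_data".
    let bindings = HashMap::from([("sales_data".to_string(), "[rows]".to_string())]);

    // Step 2's "data" argument resolves from step 1's binding -- no LLM plumbing.
    let data = resolve(&Source::FromStep("sales_data".to_string()), &prompt_args, &bindings);
    assert_eq!(data.as_deref(), Some("[rows]"));

    // Every argument resolvable this way means the server runs the step itself.
    let week = resolve(&Source::PromptArg("week".to_string()), &prompt_args, &bindings);
    assert_eq!(week.as_deref(), Some("2026-W12"));
}
```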

&lt;p&gt;Step 4 is where the server stops and hands off. The &lt;code&gt;format_output&lt;/code&gt; tool needs LLM intelligence for natural language summarization: choosing which metrics to highlight, writing prose, deciding what "noteworthy" means. The server provides everything the LLM needs -- the aggregated data (from steps 1-3), the guidance (what to highlight), and a report template resource. The LLM's job is reduced to writing.&lt;/p&gt;

&lt;p&gt;Remember the opening scenario? The LLM tried to calculate week-over-week trends and got the arithmetic wrong—mixing up baselines and producing a report showing 340% growth in a category that actually declined. With this workflow, the server handles the arithmetic in step 3. Deterministically. Correctly. Every time. The LLM only enters at step 4, where its strength—natural language—is needed.&lt;/p&gt;

&lt;p&gt;Registration ties the workflow to the server's existing tools:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight rust"&gt;&lt;code&gt;&lt;span class="nn"&gt;Server&lt;/span&gt;&lt;span class="p"&gt;::&lt;/span&gt;&lt;span class="nf"&gt;builder&lt;/span&gt;&lt;span class="p"&gt;()&lt;/span&gt;
    &lt;span class="nf"&gt;.tool&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="s"&gt;"query_database"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;query_db_tool&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
    &lt;span class="nf"&gt;.tool&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="s"&gt;"aggregate_metrics"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;aggregate_tool&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
    &lt;span class="nf"&gt;.tool&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="s"&gt;"calculate_trends"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;trends_tool&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
    &lt;span class="nf"&gt;.tool&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="s"&gt;"format_output"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;format_tool&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
    &lt;span class="nf"&gt;.resources&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;report_templates&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
    &lt;span class="nf"&gt;.prompt_workflow&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;sales_report&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;&lt;span class="o"&gt;?&lt;/span&gt;  &lt;span class="c1"&gt;// validates bindings and registers as prompt&lt;/span&gt;
    &lt;span class="nf"&gt;.build&lt;/span&gt;&lt;span class="p"&gt;()&lt;/span&gt;&lt;span class="o"&gt;?&lt;/span&gt;
    &lt;span class="nf"&gt;.run_streamable_http&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="s"&gt;"0.0.0.0:3000"&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;&lt;span class="k"&gt;.await&lt;/span&gt;&lt;span class="o"&gt;?&lt;/span&gt;&lt;span class="p"&gt;;&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;Notice that &lt;code&gt;.prompt_workflow()&lt;/code&gt; validates the workflow's bindings at registration time. If you reference a binding that doesn't exist -- say, &lt;code&gt;from_step("sales_data")&lt;/code&gt; with a typo -- you get an error at startup, not a runtime surprise when a user triggers the prompt. The tools you already built become the building blocks. The workflow just orchestrates them.&lt;/p&gt;
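&lt;p&gt;Here is a plain-Rust sketch of the kind of check such registration-time validation performs -- illustrative only, not the SDK's actual implementation. Walking the steps in order, every &lt;code&gt;from_step&lt;/code&gt; reference must name a binding produced by an earlier step:&lt;/p&gt;

```rust
use std::collections::HashSet;

// Illustrative sketch of registration-time binding validation, in the
// spirit of `.prompt_workflow()`. Not the SDK's code: it walks steps in
// order and rejects any reference to a binding no earlier step produced.
struct Step {
    binds: &'static str,           // name this step's output is bound to
    from_steps: Vec<&'static str>, // bindings this step consumes
}

fn validate(steps: &[Step]) -> Result<(), String> {
    let mut known: HashSet<&str> = HashSet::new();
    for step in steps {
        for dep in &step.from_steps {
            if !known.contains(dep) {
                return Err(format!("unknown binding: {dep}"));
            }
        }
        known.insert(step.binds);
    }
    Ok(())
}

fn main() {
    let ok = [
        Step { binds: "sales_data", from_steps: vec![] },
        Step { binds: "aggregated", from_steps: vec!["sales_data"] },
    ];
    assert!(validate(&ok).is_ok());

    // A typo -- "sales_dat" -- fails at startup, not in front of a user.
    let typo = [
        Step { binds: "sales_data", from_steps: vec![] },
        Step { binds: "aggregated", from_steps: vec!["sales_dat"] },
    ];
    assert_eq!(validate(&typo), Err("unknown binding: sales_dat".to_string()));
}
```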

&lt;p&gt;The user clicks one prompt. Three database operations, one aggregation, and one trend calculation happen server-side in milliseconds. The LLM receives the complete data and writes the summary. One click, complex result.&lt;/p&gt;

&lt;h2&gt;
  
  
  Partial Execution Plans: The Server Does What It Can
&lt;/h2&gt;

&lt;p&gt;When a user invokes the weekly sales report prompt, the server doesn't just return instructions. It returns a &lt;em&gt;partial execution plan&lt;/em&gt;: a conversation trace showing what was already done and what remains.&lt;/p&gt;

&lt;p&gt;The server executed steps 1-3 and embedded the actual results. Here's a simplified version of what the client LLM receives:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;Message 1 (User): "Generate weekly sales report for 2026-W12"
Message 2 (Assistant): "Plan: 1. Query sales DB  2. Aggregate  3. Calculate trends  4. Format"
Message 3 (Assistant): "Calling query_database..."
Message 4 (Tool Result): {"total_revenue": 284500, "transactions": 1247, ...}  ← PRE-EXECUTED by server
Message 5 (Assistant): "Calling aggregate_metrics..."
Message 6 (Tool Result): {"categories": [{"name": "Enterprise", "revenue": 142000}, ...]}  ← PRE-EXECUTED by server
Message 7 (Assistant): "Calling calculate_trends..."
Message 8 (Tool Result): {"enterprise": "+12%", "smb": "-3%", "startup": "+28%", ...}  ← PRE-EXECUTED by server
Message 9 (Assistant): "Format the aggregated data into an executive summary for 2026-W12..."
Message 10 (Resource): [weekly-sales template content]
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;Messages 1-8 are done. The tool results (Messages 4, 6, 8) were pre-executed by the server; the LLM didn't call those tools. It receives actual data -- real revenue numbers, real category breakdowns, real trend percentages -- not instructions to fetch that data. The server already queried the database, already aggregated, already calculated. The results are embedded in the conversation trace as if the tools had been called, but no LLM decision-making was involved.&lt;/p&gt;

&lt;p&gt;Message 9 is the guidance for the remaining step. Message 10 is the resource template. The LLM's job is reduced to: take this data, follow this guidance, use this template, write prose. That's one decision (how to write the summary) instead of the dozens of decisions the instruction-only approach requires (which tools to call, in what order, how to handle errors, whether to do the arithmetic itself).&lt;/p&gt;

&lt;p&gt;This is not a template. It's an execution plan where the server has already completed the deterministic portion. The distinction matters: a template says "do these steps." A partial execution plan says, "these steps are done and here are the results, now do the remaining steps." The LLM starts from step 4, not step 1.&lt;/p&gt;
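&lt;p&gt;One way to picture the distinction is as a data model -- a hypothetical sketch, not the MCP wire format: each step in a partial execution plan either carries an embedded result (done) or guidance (pending), and the LLM's remaining decision count is just the number of pending steps:&lt;/p&gt;

```rust
// Conceptual model (illustrative, not the MCP wire format): a partial
// execution plan is a list of steps where the deterministic ones carry
// embedded results and only the rest carry guidance for the LLM.
enum PlanStep {
    Done { result: &'static str },      // pre-executed by the server
    Pending { guidance: &'static str }, // handed off to the LLM
}

fn remaining_llm_decisions(plan: &[PlanStep]) -> usize {
    plan.iter()
        .filter(|s| matches!(s, PlanStep::Pending { .. }))
        .count()
}

fn main() {
    let sales_report = [
        PlanStep::Done { result: r#"{"total_revenue": 284500}"# },
        PlanStep::Done { result: r#"{"categories": ["..."]}"# },
        PlanStep::Done { result: r#"{"enterprise": "+12%"}"# },
        PlanStep::Pending { guidance: "Format into an executive summary" },
    ];
    // One decision left for the LLM, instead of orchestrating all four.
    assert_eq!(remaining_llm_decisions(&sales_report), 1);
}
```

&lt;p&gt;A pure template would be four &lt;code&gt;Pending&lt;/code&gt; entries; a partial execution plan collapses the deterministic prefix into &lt;code&gt;Done&lt;/code&gt; entries with results attached.&lt;/p&gt;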

&lt;p&gt;This is the Capability Square operating at the workflow level. The &lt;strong&gt;server&lt;/strong&gt; handles deterministic computation — its strength. The &lt;strong&gt;LLM&lt;/strong&gt; handles natural language — its strength. The &lt;strong&gt;business analyst&lt;/strong&gt; designed the workflow at design time, identifying which steps are deterministic and which require intelligence — their strength. And the &lt;strong&gt;business user&lt;/strong&gt; invoked it at runtime with the specific parameters (the week, the service, the severity) that only they, living inside the working context, can provide — their strength. All four corners working together, not on a single tool call, but across an entire workflow.&lt;/p&gt;

&lt;p&gt;The compliance improvement is consistent across our internal benchmarks. Instruction-only approaches -- where the prompt simply says "follow these steps: 1. Query the sales DB, 2. Aggregate by category, 3. Calculate trends, 4. Format as a summary" -- leave every decision to the LLM. It might skip steps, reorder them, call different tools, or do the arithmetic itself (badly). Hybrid execution, where steps 1-3 are already done and the LLM only needs to format, dramatically narrows the decision space: far fewer decisions, far fewer failure points, far more reliable output.&lt;/p&gt;

&lt;p&gt;Test this with your own workflows. Take a 4-step process that your team runs weekly. Build it as an instruction-only prompt, then as a SequentialWorkflow with hybrid execution. Run both 20 times. The difference in successful completions will make the case.&lt;/p&gt;

&lt;h2&gt;
  
  
  Incident Response: When the Server Needs the LLM
&lt;/h2&gt;

&lt;p&gt;The sales report workflow was mostly deterministic: three server-executed steps, one LLM step. But not every workflow splits that cleanly. Consider incident response, where the server gathers the data but the LLM does the hard part: synthesis and recommendation.&lt;/p&gt;

&lt;p&gt;A 5-step incident response workflow:&lt;/p&gt;

&lt;ol&gt;
&lt;li&gt;Check service status (server executes -- API call, deterministic)&lt;/li&gt;
&lt;li&gt;Pull recent error logs (server executes -- log query, deterministic)&lt;/li&gt;
&lt;li&gt;Correlate with recent deployments (server executes -- git/deploy history lookup, deterministic)&lt;/li&gt;
&lt;li&gt;Draft incident summary (LLM needed -- synthesis, pattern recognition, writing)&lt;/li&gt;
&lt;li&gt;Suggest mitigation steps (LLM needed -- reasoning about root cause, recommending actions)&lt;/li&gt;
&lt;/ol&gt;

&lt;p&gt;Here's the sketch -- not a full implementation, but enough to see the pattern:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight rust"&gt;&lt;code&gt;&lt;span class="nn"&gt;SequentialWorkflow&lt;/span&gt;&lt;span class="p"&gt;::&lt;/span&gt;&lt;span class="nf"&gt;new&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="s"&gt;"incident_response"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="s"&gt;"Investigate and summarize a service incident"&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
    &lt;span class="nf"&gt;.argument&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="s"&gt;"service"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="s"&gt;"Service name or ID"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="k"&gt;true&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
    &lt;span class="nf"&gt;.argument&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="s"&gt;"severity"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="s"&gt;"Severity level: P1, P2, P3"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="k"&gt;true&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;

    &lt;span class="c1"&gt;// Steps 1-3: Server handles (deterministic data gathering)&lt;/span&gt;
    &lt;span class="nf"&gt;.step&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="cm"&gt;/* check_service_status -- server executes */&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
    &lt;span class="nf"&gt;.step&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="cm"&gt;/* query_error_logs -- server executes */&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
    &lt;span class="nf"&gt;.step&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="cm"&gt;/* check_recent_deploys -- server executes */&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;

    &lt;span class="c1"&gt;// Steps 4-5: LLM handles (intelligence required)&lt;/span&gt;
    &lt;span class="nf"&gt;.step&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;
        &lt;span class="nn"&gt;WorkflowStep&lt;/span&gt;&lt;span class="p"&gt;::&lt;/span&gt;&lt;span class="nf"&gt;new&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="s"&gt;"draft_summary"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="nn"&gt;ToolHandle&lt;/span&gt;&lt;span class="p"&gt;::&lt;/span&gt;&lt;span class="nf"&gt;new&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="s"&gt;"create_incident_report"&lt;/span&gt;&lt;span class="p"&gt;))&lt;/span&gt;
            &lt;span class="nf"&gt;.with_guidance&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;
                &lt;span class="s"&gt;"Analyze the service status, error logs, and deployment history.&lt;/span&gt;&lt;span class="se"&gt;\n&lt;/span&gt;&lt;span class="err"&gt;\&lt;/span&gt;&lt;span class="s"&gt;
                 Draft an incident summary for {service} at severity {severity}.&lt;/span&gt;&lt;span class="se"&gt;\n&lt;/span&gt;&lt;span class="err"&gt;\&lt;/span&gt;&lt;span class="s"&gt;
                 Include: timeline, affected systems, error patterns, and &lt;/span&gt;&lt;span class="err"&gt;\&lt;/span&gt;&lt;span class="s"&gt;
                 correlation with recent deployments."&lt;/span&gt;
            &lt;span class="p"&gt;)&lt;/span&gt;
            &lt;span class="nf"&gt;.arg&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="s"&gt;"status"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="nf"&gt;from_step&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="s"&gt;"service_status"&lt;/span&gt;&lt;span class="p"&gt;))&lt;/span&gt;
            &lt;span class="nf"&gt;.arg&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="s"&gt;"logs"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="nf"&gt;from_step&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="s"&gt;"error_logs"&lt;/span&gt;&lt;span class="p"&gt;))&lt;/span&gt;
            &lt;span class="nf"&gt;.arg&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="s"&gt;"deploys"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="nf"&gt;from_step&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="s"&gt;"deploy_history"&lt;/span&gt;&lt;span class="p"&gt;))&lt;/span&gt;
            &lt;span class="nf"&gt;.bind&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="s"&gt;"summary"&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
    &lt;span class="p"&gt;)&lt;/span&gt;
    &lt;span class="nf"&gt;.step&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;
        &lt;span class="nn"&gt;WorkflowStep&lt;/span&gt;&lt;span class="p"&gt;::&lt;/span&gt;&lt;span class="nf"&gt;new&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="s"&gt;"suggest_mitigation"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="nn"&gt;ToolHandle&lt;/span&gt;&lt;span class="p"&gt;::&lt;/span&gt;&lt;span class="nf"&gt;new&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="s"&gt;"recommend_actions"&lt;/span&gt;&lt;span class="p"&gt;))&lt;/span&gt;
            &lt;span class="nf"&gt;.with_guidance&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;
                &lt;span class="s"&gt;"Based on the incident summary, suggest 2-3 mitigation steps.&lt;/span&gt;&lt;span class="se"&gt;\n&lt;/span&gt;&lt;span class="err"&gt;\&lt;/span&gt;&lt;span class="s"&gt;
                 If the incident correlates with a recent deployment, include &lt;/span&gt;&lt;span class="err"&gt;\&lt;/span&gt;&lt;span class="s"&gt;
                 a rollback recommendation."&lt;/span&gt;
            &lt;span class="p"&gt;)&lt;/span&gt;
            &lt;span class="nf"&gt;.arg&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="s"&gt;"summary"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="nf"&gt;from_step&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="s"&gt;"summary"&lt;/span&gt;&lt;span class="p"&gt;))&lt;/span&gt;
            &lt;span class="nf"&gt;.bind&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="s"&gt;"recommendations"&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
    &lt;span class="p"&gt;)&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;The split is different from the sales report's. The sales report was three server steps and one LLM step -- mostly deterministic. Incident response is three server steps and two LLM steps, because the analysis and the recommendation require genuine intelligence. But the constant is the same: the server gathers all the data the LLM needs before handing off. The LLM doesn't have to figure out which APIs to call or which logs to check. It receives the service status, error logs, and deployment history, then applies its strengths: synthesis and reasoning.&lt;/p&gt;

&lt;p&gt;Notice that step 5 depends on step 4's output (&lt;code&gt;from_step("summary")&lt;/code&gt;). The LLM executes both steps, but the data dependency is explicit in the workflow. The business analyst who designed this workflow decided that the mitigation suggestions should be based on the incident summary rather than the raw data. That's domain knowledge encoded in the workflow structure.&lt;/p&gt;

&lt;p&gt;The partial execution plan for this workflow looks different, too. The server executes steps 1-3 and embeds the results. The LLM receives three steps' worth of data and two steps' worth of guidance. It drafts the summary, then uses that summary to suggest mitigations. The workflow is longer, the LLM does more, but the pattern is identical: the server handles the deterministic parts, the LLM handles the intelligence parts.&lt;/p&gt;

&lt;h2&gt;
  
  
  The Business Analyst's Playbook: Learning What Business Users Need
&lt;/h2&gt;

&lt;p&gt;The weekly sales report and the incident response share something important: someone who understands the organization's workflows designed them. That someone is the &lt;strong&gt;business analyst&lt;/strong&gt; — one of the two human corners of the Capability Square. In a strong enterprise setup, workflow design is domain-led, engineering-implemented, and platform-governed. The analyst shares a domain with the business users they're designing for, and their role doesn't end at tool design. It extends to workflow design: identifying which processes their business users run repeatedly, which steps are deterministic, and where the LLM's intelligence adds value.&lt;/p&gt;

&lt;p&gt;The following diagram illustrates the benefits of adding workflow prompts to MCP servers: they dramatically reduce the effort required from busy business users and significantly increase the completion rate and consistency of requests:&lt;/p&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2F7hdnkp197kssqj3xz7or.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2F7hdnkp197kssqj3xz7or.png" alt="Why to design MCP Prompts" width="800" height="506"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;Here's how to approach workflow prompt design in practice:&lt;/p&gt;

&lt;ol&gt;
&lt;li&gt;&lt;p&gt;&lt;strong&gt;Observe your users.&lt;/strong&gt; What tasks do they repeat weekly? Monthly? What multi-step processes do they describe as "the usual"? These are prompt candidates. Every Monday, the sales team generates a weekly report. Every time there's an outage, the ops team runs the same diagnostic sequence. Every quarter, the finance team reconciles accounts. These are not ad hoc tasks; they are workflows that run on a schedule, with the same steps, for the same reasons.&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;&lt;strong&gt;Identify the deterministic core.&lt;/strong&gt; For each repeating workflow, ask: which steps are always the same? Which steps require judgment? The always-the-same steps become server-executed workflow steps with &lt;code&gt;constant()&lt;/code&gt; and &lt;code&gt;from_step()&lt;/code&gt; bindings. The judgment steps become LLM-guided steps with &lt;code&gt;.with_guidance()&lt;/code&gt;. The sales report's trend calculation is always the same arithmetic. The incident response's mitigation recommendation always requires judgment. The split is usually obvious once you look for it.&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;&lt;strong&gt;Start with one prompt.&lt;/strong&gt; Don't build 20 prompts. Build the one prompt that saves the most time for the most users. Measure its completion rate. Iterate. This mirrors the tool design advice from the &lt;a href="//../01-tool-design/article.md"&gt;previous article&lt;/a&gt;: start with the 20% that serves 80%. For prompts, start with the one workflow your team runs most often.&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;&lt;strong&gt;Connect prompts to tools.&lt;/strong&gt; Prompts don't replace tools -- they orchestrate them. Your existing tools become the building blocks of workflow prompts. A SequentialWorkflow's steps call your tools via &lt;code&gt;ToolHandle&lt;/code&gt;. The &lt;code&gt;query_database&lt;/code&gt;, &lt;code&gt;aggregate_metrics&lt;/code&gt;, and &lt;code&gt;calculate_trends&lt;/code&gt; tools existed independently before the sales report workflow was built. The workflow just wired them together with data flow and execution order.&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;&lt;strong&gt;Iterate based on failure modes.&lt;/strong&gt; If the LLM consistently gets step N wrong, move step N to the server side. If the server can't handle step M because it requires judgment, move it to the LLM with clear guidance. The boundary between deterministic and intelligent steps is not fixed -- it's something you discover through observation and measurement.&lt;/p&gt;&lt;/li&gt;
&lt;/ol&gt;

&lt;p&gt;The business analyst's role is to encode organizational knowledge into the MCP server — knowledge they are qualified to encode precisely because they share a domain with the business users who will invoke it. Tools encode individual capabilities. Prompts encode workflows — the sequences, the data flow, the decision about which steps need human-level intelligence and which don't. You know which workflows matter. You know which steps are deterministic. You know where the LLM's intelligence adds value. Encode that knowledge in prompts.&lt;/p&gt;

&lt;p&gt;Track prompt invocation frequency and completion rates. A prompt that's invoked 50 times a week with 90% completion is saving your team hours of manual orchestration. A prompt that's never invoked is telling you something about your understanding of user needs. Both signals are useful -- one tells you what to optimize, the other tells you what to rethink.&lt;/p&gt;

&lt;p&gt;None of this removes the need for security-by-design. Prompts are not "just UX." They package access to real systems and real workflows. The same controls apply here as in tools: per-request authn and authz, policy checks on downstream operations, audit logs, rate limits, secret isolation, and clear boundaries on which systems the workflow may touch. If a workflow includes code mode, the controls need to be tighter still: validate first, approve when the risk warrants it, and execute only within a constrained sandbox.&lt;/p&gt;

&lt;h2&gt;
  
  
  Resources: The Application-Controlled Plane
&lt;/h2&gt;

&lt;p&gt;We've covered tools (model-controlled) and prompts (user-controlled). The third primitive is resources: application-controlled context that the host application pulls into the conversation.&lt;/p&gt;

&lt;p&gt;Resources are read-only reference material -- documentation, schemas, configuration, templates. They provide context that helps agents make better decisions. Where tools perform actions and prompts orchestrate workflows, resources serve information on request. They are passive: the server publishes them, and the client or prompt reads them when needed.&lt;/p&gt;

&lt;p&gt;Here's a resource using the PMCP SDK:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight rust"&gt;&lt;code&gt;&lt;span class="k"&gt;use&lt;/span&gt; &lt;span class="nn"&gt;pmcp&lt;/span&gt;&lt;span class="p"&gt;::{&lt;/span&gt;&lt;span class="n"&gt;StaticResource&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;ResourceCollection&lt;/span&gt;&lt;span class="p"&gt;};&lt;/span&gt;

&lt;span class="c1"&gt;// Resources provide context data that agents can read before acting.&lt;/span&gt;
&lt;span class="c1"&gt;// Unlike tools (which perform actions) or prompts (which orchestrate workflows),&lt;/span&gt;
&lt;span class="c1"&gt;// resources are passive: they serve information on request.&lt;/span&gt;
&lt;span class="k"&gt;let&lt;/span&gt; &lt;span class="n"&gt;resources&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="nn"&gt;ResourceCollection&lt;/span&gt;&lt;span class="p"&gt;::&lt;/span&gt;&lt;span class="nf"&gt;new&lt;/span&gt;&lt;span class="p"&gt;()&lt;/span&gt;
    &lt;span class="nf"&gt;.add_resource&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;
        &lt;span class="nn"&gt;StaticResource&lt;/span&gt;&lt;span class="p"&gt;::&lt;/span&gt;&lt;span class="nf"&gt;new_text&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;
            &lt;span class="s"&gt;"docs://sales/schema"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
            &lt;span class="s"&gt;"# Sales Database Schema&lt;/span&gt;&lt;span class="se"&gt;\n\n&lt;/span&gt;&lt;span class="err"&gt;\&lt;/span&gt;&lt;span class="s"&gt;
             ## Tables&lt;/span&gt;&lt;span class="se"&gt;\n&lt;/span&gt;&lt;span class="err"&gt;\&lt;/span&gt;&lt;span class="s"&gt;
             - `orders`: order_id, customer_id, total, created_at&lt;/span&gt;&lt;span class="se"&gt;\n&lt;/span&gt;&lt;span class="err"&gt;\&lt;/span&gt;&lt;span class="s"&gt;
             - `products`: product_id, name, category, price&lt;/span&gt;&lt;span class="se"&gt;\n&lt;/span&gt;&lt;span class="err"&gt;\&lt;/span&gt;&lt;span class="s"&gt;
             - `customers`: customer_id, name, email, segment&lt;/span&gt;&lt;span class="se"&gt;\n\n&lt;/span&gt;&lt;span class="err"&gt;\&lt;/span&gt;&lt;span class="s"&gt;
             ## Common Queries&lt;/span&gt;&lt;span class="se"&gt;\n&lt;/span&gt;&lt;span class="err"&gt;\&lt;/span&gt;&lt;span class="s"&gt;
             Weekly sales: GROUP BY date_trunc('week', created_at)&lt;/span&gt;&lt;span class="se"&gt;\n&lt;/span&gt;&lt;span class="err"&gt;\&lt;/span&gt;&lt;span class="s"&gt;
             By category: JOIN products ON orders.product_id = products.product_id"&lt;/span&gt;
        &lt;span class="p"&gt;)&lt;/span&gt;
        &lt;span class="nf"&gt;.with_name&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="s"&gt;"Sales Database Schema"&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
        &lt;span class="nf"&gt;.with_description&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;
            &lt;span class="s"&gt;"Database schema and common query patterns for the sales system. &lt;/span&gt;&lt;span class="err"&gt;\&lt;/span&gt;&lt;span class="s"&gt;
             Read this before constructing database queries."&lt;/span&gt;
        &lt;span class="p"&gt;)&lt;/span&gt;
        &lt;span class="nf"&gt;.with_mime_type&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="s"&gt;"text/markdown"&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
    &lt;span class="p"&gt;);&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;URI design matters. Use scheme prefixes to organize your resources: &lt;code&gt;docs://&lt;/code&gt; for documentation, &lt;code&gt;config://&lt;/code&gt; for configuration, &lt;code&gt;data://&lt;/code&gt; for structured data, &lt;code&gt;template://&lt;/code&gt; for report and output templates. The URI is a stable identifier that clients and prompts reference -- &lt;code&gt;docs://sales/schema&lt;/code&gt; tells both humans and agents what they'll find before reading it.&lt;/p&gt;

&lt;p&gt;The &lt;code&gt;.with_description()&lt;/code&gt; call serves the same purpose as tool descriptions: it helps agents decide whether a resource is relevant before reading its content. A well-described resource lets an agent skip resources it doesn't need, reducing unnecessary context in the conversation.&lt;/p&gt;

&lt;p&gt;Notice how this connects to the weekly sales report prompt. In that workflow, step 4 used &lt;code&gt;.with_resource("template://reports/weekly-sales")&lt;/code&gt; to fetch a report template and embed its content in the conversation trace. Resources provide the context that makes prompts more effective -- the LLM reads the schema to understand the data it's formatting, reads the template to follow the expected output structure. Resources and prompts are designed to work together.&lt;/p&gt;

&lt;h2&gt;
  
  
  The Ecosystem Reality Check
&lt;/h2&gt;

&lt;p&gt;Resources are the least mature of the three MCP primitives in terms of client support. The spec defines them comprehensively -- annotations, subscriptions, URI templates, content types. The PMCP SDK supports them fully. But client implementations lag behind.&lt;/p&gt;

&lt;p&gt;Most MCP clients implement the &lt;code&gt;resources/list&lt;/code&gt; and &lt;code&gt;resources/read&lt;/code&gt; protocol operations, but the user experience varies significantly. Claude Desktop requires users to explicitly select resources from a list. There is no standardized resource picker UI across clients. And critically, resource access is a client-side operation -- the LLM has no built-in way to request a resource the way it can call a tool. Unless the client proactively injects resources into context, or the server wraps resource access as a tool, the LLM never sees them.&lt;/p&gt;

&lt;p&gt;The gap between spec and ecosystem is real. The MCP specification describes a rich resource system with subscriptions for change notifications, URI templates for parameterized access, and annotations for priority and freshness signals. In practice, most clients implement the basics (list and read) and skip the rest. If you build a resource-heavy server today, you're building ahead of client support.&lt;/p&gt;

&lt;p&gt;This doesn't mean you shouldn't build resources. It means you should build them with realistic expectations about how they'll be consumed today, while designing for where the ecosystem is headed. The patterns in the next section bridge the gap.&lt;/p&gt;

&lt;h2&gt;
  
  
  Pragmatic Bridge Patterns: Making Resources Work Today
&lt;/h2&gt;

&lt;p&gt;Four patterns let you get value from resources today, regardless of client support.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;1. Wrap resources as tools&lt;/strong&gt; (most reliable today). Instead of serving a resource at &lt;code&gt;docs://sales/schema&lt;/code&gt;, create a &lt;code&gt;get_sales_schema&lt;/code&gt; tool that returns the same content. The LLM discovers and calls tools reliably -- this is the pragmatic path when you need agents to access reference data without depending on client resource support.&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight rust"&gt;&lt;code&gt;&lt;span class="c1"&gt;// Bridge pattern: expose resource content as a tool.&lt;/span&gt;
&lt;span class="c1"&gt;// Until clients reliably handle resources, tools are the safe path.&lt;/span&gt;
&lt;span class="nf"&gt;.tool&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="s"&gt;"get_sales_schema"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="cm"&gt;/* returns the same content as docs://sales/schema */&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;This isn't elegant, but it works everywhere. You can maintain both the resource (for clients that support it) and the tool wrapper (for clients that don't), serving the same underlying content through both channels.&lt;/p&gt;
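&lt;p&gt;A sketch of that dual-channel setup, in plain Rust rather than the PMCP API -- the function names (&lt;code&gt;read_resource&lt;/code&gt;, &lt;code&gt;get_sales_schema_tool&lt;/code&gt;) are hypothetical. The point is a single source of truth served through both channels, so the two copies can never drift apart:&lt;/p&gt;

```rust
// Bridge-pattern sketch (illustrative, not the PMCP API): keep one
// source of truth and serve it through both channels -- as a resource
// for clients that support resources, and as a tool result for the rest.
const SALES_SCHEMA: &str =
    "# Sales Database Schema\n- orders\n- products\n- customers";

// What a resource handler for docs://sales/schema would return.
fn read_resource(uri: &str) -> Option<&'static str> {
    match uri {
        "docs://sales/schema" => Some(SALES_SCHEMA),
        _ => None,
    }
}

// What the get_sales_schema tool wrapper would return.
fn get_sales_schema_tool() -> &'static str {
    SALES_SCHEMA
}

fn main() {
    // Both channels serve identical content, so neither can go stale.
    assert_eq!(read_resource("docs://sales/schema"), Some(get_sales_schema_tool()));
}
```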

&lt;p&gt;&lt;strong&gt;2. Resource templates as parameterized access.&lt;/strong&gt; URI templates like &lt;code&gt;docs://reports/{report_type}&lt;/code&gt; let the server generate URIs from parameters. When clients support resource templates, they can offer auto-complete for resource URIs -- the user types &lt;code&gt;docs://reports/&lt;/code&gt; and sees available report types. This pattern is worth implementing now because it costs nothing extra and will work well as clients catch up.&lt;/p&gt;
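&lt;p&gt;The mechanics of template expansion are simple enough to sketch in a few lines of plain Rust -- this is an illustration of the idea, not the PMCP SDK's template API:&lt;/p&gt;

```rust
// Illustrative sketch of URI-template expansion (plain Rust, not the
// PMCP API): substitute each {param} in the template with its value.
use std::collections::HashMap;

fn expand(template: &str, params: &HashMap<&str, &str>) -> String {
    let mut uri = template.to_string();
    for (key, value) in params {
        uri = uri.replace(&format!("{{{key}}}"), value);
    }
    uri
}

fn main() {
    let params = HashMap::from([("report_type", "weekly-sales")]);
    let uri = expand("docs://reports/{report_type}", &params);
    assert_eq!(uri, "docs://reports/weekly-sales");
}
```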

&lt;p&gt;&lt;strong&gt;3. Prompt-mediated resource loading.&lt;/strong&gt; This is the pattern we already saw: &lt;code&gt;.with_resource(uri)&lt;/code&gt; in SequentialWorkflow steps. The server fetches the resource during prompt execution and embeds it in the conversation. This works today because it doesn't depend on client resource support at all -- the server handles the resource loading internally, and the client just sees the content in the prompt messages.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;4. Subscribe and automatic injection&lt;/strong&gt; (future pattern). Clients can subscribe to resource changes via &lt;code&gt;resources/subscribe&lt;/code&gt;. When the resource updates, the server sends a notification, and the client can refresh its context. This enables "always up-to-date context" without manual polling -- imagine an agent that automatically gets the latest API schema whenever it changes. This is where resources are headed. When client support catches up, automatic resource injection will make context management seamless.&lt;/p&gt;

&lt;p&gt;Build your resources now. Use bridge patterns for today's clients. As the ecosystem matures, your resources will work natively -- and you'll already have the content, the URIs, and the descriptions in place.&lt;/p&gt;

&lt;h2&gt;
  
  
  Key Takeaways
&lt;/h2&gt;

&lt;ol&gt;
&lt;li&gt;&lt;p&gt;&lt;strong&gt;Three control planes, three primitives.&lt;/strong&gt; Tools are model-controlled (the LLM decides). Prompts are user-controlled (the human decides). Resources are application-controlled (the host decides). Knowing which to use is the first design decision for any MCP capability.&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;&lt;strong&gt;Prompts solve the workflow reliability problem.&lt;/strong&gt; For known, repeatable workflows, hybrid execution -- where the server handles deterministic steps and the LLM handles intelligence -- consistently outperforms instruction-only orchestration in our benchmarks. Each party does what it's built for.&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;&lt;strong&gt;Partial execution plans are the key differentiator.&lt;/strong&gt; A prompt doesn't just send instructions. It returns a conversation trace with completed tool results, guidance for remaining steps, and embedded resource content. The LLM receives data, not directions.&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;&lt;strong&gt;The business analyst designs workflows, not just tools.&lt;/strong&gt; Observe which tasks your business users repeat. Identify the deterministic core. Package it as a SequentialWorkflow. Start with one prompt for your team's most common workflow and measure its completion rate. This is the handoff between the two human corners of the square: the analyst encodes once at design time, the business user triggers many times at runtime.&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;&lt;strong&gt;Resources are underbuilt but worth building.&lt;/strong&gt; Client support is thin today. Use bridge patterns -- wrap as tools, prompt-mediated loading -- for immediate value. Design for where the ecosystem is going, and your resources will be ready when clients catch up.&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;&lt;strong&gt;Tasks are the explicit exception to the stateless rule.&lt;/strong&gt; Most MCP interactions should stay stateless. When work outlives a single request, model it as a task with persisted state, progress tracking, and clear completion semantics instead of smuggling session state into the server process.&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;&lt;strong&gt;Prompts and tools are complementary.&lt;/strong&gt; Prompts orchestrate tools. Your existing tools become the building blocks of workflow prompts. Good tool design (covered in the previous article in this series) makes good prompt design possible.&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;&lt;strong&gt;Measure prompt completion rates.&lt;/strong&gt; Track invocation frequency and success across diverse users. If a prompt is never invoked, your understanding of user needs may be wrong. If it fails consistently at step N, move step N server-side. Both signals guide iteration.&lt;/p&gt;&lt;/li&gt;
&lt;/ol&gt;
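&lt;p&gt;The completion-rate measurement in takeaway 8 needs nothing exotic. A minimal sketch, assuming a flat log of &lt;code&gt;(prompt_name, succeeded)&lt;/code&gt; records; the record shape is invented for illustration:&lt;/p&gt;

```rust
// Minimal sketch: per-prompt completion rate from invocation logs.
// The (prompt_name, succeeded) record shape is illustrative.
fn completion_rate(log: &[(&str, bool)], prompt: &str) -> Option<f64> {
    let mut invoked = 0u32;
    let mut succeeded = 0u32;
    for (name, ok) in log {
        if *name == prompt {
            invoked += 1;
            if *ok {
                succeeded += 1;
            }
        }
    }
    if invoked == 0 {
        None // never invoked: a design signal, not a 0% rate
    } else {
        Some(f64::from(succeeded) / f64::from(invoked))
    }
}

fn main() {
    let log = [
        ("weekly_sales_report", true),
        ("weekly_sales_report", true),
        ("weekly_sales_report", false),
        ("inventory_audit", true),
    ];
    // 2 of 3 weekly_sales_report runs completed.
    assert_eq!(completion_rate(&log, "weekly_sales_report"), Some(2.0 / 3.0));
    // A prompt that never appears in the log is a different signal entirely.
    assert_eq!(completion_rate(&log, "quarterly_forecast"), None);
}
```

&lt;p&gt;Distinguishing "never invoked" from "invoked but failing" matters: the first says the workflow you designed isn't the one users need; the second says a specific step belongs server-side.&lt;/p&gt;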

&lt;h2&gt;Continue the Series&lt;/h2&gt;

&lt;p&gt;This article covered prompts and resources -- the primitives that turn individual tools into reliable workflows and ambient context. The rest of the series goes deeper.&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;
&lt;strong&gt;Want to test your server?&lt;/strong&gt; See &lt;strong&gt;Testing MCP Servers&lt;/strong&gt; for unit testing, integration testing, and description quality validation.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Concerned about security?&lt;/strong&gt; &lt;strong&gt;MCP Security&lt;/strong&gt; covers OAuth 2.1, input validation, and the common vulnerabilities that affect MCP servers.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Building from an existing API spec?&lt;/strong&gt; &lt;strong&gt;Schema-Driven MCP Servers&lt;/strong&gt; shows the generate-then-prune workflow for going from OpenAPI spec to curated tool set.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Need interactive UI?&lt;/strong&gt; &lt;strong&gt;MCP Apps&lt;/strong&gt; covers building MCP Apps with UI widgets for rich agent experiences.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Interested in code mode?&lt;/strong&gt; &lt;strong&gt;Code Mode for MCP&lt;/strong&gt; explores the two-step &lt;code&gt;validate_code&lt;/code&gt; then &lt;code&gt;execute_code&lt;/code&gt; flow, including policy analysis, risk scoring, human approval, and sandboxed execution for the long tail of requests.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Need long-running execution?&lt;/strong&gt; &lt;strong&gt;Tasks for MCP&lt;/strong&gt; covers the explicit task model for work that should not happen inside a single stateless request.&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;For hands-on practice with these patterns, the &lt;a href="https://advanced-mcp-course.us-east.true-mcp.com/landing" rel="noopener noreferrer"&gt;Advanced MCP course&lt;/a&gt; provides guided exercises building production MCP servers in Rust with the PMCP SDK.&lt;/p&gt;

</description>
      <category>mcp</category>
      <category>ai</category>
      <category>rust</category>
      <category>llm</category>
    </item>
    <item>
      <title>MCP Tool Design: Why Your AI Agent Is Failing (And How to Fix It)</title>
      <dc:creator>Guy</dc:creator>
      <pubDate>Wed, 18 Mar 2026 23:57:57 +0000</pubDate>
      <link>https://dev.to/aws-heroes/mcp-tool-design-why-your-ai-agent-is-failing-and-how-to-fix-it-40fc</link>
      <guid>https://dev.to/aws-heroes/mcp-tool-design-why-your-ai-agent-is-failing-and-how-to-fix-it-40fc</guid>
<description>&lt;h2&gt;The Reports of MCP's Death Have Been Greatly Exaggerated&lt;/h2&gt;

&lt;p&gt;Scroll through developer forums in early 2026, and you'll find a recurring theme: MCP is dead. The takes range from dismissive ("just a fad") to resigned ("we tried it, our agents kept failing"). And the frustrations behind them are real. Teams are building MCP servers with 50+ tools, watching their agents stumble through tool selection, and concluding that the protocol itself is broken.&lt;/p&gt;

&lt;p&gt;It isn't. MCP isn't dead; it's being used poorly. And the evidence for how to use it well is now overwhelming.&lt;/p&gt;

&lt;p&gt;Over the past year, teams at GitHub, Block, and dozens of smaller shops have converged on the same set of principles. &lt;a href="https://github.blog/ai-and-ml/github-copilot/how-were-making-github-copilot-smarter-with-fewer-tools/" rel="noopener noreferrer"&gt;GitHub Copilot cut its tool count from 40 to 13&lt;/a&gt; and saw measurable benchmark improvements. &lt;a href="https://engineering.block.xyz/blog/blocks-playbook-for-designing-mcp-servers" rel="noopener noreferrer"&gt;Block rebuilt its Linear MCP server three times&lt;/a&gt;, going from 30+ tools to just 2. The pattern is consistent: fewer tools, better descriptions, outcome-oriented design. The problem isn't the protocol. It's tool design.&lt;/p&gt;

&lt;p&gt;This article lays out the framework. We'll start with the mental model that makes everything else click, the Capability Square, then walk through the anatomy of a well-designed tool. Subsequent articles in this series cover the quantitative evidence, description quality, and the anti-patterns that cause most failures.&lt;/p&gt;

&lt;h2&gt;What Is MCP? (The 30-Second Version)&lt;/h2&gt;

&lt;p&gt;The Model Context Protocol (MCP) is an open protocol that connects AI models to external tools and data sources. The simplest way to think about it: websites and mobile apps are the interface between humans and online services. MCP is the interface between AI and those same services. Over decades, we've invested heavily in improving human interfaces, including the iPhone's gesture language, years of UX research, accessibility standards, and usability testing. AI needs the same investment in its interface to online services. MCP is that interface, and tool design is its UX discipline.&lt;/p&gt;

&lt;p&gt;One of our clients came to us with exactly this gap. They wanted AI agents to operate their web forms: filling in fields, clicking buttons, navigating multi-step workflows through a browser. They asked us to run tests evaluating how well browser-based agents could complete their online forms, and to help "fix" the forms for agent compatibility. We explained that this was significant effort in the wrong direction. Their web forms were designed for humans, with visual layout, hover states, and drag-and-drop interactions. Instead, we showed them that adding an MCP server to the same API sitting behind those forms gave AI agents a native interface purpose-built for how they work: structured inputs, clear descriptions, typed responses. The agents went from struggling with form fields to completing tasks reliably. The lesson: don't retrofit human interfaces for AI. Build AI-native interfaces alongside them: MCP servers for your internal and external services.&lt;/p&gt;

&lt;p&gt;The parallels between UX design and MCP tool design run deep. Decades of UX research have produced principles that transfer directly:&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Affordance&lt;/strong&gt;, the idea that a door handle should look pullable, maps to tool names and parameter descriptions: if a field is named &lt;code&gt;id&lt;/code&gt; but requires a UUID, the affordance is broken.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Recognition over recall&lt;/strong&gt;, the observation that it's easier to pick from a list than to type from memory, maps to using enums and example values in schemas so the LLM recognizes valid inputs instead of guessing.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Visibility of system status&lt;/strong&gt;, the principle that users need feedback when something goes wrong, maps to error messages that explain what happened and how to fix it, rather than a cryptic "invalid input." These aren't metaphors. They're the same design discipline applied to a different kind of user.&lt;/p&gt;
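&lt;p&gt;Applied to code, the last principle looks like this. A hedged sketch, with the SKU format rule and the error text invented for illustration:&lt;/p&gt;

```rust
// Visibility of system status, applied to tool errors.
// The SKU format rule here is invented for illustration.
fn validate_sku(sku: &str) -> Result<(), String> {
    let well_formed = sku.len() <= 16
        && sku.contains('-')
        && sku
            .chars()
            .all(|c| c.is_ascii_uppercase() || c.is_ascii_digit() || c == '-');
    if well_formed {
        Ok(())
    } else {
        // Bad:  "invalid input"
        // Good: what happened, what was received, how to fix it.
        Err(format!(
            "Unknown SKU format: '{}'. SKUs are uppercase letters and digits \
             separated by a hyphen, max 16 characters (e.g. 'WIDGET-42'). \
             Check the product catalog for the exact SKU.",
            sku
        ))
    }
}

fn main() {
    assert!(validate_sku("WIDGET-42").is_ok());
    let err = validate_sku("widget 42").unwrap_err();
    // The fix is recognizable in the message, not recalled from memory.
    assert!(err.contains("WIDGET-42"));
}
```

&lt;p&gt;The second message costs one &lt;code&gt;format!&lt;/code&gt; call and gives the LLM something it can act on instead of guessing.&lt;/p&gt;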

&lt;h2&gt;The Capability Square: Four Parties, One Tool&lt;/h2&gt;

&lt;p&gt;Even if you've been building MCP servers for months, don't skip this section. The Capability Square reframes tool design around two parties that most MCP discussions collapse into one or ignore entirely: the &lt;strong&gt;business analyst&lt;/strong&gt; who designs the server, and the &lt;strong&gt;business user&lt;/strong&gt; who actually invokes it. Both are domain experts — they know the business the server operates in — but they show up at different times. The analyst shows up at design time, encoding domain knowledge into tool names, descriptions, and schemas. The business user shows up at runtime, asking the questions the server was built to answer. Every MCP tool sits at the intersection of four parties, each with distinct strengths and weaknesses. Understanding this balance is the foundation of good tool design.&lt;/p&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2F5aw3k68ah90pohjiq38c.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2F5aw3k68ah90pohjiq38c.png" alt="The Capability Square: Four Parties, One Tool" width="800" height="533"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;h3&gt;The LLM (MCP Client)&lt;/h3&gt;

&lt;p&gt;A large language model (LLM) is part of each MCP client, such as ChatGPT, Claude Desktop, or a custom agent. It brings language understanding, reasoning, and tool-calling intelligence. It's good at interpreting ambiguous user requests ("where's my package?"), choosing between available tools, composing multi-step plans, and recovering gracefully from errors.&lt;/p&gt;

&lt;p&gt;What it's bad at: domain knowledge and symbolic computation. An LLM doesn't know which API capabilities matter for your specific users, and it can't access your databases. It doesn't know that your customer support team needs order tracking but never touches inventory management. It doesn't know your compliance requirements or your business rules.&lt;/p&gt;

&lt;h3&gt;The MCP Server&lt;/h3&gt;

&lt;p&gt;The server provides symbolic computation, data access, and validated operations. It's good at precise calculations, database queries, API calls with proper authentication, input validation, and returning structured results. It runs deterministically, and it is more predictable and easier to validate than LLM reasoning.&lt;/p&gt;

&lt;p&gt;What it's bad at: understanding user intent. A server can't interpret "check if we have enough widgets for the Johnson order" without a tool specifically designed for that workflow. It doesn't adapt to ambiguity. It does exactly what it's told, nothing more.&lt;/p&gt;

&lt;h3&gt;The Business Analyst (Design-Time Domain Expert)&lt;/h3&gt;

&lt;p&gt;This is the party that's most often overlooked, and it's the one that shapes everything the LLM ever sees. Critically, this is &lt;strong&gt;not&lt;/strong&gt; a technical role. The best server designer is not the software developer, not the platform engineer, not the IT team — it is a &lt;strong&gt;domain-expert business analyst&lt;/strong&gt;: the product manager, the operations lead, the subject-matter expert, or the analyst who sits closest to users and their workflows. They may pair with engineers who implement the server, but the design decisions — which tools to expose, what to name them, how to describe them, when they should be used — belong to someone fluent in the &lt;em&gt;business domain&lt;/em&gt;, not the underlying API. Technical fluency is not a substitute for domain fluency, and handing MCP design to the team that happens to own the codebase is one of the most common and expensive mistakes in this space.&lt;/p&gt;

&lt;p&gt;The business analyst brings knowledge that neither the LLM nor the server possesses. They know which 20% of an API serves 80% of actual requests. They understand the user personas and their existing processes. They know the vocabulary their users speak in, the business rules that constrain what "correct" means, and the compliance requirements that shape the edges. They know that the customer support team needs order tracking but never touches inventory management. They know the business context.&lt;/p&gt;

&lt;p&gt;What they're bad at: &lt;strong&gt;being present at runtime&lt;/strong&gt;. The business analyst's knowledge has to be encoded into the tool's name, description, schema, and error messages before the first business user ever connects. Every design choice is a message to the LLM about how to use the tool.&lt;/p&gt;

&lt;p&gt;But "not present at runtime" doesn't mean "design it and walk away." Tool design is iterative. Your first design is a hypothesis about what your business users need, and like any hypothesis, it needs validation. Usage logs tell you which tools are called, which fail, which are never used, and which requests produce no tool match at all. The business analyst reviews these logs and refines: renaming tools that confuse the LLM, improving descriptions that lead to wrong selections, and adding tools for workflows that users need but the initial design missed.&lt;/p&gt;

&lt;p&gt;This iterative loop is where MCP shines compared to direct API integration. Changing a tool's name, description, or input schema is a server-side change, with no client updates, no SDK version bumps, no breaking changes propagated to consumers. The MCP protocol decouples tool discovery from tool invocation, so the LLM rediscovers the improved schema on the next connection. This makes the feedback cycle fast: observe failures, update the tool design, deploy, and measure again. Teams that treat tool design as a one-time exercise miss the biggest advantage of having MCP in the middle. You should put effort into designing the MCP server correctly, as "You never get a second chance to make a first impression." However, you should continue to monitor the MCP server usage logs to adjust to the usage patterns of real business users.&lt;/p&gt;
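&lt;p&gt;The decoupling is visible at the protocol level: tool names, descriptions, and schemas travel in the response to &lt;code&gt;tools/list&lt;/code&gt;, so an improved description reaches every client on its next discovery call with no client-side change. The tool shown below is illustrative; the envelope follows the MCP specification's &lt;code&gt;tools/list&lt;/code&gt; result shape:&lt;/p&gt;

```json
{
  "jsonrpc": "2.0",
  "id": 3,
  "result": {
    "tools": [
      {
        "name": "check_inventory",
        "description": "Check inventory levels for a product by SKU. ...",
        "inputSchema": {
          "type": "object",
          "properties": { "sku": { "type": "string" } },
          "required": ["sku"]
        }
      }
    ]
  }
}
```

&lt;p&gt;Rename the tool or rewrite its description on the server, and every client picks up the change on its next &lt;code&gt;tools/list&lt;/code&gt; call.&lt;/p&gt;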

&lt;h3&gt;The Business User (Runtime Domain Expert)&lt;/h3&gt;

&lt;p&gt;The business user is the person who actually opens the MCP client and asks the question. They share the business analyst's domain — a customer support rep, a warehouse manager, a financial analyst, a clinician, an operations planner — but their expertise shows up at runtime, not design time. They bring the one thing no one else in the square possesses when a request is actually being made: &lt;strong&gt;the specific intent behind this specific request, in this specific business context, right now&lt;/strong&gt;. "The Johnson order." "The Q3 reconciliation." "The East Coast warehouse." These references mean nothing to the LLM or the server on their own; they only make sense because the business user lives inside a working context that the design-time parties can't fully predict.&lt;/p&gt;

&lt;p&gt;What they're good at: knowing what they actually want, recognizing a wrong answer when they see it, and framing requests in domain language. What they're bad at — and what a well-designed server should protect them from — are the things the analyst has already solved for them: they shouldn't have to know which tool to pick, which parameters to format, or which API endpoint underlies the answer. A well-designed MCP server makes the business user's domain fluency sufficient; they describe the outcome in their own vocabulary, and the system handles the rest.&lt;/p&gt;

&lt;p&gt;This is why the two human corners of the square must share a domain. If the business analyst designs tools for a persona they don't understand, no amount of schema polish will save the server: the vocabulary will be wrong, the outcomes won't match real requests, and the "obvious" tool for a given question won't exist. The tightest MCP servers are those where the analyst either &lt;em&gt;is&lt;/em&gt; a business user (dogfooding) or spends significant time watching them work. The feedback channel between the two human corners — usage logs, failed requests, "no tool matched" events — is the mechanism that keeps them aligned as users and workflows evolve.&lt;/p&gt;

&lt;h3&gt;Why the Square Matters&lt;/h3&gt;

&lt;p&gt;Each corner of the square compensates for the others' weaknesses, and each lives at a different point in time. &lt;strong&gt;At design time,&lt;/strong&gt; the business analyst encodes domain context into the server's tools, descriptions, and schemas — knowledge that neither the LLM nor the server possesses on its own. &lt;strong&gt;At runtime,&lt;/strong&gt; the business user brings the specific intent behind a specific request — the thing the analyst could not predict in advance. The LLM translates that intent into a tool call, interpreting ambiguity that the server can't. The server executes with a precision the LLM can't match. No single corner can carry the system; remove any one, and task completion collapses.&lt;/p&gt;

&lt;p&gt;This has a practical consequence that trips up most teams: the same API should produce different MCP servers for different business users.&lt;/p&gt;

&lt;p&gt;Consider the London Transit API. A daily commuter wants trip planning: "fastest route from Paddington to Canary Wharf, avoiding the Jubilee line." An event organizer wants logistics: "How many bus routes serve Wembley Stadium, and what's the last departure after a 10 PM concert?" A municipal planner wants a construction impact analysis: "If we close three stations on the Northern line for six weeks, which bus routes need capacity increases?"&lt;/p&gt;

&lt;p&gt;Same API. Three completely different MCP servers. Three different sets of tools, with different names, different descriptions, and different response shapes, because the business analyst for each server shares a domain with their business users and knows how those users actually frame their requests.&lt;/p&gt;

&lt;p&gt;Here's the key insight: when you ask an LLM to auto-wrap an API, it lacks this domain context. It can't know which 20% matters because it doesn't know who the user is. Auto-generated MCP servers produce generic tool sets that serve no one well. The business analyst's judgment — encoded in tool selection, naming, and descriptions — is what makes an MCP server effective, and that judgment exists only because the analyst understands the &lt;em&gt;business&lt;/em&gt;, not just the API.&lt;/p&gt;

&lt;p&gt;How do you know your square is balanced? Measure task completion across the specific requests your business users actually make, not only the three test cases you tried during development. If completion is low, one corner is weak:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;
&lt;strong&gt;The LLM can't understand your tools&lt;/strong&gt; → fix names, descriptions, and schemas.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;The server can't handle the requests&lt;/strong&gt; → add or redesign tools, or move orchestration server-side.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;The business analyst chose the wrong tools to expose&lt;/strong&gt; → the design doesn't match what users actually ask for; re-observe the real workflows and re-prioritize.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;The business user's vocabulary doesn't match the server's&lt;/strong&gt; → the analyst built for a different persona, or the shared-domain assumption was wrong, and a technical team ended up making design calls they weren't qualified to make.&lt;/li&gt;
&lt;/ul&gt;

&lt;h2&gt;Tool Anatomy: What Makes an MCP Tool&lt;/h2&gt;

&lt;p&gt;An MCP tool has six components: a name, a description, an input schema, an output schema, a handler, and error handling. Each one is a communication channel to the LLM, and each one matters.&lt;/p&gt;

&lt;p&gt;Here's a complete tool written in Rust that uses the PMCP SDK. Don't worry if you're not fluent in Rust, as the comments walk through every important line:&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight rust"&gt;&lt;code&gt;&lt;span class="c1"&gt;// -- Dependencies --&lt;/span&gt;
&lt;span class="c1"&gt;// pmcp: the PMCP SDK for building MCP servers&lt;/span&gt;
&lt;span class="c1"&gt;// serde: serialization/deserialization (parses JSON input, formats JSON output)&lt;/span&gt;
&lt;span class="c1"&gt;// schemars: generates JSON Schema from Rust types (so the LLM knows what to send)&lt;/span&gt;
&lt;span class="k"&gt;use&lt;/span&gt; &lt;span class="nn"&gt;pmcp&lt;/span&gt;&lt;span class="p"&gt;::&lt;/span&gt;&lt;span class="nn"&gt;server&lt;/span&gt;&lt;span class="p"&gt;::&lt;/span&gt;&lt;span class="nn"&gt;typed_tool&lt;/span&gt;&lt;span class="p"&gt;::&lt;/span&gt;&lt;span class="n"&gt;TypedToolWithOutput&lt;/span&gt;&lt;span class="p"&gt;;&lt;/span&gt;
&lt;span class="k"&gt;use&lt;/span&gt; &lt;span class="nn"&gt;pmcp&lt;/span&gt;&lt;span class="p"&gt;::&lt;/span&gt;&lt;span class="n"&gt;RequestHandlerExtra&lt;/span&gt;&lt;span class="p"&gt;;&lt;/span&gt;
&lt;span class="k"&gt;use&lt;/span&gt; &lt;span class="nn"&gt;serde&lt;/span&gt;&lt;span class="p"&gt;::{&lt;/span&gt;&lt;span class="n"&gt;Deserialize&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;Serialize&lt;/span&gt;&lt;span class="p"&gt;};&lt;/span&gt;
&lt;span class="k"&gt;use&lt;/span&gt; &lt;span class="nn"&gt;schemars&lt;/span&gt;&lt;span class="p"&gt;::&lt;/span&gt;&lt;span class="n"&gt;JsonSchema&lt;/span&gt;&lt;span class="p"&gt;;&lt;/span&gt;

&lt;span class="c1"&gt;// -- Input Schema --&lt;/span&gt;
&lt;span class="c1"&gt;// This struct defines what the LLM must send. Each field becomes a property&lt;/span&gt;
&lt;span class="c1"&gt;// in the JSON Schema that the LLM sees when it discovers this tool.&lt;/span&gt;
&lt;span class="c1"&gt;// The doc comments (///) become the schema descriptions automatically.&lt;/span&gt;
&lt;span class="c1"&gt;//&lt;/span&gt;
&lt;span class="c1"&gt;// Annotations on each field define constraints that flow into the&lt;/span&gt;
&lt;span class="c1"&gt;// JSON Schema. The LLM sees "maxLength": 16 on the SKU field and&lt;/span&gt;
&lt;span class="c1"&gt;// "minimum": 1 on quantity BEFORE it calls the tool. A well-behaved&lt;/span&gt;
&lt;span class="c1"&gt;// client respects these; the server enforces them at runtime too.&lt;/span&gt;
&lt;span class="c1"&gt;// deny_unknown_fields rejects any extra fields the LLM might add.&lt;/span&gt;
&lt;span class="nd"&gt;#[derive(Debug,&lt;/span&gt; &lt;span class="nd"&gt;Deserialize,&lt;/span&gt; &lt;span class="nd"&gt;JsonSchema)]&lt;/span&gt;
&lt;span class="nd"&gt;#[schemars(deny_unknown_fields)]&lt;/span&gt;
&lt;span class="k"&gt;struct&lt;/span&gt; &lt;span class="n"&gt;CheckInventoryInput&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt;
    &lt;span class="cd"&gt;/// Product SKU to look up (e.g., "WIDGET-42", "BOLT-7")&lt;/span&gt;
    &lt;span class="nd"&gt;#[schemars(length(max&lt;/span&gt; &lt;span class="nd"&gt;=&lt;/span&gt; &lt;span class="mi"&gt;16&lt;/span&gt;&lt;span class="nd"&gt;))]&lt;/span&gt;
    &lt;span class="n"&gt;sku&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="nb"&gt;String&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;

    &lt;span class="cd"&gt;/// Number of items needed. Defaults to 1 if not specified.&lt;/span&gt;
    &lt;span class="cd"&gt;/// Use this to check whether a specific quantity is available&lt;/span&gt;
    &lt;span class="cd"&gt;/// before quoting delivery dates.&lt;/span&gt;
    &lt;span class="nd"&gt;#[serde(default&lt;/span&gt; &lt;span class="nd"&gt;=&lt;/span&gt; &lt;span class="s"&gt;"default_quantity"&lt;/span&gt;&lt;span class="nd"&gt;)]&lt;/span&gt;
    &lt;span class="nd"&gt;#[schemars(range(min&lt;/span&gt; &lt;span class="nd"&gt;=&lt;/span&gt; &lt;span class="mi"&gt;1&lt;/span&gt;&lt;span class="nd"&gt;,&lt;/span&gt; &lt;span class="nd"&gt;max&lt;/span&gt; &lt;span class="nd"&gt;=&lt;/span&gt; &lt;span class="mi"&gt;10000&lt;/span&gt;&lt;span class="nd"&gt;))]&lt;/span&gt;
    &lt;span class="n"&gt;quantity_needed&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="nb"&gt;u32&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
&lt;span class="p"&gt;}&lt;/span&gt;

&lt;span class="c1"&gt;// Default value: if the LLM doesn't specify a quantity, assume 1&lt;/span&gt;
&lt;span class="k"&gt;fn&lt;/span&gt; &lt;span class="nf"&gt;default_quantity&lt;/span&gt;&lt;span class="p"&gt;()&lt;/span&gt; &lt;span class="k"&gt;-&amp;gt;&lt;/span&gt; &lt;span class="nb"&gt;u32&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt; &lt;span class="mi"&gt;1&lt;/span&gt; &lt;span class="p"&gt;}&lt;/span&gt;

&lt;span class="c1"&gt;// -- Output Schema --&lt;/span&gt;
&lt;span class="c1"&gt;// Defining the output shape serves two purposes:&lt;/span&gt;
&lt;span class="c1"&gt;// 1. The LLM knows exactly what fields to expect in the response&lt;/span&gt;
&lt;span class="c1"&gt;// 2. Downstream tools or MCP Apps can rely on this structure&lt;/span&gt;
&lt;span class="nd"&gt;#[derive(Debug,&lt;/span&gt; &lt;span class="nd"&gt;Serialize,&lt;/span&gt; &lt;span class="nd"&gt;JsonSchema)]&lt;/span&gt;
&lt;span class="k"&gt;struct&lt;/span&gt; &lt;span class="n"&gt;InventoryResult&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt;
    &lt;span class="cd"&gt;/// The product SKU that was checked&lt;/span&gt;
    &lt;span class="n"&gt;sku&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="nb"&gt;String&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
    &lt;span class="cd"&gt;/// Whether the requested quantity is currently in stock&lt;/span&gt;
    &lt;span class="n"&gt;in_stock&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="nb"&gt;bool&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
    &lt;span class="cd"&gt;/// Total quantity available in warehouse&lt;/span&gt;
    &lt;span class="n"&gt;available&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="nb"&gt;u32&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
    &lt;span class="cd"&gt;/// Whether the requested quantity can be fulfilled&lt;/span&gt;
    &lt;span class="n"&gt;sufficient&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="nb"&gt;bool&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
&lt;span class="p"&gt;}&lt;/span&gt;

&lt;span class="c1"&gt;// -- Register the tool with the server --&lt;/span&gt;
&lt;span class="nd"&gt;#[tokio::main]&lt;/span&gt;
&lt;span class="k"&gt;async&lt;/span&gt; &lt;span class="k"&gt;fn&lt;/span&gt; &lt;span class="nf"&gt;main&lt;/span&gt;&lt;span class="p"&gt;()&lt;/span&gt; &lt;span class="k"&gt;-&amp;gt;&lt;/span&gt; &lt;span class="nn"&gt;anyhow&lt;/span&gt;&lt;span class="p"&gt;::&lt;/span&gt;&lt;span class="nb"&gt;Result&lt;/span&gt;&lt;span class="o"&gt;&amp;lt;&lt;/span&gt;&lt;span class="p"&gt;()&lt;/span&gt;&lt;span class="o"&gt;&amp;gt;&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt;
    &lt;span class="k"&gt;let&lt;/span&gt; &lt;span class="n"&gt;server&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="nn"&gt;pmcp&lt;/span&gt;&lt;span class="p"&gt;::&lt;/span&gt;&lt;span class="nn"&gt;ServerBuilder&lt;/span&gt;&lt;span class="p"&gt;::&lt;/span&gt;&lt;span class="nf"&gt;new&lt;/span&gt;&lt;span class="p"&gt;()&lt;/span&gt;
        &lt;span class="nf"&gt;.name&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="s"&gt;"inventory-server"&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
        &lt;span class="nf"&gt;.version&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="s"&gt;"1.0.0"&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
        &lt;span class="c1"&gt;// Register the tool: name, handler with typed input and output,&lt;/span&gt;
        &lt;span class="c1"&gt;// plus a description that tells the LLM WHAT it does, WHEN to&lt;/span&gt;
        &lt;span class="c1"&gt;// use it, and what it RETURNS.&lt;/span&gt;
        &lt;span class="nf"&gt;.tool&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;
            &lt;span class="s"&gt;"check_inventory"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
            &lt;span class="nn"&gt;TypedToolWithOutput&lt;/span&gt;&lt;span class="p"&gt;::&lt;/span&gt;&lt;span class="nf"&gt;new&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;
                &lt;span class="s"&gt;"check_inventory"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
                &lt;span class="p"&gt;|&lt;/span&gt;&lt;span class="n"&gt;input&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="n"&gt;CheckInventoryInput&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;_extra&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="n"&gt;RequestHandlerExtra&lt;/span&gt;&lt;span class="p"&gt;|&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt;
                    &lt;span class="nn"&gt;Box&lt;/span&gt;&lt;span class="p"&gt;::&lt;/span&gt;&lt;span class="nf"&gt;pin&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="k"&gt;async&lt;/span&gt; &lt;span class="k"&gt;move&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt;
                        &lt;span class="c1"&gt;// In production, this queries your inventory database.&lt;/span&gt;
                        &lt;span class="c1"&gt;// Here we return a mock response for clarity.&lt;/span&gt;
                        &lt;span class="k"&gt;let&lt;/span&gt; &lt;span class="n"&gt;available&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="mi"&gt;847_u32&lt;/span&gt;&lt;span class="p"&gt;;&lt;/span&gt;
                        &lt;span class="nf"&gt;Ok&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;InventoryResult&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt;
                            &lt;span class="n"&gt;sku&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="n"&gt;input&lt;/span&gt;&lt;span class="py"&gt;.sku&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
                            &lt;span class="n"&gt;in_stock&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="n"&gt;available&lt;/span&gt; &lt;span class="o"&gt;&amp;gt;&lt;/span&gt; &lt;span class="mi"&gt;0&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
                            &lt;span class="n"&gt;available&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
                            &lt;span class="n"&gt;sufficient&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="n"&gt;available&lt;/span&gt; &lt;span class="o"&gt;&amp;gt;=&lt;/span&gt; &lt;span class="n"&gt;input&lt;/span&gt;&lt;span class="py"&gt;.quantity_needed&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
                        &lt;span class="p"&gt;})&lt;/span&gt;
                    &lt;span class="p"&gt;})&lt;/span&gt;
                &lt;span class="p"&gt;},&lt;/span&gt;
            &lt;span class="p"&gt;)&lt;/span&gt;
            &lt;span class="nf"&gt;.with_description&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;
                &lt;span class="s"&gt;"Check inventory levels for a product by SKU. Returns stock &lt;/span&gt;&lt;span class="err"&gt;\&lt;/span&gt;&lt;span class="s"&gt;
                 status, available quantity, and whether the requested amount &lt;/span&gt;&lt;span class="err"&gt;\&lt;/span&gt;&lt;span class="s"&gt;
                 can be fulfilled. Use this before quoting delivery dates &lt;/span&gt;&lt;span class="err"&gt;\&lt;/span&gt;&lt;span class="s"&gt;
                 to customers."&lt;/span&gt;
            &lt;span class="p"&gt;),&lt;/span&gt;
        &lt;span class="p"&gt;)&lt;/span&gt;
        &lt;span class="nf"&gt;.build&lt;/span&gt;&lt;span class="p"&gt;()&lt;/span&gt;&lt;span class="o"&gt;?&lt;/span&gt;&lt;span class="p"&gt;;&lt;/span&gt;

    &lt;span class="c1"&gt;// Start the server over Streamable HTTP -- the production transport.&lt;/span&gt;
    &lt;span class="c1"&gt;// This makes your server accessible to any MCP client over the network:&lt;/span&gt;
    &lt;span class="c1"&gt;// Claude Desktop, ChatGPT, custom agents, or browser-based tools.&lt;/span&gt;
    &lt;span class="c1"&gt;// Unlike stdio (which requires local installation), HTTP lets&lt;/span&gt;
    &lt;span class="c1"&gt;// non-technical users connect without touching a terminal.&lt;/span&gt;
    &lt;span class="n"&gt;server&lt;/span&gt;&lt;span class="nf"&gt;.run_streamable_http&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="s"&gt;"0.0.0.0:3000"&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;&lt;span class="k"&gt;.await&lt;/span&gt;&lt;span class="o"&gt;?&lt;/span&gt;&lt;span class="p"&gt;;&lt;/span&gt;
    &lt;span class="nf"&gt;Ok&lt;/span&gt;&lt;span class="p"&gt;(())&lt;/span&gt;
&lt;span class="p"&gt;}&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;While we're using Rust and the PMCP SDK throughout this series, the design principles here (typed schemas, descriptive names, and structured output) apply to any MCP-compliant server, whether it's written in TypeScript, Python, or anything else that speaks the protocol. These are protocol-level concerns, not language-level ones.&lt;/p&gt;

&lt;p&gt;Let's walk through each component.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Name&lt;/strong&gt; (&lt;code&gt;"check_inventory"&lt;/code&gt;): The name follows a &lt;code&gt;verb_noun&lt;/code&gt; pattern. It's unambiguous, and the LLM won't mistake it for a tool for updating inventory or listing products. Avoid generic names like &lt;code&gt;get_data&lt;/code&gt; or &lt;code&gt;process_request&lt;/code&gt;. The name is the LLM's first signal about what a tool does.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Description&lt;/strong&gt;: This is the LLM's primary decision surface. Notice it does three things: it says what the tool does ("check inventory levels"), what it returns ("stock status, available quantity, and whether the requested amount can be fulfilled"), and when to use it ("before quoting delivery dates to customers"). The second part helps the LLM judge whether the tool can answer the user's request. The third is critical: it tells the LLM about the workflow context, which the description's author, the business analyst, knows but the LLM doesn't.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Input schema&lt;/strong&gt;: The &lt;code&gt;CheckInventoryInput&lt;/code&gt; struct defines what the LLM must send. Each field has a type (the LLM can't accidentally pass a string where a number is expected), a doc comment that becomes the JSON Schema description (the LLM sees "Product SKU to look up" when it discovers the tool), and optional defaults (&lt;code&gt;quantity_needed&lt;/code&gt; defaults to 1 if omitted). The &lt;code&gt;#[schemars(...)]&lt;/code&gt; annotations are the single source of truth for constraints: &lt;code&gt;length(max = 16)&lt;/code&gt; on the SKU field generates &lt;code&gt;"maxLength": 16&lt;/code&gt; in the JSON Schema, and &lt;code&gt;range(min = 1, max = 10000)&lt;/code&gt; on quantity generates &lt;code&gt;"minimum": 1, "maximum": 10000&lt;/code&gt;. The LLM sees these rules when it discovers the tool, before it ever makes a call. And &lt;code&gt;#[schemars(deny_unknown_fields)]&lt;/code&gt; on the struct means the LLM can't sneak in extra fields, as anything outside &lt;code&gt;sku&lt;/code&gt; and &lt;code&gt;quantity_needed&lt;/code&gt; is rejected.&lt;/p&gt;
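&lt;p&gt;For reference, the JSON Schema an MCP client would discover for this input looks roughly like the following. This is a sketch assembled from the constraints described above; the exact output depends on your &lt;code&gt;schemars&lt;/code&gt; version:&lt;/p&gt;

```json
{
  "type": "object",
  "additionalProperties": false,
  "required": ["sku"],
  "properties": {
    "sku": {
      "type": "string",
      "maxLength": 16,
      "description": "Product SKU to look up"
    },
    "quantity_needed": {
      "type": "integer",
      "minimum": 1,
      "maximum": 10000,
      "default": 1
    }
  }
}
```

&lt;p&gt;Note the correspondence: &lt;code&gt;deny_unknown_fields&lt;/code&gt; becomes &lt;code&gt;"additionalProperties": false&lt;/code&gt;, the doc comment becomes the &lt;code&gt;description&lt;/code&gt;, and the &lt;code&gt;length&lt;/code&gt; and &lt;code&gt;range&lt;/code&gt; annotations become the validation keywords.&lt;/p&gt;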

&lt;p&gt;&lt;strong&gt;Output schema&lt;/strong&gt;: The &lt;code&gt;InventoryResult&lt;/code&gt; struct defines what the tool returns. This is optional in the MCP spec, but we strongly recommend it. A defined output schema serves two purposes: the LLM knows exactly what fields to expect (it won't hallucinate response fields that don't exist), and downstream consumers, whether another tool in a chain or an MCP App rendering a UI widget, can rely on the structure. The &lt;code&gt;sufficient&lt;/code&gt; field is a good example: it performs the comparison server-side rather than asking the LLM to compare &lt;code&gt;available&lt;/code&gt; against &lt;code&gt;quantity_needed&lt;/code&gt;, which risks getting it wrong.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Handler&lt;/strong&gt;: The async closure that does the actual work. In this example, it returns a mock response for clarity. In production, this would query your inventory database, call a warehouse API, or perform whatever computation the tool promises. Notice that the handler receives a typed &lt;code&gt;CheckInventoryInput&lt;/code&gt; and not raw JSON. The parsing already happened. Your handler code focuses on business logic, not input validation. This is the server's contribution to the Capability Square: reliable, deterministic execution.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Validation&lt;/strong&gt;: Notice that constraints are declared once, on the struct fields, using &lt;code&gt;#[schemars(...)]&lt;/code&gt; annotations. The same annotation serves two purposes: it generates the JSON Schema that the LLM reads at discovery time, and it defines the contract the server enforces at runtime. No duplication between schema and validation logic, where the struct is the single source of truth.&lt;/p&gt;

&lt;p&gt;Security in MCP servers operates in layers, and schema constraints are among the easiest to add. First, serde enforces type safety: &lt;code&gt;sku&lt;/code&gt; must be a string, &lt;code&gt;quantity_needed&lt;/code&gt; must be an unsigned integer, and type-level attacks are blocked at deserialization before your code runs. Second, &lt;code&gt;#[schemars(length(max = 16))]&lt;/code&gt; constrains input shape: it won't prevent SQL injection on its own (that's the job of parameterized queries and safe query construction in your database layer), but it does reject obviously malformed or abusive input early, before it reaches any downstream system. Real SKUs are short; a 200-character string is either a mistake or a probe, and there's no reason to let it through. Third, &lt;code&gt;deny_unknown_fields&lt;/code&gt; prevents unexpected fields from slipping past the schema entirely. Each layer is simple, but together they significantly reduce the attack surface. The deeper security story, such as parameterized queries, OAuth 2.1, Rust's memory safety guarantees, and the OWASP MCP threat model, gets its own article later in this series.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Error handling&lt;/strong&gt;: If the LLM sends input that doesn't match &lt;code&gt;CheckInventoryInput&lt;/code&gt;, such as passing &lt;code&gt;"sku": 42&lt;/code&gt; instead of &lt;code&gt;"sku": "WIDGET-42"&lt;/code&gt;, serde produces an error message explaining the type mismatch. If the SKU exceeds 16 characters, the schema constraint rejects it before the handler runs. For business logic errors inside the handler, use &lt;code&gt;pmcp::Error::validation()&lt;/code&gt; with actionable messages following a three-part template: what went wrong, what was expected, and an example of correct input. Good error messages suggest one or two specific fixes, since multiple options force the LLM to guess, and guessing wastes tokens and user patience.&lt;/p&gt;

&lt;p&gt;Notice this isn't a local development tool. This is a server designed for a specific business user who needs to quote delivery dates. The business analyst decided that inventory checks matter for their users, and encoded that context into the description and the output shape. The &lt;code&gt;sufficient&lt;/code&gt; field exists because the business analyst knows that customers ask "do you have enough?" not "how many do you have?" A different business analyst building for a warehouse manager might expose entirely different tools from the same inventory system.&lt;/p&gt;

&lt;p&gt;Can an LLM discover this tool and call it correctly on the first try? If not, your name or description needs work. That's the simplest measurement of tool design quality, and it's one you can test in five minutes with any MCP client.&lt;/p&gt;

&lt;h2&gt;
  
  
  Outcomes, Not Operations
&lt;/h2&gt;

&lt;p&gt;The business analyst in the Capability Square knows something the LLM never will: what outcome the business user actually wants. When a customer asks "where's my order?", they don't want a customer ID, then a list of order IDs, then a status lookup. They want a tracking link and an ETA. The difference between those two experiences is the difference between operation-oriented and outcome-oriented tool design.&lt;/p&gt;

&lt;p&gt;Here's the anti-pattern. A team with a REST background wraps their existing endpoints as MCP tools:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;
&lt;code&gt;get_customer_by_email(email)&lt;/code&gt; returns a &lt;code&gt;customer_id&lt;/code&gt;
&lt;/li&gt;
&lt;li&gt;
&lt;code&gt;list_customer_orders(customer_id)&lt;/code&gt; returns an array of &lt;code&gt;order_id&lt;/code&gt; values&lt;/li&gt;
&lt;li&gt;
&lt;code&gt;get_order_status(order_id)&lt;/code&gt; returns a status string&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;To answer "where's my order?", the LLM must chain all three calls in the correct sequence. The costs compound at every step:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;
&lt;strong&gt;More tokens.&lt;/strong&gt; The LLM processes the full response from each tool call and generates the next call. Three round-trips mean three times as many input and output tokens, which is a cost the user incurs without getting any additional value.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;More latency.&lt;/strong&gt; Each step requires a network round-trip to the MCP server plus LLM processing time to interpret the result and formulate the next call. What could be a sub-second single call becomes a multi-second chain.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Growing risk of misstep.&lt;/strong&gt; The probability of a correct sequence is the product of each step's success rate. If each tool call has a 95% chance of correct execution, three chained calls drop to 85.7%. At five steps, you're at 77.4%. The LLM must remember variable names and values from earlier calls, handle edge cases at each step, and maintain coherence across the full chain. Each step is another opportunity for the model to hallucinate a parameter, misinterpret a response, or lose track of its plan.&lt;/li&gt;
&lt;/ul&gt;
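&lt;p&gt;The compounding arithmetic is worth internalizing, because it's the core argument for collapsing chains into the server. A minimal sketch (the 95% per-step rate is the illustrative figure from above, not a measured constant):&lt;/p&gt;

```rust
// Sketch: how per-step reliability compounds across a chained workflow.
// Assumes each step succeeds independently with the same probability.
fn chain_success(per_step: f64, steps: u32) -> f64 {
    per_step.powi(steps as i32)
}

fn main() {
    // 95% per call, as in the example above.
    println!("3 steps: {:.1}%", chain_success(0.95, 3) * 100.0); // ~85.7%
    println!("5 steps: {:.1}%", chain_success(0.95, 5) * 100.0); // ~77.4%
}
```

&lt;p&gt;The server-side version of the same workflow has one probabilistic step (the single tool call) instead of three or five, which is why moving orchestration into the handler pays off so quickly.&lt;/p&gt;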

&lt;div class="table-wrapper-paragraph"&gt;&lt;table&gt;
&lt;thead&gt;
&lt;tr&gt;
&lt;th&gt;&lt;/th&gt;
&lt;th&gt;Operation-Oriented (REST style)&lt;/th&gt;
&lt;th&gt;Outcome-Oriented (MCP style)&lt;/th&gt;
&lt;/tr&gt;
&lt;/thead&gt;
&lt;tbody&gt;
&lt;tr&gt;
&lt;td&gt;&lt;strong&gt;Tool Count&lt;/strong&gt;&lt;/td&gt;
&lt;td&gt;High (1 per endpoint)&lt;/td&gt;
&lt;td&gt;Low (1 per user goal)&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;&lt;strong&gt;LLM Effort&lt;/strong&gt;&lt;/td&gt;
&lt;td&gt;High (choreographing multi-step chains)&lt;/td&gt;
&lt;td&gt;Low (single-shot invocation)&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;&lt;strong&gt;Token Cost&lt;/strong&gt;&lt;/td&gt;
&lt;td&gt;High (processing every intermediate result)&lt;/td&gt;
&lt;td&gt;Low (one request, one response)&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;&lt;strong&gt;Latency&lt;/strong&gt;&lt;/td&gt;
&lt;td&gt;High (N round trips + N LLM inferences)&lt;/td&gt;
&lt;td&gt;Low (single round trip)&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;&lt;strong&gt;Reliability&lt;/strong&gt;&lt;/td&gt;
&lt;td&gt;Low (3+ compounding points of failure)&lt;/td&gt;
&lt;td&gt;High (deterministic server logic)&lt;/td&gt;
&lt;/tr&gt;
&lt;/tbody&gt;
&lt;/table&gt;&lt;/div&gt;

&lt;p&gt;Now consider the outcome-oriented alternative:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight rust"&gt;&lt;code&gt;&lt;span class="c1"&gt;// -- Input: just the customer's email --&lt;/span&gt;
&lt;span class="nd"&gt;#[derive(Debug,&lt;/span&gt; &lt;span class="nd"&gt;Deserialize,&lt;/span&gt; &lt;span class="nd"&gt;JsonSchema)]&lt;/span&gt;
&lt;span class="k"&gt;struct&lt;/span&gt; &lt;span class="n"&gt;TrackOrderInput&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt;
    &lt;span class="cd"&gt;/// Customer email address (e.g., "alice@company.com")&lt;/span&gt;
    &lt;span class="nd"&gt;#[schemars(length(max&lt;/span&gt; &lt;span class="nd"&gt;=&lt;/span&gt; &lt;span class="mi"&gt;254&lt;/span&gt;&lt;span class="nd"&gt;))]&lt;/span&gt;
    &lt;span class="n"&gt;email&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="nb"&gt;String&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
&lt;span class="p"&gt;}&lt;/span&gt;

&lt;span class="c1"&gt;// -- Status enum: the LLM sees valid values in the schema --&lt;/span&gt;
&lt;span class="c1"&gt;// Instead of a free-form string, an enum lets the LLM &lt;/span&gt;
&lt;span class="c1"&gt;// "recognize" valid statuses rather than "recall" them from &lt;/span&gt;
&lt;span class="c1"&gt;// memory.&lt;/span&gt;
&lt;span class="nd"&gt;#[derive(Debug,&lt;/span&gt; &lt;span class="nd"&gt;Serialize,&lt;/span&gt; &lt;span class="nd"&gt;JsonSchema)]&lt;/span&gt;
&lt;span class="nd"&gt;#[serde(rename_all&lt;/span&gt; &lt;span class="nd"&gt;=&lt;/span&gt; &lt;span class="s"&gt;"snake_case"&lt;/span&gt;&lt;span class="nd"&gt;)]&lt;/span&gt;
&lt;span class="k"&gt;enum&lt;/span&gt; &lt;span class="n"&gt;OrderStatus&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt;
    &lt;span class="n"&gt;Processing&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
    &lt;span class="n"&gt;Shipped&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
    &lt;span class="n"&gt;InTransit&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
    &lt;span class="n"&gt;Delivered&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
&lt;span class="p"&gt;}&lt;/span&gt;

&lt;span class="c1"&gt;// -- Output: everything the LLM needs to answer the question --&lt;/span&gt;
&lt;span class="nd"&gt;#[derive(Debug,&lt;/span&gt; &lt;span class="nd"&gt;Serialize,&lt;/span&gt; &lt;span class="nd"&gt;JsonSchema)]&lt;/span&gt;
&lt;span class="k"&gt;struct&lt;/span&gt; &lt;span class="n"&gt;OrderTrackingResult&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt;
    &lt;span class="cd"&gt;/// Customer name for the greeting&lt;/span&gt;
    &lt;span class="n"&gt;customer&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="nb"&gt;String&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
    &lt;span class="cd"&gt;/// Order identifier&lt;/span&gt;
    &lt;span class="n"&gt;order_id&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="nb"&gt;String&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
    &lt;span class="cd"&gt;/// Current order status&lt;/span&gt;
    &lt;span class="n"&gt;status&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="n"&gt;OrderStatus&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
    &lt;span class="cd"&gt;/// Shipping carrier name&lt;/span&gt;
    &lt;span class="n"&gt;carrier&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="nb"&gt;String&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
    &lt;span class="cd"&gt;/// Estimated delivery date (ISO 8601)&lt;/span&gt;
    &lt;span class="n"&gt;eta&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="nb"&gt;String&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
    &lt;span class="cd"&gt;/// Direct tracking URL the customer can click&lt;/span&gt;
    &lt;span class="n"&gt;tracking_url&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="nb"&gt;String&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
&lt;span class="p"&gt;}&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;The tool registration follows the same pattern as &lt;code&gt;check_inventory&lt;/code&gt;:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight rust"&gt;&lt;code&gt;&lt;span class="nf"&gt;.tool&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;
    &lt;span class="s"&gt;"track_latest_order"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
    &lt;span class="nn"&gt;TypedToolWithOutput&lt;/span&gt;&lt;span class="p"&gt;::&lt;/span&gt;&lt;span class="nf"&gt;new&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;
        &lt;span class="s"&gt;"track_latest_order"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
        &lt;span class="p"&gt;|&lt;/span&gt;&lt;span class="n"&gt;input&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="n"&gt;TrackOrderInput&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;_extra&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="n"&gt;RequestHandlerExtra&lt;/span&gt;&lt;span class="p"&gt;|&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt;
            &lt;span class="nn"&gt;Box&lt;/span&gt;&lt;span class="p"&gt;::&lt;/span&gt;&lt;span class="nf"&gt;pin&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="k"&gt;async&lt;/span&gt; &lt;span class="k"&gt;move&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt;
                &lt;span class="c1"&gt;// Internally: resolve customer, find latest order, get status.&lt;/span&gt;
                &lt;span class="c1"&gt;// The server handles the entire chain -- three API calls&lt;/span&gt;
                &lt;span class="c1"&gt;// collapsed into one deterministic operation.&lt;/span&gt;
                &lt;span class="nf"&gt;Ok&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;OrderTrackingResult&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt;
                    &lt;span class="n"&gt;customer&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="s"&gt;"Alice Chen"&lt;/span&gt;&lt;span class="nf"&gt;.into&lt;/span&gt;&lt;span class="p"&gt;(),&lt;/span&gt;
                    &lt;span class="n"&gt;order_id&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="s"&gt;"ORD-8834"&lt;/span&gt;&lt;span class="nf"&gt;.into&lt;/span&gt;&lt;span class="p"&gt;(),&lt;/span&gt;
                    &lt;span class="n"&gt;status&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="nn"&gt;OrderStatus&lt;/span&gt;&lt;span class="p"&gt;::&lt;/span&gt;&lt;span class="n"&gt;InTransit&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
                    &lt;span class="n"&gt;carrier&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="s"&gt;"FedEx"&lt;/span&gt;&lt;span class="nf"&gt;.into&lt;/span&gt;&lt;span class="p"&gt;(),&lt;/span&gt;
                    &lt;span class="n"&gt;eta&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="s"&gt;"2026-03-20"&lt;/span&gt;&lt;span class="nf"&gt;.into&lt;/span&gt;&lt;span class="p"&gt;(),&lt;/span&gt;
                    &lt;span class="n"&gt;tracking_url&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="s"&gt;"https://fedex.com/track/ABC123"&lt;/span&gt;&lt;span class="nf"&gt;.into&lt;/span&gt;&lt;span class="p"&gt;(),&lt;/span&gt;
                &lt;span class="p"&gt;})&lt;/span&gt;
            &lt;span class="p"&gt;})&lt;/span&gt;
        &lt;span class="p"&gt;},&lt;/span&gt;
    &lt;span class="p"&gt;)&lt;/span&gt;
    &lt;span class="nf"&gt;.with_description&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;
        &lt;span class="s"&gt;"Track the most recent order for a customer using their email. &lt;/span&gt;&lt;span class="err"&gt;\&lt;/span&gt;&lt;span class="s"&gt;
         Returns order status, carrier info, and tracking link. Use this &lt;/span&gt;&lt;span class="err"&gt;\&lt;/span&gt;&lt;span class="s"&gt;
         when a customer asks 'where is my order?' or 'when will it arrive?'"&lt;/span&gt;
    &lt;span class="p"&gt;),&lt;/span&gt;
&lt;span class="p"&gt;)&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;One tool. One user outcome. The output struct gives the LLM a rich, typed response, with customer name, status, carrier, ETA, and a clickable tracking URL, which is everything it needs to answer the question in a single turn. The server handles the chaining internally (resolve customer, find latest order, fetch status) because that's what servers are good at: deterministic, multi-step computation. In a production environment, your server handles requests from business users who don't know MCP exists and don't care about your API structure. They just want answers. In the Capability Square, symbolic computation and data access are the server's strengths. Let the server do the work it's built for, and let the LLM do what it's built for: understanding the user's intent and presenting a clear answer.&lt;/p&gt;

&lt;p&gt;This isn't a theoretical pattern. &lt;a href="https://engineering.block.xyz/blog/blocks-playbook-for-designing-mcp-servers" rel="noopener noreferrer"&gt;Block built 60+ production MCP servers&lt;/a&gt;. Their Linear integration started with 30+ tools that mirrored GraphQL endpoints, with one tool per query and one per mutation. After three iterations, they were down to 2 tools. The tool count dropped because the team learned to design for outcomes. Each iteration moved complexity from the LLM (which had to choreograph multi-tool sequences) into the server (which could handle the orchestration deterministically).&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Measurement point:&lt;/strong&gt; Test this yourself. Give 10 users the same task ("find my latest order status"). With the 3-tool REST mapping, measure how many succeed on the first try. Now try the single outcome-oriented tool. The difference in task completion rates is your design-quality signal.&lt;/p&gt;

&lt;h2&gt;
  
  
  Less Is More: The Evidence for Tool Reduction
&lt;/h2&gt;

&lt;p&gt;Outcome-oriented design naturally reduces tool count. But how much does reduction actually matter? The research is unambiguous.&lt;/p&gt;

&lt;p&gt;&lt;a href="https://github.blog/ai-and-ml/github-copilot/how-were-making-github-copilot-smarter-with-fewer-tools/" rel="noopener noreferrer"&gt;GitHub reduced their Copilot MCP integration&lt;/a&gt; from 40 built-in tools to 13 core tools. The result: 2 to 5 percentage point improvement across SWE-Lancer and SWEbench-Verified benchmarks, plus a 400ms latency reduction. Fewer tools meant the model spent less time on tool selection and more time on the actual task. The gains came not from adding capability, but from removing it.&lt;/p&gt;

&lt;p&gt;&lt;a href="https://www.speakeasy.com/mcp/tool-design/less-is-more" rel="noopener noreferrer"&gt;The Speakeasy team ran a controlled experiment&lt;/a&gt; using a Pet Store API. At 107 tools, both large and small models failed completely, and task success collapsed. At 20 tools, large models scored 19 out of 20 correct. At 10 tools, performance was perfect. The failure wasn't gradual. It was a cliff: past a threshold, models don't degrade gracefully. They fall off.&lt;/p&gt;

&lt;p&gt;Why does success collapse rather than degrade? Two mechanisms compound. First, &lt;strong&gt;context window bloat&lt;/strong&gt;: every tool name, description, and parameter schema consumes tokens on every request. At 50+ tools, this can eat 5 to 7 percent of the model's context before a single user message arrives, crowding out conversation history, document content, and reasoning space. Second, and more insidious, is &lt;strong&gt;tool hallucination&lt;/strong&gt;: when the LLM's attention is spread across too many similar-sounding tools, it starts inventing nonexistent tool names, conflating parameters between tools, or calling the right tool with arguments from a different tool's schema. This is the same "instruction following degradation" that causes LLMs to drift off-task in long prompts, except that here each hallucinated tool call is a hard failure, not a soft one. The model doesn't produce a slightly wrong answer. It produces no answer at all.&lt;/p&gt;
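&lt;p&gt;The context-bloat figure is easy to sanity-check with back-of-envelope arithmetic. In this sketch, the per-tool token count (~150) and the 128k-token context window are illustrative assumptions, not measurements:&lt;/p&gt;

```rust
// Back-of-envelope: what fraction of the context window tool schemas
// consume before the user says anything. Both constants below are
// assumptions for illustration, not measured values.
fn schema_overhead(tool_count: u32, tokens_per_tool: u32, context_window: u32) -> f64 {
    f64::from(tool_count * tokens_per_tool) / f64::from(context_window) * 100.0
}

fn main() {
    // ~150 tokens per tool (name + description + parameter schema),
    // 128k-token context window.
    for n in [10, 50, 100] {
        println!("{n} tools: {:.1}% of context", schema_overhead(n, 150, 128_000));
    }
}
```

&lt;p&gt;Under these assumptions, 50 tools land at roughly 5.9% of the context window, squarely inside the 5 to 7 percent range cited above, and 100 tools double that before the conversation even begins.&lt;/p&gt;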

&lt;p&gt;In UX terms, this is information overload. Just as a human can't choose from a menu of 100 items without decision fatigue, an LLM's attention fragments across too many similar-sounding options. The threshold varies by model size. Small models (8B parameters) hit their sweet spot around 19 tools and fail at 46. Even the largest models struggle past 100 tools.&lt;/p&gt;

&lt;p&gt;As Hugging Face's Phil Schmid &lt;a href="https://www.philschmid.de/mcp-best-practices" rel="noopener noreferrer"&gt;puts it&lt;/a&gt;: "Curate ruthlessly. 5 to 15 tools per server. One server, one job."&lt;/p&gt;

&lt;p&gt;This raises an obvious question: if you expose only 10 to 15 tools, aren't you leaving functionality on the table? Yes, deliberately. And that's the right choice. We'll see why shortly, when we look at how much of an API your users actually need.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Measurement point:&lt;/strong&gt; Count your tools. If you have more than 15 per server, you're likely past the diminishing returns threshold. Benchmark your task completion rate before and after pruning, and the numbers will make the case for you.&lt;/p&gt;

&lt;h2&gt;
  
  
  The 97% Problem: Tool Description Quality
&lt;/h2&gt;

&lt;p&gt;You can have the right number of tools, designed for the right outcomes, and still fail. &lt;a href="https://arxiv.org/html/2602.14878v1" rel="noopener noreferrer"&gt;A 2025 study analyzing MCP tool descriptions&lt;/a&gt; across the ecosystem found that 97.1% contain at least one quality issue. More than half (56%) have unclear purpose statements. Your tools might be well-designed, but if the LLM can't understand when to use them, that design is invisible.&lt;/p&gt;

&lt;p&gt;Tool descriptions are not documentation. They are the LLM's primary decision surface. When the LLM sees 15 tools and must choose one, the description is the only signal it has. A vague description is like a restaurant menu that says "food" for every dish, which is technically accurate, but practically useless.&lt;/p&gt;

&lt;p&gt;The research identified six components of a quality tool description: &lt;strong&gt;Purpose&lt;/strong&gt; (what the tool does), &lt;strong&gt;Guidelines&lt;/strong&gt; (when and how to use it), &lt;strong&gt;Limitations&lt;/strong&gt; (what it cannot do or when to use something else), &lt;strong&gt;Parameter Explanation&lt;/strong&gt; (input format and constraints), &lt;strong&gt;Length&lt;/strong&gt; (enough detail without overwhelming), and &lt;strong&gt;Examples&lt;/strong&gt; (concrete usage scenarios). Most descriptions fail on multiple components simultaneously.&lt;/p&gt;

&lt;p&gt;Here's what the improvement looks like in practice. Consider a flight search tool across three levels of description quality:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight rust"&gt;&lt;code&gt;&lt;span class="c1"&gt;// LEVEL 1 -- Vague (56% of MCP tools have this problem)&lt;/span&gt;
&lt;span class="nf"&gt;.with_description&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="s"&gt;"Search for flights"&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;

&lt;span class="c1"&gt;// LEVEL 2 -- Better purpose, but missing guidelines and limitations&lt;/span&gt;
&lt;span class="nf"&gt;.with_description&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="s"&gt;"Search for available flights between two airports on a given date"&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;

&lt;span class="c1"&gt;// LEVEL 3 -- Full rubric: purpose + guidelines + limitations&lt;/span&gt;
&lt;span class="nf"&gt;.with_description&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;
    &lt;span class="s"&gt;"Search for available flights between two airports on a specific date. &lt;/span&gt;&lt;span class="err"&gt;\&lt;/span&gt;&lt;span class="s"&gt;
     Returns up to 20 results sorted by price. Use 3-letter IATA airport &lt;/span&gt;&lt;span class="err"&gt;\&lt;/span&gt;&lt;span class="s"&gt;
     codes (e.g., 'LAX', 'JFK'). Only searches economy class. For business &lt;/span&gt;&lt;span class="err"&gt;\&lt;/span&gt;&lt;span class="s"&gt;
     or first class, use the premium_flight_search tool. Dates must be &lt;/span&gt;&lt;span class="err"&gt;\&lt;/span&gt;&lt;span class="s"&gt;
     within the next 330 days."&lt;/span&gt;
&lt;span class="p"&gt;)&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;Level 1 tells the LLM nothing about parameters, constraints, or when to use an alternative tool. The LLM has to guess at everything: the input format, the result shape, and the scope. Level 2 adds purpose: the LLM knows it needs two airports and a date, but it doesn't know about airport code formats, result limits, or class restrictions. It might pass "Los Angeles" instead of "LAX", or ask for business-class flights and get the wrong results. Level 3 gives the LLM everything it needs to (a) decide to use this tool, (b) provide correct inputs, and (c) know when NOT to use it, that last point being critical for multi-tool servers where the LLM must choose between similar options.&lt;/p&gt;

&lt;p&gt;In the same study, augmented descriptions improved task success by 5.85 percentage points in controlled testing. That may sound modest, but at scale it's the difference between a tool that works most of the time and one that works almost all of the time. For a customer-facing agent handling thousands of requests per day, those percentage points represent real users getting real answers.&lt;/p&gt;

&lt;p&gt;Description quality extends to error messages. When a tool receives invalid input, the error message is the LLM's only guide for recovery. Compare these two approaches:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight rust"&gt;&lt;code&gt;&lt;span class="c1"&gt;// BAD: LLM tries random fixes&lt;/span&gt;
&lt;span class="k"&gt;return&lt;/span&gt; &lt;span class="nf"&gt;Err&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="nn"&gt;pmcp&lt;/span&gt;&lt;span class="p"&gt;::&lt;/span&gt;&lt;span class="nn"&gt;Error&lt;/span&gt;&lt;span class="p"&gt;::&lt;/span&gt;&lt;span class="nf"&gt;validation&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="s"&gt;"Invalid input"&lt;/span&gt;&lt;span class="p"&gt;));&lt;/span&gt;

&lt;span class="c1"&gt;// GOOD: Problem + expectation + example&lt;/span&gt;
&lt;span class="k"&gt;return&lt;/span&gt; &lt;span class="nf"&gt;Err&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="nn"&gt;pmcp&lt;/span&gt;&lt;span class="p"&gt;::&lt;/span&gt;&lt;span class="nn"&gt;Error&lt;/span&gt;&lt;span class="p"&gt;::&lt;/span&gt;&lt;span class="nf"&gt;validation&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;
    &lt;span class="s"&gt;"Invalid date format for 'departure': '15/04/2026'. &lt;/span&gt;&lt;span class="err"&gt;\&lt;/span&gt;&lt;span class="s"&gt;
     Use ISO 8601 format (YYYY-MM-DD). &lt;/span&gt;&lt;span class="err"&gt;\&lt;/span&gt;&lt;span class="s"&gt;
     Example: '2026-04-15'"&lt;/span&gt;
&lt;span class="p"&gt;));&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;The first error forces the LLM to guess. It might try a different date format, or remove the date entirely, or change a different parameter. Each wrong guess wastes a round trip and user patience. The second error follows a three-part template: what went wrong ("invalid date format for 'departure'"), what was expected ("ISO 8601 format"), and an example of correct input ("2026-04-15"). Suggest one or two fixes maximum. Multiple options force the LLM to guess, and guessing is what we're trying to eliminate.&lt;/p&gt;
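&lt;p&gt;The three-part template is easy to enforce mechanically. As a minimal sketch (the helper name and wiring are illustrative, not part of the pmcp SDK), a single function can guarantee that every validation error carries the problem, the expectation, and exactly one example:&lt;/p&gt;

```rust
// Illustrative helper, not a pmcp API: every validation error states the
// problem, the expected format, and one example of correct input.
fn validation_message(param: &str, got: &str, expected: &str, example: &str) -> String {
    format!("Invalid value for '{param}': '{got}'. Expected {expected}. Example: '{example}'")
}

fn main() {
    let msg = validation_message(
        "departure",
        "15/04/2026",
        "ISO 8601 date format (YYYY-MM-DD)",
        "2026-04-15",
    );
    // One message, one fix: the LLM can recover in a single round trip.
    println!("{msg}");
}
```

&lt;p&gt;Routing every validation failure through one helper like this also makes the template impossible to skip when a new tool is added.&lt;/p&gt;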

&lt;p&gt;This is where the typed struct pattern we saw in the Tool Anatomy section pays off. Remember how &lt;code&gt;CheckInventoryInput&lt;/code&gt; used doc comments on each field to generate JSON Schema descriptions? The same pattern applies to every tool. When the business analyst writes &lt;code&gt;/// Product SKU to look up (e.g., "WIDGET-42", "BOLT-7")&lt;/code&gt; on a struct field, that text becomes the LLM's guide for formatting its input. The type system enforces correctness at parse time, before the handler code ever runs. And the output schema tells the LLM exactly what fields to expect, so it won't hallucinate response fields that don't exist.&lt;/p&gt;
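&lt;p&gt;The idea generalizes beyond any particular SDK. Below is a standard-library-only sketch of parse-time validation; the SKU format rule is an assumption for illustration, and in a real pmcp server the typed struct and its schema descriptions would come from the SDK's derive macros rather than hand-written code:&lt;/p&gt;

```rust
// Standard-library-only sketch of "parse, don't validate". The SKU rule
// below (uppercase name, dash, digits) is an illustrative assumption.
#[derive(Debug)]
struct Sku(String);

impl TryFrom<&str> for Sku {
    type Error = String;

    fn try_from(raw: &str) -> Result<Self, Self::Error> {
        // Assumed format: NAME-NUMBER, e.g. "WIDGET-42" or "BOLT-7".
        let ok = matches!(
            raw.split_once('-'),
            Some((name, num))
                if !name.is_empty()
                    && name.chars().all(|c| c.is_ascii_uppercase())
                    && !num.is_empty()
                    && num.chars().all(|c| c.is_ascii_digit())
        );
        if ok {
            Ok(Sku(raw.to_string()))
        } else {
            Err(format!(
                "Invalid SKU: '{raw}'. Expected NAME-NUMBER \
                 (uppercase letters, dash, digits). Example: 'WIDGET-42'"
            ))
        }
    }
}

fn main() {
    // Bad input is rejected before any handler code runs.
    assert!(Sku::try_from("WIDGET-42").is_ok());
    assert!(Sku::try_from("widget 42").is_err());
    println!("parse-time validation ok");
}
```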

&lt;p&gt;This connects back to the Capability Square. The &lt;strong&gt;business analyst&lt;/strong&gt; writes these descriptions at design time, capturing what the &lt;strong&gt;business user&lt;/strong&gt; will eventually ask for—in their own words. The &lt;strong&gt;LLM&lt;/strong&gt; reads the descriptions at runtime and translates the user's phrasing into a tool call. The &lt;strong&gt;server&lt;/strong&gt; validates the call against the same schema the analyst authored. All four corners aligned, with the analyst's design-time knowledge guiding the LLM's runtime decisions and the server's runtime enforcement, in service of a business user who never has to see any of it.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Measurement point:&lt;/strong&gt; Test your description quality by presenting your tool list to an LLM and asking it to select the right tool for 10 different user requests. If tool selection accuracy is below 90%, your descriptions need work. This test takes five minutes and tells you more about your MCP server's real-world effectiveness than any benchmark.&lt;/p&gt;
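&lt;p&gt;The five-minute test can even be scripted. A toy harness (the request/tool pairs below are invented) simply compares the tool the LLM selected for each request against the tool a human says it should have selected:&lt;/p&gt;

```rust
// Toy harness for the description-quality check: compare selected vs.
// expected tools and report accuracy. The pairs below are made up.
fn selection_accuracy(cases: &[(&str, &str)]) -> f64 {
    let correct = cases
        .iter()
        .filter(|(selected, expected)| selected == expected)
        .count();
    correct as f64 / cases.len() as f64
}

fn main() {
    // (tool the LLM selected, tool it should have selected)
    let cases = [
        ("search_flights", "search_flights"),
        ("get_order_status", "get_order_status"),
        ("search_flights", "check_seat_availability"),
        ("get_order_status", "get_order_status"),
    ];
    let accuracy = selection_accuracy(&cases);
    println!("tool selection accuracy: {:.0}%", accuracy * 100.0);
    if accuracy < 0.90 {
        println!("below the 90% bar: the descriptions need work");
    }
}
```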

&lt;h2&gt;
  
  
  The Full-API Trap (And the Pareto Escape)
&lt;/h2&gt;

&lt;p&gt;We said that exposing only 10 to 15 tools means leaving functionality on the table. Now let's talk about why that's the right call.&lt;/p&gt;

&lt;p&gt;The most tempting mistake in MCP server design is wrapping your entire API. You have 200 endpoints, so you generate 200 tools. The OpenAPI-to-MCP converter makes it easy. The result is a server that does everything and succeeds at nothing. The LLM sees 200 tool descriptions, burns through context window space parsing them, and still picks the wrong one, because 200 options is not a menu, it's a phone book.&lt;/p&gt;

&lt;p&gt;The deeper problem is &lt;strong&gt;semantic noise&lt;/strong&gt;. When you auto-wrap an API, you inject your backend's implementation details into the LLM's reasoning space. The LLM shouldn't have to understand your database normalization, your internal microservice boundaries, or your pagination cursor format. It should see tools that map cleanly to user intent. Auto-wrapping exposes tools like &lt;code&gt;get_customer_by_internal_id&lt;/code&gt; and &lt;code&gt;list_orders_with_cursor_pagination&lt;/code&gt;, which exist because of how your backend is built, not because of what your users need. Every implementation-detail tool is noise that the LLM must parse, evaluate, and reject before it can find the tool that actually answers the user's question.&lt;/p&gt;

&lt;h3&gt;
  
  
  The Pareto Escape
&lt;/h3&gt;

&lt;p&gt;The way out is the 80/20 rule. In practice, roughly 20% of an API's capabilities serve 80% of user requests. The business analyst — one of the two human corners of the Capability Square — is the person who knows which 20%, because they share a domain with the business users they're designing for.&lt;/p&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fv0sdk3d1s00al7pikz8t.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fv0sdk3d1s00al7pikz8t.png" alt="Request Distribution: The 80/20 Rule" width="800" height="533"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;The left side of the curve is where your MCP tools live: the high-frequency request types that your users ask for every day. These are the 10 to 15 outcome-oriented tools you design carefully, with typed schemas, quality descriptions, and validation constraints. They handle the bulk of traffic reliably and fast.&lt;/p&gt;

&lt;p&gt;The right side is the long tail: rare, unpredictable requests that don't justify a dedicated tool. Creating tools for every edge case pushes you back into the 50+ tool zone where LLM performance collapses. The Pareto line is where you stop adding tools and start thinking about a different mechanism for everything to its right.&lt;/p&gt;
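&lt;p&gt;Finding the Pareto line is a small computation once you have request-frequency data. As a sketch (the request types and counts are made up), sort request types by frequency and walk the cumulative share until you cross 80%:&lt;/p&gt;

```rust
// Sketch: given per-request-type counts, find how many tool candidates
// cover the target share of traffic. Everything past the cutoff is
// long-tail territory for a different mechanism, such as code mode.
fn pareto_cutoff(mut counts: Vec<(&str, u64)>, target: f64) -> usize {
    counts.sort_by(|a, b| b.1.cmp(&a.1)); // most frequent first
    let total: u64 = counts.iter().map(|(_, c)| c).sum();
    let mut covered = 0u64;
    for (i, (_, c)) in counts.iter().enumerate() {
        covered += c;
        if covered as f64 / total as f64 >= target {
            return i + 1;
        }
    }
    counts.len()
}

fn main() {
    // Made-up request log summary for a hypothetical server.
    let request_types = vec![
        ("track_order", 500),
        ("check_inventory", 300),
        ("weekly_report", 120),
        ("bulk_reprice", 50),
        ("export_audit_log", 20),
        ("merge_duplicate_accounts", 10),
    ];
    let n = pareto_cutoff(request_types, 0.80);
    println!("{n} request types cover 80% of traffic");
}
```

&lt;p&gt;Request types left of the cutoff get dedicated, carefully designed tools; the rest wait for the long-tail mechanism discussed below.&lt;/p&gt;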

&lt;p&gt;&lt;a href="https://engineering.block.xyz/blog/blocks-playbook-for-designing-mcp-servers" rel="noopener noreferrer"&gt;Block has built over 60 production MCP servers&lt;/a&gt;. Their consistent finding: the generate-then-prune workflow is standard practice. Generate from your API spec, then ruthlessly cut. Most teams end up keeping 10 to 15 percent of what they started with. The tools that survive are those that map to actual user outcomes, and identifying those outcomes requires a business analyst who shares a domain with business users.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Measurement point:&lt;/strong&gt; Provide your MCP server to 20 business users who represent your target persona. Track task completion rates across their actual requests, not your limited test cases. If completion is below 80%, you're either exposing too many tools (confusion) or the wrong tools (coverage gap). The Capability Square tells you which: if the LLM selects wrong tools, fix descriptions. If the right tool doesn't exist, the business analyst chose the wrong 20% — re-observe the users. If the tool exists but returns unhelpful results, fix the server's implementation.&lt;/p&gt;

&lt;h2&gt;
  
  
  For the Other 80%: A Preview of Code Mode
&lt;/h2&gt;

&lt;p&gt;So you've curated your tools to the critical 20%. But users will inevitably ask for something outside that set. What then?&lt;/p&gt;

&lt;p&gt;This is where code mode enters. Instead of creating a tool for every possible request, you let the LLM write code that calls your API directly. &lt;a href="https://www.anthropic.com/engineering/code-execution-with-mcp" rel="noopener noreferrer"&gt;Anthropic's engineering team found&lt;/a&gt; that code execution reduced token usage from 150,000 to 2,000 tokens (98.7% reduction) in a Google Drive to Salesforce workflow. The LLM writes a targeted script, executes it, and returns just the result. No 200-tool context window bloat. No multi-step tool chaining. Just precise, one-shot computation.&lt;/p&gt;

&lt;p&gt;The playbook: design 10 to 15 outcome-oriented tools for the common 80% of requests. For the long tail, provide code mode access with appropriate guardrails. This gives you broad coverage without the tool count explosion that kills LLM performance. Your curated tools handle the predictable workflows fast. Code mode handles the unpredictable ones flexibly.&lt;/p&gt;

&lt;p&gt;We'll cover code mode in depth in a later article in this series: how to set it up, how to secure it, and when it's the right (and wrong) choice. For now, the key insight is that tool reduction isn't about limiting your users. It's about choosing the right mechanism for each type of request.&lt;/p&gt;

&lt;h2&gt;
  
  
  Key Takeaways
&lt;/h2&gt;

&lt;ol&gt;
&lt;li&gt;&lt;p&gt;&lt;strong&gt;The Capability Square drives everything.&lt;/strong&gt; Good tool design requires balancing four parties: what the &lt;strong&gt;LLM&lt;/strong&gt; can do (interpret intent, select tools), what the &lt;strong&gt;server&lt;/strong&gt; should do (precise computation, data access), what the &lt;strong&gt;business analyst&lt;/strong&gt; knows at design time (which capabilities matter, how to describe them, which user persona they serve), and what the &lt;strong&gt;business user&lt;/strong&gt; brings at runtime (the actual request and its business context). When any corner is weak, task completion suffers.&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;&lt;strong&gt;Design for outcomes, not operations.&lt;/strong&gt; One tool per user goal, not one tool per API endpoint. The customer asking "where's my order?" wants a tracking link, not three chained API calls. Move orchestration complexity into the server, where it runs deterministically.&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;&lt;strong&gt;Less is more.&lt;/strong&gt; Keep servers to 5 to 15 tools. Evidence from GitHub Copilot, Speakeasy, and Block consistently shows that performance degrades sharply past 20 tools. The failure is not gradual; it's a cliff.&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;&lt;strong&gt;Descriptions are the user interface.&lt;/strong&gt; Use the six-component rubric: Purpose, Guidelines, Limitations, Parameters, Length, and Examples. With 97% of tool descriptions containing quality issues, this is a big opportunity for immediate improvement.&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;&lt;strong&gt;Error messages are recovery instructions.&lt;/strong&gt; Use the three-part template: what went wrong, what was expected, and an example of correct input. Suggest one or two fixes, not five. Ambiguity in error messages wastes round trips and user patience.&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;&lt;strong&gt;Know your users — and design from inside their domain.&lt;/strong&gt; The business analyst corner of the Capability Square determines which 20% of the API capabilities to expose. The same API should produce different MCP servers for different business user personas. Auto-wrapping skips this judgment and produces servers that serve no one well. Handing the design to a purely technical team — engineers who don't live in the business domain — produces the same failure mode for the same reason: design calls made without domain fluency.&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;&lt;strong&gt;Measure task completion across diverse business user requests.&lt;/strong&gt; Not your three test cases during development, but real requests from real business users representing your target persona. If completion is low, the Capability Square tells you which corner to fix.&lt;/p&gt;&lt;/li&gt;
&lt;/ol&gt;

&lt;h2&gt;
  
  
  Continue the Series
&lt;/h2&gt;

&lt;p&gt;This article covered the foundation: how to design MCP tools that LLMs can actually use. The rest of the series goes deeper.&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;
&lt;strong&gt;Want to add user-controlled workflows?&lt;/strong&gt; Read our article on &lt;strong&gt;Prompts and Resources&lt;/strong&gt;, where we cover MCP's underutilized primitives for guided interactions.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Ready to test your server?&lt;/strong&gt; See &lt;strong&gt;Testing MCP Servers&lt;/strong&gt; for unit testing, integration testing, and description quality validation.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Concerned about security?&lt;/strong&gt; &lt;strong&gt;MCP Security&lt;/strong&gt; covers OAuth 2.1, input validation, and the common vulnerabilities that &lt;a href="https://astrix.security/research/the-state-of-mcp-security/" rel="noopener noreferrer"&gt;affect 43% of MCP servers&lt;/a&gt;.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Building from an existing API spec?&lt;/strong&gt; &lt;strong&gt;Schema-Driven MCP Servers&lt;/strong&gt; shows the generate-then-prune workflow in detail, from OpenAPI spec to curated tool set.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Interested in code mode?&lt;/strong&gt; &lt;strong&gt;Code Mode for MCP&lt;/strong&gt; explores the long-tail strategy we previewed above: how to let the LLM write code safely against your API.&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;For hands-on practice with these patterns, the &lt;a href="https://advanced-mcp-course.us-east.true-mcp.com/landing" rel="noopener noreferrer"&gt;Advanced MCP course&lt;/a&gt; provides guided exercises building production MCP servers in Rust with the PMCP SDK.&lt;/p&gt;

</description>
      <category>mcp</category>
      <category>rust</category>
      <category>ai</category>
      <category>llm</category>
    </item>
    <item>
      <title>Building a Successful Modern Data Analytics Platform in the Cloud</title>
      <dc:creator>Guy</dc:creator>
      <pubDate>Sun, 20 Oct 2019 16:27:36 +0000</pubDate>
      <link>https://dev.to/guyernest/building-a-successful-modern-data-analytics-platform-in-the-cloud-5c06</link>
      <guid>https://dev.to/guyernest/building-a-successful-modern-data-analytics-platform-in-the-cloud-5c06</guid>
      <description>&lt;p&gt;I worked with dozens of companies migrating their legacy data warehouses or analytical databases to the cloud. I saw the difficulty to let go of the monolithic thinking and design and to benefit from the modern cloud architecture fully. In this article, I’ll share my pattern for a scalable, flexible, and cost-effective data analytics platform in the AWS cloud, which was successfully implemented in these companies.&lt;br&gt;
&lt;strong&gt;TL;DR, design the data platform with three layers, L1 with raw files data, L2 with optimized files data, and L3 with cache in mind. Ingest the data as it comes into L1, and transform each use-case independently into L2, and when a specific access pattern demands it, cache some of the data into a dedicated data store.&lt;/strong&gt;&lt;br&gt;
&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2F13w2yvau3a9uhv7mk8lk.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2F13w2yvau3a9uhv7mk8lk.png" alt="Data Size to Access Scale Balance" width="521" height="427"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;h2&gt;
  
  
  Mistake 1: “One Data Store To Rule Them All”
&lt;/h2&gt;

&lt;p&gt;The main difficulty companies face when modernizing their existing data analytics platform is giving up the single database their legacy system was built on. It is hard to let go of it after the massive investment in building and operating it. I met companies that spent millions of dollars and hundreds of person-years of development to build their data warehouse and the many ETL processes, stored procedures, and reporting tools that are part of it. It is also hard to give up the benefits a single tool provides: “a single neck to choke,” and an answer to “where is the (analytical) data that I need?”.&lt;br&gt;
A few days ago, Amazon.com announced that they finally shut down the last Oracle database in their retail business. It was a long process that ran for more than four years. My first role as a solutions architect for Amazon.com was to help design the migration away from relational databases in general and Oracle specifically. I worked with dozens of teams across the business to redesign their systems from classical relational databases to more scalable and flexible newer data stores. The goal was to shift to NoSQL databases (mainly DynamoDB) or analytical ones (Amazon Redshift was the main target then). It was hard for the teams to give up the easy life of being able to query (or search, as they called it) on every column, to use a standard query language such as SQL for all their data needs, and, mainly, to use the tools they were familiar with. However, Amazon.com took the long-term perspective and decided to invest in building infrastructure that is (almost infinitely) scalable. They wanted to be able to grow their business without technical limitations.&lt;br&gt;
During these years, Amazon.com, which is famous for its “invent and simplify” principle, built many tools to make this migration easier. They also built, using AWS, a set of new databases that can be used as targets, mainly Amazon Aurora (an almost drop-in replacement for Oracle with its PostgreSQL flavor) and Amazon Athena, which we will discuss shortly.&lt;br&gt;
The limitations of a single tool are also apparent to many companies, in terms of flexibility, scale, cost, agility, and the other qualities of modern cloud architecture. However, breaking down or carving out a monolithic system is painful for most companies. Therefore, many companies want to replace a non-scalable and expensive on-premises database, such as Oracle or MS-SQL, with a cloud service, such as Amazon Redshift (data warehouse service), Amazon Athena (managed Presto service), Azure Databricks (managed Spark service), or Google BigQuery. They hope that a single cloud service will replace the single monolithic on-prem database. Sadly, this often ends in disappointment, as the limitation comes from using a single tool, not only from where it runs.&lt;/p&gt;

&lt;h2&gt;
  
  
  Mistake 2: “Hadoop is dead, long live the new king — Spark.”
&lt;/h2&gt;

&lt;p&gt;About every five years, a new technology comes along and changes the way we build a modern architecture. Ten years ago, it was Hadoop, which opened up scalable ways to handle large amounts of data with tools such as Hive, Pig, and HBase. Five years ago, it was Spark, which changed the game with much faster big data processing, better SQL than Hive, newer functional languages (Scala and Python instead of Hadoop’s Java), new streaming capabilities, and much more.&lt;br&gt;
Spark also enjoys mature tooling and popularity among many big data developers. The combination of running Spark SQL, Spark Streaming, and even machine learning with Spark MLlib is very appealing, and many companies have standardized their big data on Spark. However, the growing popularity of, and need for, data analytics and machine learning exposed Spark’s limitations. As a Spark expert, I’m often asked to review and fix Spark code that has become too complex or too slow as it grew. I also see many companies trying to build their machine learning on the Spark library, which Databricks develops and promotes heavily.&lt;br&gt;
My recommendation today is to write the data transformation logic in SQL, based on PrestoDB. SQL has many benefits compared to Spark’s Scala or Python: it is more concise, fewer bugs can sneak into the code, and many more people can write the logic. The main objection I hear comes from the current developers, who are less comfortable with SQL than with the Scala or Python they use today.&lt;/p&gt;

&lt;h2&gt;
  
  
  Mistake 3: “So, Presto is the new king.”
&lt;/h2&gt;

&lt;p&gt;The term modern cloud architecture refers to an architecture based on microservices, serverless, and pay-for-what-you-use (not pay-for-what-you-provision) pricing. The poster child of this modern architecture is AWS Lambda (or Azure Functions/Google Cloud Functions). You write the business logic, and the cloud manages the rest for you. No more application servers, no more starting and terminating servers or virtual machines, no more waiting for the yearly product release, and no more “only Java in production.” The future is thousands of functions, developed as needed and executed when needed, calling one another in a perfect mesh of business logic and scaling up and down just in time.&lt;br&gt;
Amazon Athena is the serverless option when it comes to data. The service currently runs a managed PrestoDB engine. The reason for the “currently” modifier is to allow Amazon to upgrade the engine to whatever is the best SQL engine on files in S3 at the time. The evolution from Hive, to Impala, to SparkSQL, and now to Presto suggests that we will see an even better engine in the future. Amazon wants to avoid the mistake they made in naming the EMR service (formerly Elastic MapReduce), which today runs distributed computing far more complex than MapReduce.&lt;br&gt;
In Amazon Athena, you write your business logic in SQL, and the query is sent to a fleet of workers sized to match the complexity of the data and the query. In most cases, within a few seconds, you have the query result in a CSV file in S3. No servers to manage, no waiting for servers to spin up, and no payment for idle machines. Real serverless.&lt;br&gt;
However, I often hear that Amazon Athena is too expensive, especially when you run a lot of heavy analytical queries. I heard the same comments about AWS Lambda. The move from paying once for the resource (a Presto cluster, or an application server for business logic) to pay-for-what-you-use can feel scary and risky. The secret is knowing how to optimize your usage, which is usually much harder and less appealing when you manage your own resources.&lt;br&gt;
Since the cost of Amazon Athena is based on the amount of data scanned by each query, every reduction in data size reduces query cost. The main mistake in using Athena is leaning on the fact that it can query huge volumes of raw data in formats like JSON or CSV, and relying on those raw-data queries for too much of your workload.&lt;/p&gt;

&lt;h2&gt;
  
  
  Then, what do you recommend?
&lt;/h2&gt;

&lt;p&gt;Let’s summarize what we have learned so far. We shouldn’t use only a single data store, as in time it will limit our ability to grow our data usage. We should be curious and learn, test, and adopt new tools and technologies once they mature past the “nice idea” stage. And we should take a long-term perspective when designing our technical systems, to allow unlimited business growth for our company.&lt;br&gt;
With this background and vision, we can better explain why we spend so much effort on the following data tiers, instead of merely dropping everything into a super database _____ (fill in your current favorite database).&lt;br&gt;
I see that you are now ready for my recommended recipe.&lt;/p&gt;

&lt;h3&gt;
  
  
  Tier I (L1) — Raw Data in low-cost storage (such as S3)
&lt;/h3&gt;

&lt;p&gt;All data should land in its raw form from every source with little modification or filtering. The data can come from IoT devices, streaming sources such as Kafka or Kinesis, textual log files, JSON payload from NoSQL or web services interactions, images, videos, textual comments, Excel or CSV files from partners, or anything that you would like one day to analyze and learn from.&lt;br&gt;
The data should NOT be organized nicely with foreign keys between tables, or harmonized into a single format for addresses or product IDs. Harmonization is not part of Tier I, and this is critical to keeping the system flexible enough to grow. Too many data projects fail because they take too long organizing all the data, without knowing which analyses of the data can deliver significant business value, and thus fail to earn further investment in the data analytics platform.&lt;/p&gt;

&lt;h3&gt;
  
  
  Tier II (L2) — Multiple optimized data derivatives still in low-cost storage
&lt;/h3&gt;

&lt;p&gt;The second tier is built gradually from the data coming into the first tier above. It starts as soon as the first file lands in the first tier, and it evolves as more and more data comes in. The evolution is directed by data availability and, mainly, by the business usage of the analysis output, such as machine learning predictions or analytical reports. Let’s briefly discuss each part of this tier’s description:&lt;br&gt;
&lt;strong&gt;Multiple&lt;/strong&gt; — every analytical use case should have its own dedicated and independent flow of data, even if it means that data is replicated dozens of times and calculated differently. Remember that every analysis looks at the data from a different angle and for a different business need, and is eventually developed by a different team. For example, sales analytics is different from marketing analytics or logistics analytics.&lt;br&gt;
&lt;strong&gt;Optimized&lt;/strong&gt; — the transformations of the data from its raw form toward an analytical insight allow tremendous optimization opportunities. An obvious one is taking JSON data and storing it in Parquet format, which is both columnar (a query on a single column only scans the data of that column) and compressed. In most such transformations, using Create Table As Select (CTAS) in Athena, you can get 1,000–10,000x cost improvements. The same goes for transcribing audio and video to textual captions, or analyzing images into classes, face sentiment, or face recognition results. Running analytical queries on your customers’ face sentiment across different days or stores should be simple and cheap enough for business people to use.&lt;br&gt;
&lt;strong&gt;Data Derivatives&lt;/strong&gt; — the data in the second tier is mostly aggregated, filtered, or transformed from its original raw form to fit a specific business question. If I need to predict the daily sales of a brand, I don’t need to analyze every individual purchase of every unique product; I can look at a daily, per-brand aggregation. We should not be afraid to make a derivative “too specific,” as we still have the raw data in Tier I, and we will have many other specific derivatives for the other business use cases. Having the “same” data in different forms is not a problem, as it is not the same data but a derivative of it.&lt;br&gt;
&lt;strong&gt;Still in low-cost storage&lt;/strong&gt; — if you want to keep dozens of “copies” of the already-big data in your company, each “copy” of that data must be very low cost. I have seen too many companies try to work only with raw data (“because we can”), or write too quickly into a database with expensive compute and memory, and miss this critical Tier II.&lt;/p&gt;
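&lt;p&gt;As an illustration of the CTAS optimization described above (the table names and S3 location are made up for this sketch), a single Athena statement can rewrite raw JSON into compressed, partitioned Parquet in Tier II:&lt;/p&gt;

```sql
-- Hypothetical names: rewrite raw JSON into columnar, compressed Parquet,
-- partitioned by date so each query scans only the days it needs.
CREATE TABLE l2_sales_daily
WITH (
  format = 'PARQUET',
  external_location = 's3://my-data-lake/l2/sales_daily/',
  partitioned_by = ARRAY['sale_date']
) AS
SELECT brand,
       SUM(amount) AS daily_sales,
       sale_date          -- partition columns must come last in the SELECT
FROM   l1_raw_sales_json
GROUP  BY brand, sale_date;
```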

&lt;h3&gt;
  
  
  Tier III (L3) — Optional Cache Datastores
&lt;/h3&gt;

&lt;p&gt;To allow users to interact with the results of the data analytics, we often need to cache those results to make them usable for humans, in terms of speed and query capabilities.&lt;br&gt;
The most recommended cache options (and obviously there is more than one, as each is better for different use cases) are:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;
&lt;strong&gt;DynamoDB&lt;/strong&gt; for GraphQL access from a client application,&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;ElasticSearch&lt;/strong&gt; for textual queries,&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Redis&lt;/strong&gt; for fast operations on in-memory data sets (such as Sorted-Sets), or&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Neptune&lt;/strong&gt; for graph queries (not to be confused with GraphQL).&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;It is also common to cache into a relational database (such as MySQL or Aurora PostgreSQL), which can be OK for relatively small data sets and for visualization or BI tools that only know how to work with such databases.&lt;br&gt;
As long as you treat it as a cache (understanding that it is much more expensive, so it should serve only the users’ actual use cases, and that you can always recreate or delete it as needed), you will have the flexibility and cost-effectiveness you need to build your analytical use cases within your organization. It takes time to transform a company to be “smarter” and use data more efficiently, and that journey needs the agility, cost-effectiveness, scale, and simplicity that the above architecture provides.&lt;/p&gt;

&lt;h2&gt;
  
  
  Operation: Orchestration and Monitoring
&lt;/h2&gt;

&lt;p&gt;Running such a multi-tier, multi-use-case, multi-line-of-business, multi-data-store platform is not something you can do manually with a single DBA or even a team of DBAs. You have to have automation in mind from the very beginning of the project.&lt;br&gt;
In many companies, DevOps practices have already started to evolve, and capabilities around microservices, containers, and continuous integration and deployment (CI/CD) are emerging. Cloud migration is also a growing interest, a plan, and sometimes already an execution, and it strengthens IT’s ability to support this modern architecture. Nevertheless, doing DataOps efficiently is hard and new to most organizations. The agile, evolving build-out of the new architecture must include an essential investment in people’s skills and in choosing the right tools to automate the process.&lt;br&gt;
The orchestration options I see used most often today are AWS-native (Step Functions), open-source tools (mainly Apache Airflow), and managed services (such as Upsolver). I have excellent experience with all of these options, and the decision on which way to go depends on your specific use case, data sources, budget, technical capabilities, etc.&lt;/p&gt;

&lt;h2&gt;
  
  
  Let’s put it all together
&lt;/h2&gt;

&lt;p&gt;The diagram below is often overwhelming when I show it for the first time, which is why I kept it for the end. I hope that after reading the explanations and the reasoning behind it, you will find it more useful and straightforward to understand.&lt;/p&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2F5kwfybv2uu2ay6a2i6st.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2F5kwfybv2uu2ay6a2i6st.png" alt="Multi Data Tier Architecture" width="800" height="565"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;The diagram shows a specific project I implemented for one of my customers; the use of Step Functions for orchestration, DataDog for monitoring, or Terraform for deployment can be replaced with any of your favorite tools (Airflow, Grafana, and Jenkins, for example). The central concept of the cloud is the modularity of the architecture and the ability to add, replace, scale, or remove any part of it when the business needs it. As long as you stay curious and keep learning new and better technologies, at the rapid pace of technological advancement we live in, you can build and operate a powerful, modern data platform. Such a data platform is an essential part of the &lt;strong&gt;digital&lt;/strong&gt; and &lt;strong&gt;AI transformation&lt;/strong&gt; of every company that wants to stay relevant and competitive today.&lt;/p&gt;

</description>
      <category>bigdata</category>
      <category>aws</category>
      <category>architecture</category>
    </item>
  </channel>
</rss>
