DEV Community: Zayne Turner

Recovering from Partial Failures in Enterprise MCP Tools

Zayne Turner — Mon, 02 Feb 2026 23:14:43 +0000

Distributed transactions fail partway through. Payment succeeds, then Salesforce times out. The guest is charged, but three systems hold stale state.

In production, this happens constantly: a system times out, a connection drops mid-request, a user submits unexpected input. In distributed systems, these failures often mean a transaction completes in one system and fails in another—requiring reconciliation to restore consistency across all systems.

What does this look like in composed MCP tools? When an LLM orchestrates multi-step workflows—potentially retrying, potentially calling the same tool multiple times—each tool represents a surface area for partial failure. Who enforces state reconciliation? How does that fit with the separation of concerns we've discussed in previous posts?

Idempotency handles retries. Error handling catches failures. Neither addresses partial success—when some operations complete before the failure point. Recovery requires knowing what succeeded and how to reverse it.

Previous posts covered composable skill design and serverless execution—how to structure tools and run them reliably. This post covers what happens when reliable execution still produces inconsistent state.

The Reference Architecture: Dewy Resort

Throughout this series, we use Dewy Resort—a hotel management system—as our reference implementation. The architecture spans multiple systems:

Salesforce: Guest records, bookings, room inventory, sales opportunities
Stripe: Payment processing, refunds
MCP Tools: Orchestration layer connecting these systems via composed workflows

A single guest action like "check out" triggers operations across both systems: charge payment in Stripe, update booking status in Salesforce, mark room for cleaning, close the sales opportunity. Each system commits on its own timeline. The orchestrator sequences operations but can't provide atomic rollback across system boundaries.

The complete implementation is open source.

The Problem: Consistency Without Transactions

Enterprise workflows need ACID-like guarantees—either everything succeeds or everything rolls back. But there's no shared transaction boundary spanning systems. Stripe commits. Salesforce commits. Each has its own state, its own failure modes.

What This Looks Like: Multi-Object State

In Dewy Resort, a single checkout action updates three related Salesforce objects:

Booking__c
  ├─> Hotel_Room__c (lookup)
  └─> Opportunity (lookup)

State dependencies:
- Booking.Status = "Checked Out"
  → Room.Status__c = "Cleaning"
  → Opportunity.StageName = "Closed Won"

When Booking transitions to "Checked Out," Room and Opportunity must also transition. If any update fails, all three objects may be in inconsistent states—plus the Stripe charge has already succeeded.

A Checkout Failure in Practice

Workflow: process_guest_checkout

Step	Operation	Result
1	Search guest in Salesforce	✓
2	Create Stripe customer	✓
3	Create payment intent	✓
4	Confirm payment	✓ ($250 charged)
5	Update Salesforce booking	✗ Timeout
6	Update room status	— Skipped
7	Update opportunity	— Skipped

Result: HTTP 500 returned to caller.

Actual state across systems:

System	Actual	Expected	Match
Stripe	$250 charged	$250 charged	✓
Booking	Checked In	Checked Out	✗
Room	Occupied	Cleaning	✗
Opportunity	Negotiation	Closed Won	✗

Guest paid, checkout incomplete. Room can't be reassigned. Sales report wrong. Manual reconciliation: 30+ minutes.

Without a transaction boundary spanning systems, you have to build consistency yourself.

Why Try/Catch Isn't Enough

A catch block logs the error and returns 500. It can't distinguish between:

Failed before payment → safe to retry
Failed after payment → need refund or idempotent retry
Failed during Salesforce update → need reconciliation to determine state

Traditional error handling is binary: success or failure. Distributed workflows have partial success. Recovery requires knowing what succeeded and how to reverse it.
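
One way to make partial success visible is to record an outcome for each state-changing step and classify the failure from that record. The sketch below is a minimal illustration under assumptions: the `CheckoutLedger` shape, the `Outcome` values, and `classifyFailure` are hypothetical names, not taken from the Dewy Resort code.

```typescript
// "unknown" covers timeouts: the remote system may or may not have committed.
type Outcome = "done" | "unknown" | "not_started";

// Hypothetical ledger of the state-changing steps in process_guest_checkout.
interface CheckoutLedger {
  confirmPayment: Outcome;
  updateBooking: Outcome;
  updateRoom: Outcome;
  updateOpportunity: Outcome;
}

type RecoveryAction = "retry" | "refund_or_idempotent_retry" | "reconcile";

// Classify a failed checkout by what had already committed when it stopped.
function classifyFailure(ledger: CheckoutLedger): RecoveryAction {
  // Failed before payment: nothing to reverse, so retrying is safe.
  if (ledger.confirmPayment === "not_started") return "retry";

  const outcomes: Outcome[] = [
    ledger.confirmPayment,
    ledger.updateBooking,
    ledger.updateRoom,
    ledger.updateOpportunity,
  ];
  // Any write with an unknown outcome means actual state must be queried first.
  if (outcomes.some((o) => o === "unknown")) return "reconcile";

  // Payment is confirmed and every other step is known: refund or retry idempotently.
  return "refund_or_idempotent_retry";
}

// The timed-out checkout from the table above: payment done, booking update timed out.
const timedOutCheckout: CheckoutLedger = {
  confirmPayment: "done",
  updateBooking: "unknown",
  updateRoom: "not_started",
  updateOpportunity: "not_started",
};
const action = classifyFailure(timedOutCheckout); // "reconcile"
```

A plain catch block only sees the thrown error; a ledger like this is what lets the recovery path tell the three cases apart.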

A Perspective: Decision Placement

This series has argued for a clear separation of concerns between LLMs and backend systems. That principle applies directly to recovery logic: who should decide when to compensate, and who should execute the compensation?

This isn't established industry practice—it's a perspective we're advocating based on our experience building MCP tools for enterprise contexts.

The Principle

These aren't fully autonomous systems. They're agents assisting humans—so the real question is where human judgment belongs versus where deterministic execution belongs.

Humans + LLMs decide WHEN to act and WHICH workflow.
Backend workflows decide HOW to act and WHAT state transitions.
Financial and security decisions always go to backend.

Human + LLM Decisions

Non-deterministic, context-dependent—where judgment matters:

Understanding user intent ("I want to check out" → tool selection)
Selecting the right workflow (checkout vs compensation)
Extracting parameters from natural language
Confirming actions with users before execution
Explaining results to users

Backend-Appropriate Decisions

Deterministic, rule-based—where consistency matters:

Payment status routing (succeeded/requires_action/failed)
State validation (can this transition happen?)
Business rule enforcement (check-in window, eligibility)
ID resolution (email → Contact ID)
Error categorization (retryable vs permanent)

Applied to Recovery: Who Decides to Compensate?

We built capacity for both paths:

LLM-driven: Staff member receives guest complaint ("I got charged but checkout failed"). Staff-facing agent verifies the problem, calls compensation, explains outcome.
Automated: Scheduled job queries for orphaned payments, triggers compensation automation based on business rules.

Same compensation tool, different trigger mechanisms. Conversational UX for reported issues; automation catches unreported failures.
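
The automated path can be pictured as a filter over recent payments. This is a rough sketch, assuming simplified record shapes and matching on guest email only; `findOrphanedPayments`, the record fields, and the grace period are illustrative assumptions rather than the actual scheduled job.

```typescript
interface PaymentRecord {
  paymentIntentId: string;
  guestEmail: string;
  state: "succeeded" | "failed" | "refunded";
  confirmedAtMs: number;
}

interface BookingRecord {
  guestEmail: string;
  bookingStatus: "Reserved" | "Checked In" | "Checked Out";
}

// Return succeeded payments with no matching checked-out booking after a grace period.
function findOrphanedPayments(
  payments: readonly PaymentRecord[],
  bookings: readonly BookingRecord[],
  nowMs: number,
  graceMs: number
): PaymentRecord[] {
  return payments.filter((p) => {
    if (p.state !== "succeeded") return false;
    // Skip recent payments: their checkout may still be in flight.
    if (nowMs - p.confirmedAtMs < graceMs) return false;
    const checkedOut = bookings.some(
      (b) => b.guestEmail === p.guestEmail && b.bookingStatus === "Checked Out"
    );
    return !checkedOut;
  });
}
```

Each payment returned here would be handed to the same compensation tool the staff-facing agent calls, so the business rules live in one place.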

On tool exposure: The compensation tool exists only on the staff MCP server—there's no guest-facing version. Some tools simply don't make sense for certain audiences. There's no "refund on behalf of guest" capability because allowing guests to trigger unmediated refunds isn't sound business logic. Tool exposure is itself a layer of authorization, complementing the approach to building authorization into tool design discussed earlier in this series.

Why This Matters

Principle	Implementation	Rationale
Strategic vs tactical split	LLM selects workflow; backend executes it	Clear separation enables independent testing
Financial logic in backend	Refund amounts, payment routing, charge validation	Deterministic, auditable, not subject to prompt variation
Multiple trigger mechanisms	Same tool callable by LLM or cron job	Flexibility without duplicating logic

Two Patterns for Recovery

Two established patterns address partial failure recovery directly. Both are implemented in Dewy Resort.

Pattern 1: Compensating Transactions (Saga Pattern)

The saga pattern treats multi-step workflows as a sequence of operations, each with a corresponding compensating transaction that reverses its effect.¹

Use this pattern when:

Multi-system workflows can partially succeed
Financial operations are involved
State consistency affects user experience
Manual reconciliation cost exceeds automation cost

How It Works

When checkout fails after payment succeeds, the compensation tool:

Validates payment state (is it refundable?)
Issues refund (financial operations first)
Checks Salesforce state and reverts if needed

The tool accepts what callers naturally have:

{
  "tool": "compensate_checkout_failure",
  "parameters": {
    "payment_intent_id": "pi_3ABC...",
    "guest_email": "beth.gibbs@email.com",
    "idempotency_token": "comp-pi_3ABC...",
    "reason": "Salesforce timeout during checkout"
  }
}

Design Principles

Principle	Implementation	Rationale
Check state before reversing	Query current state, only update if needed	Makes compensation idempotent—safe to retry
Financial operations first	Issue refund before Salesforce cleanup	Guest harm reversed immediately; data fixable async
Business identifiers in, system IDs hidden	Accept `guest_email`, resolve to Contact ID internally	LLM has email from conversation; logs stay readable
Idempotency at every layer	Client token → Backend check → Stripe Idempotency-Key	Safe to automate; no double-refunds
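
The principles in this table can be sketched against in-memory stand-ins. Everything here is an assumption for illustration: the `Systems` object replaces Stripe, Salesforce, and a token store, and `compensateCheckoutFailure` is a simplified stand-in for the real tool, not its implementation.

```typescript
type PaymentState = "succeeded" | "refunded" | "failed";

// In-memory stand-ins for Stripe, Salesforce, and a token store (all hypothetical).
interface Systems {
  payments: Record<string, PaymentState | undefined>; // by payment intent ID
  bookingStatus: Record<string, string | undefined>; // by guest email
  processedTokens: Record<string, boolean | undefined>;
}

interface CompensationResult {
  alreadyProcessed: boolean;
  refunded: boolean;
  bookingReverted: boolean;
}

function compensateCheckoutFailure(
  sys: Systems,
  paymentIntentId: string,
  guestEmail: string,
  idempotencyToken: string
): CompensationResult {
  // A repeated token returns without repeating any side effect.
  if (sys.processedTokens[idempotencyToken] === true) {
    return { alreadyProcessed: true, refunded: false, bookingReverted: false };
  }

  // Financial operation first: refund only if the payment is still refundable.
  let refunded = false;
  if (sys.payments[paymentIntentId] === "succeeded") {
    sys.payments[paymentIntentId] = "refunded";
    refunded = true;
  }

  // Then data cleanup: revert the booking only if the checkout write landed.
  let bookingReverted = false;
  if (sys.bookingStatus[guestEmail] === "Checked Out") {
    sys.bookingStatus[guestEmail] = "Checked In";
    bookingReverted = true;
  }

  sys.processedTokens[idempotencyToken] = true;
  return { alreadyProcessed: false, refunded, bookingReverted };
}
```

Note the two separate guards: the token check makes a retried call a no-op, and the state checks mean that even a call with a fresh token cannot refund the same payment twice.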

Pattern 2: Fail-Fast Validation

Validate assumptions before expensive operations. Preventing failures is cheaper than compensating for them.

Use this pattern when:

Operations have prerequisites that could be violated
Downstream operations are non-idempotent (payments, external API calls)
Clear error messages can guide callers to fix input

Example: The Multiple-Bookings Bug

Original code assumed one booking per guest:

bookings = search_booking(guest_email)
update_booking(bookings[0].id, status: "Checked Out")

Problem: What if bookings has 0 or 2+ elements?

length == 0: Accesses undefined → crash
length > 1: Updates first booking (might be wrong one)

The fix—explicit validation before array access:

bookings = search_booking(guest_email)

IF bookings.length == 0:
  → Return 404 "No checked-in booking found for this guest today"

IF bookings.length > 1:
  → Return 400 "Multiple bookings found. Provide room_number or booking_number to disambiguate"

IF bookings.length == 1:
  → Proceed with checkout

This stops execution before charging payment. If validation happened after payment, you'd need compensation.
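
The same validation can be expressed as a typed function. This is a sketch, assuming a simplified `Booking` shape; `selectBookingForCheckout` and `BookingSelection` are hypothetical names, while the status codes and messages mirror the pseudocode above.

```typescript
interface Booking {
  id: string;
  roomNumber: string;
}

type BookingSelection =
  | { ok: true; booking: Booking }
  | { ok: false; httpStatus: 400 | 404; message: string };

// Validate the search result before any payment step runs.
function selectBookingForCheckout(bookings: readonly Booking[]): BookingSelection {
  const [first, ...rest] = bookings;
  if (first === undefined) {
    return {
      ok: false,
      httpStatus: 404,
      message: "No checked-in booking found for this guest today",
    };
  }
  if (rest.length > 0) {
    return {
      ok: false,
      httpStatus: 400,
      message: "Multiple bookings found. Provide room_number or booking_number to disambiguate",
    };
  }
  return { ok: true, booking: first };
}
```

Returning a discriminated union forces the caller to check `ok` before touching `booking`, so the unchecked `bookings[0]` access from the original bug cannot compile its way back in.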

Design Principles

Principle	Implementation	Rationale
Validate before non-idempotent operations	Check prerequisites before payments, external calls	Failures prevented > failures compensated
Validate array lengths	Check length before accessing elements	Prevents crashes and wrong-record updates
Return actionable errors	Specific codes (400/404/409) with guidance	Callers can fix input without guessing

Recovery Strategy Quick Reference

Strategy	When to Use	Example
Compensating transaction	Financial operations, state consistency critical	Payment succeeded, Salesforce failed → Issue refund
Acceptable orphan	Resource has low/zero cost, will be reused	Stripe customer created, checkout failed → Customer reused on retry
Fail-fast validation	Preventing failure cheaper than recovering	Multiple bookings found → Return 400 before payment
Retry with idempotency	Transient failure, operation is idempotent	Salesforce timeout → Retry with same token

Is Your Tool Recovery-Ready?

Saga Pattern

[ ] Compensation orchestrator exists for financial operations
[ ] Compensation checks state before reversing (idempotent)
[ ] Financial compensation prioritized over data cleanup
[ ] All compensation actions logged for audit trail
[ ] Idempotency tokens flow through entire compensation flow

Fail-Fast Validation

[ ] Array lengths validated before element access
[ ] Prerequisites checked before non-idempotent operations
[ ] Error codes are specific with actionable guidance

Decision Placement

[ ] Strategic decisions (when, which workflow) → LLM
[ ] Tactical decisions (how, what transitions) → Backend
[ ] Financial/security decisions → Backend (always)
[ ] Multiple trigger mechanisms supported where needed

Conclusion

MCP standardizes how LLMs discover and invoke tools. What those tools do—and how they handle partial failures, state consistency, and recovery—is architecture you build into the tools themselves.

The saga pattern provides compensating transactions when multi-system workflows fail partway through. Fail-fast validation prevents failures by checking assumptions before expensive operations. Decision placement—where recovery logic lives—determines whether your system is testable, auditable, and flexible.

You can see these approaches in the Dewy Resort sample application. We've built a checkout orchestration tool that handles Stripe and Salesforce coordination, financial operations, state consistency across Booking/Room/Opportunity, automatic compensation, and idempotency at every layer.

Implementation: The complete Dewy Resort Hotel example is open source: github.com/workato-devs/dewy-resort

This post builds on Designing Composable Skills for MCP Tools and Serverless MCP Execution. For more on composable architecture patterns, see the complete series.

¹ The saga pattern was introduced by Hector Garcia-Molina and Kenneth Salem in their 1987 paper "Sagas" (ACM SIGMOD). For a modern treatment, see Chris Richardson's Saga pattern documentation.

Serverless MCP: Stateless Execution for Enterprise AI Tools

Zayne Turner — Tue, 13 Jan 2026 20:27:16 +0000

In the first two posts of this series, we explored why enterprise MCP needs compositional architecture and how to design skills that abstract complexity from the AI agent. But there's a question we haven't addressed: how do those tools actually execute at runtime?

The Model Context Protocol defines how AI agents discover and invoke tools. But the protocol says nothing about persistent connections vs. stateless HTTP, session-cached state vs. source-of-truth lookups, or long-running processes vs. queue-based workers. These runtime decisions determine whether your system scales, fails gracefully, and stays debuggable.

Most MCP implementations assume persistent connections and session state. This post explores a different approach: serverless MCP, where every tool call is an independent HTTP request, executed by any available worker in a distributed pool, with no state stored in the MCP server itself.

We'll continue with our hotel operations system to show how we built this using Workato's cloud-native iPaaS and enterprise MCP—and why stateless execution matters for production AI.

What Is Serverless MCP?

Serverless MCP isn't about AWS Lambda or "no servers." It's a set of three specific architectural choices:

Connection model: HTTP request-response per tool call, not persistent WebSocket/SSE connections.

State management: No session state in the MCP server. The source of truth lives in your systems of record (CRM, databases, etc). The MCP layer is stateless.

Execution model: Tool calls are queued immediately upon arrival. Any available worker pulls from the queue and executes. No server affinity—the worker that handles client requests might be different every time.

These choices have cascading effects across the entire system:

Aspect	Traditional MCP	Serverless (managed) MCP
Connection	Persistent WebSocket/SSE	HTTP request per tool call
State	Session-based (cached in server)	Stateless (external systems = source of truth)
Scaling	Vertical, or complex load balancing	Horizontal (queue depth triggers auto-scaling)
Execution	Long-running server process	Queue-based workers, allocated just-in-time
Fault tolerance	Connection drops require reconnect	Queued events survive worker crashes
Idempotency	Must implement manually	Declarative control logic
Deployment	Custom server code, process management	Recipe descriptors, zero infrastructure ops

An individual MCP client could be identical in both cases. The difference is entirely in how backend systems respond to those tool invocations.

When Does Server Choice Matter?

Serverless* MCP excels when:

You're orchestrating across multiple (external) systems.
A hotel checkout touches Stripe (payment), Salesforce (booking status, room status, opportunity stage), and possibly Twilio (confirmation SMS). Each external API call takes roughly 600-1200ms. The overhead of HTTP vs. WebSocket (~50ms) is noise compared to the 4-7 seconds spent waiting on external systems.

Your load is variable or unpredictable.
Traffic spikes during batch processing, end-of-day reconciliation, or pilot rollouts. Queue-based execution scales automatically. Worker pools grow when queue depth increases, shrink when it decreases. No capacity planning required.

Your team is operations-constrained.
Small teams without dedicated DevOps still need enterprise-grade reliability. Managed serverless MCP offerings (like Workato's enterprise MCP) offload infrastructure concerns (scaling, health checks, connection pools, OAuth token refresh) to the underlying operating platform. (DIY serverless MCP would NOT necessarily benefit these kinds of teams.)

You need audit trails and guaranteed delivery.
In a managed serverless MCP implementation, every tool call is persisted to a distributed queue before execution begins. If a worker crashes mid-execution, the event survives. Transaction-level logging comes free.

*Some of the benefits here are specific to what I'm calling a "managed serverless" (i.e. running on a platform-as-a-service substrate) implementation. These are noted in the text.

Traditional MCP makes sense when:

You need streaming responses.
HTTP request-response is all-or-nothing. If you need incremental results as data arrives—search results appearing one by one, progress updates during long operations, real-time transcription—you need WebSocket or SSE transports. This is an architectural constraint, not a performance trade-off.

You need server-initiated communication.
Stateless HTTP means the server only responds to requests; it can't push. If your tools need to notify the client asynchronously (alerts, status changes, collaborative updates), you either need persistent connections or a separate notification channel.

Connection overhead would dominate the workload.
HTTP setup adds ~50ms per request. For enterprise integrations hitting systems like Salesforce or Stripe (600-1200ms per API call), that's trivial: acceptable friction that disappears into the overall latency. But if your tools perform fast (under 50ms) internal operations like cache lookups, in-memory computations, or microservice calls, that per-request overhead becomes the dominant cost per execution. These costs typically add up more in internal service-to-service communication than in SaaS connectivity use cases.

What NOT to put into an MCP Server (no matter the architecture)

As I've discussed before: MCP is a narrow protocol. Keep it that way.
MCP defines tool interfaces. Your backends execute operations, manage state, and handle errors. If you find yourself thinking: "I need persistent connections to maintain state across tool calls," you are putting that responsibility in the wrong layer.

Serverless MCP forces a certain amount of architectural discipline: you can't store session state, so you design systems that don't need it.

How Serverless Execution Works

Let's look at how Workato's cloud-native architecture implements serverless MCP.

Recipes as Descriptors, Not Deployed Applications

In traditional integration platforms, you deploy an application to a server. The app "belongs" to that server, which means tight coupling. On the Workato platform, a common developer artifact is a multi-step automated integration workflow, called a "recipe." In Workato's architecture, recipes are descriptors of intent, not deployed code.

From Workato's Cloud-Native Architecture whitepaper:

"Recipes are descriptors of the 'builder's intent' and are decoupled from our execution runtime. Recipe logic is evaluated on-demand during execution by any available (idle) server, thus removing the notion of a 'deployed app' and closely aligning with a serverless execution paradigm."

This means:

Any worker can execute any tool call (no server affinity)
Platform upgrades don't require redeploying recipes (zero downtime)
Workers are allocated just-in-time, not pre-assigned

Queue-Based Execution Flow

When a tool call arrives:

HTTP POST hits the API Platform
Request is immediately persisted to a distributed queue
Platform returns acknowledgment
Any available worker pulls the event from the queue
Worker evaluates the recipe descriptor, executes each step
Response returned (or error handled)

If a worker crashes mid-execution, the event persists in the queue. Another worker picks it up. Guaranteed delivery without custom retry logic.
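
The delivery guarantee comes from removing an event only after a worker finishes it. The following is a toy, single-process sketch of that idea; `DurableQueue` and `WorkerFn` are hypothetical names, and this is not how Workato's distributed queue is implemented.

```typescript
interface ToolCall {
  id: string;
  tool: string;
}

// A worker returns a response, or throws to simulate crashing mid-execution.
type WorkerFn = (call: ToolCall) => string;

class DurableQueue {
  private pending: ToolCall[] = [];

  // Persist first, then acknowledge receipt to the caller.
  enqueue(call: ToolCall): string {
    this.pending.push(call);
    return "accepted";
  }

  // Give the oldest event to a worker; remove it only after the worker finishes.
  processNext(worker: WorkerFn): string | undefined {
    const call = this.pending[0];
    if (call === undefined) return undefined;
    try {
      const response = worker(call);
      this.pending.shift();
      return response;
    } catch (_crash) {
      // The event was never removed, so the next worker will pick it up.
      return undefined;
    }
  }

  depth(): number {
    return this.pending.length;
  }
}
```

Because a crashed attempt may have completed some steps before failing, redelivery is only safe when the steps themselves are idempotent, which is why the idempotency tokens described below matter.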

Where State Actually Lives

In our Dewy Resort Hotel implementation, state lives in exactly three places:

CRM/backend systems (source of truth): Guest contacts, hotel bookings, rooms, service cases, opportunities. Every tool call queries current state—no caching, no staleness.

Client-side idempotency tokens: UUIDs generated by the client, passed with every create/update operation. The backend checks for existing records with matching external IDs before creating new ones. Safe to retry.

Platform-managed connections: OAuth tokens and API keys stored in Workato's encrypted vault. Automatic token refresh. Managed at the workspace level, not per-request.

What's not stored in the MCP server: session state, conversation history, connection pools, cached results. Nothing.

Real Example: Guest Check-In

A guest says: "Hi, I'm Sarah Johnson checking in. My email is sarah@example.com."

The LLM calls the check_in_guest tool:

{
  "tool": "check_in_guest",
  "parameters": {
    "guest_email": "sarah@example.com",
    "idempotency_token": "550e8400-e29b-41d4..."
  }
}

The orchestrator recipe:

Searches Salesforce for Contact by email (800ms)
Searches for Booking with status=Reserved, date=today (900ms)
Validates Room status is Vacant (800ms)
Updates Booking status: Reserved → Checked In
Updates Room status: Vacant → Occupied
Updates Opportunity stage: Booking Confirmed → Checked In (~2500ms for all state changes)

Total: ~5.5 seconds. Five Salesforce API calls.
Any worker could have executed this—the one that did was simply the next idle worker in the pool.

Stateless validation: Every check queries current Salesforce data. No session state needed.

Idempotency built-in: If the booking is already "Checked In," the operation returns success without duplicating work. Safe to retry on network failure.
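
State-based idempotency can be sketched as a pure decision over the status just read from Salesforce. This is an illustrative sketch; `applyCheckIn` and `CheckInOutcome` are assumed names, and only the booking status is modeled.

```typescript
type BookingStatus = "Reserved" | "Checked In" | "Checked Out";

interface CheckInOutcome {
  success: boolean;
  next: BookingStatus;
  changed: boolean;
}

// Decide the check-in transition from the current status read from the source of truth.
function applyCheckIn(current: BookingStatus): CheckInOutcome {
  if (current === "Checked In") {
    // Already done (for example, a retry after a dropped response): succeed without writing.
    return { success: true, next: current, changed: false };
  }
  if (current === "Reserved") {
    return { success: true, next: "Checked In", changed: true };
  }
  // "Checked Out" cannot be checked in again.
  return { success: false, next: current, changed: false };
}
```

Because the decision depends only on current state, any worker can make it, and a repeated call lands in the first branch.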

Implementation Patterns

Four patterns that make serverless MCP work in production:

1. Compositional Tool Design

Don't expose raw APIs as MCP tools. Design tools around user intent.

Naive approach (11+ tool calls for a checkout):

get_guest_by_email, get_booking_by_guest, get_room_by_booking,
create_payment_intent, charge_payment_method, send_receipt_email,
update_booking_status, update_room_status...

Compositional approach (2 tools):

check_in_guest (orchestrator)
checkout_guest (orchestrator)

Each orchestrator composes multiple atomic operations internally. The complexity moves to the backend, where it's governed, tested, and observable.

Why this matters for serverless: fewer round trips (lower latency), atomic operations (easier retry logic), encapsulated business rules (consistent validation).

2. Idempotency at Every Layer

Every tool that creates or modifies data accepts an idempotency token:

{
  "idempotency_token": "550e8400-e29b-41d4-a716-446655440000",
  "guest_email": "sarah@example.com",
  "amount": 15000
}

The backend checks for existing records before creating:

IF Case.External_ID__c = idempotency_token
  → Return existing case (already created)
ELSE
  → Create new case with External_ID__c = idempotency_token

External APIs enforce their own idempotency (Stripe's Idempotency-Key header deduplicates within 24 hours).

Result: network retries are always safe. No duplicate bookings, no duplicate charges.
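
The backend check above is a get-or-create keyed on the token. Here is a minimal in-memory sketch of it; `CaseStore` and `createOnce` are hypothetical names, and a real implementation would rely on a unique external ID field in Salesforce rather than an object in memory.

```typescript
interface ServiceCase {
  externalId: string;
  guestEmail: string;
  description: string;
}

class CaseStore {
  // Keyed by External_ID__c; Object.create(null) avoids inherited keys.
  private byExternalId: Record<string, ServiceCase | undefined> = Object.create(null);

  createOnce(
    token: string,
    guestEmail: string,
    description: string
  ): { record: ServiceCase; created: boolean } {
    const existing = this.byExternalId[token];
    if (existing !== undefined) {
      return { record: existing, created: false }; // already created: return it
    }
    const record: ServiceCase = { externalId: token, guestEmail, description };
    this.byExternalId[token] = record;
    return { record, created: true };
  }

  count(): number {
    return Object.keys(this.byExternalId).length;
  }
}
```

Calling `createOnce` twice with the same token returns the same record and leaves `count()` at 1.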

3. Business Identifiers Over System IDs

Tightly coupled (exposes your database schema):

{
  "contact_id": "003Dn00000QX9fKIAT",
  "booking_id": "a0G8d000002kQoFEAU"
}

Human-readable (lets the backend resolve IDs):

{
  "guest_email": "sarah@example.com",
  "room_number": "205"
}

The recipe resolves emails and room numbers to Salesforce IDs internally. The LLM doesn't need to know your data model. Logs become human-readable. Each request is self-contained.
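
Internally, resolution is a lookup step that runs before any write. The sketch below assumes in-memory lookup tables in place of Salesforce queries; `IdDirectory`, `resolveIds`, and the `ROOM_NOT_FOUND` code are hypothetical (`GUEST_NOT_FOUND` is one of the error codes mentioned later in this post).

```typescript
interface IdDirectory {
  contactIdByEmail: Record<string, string | undefined>;
  roomIdByNumber: Record<string, string | undefined>;
}

type Resolution =
  | { ok: true; contactId: string; roomId: string }
  | { ok: false; error_code: "GUEST_NOT_FOUND" | "ROOM_NOT_FOUND" };

// Own-property lookup so keys like "toString" never match inherited members.
function lookup(table: Record<string, string | undefined>, key: string): string | undefined {
  return Object.prototype.hasOwnProperty.call(table, key) ? table[key] : undefined;
}

// Resolve business identifiers to system IDs; the caller never sees the IDs.
function resolveIds(dir: IdDirectory, guestEmail: string, roomNumber: string): Resolution {
  const contactId = lookup(dir.contactIdByEmail, guestEmail.trim().toLowerCase());
  if (contactId === undefined) return { ok: false, error_code: "GUEST_NOT_FOUND" };
  const roomId = lookup(dir.roomIdByNumber, roomNumber.trim());
  if (roomId === undefined) return { ok: false, error_code: "ROOM_NOT_FOUND" };
  return { ok: true, contactId, roomId };
}
```

A failed lookup surfaces as a structured error in business terms, which is what the LLM needs to ask the user for a corrected email or room number.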

4. Structured Error Contracts

Every tool returns structured responses:

type ToolResult =
  | { success: true; data: ResultData }
  | { success: false; error_code: string; error_message: string; retry_safe: boolean };

Error codes map to HTTP status:

200: Success
400: Validation error (bad input)—don't retry
404: Resource not found—don't retry
409: Conflict (room unavailable, multiple reservations)—don't retry
500: Infrastructure error—safe to retry
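
The mapping from status to `retry_safe` can be centralized in one helper so individual tools cannot disagree about it. This is a sketch under assumptions: `failure` and `shouldRetry` are illustrative names, and the generic `ToolResult<T>` is a variation on the type shown above.

```typescript
type ToolResult<T> =
  | { success: true; data: T }
  | { success: false; error_code: string; error_message: string; retry_safe: boolean };

type ErrorStatus = 400 | 404 | 409 | 500;

// Build an error result; only infrastructure errors (500) are marked safe to retry.
function failure(
  httpStatus: ErrorStatus,
  error_code: string,
  error_message: string
): ToolResult<never> {
  return { success: false, error_code, error_message, retry_safe: httpStatus === 500 };
}

// A caller retries only when the tool says it is safe.
function shouldRetry<T>(result: ToolResult<T>): boolean {
  return !result.success && result.retry_safe;
}
```

With this in place, a 409 such as `ROOM_NOT_VACANT` tells the agent to ask the user for different input, while a 500 tells it to resend the same request with the same idempotency token.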

Performance Reality

The bottleneck in serverless MCP is external APIs, not the execution model.

Typical check-in (~5.5 seconds total):

API Platform routing: 50ms
Search Contact: 800ms
Search Booking: 900ms
Update Booking: 800ms
Update Room: 800ms
Update Opportunity: 900ms

Complex checkout with payment (~6.8 seconds total):

API Platform routing: 50ms
Search Contact: 800ms
Create Stripe Customer: 600ms
Create PaymentIntent: 700ms
Confirm Payment: 2000ms (includes bank authorization)
Update Booking: 1000ms
Update Room: 900ms
Update Opportunity: 900ms

98%+ of execution time is spent waiting on systems like Salesforce and Stripe. A persistent-connection MCP server wouldn't be meaningfully faster—the latency is in the business process, not the protocol.
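
As a quick check on that figure, the listed checkout steps sum to 6,950 ms, of which 50 ms is platform routing, so roughly 99.3% of the time is spent in external systems. The snippet below just reproduces that arithmetic from the numbers above; `externalShare` is an illustrative helper.

```typescript
// Timings (ms) as listed for the checkout with payment.
const checkoutTimingsMs: Record<string, number> = {
  "API Platform routing": 50,
  "Search Contact": 800,
  "Create Stripe Customer": 600,
  "Create PaymentIntent": 700,
  "Confirm Payment": 2000,
  "Update Booking": 1000,
  "Update Room": 900,
  "Update Opportunity": 900,
};

// Fraction of total time spent outside the named platform step.
function externalShare(timings: Record<string, number>, platformStep: string): number {
  let total = 0;
  for (const step of Object.keys(timings)) total += timings[step] ?? 0;
  const platform = timings[platformStep] ?? 0;
  return (total - platform) / total;
}

const share = externalShare(checkoutTimingsMs, "API Platform routing"); // 6900 / 6950
```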

Throughput Benchmarks

Workato API Platform: 100 requests/second per workspace, auto-scaling worker pools, distributed queue prevents backpressure.

Our test results: 20 concurrent tool calls from multiple LLM instances, no contention or throttling, linear scaling observed.

Practical bottleneck (for our fictional app): Salesforce dev sandbox allows 15,000 API calls per 24 hours. A typical checkout workflow uses 8-9 Salesforce calls. That's ~1,500 checkouts per day—limited by Salesforce, not Workato.
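
The ceiling is the daily API budget divided by calls per checkout. The helper below is a small illustrative calculation using the figures quoted above; the exact quotients come out slightly above the rounded ~1,500 estimate.

```typescript
// Upper bound on daily checkouts given a fixed API-call budget.
function maxCheckoutsPerDay(dailyApiCalls: number, callsPerCheckout: number): number {
  if (callsPerCheckout <= 0) throw new Error("callsPerCheckout must be positive");
  return Math.floor(dailyApiCalls / callsPerCheckout);
}

const atNineCalls = maxCheckoutsPerDay(15000, 9); // 1666
const atEightCalls = maxCheckoutsPerDay(15000, 8); // 1875
```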

Lessons Learned

What worked

Idempotency everywhere. Client-generated UUIDs + external ID fields in Salesforce made retries trivial. Zero duplicate bookings or charges from test cases through final deployments.

Compositional design. When checkout_guest failed, we could test create_stripe_customer in isolation. Breaking workflows into orchestrators + atomic operations made debugging dramatically easier.

Business identifiers. Troubleshooting meant grepping for "sarah@example.com" instead of decoding "003Dn00000QX9fKIAT". Human-readable logs matter.

Structured errors. Consistent error codes (GUEST_NOT_FOUND, ROOM_NOT_VACANT) let us document recovery paths. The LLM could guide users appropriately based on error type.

What we'd do differently

Treat your test datasets like artifacts. We generated synthetic test data during design, but didn't consistently backport changes as we iterated on the real app. The mocks drifted—reflecting original assumptions, not actual behavior. This debt compounded at the orchestration layer: we burned significant time recreating complex payloads that accurate atomic datasets would have provided for free. Generate mocks early, and maintain them like production code.

Keep atomic operations truly atomic. We started with compositional design as our intent, but still managed to sneak transactional behavior into orchestrators and mix independent read/write transactions into the same "atomic" recipe. Splitting out functionality like search_contact_by_email from an orchestration into its own recipe not only improved reuse, it significantly improved debugging.

Make tool descriptions explicit. The LLM struggled with ambiguous names. "Check in guest" vs. "Check in guest (requires existing reservation)" made a real difference. There is more guidance about this in my previous post, focused on putting composable tool design into practice (link below).

Production-friendly benefits

Zero downtime for recipe changes. Update validation logic, redeploy—workers pick up the new version automatically. No restarts, no reconnects.

Painless scaling. Tested concurrent requests without thinking about it. Workers auto-scaled. No connection pool tuning.

Audit trail for free. Transaction-level logging saved us during a Salesforce API outage. We could see exactly which operations completed vs. failed, and replay failed requests once service was restored.

Conclusion

As AI agents move into production, your MCP server architecture matters more than the protocol itself.

Traditional MCP—persistent connections, session state, server affinity—works for experiments. Production systems need:

Horizontal scalability without manual tuning
Guaranteed delivery with built-in retry safety
Transaction-level observability for debugging and compliance
Idempotency by design, not as an afterthought

Serverless MCP delivers this by making statelessness a constraint, not an option. You can't store session state, so you design systems that don't need it. You can't rely on server affinity, so you build idempotent operations. The architecture forces good discipline.

The protocol won't save you from bad architecture. But stateless execution—queue-based workers, external systems as source of truth, idempotency at every layer—transforms MCP from a technical curiosity into a production-grade integration layer.

Resources

Dewy Resort Hotel (Open Source): github.com/workato-devs/dewy-resort
Workato Cloud-Native Architecture: Architecture Deep Dive Whitepaper
Model Context Protocol Spec: modelcontextprotocol.io
Previous post: Designing Composable Tools for MCP

Designing Composable Tools for Enterprise MCP: From Theory to Practice

Zayne Turner — Tue, 23 Dec 2025 17:35:52 +0000

In my previous post, I discussed how the biggest gap in enterprise MCP implementations isn't the protocol itself—it's the architectural decisions around it. Specifically, how teams treat MCP as "API gateway for LLMs" when they should be thinking about composable tool design.

Today, I want to show you what composable, skills-based tool design actually looks like in practice.

Hotel Operations Case Study

Let's start with a real scenario from a hotel management system. A front desk employee says: "Beth Gibbs is checking out, and she says the toilet in her room is broken."

This simple interaction requires:

Processing the checkout (payment, receipts, room status)
Filing a maintenance request (with room context intact)
Updating inventory and availability
Routing the request to the right maintenance team

How would you design MCP tools for this?

The Naïve Approach

Many (if not most) teams start by exposing existing APIs as MCP tools:

- get_guest_by_email
- get_booking_by_guest
- get_room_by_booking
- create_payment_intent
- charge_payment_method
- send_receipt_email
- update_booking_status
- update_room_status
- create_case
- assign_case_to_contact
- set_case_priority

An agent now has to orchestrate 11+ API calls in the correct sequence, handle potential failures at each step, and maintain state throughout. The result? Slow, error-prone, and TERRIBLE user experiences.

The Compositional Approach

What if, instead, we designed tools around user intent? The calls could look something like:

- process_guest_checkout
- submit_maintenance_request

Two tools. One natural conversation. The complexity hasn't disappeared—it's just moved to where it belongs.

Nine Patterns for Composable, Skills-Based Tool Design

After implementing production MCP systems, here are the patterns that separate elegant architectures from fragile ones:

1. Accept Business Identifiers, Not System IDs

Bad:

{
  "contact_id": "003Dn00000QX9fKIAT",
  "booking_id": "a0G8d000002kQoFEAU",
  "room_id": "a0I8d000001pRmXEAU"
}

Good:

{
  "guest_email": "beth.gibbs@email.com",
  "room_number": "302"
}

Let the backend resolve human-readable identifiers to internal IDs. The agent shouldn't need to know your database schema.

This applies to all tool parameters—not just the primary entity. When updating relationships (like reassigning a case to a different room or changing the guest on a booking), continue using business identifiers:

{
  "idempotency_token": "550e8400-e29b-41d4-a716-446655440000",
  "room_number": "402",  // backend resolves to room_id
  "guest_email": "new.guest@example.com"  // backend resolves to contact_id
}

The agent should never need to call get_room_by_number or get_guest_by_email just to obtain IDs for another operation. Every tool parameter should use business identifiers, and the backend handles all ID resolution internally.

2. Build Idempotency Into Tool Design

Every tool that creates or modifies resources should accept an idempotency token:

{
  "idempotency_token": "550e8400-e29b-41d4-a716-446655440000",
  "guest_email": "beth.gibbs@email.com",
  "description": "Toilet broken in room 302"
}

When the agent retries (and it will), the backend recognizes the duplicate request and returns the original result. This is a backend responsibility, not an agent responsibility.

Added benefit: For multi-system operations (like a checkout process spanning payment processing and CRM updates), idempotency tokens enable saga pattern orchestration. For example: if a payment succeeds but a CRM update fails, the backend can use the relevant transaction token to coordinate compensating transactions (like refunding the payment) without agent involvement.
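One way to picture the backend side of this is a lookup keyed by the token. The sketch below is an assumption-laden illustration, not a production design: `run_idempotent`, `create_maintenance_case`, and the in-memory store are hypothetical, and a real implementation would need durable storage and protection against concurrent duplicates.

```python
import uuid
from typing import Callable, Dict

# Hypothetical store; a real backend would use a durable table or cache.
_results_by_token: Dict[str, dict] = {}


def run_idempotent(token: str, operation: Callable[[], dict]) -> dict:
    """Run the operation once per token; replays return the stored result."""
    if token in _results_by_token:
        return _results_by_token[token]
    result = operation()
    _results_by_token[token] = result
    return result


def create_maintenance_case() -> dict:
    # Stand-in for the real side effect (for example, creating a case record).
    return {"case_id": str(uuid.uuid4()), "status": "Open"}


token = "550e8400-e29b-41d4-a716-446655440000"
first = run_idempotent(token, create_maintenance_case)
retry = run_idempotent(token, create_maintenance_case)  # agent retries
```

Because the second call replays the stored result, `retry` carries the same `case_id` as `first` and no second case is created.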

3. Coordinate State Transitions Atomically

When a guest checks in, multiple things must happen together:

Booking status: Reserved → Checked In
Room status: Available → Occupied
Opportunity stage: Pending → Active

These shouldn't be three separate tools the agent must coordinate. One tool (check_in_guest) should orchestrate the entire state transition atomically.
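A rough sketch of how a single tool could apply these transitions as a unit follows. It assumes each step has a matching undo action, so it approximates atomicity through compensation rather than a true cross-system transaction. `run_all_or_nothing`, `set_field`, and the state dict are hypothetical.

```python
from typing import Callable, List, Tuple

Step = Tuple[Callable[[dict], None], Callable[[dict], None]]  # (apply, undo)


def run_all_or_nothing(state: dict, steps: List[Step]) -> None:
    """Apply steps in order; if one fails, undo the completed ones in reverse."""
    undo_stack: List[Callable[[dict], None]] = []
    try:
        for apply, undo in steps:
            apply(state)
            undo_stack.append(undo)
    except Exception:
        for undo in reversed(undo_stack):
            undo(state)
        raise


def set_field(key: str, new: str, old: str) -> Step:
    """Build a step that sets state[key] to new and can restore old."""
    return (lambda s: s.__setitem__(key, new), lambda s: s.__setitem__(key, old))


def check_in_guest(state: dict) -> None:
    run_all_or_nothing(state, [
        set_field("booking", "Checked In", "Reserved"),
        set_field("room", "Occupied", "Available"),
        set_field("opportunity", "Active", "Pending"),
    ])
```

The agent calls `check_in_guest` once; whether all three transitions landed or none did is the backend's problem to guarantee.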

4. Embed Authorization in Tool Design

Instead of:

- search_all_cases
- search_all_rooms
- search_all_bookings

Design tools with appropriate scope:

- search_cases_on_behalf_of_guest(guest_email)
- search_rooms_on_behalf_of_guest(guest_email)
- search_rooms_on_behalf_of_staff(floor_filter, status_filter)

The tool interface itself encodes who can see what. Authorization becomes declarative rather than imperative.
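As an illustration, a guest-scoped search might look like the sketch below. The `CASES` data is made up, and the sketch assumes the caller's identity has already been verified upstream; the point is only that the function has no code path that returns another guest's records.

```python
from typing import Dict, List

# Hypothetical case records; in practice these would come from the CRM.
CASES: List[Dict[str, str]] = [
    {"id": "C-1", "guest_email": "beth.gibbs@email.com", "subject": "Toilet broken"},
    {"id": "C-2", "guest_email": "other.guest@example.com", "subject": "Extra towels"},
]


def search_cases_on_behalf_of_guest(guest_email: str) -> List[Dict[str, str]]:
    """Return only the cases that belong to the given guest."""
    email = guest_email.strip().lower()
    return [case for case in CASES if case["guest_email"] == email]
```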

5. Provide Smart Defaults

Wherever possible, reduce the agent's cognitive load:

{
  "guest_email": "required",
  "check_in_date": "defaults to today",
  "number_of_guests": "defaults to 1",
  "status_filter": "defaults to 'Open'"
}

Agents should only need to specify what's genuinely variable.
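In code, defaults like these could live in the tool's input model. The following is a small sketch using a Python dataclass; `GuestToolInput` is a hypothetical name, and the fields simply mirror the example above.

```python
from dataclasses import dataclass, field
from datetime import date


@dataclass
class GuestToolInput:
    """Hypothetical input model; only guest_email must be supplied."""
    guest_email: str
    check_in_date: date = field(default_factory=date.today)  # resolved at call time
    number_of_guests: int = 1
    status_filter: str = "Open"


request = GuestToolInput(guest_email="beth.gibbs@email.com")
```

Using `default_factory` matters here: a plain `date.today()` default would be evaluated once when the class is defined, not each time a request arrives.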

6. Document Prerequisites and Error Modes

Tool descriptions should guide the agent toward success:

Check-in tool: "Validates guest/reservation prerequisites, checks room vacancy, executes state transitions. Returns booking and room details or error codes (404: guest/reservation not found, 409: multiple reservations or room unavailable)."

When the agent knows the failure modes upfront, it can handle them gracefully or ask clarifying questions before attempting the operation.
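A sketch of how the documented error codes might surface in a structured result is shown below. `ToolResult` and `check_in_result` are hypothetical; the codes and messages follow the check-in description above.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class ToolResult:
    ok: bool
    code: int
    message: str
    data: Optional[dict] = None


def check_in_result(reservations: List[dict], room_available: bool) -> ToolResult:
    """Map prerequisite checks onto the error codes the tool description promises."""
    if not reservations:
        return ToolResult(False, 404, "guest/reservation not found")
    if len(reservations) > 1:
        return ToolResult(False, 409, "multiple reservations found")
    if not room_available:
        return ToolResult(False, 409, "room unavailable")
    return ToolResult(True, 200, "checked in", reservations[0])
```

A 409 with a clear message gives the agent enough to ask the user which reservation they meant instead of guessing.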

7. Support Partial Updates with Clear Semantics

Update operations should be easy to reason about:

{
  "external_id": "required",
  "check_in_date": "optional - only changes if provided",
  "room_number": "optional - only changes if provided",
  "guest_email": "optional - only changes if provided"
}

"Only provide fields to change—rest preserved" is much simpler than forcing the agent to read-modify-write.

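A minimal sketch of these semantics, assuming a flat record and a fixed set of updatable fields (both hypothetical):

```python
from typing import Any, Dict

# Hypothetical whitelist of fields the tool allows callers to change.
UPDATABLE_FIELDS = {"check_in_date", "room_number", "guest_email"}


def apply_partial_update(record: Dict[str, Any], changes: Dict[str, Any]) -> Dict[str, Any]:
    """Return a copy of record with only the provided fields changed."""
    unknown = set(changes) - UPDATABLE_FIELDS
    if unknown:
        raise ValueError(f"unsupported fields: {sorted(unknown)}")
    return {**record, **changes}


booking = {"external_id": "BK-1001", "room_number": "302", "guest_email": "beth.gibbs@email.com"}
updated = apply_partial_update(booking, {"room_number": "402"})
```

Only `room_number` changes in `updated`; the original `booking` dict is left untouched.
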
8. Create Defensive Composition Helpers

Some operations need prerequisites. Rather than forcing the agent to check-then-create:

- create_contact_if_not_found(email, first_name, last_name)

This helper is idempotent and can be safely called by orchestration tools to ensure prerequisites exist.

9. Design for Natural Language Patterns

Listen to how people actually talk:

"Check in Beth Gibbs" → check_in_guest
"Room 302's toilet is broken" → submit_maintenance_request
"Move the booking to room 402" → manage_bookings

Tool names and parameters should match the language users naturally employ.

The Architecture Behind Composable Tools

These nine patterns emerge from a single architectural principle: let LLMs handle intent, let backends handle execution.

LLMs are probabilistic systems, optimized for understanding human communication. Backends are deterministic systems, optimized for reliable state management and transactional consistency. When you blur this boundary—asking LLMs to orchestrate multi-step operations or programming backends to parse natural language—you end up with systems that are neither reliable nor intelligent.

The patterns above show what this separation of concerns looks like in practice:

Patterns 1, 5, 9 (Business identifiers, smart defaults, natural language alignment)

→ Let the LLM work with human concepts. Push system-level details to the backend.

Patterns 2, 3, 6 (Idempotency, atomic transitions, error modes)

→ Backend guarantees reliability. LLM doesn't need to reason about retries or failure recovery.

Patterns 4, 7, 8 (Authorization scope, partial updates, defensive helpers)

→ Tool interfaces encode business rules. Backend validates and enforces constraints.

The architectural payoff is concrete:

When backends handle orchestration (good design):

One implementation, tested and proven
Transactional consistency guaranteed
Observable state transitions
Reusable across interfaces (web, mobile, MCP)

When LLMs handle orchestration (poor design):

Logic scattered across conversations
Non-deterministic coordination
Opaque failures (hard to debug)
Context bloat (e.g. 50+ tools, 6+ calls per task)

Real-World Impact

While we built the Dewy Resort application, we iteratively replaced direct API calls and API tool wrappers with our skills-based architectural design. Below are a few of the benchmarks we captured along the journey.

Before composable design:

Average response time: 8-12 seconds
Success rate: 73%
Number of tools: 47
Average tool calls per interaction: 6.2
User feedback: "It works, but it's slow and sometimes gets confused"

After composable design:

Average response time: 2-4 seconds
Success rate: 94%
Number of tools: 12
Average tool calls per interaction: 1.8
User feedback: "It just works"

The difference isn't in the LLM. It's in the architecture.

Implementation Checklist for Enterprise MCP Tool Design

When designing your MCP tools for production systems, ask yourself:

Identity & Resolution

[ ] Do tools accept business identifiers (email, name, number)?
[ ] Does the backend handle ID resolution?

Safety & Reliability

[ ] Do creation tools require idempotency tokens?
[ ] Are state transitions atomic?
[ ] Are prerequisites validated before operations?

Authorization & Access

[ ] Do tools encode authorization scope in their interface?
[ ] Are search tools scoped to appropriate contexts?

Cognitive Load

[ ] Do tools provide sensible defaults?
[ ] Are tool names aligned with natural language?
[ ] Do descriptions document error modes?

Flexibility

[ ] Do update operations support partial updates?
[ ] Can agents modify relationships using business identifiers?

The Broader Pattern

This isn't just about designing hotel management systems. These patterns apply anywhere you're building AI agents that interact with enterprise systems and processes.

Healthcare: "Schedule a follow-up for this patient" should orchestrate appointment booking, notification, and record updates—not expose 15 scheduling APIs.

Finance: "File this expense report" should handle validation, approval routing, and accounting entries—not force the agent to understand your ERP's state machine.

Retail: "Process this return" should coordinate inventory, refunds, and customer notifications—not expose raw warehouse and payment APIs.

The question is always the same: *Are you designing tools around user intent, or around API operations?*

Conclusion

Enterprise MCP gives you the foundation for tool interoperability. Composable skills-based design is how you build something useful on that foundation.

The protocol won't save you from bad architecture. But good architecture—tools composed around user intent, with complexity pushed to governed backends—transforms MCP from a technical curiosity into a production-grade system.

Stop wrapping APIs. Start composing skills.

Your users will thank you. Your agents will thank you. (Ok, your agents probably won't.) But your operations team will definitely thank you.

What's your experience with MCP tool design? I'd love to hear what patterns you're discovering. Drop a comment or reach out on LinkedIn—the more we share these patterns, the faster we'll all build better AI systems.

This post builds on Beyond Basic MCP: Why Enterprise AI Needs Composable Architecture, where I explored the architectural principles that make MCP useful in production.

Beyond Basic MCP: Why Enterprise AI Needs Composable Architecture 🧩

Zayne Turner — Tue, 16 Dec 2025 22:34:19 +0000

The Model Context Protocol (MCP) has arrived as a promising standard for connecting AI agents to external tools and systems. But enterprise MCP implementations face a critical gap between what the protocol provides and what real-world production applications actually need.

After working with teams implementing MCP in production environments, we've discovered that the most common architectural pattern—wrapping APIs directly as tools—creates more problems than it solves. Here's what we've learned about building MCP architectures that actually work.

The Promise vs. The Reality

At its core, MCP is elegantly simple. It standardizes how LLM-powered applications discover and invoke tools, with servers exposing capabilities through a clean tools/list and tools/call interface. The protocol focuses on optimizing communication between clients and servers, enabling industry-wide interoperability.
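For orientation, a tools/call request is a JSON-RPC 2.0 message that carries the tool name and its arguments. The sketch below builds one in Python; the helper name `build_tools_call`, the tool name, and the arguments are illustrative assumptions, not part of the protocol.

```python
import json


def build_tools_call(request_id: int, tool_name: str, arguments: dict) -> str:
    """Serialize a JSON-RPC 2.0 request that invokes an MCP tool."""
    return json.dumps({
        "jsonrpc": "2.0",
        "id": request_id,
        "method": "tools/call",
        "params": {"name": tool_name, "arguments": arguments},
    })


# Hypothetical tool name and arguments.
message = build_tools_call(
    1, "process_guest_checkout", {"guest_email": "beth.gibbs@email.com"}
)
```

Everything that makes the call safe to execute (who is allowed to make it, what happens on retry, what gets logged) sits outside this message.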

But here's the catch: making MCP useful happens entirely outside the protocol's scope.

The protocol itself says nothing about authentication, authorization, logging, retry logic, business rules, or governance. These aren't nice-to-haves. They're the foundation of any production system. The hard work of making MCP enterprise-ready falls squarely on your shoulders.

This focused approach is intentional and beneficial. Recent developments like the MCP Apps extension demonstrate how the ecosystem can evolve through standardized extensions rather than bloating the core protocol. But each extension only solves specific problems—the architectural challenges of composing complex business processes remain yours to solve.

What Makes MCP "Enterprise"?

When we talk about enterprise MCP, we're specifically referring to implementations that handle:

Multi-system orchestration (e.g. Salesforce, Stripe, ERPs, not just single APIs)
Production AI agents rarely interact with just one system. They orchestrate workflows across CRMs, payment processors, communication platforms, and legacy databases—each with its own authentication, rate limits, and failure modes.

Production-grade reliability (SLAs, monitoring, disaster recovery)
Unlike experimental demos, production systems need guaranteed uptime, transaction-level observability, automatic retry logic, and graceful degradation when external systems fail.

Governance and compliance (audit trails, access control, data residency)
Enterprise environments require detailed logs of who accessed what data when, fine-grained permission boundaries that respect organizational hierarchies, and data handling that complies with regulations like GDPR or HIPAA.

Dynamic, high-scale workloads (thousands of users, millions of operations)
What works for 10 concurrent users can break at 1,000. Production MCP architectures must handle variable load, peak traffic, and geographic distribution without manual intervention.

Real integration complexity (legacy systems, custom connectors, data transformation)
Real enterprises don't have pristine REST APIs. They have 20+ year old SOAP services, mainframe integrations, custom file formats, and data models that evolved over decades.

This isn't about enterprise vs. startup—it's about the architectural challenges that emerge when MCP connects to critical business systems rather than experimental APIs. The patterns in this post apply whether you're a three-person team or a Fortune 500 company, as long as you're building something people depend on.

Why Naïve Implementations Fail

The most intuitive approach to implementing MCP is to wrap your existing APIs as tools. After all, you already have APIs that do what you need, so why not expose them directly to the LLM?

This is what I call a "naïve MCP architecture," and it breaks down quickly in real-world scenarios.

Consider an employee trying to file an expense report through an AI assistant. In a naïve implementation, the MCP server exposes tools that mirror the underlying API: createParentWithChild, batch_createParentWithChild, async_JobStatus, async_GetReport_byId. The agent must orchestrate these low-level operations, manage async job polling, handle errors, and navigate complex dependencies.

The problem isn't technical—it's experiential. Enterprise SaaS APIs are complex, granular, and verbose by design. They weren't built for conversational interfaces or multi-step business processes. When you couple these APIs directly with an LLM, you create overwhelmed agents and underwhelming user experiences.

Rethinking Separation of Concerns

The solution starts with reframing what each part of your system should handle.

Think about the fundamental difference between LLM capabilities and traditional software: LLMs are non-deterministic, while your backend systems need to be deterministic.

LLMs excel at non-deterministic tasks (high-entropy operations):

Understanding user intent and input formatting
Selecting the right tool for the job
Predicting likely next steps in a workflow
Handling ambiguity and natural language variation

Backends excel at deterministic tasks (low-entropy operations):

Controlling system access and permissions
Managing read/write operations reliably
Handling batching, retries, and error propagation
Enforcing business rules and state transitions

This isn't just a philosophical distinction—it has practical implications. When you ask an LLM to orchestrate deterministic operations (like ensuring atomic database transactions or managing retry logic), you're using a probabilistic tool for work that requires guarantees. When you ask your backend to parse natural language or select from dozens of ambiguous options, you're fighting against what makes traditional software reliable.

A well-designed MCP architecture should leverage these complementary strengths. Instead of exposing raw API operations and forcing LLMs into deterministic orchestration, you compose them into higher-level skills aligned with actual user jobs-to-be-done, letting each system operate in its sweet spot.

Composable Architecture in Action

Let's look at a concrete example: hotel operations at a fictional property called Dewy Resort.

When a front desk employee says, "Help me check out a guest," that simple request triggers a complex business process spanning multiple systems. In the background, you might need to:

Validate the reservation and room status
Process payment through a financial gateway
Update room availability in the property management system
Trigger housekeeping workflows
Log the transaction for compliance

In a naïve architecture, the agent would need to orchestrate all of this, calling a dozen different API endpoints in the correct sequence while handling errors at each step.

In a composable architecture, the MCP server exposes a single tool: Process guest checkout. The tool accepts high-level parameters (guest email, booking ID) and the backend handles all the orchestration, including atomic automation jobs, business-ruled retry logic, cross-app authentication, and error handling.
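As a rough sketch of what sits behind that single tool, the backend could sequence named steps and report exactly how far it got. This is an assumption for illustration, not the Dewy Resort implementation; `process_guest_checkout` here takes its steps as a parameter so the example stays self-contained, and the step names are hypothetical.

```python
from typing import Callable, Dict, List


def process_guest_checkout(
    guest_email: str, steps: Dict[str, Callable[[str], None]]
) -> dict:
    """Run checkout steps in order and report which ones completed."""
    completed: List[str] = []
    for name, step in steps.items():  # dicts preserve insertion order
        try:
            step(guest_email)
        except Exception as exc:
            return {
                "status": "failed",
                "failed_step": name,
                "completed": completed,
                "error": str(exc),
            }
        completed.append(name)
    return {"status": "ok", "completed": completed}
```

Returning the list of completed steps keeps failures observable: the backend, not the agent, knows what still needs retrying or reversing.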

The MCP layer becomes a collection of composed skills:

Create contact if not found
Submit maintenance request
Check in guest
Process guest checkout
Submit guest service request

Each skill abstracts significant backend complexity while exposing a clean, intent-aligned interface to the LLM.

The Real-World Impact

This architectural shift has profound implications:

For users: Interactions feel natural because tools align with their mental model of tasks, not with system architecture.

For agents: Decision-making stays focused on understanding intent and selecting appropriate actions, rather than managing technical minutiae.

For developers: Business logic and integrations are centralized in governed, reusable workflows that can be composed into multiple MCP skills.

For enterprises: You gain observability, governance, and the ability to enforce business rules without bloating your agent's context.

Three Principles for Composable Enterprise MCP Design

If you're building MCP implementations for production systems, keep these principles in mind:

1. The protocol scope is focused—and that's okay.
Don't expect MCP to solve authentication, orchestration, or governance. These are your responsibility, and architectural decisions you make outside the protocol matter more than the protocol itself.

2. Real-world systems are intricate.
Your MCP architecture must handle this complexity somewhere. The question is whether you push it into the agent's runtime context or abstract it in the backend.

3. Optimal tools follow the shape of user needs, not APIs.
Design your tool interfaces around jobs-to-be-done, then compose whatever backend complexity is required to fulfill them reliably.

See It In Action

Want to explore these principles hands-on? We've open-sourced the complete Dewy Resort sample application that demonstrates compositional MCP architecture in a real-world hospitality context.

The repository includes:

Implementation of composed MCP skills* (guest checkout, maintenance requests, room management)
Backend orchestration showing how to abstract API complexity
Integration patterns across multiple systems (Salesforce, Stripe, IoT devices**)
Architectural documentation for each skill (orchestration and atomic level execution)

Explore the code: github.com/workato-devs/dewy-resort

Whether you're building your first MCP implementation or refactoring an existing one, the patterns in this sample app can accelerate your learning and help you avoid common pitfalls.

Moving Forward

As MCP adoption grows, the temptation will be to treat it as a simple API gateway. Resist this urge. The most successful implementations will be those that thoughtfully separate concerns, abstract complexity, and design tools that genuinely serve human needs.

The protocol gives us interoperability. The architecture we build on top determines whether we deliver underwhelming API wrappers or genuinely useful AI experiences.

The future of MCP in the enterprise is ours to build. What path will you choose?

* Final MCP server activation requires configuration (based on CLI functionality - see setup instructions)

** IoT skills are planned for future app releases