In my previous post, I discussed how the biggest gap in enterprise MCP implementations isn't the protocol itself—it's the architectural decisions around it. Specifically, how teams treat MCP as an "API gateway for LLMs" when they should be thinking about composable tool design.
Today, I want to show you what composable, skills-based tool design actually looks like in practice.
Hotel Operations Case Study
Let's start with a real scenario from a hotel management system. A front desk employee says: "Beth Gibbs is checking out, and she says the toilet in her room is broken."
This simple interaction requires:
- Processing the checkout (payment, receipts, room status)
- Filing a maintenance request (with room context intact)
- Updating inventory and availability
- Routing the request to the right maintenance team
How would you design MCP tools for this?
The Naïve Approach
Many (if not most) teams start by exposing existing APIs as MCP tools:
- get_guest_by_email
- get_booking_by_guest
- get_room_by_booking
- create_payment_intent
- charge_payment_method
- send_receipt_email
- update_booking_status
- update_room_status
- create_case
- assign_case_to_contact
- set_case_priority
An agent now has to orchestrate 11+ API calls in the correct sequence, handle potential failures at each step, and maintain state throughout. The result? Slow, error-prone, and TERRIBLE user experiences.
The Compositional Approach
What if, instead, we designed tools around user intent? The calls could look something like:
- process_guest_checkout
- submit_maintenance_request
Two tools. One natural conversation. The complexity hasn't disappeared—it's just moved to where it belongs.
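To make the contrast concrete, here is a minimal sketch of what the two intent-level tools might accept. The field names and defaults are illustrative assumptions for this post, not the actual Dewy Resort schema:

```python
from dataclasses import dataclass

# Illustrative request shapes for the two intent-level tools.
# Field names are assumptions for this sketch, not a published schema.

@dataclass
class CheckoutRequest:
    idempotency_token: str     # lets the backend de-duplicate retries (pattern 2 below)
    guest_email: str           # business identifier; backend resolves the contact (pattern 1)
    send_receipt: bool = True  # smart default (pattern 5)

@dataclass
class MaintenanceRequest:
    idempotency_token: str
    room_number: str           # "302", not an internal room_id
    description: str           # "Toilet broken in room 302"
    priority: str = "Normal"

def process_guest_checkout(request: CheckoutRequest) -> dict:
    """Backend orchestrates payment, receipt, booking/room status, and
    inventory updates; the agent makes one call and gets one result."""
    raise NotImplementedError  # orchestration lives behind the tool boundary

def submit_maintenance_request(request: MaintenanceRequest) -> dict:
    """Backend creates the case with room context intact and routes it to
    the right maintenance team."""
    raise NotImplementedError
```

Everything the naïve approach asked the agent to sequence now happens behind these two signatures.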
Nine Patterns for Composable, Skills-Based Tool Design
After implementing production MCP systems, here are the patterns that separate elegant architectures from fragile ones:
1. Accept Business Identifiers, Not System IDs
Bad:
```json
{
  "contact_id": "003Dn00000QX9fKIAT",
  "booking_id": "a0G8d000002kQoFEAU",
  "room_id": "a0I8d000001pRmXEAU"
}
```
Good:
```json
{
  "guest_email": "beth.gibbs@email.com",
  "room_number": "302"
}
```
Let the backend resolve human-readable identifiers to internal IDs. The agent shouldn't need to know your database schema.
This applies to all tool parameters—not just the primary entity. When updating relationships (like reassigning a case to a different room or changing the guest on a booking), continue using business identifiers:
```json
{
  "idempotency_token": "550e8400-e29b-41d4-a716-446655440000",
  "room_number": "402",                   // backend resolves to room_id
  "guest_email": "new.guest@example.com"  // backend resolves to contact_id
}
```
The agent should never need to call get_room_by_number or get_guest_by_email just to obtain IDs for another operation. Every tool parameter should use business identifiers, and the backend handles all ID resolution internally.
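As a rough sketch of where that resolution can live, assuming hypothetical CRM and property-data clients, a thin resolver layer keeps internal IDs entirely on the backend side of the tool boundary:

```python
class IdentifierResolver:
    """Hypothetical resolver layer: tools accept business identifiers,
    and this is the only place internal IDs ever appear."""

    def __init__(self, crm, property_db):
        self.crm = crm                  # assumed CRM client
        self.property_db = property_db  # assumed room/booking data client

    def contact_id_for(self, guest_email: str) -> str:
        contact = self.crm.find_contact_by_email(guest_email)
        if contact is None:
            raise LookupError(f"No guest found for {guest_email}")  # surfaces as a 404-style tool error
        return contact.id

    def room_id_for(self, room_number: str) -> str:
        room = self.property_db.find_room_by_number(room_number)
        if room is None:
            raise LookupError(f"No room numbered {room_number}")
        return room.id

def reassign_case_room(resolver: IdentifierResolver, cases, case_number: str, room_number: str) -> None:
    """Tool handler: the agent passes '302'; only the backend sees room_id."""
    room_id = resolver.room_id_for(room_number)
    cases.update(case_number=case_number, room_id=room_id)  # assumed repository call
```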
2. Build Idempotency Into Tool Design
Every tool that creates or modifies resources should accept an idempotency token:
```json
{
  "idempotency_token": "550e8400-e29b-41d4-a716-446655440000",
  "guest_email": "beth.gibbs@email.com",
  "description": "Toilet broken in room 302"
}
```
When the agent retries (and it will), the backend recognizes the duplicate request and returns the original result. This is a backend responsibility, not an agent responsibility.
Added benefit: For multi-system operations (like a checkout process spanning payment processing and CRM updates), idempotency tokens enable saga pattern orchestration. For example: if a payment succeeds but a CRM update fails, the backend can use the relevant transaction token to coordinate compensating transactions (like refunding the payment) without agent involvement.
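Here is a minimal sketch of both ideas, assuming an injected key-value store and hypothetical payments/CRM clients; it shows the shape of the backend responsibility, not a production implementation:

```python
import json

class IdempotentExecutor:
    """Replays the stored result when the same idempotency token arrives twice."""

    def __init__(self, store):
        self.store = store  # assumed key-value store with get/set

    def run(self, idempotency_token: str, operation):
        cached = self.store.get(idempotency_token)
        if cached is not None:
            return json.loads(cached)       # duplicate retry: return the original result
        result = operation()                # first execution
        self.store.set(idempotency_token, json.dumps(result))
        return result

def process_guest_checkout(executor, payments, crm, token: str, guest_email: str) -> dict:
    """Saga-style orchestration: if the CRM update fails after payment,
    the backend compensates (refunds) without involving the agent."""
    def checkout():
        charge = payments.charge_for_stay(guest_email)  # assumed payments client
        try:
            crm.mark_checked_out(guest_email)           # assumed CRM client
        except Exception:
            payments.refund(charge["id"])               # compensating transaction
            raise
        return {"status": "checked_out", "charge_id": charge["id"]}

    return executor.run(token, checkout)
```

The agent can retry process_guest_checkout as often as it likes; every retry after the first success returns the stored result instead of charging the guest twice.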
3. Coordinate State Transitions Atomically
When a guest checks in, multiple things must happen together:
- Booking status: Reserved → Checked In
- Room status: Available → Occupied
- Opportunity stage: Pending → Active
These shouldn't be three separate tools the agent must coordinate. One tool (check_in_guest) should orchestrate the entire state transition atomically.
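Sketched in Python, assuming a transactional database session and repository-style accessors (all names here are hypothetical):

```python
def check_in_guest(db, guest_email: str, room_number: str) -> dict:
    """One tool, one transaction: either every status changes or none do.
    The db object and its repositories are assumed for this sketch."""
    with db.transaction():  # assumed transactional context manager
        booking = db.bookings.find_reserved(guest_email)
        room = db.rooms.find_available(room_number)
        if booking is None or room is None:
            raise ValueError("No reservation found or room unavailable")  # maps to a 404/409 tool error

        db.bookings.set_status(booking.id, "Checked In")
        db.rooms.set_status(room.id, "Occupied")
        db.opportunities.set_stage(booking.opportunity_id, "Active")

    return {"booking_id": booking.id, "room_number": room_number, "status": "Checked In"}
```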
4. Embed Authorization in Tool Design
Instead of:
- search_all_cases
- search_all_rooms
- search_all_bookings
Design tools with appropriate scope:
- search_cases_on_behalf_of_guest(guest_email)
- search_rooms_on_behalf_of_guest(guest_email)
- search_rooms_on_behalf_of_staff(floor_filter, status_filter)
The tool interface itself encodes who can see what. Authorization becomes declarative rather than imperative.
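A small sketch of what that scoping looks like in code, with a hypothetical repository API; the point is that the guest-scoped tool physically cannot query outside the guest's own records:

```python
from typing import Optional

def search_cases_on_behalf_of_guest(cases, guest_email: str, status: str = "Open"):
    """Guest-scoped tool: the guest filter is baked into the interface,
    so the agent cannot widen the query beyond the guest's own cases."""
    return cases.query(contact_email=guest_email, status=status)  # assumed repository API

def search_rooms_on_behalf_of_staff(rooms, floor: Optional[int] = None, status: Optional[str] = None):
    """Staff-scoped tool: broader visibility, but still bounded by the
    filters this interface chooses to expose."""
    return rooms.query(floor=floor, status=status)
```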
5. Provide Smart Defaults
Wherever possible, reduce the agent's cognitive load:
```json
{
  "guest_email": "required",
  "check_in_date": "defaults to today",
  "number_of_guests": "defaults to 1",
  "status_filter": "defaults to 'Open'"
}
```
Agents should only need to specify what's genuinely variable.
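In practice this is just default values in the tool schema or handler. A sketch with a hypothetical create_booking tool:

```python
from datetime import date
from typing import Optional

def create_booking(guest_email: str,
                   check_in_date: Optional[date] = None,
                   number_of_guests: int = 1) -> dict:
    """Only guest_email is genuinely variable; the rest are sensible
    defaults the agent can override when the user actually asks."""
    check_in_date = check_in_date or date.today()  # "defaults to today"
    return {
        "guest_email": guest_email,
        "check_in_date": check_in_date.isoformat(),
        "number_of_guests": number_of_guests,
    }
```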
6. Document Prerequisites and Error Modes
Tool descriptions should guide the agent toward success:
Check-in tool: "Validates guest/reservation prerequisites, checks room vacancy, executes state transitions. Returns booking and room details or error codes (404: guest/reservation not found, 409: multiple reservations or room unavailable)."
When the agent knows the failure modes upfront, it can handle them gracefully or ask clarifying questions before attempting the operation.
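One way to carry that guidance is in the tool definition itself. The sketch below uses the name/description/inputSchema shape of an MCP tool listing; the specific schema and wording are illustrative:

```python
# Illustrative tool definition: prerequisites and error codes live in the
# description, so the agent knows the failure modes before it calls.
CHECK_IN_GUEST_TOOL = {
    "name": "check_in_guest",
    "description": (
        "Validates guest/reservation prerequisites, checks room vacancy, "
        "and executes the check-in state transitions atomically. "
        "Errors: 404 guest or reservation not found; "
        "409 multiple open reservations or room unavailable."
    ),
    "inputSchema": {
        "type": "object",
        "properties": {
            "guest_email": {"type": "string", "description": "Guest email (business identifier)"},
            "room_number": {"type": "string", "description": "Optional room preference"},
        },
        "required": ["guest_email"],
    },
}
```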
7. Support Partial Updates with Clear Semantics
Update operations should be easy to reason about:
```json
{
  "external_id": "required",
  "check_in_date": "optional - only changes if provided",
  "room_number": "optional - only changes if provided",
  "guest_email": "optional - only changes if provided"
}
```
"Only provide fields to change—rest preserved" is much simpler than forcing the agent to read-modify-write.
8. Create Defensive Composition Helpers
Some operations need prerequisites. Rather than forcing the agent to check-then-create:
- create_contact_if_not_found(email, first_name, last_name)
This helper is idempotent and can be safely called by orchestration tools to ensure prerequisites exist.
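A minimal version of that helper, assuming a hypothetical CRM client with find/create methods:

```python
def create_contact_if_not_found(crm, email: str, first_name: str, last_name: str) -> str:
    """Idempotent prerequisite helper: returns the existing contact's id,
    or creates the contact and returns the new id. Safe for any
    orchestration tool to call before an operation that needs a contact."""
    existing = crm.find_contact_by_email(email)  # assumed CRM client method
    if existing is not None:
        return existing.id
    created = crm.create_contact(email=email, first_name=first_name, last_name=last_name)
    return created.id
```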
9. Design for Natural Language Patterns
Listen to how people actually talk:
- "Check in Beth Gibbs" →
check_in_guest - "Room 302's toilet is broken" →
submit_maintenance_request - "Move the booking to room 402" →
manage_bookings
Tool names and parameters should match the language users naturally employ.
The Architecture Behind Composable Tools
These nine patterns emerge from a single architectural principle: let LLMs handle intent, let backends handle execution.
LLMs are probabilistic systems, optimized for understanding human communication. Backends are deterministic systems, optimized for reliable state management and transactional consistency. When you blur this boundary—asking LLMs to orchestrate multi-step operations or programming backends to parse natural language—you end up with systems that are neither reliable nor intelligent.
The patterns above show what this separation of concerns looks like in practice:
Patterns 1, 5, 9 (Business identifiers and ID resolution, smart defaults, natural language alignment)
→ Let the LLM work with human concepts. Push system-level details to the backend.
Patterns 2, 3, 6 (Idempotency, atomic transitions, error modes)
→ Backend guarantees reliability. LLM doesn't need to reason about retries or failure recovery.
Patterns 4, 7, 8 (Authorization scope, partial updates, defensive helpers)
→ Tool interfaces encode business rules. Backend validates and enforces constraints.
The architectural payoff is concrete:
When backends handle orchestration (good design):
- One implementation, tested and proven
- Transactional consistency guaranteed
- Observable state transitions
- Reusable across interfaces (web, mobile, MCP)
When LLMs handle orchestration (poor design):
- Logic scattered across conversations
- Non-deterministic coordination
- Opaque failures (hard to debug)
- Context bloat (e.g. 50+ tools, 6+ calls per task)
Real-World Impact
While building the Dewy Resort application, we iteratively replaced direct API calls and API tool wrappers with this skills-based design. Below are a few of the benchmarks we captured along the way.
Before composable design:
- Average response time: 8-12 seconds
- Success rate: 73%
- Number of tools: 47
- Average tool calls per interaction: 6.2
- User feedback: "It works, but it's slow and sometimes gets confused"
After composable design:
- Average response time: 2-4 seconds
- Success rate: 94%
- Number of tools: 12
- Average tool calls per interaction: 1.8
- User feedback: "It just works"
The difference isn't in the LLM. It's in the architecture.
Implementation Checklist for Enterprise MCP Tool Design
When designing your MCP tools for production systems, ask yourself:
Identity & Resolution
- [ ] Do tools accept business identifiers (email, name, number)?
- [ ] Does the backend handle ID resolution?
Safety & Reliability
- [ ] Do creation tools require idempotency tokens?
- [ ] Are state transitions atomic?
- [ ] Are prerequisites validated before operations?
Authorization & Access
- [ ] Do tools encode authorization scope in their interface?
- [ ] Are search tools scoped to appropriate contexts?
Cognitive Load
- [ ] Do tools provide sensible defaults?
- [ ] Are tool names aligned with natural language?
- [ ] Do descriptions document error modes?
Flexibility
- [ ] Do update operations support partial updates?
- [ ] Can agents modify relationships using business identifiers?
The Broader Pattern
This isn't just about designing hotel management systems. These patterns apply anywhere you're building AI agents that interact with enterprise systems and processes.
Healthcare: "Schedule a follow-up for this patient" should orchestrate appointment booking, notification, and record updates—not expose 15 scheduling APIs.
Finance: "File this expense report" should handle validation, approval routing, and accounting entries—not force the agent to understand your ERP's state machine.
Retail: "Process this return" should coordinate inventory, refunds, and customer notifications—not expose raw warehouse and payment APIs.
The question is always the same: *Are you designing tools around user intent, or around API operations?*
Conclusion
Enterprise MCP gives you the foundation for tool interoperability. Composable skills-based design is how you build something useful on that foundation.
The protocol won't save you from bad architecture. But good architecture—tools composed around user intent, with complexity pushed to governed backends—transforms MCP from a technical curiosity into a production-grade system.
Stop wrapping APIs. Start composing skills.
Your users will thank you. Your agents will thank you. (Ok, your agents probably won't.) But your operations team will definitely thank you.
What's your experience with MCP tool design? I'd love to hear what patterns you're discovering. Drop a comment or reach out on LinkedIn—the more we share these patterns, the faster we'll all build better AI systems.
This post builds on Beyond Basic MCP: Why Enterprise AI Needs Composable Architecture, where I explored the architectural principles that make MCP useful in production.

Top comments (1)
This is excellent 🔥 the “stop wrapping APIs, start composing skills” framing matches what we’ve seen too. One pattern I’d add from production MCP connectors: treat each “skill tool” like a mini-transaction boundary, and make it observable end-to-end.
A couple concrete things that helped:
- Return a stable `operation_id` (and maybe `steps_executed`) from tools like `process_guest_checkout` so the host/UI can show progress and ops can trace failures without digging through model transcripts.
- Standardize error shapes across tools (e.g., `code`, `message`, `retryable`, `user_action`, `correlation_id`) so the agent can reliably ask the right follow-up question vs. blindly retrying.
- For multi-system "skills," we've had good results with idempotency tokens + saga-style compensation handled server-side (as you noted), and an explicit `dry_run`/`validate_only` mode for high-risk actions.

Curious how you think about "capability negotiation" between host/server here. Do you version tools per skill, or expose a capabilities endpoint so hosts can degrade gracefully as the skill set evolves?
(We’ve been applying a lot of these patterns at Axite, building production MCP integrations; happy to share more examples if helpful.)