The Model Context Protocol (MCP) is often described as a “USB-C port for AI applications” – a standardised way to connect AI models to different data sources and tools. Its promise is to replace a tangle of bespoke integrations and RAG implementations with a single, consistent interface. In simpler terms, before MCP an AI assistant needed a custom pipeline for each external system (Slack, Google Drive, GitHub, etc.). With MCP, the assistant can plug into all of them through one unified protocol, significantly reducing complexity. This open standard (introduced by Anthropic in late 2024) lets developers expose their data or services via MCP servers and build AI applications (MCP clients) that talk to those servers.
At first glance, creating MCP servers seems straightforward. The spec defines clear roles like Tools, Resources, and Prompts. There are even utilities to auto-generate MCP servers from an API spec. However, the brilliance of MCP lies in its subtlety – it provides many “control surfaces” that you can tweak for optimal results. In practice, mastering MCP is much more than wrapping existing APIs. It’s about fine-grained control and strategic optimisation to get the best out of your LLM-agent integration. In this post, we’ll explore several nuanced aspects of MCP development – from simplifying parameter schemas and designing better tool descriptions, to handling pagination, context limits, model quirks, and composing multiple MCPs. The goal is to show how and why building a great MCP integration is easy to pick up but hard to truly master.
Why MCP Is Not Just Another API Wrapper
Yes, an MCP server is essentially a wrapper that exposes external system capabilities in a standardised way. But simply auto-wrapping an API without thoughtful design means missing out on a wealth of optimisation opportunities. Naively dumping an entire OpenAPI spec into an MCP generator might functionally work, but you risk losing important subtleties and control. The MCP ecosystem even has tools like Orval, which primarily generates client libraries from OpenAPI definitions, but also includes a mode for scaffolding MCP integrations. While convenient, such one-to-one conversions typically just relay existing endpoint definitions and descriptions straight through to the LLM. This “as-is” approach often fails to account for the unique needs of models and agents, resulting in suboptimal, brittle, and overly complex interactions.
To truly leverage an MCP server, you need to think of it as an adapter layer between the AI and your system – not a dumb pipe. This layer gives you the power to shape how the model perceives and uses your tools. By being intentional with these facets, you can make your MCP integration far more robust and efficient than a naive wrapper would be.
MCP Components at a Glance
To work effectively with MCP, it’s useful to understand its three core component types—Tools, Resources, and Prompts—which are the most widely used and supported today. There are additional component types defined in the specification, but they are not as widely implemented or adopted yet.
Tools: These are functions that the model can call directly. They represent actions or operations the AI can perform—like creating a new user, updating a record, or submitting a form.
Resources: These are data sources or endpoints whose content is made available to the model. Think of them as contextual inputs—like a current user profile, a document, or a configuration setting.
Prompts: These are predefined templates or workflows that a user or system can inject into a session to steer the model’s behaviour in a certain direction.
During the client-server handshake, the MCP server provides a list of available tools, resources, and prompts—each with a name and a description. These fields play a critical role in helping the model understand what capabilities are available and how to use them. It’s this discovery process that forms the basis for many of the optimisation strategies discussed throughout this post.
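To make these components concrete, here is a minimal sketch of a server that exposes one of each, using the official Python SDK's FastMCP helper. The server name, the HR-flavoured capabilities, and their behaviour are illustrative only; the point is that the names and docstrings are what the client receives during discovery.

```python
# Minimal MCP server sketch using the Python SDK's FastMCP helper.
# The server name and the example capabilities below are illustrative.
from mcp.server.fastmcp import FastMCP

mcp = FastMCP("example-hr-assistant")

@mcp.tool()
def create_leave_request(employee_email: str, start_date: str, end_date: str) -> str:
    """Create a leave request for an employee. Use when the user asks to book time off."""
    # A real server would call the underlying HR API here.
    return f"Leave request created for {employee_email} from {start_date} to {end_date}."

@mcp.resource("policy://leave")
def leave_policy() -> str:
    """The company's current leave policy, useful context for leave questions."""
    return "Employees receive 25 days of annual leave per year."

@mcp.prompt()
def summarise_leave(employee_email: str) -> str:
    """A reusable prompt the user can invoke to get a leave summary."""
    return f"Summarise the current leave balance and upcoming leave for {employee_email}."

if __name__ == "__main__":
    mcp.run()  # stdio transport by default; names and docstrings are surfaced during discovery
```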
Simplifying Parameter Schemas for LLMs
One common challenge in MCP server development is deciding how to present tool input parameters to the LLM. If your tool’s input schema is overly complex – for example, requiring a deeply nested JSON object or a long list of fields – the model might struggle to construct it correctly. In our project, we found that the JSON payload for a certain tool was “too large and too complicated” for the LLM to reliably fill, leading to frequent mistakes. A notable issue was with UUID fields – even when clearly told how to construct these, the model sometimes failed to generate valid UUIDs. The nondeterministic nature of LLMs means that if the function signature is confusing, the AI can easily produce invalid or erroneous inputs.
The strategy to mitigate this is parameter schema simplification. Streamline the inputs your tool expects: make them simpler and more intuitive, and remove or handle complex requirements like UUID generation where possible. Provide sensible defaults and optional fields to minimise how much the model needs to specify. This significantly reduces complexity for the LLM – effectively abstracting away the tricky parts and making it easier for the model to understand how to call the tool correctly. In practice, this might mean combining several required fields into one (if logically possible), eliminating rarely-used parameters, or breaking one complex tool into two simpler tools called in sequence. The guiding principle is to make the model’s job easier: it should only have to provide the minimum, most natural information to invoke the action.
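As a rough illustration, the sketch below (again using the Python SDK's FastMCP helper) hides UUID generation and a nested payload behind a flat, human-friendly signature. The ticketing API and its payload shape are hypothetical.

```python
# Sketch: simplifying a tool's input schema so the model never has to build
# nested JSON or invent UUIDs. The "tickets API" payload shape is hypothetical.
import uuid
from mcp.server.fastmcp import FastMCP

mcp = FastMCP("ticketing")

@mcp.tool()
def create_ticket(title: str, description: str, priority: str = "medium") -> str:
    """Create a support ticket. Priority is one of: low, medium, high."""
    # The server owns the awkward parts: ID generation and the nested payload
    # the upstream API actually expects.
    payload = {
        "ticket": {
            "id": str(uuid.uuid4()),  # generated here, never by the LLM
            "attributes": {
                "title": title,
                "body": description,
                "priority": {"level": priority},
            },
        }
    }
    # post_ticket(payload)  # hypothetical helper that sends the payload to the real API
    return f"Created ticket '{title}' with priority {priority}."
```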
Combining Multiple API Calls into One Tool
In traditional software design, we often strive for each function or API endpoint to do one thing well. However, when designing MCP server tools, sometimes it pays off to bundle a sequence of steps into one tool so that the LLM can accomplish a goal with a single function call. If the underlying task normally requires calling multiple endpoints in succession (for example, first retrieving an ID, then using that ID to get details, and finally updating something), you have a choice: either expose these as three separate tools and hope the model figures out the workflow, or encapsulate the entire workflow into one higher-level tool.
There’s a subtle trade-off here. Generally, you want to give the model building blocks (tools) that are as simple as possible. But if a process is too elaborate or error-prone for the model to plan reliably, it might be better to simplify and standardise the process as a single action. By wrapping the multi-step sequence into one tool, you ensure the steps happen correctly and in order, and the model only has to make one decision (“call this composite tool”) instead of coordinating several calls correctly.
A practical example could be a “Create user account” tool that not only calls the signup API but also calls subsequent endpoints to set user preferences and retrieve the new user id. To the LLM it’s just one atomic action – less cognitive load, fewer chances to mess up intermediate steps. This is a particularly good example because it's pulling together simple, atomic actions—things that would normally be done step-by-step via a UI—into one cohesive tool. There's minimal chaining of logic—we’re streamlining repetitive boilerplate steps into a single operation. It makes sense to retrieve the user ID after the user is created, but this raises a question—should this enhancement be handled within the MCP server logic, or should it lead to a change in the underlying API itself? Of course, overusing this strategy might limit flexibility (the model can’t call the sub-steps individually if it wanted to), so you need to judge case-by-case. The key is to wrap complexity inside the server when it makes things easier for the AI, even if it means your server code is doing more behind the scenes. This goes hand-in-hand with simplifying schemas: both are about reducing what the LLM has to manage directly.
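A sketch of what such a composite tool might look like, assuming a hypothetical REST API at api.example.com and using httpx for the upstream calls:

```python
# Sketch of a composite "create user account" tool that wraps two hypothetical
# upstream calls (signup, then default preferences) into one atomic action.
import httpx
from mcp.server.fastmcp import FastMCP

API = "https://api.example.com"  # hypothetical upstream API
mcp = FastMCP("accounts")

@mcp.tool()
def create_user_account(email: str, display_name: str, locale: str = "en-GB") -> str:
    """Create a user account, apply default preferences, and return the new user id."""
    with httpx.Client(base_url=API) as client:
        signup = client.post("/users", json={"email": email, "name": display_name})
        signup.raise_for_status()
        user_id = signup.json()["id"]  # the id the model would otherwise have to fetch itself
        # Second call the model never has to think about: default preferences.
        client.put(f"/users/{user_id}/preferences", json={"locale": locale}).raise_for_status()
        return f"Created user {user_id} for {email} with default preferences applied."
```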
Crafting Effective Tool and Resource Descriptions
Another powerful control surface in MCP is the set of names and descriptions you provide for each tool, resource, or prompt. During the discovery phase, the MCP server returns a list of capabilities along with descriptive text for each. These descriptions are essentially instructions or hints to the model about what each action or resource is for and how to use it. Writing good descriptions is an art in itself – it’s similar to prompt engineering, except now each description is embedded directly within the MCP specification, defined individually for each tool, resource, and prompt.
What makes a description “effective”? It should be clear, concise, and informative about the capability and its usage. For tools, a description might state: what the tool does, when to use it, any important constraints, usage rules, and even an example output if space permits. It can also help to highlight the critical workflow the tool supports, so the model understands its intended role. For resources, you might describe what data is available and how it can help the user. Providing such context can guide the model to use the tool appropriately. For instance, a tool that posts a Slack message might have a description saying
“Sends a message to a Slack channel. Use this to communicate insights or alerts. Do not use for retrieving information.” This tells the model when it’s relevant.
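In code, that guidance simply becomes the tool's description. A sketch using the Python SDK's FastMCP helper, which takes the docstring as the description returned during discovery (the Slack call itself is stubbed out):

```python
# Sketch: the description embedded as the tool's docstring, which FastMCP
# returns as the description during discovery. The Slack call is a stub.
from mcp.server.fastmcp import FastMCP

mcp = FastMCP("slack-notifier")

@mcp.tool()
def send_slack_message(channel: str, text: str) -> str:
    """Sends a message to a Slack channel.

    Use this to communicate insights or alerts.
    Do not use for retrieving information.
    """
    # post_to_slack(channel, text)  # hypothetical helper wrapping the Slack API
    return f"Message sent to {channel}."
```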
However, there’s a caveat: LLM compliance with descriptions is not guaranteed. Different models (and even different clients hosting those models) may treat these instructions with varying strictness. Sometimes an LLM will creatively ignore or reinterpret your tool guidelines – essentially treating them as suggestions rather than rules. This means descriptions alone won’t solve all misuse problems, but they are still crucial for nudging the AI’s behaviour. Over time, we expect AI agents to adhere more reliably to provided specs (as inference improves), but for now you should both use descriptions to your advantage and remain cautious. Test how different models respond to your descriptions. You might find you need to rephrase or simplify language for a less capable model, or that you can rely on more advanced models to follow complex instructions.
In summary, think of descriptions as part documentation, part guardrail. They are your chance to speak directly to the model about each facet’s purpose. Invest effort in them – a well-crafted description can prevent a lot of confusion and unintended usage during the AI’s reasoning process.
Error Handling
If an agent encounters an error during tool execution, it may retry the operation until it receives a successful outcome. For this reason, it’s essential that the error messages returned from your API are clear, accurate, and descriptive—so that the LLM has enough context to reason about what went wrong and adapt accordingly. Just returning HTTP status codes is not good enough.
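As a sketch of the difference this makes, the tool below catches an upstream failure and returns the reason in plain language rather than a bare status code. The CRM endpoint and field names are hypothetical, and the translation is shown at the MCP server layer only for brevity; as noted below, richer errors are often better fixed in the API itself.

```python
# Sketch: translating a bare upstream failure into an error message the model
# can act on. The upstream endpoint and payload are hypothetical.
import httpx
from mcp.server.fastmcp import FastMCP

mcp = FastMCP("crm")

@mcp.tool()
def update_contact(contact_id: str, email: str) -> str:
    """Update a CRM contact's email address."""
    try:
        resp = httpx.put(
            f"https://crm.example.com/contacts/{contact_id}",
            json={"email": email},
        )
        resp.raise_for_status()
    except httpx.HTTPStatusError as exc:
        # Return the reason, not just "400": the agent can correct itself before retrying.
        return (
            f"Update failed with status {exc.response.status_code}: {exc.response.text}. "
            "Check that the contact_id exists and the email address is valid."
        )
    return f"Contact {contact_id} updated."
```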
These improvements are typically best implemented on the API side rather than in the MCP server. Improving your API’s error responses benefits all users—not just LLMs—by making responses more human-readable and actionable.
Aligning Pagination with LLM Needs
Handling data pagination is a great example of a subtle detail that can trip up an MCP integration. The MCP spec itself standardises a cursor-based pagination model for any list operations (like listing resources or tools). Instead of page numbers and offsets, MCP uses an opaque string token (nextCursor) that the client can use to fetch the next chunk of results. This design is beneficial for many reasons (avoiding missing/skipping items, not assuming fixed page sizes, etc.), but it might not align with how your underlying API or data source paginates. Many REST APIs use offset & limit or page number schemes. As an MCP developer, you need to bridge that gap.
For example, imagine your server connects to an API that returns 100 results per page with a page query parameter. Simply exposing that as-is to an LLM (e.g., a tool parameter for page number) would likely be confusing and not very agentic – the model might not know when to increment the page or when it has all data. Instead, you’d implement cursor-based pagination in your MCP server: perhaps the first call fetches page 1 and you return those results along with a synthesized nextCursor token (which internally encodes “page 2”). The client (or the LLM via function calling loops) can then call again with that cursor to get the next batch.
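A sketch of that bridging logic, with a hypothetical orders API: the server encodes the upstream page number into an opaque cursor and chooses a batch size suited to conversational use rather than the upstream default.

```python
# Sketch: bridging a page-number API to MCP-style opaque cursors. The upstream
# endpoint is hypothetical; the cursor simply base64-encodes the next page number.
import base64
import json
import httpx
from mcp.server.fastmcp import FastMCP

mcp = FastMCP("orders")
PAGE_SIZE = 20  # tuned for LLM use, not the upstream's 100-per-page default

@mcp.tool()
def list_orders(cursor: str | None = None) -> dict:
    """List recent orders in batches of 20. Pass the returned nextCursor to get more."""
    page = json.loads(base64.b64decode(cursor))["page"] if cursor else 1
    resp = httpx.get(
        "https://api.example.com/orders",
        params={"page": page, "per_page": PAGE_SIZE},
    )
    resp.raise_for_status()
    items = resp.json()["orders"]
    next_cursor = None
    if len(items) == PAGE_SIZE:  # assume more pages while full pages keep coming
        next_cursor = base64.b64encode(json.dumps({"page": page + 1}).encode()).decode()
    return {"orders": items, "nextCursor": next_cursor}
```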
The tricky part is choosing page sizes and overall strategy that suit LLM usage. If pages are too large, you might overflow the model’s context or waste tokens sending a huge list when only a few items were needed. If too small, the model might have to make many calls (possibly hitting rate limits or slowing down responses). Finding the sweet spot may require tuning. In practice, consider how the data will be used in the conversation – if the user asks for “a summary of recent X”, maybe the first page is plenty and you don’t want the model blindly paginating through everything. Conversely, if the user explicitly requests “show all results”, you might allow the model to loop through nextCursor until exhaustion (or set a sensible cap).
The key is aligning your pagination strategy with the AI’s workflow and the needs of the data. Document in your tool description how pagination is supposed to work (“This resource returns results in batches of 20. Use the nextCursor to get more results.”). In the MCP client, ensure that the function-calling mechanism can indeed loop over nextCursor if needed. By consciously designing pagination and making it clear to the model, you prevent confusion like the model trying to use non-existent page indices or dumping excessive data into the conversation. It’s a fine balance that might require iteration based on real usage.
Managing Context Size and Scope
When an MCP client connects to a server, it typically performs a discovery to retrieve the full list of available tools, resources, and prompts along with their descriptions. All this information can be injected into the model’s context (for instance, as function definitions or system messages) so that the model knows what capabilities it has at its disposal. But as you add more and more tools to a single MCP server, the discovery payload grows – and so does the prompt context the model must carry. Too many tools can bloat the context and overwhelm smaller models.
In our MCP server we had 196 tools exposed, which we acknowledged as “probably…too many”. The LLM’s prompt became huge after discovery. While a more advanced model with a large context window handled it without issue, lower-tier models struggled. Each subsequent call the model made had to include that large JSON function spec, eating up tokens and potentially causing it to hit context limits or simply become confused.
There is a design tension here: you want enough tools to be useful – a certain critical mass of capabilities – but not so many that you dilute relevance and overload the model. Finding that balance is part of the MCP subtlety. It may involve hard decisions about scoping: Do you really need every single endpoint available to the AI, or just the key ones? Perhaps your initial development includes dozens of tools, but you later prune or reorganise them when you see which are actually used. Another tactic is to dynamically load tools – while MCP’s spec doesn’t currently support partial discovery out of the box, a server could theoretically filter tools and allow for conditional discovery based on authorisation scopes or environment configurations.
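As a sketch of that last tactic, the server below only registers its write tool when the deployment's configuration grants a "write" scope, so read-only contexts see a smaller discovery payload. The environment variable, scope names, and tools are illustrative.

```python
# Sketch: trimming the discovery payload by only registering tools the current
# deployment actually needs. Scope names and tools are illustrative.
import os
from mcp.server.fastmcp import FastMCP

ENABLED_SCOPES = set(os.environ.get("MCP_SCOPES", "read").split(","))
mcp = FastMCP("project-tracker")

@mcp.tool()
def list_projects() -> list[str]:
    """List project names."""
    return ["apollo", "hermes"]  # stub for the real backend call

if "write" in ENABLED_SCOPES:
    # Only surfaced to the model when the deployment grants write access,
    # keeping read-only contexts smaller and safer.
    @mcp.tool()
    def archive_project(name: str) -> str:
        """Archive a project. Irreversible; confirm with the user first."""
        return f"Project {name} archived."
```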
You might also consider splitting your tools across multiple MCP servers, but this strategy has limits. Because each server still contributes its tools to the overall discovery context, splitting into sub-servers primarily helps when you can selectively enable or disable subsets of functionality depending on context or user role. It's less about reducing overall context load and more about having finer-grained control over what the user allows to be presented to the model.
In summary, pay attention to the size of the context your MCP is generating. If you target powerful models like GPT-4 or Claude 4 with huge context windows, you can lean toward convenience (lots of tools in one). But if you aim to support local models or earlier generation models with limited context, lean toward minimalism. Trim any fat – overly verbose descriptions, rarely-used endpoints – to keep the prompt as tight as possible. Your LLM (and your token budget) will thank you.
Navigating Host-Specific Behaviours and Prompt Usage
So far we’ve discussed differences on the model side, but there are also variations on the host side to be aware of. "Host" here means the application or agent that hosts the model and the MCP client(s) that connect to the MCP server(s) – examples include Claude Desktop, Cursor, Windsurf, VS Code, and custom agent frameworks. Each client may present the MCP’s capabilities to the model (and user) in different ways. This can affect how certain features like Resources and Prompts are used.
For instance, Prompts in MCP are defined as user-controlled parameterised templates or preset interactions. In Anthropic’s Claude Desktop (one of the flagship MCP hosts), these appear as what some call "desktop prompts" – essentially pre-built prompts that a user can manually select to perform a task or query on some data. The current behaviour in Claude Desktop is that these prompts are not automatically invoked by the AI; instead, the user has to drag them into the conversation or trigger them via the UI. The LLM itself won’t decide to use a prompt template mid-conversation – it’s up to the user or client interface to include this in context. Some MCP clients might allow more dynamic usage of prompts – for example, detecting when a model might benefit from a particular prompt and then suggesting or automatically including it. This can serve as an alternative to embedding complex process guidance within tool descriptions, offering a cleaner and more reusable way to steer model behaviour.
Resources introduce a different kind of complexity. Although conceptually simpler—they provide contextual data to the model—they rely heavily on client behaviour. In Claude Desktop, for instance, resources are only included in the conversation when manually dragged into context by the user. Additionally, parameterised resources—a feature defined in the MCP spec—are currently not supported in Claude Desktop and do not appear in its user interface at all.
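For reference, this is roughly what a parameterised prompt and a templated resource look like on the server side (illustrative names and data; how, or whether, they surface to the user is entirely up to the host client):

```python
# Sketch: a parameterised prompt and a templated (parameterised) resource.
# Whether and how these appear to the user depends on the host client.
from mcp.server.fastmcp import FastMCP

mcp = FastMCP("support-desk")

@mcp.prompt()
def triage_ticket(ticket_id: str) -> str:
    """A preset the user can invoke to triage a specific ticket."""
    return f"Review ticket {ticket_id}, classify its severity, and draft a first response."

@mcp.resource("tickets://{ticket_id}")
def ticket(ticket_id: str) -> str:
    """The raw content of a ticket, for the client to attach as context."""
    return f"(contents of ticket {ticket_id})"  # a real server would fetch this from the system
```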
The key is to design with the client in mind. In environments like Claude Desktop, prompts aren’t triggered automatically and can be hard for users to find, so using tools often makes more sense—they can be invoked directly by the model. Still, prompts and resources can be worthwhile additions. Just make sure users understand how and when to use them. Even without automation, good user guidance helps unlock their value. And when the host client does support automated invocation—like triggering prompts based on context or dynamically injecting resources—these components become especially effective. As more clients gain these capabilities, prompts and resources will become even more powerful tools in your MCP design.
In short, know your host and client audience. The best MCP integration for a desktop assistant might differ from one for a headless agent. Adjust which features you lean on (tools vs resources vs prompts) and how you describe them, so that you play nicely with the client’s interaction model.
Tuning MCP to Specific Use Cases
A recurring theme in all these points is that context matters – both the technical context (model/client) and the use case context. MCP gives you a lot of flexibility to create custom integrations that serve very particular needs. Two different MCP servers might both wrap, say, a Project Management API, but one could be tuned for a software engineering assistant and another for a sales assistant. They might expose different subsets of functionality or phrase things differently. This is a strength of MCP: it’s not one-size-fits-all, and you, as the developer, have the opportunity (and responsibility) to tailor it.
When planning an MCP integration, start by considering the end goal: What will the AI + this tool be used for? If it’s a general-purpose connector (like a generic database query tool), you’ll want to include broad capabilities. But if it’s a very focused assistant (maybe an AI that helps schedule meetings via a calendar API), you might only need a handful of highly optimised tools. MCP makes it easy to tailor integrations for the needs of specific users. Remove endpoints that don’t add value for the target use case. Consolidate or augment ones that do. For example, if an API has 50 endpoints but your HR assistant really only needs 5 of them, you can provide just those 5 as MCP tools – perhaps even consolidating several into a single, higher-level one, as discussed earlier – and maybe add one or two custom prompts that reflect common HR queries.
You should also consider the critical paths in your use case. Identify the likely sequence of actions the AI will need to take and optimise those. For instance, in a bug triage assistant for GitHub, the common flow might be: list open issues -> read an issue -> maybe comment or label it. You’d ensure those tools/resources (“list_issues”, “get_issue”, “comment_on_issue”, etc.) are well-tuned (simple schemas, good descriptions, tested thoroughly), whereas less frequent actions (closing an issue, adding a collaborator) could be lower priority or even omitted initially. By tailoring to the use case, you reduce clutter for the model and deliver a better experience for the user.
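A sketch of that deliberately small tool surface for the triage example, with stubbed GitHub calls and descriptions that point the model along the critical path:

```python
# Sketch: a minimal tool surface for a GitHub bug-triage assistant. Only the
# critical path is exposed; tool names and backing calls are illustrative stubs.
from mcp.server.fastmcp import FastMCP

mcp = FastMCP("github-triage")

@mcp.tool()
def list_issues(label: str | None = None) -> list[dict]:
    """List open issues, optionally filtered by label. Start here when triaging."""
    return []  # stub: would call the GitHub issues API

@mcp.tool()
def get_issue(number: int) -> dict:
    """Fetch one issue's title, body, and comments. Use after list_issues."""
    return {}  # stub

@mcp.tool()
def comment_on_issue(number: int, body: str) -> str:
    """Post a triage comment on an issue. Keep comments short and actionable."""
    return f"Commented on #{number}."  # stub
```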
If your MCP server is too limited in functionality, the benefits of connecting it to an LLM quickly diminish. The power of MCP lies in giving the model meaningful choices and flexibility. If there’s not enough scope for the AI to reason across options or take varied actions, then the setup becomes more like a fixed workflow than a dynamic integration. In such cases, it may be more practical to implement a simpler, fixed-path solution instead of using MCP at all.
In essence, use-case specific tuning is about being strategic: choose and design your MCP server components based on what’s actually needed, not just what the underlying API offers. This focus will naturally help with many of the earlier points like context size management (fewer, more relevant tools) and clarity (the model isn’t distracted by irrelevant options). It’s part of why MCP is “easy to learn, hard to master” – the protocol itself is generic, but making an excellent integration requires understanding the domain and iterating on what works best.
Composing Multiple MCPs for Greater Capability
One powerful feature of MCP is that a single host can connect to multiple servers at once. The host runs one client per server connection, so a host app (like a chat tool) can access many different MCP servers in parallel—each offering different tools, resources, or prompts. This lets the AI use capabilities from different domains (e.g. messaging, files, CRM) in the same session without needing to bundle everything into one server, composing multiple MCP servers to dramatically expand the AI’s toolkit. However, doing so adds another layer of subtlety to manage.
Imagine you have separate MCP servers for different domains: one for your internal knowledge base, one for an external CRM, and one for a coding assistant. In theory, your AI could use all of these in one session – that’s incredibly powerful, but also potentially overwhelming. The model now has an even larger combined set of tools and resources, and it must choose which server’s tool to call for each need. Simply throwing three servers’ worth of capabilities at the model can reintroduce the context bloat and confusion we cautioned against earlier.
The key to doing it successfully is again thoughtful composition. Some tips when using multiple MCP servers:
Group by related functionality: Avoid having two servers that overlap heavily in purpose. This can lead to redundant tools and confusion for the model. If overlap is unavoidable, consider disabling or hiding one set of tools.
Be mindful of naming: When tools across servers have similar names or functions, clarify their purpose in the descriptions or use namespaced naming (e.g., sales_lookup_contact vs dev_lookup_function); see the sketch after this list. Since the model sees one merged list, ambiguity can lead to errors, like mixing up a CRM “create_record” with a DevOps one.
Test combined usage: Multiple servers can introduce conflicting expectations. A prompt from one might assume it’s the only one present, while another might assert a different role or priority. This can confuse the model. Test your setup with all servers active to check how the model responds. A system note like “You have tools from different domains—use whichever fits the task” can help set expectations and improve coherence.
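As a small sketch of the naming tip above, here are two focused servers whose tool names carry a domain prefix, so the merged list the model sees stays unambiguous. The servers would normally live in separate files and run as separate processes; the domains and stubbed data are illustrative.

```python
# Sketch: two focused servers with namespaced tool names. In practice each
# would be its own process that the host connects to separately.
from mcp.server.fastmcp import FastMCP

# sales_server.py
sales = FastMCP("sales")

@sales.tool()
def sales_lookup_contact(email: str) -> dict:
    """Look up a CRM contact by email. Use for customer and deal questions."""
    return {"email": email, "owner": "alice"}  # stub for the real CRM call

# dev_server.py
dev = FastMCP("dev")

@dev.tool()
def dev_lookup_function(name: str) -> str:
    """Look up a function definition in the codebase. Use for engineering questions."""
    return f"def {name}(...): ..."  # stub for the real code-search call
```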
Composing MCPs is somewhat akin to microservices architecture in software – smaller, focused servers that together provide a wide range of features. It offers modularity (you can develop and maintain each MCP independently, maybe even different teams for different services) and flexibility (users can mix and match which servers to connect). But just like microservices, it introduces complexity in orchestration. Expect to refine how you integrate multiple MCPs, especially as the number grows. There might even be cases where you decide to merge some servers after all, or vice versa, to achieve a better balance.
In any case, the ability to connect multiple MCP servers gives a glimpse of how agentic AI systems can scale – by plugging into many specialised “skills” on demand. As developers, we are still learning best practices for this, but it’s clear that a careful, strategy-driven approach is needed to get the most out of it without confusing the AI.
Conclusion: Easy to Learn, Hard to Master
Mastering MCP is a bit like learning SQL or playing chess—you can get started quickly, but real expertise takes time and experience. The basics are easy enough to pick up – you can get a simple MCP server running in minutes – but the depth and nuance can take much longer to fully grasp. This is actually a good thing: it means MCP, as a tool, has rich capabilities that reward the effort you put into refining your integration.
To recap, mastering MCP involves attention to many details: designing simple and robust parameter schemas, giving the LLM clear guidance through descriptions, cleverly wrapping underlying logic to help (not hinder) the AI, handling pagination and large data in an LLM-friendly way, managing the scope of what you expose to avoid overload, adapting to the strengths and weaknesses of your target model, considering how different hosts will use the protocol, tuning everything to the specific user scenario, and finally orchestrating multiple MCP servers if needed. Each of these gives you a small but important lever to adjust, and together they define the agent’s experience.
For AI developers, the journey of building with MCP can be incredibly rewarding. You’re essentially shaping how an AI interacts with the world of data and services. When done well, the AI feels like a seamless extension of those services – using them efficiently and intelligently. Done poorly, you might see the AI stumble, misuse tools, or get bogged down by the very integrations that were meant to help it.
In closing, MCP truly opens up a world of possibilities for AI applications by bridging them with external tools and data. It’s a young technology, and best practices are still evolving – which makes it an exciting space to work in. By understanding the subtle aspects discussed above, you can go beyond the basics and build MCP integrations that are not only functional, but optimised and resilient. Easy to pick up, hard to master – yes – but that just means there’s a lot of opportunity for those willing to go the extra mile.