Stephen Phillips

Posted on Jul 3

Tool soup is the first real MCP problem. A dynamic proxy is one way out

#ai #agents #automation #mcp

MCP is useful because it gives agents hands.

That is also where the trouble starts.

A toy agent with three tools is easy to reason about. Search the web, read a file, write a note. Fine. A business agent is different. The moment you connect email, CRM, documents, analytics, Stripe, WordPress, calendars, spreadsheets and customer records, you no longer have an "AI assistant". You have a small operating system with a language model at the centre.

The current MCP conversation has a pattern: everyone gets excited about how many tools agents can reach, then a few days later they start asking why the agent chose the wrong one.

That is tool soup.

What MCP actually gives you

The official MCP tools spec says servers can expose tools that language models invoke. Those tools can query databases, call APIs or perform computations. Each tool has a unique name and a schema describing its inputs, so the client can discover it and call it in a structured way.
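To make "a name and a schema" concrete, here is a rough sketch of what one tool descriptor can look like, written as a plain Python dict. The `name`, `description` and `inputSchema` fields follow the shape the spec describes; the tool itself (`crm_lookup_customer`) and the `missing_required` helper are made up for illustration.

```python
import json

# Hypothetical tool descriptor. The tool name and its single argument are
# invented for this example; only the overall shape mirrors the MCP spec.
lookup_customer_tool = {
    "name": "crm_lookup_customer",
    "description": "Look up a single customer record by email address.",
    "inputSchema": {
        "type": "object",
        "properties": {
            "email": {"type": "string", "description": "Customer email address"},
        },
        "required": ["email"],
    },
}


def missing_required(tool, arguments):
    """Return the required argument names that a proposed call leaves out."""
    required = tool["inputSchema"].get("required", [])
    return [name for name in required if name not in arguments]


print(json.dumps(lookup_customer_tool, indent=2))
print(missing_required(lookup_customer_tool, {}))
```

The schema is what lets a client reject a malformed call before it ever reaches the CRM. It says nothing about whether the call was a good idea.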

That is a big improvement over bespoke glue code. You do not want every agent framework to invent its own way to talk to Gmail, Postgres, Shopify or WordPress.

But discovery is not judgment. The fact that a model can see a tool does not mean it should see that tool for every task.

For a small business, this distinction matters. The owner does not care that the agent has 80 integrations. They care that the invoice went to the right customer, the refund was not issued twice, and the marketing email did not mention the wrong product.

The failure mode

Tool soup usually shows up in boring ways:

The agent spends too long deciding what to call.
It picks a broad tool when a narrow one would do.
It retrieves irrelevant context and pollutes the conversation.
It can technically update live systems, so every mistake feels expensive.
Debugging gets weird because the prompt, the tool schema and the external system all affect the outcome.

This is not a reason to avoid MCP. It is a reason to treat tool exposure as product design.

The official spec also says applications should make clear which tools are exposed, show when tools are invoked, and ask the user to confirm operations so that a human stays in the loop. That is the right instinct. If an agent can change business state, the user needs a veto.
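A veto can be as small as one function that sits between the agent and the tool. The sketch below is an assumption about how you might wire it, not something taken from the spec: `call_with_approval`, the `changes_state` flag and the `approve` callback are all hypothetical names.

```python
def call_with_approval(tool_name, arguments, changes_state, execute, approve):
    """Run a tool call, asking for approval first if it can change state.

    execute: callable that performs the real call and returns its result.
    approve: callable that returns True only if the user said yes.
    """
    if changes_state and not approve(tool_name, arguments):
        # The user said no, so the external system is never touched.
        return {"status": "denied", "tool": tool_name}
    return {"status": "ok", "tool": tool_name, "result": execute(arguments)}


# Demo with a fake email tool and a user who declines.
sent = []


def send_email(arguments):
    sent.append(arguments["to"])
    return "sent"


denied = call_with_approval(
    "send_email", {"to": "customer@example.com"}, True, send_email,
    approve=lambda name, args: False,
)
print(denied, sent)
```

In a real client the `approve` callback would be a confirmation prompt in the UI. Read-only calls skip it, which keeps the approval step from turning into noise.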

The wrong answer: give the agent everything

The lazy version of MCP is to connect every server at startup and hope the model figures it out.

That feels powerful in a demo. It is rough in a real workflow.

Tool schemas cost context. Similar tools compete with each other. A CRM search tool, a database query tool, a spreadsheet reader and a general browser tool may all look plausible for the same task. The model has to reason about the work and about the tool menu at the same time.
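You can get a feel for the context cost with a back-of-envelope estimate. The sketch below serialises tool descriptors to JSON and divides the character count by four. That ratio is only a rough rule of thumb, not a real tokenizer, and the placeholder tools are invented, so treat the output as an order of magnitude at best.

```python
import json


def estimate_schema_tokens(tools, chars_per_token=4):
    """Very rough token estimate for a list of tool descriptors.

    Assumes about four characters per token. Real tokenizers will differ.
    """
    serialised = json.dumps(tools, separators=(",", ":"))
    return len(serialised) // chars_per_token


def fake_tool(index):
    """Build a placeholder descriptor so the example is self-contained."""
    return {
        "name": f"tool_{index}",
        "description": "Placeholder description of what this tool does.",
        "inputSchema": {
            "type": "object",
            "properties": {"query": {"type": "string"}},
            "required": ["query"],
        },
    }


five = estimate_schema_tokens([fake_tool(i) for i in range(5)])
eighty = estimate_schema_tokens([fake_tool(i) for i in range(80)])
print(f"5 tools: ~{five} tokens, 80 tools: ~{eighty} tokens")
```

Real schemas are usually longer than these placeholders, so the gap between five tools and eighty tends to be wider in practice.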

That is like asking a plumber to carry the entire workshop into every bathroom. More tools do not mean more focus.

A better mental model: a rotating tool belt

The better pattern is not "one agent sees every tool".

It is closer to a rotating tool belt.

The agent starts with a small set of proxy tools. It tells the proxy what it is working on: the tech stack, the project context, the task. The proxy activates the few MCP servers that match the job. When the task changes, the belt changes.

That is the idea behind Dynamic MCP Proxy, a HappyMonkeyAI project we have been using as a practical answer to tool soup.

The proxy keeps the initial surface small. The README describes a proxy_handshake({ tech_stack, task_description }) flow: the agent gives context, the matcher scores catalogue entries, the top servers are activated lazily, and a tool budget is enforced with LRU eviction.

In plain English: the agent does not carry the whole toolbox. It asks for the tools needed for the current job.
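Here is a minimal sketch of that flow: score catalogue entries against the handshake, activate the best matches, and evict the least recently used server when the tool budget would be exceeded. This is not the project's actual code. The keyword-overlap scoring, the `CatalogueEntry` fields and the `ToolBelt` class are my own assumptions, chosen to keep the example short.

```python
from collections import OrderedDict
from dataclasses import dataclass


@dataclass(frozen=True)
class CatalogueEntry:
    name: str
    keywords: frozenset
    tool_count: int  # how many tools this server would expose


def score(entry, tech_stack, task_description):
    """Count keyword overlap. Naive on purpose: no stemming, no punctuation handling."""
    words = {item.lower() for item in tech_stack} | set(task_description.lower().split())
    return len(entry.keywords & words)


class ToolBelt:
    def __init__(self, catalogue, tool_budget):
        self.catalogue = catalogue
        self.tool_budget = tool_budget
        self.active = OrderedDict()  # oldest first, most recently used last

    def tools_in_use(self):
        return sum(entry.tool_count for entry in self.active.values())

    def activate(self, entry):
        if entry.name in self.active:
            self.active.move_to_end(entry.name)  # mark as recently used
            return
        if entry.tool_count > self.tool_budget:
            return  # can never fit, so do not evict anything for it
        while self.tools_in_use() + entry.tool_count > self.tool_budget:
            self.active.popitem(last=False)  # evict least recently used
        self.active[entry.name] = entry

    def handshake(self, tech_stack, task_description, top_n=2):
        scored = [(score(e, tech_stack, task_description), e) for e in self.catalogue]
        matches = [pair for pair in scored if pair[0] > 0]
        matches.sort(key=lambda pair: pair[0], reverse=True)
        # Activate in reverse so the best match ends up most recently used.
        for _, entry in reversed(matches[:top_n]):
            self.activate(entry)
        return list(self.active)


catalogue = [
    CatalogueEntry("wordpress", frozenset({"wordpress", "blog", "post"}), 20),
    CatalogueEntry("docker", frozenset({"docker", "deploy", "container"}), 25),
    CatalogueEntry("crm", frozenset({"crm", "lead", "customer"}), 15),
]
belt = ToolBelt(catalogue, tool_budget=50)
first = belt.handshake(["wordpress"], "draft a blog post")
second = belt.handshake(["docker"], "deploy the container")
third = belt.handshake([], "reply to a new crm lead")
print(first, second, third)
```

In the third handshake the CRM server would push the belt to 60 tools, so the oldest server (WordPress) is dropped to get back under 50. A real matcher would want something better than word overlap, but the budget logic is the part worth copying.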

Why a proxy layer helps

A dynamic proxy does a few useful things at once.

First, it keeps the active tool count under control. DynamicMCPProxy is designed around a tool budget, with the README calling out a 50-tool limit for Google Antigravity-style environments. The exact number matters less than the principle: the model should not be staring at a giant menu when the task needs five tools.

Second, it makes tool loading contextual. A WordPress content task and a Docker deployment task should not expose the same tools. A support workflow should not need infrastructure tools. A finance workflow should not need image-generation tools.

Third, it lets private tools live in a catalogue. Public MCP servers can sit in catalogue.json; personal or company-specific servers can sit in a gitignored overlay. That is important for real businesses because their useful tools are often private: internal APIs, local scripts, CRM wrappers, document stores, reporting endpoints.
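The overlay idea is simple to sketch: load the public catalogue, then let a private file add or override entries. Only `catalogue.json` comes from the project description above. The overlay filename `catalogue.local.json`, the entry shape and the `load_catalogue` helper are assumptions for illustration.

```python
import json
import tempfile
from pathlib import Path


def load_catalogue(public_path, overlay_path):
    """Merge a public catalogue with an optional private overlay.

    Both files are assumed to be JSON objects keyed by server name.
    Overlay entries win when a name appears in both.
    """
    merged = json.loads(public_path.read_text(encoding="utf-8"))
    if overlay_path.exists():
        overlay = json.loads(overlay_path.read_text(encoding="utf-8"))
        merged.update(overlay)
    return merged


# Demo using temporary files so the example runs anywhere.
with tempfile.TemporaryDirectory() as tmp:
    public = Path(tmp) / "catalogue.json"
    private = Path(tmp) / "catalogue.local.json"  # hypothetical, would be gitignored
    public.write_text(json.dumps({"wordpress": {"command": "wp-mcp"}}), encoding="utf-8")
    private.write_text(json.dumps({"internal-crm": {"command": "crm-mcp"}}), encoding="utf-8")
    catalogue = load_catalogue(public, private)

print(sorted(catalogue))
```

Because the overlay is optional, the same proxy config works on a fresh clone and on a machine that has the company-specific servers.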

Fourth, it gives the agent a way to evolve its toolset during long jobs. If the task moves from planning to deployment, the proxy can activate different servers instead of forcing the first context window to carry every possible future tool.

A small business example

Imagine a local agency using agents for lead handling.

Without a proxy, the agent might start with access to:

Email
CRM
WordPress
Google Sheets
Analytics
Search
Calendar
Stripe
Filesystem
Internal notes
Social scheduling
Project management

That sounds impressive until the agent has to decide what to do with a single enquiry.

With a dynamic proxy, the first handshake might activate only:

Inbox reader
CRM lookup
Service notes
Draft reply writer
Follow-up task creator

That is enough. The agent can read the lead, identify the likely service, draft a reply, update the CRM with a summary, and wait for approval.

If the user then says, "turn this into a blog post too", the proxy can rotate the belt: WordPress drafts, keyword research, brand notes, image brief. The support tools can disappear.

That is a cleaner operating model.

Routing beats dumping

The production pattern is routing, not dumping.

A coordinator receives the user request. It decides which small capability set is relevant. Then the worker agent gets only those tools.

For example:

Content request: WordPress drafts, keyword research, image brief, internal brand docs.
Finance request: invoices, payment records, customer ledger, draft-only email.
Support request: inbox, orders, refund policy, draft response.
Reporting request: analytics, CRM exports, spreadsheet writer.

Each worker is simpler. Each approval boundary is clearer. The logs make more sense.
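The routing table above can start life as a plain dictionary. In this sketch the capability names are lifted from the list, while `CAPABILITY_SETS` and `tools_for` are hypothetical names. How the coordinator decides the request type (rules, a classifier, a model call) is left out on purpose.

```python
# Request type -> the only tools a worker for that request gets to see.
CAPABILITY_SETS = {
    "content": ["wordpress_drafts", "keyword_research", "image_brief", "brand_docs"],
    "finance": ["invoices", "payment_records", "customer_ledger", "draft_only_email"],
    "support": ["inbox", "orders", "refund_policy", "draft_response"],
    "reporting": ["analytics", "crm_exports", "spreadsheet_writer"],
}


def tools_for(request_type):
    """Return the capability set for a request type.

    Unknown types get no tools at all, so a misrouted request fails closed
    instead of falling back to everything.
    """
    return list(CAPABILITY_SETS.get(request_type, []))


print(tools_for("support"))
print(tools_for("something_else"))
```

Returning an empty list for unknown request types is a design choice: a confused coordinator should produce a worker that can do nothing, not one that can do everything.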

This also helps with cost and latency. If you inject every available tool into every conversation, you burn context before the model has even understood the job.

Observability is not optional

The interesting trend is not just MCP itself. It is MCP plus routing, logging and observability.

Pydantic Logfire now has MCP server docs, which points in the right direction: agents should be able to inspect traces and failures instead of guessing. DynamicMCPProxy also exposes proxy health, active servers, usage counts and metrics. That kind of surface matters because dynamic tool loading should not be invisible magic.

You want to know:

Which servers were active?
Why were they selected?
Which tool did the agent call?
What arguments did it pass?
What came back?
Did the user approve the action?
What changed afterwards?

If you cannot answer those questions, you do not have automation. You have vibes with API keys.
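One way to make those questions answerable is to write one structured record per tool call, with a field for each question. The `ToolCallTrace` dataclass and its field names below are my own sketch, not the format used by Logfire or the proxy.

```python
import json
from dataclasses import asdict, dataclass, field
from datetime import datetime, timezone
from typing import Optional


def _now():
    return datetime.now(timezone.utc).isoformat()


@dataclass
class ToolCallTrace:
    active_servers: list        # Which servers were active?
    selection_reason: str       # Why were they selected?
    tool_name: str              # Which tool did the agent call?
    arguments: dict             # What arguments did it pass?
    result_summary: str         # What came back?
    approved: Optional[bool]    # Did the user approve? None means no approval was needed.
    state_change: Optional[str] # What changed afterwards? None means nothing.
    timestamp: str = field(default_factory=_now)


def to_log_line(trace):
    """Serialise a trace as one JSON line, ready for an append-only log."""
    return json.dumps(asdict(trace), sort_keys=True)


trace = ToolCallTrace(
    active_servers=["inbox", "crm"],
    selection_reason="handshake matched 'lead' and 'crm'",
    tool_name="crm_update_note",
    arguments={"customer_id": "example-123", "note": "Asked about pricing"},
    result_summary="note saved",
    approved=True,
    state_change="CRM note added",
)
print(to_log_line(trace))
```

One JSON line per call is enough to replay a bad afternoon with `grep`. You can move to a proper tracing backend later without changing what you record.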

The small business version

A small business does not need a giant autonomous agent. It needs a few dull workflows that save staff time without creating new mess.

Good first workflows:

Turn contact-form submissions into drafted replies.
Summarise sales calls and update CRM notes.
Pull weekly analytics and explain what changed.
Draft WordPress posts from approved research notes.
Check overdue invoices and draft polite reminders.

Bad first workflows:

Let the agent send emails without review.
Let it issue refunds before the policy is encoded.
Give it unrestricted filesystem and database access.
Connect every SaaS account because the demo looked good.

Start narrow. Keep approval visible. Log everything.

A practical checklist

Before adding an MCP tool or server to a business agent, ask:

Is this tool read-only or can it change state?
If it changes state, who approves the call?
Is there a narrower version of the tool?
Does the agent need this tool for this workflow, or is it just convenient?
Should this be always-on, or loaded only after a task handshake?
Can we replay the trace when something goes wrong?
Can we disable the tool quickly?
Does the user understand what the tool can touch?
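Part of this checklist can be written down as data and checked automatically. The sketch below covers only the questions a machine can answer (state change, approver, always-on, kill switch). `ToolPolicy` and `review` are hypothetical names, and the rules are one possible reading of the checklist.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class ToolPolicy:
    name: str
    read_only: bool           # False means the tool can change state
    approver: Optional[str]   # who signs off on calls, if anyone
    always_on: bool           # False means loaded only after a task handshake
    enabled: bool = True      # the quick kill switch


def review(policy):
    """Return a list of checklist problems for one tool policy."""
    problems = []
    if not policy.read_only and policy.approver is None:
        problems.append("state-changing tool has no approver")
    if not policy.read_only and policy.always_on:
        problems.append("state-changing tool is always-on; consider loading it per task")
    return problems


refund = ToolPolicy(name="issue_refund", read_only=False, approver=None, always_on=True)
invoices = ToolPolicy(name="read_invoices", read_only=True, approver=None, always_on=False)
print(review(refund))
print(review(invoices))
```

The remaining questions, such as whether a narrower tool exists or whether the user understands what it can touch, still need a person.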

Most agent failures are not dramatic. They are small mismatches between intent and tool access.

MCP makes business automation easier to build. It does not remove the need for boundaries. If anything, it makes boundaries more important, because the agent can finally do real work.

That is why the proxy pattern feels right to me. Not a giant agent with every tool under the sun, but a small coordinator with a rotating tool belt.

The agent should not carry the whole workshop. It should carry what the job needs.
