Lovanaut
Don't Build Your MCP Server as an API Wrapper

Anthropic recently published a useful post on building agents that reach production systems with MCP:

Building agents that reach production systems with MCP

The most important line for MCP server builders is not "build an MCP server." It is the design guidance underneath it:

Group tools around intent, not endpoints.

That distinction is easy to underestimate.

If you already have a REST API, the obvious first version of your MCP server is a thin wrapper around it:

```
list_responses
get_response
update_response
delete_response
export_responses
send_notification
```

That works for demos. It is not enough for production agents.

I've been building FORMLOVA, a form-operations product where users can create forms, review responses, classify sales pitches, run analytics, and trigger workflows through MCP clients. The hardest part has not been exposing database operations; it has been deciding what meaning the MCP layer should carry.

This post is a practical guide to that boundary.

The problem with endpoint-shaped tools

Suppose a user asks:

Show me this month's conversion rate, excluding sales pitches.

With endpoint-shaped tools, the agent has to do this:

1. list_responses
2. handle pagination
3. inspect spam_label
4. decide which labels to remove
5. filter by date range
6. aggregate the count
7. compute the metric
8. explain the result

That is a lot of domain logic to push into the model on every run.

The more production-shaped the workflow becomes, the worse this gets:

  • Which label means "sales"?
  • Should uncertain responses be removed too?
  • Should unclassified responses remain?
  • What happens if a human manually corrected a label?
  • Does the query need to respect soft-deleted rows?
  • Should the result be allowed to trigger a workflow?

If your MCP server does not answer those questions, the model has to reconstruct them from tool descriptions and prompt context. That is fragile.

The MCP server should not be just an HTTP client with tool schemas. It should carry the product's operational semantics.
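To make the contrast concrete, here is a sketch of what an intent-shaped alternative could look like: one call that owns the date scoping, the exclusion rule, and the arithmetic. The function name, types, and field names are illustrative assumptions, not FORMLOVA's actual API.

```typescript
// Illustrative types; the real schema may differ.
type SpamLabel = "legitimate" | "sales" | "suspicious" | null;

interface Response {
  submittedAt: string; // ISO timestamp
  converted: boolean;  // whatever "conversion" means for this form
  spamLabel: SpamLabel;
}

// One intent-shaped call: the server owns pagination, filtering,
// and the exclusion rule, so the model only supplies the intent.
function getConversionRate(
  responses: Response[],
  opts: { month: string; excludeSales: boolean }, // month as "YYYY-MM"
): number {
  const inScope = responses.filter((r) => {
    if (!r.submittedAt.startsWith(opts.month)) return false;
    // Product rule: only a confirmed "sales" label is excluded;
    // "suspicious" and unclassified responses stay visible.
    if (opts.excludeSales && r.spamLabel === "sales") return false;
    return true;
  });
  if (inScope.length === 0) return 0;
  return inScope.filter((r) => r.converted).length / inScope.length;
}
```

The agent's job shrinks from an eight-step plan to a single parameter choice.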

A small example: exclude_sales

FORMLOVA classifies incoming form responses into three labels:

```typescript
type SpamLabel = "legitimate" | "sales" | "suspicious";
```

The classifier is not the interesting part for this post. The MCP design is.

Several response and analytics tools accept an exclude_sales parameter:

```typescript
server.registerTool("get_responses", {
  inputSchema: {
    form_id: z.number().int(),
    limit: z.number().int().min(1).max(100).default(50),
    exclude_sales: z.boolean().default(false),
  },
});

server.registerTool("get_form_analytics", {
  inputSchema: {
    form_id: z.number().int(),
    exclude_sales: z.boolean().default(false),
  },
});
```

The implementation is deliberately boring:

```typescript
if (exclude_sales) {
  query = query.or("spam_label.is.null,spam_label.neq.sales");
}
```

That line encodes a product decision:

  • sales responses are excluded
  • suspicious responses remain visible
  • null responses remain visible

Why? Because uncertain and unclassified responses should not disappear silently. A real inquiry misclassified as sales is much more expensive than a sales pitch slipping through.

This is the kind of rule that belongs on the server, not in the model's working memory.
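The same asymmetry can be expressed as a plain predicate, which makes it easy to unit-test. This is a sketch of the rule in memory; in production it runs as the database filter shown above.

```typescript
type SpamLabel = "legitimate" | "sales" | "suspicious";

interface ResponseRow {
  id: number;
  spam_label: SpamLabel | null; // null = not yet classified
}

// Mirrors the database filter: only a confirmed "sales" label is
// dropped; "suspicious" and unclassified rows remain visible.
function applyExcludeSales(
  rows: ResponseRow[],
  excludeSales: boolean,
): ResponseRow[] {
  if (!excludeSales) return rows;
  return rows.filter((r) => r.spam_label === null || r.spam_label !== "sales");
}
```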

The user says:

Analyze responses without sales pitches.

The agent maps that to:

```json
{ "exclude_sales": true }
```

The server owns the domain rule.

That is the difference between an API wrapper and an intent-aware tool.

Labels should become operational state

A common mistake with AI classification features is to stop at the badge.

You run a classifier, store a label, and show it in the UI:

```typescript
type ResponseClassification = {
  spam_label: "legitimate" | "sales" | "suspicious" | null;
  spam_score: number | null;
};
```

That is useful, but incomplete.

In a production workflow, the label should become operational state:

```
legitimate  -> include in analytics, notify the team
sales       -> exclude from analytics, suppress routine notifications
suspicious  -> send to human review
```

FORMLOVA triggers workflows after classification:

```typescript
await executeWorkflows(formId, "response.classified", {
  form_id: formId,
  response_id: responseId,
  spam_score: spamResult.score,
  spam_label: spamResult.label,
});
```

Now the label is not just UI metadata. It is a condition for the next operation.

Example workflow shapes:

```
when response.classified
if spam_label == "legitimate"
then send Slack notification

when response.classified
if spam_label == "suspicious"
then ask a human to review

when response.classified
if spam_label == "sales"
then skip normal notifications
```

This is where the MCP layer starts to matter. The agent is not just reading rows. It is moving form responses through an operations pipeline.
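The three workflow shapes above boil down to a dispatch on the classified label. A minimal sketch, with action names that are illustrative rather than FORMLOVA's real workflow vocabulary:

```typescript
type SpamLabel = "legitimate" | "sales" | "suspicious";

// Hypothetical action names; a real workflow engine is richer.
type Action = "notify_slack" | "request_human_review" | "suppress_notifications";

function actionsFor(label: SpamLabel): Action[] {
  switch (label) {
    case "legitimate":
      return ["notify_slack"];           // real inquiry: tell the team
    case "suspicious":
      return ["request_human_review"];   // uncertain: a human decides
    case "sales":
      return ["suppress_notifications"]; // confirmed pitch: stay quiet
  }
}
```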

Manual override is part of the model

If an AI labels a legitimate inquiry as sales, the user must be able to fix it.

More importantly, the system must remember that a human fixed it.

FORMLOVA stores label source:

```typescript
type LabelSource = "auto" | "manual";
```

Automatic classification updates only rows that are still automatic or unclassified:

```typescript
await db
  .from("responses")
  .update({
    spam_label: spamResult.label,
    spam_score: spamResult.score,
    spam_label_source: "auto",
    spam_classified_at: new Date().toISOString(),
  })
  .eq("id", responseId)
  .or("spam_label_source.is.null,spam_label_source.eq.auto");
```

Manual correction flips the source:

```typescript
await db
  .from("responses")
  .update({
    spam_label: newLabel,
    spam_label_source: "manual",
    spam_classified_at: new Date().toISOString(),
  })
  .eq("id", responseId);
```

The MCP tool exposes this as part of response management:

```typescript
server.registerTool("update_response", {
  inputSchema: {
    response_id: z.number().int(),
    status: z.enum(["new", "in_progress", "resolved", "spam"]).optional(),
    notes: z.string().optional(),
    tags: z.array(z.string().max(50)).max(20).optional(),
    spam_label: z.enum(["legitimate", "sales", "suspicious"]).optional(),
  },
});
```

The user does not think in database terms:

This one is not sales. Mark it as legitimate.

The agent finds the response, calls update_response, and the server protects the human correction from future automatic runs.

This is another intent boundary. The user is not "updating a row." They are correcting the operational state of an inquiry.
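The invariant the two queries above enforce can be simulated in memory: an automatic run must never overwrite a manual label. A sketch of that rule, not the actual Supabase code:

```typescript
type LabelSource = "auto" | "manual" | null;

interface Row {
  spam_label: string | null;
  spam_label_source: LabelSource;
}

// Automatic classification only touches rows that are still
// unclassified or auto-labeled; manual corrections are preserved.
function applyAutoLabel(row: Row, label: string): Row {
  if (row.spam_label_source === "manual") return row; // human wins
  return { spam_label: label, spam_label_source: "auto" };
}

// Manual correction always wins and flips the source.
function applyManualLabel(row: Row, label: string): Row {
  return { spam_label: label, spam_label_source: "manual" };
}
```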

Blocking and classifying are different layers

For contact forms, it is tempting to ask:

If AI can detect sales pitches, why not block them automatically?

Because a false positive is too expensive.

Bot defenses belong before submission:

  • honeypot fields
  • Turnstile / reCAPTCHA
  • rate limiting
  • signed form tokens

Those stop mechanical abuse.

Human-written sales pitches are different. They may be annoying, but they are still real submitted content. If you silently drop one real customer inquiry because the model was wrong, the damage is not recoverable from the form layer.

So FORMLOVA classifies after arrival:

```
Before submission: block obvious bots
After submission: classify meaning
After classification: let the operator decide
```

This separation is important for MCP tool design.

Do not turn every classifier into an automatic blocker. Use classification as a state that downstream tools can act on.

Workflows need stronger confirmation than CRUD

Another subtle MCP design problem: some tools look harmless when called, but create future side effects.

Example: saving a workflow rule.

```typescript
server.registerTool("set_workflow", {
  inputSchema: {
    form_id: z.number().int(),
    name: z.string().min(1),
    trigger_type: z.enum([
      "response.created",
      "response.updated",
      "capacity.reached",
      "deadline.approaching",
      "response.classified",
    ]),
    conditions: z.array(conditionSchema).optional(),
    actions: z.array(actionSchema),
  },
});
```

The tool call itself only saves a rule.

But the rule may later send email, call a webhook, or update data automatically. That is a future external side effect.

Your MCP design should treat this differently from a normal "create row" operation. At minimum, the tool description should require the agent to summarize:

  • trigger
  • conditions
  • actions
  • external destinations

Then get confirmation before saving.

For high-risk operations, server-side confirmation is better than prompt-only confirmation. Prompt instructions are not a reliable safety boundary.
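One way to implement server-side confirmation is a two-step handshake: the first call stages the rule and returns a summary plus a token, and nothing is saved until the token is echoed back. This is an illustrative design sketch, not FORMLOVA's implementation, and every name in it is hypothetical.

```typescript
interface WorkflowDraft {
  name: string;
  trigger: string;
  actions: string[];
}

// Staged rules waiting for confirmation (in production: a store with TTL).
const pending = new Map<string, WorkflowDraft>();

// Step 1: stage the rule and return a human-readable summary.
// Nothing is saved yet; the agent must surface this to the user.
function prepareWorkflow(draft: WorkflowDraft): { token: string; summary: string } {
  // Illustrative token; use a cryptographic source in production.
  const token = Math.random().toString(36).slice(2);
  pending.set(token, draft);
  return {
    token,
    summary: `"${draft.name}" on ${draft.trigger} -> ${draft.actions.join(", ")}`,
  };
}

// Step 2: only a valid token (i.e. a completed step 1) can save.
function confirmWorkflow(token: string): WorkflowDraft {
  const draft = pending.get(token);
  if (!draft) throw new Error("no pending workflow for this token");
  pending.delete(token); // single use
  return draft; // in production: persist the rule here
}
```

Because the save path demands a token the agent can only get by first producing the summary, the confirmation step is enforced by the server rather than by prompt instructions.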

Chat is not the whole interface

Anthropic's post also talks about rich semantics: MCP Apps, elicitation, forms, dashboards, charts.

That matters because not every operation should be rendered as text.

In a form-ops product:

Good for chat:

```
Exclude sales pitches from this month's analysis.
Show only suspicious responses.
Mark this response as legitimate.
Notify the team only for non-sales responses.
```

Good for UI:

```
Response list
Classification distribution
Review queue
Analytics chart
Before-publish checklist
```

The boundary I use:

```
Chat: intent
MCP: meaning, constraints, execution
UI: inspection, comparison, correction
```

If your MCP server returns only text, everything becomes a transcript. That is not always the best user experience. Sometimes the right tool result is a dashboard, a chart, or a form asking for missing input.

MCP and skills should not be collapsed

Anthropic also frames MCP and Skills as complementary:

  • MCP gives access to tools and data
  • Skills teach the agent how to use those tools to do real work

That distinction is useful.

In FORMLOVA, MCP can expose the ability to:

  • list responses
  • exclude sales
  • update labels
  • create workflows
  • run analytics

But "how to run a webinar registration workflow" is procedural knowledge:

```
send confirmation email
send reminder before the event
collect post-event feedback
route low ratings to follow-up
```

That is not just an API surface. It is a playbook.

If you put all playbook knowledge into tool descriptions, your MCP surface becomes heavy and brittle. A cleaner split is:

```
MCP      = capabilities
Workflow = repeatable product-side automation
Skill    = procedural knowledge for using the capabilities
```

This is where MCP gets more valuable over time: the same remote server can be used by more clients and more playbooks without changing the underlying product API every time.

A checklist for MCP server design

Before publishing a production MCP server, ask:

  1. Are my tools endpoint-shaped or intent-shaped?

If the model always has to stitch five primitives together, consider whether the server should expose a higher-level intent.

  2. Which domain rules are currently living in prompts?

Move stable rules into server behavior. Prompts are instructions. Server logic is enforcement.

  3. Do labels and statuses affect downstream operations?

If a label matters, make it usable in filters, analytics, exports, and workflows.

  4. Can a human correct AI output?

If yes, store the source of the correction and protect manual overrides.

  5. Does this tool create future side effects?

Workflow creation, notification rules, and scheduled jobs may need confirmation even if they do not execute immediately.

  6. Should the result be text, UI, or a follow-up question?

Do not force tables, charts, review queues, and confirmation forms into plain text.

  7. What belongs in MCP vs a skill or workflow?

Keep capabilities, reusable automations, and procedural playbooks separate.

The core idea

An MCP server can be a thin wrapper around your API.

But if production agents are going to use it reliably, it should become a semantic layer over your product.

For FORMLOVA, that means a form response is not just a row. It can be a legitimate inquiry, a sales pitch, an uncertain case, a Slack notification trigger, an analytics input, a workflow event, or a manually corrected state.

The MCP layer should expose those meanings directly.

That is what "group tools around intent, not endpoints" means in practice.

Further reading

FORMLOVA is free to start if you want to try the MCP-based form-operations flow directly:
