Lovanaut
Don't Build Your MCP Server as an API Wrapper

Anthropic recently published a useful post on building agents that reach production systems with MCP:

Building agents that reach production systems with MCP

The most important line for MCP server builders is not "build an MCP server." It is the design guidance underneath it:

Group tools around intent, not endpoints.

That distinction is easy to underestimate.

If you already have a REST API, the obvious first version of your MCP server is a thin wrapper around it:

```
list_responses
get_response
update_response
delete_response
export_responses
send_notification
```

That works for demos. It is not enough for production agents.

I've been building FORMLOVA, a form-operations product where users can create forms, review responses, classify sales pitches, run analytics, and trigger workflows through MCP clients. The hardest part has not been exposing database operations; it has been deciding what meaning the MCP layer should carry.

This post is a practical guide to that boundary.

The problem with endpoint-shaped tools

Suppose a user asks:

Show me this month's conversion rate, excluding sales pitches.

With endpoint-shaped tools, the agent has to do this:

1. list_responses
2. handle pagination
3. inspect spam_label
4. decide which labels to remove
5. filter by date range
6. aggregate the count
7. compute the metric
8. explain the result

That is a lot of domain logic to push into the model on every run.

The more production-shaped the workflow becomes, the worse this gets:

  • Which label means "sales"?
  • Should uncertain responses be removed too?
  • Should unclassified responses remain?
  • What happens if a human manually corrected a label?
  • Does the query need to respect soft-deleted rows?
  • Should the result be allowed to trigger a workflow?

If your MCP server does not answer those questions, the model has to reconstruct them from tool descriptions and prompt context. That is fragile.

The MCP server should not be just an HTTP client with tool schemas. It should carry the product's operational semantics.
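To make the contrast concrete, here is a sketch of what an intent-shaped alternative could look like: one call that owns the date scoping, the exclusion rule, and the arithmetic. The function name, types, and field names are illustrative assumptions, not FORMLOVA's actual API.

```typescript
// Illustrative types; the real schema may differ.
type SpamLabel = "legitimate" | "sales" | "suspicious" | null;

interface Response {
  submittedAt: string; // ISO timestamp
  converted: boolean;  // whatever "conversion" means for this form
  spamLabel: SpamLabel;
}

// One intent-shaped call: the server owns pagination, filtering,
// and the exclusion rule, so the model only supplies the intent.
function getConversionRate(
  responses: Response[],
  opts: { month: string; excludeSales: boolean }, // month as "YYYY-MM"
): number {
  const inScope = responses.filter((r) => {
    if (!r.submittedAt.startsWith(opts.month)) return false;
    // Product rule: only a confirmed "sales" label is excluded;
    // "suspicious" and unclassified responses stay visible.
    if (opts.excludeSales && r.spamLabel === "sales") return false;
    return true;
  });
  if (inScope.length === 0) return 0;
  return inScope.filter((r) => r.converted).length / inScope.length;
}
```

The agent's job shrinks from an eight-step plan to a single parameter choice.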

A small example: exclude_sales

FORMLOVA classifies incoming form responses into three labels:

```typescript
type SpamLabel = "legitimate" | "sales" | "suspicious";
```

The classifier is not the interesting part for this post. The MCP design is.

Several response and analytics tools accept an exclude_sales parameter:

```typescript
server.registerTool("get_responses", {
  inputSchema: {
    form_id: z.number().int(),
    limit: z.number().int().min(1).max(100).default(50),
    exclude_sales: z.boolean().default(false),
  },
});

server.registerTool("get_form_analytics", {
  inputSchema: {
    form_id: z.number().int(),
    exclude_sales: z.boolean().default(false),
  },
});
```

The implementation is deliberately boring:

```typescript
if (exclude_sales) {
  query = query.or("spam_label.is.null,spam_label.neq.sales");
}
```

That line encodes a product decision:

  • sales responses are excluded
  • suspicious responses remain visible
  • null responses remain visible

Why? Because uncertain and unclassified responses should not disappear silently. A real inquiry misclassified as sales is much more expensive than a sales pitch slipping through.

This is the kind of rule that belongs on the server, not in the model's working memory.
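The same asymmetry can be expressed as a plain predicate, which makes it easy to unit-test. This is a sketch of the rule in memory; in production it runs as the database filter shown above.

```typescript
type SpamLabel = "legitimate" | "sales" | "suspicious";

interface ResponseRow {
  id: number;
  spam_label: SpamLabel | null; // null = not yet classified
}

// Mirrors the database filter: only a confirmed "sales" label is
// dropped; "suspicious" and unclassified rows remain visible.
function applyExcludeSales(
  rows: ResponseRow[],
  excludeSales: boolean,
): ResponseRow[] {
  if (!excludeSales) return rows;
  return rows.filter((r) => r.spam_label === null || r.spam_label !== "sales");
}
```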

The user says:

Analyze responses without sales pitches.

The agent maps that to:

```json
{ "exclude_sales": true }
```

The server owns the domain rule.

That is the difference between an API wrapper and an intent-aware tool.

Labels should become operational state

A common mistake with AI classification features is to stop at the badge.

You run a classifier, store a label, and show it in the UI:

```typescript
type ResponseClassification = {
  spam_label: "legitimate" | "sales" | "suspicious" | null;
  spam_score: number | null;
};
```

That is useful, but incomplete.

In a production workflow, the label should become operational state:

```
legitimate  -> include in analytics, notify the team
sales       -> exclude from analytics, suppress routine notifications
suspicious  -> send to human review
```

FORMLOVA triggers workflows after classification:

```typescript
await executeWorkflows(formId, "response.classified", {
  form_id: formId,
  response_id: responseId,
  spam_score: spamResult.score,
  spam_label: spamResult.label,
});
```

Now the label is not just UI metadata. It is a condition for the next operation.

Example workflow shapes:

```
when response.classified
if spam_label == "legitimate"
then send Slack notification

when response.classified
if spam_label == "suspicious"
then ask a human to review

when response.classified
if spam_label == "sales"
then skip normal notifications
```

This is where the MCP layer starts to matter. The agent is not just reading rows. It is moving form responses through an operations pipeline.
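The three workflow shapes above boil down to a dispatch on the classified label. A minimal sketch, with action names that are illustrative rather than FORMLOVA's real workflow vocabulary:

```typescript
type SpamLabel = "legitimate" | "sales" | "suspicious";

// Hypothetical action names; a real workflow engine is richer.
type Action = "notify_slack" | "request_human_review" | "suppress_notifications";

function actionsFor(label: SpamLabel): Action[] {
  switch (label) {
    case "legitimate":
      return ["notify_slack"];           // real inquiry: tell the team
    case "suspicious":
      return ["request_human_review"];   // uncertain: a human decides
    case "sales":
      return ["suppress_notifications"]; // confirmed pitch: stay quiet
  }
}
```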

Manual override is part of the model

If an AI labels a legitimate inquiry as sales, the user must be able to fix it.

More importantly, the system must remember that a human fixed it.

FORMLOVA stores label source:

```typescript
type LabelSource = "auto" | "manual";
```

Automatic classification updates only rows that are still automatic or unclassified:

```typescript
await db
  .from("responses")
  .update({
    spam_label: spamResult.label,
    spam_score: spamResult.score,
    spam_label_source: "auto",
    spam_classified_at: new Date().toISOString(),
  })
  .eq("id", responseId)
  .or("spam_label_source.is.null,spam_label_source.eq.auto");
```

Manual correction flips the source:

```typescript
await db
  .from("responses")
  .update({
    spam_label: newLabel,
    spam_label_source: "manual",
    spam_classified_at: new Date().toISOString(),
  })
  .eq("id", responseId);
```

The MCP tool exposes this as part of response management:

```typescript
server.registerTool("update_response", {
  inputSchema: {
    response_id: z.number().int(),
    status: z.enum(["new", "in_progress", "resolved", "spam"]).optional(),
    notes: z.string().optional(),
    tags: z.array(z.string().max(50)).max(20).optional(),
    spam_label: z.enum(["legitimate", "sales", "suspicious"]).optional(),
  },
});
```

The user does not think in database terms:

This one is not sales. Mark it as legitimate.

The agent finds the response, calls update_response, and the server protects the human correction from future automatic runs.

This is another intent boundary. The user is not "updating a row." They are correcting the operational state of an inquiry.
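The invariant the two queries above enforce can be simulated in memory: an automatic run must never overwrite a manual label. A sketch of that rule, not the actual Supabase code:

```typescript
type LabelSource = "auto" | "manual" | null;

interface Row {
  spam_label: string | null;
  spam_label_source: LabelSource;
}

// Automatic classification only touches rows that are still
// unclassified or auto-labeled; manual corrections are preserved.
function applyAutoLabel(row: Row, label: string): Row {
  if (row.spam_label_source === "manual") return row; // human wins
  return { spam_label: label, spam_label_source: "auto" };
}

// Manual correction always wins and flips the source.
function applyManualLabel(row: Row, label: string): Row {
  return { spam_label: label, spam_label_source: "manual" };
}
```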

Blocking and classifying are different layers

For contact forms, it is tempting to ask:

If AI can detect sales pitches, why not block them automatically?

Because a false positive is too expensive.

Bot defenses belong before submission:

  • honeypot fields
  • Turnstile / reCAPTCHA
  • rate limiting
  • signed form tokens

Those stop mechanical abuse.

Human-written sales pitches are different. They may be annoying, but they are still real submitted content. If you silently drop one real customer inquiry because the model was wrong, the damage is not recoverable from the form layer.

So FORMLOVA classifies after arrival:

```
Before submission: block obvious bots
After submission: classify meaning
After classification: let the operator decide
```

This separation is important for MCP tool design.

Do not turn every classifier into an automatic blocker. Use classification as a state that downstream tools can act on.

Workflows need stronger confirmation than CRUD

Another subtle MCP design problem: some tools look harmless when called, but create future side effects.

Example: saving a workflow rule.

```typescript
server.registerTool("set_workflow", {
  inputSchema: {
    form_id: z.number().int(),
    name: z.string().min(1),
    trigger_type: z.enum([
      "response.created",
      "response.updated",
      "capacity.reached",
      "deadline.approaching",
      "response.classified",
    ]),
    conditions: z.array(conditionSchema).optional(),
    actions: z.array(actionSchema),
  },
});
```

The tool call itself only saves a rule.

But the rule may later send email, call a webhook, or update data automatically. That is a future external side effect.

Your MCP design should treat this differently from a normal "create row" operation. At minimum, the tool description should require the agent to summarize:

  • trigger
  • conditions
  • actions
  • external destinations

Then get confirmation before saving.

For high-risk operations, server-side confirmation is better than prompt-only confirmation. Prompt instructions are not a reliable safety boundary.
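One way to implement server-side confirmation is a two-step handshake: the first call stages the rule and returns a summary plus a token, and nothing is saved until the token is echoed back. This is an illustrative design sketch, not FORMLOVA's implementation, and every name in it is hypothetical.

```typescript
interface WorkflowDraft {
  name: string;
  trigger: string;
  actions: string[];
}

// Staged rules waiting for confirmation (in production: a store with TTL).
const pending = new Map<string, WorkflowDraft>();

// Step 1: stage the rule and return a human-readable summary.
// Nothing is saved yet; the agent must surface this to the user.
function prepareWorkflow(draft: WorkflowDraft): { token: string; summary: string } {
  // Illustrative token; use a cryptographic source in production.
  const token = Math.random().toString(36).slice(2);
  pending.set(token, draft);
  return {
    token,
    summary: `"${draft.name}" on ${draft.trigger} -> ${draft.actions.join(", ")}`,
  };
}

// Step 2: only a valid token (i.e. a completed step 1) can save.
function confirmWorkflow(token: string): WorkflowDraft {
  const draft = pending.get(token);
  if (!draft) throw new Error("no pending workflow for this token");
  pending.delete(token); // single use
  return draft; // in production: persist the rule here
}
```

Because the save path demands a token the agent can only get by first producing the summary, the confirmation step is enforced by the server rather than by prompt instructions.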

Chat is not the whole interface

Anthropic's post also talks about rich semantics: MCP Apps, elicitation, forms, dashboards, charts.

That matters because not every operation should be rendered as text.

In a form-ops product:

Good for chat:

```
Exclude sales pitches from this month's analysis.
Show only suspicious responses.
Mark this response as legitimate.
Notify the team only for non-sales responses.
```

Good for UI:

```
Response list
Classification distribution
Review queue
Analytics chart
Before-publish checklist
```

The boundary I use:

```
Chat: intent
MCP: meaning, constraints, execution
UI: inspection, comparison, correction
```

If your MCP server returns only text, everything becomes a transcript. That is not always the best user experience. Sometimes the right tool result is a dashboard, a chart, or a form asking for missing input.

MCP and skills should not be collapsed

Anthropic also frames MCP and Skills as complementary:

  • MCP gives access to tools and data
  • Skills teach the agent how to use those tools to do real work

That distinction is useful.

In FORMLOVA, MCP can expose the ability to:

  • list responses
  • exclude sales
  • update labels
  • create workflows
  • run analytics

But "how to run a webinar registration workflow" is procedural knowledge:

```
send confirmation email
send reminder before the event
collect post-event feedback
route low ratings to follow-up
```

That is not just an API surface. It is a playbook.

If you put all playbook knowledge into tool descriptions, your MCP surface becomes heavy and brittle. A cleaner split is:

```
MCP      = capabilities
Workflow = repeatable product-side automation
Skill    = procedural knowledge for using the capabilities
```

This is where MCP gets more valuable over time: the same remote server can be used by more clients and more playbooks without changing the underlying product API every time.

A checklist for MCP server design

Before publishing a production MCP server, ask:

  1. Are my tools endpoint-shaped or intent-shaped?

If the model always has to stitch five primitives together, consider whether the server should expose a higher-level intent.

  2. Which domain rules are currently living in prompts?

Move stable rules into server behavior. Prompts are instructions. Server logic is enforcement.

  3. Do labels and statuses affect downstream operations?

If a label matters, make it usable in filters, analytics, exports, and workflows.

  4. Can a human correct AI output?

If yes, store the source of the correction and protect manual overrides.

  5. Does this tool create future side effects?

Workflow creation, notification rules, and scheduled jobs may need confirmation even if they do not execute immediately.

  6. Should the result be text, UI, or a follow-up question?

Do not force tables, charts, review queues, and confirmation forms into plain text.

  7. What belongs in MCP vs a skill or workflow?

Keep capabilities, reusable automations, and procedural playbooks separate.

The core idea

An MCP server can be a thin wrapper around your API.

But if production agents are going to use it reliably, it should become a semantic layer over your product.

For FORMLOVA, that means a form response is not just a row. It can be a legitimate inquiry, a sales pitch, an uncertain case, a Slack notification trigger, an analytics input, a workflow event, or a manually corrected state.

The MCP layer should expose those meanings directly.

That is what "group tools around intent, not endpoints" means in practice.

Further reading

FORMLOVA is free to start if you want to try the MCP-based form-operations flow directly:
