DEV Community

StemSplit

5 Things I Wish I'd Known Before Writing a Production MCP Server in TypeScript (2026)

"Wire @modelcontextprotocol/sdk to your API, register a few tools with Zod schemas, ship it."

That's the shape of every MCP server tutorial I read before writing my first one. After three weeks of dogfooding stemsplit-mcp (the open-source MCP server I built for StemSplit's audio separation API), that summary turns out to be just the prologue.

Below are the five things that make the difference between an MCP server that works in a tutorial and one you'd actually depend on. All of them are now built into stemsplit-mcp (source, MIT-licensed) and portable to any other MCP server you write.

If you're about to write your first MCP server: read this. If you've already shipped one: there's a non-zero chance you have at least one of these bugs.

What You'll Get From This Post

  • ✅ A withRetry helper that handles transient failures without double-charging users
  • ✅ A simple rule for deciding which requests are safe to retry
  • ✅ Why relative paths from an LLM are a bug, and how to reject them with a helpful message
  • ✅ A structured error shape that gives the LLM machine-readable context
  • ✅ How to wire MCP progress notifications so long jobs don't look frozen

1. Retry transient failures — but only the right ones

The first version of every MCP server you write will look like this:

async function callApi<T>(path: string): Promise<T> {
  const res = await fetch(`${baseUrl}${path}`, { headers });
  if (!res.ok) throw new Error(`${res.status} ${res.statusText}`);
  return res.json();
}

One bad upstream gateway, one transient 502, one TLS handshake hiccup, and the entire MCP tool call fails. The LLM sees the error, the user sees the error, the user concludes your tool is broken.

The fix isn't "retry everything." Retrying a POST /jobs that the server already processed will double-charge your user. The right fix is to classify the error, and to classify the request.

Here's the helper I now use everywhere:

export type RetryDecision = boolean | { retryAfterMs: number };

export interface RetryOptions {
  maxAttempts: number;
  initialDelayMs: number;
  maxDelayMs: number;
  shouldRetry: (err: unknown) => RetryDecision;
  onRetry?: (err: unknown, attempt: number, delayMs: number) => void;
}

export async function withRetry<T>(
  fn: () => Promise<T>,
  options: RetryOptions,
): Promise<T> {
  let attempt = 0;
  while (true) {
    attempt++;
    try {
      return await fn();
    } catch (err) {
      const decision = options.shouldRetry(err);
      if (!decision || attempt >= options.maxAttempts) throw err;

      const baseDelay = Math.min(
        options.initialDelayMs * 2 ** (attempt - 1),
        options.maxDelayMs,
      );
      const jitter = Math.random() * baseDelay * 0.25;
      const explicit =
        typeof decision === "object" ? decision.retryAfterMs : null;
      const delayMs = explicit ?? baseDelay + jitter;

      options.onRetry?.(err, attempt, delayMs);
      await new Promise((r) => setTimeout(r, delayMs));
    }
  }
}

The interesting bit is the RetryDecision union: shouldRetry can return either true/false, or an object with an explicit retryAfterMs. That last one lets you honor Retry-After headers on 429s without writing a separate code path.
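
To make the delay arithmetic concrete, here is a minimal sketch that pulls the backoff calculation out into a pure function. The name computeDelayMs and the injectable random parameter are assumptions made for illustration; they are not part of stemsplit-mcp.

```typescript
// Hypothetical helper: the same delay math as withRetry, extracted so the
// schedule can be inspected without sleeping. `random` is injectable for tests.
export function computeDelayMs(
  attempt: number, // 1-based number of the attempt that just failed
  initialDelayMs: number,
  maxDelayMs: number,
  explicitMs: number | null = null, // e.g. taken from a Retry-After header
  random: () => number = Math.random,
): number {
  if (explicitMs !== null) return explicitMs; // a server-provided delay wins
  const baseDelay = Math.min(initialDelayMs * 2 ** (attempt - 1), maxDelayMs);
  const jitter = random() * baseDelay * 0.25; // adds up to 25% on top
  return baseDelay + jitter;
}

// Jitter-free schedule for initialDelayMs = 500 and maxDelayMs = 4000.
const schedule = [1, 2, 3, 4, 5].map((a) =>
  computeDelayMs(a, 500, 4000, null, () => 0),
);
console.log(schedule); // [500, 1000, 2000, 4000, 4000]
```

Note that the jitter is added after the cap, so an actual delay can exceed maxDelayMs by up to 25%. If you need a hard ceiling, apply Math.min once more after adding the jitter.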

Then the policy is split by request shape:

function shouldRetryApiError(err: unknown, mutating: boolean): RetryDecision {
  if (err instanceof StemSplitError) {
    // Network-level error — server may not have seen the request
    if (err.code === "NETWORK_ERROR") return true;

    // 5xx on a read-only request — safe to retry
    if (err.status && err.status >= 500 && !mutating) return true;

    // 429 — honor Retry-After
    if (err.status === 429 && err.retryAfterMs !== undefined) {
      return { retryAfterMs: err.retryAfterMs };
    }
  }
  return false;
}

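The mutating flag is the rule. A 5xx response on a POST /jobs could mean the job was created and the response was just lost. Retrying it could double-charge. So POST /jobs is mutating: true, gets fewer attempts (3), and only retries on errors where the server most likely never processed the request: network-level failures and 429s. A network error is not proof, though. A connection that drops after the request was sent can still leave a job behind, so if your API supports idempotency keys, use them for mutating calls.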

By contrast, GET /jobs/:id is mutating: false and gets a more aggressive policy (4 attempts, 5xx retries enabled). Even POST /upload — which only returns a presigned URL and doesn't change billing state — can be marked mutating: false.

This single distinction has already saved me from three production incidents, mostly caused by rare 502s during the 10-minute polling loops on long stem-separation jobs.
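
One way to keep that classification in a single place is a small lookup function. This is only a sketch: retryPolicyFor and EndpointPolicy are hypothetical names, the attempt counts for the two /jobs routes come from the text above, and the count for POST /upload and the fallback branch are assumptions.

```typescript
// Hypothetical policy table for the endpoints discussed above.
export interface EndpointPolicy {
  mutating: boolean;
  maxAttempts: number;
}

export function retryPolicyFor(method: string, path: string): EndpointPolicy {
  const m = method.toUpperCase();
  // Creating a job changes billing state: retry sparingly.
  if (m === "POST" && path === "/jobs") {
    return { mutating: true, maxAttempts: 3 };
  }
  // Presigned-URL request: a POST, but repeating it has no billing effect.
  // (The attempt count here is an assumption.)
  if (m === "POST" && path === "/upload") {
    return { mutating: false, maxAttempts: 4 };
  }
  // Reads are safe to repeat.
  if (m === "GET") {
    return { mutating: false, maxAttempts: 4 };
  }
  // Unknown write: fall back to the conservative policy.
  return { mutating: true, maxAttempts: 3 };
}
```

The resulting mutating value is what you would pass to shouldRetryApiError, and maxAttempts goes into the RetryOptions for that call. Defaulting unknown writes to mutating: true means a new endpoint is safe until someone deliberately marks it otherwise.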

2. Reject relative paths up front

When you give an LLM a tool that takes a file path, the LLM will eventually pass song.mp3. Or ./song.mp3. Or worse, file:///Users/me/Music/song.mp3: a URL-form string that looks like a valid path but fails when passed as a plain string to Node's fs.createReadStream.

If you don't validate this, Node will resolve those against the MCP server's process working directory. For Claude Desktop and Cursor, that's usually / or /Applications/.... The file doesn't exist there. The LLM gets an "ENOENT" error, has no idea what /song.mp3 is, and either retries the same path or gives up.
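
A quick way to see the problem is to resolve the path yourself. The sketch below uses path.posix.resolve so the result is the same on every platform; whereNodeLooks and assumedCwd are hypothetical names, and the working directory is an assumed value standing in for whatever the client actually uses.

```typescript
import { posix } from "node:path";

// Assumed working directory for illustration; GUI clients often start the
// server in "/" or inside the application bundle.
const assumedCwd = "/";

// Resolving a relative path against the working directory mirrors what the
// fs functions effectively do when handed a relative string.
export function whereNodeLooks(p: string, cwd: string = assumedCwd): string {
  return posix.resolve(cwd, p);
}

console.log(whereNodeLooks("song.mp3"));   // "/song.mp3"
console.log(whereNodeLooks("./song.mp3")); // "/song.mp3"
```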

The fix is to reject the bad path before you try to open the file, with an error message designed for the LLM to act on:

import { isAbsolute } from "node:path";

function isTildeHome(p: string): boolean {
  return p === "~" || p.startsWith("~/");
}

export function classifyLocalPath(source: string): string {
  const trimmed = source.trim();

  if (trimmed.startsWith("file://")) {
    throw new Error(
      "file:// URIs are not supported. Pass the absolute filesystem path " +
      "instead (e.g. /Users/you/Music/song.mp3).",
    );
  }

  if (!isTildeHome(trimmed) && !isAbsolute(trimmed)) {
    throw new Error(
      `Relative paths are not supported (got "${trimmed}"). ` +
      `Pass an absolute path like "/Users/you/Music/song.mp3" or a ` +
      `home-anchored path like "~/Music/song.mp3". ` +
      `If you do not know the absolute path, ask the user for it before retrying.`,
    );
  }

  return trimmed;
}

Two things to call out:

  1. The error tells the LLM exactly what to do next. Not "invalid path" — "ask the user for the absolute path." This is the difference between a tool the LLM gives up on and a tool it recovers from gracefully.
  2. ~/foo is accepted because it's a human-friendly form that LLMs and users both reach for, and you'll get fewer dead-end conversations if you support it. Be aware that Node does not expand ~ for you. That is a shell feature, and neither fs.realpath nor path.resolve does it, so replace the leading ~ with os.homedir() before opening the file.
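
Since classifyLocalPath returns a ~/ path unchanged, something has to expand it before the file is opened. A minimal sketch of that step, assuming a helper named expandHome (hypothetical, not taken from stemsplit-mcp):

```typescript
import { homedir } from "node:os";
import { join } from "node:path";

// Hypothetical helper: expand a leading "~" the way a shell would.
// `home` is injectable so the function can be tested deterministically.
export function expandHome(p: string, home: string = homedir()): string {
  if (p === "~") return home;
  if (p.startsWith("~/")) return join(home, p.slice(2));
  return p; // absolute paths (and forms like "~user/") pass through unchanged
}
```

In a tool handler this would sit right after validation, for example expandHome(classifyLocalPath(source)), and the result is what gets passed to fs.createReadStream.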

Side benefit: this also makes your tool description shorter. You can write path: "Absolute path like /Users/you/Music/song.mp3" in your Zod schema and the LLM will get the same hint from both the schema and the error message.

3. Make errors machine-readable

The default MCP error shape is just a string. That's fine for users, terrible for LLMs.

LLMs that have to figure out what to do next from an error message do best when the error has discrete states. "Out of credits", "rate limit hit", and "file too large" each need a different recovery.

So I always wrap upstream errors into a class with a code:

export type StemSplitErrorCode =
  | "AUTH_INVALID"
  | "INSUFFICIENT_CREDITS"
  | "RATE_LIMIT_EXCEEDED"
  | "FILE_TOO_LARGE"
  | "UNSUPPORTED_FORMAT"
  | "JOB_FAILED"
  | "JOB_TIMEOUT"
  | "NETWORK_ERROR"
  | "API_ERROR";

export class StemSplitError extends Error {
  constructor(
    public readonly code: StemSplitErrorCode,
    public readonly userMessage: string,
    public readonly status?: number,
    public readonly retryAfterMs?: number,
    public readonly details?: unknown,
  ) {
    super(userMessage);
    this.name = "StemSplitError";
  }
}

export async function buildErrorFromResponse(
  res: Response,
): Promise<StemSplitError> {
  const text = await res.text();
  let body: { error?: string; code?: string } = {};
  try { body = JSON.parse(text); } catch { /* ignore */ }

  if (res.status === 401) {
    return new StemSplitError(
      "AUTH_INVALID",
      "StemSplit API key invalid. Check STEMSPLIT_API_KEY.",
      401,
    );
  }
  if (res.status === 402) {
    return new StemSplitError(
      "INSUFFICIENT_CREDITS",
      "Not enough StemSplit credits. Top up at stemsplit.io/app/billing.",
      402,
    );
  }
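  if (res.status === 429) {
    // Retry-After may be missing (Number(null) is 0) or an HTTP date (NaN).
    // Fall back to 60s in both cases instead of retrying immediately.
    const header = res.headers.get("retry-after");
    const parsed = header === null ? NaN : Number(header);
    const retryAfterSec = Number.isFinite(parsed) && parsed > 0 ? parsed : 60;
    return new StemSplitError(
      "RATE_LIMIT_EXCEEDED",
      `Rate limited by StemSplit. Retry in ${retryAfterSec}s.`,
      429,
      retryAfterSec * 1000,
    );
  }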
  // ...etc
}

When you serialize this back to the MCP client, include both the human-readable message and the machine-readable code:

{
  isError: true,
  content: [{ type: "text", text: err.userMessage }],
  _meta: { code: err.code, status: err.status }
}

Anthropic's clients ignore _meta they don't understand, so this is forward-compatible. And the LLM gets a clean userMessage that's safe to relay to the user.
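
One caveat worth planning for: a client may not forward _meta to the model at all, in which case the LLM only ever sees the text. A hedged sketch of a serializer that covers both cases follows. The name toToolResult, the CodedError stand-in type, and the bracketed code prefix are assumptions for illustration, not the format stemsplit-mcp uses.

```typescript
// Minimal stand-in for the StemSplitError fields used here.
interface CodedError {
  code: string;
  userMessage: string;
  status?: number;
}

// Shape of an MCP tool result that carries an error.
interface ToolErrorResult {
  isError: true;
  content: { type: "text"; text: string }[];
  _meta: { code: string; status: number | undefined };
}

// Hypothetical serializer: the "[CODE]" prefix is an assumption, added so the
// code also appears in the text the model reads.
export function toToolResult(err: CodedError): ToolErrorResult {
  return {
    isError: true,
    content: [{ type: "text", text: `[${err.code}] ${err.userMessage}` }],
    _meta: { code: err.code, status: err.status },
  };
}
```

The trade-off is that the prefix shows up if the LLM relays the message verbatim, so keep the codes readable enough that a user would not be confused by them.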

4. Fire progress notifications for anything over ~5s

MCP supports progress notifications:

await server.notification({
  method: "notifications/progress",
  params: {
    progressToken,
    progress: 35,    // 0–100
    total: 100,
  },
});

If your tool takes more than 5 seconds (audio processing definitely does), use them. Without them, Claude Desktop will sit at "Running tool stemsplit/separate_stems..." indefinitely and the user has no idea if you're stuck or making progress.

The trick is wiring this through your polling loop:

export async function pollUntilDone<T>(
  fetchStatus: () => Promise<{ status: string; progress?: number } & T>,
  options: {
    onProgress?: (progress: number) => void;
    intervalMs?: number;
    timeoutMs?: number;
  } = {},
): Promise<T> {
  const interval = options.intervalMs ?? 3000;
  const timeout = options.timeoutMs ?? 10 * 60 * 1000;
  const start = Date.now();

  while (true) {
    const status = await fetchStatus();
    if (status.progress !== undefined) options.onProgress?.(status.progress);
    if (status.status === "COMPLETED") return status;
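    // Use the coded error class from section 3 so callers can tell these apart.
    if (status.status === "FAILED") throw new StemSplitError("JOB_FAILED", "Job failed");
    if (Date.now() - start > timeout) throw new StemSplitError("JOB_TIMEOUT", "Job timed out");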
    await new Promise((r) => setTimeout(r, interval));
  }
}

Then your tool handler wires the MCP progress token through:

const progressToken = request.params?._meta?.progressToken;

const job = await pollUntilDone(
  () => client.getJob(jobId),
  {
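    // progressToken may legitimately be 0, so compare against undefined.
    onProgress: progressToken !== undefined
      ? (p) => {
          // Fire-and-forget: a failed notification should not fail the job.
          server.notification({
            method: "notifications/progress",
            params: { progressToken, progress: p, total: 100 },
          }).catch(() => {});
        }
      : undefined,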
  },
);

This is the difference between users abandoning a long job at 30 seconds and waiting patiently because they can see the bar moving.
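
One detail the polling loop glosses over: it calls onProgress on every poll, even when the upstream value has not changed, while the MCP specification asks for the progress value to increase with each notification. A small wrapper can filter out the repeats. This is a sketch, and onlyIncreasing is a hypothetical name.

```typescript
// Hypothetical wrapper: forwards a progress value only when it is strictly
// greater than the last one forwarded.
export function onlyIncreasing(
  forward: (progress: number) => void,
): (progress: number) => void {
  let last = -Infinity;
  return (progress) => {
    if (progress > last) {
      last = progress;
      forward(progress);
    }
  };
}

// Simulated poll results, including repeats and one backwards step.
const sent: number[] = [];
const report = onlyIncreasing((p) => sent.push(p));
[0, 0, 35, 35, 30, 80, 100, 100].forEach(report);
console.log(sent); // [0, 35, 80, 100]
```

Wrapping the onProgress callback with this before passing it to pollUntilDone also cuts notification traffic on slow jobs, where most polls report the same number.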

5. Re-fetch presigned URLs on demand

This one isn't MCP-specific, but it bites every MCP server that wraps an API with expiring URLs.

Cloud storage providers (Cloudflare R2, S3, GCS) hand out presigned URLs that expire, typically after 1–24 hours depending on how they were signed. If your MCP tool stores the URL in the chat history and the user comes back tomorrow asking "can you re-download those stems?", the URLs are dead and the LLM gets a 403.

Don't make the user re-run the entire separation job. Instead, expose a separate download_stems tool that takes a jobId, re-fetches the latest presigned URLs from your API, and downloads:

async function handleDownloadStems(jobId: string, outputDir: string) {
  const job = await client.getJob(jobId);
  if (job.status !== "COMPLETED") {
    throw new StemSplitError("JOB_FAILED", `Job ${jobId} not complete.`);
  }
  return downloadAllStems(job.outputs, outputDir);
}

The LLM picks up on this naturally — if it has the jobId from an earlier chat, it'll call download_stems instead of re-running separate_stems. Your user re-downloads in 2 seconds instead of waiting 90 seconds for a fresh separation.

Bonus: this is also how you let the user choose a different output directory on the second download without redoing work.

Putting it all together

These five patterns make a real difference for any MCP server that hits a remote API:

  1. withRetry with mutating-aware policy: absorbs most transient failures without risking duplicate charges.
  2. Absolute path validation with actionable errors: saves you from confusing LLMs.
  3. Structured error codes: lets the LLM choose a recovery strategy.
  4. Progress notifications: keeps users waiting instead of giving up.
  5. Re-fetch-by-ID tool: turns expiring URLs from a footgun into a feature.

None of these are in the MCP SDK examples. They're the lessons you only learn from running an MCP server against real users.

If you want to see all five in one place, the full implementation is in github.com/StemSplit/stemsplit-mcp — ~1.5k lines of TypeScript, MIT-licensed, every pattern above in production today.

And if you happen to want stem separation in your MCP-enabled AI assistant, stemsplit-mcp is on npm and works with Claude Desktop, Cursor, Cline, Windsurf, and Zed today. The StemSplit API it talks to is a hosted Demucs / HT-Demucs FT pipeline (same models you'd self-host) with a generous free tier — sign up at stemsplit.io/free-trial and you can have a working setup in five minutes.

Happy MCP-ing. Tell me what you build.


Source code: github.com/StemSplit/stemsplit-mcp • npm: stemsplit-mcp • Hosted API: stemsplit.io/developers
