The 5 pieces of AI plumbing every SaaS needs in 2026 (with code)

#ai #typescript #webdev #nextjs

Every SaaS is adding AI features in 2026. Most teams burn the first two weeks on the same five pieces of plumbing — none of which are the actual product. Here's each one, with working TypeScript for Next.js 15.

1. A streaming endpoint (not a blocking one)

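Users won't stare at a spinner for 20 seconds. Stream tokens to the browser as they are generated, using server-sent events (SSE). This route uses the Anthropic TypeScript SDK's beta tool runner with stream: true, which yields one message stream per model turn: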

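// app/api/chat/route.ts
import Anthropic from "@anthropic-ai/sdk";

const anthropic = new Anthropic(); // reads ANTHROPIC_API_KEY from the environment
const encoder = new TextEncoder();

// SYSTEM_PROMPT (piece 4) and tools (piece 2) are imported from your own modules
export async function POST(req: Request) {
  const { messages } = await req.json();

  const runner = anthropic.beta.messages.toolRunner({
    model: "claude-opus-4-8",
    max_tokens: 64000,
    thinking: { type: "adaptive" },
    system: SYSTEM_PROMPT,
    tools,
    messages,
    stream: true,
  });

  const stream = new ReadableStream({
    async start(controller) {
      try {
        // One message stream per model turn; the runner executes tool calls between turns
        for await (const messageStream of runner) {
          for await (const event of messageStream) {
            if (event.type === "content_block_delta" && event.delta.type === "text_delta") {
              controller.enqueue(encoder.encode(`data: ${JSON.stringify({ text: event.delta.text })}\n\n`));
            }
          }
        }
        controller.close();
      } catch (err) {
        controller.error(err); // abort the response so the client sees a failed stream
      }
    },
  });

  return new Response(stream, {
    headers: { "Content-Type": "text/event-stream", "Cache-Control": "no-cache" },
  });
}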
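A stream that simply closes looks the same to the browser whether the model finished or the connection dropped mid-answer. One common pattern is to end with an explicit named event. Below is a minimal sketch that is independent of the SDK: it assumes you have already mapped the runner's content_block_delta events to plain text strings, and the helper names (sseFrame, toSSE) and the done and error event names are our own convention, not something the SDK or the SSE spec defines.

```typescript
// Hypothetical helpers: frame plain text deltas as server-sent events.
// The "done" and "error" event names are an app-level convention, not part of any API.

/** Serialize one SSE event. JSON.stringify never emits raw newlines, so one data line suffices. */
export function sseFrame(data: unknown, event?: string): string {
  return (event ? `event: ${event}\n` : "") + `data: ${JSON.stringify(data)}\n\n`;
}

/** Wrap an async iterable of text deltas, always ending with a "done" or "error" event. */
export async function* toSSE(deltas: AsyncIterable<string>): AsyncGenerator<string> {
  try {
    for await (const text of deltas) {
      yield sseFrame({ text });
    }
    yield sseFrame({}, "done");
  } catch {
    // Don't leak internal error details to the browser; log the real error server-side
    yield sseFrame({ message: "generation failed" }, "error");
  }
}
```

Inside the route's start() callback you would enqueue encoder.encode(frame) for each frame toSSE yields. On the client, a done event means the answer is complete, while a stream that ends without one means it was cut off and can be retried.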

2. Typed tool handlers

The difference between a chatbot and a product is tools — the model acting on your data. Define them once with Zod; the SDK's tool runner handles the execution loop:

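import { betaZodTool } from "@anthropic-ai/sdk/helpers/beta/zod";
import { z } from "zod";

// db is your own data-access layer
export const searchOrders = betaZodTool({
  name: "search_orders",
  description: "Look up a customer's orders. Call when the user asks about order status.",
  inputSchema: z.object({ email: z.string().email() }),
  // Tool results go back to the model as text, so serialize the rows
  run: async ({ email }) => JSON.stringify(await db.orders.findByEmail(email)),
});
export const tools = [searchOrders];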

No manual agentic loop, no JSON schema by hand, inputs typed end to end.
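One caveat with a tool like search_orders: the model chooses the email argument, so a user can talk it into looking up someone else's orders. A safer pattern, sketched here under our own assumptions, is to build the handler per request and check the argument against the signed-in session. makeOrderSearch, Order and FindByEmail are hypothetical stand-ins for your auth and data layers; the returned function is meant to be passed as the tool's run handler.

```typescript
// Hypothetical types standing in for your data layer.
interface Order {
  id: string;
  status: string;
}
type FindByEmail = (email: string) => Promise<Order[]>;

/**
 * Build a per-request order-search handler bound to the signed-in user.
 * The model still supplies `email`, but it can only read the session owner's orders.
 */
export function makeOrderSearch(sessionEmail: string, findByEmail: FindByEmail) {
  return async ({ email }: { email: string }): Promise<string> => {
    if (email.trim().toLowerCase() !== sessionEmail.trim().toLowerCase()) {
      // Returned as a tool result so the model can explain the refusal to the user
      return "Error: you can only look up orders for your own account.";
    }
    const orders = await findByEmail(sessionEmail);
    return JSON.stringify(orders);
  };
}
```

If the tool only ever needs the signed-in user's orders, a simpler option is to drop email from the schema entirely and close over the session value.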

3. Usage metering (or users will bankrupt you)

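One enthusiastic user on your £10/month plan can generate £200 of API costs. Meter every API call and weight each token type by its price relative to base input: on Claude's published pricing, output tokens cost 5x input, cache reads 0.1x, and cache writes 1.25x (with the default 5-minute cache):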

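// Minimal structural type: matches the usage object on both Message and BetaMessage
type Usage = { input_tokens: number; output_tokens: number;
  cache_read_input_tokens?: number | null; cache_creation_input_tokens?: number | null };

// Units are input-token equivalents: cache reads 0.1x, cache writes 1.25x (5-min TTL), output 5x
export function billableUnits(u: Usage): number {
  return u.input_tokens + (u.cache_read_input_tokens ?? 0) * 0.1
    + (u.cache_creation_input_tokens ?? 0) * 1.25 + u.output_tokens * 5;
}
// Before each request:
if ((await getUsage(userId)) >= planLimit) return quotaExceeded();
// After each API call (the tool runner makes one per model turn):
await recordUsage(userId, billableUnits(message.usage));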
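To see how the weighting plays out end to end, here is a self-contained sketch of a meter with an in-memory store. UsageMeter and its methods are hypothetical; in production the counter would live in Postgres or Redis, keyed by user and billing period. The weights (cache reads 0.1x, cache writes 1.25x, output 5x) are relative to the base input price, and record() should run after every API call, since a tool-using answer makes several.

```typescript
// Structural subset of the usage object the Messages API returns
interface Usage {
  input_tokens: number;
  output_tokens: number;
  cache_read_input_tokens?: number | null;
  cache_creation_input_tokens?: number | null;
}

// Input-token equivalents: cache reads 0.1x, cache writes (5-min TTL) 1.25x, output 5x
export function billableUnits(u: Usage): number {
  return (
    u.input_tokens +
    (u.cache_read_input_tokens ?? 0) * 0.1 +
    (u.cache_creation_input_tokens ?? 0) * 1.25 +
    u.output_tokens * 5
  );
}

/** Hypothetical in-memory meter; swap the Map for a database row per user and period. */
export class UsageMeter {
  private readonly used = new Map<string, number>();
  private readonly planLimit: number;

  constructor(planLimit: number) {
    this.planLimit = planLimit;
  }

  /** Call before each request: false means the user has hit their plan limit. */
  allow(userId: string): boolean {
    return (this.used.get(userId) ?? 0) < this.planLimit;
  }

  /** Call after each API call, including every tool-runner iteration. Returns the new total. */
  record(userId: string, usage: Usage): number {
    const total = (this.used.get(userId) ?? 0) + billableUnits(usage);
    this.used.set(userId, total);
    return total;
  }
}
```

For example, a call with 2,000 uncached input tokens, 10,000 cache-read tokens and 500 output tokens counts as 2,000 + 1,000 + 2,500 = 5,500 units; at a hypothetical $3 per million input tokens that is $0.0165. Because allow() and record() are separate steps, concurrent requests can overshoot the limit by roughly one request's worth; an atomic increment in the database narrows that window.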

4. Prompt caching that actually caches

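Prompt caching can cut input costs by about 90%, because cache reads are billed at a tenth of the base input price. But the cache is an exact prefix match: one interpolated timestamp in your system prompt makes every request a cache miss, billed at full price plus the cache-write premium. Rules: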

System prompt is a frozen constant. No dates, no user names, no feature flags in it.
Dynamic context goes in the user message, after the cache breakpoint.
Never change the tool list mid-conversation. Tools come first in the cached prefix (tools, then system, then messages), so any change invalidates everything after them.

export const SYSTEM_PROMPT = [{
  type: "text" as const,
  text: STABLE_INSTRUCTIONS,          // never interpolate into this
  cache_control: { type: "ephemeral" as const },
}];
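The rules above come down to keeping every byte before the breakpoint identical across requests. A minimal sketch, assuming a hypothetical buildParams helper and a per-request context string (user name, today's date, plan): the frozen system block carries cache_control, and anything that changes per request is placed in the newest user message instead. The types are simplified local stand-ins shaped like the Messages API request body, not the SDK's own.

```typescript
// Simplified stand-ins for the Messages API request shapes.
type TextBlock = { type: "text"; text: string; cache_control?: { type: "ephemeral" } };
type ChatMessage = { role: "user" | "assistant"; content: string | TextBlock[] };

// Frozen: never interpolate dates, names or flags into this string
const STABLE_INSTRUCTIONS = "You are a support assistant for an order-tracking SaaS.";

const SYSTEM_PROMPT: TextBlock[] = [
  { type: "text", text: STABLE_INSTRUCTIONS, cache_control: { type: "ephemeral" } },
];

/** Keep the cached prefix (tools + system) byte-identical; put dynamic context after it. */
export function buildParams(history: ChatMessage[], userText: string, context: string) {
  const latest: ChatMessage = {
    role: "user",
    content: [
      { type: "text", text: `<context>\n${context}\n</context>` }, // changes every request
      { type: "text", text: userText },
    ],
  };
  return { system: SYSTEM_PROMPT, messages: [...history, latest] };
}
```

Store each user message exactly as it was sent, context included, so earlier turns stay byte-identical if you later add a breakpoint on the conversation history.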

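Verify it works: usage.cache_read_input_tokens should be non-zero from the second request on, provided the requests arrive within the cache lifetime (5 minutes by default, refreshed on each hit) and the cached prefix meets the model's minimum cacheable length (1,024 tokens or more, depending on the model). Shorter prefixes are silently not cached.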
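To track this in production rather than eyeballing one response, you can log the share of input tokens served from cache on every call. The Messages API reports input as three disjoint counters in usage (input_tokens, cache_read_input_tokens, cache_creation_input_tokens) that together make up the request's total input; cacheHitRate is our own hypothetical helper name.

```typescript
interface InputUsage {
  input_tokens: number; // input tokens neither read from nor written to the cache
  cache_read_input_tokens?: number | null;
  cache_creation_input_tokens?: number | null;
}

/** Fraction of this request's input tokens that were read from the prompt cache (0..1). */
export function cacheHitRate(u: InputUsage): number {
  const read = u.cache_read_input_tokens ?? 0;
  const written = u.cache_creation_input_tokens ?? 0;
  const total = u.input_tokens + read + written;
  return total === 0 ? 0 : read / total;
}
```

A first request that writes a 3,000-token prefix scores 0; a follow-up that reads it back with 50 fresh tokens scores 3,000 / 3,050, about 0.98. A rate that stays at 0 after the first request usually means something before the breakpoint is changing between calls.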

5. A chat component that handles streams properly

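Buffer the SSE stream across chunk boundaries. A network chunk can end in the middle of an event, so parsing each chunk on its own hands JSON.parse half an event and drops tokens. Split on the blank line that ends each event and carry the remainder over to the next chunk: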

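// messages and appendToMessage are your own chat state and UI update
function handleEvent(event: string) {
  if (event.startsWith("data: ")) appendToMessage(JSON.parse(event.slice(6)).text);
}
const res = await fetch("/api/chat", { method: "POST", body: JSON.stringify({ messages }) });
const reader = res.body!.getReader();
const decoder = new TextDecoder();
let buffer = "";
while (true) {
  const { done, value } = await reader.read();
  if (done) break;
  buffer += decoder.decode(value, { stream: true });
  const events = buffer.split("\n\n"); // each SSE event ends with a blank line
  buffer = events.pop() ?? "";         // keep the partial event for the next chunk
  for (const event of events) handleEvent(event);
}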
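The loop above assumes every event is a single data: line ending in "\n\n", which holds for the route in piece 1. If you later consume SSE from other sources, the format also allows CRLF line endings, event: names, comment lines used as keep-alives, and multi-line data. A slightly more general, framework-free sketch: createSSEParser is a hypothetical helper you feed decoded text chunks; it buffers across chunk boundaries and calls onEvent once per complete event. It covers the common cases, not every corner of the spec.

```typescript
export interface SSEEvent {
  event: string;
  data: string;
}

/** Incremental SSE parser: push() decoded text chunks; onEvent fires once per complete event. */
export function createSSEParser(onEvent: (e: SSEEvent) => void) {
  let buffer = "";
  let pendingCR = false; // a "\r" at a chunk end may be the first half of "\r\n"

  return {
    push(chunk: string): void {
      let text = (pendingCR ? "\r" : "") + chunk;
      pendingCR = text.endsWith("\r");
      if (pendingCR) text = text.slice(0, -1);
      buffer += text.replace(/\r\n?/g, "\n"); // normalize CRLF and CR line endings to LF

      let end: number;
      while ((end = buffer.indexOf("\n\n")) !== -1) {
        const raw = buffer.slice(0, end);
        buffer = buffer.slice(end + 2); // keep any partial event for the next chunk
        let event = "message"; // default event type when no event: field is sent
        const data: string[] = [];
        for (const line of raw.split("\n")) {
          if (line.startsWith(":")) continue; // comment or keep-alive line
          const colon = line.indexOf(":");
          const field = colon === -1 ? line : line.slice(0, colon);
          let value = colon === -1 ? "" : line.slice(colon + 1);
          if (value.startsWith(" ")) value = value.slice(1); // drop one leading space
          if (field === "event") event = value;
          else if (field === "data") data.push(value);
        }
        if (data.length > 0) onEvent({ event, data: data.join("\n") });
      }
    },
  };
}
```

In the component you would call parser.push(decoder.decode(value, { stream: true })) inside the read loop and JSON.parse(e.data) in the callback.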

Don't want to write the rest?

All five pieces above are open source (MIT) in agentship-lite — copy them into any Next.js app.

If you want the full SaaS around it — Stripe subscriptions, auth, Postgres schema, plan gating wired to the metering — that's AgentShip, currently £49 early access.