DEV Community

mt211211

The 5 pieces of AI plumbing every SaaS needs in 2026 (with code)

Every SaaS is adding AI features in 2026. Most teams burn the first two weeks on the same five pieces of plumbing — none of which are the actual product. Here's each one, with working TypeScript for Next.js 15.

1. A streaming endpoint (not a blocking one)

Users won't stare at a spinner for 20 seconds. Stream tokens as they are generated, using server-sent events (SSE):

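// app/api/chat/route.ts
import Anthropic from "@anthropic-ai/sdk";
// Assumed local modules: the tool list from section 2 and the frozen prompt from section 4.
import { tools } from "@/lib/tools";
import { SYSTEM_PROMPT } from "@/lib/prompt";

const anthropic = new Anthropic(); // reads ANTHROPIC_API_KEY from the environment
const encoder = new TextEncoder();
const encode = (s: string) => encoder.encode(s);

export async function POST(req: Request) {
  const { messages } = await req.json();

  const runner = anthropic.beta.messages.toolRunner({
    model: "claude-opus-4-8",
    max_tokens: 64000,
    thinking: { type: "adaptive" },
    system: SYSTEM_PROMPT,
    tools,
    messages,
    stream: true,
  });

  const stream = new ReadableStream({
    async start(controller) {
      try {
        // With stream: true the runner yields one message stream per model turn.
        for await (const messageStream of runner) {
          for await (const event of messageStream) {
            if (event.type === "content_block_delta" && event.delta.type === "text_delta") {
              controller.enqueue(encode(`data: ${JSON.stringify({ text: event.delta.text })}\n\n`));
            }
          }
        }
        controller.close();
      } catch (err) {
        controller.error(err); // surface upstream failures instead of hanging the response
      }
    },
  });
  return new Response(stream, {
    headers: { "Content-Type": "text/event-stream", "Cache-Control": "no-cache" },
  });
}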
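The `data: ...` line followed by a blank line is the entire wire format the endpoint needs. Here is a minimal sketch of that framing as a standalone helper, assuming the same `{ text }` payload shape as the handler. `formatSseEvent` and `encodeSseEvent` are hypothetical names, not part of any SDK. JSON-encoding the payload matters because SSE treats a raw newline as a line separator, and `JSON.stringify` escapes newlines inside strings.

```typescript
// Hypothetical helper: frames one JSON payload as a server-sent event.
// An SSE event is one or more "field: value" lines terminated by a blank line.
function formatSseEvent(payload: unknown): string {
  // JSON.stringify escapes "\n" inside strings, so the payload stays on a single data: line.
  return `data: ${JSON.stringify(payload)}\n\n`;
}

// Byte form, ready for controller.enqueue() inside a ReadableStream.
const sseEncoder = new TextEncoder();
function encodeSseEvent(payload: unknown): Uint8Array {
  return sseEncoder.encode(formatSseEvent(payload));
}

// A token containing a newline still produces exactly one well-formed event.
const frame = formatSseEvent({ text: "line one\nline two" });
console.log(frame.endsWith("\n\n")); // prints true
```

If you ever send plain text instead of JSON, each line of the text would need its own `data:` prefix, which is one reason to keep the payload as JSON.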

2. Typed tool handlers

The difference between a chatbot and a product is tools — the model acting on your data. Define them once with Zod; the SDK's tool runner handles the execution loop:

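import { betaZodTool } from "@anthropic-ai/sdk/helpers/beta/zod";
import { z } from "zod";
import { db } from "@/lib/db"; // assumed: your own data-access layer

export const searchOrders = betaZodTool({
  name: "search_orders",
  description: "Look up a customer's orders. Call when the user asks about order status.",
  inputSchema: z.object({ email: z.string().email() }),
  // Tool results go back to the model as text, so serialize the rows.
  run: async ({ email }) => JSON.stringify(await db.orders.findByEmail(email)),
});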

No manual agentic loop, no JSON schema by hand, inputs typed end to end.
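For a sense of what that saves, here is a dependency-free sketch of the kind of loop a tool runner automates: call the model, run any tools it asks for, feed the results back, and repeat until it answers in plain text. Every type and function name here (`runToolLoop`, `CallModel`, `ToolResult`, and so on) is hypothetical and much simpler than the real SDK types; the model is passed in as a function so the loop can be exercised with a stub.

```typescript
// Hypothetical, simplified shapes; the real SDK types differ.
type ToolCall = { id: string; name: string; input: unknown };
type ModelTurn = { text: string; toolCalls: ToolCall[] };
type ToolResult = { toolUseId: string; content: string; isError: boolean };
type ToolHandler = (input: unknown) => Promise<string>;
type CallModel = (results: ToolResult[]) => Promise<ModelTurn>;

async function runToolLoop(
  callModel: CallModel,
  handlers: Record<string, ToolHandler>,
  maxTurns = 5,
): Promise<string> {
  let results: ToolResult[] = [];
  for (let turn = 0; turn < maxTurns; turn++) {
    const reply = await callModel(results);
    // No tool calls means the model has produced its final answer.
    if (reply.toolCalls.length === 0) return reply.text;

    // Run every requested tool; report failures to the model instead of throwing.
    results = await Promise.all(
      reply.toolCalls.map(async (call): Promise<ToolResult> => {
        const handler = handlers[call.name];
        if (!handler) {
          return { toolUseId: call.id, content: `Unknown tool: ${call.name}`, isError: true };
        }
        try {
          return { toolUseId: call.id, content: await handler(call.input), isError: false };
        } catch (err) {
          const message = err instanceof Error ? err.message : String(err);
          return { toolUseId: call.id, content: message, isError: true };
        }
      }),
    );
  }
  // Cap the number of turns so a confused model cannot loop (and bill) forever.
  throw new Error(`Tool loop did not finish within ${maxTurns} turns`);
}
```

The turn cap and the error-as-result convention are the two details that are easy to forget when writing this by hand.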

3. Usage metering (or users will bankrupt you)

One enthusiastic user on your £10/month plan can generate £200 of API costs. Meter every request and weight output tokens (they cost ~5x input):

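// Weights relative to one input token: cache reads ~0.1x, 5-minute cache writes ~1.25x, output ~5x.
export function billableUnits(u: Usage): number {
  return (
    u.input_tokens +
    (u.cache_read_input_tokens ?? 0) / 10 +
    (u.cache_creation_input_tokens ?? 0) * 1.25 +
    u.output_tokens * 5
  );
}
// Before each request:
if ((await getUsage(userId)) >= planLimit) return quotaExceeded();
// After each response:
await recordUsage(userId, billableUnits(message.usage));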
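To make the numbers concrete, here is a self-contained sketch of the same idea with an in-memory store standing in for `getUsage` and `recordUsage`. The weights are assumptions relative to one input token (output 5x, cache reads 1/10, cache writes 1.25x), and `planLimitUnits`, `UsageMeter`, and the budget and price figures are hypothetical placeholders; substitute your provider's current prices and your own margin.

```typescript
// Minimal usage shape; field names follow the usage object used in this section.
type Usage = {
  input_tokens: number;
  output_tokens: number;
  cache_read_input_tokens?: number | null;
  cache_creation_input_tokens?: number | null;
};

// Assumed weights, expressed in "input-token equivalents".
function billableUnits(u: Usage): number {
  return (
    u.input_tokens +
    (u.cache_read_input_tokens ?? 0) / 10 +
    (u.cache_creation_input_tokens ?? 0) * 1.25 +
    u.output_tokens * 5
  );
}

// Convert a monthly API budget into a unit limit. Both arguments are placeholders.
function planLimitUnits(monthlyBudget: number, inputPricePerMillionTokens: number): number {
  return Math.floor((monthlyBudget / inputPricePerMillionTokens) * 1_000_000);
}

// In-memory stand-in for the database-backed getUsage/recordUsage pair.
class UsageMeter {
  private readonly used = new Map<string, number>();
  private readonly limit: number;

  constructor(limit: number) {
    this.limit = limit;
  }

  // Call before each request.
  canStart(userId: string): boolean {
    return (this.used.get(userId) ?? 0) < this.limit;
  }

  // Call after each response; returns the user's running total.
  record(userId: string, usage: Usage): number {
    const total = (this.used.get(userId) ?? 0) + billableUnits(usage);
    this.used.set(userId, total);
    return total;
  }
}

// 1,200 input + 5,000 cached reads + 800 output = 1200 + 500 + 4000 units.
console.log(billableUnits({ input_tokens: 1200, output_tokens: 800, cache_read_input_tokens: 5000 })); // prints 5700
```

Because the check happens before the request and the recording after it, a user can overshoot the limit by at most one request, which is usually an acceptable trade for not having to estimate cost up front.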

4. Prompt caching that actually caches

Prompt caching can cut the cost of cached input tokens by ~90%, but it works by exact prefix match. Interpolate one timestamp into your system prompt and the prefix changes on every request, so you pay full price every time. Rules:

  • System prompt is a frozen constant. No dates, no user names, no feature flags in it.
  • Dynamic context goes in the user message, after the cache breakpoint.
  • Never change the tool list mid-conversation: tools are rendered at the very start of the prefix, so any change invalidates everything cached after them.

export const SYSTEM_PROMPT = [{
  type: "text" as const,
  text: STABLE_INSTRUCTIONS,          // never interpolate into this
  cache_control: { type: "ephemeral" as const },
}];

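Verify it works: usage.cache_read_input_tokens should be non-zero from the second request on, provided the prefix meets the model's minimum cacheable length and the request arrives before the cache entry expires (5 minutes by default).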
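The rules above can be sketched as a request builder that keeps the system block byte-identical and pushes everything that varies into the user turn, plus a small helper for the verification step. `buildRequest`, `cacheHitRatio`, the request shape, and the placeholder instructions are all hypothetical simplifications; the hit ratio assumes `input_tokens` counts only the uncached part of the prompt, with cache reads and writes reported separately.

```typescript
type TextBlock = { type: "text"; text: string; cache_control?: { type: "ephemeral" } };
type ChatRequest = {
  system: TextBlock[];
  messages: { role: "user" | "assistant"; content: string }[];
};

const STABLE_INSTRUCTIONS = "You are the support assistant for an online store."; // placeholder text

// Hypothetical builder: everything that varies per request goes into the user turn.
function buildRequest(userText: string, context: { userName: string; now: Date }): ChatRequest {
  return {
    // Identical on every call, so the cached prefix keeps matching.
    system: [{ type: "text", text: STABLE_INSTRUCTIONS, cache_control: { type: "ephemeral" } }],
    messages: [
      {
        role: "user",
        // Dynamic context lives after the cache breakpoint.
        content: `Customer: ${context.userName}\nCurrent time: ${context.now.toISOString()}\n\n${userText}`,
      },
    ],
  };
}

type CacheUsage = {
  input_tokens: number;
  cache_read_input_tokens?: number | null;
  cache_creation_input_tokens?: number | null;
};

// Share of prompt tokens served from cache; 0 means the prefix never matched.
function cacheHitRatio(u: CacheUsage): number {
  const read = u.cache_read_input_tokens ?? 0;
  const total = u.input_tokens + read + (u.cache_creation_input_tokens ?? 0);
  return total === 0 ? 0 : read / total;
}
```

Logging `cacheHitRatio` per request makes a regression obvious: if someone later interpolates a value into the system prompt, the ratio drops to zero on the next deploy.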

5. A chat component that handles streams properly

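Parse the SSE buffer across chunk boundaries. Network chunks do not line up with events, so naively splitting each chunk on its own breaks any event that straddles two chunks and drops its tokens: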

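const res = await fetch("/api/chat", { method: "POST", body: JSON.stringify({ messages }) });
const reader = res.body!.getReader();
const decoder = new TextDecoder();
let buffer = "";
while (true) {
  const { done, value } = await reader.read();
  if (done) break;
  // stream: true keeps multi-byte characters intact across chunk boundaries
  buffer += decoder.decode(value, { stream: true });
  const events = buffer.split("\n\n");
  buffer = events.pop() ?? "";   // keep the partial event for the next chunk
  for (const event of events) handleEvent(event);
}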
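The loop leaves `handleEvent` undefined, so here is one way the buffering and the event handling might fit together, written as a pure function that can be tested without a network. `createSseParser` is a hypothetical name, and the sketch assumes the `data: {"text": ...}` frames produced by the endpoint in section 1.

```typescript
// Hypothetical incremental parser for "data: <json>" events separated by blank lines.
function createSseParser(onText: (text: string) => void) {
  let buffer = "";
  return {
    push(chunk: string): void {
      buffer += chunk;
      const events = buffer.split("\n\n");
      buffer = events.pop() ?? ""; // the last piece may be an incomplete event
      for (const event of events) {
        for (const line of event.split("\n")) {
          if (!line.startsWith("data: ")) continue; // skip comments and other fields
          const payload = JSON.parse(line.slice("data: ".length)) as { text?: string };
          if (typeof payload.text === "string") onText(payload.text);
        }
      }
    },
  };
}

// Two network chunks that cut the second event in half.
const received: string[] = [];
const parser = createSseParser((t) => received.push(t));
parser.push('data: {"text":"Hel"}\n\ndata: {"te');
parser.push('xt":"lo"}\n\n');
console.log(received.join("")); // prints "Hello"
```

In the read loop, `parser.push(decoder.decode(value, { stream: true }))` would replace the manual split, and `onText` would append to the message being rendered.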

Don't want to write the rest?

All five pieces above are open source (MIT) in agentship-lite — copy them into any Next.js app.

If you want the full SaaS around it — Stripe subscriptions, auth, Postgres schema, plan gating wired to the metering — that's AgentShip, currently £49 early access.
