Every SaaS is adding AI features in 2026. Most teams burn the first two weeks on the same five pieces of plumbing — none of which are the actual product. Here's each one, with working TypeScript for Next.js 15.
1. A streaming endpoint (not a blocking one)
Users won't stare at a spinner for 20 seconds. Stream tokens as they are generated, using server-sent events (SSE):
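// app/api/chat/route.ts
import Anthropic from "@anthropic-ai/sdk";
// `tools` (section 2) and SYSTEM_PROMPT (section 4) are imported from wherever you define them.

const anthropic = new Anthropic(); // reads ANTHROPIC_API_KEY from the environment
const encoder = new TextEncoder();
const encode = (s: string) => encoder.encode(s);

export async function POST(req: Request) {
  // Assumes the client posts { messages } as JSON
  const { messages } = await req.json();
  const runner = anthropic.beta.messages.toolRunner({
    model: "claude-opus-4-8",
    max_tokens: 64000,
    thinking: { type: "adaptive" },
    system: SYSTEM_PROMPT,
    tools,
    messages,
    stream: true,
  });
  const stream = new ReadableStream({
    async start(controller) {
      // The runner yields one message stream per turn of the tool loop
      for await (const messageStream of runner) {
        for await (const event of messageStream) {
          if (event.type === "content_block_delta" && event.delta.type === "text_delta") {
            controller.enqueue(encode(`data: ${JSON.stringify({ text: event.delta.text })}\n\n`));
          }
        }
      }
      controller.close();
    },
  });
  return new Response(stream, {
    headers: { "Content-Type": "text/event-stream", "Cache-Control": "no-cache" },
  });
}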
2. Typed tool handlers
The difference between a chatbot and a product is tools — the model acting on your data. Define them once with Zod; the SDK's tool runner handles the execution loop:
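import { betaZodTool } from "@anthropic-ai/sdk/helpers/beta/zod";
import { z } from "zod";

export const searchOrders = betaZodTool({
  name: "search_orders",
  description: "Look up a customer's orders. Call when the user asks about order status.",
  inputSchema: z.object({ email: z.string().email() }),
  // Tool results go back to the model as text, so serialize the rows
  run: async ({ email }) => JSON.stringify(await db.orders.findByEmail(email)),
});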
No manual agentic loop, no JSON schema by hand, inputs typed end to end.
3. Usage metering (or users will bankrupt you)
One enthusiastic user on your £10/month plan can generate £200 of API costs. Meter every request and weight output tokens (they cost ~5x input):
export function billableUnits(u: Usage): number {
  return (
    u.input_tokens +
    (u.cache_creation_input_tokens ?? 0) * 1.25 + // 5-minute cache writes cost ~1.25x input
    (u.cache_read_input_tokens ?? 0) / 10 + // cache reads cost ~0.1x input
    u.output_tokens * 5
  );
}
// After each response:
await recordUsage(userId, billableUnits(message.usage));
// Before each request:
if (await getUsage(userId) > planLimit) return quotaExceeded();
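The snippet above calls recordUsage and getUsage without defining them. As a rough sketch of what could sit behind them, here is an in-memory meter that buckets units per user per calendar month. The class name, the monthly bucketing, and the Map-based store are all assumptions for illustration; a real app would presumably persist this in Postgres or Redis.

```typescript
// Hypothetical in-memory stand-in for the storage behind recordUsage / getUsage.
class InMemoryMeter {
  private unitsByPeriod = new Map<string, number>();

  // Bucket usage per user per UTC calendar month, e.g. "u1:2026-03".
  private key(userId: string, now: Date): string {
    return `${userId}:${now.toISOString().slice(0, 7)}`;
  }

  record(userId: string, units: number, now: Date = new Date()): void {
    const k = this.key(userId, now);
    this.unitsByPeriod.set(k, (this.unitsByPeriod.get(k) ?? 0) + units);
  }

  get(userId: string, now: Date = new Date()): number {
    return this.unitsByPeriod.get(this.key(userId, now)) ?? 0;
  }
}

// Worked example of the weighting:
// 2,000 input tokens + 10,000 cache-read tokens / 10 + 500 output tokens * 5
const exampleUnits = 2_000 + 10_000 / 10 + 500 * 5; // 5500
```

Because the key includes the month, quotas reset on their own when the calendar rolls over, with no cleanup job needed for the check itself.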
4. Prompt caching that actually caches
Prompt caching can cut input costs ~90% — but it's a prefix match. One interpolated timestamp in your system prompt and you pay full price on every request. Rules:
- System prompt is a frozen constant. No dates, no user names, no feature flags in it.
- Dynamic context goes in the user message, after the cache breakpoint.
- Never change the tool list mid-conversation — tools render at position 0 of the prefix.
export const SYSTEM_PROMPT = [{
  type: "text" as const,
  text: STABLE_INSTRUCTIONS, // never interpolate into this
  cache_control: { type: "ephemeral" as const },
}];
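Verify it works: usage.cache_read_input_tokens should be non-zero from the second request on, provided the cached prefix meets the model's minimum cacheable length and the request arrives before the cache entry expires.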
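To make the rules concrete, here is a minimal sketch of keeping per-request details out of the system prompt, plus a helper for checking how much of a prompt was served from cache. The names buildUserMessage, cacheHitRatio, DynamicContext, and the "[context]" header format are hypothetical; only the usage field names come from the API's usage object.

```typescript
// Hypothetical shapes for illustration; the field names mirror the usage object.
interface CacheUsage {
  input_tokens: number;
  cache_read_input_tokens?: number | null;
  cache_creation_input_tokens?: number | null;
}

interface DynamicContext {
  userName: string;
  now: Date;
}

// Per-request details ride along in the user turn, so the system prompt stays byte-identical.
function buildUserMessage(ctx: DynamicContext, text: string) {
  const header = `[context] user=${ctx.userName} time=${ctx.now.toISOString()}`;
  return { role: "user" as const, content: `${header}\n\n${text}` };
}

// Fraction of prompt tokens served from cache (0 when nothing was read).
function cacheHitRatio(u: CacheUsage): number {
  const read = u.cache_read_input_tokens ?? 0;
  const written = u.cache_creation_input_tokens ?? 0;
  const total = u.input_tokens + read + written;
  return total === 0 ? 0 : read / total;
}
```

Logging the ratio per request is one way to notice a regression early: if someone interpolates a value into the prefix, the ratio should drop to zero on the next deploy.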
5. A chat component that handles streams properly
Parse the SSE buffer across chunk boundaries — the naive split on every chunk drops tokens:
let buffer = "";
while (true) {
  const { done, value } = await reader.read();
  if (done) break;
  buffer += decoder.decode(value, { stream: true });
  const lines = buffer.split("\n\n");
  buffer = lines.pop() ?? ""; // keep the partial event for the next chunk
  for (const line of lines) handleEvent(line);
}
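The same buffering idea can be pulled out into a small function that is easy to test without a network connection. This is a sketch only: createSseParser is a hypothetical name, and it assumes the server sends events framed as "data: ..." followed by a blank line, as the endpoint in section 1 does.

```typescript
// Returns a feed function; call it with each decoded chunk, in order.
function createSseParser(onData: (payload: string) => void) {
  let buffer = "";
  return function feed(chunk: string): void {
    buffer += chunk;
    const events = buffer.split("\n\n");
    buffer = events.pop() ?? ""; // the last piece may be an incomplete event
    for (const event of events) {
      for (const line of event.split("\n")) {
        if (line.startsWith("data: ")) onData(line.slice("data: ".length));
      }
    }
  };
}

// Usage: an event split across two chunks still arrives whole.
const seen: string[] = [];
const feed = createSseParser((payload) => seen.push(JSON.parse(payload).text));
feed('data: {"text":"Hel');
feed('lo"}\n\ndata: {"text":" world"}\n\n');
// seen is now ["Hello", " world"]
```

Keeping the parser free of fetch and React makes the chunk-boundary case a one-line unit test rather than something found in production.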
Don't want to write the rest?
All five pieces above are open source (MIT) in agentship-lite — copy them into any Next.js app.
If you want the full SaaS around it — Stripe subscriptions, auth, Postgres schema, plan gating wired to the metering — that's AgentShip, currently £49 early access.