DEV Community: GDS K S

Streaming LLM responses in TypeScript: SSE, ReadableStream, and the React 19 useChat hook.

GDS K S — Mon, 20 Jul 2026 06:48:18 +0000

The first time I wired an LLM response that streamed token by token instead of arriving as one lump after 4 seconds, I shipped it to production the same afternoon. The difference in perceived speed is that obvious to anyone who has used ChatGPT and then tried a non-streaming competitor. Users will wait 8 seconds if they can see the cursor moving. They will not wait 4 seconds for a blank screen.

This tutorial walks the full stack from scratch. By the end you have a working Next.js API route that streams from an LLM over Server-Sent Events, a frontend that parses the stream manually with ReadableStream, and then the same UI rebuilt with the Vercel AI SDK's useChat hook so you can see what the abstraction actually buys you.

TL;DR

Layer	What you build	Key API
Next.js route	Streams LLM output as SSE	`streamText` + `toUIMessageStreamResponse`
Vanilla client	Parses the stream by hand	`ReadableStream`, `TextDecoderStream`
React 19 client	Managed state + cancellation	`useChat` from `@ai-sdk/react`
Edge cases	Backpressure, cancel, tool chunks	`AbortController`, partial JSON guard

1. Why streaming matters

A standard fetch returns after the entire response body is ready. For short completions, that is fine. For anything over 100 tokens, users see a spinner, then a wall of text, then confusion about whether the app is fast or slow.

Streaming changes the shape of that experience. The first token lands in under 300ms for most hosted models. The user starts reading while the model is still writing. Perceived latency drops by 60 to 70 percent even if the total time to complete the response does not change.

Cost transparency is the second reason to care. When you stream, you count tokens as they arrive. If your route has a budget ceiling and the response is going to blow past it, you can cut the stream at 800 tokens without ever waiting for the full completion. That cut is not possible with a blocking call.
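Here is a minimal sketch of that cut. It assumes you consume the text as an `AsyncIterable<string>`, for example the `textStream` that `streamText` exposes. The four-characters-per-token estimate is a rough assumption for English text, not a real tokenizer, so swap in your provider's token counts if you need accuracy. `takeWithinBudget` is a hypothetical helper name.

```typescript
// Rough heuristic (assumption): about 4 characters per token for English text.
const estimateTokens = (text: string): number => Math.ceil(text.length / 4);

// Hypothetical helper: collect streamed text until an estimated token budget is hit.
// Returning from inside for-await calls return() on the iterator. For a
// ReadableStream-backed iterable, that cancels the underlying stream.
async function takeWithinBudget(
  chunks: AsyncIterable<string>,
  maxTokens: number,
): Promise<{ text: string; truncated: boolean }> {
  let text = "";
  for await (const chunk of chunks) {
    if (estimateTokens(text + chunk) > maxTokens) {
      return { text, truncated: true }; // stop before exceeding the budget
    }
    text += chunk;
  }
  return { text, truncated: false };
}
```

The design choice is to check the budget before appending each chunk. The returned text therefore never exceeds the estimate, and the caller learns from `truncated` whether the cut happened.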

2. The plumbing in one diagram

Here is how the pieces connect for a streaming chat request:

Browser                     Next.js API Route          LLM Provider
  |                               |                         |
  |-- POST /api/chat ------------>|                         |
  |   { messages: [...] }         |                         |
  |                               |-- streamText() -------->|
  |                               |                         |
  |<-- HTTP 200 (SSE stream) -----|<-- token chunks --------|
  |   Content-Type: text/event-stream                       |
  |                               |                         |
  | chunk: "The "                 |                         |
  | chunk: "quick "               |                         |
  | chunk: "brown "               |                         |
  |   ...                         |                         |
  | [DONE]                        |                         |

The route opens a persistent HTTP response with Content-Type: text/event-stream. Each chunk the LLM returns gets written to that response immediately. The browser receives chunks as they arrive and renders them without waiting for the response to close.
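To see what `text/event-stream` means on the wire without the SDK, here is a minimal hand-rolled sketch that frames each chunk as an SSE `data:` event. `toSseResponse` is a hypothetical helper. The `{"text": ...}` payload shape and the final `[DONE]` sentinel are assumptions borrowed from common LLM APIs. The AI SDK writes its own event format.

```typescript
// Hypothetical helper: wrap an async source of text chunks in an SSE Response.
function toSseResponse(chunks: AsyncIterable<string>): Response {
  const encoder = new TextEncoder();
  const iterator = chunks[Symbol.asyncIterator]();

  const body = new ReadableStream<Uint8Array>({
    // pull() runs when the stream wants more data for the client
    async pull(controller) {
      const { value, done } = await iterator.next();
      if (done) {
        controller.enqueue(encoder.encode("data: [DONE]\n\n"));
        controller.close();
        return;
      }
      // Each SSE event is "data: <payload>" followed by a blank line
      controller.enqueue(encoder.encode(`data: ${JSON.stringify({ text: value })}\n\n`));
    },
    // The client went away: stop the upstream source as well
    async cancel() {
      await iterator.return?.();
    },
  });

  return new Response(body, {
    headers: {
      "Content-Type": "text/event-stream",
      "Cache-Control": "no-cache",
    },
  });
}
```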

3. The backend: a streaming Next.js route

Install the Vercel AI SDK and the Anthropic provider:

npm install ai @ai-sdk/anthropic

Create app/api/chat/route.ts:

import { streamText, UIMessage, convertToModelMessages } from "ai";
import { anthropic } from "@ai-sdk/anthropic";

export const runtime = "edge";

export async function POST(req: Request) {
  const { messages }: { messages: UIMessage[] } = await req.json();

  const result = streamText({
    model: anthropic("claude-sonnet-4-5-20250929"),
    messages: await convertToModelMessages(messages),
    maxOutputTokens: 1024,
    onError: ({ error }) => {
      console.error("[chat] stream error:", error);
    },
  });

  return result.toUIMessageStreamResponse();
}

Three things worth noting here.

runtime = "edge" runs the route on the V8 isolate runtime, where returning a streaming Response needs no Node.js stream adapter. Route handlers on the Node.js runtime also return a web Response, so toUIMessageStreamResponse() works there unchanged. pipeUIMessageStreamToResponse(res) is the counterpart for plain Node HTTP servers such as Express, where you hold a ServerResponse instead.

The model ID pins a dated snapshot. An undated alias resolves to whichever snapshot Anthropic currently points it at, and that can change when a new one ships. A pinned ID means model behavior cannot shift under you between deploys.

onError is a logging hook for stream-level errors. Once streaming starts, the response has already gone out with a 200 status. The SDK therefore reports later failures in-band, as an error event in the stream, rather than crashing the server or returning an HTTP 500. On the client, useChat surfaces those failures through its error field and "error" status. In a hand-rolled client, watch for the error event yourself.

What the SSE wire format looks like

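Open the Network tab, select the request, and you will see events like this:

data: {"type":"start","messageId":"msg_01ABC..."}
data: {"type":"start-step"}
data: {"type":"text-start","id":"txt-0"}
data: {"type":"text-delta","id":"txt-0","delta":"The "}
data: {"type":"text-delta","id":"txt-0","delta":"quick "}
data: {"type":"text-delta","id":"txt-0","delta":"brown fox"}
data: {"type":"text-end","id":"txt-0"}
data: {"type":"finish-step"}
data: {"type":"finish"}
data: [DONE]

Each event is a standard SSE data: line carrying a JSON object, and its type field says what the event means:

- text-delta carries the next slice of text in delta.
- start and finish bracket the message.
- text-start and text-end bracket one text part.
- data: [DONE] ends the stream.

This is the AI SDK's UI message stream protocol layered on top of SSE, not the provider's raw format. Anthropic's API streams events such as content_block_delta, and OpenAI's uses data: {"choices":[...]} lines. The SDK normalizes both into the shape above.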

4. The frontend: parsing the stream manually

Before reaching for useChat, build the vanilla version. Knowing what happens at this layer means you can debug anything the abstraction hides.

// hooks/useManualStream.ts
import { useState, useRef } from "react";

export function useManualStream() {
  const [output, setOutput] = useState("");
  const [status, setStatus] = useState<"idle" | "streaming" | "done" | "error">("idle");
  const abortRef = useRef<AbortController | null>(null);

  async function send(userMessage: string) {
    abortRef.current?.abort();
    const controller = new AbortController();
    abortRef.current = controller;

    setOutput("");
    setStatus("streaming");

    try {
      const res = await fetch("/api/chat", {
        method: "POST",
        headers: { "Content-Type": "application/json" },
        body: JSON.stringify({
          messages: [{ role: "user", parts: [{ type: "text", text: userMessage }] }],
        }),
        signal: controller.signal,
      });

      if (!res.ok || !res.body) {
        throw new Error(`HTTP ${res.status}`);
      }

      const reader = res.body
        .pipeThrough(new TextDecoderStream())
        .getReader();

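      let accumulated = "";
      let buffer = ""; // holds a partial line until the rest of it arrives

      while (true) {
        const { value, done } = await reader.read();
        if (done) break;

        // A read can end mid-line, so keep the last (possibly incomplete) line for later
        buffer += value;
        const lines = buffer.split("\n");
        buffer = lines.pop() ?? "";

        for (const line of lines) {
          const trimmed = line.trim();
          // SSE events look like: data: {"type":"text-delta","id":"...","delta":"The "}
          if (!trimmed.startsWith("data:")) continue;
          const payload = trimmed.slice("data:".length).trim();
          if (payload === "[DONE]") continue;

          try {
            const event = JSON.parse(payload) as { type: string; delta?: string };
            if (event.type === "text-delta" && event.delta) {
              accumulated += event.delta;
              setOutput(accumulated);
            }
          } catch {
            // Guard: ignore any data line that is not valid JSON
          }
        }
      }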

      setStatus("done");
    } catch (err) {
      if ((err as Error).name === "AbortError") {
        setStatus("idle");
      } else {
        console.error("[stream] failed:", err);
        setStatus("error");
      }
    }
  }

  function cancel() {
    abortRef.current?.abort();
    setStatus("idle");
  }

  return { output, status, send, cancel };
}

A usage example in a component:

// app/page.tsx (vanilla version)
"use client";
import { useState } from "react";
import { useManualStream } from "@/hooks/useManualStream";

export default function ChatPage() {
  const [input, setInput] = useState("");
  const { output, status, send, cancel } = useManualStream();

  return (
    <main style={{ maxWidth: 680, margin: "0 auto", padding: "2rem" }}>
      <div style={{ minHeight: 200, marginBottom: "1rem", whiteSpace: "pre-wrap" }}>
        {output}
        {status === "streaming" && <span aria-live="polite"> ▋</span>}
      </div>
      <textarea
        value={input}
        onChange={(e) => setInput(e.target.value)}
        rows={3}
        style={{ width: "100%", marginBottom: "0.5rem" }}
      />
      <button onClick={() => send(input)} disabled={status === "streaming"}>
        Send
      </button>
      {status === "streaming" && (
        <button onClick={cancel} style={{ marginLeft: "0.5rem" }}>
          Stop
        </button>
      )}
    </main>
  );
}

No conversation history, no multi-turn context. The point is to see the ReadableStream + TextDecoderStream pipeline in isolation before you add state management.

5. The frontend: the useChat version

The Vercel AI SDK's useChat hook handles message history, status transitions, streaming, and cancellation. Since AI SDK 5 it lives in the separate @ai-sdk/react package, so install that alongside ai:

npm install ai @ai-sdk/react

// app/chat/page.tsx (useChat version)
"use client";
import { useChat } from "@ai-sdk/react";
import { DefaultChatTransport } from "ai";

export default function ChatPage() {
  const { messages, sendMessage, status, stop } = useChat({
    transport: new DefaultChatTransport({ api: "/api/chat" }),
  });

  function handleSubmit(e: React.FormEvent<HTMLFormElement>) {
    e.preventDefault();
    const form = e.currentTarget;
    const text = new FormData(form).get("message") as string;
    if (!text.trim()) return;
    sendMessage({ text });
    form.reset();
  }

  return (
    <main style={{ maxWidth: 680, margin: "0 auto", padding: "2rem" }}>
      <ul style={{ listStyle: "none", padding: 0, minHeight: 300 }}>
        {messages.map((m) => (
          <li key={m.id} style={{ marginBottom: "1rem" }}>
            <strong>{m.role === "user" ? "You" : "AI"}:</strong>{" "}
            {m.parts
              .filter((p) => p.type === "text")
              .map((p, i) => <span key={i}>{p.type === "text" ? p.text : ""}</span>)}
          </li>
        ))}
      </ul>
      {status === "streaming" && (
        <button onClick={stop} style={{ marginBottom: "0.5rem" }}>
          Stop generating
        </button>
      )}
      <form onSubmit={handleSubmit} style={{ display: "flex", gap: "0.5rem" }}>
        <input name="message" style={{ flex: 1 }} autoComplete="off" />
        <button type="submit" disabled={status === "streaming" || status === "submitted"}>
          Send
        </button>
      </form>
    </main>
  );
}

The hook's status field cycles through "submitted" (request sent, waiting for first token), "streaming" (chunks arriving), "ready" (complete), and "error". That four-state machine replaces the manual tracking in the vanilla version.
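One way to picture that lifecycle is as a small reducer. This is a hypothetical model for illustration, not useChat's implementation. The event names are assumptions.

```typescript
type ChatStatus = "submitted" | "streaming" | "ready" | "error";
type ChatEvent = "send" | "firstChunk" | "finish" | "fail" | "stop";

// Hypothetical transition function mirroring the lifecycle described above
function nextStatus(status: ChatStatus, event: ChatEvent): ChatStatus {
  switch (event) {
    case "send":
      return "submitted"; // request sent, waiting for the first token
    case "firstChunk":
      return status === "submitted" ? "streaming" : status;
    case "finish":
    case "stop":
      // Both a normal finish and a user stop land back in "ready"
      return status === "submitted" || status === "streaming" ? "ready" : status;
    case "fail":
      return "error";
  }
}
```

The Send button's disabled check in the component above amounts to treating "submitted" and "streaming" as the two busy states.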

messages is an array of UIMessage objects. Each message has a parts array rather than a flat content string. That structure supports tool call results, images, and file attachments alongside text in the same message. For text-only chat, filter parts to type === "text".
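For text-only rendering, a small helper keeps that filter in one place. The part types below are a simplified, hypothetical subset of the SDK's UIMessage type, and real messages carry more part kinds. Treat this as a sketch.

```typescript
// Simplified stand-in for UIMessage; the real type has more part variants
type Part =
  | { type: "text"; text: string }
  | { type: "file"; mediaType: string; url: string }
  | { type: `tool-${string}`; toolCallId: string };

type Message = { id: string; role: "user" | "assistant" | "system"; parts: Part[] };

// Join every text part in order, skipping files, tool calls, and anything else
function textOf(message: Message): string {
  return message.parts
    .filter((p): p is Extract<Part, { type: "text" }> => p.type === "text")
    .map((p) => p.text)
    .join("");
}
```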

stop() aborts the underlying fetch through an AbortController. The stream closes on the client and status returns to "ready". Whether the provider call also stops depends on the server side, covered under Cancellation below.

6. Edge cases

Backpressure

streamText's stream is pull-based: the SDK reads the next chunk from the provider only when the consumer asks for one. As a result, a slow client does not make your server buffer the whole completion in memory. Backpressure bounds what your process holds; it is not a throttle on the model itself.

In the manual version, the while (true) { reader.read() } loop applies backpressure automatically, because the next read does not start until the previous chunk has been handled. If you split the stream with tee(), the faster branch sets the pace and unread chunks pile up in the slower branch's queue. For most chat UIs, one reader is correct.
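To see pull-based backpressure in isolation, here is a minimal sketch with a hypothetical counting source. With highWaterMark set to 0, the stream calls pull() only when a read() is waiting, so the producer never runs ahead of the consumer.

```typescript
// Hypothetical producer that records how many chunks it has generated
function countingSource(stats: { produced: number }): ReadableStream<string> {
  return new ReadableStream<string>(
    {
      pull(controller) {
        stats.produced += 1; // runs only when a consumer is waiting for data
        controller.enqueue(`chunk ${stats.produced}`);
      },
    },
    { highWaterMark: 0 }, // no read-ahead buffer
  );
}

// Read n chunks and report how many the producer generated along the way
async function readN(n: number): Promise<{ chunks: string[]; produced: number }> {
  const stats = { produced: 0 };
  const reader = countingSource(stats).getReader();
  const chunks: string[] = [];
  for (let i = 0; i < n; i++) {
    const { value, done } = await reader.read();
    if (done) break;
    chunks.push(value);
  }
  reader.releaseLock();
  return { chunks, produced: stats.produced };
}
```

With the default highWaterMark of 1, the same source would generate one chunk ahead of the reader to keep its queue filled.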

Cancellation

Both versions above cancel via AbortController. In the manual version, aborting the fetch errors the response body, so the pending read() rejects with an AbortError. Catch it and reset state. In the useChat version, stop() handles the same thing internally.

One gotcha: if you cancel on the client, the server-side streamText call keeps running until the server notices the closed connection and propagates the abort. Pass abortSignal: req.signal to streamText so the request's abort signal reaches the provider call instead of letting it run to completion. On the edge runtime this propagation is fast, under 100ms in practice. On the Node runtime with some hosting providers, the server-side model call may run an extra second or two before it halts. Budget for that if you charge per token.
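The propagation described above rests on one stream mechanism. Cancelling a reader calls the cancel() hook of the stream's underlying source, which is where a server would stop its upstream work. Here is a minimal sketch, with a hypothetical token source standing in for the provider call.

```typescript
// Hypothetical stand-in for a provider stream: emits numbered tokens on demand
function tokenSource(log: string[]): ReadableStream<string> {
  let n = 0;
  return new ReadableStream<string>({
    pull(controller) {
      controller.enqueue(`token${n++} `);
    },
    // Called when the consumer cancels; a real server would abort the provider request here
    cancel(reason) {
      log.push(`upstream cancelled: ${String(reason)}`);
    },
  });
}

// Read two tokens, then cancel the way a Stop button would
async function readTwoThenStop(log: string[]): Promise<string> {
  const reader = tokenSource(log).getReader();
  let text = "";
  for (let i = 0; i < 2; i++) {
    const { value, done } = await reader.read();
    if (done) break;
    text += value;
  }
  await reader.cancel("user pressed Stop");
  return text;
}
```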

Partial JSON for tool calls

When you add tools to streamText, the stream includes tool-input events that carry the tool arguments as partial JSON fragments:

data: {"type":"tool-input-start","toolCallId":"call_01","toolName":"search"}
data: {"type":"tool-input-delta","toolCallId":"call_01","inputTextDelta":"{\"q"}
data: {"type":"tool-input-delta","toolCallId":"call_01","inputTextDelta":"uery\":\""}
data: {"type":"tool-input-delta","toolCallId":"call_01","inputTextDelta":"typescript\"}"}
data: {"type":"tool-input-available","toolCallId":"call_01","toolName":"search","input":{"query":"typescript"}}

tool-input-delta events carry fragments of the arguments. Do not try to parse them until tool-input-available confirms the call finished; that event also carries the complete, parsed input. If you parse on each delta, JSON.parse will throw on every fragment except the last. The useChat hook handles this accumulation internally. In the manual version, accumulate inputTextDelta into a buffer keyed by toolCallId if you want to preview the arguments as they stream. Take the final value from tool-input-available.
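The buffer-then-parse pattern above can be sketched independently of the wire format. The two event kinds below are hypothetical and protocol-neutral, so map the stream's tool chunks onto them. `ToolArgsAccumulator` is an illustrative name, not an SDK class.

```typescript
// Hypothetical, protocol-neutral events; map the wire format's tool chunks onto these
type ToolEvent =
  | { kind: "delta"; toolCallId: string; textDelta: string }
  | { kind: "done"; toolCallId: string };

// Buffers argument fragments per call and parses each call exactly once, on completion
class ToolArgsAccumulator {
  private buffers = new Map<string, string>();
  readonly completed = new Map<string, unknown>();

  handle(event: ToolEvent): void {
    if (event.kind === "delta") {
      const prev = this.buffers.get(event.toolCallId) ?? "";
      this.buffers.set(event.toolCallId, prev + event.textDelta);
      return;
    }
    const raw = this.buffers.get(event.toolCallId) ?? "";
    this.buffers.delete(event.toolCallId);
    // Safe to parse now; a tool with no arguments may have sent no fragments at all
    this.completed.set(event.toolCallId, raw === "" ? {} : JSON.parse(raw));
  }

  // Partial text for a live preview; deliberately never parsed here
  preview(toolCallId: string): string {
    return this.buffers.get(toolCallId) ?? "";
  }
}
```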

The bottom line

Streaming is not a polish feature. For any LLM feature that generates more than two sentences, the choice between streaming and blocking decides the entire perceived quality of the interaction.

The manual ReadableStream path takes about 40 lines of TypeScript and gives you full control over chunk handling, cancellation, and partial JSON. The useChat path from the Vercel AI SDK cuts that to a dozen lines and adds multi-turn history, status management, and tool call accumulation for free.

Start with the manual version once so you understand what the wire looks like. Then move to useChat for anything you ship. The abstraction earns its overhead at the point where you need tool calls, file parts, or reconnect logic, and you will not want to rebuild those from scratch.

What is the first LLM feature in your app where you wished the response arrived token by token instead of all at once? Drop a comment with the use case.

GDS K S · thegdsks.com · follow on X @thegdsks

Streaming is not about raw speed. Give users something to read while the model finishes thinking.

Type-safe LLM outputs with Zod: stop guessing what the model returns.

GDS K S — Wed, 15 Jul 2026 03:34:13 +0000

I shipped a classifier to production in January. The prompt asked for JSON with a single category field. For three weeks it worked fine. Then the model started returning {"category":"bug","explanation":"this looks like a crash"} and the consumer threw a runtime error because it only expected one key. No schema change, no deploy. The model just decided to be helpful.

Zod plus a bit of discipline around the parse step closes that gap. This tutorial walks through defining schemas for LLM output shapes, using them with the Vercel AI SDK and the raw Anthropic SDK, and building a retry loop that handles the cases where the model still gets it wrong.

TL;DR

Step	What	Why
Define Zod schema	Describe the shape you want	Single source of truth for your types
Use generateText with Output.object	Vercel AI SDK path	Schema-enforced, provider-agnostic
Force a tool call with tool_choice	Anthropic SDK path	Forces structured output without extra wrappers
Parse and retry on failure	ZodError catches drift	Recovers without crashing callers

1. The problem: free-form JSON is a contract nobody signed

Most LLM tutorials show JSON.parse(response) and call it a day. The problem is that the model never agreed to your schema. Ask it to return {"category": "bug"} and it might return:

{"category": "bug"} (correct)
{"Category": "Bug"} (wrong casing)
{"category": "bug", "confidence": 0.9} (extra field)
{"error": "I cannot classify this"} (helpful, but not your schema)
A markdown fence wrapping the JSON because the model felt polite

Without a parse step that actually validates the shape, every one of those paths silently corrupts downstream data.

The fix is three lines of Zod plus one .safeParse() call. Every technique in this article builds on that pattern, whether you use the Vercel AI SDK, the raw Anthropic SDK, or both.

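import { z } from "zod";

const ClassifyResult = z.object({
  category: z.enum(["bug", "feature", "question"]),
});

type ClassifyResult = z.infer<typeof ClassifyResult>;

// rawOutput: the model's text response, obtained elsewhere
declare const rawOutput: string;

// At runtime: JSON.parse throws on non-JSON text, so guard it before validating
let candidate: unknown;
try {
  candidate = JSON.parse(rawOutput);
} catch {
  candidate = undefined; // safeParse below reports the failure
}

const parsed = ClassifyResult.safeParse(candidate);
if (!parsed.success) {
  // parsed.error is a ZodError with field-level detail
  console.error("Shape violation:", parsed.error.issues);
}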
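safeParse catches shape drift, but the markdown-fence case from the list above fails one step earlier, inside JSON.parse. A small pre-parse step handles it. `parseModelJson` is a hypothetical helper, and the regex assumes the whole reply is a single fenced block, optionally tagged json.

```typescript
// Matches a reply wrapped in one markdown code fence, optionally tagged "json".
// \x60 is the backtick character, so \x60{3} is the three-backtick fence marker.
const FENCED = /^\s*\x60{3}(?:json)?\s*\n([\s\S]*?)\n?\s*\x60{3}\s*$/i;

type JsonResult = { ok: true; value: unknown } | { ok: false; error: string };

// Hypothetical helper: unwrap a fenced reply, then JSON.parse without throwing
function parseModelJson(raw: string): JsonResult {
  const body = raw.match(FENCED)?.[1] ?? raw;
  try {
    return { ok: true, value: JSON.parse(body) };
  } catch (err) {
    return { ok: false, error: err instanceof Error ? err.message : String(err) };
  }
}
```

Pass value on to ClassifyResult.safeParse() only when ok is true. Otherwise treat the reply as a failed attempt.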

Install Zod 4 (currently the stable major):

npm install zod@^4.0.0

The core APIs (z.object, z.string, z.enum, z.discriminatedUnion, z.infer) all carry over from Zod 3. If you're already on Zod 3, the migration is largely additive for these use cases.

2. Defining schemas for LLM output shapes

LLM outputs tend to fall into three shapes: flat classifiers, richer extractors, and discriminated results where the model picks a branch. Zod handles all three.

Flat classifier

import { z } from "zod";

export const SentimentSchema = z.object({
  sentiment: z.enum(["positive", "negative", "neutral"]),
  score: z.number().min(-1).max(1),
});

export type Sentiment = z.infer<typeof SentimentSchema>;

Structured extractor

export const InvoiceSchema = z.object({
  vendor: z.string(),
  amount_usd: z.number().positive(),
  due_date: z.string().regex(/^\d{4}-\d{2}-\d{2}$/),
  line_items: z.array(
    z.object({
      description: z.string(),
      quantity: z.number().int().positive(),
      unit_price: z.number().positive(),
    })
  ),
});

export type Invoice = z.infer<typeof InvoiceSchema>;

Discriminated union for multi-intent routing

This is the shape to reach for when the model has to pick one of several distinct output paths, each with its own fields, rather than a value from a flat enum.

export const RoutingResult = z.discriminatedUnion("intent", [
  z.object({
    intent: z.literal("search"),
    query: z.string(),
    filters: z.array(z.string()).optional(),
  }),
  z.object({
    intent: z.literal("create"),
    resource_type: z.string(),
    fields: z.record(z.string(), z.unknown()),
  }),
  z.object({
    intent: z.literal("clarify"),
    question: z.string(),
  }),
]);

export type RoutingResult = z.infer<typeof RoutingResult>;

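Zod checks the intent field first and validates against only the matching branch. An object with intent: "search" must carry query. An object with intent: "create" and only a query field fails cleanly, because resource_type and fields are missing. Unknown keys are stripped rather than rejected unless you make a branch strict.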
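Once parsed, the union narrows cleanly in a switch. The type below is written by hand to match what z.infer<typeof RoutingResult> produces, so the sketch runs without Zod. `handleRouting` and its return strings are hypothetical.

```typescript
// Hand-written equivalent of z.infer<typeof RoutingResult>
type RoutingResult =
  | { intent: "search"; query: string; filters?: string[] }
  | { intent: "create"; resource_type: string; fields: Record<string, unknown> }
  | { intent: "clarify"; question: string };

function handleRouting(result: RoutingResult): string {
  switch (result.intent) {
    case "search":
      // TypeScript knows query exists on this branch
      return `search: ${result.query} [${(result.filters ?? []).join(", ")}]`;
    case "create":
      return `create ${result.resource_type} with ${Object.keys(result.fields).length} field(s)`;
    case "clarify":
      return `ask user: ${result.question}`;
    default: {
      // Adding a fourth intent to the type makes this assignment a compile error
      const unreachable: never = result;
      return unreachable;
    }
  }
}
```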

3. Vercel AI SDK: generateText with Output.object

The Vercel AI SDK added schema-native structured output in recent versions. You pass a Zod schema through Output.object and the SDK handles the prompt scaffolding and parse step for you.

import { generateText, Output } from "ai";
import { anthropic } from "@ai-sdk/anthropic";
import { z } from "zod";

const SentimentSchema = z.object({
  sentiment: z.enum(["positive", "negative", "neutral"]),
  score: z.number().min(-1).max(1),
});

async function analyzeSentiment(text: string) {
  const { output } = await generateText({
    model: anthropic("claude-sonnet-4-5-20250929"),
    output: Output.object({ schema: SentimentSchema }),
    prompt: `Analyze the sentiment of this text: "${text}"`,
  });

  // output is typed as { sentiment: "positive" | "negative" | "neutral"; score: number }
  return output;
}

// Usage
const result = await analyzeSentiment("The deploy went sideways at 3am.");
console.log(result.sentiment); // TypeScript knows this is the enum
console.log(result.score);     // TypeScript knows this is a number

The return type flows from the schema without any casting. If the model returns something that does not match, the SDK throws before the result reaches your code.

For streaming partial objects as they arrive:

import { streamText, Output } from "ai";
import { anthropic } from "@ai-sdk/anthropic";
// InvoiceSchema is the extractor schema defined in section 2

const { partialOutputStream } = streamText({
  model: anthropic("claude-sonnet-4-5-20250929"),
  output: Output.object({ schema: InvoiceSchema }),
  prompt: "Extract the invoice details from this text: ...",
});

for await (const partial of partialOutputStream) {
  // partial is a deep-partial Invoice: fields fill in as they arrive
  console.log(partial);
}

Add an onError callback for stream errors since they arrive in-band rather than as thrown exceptions.

4. Raw Anthropic SDK: tool use as structured output

If you use the Anthropic SDK directly and want schema-enforced output without the Vercel SDK wrapper, the reliable path is tool use with tool_choice forced to your schema tool.

The idea: define a "tool" whose input_schema describes the JSON shape you want. Force the model to call that tool with tool_choice: { type: "tool", name: "..." }. The model then returns a tool_use block with structured input instead of free text. You parse that input with Zod.

import Anthropic from "@anthropic-ai/sdk";
import { z } from "zod";

const client = new Anthropic({ apiKey: process.env.ANTHROPIC_API_KEY });

// 1. Define your Zod schema
const ClassifyResult = z.object({
  category: z.enum(["bug", "feature", "question"]),
  confidence: z.number().min(0).max(1),
  reasoning: z.string(),
});

type ClassifyResult = z.infer<typeof ClassifyResult>;

// 2. Mirror the schema in JSON Schema for the tool definition
const classifyTool = {
  name: "classify_ticket",
  description: "Classify a support ticket into exactly one category with a confidence score.",
  input_schema: {
    type: "object" as const,
    properties: {
      category: {
        type: "string",
        enum: ["bug", "feature", "question"],
        description: "The category that best fits the ticket.",
      },
      confidence: {
        type: "number",
        description: "Confidence score from 0 to 1.",
      },
      reasoning: {
        type: "string",
        description: "One sentence explaining the classification.",
      },
    },
    required: ["category", "confidence", "reasoning"],
  },
};

async function classifyTicket(text: string): Promise<ClassifyResult> {
  const response = await client.messages.create({
    model: "claude-haiku-4-5-20251001",
    max_tokens: 512,
    tools: [classifyTool],
    // Force the model to call this specific tool
    tool_choice: { type: "tool", name: "classify_ticket" },
    messages: [
      {
        role: "user",
        content: `Classify this support ticket: "${text}"`,
      },
    ],
  });

  // 3. Extract the tool_use block from the response
  const toolUseBlock = response.content.find((b) => b.type === "tool_use");
  if (!toolUseBlock || toolUseBlock.type !== "tool_use") {
    throw new Error("Model did not return a tool_use block");
  }

  // 4. Validate with Zod
  const parsed = ClassifyResult.safeParse(toolUseBlock.input);
  if (!parsed.success) {
    throw new Error(`Schema violation: ${JSON.stringify(parsed.error.issues)}`);
  }

  return parsed.data;
}

The tool_choice parameter with type: "tool" is the key detail. Without it, the model may choose to answer in plain text. With it, the response always comes back as a structured tool_use block.

One practical note: you define the schema twice here, once in Zod and once in JSON Schema. For small schemas that duplication is tolerable. For larger ones, generate the input_schema from the Zod definition instead. Zod 4 ships z.toJSONSchema() for this, and the zod-to-json-schema package on npm covers Zod 3.

5. Handling parse failures: retry and repair

Even with forced tool use and schema prompting, models occasionally return output that fails validation. Edge-case inputs produce unexpected values, and a response cut off at max_tokens can leave the tool input incomplete. Network errors, by contrast, fail the call outright. Build the retry loop before you need it.

Simple retry

async function classifyWithRetry(
  text: string,
  maxAttempts = 3
): Promise<ClassifyResult> {
  let lastError: unknown;

  for (let attempt = 1; attempt <= maxAttempts; attempt++) {
    try {
      return await classifyTicket(text);
    } catch (err) {
      lastError = err;
      console.warn(`[classify] attempt ${attempt} failed:`, err);

      if (attempt < maxAttempts) {
        // Small back-off: 500ms, 1000ms, ...
        await new Promise((r) => setTimeout(r, attempt * 500));
      }
    }
  }

  throw new Error(`classify failed after ${maxAttempts} attempts: ${lastError}`);
}

Schema repair: feed the error back

When the model returns a close-but-wrong shape, feeding the validation error back in a second call often works better than a blind retry. The model can see what it got wrong and correct it.

async function classifyWithRepair(text: string): Promise<ClassifyResult> {
  // First attempt
  const response = await client.messages.create({
    model: "claude-haiku-4-5-20251001",
    max_tokens: 512,
    tools: [classifyTool],
    tool_choice: { type: "tool", name: "classify_ticket" },
    messages: [{ role: "user", content: `Classify: "${text}"` }],
  });

  const toolBlock = response.content.find((b) => b.type === "tool_use");
  const rawInput = toolBlock?.type === "tool_use" ? toolBlock.input : null;
  const parsed = ClassifyResult.safeParse(rawInput);

  if (parsed.success) {
    return parsed.data;
  }

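  // Repair attempt: show the model what it returned and what the schema expects.
  // The API requires a tool_result for every tool_use in the previous assistant turn,
  // so the validation errors go back as an error result for that call.
  const feedback = `Your output did not match the schema. Validation errors: ${JSON.stringify(parsed.error.issues)}. Please call classify_ticket again with a valid response.`;
  const repairMessages: Anthropic.MessageParam[] = [
    { role: "user", content: `Classify: "${text}"` },
    { role: "assistant", content: response.content },
    {
      role: "user",
      content:
        toolBlock?.type === "tool_use"
          ? [{ type: "tool_result", tool_use_id: toolBlock.id, content: feedback, is_error: true }]
          : feedback,
    },
  ];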

  const repairResponse = await client.messages.create({
    model: "claude-haiku-4-5-20251001",
    max_tokens: 512,
    tools: [classifyTool],
    tool_choice: { type: "tool", name: "classify_ticket" },
    messages: repairMessages,
  });

  const repairBlock = repairResponse.content.find((b) => b.type === "tool_use");
  const repaired = ClassifyResult.safeParse(
    repairBlock?.type === "tool_use" ? repairBlock.input : null
  );

  if (!repaired.success) {
    throw new Error(`repair attempt failed: ${JSON.stringify(repaired.error.issues)}`);
  }

  return repaired.data;
}

The repair pattern costs one extra call but recovers most edge cases without manual intervention.
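The repair prompt above sends JSON.stringify(parsed.error.issues) back to the model. A terser, path-first message is easier to read and uses fewer tokens. Here is a hypothetical formatter that relies only on the path and message fields Zod issues carry. Those fields are declared locally so the sketch runs without Zod.

```typescript
// Minimal local shape; Zod issues carry at least these two fields
type IssueLike = { path: ReadonlyArray<PropertyKey>; message: string };

// Hypothetical helper: one "- path: message" line per issue, "(root)" for top-level problems
function formatIssues(issues: readonly IssueLike[]): string {
  return issues
    .map((issue) => {
      const path = issue.path.map((key) => String(key)).join(".") || "(root)";
      return `- ${path}: ${issue.message}`;
    })
    .join("\n");
}

// Builds the follow-up instruction for the repair turn
function repairInstruction(toolName: string, issues: readonly IssueLike[]): string {
  return `Your output did not match the schema:\n${formatIssues(issues)}\nCall ${toolName} again with corrected input.`;
}
```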

6. Two schemas you can copy-paste

Classifier (intent routing for chat)

import { z } from "zod";

export const IntentSchema = z.object({
  intent: z.enum(["search", "create", "delete", "status", "help"]),
  confidence: z.number().min(0).max(1),
  extracted_entity: z.string().optional(),
});

export type Intent = z.infer<typeof IntentSchema>;

// JSON Schema for Anthropic tool use
export const intentToolSchema = {
  type: "object" as const,
  properties: {
    intent: {
      type: "string",
      enum: ["search", "create", "delete", "status", "help"],
    },
    confidence: { type: "number" },
    extracted_entity: { type: "string" },
  },
  required: ["intent", "confidence"],
};

Extractor (pull structured data from unstructured text)

export const ContactExtractSchema = z.object({
  name: z.string(),
  email: z.string().email().optional(),
  phone: z.string().optional(),
  company: z.string().optional(),
  notes: z.string().optional(),
});

export type ContactExtract = z.infer<typeof ContactExtractSchema>;

// JSON Schema for Anthropic tool use
export const contactExtractToolSchema = {
  type: "object" as const,
  properties: {
    name: { type: "string", description: "Full name of the contact" },
    email: { type: "string", description: "Email address if present" },
    phone: { type: "string", description: "Phone number if present" },
    company: { type: "string", description: "Company or organization if mentioned" },
    notes: { type: "string", description: "Any other relevant details" },
  },
  required: ["name"],
};

Usage with the raw Anthropic SDK:

const extractTool = {
  name: "extract_contact",
  description: "Extract contact information from the provided text.",
  input_schema: contactExtractToolSchema,
};

const response = await client.messages.create({
  model: "claude-haiku-4-5-20251001",
  max_tokens: 512,
  tools: [extractTool],
  tool_choice: { type: "tool", name: "extract_contact" },
  messages: [
    {
      role: "user",
      content: `Extract contact info from: "Hi, I'm Sarah Chen at Acme Corp. Reach me at sarah@acme.io"`,
    },
  ],
});

const block = response.content.find((b) => b.type === "tool_use");
const result = ContactExtractSchema.safeParse(
  block?.type === "tool_use" ? block.input : null
);

if (result.success) {
  console.log(result.data.name);  // "Sarah Chen"
  console.log(result.data.email); // "sarah@acme.io"
}

The bottom line

JSON.parse without validation is a time bomb. The model will eventually return a shape you did not expect, and the failure mode is silent data corruption, not a loud error you can catch immediately.

The fix costs two things: a Zod schema (which you should write anyway for TypeScript types) and a .safeParse() call instead of a raw JSON.parse. The Vercel AI SDK with Output.object handles the wiring for you. The raw Anthropic SDK with tool_choice forced to a specific tool gives you the same guarantee with one extra setup step.

The retry and repair patterns are insurance. In practice, with tool_choice forced, parse failures happen in under 1% of calls for well-formed schemas. Even 1% is 500 failed parses a day when you are running 50,000 classifications.

What shape does your most chaotic LLM output have right now? Drop it in the comments.

A Zod schema is the contract the model never gets to break.

Building a production AI agent in TypeScript with Mastra: a 2026 step-by-step.

GDS K S — Mon, 13 Jul 2026 23:30:15 +0000

I spent an afternoon last month wiring up an AI agent in raw TypeScript using the Anthropic SDK directly. The code worked, but I owned every piece of it: the tool dispatch loop, the conversation history array, the retry logic. Around 400 lines before the agent did anything interesting.

Mastra cuts that to about 60 lines. It is a TypeScript-first agent framework with 24k+ GitHub stars, an active release cadence (88 releases as of May 2026), and a model router that talks to 40+ providers through one API. This tutorial goes from zero to a running agent with a custom tool and persistent memory. All code in this article runs.

TL;DR

Step	What you build	Time
Install	Scaffolded project with Mastra wired in	5 min
Agent	An agent with a system prompt and a model	10 min
Tool	A custom tool the agent calls	15 min
Memory	Conversation history across sessions	10 min
Deploy	A running HTTP server	5 min

1. Where Mastra sits in the stack

Raw SDK calls give you full control but you write the orchestration layer yourself: the tool call loop, history management, error handling, retries. Frameworks like LangChain and LlamaIndex solve this but lean heavily Python-first; the TypeScript ports lag the Python versions.
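For a sense of what that orchestration layer involves, here is a minimal, provider-agnostic sketch of the tool-call loop. The Model function type, the message shape, and runAgent are hypothetical stand-ins rather than any SDK's API. Real code would add retries, timeouts, and persistence.

```typescript
type ToolCall = { name: string; input: unknown };
type ModelTurn = { text: string; toolCalls: ToolCall[] };
type Message = { role: "user" | "assistant" | "tool"; content: string };
type Model = (history: readonly Message[]) => Promise<ModelTurn>;
type Tools = Record<string, (input: unknown) => Promise<unknown>>;

// Ask the model, run any tools it requests, feed results back, repeat until it answers
async function runAgent(model: Model, tools: Tools, userInput: string, maxSteps = 5): Promise<string> {
  const history: Message[] = [{ role: "user", content: userInput }];

  for (let step = 0; step < maxSteps; step++) {
    const turn = await model(history);
    if (turn.toolCalls.length === 0) return turn.text; // final answer

    history.push({ role: "assistant", content: JSON.stringify(turn.toolCalls) });
    for (const call of turn.toolCalls) {
      const tool = tools[call.name];
      // Report unknown tools back to the model instead of crashing the loop
      const result = tool ? await tool(call.input) : { error: `unknown tool: ${call.name}` };
      history.push({ role: "tool", content: JSON.stringify(result) });
    }
  }
  throw new Error(`agent did not finish within ${maxSteps} steps`);
}
```

The maxSteps cap is the piece most hand-rolled loops forget. Without it, a model that keeps requesting tools never returns.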

Mastra starts from TypeScript. The primitives map directly to what TypeScript developers already know: classes, Zod schemas, async functions. There is no port lag, because the TypeScript version is the framework itself.

The trade-off is the same as any framework: you trade control for speed. For prototypes and most production agents, the trade is worth it. For cases where the framework's tool dispatch or memory behavior does not match your exact needs, you can always drop a layer and call the Vercel AI SDK directly, which Mastra wraps under the hood.

2. Project setup

Mastra's scaffolder creates everything you need:

npm create mastra@latest

The wizard asks for a project name, model provider, and whether you want the starter example files. Pick your provider (OpenAI, Anthropic, Google, or any of the 40+ supported). For this tutorial I'll use Anthropic.

After the wizard finishes:

cd my-agent
npm install

Open .env and add your key:

ANTHROPIC_API_KEY=sk-ant-...

The generated project structure looks like this:

src/
  mastra/
    index.ts          # Mastra instance
    agents/
      weather-agent.ts
    tools/
      weather-tool.ts

You can rename or replace the starter files. The root entry point src/mastra/index.ts registers your agents and tools with the Mastra runtime:

// src/mastra/index.ts
import { Mastra } from "@mastra/core";
import { supportAgent } from "./agents/support-agent";

export const mastra = new Mastra({
  agents: { supportAgent },
});

That is the full configuration. No YAML, no config files, no factory functions to memorize.

3. Defining your first agent

An agent in Mastra is an instance of the Agent class. You give it an id, a name, a model, and instructions. The instructions are the system prompt.

// src/mastra/agents/support-agent.ts
import { Agent } from "@mastra/core/agent";

export const supportAgent = new Agent({
  id: "support-agent",
  name: "Support Agent",
  model: "anthropic/claude-sonnet-4-6-20250929",
  instructions: `You are a support agent for a SaaS product.
Answer questions clearly. If you do not know the answer, say so.
When the user describes a bug, ask for their account ID and browser version before troubleshooting.
Keep responses under 150 words unless the user asks for detail.`,
});

The model field uses Mastra's router format: provider/model-id. This means swapping from Anthropic to OpenAI is a one-line change. Pin the model ID to a dated version, not an alias, so you own when it upgrades.

To call the agent from your application code:

import { mastra } from "./mastra";

const agent = mastra.getAgentById("support-agent");

const response = await agent.generate("My dashboard stopped loading after your last update.");
console.log(response.text);

generate returns a full response after the model finishes. If you want streaming for a chat UI, swap to stream:

const stream = await agent.stream("What does the onboarding flow look like?");
for await (const chunk of stream.textStream) {
  process.stdout.write(chunk);
}

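generate resolves to { text, toolCalls, toolResults, usage }. The object stream returns carries the same fields alongside textStream, but they only settle once the stream finishes. The usage object gives you token counts per call, which you should log in production.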
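A minimal sketch of per-session usage logging. It assumes the usage object carries `inputTokens` and `outputTokens` (the names recent Vercel AI SDK versions use) and falls back to `promptTokens` and `completionTokens` (the older names); check which your Mastra version returns. `UsageLog` and keying by thread ID are illustrative choices, not Mastra APIs.

```typescript
// Assumed usage shape: field names vary across AI SDK / Mastra versions.
type Usage = {
  inputTokens?: number;
  outputTokens?: number;
  promptTokens?: number; // older naming
  completionTokens?: number; // older naming
};

type SessionTotals = { calls: number; inputTokens: number; outputTokens: number };

// Hypothetical helper that accumulates token counts per session.
class UsageLog {
  private totals = new Map<string, SessionTotals>();

  // Record one call's usage against a session (e.g. a thread ID).
  record(sessionId: string, usage: Usage): SessionTotals {
    const input = usage.inputTokens ?? usage.promptTokens ?? 0;
    const output = usage.outputTokens ?? usage.completionTokens ?? 0;
    const current = this.totals.get(sessionId) ?? { calls: 0, inputTokens: 0, outputTokens: 0 };
    const next: SessionTotals = {
      calls: current.calls + 1,
      inputTokens: current.inputTokens + input,
      outputTokens: current.outputTokens + output,
    };
    this.totals.set(sessionId, next);
    return next;
  }

  get(sessionId: string): SessionTotals | undefined {
    return this.totals.get(sessionId);
  }
}
```

In application code that would be a `log.record(threadId, response.usage)` after each `generate` call.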

4. Adding a custom tool

Tools are how agents take action beyond text generation. A tool has an id, a description the model uses to decide when to call it, an inputSchema validated by Zod, and an execute function.

Here is a tool that looks up an account by ID from a hypothetical database:

// src/mastra/tools/account-lookup.ts
import { createTool } from "@mastra/core/tools";
import { z } from "zod";

export const accountLookupTool = createTool({
  id: "account-lookup",
  description: "Look up a customer account by their account ID. Returns plan, status, and creation date.",
  inputSchema: z.object({
    accountId: z.string().describe("The customer account ID, format ACC-XXXXXXXX"),
  }),
  execute: async ({ context }) => {
    const { accountId } = context;
    // Replace with your real DB call
    const account = await fetchAccountFromDb(accountId);
    if (!account) {
      return { found: false, accountId };
    }
    return {
      found: true,
      accountId,
      plan: account.plan,
      status: account.status,
      createdAt: account.createdAt,
    };
  },
});

async function fetchAccountFromDb(accountId: string) {
  // Stub — wire to your actual data layer
  if (accountId === "ACC-00000001") {
    return { plan: "pro", status: "active", createdAt: "2025-03-15" };
  }
  return null;
}

Attach the tool to the agent by adding it to the tools array:

// src/mastra/agents/support-agent.ts
import { Agent } from "@mastra/core/agent";
import { accountLookupTool } from "../tools/account-lookup";

export const supportAgent = new Agent({
  id: "support-agent",
  name: "Support Agent",
  model: "anthropic/claude-sonnet-4-6-20250929",
  instructions: `You are a support agent for a SaaS product.
When a user provides an account ID, use the account-lookup tool to retrieve their account details before answering.
Answer questions clearly. If you do not know the answer, say so.`,
  tools: {
    "account-lookup": accountLookupTool,
  },
});

The model now sees the tool in its context. When a user says "My account ACC-00000001 is showing the wrong plan", the agent calls account-lookup, gets the account record back, and includes it in its reasoning before responding. You see the tool call in response.toolCalls and the result in response.toolResults.

The Zod schema does two things: it tells the model what the tool expects (the description and field names show up in the model's tool spec), and it validates the model's tool-call arguments before your execute function runs. If the arguments fail the schema, Mastra rejects the call before it hits your code. Note that z.string() only checks the type: an accountId in the wrong format still passes unless you tighten the field, for example with .regex().
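The validate-then-execute order can be sketched without dependencies. Here a plain regex stands in for a tightened Zod field such as `z.string().regex(/^ACC-\d{8}$/)`. The eight-digit format is an assumption read off the `ACC-00000001` example, and `guardedExecute` is a hypothetical helper that mirrors the behavior described above; it is not Mastra's code.

```typescript
// Assumed format: "ACC-" followed by eight digits, as in ACC-00000001.
const ACCOUNT_ID = /^ACC-\d{8}$/;

type ToolResult<T> = { ok: true; value: T } | { ok: false; error: string };

// Hypothetical helper: validate the model-supplied input first and only
// call the handler when it passes, so bad input never reaches your code.
async function guardedExecute<T>(
  input: unknown,
  handler: (accountId: string) => Promise<T>,
): Promise<ToolResult<T>> {
  if (typeof input !== "object" || input === null) {
    return { ok: false, error: "input must be an object" };
  }
  const accountId = (input as { accountId?: unknown }).accountId;
  if (typeof accountId !== "string" || !ACCOUNT_ID.test(accountId)) {
    return { ok: false, error: `invalid accountId: ${String(accountId)}` };
  }
  return { ok: true, value: await handler(accountId) };
}
```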

5. Adding memory

Without memory, every call to agent.generate starts from a blank slate. The agent does not know what the user said two messages ago. For a support agent this breaks multi-turn conversations immediately.

Mastra's @mastra/memory package adds three layers: message history (the last N messages), working memory (key facts extracted and stored between turns), and semantic recall (vector search over past conversations). For most agents, message history is enough to start.

Install the package and a storage backend:

npm install @mastra/memory @mastra/libsql

Wire memory into the agent:

// src/mastra/agents/support-agent.ts
import { Agent } from "@mastra/core/agent";
import { Memory } from "@mastra/memory";
import { LibSQLStore } from "@mastra/libsql";
import { accountLookupTool } from "../tools/account-lookup";

const memory = new Memory({
  storage: new LibSQLStore({
    url: process.env.DATABASE_URL ?? "file:./agent.db",
  }),
  options: {
    lastMessages: 20,
  },
});

export const supportAgent = new Agent({
  id: "support-agent",
  name: "Support Agent",
  model: "anthropic/claude-sonnet-4-6-20250929",
  instructions: `You are a support agent for a SaaS product.
When a user provides an account ID, use the account-lookup tool before answering.
Keep responses under 150 words unless the user asks for detail.`,
  tools: {
    "account-lookup": accountLookupTool,
  },
  memory,
});

Pass a resource and thread when you call the agent. resource identifies the user; thread isolates a specific conversation:

const response = await agent.generate(
  "What plan am I on?",
  {
    memory: {
      resource: "user-42",
      thread: "support-session-2026-05-25",
    },
  },
);

On the first turn the agent has no history. On the second turn it gets the last 20 messages from that resource and thread. The user can say "follow up from earlier" and the agent knows what earlier means.
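Conceptually, `lastMessages: 20` is a sliding window over one conversation's stored messages. This in-memory sketch shows that behavior, keyed by the resource and thread pair. `InMemoryHistory` is a hypothetical class for illustration; Mastra persists messages through the configured storage (LibSQL here), not a `Map`.

```typescript
type ChatMessage = { role: "user" | "assistant"; content: string };

// Hypothetical store: one message list per (resource, thread) pair.
class InMemoryHistory {
  private readonly threads = new Map<string, ChatMessage[]>();
  private readonly lastMessages: number;

  constructor(lastMessages: number) {
    this.lastMessages = lastMessages;
  }

  private key(resource: string, thread: string): string {
    return `${resource}::${thread}`;
  }

  append(resource: string, thread: string, message: ChatMessage): void {
    const k = this.key(resource, thread);
    const list = this.threads.get(k) ?? [];
    list.push(message);
    this.threads.set(k, list);
  }

  // Only the most recent N messages of this thread are replayed to the model.
  recall(resource: string, thread: string): ChatMessage[] {
    if (this.lastMessages <= 0) return [];
    const list = this.threads.get(this.key(resource, thread)) ?? [];
    return list.slice(-this.lastMessages);
  }
}
```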

LibSQL writes to a local SQLite file in development. For production, point DATABASE_URL at a Turso database URL and supply Turso's auth token alongside it; the rest of the code stays the same, and the agent now writes to a distributed SQLite database.

6. Running locally and deploying

Development server:

npx mastra dev

This starts a local server with a REST API over your agents and a browser-based studio on port 4111 where you can send messages to the agent, inspect tool calls, and read the memory state. The studio is useful enough that I use it even for non-UI agents.

For production you have two options.

The first is the Mastra cloud deploy, which packages your agents as a managed service:

npx mastra deploy

The second is self-hosting. Add @mastra/express and mount the Mastra server into your existing Express app:

npm install @mastra/express

// src/index.ts
import express from "express";
import { MastraServer } from "@mastra/express";
import { mastra } from "./mastra";

const app = express();
app.use(express.json());

const server = new MastraServer({ app, mastra });
await server.init();

app.listen(process.env.PORT ?? 3000);

MastraServer.init() registers the agent endpoints under /api. Your agents become callable over HTTP with no extra routing code. Deploy this to Railway, Fly.io, or any Node.js host that accepts a Dockerfile.

The bottom line

Mastra removes the orchestration scaffolding that accounts for the first 300 lines of every TypeScript agent project. The three hours I spent on tool dispatch and conversation history management with raw SDK calls collapse to about 15 minutes with Mastra.

The framework makes a bet on TypeScript as the language where agents actually ship to production, not just where they get prototyped. That bet reflects what I see in production codebases. Python still dominates training and research. TypeScript dominates the web services those agents plug into.

The main thing I'd add to the stack described above: log response.usage on every generate call so you can see what the agent actually costs per session in production. The number surprises most teams the first week.

What does your agent stack look like right now? Raw SDK, a framework, something in-house? Curious what the pressure points are.

GDS K S · thegdsks.com · follow on X @thegdsks

The orchestration layer is the part nobody talks about until they've rewritten it twice.

OpenAI Codex now finishes 85% of scoped tasks. Here is the /goal workflow that gets you there.

GDS K S — Sun, 14 Jun 2026 02:56:59 +0000

OpenAI has been circulating an 85 to 90 percent success rate for Codex on well-scoped maintenance work. That number comes from internal testing, not an independent benchmark. But the mechanics behind it are real, and they explain both why it works and when it falls apart.

The feature is /goal. It shipped in Codex CLI 0.128.0 and became generally available across the CLI, IDE extension, and Codex app in version 0.133.0 on May 21, 2026. The short version: you set a goal, Codex loops until it believes the goal is complete, and the only hard stops are an evaluation that says "done" or a token budget that runs dry.

Understanding why that loop succeeds or fails on any given task is the whole game.

TL;DR

Scenario	Outcome	Why
Fix a failing test with a known error message	High pass rate	Scope is tight, completion is verifiable
Add a typed interface to an existing module	High pass rate	Output shape is checkable
Refactor a cross-cutting concern across 12 files	Fails often	Ambiguous scope, no clear done signal
Redesign the data model	Fails almost always	No binary done-check possible
Update a dependency and fix breakage	Medium	Depends on how far the breakage spreads

1. What /goal does and why "persisted" matters

A standard Codex turn is stateless. You ask something, it runs, the session ends. /goal breaks that pattern.

When you set a goal, Codex injects two prompts at the end of every turn automatically: goals/continuation.md and goals/budget_limit.md. The first tells the model to check whether the goal is complete and decide whether to continue. The second tracks token consumption and stops the loop before it exceeds your budget. The loop runs forward until one of those two conditions triggers.

Before version 0.133.0, goals were session-scoped. When the CLI process died, the goal died. The 0.133.0 release backed goals with dedicated storage so they track progress across active turns, including across CLI restarts. That is the "persisted" part. The goal state survives a reboot.

Version 0.132.0 (May 19, 2026) added one important fix: goal continuations now stop at usage limits instead of spinning indefinitely. Before that fix, a goal with no clear completion signal would run until the process died or the account hit a rate limit.

The loop pattern OpenAI uses here is not novel. Practitioners call it the "Ralph loop": an agent that checks its own output and decides whether to keep going. Codex adds budget accounting and a persistence layer on top, and it injects the continuation prompts for you; you never write them yourself.
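The control flow reads naturally as a plain loop. This is a conceptual model, not Codex's implementation: `runTurn` stands in for one agent turn, `isGoalMet` for the injected completion check, and `runGoalLoop` is a hypothetical name. In this sketch the budget is checked between turns, so the final turn can overshoot it slightly.

```typescript
type TurnResult = { tokensUsed: number };
type LoopOutcome = { status: "done" | "budget_exhausted"; turns: number; tokens: number };

// Keep taking turns until the completion check passes, or until the
// budget is already spent when the next turn would start.
function runGoalLoop(
  runTurn: (turn: number) => TurnResult,
  isGoalMet: () => boolean,
  tokenBudget: number,
): LoopOutcome {
  let tokens = 0;
  let turns = 0;
  while (tokens < tokenBudget) {
    tokens += runTurn(turns).tokensUsed;
    turns++;
    // Stand-in for the continuation check that runs after every turn.
    if (isGoalMet()) return { status: "done", turns, tokens };
  }
  return { status: "budget_exhausted", turns, tokens };
}
```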

2. The shape of a task that hits 85%

Three properties push a task into the high success range.

The goal must have a binary success check. "Fix the failing tests in src/auth" works. "Improve the auth module" does not. The agent needs to run a verification step and get a yes or no result. Passing CI is yes or no. "Better code" is not.

The scope must stay tight. A goal that touches one module or one interface definition gives the agent a small search space. If the fix requires changes in five unrelated parts of the codebase, the agent will solve three of them and stall on the fourth with no way to know it stalled.

The success condition must be observable from within the session. Write a shell command that returns 0 on success and non-zero on failure, and the agent can self-check. Tests are the obvious example. Type checks work too. Lint rules work. "The PR passes review" does not, because the agent cannot run that check.
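That exit-code contract is small enough to sketch with Node's `spawnSync` from `node:child_process`. `passesVerify` is a hypothetical helper name; the only assumption about the check itself is that it exits 0 on success. A command that cannot start, or dies from a signal, reports a null status and counts as not done.

```typescript
import { spawnSync } from "node:child_process";

// Run a verification command and treat exit code 0 as "done".
// status is null when the command fails to start or is killed by a signal.
function passesVerify(command: string, args: string[], cwd = process.cwd()): boolean {
  const result = spawnSync(command, args, { cwd, stdio: "inherit" });
  return result.status === 0;
}

// Example done-check: a full type check of the project.
// passesVerify("npx", ["tsc", "--noEmit"]);
```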

Tasks I have seen work well:

Write a missing test for a specific function, run it green
Add a TypeScript interface that satisfies an existing as cast
Bump a dependency version and fix the type errors that surface
Extract a repeated code block into a shared utility and update all call sites in one directory

Every one of those has a finish line the agent can reach and measure.

3. The shape of a task that fails

The failure modes split into two categories: scope creep and unprovable completion.

Scope creep happens when the agent fixes one thing and reveals another. You ask it to fix a failing integration test. It fixes the test by updating the mock. The mock now diverges from the real API. The agent has no instruction to check that, so it declares done. The tests pass locally and in CI, and the integration fails in staging two days later. The agent did exactly what you said. The goal was too narrow.

Unprovable completion happens when the agent cannot self-check. "Refactor this service to be more readable" gives the agent nothing to verify. It will make changes, decide they look reasonable, mark the goal complete, and stop confidently, whether or not the code reads better. That judgment belongs to a human, not the loop.

Architectural changes fail almost every time. If the task requires deciding where a module boundary should sit, or which service owns a responsibility, the agent hits the ambiguity and either picks one arbitrarily or loops until budget. That is not a capability gap. The task is genuinely underdetermined. No amount of looping closes that.

The 85% number, whatever its exact measurement method, almost certainly applies to a curated set of maintenance tasks with clear success criteria. If you point /goal at open-ended design work, you are not in the 85%. You are in a different distribution entirely.

4. Setup and a sample /goal call

Install or update the Codex CLI:

npm install -g @openai/codex
codex --version
# 0.133.0 or later for persistent goals

Check that goals are active (on by default since 0.133.0, but worth confirming):

codex doctor
# look for: goals: enabled, storage: ok

Set a goal from the CLI:

codex goal set "All tests in src/payments pass with no TypeScript errors"

Start a session in the repo and let it run:

cd /your/repo
codex
# Codex picks up the active goal and begins the loop

Watch it loop:

codex goal status
# shows: active goal, turns completed, tokens used, last evaluation result

The agent runs npm test or your configured test command at the end of each turn, checks the output, and decides whether to continue. If it cannot find a test command, it looks for package.json scripts named test, typecheck, or lint in that order.
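The fallback order described above is a lookup over package.json scripts. This sketch re-implements that order for illustration only; `pickVerifyScript` is a hypothetical name and not part of Codex.

```typescript
// Order taken from the description above: test, then typecheck, then lint.
const FALLBACK_ORDER = ["test", "typecheck", "lint"] as const;

type PackageJson = { scripts?: Record<string, string> };

// Return the first fallback script that package.json actually defines.
function pickVerifyScript(pkg: PackageJson): string | undefined {
  const scripts = pkg.scripts ?? {};
  return FALLBACK_ORDER.find((name) => typeof scripts[name] === "string");
}
```

Feed it `JSON.parse(readFileSync("package.json", "utf8"))` from `node:fs` to run it against a real project.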

For a task with a tighter scope, you can inline the success command:

codex goal set "Fix TypeScript errors in src/api/routes.ts" \
  --verify "npx tsc --noEmit --project tsconfig.json"

The --verify flag tells Codex which command to use as the done-check instead of inferring it. Pass anything that exits 0 on success.

Cancel a goal that has stalled:

codex goal cancel

List past goals and their outcomes:

codex goal list --limit 10

5. Wiring /goal into CI for safety

The loop does not replace CI. Treat it as a way to get closer to green before CI runs. The agent's output goes through type check, lint, and tests before merging, same as any other code.

A GitHub Actions job that verifies Codex-generated changes:

name: verify-codex-output

on:
  pull_request:
    branches: [main]

jobs:
  type-check:
    runs-on: ubuntu-latest
    steps:
      - uses: actions/checkout@v4

      - name: Setup Node
        uses: actions/setup-node@v4
        with:
          node-version: 20
          cache: npm

      - name: Install
        run: npm ci

      - name: Type check
        run: npx tsc --noEmit

      - name: Lint
        run: npx eslint src --max-warnings 0

      - name: Test
        run: npm test -- --coverage --passWithNoTests

  detect-scope-creep:
    runs-on: ubuntu-latest
    steps:
      - uses: actions/checkout@v4
        with:
          fetch-depth: 0

      - name: Count changed files
        run: |
          CHANGED=$(git diff --name-only origin/main...HEAD | wc -l)
          echo "Changed files: $CHANGED"
          if [ "$CHANGED" -gt 20 ]; then
            echo "::warning::PR changes $CHANGED files. Review for unintended scope creep."
          fi

The scope-creep check is the one I added specifically for agent-authored PRs. If Codex touches more than 20 files on what should be a five-file task, someone needs to read what happened. The warning does not block the PR; it flags it for a slower review.

The important CI rule: never relax your existing quality gates for agent-generated code. If anything, add the file-count check. An agent that cannot measure its own scope will not stop itself from editing 40 files to fix a one-line bug.

Pre-commit hooks are the other layer. Add a quick type check before the commit even reaches CI:

# .pre-commit-config.yaml (if using pre-commit)
repos:
  - repo: local
    hooks:
      - id: tsc
        name: TypeScript check
        entry: npx tsc --noEmit
        language: system
        pass_filenames: false

Or wire it directly in package.json using husky:

{
  "scripts": {
    "prepare": "husky",
    "typecheck": "tsc --noEmit"
  }
}

# .husky/pre-commit
npm run typecheck

Now every commit the agent makes, whether from a /goal loop or a single turn, goes through the type check locally before the commit is created.

The bottom line

The /goal loop works on tasks where "done" has a binary answer the agent can check itself. Write that verify command before you set the goal. If you cannot write that command, the task needs more scoping before you hand it to the agent.

The 85% figure covers curated maintenance tasks. You cannot carry that rate over to any task you hand the tool. Architectural decisions, ambiguous refactors, and cross-cutting changes will not approach that number regardless of turn count.

The persistence layer that shipped in 0.133.0 is the real unlock. A goal that survives a CLI restart means you can set a task running, close the terminal, and come back to a result rather than a dead session. That changes the workflow from "supervised agent" to something closer to a slow async job. Wire it into CI, cap the budget, and treat the output like any other unreviewed PR.

What is the first maintenance task in your backlog that has a clear test-based done condition? That is the one to try /goal on first.

Set the verify command before the goal. If you cannot write it, the scope is not ready.

Building a production TypeScript CLI in 2026: oclif vs commander vs custom.

GDS K S — Tue, 09 Jun 2026 06:57:30 +0000

I shipped my first Node CLI in 2019 with a 12-line arg slicer and process.argv. It worked until it needed a second command and then collapsed into spaghetti. The other extreme is grabbing a full framework for a tool that runs one command. In 2026 there are three reasonable paths between those extremes, and each one wins on a specific slice of the problem.

This post covers @oclif/core v4, commander v14, and a zero-dependency parser that fits in 30 lines. Same "greet" command in all three. Same distribution steps at the end. Honest tradeoffs throughout.

TL;DR

	oclif v4	commander v14	zero-dep
npm install size	~8 MB	~220 kB	0 B
Type inference on flags	Full, generated	Good, manual	Manual
Plugin ecosystem	Yes (Heroku, Salesforce)	No	No
Learning curve	High (day 1)	Low (hour 1)	None
Best for	Multi-team, multi-command CLIs	Most real-world tools	One-shot scripts

1. The decision: framework vs no framework

Reach for a framework when the tool needs subcommands, a plugin system, or auto-generated help text. The second engineer who touches the CLI should be able to find where things live without reading your code twice.

Build your own when the tool does one thing, ships as a one-file script, or lives inside a monorepo where pulling in 8 MB of transitive deps is not welcome. A zero-dep parser also removes the surface area for supply-chain incidents, a real concern on tools that run in CI.

Commander sits in the middle: a 220 kB install that covers most real tools without the scaffolding overhead of oclif.

2. Project skeleton

Every path shares the same bin setup. Start with a package.json that declares the executable:

{
  "name": "greet-cli",
  "version": "1.0.0",
  "bin": {
    "greet": "./dist/cli.js"
  },
  "scripts": {
    "build": "tsc",
    "dev": "tsx src/cli.ts"
  },
  "type": "module"
}

The tsconfig.json for a CLI targets the Node release line you plan to support. Node 24 LTS handles ESM natively, so use "module": "NodeNext" and "moduleResolution": "NodeNext":

{
  "compilerOptions": {
    "target": "ES2022",
    "module": "NodeNext",
    "moduleResolution": "NodeNext",
    "outDir": "dist",
    "strict": true,
    "declaration": true
  },
  "include": ["src"]
}

The entry file needs a shebang on line one and must be executable after build:

#!/usr/bin/env node
// src/cli.ts

After tsc, run chmod +x dist/cli.js so you can execute the file directly. To keep the bit across clean builds and in CI, fold it into the build script: "build": "tsc && chmod +x dist/cli.js". npm link during development installs the greet binary into your PATH so you can test it as a real command.

3. The greet command, three ways

oclif v4

Scaffold with npx oclif generate greet-cli, then replace the generated command:

// src/commands/greet.ts
import { Args, Command, Flags } from "@oclif/core";

export default class Greet extends Command {
  static override description = "Print a greeting";

  static override args = {
    name: Args.string({ description: "Name to greet", required: true }),
  };

  static override flags = {
    loud: Flags.boolean({ char: "l", description: "Uppercase the output" }),
    times: Flags.integer({ char: "t", description: "Repeat N times", default: 1 }),
  };

  async run(): Promise<void> {
    const { args, flags } = await this.parse(Greet);
    const message = `Hello, ${args.name}!`;
    for (let i = 0; i < flags.times; i++) {
      this.log(flags.loud ? message.toUpperCase() : message);
    }
  }
}

Run it with ./bin/run.js greet Alice --loud --times 3. Help text generates automatically from the static properties. TypeScript infers the types on flags.times as number and flags.loud as boolean without any manual annotation.

The this.log and this.error methods route through oclif's output system, which makes testing easier: oclif provides a runCommand test helper that captures stdout without mocking console.

commander v14

Install: npm install commander. No generator needed.

#!/usr/bin/env node
// src/cli.ts
import { Command } from "commander";

const program = new Command();

program
  .name("greet")
  .description("Print a greeting")
  .version("1.0.0");

program
  .command("greet <name>")
  .description("Greet someone by name")
  .option("-l, --loud", "Uppercase the output")
  .option("-t, --times <n>", "Repeat N times", "1")
  .action((name: string, opts: { loud?: boolean; times: string }) => {
    const times = parseInt(opts.times, 10);
    const message = `Hello, ${name}!`;
    for (let i = 0; i < times; i++) {
      console.log(opts.loud ? message.toUpperCase() : message);
    }
  });

program.parse();

The string-to-number conversion on opts.times is manual. Commander parses all option values as strings unless you supply a custom parser function. That is the primary friction point for TypeScript users: you get good autocomplete on the option names but the values carry a weaker type until you cast or coerce them.
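A custom parser is a function that receives the raw string (plus the previous value) and returns the converted one. Below is a dependency-free sketch for --times; with commander you would pass it as the third argument, as in `.option("-t, --times <n>", "Repeat N times", parsePositiveInt, 1)`, and you could throw commander's `InvalidArgumentError` instead of a plain `Error` for a cleaner message. `parsePositiveInt` is a hypothetical name.

```typescript
// Convert the raw option string to a positive integer. Rejects "abc", "0",
// and "2.5", where parseInt would return NaN, 0, and 2 respectively.
function parsePositiveInt(value: string, _previous?: number): number {
  if (!/^\d+$/.test(value)) {
    throw new Error(`Expected a positive integer, got "${value}"`);
  }
  const n = Number(value);
  if (n < 1) {
    throw new Error(`Expected a positive integer, got "${value}"`);
  }
  return n;
}
```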

Commander has supported .argument() as a chainable first-class call since v8, and it reads cleaner than embedding arguments in the command string for complex cases. The core API has been stable since v8, so the learning investment carries forward.

Zero-dependency, 30 lines

No install. No generator. Drop this into src/cli.ts:

#!/usr/bin/env node

type ParsedArgs = {
  positional: string[];
  flags: Record<string, string | boolean>;
};

function parseArgs(argv: string[]): ParsedArgs {
  const positional: string[] = [];
  const flags: Record<string, string | boolean> = {};
  let i = 0;
  while (i < argv.length) {
    const arg = argv[i];
    if (arg.startsWith("--")) {
      const key = arg.slice(2);
      const next = argv[i + 1];
      if (next && !next.startsWith("-")) {
        flags[key] = next;
        i += 2;
      } else {
        flags[key] = true;
        i += 1;
      }
    } else if (arg.startsWith("-") && arg.length === 2) {
      flags[arg.slice(1)] = true;
      i += 1;
    } else {
      positional.push(arg);
      i += 1;
    }
  }
  return { positional, flags };
}

const { positional, flags } = parseArgs(process.argv.slice(2));
const [command, name] = positional;

if (command === "greet" && name) {
  const times = flags.times ? parseInt(flags.times as string, 10) : 1;
  const msg = `Hello, ${name}!`;
  for (let i = 0; i < times; i++) {
    console.log(flags.loud ? msg.toUpperCase() : msg);
  }
} else {
  console.log("Usage: greet greet <name> [--loud] [--times <n>]");
  process.exit(1);
}

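This handles --loud, --times 3, and positional args. It does not handle --times=3, short-form chaining (-lt), or negated flags (--no-loud). Two sharper edges: a flag followed by a bare word takes that word as its value, so greet --loud Alice reads Alice as the value of --loud, finds no name, and prints usage; and short flags land under their one-letter key, so -l sets flags.l, not flags.loud. Add what you need. Each addition is about 5 lines and you understand every byte.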
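If you want those cases without writing them, Node's built-in `util.parseArgs` (stable since Node 20, so still zero-dependency) handles `--times=3`, short aliases such as `-l`, and boolean flags in any position, and it throws on unknown flags by default. A sketch of the same greet command on top of it; `greetLines` is a hypothetical helper that returns lines instead of printing so it is easy to test.

```typescript
import { parseArgs } from "node:util";

// Parse argv with Node's built-in parser and build the greeting lines.
// Unknown flags throw because strict mode is the default.
function greetLines(argv: string[]): string[] {
  const { values, positionals } = parseArgs({
    args: argv,
    options: {
      loud: { type: "boolean", short: "l" },
      times: { type: "string", short: "t", default: "1" },
    },
    allowPositionals: true, // "greet" and the name arrive as positionals
  });

  const [command, name] = positionals;
  if (command !== "greet" || !name) {
    throw new Error("Usage: greet greet <name> [--loud] [--times <n>]");
  }

  const parsed = Number.parseInt(values.times ?? "1", 10);
  const times = Number.isNaN(parsed) ? 1 : parsed; // fall back to one line
  const msg = `Hello, ${name}!`;
  const line = values.loud ? msg.toUpperCase() : msg;
  return Array.from({ length: times }, () => line);
}
```

In the entry file, `greetLines(process.argv.slice(2)).forEach((line) => console.log(line))` wires it up.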

4. Subcommands, flags, and where each path struggles

Subcommands are where the paths diverge most sharply.

In oclif, each subcommand is a file in src/commands/. A file at src/commands/user/create.ts maps to mycli user create. The directory structure is the routing table. That pattern scales to 30 commands because you can grep for a file name.

In commander, subcommands chain off the root program:

const userCmd = program.command("user");
userCmd.command("create <email>").action((email) => { /* ... */ });
userCmd.command("delete <id>").action((id) => { /* ... */ });

That works well up to around 10 subcommands in a single file. Past that, split into separate files and import each group, then register them. Commander does not enforce any file layout, so naming conventions matter more.

The zero-dep path requires a manual dispatch table. A switch on command covers five subcommands cleanly. Beyond five, the file grows fast and the argument parsing for each command needs its own handling. That is the natural ceiling where migrating to commander or oclif starts paying off.
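One way to push that ceiling out a little is a typed dispatch table instead of a switch: each subcommand's handler stays separate, and the usage line comes from the table itself. A sketch with hypothetical `user create` and `user delete` handlers; handlers return strings here only to keep the example testable.

```typescript
type Handler = (args: string[]) => string;

// Keys are space-joined command paths. A Map avoids accidental matches
// on Object.prototype keys such as "constructor".
const commands = new Map<string, Handler>([
  ["greet", ([name]) => `Hello, ${name ?? "world"}!`],
  ["user create", ([email]) => `created ${email}`],
  ["user delete", ([id]) => `deleted ${id}`],
]);

function dispatch(positional: string[]): string {
  // Try the two-word command path first, then the one-word path.
  for (let depth = Math.min(2, positional.length); depth > 0; depth--) {
    const handler = commands.get(positional.slice(0, depth).join(" "));
    if (handler) return handler(positional.slice(depth));
  }
  return `Usage: mycli <${Array.from(commands.keys()).join(" | ")}>`;
}
```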

Prompts (interactive input like password fields or selection lists) sit outside all three. None of them bundle an interactive prompt library. The standard pairing is inquirer for oclif and commander, or Node's built-in readline interface for the zero-dep path.

5. Distribution via npm

Publishing a CLI to npm follows the same steps regardless of which framework you chose.

{
  "name": "@yourscope/greet-cli",
  "version": "1.0.0",
  "bin": { "greet": "./dist/cli.js" },
  "files": ["dist"],
  "engines": { "node": ">=20" }
}

The files array keeps the published tarball small: only dist/ ships, not src/, test files, or dev configs. The engines field documents the Node floor and causes npm install to warn on older versions.

Build and publish:

npm run build
chmod +x dist/cli.js
npm publish --access public

For scoped packages (@yourscope/...), first publish needs --access public. Later publishes omit it.

Users install and run with:

npm install -g @yourscope/greet-cli
greet greet Alice --loud

Or without a global install via npx:

npx @yourscope/greet-cli greet Alice --loud

npx-only distribution is the right default for one-off tools. It avoids polluting the user's global PATH and always runs the version you specify. For tools a developer runs dozens of times a day, a global install still wins on startup time because npx runs a resolution step on every invocation.

If you are distributing a tool that should work offline or in air-gapped environments, vendor the dependencies into the published tarball with bundleDependencies in package.json. Oclif's generated scaffold includes this by default. Commander and zero-dep need it added manually.

6. Comparison

	oclif v4	commander v14	zero-dep
Unpacked install size	~8 MB	~220 kB	0
TypeScript flag types	Inferred, no casting	Manual coercion for numbers	Manual
Auto-generated help	Yes, rich	Yes, basic	You write it
Subcommand routing	File-based (scales)	Code-based (works to ~10)	Switch statement
Plugin system	Yes	No	No
Interactive prompts	Requires inquirer	Requires inquirer	readline built-in
Used by	Heroku CLI, Salesforce CLI	Dozens of open source tools	Scripts, one-off tools
Breaking change cadence	Moderate (major versions)	Low (stable API since v8)	None

The bundle size difference matters when the CLI runs inside a Docker image on a tight layer budget, or when install time in CI is a bottleneck. A full oclif project with its generator output and Heroku plugin dependencies can exceed 50 MB unpacked when counting transitive deps. Commander stays well under 1 MB including your own code.

The type inference gap matters when the team touches the CLI infrequently. With oclif, a new contributor gets full TypeScript hints on every flag value and hits a type error immediately when passing a string where a number belongs. With commander, the coercion is a runtime concern that TypeScript cannot see through without a cast.

The bottom line

Use oclif if you are building a CLI that a team of engineers will extend over time, already have the Heroku or Salesforce ecosystem in mind, or need a plugin architecture. The day-one overhead is real, and the generated scaffold is dense, but the structure pays off past the third command.

Use commander if you are building a real tool with 3 to 15 subcommands, want TypeScript without the framework overhead, and are comfortable writing a thin coercion layer for numeric options. It covers most real-world cases and the API has been stable long enough that Stack Overflow has an answer for every edge case.

Build zero-dep if the tool does one thing, ships in a monorepo where dep hygiene is strict, or you want to understand exactly what runs in production. The ceiling is around five commands before the code fights you.

Node 24 LTS (v24.16.0) ships native ESM, native fetch, and a built-in test runner, which removes three common reasons to reach for dependencies in the first place. Whatever path you pick, the toolchain in 2026 is cleaner than 2022 by a wide margin.

What is the CLI in your current project running on? A raw process.argv slicer past the 100-line mark signals the time to pick a framework.

The right CLI framework is the one that fits the command count, not the one with the best marketing page.

RAG with Postgres pgvector in 2026: the full TypeScript pipeline.

GDS K S — Mon, 08 Jun 2026 08:24:44 +0000

I spent a week evaluating dedicated vector databases before deciding to just use the Postgres instance I already had. The pgvector extension handles similarity search well enough for most production workloads, and it collapses three infrastructure components into one. This walkthrough covers everything from schema to answer: chunk your docs, embed them, store in pgvector, retrieve by cosine similarity, and wire the results into an LLM call.

TL;DR

Step	Tool	Why
Enable vector store	`pgvector` 0.8.x, HNSW index	Runs in your existing Postgres, no extra infra
Embed	`text-embedding-3-small` (1,536 dims)	$0.02 per million tokens, fast
Query	`<=>` cosine distance, top-k	Works with both OpenAI and Voyage models
Augment	Claude or GPT-4o with retrieved docs	Context window stuffed, hallucination rate drops

1. Why pgvector instead of a dedicated vector database

Pinecone and Weaviate are good products. If you need multi-tenant isolation, sub-millisecond p99 at 100M+ vectors, or native hybrid search with BM25, they earn their place. For most teams, those are future problems.

The cost calculus changes when you consider ops burden. A dedicated vector DB means a new billing line, a new set of credentials to rotate, a new failure mode to track, and a new SDK to keep current in your application. pgvector runs as a Postgres extension: one connection string, one backup strategy, one source of truth. At 10M documents with 1,536-dimensional embeddings, an HNSW index on a reasonably sized Postgres instance returns top-10 results in under 10ms. That covers the overwhelming share of RAG use cases.

pgvector 0.8.0 added iterative HNSW scans, which make filtered similarity search practical without falling back to sequential scans every time a WHERE clause gets selective. That release is what tipped my team from "maybe later" to "ship it."

2. Schema setup

Enable the extension once per database, then create your table.

-- enable pgvector (run once per database)
CREATE EXTENSION IF NOT EXISTS vector;

-- documents table
CREATE TABLE documents (
  id         BIGSERIAL PRIMARY KEY,
  source     TEXT NOT NULL,          -- filename, URL, or ID of source doc
  chunk_idx  INT NOT NULL,           -- chunk number within the source
  content    TEXT NOT NULL,          -- raw text of the chunk
  embedding  vector(1536) NOT NULL,  -- OpenAI text-embedding-3-small
  created_at TIMESTAMPTZ DEFAULT NOW()
);

Choosing between HNSW and IVFFlat

HNSW builds a navigable small-world graph. Queries walk the graph instead of comparing every row. Build once, query immediately. The tradeoff is memory: the index stores its own copy of each vector (4 bytes per dimension, roughly 6 KB per row for a 1,536-dim column) plus neighbor links, and it builds much faster when the graph fits in maintenance_work_mem.

IVFFlat partitions the embedding space into centroid clusters. Faster to build, smaller memory footprint, but you must load rows before building the index or the centroid assignment is useless. If you are starting from zero rows, build HNSW.
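Before you size an instance, a back-of-envelope check helps. This sketch assumes the storage formula from the pgvector README (4 bytes per dimension plus 8 bytes per vector). It ignores page overhead, TOAST, and HNSW neighbor links, so treat the result as a floor, not a forecast. The function names are illustrative.

```typescript
// Lower bound on the raw vector payload pgvector stores for one vector column.
// Per the pgvector README, each vector takes 4 * dimensions + 8 bytes.
// An HNSW index keeps its own copy of every vector plus graph links,
// so the index needs at least this much again on top of the table.
export function vectorPayloadBytes(rows: number, dims: number): number {
  return rows * (4 * dims + 8);
}

export function formatGiB(bytes: number): string {
  return `${(bytes / 1024 ** 3).toFixed(1)} GiB`;
}
```

For 1M rows at 1,536 dims, `formatGiB(vectorPayloadBytes(1_000_000, 1536))` gives "5.7 GiB". At 10M rows the floor is roughly 57 GiB, which is worth knowing before calling an instance reasonably sized.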

-- HNSW index (recommended default)
-- m = connections per layer (default 16), higher = better recall at higher memory cost
-- ef_construction = candidate list during build (default 64), higher = better recall at slower build
CREATE INDEX ON documents
  USING hnsw (embedding vector_cosine_ops)
  WITH (m = 16, ef_construction = 64);

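-- IVFFlat alternative (only after loading rows)
-- lists: rows / 1000 up to 1M rows, sqrt(rows) above that (pgvector's suggested starting points)
-- use the same operator class as your queries (cosine here, to match <=>)
-- CREATE INDEX ON documents USING ivfflat (embedding vector_cosine_ops) WITH (lists = 100);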

Use vector_cosine_ops with the <=> operator for cosine distance, which works whether or not your embedding model normalizes vectors (OpenAI and Voyage both do). Use vector_l2_ops with <-> for raw Euclidean distance. Use vector_ip_ops with <#> for inner product. pgvector returns the negative inner product, so smaller values still rank first, and on normalized vectors it orders results exactly like cosine distance while skipping the norm computation.
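To see why the three operators agree on normalized embeddings, here are plain-TypeScript versions of the distances pgvector exposes. The function names are illustrative and are not part of any library. For unit vectors, cosine distance equals 1 plus the negative inner product, and squared L2 distance equals twice the cosine distance. All three therefore produce the same ranking.

```typescript
type Vec = number[];

const dot = (a: Vec, b: Vec): number => a.reduce((sum, x, i) => sum + x * b[i], 0);

// <-> Euclidean (L2) distance
export const l2Distance = (a: Vec, b: Vec): number =>
  Math.sqrt(a.reduce((sum, x, i) => sum + (x - b[i]) ** 2, 0));

// <=> cosine distance: 0 = same direction, 2 = opposite
export const cosineDistance = (a: Vec, b: Vec): number =>
  1 - dot(a, b) / Math.sqrt(dot(a, a) * dot(b, b));

// <#> pgvector returns the NEGATIVE inner product, so ascending order = most similar first
export const negativeInnerProduct = (a: Vec, b: Vec): number => -dot(a, b);

// Scale a vector to unit length, as OpenAI and Voyage embeddings already are
export const normalize = (v: Vec): Vec => {
  const norm = Math.sqrt(dot(v, v));
  return v.map((x) => x / norm);
};
```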

3. Ingest pipeline in TypeScript

The ingest function chunks a document, calls the embedding API, and bulk inserts rows. Use postgres (the npm package, not pg) for its tagged-template SQL and native array support.

import postgres from "postgres";
import OpenAI from "openai";

const sql = postgres(process.env.DATABASE_URL!);
const openai = new OpenAI({ apiKey: process.env.OPENAI_API_KEY! });

const CHUNK_SIZE = 512;   // words, not tokens: the naive chunker below splits on whitespace
const CHUNK_OVERLAP = 64; // words of overlap between adjacent chunks

function chunkText(text: string, size: number, overlap: number): string[] {
  // naive word-boundary chunker; swap for a token-based splitter (e.g. tiktoken) in production
  if (overlap >= size) throw new Error("overlap must be smaller than size");
  const words = text.trim().split(/\s+/).filter(Boolean);
  const chunks: string[] = [];
  let start = 0;
  while (start < words.length) {
    const end = Math.min(start + size, words.length);
    chunks.push(words.slice(start, end).join(" "));
    if (end === words.length) break; // reached the end; skip a redundant tail chunk
    start += size - overlap;
  }
  return chunks;
}

async function embedBatch(texts: string[]): Promise<number[][]> {
  const response = await openai.embeddings.create({
    model: "text-embedding-3-small",
    input: texts,
  });
  return response.data.map((d) => d.embedding);
}

export async function ingestDocument(source: string, text: string): Promise<void> {
  const chunks = chunkText(text, CHUNK_SIZE, CHUNK_OVERLAP);

  // embed in batches of 100 to stay well under the embeddings API's per-request limit (2,048 inputs)
  const BATCH = 100;
  for (let i = 0; i < chunks.length; i += BATCH) {
    const batch = chunks.slice(i, i + BATCH);
    const embeddings = await embedBatch(batch);

    const rows = batch.map((content, j) => ({
      source,
      chunk_idx: i + j,
      content,
      embedding: JSON.stringify(embeddings[j]),
    }));

    await sql`
      INSERT INTO documents (source, chunk_idx, content, embedding)
      SELECT
        r.source,
        r.chunk_idx::int,
        r.content,
        r.embedding::vector
      FROM jsonb_to_recordset(${JSON.stringify(rows)}::jsonb)
        AS r(source text, chunk_idx text, content text, embedding text)
    `;
  }

  console.log(`[ingest] ${source}: ${chunks.length} chunks stored`);
}

A note on chunk size: 512 words is a starting point. The right size depends on your source material. Legal documents with dense paragraphs do better at 256 words. Code files need at least 300 lines or you lose function context. The overlap keeps a sentence that straddles a chunk boundary whole in at least one chunk, as long as the sentence is shorter than the overlap.

4. Query pipeline in TypeScript

Embed the user's question, run a top-k cosine similarity search, return the matching chunks.

export async function queryDocuments(
  question: string,
  topK = 5,
): Promise<Array<{ source: string; content: string; distance: number }>> {
  // embed the question with the same model used at ingest time
  const [embedding] = await embedBatch([question]);
  const embeddingStr = JSON.stringify(embedding);

  const rows = await sql<{ source: string; content: string; distance: number }[]>`
    SELECT
      source,
      content,
      (embedding <=> ${embeddingStr}::vector) AS distance
    FROM documents
    ORDER BY embedding <=> ${embeddingStr}::vector
    LIMIT ${topK}
  `;

  return rows;
}

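The <=> operator returns cosine distance (0 = identical, 2 = opposite). Lower numbers win. If you add metadata filters, put them in the WHERE clause. Be aware that a plain HNSW scan applies the filter only after fetching hnsw.ef_search candidates (40 by default), so a selective filter can return fewer than topK rows. The iterative scan introduced in 0.8.0 keeps scanning until enough rows pass the filter. It is off by default, so turn it on per session or transaction with SET hnsw.iterative_scan = strict_order (or relaxed_order).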

// filtered query example: restrict the search to a single source document
export async function queryBySource(question: string, filterSource: string, topK = 5) {
  const [embedding] = await embedBatch([question]);
  const embeddingStr = JSON.stringify(embedding);
  return sql<{ source: string; content: string; distance: number }[]>`
    SELECT source, content, (embedding <=> ${embeddingStr}::vector) AS distance
    FROM documents
    WHERE source = ${filterSource}
    ORDER BY embedding <=> ${embeddingStr}::vector
    LIMIT ${topK}
  `;
}
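pgvector 0.8 ships iterative scans switched off (`hnsw.iterative_scan` defaults to `off`). A filtered query therefore needs that setting flipped before the index will keep scanning past the first batch of candidates. This sketch assumes the `postgres` client from the ingest section, and `filteredSearch` is a hypothetical name. Running `SET LOCAL` inside `sql.begin` scopes the setting to one transaction, so it does not leak to other queries on a pooled connection.

```typescript
import postgres from "postgres";

const sql = postgres(process.env.DATABASE_URL!);

type Match = { source: string; content: string; distance: number };

// Filtered top-k search with pgvector 0.8's iterative HNSW scan enabled.
// `embeddingStr` is a vector literal such as "[0.1,0.2,...]" (JSON.stringify of the embedding).
export async function filteredSearch(
  embeddingStr: string,
  filterSource: string,
  topK = 5,
): Promise<Match[]> {
  return sql.begin(async (tx) => {
    // strict_order keeps exact distance order; relaxed_order trades some ordering for speed
    await tx`SET LOCAL hnsw.iterative_scan = strict_order`;
    return tx<Match[]>`
      SELECT source, content, (embedding <=> ${embeddingStr}::vector) AS distance
      FROM documents
      WHERE source = ${filterSource}
      ORDER BY embedding <=> ${embeddingStr}::vector
      LIMIT ${topK}
    `;
  });
}
```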

5. Wiring retrieved docs into an LLM call

Concatenate the retrieved chunks into a context block, then call your model of choice. Claude Sonnet and GPT-4o both handle long contexts well. Keep the context block under 80,000 tokens for cost reasons.

import Anthropic from "@anthropic-ai/sdk";

const anthropic = new Anthropic({ apiKey: process.env.ANTHROPIC_API_KEY! });

export async function answerWithRAG(question: string): Promise<string> {
  const docs = await queryDocuments(question, 5);

  if (docs.length === 0) {
    return "No relevant documents found.";
  }

  const context = docs
    .map((d, i) => `[${i + 1}] (${d.source})\n${d.content}`)
    .join("\n\n---\n\n");

  const prompt = `You are a helpful assistant. Answer the question using only the provided context.
If the context does not contain the answer, say so.

Context:
${context}

Question: ${question}`;

  const response = await anthropic.messages.create({
    model: "claude-sonnet-4-5-20250929",
    max_tokens: 1024,
    messages: [{ role: "user", content: prompt }],
  });

  const block = response.content[0];
  return block.type === "text" ? block.text : "";
}
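answerWithRAG sends only 5 chunks, but if you raise topK, a budget guard keeps the context block under the 80,000-token ceiling mentioned above. This is a sketch: `fitToBudget` and `estimateTokens` are hypothetical helpers, and the 4-characters-per-token ratio is a rough heuristic for English text, not a tokenizer.

```typescript
type Doc = { source: string; content: string; distance: number };

// Rough heuristic: about 4 characters per token for English text.
// Swap in a real tokenizer if you need exact counts.
const CHARS_PER_TOKEN = 4;

export function estimateTokens(text: string): number {
  return Math.ceil(text.length / CHARS_PER_TOKEN);
}

// Keep the closest chunks (input is already sorted by distance) until the
// next one would push the context block past the token budget.
export function fitToBudget(docs: Doc[], maxTokens = 80_000): Doc[] {
  const kept: Doc[] = [];
  let used = 0;
  for (const doc of docs) {
    const cost = estimateTokens(doc.content);
    if (used + cost > maxTokens) break;
    kept.push(doc);
    used += cost;
  }
  return kept;
}
```

Call `fitToBudget(docs)` right after `queryDocuments` in `answerWithRAG`. It does not count the prompt wrapper or the separators, so leave some headroom.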

The "answer using only the provided context" instruction is load-bearing. Without it, the model mixes retrieval with parametric memory and you cannot tell which is which. If the answer comes from the context, citations work. If it comes from training data, they do not. Force the distinction at the prompt level.

One more thing worth noting: rerank before you send to the LLM. A fast cosine search returns the 5 closest chunks by vector distance, but distance does not always equal usefulness. A cross-encoder reranker (Cohere Rerank costs about $1 per 1,000 queries) takes your top-20 candidates and scores them for actual relevance before you trim to 5. The quality jump is noticeable. Skip the reranker while prototyping, add it before you hit production.
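The retrieve-wide, rerank, trim-to-5 flow can be sketched with a pluggable scorer. `Scorer` is a hypothetical signature, not a real SDK call: plug in a cross-encoder or a hosted rerank API behind it, with higher scores meaning more relevant.

```typescript
type Candidate = { source: string; content: string; distance: number };

// Hypothetical relevance scorer: wrap your cross-encoder or rerank API here.
// Must return one score per passage, in the same order; higher = more relevant.
type Scorer = (question: string, passages: string[]) => Promise<number[]>;

// Rescore a wide candidate set (e.g. top-20 by vector distance) and keep the best k.
export async function rerank(
  question: string,
  candidates: Candidate[],
  score: Scorer,
  k = 5,
): Promise<Candidate[]> {
  const scores = await score(question, candidates.map((c) => c.content));
  return candidates
    .map((c, i) => ({ c, s: scores[i] }))
    .sort((a, b) => b.s - a.s)
    .slice(0, k)
    .map(({ c }) => c);
}
```

Usage: `rerank(question, await queryDocuments(question, 20), myScorer)`, where `myScorer` is your own implementation.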

6. Two gotchas that bite everyone

Chunk size drives recall more than index parameters

Most teams spend hours tuning HNSW m and ef_construction and see marginal gains. The actual lever is chunk size and overlap. A chunk that is too short loses context (the model cannot answer a cross-sentence question). A chunk that is too long pulls in noise, dilutes the embedding, and wastes context window in the LLM call. Run a quick eval: take 20 representative questions, retrieve top-5, then manually score whether the answer appeared in the returned chunks. Adjust chunk size in 100-word steps until recall tops 85%. Then tune the index.
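The 20-question eval can be scripted once you have hand-labeled which chunks answer each question. This sketch computes the hit-rate flavor of recall@k, meaning the fraction of questions where at least one relevant chunk appears in the top k. That matches the "did the answer appear" check described above. The `EvalCase` shape, the `${source}#${chunk_idx}` id format, and `hitRateAtK` are assumptions for illustration.

```typescript
// One eval case: a question plus the chunk ids that actually contain its answer.
type EvalCase = { question: string; relevant: string[] };

// Your retrieval function, returning chunk ids such as `${source}#${chunk_idx}`.
type Retrieve = (question: string, k: number) => Promise<string[]>;

// Fraction of questions where at least one relevant chunk shows up in the top k.
export async function hitRateAtK(cases: EvalCase[], retrieve: Retrieve, k = 5): Promise<number> {
  let hits = 0;
  for (const { question, relevant } of cases) {
    const ids = await retrieve(question, k);
    if (ids.some((id) => relevant.includes(id))) hits++;
  }
  return cases.length === 0 ? 0 : hits / cases.length;
}
```

Rerun it after each chunk-size change. Once the number clears 0.85, move on to index tuning.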

Build the index after bulk loading, not before

HNSW indexing at insert time is slow. If you load 500,000 documents and the HNSW index exists, every INSERT pays the graph update cost. The fast path: load all rows with the index dropped, then build it once with CREATE INDEX. On a table of 500,000 rows with 1,536-dim embeddings, a cold HNSW build takes roughly 8 to 12 minutes on 4 vCPUs. That is far cheaper than the cumulative insert overhead.

-- drop the index before bulk load
DROP INDEX IF EXISTS documents_embedding_idx;

-- ... run your ingest pipeline ...

-- rebuild once after load
CREATE INDEX documents_embedding_idx
  ON documents USING hnsw (embedding vector_cosine_ops)
  WITH (m = 16, ef_construction = 64);

The bottom line

The full pipeline is about 120 lines of TypeScript and three SQL statements. pgvector 0.8.x is stable enough for production, HNSW is the right default index for most teams, and the two things that matter most for answer quality are chunk size and staying consistent between embed-at-ingest and embed-at-query time (same model, same preprocessing). Dedicated vector DBs are not wrong; they are just a layer you do not need until your row count passes 50M or your recall requirements get strict enough to warrant a tuning team.

What chunk size worked best for your use case? Drop it in the comments.

GDS K S · thegdsks.com · follow on X @thegdsks

Good retrieval beats a better model every time.

TanStack shipped a postmortem for the 42-package npm compromise. Here is what every project should change this week.

GDS K S — Fri, 29 May 2026 02:33:59 +0000

TanStack shipped a postmortem for the 42-package npm compromise. Here is what every project should change this week.

On May 11, 2026, between 19:20 and 19:26 UTC, an attacker published 84 malicious versions across 42 packages in the @tanstack scope. The attacker did not steal a maintainer's npm credentials. They hijacked the build pipeline itself, and the packages they shipped carried valid SLSA provenance attestations. That last part changes something important about how the ecosystem thinks about supply chain trust.

TanStack published a full postmortem. This piece walks through the attack chain, explains what made this incident novel, and gives you a concrete checklist for your own project.

TL;DR

What	Detail
Date	May 11, 2026, 19:20 to 19:26 UTC
Scope	42 @tanstack packages, 84 malicious versions
Worm reach	170+ packages total after self-propagation
Detection	External researcher flagged it within 6 minutes
Full deprecation	~1 hour 43 minutes after first publish
Advisory	GHSA-g7cv-rxg3-hmpx
Novel claim	First documented malicious npm package carrying valid SLSA provenance

1. What happened and when

The attacker, operating under accounts zblgg and voicproducoes, targeted the TanStack Router/Start monorepo. The Query, Table, Form, Virtual, Store, and AI packages were not affected. Only the Router/Start monorepo contained the vulnerable workflow configuration.

At 19:20 UTC the first malicious versions landed. By 19:26 the full 84-version batch hit the registry. Within minutes, an external researcher named ashishkurmi from StepSecurity spotted the anomaly: an unusual optionalDependencies entry pointing to a GitHub fork. No internal alerting triggered on TanStack's side.

TanStack deprecated the malicious versions 1 hour 43 minutes after the first publish. npm removed the tarballs between 22:13 and 23:55 UTC, finishing roughly 4.5 hours after the first malicious publish.

The payload was a 2.3 MB obfuscated file named router_init.js. It harvested credentials (GitHub tokens, AWS keys, Vault tokens, Kubernetes service accounts, SSH keys, GCP credentials), exfiltrated them over the Session/Oxen P2P messenger network, and then used any stolen publish-capable tokens to republish itself to every other package the victim could write to. It also installed persistence mechanisms in .claude/settings.json hooks, VS Code task injection, and a systemd monitoring service. If the stolen GitHub token was later revoked, the payload wiped the home directory.

Secondary victims included @mistralai/mistralai, 40-plus @uipath packages, and 19 packages in aviation-related namespaces. Wiz attributes the campaign, named "Mini Shai-Hulud" internally, to a threat group called TeamPCP, linked to prior SAP, Checkmarx, and Trivy compromises.

2. The three-primitive attack chain

Most supply chain coverage stops at "compromised package." The TanStack incident is worth studying in detail because the attacker chained three distinct primitives to get from zero access to a signed publish on a major open-source project.

Primitive 1: The Pwn Request

A "Pwn Request" is a specific GitHub Actions anti-pattern. When a workflow uses pull_request_target as its trigger, it runs in the context of the base repository rather than the fork. That means it has access to base repository secrets. The intent of pull_request_target is to let maintainers do things like post comments on pull requests from forks without exposing write tokens to fork code.

The problem: if the workflow also checks out the pull request's code and executes it, you get fork code running with base repository privileges. TanStack's bundle-size.yml workflow had this pattern.

The attacker opened a PR from a fork. The workflow executed the fork's code with base repo context.

Primitive 2: Cache poisoning across trust boundaries

The malicious fork code poisoned the pnpm package store cache. It wrote a 1.1 GB cache entry under the exact key that the legitimate release.yml workflow would later restore.

This is the trust-boundary crossing. The bundle-size workflow (lower trust, triggered by PRs) and the release workflow (higher trust, triggered by maintainer merges) shared a cache key namespace. The attacker wrote to cache from the low-trust context. The high-trust context read from it without re-validating.

The poisoned cache entry sat undetected for eight hours before the release workflow pulled it.

Primitive 3: OIDC token extraction from runner memory

Here is the part that bypasses npm credential protections entirely.

GitHub Actions supports OIDC-based publishing. Instead of storing a long-lived npm token in your repository secrets, your workflow requests a short-lived OIDC token from GitHub at publish time. npm's trusted publisher feature accepts this token. The design assumes that only the intended workflow step can request and use that token.

The attacker's payload included binaries that read /proc/<pid>/mem on the GitHub Actions runner. Processes in the runner environment, including the GitHub Actions agent, hold the OIDC token in memory while the job runs. The attacker extracted that token directly from memory and used it to authenticate npm publishes, bypassing the actual publish step in the release workflow.

This is why the packages carried valid SLSA provenance attestations. The attestation records that the package shipped from the expected repository and workflow. From Sigstore's perspective, that was true. The attacker did not forge the attestation. They hijacked the pipeline mid-run and minted legitimate credentials within it.

3. Why valid SLSA provenance on a malicious package matters

SLSA (Supply chain Levels for Software Artifacts) provenance is one of the main signals the npm ecosystem has been building toward for trusted package distribution. The idea: a package with SLSA provenance attestation proves it came from a specific source commit in a specific workflow. Consumers can verify this cryptographically.

The TanStack incident stands as the first documented case of a malicious npm package carrying SLSA provenance that the attacker did not forge. Sigstore verified the build correctly. The provenance was real. The code running through the pipeline was not safe.

SLSA provenance answers the question "did this package build how the maintainer intended?" It does not answer "did the build pipeline run clean before the build started?" Those are different questions, and the ecosystem has largely treated them as the same question.

This does not make SLSA provenance worthless. A package with no provenance is less trustworthy than one with provenance. But it does mean provenance is a necessary condition, not a complete one. The signal has a new attack surface.

What a stronger version of SLSA provenance would need: a way to attest that the cache restored before the build was clean, that no cache was shared across trust contexts, and that OIDC token issuance was limited to a specific workflow step rather than available to any code running in the job.

4. Lockdown checklist for your project this week

Run through this before your next release.

Audit your package-lock for affected versions

# Flag installed versions covered by published advisories
npm audit
npx better-npm-audit audit

# List every installed @tanstack version, including transitive dependencies
npm ls --all 2>/dev/null | grep @tanstack/

# Verify against the advisory (GHSA-g7cv-rxg3-hmpx)
# Affected: @tanstack/* versions published 2026-05-11 between 19:20 and 19:26 UTC
#           (installable until npm finished removing the tarballs at 23:55 UTC)
# Safe: any version published before May 11 or after npm confirmed tarball removal

If you pulled a new install or ran CI between May 11 19:20 UTC and May 11 23:55 UTC, treat your build environment as potentially compromised. Rotate any credentials that were present in that environment.
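If you would rather check the lockfile than the installed tree, a short script can list every @tanstack entry, including nested copies that `--depth=0` style checks miss. This sketch makes several assumptions. It handles npm's lockfileVersion 2/3 format (the `packages` map keyed by `node_modules/...` paths), not pnpm or yarn lockfiles. It also does not know which versions are malicious, so compare its output against GHSA-g7cv-rxg3-hmpx yourself. `tanstackEntries` and `readLockfile` are hypothetical names.

```typescript
import { readFileSync } from "node:fs";

type Lockfile = { packages?: Record<string, { version?: string }> };
type Entry = { name: string; version: string; path: string };

export function readLockfile(path = "package-lock.json"): Lockfile {
  return JSON.parse(readFileSync(path, "utf8")) as Lockfile;
}

// List every @tanstack package in a v2/v3 package-lock, including transitive copies.
export function tanstackEntries(lock: Lockfile): Entry[] {
  const marker = "node_modules/";
  const out: Entry[] = [];
  for (const [path, meta] of Object.entries(lock.packages ?? {})) {
    const idx = path.lastIndexOf(marker);
    if (idx === -1) continue; // "" is the root project; workspace paths have no marker
    const name = path.slice(idx + marker.length);
    if (name.startsWith("@tanstack/")) {
      out.push({ name, version: meta.version ?? "unknown", path });
    }
  }
  return out;
}
```

Usage: `console.table(tanstackEntries(readLockfile()))`.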

Harden your GitHub Actions workflows

The Pwn Request pattern is the root primitive. Audit every workflow file for pull_request_target triggers.

# DANGEROUS: pull_request_target that checks out and runs fork code
on:
  pull_request_target:
    types: [opened, synchronize]

jobs:
  build:
    runs-on: ubuntu-latest
    steps:
      - uses: actions/checkout@v4
        with:
          ref: ${{ github.event.pull_request.head.sha }}  # THIS IS THE PROBLEM
      - run: npm ci && npm run build  # fork code running with base repo context

# SAFER: split into two workflows
# Workflow 1: runs on pull_request (fork context, no secrets)
name: Build PR
on:
  pull_request:
jobs:
  build:
    runs-on: ubuntu-latest
    steps:
      - uses: actions/checkout@v4  # checks out fork code, no secret access
      - run: npm ci && npm run build
      - uses: actions/upload-artifact@v4
        with:
          name: pr-artifacts
          path: ./dist

# Workflow 2: runs on workflow_run (base context, has secrets, reads artifacts not code)
on:
  workflow_run:
    workflows: ["Build PR"]  # must match Workflow 1's name
    types: [completed]
jobs:
  comment:
    runs-on: ubuntu-latest
    permissions:
      actions: read         # needed to download another run's artifacts
      pull-requests: write  # needed to post the comment
    steps:
      - uses: actions/download-artifact@v4  # reads build output, not fork code
        with:
          name: pr-artifacts
          run-id: ${{ github.event.workflow_run.id }}
          github-token: ${{ secrets.GITHUB_TOKEN }}
      # Treat the artifact as untrusted data: read it, never execute it

If you need pull_request_target for a legitimate reason (bot comments, label management), never check out PR code in that context. Keep it to read-only GitHub API calls.
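To audit a repository with many workflow files, a rough scan can flag the dangerous combination for manual review. This sketch is a text heuristic, not a YAML parse. It flags any workflow that mentions `pull_request_target` and also references the PR's head or merge ref, so expect some false positives. The function names and the list of ref patterns are assumptions.

```typescript
import { readdirSync, readFileSync } from "node:fs";
import { join } from "node:path";

// Expressions that usually mean "check out the fork's code".
const FORK_CODE_REFS = [
  "github.event.pull_request.head.sha",
  "github.event.pull_request.head.ref",
  "github.event.pull_request.merge_commit_sha",
  "github.head_ref",
  "refs/pull/",
];

// Risky = triggered by pull_request_target AND references fork code somewhere.
export function isRiskyWorkflow(yamlText: string): boolean {
  if (!/\bpull_request_target\b/.test(yamlText)) return false;
  return FORK_CODE_REFS.some((ref) => yamlText.includes(ref));
}

// Return the paths of workflow files that need a manual look.
export function auditWorkflows(dir = ".github/workflows"): string[] {
  return readdirSync(dir)
    .filter((f) => f.endsWith(".yml") || f.endsWith(".yaml"))
    .filter((f) => isRiskyWorkflow(readFileSync(join(dir, f), "utf8")))
    .map((f) => join(dir, f));
}
```

Run `console.log(auditWorkflows())` from the repository root. An empty array means no file matched the pattern, not that the workflows are safe.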

Scope your OIDC token permissions

# Restrict permissions at the job level, not just the workflow level
jobs:
  publish:
    runs-on: ubuntu-latest
    permissions:
      id-token: write    # only the publish job gets OIDC
      contents: read
    steps:
      - uses: actions/checkout@v4
      - run: npm publish --provenance

Do not grant id-token: write at the workflow level if only one job needs it. The narrower the scope, the fewer jobs (and the less code) can ever mint a publish-capable token.

Isolate your cache keys by trust level

# Separate cache keys for PR workflows vs release workflows
- uses: actions/cache@v4
  with:
    path: ~/.pnpm-store
    key: release-pnpm-${{ runner.os }}-${{ hashFiles('**/pnpm-lock.yaml') }}
    # Never share this key with pull_request_target workflows

Use different key prefixes for PR-triggered and release-triggered workflows, so an ordinary cache save in a PR workflow never lands under a key the release workflow restores. This is not a full defense. Code running in a base-context job can call the cache service directly and write under any key it likes, so the real fix is still never running fork code under pull_request_target. Key isolation removes the shared key namespace this attack relied on.

Check for persistence artifacts if you ran a CI job during the window

# Check for the gh-token-monitor service (one of the payload's persistence mechanisms)
systemctl status gh-token-monitor 2>/dev/null
systemctl --user status gh-token-monitor 2>/dev/null
ls ~/.config/systemd/user/ ~/.local/share/systemd/user/ 2>/dev/null | grep monitor

# Check VS Code tasks for injected entries
grep -i monitor .vscode/tasks.json 2>/dev/null

# Check Claude settings (user-level and project-level) for injected hooks
grep -n -A 10 '"hooks"' ~/.claude/settings.json .claude/settings.json 2>/dev/null

# If you find any of these: stop and disable the service, isolate the machine, then rotate credentials

The payload's wiper triggers when someone revokes a stolen token while the daemon runs. Confirm the daemon is not present before rotating credentials, or coordinate both actions at the same instant.

5. What changes downstream if provenance is not a clean signal

Practically, for most teams consuming public packages, the immediate answer is: not much changes in workflow, but the mental model needs updating.

Provenance attestation was the "this package came from a known clean pipeline" signal. That signal is now more accurately described as "this package came from the expected repository and workflow, assuming the pipeline itself was not injected into." For widely-used OSS packages where you have no visibility into the upstream CI environment, that assumption deserves scrutiny.

Three things worth watching in the next quarter:

First, whether npm or the SLSA spec adds guidance on cache attestation. The build pipeline audit trail currently does not record what cache state was restored before the build ran. Adding that would let downstream consumers see whether a restore happened and from what source.

Second, whether GitHub adds controls to block OIDC token issuance from jobs that restored cache from a lower-trust workflow. Right now the runner process holds the token regardless of how the cache arrived. A job-level flag to drop OIDC access after a cross-context cache restore would close this specific vector.

Third, whether teams start treating @ts-nocheck and skip audit patterns in CI the same way they treat the Pwn Request pattern: as defaults that need an explicit justification written next to them. The TanStack postmortem credits an external researcher with the detection. The internal system had no alert. That is the gap to close.

The bottom line

TanStack's maintainers handled this well. They published a detailed timeline, named the advisory, credited the researcher, and documented what their internal detection missed. That level of transparency under pressure is worth acknowledging.

The incident is notable for two reasons. One is scale: 12.7 million weekly downloads on @tanstack/react-router alone means a narrow six-minute window had real blast radius potential. The other is the SLSA provenance angle. The attacker did not break the signature. They got inside the signing process.

If your project uses GitHub Actions for publishing, run the workflow audit above before your next release. The Pwn Request pattern is common, the cache isolation gap is invisible until something like this happens, and the OIDC scoping is easy to miss in a busy workflow file. None of these fixes take more than an afternoon.

How does your team currently handle CI trust boundaries between PR workflows and release workflows? Drop your setup in the comments.

GDS K S · thegdsks.com · follow on X @thegdsks

Valid provenance on a malicious package is not a cryptography failure. Pipeline isolation failed.

Google's Gemini 3.5 Flash is 4x faster than other frontier models. Here is how to call it from TypeScript.

GDS K S — Wed, 27 May 2026 17:20:41 +0000

Google's Gemini 3.5 Flash is 4x faster than other frontier models. Here is how to call it from TypeScript.

Google shipped Gemini 3.5 Flash on May 19 at Google I/O 2026. The headline claim is four times the output tokens per second of other frontier models. That is not a marketing tier label. The claim is a throughput number, and for latency-sensitive work like streaming chat, code generation, or agentic loops, it changes what is worth reaching for.

Here is what the model actually is, how to wire it up in TypeScript, and what the cost and rate limit picture looks like before you depend on it in production.

TL;DR

Dimension	Gemini 3.5 Flash	Gemini 2.5 Flash
Output speed	4x faster than other frontier models	Best price-performance for high-volume tasks
Primary use	Agentic workflows, coding, long-horizon tasks	Cost-sensitive, high-volume, reasoning tasks
Input price	$1.50 per 1M tokens	$0.30 per 1M tokens
Output price	$9.00 per 1M tokens	$2.50 per 1M tokens
Free tier	Yes (limited)	Yes (standard rate limits)
SDK package	`@google/genai`	`@google/genai`
Model ID	`gemini-3.5-flash`	`gemini-2.5-flash`
Released	May 19, 2026	2025

1. What Gemini 3.5 Flash is and where it fits

Google positions Gemini 3.5 Flash as the fast tier in the 3.5 family. The framing from the announcement is "frontier intelligence with action," which is a wordy way of saying: this model runs complex agentic tasks at a speed where the latency is not the bottleneck anymore.

The benchmarks Google published back this up. On Terminal-Bench 2.1, 3.5 Flash scores 76.2%. On MCP Atlas it hits 83.6%. On CharXiv Reasoning, a multimodal benchmark, it reaches 84.2%. Google published those scores for agentic and coding workloads, not general chat.

Where does it fit against the rest of the lineup? The 2.5 Flash is cheaper per token and designed for high-volume reasoning tasks where cost per call matters more than raw throughput. The 3.5 Flash costs more but delivers output fast enough that the wall-clock time for an agentic loop shrinks, which can lower your per-task cost even at a higher per-token rate. Google's own framing is "often at less than half the cost of other frontier models" for full tasks, not individual calls.

For most TypeScript projects, the decision point is: does your user wait for the output, or does a pipeline consume it? If a user is staring at a cursor, speed matters and 3.5 Flash is worth the price premium. If a background job is processing documents at scale, 2.5 Flash is likely the right call.

2. Install the SDK and make your first call

The SDK is @google/genai, and it requires Node.js 18 or later.

npm install @google/genai

Set your API key from Google AI Studio:

export GEMINI_API_KEY="your-key-here"

Basic call:

import { GoogleGenAI } from "@google/genai";

const ai = new GoogleGenAI({ apiKey: process.env.GEMINI_API_KEY });

const response = await ai.models.generateContent({
  model: "gemini-3.5-flash",
  contents: "Summarize the key breaking changes in Node.js 22 for a TypeScript developer.",
});

console.log(response.text);

That is the whole surface for a one-shot request. The GoogleGenAI constructor accepts the key directly or reads GEMINI_API_KEY from the environment when called with an empty object {}. Prefer the explicit key reference so your intent is clear at the call site.

Worth noting: response.text is a convenience accessor. The full response tree lives at response.candidates[0].content.parts. You only need to go that deep when handling multi-modal outputs or function call responses.
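When you do need the full tree, a small helper can split the first candidate into text and function calls. This sketch uses minimal local types that mirror the `candidates[0].content.parts` shape named above. The real SDK types carry many more fields, and `splitParts` is a hypothetical name. It skips parts flagged as model thoughts (`thought: true`).

```typescript
// Minimal local types mirroring the response tree; not the SDK's own type definitions.
type FunctionCall = { id?: string; name?: string; args?: Record<string, unknown> };
type Part = { text?: string; thought?: boolean; functionCall?: FunctionCall };
type GenResponse = { candidates?: Array<{ content?: { parts?: Part[] } }> };

// Concatenate the first candidate's answer text and collect its function calls.
export function splitParts(res: GenResponse) {
  const parts = res.candidates?.[0]?.content?.parts ?? [];
  return {
    text: parts
      .filter((p) => !p.thought) // leave out thought summaries
      .map((p) => p.text ?? "")
      .join(""),
    functionCalls: parts.flatMap((p) => (p.functionCall ? [p.functionCall] : [])),
  };
}
```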

3. Streaming responses

Four times faster output speed matters most when you stream. A blocking generateContent call holds the connection open until the model finishes. For a 1,000-token response at high throughput, that is still a perceivable wait for a user. Streaming pipes each chunk to the client as the model produces it.

import { GoogleGenAI } from "@google/genai";

const ai = new GoogleGenAI({ apiKey: process.env.GEMINI_API_KEY });

async function streamToStdout(prompt: string): Promise<void> {
  const stream = await ai.models.generateContentStream({
    model: "gemini-3.5-flash",
    contents: prompt,
  });

  for await (const chunk of stream) {
    process.stdout.write(chunk.text ?? "");
  }

  process.stdout.write("\n");
}

await streamToStdout("Write a TypeScript function that retries a promise up to N times with exponential backoff.");

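In a Next.js route handler or an Express server, you pipe chunk.text into a ReadableStream and return it as the response body. The example below sends plain text. Switch to Content-Type: text/event-stream with SSE framing if the client expects Server-Sent Events. Either way the pattern is the same: iterate the async generator, forward each chunk.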

// app/api/generate/route.ts (Next.js App Router route handler)
import { NextRequest } from "next/server";
import { GoogleGenAI } from "@google/genai";

const ai = new GoogleGenAI({ apiKey: process.env.GEMINI_API_KEY! });
const encoder = new TextEncoder(); // reuse one encoder instead of allocating per chunk

export async function POST(req: NextRequest) {
  const { prompt } = await req.json();

  const stream = await ai.models.generateContentStream({
    model: "gemini-3.5-flash",
    contents: prompt,
  });

  const readable = new ReadableStream({
    async start(controller) {
      try {
        for await (const chunk of stream) {
          controller.enqueue(encoder.encode(chunk.text ?? ""));
        }
        controller.close();
      } catch (err) {
        // surface upstream failures to the client instead of leaving the response hanging
        controller.error(err);
      }
    },
  });

  return new Response(readable, {
    headers: { "Content-Type": "text/plain; charset=utf-8" },
  });
}

The 4x throughput claim shows up in the time between the first chunk and the last. At high output speeds, the stream feels snappy from the user's side even when total token count is large.
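If your client expects Server-Sent Events rather than raw text, each chunk needs SSE framing. This sketch assumes that framing. In the SSE format, every line of the payload gets its own `data:` prefix and a blank line ends the event, which matters because model output often contains newlines. `formatSSE` and `toSSEStream` are hypothetical helpers.

```typescript
const encoder = new TextEncoder();

// Format one SSE message: one "data:" line per payload line, blank line to finish.
export function formatSSE(data: string, event?: string): string {
  const lines = data.split(/\r\n|\r|\n/).map((line) => `data: ${line}`);
  return (event ? `event: ${event}\n` : "") + lines.join("\n") + "\n\n";
}

// Wrap any async iterable of text chunks in an SSE byte stream for a Response body.
export function toSSEStream(chunks: AsyncIterable<string>): ReadableStream<Uint8Array> {
  return new ReadableStream({
    async start(controller) {
      try {
        for await (const text of chunks) {
          controller.enqueue(encoder.encode(formatSSE(text)));
        }
        controller.close();
      } catch (err) {
        controller.error(err);
      }
    },
  });
}
```

To use it with the route above, map the Gemini stream to strings first, for example `async function* texts() { for await (const c of stream) yield c.text ?? ""; }`. Then return `new Response(toSSEStream(texts()), { headers: { "Content-Type": "text/event-stream", "Cache-Control": "no-cache" } })`.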

4. Tool calling in TypeScript

Gemini 3.5 Flash handles function calling with a three-step cycle: you declare the tool, the model returns a function call request, you execute and send back the result.

One thing to know before you write any code: Gemini 3 model APIs attach a unique id to every function call. You must echo that id back in the function response or the model cannot match results to calls. This changed in the 3.x API line.

import { GoogleGenAI, Type } from "@google/genai";

const ai = new GoogleGenAI({ apiKey: process.env.GEMINI_API_KEY! });

// Step 1: Declare the tool
const getWeatherDeclaration = {
  name: "get_weather",
  description: "Returns current weather conditions for a city.",
  parameters: {
    type: Type.OBJECT,
    properties: {
      city: {
        type: Type.STRING,
        description: "City name, e.g. Tokyo",
      },
      units: {
        type: Type.STRING,
        description: "Temperature unit: celsius or fahrenheit",
      },
    },
    required: ["city"],
  },
};

// Step 2: Send the initial request
const response = await ai.models.generateContent({
  model: "gemini-3.5-flash",
  contents: "What is the weather in Oslo right now?",
  config: {
    tools: [{ functionDeclarations: [getWeatherDeclaration] }],
  },
});

// Step 3: Handle the function call
if (response.functionCalls && response.functionCalls.length > 0) {
  const call = response.functionCalls[0];

  // Your real implementation here
  const weatherData = await fetchWeatherFromYourAPI(call.args as { city: string; units?: string });

  // Build conversation history with the function result
  const history = [
    { role: "user", parts: [{ text: "What is the weather in Oslo right now?" }] },
    response.candidates![0].content,
    {
      role: "user",
      parts: [
        {
          functionResponse: {
            id: call.id,       // Required in Gemini 3.x
            name: call.name,
            response: { result: weatherData },
          },
        },
      ],
    },
  ];

  // Step 4: Get the final natural-language response
  const final = await ai.models.generateContent({
    model: "gemini-3.5-flash",
    contents: history,
    config: {
      tools: [{ functionDeclarations: [getWeatherDeclaration] }],
    },
  });

  console.log(final.text);
}

async function fetchWeatherFromYourAPI(args: { city: string; units?: string }) {
  // Placeholder. Replace with your actual weather API call.
  return { temperature: 12, condition: "cloudy", city: args.city };
}

Two practical notes. Use the Type enum imported from @google/genai for every type field in the parameter schema. The SDK's TypeScript types expect the enum, so raw strings like "object" will not type-check. Second, functionDeclarations is an array, so you can declare more than one function if your agentic workflow needs to route between them.
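The `call.args as { city: string; units?: string }` cast in Step 3 checks nothing at runtime, and the arguments come from the model. A small guard like the one below rejects malformed arguments before your real API sees them. parseWeatherArgs is a hypothetical helper, not part of @google/genai. Limiting units to the two values named in the declaration's description is an assumption.

```typescript
// Hypothetical guard for model-supplied get_weather arguments.
const UNITS = ["celsius", "fahrenheit"] as const;
type Units = (typeof UNITS)[number];
type WeatherArgs = { city: string; units?: Units };

function isUnits(v: unknown): v is Units {
  return typeof v === "string" && (UNITS as readonly string[]).includes(v);
}

function parseWeatherArgs(args: unknown): WeatherArgs {
  if (typeof args !== "object" || args === null) {
    throw new Error("get_weather: args must be an object");
  }
  const { city, units } = args as Record<string, unknown>;
  if (typeof city !== "string" || city.trim() === "") {
    throw new Error("get_weather: city must be a non-empty string");
  }
  if (units === undefined) return { city };
  if (isUnits(units)) return { city, units };
  throw new Error(`get_weather: unsupported units ${String(units)}`);
}
```

Step 3 could then call `fetchWeatherFromYourAPI(parseWeatherArgs(call.args))`. One option, not shown here, is to catch the error and send its message back in the functionResponse so the model can correct itself.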

For parallel tool calls in a single turn, the model may return more than one entry in response.functionCalls. Iterate the array, execute each, and send all results back in one follow-up request.
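Here is a sketch of that loop. It assumes the call objects carry id, name, and args, as in the example above. FunctionCallLike, runParallelCalls, and the handler map are hypothetical local names, not types exported by @google/genai. Each result keeps the id of the call it answers, which is the Gemini 3.x rule from earlier. A failing or unknown tool becomes an error payload instead of aborting the whole turn.

```typescript
// Local stand-ins shaped like the function-call objects used above;
// these are NOT imported from @google/genai.
interface FunctionCallLike {
  id?: string;
  name?: string;
  args?: Record<string, unknown>;
}

interface FunctionResponsePart {
  functionResponse: { id?: string; name?: string; response: Record<string, unknown> };
}

type ToolHandler = (args: Record<string, unknown>) => Promise<unknown>;

// Execute every call from one model turn concurrently, echoing each call's id.
async function runParallelCalls(
  calls: FunctionCallLike[],
  handlers: Record<string, ToolHandler>,
): Promise<FunctionResponsePart[]> {
  // Promise.all preserves the order of the calls array
  return Promise.all(
    calls.map(async (call) => {
      const handler = call.name ? handlers[call.name] : undefined;
      let response: Record<string, unknown>;
      if (!handler) {
        response = { error: `No handler for tool ${call.name ?? "(unnamed)"}` };
      } else {
        try {
          response = { result: await handler(call.args ?? {}) };
        } catch (err) {
          response = { error: err instanceof Error ? err.message : String(err) };
        }
      }
      return { functionResponse: { id: call.id, name: call.name, response } };
    }),
  );
}
```

The returned parts would go into a single `{ role: "user", parts }` entry appended to the history, followed by one follow-up generateContent request.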

5. Cost and rate limits

The pricing numbers above in the TL;DR table come from Google AI Studio's pricing page as of May 2026. Two practical caveats before you budget anything.

Gemini 3.5 Flash costs $1.50 per million input tokens and $9.00 per million output tokens on the paid tier. Output pricing includes thinking tokens if the model uses internal reasoning steps. In a chat or code-generation workflow, output typically runs 2 to 4 times the input token count, so budget accordingly.

At $0.30 input / $2.50 output, 2.5 Flash is meaningfully cheaper at scale. A task that generates 10,000 output tokens costs $0.025 on 2.5 Flash and $0.09 on 3.5 Flash. That is 3.6x more per call. The gap can close on multi-turn agentic work if 3.5 Flash reaches the answer in fewer turns and therefore fewer total tokens. The 4x speed advantage alone lowers wall-clock time, not the per-token bill. Test against your actual workload rather than extrapolating from single-call pricing.
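The arithmetic above is easy to rerun for your own token counts. This sketch hard-codes the per-million prices quoted in this article, so treat them as a snapshot, not live data. The model keys and the callCostUSD name are illustrative.

```typescript
// Per-million-token prices (USD) as quoted in this article; verify before budgeting.
interface Pricing {
  inputPerM: number;
  outputPerM: number;
}

const PRICES: Record<"gemini-2.5-flash" | "gemini-3.5-flash", Pricing> = {
  "gemini-2.5-flash": { inputPerM: 0.3, outputPerM: 2.5 },
  "gemini-3.5-flash": { inputPerM: 1.5, outputPerM: 9.0 },
};

// Cost of one call. Output tokens should include any thinking tokens.
function callCostUSD(p: Pricing, inputTokens: number, outputTokens: number): number {
  return (inputTokens / 1_000_000) * p.inputPerM + (outputTokens / 1_000_000) * p.outputPerM;
}

// The 10,000-output-token example from the text
const flash25 = callCostUSD(PRICES["gemini-2.5-flash"], 0, 10_000);
const flash35 = callCostUSD(PRICES["gemini-3.5-flash"], 0, 10_000);
console.log(flash25.toFixed(3), flash35.toFixed(3), (flash35 / flash25).toFixed(1)); // prints: 0.025 0.090 3.6
```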

Both models have a free tier through the Gemini API with rate limits Google does not publish precisely on the pricing page. The paid tier removes the per-day caps. If you are prototyping, the free tier is enough. If you are running production traffic, use a paid project and set a monthly spend cap in the Google Cloud console.

One hard ceiling worth knowing: Google Search grounding requests share a 5,000-prompt monthly quota across all Gemini 3 models on the free tier, then cost $14 per 1,000 queries on paid. If your tool-calling setup routes through Search grounding, that quota burns faster than you expect.

6. The bottom line

Gemini 3.5 Flash is worth adding to your model comparison list. Google's own benchmarks back the 4x output speed claim, and the numbers line up with the agentic workload focus. The TypeScript SDK is straightforward. The function calling API has one new rule compared to older Gemini versions: always echo the id field back in your function response.

The price premium over 2.5 Flash is real. Whether it pays back depends on whether your users wait for output and whether your agentic loops shrink enough in wall-clock time to offset the per-token cost difference. Run both models against your actual task shape before committing either to production.

What kind of workload are you considering Gemini 3.5 Flash for? Drop a comment, especially if you have run latency comparisons against other frontier models.

GDS K S · thegdsks.com · follow on X @thegdsks

Speed is only free if you would have paid for the wall-clock time anyway.

Build your first MCP server in TypeScript: the 2026 setup that takes 30 minutes.

GDS K S — Tue, 26 May 2026 20:07:21 +0000


I had Claude Desktop open. I needed it to query a local SQLite database without copy-pasting schema dumps into the chat. Thirty minutes later I had a working MCP server. Here is the exact path I took, stripped of dead ends.

TL;DR

Step	What you build	Time
Project setup	npm project, tsconfig, SDK install	5 min
First tool	Structured input, structured output	10 min
First resource	Read-only data the model can request	8 min
Connect Claude Desktop	Config file, restart, verify	5 min
Common pitfalls	Avoid the three bugs that kill every first attempt	2 min

What MCP actually is

Model Context Protocol is a standard for connecting AI models to external data and tools. The model issues requests, your server handles them, and the results come back in a format the model understands. That is the whole idea.

Before MCP, every tool integration was custom. OpenAI had function calling. Anthropic had tool use. Cursor had its own plugin format. MCP standardizes the wire protocol so you write one server and any compliant client can call it, whether that is Claude Desktop, Cursor, or a client you build yourself.

The three primitives you care about:

Resources: read-only data the model can fetch, like files or database rows.
Tools: functions the model can call with arguments, like running a query or sending a request.
Prompts: reusable prompt templates the client can surface to the user.

This tutorial covers tools and resources. Prompts follow the same pattern and you will not need them for most servers.

1. Project setup

Node 18 or higher required. Check with node --version.

mkdir my-mcp-server && cd my-mcp-server
npm init -y
npm install @modelcontextprotocol/sdk zod
npm install -D typescript @types/node
mkdir src
touch src/index.ts

The SDK package is @modelcontextprotocol/sdk. The version on npm as of May 2026 is 1.11.x. Zod handles schema validation for tool inputs.

Update package.json with these fields:

{
  "type": "module",
  "scripts": {
    "build": "tsc",
    "start": "node build/index.js"
  }
}

Create tsconfig.json:

{
  "compilerOptions": {
    "target": "ES2022",
    "module": "Node16",
    "moduleResolution": "Node16",
    "outDir": "./build",
    "rootDir": "./src",
    "strict": true,
    "esModuleInterop": true,
    "skipLibCheck": true
  },
  "include": ["src/**/*"],
  "exclude": ["node_modules"]
}

2. Implementing a tool

A tool is a function the model can call. You define its name, description, input schema, and handler. The model reads the description and schema to decide when and how to call it.

Here is a complete server with one tool that converts a hex color to RGB:

// src/index.ts
import { McpServer } from "@modelcontextprotocol/sdk/server/mcp.js";
import { StdioServerTransport } from "@modelcontextprotocol/sdk/server/stdio.js";
import { z } from "zod";

const server = new McpServer({
  name: "color-tools",
  version: "1.0.0",
});

server.tool(
  "hex_to_rgb",
  "Convert a hex color string to RGB components. Input must include the leading #.",
  {
    hex: z.string().regex(/^#[0-9a-fA-F]{6}$/, "Must be a 6-digit hex color, e.g. #ff5733"),
  },
  async ({ hex }) => {
    const r = parseInt(hex.slice(1, 3), 16);
    const g = parseInt(hex.slice(3, 5), 16);
    const b = parseInt(hex.slice(5, 7), 16);
    return {
      content: [
        {
          type: "text",
          text: JSON.stringify({ hex, r, g, b }),
        },
      ],
    };
  },
);

const transport = new StdioServerTransport();
await server.connect(transport);

Three things to notice:

The description string is what the model reads to decide whether to call the tool. Write it as plainly as you would write a JSDoc comment for a teammate. Vague descriptions produce missed calls or wrong inputs.

The second argument to server.tool() is the description. The third is a plain object whose values are Zod schemas (a raw shape, not a z.object(...)). The SDK turns this into a JSON Schema that the client sends to the model. Keep schemas tight: required fields only, no optional fields that do not change the output.

The return value must have a content array. Each item has a type and a text (or data for binary). Return JSON as a string inside a text item. The model can parse it from there.
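One way to make the tool testable without starting a server is to keep the conversion pure and let the handler only wrap it. The helpers below (hexToRgb, hexToRgbResult, and the TextContent type) are hypothetical names. The return shape is the content array described above.

```typescript
// Hypothetical refactor: pure conversion, testable without an MCP server.
type TextContent = { type: "text"; text: string };

function hexToRgb(hex: string): { r: number; g: number; b: number } {
  if (!/^#[0-9a-fA-F]{6}$/.test(hex)) {
    throw new Error(`Not a 6-digit hex color: ${hex}`);
  }
  return {
    r: parseInt(hex.slice(1, 3), 16),
    g: parseInt(hex.slice(3, 5), 16),
    b: parseInt(hex.slice(5, 7), 16),
  };
}

// Builds the same { content: [...] } shape the tool handler returns.
function hexToRgbResult(hex: string): { content: TextContent[] } {
  return { content: [{ type: "text", text: JSON.stringify({ hex, ...hexToRgb(hex) }) }] };
}
```

The handler then shrinks to `async ({ hex }) => hexToRgbResult(hex)`, and ordinary unit tests can cover the conversion.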

Build and test locally:

npm run build
echo '{"jsonrpc":"2.0","id":1,"method":"tools/list","params":{}}' | node build/index.js

You should see a JSON-RPC response listing hex_to_rgb. That confirms the server starts and responds to the list request.
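The MCP lifecycle expects a client to send initialize, then the notifications/initialized notification, before ordinary requests such as tools/list. If your SDK version rejects a bare tools/list, a tiny script can print the full sequence for piping into the server. This is a sketch. The protocol version string and the clientInfo values are assumptions, and the server may answer with a different protocol version that it supports.

```typescript
// Sketch: print the messages a client normally sends, one JSON object per line
// (the stdio transport is newline-delimited).
type JsonRpcMessage = {
  jsonrpc: "2.0";
  id?: number;
  method: string;
  params?: Record<string, unknown>;
};

function lifecycleFrames(protocolVersion = "2025-03-26"): string {
  const messages: JsonRpcMessage[] = [
    {
      jsonrpc: "2.0",
      id: 1,
      method: "initialize",
      params: {
        protocolVersion,
        capabilities: {},
        clientInfo: { name: "manual-test", version: "0.0.1" },
      },
    },
    // A notification: no id, and the server sends no response
    { jsonrpc: "2.0", method: "notifications/initialized" },
    { jsonrpc: "2.0", id: 2, method: "tools/list", params: {} },
  ];
  return messages.map((m) => JSON.stringify(m)).join("\n") + "\n";
}

console.log(lifecycleFrames().trimEnd());
```

Placed in src/frames.ts (a hypothetical file) and built to build/frames.js, it runs as `node build/frames.js | node build/index.js`. You should see two response lines: the initialize result, then the tool list.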

3. Implementing a resource

Resources expose read-only data the model can pull on demand. A common use case: expose the schema of your local database so the model knows the table structure before writing a query.

Add this before the transport setup:

server.resource(
  "db-schema",
  "sqlite:///local.db",
  async (uri) => {
    // In a real server, read this from your database
    const schema = `
CREATE TABLE users (
  id INTEGER PRIMARY KEY,
  email TEXT NOT NULL UNIQUE,
  created_at INTEGER NOT NULL
);
CREATE TABLE orders (
  id INTEGER PRIMARY KEY,
  user_id INTEGER REFERENCES users(id),
  total_cents INTEGER NOT NULL,
  placed_at INTEGER NOT NULL
);
    `.trim();
    return {
      contents: [
        {
          uri: uri.href,
          text: schema,
          mimeType: "text/plain",
        },
      ],
    };
  },
);

The first argument is the resource name. The second is the URI the client uses to request it. Pick a URI scheme that makes sense for your data: file, sqlite, https, or a custom scheme like myapp://.

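Resources are pull-based, and the client decides when to fetch them. In Claude Desktop that usually means the user attaches a resource to the conversation. Unlike tools, the model does not call resources on its own. If you want data pushed into every conversation automatically, that is a different pattern (system prompt injection at the client level, not a resource).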

4. Hooking it up to Claude Desktop

Build the project:

npm run build

Open your Claude Desktop config file. On macOS:

~/Library/Application Support/Claude/claude_desktop_config.json

On Windows:

%APPDATA%\Claude\claude_desktop_config.json

Add your server to the mcpServers block:

{
  "mcpServers": {
    "color-tools": {
      "command": "node",
      "args": ["/absolute/path/to/my-mcp-server/build/index.js"]
    }
  }
}

Use the absolute path. Relative paths fail silently, which is the single most common first-timer mistake. Restart Claude Desktop fully (quit from the menu bar, not just close the window). Open a new conversation. You should see a hammer icon in the input bar indicating tools are available. Type "convert #3b82f6 to RGB" and watch it call the tool.

For Cursor, the config lives at ~/.cursor/mcp.json and uses the same mcpServers JSON shape:

{
  "mcpServers": {
    "color-tools": {
      "command": "node",
      "args": ["/absolute/path/to/my-mcp-server/build/index.js"]
    }
  }
}

For a generic client or testing: the MCP Inspector from Anthropic runs tool calls through a web UI without configuring Claude Desktop.

npx @modelcontextprotocol/inspector node /absolute/path/to/build/index.js

Open the Inspector UI at port 6274 and you can fire tool calls manually and inspect the raw JSON-RPC traffic.

5. Transport choice: stdio vs HTTP

The setup above uses stdio transport. The client starts your server as a child process and communicates over stdin/stdout. This works for local tools and is the path of least resistance for Claude Desktop and Cursor.

For a remote server that two or more clients share, you need HTTP transport. The SDK ships StreamableHTTPServerTransport (from @modelcontextprotocol/sdk/server/streamableHttp.js) for this. You pair it with an HTTP framework (Hono, Express, Fastify) and handle sessions. That setup adds meaningful complexity and is worth a separate article. Start with stdio unless you are building a shared service from day one.

One rule that applies to both: never write to stdout with console.log in a stdio server. The MCP protocol uses stdout for JSON-RPC frames. A stray log line corrupts the framing and the client sees a parse error with no helpful message. Use console.error() for debugging output. Everything sent to stderr is safe.
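A stderr-only logger keeps debug output out of the JSON-RPC stream. In this sketch, formatLog and log are hypothetical names. It writes one JSON object per line through console.error and spreads the context first, so a stray context key cannot overwrite the level or message.

```typescript
// Sketch of a logger for stdio MCP servers: stdout stays reserved for JSON-RPC.
type Level = "debug" | "info" | "warn" | "error";

function formatLog(level: Level, message: string, context: Record<string, unknown> = {}): string {
  // Context goes first so the core fields always win
  return JSON.stringify({ ...context, ts: new Date().toISOString(), level, message });
}

function log(level: Level, message: string, context?: Record<string, unknown>): void {
  // console.error writes to stderr, which the client does not parse as protocol frames
  console.error(formatLog(level, message, context));
}

log("info", "color-tools starting", { tools: ["hex_to_rgb"] });
```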

6. Common pitfalls

The three mistakes I see in every first MCP server attempt:

Schema validation gaps break calls in ways that are hard to debug. If the model sends an input that does not match your Zod schema, the SDK rejects it with a generic error. The model may retry with the same bad input. Write the schema narrowly and add .describe() calls on each field to help the model understand what values are valid.

// add field-level descriptions so the model knows what to send
// (pass this object as the third argument to server.tool())
const hexInput = {
  hex: z.string()
    .regex(/^#[0-9a-fA-F]{6}$/)
    .describe("Six-digit hex color with leading #, e.g. #ff5733"),
};

Error responses need the right shape. When your tool handler throws, return a structured error instead of letting the exception propagate:

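async ({ hex }) => {
  try {
    const r = parseInt(hex.slice(1, 3), 16);
    const g = parseInt(hex.slice(3, 5), 16);
    const b = parseInt(hex.slice(5, 7), 16);
    // parseInt never throws; turn a bad parse into an explicit error
    if ([r, g, b].some(Number.isNaN)) {
      throw new Error(`Could not parse ${hex} as a hex color`);
    }
    return { content: [{ type: "text", text: JSON.stringify({ hex, r, g, b }) }] };
  } catch (err) {
    return {
      content: [{ type: "text", text: `Error: ${err instanceof Error ? err.message : "unknown"}` }],
      isError: true,
    };
  }
}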

The isError: true flag tells the client the call failed, which surfaces properly in Claude Desktop rather than showing as a successful response with error text inside.

Resource URIs must be stable. If a client caches a resource URI and your server changes it on restart, the cached reference points nowhere. Treat resource URIs like public API paths: change them only when you intend a breaking change and version them if needed.

The bottom line

MCP is not a new protocol that requires learning a whole ecosystem. The SDK is thin. You write a handler function, attach a schema, return a content array. The hard part is designing the right tools: narrow enough to be reliable, broad enough to be useful. A tool that does one thing with a clear input schema outperforms a general-purpose tool with six optional fields every time.

Build the color tool above. Get it running in Claude Desktop. Then replace the hex conversion with whatever data or action you actually want to expose. The scaffolding is identical regardless of what the tool does.

What would you expose through an MCP server if you had it running today?

GDS K S · thegdsks.com · follow on X @thegdsks

The scaffolding is 30 minutes; the tool design is the actual work.

Cursor 3 ships parallel AI agents. Here is the multi-agent workflow that actually works.

GDS K S — Tue, 26 May 2026 02:51:45 +0000


On April 2, 2026, Cursor shipped version 3.0 and called it "a unified workspace for building software with agents." The headline feature is the Agents Window: a sidebar that shows every active agent session, local or cloud, across all your repos, all at once.

I have spent the past three weeks running it on a real codebase and the experience is different enough from any previous AI coding tool that it warrants a proper walkthrough. Not a demo. The actual workflow, with the parts that break.

TL;DR

Feature	What it does	When you reach for it
Agents Window	Sidebar listing all active agent sessions	Any time you run more than one agent
Local agents	Composer 2 model, run in your open workspace	Fast iteration, short-horizon tasks
Cloud agents	Runs on Cursor's infrastructure, persists when laptop closes	Long tasks, overnight runs, heavy refactors
Local to cloud handoff	Move a session between targets mid-task	When a quick task grows into a long one
Cursor Marketplace	Plugins, MCPs, subagents, skills	Extending what any agent can reach

1. What the Agents Window actually is

Before Cursor 3, you had one agent session per window. You could open more than one Cursor window, but there was no unified view across them. The Agents Window fixes that by collecting all active sessions into a single sidebar panel.

Open it with Cmd+Shift+P and search "Agents Window". What you get is a list of every agent currently running: the task that started it, the repo it targets, and whether it runs locally or in the cloud. You can click into any session, see its chat history and file diffs, and redirect it.

The practical change is visibility. Running three agents in parallel used to mean three Cursor windows and a lot of alt-tabbing. Now you get one panel with three rows.

What it does not do: it does not merge agent output automatically, it does not prevent two agents from writing to the same file, and it does not enforce any ordering between sessions. That coordination is still your job. Which is exactly why you need a workflow, not just the feature.

2. The two execution targets and when to use each

Cursor 3 ships with two places an agent can run.

Local agents

A local agent runs in your open workspace using the Composer 2 model. It has access to your file system, your terminal, and your LSP (Language Server Protocol). When you ask it to refactor a function, it reads the file, writes the change, and you see the diff immediately. Round trip from prompt to edit runs in 5 to 15 seconds for most tasks.

Use local agents when the task has a short time horizon, when you want to watch the work happen in real time, or when the task touches files that you are also actively editing. The Composer 2 model is fast, and a local agent knows your workspace state best because it has direct file access.

Cloud agents

A cloud agent runs on Cursor's infrastructure. The job persists even when your laptop closes. You can queue a long refactor, shut the lid, and come back four hours later to a PR ready for review. Cloud agents generate screenshots and demo recordings of the result so you can verify before you merge.

Use cloud agents when the task will take longer than you want to babysit it, when you are working across more than one repository, or when you are running automations triggered from Slack, GitHub, or Linear. The Cursor Marketplace also ships subagent plugins specifically designed to extend cloud agent capabilities with external tool access.

The handoff between local and cloud goes both ways. Start something locally, realize the scope expanded, hand it to cloud. Or pull a cloud result back into a local session to do final cleanup with LSP context.

3. A worked example: refactor pipeline split across 3 agents

Here is the actual split I ran last week on a service that needed its logging replaced with structured JSON, its error handling standardized, and its test coverage filled in. Three distinct jobs with almost no overlap in the files they touched.

Setup

# Create a worktree and a new branch (-b) for each agent to avoid branch conflicts
git worktree add -b feature/structured-logging ../refactor-logging main
git worktree add -b feature/error-handling ../refactor-errors main
git worktree add -b feature/test-coverage ../refactor-tests main

Git worktrees give each agent its own working directory on a separate branch. The agents are not sharing a working tree, so there are no write conflicts at the file level. The Agents Window still shows all three in the same sidebar.

Prompt structure

Each agent gets a scoped prompt. The logging agent:

Refactor all console.log and console.error calls in src/services/
to use the structured logger at src/lib/logger.ts. Output must be
JSON with fields: level, message, context. Do not change function
signatures. Do not touch test files.

The error agent:

Standardize all try/catch blocks in src/services/ to use the
AppError class in src/errors/app-error.ts. Rethrow with the
original error as the cause property. Do not change logging calls.
Do not touch test files.

The test agent:

Add missing unit tests for src/services/ using Vitest.
Cover the three exported functions with the lowest coverage
per the attached lcov.info. Do not edit source files.

The constraint "do not touch test files" in the first two prompts is not optional. Without it, agents drift toward touching shared files and you end up with three agents that all think they own src/lib/logger.ts.

Monitoring in the Agents Window

With all three agents running, the Agents Window shows each session's current file and last action. You are not watching them run; you check back every 10 minutes to see if any of them has gone quiet or made a choice that looks wrong.

The most common failure mode: an agent finishes one subtask and then starts making "improvements" to adjacent files outside its scope. Catch this early. The diff view inside each session tab shows you exactly what files the agent has queued for commit.

Merging the results

Each agent runs on its own branch. When all three finish, the merge sequence matters. Logging changes first, since error handling depends on the logger being correct. Error handling second. Tests third, because they exercise both.

git checkout main
git merge feature/structured-logging
git merge feature/error-handling
git merge feature/test-coverage

Run the test suite after each merge, not just after the last one. If the test merge fails, you want to know which of the two prior merges introduced the problem.
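With more agents, the merge order becomes a dependency graph. The hypothetical planMerges helper below derives the same sequence as above from declared dependencies. Each branch merges after everything it depends on, and a test run follows every merge. The `npm test` default is an assumption, so swap in your own test command.

```typescript
// Hypothetical helper: dependency-ordered merges with a test run after each one.
interface AgentBranch {
  name: string;
  dependsOn: string[];
}

function planMerges(branches: AgentBranch[], testCmd = "npm test"): string[] {
  const byName = new Map(branches.map((b): [string, AgentBranch] => [b.name, b]));
  const done = new Set<string>();
  const visiting = new Set<string>();
  const commands: string[] = ["git checkout main"];

  // Depth-first visit: a branch is emitted only after all of its dependencies
  const visit = (name: string): void => {
    if (done.has(name)) return;
    if (visiting.has(name)) throw new Error(`Dependency cycle at ${name}`);
    const branch = byName.get(name);
    if (!branch) throw new Error(`Unknown branch ${name}`);
    visiting.add(name);
    branch.dependsOn.forEach(visit);
    visiting.delete(name);
    done.add(name);
    commands.push(`git merge ${name}`, testCmd);
  };

  branches.forEach((b) => visit(b.name));
  return commands;
}
```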

4. The orchestration gotchas

Parallel agents are faster than sequential agents on tasks that do not share state. But they introduce three categories of failure that a single agent session avoids.

File conflicts

Two agents writing to the same file at the same time produce a merge conflict that neither of them knows about. The only reliable prevention is prompt scoping. Give each agent an explicit list of directories it owns and an explicit list it must not touch. Worktrees help at the file system level, but they do not prevent two agents from editing the same path in different branches.
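Prompt scoping can also be checked after the fact. Suppose you collect each agent's changed paths with `git diff --name-only main...feature/structured-logging`; the three dots compare against the merge base. The sketch below flags anything outside the directories the prompt allowed or inside ones it denied. AgentScope and scopeViolations are hypothetical names. Plain directory prefixes are a simplification; a rule like "do not touch test files" may need glob matching instead.

```typescript
// Hypothetical scope check over the file list an agent's branch changed.
interface AgentScope {
  allow: string[];
  deny: string[];
}

// True when file is dir itself or lives under dir/ (so "src/services-old" is not "src/services")
const within = (file: string, dir: string): boolean =>
  file === dir || file.startsWith(dir.endsWith("/") ? dir : `${dir}/`);

function scopeViolations(changedFiles: string[], scope: AgentScope): string[] {
  return changedFiles.filter(
    (f) => !scope.allow.some((d) => within(f, d)) || scope.deny.some((d) => within(f, d)),
  );
}
```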

If you skip this and end up with conflicts, do not ask a third agent to resolve them. Resolve merge conflicts manually. The context an agent needs to resolve a three-way conflict correctly is usually larger than what fits in a useful prompt.

Branch divergence

Agents that run long enough start diverging from main in ways that require manual rebase. A 4-hour cloud agent job started on Monday morning may return to a main branch that has 12 commits it did not see. Budget time for rebase before merge, especially on active repos.

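# Before merging any agent branch, rebase it from inside its worktree
# (git refuses to check out a branch that another worktree already has)
cd ../refactor-logging
git rebase main
# resolve conflicts, then merge from the main checkout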

Cost ceiling

Three agents running in parallel burn tokens three times as fast as one. Local agents use your Cursor subscription allocation. Cursor bills cloud agents separately for compute time, though no per-minute rate appears in the public docs at time of writing. Set a scope that finishes in under two hours for each agent on the first run. You will learn the actual token and time cost from those runs and can calibrate longer jobs after.

The Agents Window does not have a built-in cost display per session at version 3.4. You get total usage in account settings. If you need per-session cost visibility, log the task start time and check account usage after the session ends.

The bottom line

The Agents Window is not magic. Treat it as a coordination surface for parallel work that you still have to design. The rule that made this actually work for me: treat each agent like a pull request reviewer who will only read the files you hand them. Scope, branch, scope again, then run.

The real gain is not speed on one task. The gain is that three independent jobs that used to take three sequential afternoons now take one. The orchestration tax is real, but it pays back at 3x velocity on the right class of work.

What kind of tasks are you splitting across agents? The comment thread from the first 90 minutes usually surfaces approaches I have not tried. Drop yours below.

GDS K S · thegdsks.com · follow on X @thegdsks

Parallel agents are faster only when you design the seams between them.

Microsoft tried to kill the printer driver. Healthcare said no.

GDS K S — Sat, 23 May 2026 06:36:49 +0000

Microsoft tried to kill the printer driver. 90% of US healthcare said no.

In late 2025, Microsoft put a line on the Windows Roadmap that should have read as routine. Starting January 2026, Windows Update would stop shipping legacy V3 and V4 printer drivers. Modern Print Platform only. Goodbye to a decade of brittle vendor blobs.

In February 2026 they quietly took it back. The line vanished from the roadmap. The official statement told users that no action is required. Existing printers will keep working. The deprecation, for now, sits on hold.

Microsoft holds more market power than almost any company in history. They tried to retire a category of driver that Microsoft itself deprecated back in September 2023. They could not actually pull it off. The reason sits in every hospital in the United States, and it makes a noise like a 1990s modem.

TL;DR

Thing	Status
V3 and V4 printer drivers	Deprecated since September 2023, still alive
January 2026 deprecation push	Announced, then retracted in February 2026
US healthcare communication that still runs on fax	About 70 percent
Once you count EHR linked faxing	Closer to 90 percent
ATM transactions still running on COBOL	About 95 percent
Online banking transactions touching COBOL	More than 40 percent
Time horizon on this stuff actually dying	Decades, not quarters

1. The headline that almost happened

The original Microsoft plan looked clean. V3 and V4 driver models carried known security and stability problems. Modern Print Platform, the IPP-based replacement, outperforms them in almost every measurable way. Microsoft already deprecated the old drivers two and a half years ago. The January 2026 update would have completed the cleanup.

That plan sits in the archive now. Tom's Hardware and Windows Central covered the original announcement. The retraction came after Microsoft "received feedback." Stripped of corporate politeness, "received feedback" reads as follows: some quite large customers told Microsoft, in writing, that breaking the printer pipeline would break the hospital pipeline, and that the hospital pipeline runs on fax.

2. The fax number you cannot believe

Here is the statistic that broke my brain when I first read it. Roughly 70 percent of healthcare communication in the United States still moves over fax. When you include EHR-linked faxing, where an electronic health record system pretends to be a fax machine in order to talk to the rest of the industry, the number climbs to about 90 percent.

Ninety percent. Of the most regulated, most digitized, most money-flooded industry in the developed world. Running on a protocol that predates the personal computer.

   The 2026 healthcare comms diagram

  ┌──────────────┐         FAX           ┌──────────────┐
  │   Hospital A │  ─────────────────▶   │   Clinic B   │
  │   (modern    │                       │   (modern    │
  │    EHR)      │                       │    EHR)      │
  └──────────────┘                       └──────────────┘
        │                                       │
        ▼                                       ▼
   Pretends to be                          Pretends to be
   a fax machine                           a fax machine
        │                                       │
        ▼                                       ▼
  ╔═════════════════════════════════════════════════════╗
  ║   90% of the actual traffic goes over fax anyway    ║
  ╚═════════════════════════════════════════════════════╝

That diagram explains what Microsoft hit when they tried to ship the driver change. The driver path covers more than home offices. The driver path runs through compliance pipelines that no single engineering team owns. Break the driver layer in January, and somebody's referral cannot reach somebody else's prior authorization in February. That outcome does not fit a "we will respond to feedback" narrative. That outcome makes a 60 Minutes segment.

3. The other infrastructure that refuses to die

Fax counts as the most visible example. Not the only one. The pattern shows up everywhere stable infrastructure built up decades of edge cases. IBM has said for years, in slightly louder volumes each year, that COBOL still runs about 95 percent of ATM transactions and more than 40 percent of online banking. The COBOL workforce is aging out. The replacements never arrived. The systems keep running.

Same pattern with:

System	Year designed	Still doing real work in 2026
Fax	1843 (concept), 1960s mainstream	Yes, in healthcare and government
COBOL	1959	Yes, in banks and insurance
FORTRAN	1957	Yes, in scientific computing
SQL	1974	Yes, almost everywhere
Email (SMTP)	1982	Yes, the protocol you read every day
HTTP	1991	Yes, you are reading this over it

We tell each other we live in a world of rapid change. The world actually sits on one of the most stable substrates the species has ever built. The application layer churns. The substrate hardly moves at all.

4. The lesson for software you ship today

You will not build fax machines. You will, almost certainly, write code that outlives your current job, your current company, and possibly your current career. That outcome sits at the heart of the COBOL story that nobody puts on a slide. The COBOL devs in 1985 did not know their code would still run in 2026. They just shipped.

The code you wrote last week might still serve as a production database adapter in 2040. The defaults you picked stand a chance of becoming invariants for some future maintainer who has never met you. Five practical rules that pay back over the decade-scale arc of code:

Rule 1: Comment the boundary, not the line

Your future maintainer can read your code. They cannot read your decision tree. Write down why a particular flag exists, why a particular workaround sits where it does, why a particular value lives as a constant. Skip the obvious. Document the negotiations.

# bad
TIMEOUT = 47

# good
# Set to 47 seconds because the partner auth gateway has a hard 50s limit
# and we observed 1-2s of jitter from our load balancer in the May 2023
# postmortem. Do not raise without coordinating with the integrations team.
TIMEOUT = 47

The bad comment captures what the code already says. The good comment captures the negotiation that produced the number, which is the part that erases first.

Rule 2: Pick formats that read in plain text

JSON, CSV, plain SQL, basic English logs. The dependency on a binary format with proprietary tooling bites archaeologists hardest. If somebody can cat the file in 2046 and start guessing what it does, you have done them a favor that pays back forever.

The fax format is plain enough that a forensic analyst can read it with the right hardware. COBOL source is plain enough that a junior dev with a manual can read it. The systems that died fastest in the 1990s and 2000s were the ones that depended on a binary tool that the vendor stopped supporting. Choose against that future.

Rule 3: Write the migration script you wish someone had written for you

Every meaningful schema change should ship with the SQL or code that undoes it, or that walks the data from the old shape to the new one. Future you, or future someone, will thank you.

-- Forward migration
ALTER TABLE users ADD COLUMN preferred_locale VARCHAR(10) DEFAULT 'en-US';
UPDATE users SET preferred_locale = 'en-GB'
  WHERE country_code IN ('GB', 'IE', 'AU', 'NZ');

-- Down migration (commit this in the same file)
ALTER TABLE users DROP COLUMN preferred_locale;

Tools like Alembic, Flyway, Liquibase, and Sequelize migrations give this discipline a standard structure. If your team runs migrations as ad-hoc scripts typed into pgAdmin, you are accumulating technical debt that compounds with every release.
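The up/down pairing can also be tested. The in-memory sketch below mirrors the SQL above (same column, same country list), so a test can assert that down(up(rows)) returns the original rows. It talks to no database. The Migration shape and names are hypothetical, and real tools such as Alembic or Flyway have their own formats.

```typescript
// In-memory stand-in for the users table; no real database involved.
type UserRow = { id: number; country_code: string; preferred_locale?: string };

interface Migration {
  name: string;
  up(rows: UserRow[]): UserRow[];
  down(rows: UserRow[]): UserRow[];
}

const EN_GB_COUNTRIES = ["GB", "IE", "AU", "NZ"];

const addPreferredLocale: Migration = {
  name: "2026_05_add_preferred_locale",
  // Forward: add the column with the 'en-US' default, then backfill en-GB countries
  up: (rows) =>
    rows.map((r) => ({
      ...r,
      preferred_locale: EN_GB_COUNTRIES.includes(r.country_code) ? "en-GB" : "en-US",
    })),
  // Down: drop the column again
  down: (rows) =>
    rows.map((r) => {
      const copy = { ...r };
      delete copy.preferred_locale;
      return copy;
    }),
};
```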

Rule 4: Version your wire formats from day one

The number one source of unkillable legacy infrastructure is a public protocol that grew without a version field. Fax, conceived in 1843, gained version negotiation only when CCITT standardized it. Email carries three decades of bolt-on capability negotiation (the EHLO extension mechanism added in the 1990s) because SMTP shipped in 1982 without a version field. Avoid being the contributor of the next one.

// good API response, version everywhere
{
  "version": "2026-05-01",
  "data": { "...": "..." }
}

Use date-based versioning, header-based versioning, or URL-based versioning. Pick one. Use it consistently. When you need to make a breaking change in five years, the version field is the only thing that lets you do it without breaking every client at once.
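On the consuming side, the version field turns a breaking change into a branch instead of an outage. Here is a minimal sketch using the date-based scheme from the example above. The second version string and both payload shapes are hypothetical. The client dispatches on `version` and refuses versions it does not recognize rather than guessing.

```typescript
// Hypothetical payload shapes for two versions of the same resource.
type UserV1 = { version: '2026-05-01'; data: { name: string } };
type UserV2 = { version: '2031-01-15'; data: { givenName: string; familyName: string } };

// Normalized shape the rest of the client works with.
type User = { displayName: string };

function parseUser(raw: string): User {
  const body = JSON.parse(raw) as { version?: unknown };
  switch (body.version) {
    case '2026-05-01': {
      // Original shape: a single name field.
      const v1 = body as UserV1;
      return { displayName: v1.data.name };
    }
    case '2031-01-15': {
      // Breaking change: the name was split, so v1 parsing would misread it.
      const v2 = body as UserV2;
      return { displayName: `${v2.data.givenName} ${v2.data.familyName}` };
    }
    default:
      // Fail loudly on unknown or missing versions instead of guessing.
      throw new Error(`Unsupported response version: ${String(body.version)}`);
  }
}

console.log(parseUser('{"version":"2026-05-01","data":{"name":"Ada"}}'));
```

Old clients keep working against old versions. New clients opt in to the new shape one branch at a time.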

Rule 5: Write a CHANGELOG that survives the company

CHANGELOG.md, in the root of every repo you own. One entry per release. Date, version, and a sentence per change. Not generated. Written by a human. The future maintainer reads this before they read your code.

## [2026-05-12] - 2.4.1
- Fixed billing rounding bug where orders with >100 line items
  rounded the tax down by 1 cent. See incident 2026-05-09.
- Raised the partner gateway timeout from 30s to 47s. Coordinated with
  the integrations team. Do not raise further.

The CHANGELOG is the only document that gets read in 2040. Make it count.
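Because the entries are plain text, a CI job can check their shape without extra tooling. This is a minimal sketch that assumes release headings follow the ## [YYYY-MM-DD] - x.y.z pattern from the example above and that releases are listed newest first; the ordering is an assumption, not something stated earlier. `checkChangelog` is a hypothetical helper and is not part of any changelog tool.

```typescript
// Matches release headings such as "## [2026-05-12] - 2.4.1".
const HEADING = /^## \[(\d{4}-\d{2}-\d{2})\] - (\d+\.\d+\.\d+)$/;

// Returns a list of problems; an empty list means the file passes.
function checkChangelog(text: string): string[] {
  const problems: string[] = [];
  let previousDate = '';
  text.split(/\r?\n/).forEach((line, i) => {
    // Only level-two headings are release entries; bullets and prose are skipped.
    if (!line.startsWith('## ')) return;
    const match = HEADING.exec(line);
    if (match === null) {
      problems.push(`line ${i + 1}: malformed release heading "${line}"`);
      return;
    }
    const date = match[1];
    // Assumes newest-first ordering; ISO dates compare correctly as strings.
    if (previousDate !== '' && date > previousDate) {
      problems.push(`line ${i + 1}: ${date} is newer than the entry above it`);
    }
    previousDate = date;
  });
  return problems;
}

console.log(checkChangelog('## [2026-05-12] - 2.4.1\n- Raised the partner gateway timeout.'));
```

In CI, read CHANGELOG.md with `readFileSync` and fail the build whenever the returned list is non-empty.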

5. A short tour of the substrate you depend on right now

If you think your stack is modern, the following table is for you. The right column is the year the underlying protocol or format reached its current dominant form. Every one of these things runs in the path of the request that loaded this article.

Layer	Protocol or format	Year
Network	TCP/IP	1981
Domain name	DNS	1983
Email transport	SMTP	1982
Email reading	IMAP	1986
Web transport	HTTP/1.1	1997
Time format	Unix epoch	1970
Text encoding	UTF-8	1993
Image format	JPEG	1992
Image format	PNG	1996
Video format	H.264	2003
Database query language	SQL	1974
Source control	Git	2005
Container format	Tar	1979
Shell	POSIX shell	1989

The newest thing on that list is Git, and it is 21 years old. Most of the rest have been around longer than many of the people reading this article have been alive. The "modern stack" is a thin veneer of frameworks over a substrate that, in several cases, predates the personal computer.

This is not bad news. It is the most stable substrate any creative discipline has ever had to work on. Painters change pigments every century. Architects change materials every generation. Software engineers work on a foundation that has been mostly stable for 40 years. That foundation is what makes everything we build possible.

6. The honest take

A tempting story here goes "legacy is bad and we should kill it." That story misses the bigger picture. The legacy systems stayed around because they work. A hundred million transactions a day stress-tested them, in front of regulators who would happily fine the carrier that broke them. The new systems will, eventually, earn the same proof. They have not yet.

The reasonable position is humility. We are not the first generation to write important software, and we will not be the last. The substrate predates us. The substrate will probably outlast us.

In a strange way, that is reassuring rather than worrying. Microsoft cannot delete the printer driver. The fax machine still rings in your hospital. The work matters.

The bottom line

A driver deprecation that should have been routine got walked back because the substrate it sits on is older, weirder, and more important than the people deprecating it remembered. Healthcare runs on fax. Banking runs on COBOL. Your job, whatever you ship next, is going to land in someone's legacy/ directory eventually. Write it like the next person matters.

Question for the comments: what is the oldest piece of infrastructure your job still depends on, and how surprised would your CTO be to learn it is in the critical path?

GDS K S · thegdsks.com · follow on X @thegdsks

The most modern thing in your stack is the part that is about to be legacy.

Google redesigned 13 Workspace icons last week. Here is where to grab the new SVGs.

GDS K S — Fri, 22 May 2026 07:07:47 +0000

On May 18 Google started rolling out new gradient icons for thirteen of its Workspace apps. Gmail, Drive, Docs, Sheets, Slides, Calendar, Chat, Meet, Vids, Forms, Keep, Voice, and Tasks all got refreshed artwork on the web. The iOS and Android rollouts began this week.

If you build a SaaS dashboard with a "works with Google Workspace" row, or a marketing page that shows the Gmail icon next to your integration copy, you have a small problem. The icons in your codebase are now the old set, and most projects do not have a fast path to refresh them.

Here is what changed, why icon updates take so long to land in OSS libraries, and how to grab the new Google 2026 SVGs today without waiting.

TL;DR

What	Status
Apps redesigned	13 (Gmail, Drive, Docs, Sheets, Slides, Calendar, Chat, Meet, Vids, Forms, Keep, Voice, Tasks)
Visual direction	Gradient style, more distinct shape and color per app
Color rule change	Dropped the "all four Google colors" mandate
Gmail exception	Still uses more than one color, the only one in the set
Web rollout	Mid-May 2026
Mobile rollout	Late May 2026
OSS SVGs available at	thesvg.org/category/google-2026, free, no attribution

1. What changed in the Google 2026 icon set

The earlier Google Workspace icons followed a strict rule. Every product icon had to use all four Google colors: blue, red, yellow, and green. The result was a row of icons that all looked vaguely similar at small sizes. A user in the app launcher would scan a wall of red-blue-yellow-green squares and pause to read the label.

The new direction drops that rule. Each app now leans on one or two dominant colors and a clearer shape, with a soft gradient finish. Gmail is the one holdout that still keeps more than one color, because the envelope is the recognizable shape and the colors are part of the brand identity.

The icons are also larger inside the same containing box. Most apps no longer ship the rounded-square page background, so the symbol takes up the full visual area instead of floating inside a card.

You can see the new Google 2026 icons in two places today: the app launcher in the top-right of any Google site, and the New Tab page in Chrome. Open either and you are already looking at the refreshed set, even if you have not touched any setting.

2. Why icon refreshes take time to reach your project

This is the part that bites a freelancer at 5pm on a Friday.

When a major brand refreshes its mark, the icon does not appear in your bundle on its own. Someone has to source the original from the brand's media kit or extract it from the live site. Then optimize the path through SVGO. Then verify it renders the same on dark and light backgrounds. Then categorize, name, and ship.
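The verify step is the easiest one to partly automate. The following dependency-free sketch is not SVGO and optimizes nothing. The hypothetical `auditSvg` helper checks that an icon has a viewBox, so it scales cleanly, and lists the literal hex colors it uses, which helps you spot fills that might vanish on a dark or light background. Regex scanning is an assumption that holds for simple, already-optimized SVGs. It does not hold for arbitrary markup.

```typescript
interface SvgAudit {
  hasViewBox: boolean;       // without a viewBox the icon will not scale cleanly
  colors: string[];          // distinct literal hex colors, lowercased and sorted
  usesCurrentColor: boolean; // true if some paint inherits the CSS color
}

function auditSvg(svg: string): SvgAudit {
  const hasViewBox = /<svg\b[^>]*\bviewBox\s*=\s*["'][^"']+["']/i.test(svg);

  // Only attribute-style colors (fill="#EA4335", stop-color="#fbbc04") are seen;
  // colors set inside style="" or via CSS classes are not.
  const colorAttr = /\b(?:fill|stroke|stop-color)\s*=\s*["'](#[0-9a-f]{3,8})["']/gi;
  const colors = new Set<string>();
  let match: RegExpExecArray | null;
  while ((match = colorAttr.exec(svg)) !== null) {
    colors.add(match[1].toLowerCase());
  }

  return {
    hasViewBox,
    colors: Array.from(colors).sort(),
    usesCurrentColor: /currentColor/i.test(svg),
  };
}

const sampleIcon =
  '<svg xmlns="http://www.w3.org/2000/svg" viewBox="0 0 24 24">' +
  '<path fill="#EA4335" d="M0 0h12v12H0z"/><path fill="#4285F4" d="M12 0h12v12H12z"/></svg>';
console.log(auditSvg(sampleIcon));
```

A person still has to look at the rendered icon on both backgrounds. This only narrows down what to look at.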

For a single brand refresh that touches one product, the cycle takes days to weeks depending on bandwidth. For thirteen apps in one rollout, multiply that. The OSS community absorbs brand refreshes one path file at a time, and most icon catalogs run on volunteer hours.

The result is a gap. The official Google sites already show the new icons, while your app still shows the old ones. To a user who keeps Gmail open in a tab next to your dashboard, this reads as "this dashboard is stale." The icons are a small detail, and small details are what users read as signals of how current a product is.

3. Where to grab the Google 2026 SVGs today

The full Google 2026 icon set is live in the open-source library thesvg.org. All thirteen Workspace apps are in the catalog with the new gradient artwork, shipped the same week as Google's web rollout. License: free, no attribution required. The repo is on GitHub at GLINCKER/thesvg if you want to contribute, file an issue, or fork.

Install via npm:

npm install thesvg

Or download directly from the site. URLs follow a stable pattern, /icons/[brand]/[variant].svg, so you can wire them into a build step:

// src/components/GoogleIcon.tsx
// Server component or build-time loader, not a runtime fetch in production
import { readFileSync } from 'node:fs';
import { join } from 'node:path';

type IconName =
  | 'gmail' | 'google-drive' | 'google-docs'
  | 'google-sheets' | 'google-slides' | 'google-calendar'
  | 'google-chat' | 'google-meet' | 'google-vids'
  | 'google-forms' | 'google-keep' | 'google-voice'
  | 'google-tasks';

export function GoogleIcon({ name, size = 32 }: { name: IconName; size?: number }) {
  const svg = readFileSync(
    join(process.cwd(), 'public/icons', name, '2026.svg'),
    'utf-8',
  );
  return (
    <div
      style={{ width: size, height: size, display: 'inline-block' }}
      dangerouslySetInnerHTML={{ __html: svg }}
    />
  );
}

For a Vite or Next.js project, the cleaner path is to import the SVG as a component through your bundler's SVG loader. The above is the read-the-file version for projects that do not have a loader configured yet.

If you maintain an OSS app and need to migrate to the Google 2026 icons fast for a release this week, the path is: install the package, swap your existing Google icon imports for the 2026 variants, handle the Gmail edge case below, ship.
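For the build-step route mentioned earlier, the URL pattern is enough to script the refresh. This is a minimal sketch with several assumptions. It requires Node 18 or later for the global `fetch`. It assumes the site serves files at https://thesvg.org/icons/[brand]/[variant].svg. It assumes the new artwork's variant is named 2026, which mirrors the file name in the component above. It also reuses the `IconName` slugs from that component, which may not match the site's slugs. Check the host, variant, and slugs against the site before relying on this.

```typescript
import { mkdir, writeFile } from 'node:fs/promises';
import { join } from 'node:path';

// Assumptions: host and variant name; verify both against the site.
const BASE_URL = 'https://thesvg.org';
const VARIANT = '2026';

const WORKSPACE_ICONS = [
  'gmail', 'google-drive', 'google-docs', 'google-sheets', 'google-slides',
  'google-calendar', 'google-chat', 'google-meet', 'google-vids',
  'google-forms', 'google-keep', 'google-voice', 'google-tasks',
] as const;

// Builds /icons/[brand]/[variant].svg on the assumed host.
function iconUrl(brand: string, variant: string = VARIANT): string {
  return `${BASE_URL}/icons/${encodeURIComponent(brand)}/${encodeURIComponent(variant)}.svg`;
}

// Downloads every icon into public/icons/<brand>/<variant>.svg, the layout
// the read-the-file component expects.
async function refreshIcons(outDir = join(process.cwd(), 'public/icons')): Promise<void> {
  for (const brand of WORKSPACE_ICONS) {
    const res = await fetch(iconUrl(brand));
    if (!res.ok) {
      throw new Error(`Failed to fetch ${brand}: HTTP ${res.status}`);
    }
    const dir = join(outDir, brand);
    await mkdir(dir, { recursive: true });
    await writeFile(join(dir, `${VARIANT}.svg`), await res.text(), 'utf-8');
  }
}

// Call from a prebuild script; a failed download aborts the build.
// refreshIcons().catch((err) => { console.error(err); process.exit(1); });
```

Committing the downloaded files, instead of fetching them on every build, keeps builds reproducible and offline-safe.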

4. The Gmail multi-color edge case

One thing worth handling carefully in your render code. Gmail is the only app in the new Google 2026 set that keeps more than one color. The other twelve work fine with a currentColor fill or a single-color CSS override. Gmail breaks if you do that, because the multi-color fill is the brand.

If your design system applies a color prop to all logos uniformly, you need a special case for Gmail, or you ship two render paths:

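function BrandIcon({ name, color }: { name: IconName; color?: string }) {
  // Gmail keeps its multi-color fill, so it never gets a color override
  const preservesColor = name === 'gmail';
  if (preservesColor) {
    return <GoogleIcon name={name} />;
  }
  // The wrapper sets the CSS color; icons drawn with currentColor inherit it
  return (
    <span style={{ color: color ?? 'currentColor', display: 'inline-block' }}>
      <GoogleIcon name={name} />
    </span>
  );
}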

This is the kind of edge case the old four-color rule used to hide. When every icon used four colors, you knew you could not apply a single-color override to any of them. Now twelve out of thirteen work fine with an override and one does not. Update your design system docs accordingly.

5. The bigger pattern

Brand refreshes ship faster than the icon ecosystem can absorb them. This is the third major refresh of the past two years where the official site updates on day zero and the broader OSS catalog catches up over weeks. When you depend on a third-party library to ship brand assets, you are accepting a built-in lag.

The fix is not to abandon icon libraries. The fix is to know which catalogs already have the assets you need for the release you are shipping this week, and to pick accordingly. For a marketing page going live now with a "works with Google" row, you want the catalog that already has the Google 2026 set. For a long-running design system, the audit trail and naming convention matter more than speed.

The OSS community is at its best when a new resource lands and people share it before everyone has to rebuild it from scratch. That is the spirit here.

The bottom line

Google shipped new gradient icons for thirteen Workspace apps on May 18. The web rollout is live, the mobile rollout is in progress, and the new SVGs are already available as OSS at thesvg.org/category/google-2026, free with no attribution. If you build product that lives next to Workspace in your users' tabs, the migration takes one afternoon.

What does your icon-refresh workflow look like when a major brand drops a redesign overnight? Drop a comment with your current setup.

GDS K S · thegdsks.com · building thesvg.org and Glincker · follow on X @thegdsks

Brand refreshes are the moment your icon library reveals whether it is curated or just convenient.