
Andrew Ross

Taming the Interactions API in the @google/genai SDK

Fixing TypeScript Inference in Google's Gemini Interactions API

Google released beta support for a new Interactions API on 2025-12-11, described as "a unified interface for interacting with Gemini models and agents". The latest @google/genai SDK (v1.34.0 as of writing) surfaces this beta API, unlocking preview support for Deep Research tasks with background polling.

There's just one problem: you can't use it safely or sanely in TypeScript.

The Problem Isn't Verbosity, It's Impossibility

When unknown appears in a union type, TypeScript doesn't treat it as "one possible type among many." Because unknown is the top type, it absorbs every other member: Items | unknown | string simplifies to plain unknown, and the information carried by the named members is lost.

// What Google ships:
declare interface MCPServerToolResultContent {
  result: MCPServerToolResultContent.Items | unknown | string;
  // ...
}

declare namespace MCPServerToolResultContent {
  interface Items {
    items?: Array<string | InteractionsAPI.ImageContent | unknown>;
  }
}

// What TypeScript infers: any union containing unknown simplifies to unknown
result: unknown  // shown as {} | null once undefined is ruled out; 💀 completely unusable

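You can't narrow into result.items: the Items member no longer exists in the type, so the most an in check can recover is items: unknown. A typeof result === "string" check still compiles, but only because unknown permits any typeof test, not because the union is guiding the compiler. For everything beyond primitives, the type system has given up. Your only options: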

  1. Unsafe assertions everywhere: (result as MCPServerToolResultContent.Items).items
  2. Patch the SDK to remove unknown from unions

Option 1 defeats the purpose of TypeScript. Option 2 is what I'm providing below.
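To see the collapse in isolation, here is a minimal sketch that uses hypothetical stand-in types rather than the SDK's own declarations. ImageLike, Items, Shipped, Patched, IsUnknown, and summarizeResult are illustrative names, and the example assumes strict mode.

```typescript
// Hypothetical stand-ins for the SDK shapes; nothing here is imported from @google/genai.
interface ImageLike {
  type: "image";
  data?: string;
}

interface Items {
  items?: Array<string | ImageLike>;
}

// The shipped shape: `unknown` absorbs the other members of the union.
type Shipped = Items | unknown | string;
// The patched shape: a real two-member union.
type Patched = Items | string;

// Resolves to `true` only when T is exactly `unknown` (the inner check rules out `any`).
type IsUnknown<T> = unknown extends T ? (0 extends (1 & T) ? false : true) : false;

// Each assignment only compiles if the annotation resolves to that literal.
const shippedIsUnknown: IsUnknown<Shipped> = true;
const patchedIsUnknown: IsUnknown<Patched> = false;

// With the patched union, ordinary narrowing works without assertions.
function summarizeResult(result: Patched): string {
  if (typeof result === "string") {
    return `string:${result}`;
  }
  // `result` is `Items` here, so `.items` is fully typed.
  return `items:${result.items?.length ?? 0}`;
}
```

The two constants act as compile-time assertions: swapping true and false on either line should produce a type error.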


The Patch: Removing unknown From Union Types

Before any type utilities matter, you need a working foundation. The @google/genai SDK (applicable to v1.34.0) ships with unknown polluting critical union types, making safe narrowing impossible.

Applying the Patch

Using pnpm:

# Initialize the patch (creates a temporary editable copy)
pnpm patch @google/genai

Output:

Patch: You can now edit the package at:

  /your-project/node_modules/.pnpm_patches/@google/genai@1.34.0

To commit your changes, run:

  pnpm patch-commit '/your-project/node_modules/.pnpm_patches/@google/genai@1.34.0'

Navigate to that directory, apply the edits to dist/node/node.d.ts (detailed below), then commit:

pnpm patch-commit '/your-project/node_modules/.pnpm_patches/@google/genai@1.34.0'

Or grab the patch directly:

curl -o patches/@google__genai.patch \
  https://gist.githubusercontent.com/DopamineDriven/7a826cae206bc28c2c620d1eee0dea9e/raw/@google__genai.patch

Then add to your pnpm-workspace.yaml (on older pnpm versions this setting lives under pnpm.patchedDependencies in package.json instead):

patchedDependencies:
  '@google/genai': patches/@google__genai.patch

Verifying the Patch

After committing, your pnpm-workspace.yaml should include the patchedDependencies entry above, and patches/@google__genai.patch will exist in your project root. Commit this file to version control so your team (and CI) gets the fix automatically.

What the Patch Fixes

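The patch targets 8 locations across dist/node/node.d.ts (~8,500 LOC). The diff also carries one whitespace-only hunk at line 7079 that strips a trailing space from the MergedRequestInit declaration: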

Line  Type                                         Change
1824  ContentDelta.FunctionResultDelta.Items       Array<string | ImageContent | unknown> → Array<string | ImageContent>
1917  ContentDelta.MCPServerToolResultDelta.Items  Array<string | ImageContent | unknown> → Array<string | ImageContent>
3867  FunctionResultContent.result                 Items | unknown | string → Items | string
3881  FunctionResultContent.Items                  Array<string | ImageContent | unknown> → Array<string | ImageContent>
5410  HttpResponse.json()                          json(): Promise<unknown> → json<T = unknown>(): Promise<T> (adds generic)
6997  MCPServerToolResultContent.result            Items | unknown | string → Items | string
7011  MCPServerToolResultContent.Items             Array<string | ImageContent | unknown> → Array<string | ImageContent>
8560  SchemaUnion                                  Schema | unknown → Schema

The Full Patch File

diff --git a/dist/node/node.d.ts b/dist/node/node.d.ts
index bc7365e795551b19fe390a25421e2611de75b168..5090f992b2deece3e098c311e06eee49cbda7db9 100644
--- a/dist/node/node.d.ts
+++ b/dist/node/node.d.ts
@@ -1821,7 +1821,7 @@ declare namespace ContentDelta {
     }
     namespace FunctionResultDelta {
         interface Items {
-            items?: Array<string | InteractionsAPI.ImageContent | unknown>;
+            items?: Array<string | InteractionsAPI.ImageContent>;
         }
     }
     interface CodeExecutionCallDelta {
@@ -1914,7 +1914,7 @@ declare namespace ContentDelta {
     }
     namespace MCPServerToolResultDelta {
         interface Items {
-            items?: Array<string | InteractionsAPI.ImageContent | unknown>;
+            items?: Array<string | InteractionsAPI.ImageContent>;
         }
     }
     interface FileSearchResultDelta {
@@ -3864,7 +3864,7 @@ declare interface FunctionResultContent {
     /**
      * The result of the tool call.
      */
-    result: FunctionResultContent.Items | unknown | string;
+    result: FunctionResultContent.Items | string;
     type: 'function_result';
     /**
      * Whether the tool call resulted in an error.
@@ -3878,7 +3878,7 @@ declare interface FunctionResultContent {

 declare namespace FunctionResultContent {
     interface Items {
-        items?: Array<string | InteractionsAPI.ImageContent | unknown>;
+        items?: Array<string | InteractionsAPI.ImageContent>;
     }
 }

@@ -5407,7 +5407,7 @@ export declare class HttpResponse {
      */
     responseInternal: Response;
     constructor(response: Response);
-    json(): Promise<unknown>;
+    json<T = unknown>(): Promise<T>;
 }

 /** An image. */
@@ -6994,7 +6994,7 @@ declare interface MCPServerToolResultContent {
     /**
      * The result of the tool call.
      */
-    result: MCPServerToolResultContent.Items | unknown | string;
+    result: MCPServerToolResultContent.Items | string;
     type: 'mcp_server_tool_result';
     /**
      * Name of the tool which is called for this specific tool call.
@@ -7008,7 +7008,7 @@ declare interface MCPServerToolResultContent {

 declare namespace MCPServerToolResultContent {
     interface Items {
-        items?: Array<string | InteractionsAPI.ImageContent | unknown>;
+        items?: Array<string | InteractionsAPI.ImageContent>;
     }
 }

@@ -7076,7 +7076,7 @@ export declare enum MediaResolution {
  * This type contains `RequestInit` options that may be available on the current runtime,
  * including per-platform extensions like `dispatcher`, `agent`, `client`, etc.
  */
-declare type MergedRequestInit = RequestInits & 
+declare type MergedRequestInit = RequestInits &
 /** We don't include these in the types as they'll be overridden for every request. */
 Partial<Record<'body' | 'headers' | 'method' | 'signal', never>>;

@@ -8557,7 +8557,7 @@ export declare interface Schema {
     type?: Type;
 }

-export declare type SchemaUnion = Schema | unknown;
+export declare type SchemaUnion = Schema;

 /** An image mask representing a brush scribble. */
 export declare interface ScribbleImage {

Bonus: Generic HttpResponse.json()

Line 5410 isn't strictly an unknown-in-union issue, but while we're in surgery mode anyway, making json() generic enables:

// After patch
const data = await response.json<MyExpectedType>();

// Instead of
const data = await response.json() as MyExpectedType;

Small win, same patch.
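One caveat worth keeping in mind: the generic parameter is a compile-time annotation only, so json<MyExpectedType>() is no safer at runtime than the as cast it replaces. If the payload shape matters, a runtime guard can be layered on top. The sketch below is one possible approach; SearchHit, isSearchHit, and parseAs are hypothetical names, not part of the SDK.

```typescript
// Hypothetical payload shape, used only for illustration.
interface SearchHit {
  title: string;
  text: string;
}

// A type guard that checks the shape at runtime.
function isSearchHit(value: unknown): value is SearchHit {
  if (typeof value !== "object" || value === null) return false;
  const record = value as Record<string, unknown>;
  return typeof record["title"] === "string" && typeof record["text"] === "string";
}

// Narrows an unknown payload to T, or throws if the guard rejects it.
function parseAs<T>(payload: unknown, guard: (value: unknown) => value is T): T {
  if (!guard(payload)) {
    throw new TypeError("Payload did not match the expected shape");
  }
  return payload;
}

// Usage: `payload` stands in for the value an un-typed `await response.json()` would yield.
const payload: unknown = JSON.parse('{"title":"Docs","text":"Hello"}');
const hit = parseAs(payload, isSearchHit);
```

After the call, hit is typed as SearchHit and the shape has actually been checked, which the generic alone cannot promise.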


The Type Utilities: From Patched to Pleasant

With the patch applied, you could write verbose narrowing code. But why suffer? A few type utilities transform the SDK's 18-member delta union into a discriminated map.

UnionToRecord: The Core Transformation

export type UnionToRecord<
  TUnion extends { type: string },
  TDiscriminant extends string = TUnion["type"]
> = {
  [K in TDiscriminant]: Extract<TUnion, { type: K }>;
};

This takes any discriminated union (where each member has a type field) and produces an object type keyed by those discriminants.
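A small worked example may make the transformation easier to picture. DemoDelta is a hypothetical two-member union standing in for the SDK's delta union, and labels is an illustrative record that needs one entry per discriminant.

```typescript
// UnionToRecord copied from the article.
type UnionToRecord<
  TUnion extends { type: string },
  TDiscriminant extends string = TUnion["type"]
> = {
  [K in TDiscriminant]: Extract<TUnion, { type: K }>;
};

// Hypothetical two-member union standing in for the SDK's delta union.
type DemoDelta =
  | { type: "text"; text: string }
  | { type: "image"; data: string; mime_type: string };

// Resolves to an object type with a `text` key and an `image` key.
type DemoMap = UnionToRecord<DemoDelta>;

// Indexing by a key yields exactly one member, with no narrowing needed.
const sampleText: DemoMap["text"] = { type: "text", text: "hello" };
const sampleImage: DemoMap["image"] = {
  type: "image",
  data: "aGVsbG8=",
  mime_type: "image/png",
};

// A record with one entry per discriminant; omitting a key is a compile error.
const labels: { [K in keyof DemoMap]: (delta: DemoMap[K]) => string } = {
  text: (delta) => `text(${delta.text.length})`,
  image: (delta) => `image(${delta.mime_type})`,
};
```

Each callback parameter is inferred from its key, which is the same mechanism the handler factory later in the article relies on.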

CTR (Conditional to Required)

The SDK marks delta as optional on ContentDelta. We need it required to extract the union. CTR is a surgical Required<T> that only targets optional keys:

export type Rm<T, P extends keyof T = keyof T> = {
  [S in keyof T as Exclude<S, P>]: T[S];
};

export type IsOptional<T, K extends keyof T> = undefined extends T[K]
  ? object extends Pick<T, K>
    ? true
    : false
  : false;

export type OnlyOptional<T> = {
  [K in keyof T as IsOptional<T, K> extends true ? K : never]: T[K];
};

/**
 * CTR (Conditional to Required)
 *
 * - By default: makes all **optional** properties required.
 * - With K: makes only the specified optional keys required.
 */
export type CTR<
  T,
  K extends keyof OnlyOptional<T> = keyof OnlyOptional<T>
> = Rm<T, K> & {
  [Q in K]-?: T[Q];
};
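As a quick illustration of CTR on something smaller than the SDK types, the sketch below applies it to a hypothetical DemoEvent interface. DemoEvent, WithDelta, and requireDelta are illustrative names; the utility types are repeated so the snippet stands alone.

```typescript
// Rm, IsOptional, OnlyOptional and CTR are copied from the article.
type Rm<T, P extends keyof T = keyof T> = {
  [S in keyof T as Exclude<S, P>]: T[S];
};

type IsOptional<T, K extends keyof T> = undefined extends T[K]
  ? object extends Pick<T, K>
    ? true
    : false
  : false;

type OnlyOptional<T> = {
  [K in keyof T as IsOptional<T, K> extends true ? K : never]: T[K];
};

type CTR<
  T,
  K extends keyof OnlyOptional<T> = keyof OnlyOptional<T>
> = Rm<T, K> & {
  [Q in K]-?: T[Q];
};

// Hypothetical event shape, loosely modelled on an optional `delta` field.
interface DemoEvent {
  event_type: "content.delta";
  index?: number;
  delta?: { type: "text"; text: string };
}

// Only `delta` becomes required; `index` stays optional.
type WithDelta = CTR<DemoEvent, "delta">;

// Converts a loose event into one whose `delta` is guaranteed to exist.
function requireDelta(event: DemoEvent): WithDelta {
  if (event.delta === undefined) {
    throw new Error("event has no delta");
  }
  return { ...event, delta: event.delta };
}

const ready = requireDelta({
  event_type: "content.delta",
  delta: { type: "text", text: "hi" },
});
// No optional chaining needed: `delta` is required on WithDelta.
const textLength = ready.delta.text.length;
```

Passing a key that is already required, such as "event_type", should be rejected by the K constraint, which is what keeps the utility "surgical".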

Putting It Together — GeminiEventMap

import type { Interactions } from "@google/genai";

export type UnionToRecord<
  TUnion extends { type: string },
  TDiscriminant extends string = TUnion["type"]
> = {
  [K in TDiscriminant]: Extract<TUnion, { type: K }>;
};

export type InteractionDeltas = CTR<
  Interactions.ContentDelta,
  "delta"
>["delta"];

export type GeminiEventMap = UnionToRecord<InteractionDeltas>;

On hover, GeminiEventMap resolves to:

type GeminiEventMap = {
  text: Interactions.ContentDelta.TextDelta;
  image: Interactions.ContentDelta.ImageDelta;
  audio: Interactions.ContentDelta.AudioDelta;
  document: Interactions.ContentDelta.DocumentDelta;
  video: Interactions.ContentDelta.VideoDelta;
  thought_summary: Interactions.ContentDelta.ThoughtSummaryDelta;
  thought_signature: Interactions.ContentDelta.ThoughtSignatureDelta;
  function_call: Interactions.ContentDelta.FunctionCallDelta;
  function_result: Interactions.ContentDelta.FunctionResultDelta;
  code_execution_call: Interactions.ContentDelta.CodeExecutionCallDelta;
  code_execution_result: Interactions.ContentDelta.CodeExecutionResultDelta;
  url_context_call: Interactions.ContentDelta.URLContextCallDelta;
  url_context_result: Interactions.ContentDelta.URLContextResultDelta;
  google_search_call: Interactions.ContentDelta.GoogleSearchCallDelta;
  google_search_result: Interactions.ContentDelta.GoogleSearchResultDelta;
  mcp_server_tool_call: Interactions.ContentDelta.MCPServerToolCallDelta;
  mcp_server_tool_result: Interactions.ContentDelta.MCPServerToolResultDelta;
  file_search_result: Interactions.ContentDelta.FileSearchResultDelta;
};

Every delta type, keyed by its discriminant. No narrowing required.


The Handler Pattern: Type-Safe Event Dispatch

With GeminiEventMap in place, we can build a handler factory that infers types from the event key:

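// Mapping over K and indexing by K yields a union of per-event registrations,
// so a mixed handler array still type-checks under strictFunctionTypes.
type HandlerRegistration<K extends keyof GeminiEventMap = keyof GeminiEventMap> = {
  [P in K]: { event: P; handler: (data: GeminiEventMap[P]) => void };
}[K];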

protected interactionsHandler = <
  const K extends keyof GeminiEventMap = keyof GeminiEventMap
>(
  event: K,
  handler: (data: GeminiEventMap[K]) => void
) => ({ event, handler });

The const modifier on K (a const type parameter, available since TypeScript 5.0) preserves literal types, so interactionsHandler("text", ...) infers K as "text", not string.

Registering Handlers

const handlers = [
  this.interactionsHandler("text", (delta) => {
    // delta is Interactions.ContentDelta.TextDelta
    console.log(delta.text);
  }),
  this.interactionsHandler("image", (delta) => {
    // delta is Interactions.ContentDelta.ImageDelta
    console.log(delta.mime_type, delta.data);
  }),
  this.interactionsHandler("file_search_result", (delta) => {
    // delta is Interactions.ContentDelta.FileSearchResultDelta
    for (const res of delta.result) {
      console.log(res.title, res.text);
    }
  }),
] as const satisfies readonly HandlerRegistration[];

Dispatch

function dispatch(delta: InteractionDeltas) {
  const registration = handlers.find((h) => h.event === delta.type);
  if (registration) {
    // The discriminant match guarantees type alignment at runtime
    (registration.handler as (data: InteractionDeltas) => void)(delta);
  }
}

The cast in dispatch is the one escape hatch—TypeScript can't prove the runtime discriminant match guarantees type alignment, but we know it does. A pragmatic concession vs. 18 if branches.
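For readers who want to try the pattern outside a class, here is a self-contained sketch of the factory, registration, and dispatch steps. It assumes a hypothetical two-key DemoEventMap in place of GeminiEventMap, and the names on, Registration, and seen are illustrative. Registration is written as a union of per-key shapes so that the satisfies check passes under strictFunctionTypes.

```typescript
// Hypothetical stand-in for GeminiEventMap; the real map is derived from the SDK types.
type DemoEventMap = {
  text: { type: "text"; text: string };
  image: { type: "image"; data: string; mime_type: string };
};
type DemoDelta = DemoEventMap[keyof DemoEventMap];

// One registration shape per key, combined into a union.
type Registration = {
  [K in keyof DemoEventMap]: { event: K; handler: (data: DemoEventMap[K]) => void };
}[keyof DemoEventMap];

// Factory: the event key selects the handler's parameter type.
const on = <K extends keyof DemoEventMap>(
  event: K,
  handler: (data: DemoEventMap[K]) => void
) => ({ event, handler });

// Collects handler output so the effect of dispatch is observable.
const seen: string[] = [];

const handlers = [
  on("text", (delta) => {
    seen.push(`text:${delta.text}`);
  }),
  on("image", (delta) => {
    seen.push(`image:${delta.mime_type}`);
  }),
] satisfies readonly Registration[];

// Returns true when a handler was registered for the delta's type.
function dispatch(delta: DemoDelta): boolean {
  const registration = handlers.find((h) => h.event === delta.type);
  if (!registration) return false;
  // Same escape hatch as the article: the matching discriminant justifies the cast.
  (registration.handler as (data: DemoDelta) => void)(delta);
  return true;
}
```

Calling dispatch({ type: "text", text: "hi" }) routes the delta to the first handler; unregistered types simply return false, which mirrors the "register only what you need" idea.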


Before vs. After

Before: Manual Narrowing Hell

for await (const event of stream) {
  switch (event.event_type) {
    case "content.delta": {
      if (event.delta) {
        // First layer: check if result exists
        if (
          "result" in event.delta &&
          typeof event.delta.result !== "undefined"
        ) {
          // Second layer: is it a string? an array? an object with items?
          if (typeof event.delta.result === "string") {
            // handle string result...
          } else if (
            Array.isArray(event.delta.result) &&
            event.delta.result.length > 0 &&
            // Third layer: compound guards to narrow delta.type
            (event.delta.type === "url_context_result" ||
              event.delta.type === "google_search_result" ||
              event.delta.type === "file_search_result")
          ) {
            // Fourth layer: ANOTHER switch on delta.type
            switch (event.delta.type) {
              case "file_search_result": {
                for (const res of event.delta.result) {
                  // finally... we can access res.title, res.text
                }
                break;
              }
              // ... 2 more cases
            }
          } else if (
            // Back to layer 2: different shape check
            "items" in event.delta.result &&
            typeof event.delta.result.items !== "undefined" &&
            (event.delta.type === "function_result" ||
              event.delta.type === "mcp_server_tool_result")
          ) {
            // The `unknown` in the union forces this assertion
            for (const it of event.delta.result.items) {
              const item = it as Interactions.ImageContent | string; // 💀
              // ...
            }
          }
        }
        // Oh, and we still need to handle delta.type directly
        switch (event.delta.type) {
          case "image": { /* ... */ break; }
          case "google_search_call": { /* ... */ break; }
          // ... 16 more cases
        }
      }
      break;
    }
    // ... 6 more event_type cases
  }
}

After: Declarative, Composable, Typed.

const handlers = [
  this.interactionsHandler("file_search_result", (delta) => {
    // delta.result is FileSearchResult[] — no narrowing, no assertions
    for (const res of delta.result) {
      this.logger.info({ title: res.title, text: res.text });
    }
  }),
  this.interactionsHandler("image", (delta) => {
    // delta.data is the base64 payload, delta.mime_type is typed
    this.handleImageUpload(delta.data, delta.mime_type);
  }),
  this.interactionsHandler("text", (delta) => {
    this.emit("text", delta.text);
  }),
  // Register only what you need, skip what you don't
] as const satisfies readonly HandlerRegistration[];

The discriminated map pattern shifts narrowing from imperative branching to declarative registration. Handlers become composable units rather than branches in a monolithic switch.


Troubleshooting

MCP SDK Import Resolution

If your editor throws errors on the @modelcontextprotocol/sdk import while editing in .pnpm_patches, you have two options:

Option A (recommended): Ignore the editor error—it won't affect the committed patch or runtime behavior. The import resolves correctly once you're out of the patch directory.

Option B: If it's driving you mad, install the MCP SDK and add a triple-slash directive:

pnpm add -D @modelcontextprotocol/sdk

Then prepend to your patch:

 /// <reference types="node" />
+/// <reference path="../../../../../@modelcontextprotocol/sdk/dist/esm/client/index.js" />

-import type { Client } from '@modelcontextprotocol/sdk/client/index.js';
+import type { Client } from '@modelcontextprotocol/sdk';

This is purely for editor peace of mind during patching—it doesn't affect the fix itself.


Bonus: Deep Research Config Reference

At the time of writing, the Interactions API's Deep Research functionality is under two weeks old. Below is a working example configuration with the non-obvious gotchas annotated:

const gemini = this.getClient(apiKey);

const system_instruction = this.formatSystemInstruction(systemPrompt);

const input = this.formatHistoryDeepResearch(msgs, system_instruction);

const previous_interaction_id = this.previousInteractionId(msgs);

const stream = await gemini.interactions.create(
  {
    agent: "deep-research-pro-preview-12-2025",
    // Interactions.Turn[] type signature
    input,
    // "v1alpha" | "v1beta"
    api_version: "v1alpha",
    tools: [
      { type: "google_search" },
      { type: "code_execution" },
      { type: "url_context" },
      {
        type: "file_search",
        file_search_store_names: [
          "fileSearchStores/my-file_search-store-123"
        ]
      }
    ] satisfies Interactions.Tool[],
    // "audio" is another supported option
    response_modalities: ["text", "image"],
    system_instruction,
    // enables thinking events for feedback throughout the task
    agent_config: { thinking_summaries: "auto", type: "deep-research" },
    background: true,
    // when streaming is enabled `store: true` is required
    store: true,
    previous_interaction_id,
    stream: true
  },
  { stream: true }
);

Key gotchas:

  • store: true is required when stream: true; the SDK doesn't enforce this at the type level
  • previous_interaction_id enables multi-turn research sessions; parsed from event.interaction.id on interaction.complete events
  • Audio input is unsupported; audio output is supported via response_modalities
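The first two gotchas can be sketched in a few lines. The event shapes below are hypothetical and limited to the fields mentioned above (event_type, interaction.id); the real stream is an async iterable, so a plain array is used here purely to keep the example small. DemoStreamEvent, lastInteractionId, and StreamStoreRule are illustrative names, not SDK exports.

```typescript
// Hypothetical event shapes, limited to the fields the article mentions.
type DemoStreamEvent =
  | { event_type: "content.delta"; delta?: { type: "text"; text: string } }
  | { event_type: "interaction.complete"; interaction: { id: string } };

// Returns the id carried by the last `interaction.complete` event, if any.
function lastInteractionId(events: readonly DemoStreamEvent[]): string | undefined {
  let id: string | undefined;
  for (const event of events) {
    if (event.event_type === "interaction.complete") {
      id = event.interaction.id;
    }
  }
  return id;
}

// Hypothetical rule type: `store` must be literally `true` whenever `stream` is `true`.
type StreamStoreRule =
  | { stream: true; store: true }
  | { stream?: false; store?: boolean };

// Compiles; changing `store` to `false` here is rejected by the compiler.
const okParams: StreamStoreRule = { stream: true, store: true };

// Usage: feed the recorded events of one turn, then reuse the id on the next request.
const recorded: DemoStreamEvent[] = [
  { event_type: "content.delta", delta: { type: "text", text: "partial" } },
  { event_type: "interaction.complete", interaction: { id: "interaction-1" } },
];
const previousId = lastInteractionId(recorded);
```

The returned id is what would be passed as previous_interaction_id on the following turn; intersecting a rule like StreamStoreRule with your own request type is one way to get the compile-time check the SDK currently lacks.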

Closing Thoughts

The Interactions API is genuinely powerful; it enables Deep Research tasks, background polling, multi-modal outputs (documents, images, audio, etc.), MCP server integration, and more. But the TypeScript story is currently hostile. The types shipped in v1.34.0 of the @google/genai SDK were likely autogenerated from Protobuf definitions, which would explain the unknown union members that nuke type inference throughout the Interactions namespace until the patch is applied.

Until Google ships cleaner types, the patch + utility pattern provides:

  • Safe and sane narrowing without assertions
  • Composable handlers instead of nested switches
  • Full IntelliSense across all 18 delta types

The API is in beta. The shipped types will likely improve eventually. But if you want to harness bleeding edge features like Deep Research today, this is the way.

