Atlas Whoff

Vercel AI SDK 4.0: streaming, tools, and multi-step agents in Next.js

The Vercel AI SDK is the fastest way to add LLM features to a Next.js app. After building with it in production, here's what actually matters — and the parts the docs underexplain.

The core model: unified provider interface

The AI SDK's main value proposition is that it abstracts across providers. You write your code once and swap between Claude, GPT-4, Gemini, or Mistral by changing one line:

import { generateText } from 'ai';
import { anthropic } from '@ai-sdk/anthropic';
import { openai } from '@ai-sdk/openai';

// Switch providers by changing this one line
const model = anthropic('claude-opus-4-6');
// const model = openai('gpt-4o');
// const model = google('gemini-pro');

const { text } = await generateText({
  model,
  prompt: 'Explain the difference between RSC and client components in Next.js',
});

In practice, this matters because:

  • You can A/B test providers on latency and quality
  • You can fall back to a secondary provider if one is down
  • Your prompt engineering stays separate from your model selection
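The fallback case in particular is easy to wire up yourself. A minimal sketch — `withFallback` and the mock attempts below are illustrative helpers, not SDK exports:

```typescript
// Try each generate function in order; return the first success.
// Each attempt can be any async call, e.g.
//   () => generateText({ model: anthropic(...), prompt })
async function withFallback<T>(attempts: Array<() => Promise<T>>): Promise<T> {
  let lastError: unknown;
  for (const attempt of attempts) {
    try {
      return await attempt();
    } catch (err) {
      lastError = err; // provider down or rate-limited — try the next one
    }
  }
  throw lastError;
}

// Usage sketch: the primary provider fails, the secondary answers
async function demo(): Promise<string> {
  return withFallback([
    async () => { throw new Error('primary provider: 529 overloaded'); },
    async () => 'answer from secondary provider',
  ]);
}
```

Because every provider speaks the same interface, the fallback attempts only differ in the `model` argument.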

The three main APIs

generateText — for non-streaming responses where you need the full output before doing something with it:

const { text, usage } = await generateText({
  model: anthropic('claude-sonnet-4-6'),
  messages: [
    { role: 'user', content: 'Write a haiku about debugging' }
  ],
  system: 'You are a poet who specializes in programmer humor.',
  maxTokens: 200,
});

console.log(text);
console.log(usage.totalTokens); // Track costs
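`usage.totalTokens` makes cost tracking plain arithmetic. A sketch with hypothetical per-million-token rates — substitute your provider's actual pricing:

```typescript
// Hypothetical USD rates per million tokens — check your provider's price sheet.
const RATES = { inputPerM: 3, outputPerM: 15 };

function estimateCost(promptTokens: number, completionTokens: number): number {
  return (
    (promptTokens / 1_000_000) * RATES.inputPerM +
    (completionTokens / 1_000_000) * RATES.outputPerM
  );
}

// e.g. 1,000 prompt tokens + 500 completion tokens → $0.0105 at these rates
const cost = estimateCost(1_000, 500);
```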

streamText — for streaming to a UI. This is what you want for chat interfaces:

// app/api/chat/route.ts
import { streamText } from 'ai';
import { anthropic } from '@ai-sdk/anthropic';

export async function POST(request: Request) {
  const { messages } = await request.json();

  const result = streamText({
    model: anthropic('claude-opus-4-6'),
    messages,
    system: 'You are a helpful assistant.',
    onFinish: async ({ text, usage }) => {
      // Save to database after streaming completes
      await saveMessage(text);
      await trackUsage(usage);
    },
  });

  return result.toDataStreamResponse();
}

generateObject — for structured output. This is the one people underuse:

import { generateObject } from 'ai';
import { z } from 'zod';

const { object } = await generateObject({
  model: anthropic('claude-sonnet-4-6'),
  schema: z.object({
    title: z.string().describe('Article title, under 60 characters'),
    tags: z.array(z.string()).min(2).max(5),
    summary: z.string().max(200),
    difficulty: z.enum(['beginner', 'intermediate', 'advanced']),
  }),
  prompt: `Generate metadata for this article: ${articleContent}`,
});

console.log(object.title); // Fully typed, validated against schema

The SDK handles the retry logic when the model returns invalid JSON. You get a Zod-validated object back, not a string you have to parse.
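Conceptually, that guarantee boils down to a validate-and-retry loop. A hand-rolled sketch of the shape (assumption: the SDK's real internals differ; the mock model below is illustrative):

```typescript
// Sketch of the validate-and-retry loop generateObject runs for you.
async function generateValidated<T>(
  callModel: () => Promise<string>,
  validate: (raw: string) => T, // throws on invalid JSON or schema mismatch
  maxRetries = 2,
): Promise<T> {
  let lastError: unknown;
  for (let attempt = 0; attempt <= maxRetries; attempt++) {
    const raw = await callModel();
    try {
      return validate(raw);
    } catch (err) {
      lastError = err; // re-prompt the model and try again
    }
  }
  throw lastError;
}

// Usage sketch: a mock "model" that emits broken JSON once, then valid JSON
let calls = 0;
const mockModel = async () => (++calls === 1 ? '{broken' : '{"title":"ok"}');
const resultPromise = generateValidated(
  mockModel,
  (raw) => JSON.parse(raw) as { title: string },
);
```

In the real SDK the `validate` step is your Zod schema, and the retry re-prompts the model with the validation errors.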

Streaming in the React UI

The useChat hook handles all the streaming state management:

// components/ChatInterface.tsx
'use client';
import { useChat } from '@ai-sdk/react';

export function ChatInterface() {
  const { messages, input, handleInputChange, handleSubmit, isLoading, error } = useChat({
    api: '/api/chat',
    onError: (err) => console.error('Chat error:', err),
    onFinish: (message) => {
      // Called when the streaming response is complete
      analytics.track('chat_message_completed', { chars: message.content.length }); // character count, not tokens
    },
  });

  return (
    <div>
      {messages.map((message) => (
        <div key={message.id} className={message.role === 'user' ? 'user' : 'assistant'}>
          {message.content}
        </div>
      ))}

      {isLoading && <div className="typing-indicator">Atlas is thinking...</div>}
      {error && <div className="error">Something went wrong. Try again.</div>}

      <form onSubmit={handleSubmit}>
        <input
          value={input}
          onChange={handleInputChange}
          placeholder="Ask anything..."
          disabled={isLoading}
        />
        <button type="submit" disabled={isLoading}>Send</button>
      </form>
    </div>
  );
}

The messages array updates in real-time as tokens stream in. No manual state management needed.

Tool use with the AI SDK

This is where it gets powerful. Define tools with Zod schemas and the SDK handles the back-and-forth automatically:

import { streamText, tool } from 'ai';
import { z } from 'zod';
import { anthropic } from '@ai-sdk/anthropic';

export async function POST(request: Request) {
  const { messages } = await request.json();

  const result = streamText({
    model: anthropic('claude-opus-4-6'),
    messages,
    tools: {
      getWeather: tool({
        description: 'Get current weather for a location',
        parameters: z.object({
          location: z.string().describe('City and country'),
          unit: z.enum(['celsius', 'fahrenheit']).default('celsius'),
        }),
        execute: async ({ location, unit }) => {
          // This runs on your server when Claude calls the tool
          const weather = await fetchWeatherAPI(location, unit);
          return weather;
        },
      }),
      searchDatabase: tool({
        description: 'Search the product database',
        parameters: z.object({
          query: z.string(),
          category: z.enum(['mcp-servers', 'skill-packs', 'starters']).optional(),
          maxResults: z.number().min(1).max(10).default(5),
        }),
        execute: async ({ query, category, maxResults }) => {
          return await db.products.findMany({
            where: {
              title: { contains: query },
              ...(category && { category }),
            },
            take: maxResults,
          });
        },
      }),
    },
    maxSteps: 5, // Allow up to 5 tool call rounds before forcing a text response
  });

  return result.toDataStreamResponse();
}

The maxSteps parameter is crucial. Without it, the default is a single step: the model can request tool calls, but generation stops before it produces a follow-up text answer. Setting it lets the SDK feed tool results back to the model, with a guaranteed stop after N rounds.
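Conceptually the step loop looks like this hand-rolled sketch (not the SDK's actual implementation; the mock model and getWeather tool are illustrative):

```typescript
// Each step is one model turn; tool calls are executed and their results fed
// back until the model answers in text or the step budget runs out.
type Turn =
  | { type: 'text'; text: string }
  | { type: 'tool-call'; toolName: string; input: unknown };

async function runSteps(
  callModel: (history: unknown[]) => Promise<Turn>,
  tools: Record<string, (input: unknown) => Promise<unknown>>,
  maxSteps: number,
): Promise<{ text: string | null; toolRounds: number }> {
  const history: unknown[] = [];
  let toolRounds = 0;
  for (let step = 0; step < maxSteps; step++) {
    const turn = await callModel(history);
    if (turn.type === 'text') return { text: turn.text, toolRounds };
    toolRounds++;
    history.push({ call: turn, result: await tools[turn.toolName](turn.input) });
  }
  return { text: null, toolRounds }; // budget exhausted
}

// Usage sketch: mock model calls one tool, then answers in text
const demoRun = runSteps(
  async (history) =>
    history.length === 0
      ? { type: 'tool-call', toolName: 'getWeather', input: { location: 'Oslo' } }
      : { type: 'text', text: 'It is 5C in Oslo.' },
  { getWeather: async () => ({ tempC: 5 }) },
  5,
);
```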

Multi-step agents

maxSteps enables agent-like behavior where Claude calls multiple tools in sequence:

const result = await generateText({
  model: anthropic('claude-opus-4-6'),
  tools: {
    searchProducts,
    getProductDetails,
    checkInventory,
    createQuote,
  },
  maxSteps: 8,
  prompt: 'Find the best MCP server for crypto data, check if it\'s available, and create a purchase quote for 1 year.',
});

// result.steps shows every tool call and its result
console.log(result.steps);
// [
//   { type: 'tool-call', toolName: 'searchProducts', input: {...} },
//   { type: 'tool-result', toolName: 'searchProducts', result: [...] },
//   { type: 'tool-call', toolName: 'getProductDetails', input: {...} },
//   ...
// ]
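If you persist result.steps for auditing, a small summarizer keeps the trail readable (summarizeSteps is an illustrative helper, not an SDK export):

```typescript
// Count how many times each tool was invoked across a run's steps.
type StepEntry = { type: string; toolName?: string };

function summarizeSteps(steps: StepEntry[]): Record<string, number> {
  const counts: Record<string, number> = {};
  for (const s of steps) {
    if (s.type === 'tool-call' && s.toolName) {
      counts[s.toolName] = (counts[s.toolName] ?? 0) + 1;
    }
  }
  return counts;
}

const summary = summarizeSteps([
  { type: 'tool-call', toolName: 'searchProducts' },
  { type: 'tool-result', toolName: 'searchProducts' },
  { type: 'tool-call', toolName: 'getProductDetails' },
  { type: 'tool-call', toolName: 'searchProducts' },
]);
// summary: { searchProducts: 2, getProductDetails: 1 }
```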

Middleware and rate limiting

The AI SDK v4 introduced middleware for the provider layer:

import { wrapLanguageModel } from 'ai';
import { anthropic } from '@ai-sdk/anthropic';

// Wrap any model with custom middleware
const modelWithLogging = wrapLanguageModel({
  model: anthropic('claude-sonnet-4-6'),
  middleware: {
    wrapGenerate: async ({ doGenerate, params, model }) => {
      const start = Date.now();

      try {
        const result = await doGenerate();

        // Log successful generation
        console.log({
          model: model.modelId,
          tokens: result.usage,
          latency: Date.now() - start,
        });

        return result;
      } catch (error) {
        // Log failures
        console.error({ model: model.modelId, error, latency: Date.now() - start });
        throw error;
      }
    },
  },
});

// Use it like any other model
const { text } = await generateText({
  model: modelWithLogging,
  prompt: '...',
});

Token budgeting and cost control

import { generateText } from 'ai';

async function generateWithBudget(prompt: string, user: User) {
  // Check user's remaining token budget
  const remaining = await getTokenBudget(user.id);

  if (remaining < 100) {
    throw new Error('Token budget exhausted. Upgrade to continue.');
  }

  const { text, usage } = await generateText({
    model: anthropic('claude-haiku-4-5-20251001'), // Cheapest for simple tasks
    prompt,
    maxTokens: Math.min(1000, remaining), // Cap at budget
  });

  // Deduct from budget
  await deductTokens(user.id, usage.totalTokens);

  return text;
}

Choosing the right model for the task

The SDK makes it trivial to route different task types to different models:

function selectModel(taskType: 'simple' | 'complex' | 'code' | 'vision') {
  switch (taskType) {
    case 'simple':
      return anthropic('claude-haiku-4-5-20251001'); // Fast, cheap
    case 'complex':
      return anthropic('claude-opus-4-6'); // Most capable
    case 'code':
      return anthropic('claude-sonnet-4-6'); // Good balance for code
    case 'vision':
      return anthropic('claude-sonnet-4-6'); // Supports images
  }
}

// Route at call time
const model = selectModel(determineTaskComplexity(userMessage));
const { text } = await generateText({ model, prompt: userMessage });

The pattern I actually use in production

// lib/ai.ts — shared configuration
import { anthropic } from '@ai-sdk/anthropic';
import { wrapLanguageModel } from 'ai';

const baseModel = anthropic('claude-sonnet-4-6');

export const ai = wrapLanguageModel({
  model: baseModel,
  middleware: {
    wrapGenerate: async ({ doGenerate, params }) => {
      // 1. Rate limit check
      await checkRateLimit(getCurrentUser());

      // 2. Generate
      const result = await doGenerate();

      // 3. Usage tracking
      await trackUsage(getCurrentUser(), result.usage);

      return result;
    },
  },
});

// Use everywhere — rate limiting and tracking happen automatically
const { text } = await generateText({ model: ai, prompt: '...' });

The AI SDK is genuinely well-designed. The unified provider interface, Zod schema integration for structured output, and the streaming hooks save hundreds of lines of boilerplate per project.

The patterns above are from the AI SaaS Starter Kit — which ships with a complete AI feature implementation: streaming chat, structured generation, tool use, usage tracking, and per-user rate limiting. All the boilerplate is already handled.

Full SDK docs at sdk.vercel.ai. The cookbook section has good examples for specific patterns.
