The Vercel AI SDK is the fastest way to add LLM features to a Next.js app. After building with it in production, here's what actually matters — and the parts the docs underexplain.
The core model: unified provider interface
The AI SDK's main value proposition is that it abstracts across providers. You write your code once and swap between Claude, GPT-4, Gemini, or Mistral by changing one line:
import { generateText } from 'ai';
import { anthropic } from '@ai-sdk/anthropic';
import { openai } from '@ai-sdk/openai';
// Switch providers by changing this one line
const model = anthropic('claude-opus-4-6');
// const model = openai('gpt-4o');
// const model = google('gemini-pro');
const { text } = await generateText({
model,
prompt: 'Explain the difference between RSC and client components in Next.js',
});
In practice, this matters because:
- You can A/B test providers on latency and quality
- You can fall back to a secondary provider if one is down
- Your prompt engineering stays separate from your model selection
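The fallback case, for example, reduces to a small wrapper that tries each call in order. A minimal sketch — the `withFallback` helper below is ours for illustration, not part of the AI SDK:

```typescript
// Hypothetical helper, not part of the AI SDK: try each generation
// function in order and return the first result that succeeds.
type Attempt<T> = () => Promise<T>;

async function withFallback<T>(attempts: Attempt<T>[]): Promise<T> {
  let lastError: unknown;
  for (const attempt of attempts) {
    try {
      return await attempt();
    } catch (error) {
      lastError = error; // provider down or rate-limited: try the next one
    }
  }
  throw lastError;
}
```

In use, each attempt closes over a different provider's model and the same prompt, e.g. `() => generateText({ model: anthropic('claude-opus-4-6'), prompt })` first and an OpenAI model second.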
The three main APIs
generateText — for non-streaming responses where you need the full output before doing something with it:
const { text, usage } = await generateText({
model: anthropic('claude-sonnet-4-6'),
messages: [
{ role: 'user', content: 'Write a haiku about debugging' }
],
system: 'You are a poet who specializes in programmer humor.',
maxTokens: 200,
});
console.log(text);
console.log(usage.totalTokens); // Track costs
streamText — for streaming to a UI. This is what you want for chat interfaces:
// app/api/chat/route.ts
import { streamText } from 'ai';
import { anthropic } from '@ai-sdk/anthropic';
export async function POST(request: Request) {
const { messages } = await request.json();
const result = streamText({
model: anthropic('claude-opus-4-6'),
messages,
system: 'You are a helpful assistant.',
onFinish: async ({ text, usage }) => {
// Save to database after streaming completes
await saveMessage(text);
await trackUsage(usage);
},
});
return result.toDataStreamResponse();
}
generateObject — for structured output. This is the one people underuse:
import { generateObject } from 'ai';
import { z } from 'zod';
const { object } = await generateObject({
model: anthropic('claude-sonnet-4-6'),
schema: z.object({
title: z.string().describe('Article title, under 60 characters'),
tags: z.array(z.string()).min(2).max(5),
summary: z.string().max(200),
difficulty: z.enum(['beginner', 'intermediate', 'advanced']),
}),
prompt: `Generate metadata for this article: ${articleContent}`,
});
console.log(object.title); // Fully typed, validated against schema
If the model returns JSON that fails validation, the SDK surfaces a typed error you can catch and retry. On success you get a Zod-validated object back, not a string you have to parse.
Streaming in the React UI
The useChat hook handles all the streaming state management:
// components/ChatInterface.tsx
'use client';
import { useChat } from '@ai-sdk/react';
export function ChatInterface() {
const { messages, input, handleInputChange, handleSubmit, isLoading, error } = useChat({
api: '/api/chat',
onError: (err) => console.error('Chat error:', err),
onFinish: (message) => {
// Called when the streaming response is complete
analytics.track('chat_message_completed', { chars: message.content.length });
},
});
return (
<div>
{messages.map((message) => (
<div key={message.id} className={message.role === 'user' ? 'user' : 'assistant'}>
{message.content}
</div>
))}
{isLoading && <div className="typing-indicator">Atlas is thinking...</div>}
{error && <div className="error">Something went wrong. Try again.</div>}
<form onSubmit={handleSubmit}>
<input
value={input}
onChange={handleInputChange}
placeholder="Ask anything..."
disabled={isLoading}
/>
<button type="submit" disabled={isLoading}>Send</button>
</form>
</div>
);
}
The messages array updates in real-time as tokens stream in. No manual state management needed.
Tool use with the AI SDK
This is where it gets powerful. Define tools with Zod schemas and the SDK handles the back-and-forth automatically:
import { streamText, tool } from 'ai';
import { z } from 'zod';
import { anthropic } from '@ai-sdk/anthropic';
export async function POST(request: Request) {
const { messages } = await request.json();
const result = streamText({
model: anthropic('claude-opus-4-6'),
messages,
tools: {
getWeather: tool({
description: 'Get current weather for a location',
parameters: z.object({
location: z.string().describe('City and country'),
unit: z.enum(['celsius', 'fahrenheit']).default('celsius'),
}),
execute: async ({ location, unit }) => {
// This runs on your server when Claude calls the tool
const weather = await fetchWeatherAPI(location, unit);
return weather;
},
}),
searchDatabase: tool({
description: 'Search the product database',
parameters: z.object({
query: z.string(),
category: z.enum(['mcp-servers', 'skill-packs', 'starters']).optional(),
maxResults: z.number().min(1).max(10).default(5),
}),
execute: async ({ query, category, maxResults }) => {
return await db.products.findMany({
where: {
title: { contains: query },
...(category && { category }),
},
take: maxResults,
});
},
}),
},
maxSteps: 5, // Allow up to 5 tool-call rounds (the default is 1)
});
return result.toDataStreamResponse();
}
The maxSteps parameter is crucial. The default is 1, so without it the model makes a single tool call and the run ends before any follow-up text. Raising it lets the SDK execute tools, feed the results back, and repeat until the model answers in text or the step budget runs out.
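Conceptually, the loop the SDK runs for you looks something like this simplified sketch (ours, not the SDK's actual implementation — the real one also handles parallel tool calls and streaming):

```typescript
// Simplified sketch of the multi-step tool loop, assuming a model interface
// that returns either a tool call or final text on each turn.
type Turn = { toolCall?: { name: string; args: unknown }; text?: string };
type ToolFns = Record<string, (args: unknown) => Promise<unknown>>;

async function runToolLoop(
  callModel: (history: unknown[]) => Promise<Turn>,
  tools: ToolFns,
  maxSteps: number,
): Promise<string | undefined> {
  const history: unknown[] = [];
  for (let step = 0; step < maxSteps; step++) {
    const turn = await callModel(history);
    if (!turn.toolCall) return turn.text; // model answered in text: done
    // Execute the requested tool and feed the result back to the model
    const result = await tools[turn.toolCall.name](turn.toolCall.args);
    history.push({ tool: turn.toolCall.name, result });
  }
  return undefined; // step budget exhausted before a final text answer
}
```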
Multi-step agents
maxSteps enables agent-like behavior where Claude calls multiple tools in sequence:
const result = await generateText({
model: anthropic('claude-opus-4-6'),
tools: {
searchProducts,
getProductDetails,
checkInventory,
createQuote,
},
maxSteps: 8,
prompt: 'Find the best MCP server for crypto data, check if it\'s available, and create a purchase quote for 1 year.',
});
// result.steps shows every tool call and its result
console.log(result.steps);
// [
// { type: 'tool-call', toolName: 'searchProducts', input: {...} },
// { type: 'tool-result', toolName: 'searchProducts', result: [...] },
// { type: 'tool-call', toolName: 'getProductDetails', input: {...} },
// ...
// ]
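The steps array is also handy for auditing agent behavior. A small helper can summarize how often each tool ran — the Step type below is simplified from the SDK's actual step shape:

```typescript
// Simplified step shape for illustration; the SDK's real steps carry more fields.
type Step = { type: 'tool-call' | 'tool-result' | 'text'; toolName?: string };

function countToolCalls(steps: Step[]): Record<string, number> {
  const counts: Record<string, number> = {};
  for (const step of steps) {
    if (step.type === 'tool-call' && step.toolName) {
      counts[step.toolName] = (counts[step.toolName] ?? 0) + 1;
    }
  }
  return counts;
}
```

Logging these counts per request makes runaway tool chains easy to spot before they show up on your API bill.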
Middleware and rate limiting
The AI SDK v4 introduced middleware for the provider layer:
import { wrapLanguageModel } from 'ai';
import { anthropic } from '@ai-sdk/anthropic';
// Wrap any model with custom middleware
const modelWithLogging = wrapLanguageModel({
model: anthropic('claude-sonnet-4-6'),
middleware: {
wrapGenerate: async ({ doGenerate, model }) => {
const start = Date.now();
try {
const result = await doGenerate();
// Log successful generation
console.log({
model: model.modelId,
tokens: result.usage,
latency: Date.now() - start,
});
return result;
} catch (error) {
// Log failures
console.error({ model: model.modelId, error, latency: Date.now() - start });
throw error;
}
},
},
});
// Use it like any other model
const { text } = await generateText({
model: modelWithLogging,
prompt: '...',
});
Token budgeting and cost control
import { generateText } from 'ai';
async function generateWithBudget(prompt: string, user: User) {
// Check user's remaining token budget
const remaining = await getTokenBudget(user.id);
if (remaining < 100) {
throw new Error('Token budget exhausted. Upgrade to continue.');
}
const { text, usage } = await generateText({
model: anthropic('claude-haiku-4-5-20251001'), // Cheapest for simple tasks
prompt,
maxTokens: Math.min(1000, remaining), // Cap at budget
});
// Deduct from budget
await deductTokens(user.id, usage.totalTokens);
return text;
}
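Deducting tokens is half the story; converting usage into dollars makes budgets legible to users. A sketch with placeholder rates — the numbers below are illustrative, not real pricing, so check your provider's price sheet:

```typescript
type Usage = { promptTokens: number; completionTokens: number };

// Placeholder per-million-token rates in USD; substitute real pricing.
const RATES = { input: 3, output: 15 };

function estimateCostUSD(usage: Usage, rates = RATES): number {
  return (
    (usage.promptTokens / 1_000_000) * rates.input +
    (usage.completionTokens / 1_000_000) * rates.output
  );
}
```

Pairing this with the per-model routing below lets you show users a cost estimate before they commit to an expensive model.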
Choosing the right model for the task
The SDK makes it trivial to route different task types to different models:
function selectModel(taskType: 'simple' | 'complex' | 'code' | 'vision') {
switch (taskType) {
case 'simple':
return anthropic('claude-haiku-4-5-20251001'); // Fast, cheap
case 'complex':
return anthropic('claude-opus-4-6'); // Most capable
case 'code':
return anthropic('claude-sonnet-4-6'); // Good balance for code
case 'vision':
return anthropic('claude-sonnet-4-6'); // Supports images
}
}
// Route at call time
const model = selectModel(determineTaskComplexity(userMessage));
const { text } = await generateText({ model, prompt: userMessage });
The pattern I actually use in production
// lib/ai.ts — shared configuration
import { anthropic } from '@ai-sdk/anthropic';
import { wrapLanguageModel } from 'ai';
const baseModel = anthropic('claude-sonnet-4-6');
export const ai = wrapLanguageModel({
model: baseModel,
middleware: {
wrapGenerate: async ({ doGenerate, params }) => {
// 1. Rate limit check
await checkRateLimit(getCurrentUser());
// 2. Generate
const result = await doGenerate();
// 3. Usage tracking
await trackUsage(getCurrentUser(), result.usage);
return result;
},
},
});
// Use everywhere — rate limiting and tracking happen automatically
const { text } = await generateText({ model: ai, prompt: '...' });
The AI SDK is genuinely well-designed. The unified provider interface, Zod schema integration for structured output, and the streaming hooks save hundreds of lines of boilerplate per project.
The patterns above are from the AI SaaS Starter Kit — which ships with a complete AI feature implementation: streaming chat, structured generation, tool use, usage tracking, and per-user rate limiting. All the boilerplate is already handled.
Full SDK docs at sdk.vercel.ai. The cookbook section has good examples for specific patterns.