If you're building an AI chat interface today, you're likely stuck in the past. You’ve mastered the "text-in, text-out" loop—streaming responses, managing state, and polishing the UI. But in the real world, data isn't just a string of text. It’s a PDF contract, a product screenshot, or a log file.
To build truly useful AI applications, we must evolve our mental model from a "text terminal" to a "multi-modal workspace."
This guide explores the architecture, security, and code required to handle file uploads and attachments in chat. We'll bridge the gap between a simple Q&A bot and a genuine AI assistant that can see, read, and synthesize complex data.
The Core Concept: From Text-in-Text-Out to Multi-Modal Conversations
In previous tutorials, we treated the Large Language Model (LLM) as a sophisticated text processor. You feed it a string, and it streams back a string. This is the "Hello, World" of generative AI. However, the modern stack requires Multi-Part Messages.
Just as an HTTP request contains a body, headers, and metadata, a message sent to an AI model is evolving into a container that holds various data types simultaneously. The core challenge shifts from managing a string in React state (useState<string>) to managing a complex object containing a text prompt and a binary file reference.
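To make that shift concrete, here is an illustrative TypeScript sketch of a multi-part message container. The type and field names here are invented for illustration — they are not the AI SDK's actual message types:

```typescript
// Illustrative shape change: from a plain string to a multi-part message.
type TextPart = { type: "text"; text: string };
type FilePart = { type: "file"; mimeType: string; data: string }; // Base64 or URL

type MultiPartMessage = {
  role: "user" | "assistant";
  parts: Array<TextPart | FilePart>;
};

// Collapse a multi-part message back to its text-only view —
// which is all the old "text-in, text-out" loop could see.
function textOnly(message: MultiPartMessage): string {
  return message.parts
    .filter((p): p is TextPart => p.type === "text")
    .map((p) => p.text)
    .join("\n");
}

const msg: MultiPartMessage = {
  role: "user",
  parts: [
    { type: "text", text: "Summarize the attached contract." },
    { type: "file", mimeType: "application/pdf", data: "data:application/pdf;base64,..." },
  ],
};
```

The point of the sketch: the text prompt is no longer *the* message, just one part of it, and the UI state must track the whole container.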
The Analogy: The Executive Assistant and the Briefcase
Imagine you are a busy executive (the User) and you have an incredibly powerful assistant (the AI Model) working behind a frosted glass window.
- The Old Way: You slide a piece of paper under the door with a question and slide it back for an answer.
- The New Way: You need to ask your assistant to analyze a competitor's brochure (PDF) and a photo of a prototype (Image).
You cannot simply write "analyze this photo" on paper. You need to slide a briefcase through the door along with your note.
- The Note (Text Prompt): "Please analyze the attached brochure and photo, and draft a response comparing their product to ours."
- The Briefcase (Attachments): This contains the physical objects (the PDF and the image).
The assistant (AI) must now:
- Accept the Briefcase: Unlock the door and pull it in (Server-side file ingestion).
- Inspect the Contents: Open the briefcase and identify the document vs. the image (File type parsing).
- Synthesize: Read the text and "look" at the photo to form a mental concept (Multi-modal processing).
- Write: Generate the draft based on both sources (Streaming the response).
If the assistant refuses to open the briefcase (security restrictions) or doesn't have the key (validation failure), the interaction fails.
Under the Hood: The Data Flow of a Multi-Part Message
Moving from text-only to text-plus-files changes the architecture of the useChat hook interaction significantly. It is no longer a linear flow of strings.
1. The Client-Side: From File System to Data URL
On the client side, the user initiates a file selection via a hidden <input type="file" />. The browser provides a File object—a lazy handle to the file's contents, not the bytes themselves.
To send this to a Next.js server, we must serialize it. There are two primary methods:
- Base64 Encoding: The binary data is converted into an ASCII string. This string can be embedded directly into a JSON payload.
- Multipart Form Data: The request body is split into distinct parts separated by a "boundary" string. One part contains the text, another contains the raw binary data.
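The Base64 route is simple to sketch. This is a Node.js-context illustration (in the browser you would use FileReader.readAsDataURL, as the later examples do), and the helper names are invented — the SDK normally does this for you:

```typescript
// Encode raw bytes as a Data URL: "data:<mime>;base64,<payload>".
function toDataURL(bytes: Buffer, mimeType: string): string {
  return `data:${mimeType};base64,${bytes.toString("base64")}`;
}

// Reverse the process on the server: recover the MIME type and bytes.
function fromDataURL(dataUrl: string): { mimeType: string; bytes: Buffer } {
  const match = dataUrl.match(/^data:([^;]+);base64,(.+)$/);
  if (!match) throw new Error("Not a Base64 data URL");
  return { mimeType: match[1], bytes: Buffer.from(match[2], "base64") };
}

const url = toDataURL(Buffer.from("hello"), "text/plain");
// url === "data:text/plain;base64,aGVsbG8="
```

Because the result is plain ASCII, it can travel inside an ordinary JSON body — at the cost of a ~33% size increase over the raw bytes.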
The Vercel AI SDK abstracts this complexity. When you pass a File object or a Data URL into the messages array, the SDK automatically detects the content type and formats the request correctly.
2. The Server-Side: Ingestion and Normalization
Once the request hits your Next.js server, your route handler (with the SDK's help) performs Normalization. This converts the incoming file data into a format the underlying LLM can understand.
- For Images: The model decodes the image into a pixel array or converts it into a specialized token format (like OpenAI's gpt-4-vision).
- For Text Files (PDF, DOCX): The server extracts raw text from the binary file structure (parsing/chunking).
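A hedged sketch of that normalization branch: route each attachment by content type, passing images through as Data URLs and reducing documents to extracted text. The function and type names here are invented, and a real implementation would plug in an actual PDF parser (e.g., pdf-parse):

```typescript
// Hypothetical normalization step — names are illustrative, not SDK internals.
type Incoming = { contentType: string; dataUrl: string };
type Normalized =
  | { kind: "image"; url: string }  // vision models accept the Data URL directly
  | { kind: "text"; text: string }; // documents are reduced to extracted text

function normalize(
  att: Incoming,
  extractPdfText: (dataUrl: string) => string // stand-in for a real parser
): Normalized {
  if (att.contentType.startsWith("image/")) {
    return { kind: "image", url: att.dataUrl };
  }
  if (att.contentType === "application/pdf") {
    return { kind: "text", text: extractPdfText(att.dataUrl) };
  }
  throw new Error(`Unsupported type: ${att.contentType}`);
}
```

Whatever the model, the principle holds: by the time the LLM sees the message, every attachment has become either pixels/tokens or plain text.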
Security and Storage: The "Bouncer" and the "Locker"
Allowing file uploads is one of the most dangerous features you can add to a web application. A user could upload a malicious script, a massive file that crashes your server, or illegal content. This requires Validation and Storage Strategies.
1. Validation (The Bouncer)
Before a file is processed by the AI or stored on your server, it must pass strict security checks.
- Client-side: Immediate feedback ("File too large" or "Wrong file type") for UX.
- Server-side: The definitive check. You must verify the MIME type (e.g., image/png) and file size. Never trust the client.
Analogy: Think of this as a bouncer at a nightclub. The bouncer checks the ID (file extension) and dress code (file size). If they pass, they get a wristband (a secure token) allowing them to proceed.
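Beyond checking the declared MIME type, a stricter bouncer sniffs the file's magic numbers — the first few bytes of the decoded data — since the client-supplied type and extension can both lie. A minimal sketch, with an intentionally short signature table:

```typescript
// Signature table: well-known leading bytes for a few whitelisted formats.
const MAGIC: Array<{ mime: string; bytes: number[] }> = [
  { mime: "image/png", bytes: [0x89, 0x50, 0x4e, 0x47] },       // \x89PNG
  { mime: "application/pdf", bytes: [0x25, 0x50, 0x44, 0x46] }, // %PDF
  { mime: "image/jpeg", bytes: [0xff, 0xd8, 0xff] },
];

// Returns the sniffed MIME type, or null if no whitelisted signature matches.
function sniffMimeType(buf: Buffer): string | null {
  for (const { mime, bytes } of MAGIC) {
    if (bytes.every((b, i) => buf[i] === b)) return mime;
  }
  return null; // unknown signature: reject the upload
}
```

A renamed virus.exe claiming to be image.png fails this check, because its bytes do not start with the PNG signature.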
2. Storage (The Locker)
Once validated, where does the file live?
- Temporary (In-Memory): For simple chats, keeping the file in RAM just long enough to process it. Fast but volatile.
- Object Storage (Vercel Blob): For longer interactions or streaming where the model might need to "look" at the file multiple times; blobs persist until you delete them.
- Permanent Storage (Database): For persistent history (e.g., legal document review), stored permanently linked to a user ID.
Basic Code Example: Client-Side File Handling
Here is a minimal client-side component demonstrating how to attach files to a message using the Vercel AI SDK's useChat hook. This focuses on selecting a file and sending it alongside text.
'use client';

import { useChat } from 'ai/react';
import { useState, ChangeEvent, FormEvent } from 'react';

export default function FileChatInterface() {
  const {
    messages,
    input,
    handleInputChange,
    handleSubmit,
    isLoading,
    error,
  } = useChat({
    api: '/api/chat',
  });

  const [selectedFile, setSelectedFile] = useState<File | null>(null);

  const handleFileChange = (event: ChangeEvent<HTMLInputElement>) => {
    if (event.target.files && event.target.files.length > 0) {
      setSelectedFile(event.target.files[0]);
    }
  };

  const handleCustomSubmit = (event: FormEvent<HTMLFormElement>) => {
    event.preventDefault();
    if (selectedFile) {
      // The magic line: hand the file to the SDK as an attachment.
      // The hook accepts a FileList, which we build via DataTransfer.
      const dataTransfer = new DataTransfer();
      dataTransfer.items.add(selectedFile);
      handleSubmit(event, {
        experimental_attachments: dataTransfer.files,
      });
      setSelectedFile(null);
    } else {
      handleSubmit(event);
    }
  };

  return (
    <div className="chat-container">
      <div className="messages-list">
        {messages.map((m) => (
          <div key={m.id} className={`message ${m.role}`}>
            <strong>{m.role === 'user' ? 'You: ' : 'AI: '}</strong>
            {m.experimental_attachments && m.experimental_attachments.length > 0 && (
              <span>[Attachment: {m.experimental_attachments[0].name}] </span>
            )}
            <span>{m.content}</span>
          </div>
        ))}
      </div>
      <form onSubmit={handleCustomSubmit} className="input-area">
        <input
          type="text"
          value={input}
          onChange={handleInputChange}
          placeholder="Type a message..."
          disabled={isLoading}
        />
        <label htmlFor="file-upload" className="file-label">
          {selectedFile ? `Selected: ${selectedFile.name}` : '📎 Attach File'}
        </label>
        <input
          id="file-upload"
          type="file"
          onChange={handleFileChange}
          style={{ display: 'none' }}
          disabled={isLoading}
        />
        <button type="submit" disabled={isLoading || !input.trim()}>
          {isLoading ? 'Sending...' : 'Send'}
        </button>
      </form>
      {error && <div className="error">Error: {error.message}</div>}
    </div>
  );
}
Line-by-Line Explanation

- 'use client';: Marks the file as a Client Component in the Next.js App Router, necessary for using React hooks and browser events like file selection.
- const { ... } = useChat({ api: '/api/chat' });: Destructures the hook's properties. messages holds the history, input manages the text field, and handleSubmit triggers the API call.
- handleFileChange: Updates the local selectedFile state for UI feedback. Crucially, we do not manually convert to Base64 here; the SDK handles the encoding when we pass the file in the submission.
- handleCustomSubmit: Overrides the default form behavior. If a file is selected, it calls handleSubmit with the file attached as a request option, telling the SDK to send a request containing both the text and the file. If no file exists, it falls back to standard text submission.
Common Pitfalls

When implementing file uploads, several specific JavaScript and infrastructure issues can arise:

1. The "Async Void" Trap in handleSubmit
   - The Issue: Developers often try to wrap handleSubmit in a useEffect or an async function to "process" the file before sending.
   - Why it fails: useChat handles the asynchronous network request internally. Manually manipulating the file stream can block the main thread or fire the request before the file is read.
   - Fix: Trust the hook. Pass the file objects directly to handleSubmit via its attachment option and let the SDK read them. Do not manually convert the file to a Data URL (Base64) before passing it; the SDK does this efficiently.

2. Vercel/Next.js Payload Limits (4.5MB)
   - The Issue: Vercel Serverless Functions have a default request payload limit of 4.5MB. Sending a large PDF results in a 413 Payload Too Large error.
   - Why it fails: The file is sent as part of the request body. If it exceeds the limit, the request is rejected immediately.
   - Fix: For large files, upload the file to Vercel Blob or AWS S3 first. Get the URL, then send only the URL to the /api/chat endpoint. The server then fetches the file from the URL.

3. Runtime Validation Neglect (Zod)
   - The Issue: Trusting event.target.files[0] blindly.
   - Why it fails: A malicious user could rename virus.exe to image.png and upload it.
   - Fix: Always validate file types and sizes on the server using a schema validator like Zod:

   // Inside your API route
   import { z } from 'zod';

   const fileSchema = z.object({
     name: z.string().min(1),
     size: z.number().max(5 * 1024 * 1024), // 5MB limit
     type: z.enum(['image/png', 'application/pdf']), // Whitelist types
   });
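The payload-limit pitfall comes down to simple arithmetic: Base64 encodes every 3 bytes of binary as 4 ASCII characters, so a file inflates by roughly a third before it ever leaves the browser. A quick sketch in plain TypeScript (no SDK involved):

```typescript
// Base64 output size for a file: every 3 input bytes become 4 characters
// (padding rounds the last group up).
function base64PayloadBytes(fileSizeBytes: number): number {
  return Math.ceil(fileSizeBytes / 3) * 4;
}

const VERCEL_LIMIT = 4.5 * 1024 * 1024; // request body cap

const fourMbFile = 4 * 1024 * 1024;
const payload = base64PayloadBytes(fourMbFile);
// A "4MB" file becomes a ~5.3MiB string — already over the cap,
// before JSON quoting and the rest of the message are even counted.
const overLimit = payload > VERCEL_LIMIT; // true
```

This is why the upload-first pattern works: a URL to a blob is a few hundred bytes, no matter how large the file behind it is.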
Advanced Implementation: Secure Multi-Modal Chat
This implementation demonstrates a robust, production-ready architecture. It separates concerns: the client handles UI and file selection, while the server manages security, storage, and AI inference.
The Client Component (ChatInterface.tsx)
This component uses useChat but enhances it with custom file handling logic, converting files to Base64 for transmission.
'use client';

import { useChat } from 'ai/react';
import { useState, useRef, ChangeEvent, FormEvent } from 'react';
import type { Message } from 'ai';

export default function ChatInterface() {
  const [pendingFiles, setPendingFiles] = useState<File[]>([]);
  const fileInputRef = useRef<HTMLInputElement>(null);

  const {
    messages,
    input,
    handleInputChange,
    handleSubmit: originalSubmit,
    isLoading,
    error,
  } = useChat({
    api: '/api/chat',
  });

  const handleFileChange = (e: ChangeEvent<HTMLInputElement>) => {
    if (e.target.files) {
      setPendingFiles(Array.from(e.target.files));
    }
  };

  const readFileAsDataURL = (file: File): Promise<string> => {
    return new Promise((resolve, reject) => {
      const reader = new FileReader();
      reader.onload = () => resolve(reader.result as string);
      reader.onerror = reject;
      reader.readAsDataURL(file);
    });
  };

  const handleFileSubmit = async (e: FormEvent<HTMLFormElement>) => {
    e.preventDefault();
    if (pendingFiles.length === 0) {
      originalSubmit(e);
      return;
    }

    // Convert each file into the SDK's attachment shape:
    // a name, a content type, and a Base64 Data URL.
    const attachments = await Promise.all(
      pendingFiles.map(async (file) => ({
        name: file.name,
        contentType: file.type,
        url: await readFileAsDataURL(file),
      }))
    );

    // Reset the UI
    setPendingFiles([]);
    if (fileInputRef.current) fileInputRef.current.value = '';

    // Trigger submission with the attachments
    originalSubmit(e, {
      experimental_attachments: attachments,
    });
  };

  return (
    <div className="flex flex-col h-full max-w-4xl mx-auto p-4 bg-white shadow-lg rounded-lg">
      {/* Message List */}
      <div className="flex-1 overflow-y-auto space-y-4 mb-4 p-2 border rounded">
        {messages.map((msg: Message) => (
          <div
            key={msg.id}
            className={`p-3 rounded ${msg.role === 'user' ? 'bg-blue-100 text-right' : 'bg-gray-100 text-left'}`}
          >
            <p className="font-semibold text-sm text-gray-600">{msg.role === 'user' ? 'You' : 'AI'}</p>
            <div className="mt-1 whitespace-pre-wrap">{msg.content}</div>
            {/* Render attachments in the UI */}
            {msg.experimental_attachments && (
              <div className="mt-2 flex flex-wrap gap-2 justify-end">
                {msg.experimental_attachments.map((att, i) => (
                  <div key={i} className="text-xs bg-purple-100 border border-purple-300 rounded px-2 py-1">
                    📎 {att.name}
                  </div>
                ))}
              </div>
            )}
          </div>
        ))}
        {isLoading && <div className="text-center text-gray-400 animate-pulse">AI is thinking...</div>}
        {error && <div className="text-red-500 text-center">Error: {error.message}</div>}
      </div>

      {/* Input Area */}
      <form onSubmit={handleFileSubmit} className="flex flex-col gap-2">
        <input
          type="text"
          value={input}
          onChange={handleInputChange}
          className="w-full p-2 border rounded"
          placeholder="Ask a question or request analysis..."
          disabled={isLoading}
        />
        <div className="flex gap-2">
          <input
            type="file"
            ref={fileInputRef}
            onChange={handleFileChange}
            className="hidden"
            id="file-upload-advanced"
            multiple
          />
          <label
            htmlFor="file-upload-advanced"
            className="cursor-pointer bg-gray-200 hover:bg-gray-300 text-gray-700 px-4 py-2 rounded font-medium text-sm transition"
          >
            {pendingFiles.length > 0 ? `${pendingFiles.length} Files Selected` : 'Attach Files'}
          </label>
          <button
            type="submit"
            disabled={isLoading || (!input.trim() && pendingFiles.length === 0)}
            className="flex-1 bg-blue-600 hover:bg-blue-700 text-white px-4 py-2 rounded font-medium text-sm disabled:opacity-50 disabled:cursor-not-allowed transition"
          >
            {isLoading ? 'Processing...' : 'Send Message'}
          </button>
        </div>
      </form>
    </div>
  );
}
The Route Handler (app/api/chat/route.ts)
This is where the magic happens. The server receives the Base64 data, validates it, and normalizes it for the AI model.
import { streamText, convertToCoreMessages } from 'ai';
import { createOpenAI } from '@ai-sdk/openai';
import { z } from 'zod';

// IMPORTANT: Validate attachments on the server. The SDK delivers them on
// the message as { name, contentType, url }, where url is a Base64 Data URL.
const attachmentSchema = z.object({
  name: z.string().min(1),
  contentType: z
    .string()
    .refine((val) => val.startsWith('image/') || val === 'application/pdf', {
      message: 'Only images and PDFs are allowed',
    }),
  // Must be a Data URL. The length cap allows roughly 5MB of binary
  // once Base64's ~33% overhead is accounted for.
  url: z.string().startsWith('data:').max(7 * 1024 * 1024),
});

const openai = createOpenAI({
  apiKey: process.env.OPENAI_API_KEY,
});

export async function POST(req: Request) {
  const { messages } = await req.json();

  // Extract the last message to check for attachments
  const lastMessage = messages[messages.length - 1];
  const attachments = lastMessage.experimental_attachments ?? [];

  // Validate every attachment before it reaches the model
  for (const att of attachments) {
    const result = attachmentSchema.safeParse(att);
    if (!result.success) {
      return new Response(JSON.stringify({ error: result.error }), { status: 400 });
    }
  }

  // For images, the Data URL is passed through to the model directly.
  // For PDFs, you would typically extract text first (using a library like
  // pdf-parse) before sending it to the model. This example assumes images.
  const result = await streamText({
    model: openai('gpt-4o'),
    system: 'You are a helpful assistant capable of analyzing images and documents.',
    // convertToCoreMessages turns experimental_attachments into the
    // image/file parts the model actually consumes.
    messages: convertToCoreMessages(messages),
  });

  return result.toAIStreamResponse();
}
Conclusion
Handling file uploads in chat is the bridge between a simple text-based Q&A bot and a true AI assistant. It requires a shift in thinking from linear text streams to complex, multi-part data structures.
By mastering these concepts—Serialization (Base64/Blobs), Normalization (Text extraction/Image tokenization), Security (Zod validation), and Storage (Vercel Blob)—you enable your application to see and read, not just listen and write.
The days of the text-only terminal are over. It's time to build the multi-modal workspace.
The concepts and code demonstrated here are drawn directly from the comprehensive roadmap laid out in the book The Modern Stack: Building Generative UI with Next.js, Vercel AI SDK, and React Server Components (available on Amazon), part of the AI with JavaScript & TypeScript series.
The ebook is also available on Leanpub, alongside many other ebooks: https://leanpub.com/u/edgarmilvus.