LLMs return strings. Your application needs objects. Instructor patches OpenAI's client to return validated, typed data structures using Pydantic models. No more parsing JSON from markdown blocks.
## What Instructor Gives You for Free
- Structured extraction — LLM outputs validated against Pydantic/Zod schemas
- Automatic retries — if validation fails, it re-prompts with the error
- Streaming — partial objects stream as they're generated
- Multiple providers — OpenAI, Anthropic, Google, Mistral, Ollama, LiteLLM
- Python & TypeScript — first-class support for both
- Lightweight — patches existing clients, not a framework
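The "no more parsing JSON from markdown blocks" point is concrete: raw `json.loads` chokes on the fences LLMs love to wrap around JSON. A minimal sketch of the problem (the fenced string simulates a typical raw completion):

```python
import json

# A typical raw LLM completion: valid JSON wrapped in a markdown fence.
raw = '```json\n{"name": "John", "age": 30}\n```'

# Naive parsing fails outright on the fence characters.
try:
    json.loads(raw)
    parsed = True
except json.JSONDecodeError:
    parsed = False

# The usual workaround is brittle string surgery (Python 3.9+).
cleaned = raw.strip().removeprefix("```json").removesuffix("```").strip()
data = json.loads(cleaned)

print(parsed)       # False
print(data["age"])  # 30
```

Instructor replaces that cleanup, plus type coercion and validation, with a single `response_model` argument.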
## Quick Start (Python)

```bash
pip install instructor openai
```
```python
import instructor
from openai import OpenAI
from pydantic import BaseModel

client = instructor.from_openai(OpenAI())

class User(BaseModel):
    name: str
    age: int
    email: str

user = client.chat.completions.create(
    model="gpt-4o-mini",
    response_model=User,
    messages=[{"role": "user", "content": "Extract: John is 30, email john@test.com"}],
)

print(user.name)   # "John" (str, not Any)
print(user.age)    # 30 (int, validated)
print(user.email)  # "john@test.com" (str, validated)
```
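Under the hood, Instructor converts the Pydantic model into a JSON schema and hands it to the provider's tool/function-calling API. You can inspect that schema yourself, no API key needed (shape shown for Pydantic v2):

```python
from pydantic import BaseModel

class User(BaseModel):
    name: str
    age: int
    email: str

schema = User.model_json_schema()
print(schema["properties"]["age"]["type"])  # "integer"
print(schema["required"])                   # ["name", "age", "email"]
```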
## Complex Extraction With Validation
```python
from typing import List
from pydantic import BaseModel, Field, field_validator

class Address(BaseModel):
    street: str
    city: str
    state: str = Field(pattern=r'^[A-Z]{2}$')  # Must be a 2-letter state code
    zip_code: str = Field(pattern=r'^\d{5}$')

class Contact(BaseModel):
    name: str
    phone: str
    email: str
    addresses: List[Address]

    @field_validator('email')
    @classmethod
    def validate_email(cls, v):
        if '@' not in v:
            raise ValueError('Invalid email')
        return v

# If the LLM returns invalid data, Instructor retries with the validation error
contact = client.chat.completions.create(
    model="gpt-4o",
    response_model=Contact,
    max_retries=3,  # Retries with validation errors in the prompt
    messages=[{"role": "user", "content": long_email_text}],
)
```
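The errors that drive those retries are ordinary Pydantic `ValidationError`s, so you can exercise the same rules locally without an API call. A sketch against the `Address` model above:

```python
from pydantic import BaseModel, Field, ValidationError

class Address(BaseModel):
    street: str
    city: str
    state: str = Field(pattern=r'^[A-Z]{2}$')
    zip_code: str = Field(pattern=r'^\d{5}$')

# A lowercase state code fails the pattern, just as it would mid-retry.
try:
    Address(street="1 Main St", city="Austin", state="tx", zip_code="78701")
    ok = True
except ValidationError as e:
    ok = False
    # Instructor feeds e.errors() back into the next prompt.
    failing = e.errors()[0]["loc"]

print(ok)       # False
print(failing)  # ('state',)
```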
## Streaming Partial Objects
```python
from typing import List
from pydantic import BaseModel

class Article(BaseModel):
    title: str
    summary: str
    tags: List[str]
    key_points: List[str]

# Stream partial results as they arrive
for partial_article in client.chat.completions.create_partial(
    model="gpt-4o",
    response_model=Article,
    messages=[{"role": "user", "content": f"Analyze: {text}"}],
):
    print(partial_article.title)  # Available as soon as it is generated
    print(partial_article.tags)   # Grows as more tags are generated
```
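Conceptually, `create_partial` works by relaxing the schema so incomplete JSON still validates: every field is treated as optional while the stream is in flight. A hand-rolled sketch of that idea (not Instructor's actual internals):

```python
from typing import List, Optional
from pydantic import BaseModel

class PartialArticle(BaseModel):
    # All fields optional, so a half-finished payload still parses.
    title: Optional[str] = None
    summary: Optional[str] = None
    tags: List[str] = []
    key_points: List[str] = []

# Simulated snapshots of a streaming response:
early = PartialArticle.model_validate({"title": "LLM Pipelines"})
later = PartialArticle.model_validate(
    {"title": "LLM Pipelines", "tags": ["ai", "etl"]}
)

print(early.title)  # "LLM Pipelines"
print(early.tags)   # []
print(later.tags)   # ["ai", "etl"]
```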
## TypeScript Version
```typescript
import Instructor from '@instructor-ai/instructor';
import OpenAI from 'openai';
import { z } from 'zod';

// instructor-js requires an extraction mode alongside the client.
const client = Instructor({ client: new OpenAI(), mode: 'TOOLS' });

const UserSchema = z.object({
  name: z.string(),
  age: z.number(),
  email: z.string().email()
});

const user = await client.chat.completions.create({
  model: 'gpt-4o-mini',
  response_model: { schema: UserSchema, name: 'User' },
  messages: [{ role: 'user', content: 'Extract: John is 30, john@test.com' }]
});

console.log(user.name); // Fully typed!
```
## Real-World Use Cases

### Data extraction from emails
```python
from typing import List
from pydantic import BaseModel

class LineItem(BaseModel):  # Example line-item shape
    description: str
    amount: float

class Invoice(BaseModel):
    vendor: str
    amount: float
    due_date: str
    line_items: List[LineItem]

invoice = client.chat.completions.create(
    model="gpt-4o",
    response_model=Invoice,
    messages=[{"role": "user", "content": email_body}],
)
```
### Content classification
```python
from typing import Literal
from pydantic import BaseModel

class Classification(BaseModel):
    category: Literal["bug", "feature", "question", "docs"]
    priority: Literal["low", "medium", "high", "critical"]
    summary: str

result = client.chat.completions.create(
    model="gpt-4o-mini",
    response_model=Classification,
    messages=[{"role": "user", "content": github_issue_text}],
)
```
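`Literal` fields are a cheap way to pin the model to a fixed set of labels: Pydantic rejects anything outside the allowed values, which triggers a retry. You can verify that locally:

```python
from typing import Literal
from pydantic import BaseModel, ValidationError

class Classification(BaseModel):
    category: Literal["bug", "feature", "question", "docs"]
    priority: Literal["low", "medium", "high", "critical"]
    summary: str

ok = Classification(category="bug", priority="high", summary="Crash on save")

# An out-of-vocabulary label is rejected outright.
try:
    Classification(category="complaint", priority="high", summary="...")
    rejected = False
except ValidationError:
    rejected = True

print(ok.category)  # "bug"
print(rejected)     # True
```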
## Instructor vs Alternatives
| Feature | Instructor | LangChain | Vercel AI SDK |
|---|---|---|---|
| Focus | Structured output | General LLM | UI streaming |
| Approach | Patch existing client | New abstraction | New abstraction |
| Validation | Pydantic/Zod | Limited | Zod |
| Retries | Auto with errors | Manual | Manual |
| Bundle size | Tiny (~2KB) | Large | Medium |
| Learning curve | 5 minutes | Hours | 30 minutes |
## The Verdict
Instructor is the simplest way to get structured, validated data from LLMs. It patches your existing OpenAI client — no new framework to learn. If you need LLMs to return data, not text, Instructor is the tool.
Need help building AI-powered data pipelines? I build custom solutions. Reach out: spinov001@gmail.com
Check out my awesome-web-scraping collection — 400+ tools for extracting web data.