LLMs return text. Your code needs JSON. Instructor bridges the gap with validated, typed outputs every time.
## What Is Instructor?
Instructor patches OpenAI/Anthropic/Google clients to return Pydantic models instead of raw text. Define a schema, get structured data. No parsing. No regex. No "please respond in JSON."
```python
import instructor
from openai import OpenAI
from pydantic import BaseModel

client = instructor.from_openai(OpenAI())

class User(BaseModel):
    name: str
    age: int
    email: str

user = client.chat.completions.create(
    model="gpt-4o",
    response_model=User,
    messages=[{"role": "user", "content": "John is 30 years old, email john@example.com"}],
)

print(user)       # User(name='John', age=30, email='john@example.com')
print(user.name)  # 'John' — fully typed, IDE autocomplete works
```
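Under the hood, Instructor converts the Pydantic model into a JSON schema and sends it to the API (by default via tool calling on OpenAI). You can inspect that schema with plain Pydantic, no API key required:

```python
import json
from pydantic import BaseModel

class User(BaseModel):
    name: str
    age: int
    email: str

# The JSON schema Pydantic generates; Instructor derives its tool/function
# definition from this.
schema = User.model_json_schema()
print(json.dumps(schema, indent=2))
```

Every field without a default ends up in the schema's `required` list, which is why the model can't silently omit `email`.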
## Validation + Retry
```python
from pydantic import BaseModel, field_validator

class UserProfile(BaseModel):
    name: str
    age: int

    @field_validator('age')
    @classmethod
    def age_must_be_valid(cls, v):
        if v < 0 or v > 150:
            raise ValueError('Age must be between 0 and 150')
        return v

# If the LLM returns age=200, Instructor retries automatically,
# feeding the validation error back to the model.
user = client.chat.completions.create(
    model="gpt-4o",
    response_model=UserProfile,
    max_retries=3,
    messages=[{"role": "user", "content": "Extract: Alice is two hundred years old"}],
)
```
## Complex Extraction
```python
from typing import List, Optional

class Address(BaseModel):
    street: str
    city: str
    country: str
    zip_code: Optional[str] = None

class Company(BaseModel):
    name: str
    industry: str
    employees: int
    headquarters: Address
    key_products: List[str]

company = client.chat.completions.create(
    model="gpt-4o",
    response_model=Company,
    messages=[{"role": "user", "content": "Apple Inc is a tech company in Cupertino, CA 95014 with 164,000 employees making iPhone, Mac, iPad, and Apple Watch"}],
)
```
## Works With Every LLM
```python
import instructor
from openai import OpenAI
from anthropic import Anthropic
import google.generativeai as genai

# OpenAI
client = instructor.from_openai(OpenAI())

# Anthropic
client = instructor.from_anthropic(Anthropic())

# Google
client = instructor.from_gemini(genai.GenerativeModel("gemini-2.0-flash"))

# Ollama (local models) exposes an OpenAI-compatible API;
# it ignores the key, but the OpenAI client requires one to be set.
client = instructor.from_openai(
    OpenAI(base_url="http://localhost:11434/v1", api_key="ollama")
)
```
## Why Instructor
- Type safety — Pydantic models, not dicts
- Validation — automatic retry on invalid output
- Any LLM — OpenAI, Anthropic, Google, local models
- Streaming — partial objects as they generate
- Simple — 3 lines to add to existing OpenAI code
```bash
pip install instructor
```
Building AI data pipelines? Check out my extraction tools or email spinov001@gmail.com.