Instructor: Get JSON From LLMs That Actually Validates
Instructor patches OpenAI, Anthropic, and other LLM clients to return Pydantic models instead of raw text: structured, validated, typed output, with automatic retries when validation fails.
The Problem
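LLMs return text. You need JSON. You parse it, and it breaks. You add "return valid JSON" to the prompt, and it still breaks. Instructor fixes this by validating every response against a Pydantic model and automatically re-asking the LLM when validation fails.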
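Two separate things go wrong, and this small standard-library sketch shows both. The sample reply strings are made up for illustration. First, a chatty preamble makes `json.loads` fail outright. Second, output that does parse can still carry the wrong types. Catching the second problem is the job Pydantic does on top of parsing.

```python
import json

# Hypothetical raw replies an LLM might return for "Extract: John is 30"
chatty_reply = 'Sure! Here is the JSON: {"name": "John", "age": 30}'
wrong_type_reply = '{"name": "John", "age": "thirty"}'

def parse_user(raw: str) -> dict:
    """Parse a reply and check field types; raise ValueError on either failure."""
    data = json.loads(raw)  # json.JSONDecodeError is a subclass of ValueError
    if not isinstance(data, dict):
        raise ValueError("expected a JSON object")
    if not isinstance(data.get("name"), str):
        raise ValueError("name must be a string")
    if type(data.get("age")) is not int:  # strict check: rejects str and bool
        raise ValueError("age must be an integer")
    return data

for raw in (chatty_reply, wrong_type_reply):
    try:
        parse_user(raw)
    except ValueError as err:
        print(f"{type(err).__name__}: {err}")
```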
The Free API
```python
import instructor
from openai import OpenAI
from pydantic import BaseModel

# Patch the OpenAI client so create() accepts a response_model
client = instructor.from_openai(OpenAI())

class User(BaseModel):
    name: str
    age: int
    email: str

# Returns a validated User instance (raises if validation keeps failing)
user = client.chat.completions.create(
    model="gpt-4o",
    response_model=User,
    messages=[{"role": "user", "content": "Extract: John is 30, john@example.com"}],
)

print(user.name)   # "John"
print(user.age)    # 30
print(user.email)  # "john@example.com"
```
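`response_model` works because a Pydantic model can describe itself as JSON Schema (`User.model_json_schema()` in Pydantic v2). In its default OpenAI mode, Instructor sends that schema to the LLM as a tool definition and then validates the reply against the same model. The sketch below rebuilds the schema half of that idea with only the standard library, mapping a dataclass's type hints to a JSON-Schema-like dict. `json_schema` is a hypothetical helper and is far simpler than Pydantic's real generator.

```python
import typing
from dataclasses import dataclass, fields

# Map simple Python types to JSON Schema type names
_JSON_TYPES = {str: "string", int: "integer", float: "number", bool: "boolean"}

@dataclass
class User:
    name: str
    age: int
    email: str

def json_schema(cls: type) -> dict:
    """Hypothetical helper: build a flat JSON-Schema-like dict from a dataclass."""
    hints = typing.get_type_hints(cls)
    properties = {f.name: {"type": _JSON_TYPES[hints[f.name]]} for f in fields(cls)}
    return {
        "title": cls.__name__,
        "type": "object",
        "properties": properties,
        "required": list(properties),  # every field is required in this sketch
    }

print(json_schema(User))
```

Pydantic's real output goes further. It adds per-field titles, handles optional fields and defaults, and places nested models under `$defs`.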
Complex Extraction
```python
class Address(BaseModel):
    street: str
    city: str
    country: str

class Company(BaseModel):
    name: str
    industry: str
    founded: int
    headquarters: Address  # nested model
    key_products: list[str]

company = client.chat.completions.create(
    model="gpt-4o",
    response_model=Company,
    messages=[{"role": "user", "content": "Tell me about Apple Inc"}],
)
# Fully typed, validated Company object
```
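Nested models are validated recursively. `headquarters` must itself be a valid `Address`, and every entry in `key_products` must be a string. Below is a hypothetical standard-library sketch of that recursion using dataclasses. `from_dict` and `_coerce` are illustrative names, not Instructor or Pydantic APIs. The sketch is deliberately strict: Pydantic's default lax mode would coerce a string like `"1976"` to the integer 1976, but this sketch rejects it.

```python
from dataclasses import dataclass, fields, is_dataclass
from typing import Any, get_args, get_origin, get_type_hints

@dataclass
class Address:
    street: str
    city: str
    country: str

@dataclass
class Company:
    name: str
    industry: str
    founded: int
    headquarters: Address
    key_products: list[str]

def _coerce(tp: Any, value: Any, path: str) -> Any:
    """Validate value against type tp, recursing into dataclasses and lists."""
    if is_dataclass(tp):
        if not isinstance(value, dict):
            raise TypeError(f"{path}: expected object")
        return from_dict(tp, value, path)
    if get_origin(tp) is list:
        (item_type,) = get_args(tp)
        if not isinstance(value, list):
            raise TypeError(f"{path}: expected list")
        return [_coerce(item_type, v, f"{path}[{i}]") for i, v in enumerate(value)]
    if type(value) is not tp:  # strict: no str -> int coercion
        raise TypeError(f"{path}: expected {tp.__name__}, got {type(value).__name__}")
    return value

def from_dict(cls: type, data: dict, path: str = "") -> Any:
    """Build an instance of dataclass cls from a parsed JSON dict."""
    hints = get_type_hints(cls)
    kwargs = {}
    for f in fields(cls):
        field_path = f"{path}.{f.name}" if path else f.name
        if f.name not in data:
            raise KeyError(f"{field_path}: missing field")
        kwargs[f.name] = _coerce(hints[f.name], data[f.name], field_path)
    return cls(**kwargs)
```

With this in place, `from_dict(Company, json.loads(raw))` either returns a fully nested `Company` or raises an error whose dotted path, such as `headquarters.city`, points at the bad field.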
With Ollama (Local)
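```python
import instructor
from openai import OpenAI

# Ollama exposes an OpenAI-compatible API; the client requires an api_key, but Ollama ignores it
client = instructor.from_openai(
    OpenAI(base_url="http://localhost:11434/v1", api_key="ollama"),
    mode=instructor.Mode.JSON,  # JSON mode works even with models that lack tool calling
)

user = client.chat.completions.create(
    model="llama3",
    response_model=User,  # the User model defined above
    messages=[{"role": "user", "content": "John, 25, john@test.com"}],
)
```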
Streaming Lists
```python
class Article(BaseModel):
    title: str
    summary: str
    tags: list[str]

# create_iterable yields each Article as soon as it is complete
# (create_partial, by contrast, streams a single model field by field)
articles = client.chat.completions.create_iterable(
    model="gpt-4o",
    response_model=Article,
    messages=[{"role": "user", "content": "List 5 AI articles"}],
)
for article in articles:
    print(article)  # one validated Article at a time
```
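How can a list arrive one item at a time when the reply is a single JSON array? One approach is to buffer the streamed text and use `json.JSONDecoder.raw_decode` to peel off each array element once its closing brace has arrived. The sketch below does this with the standard library. It is not Instructor's actual implementation, and `iter_array_items` is a hypothetical name. It assumes the array holds objects, because a bare number could be emitted before its last digit arrives.

```python
import json
from typing import Iterable, Iterator

def iter_array_items(chunks: Iterable[str]) -> Iterator[dict]:
    """Yield each element of a streamed JSON array as soon as it is complete."""
    decoder = json.JSONDecoder()
    buffer = ""
    started = False
    for chunk in chunks:
        buffer += chunk
        if not started:
            stripped = buffer.lstrip()
            if not stripped:
                continue  # nothing but whitespace so far
            if stripped[0] != "[":
                raise ValueError("expected a JSON array")
            buffer = stripped[1:]  # drop the opening bracket
            started = True
        while True:
            # Skip whitespace and the comma separating elements
            buffer = buffer.lstrip().lstrip(",").lstrip()
            if not buffer or buffer[0] == "]":
                break
            try:
                item, end = decoder.raw_decode(buffer)
            except json.JSONDecodeError:
                break  # element still incomplete; wait for more chunks
            yield item
            buffer = buffer[end:]

# Chunk boundaries can fall anywhere, even inside a key
stream = ['[{"title": "A", "tags": ["x"]}', ', {"tit', 'le": "B", "tags": []}]']
for item in iter_array_items(stream):
    print(item)
```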
Real-World Use Case
A data pipeline extracted information from 10K documents using GPT-4. With raw text output, JSON parsing broke 15% of the time. With Instructor, every result came back as a valid Pydantic model, and validation failures were retried automatically. Pipeline reliability went from 85% to 99.9%.
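Those automatic retries follow a validate-and-re-ask pattern. In Instructor you control the retry count with the `max_retries` argument to `create()`. Below is a standard-library sketch of the loop, not Instructor's code. `call_llm` is a hypothetical stand-in for a chat-completions call. After each failed attempt, the sketch appends the validation error to the conversation so the next attempt can correct itself.

```python
import json
from dataclasses import dataclass
from typing import Callable

@dataclass
class User:
    name: str
    age: int

def validate_user(raw: str) -> User:
    """Parse and type-check a reply; raise ValueError with a readable reason."""
    data = json.loads(raw)  # JSONDecodeError is a ValueError subclass
    if not isinstance(data, dict):
        raise ValueError("expected a JSON object")
    if not isinstance(data.get("name"), str):
        raise ValueError("name must be a string")
    if type(data.get("age")) is not int:
        raise ValueError("age must be an integer")
    return User(name=data["name"], age=data["age"])

def extract_with_retries(
    call_llm: Callable[[list[dict]], str], prompt: str, max_retries: int = 3
) -> User:
    """Call the (hypothetical) LLM, feeding validation errors back until one passes."""
    messages = [{"role": "user", "content": prompt}]
    last_error = None
    for _ in range(max_retries):
        raw = call_llm(messages)
        try:
            return validate_user(raw)
        except ValueError as err:
            last_error = err
            # Show the model its bad answer and the reason it was rejected
            messages += [
                {"role": "assistant", "content": raw},
                {"role": "user", "content": f"Invalid output: {err}. Return corrected JSON."},
            ]
    raise RuntimeError(f"validation failed after {max_retries} attempts: {last_error}")

# Usage with a fake LLM that gets the type wrong once
replies = iter(['{"name": "John", "age": "thirty"}', '{"name": "John", "age": 30}'])
user = extract_with_retries(lambda messages: next(replies), "Extract: John is 30")
print(user)  # User(name='John', age=30)
```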
Quick Start
pip install instructor
Resources
Need structured data from AI? Check out my tools on Apify or email spinov001@gmail.com for custom AI pipelines.