Alex Spinov
Instructor Is a Free Library That Forces LLMs to Return Structured Data

LLMs return strings. Your application needs objects. Instructor patches OpenAI's client to return validated, typed data structures using Pydantic models. No more parsing JSON from markdown blocks.

What Instructor Gives You for Free

  • Structured extraction — LLM outputs validated against Pydantic/Zod schemas
  • Automatic retries — if validation fails, it re-prompts with the error
  • Streaming — partial objects stream as they're generated
  • Multiple providers — OpenAI, Anthropic, Google, Mistral, Ollama, LiteLLM
  • Python & TypeScript — first-class support for both
  • Lightweight — patches existing clients, not a framework

Quick Start (Python)

pip install instructor openai
import instructor
from openai import OpenAI
from pydantic import BaseModel

client = instructor.from_openai(OpenAI())

class User(BaseModel):
    name: str
    age: int
    email: str

user = client.chat.completions.create(
    model="gpt-4o-mini",
    response_model=User,
    messages=[{"role": "user", "content": "Extract: John is 30, email john@test.com"}]
)

print(user.name)   # "John" (str, not Any)
print(user.age)    # 30 (int, validated)
print(user.email)  # "john@test.com" (str, validated)
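The object Instructor hands back is an ordinary Pydantic model, so the validation that guards the LLM output can be exercised entirely offline. A minimal sketch (no API call; the model is redefined locally for self-containment):

```python
from pydantic import BaseModel, ValidationError

class User(BaseModel):
    name: str
    age: int
    email: str

# A well-formed payload validates, and types are coerced (e.g. "30" -> 30):
user = User.model_validate({"name": "John", "age": "30", "email": "john@test.com"})
print(user.age, type(user.age).__name__)

# Malformed output is rejected -- this is the error Instructor would retry on:
try:
    User.model_validate({"name": "John", "age": "thirty", "email": "john@test.com"})
except ValidationError as e:
    print(e.error_count(), "validation error(s)")
```

This is the whole trick: the LLM's JSON is run through `model_validate`, so your application only ever sees data that passed the schema.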

Complex Extraction With Validation

from pydantic import BaseModel, Field, field_validator
from typing import List

class Address(BaseModel):
    street: str
    city: str
    state: str = Field(pattern=r'^[A-Z]{2}$')  # Must be 2-letter state code
    zip_code: str = Field(pattern=r'^\d{5}$')

class Contact(BaseModel):
    name: str
    phone: str
    email: str
    addresses: List[Address]

    @field_validator('email')
    @classmethod
    def validate_email(cls, v):
        if '@' not in v:
            raise ValueError('Invalid email')
        return v

# If the LLM returns invalid data, Instructor retries with the validation error
contact = client.chat.completions.create(
    model="gpt-4o",
    response_model=Contact,
    max_retries=3,  # Retries with validation errors in prompt
    messages=[{"role": "user", "content": long_email_text}]
)
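The retry mechanism is simple in essence: when validation fails, the error text is appended to the conversation and the model is asked again. Here is a simplified sketch of that loop (not Instructor's actual internals) with a stub in place of the LLM, so it runs offline:

```python
from pydantic import BaseModel, ValidationError, field_validator

class Contact(BaseModel):
    name: str
    email: str

    @field_validator("email")
    @classmethod
    def validate_email(cls, v):
        if "@" not in v:
            raise ValueError("Invalid email")
        return v

def fake_llm(messages):
    # Stub model: answers badly until it "sees" the validation error in the prompt.
    if any("validation error" in m["content"].lower() for m in messages):
        return {"name": "Ann", "email": "ann@test.com"}
    return {"name": "Ann", "email": "ann.test.com"}

def create_with_retries(messages, response_model, max_retries=3):
    for _ in range(max_retries):
        raw = fake_llm(messages)
        try:
            return response_model.model_validate(raw)
        except ValidationError as e:
            # Re-prompt with the validation error appended, so the model can self-correct.
            messages = messages + [{"role": "user", "content": f"Validation error: {e}"}]
    raise RuntimeError("retries exhausted")

contact = create_with_retries(
    [{"role": "user", "content": "Extract the contact details"}], Contact
)
print(contact.email)
```

Feeding the error message back is what makes `max_retries` effective: the model gets told exactly which field failed and why.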

Streaming Partial Objects

class Article(BaseModel):
    title: str
    summary: str
    tags: List[str]
    key_points: List[str]

# Stream partial results as they arrive
for partial_article in client.chat.completions.create_partial(
    model="gpt-4o",
    response_model=Article,
    messages=[{"role": "user", "content": f"Analyze: {text}"}]
):
    print(partial_article.title)  # Available as soon as generated
    print(partial_article.tags)   # Grows as more tags are generated
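Conceptually, partial streaming works by relaxing the schema so that every field is optional, then re-validating the JSON accumulated so far as each chunk arrives. A simplified illustration of that idea (not Instructor's internals) with hand-written snapshots:

```python
from typing import List, Optional
from pydantic import BaseModel

class PartialArticle(BaseModel):
    # Every field optional: a half-generated object still validates.
    title: Optional[str] = None
    summary: Optional[str] = None
    tags: List[str] = []
    key_points: List[str] = []

# Simulated snapshots of the JSON accumulated so far during streaming:
snapshots = [
    {"title": "Structured outputs"},
    {"title": "Structured outputs", "tags": ["llm"]},
    {"title": "Structured outputs", "tags": ["llm", "pydantic"], "summary": "Done."},
]
for snap in snapshots:
    partial = PartialArticle.model_validate(snap)
    print(partial.title, partial.tags)
```

This is why `title` is usable as soon as it is generated while `tags` keeps growing: each snapshot is a valid, if incomplete, `Article`.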

TypeScript Version

import Instructor from '@instructor-ai/instructor';
import OpenAI from 'openai';
import { z } from 'zod';

const client = Instructor({ client: new OpenAI(), mode: 'TOOLS' });

const UserSchema = z.object({
  name: z.string(),
  age: z.number(),
  email: z.string().email()
});

const user = await client.chat.completions.create({
  model: 'gpt-4o-mini',
  response_model: { schema: UserSchema, name: 'User' },
  messages: [{ role: 'user', content: 'Extract: John is 30, john@test.com' }]
});

console.log(user.name);  // Fully typed!

Real-World Use Cases

Data extraction from emails

class LineItem(BaseModel):
    description: str
    quantity: int
    unit_price: float

class Invoice(BaseModel):
    vendor: str
    amount: float
    due_date: str
    line_items: List[LineItem]

invoice = client.chat.completions.create(
    model="gpt-4o",
    response_model=Invoice,
    messages=[{"role": "user", "content": email_body}]
)

Content classification

from typing import Literal

class Classification(BaseModel):
    category: Literal["bug", "feature", "question", "docs"]
    priority: Literal["low", "medium", "high", "critical"]
    summary: str

result = client.chat.completions.create(
    model="gpt-4o-mini",
    response_model=Classification,
    messages=[{"role": "user", "content": github_issue_text}]
)
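Because `Literal` fields restrict output to a fixed label set, a hallucinated category fails validation instead of silently polluting your data. A self-contained offline check of that behavior:

```python
from typing import Literal
from pydantic import BaseModel, ValidationError

class Classification(BaseModel):
    category: Literal["bug", "feature", "question", "docs"]
    priority: Literal["low", "medium", "high", "critical"]
    summary: str

ok = Classification.model_validate(
    {"category": "bug", "priority": "high", "summary": "Crash on save"}
)
print(ok.category, ok.priority)

# Anything outside the allowed labels is rejected and would trigger a retry:
try:
    Classification.model_validate(
        {"category": "rant", "priority": "high", "summary": "off-topic"}
    )
except ValidationError:
    print("rejected: category not in the allowed set")
```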

Instructor vs Alternatives

| Feature | Instructor | LangChain | Vercel AI SDK |
|---|---|---|---|
| Focus | Structured output | General LLM | UI streaming |
| Approach | Patches existing client | New abstraction | New abstraction |
| Validation | Pydantic/Zod | Limited | Zod |
| Retries | Automatic, with errors | Manual | Manual |
| Bundle size | Tiny (~2 KB) | Large | Medium |
| Learning curve | 5 minutes | Hours | 30 minutes |

The Verdict

Instructor is the simplest way to get structured, validated data from LLMs. It patches your existing OpenAI client — no new framework to learn. If you need LLMs to return data, not text, Instructor is the tool.


Need help building AI-powered data pipelines? I build custom solutions. Reach out: spinov001@gmail.com

Check out my awesome-web-scraping collection — 400+ tools for extracting web data.
