Alex Spinov
Instructor Is a Free Library That Forces LLMs to Return Structured Data

LLMs return strings. Your application needs objects. Instructor patches OpenAI's client to return validated, typed data structures using Pydantic models. No more parsing JSON from markdown blocks.

What Instructor Gives You for Free

  • Structured extraction — LLM outputs validated against Pydantic/Zod schemas
  • Automatic retries — if validation fails, it re-prompts with the error
  • Streaming — partial objects stream as they're generated
  • Multiple providers — OpenAI, Anthropic, Google, Mistral, Ollama, LiteLLM
  • Python & TypeScript — first-class support for both
  • Lightweight — patches existing clients, not a framework

Quick Start (Python)

pip install instructor openai
import instructor
from openai import OpenAI
from pydantic import BaseModel

client = instructor.from_openai(OpenAI())

class User(BaseModel):
    name: str
    age: int
    email: str

user = client.chat.completions.create(
    model="gpt-4o-mini",
    response_model=User,
    messages=[{"role": "user", "content": "Extract: John is 30, email john@test.com"}]
)

print(user.name)   # "John" (str, not Any)
print(user.age)    # 30 (int, validated)
print(user.email)  # "john@test.com" (str, validated)
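The object Instructor hands back is an ordinary Pydantic model, so the validation that guards the LLM output can be exercised entirely offline. A minimal sketch (no API call; the model is redefined locally for self-containment):

```python
from pydantic import BaseModel, ValidationError

class User(BaseModel):
    name: str
    age: int
    email: str

# A well-formed payload validates, and types are coerced (e.g. "30" -> 30):
user = User.model_validate({"name": "John", "age": "30", "email": "john@test.com"})
print(user.age, type(user.age).__name__)

# Malformed output is rejected -- this is the error Instructor would retry on:
try:
    User.model_validate({"name": "John", "age": "thirty", "email": "john@test.com"})
except ValidationError as e:
    print(e.error_count(), "validation error(s)")
```

This is the whole trick: the LLM's JSON is run through `model_validate`, so your application only ever sees data that passed the schema.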

Complex Extraction With Validation

from pydantic import BaseModel, Field, field_validator
from typing import List

class Address(BaseModel):
    street: str
    city: str
    state: str = Field(pattern=r'^[A-Z]{2}$')  # Must be 2-letter state code
    zip_code: str = Field(pattern=r'^\d{5}$')

class Contact(BaseModel):
    name: str
    phone: str
    email: str
    addresses: List[Address]

    @field_validator('email')
    @classmethod
    def validate_email(cls, v):
        if '@' not in v:
            raise ValueError('Invalid email')
        return v

# If the LLM returns invalid data, Instructor retries with the validation error
contact = client.chat.completions.create(
    model="gpt-4o",
    response_model=Contact,
    max_retries=3,  # Retries with validation errors in prompt
    messages=[{"role": "user", "content": long_email_text}]
)
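The retry mechanism is simple in essence: when validation fails, the error text is appended to the conversation and the model is asked again. Here is a simplified sketch of that loop (not Instructor's actual internals) with a stub in place of the LLM, so it runs offline:

```python
from pydantic import BaseModel, ValidationError, field_validator

class Contact(BaseModel):
    name: str
    email: str

    @field_validator("email")
    @classmethod
    def validate_email(cls, v):
        if "@" not in v:
            raise ValueError("Invalid email")
        return v

def fake_llm(messages):
    # Stub model: answers badly until it "sees" the validation error in the prompt.
    if any("validation error" in m["content"].lower() for m in messages):
        return {"name": "Ann", "email": "ann@test.com"}
    return {"name": "Ann", "email": "ann.test.com"}

def create_with_retries(messages, response_model, max_retries=3):
    for _ in range(max_retries):
        raw = fake_llm(messages)
        try:
            return response_model.model_validate(raw)
        except ValidationError as e:
            # Re-prompt with the validation error appended, so the model can self-correct.
            messages = messages + [{"role": "user", "content": f"Validation error: {e}"}]
    raise RuntimeError("retries exhausted")

contact = create_with_retries(
    [{"role": "user", "content": "Extract the contact details"}], Contact
)
print(contact.email)
```

Feeding the error message back is what makes `max_retries` effective: the model gets told exactly which field failed and why.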

Streaming Partial Objects

class Article(BaseModel):
    title: str
    summary: str
    tags: List[str]
    key_points: List[str]

# Stream partial results as they arrive
for partial_article in client.chat.completions.create_partial(
    model="gpt-4o",
    response_model=Article,
    messages=[{"role": "user", "content": f"Analyze: {text}"}]
):
    print(partial_article.title)  # Available as soon as generated
    print(partial_article.tags)   # Grows as more tags are generated
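Conceptually, partial streaming works by relaxing the schema so that every field is optional, then re-validating the JSON accumulated so far as each chunk arrives. A simplified illustration of that idea (not Instructor's internals) with hand-written snapshots:

```python
from typing import List, Optional
from pydantic import BaseModel

class PartialArticle(BaseModel):
    # Every field optional: a half-generated object still validates.
    title: Optional[str] = None
    summary: Optional[str] = None
    tags: List[str] = []
    key_points: List[str] = []

# Simulated snapshots of the JSON accumulated so far during streaming:
snapshots = [
    {"title": "Structured outputs"},
    {"title": "Structured outputs", "tags": ["llm"]},
    {"title": "Structured outputs", "tags": ["llm", "pydantic"], "summary": "Done."},
]
for snap in snapshots:
    partial = PartialArticle.model_validate(snap)
    print(partial.title, partial.tags)
```

This is why `title` is usable as soon as it is generated while `tags` keeps growing: each snapshot is a valid, if incomplete, `Article`.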

TypeScript Version

import Instructor from '@instructor-ai/instructor';
import OpenAI from 'openai';
import { z } from 'zod';

const client = Instructor({ client: new OpenAI(), mode: 'TOOLS' });

const UserSchema = z.object({
  name: z.string(),
  age: z.number(),
  email: z.string().email()
});

const user = await client.chat.completions.create({
  model: 'gpt-4o-mini',
  response_model: { schema: UserSchema, name: 'User' },
  messages: [{ role: 'user', content: 'Extract: John is 30, john@test.com' }]
});

console.log(user.name);  // Fully typed!

Real-World Use Cases

Data extraction from emails

class LineItem(BaseModel):
    description: str
    quantity: int
    unit_price: float

class Invoice(BaseModel):
    vendor: str
    amount: float
    due_date: str
    line_items: List[LineItem]

invoice = client.chat.completions.create(
    model="gpt-4o",
    response_model=Invoice,
    messages=[{"role": "user", "content": email_body}]
)

Content classification

from typing import Literal

class Classification(BaseModel):
    category: Literal["bug", "feature", "question", "docs"]
    priority: Literal["low", "medium", "high", "critical"]
    summary: str

result = client.chat.completions.create(
    model="gpt-4o-mini",
    response_model=Classification,
    messages=[{"role": "user", "content": github_issue_text}]
)
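Because `Literal` fields restrict output to a fixed label set, a hallucinated category fails validation instead of silently polluting your data. A self-contained offline check of that behavior:

```python
from typing import Literal
from pydantic import BaseModel, ValidationError

class Classification(BaseModel):
    category: Literal["bug", "feature", "question", "docs"]
    priority: Literal["low", "medium", "high", "critical"]
    summary: str

ok = Classification.model_validate(
    {"category": "bug", "priority": "high", "summary": "Crash on save"}
)
print(ok.category, ok.priority)

# Anything outside the allowed labels is rejected and would trigger a retry:
try:
    Classification.model_validate(
        {"category": "rant", "priority": "high", "summary": "off-topic"}
    )
except ValidationError:
    print("rejected: category not in the allowed set")
```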

Instructor vs Alternatives

| Feature | Instructor | LangChain | Vercel AI SDK |
|---|---|---|---|
| Focus | Structured output | General LLM | UI streaming |
| Approach | Patches existing client | New abstraction | New abstraction |
| Validation | Pydantic/Zod | Limited | Zod |
| Retries | Automatic, with errors | Manual | Manual |
| Bundle size | Tiny (~2 KB) | Large | Medium |
| Learning curve | 5 minutes | Hours | 30 minutes |

The Verdict

Instructor is the simplest way to get structured, validated data from LLMs. It patches your existing OpenAI client — no new framework to learn. If you need LLMs to return data, not text, Instructor is the tool.


Need help building AI-powered data pipelines? I build custom solutions. Reach out: spinov001@gmail.com

Check out my awesome-web-scraping collection — 400+ tools for extracting web data.
