LLMs return text. Your code needs JSON. Instructor bridges the gap with validated, typed outputs every time.
## What Is Instructor?
Instructor patches OpenAI/Anthropic/Google clients to return Pydantic models instead of raw text. Define a schema, get structured data. No parsing. No regex. No "please respond in JSON."
```python
import instructor
from openai import OpenAI
from pydantic import BaseModel

client = instructor.from_openai(OpenAI())

class User(BaseModel):
    name: str
    age: int
    email: str

user = client.chat.completions.create(
    model="gpt-4o",
    response_model=User,
    messages=[{"role": "user", "content": "John is 30 years old, email john@example.com"}],
)

print(user)       # User(name='John', age=30, email='john@example.com')
print(user.name)  # 'John' — fully typed, IDE autocomplete works
```
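Under the hood, Instructor converts the Pydantic model into a JSON schema and sends it to the API (by default via tool calling on OpenAI). You can inspect that schema with plain Pydantic, no API key required:

```python
import json
from pydantic import BaseModel

class User(BaseModel):
    name: str
    age: int
    email: str

# The JSON schema Pydantic generates; Instructor derives its tool/function
# definition from this.
schema = User.model_json_schema()
print(json.dumps(schema, indent=2))
```

Every field without a default ends up in the schema's `required` list, which is why the model can't silently omit `email`.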
## Validation + Retry
```python
from pydantic import BaseModel, field_validator

class UserProfile(BaseModel):
    name: str
    age: int

    @field_validator('age')
    @classmethod
    def age_must_be_valid(cls, v):
        if v < 0 or v > 150:
            raise ValueError('Age must be between 0 and 150')
        return v

# If the LLM returns age=200, Instructor retries automatically,
# feeding the validation error back to the model.
user = client.chat.completions.create(
    model="gpt-4o",
    response_model=UserProfile,
    max_retries=3,
    messages=[{"role": "user", "content": "Extract: Alice is two hundred years old"}],
)
```
## Complex Extraction
```python
from typing import List, Optional

class Address(BaseModel):
    street: str
    city: str
    country: str
    zip_code: Optional[str] = None

class Company(BaseModel):
    name: str
    industry: str
    employees: int
    headquarters: Address
    key_products: List[str]

company = client.chat.completions.create(
    model="gpt-4o",
    response_model=Company,
    messages=[{"role": "user", "content": "Apple Inc is a tech company in Cupertino, CA 95014 with 164,000 employees making iPhone, Mac, iPad, and Apple Watch"}],
)
```
## Works With Every LLM
```python
import instructor
from openai import OpenAI
from anthropic import Anthropic
import google.generativeai as genai

# OpenAI
client = instructor.from_openai(OpenAI())

# Anthropic
client = instructor.from_anthropic(Anthropic())

# Google
client = instructor.from_gemini(genai.GenerativeModel("gemini-2.0-flash"))

# Ollama (local models) exposes an OpenAI-compatible API;
# it ignores the key, but the OpenAI client requires one to be set.
client = instructor.from_openai(
    OpenAI(base_url="http://localhost:11434/v1", api_key="ollama")
)
```
## Why Instructor
- Type safety — Pydantic models, not dicts
- Validation — automatic retry on invalid output
- Any LLM — OpenAI, Anthropic, Google, local models
- Streaming — partial objects as they generate
- Simple — 3 lines to add to existing OpenAI code
```bash
pip install instructor
```
Building AI data pipelines? Check out my extraction tools or email spinov001@gmail.com.