DEV Community

Alex Spinov
Instructor Has a Free API: Get Structured Data From LLMs Every Single Time

LLMs return text. Your code needs JSON. Instructor bridges the gap with validated, typed outputs every time.

What Is Instructor?

Instructor patches OpenAI/Anthropic/Google clients to return Pydantic models instead of raw text. Define a schema, get structured data. No parsing. No regex. No "please respond in JSON."

import instructor
from openai import OpenAI
from pydantic import BaseModel

client = instructor.from_openai(OpenAI())

class User(BaseModel):
    name: str
    age: int
    email: str

user = client.chat.completions.create(
    model="gpt-4o",
    response_model=User,
    messages=[{"role": "user", "content": "John is 30 years old, email john@example.com"}]
)

print(user)  # User(name='John', age=30, email='john@example.com')
print(user.name)  # 'John' — fully typed, IDE autocomplete works
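Under the hood, Instructor validates the model's raw output against your schema with Pydantic. A minimal local sketch of that validation step, using plain Pydantic with no API call:

```python
from pydantic import BaseModel, ValidationError

class User(BaseModel):
    name: str
    age: int
    email: str

# Well-formed JSON from the model parses straight into a typed object
user = User.model_validate_json('{"name": "John", "age": 30, "email": "john@example.com"}')
print(user.age + 1)  # typed int: 31

# Malformed output fails loudly instead of silently producing bad data
try:
    User.model_validate_json('{"name": "John", "age": "thirty"}')
except ValidationError as e:
    print(e.error_count())  # 2 errors: bad age, missing email
```

This failure signal is exactly what makes the retry behavior in the next section possible.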

Validation + Retry

from pydantic import BaseModel, field_validator

class UserProfile(BaseModel):
    name: str
    age: int

    @field_validator('age')
    @classmethod
    def age_must_be_valid(cls, v):
        if v < 0 or v > 150:
            raise ValueError('Age must be between 0 and 150')
        return v

# If the LLM returns age=200, Instructor retries automatically
user = client.chat.completions.create(
    model="gpt-4o",
    response_model=UserProfile,
    max_retries=3,
    messages=[{"role": "user", "content": "Extract: Alice is two hundred years old"}]
)
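The retry loop is driven by exactly this Pydantic failure: when the model returns age=200, validation raises, and Instructor feeds the error text back to the model on the next attempt. A local sketch of the step that trips the retry (no LLM call):

```python
from pydantic import BaseModel, ValidationError, field_validator

class UserProfile(BaseModel):
    name: str
    age: int

    @field_validator('age')
    @classmethod
    def age_must_be_valid(cls, v):
        if v < 0 or v > 150:
            raise ValueError('Age must be between 0 and 150')
        return v

# This is what an age=200 completion triggers internally; the error
# message below is what gets sent back to the model on retry.
try:
    UserProfile(name="Alice", age=200)
except ValidationError as e:
    print(e.errors()[0]["msg"])  # "Value error, Age must be between 0 and 150"
```

Because the validator's message is shown to the model, writing a specific, actionable error string ("Age must be between 0 and 150" rather than "invalid") directly improves the retry's chance of success.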

Complex Extraction

from typing import List, Optional

class Address(BaseModel):
    street: str
    city: str
    country: str
    zip_code: Optional[str] = None

class Company(BaseModel):
    name: str
    industry: str
    employees: int
    headquarters: Address
    key_products: List[str]

company = client.chat.completions.create(
    model="gpt-4o",
    response_model=Company,
    messages=[{"role": "user", "content": "Apple Inc is a tech company in Cupertino, CA 95014 with 164,000 employees making iPhone, Mac, iPad, and Apple Watch"}]
)
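Nested models validate recursively, so the extracted Company arrives with a fully typed Address inside it. Populating the same schema by hand (no API call; the street value is illustrative) shows the shape of the object you get back, plus model_dump for downstream JSON:

```python
from typing import List, Optional
from pydantic import BaseModel

class Address(BaseModel):
    street: str
    city: str
    country: str
    zip_code: Optional[str] = None

class Company(BaseModel):
    name: str
    industry: str
    employees: int
    headquarters: Address
    key_products: List[str]

# Nested dicts are validated recursively into typed sub-models
company = Company.model_validate({
    "name": "Apple Inc",
    "industry": "technology",
    "employees": 164000,
    "headquarters": {
        "street": "One Apple Park Way",
        "city": "Cupertino",
        "country": "USA",
        "zip_code": "95014",
    },
    "key_products": ["iPhone", "Mac", "iPad", "Apple Watch"],
})
print(company.headquarters.city)          # Cupertino
print(company.model_dump()["employees"])  # 164000
```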

Works With Every LLM

import instructor
from openai import OpenAI
from anthropic import Anthropic
import google.generativeai as genai

# OpenAI
client = instructor.from_openai(OpenAI())

# Anthropic
client = instructor.from_anthropic(Anthropic())

# Google
client = instructor.from_gemini(genai.GenerativeModel("gemini-2.0-flash"))

# Ollama (local models): point the OpenAI client at the local server.
# The client still requires an api_key value, and JSON mode works best
# with models that lack native tool calling.
client = instructor.from_openai(
    OpenAI(base_url="http://localhost:11434/v1", api_key="ollama"),
    mode=instructor.Mode.JSON,
)

Why Instructor

  • Type safety — Pydantic models, not dicts
  • Validation — automatic retry on invalid output
  • Any LLM — OpenAI, Anthropic, Google, local models
  • Streaming — partial objects as they generate
  • Simple — 3 lines to add to existing OpenAI code

pip install instructor
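On the streaming point: per the Instructor docs, client.chat.completions.create_partial yields progressively completed objects as tokens arrive, with not-yet-generated fields set to None. A conceptual, LLM-free simulation of that shape (the PartialUser class and the chunks list are illustrative stand-ins, not the real API):

```python
from typing import Optional
from pydantic import BaseModel

# Conceptual stand-in for a partial response model: every field Optional,
# so a half-generated object is still a valid, inspectable model.
class PartialUser(BaseModel):
    name: Optional[str] = None
    age: Optional[int] = None
    email: Optional[str] = None

# Simulated stream: each chunk is the object as known so far
chunks = [
    {"name": "John"},
    {"name": "John", "age": 30},
    {"name": "John", "age": 30, "email": "john@example.com"},
]
for c in chunks:
    snapshot = PartialUser.model_validate(c)
    print(snapshot)  # render partial results in a UI as they arrive
```

With a real client, the equivalent call would be create_partial(model=..., response_model=User, messages=...), iterated like the loop above.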

Building AI data pipelines? Check out my extraction tools or email spinov001@gmail.com.
