DEV Community

Alex Spinov
Instructor Has a Free API — Structured Output From Any LLM

Instructor: Get JSON From LLMs That Actually Validates

Instructor patches OpenAI, Anthropic, and other LLM clients to return Pydantic models instead of raw text. Structured, validated, typed output — every time.

The Problem

LLMs return text. You need JSON. You parse it, it breaks. You add "return valid JSON" to the prompt, it still breaks. Instructor fixes this.
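To make the failure mode concrete: a minimal sketch of what naive parsing trips over. The hardcoded `raw` string below is a typical raw model reply, with the JSON wrapped in markdown fences even though the prompt asked for "valid JSON":

```python
import json

# A typical raw LLM reply: the JSON itself is fine, but it arrives
# wrapped in markdown code fences that json.loads cannot parse
raw = '```json\n{"name": "John", "age": 30}\n```'

try:
    user = json.loads(raw)
except json.JSONDecodeError:
    user = None  # parsing fails on the fences, not the data

assert user is None
```

Stripping fences by hand just moves the problem: the next reply has a trailing comma, a chatty preamble, or a missing field. Instructor's approach is to validate against a schema and retry instead of patching the parser.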

The Free API

import instructor
from openai import OpenAI
from pydantic import BaseModel

client = instructor.from_openai(OpenAI())

class User(BaseModel):
    name: str
    age: int
    email: str

# Returns a validated User object (Instructor retries if validation fails)
user = client.chat.completions.create(
    model="gpt-4o",
    response_model=User,
    messages=[{"role": "user", "content": "Extract: John is 30, john@example.com"}]
)
print(user.name)   # "John"
print(user.age)    # 30
print(user.email)  # "john@example.com"
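What you get back is an ordinary Pydantic model, so downstream code inherits serialization and type hints for free. A quick local sketch of that payoff (no API call, just the same `User` schema):

```python
from pydantic import BaseModel

class User(BaseModel):
    name: str
    age: int
    email: str

# Constructing the model directly, as Instructor does from the LLM reply
user = User(name="John", age=30, email="john@example.com")

# .model_dump() gives you a plain dict for databases, queues, etc.
assert user.model_dump() == {
    "name": "John", "age": 30, "email": "john@example.com"
}
```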

Complex Extraction

class Address(BaseModel):
    street: str
    city: str
    country: str

class Company(BaseModel):
    name: str
    industry: str
    founded: int
    headquarters: Address
    key_products: list[str]

company = client.chat.completions.create(
    model="gpt-4o",
    response_model=Company,
    messages=[{"role": "user", "content": "Tell me about Apple Inc"}]
)
# Fully typed, validated Company object
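The nesting works because Pydantic validates the whole tree, not just the top level. A local sketch with a trimmed version of the `Company` schema above (no API call; the sample values are made up for illustration):

```python
from pydantic import BaseModel, ValidationError

class Address(BaseModel):
    street: str
    city: str
    country: str

class Company(BaseModel):
    name: str
    founded: int
    headquarters: Address

# Well-formed data nests into typed objects...
ok = Company.model_validate({
    "name": "Apple Inc", "founded": 1976,
    "headquarters": {"street": "One Apple Park Way",
                     "city": "Cupertino", "country": "USA"},
})
assert ok.headquarters.city == "Cupertino"

# ...while a wrong type or a missing nested field is rejected,
# which is exactly the signal Instructor uses to retry
try:
    Company.model_validate({"name": "Apple Inc", "founded": "unknown",
                            "headquarters": {}})
    raise AssertionError("should have failed validation")
except ValidationError:
    pass
```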

With Ollama (Local)

import instructor
from openai import OpenAI

client = instructor.from_openai(
    OpenAI(base_url="http://localhost:11434/v1", api_key="ollama"),
    mode=instructor.Mode.JSON,  # JSON mode is more reliable for local models
)
user = client.chat.completions.create(
    model="llama3",
    response_model=User,
    messages=[{"role": "user", "content": "John, 25, john@test.com"}]
)

Streaming Lists

class Article(BaseModel):
    title: str
    summary: str
    tags: list[str]

# Stream articles one at a time with create_iterable
for article in client.chat.completions.create_iterable(
    model="gpt-4o",
    response_model=Article,
    messages=[{"role": "user", "content": "List 5 AI articles"}]
):
    print(article)  # each Article object as it completes

Real-World Use Case

A data pipeline extracted info from 10K documents using GPT-4. With raw text output, JSON parsing broke 15% of the time. With Instructor: 100% schema-valid Pydantic models, plus automatic retries on validation failure. Pipeline reliability went from 85% to 99.9%.
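The retry mechanism leans on ordinary Pydantic validators: when validation fails, Instructor feeds the error message back to the model and re-asks, up to the `max_retries` you pass to `create`. Here is the validation side, runnable locally (the age range is an illustrative rule, not part of the pipeline above):

```python
from pydantic import BaseModel, ValidationError, field_validator

class User(BaseModel):
    name: str
    age: int

    @field_validator("age")
    @classmethod
    def check_age(cls, v: int) -> int:
        if not 0 <= v <= 130:
            raise ValueError("age must be between 0 and 130")
        return v

# A reply like {"age": 200} raises a ValidationError; with
# max_retries set, Instructor would re-prompt with this error
try:
    User(name="John", age=200)
    rejected = False
except ValidationError:
    rejected = True
assert rejected
```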

Quick Start

pip install instructor

Resources


Need structured data from AI? Check out my tools on Apify or email spinov001@gmail.com for custom AI pipelines.
