DEV Community

Yuuichi Eguchi

Type-Safe LLM Outputs: Why I Built pydantic-llm-io

The Problem with LLM Outputs

Working with LLMs in production applications, I kept running into the same frustrating cycle:

import json

# Ask the LLM for structured data
response = client.chat.completions.create(
    model="gpt-4o",  # any chat model
    messages=[{"role": "user", "content": "Summarize this article..."}],
)

# Hope it returns valid JSON
text = response.choices[0].message.content

# Manually parse and validate
try:
    data = json.loads(text)
    summary = data["summary"]  # What if this key doesn't exist?
    points = data["key_points"]  # What if this isn't a list?
except (json.JSONDecodeError, KeyError):
    # Now what? Retry? Log? Fail silently?
    pass

This approach has several problems:

  1. No type safety: Your IDE can't help you, and runtime errors are common
  2. Manual error handling: You write the same try/except blocks everywhere
  3. Inconsistent retry logic: Each developer implements retries differently
  4. Poor error messages: "KeyError: 'summary'" doesn't tell you much

After writing this boilerplate code repeatedly across multiple projects, I realized many developers face the same issue. So I built pydantic-llm-io to solve it.


What I Wanted

A library that would:

  • ✅ Let me define schemas with Pydantic (type-safe, validated)
  • ✅ Automatically handle JSON parsing and validation
  • ✅ Retry intelligently when validation fails
  • ✅ Give me clear error messages with context
  • ✅ Work with any LLM provider (OpenAI, Anthropic, custom)

How It Works

1. Define Your Schemas

from pydantic import BaseModel, Field

class SummaryInput(BaseModel):
    """Input schema for summarization."""
    text: str = Field(..., description="Text to summarize")
    max_words: int = Field(100, description="Maximum summary length")

class SummaryOutput(BaseModel):
    """Output schema - guaranteed structure."""
    summary: str
    key_points: list[str]
    language: str

2. Make a Validated Call

from pydantic_llm_io import call_llm_validated, OpenAIChatClient

client = OpenAIChatClient(api_key="sk-...")

result = call_llm_validated(
    prompt_model=SummaryInput(text="Long article...", max_words=50),
    response_model=SummaryOutput,
    client=client,
)

# Result is fully typed and validated
print(result.summary)  # IDE knows this is a string
print(result.key_points)  # IDE knows this is a list[str]

That's it. The library handles everything:

  • ✅ Serializing input to JSON
  • ✅ Constructing prompts with schema injection
  • ✅ Calling the LLM
  • ✅ Parsing the response
  • ✅ Validating against your schema
  • ✅ Retrying with corrections if validation fails
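The article doesn't show how the prompt construction works internally, but the "schema injection" step can be sketched with Pydantic's own tooling. The helper below is a hypothetical illustration, not the library's actual code: it embeds the JSON Schema of the output model into a system prompt, which is the general technique such libraries use.

```python
import json
from pydantic import BaseModel

class SummaryOutput(BaseModel):
    summary: str
    key_points: list[str]
    language: str

def build_system_prompt(response_model: type[BaseModel]) -> str:
    """Embed the JSON Schema of the expected output in the system prompt."""
    schema = json.dumps(response_model.model_json_schema(), indent=2)
    return (
        "Respond with JSON only, matching this schema exactly:\n"
        f"{schema}"
    )

prompt = build_system_prompt(SummaryOutput)
```

Because the schema is generated from the same model that later validates the response, the instructions sent to the LLM and the validation rules can never drift apart.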

The Retry Mechanism

The most interesting feature is automatic retry with LLM self-correction.

When validation fails, the library:

  1. Catches the error (JSON parse error or Pydantic validation error)
  2. Waits (exponential backoff: 1s, 2s, 4s...)
  3. Asks the LLM to fix it by sending the error details back
  4. Retries validation
  5. Repeats until success or max retries

from pydantic_llm_io import LLMCallConfig, RetryConfig

config = LLMCallConfig(
    retry=RetryConfig(
        max_retries=3,
        initial_delay_seconds=1.0,
        backoff_multiplier=2.0,
    )
)

result = call_llm_validated(
    prompt_model=input_model,
    response_model=OutputModel,
    client=client,
    config=config,
)

This leverages the LLM's ability to self-correct, significantly improving success rates.
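The five steps above can be sketched as a plain retry loop. This is a minimal stand-in under stated assumptions (the `send` callable, `call_with_retries`, and `RetryExhausted` are hypothetical names, not the library's internals), but it shows the shape of validate → feed errors back → back off → retry:

```python
import time
from pydantic import BaseModel, ValidationError

class RetryExhausted(Exception):
    """Raised when all attempts fail (stand-in for the library's RetryExhaustedError)."""

def call_with_retries(send, response_model: type[BaseModel], user_prompt: str,
                      max_retries: int = 3, initial_delay: float = 1.0,
                      backoff_multiplier: float = 2.0):
    """Initial call plus up to max_retries corrective retries."""
    prompt, delay, last_error = user_prompt, initial_delay, None
    for attempt in range(max_retries + 1):
        raw = send(prompt)
        try:
            # Pydantic 2 reports both JSON parse errors and schema
            # violations as ValidationError here.
            return response_model.model_validate_json(raw)
        except ValidationError as exc:
            last_error = exc
            if attempt == max_retries:
                break
            # Feed the error details back so the model can self-correct.
            prompt = (f"{user_prompt}\n\nYour previous reply failed validation:\n"
                      f"{exc}\nReturn corrected JSON only.")
            time.sleep(delay)        # exponential backoff: 1s, 2s, 4s, ...
            delay *= backoff_multiplier
    raise RetryExhausted(f"failed after {max_retries + 1} attempts: {last_error}")
```

The key design choice is that the correction prompt includes the validation error verbatim, so the model sees exactly which field was missing or mistyped rather than being asked to "try again" blindly.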


Provider Independence

The library uses an abstract ChatClient interface, making it easy to switch providers:

from pydantic_llm_io import ChatClient

class CustomClient(ChatClient):
    def send_message(self, system: str, user: str, temperature: float = 0.7) -> str:
        # Your provider logic here
        pass

    async def send_message_async(self, system: str, user: str, temperature: float = 0.7) -> str:
        # Async version
        pass

    def get_provider_name(self) -> str:
        return "custom"

# Use it
client = CustomClient(api_key="...")
result = call_llm_validated(..., client=client)

Currently supports:

  • OpenAI (built-in)
  • Anthropic (coming soon)
  • Any custom provider (via interface)

Better Error Handling

The library provides specific exceptions with rich context:

from pydantic_llm_io import RetryExhaustedError, LLMValidationError

try:
    result = call_llm_validated(...)
except RetryExhaustedError as e:
    print(f"Failed after {e.context['attempts']} attempts")
    print(f"Last error: {e.context['last_error']}")
except LLMValidationError as e:
    print(f"Validation errors: {e.context['validation_errors']}")

No more cryptic KeyError messages. You know exactly what went wrong and on which attempt.


Async Support

Full async/await support for concurrent LLM calls:

from pydantic_llm_io import call_llm_validated_async
import asyncio

async def main():
    # Run multiple validations concurrently
    tasks = [
        call_llm_validated_async(input1, Output, client),
        call_llm_validated_async(input2, Output, client),
        call_llm_validated_async(input3, Output, client),
    ]
    results = await asyncio.gather(*tasks)
    return results

asyncio.run(main())

Testing Made Easy

Use FakeChatClient for testing without API calls:

import json
from pydantic_llm_io import FakeChatClient, call_llm_validated

# Mock response
response = json.dumps({
    "summary": "Test summary",
    "key_points": ["point1", "point2"],
    "language": "English"
})

client = FakeChatClient(response)

# Test exactly like production
result = call_llm_validated(
    prompt_model=input_model,
    response_model=OutputModel,
    client=client,
)

# Verify
assert client.call_count == 1
assert "schema" in client.last_system

No need to mock complex API clients. The fake client behaves identically.
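If you're curious what such a fake has to do, here is a hand-rolled stand-in (hypothetical class name, not the shipped `FakeChatClient`) implementing the `ChatClient` shape shown earlier and recording the same things the test above asserts on, `call_count` and `last_system`:

```python
class MinimalFakeClient:
    """A hand-rolled stand-in that returns a canned reply and records calls."""

    def __init__(self, canned_response: str):
        self.canned_response = canned_response
        self.call_count = 0
        self.last_system = None
        self.last_user = None

    def send_message(self, system: str, user: str, temperature: float = 0.7) -> str:
        # Record what was sent so tests can assert on it.
        self.call_count += 1
        self.last_system = system
        self.last_user = user
        return self.canned_response

    def get_provider_name(self) -> str:
        return "fake"

client = MinimalFakeClient('{"summary": "ok", "key_points": [], "language": "en"}')
reply = client.send_message("system prompt with schema", "user prompt")
```

Because the fake satisfies the same interface as a real provider, the code under test doesn't know the difference.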


Installation

# Basic installation
pip install pydantic-llm-io

# With OpenAI support
pip install pydantic-llm-io[openai]

Requirements: Python 3.10+, Pydantic 2.0+


Real-World Example

Here's a complete example of using it for code review:

from pydantic import BaseModel, Field
from pydantic_llm_io import call_llm_validated, OpenAIChatClient

class CodeReviewInput(BaseModel):
    code: str = Field(..., description="Code to review")
    language: str = Field(..., description="Programming language")

class CodeReviewOutput(BaseModel):
    issues: list[str] = Field(description="List of issues found")
    suggestions: list[str] = Field(description="Improvement suggestions")
    severity: str = Field(description="Overall severity: low/medium/high")

client = OpenAIChatClient(api_key="sk-...")

result = call_llm_validated(
    prompt_model=CodeReviewInput(
        code="def foo():\n    x=1\n    return x",
        language="python"
    ),
    response_model=CodeReviewOutput,
    client=client,
)

print(f"Found {len(result.issues)} issues")
print(f"Severity: {result.severity}")
for issue in result.issues:
    print(f"  - {issue}")

The output is guaranteed to match your schema. No surprises.


What I Learned

Building this library taught me:

  • The importance of provider abstraction (switching from OpenAI to Anthropic should be trivial)
  • How valuable LLM self-correction is for validation failures
  • Type safety isn't just nice-to-have—it's essential for production LLM apps
  • Good error messages save hours of debugging

Try It Out

The library is open source and available on PyPI.

I'd love feedback on:

  • API design
  • Missing features
  • Provider implementations you'd like to see

If you've struggled with LLM output validation, give it a try and let me know what you think!

