## The Problem with LLM Outputs
Working with LLMs in production applications, I kept running into the same frustrating cycle:
```python
import json

# Ask the LLM for structured data
response = client.chat.completions.create(
    model="gpt-4o",
    messages=[{"role": "user", "content": "Summarize this article..."}],
)

# Hope it returns valid JSON
text = response.choices[0].message.content

# Manually parse and validate
try:
    data = json.loads(text)
    summary = data["summary"]    # What if this key doesn't exist?
    points = data["key_points"]  # What if this isn't a list?
except (json.JSONDecodeError, KeyError):
    # Now what? Retry? Log? Fail silently?
    pass
```
This approach has several problems:
- No type safety: Your IDE can't help you, and runtime errors are common
- Manual error handling: You write the same try/except blocks everywhere
- Inconsistent retry logic: Each developer implements retries differently
- Poor error messages: "KeyError: 'summary'" doesn't tell you much
After writing this boilerplate code repeatedly across multiple projects, I realized many developers face the same issue. So I built pydantic-llm-io to solve it.
## What I Wanted
A library that would:
- ✅ Let me define schemas with Pydantic (type-safe, validated)
- ✅ Automatically handle JSON parsing and validation
- ✅ Retry intelligently when validation fails
- ✅ Give me clear error messages with context
- ✅ Work with any LLM provider (OpenAI, Anthropic, custom)
## How It Works
### 1. Define Your Schemas
```python
from pydantic import BaseModel, Field

class SummaryInput(BaseModel):
    """Input schema for summarization."""
    text: str = Field(..., description="Text to summarize")
    max_words: int = Field(100, description="Maximum summary length")

class SummaryOutput(BaseModel):
    """Output schema - guaranteed structure."""
    summary: str
    key_points: list[str]
    language: str
```
### 2. Make a Validated Call
```python
from pydantic_llm_io import call_llm_validated, OpenAIChatClient

client = OpenAIChatClient(api_key="sk-...")

result = call_llm_validated(
    prompt_model=SummaryInput(text="Long article...", max_words=50),
    response_model=SummaryOutput,
    client=client,
)

# Result is fully typed and validated
print(result.summary)     # IDE knows this is a string
print(result.key_points)  # IDE knows this is a list[str]
```
That's it. The library handles everything:
- ✅ Serializing input to JSON
- ✅ Constructing prompts with schema injection
- ✅ Calling the LLM
- ✅ Parsing the response
- ✅ Validating against your schema
- ✅ Retrying with corrections if validation fails
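The schema-injection step can be approximated with plain Pydantic: embed the output model's JSON Schema in the system prompt so the model knows the exact structure to produce. This is a conceptual sketch (the helper name and prompt wording are mine, not the library's internals):

```python
import json
from pydantic import BaseModel

class SummaryOutput(BaseModel):
    summary: str
    key_points: list[str]
    language: str

def build_system_prompt(response_model: type[BaseModel]) -> str:
    """Embed the output model's JSON Schema in the system prompt
    so the LLM knows exactly what structure to return."""
    schema = json.dumps(response_model.model_json_schema(), indent=2)
    return (
        "Respond with a single JSON object matching this JSON Schema:\n"
        f"{schema}\n"
        "Do not include any text outside the JSON object."
    )

prompt = build_system_prompt(SummaryOutput)
```

Because the schema is generated from the same model that later validates the response, the prompt and the validation can never drift apart.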
## The Retry Mechanism
The most interesting feature is automatic retry with LLM self-correction.
When validation fails, the library:

1. Catches the error (JSON parse error or Pydantic validation error)
2. Waits (exponential backoff: 1s, 2s, 4s...)
3. Asks the LLM to fix it by sending the error details back
4. Retries validation
5. Repeats until success or max retries
```python
from pydantic_llm_io import LLMCallConfig, RetryConfig

config = LLMCallConfig(
    retry=RetryConfig(
        max_retries=3,
        initial_delay_seconds=1.0,
        backoff_multiplier=2.0,
    )
)

result = call_llm_validated(
    prompt_model=input_model,
    response_model=OutputModel,
    client=client,
    config=config,
)
```
This leverages the LLM's ability to self-correct, significantly improving success rates.
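The self-correction loop can be sketched in plain Python with Pydantic alone. Everything here (the function, its parameters, the correction-prompt wording) is illustrative, not the library's actual implementation:

```python
import time
from pydantic import BaseModel, ValidationError

def call_with_retries(send, response_model, user_prompt,
                      max_retries=3, initial_delay=1.0, multiplier=2.0):
    """Sketch of retry-with-self-correction: on a validation failure,
    feed the error back to the model and back off exponentially."""
    prompt, delay, last_error = user_prompt, initial_delay, None
    for attempt in range(max_retries + 1):
        raw = send(prompt)  # send() is any callable returning the LLM's raw text
        try:
            # In Pydantic v2 this raises ValidationError for both
            # malformed JSON and schema violations.
            return response_model.model_validate_json(raw)
        except ValidationError as exc:
            last_error = exc
            prompt = (
                f"{user_prompt}\n\nYour previous reply failed validation:\n"
                f"{exc}\nReturn corrected JSON only."
            )
            if attempt < max_retries:
                time.sleep(delay)
                delay *= multiplier
    raise RuntimeError(
        f"Validation failed after {max_retries + 1} attempts"
    ) from last_error
```

The key point is that the second prompt contains the concrete validation error, so the model is correcting a specific mistake rather than regenerating blindly.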
## Provider Independence
The library uses an abstract ChatClient interface, making it easy to switch providers:
```python
from pydantic_llm_io import ChatClient

class CustomClient(ChatClient):
    def send_message(self, system: str, user: str, temperature: float = 0.7) -> str:
        # Your provider logic here
        pass

    async def send_message_async(self, system: str, user: str, temperature: float = 0.7) -> str:
        # Async version
        pass

    def get_provider_name(self) -> str:
        return "custom"

# Use it
client = CustomClient(api_key="...")
result = call_llm_validated(..., client=client)
```
Currently supports:
- OpenAI (built-in)
- Anthropic (coming soon)
- Any custom provider (via interface)
## Better Error Handling
The library provides specific exceptions with rich context:
```python
from pydantic_llm_io import RetryExhaustedError, LLMValidationError

try:
    result = call_llm_validated(...)
except RetryExhaustedError as e:
    print(f"Failed after {e.context['attempts']} attempts")
    print(f"Last error: {e.context['last_error']}")
except LLMValidationError as e:
    print(f"Validation errors: {e.context['validation_errors']}")
```
No more cryptic KeyError messages. You know exactly what went wrong and on which attempt.
## Async Support
Full async/await support for concurrent LLM calls:
```python
import asyncio

from pydantic_llm_io import call_llm_validated_async

async def main():
    # Run multiple validations concurrently
    tasks = [
        call_llm_validated_async(input1, Output, client),
        call_llm_validated_async(input2, Output, client),
        call_llm_validated_async(input3, Output, client),
    ]
    results = await asyncio.gather(*tasks)
    return results

asyncio.run(main())
```
## Testing Made Easy
Use FakeChatClient for testing without API calls:
```python
import json

from pydantic_llm_io import FakeChatClient, call_llm_validated

# Mock response
response = json.dumps({
    "summary": "Test summary",
    "key_points": ["point1", "point2"],
    "language": "English",
})

client = FakeChatClient(response)

# Test exactly like production
result = call_llm_validated(
    prompt_model=input_model,
    response_model=OutputModel,
    client=client,
)

# Verify
assert client.call_count == 1
assert "schema" in client.last_system
```
No need to mock complex API clients. The fake client behaves identically.
## Installation
```bash
# Basic installation
pip install pydantic-llm-io

# With OpenAI support
pip install "pydantic-llm-io[openai]"
```
Requirements: Python 3.10+, Pydantic 2.0+
## Real-World Example
Here's a complete example of using it for code review:
```python
from pydantic import BaseModel, Field
from pydantic_llm_io import call_llm_validated, OpenAIChatClient

class CodeReviewInput(BaseModel):
    code: str = Field(..., description="Code to review")
    language: str = Field(..., description="Programming language")

class CodeReviewOutput(BaseModel):
    issues: list[str] = Field(description="List of issues found")
    suggestions: list[str] = Field(description="Improvement suggestions")
    severity: str = Field(description="Overall severity: low/medium/high")

client = OpenAIChatClient(api_key="sk-...")

result = call_llm_validated(
    prompt_model=CodeReviewInput(
        code="def foo():\n    x=1\n    return x",
        language="python",
    ),
    response_model=CodeReviewOutput,
    client=client,
)

print(f"Found {len(result.issues)} issues")
print(f"Severity: {result.severity}")
for issue in result.issues:
    print(f"  - {issue}")
```
The output is guaranteed to match your schema. No surprises.
## What I Learned
Building this library taught me:
- The importance of provider abstraction (switching from OpenAI to Anthropic should be trivial)
- How valuable LLM self-correction is for validation failures
- Type safety isn't just nice-to-have—it's essential for production LLM apps
- Good error messages save hours of debugging
## Try It Out
The library is open source and available on PyPI:
- GitHub: https://github.com/yuuichieguchi/pydantic-llm-io
- PyPI: `pip install pydantic-llm-io`
- Examples: check the `examples/` directory in the repo
I'd love feedback on:
- API design
- Missing features
- Provider implementations you'd like to see
If you've struggled with LLM output validation, give it a try and let me know what you think!