Data Validation Guide
Why Validate Data?
Invalid data causes crashes, corrupted databases, and security vulnerabilities.
A structured validation layer catches problems at the boundary — before bad data
propagates through your system.
Validation Layers
External Input
       │
       ▼
┌──────────────┐
│ Type Check   │  Are fields the right Python types?
├──────────────┤
│ Schema Check │  Do values satisfy constraints (length, range, pattern)?
├──────────────┤
│ Business     │  Do cross-field rules hold (dates, dependencies)?
│ Rules        │
├──────────────┤
│ File Check   │  Are uploaded files valid (size, format, extension)?
└──────────────┘
       │
       ▼
  Application
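To make the first layer concrete, here is a minimal sketch of a type check. The function name `check_types` and the `(field, message)` error shape are assumptions made for illustration; the toolkit's own `TypeValidator` may behave differently.

```python
def check_types(data: dict, expected: dict[str, type]) -> list[tuple[str, str]]:
    """Return (field, message) pairs for missing or wrongly typed fields."""
    errors = []
    for field, expected_type in expected.items():
        if field not in data:
            errors.append((field, "Field is missing"))
            continue
        value = data[field]
        # bool is a subclass of int in Python, so reject it explicitly for int fields.
        wrong_bool = expected_type is int and isinstance(value, bool)
        if wrong_bool or not isinstance(value, expected_type):
            errors.append(
                (field, f"Expected {expected_type.__name__}, got {type(value).__name__}")
            )
    return errors


print(check_types({"name": "Ada", "age": "36"}, {"name": str, "age": int}))
```

Rejecting `bool` for `int` fields is a design choice of this sketch: without it, `True` would pass as an age.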
Building a Pipeline
Combine validators for defence in depth:
from src.pipeline import ValidationPipeline
from src.validators.type_validator import TypeValidator
from src.validators.schema_validator import SchemaValidator
from src.validators.business_rules import BusinessRuleValidator

# collect_all: run every validator and gather all errors
pipeline = ValidationPipeline(mode="collect_all")
pipeline.add(TypeValidator({"name": str, "age": int}))
pipeline.add(SchemaValidator.from_yaml("configs/schemas/user_schema.yaml"))
pipeline.add(BusinessRuleValidator.from_yaml("configs/rules/business_rules.yaml"))

# incoming_data is the parsed request payload (a dict)
errors = pipeline.run(incoming_data)
if errors:
    # Return 422 with error details
    ...
Pipeline Modes
| Mode | Behaviour |
|---|---|
| collect_all | Run every validator, return all errors |
| short_circuit | Stop at the first failing validator |
Use collect_all for API responses (show all problems at once).
Use short_circuit for internal pipelines (fail fast, save work).
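The difference between the two modes can be sketched in a few lines. This is not the toolkit's `ValidationPipeline`; `run_pipeline`, the callable validator shape, and the two example validators are hypothetical stand-ins that only illustrate the control flow.

```python
from typing import Callable

# Hypothetical validator shape for this sketch: a callable returning error strings.
ValidatorFn = Callable[[dict], list[str]]


def run_pipeline(
    validators: list[ValidatorFn], data: dict, mode: str = "collect_all"
) -> list[str]:
    if mode not in ("collect_all", "short_circuit"):
        raise ValueError(f"Unknown mode: {mode!r}")
    errors: list[str] = []
    for validator in validators:
        errors.extend(validator(data))
        # In short_circuit mode, later validators never run once one has failed.
        if errors and mode == "short_circuit":
            break
    return errors


def has_name(data: dict) -> list[str]:
    return [] if data.get("name") else ["name: required"]


def is_adult(data: dict) -> list[str]:
    return [] if data.get("age", 0) >= 18 else ["age: must be at least 18"]


record = {"age": 12}
print(run_pipeline([has_name, is_adult], record, mode="collect_all"))
print(run_pipeline([has_name, is_adult], record, mode="short_circuit"))
```

With this record, `collect_all` reports both problems, while `short_circuit` stops after the missing name and never evaluates the age rule.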
Writing Custom Validators
Inherit from Validator and implement validate:
from src.validators.base import ValidationError, Validator


class NotEmptyValidator(Validator):
    def validate(self, data: dict) -> list[ValidationError]:
        errors = []
        for key, value in data.items():
            if isinstance(value, str) and not value.strip():
                errors.append(
                    ValidationError(field=key, message="Must not be empty", code="empty")
                )
        return errors
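To try a validator like this without the toolkit installed, the version below swaps in minimal stand-ins for `Validator` and `ValidationError`. Both stand-ins are assumptions: the real classes in `src.validators.base` may carry more fields or methods.

```python
from abc import ABC, abstractmethod
from dataclasses import dataclass


# Stand-ins for src.validators.base, assumed here so the example runs on its own.
@dataclass
class ValidationError:
    field: str
    message: str
    code: str


class Validator(ABC):
    @abstractmethod
    def validate(self, data: dict) -> list[ValidationError]:
        ...


class NotEmptyValidator(Validator):
    def validate(self, data: dict) -> list[ValidationError]:
        # Flag string values that are empty or contain only whitespace.
        return [
            ValidationError(field=key, message="Must not be empty", code="empty")
            for key, value in data.items()
            if isinstance(value, str) and not value.strip()
        ]


errors = NotEmptyValidator().validate(
    {"name": "   ", "email": "a@example.com", "age": 0}
)
for error in errors:
    print(f"{error.field}: {error.message} [{error.code}]")
```

Non-string values such as `0` are ignored on purpose: emptiness here is a string concept, and numeric ranges belong to the schema layer.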
YAML Schema Format
fields:
  field_name:
    type: str | int | float | bool | list | dict
    required: true | false
    min: 0               # numeric minimum
    max: 100             # numeric maximum
    min_length: 1        # string/list minimum length
    max_length: 255      # string/list maximum length
    pattern: "^regex$"   # regex pattern for strings
    choices:             # allowed values
      - option_a
      - option_b
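One way to read these keys is as a set of independent checks per field. The sketch below applies them to a dict shaped like the parsed YAML. The field names, the `check_schema` function, and the error wording are all made up for illustration, and it assumes types were already verified by the type layer.

```python
import re

# What a schema in the format above might look like once parsed into a dict.
SCHEMA = {
    "fields": {
        "username": {"type": "str", "required": True, "min_length": 3,
                     "max_length": 20, "pattern": "^[a-z0-9_]+$"},
        "age": {"type": "int", "required": True, "min": 0, "max": 120},
        "role": {"type": "str", "required": False, "choices": ["admin", "member"]},
    }
}


def check_schema(data: dict, schema: dict) -> list[str]:
    errors = []
    for name, spec in schema["fields"].items():
        if name not in data:
            # Optional fields may be absent; required ones may not.
            if spec.get("required", False):
                errors.append(f"{name}: required")
            continue
        value = data[name]
        if "min" in spec and value < spec["min"]:
            errors.append(f"{name}: must be >= {spec['min']}")
        if "max" in spec and value > spec["max"]:
            errors.append(f"{name}: must be <= {spec['max']}")
        if "min_length" in spec and len(value) < spec["min_length"]:
            errors.append(f"{name}: length must be >= {spec['min_length']}")
        if "max_length" in spec and len(value) > spec["max_length"]:
            errors.append(f"{name}: length must be <= {spec['max_length']}")
        if "pattern" in spec and not re.search(spec["pattern"], value):
            errors.append(f"{name}: does not match {spec['pattern']}")
        if "choices" in spec and value not in spec["choices"]:
            errors.append(f"{name}: must be one of {spec['choices']}")
    return errors


for line in check_schema({"username": "Al", "age": 130, "role": "guest"}, SCHEMA):
    print(line)
```

Each constraint is checked independently, so a single field can produce several errors in one pass.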
Error Reporting
Three built-in reporters:
| Reporter | Best For |
|---|---|
| TextReporter | CLI output, logs |
| JsonReporter | API responses |
| SummaryReporter | User-facing reports |
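The guide does not show the reporters' interfaces, so the following is only a guess at what each output style could look like. The three functions and the stand-in `ValidationError` (with the `field`, `message`, and `code` attributes used in the custom validator example) are assumptions, not the toolkit's classes.

```python
import json
from dataclasses import asdict, dataclass


@dataclass
class ValidationError:  # stand-in with the attributes used earlier in this guide
    field: str
    message: str
    code: str


def report_text(errors: list[ValidationError]) -> str:
    """One line per error, suited to CLI output and logs."""
    return "\n".join(f"{e.field}: {e.message} [{e.code}]" for e in errors)


def report_json(errors: list[ValidationError]) -> str:
    """Machine-readable body, e.g. for a 422 API response."""
    return json.dumps({"valid": not errors, "errors": [asdict(e) for e in errors]})


def report_summary(errors: list[ValidationError]) -> str:
    """Short human-readable count of errors and affected fields."""
    fields = {e.field for e in errors}
    return f"{len(errors)} error(s) across {len(fields)} field(s)"


found = [
    ValidationError("age", "Must be at least 0", "min"),
    ValidationError("name", "Must not be empty", "empty"),
]
print(report_text(found))
print(report_json(found))
print(report_summary(found))
```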
Best Practices
- Validate at the boundary — API endpoints, file uploads, CLI inputs.
- Use schemas for structure — YAML schemas keep validation rules out of code.
- Collect all errors — Users prefer fixing everything in one pass.
- Include field paths — Dot-notation paths help locate the problem.
- Separate concerns — Type checks, schema, and business rules are distinct layers.
- Test your rules — Validation logic is business logic; it deserves tests.
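Two of these practices, field paths and tests, fit in one small sketch. The helper below walks nested data and reports dot-notation paths for empty strings, then pins the behaviour down with an assertion. The function name and the sample order are hypothetical.

```python
def find_empty_strings(data, path: str = "") -> list[str]:
    """Return dot-notation paths of empty or whitespace-only strings in nested data."""
    paths = []
    if isinstance(data, dict):
        for key, value in data.items():
            child = f"{path}.{key}" if path else str(key)
            paths.extend(find_empty_strings(value, child))
    elif isinstance(data, list):
        # List items are addressed by index, e.g. items.1.sku
        for index, value in enumerate(data):
            child = f"{path}.{index}" if path else str(index)
            paths.extend(find_empty_strings(value, child))
    elif isinstance(data, str) and not data.strip():
        paths.append(path)
    return paths


order = {
    "customer": {"name": "", "address": {"city": " "}},
    "items": [{"sku": "A1"}, {"sku": ""}],
}
print(find_empty_strings(order))

# A rule like this is easy to pin down with a plain assertion or a pytest test.
assert find_empty_strings(order) == [
    "customer.name",
    "customer.address.city",
    "items.1.sku",
]
```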