DEV Community

Thesius Code

Originally published at datanest-stores.pages.dev

Data Validation Toolkit: Data Validation Guide

Why Validate Data?

Invalid data causes crashes, corrupted databases, and security vulnerabilities.
A structured validation layer catches problems at the boundary — before bad data
propagates through your system.

Validation Layers

External Input
     │
     ▼
┌──────────────┐
│ Type Check   │  Are fields the right Python types?
├──────────────┤
│ Schema Check │  Do values satisfy constraints (length, range, pattern)?
├──────────────┤
│ Business     │  Do cross-field rules hold (dates, dependencies)?
│ Rules        │
├──────────────┤
│ File Check   │  Are uploaded files valid (size, format, extension)?
└──────────────┘
     │
     ▼
  Application
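The file check is the one layer in the diagram that the pipeline example does not cover. As a rough, standalone sketch of what it might look like: the function name `check_upload`, the size limit, and the allowed extensions below are all assumptions made for illustration, not part of the toolkit.

```python
from pathlib import Path

# Hypothetical limits; adjust to your application's needs.
MAX_BYTES = 5 * 1024 * 1024
ALLOWED_EXTENSIONS = {".png", ".jpg", ".pdf"}
# Leading bytes ("magic numbers") that identify each format.
MAGIC_PREFIXES = {
    ".png": b"\x89PNG\r\n\x1a\n",
    ".jpg": b"\xff\xd8\xff",
    ".pdf": b"%PDF-",
}


def check_upload(filename: str, content: bytes) -> list[str]:
    """Return a list of problems found; an empty list means the file passed."""
    problems = []
    ext = Path(filename).suffix.lower()
    if ext not in ALLOWED_EXTENSIONS:
        problems.append(f"extension {ext!r} is not allowed")
    if len(content) > MAX_BYTES:
        problems.append(f"file is {len(content)} bytes; limit is {MAX_BYTES}")
    # Compare the first bytes with the signature expected for the extension.
    magic = MAGIC_PREFIXES.get(ext)
    if magic is not None and not content.startswith(magic):
        problems.append(f"content does not look like a {ext} file")
    return problems


print(check_upload("report.pdf", b"%PDF-1.7 ..."))
print(check_upload("photo.png", b"not really a png"))
```

Checking the leading bytes as well as the extension matters because a file name is supplied by the client and is trivial to fake.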

Building a Pipeline

Combine validators for defence in depth:

from src.pipeline import ValidationPipeline
from src.validators.type_validator import TypeValidator
from src.validators.schema_validator import SchemaValidator
from src.validators.business_rules import BusinessRuleValidator

pipeline = ValidationPipeline(mode="collect_all")
pipeline.add(TypeValidator({"name": str, "age": int}))
pipeline.add(SchemaValidator.from_yaml("configs/schemas/user_schema.yaml"))
pipeline.add(BusinessRuleValidator.from_yaml("configs/rules/business_rules.yaml"))

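# Example payload; in a real service this would come from the request body.
incoming_data = {"name": "Ada", "age": "36"}

errors = pipeline.run(incoming_data)
if errors:
    # Return 422 with error details
    ...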

Pipeline Modes

Mode            Behaviour
collect_all     Run every validator, return all errors
short_circuit   Stop at the first failing validator

Use collect_all for API responses (show all problems at once).
Use short_circuit for internal pipelines (fail fast, save work).
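To make the difference between the two modes concrete, here is a minimal standalone sketch. It does not use the toolkit's ValidationPipeline; `run_checks` and the two check functions are hypothetical stand-ins that only mimic the behaviour described in the table.

```python
from typing import Callable

# A validator here is any callable that takes the data and returns error strings.
Check = Callable[[dict], list[str]]


def run_checks(checks: list[Check], data: dict, mode: str = "collect_all") -> list[str]:
    errors: list[str] = []
    for check in checks:
        found = check(data)
        errors.extend(found)
        if found and mode == "short_circuit":
            break  # stop at the first validator that reports anything
    return errors


def name_is_str(data: dict) -> list[str]:
    return [] if isinstance(data.get("name"), str) else ["name: expected str"]


def age_is_int(data: dict) -> list[str]:
    return [] if isinstance(data.get("age"), int) else ["age: expected int"]


bad = {"name": 42, "age": "seven"}
print(run_checks([name_is_str, age_is_int], bad, mode="collect_all"))
print(run_checks([name_is_str, age_is_int], bad, mode="short_circuit"))
```

With both fields wrong, the first call reports two errors and the second reports only the name error, because the loop stops before the age check runs.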

Writing Custom Validators

Inherit from Validator and implement validate:

from src.validators.base import ValidationError, Validator

class NotEmptyValidator(Validator):
    def validate(self, data: dict) -> list[ValidationError]:
        errors = []
        for key, value in data.items():
            if isinstance(value, str) and not value.strip():
                errors.append(
                    ValidationError(field=key, message="Must not be empty", code="empty")
                )
        return errors
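The same pattern extends to cross-field business rules such as date ordering. The sketch below is self-contained, so it defines its own stand-ins for `ValidationError` and `Validator`; the real base classes in the toolkit may differ, and `DateOrderValidator` and the field names are hypothetical.

```python
from dataclasses import dataclass
from datetime import date


# Stand-ins for the toolkit's base classes so this sketch runs on its own.
@dataclass
class ValidationError:
    field: str
    message: str
    code: str


class Validator:
    def validate(self, data: dict) -> list[ValidationError]:
        raise NotImplementedError


class DateOrderValidator(Validator):
    """Cross-field rule: end_date must not come before start_date."""

    def validate(self, data: dict) -> list[ValidationError]:
        start, end = data.get("start_date"), data.get("end_date")
        # Only compare when both values are real dates;
        # type problems belong to an earlier layer.
        if isinstance(start, date) and isinstance(end, date) and end < start:
            return [
                ValidationError(
                    field="end_date",
                    message="Must not be earlier than start_date",
                    code="date_order",
                )
            ]
        return []


booking = {"start_date": date(2024, 5, 10), "end_date": date(2024, 5, 1)}
for err in DateOrderValidator().validate(booking):
    print(f"{err.field}: {err.message} ({err.code})")
```

Skipping the comparison when either value is missing or mistyped keeps this rule from duplicating errors that the type and schema layers already report.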

YAML Schema Format

fields:
  field_name:
    type: str           # one of: str, int, float, bool, list, dict
    required: true      # true or false
    min: 0              # numeric minimum
    max: 100            # numeric maximum
    min_length: 1       # string/list minimum length
    max_length: 255     # string/list maximum length
    pattern: "^regex$"  # regex pattern for strings
    choices:            # allowed values
      - option_a
      - option_b
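As an illustration of how such a schema could be applied, the sketch below checks a record against a plain Python dict that mirrors the YAML layout. In practice the dict would come from a YAML parser (for example PyYAML's `yaml.safe_load`). The function `check_schema`, the example fields, and the error wording are assumptions, not the toolkit's SchemaValidator.

```python
import re

# Python-dict equivalent of a small YAML schema in the format above.
USER_SCHEMA = {
    "fields": {
        "name": {"type": "str", "required": True, "min_length": 1, "max_length": 255},
        "age": {"type": "int", "required": True, "min": 0, "max": 100},
        "role": {"type": "str", "required": False, "choices": ["admin", "member"]},
        "email": {"type": "str", "required": False, "pattern": r"^[^@\s]+@[^@\s]+$"},
    }
}

TYPES = {"str": str, "int": int, "float": float, "bool": bool, "list": list, "dict": dict}


def check_schema(schema: dict, data: dict) -> list[str]:
    errors = []
    for name, rule in schema["fields"].items():
        if name not in data:
            if rule.get("required"):
                errors.append(f"{name}: required field is missing")
            continue
        value = data[name]
        expected = TYPES[rule["type"]]
        # bool is a subclass of int in Python, so reject it unless bool is expected.
        wrong_bool = isinstance(value, bool) and expected is not bool
        if wrong_bool or not isinstance(value, expected):
            errors.append(f"{name}: expected {rule['type']}")
            continue  # constraint checks make no sense on the wrong type
        if "min" in rule and value < rule["min"]:
            errors.append(f"{name}: must be >= {rule['min']}")
        if "max" in rule and value > rule["max"]:
            errors.append(f"{name}: must be <= {rule['max']}")
        if "min_length" in rule and len(value) < rule["min_length"]:
            errors.append(f"{name}: length must be >= {rule['min_length']}")
        if "max_length" in rule and len(value) > rule["max_length"]:
            errors.append(f"{name}: length must be <= {rule['max_length']}")
        if "pattern" in rule and not re.search(rule["pattern"], value):
            errors.append(f"{name}: does not match pattern")
        if "choices" in rule and value not in rule["choices"]:
            errors.append(f"{name}: must be one of {rule['choices']}")
    return errors


print(check_schema(USER_SCHEMA, {"name": "", "age": 130, "role": "guest"}))
```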

Error Reporting

Three built-in reporters:

Reporter          Best For
TextReporter      CLI output, logs
JsonReporter      API responses
SummaryReporter   User-facing reports
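The three output styles might look roughly like this. The functions below are hypothetical stand-ins written for illustration; the toolkit's reporter classes and their exact output formats are not shown in this guide.

```python
import json
from collections import Counter

errors = [
    {"field": "name", "message": "Must not be empty", "code": "empty"},
    {"field": "age", "message": "Must be >= 0", "code": "min"},
    {"field": "email", "message": "Must not be empty", "code": "empty"},
]


def text_report(errors: list[dict]) -> str:
    """One line per error, suited to a terminal or log file."""
    return "\n".join(f"[{e['code']}] {e['field']}: {e['message']}" for e in errors)


def json_report(errors: list[dict]) -> str:
    """Machine-readable body for an API response."""
    return json.dumps({"valid": not errors, "errors": errors}, indent=2)


def summary_report(errors: list[dict]) -> str:
    """Counts per error code, for a short human-readable overview."""
    counts = Counter(e["code"] for e in errors)
    parts = [f"{n} x {code}" for code, n in counts.most_common()]
    return f"{len(errors)} problem(s): " + ", ".join(parts)


print(text_report(errors))
print(json_report(errors))
print(summary_report(errors))
```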

Best Practices

  1. Validate at the boundary — API endpoints, file uploads, CLI inputs.
  2. Use schemas for structure — YAML schemas keep validation rules out of code.
  3. Collect all errors — Users prefer fixing everything in one pass.
  4. Include field paths — Dot-notation paths help locate the problem.
  5. Separate concerns — Type checks, schema, and business rules are distinct layers.
  6. Test your rules — Validation logic is business logic; it deserves tests.
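
Practices 4 and 6 can be sketched together: a small check that reports dot-notation paths for nested data, followed by assertions that act as its tests. The function `find_empty_strings` and the sample record are hypothetical and only illustrate the idea.

```python
def find_empty_strings(data: dict, prefix: str = "") -> list[str]:
    """Return dot-notation paths of every blank string, descending into dicts and lists."""
    paths = []
    for key, value in data.items():
        path = f"{prefix}.{key}" if prefix else str(key)
        if isinstance(value, dict):
            paths.extend(find_empty_strings(value, path))
        elif isinstance(value, list):
            # List positions become numeric path segments, e.g. items.1.sku
            for i, item in enumerate(value):
                if isinstance(item, dict):
                    paths.extend(find_empty_strings(item, f"{path}.{i}"))
                elif isinstance(item, str) and not item.strip():
                    paths.append(f"{path}.{i}")
        elif isinstance(value, str) and not value.strip():
            paths.append(path)
    return paths


order = {
    "customer": {"name": "Ada", "address": {"city": " ", "zip": "12345"}},
    "items": [{"sku": "A1"}, {"sku": ""}],
}

# Plain assertions double as tests for the rule.
assert find_empty_strings(order) == ["customer.address.city", "items.1.sku"]
assert find_empty_strings({"name": "Ada"}) == []
print(find_empty_strings(order))
```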

This is 1 of 14 resources in the Python Developer Pro toolkit. Get the complete [Data Validation Toolkit] with all files, templates, and documentation for $29.

Or grab the entire Python Developer Pro bundle (14 products) for $159 — save 30%.
