Dror Atariah
Two Approaches to Data Anonymization with Pydantic

When working with Personally Identifiable Information (PII), you often need to anonymize sensitive fields before processing or storing the data. Let's explore two approaches to implementing this with Pydantic models.

The Problem

You have a User model and need to create an anonymized version for privacy compliance:

# EmailStr needs the optional email-validator package: pip install "pydantic[email]"
from pydantic import BaseModel, EmailStr

class User(BaseModel):
    id: int
    name: str
    email: str

Solution 1: Factory Method Approach

Create a classmethod that handles anonymization during object creation:

class AnonymizedPerson(BaseModel):
    id: int
    name: str
    email: EmailStr

    @classmethod
    def from_user(cls, user: User) -> "AnonymizedPerson":
        # Anonymize email by replacing username with "anonymized"
        # You might want to use some hashing, so analytics on
        # the data could still be supported.
        email_parts = user.email.split("@")
        anonymized_email = f"anonymized@{email_parts[1]}"

        return cls(
            id=user.id,
            name=f"anonymized_{user.name}",
            email=anonymized_email
        )

How to use:

user = User(id=1, name="Alice Brown", email="alice@company.com")
anonymized = AnonymizedPerson.from_user(user)
assert anonymized.name == "anonymized_Alice Brown"
assert anonymized.email == "anonymized@company.com"
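
The comment in from_user hints at hashing as a way to keep analytics possible. Here is a minimal sketch of that idea using only the standard library. The names pseudonymize, pseudonymize_email, and SECRET_KEY are hypothetical and not part of the original example, and a keyed hash gives you pseudonymization rather than full anonymization, so treat it as a starting point:

```python
import hashlib
import hmac

# Hypothetical secret for illustration; load it from configuration in practice.
SECRET_KEY = b"replace-me"


def pseudonymize(value: str, key: bytes = SECRET_KEY, length: int = 12) -> str:
    """Return a stable keyed digest of value, truncated to `length` hex characters."""
    # Lowercasing first means "Alice" and "alice" map to the same token.
    digest = hmac.new(key, value.lower().encode("utf-8"), hashlib.sha256).hexdigest()
    return digest[:length]


def pseudonymize_email(email: str) -> str:
    """Replace the local part with a keyed hash and keep the domain."""
    local, _, domain = email.rpartition("@")
    return f"{pseudonymize(local)}@{domain}"


first = pseudonymize_email("alice@company.com")
second = pseudonymize_email("alice@company.com")
other = pseudonymize_email("bob@company.com")
```

Because the same input always yields the same token, you can still count or join on the pseudonymized email, while different users get different tokens. Calling pseudonymize_email(user.email) inside from_user in place of the fixed "anonymized" prefix would be one way to wire it in.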

Solution 2: Field Validator Approach

Use Pydantic's @field_validator decorator with mode='after' to transform data automatically:

from pydantic import BaseModel, EmailStr, field_validator

class AnonymizedPersonAuto(BaseModel):
    id: int
    name: str
    email: EmailStr

    @field_validator("name", mode="after")
    @classmethod
    def anonymize_name(cls, v: str) -> str:
        """Automatically anonymize the name field after validation"""
        if not v.startswith("anonymized_"):
            return f"anonymized_{v}"
        return v

    @field_validator("email", mode="after")
    @classmethod
    def anonymize_email(cls, v: str) -> str:
        """Automatically anonymize the email field after validation"""
        if not v.startswith("anonymized@"):
            email_parts = v.split("@")
            return f"anonymized@{email_parts[1]}"
        return v

    @classmethod
    def from_user(cls, user: User) -> "AnonymizedPersonAuto":
        """Create from User - name and email will be auto-anonymized"""
        return cls(id=user.id, name=user.name, email=user.email)

How to use:

user = User(id=1, name="Bob Wilson", email="bob@tech.org")

# All these methods automatically anonymize both fields:
from_factory = AnonymizedPersonAuto.from_user(user)
direct = AnonymizedPersonAuto(id=2, name="Carol", email="carol@startup.io")
from_json = AnonymizedPersonAuto.model_validate({"id": 3, "name": "Dave", "email": "dave@corp.net"})

# All result in anonymized data:
assert from_factory.name == "anonymized_Bob Wilson"
assert direct.email == "anonymized@startup.io"
assert from_json.name == "anonymized_Dave"
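
Two edge cases are worth knowing about with the validator approach. By default, validators do not run when you assign to a field after creation, and model_construct() skips validation altogether. The sketch below uses a trimmed-down, hypothetical model (AnonymizedName, not part of the original example) to show both behaviours:

```python
from pydantic import BaseModel, ConfigDict, field_validator


class AnonymizedName(BaseModel):  # hypothetical model for illustration
    # Re-run validators whenever a field is assigned after creation.
    model_config = ConfigDict(validate_assignment=True)

    id: int
    name: str

    @field_validator("name", mode="after")
    @classmethod
    def anonymize_name(cls, v: str) -> str:
        return v if v.startswith("anonymized_") else f"anonymized_{v}"


validated = AnonymizedName(id=1, name="Eve")
validated.name = "Frank"  # assignment is validated, so it is anonymized too

# model_construct() skips validation entirely, so the validator never runs.
unvalidated = AnonymizedName.model_construct(id=2, name="Grace")
```

After this runs, validated.name is "anonymized_Frank", while unvalidated.name is still the raw "Grace". If untrusted code can reach model_construct(), the validator alone will not protect you.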

Summary: When to Use Each Approach

Factory Method Approach

Pros:

  • Simple and explicit
  • Clear control over transformation logic
  • Easy to understand and debug
  • Suitable for complex multi-field transformations

Cons:

  • Only works when using the factory method
  • Other creation paths bypass anonymization
  • Manual process that can be forgotten

Best for: Simple use cases where you control all object creation paths

Field Validator Approach

Pros:

  • Consistent across all creation paths that run validation (model_construct() skips it)
  • Self-contained transformation logic
  • Works with direct instantiation, JSON parsing, and factory methods
  • Prevents accidental non-anonymized instances

Cons:

  • Slightly more complex setup
  • Field-level transformations only
  • Less explicit about when transformation occurs

Best for: Production systems where data consistency and reliability are critical
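
The "field-level transformations only" limitation can be relaxed with Pydantic's @model_validator, which sees the whole model at once. The following is a minimal sketch under assumptions: AnonymizedPersonById is a hypothetical model, the email field is a plain str to keep the example dependency-free, and deriving the pseudonym from the id is just one possible rule:

```python
from pydantic import BaseModel, model_validator


class AnonymizedPersonById(BaseModel):  # hypothetical model for illustration
    id: int
    name: str
    email: str

    @model_validator(mode="after")
    def anonymize(self) -> "AnonymizedPersonById":
        # Runs after all fields are validated, so it can combine several of them.
        domain = self.email.rpartition("@")[2]
        self.name = f"anonymized_{self.id}"
        self.email = f"user{self.id}@{domain}"
        return self


person = AnonymizedPersonById(id=7, name="Erin", email="erin@example.com")
```

Here person.name becomes "anonymized_7" and person.email becomes "user7@example.com". The transformation still runs on every validated creation path, but it is no longer restricted to one field at a time.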

Bonus: Making Factory Method Approach Bulletproof

If you want to use the factory method approach but prevent callers from bypassing it through direct instantiation, you can implement a private constructor pattern:

class AnonymizedPersonSecure(BaseModel):
    id: int
    name: str
    email: EmailStr

    def __init__(self, **data):
        # Private constructor - should only be called by factory methods
        if not hasattr(self, '_from_factory'):
            raise ValueError("Use AnonymizedPersonSecure.from_user() instead of direct instantiation")
        super().__init__(**data)

    @classmethod
    def from_user(cls, user: User) -> "AnonymizedPersonSecure":
        # Anonymize email by replacing username with "anonymized"
        email_parts = user.email.split("@")
        anonymized_email = f"anonymized@{email_parts[1]}"

        # Create instance through private constructor
        instance = cls.__new__(cls)
        instance._from_factory = True
        instance.__init__(
            id=user.id,
            name=f"anonymized_{user.name}",
            email=anonymized_email
        )
        return instance

How it works:

user = User(id=1, name="Alice Brown", email="alice@company.com")

# This works - using factory method
anonymized = AnonymizedPersonSecure.from_user(user)
assert anonymized.name == "anonymized_Alice Brown"

# This fails - direct instantiation blocked
try:
    direct = AnonymizedPersonSecure(id=2, name="Bob", email="bob@test.com")
except ValueError as e:
    assert str(e) == "Use AnonymizedPersonSecure.from_user() instead of direct instantiation"
else:
    # Make the example fail loudly if the guard did not trigger
    raise AssertionError("direct instantiation should have been blocked")

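This gives you the control of the factory method approach while blocking the direct-instantiation bypass. One gap remains: model_construct() skips __init__ (and validation) entirely, so it is not covered by this guard.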

Key Takeaway

Choose the factory method for simple, controlled scenarios. Choose field validators when you need a transformation that runs on every validated creation path, however the model is instantiated. Both patterns have their place in modern Python applications.


Have you used Pydantic field validators for data transformation? Share your use cases in the comments!