When working with Personally Identifiable Information (PII), you often need to anonymize sensitive information before processing or storing it. Let's explore two different approaches to implement this using Pydantic models.
The Problem
You have a User model and need to create an anonymized version for privacy compliance:
from pydantic import BaseModel, EmailStr

class User(BaseModel):
    id: int
    name: str
    email: str
Solution 1: Factory Method Approach
Create a classmethod that handles anonymization during object creation:
class AnonymizedPerson(BaseModel):
    id: int
    name: str
    email: EmailStr

    @classmethod
    def from_user(cls, user: User) -> "AnonymizedPerson":
        # Anonymize email by replacing username with "anonymized"
        # You might want to use some hashing, so analytics on
        # the data could still be supported.
        email_parts = user.email.split("@")
        anonymized_email = f"anonymized@{email_parts[1]}"
        return cls(
            id=user.id,
            name=f"anonymized_{user.name}",
            email=anonymized_email,
        )
How to use:
user = User(id=1, name="Alice Brown", email="alice@company.com")
anonymized = AnonymizedPerson.from_user(user)
assert anonymized.name == "anonymized_Alice Brown"
assert anonymized.email == "anonymized@company.com"
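The code comment above mentions hashing as a way to keep analytics possible. One way this could look is a keyed hash of the email's local part, sketched below with the standard library only. The helper name pseudonymize_email and the SECRET_KEY constant are hypothetical and not part of the original example, and a keyed hash is pseudonymization rather than full anonymization, since anyone holding the key can recompute the mapping.

```python
import hashlib
import hmac

# Hypothetical secret key; in practice it would come from secure configuration.
SECRET_KEY = b"replace-with-a-real-secret"


def pseudonymize_email(email: str, key: bytes = SECRET_KEY) -> str:
    """Replace the local part of an email with a keyed hash, keeping the domain."""
    local, _, domain = email.rpartition("@")
    # Lowercase first so "Alice" and "alice" map to the same token.
    digest = hmac.new(key, local.lower().encode("utf-8"), hashlib.sha256).hexdigest()
    # Truncated for readability; a shorter token raises the collision risk.
    return f"{digest[:16]}@{domain}"


a = pseudonymize_email("alice@company.com")
b = pseudonymize_email("Alice@company.com")
c = pseudonymize_email("bob@company.com")
```

Because the same input always yields the same token, records can still be grouped per user, while the original username no longer appears in the stored value.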
Solution 2: Field Validator Approach
Use Pydantic's @field_validator decorator with mode='after' to transform data automatically:
from pydantic import BaseModel, EmailStr, field_validator

class AnonymizedPersonAuto(BaseModel):
    id: int
    name: str
    email: EmailStr

    @field_validator("name", mode="after")
    @classmethod
    def anonymize_name(cls, v: str) -> str:
        """Automatically anonymize the name field after validation"""
        if not v.startswith("anonymized_"):
            return f"anonymized_{v}"
        return v

    @field_validator("email", mode="after")
    @classmethod
    def anonymize_email(cls, v: str) -> str:
        """Automatically anonymize the email field after validation"""
        if not v.startswith("anonymized@"):
            email_parts = v.split("@")
            return f"anonymized@{email_parts[1]}"
        return v

    @classmethod
    def from_user(cls, user: User) -> "AnonymizedPersonAuto":
        """Create from User - name and email will be auto-anonymized"""
        return cls(id=user.id, name=user.name, email=user.email)
How to use:
user = User(id=1, name="Bob Wilson", email="bob@tech.org")
# All these methods automatically anonymize both fields:
from_factory = AnonymizedPersonAuto.from_user(user)
direct = AnonymizedPersonAuto(id=2, name="Carol", email="carol@startup.io")
from_json = AnonymizedPersonAuto.model_validate({"id": 3, "name": "Dave", "email": "dave@corp.net"})
# All result in anonymized data:
assert from_factory.name == "anonymized_Bob Wilson"
assert direct.email == "anonymized@startup.io"
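The startswith checks in the validators are what keep the transformation idempotent: validators run again whenever already-anonymized data is re-validated, for example after a dump and reload. Here is a minimal sketch of that behavior, assuming Pydantic v2 and using a hypothetical single-field model (AnonymizedName) so it runs on its own:

```python
from pydantic import BaseModel, field_validator


class AnonymizedName(BaseModel):
    # Hypothetical single-field model, used only for this illustration.
    name: str

    @field_validator("name", mode="after")
    @classmethod
    def anonymize_name(cls, v: str) -> str:
        # Skip values that already carry the prefix so re-validation is a no-op.
        return v if v.startswith("anonymized_") else f"anonymized_{v}"


first = AnonymizedName(name="Carol")
# Round-trip through a dict: the validator runs again on the dumped value.
second = AnonymizedName.model_validate(first.model_dump())
# JSON input goes through the same validator.
third = AnonymizedName.model_validate_json('{"name": "Dave"}')
```

One trade-off of a prefix check: an input that already starts with "anonymized_" passes through unchanged, so the guard trusts the prefix rather than verifying where the value came from.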
Summary: When to Use Each Approach
Factory Method Approach
Pros:
- Simple and explicit
- Clear control over transformation logic
- Easy to understand and debug
- Suitable for complex multi-field transformations
Cons:
- Only works when using the factory method
- Other creation paths bypass anonymization
- Manual process that can be forgotten
Best for: Simple use cases where you control all object creation paths
Field Validator Approach
Pros:
- Guaranteed consistency across all creation methods
- Self-contained transformation logic
- Works with direct instantiation, JSON parsing, and factory methods
- Prevents accidental non-anonymized instances
Cons:
- Slightly more complex setup
- Field-level transformations only
- Less explicit about when transformation occurs
Best for: Production systems where data consistency and reliability are critical
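On the "field-level transformations only" point: when a rule needs more than one field, Pydantic v2 also provides model_validator, which runs after all fields are validated. The sketch below is one possible shape, not part of the original solutions. The AnonymizedAccount model and the rule of deriving the name from the id are assumptions made for illustration.

```python
from pydantic import BaseModel, model_validator


class AnonymizedAccount(BaseModel):
    # Hypothetical model for illustration only.
    id: int
    name: str

    @model_validator(mode="after")
    def replace_name_with_id(self) -> "AnonymizedAccount":
        # Cross-field rule: derive the stored name from the id, not from the input.
        self.name = f"user_{self.id}"
        return self


account = AnonymizedAccount(id=7, name="Frank")
```

Like field validators, this runs on every validated creation path, so it keeps the consistency benefit while allowing logic that spans fields.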
Bonus: Making Factory Method Approach Bulletproof
If you want to use the factory method approach but prevent direct instantiation bypass, you can implement a private constructor pattern:
class AnonymizedPersonSecure(BaseModel):
    id: int
    name: str
    email: EmailStr

    def __init__(self, **data):
        # "Private" constructor: only from_user() sets the marker checked here
        if not hasattr(self, "_from_factory"):
            raise ValueError("Use AnonymizedPersonSecure.from_user() instead of direct instantiation")
        super().__init__(**data)

    @classmethod
    def from_user(cls, user: User) -> "AnonymizedPersonSecure":
        # Anonymize email by replacing username with "anonymized"
        email_parts = user.email.split("@")
        anonymized_email = f"anonymized@{email_parts[1]}"
        # Allocate the instance without running __init__, then mark it.
        # object.__setattr__ sidesteps BaseModel's own __setattr__, which may
        # not work on an instance that has not been initialized yet.
        instance = cls.__new__(cls)
        object.__setattr__(instance, "_from_factory", True)
        instance.__init__(
            id=user.id,
            name=f"anonymized_{user.name}",
            email=anonymized_email,
        )
        return instance
How it works:
user = User(id=1, name="Alice Brown", email="alice@company.com")
# This works - using factory method
anonymized = AnonymizedPersonSecure.from_user(user)
assert anonymized.name == "anonymized_Alice Brown"
# This fails - direct instantiation blocked
try:
    direct = AnonymizedPersonSecure(id=2, name="Bob", email="bob@test.com")
except ValueError as e:
    assert str(e) == "Use AnonymizedPersonSecure.from_user() instead of direct instantiation"
This keeps the control of the factory method approach while blocking direct instantiation.
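One path that neither pattern covers is model_construct(), which Pydantic v2 provides for building instances from trusted data without validation. It skips field validators and does not go through __init__, so it is worth keeping in mind when judging how airtight either approach is. The two small models below (Guarded and FactoryOnly) are hypothetical and exist only to show the effect:

```python
from pydantic import BaseModel, field_validator


class Guarded(BaseModel):
    # Hypothetical model relying on a field validator.
    name: str

    @field_validator("name", mode="after")
    @classmethod
    def anonymize_name(cls, v: str) -> str:
        return v if v.startswith("anonymized_") else f"anonymized_{v}"


class FactoryOnly(BaseModel):
    # Hypothetical model whose constructor always refuses direct use.
    name: str

    def __init__(self, **data):
        raise ValueError("Use a factory method instead of direct instantiation")


validated = Guarded(name="Erin")
# model_construct() builds the instance without running validators...
unvalidated = Guarded.model_construct(name="Erin")
# ...and without calling the overridden __init__.
bypassed = FactoryOnly.model_construct(name="Erin")
```

In codebases where anonymization is a compliance requirement, a lint rule or code review convention against model_construct() on these models may be a reasonable complement.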
Key Takeaway
Choose the factory method for simple, controlled scenarios. Choose field validators when the transformation must run on every validated creation path: direct instantiation, model_validate(), and factory methods alike. Both patterns have their place in modern Python applications.
Have you used Pydantic field validators for data transformation? Share your use cases in the comments!