Timothy had written his hundredth Book class. Each time, the same tedious pattern: write __init__, define each attribute, write __repr__, write __eq__, write comparison methods. His simple data-holding classes had become exercises in repetitive boilerplate.
class Book:
    def __init__(self, title, author, year, pages):
        self.title = title
        self.author = author
        self.year = year
        self.pages = pages
    def __repr__(self):
        return f'Book(title={self.title!r}, author={self.author!r}, year={self.year}, pages={self.pages})'
    def __eq__(self, other):
        if not isinstance(other, Book):
            return NotImplemented
        return (self.title == other.title and
                self.author == other.author and
                self.year == other.year and
                self.pages == other.pages)
    def __lt__(self, other):
        if not isinstance(other, Book):
            return NotImplemented
        return (self.title, self.author, self.year) < (other.title, other.author, other.year)
# Nearly 20 lines of boilerplate for 4 attributes!
Margaret found him copying and pasting __init__ for the fifth time that day. "You're hand-crafting every blueprint," she observed. "Come to the Blueprint Factory, where Python generates the boring parts automatically."
The Dataclass Decorator
Margaret showed Timothy Python's modern shortcut:
from dataclasses import dataclass
@dataclass
class Book:
    title: str
    author: str
    year: int
    pages: int
# That's it! Python generates __init__, __repr__, __eq__ automatically
dune = Book("Dune", "Herbert", 1965, 412)
print(dune)
# Book(title='Dune', author='Herbert', year=1965, pages=412)
foundation = Book("Foundation", "Asimov", 1951, 255)
print(dune == foundation) # False - different books
print(dune == Book("Dune", "Herbert", 1965, 412)) # True - same values
"The @dataclass decorator generates methods automatically," Margaret explained. "Type hints become the attributes. No more writing self.title = title repeatedly."
What Dataclasses Generate
Timothy learned what the decorator created:
@dataclass
class Book:
    title: str
    author: str
    year: int
    pages: int
# Python automatically generates:
# __init__(self, title, author, year, pages)
# __repr__(self)
# __eq__(self, other)
# You get this behavior for free:
dune = Book("Dune", "Herbert", 1965, 412)
# Readable representation
print(repr(dune))
# Book(title='Dune', author='Herbert', year=1965, pages=412)
# Value equality
another_dune = Book("Dune", "Herbert", 1965, 412)
print(dune == another_dune) # True
# But NOT __hash__ - dataclasses are mutable by default
# Can't use as dict keys or in sets without frozen=True
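The hashing caveat is easy to check directly. The following is a minimal sketch, assuming the default options (eq=True, frozen=False); it uses fields() from the dataclasses module to list what the decorator collected, then confirms that instances are unhashable.

```python
from dataclasses import dataclass, fields


@dataclass
class Book:
    title: str
    author: str
    year: int
    pages: int


dune = Book("Dune", "Herbert", 1965, 412)

# fields() lists what the decorator collected from the annotations
field_names = [f.name for f in fields(Book)]
print(field_names)  # ['title', 'author', 'year', 'pages']

# With the default eq=True and frozen=False, __hash__ is set to None
print(Book.__hash__ is None)  # True

try:
    hash(dune)
    hashable = True
except TypeError:
    hashable = False
print(hashable)  # False
```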
"Dataclasses are optimized for data storage," Margaret noted. "Python generates the most common methods, saving you from writing boilerplate."
Type Hints Are Documentation, Not Enforcement
Margaret clarified an important limitation:
@dataclass
class Book:
    title: str
    author: str
    pages: int
# Python doesn't enforce types at runtime!
book = Book("Dune", "Herbert", "not an int") # No error!
print(book.pages) # "not an int" - type hint ignored
# Type hints are:
# - Documentation for humans
# - Used by type checkers (mypy, pyright)
# - Used by IDEs for autocomplete
# - NOT enforced at runtime
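If runtime checks are wanted, they have to be written by hand. The sketch below is one possible approach, not a built-in feature: it assumes every annotation is a plain class such as str or int, and that the module does not use `from __future__ import annotations` (which would turn the annotations into strings). The check lives in a __post_init__ method, which the generated __init__ calls after assigning the fields.

```python
from dataclasses import dataclass, fields


@dataclass
class Book:
    title: str
    author: str
    pages: int

    def __post_init__(self):
        # Hypothetical check: only valid when each annotation is a plain class
        for f in fields(self):
            value = getattr(self, f.name)
            if not isinstance(value, f.type):
                raise TypeError(
                    f"{f.name} must be {f.type.__name__}, "
                    f"got {type(value).__name__}"
                )


try:
    Book("Dune", "Herbert", "not an int")
    error_message = None
except TypeError as exc:
    error_message = str(exc)

print(error_message)  # pages must be int, got str
```

Generic annotations such as list[str] or Optional[int] would need more than isinstance, which is why a static type checker is usually the better tool.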
"Type hints document intent," Margaret explained. "They help tools catch errors before runtime, but Python won't stop you from passing wrong types. Use type checkers in your development workflow for safety."
Default Values
Timothy learned to provide defaults:
@dataclass
class Book:
    title: str
    author: str
    year: int = 2024 # Default value
    pages: int = 0 # Default value
    isbn: str = "" # Default value
# Can omit fields with defaults
recent_book = Book("New Release", "Modern Author")
print(recent_book.year) # 2024
# Or provide all values
classic = Book("Dune", "Herbert", 1965, 412, "978-0441013593")
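Field order matters once defaults are involved. Here is a minimal sketch (the class name BrokenBook is hypothetical) of what happens when a field without a default is declared after one that has a default. The error is raised when the class is defined, before any instance is created.

```python
from dataclasses import dataclass

try:
    @dataclass
    class BrokenBook:
        title: str = "Untitled"  # has a default
        author: str              # no default after a default: rejected
except TypeError as exc:
    definition_error = exc
else:
    definition_error = None

print(type(definition_error).__name__)  # TypeError
```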
"Fields with defaults must come after fields without defaults," Margaret cautioned. "Python requires non-default parameters before default parameters."
The field() Function for Advanced Defaults
Margaret showed Timothy how to customize fields:
from dataclasses import dataclass, field
@dataclass
class Book:
    title: str
    author: str
    year: int
    pages: int
    tags: list = field(default_factory=list) # Mutable default
    metadata: dict = field(default_factory=dict)
# Each instance gets its own list/dict
book1 = Book("Dune", "Herbert", 1965, 412)
book2 = Book("Foundation", "Asimov", 1951, 255)
book1.tags.append("scifi")
print(book1.tags) # ["scifi"]
print(book2.tags) # [] - separate list!
"Never use mutable defaults directly," Margaret warned. "Use default_factory to create a new instance for each object. This avoids the shared mutable default trap."
The Dangerous Mutable Default
Timothy saw what happens without default_factory:
# WRONG - mutable default written directly
@dataclass
class Book:
    title: str
    tags: list = [] # ValueError at class definition!
# ValueError: mutable default <class 'list'> for field tags is not allowed: use default_factory
# The decorator rejects list, dict, and set defaults before any instance exists
# RIGHT - use default_factory
@dataclass
class Book:
    title: str
    tags: list = field(default_factory=list)
book1 = Book("Dune")
book2 = Book("Foundation")
book1.tags.append("scifi")
print(book2.tags) # [] - separate lists!
"In a regular class, a mutable default would be shared by every instance," Margaret explained. "Dataclasses refuse list, dict, and set defaults outright, so always use default_factory for any mutable default."
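A runnable sketch of how @dataclass treats a list default, with the failing definition wrapped in try/except so the script keeps going. The class name SharedTagsBook is hypothetical.

```python
from dataclasses import dataclass, field

try:
    @dataclass
    class SharedTagsBook:
        title: str
        tags: list = []  # rejected by the decorator
except ValueError as exc:
    definition_error = exc
else:
    definition_error = None

print(type(definition_error).__name__)  # ValueError


@dataclass
class Book:
    title: str
    tags: list = field(default_factory=list)  # list() is called per instance


book1 = Book("Dune")
book2 = Book("Foundation")
print(book1.tags is book2.tags)  # False
```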
Frozen Dataclasses: Immutability
Margaret showed Timothy immutable dataclasses:
@dataclass(frozen=True)
class Book:
    title: str
    author: str
    year: int
    pages: int
dune = Book("Dune", "Herbert", 1965, 412)
# Can't modify - raises FrozenInstanceError
# dune.pages = 500 # Error!
# But frozen dataclasses are hashable
book_ratings = {
    Book("Dune", "Herbert", 1965, 412): 5,
    Book("Foundation", "Asimov", 1951, 255): 4
}
# Can use in sets
unique_books = {
    Book("Dune", "Herbert", 1965, 412),
    Book("Dune", "Herbert", 1965, 412), # Duplicate removed
}
print(len(unique_books)) # 1
"Frozen dataclasses are immutable like tuples," Margaret explained. "They can't be modified after creation, but they gain __hash__ automatically, enabling use as dict keys and in sets."
The Danger of unsafe_hash
Margaret warned Timothy about a treacherous option:
# DANGEROUS - mutable dataclass with hash
@dataclass(unsafe_hash=True)
class Book:
    title: str
    pages: int # Mutable field!
book = Book("Dune", 412)
books_set = {book} # Stored under the hash computed with pages=412
# Mutation breaks the hash invariant!
book.pages = 500
print(book in books_set) # Almost certainly False - set can't find it!
# The set filed the book under the hash computed with pages=412
# hash(book) now reflects pages=500, so the lookup searches the wrong slot
# The book is still in the set, but unreachable by lookup
"Never use unsafe_hash=True with mutable dataclasses," Margaret cautioned. "Python calls it 'unsafe' for good reason. If you hash a mutable object and then mutate it, sets and dictionaries break. Only use hashing with frozen=True, where immutability guarantees the hash stays valid."
Creating Modified Copies with replace()
Timothy learned to create modified copies of frozen dataclasses:
from dataclasses import dataclass, replace
@dataclass(frozen=True)
class Book:
    title: str
    author: str
    year: int
    pages: int
dune = Book("Dune", "Herbert", 1965, 412)
# Can't modify frozen dataclass
# dune.pages = 500 # FrozenInstanceError!
# But can create modified copy
updated = replace(dune, pages=500)
print(updated)
# Book(title='Dune', author='Herbert', year=1965, pages=500)
print(dune.pages) # 412 - original unchanged
# Can change multiple fields
revised = replace(dune, year=1966, pages=450)
"The replace() function creates a copy with specified fields changed," Margaret explained. "It's like string methods that return new strings: the original stays unchanged. This is how you 'modify' immutable dataclasses."
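One detail worth knowing: replace() builds the copy by calling the class's __init__, so any __post_init__ validation runs again on the new values. A small sketch, using a trimmed-down Book with a hypothetical validation rule:

```python
from dataclasses import dataclass, replace


@dataclass(frozen=True)
class Book:
    title: str
    pages: int

    def __post_init__(self):
        if self.pages < 0:
            raise ValueError("Pages cannot be negative")


dune = Book("Dune", 412)

# The copy goes through __init__ and __post_init__, so bad values are caught
try:
    replace(dune, pages=-1)
    rejected = False
except ValueError:
    rejected = True
print(rejected)  # True

# Names that are not fields are rejected as unexpected keyword arguments
try:
    replace(dune, page_count=500)
    unknown_rejected = False
except TypeError:
    unknown_rejected = True
print(unknown_rejected)  # True
```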
Ordering and Comparison
Timothy learned to make dataclasses sortable:
from dataclasses import dataclass
@dataclass(order=True)
class Book:
    title: str
    author: str
    year: int
    pages: int
books = [
    Book("Foundation", "Asimov", 1951, 255),
    Book("Dune", "Herbert", 1965, 412),
    Book("1984", "Orwell", 1949, 328),
]
# Now sortable!
sorted_books = sorted(books)
for book in sorted_books:
    print(f"{book.title} by {book.author} ({book.year})")
# 1984 by Orwell (1949)
# Dune by Herbert (1965)
# Foundation by Asimov (1951)
"With order=True, Python generates the comparison methods __lt__, __le__, __gt__, and __ge__," Margaret noted. "Books compare field-by-field in declaration order: title first, then author, then year, then pages."
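order=True bakes one ordering into the class. When different orderings are needed in different places, a key function passed to sorted() is a common alternative that needs no generated comparison methods. A sketch using operator.attrgetter from the standard library:

```python
from dataclasses import dataclass
from operator import attrgetter


@dataclass
class Book:
    title: str
    author: str
    year: int
    pages: int


books = [
    Book("Foundation", "Asimov", 1951, 255),
    Book("Dune", "Herbert", 1965, 412),
    Book("1984", "Orwell", 1949, 328),
]

# Sort by a single attribute without order=True
by_year = sorted(books, key=attrgetter("year"))
print([b.title for b in by_year])  # ['1984', 'Foundation', 'Dune']

# A different ordering of the same list: longest book first
by_pages_desc = sorted(books, key=attrgetter("pages"), reverse=True)
print([b.title for b in by_pages_desc])  # ['Dune', '1984', 'Foundation']
```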
Keyword-Only Arguments for Safety
Margaret showed Timothy how to prevent positional argument mistakes:
from dataclasses import dataclass
@dataclass(kw_only=True)
class Book:
    title: str
    author: str
    year: int
    pages: int
# Must use keyword arguments
book = Book(title="Dune", author="Herbert", year=1965, pages=412) # OK
# Positional arguments don't work
# book = Book("Dune", "Herbert", 1965, 412) # TypeError!
"The kw_only=True parameter forces keyword arguments," Margaret explained. "If you later reorder fields or add new ones, calls won't break silently. The argument names document what each value means."
Memory Optimization with slots
Timothy learned about Python 3.10's major optimization:
# Regular dataclass - uses __dict__
@dataclass
class Book:
    title: str
    author: str
    year: int
    pages: int
# With slots - no per-instance __dict__, so noticeably less memory
@dataclass(slots=True)
class CompactBook:
    title: str
    author: str
    year: int
    pages: int
# Benefits of slots=True:
# - Less memory per instance
# - Slightly faster attribute access
# - Prevents adding attributes dynamically
# - Cannot use __dict__-based features
# For thousands of instances, slots saves substantial memory
books = [CompactBook(f"Book{i}", "Author", 2024, 300) for i in range(10000)]
# The exact saving depends on the Python version and field count; measure it
"For classes with many instances," Margaret advised, "use slots=True. It trades flexibility for efficiency: you can't add attributes dynamically, but you save memory and gain speed."
Customizing Comparison Order
Timothy discovered he could control which fields mattered for sorting:
from dataclasses import dataclass, field
@dataclass(order=True)
class Book:
    sort_index: int = field(init=False, repr=False)
    title: str = field(compare=False)
    author: str = field(compare=False)
    year: int
    pages: int = field(compare=False)
    def __post_init__(self):
        # Sort by year only
        self.sort_index = self.year
books = [
    Book("Foundation", "Asimov", 1951, 255),
    Book("Dune", "Herbert", 1965, 412),
    Book("1984", "Orwell", 1949, 328),
]
sorted_books = sorted(books)
for book in sorted_books:
    print(f"{book.title} ({book.year})")
# 1984 (1949)
# Foundation (1951)
# Dune (1965)
"The compare=False parameter excludes fields from comparison," Margaret explained. "That covers equality as well as ordering, so here two different books from the same year compare equal. The init=False parameter means the field isn't part of __init__. The repr=False parameter excludes it from the string representation."
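Because compare=False applies to __eq__ as well as to the ordering methods, it is worth checking what equality means after excluding fields. A minimal sketch with a two-field Book:

```python
from dataclasses import dataclass, field


@dataclass(order=True)
class Book:
    title: str = field(compare=False)  # ignored by ==, <, <=, >, >=
    year: int


a = Book("Foundation", 1951)
b = Book("The Catcher in the Rye", 1951)

# Only year is compared, so two different titles from 1951 are "equal"
print(a == b)  # True
print(Book("1984", 1949) < a)  # True
```

If that equality is not what the program needs, sorting with a key function and leaving the fields comparable may be the safer choice.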
Post-Init Processing with __post_init__
Margaret showed Timothy validation and computed fields:
@dataclass
class Book:
    title: str
    author: str
    year: int
    pages: int
    def __post_init__(self):
        # Validation after initialization
        if self.pages < 0:
            raise ValueError("Pages cannot be negative")
        if self.year < 1000:
            raise ValueError("Year seems unrealistic")
        # Normalize title
        self.title = self.title.strip()
# Validation runs automatically
try:
    bad_book = Book("Test", "Author", 2024, -100)
except ValueError as e:
    print(e) # "Pages cannot be negative"
# Normalization happens automatically
book = Book(" Dune ", "Herbert", 1965, 412)
print(book.title) # "Dune" - whitespace stripped
"The __post_init__ method runs after __init__ completes," Margaret explained. "Use it for validation, normalization, or computing derived fields."
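Normalization by plain assignment only works on a mutable dataclass. With frozen=True, assigning to self.title inside __post_init__ raises FrozenInstanceError. A commonly used workaround, sketched here on the assumption that normalizing during construction is acceptable, is to call object.__setattr__ directly:

```python
from dataclasses import dataclass, FrozenInstanceError


@dataclass(frozen=True)
class Book:
    title: str
    pages: int

    def __post_init__(self):
        if self.pages < 0:
            raise ValueError("Pages cannot be negative")
        # self.title = ... would raise FrozenInstanceError here,
        # so bypass the frozen __setattr__ during construction only
        object.__setattr__(self, "title", self.title.strip())


book = Book("  Dune  ", 412)
print(book.title)  # Dune

# After construction the instance is still immutable
try:
    book.title = "Other"
    still_frozen = False
except FrozenInstanceError:
    still_frozen = True
print(still_frozen)  # True
```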
Converting to Dictionaries and Tuples
Margaret showed Timothy how to serialize dataclasses:
from dataclasses import dataclass, asdict, astuple
@dataclass
class Book:
    title: str
    author: str
    year: int
    pages: int
dune = Book("Dune", "Herbert", 1965, 412)
# Convert to dictionary
book_dict = asdict(dune)
print(book_dict)
# {'title': 'Dune', 'author': 'Herbert', 'year': 1965, 'pages': 412}
# Convert to tuple (in field order)
book_tuple = astuple(dune)
print(book_tuple)
# ('Dune', 'Herbert', 1965, 412)
# Useful for:
# - JSON serialization: json.dumps(asdict(book))
# - Database inserts: cursor.execute(sql, astuple(book))
# - CSV writing: writer.writerow(astuple(book))
"The asdict() function creates a dictionary of field names to values," Margaret explained. "astuple() creates a tuple of values in field order. Both work recursively with nested dataclasses."
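To illustrate the recursive behavior, here is a sketch with a nested dataclass. The Author class is hypothetical and exists only for this example.

```python
import json
from dataclasses import dataclass, asdict, astuple


@dataclass
class Author:
    name: str
    born: int


@dataclass
class Book:
    title: str
    author: Author
    year: int


dune = Book("Dune", Author("Frank Herbert", 1920), 1965)

# The nested Author is converted too, not left as an object
book_dict = asdict(dune)
print(book_dict)
# {'title': 'Dune', 'author': {'name': 'Frank Herbert', 'born': 1920}, 'year': 1965}

print(astuple(dune))
# ('Dune', ('Frank Herbert', 1920), 1965)

# The nested dict is plain data, so it round-trips through JSON
payload = json.dumps(book_dict)
restored = json.loads(payload)
print(restored == book_dict)  # True
```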
Computed Fields with __post_init__
Timothy learned to create fields based on other fields:
@dataclass
class Book:
    title: str
    author: str
    year: int
    pages: int
    reading_time_minutes: int = field(init=False)
    def __post_init__(self):
        # Compute reading time based on pages
        self.reading_time_minutes = self.pages * 2
dune = Book("Dune", "Herbert", 1965, 412)
print(dune.reading_time_minutes) # 824 - computed automatically
Inheritance with Dataclasses
Margaret showed Timothy dataclass inheritance:
@dataclass
class Book:
    title: str
    author: str
    year: int
    pages: int
@dataclass
class Audiobook(Book):
    narrator: str
    duration_minutes: int
# Child inherits parent's fields
audiobook = Audiobook(
    title="Dune",
    author="Herbert",
    year=1965,
    pages=0,
    narrator="Scott Brick",
    duration_minutes=1233
)
print(audiobook)
# Audiobook(title='Dune', author='Herbert', year=1965, pages=0,
# narrator='Scott Brick', duration_minutes=1233)
"Child dataclasses inherit parent fields," Margaret noted. "Parent fields come first in __init__, then child fields. All the generated methods work with the combined fields."
Converting Regular Classes to Dataclasses
Timothy learned when to use dataclasses:
# Before - regular class with boilerplate
class Book:
    def __init__(self, title, author, year, pages):
        self.title = title
        self.author = author
        self.year = year
        self.pages = pages
    def __repr__(self):
        return f'Book(title={self.title!r}, author={self.author!r}, year={self.year}, pages={self.pages})'
    def __eq__(self, other):
        if not isinstance(other, Book):
            return NotImplemented
        return (self.title, self.author, self.year, self.pages) == \
               (other.title, other.author, other.year, other.pages)
    def get_reading_time(self):
        return self.pages * 2
# After - dataclass with method
@dataclass
class Book:
    title: str
    author: str
    year: int
    pages: int
    def get_reading_time(self):
        return self.pages * 2
"Replace classes that are primarily data containers," Margaret advised. "Keep the dataclass for structure, add methods for behavior."
When to Use Dataclasses
Margaret clarified when dataclasses made sense:
Use dataclasses when:
- The class primarily holds data
- You need __init__, __repr__, __eq__ automatically
- You want type hints on attributes
- The class is relatively simple (not complex behavior)
Don't use dataclasses when:
- The class has complex initialization logic
- You need custom __init__ with non-trivial processing
- The class is primarily behavior, not data
- You need fine control over magic methods
Dataclass Options Summary
Margaret showed Timothy the most commonly used options:
@dataclass(
    init=True, # Generate __init__ (default: True)
    repr=True, # Generate __repr__ (default: True)
    eq=True, # Generate __eq__ (default: True)
    order=False, # Generate comparison methods (default: False)
    unsafe_hash=False, # Generate __hash__ - DANGEROUS with mutable! (default: False)
    frozen=False, # Make immutable (default: False)
    slots=False, # Use __slots__ for memory efficiency (default: False, Python 3.10+)
    kw_only=False # Require keyword arguments (default: False, Python 3.10+)
)
class Book:
    title: str
Real-World Example: Configuration Class
Margaret demonstrated a practical pattern:
from dataclasses import dataclass, field
from typing import Optional
@dataclass(frozen=True)
class DatabaseConfig:
    host: str
    # No default, so it must come before the fields that have defaults
    password: str = field(repr=False) # Don't print password
    port: int = 5432
    database: str = "library"
    username: str = "admin"
    ssl_enabled: bool = True
    pool_size: int = 10
    timeout: Optional[int] = None
    def __post_init__(self):
        if self.port < 1 or self.port > 65535:
            raise ValueError(f"Invalid port: {self.port}")
        if self.pool_size < 1:
            raise ValueError("Pool size must be positive")
# Create configuration
config = DatabaseConfig(
    host="localhost",
    password="secret123"
)
print(config)
# DatabaseConfig(host='localhost', port=5432, database='library',
# username='admin', ssl_enabled=True, pool_size=10, timeout=None)
# Notice password is hidden!
# Immutable - can't accidentally modify
# config.port = 3306 # FrozenInstanceError
# Can use as dict key
configs = {
    DatabaseConfig(host="prod.db", password="prod123"): "production",
    DatabaseConfig(host="dev.db", password="dev456"): "development"
}
Timothy's Dataclass Wisdom
Through exploring the Blueprint Factory, Timothy learned essential principles:
@dataclass generates boilerplate: Automatically creates __init__, __repr__, __eq__.
Type hints define attributes: Each typed attribute becomes a field.
Type hints are not enforced: They're documentation and tool guidance, not runtime checks.
Default values come after non-defaults: Python requirement for parameters.
Use field(default_factory=...) for mutables: Never use mutable defaults directly; always use default_factory for lists, dicts, sets.
frozen=True makes immutable: Can't modify after creation, gains __hash__ automatically.
unsafe_hash=True is dangerous: Only use with frozen dataclasses—mutable + hash corrupts sets and dicts.
replace() creates modified copies: The way to "change" frozen dataclasses without mutation.
order=True enables sorting: Generates comparison methods for sorting.
kw_only=True forces keyword arguments: Prevents positional mistakes, makes code clearer (Python 3.10+).
slots=True saves memory: Less memory per instance and slightly faster access, but less flexible (Python 3.10+).
compare=False excludes fields: Control which fields matter for equality and ordering.
init=False excludes from init: For computed or internal fields.
repr=False hides from string: For sensitive data like passwords.
__post_init__ runs after __init__: Use for validation, normalization, or computed fields.
asdict() converts to dictionary: For JSON serialization, APIs, databases.
astuple() converts to tuple: For CSV writing, database inserts, ordered data.
Dataclasses support inheritance: Child inherits parent fields.
Dataclasses can have methods: Add behavior alongside data.
Use for data-heavy classes: Replace boilerplate-heavy classes with dataclasses.
Don't use for behavior-heavy classes: Complex logic needs regular classes.
Frozen dataclasses work as dict keys: Immutability enables hashing.
Python's Blueprint Factory
Timothy had discovered Python's Blueprint Factory: the @dataclass decorator that eliminated repetitive boilerplate for data-holding classes. By declaring attributes with type hints, Python generated all the standard methods automatically. He learned to use field() for customization, frozen=True for immutability, order=True for sorting, and __post_init__ for validation. Modern Python 3.10+ features like slots=True offered noticeable memory savings, while kw_only=True prevented positional argument mistakes. He discovered asdict() and astuple() for serialization, and replace() for creating modified copies of frozen objects. Yet he also learned the dangers: unsafe_hash=True with mutable data corrupts sets, and type hints are documentation, not enforcement. The Blueprint Factory revealed that modern Python didn't require hand-crafting every class. For simple data containers, the decorator handled the tedious parts, letting Timothy focus on the unique logic that mattered.
Aaron Rose is a software engineer and technology writer at tech-reader.blog and the author of Think Like a Genius.