Python's `@dataclass` is great, but it is a decorator. You sprinkle it on, you get `__init__`, `__eq__`, `__hash__`, and `__repr__` for free. Lovely.

But what if you wanted a function instead? Call it with kwargs, get back a class. No decorator, no `class` statement, no module-level boilerplate.

Here is one in 25 lines. And here are the three bugs I found while writing this article.
## The result first

```python
Klass = Klass(a=1, b=2)

# fields become defaults
Klass(a=3).a                     # 3
Klass().a                        # 1 (class-level default)

# equality by attribute dict
Klass(a=3) == Klass(a=3)         # True
Klass(a=2) == Klass(a=3)         # False

# hashable, usable as dict keys
Klass(a=4) in {Klass(a=5): 1}    # False
Klass() in {Klass(): 1}          # True

# strict validation
Klass(g=3)
# NameError: Unknown argument g=3
```
## The whole implementation

```python
def Klass(**fields):
    fields["__data__"] = list(fields.keys())

    class _(type("DataClass", (object,), fields)):
        def __init__(self, **class_kwargs):
            for k, val in class_kwargs.items():
                if k not in fields:
                    raise NameError("Unknown argument {}={}".format(k, val))
                setattr(self, k, val)

        def __str__(self):
            return "&data.{}({})".format(self.__class__.__name__, fields)

        __repr__ = __str__

        def __eq__(self, other):
            return self.__dict__ == other.__dict__

        def __hash__(self):
            return hash(tuple(fields[k] for k in fields["__data__"]))

    return _
```
That is the entire thing. No imports. No metaclass. No `__init_subclass__` gymnastics.

## What is happening
Three nested layers:

- `Klass` is a function. You call it with kwargs and it returns a class.
- Inside, `type("DataClass", (object,), fields)` builds a class on the fly whose class-level attributes are the kwargs you passed. This is the same `type()` you use every day, except that with three arguments it acts as the class constructor.
- Then we define an inner class `_` that subclasses that fresh `DataClass`. The subclass adds `__init__`, `__eq__`, `__hash__`, and a custom `__repr__`. The function returns `_`.
The closure over `fields` is doing the heavy lifting. Every method on `_` can see the original kwargs because they are captured in the enclosing function's scope.

`fields["__data__"]` stores the original key order so `__hash__` has a stable iteration order. (This is a leftover from pre-3.7 days when dict order was not guaranteed. On modern Python you could drop it.)
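The three-argument form of `type()` is easy to try in isolation. A minimal sketch (the names here are illustrative, not from the original):

```python
# type(name, bases, namespace): the dynamic equivalent of a class statement.
# Every key in the namespace dict becomes a class-level attribute.
DataClass = type("DataClass", (object,), {"a": 1, "b": 2})

obj = DataClass()
print(DataClass.__name__)  # DataClass
print(obj.a, obj.b)        # 1 2 (both resolved from the class)
print(obj.__dict__)        # {} (nothing stored on the instance)
```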
## The trick: defaults live on the class, overrides live on the instance

When you call `Klass(a=3)`, `__init__` only sets `a` on the instance. The other field, `b`, stays as a class attribute. So `Klass(a=3).b` resolves to `2` via normal attribute lookup, but `Klass(a=3).__dict__` only contains `{'a': 3}`.
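The same lookup split can be reproduced with an ordinary class. A small sketch (the `Demo` name is mine):

```python
class Demo:
    a = 1  # class-level defaults
    b = 2

    def __init__(self, **kwargs):
        for k, v in kwargs.items():
            setattr(self, k, v)  # overrides land in the instance __dict__

d = Demo(a=3)
print(d.a)         # 3  (instance attribute shadows the class default)
print(d.b)         # 2  (falls back to the class attribute)
print(d.__dict__)  # {'a': 3}
```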
That is elegant — and it is also where the bugs hide.
## Three bugs hiding in plain sight

### Bug 1: `__hash__` ignores the instance

```python
def __hash__(self):
    return hash(tuple(fields[k] for k in fields["__data__"]))
```

`fields` is the closed-over dict, not `self.__dict__`. Every instance of the same class returns the same hash.

```python
hash(Klass(a=1)) == hash(Klass(a=999))  # True
```
Python lets you have hash collisions (the hash invariant only requires that equal objects have equal hashes, not the reverse). But it means a dict full of these instances degrades to O(n) lookups: every key lands in the same bucket. Use it for ten objects, fine. Use it for ten thousand, and your dict is effectively a linked list.
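One way to fix it, sketched as a trimmed-down copy of the factory: hash the *resolved* values via `getattr`, so instance overrides win over class defaults.

```python
def Klass(**fields):
    fields["__data__"] = list(fields.keys())

    class _(type("DataClass", (object,), fields)):
        def __init__(self, **kwargs):
            for k, val in kwargs.items():
                setattr(self, k, val)

        def __hash__(self):
            # getattr checks the instance __dict__ first, then the class
            # defaults, so different instance states hash differently
            return hash(tuple(getattr(self, k) for k in fields["__data__"]))

    return _

Klass = Klass(a=1, b=2)
print(hash(Klass(a=1)) == hash(Klass(a=999)))  # False with the fix
```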
### Bug 2: `__repr__` lies

```python
def __str__(self):
    return "&data.{}({})".format(self.__class__.__name__, fields)
```

It prints `fields` — the closure — not the instance state. So if you do `x = Klass(a=99)` and then `print(x)`, you see `a: 1`, not `a: 99`. The repr lies about what the object actually contains.

Fix: format `{**fields, **self.__dict__}` instead.
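A quick sketch of that fix in isolation, with a stand-in `fields` dict and a `Demo` class of my own naming:

```python
fields = {"a": 1, "b": 2}  # stand-in for the closed-over defaults

class Demo:
    def __init__(self, **kwargs):
        self.__dict__.update(kwargs)

    def __repr__(self):
        # instance values in self.__dict__ override the closure defaults
        return "&data.{}({})".format(self.__class__.__name__,
                                     {**fields, **self.__dict__})

print(repr(Demo(a=99)))  # &data.Demo({'a': 99, 'b': 2})
```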
### Bug 3: `__eq__` only sees what `__init__` set

```python
def __eq__(self, other):
    return self.__dict__ == other.__dict__
```

`Klass()` has an empty `__dict__` because no kwargs were passed. `Klass(a=1)` has `{'a': 1}`. They should be equal — both have an effective `a == 1` — but they compare unequal because one has the attribute in its instance dict and the other inherits it from the class.

```python
Klass = Klass(a=1, b=2)
Klass() == Klass(a=1, b=2)  # False — equal in spirit, unequal in __dict__
```

Fix: compare resolved attribute values, e.g. `{k: getattr(self, k) for k in fields["__data__"]}`.
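Putting all three fixes together, a corrected sketch of the factory (the `_resolved` helper is a name I'm introducing, not part of the original):

```python
def Klass(**fields):
    fields["__data__"] = list(fields.keys())

    class _(type("DataClass", (object,), fields)):
        def __init__(self, **kwargs):
            for k, val in kwargs.items():
                if k not in fields:
                    raise NameError("Unknown argument {}={}".format(k, val))
                setattr(self, k, val)

        def _resolved(self):
            # effective value per field: instance override or class default
            return {k: getattr(self, k) for k in fields["__data__"]}

        def __repr__(self):
            return "&data.{}({})".format(self.__class__.__name__,
                                         self._resolved())

        def __eq__(self, other):
            return self._resolved() == other._resolved()

        def __hash__(self):
            return hash(tuple(self._resolved().values()))

    return _

Klass = Klass(a=1, b=2)
print(Klass() == Klass(a=1, b=2))              # True: defaults now count
print(hash(Klass(a=1)) == hash(Klass(a=999)))  # False: hash sees the instance
```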
## Why this is still interesting
The bugs are real, but the pattern is genuinely useful as a teaching tool. It demonstrates four things in one tiny example:
- **Classes are first-class values.** A function can return a class. `type()` is just `class` spelled differently.
- **Closures over class definitions.** The methods on `_` close over `fields` from the enclosing function's scope, so no `self.fields` storage is needed.
- **The class-vs-instance attribute split.** Defaults on the class, overrides on the instance: the same trick Django models and many ORMs use.
- **Why `@dataclass` exists.** Writing `__eq__`, `__hash__`, and `__repr__` by hand is surprisingly easy to get wrong. The standard library does it once, properly. Our 25-line version does it wrong in three different ways.
If you read the standard library's `dataclasses.py`, you will see it does essentially the same thing — generate `__init__`, `__eq__`, `__hash__` — but with much more care about what `__dict__` contains, when to freeze, when to compare by tuple instead of dict, and how to handle inheritance.
## When to reach for this

Never in production. Use `@dataclass` or `attrs`.
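For comparison, the stdlib spelling of the same record, assuming the `Klass(a=1, b=2)` shape from the top (`frozen=True` is what makes instances hashable):

```python
from dataclasses import dataclass

@dataclass(frozen=True)  # frozen instances are immutable and hashable
class Klass:
    a: int = 1
    b: int = 2

print(Klass(a=3) == Klass(a=3))       # True
print(Klass() == Klass(a=1, b=2))     # True: defaults resolve correctly
print(Klass(a=4) in {Klass(a=5): 1})  # False
```

Unknown kwargs raise `TypeError` for free, and the generated `__eq__`, `__hash__`, and `__repr__` all see the resolved field values.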
But as an exercise? Read it. Type it out. Find the bugs yourself. That is how you learn what `@dataclass` is actually doing under the hood.
Twenty-five lines. Three bugs. One useful lesson about Python's object model.