Alexander Mia

Posted on May 14

I Built a Dataclass in 25 Lines of Python. Then I Found Three Bugs.

#code #programming #python #showdev

Python's @dataclass is great, but it is a decorator. You sprinkle it on, you get __init__, __eq__, and __repr__ for free, plus __hash__ if you ask for it (for example with frozen=True). Lovely.

But what if you wanted a function instead? Call it with kwargs, get back a class. No decorator, no class statement, no module-level boilerplate.

Here is one in 25 lines. And here are the three bugs I found while writing this article.

The result first

Klass = Klass(a=1, b=2)

# fields become defaults
Klass(a=3).a            # 3
Klass().a               # 1 (class-level default)

# equality by attribute dict
Klass(a=3) == Klass(a=3)   # True
Klass(a=2) == Klass(a=3)   # False

# hashable, usable as dict keys
Klass(a=4) in {Klass(a=5): 1}   # False
Klass() in {Klass(): 1}         # True

# strict validation
Klass(g=3)
# NameError: Unknown argument g=3

The whole implementation

def Klass(**fields):
    fields["__data__"] = list(fields.keys())

    class _(type("DataClass", (object,), fields)):
        def __init__(self, **class_kwargs):
            for k, val in class_kwargs.items():
                if k not in fields:
                    raise NameError("Unknown argument {}={}".format(k, val))
                setattr(self, k, val)

        def __str__(self):
            return "&data.{}({})".format(self.__class__.__name__, fields)

        __repr__ = __str__

        def __eq__(self, other):
            return self.__dict__ == other.__dict__

        def __hash__(self):
            return hash(tuple(fields[k] for k in fields["__data__"]))

    return _

That is the entire thing. No imports. No metaclass. No __init_subclass__ gymnastics.

What is happening

Three nested layers:

Klass is a function. You call it with kwargs and it returns a class.
Inside, type("DataClass", (object,), fields) builds a class on the fly whose class-level attributes are the kwargs you passed. This is the same type() you use every day, except with three arguments it acts as the class constructor.
Then we define an inner class _ that subclasses that fresh DataClass. The subclass adds __init__, __eq__, __hash__, and a custom __repr__. Finally, Klass returns _.
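To make the second layer concrete, here is a small sketch of the three-argument type() call next to the class statement it replaces. The names Dynamic and Static are mine, chosen only for illustration.

```python
# Three-argument type(): class name, tuple of bases, namespace dict.
Dynamic = type("DataClass", (object,), {"a": 1, "b": 2})


# The class statement that produces an equivalent class.
class Static(object):
    a = 1
    b = 2


print(Dynamic.__name__)       # DataClass
print(Dynamic.a, Dynamic.b)   # 1 2
print(Static.a, Static.b)     # 1 2
print(type(Dynamic) is type)  # True
```

Both objects are ordinary classes. The only difference is that the namespace for Dynamic arrives as a dict at runtime, which is what lets Klass pass its kwargs straight through.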

The closure over fields is doing the heavy lifting. Every method on _ can see the original kwargs because they are captured in the enclosing function's scope.

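fields["__data__"] stores the original key order so __hash__ iterates in a stable order. The list is built before the "__data__" key itself is added, so it contains only the real field names. (This is a leftover from the days when dict order was not guaranteed. Keyword-argument order has been preserved since Python 3.6 and dict order has been guaranteed since 3.7, so on modern Python you could drop it.)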

The trick: defaults live on the class, overrides live on the instance

When you call Klass(a=3), __init__ only sets a on the instance. The other field b stays as a class attribute. So Klass(a=3).b resolves to 2 via normal attribute lookup, but Klass(a=3).__dict__ only contains {'a': 3}.
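The same lookup rule can be seen without the factory at all. This is a minimal sketch with a hand-written class (Point is a hypothetical name, not part of the article's code) that sets an override the way __init__ does via setattr.

```python
class Point:
    # Class-level defaults, shared by every instance until overridden.
    a = 1
    b = 2


p = Point()
p.a = 3  # stored on the instance; shadows the class attribute

print(p.__dict__)  # {'a': 3}
print(p.a, p.b)    # 3 2
print(Point.a)     # 1

del p.a            # remove the override; lookup falls back to the class
print(p.a)         # 1
```

Attribute lookup checks the instance dict first and the class second, so the default is never copied. It is simply what you see when there is no override.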

That is elegant — and it is also where the bugs hide.

Three bugs hiding in plain sight

Bug 1: __hash__ ignores the instance

def __hash__(self):
    return hash(tuple(fields[k] for k in fields["__data__"]))

fields is the closure, not self.__dict__. Every instance of the same class returns the same hash.

hash(Klass(a=1)) == hash(Klass(a=999))   # True

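Python allows hash collisions (the hash invariant only requires that equal objects have equal hashes, not the reverse). But with every key sharing one hash, each lookup or insert has to compare against the colliding keys one by one, so a dict full of these instances degrades from O(1) to O(n) per operation. Use it for ten objects, fine. Use it for ten thousand, and every lookup is a linear scan.

Fix: hash the resolved values instead, e.g. hash(tuple(getattr(self, k) for k in fields['__data__'])).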
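One possible repair, sketched under my own assumptions: make_class is a hypothetical stand-in for the article's factory, and the field names are kept in a local tuple rather than in fields["__data__"]. The hash is built from resolved attribute values, so an instance override changes it.

```python
def make_class(**fields):
    names = tuple(fields)  # field order, captured once (hypothetical helper)

    class _(type("DataClass", (object,), dict(fields))):
        def __init__(self, **kwargs):
            for k, val in kwargs.items():
                if k not in names:
                    raise NameError("Unknown argument {}={}".format(k, val))
                setattr(self, k, val)

        def __eq__(self, other):
            # Unchanged from the article: compares instance dicts only.
            return self.__dict__ == other.__dict__

        def __hash__(self):
            # getattr falls back to the class default when there is no override.
            return hash(tuple(getattr(self, k) for k in names))

    return _


P = make_class(a=1, b=2)
print(hash(P(a=999)) == hash((999, 2)))  # True
print(hash(P()) == hash((1, 2)))         # True
print(P() in {P(): 1})                   # True
```

This assumes every field value is itself hashable. Instances are still mutable, so changing an attribute after using the object as a dict key will make that key unfindable.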

Bug 2: __repr__ lies

def __str__(self):
    return "&data.{}({})".format(self.__class__.__name__, fields)

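It prints fields (the closed-over dict of defaults), not the instance state. So if you do x = Klass(a=99) and then print(x), you see 'a': 1, not 'a': 99. The output also includes the internal '__data__' key, and the class name is always _. The repr lies about what the object actually contains.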

Fix: format {**fields, **self.__dict__} instead.
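Here is a rough sketch of that fix. The factory name make_class and the output format are my own choices, not the article's, and validation is left out to keep the example short.

```python
def make_class(**fields):
    names = tuple(fields)  # field order, captured once (hypothetical helper)

    class _(type("DataClass", (object,), dict(fields))):
        def __init__(self, **kwargs):
            # Validation omitted for brevity.
            for k, val in kwargs.items():
                setattr(self, k, val)

        def __repr__(self):
            # Defaults first, then instance overrides on top.
            state = {**fields, **self.__dict__}
            args = ", ".join("{}={!r}".format(k, state[k]) for k in names)
            return "{}({})".format(type(self).__name__, args)

    return _


P = make_class(a=1, b=2)
print(repr(P(a=99)))  # _(a=99, b=2)
print(repr(P()))      # _(a=1, b=2)
```

The class name still comes out as _ because that is what the inner class is called. Setting _.__name__ before returning would be one way to tidy that up.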

Bug 3: __eq__ only sees what __init__ set

def __eq__(self, other):
    return self.__dict__ == other.__dict__

Klass() has an empty __dict__ because no kwargs were passed. Klass(a=1) has {'a': 1}. They should be equal — both objects have effective a == 1 — but they compare unequal because one has the attribute in its instance dict and the other inherits it from the class.

Klass = Klass(a=1, b=2)
Klass() == Klass(a=1, b=2)   # False — equal in spirit, unequal in __dict__

Fix: compare resolved attribute values, e.g. {k: getattr(self, k) for k in fields['__data__']}.
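A sketch of that idea, again with hypothetical names (make_class, _values). It compares tuples of resolved values and derives the hash from the same tuple, so equality and hashing stay consistent with each other.

```python
def make_class(**fields):
    names = tuple(fields)  # field order, captured once (hypothetical helper)

    class _(type("DataClass", (object,), dict(fields))):
        def __init__(self, **kwargs):
            # Validation omitted for brevity.
            for k, val in kwargs.items():
                setattr(self, k, val)

        def _values(self):
            # Resolved values: instance override if present, else class default.
            return tuple(getattr(self, k) for k in names)

        def __eq__(self, other):
            if type(other) is not type(self):
                return NotImplemented
            return self._values() == other._values()

        def __hash__(self):
            return hash(self._values())

    return _


P = make_class(a=1, b=2)
print(P() == P(a=1, b=2))  # True
print(P(a=3) == P())       # False
print(P() == 5)            # False
```

Returning NotImplemented for foreign types lets Python fall back to its default comparison instead of raising AttributeError on objects without the expected attributes.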

Why this is still interesting

The bugs are real, but the pattern is genuinely useful as a teaching tool. It demonstrates four things in one tiny example:

Classes are first-class values. A function can return a class. type() is just class spelled differently.
Closures over class definitions. The methods on _ close over fields from the enclosing function — no self.fields storage needed.
The class-vs-instance attribute split. Defaults on the class, overrides on the instance. Declarative libraries such as Django models follow a loosely similar shape: fields are declared on the class and values are stored on the instance.
Why @dataclass exists. Writing __eq__, __hash__, and __repr__ correctly is surprisingly easy to get wrong. The standard library does it once, properly. Your 25-line version does it wrong three different ways.

If you read the standard library's dataclasses.py, you will see it does essentially the same thing — generate __init__, __eq__, __hash__ — but with much more care about what __dict__ contains, when to freeze, when to compare by tuple instead of dict, and how to handle inheritance.
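If what you want is the function-call form from the top of this article, the standard library already ships one: dataclasses.make_dataclass. Below is a short sketch that mirrors the earlier Klass(a=1, b=2) example. The int annotations and frozen=True are my assumptions; frozen=True is there so the generated class gets a __hash__.

```python
from dataclasses import field, make_dataclass

# Each field is (name, type, field(...)); defaults go through field(default=...).
Point = make_dataclass(
    "Point",
    [("a", int, field(default=1)), ("b", int, field(default=2))],
    frozen=True,  # eq=True plus frozen=True generates __hash__
)

print(Point(a=3))                     # Point(a=3, b=2)
print(Point() == Point(a=1, b=2))     # True
print(Point(a=4) in {Point(a=5): 1})  # False
print(Point() in {Point(): 1})        # True
```

Here __repr__, __eq__, and __hash__ all work from the actual field values, which is exactly what the hand-rolled version got wrong. An unknown keyword such as Point(g=3) raises TypeError rather than NameError.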

When to reach for this

Never in production. Use @dataclass or attrs.

But as an exercise? Read it. Type it out. Find the bugs yourself. That is how you learn what @dataclass is actually doing under the hood.

Twenty-five lines. Three bugs. One useful lesson about Python's object model.
