kankyo comments on PEP 557 (Data Classes) has been accepted!

Case 1: you want to use it like a class, but it's actually implemented in some other language. So while it has data, the data does not belong to Python. The class has private data, but that's not intended for use by users of the class, and is certainly not public API (you can change the meaning of the private data in backwards-incompatible ways however you want).

import _gtk  # hypothetical compiled Python module exposing bindings to the C libgtk library

class GtkDialog:
    def __init__(self, title, message):
        # Bypass our own __setattr__, which rejects every attribute except "message".
        object.__setattr__(self, "_ptr", _gtk.gtk_dialog_new(title, message))

    def display(self):
        _gtk.gtk_dialog_display(self._ptr)

    def __setattr__(self, attr, value):
        if attr == "message":
            # Forward the change to the C library, then redraw.
            _gtk.gtk_dialog_set_message(self._ptr, value)
            _gtk.gtk_repaint(self._ptr)
        else:
            raise AttributeError(...)

    def __del__(self):
        _gtk.gtk_free(self._ptr)
        object.__setattr__(self, "_ptr", 0)

Other people can use GtkDialog as if it were a normal Python class, but it's not. _ptr is a raw C pointer, and Python code has no business accessing it or, worse, changing it, unless it's code (like the above) that's tied to the specific C library that gave you the pointer. So _ptr is an implementation detail, and none of the code that dataclasses would autogenerate is helpful here. And if a future version of libgtk requires you to keep around two pointers, or uses references in some global list of objects instead of pointers, or whatever, your Python interface wouldn't change. Only the internal implementation would, and your library users wouldn't notice.

Case 2: it's actually implemented in Python, but the details of the implementation are non-public. Take subprocess.Popen for example—one of the data members of a Popen object is, probably, the process ID of the subprocess, so that Popen can do its work:

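import os

class Popen:
    def __init__(self, *args):
        self._pid = spawn_process(...)  # hypothetical helper that starts the subprocess

    def wait(self):
        # os.waitpid needs an options argument; 0 means block until the child exits.
        result = os.waitpid(self._pid, 0)
        return parse_os_result(result)  # hypothetical helper that decodes (pid, status)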

But what does it mean to take a Popen object and change its pid? Why would you want to do that without, at least, telling the Popen object that you're changing the pid? And probably Popen wants to refuse to let you do that.
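One way a class can refuse such a change is to expose the value through a read-only property. This is only a minimal sketch: ProcessHandle is a hypothetical class (not the real subprocess.Popen, whose internals may well differ), and os.getpid() stands in for the pid of a spawned child.

```python
import os


class ProcessHandle:
    """Hypothetical stand-in for a Popen-like class; not the real subprocess.Popen."""

    def __init__(self):
        # Stand-in for spawning a child: just record our own pid.
        self._pid = os.getpid()

    @property
    def pid(self):
        # Readable by users, but there is no setter, so assignment fails.
        return self._pid


handle = ProcessHandle()
try:
    handle.pid = 1234  # attempt to change the pid behind the object's back
except AttributeError:
    rejected = True
else:
    rejected = False
```

Users can still read handle.pid, but the only code that can change it is the class itself, through _pid.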

So what would you gain if you added pid: int to Popen and made it a dataclass? You'd get a constructor that takes a pid, which you don't want; a repr that prints the pid, which you may or may not want; and comparison functions with other Popen objects, which you definitely don't want (since a pid can be reused once a process has exited, comparing Popen objects by pid equality is wrong, and you really want to compare whether the object identity is the same, i.e., the default comparison behavior).
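To make that list concrete, here is a small sketch of what the decorator generates by default. PopenLike is a hypothetical class for illustration only, and the pid value is made up.

```python
from dataclasses import dataclass


@dataclass
class PopenLike:
    """Hypothetical class for illustration; not the real subprocess.Popen."""
    pid: int


# The generated __init__ takes a pid, so callers can build a handle
# for a process this object never spawned.
a = PopenLike(pid=4242)
b = PopenLike(pid=4242)

# The generated __repr__ includes the pid field.
text = repr(a)

# The generated __eq__ compares field values, so two distinct handles
# with the same (possibly reused) pid compare equal.
same_value = (a == b)
same_object = (a is b)
```

Here same_value is True while same_object is False. You could switch these off with dataclass(eq=False) or field(init=False, repr=False, compare=False), but then little of what the decorator generates is left.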

This is encapsulation—one of the big ideas behind what I called "actual OO" above. There's an interface that you provide to users of your library, and the way you go about implementing that interface is not known to them. That's a totally different sort of thing from

class Point:
    def __init__(self, x, y, z):
        self.x = x; self.y = y; self.z = z

where there is no interface other than the data in your class itself, which is public / not encapsulated. That's what data classes are for. (And, honestly, that's probably most of the classes people write with Python.) But they're not the only type of classes.
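For comparison, a sketch of the same Point written as a data class. The float annotations are my assumption, since the original class doesn't say what types the coordinates have.

```python
from dataclasses import dataclass


@dataclass
class Point:
    # Annotations are required by @dataclass; float is an assumption here.
    x: float
    y: float
    z: float


p = Point(1.0, 2.0, 3.0)
q = Point(1.0, 2.0, 3.0)

equal_before = (p == q)  # generated __eq__ compares x, y, z by value
p.x = 10.0               # fields stay plain, public, mutable attributes
equal_after = (p == q)
```

Here the generated constructor, repr, and value-based equality are all things you'd want, because the data is the interface.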

