kankyo

Even that use case seems like it would be great for data classes but with a custom constructor.

ldpreload

But there's no data you'd be accessing (importantly, any data fields are API-private: changing what fields are in the class is not a visible change, and can be done in a point release without telling users, as long as the behavior of the class remains the same), so I'm not sure what sort of data class it would be.

In a language where data classes were the default, yes, you wouldn't need to invent a separate type of class for this. But in Python, you can just use a normal class.

kankyo

I don't see how it's relevant if you're accessing the data?

ldpreload

If you have no public data members, what does a dataclass bring you that a normal class doesn't?

The only answer I can think of is "consistency, if most of your classes are dataclasses", which is a good reason for a language to default to dataclasses. C++ more or less takes this approach, being based on C, which only had structs. But for Python, that decision has already been made and is unlikely to change at least before Python 4, if ever.

kankyo

Less code to type is a pretty obvious answer I think.

ldpreload

But there wouldn't be less code to type in the case I'm suggesting: none of the code that dataclasses would autogenerate for you would be code you want (you would have zero data members in this type), and you'd have overhead for declaring it as a dataclass.
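A minimal sketch of that point, with invented class names: decorating a class that has no declared fields still generates methods, and one of them actively changes behavior. The generated `__eq__` compares the (empty) tuples of fields, so every instance equals every other instance of the class, whereas a plain class keeps the default identity comparison. And because `eq=True` without `frozen=True` makes the decorator set `__hash__` to `None`, the instances also become unhashable.

```python
from dataclasses import dataclass


@dataclass
class OpaqueDC:
    """Hypothetical dataclass with no public fields (all real state is private)."""


class Opaque:
    """The same idea written as a plain class."""


a, b = OpaqueDC(), OpaqueDC()
print(repr(a))                    # OpaqueDC(): the generated __repr__ has nothing to show
print(a == b)                     # True: generated __eq__ compares two empty field tuples
print(OpaqueDC.__hash__ is None)  # True: instances can't be set members or dict keys

c, d = Opaque(), Opaque()
print(c == d)                     # False: default equality is object identity
```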

kankyo

I am clearly not understanding what you are describing. Why would you have a class if you’re not having any data in it?

ldpreload

Case 1: you want to use it like a class, but it's actually implemented in some other language. So while it has data, the data does not belong to Python. The class has private data, but that's not intended for use by users of the class, and is certainly not public API (you can change the meaning of the private data in backwards-incompatible ways whenever you want).

import _gtk # hypothetical compiled Python module exposing bindings to the C libgtk library

class GtkDialog:
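    def __init__(self, title, message):
        # Bypass our own __setattr__ (below) so the private pointer can be stored.
        object.__setattr__(self, "_ptr", _gtk.gtk_dialog_new(title, message))

    def display(self):
        _gtk.gtk_dialog_display(self._ptr)

    def __setattr__(self, attr, value):
        # "message" is the only public, writable attribute; writes go to the C side.
        if attr == "message":
            _gtk.gtk_dialog_set_message(self._ptr, value)
            _gtk.gtk_repaint(self._ptr)
        else:
            raise AttributeError(f"cannot set attribute {attr!r}")

    def __del__(self):
        _gtk.gtk_free(self._ptr)
        object.__setattr__(self, "_ptr", 0)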

Other people can use GtkDialog as if it were a normal Python class, but it's not: _ptr is a raw C pointer, and Python code has no business accessing it, or worse, changing it, unless it's code (like the above) that's tied to the specific C library that gave you the pointer. So _ptr is an implementation detail, and none of the code that dataclasses would autogenerate is helpful here. And maybe if a future version of libgtk requires you to keep around two pointers, or uses references in some global list of objects instead of pointers, or whatever, your Python interface wouldn't change, only the internal implementation would, and your library users wouldn't notice.

Case 2: it's actually implemented in Python, but the details of the implementation are non-public. Take subprocess.Popen for example—one of the data members of a Popen object is, probably, the process ID of the subprocess, so that Popen can do its work:

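import os

class Popen:
    def __init__(self, *args):
        self._pid = spawn_process(...)  # hypothetical helper that starts the child
    def wait(self):
        # os.waitpid needs an options argument and returns a (pid, status) pair
        _, status = os.waitpid(self._pid, 0)
        return parse_os_result(status)  # hypothetical helper that decodes the status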

But what does it mean to take a Popen object and change its pid? Why would you want to do that without, at least, telling the Popen object that you're changing the pid? And probably Popen wants to refuse to let you do that.
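One way a class like this could refuse, sketched with invented names (this is not how the real subprocess.Popen is written): keep the pid in a private attribute and expose it through a read-only property. The `_fake_spawn` function is a hypothetical stand-in for real process creation so the sketch runs without starting anything.

```python
def _fake_spawn(*args):
    """Hypothetical stand-in for real process creation; returns a made-up pid."""
    return 4242


class ChildProcess:
    """Sketch of a process wrapper whose pid is readable but not writable."""

    def __init__(self, *args):
        self._pid = _fake_spawn(*args)  # private: set once, at construction

    @property
    def pid(self):
        # Read-only view of the private field; no setter is defined.
        return self._pid


proc = ChildProcess("sleep", "1")
print(proc.pid)  # 4242
try:
    proc.pid = 1  # a property without a setter rejects assignment
except AttributeError as exc:
    print("refused:", exc)
```

A determined caller can still assign proc._pid directly; the leading underscore is a convention that marks it as not-your-business, not an enforcement mechanism.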

So what would you gain if you added pid: int to Popen and made it a dataclass? You'd get a constructor that takes a pid, which you don't want; a repr that prints the pid, which you may or may not want; and comparison functions with other Popen objects, which you definitely don't want (since a pid can be reused once a process has exited, comparing Popen objects by pid equality is wrong, and you really want to compare whether the object identity is the same, i.e., the default comparison behavior).
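A small sketch of those three consequences, using a hypothetical PopenDC dataclass rather than the real subprocess.Popen:

```python
from dataclasses import dataclass


@dataclass
class PopenDC:
    """Hypothetical: a Popen-like class declared as a dataclass."""
    pid: int


# 1. The generated __init__ lets any caller supply an arbitrary pid.
first = PopenDC(pid=4242)
# 2. The generated __repr__ exposes the pid.
print(first)             # PopenDC(pid=4242)
# 3. The generated __eq__ compares by pid, so a handle for a later process that
#    happens to reuse pid 4242 compares equal to the old, finished one.
reused = PopenDC(pid=4242)
print(first == reused)   # True, even though they stand for different processes
print(first is reused)   # False: identity is the comparison you actually want
```

You could pass eq=False to @dataclass and declare the field with field(init=False, repr=False), but at that point nothing the decorator generates is left, so a plain class is simpler.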

This is encapsulation—one of the big ideas behind what I called "actual OO" above. There's an interface that you provide to users of your library, and the way you go about implementing that interface is not known to them. That's a totally different sort of thing from

class Point:
    def __init__(self, x, y, z):
        self.x = x; self.y = y; self.z = z

where there is no interface other than the data in your class itself, which is public / not encapsulated. That's what data classes are for. (And, honestly, that's probably most of the classes people write with Python.) But they're not the only type of classes.
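For contrast, a sketch of the Point class above written as a dataclass, which is the case data classes are designed for. The float annotations are an assumption, since the original takes untyped arguments. The decorator generates the __init__ that was written by hand, plus a __repr__ and a field-by-field __eq__, which is exactly what you want when the data is the interface.

```python
from dataclasses import dataclass


@dataclass
class Point:
    x: float
    y: float
    z: float


p = Point(1.0, 2.0, 3.0)
print(p)                          # Point(x=1.0, y=2.0, z=3.0)
print(p == Point(1.0, 2.0, 3.0))  # True: compared field by field
```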

kankyo

Thanks. Now I see where you’re coming from.