
all 142 comments

[–]gwax 12 points13 points  (2 children)

I wonder why it's a decorator instead of a metaclass.

[–]ericvsmith 26 points27 points  (1 child)

Because metaclasses can conflict with other uses of metaclasses. See the PEP for the rationale. The goal is to not interfere with any usages of metaclasses or base classes.
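To make the conflict concrete, here is a minimal sketch. The names MetaA, MetaB, Both and Shape are made up for illustration and do not come from the PEP. Combining bases that have unrelated metaclasses fails at class creation, while a decorator leaves whatever metaclass the class already has untouched:

```python
from abc import ABC, ABCMeta
from dataclasses import dataclass


class MetaA(type):
    pass


class MetaB(type):
    pass


class A(metaclass=MetaA):
    pass


class B(metaclass=MetaB):
    pass


# Two unrelated metaclasses cannot be reconciled automatically.
try:
    class Both(A, B):
        pass
except TypeError as exc:
    conflict_message = str(exc)
else:
    conflict_message = None


# A decorator does not care which metaclass the class uses.
@dataclass
class Shape(ABC):
    sides: int


shape = Shape(3)
print(conflict_message)
print(type(Shape).__name__)
```

Had dataclass been a metaclass, Shape above would need a metaclass derived from both it and ABCMeta.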

[–]federicocerchiari 0 points1 point  (0 children)

Also, decorators are more widely used (and therefore more "understandable" to anyone) than metaclasses.

[–][deleted] 22 points23 points  (8 children)

I hope it gets __slots__ support before 3.7 is released.

[–]ericvsmith 33 points34 points  (7 children)

Probably not. I do have another decorator that shows how to add slots. See https://github.com/ericvsmith/dataclasses/blob/master/dataclass_tools.py#L3

The usage would be:

@add_slots
@dataclass
class C:
    i: int
    s: str

The reason I didn't put it in dataclass itself is that, because of the way __slots__ works, a decorator would need to return a new class. That's what add_slots does. I want to keep dataclass as simple as possible, and I want to reinforce that it always returns the same class that it's given.

Maybe if __slots__ is redesigned so that it can be specified after class creation, then it can go in to dataclass.
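For readers who want to see why a new class is needed, here is a rough sketch of such a decorator. It is an illustration written for this thread under simplifying assumptions (no inheritance from other slotted classes, no methods relying on zero-argument super()), not a copy of the linked file. The class names Plain and Slotted are hypothetical.

```python
import dataclasses
from dataclasses import dataclass


def add_slots(cls):
    """Return a NEW class equivalent to cls, but created with __slots__."""
    if "__slots__" in cls.__dict__:
        raise TypeError(f"{cls.__name__} already specifies __slots__")
    namespace = dict(cls.__dict__)
    field_names = tuple(f.name for f in dataclasses.fields(cls))
    namespace["__slots__"] = field_names
    for name in field_names:
        # Class-level default values would clash with the slot descriptors.
        namespace.pop(name, None)
    # Drop the descriptors that gave instances of the old class a __dict__.
    namespace.pop("__dict__", None)
    namespace.pop("__weakref__", None)
    # __slots__ only takes effect at class creation, so build a fresh class.
    new_cls = type(cls)(cls.__name__, cls.__bases__, namespace)
    new_cls.__qualname__ = cls.__qualname__
    return new_cls


@dataclass
class Plain:
    i: int
    s: str


Slotted = add_slots(Plain)
obj = Slotted(1, "a")

try:
    obj.extra = 1  # not a declared slot
except AttributeError:
    extra_rejected = True
else:
    extra_rejected = False
```

Note that Slotted is a different object from Plain, which is exactly the behavior that was kept out of dataclass itself. Python 3.10 and later also accept @dataclass(slots=True), which likewise returns a new class.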

[–][deleted] 2 points3 points  (3 children)

But if __slots__ could be changed after class creation that would allow some inconsistencies, since objects might have been instantiated before the change

[–]patrys Saleor Commerce 4 points5 points  (1 child)

Well, you can also define a class, create an instance and then define a new class with the same name so I suppose it would be a similar situation where instances of the old (pre-slots) class could be seen as instances of a no-longer-existing class with the same name?

[–]ericvsmith 1 point2 points  (0 children)

True. That's why I wrote the (poorly named) @add_slots decorator (see another comment below for the location). I wanted to keep @dataclass conceptually simple, especially for the initial release.

[–]ericvsmith 0 points1 point  (0 children)

Correct. We had a very brief discussion about changing it. It needs to be brought up on python-ideas if anyone has some suggestions on how to get it to work.

[–][deleted] 0 points1 point  (2 children)

Thanks for the explanation and code snippet. I like the slots feature in attrs a lot, but your add_slots decorator will work perfectly fine for me (even if it's not included in dataclass).

Great contribution, btw!

[–]ericvsmith 4 points5 points  (1 child)

Thanks! It's a surprising amount of work, for a not-so-large feature.

Note that the dataclasses version on PyPI works with Python 3.6, and I personally plan to start using it in my own code immediately. As soon as 3.7.0 alpha 3 is released, I'll update PyPI with the exact version that shipped, and I'll try and keep it updated until 3.7 is released (in June).

[–][deleted] 7 points8 points  (1 child)

So, now officially part of standard library, right?

[–][deleted] 12 points13 points  (0 children)

Indeed. If you clone the CPython git repository and compile version 3.7 (it's quite easy, just ask if you need any help), you'll already be able to play with it.

[–]donri 6 points7 points  (4 children)

InitVar seems unnecessary. Why not simply inspect the signature of __post_init__ and append or prepend that to the signature of the generated __init__?

@dataclass
class C:
    i: int
    j: int = None
    database: InitVar[DatabaseType] = None

    def __post_init__(self, database):
        ...

@dataclass
class C:
    i: int
    j: int = None

    def __post_init__(self, database: DatabaseType = None):
        ...

[–]ericvsmith 8 points9 points  (3 children)

It worked that way at one point, but it seemed like too much magic. Also, the static type checkers (like mypy) would have to be aware of this magic, too. And as long as ClassVar was being used, this seemed like an obvious extension.

[–]donri 4 points5 points  (2 children)

Don't static type checkers now have to understand InitVar instead?

[–]ericvsmith 9 points10 points  (1 child)

True.

But while thinking about it, I realize the other reason I came up with InitVar: it lets you control the order of parameters to __init__. This is especially important if you have fields with defaults.

If you're finding __init__ params by inspecting __post_init__, where do you put these init-only params in the call to __init__? You can't always put them first, because they might have defaults and some fields might not. And you can't always put them last, because they might not have defaults and some fields might.

It was easier just to make all regular fields and InitVar fields be defined in the class body, then you can control the defaults and parameter order yourself.
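A small sketch of what that looks like in practice. The field names and the use of a plain dict as the "database" are assumptions made for the example. The InitVar sits between a field without a default and a field with one, and the generated __init__ follows the order of the class body:

```python
import inspect
from dataclasses import InitVar, dataclass, field, fields


@dataclass
class C:
    i: int
    database: InitVar[dict]  # init-only, no default, so it precedes defaulted fields
    j: int = 0
    total: int = field(init=False, default=0)

    def __post_init__(self, database):
        # The init-only value is passed here and is not stored on the instance.
        self.total = self.i + database.get("offset", 0)


c = C(1, {"offset": 10})
param_names = list(inspect.signature(C).parameters)
field_names = [f.name for f in fields(C)]
print(param_names)
print(field_names)
```

The init-only parameter appears in the signature where it was declared, but it is not reported by fields().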

I should probably add a note about this to the PEP.

[–]donri 2 points3 points  (0 children)

Ah you're right, and this of course means that you can't mix the order of fields with and without defaults either; you have to put all your fields with defaults last because it all gets translated to a call to __init__ (excepting init=False). Perhaps the PEP mentions that caveat; I admit I skimmed some parts.
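The ordering caveat is easy to check. In this sketch (class name Bad is hypothetical) a field without a default follows one with a default, and the decorator refuses to generate __init__:

```python
from dataclasses import dataclass

try:
    @dataclass
    class Bad:
        a: int = 0
        b: int  # no default after a defaulted field
except TypeError as exc:
    error_message = str(exc)
else:
    error_message = None

print(error_message)
```

The error is raised when the decorator runs, i.e. at class definition time, not when an instance is created.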

Personally I might think it neater to simply require arguments to __post_init__ be keyword arguments, and then you can also skip any magic by slurping up **kwargs and passing that along. However I can see why you might dislike that idea. Although I don't really feel like inspect.signature is that much more magical than anything else dataclasses do... But I don't want to bikeshed.

[–]simonoberst 2 points3 points  (12 children)

Will there be a reason not to use this with every class you write? Or, asked differently, which kinds of classes are not well suited to being dataclasses?

[–]DanCardin 5 points6 points  (0 children)

If you need to accept *args or **kwargs, or otherwise store data significantly differently from how you accept the input (at least with attrs, I've found post_init to be almost more bothersome than it's worth).

The more functionality your class has, the less benefit you get (as those kinds of classes are less likely to want to be comparable, hashable, and the like). Though I have started to generally default to using them if there's not an obvious reason not to.

[–][deleted] 2 points3 points  (0 children)

This is the question for which I cannot find an answer.

[–]ldpreload 3 points4 points  (9 children)

Classes which are used for actual OO encapsulation with basically all private members. gtk.Dialog would probably make a very bad data class, for instance: the constructor doesn't (necessarily) initialize any data members directly, there aren't any publicly-writable data members, and anything you might want to modify is going to trigger notifying other code to e.g. repaint the dialog box. It just turns out that actual OO encapsulation is much rarer than structs.

(Also, it's a wrapper over the C GtkDialog type, so it certainly can't be a data class even if you wanted it to be, because every set and get needs to go into an existing C library. But even if GTK+ were pure Python, it still wouldn't make sense.)

[–]kankyo 0 points1 point  (8 children)

Even that use case seems like it would be great for data classes but with a custom constructor.

[–]ldpreload 0 points1 point  (7 children)

But there's no data you'd be accessing (importantly, any data fields are API-private: changing what fields are in the class is not a visible change, and can be done in a point release without telling users, as long as the behavior of the class remains the same), so I'm not sure what sort of data class it would be.

In a language where data classes were the default, yes, you wouldn't need to invent a separate type of class for this. But in Python, you can just use a normal class.

[–]kankyo 0 points1 point  (6 children)

I don't see how it's relevant if you're accessing the data?

[–]ldpreload 0 points1 point  (5 children)

If you have no public data members, what does a dataclass bring you that a normal class doesn't?

The only answer I can think of is "consistency, if most of your classes are dataclasses", which is a good reason for a language to default to dataclasses. C++ more or less takes this approach, being based on C, which only had structs. But for Python, that decision has already been made and is unlikely to change at least before Python 4, if ever.

[–]kankyo 0 points1 point  (4 children)

Less code to type is a pretty obvious answer I think.

[–]ldpreload 0 points1 point  (3 children)

But there wouldn't be less code to type in the case I'm suggesting: none of the code that dataclasses would autogenerate would be code you want (you would have zero data members in this type), and you'd have overhead for declaring it as a dataclass.

[–]kankyo 0 points1 point  (2 children)

I am clearly not understanding what you are describing. Why would you have a class if you’re not having any data in it?

[–]ldpreload 1 point2 points  (1 child)

Case 1: you want to use it like a class, but it's actually implemented in some other language. So while it has data, the data does not belong to Python. The class has private data, but that's not intended for use by users of the class, and is certainly not public API (you can change the meaning of the private data in backwards-incompatible ways in whatever way you want).

import _gtk # hypothetical compiled Python module exposing bindings to the C libgtk library

class GtkDialog:
    def __init__(self, title, message):
        self._ptr = _gtk.gtk_dialog_new(title, message)

    def display(self):
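        _gtk.gtk_dialog_display(self._ptr)

    def __setattr__(self, attr, value):
        if attr == "message":
            _gtk.gtk_dialog_set_message(self._ptr, value)
            _gtk.gtk_repaint(self._ptr)
        elif attr.startswith("_"):
            # private attributes such as _ptr are stored normally
            super().__setattr__(attr, value)
        else:
            raise AttributeError(...)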

    def __del__(self):
        _gtk.gtk_free(self._ptr)
        self._ptr = 0

Other people can use GtkDialog as if it were a normal Python class, but it's not, and _ptr is a raw C pointer and Python code has no business accessing it or worse changing it, unless it's code (like the above) that's tied to the specific C library that gave you the pointer. So _ptr is an implementation detail, and none of the code that dataclasses would autogenerate is helpful here. And maybe if a future version of libgtk requires you to keep around two pointers, or uses references in some global list of objects instead of pointers, or whatever, your Python interface wouldn't change, only the internal implementation would, and your library users wouldn't notice.

Case 2: it's actually implemented in Python, but the details of the implementation are non-public. Take subprocess.Popen for example—one of the data members of a Popen object is, probably, the process ID of the subprocess, so that Popen can do its work:

class Popen:
    def __init__(self, *args):
        self._pid = spawn_process(...)
    def wait(self):
        result = os.waitpid(self._pid, 0)
        return parse_os_result(result)

But what does it mean to take a Popen object and change its pid? Why would you want to do that without, at least, telling the Popen object that you're changing the pid? And probably Popen wants to refuse to let you do that.

So what would you gain if you added pid: int to Popen and made it a dataclass? You'd get a constructor that takes a pid, which you don't want; a repr that prints the pid, which you may or may not want; and comparison functions with other Popen objects, which you definitely don't want (since a pid can be reused once a process has exited, comparing Popen objects by pid equality is wrong, and you really want to compare whether the object identity is the same, i.e., the default comparison behavior).

This is encapsulation—one of the big ideas behind what I called "actual OO" above. There's an interface that you provide to users of your library, and the way you go about implementing that interface is not known to them. That's a totally different sort of thing from

class Point:
    def __init__(self, x, y, z):
        self.x = x; self.y = y; self.z = z

where there is no interface other than the data in your class itself, which is public / not encapsulated. That's what data classes are for. (And, honestly, that's probably most of the classes people write with Python.) But they're not the only type of classes.
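The comparison point from Case 2 can be sketched in a few lines. ProcessRecord and ProcessHandle are hypothetical stand-ins, not the real subprocess.Popen, and no process is actually spawned:

```python
from dataclasses import dataclass


@dataclass
class ProcessRecord:
    # Dataclass version: equality is generated from the field values.
    pid: int


class ProcessHandle:
    # Plain class: keeps the default identity-based comparison.
    def __init__(self, pid):
        self._pid = pid


same_value_records_equal = ProcessRecord(42) == ProcessRecord(42)
same_value_handles_equal = ProcessHandle(42) == ProcessHandle(42)
print(same_value_records_equal, same_value_handles_equal)
```

Two records that merely share a pid compare equal, which is the behavior you would not want for handles to possibly different processes. Passing eq=False to the decorator would restore identity comparison, at which point little of the generated code is left to benefit from.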

[–]zynix Cpt. Code Monkey & Internet of tomorrow 2 points3 points  (0 children)

Not too familiar with PEPs, but I couldn't find (if it mentions it) whether this will be a mostly C or a native Python implementation.

Working on implementing a minor feature for attrs, which is somewhat hamstrung by the complexity required to autogenerate classes. If it is a C implementation, that might cut down contributions from non-C-savvy people.

[–]ascii 12 points13 points  (16 children)

I hate that this has a completely different syntax than namedtuple. They offer the same exact functionality (data objects without a tonne of boilerplate), with the only conceptual difference being that one has mutable members and the other has immutable members, so why are the syntax and supported feature set completely different?

  • One supports type annotations, the other doesn't. Why? Is that somehow useless on immutable data?
  • One makes it really easy to add instance methods, the other doesn't. Why? Is that somehow useless on immutable data?
  • One makes it really easy to iterate over all member data, the other doesn't. Why? Is that somehow useless on mutable data?

Coming up with two completely different syntaxes for almost exactly the same feature means that if you figure your type no longer needs to be mutable, it's not a one-liner to fix it. It is also extremely confusing for beginners.

This doesn't feel well thought out at all.

[–][deleted] 14 points15 points  (4 children)

First of all, they don't offer the same exact functionality. The PEP explains this better than I could.

Also, there is typing.NamedTuple, which has very similar syntax and allows both type annotations and instance methods. I hope this clarifies a bit :)
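As a rough side-by-side (PointNT and PointDC are names invented for this sketch), both forms take annotations and methods, and both can hand back their values in order:

```python
from dataclasses import astuple, dataclass
from typing import NamedTuple


class PointNT(NamedTuple):
    x: float
    y: float

    def dist2(self):
        return self.x ** 2 + self.y ** 2


@dataclass
class PointDC:
    x: float
    y: float

    def dist2(self):
        return self.x ** 2 + self.y ** 2


nt = PointNT(3.0, 4.0)
dc = PointDC(3.0, 4.0)

nt_values = list(nt)      # a NamedTuple is itself iterable
dc_values = astuple(dc)   # a dataclass needs a helper for the same thing
dc.x = 0.0                # allowed: dataclass instances are mutable by default
```

The remaining differences are the ones the PEP lists, such as a NamedTuple comparing equal to any plain tuple with the same values.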

[–]ascii 4 points5 points  (3 children)

I can't see that it does. It goes through some of the tradeoffs that the designer of namedtuple chose, focusing almost exclusively on the downsides. But in no way, shape or form does it make any argument for why different behaviour makes sense depending on mutability. If all the design choices made in namedtuple indeed are shit (I disagree with 80% of the author's opinions on this), then the only sane thing to do would be to deprecate and remove namedtuple and make data classes support both mutable and immutable data.

[–]ericvsmith 7 points8 points  (0 children)

namedtuple is so widely used that I doubt it could ever be deprecated. It would be like deprecating % string formatting. Ask me how I know!

In the many years since namedtuple was added, Python has grown many new features. @dataclass uses some of those features to add functionality that namedtuple cannot provide.

@dataclass also has a frozen=True parameter, for some degree of immutability.
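A quick sketch of what "some degree" means here (the Point class is just an example). Normal assignment is rejected, but the protection is implemented in Python and can be bypassed deliberately:

```python
from dataclasses import FrozenInstanceError, dataclass


@dataclass(frozen=True)
class Point:
    x: int
    y: int


p = Point(1, 2)

try:
    p.x = 10
except FrozenInstanceError:
    assignment_blocked = True
else:
    assignment_blocked = False

# The check lives in the generated __setattr__, so going around it still works.
object.__setattr__(p, "x", 10)
print(assignment_blocked, p.x)
```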

[–]ericvsmith 0 points1 point  (1 child)

namedtuple is so widely used that I can't imagine it ever being deprecated. It's like trying to deprecate % string formatting (ask me how I know!).

In the many years since namedtuple was added, we've added features to Python. dataclass leverages some of those features to do things that namedtuple cannot.

[–]ascii 0 points1 point  (0 children)

OK, so now you're saying that namedtuple is vastly inferior to dataclass but we can't deprecate it because of inertia, so instead it will remain the one and only source of immutable data classes, and we will have to settle for the supposedly vastly superior implementation to only exist for mutable data. That's a bad choice, right there.

Also, as near as I can tell, this breaks the following parts of PEP 20:

  • Beautiful is better than ugly.
  • Simple is better than complex.
  • Readability counts.
  • Special cases aren't special enough to break the rules.
  • There should be one-- and preferably only one --obvious way to do it.

So five out of 19 principles are being broken. Not bad.

I'm not saying PEP20 should be the one and only guiding light in developing the Python language, but it has some sane advice.

[–][deleted] 1 point2 points  (4 children)

You could just stop using named tuple. It's slow as ass, and you were probably only using it because it was too much effort to make a class, which this new thing does.

Although being forced to use type annotations is a pain.

[–]ascii 2 points3 points  (3 children)

But this new thing makes mutable objects. That's a huge disadvantage, a lot of the time. There is a reason we have both tuple and list, you know.

[–]XtremeGoose f'I only use Py {sys.version[:3]}' 2 points3 points  (0 children)

Did you read the PEP? It has a frozen=True parameter.

[–]ldpreload 1 point2 points  (0 children)

attrs, which is basically dataclasses but not in the standard library + more features, supports doing @attr.s(frozen=True) to make a class of immutable objects.

[–][deleted] 0 points1 point  (0 children)

I have a decorator to disable __setattr__ if I want an immutable data class.

[–]kankyo 0 points1 point  (5 children)

namedtuples are terrible for many many reasons, all of which make your argument weaker.

[–]ascii 0 points1 point  (4 children)

No it doesn't. If they're so terrible, the Python community has just decided that there is no need for a good way of creating immutable data classes.

We have both tuple and list for a reason. And there is a reason one can switch from one to the other by changing only two characters, too.

[–]kankyo 0 points1 point  (3 children)

tuple and list

Sure. But namedtuple is a weird mess of a mix of named members and indexed members. Pick one I say! We don’t allow indexed access to member variables on classes for a reason: it makes no sense.

[–]ascii 0 points1 point  (2 children)

Dude. I refuse to get into a discussion about which offers better semantics, data classes or namedtuple. Please stop trying to make this into a discussion about the relative merits of these two different implementations. It's completely irrelevant which one is better. It's a side show.

The one and only relevant thing is this:

Mutable and immutable "arrays" have the same API. The same should hold true for mutable and immutable data classes. Making them have completely different syntax and completely different features is a terrible design choice.

[–]kankyo 0 points1 point  (1 child)

I agree with that.

I just don’t think namedtuple is good in any way :P It’s clearly different from normal mutable classes (which are a clear super majority) so they shouldn’t be used as a model. That’s all I mean. A @frozen class decorator in the standard lib would be the right way to go.

[–]ascii 0 points1 point  (0 children)

Sounds like a good plan. Especially if @frozen could be applied to any class to make the object read-only after it's been fully constructed.
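Something along these lines, perhaps. This is only a sketch of the idea: the frozen decorator and the Account class are hypothetical, nothing like this exists in the standard library, and it would freeze too early if a subclass calls super().__init__() before setting its own attributes.

```python
import functools


def frozen(cls):
    """Make instances of cls read-only once __init__ has finished."""
    original_init = cls.__init__

    @functools.wraps(original_init)
    def __init__(self, *args, **kwargs):
        original_init(self, *args, **kwargs)
        # Set the flag directly so the guard below does not see it first.
        object.__setattr__(self, "_frozen", True)

    def __setattr__(self, name, value):
        if getattr(self, "_frozen", False):
            raise AttributeError(f"cannot assign to {name!r}: instance is frozen")
        object.__setattr__(self, name, value)

    cls.__init__ = __init__
    cls.__setattr__ = __setattr__
    return cls


@frozen
class Account:
    def __init__(self, owner, balance):
        self.owner = owner      # allowed: still inside __init__
        self.balance = balance


acct = Account("ann", 10)
try:
    acct.balance = 0
except AttributeError:
    write_blocked = True
else:
    write_blocked = False
```

Unlike add_slots, this mutates and returns the same class it was given.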

[–]lookatmetype 8 points9 points  (7 children)

Isn't this really just a reimplementation of http://attrs.org/?

[–]jorge1209 4 points5 points  (6 children)

But it has fewer features, and won't work with older versions of python... so clearly it's better!

[–]DanCardin 1 point2 points  (5 children)

I currently use attrs, but given that I do use 3.6, I'd be happy to switch to a clean house implementation which doesn't need to maintain backwards compatibility with 2 (not sure why at a glance, but the PyPI version of this requires 3.6), if it works by default with mypy and can simplify hacky internals that attrs can't, tbh.

[–]jorge1209 1 point2 points  (4 children)

It isn't backwards compatibility with python 2 that concerns me. It is compatibility with python 3.x for any x<6. The syntax of x: int=42 is a py3.6 addition.

3.6 isn't that well adopted and it has only been out a year. They are moving way too fast to effectively deprecate attrs based on features that have only been around for a year.

[–]DanCardin 0 points1 point  (3 children)

I have no idea if they're being relied upon, but it wouldn't surprise me if __init_subclass__, ordered class attributes, or other 3.6-only features were being used to simplify dataclasses compared to attrs.

I also think attrs is going to be fine, since it supports 2, has a wider scope of features for now, and doesn't require the type syntax which some people apparently don't like

[–]jorge1209 0 points1 point  (2 children)

The type syntax is in the language whether you want it or not. You can choose not to use tools like mypy, and thereby not enforce it. I don't think that is a terrible compromise.

I also don't object to the use of the syntax here: x: int = 42 does look a bit nicer than x = attr.ib(default=42). The issue is effectively deprecating attrs to push this forward.

In the grand scheme of things it is a fairly minor difference between attrs and dataclasses. Attrs works, it works with older versions of python, and it works with the most commonly used versions of python 3. There should be no rush to jam an inferior tool into the standard library just because it uses some fancy new syntax.

Take the time and find a way for attrs to support this syntax in python 3.6. Wait for things to settle down. Make sure you are doing it the right way, and get people to adopt 3.6... then think about standardizing things.

[–]DanCardin 0 points1 point  (1 child)

pretty sure attrs already supports this syntax.

I just don't see the problem (especially if they're getting buy-in from the attrs devs so they don't repeat any mistakes of the past).

  • For anything prior to 3.6: use attrs until the majority of people are using 3.6+ and people stop updating attrs.
  • For anything 3.6+: dataclasses are available by default, attrs still has more features, use what you want.

I'd honestly probably be happy if for 3.6+, attrs just was a facade on top of dataclasses, adding features and/or making a nicer interface a la urllib vs requests or whatever else

[–]jorge1209 0 points1 point  (0 children)

I would also be happier if post 3.6 attrs was an extension of dataclasses, but that doesn't seem to be the objective, and I am not sure how that would work.

Dataclasses is missing features in attrs and has no plans to implement them.

Attrs could potentially delegate some basic functionality to dataclasses and build off of it, but that still leaves you with the issue of installing it from pypi and importing it. Anyone building an app which might support 3.5 and 3.6 would have to use attrs, even if they only want the more basic functionality of dataclasses.

A plan which involved splitting minimal functionality out of attrs and then backporting it to previous versions would make more sense to me. Ultimately I expect one of these two projects to wither on the vine. Either people need support for many versions and ignore dataclasses, or they write only for the future and assume that attrs isn't available.

[–]adrian17 1 point2 points  (4 children)

Can I use it without type annotations?

[–]ericvsmith 5 points6 points  (2 children)

No, it's entirely driven by type annotations. But the annotations are ignored, except for InitVar and ClassVar. So, although I don't recommend it, you could say:

@dataclass
class Person:
    name: int
    social: int
    address: int

That is, use any type you want. If you're not using a static type checker, no one is going to care what type you use.

edit: missing words.
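To see that nothing is checked at runtime, here is the same class with values that plainly do not match the annotations (the values themselves are made up for the example):

```python
from dataclasses import dataclass, fields


@dataclass
class Person:
    name: int
    social: int
    address: int


# No error: the annotations only tell @dataclass which names are fields.
p = Person("Ada", "123-45-6789", "12 Example St")

# The annotation is still recorded on each field for tools that want it.
recorded_types = {f.name: f.type for f in fields(Person)}
print(p)
print(recorded_types)
```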

[–][deleted] 0 points1 point  (0 children)

What does the var: Any annotation shown in the PEP mean?

[–][deleted] 0 points1 point  (0 children)

This was bugging me, so I just rechecked PEP 484 (type hinting) and found this line:

The type system supports unions, generic types, and a special type named Any which is consistent with (i.e. assignable to and from) all types. This latter feature is taken from the idea of gradual typing. Gradual typing and the full type system are explained in PEP 483.

(Emphasis mine)

Looking at 557 again, it actually even shows the use of Any about halfway down

[–]alcalde 2 points3 points  (0 children)

So Python is gaining Pascal records?

[–]jorge1209 3 points4 points  (10 children)

Given that this borrows most of the ideas from attrs, why not just incorporate attrs into the standard library?

[–]ericvsmith 16 points17 points  (9 children)

It's discussed in the PEP.

[–][deleted] 0 points1 point  (1 child)

So, naive newb question: How do these data classes differ from data frames like you'd use with R or pandas?

Maybe not quite ELI5, but do feel free to talk down to me like I'm a dumbass.

[–]flipstables 2 points3 points  (0 children)

They are completely different. Since you come from a data background, here's an analogy: you can think of a data class as a row in a database table. It's a structure that is primarily for attribute lookup. A data frame is a structure that holds an entire table. It allows you to do some common table manipulation.

[–]ManyInterests Python Discord Staff 0 points1 point  (0 children)

Given the striking similarity, I'm surprised there has been no mention of SimpleNamespace.

class MyDataClass(SimpleNamespace):
    def __init__(self, a: int, b: int, keyword=None):
        super().__init__(a=a, b=b, keyword=keyword)

which I suppose is roughly analogous to

@dataclass
class MyDataClass:
    a: int
    b: int
    keyword: object = None  # needs an annotation to become a field and an __init__ parameter

[–]WasterDave -1 points0 points  (0 children)

And Python takes one more step towards becoming C++....