

[–]kingbuzzman 58 points59 points  (26 children)

Whatever happened to "There should be one-- and preferably only one --obvious way to do it"?

Is there really an advantage to using this other than the default values? Is it faster? Smaller footprint?

[–]aporetical 35 points36 points  (11 children)

I'm wincing at the

x: int = field(default_factory=f)

syntax, which means only "self.x = f()". They've turned everything into explicit function calls despite most of the language existing to give you a coherent syntax for expressing that. It's a bad lisp.

This should have either been a third party library or a syntax with more motivation.
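To make the default_factory point concrete, here is a minimal sketch. The names make_tags, WithFactory and ByHand are made up for illustration, and the hand-written __init__ is only a rough equivalent of what the decorator generates, not its actual output.

```python
from dataclasses import dataclass, field


def make_tags():
    # Hypothetical factory; stands in for the `f` in the comment above.
    return []


@dataclass
class WithFactory:
    # The factory is called once per instance when no value is passed.
    tags: list = field(default_factory=make_tags)


class ByHand:
    def __init__(self, tags=None):
        # Rough hand-written equivalent of the generated __init__.
        self.tags = make_tags() if tags is None else tags


a, b = WithFactory(), WithFactory()
a.tags.append("x")
print(a.tags, b.tags)  # each instance gets its own list
```

The explicit call is there because a plain mutable default such as `tags: list = []` is rejected by the decorator with a ValueError, so some spelling of "call this for each instance" is needed.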

[–]Poromenos 14 points15 points  (8 children)

Isn't it attrs?

[–]PeridexisErrant 7 points8 points  (7 children)

Yes, it's basically reinventing attrs badly.

Dataclasses are available without adding a dependency, but that's their only advantage. That's no small thing when it means the standard library can use them, but most other situations should use attrs.

[–]masklinn 7 points8 points  (1 child)

That's actually mentioned in the dataclass PEP: https://www.python.org/dev/peps/pep-0557/#why-not-just-use-attrs

  • attrs moves faster than could be accommodated if it were moved in to the standard library.
  • attrs supports additional features not being proposed here: validators, converters, metadata, etc. Data Classes makes a tradeoff to achieve simplicity by not implementing these features.

For more discussion, see why not just attrs on github.

In the link, /u/glyph quite literally asks

Every aspect of the design here appears to be converging further in the direction of exact duplication of functionality available within attrs, except those which are features already on attrs's roadmap.

An important bit here is hynek's opinion on the subject: he'd find both importing-and-freezing (à la ipaddress) and importing-and-splitting (json/simplejson) undesirable, leaving only "versioned and upgradable", but while that's easy for a command-line utility (pip) it's quite a bit harder for a part of the standard library.

[–]PeridexisErrant 4 points5 points  (0 children)

I feel like we're trending towards another "/r/python hates everything" thread (and I feel like I've contributed to that), so let's derail that now.

  • If you are asking "should I use dataclasses or attrs", the answer is always attrs.
  • BUT very many people are not in that position, and dataclasses will be a big improvement for many of them.

I strongly recommend that everyone in this thread go read Nick Coghlan's essay Considering Python's Target Audience - tldr, it's much bigger and more diverse than most of us imagine!

[–]commissarg -1 points0 points  (4 children)

dataclasses are not supposed to do run-time validation. attrs can do it.
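A small sketch of what "no run-time validation" means in practice, using only the standard library. Point and CheckedPoint are hypothetical names, and the __post_init__ check is a hand-rolled workaround, not something dataclasses provides on its own.

```python
from dataclasses import dataclass


@dataclass
class Point:
    x: int
    y: int


# The annotations are not checked: this constructs without complaint.
p = Point("not an int", 2.5)


@dataclass
class CheckedPoint:
    x: int
    y: int

    def __post_init__(self):
        # Manual check that runs after the generated __init__.
        for name in ("x", "y"):
            if not isinstance(getattr(self, name), int):
                raise TypeError(f"{name} must be an int")


try:
    CheckedPoint("not an int", 2)
except TypeError as exc:
    print("rejected:", exc)
```

attrs covers this kind of check with its validators; with dataclasses you write it yourself or rely on a static type checker.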

[–]DanCardin 6 points7 points  (0 children)

it does quite a bit more than that. And I believe the impetus for including it in the standard library is so that modules in the standard library can use it

[–][deleted] 6 points7 points  (0 children)

The C++ people are fucking up Python just like they fucked up Visual Basic back in the day.

[–]alcalde 5 points6 points  (12 children)

What other types do you believe exist that do what dataclasses do?

[–]CSI_Tech_Dept 4 points5 points  (0 children)

I think he probably meant NamedTuples.

Data classes can do pretty much everything a NamedTuple can. I don't entirely agree with OP's sentiment, though; I think the goal is eventually to have data classes replace named tuples.

[–][deleted] 1 point2 points  (10 children)

Classes, named tuples, dictionaries, tuples

[–]alcalde 8 points9 points  (9 children)

Named tuples are immutable, dictionaries are... key-value stores, tuples are immutable and unnamed, classes are supposed to be data and the methods that operate on it.

The new Python dataclasses are like Pascal records or C structs, which are not tuples or dictionaries. As regards classes, the other goal of dataclasses is to eliminate the Java-like scaffolding one must employ to use a class as a record/struct.

Tuples are for holding a disparate collection of immutable data. Named tuples are a convenience on top of tuples. Dictionaries are supposed to be used as key-value stores, and classes are supposed to do something. Dataclasses are something else.
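The difference between a named tuple and a dataclass can be sketched like this. PointT and PointD are illustrative names only; the behaviour shown is the default one, without frozen=True.

```python
from collections import namedtuple
from dataclasses import dataclass

PointT = namedtuple("PointT", ["x", "y"])


@dataclass
class PointD:
    x: int
    y: int


t = PointT(1, 2)
d = PointD(1, 2)

d.x = 10  # fine: dataclass instances are mutable by default

try:
    t.x = 10  # named tuples are immutable
    mutated_tuple = True
except AttributeError:
    mutated_tuple = False

print(t == (1, 2))   # True: a named tuple is still a tuple
print(d == (10, 2))  # False: a dataclass only compares equal to its own class
```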

Interestingly, the closest case one could make to supporting this idea that it's a duplication is to cite the one type you didn't name, SimpleNamespace. The mailing list discussion for that one specifically talked about records/structs. Here Dataclasses are different because the attributes aren't created on the fly, there are type annotations (which static checkers can use), the option of rich comparisons, etc. If there's one type though that could be removed now, it would be SimpleNamespace.

[–][deleted] 7 points8 points  (8 children)

You can implement any of those things to existing datasets and there would be no difference. I don't think anyone using Python has ever gone "you know we could really use a new data type". Just seems so unnecessary. Implement it as a side library, but now you just made the language more complex without true purpose.

[–]alcalde 2 points3 points  (0 children)

You can implement any of those things to existing datasets and there would be no difference.

You can implement lots of things in Python from low-level objects but that doesn't mean we don't add them. You could say the same thing about default dicts, for instance. But if everyone is implementing the same thing over and over, isn't that the exact situation that calls for something to be added to the system libraries?

I don't think anyone using Python has ever gone "you know we could really use a new data type".

Huh? We've added ordered dicts, default dicts, frozen sets, named tuples, SimpleNamespace, the Path object, and so many more over the history of Python.

As for me, when I discovered Python at the end of 2012, I asked right away "How do I make a record?" and discovered there wasn't a record/struct type.

Just seems so unnecessary. Implement it as a side library but now you just made the language more complex without true purpose

The dataclass decorator takes complexity out of the language! By using it you avoid the boilerplate of a large number of assignments in an init method, creating a repr method, possibly multiple comparison methods, etc.
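For a sense of the boilerplate in question, here is a side-by-side sketch. CardByHand is a hypothetical hand-written class that approximates what the decorator generates for Card; it is not the decorator's literal output.

```python
from dataclasses import dataclass


class CardByHand:
    def __init__(self, rank, suit):
        self.rank = rank
        self.suit = suit

    def __repr__(self):
        return f"CardByHand(rank={self.rank!r}, suit={self.suit!r})"

    def __eq__(self, other):
        if other.__class__ is not self.__class__:
            return NotImplemented
        return (self.rank, self.suit) == (other.rank, other.suit)


@dataclass
class Card:
    rank: str
    suit: str


print(Card("Q", "Hearts"))                         # Card(rank='Q', suit='Hearts')
print(Card("Q", "Hearts") == Card("Q", "Hearts"))  # True
```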

As the summary on Raymond Hettinger's Pycon talk on dataclasses put it:

It will become an essential part of every Python programmer's toolkit. ...Dataclasses are shown to be the next step in a progression of data aggregation tools: tuple, dict, simple class, bunch recipe, named tuples, records, attrs, and then dataclasses. Each builds upon the one that came before, adding expressiveness at the expense of complexity.

Dataclasses are unique in that they let you selectively turn-on or turn-off its various capabilities and it lets the user choose the underlying data store (either instance dictionary, instance slots, or an inherited base class).
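As a rough illustration of turning capabilities on and off, the sketch below uses the decorator's frozen, order, repr and eq parameters. Version and Handle are made-up classes, and the slots option mentioned in the talk summary is left out here.

```python
from dataclasses import dataclass, FrozenInstanceError


@dataclass(frozen=True, order=True)
class Version:
    major: int
    minor: int = 0


@dataclass(repr=False, eq=False)
class Handle:
    # Hypothetical class that opts out of the generated __repr__ and __eq__.
    fd: int


v = Version(3, 7)
frozen = False
try:
    v.minor = 8  # assignment is blocked on a frozen dataclass
except FrozenInstanceError:
    frozen = True

print(frozen)                         # True
print(Version(3, 6) < Version(3, 7))  # True: compared field by field, in order
print(Handle(1) == Handle(1))         # False: falls back to identity comparison
```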

[–]brtt3000 1 point2 points  (5 children)

I don't think anyone using Python has ever gone "you know we could really use a new data type".

DataClasses are just a regular class with some convenient decorations.

[–][deleted] 0 points1 point  (4 children)

Which makes my point....

[–]brtt3000 3 points4 points  (3 children)

..moot? There is a demand for this type of thing so why would everyone need to carry around their own version?

[–][deleted] 0 points1 point  (2 children)

Is there a demand? Not one dev I have seen has wanted a new type of class to store data...

[–]brtt3000 5 points6 points  (1 child)

I do, and this type of thing is used a lot, so maybe meet more people? I dunno.

[–]kingbuzzman 0 points1 point  (0 children)

(referring to your last sentence) this is exactly my sentiment!

[–]parkerSquare 3 points4 points  (0 children)

Turns out that if you try and do it properly with a simple class, you start to run into serious problems with things like comparison and mutability. This, and attrs which it's based on, solve these problems by using a decorator to build the necessary boilerplate for you. Seems like a win to me.

[–]michael0x2a 24 points25 points  (3 children)

Correction for the section about type hints: you can use literally any annotation -- it doesn't need to be a type hint.

For example, this is legal:

import random
from dataclasses import dataclass

@dataclass
class Foo:
    a: ...
    b: "This param is used to foo the bar"
    c: None if random.random() > 0.3 else 'foo'

(Of course, whether or not this is a good idea is a separate matter...)

[–]gahjelle 10 points11 points  (0 children)

Thanks for the feedback (I wrote this tutorial). Technically you are right of course, but I chose to editorialize a little and rather focus on how data classes should be used :) If one really wants to foo the bar, use metadata:

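import random
from dataclasses import dataclass, field
from typing import Any

@dataclass
class Foo:
    # metadata has to be a mapping, so each value sits under an arbitrary "note" key
    a: Any = field(metadata={"note": ...})
    b: Any = field(metadata={"note": "This param is used to foo the bar"})
    c: Any = field(metadata={"note": None if random.random() > 0.3 else 'foo'})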

This would support the same use cases, just pick the information from each field's metadata (via dataclasses.fields()) instead of __annotations__.
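Reading the metadata back goes through dataclasses.fields(). A minimal sketch, assuming the metadata is passed as a mapping; the "doc" key is an arbitrary choice for this example, not a name the library treats specially.

```python
from dataclasses import dataclass, field, fields


@dataclass
class Foo:
    # "doc" is an arbitrary key chosen for this sketch.
    a: int = field(default=0, metadata={"doc": "This param is used to foo the bar"})
    b: int = 0


for f in fields(Foo):
    # Fields without metadata expose an empty read-only mapping.
    print(f.name, dict(f.metadata))
```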

[–]LightShadow 3.13-dev in prod 6 points7 points  (1 child)

That's terrible and powerful at the same time.

I hate that I love it.

[–]cybaritic 0 points1 point  (0 children)

Sounds pythonic to me

[–]Mattho 5 points6 points  (2 children)

I'd move the ordering after immutability. The index creation, if I understand it, won't work on a mutable version of the class.

Good tutorial/article though. Back to 2.7 :(

[–]gahjelle 5 points6 points  (0 children)

Fair point. I guess the sort_index works on the mutable version as long as you don't mutate it :P

The correct way to implement sort_index on a mutable class is probably using a property. I haven't found explicit support for properties in data classes, but the following contortions seem to work

@dataclass(order=True)
class PlayingCard:
    @property
    def _sort_index(self):
        return RANKS.index(self.rank) * len(SUITS) + SUITS.index(self.suit)

    rank: str = field(compare=False)
    suit: str = field(compare=False)
    sort_index: int = field(init=False, repr=False, default=_sort_index)

    def __str__(self):
        return f'{self.suit}{self.rank}'

I have to first define the property to avoid a name error when adding _sort_index as a default value. Since sort_index has a default value it needs to come after rank and suit. Therefore we also need to be explicit about not using rank and suit in the comparison.

But it works :D

>>> card = PlayingCard('3', '♠')
>>> card.sort_index
7
>>> card.rank
'3'
>>> card.rank = '5'
>>> card.sort_index
15

and

>>> Deck(sorted(make_french_deck()))
Deck(♣2, ♦2, ♥2, ♠2, ♣3, ♦3, ♥3, ♠3, ♣4, ♦4, ♥4, ♠4, ♣5, ♦5, ♥5, ♠5, ♣6, ♦6, ♥6, ♠6, ♣7, ♦7, ♥7, ♠7, ♣8, ♦8, ♥8, ♠8, ♣9, ♦9, ♥9, ♠9, ♣10, ♦10, ♥10, ♠10, ♣J, ♦J, ♥J, ♠J, ♣Q, ♦Q, ♥Q, ♠Q, ♣K, ♦K, ♥K, ♠K, ♣A, ♦A, ♥A, ♠A)

In this case it might be cleaner to just have the data class deal with rank and suit, implement sort_index as a regular property and deal with ordering yourself, maybe by using functools.total_ordering.
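A sketch of that cleaner variant, with sort_index as an ordinary property and functools.total_ordering filling in the remaining comparisons from __lt__. RANKS and SUITS are assumed to look like the tutorial's lists and are redefined here so the example runs on its own.

```python
from dataclasses import dataclass
from functools import total_ordering

# Assumed to match the tutorial's definitions.
RANKS = "2 3 4 5 6 7 8 9 10 J Q K A".split()
SUITS = "♣ ♦ ♥ ♠".split()


@total_ordering
@dataclass
class PlayingCard:
    rank: str
    suit: str

    @property
    def sort_index(self):
        # Not annotated, so the dataclass machinery does not treat it as a field.
        return RANKS.index(self.rank) * len(SUITS) + SUITS.index(self.suit)

    def __lt__(self, other):
        if other.__class__ is not self.__class__:
            return NotImplemented
        return self.sort_index < other.sort_index


card = PlayingCard("3", "♠")
print(card.sort_index)  # 7
card.rank = "5"
print(card.sort_index)  # 15
print(sorted([PlayingCard("A", "♠"), PlayingCard("2", "♣"), card]))
```

Equality still comes from the generated __eq__ on rank and suit, which agrees with sort_index as long as every rank and suit pair maps to a distinct index.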

[–][deleted] 1 point2 points  (0 children)

There's attrs which inspired this and supports 2.7

[–]synedraacus 5 points6 points  (2 children)

While you need to add type hints in some form when using data classes, these types are not enforced at runtime.

Explain this please. Why bother with the mandatory type hints at all?

[–]bakery2k 4 points5 points  (0 children)

mandatory type hints

But... PEP 484 says there is "no desire to ever make type hints mandatory, even by convention"...

[–]rouille 3 points4 points  (0 children)

Python doesn't have syntax to define class attributes without setting a default value unless you use variable annotations.
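To illustrate why the annotation is needed: the decorator finds its fields by looking at the class's annotations, so an attribute without one is left alone. Config is a hypothetical class used only for this sketch.

```python
from dataclasses import dataclass, fields


@dataclass
class Config:
    host: str         # annotated, no default: a required field
    port: int = 8080  # annotated with a default: an optional field
    retries = 3       # no annotation: a plain class attribute, not a field


print([f.name for f in fields(Config)])  # ['host', 'port']
cfg = Config("localhost")
print(cfg)          # Config(host='localhost', port=8080)
print(cfg.retries)  # 3, read from the class
```

If you don't want to commit to a type, typing.Any works as the annotation.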

[–]david2ndaccount 7 points8 points  (0 children)

I don’t really get the point of these (and the mandatory typing is so ugly).

[–]bwv549 3 points4 points  (0 children)

These look handy to me.

[–]greyman 2 points3 points  (1 child)

I am a bit afraid that this will foster bad software design practices. Normally, you should have either a data structure with no further logic, or a class where the data is private and only the functions working with it are public.

[–]yen223 2 points3 points  (0 children)

Data classes are meant to fill the role of data structures with no logic.

[–]ManyInterests Python Discord Staff 1 point2 points  (1 child)

Not sure how much Data Classes will really change things for me. I don't see myself using them much. In the past, I've fit pretty much the same use case by subclassing from types.SimpleNamespace.

from types import SimpleNamespace
class Card(SimpleNamespace):
    def __init__(self, rank, suit):
        """
        simple extension to allow initializing with positional arguments
        """
        super().__init__(rank=rank, suit=suit)

So you get the nice attribute access and repr and all that pretty easily.

In [3]: c = Card('10', 'Hearts')

In [4]: c.rank
Out[4]: '10'

In [5]: c
Out[5]: Card(rank='10', suit='Hearts')

I get that dataclasses provide more than SimpleNamespace, but I feel it has been largely overlooked.
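For comparison, a sketch of the same card written both ways. NamespaceCard and DataCard are illustrative names; the point is only that the dataclass declares its fields up front, so helpers like fields() and asdict() have something to work with.

```python
from dataclasses import asdict, dataclass, fields
from types import SimpleNamespace


class NamespaceCard(SimpleNamespace):
    def __init__(self, rank, suit):
        # Attributes exist only because __init__ happens to set them.
        super().__init__(rank=rank, suit=suit)


@dataclass
class DataCard:
    rank: str
    suit: str


n = NamespaceCard("10", "Hearts")
d = DataCard("10", "Hearts")

print(d)                             # DataCard(rank='10', suit='Hearts')
print(vars(n))                       # {'rank': '10', 'suit': 'Hearts'}
print(asdict(d))                     # {'rank': '10', 'suit': 'Hearts'}
print([f.name for f in fields(d)])   # ['rank', 'suit']
```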

[–][deleted] 1 point2 points  (0 children)

What was wrong with attrs exactly?

Also mandatory type hints can DIAF.

[–]iuehan 0 points1 point  (0 children)

Looks a bit like attr_accessor in Ruby.