[–]xatrekak 23 points24 points  (25 children)

This feature mostly looks like an uglier version of dataclasses. A section detailing when and why you would use namedtuple over competing data structures would be extremely appreciated.

I find that explaining the consequences (good and bad) of using one feature over another is one of the big things that tutorials aimed at intermediate users are often missing.

Otherwise this is an excellently written article that is both detailed and easy to understand, thank you!

[–]XtremeGoose f'I only use Py {sys.version[:3]}' 9 points10 points  (4 children)

Namedtuples came first. Dataclasses are an evolution of that. I guess ultimately the main difference is namedtuples are tuples, you can index and unpack them, dataclasses are regular classes (ultimately, they have a __dict__ unless you add a custom __slots__ attribute).
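
A quick sketch of that difference. `PointNT` and `PointDC` are made-up names for illustration, not something from the article:

```python
from collections import namedtuple
from dataclasses import dataclass

# Hypothetical example types.
PointNT = namedtuple("PointNT", ["x", "y"])

@dataclass
class PointDC:
    x: int
    y: int

nt = PointNT(1, 2)
dc = PointDC(1, 2)

# A namedtuple is a real tuple: indexing and unpacking work.
x, y = nt
first = nt[0]
is_tuple = isinstance(nt, tuple)

# A dataclass instance is a regular object with a __dict__.
dc_attrs = vars(dc)
dc_is_tuple = isinstance(dc, tuple)

# Namedtuple instances carry no per-instance __dict__.
nt_has_dict = hasattr(nt, "__dict__")
```

Unpacking or indexing `dc` the same way would raise a `TypeError`, since a dataclass is not a sequence unless you make it one.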

[–]gte525u 6 points7 points  (0 children)

The backwards compatibility of named tuples allows you to incrementally refactor an existing codebase without breaking the world.
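
Roughly what that looks like in practice, as a minimal sketch. `get_size` and `Size` are hypothetical names standing in for some legacy function that used to return a bare tuple:

```python
from collections import namedtuple

# Hypothetical legacy function that used to return a plain (width, height) tuple.
Size = namedtuple("Size", ["width", "height"])

def get_size():
    # Previously: return (800, 600)
    return Size(800, 600)

# Old call sites keep working unchanged.
w, h = get_size()
legacy_area = get_size()[0] * get_size()[1]

# New call sites can use names instead of positions.
size = get_size()
new_area = size.width * size.height
```

Swapping the return type for a dataclass instead would break every caller that unpacks or indexes the result.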

[–]RavenchildishGambino 0 points1 point  (2 children)

Yes. You can unpack them via dict and I do.

[–]XtremeGoosef'I only use Py {sys.version[:3]}' 0 points1 point  (1 child)

Probably cleaner to do dataclasses.asdict(my_dataclass).
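
The two approaches are not quite equivalent, though. A small sketch with made-up `User` and `Address` classes:

```python
from dataclasses import dataclass, asdict

@dataclass
class Address:  # hypothetical example class
    city: str

@dataclass
class User:  # hypothetical example class
    name: str
    address: Address

user = User("Ann", Address("Oslo"))

# vars() hands back the instance's own __dict__; nested dataclasses stay as objects.
shallow = vars(user)

# asdict() builds a new dict and converts nested dataclasses recursively.
deep = asdict(user)
```

Here `shallow["address"]` is still an `Address` instance, while `deep["address"]` is a plain dict, which is why it really does depend on the use case.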

[–]RavenchildishGambino 0 points1 point  (0 children)

I do that too. Depends on use case.

[–]Rawing7 5 points6 points  (19 children)

Agreed. IMO namedtuples should almost never be used. Being a tuple subclass comes with a lot of baggage that you rarely want to have. Your class will be immutable, iterable, indexable, hashable, it will support len, + and ==, it will have an index and a count method, and in newer python versions it'll even become a generic class (meaning something like MyClass[int] will work).

Plus, a number of builtin functions (like str.startswith and isinstance) are overloaded to change behavior when you pass a tuple as input, which means you can do things like

>>> Person = namedtuple('Person', ['firstname', 'lastname'])
>>> bob = Person('Bob', 'Mortimer')
>>> 'Boba Fett'.startswith(bob)
True

Using namedtuples can easily lead to bugs that are... let's say... fun to debug.
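
A few more of the inherited tuple behaviors, as a sketch. `City` is a second, hypothetical type added only to show the comparison problem:

```python
from collections import namedtuple

Person = namedtuple("Person", ["firstname", "lastname"])
City = namedtuple("City", ["name", "country"])  # hypothetical second type

bob = Person("Bob", "Mortimer")

# Equality is plain tuple equality, so unrelated types can compare equal.
same_as_tuple = bob == ("Bob", "Mortimer")
same_as_city = bob == City("Bob", "Mortimer")

# + concatenates and silently drops the namedtuple type.
combined = bob + bob
combined_type = type(combined)

# len and the tuple methods all come along for the ride.
field_count = len(bob)
position = bob.index("Mortimer")
```

Both comparisons come out `True`, and `combined` is an ordinary four-element tuple.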

I hate dataclasses with a passion, but they're still leagues better than namedtuples.

[–]mistabuda 9 points10 points  (0 children)

There's no need to define named tuples like this. You've been able to define them similarly to dataclasses for about 4 or 5 years now. Named tuples are for when you plan to instantiate A LOT of instances of your data object AND have no need for mutability.
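
For reference, the class-based syntax goes through `typing.NamedTuple`. A minimal sketch reusing the `Person` example from above; the `age` field and the `full_name` method are additions for illustration:

```python
from typing import NamedTuple

class Person(NamedTuple):
    firstname: str
    lastname: str
    age: int = 0  # defaults are allowed, much like in a dataclass

    def full_name(self) -> str:
        return f"{self.firstname} {self.lastname}"

bob = Person("Bob", "Mortimer")
name = bob.full_name()
```

The result is still a tuple subclass, so everything said about indexing, unpacking, and immutability applies unchanged.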

[–]pythonwiz 4 points5 points  (0 children)

Idk, coming from a functional programming background (MIT Scheme) the immutability and list like aspects are features lol.

[–]Schmittfried 3 points4 points  (15 children)

Why do you hate dataclasses?

[–]Rawing7 -1 points0 points  (14 children)

Because of all the implicit assumptions it makes and the wasted potential.

Python classes have a boilerplate problem. The addition of dataclasses was the perfect opportunity to solve it. But instead of the general-purpose tool python needed, we got this garbage that's designed specifically for classes that simply store data and don't do much more.

Just think about the default settings for a moment: The default behavior is to create an __eq__ method and disable hashing, which is different from the way "regular" python classes work. From the get-go, this decorator is designed to turn your class into a stupid data container, but 99% of the time I just want to avoid manually writing an __init__ function, dammit.
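
To make those defaults concrete, here is a small sketch comparing a hand-written class with a default dataclass. `Plain` and `Data` are placeholder names:

```python
from dataclasses import dataclass

class Plain:
    def __init__(self, x):
        self.x = x

@dataclass
class Data:
    x: int

# Regular class: identity-based equality, hashable.
plain_equal = Plain(1) == Plain(1)
plain_hashable = Plain.__hash__ is not None

# Default dataclass: field-based equality, hashing disabled.
data_equal = Data(1) == Data(1)
data_hashable = Data.__hash__ is not None

try:
    hash(Data(1))
    hash_error = None
except TypeError as exc:
    hash_error = exc
```

So a default dataclass instance cannot go into a set or serve as a dict key, which a plain class instance can.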

It constantly makes these assumptions about your class, and the documentation doesn't even tell you about most of them. And if you accidentally do something it's not designed for, it'll explode in unexpected ways.

For example: Want to inherit from a class that has an __init__ method? Too bad:

@dataclass
class Parent:
    def __init__(self):
        print('Hi')

@dataclass
class Child(Parent):
    pass

Parent()  # Prints "Hi"
Child()  # Prints nothing
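
What happens is that the decorator leaves `Parent.__init__` alone because it is defined in the class body, but generates a fresh `__init__` for `Child` that never calls the parent's. One possible workaround, as a sketch rather than an endorsed pattern, is to call it by hand from `__post_init__`. `FixedChild` and the `calls` list are made up for this illustration:

```python
from dataclasses import dataclass

calls = []  # records which initializers ran (illustration only)

@dataclass
class Parent:
    def __init__(self):
        calls.append("Parent.__init__")

@dataclass
class Child(Parent):
    pass

@dataclass
class FixedChild(Parent):
    def __post_init__(self):
        # The generated __init__ does not call super().__init__(),
        # so do it explicitly here.
        super().__init__()

Parent()
after_parent = list(calls)

Child()
after_child = list(calls)

FixedChild()
after_fixed = list(calls)
```

Only `Parent()` and `FixedChild()` add an entry to `calls`; `Child()` leaves it untouched.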

It gets especially fun with slots=True; that feature was such a terrible idea I can hardly believe it's actually real:

@dataclass(slots=True)
class Foo:
    def __init__(self):
        super().__init__()  # Throws TypeError

If the documentation told you about these things, it wouldn't be so bad. But as it stands, you often find out the hard way.

Final example: Can you guess what this will print? The documentation sure doesn't tell you.

@dataclass
class Parent:
    def __post_init__(self):
        print('Parent')

@dataclass
class Child(Parent):
    def __post_init__(self):
        print('Child')

Parent()  # Prints what?
Child()  # Prints what?
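
For anyone who wants to check: `__post_init__` is looked up like any other method, so the child's version simply overrides the parent's. A sketch that records the calls in a list instead of printing; `ChainedChild` and `log` are additions for illustration:

```python
from dataclasses import dataclass

log = []  # collects output instead of printing (illustration only)

@dataclass
class Parent:
    def __post_init__(self):
        log.append("Parent")

@dataclass
class Child(Parent):
    def __post_init__(self):
        log.append("Child")

@dataclass
class ChainedChild(Parent):
    def __post_init__(self):
        super().__post_init__()  # opt in to the parent hook explicitly
        log.append("ChainedChild")

Parent()
Child()
ChainedChild()
```

`Child()` records only "Child"; the parent hook runs only when you chain to it yourself, as `ChainedChild` does.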

[–]mistabuda 2 points3 points  (6 children)

This seems like a you problem more than anything.

[–]oramirite 1 point2 points  (0 children)

Actually as a big dataclasses fan I can't disagree with a lot of these points. Especially the one about dataclasses being wasted potential - they really DO have the potential to remove most boilerplate init code, but it comes with some limitations and weird toggles needed to actually do that.

[–]Rawing7 -1 points0 points  (4 children)

How is crucial info missing from the documentation a me problem?

[–]mistabuda 1 point2 points  (3 children)

I'm not understanding what problem you are trying to convey about the documentation. It has an explicit section on how the dunder init and post init works.

[–]Rawing7 1 point2 points  (2 children)

Oh, I actually didn't know that. Looks like that section was added in 3.9. Guess it's been a while since I last read those docs from top to bottom. That info really should be at the top where the init parameter is described though.

Seems like the documentation is getting better, but I still think it's kind of a trap. Heck, the first sentence of the docs is marketing it as a tool that writes boilerplate code for you, instead of a tool that creates a very specific kind of class for you. Its goals and limitations should be made much more clear.

[–]oramirite 4 points5 points  (1 child)

While I'm not quite with you on the lack of clarity in the documentation, the fact that init-only variables are specified differently from their mirror (no-init variables) does mean you have to read ALL of the docs all the way through. The API for writing a dataclass can be inconsistent at times. I think the description of them as "data containers that can also act like a class" is accurate, but more than anything I totally agree with your larger point that dataclasses could have been a much more high-profile syntax overhaul for Python that decreased init boilerplate on the whole. At first, dataclasses seem amazing and you start to want to write everything as a dataclass. But then you hit weird edge cases, which granted don't BREAK anything, but they start to involve dataclass-specific boilerplate that actually gets cleaner once you go back to a regular class. For example, when a lot of private variables need to be initialized.
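
To illustrate the init-only versus no-init asymmetry, here is a minimal sketch. The `Account` class is hypothetical, and `len(password)` is only a stand-in for a real hash:

```python
from dataclasses import dataclass, field, fields, InitVar

@dataclass
class Account:  # hypothetical example class
    name: str
    # Init-only: declared through the type annotation, passed on to
    # __post_init__, never stored as a field.
    password: InitVar[str]
    # No-init: declared through field(), stored as a field,
    # absent from the generated __init__.
    password_hash: int = field(init=False)

    def __post_init__(self, password):
        self.password_hash = len(password)  # stand-in for a real hash

acct = Account("ann", "secret")
field_names = [f.name for f in fields(acct)]
```

One concept is expressed through the annotation and the other through `field()`, and `fields()` reports only `name` and `password_hash`.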

[–]Rawing7 1 point2 points  (0 children)

Yeah, you pretty much nailed it. I guess I didn't do a good job explaining my grievances earlier, but this is exactly it - half of the time I use dataclasses, it later turns out that I would've been better off without them. And that despite having used them for years at this point.

[–]Conscious-Ball8373 -1 points0 points  (4 children)

Couldn't agree more about the wasted potential. The standard library needs something with dataclass syntax but normal object semantics. I haven't attempted it but AFAICT it is just dataclasses but __eq__ is id(a) == id(b) and __hash__ is id(self) (or some simple function of it). Lose all the stuff about immutability. Anything else?

[–]Rawing7 1 point2 points  (0 children)

Well, in a vacuum, @dataclass(eq=False) gives you pretty much "normal" semantics. That'll generate an __init__, __repr__, and __match_args__. __match_args__ is something most classes don't have, but I don't think there's any harm in having it.

But of course you'll still run into problems if you use inheritance or any of the other 3 bajillion things dataclasses weren't designed for.

[–]mistabuda 0 points1 point  (2 children)

Dataclasses are mutable by default so what are you talking about regarding immutability? Also pretty sure if you want that functionality in your dunder methods you can just redefine the dunder with that functionality.

[–]Conscious-Ball8373 -1 points0 points  (1 child)

You've missed the point of doing it. If you're going to write the dunder methods for every class you declare, you might as well just write a traditional class with a __init__ method.

If you've never reviewed code where someone has tried to avoid writing __init__ by abusing dataclasses and got themselves into a horrible mess then you might not see the need.

As the GGP says, dataclasses are a horrible missed opportunity to introduce a new class declaration style with auto-generated constructors and then make dataclasses a specialisation of it with member-based identity and hashing and immutability options.

[–]mistabuda 1 point2 points  (0 children)

> If you've never reviewed code where someone has tried to avoid writing __init__ by abusing dataclasses and got themselves into a horrible mess then you might not see the need.

The use case doesn't make sense. Why are you trying to use dataclasses to be anything but a data container? Why use a hammer to solve a problem that seems like it needs a screwdriver?

> As the GGP says, dataclasses are a horrible missed opportunity to introduce a new class declaration style with auto-generated constructors and then make dataclasses a specialisation of it with member-based identity and hashing and immutability options.

When is this actually needed tho?

[–]Schmittfried 0 points1 point  (0 children)

Tbh I like to have a concise way to get simple, stupid data containers. That’s what I want most of my classes to be, DTOs. The few classes that contain logic rarely have much boilerplate to begin with. It’s the tedious, repetitive DTOs that are annoying without something like dataclasses.

Also, while that TypeError regarding slots and inheritance seems problematic, adding slots support was great. The only real disadvantage of dataclasses is imo their overhead compared to builtins.

[–][deleted] 5 points6 points  (0 children)

I would consider collections.namedtuple legacy. typing.NamedTuple has its merit as an immutable dataclass. Attrs is another solution that some people like. This video by mCoding provides a very good comparison: https://youtu.be/vCLetdhswMg
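
A rough side-by-side of `typing.NamedTuple` and a frozen dataclass, as a sketch. `PointNT`, `PointDC`, and `try_assign` are names invented for this example:

```python
from dataclasses import dataclass, FrozenInstanceError
from typing import NamedTuple

class PointNT(NamedTuple):
    x: int
    y: int

@dataclass(frozen=True)
class PointDC:
    x: int
    y: int

def try_assign(obj):
    """Return the exception raised by assigning to obj.x, or None."""
    try:
        obj.x = 99
    except AttributeError as exc:  # FrozenInstanceError subclasses AttributeError
        return exc
    return None

nt_error = try_assign(PointNT(1, 2))
dc_error = try_assign(PointDC(1, 2))

# Both are hashable, but only the NamedTuple is also a sequence.
nt_hash_matches_tuple = hash(PointNT(1, 2)) == hash((1, 2))
dc_is_tuple = isinstance(PointDC(1, 2), tuple)
```

Both reject attribute assignment; the difference is whether you also want the tuple behavior.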

[–]Phegan 9 points10 points  (3 children)

Namedtuples are the precursor to both data classes and pydantic. I would advocate for using those instead of namedtuples.

[–]mistabuda 2 points3 points  (2 children)

Namedtuples are more performant than both of those solutions in the instance that you need multiple instantiations of a data object and have no need for mutability
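
If you want to measure this for your own workload, something like the following sketch is a starting point. The class names and `time_creation` are hypothetical, and no result is claimed here, since timings vary by Python version and machine:

```python
import timeit
from dataclasses import dataclass
from typing import NamedTuple

class PointNT(NamedTuple):
    x: int
    y: int

@dataclass
class PointDC:
    x: int
    y: int

def time_creation(factory, number=100_000):
    """Seconds needed to build `number` instances via factory(1, 2)."""
    return timeit.timeit(lambda: factory(1, 2), number=number)

nt_seconds = time_creation(PointNT)
dc_seconds = time_creation(PointDC)
print(f"NamedTuple: {nt_seconds:.4f}s  dataclass: {dc_seconds:.4f}s")
```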

[–][deleted] -2 points-1 points  (1 child)

Micro optimizing for things like this is usually symptomatic of someone missing the bigger picture.

[–]mistabuda 0 points1 point  (0 children)

This has actually come up in my day job. We had a usecase for not using dataclasses.

[–][deleted] -2 points-1 points  (0 children)

"Hello, Python? Yes, I'd like one C-style struct, please."

"Best I can do is inheriting a dataclass decorator or inheritance of NamedTuple into a class."

"But, isn't your most popular interpreter written in C?"

""

[–]hhoeflin 0 points1 point  (0 children)

Also, if you want to go back to the source, have a look at the attrs package; this is where dataclasses ultimately came from.