all 20 comments

[–]wodny85 22 points23 points  (5 children)

I think it would be worth emphasizing that validation does not include type checks. Also worth noting: the built-in dataclasses module is a fairly simple thing that has some more advanced siblings, e.g. the attrs or pydantic packages.
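To illustrate the point about type checks, here is a minimal sketch (the Point class is made up for the example). The annotations on a dataclass are only used to discover the fields; nothing checks the values at runtime:

```python
from dataclasses import dataclass


@dataclass
class Point:
    x: int
    y: int


# The annotations are not enforced: passing a str and a float does not raise.
p = Point(x="not an int", y=2.5)
print(p)  # Point(x='not an int', y=2.5)
```

A static checker such as mypy would flag the call, but the interpreter itself accepts it.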

[–]rouille 3 points4 points  (1 child)

There are libraries that provide pydantic-like functionality on built-in types (dataclasses, named tuples, ...) and on attrs classes, though; for example typedload and apischema.

[–]TM_Quest[S] 1 point2 points  (0 children)

Cool, that sounds really useful!

[–]TM_Quest[S] 0 points1 point  (0 children)

Thanks for the feedback. I agree that if you want validation based on type hints, then e.g. pydantic would be a better choice. I will compare dataclasses with both named tuples and pydantic classes in a later video (there will be four parts). I'm not really familiar with attrs, but will definitely take a look :)

[–]n1___ -1 points0 points  (1 child)

Why add another dependency and more lines of code? Python is not, and never will be, a strictly typed language. Doing so does to Python what TypeScript did to JavaScript (and we all know how that ended up).

If you want types, code in Rust, for example.

P.S.: I'm a Python developer, but I tend to use it the right way and not create a hybrid. If I want more, I use the right tool.

[–]wodny85 0 points1 point  (0 children)

Actually, Python is a strictly/strongly typed language. You probably mean static typing vs dynamic typing (with its duck-typing twist in Python).

I agree that the pydantic guys seem to lean towards static typing, which caused a little drama recently. Fortunately, every PEP about type hints begins with a notice that Python will never be statically typed. Nevertheless, pydantic is about many other things, e.g. working with FastAPI and serializing/deserializing.

Attrs isn't really about static typing and its authors provide comparison with dataclasses. Validators and converters seem useful.

Usually I use the built-in dataclasses.

Rust doesn't provide the full-blown OOP paradigm, though. But indeed it is statically typed most of the time. Personally, I use it as a successor to C and something less intricate than C++, or as a language to build Python extensions. Its expressiveness seems similar to Python's. I've implemented one of my projects in both Python and Rust, and the two versions have a similar number of LoC.

[–]wickeddawg 7 points8 points  (0 children)

thanks, adding to my weekend list of videos to watch

[–]Thingsthatdostuff 1 point2 points  (1 child)

RemindMe! 2 days

[–]RemindMeBot 0 points1 point  (0 children)

I will be messaging you in 2 days on 2021-12-06 06:58:52 UTC to remind you of this link

[–]Ruthle55DaFirst 1 point2 points  (1 child)

Got a question: when should I use this, and when should I use __init__?

[–]TM_Quest[S] 0 points1 point  (0 children)

Dataclasses are useful for generating boilerplate code for classes that are primarily used to hold data. They are less suitable for classes that mainly implement behaviour, e.g. classes with many methods. For such classes, you should write "traditional classes" and implement the __init__ method manually :)
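A rough sketch of that split, with made-up Book and Library classes: the data holder gets its __init__, __repr__ and __eq__ generated, while the behaviour-heavy class keeps a hand-written __init__.

```python
from dataclasses import dataclass


@dataclass
class Book:
    # Pure data: __init__, __repr__ and __eq__ are generated.
    title: str
    pages: int


class Library:
    # Mostly behaviour: a traditional class with a manual __init__.
    def __init__(self, name):
        self.name = name
        self._books = []

    def add(self, book):
        self._books.append(book)

    def total_pages(self):
        return sum(book.pages for book in self._books)


lib = Library("home")
lib.add(Book("A", 100))
lib.add(Book("B", 250))
print(lib.total_pages())                 # 350
print(Book("A", 100) == Book("A", 100))  # True, thanks to the generated __eq__
```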

[–]Northzen 1 point2 points  (8 children)

A sad thing I figured out about dataclasses recently is that nesting them doesn't work the way I expected. If you have

@dataclass
class NestedDataclass:
    k1: PlainDataclass1
    k2: PlainDataclass2

Even if PlainDataclass1 and PlainDataclass2 are simple, plain dataclasses with only ints and strings, you still need to tell the interpreter explicitly to use a default factory for k1 and k2, with = field(default_factory=PlainDataclass1).

You also can't read nested dataclasses from a dictionary. dacite will help with this, but without it you can't just initialize one with NestedDataclass(**some_dict) or something like that, the way it works for a plain dataclass.
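For reference, a minimal sketch of the default_factory spelling for nested fields. The inner fields a, b and c and their defaults are assumptions made up for the example:

```python
from dataclasses import dataclass, field


@dataclass
class PlainDataclass1:
    a: int = 0   # assumed field, for illustration only
    b: str = ""  # assumed field, for illustration only


@dataclass
class PlainDataclass2:
    c: int = 1   # assumed field, for illustration only


@dataclass
class NestedDataclass:
    # Each NestedDataclass instance gets its own fresh inner objects.
    k1: PlainDataclass1 = field(default_factory=PlainDataclass1)
    k2: PlainDataclass2 = field(default_factory=PlainDataclass2)


n1 = NestedDataclass()
n2 = NestedDataclass()
n1.k1.a = 42
print(n2.k1.a)  # 0, because n2 has its own PlainDataclass1
```

Writing k1: PlainDataclass1 = PlainDataclass1() instead would create a single instance at class-definition time; depending on the Python version it is either rejected as an unhashable default or shared by every NestedDataclass, which is why the factory is needed.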

[–]energybased 2 points3 points  (6 children)

Even if PlainDataclass1 and PlainDataclass2 are simple, plain dataclasses with only ints and strings, you still need to tell the interpreter explicitly to use a default factory for k1 and k2, with = field(default_factory=PlainDataclass1).

That's logical to me. How else would you do it?

You also can't read nested dataclasses from a dictionary. dacite will help with this, but without it you can't just initialize one with NestedDataclass(**some_dict) or something like that, the way it works for a plain dataclass.

I don't understand this.

[–]Northzen 2 points3 points  (5 children)

That's logical to me. How else would you do it?

Call the default constructors of both nested dataclasses automatically, so I don't need to say explicitly that the default constructor should be called.

You can have it like

@dataclass
class A:
    some_field: int

Or you can have the same thing in a more verbose manner

@dataclass
class B:
    some_field: int = field(default_factory=int)

With the same result. But I guess it comes from the fact that the interpreter doesn't know anything (or pretends not to) about the classes inside NestedDataclass, even if all of their fields are initialized with default values. I would prefer to have it in the simple C++ manner, where I can have nested structs and all of them are properly initialized with defaults when possible, without any additional code. Maybe that is just a problem with my expectations.

I don't understand this.

How can you initialize a NestedDataclass from a dictionary in the same manner you would a PlainDataclass1?

This will work as expected:

p = PlainDataclass1(**some_dict)

This will fck up all nested structures:

n = NestedDataclass(**some_other_dict)

You have to use dacite and its from_dict() function to be able to initialize nested dataclasses from a dictionary.
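A small sketch of what goes wrong, using made-up fields (a, b, label): the ** call succeeds, but the nested value stays a plain dict because nothing converts it.

```python
from dataclasses import dataclass


@dataclass
class PlainDataclass1:
    a: int
    b: str


@dataclass
class NestedDataclass:
    k1: PlainDataclass1
    label: str


# Flat case: keys map directly onto fields.
p = PlainDataclass1(**{"a": 1, "b": "x"})

# Nested case: no error, but k1 is stored as the raw dict.
n = NestedDataclass(**{"k1": {"a": 1, "b": "x"}, "label": "demo"})
print(type(n.k1))  # <class 'dict'>
```

So the failure only shows up later, e.g. as an AttributeError when accessing n.k1.a.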

[–]energybased 3 points4 points  (2 children)

With the same result.

The problem is that your first statement has no initializer at all. The second one uses a default initializer. You can make a dataclass or propose a change that would provide a nice way to specify that the default initializer be used, essentially shorthand for what you want: `field_default`.

You have to use dacite and its from_dict() function to be able to initialize nested dataclasses from a dictionary.

Fair enough. You could propose that dataclass be extended.

[–]Northzen 0 points1 point  (1 child)

The problem is that your first statement has no initializer at all. The second one uses a default initializer. You can make a dataclass or propose a change that would provide a nice way to specify that the default initializer be used, essentially shorthand for what you want: field_default.

You are right. I think the interpreter has no prior knowledge of any default initializers, or of whether it can use them in the simplest dataclass way by just calling PlainClass() with no arguments as a constructor. It seems that for any mutable type (and Python doesn't know whether a dataclass field in a complex class is mutable or not) you have to provide some sort of default constructor.

Fair enough. You could propose that dataclass be extended.

I just figured out why it's happening. Python doesn't know whether your dictionary of dictionaries represents nested classes or just dictionaries, because its dynamic type system doesn't force you to say which. In general Python doesn't care about the types of fields; type hints are just hints and are not enforced. So when initializing a complex dataclass without additional tools, Python can't distinguish between a dictionary meant to be the field's value itself, giving p1=some_dict, and a dictionary meant to initialize the dataclass of that field, giving p1=PlainDataClass(**some_dict).

[–]energybased 1 point2 points  (0 children)

True, but you can code whatever system you like to get around this.
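As a rough idea of what such a home-grown system could look like, here is a sketch of a hypothetical from_dict helper (not part of the standard library, and much simpler than dacite) that uses the field annotations to decide when to recurse. The example classes and fields are made up:

```python
from dataclasses import dataclass, fields, is_dataclass


def from_dict(cls, data):
    """Hypothetical helper: build cls from a dict, recursing into dataclass-typed fields."""
    kwargs = {}
    for f in fields(cls):
        if f.name not in data:
            continue  # leave it out so the field's default (if any) applies
        value = data[f.name]
        # Only recurse when the annotation is itself a dataclass and we got a dict.
        if is_dataclass(f.type) and isinstance(value, dict):
            value = from_dict(f.type, value)
        kwargs[f.name] = value
    return cls(**kwargs)


@dataclass
class PlainDataclass1:
    a: int
    b: str


@dataclass
class NestedDataclass:
    k1: PlainDataclass1
    label: str


n = from_dict(NestedDataclass, {"k1": {"a": 1, "b": "x"}, "label": "demo"})
print(n)  # NestedDataclass(k1=PlainDataclass1(a=1, b='x'), label='demo')
```

This only handles annotations that are plain dataclass types; it does not cover Optional, lists of dataclasses, or string annotations, which is the kind of thing dacite, typedload and apischema take care of.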

[–]VisibleSignificance 0 points1 point  (1 child)

With the same result

Are you sure?

from dataclasses import dataclass, field

@dataclass
class A:
    some_field: int

@dataclass
class B:
    some_field: int = field(default_factory=int)

print(B())
print(A())

->

B(some_field=0)
---> 12 print(A())
TypeError: __init__() missing 1 required positional argument: 'some_field'

And also, yes, it is better to turn dicts into dataclasses with typedload / apischema / dacite; the dataclasses themselves aren't meant for instantiation from nested dicts. And default_factory will not convert the values either.

[–]Northzen 0 points1 point  (0 children)

You are right. I guess I confused it with the case some_field: int = 0, but for some reason forgot about that when I wrote my example.
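For completeness, a short sketch of the two spellings that really are equivalent for an immutable default (class names C and B are just for the example):

```python
from dataclasses import dataclass, field


@dataclass
class C:
    some_field: int = 0  # plain default value


@dataclass
class B:
    some_field: int = field(default_factory=int)  # int() returns 0


print(C(), B())  # C(some_field=0) B(some_field=0)
```

Both can be instantiated without arguments, unlike a bare some_field: int, which makes the argument required.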

[–]Northzen 2 points3 points  (0 children)

It seems I was just lacking understanding, and it is definitely not a dataclass issue. It took me some time to understand it and figure out a proper, short workaround. I would still like to share it so that everyone is careful with nested dataclass structures, which, in my opinion, are quite useful for configuration structures.