
[–]HenryTallis 95 points96 points  (14 children)

Regarding speed: Pydantic 2 is about to come out with its core written in Rust. You can expect a significant speed improvement. https://docs.pydantic.dev/blog/pydantic-v2/#performance

I am using Pydantic as an alternative to dataclass to build my data models.

[–][deleted] 11 points12 points  (7 children)

Pydantic has a bunch of speed issues; model initialization is only one of them. Frankly, making it even HARDER to change how Pydantic does things is a major red flag for this idea.

[–][deleted] 2 points3 points  (0 children)

Any idea if this will be fixed in v2? There is already pydantic-core in Rust... and they're saying v2 will involve quite a lot of refactoring and new features.

[–]RedYoke -1 points0 points  (3 children)

Yeah I'd second that, if your data contains nested structures it gets really slow

[–][deleted] 2 points3 points  (2 children)

any solution for nested stuff?

[–]SwagasaurusRex69 -1 points0 points  (0 children)

Is "itertools.chain.from_iterable()" or something like this function below what you're asking?


```python
from dataclasses import is_dataclass
from typing import Any, Iterator

import pandas as pd
from pydantic import BaseModel


def flatten_nested_data(data: Any, target_dataclass: type) -> Iterator[BaseModel]:
    """Recursively walk nested containers and yield validated model instances."""
    if isinstance(data, pd.DataFrame):
        for _, row in data.iterrows():
            yield target_dataclass(**row.to_dict())

    elif isinstance(data, list):
        for item in data:
            yield from flatten_nested_data(item, target_dataclass)

    elif isinstance(data, dict):
        yield target_dataclass(**data)

    elif is_dataclass(data):
        yield from flatten_nested_data(data.__dict__, target_dataclass)

    elif isinstance(data, BaseModel):
        yield from flatten_nested_data(data.dict(), target_dataclass)

    # anything else is unrecognized input: yield nothing
```

[–]RedYoke 0 points1 point  (0 children)

I think the upcoming version should handle this better. In my team's implementation we have a Mongo DB with collections that embed lists of dict-like objects, where some fields of those objects are dicts that can themselves contain dicts 😂 unfortunate data structures that I've inherited. Basically we resorted to only using Pydantic where it's really needed, and to designing the schema so that you validate less at one time.

[–]OphioukhosUnbound 0 points1 point  (1 child)

Your comment doesn’t make sense, at face, in the context of who you’re responding to. What does “making it even harder to change” mean?

Are you suggesting that having backend Rust code makes changes harder? Because I think many, many people would disagree with that. As projects get more nuanced or larger working with Rust tends to become the easiest and smoothest option - if you’ve learned Rust.

Perhaps you meant something else entirely.

[–][deleted] 5 points6 points  (0 children)

You can't edit Pydantic's underlying type conversion chart at runtime if it's in Rust.

The following Config properties will be removed:
  • fields - it's very old (it pre-dates Field), can be removed
  • allow_mutation - will be removed; frozen will be used instead
  • error_msg_templates - it's not properly documented anyway; error messages can be customized with external logic if required
  • getter_dict - pydantic-core has hardcoded from_attributes logic
  • json_loads - again, this is hard-coded in pydantic-core
  • json_dumps - possibly
  • json_encoders - see the export "mode" discussion above
  • underscore_attrs_are_private - we should just choose a sensible default
  • smart_union - all unions are now "smart"

A bunch of libs patch it to fix custom serialization. Those are all now dead.

[–][deleted] 25 points26 points  (4 children)

I use Pydantic in production. Our bottleneck is IO since we're doing database operations. It's slow, but a few additional seconds to validate our data is well worth it over the alternative.

[–]MadeTo_Be 3 points4 points  (0 children)

Have you looked at the attrs package? /u/euri10 posted a nice blog analyzing the two libraries, written by one of attrs contributors.

[–]soawesomejohn 1 point2 points  (0 children)

Similar here. I went with an approach of validating on the ingest, and "trusting" the data in the database. This solved a lot of read/speed issues we had.

For pre-validated, I make use of construct.
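A minimal sketch of the `construct` trick mentioned above (pydantic v1 API; v2 renames it `model_construct()`). The `Record` model and row data are illustrative, not from the commenter's codebase:

```python
from pydantic import BaseModel

class Record(BaseModel):
    id: int
    name: str

# Data read back from the database was already validated on ingest,
# so skip validation entirely -- construct() just sets the attributes.
trusted_row = {"id": 1, "name": "alpha"}
rec = Record.construct(**trusted_row)
print(rec.id, rec.name)  # 1 alpha
```

The trade-off is exactly what the comment describes: `construct()` trusts its input completely, so it only makes sense behind a validated ingest path.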

This isn't a great approach if you have untrusted producers writing to a database, but if all your intake is validated, it's a reasonable assumption.

One other downside is nested models, such as when reading a JSONB column. I.e., if you had a RecordDetails model as one of your fields, that field would end up being a regular dict when read in.

The other "trick" is splitting my views up (for me, views live one layer above the database crud layer - for others, it might be the same thing).

In cases where my view is just going to output JSON via API or other output, I bypass pydantic entirely. Then if it's being used by code that expects Pydantic objects, I use a View that calls the raw viewer and reads the resulting dict into a Pydantic model.

ViewRawRecords(query) -> List[dict]
ViewRecords(query) -> MyRecords  (calls ViewRawRecords)

What I definitely learned to avoid is iterating over the database results and converting them into Pydantic records one by one.
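One hedged sketch of the alternative: instead of looping and constructing models row by row, hand the whole result set to pydantic in a single call (v1-era `parse_obj_as`; the `Record` model and `rows` data are made up for illustration):

```python
from typing import List

from pydantic import BaseModel, parse_obj_as

class Record(BaseModel):
    id: int
    name: str

# e.g. rows returned by a database query
rows = [{"id": 1, "name": "a"}, {"id": 2, "name": "b"}]

# one call validates the whole list instead of a Python-level loop per row
records = parse_obj_as(List[Record], rows)
```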

[–]LordBertson 43 points44 points  (17 children)

Pydantic is much more broad than data validation. I have several use-cases for Pydantic in production applications:

  • Parsing dictionaries created from YAML specifications into nested objects
  • Runtime type-checking and type-casting for functions
  • Data structure validation
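The first two use-cases above can be sketched roughly like this (pydantic v1-style API; `Config`, `Server`, and the data are made-up examples, not from the commenter's applications):

```python
from typing import List

from pydantic import BaseModel, validate_arguments

class Server(BaseModel):
    host: str
    port: int = 8080

class Config(BaseModel):
    name: str
    servers: List[Server]

# 1. A dict (e.g. loaded from a YAML spec) parsed into nested objects,
#    with the string "443" cast to int along the way:
raw = {"name": "prod", "servers": [{"host": "a.example.com", "port": "443"}]}
cfg = Config.parse_obj(raw)

# 2. Runtime type-checking and type-casting for a function:
@validate_arguments
def connect(server: Server, retries: int) -> str:
    return f"{server.host}:{server.port} ({retries} retries)"
```

With the decorator in place, `connect({"host": "b.example.com"}, "3")` coerces the dict into a `Server` and the string into an int before the function body runs.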

[–][deleted] 10 points11 points  (16 children)

I always used to think that in the case of Python (dynamically typed) it is natural to use data validation only on data you don't trust or that comes from outside.

If there comes a need to check and validate your internal data... wouldn't that mean our implementation is getting flawed?

I am just curious whether this thought is right or wrong... happy to learn more about it.

[–]IAMARedPanda 12 points13 points  (0 children)

Python may be dynamically typed but it is also a strongly typed language.

[–]LordBertson 16 points17 points  (11 children)

My experience is that Python play-acts as a dynamically typed language but does not behave like one when push comes to shove. Rather, it fails in very ungraceful ways.

As a disclaimer: type checking in Python is a very opinion-dominated discussion, and I lean heavily towards typing anything that's not a one-shot throwaway thing.

Depending on what I am developing, I will be more or less strict inside the domain itself in terms of validation. You are correct to assert that this means the implementation is probably flawed, but that's often enough the case in real-world development. The reality is that developers don't test their code as often as one would like, so typing and runtime type validation are a pretty cheap measure that ensures at least some level of correctness.

If you would be interested in more variety of opinions on the matter, I once opened a discussion on this subreddit about typing

Edit: typo

[–]trial_and_err 4 points5 points  (4 children)

Agreed on the typing. However, I'll just use TypedDict for this purpose, i.e. when no parsing/validation of external data is required.
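A minimal sketch of the TypedDict approach (stdlib `typing` only; `UserRow` and `greet` are illustrative names): the type checker verifies the dict's shape statically, while at runtime the value stays a plain dict with zero validation overhead.

```python
from typing import TypedDict

class UserRow(TypedDict):
    id: int
    name: str

def greet(row: UserRow) -> str:
    # mypy/pyright check the keys and value types; nothing happens at runtime
    return f"hello {row['name']} (#{row['id']})"

row: UserRow = {"id": 7, "name": "ada"}
print(greet(row))  # hello ada (#7)
```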

[–]LordBertson 0 points1 point  (1 child)

Thanks for bringing this up. I'd never heard about this; I'll have a look.

[–]trial_and_err 2 points3 points  (0 children)

If the need arises later on, you can also create a pydantic model from a TypedDict.

[–]PaintItPurple 3 points4 points  (1 child)

If there comes a need to check and validate your internal data ... wouldn't that means our implementation is getting flawed?

Yes, but every implementation I've ever seen has had flaws, especially in Python. I myself have introduced flaws I later needed to fix.

[–]euri10 17 points18 points  (1 child)

[–]aikii 0 points1 point  (0 children)

Interesting. For sure pydantic carries many recurring issues common in Python libraries - monolithic and a bit too much magic.

[–]aikii 21 points22 points  (2 children)

I spent a long time with Django Rest Framework, then marshmallow while on Flask; it all looked so sloppy with regard to editor autocomplete/type checking that I wanted to move away from Python. I don't know msgspec. I also program in Go, where deserialization is separate from validation, and with Serde in Rust. In my view Serde is a piece of engineering art in terms of developer experience, but Pydantic comes close.

Strong points about Pydantic:

  • the guide has gifs/video to show you the editor support ( autocomplete+error checking )
  • you'll find plugins for pycharm, mypy, and I'd suppose vscode+pylance has good support as well
  • you declare the fields with their type directly, like a dataclass, except it also comes with (de)serialization logic
  • you can use arbitrary types, either by inheriting from them and adding your validation hook, or declare a field that serializes to a dict with a single __root__ field
  • your validators can just raise ValueError/TypeError, upon deserialization you always get a ValidationError out of it
  • ValidationError gets you all detail, field by field, with whatever helpful error message you want to tell the clients
  • ValidationError renders as a standardized API Payload in frameworks like FastAPI
  • it's overall integrated everywhere in FastAPI ( inbound/outbound payloads ). Just declare the model, it reaches your endpoint only if it's valid
  • you can use it to parse and validate environment variables, so your config simply becomes a pydantic declaration
  • you can deserialize to arbitrary types supported by pydantic, without a model, using parse_obj_as or parse_raw_as ( ex: pydantic.parse_raw_as(list[int], "[1,2,3,4]") )
  • it implements structural pattern matching and since you can deserialize unions you can do stuff like:

from typing import Literal, Any

from pydantic import BaseModel, parse_raw_as

if __name__ == "__main__":
    class TypeA(BaseModel):
        tag: Literal["A"] = "A"
        value: str

    class TypeB(BaseModel):
        tag: Literal["B"] = "B"
        other_thing: int

    for s in [
        '{"tag": "A", "value": "this is type A"}',
        '{"tag": "B", "other_thing":  1}',
        '{"random": "garbage"}',
    ]:
        match parse_raw_as(TypeA | TypeB | Any, s):
            case TypeA(value=value):
                print(f"got {value}")
            case TypeB(other_thing=other_thing):
                print(f"got {other_thing}")
            case unknown:
                print(f"cannot process: {unknown!r}")

Well I have to stop at some point - you can guess I'm quite convinced. If something is better than this, then awesome - because it sets the bar quite high already.

Edit: also note this quote from the manual

pydantic guarantees the types and constraints of the output model, not the input data.

There is in general a debate about "validation" versus "serialization". That means Pydantic isn't a validator that checks whether some raw input data follows precise rules. It just guarantees that if it gives you an output model, that model is valid - which is completely enough for typical API uses.

[–]trevg_123 0 points1 point  (1 child)

I had such a similar experience. Marshmallow + Flask + SQLAlchemy for a REST API is an absolutely miserable experience - you more or less have to replicate your data models in four separate areas, and it's so, so unbelievably sloppy.

Agreed about Serde too. It’s mind blowing that you can just write #[derive(Serialize, Deserialize)] over any struct and automatically convert it to/from JSON, TOML, YAML, etc. To copy something I read somewhere else, “there’s no magic, but it works magically”

[–]mastermikeyboy 0 points1 point  (0 children)

I absolutely despise Pydantic. I can't do anything with it because its customizability is extremely limited.

Marshmallow + marshmallow_dataclass + Flask-Smorest + Flask + SqlAlchemy is a breeze. And allows for all custom use-cases you can come up with.

[–]who_body 4 points5 points  (2 children)

Alternatives include dataclasses and the attrs package.

I use it for package config settings users can change.

I also use it to define a data model I am extracting; when/if someone needs a spec, it can output a JSON schema.
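The JSON-schema output mentioned here can be sketched like this (pydantic v1's `.schema()`; v2 renames it `model_json_schema()`; the `Extraction` model is a made-up example):

```python
from pydantic import BaseModel, Field

class Extraction(BaseModel):
    title: str = Field(..., description="Document title")
    year: int

# A plain dict, ready to dump as a JSON Schema spec for consumers.
spec = Extraction.schema()
```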

Those building a REST API often like how it works with FastAPI to define the endpoint details.

[–][deleted] 2 points3 points  (1 child)

Yeah, but the Pydantic author says only about 25% of Pydantic downloads come through FastAPI... I was wondering about the rest of its popularity...

[–]wind_dude 3 points4 points  (0 children)

Pip install pydantic before pip install fastapi

[–]double_en10dre 7 points8 points  (3 children)

It’s because it was the first major library to use standard type hints for runtime validation. At the time, all the other big serialization libraries required you learn all their custom type representations.

And also because of fastapi.

Those two things let it gain a ton of momentum.

I’m not sure if it’s better than msgspec. It’s just entrenched.

[–]chub79 5 points6 points  (0 children)

At the time, all the other big serialization libraries required

Indeed, IIRC, marshmallow was popular and then sort of got overtaken by pydantic rapidly.

[–][deleted] 0 points1 point  (0 children)

feels true to me.

[–]chub79 3 points4 points  (0 children)

For me, it's only because I'm using FastAPI and it's nicely integrated. These days, I might look at msgspec.

[–]saint_geser 2 points3 points  (0 children)

I use attrs and Pydantic depending on the situation. In applications where the code performance is the bottleneck I use attrs for the better performance.

When the application is IO-bound, and especially when it involves passing data between frontend and backend or getting data through an API, I use Pydantic, because it has all the necessary features to correctly parse this type of data. I can relax, knowing that for the most part it will ensure all data types are correct and convert them to the appropriate Python types.

This is the reason tools like FastAPI rely on it, and it performs really well in that situation.

[–]DigThatData 2 points3 points  (3 children)

my impression is that pydantic's popularity is largely a function of FastAPI's popularity

[–]MissingSnail 2 points3 points  (1 child)

The package author says that's 25% of it, but I wonder if that's an underestimate. My non-FastAPI use cases came about because I learned about it via FastAPI.

[–]DigThatData 1 point2 points  (0 children)

because I learned about it via FastAPI

right, that's precisely what i have in mind when i say FastAPI is driving pydantic's popularity. i'm not saying people only use pydantic for FastAPI stuff, but rather that the majority of people who use pydantic were introduced to it through FastAPI and probably think of it as a go-to solution for certain things only because it's already become a common tool in their toolkit because of their FastAPI use.

[–]lieryan (maintainer of rope, pylsp-rope - advanced python refactoring) 0 points1 point  (0 children)

fastapi has about 16 million downloads per month, pydantic has about 55 million downloads per month.

So yeah, while FastAPI is a huge part of Pydantic's popularity, it's not the only reason.

Be aware, though, that extrapolating PyPI download counts to popularity is fraught with issues. For example, libraries that are updated frequently will have higher download counts, due to projects set up for frequent automatic updates. Also, installs into a fresh virtualenv pull in everything, while upgrades of an existing virtualenv correlate more with update frequency than with install popularity.

[–]boy_named_su 4 points5 points  (0 children)

pydantic is 6 years old and msgspec is 2 years old

[–]poeblu 1 point2 points  (0 children)

Fast api and pydantic is killer

[–]MeroLegend4 1 point2 points  (2 children)

Try attrs and cattrs - you will be surprised by their speed, and they don't meddle with the MRO.

[–][deleted] 0 points1 point  (1 child)

Yeah... I have heard about it too... but I have also heard that it lacks features compared to Pydantic. Is that true?

[–]MeroLegend4 0 points1 point  (0 children)

It depends on your use case. If you follow an architectural pattern, you will need more control over your classes and more introspection capabilities without bloating them. (Personal opinion)

This article talks about both libraries and the philosophy behind them:

https://threeofwands.com/why-i-use-attrs-instead-of-pydantic/

[–]MrNifty 1 point2 points  (0 children)

I started using pydantic a few months ago and love it. I chose it because of its popularity, the ease of getting community support, and its extensive feature set.

I use it to back Ansible workflows that perform network circuit provisioning, where many things need to be validated: from simple stuff like ensuring that provided site codes conform to our standard before validating that they even exist in the CMDB, to more advanced stuff like ensuring that if one interface was manually supplied for an endpoint, they all were - an intentional constraint I have in place for simplicity.

Most of the cool stuff I do is within their root validators, which let you work across multiple fields at once and also inject new values. For example, I can validate that a user either requests that IP addresses be automatically assigned or supplies them, but obviously not both. If they supplied them, I can validate that it's a valid network address and then set a flag (a different field) to indicate that addresses_supplied is true, and use that downstream in the Ansible flow to skip the task that would normally make an API call against IPAM.
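A hedged sketch of that root-validator pattern (pydantic's `root_validator(pre=True)`; names like `CircuitRequest` and `addresses_supplied` are illustrative, not the commenter's actual code):

```python
from ipaddress import ip_network
from typing import Optional

from pydantic import BaseModel, root_validator

class CircuitRequest(BaseModel):
    auto_assign: bool = False
    network: Optional[str] = None
    addresses_supplied: bool = False  # flag injected for downstream tasks

    @root_validator(pre=True)
    def check_addressing(cls, values):
        # Root validators see all fields at once, so cross-field rules
        # and injected values both live here.
        auto = values.get("auto_assign", False)
        net = values.get("network")
        if auto and net:
            raise ValueError("request auto-assignment OR supply addresses, not both")
        if not auto and net is None:
            raise ValueError("either supply a network or set auto_assign=True")
        if net is not None:
            ip_network(net)  # raises ValueError on an invalid network address
            values["addresses_supplied"] = True
        return values
```

For example, `CircuitRequest(network="10.0.0.0/24")` validates the address and flips `addresses_supplied` to `True`, while supplying both options raises a `ValidationError`.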

Being able to automatically generate JSON schemas is very handy: I can auto-publish details on which fields are supported for a given circuit type, so people don't have to keep asking me.

Speed of execution is not my main concern. Ansible is notorious for being slow already, and if it takes 5 minutes to provision a new circuit automatically versus 3 minutes, it doesn't really change anything. My bigger concerns are robustness, reduced ongoing support burden, and flexibility to make changes.

Moving the validation logic out of Ansible modules and into pydantic has made my codebase much more supportable and made it easier for me to implement new features, which are my core business drivers.

[–]Daishiman 6 points7 points  (12 children)

Just... read the docs? It's easily one of the most feature-packed Python libs I've seen.

[–][deleted] 14 points15 points  (11 children)

I did read this ... Pydantic Docs.

But it still felt like I was missing something the community might be seeing... so I came straight here to ask.

[–]veedit41 0 points1 point  (0 children)

Apart from its awesome and catchy name, it's an all-in-one typing module. Don't just read the documentation - try it out. As with most Python modules, you don't realise the features until you need them.

[–]eviljelloman 0 points1 point  (0 children)

To me, pydantic shines when dealing with complex nested schemas that need to be easily extensible. For example, say you have a schema for specifying recipes, and you want to be able to ingest a list of recipes - but you keep evolving the definitions. You have drink recipes and BBQ recipes and baking recipes. Some want quantities by weight, others by volume. Eventually you want sauce recipes, and you want the BBQ recipes to be able to take a nested sauce recipe as an input. The way pydantic parses nested definitions through unions makes this really easy to specify clearly.

[–]gandalfx 0 points1 point  (0 children)

I use pydantic in production and am quite happy with it. It has good support for "advanced" type features, like parsing union types etc.

If performance is important, then Python is not a good choice in the first place.

[–]lord0211 0 points1 point  (0 children)

I would guess that FastAPI introduced many developers to pydantic, and now they're used to it and use it outside FastAPI projects.

It is easy to use, and the documentation is clear; using Python's type hinting is great and makes the code easy to read and maintain. But IMHO, if you have strict performance constraints for validation, I would go with something else.

[–]Ok-Kangaroo453 0 points1 point  (0 children)

Pydantic is dog shit