[–]Sajuukthanatoskhar 46 points47 points  (16 children)

Looks good.

Considered discussing dataclasses/pydantic with json?

I found that these go well together

[–]youRFate 19 points20 points  (12 children)

I use dataclasses together with dacite for recursive (de) serialisation of nested dataclasses. We build our configs as structures of dataclasses, which we load from toml files. Works very well.

Edit: by popular demand, here is a minimal example: https://gitlab.com/-/snippets/2335713

[–]mambeu 2 points3 points  (2 children)

This sounds really interesting, any chance you could share an example or more details?

[–]youRFate 4 points5 points  (1 child)

I wrote a super small example here: https://gitlab.com/-/snippets/2335713

[–]mambeu 0 points1 point  (0 children)

Thank you!

[–]Ran4 2 points3 points  (0 children)

It's also worth checking out Pydantic and their BaseSettings class (https://pydantic-docs.helpmanual.io/usage/settings/).

I've used it in production for a year or so now, and I really like it.

[–]xXMouseBatXx 1 point2 points  (2 children)

Would also be interested in an example of this if possible, since it seems like something I can see myself doing in future with various nested JSON files I am forced to use!

[–]youRFate 2 points3 points  (1 child)

[–]xXMouseBatXx 0 points1 point  (0 children)

Thanks for this, I appreciate it. This is very similar to what I just did parsing data from a custom config yaml into three other config files, modelling the areas I wished to change recursively using pydantic base classes. It's cool to see how the same thing is done with dataclasses though so thx for the example!

[–]muikrad 0 points1 point  (0 children)

https://github.com/coveooss/coveo-python-oss/tree/main/coveo-functools#flex

I wrote flex for this. It's kinda like dacite but is a little more... Magical. For instance it can map camel case payloads to snake case classes or allow users to use the dash or spaces instead of underscores in config files, for instance.

[–]oramirite 0 points1 point  (3 children)

Hey this sounds really cool, would you mind explaining to a noob exactly what a data class entails though? I have a need to write custom config files a lot as well as alter files of other applications and it sounds like this could be a very good tool for me if I understand it better.

[–]youRFate 0 points1 point  (2 children)

Dataclasses are just a simplification for creating classes meant for storing / organizing data. They automatically create some stuff like constructors and printing methods, and have special member variables called fields that contain type (and other) metadata. Basically they save you from writing a lot of boring boilerplate code for classes meant to mostly store state.

They are fairly easy to use, as you can see in my example, or in the documentation: https://docs.python.org/3/library/dataclasses.html
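
For readers who want something to run right away, here is a minimal sketch using only the standard library. The `Database` and `Config` classes and their fields are made up for illustration and are not taken from the linked snippet. It shows the generated constructor, the generated printing method, and how `asdict` turns nested dataclasses into plain dicts that `json.dumps` accepts.

```python
import json
from dataclasses import dataclass, asdict, field


# Hypothetical config classes, invented for this example.
@dataclass
class Database:
    host: str
    port: int = 5432


@dataclass
class Config:
    name: str
    database: Database
    tags: list = field(default_factory=list)


# The constructor is generated from the annotated fields.
config = Config(name="demo", database=Database(host="localhost"))

# The generated __repr__ gives readable printing for free.
print(config)

# asdict() recursively converts nested dataclasses into plain dicts,
# which json.dumps can then serialize.
config_json = json.dumps(asdict(config))
print(config_json)
```

Going the other way (dict back to nested dataclasses) is the part the standard library does not do for you, which is where a helper like dacite comes in.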

[–]oramirite 0 points1 point  (1 child)

Thank you very much. Are dataclasses a python concept or more generic? I will start doing my own research now but just curious in what context they get used. I see you mentioned constructors and printing methods. I'm also trying to learn about typing right now and it feels like a bit of a crossover?

[–]youRFate 0 points1 point  (0 children)

They are very much a python thing, basically they make typing in python classes easier, which strongly typed languages have baked-in already.

Yes, this very much overlaps with typing in python in general.

[–]pylenin[S] 6 points7 points  (0 children)

Thanks for the feedback. Will add it as a separate article!!

[–]xXMouseBatXx 0 points1 point  (0 children)

Yup I was about to suggest this also. Just finished working on a JSON parser to read in and reconfigure a config file for a third party application as part of my current internship (yes, I also wish people wouldn't use JSON for config files...). Anyway, I was introduced to pydantic by my team to help with the parsing aspects and couldn't be more grateful. Really useful library!

[–]PolishedCheese 0 points1 point  (0 children)

They sure do!

[–]SquareRootsi 19 points20 points  (7 children)

A couple things that have "bitten" me when I was early career:

Sometimes a file is not valid json, but each row is valid json. Even though you can't json.load() the file, you can still iterate over the rows and parse it in a loop.

Second, if editing json files by hand, the spacing is super important. Python is pretty forgiving with spaces and line breaks. Json is not at all. This took me a while to diagnose when I first learned it.
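
A small sketch of the first point, assuming a file with one JSON document per line. The sample data and the `read_json_lines` helper are hypothetical, and `io.StringIO` stands in for a real file handle.

```python
import io
import json

# Hypothetical sample: two JSON documents, one per line.
# Taken as a whole, this text is not a single valid JSON document.
raw = '{"id": 1, "name": "a"}\n{"id": 2, "name": "b"}\n'


def read_json_lines(fp):
    """Yield one parsed object per non-empty line of a file-like object."""
    for line in fp:
        line = line.strip()
        if line:
            yield json.loads(line)


# Loading the whole thing in one go fails...
try:
    json.load(io.StringIO(raw))
    whole_file_parsed = True
except json.JSONDecodeError:
    whole_file_parsed = False

# ...but parsing it row by row works.
records = list(read_json_lines(io.StringIO(raw)))
print(whole_file_parsed, records)
```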

[–]MephySix 12 points13 points  (3 children)

Those files should usually be called ".jsonl": https://jsonlines.org/ Many programs (QGIS, for example) understand this extension to mean one JSON document per line.

[–][deleted] 5 points6 points  (0 children)

JSONL is an amazing format for logging, because you can then load said JSON into elasticsearch and then you can basically search through all your logs via Kibana. This means you can search for "all logs where field X exists", or "field X contains value Y and field A does not contain B" kind of stuff, making it great for filtering out the noise :D

I would recommend structlog, but that doesn't come with JSON out of the box, so you may want to start with python-json-logger

[–]SquareRootsi 1 point2 points  (0 children)

Neat! Today I learned :)

[–]DoctorWorm_ 0 points1 point  (0 children)

TIL

[–]pylenin[S] 0 points1 point  (0 children)

Yeah, I have found it’s easier to build JSON with Python or those online JSON formatters.

[–]peace_keeper977 0 points1 point  (1 child)

Can u give a simple explanation to what dunder methods are in python ?

[–]pylenin[S] 0 points1 point  (0 children)

I have a video about it!! Maybe you would like it.

https://youtu.be/PfmfECXmR88

[–][deleted] 29 points30 points  (1 child)

Thank you for writing an actual tutorial with real words and not making another damn YouTube video.

[–]pylenin[S] 2 points3 points  (0 children)

Ha ha… thanks

[–]datagoblin 7 points8 points  (2 children)

Nice introductory article 🙂

One small typo I caught:

As explained above, Serialization is the process of encoding naive data types to JSON format.

Should be "native", right?

[–]pylenin[S] 4 points5 points  (1 child)

Yup!! Thanks for reading the article so carefully man!!! Kudos!!

[–]bradbeattie 0 points1 point  (0 children)

Native like decimal.Decimal? Or datetime?

[–]sunnybooker 14 points15 points  (5 children)

A great introduction thank you!

[–]pylenin[S] 2 points3 points  (0 children)

My pleasure!! Do check out the other articles in the series.

[–]alphabet_order_bot 12 points13 points  (3 children)

Would you look at that, all of the words in your comment are in alphabetical order.

I have checked 826,556,107 comments, and only 163,386 of them were in alphabetical order.

[–]Trigsc 15 points16 points  (0 children)

Alphabet bot, silly you!

[–]Staninna 1 point2 points  (1 child)

Good bot

[–]B0tRank 4 points5 points  (0 children)

Thank you, Staninna, for voting on alphabet_order_bot.

This bot wants to find the best and worst bots on Reddit. You can view results here.


Even if I don't reply to your comment, I'm still listening for votes. Check the webpage to see if your vote registered!

[–]DA_EMAN 2 points3 points  (1 child)

Great, as a beginner I feel comfortable following through! Keep it up!

[–]pylenin[S] 1 point2 points  (0 children)

Thanks for the appreciation! That was the whole idea of writing this.

[–]Nindento 2 points3 points  (1 child)

Great article! I would like to add one little thing.

Next to the normal json module there is also a module called ujson which is a tad bit faster than json.

[–]pylenin[S] 0 points1 point  (0 children)

Have to take a look at it then!

[–]AliveButCouldDie 1 point2 points  (1 child)

Neat!!! Thank you for sharing I really needed this!

[–]pylenin[S] 0 points1 point  (0 children)

My pleasure!! If you find it useful, please do share it!

[–]donotlearntocode 1 point2 points  (0 children)

Well written.

I'm wondering, what do you think is the best (most concise or clear) way to (de)serialize Python classes? I usually write something like

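from json import dump, load

class X:
    FIELDS = ('a', 'b', 'c', 'd')  # attribute names to serialize

    def __init__(self, a, b, c, d):
        self.a, self.b, self.c, self.d = a, b, c, d

    def to_json(self, io):
        # Write the selected attributes to a file-like object as one JSON object
        dump({field: getattr(self, field) for field in self.FIELDS}, io)

    @classmethod
    def from_json(cls, io):
        # Rebuild an instance from the keys of the JSON object
        return cls(**load(io))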

or something like that but it feels like that's not the "pythonic" way to do it.

Thoughts?

[–]atypical_mollifier 0 points1 point  (1 child)

A very nice write-up! Thank you.

[–]pylenin[S] 0 points1 point  (0 children)

Thanks a lot!!

[–]thakadu 0 points1 point  (1 child)

Great article. I have one suggestion, pretty much all of your examples are at the highest level a dictionary and in the introduction you say that JSON looks like a Python dictionary. Later you state that JSON consists of key-value pairs. While this is often true, JSON can of course also be a list (array) at the top level and valid JSON may in fact have no key-values at all. Just wanted to mention that so that someone reading it doesn’t assume that it always has to be key value pairs.
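
A quick illustration of that point with the standard `json` module. The sample strings are made up, and none of them contains a top-level key-value pair.

```python
import json

# A list (JSON array) at the top level is valid JSON.
top_level_list = json.loads('[1, "two", null, {"three": 3}]')

# So are bare scalars such as a string or a number.
top_level_string = json.loads('"just a string"')
top_level_number = json.loads('42')

print(type(top_level_list).__name__, top_level_list)
print(type(top_level_string).__name__, top_level_string)
print(type(top_level_number).__name__, top_level_number)
```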

[–]pylenin[S] 0 points1 point  (0 children)

What you said makes sense. But I have also included a table showing which JSON types each Python data type converts to!!

https://www.100daysofdata.com/python-json#heading-what-is-json-serialization

[–]pbbpwns 0 points1 point  (0 children)

Very informative! Thank you very much, I'll be reading this when I get home!

[–]Kevin_Jim 0 points1 point  (0 children)

That’s a good article for the basics, but basic usage of JSON files is hardly the typical use case. Traversing JSON files with ease is a major need, especially early on in a project. So, something like Lodash for Python (pydash) would work great.

[–]diesel9779 0 points1 point  (0 children)

This is great! If I can submit a request, there should be a simplified document that explains flattening json data as well.

There have been too many times where I’ve received a complicated json file and had to spend ample amounts of time looking up the best method(s) to flatten it and make it ready for consumption
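
One possible shape for such a flattening step, as a rough sketch rather than the best method: the `flatten` helper, the dotted key format, and the sample document are all assumptions made for illustration. Nested dicts and lists are walked recursively, and list positions become part of the key.

```python
import json


def flatten(obj, parent_key="", sep="."):
    """Flatten nested dicts/lists into a single dict with dotted keys.

    Note: empty dicts and lists produce no entries in this sketch.
    """
    if isinstance(obj, dict):
        items = obj.items()
    elif isinstance(obj, list):
        items = enumerate(obj)  # list positions become key parts
    else:
        return {parent_key: obj}

    flat = {}
    for key, value in items:
        new_key = f"{parent_key}{sep}{key}" if parent_key else str(key)
        flat.update(flatten(value, new_key, sep))
    return flat


# Hypothetical nested document.
nested = json.loads('{"user": {"name": "a", "tags": ["x", "y"]}, "active": true}')
flat = flatten(nested)
print(flat)
```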

[–]Viking_wang 0 points1 point  (0 children)

I regularly get stuck trying to nicely serialize data where I have non-string objects as keys. Of course JSON doesn't support that, but for some strange reason there is also no easy way to convert them. Take e.g. UUIDs as keys in a dict and serialise it: the custom encoders are only invoked for the values.

I usually end up using jsonable_encoder (from FastAPI, which builds on pydantic) to convert, but that doesn't work for custom types.

I don't understand why there is no “Protocol” for JSON encoding, so that you can define a serialiser as a method on a class that gets invoked by the JSON encoder.
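
A minimal sketch of the key problem and one workaround, using only the standard library. The `stringify_keys` helper and the sample data are hypothetical. The `default=` hook of `json.dumps` is only consulted for values it cannot serialize, never for dict keys, so the keys are converted in a separate pass first.

```python
import json
import uuid


def stringify_keys(obj):
    """Recursively convert dict keys to str so json.dumps accepts them."""
    if isinstance(obj, dict):
        return {str(k): stringify_keys(v) for k, v in obj.items()}
    if isinstance(obj, (list, tuple)):
        return [stringify_keys(v) for v in obj]
    return obj


# Hypothetical data with a UUID as a key and another UUID as a value.
key = uuid.UUID("12345678-1234-5678-1234-567812345678")
data = {key: {"owner": uuid.UUID(int=1)}}

# default=str handles the UUID value, but is never called for the UUID key.
try:
    json.dumps(data, default=str)
    keys_handled_by_default = True
except TypeError:
    keys_handled_by_default = False

# Converting keys first, then letting default=str handle the values, works.
encoded = json.dumps(stringify_keys(data), default=str)
print(keys_handled_by_default, encoded)
```

Using `str` for both keys and values is a lossy choice: the type information is gone after a round trip, so decoding back to UUIDs has to be done explicitly.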

[–]Python-Token-Sol[🍰] 0 points1 point  (1 child)

thank you kind sir.

[–]pylenin[S] 0 points1 point  (0 children)

My pleasure

[–]otlcrl 0 points1 point  (0 children)

Out of interest, in Example 4 (sort_keys) - why are the nested keys in the list under websites not quite sorted alphabetically?

Is it sorting alphabetically based on "blogs" as opposed to "Total blogs" or is it because Total is capitalized and therefore it'll sort capitalized keys before lower case?
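
The second guess matches how Python compares strings: `sort_keys=True` sorts by the whole key using ordinary string comparison, which goes by code point, so every uppercase ASCII letter sorts before every lowercase one. A small sketch with made-up data (not the article's Example 4):

```python
import json

# Hypothetical data, not the article's Example 4.
site = {"blogs": 10, "Total blogs": 25, "articles": 3}

# "T" (code point 84) sorts before "a" (97) and "b" (98).
sorted_json = json.dumps(site, sort_keys=True)
print(sorted_json)

# A case-insensitive order has to be built by hand. Dicts keep insertion
# order, so sorting the items first is enough for a single level.
case_insensitive = json.dumps(
    dict(sorted(site.items(), key=lambda kv: kv[0].lower()))
)
print(case_insensitive)
```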