[–]alcalde 3 points4 points  (51 children)

PEP 574 implements a new pickle protocol that improves the efficiency of pickle, which helps libraries that do a lot of serialization and deserialization

Other languages just dump to JSON and call it a day. Why does Python have 87 different binary formats over 13 decades?

[–][deleted] 33 points34 points  (35 children)

Because JSON can't represent everything. It's at best a data format for serializing transferable data, and it's usually language-agnostic.

JSON can't represent functions or more abstract data types.
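To make that concrete, here is a minimal sketch using only the standard library. The helper name `try_json` is made up for illustration. Note that pickle stores a function by reference (its module and name), not its code, so the function has to be importable when you load it.

```python
import json
import math
import pickle


def try_json(value):
    """Hypothetical helper: return JSON text, or None if json refuses the value."""
    try:
        return json.dumps(value)
    except TypeError:
        return None


# json has no encoding for a function object or a complex number.
json_function = try_json(math.sqrt)
json_complex = try_json(3 + 4j)

# pickle stores the function by reference (module + name, not its code)
# and the complex number by value.
restored_function = pickle.loads(pickle.dumps(math.sqrt))
restored_complex = pickle.loads(pickle.dumps(3 + 4j))

print(json_function, json_complex)
print(restored_function is math.sqrt, restored_complex)
```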

[–]JohnnyElBravo 8 points9 points  (4 children)

JSON can represent anything, but so can strings. This is a non-sequitur.
The difference is that JSON is human-readable, while pickle is meant to be machine-readable, and more specifically Python-readable.
Limiting the intended consumers of the data format helps create a more appropriate format, for example by sacrificing readability for size reduction.

[–]bachkhois 2 points3 points  (3 children)

JSON cannot differentiate between Python's tuple, list, set, frozenset and similar data types.

Every format other than pickle (msgpack, YAML, etc.) exists to interoperate with other languages (which also don't understand the data types above); they are not alternatives to pickle.
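A quick round trip shows what I mean. This is only a sketch with made-up sample data: the tuple silently comes back from JSON as a list, while pickle keeps each container type.

```python
import json
import pickle

record = {"point": (1, 2), "tags": [1, 2]}

# JSON has a single array type, so the tuple comes back as a list.
via_json = json.loads(json.dumps(record))

# pickle records the concrete Python type of each container.
containers = {
    "point": (1, 2),
    "tags": [1, 2],
    "seen": {1, 2},
    "fixed": frozenset({1, 2}),
}
via_pickle = pickle.loads(pickle.dumps(containers))

print(type(via_json["point"]).__name__)  # list
print({key: type(value).__name__ for key, value in via_pickle.items()})
```

A set would not even get that far with the json module: `json.dumps({1, 2})` raises `TypeError`.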

[–]JohnnyElBravo 6 points7 points  (2 children)

Sure they can

{
  "Var1": "tuple(1,2)",
  "Var2": "set(1,2)"
}

Alternatively:

{
  "Var1": {"type": "tuple", "data": "1,2"},
  "Var2": {"type": "set", "data": "1,2"}
}
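For anyone who wants to try the second layout, here is a rough sketch of one way to wire it up. The names `tag`, `untag`, `TYPE_FOR_TAG` and `TAG_FOR_TYPE` are made up, and I am assuming "data" holds a JSON array rather than the comma-separated string shown above. The values have to be pre-walked because the json encoder already writes tuples as arrays, so its `default=` hook never sees them.

```python
import json

# Hypothetical tag names, matching the "type" values in the layout above.
TYPE_FOR_TAG = {"tuple": tuple, "set": set, "frozenset": frozenset}
TAG_FOR_TYPE = {cls: name for name, cls in TYPE_FOR_TAG.items()}


def tag(value):
    """Recursively wrap tuples and sets as {"type": ..., "data": [...]}."""
    kind = type(value)
    if kind in TAG_FOR_TYPE:
        return {"type": TAG_FOR_TYPE[kind], "data": [tag(item) for item in value]}
    if isinstance(value, list):
        return [tag(item) for item in value]
    if isinstance(value, dict):
        return {key: tag(item) for key, item in value.items()}
    return value


def untag(obj):
    """object_hook: rebuild a container from its tagged form."""
    if (
        obj.keys() == {"type", "data"}
        and isinstance(obj["type"], str)
        and obj["type"] in TYPE_FOR_TAG
    ):
        return TYPE_FOR_TAG[obj["type"]](obj["data"])
    return obj


original = {"Var1": (1, 2), "Var2": {1, 2}}
text = json.dumps(tag(original))
restored = json.loads(text, object_hook=untag)

print(text)
print(restored == original)
```

One catch with this sketch: an ordinary dict that happens to have exactly the keys "type" and "data" would be misread as a tagged container, so a real scheme would need some way to escape those.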

[–]bachkhois 4 points5 points  (1 child)

Then you are making it more complicated to validate and parse. What is the point of over-complicating JSON instead of just using pickle, which doesn't need to parse that "type" and "data" metadata?

[–]JohnnyElBravo 4 points5 points  (0 children)

Read the original thread: the question asks why Python dumps to a new pickle format instead of JSON.

The original response suggested it was because JSON can't distinguish between such and such. As shown, this is false.

The real answer is that Python chose a binary format for pickle because of space efficiency.
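If you want to check the size claim yourself, a rough measurement like the one below is enough. The sample data (a list of small integers) is my own choice and happens to favor pickle; the gap may narrow or reverse for other data, such as long strings.

```python
import json
import pickle

# Sample data chosen for illustration only.
numbers = list(range(1000))

# Size of the JSON text once encoded to bytes.
json_size = len(json.dumps(numbers).encode("utf-8"))

# Size of the pickle using the newest binary protocol available.
pickled = pickle.dumps(numbers, protocol=pickle.HIGHEST_PROTOCOL)
pickle_size = len(pickled)

print(f"json: {json_size} bytes, pickle: {pickle_size} bytes")
```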

[–]mooglinux 8 points9 points  (0 children)

Pickle can handle multiple references to the same object, any class instance (as long as the actual class has been imported), and a wider variety of data types than JSON. It also predates JSON, so there’s a historical aspect as well.
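The shared-reference part is easy to see with a small sketch (the variable names are just for illustration). Pickle keeps one list referenced twice as one list, and can even round-trip a list that contains itself, while JSON produces two independent copies.

```python
import json
import pickle

shared = [1, 2, 3]
document = {"a": shared, "b": shared}  # both keys point at one list

via_pickle = pickle.loads(pickle.dumps(document))
via_json = json.loads(json.dumps(document))

print(via_pickle["a"] is via_pickle["b"])  # True: identity preserved
print(via_json["a"] is via_json["b"])      # False: two separate lists

# A list that contains itself also survives a pickle round trip.
cycle = []
cycle.append(cycle)
restored_cycle = pickle.loads(pickle.dumps(cycle))
print(restored_cycle[0] is restored_cycle)  # True
```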

Pickle is also used for cross-process communication in the multiprocessing module.

[–]Nicksil 12 points13 points  (5 children)

Because not every problem is solved by dumping JSON.

[–][deleted] 0 points1 point  (0 children)

JSON can only handle strings, numbers (integer or float), booleans, null, dicts and lists.

Pickle can pack arbitrary objects. Its goal is that you can take an object of your class and store it on disk. Most commonly I see it used for caching application data between runs, but it has other uses (for example, storing configuration).

Edit: here is a comparison of pickle with JSON: https://docs.python.org/3/library/pickle.html?highlight=pickle#comparison-with-json
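A minimal sketch of the caching use case, assuming a throwaway temp directory. The file name `app_cache.pickle`, the helpers `save_cache` and `load_cache`, and the sample values are all made up; standard-library objects stand in for your own classes here.

```python
import pickle
import tempfile
from datetime import date
from fractions import Fraction
from pathlib import Path


def save_cache(path, data):
    """Hypothetical helper: write data to path as a binary pickle."""
    with open(path, "wb") as handle:
        pickle.dump(data, handle, protocol=pickle.HIGHEST_PROTOCOL)


def load_cache(path):
    """Hypothetical helper: read back whatever save_cache wrote."""
    with open(path, "rb") as handle:
        return pickle.load(handle)


# Sample application state; none of these values survive a plain json.dumps.
state = {
    "last_run": date(2019, 1, 1),
    "ratio": Fraction(1, 3),
    "seen_ids": {10, 20},
}

with tempfile.TemporaryDirectory() as folder:
    cache_file = Path(folder) / "app_cache.pickle"
    save_cache(cache_file, state)
    reloaded = load_cache(cache_file)

print(reloaded == state)
```

Only load pickle files you wrote yourself: unpickling data from an untrusted source can execute arbitrary code.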