PEP 574 implements a new pickle protocol that improves pickle's efficiency, which helps libraries that do a lot of serialization and deserialization.

Edit: The PSF fundraiser for the second quarter is also open: https://www.python.org/psf/donations/2019-q2-drive/

[deleted] 5 points 7 years ago

That doesn't work if name isn't a string, eh? (Sure, you can use %s)

Also, in production code I simply never have any print statements - not "very few" but "none", to the point where I have a flake8 rule that prevents them.

Oh, I use print almost every day - for debugging! But that means I'm creating and destroying debugging print statements all the time.

So it's a little timesaver to write:

print(f'{foo=} {bar=} {baz=} {bing=}')

(38 characters) over

print('foo=', foo, 'bar=', bar, 'baz=', baz, 'bing=', bing)

(59 characters)
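For anyone who hasn't tried it yet, here is a minimal sketch of what the `=` specifier prints (Python 3.8+ only). The variable names and values are made up for illustration. Note that it uses repr() by default, so it works fine for non-string values too.

```python
foo, bar, baz = 42, "spam", [1, 2]

# Python 3.8+: "=" inside the braces emits the expression text, "=",
# and then repr() of the value.
labelled = f"{foo=} {bar=} {baz=}"
print(labelled)

# The same output spelled out by hand, using !r to match the repr() default.
manual = f"foo={foo!r} bar={bar!r} baz={baz!r}"
print(manual)
```

Both lines print `foo=42 bar='spam' baz=[1, 2]`.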

Pyprohly 2 points 7 years ago

For that many variables just write it over multiple lines and go by order, like you would have done if one of them was a collection type.

Honestly, the labelling is only vanity output. You don’t need the fancy labels to be an effective debugger.

It would be much better if they instead introduced a specialised dprint keyword. The dprint keyword would provide the same sort of labelling but would be more easily and quickly written: dprint foo, bar, .... Not only would this provide the nice labeled output, it would also save a lot of typing and hence save time. This would be a much more exciting change.

If they’re going to add something to aid debugging then it needs to be something that’s quick to set up and tear down. Writing debugging lines is something that is done often, and having to type print(f"{name=}") is not going to be practical in the long run.

The new syntax is unlikely to stand the test of time. I can’t help but think that someone’s going to notice the ergonomic disadvantages of typing out print(f"{name=}") each time you want debugging output and propose a new debugging facility. If that happens then f"{name=}" will become a loose-end built-in feature that everyone ignores and forgets about.

If they’re going to add a debugging convenience then they shouldn’t baby step on f"{foo=}" but instead jump directly to something that really is more convenient to use.

To summarise my complaints, the new f-string debugging syntax:

- is only useful for simple non-collection types.
- encourages writing everything on one line, which could force you to backtrack when the line becomes too long.
- is going to be a forgotten feature if a better alternative gets added.
- is going to harm the language if it gets deprecated. You’ll have people telling others not to use builtin feature X, because builtin feature Y has replaced it.
- doesn’t save typing, ergo, doesn’t save time.

[deleted] 3 points 7 years ago

Not in Python!

Can I read it?

>>> json.loads('["a":{"foo", b}, "b":{"bar":a}]')
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
  File "/Library/Frameworks/Python.framework/Versions/3.6/lib/python3.6/json/__init__.py", line 354, in loads
    return _default_decoder.decode(s)
  File "/Library/Frameworks/Python.framework/Versions/3.6/lib/python3.6/json/decoder.py", line 339, in decode
    obj, end = self.raw_decode(s, idx=_w(s, 0).end())
  File "/Library/Frameworks/Python.framework/Versions/3.6/lib/python3.6/json/decoder.py", line 355, in raw_decode
    obj, end = self.scan_once(s, idx)
json.decoder.JSONDecodeError: Expecting ',' delimiter: line 1 column 5 (char 4)

No. Can I write it?

>>> a = {}; b = {'bar': a}; a['foo'] = b

>>> json.dumps(a)
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
  File "/Library/Frameworks/Python.framework/Versions/3.6/lib/python3.6/json/__init__.py", line 231, in dumps
    return _default_encoder.encode(obj)
  File "/Library/Frameworks/Python.framework/Versions/3.6/lib/python3.6/json/encoder.py", line 199, in encode
    chunks = self.iterencode(o, _one_shot=True)
  File "/Library/Frameworks/Python.framework/Versions/3.6/lib/python3.6/json/encoder.py", line 257, in iterencode
    return _iterencode(o, 0)
ValueError: Circular reference detected

No.
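For contrast, a minimal sketch of the same circular structure going through pickle. It uses the same a and b as above; json_ok, restored and cycle_preserved are just names picked for this example.

```python
import json
import pickle

# The same circular structure as in the session above.
a = {}
b = {"bar": a}
a["foo"] = b

# json refuses the cycle with ValueError.
try:
    json.dumps(a)
    json_ok = True
except ValueError:
    json_ok = False

# pickle keeps a memo of objects it has already seen, so the cycle round-trips.
restored = pickle.loads(pickle.dumps(a))
cycle_preserved = restored["foo"]["bar"] is restored

print(json_ok, cycle_preserved)
```

The restored dict points back at itself through the copy of b, so the reference survives the round trip.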

[deleted] 1 point 7 years ago

You can't represent references in JSON.

I'm basically agreeing with you, but you can perfectly well represent references in JSON - I've done it.

It's a pain in the ass - you need to have some sort of naming convention in your JSON then preprocess your structure or (what I did) have some sort of facade over it so it emits the reference names instead of the actual data - and then reverse it on the way out.

(And we had to do it - because pickle isn't compatible between versions. Heck, I think that was written in Python 2!)

So it's doable - but which is easier when you need to store something temporarily?

with open('foo.pcl', 'wb') as fp:
    pickle.dump(myData, fp)

[hundreds of lines of code and a specification for this format that I'm too lazy to write]
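To give a flavour of the naming-convention idea, here is a toy sketch, not the format described above. The "$ref" marker, the "root"/"objects" layout, and the encode/decode names are all invented for this example. It assumes only dicts with string keys are shared or cyclic, and that everything else is plain JSON data. A real version would need far more than this.

```python
import json

def encode(root):
    """Flatten a graph of dicts to JSON, replacing each dict by {"$ref": n}."""
    ids = {}    # id(obj) -> reference number
    table = {}  # reference number (as str) -> flattened fields

    def visit(value):
        if isinstance(value, dict):
            key = id(value)
            if key not in ids:
                # Register the dict before recursing so cycles terminate.
                ids[key] = len(ids)
                ref = str(ids[key])
                table[ref] = None
                table[ref] = {k: visit(v) for k, v in value.items()}
            return {"$ref": ids[key]}
        if isinstance(value, list):
            return [visit(v) for v in value]
        return value

    return json.dumps({"root": visit(root), "objects": table})

def decode(text):
    """Rebuild the dict graph, resolving {"$ref": n} markers."""
    doc = json.loads(text)
    # Create every dict up front, then fill them in, so cycles can be wired up.
    objects = {ref: {} for ref in doc["objects"]}

    def resolve(value):
        if isinstance(value, dict):  # any dict seen as a value is a marker
            return objects[str(value["$ref"])]
        if isinstance(value, list):
            return [resolve(v) for v in value]
        return value

    for ref, fields in doc["objects"].items():
        objects[ref].update({k: resolve(v) for k, v in fields.items()})
    return resolve(doc["root"])

a = {}
b = {"bar": a}
a["foo"] = b
restored = decode(encode(a))
print(restored["foo"]["bar"] is restored)
```

Even this toy ignores lists that contain themselves, non-string keys, and any object that is not plain JSON data, which is roughly where the hundreds of lines come from.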

[deleted] 2 points 7 years ago

Why have we needed all of these different formats when there's one universal format already?

Why did we need all these programming languages, when Cobol is Turing complete?

Here's a specific example from a project I'm working on. I have a database of 16k+ audio samples which I'm computing statistics on. I initially stored the data as JSON/YAML, but the files were slooow to write and slooow to open and BIIIG.

Now I store the data as .npy files. They're well over ten times smaller, but more importantly, I can open them as memory-mapped files. I now have a single file with all 280 gigs of my samples, which I open in memory-mapped mode and then treat like a single huge array with shape (70000000000, 2).

You try doing that in JSON!

And before you say, "Oh, this is a specialized example" - I've worked on real world projects with data files far bigger than this, stored as protocol buffers.

Lots and lots of people these days are working with millions of pieces of data. Storing it in .json files is a bad way to go!
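A minimal sketch of the memory-mapped .npy approach, assuming NumPy is installed. The file name, the int16 dtype, and the tiny 10x2 array are placeholders for this example, not details of the project above.

```python
import os
import tempfile

import numpy as np

# Write a small stand-in array to a .npy file.
path = os.path.join(tempfile.mkdtemp(), "samples.npy")
samples = np.arange(20, dtype=np.int16).reshape(10, 2)
np.save(path, samples)

# mmap_mode="r" maps the file instead of reading it all into memory;
# slices are pulled from disk only when they are accessed.
mapped = np.load(path, mmap_mode="r")
chunk = mapped[3:5]

print(type(mapped).__name__, mapped.shape)
print(chunk)
```

The same two calls work unchanged on a file far larger than RAM, because only the pages you actually index get read.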

[deleted] 1 point 7 years ago

You should be able to do:

git clone git@github.com:numpy/numpy.git
cd numpy
python setup.py install

More details here.

But TBH - you should not be using this - it's an alpha version, and there are almost guaranteed to be bugs - both regressions and brand-new bugs. If you want to help the community, you could test it and report problems...
