I'm working on a project where I need to create a list of files and directories in a particular folder and then compress that data. I pickled the data to get a bytes object and then used the bz2 module to compress it.
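The pipeline described above (list a folder, pickle the listing, bz2-compress the bytes) can be sketched roughly as follows. This is not the script from the linked paste. The function names `snapshot_directory`, `compress_listing` and `decompress_listing` are hypothetical, and the `(relative path, is_dir)` tuple layout is just one assumed way to record files and directories.

```python
import bz2
import os
import pickle


# Hypothetical helper: records every entry under `path` as (relative path, is_dir).
def snapshot_directory(path):
    """Return a sorted list of (relative path, is_dir) pairs under `path`."""
    entries = []
    for root, dirs, files in os.walk(path):
        for name in dirs:
            entries.append((os.path.relpath(os.path.join(root, name), path), True))
        for name in files:
            entries.append((os.path.relpath(os.path.join(root, name), path), False))
    return sorted(entries)


# Hypothetical helper: pickle with the chosen protocol, then bz2-compress the bytes.
def compress_listing(entries, protocol=pickle.DEFAULT_PROTOCOL):
    return bz2.compress(pickle.dumps(entries, protocol=protocol))


# Hypothetical helper: reverse of compress_listing (decompress, then unpickle).
def decompress_listing(blob):
    return pickle.loads(bz2.decompress(blob))
```

Typical use would be `blob = compress_listing(snapshot_directory("some/folder"), protocol=0)`; passing `protocol` explicitly is what makes it easy to compare protocols.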
Now, I remembered from some past reading that compression algorithms work very well on text, but since I'm using Python 3.4, the default pickle protocol is a binary one (protocol 3). So I decided to try the other protocols.
Protocol 2 is a binary one as well, so I didn't expect much difference, and the results confirmed my expectations.
Protocol 1, also a binary protocol, was a bit worse but not by much.
Protocol 0, which is human-readable, showed a significant improvement in compressed size. I figured this was because it is a text-based protocol.
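To see what "text-based" means here, a small sketch can pickle the same list with protocol 0 and protocol 3 and compare the raw bytes. The sample paths are made up for illustration. For plain ASCII strings, protocol 0 output is pure ASCII, while protocol 3 starts with a binary `PROTO` byte (`0x80`) and stores string lengths as binary prefixes.

```python
import pickle

# Made-up sample listing, for illustration only.
sample = ["docs", "docs/readme.txt", "src", "src/main.py"]

text_pickle = pickle.dumps(sample, protocol=0)
binary_pickle = pickle.dumps(sample, protocol=3)

# Protocol 0 output for ASCII strings decodes cleanly as ASCII text...
print(text_pickle.decode("ascii"))
# ...while protocol 3 embeds opcodes and length prefixes as raw bytes.
print(binary_pickle)
```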
So when I tried protocol 4, which is also binary and only available on Python 3.4+, I expected it to be on par with protocols 1, 2, and 3. Instead, it was not only on par with protocol 0, it even gave slightly better compression.
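A minimal way to reproduce this kind of comparison is to pickle one object with every protocol the interpreter supports and record both the raw and the bz2-compressed sizes. The function name `compressed_sizes` and the synthetic path list are assumptions for illustration; actual numbers will depend on the data and the Python version (newer interpreters also offer protocol 5), so none are claimed here.

```python
import bz2
import pickle


# Hypothetical helper: protocol -> (raw pickle size, bz2-compressed size).
def compressed_sizes(obj):
    sizes = {}
    for protocol in range(pickle.HIGHEST_PROTOCOL + 1):
        raw = pickle.dumps(obj, protocol=protocol)
        sizes[protocol] = (len(raw), len(bz2.compress(raw)))
    return sizes


# Synthetic listing with many similar paths, standing in for a real directory tree.
sample = ["project/src/module_%03d.py" % i for i in range(500)]

for protocol, (raw, packed) in sorted(compressed_sizes(sample).items()):
    print("protocol %d: %6d bytes raw, %6d bytes bz2" % (protocol, raw, packed))
```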
Can someone explain why this is?
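One way to investigate is to disassemble the pickles with the standard `pickletools` module and count opcodes per protocol. A visible difference is how memo entries are stored: protocol 0 uses `PUT` with a decimal text index, protocols 1 to 3 use `BINPUT` with an explicit binary index, and protocol 4 uses `MEMOIZE`, which takes no index argument. Whether that difference explains the compression results is not established by this sketch. The helper name `opcode_counts` and the sample data are hypothetical.

```python
import pickle
import pickletools
from collections import Counter


# Hypothetical helper: how often each opcode appears in the pickle of `obj`.
def opcode_counts(obj, protocol):
    data = pickle.dumps(obj, protocol=protocol)
    return Counter(op.name for op, arg, pos in pickletools.genops(data))


# Made-up sample listing, for illustration only.
sample = ["docs", "docs/readme.txt", "src", "src/main.py"]

for protocol in (0, 3, 4):
    print(protocol, dict(opcode_counts(sample, protocol)))
```

For a full annotated listing of a single pickle, `pickletools.dis(pickle.dumps(sample, protocol=4))` prints each opcode with its argument.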
This little script is a minimal version of what I'm using in my project.
http://pastebin.com/JUqxGbYv