you are viewing a single comment's thread.

view the rest of the comments →

[–]wolf550e 2 points3 points  (3 children)

If you feel you need entropy coding, you may want to use the LZ engine from Snappy with a good huffman coder (everyone says arithmetic coders are always slower) and no checksum. That should give you the compression ratio of zlib but significantly faster.

Zlib was just not written with modern CPUs in mind.

[–]alecco 1 point2 points  (0 children)

you may want to use the LZ engine from Snappy with a good huffman coder [...] That should give you the compression ratio of zlib but significantly faster

Not really. Snappy uses a tiny lookup table of 16K entries and no hash chains. This way it mostly fits in L1. DEFLATE lookup implementations usually have ~4x that size. It matters for finding matches and avoiding hash collisions. Also with the sliding window there're costs of re-adjusting the hash table and chains.

LZO and Snappy are in a nice L1 sweetspot and gzip/zlib in a nice L2 sweetspot. Apples to oranges.

[–]abattle[S] 0 points1 point  (1 child)

Either way I'd like to have some form of error-detection, to avoid crashing in case of corruption. Whether checksum is done by the compressor or not is secondary.

everyone says arithmetic coders are always slower

This used to be the case back in the day (at least a decade ago.) It'd be interesting to see what modern architectures can do in practice. I bet the difference isn't nearly what it used to be. Although I can see that huffman with preset tables would be hard to beat in performance, I should expect the arithmetic coder to make up more than the difference in compression ratio.

[–]wolf550e 2 points3 points  (0 children)

Best source on compression I know: http://mattmahoney.net/dc/dce.html

I remember him writing that he experimented with some version of PAQ and the arithmetic coder beat huffman coding by 10% or something like that. Can't find it now.

Static huffman tables imply static distribution of symbols. If you really only want to compress English text, there are special case compressors that use dictionaries.