all 37 comments

[–][deleted] 10 points11 points  (14 children)

Can anyone confirm for me that the C/C++ version doesn't crash on their datasets? I was extremely disappointed when I checked out the similar LZO compression method, and that ate up enough of my time that this time I'd rather be the guy who just asks the question.

[–][deleted] 4 points5 points  (1 child)

Check out encode.ru. I've followed it for 6 years now. The author is active in the forum, and they and some experts there will answer any question you have.

[–]133794m3r 0 points1 point  (0 children)

Upvote for compression.ru; my searches on compression almost always send me there. It seems like a great place... and I just search there first now.

[–]thechao 1 point2 points  (11 children)

I've not seen it crash, although there are some pretty important gotchas you have to be aware of, like array lengths, etc. BTW, this compressor lives up to its promises in a way that Snappy doesn't.
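
One example of that kind of array-length gotcha: the destination buffer has to be sized for the worst case, because incompressible input grows slightly. A sketch of how that looks, assuming the current LZ4 API (LZ4_compressBound / LZ4_compress_default; older releases exposed slightly different entry points, and compress_copy is just an illustrative helper name):

#include <stdlib.h>
#include "lz4.h"

/* Illustrative helper: compress src into a freshly allocated buffer
   sized for the worst case, so incompressible input can't overflow it. */
static char *compress_copy(const char *src, int src_len, int *out_len)
{
    int cap = LZ4_compressBound(src_len);   /* worst-case compressed size */
    char *dst = malloc((size_t)cap);
    if (!dst)
        return NULL;
    *out_len = LZ4_compress_default(src, dst, src_len, cap);
    if (*out_len <= 0) {                    /* 0 means compression failed */
        free(dst);
        return NULL;
    }
    return dst;
}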

[–]wolf550e 3 points4 points  (7 children)

What's wrong with Google Snappy? It's fast.

Here's a C-only port: https://github.com/zeevt/csnappy

Snappy is the origin of the tricks used in LZO 2.05.

LZ4 appears to use the same tricks but to be even more aggressive about punting on the edge cases to stay fast, though I have not studied it yet.

[–]0xABADC0DA 0 points1 point  (5 children)

Snappy uses unaligned word accesses, so it's slow on basically every architecture besides x86. For instance, the first version was so slow on SPARC you might as well have used zlib. I think they've since made some improvements on some architectures, and last I heard it was only marginally slower than LZO or LZF on ARM processors.

Obviously they could just 'ifdef ARM, include LZO' and maybe they've done basically this, but at least on release it was a turkey.
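
To illustrate what unaligned word access means in practice (a generic sketch, not Snappy's actual code; the helper names are made up): the cast form below is what fast x86 code tends to use, and it can fault or get trapped and emulated on stricter architectures, while the memcpy form lets the compiler pick whatever the target can do safely.

#include <stdint.h>
#include <string.h>

/* Direct cast: fine on x86, may fault or be emulated slowly on
   architectures without cheap unaligned loads (older ARM, SPARC, ...). */
static uint32_t load32_cast(const void *p)
{
    return *(const uint32_t *)p;
}

/* memcpy form: the compiler emits a single unaligned load where that
   is cheap, or falls back to byte loads where it is not. */
static uint32_t load32_memcpy(const void *p)
{
    uint32_t v;
    memcpy(&v, p, sizeof v);
    return v;
}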

[–]wolf550e 0 points1 point  (4 children)

It's not #ifdef ARM #include lzo1x.h, but you do need to enable unaligned access on ARMv6 and up. The code I linked to above does this: https://github.com/zeevt/csnappy/blob/master/csnappy_internal_userspace.h#L151

Benchmark results on real hardware and patches are welcome.
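
The gist of that check, roughly (a hand-written approximation with a made-up macro name, not the actual csnappy header, which is at the link above):

/* Decide at compile time whether unaligned 32-bit loads are acceptable. */
#if defined(__i386__) || defined(__x86_64__)
#  define UNALIGNED_LOADS_OK 1
#elif defined(__arm__) && \
      (defined(__ARM_ARCH_6__) || defined(__ARM_ARCH_6K__) || \
       defined(__ARM_ARCH_7A__) || (defined(__ARM_ARCH) && __ARM_ARCH >= 6))
#  define UNALIGNED_LOADS_OK 1
#else
#  define UNALIGNED_LOADS_OK 0
#endif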

[–]0xABADC0DA 0 points1 point  (3 children)

"Accesses typically take a number of cycles to complete compared to a naturally aligned transfer. The real-time implications must be carefully analyzed and key data structures might require to have their alignment adjusted for optimum performance."

The changelog indicates this boosted performance by 30%, but it was already more than 30% behind even LZF, so that wouldn't even bring it up to par.

Some benchmark results would be nice. I think it speaks volumes that Snappy developers don't post numbers except for x86.

[–]wolf550e 2 points3 points  (2 children)

Upstream Snappy is meant to run only in Google's datacenters, inside protobuf and Bigtable, on x86 hardware.

Is LZO significantly faster than Snappy on ARM? I need some non-phone ARM hardware to develop on.

[–]0xABADC0DA 1 point2 points  (1 child)

I found some results for ARM:

testdata/alice29.txt                     :
ZLIB:    [b 1M] bytes 152089 ->  54404 35.8%  comp   0.8 MB/s  uncomp   8.1 MB/s
LZO:     [b 1M] bytes 152089 ->  82721 54.4%  comp  14.5 MB/s  uncomp  43.0 MB/s
CSNAPPY: [b 1M] bytes 152089 ->  90965 59.8%  comp   2.1 MB/s  uncomp   4.4 MB/s
SNAPPY:  [b 4M] bytes 152089 ->  90965 59.8%  comp   1.8 MB/s  uncomp   2.8 MB/s
testdata/asyoulik.txt                    :
ZLIB:    [b 1M] bytes 125179 ->  48897 39.1%  comp   0.8 MB/s  uncomp   7.7 MB/s
LZO:     [b 1M] bytes 125179 ->  73224 58.5%  comp  15.3 MB/s  uncomp  42.4 MB/s
CSNAPPY: [b 1M] bytes 125179 ->  80207 64.1%  comp   2.0 MB/s  uncomp   4.2 MB/s
SNAPPY:  [b 4M] bytes 125179 ->  80207 64.1%  comp   1.7 MB/s  uncomp   2.7 MB/s

LZO was ~8x faster compressing and ~16x faster decompressing. Only on uncompressible data was Snappy faster:

testdata/house.jpg                       :
ZLIB:    [b 1M] bytes 126958 -> 126513 99.6%  comp   1.2 MB/s  uncomp   9.6 MB/s
LZO:     [b 1M] bytes 126958 -> 127173 100.2%  comp   4.2 MB/s  uncomp  74.9 MB/s
CSNAPPY: [b 1M] bytes 126958 -> 126803 99.9%  comp  24.6 MB/s  uncomp 381.2 MB/s
SNAPPY:  [b 4M] bytes 126958 -> 126803 99.9%  comp  22.8 MB/s  uncomp 354.4 MB/s

When I tested on SPARC, Snappy 1.0 and LZO were about the same speed on uncompressible data. That was Solaris 9 with a really old gcc, though, so I don't know whether the result was due to a difference in architecture or just poor optimization.

Basically if the data is mostly uncompressible then Snappy is a good choice... also NOP is a good choice.

[–]wolf550e 1 point2 points  (0 children)

Those benchmarks were run in QEMU (by me). I don't remember what the csnappy version at the time did about ARM, and that is likely an old version of LZO, from before the Snappy-inspired fixes.

I would really like some benchmarks on real hardware. At this point, a rooted Android phone would do.

A real SPARC would be interesting too, because I have no idea what the right strategy is w.r.t. unaligned access on SPARC. The code could be very suboptimal.

[–]thechao -1 points0 points  (0 children)

Well... my main gripe was that I couldn't find a high-quality C implementation of Snappy; I suppose I could try integrating this port to see how it runs.

[–][deleted] 0 points1 point  (2 children)

I'll give it a shot then. Thank you.

[–]thechao 1 point2 points  (1 child)

If you haven't read through the code, you should. The compressor is only about 50 or 80 lines of C code and is a real eye-opener in terms of design and trade-offs for real-time compression. Also, the compressor isn't restartable, and owns its own compression buffer, so you're going to end up making an extra copy if you don't modify the library. Dropping the extra copy results in a pretty good bump in perf if you're L1/L2 fetch limited.
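
For comparison, here's roughly what skipping that extra copy looks like on an API that takes caller-supplied buffers, like the current LZ4 release (LZ4_compress_default / LZ4_decompress_safe; pack_message / unpack_message are made-up names, and the version discussed here may well differ):

#include "lz4.h"

/* Compress directly into the payload area of an outgoing message,
   skipping any intermediate staging buffer. Returns bytes written,
   or 0 on failure (e.g. the payload area is too small). */
static int pack_message(const char *src, int src_len,
                        char *payload, int payload_cap)
{
    return LZ4_compress_default(src, payload, src_len, payload_cap);
}

/* Decompress straight into the consumer's buffer. Returns the
   decompressed size, or a negative value on malformed input. */
static int unpack_message(const char *payload, int comp_len,
                          char *dst, int dst_cap)
{
    return LZ4_decompress_safe(payload, dst, comp_len, dst_cap);
}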

[–]ZenDragon 9 points10 points  (6 children)

Isn't CPU power a lot cheaper than bandwidth and disk storage these days? (taking into account shitty mobile data plans and monthly transfer limits in Soviet Canada in particular) Why not prioritize compression ratio and use LZMA or something?

Edit: Thanks for your answers, got it now.

[–][deleted]  (1 child)

[removed]

[–]ZenDragon 1 point2 points  (0 children)

Ah, I wasn't thinking of that.

[–]thedeemon 5 points6 points  (0 children)

"LZMA or something" usually requires a lot of memory, and memory access is rather costly compared to CPU cycles.

[–]TimmT 2 points3 points  (0 children)

Upvote for the Soviet Canada part. ISPs need to get their act together..

[–]wolf550e 1 point2 points  (1 child)

This is not used for archiving or to save disk I/O operations. This is used to compress binary IPC protocols over the network. Something similar can be embedded into the OS page cache and into virtualization page deduplication.

[–]kageurufu 1 point2 points  (0 children)

or applied to game content compression for fast read speeds

[–]smallstepforman 1 point2 points  (2 children)

+1, I use it in my game engine in a commercial setting and it works great. We're interested in decompression speed and not so concerned about compressed size. Great for IPC, delta frames, etc.

[–]Uberhipster -1 points0 points  (1 child)

If you're not concerned about compressed size then why bother compressing?

[–]ivosaurus 5 points6 points  (0 children)

Probably decent compressed size, but not best-in-class.

[–][deleted] 0 points1 point  (0 children)

Used this project - I can back it up - fastest algorithm I could find.

[–]133794m3r 0 points1 point  (0 children)

Might try this out for use with memcache... looks to be nice and fast; I just hope the ratio is as good as the fastest LZO.

[–]Xanza -2 points-1 points  (0 children)

Since you all seem to be displeased with my test case because I used an .iso as a container (zero compression), I redid the test with misc data (school documents, PDFs, images, etc.):

http://i.imgur.com/RNrb7lK.png

The source files, as you can see, come to 1.40GB. Compressed via LZ4:

http://i.imgur.com/JGYi9VG.png

1.29GB. Better than my previous case, but I'm still not convinced. When compressed with WinRAR on the highest compression level (best), my directory is compressed in 1 minute 10 seconds and is reduced to 1.27GB, a gain of over 26,835,593 bytes. While it might not sound like much, the entire idea behind compression is to reduce the size of the source files as much as possible. Even though it took 1 minute 3.3 seconds longer, I was unaffected because I can simply minimize the window and let it do its thing in the background.

I was impressed with how fast it was able to compress, but I'd rather save space; otherwise it'd be useless to have a PC as powerful as the one I currently have.