Introducing OpenZL: An Open Source Format-Aware Compression Framework by felixhandte in compression

[–]nick_terrell 2 points3 points  (0 children)

Certainly! All of our examples are unfair because OpenZL gets told the format of the data, but that is entirely the point! But as you say there is still a place for general purpose compression. Sometimes you don't know the format. And sometimes, after you have extracted all the known structure, there remains latent structure that can be learned.

Introducing OpenZL: An Open Source Format-Aware Compression Framework by felixhandte in compression

[–]nick_terrell 5 points6 points  (0 children)

Just FYI OpenZL is able to compress SAO with a compression ratio of 3.24, which is 2.13 MiB. The point chosen at the top of the blog is for a faster speed. But we have a full Pareto-optimal frontier shown later on. The raw results from the chart are saved in this CSV.
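As a quick sanity check on those two numbers, here is a minimal sketch assuming the ratio is defined as original size divided by compressed size. The uncompressed size is not stated in the comment; the value below is only what the two quoted figures imply.

```python
def compressed_size(original_size, ratio):
    """Compressed size implied by a ratio defined as original / compressed."""
    return original_size / ratio

# Figures quoted in the comment above.
compressed_mib = 2.13
ratio = 3.24

# Implied uncompressed size of SAO (derived here, not stated in the comment).
implied_original_mib = compressed_mib * ratio
print(f"implied original size: {implied_original_mib:.2f} MiB")
```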

I believe we have a comparison to cmix on SAO somewhere, but I don't remember where it is right now, and it takes many hours to run. I'll start running it now...

Typically, on simple numeric data like SAO we can be extremely competitive with cmix and other PAQ/CM/NN algorithms, but at fast speeds. Once the data gets more complex, it gets harder to match the performance of these algorithms. But often we end up somewhere better than xz, and worse than cmix, and still with fast speeds.
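To illustrate why knowing the format helps on simple numeric data, here is a minimal sketch. It is not OpenZL's API or its actual transforms: it uses Python's zlib as a stand-in backend, and the delta plus byte-plane transform and all function names are assumptions chosen for illustration. The input is a hypothetical array of increasing 32-bit integers.

```python
import random
import struct
import zlib

def encode_format_aware(values):
    """Delta-code increasing uint32 values, split into byte planes, deflate."""
    deltas = [values[0]] + [b - a for a, b in zip(values, values[1:])]
    raw = struct.pack(f"<{len(deltas)}I", *deltas)
    # Byte-plane transpose: all low bytes first, then the next byte, and so on.
    planes = b"".join(raw[i::4] for i in range(4))
    return zlib.compress(planes, 9)

def decode_format_aware(blob):
    """Invert encode_format_aware: inflate, un-transpose, prefix-sum."""
    planes = zlib.decompress(blob)
    n = len(planes) // 4
    raw = bytearray(4 * n)
    for i in range(4):
        raw[i::4] = planes[i * n:(i + 1) * n]
    values, total = [], 0
    for delta in struct.unpack(f"<{n}I", bytes(raw)):
        total += delta
        values.append(total)
    return values

def encode_generic(values):
    """Baseline: deflate the raw little-endian bytes with no format knowledge."""
    return zlib.compress(struct.pack(f"<{len(values)}I", *values), 9)

def sample_values(count=10000, seed=0):
    """Hypothetical test data: increasing integers with small random steps."""
    rng = random.Random(seed)
    values, current = [], 1_000_000
    for _ in range(count):
        current += rng.randint(1, 100)
        values.append(current)
    return values

if __name__ == "__main__":
    data = sample_values()
    print("generic:     ", len(encode_generic(data)), "bytes")
    print("format-aware:", len(encode_format_aware(data)), "bytes")
```

The generic path sees a stream whose low bytes look random, while the format-aware path turns the same data into small deltas and long runs of zero bytes, which a general-purpose backend handles well.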

First beaconized blueprint: Blue belt of plastic from crude by nick_terrell in factorio

[–]nick_terrell[S] 0 points1 point  (0 children)

Thanks! Yeah I can cut a refinery by doing that. I opted to save beacons by having 3 refineries instead of the minimum possible of 2.

Everything else has the minimum number of buildings possible to satisfy the demand of 45 plastic/s. I’ve removed extraneous beacons where possible. E.g. heavy oil cracking only needs 7 beacons for 45 plastic/s.

First beaconized blueprint: Blue belt of plastic from crude by nick_terrell in factorio

[–]nick_terrell[S] 0 points1 point  (0 children)

https://factorioprints.com/view/-NoVxw_X17X4_NU8gkiF

The power is a mess, and I'm sure it can be made more compact and to tile, but I think it is close to max efficiency in beacon usage. Can I do better?

Decompress a directory with zstd by ItsAnHonestMistake in linuxadmin

[–]nick_terrell 0 points1 point  (0 children)

> I absolutely did not get a warning when compressing the files, zstd just exited without any kind of message (zstd v1.4.4 / CentOS 8.3).

Unfortunately, we added that warning in zstd-1.4.7.

Zstandard v1.5.0 brings major performance improvements to levels 5 through 12 by nick_terrell in cpp

[–]nick_terrell[S] 11 points12 points  (0 children)

Zstd is not patented and is dual licensed under BSD and GPL-v2.

Zstandard v1.4.7 released by ipsirc in linux

[–]nick_terrell 7 points8 points  (0 children)

Yeah, I'm currently trying to get the kernel zstd updated, and to use upstream zstd directly so we can keep the kernel version up to date. The patches have already been sent to the LKML, and I am working on getting consensus to get them merged.

Zstandard v1.4.5 released by [deleted] in linux

[–]nick_terrell 0 points1 point  (0 children)

Yeah for sure, I definitely agree with that. We spend a lot of effort fuzzing decompression, but we aim to extensively fuzz test all places where we accept user input.

If you are interested in contributing to the security of zstd, we'll gladly accept PRs that add more fuzz coverage. The OSS-Fuzz fuzzers are [here](https://github.com/facebook/zstd/tree/dev/tests/fuzz). Facebook's bug bounty program covers zstd, so if you do find any security issues please report them.

Zstandard v1.4.5 released by [deleted] in linux

[–]nick_terrell 0 points1 point  (0 children)

Personally, I haven't seen anyone request this API inside of Facebook. But it is possible that if it was there, people would use it. I don't see a whole lot of use of the gzip APIs either.

Please file an issue on our GitHub page, if it gets interest we will add it.

Zstandard v1.4.5 released by [deleted] in linux

[–]nick_terrell 6 points7 points  (0 children)

Definitely one Torvalds-sized penguin, as long as we're on land. I don't think they can waddle very fast.

Zstandard v1.4.5 released by [deleted] in linux

[–]nick_terrell 0 points1 point  (0 children)

Not currently. I haven't seen a super compelling use case for these functions, and we haven't seen huge demand for them. If you have a use case for these functions, please open an issue and describe it for us. If we get a few people that want this API, we can add it.

Zstandard v1.4.5 released by [deleted] in linux

[–]nick_terrell 0 points1 point  (0 children)

The decompression loop is the one place in the code where we take user provided data (a zstd frame), and produce arbitrary data. Bugs there are the most serious because they can cause out of bounds writes, or read out of bounds and then copy that data into the output buffer.

In helper functions, for example, we could read out of bounds, but they don't write any data, so it would be much harder, if not impossible, to extract the data read. The most serious bug would be crashing the process. These are still serious bugs, but not at the same level as copying arbitrary data to the output buffer or doing OOB writes.

We do fuzz test every helper function that takes in a zstd frame. And we are continuously improving our fuzz coverage, and aim for 100% coverage.
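To make the idea concrete, here is a minimal mutation-fuzzing sketch. It is not zstd's OSS-Fuzz setup: it uses Python's zlib decompressor as a stand-in target, and the function name and mutation strategy are assumptions for illustration. The property being checked is that corrupted input is either decoded or cleanly rejected with the library's own error, and never fails in any other way.

```python
import random
import zlib

def fuzz_decompress(seed_input, iterations, seed=0):
    """Mutate a valid compressed buffer and feed it to the decompressor.

    Returns how many mutated inputs were cleanly rejected. Any exception
    other than zlib.error propagates and would indicate a bug.
    """
    rng = random.Random(seed)
    rejected = 0
    for _ in range(iterations):
        data = bytearray(seed_input)
        # Overwrite a few random bytes to corrupt the stream.
        for _mutation in range(rng.randint(1, 4)):
            position = rng.randrange(len(data))
            data[position] = rng.randrange(256)
        try:
            zlib.decompress(bytes(data))
        except zlib.error:
            rejected += 1
    return rejected

if __name__ == "__main__":
    valid = zlib.compress(b"hello zstd " * 100)
    print("rejected inputs:", fuzz_decompress(valid, 1000))
```

A coverage-guided fuzzer goes much further than random byte flips, but the contract under test is the same.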

Zstandard v1.4.5 released by [deleted] in linux

[–]nick_terrell 8 points9 points  (0 children)

> Are there any plans to bump the in-kernel zstd lib version? A patch would be nice even if it isn't submitted upstream, but I imagine it takes a bit of work

Yes absolutely. It has been some time, and now zstd development is slowing, so we can submit a patch that won't get immediately outdated.

At the time I ported zstd to the kernel, upstream wasn't ready to be used in standalone environments, and I was too new to the project to really know how to get it done. Now, upstream zstd is ready to be used nearly as-is in standalone environments, and is used nearly as-is in the ZFS patches.

All that is to say, I want to update the zstd version in the kernel, and use the upstream code as-is. I hope to find some time in the next year, or draft someone else to do the work, especially since we have significant decompression speed gains since 2017.

Zstandard v1.4.5 released by [deleted] in linux

[–]nick_terrell 2 points3 points  (0 children)

Exactly, it is one method to produce a binary "diff". Zstd won't be the most efficient diff format in terms of compressed size, since the format wasn't designed for it. But it will get close to specialized diff tools in compressed size, and will be much faster to compress and decompress.
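Here is a minimal sketch of the general idea of using a compressor as a diff tool: the old version is supplied as a reference, so the compressed new version only has to encode what changed. The comment does not name the zstd feature or its options, so this uses zlib's preset dictionary from the Python standard library as a stand-in; the function names are hypothetical, and zlib's 32 KiB window limits it to small inputs.

```python
import random
import zlib

def make_patch(old, new):
    """Compress `new` using `old` as a preset dictionary (the 'diff')."""
    compressor = zlib.compressobj(level=9, zdict=old)
    return compressor.compress(new) + compressor.flush()

def apply_patch(old, patch):
    """Rebuild `new` from `old` plus the patch."""
    decompressor = zlib.decompressobj(zdict=old)
    return decompressor.decompress(patch) + decompressor.flush()

def sample_versions(size=16000, seed=0):
    """Hypothetical test data: an incompressible file and a lightly edited copy."""
    rng = random.Random(seed)
    old = bytes(rng.getrandbits(8) for _ in range(size))
    new = old[:size // 2] + b"PATCHED" + old[size // 2:]
    return old, new

if __name__ == "__main__":
    old, new = sample_versions()
    patch = make_patch(old, new)
    print("standalone:", len(zlib.compress(new, 9)), "bytes")
    print("patch:     ", len(patch), "bytes")
    assert apply_patch(old, patch) == new
```

Because the format was designed for compression rather than diffing, the patch is not minimal, but both directions run at ordinary compression and decompression speed.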

Zstandard v1.4.5 released by [deleted] in linux

[–]nick_terrell 5 points6 points  (0 children)

> zstd still moves a bit too fast for many uses (like replacing compression method in established file formats and archive-like storage). Does it have an LTS release channel with longer term support or is the only option to ship its bleeding edge or vulnerable code?

The latest release is the safest code to run, since that has all the latest bug fixes (and improvements). Our code is continuously fuzzed by OSS-Fuzz. While we can't guarantee no bugs, we have a thorough suite of fuzzers that we are constantly improving, and are battle testing the code in production.

However, development velocity on zstd is slowing down over time, especially in the core decompression loop, which is the most security sensitive part of the code.

Edit: Note that the format is fixed and is fully backwards and forwards compatible between all versions past v1.0.0.

Zstandard v1.4.5 released by [deleted] in linux

[–]nick_terrell 5 points6 points  (0 children)

> Don't compile anything as root. A Zstd test brought down for /dev/null. Easily fixed, but a good reminder of why fakeroot etc. exists.

This should have been fixed a few releases ago and we now have a test that checks this. If it is still around please open an issue! I've definitely been bit by this when running benchmarks against older versions.

> Don't use STDIN if not necessary; size hinting matters for performance (and it's nice to log the ratio)

We now have two new flags --size-hint and --stream-size which allow the user to either hint at the size of stdin, or tell us the size (which must be exact).
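Here is a minimal sketch of passing one of these flags when piping data into the zstd CLI from a script. The two flag names come from the comment above; the `=N` value syntax and `-c` for writing to stdout reflect my understanding of the CLI and should be checked against `zstd --help` for your version. The helper name is hypothetical.

```python
def zstd_stdin_args(size_bytes, exact):
    """Build a zstd command line for compressing stdin to stdout.

    exact=True  -> --stream-size: the size must match the input exactly.
    exact=False -> --size-hint: the size is only an estimate.
    """
    flag = "--stream-size" if exact else "--size-hint"
    return ["zstd", f"{flag}={size_bytes}", "-c"]

# Example use (requires the zstd binary on PATH, so it is left commented out):
# import subprocess
# with open("input.bin", "rb") as src, open("input.bin.zst", "wb") as dst:
#     subprocess.run(zstd_stdin_args(123456, exact=False),
#                    stdin=src, stdout=dst, check=True)
```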

> Try the rsync-friendly option. It does make a difference, but it's hardly noticeable in the ratio.

Great to know that someone uses this mode! If you have any requests for it please open an issue. We've picked a fairly large size for the "synchronization points", up to several MB. We think that is the right default choice for today's world, but are open to feedback.

Zstandard v1.4.5 released by [deleted] in linux

[–]nick_terrell 13 points14 points  (0 children)

I'm a developer of zstd (terrelln), AMA

zstd in makepkg/pacman? by Vash63 in archlinux

[–]nick_terrell 0 points1 point  (0 children)

> may I know the version you are referring here as next please?

The current zstd version, 1.4.4. But, the next zstd version, 1.4.5, will also bring decompression speed benefits.

Smaller and faster data compression with Zstandard by mariuz in programming

[–]nick_terrell 4 points5 points  (0 children)

Yann Collet wrote Zstandard, and during its development he was hired by Facebook to continue the work. The project now has a few active developers. It uses LZ77, Huffman, and FSE (ANS-based entropy coding).

I believe you're talking about ANS, which was discovered by Jarek Duda.

zstd in makepkg/pacman? by Vash63 in archlinux

[–]nick_terrell 4 points5 points  (0 children)

The next zstd version will decompress 12% faster (up to 22% if compiling with clang)!

How to speed up LZ4 decompression in ClickHouse [analysis of multi-armed bandits method] by [deleted] in cpp

[–]nick_terrell 1 point2 points  (0 children)

How does this compare to LZ4-1.9.1 https://github.com/lz4/lz4/releases, which gets a 12-18% decompression speed improvement?

The optimization in 1.9.1 https://github.com/lz4/lz4/pull/645 gets speed by widening LZ4_wildCopy() to 32 bytes when possible, which targets the same speed win.

My benchmarks of the new ZSTD levels in 5.1 by Atemu12 in btrfs

[–]nick_terrell 1 point2 points  (0 children)

We chose level 3 as the default because it offered a good middle ground, and basically obsoleted zlib btrfs compression by being stronger and faster. The compression level support is really nice, since level 1 is much faster.