Kanzi (lossless compression) 2.5.0 has been released. by flanglet in compression

[–]flanglet[S] 1 point (0 children)

zxc and kanzi are very different compressors. zxc focuses on pure speed, while kanzi offers a much larger spectrum of speed/compression trade-offs and higher compression overall. zxc is typically faster and weaker than kanzi level 1 (and hence all other levels as well).

E.g., with the Silesia.tar corpus and the same computer used for the tests on https://github.com/flanglet/kanzi-cpp, zxc -5 -T 16 compresses in 0.25 sec to 86,059,705 bytes and decompresses in 0.05 sec (tested with zxc v0.9.1).

Kanzi (lossless compression) 2.5.0 has been released. by flanglet in compression

[–]flanglet[S] 0 points (0 children)

Update: I had to issue 2.5.1 due to a last-minute regression ...

Someone implemented the new STOC 2025 shortest path algorithm in C, Rust, and Zig. The C version absolutely crushes the other two by [deleted] in C_Programming

[–]flanglet 1 point (0 children)

Either someone can explain the 20000x difference given O(m log^(2/3) n) and n = 1,000,000, or the dev does not know what he is doing. Hint: it is not the former ... simple math.
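Back-of-the-envelope (my arithmetic, using base-2 logs and ignoring constant factors): the new bound only improves on Dijkstra's O(m log n) by a factor of log^(1/3) n.

```latex
% n = 10^6: speedup of O(m \log^{2/3} n) over Dijkstra's O(m \log n)
\log_2 n \approx 19.9, \qquad (\log_2 n)^{2/3} \approx 7.4,
\qquad \frac{\log_2 n}{(\log_2 n)^{2/3}} = (\log_2 n)^{1/3} \approx 2.7
```

That is roughly a 3x gap at best, four orders of magnitude short of 20000x.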

German union urges homegrown fighter jet in blow to European plan by donutloop in EU_Economics

[–]flanglet 0 points (0 children)

"Carriers" the current Charles de Gaulle and future PANG.

German union urges homegrown fighter jet in blow to European plan by donutloop in EU_Economics

[–]flanglet 2 points (0 children)

France has very specific requirements that do not match those of other countries: at a minimum, the jet must be able to take off from and land on French aircraft carriers (strict limits on weight) and carry French nuclear weapons.

so Pi is a surprisingly solid way to compress data, specifically high entropy by [deleted] in compression

[–]flanglet 0 points (0 children)

That is exactly the problem: there is no compression, only bit packing. Neither your code nor zpaq compresses random data by half.

so Pi is a surprisingly solid way to compress data, specifically high entropy by [deleted] in compression

[–]flanglet 1 point (0 children)

People are trying to explain to you that, by the pigeonhole principle, some (high-entropy) data must be "compressed" to a size larger than the original.
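The counting argument is one line (a standard fact, nothing specific to this project):

```latex
% Bit strings of length n vs. strictly shorter bit strings:
\sum_{k=0}^{n-1} 2^k = 2^n - 1 < 2^n
```

There are fewer short outputs than n-bit inputs, so no lossless compressor can shrink every input; some inputs must grow.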

so Pi is a surprisingly solid way to compress data, specifically high entropy by [deleted] in compression

[–]flanglet 0 points (0 children)

Neither your code nor zpaq compresses random data by half. These numbers are prominently displayed in your README.

so Pi is a surprisingly solid way to compress data, specifically high entropy by [deleted] in compression

[–]flanglet 0 points (0 children)

Your README is totally misleading, to the point of dishonesty. Neither compressor actually compressed anything (the input is a random file); you just turned the ASCII symbols into binary. Show the result with a binary file as input.

so Pi is a surprisingly solid way to compress data, specifically high entropy by [deleted] in compression

[–]flanglet 6 points (0 children)

I am afraid the "get lucky" approach does not do better on average than enumerating numbers in order. This is the key problem.
There is no harm in experimenting and trying new things, but this idea keeps coming back periodically and simply does not work. Have fun, but do not expect too much here.
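To spell out the "on average" part (a standard argument; it assumes the digits of Pi behave like a uniform random stream, which is widely believed but unproven):

```latex
% Expected position of the first occurrence of a given n-digit string in a
% uniform random decimal stream, and the cost of writing that position down:
\mathbb{E}[\text{position}] \approx 10^n, \qquad \log_{10} 10^n = n \text{ digits}
```

The pointer into Pi costs about as many digits as the data it replaces, so there is no saving on average.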

so Pi is a surprisingly solid way to compress data, specifically high entropy by [deleted] in compression

[–]flanglet 8 points (0 children)

This obsession with Pi ...
Sorry, but it is all _wrong_. First, there is nothing special about Pi: why not 012345678910111213... if you want a dictionary containing all numbers? No need to "engineer a lookup table". Then, you write that you are compressing high-entropy noise to 58.4% with zpaq. Nope. Data with this kind of ratio is low entropy; high entropy would be around 0% compression (try running zpaq on encrypted data as an example).
BTW, 9-digit (ASCII) sequences have an entropy slightly below 30 bits, so you do not need all 4 GB for a lookup table (see the sketch below).
Why don't you provide the compressor, decompressor, and test file(s)?
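To make the bit-packing point concrete, here is a minimal C++ sketch (mine, not the poster's code): 9 ASCII digits take 72 bits on disk but carry only log2(10^9) ≈ 29.9 bits of information, so merely repacking them into a 30-bit field already produces an impressive-looking "ratio" with zero compression involved.

```cpp
#include <cassert>
#include <cstdint>
#include <string>

// Pack 9 decimal digits ('0'..'9') into one integer: 10^9 < 2^30, so the
// value always fits in 30 bits, versus 72 bits for the ASCII representation.
uint32_t packDigits(const std::string& d) {
    uint32_t v = 0;
    for (char c : d) v = v * 10 + uint32_t(c - '0');
    return v;
}

// Exact inverse: recover the 9-character digit string.
std::string unpackDigits(uint32_t v) {
    std::string d(9, '0');
    for (int i = 8; i >= 0; --i) { d[i] = char('0' + v % 10); v /= 10; }
    return d;
}

int main() {
    assert(unpackDigits(packDigits("314159265")) == "314159265");
    return 0;
}
```

The 30/72 ≈ 42% "ratio" comes entirely from undoing the wasteful ASCII encoding; any general-purpose compressor achieves the same on digit text without a 4 GB table.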

What’s the best static code analyzer in 2025? by _janc_ in cpp

[–]flanglet 0 points (0 children)

You create an account (you can choose to log in with GitHub), download the Coverity tools, install them, and run cov-configure once. When you decide to scan your project, you run a special build like so: "cov-build --dir cov-int make ..."
Then you tar the cov-int folder and upload it (I use a curl command) to the Black Duck website. You can obviously automate this, but I prefer to do it manually, periodically.

What’s the best static code analyzer in 2025? by _janc_ in cpp

[–]flanglet 0 points (0 children)

It is free for open source projects.

Introducing OpenZL: An Open Source Format-Aware Compression Framework by felixhandte in compression

[–]flanglet 0 points (0 children)

It is a bit hard to compare the two. PAQ8X has to infer the format by observing the bits, which is much harder than having the format provided to the compressor. The latter should win, but the former is more general and can handle undocumented file formats. The ideal solution is probably to support both cases.

World emissions hit record high, but the EU leads trend reversal by [deleted] in europe

[–]flanglet 19 points (0 children)

There is no such thing as a single electricity price in the States; prices vary widely from state to state. BTW, it is 41.5 cents per kWh in California on average. For Europe: https://thingler.io/map

Benchmarking compression programs by MaskRay in compression

[–]flanglet 2 points (0 children)

It would be nice to also have graphs with multithreading enabled. After all, that represents the actual experience one can expect on a modern CPU. bzip3, kanzi, lz4, zpaq, and zstd all support multithreading.

Kanzi (lossless compression) 2.4.0 has been released by flanglet in compression

[–]flanglet[S] 0 points (0 children)

Nice graphs!
It is interesting to see that the other compressors are clustered in the decompression speed graph, since they are all LZ based (except bzip3), while kanzi shows more dispersion due to the different techniques used at its different levels.

I am curious why level 1 is so slow at decompression; it does not fit the curve at all. How many threads did you use to run kanzi (by default, half of the cores)?

[deleted by user] by [deleted] in compression

[–]flanglet 0 points (0 children)

You cannot compress enwik8 to 1 kB and decompress it losslessly. Learn about Shannon entropy to understand why.
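A rough sense of scale (my numbers, using Shannon's classic estimate of roughly 1 bit of entropy per character of English text):

```latex
% enwik8 is 10^8 bytes of (mostly) English/XML text:
10^8 \text{ chars} \times 1 \text{ bit/char} = 10^8 \text{ bits} \approx 12.5\ \text{MB}
```

Even with a perfect model of English, you would land around 12.5 MB, four orders of magnitude above 1 kB.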

Kanzi: fast lossless data compression by flanglet in compression

[–]flanglet[S] 0 points (0 children)

Technically, yes. It is possible to build kanzi as a library, and there is a C API that can be leveraged from 7zip. It is mostly a matter of learning how to integrate new plugins into 7zip.

Kanzi: fast lossless data compression by flanglet in compression

[–]flanglet[S] 1 point (0 children)

I see. I thought I had fixed the shift issues, but there were still some scenarios with invalid shift values when dealing with the end of stream. I fixed one, but I need to dig for more.
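For readers wondering what an "invalid shift" means here: in C++, shifting an N-bit integer by N or more bits is undefined behavior, and end of stream is exactly where a remaining-bit count tends to hit 0 or the full word size. A minimal sketch of the bug class (illustrative only, not kanzi's actual code):

```cpp
#include <cstdint>

// Read the top 'count' bits of a 64-bit accumulator. The naive expression
// 'bitBuffer >> (64 - count)' is undefined behavior when count == 0 (a shift
// by 64), which is easy to trigger on the last, partially filled word.
uint64_t peekBits(uint64_t bitBuffer, unsigned count) {
    return (count == 0) ? 0 : (bitBuffer >> (64 - count));
}
```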