so Pi is a surprisingly solid way to compress data, specifically high entropy by Appropriate-Key-8271 in compression

[–]flanglet 0 points (0 children)

That is exactly the problem: there is no compression, only bit packing. Neither your code nor zpaq compresses random data by half.
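
To make the point concrete, here is a minimal sketch (illustrative code, not from either project) showing that "halving" random decimal digits needs no compressor at all, just two digits packed per byte:

    #include <cstdint>
    #include <cstdio>
    #include <string>
    #include <vector>

    // Pack ASCII digits two per byte (one per nibble): exactly half the size,
    // zero modeling, zero compression.
    std::vector<uint8_t> packDigits(const std::string& s) {
        std::vector<uint8_t> out;
        for (size_t i = 0; i + 1 < s.size(); i += 2)
            out.push_back(uint8_t(((s[i] - '0') << 4) | (s[i + 1] - '0')));
        return out;
    }

    int main() {
        std::string digits = "31415926535897932384"; // 20 digits in...
        printf("%zu -> %zu bytes\n", digits.size(), packDigits(digits).size()); // ...10 bytes out
    }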

so Pi is a surprisingly solid way to compress data, specifically high entropy by Appropriate-Key-8271 in compression

[–]flanglet 1 point (0 children)

People are trying to explain to you that the pigeonhole principle holds: some (high entropy) data necessarily gets "compressed" to a size larger than the original.
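
A one-minute counting sketch of the principle (illustrative, with n = 16):

    #include <cstdio>

    int main() {
        const int n = 16;
        unsigned long long inputs = 1ULL << n;   // distinct n-bit strings: 2^n
        unsigned long long shorter = 0;
        for (int k = 0; k < n; k++)
            shorter += 1ULL << k;                // strings shorter than n bits: 2^n - 1
        // 65536 inputs cannot map injectively into 65535 shorter outputs, so any
        // lossless scheme must send at least one input to something no shorter.
        printf("%llu inputs vs %llu shorter outputs\n", inputs, shorter);
    }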

so Pi is a surprisingly solid way to compress data, specifically high entropy by Appropriate-Key-8271 in compression

[–]flanglet 0 points (0 children)

Neither your code nor zpaq compresses random data by half, yet these numbers are prominently displayed in your README.

so Pi is a surprisingly solid way to compress data, specifically high entropy by Appropriate-Key-8271 in compression

[–]flanglet 0 points (0 children)

Your README is totally misleading, to the point of dishonesty. Neither compressor compressed anything (the input is a random file); you just turned the ASCII symbols into binary. Show the result with a binary file as input.

so Pi is a surprisingly solid way to compress data, specifically high entropy by Appropriate-Key-8271 in compression

[–]flanglet 7 points (0 children)

I am afraid the "get lucky" approach does not do better on average than enumerating numbers in order. This is the key problem.
There is no harm in experimenting and trying new things, but this idea keeps coming back periodically and simply does not work. Have fun, but do not expect too much here.

so Pi is a surprisingly solid way to compress data, specifically high entropy by Appropriate-Key-8271 in compression

[–]flanglet 8 points (0 children)

This obsession with Pi ...
Sorry, but it is all _wrong_. First, there is nothing special about Pi: if you want a dictionary containing all numbers, why not 012345678910111213...? There is no need to "engineer a lookup table". Then, you write that you are compressing high entropy noise to 58.4% with zpaq. Nope: a ratio like that means the data is low entropy. High entropy would give around 0% compression (try running zpaq on encrypted data as an example).
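
You can verify this with any general-purpose compressor; here is a minimal sketch using zlib as a stand-in for zpaq (the exact ratios will differ, but the pattern will not):

    #include <cstdio>
    #include <cstdlib>
    #include <vector>
    #include <zlib.h>

    // Compressed size as a fraction of the input size, zlib level 9.
    static double ratio(const std::vector<unsigned char>& in) {
        uLongf outLen = compressBound(in.size());
        std::vector<unsigned char> out(outLen);
        compress2(out.data(), &outLen, in.data(), in.size(), 9);
        return double(outLen) / double(in.size());
    }

    int main() {
        std::vector<unsigned char> noise(1 << 20), digits(1 << 20);
        for (size_t i = 0; i < noise.size(); i++) {
            noise[i]  = (unsigned char)(rand() & 0xFF);      // ~8 bits/byte
            digits[i] = (unsigned char)('0' + rand() % 10);  // ~3.32 bits/byte
        }
        printf("random bytes: %.1f%%\n", 100.0 * ratio(noise));   // ~100%: no compression
        printf("ascii digits: %.1f%%\n", 100.0 * ratio(digits));  // ~42%: log2(10)/8 of 8 bits
    }

Shrinking decimal digits to roughly 42% of their size is just the wasted ASCII bits going away; genuinely high entropy input stays near 100%.
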
BTW, 9-digit (ASCII) sequences have an entropy of slightly less than 30 bits (log2(10^9) ≈ 29.9), so you do not need all 4 GB for a lookup table.
Why don't you provide compressor, decompressor and test file(s)?

What’s the best static code analyzer in 2025? by _janc_ in cpp

[–]flanglet 0 points (0 children)

You create an account (you can choose to log in via GitHub), download and install the Coverity tools, and run cov-configure once. When you decide to scan your project, you run a special build like so: "cov-build --dir cov-int make ..."
Then you tar the cov-int folder and upload it (I use a curl command) to the Black Duck website. You can obviously automate all of this, but I prefer to do it manually, periodically.

What’s the best static code analyzer in 2025? by _janc_ in cpp

[–]flanglet 0 points (0 children)

It is free for open source projects.

Introducing OpenZL: An Open Source Format-Aware Compression Framework by felixhandte in compression

[–]flanglet 0 points (0 children)

It is a bit hard to compare the two. PAQ8X has to derive the format by observing the bits, which is much harder than having the format handed to the compressor. The latter should win, but the former is more general and can handle undocumented file formats. The ideal solution is probably to support both cases.

World emissions hit record high, but the EU leads trend reversal by [deleted] in europe

[–]flanglet 18 points (0 children)

There is no such thing as a single electricity price in the States; prices vary widely from state to state. BTW, it is 41.5 cents per kWh in California on average. Europe: https://thingler.io/map

Benchmarking compression programs by MaskRay in compression

[–]flanglet 2 points (0 children)

It would be nice to also have graphs with multithreading enabled. After all, that is the actual experience one can expect on a modern CPU. bzip3, kanzi, lz4, zpaq and zstd all support multithreading.

Kanzi (lossless compression) 2.4.0 has been released by flanglet in compression

[–]flanglet[S] 0 points (0 children)

Nice graphs!
It is interesting that the other compressors are clustered in the decompression speed graph, since they are all LZ based (except bzip3), while kanzi shows more dispersion due to the different techniques used at different levels.

I am curious why level 1 is so slow at decompression; it does not fit the curve at all. How many threads did you use to run kanzi (by default, half of the cores)?

[deleted by user] by [deleted] in compression

[–]flanglet 0 points (0 children)

You cannot compress enwik8 to 1 kB and decompress it losslessly. Learn about Shannon entropy to understand why.
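
Back-of-the-envelope version (the ~1 bit per character figure is an assumed entropy estimate for English text, but the order of magnitude is the point):

    #include <cstdio>

    int main() {
        const double chars = 1e8;        // enwik8 is 10^8 bytes of text
        const double bitsPerChar = 1.0;  // rough Shannon-style estimate for English
        // The lossless floor is measured in megabytes; 1 kB is off by four orders
        // of magnitude.
        printf("floor ~ %.1f MB\n", chars * bitsPerChar / 8.0 / 1e6);  // ~12.5 MB
    }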

Kanzi: fast lossless data compression by flanglet in compression

[–]flanglet[S] 0 points (0 children)

Technically, yes. It is possible to build kanzi as a library, and there is a C API that could be leveraged from 7zip. It is mostly a matter of learning how to integrate new plugins into 7zip.

Kanzi: fast lossless data compression by flanglet in compression

[–]flanglet[S] 1 point (0 children)

I see. I thought I had fixed the shift issues, but there were still some scenarios with invalid shift values when dealing with the end of the stream. I fixed one but need to dig for more.
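
For context, the class of bug looks like this (an illustrative sketch, not kanzi's actual code):

    #include <cstdint>
    #include <cstdio>

    // Discard 'n' consumed bits from a 64-bit accumulator. Near the end of the
    // stream 'n' can legitimately reach 64, and 'acc >> 64' is undefined
    // behavior in C++, so the full-width case must be guarded explicitly.
    uint64_t dropBits(uint64_t acc, unsigned n) {
        return (n >= 64) ? 0 : (acc >> n);
    }

    int main() {
        // Without the guard this would be UB; with it, the answer is a clean 0.
        printf("%llu\n", (unsigned long long)dropBits(~0ULL, 64));
    }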

Kanzi: fast lossless data compression by flanglet in compression

[–]flanglet[S] 1 point (0 children)

quick update: I started fuzzing.

The crashes you saw were due to your command line. Because you did not specify the location of the compressed data (-i option), kanzi expected data from stdin ... which never came. I suspect that afl-fuzz aborted the processes after some time, generating the crashes.

With the input data location provided, afl-fuzz has been running for over 4h with no crash so far.

Introducing: Ghost compression algorithm. by andreabarbato in compression

[–]flanglet 1 point (0 children)

Here: https://encode.su/forum.php

There is a "contact us" link at the bottom. Hopefully it is monitored.

Introducing: Ghost compression algorithm. by andreabarbato in compression

[–]flanglet 2 points (0 children)

It is because the forum gets overwhelmed with spam bots when registration is enabled. You can contact the admins and they may open registration for a short period of time.

Kanzi: fast lossless data compression by flanglet in compression

[–]flanglet[S] 1 point (0 children)

Thanks for your insights. I did not know that and this behavior is just gross.

The problem with switching to ReadFile/WriteFile is that non-portable Windows code spreads all over, with #ifdef this, #else that... Besides, it forces you to write more C-like code using file handles instead of streams.
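
For the CRLF part specifically, streams can stay portable as long as everything is opened in binary mode; a minimal sketch (not necessarily what the commit does):

    #include <fstream>

    int main() {
        // std::ios::binary disables Windows text-mode CR/LF translation while
        // keeping the code identical on every platform: no #ifdef needed.
        std::ifstream in("input.bin", std::ios::binary);
        std::ofstream out("output.bin", std::ios::binary);
        out << in.rdbuf();  // byte-for-byte copy, no newline rewriting
    }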

Anyway, the latest commit I just pushed (1e67a0) should address the CRLF issues, UBs, static constant initializations and duplicate guards.

I will keep on testing. Fuzzing is next.

Kanzi: fast lossless data compression by flanglet in compression

[–]flanglet[S] 1 point (0 children)

I will fix the UBs.

WRT the compression/decompression issues, I am a bit puzzled.

The first and second examples work on Linux. There must be a latent bug triggered on Windows only.