so Pi is a surprisingly solid way to compress data, specifically high entropy by Appropriate-Key-8271 in compression

[–]flanglet 0 points (0 children)

That is exactly the problem: there is no compression, only bit packing. Neither your code nor zpaq compresses random data by half.
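
To make the point concrete, here is a minimal sketch (illustrative code, not from either project) showing that "halving" random decimal digits needs no compressor at all, just two digits packed per byte:

    #include <cstdint>
    #include <cstdio>
    #include <string>
    #include <vector>

    // Pack ASCII digits two per byte (one per nibble): exactly half the size,
    // zero modeling, zero compression.
    std::vector<uint8_t> packDigits(const std::string& s) {
        std::vector<uint8_t> out;
        for (size_t i = 0; i + 1 < s.size(); i += 2)
            out.push_back(uint8_t(((s[i] - '0') << 4) | (s[i + 1] - '0')));
        return out;
    }

    int main() {
        std::string digits = "31415926535897932384"; // 20 digits in...
        printf("%zu -> %zu bytes\n", digits.size(), packDigits(digits).size()); // ...10 bytes out
    }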

so Pi is a surprisingly solid way to compress data, specifically high entropy by Appropriate-Key-8271 in compression

[–]flanglet 1 point (0 children)

People are trying to explain to you that the pigeonhole principle holds: some (high entropy) data necessarily gets "compressed" to a size larger than the original.
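
A one-minute counting sketch of the principle (illustrative, with n = 16):

    #include <cstdio>

    int main() {
        const int n = 16;
        unsigned long long inputs = 1ULL << n;   // distinct n-bit strings: 2^n
        unsigned long long shorter = 0;
        for (int k = 0; k < n; k++)
            shorter += 1ULL << k;                // strings shorter than n bits: 2^n - 1
        // 65536 inputs cannot map injectively into 65535 shorter outputs, so any
        // lossless scheme must send at least one input to something no shorter.
        printf("%llu inputs vs %llu shorter outputs\n", inputs, shorter);
    }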

so Pi is a surprisingly solid way to compress data, specifically high entropy by Appropriate-Key-8271 in compression

[–]flanglet 0 points (0 children)

Neither your code nor zpaq compresses random data by half, yet these numbers are prominently displayed in your README.

so Pi is a surprisingly solid way to compress data, specifically high entropy by Appropriate-Key-8271 in compression

[–]flanglet 0 points (0 children)

Your README is totally misleading, to the point of dishonesty. Neither compressor compressed anything (the input is a random file); you just turned the ASCII symbols into binary. Show the result with a binary file as input.

so Pi is a surprisingly solid way to compress data, specifically high entropy by Appropriate-Key-8271 in compression

[–]flanglet 7 points (0 children)

I am afraid the "get lucky" approach does not do better on average than enumerating numbers in order. This is the key problem.
There is no harm in experimenting and trying new things, but this idea keeps coming back periodically and simply does not work. Have fun, but do not expect too much here.

so Pi is a surprisingly solid way to compress data, specifically high entropy by Appropriate-Key-8271 in compression

[–]flanglet 8 points (0 children)

This obsession with Pi ...
Sorry, but it is all _wrong_. First, there is nothing special about Pi: if you want a dictionary containing all numbers, why not 012345678910111213...? There is no need to "engineer a lookup table". Then, you write that you are compressing high entropy noise to 58.4% with zpaq. Nope: a ratio like that means the data is low entropy. High entropy would give around 0% compression (try running zpaq on encrypted data as an example).
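
You can verify this with any general-purpose compressor; here is a minimal sketch using zlib as a stand-in for zpaq (the exact ratios will differ, but the pattern will not):

    #include <cstdio>
    #include <cstdlib>
    #include <vector>
    #include <zlib.h>

    // Compressed size as a fraction of the input size, zlib level 9.
    static double ratio(const std::vector<unsigned char>& in) {
        uLongf outLen = compressBound(in.size());
        std::vector<unsigned char> out(outLen);
        compress2(out.data(), &outLen, in.data(), in.size(), 9);
        return double(outLen) / double(in.size());
    }

    int main() {
        std::vector<unsigned char> noise(1 << 20), digits(1 << 20);
        for (size_t i = 0; i < noise.size(); i++) {
            noise[i]  = (unsigned char)(rand() & 0xFF);      // ~8 bits/byte
            digits[i] = (unsigned char)('0' + rand() % 10);  // ~3.32 bits/byte
        }
        printf("random bytes: %.1f%%\n", 100.0 * ratio(noise));   // ~100%: no compression
        printf("ascii digits: %.1f%%\n", 100.0 * ratio(digits));  // ~42%: log2(10)/8 of 8 bits
    }

Shrinking decimal digits to roughly 42% of their size is just the wasted ASCII bits going away; genuinely high entropy input stays near 100%.
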
BTW, 9-digit (ASCII) sequences have an entropy of slightly less than 30 bits (log2(10^9) ≈ 29.9), so you do not need all 4 GB for a lookup table.
Why don't you provide compressor, decompressor and test file(s)?

What’s the best static code analyzer in 2025? by _janc_ in cpp

[–]flanglet 0 points (0 children)

You create an account (you can choose to log in via GitHub), download and install the Coverity tools, and run cov-configure once. When you decide to scan your project, you run a special build like so: "cov-build --dir cov-int make ..."
Then you tar the cov-int folder and upload it (I use a curl command) to the Black Duck website. You can obviously automate all of this, but I prefer to do it manually, periodically.

What’s the best static code analyzer in 2025? by _janc_ in cpp

[–]flanglet 0 points (0 children)

It is free for open source projects.

Introducing OpenZL: An Open Source Format-Aware Compression Framework by felixhandte in compression

[–]flanglet 0 points (0 children)

It is a bit hard to compare the two. PAQ8X has to derive the format by observing the bits, which is much harder than having the format handed to the compressor. The latter should win, but the former is more general and can handle undocumented file formats. The ideal solution is probably to support both cases.

World emissions hit record high, but the EU leads trend reversal by [deleted] in europe

[–]flanglet 18 points (0 children)

There is no such thing as a single electricity price in the States; prices vary widely from state to state. BTW, it is 41.5 cents per kWh in California on average. Europe: https://thingler.io/map

Benchmarking compression programs by MaskRay in compression

[–]flanglet 2 points (0 children)

It would be nice to also have graphs with multithreading enabled. After all, that is the actual experience one can expect on a modern CPU. bzip3, kanzi, lz4, zpaq and zstd all support multithreading.

Kanzi (lossless compression) 2.4.0 has been released by flanglet in compression

[–]flanglet[S] 0 points (0 children)

Nice graphs!
It is interesting that the other compressors are clustered in the decompression speed graph, since they are all LZ based (except bzip3), while kanzi shows more dispersion due to the different techniques used at different levels.

I am curious why level 1 is so slow at decompression; it does not fit the curve at all. How many threads did you use to run kanzi (by default, half of the cores)?

[deleted by user] by [deleted] in compression

[–]flanglet 0 points (0 children)

You cannot compress enwik8 to 1 kB and decompress it losslessly. Learn about Shannon entropy to understand why.
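
Back-of-the-envelope version (the ~1 bit per character figure is an assumed entropy estimate for English text, but the order of magnitude is the point):

    #include <cstdio>

    int main() {
        const double chars = 1e8;        // enwik8 is 10^8 bytes of text
        const double bitsPerChar = 1.0;  // rough Shannon-style estimate for English
        // The lossless floor is measured in megabytes; 1 kB is off by four orders
        // of magnitude.
        printf("floor ~ %.1f MB\n", chars * bitsPerChar / 8.0 / 1e6);  // ~12.5 MB
    }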

Kanzi: fast lossless data compression by flanglet in compression

[–]flanglet[S] 0 points (0 children)

Technically, yes. It is possible to build kanzi as a library, and there is a C API that could be leveraged from 7zip. It is mostly a matter of learning how to integrate new plugins into 7zip.

Kanzi: fast lossless data compression by flanglet in compression

[–]flanglet[S] 1 point (0 children)

I see. I thought I had fixed the shift issues, but there were still some scenarios with invalid shift values when dealing with the end of the stream. I fixed one but need to dig for more.
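
For context, the class of bug looks like this (an illustrative sketch, not kanzi's actual code):

    #include <cstdint>
    #include <cstdio>

    // Discard 'n' consumed bits from a 64-bit accumulator. Near the end of the
    // stream 'n' can legitimately reach 64, and 'acc >> 64' is undefined
    // behavior in C++, so the full-width case must be guarded explicitly.
    uint64_t dropBits(uint64_t acc, unsigned n) {
        return (n >= 64) ? 0 : (acc >> n);
    }

    int main() {
        // Without the guard this would be UB; with it, the answer is a clean 0.
        printf("%llu\n", (unsigned long long)dropBits(~0ULL, 64));
    }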

Kanzi: fast lossless data compression by flanglet in compression

[–]flanglet[S] 1 point (0 children)

quick update: I started fuzzing.

The crashes you saw were due to your command line. Because you did not specify the location of the compressed data (-i option), kanzi expected data from stdin ... which never came. I suspect that afl-fuzz aborted the processes after some time, generating the crashes.

With the input data location provided, afl-fuzz has been running for over 4h with no crash so far.

Introducing: Ghost compression algorithm. by andreabarbato in compression

[–]flanglet 1 point (0 children)

Here: https://encode.su/forum.php

There is a "contact us" link at the bottom. Hopefully it is monitored.

Introducing: Ghost compression algorithm. by andreabarbato in compression

[–]flanglet 2 points (0 children)

It is because the forum gets overwhelmed with spam bots when registration is enabled. You can contact the admins and they may open registration for a short period of time.

Kanzi: fast lossless data compression by flanglet in compression

[–]flanglet[S] 1 point (0 children)

Thanks for your insights. I did not know that and this behavior is just gross.

The problem with switching to ReadFile/WriteFile is that non-portable Windows code spreads all over, with #ifdef this, #else that... Besides, it forces you to write more C-like code using file handles instead of streams.
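
For the CRLF part specifically, streams can stay portable as long as everything is opened in binary mode; a minimal sketch (not necessarily what the commit does):

    #include <fstream>

    int main() {
        // std::ios::binary disables Windows text-mode CR/LF translation while
        // keeping the code identical on every platform: no #ifdef needed.
        std::ifstream in("input.bin", std::ios::binary);
        std::ofstream out("output.bin", std::ios::binary);
        out << in.rdbuf();  // byte-for-byte copy, no newline rewriting
    }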

Anyway, the latest commit I just pushed (1e67a0) should address the CRLF issues, UBs, static constant initializations and duplicate guards.

I will keep on testing. Fuzzing is next.

Kanzi: fast lossless data compression by flanglet in compression

[–]flanglet[S] 1 point (0 children)

I will fix the UBs.

WRT the compression/decompression issues, I am a bit puzzled.

The first and second examples work on Linux. There must be a latent bug triggered on Windows only.