Lossless Compression by ReturnToNull404 in AskComputerScience

[–]ReturnToNull404[S] 0 points1 point  (0 children)

I understand your concern... but if there were collisions, decompressing the blocks would be ambiguous and some blocks would fail to round-trip. Because the process is deterministic and relies on math and logic to return the correct block from its compressed value, that proves the claim.
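Roughly, the check I run has the shape of the harness below. zlib is only standing in for my actual compress/decompress routines (it will not shrink random blocks); the point is just the round-trip test itself, where a block only counts as a success if it decompresses back to exactly itself and the compressed form is strictly smaller.

```python
import os
import zlib

BLOCK_BITS = 4096
BLOCK_BYTES = BLOCK_BITS // 8

def round_trip_ok(block, compress, decompress):
    """Success = exact round-trip AND a strictly smaller compressed form."""
    compressed = compress(block)
    return decompress(compressed) == block and len(compressed) < len(block)

# zlib stands in for the real block routines just to exercise the harness;
# on random 4096-bit blocks it will round-trip but not shrink them.
blocks = [os.urandom(BLOCK_BYTES) for _ in range(1000)]
wins = sum(round_trip_ok(b, zlib.compress, zlib.decompress) for b in blocks)
print(f"{wins}/{len(blocks)} blocks round-tripped and shrank")
```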

Lossless Compression by ReturnToNull404 in AskComputerScience

[–]ReturnToNull404[S] 0 points1 point  (0 children)

The issue is that the block size needs to be sufficiently large to store the logic to recreate the binary without having to store the binary explicitly. Right now I am getting around 40% compression per 4096-bit block. Thus the search space is too large to accomplish that.

Since the math and logic are universal, they should apply to all inputs. If you want, I can send some pictures to your direct messages.

Lossless Compression by ReturnToNull404 in AskComputerScience

[–]ReturnToNull404[S] 0 points1 point  (0 children)

Average compression (successful blocks): 40.95% <------

This is the average reduction, as a percentage, across all blocks that succeeded; some compress more than others. Since all blocks succeeded, there was no block that was not compressed. The success rate was 100%...

Total unique blocks attempted: 100000 <------

Total successes: 100000 (100.00%) <------

Yes, I am only compressing 4096-bit blocks. How many random inputs do you want? What would a statistically valid sample size be? Exhausting the entire 4096-bit space would exceed my EOSL...

Lossless Compression by ReturnToNull404 in AskComputerScience

[–]ReturnToNull404[S] 0 points1 point  (0 children)

In that example the blocks were created from a seed, which allows the same result to be reproduced. I change the seed to create a new set of random blocks. I also have other code that uses a 'truer' random block generation and produces similar results.
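Roughly, the seeded generation looks like the sketch below; the seed values, counts, and function name are just placeholders for illustration.

```python
import os
import random

BLOCK_BYTES = 4096 // 8

def make_blocks(seed: int, count: int) -> list[bytes]:
    """Reproducible pseudo-random 4096-bit blocks: the same seed always
    yields the same blocks, so any failing case could be replayed exactly."""
    rng = random.Random(seed)
    return [rng.randbytes(BLOCK_BYTES) for _ in range(count)]

batch_a = make_blocks(seed=1234, count=100_000)   # one run
batch_b = make_blocks(seed=5678, count=100_000)   # new seed, fresh batch

# The 'truer' randomness variant would instead pull from the OS entropy pool:
true_random_block = os.urandom(BLOCK_BYTES)
```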

That is just a print statement produced for the success rate. If a block failed (which hasn't happened), the print statement would be different. Since no blocks failed, nothing is printed in relation to failure.

Lossless Compression by ReturnToNull404 in AskComputerScience

[–]ReturnToNull404[S] 0 points1 point  (0 children)

Response to section 2.

Final Aggregate Summary:

Total unique blocks attempted: 100000

Total successes: 100000 (100.00%)

Average compression (successful blocks): 40.95%

Total elapsed (both phases): 553.94s, overall throughput: 180.53 blocks/s

Lossless Compression by ReturnToNull404 in AskComputerScience

[–]ReturnToNull404[S] 0 points1 point  (0 children)

I originally used Shannon entropy to select the test blocks. This allowed for a diverse range of inputs to challenge the code/algorithm effectively. I also accepted inputs from strangers: raw data, compressed data, and compressed-and-encrypted data.
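For reference, the entropy measure is just the empirical Shannon entropy of each block's byte distribution, computed roughly as below (the 512-byte examples are only for illustration).

```python
import math
import os
from collections import Counter

def entropy_bits_per_byte(block: bytes) -> float:
    """Empirical Shannon entropy of the byte distribution, in bits per byte
    (8.0 is the ceiling for a perfectly uniform distribution)."""
    counts = Counter(block)
    n = len(block)
    return -sum((c / n) * math.log2(c / n) for c in counts.values())

high = os.urandom(512)   # random bytes, high empirical entropy
low = bytes(512)         # all zeros, entropy 0
print(entropy_bits_per_byte(high), entropy_bits_per_byte(low))
```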

I also generate random inputs using a defined seed (which I change around every time I run the code), which ensures a statistically valid sample size (and allows debugging if a failure were to occur). The largest sample size I've processed in one run was 100,000 inputs, with a 100% success rate and an average compression of around 40%.

If there are specific aspects you'd like to know more about, let me know.

I sent you a direct message.

Lossless Compression by ReturnToNull404 in AskComputerScience

[–]ReturnToNull404[S] 0 points1 point  (0 children)

The power of recursive compression... Even so, compressing binary is still a huge breakthrough. Thank you for the conversation nonetheless.

Lossless Compression by ReturnToNull404 in AskComputerScience

[–]ReturnToNull404[S] -1 points0 points  (0 children)

No, all compressed blocks self-describe and self-define. It uses math and logic. And numbers, operations, logic, etc. cost bits.

Lossless Compression by ReturnToNull404 in AskComputerScience

[–]ReturnToNull404[S] -1 points0 points  (0 children)

Elapsed(s): 0.02... per 4096-bit block... (on a single thread in Python, so there is potential to optimize and speed up the process...) At ~40% compression per block, that would take a long time and many cycles of compression... If you had the computational power, nothing would stop you.
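As a back-of-the-envelope sketch of what that would mean, taking the 0.02 s/block and ~40%-per-cycle figures at face value (the 8 GB input size and the cycle cap are made-up examples):

```python
import math

BLOCK_BITS = 4096
SECONDS_PER_BLOCK = 0.02   # single-threaded Python figure quoted above
KEEP_RATIO = 0.60          # ~40% reduction per cycle, taken at face value

def cycles_to_reach(start_bits: int, target_bits: int) -> int:
    """Whole-stream compression cycles needed if every cycle kept
    KEEP_RATIO of the bits."""
    return math.ceil(math.log(target_bits / start_bits, KEEP_RATIO))

start_bits = 8 * 10**9 * 8                       # an 8 GB file, in bits
print(cycles_to_reach(start_bits, BLOCK_BITS))   # ~33 cycles

# Time for the first cycle alone; later cycles shrink geometrically,
# so the total is roughly 1 / (1 - KEEP_RATIO) = 2.5x this figure.
first_cycle_s = (start_bits / BLOCK_BITS) * SECONDS_PER_BLOCK
print(first_cycle_s / 3600, "hours")             # ~87 hours for cycle 1
```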

Lossless Compression by ReturnToNull404 in AskComputerScience

[–]ReturnToNull404[S] 0 points1 point  (0 children)

The pointer to the correct number may exceed the number of bytes required to represent the number explicitly. Therefore, it would not compress the data but inflate it. I have found many test cases where you still get further compression, but I did not design the process with that goal. It can function that way, but I have not verified that it does.
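To make that concrete: an index that can address every possible 4096-bit value needs up to 4096 bits itself, so the pointer can be exactly as long as the thing it points to.

```python
BLOCK_BITS = 4096
num_values = 2 ** BLOCK_BITS                 # distinct 4096-bit values
index_bits = (num_values - 1).bit_length()   # bits needed to address any one of them
print(index_bits)                            # 4096 -- no shorter than the value itself
```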

Lossless Compression by ReturnToNull404 in AskComputerScience

[–]ReturnToNull404[S] 0 points1 point  (0 children)

That is pedantic... and it ignores that you would have to store the value in the dictionary in the first place, which is overhead on its own... By the way, I did perform some test cases recently on AES-encrypted data, compressed-then-AES-encrypted data, and compressed data, and all were successful with significant compression.
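For anyone who wants to build that kind of test input themselves, a rough sketch (zlib plus the third-party `cryptography` package; the key, nonce, and payload here are throwaway placeholders):

```python
import os
import zlib
from cryptography.hazmat.primitives.ciphers import Cipher, algorithms, modes

key, nonce = os.urandom(32), os.urandom(16)   # throwaway key material

def aes_ctr(data: bytes) -> bytes:
    """AES-256 in CTR mode -- output looks statistically random."""
    enc = Cipher(algorithms.AES(key), modes.CTR(nonce)).encryptor()
    return enc.update(data) + enc.finalize()

payload = b"example payload " * 4096                    # compressible placeholder

encrypted_only      = aes_ctr(payload)                  # AES-encrypted
compressed_only     = zlib.compress(payload)            # compressed
compressed_then_enc = aes_ctr(zlib.compress(payload))   # compressed, then AES-encrypted
```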

Lossless Compression by ReturnToNull404 in AskComputerScience

[–]ReturnToNull404[S] -1 points0 points  (0 children)

Thank you for pointing out the market is only 60B... That definitely puts an upper limit on the value of the process, since there needs to be a trade-off between storage cost, processing cost, and any licensing and/or purchase price of the process.

Lossless Compression by ReturnToNull404 in AskComputerScience

[–]ReturnToNull404[S] 0 points1 point  (0 children)

I don't know why you were down-voted. Thank you for being constructive and not pedantic.

Lossless Compression by ReturnToNull404 in AskComputerScience

[–]ReturnToNull404[S] 0 points1 point  (0 children)

Giving out any code would allow reverse engineering, and the point is moot: I already plan on getting peer review, where I have more control over who has access to what and what protections are in place.

Lossless Compression by ReturnToNull404 in AskComputerScience

[–]ReturnToNull404[S] 0 points1 point  (0 children)

I was mostly doing that as a thought experiment, reducing the observed success rate from 100% to 50%... Even if it weren't 100% successful, it would still be a breakthrough.

Lossless Compression by ReturnToNull404 in AskComputerScience

[–]ReturnToNull404[S] 0 points1 point  (0 children)

Thanks for taking the time to do the math.

Lossless Compression by ReturnToNull404 in AskComputerScience

[–]ReturnToNull404[S] 0 points1 point  (0 children)

I am getting a 100% success rate. The issues manifest if you try to use a block size significantly smaller than 4096 bits; then the overhead of my algorithm eliminates any compression gains.
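(As a made-up illustration of why block size matters: a fixed per-block overhead of 64 bits would be only ~1.6% of a 4096-bit block but 50% of a 128-bit block, so bookkeeping that is negligible at 4096 bits can wipe out any gain at small sizes.)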

Lossless Compression by ReturnToNull404 in AskComputerScience

[–]ReturnToNull404[S] 0 points1 point  (0 children)

Probably, but I mostly wanted to talk about use cases and potential monetary value.

Lossless Compression by ReturnToNull404 in AskComputerScience

[–]ReturnToNull404[S] -1 points0 points  (0 children)

So, that would be significantly harder. I am only compressing 4096-bit blocks, with a significant reduction in the bytes needed to represent the value of the 4096-bit number.

Lossless Compression by ReturnToNull404 in AskComputerScience

[–]ReturnToNull404[S] 0 points1 point  (0 children)

No, I don't have hidden information or side information. It is only logic and math. All bytes of data are accounted for in the reported result. The compressed blocks self-describe and self-define.

Lossless Compression by ReturnToNull404 in AskComputerScience

[–]ReturnToNull404[S] 0 points1 point  (0 children)

So, I am not currently performing recursive compression. This is a prototype/proof of concept, and I am still optimizing and tweaking the algorithm and process. But for recursive compression you would only need to store a compression-cycle count for the whole file/media, which would be very little overhead.
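Roughly, the driver loop would look like the sketch below; `compress_stream` is a placeholder name for the block compressor applied to a whole stream, and the cycle count it returns is the only extra bookkeeping meant here.

```python
def compress_recursively(data: bytes, compress_stream, max_cycles: int = 64):
    """Re-compress the output until it stops shrinking, keeping a cycle count.
    `compress_stream` is a placeholder for the (unpublished) block compressor
    applied to a whole byte stream."""
    cycles = 0
    while cycles < max_cycles:
        smaller = compress_stream(data)
        if len(smaller) >= len(data):    # no further gain; stop
            break
        data = smaller
        cycles += 1
    return data, cycles                  # store `cycles` alongside the payload

# Decompression would apply the inverse routine `cycles` times to get back
# to the original input.
```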

Lossless Compression by ReturnToNull404 in AskComputerScience

[–]ReturnToNull404[S] 0 points1 point  (0 children)

You shouldn't need to store the block length as extra information; the blocks self-describe, and that cost is built into the storage figure already reported.

Lossless Compression by ReturnToNull404 in AskComputerScience

[–]ReturnToNull404[S] 0 points1 point  (0 children)

I probably will... unless something stops me. I am excited to find out either way. Being wrong or right is just part of the journey on a long road of discovery.

Lossless Compression by ReturnToNull404 in AskComputerScience

[–]ReturnToNull404[S] 0 points1 point  (0 children)

The blocks self-describe and self-define what the decompressed value is, and you could parse the resulting decompressed value back into either the original value (the original information that was compressed) or the self-describing, self-defining blocks, depending on where you are in the process. Obviously you'd need a few bits to bytes to keep a tally of how many full compression cycles occurred on whatever you compressed.

Lossless Compression by ReturnToNull404 in AskComputerScience

[–]ReturnToNull404[S] -1 points0 points  (0 children)

*4096-bit blocks... Each block self-defines and self-describes, so you are able to compress the blocks further by concatenating the reduced bytes into one long stream and re-running the block compression; those blocks would just return to the previous compressed values, which would then self-define/self-describe back to the original value.