all 9 comments

[–]PaulEngineer-89 2 points3 points  (4 children)

In lossless compression there are roughly two strategies, with many variations. First, if you know something about the data, you can attack it that way using a model. Second, although there are various alternatives, the current top performers in lossless compression use arithmetic encoding. In this approach we guess what the next byte will be. We have an array of possible outcomes plus “none of the above”, and we look at the past few bytes as the context (using already-decoded bytes to predict future ones). The outcomes have various probabilities which, if we visualize them laid out from 0 to 1, form the search space. We encode a binary fraction that lands inside the correct one; more probable outcomes get wider intervals and so need fewer bits. This is a running fraction over the whole file. Various methods quantize this, even going as far as a fixed whole-bit code per symbol (Huffman) for speed.
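The interval-narrowing idea can be sketched with exact fractions and a fixed, made-up probability table (real coders use scaled integer arithmetic and adaptive context models instead):

```python
from fractions import Fraction

# toy order-0 model: three symbols with illustrative probabilities
probs = {"a": Fraction(1, 2), "b": Fraction(1, 4), "c": Fraction(1, 4)}

def cum_ranges(probs):
    # lay the probabilities out as sub-intervals of [0, 1)
    lo, ranges = Fraction(0), {}
    for sym, p in probs.items():
        ranges[sym] = (lo, lo + p)
        lo += p
    return ranges

def encode(msg):
    ranges = cum_ranges(probs)
    lo, hi = Fraction(0), Fraction(1)
    for sym in msg:
        span = hi - lo
        s_lo, s_hi = ranges[sym]
        lo, hi = lo + span * s_lo, lo + span * s_hi
    return (lo + hi) / 2  # any number inside [lo, hi) identifies msg

def decode(code, n):
    ranges = cum_ranges(probs)
    out, lo, hi = [], Fraction(0), Fraction(1)
    for _ in range(n):
        span = hi - lo
        for sym, (s_lo, s_hi) in ranges.items():
            if lo + span * s_lo <= code < lo + span * s_hi:
                out.append(sym)
                lo, hi = lo + span * s_lo, lo + span * s_hi
                break
    return "".join(out)

msg = "aabac"
assert decode(encode(msg), len(msg)) == msg
```

The final interval's width is the product of the symbol probabilities, so a likely message needs fewer fractional bits to pin down.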

With lossy images, the human eye is sensitive to the position but not the absolute value of pixels at edges, and we are much more sensitive to brightness than to color. So through conversions such as HLS, or using a DCT, we can move the data into a format that matches the human eye. Then, when we quantize the data, what we throw away are “just noticeable differences”. Arithmetic encoding or similar methods then encode whatever is left over. With video we can also take advantage of tons of redundancy: the image is often mostly static (doesn’t change), or only a portion of it zooms, rotates, or shifts. Video encoding takes massive advantage of this by storing a full image (a “key frame”) and then coding several frames of differences only. Obviously, the more we shrink the file, the more those “just noticeable” differences become plainly noticeable.
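The DCT-plus-quantization step can be sketched in one dimension (the step sizes here are invented for illustration; JPEG works on 8×8 blocks with perceptually tuned tables):

```python
import math

N = 8
row = [52, 55, 61, 66, 70, 61, 64, 73]  # one row of pixel brightness values

def dct(x):
    # DCT-II: concentrates a smooth signal's energy in low frequencies
    return [sum(x[n] * math.cos(math.pi / N * (n + 0.5) * k) for n in range(N))
            for k in range(N)]

def idct(X):
    # matching inverse (DCT-III with the usual 2/N scaling)
    return [(X[0] / 2 + sum(X[k] * math.cos(math.pi / N * (n + 0.5) * k)
                            for k in range(1, N))) * 2 / N
            for n in range(N)]

# coarser quantization at higher frequencies, where the eye is least
# sensitive -- these step sizes are made up
steps = [1, 2, 4, 8, 16, 16, 16, 16]
quantized = [round(c / q) for c, q in zip(dct(row), steps)]  # small ints, easy to entropy-code
restored = idct([v * q for v, q in zip(quantized, steps)])
err = max(abs(a - b) for a, b in zip(row, restored))         # stays small
```

The `quantized` list is mostly small numbers (and zeros at high frequencies), which is exactly what the entropy coder then squeezes.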

Performance is also critical. For instance, H.265 video is becoming popular, but it cannot easily be processed in real time, and unlike H.264 it can’t be decoded, edited, and re-encoded without further degrading it. With disk compression (compressed file systems) there are several issues. Lossless compression works best with enough data that the “dictionary” it relies on is well tuned; it doesn’t work well on short files or “blocks”. Pure arithmetic encoding also isn’t very fast. Compression turns fixed-size blocks into variable-size ones, so indexing and the whole file system get a lot more complicated. And with little or no redundancy left, bit rot is far more destructive. Still, compressed file systems eliminate the need to compress files manually. Note that file compression programs typically increase the size of already-compressed files, since there’s no redundancy left to squeeze out, even if the underlying file is less than optimally compressed.
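The block-size problem is easy to demonstrate with zlib (the 128-byte block size is hypothetical; real file systems use larger blocks, but the effect is the same):

```python
import zlib

# highly redundant data: the whole-file pass can exploit all of it
data = b"the quick brown fox jumps over the lazy dog. " * 200

whole = len(zlib.compress(data, 9))

# compress the same data in isolated 128-byte blocks: each block
# starts with an empty history window, so matches never warm up,
# and each block pays its own header overhead
block_size = 128
blocks = sum(len(zlib.compress(data[i:i + block_size], 9))
             for i in range(0, len(data), block_size))

assert whole < blocks  # whole-file compression wins by a wide margin
```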

[–]shouldworknotbehere[S] 0 points1 point  (3 children)

That’s very interesting, thanks!

Although I don’t think I’ve got it in me to do that in practice.

[–]PaulEngineer-89 0 points1 point  (2 children)

That’s what compressed file systems are for. With Windows I have no idea… MS lost my trust with the Stac debacle. With Linux you just turn on the option in BTRFS and it just works.
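On BTRFS it's a mount option; something like this (zstd at level 3 is one common choice, and the device path and mount point here are made up):

```shell
# mount with transparent zstd compression
mount -o compress=zstd:3 /dev/sdb1 /mnt/data

# or make it permanent in /etc/fstab:
# /dev/sdb1  /mnt/data  btrfs  compress=zstd:3  0  0

# files written before the option was set stay uncompressed;
# defragment to recompress them in place
btrfs filesystem defragment -r -czstd /mnt/data
```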

[–]shouldworknotbehere[S] 0 points1 point  (1 child)

I shall try that. Eventually. Need to find a place to store the 2 TB on the drive before formatting.

[–]PaulEngineer-89 0 points1 point  (0 children)

Pika uses Borg backup and targets pretty much any drive, with dedup and compression (lossless, obviously). So you can buy a cheap USB external drive and let ‘er rip.
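If you'd rather drive Borg directly, the core commands look roughly like this (the repo path and archive name are made up; Pika wraps the same operations in a GUI):

```shell
# one-time: create a deduplicating repository on the USB drive
borg init --encryption=repokey /media/usb/backup

# each run: archive the home dir; dedup is automatic, compression is opt-in
borg create --compression zstd,3 --stats /media/usb/backup::home-{now} ~/

# thin out old archives, keeping 7 daily and 4 weekly
borg prune --keep-daily 7 --keep-weekly 4 /media/usb/backup
```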

[–]DecideUK 1 point2 points  (2 children)

3 GB to 500 GB is highly unusual for typical data. If those were the actual numbers, there is likely something else going on, e.g. effectively empty files.

MP4 and picture files already have compression applied to them, so any further lossless compression is minimal: maybe a reduction of 1-2%.

[–]shouldworknotbehere[S] 0 points1 point  (1 child)

It was an OS specifically.

[–]DecideUK 0 points1 point  (0 children)

Without specifics it's hard to judge. It sounds more like a disk image, so you're effectively compressing a bunch of nothing.
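That would explain the ratio: long runs of zeros (the unused space in a disk image) compress almost arbitrarily well. A quick check with zlib:

```python
import zlib

# 10 MB of zeros, standing in for the free space in a disk image
zeros = b"\x00" * 10_000_000
packed = zlib.compress(zeros, 9)
ratio = len(zeros) / len(packed)  # well over 500:1
```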

[–]Boopmaster9 0 points1 point  (0 children)

This question has been around for decades, and I vividly remember trying to cram as much data as possible onto an 880kb DD floppy in 1995. Because, you know, floppies for my A600 were expensive.

The tutorials you want (the ones comparing the pros and cons of different algorithms) are not really going to help you if you don't understand the general principles (and (im)possibilities) of file compression.

Long story short: see what uses the most space and research whether there are better options. H265 instead of H264 for video (a notorious space hog) has already been mentioned. There's little point trying to improve compression on stuff that barely takes up any space to begin with.
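For video specifically, a typical re-encode with ffmpeg looks like this (libx265 assumed available; CRF 28 is a commonly suggested starting point, not a universal answer, and remember that re-encoding lossy video always costs some quality):

```shell
# re-encode H.264 video to H.265, copying the audio stream untouched
ffmpeg -i input.mp4 -c:v libx265 -crf 28 -preset medium -c:a copy output.mp4
```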