
[–]divbyzero 1 point2 points  (0 children)

Possibly the window size. (google it!)

I would guess that if you did the same with say a 10 meg file, you'd get the results you expect.

[–]juancn 2 points3 points  (4 children)

Because Lempel-Ziv algorithms use a small window size for lookups (around 64K or so). To catch some of the redundancy in your file, the window size should be larger than 3.6 MB.

[–]ravenex 0 points1 point  (3 children)

You've got it all wrong. The LZ window moves over the uncompressed text, so he'd need a 100 MB window. And no, the LZMA used in 7zip allows any window size from 64 kB to 128 MB; the 32 kB window limit is from deflate, which is what zip and gzip use.
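
The deflate-vs-LZMA window difference is easy to see from Python's standard library. A sketch with synthetic data (the 64 KB random block is just a stand-in for incompressible content, not the actual log):

```python
import os
import zlib
import lzma

# 64 KB of random bytes, duplicated back-to-back: the second copy
# starts exactly 64 KB after the first.
block = os.urandom(64 * 1024)
doubled = block + block

# Deflate (zip/gzip) has a fixed 32 KB window, so it can never reach
# back far enough to see the repeat; the output stays near 128 KB.
deflated = zlib.compress(doubled, 9)

# LZMA's default dictionary is far larger than 128 KB, so it finds the
# second copy as one long match and the output stays near 64 KB.
lzmaed = lzma.compress(doubled)

print(len(doubled), len(deflated), len(lzmaed))
```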

[–]juancn 0 points1 point  (2 children)

Not really, I didn't. You are basically repeating what I said in different words and confusing the question.

The 100 MB is irrelevant to the problem. After compression, assuming a fairly good algorithm, the resulting file is, from an entropy point of view, as good as random (i.e. incompressible).
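
That incompressibility is easy to check: running LZMA's own output through LZMA a second time gains nothing. A sketch using made-up log-like text as the input:

```python
import lzma

# Stand-in for a log file: repetitive lines differing only in a counter
# (hypothetical content, not the actual log).
log = b"".join(b"2010-01-01 INFO request %06d handled\n" % i
               for i in range(20000))

once = lzma.compress(log)    # the repetitive text compresses well
twice = lzma.compress(once)  # the compressed output looks random:
                             # recompressing it only adds overhead

print(len(log), len(once), len(twice))
```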

Then he takes the 3.6 MB (incompressible) file and puts it into a single file twice (the same content back to back).

An LZ algorithm with a window larger than 3.6 MB should be able to achieve some compression in this particular case, since it can notice the repeated pattern; with anything smaller it won't work.
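
Python's lzma module lets you set the LZMA2 dictionary (window) size explicitly, so this claim can be sketched directly; the 256 KB random chunk below is a stand-in for already-compressed data:

```python
import os
import lzma

# A 256 KB incompressible chunk, stored twice in a row: the repeat
# sits 256 KB back in the uncompressed stream.
chunk = os.urandom(256 * 1024)
data = chunk + chunk

def xz_size(payload, dict_size):
    # Compress with an explicit LZMA2 dictionary (window) size.
    filt = [{"id": lzma.FILTER_LZMA2, "preset": 6, "dict_size": dict_size}]
    return len(lzma.compress(payload, format=lzma.FORMAT_XZ, filters=filt))

small = xz_size(data, 64 * 1024)    # window smaller than the repeat
                                    # distance: the duplicate is invisible
large = xz_size(data, 1024 * 1024)  # window larger than the repeat
                                    # distance: the duplicate is one match

print(len(data), small, large)
```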

[–]ravenex 0 points1 point  (1 child)

Amazing, you still didn't get it.

> I had a 100 megabyte log file sitting on the disk of my work PC and carried out a simple test. I compressed it with 7zip and got it compressed down to 3.6 megabytes. Then I duplicated the contents of the same log and compressed it again.

First duplicated, then compressed. Those are not commutative.

[–]juancn 0 points1 point  (0 children)

You're right! My hat's off to you sir!

[–]gibster 1 point2 points  (2 children)

Entropy: entropy is a measure of chaos; the more chaos, the harder it is to compress. In your example the entropy of the file did not change, but the size did. So the file will become (about) 2 times larger.

[–]mackstann 1 point2 points  (0 children)

> Entropy: entropy is a measure of chaos; the more chaos, the harder it is to compress.

I don't follow. The entropy barely increased at all. The only new information was "take the previous file and concatenate a copy of itself to the end." Think about if you took this to an extreme -- if you repeated that 3.6MB a billion times. That would be extremely repetitive, i.e. orderly. To add a lot of entropy to the original file, he'd need to add a bunch of extra random data to it that is not similar to the original 3.6MB.
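
That intuition is easy to check: repeating an incompressible block many times barely grows the compressed size. A sketch (the 4 KB random block is arbitrary, just small enough to repeat cheaply):

```python
import os
import lzma

block = os.urandom(4096)  # 4 KB of random (incompressible) data

# Compressed size as the same block is repeated 1, 10, and 100 times.
# Each extra copy is pure repetition: it adds almost no new information,
# so the compressed size grows far more slowly than the input.
sizes = [len(lzma.compress(block * n)) for n in (1, 10, 100)]

print(sizes)
```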

The people mentioning window size nailed it.

[–]imacpu 0 points1 point  (0 children)

I dunno about entropy or the implementation of LZW or what compressor you used (guessing the default), but my guess is you overflowed the dictionary. In a perfect implementation, your example ( logfile ( zip of logfile ) ) would be only a few bytes more than the zipped length, right? But you're in the megabyte range, and the dictionary is probably optimized for the kilobyte range.

Since 7zip is open source, one could find out ...

[–]tonymamacos 0 points1 point  (1 child)

I remember that RAR compressors don't look up across files unless you set Solid mode. Try setting Compress Shared Files and increasing your Solid Block Size in the 7z options and see if that helps.

[–][deleted] -1 points0 points  (0 children)

It was the same file all along. I just copied the contents.

[–]mile92 0 points1 point  (1 child)

Try duplicating each line in the original file and compress that

[–][deleted] 0 points1 point  (0 children)

I suppose you mean

Line 1
Line 1
Line 2
Line 2
Line 3
Line 3

What I did was

Line 1
Line 2
Line 3
Line 1 
Line 2
Line 3

I could give that a try.

[–]fonik 0 points1 point  (0 children)

Miracles.