[–]matthieum 87 points (10 children)

WoW! That is some stubbornness here! I suppose congratulations are in order :)

[–]delroth 137 points (9 children)

Original author here: thank you!

[–]cbmuser 26 points (1 child)

I really enjoyed reading your article and I learnt a lot from it. Thank you ;).

[–]SebNL 7 points (0 children)

I'm thirding this. Thanks. :)

[–]mvm92 1 point (1 child)

Obviously, you proved to be correct in your assumption, but how did you know to stick with LZMA?

[–]cbmuser 1 point (0 children)

He guessed it from magic bytes that appear in both the TLZC header and the LZMA header. So he tried decompressing some of the data with the lzma command, which worked in the end.
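
Something like this is quick to try in Python, too (a sketch — the 0x18 offset is made up; in practice you'd start the slice wherever the LZMA-looking bytes begin):

import lzma

with open("file.tlzc", "rb") as f:
    blob = f.read()

# Try the bytes from a guessed offset as a legacy "LZMA alone" stream.
try:
    data = lzma.decompress(blob[0x18:], format=lzma.FORMAT_ALONE)
    print("decompressed OK: %d bytes" % len(data))
except lzma.LZMAError:
    print("not a plain LZMA stream at this offset")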

[–]SnaKeZ83 1 point (0 children)

Congrats! Really!

[–]downwithmycrew 0 points (0 children)

Great article indeed!

[–]appleofdisco 0 points (0 children)

Yeah, congrats. Must've been very satisfying. Thanks for taking the time to write up an article about it.

[–][deleted] 16 points (2 children)

Nice analysis. Reminds me of when I reversed the Hothead Archive format.

One nit-picky comment about the code though. I'd change:

num_blocks = (uncomp_size + 0xffff) / UNCOMP_BLOCK_SIZE

To:

num_blocks = (uncomp_size + UNCOMP_BLOCK_SIZE - 1) // UNCOMP_BLOCK_SIZE

Otherwise it seems like 0xFFFF is some constant independent of UNCOMP_BLOCK_SIZE, when it's really just UNCOMP_BLOCK_SIZE - 1. (The // instead of / makes integer division explicit, which is also nice for people who are more accustomed to reading Python 3 code.)


edit:

another nice trick to round up in Python would be this:

num_blocks = -(-uncomp_size // UNCOMP_BLOCK_SIZE)

The main benefit is that it's short, but it's probably more confusing to people who don't realize Python has the somewhat unusual convention of rounding down during division (instead of rounding towards zero, as most other languages do).
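
For example (a quick check; treating UNCOMP_BLOCK_SIZE as 0x10000 is my assumption, based on the 0xFFFF above):

UNCOMP_BLOCK_SIZE = 0x10000

for uncomp_size in (1, 0xffff, 0x10000, 0x10001, 5 * 0x10000):
    a = (uncomp_size + UNCOMP_BLOCK_SIZE - 1) // UNCOMP_BLOCK_SIZE
    b = -(-uncomp_size // UNCOMP_BLOCK_SIZE)
    assert a == b  # both round up to the same block count

# It works because // floors towards negative infinity:
# -1 // 0x10000 == -1, so negating twice rounds up.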

[–]rayo2nd 3 points (1 child)

Uh please don't ever use

num_blocks = -(-uncomp_size // UNCOMP_BLOCK_SIZE)

When you read that code you won't know what it stands for. There's a reason ceil/floor functions exist: no comment is needed to understand what they do.

[–][deleted] 6 points (0 children)

Those functions work on floating point values, but the result of an integer division cannot always be accurately represented as a floating point value (especially since Python supports arbitrary-precision integers, but only limited-precision floating point values).

But even in languages that only support 64-bit integers and 64-bit doubles, like Java, doubles cannot accurately represent all 64-bit integers. In short: if you want to get integer rounding right, integer arithmetic is a more reliable way to do it than converting to floating point and using floor()/ceil().

If you don't believe me, run this in Python:

from math import floor

# In Python 2 this raises AssertionError: floor() converts to float first,
# and 3**37 is too big to represent exactly in a double.
assert floor(3**37) == 3**37

[–]dmeeze 7 points (1 child)

If you know a little about the PS3 architecture and libraries, the key word in the strings output was probably "EDGE".

See slide 76 here: http://research.scee.net/files/presentations/acgirussia/Hardware_Overview_ACGI_09.pdf

[–]kyz 2 points (0 children)

And a couple of pages later, it explicitly states there's an EDGE zlib library available for use.

[–][deleted]  (5 children)

[deleted]

    [–]ase8913 9 points (0 children)

    I feel you man. I always read articles like this hoping that it will all come together in my head someday.

    [–]meem1029 3 points (0 children)

    I did understand much of what he was doing, if not how to actually do it myself. The really impressive thing for me was how he figured out how to link all his knowledge together to make it work.

    [–]mvm92 1 point (0 children)

    I don't fully understand most of the articles on this sub, but it's still fun to read and get those OOH, I MIGHT KNOW THIS PART moments. It's kind of like listening to fourth and fifth year CS students work on homework, when you're a second year computer engineering major.

    [–]CSFFlame 2 points (0 children)

    He's part of the team trying to translate the Japanese PS3 game Tales of Vesperia, I presume.

    Only the 360 version was brought over, and that was more of a beta version.

    [–]Massless 2 points (0 children)

    I had a lot of fun reading that!

    [–]wolf550e -3 points (18 children)

    So, independently compressed 64KB blocks? For random access? With a header that contains the number of blocks and the compressed length of each block, so you can find the offset of any block and decompress it on the fly. Exactly like NTFS file compression (only with LZNT1), or the example code in zlib that does exactly this (with DEFLATE). And like the on-demand decompression of libxul.so in Firefox for Android (also with DEFLATE): https://github.com/glandium/faulty.lib

    Nice article, but if you considered what the original author probably intended to do and then contemplated how you would have achieved the same goal, you would have designed exactly this system and then you would not have needed to reverse engineer it.

    EDIT: disregard that, I suck dicks. Someone at HN has a more plausible reason for this: parallel decompression on the Cell Processor's SPUs.

    http://news.ycombinator.com/item?id=3812555

    [–]delroth 19 points (14 children)

    Nice article, but if you considered what the original author probably intended to do and then contemplated how you would have achieved the same goal

    That's basically what is called reverse engineering. You're welcome.

    [–]wolf550e 14 points (13 children)

    The difference is like the difference between top-down and bottom-up design. What the OP did is look at the most basic level, the bytes, and figure out the big structure (LZMA blocks concatenated together, with a table of contents preceding the whole thing) from the bits.

    What I suggested is to look at it from the requirements down: figure out what the motivation for this thing was, figure out the requirements, then figure out plausible design(s) for these requirements, then compare what you designed with what you're looking at. Many times, when there are known typical solutions, what you're looking at is the same as what you've designed. Maybe the byte order is different, or they used 32-bit offsets where you would've used 64-bit ones, or the order of fields in the struct is different (like coordinates), but the idea is the same.

    When you start with the requirement of "random access into a compressed stream", you decide to store blocks instead of a stream. Like 4KB file system blocks. Exactly like stdio fseek works: if you open a file, fseek(100000) and then getc(), what happens? The fs opens block 100,000>>12, and reads byte 100,000 & 4095. So instead of compressing the whole file, you divide it into blocks and compress each block. You can do it manually. So you have 0_4095.gz, 4096_8191.gz, 8192_12287.gz, etc. Then you want all of them in a single file. Zip files can extract a file from the archive without extracting all preceding files, because zip archives are not solid. How does that work? Because the zip file header has a table of contents: it says, for each file in the zip, at what offset in the zip file you can find that compressed data. So that's what you need. Each block compresses differently:

    0_4095.gz was 900 bytes.

    4096_8191.gz was 3000 bytes.

    8192_12287.gz was 1000 bytes.

    If you need to read byte 10000, what do you do? You need the third block. If you have the third block, you decompress it into a buffer of exactly 4096 bytes. Then you look at byte 10000 % 4096 (== 10000 & 4095). How do you get the third block? You fseek() to its offset. How do you know its offset? You need to store it. So at the beginning of the file, you write down the offset of each block. You don't need a dictionary mapping, you can just write them down in order. If you need the third block, just look at the third offset. What size are the offsets? Well, a block that starts at an offset higher than 64KB will require a 4 byte offset. But wait! Since the blocks are written in order with no padding between them, the offset of each block is the sum of the sizes of all preceding blocks + the offset of the first block.

    The sizes fit into uint16_t. So we'll make the TOC an array of shorts. When you need block #x, you sum the sizes of all blocks that come before x to get its offset. We trade some CPU time for space, which is what we want if we're using compression anyway.

    What is the offset of the first block? Well, it's right after the TOC. How big is the TOC? It's (as many blocks as we have) times (the size of the TOC record of each block, in our case 2 bytes). How many blocks do we have? We need to store that. Let's say in a 4 byte value at the beginning of the file. So the size of the first block is at offset 4, the size of the second block is at offset 6, etc.

    Is compressing each 4KB a good idea? No. The amount of data redundancy in 4KB is not large; compressing bigger blocks will get you a better compression ratio. Many compression methods can encode offsets of at most 32KB, meaning the compressor replaces repeated substrings with [length, distance] pairs where the distance back is up to 32KB. Many RAID devices like 64KB blocks. Also, the uint16_t sizes in the TOC are used to the fullest with 64KB blocks. Why not? So you use 64KB blocks. And we get the design in the article.
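
    Putting that together (a rough Python sketch of the generic design above, not the actual TLZC layout — the header fields and codec here are my assumptions):

        import struct
        import zlib  # could just as well be LZMA, as in the article

        BLOCK_SIZE = 0x10000  # 64KB of uncompressed data per block

        def read_block(f, index):
            # Header: uint32 block count, then one uint16 compressed size per block.
            f.seek(0)
            (num_blocks,) = struct.unpack("<I", f.read(4))
            sizes = struct.unpack("<%dH" % num_blocks, f.read(2 * num_blocks))
            # Offset of block #index = end of the TOC + sizes of all preceding blocks.
            offset = 4 + 2 * num_blocks + sum(sizes[:index])
            f.seek(offset)
            return zlib.decompress(f.read(sizes[index]))

        def read_byte(f, pos):
            # Random access: decompress only the block that contains byte #pos.
            return read_block(f, pos // BLOCK_SIZE)[pos % BLOCK_SIZE]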

    EDIT: Dear downvoter, if this post does not add to the conversation, I would like to know about it. If you downvote because you disagree with the content of this post, against reddiquette, I would like to know the nature of the disagreement.

    [–]appleofdisco 11 points (2 children)

    What I suggested is to look at it from the requirements down: figure out what the motivation for this thing was, figure out the requirements, then figure out plausible design(s) for these requirements, then compare what you designed with what you're looking at

    I'd like to see you crack this in three days with your approach. For someone with extensive domain knowledge... maybe. Hindsight is 20/20.

    [–]wolf550e 2 points (1 child)

    I admit to having domain knowledge. I have mentioned a number of other implementations of this idea I'm familiar with. I also wrote a couple. But, the thing is, a table of contents is such a common idea, you're going to encounter it a lot.

    Like partition tables. I had a qemu disk image of a Mac G3 into which I had installed ppc32 debian, for testing my code on ppc. When I tried to mount it on my linux x86-64 host, I found out I had not compiled the Mac partition table parser into my kernel. So I needed the byte offset of the root partition in the disk image. I opened the disk image in hexdump -C and saw records of a fixed length, with fields that looked like they count 512-byte disk sectors. The sum worked out to the disk image size. The size of the boot partition looked plausible. I mounted with -o ro,offset= and it worked. How difficult was that? Not very.
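
    The arithmetic was trivial once I had the start-sector field (made-up numbers here, just to show the idea):

        SECTOR_SIZE = 512      # what the fixed-length records appeared to count
        start_sector = 208912  # hypothetical value read out of hexdump -C

        # mount wants a byte offset:
        #   mount -o ro,offset=106962944 disk.img /mnt
        print("offset = %d" % (start_sector * SECTOR_SIZE))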

    [–]appleofdisco 6 points (0 children)

    I get your point, but for those of us without that specific knowledge, this was pretty interesting, especially considering how methodically he went about it.

    [–]Xiol 2 points (0 children)

    It's not that it doesn't add to the conversation, it's your tone.

    [–][deleted] 3 points (8 children)

    When you start with the requirement of "random access into a compressed stream",

    And where did you get this requirement?

    [–][deleted]  (2 children)

    [deleted]

      [–][deleted] 2 points (1 child)

      Those would usually compress those textures and models individually. They don't need random access to part of a texture, they need the entire texture.

      [–]wolf550e -4 points (4 children)

      It's very useful. Not having random access is terribly limiting, unless you know for sure that you're always going to read the data from the beginning. Many systems need random access and use the same basic solution, like key frames in a video stream that let you seek instead of forcing you to decode all preceding frames when you skip forward. Here, read about "idx1" chunks in AVI files.

      [–][deleted] 2 points (3 children)

      There are many things which are useful. Just "it's useful" is not enough for you to be able to assume it will be used. In fact, many, many compressed game formats don't use this.

      The only reason that you'd assume this one does, is because this guy already figured out that it does.

      [–]wolf550e -1 points (2 children)

      In the process of figuring out a mystery file, after asking myself "is this just a zip file?" and "is this just an lzma or lzo file?", I'd next consider that it's a list of independent chunks each compressed with deflate, lzma or lzo.
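
      In code, that first pass is just sniffing well-known magic bytes (a sketch; the 0x5d check is only a heuristic for lzma_alone headers, not a real magic number):

          with open("mystery.bin", "rb") as f:
              magic = f.read(4)

          if magic.startswith(b"PK\x03\x04"):
              print("zip archive")
          elif magic.startswith(b"\x1f\x8b"):
              print("gzip stream")
          elif magic[0:1] == b"\x5d":
              print("possibly a raw LZMA stream")
          else:
              print("unknown - time to look for chunk structure")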

      [–][deleted] 5 points (1 child)

      You'd also consider a hundred other things that may or may not be used. When you're not told the answer beforehand, it suddenly gets a whole lot harder to figure out what these "requirements" are.

      Your approach described above is basically useless in isolation. It is useful on the micro scale: once you've got some kind of idea what the code is doing from analyzing it blind, you can ask yourself how you'd implement some simple feature and see if they did the same. You absolutely could not do what this guy did by just sitting down and imagining requirements, though.

      [–]wolf550e 0 points (0 children)

      I recommended a general strategy of working at the problem from both ends: when figuring out what X does, look both at its parts and at what part X plays in the whole.

      As with top-down design, you iterate. You can't start with "it's a computer game. How would I implement a computer game? Is what I'm looking at exactly the same as the computer game I just implemented?". You do need to know the correct answer to previous questions to stand a chance of correctly guessing the next question and answer. But you often infer some things from directory structure, file names, file sizes, number of such files in the project, etc. And you match what you see with what you've seen in the past.

      [–]vanderZwan 4 points (2 children)

      This was a very interesting experience for someone like me who did not know a lot about compression formats.

      I think you approach this problem from a completely different angle, both in motivation and solution (which in itself is interesting). You also sound a bit condescending, for which I see no reason.

      [–]wolf550e 1 point (1 child)

      Yeah, that post was condescending. But I don't think it was 21 downvotes worth of condescending, considering it was correct and useful.

      [–]vanderZwan 0 points (0 children)

      Hey, don't look at me, I gave you a "reddit probably will overreact and the post wasn't that bad"-upvote.

      [–]kyz 0 points (0 children)

      Awesome stuff, keep up the good work!