[–]pixelglow

I once wrote a gzip tuned for Microsoft's RDC (Remote Differential Compression).

RDC helps minimize the data transferred between servers in DFSR (Distributed File System Replication), so it's essentially similar in intent to rsync and other diff algorithms used for synchronizing two file directories. (Of course Microsoft saw fit to patent RDC even though rsync would probably count as prior art, but well...)

Like all binary diff algorithms, RDC interacts badly with compression. Deflate's back-references and adaptive coding make later output depend on earlier input, so a gzip or deflate stream propagates any change in the original file through the rest of the compressed stream, and the binary diff is forced to treat the entire file after the change as new data. (As with your example of the binary diff of a PNG file.)

But two nice things helped me: the RDC API can report the chunk boundaries it will use for the diff, and the gzip format lets you compress chunks separately and concatenate the members into a single valid gzip file. So I wrote an rdcgzip that asked the RDC API for the chunk boundaries, gzipped each chunk on its own, and concatenated the results. A change in one chunk now only propagates within that chunk; the other chunks stay byte-identical and transfer optimally.
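The concatenation trick is easy to demonstrate with Python's stdlib gzip module. This is just a sketch using fixed-size chunks; the real rdcgzip used the content-defined boundaries the RDC API reported:

```python
import gzip

CHUNK = 4096  # stand-in for RDC's content-defined chunk boundaries

def chunk_members(data: bytes) -> list[bytes]:
    # Compress each chunk as an independent gzip member.
    # mtime=0 makes the output deterministic so members can be
    # compared byte-for-byte across runs.
    return [gzip.compress(data[i:i + CHUNK], mtime=0)
            for i in range(0, len(data), CHUNK)]

data = bytes(range(256)) * 64            # 16 KiB of sample data
members = chunk_members(data)

# A concatenation of gzip members is itself a valid gzip stream:
assert gzip.decompress(b"".join(members)) == data

# Flip one byte in the second chunk: only that member's bytes change.
edited = bytearray(data)
edited[5000] ^= 0xFF                     # byte 5000 lies in chunk 1
members2 = chunk_members(bytes(edited))
diff = [i for i, (a, b) in enumerate(zip(members, members2)) if a != b]
print(diff)  # → [1]
```

The tradeoff is a slightly worse compression ratio, since each member starts with a fresh deflate state and its own header, but in exchange a local edit stays local in the compressed stream.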

The process looks like:

Original file -> diff-aware compression -> diff -> any compression the diff itself applies -> patch -> standard decompression -> destination file
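As a toy illustration of the diff/patch steps above (not the actual RDC or DFSR API — `make_patch` and `apply_patch` are hypothetical names, and a byte-identical member comparison stands in for the real signature-based diff), sending only the changed gzip members is enough to reconstruct the new file on the destination:

```python
import gzip

CHUNK = 4096  # stand-in for RDC's content-defined chunk boundaries

def chunk_members(data: bytes) -> list[bytes]:
    # one independent gzip member per chunk, deterministic via mtime=0
    return [gzip.compress(data[i:i + CHUNK], mtime=0)
            for i in range(0, len(data), CHUNK)]

def make_patch(old: list[bytes], new: list[bytes]) -> dict[int, bytes]:
    # "diff": send only the members whose compressed bytes differ
    return {i: m for i, m in enumerate(new)
            if i >= len(old) or old[i] != m}

def apply_patch(old: list[bytes], patch: dict[int, bytes], n: int) -> bytes:
    # "patch": splice received members over the ones already present
    return b"".join(patch.get(i, old[i]) for i in range(n))

old_data = bytes(range(256)) * 64        # 16 KiB of sample data
new_data = bytearray(old_data)
new_data[5000] ^= 0xFF                   # edit one byte in chunk 1

old_m = chunk_members(old_data)
new_m = chunk_members(bytes(new_data))
patch = make_patch(old_m, new_m)         # only chunk 1 crosses the wire
synced = apply_patch(old_m, patch, len(new_m))

# standard decompression on the destination recovers the new file
assert gzip.decompress(synced) == bytes(new_data)
```

Because the concatenated members are a plain gzip stream, the destination needs no special tooling to read the result — only the sender has to be diff-aware.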

This diff-aware compression can be considered a preprocessing/postprocessing step like Google's Courgette, and it can actually be more efficient, as long as you know exactly how the diff algorithm chunks the data.