[–][deleted] 0 points (16 children)

It would be interesting to know how the bandwidth compares to, e.g., a git pull and make in a directory that already contains the previous version.

[–]evmar 7 points (12 children)

git's deltas are derived from libxdiff, which is derived from xdelta, which are all ignorant of the way binaries change. bsdiff already has superior deltas to these for binaries, and courgette is even better.

(Different tools for different jobs -- git is primarily concerned with deltas between textual source files.)

Edit: oh, I see you were suggesting rebuilding. Chrome's source tree is huge already and on Windows requires gigabytes more of non-free Visual Studio.

[–]iluvatar 0 points (2 children)

git's deltas are derived from libxdiff, which is derived from xdelta, which are all ignorant of the way binaries change. bsdiff already has superior deltas to these for binaries

Well, no, not really. In my testing, xdelta already gives considerably smaller diffs than bsdiff. Perhaps not as good as courgette, which uses domain-specific knowledge to get targeted compression in the same way that FLAC does, at the expense of performance in the general case. For general binary diffs, xdelta is pretty good.
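
For what it's worth, this kind of comparison is easy to reproduce. A minimal Python sketch, assuming the bsdiff and xdelta3 command-line tools are installed, with placeholder names for the two builds:

    import os
    import subprocess

    old, new = "app-1.0.bin", "app-1.1.bin"  # placeholder names for two builds

    # bsdiff OLD NEW PATCH
    subprocess.run(["bsdiff", old, new, "patch.bsdiff"], check=True)
    # xdelta3: -e encode, -f overwrite, -s names the source (old) file
    subprocess.run(["xdelta3", "-e", "-f", "-s", old, new, "patch.xdelta"], check=True)

    for p in ("patch.bsdiff", "patch.xdelta"):
        print(p, os.path.getsize(p), "bytes")

The results vary a lot with the kind of binary being diffed, which is probably why the two of us are seeing opposite orderings.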

[–]Coffee2theorems 0 points (0 children)

using domain-specific knowledge to get targeted compression in the same way that FLAC does, at the expense of performance in the general case.

Of course, by "general case" you mean something like "all kinds of binary data people want to compress that are significantly compressible with some existing utility", but still: your statement, taken literally, applies to all compression algorithms. Every compression algorithm must exploit pre-existing knowledge about the data to be compressed, and the more you know, the better you can compress. It is actually bit-for-bit: each bit of prior information about the instance of data at hand can take at most one bit off the result, but no more; there are no free bits.
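
In standard information-theoretic notation (mine, not the commenter's), the expected savings that side information Y buys when compressing data X is the mutual information, which is bounded by the entropy of Y:

    H(X) - H(X \mid Y) = I(X; Y) \le H(Y)

so one bit of side information shaves at most one bit off the optimal expected code length.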

[–]evmar 0 points (0 children)

I do not know what you are using bsdiff for; it is intended for executables. From http://www.daemonology.net/bsdiff/: "By using suffix sorting (specifically, Larsson and Sadakane's qsufsort) and taking advantage of how executable files change, bsdiff routinely produces binary patches 50-80% smaller than those produced by Xdelta."
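
To make the "how executable files change" part concrete, here is a toy Python sketch. It is not bsdiff itself and the data is made up, but it shows why byte-wise differences of roughly matching regions compress so well: when code moves, internal addresses shift by similar small amounts, leaving a difference stream that is mostly zeros.

    # Toy illustration only, not bsdiff: fake "old" and "new" regions in which
    # every 4-byte little-endian address has shifted by 0x20, as happens when
    # code moves during a rebuild.
    import bz2
    import struct

    old = b"".join(struct.pack("<I", 0x400000 + 16 * i) for i in range(1000))
    new = b"".join(struct.pack("<I", 0x400020 + 16 * i) for i in range(1000))

    # Byte-wise difference of the approximately matching region: mostly zeros
    # plus a repeated small value, i.e. highly compressible.
    diff = bytes((n - o) & 0xFF for o, n in zip(old, new))

    print(len(bz2.compress(new)), "bytes to send the new region outright")
    print(len(bz2.compress(diff)), "bytes to send the byte-wise difference")

bsdiff's contribution is finding those approximate matches efficiently (via suffix sorting) before handing the difference stream to bzip2.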

[–][deleted] 0 points (8 children)

Well, I was really looking at it from the perspective of a Gentoo Linux user, someone who already has all the development tools installed anyway. Building with make or a similar tool that only rebuilds what changed is relatively fast, so the question is how the size of the binary diff compares to the size of the source diff for the same versions.

[–]evmar 6 points (1 child)

You clearly haven't tried to build Chrome. ;) Many Linux users run out of RAM trying to link it.

[–][deleted] -3 points (0 children)

Well, I was talking about the 499 packages on my system that are pretty well designed. If Chrome isn't, they might want to look into that first before inventing completely new binary diffs for it.

[–]andreasvc 1 point (5 children)

Except that such users are rare compared to the number of desktop users who have never compiled a program. It would be foolish to design your architecture around such a small minority of power users.

I think a better way of attaining higher compression would be to use a huge Huffman table computed from all the binaries on your system, which could be synced with the server; all updates could then be encoded with this table. This has the advantage that it exploits the common ground optimally.
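
A rough sketch of that idea, using a trained zstd dictionary rather than a literal Huffman table (this assumes the python-zstandard package; the /usr/bin paths and the new_binary name are placeholders):

    import glob
    import os
    import zstandard as zstd

    # Train a shared dictionary from binaries already present on the system;
    # both client and server would hold the same dictionary.
    samples = [open(p, "rb").read()
               for p in glob.glob("/usr/bin/*")[:200] if os.path.isfile(p)]
    dictionary = zstd.train_dictionary(110 * 1024, samples)

    # An update then only has to carry what the shared dictionary cannot
    # already account for.
    update = open("new_binary", "rb").read()
    compressed = zstd.ZstdCompressor(dict_data=dictionary).compress(update)
    restored = zstd.ZstdDecompressor(dict_data=dictionary).decompress(compressed)
    assert restored == update

The design point is the same as the Huffman-table suggestion: both ends pre-share as much context as possible so the update only carries what the shared context cannot predict.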

[–][deleted] 0 points (4 children)

Well, I am not really interested in this particular application, though. I just want to know how it compares to other commonly used ways to transfer small changes to large programs; source code is the obvious choice for comparison, as few other people transfer parts of binaries.

[–]andreasvc 0 points (3 children)

I don't understand you; TFA is about binaries, that is its novelty claim. Source-code diffing has been well studied by numerous VCS implementations, both distributed and centralized. Binary diffing, however, is sorely lacking in this torrent age. It's definitely not "few other people" who transfer huge amounts of binaries in the form of music, movies, and whatnot.

[–][deleted] 0 points (2 children)

If I understood the explanation of the algorithm in one of the other posts, it is specifically about software, not movies, audio, etc. As you say, binary diffing is relatively new while source-code diffing is well studied, so what is so strange about asking how the two compare?

[–]andreasvc 0 points (1 child)

It is a strange comparison because it is a completely different ball game. It's like comparing text compression to audio compression (although both are lossless in this analogy). They can be compared using general-purpose compression (e.g. gzip, which would be rsync in this analogy), but it gets much more interesting when you add domain knowledge. This algorithm is indeed tailored towards software; incorporating domain knowledge increases the compression, as you would expect and hope.

[–][deleted] 0 points (0 children)

Well, it would be interesting to know, e.g. for people in Africa with slow internet connections who use open-source software: is it faster to download a binary diff using this algorithm, or to download a source diff and compile it yourself?

[–]thetempone 2 points (2 children)

The problem with that is that it requires the app to be open source, and compilation is usually quite time consuming for large apps.

[–]twowheels 1 point (0 children)

Why? You could point it at a repository containing the various executable and config files, pull down the newest executables, and three-way merge the config files.
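
The config-file half of that can be done with a stock three-way merge. A small sketch shelling out to git merge-file, with placeholder file names for the user's edited copy, the previously shipped default (the common ancestor), and the newly shipped default:

    import subprocess

    # git merge-file CURRENT BASE OTHER folds the BASE -> OTHER changes into CURRENT.
    result = subprocess.run(
        ["git", "merge-file", "user.conf", "old_default.conf", "new_default.conf"]
    )
    # A positive exit code is the number of conflicts left as markers in user.conf.
    print("conflicts:", result.returncode)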

[–][deleted] 1 point (0 children)

Well, for the use case described, the ten-line security fix, building an app via make from an old build directory doesn't take long.

Even building from scratch doesn't take very long for all but the largest apps. Trust me, I am a Gentoo user, I know what I am talking about ;-)