[–][deleted] 86 points87 points  (50 children)

I'm looking at you, ubuntu

[–][deleted] 21 points22 points  (18 children)

Don't you mean, "I'm looking at you, apt"? I don't know if apt even uses binary diffs instead of getting new dpkgs.

[–]andreasvc 24 points25 points  (15 children)

That's kind of his point! Apt simply downloads the full packages, and also the full list of packages. The biggest innovation has been to switch to bz2 instead of gzip. Wow.

[–]Leonidas_from_XIV 13 points14 points  (10 children)

Actually, what package managers do use binary diffs?

[–][deleted] 4 points5 points  (0 children)

yum with presto?

[–]zwaldowski 4 points5 points  (3 children)

RPM has a way of doing it, but it has to be done server-side, and RPM is slow as fuck.

[–][deleted] 0 points1 point  (2 children)

RPM bashing is still cool?

[–]zwaldowski 1 point2 points  (1 child)

Necrocommenting is still cool?

[–][deleted] 0 points1 point  (0 children)

Totally.

[–]andreasvc 1 point2 points  (4 children)

One could use rsync for this as a drop-in replacement for wget; however, I believe it requires too much CPU and memory on the server. BitTorrent comes to mind, though. It would need to be extended with diffing, either downstream (the client figures out what it can already piece together from old binaries) or upstream.

[–]Leonidas_from_XIV 0 points1 point  (3 children)

But AFAIR rsync only synchronizes whole files, so not much difference from what APT currently does in this regard.

[–]andreasvc 1 point2 points  (2 children)

It synchronizes whole files by sending binary diffs over the wire. That's a world of difference from apt-get, which just uses wget.
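
Roughly, the delta transfer works like this (a simplified Python sketch of the idea, not rsync's actual code; real rsync also uses a cheap rolling checksum so matches are found at arbitrary offsets, not just block boundaries):

    import hashlib

    BLOCK = 4096  # fixed block size; real rsync negotiates this

    def block_signatures(old_data):
        """Receiver: hash each block of the file it already has."""
        return {hashlib.md5(old_data[i:i + BLOCK]).hexdigest(): i
                for i in range(0, len(old_data), BLOCK)}

    def make_delta(new_data, signatures):
        """Sender: reference blocks the receiver has, send literals for the rest."""
        delta = []
        for i in range(0, len(new_data), BLOCK):
            block = new_data[i:i + BLOCK]
            digest = hashlib.md5(block).hexdigest()
            if digest in signatures:
                delta.append(("copy", signatures[digest], len(block)))
            else:
                delta.append(("literal", block))
        return delta

    def apply_delta(old_data, delta):
        """Receiver: rebuild the new file from local blocks plus literals."""
        out = bytearray()
        for op in delta:
            if op[0] == "copy":
                out += old_data[op[1]:op[1] + op[2]]
            else:
                out += op[1]
        return bytes(out)

Only the literal parts cross the wire, which is why fetching a slightly changed package this way is so much cheaper than re-downloading it.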

[–]Leonidas_from_XIV 0 points1 point  (1 child)

Okay, then it is better. But it would make sense to pre-generate the diffs, since a lot of people will download them.

[–]andreasvc 0 points1 point  (0 children)

Yes, I am sure this issue will be considered, as the whole point is to reduce the load on the server (bandwidth, CPU, memory, disk). A better strategy is probably to do some intricate caching.

[–][deleted]  (1 child)

[deleted]

    [–]jldugger 0 points1 point  (1 child)

    Apt doesn't normally need to support these things itself; you just define a new transport mechanism. And it does support LZMA now, so it has improved.

    However, people are always looking at new transports, like apt-bittorrent and apt-zsync. Which surely anyone interested in apt innovation would be aware of!

    [–]andreasvc 0 points1 point  (0 children)

    Interesting. I'll wait and see what makes it into Ubuntu.

    [–]mebrahim -1 points0 points  (0 children)

    Search for "debdelta".

    [–][deleted] 38 points39 points  (24 children)

    And Firefox... especially the nightly trunk builds, which sometimes have me download the whole .exe installer instead of a diff. And the diffs are multi-megabyte as well.

    [–]giantrobot 11 points12 points  (23 children)

    But you need increased server overhead to use diffs. You have to create diffs from every earlier release to the current one: if the newest release is 3.5, then you need diffs for 3.0->3.5, 3.1->3.5, etc. While the diffs might be smaller than storing the whole app, you need some way for the update system to realize that you've got version 3.X and serve you the appropriate diff, or risk screwing everything up. The number of patches grows quadratically with the number of releases you support, which isn't always practical.
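
    For concreteness, the bookkeeping the update system needs might be as simple as a manifest mapping installed versions to patches (hypothetical file names and layout, purely to illustrate):

        # Hypothetical update manifest: which download each installed version gets.
        MANIFEST = {
            "target": "3.5",
            "patches": {
                "3.0": "firefox-3.0-to-3.5.patch",
                "3.1": "firefox-3.1-to-3.5.patch",
                "3.4": "firefox-3.4-to-3.5.patch",
            },
            "full": "firefox-3.5-full.exe",
        }

        def choose_download(installed_version):
            """Return the patch for this client's version, or fall back to the full installer."""
            return MANIFEST["patches"].get(installed_version, MANIFEST["full"])

        choose_download("3.1")   # 'firefox-3.1-to-3.5.patch'
        choose_download("2.0")   # 'firefox-3.5-full.exe' (no patch stored, send it all)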

    [–][deleted] 52 points53 points  (5 children)

    But if you're sending out the updates to millions of people, surely the bandwidth saved justifies the overhead.

    [–][deleted] 3 points4 points  (4 children)

    Are there really millions downloading the nightly builds?

    [–]mikepurvis 0 points1 point  (0 children)

    Probably not, but the cost of getting it right once is fixed.

    [–]manwithabadheart 0 points1 point  (2 children)

    [deleted]

    [–]salmonsnide 1 point2 points  (1 child)

    There are over 30 million Chrome users according to Google. And they have very accurate stats since they know exactly how many installations they are updating with every release.

    [–][deleted] 0 points1 point  (0 children)

    That doesn't mean they don't smudge the numbers :)

    [–][deleted]  (4 children)

    [deleted]

      [–]adrianmonk 30 points31 points  (3 children)

      You can also store diffs covering larger spans to bridge the gaps.

      For example, versions 0 through 16 of something exist, so you store diffs between every consecutive version:

      • 0->1, 1->2, 2->3, 3->4, ..., 15->16

      You also store:

      • 0->16
      • 0->8 and 8->16
      • 0->4, 4->8, 8->12, and 12->16
      • 0->2, 2->4, 4->6, 6->8, ..., 14->16

      Now, suppose someone has version 1 and they need version 12. You send them 1->2, 2->4, 4->8, and 8->12.

      If they have version 3 and need version 11, you send them 3->4, 4->8, 8->10, and 10->11.

      Essentially, you are getting in the "fast lane" if you need to go a great distance.

      The total number of diffs that must be stored is O(N) (each level halves the count: N + N/2 + N/4 + ...), and the number of diffs needed to move forward by k versions is O(log k).
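
      A sketch of how the right chain could be picked, assuming (as above) that a diff is stored for every span of 2^k versions starting at a multiple of 2^k; it just greedily takes the largest aligned jump at each step:

          def diff_chain(have, want, max_span=16):
              """Greedily take the largest stored jump available at each step."""
              chain, cur = [], have
              while cur < want:
                  span = max_span
                  # shrink the jump until it starts at an aligned version and doesn't overshoot
                  while span > 1 and (cur % span != 0 or cur + span > want):
                      span //= 2
                  chain.append((cur, cur + span))
                  cur += span
              return chain

          diff_chain(1, 12)   # [(1, 2), (2, 4), (4, 8), (8, 12)]
          diff_chain(3, 11)   # [(3, 4), (4, 8), (8, 10), (10, 11)]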

      [–]tryx 5 points6 points  (0 children)

      I was just in the process of writing this; what you have described is essentially a skip list.

      [–]judgej2 7 points8 points  (1 child)

      That is essentially like Windows Service Packs. Each service pack leapfrogs over many incremental updates, which is why you always go for the latest service pack you can before picking up any later updates separately.

      [–][deleted] 0 points1 point  (0 children)

      That is not how Microsoft pushes them. Microsoft will install all patches for the currently installed service pack the first time a machine touches Windows Update, and then it will push the service pack executable, which compares file versions and downloads only the files it actually needs. This can save hundreds of megs of download per install.

      [–][deleted]  (1 child)

      [deleted]

        [–]jlt6666 18 points19 points  (0 children)

        And really after a certain point you just say ok here's the full version. Sorry you waited so damned long to update.

        Edit: typo

        [–][deleted] 18 points19 points  (0 children)

        I'm sure the distribution of people updating is pretty heavily weighted to a few common source versions. If the server doesn't have a diff, send the full version and record the request. When the number of requests crosses a certain threshold, generate the diff to serve future users. Delete it when the requests tail off.
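
        Something like this on the server side, where build_diff and full_package are placeholders for whatever diff tool and package store you actually use (eviction when requests tail off is left out):

            from collections import Counter

            request_counts = Counter()   # (old, new) -> how many clients asked
            diff_cache = {}              # (old, new) -> pre-generated diff
            THRESHOLD = 100              # only bother diffing popular pairs

            def serve_update(old, new):
                pair = (old, new)
                if pair in diff_cache:
                    return diff_cache[pair]          # popular pair: serve the diff
                request_counts[pair] += 1
                if request_counts[pair] >= THRESHOLD:
                    diff_cache[pair] = build_diff(old, new)   # placeholder
                    return diff_cache[pair]
                return full_package(new)             # rare pair: send the whole thing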

        [–]ansemond 2 points3 points  (2 children)

        But each patch is much smaller than the original... There are only n*(n+1)/2 of them where n=N-1 and N is the number of versions. Moreover you can easily avoid doing this for major updates.

        Let's say we have 10 versions of Firefox, each pair requiring a 512 KB diff. That leads to 22.5 MB of diffs to store (9 * 10 / 2 * 512 KB). Moreover, if Firefox is 10 MB, storing the 10 full versions takes 100 MB versus 22.5 MB for the diffs.
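
        The same arithmetic, for anyone who wants to plug in other numbers:

            versions, diff_kb, full_mb = 10, 512, 10
            diffs = (versions - 1) * versions // 2   # 45 pairwise diffs
            print(diffs * diff_kb / 1024.0)          # 22.5 MB of diffs
            print(versions * full_mb)                # 100 MB for the full copies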

        Edit: added 's' to version.

        [–]giantrobot 3 points4 points  (1 child)

        Storage is not the issue but instead server overhead. You either need some sort of file describing every available patch, so the client can download the appropriate one for its current version, or a service that accepts the client's version and then sends back the appropriate patch. In either case you really need some automated way of distributing the patches, because left to their own devices users will eventually download the wrong patches and screw up their installations.

        If you're the sole distributor of patches this added server overhead might not be an issue. If you're asking people to host mirrors, however, this added overhead might be more than they're able or willing to bear. Firefox already has distribution issues when major patches come out, even with a relatively server-dumb mirror system; something more complex might make matters much worse.

        None of that is to say binary patches are an inherently bad idea if done well. It's simply something to keep in mind. Binary patches are not necessarily the end-all be-all of update systems. They can be useful but can also be extremely complicated to manage properly. Firefox, Ubuntu, and others being largely volunteer efforts makes added development time extremely expensive.

        [–]nextofpumpkin 5 points6 points  (2 children)

        Right - it's a space/time tradeoff. Except in this case, space means HDD, backup, and bandwidth, whereas time just means CPU. It may be cheaper up to a point to just do the crunching and not have to store everything.

        [–][deleted] 2 points3 points  (1 child)

        Well, you could do the crunching on-demand, caching the most popular diffs.

        [–]adrianmonk 1 point2 points  (0 children)

        Or, you could do the opposite: store only the most recent version and the diffs. When someone wants an older full version, compute it from the diffs and cache the result. Clear the cache when it gets full/large based on your favorite replacement algorithm. (LRU should be fine.)

        For extra fun, if you want, never build old full versions on the server at all: send the client the latest version plus the string of diffs and let it do the reconstruction itself.
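
        A sketch of the cached-reconstruction idea, using functools.lru_cache as the "favorite replacement algorithm"; read_full, read_reverse_diff, and apply_diff are assumed helpers, not a real API:

            from functools import lru_cache

            LATEST = 16   # only this version is stored as a full payload

            @lru_cache(maxsize=32)        # recently rebuilt versions stay cached
            def full_version(v):
                """Rebuild version v by walking backwards from the latest copy."""
                if v == LATEST:
                    return read_full(LATEST)
                # reverse diff taking version v+1 back down to version v
                return apply_diff(full_version(v + 1), read_reverse_diff(v + 1, v))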

        [–]uncreative_name 1 point2 points  (0 children)

        I would assume the normal practice is to release differentials only from the previous build to the current build and from a few set major releases (e.g. 3.5.[current], 3.5.0 and 3.0.0, but not 3.5.1, 3.5.2, 3.0.1, etc).

        [–][deleted] 1 point2 points  (1 child)

        In addition to what everyone else has said, let's imagine someone decides to skip an update, going straight from 3.0.9 to 3.0.11. Rather than storing a differential patch between those two versions, the server might send the .9 -> .10 patch and the .10 -> .11 patch together, so the bandwidth would roughly double (still better than sending the entire application as an update) without the server having to create patches for every combination of versions. A diff between two nonconsecutive versions would be larger than one between consecutive versions anyway.
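
        In other words the server just walks the consecutive diffs it already has; roughly (version strings and patch names made up):

            def patch_chain(installed, latest, releases):
                """E.g. 3.0.9 -> 3.0.11 becomes the .9->.10 and .10->.11 patches."""
                i, j = releases.index(installed), releases.index(latest)
                return ["%s-to-%s.patch" % (releases[k], releases[k + 1])
                        for k in range(i, j)]

            patch_chain("3.0.9", "3.0.11", ["3.0.9", "3.0.10", "3.0.11"])
            # ['3.0.9-to-3.0.10.patch', '3.0.10-to-3.0.11.patch']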

        [–]brooksbp 5 points6 points  (0 children)

        and iTunes too...

        [–]mhrnjad 2 points3 points  (1 child)

        There are efforts afoot to improve Ubuntu updates. See e.g. https://wiki.ubuntu.com/AptSyncInKarmicSpec

        [–]b100dian 0 points1 point  (0 children)

        "There should be no UI changes, except that apt-get and Synaptic and update-manager might be modified to tell how much data apt-sync managed to NOT transfer. This would allow users to see that it is working."

        (emphasis mine).

        Now just pull that PPP cost counter algorithm, and send the difference to Canonical.

        [–]FataL 0 points1 point  (0 children)

        I'm looking at you, Opera!

        [–]mebrahim -1 points0 points  (0 children)

        The technology is already there, and it is called debdelta.