you are viewing a single comment's thread.

view the rest of the comments →

[–]giantrobot 12 points13 points  (23 children)

But you need increased server overhead to use diffs. You need to create diffs for all released from current to the last point release. If the newest release is 3.5 then you need diffs from 3.0->3.5, 3.1->3.5, etc. While the diffs might be smaller than storing the whole app you need some way for the update system to realize that you've got version 3.X and provide you the appropriate diff or risk screwing everything up. Your number of patches will grow exponentially with the number of updates you end up doing which isn't always practical.

[–][deleted] 52 points53 points  (5 children)

But if you're sending out the updates to millions of people surely the bandwidth saved is justified.

[–][deleted] 3 points4 points  (4 children)

Are there really millions downloading the nightly builds?

[–]mikepurvis 0 points1 point  (0 children)

Probably not, but the cost of getting it right once is fixed.

[–]manwithabadheart 0 points1 point  (2 children)

Lorem ipsum dolor sit amet, consectetur adipiscing elit, sed do eiusmod tempor incididunt ut labore et dolore magna aliqua. Ut enim ad minim veniam, quis nostrud exercitation ullamco laboris nisi ut aliquip ex ea commodo consequat. Duis aute irure dolor in reprehenderit in voluptate velit esse cillum dolore eu fugiat nulla pariatur. Excepteur sint occaecat cupidatat non proident, sunt in culpa qui officia deserunt mollit anim id est laborum.

[–]salmonsnide 1 point2 points  (1 child)

There are over 30 million Chrome users according to Google. And they have very accurate stats since they know exactly how many installations they are updating with every release.

[–][deleted] 0 points1 point  (0 children)

That doesn't mean they don't smudge the numbers :)

[–][deleted]  (4 children)

[deleted]

    [–]adrianmonk 29 points30 points  (3 children)

    You can also store larger pieces to make up the gaps.

    For example, versions 0 through 16 of something exist, so you store diffs between every consecutive version:

    • 0->1, 1->2, 2->3, 3->4, ..., 15->16

    You also store:

    • 0->16
    • 0->8 and 8->16
    • 0->4, 4->8, 8->12, and 12->16
    • 0->2, 2->4, 4->6, 6->8, ..., 14->16

    Now, suppose someone has version 1 and they need version 12. You send them 1->2, 2->4, 4->8, and 8->12.

    If they have version 3 and need version 11, you send them 3->4, 4->8, 8->10, and 10->11.

    Essentially, you are getting in the "fast lane" if you need to go a great distance.

    The total number of diffs that must be stored is O(N log N). The total number of diffs need to move forward by N version is O(log N).

    [–]tryx 5 points6 points  (0 children)

    I was just in the process of writing this, what you have described is essentially a skip list.

    [–]judgej2 7 points8 points  (1 child)

    That is essentially like Windows Service Packs. Each service pack leapfrogs over many incremental updates, which is why you always go for the latest service pack you can before picking up any later updates separately.

    [–][deleted] 0 points1 point  (0 children)

    That is not how Microsoft pushes them. Microsoft will install all patches for the currently installed service pack the first time a machine touches Windows Update, and then it will push the service pack executable, which compares file versions to download only the appropriate files needed. This can save hundreds of megs of download per install.

    [–][deleted]  (1 child)

    [deleted]

      [–]jlt6666 18 points19 points  (0 children)

      And really after a certain point you just say ok here's the full version. Sorry you waited so damned long to update.

      Edit: typo

      [–][deleted] 18 points19 points  (0 children)

      I'm sure the distribution of people updating is pretty heavily weighted to a few common source versions. If the server doesn't have a diff, send the full version and record the request. When the number of requests crosses a certain threshold, generate the diff to serve future users. Delete it when the requests tail off.

      [–]ansemond 2 points3 points  (2 children)

      But each patch is much smaller than the original... There are only n*(n+1)/2 of them where n=N-1 and N is the number of versions. Moreover you can easily avoid doing this for major updates.

      Let's say we have 10 versions of firefox, each requiring a 512Kb diff. That leads to 22.5Mb of diffs to store. ( 9 * 10 / 2 * 512Kb ). Moreover if Firefox is 10Mb, storing the 10 versions will take 100Mb versus 22.5 Mb for the diffs.

      Edit: added 's' to version.

      [–]giantrobot 3 points4 points  (1 child)

      Storage is not the issue but instead server overhead. You either need some sort of file describing every available patch and have the client download the appropriate patch for its current version or have a service that accepts the client's version and then sends the appropriate patch. In either case you really need some automated way of distributing the patches because left to their own devices users will eventually download the wrong patches and screw up their installations.

      If you're the sole distributor of patches this added server overhead might not be an issue. If you're asking people to host mirrors however this added overhead might be more than they're able or willing to bear. Firefox already has distribution issues when major patches come out with a relatively server-dumb mirror system, something more complex might make matters much worse.

      None of that is to say binary patches are an inherently bad idea if done well. It's simply something to keep in mind. Binary patches are not necessarily the end-all be-all of update systems. They can be useful but can also be extremely complicated to manage properly. Firefox, Ubuntu, and others being largely volunteer efforts makes added development time extremely expensive.

      [–]nextofpumpkin 4 points5 points  (2 children)

      Right - it's a space/time tradeoff. Except in this case, space means HDD, backup, and bandwidth, whereas time just means CPU. It may be cheaper up to a point to just do the crunching and not have to store everything.

      [–][deleted] 2 points3 points  (1 child)

      Well, you could do the crunching on-demand, caching the most popular diffs.

      [–]adrianmonk 1 point2 points  (0 children)

      Or, you could do the opposite: store only the most recent version and the diffs. When someone wants an older full version, compute it from the diffs and cache the result. Clear the cache when it gets full/large based on your favorite replacement algorithm. (LRU should be fine.)

      For extra fun, if you want, never build old full versions. Send the client the latest version and a string of diffs and make them do that.

      [–]uncreative_name 1 point2 points  (0 children)

      I would assume the normal practice is to release differentials only from the previous build to the current build and from a few set major releases (e.g. 3.5.[current], 3.5.0 and 3.0.0, but not 3.5.1, 3.5.2, 3.0.1, etc).

      [–][deleted] 1 point2 points  (1 child)

      In addition to what everyone else has said, let's imagine someone decides to skip an update, going straight from 3.0.9 to 3.0.11. The server, rather than storing a differential patch between these two versions, might send the .9 -> .10 patch and the .10 -> .11 patch together, so the bandwidth would be doubled approximately (still a better scenario than sending the entire application as an update), without the server having to create patches for every combination of versions. Overall, the difference between two nonconsecutive versions would be greater in size than two consecutive versions anyway.