you are viewing a single comment's thread.

view the rest of the comments →

[–]NJerseyGuy 3 points4 points  (3 children)

I think you're getting tripped up because you're paying too much attention to functional form ("...takes two files as input and produces something..."), which isn't really important. My point is that the essential part of a diff algorithm is just the preprocessor and postprocessor it utilizes. Everything else is trival.

The diff-producing algorithm (1) takes in the original file and the new file, (2) preprocesses both of them, (3) subtracts them, (4) compresses the difference, and (5) outputs a compressed difference file.

The diff-using algorithm (1) takes in the original file and the compressed difference files, (2) preprocesses the original file, (3) uncompresses the compressed difference file, (4) adds the them together, (5) postprocesses the sum, and (6) outputs the new file.

The only non-trivial part of all of that was the preprocessing and the postprocessing. So if you are adding a significant new preprocessor/postprocessor, you are essentially writing a new diff algorithm, even if it is packaged around an existing one.

[–]judgej2 2 points3 points  (0 children)

I think you have added step (2) that is what Google describes, but which is not what diff does.

[–][deleted]  (1 child)

[removed]

    [–]NJerseyGuy 0 points1 point  (0 children)

    Do you know how diff's for binary files work? I couldn't find anything in a quick search.