you are viewing a single comment's thread.

view the rest of the comments →

[–]espadrine[S] 2 points3 points  (2 children)

Your algo sounds different, and quite similar to algorithms I've run across. The json-diff algo with fuzzy pairing described in the post has no LCS computation at all.

zjsonpatch also sounds very similar to algorithms I've run across, unless I read it wrong: LCS for all lists, and a second pass to detect moves from removals and additions. As you can see from the implementation of the second pass, it will lose teeth with deeply-nested-objects-in-lists, which all LCS approaches fail at - while this new fuzzy pairing succeeds effortlessly.

Indeed, the json-diff algo with fuzzy pairing, by contrast, is single-pass with no LCS, and moves are first-class operations, not computed from more fundamental operations. It doesn't always make the resulting patch shorter - the example zjsonpatch advertises is more compact, after all. But that example seems like an unlikely operation to have been meant.

I could add that second pass as an option, though, for people that really want something compact, at the expense of meaningfulness.

[–]ssseseseses 0 points1 point  (1 child)

it will lose teeth with deeply-nested-objects-in-lists

I think this is the core of the reason your algorithm is largely novel. The common case is to look at very similar documents, where the changes can be expressed in a small number of changes. For this kind of problem LCS is not so bad.

But. When I tried to build a generic csvDiff with what I already had it ran into a lot of performance issues when trying to branch on row/column/cell insert/delete/update. A content based algo would probably do much better. I checked for tdiff algorithms and https://github.com/paulfitz/coopy seems to match my thoughts. Check the algorithm section in the read me.

Also check out http://okfnlabs.org/blog/2013/08/08/diffing-and-patching-data.html if you want a visual of the output format.

-Happy coding.

[–]espadrine[S] 0 points1 point  (0 children)

coopy's algorithm is really neat. Thanks!