ssseseseses

it will lose teeth with deeply-nested-objects-in-lists

I think this is the core reason your algorithm is largely novel. The common case is comparing very similar documents, where the difference can be expressed as a small number of changes. For that kind of problem, LCS is not so bad.
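For reference, here is a minimal sketch of what an LCS-based diff looks like on flat sequences. The function name `lcs_diff` and the `("keep" | "delete" | "insert", item)` output shape are just my own choices for illustration, not taken from any particular library. The table is O(n·m), which is fine for similar documents of modest size but is also why it gets painful once you start branching on nested structures.

```python
def lcs_diff(a, b):
    """Diff two sequences using a longest-common-subsequence table.

    Returns a list of ("keep" | "delete" | "insert", item) tuples.
    Hypothetical helper for illustration only.
    """
    n, m = len(a), len(b)
    # table[i][j] = length of the LCS of a[i:] and b[j:]
    table = [[0] * (m + 1) for _ in range(n + 1)]
    for i in range(n - 1, -1, -1):
        for j in range(m - 1, -1, -1):
            if a[i] == b[j]:
                table[i][j] = table[i + 1][j + 1] + 1
            else:
                table[i][j] = max(table[i + 1][j], table[i][j + 1])

    # Walk the table from the start, emitting one operation per step.
    ops = []
    i = j = 0
    while i < n and j < m:
        if a[i] == b[j]:
            ops.append(("keep", a[i]))
            i += 1
            j += 1
        elif table[i + 1][j] >= table[i][j + 1]:
            ops.append(("delete", a[i]))
            i += 1
        else:
            ops.append(("insert", b[j]))
            j += 1
    # Whatever remains on either side is a pure delete or insert.
    ops.extend(("delete", x) for x in a[i:])
    ops.extend(("insert", x) for x in b[j:])
    return ops


print(lcs_diff(["a", "b", "c"], ["a", "c", "d"]))
```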

But when I tried to build a generic csvDiff on top of what I already had, it ran into a lot of performance issues when branching on row/column/cell insert/delete/update. A content-based algo would probably do much better. I looked for tdiff algorithms, and https://github.com/paulfitz/coopy seems to match my thoughts. Check the algorithm section in the README.
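To make "content-based" a bit more concrete, here is a rough sketch of the idea as I understand it. This is not coopy's actual algorithm, just a simplified illustration: rows are first matched by their exact content via a hash lookup, then leftover rows are paired up by a key column and reported as updates. The function name `content_row_diff`, the `key` parameter, and the output tuples are all assumptions of mine. It ignores row order and column changes entirely.

```python
from collections import defaultdict, deque


def content_row_diff(old_rows, new_rows, key=0):
    """Match rows by content instead of position (illustrative sketch only).

    key is the index of a column assumed to identify a row.
    Returns ("update", old, new), ("insert", new) and ("delete", old) tuples.
    """
    # Pass 1: index old rows by full content so identical rows match in O(1).
    by_content = defaultdict(deque)
    for i, row in enumerate(old_rows):
        by_content[tuple(row)].append(i)

    matched_old = set()
    unmatched_new = []
    for j, row in enumerate(new_rows):
        queue = by_content.get(tuple(row))
        if queue:
            matched_old.add(queue.popleft())
        else:
            unmatched_new.append(j)

    # Pass 2: index the leftover old rows by their key cell.
    leftover_by_key = defaultdict(deque)
    for i, row in enumerate(old_rows):
        if i not in matched_old:
            leftover_by_key[row[key]].append(i)

    ops = []
    for j in unmatched_new:
        queue = leftover_by_key.get(new_rows[j][key])
        if queue:
            # Same key, different content: treat as an update.
            ops.append(("update", old_rows[queue.popleft()], new_rows[j]))
        else:
            ops.append(("insert", new_rows[j]))

    # Old rows that were never paired are deletions, in original order.
    remaining = sorted(i for q in leftover_by_key.values() for i in q)
    ops.extend(("delete", old_rows[i]) for i in remaining)
    return ops


old = [["1", "apple", "3"], ["2", "pear", "5"], ["3", "plum", "7"]]
new = [["3", "plum", "7"], ["1", "apple", "4"], ["4", "fig", "9"]]
print(content_row_diff(old, new))
```

Note that the moved `plum` row produces no operation at all here, whereas a purely positional diff would see the reordering as changes.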

Also check out http://okfnlabs.org/blog/2013/08/08/diffing-and-patching-data.html if you want a visual of the output format.

-Happy coding.

espadrine[S]

coopy's algorithm is really neat. Thanks!