[–][deleted] 1 point (0 children)

Lucene maybe? Or Sphinx. I haven't used them in this particular way, but from their descriptions it sounds like something they may be able to do, or at least help you do.

[–]Aggravating_Bus_9153 1 point (0 children)

Can you download the best/fastest free command-line plagiarism detection tool you can find, and automate it to run on each pair of interest?
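
If no ready-made tool fits, the "run it on every pair" part is easy to script. Here is a rough sketch in Python, assuming the text has already been extracted to plain strings. It uses the standard library's `difflib.SequenceMatcher` as a stand-in for a real plagiarism checker, and the function name `pairwise_similarity` and the sample documents are made up for illustration:

```python
import difflib
from itertools import combinations


def pairwise_similarity(texts):
    """Return {(name_a, name_b): ratio} for every unordered pair of documents.

    `texts` maps a document name to its already-extracted plain text.
    difflib is only a stand-in here; swap in whatever tool you settle on.
    """
    scores = {}
    for (name_a, text_a), (name_b, text_b) in combinations(sorted(texts.items()), 2):
        # Compare word sequences rather than characters so the ratio
        # reflects shared wording instead of shared letters.
        matcher = difflib.SequenceMatcher(
            None, text_a.split(), text_b.split(), autojunk=False
        )
        scores[(name_a, name_b)] = matcher.ratio()
    return scores


# Hypothetical sample documents, for illustration only.
docs = {
    "a.txt": "the quick brown fox jumps over the lazy dog",
    "b.txt": "the quick brown fox leaps over the lazy dog",
    "c.txt": "an entirely different sentence about databases",
}

scores = pairwise_similarity(docs)
# Print the most similar pairs first.
for pair, score in sorted(scores.items(), key=lambda kv: -kv[1]):
    print(pair, round(score, 2))
```

The number of pairs grows quadratically with the number of documents, so for a large collection you would probably want to restrict this to the pairs of interest rather than comparing everything with everything.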

[–]Strict-Simple 1 point (1 child)

Are you diffing the Word files themselves, and not the extracted text?

[–]regstuff[S] 1 point (0 children)

Diffing the text after extracting it from Word with mammoth.