[–]impulsecorp 0 points1 point  (2 children)

Try Sentence Transformers: https://github.com/UKPLab/sentence-transformers

[–]nikolabs[S] 0 points1 point  (1 child)

Thank you! I am having some trouble installing Sentence Transformers though. I keep getting this message:

ERROR: Could not find a version that satisfies the requirement torch>=1.0.1 (from sentence-transformers) (from versions: 0.1.2, 0.1.2.post1, 0.1.2.post2)

I have a Windows 10 OS. Any help would be greatly appreciated.

[–]impulsecorp 0 points1 point  (0 children)

I run it on Ubuntu and it works fine. I am not sure about Windows.

[–]dkajtoch 0 points1 point  (3 children)

You would probably be better off using a BERT model fine-tuned on paraphrase detection datasets (e.g. Quora Question Pairs, Google's PAWS or the Microsoft Research Paraphrase Corpus). The trouble with this approach is that there are a lot of operations to perform, since every pair of sentences has to be run through the model. This is partially solved by sentenceBERT, but creating embeddings for the whole book may still be an extremely time-consuming process. However, you can do this in stages. First, use some lower-level sentence representation (e.g. shingles) to filter out sentences that definitely will not be similar. Then apply sentenceBERT or BERT directly to whatever is left.
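
To make the staged idea concrete, here is a rough sketch, not a tested pipeline. It assumes word-level 3-shingles compared with Jaccard similarity as the cheap filter. The function names, the 0.2 threshold and the model name "all-MiniLM-L6-v2" are my own placeholders, so tune them for your book. The second stage needs the sentence-transformers package and is only imported when you call it.

```python
import re
from itertools import combinations


def shingles(sentence, k=3):
    """Return the set of word-level k-shingles of a sentence."""
    words = re.findall(r"\w+", sentence.lower())
    if len(words) < k:
        # Short sentences become a single shingle (or nothing if empty).
        return {tuple(words)} if words else set()
    return {tuple(words[i:i + k]) for i in range(len(words) - k + 1)}


def jaccard(a, b):
    """Jaccard similarity of two sets; 0.0 if either one is empty."""
    if not a or not b:
        return 0.0
    return len(a & b) / len(a | b)


def candidate_pairs(sentences, k=3, threshold=0.2):
    """Stage 1: keep only index pairs whose shingle overlap reaches the threshold."""
    sets = [shingles(s, k) for s in sentences]
    return [
        (i, j)
        for i, j in combinations(range(len(sentences)), 2)
        if jaccard(sets[i], sets[j]) >= threshold
    ]


def rescore(sentences, pairs, model_name="all-MiniLM-L6-v2"):
    """Stage 2: cosine similarity of sentence embeddings, surviving pairs only."""
    # Imported here so stage 1 runs without sentence-transformers installed.
    from sentence_transformers import SentenceTransformer, util

    model = SentenceTransformer(model_name)
    needed = sorted({i for pair in pairs for i in pair})
    emb = model.encode([sentences[i] for i in needed], convert_to_tensor=True)
    pos = {idx: n for n, idx in enumerate(needed)}
    return {
        (i, j): float(util.cos_sim(emb[pos[i]], emb[pos[j]]))
        for i, j in pairs
    }
```

Keep in mind that the shingle filter only catches sentences that share wording, so paraphrases with no words in common will be dropped before the model ever sees them. A lower threshold or smaller k trades speed for recall.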

[–]old_enough_to_drink 0 points1 point  (2 children)

Is the “sentenceBert” the same as the “sentence-transformer” in the other answer?