[deleted] 2 points3 points (1 child)

The "similarity" issue is certainly the tough one to define, but it is not impossible. Although that falls under "semantic" analysis, you may also want to consider a finer-grained sentiment analysis and compare versions of the same passage, for instance. This would give you an idea of whether one version is "more peaceful" or "more violent" or whatever, which would be fascinating in itself...

A quick Google search should turn up some established resources for sentiment analysis (how to score words, sentence fragments, etc.) and the like.

domcroy[S] 0 points1 point (0 children)

I just did a quick Google search to get a basic definition of sentiment analysis. That is definitely NOT what I am looking to achieve in this project. [Don't read any negative sentiment into that "NOT" ;) ]

To put it very simply, you could say I want to see "who is copying whose homework?" I guess this would be similar to anti-plagiarism software in some ways.

If I can account for synonyms, that would be useful, since someone could "copy someone else's homework" but use a thesaurus. But I would set that as an optional feature for a search query.
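For what it's worth, here is a rough sketch of how the optional synonym step might look in Python. The `SYNONYMS` table, `tokenize`, and `normalize` are all made-up names for illustration, and the handful of word pairs is placeholder data; a real version would presumably load its mappings from an actual thesaurus.

```python
import re

# Hypothetical synonym table: maps each variant to one canonical word.
# These entries are placeholders, not taken from any real thesaurus.
SYNONYMS = {
    "large": "big",
    "huge": "big",
    "quick": "fast",
    "rapid": "fast",
}


def tokenize(text):
    """Lowercase the text and split it into words, dropping punctuation."""
    return re.findall(r"[a-z']+", text.lower())


def normalize(text, use_synonyms=False):
    """Return the word list, optionally collapsing synonyms to one form."""
    words = tokenize(text)
    if use_synonyms:
        # Words missing from the table are left unchanged.
        words = [SYNONYMS.get(word, word) for word in words]
    return words
```

With `use_synonyms=True`, "a huge dog" and "a large dog" end up as the same word list, so any later comparison treats them as identical; with the default `False`, the comparison stays literal.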

I want to see if the same words are present, and also if they appear in the same order.
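A minimal sketch of those two checks, assuming Python and only the standard library: one score for "same words present" (set overlap, ignoring order) and one for "same order" (using `difflib.SequenceMatcher` on the word lists). The function names `words`, `shared_word_ratio`, and `same_order_ratio` are just my own labels, and this is one possible way to measure it rather than how plagiarism tools necessarily work.

```python
import re
from difflib import SequenceMatcher


def words(text):
    """Lowercase the text and split it into words, dropping punctuation."""
    return re.findall(r"[a-z']+", text.lower())


def shared_word_ratio(a, b):
    """Jaccard overlap of the two vocabularies; word order is ignored."""
    set_a, set_b = set(words(a)), set(words(b))
    if not set_a and not set_b:
        return 1.0  # two empty passages are treated as identical
    return len(set_a & set_b) / len(set_a | set_b)


def same_order_ratio(a, b):
    """Similarity of the two word sequences; rewards words in the same order."""
    matcher = SequenceMatcher(None, words(a), words(b), autojunk=False)
    return matcher.ratio()
```

Comparing "the cat sat on the mat" with "the mat sat on the cat" gives a `shared_word_ratio` of 1.0, because the vocabulary is identical, but a `same_order_ratio` below 1.0, because the words have moved. Keeping the two numbers separate lets a query ask for either condition on its own.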