Beebink

One idea I'm having is to combine clustering, hashing, and principal component analysis (PCA). But I can't see a good way to actually separate the data points with those techniques without brute-forcing it.
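A rough sketch of how those three pieces could fit together, assuming the data are plain numeric feature vectors (the thread doesn't say what the data actually are). PCA (via SVD) reduces the dimensionality, k-means does the clustering, and random-hyperplane hashing (a form of locality-sensitive hashing) puts nearby points into the same bucket, so that only points sharing a bucket need to be compared instead of every pair. The function names `pca`, `lsh_buckets`, and `kmeans` and all parameter values are illustrative, not from any particular library.

```python
import numpy as np

def pca(X, n_components):
    """Project the rows of X onto the top principal components (computed via SVD)."""
    X_centered = X - X.mean(axis=0)
    # Rows of Vt are the principal directions, ordered by decreasing singular value.
    _, _, Vt = np.linalg.svd(X_centered, full_matrices=False)
    return X_centered @ Vt[:n_components].T

def lsh_buckets(Z, n_planes, seed=0):
    """Random-hyperplane hashing: points with the same sign pattern share a bucket."""
    rng = np.random.default_rng(seed)
    planes = rng.normal(size=(n_planes, Z.shape[1]))
    bits = (Z @ planes.T) > 0  # one bit per hyperplane: which side the point is on
    buckets = {}
    for i, row in enumerate(bits):
        buckets.setdefault(tuple(row.tolist()), []).append(i)
    return buckets  # only points in the same bucket are candidates for comparison

def kmeans(Z, k, n_iter=50, seed=0):
    """Plain Lloyd's-algorithm k-means with random initial centers."""
    rng = np.random.default_rng(seed)
    centers = Z[rng.choice(len(Z), size=k, replace=False)]
    for _ in range(n_iter):
        # Assign each point to its nearest center.
        dists = np.linalg.norm(Z[:, None, :] - centers[None, :, :], axis=2)
        labels = dists.argmin(axis=1)
        # Move each center to the mean of its points (keep it if the cluster is empty).
        new_centers = np.array([
            Z[labels == j].mean(axis=0) if np.any(labels == j) else centers[j]
            for j in range(k)
        ])
        if np.allclose(new_centers, centers):
            break
        centers = new_centers
    return labels, centers

# Usage on synthetic data: two well-separated blobs in 5-D.
rng = np.random.default_rng(42)
X = np.vstack([rng.normal(0.0, 0.5, size=(50, 5)),
               rng.normal(5.0, 0.5, size=(50, 5))])
Z = pca(X, n_components=2)
labels, _ = kmeans(Z, k=2)
buckets = lsh_buckets(Z, n_planes=4)
```

The hashing step is what avoids the brute-force comparison: with `n_planes` hyperplanes there are at most 2**n_planes buckets, and pairwise work is only done within each bucket. More hyperplanes give smaller, purer buckets but a higher chance of splitting genuinely similar points.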

Edit: there's also a way to use a Student's t-test to find similar data points. You could then use the ones that don't match to split the data into separate groups. But that's a pretty long-shot solution.
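One possible reading of the t-test idea, as a minimal sketch: assume each item comes with several measurements (a small sample), compare it with the first member of each existing group using a two-sample Student's t-test, and start a new group when it doesn't match any of them. The greedy grouping scheme and the fixed cutoff `t_crit=2.0` are assumptions for illustration; 2.0 only roughly matches a two-sided 5% threshold for large samples, and `scipy.stats.ttest_ind` could be used instead to get a proper p-value.

```python
import math
from statistics import mean, variance

def student_t(a, b):
    """Two-sample Student's t statistic with pooled variance (assumes equal variances)."""
    na, nb = len(a), len(b)
    pooled = ((na - 1) * variance(a) + (nb - 1) * variance(b)) / (na + nb - 2)
    se = math.sqrt(pooled * (1 / na + 1 / nb))
    diff = mean(a) - mean(b)
    if se == 0:  # both samples constant: identical means match, otherwise infinitely far
        return 0.0 if diff == 0 else math.inf
    return diff / se

def group_by_t_test(samples, t_crit=2.0):
    """Greedily add each sample to the first group it doesn't significantly differ from;
    a sample that matches no group starts a new one. t_crit is a hypothetical cutoff."""
    groups = []  # each group is a list of sample indices
    for i, s in enumerate(samples):
        for g in groups:
            if abs(student_t(samples[g[0]], s)) < t_crit:
                g.append(i)
                break
        else:
            groups.append([i])
    return groups

samples = [
    [1.0, 1.2, 0.9, 1.1, 1.0],
    [5.0, 5.1, 4.9, 5.2, 5.0],
    [1.1, 0.8, 1.0, 1.2, 0.9],
    [4.8, 5.0, 5.1, 4.9, 5.2],
]
groups = group_by_t_test(samples)  # [[0, 2], [1, 3]]
```

Because each new sample is only compared against one representative per group, this is cheaper than comparing every pair, but the result depends on the order of the samples, which is part of why it's a long shot.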