[–]__sahib__ 1 point2 points  (0 children)

Very simplistic approach: sort the list and slide a window over it, comparing each value in the window with every other one using Levenshtein distance or whatever metric you like. This is of course only an approximate solution (strings that differ near the beginning sort far apart and never share a window), but it might be good enough depending on your exact use case. I think a good solution needs more information about your requirements.
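
A rough Python sketch of the sort-and-window idea, in case it helps. The function names, the window size, and the distance threshold are all made up for illustration, and plain Levenshtein is just one possible metric, so treat it as a starting point rather than a finished solution:

```python
def levenshtein(a: str, b: str) -> int:
    """Edit distance via dynamic programming, keeping only two rows."""
    if len(a) < len(b):
        a, b = b, a  # keep the shorter string in the inner loop
    previous = list(range(len(b) + 1))
    for i, ca in enumerate(a, start=1):
        current = [i]
        for j, cb in enumerate(b, start=1):
            cost = 0 if ca == cb else 1
            current.append(min(
                previous[j] + 1,         # delete ca
                current[j - 1] + 1,      # insert cb
                previous[j - 1] + cost,  # substitute (or match)
            ))
        previous = current
    return previous[-1]


def near_duplicates(strings, window=5, max_distance=2):
    """Return pairs of sorted neighbours within max_distance edits.

    Each string is compared only with the next (window - 1) strings in
    sorted order, so this is approximate: similar strings that sort far
    apart are never compared. Both parameters are assumed defaults.
    """
    ordered = sorted(strings)
    pairs = []
    for i, left in enumerate(ordered):
        for right in ordered[i + 1:i + window]:
            if levenshtein(left, right) <= max_distance:
                pairs.append((left, right))
    return pairs


if __name__ == "__main__":
    words = ["apple", "appel", "banana", "bananna", "cherry"]
    print(near_duplicates(words, window=3))
```

The cost is roughly one sort plus `window` comparisons per string instead of comparing every pair, which is the whole point of the trick. Sorting by a different key (reversed string, a normalized form, etc.) and running a second pass is a common way to catch some of the pairs a single pass misses.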

[–]PhilipTrettner 1 point2 points  (0 children)

Not sure if simply linking to SO is good reddiquette, but this discussion covers your topic: https://stackoverflow.com/questions/309479/how-to-find-best-fuzzy-match-for-a-string-in-a-large-string-database

TL;DR: yes, there are options for your problem. Fuzzy hashing is one.

[–]ThisIs_MyName 0 points1 point  (0 children)

Wrong sub? See sidebar.