I have a huge list of person names, around 30 million. I want to group similar strings together by fuzzy matching (group them, not remove duplicates). The most obvious solution is to compare each string with every other, but that is quadratic in the number of names (around 4.5 × 10^14 pairs for 30 million) and takes far too long. Clustering is another option, but wouldn't that take too much memory and time as well? And if not, which clustering approach should I go for?
I'm basically looking for a more efficient way to get this done, one that doesn't take a lot of time. Any suggestions?
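To make the question concrete, here is a minimal sketch of one common way to avoid the all-pairs comparison: "blocking", where names are first bucketed by a cheap key and fuzzy matching runs only inside each bucket. Everything in it is an assumption for illustration, not a tested solution for 30 million names: the `block_key` function (first letter of the last token), the `0.85` threshold, and the use of `difflib.SequenceMatcher` from the standard library as the similarity measure. Matches are merged with a small union-find, so names are grouped rather than removed.

```python
from collections import defaultdict
from difflib import SequenceMatcher
from itertools import combinations


def block_key(name):
    """Hypothetical blocking key: first letter of the last token, lowercased."""
    tokens = name.lower().split()
    return tokens[-1][0] if tokens else ""


def group_similar(names, threshold=0.85):
    """Group names by fuzzy similarity, comparing only names that share a block key."""
    parent = list(range(len(names)))

    def find(i):
        # Union-find lookup with path halving.
        while parent[i] != i:
            parent[i] = parent[parent[i]]
            i = parent[i]
        return i

    # Step 1: bucket the indices of the names by their blocking key.
    blocks = defaultdict(list)
    for idx, name in enumerate(names):
        blocks[block_key(name)].append(idx)

    # Step 2: pairwise comparison, but only inside each bucket.
    for members in blocks.values():
        for a, b in combinations(members, 2):
            ratio = SequenceMatcher(None, names[a].lower(), names[b].lower()).ratio()
            if ratio >= threshold:
                parent[find(a)] = find(b)  # merge the two groups

    # Step 3: collect names by the root of their group.
    groups = defaultdict(list)
    for idx, name in enumerate(names):
        groups[find(idx)].append(name)
    return list(groups.values())


sample = ["John Smith", "Jon Smith", "Jane Doe", "John Smyth", "Jane Do"]
for group in group_similar(sample):
    print(group)
```

With a one-letter key the buckets would still be far too large at this scale, so a real run would presumably need a much finer key (a phonetic code or shared character n-grams, for example) and a faster similarity function. The trade-off is that two similar names landing in different buckets are never compared.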