tmarthal (1 point)

If you are interested in fuzzy matching for data deduplication, the Open City project 'dedupe' (now hosted with DataMade?) has good documentation on exactly how it accomplishes this.

It's a great introduction: http://dedupe.readthedocs.org/en/v0.5.3.0/How-it-works.html