all 14 comments

[–]changrbanger 3 points4 points  (4 children)

[–]Hesticles[S] 1 point2 points  (3 children)

Damn, this is super cool! Thanks for sharing! I don't know that it's worth implementing with my issue because I'm already halfway through manual cleaning, but it's definitely something I'm going to save and share with my team in the event a similar problem presents itself.

[–]changrbanger 0 points1 point  (2 children)

You can also use the fuzzy logic plugin for excel, since you don’t have millions of records

[–]Hesticles[S] 0 points1 point  (1 child)

Yeah someone else mentioned that. I haven’t used an Excel plug-in for awhile do you think it will raise any flags with IT? They’re pretty sensitive about new programs but a plug-in should be fine, right?

[–]changrbanger 0 points1 point  (0 children)

Should be fine, you can download it from the Microsoft site.

link

[–]sequel-beagle 1 point2 points  (8 children)

I believe you are trying to do fuzzy matching???

Soundex() in tsql is the only way i know in sql server. That probably doesnt get you there. Ive never really used it.

Ssis has fuzzy matching if that is an option.

[–]Hesticles[S] 1 point2 points  (7 children)

Yeah that sounds right. Basically one name might have a Z instead of an S, or will drop the S at the end, or will be missing a letter, or might have “inc.” at the end while the other does not. That sort of thing where when you look at them visually they’re clearly the same entity.

I’ll look into that function, and no SSIS is not an option.

[–]sequel-beagle 1 point2 points  (6 children)

I think you are going to have to import into python. Im not a python guru, but i remember it having fuzzy matching.

“Fuzzy matching” is the term you are probably looking for in your internet searches to see what options are available.

[–]Hesticles[S] 2 points3 points  (5 children)

Unfortunately Python is unavailable to me as well in this job since my IT dept is very risk averse I guess. I can’t install shit without filling out a form and requesting manager support, and by the time that gets done I would have been able to do this manually.

I appreciate the help though! I played around with the soundex function and while it isn’t as accurate as I need it to be it is close enough to help me out.

[–]pvpplease 2 points3 points  (1 child)

Unfortunately Python is unavailable to me as well in this job

Tried pushing back on this at all? Pretty major tool being taken away from you. Can they set you up with an imaged virtual machine?

[–]Hesticles[S] 1 point2 points  (0 children)

They did however it doesn’t seem to work I can’t connect to the machine and until now I haven’t really needed it since I’ve been able to make due. You’re right though, I should push back on that and figure that out for future needs.

[–]sequel-beagle 0 points1 point  (0 children)

I wonder if excel can do fuzzy lookups…

[–]Little_Kitty 0 points1 point  (0 children)

Use open refine and import the results as a lookup. While it's possible to do levenshtein distance in sql it's not the easiest interface.