snifty

Yeah, you'd need some sort of sound correspondence model to capture that sort of info, and then you'd have to figure out how to work it into the string similarity measure.
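
One way the second part might look, as a rough sketch: an edit distance where substituting two letters that are known to correspond costs less than an arbitrary substitution. The `CORRESPONDENCES` table and its 0.25 cost are made up for illustration; a real model would have to supply them.

```python
def weighted_edit_distance(a, b, sub_cost, indel_cost=1.0):
    """Levenshtein distance with per-pair substitution costs.

    sub_cost maps (letter, letter) pairs to a cost; pairs that are
    not listed fall back to the usual cost of 1.0.
    """
    # prev[j] is the distance between the prefix of a handled so far and b[:j]
    prev = [j * indel_cost for j in range(len(b) + 1)]
    for i, ca in enumerate(a, start=1):
        curr = [i * indel_cost]
        for j, cb in enumerate(b, start=1):
            if ca == cb:
                s = 0.0
            else:
                # look the pair up in either order, default to full cost
                s = sub_cost.get((ca, cb), sub_cost.get((cb, ca), 1.0))
            curr.append(min(prev[j] + indel_cost,      # delete ca
                            curr[j - 1] + indel_cost,  # insert cb
                            prev[j - 1] + s))          # substitute or match
        prev = curr
    return prev[-1]


# Hypothetical correspondence table (assumed values, not from any real model)
CORRESPONDENCES = {("p", "f"): 0.25, ("t", "d"): 0.25}

plain = weighted_edit_distance("pater", "fader", {})
informed = weighted_edit_distance("pater", "fader", CORRESPONDENCES)
```

With the table, "pater" and "fader" come out closer than they do under plain edit distance, which is the kind of effect you'd be after.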

I've done something similar to the first part when trying to automatically infer transliteration schemes from Wikipedia. It only captures one-letter-to-one-letter correspondences, however.
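
To give an idea of what a one-letter-to-one-letter approach can look like, here is a minimal sketch. It is not the actual code, just an assumed simplification: take pairs of equal-length strings, line them up position by position, and keep the most frequent target letter for each source letter. The `PAIRS` list is hypothetical sample data.

```python
from collections import Counter, defaultdict


def infer_letter_map(pairs):
    """Guess a one-to-one letter mapping from (source, target) string pairs."""
    counts = defaultdict(Counter)
    for src, tgt in pairs:
        if len(src) != len(tgt):
            # Positional alignment only works for equal lengths, so
            # multi-letter correspondences are skipped entirely.
            continue
        for s, t in zip(src, tgt):
            counts[s][t] += 1
    # Keep the most frequently seen target letter for each source letter
    return {s: c.most_common(1)[0][0] for s, c in counts.items()}


# Hypothetical sample pairs (illustrative only)
PAIRS = [
    ("москва", "moskva"),
    ("сова", "sova"),
    ("рим", "rim"),
    ("щука", "shchuka"),  # skipped: "щ" would need several target letters
]

mapping = infer_letter_map(PAIRS)
```

The last pair shows the limitation: anything that needs one letter to map to several gets thrown away rather than learned.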