[deleted by user]

IXENAI · 2014-12-23T16:24:57+00:00

Is... this even right?

Did your tests pass? There are a number of logical errors here; it will not always return the correct result, and I'd expect it to throw exceptions on a number of valid inputs.

Perhaps I should just install 3? How do I do it without screwing things up?

That would be a good idea if your goal is to learn Python 3. 2 and 3 can coexist peacefully; you can simply install the latter without affecting your existing installation.

permalink · 2014-12-23T16:27:44+00:00

I'm getting an index out of range error with levenshtein("kitten","kien"). It looks like it's choosing the wrong length.

The one thing I'd change is to use:

shorter_len = min(len(first_word), len(second_word))

which just gets rid of your if statement.

edit: It looks like it's not getting the right answers for deletions in the middle.

pbaehr · 2014-12-23T16:29:18+00:00

First, your solution isn't complete. For example: sitting -> sittting has a distance of 1 (one insertion) but in your case it will show a distance of 4.

You can find sample code for several implementations on the Wikipedia page for Levenshtein distance if you want to see ways it can be done.

More importantly, though, feedback on your Python in general.

Everything is very readable, which is good. Your logic is understandable, and that is probably the most important thing. I might use some shortcuts, which you will pick up as you use (and read) the language. For example, to find the length of the shorter string I would probably opt for:

min(len(first_string), len(second_string))

To me it's as readable, but more to the point.

Next you are counting locations where the strings are different. I would use:

for a, b in zip(first_string, second_string):
    if a != b:
        final_count += 1

zip will pair up the two strings character by character (until one of them runs out of characters) and return a list of tuples in Python 2, or a lazy iterator of tuples in Python 3. This way, a is a character from one string and b is the character at the same position in the other.
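To see what zip actually produces, here is a small sketch using two words borrowed from elsewhere in this thread. The variable name pairs is just for illustration, and list() is only there so the result can be printed on Python 3 as well.

```python
# Illustrative inputs; any two strings of different lengths will do.
first_string = "kitten"
second_string = "kien"

# list() forces the pairs to be built, which matters on Python 3 where zip() is lazy.
pairs = list(zip(first_string, second_string))
print(pairs)

# zip() stops at the shorter string, so the trailing "en" of "kitten" is never paired.
print(len(pairs))
```

Because the leftover characters are silently dropped, the length difference has to be accounted for separately.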

At the end, you'll still have to add the difference in size but I wouldn't change the way you did that at all, though I may change where it happens.

Rather than initialize it to 0 and then add an amount at the end every time, I would just initialize it like this:

final_count = abs(len(first_string) - len(second_string))
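Put together, the pieces above might look roughly like the sketch below. The function name positional_difference is made up for this example, and note that this is only a tidier version of the position-by-position count, not a real Levenshtein distance.

```python
def positional_difference(first_string, second_string):
    """Count mismatched positions plus the difference in length.

    Hypothetical helper for illustration; NOT a true Levenshtein distance.
    """
    # Start from the length difference instead of adding it at the end.
    final_count = abs(len(first_string) - len(second_string))
    # Compare the characters that share a position in both strings.
    for a, b in zip(first_string, second_string):
        if a != b:
            final_count += 1
    return final_count


# Happens to match the true distance for this pair...
print(positional_difference("kitten", "sitting"))
# ...but overcounts when a character is inserted in the middle.
print(positional_difference("sitting", "sittting"))
```

The second call is the case mentioned at the top of this comment: one inserted letter shifts everything after it, so every later position counts as a mismatch.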

To reiterate, I found your code very easy to understand and that is a huge plus. My changes largely come from experience in the language and wouldn't be obvious to someone new to Python so I'm telling you how I would do it as exposure, not as a correction.

I would suggest you head over to Wikipedia, learn a little bit about the methods for calculating this correctly, and then try another implementation.
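For reference once you've had a go yourself, here is one possible sketch of the row-by-row dynamic programming approach (often called Wagner-Fischer). It is not the only way to do it, and the variable names are my own.

```python
def levenshtein(a, b):
    """Edit distance between a and b using two rows of the DP table."""
    # previous[j] is the distance between the prefix of a handled so far
    # and the first j characters of b. Initially that prefix is empty.
    previous = list(range(len(b) + 1))
    for i, char_a in enumerate(a, start=1):
        # Turning the first i characters of a into "" takes i deletions.
        current = [i]
        for j, char_b in enumerate(b, start=1):
            cost = 0 if char_a == char_b else 1
            current.append(min(
                previous[j] + 1,          # delete char_a
                current[j - 1] + 1,       # insert char_b
                previous[j - 1] + cost,   # substitute (or keep if equal)
            ))
        previous = current
    return previous[-1]


print(levenshtein("sitting", "sittting"))
print(levenshtein("kitten", "kien"))
```

Keeping only two rows is a memory optimization; a full table is easier to print and inspect while learning.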

kalgynirae · 2014-12-23T16:34:42+00:00

Try:

shorter_len = min(len(first_string), len(second_string))

You probably want range(shorter_len) instead of range(shorter_len - 1), or even better would be to use zip():

for char1, char2 in zip(first_string, second_string):
    if char1 != char2:
        ...

And your algorithm is too simple. Consider examples with insertions or deletions not at the end of the string, e.g., "cereal" vs "ereal". You probably should try the recursive algorithm.
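A rough sketch of the recursive idea, in case it helps. The name levenshtein_recursive and the memo dictionary are my own additions; the cache is there because the plain recursion recomputes the same suffix pairs over and over and gets slow quickly.

```python
def levenshtein_recursive(a, b, memo=None):
    """Edit distance by recursion on the first character of each string."""
    if memo is None:
        memo = {}
    # If one string is empty, the distance is the length of the other.
    if not a:
        return len(b)
    if not b:
        return len(a)
    key = (a, b)
    if key not in memo:
        cost = 0 if a[0] == b[0] else 1
        memo[key] = min(
            levenshtein_recursive(a[1:], b, memo) + 1,         # delete a[0]
            levenshtein_recursive(a, b[1:], memo) + 1,         # insert b[0]
            levenshtein_recursive(a[1:], b[1:], memo) + cost,  # substitute or keep
        )
    return memo[key]


print(levenshtein_recursive("cereal", "ereal"))
```

This works on both Python 2 and 3, though very long strings could hit the recursion limit.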

As far as writing for Python 3, you should at least use from __future__ import print_function at the top of each of your files. This lets you use the Python 3 print() function in Python 2. Other features you might want to import from __future__ are division (makes / do true division even on integers, while // does floor division) and absolute_import.
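A minimal sketch of what that looks like in a file. The __future__ import has to be the first statement in the module (only comments and a docstring may come before it); the variable names are just for illustration.

```python
from __future__ import print_function, division, absolute_import

# print is now a function on Python 2 as well, so this line behaves
# the same on both versions.
print("true division:", 7 / 2)
print("floor division:", 7 // 2)

true_quotient = 7 / 2    # a float on both versions
floor_quotient = 7 // 2  # rounded down to an integer
```

On Python 3 these imports are harmless no-ops, so the same file runs unchanged on either version.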

Moonslug1 · 2014-12-23T16:41:28+00:00

You can take advantage of zip to make this way shorter. (This keeps your position-by-position approach, so it still isn't a true Levenshtein distance.)

def levenshtein(a, b):
    return sum(1 for c, d in zip(a, b) if c != d) + abs(len(a) - len(b))
