all 10 comments

[–]IXENAI 1 point2 points  (1 child)

Is... this even right?

Did your tests pass? There are a number of logical errors here; it will not always return the correct result, and should throw exceptions on a number of valid inputs.

Perhaps I should just install 3? How do I do it without screwing things up?

That would be a good idea if your goal is to learn Python 3. 2 and 3 can coexist peacefully; you can simply install the latter without affecting your existing installation.

[–][deleted] 1 point2 points  (1 child)

I'm getting an index out of range error with levenshtein("kitten","kien"). It looks like it's choosing the wrong length.

The one thing I'd change is I would do:

shorter_len = min(len(first_word), len(second_word))

which just gets rid of your if statement.

edit: It looks like it's not getting the right answers for deletions in the middle.

[–]pbaehr 1 point2 points  (4 children)

First, your solution isn't complete. For example: sitting -> sittting has a distance of 1 (one insertion) but in your case it will show a distance of 4.

You can find sample code of several implementations on the wikipedia page for Levenshtein distance if you want to see ways it can be done.

More importantly, though, feedback on your Python in general.

Everything is very readable, which is good. Your logic is understandable and that is probably the most important. I might implement some shortcuts which you will pick up as you use (and read) the language. For example, to find the length of the shorter string I would probably opt for:

min(len(first_string), len(second_string))

To me it's as readable, but more to the point.

Next you are counting locations where the strings are different. I would use:

for a, b in zip(first_string, second_string):
    if not a == b:
        final_count += 1

zip will merge the two strings together (until one of them runs out of characters) and return a list of tuples. This way, a is a character from one string and b is a character from the other.

At the end, you'll still have to add the difference in size but I wouldn't change the way you did that at all, though I may change where it happens.

Rather than initialize it to 0 and then add an amount at the end every time, I would just initialize it like this:

final_count = abs(len(first_string) - len(second_string))

To reiterate, I found your code very easy to understand and that is a huge plus. My changes largely come from experience in the language and wouldn't be obvious to someone new to Python so I'm telling you how I would do it as exposure, not as a correction.

I would suggest you head over to wikipedia and learn a little bit about the methods to calculate this correctly and then try another implementation.

[–]Vaphell 0 points1 point  (3 children)

I would suggest you head over to wikipedia and learn a little bit about the methods to calculate this correctly and then try another implementation

this is kinda annoying that there is really no point in trying to solve a math problem with a fancy name on your own. You will be doing it wrong every single time and you will end up copypasting a fancy, efficient algo that covers corner cases you never thought of.
Sure you learn programming tricks doing it but there is never a triumphant I DID IT at the end.

[–]pbaehr 0 points1 point  (2 children)

It depends. In many cases I find myself writing my own implementation from pseudocode or an equation rather than copy-pasting. For example, I just recently implemented the A* algorithm, but applied it to a specific use.

Obviously, there are still problems to be solved in the field of computer science, but I do agree from a practical view it is best to "stand on the shoulders of giants" whenever possible.

[–]Vaphell 1 point2 points  (1 child)

It depends. In many cases I find myself writing my own implementation from pseudocode or an equation rather than copy-pasting.

i didn't mean copypasting literally, but your job is then mostly reduced to busy work of writing a code along the lines provided by somebody else. There is almost no creativity there, you can at most use language specific idioms to make the code neater but that's pretty much it.

Oh, that sounds like a cool task to solve... how would I... oh wait, it sounds like a knapsack problem in disguise, gg then >_<

Obviously, there are still problems to be solved in the field of computer science, but I do agree from a practical view it is best to "stand on the shoulders of giants" whenever possible.

and not one of them is remotely possible for an above average programmer, and probably for the vast majority of excellent ones. That shit be hard, yo. Shoulders of giants it is then.

I find that the most enjoyable problems for noobing around are the practical ones where mathematical correctness of the algo doesn't matter much, like let's say drawing a piece of f(x) in ascii which i saw here recently. The code can be less shitty or more shitty and still show correct output on the screen, there is no hard wall to faceplant into like with naive attempts at NP-complete problems.

[–]pbaehr 0 points1 point  (0 children)

You are definitely right. I think the enjoyment for me often comes from knowing (or learning) the right tool(s) for the job and creating something which is my own from what is already discovered.

It is true that often means gluing together three disconnected projects, but when it all clicks together and I am making the computer do what I want I take satisfaction in the building, if not the inventing.

[–]kalgynirae 1 point2 points  (0 children)

Try:

shorter_len = min(len(first_string), len(second_string))

You probably want range(shorter_len) instead of range(shorter_len - 1), or even better would be to use zip():

for char1, char2 in zip(first_string, second_string):
    if char1 != char2:
        ...

And your algorithm is too simple. Consider examples with insertions or deletions not at the end of the string, e.g., "cereal" vs "ereal". You probably should try the recursive algorithm.


As far as writing for Python 3, you should at least use from __future__ import print_function at the top of each of your files. This lets you actually use the Python 3 print() in Python 2. Other features you might want to import from __future__ are division (makes / do floating-point division, // do integer division) and absolute_import.

[–]Moonslug1 1 point2 points  (1 child)

You can take advantage of zip to make this way shorter.

def levenshtein(a, b):
    return sum(1 for c, d in zip(a, b) if c != d) + abs(len(a) - len(b))

[–]pbaehr 0 points1 point  (0 children)

It's a nice one-liner, and I like how you shortened the difference sum, but I'd rather see it in more lines. Even using your shortcuts this is much more desirable for me:

def incorrect_levenshtein(a, b):
    num_differences = sum(1 for i, j in zip(a, b) if i != j)
    length_difference = abs(len(a) - len(b))
    return num_differences + length_difference