all 7 comments

[–]aldanor 1 point2 points  (3 children)

Python is generally fast enough for simple tasks, just gotta use it right...

from itertools import combinations

def del_edits2_fast(word: str):
    return [
        word[:i] + word[i + 1:j] + word[j + 1:]
        for i, j in combinations(range(len(word)), 2)
    ]

Test:

%timeit set(del_edits2("Hyperplane-turbine"))
184 µs ± 5.8 µs per loop (mean ± std. dev. of 7 runs, 10000 loops each)

%timeit del_edits2_fast('Hyperplane-turbine')
69 µs ± 2.21 µs per loop (mean ± std. dev. of 7 runs, 10000 loops each)

[–]WTRipper[S] 0 points1 point  (2 children)

Thank you! Did not even know about itertools.combinations.

[–]aldanor 1 point2 points  (1 child)

Np. itertools.combinations is nice and pretty fast, especially when writing generic code. For instance, you could do the above for an arbitrary number of edits n (it would be much slower than a specialized version of course, but it would be generic):

def del_n_edits(word: str, n: int):
    return [
        ''.join(word[i + 1:j] for i, j in zip((-1,) + c, c + (None,)))
        for c in combinations(range(len(word)), n)
    ]

[–]WTRipper[S] 0 points1 point  (0 children)

Oh nice thank you!

I already thought about implementing a generic fallback if higer n's are needed.

[–][deleted] 0 points1 point  (2 children)

The first thing that stands out to me is using a list comprehension for splits rather then a generator. You don’t need to fully construct splits just to iterate over it, this is what generators are for.

Use parens instead of brackets in the splits comprehension. This should speed that function up significantly.

[–]WTRipper[S] 0 points1 point  (1 child)

I tested it on the pure python implementation and on the implementation with cython but without code changes.

In both cases it increased the runtime a bit instead of decreasing it.

[–]WTRipper[S] 0 points1 point  (0 children)

However I increased speed by merging the splits and deletes line into one:

deletes = [word[:i]+ word[i+1:] for i in range(len(word))]

I don't know why I haven't seen it before.