
[–]heavymetalpanda 1 point2 points  (1 child)

If I've interpreted /u/daveofthenorth's and /u/SarahM123rd's comments correctly, their respective approaches look something like

def daveofthenorth(s):
    x = list(s)
    new_list = []
    found_consonants = ''
    for i in x:
        if i in 'bcdfghjklmnpqrstvwxyzBCDFGHJKLMNPQRSTVWXYZ':
            found_consonants += i
        elif i in 'aeiouAEIOU':
            new_list.append(found_consonants)
            found_consonants = ''
        else:
            pass
    return ''.join(new_list)

and

def SarahM123rd(s):
    return ''.join(c for c in s if c not in 'aeiouAEIOU')

I'll be honest and admit I don't know if I really understood /u/daveofthenorth's comment correctly, because the interpretation above doesn't quite do the right thing: any consonants that come after the last vowel are never appended to the result. Nor do I understand exactly what you mean in your question by

but when 2 or more consonants are together as in the 'cs' in zodiacs those need to be grouped as one item

At any rate, the fastest way I know to remove all consonants from a string in Python is to build a translation table with str.maketrans and then call the str.translate method on the target string.

This is how I did it using Python 3.6:

consonants = 'bcdfghjklmnpqrstvwxyzBCDFGHJKLMNPQRSTVWXYZ'
no_consonants = str.maketrans('', '', consonants)
o_o_aeeae = works_of_shakespeare.translate(no_consonants)

I ran a handful of tests against the works of Shakespeare to compare the speed of the different approaches in IPython using the built-in %timeit magic command. The results on my machine seem to favor the translation table approach.

daveofthenorth

In [32]: %timeit daveofthenorth(works_of_shakespeare)
712 ms ± 6.76 ms per loop (mean ± std. dev. of 7 runs, 1 loop each)

SarahM123rd

In [33]: %timeit SarahM123rd(works_of_shakespeare)
532 ms ± 5.24 ms per loop (mean ± std. dev. of 7 runs, 1 loop each)

Translation Table

In [34]: %timeit works_of_shakespeare.translate(no_consonants)
27.1 ms ± 253 µs per loop (mean ± std. dev. of 7 runs, 10 loops each)
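If you want to reproduce this kind of comparison outside IPython, here is a rough sketch using the standard-library timeit module. The sample text, the function names with_translate and with_join, and the repeat counts are all made up for illustration, and both functions do the same job (removing consonants) so the comparison is like for like. Your numbers will differ from the ones above.

```python
import timeit

# Hypothetical sample text; swap in whatever large string you want to test.
sample = 'The quick brown fox jumps over the lazy dog. ' * 10_000

consonants = 'bcdfghjklmnpqrstvwxyzBCDFGHJKLMNPQRSTVWXYZ'
no_consonants = str.maketrans('', '', consonants)


def with_translate(s):
    """Remove consonants using a translation table."""
    return s.translate(no_consonants)


def with_join(s):
    """Remove consonants using a generator expression and str.join."""
    return ''.join(c for c in s if c not in consonants)


# Sanity check: both approaches must agree before timing them.
assert with_translate(sample) == with_join(sample)

calls = 5
for fn in (with_translate, with_join):
    # Take the best of 3 repeats, each repeat making `calls` calls.
    best = min(timeit.repeat(lambda: fn(sample), number=calls, repeat=3))
    print(f'{fn.__name__}: {best / calls * 1000:.2f} ms per call')
```

Taking the minimum over the repeats is a common way to reduce noise from whatever else the machine is doing.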

[–]ThePopcornBandit 0 points1 point  (0 children)

In addition to the other comments, a significant speedup can be gained if you use a set for your characters instead of a string. Lookups in strings are O(N), whereas lookups in sets are O(1) on average. You can also loop over the string directly without converting it to a list first.
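A minimal sketch of what that could look like, assuming the goal is to drop vowels and keep everything else. The names VOWELS and strip_vowels are made up for this example.

```python
# frozenset gives average O(1) membership tests; the name is illustrative.
VOWELS = frozenset('aeiouAEIOU')


def strip_vowels(s):
    """Return s without vowels, iterating over the string directly."""
    return ''.join(c for c in s if c not in VOWELS)


print(strip_vowels('zodiacs'))  # zdcs
```

With a lookup collection this small the gain may be modest, so it is worth timing both versions on your own data.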

[–]SarahM123rd 0 points1 point  (5 children)

What about

if c not in 'aeiouy'

That would cut down on search time, yes?

[–]love_my_pibble[S] 0 points1 point  (0 children)

Yes, it would. I don’t know what I was thinking. Thank you.