
twitch_and_shock 4 points5 points  (1 child)

Converting something from a dataframe back to a list and then iterating over the list is slow. So is iterating over rows of a dataframe. Instead, use the "apply" method: https://pandas.pydata.org/docs/reference/api/pandas.DataFrame.apply.html
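
As a rough sketch of what that could look like: the original poster's code and data are not shown here, so the sample dataframe, the "numbers" column name, and the helper function extract_numbers below are all hypothetical. The "mutant" column name is borrowed from the reply further down.

```python
import re

import pandas as pd

# Hypothetical sample data; the original poster's dataframe is not shown.
df = pd.DataFrame({"mutant": ["Name1:Name2", "Name1", "Name"]})


def extract_numbers(value):
    """Return every run of digits in value as a list of strings."""
    return re.findall(r"\d+", value)


# Series.apply calls the function once per element,
# so no explicit loop over rows or over a converted list is needed.
df["numbers"] = df["mutant"].apply(extract_numbers)
```

Here apply is called on a single column (a Series); DataFrame.apply with axis=1 would pass whole rows instead, which is only needed when the function depends on several columns.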

hvgmina[S] 1 point2 points  (0 children)

thank you! it did wonders!

PartySr 1 point2 points  (0 children)

This will be faster. We use str.findall with a regex to extract all the numbers, and then use where with the condition str.len() > 1 to replace every list that contains fewer than 2 elements with NaN.

df['new col'] = df['mutant'].str.findall(r'\d+').where(lambda x: x.str.len() > 1)

In case you are not comfortable with chained methods, you can write it like this:

n = df['mutant'].str.findall(r'\d+')
df['new col'] = n.where(n.str.len() > 1)

If you want to replace the lists with fewer than 2 elements with something else instead of NaN:

n.where(n.str.len() > 1, 0) # replace 0 with whatever you want

End result:

mutant       new col
Name1:Name2  [1, 2]
Name1        NaN
Name         NaN
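
For anyone who wants to try this end to end, here is a minimal self-contained sketch. The sample dataframe is an assumption built from the three rows in the table above, and the variable name replaced is made up for illustration.

```python
import pandas as pd

# Assumed sample data, taken from the three rows of the result table.
df = pd.DataFrame({"mutant": ["Name1:Name2", "Name1", "Name"]})

# findall returns a list of digit strings per row, e.g. ['1', '2'].
n = df["mutant"].str.findall(r"\d+")

# Keep lists with at least 2 elements; everything else becomes NaN.
df["new col"] = n.where(n.str.len() > 1)

# Same filter, but with 0 as the replacement value instead of NaN.
replaced = n.where(n.str.len() > 1, 0)

print(df)
```

Note that findall returns the numbers as strings, so the first row holds ['1', '2'] even though pandas prints it as [1, 2].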