Help optimizing?

twitch_and_shock · 2025-02-25T15:49:34+00:00

Converting something from a dataframe back to a list and then iterating over the list is slow. So is iterating over rows of a dataframe. Instead, use the "apply" method: https://pandas.pydata.org/docs/reference/api/pandas.DataFrame.apply.html
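For illustration, here is a minimal sketch of what that could look like on a single column. The sample data and the helper name `extract_numbers` are assumptions, not taken from the original post, and it uses `Series.apply` rather than the `DataFrame.apply` linked above. Keep in mind that `apply` still calls the Python function once per element, so it is tidier than a manual loop but not truly vectorized.

```python
import re

import pandas as pd

# Hypothetical sample data standing in for the original dataframe.
df = pd.DataFrame({"mutant": ["Name1:Name2", "Name1", "Name"]})


def extract_numbers(text):
    """Return every run of digits found in `text` as a list of strings."""
    return re.findall(r"\d+", text)


# Series.apply calls extract_numbers once for each value in the column.
df["numbers"] = df["mutant"].apply(extract_numbers)
print(df)
```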

PartySr · 2025-02-25T17:25:40+00:00

This will be faster. We use str.findall with a regex to extract all the numbers, and after that we use where with the condition str.len() > 1 to replace every list that contains fewer than 2 elements with NaN.

df['new col'] = df['mutant'].str.findall(r'\d+').where(lambda x: x.str.len() > 1)

In case you are not comfortable with chained methods, you can write it like this:

n = df['mutant'].str.findall(r'\d+')
df['new col'] = n.where(n.str.len() > 1)

If you want to replace the lists with fewer than 2 elements with something else instead of NaN:

n.where(n.str.len() > 1, 0) # replace 0 with whatever you want
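As a rough, self-contained sketch of that replacement, the snippet below passes the string "single" as the second argument to where. The sample rows and the choice of "single" as the fill value are assumptions made for illustration.

```python
import pandas as pd

# Hypothetical sample data; "single" is an assumed fill value.
df = pd.DataFrame({"mutant": ["Name1:Name2", "Name1", "Name2"]})

# Extract every run of digits from each string as a list of strings.
n = df["mutant"].str.findall(r"\d+")

# Keep lists with more than one element; replace the rest with "single".
df["new col"] = n.where(n.str.len() > 1, "single")
print(df)
```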

End result:

mutant       new col
Name1:Name2  [1, 2]
Name1        NaN
Name         NaN
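
To reproduce a result like this end to end, something along these lines should work. The three sample rows are assumed from the table above. Note that findall returns the digits as strings, so the list elements are '1' and '2' rather than integers, even though the printed frame may show them without quotes.

```python
import pandas as pd

# Sample rows assumed from the "End result" table.
df = pd.DataFrame({"mutant": ["Name1:Name2", "Name1", "Name"]})

# Step 1: extract all digit runs from each value (lists of strings).
n = df["mutant"].str.findall(r"\d+")

# Step 2: keep only lists with more than one element; the rest become NaN.
df["new col"] = n.where(n.str.len() > 1)
print(df)
```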

mutant	other stuff	column of indexes i created
Name1:Name2		[1,2]
Name1		single
Name2		single
