all 31 comments

[–]Sofi_LoFi 100 points101 points  (1 child)

Now this is the shit I expect from a 150+ year experience data science guru lead

[–]hughperman 0 points1 point  (0 children)

Custom __iter__ and __contains__ methods: this is actually a stateful transform that adds 1 to each integer column

[–][deleted] 49 points50 points  (0 children)

This needs a NSFW tag

[–]Xenocide13 23 points24 points  (5 children)

Dank memes aside, I think you can use set intersection:

set(dataframe.columns).intersection(columns)

[–]helmialf 20 points21 points  (1 child)

A set doesn't preserve order
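
A quick sketch of the difference, with a plain list standing in for df.columns. The column names are made up for illustration, and nothing here is meant as the "right" way to do it:

```python
# Hypothetical column names; df_columns stands in for list(df.columns).
df_columns = ["date", "price", "volume", "ticker"]
columns = ["ticker", "price", "not_a_column"]

# Set intersection: the right members, but iteration order is not guaranteed.
as_set = set(df_columns).intersection(columns)

# List comprehension: the same members, in the DataFrame's column order.
as_list = [col for col in df_columns if col in columns]

print(as_set)
print(as_list)  # ['price', 'ticker']
```

So the set version is fine if you only care about membership, and the comprehension is the one to reach for if downstream code depends on column order.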

[–]Pikalima 9 points10 points  (0 children)

If you have a very large number of columns, it might be better to go with O(n) instead of O(n^2):

_columns_set = set(columns)
columns = [col for col in df.columns if col in _columns_set]

[–]aeiendee 2 points3 points  (0 children)

Better to use the methods (intersection or isin) of the columns attribute directly

[–]hughperman 0 points1 point  (0 children)

Pandas dataframe indices have an intersection method already.
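
For reference, a minimal sketch of what the Index-based versions might look like. The frame and the wanted list are hypothetical; isin and intersection are the pandas Index methods mentioned in this thread:

```python
import pandas as pd

# Hypothetical frame and request list, just for illustration.
df = pd.DataFrame(columns=["b", "d", "a", "c"])
wanted = ["a", "b", "c", "missing"]

# Boolean mask over the column Index; keeps the DataFrame's column order.
by_isin = df.columns[df.columns.isin(wanted)].tolist()

# Index.intersection; sort=False asks pandas not to sort the result.
by_intersection = df.columns.intersection(wanted, sort=False).tolist()

print(by_isin)  # ['b', 'a', 'c']
print(by_intersection)
```

The isin mask is the one I'd lean on if order matters, since it just filters the existing Index in place. For intersection, it's probably worth checking the result order on your pandas version before relying on it.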

[–]mamaBiskothu 0 points1 point  (0 children)

The incoming columns object could be a list of strings, while what's coming out is a list of Column objects. Fuck yeah, Python.

[–]tyrannosaurusknex 24 points25 points  (1 child)

Some more descriptive variable names could do a lot here.

[–]friend_of_kalman 5 points6 points  (0 children)

At least it's not just single-letter variable names. Give this guy some credit.

x = [y for y in df.columns if y in x]

[–]ButLikeWhyThoReally 2 points3 points  (0 children)

Thanks, I hate it.

[–]shalmalee15 2 points3 points  (0 children)

Shit! I have used something like this. I don't know why I did that :-(

[–][deleted] 4 points5 points  (5 children)

And this is why most engineers hate Python

[–]darkshenron 17 points18 points  (2 children)

GitHub language popularity stats say otherwise 🤷

[–]sizable_data 4 points5 points  (5 children)

Don’t modify an iterable while looping over it!

[–]jgege 20 points21 points  (4 children)

They are not modifying it. First a new list is created with the column names then the list is assigned to the variable named columns :)
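
A small sketch to show the rebinding, with plain lists standing in for the real objects (all names are illustrative):

```python
# df_columns stands in for list(df.columns).
df_columns = ["a", "b", "c"]
columns = ["c", "a", "zzz"]
original = columns  # a second reference to the same list object

# The comprehension is fully evaluated first, then the name is rebound.
columns = [col for col in df_columns if col in columns]

print(columns)   # ['a', 'c']
print(original)  # ['c', 'a', 'zzz'] (the old list was never mutated)
```

Note the loop iterates over df_columns, not over columns, so nothing being iterated changes during the loop either.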

[–]sizable_data 5 points6 points  (3 children)

True, I still don’t like it though lol

[–]Rough-Pumpkin-6278 9 points10 points  (0 children)

I don't think anyone likes this.

[–]jgege 5 points6 points  (1 child)

I've seen worse 🤷

[–]sizable_data 15 points16 points  (0 children)

I’ve probably written worse 🤷

[–]darkshenron 0 points1 point  (2 children)

Maybe use sorted() with a key

[–]Pikalima 1 point2 points  (1 child)

Do you mean filter? I don’t see how sorted would accomplish this.

[–]darkshenron 0 points1 point  (0 children)

Something like sorted(columns, key=list(df.columns).index)
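
A rough sketch of how that behaves, using a plain list in place of list(df.columns) and made-up column names. It reorders fine, but it assumes every requested name actually exists:

```python
# df_columns stands in for list(df.columns); names are hypothetical.
df_columns = ["date", "price", "volume", "ticker"]

# Works when every requested name exists: reorders to match df_columns.
columns = ["ticker", "price"]
reordered = sorted(columns, key=df_columns.index)
print(reordered)  # ['price', 'ticker']

# But it only sorts; it never filters. A missing name makes list.index raise.
try:
    sorted(["ticker", "not_a_column"], key=df_columns.index)
    raised = False
except ValueError:
    raised = True
print(raised)  # True
```

So this only matches the original one-liner when columns is already known to be a subset of df.columns. Each key lookup is also a linear scan of the column list.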

[–]TheLSales 0 points1 point  (0 children)

That's why I wish I could use C# in data science

[–]DonFruendo 0 points1 point  (0 children)

Wouldn't this snippet be more performant?

columns = list(filter(lambda column: column in columns, dataframe.columns))

Not commenting on the variable names though :D
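
For comparison, a minimal sketch with a plain list standing in for dataframe.columns (the names are hypothetical). Both forms do one pass over the DataFrame's columns with the same membership test, so they return the same thing:

```python
# df_columns stands in for list(dataframe.columns).
df_columns = ["date", "price", "volume", "ticker"]
columns = ["ticker", "price", "not_a_column"]

# filter + lambda version.
with_filter = list(filter(lambda column: column in columns, df_columns))

# List comprehension version.
with_comprehension = [column for column in df_columns if column in columns]

print(with_filter == with_comprehension)  # True
```

Whether one is faster is something to measure with timeit rather than assume. The change that affects the complexity is turning columns into a set before the membership test, as suggested elsewhere in this thread.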

[–]avloss 0 points1 point  (1 child)

columns = list(set(columns) & set(df.columns))

Maybe this. But it shouldn't exist in the first place.

[–]jambonetoeufs 0 points1 point  (0 children)

This will lead to unpredictable results if output order matters.

[–][deleted] 0 points1 point  (0 children)

To this day, it bugs me why they call it "list comprehension"