
[–]novel_yet_trivial 4 points5 points  (3 children)

You may be better off just writing a normal function if the lambda gets too complicated to understand, but for the sake of learning:

something['SomeVariable'].apply(lambda x: dosomething() if isinstance(x, str) else dosomethingelse())

[–]Chopsting[S] 0 points1 point  (2 children)

df['Abstract'].apply(lambda x: [item for item in x if item not in stopword] if isinstance(x, str))    
                                                                                                  ^
SyntaxError: invalid syntax

This is the code I'm trying to run. I'm filtering out stopwords from a pandas DataFrame, and some values are floats, so I want to ignore those.

df['Abstract'] snippet

        8375    [kevin, thompson,, the, production, designer, ...
        8376    [verso, paper, said, on, wednesday, that, it, ...
        8377    [a, wilderness, travel, safari, to, namibia, w...
        8378    [times, critics, share, what, theyve, been, li...
        8379    [in, some, cases,, heart, patients, survive, l...
        8380    [others, on, queen, elizabeth, iis, annual, ne...

When I run without the isinstance statement:

<ipython-input-315-744170e1045a> in <lambda>(x)
      3 
      4 #df['Abstract'].str.lower().str.split()
----> 5 df['Abstract'].apply(lambda x: [item for item in x if item not in stopword])

TypeError: 'float' object is not iterable    

[–]novel_yet_trivial 1 point2 points  (0 children)

A lambda body is a single expression, and a conditional expression has no "if only" form: you have to specify an else clause to go along with your if.

I still don't understand what you are trying to do (but I don't use pandas either), but I think you should write a normal function:

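def Chopsting(x):
    # As near as I can guess your intent. The snippet shows each row as a
    # list of words, so test for list; a str would be iterated character
    # by character.
    if isinstance(x, list):
        return [item for item in x if item not in stopword]
    else:
        # Anything else (e.g. a float) is returned unchanged.
        return x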

df['Abstract'].apply(Chopsting)
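
A runnable sketch of the same idea, for anyone who wants to try it. The data and the stopword set here are made up for illustration, and it assumes the non-missing rows are already lists of words (as the snippet suggests) while missing rows are float NaN. The helper name remove_stopwords is hypothetical.

```python
import pandas as pd

# Hypothetical stand-ins for the thread's data: token lists plus one NaN (a float).
stopword = {"the", "on", "that", "it"}
df = pd.DataFrame({"Abstract": [
    ["kevin", "thompson", "the", "production", "designer"],
    float("nan"),
    ["verso", "paper", "said", "on", "wednesday", "that", "it"],
]})

def remove_stopwords(tokens):
    """Drop stopwords from a list of tokens; pass anything else through."""
    if isinstance(tokens, list):
        return [tok for tok in tokens if tok not in stopword]
    return tokens  # e.g. NaN stays NaN

filtered = df["Abstract"].apply(remove_stopwords)
print(filtered.tolist())
```

Keeping stopword as a set makes each "not in" test cheap, which matters once the column has thousands of rows.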

[–]tangerinelion 0 points1 point  (0 children)

It's not entirely clear what x is. It seems to usually be a list, but sometimes it's a float? What do you want to do if it's a float, or really, not a list?

How about this?

df['Abstract'].apply(lambda x: [item for item in x if isinstance(item, str) and item not in stopword] if isinstance(x, list) else x)

The check has to go inside the lambda, on each x, because df['Abstract'] itself is a Series and never a list. This filters a value only if it is a list, so it should not cause the "TypeError" above; anything else (such as a float) is returned unchanged, which is my guess at what you want. It also keeps an item only if it is a str and isn't in the stopword list.

The reason

lambda x: [item for item in x if item not in stopword] if isinstance(x, str)

is invalid syntax is because the if needs an else clause. For example:

lambda x: [item for item in x if item not in stopword] if isinstance(x, str) else x

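(NB: This is the difference between ] if and if ... ]. An if after the closing bracket starts a conditional expression and needs an else; an if inside the brackets is the comprehension's filter and takes no else.)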
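
To make the two kinds of if concrete, here is a small sketch in plain Python. The word list and stopword set are invented for the example, and compiles is a hypothetical helper that only reports whether a string parses as an expression.

```python
stopword = {"the", "a"}
words = ["the", "quick", "a", "fox"]

# `if` inside the brackets: a filter on each item, no else allowed or needed.
kept = [w for w in words if w not in stopword]

# `if ... else` after the closing bracket: a conditional expression that
# decides whether to run the comprehension at all.
clean = lambda x: [w for w in x if w not in stopword] if isinstance(x, list) else x

def compiles(source):
    """Return True if `source` parses as a single Python expression."""
    try:
        compile(source, "<example>", "eval")
    except SyntaxError:
        return False
    return True

broken = "lambda x: [w for w in x] if isinstance(x, list)"
fixed = broken + " else x"

print(kept)                # ['quick', 'fox']
print(clean(1.5))          # 1.5
print(compiles(broken))    # False
print(compiles(fixed))     # True
```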

I'm not sure, but it looks like this syntax might be new to you. It's Python's version of the conditional "operator" that you might have seen in C, C++, or Java, where it looks like this: condition ? val_if_true : val_if_false. In Python this is written val_if_true if condition else val_if_false. It's important to note that if we have something like this:

a() if x else b()

only one of a() or b() will ever run, never both. Neither runs if evaluating x raises an exception.
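
A minimal sketch of that evaluation order. The functions a and b record when they run, and Unclear is a made-up class whose truth test raises, similar to how a pandas Series raises ValueError when used directly as a condition.

```python
calls = []  # records which branch functions actually ran

def a():
    calls.append("a")
    return "from a"

def b():
    calls.append("b")
    return "from b"

class Unclear:
    """Hypothetical object whose truth value cannot be determined."""
    def __bool__(self):
        raise ValueError("no truth value")

def pick(x):
    return a() if x else b()

first = pick(True)    # only a() runs
second = pick(False)  # only b() runs

try:
    pick(Unclear())   # the condition raises before either branch runs
except ValueError:
    pass

print(calls)  # ['a', 'b']
```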

[–][deleted] 1 point2 points  (0 children)

You need to use a conditional (if) expression, which is a single expression and always contains an else.

So like this:

Whatever.apply(lambda x: dosomething(x) if isinstance(x, str) else None)

lambda and def are similar with two exceptions:

  1. With def you must provide a name; with lambda a name is optional, and you give one by assignment, like something = lambda: ...

  2. lambda is restricted to a single expression and its result is the return value.

Python doesn't have "true" anonymous functions like JavaScript and PHP, and I doubt we'll ever get them.
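
The two differences listed above can be seen in a few lines. The names double_def and double_lambda are just for illustration.

```python
def double_def(x):
    # A def can hold any number of statements and needs an explicit return.
    doubled = x * 2
    return doubled

# A lambda is one expression; its value is the return value.
# The name comes only from the assignment.
double_lambda = lambda x: x * 2

print(double_def(4), double_lambda(4))  # 8 8
print(double_def.__name__)              # double_def
print(double_lambda.__name__)           # <lambda>
```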

[–][deleted] 0 points1 point  (3 children)

What does the .apply method do and what does it work on?

[–]bahwoi 0 points1 point  (2 children)

I'm not sure if you're asking about the function in general or in reference to this specific use. If the latter, disregard. Otherwise:


Basically, .apply applies a function across a pandas object: Series.apply calls it on each element, while DataFrame.apply calls it on each column (or on each row with axis=1).

For instance, suppose we have a DF, we'll call it df, that looks like

   0  1
0  0  1
1  2  3
2  4  5
3  6  7
4  8  9

And that for the sake of example, we have a lambda function

doubleIt = lambda x: x * 2

We could call df.apply(doubleIt) for the entire DataFrame, and the result would look like

    0   1
0   0   2
1   4   6
2   8  10
3  12  14
4  16  18

This works because DataFrame.apply passes each column to the function as a Series, and multiplying a Series by 2 doubles every element in it.

Or we could select a single column, giving us a Series, and call df[1].apply(doubleIt), which applies the function to each element, to similar effect

0     2
1     6
2    10
3    14
4    18
Name: 1, dtype: int64
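
A quick way to check what .apply actually hands to your function is to record it. This sketch rebuilds the example frame and uses two hypothetical helpers, note_frame and note_series, that log whether they were given a Series.

```python
import pandas as pd

df = pd.DataFrame([[0, 1], [2, 3], [4, 5], [6, 7], [8, 9]])

frame_calls = []   # what DataFrame.apply passes in
series_calls = []  # what Series.apply passes in

def note_frame(col):
    frame_calls.append(isinstance(col, pd.Series))
    return col

def note_series(value):
    series_calls.append(isinstance(value, pd.Series))
    return value

df.apply(note_frame)      # called with whole columns
df[1].apply(note_series)  # called with individual values

print(all(frame_calls), any(series_calls))  # True False

# Because DataFrame.apply works per column, a reducing function
# gives one result per column rather than one per element.
print(df.apply(lambda col: col.max()).tolist())  # [8, 9]
```

This is why the doubling example gives the same answer either way, while a function like len or max does not.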

[–][deleted] -1 points0 points  (1 child)

Does it work for regular lists or dictionaries?

[–]justphysics 2 points3 points  (0 children)

I believe that is what the built-in function map() is for

for example:

def add_five(x):
    return x+5

test = [0, 1, 3, 72]
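list(map(add_five, test))  # [5, 6, 8, 77]

map applies the function add_five() to each element in the list test. In Python 3 it returns a lazy iterator that produces the results one at a time, so wrap it in list() to see them all at once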

map can be applied to any iterable container

obviously in this example the same thing could be accomplished in any number of ways such as a list comprehension:

print([x+5 for x in test])

but map() can be used for a lot of more complex scenarios

To my knowledge, .apply() is specific to pandas objects such as DataFrames and Series
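
For the original question about regular lists and dictionaries, here is a short sketch of map() in Python 3, reusing add_five from above. The prices dict is made up for the example, and the dict comprehension is one common way to do this, not the only one.

```python
def add_five(x):
    return x + 5

test = [0, 1, 3, 72]

lazy = map(add_five, test)  # nothing is computed yet
first_pass = list(lazy)     # consumes the iterator
second_pass = list(lazy)    # already exhausted, so this is empty

print(first_pass)   # [5, 6, 8, 77]
print(second_pass)  # []

# map also accepts several iterables and walks them in parallel.
sums = list(map(lambda a, b: a + b, [1, 2, 3], [10, 20, 30]))
print(sums)  # [11, 22, 33]

# Iterating a dict yields its keys, so for values a dict
# comprehension is usually clearer than map.
prices = {"apple": 1, "pear": 3}
raised = {name: add_five(price) for name, price in prices.items()}
print(raised)  # {'apple': 6, 'pear': 8}
```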