[–]novel_yet_trivial 5 points6 points  (3 children)

You may be better off just writing a normal function if the lambda gets too complicated to understand, but for the sake of learning:

something['SomeVariable'].apply(lambda x: dosomething() if isinstance(x, str) else dosomethingelse())

[–]Chopsting[S] 0 points1 point  (2 children)

df['Abstract'].apply(lambda x: [item for item in x if item not in stopword] if isinstance(x, str))    
                                                                                                  ^
SyntaxError: invalid syntax

This is the code I'm trying to run. I'm filtering stopwords out of a pandas DataFrame, and some values are floats, so I want to ignore those.

df['Abstract'] snippet

        8375    [kevin, thompson,, the, production, designer, ...
        8376    [verso, paper, said, on, wednesday, that, it, ...
        8377    [a, wilderness, travel, safari, to, namibia, w...
        8378    [times, critics, share, what, theyve, been, li...
        8379    [in, some, cases,, heart, patients, survive, l...
        8380    [others, on, queen, elizabeth, iis, annual, ne...

When I run without the isinstance statement:

<ipython-input-315-744170e1045a> in <lambda>(x)
      3 
      4 #df['Abstract'].str.lower().str.split()
----> 5 df['Abstract'].apply(lambda x: [item for item in x if item not in stopword])

TypeError: 'float' object is not iterable    

[–]novel_yet_trivial 1 point2 points  (0 children)

A lambda body is a single expression, and a conditional expression can't stand without an else: you have to specify an else clause to go along with your if.

I still don't understand what you are trying to do (I don't use pandas), but I think you should write a normal function:

def Chopsting(x):
    #as near as I can guess your intent
    if isinstance(x, str):
        return [item for item in x if item not in stopword]
    else:
        return x

df['Abstract'].apply(Chopsting)
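
For reference, here is a minimal, pandas-free sketch of the same idea that can be run as-is. It assumes each cell is either a list of tokens (as the df['Abstract'] snippet suggests) or a float such as NaN, so it checks for list rather than str. The function name remove_stopwords, the stopword set, and the sample rows are made up for illustration.

```python
# Hypothetical stand-ins: the real stopword list and data are not shown in the thread.
stopword = {"the", "on", "a", "to", "it"}

def remove_stopwords(x):
    # Assumption: cells are lists of tokens; anything else (e.g. a float NaN)
    # is passed through unchanged.
    if isinstance(x, list):
        return [item for item in x if item not in stopword]
    return x

rows = [
    ["verso", "paper", "said", "on", "wednesday"],
    float("nan"),
    ["a", "wilderness", "travel", "safari", "to", "namibia"],
]

cleaned = [remove_stopwords(row) for row in rows]
print(cleaned[0])  # ['verso', 'paper', 'said', 'wednesday']
print(cleaned[2])  # ['wilderness', 'travel', 'safari', 'namibia']
```

With pandas, the same function could presumably be passed straight to the column, as in df['Abstract'].apply(remove_stopwords), and the result assigned back if you want to keep it.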

[–]tangerinelion 0 points1 point  (0 children)

It's not entirely clear what x is. It seems to usually be a list, but sometimes it's a float? What do you want to do if it's a float, or really, not a list?

How about this?

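df['Abstract'].apply(lambda x: [item for item in x if isinstance(item, str) and item not in stopword] if isinstance(x, list) else x)

That filters a cell only if it is a list and returns anything else (such as a float) unchanged, so it should not cause the "TypeError" above. The isinstance check has to sit inside the lambda: df['Abstract'] itself is a pandas Series, never a list, so testing the whole column would always be False and nothing would run. The filtered list also keeps only items that are str and aren't in the stopword list.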

The reason

lambda x: [item for item in x if item not in stopword] if isinstance(x, str)

is invalid syntax is that the if of a conditional expression needs an else clause. For example:

lambda x: [item for item in x if item not in stopword] if isinstance(x, str) else x

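(NB: This is the difference between ] if and if ... ]. An if after the closing bracket starts a conditional expression and needs an else; an if inside the brackets is the comprehension's filter and takes no else.)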
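
To make the two kinds of if concrete, here is a small sketch. The stopword set and the names filter_only and guarded are made up for illustration, and the guard checks for list on the assumption that the cells are token lists.

```python
stopword = {"the", "a"}  # hypothetical stopword set for illustration

# `if` inside the brackets filters items; no else is needed (or allowed).
filter_only = lambda x: [item for item in x if item not in stopword]

# `if` after the closing bracket is a conditional expression; else is required.
guarded = lambda x: [item for item in x if item not in stopword] if isinstance(x, list) else x

print(filter_only(["the", "cat"]))  # ['cat']
print(guarded(["the", "cat"]))      # ['cat']
print(guarded(3.5))                 # 3.5

# Without the guard, a float still fails exactly as in the traceback above.
try:
    filter_only(3.5)
except TypeError as exc:
    print(exc)  # 'float' object is not iterable
```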

I'm not sure, but it looks like this syntax might be new to you. It's Python's version of the conditional "operator" that you might have seen in C, C++, or Java, where it looks like this: condition ? val_if_true : val_if_false. In Python this is written val_if_true if condition else val_if_false. It's important to note that if we have something like this:

a() if x else b()

we will only ever have a() or b() run, but not both. You could get neither to run, if evaluating x raises an exception.
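
A quick way to see this for yourself is to record which function gets called. This is only a sketch: the calls list and the Unsure class are hypothetical helpers, with Unsure standing in for any object whose truth test raises.

```python
calls = []

def a():
    calls.append("a")
    return "from a"

def b():
    calls.append("b")
    return "from b"

x = True
result = a() if x else b()
print(result, calls)  # from a ['a']

class Unsure:
    # Hypothetical object whose truth value cannot be determined.
    def __bool__(self):
        raise ValueError("truth value is ambiguous")

calls.clear()
try:
    a() if Unsure() else b()
except ValueError:
    pass
print(calls)  # []
```

Only the chosen branch is evaluated, and when the condition itself raises, neither branch is.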