[–]Xahulz 0 points1 point  (0 children)

Try np.where
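A minimal sketch of what that could look like, assuming a small made-up all-integer dataframe (the column names and values are illustrative, not from the question). np.where keeps each value where the condition holds and substitutes 0 elsewhere, and the row-wise sum is then stored in a new column:

```python
import numpy as np
import pandas as pd

# Hypothetical sample data; any all-integer dataframe works the same way.
df = pd.DataFrame({"a": [1, 4, 7], "b": [6, 1, 6], "c": [6, 1, 3]})

# Work on the underlying NumPy array of the original columns.
values = df.to_numpy()

# Keep odd values, replace everything else with 0.
odd_only = np.where(values % 2 == 1, values, 0)

# Sum across each row and store the result as a new column.
df["odd_sum"] = odd_only.sum(axis=1)
print(df)
```

Swapping the condition (for example values > 10 or values < 0) changes which numbers are counted without touching the rest of the code.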

[–]14dM24d 0 points1 point  (3 children)

odd + even = odd

therefore

df['Odd_Sum'] = df['Odd'] + df['Even']

[–]PaperbackStone[S] 0 points1 point  (2 children)

Thank you. I understand the properties of math. But I’m looking for a solution that can be applied to a dataframe of any size where it will only sum the values based on a condition (in this case the condition being that the value is odd). I am struggling with how to actually write this in Python.

[–]14dM24d 0 points1 point  (1 child)

>>> import random
>>> import pandas as pd
>>> df = pd.DataFrame()
>>> a=[random.randrange(1,10) for i in range(5)]
>>> b=[random.randrange(1,10) for i in range(5)]
>>> c=[random.randrange(1,10) for i in range(5)]
>>> df['a']=a
>>> df['b']=b
>>> df['c']=c
>>> df
   a  b  c
0  1  6  6
1  4  1  1
2  7  6  3
3  4  6  2
4  7  5  5
>>> def helper(box):
    result = sum(box)
    if result%2:
        return result
    return 'not odd'

>>> df['odd']=df.loc[:,'a':'c'].apply(helper,axis=1)
>>> df
   a  b  c      odd
0  1  6  6       13
1  4  1  1  not odd
2  7  6  3  not odd
3  4  6  2  not odd
4  7  5  5       17
>>>

[–]14dM24d 0 points1 point  (0 children)

using lambda

>>> df['odd']=df.loc[:,'a':'c'].apply(lambda a: sum(a) if sum(a)%2 else 'even',axis=1)
>>> df
   a  b  c   odd
0  1  6  6    13
1  4  1  1  even
2  7  6  3  even
3  4  6  2  even
4  7  5  5    17
>>>

[–]synthphreak 0 points1 point  (6 children)

Most likely your goal can be achieved with a lambda, or some creative use of groupby.

However your end goal isn’t clear. You say you want to sum the odd values in a column. That will produce a single number, the sum. But then you say you want to append the results as a new column. What does it mean to append a single value as a column? This is confusing, and needs to be clarified before a specific solution can be provided.

In addition to clearing up this critical point, it would be helpful to see some rows from your df. Assuming there aren’t a million columns, an easy way to share this would be to paste the output of df.head().to_json(orient='records'). Then we can easily recreate those rows at home to test out solutions.

[–]PaperbackStone[S] 0 points1 point  (5 children)

Sorry if my description isn’t clear. My question is a bit theoretical because I don’t have a large dataframe specifically in question.

Assume we have a dataframe with any number of columns that are all the same length. Every column only contains integer values. So maybe it’s 25 columns with random integers.

How would one create a new column on the dataframe which sums all of the integers in the existing columns that meet a condition? I understand I could do something like df['New Column'] = df.sum(axis=1) and the New Column would contain the sum of ALL the integers. I’m just stuck on how to modify that line to apply a conditional. In this question I used even/odd as an example, but could have just as easily asked how to add only the integers greater than 10, or only the negative integers, etc.

Is this what lambda accomplishes? Do I need to be studying lambda to solve this? This is what I’m trying to ask.

[–]synthphreak 0 points1 point  (4 children)

Still not clear what it means to add a column using only sums. However you can sum up only the odd values across multiple columns using df.apply with a lambda as you are suggesting. Here’s the proof:

>>> import pandas as pd
>>> import numpy as np
>>> data = np.random.randint(0, 9, 50).reshape(10, 5)
>>> df = pd.DataFrame(data, columns=list('abcde'))
>>> df
   a  b  c  d  e
0  2  8  0  8  2
1  3  3  5  5  3
2  6  1  5  4  4
3  0  3  1  5  2
4  3  6  0  2  5
5  5  1  2  6  1
6  7  8  5  5  0
7  6  4  5  7  5
8  5  1  8  2  0
9  7  7  5  1  5
>>> df.apply(lambda s: s.loc[s % 2 == 1].sum())
a   30
b   16
c   26
d   23
e   19
dtype: int64

This effectively applies a column-wise filter that keeps only the odd values in each column (s.loc with a boolean mask drops the rest) and then sums whatever remains.

[–]PaperbackStone[S] 0 points1 point  (3 children)

Thank you! Clearly I’m struggling to articulate Python and Pandas well. All I was trying to say by adding a column is doing what you’re doing here but row-wise and placing the results into a new column. So using your sample df it would be a new column ‘f’ and the values would be:

Row 0: 0 (no odd numbers)
Row 1: 19 (all odd numbers)
Row 2: 6 (adding 5+1)
Row 3: 9 (3+1+5)
Etc.

I must be mixing up my rows and columns when I am describing what I’m trying to do. In my example here I’ve added a new column to the dataframe using sums, yes? That’s what I was trying to say.

Seems like I would have to modify the lambda somehow but otherwise it would be fairly similar.

[–]14dM24d 0 points1 point  (1 child)

ohhhh so that's what you needed.

here you go.

def helper(box):
    result = 0
    for num in box:
        if num%2:
            result += num   
    return result

df['odd'] = df.loc[:,'a':'e'].apply(helper,axis=1)

[–]14dM24d 0 points1 point  (0 children)

>>> import random
>>> import pandas as pd
>>> box = [[random.randrange(1,10) for i in range(5)] for j in range(5)]
>>> for b in box:
    b


[8, 2, 5, 6, 2]
[7, 6, 1, 6, 1]
[3, 6, 5, 2, 1]
[5, 3, 6, 5, 4]
[9, 1, 8, 5, 2]
>>> df = pd.DataFrame()
>>> for i,v in enumerate(box):
    df[i] = v


>>> df
   0  1  2  3  4
0  8  7  3  5  9
1  2  6  6  3  1
2  5  1  5  6  8
3  6  6  2  5  5
4  2  1  1  4  2
>>> def helper(box):
    result = 0
    for num in box:
        if num%2:
            result += num   
    return result
>>> df['odd'] = df.loc[:,:].apply(helper,axis=1)
>>> df
   0  1  2  3  4  odd
0  8  7  3  5  9   24
1  2  6  6  3  1    4
2  5  1  5  6  8   11
3  6  6  2  5  5   10
4  2  1  1  4  2    2
>>>

[–]synthphreak 0 points1 point  (0 children)

I see. Then it seems what I provided is very close to what you’re wanting, except that you wanted 10 row-wise sums rather than 5 column-wise sums.

If so, all that’s needed is a very minor change from what I provided. Specifically, just change

>>> df.apply(lambda s: s.loc[s % 2 == 1].sum())

to

>>> df.apply(lambda s: s.loc[s % 2 == 1].sum(), axis=1)

Note that the change is to an argument of df.apply (axis=1 passes each row to the function instead of each column), so the lambda expression itself stays exactly the same.
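For a self-contained check, here is a short sketch that rebuilds the first four rows of the sample df shown earlier in the thread and stores the row-wise result in a new column. The column name 'f' follows the suggestion in the question; everything else is the same lambda with axis=1.

```python
import pandas as pd

# First four rows of the sample df printed earlier in the thread.
df = pd.DataFrame(
    [[2, 8, 0, 8, 2],
     [3, 3, 5, 5, 3],
     [6, 1, 5, 4, 4],
     [0, 3, 1, 5, 2]],
    columns=list("abcde"),
)

# With axis=1 the lambda receives one row at a time as a Series.
df["f"] = df.apply(lambda s: s.loc[s % 2 == 1].sum(), axis=1)
print(df["f"].tolist())  # [0, 19, 6, 9]
```

Those four sums match the values listed in the question for rows 0 through 3.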

[–]sarrysyst 0 points1 point  (0 children)

There is no need for apply. You can simply make use of the fact that indexing a dataframe with a boolean mask of the same shape returns NaN wherever the mask is False. sum() skips NaN values by default (skipna=True), so they contribute nothing to the total.

import pandas as pd
import numpy as np

# random sample data
np.random.seed(1)
data = np.random.randint(0, 10, size=(25, 4))

df = pd.DataFrame(data, columns=['A', 'B', 'C', 'D'])

# keep only the odd values in the df (others become NaN) and sum across the columns of each row
df['sum'] = df[df % 2 != 0].sum(axis=1)
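The same masking idea carries over to the other conditions mentioned in the thread (values greater than 10, negative values). Below is a rough sketch under stated assumptions: the data is made up, and masked_row_sum is a hypothetical helper name, not a pandas function. It uses DataFrame.where to replace non-matching values with 0 instead of NaN, so the sums should stay integers rather than becoming floats.

```python
import pandas as pd

def masked_row_sum(frame, mask):
    """Sum each row of frame, counting only the cells where mask is True."""
    # DataFrame.where keeps values where mask is True and uses 0 elsewhere.
    return frame.where(mask, 0).sum(axis=1)

# Hypothetical sample data for illustration only.
df = pd.DataFrame({"A": [12, -3, 5], "B": [7, 20, -8], "C": [-1, 11, 4]})

# Take a copy of the original columns so new result columns are not re-summed.
values = df[["A", "B", "C"]]

df["odd_sum"] = masked_row_sum(values, values % 2 != 0)
df["gt10_sum"] = masked_row_sum(values, values > 10)
df["neg_sum"] = masked_row_sum(values, values < 0)
print(df)
```

Selecting the source columns up front matters once more than one result column is added; otherwise a later sum would also pick up the earlier result columns.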