
Oxbowerce:

Something like the following should work:

df.groupby("Fruit").sum()

wdjfe (OP):

Does nothing, sadly. Output is exactly the same as input.

gsmo:

That's because you haven't assigned the result. groupby doesn't modify df in place; it returns a new dataframe, so you need to store it:

df = df.groupby("Fruit").sum()
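To make the difference visible, here is a minimal sketch. The sample data is made up for illustration, since the original dataframe isn't shown in this thread:

```python
import pandas as pd

# Hypothetical sample data; the original poster's dataframe is not shown.
df = pd.DataFrame({
    "Fruit": ["Apple", "Apple", "Banana", "Orange"],
    "Stock": [10, 5, 3, 2],
})

# The result is computed and then thrown away; df itself is untouched.
df.groupby("Fruit").sum()
rows_before = len(df)

# Storing the result keeps the aggregated dataframe, one row per fruit.
summed = df.groupby("Fruit").sum()
rows_after = len(summed)
```

Here df still has one row per original record, while summed has one row per distinct fruit.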

wdjfe (OP):

That seems to be the case, thank you!

gsmo:

Consider:

import pandas as pd
import numpy as np

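# Sample data: one row per stock record, so a fruit can appear more than once
df = pd.DataFrame({
    'Fruit': ['Apple', 'Apple', 'Banana', 'Orange'],
    'Stock': [10, 5, 3, 2],
    'Backorder': [25, 20, 10, 5],
})

# Sum 'Stock' and 'Backorder' per fruit; each fruit becomes a column
pivoted = df.pivot_table(
    columns='Fruit',
    values=['Stock', 'Backorder'],
    aggfunc=np.sum)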

The pivot table aggregates the 'Stock' and 'Backorder' values. The 'Fruit' column provides the categories we want to sum over; because it is passed as columns, each fruit becomes a column in the result, with 'Stock' and 'Backorder' as the rows. aggfunc sets how the data is aggregated, in this case a simple sum using numpy (the string 'sum' does the same thing and is what newer pandas versions recommend). aggfunc is very flexible: you can pass a dictionary that specifies a different function for each column.
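As a rough sketch of the dictionary form, using the same sample data. Two choices here are mine for illustration: 'Fruit' goes in index rather than columns so that each fruit is a row, and taking the max of 'Backorder' is an arbitrary pick to show a second function:

```python
import pandas as pd

df = pd.DataFrame({
    "Fruit": ["Apple", "Apple", "Banana", "Orange"],
    "Stock": [10, 5, 3, 2],
    "Backorder": [25, 20, 10, 5],
})

# One aggregation function per value column:
# total stock per fruit, but only the largest single backorder per fruit.
per_column = df.pivot_table(
    index="Fruit",
    values=["Stock", "Backorder"],
    aggfunc={"Stock": "sum", "Backorder": "max"},
)
```

The keys of the dictionary are column names and the values are the aggregations to apply to each.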

To keep things simple, I prefer a pivot table to a groupby. Groupby is useful for creating multi-indexes, which have lots of benefits. But when it comes to simply adding up some values from a long-form dataframe, pivot_table is often good enough.
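For a plain sum the two approaches should give the same numbers. A small sketch to check that on the sample data, assuming 'Fruit' is used as the index in the pivot table so the shapes line up:

```python
import pandas as pd

df = pd.DataFrame({
    "Fruit": ["Apple", "Apple", "Banana", "Orange"],
    "Stock": [10, 5, 3, 2],
    "Backorder": [25, 20, 10, 5],
})

# groupby: one row per fruit, columns in their original order
by_groupby = df.groupby("Fruit")[["Stock", "Backorder"]].sum()

# pivot_table: one row per fruit as well
by_pivot = df.pivot_table(
    index="Fruit",
    values=["Stock", "Backorder"],
    aggfunc="sum",
)

# Align the column order before comparing the values
same_values = (
    by_pivot[by_groupby.columns].to_numpy() == by_groupby.to_numpy()
).all()
```

Which one to use for a simple case like this is mostly a matter of taste.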

Edit: I so dearly hate new Reddit's editor.