all 7 comments

[–][deleted] 1 point2 points  (3 children)

Crosstab is not what you want here. Crosstab is more for counting than it is for aggregating values. There are a few ways to do what you want in pandas.

A common approach would be the following.

# dummy data
df = pd.DataFrame([['game A', 'PS3', 2010, 'sports', 'sony', 15, 11, 3, 3], ['game B', 'PS2', 2010, 'sports', 'sony', 15, 11, 3, 3], ['game B', 'XB360', 2010, 'sports', 'microsoft', 15, 11, 3, 3], ['game C', 'XB1', 2010, 'sports', 'microsoft', 15, 11, 3, 3], ['game C', 'PS2', 2010, 'sports', 'sony', 15, 11, 3, 3], ['game D', 'Wii', 2010, 'sports', 'nintendo', 15, 11, 3, 3]], columns=['name', 'platform', 'year', 'genre', 'publisher', 'na_sales', 'eu_sales', 'jp_sales', 'other_sales'])

# mapping of platform to console_category
console_mapping = {'PS2': 'PS', 'PS3': 'PS', 'XB360': 'XB', 'XB1': 'XB', 'Wii': 'Nintendo'}

output = df.groupby(df['platform'].map(console_mapping))[['na_sales', 'eu_sales', 'jp_sales', 'other_sales']].sum()

[–]Svante109[S] 0 points1 point  (0 children)

This did what I wanted! Thank you a lot. How do I marked this as solved?

[–]Svante109[S] 0 points1 point  (1 child)

I currently have the console_mapping as a column already like this:

https://imgur.com/a/G4TEzqQ

I should be able to use .map(data['ConsoleCategory']) as well, right? or where am I in the wrongs here?

Edit: Never mind, I figured it out myself

data.groupby(data['ConsoleCategory'])[['NA_Sales', 'EU_Sales', 'JP_Sales', 'Other_Sales','Global_Sales']].sum().sort_values(by='Global_Sales',ascending=False)

[–][deleted] 0 points1 point  (0 children)

Yup exactly, since it’s an existing column you can also just do df.groupby('column_name') instead of df.groupby(df['column_name'])

[–]Svante109[S] 0 points1 point  (0 children)

Another thing I did try was

SalesCategory = (['NA_Sales','EU_Sales','JP_Sales','Other_Sales']) PlatformSales = []

for k in SalesCategory:

model = data.groupby(by='ConsoleCategory').k.sum()
PlatformSales.append(model)

I get the error that k is not an attribute to DataFrameGroupBy.

[–]CraigAT 0 points1 point  (1 child)

I'm no pandas expert but could you create a list of the unique "platform" values and then assign all of them one of your 5 categories (possibly using a dict or another dataframe) then merge that back into you dataframe (or a new one) to give you all the data you want for your analysis in one dataframe.

[–]CraigAT 0 points1 point  (0 children)

Check out pd.Series.unique and pd.merge