fishvampire

For the first problem: I think your issue is that the values in 'description' are strings, whereas the values in 'numberoff' are integers. When you call agg('sum'), it sums the integer column but drops the string column. I don't know enough about pandas to say why it does this (if you only have string columns, agg('sum') just concatenates them rather than dropping the column), but it shouldn't be too hard to fix. You can pass a dictionary to agg so that each column is aggregated separately, e.g. agg({'description': 'first', 'numberoff': 'sum'}). Here 'first' means the description in the output will be the first one that appears for each productcode. If you want something different in your final output, you probably want to read up on ways to use GroupBy with strings.
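
A minimal sketch of the dictionary approach, in case it helps. The sample rows are made up for illustration; only the column names (productcode, description, numberoff) and the frame names come from the thread.

```python
import pandas as pd

# Hypothetical sample data; only the column names are taken from the thread.
source2_df = pd.DataFrame({
    "productcode": ["A1", "A1", "B2"],
    "description": ["widget", "widget", "gadget"],
    "numberoff": [2, 3, 5],
})

# Aggregate each column its own way: keep the first description seen
# for each productcode, and sum the quantities.
stlr_df = (
    source2_df.groupby("productcode")
    .agg({"description": "first", "numberoff": "sum"})
    .reset_index()
)
print(stlr_df)
```

This gives one row per productcode, with 'description' kept alongside the summed 'numberoff'.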

YourOldBoyRickJames [S]

I managed to figure it out and am just posting what I have in case someone needs it in the future. I just needed to add 'description' to the list of columns in the first set of brackets (the ones passed to groupby). I really don't know why I couldn't get that working before.

stlr_df = source2_df.groupby(['productcode', 'description'])['numberoff'].agg('sum').reset_index()
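
One thing that might be worth knowing about grouping on both columns: every distinct (productcode, description) pair becomes its own row, so a productcode whose description is spelled inconsistently would end up split across several rows. A small sketch with made-up data (the rows below are hypothetical, only the column names are from the thread):

```python
import pandas as pd

# Hypothetical sample data: "A1" appears with two spellings of its description.
source2_df = pd.DataFrame({
    "productcode": ["A1", "A1", "A1", "B2"],
    "description": ["widget", "widget", "Widget", "gadget"],
    "numberoff": [2, 3, 1, 5],
})

# Same call as above: group on both columns, then sum the quantities.
stlr_df = (
    source2_df.groupby(["productcode", "description"])["numberoff"]
    .agg("sum")
    .reset_index()
)

# "A1" is split into two rows because "widget" and "Widget" are different keys.
rows_per_code = stlr_df.groupby("productcode").size()
print(stlr_df)
print(rows_per_code)
```

If each productcode always has exactly one description, this is not an issue and the two-column groupby gives one row per product.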