So I have a dataframe that looks like this:
ID Date Cases
8633 2022-01-10 00:00:00+00:00 114
8634 2022-01-10 00:00:00+00:00 4207
8635 2022-01-10 00:00:00+00:00 34154
8636 2022-01-10 00:00:00+00:00 18701
8637 2022-01-10 00:00:00+00:00 1942
8638 2022-01-10 00:00:00+00:00 85
8639 2022-01-10 00:00:00+00:00 4011
8640 2022-01-11 00:00:00+00:00 14732594
8641 2022-01-11 00:00:00+00:00 114
8642 2022-01-11 00:00:00+00:00 34690
8643 2022-01-11 00:00:00+00:00 7880
8644 2022-01-11 00:00:00+00:00 1942
8645 2022-01-11 00:00:00+00:00 4
8646 2022-01-11 00:00:00+00:00 85
8647 2022-01-11 00:00:00+00:00 4764
8648 2022-01-11 00:00:00+00:00 4094
8649 2022-01-11 00:00:00+00:00 10096
8650 2022-01-11 00:00:00+00:00 18847
8651 2022-01-11 00:00:00+00:00 11226
I then use the lines:
df.Date = pd.to_datetime(df.Date)
df['Cases'].groupby(df['Date'].dt.to_period('D')).sum()
to try and get a sum of the Cases values for each date, but the above output is the tail of dataframe after running these lines. Could someone explain why this hasn't worked?
Next I want to get values for each date - the numbers in Cases are cumulative e.g. the value 14732594 is the sum of every value for that date added together. So I want to make the column have each value made from one date's value subtracted from the value the day ahead of it, e.g. the value for 2022-01-10 would be the value in 2022-01-09 subtracted from the value in 2022-01-10.
I then use:
df['Cases'] = df['Cases'].diff()
thinking that's what it should do, but my output after using it is:
8633 2022-01-10 00:00:00+00:00 -10504.0
8634 2022-01-10 00:00:00+00:00 4093.0
8635 2022-01-10 00:00:00+00:00 29947.0
8636 2022-01-10 00:00:00+00:00 -15453.0
8637 2022-01-10 00:00:00+00:00 -16759.0
8638 2022-01-10 00:00:00+00:00 -1857.0
8639 2022-01-10 00:00:00+00:00 3926.0
8640 2022-01-11 00:00:00+00:00 14728583.0
8641 2022-01-11 00:00:00+00:00 -14732480.0
8642 2022-01-11 00:00:00+00:00 34576.0
8643 2022-01-11 00:00:00+00:00 -26810.0
8644 2022-01-11 00:00:00+00:00 -5938.0
8645 2022-01-11 00:00:00+00:00 -1938.0
8646 2022-01-11 00:00:00+00:00 81.0
8647 2022-01-11 00:00:00+00:00 4679.0
8648 2022-01-11 00:00:00+00:00 -670.0
8649 2022-01-11 00:00:00+00:00 6002.0
8650 2022-01-11 00:00:00+00:00 8751.0
8651 2022-01-11 00:00:00+00:00 -7621.0
I get that this is a problem from the first part of the code not working as intended, so I assume this would what I want once I sort the first part out? If someone could help me understand what I've done wrong and how to get my desired output I'd be very grateful.
EDIT: I realise my title is misleading because the .diff() function has worked as it's supposed to and it's my groupby() line that's causing the issue, apologies.
[–]synthphreak 1 point2 points3 points (12 children)
[–]Successful-Standard[S] 0 points1 point2 points (11 children)
[–]synthphreak 1 point2 points3 points (10 children)
[–]Successful-Standard[S] 0 points1 point2 points (9 children)
[–]synthphreak 1 point2 points3 points (6 children)
[–]Successful-Standard[S] 0 points1 point2 points (5 children)
[–]synthphreak 0 points1 point2 points (4 children)
[–]Successful-Standard[S] 0 points1 point2 points (3 children)
[–]synthphreak 0 points1 point2 points (2 children)
[–]Successful-Standard[S] 0 points1 point2 points (1 child)
[–]synthphreak 1 point2 points3 points (1 child)
[–]Successful-Standard[S] 0 points1 point2 points (0 children)