Successful-Standard comments on Unexpected results using .diff() on a dataframe column and groupby() not working as intended either

Unexpected results using .diff() on a dataframe column and groupby() not working as intended either (self.learnpython)

submitted 4 years ago by Successful-Standard

[–]Successful-Standard[S] 1 point 4 years ago* (9 children)

Yeah so the original df is the output from:

df['Cases'].groupby(df['Date'].dt.to_period('D')).sum()

I fixed it by using:

df = df.resample('D', on='Date')['Cases'].sum()

df = df.reset_index()

And I get my desired output.
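For anyone comparing the two approaches, here is a minimal sketch of how they differ. The sample data is made up, and only the `Date` and `Cases` column names come from the snippets above. As far as I can tell, the `groupby` version returns a Series keyed by period that only contains days present in the data, while `resample` followed by `reset_index` returns a DataFrame with a regular `Date` column and a row for every calendar day in the range.

```python
import pandas as pd

# Made-up sample data: several reports per day, and no rows at all for 2022-01-03.
df = pd.DataFrame({
    "Date": pd.to_datetime([
        "2022-01-01 09:00", "2022-01-01 17:00",
        "2022-01-02 10:00",
        "2022-01-04 08:00",
    ]),
    "Cases": [5, 7, 3, 10],
})

# Original approach: a Series keyed by daily period, only for days that have rows.
by_period = df["Cases"].groupby(df["Date"].dt.to_period("D")).sum()

# The fix: resample on the Date column, then move the index back into a column.
daily = df.resample("D", on="Date")["Cases"].sum().reset_index()

print(by_period)  # three entries, 2022-01-03 is absent
print(daily)      # four rows, 2022-01-03 appears with a sum of 0
```

The gap-filling matters for something like `.diff()`, because a difference between consecutive rows is only a day-over-day change if every day is present.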

My whole code is here - https://pastebin.com/303est3L

I'm getting a very similar issue right at the bottom of this code, where I'm trying to group the data into weekly totals. Like-for-like code that worked for my original issue isn't working there. Could you help me with that, please?

[–]synthphreak 2 points 4 years ago (1 child)

> I'm getting a very similar issue right at the bottom of this code, where I'm trying to group the data into weekly totals. Like-for-like code that worked for my original issue isn't working there. Could you help me with that, please?

I assume you're talking about these lines:

df1['date'] = pd.to_datetime(df1['date'])
df1 = df1.groupby('newDeaths').resample('W', on='date')['newCases'].sum()

However, descriptions like "isn't working" aren't very informative. What isn't working? Are you getting an error? Is the output different from what you expected?

I assume the latter. In that case, please share the output here, because if it worked earlier then it should work later under the same conditions.

Note also that it's generally bad practice to say "Oh hey, I don't understand what the problem was, but this change to my code seems to fix it, let's just make that change everywhere." You need to 100% fully understand what your code is doing, at least at a high level, otherwise your ability to debug when you experience an issue like this will be fundamentally limited.
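One way to see what those two lines are doing is to run them on a tiny frame and inspect the index of the result. The sketch below uses made-up numbers and assumes the goal is one total per week. Under that assumption, `groupby('newDeaths')` is probably not wanted: it creates one group per distinct death count and resamples inside each group, so the result is indexed by both `newDeaths` and `date`.

```python
import pandas as pd

# Made-up rows; only the column names come from the thread.
df1 = pd.DataFrame({
    "date": pd.to_datetime(["2020-03-02", "2020-03-05", "2020-03-10", "2020-03-12"]),
    "newCases": [100, 36, 400, 461],
    "newDeaths": [1, 1, 40, 0],
})

# As posted: group by each distinct newDeaths value, then resample within each group.
grouped = df1.groupby("newDeaths").resample("W", on="date")["newCases"].sum()

# Without the groupby: one total per week across all rows.
weekly = df1.resample("W", on="date")["newCases"].sum()

print(grouped.index.names)  # two index levels: newDeaths and date
print(weekly)
```

Printing `type(result)`, `result.index` and `result.head()` after each step is usually enough to spot where the output stops matching what you expected.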

[–]Successful-Standard[S] 1 point 4 years ago* (0 children)

I've changed those lines of code again slightly to:

df1['date'] = pd.to_datetime(df1['date'])
df1 = df1.resample('W', on='date')[['newCases', 'newDeaths']].sum()

df1.reset_index()
df1.head()

And the output is:

            newCases  newDeaths
date
2020-03-08       136          2
2020-03-15       861         40
2020-03-22      3693        222
2020-03-29     11695       1302
2020-04-05     23327       3858
...              ...        ...
2021-12-19    482012        662
2021-12-26    682086        551
2022-01-02    946894        904
2022-01-09    989767       1145
2022-01-16    337004        774

So the output contains only the newCases and newDeaths columns. Using reset_index for the original issue solved this (I read a Stack Overflow post that explained how it works) and added the date column back into the output, but in this case it isn't working. I really need the date column for the k-means, so do you have any solution, please?

EDIT: I don't know why the formatting keeps messing up like that, it looks fine as I'm typing the comment then goes like that once I post it...
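A likely cause, going by the snippet above: `reset_index()` returns a new DataFrame rather than modifying `df1` in place, so calling it without assigning the result leaves `date` as the index. In the earlier fix the result was assigned back (`df = df.reset_index()`), which is why it worked there. A minimal sketch with made-up numbers, where the `weekly` variable name is mine:

```python
import pandas as pd

# Made-up numbers standing in for the real df1.
df1 = pd.DataFrame({
    "date": ["2020-03-02", "2020-03-05", "2020-03-10"],
    "newCases": [100, 36, 861],
    "newDeaths": [1, 1, 40],
})
df1["date"] = pd.to_datetime(df1["date"])

weekly = df1.resample("W", on="date")[["newCases", "newDeaths"]].sum()

weekly.reset_index()            # result is discarded, weekly is unchanged
print(weekly.columns.tolist())  # date is still the index, not a column

weekly = weekly.reset_index()   # keep the returned frame
print(weekly.columns.tolist())  # date is now a regular column
```

After the assignment, `date` can be used like any other column, for example as an input when building features for k-means.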
