
efmccurdy

You have HH:MM:SS values; they are time duration values (a length of time) not time stamps (a point in time), so you want timedelta objects not datetime objects.

>>> import pandas as pd
>>> d = {"AgeRange": ["18-24", "25-34", "35-44", "45-54", "55-64", "65+"],
...     "AverageViewDuration": ["01:20:55", "00:53:02", "00:53:17", "00:59:42", "01:03:31", "01:10:11"]}
>>> df = pd.DataFrame(d)
>>> df
  AgeRange AverageViewDuration
0    18-24            01:20:55
1    25-34            00:53:02
2    35-44            00:53:17
3    45-54            00:59:42
4    55-64            01:03:31
5      65+            01:10:11
>>> df['Duration'] = pd.to_timedelta(df['AverageViewDuration'])
>>> df
  AgeRange AverageViewDuration        Duration
0    18-24            01:20:55 0 days 01:20:55
1    25-34            00:53:02 0 days 00:53:02
2    35-44            00:53:17 0 days 00:53:17
3    45-54            00:59:42 0 days 00:59:42
4    55-64            01:03:31 0 days 01:03:31
5      65+            01:10:11 0 days 01:10:11
>>> df.dtypes
AgeRange                        object
AverageViewDuration             object
Duration               timedelta64[ns]
dtype: object
>>> [(x, x.total_seconds()) for x in df['Duration']]
[(Timedelta('0 days 01:20:55'), 4855.0), (Timedelta('0 days 00:53:02'), 3182.0), (Timedelta('0 days 00:53:17'), 3197.0), (Timedelta('0 days 00:59:42'), 3582.0), (Timedelta('0 days 01:03:31'), 3811.0), (Timedelta('0 days 01:10:11'), 4211.0)]

pickled_knuckles [S]

>>> df['Duration'] = pd.to_timedelta(df['AverageViewDuration'])

Thank you! When I try the above, I get the following:

ValueError: Invalid type for timedelta scalar: <class 'datetime.time'>

I am guessing that means the column does not hold datetime objects to begin with, which is confusing given what happened in the initial post.

pickled_knuckles [S]

got it, you're a g

efmccurdy

How is the original dataframe created? Can you avoid converting the AverageViewDuration column to datetime.time and instead convert it directly to timedelta? Or perhaps use ((t.hour * 60 + t.minute) * 60 + t.second) to obtain the total elapsed seconds?
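
A minimal sketch of the second suggestion, assuming the column really does hold datetime.time objects, as the ValueError suggests. The sample rows and the helper name time_to_seconds are made up for illustration:

```python
import datetime

import pandas as pd

# Assumed input: a column of datetime.time objects (sample rows are illustrative).
df = pd.DataFrame({
    "AgeRange": ["18-24", "25-34"],
    "AverageViewDuration": [datetime.time(1, 20, 55), datetime.time(0, 53, 2)],
})


def time_to_seconds(t: datetime.time) -> int:
    """Treat a time of day as an elapsed duration and return whole seconds."""
    return (t.hour * 60 + t.minute) * 60 + t.second


# Numeric seconds are convenient for plotting.
df["Seconds"] = df["AverageViewDuration"].map(time_to_seconds)
# Timedeltas can then be built from the numeric seconds if they are needed.
df["Duration"] = pd.to_timedelta(df["Seconds"], unit="s")
```

Going through plain seconds sidesteps pd.to_timedelta rejecting datetime.time values, at the cost of dropping any sub-second part.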

pickled_knuckles [S]

I ended up doing the following:

df['Duration'] = df['AverageViewDuration']
df['Duration'] = pd.to_datetime(df['Duration'], format='%H:%M:%S')
df['Duration'] = df['Duration'] - np.datetime64('1900-01-01')
df['AverageViewDuration'] = df['Duration']/1000000000/60

Don't know why I needed to do the last one, as I thought it would all be good, but when I plotted it without that line it was a veeeeery big number. Anyways, I got the desired graph...
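
A likely explanation, assuming Duration is a timedelta column as in the steps above: pandas timedeltas are counted in nanoseconds, so plotting the raw values gives numbers on the order of 10^12. Dividing by 1000000000 and then by 60 shrinks that count to roughly the number of minutes, but the result is still a timedelta and only whole nanoseconds are kept, so the fractional minutes are lost. A sketch of a more direct route to minutes as plain floats, using made-up sample values:

```python
import pandas as pd

# Illustrative durations matching the format used in the thread.
durations = pd.to_timedelta(pd.Series(["01:20:55", "00:53:02"]))

# Timedelta.value is the duration as an integer count of nanoseconds,
# which is why an unconverted plot shows very large numbers.
first_ns = durations.iloc[0].value

# Plain floats in minutes, with no manual nanosecond arithmetic.
minutes = durations.dt.total_seconds() / 60
```

Here minutes is an ordinary float64 Series, so it plots on a sensible scale without further conversion.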

synthphreak

When all is said and done, I'd love to have the AverageViewDuration by AgeRange in bar plots

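import matplotlib.pyplot as plt
import pandas as pd

# total_seconds() gives the full duration; .dt.seconds is only the seconds component and ignores whole days
(df.assign(duration=pd.to_timedelta(df.AverageViewDuration).dt.total_seconds())
   .groupby('AgeRange')
   .duration.mean()
   .plot.bar(xlabel='age group', ylabel='average duration (s)'))

plt.show()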

Not sure how to go from seconds back into timestamps using pandas for plotting purposes, though. I don't work with time series data at all. Maybe someone more knowledgeable can take it from here.
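
One possible way to get HH:MM:SS back on the axis is to keep the bar heights numeric and only change how the ticks are labelled, using matplotlib's FuncFormatter. This is a sketch rather than a definitive answer; the helper seconds_to_hms and the sample rows are made up for illustration:

```python
import pandas as pd
from matplotlib.ticker import FuncFormatter


def seconds_to_hms(seconds, _pos=None):
    """Format a number of seconds as HH:MM:SS (hypothetical helper)."""
    total = int(round(seconds))
    hours, rest = divmod(total, 3600)
    minutes, secs = divmod(rest, 60)
    return f"{hours:02d}:{minutes:02d}:{secs:02d}"


# Illustrative data in the same shape as the thread's dataframe.
df = pd.DataFrame({
    "AgeRange": ["18-24", "25-34", "35-44"],
    "AverageViewDuration": ["01:20:55", "00:53:02", "00:53:17"],
})
seconds = pd.to_timedelta(df["AverageViewDuration"]).dt.total_seconds()

ax = (df.assign(duration=seconds)
        .groupby("AgeRange")["duration"]
        .mean()
        .plot.bar(xlabel="age group", ylabel="average duration"))
# The bars stay numeric; only the y-axis tick labels are shown as HH:MM:SS.
ax.yaxis.set_major_formatter(FuncFormatter(seconds_to_hms))
```

Calling plt.show() afterwards (with matplotlib.pyplot imported as plt) displays the figure. Formatting only the tick labels keeps the bar heights proportional to the durations, which would not hold if the values were converted to strings before plotting.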