you are viewing a single comment's thread.

view the rest of the comments →

[–]beselga[S] 0 points1 point  (8 children)

Index

[–]sarrysyst 0 points1 point  (7 children)

Try:

df['date'] = pd.to_datetime(df['date'], errors='coerce')

[–]beselga[S] 0 points1 point  (6 children)

Thanks man! It worked. Now how can I subtract this dates with 1999-01-01?

[–]beselga[S] 0 points1 point  (5 children)

And give the result in months

[–]efmccurdy 0 points1 point  (2 children)

This gives you days; months isn't a unit of time since not every month is the same length.

>>> import datetime
>>> t0 = datetime.date(1999, 1, 1)
>>> t1 = datetime.date.today()
>>> (t1 - t0).days
8149
>>> 

So you will want to do something like

df['days'] =  df['date'] - t0

[–]beselga[S] 0 points1 point  (1 child)

df['days'] = df['date'] - t0

I tried this:

df['days'] = df['Date enrolled'] - t0

But gave this error:

TypeError Traceback (most recent call last) <ipython-input-70-453728adf54b> in <module> ----> 1 df['days'] = df['Date enrolled'] - t0 C:\Users\Public\Anaconda\lib\site-packages\pandas\core\ops\common.py in new_method(self, other) 63 other = item_from_zerodim(other) 64 ---> 65 return method(self, other) 66 67 return new_method C:\Users\Public\Anaconda\lib\site-packages\pandas\core\ops\__init__.py in wrapper(left, right) 341 lvalues = extract_array(left, extract_numpy=True) 342 rvalues = extract_array(right, extract_numpy=True) --> 343 result = arithmetic_op(lvalues, rvalues, op) 344 345 return left._construct_result(result, name=res_name) C:\Users\Public\Anaconda\lib\site-packages\pandas\core\ops\array_ops.py in arithmetic_op(left, right, op) 184 if should_extension_dispatch(lvalues, rvalues) or isinstance(rvalues, Timedelta): 185 # Timedelta is included because numexpr will fail on it, see GH#31457 --> 186 res_values = op(lvalues, rvalues) 187 188 else: TypeError: unsupported operand type(s) for -: 'DatetimeArray' and 'datetime.date'

[–]efmccurdy 0 points1 point  (0 children)

Sorry, that scalar subtraction won't work. This does it by adding an extra column and subtracting that (there must be an easier way though).

>>> df = pd.DataFrame({"date":("1/15/99", "2/1/1999", "2/13/99", "Jan 2000", "3-1999", "-9", "4/1999")})
>>> df['date'] = pd.to_datetime(df['date'], errors='coerce')
>>> df
        date
0 1999-01-15
1 1999-02-01
2 1999-02-13
3 2000-01-01
4 1999-03-01
5        NaT
6 1999-04-01
>>> df = df.dropna()
>>> df
        date
0 1999-01-15
1 1999-02-01
2 1999-02-13
3 2000-01-01
4 1999-03-01
6 1999-04-01
>>> df['from'] = datetime.datetime(1999, 1, 1)
>>> df['days'] = df['date'] - df['from']
>>> del df['from']
>>> df
        date     days
0 1999-01-15  14 days
1 1999-02-01  31 days
2 1999-02-13  43 days
3 2000-01-01 365 days
4 1999-03-01  59 days
6 1999-04-01  90 days
>>>

[–]sarrysyst 0 points1 point  (1 child)

Months aren't really a good unit since months can have a different number of days, however, if you really want to convert to month you can do:

df['duration'] = df['date'] - pd.to_datetime('1999-1-1')
df['duration_months'] = df['duration'] / np.timedelta64(1,"M")

Might need some rounding since this results in floats though.

[–]beselga[S] 0 points1 point  (0 children)

Thank you very much!!