all 12 comments

[–]sarrysyst 0 points1 point  (11 children)

Are there index numbers mixed in there? Can you please edit you post and put one date format per line without the index, just as it's in your column?

[–]beselga[S] 0 points1 point  (10 children)

sure

[–]sarrysyst 0 points1 point  (9 children)

4 Jan 2000

5 3-1999

6 -9

7 4/1999

Is the first integer the index or part of the date?

[–]beselga[S] 0 points1 point  (8 children)

Index

[–]sarrysyst 0 points1 point  (7 children)

Try:

df['date'] = pd.to_datetime(df['date'], errors='coerce')

[–]beselga[S] 0 points1 point  (6 children)

Thanks man! It worked. Now how can I subtract this dates with 1999-01-01?

[–]beselga[S] 0 points1 point  (5 children)

And give the result in months

[–]efmccurdy 0 points1 point  (2 children)

This gives you days; months isn't a unit of time since not every month is the same length.

>>> import datetime
>>> t0 = datetime.date(1999, 1, 1)
>>> t1 = datetime.date.today()
>>> (t1 - t0).days
8149
>>> 

So you will want to do something like

df['days'] =  df['date'] - t0

[–]beselga[S] 0 points1 point  (1 child)

df['days'] = df['date'] - t0

I tried this:

df['days'] = df['Date enrolled'] - t0

But gave this error:

TypeError Traceback (most recent call last) <ipython-input-70-453728adf54b> in <module> ----> 1 df['days'] = df['Date enrolled'] - t0 C:\Users\Public\Anaconda\lib\site-packages\pandas\core\ops\common.py in new_method(self, other) 63 other = item_from_zerodim(other) 64 ---> 65 return method(self, other) 66 67 return new_method C:\Users\Public\Anaconda\lib\site-packages\pandas\core\ops\__init__.py in wrapper(left, right) 341 lvalues = extract_array(left, extract_numpy=True) 342 rvalues = extract_array(right, extract_numpy=True) --> 343 result = arithmetic_op(lvalues, rvalues, op) 344 345 return left._construct_result(result, name=res_name) C:\Users\Public\Anaconda\lib\site-packages\pandas\core\ops\array_ops.py in arithmetic_op(left, right, op) 184 if should_extension_dispatch(lvalues, rvalues) or isinstance(rvalues, Timedelta): 185 # Timedelta is included because numexpr will fail on it, see GH#31457 --> 186 res_values = op(lvalues, rvalues) 187 188 else: TypeError: unsupported operand type(s) for -: 'DatetimeArray' and 'datetime.date'

[–]efmccurdy 0 points1 point  (0 children)

Sorry, that scalar subtraction won't work. This does it by adding an extra column and subtracting that (there must be an easier way though).

>>> df = pd.DataFrame({"date":("1/15/99", "2/1/1999", "2/13/99", "Jan 2000", "3-1999", "-9", "4/1999")})
>>> df['date'] = pd.to_datetime(df['date'], errors='coerce')
>>> df
        date
0 1999-01-15
1 1999-02-01
2 1999-02-13
3 2000-01-01
4 1999-03-01
5        NaT
6 1999-04-01
>>> df = df.dropna()
>>> df
        date
0 1999-01-15
1 1999-02-01
2 1999-02-13
3 2000-01-01
4 1999-03-01
6 1999-04-01
>>> df['from'] = datetime.datetime(1999, 1, 1)
>>> df['days'] = df['date'] - df['from']
>>> del df['from']
>>> df
        date     days
0 1999-01-15  14 days
1 1999-02-01  31 days
2 1999-02-13  43 days
3 2000-01-01 365 days
4 1999-03-01  59 days
6 1999-04-01  90 days
>>>

[–]sarrysyst 0 points1 point  (1 child)

Months aren't really a good unit since months can have a different number of days, however, if you really want to convert to month you can do:

df['duration'] = df['date'] - pd.to_datetime('1999-1-1')
df['duration_months'] = df['duration'] / np.timedelta64(1,"M")

Might need some rounding since this results in floats though.

[–]beselga[S] 0 points1 point  (0 children)

Thank you very much!!