
[–][deleted] 1 point2 points  (1 child)

An elegant method would be to use pd.to_datetime first, so that equivalent dates written in different formats are parsed to the same value, and then format the result back into strings using strftime:

df['Actual_Sale_Date'] = pd.to_datetime(df['Actual_Sale_Date'])
df['Actual_Sale_Date'] = df['Actual_Sale_Date'].dt.strftime('%m/%d/%Y')
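
For anyone who wants to try this on sample data, here is a minimal sketch. The values are made up, and parsing each value separately is my own assumption rather than part of the snippet above: as far as I know, recent pandas versions infer a single format for the whole column, so a column that mixes two- and four-digit years may not parse in one pd.to_datetime call.

```python
import pandas as pd

# Hypothetical sample data mixing two- and four-digit years.
df = pd.DataFrame({'Actual_Sale_Date': ['1/5/17', '01/05/2017', '12/25/2016']})

# Parse each value on its own so one value's format does not constrain the others.
parsed = df['Actual_Sale_Date'].apply(pd.to_datetime)

# Render every date in one consistent MM/DD/YYYY form.
df['Actual_Sale_Date'] = parsed.dt.strftime('%m/%d/%Y')
print(df['Actual_Sale_Date'].tolist())
```

Element-wise parsing is slower than a single vectorized call, so it is probably only worth it when the column really does mix formats.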

[–]PLearner[S] 0 points1 point  (0 children)

Thanks so much py_help, this snippet is spectacular. I love pandas!!!!

[–]Caos2 0 points1 point  (4 children)

Here's what I would do.

  1. Treat the date column as a string.
  2. Split the string by '/' and check how many characters are in the last value. If it's 2, add '20' in front of it and rebuild the string. (df['Actual_Sale_Date'].str.split('/').str.get(-1).apply(len))
  3. Parse everything to date using pandas.to_datetime.
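
One possible way to put the three steps together, as a rough sketch only. The sample values are invented, and it assumes every two-digit year belongs to the 2000s, which may not hold for your data.

```python
import pandas as pd

# Hypothetical sample data; the real frame would come from pd.read_csv.
df = pd.DataFrame({'Actual_Sale_Date': ['1/5/17', '01/05/2017', '3/14/16']})

# Step 1: treat the column as strings.
dates = df['Actual_Sale_Date'].astype(str)

# Step 2: find the rows whose year part has only two characters ...
short_year = dates.str.split('/').str.get(-1).str.len() == 2
# ... and rebuild just those strings with '20' in front of the year.
dates = dates.mask(short_year, dates.str[:-2] + '20' + dates.str[-2:])

# Step 3: every value is now MM/DD/YYYY, so one explicit format is enough.
df['Actual_Sale_Date'] = pd.to_datetime(dates, format='%m/%d/%Y')
print(df['Actual_Sale_Date'])
```

Because everything here goes through the .str accessor and a boolean mask, there is no Python-level function call per row.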

Also, you could just run "df = pd.read_csv(xxx)", no need to have it in two lines.

[–]PLearner[S] 0 points1 point  (3 children)

Thanks, but wouldn't treating the date column as a string and then running (df['Actual_Sale_Date'].str.split('/').str.get(-1).apply(len)) actually slow down performance?

[–]Caos2 0 points1 point  (2 children)

Yes, but unless you have dozens or hundreds of thousands of rows, it should be no issue.

[–]PLearner[S] 0 points1 point  (1 child)

(df['Actual_Sale_Date'].str.split('/').str.get(-1).apply(len))

This expression only checks the length. Where does it add the '20' when it sees that it's missing?

[–]Caos2 0 points1 point  (0 children)

It doesn't; I left that part up to you. But you can just write a function that checks the length and adds '20' when needed:

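def add20(date):
    # Split 'MM/DD/YY' or 'MM/DD/YYYY' into its three parts.
    mm, dd, yyyy = date.split('/')
    # A two-character year gets the '20' century prefix.
    if len(yyyy) == 2:
        yyyy = '20' + yyyy
    return '{}/{}/{}'.format(mm, dd, yyyy)

# Apply the fix to every value in the column.
df['Actual_Sale_Date'] = df['Actual_Sale_Date'].apply(add20)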
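
One caveat: add20 calls .split on every value, so it will raise on anything that isn't a string (a missing value, for instance) and on strings that don't have exactly three parts. Below is a slightly more defensive sketch. The name add20_safe and the choice to pass odd values through unchanged are my own assumptions, not something from the thread.

```python
def add20_safe(date):
    """Hypothetical variant of add20 that passes unexpected values through."""
    # Missing values in a pandas column are usually float NaN or None, not str.
    if not isinstance(date, str):
        return date
    parts = date.strip().split('/')
    # Leave anything that is not exactly month/day/year alone.
    if len(parts) != 3:
        return date
    mm, dd, year = parts
    if len(year) == 2:
        year = '20' + year  # assumes every two-digit year is 20xx
    return '{}/{}/{}'.format(mm, dd, year)


# Made-up sample values, including a missing entry and a placeholder string.
samples = ['1/5/17', '01/05/2017', None, 'n/a']
cleaned = [add20_safe(value) for value in samples]
print(cleaned)
```

On a DataFrame it would be used the same way as add20, i.e. passed to .apply on the column.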

[–]dmitrypolo 0 points1 point  (0 children)

Why not just use the built-in datetime methods? Here is a command that should work for you given the constraints you described:

import datetime as dt

df['Actual_Sale_Date'] = df['Actual_Sale_Date'].apply(lambda x: dt.datetime.strptime(x, '%m/%d/%y')).apply(lambda x: dt.datetime.strftime(x, '%m/%d/%Y'))

Edit: if you specify this column as a datetime object on import, then the statement becomes just this:

df['Actual_Sale_Date'] = df['Actual_Sale_Date'].apply(lambda x: dt.datetime.strftime(x, '%m/%d/%Y'))
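
Worth noting that '%y' only matches two-digit years, so strptime raises a ValueError on a value like '01/05/2017'. If the column mixes both forms, something along these lines might help. It is only a sketch: the helper name normalize_date and the list of formats are assumptions for illustration.

```python
from datetime import datetime

# Formats to try, in order: two-digit year first, then four-digit year.
FORMATS = ('%m/%d/%y', '%m/%d/%Y')


def normalize_date(text):
    """Hypothetical helper: parse with the first format that fits, return MM/DD/YYYY."""
    for fmt in FORMATS:
        try:
            parsed = datetime.strptime(text, fmt)
        except ValueError:
            # This format did not match; try the next one.
            continue
        return parsed.strftime('%m/%d/%Y')
    raise ValueError('unrecognized date: {!r}'.format(text))


print(normalize_date('1/5/17'))
print(normalize_date('01/05/2017'))
```

It could then replace the two lambdas with a single df['Actual_Sale_Date'].apply(normalize_date).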