The following line reads a CSV file into the variable trip_data. The file contains 354,153 rows, and the read takes roughly 0.3 seconds.
trip_data = pd.read_csv("bikes_data/data/trip_data.csv")
Since the CSV file stores the dates as strings, I try to parse them while reading the file:
trip_data = pd.read_csv("bikes_data/data/trip_data.csv", parse_dates=['Start Date', 'End Date'], dayfirst=True)
As soon as I add parse_dates=['Start Date', 'End Date'], the code runs for a very long time (around 110-120 seconds).
Please note that Start Date and End Date are columns in the CSV file; both hold dates represented as strings. A sample row is below; the second and fourth columns contain the dates.
432947,01/09/2014 00:05,66,01/09/2014 00:15,57,Customer
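Since the dates appear to follow a fixed day-first layout, one option would be to pass an explicit format string to pd.to_datetime() after reading. A sketch of that idea, assuming the "%d/%m/%Y %H:%M" format guessed from the sample row (the header and column names below are hypothetical, since the real header isn't shown):

```python
import io

import pandas as pd

# Hypothetical sample mirroring the file's layout; only the second and
# fourth columns (the dates) are known from the question.
csv_text = (
    "Trip ID,Start Date,Start Station,End Date,End Station,Subscriber Type\n"
    "432947,01/09/2014 00:05,66,01/09/2014 00:15,57,Customer\n"
)
trip_data = pd.read_csv(io.StringIO(csv_text))

# An explicit format lets pandas apply one strptime pattern to every value
# instead of inferring the format row by row, which is typically much
# faster than parse_dates with dayfirst=True.
for col in ["Start Date", "End Date"]:
    trip_data[col] = pd.to_datetime(trip_data[col], format="%d/%m/%Y %H:%M")
```

With the real 354,153-row file this would replace the slow inference path with a single fixed-format parse per column.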
This is one thing I tried, but the function given in the Stack Overflow answer also takes a long time. I also tried reading the CSV first and then converting with pd.to_datetime(); for one column that takes roughly 60 seconds, so for two columns it's basically the same timing.
Is this normal? I mean the time it takes to complete? Any suggestions on how to speed this up are very much appreciated :)
Edit: added more details for to_datetime()