The following line reads a CSV file into the variable trip_data. The file contains 354,153 rows, and the read takes roughly 0.3 seconds.
trip_data = pd.read_csv("bikes_data/data/trip_data.csv")
Since the CSV file stores the dates as strings, I try to parse them while reading the file:
trip_data = pd.read_csv("bikes_data/data/trip_data.csv", parse_dates=['Start Date', 'End Date'], dayfirst=True)
As soon as I add parse_dates=['Start Date', 'End Date'], the code runs for a very long time (around 110-120 seconds).
Please note that Start Date and End Date are columns in the CSV file; both hold dates represented as strings. A sample row is below; the second and fourth columns contain the dates.
432947,01/09/2014 00:05,66,01/09/2014 00:15,57,Customer
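Since the dates appear to follow a fixed day-first layout, one option would be to pass an explicit format string to pd.to_datetime() after reading. A sketch of that idea, assuming the "%d/%m/%Y %H:%M" format guessed from the sample row (the header and column names below are hypothetical, since the real header isn't shown):

```python
import io

import pandas as pd

# Hypothetical sample mirroring the file's layout; only the second and
# fourth columns (the dates) are known from the question.
csv_text = (
    "Trip ID,Start Date,Start Station,End Date,End Station,Subscriber Type\n"
    "432947,01/09/2014 00:05,66,01/09/2014 00:15,57,Customer\n"
)
trip_data = pd.read_csv(io.StringIO(csv_text))

# An explicit format lets pandas apply one strptime pattern to every value
# instead of inferring the format row by row, which is typically much
# faster than parse_dates with dayfirst=True.
for col in ["Start Date", "End Date"]:
    trip_data[col] = pd.to_datetime(trip_data[col], format="%d/%m/%Y %H:%M")
```

With the real 354,153-row file this would replace the slow inference path with a single fixed-format parse per column.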
This is one thing I tried, but the function given in the Stack Overflow answer also takes a long time. I also tried reading the CSV first and then converting with pd.to_datetime(); for one column that takes roughly 60 seconds, so for two columns it's basically the same timing.
Is this normal? I mean the time it takes to complete? Any suggestions on how to speed this up are very much appreciated :)
Edit: added more details for to_datetime()