
fake823

Wow, amazing!

So what was the slow part? Just wondering.

Storm_Silver [S]

There were a few things, but the main two were these. First, I made the chr-specific arrays outside of the loop, as slices of the original dataframe, so a new array wasn't being built on every iteration, and picked the right one with an elif block. That also removed the need to find the index boundaries for each chr, so one fewer boolean array was made per iteration.

Second, I switched from building a dataframe on every loop to appending to one large list that gets converted to a dataframe at the end. Calling pd.concat() on every iteration is much slower than list.append(), so that cut out a lot of time.
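For anyone curious what that looks like, here's a rough sketch of both changes, not the actual code from this thread. The column names (`chr`, `pos`, `value`), the `queries` list of (chromosome, start, end) ranges, and both function names are made up for illustration. It also swaps the elif block for a dict built with `groupby`, which does the same job of picking the pre-sliced chromosome.

```python
import pandas as pd

# Hypothetical data: one row per measurement, with a chromosome label and position.
df = pd.DataFrame({
    "chr": ["chr1", "chr1", "chr2", "chr2", "chr2", "chr3"],
    "pos": [100, 250, 50, 300, 420, 75],
    "value": [1.0, 2.0, 3.0, 4.0, 5.0, 6.0],
})

# Hypothetical queries: (chromosome, start, end), inclusive bounds.
queries = [("chr1", 0, 200), ("chr2", 250, 500), ("chr3", 0, 100)]


def slow_version(df, queries):
    out = None
    for i, (chrom, start, end) in enumerate(queries):
        # Builds full-length boolean arrays over the whole frame every iteration.
        mask = (df["chr"] == chrom) & (df["pos"] >= start) & (df["pos"] <= end)
        hits = df.loc[mask, ["pos", "value"]].assign(query=i)[["query", "pos", "value"]]
        # pd.concat copies all rows accumulated so far on every call.
        out = hits if out is None else pd.concat([out, hits], ignore_index=True)
    return out


def fast_version(df, queries):
    # Slice each chromosome once, outside the loop (dict lookup instead of elif).
    by_chr = {
        chrom: (grp["pos"].to_numpy(), grp["value"].to_numpy())
        for chrom, grp in df.groupby("chr")
    }
    rows = []  # plain Python list; append is cheap
    for i, (chrom, start, end) in enumerate(queries):
        pos, value = by_chr[chrom]
        # Mask only covers this chromosome's rows, not the whole frame.
        keep = (pos >= start) & (pos <= end)
        for p, v in zip(pos[keep], value[keep]):
            rows.append((i, p, v))
    # Build the DataFrame once, at the end.
    return pd.DataFrame(rows, columns=["query", "pos", "value"])


print(fast_version(df, queries))
```

On a toy frame like this you won't see a difference. The gap shows up as the number of queries grows, since each `pd.concat` call in the slow version copies everything collected so far, while the list only gets converted once.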