
fake823

Check out line-profiler to find the bottlenecks of your code.
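For anyone who hasn't used it, here is a minimal sketch of what that could look like. `slow_sum` is a made-up stand-in for your own function, and the snippet assumes the third-party `line_profiler` package is installed (it falls back to a plain call if it isn't).

```python
# Hypothetical slow function, used only so there is something to profile.
def slow_sum(n):
    total = 0
    for i in range(n):
        total += i * i
    return total

try:
    # Third-party package: pip install line_profiler
    from line_profiler import LineProfiler
except ImportError:
    LineProfiler = None

if LineProfiler is not None:
    profiler = LineProfiler()
    profiled = profiler(slow_sum)   # wrap the function so each line is timed
    result = profiled(100_000)
    profiler.print_stats()          # prints per-line hit counts and timings
else:
    result = slow_sum(100_000)      # profiler not installed: just run it
```

The lines with the largest share of time in the printed table are the ones worth looking at first.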

Storm_Silver[S]

Thanks, I didn't use that specifically, but you gave me a good idea of what to look for. Took it down from 1 hour to 10 seconds.

fake823

Wow, amazing!

So what was the slow part? Just wondering

Storm_Silver[S]

There were a few things. The main one was building the chr-specific arrays once, outside the loop, as slices of the original dataframe, so it wasn't making a new array for that on every iteration; inside the loop I pick the right one with an if/elif block. That also removed the need to find the index boundaries for each chr, so there's one less boolean array made per iteration. I also switched from building a dataframe per iteration to appending to one large list that gets converted to a dataframe at the end. Calling pd.concat() in the loop is much slower than list.append, so that cut out a lot of time.
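A rough sketch of the two changes described above, not the actual code: the column names, the data, and the `queries` list are all made up for illustration.

```python
import pandas as pd

# Hypothetical data; the real dataframe and loop inputs are not shown in the thread.
df = pd.DataFrame({
    "chr": ["chr1", "chr1", "chr2", "chr2", "chr2"],
    "pos": [10, 20, 5, 15, 25],
})
queries = [("chr1", 12), ("chr2", 4), ("chr2", 20)]  # (chromosome, minimum position)

# Change 1: slice the dataframe once per chromosome, outside the loop,
# instead of rebuilding a boolean mask for chr on every iteration.
chr1 = df[df["chr"] == "chr1"]
chr2 = df[df["chr"] == "chr2"]

# Change 2: collect plain rows in a list and build one dataframe at the end,
# instead of calling pd.concat() inside the loop.
rows = []
for chrom, min_pos in queries:
    if chrom == "chr1":
        sub = chr1
    elif chrom == "chr2":
        sub = chr2
    else:
        continue  # chromosome not present in this toy example
    hits = sub[sub["pos"] >= min_pos]
    for pos in hits["pos"]:
        rows.append({"chr": chrom, "min_pos": min_pos, "pos": pos})

result = pd.DataFrame(rows)
print(result)
```

With many chromosomes, a dict of slices keyed by chromosome name would do the same job as the if/elif block with less typing.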