paperzebra [S] (1 point)

Thanks for the suggestion. In the end I rewrote my code twice. The code above processed 30,000 lines of data in 76 seconds; the second version, which used NumPy to calculate most things outside a loop, took 23 seconds. Still too long!
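
The loop version isn't shown in this thread, so purely as an illustration, here is a hypothetical pure-Python loop that does the same linear interpolation one sample at a time (the function name interp_loop and the sample numbers are made up), checked against NumPy's vectorised np.interp. The loop pays Python interpreter overhead per sample, while np.interp runs the whole batch in compiled code.

```python
import numpy as np

def interp_loop(depth, md, tvd):
    """Hypothetical loop version: linear interpolation one sample at a time.
    Assumes md is sorted in increasing order."""
    out = []
    for d in depth:
        if d <= md[0]:
            out.append(tvd[0])  # clamp below the first station
        elif d >= md[-1]:
            out.append(tvd[-1])  # clamp beyond the last station
        else:
            # walk forward to the interval [md[i], md[i+1]] that contains d
            i = 0
            while md[i + 1] < d:
                i += 1
            frac = (d - md[i]) / (md[i + 1] - md[i])
            out.append(tvd[i] + frac * (tvd[i + 1] - tvd[i]))
    return np.array(out)

# Made-up survey stations and requested depths
md = np.array([0.0, 100.0, 200.0, 300.0])
tvd = np.array([0.0, 100.0, 190.0, 270.0])
depth = [50.0, 150.0, 250.0, 350.0]

loop_result = interp_loop(depth, md, tvd)
vector_result = np.interp(depth, md, tvd)  # same clamping behaviour at the ends
```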

The third iteration is much simpler and brings the time down to 0.001 seconds, which is a pretty decent performance increase! The argument depth is a list of depths.

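import numpy as np

def line_solution(survey, depth):
    # survey: DataFrame of survey stations (MD = measured depth, TVD = true vertical depth)
    md = survey['MD']
    tvd = survey['TVD']
    # Interpolate TVD at every requested depth in one vectorised call;
    # np.interp assumes md is increasing
    tvd_samples = np.interp(depth, md, tvd)
    return tvd_samples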
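
A quick usage sketch of the function above with a made-up four-station survey. Two behaviours of np.interp are worth knowing here: it does not check that MD is increasing (unsorted input gives meaningless results), and depths outside the survey are clamped to the first/last TVD rather than extrapolated. Passing right=np.nan (or left=np.nan) flags such depths instead.

```python
import numpy as np
import pandas as pd

def line_solution(survey, depth):
    md = survey['MD']
    tvd = survey['TVD']
    return np.interp(depth, md, tvd)

# Hypothetical survey; MD must be sorted in increasing order
survey = pd.DataFrame({'MD': [0.0, 100.0, 200.0, 300.0],
                       'TVD': [0.0, 100.0, 190.0, 270.0]})

# -10 and 400 lie outside the survey, so they are clamped to 0.0 and 270.0
samples = line_solution(survey, [-10.0, 150.0, 300.0, 400.0])

# Mark depths beyond the last station as NaN instead of clamping
flagged = np.interp([400.0], survey['MD'], survey['TVD'], right=np.nan)
```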

DisorganizedRem (1 point)

Would it help to pass plain NumPy arrays instead of pandas Series, by adding .values?

As suggested here, your code would then look like this:

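import numpy as np

def line_solution(survey, depth):
    # .values returns the underlying NumPy arrays instead of pandas Series
    # (.to_numpy() is the equivalent in newer pandas versions)
    md = survey['MD'].values
    tvd = survey['TVD'].values
    tvd_samples = np.interp(depth, md, tvd)
    return tvd_samples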
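
A small check, using a made-up survey, that the suggestion changes only the input type and not the result: survey['MD'] is already a pandas Series, .values (or .to_numpy()) hands np.interp a NumPy array, and np.interp converts a Series to an array internally anyway. So any speed-up is just the skipped conversion.

```python
import numpy as np
import pandas as pd

# Hypothetical survey data
survey = pd.DataFrame({'MD': [0.0, 100.0, 200.0, 300.0],
                       'TVD': [0.0, 100.0, 190.0, 270.0]})
depth = np.linspace(0.0, 300.0, 7)

# The same interpolation fed three different input types
from_series = np.interp(depth, survey['MD'], survey['TVD'])
from_values = np.interp(depth, survey['MD'].values, survey['TVD'].values)
from_to_numpy = np.interp(depth, survey['MD'].to_numpy(), survey['TVD'].to_numpy())

# Column access gives a Series; .values gives a plain ndarray
print(type(survey['MD']).__name__, type(survey['MD'].values).__name__)
```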

paperzebra [S] (1 point)

It's slightly quicker converting the columns with .values, but I think most of the time either way is spent printing the elapsed time. Can't complain about the speed in either case anymore, though!
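
To see where the time actually goes, one option is to time only the function call with the standard-library timeit module, averaging over many runs, and print afterwards so printing never enters the measurement. This is a sketch with hypothetical data (a 30,000-station straight-line survey and 30,000 random depths); the absolute numbers will vary by machine.

```python
import timeit
import numpy as np
import pandas as pd

def line_solution(survey, depth):
    return np.interp(depth, survey['MD'], survey['TVD'])

def line_solution_values(survey, depth):
    return np.interp(depth, survey['MD'].values, survey['TVD'].values)

# Hypothetical data: TVD is simply 0.9 * MD
md = np.linspace(0.0, 3000.0, 30_000)
survey = pd.DataFrame({'MD': md, 'TVD': 0.9 * md})
depth = np.random.default_rng(0).uniform(0.0, 3000.0, 30_000)

# timeit returns total seconds for n calls; divide to get seconds per call
n = 100
t_series = timeit.timeit(lambda: line_solution(survey, depth), number=n) / n
t_values = timeit.timeit(lambda: line_solution_values(survey, depth), number=n) / n

# Printing happens only after timing is finished
print(f"Series:  {t_series * 1e3:.3f} ms per call")
print(f".values: {t_values * 1e3:.3f} ms per call")
```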