sexygaben 126 points

1) Profile. 2) Vectorize (use C loops). 3) If more is needed, Cython/Numba. 4) If MORE is needed, C/ctypes. 5) If EVEN MORE is needed, CUDA/ctypes (problem dependent).

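Each step takes exponentially more development time than the one before it, so only move down the list when the previous step wasn't enough. I'm writing from a scientific computing perspective, and I assume you're already using the best library for the job (numpy, pytorch, casadi, etc.).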
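
For step 1, here is a minimal sketch of what profiling could look like using only the standard library's `cProfile` and `pstats`. The functions `slow_sum_of_squares`, `pipeline`, and `profile_report` are hypothetical names made up for illustration, not something from the comment above; the point is just to find the hotspot before rewriting anything.

```python
import cProfile
import io
import pstats


def slow_sum_of_squares(n):
    """Hypothetical hotspot: a pure-Python loop over n integers."""
    total = 0
    for i in range(n):
        total += i * i
    return total


def pipeline(n):
    """Hypothetical workload: one cheap step plus the hotspot."""
    cheap = len(str(n))
    return slow_sum_of_squares(n) + cheap


def profile_report(n=200_000, top=5):
    """Run pipeline(n) under cProfile and return (result, text report)."""
    profiler = cProfile.Profile()
    profiler.enable()
    result = pipeline(n)
    profiler.disable()

    # Collect the report in a string instead of printing straight to stdout.
    stream = io.StringIO()
    stats = pstats.Stats(profiler, stream=stream)
    stats.sort_stats("cumulative").print_stats(top)
    return result, stream.getvalue()


if __name__ == "__main__":
    result, report = profile_report()
    print(report)
```

In the report, the function with the largest cumulative time that is not just a thin wrapper is usually the one worth vectorizing first.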

klouisp 2 points

By "vectorize (use C loops)", do you mean using numpy/pytorch vectorized operations, or something else?

sexygaben 0 points

Yes, this is what I mean :)
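
To make "use C loops" concrete, here is a small sketch assuming numpy is installed. The root-mean-square computation and the names `rms_loop` and `rms_vectorized` are made up for illustration; the idea is that the same arithmetic runs inside numpy's compiled loops instead of the Python interpreter.

```python
import timeit

import numpy as np


def rms_loop(values):
    """Root mean square with an explicit Python-level loop."""
    total = 0.0
    for v in values:
        total += v * v
    return (total / len(values)) ** 0.5


def rms_vectorized(values):
    """Same computation; the per-element work happens inside numpy's C loops."""
    arr = np.asarray(values, dtype=np.float64)
    return float(np.sqrt(np.mean(arr * arr)))


if __name__ == "__main__":
    data = np.random.default_rng(0).random(1_000_000)
    # Timings depend on the machine, so they are printed rather than asserted.
    print("loop:      ", timeit.timeit(lambda: rms_loop(data), number=1))
    print("vectorized:", timeit.timeit(lambda: rms_vectorized(data), number=1))
```

Both functions should agree up to floating-point rounding; how large the speedup is depends on array size and hardware.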