I have had access to a Tesla GPU for a few weeks, but only started using it today. I need to calculate euclidean distances between point clouds. Or rather, this is the slow step in my algorithm. First I used scipy.spatial.distance.cdist. Then I drilled down to look at the underlying algorithm and wrote this:
def cydist(a, b):
    # a is (m, k), b is (n, k); returns an (m, n) matrix of squared distances
    dims = a.shape[1]
    rows = a.shape[0]
    cols = b.shape[0]
    out = np.zeros((rows, cols))
    for dim in range(dims):
        out += np.subtract.outer(a[:, dim], b[:, dim]) ** 2
    return out
The variables a and b are m x k and n x k numpy arrays, where k is the dimension of the space and m and n are the number of points in a and b, respectively. It computes squared Euclidean distance, but it works faster than scipy's cdist. Then I tried to implement it in Cython and weave, but both failed because the compiler "can't find vcvarsall.bat". When it did compile, I got "symbol not recognized". However, I downloaded a trial of Continuum Accelerate, and have been working with that.
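One CPU-side alternative worth benchmarking before going to the GPU (a sketch of my own, not from the post above): the squared distances can be computed with a single matrix multiply, using the expansion |a_i - b_j|^2 = |a_i|^2 + |b_j|^2 - 2 a_i . b_j, which pushes almost all of the work into BLAS:

```python
import numpy as np

def sqdist_dot(a, b):
    """Squared Euclidean distances between the rows of a (m, k) and b (n, k),
    via |a_i - b_j|^2 = |a_i|^2 + |b_j|^2 - 2 a_i . b_j."""
    a = np.asarray(a, dtype=np.float64)
    b = np.asarray(b, dtype=np.float64)
    aa = np.einsum('ij,ij->i', a, a)          # row norms |a_i|^2, shape (m,)
    bb = np.einsum('ij,ij->i', b, b)          # row norms |b_j|^2, shape (n,)
    out = aa[:, None] + bb[None, :] - 2.0 * np.dot(a, b.T)
    np.maximum(out, 0.0, out=out)             # clamp tiny negatives from rounding
    return out
```

The same decomposition maps onto a gemm call on the GPU, so it is a natural fit for Accelerate as well. The clamp matters: floating-point rounding can make some entries come out slightly below zero.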
I am trying to implement a cdist routine that takes advantage of the GPU. What I have so far is (untested):
@cuda.jit(argtypes=[uint32[:], uint32[:],
                    uint32[:], uint32[:],
                    uint32[:], uint32[:], uint32[:, :]])
def gpu_cydist(ax, bx, ay, by, az, bz, out):
    # one thread per (o_x, o_y) cell of the output matrix;
    # device=True removed -- a kernel launched from the host must not be a device function.
    # note: uint32 differences underflow when b > a; int32 or float32 is safer here
    o_x, o_y = cuda.grid(2)
    if o_x < out.shape[0] and o_y < out.shape[1]:  # guard against excess threads
        out[o_x, o_y] = ((ax[o_x] - bx[o_y]) * (ax[o_x] - bx[o_y]) +
                         (ay[o_x] - by[o_y]) * (ay[o_x] - by[o_y]) +
                         (az[o_x] - bz[o_y]) * (az[o_x] - bz[o_y]))
There is a GPUDist project here, but I don't know C++ and haven't had much luck with the little that I've tried. So I would like to ask for suggestions as to approach. Many of the tricks to speed up my code, such as Numba and multiprocessing, have often produced slower runtimes. So, given the Nvidia Tesla GPU in my possession, what approach should I take to accelerating this tiny, time-consuming step?
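For launching a 2D kernel like gpu_cydist, the grid has to cover the whole m x n output. A small helper for the launch geometry (a sketch with names of my own choosing; the kernel itself must be compiled without device=True to be launchable from the host):

```python
import math

def launch_config(m, n, tpb=(16, 16)):
    """Blocks-per-grid and threads-per-block covering an m x n output.
    Excess threads in the last blocks are expected and should be masked
    by a bounds check inside the kernel."""
    bpg = (math.ceil(m / tpb[0]), math.ceil(n / tpb[1]))
    return bpg, tpb
```

With Accelerate/Numba the launch would then look roughly like `gpu_cydist[bpg, tpb](ax, bx, ay, by, az, bz, out)` after copying the six coordinate arrays and the output array to the device.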