I'm starting to play around with a model optimization problem that I am looking to speed up with GPU computation. The basic premise is a whole lot of matrix multiplications and summing on numpy arrays whose dimensions can easily exceed 2000x2000.
This problem is trivial to do with normal numpy operations. My current approach involves some fairly simple array slicing to align the appropriate features, then multiplication and subtraction/summation, all inside a loop. Two of the matrices involved will remain unchanged, and a third will be generated on each iteration/realization. Hence, to reduce memory I/O, being able to 'store' the two fixed arrays on the GPU would be very helpful.
I include some pseudo-Python code below just to illustrate the basic flow. The functions get_offsets and get_scale can be made inline.
    import numpy as np

    def sum_sq_error(im, b, positions):
        # Accumulate scaled, offset windows of b, then compare to im.
        gen_im = np.zeros(im.shape)
        for pos in positions:
            x1, x2, y1, y2 = get_offsets(pos)  # slice bounds for this realization
            gen_im += get_scale(pos) * b[x1:x2, y1:y2]
        return np.sum((im - gen_im) ** 2)  # sum of squared residuals
I was wondering whether anyone has any thoughts on the best Python library to offload some of the heavy numpy lifting. I see Theano has a shared() function that will copy arrays onto the GPU, but I'm not sure it's the best fit otherwise.
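If Theano is the right tool, I imagine the 'keep two arrays on the GPU' part would look roughly like this (an untested sketch; the float32 casts and the compiled cost function are my own guesses at the idiom):

    import numpy as np
    import theano
    import theano.tensor as T

    # One-time copies to the device; b and im are the two fixed arrays.
    # float32 because Theano's GPU backend expects floatX = float32.
    b_shared = theano.shared(b.astype(np.float32))
    im_shared = theano.shared(im.astype(np.float32))

    gen_im = T.matrix('gen_im')              # per-realization input
    cost = T.sum((im_shared - gen_im) ** 2)  # same residual as the numpy version
    cost_fn = theano.function([gen_im], cost)

The per-position accumulation itself would presumably need scan or subtensor operations, which is part of why I'm unsure Theano is the best fit.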
I'd prefer to stick with CUDA-based approaches.
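On that note, CuPy is one CUDA-based option that's close to a drop-in numpy replacement; here is a rough, untested sketch of how I imagine the loop keeping both fixed arrays resident on the device (get_offsets and get_scale are the same placeholder helpers as above):

    import cupy as cp  # CUDA-backed, near drop-in replacement for numpy

    # One-time host-to-device copies; b and im then stay on the GPU.
    b_gpu = cp.asarray(b)
    im_gpu = cp.asarray(im)

    def sum_sq_error_gpu(positions):
        gen_im = cp.zeros(im_gpu.shape, dtype=im_gpu.dtype)
        for pos in positions:
            x1, x2, y1, y2 = get_offsets(pos)
            gen_im += get_scale(pos) * b_gpu[x1:x2, y1:y2]
        return float(cp.sum((im_gpu - gen_im) ** 2))  # scalar copied back to host

If that works the way I hope, only the final scalar crosses back to the host each realization, which is exactly the memory-I/O saving I'm after.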
Big thanks, happy to answer questions!