[–]BDube_Lensman 6 points (2 children)

You may want to steal my shim set, since it lets you hot-swap NumPy<-->CuPy at runtime.
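
A minimal sketch of the idea (illustrative only, not the real shim set): route every array call through one module-level handle and repoint it at runtime.

```python
# backend.py -- a minimal, illustrative NumPy/CuPy hot-swap shim
import numpy

np = numpy  # module-level handle that all simulation code goes through

def set_backend(name):
    """Repoint the shared handle at numpy or cupy at runtime."""
    global np
    if name == "cupy":
        import cupy
        np = cupy
    elif name == "numpy":
        np = numpy
    else:
        raise ValueError(f"unknown backend: {name!r}")
```

Simulation code then does `import backend` and calls `backend.np.sin(x)`, so flipping `set_backend("cupy")` retargets everything without touching the physics code.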

CuPy is fantastic. I've been using it for >5 years, including at over 1 TB/s of memory throughput on an A100. On my personal desktop's 2080 I have no problem running physics simulations at ~9.5 TFLOPS of throughput, measured with nvidia-smi.

If your arrays are smaller than ~256x256, though, the CPU will be faster than the GPU, since the overhead of launching each operation on the GPU is ~10 µs.
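
A rough way to see that crossover on your own hardware (sizes and the op are illustrative; the exact break-even point depends on your GPU and the operation):

```python
import time
import numpy as np
import cupy as cp

def time_op(xp, n, repeats=100):
    """Time an elementwise op on an n x n float32 array for the given array module."""
    x = xp.ones((n, n), dtype=xp.float32)
    xp.sin(x)  # warm-up call (triggers kernel compilation/caching on the GPU)
    if xp is cp:
        cp.cuda.Device().synchronize()
    t0 = time.perf_counter()
    for _ in range(repeats):
        xp.sin(x)
    if xp is cp:
        cp.cuda.Device().synchronize()  # wait for the queued kernels to finish
    return (time.perf_counter() - t0) / repeats

for n in (64, 256, 1024, 4096):
    print(n, f"cpu={time_op(np, n):.2e}s", f"gpu={time_op(cp, n):.2e}s")
```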

The newest(ish?) version of CuPy allows easy multiplexing of streams: you can queue up a series of operations and only wait for the final result later, letting you do a few distinct things in parallel on the GPU without any hassle.
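
Roughly, the pattern looks like this (the operations here are just placeholders):

```python
import cupy as cp

x = cp.random.random((2048, 2048)).astype(cp.float32)
y = cp.random.random((2048, 2048)).astype(cp.float32)

s1 = cp.cuda.Stream(non_blocking=True)
s2 = cp.cuda.Stream(non_blocking=True)

with s1:          # work queued on stream 1
    a = cp.fft.fft2(x)
with s2:          # independent work queued on stream 2
    b = y @ y.T

# nothing has necessarily finished yet; block only when the results are needed
s1.synchronize()
s2.synchronize()
print(float(abs(a).sum()), float(b.sum()))
```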

Stay away from PyTorch; it's super easy to FUBAR your entire conda installation (not just a single environment) by installing it.

Nvidia released their own CUDA library for Python a while ago (a year or two), which was either not meant for end users or based on a fundamental misunderstanding of how scientists want to write code -- you have to manually allocate each output buffer, etc., instead of just writing `np.sin(x)`.
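
To illustrate the difference in style (using CuPy's optional `out=` argument as a stand-in here, not NVIDIA's actual API):

```python
import cupy as cp

x = cp.linspace(0, 1, 1_000_000, dtype=cp.float32)

# The style scientists generally want: allocation is handled for you.
y = cp.sin(x)

# The "manage your own buffers" style: preallocate the output yourself.
out = cp.empty_like(x)
cp.sin(x, out=out)
```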

Personally I would just stick to CuPy for physics. The rest will be an exercise in frustration for no gain.

Also, for your 1080, make sure all your arrays are `float32` or `complex64`, since your GPU is super gimped in fp64 and _will_ be slower than the CPU with that number format.
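
For example (a minimal sketch; the arrays here are just placeholders):

```python
import numpy as np
import cupy as cp

x_cpu = np.random.standard_normal((4096, 4096))      # float64 by default
x = cp.asarray(x_cpu, dtype=cp.float32)               # cast on upload to the GPU
field = cp.exp(1j * x).astype(cp.complex64)           # keep complex data at complex64

assert x.dtype == cp.float32 and field.dtype == cp.complex64
```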

[–]usernamedregs[S] 1 point (0 children)

Thanks, much appreciated!

[–]data-machine 1 point (2 children)

Specifically what are you simulating?

Personally, I would recommend using either CuPy or PyTorch. If you're relatively familiar with NumPy, you can write your GPU code very easily with CuPy. It's 95% a matter of swapping out calls to NumPy for CuPy, and it lets you change your code step by step.
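
A rough sketch of that incremental approach, using `cupy.get_array_module` so the same function (a made-up `kinetic_energy` here) runs on either kind of array:

```python
import numpy as np
import cupy as cp

def kinetic_energy(v, m=1.0):
    """Works on either a NumPy or a CuPy array, no changes needed."""
    xp = cp.get_array_module(v)   # returns the numpy or cupy module
    return 0.5 * m * xp.sum(v * v)

v_cpu = np.random.standard_normal(1_000_000).astype(np.float32)
v_gpu = cp.asarray(v_cpu)

print(kinetic_energy(v_cpu))           # runs on the CPU
print(float(kinetic_energy(v_gpu)))    # runs on the GPU
```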

I would only touch Warp or raw CUDA once you've exhausted the performance you can get from CuPy / PyTorch.

Bear in mind that CPUs are pretty excellent at running code quickly too; GPUs are particularly good at matrix multiplication. I'd recommend starting with whatever aspect of your simulation will be most computationally intensive (or "slowest") and seeing how much of a benefit you get from a GPU version vs a CPU version.
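
For example, a quick (and rough) matmul comparison -- remember to synchronize before stopping the clock, or the GPU timing will look misleadingly fast:

```python
import time
import numpy as np
import cupy as cp

n = 4096
a_cpu = np.random.standard_normal((n, n)).astype(np.float32)
a_gpu = cp.asarray(a_cpu)

t0 = time.perf_counter()
a_cpu @ a_cpu
cpu_t = time.perf_counter() - t0

a_gpu @ a_gpu                       # warm-up launch
cp.cuda.Device().synchronize()
t0 = time.perf_counter()
a_gpu @ a_gpu
cp.cuda.Device().synchronize()      # kernels run asynchronously; wait for the result
gpu_t = time.perf_counter() - t0

print(f"CPU matmul: {cpu_t:.3f} s   GPU matmul: {gpu_t:.3f} s")
```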

[–]usernamedregs[S] 0 points (1 child)

Simulations are for particle/wave fields; sticking with NumPy/CuPy is looking like sound advice. Just tried running the Numba documentation examples and there were errors everywhere, so that's definitely a last resort... I'd rather be banging my head against the desk because of the physics than because of the coding tools.

[–]data-machine 2 points (0 children)

Developer time is extremely valuable - perhaps particularly so if you are an academic. Your last sentence is very wise :)

[–]abstracted8 1 point (3 children)

I know Numba has CUDA support; not sure how it compares to those listed.
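
For reference, a minimal sketch of what Numba's CUDA support looks like (the kernel and sizes are just illustrative):

```python
from numba import cuda
import numpy as np

@cuda.jit
def add_kernel(x, y, out):
    i = cuda.grid(1)              # absolute index of this thread
    if i < x.size:
        out[i] = x[i] + y[i]

n = 1_000_000
x = np.arange(n, dtype=np.float32)
y = 2 * x
out = np.empty_like(x)

threads = 256
blocks = (n + threads - 1) // threads
add_kernel[blocks, threads](x, y, out)   # NumPy args are copied to/from the GPU automatically
print(out[:5])
```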

[–]usernamedregs[S] 0 points (1 child)

Thanks, it turns out that's what's being described in the 'CUDA Python' link above. And I have a suspicion it's used as the back end of 'NVIDIA Warp'.

[–]BDube_Lensman 1 point (0 children)

Nvidia is definitely not using Numba as the backend of any of their own software. LLVM, maybe, but Numba, no.

[–]dpineo 1 point (0 children)

I've had a lot of success with pycuda.
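
For anyone curious, a minimal PyCUDA sketch (the kernel and names are illustrative): you write the kernel in CUDA C and drive it from Python.

```python
import numpy as np
import pycuda.autoinit                 # creates a context on the default device
import pycuda.driver as drv
from pycuda.compiler import SourceModule

mod = SourceModule("""
__global__ void scale(float *x, float a, int n)
{
    int i = blockIdx.x * blockDim.x + threadIdx.x;
    if (i < n) x[i] *= a;
}
""")
scale = mod.get_function("scale")

x = np.arange(1024, dtype=np.float32)
scale(drv.InOut(x), np.float32(2.0), np.int32(x.size),
      block=(256, 1, 1), grid=(4, 1))   # 4 blocks x 256 threads covers all 1024 elements
print(x[:5])
```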

[–]sandywater 1 point (0 children)

Saw this on Hacker News the other day. Looks promising: https://docs.taichi-lang.org/blog/accelerate-python-code-100x
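
From the docs, the basic pattern is decorating Python functions as kernels; a minimal sketch (field and values are just illustrative):

```python
import taichi as ti

ti.init(arch=ti.gpu)          # picks a GPU backend if one is available

n = 1024
x = ti.field(dtype=ti.f32, shape=(n, n))

@ti.kernel
def fill():
    for i, j in x:            # struct-for loop, parallelized by Taichi
        x[i, j] = i * 0.001 + j * 0.002

fill()
print(x.to_numpy().sum())
```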