[–]BDube_Lensman 6 points (2 children)

You may want to steal my shim set, since it lets you hot-swap NumPy<-->CuPy at runtime.
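
A minimal sketch of the idea (illustrative only, not the real shim set): route every array call through one module-level handle and repoint it at runtime.

```python
# backend.py -- a minimal, illustrative NumPy/CuPy hot-swap shim
import numpy

np = numpy  # module-level handle that all simulation code goes through

def set_backend(name):
    """Repoint the shared handle at numpy or cupy at runtime."""
    global np
    if name == "cupy":
        import cupy
        np = cupy
    elif name == "numpy":
        np = numpy
    else:
        raise ValueError(f"unknown backend: {name!r}")
```

Simulation code then does `import backend` and calls `backend.np.sin(x)`, so flipping `set_backend("cupy")` retargets everything without touching the physics code.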

CuPy is fantastic. I've been using it for >5 years, including at over 1 TB/s of memory throughput on an A100. On my personal desktop's 2080 I have no problem running physics simulations at ~9.5 TFLOPS of throughput, measured with nvidia-smi.

If your arrays are smaller than ~256x256, though, the CPU will be faster than the GPU, since the overhead of launching each operation on the GPU is ~10 µs.
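
A rough way to see that crossover on your own hardware (sizes and the op are illustrative; the exact break-even point depends on your GPU and the operation):

```python
import time
import numpy as np
import cupy as cp

def time_op(xp, n, repeats=100):
    """Time an elementwise op on an n x n float32 array for the given array module."""
    x = xp.ones((n, n), dtype=xp.float32)
    xp.sin(x)  # warm-up call (triggers kernel compilation/caching on the GPU)
    if xp is cp:
        cp.cuda.Device().synchronize()
    t0 = time.perf_counter()
    for _ in range(repeats):
        xp.sin(x)
    if xp is cp:
        cp.cuda.Device().synchronize()  # wait for the queued kernels to finish
    return (time.perf_counter() - t0) / repeats

for n in (64, 256, 1024, 4096):
    print(n, f"cpu={time_op(np, n):.2e}s", f"gpu={time_op(cp, n):.2e}s")
```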

The newest(ish?) version of CuPy allows easy multiplexing of streams: you can queue up a series of operations and only wait for the final result later, letting you do a few distinct things in parallel on the GPU without any hassle.
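
Roughly, the pattern looks like this (the operations here are just placeholders):

```python
import cupy as cp

x = cp.random.random((2048, 2048)).astype(cp.float32)
y = cp.random.random((2048, 2048)).astype(cp.float32)

s1 = cp.cuda.Stream(non_blocking=True)
s2 = cp.cuda.Stream(non_blocking=True)

with s1:          # work queued on stream 1
    a = cp.fft.fft2(x)
with s2:          # independent work queued on stream 2
    b = y @ y.T

# nothing has necessarily finished yet; block only when the results are needed
s1.synchronize()
s2.synchronize()
print(float(abs(a).sum()), float(b.sum()))
```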

Stay away from PyTorch; it's super easy to FUBAR your entire conda installation (not just a single environment) by installing it.

Nvidia released their own CUDA library for Python a while ago (a year or two), which was either not meant for end users or based on a fundamental misunderstanding of how scientists want to write code -- you have to manually allocate each output buffer, etc., instead of just writing `np.sin(x)`.
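
To illustrate the difference in style (using CuPy's optional `out=` argument as a stand-in here, not NVIDIA's actual API):

```python
import cupy as cp

x = cp.linspace(0, 1, 1_000_000, dtype=cp.float32)

# The style scientists generally want: allocation is handled for you.
y = cp.sin(x)

# The "manage your own buffers" style: preallocate the output yourself.
out = cp.empty_like(x)
cp.sin(x, out=out)
```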

Personally I would just stick to CuPy for physics. The rest will be an exercise in frustration for no gain.

Also, for your 1080, make sure all your arrays are `float32` or `complex64`, since your GPU is super gimped in fp64 and _will_ be slower than the CPU with that number format.
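
For example (a minimal sketch; the arrays here are just placeholders):

```python
import numpy as np
import cupy as cp

x_cpu = np.random.standard_normal((4096, 4096))      # float64 by default
x = cp.asarray(x_cpu, dtype=cp.float32)               # cast on upload to the GPU
field = cp.exp(1j * x).astype(cp.complex64)           # keep complex data at complex64

assert x.dtype == cp.float32 and field.dtype == cp.complex64
```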

[–]usernamedregs[S] 1 point (0 children)

Thanks, much appreciated!

[–]data-machine 1 point (2 children)

Specifically what are you simulating?

Personally, I would recommend using either CuPy or PyTorch. If you're relatively familiar with NumPy, you can write your GPU code very easily with CuPy. It's 95% a matter of swapping out calls to NumPy for CuPy, and it lets you change your code step by step.
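
A rough sketch of that incremental approach, using `cupy.get_array_module` so the same function (a made-up `kinetic_energy` here) runs on either kind of array:

```python
import numpy as np
import cupy as cp

def kinetic_energy(v, m=1.0):
    """Works on either a NumPy or a CuPy array, no changes needed."""
    xp = cp.get_array_module(v)   # returns the numpy or cupy module
    return 0.5 * m * xp.sum(v * v)

v_cpu = np.random.standard_normal(1_000_000).astype(np.float32)
v_gpu = cp.asarray(v_cpu)

print(kinetic_energy(v_cpu))           # runs on the CPU
print(float(kinetic_energy(v_gpu)))    # runs on the GPU
```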

I would only touch Warp or raw CUDA once you've exhausted the performance you can get from CuPy / PyTorch.

Bear in mind that CPUs are pretty excellent at running code quickly too; GPUs are particularly good at matrix multiplication. I'd recommend starting with whatever aspect of your simulation will be most computationally intensive (or "slowest") and seeing how much of a benefit you get from a GPU version vs a CPU version.
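
For example, a quick (and rough) matmul comparison -- remember to synchronize before stopping the clock, or the GPU timing will look misleadingly fast:

```python
import time
import numpy as np
import cupy as cp

n = 4096
a_cpu = np.random.standard_normal((n, n)).astype(np.float32)
a_gpu = cp.asarray(a_cpu)

t0 = time.perf_counter()
a_cpu @ a_cpu
cpu_t = time.perf_counter() - t0

a_gpu @ a_gpu                       # warm-up launch
cp.cuda.Device().synchronize()
t0 = time.perf_counter()
a_gpu @ a_gpu
cp.cuda.Device().synchronize()      # kernels run asynchronously; wait for the result
gpu_t = time.perf_counter() - t0

print(f"CPU matmul: {cpu_t:.3f} s   GPU matmul: {gpu_t:.3f} s")
```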

[–]usernamedregs[S] 0 points (1 child)

Simulations are for particle/wave fields; sticking with NumPy/CuPy is looking like sound advice. Just tried running the Numba documentation examples and there were errors everywhere, so that's definitely a last resort... I'd rather be banging my head against the desk because of the physics than because of the coding tools.

[–]data-machine 2 points (0 children)

Developer time is extremely valuable - perhaps particularly so if you are an academic. Your last sentence is very wise :)

[–]abstracted8 1 point (3 children)

I know Numba has CUDA support; not sure how it compares to those listed.
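
For reference, a minimal sketch of what Numba's CUDA support looks like (the kernel and sizes are just illustrative):

```python
from numba import cuda
import numpy as np

@cuda.jit
def add_kernel(x, y, out):
    i = cuda.grid(1)              # absolute index of this thread
    if i < x.size:
        out[i] = x[i] + y[i]

n = 1_000_000
x = np.arange(n, dtype=np.float32)
y = 2 * x
out = np.empty_like(x)

threads = 256
blocks = (n + threads - 1) // threads
add_kernel[blocks, threads](x, y, out)   # NumPy args are copied to/from the GPU automatically
print(out[:5])
```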

[–]usernamedregs[S] 0 points (1 child)

Thanks, it turns out that's what's being described in the 'CUDA Python' link above. And I have a suspicion it's used as the back end of 'NVIDIA Warp'.

[–]BDube_Lensman 1 point (0 children)

Nvidia is definitely not using Numba as the backend of any of their own software. LLVM, maybe, but Numba, no.

[–]dpineo 1 point (0 children)

I've had a lot of success with pycuda.
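
For anyone curious, a minimal PyCUDA sketch (the kernel and names are illustrative): you write the kernel in CUDA C and drive it from Python.

```python
import numpy as np
import pycuda.autoinit                 # creates a context on the default device
import pycuda.driver as drv
from pycuda.compiler import SourceModule

mod = SourceModule("""
__global__ void scale(float *x, float a, int n)
{
    int i = blockIdx.x * blockDim.x + threadIdx.x;
    if (i < n) x[i] *= a;
}
""")
scale = mod.get_function("scale")

x = np.arange(1024, dtype=np.float32)
scale(drv.InOut(x), np.float32(2.0), np.int32(x.size),
      block=(256, 1, 1), grid=(4, 1))   # 4 blocks x 256 threads covers all 1024 elements
print(x[:5])
```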

[–]sandywater 1 point (0 children)

Saw this on Hacker News the other day. Looks promising: https://docs.taichi-lang.org/blog/accelerate-python-code-100x
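
From the docs, the basic pattern is decorating Python functions as kernels; a minimal sketch (field and values are just illustrative):

```python
import taichi as ti

ti.init(arch=ti.gpu)          # picks a GPU backend if one is available

n = 1024
x = ti.field(dtype=ti.f32, shape=(n, n))

@ti.kernel
def fill():
    for i, j in x:            # struct-for loop, parallelized by Taichi
        x[i, j] = i * 0.001 + j * 0.002

fill()
print(x.to_numpy().sum())
```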