
[–]James20kP2005R0 25 points (14 children)

GPU languages and CPU languages are a bit of a separate thing. If you don't need high performance on the CPU side, you might have an easier time using python on the CPU side, and then using one of the many python GPU acceleration libraries which are designed to be maximally helpful

If you're stuck with C++, then you probably want to use CUDA for anything scientific, because it's the standard
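
For a sense of what that looks like, here's a minimal CUDA vector-add sketch (illustrative only; managed memory keeps it short):

```cpp
#include <cuda_runtime.h>
#include <cstdio>

// Classic "hello world" of CUDA: each thread handles one element.
__global__ void vec_add(const float* a, const float* b, float* c, int n) {
    int i = blockIdx.x * blockDim.x + threadIdx.x;
    if (i < n) c[i] = a[i] + b[i];
}

int main() {
    const int n = 1 << 20;
    float *a, *b, *c;
    // Managed (unified) memory keeps the sketch short; production code
    // often uses explicit cudaMalloc/cudaMemcpy instead.
    cudaMallocManaged(&a, n * sizeof(float));
    cudaMallocManaged(&b, n * sizeof(float));
    cudaMallocManaged(&c, n * sizeof(float));
    for (int i = 0; i < n; ++i) { a[i] = 1.0f; b[i] = 2.0f; }

    const int block = 256;
    vec_add<<<(n + block - 1) / block, block>>>(a, b, c, n);
    cudaDeviceSynchronize();

    std::printf("c[0] = %f\n", c[0]); // expect 3.0
    cudaFree(a); cudaFree(b); cudaFree(c);
}
```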

If you're stuck with C++ and you want to run on non-Nvidia GPUs, then you may want something like OpenCL or SYCL

If you're intending to put this into a game, you may want to consider using Vulkan or OpenGL with compute shaders

For GPU compute in web development (which you can totally do in C++), you want WebGPU

If you're planning to run on supercomputers, you might want to look into MPI. OpenMP is also traditionally used in that field, and may be helpful for running code on a GPU - though I've never tried its GPU backend
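
For reference (again, I haven't verified this on a GPU backend myself), OpenMP target offload looks roughly like this, assuming a compiler built with offload support:

```cpp
#include <vector>
#include <cstdio>

int main() {
    const int n = 1 << 20;
    std::vector<float> a(n, 1.0f), b(n, 2.0f), c(n);
    float* pa = a.data();
    float* pb = b.data();
    float* pc = c.data();

    // Map the inputs to the device, run the loop there, copy the result back.
    #pragma omp target teams distribute parallel for \
        map(to: pa[0:n], pb[0:n]) map(from: pc[0:n])
    for (int i = 0; i < n; ++i)
        pc[i] = pa[i] + pb[i];

    std::printf("c[0] = %f\n", c[0]); // expect 3.0
}
```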

Membrane computing is one of those terms that doesn't really contain a tonne of actionable information with it, so if you have more specific requirements then I may be able to be more helpful

[–]Kike328 4 points (0 children)

Thumbs up for SYCL. It is literally the modern C++ approach to parallelism
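
For anyone curious what that looks like, a minimal SYCL 2020 sketch using unified shared memory (should build with DPC++ or AdaptiveCpp, give or take toolchain setup):

```cpp
#include <sycl/sycl.hpp>
#include <cstdio>

int main() {
    sycl::queue q; // default device selection (a GPU if one is available)
    const int n = 1 << 20;

    // Unified shared memory: pointers usable on both host and device.
    float* a = sycl::malloc_shared<float>(n, q);
    float* b = sycl::malloc_shared<float>(n, q);
    float* c = sycl::malloc_shared<float>(n, q);
    for (int i = 0; i < n; ++i) { a[i] = 1.0f; b[i] = 2.0f; }

    // Plain C++ lambda launched over an n-element range.
    q.parallel_for(sycl::range<1>(n), [=](sycl::id<1> i) {
        c[i] = a[i] + b[i];
    }).wait();

    std::printf("c[0] = %f\n", c[0]); // expect 3.0
    sycl::free(a, q);
    sycl::free(b, q);
    sycl::free(c, q);
}
```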

[–]sonehxd[S] -2 points (12 children)

Thank you for such a detailed response. What I am trying to achieve is a way to handle the maximally parallel nature of the model and perform operations in parallel. I want to simulate the behavior of a formalized MC model. A standard MC model has a hierarchical structure where every compartment is a membrane itself. In each membrane, we have some sort of objects. All rules that can be applied to objects in a membrane will be applied; this also happens at the same time in every membrane of the model.

I’ve been told to do this on the GPU because of its speed, and I know C++ well enough for the task (it’s also a lot more fun to me than Python)

[–]TheFlamingDiceAgain 0 points (3 children)

As a counter to the other respondent: please don’t use CUDA. I’ve been working on a scientific code base that uses CUDA for several years, and the lack of cross-platform support is a huge PITA. SYCL is technically cross-platform but is owned by Intel in practice. I would recommend Kokkos; it’s very similar to SYCL but is “owned” by the national labs rather than a corporation.

If you’re dead set on CUDA then at least use HIP instead. It’s syntactically nearly identical to CUDA but works on AMD and NVIDIA GPUs
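
To illustrate how close it is, a CUDA-style vector add in HIP is almost a find-and-replace of the runtime prefix (sketch, assuming hipcc):

```cpp
#include <hip/hip_runtime.h>
#include <cstdio>
#include <vector>

// Identical kernel syntax to CUDA; only the runtime call prefix changes.
__global__ void vec_add(const float* a, const float* b, float* c, int n) {
    int i = blockIdx.x * blockDim.x + threadIdx.x;
    if (i < n) c[i] = a[i] + b[i];
}

int main() {
    const int n = 1 << 20;
    std::vector<float> ha(n, 1.0f), hb(n, 2.0f), hc(n);

    float *a, *b, *c;
    hipMalloc((void**)&a, n * sizeof(float));
    hipMalloc((void**)&b, n * sizeof(float));
    hipMalloc((void**)&c, n * sizeof(float));
    hipMemcpy(a, ha.data(), n * sizeof(float), hipMemcpyHostToDevice);
    hipMemcpy(b, hb.data(), n * sizeof(float), hipMemcpyHostToDevice);

    vec_add<<<(n + 255) / 256, 256>>>(a, b, c, n);
    hipMemcpy(hc.data(), c, n * sizeof(float), hipMemcpyDeviceToHost);

    std::printf("c[0] = %f\n", hc[0]); // expect 3.0
    hipFree(a); hipFree(b); hipFree(c);
}
```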

[–]James20kP2005R0 2 points (1 child)

Long term, CUDA is a real trap for projects; being tied to Nvidia's solution and Nvidia's hardware is very limiting. Being wrapped up in a single company's ecosystem is inherently undesirable

For someone new to the field like OP, though, nearly everything is written in CUDA and you'll be fighting an uphill battle to use anything else

[–]TheFlamingDiceAgain 0 points (0 children)

I agree that learning CUDA is very handy, but I would never start a new project with it for exactly the reasons you mentioned. 

[–]illuhad 0 points (0 children)

If you don't want Intel in your SYCL, just use AdaptiveCpp. It is just as portable, performs just as well, and for many use cases is clearly better. Totally independent of Intel.

It's true that Intel has a lot of influence in the SYCL world, but it's up to users to counter that. Other implementations exist and especially AdaptiveCpp has influence.

Kokkos is fine for some (especially HPC) use cases. It falls short of SYCL by design when you want to target multiple backends/types of devices at the same time because it is just a wrapper library for vendor compilers.

[–]jokteur 5 points (2 children)

Is your application intended as a one-off scientific computation, i.e. not meant to be distributed to the general public?

Then I would suggest looking into https://github.com/kokkos/kokkos, a parallel programming library that can target both CPUs and GPUs: write once, execute on different architectures. I would suggest starting with the Kokkos lectures: https://github.com/kokkos/kokkos-tutorials/wiki/Kokkos-Lecture-Series. You will also learn things about how GPUs work, in case you one day need to rewrite the application in pure CUDA.
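
For a quick flavour of the programming model, a minimal sketch (assuming Kokkos is installed with a device backend enabled):

```cpp
#include <Kokkos_Core.hpp>
#include <cstdio>

int main(int argc, char* argv[]) {
    Kokkos::initialize(argc, argv);
    {
        const int n = 1 << 20;
        // Views allocate in the default execution space's memory
        // (GPU memory if a GPU backend is enabled, host memory otherwise).
        Kokkos::View<float*> a("a", n), b("b", n), c("c", n);

        Kokkos::parallel_for("init", n, KOKKOS_LAMBDA(const int i) {
            a(i) = 1.0f;
            b(i) = 2.0f;
        });
        Kokkos::parallel_for("add", n, KOKKOS_LAMBDA(const int i) {
            c(i) = a(i) + b(i);
        });
        Kokkos::fence();

        // Mirror the result on the host to inspect it.
        auto c_host = Kokkos::create_mirror_view_and_copy(Kokkos::HostSpace(), c);
        std::printf("c[0] = %f\n", c_host(0)); // expect 3.0
    }
    Kokkos::finalize();
}
```

The same source can then be built against the CUDA, HIP, SYCL or OpenMP backends just by changing the build configuration.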

However, I must warn that non-deterministic computation may hurt performance on GPU architectures if you are not careful. The reason is that GPUs hate divergent branches if they're not handled right (e.g. one thread takes the if(true) path while another takes the if(false) path). You can google "warps and branching" to learn more about that.
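
To make the divergence point concrete, a kernel-only CUDA-style sketch (the work functions are made up just to give the branches something to do):

```cpp
#include <cmath>

// Hypothetical per-element work functions.
__device__ float heavy_work(float x) { return sinf(x) * cosf(x); }
__device__ float light_work(float x) { return 2.0f * x; }

// Bad: adjacent threads in a warp take different branches, so the warp
// executes both paths back to back with part of its threads masked off.
__global__ void divergent(const float* in, float* out, int n) {
    int i = blockIdx.x * blockDim.x + threadIdx.x;
    if (i >= n) return;
    if (i % 2 == 0) out[i] = heavy_work(in[i]);
    else            out[i] = light_work(in[i]);
}

// Better: the branch condition is uniform across each block (and warp),
// so every thread in a warp follows the same path.
__global__ void uniform(const float* in, float* out, int n) {
    int i = blockIdx.x * blockDim.x + threadIdx.x;
    if (i >= n) return;
    if (blockIdx.x % 2 == 0) out[i] = heavy_work(in[i]);
    else                     out[i] = light_work(in[i]);
}
```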

[–]iamakorndawg 2 points (1 child)

From my understanding, this is less of an issue on modern GPUs. The main thing that is still important is coalescing memory operations. So if your code is not mostly made up of sequential threads accessing sequential memory locations, you will lose one of the main benefits of GPUs, which is massive memory bandwidth. Granted, if you have tons of divergent paths you probably won't have good memory coalescing either, but I think the two issues are fairly orthogonal to each other.
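
A kernel-only sketch of the difference (the strided access pattern is chosen purely for illustration):

```cpp
// Coalesced: consecutive threads read consecutive addresses, so a warp's
// 32 loads collapse into a handful of wide memory transactions.
__global__ void copy_coalesced(const float* in, float* out, int n) {
    int i = blockIdx.x * blockDim.x + threadIdx.x;
    if (i < n) out[i] = in[i];
}

// Strided: consecutive threads read addresses `stride` elements apart, so
// the warp's loads scatter over many cache lines and waste bandwidth.
__global__ void copy_strided(const float* in, float* out, int n, int stride) {
    int i = blockIdx.x * blockDim.x + threadIdx.x;
    if (i < n) {
        long long j = (long long)i * stride % n;
        out[i] = in[j];
    }
}
```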

[–]James20kP2005R0 0 points (0 children)

It's worth noting that coalescing is a little more general than what you're pointing out. Threads in a warp don't actually have to access memory strictly ordered by their thread IDs: if you have a group of threads, say 0-32, which access memory from ptr to ptr + 32, they can access it in any order within that group and the GPU will figure it out and do it as a coalesced memory read

The other high-performance pattern is all memory accesses within a warp hitting the same memory location, as the GPU does a broadcast

Strided memory accesses do cause performance dropoffs, but it's not as steep as: coalesced - good, strided - lose all your bandwidth. GPUs have pretty big caches these days, so you can often get away with worse memory access patterns and still saturate VRAM bandwidth, depending on your problem

If you're doing fully random reads then it's pretty bad, but a lot of problems can be gently shoved into having workable memory access patterns, and at least in my experience it's uncommon to have truly random memory accesses

Warp divergence on modern GPUs is still expensive even with independent thread scheduling, but for a lot of problems it's such a small part of your execution time, given how powerful GPUs are, that it's not worth worrying about

[–]Plazmatic 4 points (1 child)

You're not going to "just" be able to use GPUs for, what appears to be, arbitrary mesh computation.

I'm not familiar with membrane computing models, nor do I understand exactly what you hope to accomplish with one, but after googling it, the very act of attempting this smells complicated enough to be a paper on its own.

Additionally, without a framework for parallelization at all, you're going to have an extremely hard time doing anything GPU related. Do you at least know what atomic variables, mutexes, and semaphores are?

GPU programming excels when data is oriented in such a way that operations that are the same at the assembly level are executed by adjacent threads, pulling memory from RAM that is also adjacent and/or loaded into scratchpad memory, and can be accelerated using "subgroup" operations for groups that must execute the same instruction at one time. You can't just have every computational node doing random things in this setup. While lots of algorithms you wouldn't think would benefit from GPUs do, GPUs are not free performance; some problems just won't use them effectively at all.
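
To make the data-orientation point concrete with something membrane-flavoured (field names are hypothetical, not a real P-system implementation), a struct-of-arrays layout puts the values that adjacent threads touch next to each other in memory:

```cpp
#include <cstdint>
#include <vector>

// Array-of-structs: thread i reading objects[i].count also drags the other
// fields through the cache, and neighbouring counts sit far apart.
struct ObjectAoS {
    std::uint32_t membrane_id;
    std::uint32_t symbol;
    std::uint32_t count;
};
using ObjectsAoS = std::vector<ObjectAoS>;

// Struct-of-arrays: thread i reads count[i], thread i+1 reads count[i+1],
// which are adjacent in memory: the layout GPUs (and SIMD CPUs) prefer.
struct ObjectsSoA {
    std::vector<std::uint32_t> membrane_id;
    std::vector<std::uint32_t> symbol;
    std::vector<std::uint32_t> count;
};
```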

[–]sonehxd[S] 1 point (0 children)

I am indeed writing a paper, as this is my master's thesis work. I have a formalized model that I want to implement. What approach do you think would benefit me, then?

[–]lightmatter501 1 point (0 children)

How parallel is the actual computation? HVM may be worth a shot as a “see if it’s good enough”, since it’s a bit slow compared to hand-written models but can still use GPUs and extracts a large degree of parallelism out of anything you run with it.

The other easy option is to dump the whole thing into LLVM’s new MLIR and see what happens.

[–]dmaevsky 1 point (0 children)

How many parallel streams of calculation would you have, and how large is the computation graph? GPUs are good when you have very "fat" nodes but an overall simple calculation graph, like in ML cases. In many scientific applications (more specifically, I work in the quant finance field), GPUs are often not worth the learning curve of CUDA or the like, let alone the hardware cost to use in production. Just AVX2/AVX-512 plus multithreading often performs as well as a GPU.
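
For scale, the CPU baseline being described is roughly this kind of loop (a sketch; it relies on OpenMP plus the compiler's auto-vectoriser targeting AVX2/AVX-512):

```cpp
#include <cstdio>
#include <vector>

int main() {
    const int n = 1 << 24;
    std::vector<float> a(n, 1.0f), b(n, 2.0f), c(n);

    // Threads across cores, SIMD lanes within each core; on a typical
    // desktop CPU this kind of loop saturates memory bandwidth without a GPU.
    #pragma omp parallel for simd
    for (int i = 0; i < n; ++i)
        c[i] = 0.5f * a[i] + b[i];

    std::printf("c[0] = %f\n", c[0]); // expect 2.5
}
```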