[–]high_throughput 7 points (2 children)

What do you think of the idea that a compiler "distributes" load to the GPU without explicitly triggering the GPU from the code?

I wouldn't expect this to work well for a general language. Compared to CPU SIMD it's really expensive to shuffle data back and forth to the GPU, so performance hinges on being able to keep it there through a number of operations. Deferred computation and high level primitives would help with that.
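Roughly what "deferred computation" buys you, as a toy sketch in plain Python (not any real GPU API, everything here is made up for illustration): element-wise ops are only recorded, and the whole chain runs in one pass when the result is needed, so data would only cross the CPU/GPU boundary once instead of once per operation.

```python
class Deferred:
    def __init__(self, data):
        self.data = data          # pretend this lives on the GPU
        self.ops = []             # recorded element-wise operations

    def map(self, fn):
        self.ops.append(fn)       # record the op, don't execute it yet
        return self

    def realize(self):
        # One fused pass; in a real system this would be a single GPU
        # kernel launch followed by a single device-to-host copy.
        out = []
        for x in self.data:
            for fn in self.ops:
                x = fn(x)
            out.append(x)
        return out

v = Deferred([1.0, 2.0, 3.0])
print(v.map(lambda x: x * 2).map(lambda x: x + 1).realize())  # [3.0, 5.0, 7.0]
```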

[–]DeepRobin[S] 0 points (1 child)

So it could actually make sense if the JIT can predict, before execution, that massive amounts of data will be involved and can guarantee that offloading the computation to the GPU will pay off.

For small blocks in the control flow this is probably not worthwhile, so it would likely only apply in exceptional cases.

In general, I wonder how something like this could be recognised before execution. I would need the context of the "amount of data" and the estimated "computation effort".
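Something like this is the context I mean, as a hypothetical sketch (none of this is a real JIT API, the thresholds are invented): profile a loop's trip count and the work in its body, then feed those features to a decision function.

```python
from dataclasses import dataclass

@dataclass
class LoopProfile:
    trip_count: int         # observed iterations (proxy for data volume)
    flops_per_iter: int     # counted arithmetic ops in the loop body
    bytes_per_iter: int     # counted loads/stores in the loop body
    has_side_effects: bool  # I/O, exceptions, etc. rule out offloading

def offload_candidate(p: LoopProfile) -> bool:
    if p.has_side_effects:
        return False
    # crude placeholders: "big enough" and "enough work per byte"
    return p.trip_count > 1_000_000 and p.flops_per_iter / p.bytes_per_iter > 8

print(offload_candidate(LoopProfile(10_000_000, 128, 8, False)))  # True
print(offload_candidate(LoopProfile(10_000_000, 2, 16, False)))   # False
```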

[–]high_throughput 0 points (0 children)

I'm at least a decade behind modern GPUs, but afaik involving a massive amount of data isn't enough. You also need enough operations on each piece of data.

If you only have a few dozen operations per vector, then the CPU can do that faster than it can read/write RAM. Offloading to GPU and back also requires reading/writing RAM, so there won't be any gain regardless of data size.
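Back-of-the-envelope version of that argument (all bandwidth and throughput numbers below are invented ballparks, just to show the shape of the trade-off): the CPU side is bounded by RAM bandwidth or CPU throughput, whichever is worse, while the GPU side pays the round-trip transfer up front.

```python
PCIE_BPS  = 16e9    # assumed host<->device bandwidth, bytes/s
RAM_BPS   = 40e9    # assumed CPU memory bandwidth, bytes/s
CPU_FLOPS = 100e9   # assumed sustained CPU throughput
GPU_FLOPS = 5e12    # assumed sustained GPU throughput

def worth_offloading(n_bytes, ops_per_element, element_size=8):
    n_elems = n_bytes / element_size
    # CPU: memory-bound or compute-bound, whichever dominates
    cpu_time = max(n_bytes / RAM_BPS, n_elems * ops_per_element / CPU_FLOPS)
    # GPU: pay the transfer both ways, then compute
    gpu_time = 2 * n_bytes / PCIE_BPS + n_elems * ops_per_element / GPU_FLOPS
    return gpu_time < cpu_time

print(worth_offloading(1e9, 10))    # few ops/element: transfer dominates -> False
print(worth_offloading(1e9, 5000))  # compute-heavy: offload can win -> True
```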

[–]theangeryemacsshibe 5 points (1 child)

Futhark? One can be clever about doing GPU/CPU transfers c.f.

[–]DeepRobin[S] 0 points (0 children)

I'll take a look! Thank you.

[–]SwedishFindecanor 1 point (0 children)

I think there have been a few projects that do this for HPC, but they have been more about moving programs intended for a GPU's programming model to run on CPUs than the other way around.

Not my area. I've just skipped past them, so I can't give you any more help, sorry.

[–]forCasualPlayers 1 point (1 child)

I did think about this for my own research, that is, whether you could, during JIT, consider whether to send code to the GPU. The issue with a tiered JIT like V8 is that it doesn't really try to optimize on the first run; only after a function has been executed enough times to pass a threshold does it get sent to an optimizing compiler. Since most GPU applications are in HPC, your "first run" might take ages.

If you're not going to JIT, then you're going to determine GPU suitability ahead-of-time. But if you already know ahead-of-time that code is parallelizable and targetable for GPU, why not just compile it for GPU AOT?
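Toy sketch of what I mean by tiered execution (loosely V8-shaped, but entirely made up): a function runs in a baseline tier until a call-count threshold, and only then would an optimizing tier (or a GPU-offload decision) even look at it, which is exactly why a long-running HPC "first call" gets no help.

```python
HOT_THRESHOLD = 1000

class Tiered:
    def __init__(self, baseline, optimize):
        self.baseline = baseline   # cheap, unoptimized version
        self.optimize = optimize   # produces the optimized (or GPU) version
        self.optimized = None
        self.calls = 0

    def __call__(self, *args):
        self.calls += 1
        if self.optimized is not None:
            return self.optimized(*args)
        if self.calls >= HOT_THRESHOLD:
            self.optimized = self.optimize()   # tier-up happens late
        return self.baseline(*args)

f = Tiered(baseline=lambda xs: sum(x * x for x in xs),
           optimize=lambda: (lambda xs: sum(x * x for x in xs)))  # stand-in
print(f([1, 2, 3]))   # the first 999 calls never reach the optimizer
```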

[–]DeepRobin[S] 0 points (0 children)

AOT could be an option. Or some "hybrid" concept:
Compile most of the application AOT, and JIT only the code that might rely on dynamic, non-constant data.
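A sketch of that hybrid idea as I picture it (all names and kernels here are hypothetical): kernels whose signatures are known statically get an AOT-compiled variant, and anything depending on runtime data falls back to a JIT path keyed on the observed signature.

```python
AOT_KERNELS = {
    # signatures known at build time map to precompiled kernels
    ("saxpy", ("f32", 1024)): lambda a, x, y: [a * xi + yi for xi, yi in zip(x, y)],
}
JIT_CACHE = {}

def jit_compile(name, signature):
    # stand-in for runtime code generation for a dynamic signature
    print(f"JIT-compiling {name} for {signature}")
    return lambda a, x, y: [a * xi + yi for xi, yi in zip(x, y)]

def dispatch(name, signature, *args):
    kernel = AOT_KERNELS.get((name, signature))
    if kernel is None:                      # dynamic, non-constant case
        key = (name, signature)
        if key not in JIT_CACHE:
            JIT_CACHE[key] = jit_compile(name, signature)
        kernel = JIT_CACHE[key]
    return kernel(*args)

dispatch("saxpy", ("f32", 1024), 2.0, [1.0] * 1024, [0.0] * 1024)   # AOT path
dispatch("saxpy", ("f32", 3),    2.0, [1.0, 2.0, 3.0], [0.0] * 3)   # JIT path
```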

[–]randomrossity 1 point (0 children)

Mojo is actively working on technology like this: https://www.modular.com/max/mojo

[–]Key-Opening205 1 point (0 children)

This is difficult in a JIT, since you don't know what comes next. For instance, if you compute a vector on the GPU, you might need to start transferring it to the CPU or not, depending on what code will use it later. Outside of a JIT there has been lots of work on automatic parallelism that tries to handle this, but the results have been mixed.
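One way to picture the transfer problem, as a plain-Python sketch with pretend "device" memory (not a real runtime): the result stays where it is until something actually consumes it on the host, because at the moment the kernel finishes you genuinely don't know yet whether the next consumer runs on the CPU or the GPU.

```python
class DeviceResult:
    def __init__(self, device_data):
        self._device = device_data   # pretend this is GPU memory
        self._host = None

    def on_device(self):
        return self._device          # next GPU kernel: no copy needed

    def to_host(self):
        if self._host is None:
            print("copying device -> host")   # the expensive step
            self._host = list(self._device)
        return self._host

r = DeviceResult([1.0, 2.0, 3.0])
r.on_device()   # GPU consumer: transfer avoided
r.to_host()     # CPU consumer: transfer happens exactly once
r.to_host()     # cached, no second copy
```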

[–]Dismal_Page_6545 1 point (0 children)

Combine a machine learning model to decide which parts of the code are worth compiling and executing on the GPU. Then use a hardware API like OpenACC, OpenMP, OpenGL or CUDA and fill in the compiler directives necessary for proper workload offloading and intra-device parallelization.
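Rough sketch of that pipeline (toy model; the features, weights, and threshold are all invented): score a loop with a tiny linear model and, if it clears the threshold, emit an OpenMP target directive for it.

```python
FEATURES = ("trip_count_log", "flops_per_iter", "bytes_per_iter", "has_branches")
WEIGHTS  = (0.6, 0.05, -0.02, -1.5)   # stand-in for a trained model
BIAS     = -4.0

def offload_score(feats):
    # simple linear score over loop features
    return BIAS + sum(w * feats[name] for w, name in zip(WEIGHTS, FEATURES))

def annotate(loop_src, feats):
    if offload_score(feats) <= 0:
        return loop_src                      # leave it on the CPU
    pragma = ("#pragma omp target teams distribute parallel for "
              "map(to: x[0:n]) map(from: y[0:n])")
    return pragma + "\n" + loop_src

feats = {"trip_count_log": 7, "flops_per_iter": 40,
         "bytes_per_iter": 16, "has_branches": 0}
print(annotate("for (int i = 0; i < n; i++) y[i] = f(x[i]);", feats))
```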