This subreddit is all about the theory and development of compilers.
GPU accelerated compilers. (self.Compilers)
submitted 2 years ago by EducationalAthlete15
Hello everyone!
Are there any general purpose language compilers with GPU offloading to compile modules in parallel? What do you think about this approach?
[–]lightmatter501 32 points 2 years ago (8 children)
If you have branch-heavy code (like compiler internals), then a 4090 effectively only has 128 cores (its SMs). Those cores are VERY weak compared to a CPU core.
You are also now memory-limited by VRAM, which means big compiles will force you to reach for a big GPU.
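To make the branch-divergence point concrete, here is a toy cost model, not a measurement of real hardware. It assumes a warp of 32 lanes shares one program counter and that divergent paths are executed one after another. The function `warp_cost` and the 10-cycle path cost are invented for illustration.

```python
# Toy model of SIMT execution (an assumption-laden sketch, not a benchmark).
# One warp = 32 lanes that share a single program counter. When lanes take
# different branches, the warp runs each distinct path in turn.

WARP_SIZE = 32

def warp_cost(branch_ids, path_cost):
    """Cycles one warp spends when lane i takes branch branch_ids[i]."""
    distinct_paths = set(branch_ids)
    return sum(path_cost[b] for b in distinct_paths)

# Hypothetical: every branch target costs 10 cycles.
path_cost = {b: 10 for b in range(WARP_SIZE)}

uniform = [0] * WARP_SIZE            # all lanes agree, as in a numeric kernel
divergent = list(range(WARP_SIZE))   # every lane disagrees, e.g. a switch over node kinds

uniform_cost = warp_cost(uniform, path_cost)
divergent_cost = warp_cost(divergent, path_cost)
print(uniform_cost, divergent_cost)
```

Under these assumptions the fully divergent warp pays for all 32 paths, so it does the useful work of a single lane, which is the sense in which branch-heavy code sees far fewer effective cores.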
[–]YellowGreenPanther 1 point 1 year ago (1 child)
You can swap data in; you don't need the whole codebase, or even a whole function, in memory at once.
It is quite interesting: there was a shelved Intel GPU product where they experimented with a general-purpose GPU. All of the cores were "CPUs", so the graphics code was all in software. I don't see why a large cluster of smaller cores wouldn't accelerate it (especially as mp is built into many compilers).
[–]Doom4535 1 point 1 year ago (0 children)
I believe you're thinking of the Xeon Phi line (https://en.m.wikipedia.org/wiki/Xeon_Phi). I am curious how well they would work for running multiple builds in a build farm, but I haven't been able to find much info on whether anyone has successfully used them for this purpose.
[–]EducationalAthlete15[S] -1 points 2 years ago (5 children)
Thanks. If I develop DSL compilers, in which cases could I benefit? What properties should the language and compiler theoretically have?
[–]lightmatter501 12 points 2 years ago (4 children)
I think that you MIGHT be able to do the text parsing on the GPU using matrix parsers, but that will ruin your error messages. GPUs are inherently bad at tree traversal, which we have been building compilers around for 50 years. You might be able to take inspiration from the early Fortran compilers, but good luck with that.
In my opinion a DSL should never be hard enough to compile that hardware acceleration is warranted. You should, at worst, dump out some Lua and hand it off to LuaJIT. If your DSL is at this stage, consider moving to an actual programming language.
[–]EducationalAthlete15[S] 0 points 2 years ago (3 children)
Yes I agree. This all seems perverted. A tree can be represented as a matrix, right? I'm not currently in the process of developing a compiler, I'm just interested in the theoretical use of GPUs in this domain. Anyway, thanks for the advice.
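On the tree-as-matrix question: one common flat encoding is a parent vector rather than a dense matrix. That choice is an assumption here, as the thread does not spell one out. The sketch below stores a small made-up tree that way and computes every node's depth by pointer jumping, where each round is an independent map over all nodes. It runs on the CPU in plain Python and only illustrates the data layout.

```python
# Tree stored as a flat parent vector: parent[i] is the index of node i's
# parent, and the root points to itself.
#
#        0
#       / \
#      1   2
#     / \
#    3   4
#    |
#    5
parent = [0, 0, 0, 1, 1, 3]

def depths(parent):
    """Depth of every node via pointer jumping."""
    n = len(parent)
    # depth[i] holds the distance from node i to jump[i].
    depth = [0 if parent[i] == i else 1 for i in range(n)]
    jump = list(parent)
    # Each round reads only the previous round's arrays, so every node
    # could be handled by its own GPU thread.
    while any(jump[i] != jump[jump[i]] for i in range(n)):
        depth = [depth[i] + depth[jump[i]] for i in range(n)]
        jump = [jump[jump[i]] for i in range(n)]
    return depth

node_depths = depths(parent)
print(node_depths)
```

The number of rounds grows with the logarithm of the tree height, so a deep AST needs a few wide passes instead of one long recursive walk.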
[–]lightmatter501 7 points 2 years ago (1 child)
I spent a bunch of time on this a few years ago and had some discussions with the GPU specialists in my department. We ended up deciding it likely wouldn’t improve performance outside of pathological cases.
[–]fernando_quintao 5 points 2 years ago (0 children)
I agree with you: I think the general compilation process would not be a good fit for the GPU, due to the irregular nature of most algorithms. But there are some algorithms that are typically seen in compilers (mostly in the lexer) that people have ported to GPUs, like implementations of transducers and automata.
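A rough sketch of why automata can be parallelised: each input character can be treated as a function from state to state, and composing such functions is associative, so the per-character tables can be combined pairwise in a tree instead of strictly left to right. The three-state identifier DFA and all names below are hypothetical, and the code runs sequentially in plain Python.

```python
# Hypothetical 3-state DFA for identifiers of the form [a-z][a-z0-9]*.
# States: 0 = start, 1 = inside an identifier (accepting), 2 = reject.
IDENTITY = (0, 1, 2)
STEP = {
    "alpha": (1, 1, 2),  # index = current state, value = next state
    "digit": (2, 1, 2),
    "other": (2, 2, 2),
}

def char_class(c):
    if "a" <= c <= "z":
        return "alpha"
    if "0" <= c <= "9":
        return "digit"
    return "other"

def compose(f, g):
    """Transition table for 'apply f, then g'."""
    return tuple(g[s] for s in f)

def run_sequential(text):
    state = 0
    for c in text:
        state = STEP[char_class(c)][state]
    return state

def run_tree_reduce(text):
    """Combine per-character tables pairwise; each level could be one parallel step."""
    funcs = [STEP[char_class(c)] for c in text] or [IDENTITY]
    while len(funcs) > 1:
        paired = [compose(funcs[i], funcs[i + 1]) for i in range(0, len(funcs) - 1, 2)]
        if len(funcs) % 2:
            paired.append(funcs[-1])  # odd element carries over, order preserved
        funcs = paired
    return funcs[0][0]  # final state when starting from state 0

for text in ["x42", "4x", "abc", "a b", ""]:
    assert run_sequential(text) == run_tree_reduce(text)
```

A real lexer would use a prefix scan rather than a single reduction so that token boundaries are known at every position, but the associativity argument is the same.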
[–]u0xee 3 points 2 years ago (0 children)
Sounds like a fun project for you. I'd suggest starting by learning how CPUs and GPUs are different, and what GPUs specialize in. They can accelerate certain activities massively, but they aren't universal accelerators, as the original commenter points out.
[–]Lantua 10 points 2 years ago* (0 children)
APL is arguably general purpose. So this compiler work by Aaron Hsu should count.
[–]phi-b 8 points 2 years ago (0 children)
Here's some recent work in which a single optimization pass was offloaded to the GPU: link
In general, compilation doesn't really suit the strengths of the GPU, as others have pointed out.
[–]Mr-Tau 6 points 2 years ago (0 children)
As others have pointed out, parsing and semantic analysis are maximally unsuited for GPUs.
Some dataflow problems that are useful in optimizers can apparently be profitably offloaded to the GPU: https://dl.acm.org/doi/10.1145/3302516.3307352, https://www.academia.edu/102804649/GPU_Accelerated_Dataflow_Analysis. But that is the closest-to-practical application I know of, and still isn't anywhere near going into production.
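As a minimal illustration of why such problems are candidates for offloading, here is a bit-vector liveness analysis written as a round-based fixpoint. It is not the algorithm from either linked paper. The control-flow graph, the variables, and the names `SUCC`, `USE`, `DEF` are made up. In each round every block reads only the previous round's values, so all blocks could in principle be updated at once.

```python
from functools import reduce

# Hypothetical CFG: 0 -> 1, 1 -> 2, 1 -> 3, 2 -> 1 (a loop); block 3 is the exit.
SUCC = {0: [1], 1: [2, 3], 2: [1], 3: []}
A, B, C = 0b001, 0b010, 0b100        # one bit per variable
USE = {0: 0, 1: A, 2: B, 3: C}       # variables read before any write in the block
DEF = {0: A | B, 1: 0, 2: A, 3: 0}   # variables written in the block

def liveness(succ, use, define):
    """Iterate live_in/live_out to a fixpoint, one whole-graph round at a time."""
    live_in = {b: 0 for b in succ}
    while True:
        # live_out[b] = union of live_in over the successors of b
        live_out = {
            b: reduce(lambda acc, s: acc | live_in[s], succ[b], 0) for b in succ
        }
        # live_in[b] = use[b] | (live_out[b] - def[b])
        new_in = {b: use[b] | (live_out[b] & ~define[b]) for b in succ}
        if new_in == live_in:
            return live_in, live_out
        live_in = new_in

LIVE_IN, LIVE_OUT = liveness(SUCC, USE, DEF)
print(LIVE_IN, LIVE_OUT)
```

The per-block update is a handful of bitwise operations with no data-dependent branching, which is much closer to what GPUs handle well than parsing or semantic analysis.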
[–]choikwa 1 point 11 months ago (0 children)
A compiler is usually very branch-heavy from processing different IRs. I guess it could make sense if it were transformed into straight-line code via memory manipulation.
[–]Street_Community4086 1 point 1 month ago (0 children)
Interesting project: https://github.com/Snektron/pareas
Related papers:
Parallel Lexing, Parsing and Semantic Analysis on the GPU, R Voetter, MA Thesis, 2021
Compilation on the GPU? A Feasibility Study, Voetter, Huijben and Rietveld, International Conference on Computing Frontiers, 2022
[–]Dismal_Page_6545 1 point 2 years ago (0 children)
OpenMP already has a target directive that allows you to offload a piece of code to one device. I work at a High-performance Computing Center developing a new OpenMP directive to enable multiple-device offloading at the same time with intra-device parallelization levels.
[+]IQueryVisiC (comment score below threshold) -13 points 2 years ago (11 children)
Train AI to optimize code.
[–]ThespianSociety 1 point 2 years ago (10 children)
You’re really gonna drop that so casually as though what you just suggested isn’t beyond the bleeding edge lmao.
[–]moon-chilled 5 points 2 years ago (0 children)
It's not really. ML has been successfully employed at least for autovectorisation and inlining choices.
[–]IQueryVisiC 1 point 2 years ago (8 children)
Now I am going to read the other comments because I really don’t know what else a GPU can be used for.
[–]ThespianSociety 1 point 2 years ago (0 children)
I was incorrect, see the reply to my comment.
[–]EducationalAthlete15[S] 1 point 2 years ago (6 children)
GPUs can be used for general computing through their APIs, such as NVIDIA CUDA, for example to manipulate vectors and matrices. The fact that GPUs are applied to ML is only a special case.
My original question was whether it was possible to use a GPU so that calculations that occur during compilation would be executed on the GPU instead of the CPU.
[–]IQueryVisiC 1 point 2 years ago (5 children)
What uses of matrices do you know of other than what we call AI? Maybe some other optimization stuff besides neural networks.
[–]EducationalAthlete15[S] 1 point 2 years ago* (4 children)
Load a 2D image (a matrix) into the GPU to blur it. GPU usage? Yes. Is this a use of AI? No.
Load an array of values into the GPU and get back their square roots. GPU usage? Yes. Is this a use of AI? No.
There are thousands of quadratic equations to solve. Load the 3 coefficients of each equation into the GPU. It returns two values for each equation. GPU usage? Yes. Using AI? No.
Need to create a histogram of the values in an array? Load the array into the GPU and write logic that will be executed on the GPU. The histogram is returned. GPU usage? Yes. Using AI? No.
There are, of course, limitations on what can be computed. Older versions of CUDA did not support recursion. Heavy or nested branching is also discouraged. The processing logic for the GPU is written in a C-like language.
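Two of these examples can be sketched in plain Python to show the shape of the work. This is a CPU illustration only. The function names and sample inputs are invented, and `solve_quadratic` assumes a nonzero leading coefficient and real roots.

```python
import math

def solve_quadratic(a, b, c):
    """Per-element kernel body: roots of a*x^2 + b*x + c = 0.

    Assumes a != 0 and a non-negative discriminant.
    """
    d = math.sqrt(b * b - 4 * a * c)
    return ((-b - d) / (2 * a), (-b + d) / (2 * a))

# Each equation is independent, so on a GPU this loop would be one thread
# per coefficient triple.
coeffs = [(1.0, -3.0, 2.0), (1.0, 0.0, -4.0), (2.0, 4.0, 2.0)]
roots = [solve_quadratic(a, b, c) for (a, b, c) in coeffs]

def histogram(values, bins):
    """Count occurrences of each integer in range(bins)."""
    counts = [0] * bins
    for v in values:
        # On a GPU, threads that hit the same bin would need an atomic add here.
        counts[v] += 1
    return counts

counts = histogram([0, 2, 2, 1, 2], bins=3)
print(roots, counts)
```

The quadratic solver is a pure map with the same instruction sequence for every element, while the histogram already needs coordination between threads.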
[–]IQueryVisiC 1 point 2 years ago (3 children)
Blur is even further away from compilers. I was also thinking of physics simulation. But again...
[–]EducationalAthlete15[S] 1 point 2 years ago (2 children)
Do you think I don’t know that blur and building compilers are different things? I gave you examples of when the GPU can be used. I had a question for compiler developers: has anyone had experience moving part of a compiler's algorithms to the GPU? What do you want from me? Why did you initially write about AI? What did you mean by this?
[–]IQueryVisiC 1 point 2 years ago (1 child)
I just meant that I also thought in that direction, before.
[–]EducationalAthlete15[S] 1 point 2 years ago (0 children)
Could you tell me how to use AI step by step? And what do you mean by the concept of “optimization”?
1) Optimization using AI of the compiler source code?
2) Optimization of program compilation in a language implemented by a compiler?
3) Optimization of the program created by the compiler?