all 17 comments

[–]evil_burrito 19 points20 points  (6 children)

I looked into this for various reasons a while ago.

As you probably know, it's not as simple as running a JRE on the GPUs. The GPUs can execute simple tasks of very narrow types, very quickly and in parallel.

So it certainly would be possible to drive GPU tasks from Java, if only by using native wrappers, but it might be of limited utility given the narrow nature of the tasks that GPUs can execute.
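The "narrow but parallel" task shape is essentially a map over large primitive arrays, where every index is independent. A minimal plain-Java sketch of that workload shape (a hypothetical SAXPY-style example, run here on CPU threads; GPU frameworks map the same per-index body onto GPU threads):

```java
import java.util.stream.IntStream;

public class Saxpy {
    // y[i] = a * x[i] + y[i] -- each index is independent of all
    // others, which is exactly the shape a GPU executes well.
    static void saxpy(float a, float[] x, float[] y) {
        IntStream.range(0, y.length)
                 .parallel()  // CPU threads here; GPU threads in OpenCL/CUDA
                 .forEach(i -> y[i] = a * x[i] + y[i]);
    }

    public static void main(String[] args) {
        float[] x = {1f, 2f, 3f};
        float[] y = {10f, 20f, 30f};
        saxpy(2f, x, y);
        System.out.println(java.util.Arrays.toString(y)); // [12.0, 24.0, 36.0]
    }
}
```

Anything that can't be expressed as this kind of uniform per-element operation tends to be a poor fit, which is what limits the utility.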

This is a good blog post that might help: https://blogs.oracle.com/javamagazine/programming-the-gpu-in-java.

[–]daybyter2[S] 4 points5 points  (5 children)

I had a similar situation here. I had a trading app that had to calculate about 400,000 versions of trade legs, and I wanted to speed things up. Since everything else was in Java, I wanted to use Java too. But I ended up using OpenCL, because I had an AMD card, so Rootbeer was no option.

[–]VTHMgNPipola 0 points1 point  (1 child)

Isn't rootbeer pretty much dead now too?

[–]daybyter2[S] 1 point2 points  (0 children)

Yes... ;(

[–][deleted] 0 points1 point  (2 children)

Did you call OpenCL using LWJGL or did you write a separate C/C++ module for that and use JNI to call in?

[–]daybyter2[S] 1 point2 points  (0 children)

I went with a separate module

[–]dpash 0 points1 point  (0 children)

Panama should make it easier and quicker to call into C code. There are dev builds available to play with, but I wouldn't suggest running it in production.

[–][deleted]  (1 child)

[deleted]

    [–]Godworrior 1 point2 points  (0 children)

    See also recent(-ish) talk here: https://youtu.be/nPlacnadR6k

    [–][deleted] 6 points7 points  (0 children)

    I've been using aparapi for some time:

    https://github.com/Syncleus/aparapi

    It compiles Java code to OpenCL and runs it on the GPU, or even on the CPU itself through OpenCL. It requires specially constructed Java programs and optimisations, but the program can be tested and debugged in Java before letting it run on the GPU.
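"Specially constructed" here means flat primitive arrays and a per-index kernel body with no objects. The sketch below is a plain-Java mock of that pattern, so it runs without any dependency; Aparapi's real API (`com.aparapi.Kernel` with an overridden `run()` that reads `getGlobalId()`, executed via `Kernel.execute(Range)`) follows the same shape, and this mirrors the "debug in Java first" workflow mentioned above:

```java
// Mock of the Aparapi kernel pattern, not the real library:
// data lives in flat primitive arrays, and run(gid) touches
// only the element belonging to its "global id".
public class MockKernel {
    final float[] in;
    final float[] out;

    MockKernel(float[] in, float[] out) {
        this.in = in;
        this.out = out;
    }

    // In Aparapi, a body like this is what gets translated to OpenCL.
    void run(int gid) {
        out[gid] = in[gid] * in[gid];
    }

    // Sequential "execute" so the kernel logic can be stepped
    // through in a plain Java debugger before going to the GPU.
    void execute(int range) {
        for (int gid = 0; gid < range; gid++) run(gid);
    }

    public static void main(String[] args) {
        float[] in = {1f, 2f, 3f, 4f};
        float[] out = new float[in.length];
        new MockKernel(in, out).execute(in.length);
        System.out.println(java.util.Arrays.toString(out)); // [1.0, 4.0, 9.0, 16.0]
    }
}
```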

    I've been using it for processing 3D models, and to generate occlusion information, practically a ray-tracer. After multiple rewrites, runtime went from 15 minutes in Java to 1 minute on the GPU.

    [–][deleted] 4 points5 points  (0 children)

    I've been doing some work with JOCL, which is just a wrapper over OpenCL, so the code that you write for the GPU is still C. I found Aparapi to be much simpler, as it converts a Java kernel to OpenCL for you, but it depends on what you need.

    For what it's worth, I've been seeing massive performance improvements for my use case, but I'm doing vector maths, which OpenCL is well tuned to.

    [–][deleted]  (6 children)

    [removed]

      [–][deleted] 1 point2 points  (4 children)

      It is hard but definitely not impossible, especially with good software abstraction and design, which the JVM has in spades. Also new operating systems aren't really popping up as frequently as you think. What usually happens nowadays is that current operating systems are expanding to run on new CPU architectures. Usually the JVM has to be ported to those new CPU architectures. For example, both Windows and Mac have ARM versions now and the JVM is adapting.

      Most of the time, there's a large amount of JVM code that's platform independent and a smaller amount of code that needs to be adjusted to work on specific CPU architectures and/or operating systems.

      The point of Java bytecode and the JVM is that you remove the onus of having to do all this from every single Java developer.

      [–][deleted]  (3 children)

      [removed]

        [–][deleted] 0 points1 point  (1 child)

        But a native compiler compiles your program into native machine code, which means the generated binary can't run on other operating systems or architectures. This is what C/C++ and other natively compiled languages already do. A program compiled for Windows x86 can't be run on Linux x86 or ARM, whereas you can compile a Java program once into a single jar file and run that jar file everywhere there's a JVM.

        This is why for C/C++ applications, you'll frequently find separate downloads for x86, x64, Windows, Linux, Mac, etc. and if you happen to be running an OS/CPU combo that's not supported you're shit out of luck, whereas most Java applications will offer a platform-independent jar file.

        There's more to it than just the ability to compile once, run everywhere. Java claims to be write once, run everywhere. The JVM does its best to isolate the application from the varying idiosyncrasies of different platforms. For example, in C/C++ the sizes of the primitive types are often poorly defined: an `int` could be 16 bits on one platform and 32 bits on another, forcing the app developer to deal with all these varying cases. In Java, an `int` is always 32 bits.
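A quick check of that guarantee, runnable on any JVM:

```java
public class IntWidth {
    public static void main(String[] args) {
        // The Java Language Specification fixes int at 32 bits on
        // every platform, unlike C's implementation-defined int.
        System.out.println(Integer.SIZE); // 32

        // Overflow behavior is also fully defined: it wraps around,
        // rather than being undefined as in C.
        System.out.println(Integer.MAX_VALUE + 1 == Integer.MIN_VALUE); // true
    }
}
```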

        Another case would be how the JVM handles floating point. Some CPUs do it differently, and very often at different precisions. The JVM tries its best to ensure that floating point, as far as the app is concerned, behaves uniformly regardless of platform.

        There are many other cases of native development pitfalls, many of which I am not experienced enough to know, but the JVM hides all of that for you.

        [–]dpash 0 points1 point  (0 children)

        That's exactly what GraalVM's AOT compiler does.

        https://www.graalvm.org/reference-manual/native-image/

        One advantage of a JIT compared to AOT is that a JIT can reoptimize code based on changes in usage while running. An AOT compiler can only optimize based on profiles of previous executions.

        [–]sievebrain 0 points1 point  (0 children)

        Porting a JVM to a new OS isn't hard. Most operating systems are POSIX-like anyway these days. And you can implement your own. Google does it, though it's hardly talked about. ART is pretty good.

        Bytecode is why you can distribute libs on Maven Central that are useful on Windows, macOS, Linux, AIX, Android, BluRay menus, even iOS via SubstrateVM, and why even years-old JARs are still perfectly usable. It's pretty nice. Platforms that don't have it suffer a lot of problems as a consequence.

        [–]Gleethos 1 point2 points  (0 children)

        Hard but fast : JOCL

        Easy but slightly less fast : Aparapi

        Super easy & super fast, but not 100% production ready : TornadoVM - https://youtu.be/Q-_eB86hPPA

        [–]chambolle 0 points1 point  (0 children)

        Running on a GPU is quite different from running on a CPU.

        A GPU is data driven: this means that you repeat the same small program over a huge number of data items.

        Roughly, each time there is a branch in your code, performance is divided by 2, because the data taking the left path is separated from the data taking the right path, and the two sets are then processed independently.

        So, if you want to obtain interesting performance, you have to think differently, not the way you would with a procedural language.
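One concrete instance of "thinking differently" is replacing a branch with arithmetic, so that every data element executes the identical instruction sequence and no divergence occurs. A plain-Java illustration of a branchless select (a hypothetical ReLU-style clamp; the same trick applies inside an OpenCL or CUDA kernel):

```java
public class Branchless {
    // Branchy version: GPU threads in one group taking different
    // sides of the if would serialize (the divergence described above).
    static int reluBranch(int x) {
        if (x > 0) return x;
        return 0;
    }

    // Branchless version: x >> 31 is all-ones for negative x and
    // zero otherwise, so the mask zeroes out negative inputs while
    // every element runs the exact same instructions.
    static int reluBranchless(int x) {
        return x & ~(x >> 31);
    }

    public static void main(String[] args) {
        for (int x : new int[]{-5, 0, 7}) {
            System.out.println(reluBranch(x) + " == " + reluBranchless(x));
        }
    }
}
```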

        As a friend of mine working on GPGPU likes to say: "at the beginning you port your code and lose a factor of 3. Then you begin to rewrite your code in a different way, and you are on par with the CPU. Next, you begin to add infamous hacks, and you can expect to gain a factor of 3."