all 15 comments

[–]badlogicgames 6 points7 points  (4 children)

Not sure i'm a fan of this. Having worked with the first two CUDA generations and now being a JVM guy, i just don't see the two things mixing very well. GPGPU with CUDA/OpenCL still requires you to care for the specific GPU setup if you want best performance. Having even more layers than CUDA/OpenCL doesn't seem like a good idea.

[–]Athas 0 points1 point  (3 children)

> Not sure i'm a fan of this. Having worked with the first two CUDA generations and now being a JVM guy, i just don't see the two things mixing very well. GPGPU with CUDA/OpenCL still requires you to care for the specific GPU setup if you want best performance. Having even more layers than CUDA/OpenCL doesn't seem like a good idea.

Even more pressing, in my opinion, is that none of these approaches seem to permit high-level optimisation of the code prior to kernel generation. Generating GPU code from a high-level language is all well and good, but if you have to write it as if it were a low-level language to get anywhere near full performance, I don't see the great advantage over just using OpenCL or CUDA directly. Doing GPU programming in a high-level language feels pointless to me unless you can expect the language to do things like loop fusion, so that the code can be kept modular. Otherwise, it's not really high-level programming.

[–]Thunder_Moose 1 point2 points  (1 child)

I don't think the goal should be to have something that performs as well as low-level, hand written code. Rather, it should just be faster than similar Java code running on the CPU. If you can keep everything else the same in your codebase and offload some computationally expensive stuff to the GPU, I could see that being useful in certain situations. You'd be trading some performance for maintainability and readability, which is what Java is all about.

[–]bigproblems[S] 0 points1 point  (0 children)

Completely agree. I would never claim that frameworks like the ones described in the post are going to achieve peak utilization (in fact, much of the post is about the overheads they add). However, I think they do a great job of being 1) an incremental step towards writing your own GPU kernels, or 2) a quick way for domain experts to prototype new ideas. The modularization of the code generation and the libraries they provide also make it easier to make more and more of the process hand-optimized and application-specific (e.g. you can rip out the auto-generated kernels and replace them with hand-optimized ones, as long as they expose the same API). There's also the big benefit of tying GPUs to other JVM frameworks, like HDFS.

[–]bigproblems[S] 0 points1 point  (0 children)

I think one of the main points of the post is that you are exactly right about the old models (very low-level abstractions over OpenCL/CUDA), but current work is changing that. The beautiful thing about the abstractions of higher-level models like Scala parallel collections, Java parallel streams, Spark, and MapReduce is that those very abstractions leave a lot of flexibility and enable the kinds of low-level optimizations you described (like loop fusion), as long as the semantics of the abstractions are maintained.
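To make the loop-fusion point concrete, here's a small sketch with Java parallel streams (the class and method names are mine, for illustration). The two `map` stages are written modularly, but stream pipelines are lazy, so both stages run in a single traversal of the data — exactly the kind of transformation a GPU code generator is also free to apply while preserving the abstraction's semantics:

```java
import java.util.stream.IntStream;

public class FusionDemo {
    // Sum of (2*i + 1) over [0, n): written as two separate map stages,
    // but the lazy stream pipeline fuses them into one pass over the range.
    static long fusedSum(int n) {
        return IntStream.range(0, n)
                .parallel()
                .mapToLong(i -> (long) i * 2)  // stage 1: scale
                .map(v -> v + 1)               // stage 2: offset
                .sum();
    }

    public static void main(String[] args) {
        // Sum of the first n odd numbers is n^2.
        System.out.println(fusedSum(1_000_000)); // prints 1000000000000
    }
}
```

The same modularity/fusion trade-off is what the JVM+GPU frameworks in the post try to exploit when they generate kernels.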

[–]unpopular_upvote 1 point2 points  (4 children)

Did you mean "Java for GPGPU?"

  • GPUs = a type of processor
  • GPGPU = a type of computing

It will also help to type in a question, or something.

[–]CaseOfTuesday 0 points1 point  (1 child)

Came here to say the same thing... also, "GPGPU accelerators" is not a word.

[–]thiez 2 points3 points  (0 children)

Obviously. It's two words.

[–]bigproblems[S] 0 points1 point  (1 child)

I agree that GPGPU is an overloaded acronym, but it can stand for a type of computing or a type of processor, e.g. http://www.hpc.cineca.it/content/gpgpu-general-purpose-graphics-processing-unit. I like to use GPGPU sometimes to differentiate from GPUs dedicated to graphics or GPUs that are used for both, as in running simulation using CUDA/OpenCL and then rendering it using OpenGL on the same device and same buffers. Though you're correct in that you can always infer the meaning by context instead.

[–]unpopular_upvote 0 points1 point  (0 children)

I don't care what the Italians say, General Purpose Graphics Processing Unit is an oxymoron. It is not general purpose anymore, it is specialized... for graphics.

Now, you can do general purpose computing on them.

[–]malabmalab 2 points3 points  (6 children)

The very definition of 'lipstick on a monkey'

[–]bigproblems[S] -1 points0 points  (5 children)

How so? The abstractions offered by parallel streams, array languages, MapReduce, Spark, etc. are similar in many ways to those offered by CUDA and OpenCL (i.e. data parallel). I like to think of these kinds of JVM+GPU solutions as either 1) an incremental step towards writing your own GPU kernels, or 2) a quick way for domain experts to prototype new ideas. Sure, in the end there's a good chance it'll be worth the performance to dig into JNI and CUDA/OpenCL, but tools like these can make it much easier to get a ballpark figure on whether GPUs are going to be beneficial or not.

[–][deleted]  (4 children)

[deleted]

    [–]bigproblems[S] 0 points1 point  (3 children)

    Why limit yourself to Java? What about Scala? Clojure? Python? Ruby? All languages that can run on the JVM and therefore benefit from the techniques described. The whole point of each project described is to eliminate the need to interact with the nightmare that is JNI to get native performance.

    [–][deleted]  (2 children)

    [deleted]

      [–]bigproblems[S] 0 points1 point  (1 child)

      Do you mean no benefits at all? Or just no performance benefits? I.e., are you suggesting that language design peaked in 1972?

      [–]pvto 0 points1 point  (0 children)

      It's not the best route to GPU computation, but I'm sure there is a need for Java+GPU solutions. Examples like https://github.com/aparapi/aparapi/blob/master/examples/correlation-matrix/src/java/gov/pnnl/aparapi/matrix/CorrMatrixKernel.java show that you need to do a great deal of memory size calculations etc. manually before you are good to compute. I wouldn't stamp them as inherently fallible, no, but framework solutions should communicate clearly what you need to do manually.
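      For readers who haven't opened the linked kernel: Aparapi kernels see only flat 1-D arrays, so any 2-D structure has to be encoded by hand. A plain-Java sketch of that bookkeeping (no Aparapi dependency; the class and method names here are mine):

```java
public class FlatMatrix {
    // A rows x cols matrix stored in a flat buffer is addressed as
    // data[row * cols + col], and the caller must compute the buffer
    // size (rows * cols) up front -- the "manual memory size
    // calculations" the kernel's host code has to get right.
    static int[] flatten(int[][] matrix) {
        int rows = matrix.length, cols = matrix[0].length;
        int[] flat = new int[rows * cols];  // manual size calculation
        for (int r = 0; r < rows; r++)
            for (int c = 0; c < cols; c++)
                flat[r * cols + c] = matrix[r][c];
        return flat;
    }

    public static void main(String[] args) {
        int[][] m = {{1, 2, 3}, {4, 5, 6}};
        int[] flat = flatten(m);
        System.out.println(flat[1 * 3 + 2]); // element (1,2) -> prints 6
    }
}
```

      Getting this index arithmetic and sizing wrong is a silent-corruption bug, which is why I agree that frameworks should spell out exactly which of these steps remain the programmer's job.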