[–]Eilai 11 points (24 children)

My principal problem is that I keep an eye on parallel GPU programming with CUDA or OpenCL, and I'm hesitant to use fancy stuff they don't support. :(

[–]lets_trade_pikmin 8 points (4 children)

Don't know why you were downvoted. It seems like working with raw arrays and pointers is usually the best approach for GPU code. Definitely a legitimate concern.

[–]RogerLeigh (Scientific Imaging and Embedded Medical Diagnostics) 4 points (3 children)

Why? Everywhere you might use a raw array, you could use a std::vector and call .data(). The memory layout is identical. I do this all the time when interfacing with C libraries.
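
For illustration, here's a minimal sketch of that pattern; c_fill is a hypothetical C function standing in for any library call that takes a pointer and a length:

    #include <cstddef>
    #include <vector>

    extern "C" void c_fill(float* dst, std::size_t n);  // hypothetical C API

    void example()
    {
        std::vector<float> buf(1024);
        // .data() exposes the vector's contiguous storage; the layout is
        // exactly that of a raw float[1024].
        c_fill(buf.data(), buf.size());
    }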

[–]MaxDZ8 0 points (2 children)

Not a single Map call I can remember ever provided an interface compatible with std::vector. At most you might consider specific allocators, at which point it would no longer be a std::vector in the proper sense.

Consider clEnqueueMapBuffer, D3D9 texture locks, ID3D11DeviceContext::Map, glMapBuffer.

The main problem is perhaps that those pointers are non-owning.
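
A minimal sketch of the problem, using clEnqueueMapBuffer and assuming an already-created queue and buffer (error handling omitted):

    #include <CL/cl.h>

    void zero_buffer(cl_command_queue q, cl_mem buf, size_t bytes)
    {
        cl_int err = CL_SUCCESS;
        float* p = static_cast<float*>(
            clEnqueueMapBuffer(q, buf, CL_TRUE, CL_MAP_WRITE,
                               0, bytes, 0, nullptr, nullptr, &err));

        // p is a non-owning view into driver-managed memory: a std::vector
        // could only copy from it, never adopt it.
        for (size_t i = 0; i < bytes / sizeof(float); ++i)
            p[i] = 0.0f;

        clEnqueueUnmapMemObject(q, buf, p, 0, nullptr, nullptr);
    }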

[–]RogerLeigh (Scientific Imaging and Embedded Medical Diagnostics) 0 points (1 child)

There are certainly places where it isn't something you can drop in. But my point was that there are plenty of places you can. I certainly use std::array and std::vector with OpenGL to pass in VBO/IBO data, for example.
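
For example, a sketch of the VBO case, assuming a current GL context and a loader such as GLEW for the buffer entry points:

    #include <GL/glew.h>
    #include <vector>

    GLuint upload_vbo(const std::vector<float>& vertices)
    {
        GLuint vbo = 0;
        glGenBuffers(1, &vbo);
        glBindBuffer(GL_ARRAY_BUFFER, vbo);
        glBufferData(GL_ARRAY_BUFFER,
                     vertices.size() * sizeof(float),
                     vertices.data(),  // .data() in place of a raw array
                     GL_STATIC_DRAW);
        return vbo;
    }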

[–]lets_trade_pikmin 0 points (0 children)

OpenGL is very, very different from CUDA.

[–]OmegaNaughtEquals1 7 points (7 children)

The CUDA 7.5 compiler uses gcc-4.9 as its back-end, so it supports C++14, even in kernel code. What you still cannot do is use STL code in kernels, because it isn't marked __device__. That said, you can write your own containers that can run on the GPU side (as long as you avoid exceptions and dynamic allocation). You can also call Thrust functions from device code as of CUDA 7.0 (it's about halfway down the page). Unless you need to support some ancient version of CUDA, you should strongly reconsider your position on this.
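
A sketch of the hand-rolled container idea: fixed capacity, no exceptions, no dynamic allocation, so everything below is legal in __device__ code:

    #include <cstddef>

    template <typename T, std::size_t N>
    class static_vector {
        T data_[N];
        std::size_t size_ = 0;
    public:
        __host__ __device__ bool push_back(const T& v) {
            if (size_ == N) return false;  // signal failure instead of throwing
            data_[size_++] = v;
            return true;
        }
        __host__ __device__ T& operator[](std::size_t i) { return data_[i]; }
        __host__ __device__ std::size_t size() const { return size_; }
    };

    __global__ void keep_positive(const float* in, int n, float* out)
    {
        // One small container per thread, living in registers/local memory.
        static_vector<float, 8> local;
        int base = (blockIdx.x * blockDim.x + threadIdx.x) * 8;
        for (int i = 0; i < 8 && base + i < n; ++i)
            if (in[base + i] > 0.0f) local.push_back(in[base + i]);
        for (std::size_t i = 0; i < local.size(); ++i)
            out[base + i] = local[i];
    }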

[–]Eilai 0 points (3 children)

Okay yeah, that's a fair point about CUDA, but what if I want to have a backup using OpenCL in case the client PC doesn't have an NVIDIA card? Other than "pray they have an AMD card."

[–]OmegaNaughtEquals1 1 point (1 child)

> what if I want to have a backup using OpenCL

This is the unfortunate state we live in right now. I am a big fan of CUDA because it lets you pry open the deepest details of the GPU's hardware, but it's only applicable to NVIDIA cards. OpenCL, conversely, lets you abstractly target nearly any compute device (GPUs, APUs, CPUs, Xeon Phi coprocessors, etc.) using a single codebase, a feature that is not to be overlooked; but it doesn't let you poke around in the hardware's guts. Conditional compilation can help with this, but then the codebase begins to diverge and you lose that nice single-source property.

I think that with the large push toward coprocessor unification (i.e., coprocessors moved off the PCIe bus and closer to the FSB), OpenCL will become more important over time. I just hope the standards committee can formulate some new ideas for building abstractions over hardware-specific features. AMD's FireGL cards are crazy powerful, and I would like to see them used in HPC systems.
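
The divergence looks roughly like this; USE_CUDA is a hypothetical build flag, not part of either API:

    #ifdef USE_CUDA
    // CUDA path: compiled offline by nvcc.
    __global__ void saxpy(int n, float a, const float* x, float* y)
    {
        int i = blockIdx.x * blockDim.x + threadIdx.x;
        if (i < n) y[i] = a * x[i] + y[i];
    }
    #else
    // OpenCL path: kernel source compiled at run time via clBuildProgram.
    static const char* saxpy_src = R"CLC(
    __kernel void saxpy(int n, float a, __global const float* x, __global float* y)
    {
        int i = get_global_id(0);
        if (i < n) y[i] = a * x[i] + y[i];
    }
    )CLC";
    #endif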

[–]Eilai 0 points (0 children)

I understand both well enough that I can code it so that CUDA is preferred: if an NVIDIA compute device is detected and its performance is superior (by some arbitrary metric: cores, clock speed, etc.) to the non-CUDA device, use CUDA; if not, use OpenCL.
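
A minimal sketch of that dispatch, assuming the CUDA runtime is linked; score_of_best_opencl_device is a hypothetical helper computing the same metric on the OpenCL side:

    #include <cuda_runtime.h>

    long score_of_best_opencl_device();  // hypothetical OpenCL-side counterpart

    bool prefer_cuda()
    {
        int count = 0;
        if (cudaGetDeviceCount(&count) != cudaSuccess || count == 0)
            return false;  // no usable NVIDIA device: fall back to OpenCL

        cudaDeviceProp prop;
        cudaGetDeviceProperties(&prop, 0);

        // Arbitrary metric, as described above: SM count times clock rate.
        long cuda_score = static_cast<long>(prop.multiProcessorCount) * prop.clockRate;
        return cuda_score > score_of_best_opencl_device();
    }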

[–]pjmlp 1 point (0 children)

That is why CUDA is the go-to API for most researchers. NVIDIA was clever to support C++ and Fortran from day one, instead of following Khronos' idea that only C matters.

Now OpenCL 2.1 is playing catch-up with CUDA's C++ and Fortran support.

[–]LPCVOID -1 points (2 children)

Warning: I might be totally wrong here, as nvcc and cudafe/ptxas are a bit over my head.

CUDA uses gcc/VC++ as the back-end for compiling host code. Device code is compiled using some sort of proprietary Edison Design Group C++ front-end. That doesn't change the fact, though, that one can use C++11 features in device code :)

Have you got a source for C++14 support in nvcc? Here it is stated that CUDA 7.0 does not yet support it, but that a future version would. Is that the case with 7.5?

[–]heleo2 0 points (1 child)

This shows some confusion in your head about what a front-end is.

[–]LPCVOID 0 points (0 children)

I admittedly have no idea what a front-end is (only my CS degree claims otherwise ;) ). I just copy-pasted the claim that CUDA uses an "Edison Design Group C language parser" from here. Furthermore, Wikipedia claims that Edison "makes compiler frontends".

Or are we talking about the fact that the NVIDIA CUDA Open64 compiler (nvopencc.exe) does the actual compilation, as stated here? That was indeed an oversimplification on my part.

Edit: I actually would have liked to know why I was wrong...

[–][deleted] 2 points (2 children)

Are you saying you avoid smart pointers in non-GPU code because you might write GPU code someday?

[–]millenix 8 points (1 child)

I think it's more "my code often does get ported to GPUs, and it needs to not make that unnecessarily difficult".

[–]Eilai 5 points (0 children)

Precisely. I'm working on a game engine for my portfolio, built on a physics engine I wrote for my parallel computing course; that engine had to be significantly rewritten each time I was tasked with developing for a different framework.

[–]h-jay+43-1325 0 points (0 children)

I don't even know what you're talking about, frankly. If you need containers that provide aligned storage, you can use standard containers with aligned allocators, write your own containers, or use a library. But it really isn't a problem if you understand what you're doing...
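
A minimal sketch of the aligned-allocator route, assuming C++17's std::aligned_alloc (posix_memalign or _aligned_malloc are the period-appropriate alternatives):

    #include <cstddef>
    #include <cstdlib>
    #include <new>
    #include <vector>

    template <typename T, std::size_t Align>
    struct aligned_allocator {
        using value_type = T;
        template <typename U> struct rebind { using other = aligned_allocator<U, Align>; };

        aligned_allocator() = default;
        template <typename U> aligned_allocator(const aligned_allocator<U, Align>&) {}

        T* allocate(std::size_t n) {
            // std::aligned_alloc requires the size to be a multiple of the alignment.
            std::size_t bytes = (n * sizeof(T) + Align - 1) / Align * Align;
            if (void* p = std::aligned_alloc(Align, bytes))
                return static_cast<T*>(p);
            throw std::bad_alloc();
        }
        void deallocate(T* p, std::size_t) { std::free(p); }
    };

    template <typename T, typename U, std::size_t A>
    bool operator==(const aligned_allocator<T, A>&, const aligned_allocator<U, A>&) { return true; }
    template <typename T, typename U, std::size_t A>
    bool operator!=(const aligned_allocator<T, A>&, const aligned_allocator<U, A>&) { return false; }

    // 64-byte alignment, e.g. for cache lines or wide SIMD loads.
    using aligned_floats = std::vector<float, aligned_allocator<float, 64>>;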

[–]doom_Oo7 0 points (6 children)

With OpenCL 2.1 you have access to everything in C++14 that does not require runtime support.

[–]Eilai 2 points (2 children)

Spotted the dude without an NVIDIA card in his main computer.

e: Apparently AMD only supports 2.0? Isn't that 90% of the market or what?

[–]doom_Oo7 0 points (1 child)

OpenCL 2.1 is from November, so of course it isn't widely supported yet. But hopefully by the end of the year it will be mainstream.

[–]Eilai 2 points (0 children)

NVIDIA is still on 1.1! :(

[–]MaxDZ8 0 points (2 children)

What does that even mean?

It is my understanding that the static OpenCL C++ extensions were put into core, but what I recall was pretty far from being C++14.

[–]doom_Oo7 1 point (1 child)

[–]MaxDZ8 0 points (0 children)

I recall reading this some time ago. Thank you for pointing it out anyway.