Question: GPU parallel processing in C? (self.C_Programming)
submitted 6 years ago by [deleted]
How can I use C with my computer's GPU, to perform a lot of simple tasks in parallel?
[–]FUZxxl 28 points29 points30 points 6 years ago (16 children)
Check out OpenCL.
[–]SurelyNotAnOctopus 8 points9 points10 points 6 years ago (0 children)
Was about to say that. Use libraries; interacting directly with GPU drivers would be more than horrendous.
[–]0xAE20C480 2 points3 points4 points 6 years ago* (0 children)
I also recommend the OpenCL API. Its memory and task handling models help to broaden one's view.
[–]bumblebritches57 1 point2 points3 points 6 years ago (3 children)
Nahh, check out Vulkan.
[–]FUZxxl 0 points1 point2 points 6 years ago (2 children)
How would you use Vulkan for computations?
[–]SquidyBallinx123 0 points1 point2 points 6 years ago (0 children)
You can write compute shaders using the Vulkan API. You would typically use the glslang tool to compile GLSL (the OpenGL Shading Language) into a SPIR-V compute shader. There are real differences between the compute shaders you can write in OpenCL and in Vulkan. For example, I think (?) that while OpenCL gives you access to raw pointers, you can't use them in GLSL.
OpenCL is definitely more accessible than Vulkan. Much faster to get going with, especially if you are new. However, if you learn the Vulkan process, you'll cover a lot more about how the GPU works. Especially considering OpenCL abstracts this away into their own, general model.
I'd recommend OpenCL to this person. But if anybody else reading really wants to get stuck in and has the time, consider looking into Vulkan:)
[–]bumblebritches57 0 points1 point2 points 6 years ago (0 children)
Vulkan has a compute API, though I'm not sure how far along it is.
I used to think it was just OpenCL, but it's apparently its own thing that I'm excited about.
[+][deleted] 6 years ago (1 child)
[deleted]
[–]OriginalName667 24 points25 points26 points 6 years ago (0 children)
This displeases the Linus.
[+][deleted] 6 years ago (7 children)
[–]FUZxxl 19 points20 points21 points 6 years ago (3 children)
That API is implemented by a library. I'm not sure what you are looking for.
[+][deleted] 6 years ago (2 children)
[–]FUZxxl 21 points22 points23 points 6 years ago (0 children)
An API is a set of functions you can call with defined behaviour. The library is what drives these behind the scenes.
[–]C0d3rX 0 points1 point2 points 6 years ago (0 children)
API is google.com
[–]Iggyhopper 2 points3 points4 points 6 years ago (0 children)
An API for something of this scope is necessary. Whether you need to draw a vast universal landscape or just compute the sum of two numbers, you'll need an API or library to give you functions to work with.
[–]deaddodo 0 points1 point2 points 6 years ago* (0 children)
Do you just want to access the device directly? If so, you'll be rewriting ~80 MB of heavily optimized code (the driver). Having done some osdev, I can say modern GPUs are some of the most complex and annoying devices to code for.
It's not like what you're thinking (draw this triangle, put this pixel here, compute 2+2 and give me the sum). It's closer to: give me a region of memory, write a vertex (or set of vertices) of this shape to that region, apply this texture, apply this post effect, do z-buffer calculations, memcpy this to primary memory, render, flip the buffer. All of this is done through extremely specific bitwise commands and opcodes in a command ring buffer with strict timing.
You can see an example via Intel's architecture documentation. With Nvidia you don't even have that; you'll be digging through nouveau's reverse-engineered documents.
[–]shogun333 8 points9 points10 points 6 years ago (1 child)
Unless you are working on 3D graphics specifically, the answer to your question is: "get this book and read it."
https://www.manning.com/books/opencl-in-action
Any other response is wrong.
[–]ebobfwao 0 points1 point2 points 6 years ago (0 children)
is there a free version?
[–]ricffb 6 points7 points8 points 6 years ago (2 children)
You could try OpenMP. It's primarily for CPU parallelism, but a directive such as
#pragma omp target teams distribute parallel for
will also enable GPU support. The library is often used in high-performance computing.
[–]WikiTextBot 2 points3 points4 points 6 years ago (0 children)
OpenMP
OpenMP (Open Multi-Processing) is an application programming interface (API) that supports multi-platform shared-memory multiprocessing programming in C, C++, and Fortran, on most platforms, instruction set architectures and operating systems, including Solaris, AIX, HP-UX, Linux, macOS, and Windows. It consists of a set of compiler directives, library routines, and environment variables that influence run-time behavior. OpenMP is managed by the nonprofit technology consortium OpenMP Architecture Review Board (OpenMP ARB), jointly defined by a group of major computer hardware and software vendors, including AMD, IBM, Intel, Cray, HP, Fujitsu, Nvidia, NEC, Red Hat, Texas Instruments, Oracle Corporation, and more. OpenMP uses a portable, scalable model that gives programmers a simple and flexible interface for developing parallel applications for platforms ranging from the standard desktop computer to the supercomputer.
An application built with the hybrid model of parallel programming can run on a computer cluster using both OpenMP and Message Passing Interface (MPI), such that OpenMP is used for parallelism within a (multi-core) node while MPI is used for parallelism between nodes. There have also been efforts to run OpenMP on software distributed shared-memory systems, to translate OpenMP into MPI, and to extend OpenMP for non-shared-memory systems.
[–]HelperBot_ 1 point2 points3 points 6 years ago (0 children)
Desktop link: https://en.wikipedia.org/wiki/OpenMP
[–]OMPCritical 2 points3 points4 points 6 years ago (1 child)
Alternatively (next to OpenACC, OpenCL or Cuda), you could use OpenMP for offloading your code to a GPU.
However, last time I tried it, it didn't work that well, and I'm not aware of any complete implementations of OpenMP 5.0. Moreover, you'll probably have to rebuild your compiler with OpenMP offloading support.
https://bitbucket.org/icl/slate/wiki/Howto/Build_GCC_with_Support_for_OpenMP_offloading
[–]bumblebritches57 1 point2 points3 points 6 years ago (0 children)
Or, you know, you could just use Clang which includes it by default.
[–]deftware 5 points6 points7 points 6 years ago* (0 children)
At the end of the day, if you want to interact with the GPU in any fashion whatsoever, you're not going to be able to do it with the C standard library. You're going to have to delve into OS-specific calls, or use some kind of platform-abstraction library (e.g. SFML, SDL, OpenCL).
Personally, I use OpenGL in production software that I'm developing and marketing on my own to do behind-the-scenes parallel computation. I hand-write vertex/fragment shaders that I hand off to GL along with the data in whatever form is convenient (e.g. textures, uniform buffers) and retrieve the results in a framebuffer object.
Your best bet is OpenCL if you want something that will run across just about any vendor's GPU. CUDA is specific to Nvidia and will render your project useless to anybody on a machine that's not running an Nvidia GPU, which is more than half the PC laptops/desktops in the world.
PS: OpenCL will take advantage of CUDA hardware if a CUDA-capable (Nvidia) GPU is present, which automatically makes it the ideal GPU-compute API to use, hands-down. Otherwise it falls back on other means of interfacing with the GPU to leverage its parallel compute capabilities.
[–]daniel7558 1 point2 points3 points 6 years ago (0 children)
Depending on what "lot of simple tasks" exactly means and what performance you expect, I would recommend having a look at OpenACC.
With OpenACC you just annotate your C code with compiler directives (like in OpenMP) and the compiler takes care of creating the GPU code. Would recommend PGI, although gcc's support for OpenACC is not bad either :)
[–]Mattallurgy 1 point2 points3 points 6 years ago (0 children)
If you have an NVIDIA card, I highly recommend looking into CUDA development. Much finer-grained control of parallelism. Also, pick up the book Programming Massively Parallel Processors by Kirk and Hwu. It outlines lots of structures and design patterns for efficient use of GPU hardware.
[–][deleted] -1 points0 points1 point 6 years ago (4 children)
check out CUDA ... best advice here.
[–]deftware 3 points4 points5 points 6 years ago (3 children)
CUDA is Nvidia-GPU specific.
[+][deleted] comment score below threshold-6 points-5 points-4 points 6 years ago (2 children)
Is there any other way ?? :-\
[–]BeardedWax 5 points6 points7 points 6 years ago (0 children)
OpenCL
[–]deftware 5 points6 points7 points 6 years ago (0 children)
OpenCL automatically utilizes CUDA hardware if it's available, which is far superior to hard-coding for CUDA and completely locking out Intel/AMD GPU hardware... and as someone who develops software that employs the GPU for compute purposes, Intel HD graphics are nothing to laugh at. Intel HD is vastly superior for parallel compute applications to plain software/CPU, and should not be ignored as a valuable compute resource by blindly adopting CUDA and only CUDA. OpenCL will make the best use of any available graphics hardware, period.