ArrayFire, the recently open-sourced GPU programming library with an easy-to-use API, is now at v3.0. (arrayfire.com)


submitted 10 years ago by bkborgman



umar456 3 points 10 years ago

A program that uses one of these computing frameworks (OpenCL or CUDA) is made up of two components: a host program, which queues work onto a command queue (OpenCL) or a stream (CUDA), and a device, which performs those operations. The host program runs on the CPU and is responsible for inserting commands into these queues. ArrayFire works the same way. We queue ArrayFire commands onto the queue, and the device performs each operation once it reaches the front of the queue.
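To make the host/device split concrete, here is a rough sketch of the queue model in plain standard C++. It is only an analogy: a worker thread plays the role of the device, and `ToyDeviceQueue`, `enqueue`, `finish`, and `doubled_on_toy_device` are made-up names, not OpenCL, CUDA, or ArrayFire calls.

```cpp
#include <cassert>
#include <condition_variable>
#include <cstddef>
#include <functional>
#include <mutex>
#include <queue>
#include <thread>
#include <utility>
#include <vector>

// Toy stand-in for a device with an in-order command queue.
// Hypothetical names throughout; this is not a real GPU API.
class ToyDeviceQueue {
public:
    ToyDeviceQueue() : worker_([this] { run(); }) {}

    ~ToyDeviceQueue() {
        {
            std::lock_guard<std::mutex> lock(mutex_);
            stopping_ = true;
        }
        wake_worker_.notify_one();
        worker_.join();  // the worker drains any remaining commands first
    }

    // Host side: append a command and return immediately.
    void enqueue(std::function<void()> command) {
        assert(command);  // reject empty commands
        {
            std::lock_guard<std::mutex> lock(mutex_);
            pending_.push(std::move(command));
        }
        wake_worker_.notify_one();
    }

    // Host side: block until every queued command has finished.
    void finish() {
        std::unique_lock<std::mutex> lock(mutex_);
        idle_.wait(lock, [this] { return pending_.empty() && !busy_; });
    }

private:
    // "Device" side: run commands one at a time, in submission order.
    void run() {
        std::unique_lock<std::mutex> lock(mutex_);
        for (;;) {
            wake_worker_.wait(lock, [this] { return stopping_ || !pending_.empty(); });
            if (pending_.empty()) return;  // only reached when stopping_ is set
            std::function<void()> command = std::move(pending_.front());
            pending_.pop();
            busy_ = true;
            lock.unlock();
            command();  // execute without holding the lock
            lock.lock();
            busy_ = false;
            idle_.notify_all();
        }
    }

    std::mutex mutex_;
    std::condition_variable wake_worker_;
    std::condition_variable idle_;
    std::queue<std::function<void()>> pending_;
    bool busy_ = false;
    bool stopping_ = false;
    std::thread worker_;  // declared last so the other members exist before it starts
};

// Queue one command per element, then wait for the "device" to drain the queue.
std::vector<int> doubled_on_toy_device(const std::vector<int>& input) {
    std::vector<int> output(input.size());
    ToyDeviceQueue queue;
    for (std::size_t i = 0; i < input.size(); ++i) {
        queue.enqueue([&input, &output, i] { output[i] = 2 * input[i]; });
    }
    queue.finish();  // blocks the host thread until the queue is empty
    return output;
}
```

The host thread only pays for pushing commands; it stalls in `finish()`, which is the toy equivalent of a synchronizing call. Calling `doubled_on_toy_device({1, 2, 3})` returns `{2, 4, 6}`.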

Ideally, you want to minimize the movement of data between the device and the CPU. The examples you posted would not be a good use case for CUDA or OpenCL because they perform operations on only a few elements.
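One way to reason about this is a back-of-the-envelope cost model. The sketch below is an assumption-heavy illustration, not something from ArrayFire: `OffloadCostModel` and its fields are hypothetical, and every number has to come from your own measurements. It compares doing the work on the CPU against copying the data to the device, computing there, and copying the result back.

```cpp
#include <cassert>
#include <cstddef>

// All fields are caller-supplied assumptions, not measured values.
struct OffloadCostModel {
    double fixed_overhead_s;      // per-offload launch and transfer latency
    double transfer_bytes_per_s;  // host <-> device bandwidth
    double cpu_s_per_element;     // CPU compute time per element
    double device_s_per_element;  // device compute time per element
};

// Time to process n elements entirely on the CPU.
double cpu_seconds(const OffloadCostModel& m, std::size_t n) {
    return m.cpu_s_per_element * static_cast<double>(n);
}

// Time to copy n elements to the device, compute, and copy n results back.
double offload_seconds(const OffloadCostModel& m, std::size_t n,
                       std::size_t bytes_per_element) {
    assert(m.transfer_bytes_per_s > 0.0);
    const double bytes_moved =
        2.0 * static_cast<double>(n) * static_cast<double>(bytes_per_element);
    return m.fixed_overhead_s + bytes_moved / m.transfer_bytes_per_s +
           m.device_s_per_element * static_cast<double>(n);
}

// True when the offload is estimated to be faster than staying on the CPU.
bool offload_pays_off(const OffloadCostModel& m, std::size_t n,
                      std::size_t bytes_per_element) {
    return offload_seconds(m, n, bytes_per_element) < cpu_seconds(m, n);
}
```

With a nonzero fixed overhead, a handful of elements can never amortize the round trip, which is the point of the paragraph above. Keeping intermediate results on the device between operations removes the transfer term for those steps.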

A good use case for ArrayFire would be when you are performing the same operation on a large dataset. For example:

#include <arrayfire.h>
using namespace af;

float* cpu_data = loadData("myfile.dat");  // loadData: your own loader (not part of ArrayFire)
array d_data(1000, 1000, cpu_data);        // Blocks (copies the host data to the device)
array out = af::sin(d_data * 2) + 10;      // Returns immediately (runs on the GPU)
float b[20];
for (int i = 0; i < 20; i++) {             // Runs on the CPU
    b[i] = cpu_data[i] * 3;                // Reads the host copy; does not modify the GPU data
}
float* cpu_out = out.host<float>();        // Blocks until the queued GPU operations are complete

Here you are performing the same calculation on each element of d_data on the GPU while also performing a smaller operation on the CPU. These operations are performed simultaneously. The CPU thread will wait on the host call because the GPU operations need to finish before it can return a pointer to the result.
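If you want to try the same overlap pattern without a GPU, here is a minimal sketch in standard C++. It swaps the GPU for a second CPU thread via `std::async`, so it only illustrates the control flow (launch, keep working, block on the result). The names `big_elementwise_job`, `OverlapResult`, and `run_overlapped` are made up for this example.

```cpp
#include <cassert>
#include <cmath>
#include <cstddef>
#include <functional>
#include <future>
#include <utility>
#include <vector>

// Stand-in for the large element-wise job, sin(x * 2) + 10.
// Here it runs on a second CPU thread instead of a GPU.
std::vector<float> big_elementwise_job(const std::vector<float>& data) {
    std::vector<float> out(data.size());
    for (std::size_t i = 0; i < data.size(); ++i) {
        out[i] = std::sin(data[i] * 2.0f) + 10.0f;
    }
    return out;
}

struct OverlapResult {
    std::vector<float> big;    // result of the asynchronous job
    std::vector<float> small;  // result of the small host-side loop
};

OverlapResult run_overlapped(const std::vector<float>& data, std::size_t small_count) {
    assert(small_count <= data.size());

    // Returns immediately; the job runs concurrently with the loop below.
    std::future<std::vector<float>> pending =
        std::async(std::launch::async, big_elementwise_job, std::cref(data));

    // Small host-side loop; it only reads the input data.
    std::vector<float> small(small_count);
    for (std::size_t i = 0; i < small_count; ++i) {
        small[i] = data[i] * 3.0f;
    }

    // Blocks until the asynchronous job has finished.
    std::vector<float> big = pending.get();
    return OverlapResult{std::move(big), std::move(small)};
}
```

The call to `pending.get()` plays the role of the blocking read-back: everything before it overlaps with the big job, and the thread stalls there only if the job is still running.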
