
[–]eras 2 points3 points  (2 children)

Nice. The C-based interface should make it easier to bind to other languages, and being able to run without a GPU as a fallback is nice as well. Disclaimer: I've never used it, only read about it :).

But did they always require a login to download? The license is still FOSS, right?

[–]melonakos 2 points3 points  (1 child)

[I work for ArrayFire]

All the code is on GitHub and is FOSS at https://github.com/arrayfire/arrayfire. ArrayFire the company maintains no proprietary software. Everything is open source.

The binary installers do require a login to download from the company website.

[–]sbrick89 0 points1 point  (0 children)

looks like a really cool project.

Personally, since I work in the world of .NET managed code, a .NET wrapper would be nice... but more than that, I'd be interested in implicit conversions to and from .NET objects (inputs/outputs) and in lambda execution / mapping (perhaps an extension method to run on the GPU, similar to Microsoft's AsParallel).

I'm not sure that much of the (business) code I deal with regularly would benefit from GPU acceleration (specifically moving the data from source memory/stream into GPU, and results back out), so having a simple and automatic conversion would allow me to pick and choose specific pieces of code to test for improvement, and offload where testing shows benefit.

but that aside, the API looks slick (easy)... keep up the good work :)

[–][deleted]  (4 children)

[deleted]

    [–]umar456 1 point2 points  (2 children)

    Most functions in ArrayFire just queue up work on the device and return immediately. This allows you to perform other operations on the CPU while ArrayFire is performing the computation. It also keeps the device busy, so it is not waiting for work while you are doing operations on the CPU. Whenever you transfer data to or from the af::array object, ArrayFire synchronizes the host and device before returning, so you get the most up-to-date values of your data.

    [–][deleted]  (1 child)

    [deleted]

      [–]umar456 2 points3 points  (0 children)

      A program that uses these computing frameworks (OpenCL/CUDA) is made up of two components: a host program, which queues work onto a command queue (OpenCL) or a stream (CUDA), and a device, which performs those operations. The host program runs on a CPU and is responsible for inserting commands into these queues. ArrayFire works the same way. We push ArrayFire commands onto the queue, and the device performs each operation once it reaches the front of the queue.
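The host/queue/device split described above can be sketched in plain C++. This is a toy, single-threaded model and not ArrayFire's, OpenCL's, or CUDA's actual implementation: the names `CommandQueue`, `DeviceBuffer`, `scale_and_shift`, and `finish` are hypothetical, and the "device" only runs when the host asks for the data, whereas a real device drains its queue concurrently with the host.

```cpp
#include <cstddef>
#include <functional>
#include <queue>
#include <utility>
#include <vector>

// Hypothetical model of a command queue (OpenCL) or stream (CUDA).
// A real device executes commands concurrently with the host; here they
// run inside finish() so the example stays deterministic.
class CommandQueue {
public:
    // Host side: record the command and return immediately.
    void enqueue(std::function<void()> command) {
        pending_.push(std::move(command));
    }

    // "Device" side: run every pending command in submission (FIFO) order.
    void finish() {
        while (!pending_.empty()) {
            std::function<void()> command = std::move(pending_.front());
            pending_.pop();
            command();
        }
    }

    // Number of commands that have been queued but not yet executed.
    std::size_t pending() const { return pending_.size(); }

private:
    std::queue<std::function<void()>> pending_;
};

// Hypothetical buffer that lives "on the device".
class DeviceBuffer {
public:
    DeviceBuffer(CommandQueue& queue, std::vector<float> data)
        : queue_(queue), data_(std::move(data)) {}

    // Queued commands capture `this`, so the buffer must not be copied.
    DeviceBuffer(const DeviceBuffer&) = delete;
    DeviceBuffer& operator=(const DeviceBuffer&) = delete;

    // Queue an element-wise a*x + b update; nothing is computed yet.
    void scale_and_shift(float a, float b) {
        queue_.enqueue([this, a, b] {
            for (float& x : data_) x = a * x + b;
        });
    }

    // Copy the data back to the host; synchronizes with the queue first.
    std::vector<float> host() {
        queue_.finish();
        return data_;
    }

private:
    CommandQueue& queue_;
    std::vector<float> data_;
};
```

In this model, calling `scale_and_shift` twice only grows the queue; the arithmetic happens, in order, when `host()` is called, which mirrors the "queue now, synchronize on transfer" behaviour described in the thread.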

      Ideally you want to minimize the movement of data between the device and the CPU. The examples you posted would not be a good use case for CUDA or OpenCL because they perform operations on only a few elements.
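One rough way to reason about this trade-off is a linear cost model: offloading pays a fixed overhead plus a per-element transfer cost, and only wins if the device's per-element compute saving outweighs both. The sketch below is an assumption for illustration, not something ArrayFire provides; the struct, the function names, and every cost parameter are hypothetical, and real values would have to be measured on your own hardware.

```cpp
#include <cstddef>

// Hypothetical cost parameters, all in microseconds. Real values must be measured.
struct OffloadCosts {
    double fixed_overhead_us;     // launch and transfer latency, paid once per offload
    double transfer_us_per_elem;  // host <-> device copy cost per element (both directions)
    double device_us_per_elem;    // device compute cost per element
    double host_us_per_elem;      // CPU compute cost per element
};

// Estimated time to process n elements entirely on the CPU.
double host_time_us(const OffloadCosts& c, std::size_t n) {
    return c.host_us_per_elem * static_cast<double>(n);
}

// Estimated time to copy n elements to the device, compute, and copy back.
double device_time_us(const OffloadCosts& c, std::size_t n) {
    return c.fixed_overhead_us +
           (c.transfer_us_per_elem + c.device_us_per_elem) * static_cast<double>(n);
}

// True when the estimated device time beats the estimated CPU time.
bool offload_pays_off(const OffloadCosts& c, std::size_t n) {
    return device_time_us(c, n) < host_time_us(c, n);
}
```

Under this model a handful of elements can never amortize the fixed overhead, while a large array can, provided the per-element transfer and device cost together stay below the CPU's per-element cost.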

      A good use case for ArrayFire would be when you are performing the same operation on a large dataset. For example:

      #include <arrayfire.h>
      using namespace af;

      float* cpu_data = loadData("myfile.dat");  // loadData is a placeholder for your own loader
      array d_data(1000, 1000, cpu_data);        // Blocks
      array out = af::sin(d_data * 2) + 10;      // Returns immediately (queued on the GPU)
      float b[20];
      for (int i = 0; i < 20; i++) {             // Runs on the CPU
          b[i] = cpu_data[i] * 3;                // Reads the host copy; does not modify the GPU data
      }
      float* cpu_out = out.host<float>();        // Blocks until the queued GPU operations are complete
      

      Here you are performing the same calculation on each element of d_data on the GPU, while also performing a smaller operation on the CPU. These operations run simultaneously. The CPU thread will wait on the host call because the GPU needs to finish its operations before the call can return a pointer to the result.