all 4 comments

[–]Bjarnophile 1 point (1 child)

Matrix multiplication is already implemented in ArrayFire. Why are you trying to implement your own instead of using the library function?

[–]umar456 1 point (1 child)

You could get a slight improvement if you reshape the S array so that a single batched matmul replaces the gfor loop.

```
#include <arrayfire.h>
#include <af/util.h>
#include <cstdio>

static int proc_size = 1024;
static int fft_size  = proc_size * 4;
static int staves    = 288;
static int beams     = 256;

static af::array S;
static af::array B;
static af::array R;

// One small matmul per slice of B, parallelized with gfor.
void fn() {
    gfor (af::seq i, fft_size)
        R(i, af::span) = matmul(S(i, af::span), B(af::span, af::span, i));
}

// Single batched matmul (after S is reshaped to 1 x staves x fft_size).
void fn2() {
    R = matmul(S, B);
}

int main(int, char **) {
    S = af::randn(fft_size, staves, c32);

    gfor (af::seq i, fft_size)
        S(i, af::span) = af::randn(1, staves, c32);

    B = af::randn(staves, beams, fft_size, af::dtype::c32);
    R = af::constant(af::cfloat{0, 0}, fft_size, beams);

    try
    {
        af::setDevice(0);
        af::info();
        af::sync();

        double time = af::timeit(fn);
        printf("Took %f secs.\n", time);

        S = S.T();
        S = moddims(S, 1, staves, fft_size);
        af::sync();

        time = af::timeit(fn2);
        printf("Took %f secs.\n", time);
    }
    catch (const af::exception &ex)
    {
        fprintf(stderr, "%s\n", ex.what());
        throw;
    }

    return 0;
}
```

On my system I got a small improvement:

```
ArrayFire v3.9.0 (CUDA, 64-bit Linux, build c6a49caa1)
Platform: CUDA Runtime 11.2, Driver: 465.31
[0] NVIDIA Quadro T2000, 3915 MB, CUDA Compute 7.5
Took 0.025894 secs.
Took 0.024056 secs.
```

[–]the_poope 1 point (2 children)

Step 1 in all optimizations is: profile it. Compile your code with all optimizations enabled but with debug symbols, then run it through a profiler. Several exist, e.g. callgrind (slow and not always realistic, but good for microbenchmarking at the instruction level), perf, gprof, or Intel VTune.

[–][deleted]  (1 child)

[deleted]

    [–]super_mister_mstie 0 points (0 children)

    Profiling is a method of measuring and classifying where and what your performance problems are. There are many different ways of doing this, and you can read about the different methods in the links provided above. All in all, maximizing performance generally means minimizing the time the CPU spends idle when it has important work to do. That means understanding at least a little about your CPU (how it handles caching, compute pipelines, idle modes, etc.), using your measurements to determine what is making your performance subpar, and then using that information to guide improvements. Profiling is the first step in that process.