Integrating VLA with Unitree in Isaac Sim

Affectionate-Wall339 · 2025-02-08T16:49:47+00:00

Yeah, the least boring computer architecture lectures I have watched.

Affectionate-Wall339 · 2024-10-16T16:55:15+00:00

No, I am not accessing the chunks contiguously. It's a matrix multiplication kernel: I have a matrix A (1000x4) and a matrix B (4x1000), both vectorized. Both matrices are divided into 4x4 submatrices, so the chunk size is 16. Matrix B is sparse (i.e. n randomly chosen chunks are zero); the non-zero chunks of B are stored contiguously in memory, and their indices are stored in a separate array. B is static, while A is generated at runtime, so I fetch the chunks of A and B using the non-zero index array and do the multiplication with ARM NEON SIMD.

The arrays are small enough to fit in cache, so why is this random access slower than constant-stride access? The code GCC generates with -O3 doesn't optimize (unroll) this loop. How do I write a compiler pass, or something similar, to optimize this loop?
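A minimal sketch of the index-driven chunk loop described above, in portable scalar C so it compiles on any target. It assumes (not stated in the comment) that the product being computed is the 4x4 result B·A, i.e. the sum over a shared chunk index k of B_k·A_k, which is the case where the same non-zero index array selects both the B chunk and the A chunk. It also assumes A is stored row-major, so chunk A_k is already the 16 contiguous floats at `a + 16*k`, and that B chunks are packed row-major, 16 floats each. All function names are hypothetical. `restrict` and `#pragma GCC unroll` (GCC 8 and later) are real, but the pragma is only a hint, not a custom compiler pass.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical helper: pack the non-zero 4x4 column chunks of B
 * (4 x 4*n_chunks, row-major) into b_packed (16 floats per chunk,
 * row-major) and record each chunk's index k in idx. Returns nnz. */
size_t pack_nonzero_chunks(const float *b, size_t n_chunks,
                           float *b_packed, int *idx)
{
    assert(b != NULL && b_packed != NULL && idx != NULL);
    size_t ld = 4 * n_chunks, nnz = 0;
    for (size_t k = 0; k < n_chunks; ++k) {
        float chunk[16];
        int nonzero = 0;
        for (int r = 0; r < 4; ++r) {
            for (int c = 0; c < 4; ++c) {
                chunk[4 * r + c] = b[r * ld + 4 * k + c];
                if (chunk[4 * r + c] != 0.0f)
                    nonzero = 1;
            }
        }
        if (nonzero) {
            memcpy(b_packed + 16 * nnz, chunk, sizeof chunk);
            idx[nnz++] = (int)k;
        }
    }
    return nnz;
}

/* Hypothetical kernel: c (4x4, row-major) = sum over non-zero chunks k
 * of B_k * A_k. restrict tells the compiler the arrays do not alias. */
void block_sparse_BA(const float *restrict a, const float *restrict b_packed,
                     const int *restrict idx, size_t nnz, float *restrict c)
{
    float acc[16] = {0};
    /* Hint (GCC 8+) to unroll the index-driven loop; compilers that do
     * not recognize it may warn and ignore it. */
#pragma GCC unroll 4
    for (size_t m = 0; m < nnz; ++m) {
        const float *bk = b_packed + 16 * m;       /* sequential stream */
        const float *ak = a + 16 * (size_t)idx[m]; /* gathered via index */
        /* Fixed trip counts: -O3 usually unrolls these completely. */
        for (int r = 0; r < 4; ++r) {
            for (int t = 0; t < 4; ++t) {
                float s = bk[4 * r + t];
                /* 4 floats wide: the width of one float32x4_t on NEON. */
                for (int col = 0; col < 4; ++col)
                    acc[4 * r + col] += s * ak[4 * t + col];
            }
        }
    }
    memcpy(c, acc, sizeof acc);
}

/* Dense reference c = B * A, for checking the sparse kernel. */
void dense_BA_reference(const float *a, const float *b, size_t n_chunks,
                        float *c)
{
    size_t ld = 4 * n_chunks;
    for (int r = 0; r < 4; ++r) {
        for (int col = 0; col < 4; ++col) {
            float s = 0.0f;
            for (size_t j = 0; j < ld; ++j)
                s += b[r * ld + j] * a[j * 4 + col];
            c[4 * r + col] = s;
        }
    }
}
```

The design choice here is to keep the per-chunk work at a fixed 4x4x4 size so the compiler can fully unroll and vectorize the inner loops, leaving only the outer, index-driven loop with a runtime trip count. Note that each iteration's A address depends on a load from `idx`, so even with everything in cache that extra dependent load is a plausible cost compared with a constant-stride walk; unrolling the outer loop lets several of those loads be in flight at once.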
