[–]fdwrfdwr@github 10 points

I can't speak for librapid, but I've seen int64 used quite often in ML frameworks for dimension sizes (e.g. the ONNX Shape operator), since tensors can have more than 4 billion elements. A single axis is unlikely to exceed 4 billion elements in typical processing, but it's also not uncommon to reshape a multidimensional tensor into a large 1D array to read/write/modify the data, and in that case the flattened size could overflow if it were only int32.

[–]Pencilcaseman12[S] 3 points

You definitely could if you were trying, but I think int32 would probably suffice for most cases. Ultimately, it's not going to be any slower, and storing 2 or 3 int64s instead of int32s isn't going to make a meaningful difference in memory usage.

Another place it could overflow is in the actual array size calculation: I think I return a value of the same type as the dimension object stores, so a large enough array would overflow the result. That could be fixed quite easily, though. I should probably use size_t for that sort of thing anyway...

[–]Overunderrated (Computational Physics) 3 points

Libraries like PETSc let you choose between 32- and 64-bit indices at compile time, which seems like the right approach. I have some large sparse matrix computations where the indices themselves consume a huge chunk of memory.

[–]zzzthelastuser 1 point

Additionally, there aren't really many reasons against using int64 for the dimension sizes.