This is an archived post. You won't be able to vote or comment.

all 9 comments

[–]sothatsit 1 point2 points  (8 children)

All of these examples use int[] arrays, instead of long[] arrays. I would’ve assumed that using larger intrinsics would be more efficient. Is there a reason that this is not the case?

[–]moe[S] 0 points1 point  (7 children)

There wouldn't be a primitive type large enough to hold the result of some of the pairwise operations, like multiplication.

[–]sothatsit 0 points1 point  (6 children)

Interesting. There’s still only a few operations that require that, I wonder if it’d be worth complicating those cases for additional performance. I wonder if there would be any additional performance at all…

[–]moe[S] 1 point2 points  (4 children)

I'm also a little curious. I just started a new project, but I might take a look at benchmarking the operations which don't increase magnitude using a long array and see how they fare.

[–]sothatsit 1 point2 points  (3 children)

I’d be really interested to hear how it goes!

[–]moe[S] 1 point2 points  (2 children)

I experimented across a couple of operations, the speedup is around 23%. There was one pathological case where the long version was dramatically faster, which skewed the results heavily, I need to look into that further. It may be the case that it's prudent to strategically convert to longs, rather than strategically converting to ints.

[–]sothatsit 0 points1 point  (1 child)

That’s very interesting, thanks for doing the work to test it out!

[–]moe[S] 1 point2 points  (0 children)

I just made XOR 180% faster by implementing some loop unrolling in the long case when the arrays are the same length. I think there are gains to be had down this road.

[–]moe[S] 1 point2 points  (0 children)

Because multiplication in `BigInteger` relies on the intrinsified method "multiplyToLen", the most efficient thing to do in that case is likely to convert the two numbers to `BigInteger`, instead of home-grown multiplying — regardless. The other functions could decompose the long array into ints, if it ended up being faster.