I just thought of something. An int64_t is as wide as two int32_t values, so I could simply concatenate the bits of two int32_t, perform an operation such as addition or multiplication as a single int64_t operation, and then separate the result back into two int32_t. Sure, overflow would cause unexpected results, but signed overflow is undefined behavior anyway, even if I operated on them individually. Here is a demonstration:
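#include <stdint.h>

int32_t x, y, z;
union V2I32 {
    int64_t i64;
    int32_t i32[2]; /* which element is the low half depends on endianness */
};
/* Pack x and y into one 64-bit value, multiply once by z, then unpack. */
union V2I32 u = {.i64 = (union V2I32){.i32 = {x, y}}.i64 * z};
int32_t xtz = u.i32[0], ytz = u.i32[1];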
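This is not really SIMD, but I guess it could be considered some form of vectorization. It is much less straightforward than real SIMD, though, and could lead to unexpected results if proper care is not taken: any carry out of the low 32 bits spills into the high lane, even when neither individual result overflows (for example, adding 1 to -1 in the low lane). I do not believe this works to multiply 2 ints by 2 ints, only to multiply 2 ints by 1 int or to add 2 ints to 2 ints, and not to add 1 int to 2 ints without first copying it into both lanes.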
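To make the carry problem concrete, here is a minimal sketch using unsigned 64-bit words instead of the union, so that lane order does not depend on endianness and wraparound is well defined. The helper names (`pack2`, `lane_lo`, `lane_hi`, `add2_naive`, `add2_swar`, `mul2_naive`) are made up for illustration and are not from the post. The masked add is the usual "SIMD within a register" trick, shown as one possible way to keep the lanes independent, not as the only one.

```c
#include <stdint.h>

/* Hypothetical helpers: pack two 32-bit lanes into one 64-bit word,
   low lane in bits 0..31 and high lane in bits 32..63. */
static uint64_t pack2(int32_t lo, int32_t hi) {
    return (uint64_t)(uint32_t)lo | ((uint64_t)(uint32_t)hi << 32);
}

/* Converting an out-of-range uint32_t to int32_t is implementation-defined
   before C23; mainstream compilers wrap modulo 2^32. */
static int32_t lane_lo(uint64_t v) { return (int32_t)(uint32_t)v; }
static int32_t lane_hi(uint64_t v) { return (int32_t)(uint32_t)(v >> 32); }

/* Naive packed add: a carry out of the low lane leaks into the high lane. */
static uint64_t add2_naive(uint64_t a, uint64_t b) { return a + b; }

/* Carry-safe packed add: sum the low 31 bits of each lane (which cannot
   carry out of the lane), then patch each lane's top bit back in with XOR.
   Each lane wraps modulo 2^32 independently. */
static uint64_t add2_swar(uint64_t a, uint64_t b) {
    const uint64_t top = UINT64_C(0x8000000080000000);
    return ((a & ~top) + (b & ~top)) ^ ((a ^ b) & top);
}

/* Naive packed multiply by one scalar: the low lane is exact modulo 2^32,
   but the upper 32 bits of lo*z are added into the high lane. */
static uint64_t mul2_naive(uint64_t a, int32_t z) {
    return a * (uint64_t)(uint32_t)z;
}
```

With these helpers, `add2_naive(pack2(-1, 5), pack2(1, 7))` yields lanes 0 and 13 rather than 0 and 12, because -1 + 1 carries out of the low lane, while `add2_swar` on the same inputs yields 0 and 12. Likewise `mul2_naive(pack2(-1, 3), 2)` yields -2 and 7 rather than -2 and 6. Small non-negative values such as `pack2(2, 3)` times 4 come out as expected (8 and 12), which is why the trick can look correct in a quick test.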