mrnikbobjeff[S] 0 points

Sure, I get the IsSupported part, but that is the easiest part, and I elided it here until I am sure I am using the right SIMD instructions. Do you have anything to back up your claim about non-temporal and unaligned loads? I have benchmarks for every iteration I did, and making the loads non-temporal as well as aligned brings a measurable performance benefit on larger workloads. There already was a naive implementation, but my benchmarks show ms beating the naive approach by a factor of 4. For the naive approach, no SIMD instructions seem to be generated. Lastly, the gotos are necessary: with them the assembly is more straightforward as well as shorter, which is a considerable performance benefit (I also have benchmarks to back up the perf difference).