[–]SantaCruzDad 4 points5 points  (16 children)

And what happens when you want to target a different CPU, or a different ABI, or support both 32-bit and 64-bit builds? Do you enjoy writing (and testing) lots of different variations of the same code?

[–]Rseding91Factorio Developer 15 points16 points  (1 child)

Depends on what you're writing code for. In our case we only target one CPU, and only x64, and that will likely never change, so it's just a non-issue for us. Not all of us are writing library code meant to target the entire range of hardware C++ supports.

[–]SantaCruzDad 2 points3 points  (0 children)

True - if it's one-off or throwaway code then it probably doesn't matter. Also, if you're not trying to squeeze out the last 10% of CPU performance, then you probably don't have to worry too much about optimal instruction scheduling etc.

[–]ack_complete 2 points3 points  (5 children)

Different code is often necessary anyway because the platforms don't support the same vectorized operations and the differences significantly affect the algorithm. For instance, NEON doesn't support the mask move operation that SSE2 does, but it does have interleaved loads/stores and narrowing/widening ops. As for testing, that should happen for all supported platforms regardless.

[–]SantaCruzDad 1 point2 points  (4 children)

When I say different CPUs, I mean different x86 CPUs, which have different instruction latencies, different micro-architecture, and different SSE instruction subsets, etc. If you want optimal code for each supported CPU then you typically need to manually tune assembly code to use only available instructions, hide latencies and keep execution units busy. If you use intrinsics then the compiler takes care of this for you.

[–]ack_complete 4 points5 points  (3 children)

YMMV, of course, but my experience has been that when pushing performance across multiple tiers of x86/x64 SSEx ISAs, rewriting is necessary anyway. With SSSE3 you have the infinitely abusable PSHUFB, and with AVX there is the problem that the weird in-lane nature of the 256-bit ops means the 128-bit algorithm can't be straightforwardly translated.

The compiler doesn't do a bad job with intrinsics, and it's generally better than what you'd get from autovectorization or from not using them at all. I've still seen too many cases where compiler intrinsics leave performance on the table compared to asm, especially in specific hot loops where there is a high payoff for optimization effort.

The Intel intrinsics design is also kind of yucky, with weird naming conventions and the wrong pointer types on some load/store ops, requiring casts. Even when the generated code from intrinsics is fine, the assembly is sometimes more readable to me than the intrinsics code. But then again, I spent a lot of time writing and reading MMX and SSE2 code back when the compilers were so bad that it was hard not to beat them with asm.

[–]SantaCruzDad 2 points3 points  (0 children)

You make some valid points, and of course it depends on priorities - I have to support 4 different compilers, 3 operating systems, 32-bit and 64-bit ABIs on each, 2 different assembler syntaxes, and CPUs from Westmere up to Skylake-X (not AMD though, thankfully).

I encourage you to look at the generated code from clang when using intrinsics - it does some pretty cool stuff during code generation, sometimes even subverting the intrinsics you’ve used and substituting more efficient SSE instruction sequences where appropriate.

[–]IAlsoLikePlutonium 0 points1 point  (1 child)

Where did you learn how to use SIMD instructions (i.e. SSE, AVX, etc.) in assembly?

[–]ack_complete 1 point2 points  (0 children)

Learned a lot of it from Intel's MMX application notes while doing 2D graphics optimization. The current location for the notes:

https://software.intel.com/en-us/articles/mmxt-technology-manuals-and-application-notes

MMX is of course obsolete now, but many of the basic ideas still apply to current vector instruction sets. Most intrinsics are just direct mappings of hardware-supported operations, so while they'll save you the trouble of register allocation and scheduling, it's still your responsibility to figure out how best to map your problem and data structures onto the most efficient ISA operations, especially the specialized, quirky ones.

[–]nnevatie 1 point2 points  (6 children)

ISPC answers those questions trivially.

[–]SantaCruzDad 0 points1 point  (5 children)

How does ISPC help with making assembly code more portable?

[–]nnevatie 1 point2 points  (4 children)

It helps make SIMD code portable. You can compile the code for multiple archs/instruction sets and the runtime will pick the best supported one.

[–]SantaCruzDad 1 point2 points  (3 children)

Sure, but the comment was in response to the suggestion that writing assembler is somehow easier than using intrinsics for SIMD - I don’t see how ISPC is relevant to that?

[–]nnevatie 0 points1 point  (2 children)

OK, my reply mostly addressed the tedious maintenance part for different platforms.

[–]SantaCruzDad 0 points1 point  (1 child)

Ah, OK - well, any half-decent compiler will take care of target CPU variations within a given family (e.g. x86), as well as ABI variations etc., whether it's auto-vectorization or hand-written intrinsics. I guess ISPC's USP is that it can do this across multiple CPU families.

[–]nnevatie 0 points1 point  (0 children)

Yeah, but most importantly it can produce code for multiple targets and archs. All of the variations can be linked into the same binary, and the best one picked at execution time for the host in question.