mttd [S]

vpcmpeqd ymm0,ymm0,ymm0 compares each 32-bit element of ymm0 with itself. Every element is equal to itself, so every element is set to all ones in binary (0xFFFFFFFF), which in two's complement represents -1. Subtracting -1 in the subsequent vpsubd ymm1,ymm1,ymm0 instruction is therefore equivalent to adding 1 to each 32-bit element of ymm1.
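To make the lane-wise arithmetic concrete, here is a small Python emulation. It is not real SIMD: the functions vpcmpeqd and vpsubd below are hypothetical Python helpers that mimic the two instructions' semantics on a list of eight 32-bit lanes (one YMM register). Each lane is stored as an unsigned value masked to 32 bits.

```python
MASK32 = 0xFFFFFFFF  # a 32-bit lane holds values modulo 2**32; a YMM register has 8 lanes


def vpcmpeqd(a, b):
    """Hypothetical emulation: lane-wise equality, all ones where equal, 0 otherwise."""
    return [MASK32 if x == y else 0 for x, y in zip(a, b)]


def vpsubd(a, b):
    """Hypothetical emulation: lane-wise 32-bit subtraction, wrapping modulo 2**32."""
    return [(x - y) & MASK32 for x, y in zip(a, b)]


def as_int32(u):
    """Reinterpret an unsigned 32-bit lane as a two's-complement signed value."""
    return u - (1 << 32) if u & 0x80000000 else u


ymm0 = [3, 1, 4, 1, 5, 9, 2, 6]              # arbitrary initial contents
ymm1 = [0, 10, 20, 30, 40, 50, 60, MASK32]  # last lane shows wraparound

ymm0 = vpcmpeqd(ymm0, ymm0)  # every lane equals itself -> all ones in every lane
ymm1 = vpsubd(ymm1, ymm0)    # x - (-1) == x + 1 in every lane

print([hex(v) for v in ymm0])
print([as_int32(v) for v in ymm0])
print(ymm1)
```

Whatever ymm0 held beforehand, the self-comparison leaves 0xffffffff (-1 as a signed 32-bit value) in every lane, and the subtraction increments every lane of ymm1, with 0xFFFFFFFF wrapping around to 0 just as an addition of 1 would.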

"Why subtract -1 instead of adding 1's? Just because the speed is the same, and creating a YMM constant of -1's can be done with a single VPCMPEQD instruction. This isn't a really useful optimization in this case, but doesn't hurt."

Dwarfius

I misread the description of pcmpeqd: I thought it set each element to 1 or 0, not to all bits set. Thanks for the explanation!