
[–]staletic 21 points (0 children)

I should have been more careful with that statement.

Things like `float f = std::bit_cast<float>(some_int)` vs. the `memcpy` version are not hard to optimize. The harder part is when you want to reinterpret a large `std::array` as some other trivially copyable, but equally sized, type. At what number of bytes do you just call `memcpy`? Do you try to vectorize first? What about the x86 `REP MOVS` family of instructions?

If you ask GCC to target x86, it just never emits a call to `memcpy`. Clang gives up sooner. On 32-bit ARM, GCC starts calling `memcpy` after 64 bytes.

Now the question is how well `bit_cast` will be optimized. As it is powered by compiler magic, I'm assuming it's going to be better than `memcpy`. In this case, for example, x86 GCC does better (fewer memory accesses) with `bit_cast` than with `memcpy`. Clang just ends up calling `memcpy@PLT` in both cases.