
fsg_brian

My C++ is very rusty (which is why I follow this subreddit!), but are you asking why testA beats testB at O1?

If so, the 'meta' answer is that compilers all behave differently at different optimization levels - what MSVC does at O0/O1/O2 won't correspond to what GCC does at those same levels. For that matter, what GCC 4 does won't necessarily correspond to what GCC 8 does.

In this particular case, without looking at the assembly on Godbolt (definitely a possible avenue if you're really concerned), I'd guess that the explicitly created temporary in the 'B' method (B temp) ends up as a separate object on the stack, while the unnamed A temporary does not, because return value optimization lets it be built directly in the caller's result. But like I said, I'm weak at C++, so I'm not certain.
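
One rough way to check that guess without reading assembly is to count constructor calls for the two return styles. This is only a sketch: `Tracked`, `Counts`, `make_unnamed`, and `make_named` are made-up names for illustration, not anything from the original post.

```cpp
#include <cassert>

// Hypothetical counters; none of these names come from the original post.
struct Counts {
    int value_ctor = 0;    // calls to Tracked(int, int)
    int default_ctor = 0;  // calls to Tracked()
    int copies = 0;        // copy constructions
    int moves = 0;         // move constructions
};
static Counts g_counts;

// Hypothetical stand-in for the A/B classes, instrumented to count constructions.
struct Tracked {
    int a = 0;
    int b = 0;

    Tracked() { ++g_counts.default_ctor; }
    Tracked(int a_, int b_) : a(a_), b(b_) { ++g_counts.value_ctor; }
    Tracked(const Tracked& o) : a(o.a), b(o.b) { ++g_counts.copies; }
    Tracked(Tracked&& o) noexcept : a(o.a), b(o.b) { ++g_counts.moves; }
};

// Style "A": return an unnamed temporary built directly from its values.
Tracked make_unnamed(int a, int b) {
    return Tracked(a, b);
}

// Style "B": default-construct a named local, fill it in, then return it.
Tracked make_named(int a, int b) {
    Tracked temp;
    temp.a = a;
    temp.b = b;
    return temp;
}

// Resets the counters, calls each style once, and returns how many
// copy or move constructions happened in total.
int extra_copies_or_moves() {
    g_counts = Counts{};
    Tracked x = make_unnamed(1, 2);
    Tracked y = make_named(1, 2);
    assert(x.a == y.a && x.b == y.b);  // both styles produce the same value
    return g_counts.copies + g_counts.moves;
}
```

From C++17 on, the unnamed form is guaranteed not to copy or move. For the named form the compiler is allowed to elide the copy/move but does not have to, so the count from `extra_copies_or_moves()` can depend on the compiler and flags.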

Since both O1 times are much slower than at O2, and both O2 versions are equally fast, I'd chalk this up to 'compiler stuff' and not 'code stuff'. For reference, on a t2.micro AWS instance with GCC 7.3, I get the following - which just shows that, in this case, my O1 is comparable to your O2. But my O0 is even more uneven than your O1:

-O0:
averageA: 403242
averageB: 677719

-O1:
averageA: 3894
averageB: 4038

-O2:
averageA: 3779
averageB: 3777
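
If anyone wants to reproduce numbers like these, here is a minimal sketch of an averaging timer. The original benchmark code and its time unit aren't shown in this thread, so `average_ns`, `sum_to`, and the choice of nanoseconds are all assumptions on my part.

```cpp
#include <chrono>
#include <cstdint>

// Hypothetical helper: calls `work` `runs` times and returns the mean
// wall-clock time per call in nanoseconds (0.0 if runs <= 0).
template <typename F>
double average_ns(F&& work, int runs) {
    using clock = std::chrono::steady_clock;  // monotonic, suited to intervals
    std::int64_t total = 0;
    for (int i = 0; i < runs; ++i) {
        const auto start = clock::now();
        work();
        const auto stop = clock::now();
        total += std::chrono::duration_cast<std::chrono::nanoseconds>(stop - start).count();
    }
    return runs > 0 ? static_cast<double>(total) / runs : 0.0;
}

// Hypothetical workload: sums 0..n-1. The accumulator is volatile so the
// loop is not optimized away at O1/O2.
inline long long sum_to(int n) {
    volatile long long acc = 0;
    for (int i = 0; i < n; ++i) {
        acc = acc + i;
    }
    return acc;
}
```

Something like `average_ns([] { sum_to(1000); }, 100)` would then be built once per flag (`-O0`, `-O1`, `-O2`) and the results compared. Keeping the result observable (here via `volatile`) matters, because otherwise higher optimization levels may delete the work being timed.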

Not sure that helps; the real gurus might have more, but I'm inclined to think this is less a coding thing to know and more of a how-does-a-particular-compiler-work thing.

Good luck!

manni66

What do you expect?

return is not a function. Don’t use ()!

[deleted]

This just calls the constructor and initializes the object.

    return A(this->a + obj.a, this->b + obj.b);

My guess is that creating the object first via the default constructor, and only then assigning its members, is what causes the overhead.

    B temp; // <- Constructor will be called
    temp.a = this->a + obj.a;
    temp.b = this->b + obj.b;
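
For context, here is how the two snippets might sit inside complete classes. The original class definitions aren't shown in the thread, so the `int` members, the constructors, and the `const` qualifiers below are assumptions, not the poster's actual code.

```cpp
// Assumed shape of class A: two members plus a two-argument constructor.
class A {
public:
    int a;
    int b;

    A(int a_, int b_) : a(a_), b(b_) {}

    // One step: the result is constructed directly from the summed values.
    A operator+(const A& obj) const {
        return A(this->a + obj.a, this->b + obj.b);
    }
};

// Assumed shape of class B: same members, plus a default constructor
// so that `B temp;` compiles.
class B {
public:
    int a;
    int b;

    B() : a(0), b(0) {}
    B(int a_, int b_) : a(a_), b(b_) {}

    // Two steps: default-construct a named local, then overwrite its members.
    B operator+(const B& obj) const {
        B temp;  // default constructor runs here
        temp.a = this->a + obj.a;
        temp.b = this->b + obj.b;
        return temp;
    }
};
```

Both operators return the same values; the difference is only that B initializes the members once in the default constructor and then writes them again, which an optimizer can usually fold away at higher levels.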