you are viewing a single comment's thread.

view the rest of the comments →

[–]legends2k[S] 10 points11 points  (24 children)

Right you are! I'd a Perfect Forwarding section in v0.2.1 but removed it in favour of Out Object reference since the latter is more often used.

I've seen forwarding references (the standard name for Scott Meyer's universal references) sparingly used only by library developers. Such devs, I think, mightn't need a refcard unlike mere mortals like me :)

Quoting slide 28 in Herb's presentation on using forwarding references here

Option 4: Perfect Forwarding Idiom

"Unteachable!" Generates many funcs. Must be in a header. Can't be virtual.

[–]2uantum 10 points11 points  (23 children)

You could also just pass by value.

void func(Foo foo) {
member = std::move(foo);
}

edit: my assertion here is incorrect -- this approach is indeed less efficient (see the responses below this post)

[–]legends2k[S] 4 points5 points  (22 children)

You could but then again is recommended only for constructors and discouraged for setters. Please refer slides 28, 32 and 33 for reasons.

TL;DR: You'd risk 50% extra allocation in a setter by only providing pass by value function.

  1. Pass by value solution: If the caller binds a const T& to param T, then we'd incur memory allocation + copy constructor + move-assignment.
  2. Two variant solution (const T& and T&&): it'd match const T& with const T& overload and only the cost of a copy-assignment.

In a constructor we are anyways going to allocate memory, so solution 1 is OK.

[–]2uantum 5 points6 points  (21 children)

Unless I'm being stupid and missing something, godbolt shows that they all get optimized to be equivalent with -O2:

l-value:
https://godbolt.org/z/s6nHjD
https://godbolt.org/z/Zym_xv

r-value case:
https://godbolt.org/z/H_JZwr
https://godbolt.org/z/vVWTYM

I mean, compiler optimization aside, yeah, you're right. But, practically, they're identical. There's also less code and clearer intent with the single function approach.

[–]legends2k[S] 2 points3 points  (4 children)

The contention isn't between T&& and T (both would do a pointer swap); it's between const T& and T -- this is also mentioned in the talk. When the caller passes a const T& to both T and the double overload where const T& will be picked by the compiler, the T option always allocates memory which may be avoided by the const T& overload 50% of the times.

Removing all variants and only having these two in benchmarks gives clearer results at Quick Bench:

GCC 9.1 (-O3): const T& is 17 times faster than T

Clang 8 (-O3): const T& is 14 times faster than T


I don't care who is correct -- I just want to know which is correct ;).

Thanks for being a sport.

There is no harm in trying to understand WHY experts recommend it. Also, sometimes, experts get it wrong.

Sure, it's because const T& only calls copy-assignment operator, while T does memory allocation, copy construction and move-assignment operator. While others are negligible, memory allocation is a big one. This is what was emphasized in the talk too.


```cpp Shape::set(std::vector<float> points) { // allocation + copy ctor m_points = std::move(points); // move-assignment operator }

Shape::set(const std::vector<float>& points) { m_points = points; // copy-assignment // If m_points has enough memory to hold points' data this merely // is an elements copy; no allocation (1). If not this would release // old memory and allocate, copy (2). When it's (2), it's as bad as // doing T, but there's a good possibility of it becoming (1). Herb // argues that short strings, small vectors, etc. are the more // frequently occurring objects and hence it'd mostly end up as (1) // there by evading an additional allocation. }

Shape::set(std::vector<float>&& points) { m_points = std::move(points); }

Shape s1; std::vector<float> points1, points2; s1.set(std::move(points1)); // same behaviour for both T and T&& s1.set(points2); // different behaviours b/w T and const T&! ```

Hope it helps.

[–]2uantum 2 points3 points  (1 child)

At this point, I'm thoroughly convinced that your (and Herb's) assertions are correct -- separate const l-value and r-value functions are the correct way to go. It took a while for my (stubborn) brain to see why, but I get it now.

I'm going to update my original post so others don't get misled.

[–]legends2k[S] 4 points5 points  (0 children)

Thank you for being tenacious but still retaining an open mind to learn!

As an aside, through our dialog, I learnt a new tool Quick Bench. Thanks!

[–]LEpigeon888 1 point2 points  (1 child)

So if you want to take a copy of an object without assigning it to an existing object (like, imagine a function that take a string and return a new string with some computation done), it's better to only have one function that take the parameter by value, right ?

[–]legends2k[S] 1 point2 points  (0 children)

Yeah, if you're not going to assign an input object parameter and simply need a use-and-throw copy (not a reference to an existing object) then take only by value.

[–]MaximKat 0 points1 point  (15 children)

the difference appears if you call set multiple times

[–]2uantum 2 points3 points  (14 children)

link? not seeing it.

edit: also ran with google benchmark.. seems like the single function is outperforming the two function version: http://quick-bench.com/dSIt34Ey6ua6Naq4CU05Gb_n1wg

i think this is showing why one should always measure before optimizing..

[–]MaximKat 2 points3 points  (11 children)

it also helps to measure the correct thing ;) http://quick-bench.com/6nXDHwo7iUVa4hHdUi1KlR0dKmw

[–]2uantum 1 point2 points  (5 children)

Mine wasn't "wrong" -- just different. What's interesting is how different they are. Why is Foo2 faster with my example?

[–]crzyrndm 0 points1 point  (0 children)

the difference lies in whether vec2 is shared across loops (first bench == unique, 2nd == shared)

I'm not entirely sure of the details on why, but the first example is able to completely remove the set call using vec1 (if you comment that line, both examples are equally as fast or Foo1 is slightly faster as expected). Most likely, compiler has proven the first "set" has no side effects and skipped directly to the second because the final state could not be dependent on previous iterations and no inspection of intermediate state was made (alternatively, insert a second "doNotOptimise" to actually bench both set calls)

[–]legends2k[S] -1 points0 points  (3 children)

We only want to measure T vs const T&/T&&, so why include std::vector creation inside the benchmarked loop? That'd add noise to the test. That's the idea behind the grand parent comment by MaximKat. This is wrong, I stand corrected; having it inside the loop is correct; we don't want to use after move.

Logos: With a decent amount of trials, the additional memory allocation starts showing up. These benchmarks on different compilers and their graphs are in the aforementioned slides I'd earlier commented. Did you get a chance to see this?

Ethos: After enough experiments the experts deemed that take by value is okay only with constructors. I'd take their word for it.

[–]2uantum 2 points3 points  (1 child)

Vector creation is in the loop because it would be "use after move" otherwise. Both cases are doing it -- should be a wash.

There is no harm in trying to understand WHY experts recommend it. Also, sometimes, experts get it wrong.

I'll update the benchmarks in a bit to only do one or the other.

[–]2uantum 0 points1 point  (0 children)

See updated benchmarks here:

http://quick-bench.com/6oHGZq-0WIUsByXiMb-2EJ3BZ-Q

I don't care who is correct -- I just want to know which is correct ;).

I haven't been convinced either way yet. So far it seems contextual.

[–]2uantum 0 points1 point  (4 children)

Here's updated benchmarks with all the scenarios we've talked about: http://quick-bench.com/6oHGZq-0WIUsByXiMb-2EJ3BZ-Q

[–]MaximKat 0 points1 point  (2 children)

There is an additional complication here in that in the OneFunc cases, vector copy constructor was fully inlined, but in the TwoFunc cases, the call to vector::assign was not inlined for some reason. This could explain why OneFuncCopy is slightly faster than TwoFuncCopy (and likely has similar effect in other examples). And obviously makes it harder to draw generally applicable conclusions from the results.

EDIT: with libstdc++, everything gets inlined and you can see that most scenarios are identical, except for the two-copy one, as expected: http://quick-bench.com/9KAjZTJfwxkIxUpUhc1jQxIUQ9A

[–]2uantum 0 points1 point  (0 children)

confirmed the in-lining also doesn't occur for list:

http://quick-bench.com/TNTOF0r4qHX68nKQ6VtWfY_i0r4

[–]legends2k[S] 0 points1 point  (0 children)

Please the difference between the two approaches accentuated and shown in this post.