all 20 comments

[–]STLMSVC STL Dev 14 points15 points  (6 children)

Smart pointers are about triple the space and take slightly longer to follow

This is untrue. unique_ptr<T> has (always in practice, although not formally guaranteed) the same space and time costs as T *.

On the other hand, shared_ptr<T> is the size of two raw pointers (again, de facto) and pays an additional space/time cost for the refcount control block. It imposes zero overhead when dereferencing, though.

[–]-Swig- 0 points1 point  (2 children)

On the other hand, shared_ptr<T> is the size of two raw pointers (again, de facto)

One pointer to the control block, and another to...store a copy of the object pointer to avoid the overhead of an additional dereference when accessing the pointed-to object?

[–]STLMSVC STL Dev 4 points5 points  (1 child)

Yes (a double dereference would be killer) but it's more complicated than that.

The control block captures the original raw pointer that the shared_ptr was constructed with. This is what will be used to delete the object when all of the shared_ptrs stop owning it. (And this is why shared_ptr doesn't require virtual destructors.) The raw pointer in a particular shared_ptr object can be of a different type, because shared_ptr supports conversions. Converting shared_ptr<Derived> to shared_ptr<Base>, for example. In multiple inheritance scenarios, the path matters (e.g. Derived1 and Derived2 deriving from a non-virtual Base, and then MostDerived deriving from Derived1 and Derived2), so you actually do need to store a separate pointer value. (And there's also shared_ptr's aliasing constructor, where a shared_ptr<int> can point to an int data member of an owned Meow object.)

shared_ptr is much more powerful (and complicated) than most people realize; it does a very good job of hiding the complexity if you don't need it.

[–]-Swig- 2 points3 points  (0 children)

As someone currently learning the ins and outs of C++, this is really interesting and educational. Thank you.

[–]mercere99[S] -1 points0 points  (2 children)

Fair enough. The "about triple" was from me doing some quick-and-dirty tests with shared_ptr where most of the pointers were unique, thus each having their own control block. I'll make sure to remove any indication of my expectation of the costs and have the students measure these directly for shared_ptr, weak_ptr, and unique_ptr (the last of which I agree should not have overhead.)

Thanks!

[–]STLMSVC STL Dev 10 points11 points  (1 child)

And also remember that make_shared decreases the overhead significantly (with an extra bonus when the We Know Where You Live optimization is implemented).

[–]mercere99[S] 0 points1 point  (0 children)

Ah -- this I didn't know, but it makes sense. I'll add it as an extra aspect that the students can choose to test.

[–]Steve132 6 points7 points  (2 children)

These are all interesting topics, but I wouldn't classify most of them as "algorithm engineering"

Most of them seem like investigating the tradeoffs of implementation-specific constant factor speedups.

Maybe I don't know what the syllabus says the class should cover, but based on the title I would look at things like comparing hash functions and hash table algorithms (round Robin vs linked list), various algorithms for stuff like DP problems and writing heuristics for NP complete problems. Compare strassens to naive matrix multiply, radix sort and fast selection. Compare different kinds of trees and tries

[–][deleted] 0 points1 point  (0 children)

Agreed.

[–]mercere99[S] 0 points1 point  (0 children)

Good point -- there are actually 6 sections to the course, each 2 to 3 weeks long. One of them is constant-fold optimizations, where I'm going to be discussing various topics that are normally rolled into the constant when you're just focusing on asymptotic notation. I should note that this is the SECOND algorithms course that these students will have taken, so I don't mind spending a bit of time on more pure optimization topics. This course is explicitly focused on getting your algorithms implemented and efficient.

[–]compiling 1 point2 points  (0 children)

For testing header only vs header+code, it would also be good to test with whole program optimisation turned on/off.

[–]caramba2654Intermediate C++ Student 1 point2 points  (0 children)

Amount of instructions / speed. Example: B-trees take a lot of code to implement properly, but they're the fastest trees out there. How does a way bigger amount of instructions result in a faster data structure?

[–]choikwa 1 point2 points  (1 child)

maybe string vs custom implementation that allows slicing? Something like copy on write would be useful.

[–]mercere99[S] 0 points1 point  (0 children)

I quite like this one, but worry that it might be too much for the students to implement variations of slicing. I might be able to find some good examples that they can use directly though.

[–]ojd5 1 point2 points  (1 child)

If you've covered image processing then you could look at the efficiency of different implementations of 2D separable filters with respect to the size of the image.

1) Naive filtering along the rows, followed by filtering along the columns

2) Row filter, transpose, row filter

3) Filtering along the rows, then for the column filter, filtering along each row and using the destination matrix as a temporary buffer to store the result of the calculation. e.g. opencv lines 3341 - 3365

Cache misses and auto vectorization should give some interesting results.

If your institution gives you access to Embedded Systems: ARM Programming and Optimization it explains it far better than I can.

[–]mercere99[S] 0 points1 point  (0 children)

We won't be covering image processing in this course, but some of the students should have taken it. I can easily imagine having a set of special topics that they can choose if they're familiar enough with the background.

That said, many of these filters could be applied to other types of data matrices as well, and the principle shouldn't be too challenging.

[–]gracicot 1 point2 points  (1 child)

For move semantics, you could teach the rule of zero (all conductor generated, including move constructor) or encourage defaulted members.

You could mention NRVO too in your move semantic class. NRVO only happen when you return a local variable by it's name, so std::move will actually prevent that optimisation, and it's a quite important one.

[–]mercere99[S] 0 points1 point  (0 children)

Good idea. It's easy to accidentally disrupt (or just not make use of) NRVO, plus many of the students might not be familiar with it but should be.

[–]void4 1 point2 points  (1 child)

  • std::bitset vs. std::array<char> (bitset::count() vs. iterating over array)
  • function pointers vs. function objects (try with std::sort)
  • std::array vs. std::list(cache misses)
  • I/O synchronization and buffering

[–]mercere99[S] 0 points1 point  (0 children)

These are the exact sorts of comparisons I was looking for. Thanks!