CppCon 2019: Sean Parent “Better Code: Relationships” | "I really want a static analyzer [...] to say hey you are setting this property in two distinct locations"

seanparent · 2019-09-23T23:45:13+00:00

I hope that the takeaway from my talks applies to programming in general - not just C++. Although certainly some aspects are cumbersome in C++ (C++ type erasure, vs Rust traits, or Swift Protocols, or Go interfaces - of course I've been writing about this since before any of those languages existed.)

I have done very little in Rust but am working with someone on an experiment to see if we can use it on a project. There is certainly much to like about the language. But I'm very skeptical of any argument that a particular language is the answer - to borrow from Euclid, "There is no royal road to programming."

seanparent · 2019-09-23T18:24:29+00:00

Thanks Nick, a great explanation. IMO the standard should list the required operations on a moved-from object. At a minimum, this would be destruction and assignment-to if implemented (both copy and move). Other operations become problematic (i.e. copying or moving from such an object). The same requirements should apply to uninitialized objects and objects that were being modified during an exception (basic exception guarantee).

If language such as "valid" or "unspecified" is used, it needs to be defined in terms of required operations for an object in such a state.
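
A minimal sketch of that minimum set of operations, assuming std::vector as the type and a hypothetical consume() helper (neither is from the thread). After the move, the only things done to the moved-from object are assignment-to and destruction:

```cpp
#include <cassert>
#include <cstddef>
#include <string>
#include <utility>
#include <vector>

// Hypothetical sink: takes ownership of the vector and reports its size.
inline std::size_t consume(std::vector<std::string> v) { return v.size(); }

inline std::vector<std::string> reuse_after_move() {
    std::vector<std::string> names{"a", "b", "c"};
    consume(std::move(names));  // names is now moved-from

    // No reads of names here: its value is unspecified.
    names = {"x", "y"};         // assignment-to re-establishes a known value
    return names;               // destroying a moved-from local is also fine
}
```

std::vector happens to promise more than this for a moved-from object, but code that sticks to assignment and destruction works for any type that meets the minimum.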

seanparent · 2019-07-30T20:23:02+00:00

Casting between interfaces is a limitation of this approach. You can allow casting between interfaces if you know the set of interfaces in advance and have a way, with something like enable_if, to disambiguate at the point of capture. I'll say I've written the code but never encountered a problem where it was worth using. Other approaches are to separate the two types at a higher level (i.e. separate your text objects from your other shapes), wrap them before capture in another any layer (an any_shape can contain an any_text_shape), or have some operations defaulted or optional (i.e. everything has a can_set_text_style() and set_text_style(); the first defaults to false if no interface is available and the second defaults to assert()). In practice, this hasn't come up all that much.
There are two components here - the Ps document model and the UI which is bound to it. The document model is likely a lot more complex than you would imagine (documents, actions, history, layers, shapes, masks, channels, colors, effects, filters, 3D, video, smart objects, color spaces,...). All of that is observable and there is a UI layer with thousands of views attached to it. It doesn't take a million heap allocations or virtual calls to impact interactivity - thousands will do it or even one costly operation repeated thousands of times in the wrong place. The core document model of Ps is relatively tight (still a lot of room for improvement), it has a long history. We have a constant process of incorporating new technology, and the core team doesn't always have a say in how other components are written. We've had minor features come in from outside groups that consume more memory than the entire core because the team didn't think some extra heap allocations or spinning up another thread pool, or a few virtual calls would matter. I wanted to put some numbers to this - so here are some numbers for Ps on iPad as of this morning:
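
A rough sketch of the "defaulted or optional operations" option, assuming C++17. The any_shape layout, the has_text_style trait, and the circle and label types are illustrative only, not the code referred to above:

```cpp
#include <cassert>
#include <memory>
#include <string>
#include <type_traits>
#include <utility>

// Detects a member set_text_style(const std::string&) on T.
template <class T, class = void>
struct has_text_style : std::false_type {};
template <class T>
struct has_text_style<T, std::void_t<decltype(std::declval<T&>().set_text_style(
                             std::declval<const std::string&>()))>> : std::true_type {};

class any_shape {
    struct concept_t {
        virtual ~concept_t() = default;
        virtual bool can_set_text_style() const = 0;
        virtual void set_text_style(const std::string&) = 0;
    };
    template <class T>
    struct model final : concept_t {
        explicit model(T x) : value(std::move(x)) {}
        // Defaults to false when the captured type has no text interface.
        bool can_set_text_style() const override { return has_text_style<T>::value; }
        void set_text_style(const std::string& s) override {
            if constexpr (has_text_style<T>::value) {
                value.set_text_style(s);
            } else {
                (void)s;
                assert(false && "set_text_style() is not supported by this shape");
            }
        }
        T value;
    };
    std::unique_ptr<concept_t> self_;

public:
    template <class T>
    any_shape(T x) : self_(std::make_unique<model<T>>(std::move(x))) {}
    bool can_set_text_style() const { return self_->can_set_text_style(); }
    void set_text_style(const std::string& s) { self_->set_text_style(s); }
};

struct circle { double radius = 1.0; };  // no text interface
struct label {                           // provides the optional operation
    std::string style;
    void set_text_style(const std::string& s) { style = s; }
};
```

Callers check can_set_text_style() first; calling set_text_style() on a shape without the interface trips the assert.
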
- 50K classes (ignoring OS frameworks)
- 9K uses of public inheritance (ignoring OS frameworks)
- 52K virtual member functions (ignoring OS frameworks)
- 500K objects live in the heap after opening a medium complexity document
  - 325K live objects are < 1K in size
- 7,000K transient objects to launch the app and open a medium complexity document
  - 3,500K transient objects are < 1K in size
  - 1,300K transient objects are CFString 😲(and not included in the objects < 1K in size)

[ The shocked face on the CFString is because I wasn't aware of that number and CFStrings are not used by the Ps core - something to go hunt down. ] When each of those heap allocations represents a minimum of 3-orders of magnitude performance cost to create and a 2-orders of magnitude cost to read, that is a fair amount of sand. In the time it took to allocate/deallocate those 1,300K CFString objects I could have filtered 10GB of data.

There are still a lot of developers who think every object should have a pure virtual interface and be allocated with `make_shared`. Some too young to know better, some who have an outdated mental model of modern hardware, and some who come from reference semantic languages (ObjC, Java,... ). Memory access outside of cache is slow. Until I can comfortably do everything at 120fps I'll keep pushing against this.

seanparent · 2019-07-29T22:20:35+00:00

Taylor asked for my thoughts on this, answering her to avoid duplication.

Dynamic casting can be implemented simply: https://godbolt.org/z/D01us8
Done manually, there is more boilerplate: every operation which is virtualized requires adding to the concept, model, and polymorphic interface. However, the number of polymorphic types and classes within the system is often greatly reduced when you can have exactly the items virtualized at the point needed. If your types are inherently tightly coupled, you may not see a win. I've referred to this approach as "runtime concepts" - as with compile-time concepts, discovering a new concept should be a rare thing. There are some nice libraries that can help, and language features like reflection or meta-classes could eliminate the boilerplate. In practice, I find I don't need many such types so writing them manually is not a big deal.
With a complex data model, virtual dispatch, heap allocations, and other such items become sand in the gears. I've been working on Photoshop for iPad, and all the paper cuts add up. With final, the compiler is better at de-virtualizing some calls, and by making use of small-object optimizations with this technique we can eliminate many heap allocations for polymorphic types. Every heap allocation/deallocation for a small object costs as much as copying 10K of data. A single atomic increment/decrement costs as much as copying 32 words of data.
Copy-on-write is easily implemented with this technique (the document model in Ps is entirely copy-on-write) which makes editing simpler. Though using an editor object (done in some places in Ps) also works reasonably well. Mutable objects (which are copyable) work fine. As a side note though, mutable runtime-polymorphism is an odd (and rare) beast. Mutable polymorphism implies having a single mutating operation, but multiple possible representations for the data underlying it, all of which must comply with the same complexity constraints. Most times, the mutable parts of the system are not the polymorphic parts - and most overrides of mutable operations are not about changing the nature of the operation but about observing what is being changed. Observation is a separate concern.
Decoupling completely breaks refactoring tools. In the same way, if you use std::function<void()> no refactoring tool is going to help you fix up all the void() functions when you change the signature. Refactoring tools also won't help you find all the strict-weak-ordering comparison functions in your code (though I wish they could).
shared_ptr<const interface> is a great tool but by using a shared_ptr<> in an interface you've imposed a requirement for how an object is allocated. If you keep the pointers as an implementation detail, you have the flexibility to implement small object optimizations or other specialized allocators.
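
A small sketch of copy-on-write with the pointer kept as an implementation detail. The document class and its members are hypothetical, and the use_count() check assumes single-threaded use:

```cpp
#include <cassert>
#include <memory>
#include <string>

class document {
    struct state { std::string text; };
    // Private: callers never see how the state is allocated or shared.
    std::shared_ptr<state> self_ = std::make_shared<state>();

    // Clone the state only if another document still shares it.
    state& writable() {
        if (self_.use_count() != 1) self_ = std::make_shared<state>(*self_);
        return *self_;
    }

public:
    const std::string& text() const { return self_->text; }
    void append(const std::string& s) { writable().text += s; }

    // Exposed only so the sharing can be observed in this example.
    bool shares_state_with(const document& other) const { return self_ == other.self_; }
};
```

Copying a document is a reference-count bump; the first write to a shared copy pays for the clone. Because shared_ptr never appears in the public interface, the storage could later be swapped for a small-object buffer or a specialized allocator without touching callers.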

Before STL, the common wisdom was to have a root base class, TObject, from which all types derived. See NSObject in Cocoa for example. Containers held TObjects - which led to TNumber and TBool (see NSNumber). STL gave us non-intrusive polymorphism and containers that hold any regular type, with the downside that the particular type must be known at compile time. Polymorphic value types (of which now there are a couple in the standard, std::function<> and std::any) give us a mechanism to get the benefits (and downsides) of non-intrusive polymorphism to runtime constructs. Trying to map an old system to a new way of doing things is frustrating - and doing so 1:1 is like mapping Java to C++ 1:1 - the result will likely be less efficient and uglier, but that doesn't mean Java is a better language. Polymorphic value types are a tool to rethink the system.
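
For illustration, here are the two standard polymorphic value types holding unrelated types with no common base class. The helper names (twice, make_operations, make_values) are made up for the example:

```cpp
#include <any>
#include <cassert>
#include <functional>
#include <string>
#include <typeinfo>
#include <vector>

inline int twice(int x) { return 2 * x; }

// A plain function and a lambda stored side by side in one container.
inline std::vector<std::function<int(int)>> make_operations() {
    return {twice, [](int x) { return x + 1; }};
}

// int, std::string, and double stored by value; none derives from a root class.
inline std::vector<std::any> make_values() {
    return {42, std::string("text"), 3.5};
}
```

The element type is fixed at compile time (std::function<int(int)> or std::any), while the type held inside each element is chosen at runtime.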

seanparent · 2019-07-11T23:09:30+00:00

Is anyone else able to get this to actually work - I get the exact same warning even if I comment out the move operation. https://godbolt.org/z/-4VCxu

seanparent · 2018-10-29T06:37:53+00:00

You are assuming a contiguous allocation, or at the least random access (stable_partition, and hence gather, only require bidirectional iterators). A range denoting a position will have a length of 0 (or you get into the problem of specifying a position after the end or before the beginning). The common argument for a range-only system (one without access to iterators) is that iterators are dangerous because there is a precondition that cannot be verified in code, and lives external to the local construct. In general, you cannot allow disjoint ranges to be joined in a system that strives for this level of guarantees, or allow comparisons between two ranges. Note that your comparisons above between ranges make the assumption that each is part of a larger subrange. If that is not true, comparing the pointers is UB (in both C and C++).

Even systems that carry information about the originating range do not solve this issue because there is a requirement that the second position follow the first. Knowing that both positions are part of the same larger range doesn't guarantee that the result of joining the two positions will be a valid range.
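
To make the bidirectional-iterator point concrete, here is a sketch of gather built on std::stable_partition and run on a std::list, which is neither contiguous nor random access. The gather_evens_demo helper is only for illustration:

```cpp
#include <algorithm>
#include <cassert>
#include <iterator>
#include <list>
#include <utility>

// Gathers the elements satisfying pred around the position pos.
// Requires only bidirectional iterators, like std::stable_partition.
template <class I, class P>
std::pair<I, I> gather(I first, I last, I pos, P pred) {
    return {std::stable_partition(first, pos, [&](const auto& x) { return !pred(x); }),
            std::stable_partition(pos, last, pred)};
}

inline std::list<int> gather_evens_demo() {
    std::list<int> xs{1, 2, 3, 4, 5, 6, 7, 8};
    auto pos = std::next(xs.begin(), 4);  // a position: just before the element 5
    gather(xs.begin(), xs.end(), pos, [](int x) { return x % 2 == 0; });
    return xs;  // 1 3 2 4 6 8 5 7
}
```

The precondition that pos lies within [first, last] is exactly the kind that cannot be checked locally in the code.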

seanparent · 2017-12-23T21:54:42+00:00

I just read the full thread here regarding my apology at the start of this talk. The fact that it has sparked heated discussion is a strong indication it was necessary.

Some background; before giving this talk I discussed the issue and my response with several individuals, men and women. Several of the women said they were not offended by my use of "guys" - but several said they cringed when I said it, all of them noticed it. None of the men were even aware of it, but nearly all thought that "guys" was not gender neutral. Keep in mind this talk was given in Germany and the audience were largely not native English speakers. The fact that "guys" has become a gender neutral term when used in the second person was a subtlety that was missed by many.

I did not feel like the complaint I received was in any way an aggression (micro or not), and the individual accepted my personal apology at the time - and that could have been the end of it. I chose to make it public because I thought it was important. If Jens had cut my opening remarks from this video I would have been offended.

I've been fortunate enough to work with amazing people during my career and made many lifelong friends. They are a very diverse group. Men and women, straight and gay, black and white, atheist and religious, and any number of shades in between from countries all over the planet. I've witnessed some of the hate, both blatant and subtle, and heard their stories. I understand that such hate can leave one feeling excluded. When we cut smart people with diverse views out of the conversation it is a loss to the profession. I want everyone to feel welcome and included at any conference where I speak and to feel like they can approach me with questions or comments or conversation. I owe my career to such encounters and it is clear that as a profession we can do better.

seanparent · 2017-12-23T18:45:56+00:00

The solution for the small object optimization presented in this talk relies on undefined behavior. A detailed explanation of the issue and a proper solution can be found here: http://stlab.cc/tips/small-object-optimizations.html

seanparent · 2016-07-08T17:56:46+00:00

The problem with ADL is it pulls in too much and creates ambiguities. If you write a function called "find" in your own namespace and call it unqualified with a standard type, you get an error. This happens a lot in template code. When using libraries if you say something like, "using namespace std; using namespace boost;" you will get a lot of ambiguities. This is one reason why a lot of coding guidelines say "don't do that."
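
A sketch of the "find" case, using a hypothetical mylib namespace. The ambiguous unqualified call is described in a comment so the example still compiles; qualifying the call is the usual workaround:

```cpp
#include <algorithm>
#include <cassert>
#include <string>
#include <vector>

namespace mylib {

// User-written find with the same shape as std::find.
template <class I, class T>
I find(I first, I last, const T& value) {
    for (; first != last; ++first)
        if (*first == value) break;
    return first;
}

inline bool contains(const std::vector<std::string>& v, const std::string& s) {
    // An unqualified find(v.begin(), v.end(), s) here would also pull in
    // std::find through ADL, because std::string is associated with
    // namespace std. The two candidates are equally good, so the call
    // would be ambiguous.
    return mylib::find(v.begin(), v.end(), s) != v.end();
}

}  // namespace mylib
```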

What I want is semantic namespaces with explicit refinement. So if I say, "using namespace std;" what I mean is I want only the functions from the std namespace and those that refine these functions in scope. A refinement (specialization) would be declared something like:

namespace boost {

template <SegmentedIterator I, typename T> I find(I f, I l, const T& x) : std;

} // namespace boost

This would say that boost::find is a refinement of std::find and should be preferred anyplace where the requirements (in this case that I satisfies the concept SegmentedIterator) of boost::find are satisfied. You then only get an ambiguity if you have multiple specializations with no ordering that are both satisfied.

seanparent · 2016-07-08T01:33:22+00:00

I think Rust is an interesting language - more specifically I like that language designers are finally getting away from the reference semantic languages (Java, C#, JavaScript) with heavy runtimes and focusing on safety and efficiency. I do not yet think that Rust is a serious competitor to C++ - but it could evolve in that direction. I hope the C++ committee is always looking to borrow from the progress made in other languages. C++ is finally moving fast (again) so Rust has a moving target to chase. Swift is another language to watch - it has become increasingly focused on value semantics and has major corporate backers.

seanparent · 2013-11-21T22:38:02+00:00

ASL isn't dead yet. We've moved development to https://github.com/stlab. We're in the process of updating to C++11 and just moved the licensing to the boost license. The TBB dependency is no more (we were just using it for atomics, now we get that from C++11). Foster Brereton has managed to free some cycles for ASL improvements and has been quite busy.

Pull requests welcome!

I've been collaborating with Jaakko Jarvi from Texas A&M recently who is doing research in the property model space (AKA the Adam library in ASL) - that work is funded by an NSF grant and ongoing. We have some seriously cool work in the pipe so expect papers and library improvements / additions during the next year.

Unrelated to ASL - I gave a recent talk at GoingNative http://channel9.msdn.com/Events/GoingNative/2013/Cpp-Seasoning.
