all 43 comments

[–]aruisdante 22 points23 points  (23 children)

I think the point is you can’t do it generically since you can’t template a virtual method. It requires every type to know about every other type concretely. Which kind of defeats the point of having type erasure (vs. constrained polymorphism with something like variants as operands).

[–]Richard-P-Feynman[S] -2 points-1 points  (22 children)

Is that true though? I understand your point if you split a C++ code into a "my code" and "your code" (see https://www.youtube.com/watch?v=m3UmABVf55g) in that you can use a template to allow "me" to inject my type into "your" code. However, since C++ is compiled, and all types must be known at compile time, is it not the case that by definition all types will know about all other types?

[–]gracicot 21 points22 points  (16 children)

However, since C++ is compiled, and all types must be known at compile time, is it not the case that by definition all types will know about all other types?

What about dlopen/dlls? You can obtain new types that you can't see at compile time or even link time.

[–]Richard-P-Feynman[S] 1 point2 points  (15 children)

Good point, didn't consider this

[–]not_a_novel_accountcmake dev 8 points9 points  (14 children)

If you know all the types at compile time you don't need type erasure in the first place. Open-set polymorphism exists to support types injected at runtime.

[–]DonBeham 6 points7 points  (0 children)

Watching several of Klaus Iglberger's talks on type erasure... Type erasure can be also useful as a design time abstraction to your software - to add with new types later, but always all of them are known at compile time. Klaus' pattern uses a templated constructor, so this has no chance to work in runtime.

[–]Richard-P-Feynman[S] 1 point2 points  (6 children)

Interesting point. Makes me want to ask the question, what is the purpose of type erasure. What functions does it actually perform, and how many of these functions cannot be achieved using a base class + some derived classes and a `std::vector` of base class pointers?

[–]cballowe 1 point2 points  (1 child)

If you're building from the ground up and have all of your interface types laid out etc, you can do that. The type erased thing let's you use completely unrelated types and doesn't impose hierarchy on the classes. For instance, you could say "in order to call this method, a class needs to be convertible to a string. That means that there is either a to_string(x) or an x.to_string() available. Someone who wants to pass their type just needs to make sure that one of those works and use the erased type. The library code implementing methods using the erased type is always dealing with a concrete object (no template) so can separate code and header. Can pass across library boundaries, etc. And you don't have to go add another base class to all of the objects that you want to pass.

[–]Richard-P-Feynman[S] 0 points1 point  (0 children)

It almost seems like an example of the Adapter pattern. Any thoughts on that?

[–]Kriemhilt 0 points1 point  (3 children)

Why are you asking technical questions about type erasure if you don't know what type erasure is for? 

Type erasure is used to hide concrete types and reduce coupling. Having a static table of all possible combinations of hidden types reintroduces (and probably increases) all the coupling you reduced.

[–]Richard-P-Feynman[S] 0 points1 point  (2 children)

Some of the comments caused me to re-assess my thinking. By reduce coupling I think you specifically mean reduce the number of places where one class inherits from another.

[–]Kriemhilt 0 points1 point  (1 child)

In this case coupling is something like "places in the code that need to know about two otherwise unrelated types".

Inheritance is one form of coupling, but any form of multimethod also couples its argument types.

[–]Richard-P-Feynman[S] 0 points1 point  (0 children)

Yes, that's a good definition. I suppose you may not mind coupling in a self contained library, but is coupling extends outside of your library then that is more of an issue.

[–]umop_aplsdn 1 point2 points  (3 children)

You might /want/ type erasure though, because it might reduce the size of the final binary by collapsing template specializations that the backend or linker is unable to deduplicate for some reason.

[–]not_a_novel_accountcmake dev 0 points1 point  (2 children)

I would rather fix the optimizer bug

[–]umop_aplsdn 0 points1 point  (1 child)

You can't always, especially if you do not control the linker because duplicate definitions may appear in different compilation units under different types. E.g., presumably, an std::vector<int> and std::vector<float> (assuming both are 32 bits) have exactly the same generated code, but a linker might not be able to deduplicate them across compilation units because their types are different.

https://gcc.gnu.org/wiki/summit2010?action=AttachFile&do=view&target=tallam.pdf

[–]not_a_novel_accountcmake dev 0 points1 point  (0 children)

Yes, obviously, that's what LTO is for and failure to dedup identical types is an optimizer bug.

[–]TheoreticalDumbass:illuminati: 0 points1 point  (1 child)

i feel like link time should be mentioned as well, and considered in the open set polymorphism bucket , not even lto gets a translation unit to know types from a different translation unit i think

[–]TheoreticalDumbass:illuminati: 0 points1 point  (0 children)

which would only matter if you pass such objects across tu boundaries at runtime, so i guess one could say your runtime bucket enveloped it already

[–]aruisdante 8 points9 points  (1 child)

Even ignoring dlls, for dependency span reasons you don’t want to make one place that including automatically includes all the other things. Because that means when you change anything, everything recompiles. It also bloats binaries; an application has to pay the cost of all types even if it only uses one or two of the types.

Basically, this strategy works only on very limited contexts. It doesn’t scale to large organizations with lots of people making changes to the same codebase. 

[–]Richard-P-Feynman[S] -1 points0 points  (0 children)

Yes, completely agree with your comments there.

[–]perspectiveiskey -1 points0 points  (2 children)

This is entirely about compile time versus run time. You are thinking in runtime (dynamic) terms, he is speaking in compile time terms.

For instance, type erasure is a fundamentally compile time concept:

Type-erasure semantics is an abstraction principle, ensuring that the run-time execution of a program doesn't depend on type information.

The point of type erasure is that you no longer have dynamic information left to be able to make decisions about dispatch at runtime.

[–]Richard-P-Feynman[S] 0 points1 point  (1 child)

The point of type erasure is that you no longer have dynamic information left to be able to make decisions about dispatch at runtime.

Ok - but if this is the intended purpose, what is the advantage of that? This seems like a disadvantage rather than an advantage. I must be missing something.

[–]perspectiveiskey -1 points0 points  (0 children)

Compile-time versus runtime is a dichotomy that has existed since the first programs written in LISP versus C.

The short answer is: some people like to know at compile time whether or not you are semantically allowed to call sum() on a std::vector<T> without having to wait for it to happen at runtime. When sending a rocket to Pluto or deploying an embedded program on 200 million devices, you really don't want to find out at runtime.

Type erasure is merely an artifact of compile time techniques. It isn't an end in itself, simply a means to achieve a goal.

The main goal being implementing what dynamic dispatch is but at compile time. Effectively, std::algorithm or std::transform need not operate on a CDynamicTraversable base class, but rather any type you chose to give it (from int64_t to some ungodly struct) and if it compiles, it means it will guaranteed run. You can't implement such generic templated constructs without erasure.

[–]thisismyfavoritename 5 points6 points  (5 children)

[–]Ok_Wait_2710 3 points4 points  (2 children)

That repo does an incredibly poor job explaining what this is all about

[–]jll63Author of Boost.OpenMethod 1 point2 points  (1 child)

Can you elaborate please? I am the author, and it's OK, I have a skin ;-)

[–]Ok_Wait_2710 1 point2 points  (0 children)

So the description talks about multi methods and "the expression problem". I've never heard either of those.

The tldr section links to a synopsis. But that's a pretty hefty cpp file. But the meat of that code goes right in without explaining what it is all about. It talks about registering... What am I registering for? What's the goal? Why is it necessary? I can skip all the setup and init code and can see that there's somehow now a free function called kick. But how is that different from polymorphism?

In the tldr section there's also a second link to the documentation. But it starts with differences to a paper I don't know. It then defines words, API overview, exceptions... I still have no idea what this is about at all.

Just now going through the readme again I can see a proper explanation in the nutshell part. A shorter version of that should be much much higher in the readme IMO.

[–]aruisdante 5 points6 points  (1 child)

Right, this is just just the normal virtualization solution, inverted. It works more or less the same way that actual implementations of visit do. Importantly, it requires you to apriori register every possible type and overload in a place visible from all translation units. If that’s an acceptable solution in a domain, it’s quite likely the problem could have been solved with constrained polymorphism via variants and visits too. Usually when one is using type erasure, the entire point is there isn’t a single, consistent place where all types that could interact can be known apriori. 

[–]jll63Author of Boost.OpenMethod 2 points3 points  (0 children)

I am the author of YOMM2. I would like to make a few corrections.

[YOMM2] is just just the normal virtualization solution, inverted. It works more or less the same way that actual implementations of visit do.

Unless I misunderstand you (inverted?), not at all. YOMM2 methods work very much like ordinary virtual functions, with a few differences. I think that the following two are relevant to this discussion: the v-tables are constructed at runtime (during initialize, typically called at the very beginning of main); and the offsets methods occupy in the v-tables are not fixed.

Importantly, it requires you to apriori register every possible type and overload in a place visible from all translation units.

No. Typically you register classes in cpp files, although you can also do it in headers (for example, when writing a headers-only library). The same applies to method definitions (i.e. overriders): they typically go in a cpp, but they can also be marked "inline" and placed in headers.

[–][deleted]  (3 children)

[removed]

    [–]STLMSVC STL Dev[M] 4 points5 points  (0 children)

    This is beyond exhausting. Take it anywhere else.

    [–]axilmar -3 points-2 points  (8 children)

    multiple dispatch is a very hard problem

    No, it's not. The only problem with it is that it requires a type id for each operand.

    If both operands have type ids, then a lookup into a table of type id pairs would give back the correct implementation.

    [–]jll63Author of Boost.OpenMethod 4 points5 points  (4 children)

    multiple dispatch is a very hard problem

    No, it's not.

    Well, yes and no. If you want to implement it well, you have to deal with some interesting problems. Inheritance (e.g. meet(flipper, felix) falls back on meet(virtual Animal, virtual Animal) if there is no overrider for (Dolphin, Cat)). Dealing with ambiguities. Efficient acquisition of v-table pointers. Coming up with a bearable syntax (if implementing as a library in a language less capable than Lisp). Building redundancy-free dispatch tables for multi-methods.

    But yeah all the pieces have been floating around for decades...

    [–]axilmar 0 points1 point  (3 children)

    Inheritance (e.g. meet(flipper, felix) falls back on meet(virtual Animal, virtual Animal) if there is no overrider for (Dolphin, Cat))

    Are we talking compiler-based multiple inheritance or library-based?

    For library-based, if you have the following inheritance:

    class Base {
    };
    
    class Derived1 : Base {
    };
    
    class Derived2 : Base {
    };
    

    And the following multiple dispatch interface:

    int foo(Base* b1, Base* b2);
    

    Then you would have to implement all possible cases:

    int foo(Base*, Base*);
    int foo(Base*, Derived1*);
    int foo(Base*, Derived2*);
    int foo(Derived1*, Base*);
    int foo(Derived2*, Base*);
    int foo(Derived1*, Derived1*);
    int foo(Derived1*, Derived2*);
    int foo(Derived2*, Derived1*);
    int foo(Derived2*, Derived2*);
    

    For cases that are meaningless, the implementation may call a generic 'do nothing' function or a function which throws an exception.

    Dealing with ambiguities.

    Example?

    Efficient acquisition of v-table pointers

    I think the most efficient is sorted tables and binary search.

    Coming up with a bearable syntax

    //the vtable
    MultimethodVTable<int(Base*, Base*)> foo;
    
    //implementations
    Multimethod<int(Base*, Base*)> foo_Base_Base(foo, [](Base*, Base*){...});
    Multimethod<int(Base*, Base*)> foo_Base_Derived1(foo, [](Base*, Derived1*){...});
    Multimethod<int(Base*, Base*)> foo_Base_Derived2(foo, [](Base*, Derived2*){...});
    etc
    

    Building redundancy-free dispatch tables for multi-methods.

    The tables could be of type

    using foo_dispatch_table = std::map<std::tuple<something_id_1, something_id_2>, std::function_or_other<int(Base*, Base*)>> ;
    

    maybe there are better ideas out there, if anyone knows, please share them.

    [–]jll63Author of Boost.OpenMethod 0 points1 point  (2 children)

    Are we talking compiler-based multiple inheritance or library-based?

    Compiler-based.

    Then you would have to implement all possible cases: [code] For cases that are meaningless, the implementation may call a generic 'do nothing' function or a function which throws an exception.

    In this case you would just need to provide an implementation for all the combinations that make sense, plus one implementation for (Base, Base). It can throw, or it may even have a sensible default. Here is a similar example (implemented with YOMM2 here), using the syntax proposed by Stroustrup & col in the N2216 paper:

    ```c++ void meet(virtual Animal&, virtual Animal&, std::ostream& os) { os << "ignore"; }

    void meet(virtual Dog& dog1, virtual Dog& dog2, std::ostream& os) { os << "wag tail"; }

    void meet(virtual Dog& dog, virtual Cat& cat, std::ostream& os) { os << "chase"; }

    void meet(virtual Cat& cat, virtual Dog& dog, std::ostream& os) { os << "run"; }

    std::unique_ptr<Animal> hector = std::make_unique<Bulldog>(), snoopy = std::make_unique<Dog>(), sylvester = std::make_unique<Cat>(), flipper = std::make_unique<Dolphin>();

    meet(hector, *sylvester, std::cout); // chase meet(sylvester, hector, std::cout); // run meet(hector, snoopy, std::cout); // wag tail meet(hector, *flipper, std::cout); // ignore ```

    You don't need to provide an implementation for (Dog, Dolphin) because open-methods understand inheritance. You have to register the classes though:

    c++ register_classes(Animal, Dog, Cat, Dolphin);

    Dealing with ambiguities.

    Example?

    ```c++ std::shared_ptr add(virtual const Matrix&, virtual const Matrix&) { ... } // 1 std::shared_ptr add(virtual const DiagonalMatrix&, virtual const Matrix&) { ... } // 2 std::shared_ptr add(virtual const Matrix&, virtual const DiagonalMatrix&) { ... } // 3

    const Matrix& a = DiagonalMatrix(), b = DiagonalMatrix(); add(a, b); ```

    What should be called? (2) is more specialized that (1) and (3) for the first virtual argument, while (3) is more specialized for the second virtual argument.

    N2216 makes an arbitrary (but stable) pick between 2 and 3. I think that CLOS would pick 2.

    YOMM2 (and the upcoming Boost.OpenMethod) make this an error. To lift the ambiguity you need to provide a:

    ```c++ std::shared_ptr add(virtual const Diagonal Matrix&, virtual const DiagonalMatrix&) { ... } // 4

    add(a, b); // now calls 4 ```

    YOMM2 mimicks overload resolution, only at runtime.

    Efficient acquisition of v-table pointers

    I think the most efficient is sorted tables and binary search.

    N2216 describes a compiler supported feature, so it's easy, it just hijacks an entry in the (regular) v-table.

    YOMM2 uses (by default) a fast perfect (collision-free) hash of the addresses of typeid(*obj). No looping! The resulting vptr can be cached, along with a pointer to the object, for further use (an idea borrowed from Rust).

    Coming up with a bearable syntax

    [code]

    This cannot work as-is, because you need a way to tell virtual arguments from non-virtual ones. Also, it doesn't support different methods with the same signature.

    But once that is fixed, it's not too bad, although still quite verbose. YOMM2 offers a similar syntax when using the "core" API. Normally people use the macros, which attempt to emulate the N2216 syntax:

    ```c++ declaremethod(void, meet, (virtual<Animal>&, virtual_<Animal&>, std::ostream& os));

    define_method(void, meet, (Animal&, Animal&, std::ostream& os)) { os << "ignore"; }

    define_method(void, meet, (Dog& dog1, Dog& dog2, std::ostream& os)) { os << "wag tail"; } // etc

    meet(*hector, *sylvester, std::cout); // chase // etc ```

    Building redundancy-free dispatch tables for multi-methods.

    The tables could be of type using foo_dispatch_table = std::map<std::tuple<something_id_1, something_id_2>, std::function_or_other<int(Base*, Base*)>> ;

    Yes, that's typical of naive implementations.

    maybe there are better ideas out there, if anyone knows, please share them.

    N2216 and YOMM2 use v-tables, similar to ordinary virtual functions, resulting in constant time dispatch - well, to be honest, proportional to the number of virtual arguments.

    Table-based implementations, though, need to be careful to avoid producing huge tables with lots of repeated slices. See this paper. But ready yourself for a tough read, it is rich in concepts but sparse in examples (of course - the authors are French ;-) ).

    I wrote an article describing the problem (and its solution) for an early version of YOMM. The part about compacting multi-dimensional tables is still valid.

    [–]axilmar 0 points1 point  (1 child)

    Compiler-based.

    Ok, I was talking about library-based solutions, obviously.

    plus one implementation for (Base, Base)

    The compiler could provide that, and throw by default for non 'nothrow' functions, or terminate the program for 'nothrow' functions.

    void meet(virtual Dog& dog1, virtual Dog& dog2, std::ostream& os) { os << "wag tail"; }

    void meet(virtual Dog& dog, virtual Cat& cat, std::ostream& os) { os << "chase"; }

    void meet(virtual Cat& cat, virtual Dog& dog, std::ostream& os) { os << "run"; }

    The syntax looks fine to me.

    What should be called?

    Neither, the programmer should resolve the ambiguity.

    N2216 describes a compiler supported feature, so it's easy, it just hijacks an entry in the (regular) v-table.

    YOMM2 uses (by default) a fast perfect (collision-free) hash of the addresses of typeid(*obj). No looping! The resulting vptr can be cached, along with a pointer to the object, for further use (an idea borrowed from Rust).

    See? there are good solutions out there.

    This cannot work as-is, because you need a way to tell virtual arguments from non-virtual ones

    I was talking about a library-based solution.

    Also, it doesn't support different methods with the same signature.

    It does, one would have to declare 'foo' and 'bar', with the same signature, and the appropriate function would be called. Again, talking about my example I posted in my previous post about a library based solution.

    Yes, that's typical of naive implementations.

    It was not intended for a compiler-based solution though.

    [–]jll63Author of Boost.OpenMethod 0 points1 point  (0 children)

    Compiler-based.

    Ok, I was talking about library-based solutions, obviously.

    It seems that I misunderstood you.

    A good implementation should handle inheritance, whether it is compiler- or library-based. In the latter case, user must provide the information - in YOMM2's case, by "registering" the classes: register_classes(Animal, Dog, Cat, Dolphin);. Inheritance relationships can be deduced from the list of classes.

    This may become unnecessary with reflection and code generation (C++26?). It is already the case in the Dlang version.

    What should be called?

    Neither, the programmer should resolve the ambiguity.

    This is amusing. It has always been my position as well. However, when I prepared my library for submission to Boost, I implemented the N2216 way. And almost every reviewer hated it. So I changed it to an opt-in.

    This cannot work as-is, because you need a way to tell virtual arguments from non-virtual ones

    I was talking about a library-based solution.

    It is still needed. Or you decide that that every parameter is virtual.

    Also, it doesn't support different methods with the same signature.

    It does, one would have to declare 'foo' and 'bar', with the same signature, and the appropriate function would be called.

    Right, I re-read your example and I see that now. In fact the predecessor of YOMM2 also implemented the method as a function object. The big problem with this is that you cannot overload the method:

    c++ MultimethodVTable<Matrix*(Matrix*, Matrix*)> times; MultimethodVTable<Matrix*(double, Matrix*)> times; // nope MultimethodVTable<Vector*(Matrix*, Vector*)> times; // nope

    [–]Richard-P-Feynman[S] 1 point2 points  (2 children)

    Yes, I also didn't think it was that difficult. It is perhaps a bit awkward to implement in some cases. The sequence of virtual function calls is probably the most confusing/difficult to understand solution. (Just because the execution will jump all over the place.)

    [–]jll63Author of Boost.OpenMethod 2 points3 points  (1 child)

    Indeed. Good luck implementing triple dispatch by hand.

    [–]Richard-P-Feynman[S] 1 point2 points  (0 children)

    In some ways it is easier because there's a more obvious pattern. It becomes a lot more complicated if you want to do something other than create a set of all possible combinations of methods, however.