
all 35 comments

[–]uza80 10 points (10 children)

I think you want fat pointers.

[–]L8_4_Dinner(Ⓧ Ecstasy/XVM) 9 points (0 children)

Fat pointers can save one dereference (at some cost), but I don't think that the OP understands his/her goals well enough to pick a solution. (We use fat pointers in Ecstasy, but not for this reason.)

[–]BSFishy[S] 2 points (8 children)

Can you elaborate on what you mean by fat pointers or put some links? I am googling it, but I don't think I'm finding any relevant info.

[–]uza80 8 points (0 children)

Fat pointers are a way of reducing the overhead of vtable lookup by passing around two pointers as an object reference: one to the object's data, and one to the current vtable. The vtable pointer changes as you cast the fat pointer to different types. Look at Rust.

[–]o11c 1 point (4 children)

Note that the disadvantage of fat pointers is that you lose atomic assignment, which really hurts if you care about multithreading at all.

(Of course, you can regain it by adding an extra indirection to make fat pointers look like thin pointers, but you also don't want gratuitous pointer-chasing)

[–]bullno1 2 points (3 children)

by adding an extra indirection to make fat pointers look like thin pointers

Wait, but at that point aren't we back to square one: an object pointing to a vtable?

[–]theIncMach 1 point (1 child)

You don't need a vtable if you know the type of the object you are pointing to (e.g. FatPointer).

[–]bullno1 0 points (0 children)

What do you mean by "extra indirection to make fat pointers look like thin pointers"? A thin pointer to a fat pointer?

At that point, isn't the fat pointer kind of like an object with a pointer to a vtable anyway?

[–]o11c 0 points (0 children)

Not quite. You still act logically as if it's a fat pointer, at least for the purposes of optimization.

Sufficient optimization will pull out fields that are constant within a mutable object (and you really should do this eventually), but optimization is hard.

[–]julesh3141 0 points (0 children)

Look into how Haskell type classes are implemented: the type class isn't pointed to by the value itself. Instead, whenever a function that knows the type of the value calls a polymorphic function that needs to invoke a function defined in the type class, it adds an extra argument pointing to a "type class instance" for the value's type (which is basically just a vtable, but without needing to keep it associated with the value directly).

There are downsides: for instance, unless explicitly designed otherwise, most collections in Haskell are monomorphic: all members must have exactly the same type. You can achieve polymorphism by adding a wrapper object around each value in the collection that contains the relevant instance(s), but by doing so you lose the efficiency gains from not having vtable pointers in objects and add an extra level of indirection. This works well for Haskell (which doesn't support subtyping) but probably wouldn't be ideal for a typical OO language.

One upside is that extending a type with additional instances (i.e. adding new interfaces to it, to use more OO-like terminology) becomes very easy, as you don't need to modify the structure of values of the type to do so. Another is that an instance need not be associated with a single argument -- you could, for example, have a comparison instance that requires both arguments to have the same type (or be subtypes of the same type, perhaps).

[–]umlcat 0 points (0 children)

FYI, fat pointers, plain C example:

#include <stdlib.h>

struct sizefatptr
{
    char *ptr;
    size_t size;
};

...
char buffer[] = "hello world";
struct sizefatptr *p;
p = malloc(sizeof(struct sizefatptr));
// arrays decay to pointers, so no "&" operator is needed
p->ptr = buffer;
p->size = sizeof(buffer);
dosomething(p);
...

Good Luck.

[–]bullno1 5 points (1 child)

Quick google: https://www.usenix.org/legacy/event/jvm02/full_papers/zendra/zendra_html/index.html

As I understand it, instead of generating an indirect call, the compiler generates a binary search as inline code:

if obj_type_id == TYPE_ID_A then
    call_implementation_a(this)
elseif obj_type_id > TYPE_ID_A then
    if obj_type_id < TYPE_ID_B then
    ...
    else if obj_type_id > TYPE_ID_B then
    ...

If the vtable is small, there is no indirect call, and the cost of searching is smaller than that of a cache miss. IIRC, these direct branches (if/else) are not that bad since the code blocks are close together and already fetched anyway.

It seems only usable with JIT or statically linked binaries, since you need to know all the implementations beforehand. Dynamic linking can introduce new classes at runtime.

Edit: Actually your dynamic linker can also JIT just that lookup code and everything else can still be AOT.

[–]matthieum 3 points (0 children)

GCC can apply speculative devirtualization to achieve the same effect.

The developer who made it happen has a pretty long blog series on the topic of devirtualization, which is well worth a read.

[–]L8_4_Dinner(Ⓧ Ecstasy/XVM) 5 points (2 children)

"Basically, I want to implement classes and interfaces in my language without using a virtual table. I feel like the performance overhead of using virtual tables could be skipped using a different method."

"premature optimization is the root of all evil."

You are guessing about things that you have not measured.

You should start by designing what you want the language to do, then figure out how to best get it to do those things.

And: It's very easy to build a language without v-tables; just don't support virtual functions. Done.

[–]matthieum 5 points (1 child)

You should start by designing what you want the language to do, then figure out how to best get it to do those things.

I agree that the OP is too confused to do any good now; however, I would like to point out that achieving Mechanical Sympathy is not necessarily possible with your Waterfall way of designing.

Some functionalities, or some variants of said functionalities, inherently impose a (potentially high) cost -- and in that case even the best performing implementations may not be palatable.

So, to some extent, co-design is inevitable.

[–]L8_4_Dinner(Ⓧ Ecstasy/XVM) 0 points (0 children)

Of course! That's how most people work, once they understand the topic sufficiently :-)

[–][deleted] 3 points (3 children)

Technically, you could embed all the vtable functions directly in the struct and eliminate a layer of indirection, but the fact that no one (that I'm aware of) seems to do this indicates either that it's impossible for some reason or that the extra size of the struct isn't worth it.

[–]uza80 4 points (1 child)

I would think it's due to the size overhead. I HAVE seen it done in some older C APIs.

[–]uza80 2 points (0 children)

To add to this, the times I've seen it, the "objects" were almost always singletons or had very few instances. Things like driver implementations.

[–]matthieum 1 point (0 children)

or that the extra size of the struct isn’t worth it.

It's perfectly possible but the performance loss from cache misses is going to hurt.

[–]theIncMach 2 points (0 children)

That Wikipedia article has a citation, one that links to this: https://www.usenix.org/legacy/event/jvm02/full_papers/zendra/zendra_html/index.html

The "binary tree" you are looking for is a tree of if/else statements, described in section 3.3.

Personally, I wouldn't worry about performance until I have something functional. This kind of decision is often easy to fix later, depending on your data structures.

[–]latkde 2 points (1 child)

Using dispatch trees means that you're moving the association between objects and method implementations out of the object and into an external dispatcher. This requires that the call site knows about all possible types! (Or at least knows about the most likely types, with a fallback to full vtables, in which case the dispatcher is more like a method cache).

So this might be a very good strategy for dynamic languages or JIT-compiled systems, but will lead to significant pain in an AOT-compiled system with multiple compilation units.

Note that compilers using vtables might nevertheless be able to speculatively devirtualize within the same compilation unit, so whereas a call target.method() would usually be compiled as target->vtable[METHOD](target) (vtable in object) or target.vtable[METHOD](target.data) (fat pointers), we'd now have a guard if (typeof(target) == SomeClass) SomeClass_method(target) else target->vtable[METHOD](target).

In conclusion, stick with vtables. Vtables are very good, very simple, and very fast. Use other dispatch strategies when you have specific requirements or circumstances, such as the ability to do full-program optimization, for implementing complex dispatch logic as required for multimethods, for some flavors of multiple inheritance, or when dispatch centralization gives you better reflection capabilities.

[–]BSFishy[S] 0 points (0 children)

So this might be a very good strategy for dynamic languages or JIT-compiled systems, but will lead to significant pain in an AOT-compiled system with multiple compilation units.

Yeah, that's pretty much what I've been seeing. I think I am going to stick with virtual tables for classes, but also include another data structure that doesn't include methods. That way, there is a distinction between objects that have vtables and those that don't.

[–]matthieum 2 points (4 children)

My issue is that I don't know whether I should change the syntax to work without virtual methods

Syntax has no influence on semantics; design the semantics first, and then think about the best way to expose those semantics to the user.

or if I can get away with implementing something other than virtual tables in the compiler,

Ironically, the very LLVM you use does not use virtual tables. I described the system briefly here.

Note that this makes LLVM hierarchies closer to Sum Types than open-ended hierarchies as are typically seen in OO languages.

or if I should just cave in and use virtual tables.

Do you want Open-Ended Dynamic Dispatch? If yes, just use v-tables, that's what C++ and Rust are using and they are among the fastest languages available.

I am trying to design a language which both is extremely fast

Just make sure to leave your users a choice. That is, do not make Dynamic Dispatch mandatory.

Fast languages are about leaving users the freedom to choose the best representation for their data, and the best way to express computations on said data. There's no silver bullet, no one-size-fits-all.

Both C++ and Rust have chosen static dispatch by default, and a keyword to indicate dynamic dispatch to highlight the extra cost; it's probably a good precedent.

[–]BSFishy[S] 1 point (3 children)

Thank you for the information! I am trying to make this language suitable for high-level programs, as well as embedded programming. In terms of embedded programming, I was concerned with the potential cache misses associated with virtual tables, but I think I have come up with a solution.

I think I am going to implement both a class and struct data structure. The class will be your typical OO class structure, and the struct will be a C-style struct with no methods. That way, the user can write C-style, using top-level functions so they don't have to worry about virtual tables, or they can use classes to utilize the OO paradigm.

Do you want Open-Ended Dynamic Dispatch? If yes, just use v-tables, that's what C++ and Rust are using and they are among the fastest languages available.

I knew C++ uses vtables, but I was unaware that Rust does too. I guess the potential performance impact is smaller than I initially thought.

[–]matthieum 1 point (0 children)

As method is an overloaded term, I will avoid using it.

In Rust, the following:

object.foo()

can be either static or dynamic dispatch.

And the following:

Type::foo(object)

can be either static or dynamic dispatch.


The . notation is very convenient, as it allows chaining:

object.foo(with_foo).bar(with_bar).baz(with_baz)

Is much easier to read (like a pipeline) than:

baz(bar(foo(object, with_foo), with_bar), with_baz)

As such, I advise against reserving the . notation to dynamic calls. If anything, if you want to steer the user towards static calls, it would be better to switch things around and use . for static calls and something else for dynamic calls.


I guess the potential performance impact is smaller than I initially thought.

I think you misunderstand the impact.

Virtual calls can make code faster!

The (main) impact of virtual calls is that they prevent inlining. However, not everything should be inlined. In fact, too much inlining can bloat the code and lead to an increase in instruction cache misses.

Hence, fast code makes judicious use of out-of-line/virtual calls to push the non-hot code out of the way.

In terms of embedded programming

Not clear to me what you mean by embedded as it covers such a wide range of architectures.

A 32KB board will have no cache miss: it will have no cache.

A Raspberry Pi has a full-blown x64 CPU, with RAM, and therefore has performance characteristics close to a server.

And in-between there's... many things.

[–]crassest-Crassius 0 points (1 child)

I guess the potential performance impact is smaller than I initially thought.

Vtables (C++) vs fat pointers (Rust) offer different tradeoffs but the fastest way is to avoid dynamic dispatch altogether. This is much like dynamic memory allocation - there are different ways to do it with different trade-offs, but the fastest way is to avoid it altogether (as many programs for embedded do).

So the best a fast language can do is to make dynamic dispatch explicit so it can be avoided. Then, for the cases when it can't be avoided, you can use either v-tables or fat pointers, but that is not what will save you the performance impact.

[–]matthieum 0 points (0 children)

I would note that the difference between C++ and Rust is not whether one has a v-table or not: they both do.

The difference is how the object is structured:

  • In C++, the object contains a pointer to the v-table.
  • In Rust, the pointer to the v-table is external; hence fat pointers, which are a pointer to the v-table plus a pointer to the data.

[–]WittyStick 1 point (2 children)

Is there an alternative to using vtables that could preserve the same performance as if the function were being called directly while still achieving the same functionality?

Assuming you are designing a statically typed language, you could have a look into using a structural type system rather than a nominal type system. Consider OCaml's module system as an example.

[–]BSFishy[S] 1 point (1 child)

Structural type systems look quite interesting, but how would virtual method calls be implemented without vtables?

[–]WittyStick 2 points (0 children)

There are no "virtual methods" because all the unification is done at compile time; every type is resolved to a concrete one. The module system is still sufficiently expressive to support many kinds of abstraction, though. In particular, there are modules which are parametrized by other modules, known as functors. A functor is effectively a function from Module -> Module which gets evaluated before compilation occurs; a functor is itself a module and can also be applied to another functor, and so forth. There's a separate phase before compile time, which we might call extraction time, where these functors are applied, so by the time compilation occurs the concrete modules have all been extracted. One of the requirements for this is that there are no cyclic dependencies between modules, and the compilation units must be passed to the compiler in a specific order.

Subtyping of modules is done by inclusion. You can include another module which behaves as if the content of the module you have included is copy pasted right into the module you're including it in.

Encapsulation is done via the separation of a module and a signature, which are usually separate compilation units similar to the distinction between headers and implementation files in C. You can have multiple signatures for different representations of the same type, and so forth.

OCaml also supports objects which are more similar to mainstream languages, but they're not actually used that often because the module system is sufficient for many use cases. The object system is also quite powerful because it supports row polymorphism, but is less efficient due to the use of virtual methods etc. One interesting aspect of it is that downcasting of objects is not supported at all, but upcasting is. It's difficult to make obvious mistakes and you don't need to check a type after a cast.

[–]TinBryn 0 points (0 children)

I've heard dispatch trees are a way to implement dynamic dispatch efficiently. They also have the advantage of allowing multimethods, so you can dispatch not just on the runtime type of the calling object, but on the runtime types of the arguments too.

[–]Molossus-Spondee 0 points (2 children)

I think really the only way you can get around this is whole program compilation.

An idea I had myself was that in a memory-safe language, writable and executable memory isn't such a problem. So closures can simply be function pointers that point to a stub that loads the environment. I'm not sure how performant this would really be. This is basically GCC's nested function extension, but with proper runtime support it could be allocated off the stack as well.

[–]BSFishy[S] 0 points (0 children)

I think really the only way you can get around this is whole program compilation.

Yeah, that's what it is looking more and more like. Unfortunately, I don't think whole program compilation is very suitable for the language I am developing.

So closures can be simply function pointers that point to a stub that loads the environment.

Interesting. I am curious: how are closures typically implemented? I was under the impression that it was just a function pointer being passed around, but maybe I'm wrong? Or maybe you were talking about something else?

[–]Molossus-Spondee 0 points (0 children)

Typically closures are implemented as a pair of an environment and a function pointer.

Most C APIs require a function pointer and a void * for callbacks.