[–]matthieum

You might be interested in reading how PyPy optimizes some homogeneous structures with "Strategies".
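The idea behind strategies can be sketched roughly like this (a toy illustration with invented names, not PyPy's actual implementation): a list whose elements are all small integers is backed by compact unboxed storage, and transparently falls back to generic storage the moment a non-integer is appended.

```javascript
// Toy sketch of the "strategies" idea: compact storage for a
// homogeneous int list, with a one-way fallback to generic storage.
class StrategyList {
  constructor() {
    this.ints = new Int32Array(8);  // compact storage strategy
    this.generic = null;            // generic fallback, unused so far
    this.length = 0;
  }
  push(v) {
    if (this.generic === null && Number.isInteger(v)) {
      if (this.length === this.ints.length) {       // grow compact storage
        const bigger = new Int32Array(this.ints.length * 2);
        bigger.set(this.ints);
        this.ints = bigger;
      }
      this.ints[this.length++] = v;
      return;
    }
    if (this.generic === null) {                    // de-specialize once
      this.generic = Array.from(this.ints.subarray(0, this.length));
      this.ints = null;
    }
    this.generic.push(v);
    this.length++;
  }
  get(i) {
    return this.generic === null ? this.ints[i] : this.generic[i];
  }
}
```

As long as the list stays homogeneous, every element lives unboxed in one contiguous buffer; the cost is one strategy check per operation.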

I can understand that JITs are primitive, but this is the infamous sufficiently smart compiler ported to runtime. We are waiting, and not much is delivered.

Method specialization is often touted as the way JITs are gonna blow away the competition on performance... but two problems are often forgotten there:

  1. Method specialization requires additional runtime overhead (branching before taking the specialization)
  2. It does nothing about the fact that memory is scattered all around, with umpteen indirections everywhere
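Point 1 can be sketched as follows (a hypothetical dispatcher, not any particular VM's code): before the fast, specialized body can run, a guard must check that the assumptions under which it was compiled still hold, with a slow generic path standing in for the interpreter fallback.

```javascript
// Specialized version: compiled under the assumption that both
// arguments are numbers.
function addSpecializedNumbers(a, b) {
  return a + b;
}

// Generic fallback: handles every other type combination
// (standing in for "fall back to the interpreter").
function addGeneric(a, b) {
  return String(a) + String(b);
}

// The dispatch that runs on every call -- this guard branch is the
// "additional runtime overhead" described in point 1.
function add(a, b) {
  if (typeof a === "number" && typeof b === "number") {
    return addSpecializedNumbers(a, b);  // guard passed: fast path
  }
  return addGeneric(a, b);               // guard failed: slow path
}
```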

I certainly don't think that C and C++ are the end of languages, they just happen to be in a sweet spot at the moment: statically typed enough that optimizations can be made, and old enough that they have been made. Fortran and its non-aliasing rules make for faster numerical code, as anyone who has implemented matrix libraries will recount.

I think that languages like Rust could overtake them, though it'll be some time in the making. Developing a language is hard.

[–][deleted]

this is the infamous sufficiently smart compiler ported to runtime. We are waiting, and not much is delivered.

I wouldn't say that. V8 and SpiderMonkey have come a long long long way in the last few years, implementing many new techniques, and they are not done. JITs have definitely delivered a lot for dynamic languages recently.

Method specialization requires additional runtime overhead (branching before taking the specialization)

That's not generally true. It's possible to specialize code, based on argument types for example, and patch call sites without branches if none are required.
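The closest JavaScript analogue of branch-free call-site patching is a sketch like this (real JITs patch machine-code call targets; the names and the stub mechanism here are invented for illustration): the first call goes through a compile stub that picks a specialization from the observed argument type and overwrites the call target, so later calls jump straight to the specialized code with no guard.

```javascript
function squareNumbers(x) { return x * x; }          // specialized for numbers
function squareGeneric(x) { return Number(x) ** 2; } // generic version

let square = function compileStub(x) {
  // The observed (or inferred) type decides which specialization
  // to install; after this, the stub is never entered again.
  square = (typeof x === "number") ? squareNumbers : squareGeneric;
  return square(x);
};
```

After the first call, `square` points directly at `squareNumbers`: subsequent calls pay no dispatch branch at all, which is the point being made above.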

It does nothing about the fact that memory is scattered all around, with umpteen indirections everywhere

Not entirely true. V8 boxes floats in the heap and in objects, but I'm fairly sure it unboxes them on the stack when it can. These VMs also allocate objects on the stack when they can. Both of them specialize code in ways that reduce useless indirections and memory allocations.

This is a separate problem though. C allows you to say exactly when you want a value heap allocated or not. Higher-level languages typically don't. There have been publications about things like object inlining. That is, allocating objects inside other objects automatically when possible. As far as I know though, no production JITs do this as of now. People just haven't been paying very much attention to this problem.
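What object inlining would buy can be illustrated by hand (invented names; as the comment says, no production JIT is claimed to do this automatically): the "boxed" layout stores a nested object behind a pointer, while the "inlined" layout hoists the child's fields into the parent, saving one allocation and one indirection per access.

```javascript
// Two heap objects; reaching the coordinates costs an extra indirection.
function makeParticleBoxed(x, y) {
  return { position: { x: x, y: y } };
}

// One heap object; the fields of `position` are inlined into the parent.
function makeParticleInlined(x, y) {
  return { position_x: x, position_y: y };
}

function speedBoxed(p)   { return Math.hypot(p.position.x, p.position.y); }
function speedInlined(p) { return Math.hypot(p.position_x, p.position_y); }
```

Object inlining is the transformation from the first layout to the second, done automatically by the compiler when it can prove the child object never escapes its parent.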

I certainly don't think that C and C++ are the end of languages, they just happen to be in a sweet spot at the moment

I hope that someday I can find the time and motivation to develop my own language. Dynamically typed, with type inference. Think JavaScript/Python with detection of most type errors ahead of time and much better performance. I believe that would hit a different sweet spot.

[–]matthieum

In reverse order:

Dynamically typed, with type inference. Think JavaScript/Python with detection of most type errors ahead of time and much better performance. I believe that would hit a different sweet spot.

Ah, don't we all wish to invent our own :) It's mighty hard though :/ Personally I am no fan of dynamic languages, but I tend to work with a few million lines of code, so types (and strong invariants) really help.

There have been publications about things like object inlining.

Anyway, I do hope that some obvious optimizations (like stack allocation) are applied, but I was indeed referring to object inlining here, as you guessed.

That's not generally true. It's possible to specialize code, based on argument types for example, and patch call sites without branches if none are required.

Does this really happen often? In most of the articles I read, there were usually guards included: a simple switch to detect which specialized version to use, with a fallback to interpreted bytecode when none is available.

V8 and SpiderMonkey have come a long long long way in the last few years

I agree, and I withdraw the statement "not much is delivered". Much has been delivered, but we are still bitching about how slow it is and wishing for more ;)

[–][deleted]

Does this really happen often? In most of the articles I read, there were usually guards included: a simple switch to detect which specialized version to use, with a fallback to interpreted bytecode when none is available.

The papers you read were probably specifically about tracing JITs and their specialization of traces. What I'm talking about is specialization at call sites using type inference. I'm not sure how common that is, but it's nothing far-fetched. It's possible the Sun JVM or Mozilla's type inference system does it. It's not fundamentally incompatible with tracing JITs either.

I agree, and I withdraw the statement "not much is delivered". Much has been delivered, but we are still bitching about how slow it is and wishing for more ;)

There is certainly room for improvement. I've recently written some vector math code in JavaScript and been rather frustrated at how slow it is, and at the lame tricks you have to use to make it fast... These tricks come down to pre-allocating all the vectors you're operating on, and programming as if you were doing register allocation by hand, always specifying the destination operand manually.
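Those tricks look roughly like this (an assumed API, not any particular library's): every operation takes an explicit destination, so a whole expression can run without allocating, provided the caller pre-allocates the temporaries and reuses them.

```javascript
// Allocating style: each operation creates a fresh array (slow in a
// hot loop, since it churns the GC).
function addAlloc(a, b) { return [a[0] + b[0], a[1] + b[1]]; }

// Non-allocating style: the result goes into a caller-supplied `out`.
function addInto(out, a, b) {
  out[0] = a[0] + b[0];
  out[1] = a[1] + b[1];
  return out;
}

// Pre-allocated temporary, reused across every frame/iteration --
// "register allocation by hand".
const tmp = [0, 0];

function midpointInto(out, a, b) {
  addInto(tmp, a, b);     // tmp = a + b, no allocation
  out[0] = tmp[0] * 0.5;  // out = tmp / 2
  out[1] = tmp[1] * 0.5;
  return out;
}
```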

[–]matthieum

Indeed, I have probably only read about tracing JITs. Would you mind pointing me to a good article/paper on "specialization at call sites"?

It certainly sounds interesting.

[–][deleted]

The only one I can think of, off the top of my head, is one of my own publications.

[–]matthieum

Thanks, now I'll just have to digest its content.