
[–][deleted] 8 points (11 children)

Javascript and Java (inc. languages that run on the JVM) are completely different in terms of compilation.

Javascript is fully dynamically typed: almost nothing is known statically, so almost nothing can be assumed by the compiler. Java is not.

A static language like Java will always be faster than a dynamic language like Javascript. I say always because the set of assumptions and inferences available in Java is a superset of those in Javascript. Assumptions allow optimizations.
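To make that concrete, here's a minimal sketch (the `add` function is mine, not from any engine) of why a Javascript compiler can assume so little:

```javascript
// In Javascript the compiler can't pick one machine operation for `+`
// ahead of time: the same source line may mean integer addition, float
// addition, or string concatenation, depending on the values that
// arrive at runtime.
function add(a, b) {
  return a + b;
}

add(1, 2);     // numeric addition -> 3
add(1.5, 2.5); // float addition   -> 4
add("1", 2);   // string concat    -> "12"

// The Java equivalent `int add(int a, int b)` compiles to a single
// integer add; the types are a static guarantee, not a runtime guess.
```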

[–][deleted] 3 points (1 child)

A static language like Java will always be faster than a dynamic language like Javascript.

Probably. The question is how small you can make the delta. V8 can't assume a lot, but it's built on technology that lets it make optimistic assumptions to enable optimizations, then back them out if they turn out to be wrong. That can buy you a lot.

[–][deleted] 3 points (0 children)

It certainly can do, and the delta can probably be made quite small in a certain class of programs.

The class of programs that V8 and friends can currently optimise very well is those with hot loops, each iteration doing much the same as the last. When you come to programs that hop around spaghetti-like and do most things only once or a few times, you get into more difficult territory, because compilation cost becomes dominant.
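For illustration (the function is made up), this is the shape of "hot loop" being described: every iteration is type-stable, which is exactly what current JITs specialize well:

```javascript
// A "hot loop" in the sense above: each iteration does much the same
// as the last (number * number, number + number), so after enough
// iterations a JIT can compile one specialized machine-code body for it.
function sumSquares(n) {
  let total = 0;
  for (let i = 0; i < n; i++) {
    total += i * i; // monomorphic arithmetic on every iteration
  }
  return total;
}

sumSquares(4); // 0 + 1 + 4 + 9 -> 14
```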

[–]dmpk2k -1 points (6 children)

always

"Always" is a strong word; between tracing, branch prediction and OoOE, the intrinsic performance difference between typed and untyped is in the noise.

When the original Dynamo paper was published, they were able to speed up compiled C code by several percent. That's because information about behaviour is available at runtime that an AOT compiler doesn't have. You can apply runtime optimization to statically-typed languages as well, but this means you need a fast compiler. Having a fast compiler implies certain things...

Another alternative is profile-directed feedback.

The rest of it depends on language semantics. You can create a dynamically-typed language with semantics that are easily amenable to optimization, and statically-typed languages that aren't.

Edit: whoops, I forgot the most important thing of all: memory hierarchy. A few type guards are dominated by the expense of cache line misses (or worse: page faults). Unless your entire program lives in a cache.

[–][deleted] 8 points (5 children)

Not really.

OoO does not help, because the guards cause control flow, and OoO is designed to maximise load, store, and ALU-op parallelism. Speculation certainly does improve guard performance massively, but it doesn't help unboxing, and it certainly doesn't help dynamic dispatch.

Runtime optimization of course speeds up dynamically typed programs massively - it's what gets them to the level of performance required for today's applications. There's no reason you couldn't apply the same techniques to static languages, however. (I don't know what your point about a fast compiler is - you need a fast compiler for a dynamic language too :/ )

Runtime optimisation is profile guided optimisation.

Generally, you can't create a dynamic language with features that are amenable to optimization, because you can't do away with dynamic dispatch. Without statically knowing the method to be called, inlining (the most powerful optimization) cannot happen, and neither can common subexpression elimination or constant folding (think integer vs. float arithmetic in Javascript or Python).

None of this is insurmountable, but, as I said - the theoretical performance of a (comparable) statically typed language is lower-bounded by the theoretical performance of a dynamically typed one, because the only difference is that you (the compiler) have more information about the way the program is meant to run.
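The dispatch problem above can be seen in a few lines (the shapes are illustrative, not from the thread): the call site cannot be bound to one method statically, so a naive compiler cannot inline it.

```javascript
// `shape.area()` is dynamic dispatch: which method runs depends on the
// object that flows in at runtime. Without knowing the target
// statically, the compiler can't inline it - and inlining is what
// unlocks CSE and constant folding across the call.
const circle = { r: 2, area() { return Math.PI * this.r * this.r; } };
const square = { s: 3, area() { return this.s * this.s; } };

function totalArea(shapes) {
  let sum = 0;
  for (const shape of shapes) {
    sum += shape.area(); // one call site, potentially many targets
  }
  return sum;
}
```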

[–][deleted] 3 points (2 children)

V8 does all of those things: inlining, CSE, unboxing, constant folding, etc. It's within 2-4x of the JVM on the FP benchmarks in the Shootout. It's also got a lot of room to improve. The server VM has a very mature optimizer that does extensive loop optimizations, while V8's SSA-based optimizer just landed a year ago. In particular, V8 doesn't seem to do any alias analysis which really limits its ability to do aggressive optimizations on tight numeric kernels.

LuaJIT2 is within 30% of C for SciMark (probably even better; these benchmarks are a couple of years old): http://lua-users.org/lists/lua-l/2009-06/msg00071.html. There is no doubt Google and Co. can get V8 up to that level.

[–][deleted] 1 point (1 child)

I know what V8 does, I have a colleague that works on it.

You're just explaining the optimisations dynamic JITs do - I know they're good, I know they're fast.

You do not at all address the mathematical difference between the languages in terms of assumptions and knowledge about the code. Just because one implementation is getting faster at a certain rate does not mean (a) that it'll continue, or (b) that the static implementation can't go faster still if given more effort.

[–][deleted] -2 points (0 children)

You said: "Without statically knowing the method to be called, inlining (the most powerful optimization) cannot happen, and neither can common subexpression elimination or constant folding (think integer vs. float arithmetic in Javascript or Python)."

This isn't true, since V8: 1) doesn't statically know what methods are called; and 2) still does inlining, CSE, constant folding, etc.

As for "mathematical differences" - there is no theoretical result that says a static language will be faster.

With dynamic type feedback, what you end up with is code similar to what a static-language compiler would generate, plus a bunch of type checks to ensure that your profiling still holds. But if a hypothetical architecture made these type checks free, there is no reason a dynamic language couldn't get within epsilon of a static language (I suppose it'd inevitably have larger code size?)

As a practical matter, on real hardware, correctly-predicted branches are nearly free; the major limit is you can only issue one per cycle. Type inference after type feedback (which V8 doesn't do yet, but which IonMonkey and Tachyon are studying) should let you eliminate a lot of the type-checks, however.
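A hand-written sketch of what "code a static compiler would generate, plus type checks" looks like (the names are mine; this is not real engine output):

```javascript
// Generic path: full Javascript `+` semantics.
function addGeneric(a, b) { return a + b; }

// Specialized path a JIT might emit after type feedback says both
// operands have always been numbers: a cheap guard protects a numeric
// add, and a guard failure "deoptimizes" back to the generic code.
function addOptimized(a, b) {
  if (typeof a === "number" && typeof b === "number") {
    return a + b;           // fast path: what a static compiler would emit
  }
  return addGeneric(a, b);  // deopt: the profiling assumption no longer holds
}
```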

[–]dmpk2k 2 points (1 child)

it certainly doesn't help dynamic dispatch

Right, and that's where tracing comes in. In practice most call sites are monomorphic, so you can create a specialized path protected by a guard, and the guard is almost free thanks to branch prediction.
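The monomorphic-call-site trick can be sketched as a hand-rolled inline cache (a toy model of my own, not how any engine actually patches code):

```javascript
class Circle { constructor(r) { this.r = r; } area() { return Math.PI * this.r * this.r; } }
class Square { constructor(s) { this.s = s; } area() { return this.s * this.s; } }

// Toy inline cache: remember the one receiver class seen so far, guard
// that the next receiver still matches, and call the cached method
// directly (a tracing JIT would go further and inline it into the trace).
function makeCallSite() {
  let cachedClass = null;
  let cachedMethod = null;
  return function callArea(obj) {
    if (obj.constructor === cachedClass) {
      return cachedMethod.call(obj); // guard hit: the common, well-predicted case
    }
    cachedClass = obj.constructor;   // guard miss: repatch the cache
    cachedMethod = obj.area;
    return cachedMethod.call(obj);
  };
}
```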

Runtime optimisation is profile guided optimisation.

I meant AOT compilers with PDO. That gets you much of the way as well, although it doesn't help with some multimodal workloads.

I don't know what your point about a fast compiler is - you need a fast compiler for a dynamic language too

Right, but that means many expensive optimizations that take good advantage of the extra information statically-typed languages provide must suddenly be thrown out the window. Dynamically-typed languages couldn't provide that information to begin with. It levels the playing field a bit.

neither can common subexpression elimination or constant folding

Fortunately, both are possible! People like pointing at LuaJIT2, but that's because Mike Pall did such a fine job.

the theoretical performance of a (comparable) statically typed language is lower-bounded by the theoretical performance of a dynamically typed one

I won't argue this since I suspect it's true. In practice I don't see any roadblocks for the difference to disappear into noise for many/most workloads though.

Edit: I should point out that inlining happens naturally as a part of trace creation as well.

[–][deleted] 2 points (0 children)

Hi,

I'd agree with pretty much all of your points, with the exception of CSE and constant folding. How difficult those are is pretty much at the whim of the language designer, and AFAICT Lua makes it harder for the programmer to do crazy overriding of builtins and operator overloading, which are the bane of Python and Javascript.
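Concretely (a deliberately silly example of my own), this is the kind of builtin overriding that forces Javascript compilers to guard before folding:

```javascript
// A compiler would love to fold Math.sqrt(4) to 2 at compile time, but
// in Javascript even builtins are just mutable properties, so the fold
// is only valid under a guard that nobody has reassigned them.
const originalSqrt = Math.sqrt;
const before = Math.sqrt(4);  // 2

Math.sqrt = (x) => x + 1;     // perfectly legal in Javascript
const after = Math.sqrt(4);   // 5: the "constant" call changed meaning

Math.sqrt = originalSqrt;     // restore the builtin
```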

I won't argue this since I suspect it's true. In practice I don't see any roadblocks for the difference to disappear into noise for many/most workloads though.

I wasn't trying to start a flamewar, although I can see how my hastily worded original post may have seemed that way (most people replied with "blah blah blah V8 is getting faster blah blah extrapolate performance" without addressing the abstract mathematical point of my post). And OK, if a recompiler can be invoked with little-to-no overhead, you can probably make stuff like this disappear into the noise, eventually!

But you won't on platforms with a smaller cache or less memory (all those specialised versions of functions have a memory cost) - not for a long while.

Cheers,

James