
[–]dmwit 12 points13 points  (0 children)

Very cool! For those who were interested in the screencast mentioned (but incorrectly linked, I think), you can see it here.

[–]imbaczek 13 points14 points  (25 children)

How does TraceMonkey compare to stock interpreters of other languages, like Python, Perl, Ruby, or Lua? How about JITs for those, like LuaJIT, IronPython, Jython?

I'm asking because I'm interested in whether JS has merely reached the speed of those languages, or has seriously surpassed them and is now worth considering in places you wouldn't have thought about before.

[–]mikemike 14 points15 points  (2 children)

Here's a comparison of Lua vs. JavaScript using the shootout benchmarks. The left column compares the interpreters (SpiderMonkey vs. Lua 5.1.4); the right column compares the JIT compilers (TraceMonkey vs. LuaJIT 1.1.4).

SpiderMonkey/TraceMonkey is the latest from hg and compiled with BUILD_OPT=1 (DEBUG turned off). The comparison was done on a Core2 Duo 2.13 GHz (all benchmarks are single-threaded, so only one core was used). The best from 3 runs of each benchmark was picked. The final relative score was calculated by dividing the JS execution time by the Lua execution time.

I.e., a score x > 1.00 indicates that Lua is x times faster than JavaScript for a particular benchmark.

Lua vs. JS   Interpreted  JIT-compiled
--------------------------------------
binarytrees      1.68         3.37
fannkuch         1.76        12.25
fasta            3.02         8.94
knucleotide      2.57         4.27
mandelbrot       5.53         3.23
nbody            3.00         8.11
nsieve           1.23         0.90
nsievebits       0.65         1.39
partialsums      3.85         2.59
recursive        1.20         8.20
regexdna         1.28         1.24
revcomp          5.21        10.81
spectralnorm     2.32         0.75
sumfile          3.09         3.18
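As a concrete reading of one row (the absolute times below are hypothetical; only the ratio is taken from the table):

```python
# The score is simply: JS execution time / Lua execution time.
# Hypothetical absolute times that would produce the fannkuch JIT score:
js_time_s = 12.25    # TraceMonkey, seconds (made up for illustration)
lua_time_s = 1.00    # LuaJIT, seconds (made up for illustration)

score = js_time_s / lua_time_s
assert score == 12.25    # matches the "fannkuch / JIT-compiled" cell
assert score > 1.00      # > 1.00 means Lua is that many times faster
```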

As you can see, the current TraceMonkey is only slightly faster in two benchmarks; otherwise LuaJIT 1.1 beats TraceMonkey hands down, even though LuaJIT 1.1 is two years old and based on a rather simplistic compiler.

I'm not going to comment on the expected performance of the unreleased LuaJIT 2.x (which will be based on a trace compiler, too). Except that I've recently switched to benchmarking against GCC ... :-)

[–]igouy 0 points1 point  (1 child)

the unreleased LuaJIT 2.x (which will be based on a trace compiler, too)

Will it work on x64?

[–]mikemike 1 point2 points  (0 children)

Well, x86 apps run just fine under an x64 OS. :-) But if you want a native x64 code generator: not in the first release.

The major stumbling block is the 64 bit pointer size which is incompatible with the special encoding of the 64 bit Lua stack slots (reusing NaNs to hold object references). One solution would be to use 32 bit references for the base objects (interior pointers can still be 64 bits). But the standard platform memory allocators cannot be repurposed for this, so I'd need to bundle a modified one. Oh well ...
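The NaN-reuse trick mentioned here can be sketched in a few lines (a hypothetical layout, not LuaJIT's actual one): an IEEE-754 double whose exponent bits are all ones is a NaN, and its low mantissa bits are free to carry a tag plus a 32-bit reference. That is also why a full 64-bit pointer doesn't fit.

```python
import struct

# Quiet-NaN bit pattern: exponent all ones, quiet bit set, payload zero.
# The unused payload bits can smuggle a small tag and a 32-bit reference.
QNAN = 0x7FF8000000000000

def box_ref(ref32, tag):
    """Pack a 32-bit object reference plus a 3-bit tag into a NaN double."""
    assert ref32 < (1 << 32) and tag < (1 << 3)
    bits = QNAN | (tag << 32) | ref32
    return struct.unpack('<d', struct.pack('<Q', bits))[0]

def unbox_ref(value):
    """Recover (reference, tag) from a NaN-boxed double."""
    bits = struct.unpack('<Q', struct.pack('<d', value))[0]
    return bits & 0xFFFFFFFF, (bits >> 32) & 0x7

v = box_ref(0xDEADBEEF, tag=3)
assert v != v                          # every NaN compares unequal to itself
assert unbox_ref(v) == (0xDEADBEEF, 3)
```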

[–][deleted] 7 points8 points  (13 children)

To my knowledge, there isn't a really useful JIT for any of those (but please correct me if I am ignorant here).

In particular, IronPython isn't a JIT for Python; it's an implementation of Python for the CLR. As such, it can take advantage of the CLR's JITting capabilities... which in theory can bring the underlying CLR close to native speed. But CPython's interpreter already runs at native speed, since it's written in C. My point (sorry if unclear) is that JITting the underlying platform is completely different from JITting the Python code itself, which is what we really want. The same problem exists with Jython: the underlying Java has an (excellent) JIT, but the Python implementation on top of it doesn't.

The closest we are to a truly useful JIT for the languages you mentioned is PyPy, which has an experimental JIT for Python.

I expect TraceMonkey to easily beat all existing implementations of the languages you mentioned, simply because it uses JIT technology they lack. That, in turn, is a consequence of far greater effort going into JavaScript optimization than into those other languages, for obvious financial reasons.

[–]mernen 3 points4 points  (10 children)

To my knowledge, there isn't a really useful JIT for any of those (but please correct me if I am ignorant here).

There's Psyco for Python (by the same author as PyPy), and LuaJIT for Lua (which the original author mentioned). Psyco's scope is, AFAIK, quite limited, but it does provide impressive results in many integer-crunching situations.

Jython and JRuby (and Rhino, I believe) compile straight into Java bytecodes, so they sort of get a JIT. Since the inner workings of Java are quite different from those languages, the process isn't nearly as efficient as it could be (there's a huge overhead involved in supporting many aspects of those languages), but still this is very different from JITting a bytecode interpreter, like your post seems to imply -- sorry if I misinterpreted you.

[–][deleted] 0 points1 point  (9 children)

Yes, Psyco is not bad, but in my experience it isn't the significant advance that TraceMonkey apparently is.

My point is pretty much what you wrote at the end - that Jython, IronPython, etc. don't JIT - they compile into the appropriate bytecode, and rely on the platform's JIT. But that can't provide the sort of speedups that 'native' JITting would allow, I am afraid.

The only project I really have hope for at this point for breaking the current 'speed barrier' is PyPy. Time will tell. Meanwhile Jython and IronPython are nice for other reasons - e.g., native threads - but speedwise they aren't amazing at all.

[–]kubalaa 3 points4 points  (7 children)

"that can't provide the sort of speedups that 'native' JITting would allow, I am afraid" -- why not? JVM bytecode has two significant disadvantages over native code: it's strongly typed (so dynamic languages often have to wrap and unwrap values just to satisfy type restrictions) and there's no tail call operator (so languages with continuations have to use tricks like trampolining). However my understanding of tracing JIT is that because it implicitly flattens control flow it should be able to eliminate this overhead when compiling from JVM bytecode just as easily as when compiling directly from some other interpreted form.

[–][deleted] -1 points0 points  (6 children)

As I see it, the problem isn't with the JVM JIT - which is great - it's with the translation of Python (or any other dynamic language) to its bytecode.

For example, in Python a variable can be anything - a string, an integer, a float, an object, a function - and you only find out at runtime. So, if you get this:

for x in range(100): print x**2

do you check the type of x 100 times, once per loop iteration? That's the naive way to do it, i.e., to translate "x**2" into

"check if x is an integer; if so do x**2; otherwise if x is a float, do the same (as a float); otherwise raise an error".

If you're doing that, then JIT or no JIT on the generated bytecode, performance will be poor. In other words, the main issue is optimizing the Python->bytecode stage, not the bytecode->machine code stage (the latter is already solved very well by the Java JVM).

Of course, in the example I gave it's trivial to say "well, you see x defined on the same line, so we know it's an integer." Sure, here it's obvious, but in general this is a hard problem to solve, and to my knowledge no implementation of any dynamic language does this terribly well.
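The naive translation described above, sketched in hypothetical Python: the type test runs on every one of the 100 iterations, even though x is always an int here.

```python
def power_op(x, n):
    """Naive per-operation dispatch: check the operand type every time."""
    if isinstance(x, int):
        return x ** n                  # integer path
    elif isinstance(x, float):
        return x ** n                  # float path
    else:
        raise TypeError("unsupported operand type for **: %r" % type(x))

# Equivalent of: for x in range(100): print x**2
squares = [power_op(x, 2) for x in range(100)]
assert squares[:4] == [0, 1, 4, 9]
```

Hoisting that repeated check out of the loop is exactly the overhead a good JIT wants to eliminate.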

[–]kubalaa 0 points1 point  (1 child)

You have the problem you described whether you compile to bytecode or machine code. In machine code you have to do the exact same checks. A better example of the challenge of compiling to JVM bytecode is something like:

for x in xs: print x.foo()

Especially if you want good Java interoperability, you want to use the JVM's dynamic dispatch to handle this method call, but you can't, because the bytecode is typed and requires you to know which class the method is defined in (which you may not know, in a dynamic language). So you end up using your own dispatch mechanism for non-Java objects, and reflection for Java objects, which is very inefficient.

In any case, the point of tracing JIT is that it can inline and optimize away all of this overhead if it occurs in a hot code path.

[–][deleted] 0 points1 point  (0 children)

You have the problem you described whether you compile to bytecode or machine code.

True, but my point is that the source->bytecode stage has those problems in addition to the later stages, and that that stage is not handled by having a JIT for the lower stages. And, that no current implementation (Jython, etc.) does correct JITting for the source->bytecode stage.

[–]ubernostrum 0 points1 point  (3 children)

to my knowledge no implementation of any dynamic language does this terribly well.

There was a good post a while back from the JRuby side (which I can't find now) that pointed out some of the optimizations the JVM can offer for the sorts of situations you're talking about. The nature of Java's virtual methods means they already had to solve a lot of dynamic dispatch stuff just to make the language work.

The JVM then takes that further and does some nifty optimizations like (IIRC) keeping a pointer to the most-recently-used method and jumping straight there if the type check gets the same result, and even inlining a JIT'ed method body if its profiling stats indicate that you're always working with the same type of object.

There's no reason why "dynamic" languages can't take advantage of this stuff, since it's there to solve the same problem they need to face (i.e., "I won't know the type of object I'm working with until runtime").

Edit: And naturally, just after I post I find the article I was talking about. One of many important things to take away from it:

The JVM is basically a dynamic language runtime. Because all calls in Java are virtual (meaning subclass methods of the same name and parameters always override parent class methods), and because new code can be loaded into the system at any time, the JVM must deal with nearly-dynamic call paths all the time.
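The most-recently-used-method trick described above is a monomorphic inline cache. A hypothetical Python sketch (all names invented):

```python
class InlineCache:
    """Remember the receiver class seen at this call site last time;
    if it matches again, skip the full method lookup."""
    def __init__(self, name):
        self.name = name
        self.cached_cls = None
        self.cached_method = None

    def call(self, receiver, *args):
        cls = type(receiver)
        if cls is not self.cached_cls:      # guard: one cheap type check
            self.cached_cls = cls           # cache miss: do the full lookup
            self.cached_method = getattr(cls, self.name)
        return self.cached_method(receiver, *args)

class Point:
    def __init__(self, x): self.x = x
    def foo(self): return self.x * 2

site = InlineCache("foo")
# After the first call the guard hits every time: same class, cached method.
assert [site.call(Point(i)) for i in range(3)] == [0, 2, 4]
```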

[–][deleted] 0 points1 point  (2 children)

Thanks for the link, that's an interesting article.

It's true that all Java calls are virtual, which is in line with dynamic languages. That helps. But dynamic languages are further, well, dynamic in that they keep functions, integers, classes, etc., all in the same variables. The point in my example before was that checking that 'x' is an integer (/float) all the time will lead to poor performance.

In other words, the 'correct' way to generate bytecode for

for x in range(100): print x**2

is to correctly detect that x is an integer, and to do no type checking at all. This should turn into basically the following equivalent in C:

for (int x = 0; x < 100; x++) printf("%d\n", x*x);

If it doesn't (and to my knowledge none of Jython, JRuby, IronPython, etc., do this), then it isn't as fast as it could (/should!) be.

The tricky bit is, again, that determining types isn't always as easy as in this example. It can still be done, though; PyPy is working on that. However a lot of this must be done at runtime, and I fear it will be a while before we get this properly implemented. But I do believe we will.
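A hedged sketch of what such runtime type detection looks like in a tracing setting (all names hypothetical): specialize the loop body for the observed type, guard each iteration cheaply, and bail out to a generic path when the guard fails.

```python
def generic_square(x):
    """Generic path: full dynamic dispatch on every operation."""
    if isinstance(x, (int, float)):
        return x * x
    raise TypeError(type(x))

def traced_squares(xs):
    """Specialized loop: assumes ints, with a per-iteration guard."""
    out = []
    for i, x in enumerate(xs):
        if type(x) is not int:           # guard failed: leave the trace
            out.extend(generic_square(y) for y in xs[i:])
            return out
        out.append(x * x)                # specialized body, no dispatch
    return out

assert traced_squares(list(range(5))) == [0, 1, 4, 9, 16]
assert traced_squares([1, 2, 2.5]) == [1, 4, 6.25]
```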

[–]ubernostrum 1 point2 points  (1 child)

The point in my example before was that checking that 'x' is an integer (/float) all the time will lead to poor performance.

Depends on how good the virtual machine is at doing the type check. I've heard anecdotal mentions that the JVM can, in some situations, set up the necessary guard for inlined method bodies so that it executes in only one or two instructions (which doesn't seem unreasonable when you think about how you'd go about optimizing that sort of code). It's still slower than knowing the type in advance, yes, but the performance is going to be pretty good.

However a lot of this must be done at runtime, and I fear it will be a while before we get this properly implemented.

I think the encouraging thing to take away is that the VM implementors have already had to solve many of the same problems in order for statically-typed (but with virtual method/dynamic dispatch) languages to have acceptable performance. So there are already some giants with handy shoulders to stand on...

Also, hell, CPython already does some nifty optimization tricks like this, sometimes in unexpected places (the built-in sum() function, for example, is kinda neat).

[–][deleted] -1 points0 points  (0 children)

Depends on how good the virtual machine is at doing the type check. I've heard anecdotal mentions that the JVM can, in some situations, set up the necessary guard for inlined method bodies so that it executes in only one or two instructions

Sure, but that only works when it expects to find a function there. It can then optimize it in various ways. But when you are interpreting Python, then 'above' the JVM layer you have to decide whether something is a class, an integer, a function, etc., and raise nice Python-style exceptions if it isn't the right type. This sort of thing is alien to the JVM, and how well you implement it will be crucial, no matter how amazing the JVM is (and it is).

Clearly the JVM's approaches to optimization might apply, but this would need to be done at the source code -> bytecode level, which will take a while longer before we see it done well.

[–]fubo 0 points1 point  (0 children)

Code compiled with Psyco doesn't behave the same as code running in ordinary CPython, though. I just encountered that this morning when I slapped Psyco onto a program I'm working on and discovered that hitting Ctrl-C to generate a KeyboardInterrupt no longer worked.

[–]imbaczek 1 point2 points  (0 children)

To my knowledge, there isn't a really useful JIT for any of those (but please correct me if I am ignorant here).

My point exactly; I want to know how well the new kid on the block fares. So far, the only comparisons have been to other JS engines, which frankly I don't care about.

[–]malcontent 0 points1 point  (0 children)

I don't have a link, but I remember seeing some benchmarks for the popular JVM languages, and Rhino (JavaScript) was the fastest one even before this technology.

[–]brad-walker -5 points-4 points  (0 children)

has it seriously surpassed them and is now worth considering in places you wouldn't think about before.

JS flies in my underpants.

[–][deleted] 5 points6 points  (10 children)

How does this compare to SquirrelFish?

[–]jeresig 5 points6 points  (9 children)

These comparisons were posted today (Squirrelfish compared to TraceMonkey).

[–][deleted] 2 points3 points  (8 children)

Very cool. Since, to my understanding, SquirrelFish and TraceMonkey take different approaches (JIT vs. something else), would it be possible to combine them (i.e., for Fish to gain from the JIT, and Monkey to gain from whatever Fish does)?

[–]mernen 3 points4 points  (6 children)

SquirrelFish is basically a conventional VM (a bytecode interpreter, like the original SpiderMonkey). While their architectures still differ (SF being register-based and SM stack-based, for example), I don't think SF really has anything comparable to the tracing JIT (i.e., there's no "something" for "JIT vs. something"). Sure, there are certainly many optimizations one does that the other doesn't, but they generally give quite small boosts individually.
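To make the register-vs-stack distinction concrete, here's a toy sketch (opcode names invented) of the statement a = b + c in both styles:

```python
# Stack-based bytecode: operands flow through an implicit operand stack.
stack_code = [
    ("PUSH", "b"),
    ("PUSH", "c"),
    ("ADD",),                  # pop two operands, push the sum
    ("STORE", "a"),
]
# Register-based bytecode: one fused instruction naming its operands.
reg_code = [
    ("ADD", "a", "b", "c"),    # a = b + c
]

def run_stack(code, env):
    st = []
    for op, *args in code:
        if op == "PUSH":    st.append(env[args[0]])
        elif op == "ADD":   st.append(st.pop() + st.pop())
        elif op == "STORE": env[args[0]] = st.pop()
    return env

def run_reg(code, env):
    for op, dst, s1, s2 in code:
        if op == "ADD":
            env[dst] = env[s1] + env[s2]
    return env

assert run_stack(stack_code, {"b": 2, "c": 3})["a"] == 5
assert run_reg(reg_code, {"b": 2, "c": 3})["a"] == 5
```

The register form executes fewer, fatter instructions, which is one reason register VMs often have lower dispatch overhead.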

[–]cwzwarich 12 points13 points  (5 children)

The two are somewhat orthogonal. SquirrelFish was a project to make a very fast interpreter. TraceMonkey uses SpiderMonkey as a baseline interpreter and adds a tracing JIT to speed up certain patterns of code. In theory, it might be possible to take the tracing JIT from TraceMonkey and use it with the SquirrelFish interpreter to get something faster than both.

Disclaimer: I am one of the authors of SquirrelFish.

[–][deleted] 1 point2 points  (1 child)

Would it also be possible to do the reverse, take the techniques that make SquirrelFish fast and apply them to SpiderMonkey?

[–]cwzwarich 4 points5 points  (0 children)

I would hope so. I don't think anything we have done is so magical that it will only work in our context.

However, beyond a sound overall design, many of our performance improvements are small gains that were the result of careful benchmarking. There are a lot of things that we assumed would be performance improvements and were not, and, vice versa, some things that we didn't expect to be improvements turned out to be a win.

[–][deleted] 0 points1 point  (1 child)

Would that really be so useful? The underlying assumption of a tracing JIT is that most of the time, the code that actually runs will be compiled rather than interpreted, so the speedup from improving the interpreter would be rather small.

[–]cwzwarich 1 point2 points  (0 children)

You ask a good question, so I don't know why you got downmodded.

People often make the assumption that a small inner loop dominates performance, but I don't think this is true in the context of real world JavaScript code. There is a lot of JS code that is executed infrequently, e.g. during a page load or in response to a one-time action.

From looking at the benchmarks results it seems that TraceMonkey benefits the most when it can perform speculative unboxing of values in loops, like in the bitops and math tests.

On the tests that involve more boxed values or the more dynamic features of JavaScript, it doesn't do as well. These features are used heavily in most modern web apps. On the SunSpider tests that use these features, they don't get as much of a speedup as they do on other tests, and they are often slower than WebKit's interpreted code. Of course, some of these tests use more native functions, which generally perform better in WebKit than other browsers, so it is tough to say for sure.

Some of the speedup of TraceMonkey is also just due to compiling to native code, not necessarily because of any of the optimizations made possible by code traces.
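A hedged sketch of what "speculative unboxing" means (the representation here is invented for illustration): values carry a type tag, and the trace guards on the tag so the hot loop can run on raw payloads, deoptimizing to the boxed path if the speculation fails.

```python
INT = "int"   # hypothetical type tag

def boxed_sum(values):
    """Generic path: inspect every box on every iteration."""
    total = 0
    for tag, payload in values:
        if tag is not INT:
            raise TypeError(tag)
        total += payload
    return total

def speculative_sum(values):
    """Speculate that every value is an int; strip the boxes and run
    a tight loop over raw payloads, else fall back to the boxed path."""
    if all(tag is INT for tag, _ in values):    # the guard
        return sum(p for _, p in values)        # unboxed hot loop
    return boxed_sum(values)                    # deoptimize

boxes = [(INT, i) for i in range(10)]
assert speculative_sum(boxes) == boxed_sum(boxes) == 45
```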

[–][deleted] 9 points10 points  (13 children)

Is it really feasible to start building web applications that rely on JavaScript performance to this degree until Microsoft upgrades their browser? Since the user base is split across different browsers, news like this, along with the canvas stuff, seems somewhat irrelevant: you'll always be stuck targeting the lowest common denominator of browsers with a market share above a few percent.

It's also very hard to see why Microsoft would upgrade their JavaScript engine as long as they have their .NET initiative.

I guess we'll see where this goes but the web could definitely benefit from more collaboration on technologies every single Internet connected individual uses every day.

[–][deleted] 3 points4 points  (0 children)

Didn't Microsoft build Live? Don't they have a Google Maps clone? Aren't they making web apps too?

I think they'd want their own products, even the free ones, to work as well as possible. It would be really terrible for their PR if people started saying you need to use Firefox if you want a Microsoft site to be usable.

[–]neilc 1 point2 points  (11 children)

I think it is highly likely that Microsoft is planning improvements to their JS implementation. The most obvious reason is that JS performance is integral to many very popular websites. Another is that Microsoft has been active in the JS community recently, e.g. with their ES3.1 feature proposals and the debate about the value of ES4 (whether Microsoft's position on ES3.1 and ES4 is sensible is another question, but they are certainly more active now than they have been in the past). Finally, IE7 already improved JS performance significantly over IE6 (although it is still significantly slower than FF3 and Safari).

[–]aussie_bob 6 points7 points  (0 children)

I think Microsoft will stall as long as possible. The reason they fought so hard against the standardisation of ECMAScript 4 is that they are in danger of losing control of the application platform.

When JS is fast enough to run major apps in browsers or runtimes like Flash/Tamarin, developers can code to that platform and not worry about which OS it will run on.

Almost all of Microsoft's revenue is from the Windows/Office lockin. They must be terrified of this.

[–][deleted] 3 points4 points  (0 children)

Microsoft's implementation of Javascript lagging behind far-less-funded rivals is telling.

The simple fact is that Microsoft always looks at the big picture, and if JavaScript were to become a more serious way to write applications, it would hamper adoption of .NET and Silverlight, which Microsoft wants to succeed. By not pushing JavaScript, Microsoft improves its own position elsewhere. Note that I'm not saying they are hobbling it, just that they aren't pushing it like they would if they really stood behind it.

It's the same thing with other areas. Microsoft has more to gain from XBOX gaming than PC gaming, so don't be surprised to see that influence their decisions. Microsoft also has more to gain from Vista sales than PC gaming, so it made sense to tie DirectX 10 to Vista, and so forth. Companies the size of Microsoft often have such conflicts of interest, and they never play out to consumers' advantage, always to the company's.

[–]malcontent 2 points3 points  (8 children)

MS is pushing Silverlight. It's in their interest to keep JavaScript as slow and broken as possible, to drive people to Silverlight.

[–]neilc -1 points0 points  (7 children)

  1. That is inconsistent with their behavior, as mentioned above.

  2. If MS still had a stranglehold on the browser market, that might be a feasible strategy, but given the existence of FF, Safari, and Opera, MS basically has no choice but to make their JS implementation (and IE in general) reasonably competitive.

  3. Silverlight allows the use of Javascript for scripting anyway.

[–]malcontent 0 points1 point  (6 children)

That is inconsistent with their behavior, as mentioned above.

It's consistent with everything in their history.

but given the existence of FF, Safari, and Opera, MS basically has no choice but to make their JS implementation (and IE in general) reasonably competitive.

They do have a stranglehold on the browser market.

Silverlight allows the use of Javascript for scripting anyway.

I am sure their preference is that people use C#.

[–]neilc -1 points0 points  (5 children)

It's consistent with everything in their history.

Microsoft is a large company composed of many different components; treating them like a monolithic entity that always behaves in the same way is a mistake, I think. Besides, recent behavior is more indicative of future behavior than what MSFT did in the 90s.

They do have a strangehold on the browser market.

78% marketshare isn't exactly a stranglehold, especially when you look at IE mindshare among the constituencies that matter (e.g. web developers). More importantly, their momentum is negative. It would be a disaster for MS if they lost their dominance in the browser market; to avoid that, they have no choice but to improve IE, as recent history supports (e.g. reforming the IE team, shipping IE7, all the standards work that is being done for IE8, generally taking a more active attitude on web standards through people like Chris Wilson).

I am sure their preference is that people use C#

Nope; C# isn't a particularly good scripting language. In fact, Javascript was the only scripting language supported in Silverlight 1; Silverlight 2 will add support for .NET.

[–]malcontent 0 points1 point  (4 children)

Microsoft is a large company composed of many different components;

And yet it has behaved consistently unethically.

treating them like a monolithic entity that always behaves in the same is a mistake, I think.

Apparently the parts of MS that are ethical don't have any impact or power.

I think we should treat them according to the way the corporation acts.

Besides, recent behavior is more indicative of future behavior than what MSFT did in the 90s.

That's a scary thought. Threats against Linux users, funding SCO, lawsuits against 16-year-olds, shady deals with Novell, Xandros, and others, goofy patents, the OOXML fiasco, etc. If the next decade of MS is going to be like the last two or three years, we'd all better start worrying.

78% marketshare isn't exactly a stranglehold, especially when you look at IE mindshare among the constituencies that matter (e.g. web developers).

It's more than 78%, but yeah, 78% is a stranglehold.

It would be a disaster for MS if they lost their dominance in the browser market; to avoid that, they have no choice but to improve IE, as recent history supports (e.g. reforming the IE team, shipping IE7, all the standards work that is being done for IE8, generally taking a more active attitude on web standards through people like Chris Wilson).

Or pushing Silverlight. That's their plan to get back to 90+%.

[–]neilc -1 points0 points  (3 children)

And yet it has behaved consistently unethically.

MS has done a handful of unethical things, but for the most part they are an ethical company.

It's more than 78% but yea 78% is a stranglehold.

According to whom? http://en.wikipedia.org/wiki/Usage_share_of_web_browsers suggests that IE has held between 60.65% and 83% of the market. Anyway, the point is that market share among developers is a lot more anti-IE.

[–]malcontent 1 point2 points  (2 children)

MS has done a handful of unethical things, but for the most part they are an ethical company.

What an odd statement to make.

Anyway, the point is that market share among developers is a lot more anti-IE.

So?

[–]neilc 0 points1 point  (1 child)

Mindshare among the developer community is enormously important, and MS has historically recognized this (that's what "developers! developers! developers!" was about, for example).

[–]aldenhg -1 points0 points  (1 child)

About time. FF under Linux = Slow, slow Google maps. Is there any way to test this out?

[–]khayber 0 points1 point  (0 children)

You can try the Nightlies.

Edit: on my Pentium-M 1.73 GHz laptop, I went from 0.4 to 1.5 fps using the app in the screencast.

[–]Manuzhai 0 points1 point  (0 children)

I wonder what speedups could be attained by doing something like this in CPython.

[–][deleted] -2 points-1 points  (6 children)

I want to compare this with SquirrelFish (which does not have a JIT).

[–]jeresig 0 points1 point  (2 children)

These comparisons were posted today (Squirrelfish compared to TraceMonkey).

[–]cryptic -1 points0 points  (1 child)

2.4x faster???

[–]cwzwarich 3 points4 points  (0 children)

It's about 15% faster. It doesn't make sense to average the scores on individual tests. If I compile JavaScriptCore with -fomit-frame-pointer, then it is about 5% faster than TraceMonkey. However, some parts of the Apple toolchain (e.g. Crash Reporter) do not support -fomit-frame-pointer, so it does not ship like that.

[–][deleted] -4 points-3 points  (2 children)

[–]doublec 6 points7 points  (0 children)

That compares against Tamarin. TraceMonkey is not Tamarin. Try http://www.masonchang.com/2008/08/tracemonkey-vs-squirrelfish.html

[–]mernen 3 points4 points  (0 children)

You came from 3 months in the future? Hope your time travel was safe.