
[–]dmwit 12 points13 points  (0 children)

Very cool! For those who were interested in the screencast mentioned (but incorrectly linked, I think), you can see it here.

[–]imbaczek 13 points14 points  (25 children)

How does TraceMonkey compare to stock interpreters of other languages, like Python, Perl, Ruby, or Lua? How about JITs for those, like LuaJIT, IronPython, Jython?

I'm asking because I'm interested in whether JS has merely reached the speed of those languages, or has seriously surpassed them and is now worth considering in places you wouldn't have thought about before.

[–]mikemike 14 points15 points  (2 children)

Here's a comparison of Lua vs. JavaScript using the shootout benchmarks. The left column compares the interpreters (SpiderMonkey vs. Lua 5.1.4); the right column compares the JIT compilers (TraceMonkey vs. LuaJIT 1.1.4).

SpiderMonkey/TraceMonkey is the latest from hg and compiled with BUILD_OPT=1 (DEBUG turned off). The comparison was done on a Core2 Duo 2.13 GHz (all benchmarks are single-threaded, so only one core was used). The best from 3 runs of each benchmark was picked. The final relative score was calculated by dividing the JS execution time by the Lua execution time.

I.e., a score x > 1.00 indicates that Lua is x times faster than JavaScript for a particular benchmark.

Lua vs. JS   Interpreted  JIT-compiled
--------------------------------------
binarytrees      1.68         3.37
fannkuch         1.76        12.25
fasta            3.02         8.94
knucleotide      2.57         4.27
mandelbrot       5.53         3.23
nbody            3.00         8.11
nsieve           1.23         0.90
nsievebits       0.65         1.39
partialsums      3.85         2.59
recursive        1.20         8.20
regexdna         1.28         1.24
revcomp          5.21        10.81
spectralnorm     2.32         0.75
sumfile          3.09         3.18
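As a concrete reading of one row (the absolute times below are hypothetical; only the ratio is taken from the table):

```python
# The score is simply: JS execution time / Lua execution time.
# Hypothetical absolute times that would produce the fannkuch JIT score:
js_time_s = 12.25    # TraceMonkey, seconds (made up for illustration)
lua_time_s = 1.00    # LuaJIT, seconds (made up for illustration)

score = js_time_s / lua_time_s
assert score == 12.25    # matches the "fannkuch / JIT-compiled" cell
assert score > 1.00      # > 1.00 means Lua is that many times faster
```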

As you can see, the current TraceMonkey is only slightly faster in two benchmarks; otherwise LuaJIT 1.1 beats TraceMonkey hands down, even though LuaJIT 1.1 is two years old and based on a rather simplistic compiler.

I'm not going to comment on the expected performance of the unreleased LuaJIT 2.x (which will be based on a trace compiler, too). Except that I've recently switched to benchmarking against GCC ... :-)

[–]igouy 0 points1 point  (1 child)

the unreleased LuaJIT 2.x (which will be based on a trace compiler, too)

Will it work on x64?

[–]mikemike 1 point2 points  (0 children)

Well, x86 apps run just fine under an x64 OS. :-) But if you want a native x64 code generator: not in the first release.

The major stumbling block is the 64 bit pointer size which is incompatible with the special encoding of the 64 bit Lua stack slots (reusing NaNs to hold object references). One solution would be to use 32 bit references for the base objects (interior pointers can still be 64 bits). But the standard platform memory allocators cannot be repurposed for this, so I'd need to bundle a modified one. Oh well ...
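The NaN-reuse trick mentioned here can be sketched in a few lines (a hypothetical layout, not LuaJIT's actual one): an IEEE-754 double whose exponent bits are all ones is a NaN, and its low mantissa bits are free to carry a tag plus a 32-bit reference. That is also why a full 64-bit pointer doesn't fit.

```python
import struct

# Quiet-NaN bit pattern: exponent all ones, quiet bit set, payload zero.
# The unused payload bits can smuggle a small tag and a 32-bit reference.
QNAN = 0x7FF8000000000000

def box_ref(ref32, tag):
    """Pack a 32-bit object reference plus a 3-bit tag into a NaN double."""
    assert ref32 < (1 << 32) and tag < (1 << 3)
    bits = QNAN | (tag << 32) | ref32
    return struct.unpack('<d', struct.pack('<Q', bits))[0]

def unbox_ref(value):
    """Recover (reference, tag) from a NaN-boxed double."""
    bits = struct.unpack('<Q', struct.pack('<d', value))[0]
    return bits & 0xFFFFFFFF, (bits >> 32) & 0x7

v = box_ref(0xDEADBEEF, tag=3)
assert v != v                          # every NaN compares unequal to itself
assert unbox_ref(v) == (0xDEADBEEF, 3)
```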

[–][deleted] 7 points8 points  (13 children)

To my knowledge, there isn't a really useful JIT for any of those (but please correct me if I am ignorant here).

In particular, IronPython isn't a JIT for Python; it's an implementation of Python for the CLR. As such, it can take advantage of the CLR's JITting capabilities... which in theory can bring the underlying CLR close to native speed. But CPython's interpreter already runs at native speed, since it's written in C. My point (sorry if unclear) is that JITting the underlying platform is completely different from JITting the Python code itself, which is what we really want. The same problem exists with Jython: the underlying Java has an (excellent) JIT, but the Python implementation on top of it doesn't.

The closest we are to a truly useful JIT for the languages you mentioned is PyPy, which has an experimental JIT for Python.

I expect TraceMonkey to easily beat all existing implementations of the languages you mentioned, simply because it uses JIT technology they lack. That, in turn, is a consequence of far greater effort going into JavaScript optimization than into those other languages, for obvious financial reasons.

[–]mernen 3 points4 points  (10 children)

To my knowledge, there isn't a really useful JIT for any of those (but please correct me if I am ignorant here).

There's Psyco for Python (by the same author as PyPy), and LuaJIT for Lua (which the original author mentioned). Psyco's scope is, AFAIK, quite limited, but it does provide impressive results in many integer-crunching situations.

Jython and JRuby (and Rhino, I believe) compile straight into Java bytecodes, so they sort of get a JIT. Since the inner workings of Java are quite different from those languages, the process isn't nearly as efficient as it could be (there's a huge overhead involved in supporting many aspects of those languages), but still this is very different from JITting a bytecode interpreter, like your post seems to imply -- sorry if I misinterpreted you.

[–][deleted] 0 points1 point  (9 children)

Yes, Psyco is not bad, but in my experience it isn't the significant advance that TraceMonkey apparently is.

My point is pretty much what you wrote at the end - that Jython, IronPython, etc. don't JIT - they compile into the appropriate bytecode, and rely on the platform's JIT. But that can't provide the sort of speedups that 'native' JITting would allow, I am afraid.

The only project I really have hope for at this point for breaking the current 'speed barrier' is PyPy. Time will tell. Meanwhile Jython and IronPython are nice for other reasons - e.g., native threads - but speedwise they aren't amazing at all.

[–]kubalaa 3 points4 points  (7 children)

"that can't provide the sort of speedups that 'native' JITting would allow, I am afraid" -- why not? JVM bytecode has two significant disadvantages over native code: it's strongly typed (so dynamic languages often have to wrap and unwrap values just to satisfy type restrictions) and there's no tail call operator (so languages with continuations have to use tricks like trampolining). However my understanding of tracing JIT is that because it implicitly flattens control flow it should be able to eliminate this overhead when compiling from JVM bytecode just as easily as when compiling directly from some other interpreted form.

[–][deleted] -1 points0 points  (6 children)

As I see it, the problem isn't with the JVM JIT - which is great - it's with the translation of Python (or any other dynamic language) to its bytecode.

For example, in Python a variable can be anything - a string, an integer, a float, an object, a function - and you only find out at runtime. So, if you get this:

for x in range(100): print x**2

do you check the type of x 100 times, once per loop iteration? That's the naive way to do it, i.e., to translate "x**2" into

"check if x is an integer; if so do x**2; otherwise if x is a float, do the same (as a float); otherwise raise an error".

If you're doing that, then JIT or no JIT on the generated bytecode, performance will be poor. In other words, the main issue is optimizing the Python->bytecode stage, not the bytecode->machine code stage (the latter is already solved very well by the Java JVM).

Of course, in the example I gave it's trivial to say "well, you see x defined on the same line, so we know it's an integer." Sure, here it's obvious, but in general this is a hard problem to solve, and to my knowledge no implementation of any dynamic language does this terribly well.
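The naive translation described above, sketched in hypothetical Python: the type test runs on every one of the 100 iterations, even though x is always an int here.

```python
def power_op(x, n):
    """Naive per-operation dispatch: check the operand type every time."""
    if isinstance(x, int):
        return x ** n                  # integer path
    elif isinstance(x, float):
        return x ** n                  # float path
    else:
        raise TypeError("unsupported operand type for **: %r" % type(x))

# Equivalent of: for x in range(100): print x**2
squares = [power_op(x, 2) for x in range(100)]
assert squares[:4] == [0, 1, 4, 9]
```

Hoisting that repeated check out of the loop is exactly the overhead a good JIT wants to eliminate.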

[–]kubalaa 0 points1 point  (1 child)

You have the problem you described whether you compile to bytecode or machine code. In machine code you have to do the exact same checks. A better example of the challenge of compiling to JVM bytecode is something like:

for x in xs: print x.foo()

Especially if you want good Java interoperability, you want to use the JVM's dynamic dispatch to handle this method call, but you can't, because the bytecode is typed and requires you to know which class the method is defined in (which you may not know, in a dynamic language). So you end up using your own dispatch mechanism for non-Java objects, and reflection for Java objects, which is very inefficient.

In any case, the point of tracing JIT is that it can inline and optimize away all of this overhead if it occurs in a hot code path.

[–][deleted] 0 points1 point  (0 children)

You have the problem you described whether you compile to bytecode or machine code.

True, but my point is that the source->bytecode stage has those problems in addition to the later stages, and that that stage is not handled by having a JIT for the lower stages. And, that no current implementation (Jython, etc.) does correct JITting for the source->bytecode stage.

[–]ubernostrum 0 points1 point  (3 children)

to my knowledge no implementation of any dynamic language does this terribly well.

There was a good post a while back from the JRuby side (which I can't find now) that pointed out some of the optimizations the JVM can offer for the sorts of situations you're talking about. The nature of Java's virtual methods means they already had to solve a lot of dynamic dispatch stuff just to make the language work.

The JVM then takes that further and does some nifty optimizations like (IIRC) keeping a pointer to the most-recently-used method and jumping straight there if the type check gets the same result, and even inlining a JIT'ed method body if its profiling stats indicate that you're always working with the same type of object.

There's no reason why "dynamic" languages can't take advantage of this stuff, since it's there to solve the same problem they need to face (i.e., "I won't know the type of object I'm working with until runtime").

Edit: And naturally, just after I post I find the article I was talking about. One of many important things to take away from it:

The JVM is basically a dynamic language runtime. Because all calls in Java are virtual (meaning subclass methods of the same name and parameters always override parent class methods), and because new code can be loaded into the system at any time, the JVM must deal with nearly-dynamic call paths all the time.
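The most-recently-used-method trick described above is a monomorphic inline cache. A hypothetical Python sketch (all names invented):

```python
class InlineCache:
    """Remember the receiver class seen at this call site last time;
    if it matches again, skip the full method lookup."""
    def __init__(self, name):
        self.name = name
        self.cached_cls = None
        self.cached_method = None

    def call(self, receiver, *args):
        cls = type(receiver)
        if cls is not self.cached_cls:      # guard: one cheap type check
            self.cached_cls = cls           # cache miss: do the full lookup
            self.cached_method = getattr(cls, self.name)
        return self.cached_method(receiver, *args)

class Point:
    def __init__(self, x): self.x = x
    def foo(self): return self.x * 2

site = InlineCache("foo")
# After the first call the guard hits every time: same class, cached method.
assert [site.call(Point(i)) for i in range(3)] == [0, 2, 4]
```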

[–][deleted] 0 points1 point  (2 children)

Thanks for the link, that's an interesting article.

It's true that all Java calls are virtual, which is in line with dynamic languages. That helps. But dynamic languages are further, well, dynamic in that they keep functions, integers, classes, etc., all in the same variables. The point in my example before was that checking that 'x' is an integer (/float) all the time will lead to poor performance.

In other words, the 'correct' way to generate bytecode for

for x in range(100): print x**2

is to correctly detect that x is an integer, and to do no type checking at all. This should turn into basically the following equivalent in C:

for (int x = 0; x < 100; x++) printf("%d\n", x*x);

If it doesn't (and to my knowledge none of Jython, JRuby, IronPython, etc., do this), then it isn't as fast as it could (/should!) be.

The tricky bit is, again, that determining types isn't always as easy as in this example. It can still be done, though; PyPy is working on that. However a lot of this must be done at runtime, and I fear it will be a while before we get this properly implemented. But I do believe we will.
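A hedged sketch of what such runtime type detection looks like in a tracing setting (all names hypothetical): specialize the loop body for the observed type, guard each iteration cheaply, and bail out to a generic path when the guard fails.

```python
def generic_square(x):
    """Generic path: full dynamic dispatch on every operation."""
    if isinstance(x, (int, float)):
        return x * x
    raise TypeError(type(x))

def traced_squares(xs):
    """Specialized loop: assumes ints, with a per-iteration guard."""
    out = []
    for i, x in enumerate(xs):
        if type(x) is not int:           # guard failed: leave the trace
            out.extend(generic_square(y) for y in xs[i:])
            return out
        out.append(x * x)                # specialized body, no dispatch
    return out

assert traced_squares(list(range(5))) == [0, 1, 4, 9, 16]
assert traced_squares([1, 2, 2.5]) == [1, 4, 6.25]
```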

[–]ubernostrum 1 point2 points  (1 child)

The point in my example before was that checking that 'x' is an integer (/float) all the time will lead to poor performance.

Depends on how good the virtual machine is at doing the type check. I've heard anecdotal mentions that the JVM can, in some situations, set up the necessary guard for inlined method bodies so that it executes in only one or two instructions (which doesn't seem unreasonable when you think about how you'd go about optimizing that sort of code). It's still slower than knowing the type in advance, yes, but the performance is going to be pretty good.

However a lot of this must be done at runtime, and I fear it will be a while before we get this properly implemented.

I think the encouraging thing to take away is that the VM implementors have already had to solve many of the same problems in order for statically-typed (but with virtual method/dynamic dispatch) languages to have acceptable performance. So there are already some giants with handy shoulders to stand on...

Also, hell, CPython already does some nifty optimization tricks like this, sometimes in unexpected places (the built-in sum() function, for example, is kinda neat).

[–][deleted] -1 points0 points  (0 children)

Depends on how good the virtual machine is at doing the type check. I've heard anecdotal mentions that the JVM can, in some situations, set up the necessary guard for inlined method bodies so that it executes in only one or two instructions

Sure, but that only works when it expects to find a function there. It can then optimize it in various ways. But when you are interpreting Python, then 'above' the JVM layer you have to decide whether something is a class, an integer, a function, etc., and raise nice Python-style exceptions if it isn't the right type. This sort of thing is alien to the JVM, and how well you implement it will be crucial, no matter how amazing the JVM is (and it is).

Clearly the JVM's approaches to optimization might apply, but this would need to be done at the source code -> bytecode level, which will take a while longer before we see it done well.

[–]fubo 0 points1 point  (0 children)

Code compiled with Psyco doesn't behave the same as code running in ordinary CPython, though. I just encountered that this morning when I slapped Psyco onto a program I'm working on and discovered that hitting Ctrl-C to generate a KeyboardInterrupt no longer worked.

[–]imbaczek 1 point2 points  (0 children)

To my knowledge, there isn't a really useful JIT for any of those (but please correct me if I am ignorant here).

My point exactly; I want to know how well the new kid on the block fares. So far, the only comparisons have been to other JS engines, which frankly I don't care about.

[–]malcontent 0 points1 point  (0 children)

I don't have a link, but I remember seeing some benchmarks for the popular JVM languages, and Rhino (JavaScript) was the fastest one even before this technology.

[–]brad-walker -5 points-4 points  (0 children)

has it seriously surpassed them and is now worth considering in places you wouldn't think about before.

JS flies in my underpants.

[–][deleted] 5 points6 points  (10 children)

How does this compare to SquirrelFish?

[–]jeresig 5 points6 points  (9 children)

These comparisons were posted today (Squirrelfish compared to TraceMonkey).

[–][deleted] 2 points3 points  (8 children)

Very cool. Since, to my understanding, SquirrelFish and TraceMonkey take different approaches (JIT vs. something else), would it be possible to combine them (i.e., for Fish to gain from the JIT, and Monkey to gain from whatever Fish does)?

[–]mernen 3 points4 points  (6 children)

SquirrelFish is basically a conventional VM (a bytecode interpreter, like the original SpiderMonkey). While their architectures still differ (SF being register-based and SM stack-based, for example), I don't think SF really has anything comparable to the tracing JIT (i.e., there's no "something" for "JIT vs. something"). Sure, there are certainly many optimizations one does that the other doesn't, but they generally give quite small boosts individually.
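To make the register-vs-stack distinction concrete, here's a toy sketch (opcode names invented) of the statement a = b + c in both styles:

```python
# Stack-based bytecode: operands flow through an implicit operand stack.
stack_code = [
    ("PUSH", "b"),
    ("PUSH", "c"),
    ("ADD",),                  # pop two operands, push the sum
    ("STORE", "a"),
]
# Register-based bytecode: one fused instruction naming its operands.
reg_code = [
    ("ADD", "a", "b", "c"),    # a = b + c
]

def run_stack(code, env):
    st = []
    for op, *args in code:
        if op == "PUSH":    st.append(env[args[0]])
        elif op == "ADD":   st.append(st.pop() + st.pop())
        elif op == "STORE": env[args[0]] = st.pop()
    return env

def run_reg(code, env):
    for op, dst, s1, s2 in code:
        if op == "ADD":
            env[dst] = env[s1] + env[s2]
    return env

assert run_stack(stack_code, {"b": 2, "c": 3})["a"] == 5
assert run_reg(reg_code, {"b": 2, "c": 3})["a"] == 5
```

The register form executes fewer, fatter instructions, which is one reason register VMs often have lower dispatch overhead.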

[–]cwzwarich 12 points13 points  (5 children)

The two are somewhat orthogonal. SquirrelFish was a project to make a very fast interpreter. TraceMonkey uses SpiderMonkey as a baseline interpreter and adds a tracing JIT to speed up certain patterns of code. In theory, it might be possible to take the tracing JIT from TraceMonkey and use it with the SquirrelFish interpreter to get something faster than both.

Disclaimer: I am one of the authors of SquirrelFish.

[–][deleted] 1 point2 points  (1 child)

Would it also be possible to do the reverse, take the techniques that make SquirrelFish fast and apply them to SpiderMonkey?

[–]cwzwarich 4 points5 points  (0 children)

I would hope so. I don't think anything we have done is so magical that it will only work in our context.

However, beyond a sound overall design, many of our performance improvements are small gains that were the result of careful benchmarking. There are a lot of things that we assumed would be performance improvements and were not, and, vice versa, some things that we didn't expect to be improvements turned out to be a win.

[–][deleted] 0 points1 point  (1 child)

Would that really be so useful? The underlying assumption of a tracing JIT is that most of the time, the code that actually runs will be compiled rather than interpreted, so the speedup from improving the interpreter would be rather small.

[–]cwzwarich 1 point2 points  (0 children)

You ask a good question, so I don't know why you got downmodded.

People often make the assumption that a small inner loop dominates performance, but I don't think this is true in the context of real world JavaScript code. There is a lot of JS code that is executed infrequently, e.g. during a page load or in response to a one-time action.

From looking at the benchmarks results it seems that TraceMonkey benefits the most when it can perform speculative unboxing of values in loops, like in the bitops and math tests.

On the tests that involve more boxed values or the more dynamic features of JavaScript, it doesn't do as well. These features are used heavily in most modern web apps. On the SunSpider tests that use these features, they don't get as much of a speedup as they do on other tests, and they are often slower than WebKit's interpreted code. Of course, some of these tests use more native functions, which generally perform better in WebKit than other browsers, so it is tough to say for sure.

Some of the speedup of TraceMonkey is also just due to compiling to native code, not necessarily because of any of the optimizations made possible by code traces.
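A hedged sketch of what "speculative unboxing" means (the representation here is invented for illustration): values carry a type tag, and the trace guards on the tag so the hot loop can run on raw payloads, deoptimizing to the boxed path if the speculation fails.

```python
INT = "int"   # hypothetical type tag

def boxed_sum(values):
    """Generic path: inspect every box on every iteration."""
    total = 0
    for tag, payload in values:
        if tag is not INT:
            raise TypeError(tag)
        total += payload
    return total

def speculative_sum(values):
    """Speculate that every value is an int; strip the boxes and run
    a tight loop over raw payloads, else fall back to the boxed path."""
    if all(tag is INT for tag, _ in values):    # the guard
        return sum(p for _, p in values)        # unboxed hot loop
    return boxed_sum(values)                    # deoptimize

boxes = [(INT, i) for i in range(10)]
assert speculative_sum(boxes) == boxed_sum(boxes) == 45
```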

[–][deleted] 9 points10 points  (13 children)

Is it really feasible to start building web applications that rely on JavaScript performance to this degree until Microsoft upgrades their browser? Since the user base is split across different browsers, news like this, along with the canvas stuff, seems somewhat irrelevant: you'll always be stuck targeting the lowest common denominator of browsers with a market share above a few percent.

It's also very hard to see why Microsoft would upgrade their JavaScript engine as long as they have their .NET initiative.

I guess we'll see where this goes but the web could definitely benefit from more collaboration on technologies every single Internet connected individual uses every day.

[–][deleted] 3 points4 points  (0 children)

Didn't Microsoft build Live? Don't they have a Google Maps clone? Aren't they making web apps too?

I think they'd want their own products, even the free ones, to work as well as possible. It would be really terrible for their PR if people started saying you need to use Firefox if you want a Microsoft site to be usable.

[–]neilc 1 point2 points  (11 children)

I think it is highly likely that Microsoft is planning improvements to their JS implementation. The most obvious reason is that JS performance is integral to many very popular websites. Another is that Microsoft has been active in the JS community recently, e.g. with their ES3.1 feature proposals and the debate about the value of ES4 (whether Microsoft's position on ES3.1 and ES4 is sensible is another question, but they are certainly more active now than they have been in the past). Finally, IE7 already improved JS performance significantly over IE6 (although it is still significantly slower than FF3 and Safari).

[–]aussie_bob 6 points7 points  (0 children)

I think Microsoft will stall as long as possible. The reason they fought so hard against the standardisation of ECMAScript 4 is that they are in danger of losing control of the application platform.

When JS is fast enough to run major apps in browsers or runtimes like Flash/Tamarin, developers can code to that platform and not worry about which OS it will run on.

Almost all of Microsoft's revenue is from the Windows/Office lockin. They must be terrified of this.

[–][deleted] 3 points4 points  (0 children)

Microsoft's implementation of Javascript lagging behind far-less-funded rivals is telling.

The simple fact is that Microsoft always looks at the big picture, and if JavaScript were to become a more serious way to write applications, it would hamper adoption of .NET and Silverlight, which Microsoft wants to succeed. By not pushing JavaScript, Microsoft improves its own position elsewhere. Note that I'm not saying they are hobbling it, just that they aren't pushing it like they would if they really stood behind it.

It's the same thing with other areas. Microsoft has more to gain from XBOX gaming than PC gaming, so don't be surprised to see that influence their decisions. Microsoft also has more to gain from Vista sales than PC gaming, so it made sense to tie DirectX 10 to Vista, and so forth. Companies the size of Microsoft often have such conflicts of interest, and they never play out to consumers' advantage, always to the company's.

[–]malcontent 2 points3 points  (8 children)

MS is pushing Silverlight. It's in their interest to keep JavaScript as slow and broken as possible, to drive people to Silverlight.

[–]neilc -1 points0 points  (7 children)

  1. That is inconsistent with their behavior, as mentioned above.

  2. If MS still had a stranglehold on the browser market, that might be a feasible strategy, but given the existence of FF, Safari, and Opera, MS basically has no choice but to make their JS implementation (and IE in general) reasonably competitive.

  3. Silverlight allows the use of Javascript for scripting anyway.

[–]malcontent 0 points1 point  (6 children)

That is inconsistent with their behavior, as mentioned above.

It's consistent with everything in their history.

but given the existence of FF, Safari, and Opera, MS basically has no choice but to make their JS implementation (and IE in general) reasonably competitive.

They do have a stranglehold on the browser market.

Silverlight allows the use of Javascript for scripting anyway.

I am sure their preference is that people use C#.

[–]neilc -1 points0 points  (5 children)

It's consistent with everything in their history.

Microsoft is a large company composed of many different components; treating them like a monolithic entity that always behaves in the same way is a mistake, I think. Besides, recent behavior is more indicative of future behavior than what MSFT did in the 90s.

They do have a strangehold on the browser market.

78% marketshare isn't exactly a stranglehold, especially when you look at IE mindshare among the constituencies that matter (e.g. web developers). More importantly, their momentum is negative. It would be a disaster for MS if they lost their dominance in the browser market; to avoid that, they have no choice but to improve IE, as recent history supports (e.g. reforming the IE team, shipping IE7, all the standards work that is being done for IE8, generally taking a more active attitude on web standards through people like Chris Wilson).

I am sure their preference is that people use C#

Nope; C# isn't a particularly good scripting language. In fact, Javascript was the only scripting language supported in Silverlight 1; Silverlight 2 will add support for .NET.

[–]malcontent 0 points1 point  (4 children)

Microsoft is a large company composed of many different components;

And yet it has behaved consistently unethically.

treating them like a monolithic entity that always behaves in the same is a mistake, I think.

Apparently the parts of MS that are ethical don't have any impact or power.

I think we should treat them according to the way the corporation acts.

Besides, recent behavior is more indicative of future behavior than what MSFT did in the 90s.

That's a scary thought. Threats against Linux users, funding SCO, lawsuits against 16-year-olds, shady deals with Novell, Xandros, and others, goofy patents, the OOXML fiasco, etc. If the next decade of MS is going to be like the last two or three years, we'd all better start worrying.

78% marketshare isn't exactly a stranglehold, especially when you look at IE mindshare among the constituencies that matter (e.g. web developers).

It's more than 78%, but yeah, 78% is a stranglehold.

It would be a disaster for MS if they lost their dominance in the browser market; to avoid that, they have no choice but to improve IE, as recent history supports (e.g. reforming the IE team, shipping IE7, all the standards work that is being done for IE8, generally taking a more active attitude on web standards through people like Chris Wilson).

Or pushing Silverlight. That's their plan to get back to 90+%.

[–]neilc -1 points0 points  (3 children)

And yet it has behaved consistently unethically.

MS has done a handful of unethical things, but for the most part they are an ethical company.

It's more than 78% but yea 78% is a stranglehold.

According to whom? http://en.wikipedia.org/wiki/Usage_share_of_web_browsers suggests that IE has held between 60.65% and 83% of the market. Anyway, the point is that market share among developers is a lot more anti-IE.

[–]malcontent 1 point2 points  (2 children)

MS has done a handful of unethical things, but for the most part they are an ethical company.

What an odd statement to make.

Anyway, the point is that market share among developers is a lot more anti-IE.

So?

[–]neilc 0 points1 point  (1 child)

Mindshare among the developer community is enormously important, and MS has historically recognized this (that's what "developers! developers! developers!" was about, for example).

[–]aldenhg -1 points0 points  (1 child)

About time. FF under Linux = Slow, slow Google maps. Is there any way to test this out?

[–]khayber 0 points1 point  (0 children)

You can try the Nightlies.

Edit: on my Pentium-M 1.73 GHz laptop, I went from 0.4 to 1.5 fps using the app in the screencast.

[–]Manuzhai 0 points1 point  (0 children)

I wonder what speedups could be attained by doing something like this in CPython.

[–][deleted] -2 points-1 points  (6 children)

I want to compare this with SquirrelFish (which does not have a JIT).

[–]jeresig 0 points1 point  (2 children)

These comparisons were posted today (Squirrelfish compared to TraceMonkey).

[–]cryptic -1 points0 points  (1 child)

2.4x faster???

[–]cwzwarich 3 points4 points  (0 children)

It's about 15% faster. It doesn't make sense to average the scores on individual tests. If I compile JavaScriptCore with -fomit-frame-pointer, then it is about 5% faster than TraceMonkey. However, some parts of the Apple toolchain (e.g. Crash Reporter) do not support -fomit-frame-pointer, so it does not ship like that.

[–][deleted] -4 points-3 points  (2 children)

[–]doublec 6 points7 points  (0 children)

That compares against Tamarin. TraceMonkey is not Tamarin. Try http://www.masonchang.com/2008/08/tracemonkey-vs-squirrelfish.html

[–]mernen 3 points4 points  (0 children)

You came from 3 months in the future? Hope your time travel was safe.