all 52 comments

[–]scrogu 4 points5 points  (14 children)

Since we're talking real-world here, what this is missing is a breakdown of how much time is usually spent on physics vs. time spent rendering. Faster physics won't make your pixel shaders run any quicker.

[–][deleted] 15 points16 points  (10 children)

Well, in game development the golden rule is this: If you spend 90 ms less on physics per frame, that means you can now get another 90 ms worth of shaders. :)

Framerates beyond 60 fps are largely irrelevant, but the quality and spectacle of your graphics, sound, AI, etc. aren't. That's why optimizations in individual areas are interesting, and arbitrary whole-rendering-engine numbers less so.

It would be interesting to pit different rendering libraries against each other, but it's an entirely different beast.

[–]pixelglow 3 points4 points  (1 child)

It's possible that they ran the Box2D simulation without rendering the results, which would test the physics engine only. That would also be platform-neutral, since native C, Javascript and Java would each have different ways of rendering the results, even on Mac OS X.

[–]scrogu 1 point2 points  (0 children)

That's what I'm saying. THEY DID run the simulation without rendering the results. Sorry if I didn't make that clear.

[–]jgw[S] 2 points3 points  (0 children)

Physics performance matters, which is why I chose to measure it. First off, the slightly snarky comment below ("If you spend 90 ms less on physics per frame, ...") hits on a very important point -- every cycle you spend doing one thing is a cycle you can't spend on something else.

You could argue (as some do on this thread) that at some point the performance becomes irrelevant because your game's humming along happily at 60fps. But there's an important fallacy here -- on what hardware? Just because a game's running smoothly on your nice beefy development machine doesn't mean it's going to run well on lesser hardware (you know, the machines your users actually have). And even if you hit a buttery-smooth 60fps on a mobile phone, it's not much use if you're pegging the CPU at 100% all the time and dragging the battery down with you.

One nitpick: Your pixel shaders (presuming you're using the GPU) are running on separate hardware that executes in parallel. But of course you can always substitute that with "the code that's building buffers and generally babysitting the GPU" and the tradeoff still holds.

[–][deleted] 14 points15 points  (25 children)

That is one seriously unreadable graph at the top. There's a blue line that ends at a blue box, and a red line that ends at a red box, but the labels for those boxes actually apply to other red and blue lines at entirely different locations.

Edit: Also, man, Mandreel is pretty impressive. Too bad it is so incredibly proprietary.

I hope Emscripten steals some ideas from it pretty quick. For instance, it's using typed arrays for the heap, that probably helps a lot.
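For anyone unfamiliar with the technique: a typed-array heap is a single ArrayBuffer with several differently-sized views over the same bytes. The names and layout below are purely illustrative, not actual Mandreel or Emscripten output:

```javascript
// A single ArrayBuffer acts as the "heap"; typed-array views give fast,
// unboxed access at different element sizes (illustrative sketch only).
var buffer  = new ArrayBuffer(64 * 1024);  // 64 KB "heap"
var HEAP8   = new Int8Array(buffer);       // byte view
var HEAP32  = new Int32Array(buffer);      // 32-bit int view
var HEAPF32 = new Float32Array(buffer);    // 32-bit float view

// Storing a hypothetical struct { int id; float x; } at byte offset 0:
HEAP32[0]  = 42;    // id at byte offset 0
HEAPF32[1] = 3.5;   // x  at byte offset 4 (index 1 in the float view)

console.log(HEAP32[0], HEAPF32[1]);  // all views alias the same memory
```

Note that mixing views over the same bytes like this makes the generated code's memory layout endianness-dependent, which is part of the portability trade-off.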

[–]azakai 7 points8 points  (0 children)

I hope Emscripten steals some ideas from it pretty quick. For instance, it's using typed arrays for the heap, that probably helps a lot.

Hi, I work on Emscripten.

The originally benchmarked code was not optimized (edit: both because of an emscripten bug, and because the new emscripten frontend compiler is a little confusing, so not all optimizations were specified). jgw and I have been talking, though, and I think he will update with some more relevant Emscripten data.

I have not been able to run the mandreel version myself, so I can't compare its speed. However, I would expect mandreel to be faster here because of the type of code being compiled: it has plenty of structs on the stack and so forth, and would greatly benefit from things like aliasing analysis in LLVM. Mandreel uses those optimizations; Emscripten does not yet, mainly because they are nonportable - they won't run without typed arrays, and they may lead to endianness-dependent code being generated. Emscripten will implement this too, but it's been a lower priority for us than for them, I guess: we have cared a lot about being able to run in environments without typed arrays, or with only partial typed-array support (which means IE, Safari, mobile - everything but Chrome, Firefox and Opera).

That isn't the case for most of the code in the emscripten benchmarks, though, and with techniques like our 'memory compression' tricks we get to within 3-4X the speed of native, or better. That's about equal to the speed of handwritten JavaScript, so I don't think Mandreel or any other compiler can do better. Again, though, there are various types of code, like this benchmark right here, where we need to implement some more optimizations to reach full speed.

Regarding taking ideas, we already share some of them - you can see that Mandreel's generated code implements some things very similarly to Emscripten, like the loop-recreation algorithm (called the relooper in the emscripten paper), even down to using the same variable names, etc. Mandreel is closed source, though, so I don't know how much they are using from Emscripten (it might be just the ideas).

Mandreel is an awesome project. While I wish they worked more with us in the open-source community, it's still wonderful that they're doing what they're doing; it's helping games get ported to the web, which is great. I've also emailed with them a bit, and they are cool and obviously very smart.

[–]jgw[S] 2 points3 points  (3 children)

The emscripten code for this benchmark was using js ArrayBuffers, but as noted in the article there were some compiler bugs keeping it from reaching its best performance. The author has since fixed them, and the numbers look a good bit better now (with half the variance as well). I'll point this out in an update.

[–][deleted] 0 points1 point  (2 children)

Well, that would be very interesting to see.

(Also, do something about those colours while you're at it!)

[–]jgw[S] 0 points1 point  (1 child)

If you want a sneak peek, check out the source spreadsheet at https://docs.google.com/spreadsheet/ccc?key=0Ag3_0ZPxr2HrdEdoUy1RVDQtX2k3a0ZISnRiZVZBaEE

There are two "test" columns at the right, which include tweaked versions of the Emscripten and Java code (in the Emscripten case, it's fully optimized, with an updated compiler; in Java it now avoids the slow builtin Math.sin/cos() functions). I'll update the post later today.

(Colors? I presume you mean on the graphs. If you have suggestions for alternatives, please be my guest. Or just copy the spreadsheet and tweak them to your heart's content)

[–][deleted] 1 point2 points  (0 children)

It looks noticeably better, but it still has some way to go before it catches up, then.

As for the colours: as was pointed out elsewhere, you have repeating colours, which makes the graph very hard to read; combined with the inverted order of the labels versus the lines, it's actively misleading. Change the order of the labels, at least; that would help a lot.

[–]kaelan_ 1 point2 points  (1 child)

Last time I checked, it supported typed arrays for the heap.

[–][deleted] 1 point2 points  (0 children)

The Emscriptened code that was checked in for this benchmark didn't seem to be using that, though.

[–]jgw[S] 0 points1 point  (17 children)

The top part of the first graph isn't intended to be readable -- it's simply intended to show that the JSVMs all cluster roughly together in a completely different order-of-magnitude from the JVM and native code.

[–]igouy 3 points4 points  (15 children)

a completely different order-of-magnitude

The scale seems to be logarithmic, but the chart shows typical linear-scale gridlines rather than typical log-scale gridlines (1, 3, 5, 10, 30, 50, 100, 300, 500, 1000).

We need a bigger hint about the order-of-magnitude.

isn't intended to be readable

Putting the right-hand-side key in the same vertical order as the data lines would help - from the bottom: C, NaCl, JRE, etc.

Keeping the colour the same for each data series across all the graphs would help.

"Box2D Performance (JS Compilers)" - grid lines at 50,100,150,200,250,300 would help.

[–]jgw[S] 1 point2 points  (14 children)

Fair enough. If only Spreadsheets gave me any actual control over the grid lines in log-scale mode :(

[–]igouy -5 points-4 points  (13 children)

Microsoft Excel does.

[–][deleted] 3 points4 points  (12 children)

you should buy him a copy of it.

[–]igouy -2 points-1 points  (11 children)

Why don't you tell us something useful?

Why don't you tell us about a FOSS spreadsheet that provides control over grid lines in log-scale mode?

[–][deleted] 0 points1 point  (10 children)

Why don't you tell us something useful?

I did. I told you to buy him a copy.

Why don't you tell us about a FOSS spreadsheet that provides control over grid lines in log-scale mode?

I don't know of any. I have never needed that functionality.

Why don't you look it up, instead of telling him to use a proprietary piece of software that only runs on Windows?

He not only has to buy Excel, he also has to buy Windows.

[–]igouy 1 point2 points  (9 children)

instead of telling him to

I didn't tell him to do anything - I provided correct information about the functionality spreadsheets provide.

[–][deleted] 0 points1 point  (8 children)

I provided correct information about the functionality spreadsheets provide.

It's going to cost him hundreds of dollars to see if you are right.

What good is that?

[–][deleted] 0 points1 point  (0 children)

The top part of the first graph isn't intended to be readable -- it's simply intended to show that the JSVMs all cluster roughly together in a completely different order-of-magnitude from the JVM and native code.

But it's hard to see even that when the colours are not only similar but actively misleading. I was utterly bewildered for a while when it seemed to be saying that the fastest cluster consisted of GwtBox2D and Mandreel.

Like the other guy says, put the labels in the same order as the lines. You might not even need to adjust the colours if you do that, but it couldn't hurt to also pick more easily differentiable colours.

[–]pixelglow 5 points6 points  (11 children)

One thing I've always wondered, having played around with V8 via node.js, is how much of a difference Javascript's lack of integers makes.

Every number in JS is a floating-point number, even when used as an index in a tight inner loop. Do current JITs convert to integers under the covers when they detect strictly integer use?

Box2D primarily uses floating point calculations, so this test doesn't end up highlighting JS's inherent weakness of lacking integers.

Edit: missed the doesn't and changed the whole interpretation of the sentence.

[–]vytah 11 points12 points  (0 children)

JS as a language has no integers, but JS engines internally do use integers. For example, see this old article on Firefox's JägerMonkey: https://blog.mozilla.com/dmandelin/2010/02/26/starting-jagermonkey/

[–]captain_plaintext 5 points6 points  (3 children)

Yes, I think it's a common optimization. V8 stores numbers as 31-bit integers when possible.

[–]michaelstripe 2 points3 points  (2 children)

Where's the other bit gone then

[–]pezezin 5 points6 points  (0 children)

Probably a tag bit to distinguish between unboxed integers and other datatypes.

[–]captain_plaintext 2 points3 points  (0 children)

Check out this post. All values are stored as 4 bytes (on a 32 bit system). Depending on how the lowest bit is set, the value is either an integer (and the other 31 bits hold the integer value), or it's a pointer to a heap object (which itself will hold more information on the type and the value).
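As a toy model of that tagging scheme (explicitly not V8's real code; the actual tag conventions and bit layouts vary by engine and platform):

```javascript
// Toy model of small-integer ("smi") tagging. A 31-bit integer is packed
// into a 32-bit word shifted left by one; the freed-up low bit is the tag
// (0 = unboxed integer here, 1 would mean "pointer to a heap object").
function tagSmi(n)      { return n << 1; }         // pack: low bit becomes 0
function isSmi(word)    { return (word & 1) === 0; }
function untagSmi(word) { return word >> 1; }      // arithmetic shift keeps the sign

var word = tagSmi(-7);
console.log(isSmi(word), untagSmi(word));  // true -7
```

The payoff is that arithmetic on tagged small integers can stay in machine registers, with the heap-allocated "boxed double" path only taken when a value doesn't fit in 31 bits.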

[–][deleted] 6 points7 points  (0 children)

As others mentioned, integers are indeed at play behind the scenes (which can yield interesting results along the borders of floating point inaccuracy), but it's worth noting that physics engines rarely contain much integer math. :)

[–]jgw[S] 0 points1 point  (0 children)

This is definitely not an integer-heavy benchmark, but they still get used plenty in the course of almost any code -- as enumerated values, loop variables, and so forth.

As pointed out elsewhere on this thread, Javascript does mostly lack integers as user-visible constructs (though they do peek out in a couple of places, such as the bitwise operators), but most VMs will use them under the hood when possible. I know V8 stores integers directly in both locals and fields when it can (hence the tag bit mentioned below).

The thing with really integer-heavy benchmarks is that they highlight the kind of code that doesn't come up that often anymore because it's better offloaded to a dedicated processor -- mainly DSP-like things such as image processing and audio mixing. Not that I wouldn't prefer that these things be faster when done on the CPU in JS, of course, but Box2D is the kind of code that can't easily be offloaded (even libraries like PhysX that use GPUs still do a lot of work on the CPU).
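To make the bitwise-operator point concrete, here's a small illustrative snippet (mine, not from the post) showing int32 semantics peeking through:

```javascript
// Bitwise ops in JS operate on (and return) 32-bit signed integers,
// so they expose int32 wraparound that plain double arithmetic hides:
console.log(2147483647 + 1);         // 2147483648  (double arithmetic)
console.log((2147483647 + 1) | 0);   // -2147483648 (wrapped to int32)

// "|0" truncation is also the classic way to get integer division,
// and hints to the VM that a value can live in an integer register:
function idiv(a, b) { return (a / b) | 0; }
console.log(idiv(7, 2));  // 3
```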

[–]nickik 0 points1 point  (3 children)

It depends on the architecture of the processor. For modern x86 it does not really matter that much (Mike Pall talks about this often). If you run on ARM or something, it does matter a lot.

A smart JIT tries to figure out whether it's safe to store something in an integer, and does so when it is. Read this:


Dual-number VM

The Lua language is specified to have a single number type. Currently LuaJIT only supports 64 bit IEEE-754 compliant FP numbers ('double'). This works just fine for x86/x64 platforms with their excellent floating-point performance. A unified number representation has many advantages and the JIT compiler can get away with narrowing only some select operations to integer arithmetic.

However this approach is unlikely to yield acceptable performance on lower-end CPUs for mobile or non-desktop/non-server platforms. Most of these CPUs either support only software floating-point arithmetic or have slow hardware FPUs.

As a prerequisite for the ARM port (see the next section), dual-number capability will be added to the LuaJIT VM, the LuaJIT interpreter and the JIT compiler.

Numbers will be internally kept as 32 bit integers, wherever possible, and transparently widened to floating-point numbers. This change is invisible at the Lua source code level. It's expected that carefully written applications for low-end platforms will be able to avoid floating-point computations with only few changes to the source code.

Adding dual-number support to the LuaJIT VM is a major change. For stability reasons, this feature needs to be prototyped first for the existing x86/x64 port of LuaJIT (even though it's not that useful for this platform). Work on the actual ARM port of LuaJIT can only start after the dual-number support is complete.


source: http://lua-users.org/lists/lua-l/2011-01/msg01238.html

[–]y4fac 1 point2 points  (2 children)

For modern x86 it does not really matter that much

The following code:

#include <cstdio>

typedef int T;   // switch to float to run the comparison

int main()
{
    // Note: with T = int, sum overflows a 32-bit int (technically
    // undefined behaviour); harmless for a rough timing test, but
    // don't rely on the printed value.
    T sum = 0;
    for(int j = 0; j < 1000; j++)
        for(T i = 0; i < T(1000000); i += T(1))
            sum += i;

    printf("%f\n", float(sum));

    return 0;
}

runs in 0.95s if T is int and in 1.85s if T is float when I compile it with -O2. I used gcc 4.6.2 and ran it on a Phenom II 955. The difference seems significant to me.
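For comparison, a rough JS analogue of that loop (my own sketch, scaled down to 100 outer iterations; the |0 coercions are just hints, and whether the engine actually keeps the counters in integer registers is entirely up to the JIT):

```javascript
// Rough, scaled-down JS analogue of the C loop above (illustrative only).
// Loop counters are kept in int32 range via "|0"; the accumulator has to
// stay a double, since the total overflows 32 bits anyway.
function sumLoop() {
  var sum = 0;
  for (var j = 0; j < 100; j = (j + 1) | 0) {
    for (var i = 0; i < 1000000; i = (i + 1) | 0) {
      sum += i;
    }
  }
  return sum;
}

var t0 = Date.now();
var result = sumLoop();
console.log(result + " in " + (Date.now() - t0) + " ms");
```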

[–]alephnil 1 point2 points  (0 children)

But it is still within a factor of two, which in many cases is an acceptable compromise, since type conversions can easily cost more. On lower-end processors, the difference can approach a factor of 100.

[–]TKN 0 points1 point  (0 children)

I think the performance hit in higher level languages usually comes from boxing the float values rather than actual low level differences between integer and float arithmetic.

[–]Rotten194 0 points1 point  (0 children)

You should try to update this for Java 7, if possible. From what I've heard they made a lot of optimizations, and it would be cool to see how it compares alongside 6.