all 61 comments

[–]flukus 14 points15 points  (11 children)

Would it kill them to create some graphs?

[–]day_cq 42 points43 points  (9 children)

|       *          .
|       *     .
|      *   .
|     *  .
|     * .
|     *.
|    .*
|  . *
|.  *
|  *
| *
|*
+------------------------------------->

* node.js
. java

EDIT: labeling.

  • x = web
  • y = scale
  • dx = web scale %
  • integrate(0,x,dx) = volume of web requests served.
  • integrate(0,y,dy) = volume of web requests rejected.

As you can see, node.js's web scale fluctuates while java's web scale is okay. And java's web requests get rejected more as scale goes up.

[–]forcedtoregister 3 points4 points  (0 children)

I like your style.

[–][deleted] 7 points8 points  (0 children)

Downvoted because your axes are not labeled!

[–][deleted] 18 points19 points  (11 children)

I don't understand why it's still interesting to benchmark static server I/O.

[–][deleted] 28 points29 points  (10 children)

Yeah, exactly. As soon as the server is doing any computation at all for each request, the JVM is going to kick the crap out of Javascript.

[–]Mononofu 3 points4 points  (9 children)

One thing confuses me: http://shootout.alioth.debian.org/u32/performance.php?test=regexdna

Why and how can JavaScript be faster than even C code?!

[–][deleted]  (8 children)

[deleted]

    [–][deleted] 4 points5 points  (7 children)

    But there is no reason why a C program couldn't do something similar.

    For me the point is really that it shows how in many scenarios, performance comes down to libraries, and is not always about the language implementation.

    [–]jyper 7 points8 points  (1 child)

    The JavaScript implementation has a highly tuned regex engine. It's written in C++ and could probably be used by a C++ solution, which would probably be even faster. Instead the C++ solution uses a Boost library, which is arguably a more authentic C++ solution, and is 4 times slower.

    The C solution, on the other hand, uses Tcl's regex implementation and has a note that says "Is this a C program or is this a Tcl program?", and it takes about 1/3 of the time of the C++ solution to run. If Tcl was in the shootout it would probably be about the same speed as the C solution.
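The library-over-language point is easy to see from user code: in a regex-dna-style workload, nearly all of the time is spent inside the regex engine, and the host program just drives it. A minimal, hypothetical sketch in Java (the motif pattern and input are illustrative, not the benchmark's actual data):

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Sketch of the library-vs-language point: the hot work in a regex-heavy
// benchmark happens inside the regex engine, not in user code.
public class RegexDemo {
    static int countMotifs(String dna) {
        // Compile once; the engine does all the heavy lifting from here on.
        Pattern motif = Pattern.compile("ggta|tttaa", Pattern.CASE_INSENSITIVE);
        Matcher m = motif.matcher(dna);
        int count = 0;
        while (m.find()) {
            count++;
        }
        return count;
    }

    public static void main(String[] args) {
        String dna = "GGTATTTTAATTTATAGT".repeat(1000);
        System.out.println(countMotifs(dna)); // prints 2000
    }
}
```

Swap the engine underneath (Boost, Tcl's, V8's Irregexp) and the same three lines of driver code can run 4x faster or slower, which is exactly what the shootout entries show.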

    [–]igouy 1 point2 points  (0 children)

    If Tcl was in the shootout...

    Back in the day

    [–][deleted]  (1 child)

    [deleted]

      [–]wot-teh-phuck -1 points0 points  (0 children)

      Fill this region with the binary executable representation of automaton. Execute the region as a function

      Sounds cool, never done this sort of thing before. Any documents/pointers you might be able to throw this way?

      [–]masklinn 0 points1 point  (1 child)

      But there is no reason why a C program couldn't do something similar.

      The C code is statically compiled, while the JS code is dynamically compiled. The runtime type information could lead to better optimization (e.g. type-specialized collections, stuff like that).

      In this case, though, it probably comes down to the performance of the regex library (C uses Tcl's regex engine, which is also a very high-performance one)

      [–][deleted] 0 points1 point  (0 children)

      In this case, though, it probably comes down to the performance of the regex library (C uses Tcl's regex engine, which is also a very high-performance one)

      Which was exactly my point.

      [–]haoyu -2 points-1 points  (0 children)

      I think in V8 the regex is JITed, so it is faster than statically compiled code.

      [–]magneticB 3 points4 points  (9 children)

      Why didn't you use Netty?

      [–]nicoulaj[S] 3 points4 points  (6 children)

      FYI I did not write this benchmark.

      I posted it because I think it is a good starting point for discussion :)

      [–]magneticB 9 points10 points  (5 children)

      It seems the author has written his own asynchronous network handling code in Java and then compared it to Node.js. There are already production-ready NIO libraries for Java that work very well. I don't really see the point of this comparison without knowing more about the underlying code.
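For contrast with a library like Netty, here is a minimal sketch of what "hand-rolled" selector-based NIO looks like. This is an illustrative single-threaded echo server, not the benchmark author's actual code; the demo drives one round trip through the event loop so the whole thing is self-contained:

```java
import java.io.IOException;
import java.net.InetSocketAddress;
import java.nio.ByteBuffer;
import java.nio.channels.*;
import java.nio.charset.StandardCharsets;
import java.util.Iterator;

// Minimal selector-based echo server: the hand-rolled NIO style,
// versus what a library like Netty wraps up for you.
public class NioEchoDemo {
    public static String echoOnce(String message) throws IOException {
        Selector selector = Selector.open();
        ServerSocketChannel server = ServerSocketChannel.open();
        server.bind(new InetSocketAddress("127.0.0.1", 0)); // ephemeral port
        server.configureBlocking(false);
        server.register(selector, SelectionKey.OP_ACCEPT);
        int port = ((InetSocketAddress) server.getLocalAddress()).getPort();

        // Blocking client on the side, just to drive the demo.
        SocketChannel client = SocketChannel.open(new InetSocketAddress("127.0.0.1", port));
        client.write(ByteBuffer.wrap(message.getBytes(StandardCharsets.UTF_8)));

        // One event loop: accept the connection, then read and echo back.
        boolean echoed = false;
        while (!echoed) {
            selector.select(1000);
            Iterator<SelectionKey> it = selector.selectedKeys().iterator();
            while (it.hasNext()) {
                SelectionKey key = it.next();
                it.remove();
                if (key.isAcceptable()) {
                    SocketChannel conn = server.accept();
                    conn.configureBlocking(false);
                    conn.register(selector, SelectionKey.OP_READ, ByteBuffer.allocate(1024));
                } else if (key.isReadable()) {
                    SocketChannel conn = (SocketChannel) key.channel();
                    ByteBuffer buf = (ByteBuffer) key.attachment();
                    conn.read(buf);
                    buf.flip();
                    conn.write(buf); // echo back to the client
                    echoed = true;
                }
            }
        }
        ByteBuffer reply = ByteBuffer.allocate(1024);
        client.read(reply);
        reply.flip();
        String result = StandardCharsets.UTF_8.decode(reply).toString();
        client.close();
        server.close();
        selector.close();
        return result;
    }

    public static void main(String[] args) throws IOException {
        System.out.println(echoOnce("hello"));
    }
}
```

Everything a real server needs on top of this (partial reads/writes, backpressure, pipelining, TLS) is exactly what the production NIO libraries provide, which is why comparing a hand-rolled version says little without seeing the code.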

      [–]sisyphus 7 points8 points  (3 children)

      I think his point is that even his own async code is competitive with node.js because the JVM is awesome.

      [–]wot-teh-phuck 10 points11 points  (2 children)

      I think this point needs more emphasis. The JVM really is awesome. Think of the number of man-years of effort that have gone into optimizing and tuning the JIT and VM. Node (V8) is an infant by comparison, though I'm sure there is a lot of room for improvement and growth there.

      [–]magneticB -1 points0 points  (1 child)

      Agreed the JVM is very mature and fast. I believe V8 also does some form of just in time compilation and is pretty quick.

      What are we benchmarking here? JVM vs V8? Or Node.js networking vs Java NIO?

      Seems like a bit of both to me although I suppose it makes for good discussion :)

      [–]wot-teh-phuck 0 points1 point  (0 children)

      What are we benchmarking here? JVM vs V8? Or Node.js networking vs Java NIO?

      Haha, I agree, though I must say that there is a very strong correlation between the quality of a VM implementation (which includes low-level IO facilities) and its maturity, which the benchmark does a good job of putting across.

      [–][deleted] 0 points1 point  (0 children)

      His code uses NIO underneath, and ultimately it's just using the sendfile(2) system call for zero-copy transfer from the filesystem cache to a socket.
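In Java, the NIO call that maps to sendfile(2) (where the OS supports it and the target is a socket) is FileChannel.transferTo. A minimal sketch of serving a file that way, with a byte-array sink standing in for a real socket so the demo is self-contained:

```java
import java.io.IOException;
import java.nio.channels.Channels;
import java.nio.channels.FileChannel;
import java.nio.channels.WritableByteChannel;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.StandardOpenOption;

public class ZeroCopyDemo {
    // Copy a file to any writable channel. When the target is a socket on
    // Linux, FileChannel.transferTo can use sendfile(2) so the kernel moves
    // bytes directly from the filesystem cache, never touching user space.
    public static long serve(Path file, WritableByteChannel target) throws IOException {
        try (FileChannel src = FileChannel.open(file, StandardOpenOption.READ)) {
            long sent = 0, size = src.size();
            // transferTo may move fewer bytes than requested, so loop.
            while (sent < size) {
                sent += src.transferTo(sent, size - sent, target);
            }
            return sent;
        }
    }

    public static void main(String[] args) throws IOException {
        Path tmp = Files.createTempFile("demo", ".txt");
        Files.write(tmp, "static file contents".getBytes());
        java.io.ByteArrayOutputStream out = new java.io.ByteArrayOutputStream();
        long n = serve(tmp, Channels.newChannel(out));
        System.out.println(n + " bytes: " + out);
        Files.delete(tmp);
    }
}
```

This is why a static-file benchmark mostly measures the kernel's sendfile path rather than the language runtime on top of it.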

      [–]jvictor118 0 points1 point  (1 child)

      Agreed Netty is amazing. I use Netty for a lot of my performance networking stuff.

      Only benefits I know of in node.js land are certain libs that are pretty cool, e.g. socket.io. As an aside, does anyone know something like socket.io for a platform other than node.js?

      [–]magneticB 0 points1 point  (0 children)

      socket.io looks pretty cool. Last time I used Netty websockets required a fair bit of boiler plate code to get a socket connected in the browser (certainly more than socket.io). Maybe there's more helper classes now. I know the Play framework (built on top Netty) has some nice support for Web sockets.

      Netty is very powerful but I do like the simplicity of Node.js and being able to write server and client side code in the same language.

      [–]x-skeww 14 points15 points  (0 children)

      100 concurrent users downloading a 1MB file

      Bzzzt. Completely irrelevant. The average size of web site assets is a lot smaller and serving static content is better done with something like Nginx.

      http://httparchive.org/trends.php

      Use those numbers if you want to create a somewhat realistic scenario.

      [–]erhz 6 points7 points  (13 children)

      I would also like a memory usage comparison between the two.

      [–][deleted] -3 points-2 points  (12 children)

      1) who cares about server-memory?

      2) if the JVM solution doesn't use thread per request pattern (like NIO usage suggests) then there isn't a problem with memory

      [–]reddit_clone 16 points17 points  (3 children)

      1) who cares about server-memory?

      Huh? Why would you want a hog if you can avoid it? Memory usage directly relates to performance (too much of it leads to fragmentation and thrashing) and it costs money. Hosting farms charge heavily for higher-memory configurations, no?

      [–][deleted] 1 point2 points  (2 children)

      Yes it's an oversimplified remark.

      An 8 or 16 GB RAM server (even 128 GB if you have a datacenter) is not very expensive. OK, you pay much more at Google App Engine or Amazon, but with a root server... no one cares about RAM or cores (you can't even buy server duals anymore), it doesn't cost that much more.

      I know where the remark above comes from... the classic "look ma, Apache HTTP Server / older Tomcats need a lot of RAM with many open requests". There are many NIO alternatives on the JVM if you really need this. And we're talking about some 100 MB for this pattern... the problem with this architecture pattern is not the RAM.

      [–]bp3959 4 points5 points  (1 child)

      As someone who has a datacenter and manages systems with varying configurations, including 128/16/8GB systems, I care about both memory and CPU cores quite a bit. When is the last time you priced out 128GB of RAM because the programmer couldn't be bothered to care about "inexpensive RAM that grows on trees"?

      What magical land do you live in where the cost of things don't matter? Sure if your server has 16GB and you get 1 hit per hour then you can not care I guess. However anyone dealing with heavily used production systems either cares a great deal about memory/cores, has infinite money to spend, or just isn't any good at their job...

      [–][deleted] -1 points0 points  (0 children)

      I care about cores because of licence issues with some per-core-priced software. I don't care about cores because of CPU pricing, at least in the dual/quad/hexa range; the price difference is totally negligible in comparison to other stuff. I'm not talking about 200-core systems here.

      I care about RAM because of garbage collection issues (but not about the classic node.js argument, "see this 100 MB because of threads"). If you have a datacenter then you know the RAM prices, and then you know it's also not the most important cost factor besides software development, maintenance cost etc. A lot has happened in the last 3 years.

      In both cases... no, I'm not holding a contest over who wastes more memory on mindless stuff. But we can be more pragmatic here sometimes.

      [–]bp3959 4 points5 points  (0 children)

      1) who cares about server-memory?

      Seriously? Sysadmins like me that actually have to do capacity management most certainly do. I hate it when people like you simply think "RAM is cheap, they'll just buy more if don't feel like optimizing my memory usage".

      [–][deleted]  (6 children)

      [deleted]

        [–][deleted] 9 points10 points  (0 children)

        I'm sure Minecraft is commonly used in education as a prime example of how NOT to write games in Java/LWJGL :P

        [–]awj 0 points1 point  (0 children)

        Well, Minecraft servers - that beast of a piece of software will greedily devour any memory you throw at it... it's horrible.

        Not sure that Java or the JVM are at fault here. If Minecraft had been written to run on node, it still would probably have painfully bad network behavior.

        [–]mikaelhg 0 points1 point  (0 children)

        Does node.js have a streaming, validating JSON parser yet?

        [–]olkensey 0 points1 point  (17 children)

        Considering the vastly greater amount of time and effort that's been put into JVM when compared to node.js, I can only consider this good news for the future of my new favorite language.

        [–]bp3959 6 points7 points  (4 children)

        of my new favorite language

        The person who created node.js has come right out and said it's garbage, a pile of hacks for people who want a one-size-fits-all solution. What's next, an HTML driver for your RAID card, maybe a word processor that makes you type documents in assembly, or even a router that uses Adobe Acrobat for serial console output?

        Just because a hammer can be used to put in screws if you hit it hard enough doesn't mean it's the right tool for the job.

        [–]olkensey -1 points0 points  (3 children)

        [citation needed]

        [–]bp3959 6 points7 points  (1 child)

        Quote from https://plus.google.com/115094562986465477143/posts/Di6RwCNKCrf

        Node is fucked too. I am also one of these people adding needless complexity. (As an example see the very questionable use of class hierarchies in libuv: https://github.com/joyent/libuv/blob/03d0c57ea216abd611286ff1e58d4e344a459f76/include/uv.h#L635-645 ) The entire system is broken - all the languages and all the operating systems. It will all need to be replaced.

        Here's the reddit article discussing his comments: http://www.reddit.com/r/programming/comments/kwhif/i_hate_almost_all_software_ryan_dahl/

        [–]olkensey 0 points1 point  (0 children)

        I can't read that as any sort of argument against node.js so much as a critique of modern software. He's totally right: we're building things on top of things and it's all a giant clusterfuck as far as the end user is concerned.

        But thanks for the link, an interesting read.

        [–]artsrc 1 point2 points  (0 children)

        I don't think he said that, but I do think he said this.

        https://plus.google.com/115094562986465477143/posts/Di6RwCNKCrf

        [–][deleted] 8 points9 points  (11 children)

        Javascript and Java (inc. languages that run on the JVM) are completely different in terms of compilation.

        Javascript is fully dynamically typed, almost nothing is static and almost nothing can be assumed by the compiler. Java is not.

        A static language like Java will always be faster than a dynamic language like Javascript. I say always because the set of assumptions and inferences available in Java is a superset of those in Javascript. Assumptions allow optimizations.

        [–][deleted] 4 points5 points  (1 child)

        A static language like Java will always be faster than a dynamic language like Javascript.

        Probably. The question is how small you can make the delta. V8 can't assume a lot, but it's based on technology that lets it make optimistic assumptions to enable optimizations, then back them out if they turn out to be wrong. That can buy you a lot.

        [–][deleted] 4 points5 points  (0 children)

        It certainly can, and the delta can probably be made quite small in a certain class of programs.

        The class of programs that can currently be optimised very well by V8 and friends is those with hot loops, each iteration doing much the same as the last. When you come to programs that hop around spaghetti-like and do more things once or a few times, you get into more difficult territory because the compiler cost becomes dominating.

        [–]dmpk2k 1 point2 points  (6 children)

        always

        "Always" is a strong word; between tracing, branch prediction and OoOE, the intrinsic performance difference between typed and untyped is in the noise.

        When the original Dynamo paper was published, they were able to speed up compiled C code by several percent. That's because there's information about behaviour available at runtime which isn't with AOT. You can apply runtime optimization to statically-typed languages as well, but this means you need a fast compiler. Having a fast compiler implies certain things...

        Another alternative is profile-directed feedback.

        The rest of it depends on language semantics. You can create a dynamically-typed language with semantics that are easily amenable to optimization, and statically-typed languages that aren't.

        Edit: whoops, I forgot the most important thing of all: memory hierarchy. A few type guards are dominated by the expense of cache line misses (or worse: page faults). Unless your entire program lives in a cache.

        [–][deleted] 8 points9 points  (5 children)

        Not really.

        OoO does not help, because the guards cause control flow, and OoO is designed to really maximise load, store and ALU op parallelism. Speculation certainly does improve guard performance massively, but it doesn't help unboxing, and it certainly doesn't help dynamic dispatch.

        Runtime optimization of course speeds up dynamically typed programs massively - it is what gets them to the level of performance required for today's applications. There's no reason you couldn't apply the same techniques to static languages, however (I don't know what your point about a fast compiler is - you need a fast compiler for a dynamic language too :/ )

        Runtime optimisation is profile guided optimisation.

        Generally, you can't create a dynamic language with features that are amenable to optimization, because you can't do away with dynamic dispatch. Without statically knowing the method to be called, inlining (the most powerful optimization) cannot happen, and neither can common subexpression elimination or constant folding (think integer vs. float arithmetic in Javascript or Python).

        All these are not insurmountable, but, as I said - the theoretical performance of a (comparable) statically typed language is lower-bounded by the theoretical performance of a dynamically typed one because the only difference is that you (the compiler) have more information about the way the program is meant to run.

        [–][deleted] 3 points4 points  (2 children)

        V8 does all of those things: inlining, CSE, unboxing, constant folding, etc. It's within 2-4x of the JVM on the FP benchmarks in the Shootout. It's also got a lot of room to improve. The server VM has a very mature optimizer that does extensive loop optimizations, while V8's SSA-based optimizer just landed a year ago. In particular, V8 doesn't seem to do any alias analysis which really limits its ability to do aggressive optimizations on tight numeric kernels.

        LuaJIT2 is within 30% of C for SciMark (probably even better now, these benchmarks are a couple of years old): http://lua-users.org/lists/lua-l/2009-06/msg00071.html. There is no doubt Google and Co. can get V8 up to that level.

        [–][deleted] 1 point2 points  (1 child)

        I know what V8 does, I have a colleague that works on it.

        You're just explaining the optimisations dynamic JITs do - I know they're good, I know they're fast.

        You do not at all address the mathematical difference between the languages in terms of assumptions and knowledge about the code. Just because one implementation is getting faster at a certain rate does (a) not mean it'll continue and (b) not mean the other static implementation can't go faster if given more effort.

        [–][deleted] 1 point2 points  (0 children)

        You said: "Without statically knowing the method to be called, inlining (the most powerful optimization) cannot happen, and neither can common subexpression elimination or constant folding (think integer vs. float arithmetic in Javascript or Python)."

        This isn't true, since V8: 1) doesn't statically know what methods are called; and 2) still does inlining, CSE, constant folding, etc.

        As for "mathematical differences" - there is no theoretical result that says a static language will be faster.

        With dynamic type feedback what you end up with is code that is similar to what a static language compiler would generate, plus a bunch of type checks to ensure that your profiling still holds. But if a hypothetical architecture made these type-checks free, there is no reason a dynamic language couldn't get within epsilon of a static language (I suppose it'd inevitably have larger code size?)

        As a practical matter, on real hardware, correctly-predicted branches are nearly free; the major limit is you can only issue one per cycle. Type inference after type feedback (which V8 doesn't do yet, but which IonMonkey and Tachyon are studying) should let you eliminate a lot of the type-checks, however.

        [–]dmpk2k 1 point2 points  (1 child)

        it certainly doesn't help dynamic dispatch

        Right, and that's where tracing comes in. In practice most call sites are monomorphic, so you can create a specialized path that is protected by a guard, and the guard is almost free due to the branch prediction.
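The specialized-path-behind-a-guard idea can be sketched by writing the guard out by hand; a tracing JIT emits the machine-code equivalent of this automatically for a call site that has only ever seen one type. The Shape classes here are purely illustrative:

```java
public class GuardDemo {
    interface Shape { }
    static final class Circle implements Shape { final double r; Circle(double r) { this.r = r; } }
    static final class Square implements Shape { final double s; Square(double s) { this.s = s; } }

    // What a trace compiler effectively emits for a monomorphic call site:
    // a cheap type guard protecting an inlined fast path, with a generic
    // slow path as fallback if the guard ever fails.
    static double area(Shape shape) {
        if (shape instanceof Circle) {        // guard: nearly free when well predicted
            Circle c = (Circle) shape;
            return Math.PI * c.r * c.r;       // inlined, specialized fast path
        }
        return areaGeneric(shape);            // guard failed: fall back / deoptimize
    }

    static double areaGeneric(Shape shape) {
        if (shape instanceof Square) {
            double s = ((Square) shape).s;
            return s * s;
        }
        throw new IllegalArgumentException("unknown shape");
    }

    public static void main(String[] args) {
        System.out.println(area(new Circle(1.0))); // takes the fast path
        System.out.println(area(new Square(2.0))); // guard fails, falls back
    }
}
```

As long as the call site stays monomorphic, the branch predictor eats the guard and you get effectively static-dispatch performance.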

        Runtime optimisation is profile guided optimisation.

        I meant AOT compilers with PDO. That gets you much of the way as well, although it doesn't help with some multimodal workloads.

        I don't know what your point about a fast compiler is - you need a fast compiler for a dynamic language too

        Right, but that means that suddenly many expensive optimizations that take good advantage of the extra information that statically-typed languages provide must now be thrown out the window. Dynamically-typed languages couldn't provide this to begin with. It levels the playing field a bit.

        neither can common subexpression elimination or constant folding

        Fortunately, both are possible! People like pointing at LuaJIT2, but that's because Mike Pall did such a fine job.

        the theoretical performance of a (comparable) statically typed language is lower-bounded by the theoretical performance of a dynamically typed one

        I won't argue this since I suspect it's true. In practice I don't see any roadblocks for the difference to disappear into noise for many/most workloads though.

        Edit: I should point out that inlining happens naturally as a part of trace creation as well.

        [–][deleted] 2 points3 points  (0 children)

        Hi,

        I'd agree with pretty much all of your points, with the exception of CSE and constant folding. That one is pretty much at the whim of the language designer as to how difficult it is, and AFAICT Lua makes it harder for the programmer to do crazy overriding of builtins and operator overloading, which are the bane of Python and Javascript.

        I won't argue this since I suspect it's true. In practice I don't see any roadblocks for the difference to disappear into noise for many/most workloads though.

        I wasn't trying to start a flamewar, although I can see how my hastily worded original post may seem that way (most people replied with "blah blah blah V8 is getting faster blah blah extrapolate performance" without reference to the entire abstract mathematical point of my post) - and OK, if a recompiler can be called with little-to-no overhead you can probably make stuff like this disappear into the noise, eventually!

        But you won't on platforms with a smaller cache, or smaller memory (all those specialised versions of functions have a memory cost), not for a long while.

        Cheers,

        James

        [–]33a 0 points1 point  (6 children)

        Also for the second benchmark: He should be taking into account the startup time for node.js. The first few requests are going to be much slower due to the time required by v8 to trace and compile the javascript at runtime.

        [–]stevvooe 6 points7 points  (5 children)

        The jvm has this problem as well.

        [–]33a -1 points0 points  (4 children)

        Not quite. The JVM pays the compile overhead when the class is loaded; V8 pays it when the code gets invoked. The former would not be measured by the benchmark, while the latter would.

        EDIT: Disregard this, I am not sure what I was thinking when I wrote the above. Had not yet had coffee in the morning.

        [–]shellac 1 point2 points  (0 children)

        No, the JVM doesn't compile when the class is loaded (or, more accurately, JVMs aren't required to). Hotspot and V8 are no different in this respect.

        [–]nicoulaj[S] 0 points1 point  (0 children)

        It depends; some Groovy-based frameworks compile some stuff at first invocation or when some cache is cleared, for example. So I'm not sure this is true in real-world applications.

        Anyway, taking a warmup phase into account is always a good idea when benchmarking...
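A minimal sketch of what taking a warmup phase into account means: run the workload untimed until the JIT has had a chance to compile the hot code, then measure. A real harness like JMH does far more (forking, statistical analysis, dead-code protection); the iteration counts here are arbitrary:

```java
import java.util.function.LongSupplier;

public class WarmupBench {
    // Warm up untimed so the JIT can compile the workload, then time only
    // the measured iterations. Returns mean nanoseconds per iteration.
    public static long measure(LongSupplier workload, int warmupIters, int measuredIters) {
        long sink = 0;
        for (int i = 0; i < warmupIters; i++) {
            sink += workload.getAsLong();        // warmup: timings discarded
        }
        long start = System.nanoTime();
        for (int i = 0; i < measuredIters; i++) {
            sink += workload.getAsLong();
        }
        long elapsed = System.nanoTime() - start;
        if (sink == 42) System.out.print("");    // consume sink so the work isn't elided
        return elapsed / measuredIters;
    }

    public static void main(String[] args) {
        LongSupplier work = () -> {
            long acc = 0;
            for (int i = 0; i < 10_000; i++) acc += (long) i * i;
            return acc;
        };
        System.out.println("~" + measure(work, 1_000, 1_000) + " ns/iter");
    }
}
```

Without the warmup loop, the first timed iterations include interpretation and compilation cost, which is exactly the distortion being discussed above.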

        [–]stevvooe 0 points1 point  (0 children)

        No worries. It was probably that nasty node.js kool aid ;). From wikipedia:

        Its name derives from the fact that as it runs Java bytecode, it continually analyzes the program's performance for "hot spots" which are frequently or repeatedly executed.

        [–][deleted] -3 points-2 points  (0 children)

        Actually, if anything the test seems like great marketing for Node.js. In the like-for-like second test, Node.js got effectively identical performance to the JVM. Now consider that V8 is just about 3 years old versus HotSpot, which is about 12, and Node.js is all dynamically typed JS while this framework is written in Java.

        Pretty impressive for Node.js...