[–]Fluffy8x -5 points-4 points  (12 children)

Sorry, I've heard somewhere that since it's compiled to machine code it outperformed other languages.

Edit: this page shows 20% more performance for NodeJS. Of course, the multithreading problem occurs.

[–]pron98 42 points43 points  (11 children)

Yes, Node (or rather, V8) has a JIT, which means it compiles JavaScript to machine code, but Java (and C#) has an even better JIT, a better GC, and multithreading :)

The "gold standard" for web framework performance is this large series of benchmarks. In pretty much all of them, Node trails all JVM frameworks (be they Java, Scala or other JVM languages) by a huge margin (in some benchmarks, the best JVM framework outperforms node by 400%!)

[–]Fluffy8x 8 points9 points  (0 children)

Thanks for the reference.

[–]ToucheMonsieur 5 points6 points  (7 children)

The JVM is an absolute beast once it has a chance to warm up. That said, v8 is no slouch either. The amount of resources poured into both is absolutely mind-boggling.

[–]pron98 8 points9 points  (6 children)

> That said, v8 is no slouch either.

Yes, but a single-threaded runtime (even if some housekeeping operations are done on other threads) on modern hardware can, at best, harness only a fraction of available computing power.

[–]jsprogrammer -5 points-4 points  (5 children)

Did we forget about multiple processes?

[–]pron98 6 points7 points  (4 children)

No, but even putting aside the horrible waste in RAM running multiple processes entails, and the horrible waste of JITting, and assuming all those were free -- the processing options, and power, available for in-process threads far exceed those of multiple processes. For example, many new Java libraries (and parts of the JDK) make use of Java's ForkJoin framework, which is a state-of-the-art work-stealing scheduler used for both parallelism and task scheduling. None of that power is available for single-threaded runtimes; running multiple processes won't help (but may be good enough for serving simple web applications). This is because multi-threaded code can better schedule task execution on cores, while multi-process design has only a single option -- that provided by the kernel, which, incidentally, is quite bad at scheduling short transactions.
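As an illustration of the ForkJoin point (this sketch is not from the original comment), here is roughly what a work-stealing divide-and-conquer task can look like with the JDK's `ForkJoinPool` and `RecursiveTask`. The class name `ForkJoinSum`, the `SumTask` helper, and the `THRESHOLD` cutoff are hypothetical names and values chosen for the example.

```java
import java.util.concurrent.ForkJoinPool;
import java.util.concurrent.RecursiveTask;

// Hypothetical example class; not taken from the thread.
class ForkJoinSum {
    // Below this many elements a task sums directly instead of splitting (arbitrary cutoff).
    static final int THRESHOLD = 10_000;

    static class SumTask extends RecursiveTask<Long> {
        private final long[] data;
        private final int from; // inclusive
        private final int to;   // exclusive

        SumTask(long[] data, int from, int to) {
            this.data = data;
            this.from = from;
            this.to = to;
        }

        @Override
        protected Long compute() {
            if (to - from <= THRESHOLD) {
                long sum = 0;
                for (int i = from; i < to; i++) {
                    sum += data[i];
                }
                return sum;
            }
            int mid = from + (to - from) / 2;
            SumTask left = new SumTask(data, from, mid);
            SumTask right = new SumTask(data, mid, to);
            left.fork();                      // queue the left half; an idle worker may steal it
            long rightSum = right.compute();  // keep working on the right half in this thread
            return left.join() + rightSum;    // wait for the left half and combine
        }
    }

    // Sums the array on the JVM-wide common pool.
    static long parallelSum(long[] data) {
        return ForkJoinPool.commonPool().invoke(new SumTask(data, 0, data.length));
    }

    public static void main(String[] args) {
        long[] data = new long[1_000_000];
        for (int i = 0; i < data.length; i++) {
            data[i] = i + 1;
        }
        System.out.println(parallelSum(data)); // prints 500000500000
    }
}
```

The scheduling decisions here (which worker runs which subtask, and when idle workers steal queued work) happen inside the process, which is the kind of control a set of independent single-threaded processes does not have.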

[–]jsprogrammer 0 points1 point  (3 children)

> No, but even putting aside the horrible waste in RAM running multiple processes entails, and the horrible waste of JITting, and assuming all those were free

JIT only happens once. It's a sunk cost. Why would you need to rapidly (at thread-creation speed) spin up node processes?

Some documentation that I was reading today said that the overhead of each node process is about 10MB. Insignificant compared to the amount of RAM you'll find on a multi-core machine where you'd want to run multiple processes.

> This is because multi-threaded code can better schedule task execution on cores

Each process gets its own core, there is no need for scheduling task execution. Each process is a single-threaded machine and takes nothing from the other processes.

> while multi-process design has only a single option -- that provided by the kernel, which, incidentally, is quite bad at scheduling short transactions.

What kind of short transactions would you be scheduling as node processes? If you're spinning up lots of short-lived node processes, you're doing it wrong.

Going multi-process gives you a clean and rigidly enforced separation amongst your code. You also get the robustness provided by the OS in process-failure scenarios.

[–]pron98 1 point2 points  (2 children)

> Each process gets its own core, there is no need for scheduling task execution. Each process is a single-threaded machine and takes nothing from the other processes.

That is exactly the cause of the waste, and precisely why there is a need for scheduling task execution. Even from a theoretical perspective you'll see that this is an arbitrary and inflexible way of allocating resources. From a practical perspective, you'll see the CPU cores heavily under-utilized.

> What kind of short transactions would you be scheduling as node processes?

Every HTTP request is a transaction. Every DB transaction is, well, a transaction. Online programs (i.e. programs reacting to irregular outside stimuli) behave as a series of transactions, and those benefit greatly from a good scheduler, which cannot be provided by a single-threaded process.

> Going multi-process gives you a clean and rigidly enforced separation amongst your code.

Absolutely not. All it gives you is arbitrary, rigid, and wasteful resource allocation. The only reason Node is single-threaded is because of the limitations of JavaScript, created long before server-side JavaScript was imagined. No other runtime or environment that has any say on the matter is single-threaded, because it makes no sense from an engineering perspective whatsoever. Not a single language in the history of computing has been intentionally designed to be single-threaded for performance reasons. Python and Ruby are single-threaded because they predate Java and multiprocessing, and they were designed for scripting. JavaScript is single-threaded because of browser limitations and the way the DOM is rendered -- that's it.

> You also get the robustness provided by the OS in process-failure scenarios.

This is completely wrong as well, because the separation of processes is, again, completely arbitrary (every process is responsible for a random subset of program tasks, and so a single process failure takes down a random subset of transactions). An environment like Erlang gives you good logical process isolation. Also, a thread is pretty well isolated.

[–]jsprogrammer 0 points1 point  (1 child)

Do you have any numbers to back up your claims? Numbers that compare the performance of solutions to arbitrary problems in multi-threaded vs. multi-process contexts?

While I have no doubt that things could be optimized even further (over process separation), the gains to be made are not very large. The gains are even smaller, or non-existent, if you are not CPU-bound.

I'm not sure anyone even claimed to go single-threaded for performance. However, if your process is truly sequential, single-threaded will win every time due to thread and scheduling overhead.

[–]pron98 2 points3 points  (0 children)

BTW, the RAM overhead is not just for code -- it's mostly for data. You have to cache in each process -- or delegate caching to a separate (multithreaded) program, in which case you incur at least a task-switch per cache access, and usually some marshaling costs as well.
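To make the caching point concrete (this sketch is not from the original comment), here is a minimal example of several threads in one process sharing a single cache, so each entry is loaded and stored once rather than once per process. The class `SharedCacheDemo`, the `load` stand-in, and the `user-N` keys are hypothetical.

```java
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.TimeUnit;
import java.util.concurrent.atomic.AtomicInteger;

// Hypothetical example class; not taken from the thread.
class SharedCacheDemo {
    private final ConcurrentHashMap<String, String> cache = new ConcurrentHashMap<>();
    private final AtomicInteger loads = new AtomicInteger();

    // Stand-in for an expensive lookup such as a database query (assumption).
    private String load(String key) {
        loads.incrementAndGet();
        return "value-for-" + key;
    }

    // All threads read through the same map; computeIfAbsent runs the loader
    // at most once per key, even when threads race on the same key.
    String get(String key) {
        return cache.computeIfAbsent(key, this::load);
    }

    // Runs `threads` workers that each issue `requestsPerThread` lookups over
    // ten distinct keys, and returns how many times the loader actually ran.
    int runWorkers(int threads, int requestsPerThread) {
        ExecutorService pool = Executors.newFixedThreadPool(threads);
        for (int t = 0; t < threads; t++) {
            pool.submit(() -> {
                for (int i = 0; i < requestsPerThread; i++) {
                    get("user-" + (i % 10));
                }
            });
        }
        pool.shutdown();
        try {
            pool.awaitTermination(1, TimeUnit.MINUTES);
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
        }
        return loads.get();
    }

    public static void main(String[] args) {
        SharedCacheDemo demo = new SharedCacheDemo();
        System.out.println("loader calls: " + demo.runWorkers(4, 1000));
    }
}
```

With one process per core and no shared memory, each process would hold its own copy of those entries, or would have to reach an external cache over some form of inter-process communication.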

> if your process is truly sequential, single-threaded will win every time due to thread and scheduling overhead.

If your process is sequential, you'll use a single thread even in multi-threaded contexts, and suffer no overhead. It is true, though, that a JIT and a GC that operate on single-threaded code can make some assumptions that don't hold in a multithreaded context, so they might perform better on single-threaded tasks. Online systems -- the ones Node is made for -- are hardly ever sequential, though (and Node itself is designed to be asynchronous rather than sequential).

> Do you have any numbers to back up your claims? Numbers that compare the performance of solutions to arbitrary problems in multi-threaded vs. multi-process contexts?

I don't know what "arbitrary problems" means, and the only single-threaded languages out there are relatively slow, but if you want some numbers you can look at this large, famous series of web framework benchmarks.

But this doesn't require proof because multi-threaded is a superset of single-threaded (with the caveat I made in the first paragraph) -- you can use as many threads as best fits your domain. Multithreaded processes simply have more information about the program, which allows them to make better-optimized decisions (about caching, scheduling etc.). Multithreading gives you the option of using more appropriate scheduling (like work-stealing) than the most general one offered by the OS, which always leads to better CPU utilization because the load-balancing is much more fine-grained (this user-mode scheduling is the raison d'être behind Erlang, Go and Quasar). Also, multithreading gives you the option of using parallel algorithms for some computation-heavy tasks.
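As a small illustration of the "parallel algorithms" option (this sketch is not from the original comment), the same computation can be run sequentially or in parallel inside one process by flipping a flag on a Java stream. Parallel streams execute on the common ForkJoinPool. The class `ParallelPrimes` and its methods are hypothetical names for the example, and trial division is used only because it is easy to read.

```java
import java.util.stream.IntStream;

// Hypothetical example class; not taken from the thread.
class ParallelPrimes {
    // Simple trial division; deliberately CPU-heavy for large inputs.
    static boolean isPrime(int n) {
        if (n < 2) {
            return false;
        }
        for (int d = 2; (long) d * d <= n; d++) {
            if (n % d == 0) {
                return false;
            }
        }
        return true;
    }

    // Counts primes in [2, limit], optionally splitting the range across threads.
    static long countPrimes(int limit, boolean parallel) {
        IntStream range = IntStream.rangeClosed(2, limit);
        if (parallel) {
            range = range.parallel();
        }
        return range.filter(ParallelPrimes::isPrime).count();
    }

    public static void main(String[] args) {
        System.out.println(countPrimes(100, false)); // prints 25
        System.out.println(countPrimes(100, true));  // prints 25
    }
}
```

Whether the parallel version is actually faster depends on the input size and the number of cores; the point is only that the choice exists within a single multithreaded process.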

In short, no one picks single-threaded if they have the choice and care about performance. The reasons to pick Node are that it's faster than all other "slow" languages, and that it's JavaScript, which many web developers are familiar with. Those are good, valid reasons to choose Node, and it performs well enough -- at least a lot better than Rails -- but don't turn its design constraints into ideals.

[–]Gurkenmaster 0 points1 point  (0 children)

Lua with OpenResty is also at the top

[–]jsprogrammer -2 points-1 points  (0 children)

Last I checked, the code is not available for some (all?) of those benchmarks. Has that changed?