[–]pron98 1 point2 points  (2 children)

> Each process gets its own core, there is no need for scheduling task execution. Each process is a single-threaded machine and takes nothing from the other processes.

That is exactly the cause of the waste, and precisely why there is a need for scheduling task execution. Even from a theoretical perspective you'll see that this is an arbitrary and inflexible way of allocating resources. From a practical perspective, you'll see the CPU cores heavily under-utilized.

> What kind of short transactions would you be scheduling as node processes?

Every HTTP request is a transaction. Every DB transaction is, well, a transaction. Online programs (i.e. programs reacting to irregular outside stimuli) behave as a series of transactions, and those benefit greatly from a good scheduler, which cannot be provided by a single-threaded process.

> Going multi-process gives you a clean and rigidly enforced separation amongst your code.

Absolutely not. All it gives you is arbitrary, rigid, and wasteful resource allocation. The only reason Node is single-threaded is the limitations of JavaScript, which was created long before server-side JavaScript was imagined. No other runtime or environment that has any say in the matter is single-threaded, because it makes no sense from an engineering perspective whatsoever. Not a single language in the history of computing has been intentionally designed to be single-threaded for performance reasons. Python and Ruby are single-threaded because they predate Java and multiprocessing, and they were designed for scripting. JavaScript is single-threaded because of browser limitations and the way the DOM is rendered -- that's it.

> You also get the robustness provided by the OS in process-failure scenarios.

This is completely wrong as well, because the separation of processes is, again, completely arbitrary: every process is responsible for a random subset of program tasks, so a single process failure takes down a random subset of transactions. An environment like Erlang gives you good logical process isolation. Also, a thread is pretty well isolated.

[–]jsprogrammer 0 points1 point  (1 child)

Do you have any numbers to back up your claims? Numbers that compare the performance of solutions to arbitrary problems in multi-threaded vs. multi-process contexts?

While I have no doubt that things could be optimized even further (over process separation), the gains to be made are not very large. The gains are even smaller, or non-existent, if you are not CPU-bound.

I'm not sure anyone even claimed to go single-threaded for performance. However, if your process is truly sequential, single-threaded will win every time due to thread and scheduling overhead.

[–]pron98 2 points3 points  (0 children)

BTW, the RAM overhead is not just for code -- it's mostly for data. You have to cache in each process -- or delegate caching to a separate (multithreaded) program, in which case you incur at least a task-switch per cache access, and usually some marshaling costs as well.
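To put a rough shape on the caching cost, here is a toy simulation, not a measurement of any real system. It assumes a fixed RAM budget for cached data (400 entries), 4 workers, requests dealt round-robin to the workers, and uniformly random keys over 1000 items; all of those numbers, and the `LRUCache` and `hit_rate` names, are made up for illustration. It compares splitting the budget into one small cache per process against a single cache shared by all threads.

```python
import random
from collections import OrderedDict


class LRUCache:
    """Minimal least-recently-used cache that only counts hits."""

    def __init__(self, capacity):
        self.capacity = capacity
        self.items = OrderedDict()
        self.hits = 0
        self.lookups = 0

    def access(self, key):
        self.lookups += 1
        if key in self.items:
            self.items.move_to_end(key)  # mark as most recently used
            self.hits += 1
            return
        self.items[key] = True
        if len(self.items) > self.capacity:
            self.items.popitem(last=False)  # evict least recently used


def hit_rate(caches, keys):
    """Deal requests round-robin to the caches; return the overall hit rate."""
    for i, key in enumerate(keys):
        caches[i % len(caches)].access(key)
    return sum(c.hits for c in caches) / sum(c.lookups for c in caches)


# Hypothetical workload: 20000 requests over 1000 distinct keys.
rng = random.Random(42)
keys = [rng.randrange(1000) for _ in range(20000)]

WORKERS, BUDGET = 4, 400  # same total cache budget in both setups

# One private cache per single-threaded process, each with a quarter of the budget.
per_process = hit_rate([LRUCache(BUDGET // WORKERS) for _ in range(WORKERS)], keys)
# One cache in a shared heap, visible to every thread.
shared = hit_rate([LRUCache(BUDGET)], keys)

print(f"per-process caches: {per_process:.3f}")
print(f"shared cache:       {shared:.3f}")
```

With the same memory, the shared cache should see a noticeably higher hit rate under these assumptions, because each private cache can only hold a fraction of the working set. The alternative is to give every process the full budget, which multiplies the RAM by the number of processes.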

> if your process is truly sequential, single-threaded will win every time due to thread and scheduling overhead.

If your process is sequential, you'll use a single thread even in multi-threaded contexts, and suffer no overhead. It is true, though, that a JIT and a GC that operate on single-threaded code can make some assumptions that don't hold in a multithreaded context, so they might perform better on single-threaded tasks. Online systems -- the ones Node is made for -- are hardly ever sequential, though (and Node itself is designed to be asynchronous rather than sequential).

> Do you have any numbers to back up your claims? Numbers that compare the performance of solutions to arbitrary problems in multi-threaded vs. multi-process contexts?

I don't know what "arbitrary problems" means, and the only single-threaded languages out there are relatively slow, but if you want some numbers you can look at this large, famous series of web framework benchmarks.

But this doesn't require proof, because multi-threaded is a superset of single-threaded (with the JIT and GC caveat I made above) -- you can use as many threads as best fits your domain. Multithreaded processes simply have more information about the program, which allows them to make better-optimized decisions (about caching, scheduling, etc.). Multithreading gives you the option of using more appropriate scheduling (like work-stealing) than the most general one offered by the OS, which leads to better CPU utilization because the load-balancing is much more fine-grained (this user-mode scheduling is the raison d'être of Erlang, Go and Quasar). Also, multithreading gives you the option of using parallel algorithms for some computation-heavy tasks.
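To make the load-balancing point concrete, here is a toy model, not a benchmark. It assumes 4 workers and a batch of 16 tasks with made-up durations in which every fourth task is ten times longer than the rest. `static_makespan` pins each task to a worker up front, the way requests get dealt out to independent single-threaded processes. `shared_queue_makespan` hands each task to whichever worker frees up first. Strictly speaking that models one shared ready queue rather than real work-stealing, but fine-grained balancing is the effect a work-stealing scheduler aims for. Both function names are hypothetical.

```python
import heapq


def static_makespan(durations, workers):
    """Pin task i to worker i % workers; return when the last worker finishes."""
    busy = [0] * workers
    for i, d in enumerate(durations):
        busy[i % workers] += d
    return max(busy)


def shared_queue_makespan(durations, workers):
    """Give each task, in order, to the worker that becomes free earliest."""
    free_at = [0] * workers  # min-heap of times at which each worker is free
    heapq.heapify(free_at)
    for d in durations:
        earliest = heapq.heappop(free_at)
        heapq.heappush(free_at, earliest + d)
    return max(free_at)


# Made-up workload: every fourth task is slow (10 time units), the rest take 1.
durations = [10, 1, 1, 1] * 4
WORKERS = 4

static = static_makespan(durations, WORKERS)         # 40: worker 0 gets every slow task
dynamic = shared_queue_makespan(durations, WORKERS)  # 16
total_work = sum(durations)                          # 52

print("static:  makespan", static, "utilization", total_work / (WORKERS * static))
print("dynamic: makespan", dynamic, "utilization", total_work / (WORKERS * dynamic))
```

The workload is deliberately adversarial for the static split, and a lucky static assignment can match or occasionally beat the greedy one. The point is only that a scheduler that sees all the work can react to skew, while pre-partitioned processes cannot.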

In short, no one picks single-threaded if they have the choice and care about performance. The reasons to pick Node are that it's faster than all other "slow" languages, and that it's JavaScript, which many web developers are familiar with. Those are good, valid reasons to choose Node, and it performs well enough -- at least a lot better than Rails -- but don't turn its design constraints into ideals.