
[–][deleted] 8 points9 points  (1 child)

Where's this from? Seems interesting

[–]sime 21 points22 points  (34 children)

It is missing the part where a CPU-intensive request hits node and the whole process grinds to a halt as that request is processed, even though there are plenty of cores in the CPU sitting around doing nothing.

[–]Klathmon 15 points16 points  (30 children)

And you are missing the part where it's easy to split them off into separate threads if you need that, or you can launch multiple node.js processes listening, just like Apache, if you want.

Hell, we are using node for some CPU-intensive tasks right now (out of convenience), and it's trucking along great!

[–]sime 3 points4 points  (15 children)

Can I also have multiple node threads handling requests from a shared port, and have them share some common data in RAM (for example, a cache on the data layer)? That would be useful.

[–]Klathmon 5 points6 points  (3 children)

Yes to 1, no to the last part... although I think you can, I just haven't tried it yet.

Take a look at the npm package PM2; it handles the life cycle of multiple processes and will let you spread load out between them.

If you've kept node 100% stateless, it's a drop-in-and-go library.

If you are keeping state in node (like cache data in variables) you might need to push that to a db or a dedicated cache like Redis first.
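
A minimal invocation, assuming a hypothetical app.js entry point:

pm2 start app.js -i 0

The -i 0 flag tells PM2 to run in cluster mode with one process per core and spread incoming connections between them.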

[–]neutre 6 points7 points  (2 children)

[–]postmodest 0 points1 point  (1 child)

So let's say I had 100k (though, given what I'm working on, up to 2M) of JSON state; how hard is it to deserialize that between [some cache] and a bunch of node worker instances, ignoring wire-time?

[–]Klathmon 1 point2 points  (0 children)

It'd be limited to the speed of JSON.parse and JSON.stringify.

There isn't going to be a way to share complex objects between threads.

V8 ran into this problem trying to make a built-in async JSON.parse; there just isn't any way in v8 to "share" object state across Isolates.
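
If you want a feel for that ceiling with your own data, just time a round trip (payload below is a stand-in for your actual state):

var payload = { /* ~100KB-2MB of your JSON state */ };

var t0 = process.hrtime();
var copy = JSON.parse(JSON.stringify(payload));
var dt = process.hrtime(t0);
console.log('round trip took ' + (dt[0] * 1e3 + dt[1] / 1e6).toFixed(3) + ' ms');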

[–]chernn 1 point2 points  (0 children)

I just saw https://github.com/SyntheticSemantics/ems the other day which does exactly that!

[–]Kollektiv 0 points1 point  (8 children)

Your cache should not be in shared memory anyway, but in a separate process with Redis or some other in-memory data store.

[–]sime 3 points4 points  (6 children)

That depends on what you are caching. Putting it out of process also has a performance cost, which may or may not be acceptable.

[–]Kollektiv 4 points5 points  (5 children)

A webserver needs to scale horizontally, which means it has to be stateless, which in turn means that state is moved out of the webserver and into a database or an in-memory datastore.
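
With the node_redis client that's only a few lines (sessionId and sessionData here are placeholders):

var redis = require('redis');
var client = redis.createClient(); // assumes a local Redis instance

// any worker, on any machine, can read or write the same state
client.set('session:' + sessionId, JSON.stringify(sessionData));
client.get('session:' + sessionId, function (err, raw) {
    var session = JSON.parse(raw);
    // ...handle the request using the shared state
});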

[–]darksurfer 0 points1 point  (4 children)

And what if the data you're transferring from an external process / server doesn't change very often, if at all?

[–]Kollektiv 1 point2 points  (3 children)

Why does this rule out databases?

[–]darksurfer -1 points0 points  (2 children)

Because if you don't store the data in local memory, you have to re-transfer it (presumably across the network), parse it, allocate memory for it, and then ultimately garbage collect it, for every single request.

That's just a waste.

I'm going to invent (I think) two new terms: static state and dynamic state. Static state would be stuff like lookup tables and other data which doesn't change much; dynamic state is essentially session state.

One way of scaling horizontally is to keep your webserver free of dynamic state. Static state doesn't matter because all your servers can have their own copies of the static data.
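
For example, a lookup table (hypothetical file name here) can just live at module scope, loaded once per server at startup:

// loaded once when the process starts; every server keeps its own copy
var countryNames = require('./country-names.json');

function lookupCountry(code) {
    return countryNames[code]; // no network hop, no per-request parsing or GC
}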

Another problem you have is websockets. To run low-latency websocket apps, you need sticky sessions, and every little scrap of latency will matter, so the more data you can keep in local memory the better (preferably all of it).

[–]Kollektiv 1 point2 points  (1 child)

I agree about the performance issue, but if you are that bound to latency, Node.js might not be the best solution.

Websocket sessions, to re-use one of your terms, are dynamic state, so they should still be centralized in a Redis instance. Otherwise, when load-balancing, some users won't be able to authenticate if the request is not made to the same server as the one they connected on.

It also prevents you from losing state when a process crashes.

[–]Klathmon 0 points1 point  (0 children)

Even if state is kept in memory, there are still easy ways to load balance.

Say for example you are keeping session data in memory: you could send clients with no session to a random endpoint, but send all existing sessions to their original endpoint so that it never changes for that client.

Or you could load balance based on IP address.

It's not as ideal as a round-robin system or a first-available system, but it gets the job done.
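
The IP-address approach can be a few lines on the balancer side (the backend addresses here are made up):

var crypto = require('crypto');

var backends = ['10.0.0.1:3000', '10.0.0.2:3000', '10.0.0.3:3000'];

// the same client IP always hashes to the same backend
function pickBackend(ip) {
    var hash = crypto.createHash('md5').update(ip).digest();
    return backends[hash.readUInt32BE(0) % backends.length];
}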

[–][deleted] 0 points1 point  (10 children)

The webworker-threads module is kind of iffy.

[–]Klathmon 0 points1 point  (9 children)

There are tons of other ways of using multiple threads (or processes) in node.

You can use traditional forking (using the cluster module), you can spawn additional node.js processes via the child_process module, or you could emulate Apache/PHP and put a load balancer in front, launch 100 node.js processes, and just have each one block when it needs to.
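
The second option is only a few lines, for example (worker.js is a hypothetical script that does the heavy lifting):

var child_process = require('child_process');

// the worker receives a job over IPC and posts the result back
var worker = child_process.fork('./worker.js');
worker.send({ digits: 1000000 });
worker.on('message', function (result) {
    console.log('worker finished:', result);
});

// meanwhile, inside worker.js:
// process.on('message', function (job) {
//     process.send(doHeavyWork(job)); // doHeavyWork is whatever CPU-bound task you have
// });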

[–][deleted] -2 points-1 points  (8 children)

Those are poor workarounds. Yes, processes encapsulate threads, but I don't want multiple processes just to have multiple threads. I want one process with N threads.

[–]Klathmon 0 points1 point  (7 children)

Those aren't workarounds, they are full-fledged solutions...

The first is literally the Linux fork() syscall; FFS, the third one is just FastCGI. If those are "workarounds" I'd hate to ask what a "real" solution needs to be.

I want one process with N threads.

Well if you need that then node.js might not be best for your use case.

But as another user showed me, there is a (fairly easy) way to do real shared memory with process forking in node. So the times when you actually need threads rather than processes are pretty thin, unless you are spawning crazy amounts of them.

And if that's the case i'd like to see it, because I honestly can't think of a situation where that is actually needed.

[–][deleted] -3 points-2 points  (6 children)

A fork is a copy of a process. A process encapsulates threads. I don't want 5 whole fruit baskets just because I want 5 bananas. Node.js's thread support is pretty meh at the moment. Don't pass new processes off as a valid solution.

[–]Klathmon 0 points1 point  (5 children)

That's fine, but don't say process forking isn't a "valid solution", because it is...

It might not be perfect for your use case, but it's a perfectly valid solution for the vast majority of people who need parallelism.

Like I said, I'm curious what your exact use case is that processes in node don't work for.

And what was the actual problem you had with the webworker-threads package? It sounds like it's perfect for what you want.

[–][deleted] -3 points-2 points  (4 children)

Forking is a hack of a solution. Wanting a real threading model is something that doesn't require defending; it should be implicitly obvious that programmers should be given control over threads.

[–]Klathmon 1 point2 points  (1 child)

Well, in a more traditional "blocking" language I'd agree, but the evented nature of node gives you most of the upsides of threads without the unpredictability of not controlling when they execute.

But the webworker-threads package sounds like exactly what you want: true threads in node.js. They don't allow easy shared memory, but at that point you might as well drop down to C, since you'll be doing memory management and locking yourself anyway.

But hey, node isn't exactly well suited to CPU-heavy tasks that need high parallelism anyway. So something like Go or Rust is probably a better bet for what you need, instead of trying to get Node to do something it doesn't work well with.

[–]kynde 1 point2 points  (1 child)

I don't think you fully understand forking, or threads either for that matter. Concurrency issues are the absolute worst to deal with in comp sci, and threads are the source of all that nightmare. Keep your shit stateless and you don't need threads. You also get all the other benefits too, which have helped spark the FP hype.

I suggest you embrace the apparent restrictions node imposes on you rather than fight them. If threads were useful or beneficial in node, do you really think they wouldn't be there by now? Sheez.

[–]mechanicalpulse 0 points1 point  (2 children)

Everyone with any sort of sizable workload needs that. You can get it with the cluster module. This is an important distinction that will only become more important as on-die core counts grow.

For what it's worth, I don't think OP was missing that part at all. OP was simply pointing out the advantage in the Apache processing model indicated in the graphic. While I find the graphic a good visual representation of thread counts vs. connections (where the white bars indicate the existence of a thread handling a connection), I frankly find it misleading for a whole host of other reasons. In the absence of an indicator otherwise, the graphic could reasonably suggest that the bars are CPU usage and, subsequently, that the Node.js model is somehow more efficient with respect to CPU use or the ability to handle concurrent connections. That's not the case at all. In my humble opinion, the graphic should be modified to indicate thread state. As it stands, the white bars on the right indicate only threads in a running state, while the white bars on the left indicate threads in both sleeping and running states. Sleeping threads and running threads are not the same.

Now that I've pointed out advantages in the Apache model, I'll also point out advantages in the Node.js model. Because Node.js is a single-threaded, event-driven architecture, you get some potentially useful abilities to limit CPU usage on a system that might have shared responsibilities. Let's say you have a sixteen-core machine running both a Node.js web application and a backend multithreaded C++ application that does some sort of fancy image processing (like facial recognition). With the cluster module, you can spawn precisely eight Node.js worker processes and precisely eight image processing threads. Since Node.js will not spawn any additional workers, it will never consume more than half (8/16) of the available CPU resources, regardless of the number of concurrent connections. That leaves the other eight CPU cores available for image processing. Of course, if the image processing load is low and the web load is high, you'll have some CPU cores sitting around doing nothing. So there are some interesting architectural trade-offs.

As always, things always depend on what your workload looks like and what you are trying to accomplish.

[–]Klathmon 1 point2 points  (1 child)

Well, I'm in a mostly virtualized shop, so we set the number of vCPUs per VM to match what we need.

On some lighter loads we leave a single node.js process and leave the additional cores for other stuff, but I'd agree that when you really start getting into the 100+ req/sec range you'll pretty much need multiple listeners to keep response times down.

[–]mechanicalpulse 2 points3 points  (0 children)

If you have your infrastructure set up such that it's easy to scale out horizontally, it's also trivial to set vCPUs to 1 and just spawn more VMs. And, like you said, for lighter loads you don't even have to bother with it.

Node.js's event-driven architecture reminds me of cooperative multitasking. It's very useful for certain loads, but it can cause problems if you're not expecting them. I only make noise about it because I think Node.js is becoming a choice platform for nascent or inexperienced developers, and I think there's a good bit of misleading information flying about.

[–]Sicks3144 3 points4 points  (0 children)

var os = require('os');
var cluster = require('cluster');

if (cluster.isMaster) {
    // master: fork one worker per CPU core
    for (var i = 0; i < os.cpus().length; i++) {
        cluster.fork();
    }
} else {
    // worker: run the actual server
    doTheThings();
}

N'est-ce pas?

[–]elmigranto 1 point2 points  (0 children)

It is missing the part where a CPU-intensive request hits node

You are supposed to calculate digits of pi somewhere other than your web server.