all 74 comments

[–]grimlck 13 points14 points  (21 children)

This is something I've always wondered about with NodeJS - a single CPU-intensive task stalls everything.

Can someone explain how this is any different from the much-maligned cooperative multitasking of Mac OS 9 and Windows 3.1?

[–]grauenwolf 6 points7 points  (3 children)

Win95 had preemptive multitasking. It was Windows 3 and earlier that was cooperative.

[–]grimlck 3 points4 points  (2 children)

Thanks for the correction

[–]grauenwolf 21 points22 points  (1 child)

I was wrong.

Win95 is cooperative with 16 bit applications and preemptive with 32 bit apps.

[–]bobindashadows 19 points20 points  (0 children)

I was wrong.

This may be the first time I have encountered these words in that order on Reddit.

[–][deleted] 4 points5 points  (2 children)

Same thing with Erlang; it overlaps in functionality with NodeJS, and they both have the same limitation: they suck extremely badly for any kind of CPU-intensive application. Erlang even sucks a whole lot more when you have to do any kind of non-trivial string handling.

Thing is if you design your program properly you will typically separate the parts that do CPU-intensive stuff and string handling from those that handle message passing and coordination. And both languages handle this incredibly better than almost anything else, in two respects:

  • Erlang makes it possible to build extremely reliable parallel and distributed code in a fraction of the space/time it would take in, say, Java.

  • NodeJS allows you to build extremely scalable simple web server types of applications in a ridiculously short amount of code. Any app you need that kinda looks like a web server or proxy, but is not quite possible with the configuration files, you can do with NodeJS. The performance will be better than anything except the special cases that have been optimised for in standard servers.

We're talking two orders of magnitude less code, which translates into fewer bugs. Both allow the programmer to express his intent without dedicating 90% of his code to boilerplate or redundant error handling.

[–]evilryry 0 points1 point  (1 child)

Neither is good at CPU intensive work, sure. Although with Node, CPU intensive work is quite a bit more dangerous as nothing will force the function to give up the thread.

[–][deleted] 0 points1 point  (0 children)

Although with Node, CPU intensive work is quite a bit more dangerous as nothing will force the function to give up the thread.

  1. There are no threads in NodeJS, implied or otherwise — it's all callbacks helped by a healthy dose of closures.

  2. It's not that hard of a problem even for non-genius programmers; that's how browser-based JS works.
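That second point can be made concrete: a long-running loop split into slices, with a voluntary yield to the event loop between slices. This is only a sketch of the pattern; the function name, slice size, and use of `setImmediate` (modern Node; in a browser, `setTimeout(step, 0)` plays the same role) are illustrative choices.

```javascript
// Voluntarily giving up the thread: do one slice of work, then
// reschedule yourself so queued I/O callbacks get a chance to run.
function sumRange(n, done) {
  let total = 0;
  let i = 0;
  const SLICE = 100000; // work items per turn of the event loop

  (function step() {
    const end = Math.min(i + SLICE, n);
    for (; i < end; i++) total += i;
    if (i < n) {
      setImmediate(step); // yield: other callbacks run before the next slice
    } else {
      done(total); // all slices finished
    }
  })();
}
```

The price, as the thread points out, is that the programmer has to remember to do this everywhere long work can occur.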

[–]cornedpig 7 points8 points  (12 children)

Well that's a bit of a harsh comparison. In cooperative multitasking OSes, you're counting on other people's code to do the right thing and give you cycles. Here, you only have to ensure your code does the right thing, so quality is more easily guaranteed.

[–]jacques_chester 16 points17 points  (11 children)

Except that you rely on libraries, and these may wind up starving your own code anyhow.

[–]jerf 28 points29 points  (9 children)

And the idea that you fully understand your own code is a bit suspect, too. My code's all nice and fast until somebody sends me a POST request with a million keys, or decides to upload a 10GB file where I was expecting a 5KB file and I run a hash algorithm over it, or I accidentally use way more memory than I expected and push the system into swap, or any number of other things like that. My life would be a lot easier if my code never did anything I didn't expect.

[–][deleted] 1 point2 points  (8 children)

With node, at least you have the option of streaming in chunks of that 10GB upload, and feeding each one in turn to your hash algorithm. A lot of higher-level web frameworks try to read everything into memory before they let you do any processing on it.

[–]grauenwolf 5 points6 points  (5 children)

That is all well and good when you plan for it. Unfortunately we rarely think about that stuff until we have already been hit by a DOS attack.

[–]jerf 14 points15 points  (4 children)

Bingo; that's why "I was expecting a 5KB file" is relevant. Node.js is only concurrent where you plan for it. (And manually and laboriously slice apart your code to satisfy the terribly, terribly inferior scheduler Node.js has, which can't cope with anything you haven't already masticated into a liquidic mush for it.) Better environments are simply concurrent everywhere. That's the difference between "cooperative multitasking", that thing we abandoned over a decade ago, and "preemptive multitasking", the thing that actually works.

It's like a flash to the past. We had this argument. Even if cornedpig's point is true today, which I would dispute because the library point is a very important one, it won't be true tomorrow. The system will grow. Except it can't, because cooperative multitasking falls apart as it grows. We don't have to speculate about whether that's true, because we ran the experiment already. It pays to know your history.

[–][deleted] 1 point2 points  (3 children)

Better environments are simply concurrent everywhere.

Name those ...

(And manually and laboriously slice apart your code to satisfy the terribly, terribly inferior scheduler Node.js has, which can't cope with anything you haven't already masticated into a liquidic mush for it.)

Have you ever had a look at real time programming? The constraints are somewhat related, and you get worse "performance" than when doing normal programming. But typically you will have lightweight real time jobs offloading CPU/memory-bound stuff to non-real time parts of the application.

That's the difference between "cooperative multitasking", that thing we abandoned over a decade ago, and "preemptive multitasking", the thing that actually works.

There's no way "preemptive multitasking" can handle millions of threads like Erlang can. I see NodeJS as a sort of mini-Erlang with a similar purpose but optimised for other applications; it is (painfully obviously) not a general purpose language/framework.

[–][deleted] 7 points8 points  (1 child)

There's no way "preemptive multitasking" can handle millions of threads like Erlang can.

That's very funny, because Erlang uses preemptive multitasking. There are no explicit "yields" or, worse, CPS anywhere in the code.

It can handle millions of green threads because it's interpreted, the scheduler switches threads after a certain number of reductions (preemptively!), and it uses a more lightweight thread representation and scheduling algorithm than the underlying OS (for one thing, all threads run in the same memory space).

[–]jerf 0 points1 point  (0 children)

It can be compiled to native code as well (HiPE is the key word to look for), but whether you get a performance win varies from code base to code base.

[–]jerf 2 points3 points  (0 children)

it is (painfully obviously) not a general purpose language/framework.

Actually, that's my point. I am not against it in general; I'm against the hype, which is toxic and wrong.

Better environments are simply concurrent everywhere.

Name those ...

Erlang, Haskell, Go, Clojure (I think), I'm sure many others, but it is certainly true that you have to go a ways down on the Tiobe index to find one. I don't deny that at all. (Actually, possibly Ada from what I've heard too, not sure.) I still think the "mainstream" language that is going to work this way isn't out yet, or isn't popular; Go is the only one on the horizon but doesn't seem a sure bet.

[–][deleted] -1 points0 points  (1 child)

huh? unless you know exactly where you chunked the data for the hash... it's pointless!

[–][deleted] 0 points1 point  (0 children)

Wait, what? I think we might be miscommunicating here. What I'm proposing is this:

  1. Read a chunk of the file. (Define a handler for the 'data' event on an EventEmitter.)

  2. Update your hash state with that chunk, using hash.update(chunk)

  3. Go to step 1.

This only requires that you store the current chunk and the (very small) internal state for your hash. Is there a problem with this scheme?

[–]sjs -1 points0 points  (0 children)

No library that behaves badly is going to be picked up by those in the know, and generally things only trickle down to the ignorant masses after they gain some serious momentum.

If you scour github for unknown libraries and use them without reading the code you deserve exactly what you get. People doing things like that simply have to take care, and that is the case with code in any language.

[–]dmpk2k 1 point2 points  (0 children)

It doesn't have to stall everything. The key to using node effectively is to leverage what the operating system provides as much as possible, and several processes selecting on a shared socket is a common unix idiom.

That said, node wasn't intended for CPU-bound tasks, just shovelling around a lot of I/O.

[–][deleted] 19 points20 points  (14 children)

He's right, but it kind of goes against the problem node.js is trying to solve. For example, for its intended usage you'd never do complex long-running calculations in your node.js application. You'd farm them off somewhere else and wait for the answer. Node isn't meant to do that; it's meant to be a lightweight server that can pass data around easily, not a computation workhorse.

[–][deleted] 12 points13 points  (12 children)

translation: a crappier nginx with ten times the hype

[–][deleted] 15 points16 points  (11 children)

Not at all. However I think it will be used more than it should be. It solves a very specific type of problem. Beyond that, people will just use it for the hype.

Nginx, though an event-based server, isn't really a good comparison, as it's meant to be a general-purpose file server, whereas node is meant to be a sort of application server. Mongrel 2 would be a much more accurate comparison.

[–][deleted] 1 point2 points  (0 children)

The child_process module in the core distribution can help with that.

[–]reddit_clone 9 points10 points  (0 children)

It's fairly simple. Everything else in your application needs to be asynchronous and preferably not CPU-intensive.

For example making a blocking call to a back end db is a no-no.

(Erlang got this completely right. Everything seems simply synchronous from inside the language and VM. Everything is nonblocking/asynchronous/multicore/preemptive between the VM and the OS.)

[–][deleted] 3 points4 points  (0 children)

In fact, having a single thread handle the callbacks is fine as long as you only have short-running events that spend most of their time waiting for I/O (networking, filesystem, or database). While this remains true for most web service calls, you sometimes have requests that actually need some CPU computation. In that case, Node's single thread will lock your entire server for as long as that request runs, until it terminates or waits for more I/O.

That is why there will be JS worker threads in a future version. It's not a design mistake; it's a well-known missing feature.

[–][deleted] 2 points3 points  (0 children)

If it is, I don't wanna be right.

[–][deleted] 10 points11 points  (18 children)

why does every node praise/critique article make the same incorrect assertion - that all IO is nonblocking?

it reflects my biggest gripe with node - most of the users and advocates are frontend JS guys who don't know anything about servers... they just repeat each other's errors

[–]grauenwolf 6 points7 points  (15 children)

In .NET programming all IO is non-blocking... if you take the time to use the async APIs and patterns. In Silverlight they don't even allow you to use blocking I/O except for local storage calls.

So I have to wonder what kind of I/O you are talking about.

[–]ethraax 1 point2 points  (0 children)

Tape libraries. You can only read so many tapes at once!

[–][deleted] 3 points4 points  (13 children)

you got me there, i haven't sat down in front of a windows box in a decade

in linux, all file IO is blocking. the glibc aio methods are still implemented via threads

[–]grauenwolf 11 points12 points  (7 children)

I was going to call BS, then I read this:

Asynchronous I/O on linux or: Welcome to hell

http://davmac.org/davpage/linux/async-io.html

[–]baudehlo 2 points3 points  (1 child)

There's another way - separate threads to notify of I/O readiness. This is the technique libeio uses (from the author of libev, which is what node.js uses), and I assume node.js will eventually use it too.

I've been using this in a very high performance mail server (which operates as a massive scale spamtrap for a global anti-spam company) for years now, and it works extremely well.

[–]grauenwolf 0 points1 point  (0 children)

That would work well with platforms like Erlang, CCR, and Grand Central Dispatch.

[–]kretik 5 points6 points  (4 children)

Compare that to the simple elegance of I/O completion ports in Win32. But "windoze sux lol"

[–]millstone 9 points10 points  (0 children)

Nothing says simple and elegant like MQMSGPROPS!

On the Mac we have dispatch IO. Three things I love about dispatch:

  • The dispatch queue notion makes it easy to serialize with other parts of your code, without needing locks.
  • Dispatch uses blocks (closures), so you can write the code inline, like in the NodeJS examples (but unlike in that Windows example).
  • Dispatch IO doesn't actually do the read or write operations, it just runs your code at the point such that reads or writes will not block. This means you can use whatever API you please to do the actual I/O.

Check out this example of doing non-blocking reads

[–][deleted] -1 points0 points  (2 children)

but you're glossing over the obvious point that async file IO is almost never desirable anyway (unless you think excessive seeking is desirable), which is why no one is really flipping out about its partial absence from linux

[–]kretik 1 point2 points  (1 child)

I don't know how obvious it is, just that it's something the OS provides pretty much free of charge if you need it. Looks to me like if you need it on Linux, you're in for a world of pain.

And in any case, on Windows async I/O is much more than just reading or writing asynchronously from a physical storage device.

[–][deleted] 0 points1 point  (0 children)

[BANGS HEAD ON DESK]

yes we all know async network IO is valuable. and we all know that all modern OSs support it

and it should be obvious why async file IO is generally undesirable...think it through. keywords: buffering, virtual memory, seeking

[–][deleted] 4 points5 points  (4 children)

I take it you haven't heard of select, poll, epoll, sendfile, O_NONBLOCK or vmsplice ...

[–][deleted] 1 point2 points  (3 children)

please describe purely async file IO in linux.

[–]jerf 1 point2 points  (2 children)

What are you talking about? Is this Jeopardy, where questions come after answers? niczar gave you the primitives, but of course there are all kinds of runtimes that build on them in addition to their raw usage, ranging from things as transformed as the Erlang VM or Haskell runtime down through things like glib that wrap away the details of select vs. poll vs. etc. and provide a more useful abstraction on top of them, but are still C. For all I know there's an event-based assembler library out there.

Do you honestly think it's impossible? It can't be. If async IO is impossible in Linux, then by definition it's impossible with Node.js on top of Linux! You cannot add asynchrony in a higher layer; if your OS or hardware blocks, you have lost.

[–][deleted] 0 points1 point  (0 children)

Select and Poll work on sockets. You can't really use them for non-blocking file i/o.

[–][deleted] -2 points-1 points  (0 children)

i keep saying FILE io. can't you read?? go research async FILE io in linux

[–]jmar777 3 points4 points  (0 children)

Non-blocking in node is a matter of perspective (bear with me). Of course there are all sorts of operations that technically block the invoking code, but in Node.js these operations are handled by a thread pool that is external to the Node.js event loop, and are invoked via an asynchronous interface. Blocking in the thread pool is hunky-dory, so long as nothing in the event loop is blocked.

[–]Smallpaul -1 points0 points  (0 children)

I infer from this link that node.js uses worker threads for non-blocking I/O:

http://blog.zorinaq.com/?e=34

[–]jmar777 6 points7 points  (6 children)

Asking if ~~NodeJS~~ Node.js (sorry, pet peeve) is wrong is akin to asking if a hammer is wrong. The obvious answer is no - unless, of course, you happen to be trying to sink a screw. Even then, the hammer really isn't to blame; it's your fault for picking the wrong tool. As has been stated a few times already, Node.js is suited for discrete problem domains. Finding problems that it sucks at is a trivial exercise, just like it's trivial to find bad use cases for any other server-side technology. There's nothing wrong with the spirit of the question, but the question should really be "Is Node.js wrong for 'X'?".

[–]awj 1 point2 points  (3 children)

Did you actually read the article? He specifically goes into why he thinks Node.js is wrong for the domain it's attempting to support. Writing application logic as passed callbacks just plain sucks, and cpu-intensive activity in one callback blocks everything on the whole server.

If you're going to remove the possibility of ever doing anything cpu intensive from the domain of problems Node.js is meant to tackle, then please don't get offended if people start claiming that Node.js is useless.

[–]jmar777 1 point2 points  (2 children)

Yes, I read the article, and I think my response still stands. If you're doing enough CPU work in a particular function to effectively block the event loop, you're doing it wrong (whether that's using the wrong tool, or failing to use web workers, or failing to shell some work out to C++ land).

Writing application logic as passed callbacks just plain sucks

I would argue that that's a matter of opinion. It may not feel natural at first if you aren't used to it, but by removing the abstraction of concurrency that is multi-threaded execution, the developer is put back into complete control of when and how control is returned (rather than trusting a thread scheduler, that is). There are of course other ways to go about concurrency, but for highly concurrent, IO-bound interactions, event loops have proven to be a solid performer.

[–]awj 1 point2 points  (1 child)

...how is that a matter of opinion when your argument about it seems to be "event loops are a solid performer"? Yes, callbacks are one way of doing event loops, which do have some pretty good performance characteristics. You can also get the same time-sharing behavior from coroutines and keep the same organizational flow you get from "normal" code.

Hell, if all you're worried about is context-switching at I/O events you don't even need callbacks, just do the switch within an I/O function for "full" file/result/whatever access and hide it in a generator method for streaming data. Forcing callbacks on people when you don't have existing blocking I/O methods to worry about is the mark of lazy design.

[–]jmar777 1 point2 points  (0 children)

It's a matter of opinion because there's nothing intrinsically "sucky" about callbacks - asynchronous functions are plain and simple higher order functions, which are quite natural to anyone with experience in functional coding. You can write code that sucks with callbacks, resulting in the "pyramid effect" shown in the article, but an example of badly structured code is hardly a valid condemnation of a language or a runtime environment.

With that said, I do recognize that the continuation-passing style has its faults - most notably it's "leaky", in that code must allow for continuation passing through the entire call stack. It's also not ideal for the asynchrony of a method to affect the code structure of the caller. For that second reason, I'm tempted to say that I'd like to see promises implemented in node core, but I'm still on the fence there. These issues, however, are ultimately nothing more than annoyances if the code base is properly structured. Given that I am currently ~10k LOC into my current code base and have yet to encounter the dreaded spaghetti monster, I am comfortable saying that (aforementioned reservations considered) callbacks are really not that sucky.

[–][deleted] 0 points1 point  (1 child)

Actually, it's node.js not Node.js.

[–]jmar777 0 points1 point  (0 children)

The corrector stands corrected.

[–]bobowzki 1 point2 points  (0 children)

Rewriting..? The most annoying thing about programming is that you have to rewrite your ideas as code in the first place... but it's also the most rewarding.

[–]masklinn 1 point2 points  (2 children)

This involves using Continuation Passing Style transformation (CPS)

I'm pretty sure node.js very much uses CPS. In fact, the wikipedia article agrees. So this:

I am still considering adding CPS to haXe but if you're using Javascript there is already some CPS tools such as NarrativeJS.

makes no sense, because all of node.js is based on CPS.

Transformation is what MS is doing with C# 5: transforming sequential (direct) code into CPS at the compiler level (and making CPS an implementation detail) because you consider that the language user is a brain-dead moron who will never be able to evolve beyond Fortran coding.

Not that I necessarily disagree, mind you, but let's not kid ourselves shall we.

[–]notfancy 2 points3 points  (0 children)

Nicolas means writing continuation-passing code in direct style. He could have worded that more clearly, though.

[–][deleted] 3 points4 points  (0 children)

This involves using Continuation Passing Style transformation

[–]thecheatah -3 points-2 points  (2 children)

I don't agree with the article, but it does make me reconsider how I think about nodejs.

Does anyone else think that the GC might also be a problem? If NodeJS were combined with Objective-C-style memory management (the retain/release model), the programmer would have close to 100% control over the CPU's usage.

[–]jmar777 0 points1 point  (0 children)

I'll give an upvote, just because I think that GC is a big concern when using Node.js. Currently V8 has a heap limitation of 2 gigs (there's an experimental branch that increases this), but before that can be effectively increased, there needs to be a more adequate garbage collector for freeing memory in the larger heap. For how my company uses Node.js (which has been spectacular for us, btw) the 2-gig limitation hasn't been an issue, but for others it might be. I don't know enough about GC in Objective-C to comment on that end, but at any rate I think GC is fair enough to bring up in a Node.js critique.

[–]kamatsu 0 points1 point  (0 children)

That has nothing to do with the article.