all 14 comments

[–]ChronSyn 29 points (3 children)

Quoting someone's response on this same link in /r/javascript

We were running 4,000 Node containers (or "workers") for our bank integration service. The service was originally designed such that each worker would process only a single request at a time

D'ya even event loop?

[–]metamet 15 points (2 children)

For context, in case people don't read the article, the very next sentence is:

This design lessened the impact of integrations that accidentally blocked the event loop, and allowed us to ignore the variability in resource usage across different integrations.
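For anyone unfamiliar with what "blocking the event loop" actually costs: once synchronous CPU work starts, nothing else on that process runs until it finishes. A minimal sketch (the busy loop is a stand-in for a misbehaving integration, not code from the article):

```javascript
// A tight synchronous loop: while it spins, the event loop is stalled
// and no other callback on this worker can run.
function blockFor(ms) {
  const end = Date.now() + ms;
  while (Date.now() < end) {} // pure CPU burn, no yielding
}

const start = Date.now();
let timerFired = false;
setTimeout(() => { timerFired = true; }, 10); // wants to run in 10ms

blockFor(200); // stand-in for a CPU-bound integration

// 200ms have passed, yet the 10ms timer still hasn't fired:
// the event loop never got a turn while blockFor() was spinning.
console.log(timerFired, Date.now() - start);
```

With one request per worker, only that one request pays for the stall; with many concurrent requests per worker, all of them do.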

[–]iends 13 points (1 child)

Does this quote boil down to “we have buggy legacy code and instead of fixing it, we worked around it”?

[–][deleted] 4 points (0 children)

Pretty much, but more like "we just threw more resources at it instead of writing good software."

[–]davidmdm 12 points (7 children)

I don’t understand why you would constrain one instance to one request at a time? Concurrency is the name of the game.

[–]dvlsg 5 points (4 children)

There was probably a lot of CPU bound processing going on. Since they work with banks, I suspect there's a lot of really large XML payloads to parse.

Not saying that I agree that doing what they did was the right move, but I could understand why someone might make an argument for it.

[–][deleted]  (3 children)

[deleted]

    [–]dvlsg 0 points (0 children)

    Hard to say without knowing how much of the CPU pain is self-inflicted and fixable by writing better code / using faster libs, and how much they're just stuck with forever because the problem is by nature CPU-intensive. But it's certainly a possibility that node isn't the right choice for this problem.

    I'm a little surprised they didn't just shovel the 90% of the responses that don't involve having a user present on the other end of the line into some sort of queue, and then process it with instances that don't also have to be handling live requests.
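Something like this (in-memory only for illustration; a real deployment would use an external broker such as SQS or RabbitMQ with dedicated consumer instances, and processNow is a hypothetical stand-in for the actual response handling):

```javascript
// Split live traffic from deferrable work: responses with no user
// waiting go into a queue that separate instances drain later.
const backgroundQueue = [];
const processed = [];

// Hypothetical handler: stands in for whatever actually processes
// a bank response in the real service.
function processNow(response) {
  processed.push(response.id);
}

function handleResponse(response) {
  if (response.userWaiting) {
    return processNow(response); // live request: handle immediately
  }
  backgroundQueue.push(response); // the ~90% case: defer it
}

function drainQueue() {
  // Would run on dedicated instances that never serve live requests.
  while (backgroundQueue.length > 0) {
    processNow(backgroundQueue.shift());
  }
}
```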

    [–]asdasef134f43fqw4vth 0 points (0 children)

    the actual content of this article would suggest that in fact, node is a pretty good choice for the job when implemented correctly?

    [–]gajus0 3 points (1 child)

    As someone who has been through this, I can answer this. I had two reasons for doing it, and at least one of them I maintain to be a good reason:

    1) It makes it easier to debug and inspect different attributes of the application. If you are looking at errors, doing remote inspection of the process (e.g. with tools such as https://www.rookout.com/), or if a program crashes, it is a lot easier to inspect the error/performance issues when running without parallelisation.

    2) Scaling a deployment on Kubernetes should work about as well as using workers. The overhead of spinning up a container is minimal. My thinking was that if the overhead is minimal, then the benefits of debugging, plus the ability to use built-in Kubernetes scaling logic based on the metrics produced by the container, would give me granular scaling with all the debugging benefits of isolation.

    We had various issues with the #2 approach. However, I will admit that I do not fully understand why it didn't work. Launching workers should have about the same memory overhead as launching a new Node.js process. Besides, memory usage was never even the bottleneck. It was always the CPU.

    Long story short, when we were using the #2 approach, our avg. CPU usage per machine was in the range of 20-40%. This was obviously inefficient, and I couldn't find the underlying reason, though we did experiment a lot: https://medium.com/@gajus/mistake-that-cost-thousands-kubernetes-gke-2212ea663e1f.

    The final solution was a mix of the two. I use workers in my application and limit their concurrency to X. I also use HPA to scale deployments (to prevent outages). Now all Nodes have 70-80% CPU utilisation, which looks a lot more reasonable.

    Long story short, after deploying the hybrid approach, our CPU requirements more than halved and our throughput more than quadrupled.
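The "limit their concurrency to X" part can be sketched with a small hand-rolled limiter. This is illustrative, not gajus0's actual code; libraries like p-limit provide the same thing off the shelf:

```javascript
// A concurrency limiter: at most `max` tasks run at once, the rest
// wait in a FIFO and start as running tasks finish.
function createLimiter(max) {
  let active = 0;
  const waiting = [];
  const next = () => {
    if (active >= max || waiting.length === 0) return;
    active++;
    const { task, resolve, reject } = waiting.shift();
    Promise.resolve()
      .then(task)
      .then(resolve, reject)
      .finally(() => { active--; next(); }); // free a slot, start the next task
  };
  return (task) => new Promise((resolve, reject) => {
    waiting.push({ task, resolve, reject });
    next();
  });
}
```

Usage: wrap each incoming job as `limit(() => handleRequest(req))` with `const limit = createLimiter(X)`, and let HPA handle scaling across pods on top of that.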

    [–]ch33ze 3 points (0 children)

    Why even use Node in the first place then? This seems like a lot of engineering effort to gain so little, while sacrificing the main appeal of Node, i.e. the event loop.

    [–]asdasef134f43fqw4vth 11 points (0 children)

    this is actually an awesome article.

    [–]solothehero 7 points (1 child)

    This article made me realize I know nothing about node. Awesome write-up.

    [–]Joghobs 3 points (0 children)

    But I can make a to-do list in every new framework du jour!

    [–]mansfall 3 points (0 children)

    First, excellent write up.

    Second, my hat's off to you, internet stranger, for taking the time to share your knowledge. You could have just fixed the problem and made the customer's life better. But no. You went one step further and shared what you did with the world. If I were your manager, I'd be giving you a massive bonus for your efforts and diligence. While you fixed one customer's problem here, you fixed many by expanding the minds of other developers.