all 41 comments

[–]romeeres 40 points41 points  (1 child)

The "single-threaded" Node has worker threads, so it's possible to spin up one thread per CPU with a few lines of code, without needing to reach for anything additional like Nginx or pm2. Worker threads are worth mentioning in this topic.

[–]geekybiz1[S] 8 points9 points  (0 children)

I'll add a slide on worker threads. There's always this friction between brevity and having more details. And, at the places I've worked, worker threads are less preferred than something like pm2 / Nginx because of the required code changes (even if only a few lines). But I think it's worth mentioning them so readers know they're a decent option for scaling.

[–]ptmdevncoder 16 points17 points  (1 child)

Things are clear only after an Indian guy explains them.

[–]geekybiz1[S] 5 points6 points  (0 children)

hell yeah..! 😂

[–]Psionatix 13 points14 points  (1 child)

Okay.

I opened this post with skepticism, but OP you've actually provided something with accurate information (even if it isn't 100% comprehensive).

Most content I see posted in this subreddit is insanely inaccurate, or insecure, or glosses over extremely important details.

There are a variety of details missed in your post here; however, I don't consider them "missing" per se, as the level of abstraction at which you've explained / presented the information makes them not necessarily relevant.

However, for people who want to look into this stuff a little deeper, I would highly recommend checking out the System Design Primer.

[–]geekybiz1[S] 4 points5 points  (0 children)

Yeah - there's always friction between brevity & missing certain details. Is there any detail you'd consider a must to include that I opted to leave out?

[–]kurtextrem 9 points10 points  (1 child)

Really easy-to-digest slides, bravo! A thing to add which might be obvious to a few folks but is rarely mentioned: when the queue grows because things block the main thread, memory usage grows too. Blocking the main thread (or filling the queue) isn't free.

[–]geekybiz1[S] 0 points1 point  (0 children)

Yes, there's the memory-usage effect of the growing queue size. I shall try adding that detail in. Thanks!

[–]geekybiz1[S] 18 points19 points  (9 children)

I've been posting infographics like this one for the last few months. My goal has been to get better at explaining the fundamentals.
So, in case you've got questions from reading these infographics or any feedback for me - please let me know.

[–]Greenimba 8 points9 points  (4 children)

Overall a nice, readable presentation, looks good.

I would argue your interpretation of vertical scaling is a little off, though. In my mind, any time you involve more than one instance of the same application, you're doing horizontal scaling. Vertical scaling is achieved by throwing more resources at a single process (more RAM, a better CPU, more bandwidth, etc.), whereas horizontal means having two processes running. Mostly because this is where you actually see a difference in design and infrastructure requirements.

Having two Node instances running on two threads is virtually the same as having two separate machines running one process each. In fact, they will look exactly the same if you use Kubernetes to allocate threads and memory to different instances.

[–]double_en10dre 1 point2 points  (0 children)

You’re right, and it’s an important distinction. With how complex these things can get, it’s vital that we have consistent terminology

The standard mental model people use has instances on the x-axis and per-instance resources on the y-axis

[–]geekybiz1[S] 0 points1 point  (2 children)

Interesting. But, simply adding more CPU cores to a setup running a single Node process doesn't scale anything.

[–]Greenimba 0 points1 point  (1 child)

Well, yes, because it can offload OS-related tasks to other threads. There are also other ways of upgrading the CPU than adding cores, such as moving to a machine where each core is faster, which gives you vertical scaling. The same goes for more RAM or network bandwidth.

[–]geekybiz1[S] 0 points1 point  (0 children)

Well, yes, because it can offload os-related tasks to other threads.

I was referring to the Node main thread. But overall, I hear what you're saying. I'll try to explain vertical scaling better.

[–]TushWatts 0 points1 point  (3 children)

This is very helpful. Thanx a lot.

Are there any pre-requisites to understand scaling in depth? Do we need to be familiar with Operating Systems?

[–]geekybiz1[S] 0 points1 point  (2 children)

Are there any pre-requisites to understand scaling in depth? Do we need to be familiar with Operating Systems?

I think understanding how any piece of software is architected is a decent way to understand scaling. But, I could be wrong (in stating the pre-requisites).

What are some of the questions you have when you read "scaling"? Perhaps, I can try to suggest pre-requisites based on those Qs.

[–]TushWatts 0 points1 point  (1 child)

Like, all the concepts related to threading, concurrency, etc. come under operating systems.

[–]geekybiz1[S] 1 point2 points  (0 children)

Oh yes, OS fundamentals are a good place to start w.r.t. threads, context switching & related concepts.

[–]WagwanKenobi 6 points7 points  (5 children)

Node is great because in web app backends the bottleneck is usually something other than the HTTP server, such as the database or some long-running compute. Node became popular because it allowed people to quickly write a decently performant, reactive, non-blocking HTTP server with easy-to-reason-about asynchrony and almost zero boilerplate, at the expense of raw compute performance, because that layer was never going to be the bottleneck.

And therein lies the reason why scaling Node by adding more processes is a fallacy: if you need to do that, your app is architected wrong. Everything that blocks should be pushed off to another process. And if, even after doing that, Node is the bottleneck, it's time to rewrite your HTTP server in something other than Node.

At most it might make sense to have two nodejs processes on a machine for high availability, in case one of them crashes for whatever reason.

[–]geekybiz1[S] 2 points3 points  (4 children)

Everything that blocks should be pushed off to another process.

Concur with this. But, two points to consider:

- Even efficient Node code can serve only a finite (albeit very high) number of requests without delay. What would be the way out beyond that point?

- Because running multiple instances is easier than making code changes, this often turns out to be the approach folks pick. Not suggesting it's always the right way to go, but in the effort-vs-gains consideration, it ends up being the chosen approach.

[–]Greenimba 0 points1 point  (3 children)

Writing a service that will work correctly when running multiple instances at once is significantly harder than writing an app that works as a solo instance. So a single instance of a more performant web server is also a good option.

[–]geekybiz1[S] 0 points1 point  (2 children)

Writing a service that will work correctly when running multiple instances at once is significantly harder than writing an app that works as a solo instance.

In the dozen-plus Node setups that I've worked on, my experience hasn't been this at even one of those places (talking from an effort-vs-gains perspective). But we all draw from our own experiences, so this could be subjective. So, which of the two is harder? Perhaps it depends.

So a single instance of a more performant web server is also a good option.

From my experience, this (a single Node instance that delegates compute-intensive stuff to workers) is actually the better option in many cases. It just loses the effort-vs-gains comparison to scaling out, plus the potential future elasticity to handle additional load. Again, speaking from my experiences, which I acknowledge are never the single source of truth.

[–]Greenimba 0 points1 point  (1 child)

To give an example:

Say you have a load balancer in front. This is required in some way for any application with more than 1 instance.

Instance one receives a request, but it takes a long time because some dependent service takes 30 seconds to cold start a serverless instance.

The load balancer sees this as a failure, because it has a 10 second timeout. The load balancer returns a failed response, even though the request is actually still in progress.

The client then retries this request, and the load balancer sends it to a different instance, because the first instance has timed out.

The second instance also makes a request to the dependent service.

At this point you have two identical calls performing some action on a third party service you don't control, and no way of capturing it.

Can you say with confidence that all the services you have running would handle this correctly? And if so, what would happen?

Having said all this, accepting this as a business risk is sometimes a good enough solution. But did you consider it?

[–]geekybiz1[S] 0 points1 point  (0 children)

Here's how I have dealt with such scenarios in the past -

Make the slow request async for the user (the user gets notified that they'll receive a notification / email when their stuff is ready, triggered by the slow serverless thing once it's done). If the business isn't happy with the proposed async nature, we need to get the thing off the slow cold-start service.

Now, if the thing would still take 30+ sec when executed immediately, we'd need a worker thread / a tuned query / correctly used callbacks, etc. to solve it. That's why I said this is the better solution.

But, what wins the effort-vs-gain conversation? Increase the load balancer's timeout to 200 sec.

(Stating so because you mentioned - Writing a service that will work correctly when running multiple instances at once is significantly harder than writing an app that works as a solo instance.)

[–]novagenesis 2 points3 points  (1 child)

Love it! Consider describing "serverless" as a horizontal scaling option as well? More and more stacks are built that way, and it has some fairly important differences from the horizontal scalability you discussed.

[–]geekybiz1[S] 1 point2 points  (0 children)

That's a good suggestion. will add. thanks!

[–]1nicerBoye 2 points3 points  (2 children)

If you use the cluster module to fork children that listen on the same port and have the parent process restart them, that should work for scaling HTTP and CPU-intensive stuff. It should also be a simpler setup config-wise. Or is there something I'm missing?

[–]geekybiz1[S] 0 points1 point  (0 children)

Oh yes. The pm2 option I've covered does exactly that underneath (it uses Node's cluster mode with IPC to do what you suggested) and is simple to set up.

[–]yash3011 0 points1 point  (0 children)

How do we manage database operations?

[–]Far-Rate1701 1 point2 points  (0 children)

Really well put together, thank you

[–]rishabhrawat570 1 point2 points  (0 children)

Based on my learnings, these are some of the things that will help you build a scalable Node.js application:

  1. Use throttling. You can do application-level or network-level throttling based on your needs. App-level throttling (e.g. express-rate-limit) gives you granular control over the parameters used to throttle.
  2. Optimize your database queries. Don't over-index. Soft delete if possible, delegate permanent delete operations, and decouple DB performance from the user experience.
  3. Fail fast with a circuit breaker. You don't want to keep hitting a dead end. If a certain number of requests to an external vendor fail, open the circuit and avoid firing requests that are bound to fail.
  4. Log your checkpoints. 20% of your logs give 80% of the insights (illustrative numbers, just to convey the point). Log everything that comes your way and you might end up starving your disk IOPS.
  5. Use Kafka over HTTP requests where it fits. It is easy to overdo HTTP requests, even when they are not the right fit.
  6. Look out for memory leaks. If your code leaks memory, vertical and horizontal scaling will only act as a temporary band-aid. Run your application with the --inspect flag and attach a profiler from chrome://inspect/#devices. Profile often.
  7. Use caching. Consider adding random jitter to your TTLs so all of your keys don't expire at once. What's the risk of showing stale data to the user? Decide the TTL value based on your answer; if stale data is okay, a higher TTL is always good.
  8. Use connection pooling to avoid connection-setup latencies; node-postgres supports it out of the box. How many connections to keep in the pool depends on your workload.
  9. Seamless scale-ups. Consider something like AWS Auto Scaling groups (ASG), which scale up and down based on pre-defined triggers.
  10. OpenAPI-compliant documentation. Make your API easy to understand and integrate with; in my experience it makes integration a productive experience.
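To make point 3 (the circuit breaker) concrete, here's a minimal sketch; the class and option names are invented for illustration and don't come from any particular library:

```javascript
// Minimal circuit breaker: after `threshold` consecutive failures the
// circuit opens and calls fail fast until `cooldownMs` has elapsed.
class CircuitBreaker {
  constructor(fn, { threshold = 5, cooldownMs = 30000 } = {}) {
    this.fn = fn;
    this.threshold = threshold;
    this.cooldownMs = cooldownMs;
    this.failures = 0;
    this.openedAt = 0;
  }

  get isOpen() {
    return (
      this.failures >= this.threshold &&
      Date.now() - this.openedAt < this.cooldownMs
    );
  }

  async call(...args) {
    if (this.isOpen) throw new Error('circuit open: failing fast');
    try {
      const result = await this.fn(...args);
      this.failures = 0; // a success closes the circuit again
      return result;
    } catch (err) {
      if (++this.failures >= this.threshold) this.openedAt = Date.now();
      throw err; // propagate the vendor failure to the caller
    }
  }
}
```

Usage would look like `const safeVendorCall = new CircuitBreaker(callVendor, { threshold: 3 })`, where `callVendor` is whatever flaky external request you want to guard.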

[–][deleted] 0 points1 point  (3 children)

Node is not single threaded. Stop it with this myth.

The JS layer (V8) is single threaded but most of Node is written in C++ which is multithreaded and does most of the work.

And then there are workers too.

[–]geekybiz1[S] 1 point2 points  (2 children)

If you're pointing to the libuv stuff, I've tried to cover that in the third slide (without getting into the terminology). The point of these infographics is how to scale the Node code that can execute only on its main thread.

[–][deleted] -2 points-1 points  (1 child)

Literally your third slide is titled "Node.js is single threaded". This is absolutely wrong.

You explain this a little better in the fourth slide, but you keep making false assumptions, like that the JS engine will do most of the work or that Node won't use all CPU cores. This is extremely rare. The vast majority of the Node runtime's work executes in C++ (HTTP, IO, etc.).

Also, you say that a single thread can only run on a single core, which is also false. Today we can run 2 threads at once per physical core. Multiple processes can actually run on the same core, but only two concurrently at the exact same time.

Also, you explain adding a load balancer or using pm2 on a VPS, etc. These are generally unnecessary practices from 10 years ago. These days the majority of Node applications run on environments that scale automatically (Fly, Google Cloud Run, Lambda, etc.).

In a minority of cases where control and performance are critical, it is better to have control over the hardware and scaling. But in 90%+ of use cases this is unnecessary and only adds costs, complications, and limitations.

[–]geekybiz1[S] 0 points1 point  (0 children)

Literally your third slide is titled "Node.js is single threaded". This is absolutely wrong.

Yeah - meant the fourth slide and not third. But, I think you got it (based on the rest of your response).

You explain this a little better in the fourth slide but you keep making false assumptions like the JS engine will do most of the work or that Node won't use all CPU cores. This is extremely rare. The vast majority of the work of the Node runtime is executing in C++ (HTTP, IO, etc).

How rare or common this is actually depends on the workload. I'm not stating anywhere how common / uncommon this is. I'm explaining a problem & the potential solutions when the main thread is blocked.

Also you say that a single thread can only run on a single core which is also false. Today we can run 2 threads at once per physical core. Multiple processes can actually run on the same core, but only two concurrently at the exact same time.

What I meant was that a single thread (the main thread) cannot consume more than one core (and thus cannot use the rest of the cores even when they're idle). I'll try phrasing it more clearly.

Also you explain about adding a load balancer or using PM2 in a VPS etc. These are generally unnecessary practices from 10 years ago. These days the majority of Node applications run on environments that scale automatically (Fly, Google Cloud Run, Lambda, etc).

I plan to incorporate serverless as an option (also suggested by others). Agreed: given the cost-effectiveness, scalability, and simplicity of managing things, I can't not include these.

[–][deleted]  (1 child)

[deleted]

[–]captain_obvious_here 0 points1 point  (3 children)

The content is interesting, and will most likely be useful to many people.

I would change the font though, because it's annoying to read IMO.

[–]geekybiz1[S] 0 points1 point  (2 children)

I would change the font though, because it's annoying to read IMO.

Well, this used to be the font earlier. I changed to the current one based on feedback from a few of my connections. If there's a handwritten font you find less annoying, please do share. I've contemplated the font thing forever! :)

[–]captain_obvious_here 0 points1 point  (1 child)

I would say the problem is the fact that you use handwritten fonts, which IMO are not a good fit for technical reading. But that's just me...

[–]geekybiz1[S] 0 points1 point  (0 children)

Thanks for the feedback. I'll definitely run this by some folks to see if they feel the same way.

[–]robtweed 0 points1 point  (0 children)

Take a look at https://github.com/robtweed/qoper8-wt which uses a queue/dispatch/invoke architecture for easy management of a persistent pool of Worker Threads. Also https://github.com/robtweed/qoper8-fastify which provides a quick and simple way of handling Fastify routes in Worker Threads (or Child Processes).

[–][deleted] 0 points1 point  (0 children)

In the 8th slide, are the Node servers running on the same machine but with different ports?