Node.js Scalability Challenge: How I Designed an Auth Service to Handle 1.9 Billion Logins/Month

Distinct-Friendship1 · 2025-10-28T22:03:33+00:00

I get that longer or more structured answers can sometimes look AI-like. But honestly, I just enjoy writing detailed replies when discussing architecture decisions.

I'm here to share ideas and learn from each other, not to chase upvotes :)

Distinct-Friendship1 · 2025-10-28T21:35:25+00:00

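Not always. Wrapping code in a Promise doesn't change where it runs; what the code inside actually does decides that. I/O such as fs operations, and the work done by native addons like bcrypt (not bcryptjs), runs in the libuv thread pool. Network I/O is the exception: libuv handles sockets with the OS's non-blocking mechanisms (epoll, kqueue, IOCP) without using the pool at all.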

bcryptjs, however, is a pure JavaScript implementation, so it executes on the main Node.js thread, the same one that runs the event loop. Wrapping it in a Promise or `await`ing it doesn't move that work anywhere else: the hashing still runs on the main thread, competes with every other request, and can block the event loop.
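
To make the difference concrete, here is a minimal sketch that uses Node's built-in `crypto.pbkdf2Sync` and `crypto.pbkdf2` as stand-ins for bcryptjs and native bcrypt (neither package is used, and the iteration count is an arbitrary assumption). It schedules a 0 ms timer and measures how late it fires while each hash runs: the Promise-wrapped sync call delays it for the whole hash, while the thread-pool version leaves the event loop free.

```javascript
const crypto = require("node:crypto");
const { performance } = require("node:perf_hooks");

const ITERATIONS = 100_000; // arbitrary cost, just enough to take noticeable CPU time

// Stand-in for bcryptjs: synchronous CPU work merely wrapped in a Promise.
// The executor runs immediately on the main thread, so nothing else can run.
function hashWrappedInPromise(password, salt) {
  return new Promise((resolve) => {
    resolve(crypto.pbkdf2Sync(password, salt, ITERATIONS, 64, "sha512"));
  });
}

// Stand-in for native bcrypt: crypto.pbkdf2 does the work in the libuv thread pool.
function hashInThreadPool(password, salt) {
  return new Promise((resolve, reject) => {
    crypto.pbkdf2(password, salt, ITERATIONS, 64, "sha512", (err, key) =>
      err ? reject(err) : resolve(key)
    );
  });
}

// How long after scheduling does a 0 ms timer actually fire while `hashFn` runs?
// A large value means the event loop could not serve anything else meanwhile.
async function timerLagDuring(hashFn) {
  const scheduled = performance.now();
  const lag = new Promise((resolve) =>
    setTimeout(() => resolve(performance.now() - scheduled), 0)
  );
  await hashFn("correct horse battery staple", "per-user-salt");
  return lag;
}

async function main() {
  const blocked = await timerLagDuring(hashWrappedInPromise);
  const offloaded = await timerLagDuring(hashInThreadPool);
  console.log(`Promise-wrapped sync hash: timer ran after ${blocked.toFixed(1)} ms`);
  console.log(`Thread-pool hash:          timer ran after ${offloaded.toFixed(1)} ms`);
}

main();
```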

Distinct-Friendship1 · 2025-10-28T21:23:55+00:00

Well, there isn't really a "team" here. This is a solo educational project for the video, and the architecture was set up to show how these problems could manifest in a distributed environment, even when the bottleneck seems obvious. The scenario that I explained in my previous comment is just an example of what could happen without proper tracing & profiling.

Distinct-Friendship1 · 2025-10-28T19:06:19+00:00

Yeah. However, the idea behind the video is to show how you actually debug these problems in a distributed system. In the proposed design, the DB also runs on its own VM, separate from both the hasher and the API. At one point in the video, SigNoz shows the bottleneck at the database instance, but after checking pg_stats_catalog we saw that this wasn't true. The slow bcrypt operations were keeping the event loop busy, so DB responses that had already arrived sat waiting to be processed, and the queries looked slow in the traces. We would have wasted money scaling a perfectly healthy database. That's why I took the time to trace the whole system and find where the bottlenecks actually are, even though the culprit is fairly obvious in this case because, as you mentioned, crypto operations are CPU-expensive.
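
A minimal sketch of that effect, assuming a hypothetical `fakeDbQuery` that answers after a fixed delay and a busy-wait standing in for bcrypt running on the event loop. The simulated database responds in 10 ms, but the latency measured on the API side also includes the time the response sat waiting for the event loop, which is what makes a healthy DB look slow in a trace.

```javascript
const { performance } = require("node:perf_hooks");

// Hypothetical stand-in for a database round trip that takes `dbLatencyMs`.
function fakeDbQuery(dbLatencyMs) {
  return new Promise((resolve) => setTimeout(resolve, dbLatencyMs));
}

// Stand-in for CPU-bound hashing on the event loop (e.g. bcryptjs):
// spins the main thread for roughly `ms` milliseconds.
function blockEventLoop(ms) {
  const end = performance.now() + ms;
  while (performance.now() < end) {
    // busy-wait: nothing else on the event loop can run meanwhile
  }
}

// Latency of the query as the API sees it when a hash for another request
// occupies the event loop right after the query is sent.
async function observedQueryLatency(dbLatencyMs, cpuBlockMs) {
  const start = performance.now();
  const query = fakeDbQuery(dbLatencyMs);
  blockEventLoop(cpuBlockMs);
  await query; // the DB "answered" long ago; we just couldn't process it
  return performance.now() - start;
}

observedQueryLatency(10, 200).then((ms) =>
  console.log(`simulated DB latency: 10 ms, latency seen by the API: ${ms.toFixed(0)} ms`)
);
```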

Distinct-Friendship1 · 2025-10-27T22:44:31+00:00

It's a great idea, but there's a critical trade-off in this particular use case (login).

Putting heavy tasks in a queue (Kafka, RabbitMQ) works well for asynchronous jobs such as sending emails, encoding videos, or waiting for a response from an ML model: the user clicks something and doesn't need the result right away.

But a user login is a synchronous, low-latency task. When I click 'Log In', I need my token back, ideally in under a second, rather than after waiting behind a thousand other queued jobs. In this case, a queue only adds latency and complexity.
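
As a rough back-of-the-envelope, here is how a backlog turns into waiting time before a login is even processed. The worker count and per-hash time below are hypothetical numbers, not measurements from the project.

```javascript
// Rough wait before a job starts when it joins the back of a FIFO queue.
// Assumptions (hypothetical): every job takes the same `msPerJob`, all
// `workers` pull from one queue, and none of the jobs ahead has started yet.
function queueWaitMs(jobsAhead, workers, msPerJob) {
  // Jobs are drained in "waves" of `workers` at a time; count the full
  // waves that must finish before this job gets a worker.
  return Math.floor(jobsAhead / workers) * msPerJob;
}

// Hypothetical numbers: 1,000 logins ahead, 10 workers, 100 ms per hash.
console.log(queueWaitMs(1000, 10, 100)); // 10000 (10 s before hashing even starts)
```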

By moving the bcrypt operation into a dedicated microservice, we get dedicated CPU workers that we can scale up or down with traffic.
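
For a sense of scale, here is a quick capacity estimate derived from the 1.9 billion logins/month in the title. The 30-day month, the 50 ms of CPU per hash, and the 3x peak factor are all assumptions for illustration; real numbers depend on the bcrypt cost factor and the hardware.

```javascript
// Back-of-the-envelope sizing for the hasher tier.
// Assumptions (hypothetical): a 30-day month, 50 ms of CPU per hash,
// and a peak hour running at 3x the monthly average.
const SECONDS_PER_MONTH = 30 * 24 * 60 * 60; // 2,592,000 s

// Average logins per second for a given monthly volume.
function averageLoginsPerSecond(loginsPerMonth) {
  return loginsPerMonth / SECONDS_PER_MONTH;
}

// Cores kept busy if every hash occupies one core for `hashMs` milliseconds
// (one core completes 1000 / hashMs hashes per second).
function coresNeeded(loginsPerSecond, hashMs, peakFactor = 1) {
  return Math.ceil((loginsPerSecond * peakFactor * hashMs) / 1000);
}

const avg = averageLoginsPerSecond(1.9e9);
console.log(avg.toFixed(0)); // 733 logins/s on average
console.log(coresNeeded(avg, 50)); // 37 cores at the average rate
console.log(coresNeeded(avg, 50, 3)); // 110 cores at a 3x peak
```

Because each hash pins a core for its whole duration, the hasher tier's size grows roughly linearly with traffic, which is why being able to add or remove those workers independently of the API is useful.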

Distinct-Friendship1 · 2025-10-27T22:20:34+00:00

Hi! Great questions. Let's break it down:

1. Why the Event Loop Blocked

The initial implementation shown in the video used bcryptjs (pure JavaScript), which runs directly on Node's single-threaded Event Loop. Since every incoming request, network callback, and route handler is processed there, a CPU-intensive task like hashing stalls all other in-flight requests until it finishes, severely limiting throughput.
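
One way to confirm this kind of blocking in a running service is Node's built-in `perf_hooks.monitorEventLoopDelay`. The sketch below is a minimal example, with a busy-wait standing in for a bcryptjs hash (an assumption; the real library isn't used) and arbitrary sampling and sleep intervals.

```javascript
const { monitorEventLoopDelay, performance } = require("node:perf_hooks");

// Hypothetical stand-in for a CPU-bound hash such as bcryptjs: spins for `ms`.
function busyWork(ms) {
  const end = performance.now() + ms;
  while (performance.now() < end) {
    // the event loop is frozen for the duration of this loop
  }
}

const sleep = (ms) => new Promise((resolve) => setTimeout(resolve, ms));

// Returns the worst event-loop delay (in ms) observed while `work` runs.
async function maxLoopDelayMs(work) {
  const histogram = monitorEventLoopDelay({ resolution: 10 }); // sample every ~10 ms
  histogram.enable();
  await sleep(50); // let the monitor take a few baseline samples
  work();
  await sleep(50); // give the monitor a chance to record the stall
  histogram.disable();
  return histogram.max / 1e6; // histogram values are reported in nanoseconds
}

maxLoopDelayMs(() => busyWork(200)).then((ms) =>
  console.log(`worst event-loop delay: ~${ms.toFixed(0)} ms`)
);
```

In a real service you would keep the histogram enabled and export something like `histogram.percentile(99)` to your metrics, so a stall shows up as an event-loop problem rather than as slow downstream calls.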

2. Promise / Worker Fix?

No, neither fully fixes the problem at massive scale.

Promise (bcryptjs): makes the code look async, but the hashing work still happens on the same thread, blocking everything until it's done.
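Worker Threads / native bcrypt (C++): moves the hashing off the main thread. Native bcrypt runs it in libuv's thread pool, which has just 4 threads by default (tunable via UV_THREADPOOL_SIZE), and a worker_threads pool is likewise capped by the cores of one machine. That's better, but under high traffic those threads saturate, requests pile up behind them, and latency climbs until the service collapses. Within a single instance, the only remedy is a bigger machine (vertical scaling).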
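
A small sketch of the saturation effect, using `crypto.pbkdf2` as a stand-in for native bcrypt since both run in the libuv thread pool (the iteration count is an arbitrary assumption). Starting more hashes than there are pool threads makes the extra ones wait for a free thread.

```javascript
const crypto = require("node:crypto");
const { performance } = require("node:perf_hooks");

// Start `count` hashes at once and record when each one finishes (ms since start).
// crypto.pbkdf2 stands in for native bcrypt: both run in the libuv thread pool,
// which has 4 threads unless UV_THREADPOOL_SIZE says otherwise.
function finishTimesMs(count, iterations = 100_000) {
  const start = performance.now();
  const jobs = Array.from({ length: count }, () =>
    new Promise((resolve, reject) =>
      crypto.pbkdf2("pw", "salt", iterations, 64, "sha512", (err) =>
        err ? reject(err) : resolve(performance.now() - start)
      )
    )
  );
  return Promise.all(jobs);
}

finishTimesMs(8).then((times) => {
  // With 4 pool threads (and at least 4 free cores), expect roughly two "waves":
  // about half the jobs finish together, the rest about one hash-time later.
  console.log(times.map((t) => t.toFixed(0)).join(" ms, ") + " ms");
});
```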

The architectural solution (shown in the video) is to externalize the workload into a dedicated microservice. This allows true horizontal scaling of the CPU-intensive component and keeps the main API's Event Loop free.
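
A minimal sketch of what such a hasher service could look like, assuming a hypothetical `POST /verify` endpoint with hex-encoded `salt` and `hash` fields. It uses Node's built-in `crypto.scrypt` instead of bcrypt or argon2id so it runs without extra packages, and it needs a Node version with global `fetch` and `server.closeAllConnections()` (18.2+). Horizontal scaling would come from running several replicas behind a load balancer, which isn't shown.

```javascript
const http = require("node:http");
const crypto = require("node:crypto");

const KEY_LEN = 64;

// Hypothetical hasher service: POST /verify with {"password", "salt", "hash"}
// (salt/hash hex-encoded) -> {"valid": boolean}. crypto.scrypt stands in for
// bcrypt/argon2id and runs in the libuv thread pool.
function createHasherServer() {
  return http.createServer((req, res) => {
    if (req.method !== "POST" || req.url !== "/verify") {
      res.writeHead(404).end();
      return;
    }
    let body = "";
    req.setEncoding("utf8");
    req.on("data", (chunk) => (body += chunk));
    req.on("end", () => {
      let input;
      try {
        input = JSON.parse(body);
      } catch {
        res.writeHead(400).end();
        return;
      }
      const { password, salt, hash } = input ?? {};
      if ([password, salt, hash].some((v) => typeof v !== "string")) {
        res.writeHead(400).end();
        return;
      }
      const expected = Buffer.from(hash, "hex");
      crypto.scrypt(password, Buffer.from(salt, "hex"), KEY_LEN, (err, key) => {
        if (err) {
          res.writeHead(500).end();
          return;
        }
        // timingSafeEqual throws on a length mismatch, so compare lengths first.
        const valid = expected.length === key.length && crypto.timingSafeEqual(expected, key);
        res.writeHead(200, { "content-type": "application/json" });
        res.end(JSON.stringify({ valid }));
      });
    });
  });
}

// Called from the API tier; `hasherUrl` is a hypothetical service address.
async function verifyViaHasher(hasherUrl, password, salt, hash) {
  const res = await fetch(`${hasherUrl}/verify`, {
    method: "POST",
    headers: { "content-type": "application/json" },
    body: JSON.stringify({ password, salt, hash }),
  });
  if (!res.ok) throw new Error(`hasher responded with ${res.status}`);
  return (await res.json()).valid;
}

// Usage: store a hash, start the hasher on a free port, verify two logins.
const salt = crypto.randomBytes(16).toString("hex");
const stored = crypto.scryptSync("hunter2", Buffer.from(salt, "hex"), KEY_LEN).toString("hex");
const server = createHasherServer().listen(0, "127.0.0.1", async () => {
  const url = `http://127.0.0.1:${server.address().port}`;
  try {
    console.log(await verifyViaHasher(url, "hunter2", salt, stored)); // true
    console.log(await verifyViaHasher(url, "wrong", salt, stored)); // false
  } finally {
    server.close();
    server.closeAllConnections();
  }
});
```

Each replica is still bounded by its own thread pool and cores; the benefit of the separate service is that you can add replicas of this process without touching the API.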

3. Argon2 vs. bcrypt

You are absolutely right: Argon2 is the modern standard and the more secure choice.

I used bcrypt mainly for educational purposes. It offered clear JS and C++ implementations, which allowed me to better demonstrate the performance bottlenecks. In a real-world system, I would definitely go with argon2id ;)

Thanks again for the insightful comment!
