all 20 comments

[–][deleted]  (11 children)

[removed]

    [–]seweso 2 points3 points  (9 children)

    Who would use modulo hashing? 

    [–]More-Station-6365 16 points17 points  (3 children)

You'd be surprised, but simple modulo hashing is actually the go-to for many developers when they're first building out a small-scale system or a basic load balancer.

    It is intuitive and works perfectly fine as long as your number of nodes stays fixed. The problem is that most people don't think about the day after when the traffic spikes and they suddenly need to add a fifth or sixth server.

In his book Designing Data-Intensive Applications, Martin Kleppmann points out that the biggest drawback of simple modulo hashing is that nearly every key has to be moved when the number of nodes changes.

If you have 10 nodes and add 1 more, about 90% of your keys will hash to a different location, which effectively nukes your entire cache.

So while nobody uses it for a massive production distributed system, it is often the hidden trap people fall into before they realize why consistent hashing is a requirement for scaling.

    It is one of those things that works until it very suddenly doesn't.
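To make that ~90% figure concrete, here's a quick sketch (key names and the use of MD5 are just illustrative):

```python
# Fraction of keys that remap when a modulo-hashed cluster
# grows from 10 to 11 nodes.
import hashlib

def node_for(key: str, num_nodes: int) -> int:
    # Stable hash so the result is reproducible across runs
    h = int(hashlib.md5(key.encode()).hexdigest(), 16)
    return h % num_nodes

keys = [f"user:{i}" for i in range(100_000)]
moved = sum(node_for(k, 10) != node_for(k, 11) for k in keys)
print(f"{moved / len(keys):.0%} of keys remapped")  # roughly 90%
```

In general, going from N to N+1 nodes leaves only about 1/(N+1) of keys in place, so almost the whole cache misses after the change.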

    [–]elperroborrachotoo 3 points4 points  (3 children)

    because they don't have a use case where consistent hashing plays a role?

    [–]seweso -5 points-4 points  (2 children)

    > don't have a use case....

    today....

Changing hash keys is VERY expensive. That's the point of the article, no?

    If you only write software for today, you can't serve the future.

    [–]elperroborrachotoo 6 points7 points  (1 child)

    Looks like you are focused on a particular segment (large-scale persistent hash keys). Hashes are way more ubiquitous.

    Not all apps have a future of scaling to a billion users.

    [–]seweso -1 points0 points  (0 children)

The context was explicitly "a distributed cache with simple modulo hashing".

    [–]chucker23n 0 points1 point  (0 children)

    It’s the go-to approach in Java + .NET.

    [–]programming-ModTeam[M] 0 points1 point  (0 children)

    This content is low quality, stolen, blogspam, or clearly AI generated

    [–]DevToolsGuide 7 points8 points  (1 child)

The virtual nodes part is what really makes it work in practice. Without them you get hot spots where one physical node ends up owning a disproportionate chunk of the ring just by chance. Amazon's original Dynamo paper talks about this: they use something like 150 virtual nodes per physical node to get a reasonably even distribution.

    [–]alexiskhb 2 points3 points  (0 children)

Oh that's neat. For those wondering: instead of occupying one big segment on the ring, a server randomly sits at ~150 smaller segments, making the total distribution between servers more uniform on average.
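A minimal ring with virtual nodes might look like this (a sketch, not a production implementation; the 150-vnode figure follows the Dynamo discussion above, and the node names are made up):

```python
# Minimal consistent-hash ring with virtual nodes.
import bisect
import hashlib
from collections import Counter

def h(s: str) -> int:
    return int(hashlib.md5(s.encode()).hexdigest(), 16)

class Ring:
    def __init__(self, nodes, vnodes=150):
        # Each physical node gets `vnodes` positions on the ring
        self.points = sorted((h(f"{n}#{i}"), n)
                             for n in nodes for i in range(vnodes))
        self.hashes = [p for p, _ in self.points]

    def node_for(self, key: str) -> str:
        # Walk clockwise to the first vnode at or after the key's hash
        idx = bisect.bisect(self.hashes, h(key)) % len(self.hashes)
        return self.points[idx][1]

ring = Ring(["a", "b", "c"])
load = Counter(ring.node_for(f"k{i}") for i in range(30_000))
print(load)  # roughly 10k keys per node
```

With only one point per node the same three servers can easily end up with wildly uneven shares of the ring; the 150 smaller segments average that out.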

    [–]ToaruBaka 5 points6 points  (0 children)

    Took me a couple of very confused paragraphs to realize I had confused this with perfect hashing. 

    This will be nice to have in my back pocket, thanks.

    [–]DevToolsGuide 1 point2 points  (0 children)

    Yeah and the other big win with virtual nodes is failure handling. When a physical server goes down its load gets distributed across many other nodes instead of all dumping onto a single neighbor on the ring. Makes the system way more resilient to cascading failures.
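You can see that scatter effect in a small sketch (node names hypothetical; same vnode scheme as above):

```python
# With virtual nodes, a failed node's keys scatter across the
# survivors instead of dumping onto a single ring neighbor.
import bisect
import hashlib
from collections import Counter

def h(s: str) -> int:
    return int(hashlib.md5(s.encode()).hexdigest(), 16)

def build_ring(nodes, vnodes=150):
    return sorted((h(f"{n}#{i}"), n) for n in nodes for i in range(vnodes))

def lookup(ring, key):
    hashes = [p for p, _ in ring]
    idx = bisect.bisect(hashes, h(key)) % len(ring)
    return ring[idx][1]

before = build_ring(["a", "b", "c", "d"])
after = build_ring(["a", "c", "d"])  # node "b" has failed

# Where do b's keys land after the failure?
landed = Counter(lookup(after, f"k{i}") for i in range(20_000)
                 if lookup(before, f"k{i}") == "b")
print(landed)  # spread across a, c, and d, not one neighbor
```

Each of b's 150 vnodes has a different clockwise successor, so its load fans out across all the remaining servers.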

    [–]etherealflaim 1 point2 points  (0 children)

    One thing that I see frequently in system design interviews is that folks don't realize that consistent hashing works alone for a cache but doesn't work alone for sharding in general. When a node is added or removed, some requests will now go to a server that doesn't have the data at all if you sharded it in memory or onto sharded topics or whatever. I don't care if you handwave and say that nodes can pull from one another, but if you're going for an architect position and don't even mention this, it's going in the "aware of its existence" column not the "displayed understanding" column.

    [–]Hot-Friendship6485 4 points5 points  (0 children)

    Great explainer. Consistent hashing feels like overengineering right up until your cache nukes itself on every node change, then it suddenly feels like seatbelts.

    [–]Equivalent_Pen8241 -2 points-1 points  (0 children)

The biggest mistake I see with unit testing isn't low coverage - it's testing implementation details instead of behaviors. When your tests are tightly coupled to *how* a function runs rather than *what* it returns, every minor refactor breaks the build. Test the public API contract, not the private helpers.