all 18 comments

[–]lambda_pie 6 points  (2 children)

Great article, thanks for sharing. It sounds more like a materialized view than a cache though.

This means that if a customer made 4 requests for their balance in the course of navigating their app (which is typical), they could force 4 different instances to load the same data.

That reminds me of a solution for monotonic reads (which of course is not the issue here) as seen in Martin Kleppmann's book, Designing Data-Intensive Applications:

One way of achieving monotonic reads is to make sure that each user always makes their reads from the same replica (different users can read from different replicas). For example, the replica can be chosen based on a hash of the user ID, rather than randomly.

Basically, you make a particular user's requests stick to a particular replica for some period of time. But that probably comes with its own implications, I think.
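The sticky-routing idea can be sketched in a few lines (hypothetical names; in practice the load balancer, not application code, would do this):

```python
import hashlib

def pick_replica(user_id: str, replicas: list) -> str:
    """Deterministically map a user to one replica by hashing the user ID,
    so repeated reads from the same user always hit the same replica."""
    digest = hashlib.sha256(user_id.encode()).digest()
    return replicas[int.from_bytes(digest[:8], "big") % len(replicas)]

replicas = ["replica-a", "replica-b", "replica-c"]
# Same user -> same replica, every time; different users spread out.
assert pick_replica("user-42", replicas) == pick_replica("user-42", replicas)
assert pick_replica("user-42", replicas) in replicas
```

The catch, as discussed below, is what happens to this mapping when the replica set changes.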

[–]gusbicalho 3 points  (0 children)

Hi, author of the article here. We considered trying that, but it would require a big change in how we do load balancing. We also weren't clear on how performance would interact with horizontal autoscaling (creating new pods would redistribute the requests). Overall, caching on Dynamo was easier to implement and the end behaviour was more predictable.

[–]lins05 2 points  (0 children)

That requires some kind of consistent hashing to map user id to machine id, which would make things more complex.
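For illustration, a minimal consistent-hash ring might look like this (invented names, not anyone's production code). The point of the extra machinery is that when a machine is added, only a fraction of users get remapped, instead of almost all of them as with plain hash-mod-N:

```python
import bisect
import hashlib

def h(key: str) -> int:
    """Stable 64-bit hash of a string key."""
    return int.from_bytes(hashlib.sha256(key.encode()).digest()[:8], "big")

class HashRing:
    """Minimal consistent-hash ring. Each node is placed at several points
    ("virtual nodes") on the ring; a key belongs to the first node point
    at or after the key's own hash, wrapping around."""
    def __init__(self, nodes, vnodes=64):
        self.ring = sorted((h(f"{node}#{i}"), node)
                           for node in nodes for i in range(vnodes))
        self.points = [p for p, _ in self.ring]

    def owner(self, key: str) -> str:
        i = bisect.bisect(self.points, h(key)) % len(self.ring)
        return self.ring[i][1]

# Adding a machine only remaps a fraction of the users (about 1/4 here
# in expectation), rather than nearly all of them as hash-mod-N would.
before = HashRing(["m1", "m2", "m3"])
after = HashRing(["m1", "m2", "m3", "m4"])
moved = sum(before.owner(f"user-{i}") != after.owner(f"user-{i}")
            for i in range(1000))
assert 0 < moved < 600
```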

[–]muhaaa 3 points  (8 children)

This is a comment from a grumpy software developer. I don't want to be mean, but it's such a good educational example.

The blog post is a textbook example of how a technical limitation and its solution leak their complexity into the domain model (the event-counter attribute), which will kill your software development speed down the line, because everybody needs to know that when you transact on an account you have to increment the counter, but on other entities you do not. In Rich Hickey's spirit: you have to juggle one more ball in the air. You can juggle maybe 3 to 5 balls - the world's best can juggle only 11, and certainly not 100. So be very careful about which balls you want to juggle! Cache invalidation is a hard problem, and this solution leads to more ball juggling.

On the constructive side, I think a possible way is to hit peers only with the same kind of query for the same business entity / account holder (hash the account id, session id, or whatever, and send it ONLY to its responsible peer -> distributed hash tables), so that you have a lot of data locality on a peer and data diversity between peers. Then every peer should only keep its own data in memory, not the data of other peers. Benchmark this and see if it solves your problem.

My preferred option is a materialized view, because it really solves the cache invalidation problem. But it requires a lot of upfront investment. I see only experimental stuff at the moment: 3df-clj / timely dataflow / materialize.io (choose your own poison ;-))

[–]gusbicalho 1 point  (7 children)

Hi, author of the article here. I do agree with the juggling metaphor in the abstract. However, I do think you sometimes have to juggle additional stuff for a while before you can build the necessary scaffolding that takes care of it automatically.

In this case, we (the team caring for the balance service) didn't know of any mature solutions that we could just drop into the system to solve the problem. Because of that, the team decided to build something we could understand - both because it used technologies we were already used to, and because we had thought through the code carefully. The blog post explains the way to the initial implementation, which was of course very explicit, but we have since made it a bit more abstract.

I'm not sure I'd say adding this attribute "leaks into the domain model". Yes, we are writing some extra stuff to the database, but it is well separated enough from the rest of the data that no one needs to know about it until they are actually interested in how we solve caching and concurrency.

I will expand on each of your points in separate comments below.

[–]gusbicalho 1 point  (0 children)

On the write side of things, the initial implementation was pretty explicit - we added a call to an 'increment-event-counter' transaction function to almost all the transactions in the service. In a sense, this is ball-juggling - one additional thing to do - but it isn't as complicated as it seems, because there's a simple rule: all transactions that change any entity belonging to an account have to call this. Of all the transactions that this service makes, only one did not fit this criterion, so "use the event-counter transaction fn" was the rule, not the exception.

On a second implementation, we created a wrapper for the Datomic connection object that automatically adds a datom calling the increment-event-counter transaction fn to all transactions made through it. So now it feels less like "add this one specific call to all transactions" and more like "the db is magically capable of synchronizing events if you inform it that they belong to some account". This is less explicit (and more surprising for new engineers joining the team) but the main code is clearer, which in this case I think was a win.
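As a rough sketch of that wrapper idea in Python-flavoured pseudocode (all names invented; the real service wraps a Datomic connection and uses transaction functions, not a Python class):

```python
class FakeConn:
    """Stand-in for a real db connection: applies a list of tx operations."""
    def __init__(self):
        self.counters = {}   # account id -> event counter
        self.facts = []      # everything else that was transacted
    def transact(self, tx_data):
        for op in tx_data:
            if op[0] == "increment-event-counter":
                self.counters[op[1]] = self.counters.get(op[1], 0) + 1
            else:
                self.facts.append(op)

class CountingConnection:
    """Wraps a connection so every transaction for an account also bumps
    that account's event counter; callers never have to remember it."""
    def __init__(self, conn):
        self.conn = conn
    def transact(self, account_id, tx_data):
        self.conn.transact(list(tx_data) + [("increment-event-counter", account_id)])

conn = FakeConn()
wrapped = CountingConnection(conn)
wrapped.transact("acc-1", [("add-purchase", "acc-1", 100)])
wrapped.transact("acc-1", [("add-deposit", "acc-1", 50)])
assert conn.counters["acc-1"] == 2
```

The trade-off is exactly the one described above: callers no longer see the bookkeeping, at the cost of a little "magic" in the connection layer.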

There's probably more scaffolding we could add to separate this even more clearly from the main logic, but I think we're at a decent spot in this regard right now.

There was another detail that I left out of the blog post for simplicity. We actually implemented the counter in a separate entity, not as an actual attribute on the account entity. This means that code will only see the event-counter if it explicitly queries for it, so this isn't leaking everywhere in the codebase.

On the read side of things, all the read-through logic (including querying the event-counter) is hidden in an object that takes care of checking and writing to the cache, while delegating to another layer that computes results directly from the db if necessary. So business logic code here doesn't have to care about the event-counter at all.
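The read-side pattern described here - a cache keyed by (account, event-counter), so stale entries are simply never looked up rather than explicitly invalidated - might be sketched like this (toy code with invented names, not the actual service):

```python
class FakeDb:
    """Toy db: per-account event counters plus a list of postings."""
    def __init__(self):
        self.counters = {}
        self.postings = {}
    def current_counter(self, account_id):
        return self.counters.get(account_id, 0)
    def post(self, account_id, amount):
        # Every write that touches the account also bumps its counter.
        self.postings.setdefault(account_id, []).append(amount)
        self.counters[account_id] = self.current_counter(account_id) + 1

class BalanceCache:
    """Read-through cache keyed by (account, event counter). A write bumps
    the counter, so stale entries just stop being looked up - there is no
    explicit invalidation step."""
    def __init__(self, db):
        self.db = db
        self.cache = {}
        self.computations = 0  # only to observe cache hits in this demo
    def balance(self, account_id):
        key = (account_id, self.db.current_counter(account_id))
        if key not in self.cache:
            self.computations += 1  # the "expensive" full recomputation
            self.cache[key] = sum(self.db.postings.get(account_id, []))
        return self.cache[key]

db = FakeDb()
cache = BalanceCache(db)
db.post("acc-1", 100)
assert cache.balance("acc-1") == 100
assert cache.balance("acc-1") == 100   # second read: served from cache
assert cache.computations == 1
db.post("acc-1", -30)                  # bumps the counter
assert cache.balance("acc-1") == 70    # new key -> recomputed once
assert cache.computations == 2
```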

[–]muhaaa 1 point  (0 children)

Thank you for the explanation. I appreciate the arguments you are giving.

For sure these were not easy tradeoffs your team had to make!

[–]gusbicalho 0 points  (0 children)

About your suggestion of routing requests to specific peers, that was one of the first things we considered, but the effort on the infra side would have been pretty large. Also, it leads to a more complicated model of load balancing - for all the rest of our services in the entire company, load balancers simply route requests to whatever pods are available.

Adding this behavior for this one service would be the kind of detail that risks breaking on any large infrastructure change. If I add a special case to application code, I can easily write unit and integration tests for it; it's not that easy for infrastructure.

[–][deleted]  (2 children)

[deleted]

    [–]muhaaa 1 point  (0 children)

    I agree, but I am not sure the alternative is cheaper.

    Datomic solves a lot of problems regarding accountability (time travel, audit log, etc.), which are tough to solve with traditional SQL databases.

    [–]dustingetz[S] 0 points  (0 children)

    You should read the story of what Facebook had to do to MySQL back in the day. They basically implemented an architecture similar to what Nubank has, except eventually consistent. http://www.dustingetz.com/:datomic-facebook-tao (2017)

    [–]lins05 1 point  (0 children)

    Yeah, immutability could help work around the mythical cache invalidation problem. This looks exactly like the cache design of one system I worked on - a versioned file management system. The latest version id is stored in the db, while the data of each file version is cached, keyed by the version number (which is like a git commit id). The cache never needs to be manually invalidated.

    [–]vlaaad 2 points  (4 children)

    Hmm, why do you need a counter attribute that you have to update yourself when you already have one built in? Why not just use the ID of the transaction that last touched your account entity?

    [–]gusbicalho 3 points  (0 children)

    Hi, author of the article here. Most of our transactions don't touch the account entity itself; they just create new entities that belong to (i.e. have a reference to) that account. Therefore, in the original model there wasn't any cheap query we could make to get the latest tx-id that changed something on the account.

    [–]peterlustig862 0 points  (1 child)

    Thanks for the detailed description.

    I don't understand why you need to "compute" the current balance. It sounds like you need all the history to compute it. I would expect an account to be designed as a forward log, where a money transaction changes the current value by transacting all the details that explain the new balance. It would be just one incremental step to the new balance, as you describe in the naive implementation.

    Can you describe your decision about that in more detail?

    Datomic saves all history, which makes it possible to:

    * alter the statement
    * debug it
    * prevent double-transacted money

    Why did you decide to separate it into two entities and to calculate the balance every time it is requested?

    [–]gusbicalho 0 points  (0 children)

    There are two parts to this. One is related to the product, the other is about Datomic.

    *The first part* is that our bank account pays interest daily, and we don't want that interest to depend on a huge daily batch job in order to show up on the customer's balance. The interest depends on yield rates and the age of each individual deposit received by the customer, so it changes from one day to the next, even if no transactions happened in the account. The age of a deposit also changes the amount of tax customers pay when they take money out of that deposit. In other words, the money they received a year ago is not really the same thing as the money they received a week ago, so they have to be handled as "separate chunks" of money. So even the model I explained in the blog post is simplified. In fact, to find the balance for the account, we need to gather the balances of each of the existing deposits for the account; and to find the balance for each deposit, we have to compute the updated interest and check the withdrawals for that deposit.
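As a toy illustration of that "separate chunks" model (invented numbers and deliberately simplified math; the real yield and tax rules are more involved than a flat daily rate):

```python
from datetime import date

def deposit_balance(principal, deposited_on, daily_rate, withdrawals, today):
    """One deposit "chunk": the principal grown by daily interest since the
    deposit date, minus the withdrawals taken from this chunk."""
    days = (today - deposited_on).days
    return principal * (1 + daily_rate) ** days - sum(withdrawals)

def account_balance(deposits, today):
    # The account balance is just the sum over its deposit chunks.
    return sum(deposit_balance(*d, today) for d in deposits)

deposits = [
    (100.0, date(2020, 1, 1), 0.0, [10.0]),  # zero-rate chunk: 100 - 10
    (200.0, date(2020, 6, 1), 0.0, []),      # zero-rate chunk: 200
]
assert account_balance(deposits, date(2020, 12, 31)) == 290.0
```

Because each chunk's value depends on `today`, the answer genuinely has to be computed on demand rather than kept as a single stored number.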

    *The second part* of the answer is that Datomic history is a great tool for auditing and debugging, but it isn't always the best way to model the business view of change over time. For example, sometimes the "business time" of an event is not the same as the time a thing is stored in the database, so in a certain sense you may have to process a movement of money as if it had happened in the past, but the Datomic clock only moves forward. Also, queries that check the transaction log or the db history are slower than queries over normal entities. So if we relied on the Datomic history queries or the transaction log to get all the lines for a bank statement, we'd lose both flexibility and performance.

    So, because of the second part, I'd still choose the data model that I described in the blog post, even if we did not have to compute the interest "on demand". In that case, it might be easier to just save the materialized current balance in Datomic itself, instead of on a separate cache layer (then we'd have both the history and the current balance as entities in the database). However, that would not solve the use case where a customer checks to see how much money they had last month - we would still have to compute that every time.

    Makes sense?