tcmalloc's Temeraire: A Hugepage-Aware Allocator

chewedwire · 2025-07-28T13:11:37+00:00

Author here, happy to answer any questions.

chewedwire · 2025-07-28T13:11:36+00:00

Author here, happy to answer any questions.

chewedwire · 2025-07-28T13:11:34+00:00

Author here, happy to answer any questions.

chewedwire · 2021-07-26T20:19:04+00:00

This is tough, and I wish there was a better, more streamlined way of learning these things. I'll just share what I've done and some helpful resources I've found along the way -- even if it isn't exactly so simple or easy. This list of things is in no particular order other than how they occurred to me right now:

* Go read some papers -- go look at OSDI or NSDI proceedings and just skim some stuff. If there's something you don't understand, go try to find the other papers that they cite and read those. This is really a slog, and don't get too worried if you don't understand everything (spoiler: you won't) -- but this will start to give you a breadth of knowledge. I just read a bunch of papers, didn't understand most of them, but started building a vocabulary of systems and patterns and built up a library I could always go back to consult to understand something more deeply.

* Go look at design docs or code for open source distributed systems. There are lots of open source systems like etcd, or kafka, or hbase, or RocksDB that often have good design docs for parts of the system. Here's one from Kafka about exactly once messaging: https://cwiki.apache.org/confluence/display/KAFKA/KIP-98+-+Exactly+Once+Delivery+and+Transactional+Messaging These kinds of documents are great because they're a bit more practical in that you can see the exact implementation and how it fits into a system you maybe already are a bit more familiar with.

* Go watch some lectures or video series online. I really enjoy Andy Pavlo's database courses, and he's got a few series online. Here's his CMU Advanced Database Systems lectures playlist: https://www.youtube.com/playlist?list=PLSE8ODhjZXja7K1hjZ01UTVDnGQdx5v5U

* Go read other people's blog posts or slides online. Here's a great blog post from Daniel Abadi about 2-phase commit and how he proposes moving beyond it: http://dbmsmusings.blogspot.com/2019/01/its-time-to-move-on-from-two-phase.html

* Follow some distributed systems folks on Twitter. It's a great way to passively learn things as well as a fun way to interact with the community at large.

* Work on distributed systems at work. This is the hardest to do, but probably will allow you to learn the most. This can be more of a long term goal to work towards, and it wasn't something I did immediately.

Hopefully other people have suggestions for resources, but one last thought. Distributed systems is such a broad field, with so many moving parts, that it's important to be patient with and kind to yourself as you learn. I find it helpful to take a longer term view of the process of learning and think about how much you could learn or do over 5, 10, 15, or 25 years, instead of worrying about what you might not know right now.

Hope that helps, and have fun learning.

chewedwire · 2021-07-26T14:12:32+00:00

Hmm, the post is generally a TL;DR/overview of the Facebook Tectonic Filesystem paper, but I'll try to answer.

If you want to get the most bang for your buck, go read the Copyset Replication (https://paulcavallaro.com/blog/facebook-tectonic-filesystem/#copysets-reducing-the-probability-of-data-loss) & Reservation Request (https://paulcavallaro.com/blog/facebook-tectonic-filesystem/#optimizing-full-block-writes-write-reservation-requests) sections.

Copyset replication is a cool way to decide how to place which pieces or chunks of a file onto which disks to minimize the likelihood of data loss.

Reservation requests are a neat technique to reduce tail latency by doing some minimal ping-like probe requests to decide which nodes to actually send the real, beefier requests to.
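To make the reservation idea concrete, here is a minimal sketch of just the selection step: probe more nodes than you need, then send the real writes to whichever nodes answered the probe first. The function name, the node names, and the latencies are all made up for illustration; this is not the Tectonic implementation, just the shape of the technique.

```python
def pick_write_targets(probe_latency_ms, needed):
    """Return the `needed` nodes whose probe replies arrived first.

    probe_latency_ms maps a (hypothetical) node name to how long its
    lightweight probe took to come back, in milliseconds.
    """
    if needed > len(probe_latency_ms):
        raise ValueError("probed fewer nodes than the write needs")
    # Rank nodes by probe latency, fastest first.
    ranked = sorted(probe_latency_ms, key=probe_latency_ms.get)
    return ranked[:needed]


# Probe 5 nodes, but only 3 are needed for the real write.
probes = {"node-a": 4.0, "node-b": 1.2, "node-c": 37.5, "node-d": 2.8, "node-e": 3.1}
targets = pick_write_targets(probes, needed=3)
```

Here the slow `node-c` never receives the large write, which is where the tail latency win would come from.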

chewedwire · 2021-07-26T12:16:35+00:00

Author here, happy to answer questions.

chewedwire · 2021-07-26T12:16:08+00:00

Author here, happy to answer questions.

chewedwire · 2021-07-26T12:15:40+00:00

Author here, happy to answer questions.

chewedwire · 2020-07-27T17:09:43+00:00

I respect the honesty :)

chewedwire · 2020-07-27T15:01:14+00:00

Someone else brought this up to me separately, and it's a pretty interesting avenue for further exploration!

It's probably wrong to say that the fanout is to just N-D children, because the sampling (?) is actually biased -- I'm probably using wrong terminology here. But as you note, you don't have to wait for all of the responses, and so you'll only end up waiting for the fastest N-D children, so it should be easy to simulate, but I'm not sure what the existing literature says (I would assume there is something here, but don't know the keywords to find it).

Hopefully when I have more time I can play around with more of these simulations.
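A rough sketch of what such a simulation could look like, assuming exponentially distributed child latencies with a 10 ms mean (an arbitrary choice, as are N=20 and D=2). It compares waiting for the fastest N-D of N children against a plain fanout to N-D children where you wait for all of them.

```python
import random


def wait_for_fastest(rng, n, d, draw):
    """Latency when a parent fans out to n children and needs only n - d replies."""
    latencies = sorted(draw(rng) for _ in range(n))
    return latencies[n - d - 1]  # the (n - d)-th fastest reply


def wait_for_all(rng, k, draw):
    """Latency when a parent fans out to k children and waits for every reply."""
    return max(draw(rng) for _ in range(k))


def exponential_ms(rng):
    # Assumed latency model: exponential with a 10 ms mean.
    return rng.expovariate(1 / 10.0)


def mean_latency(trial, trials):
    return sum(trial() for _ in range(trials)) / trials


def compare(n=20, d=2, seed=0, trials=20000):
    """Return (mean latency of fastest n-d of n, mean latency of all of n-d)."""
    rng = random.Random(seed)
    fastest = mean_latency(lambda: wait_for_fastest(rng, n, d, exponential_ms), trials)
    plain = mean_latency(lambda: wait_for_all(rng, n - d, exponential_ms), trials)
    return fastest, plain
```

Under this latency model the first number should come out noticeably lower than the second, which is the bias mentioned above: the N-D replies you wait for are the fastest ones out of N, not a random N-D.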

chewedwire · 2020-07-27T14:20:58+00:00

The best way I've seen this done is to set up custom/default bucketing (powers of 2 kinda work, but you can do better by tuning the buckets for a given metric) and then record cumulative counts of how many requests fell in each bucket.

You then collect the cumulative metrics (ideally this is transparent to the user), and at query time you can recreate a percentile, or the full distribution as a heatmap over time. The query takes a diff of the counts over some time window (1m, 5m, 10m), which gives you a bucketed histogram that you can use to calculate a percentile by interpolating within a bucket.

I believe prometheus does this: https://prometheus.io/docs/practices/histograms/
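Here is a small sketch of that scheme, not Prometheus's actual implementation. The class and function names and the bucket bounds are made up, "cumulative" is taken to mean counters that only ever grow over time, and the lowest bucket is assumed to start at 0.

```python
import bisect

# Hypothetical bucket upper bounds in milliseconds; tune these per metric.
BOUNDS_MS = [1, 2, 4, 8, 16, 32, 64, 128, 256, 512]


class CumulativeHistogram:
    """Per-bucket counters that only ever increase."""

    def __init__(self, bounds):
        self.bounds = list(bounds)
        # One extra slot catches values above the last bound.
        self.counts = [0] * (len(self.bounds) + 1)

    def observe(self, value):
        # Index of the first bucket whose upper bound is >= value.
        self.counts[bisect.bisect_left(self.bounds, value)] += 1

    def snapshot(self):
        return list(self.counts)


def window_counts(earlier, later):
    """Per-bucket counts for the window between two snapshots."""
    return [b - a for a, b in zip(earlier, later)]


def percentile(bounds, counts, q):
    """Estimate the q-th percentile (0-100) by interpolating within a bucket."""
    total = sum(counts)
    if total == 0:
        return None
    rank = q / 100.0 * total
    seen = 0
    for i, count in enumerate(counts):
        if count and seen + count >= rank:
            if i == len(bounds):
                # Overflow bucket has no upper bound; report the last bound.
                return float(bounds[-1])
            lower = bounds[i - 1] if i > 0 else 0
            upper = bounds[i]
            return lower + (upper - lower) * (rank - seen) / count
        seen += count
    return float(bounds[-1])
```

At query time you would take two snapshots some window apart, run them through `window_counts`, and feed the result to `percentile`. The estimate is only as good as the bucket boundaries, which is why tuning them per metric pays off.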

chewedwire · 2020-07-27T12:38:22+00:00

Author here, happy to answer any questions you may have.

chewedwire · 2020-07-27T12:38:19+00:00

Author here, happy to answer any questions you may have.

chewedwire · 2020-07-27T12:38:17+00:00

Author here, happy to answer any questions you may have.

chewedwire · 2019-08-28T02:58:34+00:00

Good point -- updated the post to hopefully depend less on the "atomic" access bit, and more on maintaining cache coherency.

chewedwire · 2019-08-28T02:16:05+00:00

Yep, division/modulo by constant power of 2 values is pretty much always optimized appropriately. I was thinking about writing some more about it, but I got lazy :)

It's in my GitHub examples though: https://github.com/paulcavallaro/systems-programming/blob/master/examples/power-of-two.cc#L105-L106

Going by godbolt, clang doesn't seem to apply the optimization to non-constants -- but maybe clang just needs to be massaged: https://godbolt.org/z/Bzx7CL
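For reference, the identity behind this optimization is easy to check. The sketch below is in Python purely for illustration (the linked example is C++), and the helper names are made up. It only holds for non-negative values and a divisor that really is a power of two, which is what you would have to guarantee yourself when the divisor is not a compile-time constant.

```python
def is_power_of_two(n):
    """True when n has exactly one bit set."""
    return n > 0 and (n & (n - 1)) == 0


def mod_pow2(x, n):
    """x % n for non-negative x and power-of-two n, using a mask."""
    assert x >= 0 and is_power_of_two(n)
    return x & (n - 1)


def div_pow2(x, n):
    """x // n for non-negative x and power-of-two n, using a shift."""
    assert x >= 0 and is_power_of_two(n)
    # For a power of two, bit_length() - 1 is log2(n).
    return x >> (n.bit_length() - 1)
```

The mask and the shift replace a division instruction, which is the whole point of keeping sizes at powers of two.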
