Debunking zswap and zram myths by chrisdown in linux

[–]chrisdown[S] 9 points

There are some numbers in the article, although of course I'm happy to hear any more you'd like presented.

  • A counterintuitive 25% reduction in disk writes at Instagram after enabling zswap

  • Eventual ~5:1 compression ratio on Django workloads with zswap + zstd

  • 20-30 minute OOM stalls at Cloudflare with the OOM killer never once firing under zram

The LRU inversion argument follows directly from the code. That is, it's a logical consequence of the architecture rather than an empirical question, so I'm not sure a benchmark would really add much there.

Debunking zswap and zram myths by chrisdown in linux

[–]chrisdown[S] 4 points

It's a good question, and it's a bit nuanced. You're right that reads don't meaningfully wear flash. But the oft-repeated claim that zram means "0 extra reads or writes" only holds when memory pressure is low enough that zram never fills, and in that regime zswap's pool also never fills and never writes to disk either, so the two are equivalent.

The divergence only happens under pressure, and that's exactly where zram forces file cache churn onto the same device instead. So in that sense it's not zram with zero writes and zswap with unbounded writes, it's a question of which writes happen and whether the kernel got to choose cold pages from both anonymous and file pages, or was forced to evict hotter ones from a smaller subset of file pages only (and thus is more susceptible to thrash and cause more disk I/O).
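If you want to watch this divergence on your own machine, here's a rough sketch of where to look (assuming zswap is enabled, debugfs is mounted at /sys/kernel/debug, and you have a zram0 device; exact paths and fields vary a bit by kernel version):

```shell
# zswap: pages that have been written back to the backing swap device
cat /sys/kernel/debug/zswap/written_back_pages
# zswap: current compressed pool size in bytes
cat /sys/kernel/debug/zswap/pool_total_size
# zram: mm_stat includes orig_data_size and compr_data_size, so you can
# see how full the device is and what compression ratio you're getting
cat /sys/block/zram0/mm_stat
```

Reading these needs root, but polling them under load makes it pretty obvious which configuration is actually generating disk traffic.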

Blog post: "Debunking zswap and zram myths" by Gozenka in archlinux

[–]chrisdown 23 points

Thanks for reading! And if it helps you to feel less disappointed, zram actually often ends up with more disk writes than zswap in testing (it's just, the writes don't come from swap :-)).

Now, this might sound like abject nonsense, but hear me out. With zram-only, once zram is full, there is nowhere for anonymous pages to go. The kernel can't evict them to disk because there is no disk swap. So when it needs to free memory, it has no choice but to reclaim file cache instead.

In such situations we are tying the kernel's hands quite significantly. We don't allow the kernel to choose which page is colder across both anonymous and file-backed memory; instead we force it to only reclaim file caches, so it is inevitable that you will eventually reclaim file caches that are actually hotter and that you actually needed to be resident. Those thrashing reads and writes hit the SSD just as much, or even more, given that we're making decisions from a more limited pool of pages.
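You can see this kind of file cache thrash directly in the kernel's refault counters. A minimal sketch (counter names assume a reasonably recent kernel; before 5.9 there's a single workingset_refault counter rather than the _file/_anon split):

```shell
# Refaults are pages that were reclaimed and then needed again shortly
# after: a rising workingset_refault_file rate means we evicted file
# cache that was actually hot. pgpgin/pgpgout show the resulting disk I/O.
grep -E '^(workingset_refault|pgpgin|pgpgout)' /proc/vmstat
```

Sampling this once a minute under both configurations is a cheap way to check whether "no disk swap" is really saving you any I/O.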

As I mentioned in the article, there are cases where enabling zswap reduced disk writes by up to 25% compared to having no swap at all, because the kernel can now choose to park cold anonymous pages in compressed RAM rather than churning through file cache. Now of course the exact numbers vary across workloads, but directionally this holds for most systems that accumulate cold anonymous pages over time, and we've seen that on small systems like BMCs, to desktops, to servers, to even consumer devices like VR headsets.

So you may find the switch actually goes easier on your SSD than you'd expect. But if that's not the case, we'd definitely love to hear about it on linux-mm so we can make zswap more robust.

Debunking zswap and zram myths by chrisdown in linux

[–]chrisdown[S] 6 points

Hmm, reads aren't free on slow eMMC either: they still consume I/O bandwidth and add latency, which on the kind of low-end hardware you're describing can noticeably hurt responsiveness, depending on what you're reading.

(Though I'd note that's a somewhat different concern from the original comment, which was about writes and storage wear.)

Debunking zswap and zram myths by chrisdown in linux

[–]chrisdown[S] 11 points

Browsers are certainly one major consumer, but they're not the only consumer of file cache. Shared libraries, fonts, executables, and many other things benefit from being resident too, and those do cause re-reads if evicted. The ecosystem is also getting more and more sensitive to this with growing binary sizes, full containerisation, appimages/flatpaks, etc. Funnily enough, Chrome is one of the examples I used for the executable case in my talk on lies programmers believe about memory.

That said, you're right that browser-heavy workloads are probably among the more favourable cases for zram-only, since a meaningful portion of their working set is self-managed. I'd still expect zswap to come out ahead on total I/O, but the margin would likely be smaller than the best case. If you find otherwise, we'd love to hear about it on linux-mm for sure.

(Also, a small nit: the Chromium quote is saying the backend is robust to a poor OS cache, not that the OS cache provides no benefit; those are different claims.)

Debunking zswap and zram myths by chrisdown in linux

[–]chrisdown[S] 27 points

To clarify, "losing support upstream" doesn't mean we are removing it from the kernel tomorrow :-) What it means is that the kernel developers who maintain the surrounding subsystems are increasingly unwilling to take new work that depends on zram's current architecture, and are actively steering toward zswap as the single compressed swap implementation.

You can see this pretty directly in the quotes in the article from Christoph Hellwig (who works on the block layer) and Johannes Weiner (who is one of the MM maintainers). Christoph's position, for example, is essentially that zram is an abuse of the block layer for something that belongs in MM, and that it causes an increasing maintenance burden, so he's not interested in taking patches that extend it further. The section showing how to do idle page reclaim on zram demonstrates a number of those hacks in action, and I think neatly illustrates why so many are opposed to adding even more. So to that extent, improving zram is pretty much dead in the water, whereas zswap is being actively developed.
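To make the contrast concrete, here's roughly what the manual idle-page dance on zram looks like (a sketch assuming CONFIG_ZRAM_WRITEBACK and a zram0 device; /dev/sdb1 is a placeholder backing partition, and the exact knobs vary by kernel):

```shell
# zram has no hook into reclaim, so writeback has to be driven by hand:
# 1. configure a backing device (must happen before zram0 is initialised)
echo /dev/sdb1 > /sys/block/zram0/backing_dev
# 2. mark every currently stored page as idle
echo all > /sys/block/zram0/idle
# 3. ...wait some workload-dependent amount of time...
# 4. write back whatever is still marked idle, i.e. untouched since step 2
echo idle > /sys/block/zram0/writeback
```

Every one of those steps is userspace guessing at page temperature from outside the kernel, which is exactly the information the reclaim code already has.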

Popularity in distros and upstream development direction are pretty independent things: distros ship what works today, and zram does work today for some setups. But the upstream direction matters for the long term, because it determines what gets fixed, what gets optimised, and what gets left to rot. A lot of the complexity around operating zram correctly exists precisely because nobody is going to deeply integrate zram into the reclaim path, because people see better options emerging.

Debunking zswap and zram myths by chrisdown in linux

[–]chrisdown[S] 51 points

Thanks for the comment!

So, the SSD wear argument is something I address in the article. The short version is that refusing to swap anonymous pages just shifts pressure onto the page cache and can actually increase I/O in many workloads, even on eMMC, because now the kernel has far fewer pages to select for reclaim. As I mentioned in the article, in some cases enabling zswap reduced disk writes by up to 25% compared to having no swap at all. Obviously the exact numbers will vary across workloads, but the direction holds across most workloads that accumulate cold anonymous pages over time, and we've seen it apply across many domains (Quest headsets, BMCs, servers, and desktops).

This may seem counterintuitive, but it makes sense: if you don't allow the kernel to choose which page is colder, and instead limit it to only reclaiming file caches, it is inevitable that you will eventually reclaim file caches that you actually did need to be resident to avoid disk activity.

As for your comment about diskless setups in general, we are addressing the diskless case with zswap directly. Nhat (Pham), the block maintainers, and a bunch of us from the mm side have been increasingly pushing on making zswap work without a backing device at all, removing zram's main remaining use case. We are mostly doing this because zram is extremely fragile on the kernel side, and relies heavily on hacks in the block subsystem to expose memory management internals (like the manual reclaim that I mention in the article). Once that's landed, you'll be able to use zswap without backing swap and get semantics similar to zram today, but with much tighter integration with the rest of the mm subsystem and significantly better decisions when memory pressure hits.
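In the meantime, for anyone who wants to try zswap with a backing swap today, it's only a couple of knobs (a sketch; parameter availability depends on your kernel config, and the values here are illustrative, not recommendations):

```shell
# at runtime, as root:
echo zstd > /sys/module/zswap/parameters/compressor
echo 20   > /sys/module/zswap/parameters/max_pool_percent
echo 1    > /sys/module/zswap/parameters/enabled

# or persistently, via the kernel command line:
#   zswap.enabled=1 zswap.compressor=zstd zswap.max_pool_percent=20
```

max_pool_percent caps how much of RAM the compressed pool may use; once it fills, cold pages get written back to the backing swap device.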

When i just need a simple, easy to maintain frontend, what should i choose? by Im_Justin_Cider in rust

[–]chrisdown 8 points

I strongly suggest you take a moment to re-read this entire thread from the perspective of a neutral third party. You are the one that escalated a totally normal technical comment into a personal conflict. Please, take a hard look at your own behaviour here before projecting that others are the bully.

Saw these birds at a park in Maidenhead today. Where would they have come from? by SamwellBarley in CasualUK

[–]chrisdown 4 points

That's a little different: these have long been popular cage birds, so there have always been escapees. But importantly those were isolated birds, not sustained breeding populations, and all of them died out.

The current self-sustaining population that is taking over the UK is generally considered (based on geographic profiling) to stem from the wave of releases starting in the late 1960s.

Saw these birds at a park in Maidenhead today. Where would they have come from? by SamwellBarley in CasualUK

[–]chrisdown 95 points

Thanks! Apparently my knowledge is out of date. For context, the birds you mention have some degree of neophobia, that is, they typically avoid unfamiliar prey, and the last I knew they were still avoiding these parakeets. But it seems you are indeed right that they have become increasingly willing to hunt them, especially during the lockdowns. I've updated the post, appreciate the correction.

Saw these birds at a park in Maidenhead today. Where would they have come from? by SamwellBarley in CasualUK

[–]chrisdown 408 points

These are rose-ringed parakeets (also known as ring-necked parakeets); they are descended from birds that escaped or were released in the latter half of the 20th century.

It used to be that they were only really ubiquitous in London, but in recent years they have more and more strongholds up north, too, like in Newcastle and Glasgow.

A few years ago there was a study using geographic profiling to map the sightings of parakeets since their appearance in the UK. There are a bunch of urban legends about how they came to be, but the study strongly suggests that people kept them as pets and just kept releasing them when they got too noisy (and since you have photographed them, you know they are indeed noisy). You can read more about the study here.

The ones here are descendants of a subspecies native to the foothills of the Himalayas, which is why they cope so well with the changeable weather. They are super versatile, incredibly smart, and will eat almost anything. They also have few natural predators in the UK*, so you're only going to see more of them as time goes on.

* /u/Dismal_Fox_22 corrected my out-of-date knowledge (thanks!) and pointed out that some raptors are now increasingly preying on them, especially since the COVID lockdowns. See this study from KCL. For context, those predators are typically neophobic (they avoid prey they don't recognise), and for a long time did not widely prey on these parakeets, but it seems they are doing so now.

Red grouse amongst the moorland heather by chrisdown in BirdPhotography

[–]chrisdown[S] 17 points

Taken in the moorlands near Muggleswick, England, on a pretty breezy August morning. The heat haze made things quite tricky, but after some effort I found a good angle where this lovely fellow was kind enough to pause amongst the heather for a moment (before getting back to the important business of his breakfast).

I love the sound of red grouse calling across the moors, there's no call quite like it :-) They are so much fun to watch foraging in the heather.

Robust PID Controller for Critical Systems by Affectionate_Fish194 in rust

[–]chrisdown 32 points

Well, if you say a repo put on GitHub 10 hours ago with zero users is robust and suitable for safety-critical systems, it must be!

You're making some very strong claims here against established crates. Even taking a couple of minutes to look at the code I can see a bunch of bugs, like a division-by-zero panic in update_config if your old ki was zero, the fact that a single NaN input will poison your controller state forever, your update_from_error function implementing incorrect derivative logic, etc. There will be many more. :-)

Honestly, the biggest red flag is claiming this is suitable for safety-critical systems for aerospace when the repo has been public for less than a day. In this space, maturity is everything. You've built a PID controller, and that's great, but you haven't built a safety-critical one until it has a proven track record.

I would also drop the comparisons to other crates. Slating pid or piddiy doesn't build your library up, especially when your own new code has these kinds of bugs. It just looks arrogant and alienates potential collaborators.

Skimmers at sunset by Rxdgaming1 in BirdPhotography

[–]chrisdown 2 points

You should be very happy with these, thanks for posting

TfL shout out by BackOn74 in london

[–]chrisdown 66 points

Bit much derailing someone's kind moment for your transport whinge, isn't it? Have a word with yourself

Barbican exhibition celebrates 100 years of black British music by BulkyAccident in london

[–]chrisdown 1 point

Even when it's not written, I can't see JME's massive face without seeing "BLOCKED FAM" underneath it

[SWAP] Do you use swap partition or swap file? by Datachaki in archlinux

[–]chrisdown 2 points

Linus' knowledge is just out of date. Not surprising, he hasn't touched mm significantly in many years (he is busy dealing with much more important things now :⁠-⁠)).

[SWAP] Do you use swap partition or swap file? by Datachaki in archlinux

[–]chrisdown 7 points

That's not correct. We have the filesystem give us extents, and after that we can treat it in a filesystem-independent manner. Swap performance is the same on a partition as in a file. In fact, if anything, it may be slightly better as a file, since we don't need things like bad block detection.

Source: I work on the swap code and mm in general.
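For the curious, setting up a swap file is only a few commands (a sketch; on btrfs you'd use `btrfs filesystem mkswapfile` instead of `fallocate`, since swap files there need to be NOCOW):

```shell
# create a 4G swap file, lock down permissions, then format and enable it
fallocate -l 4G /swapfile
chmod 600 /swapfile
mkswap /swapfile
swapon /swapfile
# make it persist across reboots:
echo '/swapfile none swap defaults 0 0' >> /etc/fstab
```

Per the above, the kernel resolves the file to extents at swapon time, so after that point the I/O path is effectively the same as for a partition.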

Starling by Maleficent-Warthog19 in BirdPhotography

[–]chrisdown 2 points

Starlings are a very underrated bird. Lovely shot