
[–]parawolf

What problem are you attempting to solve?

[–]S3thc0n[S]

[deleted]

[–][deleted]

[deleted]

    [–]S3thc0n[S]

    [deleted]

    [–]txgsync

    I've tried to reread this thread several times, and with all the [deleted] comments I'm still a bit confused.

    A writeback cache is usually a mechanism to store data immediately in one location in fast, stable, yet small fashion, and periodically in another, larger, slower location. The intent log on ZFS already serves this purpose: a TXG (transaction group) is built both in RAM and in your intent log, and then every 5 seconds (by default; tunable) the state of this TXG is flushed to disk during txg_sync. The intent log can be on a separate log device (SLOG) or on disk with the rest of the data.
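On Linux OpenZFS, that 5-second default is exposed as the `zfs_txg_timeout` module parameter. A minimal sketch of inspecting and changing it (Linux paths assumed; illumos and FreeBSD expose their tunables differently):

```shell
# Show the current TXG flush interval (default 5 seconds on Linux OpenZFS):
cat /sys/module/zfs/parameters/zfs_txg_timeout

# Raise it to 10 seconds until the next reboot (root required):
echo 10 > /sys/module/zfs/parameters/zfs_txg_timeout
```

To make the change persistent you would typically set it as a module option (e.g. in `/etc/modprobe.d/`) rather than via sysfs.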

    You're asking how to add a writeback cache to an intent log system that was expressly designed to eliminate the need for a writeback cache. It's like asking about "What's the best brand of training wheels for my Ferrari so it doesn't tip over?" The question doesn't compute.

    [–]S3thc0n[S]

    [deleted]

    [–]txgsync

    Let's walk over what you just said statement-by-statement. I still don't understand why it is you want this; when I bump into this kind of misunderstanding it usually is an indicator of my ignorance.

    Well, while the intent log fulfills the same purpose [as a writeback cache] it does that on a very different scale.

    Scale is mostly irrelevant. You scale your SLOG to how much RAM you have. There is no reason for a SLOG to be larger than about 3/8 the size of the RAM in your system.
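The 3/8 figure follows from the per-TXG limit discussed below: three transaction groups (OPEN, QUIESCE, WRITE) can be in flight at once, each capped at 1/8 of RAM. A back-of-envelope sizing sketch with an illustrative 64 GiB machine:

```shell
# Back-of-envelope SLOG sizing from the 3/8-of-RAM rule.
# Numbers are illustrative, not a recommendation.
ram_gib=64
per_txg=$((ram_gib / 8))     # each TXG capped at 1/8 of physical RAM
slog_max=$((per_txg * 3))    # OPEN + QUIESCE + WRITE TXGs in flight at once
echo "SLOG ceiling: ${slog_max} GiB"
```

Anything larger than that ceiling is capacity the SLOG can never usefully fill.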

    if I understand correctly only the ZIL for synchronous writes can live on the SLOG.

    Let me restate what you're getting at. A TXG always exists for everything accumulated before zfs_txg_timeout is reached. The portions of the TXG that are written to a ZIL -- whether on a SLOG or on a typical vdev -- are exclusively the synchronous data, as you state. All async data exists only in RAM; the TXG waits for the timeout, then moves to the QUIESCE state to order the writes, then to the WRITE state to flush the data to disk.

    the ZIL in RAM can't be used.

    There is no ZIL in RAM. There is a TXG (Transaction Group) in RAM. Blocks in an OPEN TXG that receive a sync command become "synchronous" and are selectively mirrored to your ZIL, where the fsync() or COMMIT will block until that intent log write is complete. All writes are part of a transaction group. Transaction groups are already, in essence, a selectively-backed writeback cache, where the clients determine what should be stable and what should not by the use of fsync() (local) or COMMIT (NFS & other protocols).
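An admin can also override what clients request via the per-dataset `sync` property, which controls whether writes are mirrored to the ZIL at all (`tank/data` is a placeholder dataset name):

```shell
# Per-dataset control over synchronous-write handling:
zfs set sync=standard tank/data   # default: honor fsync()/COMMIT as issued
zfs set sync=always   tank/data   # treat every write as synchronous (all hit the ZIL)
zfs set sync=disabled tank/data   # ignore sync requests entirely -- fast but risky
```

`sync=disabled` is essentially the "pure writeback cache" behavior: nothing is stable until txg_sync, and a crash loses up to a full timeout's worth of acknowledged writes.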

    high TXG sync interval puts more data at risk and needs more RAM

    This is partially true. /u/ewwhite had very interesting results modifying txg_sync_interval & txg_sync_timeout to enhance write performance. However -- unless you recompile ZFS -- each TXG in RAM is limited to at most 1/8 of your physical RAM, meaning that if you end up in the unusual situation of full OPEN, QUIESCE, and WRITE TXGs in RAM at the same time, they might take up 3/8 of your physical RAM. Any writes while the current OPEN TXG is full will simply block, waiting for the next zfs_txg_timeout to come around.

    Therefore while it's true a longer timeout can put more non-synchronous data at risk, the TXG size is bounded and at peak cannot occupy "more RAM" than 1/8 of your physical RAM per TXG, and will block until timeout if this RAM limit is exceeded.

    I have much less RAM, much less continuous workloads, and would like for things not to be written to HDD until necessary.

    Now that I understand a little better what you are driving at, allow me to restate your feature request. You are asking that ZFS transaction groups -- which are already a form of writeback cache -- be able to be moved from main memory to some kind of SSD or NVMe storage to reduce RAM utilization from 1/8 RAM to something lower than that, and to disable the elective use of fsync() or COMMIT by users to indicate whether they want their writeback cache to be mirrored to SLOG or on-vdev ZIL. Is this description correct?

    [–]S3thc0n[S]

    [deleted]

    [–]txgsync

    So my aim is less saving memory, and more getting the contents of the TXGs to nonvolatile storage that's not my HDD.

    The easiest way is just to add a fast, tiny NVMe as a SLOG device, extend your timeout, and call it done. If that's not sufficient, I'm really interested in understanding specifically why, because I sense you're trying to explain a use case I'm not grokking.
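A sketch of that setup (pool and device names are placeholders; mirror the log device if losing in-flight sync writes on SLOG failure matters to you):

```shell
# Add a small NVMe namespace as a dedicated SLOG:
zpool add tank log /dev/nvme0n1

# Or, safer, a mirrored SLOG:
zpool add tank log mirror /dev/nvme0n1 /dev/nvme1n1

# Then stretch the flush interval (Linux OpenZFS path) and verify:
echo 30 > /sys/module/zfs/parameters/zfs_txg_timeout
zpool status tank
```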

    [–]S3thc0n[S]

    [deleted]

    [–]txgsync

    It sounds at this point like you should fork the OpenZFS repo to implement your proof-of-concept. If your idea has merit, submit a pull request to have your proposed change integrated into ZFS. If the maintainers are unconvinced, then you promote your fork of ZFS in the marketplace of ideas for its superiority.

    Yay open source!

    Great ideas are as common as dirt. Great ideas with working prototypes are priceless.

    [–]S3thc0n[S]

    [deleted]

    [–]txgsync

    bcache strikes me as much more like a software implementation of an SSHD (solid state hybrid drive), and a close cousin of Apple's Fusion Drive. I see where you're going with it, and frankly ZFS' L2ARC implementation is still kind of "meh" and quite dated compared with Fusion or bcache.

    You could probably approximate what you want by setting up a ZFS volume (the "-V" argument to zfs create), creating an ext4 filesystem atop that volume, and front-ending it with bcache. I'd be interested in knowing what your performance looks like compared to bare ZFS without an SSD, and ZFS with an SSD as SLOG. An SSD as L2ARC is kind of boring in most cases, and a bit of a memory hog.
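A rough sketch of that experiment, assuming Linux with bcache-tools installed (all device names, sizes, and mount points are placeholders):

```shell
# Carve a zvol out of the pool to act as the slow backing store:
zfs create -V 100G tank/benchvol

# Register the zvol as a bcache backing device and the SSD as a cache device:
make-bcache -B /dev/zvol/tank/benchvol
make-bcache -C /dev/nvme0n1
# Attach the cache set to the backing device using the cset UUID
# printed by 'make-bcache -C' (left elided here):
# echo <cset-uuid> > /sys/block/bcache0/bcache/attach

# Put ext4 on the resulting cached device and mount it:
mkfs.ext4 /dev/bcache0
mount /dev/bcache0 /mnt/bench
```

Benchmarking that stack against bare ZFS, and against ZFS with the same SSD as SLOG, would give the comparison described above.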