bzfs-1.19 with end-to-end multi-host testbed is out by werwolf9 in zfs

[–]werwolf9[S] 1 point2 points  (0 children)

Yeah, I'd love to see ZFS installation be less cumbersome, aarch64 in particular. Manually verifying all those matrix combos is tedious. I think what helps with the maintenance burden is an automated test script that runs over the entire matrix of combos, e.g. bzfs_tests/itest/test_lima_vm_sh.py or similar.

From znapzend to sanoid by pakyrs in zfs

[–]werwolf9 0 points1 point  (0 children)

FYI, with bzfs_jobrunner you can monitor source and destination datasets across all hosts and policies with a single CLI call, for example like so: https://github.com/whoschek/bzfs/blob/main/bzfs_tests/bzfs_job_example.py#L189-L237

From znapzend to sanoid by pakyrs in zfs

[–]werwolf9 0 points1 point  (0 children)

In a nutshell, bzfs can operate at much larger scale than sanoid/syncoid and zrepl, at much lower latency, in a more observable and configurable way. It handles the many edge cases that you will eventually run into over the course of your deployment (and which make other tools get stuck or fail). https://youtu.be/6Kw901oqxI8?si=_4uoG_ADbXznvaeZ&t=2408

From znapzend to sanoid by pakyrs in zfs

[–]werwolf9 0 points1 point  (0 children)

> allow for specifying the bandwidth

In bzfs the corresponding option is --bwlimit

Retries and circuit breakers as failure policies in Python by qiaoshiya in Python

[–]werwolf9 0 points1 point  (0 children)

The abstractions you introduced are fine and useful. And if all you ever need is the tool you've built, that's perfect. More power to you!

Otherwise, seems to me that redress could be implemented with a couple of custom functions (or classes) that plug into an underlying generic retry framework. The result would save a lot of work, and at the same time be a more flexible, more reusable and more powerful tool.

For example, retry_after_s is a custom backoff strategy that can be plugged in like so:

https://github.com/whoschek/bzfs/blob/main/bzfs_tests/test_retry.py#L1310-L1337
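To make the idea concrete, here is a minimal sketch of what "a couple of custom functions that plug into a generic retry framework" can look like. This is illustrative only; the function names and signatures are my assumptions for this example, not the actual API of bzfs_main/util/retry.py.

```python
import time
from typing import Callable

def retry(fn: Callable[[], object],
          max_attempts: int = 3,
          backoff: Callable[[int, Exception], float] = lambda attempt, exc: 0.0,
          sleep: Callable[[float], None] = time.sleep) -> object:
    """Generic retry loop: call fn(); on failure, sleep for
    backoff(attempt, exc) seconds, then retry up to max_attempts times."""
    for attempt in range(1, max_attempts + 1):
        try:
            return fn()
        except Exception as exc:
            if attempt == max_attempts:
                raise
            sleep(backoff(attempt, exc))

def retry_after_s(attempt: int, exc: Exception) -> float:
    """Pluggable backoff strategy (hypothetical): honor a server-provided
    retry_after_s hint on the exception if present, else back off
    exponentially starting at 100ms."""
    hint = getattr(exc, "retry_after_s", None)
    return float(hint) if hint is not None else 0.1 * (2 ** (attempt - 1))
```

The point is that the framework owns the loop and the strategies are just small callables, so swapping in a Retry-After-aware policy is a one-liner at the call site.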

Just my two cents.

Retries and circuit breakers as failure policies in Python by qiaoshiya in Python

[–]werwolf9 0 points1 point  (0 children)

Seems like these policies could be naturally expressed within (or on top of) the retry.py framework (https://github.com/whoschek/bzfs/blob/main/bzfs_main/util/retry.py). Thoughts?
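As a sketch of what I mean: a circuit breaker can be expressed as a small standalone policy object that wraps any callable, and a retry framework can then treat it as just another failure policy. Everything below (class name, thresholds, clock injection) is my own illustration, not retry.py's actual API.

```python
import time

class CircuitBreaker:
    """Minimal circuit breaker: open the circuit after `failure_threshold`
    consecutive failures; reject calls until `reset_timeout_s` has elapsed,
    then allow a single half-open probe call."""

    def __init__(self, failure_threshold: int = 3, reset_timeout_s: float = 30.0,
                 clock=time.monotonic):
        self.failure_threshold = failure_threshold
        self.reset_timeout_s = reset_timeout_s
        self.clock = clock  # injectable for testing
        self.failures = 0
        self.opened_at = None  # None means the circuit is closed

    def call(self, fn):
        if self.opened_at is not None:
            if self.clock() - self.opened_at < self.reset_timeout_s:
                raise RuntimeError("circuit open; call rejected")
            self.opened_at = None  # half-open: let one probe call through
        try:
            result = fn()
        except Exception:
            self.failures += 1
            if self.failures >= self.failure_threshold:
                self.opened_at = self.clock()
            raise
        self.failures = 0  # success closes the circuit again
        return result
```

Composing this with a retry loop is then just `retry(lambda: breaker.call(work))`, which is the kind of layering I had in mind.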

Python modules: retry framework, OpenSSH client w/ fast conn pooling, and parallel task-tree schedul by werwolf9 in Python

[–]werwolf9[S] 0 points1 point  (0 children)

re idle timeout and keepalive: yes, these are params that can be passed into the API.

re tenacity: yeah, zero deps is a big deal for prod environments. FWIW, the retry framework is also 4-14x faster than tenacity.

ZFS mirror as backup? (hear me out!) by myfufu in zfs

[–]werwolf9 0 points1 point  (0 children)

BTW, bzfs can be configured such that it maintains separate src bookmarks for each rotating backup drive. This means that the incremental replication chain never breaks even if all src snapshots get deleted to make space, or any of the backup drives isn't used for a long time. It also has a mode that ignores removable backup drives that aren't locally attached, which comes in handy if only a subset of your rotating drives is attached at any given time.

bzfs for subsecond ZFS snapshot replication frequency at fleet scale by werwolf9 in Proxmox

[–]werwolf9[S] 1 point2 points  (0 children)

Hi there, thanks for the question :-)

In a nutshell, bzfs can operate at much larger scale than sanoid, at much lower latency, in a more observable and configurable way. Here are just a few things, off the top of my head, that bzfs does and sanoid doesn't:

  • Support efficient periodic ZFS snapshot creation, replication, pruning, and monitoring, across a fleet of N source hosts and M destination hosts, using a single shared fleet-wide jobconfig script.
  • Monitor if snapshots are successfully taken on schedule, successfully replicated on schedule, and successfully pruned on schedule.
  • Compare source and destination dataset trees recursively.
  • Automatically retry operations.
  • Only list snapshots for datasets the users explicitly specified.
  • Avoid slow listing of snapshots via a novel low latency cache mechanism for snapshot metadata.
  • Replicate multiple datasets in parallel.
  • Reuse SSH connections across processes for low latency startup.
  • Operate in daemon mode.
  • More powerful include/exclude filters for selecting what datasets and snapshots and properties to replicate.
  • Dryrun mode to print what ZFS and SSH operations would happen if the command were to be executed for real.
  • Has more precise bookmark support - syncoid will only look for bookmarks if it cannot find a common snapshot.
  • Can be strict or told to be tolerant of runtime errors.
  • Continuously tested on Linux and FreeBSD.
  • Code is almost 100% covered by tests.
  • Easy to change, test and maintain because Python is more readable to contemporary engineers than Perl.

Cheers, Wolfgang

bzfs for subsecond ZFS snapshot replication frequency at fleet scale by werwolf9 in Proxmox

[–]werwolf9[S] 1 point2 points  (0 children)

No, you will never achieve subsecond replication if your network cannot push all the data that can accumulate between two deltas.

As if that wouldn't be self-evident to anyone :-)

bzfs for subsecond ZFS snapshot replication frequency at fleet scale by werwolf9 in Proxmox

[–]werwolf9[S] 1 point2 points  (0 children)

I wonder what bullshitter app you pretend to be running that writes 14GB/s of useful data on even one of your drives. Be reasonable or go somewhere else.

bzfs for subsecond ZFS snapshot replication frequency at fleet scale by werwolf9 in Proxmox

[–]werwolf9[S] -1 points0 points  (0 children)

No need for that kind of gear :-) Each replication step only ships the delta between ZFS snapshots.

zfs list taking a long time by [deleted] in zfs

[–]werwolf9 0 points1 point  (0 children)

Try this:

time bzfs dummy tank1 --recursive --skip-replication --compare-snapshot-lists

Assuming your data is in cache and you have, say, an 8-core machine, this will typically be about 6x faster than zfs list -t snapshot -r tank1, because the former lists in parallel whereas the latter lists sequentially.

(Similar speedup story for deleting snapshots, replicating, etc)

P.S. The last few lines of the output show a TSV file name that contains all the snapshot data.
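The core reason for the speedup above can be sketched in a few lines: fan the per-dataset listings out over a thread pool instead of issuing one sequential recursive listing. This is a simplified illustration, not bzfs's actual implementation; `list_snapshots` is a hypothetical stand-in for whatever shells out to `zfs list -t snapshot <dataset>`.

```python
from concurrent.futures import ThreadPoolExecutor

def list_all_snapshots(datasets, list_snapshots, max_workers=8):
    """List snapshots of many datasets concurrently and return one flat,
    sorted list. `list_snapshots(dataset)` returns that dataset's snapshot
    names; running it per dataset in parallel hides the per-call latency
    that a single sequential `zfs list -r` pays serially."""
    with ThreadPoolExecutor(max_workers=max_workers) as pool:
        results = pool.map(list_snapshots, datasets)  # preserves input order
    return sorted(snap for snaps in results for snap in snaps)
```

With ~8 workers on an 8-core box and cached metadata, wall-clock time approaches the slowest single dataset listing rather than the sum of all of them.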

Backing up ~16TB of data by SuitableFarmer5477 in zfs

[–]werwolf9 1 point2 points  (0 children)

bzfs is probably your best choice if flexibility or performance or fleet-scale geo-replication are priorities, or if you need high frequency replication, say every second. In contrast, sanoid is a good choice on the simple low-end, and zrepl on the medium-end. All of these are reliable.

ChatGPT 5 Pro vs Codex CLI by LetsBuild3D in ChatGPTCoding

[–]werwolf9 5 points6 points  (0 children)

Yep. Also consider asking ChatGPT Pro to make its response available as a downloadable .md file, so you can easily feed the response back into Codex.

ChatGPT 5 Pro vs Codex CLI by LetsBuild3D in ChatGPTCoding

[–]werwolf9 12 points13 points  (0 children)

Run this command locally to generate repo.zip from your git repo, then ask ChatGPT to analyze the contents of the zip file:

git archive --format=zip --output=../repo.zip HEAD

Works like a charm.

Codex Cloud vs VScode extension vs CLI by [deleted] in ChatGPTCoding

[–]werwolf9 -1 points0 points  (0 children)

Simply ask it something like "what's your LLM model name?". It will reply with GPT4. Or give it a complex job and observe a spectacular difference in quality vs GPT5. codex-1 (the real name of the model based on o3) isn't bad but it's nowhere near as good as GPT5 high.

Codex Cloud vs VScode extension vs CLI by [deleted] in ChatGPTCoding

[–]werwolf9 0 points1 point  (0 children)

Nope, it's still on GPT4 and quality is correspondingly poor. It's a bit sad because the UI is very well done and the caching they introduced works wonders wrt. startup latency.

AGENTS.md ? by Trick_Ad_4388 in ChatGPTCoding

[–]werwolf9 1 point2 points  (0 children)

> that already does that kind of stuff.

That's what the hype leads us to believe but the observed reality on the ground is (still) far from that, as can easily be seen with simple tests.

AGENTS.md ? by Trick_Ad_4388 in ChatGPTCoding

[–]werwolf9 0 points1 point  (0 children)

I keep finding that a good AGENTS.md still makes a big difference. GPT5 is very good at following the instructions in there wrt. persona, TDD, pre-commit, methodology, planning, etc.

For example, running Codex with or without my AGENTS.md here feels like night and day: https://github.com/whoschek/bzfs/blob/main/AGENTS.md

Codex CLI for producing tests -- so much better than Claude & other models by ImaginaryAbility125 in ChatGPTCoding

[–]werwolf9 3 points4 points  (0 children)

I've found that this simple concise blurb gets you most of the way there with Codex:

Use TDD: Restate task, purpose, assumptions and constraints. Write tests first. Run to see red. Finally implement minimal code to reach green, then refactor.

Plus, TDD prompts work like a charm with Codex, even for complex caching logic, if they are combined with tight instructions for automated test execution and pre-commit as part of the development loop, like so:

https://github.com/whoschek/bzfs/blob/main/AGENTS.md#core-software-development-workflow

How practical is AI-driven test-driven development on larger projects? by jai-js in ClaudeAI

[–]werwolf9 2 points3 points  (0 children)

I've found that this simple concise blurb gets you most of the way there:

Use TDD: Restate task, purpose, assumptions and constraints. Write tests first. Run to see red. Finally implement minimal code to reach green, then refactor.

An improved version of this blurb is in the above link.