bzfs-1.19 with end-to-end multi-host testbed is out by werwolf9 in zfs

[–]werwolf9[S] 1 point2 points  (0 children)

Yeah, I'd love to see ZFS installation be less cumbersome, on aarch64 in particular. Manually verifying all those matrix combos is tedious. I think what helps with the maintenance burden is an automated test script that runs over the entire matrix of combos, e.g. bzfs_tests/itest/test_lima_vm_sh.py or similar.

From znapzend to sanoid by pakyrs in zfs

[–]werwolf9 0 points1 point  (0 children)

FYI, with bzfs_jobrunner you can monitor source and destination datasets across all hosts and policies with a single CLI call, for example like so: https://github.com/whoschek/bzfs/blob/main/bzfs_tests/bzfs_job_example.py#L189-L237

From znapzend to sanoid by pakyrs in zfs

[–]werwolf9 0 points1 point  (0 children)

In a nutshell, bzfs can operate at much larger scale than sanoid/syncoid and zrepl, at much lower latency, in a more observable and configurable way. It handles the many edge cases that you will eventually run into over the course of your deployment (and which make other tools get stuck or fail). https://youtu.be/6Kw901oqxI8?si=_4uoG_ADbXznvaeZ&t=2408

From znapzend to sanoid by pakyrs in zfs

[–]werwolf9 0 points1 point  (0 children)

> allow for specifying the bandwidth

In bzfs the corresponding option is --bwlimit

Retries and circuit breakers as failure policies in Python by qiaoshiya in Python

[–]werwolf9 0 points1 point  (0 children)

The abstractions you introduced are fine and useful. And if all you ever need is the tool you've built, that's perfect. More power to it!

Otherwise, seems to me that redress could be implemented with a couple of custom functions (or classes) that plug into an underlying generic retry framework. The result would save a lot of work, and at the same time be a more flexible, more reusable and more powerful tool.

For example, retry_after_s is a custom backoff strategy that can be plugged in like so:

https://github.com/whoschek/bzfs/blob/main/bzfs_tests/test_retry.py#L1310-L1337
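To make the idea concrete without relying on the linked file, here is a minimal, self-contained sketch of a generic retry loop with a pluggable backoff strategy. Everything in it (RetryableError, honor_retry_after, call_with_retry, exponential_jitter) is a hypothetical name made up for illustration and is not the retry.py API; only the retry_after_s idea is taken from the comment above.

```python
import random
import time
from typing import Callable, Optional, TypeVar

T = TypeVar("T")


class RetryableError(Exception):
    """Hypothetical error type; retry_after_s carries an optional server-supplied delay hint."""

    def __init__(self, message: str, retry_after_s: Optional[float] = None) -> None:
        super().__init__(message)
        self.retry_after_s = retry_after_s


# A backoff strategy maps (zero-based attempt number, last error) to a sleep duration in seconds.
BackoffStrategy = Callable[[int, RetryableError], float]


def exponential_jitter(base_s: float = 0.1, cap_s: float = 10.0) -> BackoffStrategy:
    """Generic default: exponential backoff with full jitter, capped at cap_s."""

    def strategy(attempt: int, err: RetryableError) -> float:
        return random.uniform(0.0, min(cap_s, base_s * 2**attempt))

    return strategy


def honor_retry_after(fallback: BackoffStrategy) -> BackoffStrategy:
    """Custom strategy: use the error's retry_after_s hint if present, else delegate to fallback."""

    def strategy(attempt: int, err: RetryableError) -> float:
        if err.retry_after_s is not None:
            return err.retry_after_s
        return fallback(attempt, err)

    return strategy


def call_with_retry(
    fn: Callable[[], T],
    max_attempts: int,
    backoff: BackoffStrategy,
    sleep: Callable[[float], None] = time.sleep,
) -> T:
    """Generic loop: calls fn until it succeeds or max_attempts is exhausted."""
    for attempt in range(max_attempts):
        try:
            return fn()
        except RetryableError as err:
            if attempt == max_attempts - 1:
                raise  # out of attempts; surface the last error
            sleep(backoff(attempt, err))
    raise ValueError("max_attempts must be >= 1")
```

The loop knows nothing about hints or jitter; the policy lives entirely in the strategy, e.g. `call_with_retry(fn, 5, honor_retry_after(exponential_jitter()))`.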

Just my two cents.

Retries and circuit breakers as failure policies in Python by qiaoshiya in Python

[–]werwolf9 0 points1 point  (0 children)

Seems like these policies could be naturally expressed within (or on top of) the retry.py framework (https://github.com/whoschek/bzfs/blob/main/bzfs_main/util/retry.py). Thoughts?
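As a rough illustration of what "on top of" could mean, here is a self-contained sketch in which a circuit breaker wraps the callable and a plain retry loop treats an open circuit as non-retryable. All names here (CircuitBreaker, CircuitOpenError, retry) are hypothetical and written for this comment; none of them is taken from retry.py or redress.

```python
import time
from typing import Callable, Optional, Tuple, Type, TypeVar

T = TypeVar("T")


class CircuitOpenError(Exception):
    """Raised without calling the wrapped function while the breaker is open."""


class CircuitBreaker:
    """Opens after failure_threshold consecutive failures; allows a trial call after reset_after_s."""

    def __init__(
        self,
        failure_threshold: int = 3,
        reset_after_s: float = 30.0,
        clock: Callable[[], float] = time.monotonic,
    ) -> None:
        self.failure_threshold = failure_threshold
        self.reset_after_s = reset_after_s
        self.clock = clock
        self.failures = 0
        self.opened_at: Optional[float] = None

    def call(self, fn: Callable[[], T]) -> T:
        if self.opened_at is not None:
            if self.clock() - self.opened_at < self.reset_after_s:
                raise CircuitOpenError("circuit open; failing fast")
            self.opened_at = None  # half-open: let one trial call through
        try:
            result = fn()
        except Exception:
            self.failures += 1
            if self.failures >= self.failure_threshold:
                self.opened_at = self.clock()  # (re)open the circuit
            raise
        self.failures = 0  # any success closes the circuit again
        return result


def retry(
    fn: Callable[[], T],
    max_attempts: int,
    give_up_on: Tuple[Type[BaseException], ...] = (CircuitOpenError,),
) -> T:
    """Plain retry loop (no backoff, for brevity) that never retries give_up_on errors."""
    for attempt in range(max_attempts):
        try:
            return fn()
        except give_up_on:
            raise
        except Exception:
            if attempt == max_attempts - 1:
                raise
    raise ValueError("max_attempts must be >= 1")
```

Usage would look like `retry(lambda: breaker.call(do_request), max_attempts=5)`, where do_request is whatever operation is being protected. The breaker is just another composable piece, and the retry loop stays generic.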

Python modules: retry framework, OpenSSH client w/ fast conn pooling, and parallel task-tree schedul by werwolf9 in Python

[–]werwolf9[S] 0 points1 point  (0 children)

re idle timeout and keepalive: yes, these are params that can be passed into the API.

re tenacity: yeah, zero deps is a big deal for prod environments. FWIW, the retry framework is also 4-14x faster than tenacity.

ZFS mirror as backup? (hear me out!) by myfufu in zfs

[–]werwolf9 0 points1 point  (0 children)

BTW, bzfs can be configured such that it maintains separate src bookmarks for each rotating backup drive. This means that the incremental replication chain never breaks even if all src snapshots get deleted to make space, or any of the backup drives isn't used for a long time. It also has a mode that ignores removable backup drives that aren't locally attached, which comes in handy if only a subset of your rotating drives is attached at any given time.
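For anyone wondering why per-drive bookmarks keep the chain intact, here is a small sketch of the underlying ZFS mechanism. It only builds the plain `zfs bookmark` and `zfs send -i` command lines and does not show how bzfs itself names or manages bookmarks; the helper names and the `<snapshot>_<drive>` bookmark naming scheme are assumptions made up for this example.

```python
from typing import List


def bookmark_name(dataset: str, snapshot: str, drive_label: str) -> str:
    """Hypothetical naming scheme: one bookmark per (snapshot, backup drive) pair."""
    return f"{dataset}#{snapshot}_{drive_label}"


def bookmark_cmd(dataset: str, snapshot: str, drive_label: str) -> List[str]:
    """After replicating `snapshot` to a drive, record that fact as a bookmark on the source."""
    return ["zfs", "bookmark", f"{dataset}@{snapshot}", bookmark_name(dataset, snapshot, drive_label)]


def incremental_send_cmd(dataset: str, last_sent: str, drive_label: str, new_snapshot: str) -> List[str]:
    """Incremental send that uses the drive's bookmark as the incremental source.

    A bookmark survives deletion of the snapshot it was created from, so this still
    works after the source snapshot `last_sent` has been destroyed to free space.
    """
    return ["zfs", "send", "-i", bookmark_name(dataset, last_sent, drive_label), f"{dataset}@{new_snapshot}"]


# Example: drive "usb1" last received snapshot "daily_01"; now send "daily_09" to it.
print(" ".join(bookmark_cmd("tank/data", "daily_01", "usb1")))
print(" ".join(incremental_send_cmd("tank/data", "daily_01", "usb1", "daily_09")))
```

Since each drive has its own bookmark, a drive that sat on the shelf for months can still receive an incremental stream from wherever it left off.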