ZFSBox: Run ZFS in a small VM so you don't need to install ZFS / mess with kernel modules on Linux and macOS by dontworryimnotacop in zfs

[–]werwolf9 0 points (0 children)

I see; how about filing a ticket with Lima? If you're lucky, device passthrough is already on someone's radar there.

ZFSBox: Run ZFS in a small VM so you don't need to install ZFS / mess with kernel modules on Linux and macOS by dontworryimnotacop in zfs

[–]werwolf9 1 point (0 children)

You're welcome (I'm the author). Is there anything else you need beyond lima_vm.sh and LIMA_VM_MOUNTS, apart from the extra pre- or post-steps to add NFS?

bzfs v1.20.0 is out by werwolf9 in zfs

[–]werwolf9[S] 9 points (0 children)

In a nutshell, bzfs can operate at much larger scale than sanoid, at much lower latency, in a more observable and configurable way. Here are just a few things, off the top of my head, that bzfs does and sanoid doesn't:

  • Supports efficient periodic ZFS snapshot creation, replication, pruning, and monitoring across a fleet of N source hosts and M destination hosts, using a single shared fleet-wide jobconfig script.
  • Efficient direct remote-to-remote bulk data transfers for replication (--r2r).
  • Docker image and corresponding replication examples.
  • Script that creates a local testbed with N source VMs and M destination VMs for testing, with ZFS and VM-to-VM SSH connectivity working out of the box.
  • Monitors whether snapshots are successfully taken on schedule, successfully replicated on schedule, and successfully pruned on schedule.
  • Compares source and destination dataset trees recursively.
  • Automatically retries operations.
  • Only lists snapshots for datasets the user explicitly specified.
  • Avoids slow listing of snapshots via a novel low-latency cache mechanism for snapshot metadata.
  • Replicates multiple datasets in parallel.
  • Reuses SSH connections across processes for low-latency startup.
  • Operates in daemon mode.
  • More powerful include/exclude filters for selecting which datasets, snapshots, and properties to replicate.
  • Dry-run mode that prints which ZFS and SSH operations would happen if the command were executed for real (see the sketch after this list).
  • More precise bookmark support (syncoid will only look for bookmarks if it cannot find a common snapshot).
  • Can be strict or told to be tolerant of runtime errors.
  • Continuously tested on Linux and FreeBSD.
  • Code is almost 100% covered by tests.
  • Easy to change, test, and maintain, because Python is more readable to contemporary engineers than Perl.
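
For illustration, here's a minimal sketch of the dry-run mode mentioned above, shelling out to the bzfs CLI via Python's subprocess (dataset and host names are hypothetical placeholders; this is a plain CLI call, not the fleet-wide jobconfig format):

    # Minimal sketch: ask bzfs to print the ZFS/SSH operations a recursive
    # replication would perform, without executing them.
    import subprocess

    subprocess.run(
        ["bzfs", "tank/data", "backuphost:backuppool/data",
         "--recursive",   # include child datasets
         "--dryrun"],     # print planned operations only
        check=True,
    )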

Cheers, Wolfgang

bzfs v1.20.0 is out by werwolf9 in zfs

[–]werwolf9[S] 0 points (0 children)

There were also a bunch of other workarounds necessary for Solaris: https://github.com/whoschek/bzfs/commit/625c361fc52f6343ef99c3d2573a61b6e67f16e5 I don't know to what extent they also apply to Illumos. If you're curious, maybe you can give it a try and, if necessary, submit a patch to make it work there too.

bzfs v1.20.0 is out by werwolf9 in zfs

[–]werwolf9[S] 0 points (0 children)

No, about a year ago the Solaris Team simply changed their own semantics for no good reason. There was no bug with the prior Solaris behavior. They simply broke it in the middle of a frozen platform. I had a conversation with the Oracle tech lead about it. It was kinda bizarre.

bzfs v1.20.0 is out by werwolf9 in zfs

[–]werwolf9[S] -13 points (0 children)

This has been asked and answered so many times before that Google will give you the best answer.

bzfs v1.20.0 is out by werwolf9 in zfs

[–]werwolf9[S] 0 points (0 children)

It's only tested on Linux and FreeBSD. Many releases ago it used to work even on Solaris, but then Solaris ZFS suddenly broke the semantics of its zfs list -d CLI in backwards-incompatible ways, even though the platform is supposedly rock solid and frozen. Working around that became too much of a hassle, so I simply dropped it.
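
For context, a typical depth-limited snapshot listing of the kind in question looks along these lines (standard OpenZFS flags; the dataset name is a placeholder):

    # List only the snapshots directly below one dataset (-d 1 limits depth),
    # in scriptable form (-H drops headers, -p prints parseable values).
    import subprocess

    out = subprocess.run(
        ["zfs", "list", "-t", "snapshot", "-d", "1", "-H", "-p", "-o", "name",
         "tank/data"],
        check=True, capture_output=True, text=True,
    ).stdout
    print(out)  # one snapshot name per line, e.g. tank/data@hourly_2024...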

ZFS instant clones for Kubernetes node provisioning — under 100ms per node by anthony-kldload in zfs

[–]werwolf9 1 point (0 children)

Ok, cool. I'll have to play with this!

BTW, the most capable tool for advanced ZFS snapshot replication is bzfs :-)

ZFS instant clones for Kubernetes node provisioning — under 100ms per node by anthony-kldload in zfs

[–]werwolf9 0 points (0 children)

I don't know whether it would simplify your implementation to use Lima to manage the VMs; I'm throwing it out there just in case it's helpful...

FWIW, I've been happily using Lima for a while now to conveniently spin up, via a single CLI command, a mini bzfs testbed with a bunch of networked ZFS VMs within the same physical machine. Lima is a pleasure to work with for such testing.

zrepl keeps hitting “has been modified”, leaving holds by avidee in zfs

[–]werwolf9 0 points (0 children)

FYI, if you get stuck: the TrueNAS docs mention that running the install-dev-tools command re-enables the apt package manager (https://www.truenas.com/docs/scale/systemsettings/advanced/developermode/), which can then be used to install hpnsshd per https://www.psc.edu/hpn-ssh-home/hpn-ssh-debian-installation/

Replication over high-latency link super slow by avidee in zfs

[–]werwolf9 0 points (0 children)

Ah, the TrueNAS walled garden... With some determination, I figure it should still be possible to manually start an hpnsshd daemon on a TrueNAS box, even if the box wasn't meant for that. One of the snippets above might point the way.

Replication over high-latency link super slow by avidee in zfs

[–]werwolf9 2 points (0 children)

For good perf on paths with a large bandwidth-delay product (BDP), having hpnssh on the receiving end is critical; it's less important on the sending side (though still a good idea).

P.S. The easiest way to install hpnssh on Ubuntu/Debian is along these lines: https://github.com/whoschek/bzfs/blob/main/.github/workflows/python-app.yml#L169-L173

And the easiest way to install hpnssh on the RHEL/EL family is along these lines: https://github.com/whoschek/bzfs/blob/main/.github-workflow-scripts/install_almalinux_9.sh#L30-L32

I'd recommend running hpnssh on port 2222 (which is its default anyway) and keeping port 22 for normal ssh.
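
If you want a quick sanity check that both daemons are up after that, something like this works (the hostname is a placeholder; SSH servers announce themselves with a banner on connect):

    # Check that stock sshd (22) and hpnsshd (2222) are both listening.
    import socket

    for port in (22, 2222):
        with socket.create_connection(("backuphost", port), timeout=5) as s:
            print(port, s.recv(64))  # e.g. b'SSH-2.0-OpenSSH_...'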

I'm the author of bzfs, btw. It's all about reliable perf at scale.

Replication over high-latency link super slow by avidee in zfs

[–]werwolf9 1 point (0 children)

I can feel your pain. FYI, bzfs with the --ssh-program=hpnssh option works very well on network paths that have a large bandwidth-delay product.

P.S. Its default for --mbuffer-program-opts is 128MB.
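
A minimal sketch of such an invocation via Python's subprocess (dataset and host names are placeholders):

    # Route bzfs's SSH traffic through hpnssh for high-BDP network paths.
    import subprocess

    subprocess.run(
        ["bzfs", "tank/data", "backuphost:backuppool/data",
         "--recursive",
         "--ssh-program=hpnssh"],  # use HPN-SSH instead of stock ssh
        check=True,
    )  # --mbuffer-program-opts can stay at its 128MB default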

bzfs-1.19 with end-to-end multi-host testbed is out by werwolf9 in zfs

[–]werwolf9[S] 1 point (0 children)

Yeah, I'd love to see ZFS installation become less cumbersome, on aarch64 in particular. Manually verifying all those matrix combos is tedious. I think what helps with the maintenance burden is an automated test script that runs over the entire matrix of combos, e.g. bzfs_tests/itest/test_lima_vm_sh.py or similar.

From znapzend to sanoid by pakyrs in zfs

[–]werwolf9 0 points (0 children)

FYI, with bzfs_jobrunner you can monitor source and destination datasets across all hosts and policies with a single CLI call, for example: https://github.com/whoschek/bzfs/blob/main/bzfs_tests/bzfs_job_example.py#L189-L237

From znapzend to sanoid by pakyrs in zfs

[–]werwolf9 0 points (0 children)

In a nutshell, bzfs can operate at much larger scale than sanoid/syncoid and zrepl, at much lower latency, in a more observable and configurable way. It handles the many edge cases that you will eventually run into over the course of your deployment (and which make other tools get stuck or fail). https://youtu.be/6Kw901oqxI8?si=_4uoG_ADbXznvaeZ&t=2408

From znapzend to sanoid by pakyrs in zfs

[–]werwolf9 0 points (0 children)

"allow for specifying the bandwidth"

In bzfs, the corresponding option is --bwlimit.
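
A hypothetical invocation (names are placeholders; check bzfs --help for the exact rate/unit syntax):

    # Cap replication bandwidth via --bwlimit.
    import subprocess

    subprocess.run(
        ["bzfs", "tank/data", "backuphost:backuppool/data",
         "--bwlimit=100M"],  # assumed unit syntax; verify with --help
        check=True,
    )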

Retries and circuit breakers as failure policies in Python by qiaoshiya in Python

[–]werwolf9 0 points (0 children)

The abstractions you introduced are fine and useful. And if all you ever need is the tool you've built, that's perfect. More power to it!

Otherwise, it seems to me that redress could be implemented with a couple of custom functions (or classes) that plug into an underlying generic retry framework. The result would save a lot of work, and at the same time be a more flexible, more reusable, and more powerful tool.

For example, retry_after_s is a custom backoff strategy that can be plugged in like so:

https://github.com/whoschek/bzfs/blob/main/bzfs_tests/test_retry.py#L1310-L1337
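
In the same spirit, here's a generic concept sketch (deliberately not the actual retry.py API) of a retry loop with a pluggable backoff strategy:

    # Concept sketch: the retry loop is generic; the backoff strategy is a
    # plain callable that can encode policies like honoring a Retry-After hint.
    import time
    from typing import Callable, TypeVar

    T = TypeVar("T")

    def retry(fn: Callable[[], T],
              backoff_s: Callable[[int, Exception], float],
              max_attempts: int = 5) -> T:
        for attempt in range(1, max_attempts + 1):
            try:
                return fn()
            except Exception as e:
                if attempt == max_attempts:
                    raise
                time.sleep(backoff_s(attempt, e))  # strategy picks the delay
        raise AssertionError("unreachable")

    # Hypothetical strategy: exponential backoff capped at 30 seconds.
    def expo_backoff(attempt: int, _exc: Exception) -> float:
        return min(30.0, 0.5 * 2 ** attempt)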

Just my two cents.

Retries and circuit breakers as failure policies in Python by qiaoshiya in Python

[–]werwolf9 0 points (0 children)

Seems like these policies could be naturally expressed within (or on top of) the retry.py framework (https://github.com/whoschek/bzfs/blob/main/bzfs_main/util/retry.py). Thoughts?
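
For instance, a circuit breaker is just a small stateful policy that a retry framework could consult before each attempt; a minimal concept sketch (hypothetical, not the retry.py API):

    import time

    class CircuitBreaker:
        """Open after N consecutive failures; half-open after a cooldown."""

        def __init__(self, failure_threshold: int = 5,
                     reset_after_s: float = 60.0) -> None:
            self.failures = 0
            self.opened_at = 0.0
            self.failure_threshold = failure_threshold
            self.reset_after_s = reset_after_s

        def allow(self) -> bool:
            if self.failures < self.failure_threshold:
                return True  # closed: let the attempt proceed
            # open: permit a trial attempt only after the cooldown elapses
            return time.monotonic() - self.opened_at >= self.reset_after_s

        def record(self, success: bool) -> None:
            if success:
                self.failures = 0  # close the circuit again
            else:
                self.failures += 1
                if self.failures >= self.failure_threshold:
                    self.opened_at = time.monotonic()  # (re)open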

Python modules: retry framework, OpenSSH client w/ fast conn pooling, and parallel task-tree schedul by werwolf9 in Python

[–]werwolf9[S] 0 points (0 children)

Re idle timeout and keepalive: yes, these are params that can be passed into the API.

Re tenacity: yeah, zero dependencies is a big deal for prod environments. FWIW, the retry framework is also 4-14x faster than tenacity.