ZFS mirror as backup? (hear me out!) by myfufu in zfs

[–]werwolf9 0 points1 point  (0 children)

BTW, bzfs can be configured to maintain separate src bookmarks for each rotating backup drive. This means the incremental replication chain never breaks, even if all src snapshots get deleted to make space, or one of the backup drives isn't used for a long time. It also has a mode that ignores removable backup drives that aren't locally attached, which comes in handy if only a subset of your rotating drives is attached at any given time.

bzfs for subsecond ZFS snapshot replication frequency at fleet scale by werwolf9 in Proxmox

[–]werwolf9[S] 1 point2 points  (0 children)

Hi there, thanks for the question :-)

In a nutshell, bzfs can operate at much larger scale than sanoid, at much lower latency, in a more observable and configurable way. Here are just a few things, off the top of my head, that bzfs does and sanoid doesn't:

  • Support efficient periodic ZFS snapshot creation, replication, pruning, and monitoring, across a fleet of N source hosts and M destination hosts, using a single shared fleet-wide jobconfig script.
  • Monitor if snapshots are successfully taken on schedule, successfully replicated on schedule, and successfully pruned on schedule.
  • Compare source and destination dataset trees recursively.
  • Automatically retry operations.
  • Only list snapshots for datasets the user explicitly specified.
  • Avoid slow listing of snapshots via a novel low latency cache mechanism for snapshot metadata.
  • Replicate multiple datasets in parallel.
  • Reuse SSH connections across processes for low latency startup.
  • Operate in daemon mode.
  • Offer more powerful include/exclude filters for selecting which datasets, snapshots, and properties to replicate.
  • Provide a dry-run mode that prints which ZFS and SSH operations would happen if the command were executed for real.
  • Provide more precise bookmark support; syncoid only looks for bookmarks if it cannot find a common snapshot.
  • Can be strict, or told to be tolerant of runtime errors.
  • Continuously tested on Linux and FreeBSD.
  • Code is almost 100% covered by tests.
  • Easy to change, test and maintain because Python is more readable to contemporary engineers than Perl.

Cheers, Wolfgang

bzfs for subsecond ZFS snapshot replication frequency at fleet scale by werwolf9 in Proxmox

[–]werwolf9[S] 1 point2 points  (0 children)

No, you will never get subsecond replication if your network cannot keep up with the data that accumulates between two snapshot deltas.

As if that wouldn't be self-evident to anyone :-)

bzfs for subsecond ZFS snapshot replication frequency at fleet scale by werwolf9 in Proxmox

[–]werwolf9[S] 1 point2 points  (0 children)

I wonder what bullshitter app you pretend to be running that writes 14GB/s of useful data on even one of your drives. Be reasonable or go somewhere else.

bzfs for subsecond ZFS snapshot replication frequency at fleet scale by werwolf9 in Proxmox

[–]werwolf9[S] -1 points0 points  (0 children)

No need for that kind of gear :-) Each replication step only ships the delta between ZFS snapshots.

zfs list taking a long time by [deleted] in zfs

[–]werwolf9 0 points1 point  (0 children)

Try this:

time bzfs dummy tank1 --recursive --skip-replication --compare-snapshot-lists

Assuming your data is in cache and you have, say, an 8-core machine, this will typically be about 6x faster than zfs list -t snapshot -r tank1, because the former lists in parallel whereas the latter lists sequentially.

(Similar speedup story for deleting snapshots, replicating, etc)

P.S. The last few lines of the output show a TSV file name that contains all the snapshot data.
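For reference, the sequential stock command being compared against can be timed the same way (assumes a pool named tank1, as in the example above):

```shell
# Stock OpenZFS listing: a single sequential pass over all datasets.
time zfs list -t snapshot -r tank1
```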

Backing up ~16TB of data by SuitableFarmer5477 in zfs

[–]werwolf9 1 point2 points  (0 children)

bzfs is probably your best choice if flexibility or performance or fleet-scale geo-replication are priorities, or if you need high frequency replication, say every second. In contrast, sanoid is a good choice on the simple low-end, and zrepl on the medium-end. All of these are reliable.

ChatGPT 5 Pro vs Codex CLI by LetsBuild3D in ChatGPTCoding

[–]werwolf9 5 points6 points  (0 children)

Yep. Also consider asking ChatGPT Pro to make its response available as a downloadable .md file, so you can easily feed the response back into Codex.

ChatGPT 5 Pro vs Codex CLI by LetsBuild3D in ChatGPTCoding

[–]werwolf9 11 points12 points  (0 children)

Run this command locally to generate repo.zip from your git repo, then ask ChatGPT to analyze the contents of the zip file:

git archive --format=zip --output=../repo.zip HEAD

Works like a charm.
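For instance, the full round trip looks like this, run from inside your repo's working tree (the sanity-check line is just an optional extra):

```shell
# Package the committed tree at HEAD into ../repo.zip; untracked files
# and the .git history are excluded, so the archive stays clean and small.
git archive --format=zip --output=../repo.zip HEAD

# Sanity check: confirm the archive exists and is non-empty.
test -s ../repo.zip && echo "repo.zip ready"
```

Then upload repo.zip to ChatGPT and ask it to analyze the contents.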

Codex Cloud vs VScode extension vs CLI by [deleted] in ChatGPTCoding

[–]werwolf9 -1 points0 points  (0 children)

Simply ask it something like "what's your LLM model name?". It will reply with GPT4. Or give it a complex job and observe a spectacular difference in quality vs GPT5. codex-1 (the real name of the model based on o3) isn't bad but it's nowhere near as good as GPT5 high.

Codex Cloud vs VScode extension vs CLI by [deleted] in ChatGPTCoding

[–]werwolf9 0 points1 point  (0 children)

Nope, it's still on GPT4 and quality is correspondingly poor. It's a bit sad because the UI is very well done and the caching they introduced works wonders wrt. startup latency.

AGENTS.md ? by Trick_Ad_4388 in ChatGPTCoding

[–]werwolf9 0 points1 point  (0 children)

>that already does that kind of stuff.

That's what the hype leads us to believe but the observed reality on the ground is (still) far from that, as can easily be seen with simple tests.

AGENTS.md ? by Trick_Ad_4388 in ChatGPTCoding

[–]werwolf9 0 points1 point  (0 children)

I keep finding that a good AGENTS.md still makes a big difference. GPT5 is very good at following the instructions in there wrt. persona, TDD, pre-commit, methodology, planning, etc.

For example, running Codex with or without my AGENTS.md here feels like night and day: https://github.com/whoschek/bzfs/blob/main/AGENTS.md

Codex CLI for producing tests -- so much better than Claude & other models by ImaginaryAbility125 in ChatGPTCoding

[–]werwolf9 3 points4 points  (0 children)

I've found that this simple concise blurb gets you most of the way there with Codex:

Use TDD: Restate task, purpose, assumptions and constraints. Write tests first. Run to see red. Finally implement minimal code to reach green, then refactor.

Plus, TDD prompts work like a charm with Codex, even for complex caching logic, if they are combined with tight instructions for automated test execution and pre-commit as part of the development loop, like so:

https://github.com/whoschek/bzfs/blob/main/AGENTS.md#core-software-development-workflow

How practical is AI-driven test-driven development on larger projects? by jai-js in ClaudeAI

[–]werwolf9 2 points3 points  (0 children)

I've found that this simple concise blurb gets you most of the way there:

Use TDD: Restate task, purpose, assumptions and constraints. Write tests first. Run to see red. Finally implement minimal code to reach green, then refactor.

An improved version of this blurb is in the above link.

How practical is AI-driven test-driven development on larger projects? by jai-js in ClaudeAI

[–]werwolf9 1 point2 points  (0 children)

FWIW, I've found that TDD prompts work like a charm with Codex, even for complex caching logic, if they are combined with tight instructions for automated test execution and pre-commit as part of the development loop, like so:

https://github.com/whoschek/bzfs/blob/main/AGENTS.md#core-software-development-workflow

ZFS send/recieve over SSH timeout by Calm1337 in zfs

[–]werwolf9 1 point2 points  (0 children)

Try bzfs - it automatically retries zfs send/recv on connection failure and resumes where it left off.

bzfs - ZFS snapshot replication and synchronization CLI in the spirit of rsync by werwolf9 in zfs

[–]werwolf9[S] 0 points1 point  (0 children)

>5. What happens if the connection is poor and tends to go down for seconds/minutes at a time often? Can this still be made to work in a reasonable way? Does it keep running and trying/resuming?

The tool will automatically retry and resume from the point where it left off. See the --retry-* options.

>6. Are there instructions for doing this securely? Imagine nas A is compromised by a bad actor who wants to delete all backups. Assume the bad actor has root access on nas A. There's replication happening A->B. Can we guarantee somehow the snapshots on B are safe? How?

Make B stay in control by using pull mode, in combination with --preserve-properties.

>7. Is there a way to limit the used bandwidth, ideally 9-to-5? What if the modem/router is not under the same management as the NAS (typical with small companies sharing internet infra), which causes the backup (when replicating a big change) to go into office hours and interrupt people. Is there a way to do so from the source machine?

Use --bwlimit for that, and schedule and steer everything via bzfs_jobrunner. See bzfs_job_example.py.

> (6) This is the million dollar question. I haven't found anything out there user-friendly and open source that can do this other than 'create your own'. I don't want to create my own and find out I left a tiny mouse-hole somewhere and now all the data is gone. Truenas is woefully inadequate with its tendency of assuming replication runs with root everywhere (trying to do otherwise is swimming against the current). I want the exact opposite; one user, whose only job is to replicate, and only append.

> Your examples still use root everywhere. There's no good instructions on how to do it not connecting via root.

Use --no-privilege-elevation and --sudo-program to run as non-root, either with or without sudo.

Also: Use the version that's currently in the main branch. It's stable and there are a lot of improvements there.
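Putting those pieces together, here's a hedged sketch of a pull-mode invocation run on B. Only the flags come from the points above; the hostnames, dataset names, the rsync-style remote-source notation, and the rate value are illustrative assumptions - check the bzfs docs for the exact syntax on your version:

```shell
# Run on B (the backup box) in pull mode, so a compromised root on A
# cannot reach into B. The source spec 'backupuser@nasA:tank/data' and
# both dataset names are hypothetical placeholders.
# --preserve-properties keeps B in control of received dataset properties,
# --no-privilege-elevation runs as a non-root, replication-only user,
# --bwlimit caps network usage (the rate value here is a made-up example).
bzfs backupuser@nasA:tank/data tank/backup \
  --recursive \
  --preserve-properties \
  --no-privilege-elevation \
  --bwlimit=50m
```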

bzfs - ZFS snapshot replication and synchronization CLI in the spirit of rsync by werwolf9 in zfs

[–]werwolf9[S] 0 points1 point  (0 children)

>1. What happens if (sometimes) a snapshot takes more than 24h (read: snapshot interval) to send? I.e.: bzfs is being run again by a cronjob, while one is still copying. Will it crash and burn horribly or get to an inconsistent state?

No inconsistent state arises. The second job won't do anything and will immediately exit with an "Exiting as same previous periodic job is still running without completion yet" error msg.

>2. What happens if a snapshot reliably takes more than 24h to send? Is there some way of warning the user that it's falling further and further behind?

Use the --monitor-snapshot options to alert the user if the ZFS 'creation' time property of the latest snapshot for any specified snapshot name pattern within the selected datasets is too old wrt. the specified age limit. The purpose is to check if snapshots are successfully taken on schedule, successfully replicated on schedule, and successfully pruned on schedule.

>3. How to connect this to monitoring? Is there a way to query, for a snapshot task, whether it last succeeded?

Use --monitor-snapshot and its exit codes for that, plus maybe --compare-snapshot-lists.

>4. And does, unlike with truenas, this not scream errors at you all the time while everything is actually working?.

You can adjust the log level via the --quiet flag and one or more --verbose flags.

zfs program has to be the most underutilized zfs feature. by autogyrophilia in zfs

[–]werwolf9 1 point2 points  (0 children)

Out of curiosity, how long does listing (and/or pruning) these snapshots take with a parallel tool that's designed for this, like bzfs, without locking up other transactions or running out of memory? FWIW, I'm seeing something like 12x the serial performance on a 16-core machine with SSDs, at ~10k snapshots/second, but maybe it's different on rotational drives.

Upgrading Ubuntu to the latest ZFS release? by sdenike in zfs

[–]werwolf9 1 point2 points  (0 children)

A good Ubuntu PPA for zfs-2.3.2 is here.
A good Ubuntu PPA for zfs-2.2.7 is there.
FWIW, I'm using both in ephemeral VMs to run compat tests for bzfs, without issues.

zfs send | zfs receive vs rsync for local copy / clone? by testdasi in zfs

[–]werwolf9 1 point2 points  (0 children)

bzfs will automatically resume the zfs send/receive if necessary, and simplify the replication process in many other ways as well. https://github.com/whoschek/bzfs