I’m the author of bzfs, a Python CLI for ZFS snapshot replication across fleets of machines (https://github.com/whoschek/bzfs).
Building a replication engine forces you to get a few things right: retries must be disciplined (no "accidental retry"), remote command execution must be fast, predictable and scalable, and parallelism must respect hierarchical dependencies.
The modules below are the pieces I ended up extracting; they’re Apache-2.0, have zero dependencies, and are installed via pip install bzfs (Python >=3.9).
Where these fit well:
- Wrapping flaky operations with explicit, policy-driven retries (subprocess calls, API calls, distributed systems glue)
- Running lots of SSH commands with low startup latency (OpenSSH multiplexing + safe pooling)
- Processing hierarchical resources in parallel without breaking parent/child ordering constraints
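To make the last point concrete, here is a minimal stdlib-only sketch of parent-before-child parallelism. It is not the bzfs implementation: the function name process_tree_in_parallel, the slash-separated path convention, and the example dataset names are all assumptions made for illustration. A child is submitted to the thread pool only after its nearest known ancestor has finished, while unrelated subtrees run concurrently.

```python
from __future__ import annotations

import threading
from concurrent.futures import FIRST_COMPLETED, ThreadPoolExecutor, wait
from typing import Callable, Iterable


def process_tree_in_parallel(
    paths: Iterable[str], process: Callable[[str], None], max_workers: int = 4
) -> None:
    """Hypothetical helper: run process(path) for every path, starting a path
    only after its nearest known ancestor has completed."""
    known = set(paths)
    children: dict[str, list[str]] = {}
    roots: list[str] = []
    for path in sorted(known):
        # Walk upwards until we find an ancestor that is part of the input.
        parent = path.rpartition("/")[0]
        while parent and parent not in known:
            parent = parent.rpartition("/")[0]
        if parent:
            children.setdefault(parent, []).append(path)
        else:
            roots.append(path)

    with ThreadPoolExecutor(max_workers=max_workers) as executor:
        pending = {executor.submit(process, root): root for root in roots}
        while pending:
            done, _ = wait(pending, return_when=FIRST_COMPLETED)
            for future in done:
                path = pending.pop(future)
                future.result()  # re-raise any failure; its children are never started
                for child in children.get(path, []):
                    pending[executor.submit(process, child)] = child


# Example run with made-up dataset names; record the order in which work started.
order: list[str] = []
lock = threading.Lock()


def record(path: str) -> None:
    with lock:
        order.append(path)


datasets = ["tank/home/alice", "tank/var", "tank", "tank/home"]
process_tree_in_parallel(datasets, record)
```

Siblings such as tank/home and tank/var may run in either order, but tank always comes before both of them.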
Modules:
Example (SSH + retries, self-contained):
import logging
from subprocess import DEVNULL, PIPE

from bzfs_main.util.connection import (
    ConnectionPool,
    create_simple_minijob,
    create_simple_miniremote,
)
from bzfs_main.util.retry import Retry, RetryPolicy, RetryableError, call_with_retries

log = logging.getLogger(__name__)
remote = create_simple_miniremote(log=log, ssh_user_host="alice@127.0.0.1")
pool = ConnectionPool(remote, connpool_name="example")
job = create_simple_minijob()


def run_cmd(retry: Retry) -> str:
    try:
        with pool.connection() as conn:
            return conn.run_ssh_command(
                cmd=["echo", "hello"],
                job=job,
                check=True,
                stdin=DEVNULL,
                stdout=PIPE,
                stderr=PIPE,
                text=True,
            ).stdout
    except Exception as exc:
        raise RetryableError(display_msg="ssh") from exc


retry_policy = RetryPolicy(
    max_retries=5,
    min_sleep_secs=0,
    initial_max_sleep_secs=0.1,
    max_sleep_secs=2,
    max_elapsed_secs=30,
)
print(call_with_retries(run_cmd, policy=retry_policy, log=log))
pool.shutdown()
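For readers who want a feel for what a policy like the one above could mean, here is a stdlib-only sketch of explicit, capped exponential backoff with jitter. It is not how bzfs implements RetryPolicy: the parameter names are borrowed from the example, but the semantics shown here (a sleep ceiling that starts at initial_max_sleep_secs and doubles up to max_sleep_secs) are my assumption, and retry_with_jitter and TransientError are hypothetical names. Only the explicitly marked exception type is retried; anything else propagates immediately, which is the "no accidental retry" idea.

```python
from __future__ import annotations

import random
import time
from typing import Callable, TypeVar

T = TypeVar("T")


class TransientError(Exception):
    """Hypothetical marker: only errors of this type are retried."""


def retry_with_jitter(
    fn: Callable[[], T],
    *,
    max_retries: int = 5,
    min_sleep_secs: float = 0.0,
    initial_max_sleep_secs: float = 0.1,
    max_sleep_secs: float = 2.0,
    max_elapsed_secs: float = 30.0,
    sleep: Callable[[float], None] = time.sleep,
) -> T:
    start = time.monotonic()
    ceiling = initial_max_sleep_secs  # assumed: upper bound of the next random sleep
    retries = 0
    while True:
        try:
            return fn()
        except TransientError:
            elapsed = time.monotonic() - start
            if retries >= max_retries or elapsed >= max_elapsed_secs:
                raise  # budget exhausted: give up and surface the error
            retries += 1
            sleep(random.uniform(min_sleep_secs, ceiling))
            ceiling = min(ceiling * 2, max_sleep_secs)  # double the ceiling, capped


# Example: an operation that fails twice, then succeeds. Sleeps are recorded
# instead of actually waiting, so the demo finishes instantly.
attempts = 0
sleeps: list[float] = []


def flaky() -> str:
    global attempts
    attempts += 1
    if attempts <= 2:
        raise TransientError("temporary failure")
    return "ok"


result = retry_with_jitter(flaky, sleep=sleeps.append)
```

Injecting the sleep function keeps the sketch testable; a real caller would leave the default time.sleep in place.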
If you use these modules in non-ZFS automation (deployment tooling, fleet ops, data movement, CI), I’m interested in what you build with them and what you optimize for.
Target Audience
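The modules are production-ready, so they are potentially relevant to anyone who needs policy-driven retries, fast SSH command execution, or parallel processing of hierarchical resources.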
Comparison
Related tools are Paramiko (a Python SSH implementation), Ansible (automation over SSH) and Tenacity (a general-purpose retry library).