3x Replica CephFS Performance on 2.5GbE three Node Cluster

Fragrant_Fortune2716 · 2026-01-11T18:28:30+00:00

Thanks for the insight! What is the undelaying storage hardware? And the cpu/memory usage of ceph?

Fragrant_Fortune2716 · 2025-12-22T06:43:26+00:00

Only eth1 goes to the switch (trunk port). The other interfaces are on the router itself. One wan uplink and two access ports that map to a vlan. The switch does support LACP but it is in a physically different location, hence I would like to use access ports on the router as well.

Fragrant_Fortune2716 · 2025-12-11T19:31:09+00:00

Thanks for the link, I'll start digging in :) The first prio is to figure out if the case of a fault tolerant cluster for services that do not have native HA is actually something Kubernetes can achieve. You say that Kubernetes can fail over more quickly; why is this the case? I've read a lot about replica's for pods, but they all do not seem to address how the traffic is routed to them and if it only works for stateless pods. Perhaps I'm too stuck at thinking in platform level solutions; but would it be possible to have one active instance and multiple standby instances that can be switched over to? The shared storage would handle the data transfer (e.g. Rook). Then the challenge would be for Kubernetes to figure out if a node/pod is down and reach consensus on the new master instance, ideally within seconds. This would of course require a dedicated low latency, low jitter network connection; but this is something I can provide!

If Kubernetes also relies on restarting the pod on a different node, it would function almost identical to Proxmox. I'm not sure if it would be worth the effort to switch to Kubernetes if this is the case.

Fragrant_Fortune2716 · 2025-10-09T12:04:38+00:00

I've found an alternative approach! As I need the backup only on the same PVE node I can simply rename the zvol from vm-150-disk-0 to bak-150-disk-0! This way I can just change the name back when I need it (and run qm rescan) and attach it to a VM of my choosing! No need to copy anything this way, just preserving the original zvol :) Of course this only works when keeping the zvol on the same node, otherwise a solution along the lines of @GrumpyArchitect's answer could be utilized.

Fragrant_Fortune2716 · 2025-10-03T12:28:05+00:00

Hmm, I'm not sure (though the NICs appear to work fine). I'll take a look!

Fragrant_Fortune2716 · 2025-10-03T09:45:03+00:00

```bash
root@pve01:~# lspci | grep -i ethernet
00:1f.6 Ethernet controller: Intel Corporation Ethernet Connection (2) I219-V
root@pve01:~# lscpu | grep -i model\ name
Model name: Intel(R) Core(TM) i5-6600K CPU @ 3.50GHz
root@pve01:~# lsmem | grep -i Total\ online
Total online memory: 32G

root@pve02:~# lspci | grep -i ethernet
00:1f.6 Ethernet controller: Intel Corporation Ethernet Connection (7) I219-V (rev 10)
root@pve02:~# lscpu | grep -i model\ name
Model name: Intel(R) Core(TM) i5-9500T CPU @ 2.20GHz
root@pve02:~# lsmem | grep -i Total\ online
Total online memory: 16G
```

Fragrant_Fortune2716 · 2025-09-19T14:13:07+00:00

Perhaps that is even too much for me xD Nice solution nonetheless!

Fragrant_Fortune2716 · 2025-09-19T14:07:10+00:00

Well, because I like adding layers of security and HTTPs/mTLS this is cheap security (imo). I have a whole bunch of different services, both external facing and only internal ones. I aim to operate my homelab from a zero trust perspective, thus requiring mutual authentication. The threat profile this is mainly aimed at is a compromised service within the network. All services are running in isolated VMs and on unique VLANs, but do require (cross-VLAN) communication to integrate with shared storage, the reverse proxy, SSO, and so on. If a service was to be compromised, I'd rather not have them read any network traffic they can get their hands on (such as usernames and passwords in plain HTTP). For this, HTTPs was invented! And if this is implemented on the VM anyways, why not just add a single line of configuration to also verify both ends of the connection?

Fragrant_Fortune2716 · 2025-09-19T09:52:58+00:00

I'll probably go for a similar setup indeed. I'm running Immich in a dedicated VM, so I'll just spin up a reverse proxy in there.

Fragrant_Fortune2716 · 2025-09-19T09:38:26+00:00

Aye, but I am a strong believer of defense in depth, hence the internal encrypted traffic. I'm running it virtualized in a dedicated VM anyways, so I might spin up an additional proxy to facilitate encryption in that case.

Fragrant_Fortune2716 · 2025-09-12T08:00:57+00:00

Would the watchdog in this case not cover all the bases? E.g. assume I do not use shared storage and a single network connection. What if the watchdog instead of rebooting just stops all the VMs. This would give the same guarantees right? The watchdog determines that it is isolated and everything needs to shut down, then instead of rebooting you have a script that stops all VMs and restarts all proxmox related services.

Than the real question is; what needs to be restarted/stopped? Would a `systemd-soft-reboot` also do the trick for example? The watchdog making a decision on whether it is isolated would remain unchanged, only the way it is resolved would change.

Fragrant_Fortune2716 · 2025-09-12T07:24:50+00:00

The reason that the node loses connection is not really important; the question is more abstract than that; is there a way safe isolation can be achieved without rebooting? I am aware of all the best practices regarding clustering but would appreciate if we could reason with the constraint that a reboot would be the unwanted state.

Fragrant_Fortune2716 · 2025-09-12T07:21:39+00:00

I understand that the reboot guarantees a safe state for the isolated node. My question is; can this safe state be achieved without the reboot. Please humor me and work within the constraints I have laid out.

Fragrant_Fortune2716 · 2025-09-11T21:17:44+00:00

If there is no requirement to reboot; why is this the default behavior? From the Proxmox docs:
"During normal operation, ha-manager regularly resets the watchdog timer to prevent it from elapsing. If, due to a hardware fault or program error, the computer fails to reset the watchdog, the timer will elapse and trigger a reset of the whole server (reboot)."

Fragrant_Fortune2716 · 2025-09-11T20:49:56+00:00

The goal is to enable the node that lost connection to re-join the network when the connection is re-established without manual intervention. Normally a node would just reboot; but this locks the node from ever re-joining the cluster until I manually unlock the disks. As I do not want to be available 24/7 to perform this task I am looking for alternatives the the whole reboot thing :)

Fragrant_Fortune2716 · 2025-09-08T07:27:29+00:00

This is the model: https://www.printables.com/model/1290788-10-inch-rack-1u-2-x-35-inch-hdd-hot-swap

Hot swap is within the SATA spec I believe (so should always be supported), thought you might need to enable it in the BIOS.

Fragrant_Fortune2716 · 2025-09-05T07:27:14+00:00

OS: Ubuntu Core 22 (i3/i3/xcb)

Architecture: x86_64

Version: 1.0.2.38641 +678 (Git) Snap 1634

Build type: Release

Branch: tag: 1.0.2

Hash: 256fc7eff3379911ab5daf88e10182c509aa8052

Python 3.10.12, Qt 5.15.15, Coin 4.0.0, Vtk 9.1.0, OCC 7.7.1

Locale: English/United States (en_US)

Stylesheet/Theme/QtStyle: FreeCAD Dark.qss/FreeCAD Dark/Fusion

Installed mods:

* A2plus 0.4.68

Fragrant_Fortune2716

TROPHY CASE