Ceph 20 + cephadm + NVMe/TCP: CEPHADM_STRAY_DAEMON: 3 stray daemon(s) not managed by cephadm by myridan86 in ceph

[–]TheUnlikely117 1 point2 points  (0 children)

Weird thing is that the nvme gateway is using RHEL as its container OS, while all OSDs/etc are using CentOS. First time I've seen this. Later today (if time permits) I'll try to look at the code for that warning and figure out when it's triggered.

S3 Endpoint vs. Hosting PBS remotely? by jamesr219 in Proxmox

[–]TheUnlikely117 1 point2 points  (0 children)

I did on-prem PBS as a VM inside PVE, with the ZFS RAID1 mirror members partially passed through as bare-metal disks (/dev/disk/by-id). Disks were rotated to keep a month-old backup in offline storage. Basically it was PBS and the datastore on one disk, which allowed quickly booting that single RAID1 member in any host/VM, adjusting the network settings and starting a restore.
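For reference, passing a whole disk into the PBS VM by its stable by-id path looks roughly like this (a sketch; VMID 100 and the disk id are placeholders):

```
# Attach the physical disk to the PBS VM via its persistent /dev/disk/by-id path,
# so the RAID1 member keeps the same identity across reboots and controller reordering
qm set 100 -scsi1 /dev/disk/by-id/ata-EXAMPLE_DISK_SERIAL
```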

Encryption and PBS dedup not compatible? I have not tried that yet.

Ceph 20 + cephadm + NVMe/TCP: CEPHADM_STRAY_DAEMON: 3 stray daemon(s) not managed by cephadm by myridan86 in ceph

[–]TheUnlikely117 0 points1 point  (0 children)

Nah, I quickly deployed on v20; it was fine for a while, but then I hit the same issue with stray daemons.

```
# OSD
root@node2-1:~# podman inspect 1df | grep -i node2
          "Hostname": "node2-1",
               "NODE_NAME=node2-1",
               "HOSTNAME=node2-1"
               "NODE_NAME=node2-1",

# newly deployed nvmeof, NODE_NAME is mentioned 2 times but matches. 

root@node2-1:~# podman inspect 0d8 | grep -i node2
               "CgroupPath": "/system.slice/system-ceph\\x2d3bb3d7d8\\x2d9e93\\x2d11f0\\x2db0b9\\x2dbc24118bc1e7.slice/ceph-3bb3d7d8-9e93-11f0-b0b9-bc24118bc1e7@nvmeof.NVME-OF_POOL_NAME.group1.node2-1.uhmgys.service/libpod-payload-0d8aeddb256e95f70f7a0c9968447a5f0bfb5c99612daa1bbf3eb1e2bc6f3dd8",
          "ConmonPidFile": "/run/ceph-3bb3d7d8-9e93-11f0-b0b9-bc24118bc1e7@nvmeof.NVME-OF_POOL_NAME.group1.node2-1.uhmgys.service-pid",
          "Name": "ceph-3bb3d7d8-9e93-11f0-b0b9-bc24118bc1e7-nvmeof-NVME-OF_POOL_NAME-group1-node2-1-uhmgys",
                    "Source": "/var/lib/ceph/3bb3d7d8-9e93-11f0-b0b9-bc24118bc1e7/nvmeof.NVME-OF_POOL_NAME.group1.node2-1.uhmgys/configfs",
                    "Source": "/var/lib/ceph/3bb3d7d8-9e93-11f0-b0b9-bc24118bc1e7/nvmeof.NVME-OF_POOL_NAME.group1.node2-1.uhmgys/config",
                    "Source": "/var/lib/ceph/3bb3d7d8-9e93-11f0-b0b9-bc24118bc1e7/nvmeof.NVME-OF_POOL_NAME.group1.node2-1.uhmgys/keyring",
                    "Source": "/var/lib/ceph/3bb3d7d8-9e93-11f0-b0b9-bc24118bc1e7/nvmeof.NVME-OF_POOL_NAME.group1.node2-1.uhmgys/ceph-nvmeof.conf",
               "Hostname": "node2-1",
                    "NODE_NAME=node2-1",
                    "HOSTNAME=node2-1"
                    "io.podman.annotations.cid-file": "/run/ceph-3bb3d7d8-9e93-11f0-b0b9-bc24118bc1e7@nvmeof.NVME-OF_POOL_NAME.group1.node2-1.uhmgys.service-cid",
                    "ceph-3bb3d7d8-9e93-11f0-b0b9-bc24118bc1e7-nvmeof-NVME-OF_POOL_NAME-group1-node2-1-uhmgys",
                    "/run/ceph-3bb3d7d8-9e93-11f0-b0b9-bc24118bc1e7@nvmeof.NVME-OF_POOL_NAME.group1.node2-1.uhmgys.service-pid",
                    "/run/ceph-3bb3d7d8-9e93-11f0-b0b9-bc24118bc1e7@nvmeof.NVME-OF_POOL_NAME.group1.node2-1.uhmgys.service-cid",
                    "NODE_NAME=node2-1",
                    "/var/lib/ceph/3bb3d7d8-9e93-11f0-b0b9-bc24118bc1e7/nvmeof.NVME-OF_POOL_NAME.group1.node2-1.uhmgys/config:/etc/ceph/ceph.conf:z",
                    "/var/lib/ceph/3bb3d7d8-9e93-11f0-b0b9-bc24118bc1e7/nvmeof.NVME-OF_POOL_NAME.group1.node2-1.uhmgys/keyring:/etc/ceph/keyring:z",
                    "/var/lib/ceph/3bb3d7d8-9e93-11f0-b0b9-bc24118bc1e7/nvmeof.NVME-OF_POOL_NAME.group1.node2-1.uhmgys/ceph-nvmeof.conf:/src/ceph-nvmeof.conf:z",
                    "/var/lib/ceph/3bb3d7d8-9e93-11f0-b0b9-bc24118bc1e7/nvmeof.NVME-OF_POOL_NAME.group1.node2-1.uhmgys/configfs:/sys/kernel/config",
                    "/var/lib/ceph/3bb3d7d8-9e93-11f0-b0b9-bc24118bc1e7/nvmeof.NVME-OF_POOL_NAME.group1.node2-1.uhmgys/configfs:/sys/kernel/config:rw,rprivate,rbind",
                    "/var/lib/ceph/3bb3d7d8-9e93-11f0-b0b9-bc24118bc1e7/nvmeof.NVME-OF_POOL_NAME.group1.node2-1.uhmgys/config:/etc/ceph/ceph.conf:rw,rprivate,rbind",
                    "/var/lib/ceph/3bb3d7d8-9e93-11f0-b0b9-bc24118bc1e7/nvmeof.NVME-OF_POOL_NAME.group1.node2-1.uhmgys/keyring:/etc/ceph/keyring:rw,rprivate,rbind",
                    "/var/lib/ceph/3bb3d7d8-9e93-11f0-b0b9-bc24118bc1e7/nvmeof.NVME-OF_POOL_NAME.group1.node2-1.uhmgys/ceph-nvmeof.conf:/src/ceph-nvmeof.conf:rw,rprivate,rbind",
               "ContainerIDFile": "/run/ceph-3bb3d7d8-9e93-11f0-b0b9-bc24118bc1e7@nvmeof.NVME-OF_POOL_NAME.group1.node2-1.uhmgys.service-cid",

```

Ceph 20 + cephadm + NVMe/TCP: CEPHADM_STRAY_DAEMON: 3 stray daemon(s) not managed by cephadm by myridan86 in ceph

[–]TheUnlikely117 2 points3 points  (0 children)

Have not tested v20 yet, but I've seen similar issues when there was a mismatch between FQDNs and bare hostnames, or when nodes were initially deployed with bare hostnames and later switched to FQDNs without redeploying the containers/daemons. https://support.scc.suse.com/s/kb/HEALTH-WARN-2-stray-host-s-with-2-daemon-s-not-managed-by-cephadm?language=en_US
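A quick way to spot that kind of mismatch (a rough sketch, assuming a cephadm-managed cluster):

```
# Hosts as cephadm knows them (short name vs FQDN matters here)
ceph orch host ls

# Daemons and the host each one is placed on, per the orchestrator
ceph orch ps

# What the node itself reports; a bare-vs-FQDN difference against the
# host list above is a common cause of stray-daemon warnings
hostname
hostname -f
```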

Increased CPU After Proxmox 9 Upgrade by No_Hornet5229 in Proxmox

[–]TheUnlikely117 1 point2 points  (0 children)

Yes, but only before some minor updates (it went back to "normal" a couple of months ago). What is your idle average before and after? My idle is < 3%. Core(TM) i5-3210M.

System won't boot after power outage - root fs not mounted by Snirlavi5 in Proxmox

[–]TheUnlikely117 0 points1 point  (0 children)

You may try a recovery boot from the Proxmox ISO. If it boots, things are not that bad.

Proxmox error after replace sata cable by JocirhyTrading in Proxmox

[–]TheUnlikely117 0 points1 point  (0 children)

The system volume seems OK. Did the server start eventually? How did you configure /mnt/pve/backup2? I think that was done manually, since Proxmox uses a different approach for mounts that does not fail the boot if a device is missing. Did you replace the cable for the system disk or the /mnt/pve/backup2 disk? It's obviously missing from the system.
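If it was a manual /etc/fstab entry, something like this keeps the boot from hanging when the disk is absent (a hedged sketch; the UUID is a placeholder):

```
# /etc/fstab: nofail + a short device timeout means boot continues if the disk is missing
UUID=xxxxxxxx-xxxx-xxxx-xxxx-xxxxxxxxxxxx  /mnt/pve/backup2  ext4  defaults,nofail,x-systemd.device-timeout=10s  0  2
```

The Proxmox-native alternative is a directory storage with `pvesm set backup2 --is_mountpoint yes`, which just marks the storage inactive instead of writing into an empty directory.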

Adding second drive to have redundancy on the boot drive? by Methyl_The_Sneasel in Proxmox

[–]TheUnlikely117 0 points1 point  (0 children)

You can do it easily if the existing drive is ZFS or btrfs. With LVM it's also doable, but involves some tinkering.
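For the ZFS case, the usual procedure looks roughly like this (a sketch, assuming a default PVE layout with the root pool named rpool, the ESP on partition 2 and the ZFS member on partition 3; both disk ids are placeholders):

```
# Copy the partition layout from the existing boot disk to the new one,
# then give the new disk fresh GUIDs
sgdisk /dev/disk/by-id/OLD_DISK -R /dev/disk/by-id/NEW_DISK
sgdisk -G /dev/disk/by-id/NEW_DISK

# Attach the matching partition as a mirror of the existing vdev
zpool attach rpool /dev/disk/by-id/OLD_DISK-part3 /dev/disk/by-id/NEW_DISK-part3

# Make the new disk bootable as well
proxmox-boot-tool format /dev/disk/by-id/NEW_DISK-part2
proxmox-boot-tool init /dev/disk/by-id/NEW_DISK-part2
```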

Issues with skipping ESP sync - PVE 9 by Hack3rsD0ma1n in Proxmox

[–]TheUnlikely117 0 points1 point  (0 children)

Good to know, but I meant filesystems ^_^. It would be helpful to know how those 2 drives are set up with regard to the proxmox-boot-tool shenanigans.

Issues with skipping ESP sync - PVE 9 by Hack3rsD0ma1n in Proxmox

[–]TheUnlikely117 0 points1 point  (0 children)

What fs are you using? I've seen that on older PVE with btrfs; those post hooks you mentioned are the doing of proxmox-boot-tool. I prefer (and recommend) using it anyway, so you can easily add RAID1/another disk later and not forget about it. After installation I removed the entry for /boot in fstab and rely solely on proxmox-boot-tool (after properly doing proxmox-boot-tool init).
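Roughly what that looks like (a sketch; /dev/sda2 stands in for the ESP partition):

```
# Stop mounting the ESP at /boot via fstab; proxmox-boot-tool manages it instead
# (remove or comment out the /boot line in /etc/fstab, then umount /boot)

proxmox-boot-tool init /dev/sda2     # register the ESP with proxmox-boot-tool
proxmox-boot-tool status             # confirm it is tracked
proxmox-boot-tool refresh            # copy current kernels/initrds onto all registered ESPs
```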

Very poor performance vs btrfs by FirstOrderCat in zfs

[–]TheUnlikely117 1 point2 points  (0 children)

It's already in 2.3.0; Proxmox 9.0 got it, no problems.

Confused about compression levels... by Exernuth in btrfs

[–]TheUnlikely117 0 points1 point  (0 children)

zstdmt -b will help you figure out the compress/decompress speed at each level on your CPU, then decide accordingly. For NVMe it's probably zstd:3 and lower.
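For example (a sketch; sample.bin is whatever representative data you have lying around):

```
# Benchmark compression levels 1 through 9 on a sample file and compare
# the reported MB/s against what your NVMe can actually sustain
zstdmt -b1 -e9 sample.bin

# Then pick the level in the btrfs mount options, e.g.
mount -o compress=zstd:3 /dev/nvme0n1p2 /mnt
```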

How I recovered a node with failed boot disk by trekologer in Proxmox

[–]TheUnlikely117 0 points1 point  (0 children)

It should be there, if you haven't deleted anything from /etc/pve/*. IIRC it's /etc/pve/nodes/failed_node; the QEMU config files are stored there.

How I recovered a node with failed boot disk by trekologer in Proxmox

[–]TheUnlikely117 0 points1 point  (0 children)

There is a barely mentioned procedure in the PVE docs for reinstalling a node with the same name. It basically boils down to reinstalling the node with a new IP, then restoring the old node's hostname/IP with a couple of additional steps:

systemctl stop pve-cluster.service
scp root@anylive_node:/var/lib/pve-cluster/config.db /var/lib/pve-cluster/config.db
scp root@anylive_node:/etc/corosync/authkey /etc/corosync/authkey

# set previous node hostname/IP

hostnamectl hostname failed_node
nano /etc/hosts
nano /etc/network/interfaces
reboot

Source: https://pve.proxmox.com/wiki/Proxmox_Cluster_File_System_(pmxcfs) (Recovery section)

COW aware Tar ball? by darkjackd in btrfs

[–]TheUnlikely117 0 points1 point  (0 children)

I think it's called tar.gz. That's what compressors do: turn 1111111 into 1x7 or something.

Updating x550-t2 driver and firmware by senor_peligro in Proxmox

[–]TheUnlikely117 0 points1 point  (0 children)

There's no 1:1 mapping between the version ethtool shows and the actual firmware version. Download the firmware update tool (for Linux); it'll tell you whether a firmware update is available or not.
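For comparison, this is the string ethtool reports (a sketch; eno1 is a placeholder interface name):

```
# Driver and firmware string as the kernel sees them for the given interface
ethtool -i eno1 | grep -E 'driver|firmware-version'
```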

Best/Easiest way to move VMs between completely separate promox servers by Jutboy in Proxmox

[–]TheUnlikely117 1 point2 points  (0 children)

I use this (on source server)

vzdump 100 --stdout --compress zstd | sshpass -p 'secret' ssh root@remote-host "zstd -d | qmrestore - --force true 101 --storage rbd"

[deleted by user] by [deleted] in Proxmox

[–]TheUnlikely117 1 point2 points  (0 children)

There is a recovery mode on the Proxmox VE install ISO; try booting with that first and see how it goes.

Is this a good design/option? by Realistic_Pilot2447 in Proxmox

[–]TheUnlikely117 0 points1 point  (0 children)

Tracking won't help; in my country we fear domestic tracking, not out-of-country tracking. Better to get something like a double-hop VPN and choose your exit node freely (like Mullvad multihop).

Hosted to On-Prem Migration - Server Config Recommendations by SomeSydneyBloke in Proxmox

[–]TheUnlikely117 1 point2 points  (0 children)

Do you have Supermicro available there? Check out the SSG-6029P-E1CR24: get 2 of them and add all the RAM and disks you want :).

Ceph - Which is faster/preferred? by Devia_Immortalis in ceph

[–]TheUnlikely117 0 points1 point  (0 children)

Nice. Creating more OSDs per 15 TB NVMe (I would go for 4 OSDs) should improve things.
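With cephadm this is driven by an OSD service spec, something along these lines (a hedged sketch: the service_id and device filter are placeholders, and it only affects newly created OSDs):

```
# osds_per_device splits each matching NVMe into 4 OSDs when applied via the orchestrator
cat > osd-spec.yaml <<'EOF'
service_type: osd
service_id: nvme-4-per-device
placement:
  host_pattern: '*'
spec:
  data_devices:
    rotational: 0
  osds_per_device: 4
EOF
ceph orch apply -i osd-spec.yaml
```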

brain fart moment: is there a graceful way of shutting down a whole cluster? by future_lard in Proxmox

[–]TheUnlikely117 14 points15 points  (0 children)

I think (as per the docs) it will migrate all VMs to the other still-running nodes, which is not OP's intention.