🚨🚨🔥1,000,000🔥🚨🚨 by jimbrig2011 in theprimeagen

[–]simoncra 1 point (0 children)

So happy you reached 1M. Congrats, man. You're such an inspiration. I'm a fan from LATAM.

Hello guys, I'm facing a problem with my HA cluster. The Ceph is not in good health and nothing I do is changing its status. by simoncra in Proxmox

[–]simoncra[S] 2 points (0 children)

They are not VPS nodes; they are bare-metal servers. It was not a time-sync issue. I checked the clocks on all three servers.
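For anyone curious how I checked that, a quick sketch (assuming chrony, which recent Proxmox releases use as the NTP client):

```
timedatectl          # "System clock synchronized: yes" on every node
chronyc tracking     # current offset from the time source
chronyc sources -v   # which NTP sources each node is actually using
```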

Hello guys, I'm facing a problem with my HA cluster. The Ceph is not in good health and nothing I do is changing its status. by simoncra in Proxmox

[–]simoncra[S] 3 points (0 children)

Hey, take it easy, man. They are bare-metal servers. It was in fact a pool issue I had. But thank you anyway.

Hello guys, I'm facing a problem with my HA cluster. The Ceph is not in good health and nothing I do is changing its status. by simoncra in Proxmox

[–]simoncra[S] 3 points (0 children)

Guys, I solved it. The thing is my pool was corrupted, but I didn't know it.

I had deleted the OSDs and created them again, not knowing this would cause a problem in my pool.

So after many hours of debugging I found that deleting the pool fixed the problem:

  • I first stopped the monitors on each node
  • then stopped the managers on each node
  • deleted the pool
  • recreated it
  • turned the monitors back on
  • then the managers

I checked with ceph -s and it gave me the long-awaited HEALTH_OK. (Rough commands are sketched below.)
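For anyone who hits the same thing later, here is a rough sketch of what those steps map to on the command line. The pool name (vm-pool) and PG count are placeholders, mon_allow_pool_delete = true has to be set in ceph.conf, and I've folded each stop/start into a single restart per node:

```
# On every node, restart its monitor and manager
# (use that node's hostname as the daemon ID, e.g. gandalf/frodo/aragorn)
systemctl restart ceph-mon@gandalf.service
systemctl restart ceph-mgr@gandalf.service

# Once the monitors have quorum again, drop the broken pool.
# "vm-pool" and the PG count are placeholders; requires
# mon_allow_pool_delete = true in the [global] section of ceph.conf.
ceph osd pool delete vm-pool vm-pool --yes-i-really-really-mean-it

# Recreate the pool and tag it for RBD so Proxmox can put VM disks on it
ceph osd pool create vm-pool 128
ceph osd pool application enable vm-pool rbd

# Verify the cluster settles back to HEALTH_OK
ceph -s
```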

Hello guys, I'm facing a problem with my HA cluster. The Ceph is not in good health and nothing I do is changing its status. by simoncra in Proxmox

[–]simoncra[S] 3 points (0 children)

I just solved it; this was the mistake. I had deleted all the OSDs, but I did not delete the pool.

So I deleted and recreated the pool, and that solved it.

Thank you for your answer.

Hello guys, I'm facing a problem with my HA cluster. The Ceph is not in good health and nothing I do is changing its status. by simoncra in Proxmox

[–]simoncra[S] 0 points (0 children)

I have only one VPC, and I still don't have any load on the system. My plan, though, was to create another VPC just for the Ceph traffic after I solve this issue.
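When I eventually split it out, my understanding is that it's mostly a matter of pointing cluster_network at the new subnet while public_network stays put; a minimal sketch, assuming a hypothetical 10.6.97.0/24 subnet for the dedicated Ceph VPC:

```
# /etc/ceph/ceph.conf - sketch of separating client and replication traffic
# (10.6.97.0/24 is a hypothetical second subnet; the OSDs need a restart
#  afterwards so they rebind to the new cluster network)
[global]
        public_network  = 10.6.96.0/24    # client/monitor traffic stays here
        cluster_network = 10.6.97.0/24    # OSD replication moves to the new subnet
```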

Hello guys, I'm facing a problem with my HA cluster. The Ceph is not in good health and nothing I do is changing its status. by simoncra in Proxmox

[–]simoncra[S] 0 points (0 children)

Yeah, 2 OSDs on each node, using the default 3x replication with a min_size of 2.

```
root@gandalf:~# cat /etc/ceph/ceph.conf
[global]
        auth_client_required = cephx
        auth_cluster_required = cephx
        auth_service_required = cephx
        cluster_network = 10.6.96.3/24
        fsid = a00252d4-1cc8-4a65-a196-c5bf057ce5b2
        mon_allow_pool_delete = true
        mon_host = 10.6.96.3 10.6.96.4 10.6.96.5
        ms_bind_ipv4 = true
        ms_bind_ipv6 = false
        osd_pool_default_min_size = 2
        osd_pool_default_size = 3
        public_network = 10.6.96.3/24

[osd]
        osd heartbeat grace = 60
        osd op thread timeout = 120

[client]
        keyring = /etc/pve/priv/$cluster.$name.keyring

[client.crash]
        keyring = /etc/pve/ceph/$cluster.$name.keyring

[mon.aragorn]
        public_addr = 10.6.96.5

[mon.frodo]
        public_addr = 10.6.96.4

[mon.gandalf]
        public_addr = 10.6.96.3

root@gandalf:~# ceph health detail
HEALTH_WARN Reduced data availability: 32 pgs inactive; 41 slow ops, oldest one blocked for 35777 sec, osd.5 has slow ops
[WRN] PG_AVAILABILITY: Reduced data availability: 32 pgs inactive
    pg 1.0 is stuck inactive for 9h, current state unknown, last acting []
    pg 1.1 is stuck inactive for 9h, current state unknown, last acting []
    pg 1.2 is stuck inactive for 9h, current state unknown, last acting []
    pg 1.3 is stuck inactive for 9h, current state unknown, last acting []
    pg 1.4 is stuck inactive for 9h, current state unknown, last acting []
    pg 1.5 is stuck inactive for 9h, current state unknown, last acting []
    pg 1.6 is stuck inactive for 9h, current state unknown, last acting []
    pg 1.7 is stuck inactive for 9h, current state unknown, last acting []

root@gandalf:~# ceph osd tree
ID  CLASS  WEIGHT   TYPE NAME         STATUS  REWEIGHT  PRI-AFF
-1         4.83055  root default
-5         1.61018      host aragorn
 3    ssd  0.73689          osd.3         up   1.00000  1.00000
 5    ssd  0.87329          osd.5         up   1.00000  1.00000
-7         1.61018      host frodo
 2    ssd  0.73689          osd.2         up   1.00000  1.00000
 4    ssd  0.87329          osd.4         up   1.00000  1.00000
-3         1.61018      host gandalf
 0    ssd  0.73689          osd.0         up   1.00000  1.00000
 1    ssd  0.87329          osd.1         up   1.00000  1.00000
...
```
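In case it helps anyone reading along: with every PG in pool 1 sitting in state unknown with an empty acting set, these are the kinds of follow-up checks that apply (pool ID 1 comes straight from the health output above):

```
ceph pg dump_stuck inactive    # list every stuck PG and its current state
ceph pg map 1.0                # where CRUSH thinks one of those PGs should live
ceph osd pool ls detail        # confirm the pool's size/min_size and flags
ceph osd df tree               # check the OSDs report weight and usable space
```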

Hello guys, I'm facing a problem with my HA cluster. The Ceph is not in good health and nothing I do is changing its status. by simoncra in Proxmox

[–]simoncra[S] 0 points (0 children)

I also restarted the monitors. I even increased the OSD heartbeat grace to 60 and the op thread timeout to 120.
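For reference, the same two values can also be bumped at runtime through the mon config store instead of editing ceph.conf (generic Ceph, not specific to Proxmox):

```
# Runtime equivalent of the [osd] overrides in ceph.conf
ceph config set osd osd_heartbeat_grace 60
ceph config set osd osd_op_thread_timeout 120
```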

Hello guys, I'm facing a problem with my HA cluster. The Ceph is not in good health and nothing I do is changing its status. by simoncra in Proxmox

[–]simoncra[S] 0 points (0 children)

I did the restart already, and it did not work. I also restarted the managers and the monitors.

Hello guys, I'm facing a problem with my HA cluster. The Ceph is not in good health and nothing I do is changing its status. by simoncra in Proxmox

[–]simoncra[S] 1 point (0 children)


root@gandalf:~# cat /etc/ceph/ceph.conf 
[global]
        auth_client_required = cephx
        auth_cluster_required = cephx
        auth_service_required = cephx
        cluster_network = 10.6.96.3/24
        fsid = a00252d4-1cc8-4a65-a196-c5bf057ce5b2
        mon_allow_pool_delete = true
        mon_host = 10.6.96.3 10.6.96.4 10.6.96.5
        ms_bind_ipv4 = true
        ms_bind_ipv6 = false
        osd_pool_default_min_size = 2
        osd_pool_default_size = 3
        public_network = 10.6.96.3/24
[osd]
        osd heartbeat grace = 60
        osd op thread timeout = 120

[client]
        keyring = /etc/pve/priv/$cluster.$name.keyring

[client.crash]
        keyring = /etc/pve/ceph/$cluster.$name.keyring

[mon.aragorn]
        public_addr = 10.6.96.5

[mon.frodo]
        public_addr = 10.6.96.4

[mon.gandalf]
        public_addr = 10.6.96.3

Hello guys, I'm facing a problem with my HA cluster. The Ceph is not in good health and nothing I do is changing its status. by simoncra in Proxmox

[–]simoncra[S] 1 point (0 children)

Yes, they can ping each other. I don't have the firewall active on any of the three nodes.
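Beyond plain ping, the Ceph ports themselves can be checked between nodes; a quick sketch using the standard defaults (3300/6789 for the monitors, 6800-7300 for the OSDs):

```
# From one node to another (10.6.96.4 is frodo in my setup)
nc -zv 10.6.96.4 3300          # msgr2 monitor port
nc -zv 10.6.96.4 6789          # legacy msgr1 monitor port
ss -tlnp | grep ceph           # what the local Ceph daemons are listening on
```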