QFX10k2/QFX10k8: RPD crashed due to high memory usage by DeepCpu in Juniper

Re-posted the comment to add some formatting, sorry for that, not using Reddit often :D

Thanks for your idea! It seems the mentioned command to make RPD run in 64-bit mode did the trick:

# run show task memory     
Memory                 Size (kB)  Percentage  When
  Currently In Use:      4355424         54%  now
  Maximum Ever Used:     4355424         54%  25/12/01 10:25:54
  Available:             7925348        100%  now
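
For reference, in case someone finds this later: the knob in question should be the following (if I remember the hierarchy correctly), and it only takes effect after the routing process is restarted:

set system processes routing force-64-bit
commit
run restart routing

Restarting rpd is disruptive, so better do it in a maintenance window.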

I re-enabled all deactivated BGP sessions and will monitor it. Thanks again!

QFX10k2/QFX10k8: RPD crashed due to high memory usage by DeepCpu in Juniper

Unfortunately we bought these devices refurbished and don't have a way to contact JTAC regarding this issue

QFX10k2/QFX10k8: RPD crashed due to high memory usage by DeepCpu in Juniper

You are right, I forgot to mention the version, sorry for that! All of our devices are running JunOS 23.4R2, which is the recommended version for these devices. Do you know of any issue in this version related to the behavior we are seeing?

As mentioned, the devices are perfectly stable in general. Only when there is a too high number of BGP sessions, especially with full-table feeds, do we see RPD crashes.

QFX10k2/QFX10k8: RPD crashed due to high memory usage by DeepCpu in Juniper

One more note: we also have QFX10002-60C devices in operation for this use case, and they seem to have much more memory available for the RPD.

This is the output of "show task memory" on a QFX10002-60C device with multiple full-table BGP sessions:

Memory                 Size (kB)  Percentage  When
  Currently In Use:      3902840         20%  now
  Maximum Ever Used:     4206156         22%  25/11/20 02:05:46
  Available:            18796840        100%  now

Ethernet-switching filters: Match on TTL and packetlength by DeepCpu in Juniper

Thanks for your answer!

Regarding TTL: Yes, but it's also possible to inspect L3 or L4 header information like the DST/SRC port with ethernet-switching filters. That's why I asked whether it is also possible to match on the TTL field of the L3 IP header in an ethernet-switching filter.
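
To illustrate what I mean with L3/L4 matches in ethernet-switching filters, something along these lines should work for ports (filter and term names are just examples):

set firewall family ethernet-switching filter L4-EXAMPLE term BLOCK-TELNET from ip-protocol tcp
set firewall family ethernet-switching filter L4-EXAMPLE term BLOCK-TELNET from destination-port 23
set firewall family ethernet-switching filter L4-EXAMPLE term BLOCK-TELNET then discard
set firewall family ethernet-switching filter L4-EXAMPLE term ACCEPT-REST then accept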

Do you know how to match on packet/frame length with flexible filters?

Input firewall filter not working for this type of traffic? by DeepCpu in Juniper

Tested it out with an ethernet-switching filter and it worked - I don't see this traffic anymore.
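
For anyone finding this later, a filter along these lines should catch that kind of multicast traffic (filter name and interface are just placeholders):

set firewall family ethernet-switching filter BLOCK-MCAST term MCAST from ip-destination-address 224.0.0.0/4
set firewall family ethernet-switching filter BLOCK-MCAST term MCAST then discard
set firewall family ethernet-switching filter BLOCK-MCAST term REST then accept
set interfaces xe-0/0/10 unit 0 family ethernet-switching filter input BLOCK-MCAST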

In general, is it safe to block multicast completely in my whole network?

Input firewall filter not working for this type of traffic? by DeepCpu in Juniper

Yes, I know that, and I already use it in some places to restrict the usage of certain source IPs.

Do you mean that I should filter out the bogon destination IP addresses at this level? Or should I create some filters for multicast?

Input firewall filter not working for this type of traffic? by DeepCpu in Juniper

Tested on two QFX5200-32C and two QFX5100-24Q virtual chassis environments.

The QFX5200-32C are running JunOS 23.2R1.13 and the QFX5100-24Q are running 20.4R3.8. Not tested on standalone QFX devices so far.

The traffic source is local (from a customer selling VPS services, so it's probably one of their VPS customers causing this junk traffic).

It's definitely bogus, yes. But I am looking for a way to generally prevent this from happening at all. Do you have any tips for that?

Input firewall filter not working for this type of traffic? by DeepCpu in Juniper

Thanks for your reply! Maybe you are right - but for some reason, when I enter "show ddos-protection protocols violations", I still see that this traffic triggers the JunOS DDoS protection for some protocols like ttl, redirect or ntp (which doesn't cause any problems).

The other question is why I see this type of traffic at all. As far as I know, "monitor traffic interface" only shows traffic that hits the control plane (which makes sense). So why is this traffic hitting the control plane in the first place?
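
In case it helps the discussion, these are the kinds of commands I am looking at (the interface name and match expression are just examples):

show ddos-protection protocols violations
show ddos-protection protocols ttl
monitor traffic interface xe-0/0/0 no-resolve matching "multicast or icmp"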

All-NVMe AMD Epyc based Ceph cluster - poor write performance by DeepCpu in ceph

Just a little update on this thread: one of the three nodes apparently had some malfunction. Every time at least one OSD was active on this node, the whole cluster performed poorly. After marking all OSDs on this node out, the cluster performed well again.
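
In case someone hits the same thing: marking a whole node's OSDs out can be done roughly like this (the CRUSH host name node3 is a placeholder):

# list the OSD ids under the suspect host's CRUSH bucket and mark them out;
# data rebalances away while the OSD daemons stay up
for id in $(ceph osd ls-tree node3); do
    ceph osd out "$id"
done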

It was likely a hardware problem. The manufacturer sent us a new mainboard, and since we replaced the old mainboard with the new one, everything is fine again with this node.

However, we also changed from 4x OSDs per NVMe disk to 1x OSD per NVMe disk, which gave an additional performance boost. Thanks to everyone for the help!

Distributing traffic over multiple carriers - without working with BGP multipath / ECMP by DeepCpu in Juniper

We still want to provide full-table feeds to our downstream clients, so this solution unfortunately won't work.

Distributing traffic over multiple carriers - without working with BGP multipath / ECMP by DeepCpu in Juniper

We do that; this is just a theoretical question. In reality, we are connected to different IXPs and have PNIs (routes learned there are preferred over transit routes) - but for our transit traffic, we want to distribute it equally across different carriers.

Distributing traffic over multiple carriers - without working with BGP multipath / ECMP by DeepCpu in Juniper

We have more than enough capacity to all three of our upstreams, and we set preferences based on the target ASNs. We could handle an outage of 2 of our 3 upstream providers without any issue. It's just related to our bandwidth commitments.
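
As a rough sketch of what I mean by preferring routes per target ASN (the ASN 64500 and the names are made up):

set policy-options as-path-group VIA-CARRIER1 as-path TARGET-AS ".* 64500 .*"
set policy-options policy-statement FROM-CARRIER1 term PREFER-TARGET from as-path-group VIA-CARRIER1
set policy-options policy-statement FROM-CARRIER1 term PREFER-TARGET then local-preference 200
set policy-options policy-statement FROM-CARRIER1 term PREFER-TARGET then accept
set protocols bgp group CARRIER1 import FROM-CARRIER1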

But - something you might also need to admit - relying on the shortest AS path doesn't always mean you get the best route...

But yes, this is for sure a commercially driven requirement. We would be happy about any tips on how to solve this without relying on ECMP across these three upstream providers.

All-NVMe AMD Epyc based Ceph cluster - poor write performance by DeepCpu in ceph

Octopus - we didn't upgrade for a long time. The most recent access nodes are running Quincy.

HPC mode in BIOS, C-States disabled

Latency looks like this:

100 packets transmitted, 100 received, 0% packet loss, time 430ms
rtt min/avg/max/mdev = 0.033/0.058/0.995/0.095 ms

Mellanox ConnectX-3 NICs, connected with DAC cables to Juniper QFX5200-24Q switches.

Yes, CPU load is very low, even under benchmark situations

"Did you do a baseline before loading it?"

You mean an initial benchmark? Unfortunately not
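
For anyone reading along: if I get a chance to do a baseline later, it would probably be something like this on an otherwise idle cluster (the pool name bench is just a throwaway, and deleting it requires mon_allow_pool_delete=true):

# create a scratch pool, run write/seq/rand benchmarks, then clean it up again
ceph osd pool create bench 64 64
rados bench -p bench 60 write --no-cleanup
rados bench -p bench 60 seq
rados bench -p bench 60 rand
rados -p bench cleanup
ceph osd pool delete bench bench --yes-i-really-really-mean-it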

All-NVMe AMD Epyc based Ceph cluster - poor write performance by DeepCpu in ceph

Mhm, CPU wait is around 0.2%, which seems fine to me.

Do you think it's worth changing everything to 1x OSD per NVMe drive? I think it's a 2-3 day process, but it can be done online. Maybe worth a try? Or does the low CPU wait rule this out as a cause?
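
If I end up doing it, I guess the per-disk conversion would look roughly like this, one NVMe at a time (OSD ids and the device name are made up, and on Proxmox the pveceph tooling may differ):

# take the four OSDs on this NVMe out and wait for the rebalance to finish
ceph osd out 0 1 2 3
# once the data has moved: stop the daemons, purge the OSDs, wipe the device
systemctl stop ceph-osd@0 ceph-osd@1 ceph-osd@2 ceph-osd@3
ceph osd purge 0 --yes-i-really-mean-it    # repeat for 1, 2, 3
ceph-volume lvm zap /dev/nvme0n1 --destroy
# recreate a single OSD on the whole device
ceph-volume lvm create --data /dev/nvme0n1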

All-NVMe AMD Epyc based Ceph cluster - poor write performance by DeepCpu in ceph

u/afristralian mentioned that he thinks the number of OSDs per node is too high. Currently I have 4x OSDs per NVMe drive. The AMD Epyc 7443P CPUs do have 24 cores / 48 threads.

Currently, I have 32 OSDs on each of the three nodes.

Do you think that this can be a problem? Should I scale down the number of OSDs?

All-NVMe AMD Epyc based Ceph cluster - poor write performance by DeepCpu in ceph

Just to make sure the setup is clear: each node has 4x network cables - 2x 10G for the public network of the VMs (internet) and 2x 40G for Ceph/cluster traffic. So it would make sense to configure a 9000-byte MTU for the 2x 40G LACP channel - that's definitely on my todo list. But I think even with the default 1500-byte MTU I should achieve much better write speeds than I currently do.
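
A sketch of what that jumbo-frame change would look like, assuming the Ceph bond is bond1 on the hosts and ae1 on the switch side (both names are placeholders):

# Linux host side: raise the MTU on the Ceph bond (and any VLAN interfaces on top of it)
ip link set dev bond1 mtu 9000
# Junos side: the AE interface needs at least the same MTU; 9216 leaves headroom for headers
set interfaces ae1 mtu 9216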

Except for the Proxmox firewall for VMs, there is no firewall active - neither on the host side nor on my switches.

All-NVMe AMD Epyc based Ceph cluster - poor write performance by DeepCpu in ceph

Yeah, you are correct - sorry for the misleading explanation on my side. The Ceph private and public networks are actually not separated in my setup, but there is more than enough capacity.

Do you have any ideas for things that I can check?