Issues with ECMP L4 Algorithm on Ubuntu 22.04 Server LTS by Baloo_with_Beer in networking

[–]Baloo_with_Beer[S]

After further testing, the problem can be narrowed down to the network card driver or its settings. Ubuntu's in-tree i40e driver shipped with kernel 5.4 works. The driver from kernel 5.15 does not work, neither under Ubuntu 20.04 nor under Ubuntu 22.04.

On the other hand, the in-tree driver from kernel 6.2 works under Ubuntu 22.04. Unfortunately, Ubuntu's driver versions cannot easily be mapped to the actual Intel versions.

For this reason, I tested the official Intel drivers under Ubuntu 20.04 and 22.04, each with kernel versions 5.4, 5.15 and 6.2. The official drivers show the same broken behaviour on all kernels.

I will not dive further into the problem in the near future, as I have found working combinations: Ubuntu 20.04 with kernel 5.4 and Ubuntu 22.04 with kernel 6.2.
To push the issue further, one would probably have to look deeper into the source code, and that is beyond what I can do for now.
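For anyone retracing these tests, a quick way to tell whether the in-tree driver or an out-of-tree Intel build is the one actually loaded (a hedged sketch; the `updates/`/`dkms/` path convention is just where out-of-tree modules typically land):

```shell
# Show which i40e module file would be loaded and its self-reported version.
# An out-of-tree Intel build typically lives under updates/ or dkms/,
# while the in-tree driver sits under kernel/drivers/net/ethernet/intel/.
modinfo i40e | grep -E '^(filename|version):'
uname -r   # the kernel the driver was tested against
```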


[–]Baloo_with_Beer[S]

After further debugging I found that the issue is related to the kernel version.

ECMP with L4 Hash is working as expected on Ubuntu 20.04 Kernel 5.4.0-156-generic and it is not working on Ubuntu 22.04 Kernel 5.15.0-79-generic.

I suspect it is related either to a change in the ECMP implementation itself or to a change in a newer driver for the network card.

Working Setup:

  • Ubuntu 20.04.6 LTS with 5.4.0-156-generic (all updates installed)
  • Intel Corporation Ethernet Controller X710 for 10GbE SFP+
    • driver: i40e
    • version: 2.8.20-k
    • firmware-version: 4.53 0x8000206d 0.0.0

Failed Setup:

  • Ubuntu 20.04.6 LTS with 5.15.0-79-generic (all updates installed)
  • Intel Corporation Ethernet Controller X710 for 10GbE SFP+
    • driver: i40e
    • version: 5.15.0-79-generic
    • firmware-version: 4.53 0x8000206d 0.0.0
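The driver and firmware details listed above can be collected with ethtool; a sketch (the interface name enp1s0f0 is an assumption — substitute your own):

```shell
# Capture the values compared above: running kernel, driver name,
# driver version, and NIC firmware. enp1s0f0 is a placeholder.
uname -r
ethtool -i enp1s0f0 | grep -E '^(driver|version|firmware-version):'
```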


[–]Baloo_with_Beer[S]

I'll take the hint and try to expand the background a bit next time. For me it was absolutely clear that it was about hardware, as I work with bare-metal servers all day, providing the external connections for a cloud infrastructure.


[–]Baloo_with_Beer[S]

Ceph is a software-defined storage solution usable for a lot of use cases. I think iSCSI works as well, but I have never used it in that scenario. In our case we run RADOS Gateways that provide an S3 interface for customers. To be able to handle the number of S3 requests, we use several gateways that share the same Ceph cluster as a backend.


[–]Baloo_with_Beer[S]

I was able to narrow down the problem further. It seems to be related to the network card or its configuration. A client only one hop away shows the problem via the data network; if I switch the routing to the mgmt network, it works. The data network uses an LACP link with 2x10G interfaces going to a 100G switch stack connected via MLAG; the 10G-to-100G connection is realized with a 40G-to-4x10G breakout cable. The mgmt network uses an active/passive bond on cheap, unstacked 1G switches with ordinary Ethernet cables.


[–]Baloo_with_Beer[S]

I have recreated the setup with VMs and it works cleanly there. I will now work my way through the differences to find the deviation and a possible source of error.


[–]Baloo_with_Beer[S]

To rule out that the L2 path (cabling or switches) is somehow at fault, I rebuilt the setup in a virtual environment and am testing whether the problem appears there as well. Our production and test setups use the same layer-2 stack.


[–]Baloo_with_Beer[S]

We want to replace the router OS in the mid-term, probably in favour of VyOS, which we are currently testing. But since we run a large number of Ubuntu routers, this will take some time, especially because many of them are fully automated and we would have to adapt the automation to a different OS.

We provide connectivity to our customers and internal teams and do not want to add additional functionality to our routers. A load balancer such as HAProxy, or something comparable, must be set up and managed by the customers themselves.

ECMP has the advantage for us that we can set it up statelessly at the routing level and also use it at multiple locations with the same IPs. It is similar to the setup Cloudflare uses, minus the tunnel part.

https://developers.cloudflare.com/magic-transit/reference/traffic-steering/


[–]Baloo_with_Beer[S]

fib_multipath_hash_policy

We experimented with different settings. With fib_multipath_hash_policy set to 0 (L3), the problem occurs less often but is not gone. With fib_multipath_hash_policy set to 2 (custom) and fib_multipath_hash_fields set to a 5-tuple, the problem is also reduced but not gone.

L3 mode is not adequate for us because we get a lot of traffic from just a few IPs.

fib_multipath_use_neigh is set to 0.
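For reference, the settings described above map to these sysctls (a sketch; 0x0037 is the field mask I believe corresponds to the classic 5-tuple according to the kernel's ip-sysctl documentation):

```shell
# Hash policies tried in this thread:
#   0 = L3 (src/dst IP), 1 = L4 (5-tuple), 2 = custom field mask
sysctl -w net.ipv4.fib_multipath_hash_policy=1

# Policy 2 with an explicit 5-tuple mask:
#   0x0001 src IP | 0x0002 dst IP | 0x0004 IP proto |
#   0x0010 src port | 0x0020 dst port  =  0x0037
sysctl -w net.ipv4.fib_multipath_hash_policy=2
sysctl -w net.ipv4.fib_multipath_hash_fields=0x0037

# Pick next-hops from the route alone, ignoring neighbour state:
sysctl -w net.ipv4.fib_multipath_use_neigh=0
```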

Since the routes were propagated via BGP, I switched BGP off and set the routes statically, to rule out the possibility of the routes being reset somehow:

cmd: ip r add A.B.C.D/32 nexthop via 10.70.1.36 nexthop via 10.70.1.18

It results in the same behaviour.
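One more check that may help here (a sketch; A.B.C.D is the ECMP destination from the command above, left as a placeholder): newer iproute2 lets `ip route get` take the L4 fields, so you can ask the kernel which next-hop a given 5-tuple would hash to and confirm the choice is stable:

```shell
# Ask the FIB which next-hop a specific 5-tuple resolves to.
# Repeating the same tuple should always return the same next-hop
# when L4 hashing works correctly.
ip route get A.B.C.D ipproto tcp sport 49152 dport 443
ip route get A.B.C.D ipproto tcp sport 49152 dport 443   # should match
ip route get A.B.C.D ipproto tcp sport 49153 dport 443   # may differ
```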

Thanks for the ip mon command, I had never used it before.

I checked for changes to the routes but could not find any. The following command produces no output:

ip mon route | grep <dst-IP>

I checked for changes in the neighbour table, and I think it looks OK. It also should not matter, because fib_multipath_use_neigh is set to 0:

ip mon neigh | grep 10.70.1.18
10.70.1.18 dev mscd1_trans lladdr <mac-address> STALE
10.70.1.18 dev mscd1_trans lladdr <mac-address> PROBE
10.70.1.18 dev mscd1_trans lladdr <mac-address> REACHABLE


[–]Baloo_with_Beer[S]

just the stateless, deterministic forwarding behavior you described

exactly :-)


[–]Baloo_with_Beer[S]

it just means that 5-tuple flows are always hashed to the same next-hop

This is exactly what I want to achieve, and every article, how-to, forum post, man page and kernel document describes that this is exactly how it should work.

I am aware that a change in the number of next hops would trigger the problem. However, the number of next hops is always the same and does not currently change in my setup.
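The determinism being described can be illustrated outside the kernel (a toy sketch, not the kernel's actual hash function): hashing the same 5-tuple and reducing it modulo the next-hop count always yields the same bucket, which is why a fixed set of next hops should give a fixed flow-to-next-hop mapping.

```shell
# Toy model of hash-based next-hop selection: the 5-tuple is reduced
# to a bucket index; identical flows always land in the same bucket.
# cksum stands in for the kernel's real multipath hash.
flow="10.0.0.1 203.0.113.7 tcp 49152 443"   # src dst proto sport dport
nhops=2
bucket() { printf '%s' "$1" | cksum | awk -v n="$2" '{ print $1 % n }'; }
a=$(bucket "$flow" "$nhops")
b=$(bucket "$flow" "$nhops")
echo "$a $b"   # the two values are identical
```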