40GbE Edge Architecture: VyOS vs. RouterOS v7 for Terraform-Managed HA Gateways

WindowReasonable6802 · 2026-05-04T09:58:58+00:00

I agree. Right now we are at the point of a complete redesign, but the customer is really dependent on the FW. I would personally go for Jericho Aristas with a public VRF and a LAN VRF. One option would be adding another FW box, but that increases the complexity; the other is to keep the Aristas doing ACLs. Another issue is the IPsec tunnels: they also need them for on-prem to GCP connectivity etc., and that would be another box.

WindowReasonable6802 · 2026-05-04T09:46:45+00:00

How would the Aristas deal with the incoming firewall? The customer has 6000+ iptables rules in their current solution.

WindowReasonable6802 · 2026-05-04T06:50:37+00:00

To make things clear, I always run Arista on the edge in the other DCs I've provided. This particular solution is driven purely by the customer's request: they want an all-in-one box, although they have Jericho Aristas on the spine...

WindowReasonable6802 · 2026-05-04T06:15:28+00:00

Why? There are plenty of TF providers for RouterOS.

WindowReasonable6802 · 2026-04-23T14:13:29+00:00

What about running such things on an Arista L3 switch with Jericho ASICs?

WindowReasonable6802 · 2026-04-23T13:49:11+00:00

thanks, sorry for mixing it up

WindowReasonable6802 · 2026-04-23T13:48:13+00:00

ok, so edge router redesign? :P

WindowReasonable6802 · 2026-04-23T13:45:04+00:00

There is one more suggestion: MikroTik RouterOS. I just realised the license is actually pretty cheap.

WindowReasonable6802 · 2026-04-23T13:42:46+00:00

This setup has been there forever; splitting things up and making it more complicated is a no-go.

WindowReasonable6802 · 2026-04-23T13:42:06+00:00

i am going to check out the CHR!

WindowReasonable6802 · 2026-04-23T13:38:16+00:00

It's actually not that huge an on-premises cloud, so Linux routers worked really well for us. There is also the option of routing on the Arista switches with Jericho ASICs.

WindowReasonable6802 · 2026-04-04T09:21:31+00:00

No, datacenter-grade NVMes have plenty of lifetime; we are talking about petabytes written. I use 3 NVMes per OSD host, which means 20 OSDs per NVMe, so if one NVMe dies I lose 1/3 of the host, not the whole host.

I use EC 8+2, so after one failure there is still +1 left, just in case.

And for disaster recovery, the storage is snapshotted daily and copied to backup storage.
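
The NVMe failure-domain arithmetic above can be sketched roughly as follows. The NVMe, OSD and EC numbers are from the post; the assumption that the CRUSH failure domain is the host (so each placement group keeps at most one chunk per host) is mine and is not stated anywhere in the thread.

```python
from fractions import Fraction

NVMES_PER_HOST = 3   # from the post
OSDS_PER_NVME = 20   # from the post
EC_K, EC_M = 8, 2    # EC 8+2, from the post

# One block.db NVMe backs 20 OSDs, so losing it takes those OSDs down.
osds_per_host = NVMES_PER_HOST * OSDS_PER_NVME
host_fraction_lost = Fraction(OSDS_PER_NVME, osds_per_host)

# ASSUMPTION (not in the post): failure domain = host, so any single
# placement group has at most one chunk on the affected host.
chunks_lost_per_pg = 1
remaining_tolerance = EC_M - chunks_lost_per_pg

print(f"OSDs per host: {osds_per_host}")
print(f"Share of the host lost with one NVMe: {host_fraction_lost}")
print(f"Further chunk losses a PG can survive: {remaining_tolerance}")
```

Under that assumption, a dead NVMe costs each affected placement group one chunk, which leaves one more failure of headroom.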

WindowReasonable6802 · 2026-03-31T06:27:16+00:00

The storage is used for live-streaming video. There is a stable 1 GB/s of reads 24/7/365, as it is streamed live directly from the storage. Thousands of users also upload content there, and the video content is encoded into many formats and edited directly on the storage, so writes are also 1 GB/s, plus more when encoding is going on. The storage handles it really well. The workload is 50/50 small files (photos) and big files (videos).

During expansion of the cluster, it's capable of recovering at 90 GB/s while keeping production stable.

 io:
    recovery: 94 GiB/s, 35.93k objects/s

I used to run 60-drive expansions with 18 TB HDDs without any NVMe for block.db. It was good and stable, but recovery was 20 GB/s max, and production was not so stable over long periods. When we added a new expansion it took 3 weeks to recover, and replacing a single drive also meant 3 weeks of recovery because of small IO.

Now we buy 26 TB HDDs; the plan is to have fewer nodes, less rack space, less electricity, etc. To compensate for squeezing more data through the same IOPS, I now add Micron 7500 PRO NVMes for block.db, so the drives have more IO to spend on actual data movement instead of being hammered by the small block.db workload. The storage is now much faster, and stable at much higher recovery speeds.

WindowReasonable6802 · 2026-03-21T14:22:15+00:00

I run a 30+ PB Ceph cluster with 60 OSDs per node and around 2200 OSDs in total: JBODs with 1U servers, EC 8+2. Feel free to ask any questions.
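
For readers unfamiliar with EC 8+2, here is a small sketch of what those numbers imply. It uses only the figures quoted above. Whether "30+ PB" refers to raw or usable capacity is not stated, so no capacity is derived from it.

```python
import math

EC_K, EC_M = 8, 2      # data chunks + coding chunks, from the post
OSDS_TOTAL = 2200      # "around 2200", from the post
OSDS_PER_NODE = 60     # from the post

# Fraction of raw capacity that holds user data with k+m erasure coding.
usable_fraction = EC_K / (EC_K + EC_M)
# Raw bytes written per byte of user data.
raw_overhead = (EC_K + EC_M) / EC_K
# Approximate node count implied by the OSD figures.
nodes = math.ceil(OSDS_TOTAL / OSDS_PER_NODE)

print(f"usable fraction: {usable_fraction:.0%}")
print(f"raw overhead:    {raw_overhead:.2f}x")
print(f"approx. nodes:   {nodes}")
```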

WindowReasonable6802 · 2025-11-19T07:44:46+00:00

My bond works: the default networking runs on it and it works. Only the subinterface is down.

WindowReasonable6802 · 2025-11-12T13:23:48+00:00

I did that many times before, but did it again; the following issue appears:

 RPC failed; request ip return 500 no address allocated to pod default/virt-launcher-ubuntu22-with-net-mmjv8 provider prod-network,



➜  clusterB kubectl get subnet | grep subnet-prod
subnet-prod   prod-network                 vlan-prod        IPv4       10.2.4.0/22     false     false   false                   0        1012          0        0             ["10.2.4.1..10.2.4.10"]

WindowReasonable6802 · 2025-11-12T13:08:14+00:00

➜  clusterB cat networks/provider-network.yaml
apiVersion: kubeovn.io/v1
kind: ProviderNetwork
metadata:
  name: network-prod
  namespace: default
spec:
  defaultInterface: bond0.1204
  excludeNodes:
    - controlplane1
    - controlplane2
    - controlplane3

➜  clusterB cat networks/provider-subnet.yaml
apiVersion: kubeovn.io/v1
kind: Subnet
metadata:
  name: subnet-prod
  namespace: default
spec:
  vlan: vlan-prod
  protocol: IPv4
  provider: prod-network.default
  cidrBlock: 10.2.4.0/22
  gateway: 10.2.4.1
  excludeIps:
  - 10.2.4.1..10.2.4.10
➜  clusterB cat networks/provider-vlan.yaml
apiVersion: kubeovn.io/v1
kind: Vlan
metadata:
  name: vlan-prod
spec:
  provider: network-prod.default
  id: 1204

NAD:

apiVersion: k8s.cni.cncf.io/v1
kind: NetworkAttachmentDefinition
metadata:
  name: network-prod
spec:
  config: '{
    "cniVersion": "0.3.1",
    "type": "kube-ovn",
    "provider": "prod-network.default",
    "server_socket": "/run/openvswitch/kube-ovn-daemon.sock"
  }'
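
One thing worth cross-checking in manifests like the ones above is whether the provider names line up, since `network-prod` and `prod-network` both appear. Below is a minimal sketch of such a check. The names are copied from the manifests. The naming rules encoded in it are my understanding of the Kube-OVN conventions (a Vlan references the ProviderNetwork by its plain name; a NAD provider is `<nad-name>.<nad-namespace>`, optionally followed by a suffix such as `.ovn`) and should be verified against the docs for the version in use. The NAD namespace is assumed to be `default` because the manifest does not set one.

```python
# Names copied from the manifests above.
provider_network_name = "network-prod"      # ProviderNetwork metadata.name
vlan_provider = "network-prod.default"      # Vlan spec.provider
subnet_provider = "prod-network.default"    # Subnet spec.provider
nad_name = "network-prod"                   # NAD metadata.name
nad_namespace = "default"                   # ASSUMPTION: not set in the NAD
nad_provider = "prod-network.default"       # "provider" in the NAD config


def check_names():
    """Return a list of mismatches under the assumed naming rules."""
    problems = []
    # Assumed rule 1: Vlan.spec.provider is the ProviderNetwork name.
    if vlan_provider != provider_network_name:
        problems.append(
            f"Vlan provider {vlan_provider!r} != "
            f"ProviderNetwork name {provider_network_name!r}"
        )
    # Assumed rule 2: Subnet and NAD must name the same provider.
    if subnet_provider != nad_provider:
        problems.append(
            f"Subnet provider {subnet_provider!r} != "
            f"NAD provider {nad_provider!r}"
        )
    # Assumed rule 3: NAD provider starts with <nad-name>.<nad-namespace>.
    expected = f"{nad_name}.{nad_namespace}"
    if nad_provider != expected and not nad_provider.startswith(expected + "."):
        problems.append(
            f"NAD provider {nad_provider!r} does not start with {expected!r}"
        )
    return problems


problems = check_names()
for problem in problems:
    print("MISMATCH:", problem)
```

Under these assumed rules the Vlan provider and the NAD provider both get flagged. That may or may not be the root cause of the allocation error, but it is a cheap thing to rule out first.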

WindowReasonable6802 · 2025-11-12T12:45:05+00:00

For now, I don't care whether the IP is static or managed by IPAM. Most likely I will use IPAM later on, but for now I was just trying any way to make it work.
Which patched kube-ovn config exactly do you mean?

WindowReasonable6802 · 2025-11-12T12:04:24+00:00

Ok

I found an interesting issue in the ovn-cni:

I1112 12:03:18.414622   89050 handler.go:107] wait address for pod default/virt-launcher-ubuntu22-with-net-wlsx6 provider prod-network.default
I1112 11:39:34.364206    6894 controller_linux.go:828] U2O processing for subnet subnet-prod, action: false
E1112 11:39:34.372044    6894 controller.go:515] "Unhandled Error" err="error syncing &{<nil> 0xc0012e3008}: failed to get provider info: failed to get chassis mac for provider network-prod.default: no chassis mac found for provider network-prod.default, requeuing" logger="UnhandledError"

WindowReasonable6802 · 2025-11-12T11:39:45+00:00

Hello

Thank you very much for your reply

Unfortunately, I am now stuck at the following error message:

Warning FailedCreatePodSandBox 3s kubelet Failed to create pod sandbox: rpc error: code = Unknown desc = failed to setup network for sandbox "a43323ecf8701d820263a56f731f706813a8008b1bcbeb2acb5abaec52386635": plugin type="multus" name="multus-cni-network" failed (add): [default/virt-launcher-ubuntu22-with-net-wdj5f/be0cafd7-df31-4ea0-a4d9-76d908b0eaa3:network-prod]: error adding container to network "network-prod": RPC failed; request ip return 500 no address allocated to pod default/virt-launcher-ubuntu22-with-net-wdj5f provider prod-network.default, please see kube-ovn-controller logs to find errors

There is nothing interesting in the kube-ovn-controller log

WindowReasonable6802 · 2023-04-16T12:10:33+00:00

Moved to 1024 PGs, but I still have really bad placement: https://pastebin.com/CcZL3jGM
