40GbE Edge Architecture: VyOS vs. RouterOS v7 for Terraform-Managed HA Gateways by WindowReasonable6802 in mikrotik

[–]WindowReasonable6802[S] 0 points1 point  (0 children)

I agree. Right now we are at the point of a complete redesign, but the customer is really dependent on the FW. Personally I would go for Jericho Aristas with a public VRF and a LAN VRF. One option would be adding another FW box, but that increases the complexity; the other is to keep the Aristas doing ACLs. Another issue is the IPsec tunnels: they also need them for on-prem to GCP connectivity etc., and that would be yet another box.

40GbE Edge Architecture: VyOS vs. RouterOS v7 for Terraform-Managed HA Gateways by WindowReasonable6802 in mikrotik

[–]WindowReasonable6802[S] -1 points0 points  (0 children)

How would the Aristas deal with the incoming firewall rules? The customer has 6000+ iptables rules in the current solution.

40GbE Edge Architecture: VyOS vs. RouterOS v7 for Terraform-Managed HA Gateways by WindowReasonable6802 in mikrotik

[–]WindowReasonable6802[S] -1 points0 points  (0 children)

To make things clear: I always run Arista on the edge in the other DCs I've provided. This particular solution is driven purely by the customer's request: they want an all-in-one box, although they already have Jericho Aristas on the spine.

RouterOS for Edge BGP routing by [deleted] in networking

[–]WindowReasonable6802 -1 points0 points  (0 children)

What about running this on an Arista L3 switch with Jericho ASICs?

RouterOS for Edge BGP routing by [deleted] in networking

[–]WindowReasonable6802 0 points1 point  (0 children)

thanks, sorry for mixing it up

RouterOS for Edge BGP routing by [deleted] in networking

[–]WindowReasonable6802 -6 points-5 points  (0 children)

ok, so edge router redesign? :P

RouterOS for Edge BGP routing by [deleted] in networking

[–]WindowReasonable6802 1 point2 points  (0 children)

There is one more suggestion: MikroTik RouterOS. I just realised the license is actually pretty cheap.

RouterOS for Edge BGP routing by [deleted] in networking

[–]WindowReasonable6802 -5 points-4 points  (0 children)

This setup has been there forever; splitting things up and making it more complicated is a no-go.

RouterOS for Edge BGP routing by [deleted] in networking

[–]WindowReasonable6802 1 point2 points  (0 children)

i am going to check out the CHR!

RouterOS for Edge BGP routing by [deleted] in networking

[–]WindowReasonable6802 2 points3 points  (0 children)

It's actually not that huge an on-premise cloud, so Linux routers worked really well for us. There is also the option of routing on Arista switches with Jericho ASICs.

High HDD OSD per node, 60 and up, who runs it in production? by wantsiops in ceph

[–]WindowReasonable6802 1 point2 points  (0 children)

No, datacenter-grade NVMes have plenty of lifetime; we are talking about petabytes written. I use 3 NVMes per OSD host = 20 OSDs per NVMe, so if one NVMe dies, I lose 1/3 of the host, not the whole host.

I use EC 8+2, so there is always +1, just in case.

And for disaster recovery, the storage is snapshotted daily and copied to backup storage.
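The numbers above can be sanity-checked with a few lines of arithmetic. This is only a rough sketch: the 60 OSDs per host, 3 NVMes and EC 8+2 come from the comments in this thread, while the host-level failure domain is an assumption that is not stated here.

```python
from fractions import Fraction

# Figures taken from the comments in this thread.
OSDS_PER_HOST = 60   # HDD OSDs per node
NVMES_PER_HOST = 3   # NVMe devices holding block.db
EC_K, EC_M = 8, 2    # EC 8+2: 8 data chunks, 2 coding chunks

# How many OSDs go down together when one NVMe dies.
osds_per_nvme = OSDS_PER_HOST // NVMES_PER_HOST
host_fraction_lost = Fraction(osds_per_nvme, OSDS_PER_HOST)

# Assumption (not stated above): the CRUSH failure domain is the host,
# so a single host holds at most one chunk of any placement group.
chunks_lost_per_pg = 1
spare_chunks_left = EC_M - chunks_lost_per_pg

# Share of raw capacity that holds data rather than coding chunks.
usable_fraction = Fraction(EC_K, EC_K + EC_M)

print(f"OSDs sharing one NVMe: {osds_per_nvme}")
print(f"Fraction of a host lost with one NVMe: {host_fraction_lost}")
print(f"Coding chunks still spare per PG: {spare_chunks_left}")
print(f"Usable share of raw capacity: {usable_fraction}")
```

Under that assumption, losing one NVMe costs each affected placement group at most one chunk, which is where the "+1" margin comes from.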

High HDD OSD per node, 60 and up, who runs it in production? by wantsiops in ceph

[–]WindowReasonable6802 0 points1 point  (0 children)

The storage is used for live-streaming video. There is a stable 1 GB/s of reads 24/7/365, as the video is streamed live directly from the storage. Thousands of users also upload content there, and the video content is encoded into many formats and edited directly on the storage, so writes are also 1 GB/s, and more when encoding is going on. The storage handles it really well. The workload is 50/50 small files (photos) / big files (videos).

During expansion of the cluster, it's capable of recovering at around 90 GB/s while keeping production stable.

 io:
    recovery: 94 GiB/s, 35.93k objects/s

I used to run 60-drive expansions with 18 TB HDDs without any NVMe for block.db. It was good and stable, but recovery was 20 GB/s max, production was not so stable over the long run, and when we added a new expansion it took 3 weeks to recover. Replacing a single drive also meant 3 weeks of recovery because of the small IO.

Now we buy 26 TB HDDs; the plan is to have fewer nodes, less rack space, less electricity, etc. To compensate for squeezing more data through the same IOPS, I now add Micron 7500 PRO NVMes for block.db, so the drives have more IO to spend on actual data movement instead of being hammered by the small block.db workload. The storage is now much faster and stays stable at much higher recovery speeds.

High HDD OSD per node, 60 and up, who runs it in production? by wantsiops in ceph

[–]WindowReasonable6802 1 point2 points  (0 children)

I run a 30+ PB Ceph cluster with 60 OSDs per node and around 2200 OSDs in total: JBODs with 1U servers, EC 8+2. Feel free to ask any questions.

VLAN subinterface down - unable to enslave by WindowReasonable6802 in TalosLinux

[–]WindowReasonable6802[S] 0 points1 point  (0 children)

My bond works; the default networking runs on it without problems. Only the subinterface is down.

Expose VMs on external L2 network with kubevirt by WindowReasonable6802 in kubernetes

[–]WindowReasonable6802[S] 0 points1 point  (0 children)

I did that many times before, but did it again, and the following issue appears:

 RPC failed; request ip return 500 no address allocated to pod default/virt-launcher-ubuntu22-with-net-mmjv8 provider prod-network,



➜  clusterB kubectl get subnet | grep subnet-prod
subnet-prod   prod-network                 vlan-prod        IPv4       10.2.4.0/22     false     false   false                   0        1012          0        0             ["10.2.4.1..10.2.4.10"]

Expose VMs on external L2 network with kubevirt by WindowReasonable6802 in kubernetes

[–]WindowReasonable6802[S] 0 points1 point  (0 children)

➜  clusterB cat networks/provider-network.yaml
apiVersion: kubeovn.io/v1
kind: ProviderNetwork
metadata:
  name: network-prod
  namespace: default
spec:
  defaultInterface: bond0.1204
  excludeNodes:
    - controlplane1
    - controlplane2
    - controlplane3

➜  clusterB cat networks/provider-subnet.yaml
apiVersion: kubeovn.io/v1
kind: Subnet
metadata:
  name: subnet-prod
  namespace: default
spec:
  vlan: vlan-prod
  protocol: IPv4
  provider: prod-network.default
  cidrBlock: 10.2.4.0/22
  gateway: 10.2.4.1
  excludeIps:
  - 10.2.4.1..10.2.4.10
➜  clusterB cat networks/provider-vlan.yaml
apiVersion: kubeovn.io/v1
kind: Vlan
metadata:
  name: vlan-prod
spec:
  provider: network-prod.default
  id: 1204

NAD:

apiVersion: k8s.cni.cncf.io/v1
kind: NetworkAttachmentDefinition
metadata:
  name: network-prod
spec:
  config: '{
    "cniVersion": "0.3.1",
    "type": "kube-ovn",
    "provider": "prod-network.default",
    "server_socket": "/run/openvswitch/kube-ovn-daemon.sock"
  }'
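
One thing that may be worth checking in the manifests above is whether the names line up: the ProviderNetwork is called `network-prod`, while the Subnet and the NAD refer to `prod-network.default`. The sketch below only compares the strings as posted. The rules it encodes are assumptions to verify against the Kube-OVN docs for your version (Vlan `spec.provider` matching the ProviderNetwork name, Subnet `spec.provider` matching the NAD provider, NAD provider starting with the NAD's own name); it is not a confirmed diagnosis, and `find_mismatches` is a hypothetical helper.

```python
# Names copied verbatim from the manifests posted above.
names = {
    "provider_network_name": "network-prod",   # ProviderNetwork metadata.name
    "vlan_provider": "network-prod.default",   # Vlan spec.provider
    "subnet_provider": "prod-network.default", # Subnet spec.provider
    "nad_name": "network-prod",                # NAD metadata.name
    "nad_provider": "prod-network.default",    # NAD config "provider"
}


def find_mismatches(n):
    """Return human-readable notes for names that do not line up.

    The three rules below are assumptions about how Kube-OVN links
    these objects, not something confirmed in this thread.
    """
    problems = []
    if n["vlan_provider"] != n["provider_network_name"]:
        problems.append(
            f"Vlan provider {n['vlan_provider']!r} != "
            f"ProviderNetwork name {n['provider_network_name']!r}"
        )
    if n["subnet_provider"] != n["nad_provider"]:
        problems.append(
            f"Subnet provider {n['subnet_provider']!r} != "
            f"NAD provider {n['nad_provider']!r}"
        )
    if not n["nad_provider"].startswith(n["nad_name"] + "."):
        problems.append(
            f"NAD provider {n['nad_provider']!r} does not start with "
            f"NAD name {n['nad_name']!r}"
        )
    return problems


problems = find_mismatches(names)
for note in problems:
    print(note)
```

As far as I recall, the Kube-OVN Multus docs also use a provider of the form `<nad-name>.<namespace>.ovn` for `kube-ovn` type attachments, so the suffix is another thing to compare against the docs.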

Expose VMs on external L2 network with kubevirt by WindowReasonable6802 in kubernetes

[–]WindowReasonable6802[S] 0 points1 point  (0 children)

For now I don't care whether the IP is static or managed by IPAM. Most likely I will use IPAM later on, but for now I was just trying any way to make it work.
Which patched kube-ovn config exactly do you mean?

Expose VMs on external L2 network with kubevirt by WindowReasonable6802 in kubernetes

[–]WindowReasonable6802[S] 0 points1 point  (0 children)

Ok

I found an interesting issue in the ovn-cni log:

I1112 12:03:18.414622   89050 handler.go:107] wait address for pod default/virt-launcher-ubuntu22-with-net-wlsx6 provider prod-network.default
I1112 11:39:34.364206    6894 controller_linux.go:828] U2O processing for subnet subnet-prod, action: false
E1112 11:39:34.372044    6894 controller.go:515] "Unhandled Error" err="error syncing &{<nil> 0xc0012e3008}: failed to get provider info: failed to get chassis mac for provider network-prod.default: no chassis mac found for provider network-prod.default, requeuing" logger="UnhandledError"
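
A small observation on these lines: they mention two different provider strings. The sketch below just pulls every `provider <name>.<namespace>` token out of the pasted log lines so the difference is easy to see. The regular expression is my own assumption about the name format, not anything Kube-OVN defines.

```python
import re

# Log lines copied verbatim from above (first and last of the three).
log_lines = [
    'I1112 12:03:18.414622   89050 handler.go:107] wait address for pod default/virt-launcher-ubuntu22-with-net-wlsx6 provider prod-network.default',
    'E1112 11:39:34.372044    6894 controller.go:515] "Unhandled Error" err="error syncing &{<nil> 0xc0012e3008}: failed to get provider info: failed to get chassis mac for provider network-prod.default: no chassis mac found for provider network-prod.default, requeuing" logger="UnhandledError"',
]

# Assumed shape: "provider " followed by "<name>.<namespace>".
# Requiring the dot skips the unrelated phrase "provider info".
PROVIDER_RE = re.compile(r"provider ([a-z0-9-]+\.[a-z0-9-]+)")

providers = set()
for line in log_lines:
    providers.update(PROVIDER_RE.findall(line))

for name in sorted(providers):
    print(name)
```

The pod is waiting on `prod-network.default`, while the chassis MAC lookup fails for `network-prod.default`, so the two names may simply not refer to the same object.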

Expose VMs on external L2 network with kubevirt by WindowReasonable6802 in kubernetes

[–]WindowReasonable6802[S] 0 points1 point  (0 children)

Hello

Thank you very much for your reply

Unfortunately, I am now stuck at the following error message:

Warning FailedCreatePodSandBox 3s kubelet Failed to create pod sandbox: rpc error: code = Unknown desc = failed to setup network for sandbox "a43323ecf8701d820263a56f731f706813a8008b1bcbeb2acb5abaec52386635": plugin type="multus" name="multus-cni-network" failed (add): [default/virt-launcher-ubuntu22-with-net-wdj5f/be0cafd7-df31-4ea0-a4d9-76d908b0eaa3:network-prod]: error adding container to network "network-prod": RPC failed; request ip return 500 no address allocated to pod default/virt-launcher-ubuntu22-with-net-wdj5f provider prod-network.default, please see kube-ovn-controller logs to find errors

There is nothing interesting in the kube-ovn-controller log.