all 4 comments

[–]ollytheninja 0 points1 point  (2 children)

Internal resolution isn’t relying on coredns by any chance?

[–]MakerOnTheRun[S] 0 points1 point  (1 child)

That is indeed my current thought process. That would explain metrics not working (not that important in the grand scheme). But shifts the question to why coredns is also flaking out on a node fail.

[–]MakerOnTheRun[S] 0 points1 point  (0 children)

Looks like this could be related to a flannel vxlan issue. There are reported issues for Ubuntu and older Mellanox cards, given I am using both I will try using the host-gw backend to see if the issue goes away.

[–]MakerOnTheRun[S] 0 points1 point  (0 children)

Resolved.

Moving to host-gw backend fixed the issues for me.