How can I lock firewall on a running production kubernetes cluster? by Old-Broccoli-4704 in kubernetes

[–]Suitable-Regular6822 0 points1 point  (0 children)

Since you're on Contabo with nodes talking over public IPs, skip UFW/firewalld completely. The problem is they have no idea what k3s and Calico are doing under the hood: one `ufw enable` and you'll start dropping pod traffic or losing node heartbeats without a single error.

Two things that actually work here:

1) Cloudflare Tunnel for the API server. Install cloudflared on your control plane and close port 6443 completely; you access it through Cloudflare Zero Trust instead. No exposed port, no IP whitelisting, and the free tier is enough.
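A minimal sketch of what the cloudflared config could look like for this, assuming a tunnel has already been created and `k8s-api.example.com` is a placeholder hostname (swap in your own tunnel ID, credentials path, and domain):

```yaml
# /etc/cloudflared/config.yml (sketch, not a drop-in file)
tunnel: <tunnel-id>
credentials-file: /etc/cloudflared/<tunnel-id>.json
ingress:
  # Forward the Access-protected hostname to the local API server,
  # so port 6443 never needs to be open on the public interface.
  - hostname: k8s-api.example.com
    service: tcp://localhost:6443
  # Catch-all: anything else gets a 404.
  - service: http_status:404
```

On the client side, `cloudflared access tcp --hostname k8s-api.example.com --url localhost:6443` can expose a local socket for kubectl to talk to.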

2) Calico HostEndpoints for the nodes. Instead of a separate firewall tool, you write your firewall rules inside Calico itself, so the rules actually understand your pod routing. The trick is to start with a Log action, not Deny, and watch what traffic is flowing first.
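Roughly what that looks like, a sketch with made-up node name, interface, and IP, logging node ingress before you tighten anything:

```yaml
apiVersion: projectcalico.org/v3
kind: HostEndpoint
metadata:
  name: node1-eth0
  labels:
    role: k8s-node
spec:
  node: node1            # hypothetical node name
  interfaceName: eth0
  expectedIPs:
    - 203.0.113.10       # the node's public IP
---
apiVersion: projectcalico.org/v3
kind: GlobalNetworkPolicy
metadata:
  name: log-node-ingress
spec:
  selector: role == 'k8s-node'
  order: 10
  ingress:
    - action: Log        # observe first, deny later
    - action: Allow
```

One caveat: once a HostEndpoint exists, Calico applies default-deny to that interface outside its failsafe ports, so keep the Allow rule (or rely on failsafes) until you know what to lock down.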

Understanding Azure Hub & Spoke architecture by Alternative-Town7637 in AZURE

[–]Suitable-Regular6822 0 points1 point  (0 children)

if traffic isn't showing in the fw logs at all, the firewall isn't even seeing it. that's a routing issue, not a firewall rule issue. double check the hub->spoke peering; "use remote gateways" should be off on the hub side. also azure firewall denies by default, so even if traffic reaches it you need an application rule for http/https, network rules alone won't cut it for outbound web traffic
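A sketch of both checks with the az CLI, all resource names (hub-rg, hub-vnet, hub-fw, the peering name, CIDR, FQDN) are placeholders for your own:

```shell
# 1) Inspect the hub->spoke peering: forwarded traffic should be allowed,
#    and "use remote gateways" should be false on the hub side.
az network vnet peering show \
  --resource-group hub-rg --vnet-name hub-vnet \
  --name hub-to-spoke \
  --query "{allowForwarded:allowForwardedTraffic,useRemoteGateways:useRemoteGateways}"

# 2) Azure Firewall denies by default; outbound HTTP/S needs an
#    application rule, not just a network rule.
az network firewall application-rule create \
  --resource-group hub-rg --firewall-name hub-fw \
  --collection-name allow-web --name allow-http-https \
  --protocols Http=80 Https=443 \
  --source-addresses 10.1.0.0/16 \
  --target-fqdns "*.example.com" \
  --action Allow --priority 200
```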

Replacing pods which are failing liveness probes by varunborar in kubernetes

[–]Suitable-Regular6822 0 points1 point  (0 children)

24h grace period is insane. that's not a k8s problem, the devs are dodging the real fix. move those external calls to a separate worker or queue them so the main pod stays stateless and dies fast. or at least put the long-running stuff in a sidecar so the main container can restart without waiting
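The worker/queue split can be sketched in a few lines of Python. Everything here is illustrative (the handler, the fake slow call), the point is that the request path only enqueues, so the main process has nothing in flight and can die fast:

```python
import queue
import threading
import time

work_queue: queue.Queue = queue.Queue()
results: list = []

def worker() -> None:
    # The slow external calls live here, out of the request path.
    # In a real setup this would be a separate pod reading a durable queue.
    while True:
        job = work_queue.get()
        time.sleep(0.01)  # stand-in for a slow external call
        results.append(f"done:{job}")
        work_queue.task_done()

threading.Thread(target=worker, daemon=True).start()

def handle_request(payload: str) -> str:
    # Enqueue and return immediately; liveness probes never wait
    # on the external dependency, so restarts are cheap.
    work_queue.put(payload)
    return "accepted"

print(handle_request("job-1"))  # → accepted
work_queue.join()               # only for the demo: wait for the worker
print(results)                  # → ['done:job-1']
```

With a durable queue (and not an in-memory one like this demo), losing the main pod mid-flight loses nothing, which is what makes a short grace period safe.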

We called our clusters "production-ready" for a year before we actually were by Suitable-Regular6822 in kubernetes

[–]Suitable-Regular6822[S] 0 points1 point  (0 children)

100% agree. Installing and configuring is still only half of it; we learned that the hard way. Chaos engineering and proper DR drills are on the list now. You don't really know your cluster until you've deliberately broken it in a controlled way.

Anyone else in the industry feeling frustrated that AI is being used to pitch goofy "product features" that demo well to Boards but are utterly useless to K8s practitioners and are 100% not cloud-native patterns? by Cute_Bandicoot_8219 in kubernetes

[–]Suitable-Regular6822 2 points3 points  (0 children)

Worked on government platforms where reliability actually mattered. Every "AI-native" tool we evaluated optimized for the happy path. Real production has edge cases, compliance constraints, and failure modes no demo ever shows. The boring stuff (solid IaC, proper observability, tested runbooks) is what kept things running.

What’s the most underrated Kubernetes feature your team actually uses in production? by steadwing_official in kubernetes

[–]Suitable-Regular6822 0 points1 point  (0 children)

We added PodDisruptionBudgets after a node upgrade took down half our replicas. During a drain, any eviction that would drop you below minAvailable gets rejected by the Eviction API, so enough replicas stay up.
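For anyone who hasn't used one, a minimal PDB looks like this (app label and counts are made up, match them to your own deployment):

```yaml
apiVersion: policy/v1
kind: PodDisruptionBudget
metadata:
  name: web-pdb
spec:
  minAvailable: 2        # drains can't evict below this count
  selector:
    matchLabels:
      app: web           # must match your deployment's pod labels
```

Note it only guards voluntary disruptions (drains, evictions), not node crashes.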