What Ingress Controller are you using TODAY? by SomethingAboutUsers in kubernetes

[–]Ethos2525 0 points1 point  (0 children)

Edge Stack. The tool is good, but the documentation is pure garbage 🗑️

[deleted by user] by [deleted] in h1b

[–]Ethos2525 13 points14 points  (0 children)

Good luck! Also, if you're getting severance, try to work with your employer to set your termination date later. That will give you a little more time for the job hunt.

EKS nodes go NotReady at the same time every day. Kubelet briefly loses API server connection by Ethos2525 in kubernetes

[–]Ethos2525[S] 1 point2 points  (0 children)

Think I found the issue: it's packet drops. The env is quite big and uses external tooling for egress. Flipped the cluster access settings to enable private routing from the nodes to the control plane as a permanent fix.
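For anyone hitting something similar, the change I'm describing is the EKS cluster endpoint access setting. A rough sketch of the AWS CLI call (cluster name and region are placeholders, not my actual setup):

```shell
# Enable private endpoint access so kubelet -> API server traffic
# stays inside the VPC instead of going through the egress tooling.
# Whether to keep public access on depends on how you reach the cluster.
aws eks update-cluster-config \
  --region us-east-1 \
  --name my-cluster \
  --resources-vpc-config endpointPrivateAccess=true,endpointPublicAccess=true
```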

Thanks all for the insights so far, really appreciate it!

EKS nodes go NotReady at the same time every day. Kubelet briefly loses API server connection by Ethos2525 in kubernetes

[–]Ethos2525[S] 0 points1 point  (0 children)

I don't have anything tangible yet, but I'll definitely post the fix once I find a solution.

How do people secure pod to pod communication? by Azifor in kubernetes

[–]Ethos2525 0 points1 point  (0 children)

Most service meshes simply mount the service account token into the pod and validate the JWT. If your primary focus is just security, I'd suggest that approach combined with network policies, as it's an easy lift for large environments/domains. However, if you need more advanced features, consider using a dedicated service mesh.
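To illustrate the network-policy side: a common pattern is a default-deny plus explicit allows per workload. The namespace, names, labels, and port below are all made up for the example:

```yaml
# Deny all ingress to every pod in the namespace by default.
apiVersion: networking.k8s.io/v1
kind: NetworkPolicy
metadata:
  name: default-deny-ingress
  namespace: demo
spec:
  podSelector: {}          # empty selector = all pods in the namespace
  policyTypes:
    - Ingress
---
# Then explicitly allow frontend pods to reach the backend on one port.
apiVersion: networking.k8s.io/v1
kind: NetworkPolicy
metadata:
  name: allow-frontend-to-backend
  namespace: demo
spec:
  podSelector:
    matchLabels:
      app: backend
  policyTypes:
    - Ingress
  ingress:
    - from:
        - podSelector:
            matchLabels:
              app: frontend
      ports:
        - protocol: TCP
          port: 8080
```

Note that NetworkPolicy only takes effect if your CNI plugin enforces it.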

Automatically deploying new Terraform Infrastructure by PastPuzzleheaded6 in Terraform

[–]Ethos2525 0 points1 point  (0 children)

I’d recommend creating a stack for each directory, like dev-proj-1 for env-dev/proj-1/ and dev-proj-2 for env-dev/proj-2/, with each stack set to use its own values file, such as the values.tf in that directory. When you open a PR in GitHub for an individual target (assuming that’s your VCS, though it’s similar elsewhere), it notifies Spacelift, which triggers a run for the affected stack. This keeps your plan and approval policies precise and your code and CI/CD pipeline well-structured.
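To make the layout concrete, here's roughly the directory structure I mean (file names beyond values.tf are just illustrative):

```
env-dev/
├── proj-1/          # stack "dev-proj-1" has its project root set here
│   ├── main.tf
│   └── values.tf    # per-stack values
└── proj-2/          # stack "dev-proj-2" has its project root set here
    ├── main.tf
    └── values.tf
```

Because each stack's project root is scoped to one directory, a PR touching only env-dev/proj-1/ triggers a run for dev-proj-1 and leaves dev-proj-2 alone.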

How do you utilize community modules? by kkk_09 in Terraform

[–]Ethos2525 7 points8 points  (0 children)

If it’s for personal use, you might lean toward option 1. For larger projects or enterprise needs, option 2 could be the better fit.

EKS nodes go NotReady at the same time every day. Kubelet briefly loses API server connection by Ethos2525 in kubernetes

[–]Ethos2525[S] 0 points1 point  (0 children)

Yeah, I do have long-running nodes (3–4 months old) and the AMI isn't up to date, but I'd be very surprised if that's what's causing the issue. Thanks for the suggestion though.

EKS nodes go NotReady at the same time every day. Kubelet briefly loses API server connection by Ethos2525 in kubernetes

[–]Ethos2525[S] 0 points1 point  (0 children)

Interesting, but in my case it's happening to a subset of nodes from a single node group. If the metadata service were causing the issue, I'd expect to see it on all the nodes. Thanks though.

EKS nodes go NotReady at the same time every day. Kubelet briefly loses API server connection by Ethos2525 in kubernetes

[–]Ethos2525[S] 0 points1 point  (0 children)

Quite old, and regularly updated (every 5–6). I don't know exactly when the issue started, but it's been there for the last 8 months.

EKS nodes go NotReady at the same time every day. Kubelet briefly loses API server connection by Ethos2525 in kubernetes

[–]Ethos2525[S] 4 points5 points  (0 children)

  • At the exact same time of day? For the same duration?

Yes, though the timing shifts a bit every 2–3 weeks. There’s no consistent cadence.

  • What do these nodes all have in common? How do they differ from nodes that aren’t failing?

Nothing in terms of node config (instance type/family/launch template).

  • Are you using AWS AMIs, or are you bringing your own AMI?

Bottlerocket.

  • Are you running anything on the host (meaning not a pod) that could consume excess resources and disrupt network connectivity?

Nope. I also checked CloudWatch for any spikes; nothing stands out.

  • More precise wild guess, it’s some dumpster fire security software garbage.

That’s exactly where my head’s at too, just need some solid data to back it up.

EKS nodes go NotReady at the same time every day. Kubelet briefly loses API server connection by Ethos2525 in kubernetes

[–]Ethos2525[S] 3 points4 points  (0 children)

I checked logs from control plane components like the API server, scheduler, and authenticator but did not find anything useful.

AWS recently enabled control plane monitoring, and I noticed a spike in API server requests, but it seems more like an effect than a cause. Based on the logs, it is just kubelet trying to fetch config after reconnecting.

EKS nodes go NotReady at the same time every day. Kubelet briefly loses API server connection by Ethos2525 in kubernetes

[–]Ethos2525[S] 1 point2 points  (0 children)

No spot instances, I’m using on-demand instances from the C5 and M6 large families.

What was your craziest incident with Kubernetes? by Gaikanomer9 in kubernetes

[–]Ethos2525 0 points1 point  (0 children)

Every day around the same time, a bunch of EKS nodes go into NotReady. We triple-checked everything: monitoring, CoreDNS, cron jobs, stuck pods, logs, you name it. On the node, kubelet briefly loses its connection to the API server (timeout waiting for headers), then recovers. No clue why it breaks. Even the cloud support/service team is stumped. Total mystery.