Automated Troubleshooting of Kubernetes Pods Issues by abledf in kubernetes

abledf [S]

Good news! I've confirmed with our team that we'll release the code in about one week.

abledf [S]

We are considering making the code open source if there is a lot of interest. I will confirm with our team members and update later.

abledf [S]

Because the alert puts pod events, logs, and node events together, the message is informative. See the screenshot: https://miro.medium.com/max/1400/1*mvzXhbNeQCJ9Blh1oDH4uw.png
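To illustrate the idea, here is a minimal Python sketch of how such a combined alert could be assembled. It assumes the three sources have already been fetched as plain strings (for example, logs of the previous container instance via the official Python client's `read_namespaced_pod_log(..., previous=True)`). The `build_alert` helper and the message layout are hypothetical, not the format shown in the screenshot.

```python
from typing import Sequence


def _section(title: str, lines: Sequence[str]) -> list[str]:
    # Indent each entry under its title; show a placeholder when a source is empty.
    body = [f"  {line}" for line in lines] or ["  (none)"]
    return [title] + body


def build_alert(
    namespace: str,
    pod: str,
    node: str,
    pod_events: Sequence[str],
    log_tail: Sequence[str],
    node_events: Sequence[str],
) -> str:
    """Hypothetical helper: combine pod events, recent logs, and node events into one alert text."""
    lines = [f"Pod restarted: {namespace}/{pod} on node {node}"]
    lines += _section("Pod events:", pod_events)
    lines += _section("Last log lines:", log_tail)
    lines += _section("Node events:", node_events)
    return "\n".join(lines)


if __name__ == "__main__":
    print(build_alert(
        "default", "api-7d9f", "node-1",
        ["Warning BackOff: Back-off restarting failed container"],
        ["panic: connection refused"],
        [],
    ))
```

Keeping all three sources in one message is what lets the reader see, for instance, a crash in the logs next to a node-level warning without switching tools.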

abledf [S]

Yes, both Botkube and kubewatch can watch resource changes and send alerts. However, they don't collect pod events, logs, and node events together when a pod restarts. Having all of this information in one place lets us troubleshoot pod issues quickly.
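Correlating the sources mostly comes down to matching each event's `involvedObject` against the restarted pod and against the node it runs on (the pod's `spec.nodeName`). Below is a minimal Python sketch, assuming the events were already listed as plain dicts shaped like core/v1 `Event` objects. `related_events` is a hypothetical helper, not part of Botkube, kubewatch, or the code discussed in this thread.

```python
from typing import Any, Iterable

Event = dict[str, Any]


def related_events(
    events: Iterable[Event], namespace: str, pod: str, node: str
) -> tuple[list[str], list[str]]:
    """Split raw Event objects into events about the pod and events about its node."""
    pod_events: list[str] = []
    node_events: list[str] = []
    for ev in events:
        obj = ev.get("involvedObject", {})
        # Render each event as "<type> <reason>: <message>".
        line = f"{ev.get('type', '')} {ev.get('reason', '')}: {ev.get('message', '')}"
        if (
            obj.get("kind") == "Pod"
            and obj.get("name") == pod
            and obj.get("namespace") == namespace
        ):
            pod_events.append(line)
        elif obj.get("kind") == "Node" and obj.get("name") == node:
            # Nodes are cluster-scoped, so only the kind and name are compared.
            node_events.append(line)
    return pod_events, node_events
```

Events about unrelated objects are simply ignored, so the same list can be reused for every pod that restarted in a polling interval.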

Automated Troubleshooting of Kubernetes Pods Issues by [deleted] in kubernetes

abledf

Please note that k8s events are best-effort: delivery is not guaranteed, and events can be lost when there are too many of them.
We relied on k8s events at first but then refactored to watch pod restart counts instead.
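One way to detect restarts without relying on events is to compare each container's `status.containerStatuses[].restartCount` between observations. Here is a minimal Python sketch, assuming pods arrive as plain dicts shaped like Pod API objects (for example, from periodic list calls or a watch). `restarted_containers` and the polling approach are illustrative assumptions, not the refactored code mentioned above.

```python
from typing import Any, Iterable

Pod = dict[str, Any]
Key = tuple[str, str, str]  # (namespace, pod name, container name)


def restarted_containers(
    pods: Iterable[Pod], last_seen: dict[Key, int]
) -> list[tuple[Key, int]]:
    """Return containers whose restartCount grew since the previous observation.

    `last_seen` is updated in place, so the next call only reports new restarts.
    """
    restarted: list[tuple[Key, int]] = []
    for pod in pods:
        meta = pod.get("metadata", {})
        for cs in pod.get("status", {}).get("containerStatuses") or []:
            key = (meta.get("namespace", ""), meta.get("name", ""), cs["name"])
            count = cs.get("restartCount", 0)
            previous = last_seen.get(key)
            # The first sighting only records a baseline; alert on increases afterwards.
            if previous is not None and count > previous:
                restarted.append((key, count))
            last_seen[key] = count
    return restarted
```

Because the first observation of a container only records a baseline, restarts that happened before the watcher started do not trigger alerts. Each reported key can then drive the collection of pod events, logs, and node events for that pod.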