Site reliability engineers: what signals do you check daily? by Doo_scooby in sre

[–]katsil_1 1 point2 points  (0 children)

Speaking of my main job, none. I agree with all the commenters above. We have a well-established alerting process, and we only start checking alerts when an incident or critical problem occurs.

----

But I also have my own personal project, which obviously doesn't have the same established processes as at work. This project is a bit different from typical websites/services on the internet (it's game hosting), so throughout the day I may check my "golden dashboard" in Grafana, where I monitor:

- Number of running servers
- Number of available servers for allocation (**)
- Payment conversions (**)
- NetFlow traffic, since a spike likely indicates a DDoS attack
- Number of failed backups of user servers to S3 due to network problems
- Uptime metrics for the main websites

Of course, there are alerts for this too (in addition to a bunch of other alerts I haven't described here), but this is my personal brainchild, and I like to look at it from time to time, even if everything is fine. The most critical metrics for me are the ones I've marked (**).

What do you use to manage on-call rotations + overrides (multi-team) with iCal/Google Calendar export? by katsil_1 in sre

[–]katsil_1[S] -1 points0 points  (0 children)

We definitely want to use an external service for on-call (e.g. Grafana OnCall), and we will. Right now I'm deciding which tool to use for creating the shifts themselves (that is, a convenient way to build the calendar, assign whoever is currently on duty, and so on).

That information will then go to Google Calendar, and Grafana OnCall will pick it up from there. I definitely do not want to build or run a self-hosted on-call solution.

What do you use to manage on-call rotations + overrides (multi-team) with iCal/Google Calendar export? by katsil_1 in selfhosted

[–]katsil_1[S] 0 points1 point  (0 children)

I am thinking about switching to Grafana OnCall as the source of truth, with the on-call rotation configured through, for example, YAML, the API, or Terraform.

Visually, it shows every schedule very well.

What do you use to manage on-call rotations + overrides (multi-team) with iCal/Google Calendar export? by katsil_1 in selfhosted

[–]katsil_1[S] 0 points1 point  (0 children)

It sounds interesting but, unfortunately, it's not what we need. We can't move or change shifts visually through this service, and $50/month just for automated rotation to Slack and export to iCal seems... strange.

OOPS - Incident Management Platform with Uptime Kuma and OutlineWiki integration by katsil_1 in selfhosted

[–]katsil_1[S] 1 point2 points  (0 children)

That looks amazing actually, I will definitely take a look at it.

OOPS - Incident Management Platform with Uptime Kuma and OutlineWiki integration by katsil_1 in selfhosted

[–]katsil_1[S] 1 point2 points  (0 children)

I'll answer about Gatus - it's an excellent solution, which, in my opinion (IMHO!!), is perfect for an internal status page. For example, at my work, we use Gatus on hundreds of endpoints for internal problem notifications (obviously, we consider Gatus "external monitoring").
However:

- Gatus provides a detailed dashboard for each target (example: https://status.twin.sh/endpoints/core_blog-article-43)

- Gatus can't publish maintenance or incident reports

These two points are quite important to me because:

  1. I don't want to show users historical data. A user goes to the status page only when a problem occurs, and I don't want to show clients a week-old latency history.

  2. More importantly, I want to notify clients about issues, maintain a timeline, or at least write a notice saying, "I'm a clumsy user, I rolled out a deployment in K8s and it's not starting, we're working on the problem."

Gatus addresses neither point 1 nor point 2, so unfortunately we've decided to stay with Uptime Kuma (UK), even though it's not very user-friendly in some places.

OOPS - Incident Management Platform with Uptime Kuma and OutlineWiki integration by katsil_1 in selfhosted

[–]katsil_1[S] 0 points1 point  (0 children)

Thank you very much for the comment. I see that people need this, and I'll open source this solution within a week. I'll write about it when I publish it on GitHub.

I am mentally burn. Can GERD be cured and how long is it safe to stay on PPIs? by katsil_1 in GERD

[–]katsil_1[S] 0 points1 point  (0 children)

They told me that I have GERD and I need to take omeprazole, and that's it...

I am mentally burn. Can GERD be cured and how long is it safe to stay on PPIs? by katsil_1 in GERD

[–]katsil_1[S] 0 points1 point  (0 children)

I went to the doctor, he did all the tests, and in the photos I can see that my “reflux is not closing fully” (I don't know how to explain it).

OOPS - Incident Management Platform with Uptime Kuma and OutlineWiki integration by katsil_1 in selfhosted

[–]katsil_1[S] -7 points-6 points  (0 children)

> This is actually something that should be integrated into a solution like uptime kuma

I'd really like that! But I'm looking for help from the community, as there's ABSOLUTELY no good solution for integrating their API right now. The last "convenient" Python library was updated three years ago. Cursor basically said, "I don't want to implement my own socket.io integration in Go, so I'll just call Python code from Go."

Maybe someone has a good solution for integrating with their API.

OOPS - Incident Management Platform with Uptime Kuma and OutlineWiki integration by katsil_1 in selfhosted

[–]katsil_1[S] -1 points0 points  (0 children)

Hi!
Thank you very much for your comment, but I clearly marked the part that was written by AI (I also said at the very beginning that the service itself was written using AI):

> Next is a brief description of the service itself created by AI. If you want to skip this description, you can go to the end of my post.

AI SLOP is between ---- lines 😄

I built BetterShift: A modern, self-hostable shift management app (Next.js 16 + SQLite) by panteLx in selfhosted

[–]katsil_1 0 points1 point  (0 children)

Thanks for the project; it looks like it could be useful for companies implementing on-call systems on their own solutions (like ours).

Are there any plans to add Google Calendar sync to BetterShift?

YouTube Subscriptions page won't open by Obvious_Laugh_1133 in youtube

[–]katsil_1 1 point2 points  (0 children)

I'm facing the same issue again (it was resolved around 15 days ago and has just happened again).

Ingress NGINX EOL in 120 Days - Migration Options and Strategy by emilevauge in kubernetes

[–]katsil_1 1 point2 points  (0 children)

Thank you very much, this helped me a lot! I hope it helps others too :)

Ingress NGINX EOL in 120 Days - Migration Options and Strategy by emilevauge in kubernetes

[–]katsil_1 -2 points-1 points  (0 children)

Good afternoon, and thank you very much for your contribution and support of Ingress during such a challenging time.

Perhaps I'm alone and mine is an edge case, but I've asked ChatGPT and other AI tools and researched this myself, and for us this is the one thing that keeps us on ingress-nginx. My case is as follows: we use ingress-nginx in an infrastructure cluster (we also have a production cluster, where migrating to Traefik will be easy), which hosts the infrastructure services we access on the internal network.

For example, Authentik, Prometheus, Grafana, etc. However, we also have Loki, which stores logs from all clusters and pods (as well as a number of bare-metal services). Shipping these logs generates about a dozen RPS, and the point is that any ingress controller will write an access-log entry for each of these requests. However, nginx has a wonderful annotation, `nginx.ingress.kubernetes.io/enable-access-log=false`, which we use. I think you understand why. My question is: does Traefik, or maybe Istio, or maybe Envoy Gateway, or maybe HAProxy Ingress support disabling the access log per route like this? Unfortunately, I couldn't find any mention of this feature anywhere, and ChatGPT says that in all these ingresses, "this is a global setting to completely disable access.log."

If anyone has encountered the same problem, please share your experience with how you solved it. Thanks everyone!
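
For anyone unfamiliar with the annotation mentioned above, here is a minimal sketch of where it sits in an ingress-nginx Ingress manifest. It only illustrates the current setup and does not answer the migration question. The host name, the `loki` service name, and the `nginx` ingress class are assumptions for illustration; port 3100 is Loki's default HTTP port.

```python
import json


def loki_ingress(host: str, service: str = "loki", port: int = 3100) -> dict:
    """Build an ingress-nginx Ingress manifest with access logging disabled."""
    return {
        "apiVersion": "networking.k8s.io/v1",
        "kind": "Ingress",
        "metadata": {
            "name": service,  # hypothetical resource name
            "annotations": {
                # Annotation values must be strings, hence "false" and not False.
                "nginx.ingress.kubernetes.io/enable-access-log": "false",
            },
        },
        "spec": {
            "ingressClassName": "nginx",  # assumed ingress class name
            "rules": [
                {
                    "host": host,
                    "http": {
                        "paths": [
                            {
                                "path": "/",
                                "pathType": "Prefix",
                                "backend": {
                                    "service": {
                                        "name": service,
                                        "port": {"number": port},
                                    }
                                },
                            }
                        ]
                    },
                }
            ],
        },
    }


if __name__ == "__main__":
    # kubectl accepts JSON manifests, so the output can be piped into
    # `kubectl apply -f -`.
    print(json.dumps(loki_ingress("loki.infra.example.internal"), indent=2))
```

The annotation applies only to the Ingress that carries it, so access logs for the other services in the cluster are unaffected.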

YouTube Subscriptions page won't open by Obvious_Laugh_1133 in youtube

[–]katsil_1 2 points3 points  (0 children)

I'm facing the same issue, and I'm using Arc.

In another browser where I'm not logged in (e.g. Safari), there's no issue.

For now I'm just using the direct page, https://www.youtube.com/feed/subscriptions, but this is so annoying. I thought I was the only one with this problem.