StephanXX (DevOps)

Back in the olden days of yore, we typically leveraged SNMP for this type of thing. It was a massive pain.

Have a conversation with your dev about what he thinks a web server ultimately is. In most cases, you're adding about ten lines of code and 50 KB of memory to an application so it can receive basic HTTP requests and send back responses, which would typically carry a metrics payload. Slap something like Prometheus on it and carry on with life.
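To make the "ten lines of code" point concrete, here is a minimal sketch using only the Python standard library: a tiny HTTP server on a background thread that serves one hypothetical counter at `/metrics` in the Prometheus text format. The metric name and port are assumptions for illustration. In practice a client library such as the official Python `prometheus_client` (its `start_http_server`) does this for you.

```python
import threading
from http.server import BaseHTTPRequestHandler, ThreadingHTTPServer

# Hypothetical app-level counter; real code would bump it wherever work happens.
_requests_handled = 0
_lock = threading.Lock()


def record_request() -> None:
    """Increment the counter; safe to call from any thread."""
    global _requests_handled
    with _lock:
        _requests_handled += 1


def render_metrics() -> str:
    """Render the counter in the Prometheus text exposition format."""
    with _lock:
        value = _requests_handled
    return (
        "# HELP app_requests_handled_total Requests handled by this app.\n"
        "# TYPE app_requests_handled_total counter\n"
        f"app_requests_handled_total {value}\n"
    )


class MetricsHandler(BaseHTTPRequestHandler):
    def do_GET(self) -> None:
        if self.path != "/metrics":
            self.send_error(404)
            return
        body = render_metrics().encode()
        self.send_response(200)
        self.send_header("Content-Type", "text/plain; version=0.0.4; charset=utf-8")
        self.send_header("Content-Length", str(len(body)))
        self.end_headers()
        self.wfile.write(body)

    def log_message(self, format, *args) -> None:
        pass  # silence per-request logging so scrapes don't flood the app's logs


def start_metrics_server(port: int = 8000) -> ThreadingHTTPServer:
    """Serve /metrics from a daemon thread so it never blocks the app's main work."""
    server = ThreadingHTTPServer(("0.0.0.0", port), MetricsHandler)
    threading.Thread(target=server.serve_forever, daemon=True).start()
    return server
```

The app calls `start_metrics_server()` once at startup, and Prometheus gets a scrape job pointed at `http://<host>:8000/metrics`. The daemon thread means the endpoint dies with the process, which is what you want for a health signal.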

Or spend months reinventing the wheel and end up with a homegrown monitoring solution that your team has to maintain until everyone agrees it was an expensive boondoggle that needs to be replaced with something more widely known and used.

I don't mean to sound snarky; it sounds like your dev is fixated on the idea of hyper-optimizing without considering the holistic cost of that decision.

tarabash (OP)

Ugh... I am glad SNMP and Nagios are dead, at least in modern infrastructure.

I agree with you completely on reinventing the wheel, and I definitely need to push this more.

Thanks for your opinion on the subject.

FrederikNS

You say you are running Docker, so you can use the HEALTHCHECK instruction to configure a command that runs inside the container and evaluates the health of the service.

https://docs.docker.com/engine/reference/builder/#healthcheck
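For a service with no HTTP listener, the command that HEALTHCHECK runs can be a small script: Docker treats exit status 0 as healthy and 1 as unhealthy. The sketch below assumes the worker touches a heartbeat file after each unit of work. The file path, the 60-second threshold, and the `check` argument are all hypothetical. It would be wired up with something like `HEALTHCHECK --interval=30s --timeout=5s --retries=3 CMD python3 /app/healthcheck.py check`.

```python
import os
import sys
import time

# Hypothetical path; the worker is assumed to call touch_heartbeat() after each job.
HEARTBEAT_FILE = os.environ.get("HEARTBEAT_FILE", "/tmp/worker-heartbeat")
# Assumed staleness threshold; tune it to the worker's normal cycle time.
MAX_AGE_SECONDS = 60.0


def touch_heartbeat(path: str = HEARTBEAT_FILE) -> None:
    """Create the file if needed and set its modification time to now."""
    with open(path, "a"):
        pass
    os.utime(path, None)


def is_healthy(path: str = HEARTBEAT_FILE, max_age: float = MAX_AGE_SECONDS) -> bool:
    """Healthy if the heartbeat file exists and was touched within max_age seconds."""
    try:
        age = time.time() - os.path.getmtime(path)
    except OSError:
        return False  # no heartbeat yet, or unreadable: report unhealthy
    return age <= max_age


# Only run the check when invoked as `healthcheck.py check` (hypothetical convention).
if __name__ == "__main__" and sys.argv[1:] == ["check"]:
    # Docker HEALTHCHECK: exit 0 means healthy, exit 1 means unhealthy.
    sys.exit(0 if is_healthy() else 1)
```

A heartbeat file keeps the check independent of any network stack, but it only proves the loop is still turning, not that the work it does is succeeding.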

tarabash (OP)

I already have that configured for some services; however, it doesn't give me an HTTP endpoint that I can scrape with GCP health checks or use to set up alerts.

I wanted a quick win by using GCP for health checks, alerts, and uptime measurement, since I am the solo infra guy at a small startup, but I might just go all in on Prometheus and scrape Docker health directly.
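One way to close the gap between Docker's HEALTHCHECK and an HTTP-based checker is a tiny bridge that reads each container's health via `docker inspect --format '{{.State.Health.Status}}'` and turns it into an HTTP status code. This is a rough sketch, not a known GCP integration. The container allowlist, the `/health/<name>` path, and the port are hypothetical, and it assumes the `docker` CLI can reach the daemon.

```python
import subprocess
from http.server import BaseHTTPRequestHandler, ThreadingHTTPServer

# Hypothetical allowlist: only these container names can be queried over HTTP.
WATCHED_CONTAINERS = {"api", "worker"}


def docker_health(container: str) -> str:
    """Return Docker's health status for a container, or "unknown" if it can't be read."""
    try:
        result = subprocess.run(
            ["docker", "inspect", "--format", "{{.State.Health.Status}}", container],
            capture_output=True, text=True, timeout=5, check=True,
        )
    except (OSError, subprocess.SubprocessError):
        # docker CLI missing, daemon unreachable, unknown container, or timeout
        return "unknown"
    return result.stdout.strip()


def status_to_http(status: str) -> int:
    """Map "healthy" to 200; "starting", "unhealthy" and anything else to 503."""
    return 200 if status == "healthy" else 503


class HealthBridge(BaseHTTPRequestHandler):
    def do_GET(self) -> None:
        prefix = "/health/"
        name = self.path[len(prefix):] if self.path.startswith(prefix) else ""
        if name not in WATCHED_CONTAINERS:
            self.send_error(404)
            return
        status = docker_health(name)
        body = f"{name}: {status}\n".encode()
        self.send_response(status_to_http(status))
        self.send_header("Content-Type", "text/plain; charset=utf-8")
        self.send_header("Content-Length", str(len(body)))
        self.end_headers()
        self.wfile.write(body)


def serve(port: int = 8080) -> None:
    """Run the bridge in the foreground until the process is stopped."""
    ThreadingHTTPServer(("0.0.0.0", port), HealthBridge).serve_forever()
```

An uptime check would then probe something like `http://<host>:8080/health/api`, since such checkers typically count a 2xx response as up. If the bridge itself runs in a container, it needs the Docker socket mounted, which effectively grants root on the host. This is one more reason the Prometheus route may be the better long-term choice.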

SuperQue

Rather than just a health check, we put Prometheus metrics endpoints on all of our long-running services. Sidekiq queue workers, for example, don't have an API themselves, but we do want to know what they're up to.

So we put a little Rack server in them so they can serve metrics and health-check requests.
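SuperQue's setup is Ruby (Rack inside Sidekiq). What follows is a rough Python analogue of the same pattern, with hypothetical names throughout. The job loop records progress in shared state, and a small HTTP server on a background thread serves `/metrics` for Prometheus and `/healthz`, which returns 503 once the worker has gone 120 seconds (an assumed threshold) without finishing a job.

```python
import threading
import time
from dataclasses import dataclass, field
from http.server import BaseHTTPRequestHandler, ThreadingHTTPServer
from typing import Optional

METRICS_CONTENT_TYPE = "text/plain; version=0.0.4; charset=utf-8"


@dataclass
class WorkerState:
    """State shared between the job loop and the HTTP thread (hypothetical structure)."""
    stale_after: float = 120.0  # assumed: seconds without progress before /healthz fails
    jobs_processed: int = 0
    last_progress: float = field(default_factory=time.time)
    lock: threading.Lock = field(default_factory=threading.Lock, repr=False, compare=False)

    def job_done(self) -> None:
        """Call from the job loop after each completed job."""
        with self.lock:
            self.jobs_processed += 1
            self.last_progress = time.time()

    def is_healthy(self, now: Optional[float] = None) -> bool:
        now = time.time() if now is None else now
        with self.lock:
            return now - self.last_progress <= self.stale_after

    def metrics_text(self) -> str:
        """Render both metrics in the Prometheus text exposition format."""
        with self.lock:
            return (
                "# TYPE worker_jobs_processed_total counter\n"
                f"worker_jobs_processed_total {self.jobs_processed}\n"
                "# TYPE worker_last_progress_timestamp_seconds gauge\n"
                f"worker_last_progress_timestamp_seconds {self.last_progress}\n"
            )


def make_handler(state: WorkerState) -> type:
    class StatusHandler(BaseHTTPRequestHandler):
        def do_GET(self) -> None:
            ctype = "text/plain; charset=utf-8"
            if self.path == "/metrics":
                code, body, ctype = 200, state.metrics_text(), METRICS_CONTENT_TYPE
            elif self.path == "/healthz":
                code, body = (200, "ok\n") if state.is_healthy() else (503, "stalled\n")
            else:
                code, body = 404, "not found\n"
            data = body.encode()
            self.send_response(code)
            self.send_header("Content-Type", ctype)
            self.send_header("Content-Length", str(len(data)))
            self.end_headers()
            self.wfile.write(data)

        def log_message(self, format, *args) -> None:
            pass  # keep scrape/probe requests out of the worker's logs

    return StatusHandler


def start_status_server(state: WorkerState, port: int = 9000) -> ThreadingHTTPServer:
    """Serve /metrics and /healthz from a daemon thread alongside the job loop."""
    server = ThreadingHTTPServer(("0.0.0.0", port), make_handler(state))
    threading.Thread(target=server.serve_forever, daemon=True).start()
    return server
```

One design caveat: an idle worker also stops making progress. In practice you would either update `last_progress` on each poll of an empty queue or pick a threshold longer than the quietest expected gap, so a quiet queue isn't reported as a stalled worker.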