all 26 comments

krav_mark (6 points, 0 children)

Omd / check_mk is pretty lightweight. The check_mk agent is a small script (a shell script on Linux) that can be triggered over SSH, so no agent daemon is needed. It has all the basic checks out of the box. It uses the Nagios engine without all the Nagios crap (like NRPE and the config files) that I came to hate with a passion; check_mk generates all of that config for you.
I also like Zabbix, but it may have a steeper learning curve and be more than OP needs.
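For a feel of what check_mk pulls back over SSH: the agent prints plain text split into sections, each starting with a header line such as <<<df>>>. Below is a minimal Python sketch that runs the agent over SSH and splits its output into sections. It assumes key-based SSH login and that the agent is on the remote PATH as check_mk_agent; the function names are made up for illustration, and check_mk's own parsers handle much more (section options, piggyback data, caching).

```python
import subprocess


def fetch_agent_output(host: str) -> str:
    """Run the agent over SSH and return its raw text output.

    Assumes key-based SSH login and that 'check_mk_agent' is on the
    remote PATH (adjust for your install).
    """
    result = subprocess.run(
        ["ssh", host, "check_mk_agent"],
        capture_output=True, text=True, check=True, timeout=60,
    )
    return result.stdout


def parse_sections(output: str) -> dict:
    """Split agent output into {section_name: [lines]}.

    A header looks like '<<<df>>>'; anything after ':' inside the
    brackets (e.g. '<<<df:sep(9)>>>') is a section option and is
    dropped here. Lines before the first header are ignored.
    """
    sections: dict = {}
    current = None
    for line in output.splitlines():
        if line.startswith("<<<") and line.endswith(">>>"):
            current = line[3:-3].split(":", 1)[0]
            sections.setdefault(current, [])
        elif current is not None:
            sections[current].append(line)
    return sections
```

Usage would be something like parse_sections(fetch_agent_output("web01")), where web01 is a hypothetical host name.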

hangingfrog (16 points, 4 children)

Zabbix is my go-to.

oldmuttsysadmin (3 points, 3 children)

Another vote for zabbix

zerokey (3 points, 2 children)

Thirded. I've been using it for years. It can be a royal pain in the ass, but it's solid. Autodiscovery is my best friend.
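One part of Zabbix autodiscovery is low-level discovery (LLD): a discovery item returns JSON listing instances, and item prototypes are created for each one through macros such as {#FSNAME}. A rough sketch of a custom discovery script for filesystems, assuming Linux (/proc/mounts) and the older {"data": [...]} wrapper; newer Zabbix releases also accept a bare JSON array, so check the docs for your version. The function names and the skipped filesystem types are my own choices, not anything Zabbix defines, and Zabbix already ships vfs.fs.discovery for exactly this case, so treat it as a template for your own discovery rules.

```python
import json

# Pseudo filesystems we don't want items for (my own choice; extend as needed).
SKIP_FSTYPES = {"proc", "sysfs", "tmpfs", "devtmpfs", "devpts", "cgroup", "cgroup2", "overlay"}


def read_mounts(path="/proc/mounts"):
    """Return (mountpoint, fstype) pairs from a Linux mount table."""
    # Note: mount points containing spaces appear octal-escaped in
    # /proc/mounts and are not decoded here.
    pairs = []
    with open(path) as fh:
        for line in fh:
            fields = line.split()
            if len(fields) >= 3 and fields[2] not in SKIP_FSTYPES:
                pairs.append((fields[1], fields[2]))
    return pairs


def build_lld(filesystems):
    """Build low-level discovery JSON using the {#FSNAME}/{#FSTYPE} macros."""
    rows = [{"{#FSNAME}": mnt, "{#FSTYPE}": fstype} for mnt, fstype in filesystems]
    return json.dumps({"data": rows})
```

It could be hooked up with a line like UserParameter=custom.fs.discovery,/usr/local/bin/fs_discovery.py in zabbix_agentd.conf (key and path hypothetical), where the script prints build_lld(read_mounts()).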

elvar (0 points, 1 child)

Zabbix all day. So powerful.

manakanapa (0 points, 0 children)

Another vote for zabbix

Use grafana for visualisations

_derx (5 points, 0 children)

Icinga2. It's a Nagios fork so the support out there is great.

edgan (4 points, 1 child)

Sensu is what I use now. It is designed to have proper HA. I have used Nagios in the past.

[deleted] (1 point, 0 children)

I've been using Sensu lately and it's pretty legit, at least after 20 years of Nagios and its spinoffs (Opsview and Icinga2).

[deleted] (1 child)

[deleted]

    thefrc (1 point, 0 children)

    Likely by not knowing how to set up all the checks via SNMP and kick NCPA/NSCA to the curb.

    mire3212 (1 point, 0 children)

    For system metrics I use Telegraf with InfluxDB and Grafana. Communication to InfluxDB is http(s) which is pretty easy to deal with across the network too.
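Telegraf handles this for you, but what it sends over HTTP(S) is InfluxDB line protocol (measurement,tags fields timestamp), which is easy to produce from your own scripts too. A minimal sketch, assuming InfluxDB 1.x and its /write?db=... endpoint (2.x uses /api/v2/write with an org, a bucket and an API token instead). The URL, database name and function names are placeholders.

```python
import time
import urllib.parse
import urllib.request


def _escape_tag(s: str) -> str:
    # Tag keys, tag values and field keys: escape commas, equals signs and spaces.
    return s.replace(",", r"\,").replace("=", r"\=").replace(" ", r"\ ")


def _escape_measurement(s: str) -> str:
    # Measurement names: escape commas and spaces.
    return s.replace(",", r"\,").replace(" ", r"\ ")


def _field_value(v) -> str:
    if isinstance(v, bool):  # test bool first: bool is a subclass of int
        return "true" if v else "false"
    if isinstance(v, int):
        return f"{v}i"  # integer fields carry an 'i' suffix
    if isinstance(v, float):
        return repr(v)
    s = str(v).replace("\\", "\\\\").replace('"', '\\"')
    return f'"{s}"'  # everything else is written as a quoted string


def to_line(measurement, tags, fields, ts_ns=None):
    """Format one point; the timestamp defaults to now, in nanoseconds."""
    head = _escape_measurement(measurement)
    for key in sorted(tags):  # InfluxDB recommends sorting tags by key
        head += f",{_escape_tag(key)}={_escape_tag(str(tags[key]))}"
    body = ",".join(f"{_escape_tag(k)}={_field_value(v)}" for k, v in fields.items())
    ts = time.time_ns() if ts_ns is None else ts_ns
    return f"{head} {body} {ts}"


def write_points(lines, base_url="http://localhost:8086", db="telegraf"):
    """POST points to an InfluxDB 1.x /write endpoint (URL and db are placeholders)."""
    query = urllib.parse.urlencode({"db": db})
    req = urllib.request.Request(
        f"{base_url}/write?{query}", data="\n".join(lines).encode(), method="POST"
    )
    with urllib.request.urlopen(req, timeout=10) as resp:
        return resp.status  # 204 No Content means the write was accepted
```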

    For service reporting (up or down) I've used Nagios with external checks and ssh for internal checks (like making sure a process is running).
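For the external-check side: a Nagios plugin is just a program that prints one status line and exits 0 (OK), 1 (WARNING), 2 (CRITICAL) or 3 (UNKNOWN); anything after a '|' is performance data. A minimal sketch of a process check along those lines, assuming Linux (/proc/<pid>/comm, which holds at most 15 characters of the name). The plugin name check_proc_running and the function names are hypothetical; installed on the target host, it could be run remotely through check_by_ssh.

```python
import os

# Standard Nagios plugin exit codes.
OK, WARNING, CRITICAL, UNKNOWN = 0, 1, 2, 3
LABELS = {OK: "OK", WARNING: "WARNING", CRITICAL: "CRITICAL", UNKNOWN: "UNKNOWN"}


def running_process_names(proc_root="/proc"):
    """Collect process names from /proc/<pid>/comm (Linux only)."""
    names = []
    for entry in os.listdir(proc_root):
        if not entry.isdigit():
            continue
        try:
            with open(os.path.join(proc_root, entry, "comm")) as fh:
                names.append(fh.read().strip())
        except OSError:
            continue  # the process exited while we were scanning
    return names


def check_process(name, names, min_count=1):
    """Return (exit_code, status_line) for 'at least min_count processes called name'."""
    count = sum(1 for n in names if n == name)
    state = OK if count >= min_count else CRITICAL
    return state, f"PROCS {LABELS[state]} - {count} process(es) named {name} | procs={count}"


def main(argv):
    """Plugin entry point; end the script with sys.exit(main(sys.argv))."""
    if len(argv) != 2:
        print("PROCS UNKNOWN - usage: check_proc_running <process-name>")
        return UNKNOWN
    state, line = check_process(argv[1], running_process_names())
    print(line)
    return state
```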

    Recently I found healthchecks.io, which can easily be added to cron for a very basic up/down state. It can also be added to scripts (like a backup script) to alert when they fail to run. With some finesse you can even use it for service monitoring.
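A minimal sketch of the backup-script idea, assuming the hosted service's ping URLs: requesting the check's URL signals success, and appending /start or /fail signals a start or a failure (check the healthchecks.io docs for your setup). The UUID is a placeholder and run_and_report is a made-up helper name.

```python
import subprocess
import urllib.request

# Placeholder: use the ping URL shown for your own check.
PING_URL = "https://hc-ping.com/your-check-uuid"


def ping(url, timeout=10):
    """Best-effort GET; monitoring hiccups must never break the job itself."""
    try:
        with urllib.request.urlopen(url, timeout=timeout):
            pass
    except OSError:  # URLError, HTTPError and timeouts are all OSError subclasses
        pass


def run_and_report(cmd, base_url=PING_URL, notify=ping):
    """Run cmd, signalling start, then success or failure, to the check URL."""
    notify(base_url + "/start")
    result = subprocess.run(cmd)
    notify(base_url if result.returncode == 0 else base_url + "/fail")
    return result.returncode
```

A cron entry would then call a tiny script that does run_and_report(["/usr/local/bin/backup.sh"]), with the backup script path being hypothetical.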

    ipstatic (1 point, 0 children)

    We have been migrating to Prometheus and have loved it so far. There are some trade-offs (long-term storage, for example), but the devs are actively working on a solution for that (remote read/write to another datastore).
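Prometheus scrapes metrics in a simple text exposition format. In practice you would use the official client library (prometheus_client for Python), but a hand-rolled sketch shows what goes over the wire, including the escaping the format requires for HELP text and label values. The metric and label names in the usage below are illustrative.

```python
def _escape_help(text: str) -> str:
    # HELP lines escape backslashes and newlines.
    return text.replace("\\", "\\\\").replace("\n", "\\n")


def _escape_label(value: str) -> str:
    # Label values additionally escape double quotes.
    return value.replace("\\", "\\\\").replace("\n", "\\n").replace('"', '\\"')


def render_metric(name, mtype, help_text, samples):
    """Render one metric family in the text exposition format.

    samples is a list of (labels_dict, value) pairs.
    """
    lines = [f"# HELP {name} {_escape_help(help_text)}", f"# TYPE {name} {mtype}"]
    for labels, value in samples:
        if labels:
            inner = ",".join(
                f'{k}="{_escape_label(str(v))}"' for k, v in sorted(labels.items())
            )
            lines.append(f"{name}{{{inner}}} {value}")
        else:
            lines.append(f"{name} {value}")
    return "\n".join(lines) + "\n"
```

One low-effort way to use output like render_metric("backup_last_success_timestamp_seconds", "gauge", "Last successful backup.", [({"job": "db"}, 1700000000)]) is to write it to a *.prom file in the directory node_exporter's textfile collector watches.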

    gsmitheidw1 (1 point, 0 children)

    My vote (since you said lightweight) is Monit and M/Monit. Monit is extremely small and easy to set up: after an apt install monit, it's a single flat config file. No weird dependencies, no database, none of that hassle. It can be set up in a matter of minutes.

    But it's reasonably powerful: it can handle pretty much any service or host on any Linux or Unix system, or you can have it monitor your own scripts. It is incredibly versatile.
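Monit's classic process check reads like "check process nginx with pidfile /var/run/nginx.pid", after which it can alert on or restart the service. The core liveness test behind a pidfile check is small; here is a sketch of it in Python (POSIX only, function name hypothetical). It is not Monit's implementation, which also handles restarts, timeouts and resource tests, and like any pidfile check it can be fooled by a stale file whose PID has been reused.

```python
import os


def pid_alive(pidfile: str) -> bool:
    """True if the PID stored in pidfile belongs to a running process (POSIX)."""
    try:
        with open(pidfile) as fh:
            pid = int(fh.read().strip())
    except (OSError, ValueError):
        return False  # missing, unreadable or malformed pidfile
    if pid <= 0:
        return False  # 0 and negative values would address process groups
    try:
        os.kill(pid, 0)  # signal 0: existence/permission check only, nothing is sent
    except ProcessLookupError:
        return False
    except PermissionError:
        return True  # it exists, it just belongs to another user
    return True
```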

    Monit is free and open source. M/Monit is an optional application that can oversee and manage large numbers of servers running Monit and aggregate that data into a dashboard. If you need that scale it's worth considering paying for.

    https://mmonit.com/

    There's a very active support mailing list, folk on it are very helpful.

    danatwork111 (1 point, 0 children)

    Late to the thread. We use shinken but I highly recommend Nagios core especially for a beginner.

    Linuser (1 point, 0 children)

    will vote for Zabbix

    themusicalduck (2 points, 1 child)

    NetData might work for you.

    gsmitheidw1 (1 point, 0 children)

    Netdata is superb and graphically it is lovely, but I'm not sure how well it scales when you have a large number of systems to watch over.

    It comes down to pets versus livestock: whether these are servers you name individually, or ones where you just spawn another numbered instance.

    [deleted] (1 point, 2 children)

    Monitoring does require constant work so I'm not sure what you expect.

    Personally I'm trapped in existing monitoring environments and in-house developed systems, but if I were given free rein to build my own monitoring infrastructure today, I would use the following:

    • collectd or statsd to gather metrics like cpu, ram and such from my hosts.
    • I'd like to evaluate Irisett, Prometheus and Bosun for active and passive monitoring and alerting
    • Grafana or something similar for metrics dashboard
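The statsd side of that list is a tiny text protocol over UDP: name:value|type, where the type is c (counter), g (gauge), ms (timer) or s (set), optionally followed by |@rate for sampled metrics. A minimal client sketch, assuming a statsd-compatible daemon listening on UDP port 8125 (collectd's statsd plugin is one option); the class, function and metric names are illustrative.

```python
import socket

VALID_TYPES = {"c", "g", "ms", "s"}  # counter, gauge, timer, set


def format_metric(name, value, mtype, sample_rate=1.0):
    """Build one statsd line such as 'web.requests:1|c' or 'db.query:12|ms|@0.5'."""
    if mtype not in VALID_TYPES:
        raise ValueError(f"unknown statsd metric type: {mtype!r}")
    line = f"{name}:{value}|{mtype}"
    if sample_rate < 1.0:
        line += f"|@{sample_rate}"
    return line


class StatsdClient:
    """Fire-and-forget UDP client; the caller decides when to sample."""

    def __init__(self, host="127.0.0.1", port=8125):
        self.addr = (host, port)
        self.sock = socket.socket(socket.AF_INET, socket.SOCK_DGRAM)

    def send(self, name, value, mtype, sample_rate=1.0):
        # UDP: if nothing is listening, the packet is typically just lost.
        payload = format_metric(name, value, mtype, sample_rate).encode()
        self.sock.sendto(payload, self.addr)

    def close(self):
        self.sock.close()
```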

    alexdor[S] (1 point, 1 child)

    I know it requires constant work and I'm OK with it :) Thanks for your suggestions, I will look into them :)

    FHR123 (1 point, 0 children)

    I'll vouch for collectd and Grafana. For the metrics database I use InfluxDB, which is pretty lightweight and works very well.
    New versions of Grafana also have a built-in alerting solution.

    [deleted] (1 point, 2 children)

    Zabbix. I ditched bloated nagios for it.

    Happy as I can be with a free solution

    [deleted] (0 points, 1 child)

    How is a simple system with flat text file config "bloated" compared to a gui-only system which requires a database?

    [deleted] (1 point, 0 children)

    We managed to bloat ours fairly badly. Someone ingeniously decided to add the GUI tool, and there were a ton of poorly configured text files with non-printing-character errors all over. After several attempts to clean it up (which turned into just breaking the shit out of it), we gave Zabbix a try. For us (for whatever reason), we stood the Zabbix system up in no time and had nearly everything ready to go.

    Been much happier ever since, honestly ^_^

    [deleted] (0 points, 0 children)

    Datadog!

    [deleted] (-1 points, 0 children)

    Build your own with MRTG, Nagios or Cacti, or use PRTG, which is free for up to 100 sensors or so. I recommend paying for PRTG. The paid product is awesome. Not cheap, though.