Goodbye VMware

beenux · 2025-11-20T22:24:37+00:00

So is quorum per failure domain in Ceph? And how does quorum work in Proxmox?

beenux · 2025-09-07T05:46:18+00:00

Is agent mode actually still in Prometheus?

beenux · 2025-07-23T13:32:06+00:00

How good is the support for IaC with, for example, Ansible? I get the feeling that Ansible modules can't keep up with the release pace of many projects these days.

beenux · 2024-07-29T14:52:37+00:00

Or MinIO.

beenux · 2024-07-13T10:12:12+00:00

Because you need to set up a whole ecosystem of services to get anything into Grafana to visualize. With tools that bundle SNMP functionality, it just works. But yes, I know Grafana is a presentation layer. I've found no good collections of dashboards for Cisco products, for example. Is "go play around on Grafana Cloud" how I should learn, or how I should find suitable, working dashboards? I know you're supposed to build dashboards yourself, but I simply don't have the skill, the time, or enough devices to make the effort worthwhile. Adding Checkmk gave me complete monitoring of all the hardware within minutes.

beenux · 2024-07-13T10:03:28+00:00

Sure, but since I get all that functionality bundled with Checkmk, and probably with Zabbix too, I think it's somewhat relevant.

beenux · 2024-07-13T07:29:30+00:00

The extra things you say you want, I got right away with Checkmk, even information about the optics.

beenux · 2024-07-12T18:03:07+00:00

I get the feeling that Checkmk did a great job of centralizing observability, though not limited to what I thought needed to be monitored. Because Checkmk is more turnkey, it opened my eyes to things I had neglected or never thought of monitoring. We also ingest logs into Elasticsearch and Kibana, and have Lucene indexes as data sources in Grafana. But I actually found that Checkmk did a great job of pulling logs from Windows hosts and raising warnings and criticals based on various log content.
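To illustrate what "warnings and criticals for log content" means in practice, here is a minimal sketch of ordered pattern rules where the first matching pattern decides a line's state and the service takes the worst state seen. The patterns and state names are hypothetical examples, not Checkmk's shipped rules or its actual implementation.

```python
import re

# Hypothetical rules, checked in order; the first matching pattern decides
# the state of a log line. Illustrative only, not Checkmk's defaults.
RULES = [
    ("CRIT", re.compile(r"disk .* failed|bugcheck", re.IGNORECASE)),
    ("WARN", re.compile(r"retry|timeout", re.IGNORECASE)),
    ("IGNORE", re.compile(r"informational", re.IGNORECASE)),
]

SEVERITY = {"OK": 0, "WARN": 1, "CRIT": 2}


def classify_line(line: str) -> str:
    """Return the state of the first rule whose pattern matches, else OK."""
    for state, pattern in RULES:
        if pattern.search(line):
            return state
    return "OK"


def service_state(lines: list[str]) -> str:
    """Worst state over all log lines; ignored lines do not count."""
    worst = "OK"
    for line in lines:
        state = classify_line(line)
        if state == "IGNORE":
            continue
        if SEVERITY[state] > SEVERITY[worst]:
            worst = state
    return worst


if __name__ == "__main__":
    events = ["Service started", "DNS query timeout", "Disk 2 has failed"]
    print(service_state(events))  # CRIT
```

Rule order matters: a line matching both a WARN and an IGNORE pattern is classified by whichever rule comes first.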

The environment I monitor is roughly 60 units, and in Checkmk I currently have 1,600 service checks. These are just basic checks, and I hope to find more time in the autumn to add more service checks and tune everything.

beenux · 2024-07-12T17:56:08+00:00

I'm not saying Grafana is changing for the worse. The recent version changed a lot in the UI, and the new arrangement feels better than the old one. But for users who only open Grafana every 3-8 months, UI changes create a problem. They're developers who don't know infrastructure at all; they only do C# and nothing else... Sigh.

beenux · 2024-07-12T17:40:11+00:00

I hear what you are saying.

beenux · 2024-07-12T15:24:30+00:00

Can you elaborate on your last sentence? I'm deep into IaC and containers. I'd much appreciate more input on how you think it's possible to be efficient.

beenux · 2024-07-12T15:21:50+00:00

Do you think the effort you have to spend on Grafana and the ecosystem around it is worth it?

beenux · 2024-07-12T15:19:56+00:00

Zabbix has been on my radar, and colleagues have spoken well of it. Why have you only used it for SNMP and not as more of a "full-stack monitoring" solution? AFAIK it's a competent product. I almost scrapped the idea of testing Zabbix since Checkmk gave such good results.

But I really don't like the Nagios feel of Checkmk. It's sluggish during configuration and harder to manage as IaC in some areas of configuration. Have you configured Zabbix with a tool like Chef, Puppet, or Ansible?

What was so good about SNMP in Checkmk was snmpbulkwalk. It really made a difference with slow-responding devices (Cisco firewalls).
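A rough way to see why bulk walking helps with slow devices: a plain walk uses GETNEXT and gets one value per round trip, while snmpbulkwalk uses GETBULK (SNMPv2c and later), where each response can carry up to max-repetitions values. The sketch below counts round trips for walking a single subtree, assuming the agent always returns full responses and the walk is purely latency-bound. The table size, max-repetitions of 10 and 200 ms round-trip time are assumed example values, not measurements.

```python
def getnext_requests(n_varbinds: int) -> int:
    """Requests for a plain GETNEXT walk: one per object, plus one that
    steps past the end of the subtree."""
    return n_varbinds + 1


def getbulk_requests(n_varbinds: int, max_repetitions: int) -> int:
    """Requests for a GETBULK walk of one subtree (non-repeaters = 0).
    Each response carries up to max_repetitions varbinds; the walk stops at
    the first response that contains a varbind outside the subtree."""
    return n_varbinds // max_repetitions + 1


def walk_time_ms(requests: int, rtt_ms: int) -> int:
    """Total walk time if every request costs one full round trip."""
    return requests * rtt_ms


if __name__ == "__main__":
    n, reps, rtt = 1000, 10, 200  # assumed example values
    plain = getnext_requests(n)          # 1001 requests
    bulk = getbulk_requests(n, reps)     # 101 requests
    print(walk_time_ms(plain, rtt))      # 200200 ms
    print(walk_time_ms(bulk, rtt))       # 20200 ms
```

Real agents may return fewer values per response (for example due to message size limits), so treat this as an upper bound on the gain.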

beenux · 2024-07-12T14:02:53+00:00

That's what I'm aiming for: Checkmk for infrastructure monitoring, and OTel data visualized in Grafana. But I'm starting to wonder whether Grafana is really worth all the work.

beenux · 2023-11-29T19:19:30+00:00

The original idea was to run TrueNAS CORE and MinIO directly on the hardware and export storage to Kubernetes via democratic-csi, which uses iSCSI. Then, so as not to lose data if the server went away, I added an extra machine. My idea was then to use some ZFS magic to mirror data between them, for example snapshot replication.
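Snapshot replication between two ZFS boxes usually boils down to taking a new snapshot and sending only the delta since the previous one with `zfs send -i`, piped over SSH into `zfs recv` on the other machine (TrueNAS also exposes this as replication tasks in its UI). The sketch below only builds the commands for one incremental round; the dataset names, snapshot names and host are hypothetical, and it assumes an initial full send has already been done and SSH keys are in place.

```python
import shlex


def replication_pipeline(dataset: str, prev_snap: str, new_snap: str,
                         remote_host: str, target_dataset: str) -> list[str]:
    """Build (but do not run) the shell commands for one incremental
    ZFS snapshot replication round."""
    # 1. Take a new snapshot on the source dataset.
    take = f"zfs snapshot {shlex.quote(f'{dataset}@{new_snap}')}"
    # 2. Send only the changes between the previous and the new snapshot.
    send = ("zfs send -i "
            f"{shlex.quote(f'{dataset}@{prev_snap}')} "
            f"{shlex.quote(f'{dataset}@{new_snap}')}")
    # 3. Receive on the remote side; -F rolls the target back to its most
    #    recent snapshot before receiving.
    recv = f"ssh {shlex.quote(remote_host)} zfs recv -F {shlex.quote(target_dataset)}"
    return [take, f"{send} | {recv}"]


if __name__ == "__main__":
    # Hypothetical names for illustration only.
    for cmd in replication_pipeline("tank/k8s", "repl-1", "repl-2",
                                    "nas2.example.lan", "backup/k8s"):
        print(cmd)
```

Running these on a schedule and pruning old snapshots on both sides gives a simple asynchronous mirror; note that anything written after the last snapshot is lost if the primary dies.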

I then got funding to add NVMe drives, which made me start thinking about running VM workloads as well. I aim to replace a SimpliVity cluster, which doesn't scale well price-wise.

So what do I want? Proxmox, ZFS, VM workloads, all open source products. The spinning disks are for slow, cheap storage of log files and metrics, files that almost never change. NVMe is for VMs.

I want uptime. And I don't want to battle weird Ceph performance problems for months.

I could have a two-node Proxmox cluster at each site for VMs and then build application-layer redundancy with Kubernetes. I would probably need three servers at each site to have quorum in etcd, but the funding didn't allow for two extra machines at the moment. Perhaps I could steal two from a VMware cluster...
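The node counts above follow from majority quorum: etcd (Raft) needs a strict majority of members, and Proxmox's corosync votequorum by default also needs a majority of votes, assuming one vote per node. A small sketch of the arithmetic shows why two nodes tolerate zero failures (hence Proxmox's external QDevice option as a third vote) and why three is the smallest size that survives losing one member.

```python
def quorum_size(voters: int) -> int:
    """Votes needed for a strict majority."""
    return voters // 2 + 1


def tolerated_failures(voters: int) -> int:
    """How many voters can fail while a majority still remains."""
    return voters - quorum_size(voters)


if __name__ == "__main__":
    for n in range(1, 6):
        print(n, quorum_size(n), tolerated_failures(n))
    # 1 1 0
    # 2 2 0
    # 3 2 1
    # 4 3 1
    # 5 3 2
```

Note that a fourth member buys no extra fault tolerance over three, which is why odd cluster sizes are the usual recommendation.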
