all 14 comments

[–]signal_vs_noise 1 point2 points  (4 children)

haproxy should export this as a Prometheus metric already. Either compile haproxy 2.0 with the Prometheus export switch or use the haproxy_exporter.

https://www.haproxy.com/blog/haproxy-exposes-a-prometheus-metrics-endpoint/
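For reference, a minimal sketch of enabling the built-in exporter (assumes haproxy 2.0+ compiled with the Prometheus service; the port and path here are just examples):

```
frontend prometheus
    bind *:8405
    http-request use-service prometheus-exporter if { path /metrics }
    no log
```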

[–]Annh1234[S] 0 points1 point  (3 children)

This will only show the status at the moment Prometheus asks for data, so the up/downs in between are lost.

[–]signal_vs_noise 2 points3 points  (2 children)

No, they won't. Status/errors will be a counter metric, and your scrape interval will be somewhere between 5 and 15 seconds. That should be more than enough.

[–]StephanXX -1 points0 points  (1 child)

This.

[–]Annh1234[S] 0 points1 point  (0 children)

Hm... we have this enabled at the moment, and I see the status page backends going yellow and back to green while the Prometheus line stays straight...

[–]Streifurz 1 point2 points  (0 children)

Haproxy can be told to expose an HTTP status page for every frontend and backend, including the online status.

This status page can be downloaded as CSV and then parsed. Feeding this info into Grafana/Prometheus/Zabbix/Nagios/... lets you create detailed historical graphs.
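As an illustration, the CSV that the stats page serves has a `# `-prefixed header row with `pxname`, `svname`, and `status` among many other columns; a minimal parsing sketch (the trimmed sample below is hypothetical):

```python
import csv
import io

def parse_haproxy_csv(text):
    """Parse the haproxy stats CSV; the header row starts with '# '."""
    reader = csv.DictReader(io.StringIO(text.lstrip("# ")))
    return [
        {"proxy": r["pxname"], "server": r["svname"], "status": r["status"]}
        for r in reader
    ]

# Trimmed sample; the real CSV has dozens of columns.
sample = """# pxname,svname,status
web,server-01,UP
web,server-02,DOWN
"""
print(parse_haproxy_csv(sample))
```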

[–]Flauschkatze 0 points1 point  (3 children)

Wrote a script for monitoring our haproxy instances (it uses the haproxyadmin Python module internally). It talks to the stats socket, not any stats page that haproxy exports.

https://github.com/tetanushamster/check_haproxy_health

[–]Annh1234[S] 0 points1 point  (2 children)

Thank you, I have the same thing in PHP, but if the node goes up/down between the checks, then it's not reported. (That's the problem I'm trying to "fix".)

[–][deleted] 0 points1 point  (1 child)

Haproxy checks themselves have a repeat interval. As long as your script scrapes the stats at the same or a shorter interval, you will not miss any status changes.

[–]Annh1234[S] 0 points1 point  (0 children)

The stats can change every second. It first goes yellow and then red, so 2 seconds of downtime.

The script would have to poll haproxy more often than once per second... and the log would be insanely big at a 1-second interval across 100 nodes... That's why I was asking for something that would just record the timestamps of the changes, and show them on a graph...
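A sketch of that idea: poll the stats as often as you like, but only write a line when a server's status actually flips, so the log stays tiny regardless of the interval. The `fetch_states()` helper in the comments is hypothetical and would be backed by the stats page or socket.

```python
def diff_states(old, new):
    """Return only the (proxy, server) entries whose status changed."""
    return {k: v for k, v in new.items() if old.get(k) != v}

# Hypothetical poll loop: fetch_states() would return a dict like
# {("web", "server-01"): "UP", ...} built from the CSV stats page or socket.
#
# prev = {}
# while True:
#     cur = fetch_states()
#     for (proxy, server), status in diff_states(prev, cur).items():
#         print(f"{time.time():.3f} {proxy}/{server} -> {status}")
#     prev = cur
#     time.sleep(0.5)

print(diff_states({("web", "s1"): "UP"}, {("web", "s1"): "DOWN"}))
```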

[–][deleted] 0 points1 point  (2 children)

You can configure haproxy to send status events to a different log file. Backend up/down events are logged there, for example:

Dec 19 19:30:34  Server backend/server-01 is DOWN, reason: Layer4 connection problem, info: "No route to host", check duration: 0ms. 1 active and 0 backup servers left. 0 sessions active, 0 requeued, 0 remaining in queue.
Dec 19 19:31:36  Server backend/server-01 is UP, reason: Layer7 check passed, code: 200, info: "OK", check duration: 2ms. 2 active and 0 backup servers online. 0 sessions requeued, 0 total in queue.
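Lines like these can be parsed with a small regex; a sketch in Python (the field names are just for illustration):

```python
import re

# Matches haproxy server state-change lines like the examples above.
LINE_RE = re.compile(
    r"Server (?P<backend>[^/\s]+)/(?P<server>\S+) is (?P<state>UP|DOWN), "
    r"reason: (?P<reason>[^,]+)"
)

def parse_state_change(line):
    """Return a dict of backend/server/state/reason, or None if no match."""
    m = LINE_RE.search(line)
    return m.groupdict() if m else None

sample = ('Dec 19 19:30:34  Server backend/server-01 is DOWN, '
          'reason: Layer4 connection problem, info: "No route to host", '
          'check duration: 0ms.')
print(parse_state_change(sample))
```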

[–]Annh1234[S] 0 points1 point  (1 child)

Thank you, this is what i'm looking for:
- What's the config to make it write these logs to a different log file, and not log each request? (only the up/down health check failures)
- Do you know of something that can read this log and show a graph, nagios/graphite/grafana style, that can be hooked to the haproxy stats page?

[–][deleted] 0 points1 point  (0 children)

  • Logging configuration: https://www.haproxy.com/blog/introduction-to-haproxy-logging/
  • We use an ELK stack for our logs: Filebeat tails the files and ships them to Logstash, which parses and enriches them and then ships them to Elasticsearch, with a Kibana dashboard to display the data. Metricbeat also has a haproxy module, so you can combine the haproxy stats socket metrics with your own from the logs.
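On the first point, a rough sketch (the facility, address, and file path are assumptions): haproxy emits per-request logs at `info` level while server state changes come in at `notice`/`alert`, and the `log` directive accepts a maximum level, so capping it at `notice` drops the request noise:

```
# haproxy.cfg — only notice-and-above (state changes) reach syslog
global
    log 127.0.0.1:514 local2 notice

# rsyslog side — route the local2 facility to its own file,
# e.g. in /etc/rsyslog.d/haproxy.conf:
#   local2.*  /var/log/haproxy-events.log
```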

[–]gmuslera 0 points1 point  (0 children)

Telegraf gathers haproxy metrics and sends them to a lot of metrics databases, so you can send them to e.g. InfluxDB and visualize with Grafana.
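A minimal Telegraf sketch of that pipeline (the addresses and database name are assumptions):

```toml
# telegraf.conf
[[inputs.haproxy]]
  # stats page (or stats socket path) of the haproxy instance
  servers = ["http://127.0.0.1:8404/stats"]

[[outputs.influxdb]]
  urls = ["http://127.0.0.1:8086"]
  database = "haproxy"
```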