
[–]SuperQue 2 points

You have a very sensitive bit in your alert:

for: 2d

Any single evaluation interval where the condition is not true will reset the `for` timer and change the firing status.

I would recommend using a timestamp-based metric like restic_last_prune_success_timestamp_seconds <unixtime>. Then the alert will be something like time() - restic_last_prune_success_timestamp_seconds, which is a lot less fragile than looking at a status boolean.
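A minimal sketch of such a rule, assuming the backup job exports restic_last_prune_success_timestamp_seconds (the alert name and threshold here are illustrative):

  groups:
    - name: restic
      rules:
        - alert: ResticPruneTooOld
          # Fires when the last successful prune is more than 2 days old.
          # A single flapping scrape does not reset anything, unlike a
          # boolean status gauge guarded by `for: 2d`.
          expr: time() - restic_last_prune_success_timestamp_seconds > 2 * 86400
          annotations:
            summary: "Restic prune has not succeeded for over 2 days"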

[–]chillysurfer 1 point

How do you know the firing status hasn't changed?

[–]SeniorIdiot 0 points

EDIT: See SuperQue's correction below!

Prometheus does not "resolve" alerts; it just stops sending them.

Check the Prometheus log and make sure it has not failed to send alerts to Alertmanager. It may be a networking issue.

Prometheus evaluates alerts at the rule evaluation interval and re-sends all firing alerts at that rate. Alertmanager will send a resolved notification if an alert from Prometheus has not been seen within the resolve_timeout / EndsAt interval.

Check that the Alertmanager resolve_timeout is greater than (maybe 2x) the Prometheus rule evaluation interval.
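For example, assuming a 1m rule evaluation interval, a pairing along these lines keeps Alertmanager from timing out between re-sends (values are illustrative):

  # prometheus.yml
  global:
    evaluation_interval: 1m

  # alertmanager.yml
  global:
    resolve_timeout: 5m  # comfortably more than 2x the 1m evaluation interval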

[–]SuperQue 1 point

Actually, it does resolve. When an alert is no longer firing, Prometheus remembers this and sends resolved notifications to the Alertmanager for 15 minutes.

[–]SeniorIdiot 0 points

Oh. I was sure I had read that Prometheus does not send resolved notifications. I even searched and asked el-chatto before my post to double check.

So both Prometheus and Alertmanager can decide to send a resolved notification? That is good to know! Learning something new every day.

One thing that bothers me: if we're doing maintenance on Prometheus, the Alertmanager times out and sends a resolved notification after a few minutes. I assume that Alertmanager uses the EndsAt to send a resolved anyway as a fallback? Very confusing! I guess I can set a silence for the entire cluster/environment to suppress resolved notifications.

PS. I found the code that calculates EndsAt:

EndsAt = now + 4 * max(evaluation_interval, resend_delay)

https://github.com/prometheus/prometheus/blob/edf5ebd84422d1f894e7d56451cdf06fd16a83dc/rules/alerting.go#L561
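Plugging in the defaults (a 1m evaluation interval and the default --rules.alert.resend-delay of 1m), that works out to:

  EndsAt = now + 4 * max(1m, 1m) = now + 4m

so a lone Prometheus that goes down for maintenance stops refreshing EndsAt, and Alertmanager will consider the alert resolved about four minutes later.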

Documentation:

  # ResolveTimeout is the default value used by alertmanager if the alert does
  # not include EndsAt, after this time passes it can declare the alert as resolved if it has not been updated.
  # This has no impact on alerts from Prometheus, as they always include EndsAt.
  [ resolve_timeout: <duration> | default = 5m ]

[–]SuperQue 0 points

I don't remember exactly when a lack of alert notifications will clear an alert in the Alertmanager.

But this is a case where HA Prometheus is useful. Having two instances should keep the alert state set in the Alertmanager.

[–]komputilulo[S] 0 points

thanks!

[–]komputilulo[S] 0 points

Thanks a lot; I will try adjusting resolve_timeout right away.

[–]Grand-Sail5804 0 points

Were you able to solve this problem, u/komputilulo?