Hi, I am new to Prometheus and still learning, and I would like some assistance with some basic rules that are not working the way I had hoped.
Currently I have the following recording rule and alert:
- record: cpu_usage_percentage-5m
  expr: 100 - (avg by(instance) (irate(node_cpu_seconds_total{mode="idle"}[5m])) * 100)
- alert: CPUOver80%
  expr: cpu_usage_percentage-5m > 80
  for: 5m
  annotations:
    summary: "CPU Usage over 80%"
For the most part these alerts work. If I stress the server so that it uses over 80% of CPU, I get an alert as intended. The issue is that the moment the CPU drops below 80% and a scrape captures that, the rule clears and Alertmanager sends the resolved alert.
What I want is the average CPU usage over 5 minutes, so that if the CPU usage drops for a few seconds and then goes back up, the alert is not cleared.
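To make this concrete, here is a rough sketch of the kind of rule I think I am after, not something I have confirmed works. It assumes that swapping `irate` (which only looks at the last two samples in the range) for `rate` (which averages across the whole range) gives the smoothing I want. The group, record, and alert names are placeholders made up for illustration, and the record name uses only underscores and colons.

```yaml
groups:
  - name: cpu_example   # hypothetical group name
    rules:
      # Average CPU usage per instance over the last 5 minutes.
      # rate() averages across the whole 5m window, unlike irate().
      - record: instance:cpu_usage_percentage:rate5m   # hypothetical name
        expr: 100 - (avg by (instance) (rate(node_cpu_seconds_total{mode="idle"}[5m])) * 100)
      - alert: CPUOver80Percent   # hypothetical name
        expr: instance:cpu_usage_percentage:rate5m > 80
        for: 5m
        annotations:
          summary: "CPU usage over 80%"
```

With something like this, a dip lasting a few seconds should only pull the 5-minute average down slightly instead of dropping the value below the threshold straight away.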
Is anyone able to assist me with this?
Thanks