Update rules.yml

Samuel Berthe 2026-01-30 12:14:32 +01:00 committed by GitHub
parent 6179475625
commit 2c341445db


@ -273,6 +273,8 @@ groups:
description: OOM kill detected
query: "(increase(node_vmstat_oom_kill[30m]) > 0)"
severity: warning
comments: |
When a machine runs out of memory, the node exporter can become unresponsive for several minutes. Even if the system takes 15-20 minutes to recover, the alert should still trigger.
- name: Host EDAC Correctable Errors detected
description: 'Host {{ $labels.instance }} has had {{ printf "%.0f" $value }} correctable memory errors reported by EDAC in the last 5 minutes.'
query: "(increase(node_edac_correctable_errors_total[1m]) > 0)"
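
The reasoning in the new comment on the OOM kill rule can be illustrated with a small simulation. This is only a sketch under stated assumptions: `simple_increase` is a hypothetical helper that takes the last-minus-first sample inside the window (it ignores Prometheus's extrapolation and counter-reset handling), and the timeline (one scrape per minute, an 18-minute scrape gap after the OOM kill) is invented for illustration.

```python
def simple_increase(samples, now, window):
    """Last-minus-first value of the samples inside [now - window, now].

    Simplified stand-in for a range-vector increase: returns None when
    fewer than two samples fall in the window, since no increase can be
    computed from a single point.
    """
    in_window = [(t, v) for t, v in samples if now - window <= t <= now]
    if len(in_window) < 2:
        return None
    return in_window[-1][1] - in_window[0][1]


def alert_fires(samples, window, eval_times):
    """True if the simplified increase is > 0 at any evaluation time."""
    for now in eval_times:
        inc = simple_increase(samples, now, window)
        if inc is not None and inc > 0:
            return True
    return False


# Hypothetical timeline, in minutes:
#   t = 0..59    exporter scraped every minute, OOM counter at 0
#   t = 60..77   OOM kill happened, exporter unresponsive (no samples)
#   t = 78..119  exporter back, OOM counter now at 1
samples = [(t, 0) for t in range(0, 60)] + [(t, 1) for t in range(78, 120)]
eval_times = range(0, 120)

print("5m window fires: ", alert_fires(samples, 5, eval_times))
print("30m window fires:", alert_fires(samples, 30, eval_times))
```

In this simplified model, a 5-minute window never contains both a pre-OOM and a post-OOM sample, so the increase is missed; a 30-minute window spans the scrape gap and the alert triggers once the exporter recovers.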