mirror of
https://github.com/samber/awesome-prometheus-alerts.git
synced 2026-06-26 11:27:00 +08:00
Update rules.yml
This commit is contained in:
parent
6179475625
commit
2c341445db
1 changed files with 2 additions and 0 deletions
|
|
@ -273,6 +273,8 @@ groups:
|
|||
description: OOM kill detected
|
||||
query: "(increase(node_vmstat_oom_kill[30m]) > 0)"
|
||||
severity: warning
|
||||
comments: |
|
||||
When a machine runs out of memory, the node exporter can become unresponsive for several minutes. Even if the system takes 15–20 minutes to recover, the alert should still trigger.
|
||||
- name: Host EDAC Correctable Errors detected
|
||||
description: 'Host {{ $labels.instance }} has had {{ printf "%.0f" $value }} correctable memory errors reported by EDAC in the last 5 minutes.'
|
||||
query: "(increase(node_edac_correctable_errors_total[1m]) > 0)"
|
||||
|
|
|
|||
Loading…
Reference in a new issue