mirror of https://github.com/samber/awesome-prometheus-alerts.git
synced 2026-06-26 19:37:27 +08:00
Update rules.yml
parent 6179475625
commit 2c341445db
1 changed file with 2 additions and 0 deletions
@@ -273,6 +273,8 @@ groups:
       description: OOM kill detected
       query: "(increase(node_vmstat_oom_kill[30m]) > 0)"
       severity: warning
+      comments: |
+        When a machine runs out of memory, the node exporter can become unresponsive for several minutes. Even if the system takes 15–20 minutes to recover, the alert should still trigger.
     - name: Host EDAC Correctable Errors detected
       description: 'Host {{ $labels.instance }} has had {{ printf "%.0f" $value }} correctable memory errors reported by EDAC in the last 5 minutes.'
       query: "(increase(node_edac_correctable_errors_total[1m]) > 0)"
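For context, the entry touched by this commit could be rendered as a Prometheus alerting rule roughly as follows. This is a sketch only: the group name `host`, the alert name `HostOomKillDetected`, the `for: 0m` setting, and the `summary` annotation are assumptions for illustration and are not part of this diff. Only the expression, severity, and description come from `rules.yml`. The 30-minute range is what the added comment justifies: `increase()` compares counter samples across the whole window, so a kill can still be detected after scrapes resume, provided samples from both before and after the gap fall inside those 30 minutes.

```yaml
# Hypothetical rendering of the rules.yml entry as a Prometheus rule file.
groups:
  - name: host                        # assumed group name
    rules:
      - alert: HostOomKillDetected    # assumed alert name
        # Expression taken verbatim from the "query" field in the diff.
        expr: (increase(node_vmstat_oom_kill[30m]) > 0)
        for: 0m                       # assumed: fire as soon as the expression matches
        labels:
          severity: warning
        annotations:
          summary: Host OOM kill detected (instance {{ $labels.instance }})
          description: "OOM kill detected"
```

A shorter range would be more likely to miss the event, since the exporter may not answer scrapes until well after the kill.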