mirror of https://github.com/samber/awesome-prometheus-alerts.git
synced 2026-06-25 02:46:59 +08:00
fix: narrow systemd unit inactive query to reduce noise
Add type="service" and name filter to the inactive unit alert to avoid false positives from legitimately inactive units.
parent 9640433efb
commit ce08b16265
1 changed file with 2 additions and 3 deletions
@@ -823,12 +823,11 @@ groups:
   for: 5m
 - name: Systemd unit inactive
   description: "Systemd unit {{ $labels.name }} is inactive. (instance {{ $labels.instance }})"
-  query: 'systemd_unit_state{state="inactive"} == 1'
+  query: 'systemd_unit_state{state="inactive", type="service", name=~"your-critical-service.+"} == 1'
   severity: warning
   for: 5m
   comments: |
-    Many units are legitimately inactive. Filter by unit name to avoid noise, e.g.:
-    systemd_unit_state{state="inactive", name=~"your-critical-service.+"} == 1
+    Many units are legitimately inactive. You must adjust the name=~ filter to match your critical services.
 - name: Systemd service crash looping
   description: "Systemd service {{ $labels.name }} has restarted {{ $value }} times in the last hour. (instance {{ $labels.instance }})"
   query: 'increase(systemd_service_restart_total[1h]) > 5'
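The `your-critical-service.+` pattern in the new query is a placeholder and matches nothing useful until it is edited. Below is a minimal sketch of what an adjusted rule entry might look like. The unit names `nginx.service` and `postgresql.service` are hypothetical and stand in for whatever services matter on your hosts, and the indentation is illustrative rather than copied from the file. Prometheus regex matchers are fully anchored, so the pattern has to cover the whole `name` label value, including the `.service` suffix.

```yaml
- name: Systemd unit inactive
  description: "Systemd unit {{ $labels.name }} is inactive. (instance {{ $labels.instance }})"
  # Hypothetical unit names: replace them with your own critical services.
  # The regex is anchored at both ends, so it must match the full label value.
  # [.] matches a literal dot without needing backslash escapes inside the quoted string.
  query: 'systemd_unit_state{state="inactive", type="service", name=~"(nginx|postgresql)[.]service"} == 1'
  severity: warning
  for: 5m
```

Listing units explicitly keeps the alert quiet for one-shot or timer-driven services that are expected to sit in the inactive state between runs.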