fix: narrow systemd unit inactive query to reduce noise

Add type="service" and name filter to the inactive unit alert
to avoid false positives from legitimately inactive units.
Samuel Berthe 2026-03-16 14:01:08 +01:00
parent 9640433efb
commit ce08b16265

@@ -823,12 +823,11 @@ groups:
   for: 5m
 - name: Systemd unit inactive
   description: "Systemd unit {{ $labels.name }} is inactive. (instance {{ $labels.instance }})"
-  query: 'systemd_unit_state{state="inactive"} == 1'
+  query: 'systemd_unit_state{state="inactive", type="service", name=~"your-critical-service.+"} == 1'
   severity: warning
   for: 5m
   comments: |
-    Many units are legitimately inactive. Filter by unit name to avoid noise, e.g.:
-    systemd_unit_state{state="inactive", name=~"your-critical-service.+"} == 1
+    Many units are legitimately inactive. You must adjust the name=~ filter to match your critical services.
 - name: Systemd service crash looping
   description: "Systemd service {{ $labels.name }} has restarted {{ $value }} times in the last hour. (instance {{ $labels.instance }})"
   query: 'increase(systemd_service_restart_total[1h]) > 5'