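The "Prometheus target missing with warmup time" query fails if instance names are not unique across jobs. This fixes it by changing the vector match from group_right(job) to group_left (__name__), so that several `up` series may share one instance.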

Evi Vanoost 2024-07-02 13:49:12 -04:00
parent 51d0484bb4
commit 54e2b09b3d


@@ -27,7 +27,7 @@ groups:
     severity: critical
   - name: Prometheus target missing with warmup time
     description: Allow a job time to start up (10 minutes) before alerting that it's down.
-    query: "sum by (instance, job) ((up == 0) * on (instance) group_right(job) (node_time_seconds - node_boot_time_seconds > 600))"
+    query: "sum by (instance, job) ((up == 0) * on (instance) group_left (__name__) (node_time_seconds - node_boot_time_seconds > 600))"
     severity: critical
   - name: Prometheus configuration reload failure
     description: Prometheus configuration reload error
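
The failure described in the commit message can be reproduced with a promtool unit test. The fragment below is a minimal sketch, not part of this commit: the instance name `host1`, the job names `app1`, `app2` and `node`, and all sample values are hypothetical. It assumes two jobs share one instance label and are both down while node_exporter reports 900 seconds of uptime. With the old `group_right(job)` form, the two `up` series sit on the "one" side of the match, so evaluation should be rejected as many-to-many matching. With the new form they sit on the "many" side and each should yield a sample.

```yaml
# Hypothetical promtool unit test; all series names and values are illustrative.
evaluation_interval: 1m
tests:
  - interval: 1m
    input_series:
      # Two jobs scraped under the same instance label, both down.
      - series: 'up{instance="host1", job="app1"}'
        values: '0x20'
      - series: 'up{instance="host1", job="app2"}'
        values: '0x20'
      # node_exporter metrics for that instance: 1000 - 100 = 900 s of uptime.
      - series: 'node_time_seconds{instance="host1", job="node"}'
        values: '1000x20'
      - series: 'node_boot_time_seconds{instance="host1", job="node"}'
        values: '100x20'
    promql_expr_test:
      # The new query from this commit.
      - expr: 'sum by (instance, job) ((up == 0) * on (instance) group_left (__name__) (node_time_seconds - node_boot_time_seconds > 600))'
        eval_time: 10m
        exp_samples:
          # One sample per down job; the value is up (0) times uptime (900).
          - labels: '{instance="host1", job="app1"}'
            value: 0
          - labels: '{instance="host1", job="app2"}'
            value: 0
```

Assuming the fragment is saved as `warmup_test.yml` (a hypothetical file name), it could be run with `promtool test rules warmup_test.yml`.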