From 9d5cc9364f4e922cc696af53c2e8269d2b9533c1 Mon Sep 17 00:00:00 2001
From: Samuel Berthe
Date: Mon, 16 Mar 2026 14:36:16 +0100
Subject: [PATCH] feat: add cert-manager alerting rules (4 rules)

Add Prometheus alerting rules for cert-manager under the "Network,
security and storage" category:

- Cert-Manager absent (service down detection)
- Certificate expiring soon (21-day threshold)
- Certificate not ready (readiness check)
- Hitting ACME rate limits (rate limit detection)

Based on imusmanmalik/cert-manager-mixin and official cert-manager
metrics documentation.
---
 _data/rules.yml | 29 +++++++++++++++++++++++++++++
 1 file changed, 29 insertions(+)

diff --git a/_data/rules.yml b/_data/rules.yml
index 4483ddf..2dfbf07 100644
--- a/_data/rules.yml
+++ b/_data/rules.yml
@@ -3638,6 +3638,35 @@ groups:
                 query: ssl_verified_cert_not_after{chain_no="0"} - time() < 86400 * 7
                 severity: warning
 
+      - name: cert-manager
+        exporters:
+          - name: Embedded exporter
+            slug: embedded-exporter
+            doc_url: https://cert-manager.io/docs/devops-tips/prometheus-metrics/
+            rules:
+              - name: Cert-Manager absent
+                description: Cert-Manager has disappeared from Prometheus service discovery. New certificates will not be able to be minted, and existing ones can't be renewed until cert-manager is back.
+                query: 'absent(up{job="cert-manager"})'
+                severity: critical
+                for: 10m
+              - name: Cert-Manager certificate expiring soon
+                description: The certificate {{ $labels.name }} is expiring in less than 21 days.
+                query: 'avg by (exported_namespace, namespace, name) (certmanager_certificate_expiration_timestamp_seconds - time()) < (21 * 24 * 3600)'
+                severity: warning
+                for: 1h
+                comments: |
+                  Threshold of 21 days is a rough default. ACME certificates are typically renewed 30 days before expiry, so expiring within 21 days may indicate issuer misconfiguration.
+              - name: Cert-Manager certificate not ready
+                description: "The certificate {{ $labels.name }} in namespace {{ $labels.exported_namespace }} is not ready to serve traffic."
+                query: 'max by (name, exported_namespace, namespace, condition) (certmanager_certificate_ready_status{condition!="True"} == 1)'
+                severity: critical
+                for: 10m
+              - name: Cert-Manager hitting ACME rate limits
+                description: Cert-Manager is being rate-limited by the ACME provider. Certificate issuance and renewal may be blocked for up to a week.
+                query: 'sum by (host) (rate(certmanager_http_acme_client_request_count{status="429"}[5m])) > 0'
+                severity: critical
+                for: 5m
+
       - name: Juniper
         exporters:
           - name: czerwonk/junos_exporter
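
Note for reviewers: the entries above use the site's own data schema (`name`, `query`, `severity`, `for`), not the Prometheus rule-file schema. As a rough sketch of how the "certificate expiring soon" entry might look once rendered as a standard Prometheus alerting rule, assuming the usual `alert`/`expr`/`labels`/`annotations` layout. The alert name `CertManagerCertificateExpiringSoon` and the `summary` text are hypothetical and not taken from this patch:

```yaml
groups:
  - name: cert-manager
    rules:
      # Hypothetical alert name; the site generates its own from the rule name.
      - alert: CertManagerCertificateExpiringSoon
        # Remaining lifetime in seconds, compared against
        # 21 * 24 * 3600 = 1814400 seconds (21 days).
        expr: avg by (exported_namespace, namespace, name) (certmanager_certificate_expiration_timestamp_seconds - time()) < (21 * 24 * 3600)
        # The condition must hold for a full hour before the alert fires.
        for: 1h
        labels:
          severity: warning
        annotations:
          # Hypothetical summary line, added for illustration only.
          summary: Cert-Manager certificate expiring soon
          description: "The certificate {{ $labels.name }} is expiring in less than 21 days."
```

Because the query aggregates with `avg by (exported_namespace, namespace, name)`, only those three labels are available to the annotation templates; a label such as `instance` would render empty here.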