Feature/cert manager rules (#524)

* Add .worktrees/ to .gitignore

* feat: add cert-manager alerting rules (4 rules)

Add Prometheus alerting rules for cert-manager under the
"Network, security and storage" category:
- Cert-Manager absent (service down detection)
- Certificate expiring soon (21-day threshold)
- Certificate not ready (readiness check)
- Hitting ACME rate limits (rate limit detection)

Based on imusmanmalik/cert-manager-mixin and official
cert-manager metrics documentation.

* docs: add cert-manager to README
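
As a rough illustration of the 21-day expiry threshold mentioned above, the comparison `certmanager_certificate_expiration_timestamp_seconds - time() < 21 * 24 * 3600` can be sketched in plain Python. The function and constant names below are hypothetical, chosen only for this example, and the sketch ignores the rule's `avg by (...)` aggregation and its `for: 1h` hold period.

```python
# Minimal sketch of the expiry check; names are illustrative, not part of the commit.
SECONDS_PER_DAY = 24 * 3600
THRESHOLD_SECONDS = 21 * SECONDS_PER_DAY  # same value as 21 * 24 * 3600 in the query


def expiring_soon(not_after: float, now: float,
                  threshold: float = THRESHOLD_SECONDS) -> bool:
    """Return True when the certificate expires in less than `threshold` seconds.

    `not_after` stands in for certmanager_certificate_expiration_timestamp_seconds
    and `now` for PromQL's time(); both are Unix timestamps in seconds.
    """
    return (not_after - now) < threshold


# Arbitrary reference time, used only to make the example deterministic.
now = 1_000_000_000.0
in_20_days = now + 20 * SECONDS_PER_DAY
in_30_days = now + 30 * SECONDS_PER_DAY

print(expiring_soon(in_20_days, now))  # inside the 21-day window
print(expiring_soon(in_30_days, now))  # outside the 21-day window
```

A certificate that is renewed on schedule should never get this close to expiry, which is why crossing the threshold is treated as a sign of a renewal problem rather than routine noise.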
Samuel Berthe 2026-03-16 15:01:07 +01:00 committed by GitHub
parent 7f346ede99
commit d8315eb3bc
GPG key ID: B5690EEEBB952194
2 changed files with 30 additions and 0 deletions


@@ -115,6 +115,7 @@ Collection available here: **[https://samber.github.io/awesome-prometheus-alerts
- [OpenEBS](https://samber.github.io/awesome-prometheus-alerts/rules#openebs)
- [Minio](https://samber.github.io/awesome-prometheus-alerts/rules#minio)
- [SSL/TLS](https://samber.github.io/awesome-prometheus-alerts/rules#ssl/tls)
- [cert-manager](https://samber.github.io/awesome-prometheus-alerts/rules#cert-manager)
- [Juniper](https://samber.github.io/awesome-prometheus-alerts/rules#juniper)
- [CoreDNS](https://samber.github.io/awesome-prometheus-alerts/rules#coredns)
- [FreeSwitch](https://samber.github.io/awesome-prometheus-alerts/rules#freeswitch)


@@ -3684,6 +3684,35 @@ groups:
                query: ssl_verified_cert_not_after{chain_no="0"} - time() < 86400 * 7
                severity: warning
      - name: cert-manager
        exporters:
          - name: Embedded exporter
            slug: embedded-exporter
            doc_url: https://cert-manager.io/docs/devops-tips/prometheus-metrics/
            rules:
              - name: Cert-Manager absent
                description: Cert-Manager has disappeared from Prometheus service discovery. New certificates will not be able to be minted, and existing ones can't be renewed until cert-manager is back.
                query: 'absent(up{job="cert-manager"})'
                severity: critical
                for: 10m
              - name: Cert-Manager certificate expiring soon
                description: The certificate {{ $labels.name }} is expiring in less than 21 days.
                query: 'avg by (exported_namespace, namespace, name) (certmanager_certificate_expiration_timestamp_seconds - time()) < (21 * 24 * 3600)'
                severity: warning
                for: 1h
                comments: |
                  Threshold of 21 days is a rough default. ACME certificates are typically renewed 30 days before expiry, so expiring within 21 days may indicate issuer misconfiguration.
              - name: Cert-Manager certificate not ready
                description: "The certificate {{ $labels.name }} in namespace {{ $labels.exported_namespace }} is not ready to serve traffic."
                query: 'max by (name, exported_namespace, namespace, condition) (certmanager_certificate_ready_status{condition!="True"} == 1)'
                severity: critical
                for: 10m
              - name: Cert-Manager hitting ACME rate limits
                description: Cert-Manager is being rate-limited by the ACME provider. Certificate issuance and renewal may be blocked for up to a week.
                query: 'sum by (host) (rate(certmanager_http_acme_client_request_count{status="429"}[5m])) > 0'
                severity: critical
                for: 5m
      - name: Juniper
        exporters:
          - name: czerwonk/junos_exporter
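
For readers who want to use one of the new entries directly, the sketch below shows one possible way to map a rule entry (`name`, `description`, `query`, `severity`, `for`) onto the fields of a standard Prometheus alerting rule (`alert`, `expr`, `for`, `labels`, `annotations`). The helper names `alert_name` and `to_alert_rule`, and the CamelCase naming scheme, are assumptions made for this example; the project's own rendering may differ.

```python
import re


def alert_name(title: str) -> str:
    """Build a CamelCase alert name from a rule title (hypothetical scheme)."""
    words = re.findall(r"[A-Za-z0-9]+", title)
    return "".join(word[:1].upper() + word[1:] for word in words)


def to_alert_rule(entry: dict) -> dict:
    """Map a rule entry from the diff onto Prometheus alerting-rule fields."""
    rule = {
        "alert": alert_name(entry["name"]),
        "expr": entry["query"],
        "labels": {"severity": entry["severity"]},
        "annotations": {
            "summary": entry["name"],
            "description": entry["description"],
        },
    }
    # `for` is optional in Prometheus; copy it only when the entry sets it.
    if "for" in entry:
        rule["for"] = entry["for"]
    return rule


# Entry copied from the ACME rate-limit rule added in this commit.
entry = {
    "name": "Cert-Manager hitting ACME rate limits",
    "description": "Cert-Manager is being rate-limited by the ACME provider. "
                   "Certificate issuance and renewal may be blocked for up to a week.",
    "query": 'sum by (host) (rate(certmanager_http_acme_client_request_count'
             '{status="429"}[5m])) > 0',
    "severity": "critical",
    "for": "5m",
}

rule = to_alert_rule(entry)
print(rule["alert"], rule["for"], rule["labels"]["severity"])
```

The resulting dictionary can be serialized with any YAML library and placed under a `groups[].rules` list in a Prometheus rule file.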