79 commits

Author SHA1 Message Date
Samuel Berthe
88e2c19017
feat: add Keycloak alerting rules (aerogear/keycloak-metrics-spi) (#517)
* feat: add Keycloak alerting rules (aerogear/keycloak-metrics-spi)

* fix: correct Keycloak metrics-spi metric names and query grouping
2026-03-16 04:40:15 +01:00
Samuel Berthe
20651aa10d
feat: add OpenStack alerting rules (openstack-exporter) (#515)
* feat: add OpenStack alerting rules (openstack-exporter)

Add 20 alerting rules for openstack-exporter/openstack-exporter covering
Nova, Neutron, Cinder, Octavia, and Placement services.

* docs: add OpenStack to README services list

* fix: align OpenStack load balancer alert name with operating_status semantics

The operating_status label uses ONLINE/OFFLINE/DEGRADED/ERROR values,
not ACTIVE. Rename alert to "not online" and use the label in the
description for clarity.
2026-03-16 03:43:51 +01:00
Samuel Berthe
bf7b902881
feat: add process-exporter alerting rules (ncabatoff/process-exporter) (#514)
* feat: add process-exporter alerting rules (ncabatoff/process-exporter)

* docs: add Process to README services list

* fix: address PR review feedback for process-exporter rules

- Rename service from "Process" to "Process Exporter" for clarity
- Fix grammar: "file descriptors usage" → "file descriptor usage"
- Clarify CPU alert description as core-equivalent percentage
- Rename "high disk IO" to "high disk write IO" for accuracy
2026-03-16 03:31:18 +01:00
Samuel Berthe
2b239736cf
feat: add alerting rules for prometheus/memcached_exporter (#512) 2026-03-16 03:25:38 +01:00
Samuel Berthe
f97f692596
feat: add Proxmox VE alerting rules (prometheus-pve-exporter) (#509)
Add 9 alerting rules for Proxmox VE covering node/guest status,
CPU, memory, storage, backup coverage, replication, and cluster quorum.
2026-03-16 03:12:06 +01:00
Samuel Berthe
be7a2e4d5d
feat: add IPMI exporter alerting rules (#510)
* feat: add IPMI exporter alerting rules

Add 17 alerting rules for prometheus-community/ipmi_exporter covering
temperature, fan, voltage, current, power sensors, chassis status,
and system event log monitoring.

* docs: add IPMI to README service list

* Apply suggestions from code review

Co-authored-by: Copilot Autofix powered by AI <175728472+Copilot@users.noreply.github.com>
2026-03-16 03:10:10 +01:00
Samuel Berthe
c064d2264e
feat: add Envoy proxy alerting rules using built-in metrics (#511)
Add 19 alerting rules for Envoy proxy under "Reverse proxies and load
balancers" using native metrics from /stats/prometheus endpoint.

Covers: server health, HTTP error rates (downstream/upstream), connection
saturation, cluster membership, health checks, outlier detection,
SSL/TLS certificate expiry, circuit breakers, and request timeouts.
2026-03-16 03:03:57 +01:00
Samuel Berthe
89e703d763
feat: add alerting rules for cloudflare/ebpf_exporter (#508)
* feat: add alerting rules for cloudflare/ebpf_exporter

* docs: add eBPF to README service list
2026-03-16 02:56:04 +01:00
Samuel Berthe
3db9281508
feat: add SNMP exporter alerting rules (#507)
Add 7 alerting rules for prometheus/snmp_exporter covering device
availability, interface status, error rates, bandwidth utilization,
and device restarts. Rules use standard IF-MIB and SNMPv2-MIB metrics.
2026-03-16 02:34:34 +01:00
Samuel Berthe
8f462ce962
adding claude.md 2026-03-15 19:59:01 +01:00
Samuel Berthe
080a792777
data: adding python/ruby/golang (#502)
* data: adding python/ruby/golang

* fix: address review feedback on runtime alerts

- JVM non-heap: guard against unbounded metaspace (max_bytes = -1)
- JVM old gen GC: note regex only matches CMS/G1/Parallel collectors
- JVM/Python file descriptors: note process_* metrics are generic
- Go memory usage: fix description (sys_bytes is runtime memory, not host)
- Go goroutine spike: use deriv() instead of rate() on gauge
- Go GC CPU fraction: note deprecation since Go 1.20
- Go GC duration: clarify quantile="1" is max, not p99
- Python uncollectable: use increase() on counter instead of raw threshold
- Add threshold comments for workload-dependent defaults
2026-03-15 19:46:39 +01:00
Samuel Berthe
f0107caf9e
Update README.md 2026-01-15 12:33:35 +01:00
Samuel Berthe
65551ae19f
Update README.md 2026-01-15 02:42:42 +01:00
Samuel Berthe
2b5c8b0ec7
Update README.md 2026-01-15 02:39:24 +01:00
Samuel Berthe
d0d1b00a7b
Fix typo in OpenTelemetry Collector link 2025-11-05 17:15:10 +01:00
Samuel Berthe
e617c07179
Update README.md 2025-11-05 17:14:47 +01:00
Samuel Berthe
dfac84209d
Update README.md 2025-09-01 15:41:07 +02:00
Samuel Berthe
4be87d7796
Update README.md 2025-05-03 22:53:51 +02:00
Felix Bühler
10d00c66da
Add caddy.yml (#450) 2025-02-04 14:23:14 +01:00
Samuel Berthe
fff8a80ae5
Update README.md 2024-12-08 21:24:45 +01:00
Samuel Berthe
b6a6c2e313
Update README.md 2024-07-02 09:33:01 +02:00
Samuel Berthe
847143ecc9
Update README.md 2024-05-13 10:42:04 +02:00
Samuel Berthe
85b102df08
Welcome @betterstack-community ✌️ 2024-03-21 16:25:24 +01:00
Samuel Berthe
854688d17a
Update README.md 2024-02-09 20:24:10 +01:00
josedev-union
c6ff5a59dc
feat: Add rules for Graph Node (#387)
Co-authored-by: josedev-union <josedev-union@users.noreply.github.com>
2024-01-20 20:33:26 +01:00
Samuel Berthe
32a097836a
Update README.md 2023-10-06 18:48:38 +02:00
Samuel Berthe
b19b403862
Update README.md 2023-08-15 20:05:13 +02:00
Samuel Berthe
5b6a86fa00
Update README.md 2023-08-15 20:03:06 +02:00
Samuel Berthe
ab7e29cfc0
Update README.md 2023-08-15 20:01:45 +02:00
Samuel Berthe
9efec14d26
chore: move from "https://awesome-prometheus-alerts.grep.to" to "https://samber.github.io/awesome-prometheus-alerts/" 2023-04-23 23:32:26 +02:00
Samuel Berthe
6ba9eb104c
feat: adding cloudflare exporter (#310) 2022-10-03 16:57:24 +02:00
Yonah Dissen
55b049eb28
add argocd rules (#309)
* add argocd rules

* fix(argocd): move contrib into _data/rules.yml instead of dist/...

Co-authored-by: Samuel Berthe <dev@samuel-berthe.fr>
2022-10-02 18:05:30 +02:00
Samuel Berthe
4662cd2812
doc: improve pulsar doc 2022-06-07 01:29:31 +02:00
Samuel Berthe
37722256d5
Adding jenkins 2021-12-27 12:49:32 +01:00
Samuel Berthe
3ff969670d
Update README.md 2021-11-21 18:54:56 +01:00
Andre Martins
36ca52e598
adding alerts to promtail and loki (#241)
Co-authored-by: apmbktf <andre.pasqualinoto-martins@itau-unibanco.com.br>
Co-authored-by: Samuel Berthe <dev@samuel-berthe.fr>
2021-10-03 22:12:59 +02:00
Igor Churmeev
3612c9cc3e
Add alerts for Hashicorp Vault (#238)
Co-authored-by: Samuel Berthe <dev@samuel-berthe.fr>
2021-08-19 21:19:43 +02:00
Gjed
c2b8178304
Loki alerts (#218)
Co-authored-by: Samuel Berthe <dev@samuel-berthe.fr>
2021-07-04 23:59:46 +02:00
Samuel Berthe
b9f09e7f93
fix(freeswitch): move to the networking section 2021-05-01 18:53:04 +02:00
Samuel Berthe
0ee7f1266f
minor improvements for ssl exporter 2021-01-20 18:09:36 +01:00
Samuel Berthe
f7c25e648c
data: adding netdata 2021-01-08 23:26:57 +01:00
Samuel Berthe
549980fd68
adding vmware link to readme 2021-01-08 21:07:09 +01:00
Samuel Berthe
778e101030
adding alerts for Ceph 2020-03-17 18:50:36 +01:00
Samuel Berthe
0b89a764ee
Adding exporters: sidekiq, pgbouncer and thanos.
Adding rules to: prometheus, kubernetes, redis, docker and postgresql.
Arranging exporters into categories.
Showing number of rules.
Thanks to GitLab for open-sourcing alerting rules!
2020-03-09 21:18:56 +01:00
Samuel Berthe
8f515ceae2
Improves repo intro 2020-03-08 19:23:28 +01:00
Samuel Berthe
b5469f2a59
Doc: organizing sections 2020-03-08 17:39:49 +01:00
Samuel Berthe
7dbbbb0e09
Doc: organizing lb and reverse proxy 2020-03-08 16:10:33 +01:00
Samuel Berthe
c4d35090eb
Improves readme and contributing guidelines 2020-03-08 15:19:48 +01:00
Samuel Berthe
90a9a08b7c
Improves readme and contributing guidelines 2020-03-08 15:17:55 +01:00
Samuel Berthe
d19171f5c6
doc: adding disclaimer about alert thresholds 2020-03-07 20:06:11 +01:00