Commit graph

82 commits

Author SHA1 Message Date
Samuel Berthe
a65c0e43c2
feat: add Grafana Tempo and Grafana Mimir alerting rules (67 rules)
Add 18 Tempo rules and 49 Mimir rules based on official upstream mixins.
Covers ring health, compaction, TSDB, instance limits, ruler, alertmanager, and more.
2026-03-16 14:01:10 +01:00
Samuel Berthe
97aae5dabf
feat: add GitLab alerting rules (28 rules across 3 exporters) (#518)
Add new GitLab service under "Other" category with 3 exporters:
- Built-in exporter (18 rules): Puma, HTTP errors/latency, Sidekiq jobs,
  database connection pool, CI/CD pipelines, Ruby process health
- Workhorse (3 rules): HTTP error rate, latency, in-flight requests
- Gitaly (7 rules): gRPC errors, ResourceExhausted, RPC latency,
  CPU throttling, auth failures, circuit breaker

All metrics verified against gitlabhq/gitlabhq source code.
Several rules derived from GitLab Omnibus default alerting rules.
2026-03-16 04:48:52 +01:00
Samuel Berthe
e6cdcdb9e5
feat: add Apache Flink and Apache Spark alerting rules
Add 20 new alerting rules under the Runtimes category:
- Apache Flink (12 rules): job status, TaskManager registration, slot
  availability, restarts, checkpoints, backpressure, heap memory, GC,
  and record processing
- Apache Spark (8 rules): worker health, waiting apps, memory/cores
  exhaustion, executor GC, task failures, and disk spill
2026-03-16 04:46:00 +01:00
Samuel Berthe
88e2c19017
feat: add Keycloak alerting rules (aerogear/keycloak-metrics-spi) (#517)
* feat: add Keycloak alerting rules (aerogear/keycloak-metrics-spi)

* fix: correct Keycloak metrics-spi metric names and query grouping
2026-03-16 04:40:15 +01:00
Samuel Berthe
20651aa10d
feat: add OpenStack alerting rules (openstack-exporter) (#515)
* feat: add OpenStack alerting rules (openstack-exporter)

Add 20 alerting rules for openstack-exporter/openstack-exporter covering
Nova, Neutron, Cinder, Octavia, and Placement services.

* docs: add OpenStack to README services list

* fix: align OpenStack load balancer alert name with operating_status semantics

The operating_status label uses ONLINE/OFFLINE/DEGRADED/ERROR values,
not ACTIVE. Rename alert to "not online" and use the label in the
description for clarity.
2026-03-16 03:43:51 +01:00
Samuel Berthe
bf7b902881
feat: add process-exporter alerting rules (ncabatoff/process-exporter) (#514)
* feat: add process-exporter alerting rules (ncabatoff/process-exporter)

* docs: add Process to README services list

* fix: address PR review feedback for process-exporter rules

- Rename service from "Process" to "Process Exporter" for clarity
- Fix grammar: "file descriptors usage" → "file descriptor usage"
- Clarify CPU alert description as core-equivalent percentage
- Rename "high disk IO" to "high disk write IO" for accuracy
2026-03-16 03:31:18 +01:00
Samuel Berthe
2b239736cf
feat: add alerting rules for prometheus/memcached_exporter (#512) 2026-03-16 03:25:38 +01:00
Samuel Berthe
f97f692596
feat: add Proxmox VE alerting rules (prometheus-pve-exporter) (#509)
Add 9 alerting rules for Proxmox VE covering node/guest status,
CPU, memory, storage, backup coverage, replication, and cluster quorum.
2026-03-16 03:12:06 +01:00
Samuel Berthe
be7a2e4d5d
feat: add IPMI exporter alerting rules (#510)
* feat: add IPMI exporter alerting rules

Add 17 alerting rules for prometheus-community/ipmi_exporter covering
temperature, fan, voltage, current, power sensors, chassis status,
and system event log monitoring.

* docs: add IPMI to README service list

* Apply suggestions from code review

Co-authored-by: Copilot Autofix powered by AI <175728472+Copilot@users.noreply.github.com>

---------

Co-authored-by: Copilot Autofix powered by AI <175728472+Copilot@users.noreply.github.com>
2026-03-16 03:10:10 +01:00
Samuel Berthe
c064d2264e
feat: add Envoy proxy alerting rules using built-in metrics (#511)
Add 19 alerting rules for Envoy proxy under "Reverse proxies and load
balancers" using native metrics from /stats/prometheus endpoint.

Covers: server health, HTTP error rates (downstream/upstream), connection
saturation, cluster membership, health checks, outlier detection,
SSL/TLS certificate expiry, circuit breakers, and request timeouts.
2026-03-16 03:03:57 +01:00
Samuel Berthe
89e703d763
feat: add alerting rules for cloudflare/ebpf_exporter (#508)
* feat: add alerting rules for cloudflare/ebpf_exporter

* docs: add eBPF to README service list
2026-03-16 02:56:04 +01:00
Samuel Berthe
3db9281508
feat: add SNMP exporter alerting rules (#507)
Add 7 alerting rules for prometheus/snmp_exporter covering device
availability, interface status, error rates, bandwidth utilization,
and device restarts. Rules use standard IF-MIB and SNMPv2-MIB metrics.
2026-03-16 02:34:34 +01:00
Samuel Berthe
8f462ce962
adding claude.md 2026-03-15 19:59:01 +01:00
Samuel Berthe
080a792777
data: adding python/ruby/golang (#502)
* data: adding python/ruby/golang

* fix: address review feedback on runtime alerts

- JVM non-heap: guard against unbounded metaspace (max_bytes = -1)
- JVM old gen GC: note regex only matches CMS/G1/Parallel collectors
- JVM/Python file descriptors: note process_* metrics are generic
- Go memory usage: fix description (sys_bytes is runtime memory, not host)
- Go goroutine spike: use deriv() instead of rate() on gauge
- Go GC CPU fraction: note deprecation since Go 1.20
- Go GC duration: clarify quantile="1" is max, not p99
- Python uncollectable: use increase() on counter instead of raw threshold
- Add threshold comments for workload-dependent defaults
2026-03-15 19:46:39 +01:00
Samuel Berthe
f0107caf9e
Update README.md 2026-01-15 12:33:35 +01:00
Samuel Berthe
65551ae19f
Update README.md 2026-01-15 02:42:42 +01:00
Samuel Berthe
2b5c8b0ec7
Update README.md 2026-01-15 02:39:24 +01:00
Samuel Berthe
d0d1b00a7b
Fix typo in OpenTelemetry Collector link 2025-11-05 17:15:10 +01:00
Samuel Berthe
e617c07179
Update README.md 2025-11-05 17:14:47 +01:00
Samuel Berthe
dfac84209d
Update README.md 2025-09-01 15:41:07 +02:00
Samuel Berthe
4be87d7796
Update README.md 2025-05-03 22:53:51 +02:00
Felix Bühler
10d00c66da
Add caddy.yml (#450) 2025-02-04 14:23:14 +01:00
Samuel Berthe
fff8a80ae5
Update README.md 2024-12-08 21:24:45 +01:00
Samuel Berthe
b6a6c2e313
Update README.md 2024-07-02 09:33:01 +02:00
Samuel Berthe
847143ecc9
Update README.md 2024-05-13 10:42:04 +02:00
Samuel Berthe
85b102df08
Welcome @betterstack-community ✌️ 2024-03-21 16:25:24 +01:00
Samuel Berthe
854688d17a
Update README.md 2024-02-09 20:24:10 +01:00
josedev-union
c6ff5a59dc
feat: Add rules for Graph Node (#387)
Co-authored-by: josedev-union <josedev-union@users.noreply.github.com>
2024-01-20 20:33:26 +01:00
Samuel Berthe
32a097836a
Update README.md 2023-10-06 18:48:38 +02:00
Samuel Berthe
b19b403862
Update README.md 2023-08-15 20:05:13 +02:00
Samuel Berthe
5b6a86fa00
Update README.md 2023-08-15 20:03:06 +02:00
Samuel Berthe
ab7e29cfc0
Update README.md 2023-08-15 20:01:45 +02:00
Samuel Berthe
9efec14d26
chore: move from "https://awesome-prometheus-alerts.grep.to" to "https://samber.github.io/awesome-prometheus-alerts/" 2023-04-23 23:32:26 +02:00
Samuel Berthe
6ba9eb104c
feat: adding cloudflare exporter (#310) 2022-10-03 16:57:24 +02:00
Yonah Dissen
55b049eb28
add argocd rules (#309)
* add argocd rules

* fix(argocd): move contrib into _data/rules.yml instead of dist/...

Co-authored-by: Samuel Berthe <dev@samuel-berthe.fr>
2022-10-02 18:05:30 +02:00
Samuel Berthe
4662cd2812
doc: improve pulsar doc 2022-06-07 01:29:31 +02:00
Samuel Berthe
37722256d5
Adding jenkins 2021-12-27 12:49:32 +01:00
Samuel Berthe
3ff969670d
Update README.md 2021-11-21 18:54:56 +01:00
Andre Martins
36ca52e598
adding alerts to promtail and loki (#241)
Co-authored-by: apmbktf <andre.pasqualinoto-martins@itau-unibanco.com.br>
Co-authored-by: Samuel Berthe <dev@samuel-berthe.fr>
2021-10-03 22:12:59 +02:00
Igor Churmeev
3612c9cc3e
Add alerts for HashiCorp Vault (#238)
Co-authored-by: Samuel Berthe <dev@samuel-berthe.fr>
2021-08-19 21:19:43 +02:00
Gjed
c2b8178304
Loki alerts (#218)
Co-authored-by: Samuel Berthe <dev@samuel-berthe.fr>
2021-07-04 23:59:46 +02:00
Samuel Berthe
b9f09e7f93
fix(freeswitch): move to the networking section 2021-05-01 18:53:04 +02:00
Samuel Berthe
0ee7f1266f
minor improvements for ssl exporter 2021-01-20 18:09:36 +01:00
Samuel Berthe
f7c25e648c
data: adding netdata 2021-01-08 23:26:57 +01:00
Samuel Berthe
549980fd68
adding vmware link to readme 2021-01-08 21:07:09 +01:00
Samuel Berthe
778e101030
adding alerts for Ceph 2020-03-17 18:50:36 +01:00
Samuel Berthe
0b89a764ee
Adding exporters: sidekiq, pgbouncer and thanos.
Adding rules to: prometheus, kubernetes, redis, docker and postgresql.
Arranging exporters into categories.
Showing number of rules.
Thanks to GitLab for open-sourcing alerting rules!
2020-03-09 21:18:56 +01:00
Samuel Berthe
8f515ceae2
Improves repo intro 2020-03-08 19:23:28 +01:00
Samuel Berthe
b5469f2a59
Doc: organizing sections 2020-03-08 17:39:49 +01:00
Samuel Berthe
7dbbbb0e09
Doc: organizing lb and reverse proxy 2020-03-08 16:10:33 +01:00