awesome-prometheus-alerts

mirror of https://github.com/samber/awesome-prometheus-alerts.git synced 2026-06-22 01:17:19 +08:00

Author	SHA1	Message	Date
Samuel Berthe	8b443be6d2	feat: add systemd_exporter alerting rules (7 rules) (#522) * feat: add systemd_exporter alerting rules (7 rules) Add new Systemd service under Basic resource monitoring with rules for: - Unit failed/inactive state detection - Service crash loop detection - Task limit exhaustion - Socket refused/high connections - Timer missed trigger * fix: narrow systemd unit inactive query to reduce noise Add type="service" and name filter to the inactive unit alert to avoid false positives from legitimately inactive units.	2026-03-16 14:07:14 +01:00
Samuel Berthe	30bbedbc79	feat: add Cloud providers alerting rules (33 rules across 4 exporters) (#519) * feat: add Cloud providers alerting rules (33 rules across 4 exporters) New "Cloud providers" category with rules for: - AWS CloudWatch (13 rules): exporter health + EC2, RDS, SQS, ALB, Lambda - Google Cloud / Stackdriver (5 rules): scrape health, API quotas, staleness - DigitalOcean (10 rules): droplets, databases, k8s, load balancers, incidents - Azure (5 rules): API errors, rate limits, collection performance * fix: address PR review - move Cloud providers before Other, fix service name - Move "Cloud providers" group before "Other" in rules.yml for consistent ordering - Rename "Google Cloud / Stackdriver" to "Google Cloud Stackdriver" to avoid awkward /-/ in generated anchors and dist/rules/ paths - Fix README anchor link to match the new service name	2026-03-16 14:06:59 +01:00
Samuel Berthe	97aae5dabf	feat: add GitLab alerting rules (28 rules across 3 exporters) (#518) Add new GitLab service under "Other" category with 3 exporters: - Built-in exporter (18 rules): Puma, HTTP errors/latency, Sidekiq jobs, database connection pool, CI/CD pipelines, Ruby process health - Workhorse (3 rules): HTTP error rate, latency, in-flight requests - Gitaly (7 rules): gRPC errors, ResourceExhausted, RPC latency, CPU throttling, auth failures, circuit breaker All metrics verified against gitlabhq/gitlabhq source code. Several rules derived from GitLab Omnibus default alerting rules.	2026-03-16 04:48:52 +01:00
Samuel Berthe	e6cdcdb9e5	feat: add Apache Flink and Apache Spark alerting rules Add 20 new alerting rules under the Runtimes category: - Apache Flink (12 rules): job status, TaskManager registration, slot availability, restarts, checkpoints, backpressure, heap memory, GC, and record processing - Apache Spark (8 rules): worker health, waiting apps, memory/cores exhaustion, executor GC, task failures, and disk spill	2026-03-16 04:46:00 +01:00
Samuel Berthe	88e2c19017	feat: add Keycloak alerting rules (aerogear/keycloak-metrics-spi) (#517) * feat: add Keycloak alerting rules (aerogear/keycloak-metrics-spi) * fix: correct Keycloak metrics-spi metric names and query grouping	2026-03-16 04:40:15 +01:00
Samuel Berthe	20651aa10d	feat: add OpenStack alerting rules (openstack-exporter) (#515) * feat: add OpenStack alerting rules (openstack-exporter) Add 20 alerting rules for openstack-exporter/openstack-exporter covering Nova, Neutron, Cinder, Octavia, and Placement services. * docs: add OpenStack to README services list * fix: align OpenStack load balancer alert name with operating_status semantics The operating_status label uses ONLINE/OFFLINE/DEGRADED/ERROR values, not ACTIVE. Rename alert to "not online" and use the label in the description for clarity.	2026-03-16 03:43:51 +01:00
Samuel Berthe	bf7b902881	feat: add process-exporter alerting rules (ncabatoff/process-exporter) (#514) * feat: add process-exporter alerting rules (ncabatoff/process-exporter) * docs: add Process to README services list * fix: address PR review feedback for process-exporter rules - Rename service from "Process" to "Process Exporter" for clarity - Fix grammar: "file descriptors usage" → "file descriptor usage" - Clarify CPU alert description as core-equivalent percentage - Rename "high disk IO" to "high disk write IO" for accuracy	2026-03-16 03:31:18 +01:00
Samuel Berthe	2b239736cf	feat: add alerting rules for prometheus/memcached_exporter (#512)	2026-03-16 03:25:38 +01:00
Samuel Berthe	f97f692596	feat: add Proxmox VE alerting rules (prometheus-pve-exporter) (#509) Add 9 alerting rules for Proxmox VE covering node/guest status, CPU, memory, storage, backup coverage, replication, and cluster quorum.	2026-03-16 03:12:06 +01:00
Samuel Berthe	be7a2e4d5d	feat: add IPMI exporter alerting rules (#510) * feat: add IPMI exporter alerting rules Add 17 alerting rules for prometheus-community/ipmi_exporter covering temperature, fan, voltage, current, power sensors, chassis status, and system event log monitoring. * docs: add IPMI to README service list * Apply suggestions from code review Co-authored-by: Copilot Autofix powered by AI <175728472+Copilot@users.noreply.github.com> --------- Co-authored-by: Copilot Autofix powered by AI <175728472+Copilot@users.noreply.github.com>	2026-03-16 03:10:10 +01:00
Samuel Berthe	c064d2264e	feat: add Envoy proxy alerting rules using built-in metrics (#511) Add 19 alerting rules for Envoy proxy under "Reverse proxies and load balancers" using native metrics from /stats/prometheus endpoint. Covers: server health, HTTP error rates (downstream/upstream), connection saturation, cluster membership, health checks, outlier detection, SSL/TLS certificate expiry, circuit breakers, and request timeouts.	2026-03-16 03:03:57 +01:00
Samuel Berthe	89e703d763	feat: add alerting rules for cloudflare/ebpf_exporter (#508) * feat: add alerting rules for cloudflare/ebpf_exporter * docs: add eBPF to README service list	2026-03-16 02:56:04 +01:00
Samuel Berthe	3db9281508	feat: add SNMP exporter alerting rules (#507) Add 7 alerting rules for prometheus/snmp_exporter covering device availability, interface status, error rates, bandwidth utilization, and device restarts. Rules use standard IF-MIB and SNMPv2-MIB metrics.	2026-03-16 02:34:34 +01:00
Samuel Berthe	8f462ce962	adding claude.md	2026-03-15 19:59:01 +01:00
Samuel Berthe	080a792777	data: adding python/ruby/golang (#502) * data: adding python/ruby/golang * fix: address review feedback on runtime alerts - JVM non-heap: guard against unbounded metaspace (max_bytes = -1) - JVM old gen GC: note regex only matches CMS/G1/Parallel collectors - JVM/Python file descriptors: note process_* metrics are generic - Go memory usage: fix description (sys_bytes is runtime memory, not host) - Go goroutine spike: use deriv() instead of rate() on gauge - Go GC CPU fraction: note deprecation since Go 1.20 - Go GC duration: clarify quantile="1" is max, not p99 - Python uncollectable: use increase() on counter instead of raw threshold - Add threshold comments for workload-dependent defaults	2026-03-15 19:46:39 +01:00
Samuel Berthe	f0107caf9e	Update README.md	2026-01-15 12:33:35 +01:00
Samuel Berthe	65551ae19f	Update README.md	2026-01-15 02:42:42 +01:00
Samuel Berthe	2b5c8b0ec7	Update README.md	2026-01-15 02:39:24 +01:00
Samuel Berthe	d0d1b00a7b	Fix typo in OpenTelemetry Collector link	2025-11-05 17:15:10 +01:00
Samuel Berthe	e617c07179	Update README.md	2025-11-05 17:14:47 +01:00
Samuel Berthe	dfac84209d	Update README.md	2025-09-01 15:41:07 +02:00
Samuel Berthe	4be87d7796	Update README.md	2025-05-03 22:53:51 +02:00
Felix Bühler	10d00c66da	Add caddy.yml (#450)	2025-02-04 14:23:14 +01:00
Samuel Berthe	fff8a80ae5	Update README.md	2024-12-08 21:24:45 +01:00
Samuel Berthe	b6a6c2e313	Update README.md	2024-07-02 09:33:01 +02:00
Samuel Berthe	847143ecc9	Update README.md	2024-05-13 10:42:04 +02:00
Samuel Berthe	85b102df08	Welcome @betterstack-community ✌️	2024-03-21 16:25:24 +01:00
Samuel Berthe	854688d17a	Update README.md	2024-02-09 20:24:10 +01:00
josedev-union	c6ff5a59dc	feat: Add rules for Graph Node (#387) Co-authored-by: josedev-union <josedev-union@users.noreply.github.com>	2024-01-20 20:33:26 +01:00
Samuel Berthe	32a097836a	Update README.md	2023-10-06 18:48:38 +02:00
Samuel Berthe	b19b403862	Update README.md	2023-08-15 20:05:13 +02:00
Samuel Berthe	5b6a86fa00	Update README.md	2023-08-15 20:03:06 +02:00
Samuel Berthe	ab7e29cfc0	Update README.md	2023-08-15 20:01:45 +02:00
Samuel Berthe	9efec14d26	chore: move from "https://awesome-prometheus-alerts.grep.to" to "https://samber.github.io/awesome-prometheus-alerts/"	2023-04-23 23:32:26 +02:00
Samuel Berthe	6ba9eb104c	feat: adding cloudflare exporter (#310)	2022-10-03 16:57:24 +02:00
Yonah Dissen	55b049eb28	add argocd rules (#309) * add argocd rules * fix(argocd): move contrib into _data/rules.yml instead of dist/... Co-authored-by: Samuel Berthe <dev@samuel-berthe.fr>	2022-10-02 18:05:30 +02:00
Samuel Berthe	4662cd2812	doc: improve pulsar doc	2022-06-07 01:29:31 +02:00
Samuel Berthe	37722256d5	Adding jenkins	2021-12-27 12:49:32 +01:00
Samuel Berthe	3ff969670d	Update README.md	2021-11-21 18:54:56 +01:00
Andre Martins	36ca52e598	adding alerts to promtail and loki (#241) Co-authored-by: apmbktf <andre.pasqualinoto-martins@itau-unibanco.com.br> Co-authored-by: Samuel Berthe <dev@samuel-berthe.fr>	2021-10-03 22:12:59 +02:00
Igor Churmeev	3612c9cc3e	Add alerts for Hashicorp Vault (#238) Co-authored-by: Samuel Berthe <dev@samuel-berthe.fr>	2021-08-19 21:19:43 +02:00
Gjed	c2b8178304	Loki alerts (#218) Co-authored-by: Samuel Berthe <dev@samuel-berthe.fr>	2021-07-04 23:59:46 +02:00
Samuel Berthe	b9f09e7f93	fix(freeswitch): move to the networking section	2021-05-01 18:53:04 +02:00
Samuel Berthe	0ee7f1266f	minor improvements for ssl exporter	2021-01-20 18:09:36 +01:00
Samuel Berthe	f7c25e648c	data: adding netdata	2021-01-08 23:26:57 +01:00
Samuel Berthe	549980fd68	adding vmware link to readme	2021-01-08 21:07:09 +01:00
Samuel Berthe	778e101030	adding alerts for Ceph	2020-03-17 18:50:36 +01:00
Samuel Berthe	0b89a764ee	Adding exporters: sidekiq, pgbouncer and thanos. Adding rules to: prometheus, kubernetes, redis, docker and postgresql. Arranging exporters into categories. Showing number of rules. Thanks to Gitlab for opensourcing alerting rules!	2020-03-09 21:18:56 +01:00
Samuel Berthe	8f515ceae2	Improves repo intro	2020-03-08 19:23:28 +01:00
Samuel Berthe	b5469f2a59	Doc: organizing sections	2020-03-08 17:39:49 +01:00

83 commits