awesome-prometheus-alerts

Mirror of https://github.com/samber/awesome-prometheus-alerts.git, last synced 2026-06-22 01:17:19 +08:00.

Author	SHA1	Message	Date
Samuel Berthe	ca4fb01c6d	Update rules.yml	2024-06-14 20:15:44 +02:00
Samuel Berthe	1e4ea0b3e7	Update rules.yml	2024-06-06 22:53:29 +02:00
Samuel Berthe	9b0ac7d230	Update rules.yml	2024-05-23 14:44:45 +02:00
Samuel Berthe	1adecd9ee7	Update rules.yml	2024-05-15 08:08:58 +02:00
Enes Yalınkaya	9877561b6c	fix elasticsearch rate rules (#418) * fix elasticsearch rate rules * fix * fix * fix	2024-05-15 08:07:55 +02:00
R.Sicart	262e451625	kube hpa lint and improvement (#417) * fix: hpa alerts are using label but the queries remove it Signed-off-by: R.Sicart <roger.sicart@gmail.com> * fix: hpa alert is using label but the query removes it Signed-off-by: R.Sicart <roger.sicart@gmail.com> * feat: hpa scale max should not alert when min and max are the same Signed-off-by: R.Sicart <roger.sicart@gmail.com> --------- Signed-off-by: R.Sicart <roger.sicart@gmail.com>	2024-05-14 20:43:00 +02:00
R.Sicart	8460f9008e	fix: some kube api alert lint (#416) * fix: apiserver regexp matchers are automatically fully anchored Signed-off-by: R.Sicart <roger.sicart@gmail.com> * fix: apiserver errors alert is using label but the query removes it Signed-off-by: R.Sicart <roger.sicart@gmail.com> * fix: apiserver latency alert is using label but the query removes it Signed-off-by: R.Sicart <roger.sicart@gmail.com> --------- Signed-off-by: R.Sicart <roger.sicart@gmail.com>	2024-05-14 20:34:43 +02:00
Florian Schlichting	396083a2a1	Fix HaproxyBackendMaxActiveSession: look at current / limit (#413) haproxy_backend_max_sessions is the maximum number of sessions ever encountered during the lifetime of the HAProxy process. That is, it will never go down until HAProxy is restarted, so the alert continues to fire even though the situation has cleared! This doesn't make sense. Look at the currently active sessions instead.	2024-05-13 12:09:04 +02:00
Vijay Dharap	870bbd47d2	Fixed HPA rule to use more correct condition (#408) * Fixed HPA rule to use more correct condition * Update rules.yml --------- Co-authored-by: Samuel Berthe <dev@samuel-berthe.fr>	2024-05-13 11:10:55 +02:00
Ali	2547288c13	Added Clickhouse (#412) * Added Clickhouse * Update rules.yml Added reasonable time periods for each query to avoid false positives and in some cased give the system a short window to try to solve the issue. Also changed the severity level of authentication alerts from critical to info which seems more appropriate * Modified time period for alerts embedded-exporter.yml I made a few adjustments in time periods. See if they seem reasonable or not * Replication alerts time periods were adjusted IMHO, replication alerts must be sent right away.	2024-05-13 10:32:18 +02:00
enesyalinkaya	59e6a9165d	add new alerts for elasticsearch rules.yml (#411) This commit adds new Prometheus alert definitions to monitor indexing and query metrics in Elasticsearch clusters. These alerts are essential for detecting performance issues related to indexing and querying activities.	2024-05-06 01:32:00 +02:00
Sergey Shtoltz	aad1c4cd95	RedisOutOfConfiguredMaxmemory: checking if memory limit is set (#410)	2024-05-02 20:48:46 +02:00
Samuel Berthe	267c3e8e70	Update rules.yml	2024-04-29 22:35:43 +02:00
Rastislav Pôbiš	2494ccdf31	Added prepared statements mysqld-exporter alert (#407)	2024-03-26 16:56:15 +01:00
Samuel Berthe	1eb5c5834f	Update rules.yml	2024-03-11 23:28:06 +01:00
Samuel Berthe	90706282ad	Update rules.yml	2024-03-11 22:55:05 +01:00
Samuel Berthe	05c4716c2b	Fix KubernetesAPIserverlatency	2024-02-12 09:41:03 +01:00
Samuel Berthe	f5f6b338a3	fix: high/low cpu alert	2024-02-10 23:24:10 +01:00
Samuel Berthe	937cd35df7	💄	2024-02-10 20:04:17 +01:00
Samuel Berthe	5f57f09db0	fix(HostOutOfInodes): exclude msdosfs FS See #398	2024-02-10 20:01:19 +01:00
Marek Červenka	4eb0e910e7	SMART monitoring (#402) * SMART monitoring * query regex fix --------- Co-authored-by: Marek Cervenka <cervenka@ipex.cz>	2024-02-09 20:23:30 +01:00
Samuel Berthe	0727f2ef2e	Update rules.yml	2024-01-26 04:10:22 +01:00
josedev-union	c6ff5a59dc	feat: Add rules for Graph Node (#387) Co-authored-by: josedev-union <josedev-union@users.noreply.github.com>	2024-01-20 20:33:26 +01:00
michaelact	7fa11bf6cc	Add simple and meaningful `kube-state-metrics` alert summary (#394) * feat: add 'summary' to be overriden from rules.yml * chore: add simple and meaningful summary for kubernetes alerts	2023-12-01 18:25:11 +01:00
Samuel Berthe	a4de5323ad	Update rules.yml	2023-11-26 02:18:16 +01:00
Samuel Berthe	76de11d71b	Update rules.yml	2023-10-24 15:03:51 +02:00
Pierre Riteau	cbf7046afa	Fix capitalisation of RabbitMQ (#392)	2023-10-13 17:09:10 +02:00
Vicky Wilson Jacob	7a8f883df6	feat: adding hadoop jmx exporter (#391) * adding hadoop exporter * added hadoop rules with jmx exporter * added hadoop rules with jmx exporter * Update rules.yml --------- Co-authored-by: Samuel Berthe <dev@samuel-berthe.fr>	2023-10-06 18:48:54 +02:00
Samuel Berthe	bacb433089	Update rules.yml	2023-09-18 20:14:57 +02:00
Samuel Berthe	053cde27e4	Update rules.yml	2023-08-22 15:51:53 +02:00
Pavel Timofeev	6b1685261d	Rework kube-state-metrics alerts (#381) * Rework kube-state-metrics alerts: - provide meaningful labels in summary as 'instance' label hardly makes sense in most of them - rename some alerts to tell more accurate what the problem is - adjust description trying to follow some kind of the message schema found in other alerts * move changes to _data/rules.yml * Update rules.yml --------- Co-authored-by: Samuel Berthe <dev@samuel-berthe.fr>	2023-08-20 00:39:22 +02:00
Samuel Berthe	c3d78786e8	fix ci	2023-08-15 20:27:13 +02:00
Roman Pertl	ecd92399d5	feat: adding patroni alert rules (#369)	2023-08-15 19:54:15 +02:00
fzyzcjy	13e90b3aea	Update rules.yml (#371)	2023-08-15 19:42:46 +02:00
Ted Hahn	94b9f3cfbb	Fix for Postgres max connections. Postgres does not limit connections by database, but total over the server. Additionally, alert labels didn't match across the pair. Using a min by on the right side deals with the possibility additional labels are present on your exporter. (#376)	2023-08-15 19:39:41 +02:00
Samuel Berthe	15e3131547	Update rules.yml	2023-08-15 19:36:22 +02:00
Samuel Berthe	eb3220c8d7	Update rules.yml	2023-08-15 19:34:14 +02:00
Ivan Dudin	86e3e38a99	fix typo (#377)	2023-08-07 19:43:10 +02:00
Samuel Berthe	ff76ceccde	Update rules.yml	2023-07-30 22:24:31 +02:00
Moritz	fe5f78171a	update rules.yml (#374)	2023-07-30 22:21:20 +02:00
Samuel Berthe	8c811045e5	Update rules.yml	2023-07-29 18:20:58 +02:00
Samuel Berthe	32cf16a53d	Update rules.yml	2023-07-12 14:32:43 +02:00
Samuel Berthe	1bb6c602f7	Update rules.yml	2023-07-06 13:54:31 +02:00
Samuel Berthe	5d254811b4	Update rules.yml	2023-06-27 00:28:31 +02:00
Samuel Berthe	47b7748618	Update rules.yml	2023-06-22 18:40:33 +02:00
Samuel Berthe	3d0c5fcafd	Update rules.yml	2023-06-22 18:29:21 +02:00
Samuel Berthe	600a759344	Update rules.yml	2023-06-22 15:01:06 +02:00
Samuel Berthe	ee86c2d233	Update rules.yml	2023-06-22 15:00:40 +02:00
michaelact	7e8bc1a215	Add under-utilized container alerts (#322) * chore: add container under-utilized allerts * chore: resolve duplicated query and description	2023-05-21 22:58:04 +02:00
Paul-Élie Testud	c36014f03e	fix(nginx): fix nginx query for histogram_percentile (#351)	2023-04-28 16:06:12 +02:00
deimosOmegaChan	b98b2a2777	fix node-exporter nodename regex expression (#349) nodename should not depends with the prefix "hostname"	2023-04-25 10:58:52 +02:00
Samuel Berthe	9efec14d26	chore: move from "https://awesome-prometheus-alerts.grep.to" to "https://samber.github.io/awesome-prometheus-alerts/"	2023-04-23 23:32:26 +02:00
Madhu Sudhan	8b9fc8864f	refactor: node-exporter queries to include hostname as label which will be helpful for alerting (#348)	2023-04-23 22:16:08 +02:00
Mikael Lindström	8357165cfb	Update MongoDB replication lag alert to use seconds (#344) The mongodb_rs_members_optimeDate metric is in milliseconds, the replication lag query has been updated to reflect this.	2023-04-07 01:42:25 +02:00
Mikael Lindström	2617aa5dab	Fix MongoDB replication headroom query (#342) The query was changed to use `mongodb_oplog_stats_start` and `mongodb_oplog_stats_end` in #291 but these metrics does not represent the start and end of the oplog. The original head and tail metrics are calculated from the oplog and are consistent with the output of `db.getReplicationInfo()`.	2023-04-03 10:01:25 +02:00
Samuel Berthe	f9b43cf3bf	Update rules.yml	2023-03-24 14:36:52 +01:00
Kratik Jain	aa2988693b	Adding more rules for Thanos Monitoring (#340) * Adding more rules for Thanos Components Monitoring * lint * lint * lint --------- Co-authored-by: Samuel Berthe <dev@samuel-berthe.fr>	2023-03-15 18:26:24 +01:00
Samuel Berthe	59891728e4	Solves #336	2023-02-26 02:33:50 +01:00
Samuel Berthe	60cb26681f	Update rules.yml	2023-02-23 15:19:36 +01:00
Samuel Berthe	bde83bc9ee	Update rules.yml	2023-02-17 01:14:19 +01:00
alexandrumarian-portal	1e44e348ee	Hashicorp Vault cluster health (#338) * Hashicorp Vault cluster health	2023-02-17 01:13:41 +01:00
Samuel Berthe	65a0f969be	Update rules.yml	2023-02-14 14:02:35 +01:00
Yannick Markus	7aeccf2874	Add APC UPS & ZFS exporter (#331) * add apcupsd_exporter rules * add zfs_exporter rules	2023-02-12 20:01:26 +01:00
Jan Gosmann	df6d71bad5	Make ElasticsearchNoNewDocuments alert more robust (#334) Use `elasticsearch_indices_indexing_index_total` instead of `elasticsearch_indices_docs` because `elasticsearch_indices_docs` might not update without an index refresh [1]. Refreshes happen every second by default, but only if there have been search requests within the last 30 seconds [2]. If there are no search requests for a sufficiently long duration, the alert based on `elasticsearch_indices_docs` will fire mistakenly. Apart from that, `elasticsearch_indices_docs` has the gauge metric type (while `elasticsearch_indices_indexing_index_total` is of the counter type) and the `increase` function is not intended to be used with gauges. Drops in the document count would be treated as a reset to 0, thus showing an increase by all remaining documents. [1]: https://www.elastic.co/guide/en/elasticsearch/reference/current/indices-stats.html#index-stats-api-path-params [2]: https://www.elastic.co/guide/en/elasticsearch/reference/current/indices-refresh.html	2023-01-30 17:06:40 +01:00
Samuel Berthe	5e84329360	Update rules.yml	2023-01-16 00:37:38 +01:00
Sören König	40478c50cc	Add under-utilized HPA alert (#330) This alert should inform when HPAs are scaled more than half the time at their minReplicas, which is an indication of possible cost savings. In addition, it is assumed that a minimum number of replicas should still be running for redundancy.	2023-01-16 00:36:59 +01:00
Samuel Berthe	160d0adcc2	Update rules.yml	2023-01-13 18:35:37 +01:00
Panos Rontogiannis	8f48bbfb25	Cert rules issues (#329) * add comment for BlackboxSslCertificateExpired rule * use last_over_time to make certificate rules less prone to flapping * add lower bound thresholds on BlackboxSslCertificateWillExpireSoon rules to avoid overlap * changed upper bound threshold for BlackboxSslCertificateWillExpireSoon to 20 days * make BlackboxSslCertificateWillExpireSoon description clearer * use days in certificate rules queries to improve notification values Co-authored-by: Panos Rontogiannis <pronto@admin.grnet.gr>	2023-01-06 11:27:46 +01:00
Samuel Berthe	032eb896f5	rearrange	2022-12-06 10:37:09 +01:00
michaelact	447bb94c4d	Add under-utilized host and hardware alerts (#320) * chore: add under-utilized alerts * docs: add under-utilized alerts * chore: add alert consideration times * chore: delete generated alert rules file * chore: not using for, instead in rule	2022-12-06 10:26:50 +01:00
Samuel Berthe	c00dd87733	fix kube rule	2022-12-04 23:12:35 +01:00
Samuel Berthe	a381fb5e22	Merge branch 'master' of github.com:samber/awesome-prometheus-alerts	2022-12-04 23:12:05 +01:00
Samuel Berthe	a0c32093cb	oops	2022-12-04 23:12:00 +01:00
MatthieuFin	a5f32a0fab	fix(rule): fixing KubernetesPodNotHealthy (#215 #253) (#263)	2022-12-04 23:08:24 +01:00
michaelact	4466a07962	fix: add space for labels KubernetesJobFailed alert rule (#321) Co-authored-by: xb4dc0d3	2022-11-30 12:28:23 +01:00
Samuel Berthe	1b25cbe568	See #323	2022-11-30 12:26:36 +01:00
Samuel Berthe	5956d28148	data: fix haproxy rule #319	2022-11-15 09:47:34 +01:00
Samuel Berthe	f484d30d66	data: fix haproxy rule #319	2022-11-11 14:46:56 +01:00
Valery Voronov	1e46eacbe7	fix: added NodeNetworkUnavailable alerts, rm unused OOD alert (#318)	2022-10-31 15:47:27 +01:00
Nicolai Antiferov	9419e3fe7e	fix: Update elasticsearch_exporter repository (#317) Was migrated some time ago to https://github.com/prometheus-community/elasticsearch_exporter Fix #316	2022-10-31 10:10:46 +01:00
Samuel Berthe	cdf4551ab7	Merge branch 'master' of github.com:samber/awesome-prometheus-alerts	2022-10-24 16:55:36 +02:00
Samuel Berthe	19c4223ce7	fix(minio): update queries	2022-10-24 16:54:38 +02:00
meoww-bot	98d8a7b53b	fix: check inodes space for all mountpoints (#315)	2022-10-24 13:47:12 +02:00
Samuel Berthe	6ba9eb104c	feat: adding cloudflare exporter (#310)	2022-10-03 16:57:24 +02:00
Yonah Dissen	55b049eb28	add argocd rules (#309) * add argocd rules * fix(argocd): move contrib into _data/rules.yml instead of dist/... Co-authored-by: Samuel Berthe <dev@samuel-berthe.fr>	2022-10-02 18:05:30 +02:00
meoww-bot	86d5efe399	Fix broken link (#305)	2022-08-30 09:51:07 +02:00
Samuel Berthe	40c0ff32f0	oops	2022-08-28 17:47:17 +02:00
Brett	0887515f98	Added query for node warmup before reporing it's down (#304) Co-authored-by: Brett Yoakum <yoakum@adobe.com>	2022-08-28 16:31:15 +02:00
Samuel Berthe	b49a49c920	Update rules.yml	2022-08-16 20:17:46 +02:00
Samuel Berthe	250a71e95a	fix(postgresql): remove broken rules	2022-08-01 22:43:30 +02:00
Samuel Berthe	d8f7ecd5b4	adding zpool alert	2022-07-24 01:56:17 +02:00
Samuel Berthe	34081e4f43	fix #292	2022-07-24 00:42:21 +02:00
Samuel Berthe	9bbb65ffe1	Update rules.yml	2022-07-24 00:20:54 +02:00
Samuel Berthe	67266bbca6	Merge branch 'master' of github.com:samber/awesome-prometheus-alerts	2022-07-06 12:50:02 +02:00
Samuel Berthe	95af2b4d95	fix: fix quantile query	2022-07-06 12:49:49 +02:00
Pooya	03fdabbfc5	Changed metric names to match new metric names. (#291) * Changed alert names to match new alert names. * Added MongodbReplicaMemberHealth to check health of replica members health which is added in new metrics Co-authored-by: Pooya Dowlatabadi <pooya.dowlatabadi@arvancloud.com> Co-authored-by: Samuel Berthe <dev@samuel-berthe.fr>	2022-06-27 17:29:07 +02:00
Samuel Berthe	4201302285	Update rules.yml	2022-06-23 22:29:21 +02:00
Samuel Berthe	9bbe04799f	feat: build and publish into dist/rules	2022-06-15 01:42:18 +02:00
Samuel Berthe	cbc20228e2	fix #226	2022-06-14 22:12:00 +02:00
Samuel Berthe	10b810fd6e	fix #276	2022-06-14 22:03:34 +02:00
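
Commit df6d71bad5 (#334) argues that `increase` misreads a gauge such as `elasticsearch_indices_docs`, because any drop in value is treated as a counter reset to 0. The sketch below illustrates that effect with a simplified model, not Prometheus' actual implementation: it sums sample-to-sample deltas and counts a drop as a reset, but it omits the extrapolation to the edges of the range window that the real `increase` performs. The sample values are hypothetical.

```python
def naive_increase(samples: list[float]) -> float:
    """Simplified model of increase(): sum deltas, treating drops as resets.

    Assumption: no extrapolation to the range boundaries (the real
    Prometheus function does extrapolate).
    """
    total = 0.0
    for prev, cur in zip(samples, samples[1:]):
        if cur >= prev:
            total += cur - prev
        else:
            # A drop is interpreted as a reset to 0, so the whole current value counts.
            total += cur
    return total


# Hypothetical samples of a counter (only grows), like elasticsearch_indices_indexing_index_total.
indexing_total = [1000, 1010, 1025, 1025]
# Hypothetical samples of a gauge that shrinks when documents are deleted, like elasticsearch_indices_docs.
docs_count = [5000, 5000, 4990, 4990]

print(naive_increase(indexing_total))  # 25.0
print(naive_increase(docs_count))      # 4990.0
```

Deleting just 10 documents makes the gauge-based "increase" report 4990, i.e. "an increase by all remaining documents", exactly as the commit message describes.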

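Commit 396083a2a1 (#413) explains that `haproxy_backend_max_sessions` is a lifetime maximum, so an alert built on it keeps firing after the load has dropped. A minimal sketch of the difference, assuming a hypothetical session limit of 100, hypothetical samples, and a hypothetical 80% threshold (none of these values are taken from rules.yml): the lifetime maximum is modelled as a running max of the active-session samples.

```python
from itertools import accumulate


def alert_states(values: list[float], limit: float, threshold: float = 0.8) -> list[bool]:
    """Return, per sample, whether values/limit exceeds the (assumed) threshold."""
    return [v / limit > threshold for v in values]


limit = 100  # hypothetical per-backend session limit
current_sessions = [20, 50, 95, 40, 10]  # hypothetical active-session samples
# A lifetime maximum never decreases until the process restarts: model it as a running max.
lifetime_max = list(accumulate(current_sessions, max))

print(alert_states(lifetime_max, limit))      # [False, False, True, True, True]
print(alert_states(current_sessions, limit))  # [False, False, True, False, False]
```

The max-based alert stays active for every sample after the spike, while the current/limit version resolves as soon as the load falls.
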

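The third change listed in commit 262e451625 (#417) says the HPA "scale max" alert should not fire when min and max replicas are the same. As a hypothetical sketch of that condition only (the function name and arguments are illustrative and not taken from rules.yml, which expresses it in PromQL):

```python
def hpa_scale_max_alert(current: int, min_replicas: int, max_replicas: int) -> bool:
    """Hypothetical predicate: fire only if the HPA sits at its ceiling
    and that ceiling differs from its floor (otherwise it is pinned by design)."""
    return current == max_replicas and min_replicas != max_replicas


print(hpa_scale_max_alert(5, 2, 5))  # True: scaled out to the maximum
print(hpa_scale_max_alert(3, 3, 3))  # False: min == max, nothing to scale
```
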
451 commits