400 commits

Author SHA1 Message Date
Samuel Berthe
1e4ea0b3e7
Update rules.yml 2024-06-06 22:53:29 +02:00
Samuel Berthe
9b0ac7d230
Update rules.yml 2024-05-23 14:44:45 +02:00
Samuel Berthe
1adecd9ee7
Update rules.yml 2024-05-15 08:08:58 +02:00
Enes Yalınkaya
9877561b6c
fix elasticsearch rate rules (#418)
* fix elasticsearch rate rules

* fix

* fix

* fix
2024-05-15 08:07:55 +02:00
R.Sicart
262e451625
kube hpa lint and improvement (#417)
* fix: hpa alerts are using  label but the queries remove it

Signed-off-by: R.Sicart <roger.sicart@gmail.com>

* fix: hpa alert is using  label but the query removes it

Signed-off-by: R.Sicart <roger.sicart@gmail.com>

* feat: hpa scale max should not alert when min and max are the same

Signed-off-by: R.Sicart <roger.sicart@gmail.com>

---------

Signed-off-by: R.Sicart <roger.sicart@gmail.com>
2024-05-14 20:43:00 +02:00
R.Sicart
8460f9008e
fix: some kube api alert lint (#416)
* fix: apiserver regexp matchers are automatically fully anchored

Signed-off-by: R.Sicart <roger.sicart@gmail.com>

* fix: apiserver errors alert is using  label but the query removes it

Signed-off-by: R.Sicart <roger.sicart@gmail.com>

* fix: apiserver latency alert is using  label but the query removes it

Signed-off-by: R.Sicart <roger.sicart@gmail.com>

---------

Signed-off-by: R.Sicart <roger.sicart@gmail.com>
2024-05-14 20:34:43 +02:00
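The "fully anchored" point in the commit above can be illustrated with a small sketch. Prometheus label regex matchers must match the whole label value, so a pattern such as `5..` behaves like `^5..$`. The helper names below are hypothetical, and Python's `re` module stands in for the RE2 syntax Prometheus actually uses, so treat this as an analogy rather than the exact engine behaviour.

```python
import re


def anchored_match(pattern: str, value: str) -> bool:
    """Mimic a fully anchored matcher: the pattern must cover the whole value."""
    return re.fullmatch(pattern, value) is not None


def unanchored_match(pattern: str, value: str) -> bool:
    """Substring-style matching, shown only for contrast."""
    return re.search(pattern, value) is not None


# "5.." matches any three-character value starting with 5 under both modes.
both_match_500 = anchored_match("5..", "500") and unanchored_match("5..", "500")

# "1500" contains "500", so only the unanchored search matches it.
anchored_1500 = anchored_match("5..", "1500")
unanchored_1500 = unanchored_match("5..", "1500")

# Writing explicit ^ and $ changes nothing once matching is fully anchored.
explicit_anchors_redundant = (
    anchored_match("^5..$", "500") == anchored_match("5..", "500")
    and anchored_match("^5..$", "1500") == anchored_match("5..", "1500")
)
```

Under this reading, explicit `^` and `$` in a matcher are redundant, which appears to be the lint issue the commit addresses.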
Florian Schlichting
396083a2a1
Fix HaproxyBackendMaxActiveSession: look at current / limit (#413)
haproxy_backend_max_sessions is the maximum number of sessions ever encountered during the lifetime of the HAProxy process. That is, it will never go down until HAProxy is restarted, so the alert continues to fire even though the situation has cleared!

This doesn't make sense. Look at the currently active sessions instead.
2024-05-13 12:09:04 +02:00
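The reasoning in the commit above can be sketched in a few lines. This is a minimal simulation, not the actual alert rule: the `BackendSample` structure, the helper names, and the 0.8 threshold are assumptions made for illustration. It compares an alert driven by a high-water mark (which never decreases) with one driven by the current session count.

```python
from dataclasses import dataclass


@dataclass
class BackendSample:
    """One scrape of a backend (hypothetical structure for illustration)."""
    current_sessions: int  # sessions active at scrape time
    limit_sessions: int    # configured session limit


def running_peak(samples):
    """Return the high-water mark after each scrape; it never goes down."""
    peak = 0
    peaks = []
    for sample in samples:
        peak = max(peak, sample.current_sessions)
        peaks.append(peak)
    return peaks


def saturated(value, limit, threshold=0.8):
    """True when value exceeds threshold * limit (threshold is an assumption)."""
    return limit > 0 and value / limit > threshold


# A short burst of traffic in the second scrape, then back to normal.
samples = [BackendSample(10, 100), BackendSample(95, 100), BackendSample(12, 100)]
peaks = running_peak(samples)

# Alerting on the peak keeps firing after the burst has cleared.
fires_on_peak = [saturated(p, s.limit_sessions) for p, s in zip(peaks, samples)]

# Alerting on the current value clears as soon as the load drops.
fires_on_current = [saturated(s.current_sessions, s.limit_sessions) for s in samples]
```

Here `fires_on_peak` stays true on the third scrape while `fires_on_current` returns to false, which is the behaviour the fix is after.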
Vijay Dharap
870bbd47d2
Fixed HPA rule to use more correct condition (#408)
* Fixed HPA rule to use more correct condition

* Update rules.yml

---------

Co-authored-by: Samuel Berthe <dev@samuel-berthe.fr>
2024-05-13 11:10:55 +02:00
Ali
2547288c13
Added Clickhouse (#412)
* Added Clickhouse

* Update rules.yml

Added reasonable time periods for each query to avoid false positives and, in some cases, give the system a short window to try to solve the issue.
Also changed the severity level of authentication alerts from critical to info, which seems more appropriate.

* Modified time period for alerts embedded-exporter.yml

I made a few adjustments in time periods.
See if they seem reasonable or not

* Replication alerts time periods were adjusted

IMHO, replication alerts must be sent right away.
2024-05-13 10:32:18 +02:00
enesyalinkaya
59e6a9165d
add new alerts for elasticsearch rules.yml (#411)
This commit adds new Prometheus alert definitions to monitor indexing and query metrics in Elasticsearch clusters. These alerts are essential for detecting performance issues related to indexing and querying activities.
2024-05-06 01:32:00 +02:00
Sergey Shtoltz
aad1c4cd95
RedisOutOfConfiguredMaxmemory: checking if memory limit is set (#410) 2024-05-02 20:48:46 +02:00
Samuel Berthe
267c3e8e70
Update rules.yml 2024-04-29 22:35:43 +02:00
Rastislav Pôbiš
2494ccdf31
Added prepared statements mysqld-exporter alert (#407) 2024-03-26 16:56:15 +01:00
Samuel Berthe
1eb5c5834f
Update rules.yml 2024-03-11 23:28:06 +01:00
Samuel Berthe
90706282ad
Update rules.yml 2024-03-11 22:55:05 +01:00
Samuel Berthe
05c4716c2b
Fix KubernetesAPIserverlatency 2024-02-12 09:41:03 +01:00
Samuel Berthe
f5f6b338a3
fix: high/low cpu alert 2024-02-10 23:24:10 +01:00
Samuel Berthe
937cd35df7
💄 2024-02-10 20:04:17 +01:00
Samuel Berthe
5f57f09db0
fix(HostOutOfInodes): exclude msdosfs FS
See #398
2024-02-10 20:01:19 +01:00
Marek Červenka
4eb0e910e7
SMART monitoring (#402)
* SMART monitoring

* query regex fix

---------

Co-authored-by: Marek Cervenka <cervenka@ipex.cz>
2024-02-09 20:23:30 +01:00
Samuel Berthe
0727f2ef2e
Update rules.yml 2024-01-26 04:10:22 +01:00
josedev-union
c6ff5a59dc
feat: Add rules for Graph Node (#387)
Co-authored-by: josedev-union <josedev-union@users.noreply.github.com>
2024-01-20 20:33:26 +01:00
michaelact
7fa11bf6cc
Add simple and meaningful kube-state-metrics alert summary (#394)
* feat: add 'summary' to be overridden from rules.yml

* chore: add simple and meaningful summary for kubernetes alerts
2023-12-01 18:25:11 +01:00
Samuel Berthe
a4de5323ad
Update rules.yml 2023-11-26 02:18:16 +01:00
Samuel Berthe
76de11d71b
Update rules.yml 2023-10-24 15:03:51 +02:00
Pierre Riteau
cbf7046afa
Fix capitalisation of RabbitMQ (#392) 2023-10-13 17:09:10 +02:00
Vicky Wilson Jacob
7a8f883df6
feat: adding hadoop jmx exporter (#391)
* adding hadoop exporter

* added hadoop rules with jmx exporter

* added hadoop rules with jmx exporter

* Update rules.yml

---------

Co-authored-by: Samuel Berthe <dev@samuel-berthe.fr>
2023-10-06 18:48:54 +02:00
Samuel Berthe
bacb433089
Update rules.yml 2023-09-18 20:14:57 +02:00
Samuel Berthe
053cde27e4
Update rules.yml 2023-08-22 15:51:53 +02:00
Pavel Timofeev
6b1685261d
Rework kube-state-metrics alerts (#381)
* Rework kube-state-metrics alerts:
- provide meaningful labels in summary as 'instance' label hardly makes sense in most of them
- rename some alerts to describe the problem more accurately
- adjust descriptions to follow the message schema found in other alerts

* move changes to _data/rules.yml

* Update rules.yml

---------

Co-authored-by: Samuel Berthe <dev@samuel-berthe.fr>
2023-08-20 00:39:22 +02:00
Samuel Berthe
c3d78786e8
fix ci 2023-08-15 20:27:13 +02:00
Roman Pertl
ecd92399d5
feat: adding patroni alert rules (#369) 2023-08-15 19:54:15 +02:00
fzyzcjy
13e90b3aea
Update rules.yml (#371) 2023-08-15 19:42:46 +02:00
Ted Hahn
94b9f3cfbb
Fix for Postgres max connections. Postgres does not limit connections by database, but total over the server. Additionally, alert labels didn't match across the pair. Using a min by on the right side deals with the possibility additional labels are present on your exporter. (#376) 2023-08-15 19:39:41 +02:00
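The logic described in the commit above can be sketched as follows, under stated assumptions: the tuple layouts, function names, and the 0.8 threshold are hypothetical and are not taken from the rule itself. Connections are summed over all databases on an instance, and the limit is reduced to one value per instance (a minimum) so that extra labels on the limit series do not break the comparison.

```python
from collections import defaultdict


def total_connections(per_database):
    """Sum connection counts over all databases of each instance.

    per_database: iterable of (instance, database, count) tuples.
    """
    totals = defaultdict(int)
    for instance, _database, count in per_database:
        totals[instance] += count
    return dict(totals)


def connection_limits(limit_series):
    """Collapse limit series to one value per instance, taking the minimum.

    limit_series: iterable of (instance, extra_label, max_connections) tuples.
    """
    limits = {}
    for instance, _extra_label, limit in limit_series:
        limits[instance] = min(limit, limits.get(instance, limit))
    return limits


def too_many_connections(per_database, limit_series, threshold=0.8):
    """Flag instances whose server-wide total exceeds threshold * limit."""
    totals = total_connections(per_database)
    limits = connection_limits(limit_series)
    return {
        instance: totals[instance] > threshold * limits[instance]
        for instance in totals
        if instance in limits
    }


per_database = [("pg1", "app", 50), ("pg1", "reports", 40), ("pg2", "app", 10)]
limit_series = [("pg1", "a", 100), ("pg1", "b", 100), ("pg2", "a", 100)]
result = too_many_connections(per_database, limit_series)
```

With these sample values no single database on `pg1` crosses the threshold, but the server-wide total of 90 does, so only a per-server comparison flags it.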
Samuel Berthe
15e3131547
Update rules.yml 2023-08-15 19:36:22 +02:00
Samuel Berthe
eb3220c8d7
Update rules.yml 2023-08-15 19:34:14 +02:00
Ivan Dudin
86e3e38a99
fix typo (#377) 2023-08-07 19:43:10 +02:00
Samuel Berthe
ff76ceccde
Update rules.yml 2023-07-30 22:24:31 +02:00
Moritz
fe5f78171a
update rules.yml (#374) 2023-07-30 22:21:20 +02:00
Samuel Berthe
8c811045e5
Update rules.yml 2023-07-29 18:20:58 +02:00
Samuel Berthe
32cf16a53d
Update rules.yml 2023-07-12 14:32:43 +02:00
Samuel Berthe
1bb6c602f7
Update rules.yml 2023-07-06 13:54:31 +02:00
Samuel Berthe
5d254811b4
Update rules.yml 2023-06-27 00:28:31 +02:00
Samuel Berthe
47b7748618
Update rules.yml 2023-06-22 18:40:33 +02:00
Samuel Berthe
3d0c5fcafd
Update rules.yml 2023-06-22 18:29:21 +02:00
Samuel Berthe
600a759344
Update rules.yml 2023-06-22 15:01:06 +02:00
Samuel Berthe
ee86c2d233
Update rules.yml 2023-06-22 15:00:40 +02:00
michaelact
7e8bc1a215
Add under-utilized container alerts (#322)
* chore: add container under-utilized alerts

* chore: resolve duplicated query and description
2023-05-21 22:58:04 +02:00
Paul-Élie Testud
c36014f03e
fix(nginx): fix nginx query for histogram_percentile (#351) 2023-04-28 16:06:12 +02:00
deimosOmegaChan
b98b2a2777
fix node-exporter nodename regex expression (#349)
nodename should not depend on the prefix "hostname"
2023-04-25 10:58:52 +02:00