* fix: hpa alerts are using label but the queries remove it
Signed-off-by: R.Sicart <roger.sicart@gmail.com>
* fix: hpa alert is using label but the query removes it
Signed-off-by: R.Sicart <roger.sicart@gmail.com>
* feat: hpa scale max should not alert when min and max are the same
Signed-off-by: R.Sicart <roger.sicart@gmail.com>
---------
Signed-off-by: R.Sicart <roger.sicart@gmail.com>
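The two kinds of HPA fix above can be illustrated with a small sketch. It is a rough Python model, not the actual rule text: `sum_by` only imitates how a PromQL `sum by (...)` aggregation drops every label it is not told to keep, and `scale_max_alert` imitates the "do not alert when min and max are the same" guard. The label names and sample values are hypothetical.

```python
from collections import defaultdict


def sum_by(samples, keep):
    """Rough model of PromQL `sum by (<keep>)`: labels not in `keep` are dropped."""
    totals = defaultdict(float)
    for labels, value in samples:
        # Only the kept labels survive aggregation and identify the output series.
        key = tuple(sorted((k, v) for k, v in labels.items() if k in keep))
        totals[key] += value
    return [(dict(key), value) for key, value in totals.items()]


def scale_max_alert(current, min_replicas, max_replicas):
    """Fire only when the HPA is at its maximum AND it actually has room to scale."""
    return current >= max_replicas and min_replicas != max_replicas


# Hypothetical samples: same HPA seen through two scrape targets.
samples = [
    ({"namespace": "shop", "horizontalpodautoscaler": "web", "instance": "ksm-0"}, 3.0),
    ({"namespace": "shop", "horizontalpodautoscaler": "web", "instance": "ksm-1"}, 3.0),
]
aggregated = sum_by(samples, {"namespace", "horizontalpodautoscaler"})

# A template that reads the "instance" label would render empty here,
# because the aggregation removed that label from the result.
labels, value = aggregated[0]
print(labels.get("instance", ""), value)
```

An alert annotation can only reference labels that are still present on the query result, which is why each fix either keeps the label in the query or stops using it in the message.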
* fix: apiserver regexp matchers are automatically fully anchored
Signed-off-by: R.Sicart <roger.sicart@gmail.com>
* fix: apiserver errors alert is using label but the query removes it
Signed-off-by: R.Sicart <roger.sicart@gmail.com>
* fix: apiserver latency alert is using label but the query removes it
Signed-off-by: R.Sicart <roger.sicart@gmail.com>
---------
Signed-off-by: R.Sicart <roger.sicart@gmail.com>
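As a sketch of the anchoring fix above: Prometheus regex label matchers (`=~` and `!~`) must match the whole label value, so explicit `^` and `$` in a matcher are redundant. Python's `re.fullmatch` behaves the same way for simple patterns and is used here as a stand-in. Prometheus uses RE2 syntax, so this is only an approximation, and the helper name and the example values are hypothetical.

```python
import re


def prom_regex_match(pattern, value):
    """Approximate a Prometheus `=~` matcher: the pattern must match the entire value."""
    return re.fullmatch(pattern, value) is not None


# The alternation is implicitly anchored: only exact values match.
print(prom_regex_match("LIST|GET", "GET"))        # whole value matches
print(prom_regex_match("LIST|GET", "WATCHLIST"))  # substring only, so no match

# Explicit anchors add nothing under full-match semantics.
print(prom_regex_match("^(?:LIST|GET)$", "LIST"))

# An unanchored search would behave differently, which is the usual source of confusion.
print(re.search("5..", "1503") is not None)
print(prom_regex_match("5..", "1503"))
```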
haproxy_backend_max_sessions is the maximum number of sessions ever encountered during the lifetime of the HAProxy process. That is, it will never go down until HAProxy is restarted, so the alert continues to fire even though the situation has cleared!
Alerting on it doesn't make sense. Look at the currently active sessions instead.
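The difference can be seen in a small simulation. This is an illustrative sketch with made-up numbers: `current_sessions` stands for the currently active sessions over time, and the running maximum stands for what haproxy_backend_max_sessions reports between restarts. The threshold of 80 is an assumption chosen for the example.

```python
from itertools import accumulate


def firing(series, threshold):
    """Return, for each sample, whether an alert with this threshold would fire."""
    return [value > threshold for value in series]


# Hypothetical load: a spike to 95 sessions that later clears.
current_sessions = [10, 40, 95, 30, 12]

# A lifetime maximum only ever grows, like a high-water mark.
max_sessions = list(accumulate(current_sessions, max))

threshold = 80
print(max_sessions)
print(firing(max_sessions, threshold))      # stays firing after the spike
print(firing(current_sessions, threshold))  # clears once load drops
```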
* Added Clickhouse
* Update rules.yml
Added reasonable time periods for each query to avoid false positives and, in some cases, to give the system a short window to try to resolve the issue.
Also changed the severity level of authentication alerts from critical to info, which seems more appropriate.
* Modified time periods for alerts in embedded-exporter.yml
I made a few adjustments to the time periods.
Please check whether they seem reasonable.
* Adjusted time periods for replication alerts
IMHO, replication alerts must be sent right away.
This commit adds new Prometheus alert definitions to monitor indexing and query metrics in Elasticsearch clusters. These alerts are essential for detecting performance issues related to indexing and querying activities.
* Rework kube-state-metrics alerts:
- provide meaningful labels in the summary, as the 'instance' label hardly makes sense in most of them
- rename some alerts to describe the problem more accurately
- adjust descriptions to follow the message schema found in other alerts
* move changes to _data/rules.yml
* Update rules.yml
---------
Co-authored-by: Samuel Berthe <dev@samuel-berthe.fr>