153 commits

Author SHA1 Message Date
samber
cea78d7fd6 Publish 2025-11-05 16:08:52 +00:00
samber
4acbddb21a Publish 2025-11-05 16:04:56 +00:00
Samuel Berthe
6e2db98590
feat: add support for exporter-level comments (#481) 2025-11-05 17:04:30 +01:00
samber
ae8cfb0366 Publish 2025-10-13 12:24:59 +00:00
samber
606d6fc592 Publish 2025-09-15 13:04:10 +00:00
samber
b158ebb551 Publish 2025-09-14 17:22:29 +00:00
samber
5fbce5f513 Publish 2025-09-01 13:41:06 +00:00
Sajjad hassanzadeh
a2c31358d1
Add couchdb alerts (#472)
* add : additional essential clickhouse alerts

* Add new ClickHouse alert rules for monitoring

* linting

* add : couchdb roles config in rules.yml

* add : couchdb alerts in rules directory

---------

Co-authored-by: Samuel Berthe <dev@samuel-berthe.fr>
2025-09-01 15:40:42 +02:00
samber
3abc7144aa Publish 2025-08-28 21:07:00 +00:00
Sajjad hassanzadeh
7bced89d2d
add : additional essential clickhouse alerts (#471)
* add : additional essential clickhouse alerts

* Add new ClickHouse alert rules for monitoring

* linting

---------

Co-authored-by: Samuel Berthe <dev@samuel-berthe.fr>
2025-08-28 23:06:31 +02:00
samber
b04b11ce1d Publish 2025-06-25 11:32:39 +00:00
samber
ea63d8001a Publish 2025-06-17 17:16:15 +00:00
samber
6ebe6d8a8e Publish 2025-06-17 15:07:35 +00:00
samber
a3325114ea Publish 2025-05-21 21:04:42 +00:00
samber
67cf6892a4 Publish 2025-05-20 06:21:45 +00:00
jaqxues
98d6e7db05
Alloy: Fix incorrect alert (#464) 2025-05-20 08:21:14 +02:00
samber
becbe1be3b Publish 2025-05-08 17:49:45 +00:00
samber
fd9da90c1d Publish 2025-05-03 20:52:49 +00:00
samber
9f5c641bdd Publish 2025-04-23 08:31:10 +00:00
samber
aca1bdf1fb Publish 2025-04-23 08:28:06 +00:00
samber
198035eaf4 Publish 2025-04-23 07:58:55 +00:00
samber
a75d5124c5 Publish 2025-04-17 15:26:25 +00:00
samber
32a4bfb19b Publish 2025-03-27 16:23:49 +00:00
samber
93f9daecee Publish 2025-03-27 13:42:51 +00:00
Motte
69c8208e3c
Added PostgresqlReplicationLagHigh rule (#456)
* Added PostgresqlReplicationLagHigh rule

* Update PostgreSQL replication lag alert settings

---------

Co-authored-by: Samuel Berthe <dev@samuel-berthe.fr>
2025-03-27 14:42:22 +01:00
Pigueiras
97a31f34e5
Fix queries in elasticsearch latency alerts (#455)
The `elasticsearch_indices_search_fetch_total`,
`elasticsearch_indices_search_fetch_time_seconds`,
`elasticsearch_indices_indexing_index_time_seconds_total`
and `elasticsearch_indices_indexing_index_total` metrics
are counters.

Dividing the raw counters doesn't make sense because a single spike in
the numerator would keep the alert firing, even if subsequent
fetch/index operations are normal. Adding `increase` changes the query
to check whether operations took, on average, more than X over
a 1-minute interval, which was likely the original intent of
this alert.
2025-03-26 22:15:24 +01:00
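The reasoning in the commit message above can be checked with a small numeric sketch. The sample values, the one-minute spacing, and the 1.0-second threshold are all hypothetical, and the plain differences below only approximate what PromQL's `increase` does (the real function also handles counter resets and extrapolation).

```python
# Hypothetical samples of two monotonically increasing counters,
# one sample per minute. These values are invented for illustration.
fetch_time_seconds = [10.0, 20.0, 520.0, 530.0, 540.0]  # cumulative time spent
fetch_total = [100, 200, 300, 400, 500]                 # cumulative operations

THRESHOLD = 1.0  # hypothetical "X" seconds per operation


def lifetime_ratio(i):
    """Raw counter divided by raw counter: average since the counters started."""
    return fetch_time_seconds[i] / fetch_total[i]


def windowed_ratio(i):
    """Ratio of the increases over the last interval only."""
    delta_time = fetch_time_seconds[i] - fetch_time_seconds[i - 1]
    delta_ops = fetch_total[i] - fetch_total[i - 1]
    return delta_time / delta_ops


lifetime = [lifetime_ratio(i) for i in range(1, len(fetch_total))]
windowed = [windowed_ratio(i) for i in range(1, len(fetch_total))]

lifetime_firing = [r > THRESHOLD for r in lifetime]
windowed_firing = [r > THRESHOLD for r in windowed]

print(lifetime_firing)
print(windowed_firing)
```

In this sketch the slow minute is the second interval. The raw-counter ratio stays above the threshold for every later sample, while the windowed ratio exceeds it only during the interval that was actually slow.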
samber
7bcae33011 Publish 2025-02-20 15:18:08 +00:00
samber
9963b750ac Publish 2025-02-20 14:06:17 +00:00
samber
807db03d0d Publish 2025-02-19 14:25:58 +00:00
samber
4e49e77d29 Publish 2025-02-16 22:47:17 +00:00
dzaczek
11a78f0f06
Update google-cadvisor.yml (#382)
* Update google-cadvisor.yml

    Expression Explanation:
    The expression calculates the absolute change in CPU usage for containers by comparing the current rate of CPU usage (within the last 1 minute) with the rate of CPU usage from the previous minute. If this change exceeds 25%, the alert is triggered. Additionally, it compares the current rate of CPU usage with the rate from the previous 5 minutes to capture larger trends. If any of these conditions are met, the alert fires.
    
    Alert Details:
    - Alert Name: ContainerHighLowChangeCpuUsage
    - Trigger Condition: Absolute change in CPU usage exceeding 25%
    - Alert Severity: Informational (info)

* Add alert rule for high CPU usage change

* Change alert severity from warning to info

---------

Co-authored-by: Samuel Berthe <dev@samuel-berthe.fr>
2025-02-16 23:46:53 +01:00
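The trigger condition described in the commit message above can be sketched as a plain function. This is an illustration under assumptions, not the rule's actual expression: the function name and parameters are hypothetical, CPU rates are taken as percentages, and the 25% threshold is read as a difference in percentage points.

```python
def cpu_change_alert(rate_now, rate_1m_ago, rate_5m_ago, threshold=25.0):
    """Return True when CPU usage moved by more than `threshold` points.

    Hypothetical helper: all rates are CPU usage in percent.
    The absolute value makes both sharp rises and sharp drops count.
    """
    change_vs_1m = abs(rate_now - rate_1m_ago)
    change_vs_5m = abs(rate_now - rate_5m_ago)
    # Either comparison exceeding the threshold is enough to fire.
    return change_vs_1m > threshold or change_vs_5m > threshold


print(cpu_change_alert(60.0, 30.0, 55.0))  # jump within one minute
print(cpu_change_alert(40.0, 35.0, 30.0))  # small drift on both windows
print(cpu_change_alert(40.0, 35.0, 10.0))  # slow climb over five minutes
print(cpu_change_alert(10.0, 40.0, 12.0))  # sudden drop
```

Comparing against both the 1-minute and the 5-minute reference is what lets the condition catch a gradual climb that never moves 25 points within a single minute.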
samber
7889a9a29b Publish 2025-02-16 22:37:09 +00:00
samber
12b8acb1b8 Publish 2025-02-16 22:29:24 +00:00
asdf1234
4a7b9b5c72
Update mysqld-exporter.yml (#442)
* Update mysqld-exporter.yml

add some rules

* Add new MySQL monitoring rules

---------

Co-authored-by: Samuel Berthe <dev@samuel-berthe.fr>
2025-02-16 23:29:00 +01:00
samber
20f9a36615 Publish 2025-02-16 22:17:02 +00:00
Felix Bühler
10d00c66da
Add caddy.yml (#450) 2025-02-04 14:23:14 +01:00
guruevi
70ac7d9cae
Various updates and quality of life changes (#405)
* smartctl_exporter publishes both drive_trip and current drive temperatures. Since most of the alerts are going to be permanent, it does not make sense to wait for the alert to be on for a certain time. Temperature sensors likewise vary, so using only the last sample is not sufficient to alert on potential issues.

* Add an option to run GitHub Action manually

* Add an option to force running the action for testing purposes

* Set variables correctly

* Set variables correctly

* Publish

* Clean up some more metrics

* Publish

* Minor bug fixes

* Publish

* Removed queries that throw errors when systems are upgraded. Also fixed and simplified a few Postgres queries.

* Publish

* Refined some more queries

* Publish

* PostgreSQL now has optimized autovacuum behavior

* Publish

* PostgreSQL now has optimized autovacuum behavior

* Publish

* Publish

* Query fails if instance names are not unique across jobs. This fixes it.

* Publish

* Ruby is out of date

---------

Co-authored-by: samber <samber@users.noreply.github.com>
2025-01-28 06:06:47 +01:00
sunlei
cbb2337438
fix: formatting errors (#448)
* fix: formatting errors

* Update query format in rules.yml

---------

Co-authored-by: Samuel Berthe <dev@samuel-berthe.fr>
2025-01-12 22:01:21 +01:00
samber
53a369769d Publish 2024-12-16 11:19:08 +00:00
samber
4533f23b79 Publish 2024-12-16 11:17:17 +00:00
dxrayz
52d4a8c744
Update postgres-exporter.yml (#444)
Modify PostgresqlConfigurationChanged to prevent the error "many-to-many matching not allowed: matching labels must be unique on one side" when there are multiple PostgreSQL instances
2024-12-16 12:16:05 +01:00
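The error quoted in the commit above comes from matching two sets of series on labels that do not identify a series uniquely. The sketch below is a simplified model of strict one-to-one matching, not Prometheus's implementation. The series, the label names, and the idea of adding `instance` to the matching labels are assumptions for illustration; the commit does not say how the rule was changed.

```python
def match_one_to_one(left, right, on):
    """Pair series from both sides whose `on` labels are equal.

    Simplified model: any duplicate match key on either side is rejected.
    """
    def index(series):
        table = {}
        for labels, value in series:
            key = tuple(labels.get(name, "") for name in on)
            if key in table:
                raise ValueError(f"duplicate series for match group {key}")
            table[key] = value
        return table

    lhs, rhs = index(left), index(right)
    return {key: (lhs[key], rhs[key]) for key in lhs if key in rhs}


# Hypothetical series: the same setting reported by two PostgreSQL instances.
now = [({"name": "max_connections", "instance": "db1"}, 100),
       ({"name": "max_connections", "instance": "db2"}, 200)]
before = [({"name": "max_connections", "instance": "db1"}, 100),
          ({"name": "max_connections", "instance": "db2"}, 150)]


def is_ambiguous(on):
    """True when matching on these labels hits a duplicate key."""
    try:
        match_one_to_one(now, before, on)
        return False
    except ValueError:
        return True


print(is_ambiguous(["name"]))
print(is_ambiguous(["name", "instance"]))
```

With a single instance, matching on `name` alone is unambiguous. A second instance makes the same key appear twice, and in this sketch including a label that distinguishes the instances restores a unique pairing.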
samber
c5203e94d0 Publish 2024-12-08 20:29:15 +00:00
samber
4e38ae2087 Publish 2024-12-05 22:38:38 +00:00
samber
8a220b1b8a Publish 2024-11-30 09:31:05 +00:00
Martin Anderson
353ef1ed95
RabbitMQ: add too many ready messages alert (#441)
* RabbitMQ: add too many ready messages alert

* Add RabbitMQ ready messages alert rule

---------

Co-authored-by: Samuel Berthe <dev@samuel-berthe.fr>
2024-11-30 10:29:57 +01:00
samber
14949721ba Publish 2024-10-28 21:25:18 +00:00
samber
4aa45dee05 Publish 2024-08-28 06:49:52 +00:00
Somrat Dutta
8c0bdc2b24
feat: Add NATS and JetStream Prometheus alert rules (#430)
* feat: Add comprehensive NATS and JetStream Prometheus alert rules

- Added multiple Prometheus alert rules for monitoring NATS server and JetStream metrics.
- Included alerts for:
  - High connection count
  - High pending bytes
  - High subscriptions count
  - High routes count
  - High memory usage
  - Slow consumers
  - NATS server downtime
  - High CPU usage
  - High number of active connections
  - High JetStream store and memory usage
  - Subscription limits exceeded
  - High pending messages
  - Authentication timeouts
  - Errors in NATS (JetStream API errors)
  - JetStream consumers limit exceeded
  - Exceeding max payload size
  - Leaf node connection issues
  - Ping operations limit exceeded
  - Write deadline exceeded
- Ensured consistency between `exporter.yml` and `rules.yml` files.
- Improved overall NATS and JetStream monitoring to prevent performance degradation and ensure system reliability.

This commit enhances the visibility of NATS and JetStream operations by providing key metrics to alert on potential issues and optimize system performance.

* Update rules.yml

* - minor changes, rollback rules.yml
- address comment changes
- revert to old rules.yml as they are generated

* - minor changes, rollback rules.yml
- address comment changes
- revert to old rules.yml as they are generated

* fix indentation

---------

Co-authored-by: somratdutta <duttasomratand.com>
Co-authored-by: Samuel Berthe <dev@samuel-berthe.fr>
Co-authored-by: somrat.dutta <somrat.dutta@nutanix.com>
2024-08-20 20:37:03 +02:00
samber
02687db33d Publish 2024-08-20 16:32:36 +00:00
samber
58ade95b8b Publish 2024-07-02 07:34:59 +00:00