Commit graph

120 commits

Author SHA1 Message Date
samber
7889a9a29b Publish 2025-02-16 22:37:09 +00:00
samber
12b8acb1b8 Publish 2025-02-16 22:29:24 +00:00
asdf1234
4a7b9b5c72
Update mysqld-exporter.yml (#442)
* Update mysqld-exporter.yml

add some rules

* Add new MySQL monitoring rules

---------

Co-authored-by: Samuel Berthe <dev@samuel-berthe.fr>
2025-02-16 23:29:00 +01:00
samber
20f9a36615 Publish 2025-02-16 22:17:02 +00:00
Felix Bühler
10d00c66da
Add caddy.yml (#450) 2025-02-04 14:23:14 +01:00
guruevi
70ac7d9cae
Various updates and quality of life changes (#405)
* smartctl_exporter publishes both drive_trip and current drive temperatures. Since most of the alerts are going to be permanent, it does not make sense to wait for the alert to be on for a certain time. Temperature sensors likewise vary, using the last sample is not sufficient to alert on potential issues.

* Add an option to run GitHub Action manually

* Add an option to force running the action for testing purposes

* Set variables correctly

* Set variables correctly

* Publish

* Clean up some more metrics

* Publish

* Minor bug fixes

* Publish

* Removed queries that throw errors when systems are upgraded. Also fixed and simplified a few Postgres queries.

* Publish

* Refined some more queries

* Publish

* PostgreSQL now has optimized autovacuum behavior

* Publish

* PostgreSQL now has optimized autovacuum behavior

* Publish

* Publish

* Query fails if instance names are not unique across jobs. This fixes it.

* Publish

* Ruby is out of date

---------

Co-authored-by: samber <samber@users.noreply.github.com>
2025-01-28 06:06:47 +01:00
sunlei
cbb2337438
fix: formatting errors (#448)
* fix: formatting errors

* Update query format in rules.yml

---------

Co-authored-by: Samuel Berthe <dev@samuel-berthe.fr>
2025-01-12 22:01:21 +01:00
samber
53a369769d Publish 2024-12-16 11:19:08 +00:00
samber
4533f23b79 Publish 2024-12-16 11:17:17 +00:00
dxrayz
52d4a8c744
Update postgres-exporter.yml (#444)
Modify PostgresqlConfigurationChanged to prevent the error "many-to-many matching not allowed: matching labels must be unique on one side" when there are multiple Postgres instances
2024-12-16 12:16:05 +01:00
samber
c5203e94d0 Publish 2024-12-08 20:29:15 +00:00
samber
4e38ae2087 Publish 2024-12-05 22:38:38 +00:00
samber
8a220b1b8a Publish 2024-11-30 09:31:05 +00:00
Martin Anderson
353ef1ed95
RabbitMQ: add too many ready messages alert (#441)
* RabbitMQ: add too many ready messages alert

* Add RabbitMQ ready messages alert rule

---------

Co-authored-by: Samuel Berthe <dev@samuel-berthe.fr>
2024-11-30 10:29:57 +01:00
samber
14949721ba Publish 2024-10-28 21:25:18 +00:00
samber
4aa45dee05 Publish 2024-08-28 06:49:52 +00:00
Somrat Dutta
8c0bdc2b24
feat: Add NATS and JetStream Prometheus alert rules (#430)
* feat: Add comprehensive NATS and JetStream Prometheus alert rules

- Added multiple Prometheus alert rules for monitoring NATS server and JetStream metrics.
- Included alerts for:
  - High connection count
  - High pending bytes
  - High subscriptions count
  - High routes count
  - High memory usage
  - Slow consumers
  - NATS server downtime
  - High CPU usage
  - High number of active connections
  - High JetStream store and memory usage
  - Subscription limits exceeded
  - High pending messages
  - Authentication timeouts
  - Errors in NATS (JetStream API errors)
  - JetStream consumers limit exceeded
  - Exceeding max payload size
  - Leaf node connection issues
  - Ping operations limit exceeded
  - Write deadline exceeded
- Ensured consistency between `exporter.yml` and `rules.yml` files.
- Improved overall NATS and JetStream monitoring to prevent performance degradation and ensure system reliability.

This commit enhances the visibility of NATS and JetStream operations by providing key metrics to alert on potential issues and optimize system performance.

* Update rules.yml

* - minor changes, rollback rules.yml
- address comment changes
- revert to old rules.yml as they are generated

* - minor changes, rollback rules.yml
- address comment changes
- revert to old rules.yml as they are generated

* fix indentation

---------

Co-authored-by: somratdutta <duttasomratand.com>
Co-authored-by: Samuel Berthe <dev@samuel-berthe.fr>
Co-authored-by: somrat.dutta <somrat.dutta@nutanix.com>
2024-08-20 20:37:03 +02:00
samber
02687db33d Publish 2024-08-20 16:32:36 +00:00
samber
58ade95b8b Publish 2024-07-02 07:34:59 +00:00
Greg
9557d4b50e
feat(meilisearch): add basic set of rules (#425)
* feat(meilisearch): add basic meilisearch rules

* fix(query): use == instead of =

* fix(data): set correct name and use ==

* chore(meilisearch): remove index filter
2024-07-02 09:33:08 +02:00
samber
60c235975c Publish 2024-06-14 18:16:53 +00:00
samber
1ee046b739 Publish 2024-06-06 20:54:49 +00:00
samber
8759c50440 Publish 2024-05-23 12:45:56 +00:00
samber
7dd767c4b4 Publish 2024-05-15 06:10:06 +00:00
samber
826be5877f Publish 2024-05-14 18:44:11 +00:00
R.Sicart
262e451625
kube hpa lint and improvement (#417)
* fix: hpa alerts are using  label but the queries remove it

Signed-off-by: R.Sicart <roger.sicart@gmail.com>

* fix: hpa alert is using  label but the query removes it

Signed-off-by: R.Sicart <roger.sicart@gmail.com>

* feat: hpa scale max should not alert when min and max are the same

Signed-off-by: R.Sicart <roger.sicart@gmail.com>

---------

Signed-off-by: R.Sicart <roger.sicart@gmail.com>
2024-05-14 20:43:00 +02:00
samber
81079a2a7e Publish 2024-05-14 18:35:54 +00:00
samber
04886da968 Publish 2024-05-13 10:10:12 +00:00
samber
613401a960 Publish 2024-05-13 09:12:01 +00:00
samber
84b0569c97 Publish 2024-05-13 08:33:30 +00:00
Ali
2547288c13
Added Clickhouse (#412)
* Added Clickhouse

* Update rules.yml

Added reasonable time periods for each query to avoid false positives and, in some cases, give the system a short window to try to solve the issue.
Also changed the severity level of authentication alerts from critical to info, which seems more appropriate.

* Modified time period for alerts embedded-exporter.yml

I made a few adjustments in time periods.
See if they seem reasonable or not

* Replication alerts time periods were adjusted

IMHO, replication alerts must be sent right away.
2024-05-13 10:32:18 +02:00
samber
515fca9c10 Publish 2024-05-05 23:33:11 +00:00
samber
5c0963558a Publish 2024-05-02 18:49:56 +00:00
samber
b77cb3467c Publish 2024-04-29 20:36:49 +00:00
samber
6b05a59ad9 Publish 2024-03-26 15:57:31 +00:00
Rastislav Pôbiš
2494ccdf31
Added prepared statements mysqld-exporter alert (#407) 2024-03-26 16:56:15 +01:00
samber
693c9e51b2 Publish 2024-03-11 22:29:17 +00:00
samber
7b3cef8bf9 Publish 2024-03-11 21:56:16 +00:00
samber
e2d3dadbc5 Publish 2024-02-12 08:42:15 +00:00
samber
c3258de6c7 Publish 2024-02-10 22:25:26 +00:00
samber
284db65e46 Publish 2024-02-10 19:02:28 +00:00
samber
0dba950ccc Publish 2024-02-09 19:25:17 +00:00
Brett Beutell
56a7e0d03a
Update rule for host memory underutilization to use avg_over_time instead of rate, since node_memory_MemAvailable_bytes is a gauge (#400) 2024-01-26 04:09:35 +01:00
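The reasoning in the commit above (a gauge should be averaged, not passed through rate) can be illustrated with a minimal Python sketch. It is not the rule from the repository and not Prometheus code: the function names, the sample readings, and the 60-second scrape interval are assumptions made for illustration, and the rate helper is a simplification that ignores the extrapolation Prometheus applies at window edges.

```python
from typing import Sequence


def avg_over_time(samples: Sequence[float]) -> float:
    """Mean of the samples in the window; meaningful for a gauge."""
    return sum(samples) / len(samples)


def counter_style_rate(samples: Sequence[float], step_seconds: float) -> float:
    """Simplified counter rate: any decrease is treated as a counter reset.

    This ignores the extrapolation that Prometheus applies at window edges.
    """
    increase = 0.0
    for prev, cur in zip(samples, samples[1:]):
        # A drop is assumed to be a reset to zero followed by growth to `cur`.
        increase += cur - prev if cur >= prev else cur
    return increase / (step_seconds * (len(samples) - 1))


# Hypothetical available-memory readings in bytes, scraped every 60 seconds.
readings = [8e9, 6e9, 7e9, 5e9]

print(avg_over_time(readings))             # average available memory in the window
print(counter_style_rate(readings, 60.0))  # positive, although memory fell overall
```

Under these assumptions the counter-style rate comes out positive even though available memory dropped across the window, because each decrease is read as a reset. The average is the value that describes the gauge.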
samber
df4016bf6a Publish 2024-01-20 19:34:37 +00:00
josedev-union
c6ff5a59dc
feat: Add rules for Graph Node (#387)
Co-authored-by: josedev-union <josedev-union@users.noreply.github.com>
2024-01-20 20:33:26 +01:00
samber
6ee065c636 Publish 2023-12-01 17:26:16 +00:00
samber
7d05d142d5 Publish 2023-11-26 01:19:24 +00:00
samber
308b3c52dd Publish 2023-10-24 13:05:40 +00:00
samber
97da7f97b6 Publish 2023-10-13 15:10:33 +00:00
samber
82f2798620 Publish 2023-10-06 16:50:22 +00:00