765 commits

Author SHA1 Message Date
dzaczek
11a78f0f06
Update google-cadvisor.yml (#382)
* Update google-cadvisor.yml

    Expression Explanation:
    The expression calculates the absolute change in container CPU usage by comparing the current rate of CPU usage (over the last minute) with the rate from the previous minute. If this change exceeds 25%, the alert is triggered. It also compares the current rate with the rate from the previous 5 minutes to capture larger trends. If either condition is met, the alert fires.
    
    Alert Details:
    - Alert Name: ContainerHighLowChangeCpuUsage
    - Trigger Condition: Absolute change in CPU usage exceeding 25%
    - Alert Severity: Informational (info)

* Add alert rule for high CPU usage change

* Change alert severity from warning to info
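
The comparison described under "Expression Explanation" can be sketched in Python. This is only an illustration of the logic as worded in the commit message, not the rule shipped in google-cadvisor.yml. The function name `cpu_change_alert` and its parameters are hypothetical, and it assumes that rates are expressed as percentages and that "25%" means a change of 25 percentage points.

```python
def cpu_change_alert(rate_now, rate_1m_ago, rate_5m_ago, threshold=25.0):
    """Return True when CPU usage moved by more than `threshold` points.

    All rates are assumed to be CPU usage percentages (hypothetical inputs;
    the real rule derives them from cAdvisor metrics inside Prometheus).
    """
    # Absolute change against the previous minute (short-term jump or drop).
    change_vs_1m = abs(rate_now - rate_1m_ago)
    # Absolute change against five minutes ago (larger trend).
    change_vs_5m = abs(rate_now - rate_5m_ago)
    # The alert fires if either comparison exceeds the threshold.
    return change_vs_1m > threshold or change_vs_5m > threshold


if __name__ == "__main__":
    print(cpu_change_alert(80.0, 50.0, 78.0))  # jump versus the previous minute
    print(cpu_change_alert(52.0, 50.0, 49.0))  # small fluctuation only
```

Because the change is taken as an absolute value, a sharp drop in usage triggers the check in the same way as a sharp rise, which matches the "HighLowChange" part of the alert name.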

---------

Co-authored-by: Samuel Berthe <dev@samuel-berthe.fr>
2025-02-16 23:46:53 +01:00
samber
7889a9a29b Publish 2025-02-16 22:37:09 +00:00
Samuel Berthe
add097c489
data: revert 5f57f09 (see #398) 2025-02-16 23:36:44 +01:00
samber
12b8acb1b8 Publish 2025-02-16 22:29:24 +00:00
asdf1234
4a7b9b5c72
Update mysqld-exporter.yml (#442)
* Update mysqld-exporter.yml

add some rules

* Add new MySQL monitoring rules

---------

Co-authored-by: Samuel Berthe <dev@samuel-berthe.fr>
2025-02-16 23:29:00 +01:00
samber
20f9a36615 Publish 2025-02-16 22:17:02 +00:00
Samuel Berthe
fb857e8b39
data: fix rules 2025-02-16 23:16:36 +01:00
Samuel Berthe
2f9c0c0483
upgrade ruby version 2025-02-16 23:15:43 +01:00
Samuel Berthe
eb92a79898
upgrade github action ruby version 2025-02-04 16:44:40 +01:00
Samuel Berthe
ae12871fa9
Update rules.yml 2025-02-04 16:40:21 +01:00
Felix Bühler
10d00c66da
Add caddy.yml (#450) 2025-02-04 14:23:14 +01:00
guruevi
70ac7d9cae
Various updates and quality of life changes (#405)
* smartctl_exporter publishes both drive_trip and current drive temperatures. Since most of the alerts are going to be permanent, it does not make sense to wait for the alert to be on for a certain time. Temperature sensors likewise vary, so using only the last sample is not sufficient to alert on potential issues.

* Add an option to run GitHub Action manually

* Add an option to force running the action for testing purposes

* Set variables correctly

* Set variables correctly

* Publish

* Clean up some more metrics

* Publish

* Minor bug fixes

* Publish

* Removed queries that throw errors when systems are upgraded. Also fixed and simplified a few Postgres queries.

* Publish

* Refined some more queries

* Publish

* PostgreSQL now has optimized autovacuum behavior

* Publish

* PostgreSQL now has optimized autovacuum behavior

* Publish

* Publish

* Query fails if instance names are not unique across jobs. This fixes it.

* Publish

* Ruby is out of date

---------

Co-authored-by: samber <samber@users.noreply.github.com>
2025-01-28 06:06:47 +01:00
Samuel Berthe
fc6b3faadc
Fix from #405 2025-01-28 06:04:10 +01:00
Samuel Berthe
d916b7c6ab
Fix from #405 2025-01-28 05:58:49 +01:00
sunlei
cbb2337438
fix: formatting errors (#448)
* fix: formatting errors

* Update query format in rules.yml

---------

Co-authored-by: Samuel Berthe <dev@samuel-berthe.fr>
2025-01-12 22:01:21 +01:00
samber
53a369769d Publish 2024-12-16 11:19:08 +00:00
Samuel Berthe
bdcc67c04e
Update rules.yml 2024-12-16 12:17:59 +01:00
Samuel Berthe
84a3b517a8
Update rules.yml 2024-12-16 12:17:26 +01:00
samber
4533f23b79 Publish 2024-12-16 11:17:17 +00:00
dxrayz
52d4a8c744
Update postgres-exporter.yml (#444)
Modify PostgresqlConfigurationChanged to prevent the error "many-to-many matching not allowed: matching labels must be unique on one side" when there are multiple Postgres instances
2024-12-16 12:16:05 +01:00
samber
c5203e94d0 Publish 2024-12-08 20:29:15 +00:00
Samuel Berthe
a8d7c43b30
Update rules.yml 2024-12-08 21:28:07 +01:00
Samuel Berthe
fff8a80ae5
Update README.md 2024-12-08 21:24:45 +01:00
samber
4e38ae2087 Publish 2024-12-05 22:38:38 +00:00
Samuel Berthe
8c3d06502f
Update rules.yml 2024-12-05 23:37:28 +01:00
samber
8a220b1b8a Publish 2024-11-30 09:31:05 +00:00
Martin Anderson
353ef1ed95
RabbitMQ: add too many ready messages alert (#441)
* RabbitMQ: add too many ready messages alert

* Add RabbitMQ ready messages alert rule

---------

Co-authored-by: Samuel Berthe <dev@samuel-berthe.fr>
2024-11-30 10:29:57 +01:00
samber
14949721ba Publish 2024-10-28 21:25:18 +00:00
sipr-invivo
bb75cb2c68
feat: Add rule to Kubernetes Job not starting (#436) 2024-10-28 22:24:10 +01:00
dependabot[bot]
f9e683896f
build(deps-dev): bump rexml from 3.3.7 to 3.3.9 (#438)
Bumps [rexml](https://github.com/ruby/rexml) from 3.3.7 to 3.3.9.
- [Release notes](https://github.com/ruby/rexml/releases)
- [Changelog](https://github.com/ruby/rexml/blob/master/NEWS.md)
- [Commits](https://github.com/ruby/rexml/compare/v3.3.7...v3.3.9)

---
updated-dependencies:
- dependency-name: rexml
  dependency-type: indirect
...

Signed-off-by: dependabot[bot] <support@github.com>
Co-authored-by: dependabot[bot] <49699333+dependabot[bot]@users.noreply.github.com>
2024-10-28 20:17:58 +01:00
Samuel Berthe
c41fda1d92
Update alertmanager.md 2024-10-06 17:31:23 +02:00
Samuel Berthe
7313acce36
Create FUNDING.json 2024-10-05 18:57:43 +02:00
Samuel Berthe
640f06588d
Delete FUNDING.json 2024-10-05 18:21:35 +02:00
Samuel Berthe
cd5b39a1f0
Create FUNDING.json 2024-10-05 18:06:22 +02:00
dependabot[bot]
35596c866f
build(deps): bump webrick from 1.7.0 to 1.8.2 (#435)
Bumps [webrick](https://github.com/ruby/webrick) from 1.7.0 to 1.8.2.
- [Release notes](https://github.com/ruby/webrick/releases)
- [Commits](https://github.com/ruby/webrick/compare/v1.7.0...v1.8.2)

---
updated-dependencies:
- dependency-name: webrick
  dependency-type: direct:production
...

Signed-off-by: dependabot[bot] <support@github.com>
Co-authored-by: dependabot[bot] <49699333+dependabot[bot]@users.noreply.github.com>
2024-09-27 22:24:21 +02:00
Samuel Berthe
d6d6ae4ef8
fix: Gemfile to reduce vulnerabilities (#434)
The following vulnerabilities are fixed with an upgrade:
- https://snyk.io/vuln/SNYK-RUBY-WEBRICK-8068535

Co-authored-by: snyk-bot <snyk-bot@snyk.io>
2024-09-26 11:31:21 +02:00
dependabot[bot]
65a5f586cb
build(deps-dev): bump rexml from 3.3.3 to 3.3.6 (#431)
Bumps [rexml](https://github.com/ruby/rexml) from 3.3.3 to 3.3.6.
- [Release notes](https://github.com/ruby/rexml/releases)
- [Changelog](https://github.com/ruby/rexml/blob/master/NEWS.md)
- [Commits](https://github.com/ruby/rexml/compare/v3.3.3...v3.3.6)

---
updated-dependencies:
- dependency-name: rexml
  dependency-type: indirect
...

Signed-off-by: dependabot[bot] <support@github.com>
Co-authored-by: dependabot[bot] <49699333+dependabot[bot]@users.noreply.github.com>
2024-09-09 20:09:20 +02:00
samber
4aa45dee05 Publish 2024-08-28 06:49:52 +00:00
Samuel Berthe
f08e8df514
oops 2024-08-28 08:48:42 +02:00
Samuel Berthe
995ab4d27a
Update rules.yml 2024-08-28 08:46:41 +02:00
Samuel Berthe
3bf8d6d824
fix: Gemfile to reduce vulnerabilities (#432)
The following vulnerabilities are fixed with an upgrade:
- https://snyk.io/vuln/SNYK-RUBY-REXML-7814166

Co-authored-by: snyk-bot <snyk-bot@snyk.io>
2024-08-24 10:42:21 +02:00
Somrat Dutta
8c0bdc2b24
feat: Add NATS and JetStream Prometheus alert rules (#430)
* feat: Add comprehensive NATS and JetStream Prometheus alert rules

- Added multiple Prometheus alert rules for monitoring NATS server and JetStream metrics.
- Included alerts for:
  - High connection count
  - High pending bytes
  - High subscriptions count
  - High routes count
  - High memory usage
  - Slow consumers
  - NATS server downtime
  - High CPU usage
  - High number of active connections
  - High JetStream store and memory usage
  - Subscription limits exceeded
  - High pending messages
  - Authentication timeouts
  - Errors in NATS (JetStream API errors)
  - JetStream consumers limit exceeded
  - Exceeding max payload size
  - Leaf node connection issues
  - Ping operations limit exceeded
  - Write deadline exceeded
- Ensured consistency between `exporter.yml` and `rules.yml` files.
- Improved overall NATS and JetStream monitoring to prevent performance degradation and ensure system reliability.

This commit enhances the visibility of NATS and JetStream operations by providing key metrics to alert on potential issues and optimize system performance.

* Update rules.yml

* - minor changes, rollback rules.yml
- address comment changes
- revert to old rules.yml as they are generated

* - minor changes, rollback rules.yml
- address comment changes
- revert to old rules.yml as they are generated

* fix indentation

---------

Co-authored-by: somratdutta <duttasomratand.com>
Co-authored-by: Samuel Berthe <dev@samuel-berthe.fr>
Co-authored-by: somrat.dutta <somrat.dutta@nutanix.com>
2024-08-20 20:37:03 +02:00
samber
02687db33d Publish 2024-08-20 16:32:36 +00:00
Samuel Berthe
d1715de751
fix PostgresqlInvalidIndex rule 2024-08-20 18:31:18 +02:00
dependabot[bot]
61da73d517
build(deps-dev): bump rexml from 3.3.2 to 3.3.3 (#428)
Bumps [rexml](https://github.com/ruby/rexml) from 3.3.2 to 3.3.3.
- [Release notes](https://github.com/ruby/rexml/releases)
- [Changelog](https://github.com/ruby/rexml/blob/master/NEWS.md)
- [Commits](https://github.com/ruby/rexml/compare/v3.3.2...v3.3.3)

---
updated-dependencies:
- dependency-name: rexml
  dependency-type: indirect
...

Signed-off-by: dependabot[bot] <support@github.com>
Co-authored-by: dependabot[bot] <49699333+dependabot[bot]@users.noreply.github.com>
2024-08-02 14:14:26 +02:00
dependabot[bot]
225607cf7f
build(deps-dev): bump nokogiri from 1.15.6 to 1.16.5 (#427)
Bumps [nokogiri](https://github.com/sparklemotion/nokogiri) from 1.15.6 to 1.16.5.
- [Release notes](https://github.com/sparklemotion/nokogiri/releases)
- [Changelog](https://github.com/sparklemotion/nokogiri/blob/main/CHANGELOG.md)
- [Commits](https://github.com/sparklemotion/nokogiri/compare/v1.15.6...v1.16.5)

---
updated-dependencies:
- dependency-name: nokogiri
  dependency-type: indirect
...

Signed-off-by: dependabot[bot] <support@github.com>
Co-authored-by: dependabot[bot] <49699333+dependabot[bot]@users.noreply.github.com>
2024-07-30 17:25:23 +02:00
Samuel Berthe
2c764df932
fix: Gemfile & Gemfile.lock to reduce vulnerabilities (#426)
The following vulnerabilities are fixed with an upgrade:
- https://snyk.io/vuln/SNYK-RUBY-REXML-7462086

Co-authored-by: snyk-bot <snyk-bot@snyk.io>
2024-07-18 10:14:45 +02:00
samber
58ade95b8b Publish 2024-07-02 07:34:59 +00:00
Samuel Berthe
47e74f65e0
Update rules.yml 2024-07-02 09:33:51 +02:00
Greg
9557d4b50e
feat(meilisearch): add basic set of rules (#425)
* feat(meilisearch): add basic meilisearch rules

* fix(query): use == instead of =

* fix(data): set correct name and use ==

* chore(meilisearch): remove index filter
2024-07-02 09:33:08 +02:00