Commit graph

793 commits

Author SHA1 Message Date
samber
becbe1be3b Publish 2025-05-08 17:49:45 +00:00
andrii.k
e0e3cdda1d
update istio 4xx alert description (#463) 2025-05-08 19:49:18 +02:00
Samuel Berthe
4be87d7796
Update README.md 2025-05-03 22:53:51 +02:00
samber
fd9da90c1d Publish 2025-05-03 20:52:49 +00:00
Carsten Thiel
79f45a5146
Adding rules for checking FluxCD (#458) 2025-05-03 22:52:26 +02:00
samber
9f5c641bdd Publish 2025-04-23 08:31:10 +00:00
samber
aca1bdf1fb Publish 2025-04-23 08:28:06 +00:00
Samuel Berthe
4666830538
Update rules.yml 2025-04-23 10:18:08 +02:00
samber
198035eaf4 Publish 2025-04-23 07:58:55 +00:00
Roger
b3d25fafcf
feature/kubestate exporter check if node is scheduling disabled (#462)
* feature/kubestate-exporter-check-if-node-is-scheduling-disabeld

* comment added

* typo in expr

* move code to right file


---------

Co-authored-by: Roger Sikorski <roger.sikorski@zweiloewen.com>
Co-authored-by: Samuel Berthe <dev@samuel-berthe.fr>
2025-04-23 09:58:29 +02:00
dependabot[bot]
6446bb44be
build(deps-dev): bump nokogiri from 1.18.4 to 1.18.8 (#460)
Bumps [nokogiri](https://github.com/sparklemotion/nokogiri) from 1.18.4 to 1.18.8.
- [Release notes](https://github.com/sparklemotion/nokogiri/releases)
- [Changelog](https://github.com/sparklemotion/nokogiri/blob/main/CHANGELOG.md)
- [Commits](https://github.com/sparklemotion/nokogiri/compare/v1.18.4...v1.18.8)

---
updated-dependencies:
- dependency-name: nokogiri
  dependency-version: 1.18.8
  dependency-type: indirect
...

Signed-off-by: dependabot[bot] <support@github.com>
Co-authored-by: dependabot[bot] <49699333+dependabot[bot]@users.noreply.github.com>
2025-04-22 11:37:56 +02:00
samber
a75d5124c5 Publish 2025-04-17 15:26:25 +00:00
Samuel Berthe
3b440fec7b
Remove buggy HostRequiresReboot rule
Closing #459
2025-04-17 17:26:00 +02:00
samber
32a4bfb19b Publish 2025-03-27 16:23:49 +00:00
Samuel Berthe
8b730ef059
Update rules.yml 2025-03-27 17:23:19 +01:00
samber
93f9daecee Publish 2025-03-27 13:42:51 +00:00
Motte
69c8208e3c
Added PostgresqlReplicationLagHigh rule (#456)
* Added PostgresqlReplicationLagHigh rule

* Update PostgreSQL replication lag alert settings

---------

Co-authored-by: Samuel Berthe <dev@samuel-berthe.fr>
2025-03-27 14:42:22 +01:00
Pigueiras
97a31f34e5
Fix queries in elasticsearch latency alerts (#455)
The `elasticsearch_indices_search_fetch_total`,
`elasticsearch_indices_search_fetch_time_seconds`,
`elasticsearch_indices_indexing_index_time_seconds_total`
and `elasticsearch_indices_indexing_index_total` metrics
are counters.

Dividing the raw counters doesn't make sense because a spike in
the numerator would cause the alert to persist, even if subsequent
fetch/index operations are normal. Adding `increase` changes the query
to check whether operations took, on average, more than X over
a 1-minute interval, which was likely the original intent of
this alert.
2025-03-26 22:15:24 +01:00
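The reasoning in the elasticsearch latency fix (#455) can be illustrated with a small simulation. This is only a sketch: the sample values and the `THRESHOLD` are hypothetical, and the per-interval difference stands in for PromQL's `increase`, which additionally handles counter resets and extrapolation.

```python
# Hypothetical cumulative counter samples, one per minute (not real data).
# Most minutes add 100 fetches costing 1 s in total; the third minute costs 60 s.
fetch_time_seconds = [0.0, 1.0, 2.0, 62.0, 63.0, 64.0, 65.0]  # cumulative seconds
fetch_total = [0, 100, 200, 300, 400, 500, 600]               # cumulative operations

THRESHOLD = 0.05  # seconds per operation, an assumed alert threshold


def lifetime_average(time_s, total):
    """Ratio of the raw counters: average latency since the counters started."""
    return [t / n if n else 0.0 for t, n in zip(time_s, total)]


def windowed_average(time_s, total):
    """Ratio of per-interval increases: average latency within each interval."""
    averages = []
    for i in range(1, len(total)):
        delta_time = time_s[i] - time_s[i - 1]
        delta_ops = total[i] - total[i - 1]
        averages.append(delta_time / delta_ops if delta_ops else 0.0)
    return averages


raw_firing = [v > THRESHOLD for v in lifetime_average(fetch_time_seconds, fetch_total)]
win_firing = [v > THRESHOLD for v in windowed_average(fetch_time_seconds, fetch_total)]

print("raw counters:", raw_firing)
print("increases:   ", win_firing)
```

With these samples the raw-counter ratio stays above the threshold for every sample after the spike, while the ratio of increases exceeds it only for the interval that contains the spike.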
dependabot[bot]
242054f7dc
build(deps-dev): bump uri from 0.13.1 to 0.13.2 (#454)
Bumps [uri](https://github.com/ruby/uri) from 0.13.1 to 0.13.2.
- [Release notes](https://github.com/ruby/uri/releases)
- [Commits](https://github.com/ruby/uri/compare/v0.13.1...v0.13.2)

---
updated-dependencies:
- dependency-name: uri
  dependency-type: indirect
...

Signed-off-by: dependabot[bot] <support@github.com>
Co-authored-by: dependabot[bot] <49699333+dependabot[bot]@users.noreply.github.com>
2025-03-23 16:30:56 +01:00
dependabot[bot]
4335f85830
build(deps-dev): bump nokogiri from 1.18.3 to 1.18.4 (#453)
Bumps [nokogiri](https://github.com/sparklemotion/nokogiri) from 1.18.3 to 1.18.4.
- [Release notes](https://github.com/sparklemotion/nokogiri/releases)
- [Changelog](https://github.com/sparklemotion/nokogiri/blob/main/CHANGELOG.md)
- [Commits](https://github.com/sparklemotion/nokogiri/compare/v1.18.3...v1.18.4)

---
updated-dependencies:
- dependency-name: nokogiri
  dependency-type: indirect
...

Signed-off-by: dependabot[bot] <support@github.com>
Co-authored-by: dependabot[bot] <49699333+dependabot[bot]@users.noreply.github.com>
2025-03-23 16:26:08 +01:00
samber
7bcae33011 Publish 2025-02-20 15:18:08 +00:00
Samuel Berthe
2127c4ce90
Update rules.yml 2025-02-20 16:17:39 +01:00
samber
9963b750ac Publish 2025-02-20 14:06:17 +00:00
Roman
c189984d0f
fix node-exporter.yaml missing parentheses (#452) 2025-02-20 15:05:48 +01:00
samber
807db03d0d Publish 2025-02-19 14:25:58 +00:00
Samuel Berthe
6838196343
fix: remove duplicated rule 2025-02-19 15:25:29 +01:00
dependabot[bot]
0f4b45d127
build(deps-dev): bump nokogiri from 1.16.7 to 1.18.3 (#451)
Bumps [nokogiri](https://github.com/sparklemotion/nokogiri) from 1.16.7 to 1.18.3.
- [Release notes](https://github.com/sparklemotion/nokogiri/releases)
- [Changelog](https://github.com/sparklemotion/nokogiri/blob/v1.18.3/CHANGELOG.md)
- [Commits](https://github.com/sparklemotion/nokogiri/compare/v1.16.7...v1.18.3)

---
updated-dependencies:
- dependency-name: nokogiri
  dependency-type: indirect
...

Signed-off-by: dependabot[bot] <support@github.com>
Co-authored-by: dependabot[bot] <49699333+dependabot[bot]@users.noreply.github.com>
2025-02-19 14:33:37 +01:00
samber
4e49e77d29 Publish 2025-02-16 22:47:17 +00:00
dzaczek
11a78f0f06
Update google-cadvisor.yml (#382)
* Update google-cadvisor.yml

    Expression Explanation:
    The expression calculates the absolute change in CPU usage for containers by comparing the current rate of CPU usage (within the last 1 minute) with the rate of CPU usage from the previous minute. If this change exceeds 25%, the alert is triggered. Additionally, it compares the current rate of CPU usage with the rate from the previous 5 minutes to capture larger trends. If any of these conditions are met, the alert fires.
    
    Alert Details:
    - Alert Name: ContainerHighLowChangeCpuUsage
    - Trigger Condition: Absolute change in CPU usage exceeding 25%
    - Alert Severity: Informational (info)

* Add alert rule for high CPU usage change

* Change alert severity from warning to info

---------

Co-authored-by: Samuel Berthe <dev@samuel-berthe.fr>
2025-02-16 23:46:53 +01:00
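The condition described in the google-cadvisor.yml commit (#382) can be sketched as follows. This is an illustration under assumptions, not the rule's actual expression: it assumes per-minute CPU usage samples in percent, reads "25%" as 25 percentage points, and reads "the previous 5 minutes" as the sample taken five minutes earlier. The function name `cpu_change_alert` is hypothetical.

```python
def cpu_change_alert(rates, threshold=25.0):
    """Return True if CPU usage changed by more than `threshold` points.

    rates: per-minute CPU usage in percent, oldest first; the last item is
    the current value. The current value is compared with the sample from
    1 minute ago and with the sample from 5 minutes ago (assumed offsets).
    """
    current = rates[-1]
    for offset in (1, 5):
        # Skip a comparison when there is not enough history for it.
        if len(rates) > offset and abs(current - rates[-1 - offset]) > threshold:
            return True
    return False


# Sudden jump: 10% -> 40% within one minute.
print(cpu_change_alert([10, 10, 10, 10, 10, 40]))
# Slow ramp: only 6 points per minute, but 30 points over five minutes.
print(cpu_change_alert([10, 16, 22, 28, 34, 40]))
# Stable usage.
print(cpu_change_alert([10, 10, 10, 10, 10, 12]))
```

Using the absolute difference means the check reacts to drops as well as rises, which matches the "HighLow" in the alert name.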
samber
7889a9a29b Publish 2025-02-16 22:37:09 +00:00
Samuel Berthe
add097c489
data: revert 5f57f09 (see #398) 2025-02-16 23:36:44 +01:00
samber
12b8acb1b8 Publish 2025-02-16 22:29:24 +00:00
asdf1234
4a7b9b5c72
Update mysqld-exporter.yml (#442)
* Update mysqld-exporter.yml

add some rules

* Add new MySQL monitoring rules

---------

Co-authored-by: Samuel Berthe <dev@samuel-berthe.fr>
2025-02-16 23:29:00 +01:00
samber
20f9a36615 Publish 2025-02-16 22:17:02 +00:00
Samuel Berthe
fb857e8b39
data: fix rules 2025-02-16 23:16:36 +01:00
Samuel Berthe
2f9c0c0483
upgrade ruby version 2025-02-16 23:15:43 +01:00
Samuel Berthe
eb92a79898
upgrade github action ruby version 2025-02-04 16:44:40 +01:00
Samuel Berthe
ae12871fa9
Update rules.yml 2025-02-04 16:40:21 +01:00
Felix Bühler
10d00c66da
Add caddy.yml (#450) 2025-02-04 14:23:14 +01:00
guruevi
70ac7d9cae
Various updates and quality of life changes (#405)
* smartctl_exporter publishes both drive_trip and current drive temperatures. Since most of the alerts are going to be permanent, it does not make sense to wait for the alert to be on for a certain time. Temperature sensor readings likewise vary, so the last sample alone is not sufficient to alert on potential issues.

* Add an option to run GitHub Action manually

* Add an option to force running the action for testing purposes

* Set variables correctly

* Set variables correctly

* Publish

* Clean up some more metrics

* Publish

* Minor bug fixes

* Publish

* Removed queries that throw errors when systems are upgraded. Also fixed and simplified a few Postgres queries.

* Publish

* Refined some more queries

* Publish

* PostgreSQL now has optimized autovacuum behavior

* Publish

* PostgreSQL now has optimized autovacuum behavior

* Publish

* Publish

* Query fails if instance names are not unique across jobs. This fixes it.

* Publish

* Ruby is out of date

---------

Co-authored-by: samber <samber@users.noreply.github.com>
2025-01-28 06:06:47 +01:00
Samuel Berthe
fc6b3faadc
Fix from #405 2025-01-28 06:04:10 +01:00
Samuel Berthe
d916b7c6ab
Fix from #405 2025-01-28 05:58:49 +01:00
sunlei
cbb2337438
fix: formatting errors (#448)
* fix: formatting errors

* Update query format in rules.yml

---------

Co-authored-by: Samuel Berthe <dev@samuel-berthe.fr>
2025-01-12 22:01:21 +01:00
samber
53a369769d Publish 2024-12-16 11:19:08 +00:00
Samuel Berthe
bdcc67c04e
Update rules.yml 2024-12-16 12:17:59 +01:00
Samuel Berthe
84a3b517a8
Update rules.yml 2024-12-16 12:17:26 +01:00
samber
4533f23b79 Publish 2024-12-16 11:17:17 +00:00
dxrayz
52d4a8c744
Update postgres-exporter.yml (#444)
Modify PostgresqlConfigurationChanged to prevent the error "many-to-many matching not allowed: matching labels must be unique on one side" when there are multiple Postgres instances.
2024-12-16 12:16:05 +01:00
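The error quoted in the postgres-exporter.yml commit (#444) comes from the uniqueness constraint on the labels used to match two sets of series. The sketch below is a simplified model of that constraint, not Prometheus's actual matching code. The label names (`name`, `instance`), the sample series, and the helper `check_match` are hypothetical, and the commit's actual change to the rule is not shown here.

```python
from collections import Counter

ERROR = "many-to-many matching not allowed: matching labels must be unique on one side"


def check_match(left, right, on):
    """Check that no match key is duplicated on both sides.

    left, right: lists of label dicts, one dict per series.
    on: label names used to build the match key.
    """
    def key_counts(series):
        return Counter(tuple(s.get(label, "") for label in on) for s in series)

    left_counts, right_counts = key_counts(left), key_counts(right)
    for key, count in left_counts.items():
        if count > 1 and right_counts.get(key, 0) > 1:
            raise ValueError(ERROR)
    return True


# Two Postgres instances exposing the same setting (hypothetical series).
left = [
    {"name": "max_connections", "instance": "pg-1"},
    {"name": "max_connections", "instance": "pg-2"},
]
right = [
    {"name": "max_connections", "instance": "pg-1"},
    {"name": "max_connections", "instance": "pg-2"},
]

try:
    check_match(left, right, on=("name",))
except ValueError as exc:
    print("error:", exc)

# Including a label that distinguishes the instances makes the keys unique.
print(check_match(left, right, on=("name", "instance")))
```

With a single instance the key `name` alone is unique, which is why a rule of this kind can work in small setups and fail once a second instance is scraped.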
samber
c5203e94d0 Publish 2024-12-08 20:29:15 +00:00
Samuel Berthe
a8d7c43b30
Update rules.yml 2024-12-08 21:28:07 +01:00