Samuel Berthe
e82b504e00
fixes #251
2022-06-14 21:29:12 +02:00
Bastien Dronneau
bac2e99aee
docs(postgresql): add auto prefix in order to match query ( #288 )
2022-06-14 21:19:00 +02:00
Samuel Berthe
b36ea8f45d
data: adding rule "Host CPU high iowait"
2022-06-09 02:04:45 +02:00
Samuel Berthe
0207783284
data: change postgresql exporter name
2022-06-09 01:00:35 +02:00
Samuel Berthe
3faf1332a1
fix: PrometheusAllTargetsMissing ( #283 )
2022-06-09 00:43:40 +02:00
Samuel Berthe
2323541f2d
data: adding mgob query
2022-06-09 00:23:17 +02:00
Samuel Berthe
08d482f314
doc: add postgresql bloat
2022-06-07 02:32:09 +02:00
Samuel Berthe
4662cd2812
doc: improve pulsar doc
2022-06-07 01:29:31 +02:00
Marcel Körtgen
074e3e6d04
Add pulsar rules ( #286 )
* Add pulsar rules
* Add webrick, cf.:
- https://github.com/github/pages-gem/issues/752
* Update gems (minitest / ruby 3 issue)
* Add repo info (workaround), cf.
- https://github.com/jekyll/jekyll/issues/4705
2022-06-07 01:21:10 +02:00
Samuel Berthe
4d26719d41
removed some rules
2022-04-19 00:07:31 +02:00
Samuel Berthe
97810b6537
change severity of PostgresqlConfigurationChanged to info
2022-04-18 23:37:17 +02:00
Samuel Berthe
8941f71c6c
chore(ci): adding test with promtool ( #281 )
2022-04-18 23:30:32 +02:00
Samuel Berthe
4d161ee0a5
feat(jenkins): add "jenkins outdated plugin" rule
2022-04-18 20:29:36 +02:00
Samuel
718b002826
fix / increases requires interval ( #279 )
2022-04-18 20:17:33 +02:00
Koen Dierckx
21ddd2f752
Added Alert manager job alert ( #272 )
Co-authored-by: DIERCKXK <koen.dierckx@vito.be>
2022-01-23 19:36:36 +01:00
armondressler
038e46743d
fixed erroneous usage of rate() function on gauges ( #270 )
Co-authored-by: Dressler Armon, B2B-PAP-HLT-DO-ENG <armon.dressler@swisscom.com>
2022-01-16 03:24:36 +01:00
MikeN. Paxos
78a7e61050
added jenkins alert rules for jenkins metrics plugin ( #268 )
* added jenkins alert rules
* Update rules.yml
Co-authored-by: Samuel Berthe <dev@samuel-berthe.fr>
2021-12-27 12:48:07 +01:00
Samuel Berthe
fd0f2805c0
Renaming kube_hpa_* to kube_horizontalpodautoscaler_*
Fixes #266
2021-12-07 23:16:40 +01:00
Samuel Berthe
f3ef333a3e
doc: remove comment
2021-12-07 23:14:23 +01:00
Damon Vincent
a12f5263c2
Filter parent groups from Docker container alerts ( #267 )
2021-12-07 23:05:27 +01:00
Samuel Berthe
2ca7f5bebe
doc: more explicit description for HostClock* rules ( #265 )
2021-12-02 20:54:23 +01:00
Lauri Võsandi
2be7e9684c
Add HostNetworkBondDegraded ( #260 )
2021-12-02 20:48:11 +01:00
John Losito
1a7690a1a3
Add rule for reboot-required ( #262 )
2021-12-02 20:45:33 +01:00
leemos
ee3c878b06
apiserver_request_count has been turned off ( #264 )
2021-12-02 20:23:56 +01:00
Torsten Bøgh Köster
4e1a26cab3
Add Solr rules ( #258 )
2021-11-21 18:53:32 +01:00
chaoxiaodi
7a40d7f423
Update rules.yml ( #252 )
2021-10-27 14:00:35 +02:00
Samuel Berthe
7857afab6e
fix(rule): fixing KubernetesOutOfCapacity ( #227 )
2021-10-17 17:14:44 +02:00
Samuel Berthe
a978cfb5a1
doc: more explicit "ContainerAbsent" and "ContainerKilled" rules ( #247 )
2021-10-10 20:13:30 +02:00
Samuel Berthe
4e0d99dd09
fix(mongodb): fix query for MongodbReplicationHeadroom rule ( #250 )
2021-10-10 20:12:06 +02:00
kayge
2d9e4ae431
Cleaning up typos in rules.yml ( #248 )
2021-10-09 01:05:15 +02:00
Andre Martins
36ca52e598
adding alerts to promtail and loki ( #241 )
Co-authored-by: apmbktf <andre.pasqualinoto-martins@itau-unibanco.com.br>
Co-authored-by: Samuel Berthe <dev@samuel-berthe.fr>
2021-10-03 22:12:59 +02:00
Christian Zenker
7c67f02ee6
The metric is called 'thanos_compact_halted' ( #243 )
According to https://github.com/thanos-io/thanos/blob/main/examples/alerts/alerts.md
2021-09-21 15:48:27 +02:00
Ondřej Nový
abfae043bb
Fix typo in description ( #242 )
2021-09-19 23:37:51 +02:00
Samuel Berthe
a225087b06
prevent +inf max value
2021-08-19 23:45:58 +02:00
gökhan
b9222993ac
istio pilot duplicate cluster ( #220 )
2021-08-19 21:23:27 +02:00
Guillaume
6fcdcff5e3
Fix bad syntax for Haproxy rules ( #232 )
Aggregations require parentheses around expressions
2021-08-19 21:22:39 +02:00
flf2ko
a02a7e6eab
Fix "percentil" typo in Etcd rules ( #234 )
2021-08-19 21:21:16 +02:00
Krasimir Nedelchev
3d69117f33
Add missing parenthesis to rule ( #237 )
2021-08-19 21:20:11 +02:00
Igor Churmeev
3612c9cc3e
Add alerts for Hashicorp Vault ( #238 )
Co-authored-by: Samuel Berthe <dev@samuel-berthe.fr>
2021-08-19 21:19:43 +02:00
Andre Martins
b47359c2fd
added alerts to cortex ( #240 )
* added alerts to cortex
* Update rules.yml
Co-authored-by: apmbktf <andre.pasqualinoto-martins@itau-unibanco.com.br>
Co-authored-by: Samuel Berthe <dev@samuel-berthe.fr>
2021-08-19 20:31:46 +02:00
Benjamin Dos Santos
7304d40539
fix(HostNetworkInterfaceSaturated): display network interface name in description ( #239 )
`$labels.interface` doesn't exist, use `$labels.device` instead
2021-08-16 16:29:12 +02:00
Gjed
c2b8178304
Loki alerts ( #218 )
Co-authored-by: Samuel Berthe <dev@samuel-berthe.fr>
2021-07-04 23:59:46 +02:00
asteny
243c0280cf
Haproxy 2 embedded exporter fixes ( #229 )
2021-07-04 23:28:58 +02:00
Alexandros Orfanos
6a6f89bad5
Add php-fpm max-children alert ( #224 )
2021-06-29 12:37:54 +02:00
Alberto del Barrio
0ba7c2a47e
fix typo ( #228 )
2021-06-27 14:16:42 +02:00
Samuel Berthe
092d0f8bda
fix(haproxy): some queries were using wrong metric names
2021-05-01 22:48:54 +02:00
Samuel Berthe
e044fddd11
feat(data): reverse traefik exporters order
2021-05-01 22:12:12 +02:00
Samuel Berthe
af30d0f06c
fix(node_exporter): better alert description for EDAC + network errors ( #204 )
2021-05-01 22:01:10 +02:00
Samuel Berthe
135d4b7c1a
fix(data): for KubernetesPodNotHealthy, insert a step of subquery execution time
2021-05-01 20:30:35 +02:00
Samuel Berthe
54b1e674b2
fix(data): fix pg replication lag query
2021-05-01 19:58:42 +02:00
Moritz
335ba16032
Fix upper/lowercase of systemd ( #207 )
They're quite clear on how they want it to be written:
https://unix.stackexchange.com/review/suggested-edits/372414
2021-05-01 19:44:06 +02:00
Samuel Berthe
1c44cd7818
feat(data): adding k8s rule - detect container killed by oomkiller
2021-05-01 19:33:03 +02:00
Gustavo Kazuo Motizuki
18672ff0f9
Improve KubernetesOutOfCapacity alert ( #211 )
2021-05-01 19:27:46 +02:00
Samuel
97c48862d7
fix(haproxy) ( #213 )
2021-05-01 18:58:46 +02:00
Samuel Berthe
b9f09e7f93
fix(freeswitch): move to the networking section
2021-05-01 18:53:04 +02:00
Samuel
823b8edd7e
feat(freeswitch) ( #214 )
2021-05-01 18:45:36 +02:00
Samuel Berthe
c3ba0cf199
data: rename coredns metric
2021-03-28 00:34:56 +01:00
Samuel Berthe
b9db2c0c68
data: fix some elasticsearch rules
2021-02-26 11:31:06 +01:00
Samuel Berthe
1d0fd50033
fix(data): quickfix on cassandra, because I merged pr-196 a little bit too fast
2021-02-22 14:44:45 +01:00
ko-christ
24ae7de2f5
Fill in PrometheusRules for instaclustr/cassandra-exporter ( #196 )
2021-02-22 14:38:40 +01:00
Samuel Berthe
19f9316868
Merge pull request #197 from yasharne/new_minio
2021-02-22 14:09:38 +01:00
Yashar Nesabian
f166c909f1
removed old minio rules
2021-02-22 11:35:49 +03:30
Samuel Berthe
ca31cc8a71
fix(data): fix node exporter temperature alarm
2021-02-21 19:05:10 +01:00
Yashar Nesabian
def11767bf
added minio disk space usage missed condition
2021-02-16 21:33:33 +03:30
Yashar Nesabian
4c5ff1fc68
Added new minio alert rules
2021-02-16 21:06:14 +03:30
Samuel Berthe
6d7ef1cdbb
Merge branch 'master' of github.com:samber/awesome-prometheus-alerts
2021-02-07 20:47:59 +01:00
Samuel Berthe
4138f78ea2
feat(ui): adding navbar
2021-02-07 20:46:45 +01:00
Samuel Berthe
417fb2e691
Merge pull request #189 from strangeman/zookeeper-alerts
2021-02-02 14:16:30 +01:00
Anton Markelov
040cbe1ace
add suggested changes
2021-02-02 14:19:32 +02:00
Samuel Berthe
b1e2e02db9
💄
2021-02-01 15:46:53 +01:00
Anton Markelov
b619efac76
deal with proposed changes
2021-02-01 13:15:51 +02:00
Anton Markelov
647508e520
add alerts for kafka burrow exporter
2021-02-01 11:01:36 +02:00
Anton Markelov
1f7712b332
add alerts for dabealu/zookeeper-exporter
2021-02-01 10:48:43 +02:00
Bertrand Mailhe
cbc281cea7
fix rule -Container Volume usage-
Signed-off-by: Bertrand Mailhe <bmailhe@leadformance.com>
2021-01-25 17:50:22 +01:00
Samuel Berthe
0ee7f1266f
minor improvements for ssl exporter
2021-01-20 18:09:36 +01:00
Samuel Berthe
8d0826020b
Merge pull request #184 from yasharne/master
added ssl/tls exporter alert rules
2021-01-20 18:02:00 +01:00
Yashar Nesabian
916ac1af8f
added ssl/tls exporter alert rules
2021-01-20 14:51:23 +03:30
Samuel Berthe
6f76f46eff
redis: adding comment for exporter flag
2021-01-19 17:11:40 +01:00
Per Lundberg
b3674d96c5
Redis: add alternative maxmemory alert
2021-01-19 15:40:53 +02:00
Samuel Berthe
93b2f1390a
fixing #181 : k8s request latency tracking
2021-01-13 11:22:21 +01:00
Samuel Berthe
d3f514b7e4
data: fixing etcd query
2021-01-11 11:01:14 +01:00
Samuel Berthe
f5e05c55d0
data: some netdata disk alerts
2021-01-08 23:48:04 +01:00
Samuel Berthe
3b9cd87f3d
data: adding nomad
2021-01-08 23:40:54 +01:00
Samuel Berthe
f7c25e648c
data: adding netdata
2021-01-08 23:26:57 +01:00
Heckel, Robert J
ce12720abc
updating per samber's comment.
2021-01-08 12:13:56 -06:00
Heckel, Robert J
b033ca9e8d
Adding some basic rules snagged from the defaults.
2021-01-07 10:40:49 -06:00
Benjamin Dos Santos
0f24d8cc9e
refactor: improve some haproxy v2 rules
2021-01-06 21:09:21 +01:00
Samuel Berthe
df602d6e47
typo haproxy error status
2021-01-06 15:38:07 +01:00
Samuel Berthe
fe00569998
Merge pull request #172 from bdossantos/chore/haproxy2
chore: add Prometheus alerts for HAProxy v2
2021-01-06 15:37:19 +01:00
Gert Vilain
de8e2f6cd9
Remove duplicate kubernetes job failed
2021-01-05 20:49:25 +01:00
Benjamin Dos Santos
1b7c36666c
chore: add Prometheus alerts for HAProxy v2
ref #87
2021-01-05 16:45:52 +01:00
Samuel Berthe
209fdf86e8
reduce p99 quantile aggregation duration
2021-01-05 12:30:32 +01:00
Samuel Berthe
5d7d99a658
Merge pull request #171 from tosin-ogunrinde/master
2021-01-03 21:45:45 +01:00
Tosin Ogunrinde
21817c3551
Improve JVM "JVM memory filling up" alert by summing up all the heap areas which include a separate entry for the Eden Space, Survivor Space and Tenured Gen.
2020-12-31 09:16:09 +00:00
Tosin Ogunrinde
ebf402aa7d
Improve JVM "JVM memory filling up" alert by summing up all the heap areas which include a separate entry for the Eden Space, Survivor Space and Tenured Gen.
2020-12-31 09:06:36 +00:00
Samuel Berthe
97345d3b6f
mysql restart alert: severity=info
2020-12-31 00:47:14 +01:00
Samuel Berthe
971bbe03ec
Add FOR clause to alerting rules (when necessary)
2020-12-31 00:27:12 +01:00
Samuel Berthe
3a352d08dc
fix k8s rule: longer alert check time
2020-12-30 19:13:02 +01:00
Samuel Berthe
d3ecfaaad3
Merge pull request #139 from xkfen/istio
2020-12-30 18:47:28 +01:00
Samuel Berthe
2f6d4921c6
fix initial istio alerts
2020-12-30 18:46:50 +01:00
Samuel Berthe
fa4325218f
Merge branch 'master' of github.com:samber/awesome-prometheus-alerts
2020-12-30 17:46:58 +01:00
Samuel Berthe
ed62bdc567
alerts node_exporter: improve network and disk rules
2020-12-30 17:45:30 +01:00
Tosin Ogunrinde
0add93363f
Fix JVM "JVM memory filling up" alert
2020-12-30 00:30:08 +00:00
Samuel Berthe
f686698f68
Merge pull request #166 from cityofships/fix_es
Fix Elasticsearch "No new documents" alert
2020-12-28 16:50:47 +01:00
Samuel Berthe
965fefab89
fix alert description
2020-12-28 16:40:11 +01:00
Carl Düvel
a7c5155002
Add cpu steal alert
2020-12-21 19:06:45 +01:00
Piotr Parczewski
f7d08e364b
Fix Elasticsearch "No new documents" alert.
Prometheus rate() function calculates the per-second average rate
of increase. This means the alert is triggered whenever, during the
last 10 minutes, fewer than 1 document was ingested *per second*
(60 documents per minute).
Signed-off-by: Piotr Parczewski <piotr@stackhpc.com>
2020-12-17 15:00:01 +01:00
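The arithmetic behind the commit message above can be sketched in Python. This is a simplified model, not Prometheus's implementation: `approx_rate`, `fires_on_rate` and `fires_on_increase` are hypothetical helpers, the real `rate()` also extrapolates at the window boundaries, and the sample value of 300 documents is made up for illustration.

```python
WINDOW_SECONDS = 10 * 60  # the 10-minute range mentioned in the commit message


def approx_rate(counter_increase: float, window_seconds: float) -> float:
    """Per-second average rate of increase (simplified: no extrapolation)."""
    return counter_increase / window_seconds


def fires_on_rate(counter_increase: float) -> bool:
    """Threshold `< 1` applied to the per-second rate."""
    return approx_rate(counter_increase, WINDOW_SECONDS) < 1


def fires_on_increase(counter_increase: float) -> bool:
    """Threshold `< 1` applied to the absolute increase over the window."""
    return counter_increase < 1


docs_ingested = 300  # hypothetical: 300 new documents in 10 minutes
print(fires_on_rate(docs_ingested))      # True: 0.5 docs/s is below 1
print(fires_on_increase(docs_ingested))  # False: documents did arrive
```

Under this model, a threshold on the per-second rate fires for anything below 600 documents per 10 minutes, while a threshold on the absolute increase fires only when nothing was ingested. The commit message does not say which expression the fix uses, so the second helper is only one possible reading.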
Per Lundberg
f673fe72c3
Update rules.yml
Fixes bug in previous commit. `or` has lower precedence than `<` in PromQL, hence the need for grouping with parentheses.
2020-11-27 11:08:46 +02:00
Per Lundberg
00dd58eace
Fix Redis missing master query
The previous approach fails because of the "missing data" semantics in Prometheus. If the Redis server is down, PromQL will typically return "no data" instead of 0 for a `count()`; this is by design in Prometheus.
This suggestion, as given by @slovdahl, works around this by returning a vector with a single `0` entry in this case, making the query work as intended.
2020-11-25 16:06:05 +02:00
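The "missing data" semantics and the precedence issue described in the two commits above can be modelled with a small Python sketch. It is an illustration under assumptions, not PromQL itself: instant vectors are reduced to a single optional value, `None` stands for "no data", and every function name here (`count`, `or_`, `lt`, `original_query`, `fixed_query`, `ungrouped_query`) is hypothetical.

```python
from typing import Optional, Sequence


def count(samples: Sequence[float]) -> Optional[float]:
    """Model of count(): an empty input yields no data (None), not 0."""
    return float(len(samples)) if samples else None


def or_(left: Optional[float], right: Optional[float]) -> Optional[float]:
    """Model of `or`: use the right side only when the left has no data."""
    return left if left is not None else right


def lt(value: Optional[float], threshold: float) -> Optional[float]:
    """Model of the filtering `<`: keep the value if it is below the threshold."""
    if value is not None and value < threshold:
        return value
    return None


def original_query(masters):   # count(...) < 1
    return lt(count(masters), 1)


def fixed_query(masters):      # (count(...) or vector(0)) < 1
    return lt(or_(count(masters), 0.0), 1)


def ungrouped_query(masters):  # count(...) or (vector(0) < 1)
    return or_(count(masters), lt(0.0, 1))


def fires(result: Optional[float]) -> bool:
    """An alert fires when the expression returns any data."""
    return result is not None


no_master = []
one_master = [1.0]

print(fires(original_query(no_master)))    # False: no data, alert stays silent
print(fires(fixed_query(no_master)))       # True: the 0 fallback is below 1
print(fires(fixed_query(one_master)))      # False: a master is present
print(fires(ungrouped_query(one_master)))  # True: fires even with a master
```

The last case shows why the parentheses matter in this model: without them the comparison binds to the fallback value only, so the expression always returns something.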
Samuel Berthe
2186841f29
Merge pull request #140 from yasharne/percona_mongodb
2020-11-15 18:12:20 +01:00
Vincent Fiset
6ed4358452
remove replset_oplog based alerts
2020-11-09 11:14:01 -05:00
Samuel Berthe
3ccfaa47ea
remove useless brackets
2020-11-07 18:08:02 +01:00
Samuel Berthe
9f144acb30
haproxy: fix description of request errors
2020-11-07 18:07:20 +01:00
Samuel Berthe
be20363602
rate is better than irate for alerting
2020-11-07 17:46:18 +01:00
Liudmyla Derkach
e6113ff2db
feat: adding few useful rabbitmq alerts
2020-10-30 19:10:52 +02:00
Yashar Nesabian
2a2ecf8a8c
change alert rules which were using avg to show more accurate value based on the replica set
2020-10-24 22:03:42 +03:30
Felix Breidenstein
1b6cd55200
Adapt rules for windows to new exporter
2020-10-20 14:52:36 +02:00
Nabil BENDAFI
e024c542ed
feat(kubernetes): add Out of capacity
2020-10-16 12:15:56 +02:00
Samuel Berthe
ead7db708e
alert on containers CPU: add a comment to exclude cAdvisor
2020-10-11 21:38:48 +02:00
Samuel Berthe
50b4c499fa
rules: adding a few cassandra alerts
2020-10-11 19:55:18 +02:00
Samuel Berthe
0cf82fd3e7
Merge branch 'master' into NetworkSpeed
2020-10-11 19:39:59 +02:00
Samuel Berthe
06205cd91c
Update rules.yml
2020-10-11 19:39:17 +02:00
Samuel Berthe
89252f999f
Merge branch 'master' into master
2020-10-11 19:26:04 +02:00
Samuel Berthe
66e6581b07
Merge pull request #121 from osterik/master
check free space for all mountpoints
2020-10-11 19:22:27 +02:00
Samuel Berthe
ea7e6d6aa9
Merge pull request #125 from mcanevet/patch-1
Fix HAProxy rules
2020-10-11 18:21:41 +02:00
Samuel Berthe
8616b0241c
Merge pull request #130 from nabilbendafi/feature/traefik_rules
2020-10-11 18:10:06 +02:00
Samuel Berthe
e8572f618b
Merge pull request #133 from tux-00/master
2020-10-11 18:07:11 +02:00
Samuel Berthe
2f6b9832fa
Update rules.yml
2020-10-11 18:06:06 +02:00
Samuel Berthe
8af9ca4ba8
Merge pull request #134 from nanorobocop/fix-prometheus-job-missing-alert
Fix PrometheusJobMissing alert
2020-10-11 17:48:42 +02:00
Samuel Berthe
2e6e46da45
Merge branch 'master' into master
2020-10-11 17:42:51 +02:00
Samuel Berthe
c469d26c4d
Merge pull request #137 from Ozarklake/sql_server_rules
2020-10-11 17:37:40 +02:00
Samuel Berthe
bafcd1e922
Update rules.yml
2020-10-11 17:35:46 +02:00
Samuel Berthe
e60fc805f6
Merge pull request #138 from nirav-chotai/nchotai/fix-hpa-alerts
[PLEASE_MERGE] Fix HPA alerts
2020-10-11 17:24:13 +02:00
Samuel Berthe
45103f0a0d
Merge branch 'master' into master
2020-10-11 17:10:20 +02:00
Samuel Berthe
7a609adf18
adding comment to container OOM killer warning
2020-10-11 16:11:44 +02:00
Samuel Berthe
cf70272309
fix(container memory limit): filter by containers having max memory setting
2020-10-11 16:08:54 +02:00
Samuel Berthe
4128004475
Merge pull request #119 from fernandocarletti/patch-1
fix: container ContainerMemoryUsage alert
2020-10-11 16:06:33 +02:00
Samuel Berthe
f67162bf57
Merge pull request #148 from fsschmitt/fix/disk-latency-unit
Fix time unit on disk read/write latency rule
2020-10-11 15:49:15 +02:00
fsschmitt
4266b4d326
Fix time unit on disk read/write latency rule
2020-10-06 14:36:22 +01:00
fsschmitt
5288c9a2f5
Fix node_md_disks state from fail to failed
2020-10-06 13:33:50 +01:00
Daniel Andrzejewski
fc4797db9e
small fix
2020-09-17 15:19:14 +02:00
Daniel Andrzejewski
6c5f708179
node_disk_write_time_seconds_total is in seconds, not in milliseconds. node_disk_write_time_seconds_total should be greater than 0, otherwise you get a +Inf result.
2020-09-17 15:13:42 +02:00
Yashar Nesabian
d6b39a7f3f
More accurate alerts
added `mongodb instance down` alert and changed the `too many
connections` alert to fire when the connections are more than 80% of the
available connections.
removed `mongodb_replset_member_state` based alerts as I don't have
enough information on them
2020-08-09 10:35:39 +04:30
Yashar Nesabian
3ce1084f5b
Added percona mongodb alert rules
2020-08-03 10:45:32 +04:30
kaifen.xie
a04eef39c0
add istio
2020-07-25 23:24:36 +08:00
Nirav Chotai
8fb5da83de
Fix HPA alerts
- Fixing KubernetesHpaMetricAvailability
- Fixing KubernetesHpaScalingAbility
2020-07-24 13:32:44 +08:00
Ozarklake
88e812c78e
add sql server rules
2020-07-17 15:02:41 +08:00
Ozarklake
4e66d17d01
add sql server rules
2020-07-17 14:58:26 +08:00
Ozarklake
e009c5d8b5
Optimizing mysql slow query alert rules
2020-07-14 12:55:17 +08:00
Mansur Marvanov
05e521c0a8
Fix PrometheusJobMissing alert
2020-07-09 16:36:45 +09:00