Felix Breidenstein
1b6cd55200
Adapt rules for windows to new exporter
2020-10-20 14:52:36 +02:00
Nabil BENDAFI
e024c542ed
feat(kubernetes): add Out of capacity
2020-10-16 12:15:56 +02:00
Samuel Berthe
ead7db708e
alert on containers CPU: add a comment to exclude cAdvisor
2020-10-11 21:38:48 +02:00
Samuel Berthe
50b4c499fa
rules: adding a few cassandra alerts
2020-10-11 19:55:18 +02:00
Samuel Berthe
0cf82fd3e7
Merge branch 'master' into NetworkSpeed
2020-10-11 19:39:59 +02:00
Samuel Berthe
06205cd91c
Update rules.yml
2020-10-11 19:39:17 +02:00
Samuel Berthe
89252f999f
Merge branch 'master' into master
2020-10-11 19:26:04 +02:00
Samuel Berthe
66e6581b07
Merge pull request #121 from osterik/master
check free space for all mountpoints
2020-10-11 19:22:27 +02:00
Samuel Berthe
ea7e6d6aa9
Merge pull request #125 from mcanevet/patch-1
Fix HAProxy rules
2020-10-11 18:21:41 +02:00
Samuel Berthe
8616b0241c
Merge pull request #130 from nabilbendafi/feature/traefik_rules
2020-10-11 18:10:06 +02:00
Samuel Berthe
e8572f618b
Merge pull request #133 from tux-00/master
2020-10-11 18:07:11 +02:00
Samuel Berthe
2f6b9832fa
Update rules.yml
2020-10-11 18:06:06 +02:00
Samuel Berthe
8af9ca4ba8
Merge pull request #134 from nanorobocop/fix-prometheus-job-missing-alert
Fix PrometheusJobMissing alert
2020-10-11 17:48:42 +02:00
Samuel Berthe
2e6e46da45
Merge branch 'master' into master
2020-10-11 17:42:51 +02:00
Samuel Berthe
c469d26c4d
Merge pull request #137 from Ozarklake/sql_server_rules
2020-10-11 17:37:40 +02:00
Samuel Berthe
bafcd1e922
Update rules.yml
2020-10-11 17:35:46 +02:00
Samuel Berthe
e60fc805f6
Merge pull request #138 from nirav-chotai/nchotai/fix-hpa-alerts
[PLEASE_MERGE] Fix HPA alerts
2020-10-11 17:24:13 +02:00
Samuel Berthe
45103f0a0d
Merge branch 'master' into master
2020-10-11 17:10:20 +02:00
Samuel Berthe
7a609adf18
adding comment to container OOM killer warning
2020-10-11 16:11:44 +02:00
Samuel Berthe
cf70272309
fix(container memory limit): filter by containers having max memory setting
2020-10-11 16:08:54 +02:00
Samuel Berthe
4128004475
Merge pull request #119 from fernandocarletti/patch-1
fix: container ContainerMemoryUsage alert
2020-10-11 16:06:33 +02:00
Samuel Berthe
f67162bf57
Merge pull request #148 from fsschmitt/fix/disk-latency-unit
Fix time unit on disk read/write latency rule
2020-10-11 15:49:15 +02:00
fsschmitt
4266b4d326
Fix time unit on disk read/write latency rule
2020-10-06 14:36:22 +01:00
fsschmitt
5288c9a2f5
Fix node_md_disks state from fail to failed
2020-10-06 13:33:50 +01:00
Daniel Andrzejewski
fc4797db9e
small fix
2020-09-17 15:19:14 +02:00
Daniel Andrzejewski
6c5f708179
node_disk_write_time_seconds_total is in seconds, not in milliseconds. node_disk_write_time_seconds_total should be greater than 0; otherwise you get a +Inf result.
2020-09-17 15:13:42 +02:00
Yashar Nesabian
d6b39a7f3f
More accurate alerts
added `mongodb instance down` alert and changed the `too many
connections` alert to fire when the connections are more than 80% of the
available connections.
removed `mongodb_replset_member_state` based alerts as I don't have
enough information on them
2020-08-09 10:35:39 +04:30
Yashar Nesabian
3ce1084f5b
Added percona mongodb alert rules
2020-08-03 10:45:32 +04:30
kaifen.xie
a04eef39c0
add istio
2020-07-25 23:24:36 +08:00
Nirav Chotai
8fb5da83de
Fix HPA alerts
- Fixing KubernetesHpaMetricAvailability
- Fixing KubernetesHpaScalingAbility
2020-07-24 13:32:44 +08:00
Ozarklake
88e812c78e
add sql server rules
2020-07-17 15:02:41 +08:00
Ozarklake
4e66d17d01
add sql server rules
2020-07-17 14:58:26 +08:00
Ozarklake
e009c5d8b5
Optimizing mysql slow query alert rules
2020-07-14 12:55:17 +08:00
Mansur Marvanov
05e521c0a8
Fix PrometheusJobMissing alert
2020-07-09 16:36:45 +09:00
tux
add6d9c2f3
Add official rabbitmq exporter rules
2020-06-30 15:48:42 +02:00
Nabil BENDAFI
b324c6f32f
feat(traefik): add rules for Traefik v2
Fixes #7
2020-06-23 13:40:01 +02:00
Mickaël Canévet
24f7095cd5
Fix HAProxy rules
2020-05-29 10:11:54 +02:00
Ilya Kisleyko
663b0e94da
check free space for all mountpoints
2020-05-20 20:04:32 +03:00
Anton Smolkov
bbbe14f2bd
Update rules.yml
WMI memory alert had the opposite meaning: it triggered on 90% free instead of 90% used
2020-05-19 11:07:11 +03:00
Fernando Carletti
e6de413146
fix: container ContainerMemoryUsage alert
2020-05-18 17:38:05 -05:00
Rob Brown
5050fd64d5
Correct "device" to "interface"
2020-05-14 16:57:19 +01:00
Samuel Berthe
da1e4f6301
💄 replacing "error" severity by "critical", repo wide
2020-05-14 17:20:19 +02:00
Rob Brown
5d3e812fd7
Add HostNetworkNot1GbSpeed rule
2020-05-14 15:00:24 +01:00
Samuel Berthe
7293bca720
Merge pull request #107 from robert-will-brown/NetworkTransmitErrors
2020-05-09 21:32:40 +02:00
Samuel Berthe
b081f28f5d
Merge pull request #112 from robert-will-brown/SpeedTestExporter
2020-05-09 21:31:33 +02:00
Samuel Berthe
660312d0ea
fix OOM killer threshold
2020-05-09 21:25:13 +02:00
Samuel Berthe
6d6b41e241
Merge pull request #108 from robert-will-brown/EdacMemoryErrors
2020-05-09 21:23:01 +02:00
Rob Brown
8faa295745
Add SpeedTest stanza
2020-05-09 10:20:55 +01:00
Rob Brown
ee4e046c66
Add "> 0" at the end of NetworkTransmitErrors queries
2020-05-09 10:18:21 +01:00
Samuel Berthe
d5f6388899
renaming some mysql alerts
2020-05-09 02:11:18 +02:00
Rob Brown
5d83e393cc
Add initial Speedtest Exporter rules
2020-05-08 15:25:54 +01:00
Rob Brown
8912db93bc
Fix "greater than" value
2020-05-04 19:04:52 +01:00
Rob Brown
4b22c078ea
Align EDAC errors with comments
2020-05-04 18:47:20 +01:00
Samuel Berthe
718cd2188c
shame on me
2020-05-04 00:10:43 +02:00
Samuel Berthe
eb8dc736a3
improve accuracy for context switching query
2020-05-04 00:05:33 +02:00
Samuel Berthe
790139211e
fix typo: postgresql replication lag
2020-05-03 23:23:21 +02:00
Samuel Berthe
648b83250a
improve accuracy "Kubernetes Pod not healthy" query
2020-05-03 18:01:25 +02:00
Ondrej Zalesky
d3d13946e6
fix "Kubernetes Pod not healthy" query
2020-04-30 22:53:25 +02:00
Rob Brown
981e82d649
Add HostEDACUncorrectableErrorsdetected and HostEDACCorrectableErrorsdetected rules
2020-04-30 13:27:30 +01:00
Rob Brown
f87e6d300d
Added spacing as per standard
2020-04-30 12:39:12 +01:00
Rob Brown
c57a5e6e36
Add HostNetworkReceiveErrors and HostNetworkTransmitErrors rules
2020-04-30 12:38:23 +01:00
Samuel Berthe
951d80121f
Merge branch 'master' of github.com:samber/awesome-prometheus-alerts
2020-04-06 09:13:29 +02:00
Samuel Berthe
e97023d2a4
linkerd2: adding first rule
2020-04-06 09:01:51 +02:00
Selçuk Arıbalı
c98a04784e
FIX KubernetesPodnothealthy Alert
kube-state-metrics sets the value of the current pod phase to 1, so the Kubernetes Pod not healthy alert is fixed accordingly.
2020-04-02 21:01:04 +03:00
Samuel Berthe
c20227b458
oops: adding one-to-one vector matching to mysql subqueries
2020-03-31 16:02:28 +02:00
Matthias Crauwels
79b5ad3b5d
removed avg grouping where possible
2020-03-31 11:42:05 +02:00
Matthias Crauwels
4860250360
added some extra MySQL checks
2020-03-30 11:24:58 +02:00
Samuel Berthe
d9286f6c39
doc: add instructions to rules yaml file
2020-03-28 15:12:21 +01:00
Samuel Berthe
2cda73aa3a
fix(kubernetes): min_over_time takes a time range as a parameter
2020-03-26 16:19:26 +01:00
Samuel Berthe
329583ac36
Fix typo and make pg and mysql similar
2020-03-25 16:44:49 +01:00
luhellma
5559e0140b
fix: double usage in query and alert configuration
2020-03-25 16:34:04 +01:00
luhellma
5d8f911d97
feat: Add new rules for MySQLd_exporter from prometheus
2020-03-25 11:57:29 +01:00
luhellma
a4fc086b9a
fix wrong number of equal signs in query
2020-03-20 15:22:20 +01:00
luhellma
3d41e2b3ca
Add rules for apache
2020-03-20 15:08:13 +01:00
Alexander Knipping
caaea2eeb7
Fix typo in DeadManSwitch alert
Rename it from snitch to switch.
2020-03-18 15:21:38 +01:00
Samuel Berthe
34e62cb327
nginx: adding latency metric
2020-03-17 22:26:46 +01:00
Samuel Berthe
07dde61116
elasticsearch: adding disk watermark alerts
2020-03-17 21:19:58 +01:00
Samuel Berthe
2ecdb636b2
oops
2020-03-17 21:08:09 +01:00
Samuel Berthe
c653b37e15
adding rules to prometheus self monitoring
2020-03-17 20:56:49 +01:00
Samuel Berthe
fc3e72041c
Merge branch 'master' of github.com:samber/awesome-prometheus-alerts
2020-03-17 19:05:57 +01:00
Samuel Berthe
5125c683c5
adding alerts for Ceph
2020-03-17 18:50:08 +01:00
Alexander Knipping
c82df5d005
Fix PrometheusRuleEvaluationSlow
Fixes the rule PrometheusRuleEvaluationSlow as it should fire if
prometheus_rule_group_last_duration_seconds takes longer than
prometheus_rule_group_interval_seconds.
prometheus_rule_group_last_duration_seconds: The duration of the last rule group evaluation.
prometheus_rule_group_interval_seconds: The interval of a rule group.
2020-03-17 15:14:40 +01:00
Samuel Berthe
5b457b0e52
adding github buttons to layout
2020-03-09 23:31:27 +01:00
Samuel Berthe
f554b72671
Add alert for kubernetes api latency
2020-03-09 21:55:17 +01:00
Samuel Berthe
0b89a764ee
Adding exporters: sidekiq, pgbouncer and thanos.
Adding rules to: prometheus, kubernetes, redis, docker and postgresql.
Arranging exporters into categories.
Showing number of rules.
Thanks to Gitlab for opensourcing alerting rules!
2020-03-09 21:18:56 +01:00
Samuel Berthe
affacde49b
adding prometheus internal alerts
2020-03-09 00:16:17 +01:00
Samuel Berthe
99e3e64252
Insert Commit Message Here
2020-03-08 22:21:30 +01:00
Samuel Berthe
77eccab0e9
some random changes on rules
2020-03-08 20:30:22 +01:00
Samuel Berthe
542adc3ca7
Adding minio rules
2020-03-08 18:55:53 +01:00
Samuel Berthe
b5469f2a59
Doc: organizing sections
2020-03-08 17:39:49 +01:00
Samuel Berthe
5bace11107
data: ensure alert name prefix
2020-03-08 17:24:39 +01:00
Samuel Berthe
953878df03
HAProxy 1.*: adding rules
2020-03-08 17:17:06 +01:00
Samuel Berthe
7dbbbb0e09
Doc: organizing lb and reverse proxy
2020-03-08 16:10:33 +01:00
Samuel Berthe
718a039313
Adding an alert for prometheus internals: rule evaluation slowing down
2020-03-08 15:08:11 +01:00
Samuel Berthe
072a435f32
Fixing @jpds queries ;) 🚀
2020-03-08 14:41:36 +01:00
Samuel Berthe
f620fe31ee
Merge pull request #36 from jpds/prom-errors
_data/rules.yml: Added Prometheus error alerts.
2020-03-08 14:29:18 +01:00
Samuel Berthe
6ba051d747
doc: adding a comment to PostgresqlReplicationLag alert
2020-03-07 19:30:58 +01:00
Samuel Berthe
05a2c9604b
Renaming some alert categories
2020-03-07 19:06:54 +01:00
Samuel Berthe
6edcdc75af
my brain is out for vacation, please forgive me
2020-03-07 18:57:09 +01:00
Samuel Berthe
b97ece8c69
Adding alerts for criteo/cassandra_exporter
2020-03-07 18:51:34 +01:00
Samuel Berthe
cde4e243ae
no quotes no cry
2020-03-07 17:59:42 +01:00
Samuel Berthe
0add8466c6
Merge pull request #82 from samber/feat-nodeexporter-raid
Added RAID alerts (node-exporter)
2020-03-07 17:51:39 +01:00
Samuel Berthe
ab477bb21e
Added RAID alerts
2020-03-07 17:50:41 +01:00
Danilo Magalhães
5bd2e03c51
Update rules.yml
Group by instance and name instead of only instance.
Change from container_spec_memory_limit_bytes to correct max memory metric container_spec_memory_limit_bytes.
2020-02-27 11:08:09 +00:00
Samuel Berthe
a9c9629cb5
oops
2020-01-25 00:16:49 +01:00
Samuel Berthe
134264026a
Does not alert on tmpfs volume filling-up. Closing #77
2020-01-25 00:13:01 +01:00
iamdenchik
29b66f9b3e
fix check free disk space
2020-01-15 12:40:19 +05:00
Mateusz Legięcki
a72feb4ff6
Fix Etcd rule: Insufficient Members
2020-01-03 12:58:25 +01:00
Mahesh Paolini-Subramanya
88b55f1dee
Replace 'ip' by 'instance' in some rules
The metrics return 'instance', not 'ip'
This PR fixes the rules to use 'instance'
2019-12-27 09:18:16 -05:00
Rob Brown
ce51db2a6f
Added Prometheus Not connected to alertmanager alert
2019-12-18 15:38:23 +00:00
Rob Brown
97ecdab26c
Added "Disk will fill in 4 hours" alert
2019-12-18 15:32:52 +00:00
Rob Brown
58f843dbc6
Added hardware temperature alerts
2019-12-12 17:29:23 +00:00
Josef Kříž
d10e30aed0
Fixed rabbitmq cluster down rule
2019-12-02 13:12:02 +01:00
Maxime Brunet
1e2a35e058
elasticsearch: Alert for no new docs on data nodes only
We can have nodes that are not masters but do not hold any data, for example the client/coordinating nodes set up by the `stable/elasticsearch` helm chart:
https://github.com/helm/charts/tree/master/stable/elasticsearch#client-and-coordinating-nodes
And we can also have nodes being data and master nodes simultaneously.
So I think this alert has to look for `es_data_node="true"` to be correct.
2019-11-06 15:23:26 -05:00
Samuel Berthe
9306d8947f
PG: Alert in case of high rollback ratio ( #64 )
PG: Alert in case of high rollback ratio
2019-10-31 12:02:03 +01:00
Samuel Berthe
0c9a24a4e7
feat(pg): alert in case of high rollback ratio
2019-10-31 12:00:53 +01:00
Samuel Berthe
cca2872ade
typo
2019-10-31 11:47:57 +01:00
Samuel Berthe
768fac56ae
Merge pull request #62 from jdorel/patch-1
SllCertificateExpired syntax
2019-10-29 12:15:15 +01:00
Samuel Berthe
20744c3d3d
Update rules.yml
2019-10-29 12:12:43 +01:00
Jonas DOREL
80aebe84e9
Add Kubernetes alerts from kube-state-metric exporter
2019-10-29 11:59:14 +01:00
Jonas DOREL
267a064d26
SllCertificateExpired syntax
Match other alert names, without the `has` part.
2019-10-29 11:39:01 +01:00
Samuel Berthe
82cf3ac1ef
adding cassandra
2019-10-26 17:48:22 +02:00
Samuel Berthe
4f9e88bad4
improving blackbox alerts
2019-10-26 17:43:18 +02:00
Samuel Berthe
dfa5446cd5
adding comments in data structure
2019-10-26 17:25:35 +02:00
Samuel Berthe
8f6c85774a
Clean data file
2019-09-25 16:36:10 +02:00
olivier beyler
e3628c5ba8
Add OpenEBS and Minio alert
Signed-off-by: olivier beyler <olivier.beyler@orange.com>
2019-09-25 16:13:44 +02:00
Samuel Berthe
1f4a1f8052
Updating Traefik -> Traefik v1.*
2019-09-25 14:23:16 +02:00
Andrey Dudin
6d9866cefb
Fix typo in query of PG DeadLocks
2019-09-25 02:42:44 +03:00
Samuel Berthe
f7f94ed81e
Fixed time interval (10min->10m)
2019-09-13 18:08:04 +02:00
timfeirg
37ef9a6f5c
free memory should include node_memory_Slab_bytes
2019-09-03 15:47:17 +08:00
Samuel Berthe
51e7231b3d
fix(blackbox exporter): alert when http >= 400 instead of 300
2019-08-29 19:03:54 +02:00
Jonas Kongslund
9bd8b3698f
Add CollectorError alert for WMI exporter
2019-08-22 13:52:15 +04:00
louis
e9f247783b
add alerts for traefik
2019-08-08 14:32:47 +02:00
Jonas Kongslund
d789cc314c
Add ProbeFailed alert for the Blackbox exporter
2019-07-25 13:01:47 +04:00
Dam Viet
e2c731229b
fix rule Container Volume usage
2019-07-17 16:59:56 +07:00
Dam Viet
6d6d6ac6a7
update
2019-07-15 15:13:23 +07:00
Dam Viet
db26f248f8
fix rule Container Volume usage
2019-07-15 14:56:52 +07:00
Dam Viet
4b7ecc82e2
suggest fix Container Memory usage
2019-07-15 14:54:13 +07:00
Samuel Berthe
a9019cb063
🤘 🎸
2019-07-14 20:00:55 +02:00
Samuel Berthe
3cdc7d625a
_data/rules.yml: Added CoreDNS panic alert. ( #35 )
_data/rules.yml: Added CoreDNS panic alert.
2019-07-14 18:06:21 +02:00
Samuel Berthe
089ab714c0
Update rules.yml
2019-07-14 18:06:08 +02:00
Samuel Berthe
e189294c94
_data/rules.yml: Added Kubernetes volume alert rule. ( #32 )
_data/rules.yml: Added Kubernetes volume alert rule.
2019-07-14 17:59:49 +02:00
Samuel Berthe
78dc1ba144
Update rules.yml
2019-07-14 17:59:39 +02:00
Samuel Berthe
3d6e520ac1
fix(node-exporter): better cpu load query
2019-07-14 17:51:21 +02:00
Samuel Berthe
ca22d8d3d9
Fixed windows disk usage computation
2019-07-14 17:31:52 +02:00
anon
70211339af
more alerts and removed IIS Process from wmi_service_status
2019-07-14 08:46:00 +02:00
anon
f033e06045
Name feedback from samber
2019-07-12 08:57:10 +02:00
anon
3b6235ccb3
add wmi_exporter example
2019-07-09 11:56:41 +02:00
Jonathan Davies
ddc19224be
_data/rules.yml: Added AlertManager config reload rule.
2019-06-25 16:06:55 +01:00
Jonathan Davies
2574946609
_data/rules.yml: Use humanize instead of % printf.
2019-06-25 15:54:47 +01:00
Jonathan Davies
c7ca57f57f
_data/rules.yml: Added volume full in four days alert rule.
2019-06-25 14:45:17 +01:00
Jonathan Davies
f7e8d60800
_data/rules.yml: Added Prometheus error alerts.
2019-06-25 13:08:32 +01:00
Jonathan Davies
37109f8ccd
_data/rules.yml: Added CoreDNS panic alert.
2019-06-24 22:25:40 +01:00
Jonathan Davies
3ccf6ae3d0
_data/rules.yml: Added Kubernetes volume alert rule.
2019-06-24 16:09:02 +01:00
Jonathan Davies
49d93c6f4f
_data/rules.yml: Added Prometheus configuration reload alert rule.
2019-06-24 14:31:09 +01:00
anon
bb5dba262f
correct wrong AND to OR
2019-06-17 14:25:43 +02:00
Jonas DOREL
e685a7ddef
Add systemd failed services alerts
2019-06-06 15:44:56 +02:00
Samuel Berthe
ab6612b94f
Improves Juniper rules
2019-05-21 11:59:08 +02:00
Samuel Berthe
e17edc9e99
Merge branch 'master' of github.com:samber/awesome-prometheus-alerts
2019-05-21 11:52:40 +02:00
AngelFreak
51d0357e15
Changed from 09 to 10 for 10GBit, and fix severity duplicate
2019-05-20 09:37:01 +02:00
AngelFreak
0a2a4e2aaf
Remove redundant example, and changed notation for easier reading
2019-05-16 10:32:18 +02:00
AngelFreak
5e40343cbc
Add Juniper rules
2019-05-15 15:20:48 +02:00
Samuel Berthe
14c34eaf1a
Merge pull request #24 from mxssl/master
Add blackbox rules
2019-03-11 09:22:36 +01:00
mxssl
8de107aeee
Add blackbox rules
2019-03-03 00:05:47 +03:00
Samuel Berthe
78f26c73b0
Merge branch 'master' of github.com:samber/awesome-prometheus-alerts
2019-02-20 13:28:15 +01:00
Samuel Berthe
273fd6b9e3
Adding Etcd metrics
2019-02-20 13:28:12 +01:00
Sofrony Pavel
63cf6bd5da
explicit for names
2019-02-16 08:18:38 +03:00
Sofrony Pavel
e26d73d615
consul alerts
2019-02-15 16:37:06 +03:00
Sofrony Pavel
8136b239be
add _bytes && _total for metrics
2019-02-14 22:52:41 +03:00
Sofrony Pavel
ff7ef5f6bd
node has swap alert
2019-02-14 22:37:22 +03:00
Sofrony Pavel
51eedcf616
fix memory metric name
2019-02-14 22:37:04 +03:00
Sofrony Pavel
d889a9594f
LA (2 tasks per core)
2019-02-14 22:36:35 +03:00
Samuel Berthe
df6432f61e
Update rules.yml
2019-02-11 21:26:41 +01:00
Samuel Berthe
61d889767e
Update rules.yml
2019-02-11 21:26:00 +01:00
Sofrony Pavel
0999af4aa8
consistent naming for severity
2019-02-11 16:58:15 +03:00
Sofrony Pavel
eab8b1a86d
Elasticsearch Heap Usage warning (>80%)
2019-02-11 16:50:26 +03:00
Sofrony Pavel
52ce326823
Elasticsearch alert rules
2019-02-11 15:46:46 +03:00
Marcela Sena
3aa92fbc9a
Fixing in-sync replica condition
If the in-sync replicas minimum set by topic is less than 3, an alert is needed.
2018-11-01 11:21:56 -07:00
Samuel BERTHE
23e5627567
Merge branch 'master' into kafka-insync
2018-10-31 22:09:19 +01:00
MarceStarlet
81409cd1c2
Adding in-sync replica by topic metric rule
2018-10-31 13:16:29 -07:00
Carol
7899b35aaf
kafka - metric consumer group
2018-10-31 12:40:08 -03:00
Samuel Berthe
0bc4a1633c
Jekyll based doc
2018-10-22 00:53:32 +02:00