Commit graph

398 commits

Author SHA1 Message Date
Samuel Berthe
fa4325218f
Merge branch 'master' of github.com:samber/awesome-prometheus-alerts 2020-12-30 17:46:58 +01:00
Samuel Berthe
ed62bdc567
alerts node_exporter: improve network and disk rules 2020-12-30 17:45:30 +01:00
Tosin Ogunrinde
0add93363f Fix JVM "JVM memory filling up" alert 2020-12-30 00:30:08 +00:00
Samuel Berthe
f686698f68
Merge pull request #166 from cityofships/fix_es
Fix Elasticsearch "No new documents" alert
2020-12-28 16:50:47 +01:00
Samuel Berthe
965fefab89
fix alert description 2020-12-28 16:40:11 +01:00
Carl Düvel
a7c5155002
Add cpu steal alert 2020-12-21 19:06:45 +01:00
Piotr Parczewski
f7d08e364b
Fix Elasticsearch "No new documents" alert.
Prometheus rate() function calculates the per-second average rate
of increase. This means the alert gets triggered whenever during
last 10 minutes there were less than 1 document ingested *per second*
(60 documents per minute).

Signed-off-by: Piotr Parczewski <piotr@stackhpc.com>
2020-12-17 15:00:01 +01:00
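
The arithmetic behind this commit message can be sketched in Python. This is a minimal illustration under stated assumptions, not the rule from the repository: the helper names and sample counter values are hypothetical, and the sketch ignores the extrapolation Prometheus applies at window boundaries.

```python
def per_second_rate(first_value: float, last_value: float, window_seconds: float) -> float:
    """Average per-second increase of a counter over a window (simplified rate())."""
    return (last_value - first_value) / window_seconds


def total_increase(first_value: float, last_value: float) -> float:
    """Total increase of a counter over a window (simplified increase())."""
    return last_value - first_value


WINDOW = 10 * 60  # a 10-minute window, in seconds

# Hypothetical index that ingested 300 documents during the window.
rate = per_second_rate(1000, 1300, WINDOW)
increase = total_increase(1000, 1300)

print(rate)          # 0.5 documents per second
print(rate < 1)      # True: a "rate < 1" condition fires although documents arrived
print(increase < 1)  # False: an "increase < 1" condition fires only when nothing arrived
```

Comparing the total increase over the window against 1 is one way to express "no new documents"; whether the fixed rule uses exactly this form is not stated in the commit message.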
Per Lundberg
f673fe72c3
Update rules.yml
Fixes a bug in the previous commit. `or` has lower precedence than `<` in PromQL, hence the need for grouping with parentheses.
2020-11-27 11:08:46 +02:00
Per Lundberg
00dd58eace
Fix Redis missing master query
The previous approach fails because of the "missing data" semantics in Prometheus. If the Redis server is down, PromQL will typically return "no data" instead of 0 for a `count()`; this is by design in Prometheus.

This suggestion from @slovdahl works around this by returning a vector with a single `0` entry in this case, making the query work as intended.
2020-11-25 16:06:05 +02:00
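
The "missing data" behaviour described in this commit message can be modelled with a small Python sketch. It is a simplified model, not Prometheus itself: an instant vector is represented as a dict from a label string to a value, all function names are hypothetical, and `or_vector0` only covers the case where the left-hand side has an empty label set (as the output of an ungrouped `count()` does).

```python
from typing import Dict

Vector = Dict[str, float]  # simplified instant vector: label set -> value


def count(v: Vector) -> Vector:
    """Like PromQL count(): an empty input yields an empty result, not 0."""
    return {"": float(len(v))} if v else {}


def or_vector0(v: Vector) -> Vector:
    """Like `v or vector(0)`: fall back to a single 0 sample when v is empty."""
    return v if v else {"": 0.0}


def less_than(v: Vector, threshold: float) -> Vector:
    """Like `v < threshold`: keep only the samples below the threshold."""
    return {labels: x for labels, x in v.items() if x < threshold}


no_masters: Vector = {}                 # the Redis server is down: no series at all
one_master: Vector = {"instance=a": 1}  # hypothetical healthy master

# Without the fallback, the comparison has nothing to compare and returns nothing.
print(less_than(count(no_masters), 1))              # {}
# With the fallback, a single 0 sample survives the comparison and the alert fires.
print(less_than(or_vector0(count(no_masters)), 1))  # {'': 0.0}
# With a healthy master, the count is 1 and the alert stays silent.
print(less_than(or_vector0(count(one_master)), 1))  # {}
```

The fallback must be applied before the comparison, which is why the grouping matters in the query.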
Samuel Berthe
2186841f29
Merge pull request #140 from yasharne/percona_mongodb 2020-11-15 18:12:20 +01:00
Vincent Fiset
6ed4358452 remove replset_oplog based alerts 2020-11-09 11:14:01 -05:00
Samuel Berthe
3ccfaa47ea
remove useless brackets 2020-11-07 18:08:02 +01:00
Samuel Berthe
9f144acb30
haproxy: fix description of request errors 2020-11-07 18:07:20 +01:00
Samuel Berthe
be20363602
rate is better than irate for alerting 2020-11-07 17:46:18 +01:00
Liudmyla Derkach
e6113ff2db feat: adding few useful rabbitmq alerts 2020-10-30 19:10:52 +02:00
Yashar Nesabian
2a2ecf8a8c change alert rules which were using avg to show more accurate value based on the replica set 2020-10-24 22:03:42 +03:30
Felix Breidenstein
1b6cd55200 Adapt rules for windows to new exporter 2020-10-20 14:52:36 +02:00
Nabil BENDAFI
e024c542ed feat(kubernetes): add Out of capacity 2020-10-16 12:15:56 +02:00
Samuel Berthe
ead7db708e
alert on containers CPU: add a comment to exclude cAdvisor 2020-10-11 21:38:48 +02:00
Samuel Berthe
50b4c499fa
rules: adding a few cassandra alerts 2020-10-11 19:55:18 +02:00
Samuel Berthe
0cf82fd3e7
Merge branch 'master' into NetworkSpeed 2020-10-11 19:39:59 +02:00
Samuel Berthe
06205cd91c
Update rules.yml 2020-10-11 19:39:17 +02:00
Samuel Berthe
89252f999f
Merge branch 'master' into master 2020-10-11 19:26:04 +02:00
Samuel Berthe
66e6581b07
Merge pull request #121 from osterik/master
check free space for all mountpoints
2020-10-11 19:22:27 +02:00
Samuel Berthe
ea7e6d6aa9
Merge pull request #125 from mcanevet/patch-1
Fix HAProxy rules
2020-10-11 18:21:41 +02:00
Samuel Berthe
8616b0241c
Merge pull request #130 from nabilbendafi/feature/traefik_rules 2020-10-11 18:10:06 +02:00
Samuel Berthe
e8572f618b
Merge pull request #133 from tux-00/master 2020-10-11 18:07:11 +02:00
Samuel Berthe
2f6b9832fa
Update rules.yml 2020-10-11 18:06:06 +02:00
Samuel Berthe
8af9ca4ba8
Merge pull request #134 from nanorobocop/fix-prometheus-job-missing-alert
Fix PrometheusJobMissing alert
2020-10-11 17:48:42 +02:00
Samuel Berthe
2e6e46da45
Merge branch 'master' into master 2020-10-11 17:42:51 +02:00
Samuel Berthe
c469d26c4d
Merge pull request #137 from Ozarklake/sql_server_rules 2020-10-11 17:37:40 +02:00
Samuel Berthe
bafcd1e922
Update rules.yml 2020-10-11 17:35:46 +02:00
Samuel Berthe
e60fc805f6
Merge pull request #138 from nirav-chotai/nchotai/fix-hpa-alerts
[PLEASE_MERGE] Fix HPA alerts
2020-10-11 17:24:13 +02:00
Samuel Berthe
45103f0a0d
Merge branch 'master' into master 2020-10-11 17:10:20 +02:00
Samuel Berthe
7a609adf18
adding comment to container OOM killer warning 2020-10-11 16:11:44 +02:00
Samuel Berthe
cf70272309
fix(container memory limit): filter by containers having max memory setting 2020-10-11 16:08:54 +02:00
Samuel Berthe
4128004475
Merge pull request #119 from fernandocarletti/patch-1
fix: container ContainerMemoryUsage alert
2020-10-11 16:06:33 +02:00
Samuel Berthe
f67162bf57
Merge pull request #148 from fsschmitt/fix/disk-latency-unit
Fix time unit on disk read/write latency rule
2020-10-11 15:49:15 +02:00
fsschmitt
4266b4d326 Fix time unit on disk read/write latency rule 2020-10-06 14:36:22 +01:00
fsschmitt
5288c9a2f5 Fix node_md_disks state from fail to failed 2020-10-06 13:33:50 +01:00
Daniel Andrzejewski
fc4797db9e small fix 2020-09-17 15:19:14 +02:00
Daniel Andrzejewski
6c5f708179 node_disk_write_time_seconds_total is in seconds, not in milliseconds. node_disk_write_time_seconds_total should be greater than 0, otherwise you get a +Inf result. 2020-09-17 15:13:42 +02:00
Yashar Nesabian
d6b39a7f3f More accurate alerts
added `mongodb instance down` alert and changed the `too many
connections` alert to fire when the connections are more than 80% of the
available connections.
removed `mongodb_replset_member_state` based alerts as I don't have
enough information on them
2020-08-09 10:35:39 +04:30
Yashar Nesabian
3ce1084f5b Added percona mongodb alert rules 2020-08-03 10:45:32 +04:30
kaifen.xie
a04eef39c0 add istio 2020-07-25 23:24:36 +08:00
Nirav Chotai
8fb5da83de
Fix HPA alerts
- Fixing KubernetesHpaMetricAvailability
- Fixing KubernetesHpaScalingAbility
2020-07-24 13:32:44 +08:00
Ozarklake
88e812c78e add sql server rules 2020-07-17 15:02:41 +08:00
Ozarklake
4e66d17d01 add sql server rules 2020-07-17 14:58:26 +08:00
Ozarklake
e009c5d8b5
Optimizing mysql slow query alert rules 2020-07-14 12:55:17 +08:00
Mansur Marvanov
05e521c0a8 Fix PrometheusJobMissing alert 2020-07-09 16:36:45 +09:00
tux
add6d9c2f3 Add official rabbitmq exporter rules 2020-06-30 15:48:42 +02:00
Nabil BENDAFI
b324c6f32f feat(traefik): add rules for Traefik v2
Fixes #7
2020-06-23 13:40:01 +02:00
Mickaël Canévet
24f7095cd5
Fix HAProxy rules 2020-05-29 10:11:54 +02:00
Ilya Kisleyko
663b0e94da
check free space for all mountpoints 2020-05-20 20:04:32 +03:00
Anton Smolkov
bbbe14f2bd
Update rules.yml
WMI memory alert had opposite meaning, triggered on 90% free instead of 90% used
2020-05-19 11:07:11 +03:00
Fernando Carletti
e6de413146
fix: container ContainerMemoryUsage alert 2020-05-18 17:38:05 -05:00
Rob Brown
5050fd64d5 Correct "device" to "interface" 2020-05-14 16:57:19 +01:00
Samuel Berthe
da1e4f6301
💄 replacing "error" severity by "critical", repo wide 2020-05-14 17:20:19 +02:00
Rob Brown
5d3e812fd7 Add HostNetworkNot1GbSpeed rule 2020-05-14 15:00:24 +01:00
Samuel Berthe
7293bca720
Merge pull request #107 from robert-will-brown/NetworkTransmitErrors 2020-05-09 21:32:40 +02:00
Samuel Berthe
b081f28f5d
Merge pull request #112 from robert-will-brown/SpeedTestExporter 2020-05-09 21:31:33 +02:00
Samuel Berthe
660312d0ea
fix OOM killer threshold 2020-05-09 21:25:13 +02:00
Samuel Berthe
6d6b41e241
Merge pull request #108 from robert-will-brown/EdacMemoryErrors 2020-05-09 21:23:01 +02:00
Rob Brown
8faa295745 Add SpeedTest stanza 2020-05-09 10:20:55 +01:00
Rob Brown
ee4e046c66 Add "> 0" at the end of NetworkTransmitErrors queries 2020-05-09 10:18:21 +01:00
Samuel Berthe
d5f6388899
renaming some mysql alerts 2020-05-09 02:11:18 +02:00
Rob Brown
5d83e393cc Add initial Speedtest Exporter rules 2020-05-08 15:25:54 +01:00
Rob Brown
8912db93bc Fix "greater than" value 2020-05-04 19:04:52 +01:00
Rob Brown
4b22c078ea Align EDAC errors with comments 2020-05-04 18:47:20 +01:00
Samuel Berthe
718cd2188c
shame on me 2020-05-04 00:10:43 +02:00
Samuel Berthe
eb8dc736a3
improve accuracy for context switching query 2020-05-04 00:05:33 +02:00
Samuel Berthe
790139211e
fix typo: postgresql replication lag 2020-05-03 23:23:21 +02:00
Samuel Berthe
648b83250a
improve accuracy "Kubernetes Pod not healthy" query 2020-05-03 18:01:25 +02:00
Ondrej Zalesky
d3d13946e6 fix "Kubernetes Pod not healthy" query 2020-04-30 22:53:25 +02:00
Rob Brown
981e82d649 Add HostEDACUncorrectableErrorsdetected and HostEDACCorrectableErrorsdetected rules 2020-04-30 13:27:30 +01:00
Rob Brown
f87e6d300d Added spacing as per standard 2020-04-30 12:39:12 +01:00
Rob Brown
c57a5e6e36 Add HostNetworkReceiveErrors and HostNetworkTransmitErrors rules 2020-04-30 12:38:23 +01:00
Samuel Berthe
951d80121f
Merge branch 'master' of github.com:samber/awesome-prometheus-alerts 2020-04-06 09:13:29 +02:00
Samuel Berthe
e97023d2a4
linkerd2: adding first rule 2020-04-06 09:01:51 +02:00
Selçuk Arıbalı
c98a04784e
FIX KubernetesPodnothealthy Alert
kube-state-metrics sets the series for a pod's current phase to 1, so the "Kubernetes Pod not healthy" query was fixed accordingly.
2020-04-02 21:01:04 +03:00
Samuel Berthe
c20227b458
oops: adding one-to-one vector matching to mysql subqueries 2020-03-31 16:02:28 +02:00
Matthias Crauwels
79b5ad3b5d
removed avg grouping where possible 2020-03-31 11:42:05 +02:00
Matthias Crauwels
4860250360
added some extra MySQL checks 2020-03-30 11:24:58 +02:00
Samuel Berthe
d9286f6c39
doc: add instructions to rules yaml file 2020-03-28 15:12:21 +01:00
Samuel Berthe
2cda73aa3a
fix(kubernetes): min_over_time takes a time range as parameter 2020-03-26 16:19:26 +01:00
Samuel Berthe
329583ac36
Fix typo and make pg and mysql similar 2020-03-25 16:44:49 +01:00
luhellma
5559e0140b fix: double usage in query and alert configuration 2020-03-25 16:34:04 +01:00
luhellma
5d8f911d97 feat: Add new rules for MySQLd_exporter from prometheus 2020-03-25 11:57:29 +01:00
luhellma
a4fc086b9a fix wrong number of equal sign in query 2020-03-20 15:22:20 +01:00
luhellma
3d41e2b3ca Add rules for apache 2020-03-20 15:08:13 +01:00
Alexander Knipping
caaea2eeb7 Fix typo in DeadManSwitch alert
Rename it from snitch into switch.
2020-03-18 15:21:38 +01:00
Samuel Berthe
34e62cb327
nginx: adding latency metric 2020-03-17 22:26:46 +01:00
Samuel Berthe
07dde61116
elasticsearch: adding disk watermark alerts 2020-03-17 21:19:58 +01:00
Samuel Berthe
2ecdb636b2
oops 2020-03-17 21:08:09 +01:00
Samuel Berthe
c653b37e15
adding rules to prometheus self monitoring 2020-03-17 20:56:49 +01:00
Samuel Berthe
fc3e72041c
Merge branch 'master' of github.com:samber/awesome-prometheus-alerts 2020-03-17 19:05:57 +01:00
Samuel Berthe
5125c683c5
adding alerts for Ceph 2020-03-17 18:50:08 +01:00
Alexander Knipping
c82df5d005 Fix PrometheusRuleEvaluationSlow
Fixes the rule PrometheusRuleEvaluationSlow as it should fire if
prometheus_rule_group_last_duration_seconds takes longer than
prometheus_rule_group_interval_seconds.

prometheus_rule_group_last_duration_seconds: The duration of the last rule group evaluation.
prometheus_rule_group_interval_seconds: The interval of a rule group.
2020-03-17 15:14:40 +01:00
Samuel Berthe
5b457b0e52
adding github buttons to layout 2020-03-09 23:31:27 +01:00
Samuel Berthe
f554b72671
Add alert for kubernetes api latency 2020-03-09 21:55:17 +01:00
Samuel Berthe
0b89a764ee
Adding exporters: sidekiq, pgbouncer and thanos.
Adding rules to: prometheus, kubernetes, redis, docker and postgresql.
Arranging exporters into categories.
Showing number of rules.
Thanks to Gitlab for opensourcing alerting rules!
2020-03-09 21:18:56 +01:00
Samuel Berthe
affacde49b
adding prometheus internal alerts 2020-03-09 00:16:17 +01:00
Samuel Berthe
99e3e64252
Insert Commit Message Here 2020-03-08 22:21:30 +01:00
Samuel Berthe
77eccab0e9
some random changes on rules 2020-03-08 20:30:22 +01:00
Samuel Berthe
542adc3ca7
Adding minio rules 2020-03-08 18:55:53 +01:00
Samuel Berthe
b5469f2a59
Doc: organizing sections 2020-03-08 17:39:49 +01:00
Samuel Berthe
5bace11107
data: ensure alert name prefix 2020-03-08 17:24:39 +01:00
Samuel Berthe
953878df03
HAProxy 1.*: adding rules 2020-03-08 17:17:06 +01:00
Samuel Berthe
7dbbbb0e09
Doc: organizing lb and reverse proxy 2020-03-08 16:10:33 +01:00
Samuel Berthe
718a039313
Adding an alert for prometheus internals: rule evaluation slowing down 2020-03-08 15:08:11 +01:00
Samuel Berthe
072a435f32
Fixing @jpds queries ;) 🚀 2020-03-08 14:41:36 +01:00
Samuel Berthe
f620fe31ee
Merge pull request #36 from jpds/prom-errors
_data/rules.yml: Added Prometheus error alerts.
2020-03-08 14:29:18 +01:00
Samuel Berthe
6ba051d747
doc: adding a comment to PostgresqlReplicationLag alert 2020-03-07 19:30:58 +01:00
Samuel Berthe
05a2c9604b
Renaming some alert categories 2020-03-07 19:06:54 +01:00
Samuel Berthe
6edcdc75af
my brain is out for vacation, please forgive me 2020-03-07 18:57:09 +01:00
Samuel Berthe
b97ece8c69
Adding alerts for criteo/cassandra_exporter 2020-03-07 18:51:34 +01:00
Samuel Berthe
cde4e243ae
no quotes no cry 2020-03-07 17:59:42 +01:00
Samuel Berthe
0add8466c6
Merge pull request #82 from samber/feat-nodeexporter-raid
Added RAID alerts (node-exporter)
2020-03-07 17:51:39 +01:00
Samuel Berthe
ab477bb21e
Added RAID alerts 2020-03-07 17:50:41 +01:00
Danilo Magalhães
5bd2e03c51
Update rules.yml
Group by instance and name instead of only instance.  
Change from container_spec_memory_limit_bytes to correct max memory metric container_spec_memory_limit_bytes.
2020-02-27 11:08:09 +00:00
Samuel Berthe
a9c9629cb5 oops 2020-01-25 00:16:49 +01:00
Samuel Berthe
134264026a
Does not alert on tmpfs volume filling-up. Closing #77 2020-01-25 00:13:01 +01:00
iamdenchik
29b66f9b3e fix check free disk space 2020-01-15 12:40:19 +05:00
Mateusz Legięcki
a72feb4ff6
Fix Etcd rule: Insufficient Members 2020-01-03 12:58:25 +01:00
Mahesh Paolini-Subramanya
88b55f1dee Replace 'ip' by 'instance' in some rules
The metrics return 'instance', not 'ip'
This PR fixes the rules to use 'instance'
2019-12-27 09:18:16 -05:00
Rob Brown
ce51db2a6f Added Prometheus Not connected to alertmanager alert 2019-12-18 15:38:23 +00:00
Rob Brown
97ecdab26c Added "Disk will fill in 4 hours" alert 2019-12-18 15:32:52 +00:00
Rob Brown
58f843dbc6 Added hardware temperature alerts 2019-12-12 17:29:23 +00:00
Josef Kříž
d10e30aed0
Fixed rabbitmq cluster down rule 2019-12-02 13:12:02 +01:00
Maxime Brunet
1e2a35e058
elasticsearch: Alert for no new docs on data nodes only
We can have nodes that are not masters, but do not hold any data. For example the client/coordinating nodes set up by the `stable/elasticsearch` helm chart:
https://github.com/helm/charts/tree/master/stable/elasticsearch#client-and-coordinating-nodes

And we can also have nodes being data and master nodes simultaneously.
So I think, this alert has to look for `es_data_node="true"` to be correct.
2019-11-06 15:23:26 -05:00
Samuel Berthe
9306d8947f
PG: Alert in case of high rollback ratio (#64)
PG: Alert in case of high rollback ratio
2019-10-31 12:02:03 +01:00
Samuel Berthe
0c9a24a4e7 feat(pg): alert in case of high rollback ratio 2019-10-31 12:00:53 +01:00
Samuel Berthe
cca2872ade
typo 2019-10-31 11:47:57 +01:00
Samuel Berthe
768fac56ae
Merge pull request #62 from jdorel/patch-1
SllCertificateExpired syntax
2019-10-29 12:15:15 +01:00
Samuel Berthe
20744c3d3d
Update rules.yml 2019-10-29 12:12:43 +01:00
Jonas DOREL
80aebe84e9 Add Kubernetes alerts from kube-state-metric exporter 2019-10-29 11:59:14 +01:00
Jonas DOREL
267a064d26
SllCertificateExpired syntax
Match other alert names, without the `has` part.
2019-10-29 11:39:01 +01:00
Samuel Berthe
82cf3ac1ef adding cassandra 2019-10-26 17:48:22 +02:00
Samuel Berthe
4f9e88bad4 improving blackbox alerts 2019-10-26 17:43:18 +02:00
Samuel Berthe
dfa5446cd5 adding comments in data structure 2019-10-26 17:25:35 +02:00
Samuel Berthe
8f6c85774a
Clean data file 2019-09-25 16:36:10 +02:00
olivier beyler
e3628c5ba8 Add OpenEBS and Minio alert
Signed-off-by: olivier beyler <olivier.beyler@orange.com>
2019-09-25 16:13:44 +02:00
Samuel Berthe
1f4a1f8052
Updating Traefik -> Traefik v1.* 2019-09-25 14:23:16 +02:00
Andrey Dudin
6d9866cefb
Fix typo in query of PG DeadLocks 2019-09-25 02:42:44 +03:00
Samuel Berthe
f7f94ed81e
Fixed time interval (10min->10m) 2019-09-13 18:08:04 +02:00
timfeirg
37ef9a6f5c
free memory should include node_memory_Slab_bytes 2019-09-03 15:47:17 +08:00
Samuel Berthe
51e7231b3d fix(blackbox exporter): alert when http >= 400 instead of 300 2019-08-29 19:03:54 +02:00
Jonas Kongslund
9bd8b3698f Add CollectorError alert for WMI exporter 2019-08-22 13:52:15 +04:00
louis
e9f247783b add alerts for traefik 2019-08-08 14:32:47 +02:00
Jonas Kongslund
d789cc314c Add ProbeFailed alert for the Blackbox exporter 2019-07-25 13:01:47 +04:00
Dam Viet
e2c731229b
fix rule Container Volume usage 2019-07-17 16:59:56 +07:00
Dam Viet
6d6d6ac6a7 update 2019-07-15 15:13:23 +07:00
Dam Viet
db26f248f8 fix rule Container Volume usage 2019-07-15 14:56:52 +07:00
Dam Viet
4b7ecc82e2 suggest fix Container Memory usage 2019-07-15 14:54:13 +07:00
Samuel Berthe
a9019cb063
🤘 🎸 2019-07-14 20:00:55 +02:00
Samuel Berthe
3cdc7d625a
_data/rules.yml: Added CoreDNS panic alert. (#35)
_data/rules.yml: Added CoreDNS panic alert.
2019-07-14 18:06:21 +02:00
Samuel Berthe
089ab714c0
Update rules.yml 2019-07-14 18:06:08 +02:00
Samuel Berthe
e189294c94
_data/rules.yml: Added Kubernetes volume alert rule. (#32)
_data/rules.yml: Added Kubernetes volume alert rule.
2019-07-14 17:59:49 +02:00
Samuel Berthe
78dc1ba144
Update rules.yml 2019-07-14 17:59:39 +02:00
Samuel Berthe
3d6e520ac1
fix(node-exporter): better cpu load query 2019-07-14 17:51:21 +02:00
Samuel Berthe
ca22d8d3d9
Fixed windows disk usage computation 2019-07-14 17:31:52 +02:00
anon
70211339af more alerts and removed IIS Process from wmi_service_status 2019-07-14 08:46:00 +02:00
anon
f033e06045 Name feedback from samber 2019-07-12 08:57:10 +02:00
anon
3b6235ccb3 add wmi_exporter example 2019-07-09 11:56:41 +02:00
Jonathan Davies
ddc19224be _data/rules.yml: Added AlertManager config reload rule. 2019-06-25 16:06:55 +01:00
Jonathan Davies
2574946609 _data/rules.yml: Use humanize instead of % printf. 2019-06-25 15:54:47 +01:00
Jonathan Davies
c7ca57f57f _data/rules.yml: Added volume full in four days alert rule. 2019-06-25 14:45:17 +01:00
Jonathan Davies
f7e8d60800 _data/rules.yml: Added Prometheus error alerts. 2019-06-25 13:08:32 +01:00
Jonathan Davies
37109f8ccd _data/rules.yml: Added CoreDNS panic alert. 2019-06-24 22:25:40 +01:00
Jonathan Davies
3ccf6ae3d0 _data/rules.yml: Added Kubernetes volume alert rule. 2019-06-24 16:09:02 +01:00
Jonathan Davies
49d93c6f4f _data/rules.yml: Added Prometheus configuration reload alert rule. 2019-06-24 14:31:09 +01:00
anon
bb5dba262f correct wrong AND to OR 2019-06-17 14:25:43 +02:00
Jonas DOREL
e685a7ddef Add systemd failed services alerts 2019-06-06 15:44:56 +02:00
Samuel Berthe
ab6612b94f Improves Juniper rules 2019-05-21 11:59:08 +02:00
Samuel Berthe
e17edc9e99 Merge branch 'master' of github.com:samber/awesome-prometheus-alerts 2019-05-21 11:52:40 +02:00
AngelFreak
51d0357e15 Changed from 09 to 10 for 10GBit, and fix severity duplicate 2019-05-20 09:37:01 +02:00
AngelFreak
0a2a4e2aaf Remove redundant example, and changed notation for easier reading 2019-05-16 10:32:18 +02:00
AngelFreak
5e40343cbc Add Juniper rules 2019-05-15 15:20:48 +02:00
Samuel Berthe
14c34eaf1a
Merge pull request #24 from mxssl/master
Add blackbox rules
2019-03-11 09:22:36 +01:00
mxssl
8de107aeee Add blackbox rules 2019-03-03 00:05:47 +03:00
Samuel Berthe
78f26c73b0 Merge branch 'master' of github.com:samber/awesome-prometheus-alerts 2019-02-20 13:28:15 +01:00
Samuel Berthe
273fd6b9e3 Adding Etcd metrics 2019-02-20 13:28:12 +01:00
Sofrony Pavel
63cf6bd5da
explicit for names 2019-02-16 08:18:38 +03:00
Sofrony Pavel
e26d73d615
consul alerts 2019-02-15 16:37:06 +03:00
Sofrony Pavel
8136b239be
add _bytes && _total for metrics 2019-02-14 22:52:41 +03:00
Sofrony Pavel
ff7ef5f6bd
node has swap alert 2019-02-14 22:37:22 +03:00
Sofrony Pavel
51eedcf616
fix memory metric name 2019-02-14 22:37:04 +03:00
Sofrony Pavel
d889a9594f
LA (2 task per core) 2019-02-14 22:36:35 +03:00
Samuel Berthe
df6432f61e
Update rules.yml 2019-02-11 21:26:41 +01:00
Samuel Berthe
61d889767e
Update rules.yml 2019-02-11 21:26:00 +01:00
Sofrony Pavel
0999af4aa8
consistent naming for severity 2019-02-11 16:58:15 +03:00
Sofrony Pavel
eab8b1a86d
Elasticsearch Heap Usage warning (>80%) 2019-02-11 16:50:26 +03:00
Sofrony Pavel
52ce326823
Elasticsearch alert rules 2019-02-11 15:46:46 +03:00
Marcela Sena
3aa92fbc9a
Fixing in-sync replica condition
If the in-sync replicas minimum set by topic is less than 3, an alert is needed.
2018-11-01 11:21:56 -07:00
Samuel BERTHE
23e5627567
Merge branch 'master' into kafka-insync 2018-10-31 22:09:19 +01:00
MarceStarlet
81409cd1c2 Adding in-sync replica by topic metric rule 2018-10-31 13:16:29 -07:00
Carol
7899b35aaf kafka - metric consumer group 2018-10-31 12:40:08 -03:00
Samuel Berthe
0bc4a1633c Jekyll based doc 2018-10-22 00:53:32 +02:00