Samuel Berthe
965fefab89
fix alert description
2020-12-28 16:40:11 +01:00
Carl Düvel
a7c5155002
Add cpu steal alert
2020-12-21 19:06:45 +01:00
Piotr Parczewski
f7d08e364b
Fix Elasticsearch "No new documents" alert.
...
Prometheus rate() function calculates the per-second average rate
of increase. This means the alert is triggered whenever, during the
last 10 minutes, fewer than 1 document was ingested *per second*
(60 documents per minute).
Signed-off-by: Piotr Parczewski <piotr@stackhpc.com>
2020-12-17 15:00:01 +01:00
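The arithmetic behind commit f7d08e364b can be checked with a minimal sketch. It assumes a simplified stand-in for `rate()` (no extrapolation, no counter-reset handling), and the sample values and the "total increase" condition are hypothetical illustrations, not the exact expressions from rules.yml.

```python
WINDOW_SECONDS = 10 * 60  # the 10-minute range mentioned in the commit message


def per_second_rate(first_value, last_value, window_seconds):
    """Simplified stand-in for PromQL rate(): average per-second increase."""
    return (last_value - first_value) / window_seconds


def total_increase(first_value, last_value):
    """Simplified stand-in for PromQL increase(): growth over the whole window."""
    return last_value - first_value


# Hypothetical counter samples: 300 documents ingested during the window.
docs_start, docs_end = 1_000, 1_300

rate = per_second_rate(docs_start, docs_end, WINDOW_SECONDS)

# A threshold on the per-second rate fires although documents are arriving,
# because 300 docs / 600 s is below 1 doc per second.
rate_condition_fires = rate < 1

# A threshold on the total increase only fires when nothing was ingested.
increase_condition_fires = total_increase(docs_start, docs_end) < 1
```

With these numbers the per-second condition fires while the increase-based one does not, which is the false positive the commit message describes.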
Per Lundberg
f673fe72c3
Update rules.yml
...
Fixes a bug in the previous commit. `or` has lower precedence than `<` in PromQL, hence the need for grouping with parentheses.
2020-11-27 11:08:46 +02:00
Per Lundberg
00dd58eace
Fix Redis missing master query
...
The previous approach fails because of the "missing data" semantics in Prometheus. If the Redis server is down, PromQL will typically return "no data" instead of 0 for a `count()`; this is by design in Prometheus.
This suggestion from @slovdahl works around this by returning a vector with a single `0` entry in this case, making the query work as intended.
2020-11-25 16:06:05 +02:00
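The "missing data" behaviour from commit 00dd58eace, and the grouping fix from f673fe72c3, can be modelled with a small sketch. It represents an instant vector as a dict from label set to value; the helper names and the `redis-1` sample are hypothetical, and the model covers only the parts of PromQL needed here.

```python
NO_LABELS = ()  # label set of an aggregated or scalar-derived sample


def count(vec):
    """Like PromQL count(): an empty input yields an empty result, not 0."""
    return {NO_LABELS: float(len(vec))} if vec else {}


def vector(value):
    """Like PromQL vector(): a single sample with no labels."""
    return {NO_LABELS: float(value)}


def or_(left, right):
    """Like PromQL `or`: all of left, plus right samples whose label set is not in left."""
    result = dict(right)
    result.update(left)
    return result


def less_than(vec, threshold):
    """Filtering comparison: keep only samples below the threshold."""
    return {labels: v for labels, v in vec.items() if v < threshold}


def fires(result):
    """An alert fires when its expression returns at least one sample."""
    return bool(result)


masters_up = {(("instance", "redis-1"),): 1.0}  # hypothetical series
masters_down = {}                               # server down: no series at all


def naive(v):          # count(x) < 1
    return less_than(count(v), 1)


def with_fallback(v):  # (count(x) or vector(0)) < 1
    return less_than(or_(count(v), vector(0)), 1)


def ungrouped(v):      # count(x) or vector(0) < 1, parsed as count(x) or (vector(0) < 1)
    return or_(count(v), less_than(vector(0), 1))
```

In this model `naive(masters_down)` is empty, so the alert never fires when the master is gone; `with_fallback` fires only in that case; and `ungrouped(masters_up)` is non-empty, so without parentheses the alert would fire even while a master is present.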
Samuel Berthe
2186841f29
Merge pull request #140 from yasharne/percona_mongodb
2020-11-15 18:12:20 +01:00
Vincent Fiset
6ed4358452
remove replset_oplog based alerts
2020-11-09 11:14:01 -05:00
Samuel Berthe
3ccfaa47ea
remove useless brackets
2020-11-07 18:08:02 +01:00
Samuel Berthe
9f144acb30
haproxy: fix description of request errors
2020-11-07 18:07:20 +01:00
Samuel Berthe
be20363602
rate is better than irate for alerting
2020-11-07 17:46:18 +01:00
Liudmyla Derkach
e6113ff2db
feat: adding few useful rabbitmq alerts
2020-10-30 19:10:52 +02:00
Yashar Nesabian
2a2ecf8a8c
change alert rules which were using avg to show more accurate value based on the replica set
2020-10-24 22:03:42 +03:30
Felix Breidenstein
1b6cd55200
Adapt rules for windows to new exporter
2020-10-20 14:52:36 +02:00
Nabil BENDAFI
e024c542ed
feat(kubernetes): add Out of capacity
2020-10-16 12:15:56 +02:00
Samuel Berthe
ead7db708e
alert on containers CPU: add a comment to exclude cAdvisor
2020-10-11 21:38:48 +02:00
Samuel Berthe
50b4c499fa
rules: adding a few cassandra alerts
2020-10-11 19:55:18 +02:00
Samuel Berthe
0cf82fd3e7
Merge branch 'master' into NetworkSpeed
2020-10-11 19:39:59 +02:00
Samuel Berthe
06205cd91c
Update rules.yml
2020-10-11 19:39:17 +02:00
Samuel Berthe
89252f999f
Merge branch 'master' into master
2020-10-11 19:26:04 +02:00
Samuel Berthe
66e6581b07
Merge pull request #121 from osterik/master
...
check free space for all mountpoints
2020-10-11 19:22:27 +02:00
Samuel Berthe
ea7e6d6aa9
Merge pull request #125 from mcanevet/patch-1
...
Fix HAProxy rules
2020-10-11 18:21:41 +02:00
Samuel Berthe
8616b0241c
Merge pull request #130 from nabilbendafi/feature/traefik_rules
2020-10-11 18:10:06 +02:00
Samuel Berthe
e8572f618b
Merge pull request #133 from tux-00/master
2020-10-11 18:07:11 +02:00
Samuel Berthe
2f6b9832fa
Update rules.yml
2020-10-11 18:06:06 +02:00
Samuel Berthe
8af9ca4ba8
Merge pull request #134 from nanorobocop/fix-prometheus-job-missing-alert
...
Fix PrometheusJobMissing alert
2020-10-11 17:48:42 +02:00
Samuel Berthe
2e6e46da45
Merge branch 'master' into master
2020-10-11 17:42:51 +02:00
Samuel Berthe
c469d26c4d
Merge pull request #137 from Ozarklake/sql_server_rules
2020-10-11 17:37:40 +02:00
Samuel Berthe
bafcd1e922
Update rules.yml
2020-10-11 17:35:46 +02:00
Samuel Berthe
e60fc805f6
Merge pull request #138 from nirav-chotai/nchotai/fix-hpa-alerts
...
[PLEASE_MERGE] Fix HPA alerts
2020-10-11 17:24:13 +02:00
Samuel Berthe
45103f0a0d
Merge branch 'master' into master
2020-10-11 17:10:20 +02:00
Samuel Berthe
7a609adf18
adding comment to container OOM killer warning
2020-10-11 16:11:44 +02:00
Samuel Berthe
cf70272309
fix(container memory limit): filter by containers having max memory setting
2020-10-11 16:08:54 +02:00
Samuel Berthe
4128004475
Merge pull request #119 from fernandocarletti/patch-1
...
fix: container ContainerMemoryUsage alert
2020-10-11 16:06:33 +02:00
Samuel Berthe
f67162bf57
Merge pull request #148 from fsschmitt/fix/disk-latency-unit
...
Fix time unit on disk read/write latency rule
2020-10-11 15:49:15 +02:00
fsschmitt
4266b4d326
Fix time unit on disk read/write latency rule
2020-10-06 14:36:22 +01:00
fsschmitt
5288c9a2f5
Fix node_md_disks state from fail to failed
2020-10-06 13:33:50 +01:00
Daniel Andrzejewski
fc4797db9e
small fix
2020-09-17 15:19:14 +02:00
Daniel Andrzejewski
6c5f708179
node_disk_write_time_seconds_total is in seconds, not in milliseconds. node_disk_write_time_seconds_total should be greater than 0, otherwise you get +Inf result.
2020-09-17 15:13:42 +02:00
Yashar Nesabian
d6b39a7f3f
More accurate alerts
...
added `mongodb instance down` alert and changed the `too many
connections` alert to fire when the connections are more than 80% of the
available connections.
removed `mongodb_replset_member_state` based alerts as I don't have
enough information on them
2020-08-09 10:35:39 +04:30
Yashar Nesabian
3ce1084f5b
Added percona mongodb alert rules
2020-08-03 10:45:32 +04:30
kaifen.xie
a04eef39c0
add istio
2020-07-25 23:24:36 +08:00
Nirav Chotai
8fb5da83de
Fix HPA alerts
...
- Fixing KubernetesHpaMetricAvailability
- Fixing KubernetesHpaScalingAbility
2020-07-24 13:32:44 +08:00
Ozarklake
88e812c78e
add sql server rules
2020-07-17 15:02:41 +08:00
Ozarklake
4e66d17d01
add sql server rules
2020-07-17 14:58:26 +08:00
Ozarklake
e009c5d8b5
Optimizing mysql slow query alert rules
2020-07-14 12:55:17 +08:00
Mansur Marvanov
05e521c0a8
Fix PrometheusJobMissing alert
2020-07-09 16:36:45 +09:00
tux
add6d9c2f3
Add official rabbitmq exporter rules
2020-06-30 15:48:42 +02:00
Nabil BENDAFI
b324c6f32f
feat(traefik): add rules for Traefik v2
...
Fixes #7
2020-06-23 13:40:01 +02:00
Mickaël Canévet
24f7095cd5
Fix HAProxy rules
2020-05-29 10:11:54 +02:00
Ilya Kisleyko
663b0e94da
check free space for all mountpoints
2020-05-20 20:04:32 +03:00
Anton Smolkov
bbbe14f2bd
Update rules.yml
...
WMI memory alert had opposite meaning, triggered on 90% free instead of 90% used
2020-05-19 11:07:11 +03:00
Fernando Carletti
e6de413146
fix: container ContainerMemoryUsage alert
2020-05-18 17:38:05 -05:00
Rob Brown
5050fd64d5
Correct "device" to "interface"
2020-05-14 16:57:19 +01:00
Samuel Berthe
da1e4f6301
💄 replacing "error" severity by "critical", repo wide
2020-05-14 17:20:19 +02:00
Rob Brown
5d3e812fd7
Add HostNetworkNot1GbSpeed rule
2020-05-14 15:00:24 +01:00
Samuel Berthe
7293bca720
Merge pull request #107 from robert-will-brown/NetworkTransmitErrors
2020-05-09 21:32:40 +02:00
Samuel Berthe
b081f28f5d
Merge pull request #112 from robert-will-brown/SpeedTestExporter
2020-05-09 21:31:33 +02:00
Samuel Berthe
660312d0ea
fix OOM killer threshold
2020-05-09 21:25:13 +02:00
Samuel Berthe
6d6b41e241
Merge pull request #108 from robert-will-brown/EdacMemoryErrors
2020-05-09 21:23:01 +02:00
Rob Brown
8faa295745
Add SpeedTest stanza
2020-05-09 10:20:55 +01:00
Rob Brown
ee4e046c66
Add "> 0" at the end of NetworkTransmitErrors queries
2020-05-09 10:18:21 +01:00
Samuel Berthe
d5f6388899
renaming some mysql alerts
2020-05-09 02:11:18 +02:00
Rob Brown
5d83e393cc
Add initial Speedtest Exporter rules
2020-05-08 15:25:54 +01:00
Rob Brown
8912db93bc
Fix "greater than" value
2020-05-04 19:04:52 +01:00
Rob Brown
4b22c078ea
Align EDAC errors with comments
2020-05-04 18:47:20 +01:00
Samuel Berthe
718cd2188c
shame on me
2020-05-04 00:10:43 +02:00
Samuel Berthe
eb8dc736a3
improve accuracy for context switching query
2020-05-04 00:05:33 +02:00
Samuel Berthe
790139211e
fix typo: postgresql replication lag
2020-05-03 23:23:21 +02:00
Samuel Berthe
648b83250a
improve accuracy "Kubernetes Pod not healthy" query
2020-05-03 18:01:25 +02:00
Ondrej Zalesky
d3d13946e6
fix "Kubernetes Pod not healthy" query
2020-04-30 22:53:25 +02:00
Rob Brown
981e82d649
Add HostEDACUncorrectableErrorsdetected and HostEDACCorrectableErrorsdetected rules
2020-04-30 13:27:30 +01:00
Rob Brown
f87e6d300d
Added spacing as per standard
2020-04-30 12:39:12 +01:00
Rob Brown
c57a5e6e36
Add HostNetworkReceiveErrors and HostNetworkTransmitErrors rules
2020-04-30 12:38:23 +01:00
Samuel Berthe
951d80121f
Merge branch 'master' of github.com:samber/awesome-prometheus-alerts
2020-04-06 09:13:29 +02:00
Samuel Berthe
e97023d2a4
linkerd2: adding first rule
2020-04-06 09:01:51 +02:00
Selçuk Arıbalı
c98a04784e
FIX KubernetesPodnothealthy Alert
...
kube-state-metrics assigns the value 1 to the current pod phase, so the "Kubernetes Pod not healthy" alert was fixed accordingly.
2020-04-02 21:01:04 +03:00
Samuel Berthe
c20227b458
oops: adding one-to-one vector matching to mysql subqueries
2020-03-31 16:02:28 +02:00
Matthias Crauwels
79b5ad3b5d
removed avg grouping where possible
2020-03-31 11:42:05 +02:00
Matthias Crauwels
4860250360
added some extra MySQL checks
2020-03-30 11:24:58 +02:00
Samuel Berthe
d9286f6c39
doc: add instructions to rules yaml file
2020-03-28 15:12:21 +01:00
Samuel Berthe
2cda73aa3a
fix(kubernetes): min_over_time takes a time range as parameter
2020-03-26 16:19:26 +01:00
Samuel Berthe
329583ac36
Fix typo and make pg and mysql similar
2020-03-25 16:44:49 +01:00
luhellma
5559e0140b
fix: double usage in query and alert configuration
2020-03-25 16:34:04 +01:00
luhellma
5d8f911d97
feat: Add new rules for MySQLd_exporter from prometheus
2020-03-25 11:57:29 +01:00
luhellma
a4fc086b9a
fix wrong number of equal sign in query
2020-03-20 15:22:20 +01:00
luhellma
3d41e2b3ca
Add rules for apache
2020-03-20 15:08:13 +01:00
Alexander Knipping
caaea2eeb7
Fix typo in DeadManSwitch alert
...
Rename it from snitch into switch.
2020-03-18 15:21:38 +01:00
Samuel Berthe
34e62cb327
nginx: adding latency metric
2020-03-17 22:26:46 +01:00
Samuel Berthe
07dde61116
elasticsearch: adding disk watermark alerts
2020-03-17 21:19:58 +01:00
Samuel Berthe
2ecdb636b2
oops
2020-03-17 21:08:09 +01:00
Samuel Berthe
c653b37e15
adding rules to prometheus self monitoring
2020-03-17 20:56:49 +01:00
Samuel Berthe
fc3e72041c
Merge branch 'master' of github.com:samber/awesome-prometheus-alerts
2020-03-17 19:05:57 +01:00
Samuel Berthe
5125c683c5
adding alerts for Ceph
2020-03-17 18:50:08 +01:00
Alexander Knipping
c82df5d005
Fix PrometheusRuleEvaluationSlow
...
Fixes the rule PrometheusRuleEvaluationSlow as it should fire if
prometheus_rule_group_last_duration_seconds takes longer than
prometheus_rule_group_interval_seconds.
prometheus_rule_group_last_duration_seconds: The duration of the last rule group evaluation.
prometheus_rule_group_interval_seconds: The interval of a rule group.
2020-03-17 15:14:40 +01:00
Samuel Berthe
5b457b0e52
adding github buttons to layout
2020-03-09 23:31:27 +01:00
Samuel Berthe
f554b72671
Add alert for kubernetes api latency
2020-03-09 21:55:17 +01:00
Samuel Berthe
0b89a764ee
Adding exporters: sidekiq, pgbouncer and thanos.
...
Adding rules to: prometheus, kubernetes, redis, docker and postgresql.
Arranging exporters into categories.
Showing number of rules.
Thanks to Gitlab for opensourcing alerting rules!
2020-03-09 21:18:56 +01:00
Samuel Berthe
affacde49b
adding prometheus internal alerts
2020-03-09 00:16:17 +01:00
Samuel Berthe
99e3e64252
Insert Commit Message Here
2020-03-08 22:21:30 +01:00
Samuel Berthe
77eccab0e9
some random changes on rules
2020-03-08 20:30:22 +01:00
Samuel Berthe
542adc3ca7
Adding minio rules
2020-03-08 18:55:53 +01:00
Samuel Berthe
b5469f2a59
Doc: organizing sections
2020-03-08 17:39:49 +01:00
Samuel Berthe
5bace11107
data: ensure alert name prefix
2020-03-08 17:24:39 +01:00
Samuel Berthe
953878df03
HAProxy 1.*: adding rules
2020-03-08 17:17:06 +01:00
Samuel Berthe
7dbbbb0e09
Doc: organizing lb and reverse proxy
2020-03-08 16:10:33 +01:00
Samuel Berthe
718a039313
Adding an alert for prometheus internals: rule evaluation slowing down
2020-03-08 15:08:11 +01:00
Samuel Berthe
072a435f32
Fixing @jpds queries ;) 🚀
2020-03-08 14:41:36 +01:00
Samuel Berthe
f620fe31ee
Merge pull request #36 from jpds/prom-errors
...
_data/rules.yml: Added Prometheus error alerts.
2020-03-08 14:29:18 +01:00
Samuel Berthe
6ba051d747
doc: adding a comment to PostgresqlReplicationLag alert
2020-03-07 19:30:58 +01:00
Samuel Berthe
05a2c9604b
Renaming some alert categories
2020-03-07 19:06:54 +01:00
Samuel Berthe
6edcdc75af
my brain is out for vacation, please forgive me
2020-03-07 18:57:09 +01:00
Samuel Berthe
b97ece8c69
Adding alerts for criteo/cassandra_exporter
2020-03-07 18:51:34 +01:00
Samuel Berthe
cde4e243ae
no quotes no cry
2020-03-07 17:59:42 +01:00
Samuel Berthe
0add8466c6
Merge pull request #82 from samber/feat-nodeexporter-raid
...
Added RAID alerts (node-exporter)
2020-03-07 17:51:39 +01:00
Samuel Berthe
ab477bb21e
Added RAID alerts
2020-03-07 17:50:41 +01:00
Danilo Magalhães
5bd2e03c51
Update rules.yml
...
Group by instance and name instead of only instance.
Change from container_spec_memory_limit_bytes to correct max memory metric container_spec_memory_limit_bytes.
2020-02-27 11:08:09 +00:00
Samuel Berthe
a9c9629cb5
oops
2020-01-25 00:16:49 +01:00
Samuel Berthe
134264026a
Does not alert on tmpfs volume filling-up. Closing #77
2020-01-25 00:13:01 +01:00
iamdenchik
29b66f9b3e
fix check free disk space
2020-01-15 12:40:19 +05:00
Mateusz Legięcki
a72feb4ff6
Fix Etcd rule: Insufficient Members
2020-01-03 12:58:25 +01:00
Mahesh Paolini-Subramanya
88b55f1dee
Replace 'ip' by 'instance' in some rules
...
The metrics return 'instance', not 'ip'
This PR fixes the rules to use 'instance'
2019-12-27 09:18:16 -05:00
Rob Brown
ce51db2a6f
Added Prometheus Not connected to alertmanager alert
2019-12-18 15:38:23 +00:00
Rob Brown
97ecdab26c
Added "Disk will fill in 4 hours" alert
2019-12-18 15:32:52 +00:00
Rob Brown
58f843dbc6
Added hardware temperature alerts
2019-12-12 17:29:23 +00:00
Josef Kříž
d10e30aed0
Fixed rabbitmq cluster down rule
2019-12-02 13:12:02 +01:00
Maxime Brunet
1e2a35e058
elasticsearch: Alert for no new docs on data nodes only
...
We can have nodes that are not masters, but do not hold any data. For example the client/coordinating nodes set up by the `stable/elasticsearch` helm chart:
https://github.com/helm/charts/tree/master/stable/elasticsearch#client-and-coordinating-nodes
And we can also have nodes being data and master nodes simultaneously.
So I think, this alert has to look for `es_data_node="true"` to be correct.
2019-11-06 15:23:26 -05:00
Samuel Berthe
9306d8947f
PG: Alert in case of high rollback ratio (#64)
...
PG: Alert in case of high rollback ratio
2019-10-31 12:02:03 +01:00
Samuel Berthe
0c9a24a4e7
feat(pg): alert in case of high rollback ratio
2019-10-31 12:00:53 +01:00
Samuel Berthe
cca2872ade
typo
2019-10-31 11:47:57 +01:00
Samuel Berthe
768fac56ae
Merge pull request #62 from jdorel/patch-1
...
SllCertificateExpired syntax
2019-10-29 12:15:15 +01:00
Samuel Berthe
20744c3d3d
Update rules.yml
2019-10-29 12:12:43 +01:00
Jonas DOREL
80aebe84e9
Add Kubernetes alerts from kube-state-metric exporter
2019-10-29 11:59:14 +01:00
Jonas DOREL
267a064d26
SllCertificateExpired syntax
...
Match other alert names, without the `has` part.
2019-10-29 11:39:01 +01:00
Samuel Berthe
82cf3ac1ef
adding cassandra
2019-10-26 17:48:22 +02:00
Samuel Berthe
4f9e88bad4
improving blackbox alerts
2019-10-26 17:43:18 +02:00
Samuel Berthe
dfa5446cd5
adding comments in data structure
2019-10-26 17:25:35 +02:00
Samuel Berthe
8f6c85774a
Clean data file
2019-09-25 16:36:10 +02:00
olivier beyler
e3628c5ba8
Add OpenEBS and Minio alert
...
Signed-off-by: olivier beyler <olivier.beyler@orange.com>
2019-09-25 16:13:44 +02:00
Samuel Berthe
1f4a1f8052
Updating Traefik -> Traefik v1.*
2019-09-25 14:23:16 +02:00
Andrey Dudin
6d9866cefb
Fix typo in query of PG DeadLocks
2019-09-25 02:42:44 +03:00
Samuel Berthe
f7f94ed81e
Fixed time interval (10min->10m)
2019-09-13 18:08:04 +02:00
timfeirg
37ef9a6f5c
free memory should include node_memory_Slab_bytes
2019-09-03 15:47:17 +08:00
Samuel Berthe
51e7231b3d
fix(blackbox exporter): alert when http >= 400 instead of 300
2019-08-29 19:03:54 +02:00
Jonas Kongslund
9bd8b3698f
Add CollectorError alert for WMI exporter
2019-08-22 13:52:15 +04:00
louis
e9f247783b
add alerts for traefik
2019-08-08 14:32:47 +02:00
Jonas Kongslund
d789cc314c
Add ProbeFailed alert for the Blackbox exporter
2019-07-25 13:01:47 +04:00
Dam Viet
e2c731229b
fix rule Container Volume usage
2019-07-17 16:59:56 +07:00
Dam Viet
6d6d6ac6a7
update
2019-07-15 15:13:23 +07:00
Dam Viet
db26f248f8
fix rule Container Volume usage
2019-07-15 14:56:52 +07:00
Dam Viet
4b7ecc82e2
suggest fix Container Memory usage
2019-07-15 14:54:13 +07:00
Samuel Berthe
a9019cb063
🤘 🎸
2019-07-14 20:00:55 +02:00
Samuel Berthe
3cdc7d625a
_data/rules.yml: Added CoreDNS panic alert. (#35)
...
_data/rules.yml: Added CoreDNS panic alert.
2019-07-14 18:06:21 +02:00
Samuel Berthe
089ab714c0
Update rules.yml
2019-07-14 18:06:08 +02:00
Samuel Berthe
e189294c94
_data/rules.yml: Added Kubernetes volume alert rule. (#32)
...
_data/rules.yml: Added Kubernetes volume alert rule.
2019-07-14 17:59:49 +02:00
Samuel Berthe
78dc1ba144
Update rules.yml
2019-07-14 17:59:39 +02:00
Samuel Berthe
3d6e520ac1
fix(node-exporter): better cpu load query
2019-07-14 17:51:21 +02:00
Samuel Berthe
ca22d8d3d9
Fixed windows disk usage computation
2019-07-14 17:31:52 +02:00
anon
70211339af
more alerts and removed IIS Process from wmi_service_status
2019-07-14 08:46:00 +02:00
anon
f033e06045
Name feedback from samber
2019-07-12 08:57:10 +02:00
anon
3b6235ccb3
add wmi_exporter example
2019-07-09 11:56:41 +02:00
Jonathan Davies
ddc19224be
_data/rules.yml: Added AlertManager config reload rule.
2019-06-25 16:06:55 +01:00
Jonathan Davies
2574946609
_data/rules.yml: Use humanize instead of % printf.
2019-06-25 15:54:47 +01:00
Jonathan Davies
c7ca57f57f
_data/rules.yml: Added volume full in four days alert rule.
2019-06-25 14:45:17 +01:00
Jonathan Davies
f7e8d60800
_data/rules.yml: Added Prometheus error alerts.
2019-06-25 13:08:32 +01:00
Jonathan Davies
37109f8ccd
_data/rules.yml: Added CoreDNS panic alert.
2019-06-24 22:25:40 +01:00
Jonathan Davies
3ccf6ae3d0
_data/rules.yml: Added Kubernetes volume alert rule.
2019-06-24 16:09:02 +01:00
Jonathan Davies
49d93c6f4f
_data/rules.yml: Added Prometheus configuration reload alert rule.
2019-06-24 14:31:09 +01:00
anon
bb5dba262f
correct wrong AND to OR
2019-06-17 14:25:43 +02:00
Jonas DOREL
e685a7ddef
Add systemd failed services alerts
2019-06-06 15:44:56 +02:00
Samuel Berthe
ab6612b94f
Improves Juniper rules
2019-05-21 11:59:08 +02:00
Samuel Berthe
e17edc9e99
Merge branch 'master' of github.com:samber/awesome-prometheus-alerts
2019-05-21 11:52:40 +02:00
AngelFreak
51d0357e15
Changed from 09 to 10 for 10GBit, and fix severity duplicate
2019-05-20 09:37:01 +02:00
AngelFreak
0a2a4e2aaf
Remove redundant example, and changed notation for easier reading
2019-05-16 10:32:18 +02:00
AngelFreak
5e40343cbc
Add Juniper rules
2019-05-15 15:20:48 +02:00
Samuel Berthe
14c34eaf1a
Merge pull request #24 from mxssl/master
...
Add blackbox rules
2019-03-11 09:22:36 +01:00
mxssl
8de107aeee
Add blackbox rules
2019-03-03 00:05:47 +03:00
Samuel Berthe
78f26c73b0
Merge branch 'master' of github.com:samber/awesome-prometheus-alerts
2019-02-20 13:28:15 +01:00
Samuel Berthe
273fd6b9e3
Adding Etcd metrics
2019-02-20 13:28:12 +01:00
Sofrony Pavel
63cf6bd5da
explicit for names
2019-02-16 08:18:38 +03:00
Sofrony Pavel
e26d73d615
consul alerts
2019-02-15 16:37:06 +03:00
Sofrony Pavel
8136b239be
add _bytes && _total for metrics
2019-02-14 22:52:41 +03:00
Sofrony Pavel
ff7ef5f6bd
node has swap alert
2019-02-14 22:37:22 +03:00
Sofrony Pavel
51eedcf616
fix memory metric name
2019-02-14 22:37:04 +03:00
Sofrony Pavel
d889a9594f
LA (2 task per core)
2019-02-14 22:36:35 +03:00
Samuel Berthe
df6432f61e
Update rules.yml
2019-02-11 21:26:41 +01:00
Samuel Berthe
61d889767e
Update rules.yml
2019-02-11 21:26:00 +01:00
Sofrony Pavel
0999af4aa8
consistent naming for severity
2019-02-11 16:58:15 +03:00
Sofrony Pavel
eab8b1a86d
Elasticsearch Heap Usage warning (>80%)
2019-02-11 16:50:26 +03:00
Sofrony Pavel
52ce326823
Elasticsearch alert rules
2019-02-11 15:46:46 +03:00
Marcela Sena
3aa92fbc9a
Fixing in-sync replica condition
...
If the in-sync replicas minimum set by topic is less than 3, an alert is needed.
2018-11-01 11:21:56 -07:00
Samuel BERTHE
23e5627567
Merge branch 'master' into kafka-insync
2018-10-31 22:09:19 +01:00
MarceStarlet
81409cd1c2
Adding in-sync replica by topic metric rule
2018-10-31 13:16:29 -07:00
Carol
7899b35aaf
kafka - metric consumer group
2018-10-31 12:40:08 -03:00
Samuel Berthe
0bc4a1633c
Jekyll based doc
2018-10-22 00:53:32 +02:00