176 commits

Author SHA1 Message Date
Samuel Berthe
5bace11107
data: ensure alert name prefix 2020-03-08 17:24:39 +01:00
Samuel Berthe
953878df03
HAProxy 1.*: adding rules 2020-03-08 17:17:06 +01:00
Samuel Berthe
7dbbbb0e09
Doc: organizing lb and reverse proxy 2020-03-08 16:10:33 +01:00
Samuel Berthe
c4d35090eb
Improves readme and contributing guidelines 2020-03-08 15:19:48 +01:00
Samuel Berthe
90a9a08b7c
Improves readme and contributing guidelines 2020-03-08 15:17:55 +01:00
Samuel Berthe
718a039313
Adding an alert for prometheus internals: rule evaluation slowing down 2020-03-08 15:08:11 +01:00
Samuel Berthe
072a435f32
Fixing @jpds queries ;) 🚀 2020-03-08 14:41:36 +01:00
Samuel Berthe
f620fe31ee
Merge pull request #36 from jpds/prom-errors
_data/rules.yml: Added Prometheus error alerts.
2020-03-08 14:29:18 +01:00
Samuel Berthe
de778a9aec
don't ask french people to write in english without error 2020-03-07 20:12:03 +01:00
Samuel Berthe
d19171f5c6
doc: adding disclaimer about alert thresholds 2020-03-07 20:06:11 +01:00
Samuel Berthe
1a56c3032f
Merge pull request #84 from samber/doc-postgresql-replication-lag
Adding a comment to PostgresqlReplicationLag alert
2020-03-07 19:34:25 +01:00
Samuel Berthe
6ba051d747
doc: adding a comment to PostgresqlReplicationLag alert 2020-03-07 19:30:58 +01:00
Samuel Berthe
05a2c9604b
Renaming some alert categories 2020-03-07 19:06:54 +01:00
Samuel Berthe
6edcdc75af
my brain is out for vacation, please forgive me 2020-03-07 18:57:09 +01:00
Samuel Berthe
ab126b1de6
Merge pull request #83 from samber/feat-cassandra-criteo
Adding alerts for criteo/cassandra_exporter
2020-03-07 18:54:43 +01:00
Samuel Berthe
b97ece8c69
Adding alerts for criteo/cassandra_exporter 2020-03-07 18:51:34 +01:00
Samuel Berthe
75a17a79be
please contribute 🙏 2020-03-07 18:11:09 +01:00
Samuel Berthe
a2d92e25c5
please contribute 🙏 2020-03-07 18:09:41 +01:00
Samuel Berthe
cde4e243ae
no quotes no cry 2020-03-07 17:59:42 +01:00
Samuel Berthe
0add8466c6
Merge pull request #82 from samber/feat-nodeexporter-raid
Added RAID alerts (node-exporter)
2020-03-07 17:51:39 +01:00
Samuel Berthe
ab477bb21e
Added RAID alerts 2020-03-07 17:50:41 +01:00
Samuel Berthe
9cb6dc1bd0
build: upgrade dependencies 2020-03-07 17:30:52 +01:00
Samuel Berthe
e0b556a623
Merge pull request #80 from danilomagalhaes/patch-2
Update rules.yml
2020-03-07 17:05:08 +01:00
Danilo Magalhães
5bd2e03c51
Update rules.yml
Group by instance and name instead of only instance.  
Change from container_spec_memory_limit_bytes to correct max memory metric container_spec_memory_limit_bytes.
2020-02-27 11:08:09 +00:00
Samuel Berthe
a9c9629cb5
oops 2020-01-25 00:16:49 +01:00
Samuel Berthe
134264026a
Does not alert on tmpfs volume filling-up. Closing #77 2020-01-25 00:13:01 +01:00
Samuel Berthe
67b322ae5b
fix check free disk space (#75)
fix check free disk space
2020-01-15 14:28:23 +01:00
iamdenchik
29b66f9b3e
fix check free disk space 2020-01-15 12:40:19 +05:00
Samuel Berthe
d699a0d924
oops 2020-01-14 17:18:03 +01:00
Samuel Berthe
b8685adee4
Update GA 2020-01-14 17:15:57 +01:00
Samuel Berthe
2ec17b215f
Merge pull request #73 from Behoston/patch-1
Fix Etcd rule: Insufficient Members
2020-01-03 15:50:30 +01:00
Mateusz Legięcki
a72feb4ff6
Fix Etcd rule: Insufficient Members 2020-01-03 12:58:25 +01:00
Samuel Berthe
97225efc72
Merge pull request #72 from dieswaytoofast/fix_instance_usage
Replace 'ip' by 'instance' in some rules
2019-12-27 15:53:00 +01:00
Mahesh Paolini-Subramanya
88b55f1dee
Replace 'ip' by 'instance' in some rules
The metrics return 'instance', not 'ip'
This PR fixes the rules to use 'instance'
2019-12-27 09:18:16 -05:00
Samuel Berthe
580366554d
Merge pull request #71 from robert-will-brown/prometheus-alerts
Prometheus alerts
2019-12-19 20:39:06 +01:00
Rob Brown
ce51db2a6f
Added Prometheus Not connected to alertmanager alert 2019-12-18 15:38:23 +00:00
Rob Brown
97ecdab26c
Added "Disk will fill in 4 hours" alert 2019-12-18 15:32:52 +00:00
Samuel Berthe
6aeb60cb02
Merge pull request #69 from robert-will-brown/master
Added hardware temperature alerts
2019-12-13 16:31:24 +01:00
Rob Brown
58f843dbc6
Added hardware temperature alerts 2019-12-12 17:29:23 +00:00
Samuel Berthe
c2c9a58959
Merge pull request #68 from pepakriz/patch-1
Fixed `rabbitmq cluster down` rule
2019-12-02 19:29:31 +01:00
Josef Kříž
d10e30aed0
Fixed rabbitmq cluster down rule 2019-12-02 13:12:02 +01:00
Samuel Berthe
febd5f93e0
Merge pull request #67 from mattiasr/patch-1
Fixed typo in alertmanager.md
2019-11-19 21:38:24 +01:00
Mattias Ryrlén
6d33f32b43
Fixed typo in alertmanager.md
Assumed betch should be batch
2019-11-19 17:04:47 +01:00
Samuel Berthe
7eb68b1c4b
Merge pull request #65 from maxbrunet/patch-1
elasticsearch: Alert for no new docs on data nodes only
2019-11-07 15:20:37 +01:00
Maxime Brunet
1e2a35e058
elasticsearch: Alert for no new docs on data nodes only
We can have nodes that are not masters, but do not hold any data. For example the client/coordinating nodes set up by the `stable/elasticsearch` helm chart:
https://github.com/helm/charts/tree/master/stable/elasticsearch#client-and-coordinating-nodes

And we can also have nodes being data and master nodes simultaneously.
So I think this alert has to look for `es_data_node="true"` to be correct.
2019-11-06 15:23:26 -05:00
Samuel Berthe
9306d8947f
PG: Alert in case of high rollback ratio (#64)
PG: Alert in case of high rollback ratio
2019-10-31 12:02:03 +01:00
Samuel Berthe
0c9a24a4e7
feat(pg): alert in case of high rollback ratio 2019-10-31 12:00:53 +01:00
Samuel Berthe
cca2872ade
typo 2019-10-31 11:47:57 +01:00
Samuel Berthe
768fac56ae
Merge pull request #62 from jdorel/patch-1
SllCertificateExpired syntax
2019-10-29 12:15:15 +01:00
Samuel Berthe
20744c3d3d
Update rules.yml 2019-10-29 12:12:43 +01:00