Commit graph

343 commits

Author SHA1 Message Date
Samuel Berthe
b5469f2a59
Doc: organizing sections 2020-03-08 17:39:49 +01:00
Samuel Berthe
5bace11107
data: ensure alert name prefix 2020-03-08 17:24:39 +01:00
Samuel Berthe
953878df03
HAProxy 1.*: adding rules 2020-03-08 17:17:06 +01:00
Samuel Berthe
7dbbbb0e09
Doc: organizing lb and reverse proxy 2020-03-08 16:10:33 +01:00
Samuel Berthe
718a039313
Adding an alert for prometheus internals: rule evaluation slowing down 2020-03-08 15:08:11 +01:00
Samuel Berthe
072a435f32
Fixing @jpds queries ;) 🚀 2020-03-08 14:41:36 +01:00
Samuel Berthe
f620fe31ee
Merge pull request #36 from jpds/prom-errors
_data/rules.yml: Added Prometheus error alerts.
2020-03-08 14:29:18 +01:00
Samuel Berthe
6ba051d747
doc: adding a comment to PostgresqlReplicationLag alert 2020-03-07 19:30:58 +01:00
Samuel Berthe
05a2c9604b
Renaming some alert categories 2020-03-07 19:06:54 +01:00
Samuel Berthe
6edcdc75af
my brain is out for vacation, please forgive me 2020-03-07 18:57:09 +01:00
Samuel Berthe
b97ece8c69
Adding alerts for criteo/cassandra_exporter 2020-03-07 18:51:34 +01:00
Samuel Berthe
cde4e243ae
no quotes no cry 2020-03-07 17:59:42 +01:00
Samuel Berthe
0add8466c6
Merge pull request #82 from samber/feat-nodeexporter-raid
Added RAID alerts (node-exporter)
2020-03-07 17:51:39 +01:00
Samuel Berthe
ab477bb21e
Added RAID alerts 2020-03-07 17:50:41 +01:00
Danilo Magalhães
5bd2e03c51
Update rules.yml
Group by instance and name instead of only instance.
Change from container_spec_memory_limit_bytes to correct max memory metric container_spec_memory_limit_bytes.
2020-02-27 11:08:09 +00:00
Samuel Berthe
a9c9629cb5 oops 2020-01-25 00:16:49 +01:00
Samuel Berthe
134264026a
Does not alert on tmpfs volume filling-up. Closing #77 2020-01-25 00:13:01 +01:00
iamdenchik
29b66f9b3e fix check free disk space 2020-01-15 12:40:19 +05:00
Mateusz Legięcki
a72feb4ff6
Fix Etcd rule: Insufficient Members 2020-01-03 12:58:25 +01:00
Mahesh Paolini-Subramanya
88b55f1dee Replace 'ip' by 'instance' in some rules
The metrics return 'instance', not 'ip'
This PR fixes the rules to use 'instance'
2019-12-27 09:18:16 -05:00
Rob Brown
ce51db2a6f Added Prometheus Not connected to alertmanager alert 2019-12-18 15:38:23 +00:00
Rob Brown
97ecdab26c Added "Disk will fill in 4 hours" alert 2019-12-18 15:32:52 +00:00
Rob Brown
58f843dbc6 Added hardware temperature alerts 2019-12-12 17:29:23 +00:00
Josef Kříž
d10e30aed0
Fixed rabbitmq cluster down rule 2019-12-02 13:12:02 +01:00
Maxime Brunet
1e2a35e058
elasticsearch: Alert for no new docs on data nodes only
We can have nodes that are not masters, but do not hold any data. For example, the client/coordinating nodes set up by the `stable/elasticsearch` helm chart:
https://github.com/helm/charts/tree/master/stable/elasticsearch#client-and-coordinating-nodes

And we can also have nodes being data and master nodes simultaneously.
So I think this alert has to look for `es_data_node="true"` to be correct.
2019-11-06 15:23:26 -05:00
Samuel Berthe
9306d8947f
PG: Alert in case of high rollback ratio (#64)
PG: Alert in case of high rollback ratio
2019-10-31 12:02:03 +01:00
Samuel Berthe
0c9a24a4e7 feat(pg): alert in case of high rollback ratio 2019-10-31 12:00:53 +01:00
Samuel Berthe
cca2872ade
typo 2019-10-31 11:47:57 +01:00
Samuel Berthe
768fac56ae
Merge pull request #62 from jdorel/patch-1
SllCertificateExpired syntax
2019-10-29 12:15:15 +01:00
Samuel Berthe
20744c3d3d
Update rules.yml 2019-10-29 12:12:43 +01:00
Jonas DOREL
80aebe84e9 Add Kubernetes alerts from kube-state-metric exporter 2019-10-29 11:59:14 +01:00
Jonas DOREL
267a064d26
SllCertificateExpired syntax
Match other alert names, without the `has` part.
2019-10-29 11:39:01 +01:00
Samuel Berthe
82cf3ac1ef adding cassandra 2019-10-26 17:48:22 +02:00
Samuel Berthe
4f9e88bad4 improving blackbox alerts 2019-10-26 17:43:18 +02:00
Samuel Berthe
dfa5446cd5 adding comments in data structure 2019-10-26 17:25:35 +02:00
Samuel Berthe
8f6c85774a
Clean data file 2019-09-25 16:36:10 +02:00
olivier beyler
e3628c5ba8 Add OpenEBS and Minio alert
Signed-off-by: olivier beyler <olivier.beyler@orange.com>
2019-09-25 16:13:44 +02:00
Samuel Berthe
1f4a1f8052
Updating Traefik -> Traefik v1.* 2019-09-25 14:23:16 +02:00
Andrey Dudin
6d9866cefb
Fix typo in query of PG DeadLocks 2019-09-25 02:42:44 +03:00
Samuel Berthe
f7f94ed81e
Fixed time interval (10min->10m) 2019-09-13 18:08:04 +02:00
timfeirg
37ef9a6f5c
free memory should include node_memory_Slab_bytes 2019-09-03 15:47:17 +08:00
Samuel Berthe
51e7231b3d fix(blackbox exporter): alert when http >= 400 instead of 300 2019-08-29 19:03:54 +02:00
Jonas Kongslund
9bd8b3698f Add CollectorError alert for WMI exporter 2019-08-22 13:52:15 +04:00
louis
e9f247783b add alerts for traefik 2019-08-08 14:32:47 +02:00
Jonas Kongslund
d789cc314c Add ProbeFailed alert for the Blackbox exporter 2019-07-25 13:01:47 +04:00
Dam Viet
e2c731229b
fix rule Container Volume usage 2019-07-17 16:59:56 +07:00
Dam Viet
6d6d6ac6a7 update 2019-07-15 15:13:23 +07:00
Dam Viet
db26f248f8 fix rule Container Volume usage 2019-07-15 14:56:52 +07:00
Dam Viet
4b7ecc82e2 suggest fix Container Memory usage 2019-07-15 14:54:13 +07:00
Samuel Berthe
a9019cb063
🤘 🎸 2019-07-14 20:00:55 +02:00
Samuel Berthe
3cdc7d625a
_data/rules.yml: Added CoreDNS panic alert. (#35)
_data/rules.yml: Added CoreDNS panic alert.
2019-07-14 18:06:21 +02:00
Samuel Berthe
089ab714c0
Update rules.yml 2019-07-14 18:06:08 +02:00
Samuel Berthe
e189294c94
_data/rules.yml: Added Kubernetes volume alert rule. (#32)
_data/rules.yml: Added Kubernetes volume alert rule.
2019-07-14 17:59:49 +02:00
Samuel Berthe
78dc1ba144
Update rules.yml 2019-07-14 17:59:39 +02:00
Samuel Berthe
3d6e520ac1
fix(node-exporter): better cpu load query 2019-07-14 17:51:21 +02:00
Samuel Berthe
ca22d8d3d9
Fixed windows disk usage computation 2019-07-14 17:31:52 +02:00
anon
70211339af more alerts and removed IIS Process from wmi_service_status 2019-07-14 08:46:00 +02:00
anon
f033e06045 Name feedback from samber 2019-07-12 08:57:10 +02:00
anon
3b6235ccb3 add wmi_exporter example 2019-07-09 11:56:41 +02:00
Jonathan Davies
ddc19224be _data/rules.yml: Added AlertManager config reload rule. 2019-06-25 16:06:55 +01:00
Jonathan Davies
2574946609 _data/rules.yml: Use humanize instead of % printf. 2019-06-25 15:54:47 +01:00
Jonathan Davies
c7ca57f57f _data/rules.yml: Added volume full in four days alert rule. 2019-06-25 14:45:17 +01:00
Jonathan Davies
f7e8d60800 _data/rules.yml: Added Prometheus error alerts. 2019-06-25 13:08:32 +01:00
Jonathan Davies
37109f8ccd _data/rules.yml: Added CoreDNS panic alert. 2019-06-24 22:25:40 +01:00
Jonathan Davies
3ccf6ae3d0 _data/rules.yml: Added Kubernetes volume alert rule. 2019-06-24 16:09:02 +01:00
Jonathan Davies
49d93c6f4f _data/rules.yml: Added Prometheus configuration reload alert rule. 2019-06-24 14:31:09 +01:00
anon
bb5dba262f correct wrong AND to OR 2019-06-17 14:25:43 +02:00
Jonas DOREL
e685a7ddef Add systemd failed services alerts 2019-06-06 15:44:56 +02:00
Samuel Berthe
ab6612b94f Improves Juniper rules 2019-05-21 11:59:08 +02:00
Samuel Berthe
e17edc9e99 Merge branch 'master' of github.com:samber/awesome-prometheus-alerts 2019-05-21 11:52:40 +02:00
AngelFreak
51d0357e15 Changed from 09 to 10 for 10GBit, and fix severity duplicate 2019-05-20 09:37:01 +02:00
AngelFreak
0a2a4e2aaf Remove redundant example, and changed notation for easier reading 2019-05-16 10:32:18 +02:00
AngelFreak
5e40343cbc Add Juniper rules 2019-05-15 15:20:48 +02:00
Samuel Berthe
14c34eaf1a
Merge pull request #24 from mxssl/master
Add blackbox rules
2019-03-11 09:22:36 +01:00
mxssl
8de107aeee Add blackbox rules 2019-03-03 00:05:47 +03:00
Samuel Berthe
78f26c73b0 Merge branch 'master' of github.com:samber/awesome-prometheus-alerts 2019-02-20 13:28:15 +01:00
Samuel Berthe
273fd6b9e3 Adding Etcd metrics 2019-02-20 13:28:12 +01:00
Sofrony Pavel
63cf6bd5da
explicit for names 2019-02-16 08:18:38 +03:00
Sofrony Pavel
e26d73d615
consul alerts 2019-02-15 16:37:06 +03:00
Sofrony Pavel
8136b239be
add _bytes && _total for metrics 2019-02-14 22:52:41 +03:00
Sofrony Pavel
ff7ef5f6bd
node has swap alert 2019-02-14 22:37:22 +03:00
Sofrony Pavel
51eedcf616
fix memory metric name 2019-02-14 22:37:04 +03:00
Sofrony Pavel
d889a9594f
LA (2 task per core) 2019-02-14 22:36:35 +03:00
Samuel Berthe
df6432f61e
Update rules.yml 2019-02-11 21:26:41 +01:00
Samuel Berthe
61d889767e
Update rules.yml 2019-02-11 21:26:00 +01:00
Sofrony Pavel
0999af4aa8
consistent naming for severity 2019-02-11 16:58:15 +03:00
Sofrony Pavel
eab8b1a86d
Elasticsearch Heap Usage warning (>80%) 2019-02-11 16:50:26 +03:00
Sofrony Pavel
52ce326823
Elasticsearch alert rules 2019-02-11 15:46:46 +03:00
Marcela Sena
3aa92fbc9a
Fixing in-sync replica condition
If the in-sync replicas minimum set by topic is less than 3, an alert is needed.
2018-11-01 11:21:56 -07:00
Samuel BERTHE
23e5627567
Merge branch 'master' into kafka-insync 2018-10-31 22:09:19 +01:00
MarceStarlet
81409cd1c2 Adding in-sync replica by topic metric rule 2018-10-31 13:16:29 -07:00
Carol
7899b35aaf kafka - metric consumer group 2018-10-31 12:40:08 -03:00
Samuel Berthe
0bc4a1633c Jekyll based doc 2018-10-22 00:53:32 +02:00