Adding exporters: sidekiq, pgbouncer and thanos.

Adding rules to: prometheus, kubernetes, redis, docker and postgresql.
Arranging exporters into categories.
Showing number of rules.
Thanks to Gitlab for opensourcing alerting rules!
Samuel Berthe 2020-03-09 21:13:55 +01:00
parent affacde49b
commit 0b89a764ee
GPG key ID: 9D7813625412A946
4 changed files with 1221 additions and 924 deletions
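
The commit message mentions arranging exporters into categories and showing the number of rules. As a rough sketch of how such a count could be derived from the nested `groups` → `services` → `exporters` → `rules` layout this diff introduces: the `count_rules` helper and the inline `SAMPLE` data below are hypothetical illustrations, not code from the repository, and the real site may compute the number differently.

```python
# Hypothetical sample mirroring the groups -> services -> exporters -> rules
# layout from the diff. Real data would be loaded from the YAML rules file.
SAMPLE = {
    "groups": [
        {
            "name": "Basic resource monitoring",
            "services": [
                {
                    "name": "Prometheus self-monitoring",
                    "exporters": [
                        {
                            "rules": [
                                {
                                    "name": "Prometheus configuration reload failure",
                                    "query": "prometheus_config_last_reload_successful != 1",
                                    "severity": "warning",
                                },
                                {
                                    "name": "Prometheus target empty",
                                    "query": "prometheus_sd_discovered_targets == 0",
                                    "severity": "error",
                                },
                            ]
                        }
                    ],
                }
            ],
        },
        {
            "name": "Databases and brokers",
            "services": [
                {
                    # An empty `rules:` key in YAML parses to None.
                    "name": "MySQL",
                    "exporters": [
                        {"name": "prometheus/mysqld_exporter", "rules": None}
                    ],
                },
                {
                    "name": "PGBouncer",
                    "exporters": [
                        {
                            "name": "spreaker/prometheus-pgbouncer-exporter",
                            "rules": [
                                {
                                    "name": "PGBouncer max connections",
                                    "query": "rate(pgbouncer_errors_count[1m]) > 0",
                                    "severity": "error",
                                }
                            ],
                        }
                    ],
                },
            ],
        },
    ]
}


def count_rules(data):
    """Return ({service name: rule count}, total) for the nested layout."""
    per_service = {}
    for group in data.get("groups") or []:
        for service in group.get("services") or []:
            # Treat a missing or empty `rules` key as zero rules.
            per_service[service["name"]] = sum(
                len(exporter.get("rules") or [])
                for exporter in service.get("exporters") or []
            )
    return per_service, sum(per_service.values())
```

Treating a missing or empty `rules` key as zero keeps services such as MySQL, which has an exporter but no rules yet in this commit, from breaking the count.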

@ -14,39 +14,60 @@ Collection available here: **[https://awesome-prometheus-alerts.grep.to](https:/
## 🚨 Rules ## 🚨 Rules
- [Prometheus internals](https://awesome-prometheus-alerts.grep.to/rules#prometheus-internals) #### Basic resource monitoring
- [Prometheus self-monitoring](https://awesome-prometheus-alerts.grep.to/rules#prometheus-internals)
- [Host/Hardware](https://awesome-prometheus-alerts.grep.to/rules#host-and-hardware) - [Host/Hardware](https://awesome-prometheus-alerts.grep.to/rules#host-and-hardware)
- [Docker Containers](https://awesome-prometheus-alerts.grep.to/rules#docker-containers) - [Docker Containers](https://awesome-prometheus-alerts.grep.to/rules#docker-containers)
- [RabbitMQ](https://awesome-prometheus-alerts.grep.to/rules#rabbitmq) - [Blackbox](https://awesome-prometheus-alerts.grep.to/rules#blackbox)
- [Windows](https://awesome-prometheus-alerts.grep.to/rules#windows-server)
#### Databases and brokers
- [MySQL](https://awesome-prometheus-alerts.grep.to/rules#mysql) - [MySQL](https://awesome-prometheus-alerts.grep.to/rules#mysql)
- [PostgreSQL](https://awesome-prometheus-alerts.grep.to/rules#postgresql) - [PostgreSQL](https://awesome-prometheus-alerts.grep.to/rules#postgresql)
- [PGBouncer](https://awesome-prometheus-alerts.grep.to/rules#pgbouncer)
- [Redis](https://awesome-prometheus-alerts.grep.to/rules#redis) - [Redis](https://awesome-prometheus-alerts.grep.to/rules#redis)
- [MongoDB](https://awesome-prometheus-alerts.grep.to/rules#mongodb) - [MongoDB](https://awesome-prometheus-alerts.grep.to/rules#mongodb)
- [RabbitMQ](https://awesome-prometheus-alerts.grep.to/rules#rabbitmq)
- [Elasticsearch](https://awesome-prometheus-alerts.grep.to/rules#elasticsearch) - [Elasticsearch](https://awesome-prometheus-alerts.grep.to/rules#elasticsearch)
- [Cassandra](https://awesome-prometheus-alerts.grep.to/rules#cassandra) - [Cassandra](https://awesome-prometheus-alerts.grep.to/rules#cassandra)
- [Zookeeper](https://awesome-prometheus-alerts.grep.to/rules#zookeeper)
- [Kafka](https://awesome-prometheus-alerts.grep.to/rules#kafka)
#### Reverse proxies and load balancers
- [Nginx](https://awesome-prometheus-alerts.grep.to/rules#nginx) - [Nginx](https://awesome-prometheus-alerts.grep.to/rules#nginx)
- [Apache](https://awesome-prometheus-alerts.grep.to/rules#apache) - [Apache](https://awesome-prometheus-alerts.grep.to/rules#apache)
- [HaProxy](https://awesome-prometheus-alerts.grep.to/rules#haproxy) - [HaProxy](https://awesome-prometheus-alerts.grep.to/rules#haproxy)
- [Traefik](https://awesome-prometheus-alerts.grep.to/rules#traefik) - [Traefik](https://awesome-prometheus-alerts.grep.to/rules#traefik)
#### Runtimes
- [PHP-FPM](https://awesome-prometheus-alerts.grep.to/rules#php-fpm) - [PHP-FPM](https://awesome-prometheus-alerts.grep.to/rules#php-fpm)
- [JVM](https://awesome-prometheus-alerts.grep.to/rules#jvm) - [JVM](https://awesome-prometheus-alerts.grep.to/rules#jvm)
- [ZFS](https://awesome-prometheus-alerts.grep.to/rules#zfs) - [Sidekiq](https://awesome-prometheus-alerts.grep.to/rules#sidekiq)
#### Orchestrators
- [Kubernetes](https://awesome-prometheus-alerts.grep.to/rules#kubernetes) - [Kubernetes](https://awesome-prometheus-alerts.grep.to/rules#kubernetes)
- [Nomad](https://awesome-prometheus-alerts.grep.to/rules#nomad) - [Nomad](https://awesome-prometheus-alerts.grep.to/rules#nomad)
- [Consul](https://awesome-prometheus-alerts.grep.to/rules#consul) - [Consul](https://awesome-prometheus-alerts.grep.to/rules#consul)
- [Etcd](https://awesome-prometheus-alerts.grep.to/rules#etcd) - [Etcd](https://awesome-prometheus-alerts.grep.to/rules#etcd)
- [Zookeeper](https://awesome-prometheus-alerts.grep.to/rules#zookeeper)
- [Kafka](https://awesome-prometheus-alerts.grep.to/rules#kafka)
- [Linkerd](https://awesome-prometheus-alerts.grep.to/rules#linkerd) - [Linkerd](https://awesome-prometheus-alerts.grep.to/rules#linkerd)
- [Istio](https://awesome-prometheus-alerts.grep.to/rules#istio) - [Istio](https://awesome-prometheus-alerts.grep.to/rules#istio)
- [Blackbox](https://awesome-prometheus-alerts.grep.to/rules#blackbox)
- [Windows](https://awesome-prometheus-alerts.grep.to/rules#windows-server) #### Network and storage
- [Juniper](https://awesome-prometheus-alerts.grep.to/rules#juniper)
- [ZFS](https://awesome-prometheus-alerts.grep.to/rules#zfs)
- [OpenEBS](https://awesome-prometheus-alerts.grep.to/rules#openebs) - [OpenEBS](https://awesome-prometheus-alerts.grep.to/rules#openebs)
- [Minio](https://awesome-prometheus-alerts.grep.to/rules#minio) - [Minio](https://awesome-prometheus-alerts.grep.to/rules#minio)
- [Juniper](https://awesome-prometheus-alerts.grep.to/rules#juniper) - [Juniper](https://awesome-prometheus-alerts.grep.to/rules#juniper)
- [CoreDNS](https://awesome-prometheus-alerts.grep.to/rules#coredns) - [CoreDNS](https://awesome-prometheus-alerts.grep.to/rules#coredns)
#### Other
- [Thanos](https://awesome-prometheus-alerts.grep.to/rules#thanos)
## 🤝 Contributing ## 🤝 Contributing
Contributions from community (you!) are most welcome! Contributions from community (you!) are most welcome!
@ -66,6 +87,10 @@ Give a ⭐️ if this project helped you!
[![support us](https://c5.patreon.com/external/logo/become_a_patron_button.png)](https://www.patreon.com/samber) [![support us](https://c5.patreon.com/external/logo/become_a_patron_button.png)](https://www.patreon.com/samber)
## 👏 Thanks
Gratitude for the Gitlab operation team that provided 50+ rules. \o/
## 📝 License ## 📝 License
[![CC4](https://mirrors.creativecommons.org/presskit/cc.srr.primary.svg)](https://creativecommons.org/licenses/by/4.0/legalcode) [![CC4](https://mirrors.creativecommons.org/presskit/cc.srr.primary.svg)](https://creativecommons.org/licenses/by/4.0/legalcode)

@ -1,19 +1,25 @@
groups:
- name: Basic resource monitoring
services: services:
- name: Prometheus internals - name: Prometheus self-monitoring
exporters: exporters:
- rules: - rules:
- name: Prometheus configuration reload failure - name: Prometheus configuration reload failure
description: Prometheus configuration reload error description: Prometheus configuration reload error
query: "prometheus_config_last_reload_successful != 1" query: 'prometheus_config_last_reload_successful != 1'
severity: warning severity: warning
- name: Prometheus too many restarts - name: Prometheus too many restarts
description: Prometheus has restarted more than twice in the last 15 minutes. It might be crashlooping. description: Prometheus has restarted more than twice in the last 15 minutes. It might be crashlooping.
query: "changes(process_start_time_seconds{job=~"prometheus|pushgateway|alertmanager"}[15m]) > 2" query: 'changes(process_start_time_seconds{job=~"prometheus|pushgateway|alertmanager"}[15m]) > 2'
severity: warning severity: warning
- name: Prometheus AlertManager configuration reload failure - name: Prometheus AlertManager configuration reload failure
description: AlertManager configuration reload error description: AlertManager configuration reload error
query: "alertmanager_config_last_reload_successful != 1" query: 'alertmanager_config_last_reload_successful != 1'
severity: warning severity: warning
- name: Prometheus AlertManager E2E dead man snitch
description: Prometheus DeadManSnitch is an always-firing alert. It's used as an end-to-end test of Prometheus through the Alertmanager.
query: 'vector(1)'
severity: error
- name: Prometheus not connected to alertmanager - name: Prometheus not connected to alertmanager
description: Prometheus cannot connect the alertmanager description: Prometheus cannot connect the alertmanager
query: "prometheus_notifications_alertmanagers_discovered < 1" query: "prometheus_notifications_alertmanagers_discovered < 1"
@ -36,14 +42,22 @@ services:
severity: warning severity: warning
- name: Prometheus notifications backlog - name: Prometheus notifications backlog
description: The Prometheus notification queue has not been empty for 10 minutes description: The Prometheus notification queue has not been empty for 10 minutes
query: 'min_over_time(prometheus_notifications_queue_length[10m])' query: 'min_over_time(prometheus_notifications_queue_length[10m]) > 0'
severity: warning severity: warning
- name: Prometheus AlertManager notification failing
description: Alertmanager is failing sending notifications
query: 'rate(alertmanager_notifications_failed_total[1m]) > 0'
severity: error
- name: Prometheus target empty
description: Prometheus has no target in service discovery
query: 'prometheus_sd_discovered_targets == 0'
severity: error
- name: Prometheus target scraping slow - name: Prometheus target scraping slow
description: Prometheus is scraping exporters slowly description: Prometheus is scraping exporters slowly
query: 'prometheus_target_interval_length_seconds{quantile="0.9"} > 60' query: 'prometheus_target_interval_length_seconds{quantile="0.9"} > 60'
severity: warning severity: warning
- name: Prometheus large scrape - name: Prometheus large scrape
description: Prometheus has many scapres that exceed the sample limit description: Prometheus has many scrapes that exceed the sample limit
query: 'increase(prometheus_target_scrapes_exceeded_sample_limit_total[10m]) > 10' query: 'increase(prometheus_target_scrapes_exceeded_sample_limit_total[10m]) > 10'
severity: warning severity: warning
- name: Prometheus TSDB checkpoint creation failures - name: Prometheus TSDB checkpoint creation failures
@ -160,10 +174,14 @@ services:
description: 'At least one device in RAID array on {{ $labels.instance }} failed. Array {{ $labels.md_device }} needs attention and possibly a disk swap' description: 'At least one device in RAID array on {{ $labels.instance }} failed. Array {{ $labels.md_device }} needs attention and possibly a disk swap'
query: 'node_md_disks{state="fail"} > 0' query: 'node_md_disks{state="fail"} > 0'
severity: warning severity: warning
- name: Kernel version deviations - name: Host kernel version deviations
description: Different kernel versions are running description: Different kernel versions are running
query: 'count(sum(label_replace(node_uname_info, "kernel", "$1", "release", "([0-9]+.[0-9]+.[0-9]+).*")) by (kernel)) > 1' query: 'count(sum(label_replace(node_uname_info, "kernel", "$1", "release", "([0-9]+.[0-9]+.[0-9]+).*")) by (kernel)) > 1'
severity: warning severity: warning
- name: Host OOM kill detected
description: OOM kill detected
query: 'increase(node_vmstat_oom_kill[30m]) > 1'
severity: warning
- name: Docker containers - name: Docker containers
exporters: exporters:
@ -190,6 +208,307 @@ services:
description: Container Volume IO usage is above 80% description: Container Volume IO usage is above 80%
query: "(sum(container_fs_io_current) BY (instance, name) * 100) > 80" query: "(sum(container_fs_io_current) BY (instance, name) * 100) > 80"
severity: warning severity: warning
- name: Container high throttle rate
description: Container is being throttled
query: 'rate(container_cpu_cfs_throttled_seconds_total[3m]) > 1'
severity: warning
- name: Blackbox
exporters:
- name: prometheus/blackbox_exporter
doc_url: https://github.com/prometheus/blackbox_exporter
rules:
- name: Blackbox probe failed
description: Probe failed
query: probe_success == 0
severity: error
- name: Blackbox slow probe
description: Blackbox probe took more than 1s to complete
query: "avg_over_time(probe_duration_seconds[1m]) > 1"
severity: warning
- name: Blackbox probe HTTP failure
description: HTTP status code is not 200-399
query: "probe_http_status_code <= 199 OR probe_http_status_code >= 400"
severity: error
- name: Blackbox SSL certificate will expire soon
description: SSL certificate expires in 30 days
query: "probe_ssl_earliest_cert_expiry - time() < 86400 * 30"
severity: warning
- name: Blackbox SSL certificate will expire soon
description: SSL certificate expires in 3 days
query: "probe_ssl_earliest_cert_expiry - time() < 86400 * 3"
severity: error
- name: Blackbox SSL certificate expired
description: SSL certificate has expired already
query: "probe_ssl_earliest_cert_expiry - time() <= 0"
severity: error
- name: Blackbox probe slow HTTP
description: HTTP request took more than 1s
query: "avg_over_time(probe_http_duration_seconds[1m]) > 1"
severity: warning
- name: Blackbox probe slow ping
description: Blackbox ping took more than 1s
query: "avg_over_time(probe_icmp_duration_seconds[1m]) > 1"
severity: warning
- name: Windows Server
exporters:
- name: martinlindhe/wmi_exporter
doc_url: https://github.com/martinlindhe/wmi_exporter
rules:
- name: Windows Server collector Error
description: "Collector {{ $labels.collector }} was not successful"
query: "wmi_exporter_collector_success == 0"
severity: error
- name: Windows Server service Status
description: Windows Service state is not OK
query: 'wmi_service_status{status="ok"} != 1'
severity: error
- name: Windows Server CPU Usage
description: CPU Usage is more than 80%
query: '100 - (avg by (instance) (irate(wmi_cpu_time_total{mode="idle"}[2m])) * 100) > 80'
severity: warning
- name: Windows Server memory Usage
description: Memory Usage is more than 90%
query: "100*(wmi_os_physical_memory_free_bytes) / wmi_cs_physical_memory_bytes > 90"
severity: warning
- name: Windows Server disk Space Usage
description: Disk Space on Drive is used more than 80%
query: "100.0 - 100 * ((wmi_logical_disk_free_bytes{} / 1024 / 1024 ) / (wmi_logical_disk_size_bytes{} / 1024 / 1024)) > 80"
severity: error
- name: Databases and brokers
services:
- name: MySQL
exporters:
- name: prometheus/mysqld_exporter
doc_url: https://github.com/prometheus/mysqld_exporter
rules:
- name: PostgreSQL
exporters:
- name: wrouesnel/postgres_exporter
doc_url: https://github.com/wrouesnel/postgres_exporter/
rules:
- name: Postgresql down
description: Postgresql instance is down
query: "pg_up == 0"
severity: error
- name: Postgresql restarted
description: Postgresql restarted
query: "time() - pg_postmaster_start_time_seconds < 60"
severity: error
- name: Postgresql exporter error
description: Postgresql exporter is showing errors. A query may be buggy in query.yaml
query: 'pg_exporter_last_scrape_error > 0'
severity: warning
- name: Postgresql replication lag
description: PostgreSQL replication lag is going up (> 10s)
query: '(pg_replication_lag) > 10 and ON(instance) (pg_replication_is_replica == 1)'
severity: warning
- name: Postgresql table not vacuumed
description: Table has not been vacuumed for 24 hours
query: "time() - pg_stat_user_tables_last_autovacuum > 60 * 60 * 24"
severity: warning
- name: Postgresql table not analyzed
description: Table has not been analyzed for 24 hours
query: "time() - pg_stat_user_tables_last_autoanalyze > 60 * 60 * 24"
severity: warning
- name: Postgresql too many connections
description: PostgreSQL instance has too many connections
query: 'sum by (datname) (pg_stat_activity_count{datname!~"template.*|postgres"}) > pg_settings_max_connections * 0.9'
severity: warning
- name: Postgresql not enough connections
description: PostgreSQL instance should have more connections (> 5)
query: 'sum by (datname) (pg_stat_activity_count{datname!~"template.*|postgres"}) < 5'
severity: warning
- name: Postgresql dead locks
description: PostgreSQL has dead-locks
query: 'rate(pg_stat_database_deadlocks{datname!~"template.*|postgres"}[1m]) > 0'
severity: warning
- name: Postgresql slow queries
description: PostgreSQL executes slow queries (> 1min)
query: 'rate(pg_slow_queries[1m]) * 60 > 10'
severity: warning
- name: Postgresql high rollback rate
description: Ratio of transactions being aborted compared to committed is > 2 %
query: 'rate(pg_stat_database_xact_rollback{datname!~"template.*"}[3m]) / rate(pg_stat_database_xact_commit{datname!~"template.*"}[3m]) > 0.02'
severity: warning
- name: Postgresql commit rate low
description: Postgres seems to be processing very few transactions
query: 'rate(pg_stat_database_xact_commit[1m]) < 10'
severity: error
- name: Postgresql low XID consumption
description: Postgresql seems to be consuming transaction IDs very slowly
query: 'rate(pg_txid_current[1m]) < 5'
severity: warning
- name: Postgresql low XLOG consumption
description: Postgres seems to be consuming XLOG very slowly
query: 'rate(pg_xlog_position_bytes[1m]) < 100'
severity: warning
- name: Postgresql WALE replication stopped
description: WAL-E replication seems to be stopped
query: 'rate(pg_xlog_position_bytes[1m]) == 0'
severity: error
- name: Postgresql high rate statement timeout
description: Postgres transactions showing high rate of statement timeouts
query: 'rate(postgresql_errors_total{type="statement_timeout"}[5m]) > 3'
severity: error
- name: Postgresql high rate deadlock
description: Postgres detected deadlocks
query: 'rate(postgresql_errors_total{type="deadlock_detected"}[1m]) * 60 > 1'
severity: error
- name: Postgresql replication lag bytes
description: Postgres Replication lag (in bytes) is high
query: '(pg_xlog_position_bytes and pg_replication_is_replica == 0) - GROUP_RIGHT(instance) (pg_xlog_position_bytes and pg_replication_is_replica == 1) > 1e+09'
severity: error
- name: Postgresql unused replication slot
description: Unused Replication Slots
query: 'pg_replication_slots_active == 0'
severity: warning
- name: Postgresql too many dead tuples
description: PostgreSQL dead tuples is too large
query: '((pg_stat_user_tables_n_dead_tup > 10000) / (pg_stat_user_tables_n_live_tup + pg_stat_user_tables_n_dead_tup)) >= 0.1 unless ON(instance) (pg_replication_is_replica == 1)'
severity: warning
- name: Postgresql split brain
description: Split Brain, too many primary Postgresql databases in read-write mode
query: 'count(pg_replication_is_replica == 0) != 1'
severity: error
- name: Postgresql promoted node
description: Postgresql standby server has been promoted as primary node
query: 'pg_replication_is_replica and changes(pg_replication_is_replica[1m]) > 0'
severity: warning
- name: Postgresql configuration changed
description: Postgres Database configuration change has occurred
query: '{__name__=~"pg_settings_.*"} != ON(__name__) {__name__=~"pg_settings_([^t]|t[^r]|tr[^a]|tra[^n]|tran[^s]|trans[^a]|transa[^c]|transac[^t]|transact[^i]|transacti[^o]|transactio[^n]|transaction[^_]|transaction_[^r]|transaction_r[^e]|transaction_re[^a]|transaction_rea[^d]|transaction_read[^_]|transaction_read_[^o]|transaction_read_o[^n]|transaction_read_on[^l]|transaction_read_onl[^y]).*"} OFFSET 5m'
severity: warning
- name: Postgresql SSL compression active
description: Database connections with SSL compression enabled. This may add significant jitter in replication delay. Replicas should turn off SSL compression via `sslcompression=0` in `recovery.conf`.
query: 'sum(pg_stat_ssl_compression) > 0'
severity: error
- name: Postgresql too many locks acquired
description: Too many locks acquired on the database. If this alert happens frequently, we may need to increase the postgres setting max_locks_per_transaction.
query: '((sum (pg_locks_count)) / (pg_settings_max_locks_per_transaction * pg_settings_max_connections)) > 0.20'
severity: error
- name: PGBouncer
exporters:
- name: spreaker/prometheus-pgbouncer-exporter
doc_url: https://github.com/spreaker/prometheus-pgbouncer-exporter
rules:
- name: PGBouncer active connections
description: PGBouncer pools are filling up
query: 'pgbouncer_pools_server_active_connections > 200'
severity: warning
- name: PGBouncer errors
description: PGBouncer is logging errors. This may be due to a server restart or an admin typing commands at the pgbouncer console.
query: 'increase(pgbouncer_errors_count{errmsg!="server conn crashed?"}[5m]) > 10'
severity: warning
- name: PGBouncer max connections
description: The number of PGBouncer client connections has reached max_client_conn.
query: 'rate(pgbouncer_errors_count{errmsg="no more connections allowed (max_client_conn)"}[1m]) > 0'
severity: error
- name: Redis
exporters:
- name: oliver006/redis_exporter
doc_url: https://github.com/oliver006/redis_exporter
rules:
- name: Redis down
description: Redis instance is down
query: "redis_up == 0"
severity: error
- name: Redis missing master
description: Redis cluster has no node marked as master.
query: 'count(redis_instance_info{role="master"}) == 0'
severity: error
- name: Redis too many masters
description: Redis cluster has too many nodes marked as master.
query: 'count(redis_instance_info{role="master"}) > 1'
severity: error
- name: Redis disconnected slaves
description: Redis not replicating for all slaves. Consider reviewing the redis replication status.
query: 'count without (instance, job) (redis_connected_slaves) - sum without (instance, job) (redis_connected_slaves) - 1 > 1'
severity: error
- name: Redis replication broken
description: Redis instance lost a slave
query: "delta(redis_connected_slaves[1m]) < 0"
severity: error
- name: Redis cluster flapping
description: Changes have been detected in Redis replica connection. This can occur when replica nodes lose connection to the master and reconnect (a.k.a flapping).
query: 'changes(redis_connected_slaves[5m]) > 2'
severity: error
- name: Redis missing backup
description: Redis has not been backed up for 24 hours
query: "time() - redis_rdb_last_save_timestamp_seconds > 60 * 60 * 24"
severity: error
- name: Redis out of memory
description: Redis is running out of memory (> 90%)
query: "redis_memory_used_bytes / redis_total_system_memory_bytes * 100 > 90"
severity: warning
- name: Redis too many connections
description: Redis instance has too many connections
query: "redis_connected_clients > 100"
severity: warning
- name: Redis not enough connections
description: Redis instance should have more connections (> 5)
query: "redis_connected_clients < 5"
severity: warning
- name: Redis rejected connections
description: Some connections to Redis have been rejected
query: "increase(redis_rejected_connections_total[1m]) > 0"
severity: error
- name: MongoDB
exporters:
- name: dcu/mongodb_exporter
doc_url: https://github.com/dcu/mongodb_exporter
rules:
- name: MongoDB replication lag
description: Mongodb replication lag is more than 10s
query: 'avg(mongodb_replset_member_optime_date{state="PRIMARY"}) - avg(mongodb_replset_member_optime_date{state="SECONDARY"}) > 10'
severity: error
- name: MongoDB replication headroom
description: MongoDB replication headroom is <= 0
query: '(avg(mongodb_replset_oplog_tail_timestamp - mongodb_replset_oplog_head_timestamp) - (avg(mongodb_replset_member_optime_date{state="PRIMARY"}) - avg(mongodb_replset_member_optime_date{state="SECONDARY"}))) <= 0'
severity: error
- name: MongoDB replication Status 3
description: MongoDB Replication set member either perform startup self-checks, or transition from completing a rollback or resync
query: "mongodb_replset_member_state == 3"
severity: error
- name: MongoDB replication Status 6
description: MongoDB Replication set member as seen from another member of the set, is not yet known
query: "mongodb_replset_member_state == 6"
severity: error
- name: MongoDB replication Status 8
description: MongoDB Replication set member as seen from another member of the set, is unreachable
query: "mongodb_replset_member_state == 8"
severity: error
- name: MongoDB replication Status 9
description: MongoDB Replication set member is actively performing a rollback. Data is not available for reads
query: "mongodb_replset_member_state == 9"
severity: error
- name: MongoDB replication Status 10
description: MongoDB Replication set member was once in a replica set but was subsequently removed
query: "mongodb_replset_member_state == 10"
severity: error
- name: MongoDB number cursors open
description: Too many cursors opened by MongoDB for clients (> 10k)
query: 'mongodb_metrics_cursor_open{state="total_open"} > 10000'
severity: warning
- name: MongoDB cursors timeouts
description: Too many cursors are timing out
query: "increase(mongodb_metrics_cursor_timed_out_total[10m]) > 100"
severity: warning
- name: MongoDB too many connections
description: Too many connections
query: 'mongodb_connections{state="current"} > 500'
severity: warning
- name: MongoDB virtual memory usage
description: High memory usage
query: '(sum(mongodb_memory{type="virtual"}) BY (ip) / sum(mongodb_memory{type="mapped"}) BY (ip)) > 3'
severity: warning
- name: RabbitMQ - name: RabbitMQ
exporters: exporters:
@ -241,143 +560,6 @@ services:
query: 'rate(rabbitmq_exchange_messages_published_in_total{exchange="my-exchange"}[1m]) < 5' query: 'rate(rabbitmq_exchange_messages_published_in_total{exchange="my-exchange"}[1m]) < 5'
severity: warning severity: warning
- name: MySQL
exporters:
- name: prometheus/mysqld_exporter
doc_url: https://github.com/prometheus/mysqld_exporter
rules:
- name: PostgreSQL
exporters:
- name: wrouesnel/postgres_exporter
doc_url: https://github.com/wrouesnel/postgres_exporter/
rules:
- name: Postgresql down
description: PostgreSQL instance is down
query: "pg_up == 0"
severity: error
- name: Postgresql replication lag
description: PostgreSQL replication lag is going up (> 10s)
query: "pg_replication_lag > 10"
severity: warning
comments: |
A label excluding master nodes should be added to this query,
in order to monitor lag on standby servers only.
Exporter does not guarantee a NaN value for pg_replication_log on promoted master nodes.
See https://github.com/samber/awesome-prometheus-alerts/issues/74
- name: Postgresql table not vaccumed
description: Table has not been vaccum for 24 hours
query: "time() - pg_stat_user_tables_last_autovacuum > 60 * 60 * 24"
severity: warning
- name: Postgresql table not analyzed
description: Table has not been analyzed for 24 hours
query: "time() - pg_stat_user_tables_last_autoanalyze > 60 * 60 * 24"
severity: warning
- name: Postgresql too many connections
description: PostgreSQL instance has too many connections
query: 'sum by (datname) (pg_stat_activity_count{datname!~"template.*|postgres"}) > 100'
severity: warning
- name: Postgresql not enough connections
description: PostgreSQL instance should have more connections (> 5)
query: 'sum by (datname) (pg_stat_activity_count{datname!~"template.*|postgres"}) < 5'
severity: warning
- name: Postgresql dead locks
description: PostgreSQL has dead-locks
query: 'rate(pg_stat_database_deadlocks{datname!~"template.*|postgres"}[1m]) > 0'
severity: warning
- name: Postgresql slow queries
description: PostgreSQL executes slow queries (> 1min)
query: 'avg(rate(pg_stat_activity_max_tx_duration{datname!~"template.*"}[1m])) BY (datname) > 60'
severity: warning
- name: Postgresql high rollback rate
description: Ratio of transactions being aborted compared to committed is > 2 %
query: 'rate(pg_stat_database_xact_rollback{datname!~"template.*"}[3m]) / rate(pg_stat_database_xact_commit{datname!~"template.*"}[3m]) > 0.02'
severity: warning
- name: Redis
exporters:
- name: oliver006/redis_exporter
doc_url: https://github.com/oliver006/redis_exporter
rules:
- name: Redis down
description: Redis instance is down
query: "redis_up == 0"
severity: error
- name: Redis missing backup
description: Redis has not been backuped for 24 hours
query: "time() - redis_rdb_last_save_timestamp_seconds > 60 * 60 * 24"
severity: error
- name: Redis out of memory
description: Redis is running out of memory (> 90%)
query: "redis_memory_used_bytes / redis_total_system_memory_bytes * 100 > 90"
severity: warning
- name: Redis replication broken
description: Redis instance lost a slave
query: "delta(redis_connected_slaves[1m]) < 0"
severity: error
- name: Redis too many connections
description: Redis instance has too many connections
query: "redis_connected_clients > 100"
severity: warning
- name: Redis not enough connections
description: Redis instance should have more connections (> 5)
query: "redis_connected_clients < 5"
severity: warning
- name: Redis rejected connections
description: Some connections to Redis has been rejected
query: "increase(redis_rejected_connections_total[1m]) > 0"
severity: error
- name: MongoDB
exporters:
- name: dcu/mongodb_exporter
doc_url: https://github.com/percona/mongodb_exporter
rules:
- name: MongoDB replication lag
description: Mongodb replication lag is more than 10s
query: 'avg(mongodb_replset_member_optime_date{state="PRIMARY"}) - avg(mongodb_replset_member_optime_date{state="SECONDARY"}) > 10'
severity: error
- name: MongoDB replication headroom
description: MongoDB replication headroom is <= 0
query: '(avg(mongodb_replset_oplog_tail_timestamp - mongodb_replset_oplog_head_timestamp) - (avg(mongodb_replset_member_optime_date{state="PRIMARY"}) - avg(mongodb_replset_member_optime_date{state="SECONDARY"}))) <= 0'
severity: error
- name: MongoDB replication Status 3
description: MongoDB Replication set member either perform startup self-checks, or transition from completing a rollback or resync
query: "mongodb_replset_member_state == 3"
severity: error
- name: MongoDB replication Status 6
description: MongoDB Replication set member as seen from another member of the set, is not yet known
query: "mongodb_replset_member_state == 6"
severity: error
- name: MongoDB replication Status 8
description: MongoDB Replication set member as seen from another member of the set, is unreachable
query: "mongodb_replset_member_state == 8"
severity: error
- name: MongoDB replication Status 9
description: MongoDB Replication set member is actively performing a rollback. Data is not available for reads
query: "mongodb_replset_member_state == 9"
severity: error
- name: MongoDB replication Status 10
description: MongoDB Replication set member was once in a replica set but was subsequently removed
query: "mongodb_replset_member_state == 10"
severity: error
- name: MongoDB number cursors open
description: Too many cursors opened by MongoDB for clients (> 10k)
query: 'mongodb_metrics_cursor_open{state="total_open"} > 10000'
severity: warning
- name: MongoDB cursors timeouts
description: Too many cursors are timing out
query: "increase(mongodb_metrics_cursor_timed_out_total[10m]) > 100"
severity: warning
- name: MongoDB too many connections
description: Too many connections
query: 'mongodb_connections{state="current"} > 500'
severity: warning
- name: MongoDB virtual memory usage
description: High memory usage
query: '(sum(mongodb_memory{type="virtual"}) BY (ip) / sum(mongodb_memory{type="mapped"}) BY (ip)) > 3'
severity: warning
- name: Elasticsearch - name: Elasticsearch
exporters: exporters:
- name: justwatchcom/elasticsearch_exporter - name: justwatchcom/elasticsearch_exporter
@ -486,6 +668,29 @@ services:
query: 'changes(cassandra_stats{name="org:apache:cassandra:metrics:storage:exceptions:count"}[1m]) > 1' query: 'changes(cassandra_stats{name="org:apache:cassandra:metrics:storage:exceptions:count"}[1m]) > 1'
severity: error severity: error
- name: Zookeeper
exporters:
- name: cloudflare/kafka_zookeeper_exporter
doc_url: https://github.com/cloudflare/kafka_zookeeper_exporter
rules:
- name: Kafka
exporters:
- name: danielqsj/kafka_exporter
doc_url: https://github.com/danielqsj/kafka_exporter
rules:
- name: Kafka topics replicas
description: Kafka topic in-sync partition
query: "sum(kafka_topic_partition_in_sync_replica) by (topic) < 3"
severity: error
- name: Kafka consumers group
description: Kafka consumers group
query: "sum(kafka_consumergroup_lag) by (consumergroup) > 50"
severity: error
- name: Reverse proxies and load balancers
services:
- name: Nginx - name: Nginx
exporters: exporters:
- name: nginx-lua-prometheus - name: nginx-lua-prometheus
@ -597,6 +802,9 @@ services:
query: 'sum(rate(traefik_backend_requests_total{code=~"5.*"}[3m])) by (backend) / sum(rate(traefik_backend_requests_total[3m])) by (backend) * 100 > 5' query: 'sum(rate(traefik_backend_requests_total{code=~"5.*"}[3m])) by (backend) / sum(rate(traefik_backend_requests_total[3m])) by (backend) * 100 > 5'
severity: error severity: error
- name: Runtimes
services:
- name: PHP-FPM - name: PHP-FPM
exporters: exporters:
- name: bakins/php-fpm-exporter - name: bakins/php-fpm-exporter
@ -613,26 +821,41 @@ services:
query: 'jvm_memory_bytes_used / jvm_memory_bytes_max{area="heap"} > 0.8' query: 'jvm_memory_bytes_used / jvm_memory_bytes_max{area="heap"} > 0.8'
severity: warning severity: warning
- name: ZFS - name: Sidekiq
exporters: exporters:
- name: node-exporter - name: Strech/sidekiq-prometheus-exporter
doc_url: https://github.com/prometheus/node_exporter doc_url: https://github.com/Strech/sidekiq-prometheus-exporter
rules: rules:
- name: Sidekiq queue size
description: Sidekiq queue {{ $labels.name }} is growing
query: 'sidekiq_queue_size{} > 100'
severity: warning
- name: Sidekiq scheduling latency too high
description: Sidekiq jobs are taking more than 2 minutes to be picked up. Users may be seeing delays in background processing.
query: 'max(sidekiq_queue_latency) > 120'
severity: error
- name: Orchestrators
services:
- name: Kubernetes - name: Kubernetes
exporters: exporters:
- name: kube-state-metrics - name: kube-state-metrics
doc_url: https://github.com/kubernetes/kube-state-metrics/tree/master/docs doc_url: https://github.com/kubernetes/kube-state-metrics/tree/master/docs
rules: rules:
- name: Kubernetes MemoryPressure - name: Kubernetes Node ready
description: Node {{ $labels.node }} has been unready for a long time
query: 'kube_node_status_condition{condition="Ready",status="true"} == 0'
severity: error
- name: Kubernetes memory pressure
description: "{{ $labels.node }} has MemoryPressure condition" description: "{{ $labels.node }} has MemoryPressure condition"
query: 'kube_node_status_condition{condition="MemoryPressure",status="true"} == 1' query: 'kube_node_status_condition{condition="MemoryPressure",status="true"} == 1'
severity: error severity: error
- name: Kubernetes DiskPressure - name: Kubernetes disk pressure
description: "{{ $labels.node }} has DiskPressure condition" description: "{{ $labels.node }} has DiskPressure condition"
query: 'kube_node_status_condition{condition="DiskPressure",status="true"} == 1' query: 'kube_node_status_condition{condition="DiskPressure",status="true"} == 1'
severity: error severity: error
- name: Kubernetes OutOfDisk - name: Kubernetes out of disk
description: "{{ $labels.node }} has OutOfDisk condition" description: "{{ $labels.node }} has OutOfDisk condition"
query: 'kube_node_status_condition{condition="OutOfDisk",status="true"} == 1' query: 'kube_node_status_condition{condition="OutOfDisk",status="true"} == 1'
severity: error severity: error
@ -643,7 +866,7 @@ services:
- name: Kubernetes CronJob suspended - name: Kubernetes CronJob suspended
description: "CronJob {{ $labels.namespace }}/{{ $labels.cronjob }} is suspended" description: "CronJob {{ $labels.namespace }}/{{ $labels.cronjob }} is suspended"
query: "kube_cronjob_spec_suspend != 0" query: "kube_cronjob_spec_suspend != 0"
severity: info severity: warning
- name: Kubernetes PersistentVolumeClaim pending - name: Kubernetes PersistentVolumeClaim pending
description: "PersistentVolumeClaim {{ $labels.namespace }}/{{ $labels.persistentvolumeclaim }} is pending" description: "PersistentVolumeClaim {{ $labels.namespace }}/{{ $labels.persistentvolumeclaim }} is pending"
query: 'kube_persistentvolumeclaim_status_phase{phase="Pending"} == 1' query: 'kube_persistentvolumeclaim_status_phase{phase="Pending"} == 1'
@ -654,12 +877,92 @@ services:
severity: warning severity: warning
- name: Kubernetes Volume full in four days - name: Kubernetes Volume full in four days
description: "{{ $labels.namespace }}/{{ $labels.persistentvolumeclaim }} is expected to fill up within four days. Currently {{ $value | humanize }}% is available." description: "{{ $labels.namespace }}/{{ $labels.persistentvolumeclaim }} is expected to fill up within four days. Currently {{ $value | humanize }}% is available."
query: "100 * (kubelet_volume_stats_available_bytes / kubelet_volume_stats_capacity_bytes) < 15 and predict_linear(kubelet_volume_stats_available_bytes[6h], 4 * 24 * 3600) < 0" query: 'predict_linear(kubelet_volume_stats_available_bytes[6h], 4 * 24 * 3600) < 0'
severity: error
- name: Kubernetes PersistentVolume error
description: "Persistent volume is in bad state"
query: 'kube_persistentvolume_status_phase{phase=~"Failed|Pending",job="kube-state-metrics"} > 0'
severity: error severity: error
- name: Kubernetes StatefulSet down - name: Kubernetes StatefulSet down
description: A StatefulSet went down description: A StatefulSet went down
query: "(kube_statefulset_status_replicas_ready / kube_statefulset_status_replicas_current) != 1" query: "(kube_statefulset_status_replicas_ready / kube_statefulset_status_replicas_current) != 1"
severity: error severity: error
- name: Kubernetes HPA scaling ability
description: HPA is unable to scale
query: 'kube_hpa_status_condition{condition="AbleToScale", status="false"} == 1'
severity: warning
- name: Kubernetes HPA metric availability
description: HPA is not able to collect metrics
query: 'kube_hpa_status_condition{condition="ScalingActive", status="false"} == 1'
severity: warning
- name: Kubernetes HPA scale capability
description: The maximum number of desired Pods has been hit
query: 'kube_hpa_status_desired_replicas >= kube_hpa_spec_max_replicas'
severity: warning
- name: Kubernetes Pod not healthy
description: Pod has been in a non-ready state for longer than an hour.
query: 'min_over_time(sum by (namespace, pod, env, stage) (kube_pod_status_phase{phase=~"Pending|Unknown|Failed"})[1h:]) > 0'
severity: error
- name: Kubernetes pod crash looping
description: Pod {{ $labels.pod }} is crash looping
query: 'rate(kube_pod_container_status_restarts_total[15m]) * 60 * 5 > 5'
severity: warning
- name: Kubernetes ReplicaSet mismatch
description: ReplicaSet replicas mismatch
query: 'kube_replicaset_spec_replicas != kube_replicaset_status_ready_replicas'
severity: warning
- name: Kubernetes Deployment replicas mismatch
description: Deployment Replicas mismatch
query: 'kube_deployment_spec_replicas != kube_deployment_status_replicas_available'
severity: warning
- name: Kubernetes StatefulSet replicas mismatch
description: A StatefulSet has not matched the expected number of replicas for longer than 15 minutes.
query: 'kube_statefulset_status_replicas_ready != kube_statefulset_status_replicas'
severity: warning
- name: Kubernetes Deployment generation mismatch
description: A Deployment has failed but has not been rolled back.
query: 'kube_deployment_status_observed_generation != kube_deployment_metadata_generation'
severity: error
- name: Kubernetes StatefulSet generation mismatch
description: A StatefulSet has failed but has not been rolled back.
query: 'kube_statefulset_status_observed_generation != kube_statefulset_metadata_generation'
severity: error
- name: Kubernetes StatefulSet update not rolled out
description: StatefulSet update has not been rolled out.
query: 'max without (revision) (kube_statefulset_status_current_revision unless kube_statefulset_status_update_revision) * (kube_statefulset_replicas != kube_statefulset_status_replicas_updated)'
severity: error
- name: Kubernetes DaemonSet rollout stuck
description: Some Pods of DaemonSet are not scheduled or not ready
query: 'kube_daemonset_status_number_ready / kube_daemonset_status_desired_number_scheduled * 100 < 100 or kube_daemonset_status_desired_number_scheduled - kube_daemonset_status_current_number_scheduled > 0'
severity: error
- name: Kubernetes DaemonSet misscheduled
description: Some DaemonSet Pods are running where they are not supposed to run
query: 'kube_daemonset_status_number_misscheduled > 0'
severity: error
- name: Kubernetes CronJob too long
description: CronJob {{ $labels.namespace }}/{{ $labels.cronjob }} is taking more than 1h to complete.
query: 'time() - kube_cronjob_next_schedule_time > 3600'
severity: warning
- name: Kubernetes job completion
description: Kubernetes Job failed to complete
query: 'kube_job_spec_completions - kube_job_status_succeeded > 0 or kube_job_status_failed > 0'
severity: error
- name: Kubernetes API server errors
description: Kubernetes API server is experiencing high error rate
query: 'sum(rate(apiserver_request_count{job="apiserver",code=~"^(?:5..)$"}[2m])) / sum(rate(apiserver_request_count{job="apiserver"}[2m])) * 100 > 3'
severity: error
- name: Kubernetes API client errors
description: Kubernetes API client is experiencing high error rate
query: '(sum(rate(rest_client_requests_total{code=~"(4|5).."}[2m])) by (instance, job) / sum(rate(rest_client_requests_total[2m])) by (instance, job)) * 100 > 1'
severity: error
- name: Kubernetes client certificate expires next week
description: A client certificate used to authenticate to the apiserver is expiring next week.
query: 'apiserver_client_certificate_expiration_seconds_count{job="apiserver"} > 0 and histogram_quantile(0.01, sum by (job, le) (rate(apiserver_client_certificate_expiration_seconds_bucket{job="apiserver"}[5m]))) < 7*24*60*60'
severity: warning
- name: Kubernetes client certificate expires soon
description: A client certificate used to authenticate to the apiserver is expiring in less than 24 hours.
query: 'apiserver_client_certificate_expiration_seconds_count{job="apiserver"} > 0 and histogram_quantile(0.01, sum by (job, le) (rate(apiserver_client_certificate_expiration_seconds_bucket{job="apiserver"}[5m]))) < 24*60*60'
severity: error
- name: Nomad - name: Nomad
exporters: exporters:
@ -740,26 +1043,6 @@ services:
query: "histogram_quantile(0.99, rate(etcd_disk_backend_commit_duration_seconds_bucket[5m])) > 0.25" query: "histogram_quantile(0.99, rate(etcd_disk_backend_commit_duration_seconds_bucket[5m])) > 0.25"
severity: warning severity: warning
- name: Zookeeper
exporters:
- name: cloudflare/kafka_zookeeper_exporter
doc_url: https://github.com/cloudflare/kafka_zookeeper_exporter
rules:
- name: Kafka
exporters:
- name: danielqsj/kafka_exporter
doc_url: https://github.com/danielqsj/kafka_exporter
rules:
- name: Kafka topics replicas
description: Kafka topic in-sync partition
query: "sum(kafka_topic_partition_in_sync_replica) by (topic) < 3"
severity: error
- name: Kafka consumers group
description: Kafka consumers group
query: "sum(kafka_consumergroup_lag) by (consumergroup) > 50"
severity: error
- name: Linkerd - name: Linkerd
exporters: exporters:
- rules: - rules:
@ -768,65 +1051,14 @@ services:
exporters: exporters:
- rules: - rules:
- name: Blackbox
exporters:
- name: prometheus/blackbox_exporter
doc_url: https://github.com/prometheus/blackbox_exporter
rules:
- name: Blackbox probe failed
description: Probe failed
query: probe_success == 0
severity: error
- name: Blackbox slow probe
description: Blackbox probe took more than 1s to complete
query: "avg_over_time(probe_duration_seconds[1m]) > 1"
severity: warning
- name: Blackbox HTTP Status Code
description: HTTP status code is not 200-399
query: "probe_http_status_code <= 199 OR probe_http_status_code >= 400"
severity: error
- name: Blackbox SSL certificate will expire soon
description: SSL certificate expires in 30 days
query: "probe_ssl_earliest_cert_expiry - time() < 86400 * 30"
severity: warning
- name: Blackbox SSL certificate expired
description: SSL certificate has expired already
query: "probe_ssl_earliest_cert_expiry - time() <= 0"
severity: error
- name: Blackbox HTTP slow requests
description: HTTP request took more than 1s
query: "avg_over_time(probe_http_duration_seconds[1m]) > 1"
severity: warning
- name: Blackbox slow ping
description: Blackbox ping took more than 1s
query: "avg_over_time(probe_icmp_duration_seconds[1m]) > 1"
severity: warning
- name: Windows Server - name: Network and storage
services:
- name: ZFS
exporters: exporters:
- name: martinlindhe/wmi_exporter - name: node-exporter
doc_url: https://github.com/martinlindhe/wmi_exporter doc_url: https://github.com/prometheus/node_exporter
rules: rules:
- name: Windows Server collector Error
description: "Collector {{ $labels.collector }} was not successful"
query: "wmi_exporter_collector_success == 0"
severity: error
- name: Windows Server service Status
description: Windows Service state is not OK
query: 'wmi_service_status{status="ok"} != 1'
severity: error
- name: Windows Server CPU Usage
description: CPU Usage is more than 80%
query: '100 - (avg by (instance) (irate(wmi_cpu_time_total{mode="idle"}[2m])) * 100) > 80'
severity: warning
- name: Windows Server memory Usage
description: Memory Usage is more than 90%
query: "100*(wmi_os_physical_memory_free_bytes) / wmi_cs_physical_memory_bytes > 90"
severity: warning
- name: Windows Server disk Space Usage
description: Disk Space on Drive is used more than 80%
query: "100.0 - 100 * ((wmi_logical_disk_free_bytes{} / 1024 / 1024 ) / (wmi_logical_disk_size_bytes{} / 1024 / 1024)) > 80"
severity: error
- name: OpenEBS - name: OpenEBS
exporters: exporters:
@ -876,3 +1108,22 @@ services:
description: Number of CoreDNS panics encountered description: Number of CoreDNS panics encountered
query: "increase(coredns_panic_count_total[10m]) > 0" query: "increase(coredns_panic_count_total[10m]) > 0"
severity: error severity: error
- name: Other
services:
- name: Thanos
exporters:
- rules:
- name: Thanos compaction halted
description: Thanos compaction has failed to run and is now halted.
query: 'thanos_compactor_halted == 1'
severity: error
- name: Thanos compact bucket operation failure
description: Thanos compaction has failing storage operations
query: 'rate(thanos_objstore_bucket_operation_failures_total[1m]) > 0'
severity: error
- name: Thanos compact not run
description: Thanos compaction has not run in 24 hours.
query: '(time() - thanos_objstore_bucket_last_successful_upload_time) > 24*60*60'
severity: error
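
Note on the `Kubernetes Volume full in four days` rule above: after this change it fires on `predict_linear(...) < 0` alone, without the former "less than 15% available" guard. The sketch below approximates what `predict_linear` computes, assuming a plain least-squares fit over the samples in the range. The function name and sample data are illustrative and not part of this commit.

```python
def predict_linear(samples, now, horizon):
    """Fit a least-squares line through (timestamp, value) samples and
    return the value it predicts at now + horizon (all times in seconds)."""
    n = len(samples)
    if n < 2:
        raise ValueError("need at least two samples")
    mean_t = sum(t for t, _ in samples) / n
    mean_v = sum(v for _, v in samples) / n
    cov = sum((t - mean_t) * (v - mean_v) for t, v in samples)
    var = sum((t - mean_t) ** 2 for t, _ in samples)
    slope = cov / var
    intercept = mean_v - slope * mean_t
    return slope * (now + horizon) + intercept


# Hypothetical volume losing 2 units of free space per hour over the last 6h.
samples = [(h * 3600, 100 - 2 * h) for h in range(7)]
prediction = predict_linear(samples, now=6 * 3600, horizon=4 * 24 * 3600)
# The prediction is about -104, so the `< 0` condition holds and the alert fires.
print(prediction < 0)
```

Because only the trend is checked, a nearly empty volume that is shrinking quickly can trigger the alert as well; whether that is desirable depends on the workload.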

View file

@ -24,7 +24,20 @@
</h2> </h2>
<ul> <ul>
{% for service in site.data.rules.services %} {% for group in site.data.rules.groups %}
<li style="margin-top: 30px;">
{% assign nbrRules = 0 %}
{% for service in group.services %}
{% for exporter in service.exporters %}
{% for rule in exporter.rules %}
{% assign nbrRules = nbrRules | plus: 1 %}
{% endfor %}
{% endfor %}
{% endfor %}
<h3>{{ group.name }} <small style="margin-left: 20px;">({{ nbrRules }} rules)</small></h3>
<ul>
{% for service in group.services %}
<li> <li>
<a href="/rules#{{ service.name | replace: " ", "-" | downcase }}"> <a href="/rules#{{ service.name | replace: " ", "-" | downcase }}">
{{ service.name }} {{ service.name }}
@ -32,3 +45,6 @@
</li> </li>
{% endfor %} {% endfor %}
</ul> </ul>
</li>
{% endfor %}
</ul>
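
For reference, the nested Liquid loops above count the rules of each group by walking `group.services`, then `service.exporters`, then `exporter.rules`. A rough Python equivalent, assuming the data file parses into plain dicts and lists (the sample group is hypothetical and only mirrors the shape of the rules data):

```python
def count_rules(group):
    """Count rules across all services and exporters of one group.

    An empty `rules:` key parses as None, so it is treated as an empty list,
    which matches Liquid iterating zero times over nil.
    """
    total = 0
    for service in group.get("services") or []:
        for exporter in service.get("exporters") or []:
            total += len(exporter.get("rules") or [])
    return total


# Hypothetical sample shaped like the rules data.
group = {
    "name": "Runtimes",
    "services": [
        {"name": "Sidekiq", "exporters": [
            {"name": "Strech/sidekiq-prometheus-exporter",
             "rules": [{"name": "Sidekiq queue size"},
                       {"name": "Sidekiq scheduling latency too high"}]},
        ]},
        {"name": "Zookeeper", "exporters": [
            {"name": "cloudflare/kafka_zookeeper_exporter", "rules": None},
        ]},
    ],
}
print(f"{group['name']} ({count_rules(group)} rules)")
```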

View file

@ -19,8 +19,11 @@
<br> <br>
<br> <br>
<h1></h1>
<ul> <ul>
{% for service in site.data.rules.services %} {% for group in site.data.rules.groups %}
{% for service in group.services %}
{% assign serviceIndex = forloop.index %} {% assign serviceIndex = forloop.index %}
{% for exporter in service.exporters %} {% for exporter in service.exporters %}
{% assign nbrRules = exporter.rules | size %} {% assign nbrRules = exporter.rules | size %}
@ -28,8 +31,7 @@
<h2 id="{{ service.name | replace: " ", "-" | downcase }}"> <h2 id="{{ service.name | replace: " ", "-" | downcase }}">
{{ serviceIndex }}. {{ serviceIndex }}.
{{ service.name }} {{ service.name }}
{% if exporter.name %} {% if exporter.name %}:
:
{% if exporter.doc_url %} {% if exporter.doc_url %}
<a href="{{ exporter.doc_url }}"> <a href="{{ exporter.doc_url }}">
{{ exporter.name }} {{ exporter.name }}
@ -40,6 +42,9 @@
{% endif %} {% endif %}
{% if nbrRules > 0 %} {% if nbrRules > 0 %}
<small style="font-size: 60%; vertical-align: middle; margin-left: 10px;">
({{ nbrRules }} rules)
</small>
<span class="clipboard-multiple" data-clipboard-target-id="service-{{ serviceIndex }}">[copy all]</span> <span class="clipboard-multiple" data-clipboard-target-id="service-{{ serviceIndex }}">[copy all]</span>
{% endif %} {% endif %}
</h2> </h2>
@ -70,8 +75,7 @@
{% highlight yaml %} {% highlight yaml %}
{% for comment in comments %}# {{ comment | strip }} {% for comment in comments %}# {{ comment | strip }}
{% endfor %} {% endfor %}- alert: {{ ruleNameCamelcase | remove: ' ' }}
- alert: {{ ruleNameCamelcase | remove: ' ' }}
expr: {{ rule.query }} expr: {{ rule.query }}
for: 5m for: 5m
labels: labels:
@ -93,4 +97,5 @@
</li> </li>
{% endfor %} {% endfor %}
{% endfor %} {% endfor %}
{% endfor %}
</ul> </ul>
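
To illustrate what the rule template above emits, here is a minimal Python sketch that turns one rule entry into an alerting rule snippet. The `alert`, `expr`, `for: 5m` and `labels` lines are visible in the diff. Two things are assumptions: the way `ruleNameCamelcase` is derived (each word capitalized, then spaces removed), and that `labels` carries the rule's `severity`. Both function names are hypothetical.

```python
def alert_name(rule_name):
    """Assumed equivalent of ruleNameCamelcase with spaces removed:
    upper-case the first letter of each word and join the words."""
    return "".join(word[:1].upper() + word[1:] for word in rule_name.split())


def render_rule(rule):
    """Render one rule entry as a YAML alerting rule snippet."""
    return "\n".join([
        f"- alert: {alert_name(rule['name'])}",
        f"  expr: {rule['query']}",
        "  for: 5m",
        "  labels:",
        f"    severity: {rule['severity']}",  # assumption: label not shown in the diff
    ])


rule = {
    "name": "Sidekiq queue size",
    "query": "sidekiq_queue_size{} > 100",
    "severity": "warning",
}
print(render_rule(rule))
```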