doc: improve alertmanager.html page for debugging notification delays

Samuel Berthe 2020-12-31 00:26:13 +01:00
parent 3a352d08dc
commit 91023e6ec4
No known key found for this signature in database
GPG key ID: 64863511FFBD0E3C

@ -1,12 +1,20 @@
-<h2>
-Prometheus configuration
-</h2>
+<h1 style="text-align: center;">
+Global configuration
+</h1>
+If you notice a delay between an event and the first notification, read this blog post: [https://pracucci.com/prometheus-understanding-the-delays-on-alerting.html](https://pracucci.com/prometheus-understanding-the-delays-on-alerting.html).
+## Prometheus configuration
 {% highlight yaml %}
 # prometheus.yml
 global:
-  scrape_interval: 15s
+  scrape_interval: 20s
+  # With a short evaluation_interval, alerting rules are evaluated very often.
+  # This can be costly if you run Prometheus with 100+ alerts.
+  evaluation_interval: 20s
   ...
 rule_files:
@ -35,9 +43,7 @@ groups:
 {% endhighlight %}
-<h2>
-AlertManager configuration
-</h2>
+## AlertManager configuration
 {% highlight yaml %}
 {% raw %}
@ -53,7 +59,7 @@ route:
   # When the first notification was sent, wait 'group_interval' to send a batch
   # of new alerts that started firing for that group.
-  group_interval: 5m
+  group_interval: 30s
   # If an alert has successfully been sent, wait 'repeat_interval' to
   # resend them.
@ -92,3 +98,14 @@ receivers:
 {% endraw %}
 {% endhighlight %}
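To illustrate why this commit lowers `group_interval` from 5m to 30s, here is a minimal Python sketch of when an alert joining an existing group would be notified. It assumes a simplified model: the group is created at t=0, the first batch is flushed after `group_wait`, and later batches are flushed every `group_interval` after that. The function name `next_notification_time` is hypothetical and not part of Alertmanager; the 10s `group_wait` matches the value used in this page.

```python
import math

def next_notification_time(alert_time, group_wait, group_interval):
    """Return the time (seconds since the group was created) at which an
    alert arriving at `alert_time` would be included in a notification.

    Simplified model (an assumption, not Alertmanager's actual code):
    flushes happen at group_wait, group_wait + group_interval, ...
    """
    if alert_time <= group_wait:
        # The alert makes it into the very first batch.
        return group_wait
    # Otherwise it waits for the next group_interval flush.
    ticks = math.ceil((alert_time - group_wait) / group_interval)
    return group_wait + ticks * group_interval

# An alert that starts firing 12s after the group was created:
print(next_notification_time(12, group_wait=10, group_interval=300))  # old value, 5m
print(next_notification_time(12, group_wait=10, group_interval=30))   # new value, 30s
```

Under this model, an alert that narrowly misses the first batch waits until 310s with the old setting and until 40s with the new one.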
+## Troubleshooting
+If a notification takes too long to be triggered, check the following settings, each of which adds delay:
+- `scrape_interval = 20s` (prometheus.yml)
+- `evaluation_interval = 20s` (prometheus.yml)
+- `increase(mysql_global_status_slow_queries[1m]) > 0` (alerts/example-mysql.yml)
+- `for: 5m` (alerts/example-mysql.yml)
+- `group_wait = 10s` (alertmanager.yml)
+See also [https://pracucci.com/prometheus-understanding-the-delays-on-alerting.html](https://pracucci.com/prometheus-understanding-the-delays-on-alerting.html).
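The settings listed above can be combined into a rough upper bound on the time between an event and its first notification. The sketch below is a simplified estimate, not an exact description of Prometheus or Alertmanager internals: it assumes the event happens just after a scrape, the sample lands just after a rule evaluation, and the `for` clause is only checked on evaluation ticks. It ignores the `[1m]` range window, network latency, and delivery retries. The function name `worst_case_delay` is hypothetical.

```python
import math

def worst_case_delay(scrape_interval, evaluation_interval, for_duration, group_wait):
    """Rough worst-case seconds between an event and the first notification.

    Assumed model: scrape wait + evaluation wait + pending time
    (the `for` duration rounded up to a whole number of evaluations)
    + Alertmanager's group_wait.
    """
    # `for` can only elapse at an evaluation tick, so round it up.
    pending = math.ceil(for_duration / evaluation_interval) * evaluation_interval
    return scrape_interval + evaluation_interval + pending + group_wait

# Values from this page: 20s scrape, 20s evaluation, for: 5m, group_wait: 10s.
delay = worst_case_delay(20, 20, 5 * 60, 10)
print(f"worst case: {delay}s")
```

With these values the estimate is dominated by `for: 5m`, so shortening the scrape or evaluation interval alone changes the total by only a few seconds.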